Enhancing the Reliability of the MMI for Selecting Future Health Care Leaders (Acad Med, 2011)

Enhancing the Reliability of the Multiple Mini-Interview for Selecting Prospective Health Care Leaders

Sebastian Uijtdehaage, PhD, Lawrence “Hy” Doyle, EdD, and Neil Parker, MD





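The current crisis in delivering effective and accessible health care in the United States has given rise to dual-degree leadership programs for medical undergraduates. Key terms: Program in Medical Education (PRIME), David Geffen School of Medicine at UCLA, UCLA-PRIME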

The current crisis in providing effective and accessible health care in the United States has spawned a number of dual-degree leadership programs for medical undergraduates.1

  • In 2005, the University of California (UC) initiated an ambitious initiative, the Program in Medical Education (PRIME), to increase enrollment in its medical schools in order to address the needs of California’s disadvantaged populations.2,3
  • In 2007, at the David Geffen School of Medicine at UCLA, UCLA-PRIME was developed as a five-year dual-degree program focused on the development of leadership skills in 18 medical students per year whose career goals would be to improve health care for the disadvantaged and medically underserved.


The selection of future physicians often fails for several reasons.

The selection of future physicians, however, often fails on several accounts.4

  • Cognitive records such as GPA and MCAT scores tend to override noncognitive attributes.
    First, the cognitive record of the applicant, that is, grade point average (GPA) and Medical College Admission Test (MCAT) scores, commonly overrides any consideration of noncognitive attributes in decisions to admit.5
  • The noncognitive qualities sought in applicants are unclear, implicit, and not agreed on.
    Second, the noncognitive qualities sought in applicants are unclear, remain implicit, and are not necessarily agreed on by stakeholders.
  • Even when these qualities are clear and agreed on, reliable and valid assessment methods are scarce.
    Third, even if a set of desirable noncognitive qualities for candidates is clear and agreed on, reliable and valid assessment methods are scarce. This is particularly true for characteristics such as altruism, empathy, and leadership.
  • The overall admissions process is rarely transparent or uniformly applied.
    Furthermore, the entire admissions process is rarely transparent or uniformly applied.


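Unfortunately, admissions interviews are context specific: an applicant’s performance can vary with the interviewer, the questions asked, and other factors. Kreiter and colleagues examined the variance components of admissions interview scores and reported that the component attributable to applicants was smaller than the applicant-by-occasion interaction component. These and similar findings suggest that traditional interviews have inadequate reliability and, therefore, questionable validity.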

Unfortunately, admissions interviews are, like many other assessments, prone to “context specificity.”7 That is, the performance of an applicant during the interview may depend to an important extent on the particular interviewer, the specific questions asked, or other factors irrelevant to the applicant’s suitability. Indeed, Kreiter and colleagues8 studied the variance components of admissions interview scores and found that the variance component attributable to applicants was smaller than the variance component attributable to the applicant-by-occasion interaction. These and similar findings imply that traditional interviews may have inadequate reliability and, thus, questionable validity.


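The initial MMI study by Eva and colleagues was conducted on graduate students, a relatively heterogeneous group compared with medical school applicants. This may have inflated the reliability results. As Eva and colleagues noted in a subsequent article, “the reliability and validity of any assessment strategy is dependent on the context in which the strategy is applied and the content of the assessment.” In other words, the MMI’s strong psychometric properties may not be guaranteed in a more homogeneous group.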

The initial MMI study by Eva and colleagues12 was conducted on graduate students, a relatively heterogeneous group compared with a pool of medical school applicants. This may have inflated their reliability results. As Eva and colleagues22 put forth in a subsequent article, “the reliability and validity of any assessment strategy is dependent on the context in which the strategy is applied and the content of the assessment.” In other words, the promising psychometric properties of the MMI may not necessarily hold up for a more homogenous pool of applicants who have been selected for consideration on the basis of a more specific set of attributes.



Method


We first used a Delphi approach to build an inventory of the desirable characteristics of UCLA-PRIME candidates, focusing on leadership and commitment to disadvantaged populations.

First, we generated an inventory of the desirable characteristics of UCLA-PRIME candidates with a focus on leadership and commitment to disadvantaged populations using a Delphi approach among stakeholders (program administrators, deans, faculty members, and community leaders). We described the details of the Delphi study elsewhere.23 Characteristics that were deemed essential for the PRIME program included

  • commitment to and experience with underserved populations,
  • cultural sensitivity,
  • leadership potential,
  • maturity, and
  • being an effective team member.


Study 1 (2009)



In 2009, we created a panel of 28 interviewers consisting of 18 faculty members, 6 medical students, and 4 community members.


  • On the day of the MMI, we handed out the scenarios and a list of applicants to the interviewers.
  • The interviewers practiced the scenarios with each other before the applicants arrived.
  • We instructed the interviewers to rate the overall performance of the applicant using a seven-point Likert scale (1 = unsatisfactory; 7 = outstanding).
  • Specifically, we asked them to “consider the applicant’s communication skills, strength of the argument, and suitability for the medical profession.”
  • We strongly encouraged the interviewers to use the full rating scale, recognizing that interviewees had been selected from a very large pool of applicants and exceeded all other admissions requirements. Interviewers scored the applicants immediately after each interview.
  • They could adjust their scoring after they completed interviewing the entire cohort.
  • A total score was calculated for each applicant by summing the scores for individual stations. Thus, total scores could range from 12 through 84.
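The total-score rule in the last bullet can be sketched in a few lines. This is a minimal illustration, not the authors’ scoring software: the function name `total_mmi_score` and the constants are hypothetical names introduced here, and the only facts taken from the text are the 12 stations and the seven-point scale.

```python
N_STATIONS = 12              # stations in the circuit, as described in the study
SCALE_MIN, SCALE_MAX = 1, 7  # seven-point rating scale


def total_mmi_score(station_scores):
    """Sum one applicant's station ratings into a total MMI score.

    Hypothetical helper for illustration; raises ValueError on malformed input.
    """
    if len(station_scores) != N_STATIONS:
        raise ValueError(f"expected {N_STATIONS} station scores")
    for score in station_scores:
        if not SCALE_MIN <= score <= SCALE_MAX:
            raise ValueError(f"score {score} is outside {SCALE_MIN}-{SCALE_MAX}")
    return sum(station_scores)


# The extremes reproduce the range stated in the text.
lowest = total_mmi_score([SCALE_MIN] * N_STATIONS)
highest = total_mmi_score([SCALE_MAX] * N_STATIONS)
print(lowest, highest)
```

Rating every station at the lowest or highest anchor gives the two endpoints of the 12 through 84 range.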




Study 2 (2010)

 

A few changes

  • Venue change: First, we moved the MMI venue to our education building and used adjacent rooms typically used for small-group teaching of medical students. The applicants could familiarize themselves with the layout of the facility before commencing the MMI.
  • Easy station replaced with a harder one: Second, we replaced an easy station (Station 9, “How did you prepare for this interview?”) with a perhaps more challenging task in which applicants were asked to describe student characteristics desirable for the PRIME program. Difficulty level was not assessed formally but was suggested by the fact that interviewers had difficulty differentiating performance of the applicants in the original station. The remaining 11 stations were the same as in 2009.
  • Normative scoring rubric: Third, we asked the interviewers to rate the performance of an applicant relative to the pool of all applicants. Accordingly, we changed the seven-point Likert-scale anchors to a normative scoring rubric (1 = bottom 15%; 4 = middle 50%; 7 = top 15%).
  • Wording revisions: Finally, we changed the wording of two stations that previously led to confusion among some applicants. In 2009, one station asked the applicants to discuss “surgeons’ mortality rates.” A few applicants proceeded to discuss the mortality rate of surgeons and not their patients. In 2010, we changed the prompt to “surgeons’ patient mortality rates.” In another station, we replaced the term “SARS epidemic” with the more recent “H1N1 epidemic” but left the crux of the station the same.





Results


Study 1 (2009)


The distribution was skewed toward the maximum score.

The distribution of the total MMI scores, however, was skewed toward the maximum score, suggesting that interviewers had difficulty using the lower range of the rating rubric (Figure 1).

 

 


 

Study 2 (2010)

 

 





 


Discussion


The MMI can be used effectively even with a homogeneous applicant pool.

Our study showed that the MMI can be effectively used to assess a homogeneous group of applicants and that its reliability can be enhanced with minor changes in protocol.


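The reliability of the MMI when first introduced in 2009 was 0.58, lower than reported in other studies. The applicant pool was relatively homogeneous because students were first screened on primary and secondary application information showing a strong commitment to disadvantaged populations. This homogeneity and the small sample size may have reduced variability.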

Reliability of the first MMI implementation in 2009 was 0.58—lower than reported elsewhere. Our interviewees were a relatively homogenous group of applicants because initial screening considered primary and secondary application information that demonstrated a strong commitment to disadvantaged populations. This homogeneity and the smaller sample size may have resulted in comparatively less variability among the interviewees and could have suppressed the reliability of the overall MMI assessment as estimated by the generalizability coefficient.
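To make the link between applicant variability and the generalizability coefficient concrete, the sketch below estimates a relative G coefficient for a simplified design. It assumes a fully crossed applicants-by-stations table with one rating per cell, which may differ from the design the authors actually analyzed; the function name `g_coefficient` and the toy data are hypothetical. The coefficient is the applicant variance divided by the applicant variance plus the residual variance averaged over stations, so a more homogeneous pool (smaller applicant variance) pulls it down.

```python
def g_coefficient(scores):
    """Relative G coefficient for a crossed applicants x stations table.

    `scores[p][s]` is the rating of applicant p at station s (one rating per cell).
    Assumed one-facet design; variance components come from ANOVA mean squares.
    """
    n_p, n_s = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_s)
    person_means = [sum(row) / n_s for row in scores]
    station_means = [sum(row[s] for row in scores) / n_p for s in range(n_s)]

    # Sums of squares for applicants, stations, and the residual interaction.
    ss_person = n_s * sum((m - grand) ** 2 for m in person_means)
    ss_station = n_p * sum((m - grand) ** 2 for m in station_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_person - ss_station

    ms_person = ss_person / (n_p - 1)
    ms_residual = ss_residual / ((n_p - 1) * (n_s - 1))

    # Negative variance estimates are clipped to zero, a common convention.
    var_person = max((ms_person - ms_residual) / n_s, 0.0)
    var_residual = max(ms_residual, 0.0)
    denominator = var_person + var_residual / n_s
    return var_person / denominator if denominator > 0 else 0.0


# Toy data (invented): three applicants rated at three stations.
toy_scores = [[2, 4, 3], [5, 6, 7], [1, 2, 1]]
print(round(g_coefficient(toy_scores), 3))
```

In this one-facet case the result coincides with Cronbach’s alpha computed over stations, which is one reason the simplified design is a convenient illustration.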


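In 2010 we made several changes that appear to have contributed to reliability. One was replacing an easy station with a harder one: to help discriminate between applicants, stations must maintain an appropriate level of difficulty. Item response theory (IRT) suggests that items of medium difficulty discriminate best.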

We made a few changes in the 2010 implementation of the MMI process that, all taken together, seemed to have contributed to a substantial improvement of the reliability. One such change was the replacement of a seemingly “easy” station (determined at face value) with a more challenging one. To facilitate discrimination between applicants, the stations must have an optimal level of difficulty. Item response theory suggests that items of median difficulty best discriminate between groups with either high or low magnitude of a latent trait.28
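The item response theory argument can be illustrated with the two-parameter logistic (2PL) model, in which an item is most informative when its difficulty matches the trait level of the person being assessed. This is an analogy only: the 2PL model describes dichotomous items, whereas MMI stations are rated on a seven-point scale, and the parameter values below are invented for illustration.

```python
import math


def p_success(theta, a, b):
    """2PL probability that a person with trait level theta succeeds on the item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta: a^2 * P * (1 - P)."""
    p = p_success(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical items, both with discrimination a = 1, evaluated at theta = 0.
matched_item = item_information(theta=0.0, a=1.0, b=0.0)  # difficulty matches the applicant
easy_item = item_information(theta=0.0, a=1.0, b=-3.0)    # far too easy for the applicant
print(matched_item, easy_item)
```

The matched item reaches the maximum information for its discrimination, while the very easy item contributes only a small fraction of that, which parallels the observation that interviewers could hardly differentiate applicants at the easy station.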


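Indeed, our results show that the easy station simply “added noise to the signal.” When we reanalyzed reliability with the easy station excluded, reliability increased, contrary to the usual expectation that removing an assessment point lowers reliability.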

And, indeed, our analysis showed that an easy station simply “added noise to the signal.” When we recalculated the reliability excluding Station 9, the reliability improved; it did not decrease, as one would expect when taking away one assessment point.
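The expectation that removing a station lowers reliability can be made explicit with the Spearman-Brown prophecy formula, which assumes all stations are equally informative. The sketch below is illustrative only; the function name `spearman_brown` is hypothetical, and the 2009 coefficient of 0.58 is used merely as a starting value, since the text does not say which coefficient the recalculation began from.

```python
def spearman_brown(reliability, length_ratio):
    """Projected reliability when test length is multiplied by `length_ratio`.

    Assumes parallel (equally informative) stations, which an overly easy
    station would violate.
    """
    return (length_ratio * reliability) / (1.0 + (length_ratio - 1.0) * reliability)


# Going from 12 to 11 equally good stations, starting from 0.58.
projected = spearman_brown(0.58, 11 / 12)
print(round(projected, 3))
```

Under the parallel-stations assumption the projected value falls slightly below 0.58, so an increase after dropping Station 9 indicates that the station was contributing mostly error variance.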


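In the 2010 study, raters appeared better able to use the full rating scale once the scoring anchors were changed to bottom 15%, bottom 30%, middle 50%, and so on. With this scoring method we encouraged rank-ordering of applicants. Interviewers could adjust their scores after seeing 13 applicants, as was also the case in 2009.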

In our 2010 study, the interviewers seemed better able to use the full range of the rating scale after we changed its anchors to “bottom 15%,” “bottom 30%,” “middle 50%,” etc., and asked interviewers to rate an applicant’s performance relative to the pool of all applicants. Thus, we encouraged rank-ordering of candidates with a more normative approach of scoring. Interviewers could adjust their scoring after having seen a cohort of 13 applicants (and this was allowed in the 2009 study as well).



Implementing the MMI is feasible, but it is still a demanding task.

We found that implementing MMIs was feasible but a daunting task nonetheless.


 

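The MMI requires extensive human resources and considerable preparation (securing space, identifying appropriate interview questions, interviewer training, etc.). This cost, however, is offset by the fewer hours each rater needs to assess the applicant pool. The time saving is even greater once the time interviewers spend writing reports and attending committee meetings is taken into account.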

Clearly, the MMI requires extensive human resources. In a recent cost-efficiency analysis, Rosenfeld et al29 found that the MMI requires more upfront preparation (securing space, identifying appropriate interview questions, interviewer training, etc.) compared with the traditional interview process. This cost, however, was offset by considerably fewer hours required of each person to assess a pool of applicants. We would note that the time saving is even more considerable if the time spent by interviewers in writing reports and attending committee meetings in which applicants are discussed is taken into account.



Limitations: validity was not assessed.

Our study has several limitations. First, we did not assess the validity of the MMI process even though one could argue that blueprinting the MMI stations based on our Delphi study provided an acceptable level of content validity.


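Research in this area is constrained by the widely used but still poorly defined term “noncognitive characteristics.” As Norman pointed out, “noncognitive skills” refers to characteristics that MCAT scores and GPA do not reflect, including tacit knowledge, communication skills, emotional intelligence, and stable personality traits. Admissions committees should explicitly define which of these qualities matter for a medical career and subsequent practice, in line with the institution’s philosophy and goals.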

Research in this area is hampered by the ubiquitous but ill-defined term “noncognitive characteristics.” As Norman32 pointed out, the umbrella term “noncognitive skills” is used to describe those characteristics that MCAT score or GPA do not reflect, such as tacit knowledge, communication skills, emotional intelligence, and stable personality traits. We feel that admissions committees must explicitly define those qualities they deem essential for a successful medical school career and subsequent practice and that are in concordance with the institution’s philosophy and goals.




 




 

 



1 Crites GE, Ebert JR, Schuster RJ. Beyond the dual degree: Development of a five-year program in leadership for medical undergraduates. Acad Med. 2008;83:52–58. http://journals.lww.com/academicmedicine/Fulltext/2008/01000/Beyond_the_Dual_Degree__Development_of_a_Five_Year.8.aspx. Accessed April 28, 2011.



26 Crossley J, Russell J, Jolly B, et al. ‘I’m pickin’ up good regressions’: The governance of generalisability analyses. Med Educ. 2007;41:926–934.



34 Ko M, Edelstein RA, Heslin KC, et al. Impact of the University of California, Los Angeles/Charles R. Drew University Medical Education Program on medical students’ intentions to practice in underserved areas. Acad Med. 2005;80:803–808. http://journals.lww.com/academicmedicine/Fulltext/2005/09000/Impact_of_the_University_of_California,_Los.4.aspx. Accessed April 28, 2011.








Acad Med. 2011 Aug;86(8):1032-9. doi: 10.1097/ACM.0b013e3182223ab7.

Enhancing the reliability of the multiple mini-interview for selecting prospective health care leaders.

Author information

  • 1Center for Educational Development and Research, David Geffen School of Medicine, University of California, Los Angeles, USA. bas@mednet.ucla.edu

Abstract

PURPOSE:

The David Geffen School of Medicine at UCLA Program in Medical Education (UCLA-PRIME) used a 12-station multiple mini-interview (MMI) circuit to assess applicants. The authors sought to determine the reliability of the MMI, potential bias in scores, and the degree of acceptance by interviewers and applicants.

METHOD:

In 2009, 28 interviewers interviewed a cohort of 76 applicants. An anonymous survey assessed interviewers' and applicants' satisfaction with the MMI process and perceived bias. Psychometric properties were determined with generalizability and decision theory. The process was repeated the following year with a new cohort of 78 applicants and minor modifications aimed at improving reliability.

RESULTS:

The MMI format was well received by both applicants and interviewers. No bias based on gender or disadvantaged status was found. The preliminary reliability of the MMI in 2009 was 0.58, lower than reported in previous studies, but improved in 2010 to 0.71 after an easy station was replaced with a more challenging one and a new scoring rubric was introduced.

CONCLUSIONS:

This interview technique proved to be reliable and was seen as transparent, uniform, and fair. The predictive validity of this process remains to be determined.

PMID: 21694560 [PubMed - indexed for MEDLINE]

