The Relationship between Interviewers' Characteristics and Ratings in the MMI (Acad Med, 2004)

The Relationship between Interviewers’ Characteristics and Ratings Assigned during a Multiple Mini-Interview

Kevin W. Eva, PhD, Harold I. Reiter, MD, MSc, Jack Rosenfeld, PhD, and Geoffrey R. Norman, PhD






The MMI yields a reliable estimate of candidates' performance, but attention must be paid to the biases that can arise from the different vantage points of heterogeneous raters.

This Multiple Mini-Interview (MMI) has been shown to provide a reliable estimate of candidates’ performance,1 but the new protocol demands that attention be paid to the biases that might arise as a result of the different vantage points held by heterogeneous raters.



Background


The problem is content specificity. In selection decisions, as Albanese et al. point out, "one is most interested in stable qualities that have a high probability of occurrence in an almost infinite number of different situations." Whether such "stable qualities" exist is debated, but it is becoming clear across contexts that average performance over many situations generalizes better to an individual's qualities than performance in any single situation.

The problem is one of content specificity. In making selection decisions, as indicated by Albanese et al. “one is most interested in stable qualities that have a high probability of occurrence in an almost infinite number of different situations.”2,p.317 Although debate exists regarding whether such “stable qualities” exist, it has become clear in various contexts that the average performance an individual displays over the course of many encounters is a more generalizable indication of that individual’s qualities than is any single encounter.5


MMI

The Multiple Mini-Interview


The MMI is essentially an OSCE for admissions, but the authors changed the name because the judgments are not objective and the stations are intentionally nonclinical.

Although essentially an admissions OSCE, we have opted to change the name of the protocol to make explicit the facts that the judgments are not objective and the stations are intentionally nonclinical.


This process is shaped by the educational philosophy of the institution in which the admissions committee works and, more broadly, by documents that describe the core competencies of practicing physicians. Reiter and Eva have developed a procedure for doing so.

This process should be informed by the educational philosophy adopted by the institution in which the admissions committee works as well as broader documents that outline the key competencies of practicing physicians.6,7 A process for doing so has been developed by Reiter and Eva.8


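Previous research shows that the MMI allows a reliable assessment of candidates' abilities, that overall test reliability improves more by increasing the number of stations than by increasing the number of raters per station, and that both candidates and examiners view it positively. The remaining question is whether faculty and non-faculty raters rate differently. At McMaster, heterogeneity has always been a fundamental principle, on the belief that breadth of experience across students enriches the scholastic experience. To maximize heterogeneity among students, interviewers have been drawn from varied populations, including faculty members, medical students, and community members. Because a single interviewer is assigned to each station, it matters whether the ratings of faculty and community members are consistent with each other.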

Previous research has shown that the MMI provides a reliable assessment of candidates’ abilities, that the overall test reliability improves to a greater extent by maximizing the number of stations rather than by maximizing the number of observers per station, and that the MMI is viewed positively by both candidates and examiners alike.1 Remaining unanswered, however, is the question of whether faculty members and nonfaculty members are distinguishable by their ratings. At McMaster, heterogeneity has always been a fundamental principle because it is believed that breadth of experiences across students enriches the scholastic experience.9 To try to maximize heterogeneity across students, interviewers have traditionally been drawn from various populations, including faculty members, medical students, and individuals from the community at large. As we propose assigning a single interviewer to each station, the question of whether faculty members and individuals from the community assign performance ratings consistent with one another becomes an increasingly important question.
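The claim that reliability gains more from additional stations than from additional observers per station can be illustrated with a small decision-study style calculation. The sketch below assumes a design with raters nested within stations and uses made-up variance components; the function name `g_coefficient` and all numbers are assumptions for illustration, not values from the paper.

```python
def g_coefficient(var_candidate, var_cand_station, var_residual,
                  n_stations, n_raters):
    """Relative G coefficient when raters are nested within stations."""
    # Error variance shrinks as scores are averaged over stations and raters.
    relative_error = (var_cand_station / n_stations
                      + var_residual / (n_stations * n_raters))
    return var_candidate / (var_candidate + relative_error)


# Hypothetical variance components, chosen only for illustration.
VAR_CANDIDATE = 0.4     # stable differences between candidates
VAR_CAND_STATION = 0.8  # candidate-by-station interaction (context specificity)
VAR_RESIDUAL = 0.6      # rater-within-station effects plus residual error

# Baseline: 6 stations with 1 rater each.
baseline = g_coefficient(VAR_CANDIDATE, VAR_CAND_STATION, VAR_RESIDUAL, 6, 1)
# Both alternatives below spend the same 12 rater-encounters per candidate.
more_stations = g_coefficient(VAR_CANDIDATE, VAR_CAND_STATION, VAR_RESIDUAL, 12, 1)
more_raters = g_coefficient(VAR_CANDIDATE, VAR_CAND_STATION, VAR_RESIDUAL, 6, 2)

print(f"6 stations x 1 rater : {baseline:.2f}")
print(f"12 stations x 1 rater: {more_stations:.2f}")
print(f"6 stations x 2 raters: {more_raters:.2f}")
```

Whenever the candidate-by-station component is nonzero, doubling stations reduces both error terms while doubling raters reduces only the residual term, so the extra examiners are better spent on new stations.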



METHOD


Participants


In addition, 18 health sciences faculty members and 18 community members drawn from the legal profession and human resource departments of both local businesses and the university were recruited to act as examiners. In two instances, faculty members had to withdraw; they were replaced with current medical students.


Procedure


On the study weekend, three sessions were run sequentially on each of two days with a 40-minute break for the examiners between sessions. Two examiners were assigned to each station. 

    • Faculty only (3 stations): Three of the nine stations were staffed by two faculty members, 
    • Community members only (3 stations): three by two community members, and 
    • One faculty member and one community member (3 stations): three by one member of each group. 

Before the first MMI on each day the authors of this article met with the examiners to ensure that the procedure was clear, to answer any last-minute queries, and to reinforce that the ratings should be assigned independently.



RESULTS


Scores

Internal consistency was high, so only the "overall performance" score was used in later analyses.

Table 1 shows the average score and standard deviation assigned to candidates for each of the four items on the evaluation form. The internal consistency (i.e., the average relationship between pairs of questions) was found to equal .96, indicating a high degree of redundancy. As a result, only the “overall performance” score was used in subsequent analyses.
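One way to read "the average relationship between pairs of questions" is as the mean Pearson correlation across all pairs of items on the form. The sketch below takes that reading as an assumption (the notes do not say which statistic produced .96), and the four item score lists are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def mean_inter_item_correlation(items):
    """Average Pearson correlation over every pair of items."""
    pairs = list(combinations(items, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical ratings of six candidates on a four-item form (not study data).
item_scores = [
    [3, 5, 4, 6, 2, 7],
    [3, 5, 5, 6, 2, 7],
    [4, 5, 4, 6, 3, 7],
    [3, 6, 4, 6, 2, 7],
]

avg_r = mean_inter_item_correlation(item_scores)
print(f"mean inter-item correlation: {avg_r:.2f}")
```

When the items move together this closely, each one carries nearly the same information, which is the redundancy that justifies analysing a single overall score.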



To determine whether the ratings faculty members assigned were biased relative to those community members assigned, a repeated measures ANOVA was performed on the data collected within the three stations that were staffed by both a community and a faculty member. The mean score assigned by faculty members (4.66) bordered on being significantly less than that assigned by community members (4.96; F(1,53) = 3.972, mean squared error = 1.790, p < .06).
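With only two rater groups, a repeated measures ANOVA on paired scores is equivalent to a paired t test, with F equal to t squared on (1, n - 1) degrees of freedom. The sketch below shows that calculation on invented scores; the helper name `paired_f` and the data are assumptions, and the exact model the authors fitted is not described in these notes.

```python
from math import sqrt
from statistics import mean, stdev


def paired_f(scores_a, scores_b):
    """F statistic and degrees of freedom for a two-level repeated measures design."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Paired t statistic: mean difference over its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    # With two levels, the repeated measures F equals t squared.
    return t * t, (1, n - 1)


# Hypothetical per-candidate scores from each rater group (not study data).
faculty = [4, 5, 3, 6, 5, 4]
community = [5, 5, 4, 6, 6, 5]

f_stat, df = paired_f(faculty, community)
print(f"F{df} = {f_stat:.2f}")
```

The denominator degrees of freedom are one less than the number of paired observations, which is consistent with the reported F(1,53) if each of the 54 candidates contributes one faculty score and one community score.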




Reliability Analysis



The Relationship between Interviewers’ Characteristics and Ratings


Generalizability was highest, at .58, for stations staffed by two community members. It was .46 for stations staffed by two faculty members and .31 for stations staffed by one faculty member and one community member. Each pairwise difference was statistically significant.

The generalizability for the three stations that were staffed by two community members was highest at .58. The three stations that were staffed by two faculty members revealed the second highest generalizability = .46. Least reliable were the three stations that were staffed by one member of each group (generalizability = .31). Each pairwise difference is statistically significant: .58 versus .46, z(106) = 2.78, p < .05; .46 versus .31, z(106) = 3.12, p < .05; .58 versus .31, z(106) = 5.90, p < .05.


In either case, the generalizability of the MMI was lowest at stations staffed by one member of each group, which suggests larger inconsistencies between the two groups than within either group.

In either case, the generalizability of the MMI appears to be lowest among stations evaluated by one community member and one faculty member, suggesting that there are larger inconsistencies in the way that community members rate candidates relative to the way that faculty members rate candidates than there are within either group of raters.



Post-MMI Surveys








DISCUSSION


The findings show that demonstrating interrater reliability is not, on its own, sufficient evidence that an interview measures applicant characteristics in a stable and generalizable way. Instead, they suggest that applicants vary considerably and unpredictably from one interview to the next, so the result of one interview barely predicts the result of another.

These findings suggest that the demonstration of adequate interrater reliability, which has been used in the past as an argument for standardized interviews, is insufficient evidence to ensure that an interview is measuring stable and generalizable applicant characteristics. By contrast, the findings suggest that applicants will vary considerably, in unpredictable fashion, from one interview to another. Consequently, the scores derived from any one interview will be a poor predictor of performance in a second interview.


At the very least, these results support Ferrier et al.'s claim that more heterogeneous raters may produce a more heterogeneous class. The difference in mean scores between faculty and community raters may be overcome with more rater training, but the absolute difference in scores will not matter as long as every circuit has an equal proportion of examiners from each group.

At the very least these results support Ferrier et al.’s9 claim that using heterogeneous raters may result in a more heterogeneous class. The difference we observed in the mean scores faculty and community raters provide may be overcome with further training, but the absolute difference in scores will not matter as long as all circuits contain an equal proportion of examiners from each group. It should be noted that the distinction drawn in this study between raters of different backgrounds is very broad.


Another advantage of the MMI is that it can achieve the four purposes of admissions interviews identified by Edwards et al. (information gathering, decision making, verification, and recruitment) without confounding them within a single interview. It also overcomes the inefficient use of time noted in traditional interviews.

Additional advantages to the MMI include the potential to achieve the four purposes of admissions interviews identified by Edwards et al.4 (i.e., information gathering, decision making, verification, and recruitment) without confounding these purposes within a single interview (e.g., one station could be designed as a recruitment station without the goal of attracting the best candidates affecting the rest of the interview process). The MMI also corrects for the inefficient use of time that has been identified by Litton-Hawes et al.12 as a problem in more traditional interviews.


The chance effect of being randomly assigned to a "hard" or "easy" interviewer is diluted when candidates are rated by a larger number of examiners.

Similarly, any chance effects of being randomly assigned to an “easy” or “hard” panel of interviewers will be diluted with the MMI as candidates are exposed to a greater number of examiners.


Why are community members' ratings less consistent with faculty members' ratings than ratings within either group are with each other?

Of further interest is the finding that community members’ ratings were less consistent with those provided by faculty members than were the ratings provided within either group.




8. Reiter HI, Eva KW. Reflecting the relative values of community, faculty, and students in the admissions tools of medical school. Submitted manuscript.


Background: In defining the characteristics of medical students that society and the medical profession find desirable, little effort has been spent assessing the relative value of the dozens of characteristics that have been identified. Furthermore, many institutions go to great lengths to ensure equal representation across stakeholder groups in an effort to maximize the heterogeneity of the pool of students accepted to study medicine; however, the extent to which different stakeholders value different characteristics has yet to be determined. 


Purpose: This study was an attempt to assess the relative value of the characteristics of medical students that society and the medical profession find desirable. 


Methods: Using documents created internationally to identify the core competencies of medical personnel, a series of 7 characteristics were generated for inclusion in a study that adopted the paired comparison technique. Of 347 surveyed, 292 respondents indicated the rank ordering they would assign to each characteristic by circling the more important characteristic in all possible pairings. 


Results: Overwhelmingly, “ethical” was deemed to be the most important characteristic on which selection tools should be based. Surprisingly, the pattern of responses was highly consistent regardless of stakeholder group and degree of affiliation with the undergraduate medical program. 


Conclusions: The generalizable features of this study not only include the empirical findings but also demonstrate useful survey protocol that can be adapted by any admission committee to guide the generation of an institution-specific admissions blueprint. A novel protocol that provides the necessary flexibility is discussed.














 2004 Jun;79(6):602-9.

The relationship between interviewers' characteristics and ratings assigned during a multiple mini-interview.

Author information

  • 1Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada. evakw@mcmaster.ca

Abstract

PURPOSE:

To assess the consistency of ratings assigned by health sciences faculty members relative to community members during an innovative admissions protocol called the Multiple Mini-Interview (MMI).

METHOD:

A nine-station MMI was created and 54 candidates to an undergraduate MD program participated in the exercise in Spring 2003. Three stations were staffed with a pair of faculty members, three with a pair of community members, and three with one member of each group. Raters completed a four-item evaluation form. All participants completed post-MMI questionnaires. Generalizability Theory was used to examine the consistency of the ratings provided within each of these three subgroups.

RESULTS:

The overall test reliability was found to be .78 and a Decision Study suggested that admissions committees should distribute their resources by increasing the number of interviews to which candidates are exposed rather than increasing the number of interviewers within each interview. Divergence of ratings was greater within the pairing of community member to faculty member and least for pairings of community members. Participants responded positively to the MMI.

CONCLUSION:

The MMI provides a reliable protocol for assessing the personal qualities of candidates by accounting for context specificity with a multiple sampling approach. Increasing the heterogeneity of interviewers may increase the heterogeneity of the accepted group of candidates. Further work will determine the extent to which different groups of raters provide equally valid (albeit different) judgments.

PMID: 15165983 [PubMed - indexed for MEDLINE]

