An admissions OSCE: the MMI (Med Educ, 2004)
An admissions OSCE: the multiple mini-interview
Kevin W Eva, Jack Rosenfeld, Harold I Reiter & Geoffrey R Norman
Because many North American medical schools have very low attrition rates, it is often said that admission is the most important evaluation a medical school conducts.
Because many medical schools, particularly in North America, have very low rates of attrition, one could argue that the admissions procedure is the most important evaluation exercise conducted by a school.
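Typically, some form of interview has been used. By the early 1980s, 99% of US medical schools used interviews in their admissions process, as did 81% of physiotherapy programmes and 63% of occupational therapy programmes. More recent results show little change in these proportions: Nayer reported that 99% of medical schools and 83% of physiotherapy programmes use interviews.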
Typically, some form of interview is used; by the early 1980s, 99% of medical programmes in the USA were found to use the interview as part of the admissions process,2 as were 81% of physiotherapy programmes and 63% of occupational therapy programmes.3 A more recent survey suggests there has been little change in these proportions; Nayer reported that 99% of US medical schools and 83% of US physiotherapy programmes use interviews.4
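The face validity of the interview has remained high, but evidence of its effectiveness is equivocal. Interrater reliability varies widely, from 0.14 to 0.95, and this inconsistency appears to depend on how the interviews are administered. Structured interviews show higher reliability and validity than unstructured ones.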
While the face validity of the interview remains strong, evidence of its effectiveness is more equivocal. Interrater reliability estimates vary widely, from 0.14 to 0.95, but this inconsistency might largely be an effect of variability in the way in which interviews are administered;5 structured formats (i.e. standardised questions with sample answers provided to interviewers) tend to yield higher rates of reliability and validity than do unstructured formats.6,7
However, even these reliability estimates may be artificially inflated by:
1. The interview team has access to the candidates' academic information.
2. Non-verbal communication among the interviewers (even if unintentional).
1 the interview team having access to academic information on candidates,8,9 and
2 non-verbal communication (which is, admittedly, often unintentional) between members of the interviewing team.
As a result, despite acceptable interrater reliability in some cases, a candidate’s score may still be attributable, in large part, to chance. A lucky candidate who is randomly assigned to a like-minded, easy interviewer who influences the rest of the interview panel will score highly, whereas an identical, but less fortunate candidate who is randomly assigned to an incompatible, hard interviewer who influences the rest of the interview panel will score poorly.10
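Other biases shown to affect the interview are the interviewers' backgrounds and expectations. In fact, Harasym et al. found that differences among interviewers accounted for 56% of the total variance. Biases this strong are unacceptable, and unethical, in an interview meant to assess the characteristics of the candidate rather than the interviewer.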
Other biases that have been shown to impinge upon the personal interview include both the interviewers’ backgrounds6,8,11 and the interviewers’ expectations.6,12 In fact, Harasym et al. found that interviewer variability accounts for 56% of the total variance in interview ratings.12 Such strong biases are unacceptable (and unethical) for an assessment tool that is intended to examine the characteristics of the candidate, not the interviewers.
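However, interviewers are not the only factor limiting the generalisability of interview scores. Many of the interview's problems may be explained, at least in part, by the interview being yet another domain affected by context specificity. Decades of research show that our cognitive skills depend heavily on context. In other words, our performance is determined less by 'trait' (characteristics that are stable within an individual) than our intuition suggests, and more by 'state' (the context in which the performance takes place).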
However, it is not simply interviewer bias that limits the generalisability of interview scores. Many of the problems with the personal interview might be explained, at least in part, by the possibility that the personal interview is yet another domain that is plagued by context specificity.13 Decades of research have indicated that many of our cognitive skills are highly dependent on context.14,15 In other words, our performance is commonly less determined by trait (the stable characteristics of the individual) than our intuitions suggest, and more determined by the state (the context within which the performance is elicited).
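For example, an individual's ability to solve problems or communicate effectively when discussing the impact of the magnetic compass on the modern world will not reliably predict that individual's ability to do the same when discussing the harmful effects of monopolies on the world economy.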
For example, an individual’s ability to problem solve or communicate effectively when discussing the impact of the magnetic compass on the modern world will not predict with great certainty that individual’s ability to problem solve or communicate effectively when discussing the detrimental effect of monopolies on the world’s economy.16
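Consistent with this, Turnbull et al. reported that interrater reliability in the oral interview examinations of the Royal College of Physicians and Surgeons of Canada was high, yet generalisability across interview sessions was low, which lowered the overall reliability of the test. As a result, a single interview may not reveal a candidate's true, generalisable abilities, even when interrater reliability is improved by standardising the questions and training the interviewers.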
Consistent with this possibility, Turnbull et al. showed that, although interrater reliability within the oral interview certification examinations used by the Royal College of Physicians and Surgeons of Canada was high, the generalisability across interview sessions was low, thereby lowering the overall test reliability.17 As a result, a single interview may not provide an accurate, generalisable portrayal of a candidate’s true abilities even though interrater reliability may be improved by standardising the questions asked and training the interviewers. Multiple topics might be raised within an interview, but this may still represent a small sample of possible responses by the candidate and an interviewer’s impressions of each response may not be independent of one another.
A paper on the development of the MMI
The current paper will first outline the development of an innovative admissions protocol, the multiple mini-interview (MMI), which is intended to take advantage of this lesson in the context of student admissions. Second, it will report results from 2 studies of this protocol performed at McMaster University. In testing this innovation, it was necessary to make many decisions based solely on educated intuition. As a result, we make no claims at this point regarding the optimal use of the MMI, but instead present our logic and reasoning with the hope that some of our assumptions and expectations will be further tested in the future.
Multiple mini-interview (MMI)
THE MULTIPLE MINI-INTERVIEW
First and foremost, it should be noted that the term OSCE has been used in the title of this article simply to orient the reader to the protocol that has been developed for the MMI. Like the OSCE, the MMI is intended to consist of a large number of short stations, each with a different examiner. The MMI is not, however, objective. Nor is it clinical. Research on both the clinical reasoning exercise19,20 and the OSCE21,22 has shown that subjective ratings can be reliable and valid estimates of an individual’s abilities. As a result, we do not view the subjective nature of the interview process itself to be a limiting feature of this admissions tool. Furthermore, we have carefully avoided developing stations that require clinical knowledge in an effort to prevent biasing the process in favour of health sciences students/personnel.
Rather, the MMI is an OSCE-style exercise consisting of multiple, focused encounters. It is intended to assess many of the cognitive and non-cognitive skills that are currently assessed (inadequately) by the personal interview. Its specific advantage is that multiple interviews should dilute the effect of chance and of interviewer/situational biases. Unlike traditional interviews, we can ensure that the ratings assigned to the multiple points of discussion are given independently, because interviewers engage the applicants in separate rooms.
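The claim that multiple independent interviews dilute chance and interviewer bias can be illustrated with a small simulation. This is a sketch under assumed conditions, not an analysis from the paper. Each rating is modelled as the candidate's true ability plus a random interviewer leniency plus noise, all normally distributed with made-up standard deviations, and each station draws a different interviewer.

```python
import random
import statistics


def simulate_error_sd(n_stations: int, n_candidates: int = 2000,
                      interviewer_sd: float = 1.0, noise_sd: float = 1.0,
                      seed: int = 0) -> float:
    """SD of (mean rating - true ability) across simulated candidates.

    Toy model (assumed, not from the paper): every rating equals the
    candidate's true ability plus the leniency of that station's interviewer
    plus residual noise. Each station draws its own independent interviewer.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_candidates):
        ability = rng.gauss(0.0, 1.0)
        ratings = [
            ability + rng.gauss(0.0, interviewer_sd) + rng.gauss(0.0, noise_sd)
            for _ in range(n_stations)
        ]
        errors.append(statistics.fmean(ratings) - ability)
    return statistics.pstdev(errors)


single = simulate_error_sd(1)   # one interview with one (possibly "hard") interviewer
ten = simulate_error_sd(10)     # ten stations with ten independent interviewers
print(f"1 interviewer:   error SD = {single:.2f}")
print(f"10 interviewers: error SD = {ten:.2f}")
```

With these invented settings, the error shrinks roughly by a factor of 1/√n, so ten independent interviewers cut it to about a third. The shrinkage depends on interviewer effects being independent across stations, which the use of separate rooms is meant to ensure.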
While the term interview has been maintained, one of the intended benefits of this protocol is the flexibility with which stations can be developed. For any given station, the examiner might be an interviewer or an observer.
- Direct discussion with the interviewer: As an example, a station on ethical decision making, such as station 1 (see Appendix), can consist of a discussion between candidate and interviewer. Obviously some part of the rating assigned by the interviewer will be influenced by the candidate’s ability to communicate effectively, but stations that are intended to tap into communication skills more directly can also be developed.
- Examiner observes the candidate with a simulated patient: For example, communication skills stations might consist of interviews conducted with a simulated patient while the examiner acts as an observer. Station 3 (see Appendix) is one such station. The candidate is told s/he has to pick up a colleague to fly to a conference, only to discover upon entering the room that the colleague has developed a fear of flying as a result of the September 11th tragedy. The observer rates the candidate on the communication skills and empathy observed during the interaction between the candidate and the colleague.
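This flexibility in station design lowers the chance that candidates can prepare or rehearse answers to specific questions. Instead of traditional questions (e.g. Why do you want to become a doctor?), candidates must respond spontaneously to the situation presented. Candidates will no doubt still prepare and rehearse answers, but if a sufficiently large database of stations is developed, it becomes harder to predict which questions will be asked.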
This flexibility in station development reduces the likelihood that candidates will benefit from preparing and rehearsing responses to specific questions. Instead of answering the usual historical questions (e.g. Why do you want to become a doctor?), candidates must respond spontaneously to the presented situation. Undoubtedly, candidates will still prepare and rehearse responses, but it will be more difficult to predict the types of questions one will be asked if a database of stations is developed to sufficient size.
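If a programme wants to explore applicants' life histories, it can use traditional interview stations in which candidates talk about whatever experiences, challenges or beliefs they want the admissions committee to recognise. Similarly, if a programme wants to use the interview partly for recruitment, a station can be included for that purpose without substantially compromising the rest of the interview process.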
If a programme does desire to query applicants regarding their life history, traditional interview stations can be used in which the interviewer allows the candidate to discuss whatever personal experiences, challenges or beliefs s/he would like the admissions committee to recognise. Similarly, if a programme desires to use the interview, in part, as a recruitment exercise, then a station can be assigned for this purpose without fear of impinging upon the rest of the interview process.
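For the remaining stations, specific interview topics could be drawn from any subject, from art history to zoology. In fact, a secondary benefit of this approach is that interviewers can be recruited from diverse academic and community areas. We selected 4 domains that, while not comprehensive, are considered essential for a career in the health sciences.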
For the remaining stations, specific interview topics can potentially be drawn from any subject ranging from art history to zoology. In fact, an anticipated secondary advantage of this new protocol lies in its potential to draw interviewers from diverse academic and community areas and allow them to assess topics that are consistent with their domain of expertise. We opted to focus our test stations on 4 domains that are not considered to be comprehensive, but are considered to be vital for a career in the health sciences:
1 critical thinking;
2 ethical decision making;
3 communication skills, and
4 knowledge of the health care system.
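To judge the suitability of potential stations, we decided that candidates should not be expected to have specialised knowledge. For example, they should not be required to know specific medical details. Instead, stations should assess the candidates' ability to reason logically through a given topic and to communicate their ideas effectively. In addition, we considered any question with a definitively correct answer to be inadequate. This does not mean that no answer is better than another, but that interviewers should not be searching for a specific 'phrase' or 'opinion'.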
To assess the suitability of potential stations, we decided that candidates should not be expected to possess specialised knowledge. For example, they should not be expected to know details of a medical condition. Rather, stations should be developed in such a way that they allow candidates to display an ability to think logically through a topic and communicate their ideas effectively. In addition, as a simple heuristic, we viewed any question that had a definitively correct answer to be inadequate. That is not to say that some answers are not better than others, but rather that the interviewers should not be searching for a specific catch phrase or a specific opinion.
Experiment 1: pilot study with graduate students
EXPERIMENT 1: PILOT STUDY WITH GRADUATE STUDENT PARTICIPANTS
Interview format: as in an OSCE, a separate room was used for each station.
As in an OSCE, separate rooms were used for each station. Posted to each door was a card with the ‘Instructions to Applicants’, as shown in the Appendix. In addition, as this was not intended to be a memory task, the same information was included on a card inside the interview room so that the candidate could refer back to it if s/he desired to do so. Each station lasted 8 minutes and was followed by a 2-minute interval during which interviewers completed standardised evaluation forms and candidates prepared for the subsequent station. The evaluation forms asked interviewers to rate each of the candidates using 7-point scales on:
1 communication skills;
2 strength of the arguments raised;
3 suitability for the health sciences, and
4 overall performance.
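In general, increasing the number of stations has a greater effect than increasing the number of raters within a station. This supports the hypothesis that traditional interviews are undermined by context specificity.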
In general, it appears that increasing the number of stations has a greater impact on the reliability of the test than increasing the number of raters within any given station, thereby supporting the hypothesis that context specificity plagues the traditional interview.
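Why extra stations beat extra raters can be sketched with a generalisability-theory decision study. We assume candidates are crossed with raters nested within stations. The variance components below are invented for illustration and are not the paper's estimates. When the candidate x station interaction (context specificity) is large, dividing it by more stations helps far more than adding raters, who only shrink the residual term.

```python
def g_coefficient(n_stations: int, n_raters: int,
                  var_p: float, var_ps: float, var_res: float) -> float:
    """Relative G coefficient for a candidates x (raters nested in stations) design.

    var_p   : variance due to real differences between candidates
    var_ps  : candidate x station interaction (context specificity)
    var_res : candidate x rater-within-station interaction plus residual
    """
    error = var_ps / n_stations + var_res / (n_stations * n_raters)
    return var_p / (var_p + error)


# Hypothetical variance components, chosen for illustration only.
VAR_P, VAR_PS, VAR_RES = 1.0, 2.0, 1.0

# The same budget of 4 rater slots spent in different ways, plus a 10-station MMI.
for n_stations, n_raters in [(1, 4), (2, 2), (4, 1), (10, 1)]:
    g = g_coefficient(n_stations, n_raters, VAR_P, VAR_PS, VAR_RES)
    print(f"{n_stations:>2} station(s) x {n_raters} rater(s): G = {g:.2f}")
```

Under these assumed components, 4 raters in a single station give about 0.31. The same 4 raters spread over 4 stations give about 0.57.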
Experiment 2: undergraduate MD programme applicants
EXPERIMENT 2: UNDERGRADUATE MD PROGRAMME CANDIDATES
Overview of how the interviews were run
All applicants (n = 396) who were offered an interview by McMaster University’s undergraduate medical programme were sent a letter inviting them to participate in an admissions research study. The letter stressed that their participation (or lack of participation) would in no way influence their chances of being accepted to the medical programme. It also offered candidates $40 in an attempt to make it clear that this initiative was completely separate from the regular admissions process. A total of 182 candidates responded affirmatively, of whom the first 120 whose schedules coincided with one of 12 prearranged research sessions were selected.
Three sessions run back to back on each of 4 days, with a 40-minute break between sessions.
Three sessions were run sequentially during each of the 4 interview days, with a 40-minute break for examiners between sessions. All candidates were allowed to participate only after completion of the regular admissions protocol. Three candidates backed out due to illness, resulting in a total sample size of 117; 2 of these left before completing a post-MMI survey.
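Selecting 120 candidates for 12 sessions implies about 10 candidates per session, one per station for the 10 Appendix stations used in this experiment. The excerpt does not describe how candidates moved between rooms. The sketch below assumes a standard OSCE-style circuit in which candidate c starts at station c and advances one station per round. Under that assumption, every station sees every candidate exactly once and no room is double-booked.

```python
from typing import List


def rotation_schedule(n_candidates: int, n_stations: int) -> List[List[int]]:
    """schedule[round][station] -> candidate index, or -1 if the room is empty.

    Assumed OSCE-style circuit: candidate c starts at station c and moves to
    the next station (wrapping around) after every round.
    """
    if n_candidates > n_stations:
        raise ValueError("one circuit holds at most one candidate per station")
    schedule = []
    for rnd in range(n_stations):
        row = [-1] * n_stations
        for cand in range(n_candidates):
            row[(cand + rnd) % n_stations] = cand
        schedule.append(row)
    return schedule


CANDIDATES_PER_SESSION = 120 // 12   # 120 selected candidates over 12 sessions
STATIONS = 10
schedule = rotation_schedule(CANDIDATES_PER_SESSION, STATIONS)
for rnd, row in enumerate(schedule[:3]):
    print(f"round {rnd + 1}: " + " ".join(f"S{s + 1}=C{c + 1}" for s, c in enumerate(row)))
```

Because 3 candidates withdrew, some sessions would have run with fewer than 10 candidates. In that case the function simply leaves one room empty (-1) per round rather than breaking the circuit.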
Interviewer recruitment: mostly faculty, but including 8 students and 2 Human Resources staff members
Interviewers were recruited broadly from the Faculty of Health Sciences, the students currently in the medical programme, and the community at large (including McMaster University’s Human Resources Department). From the surplus of individuals who volunteered to participate, we selected 40 (10 per day) based on their willingness to volunteer for an entire day. Evaluators were mostly drawn from the Faculty of Health Sciences, but 8 students and 2 members of the Human Resources Department also participated. The list of health sciences volunteers included representation from rehabilitation sciences, nursing, biochemistry and medicine.
Procedure
1 all 10 stations reported in the Appendix were used;
2 only 1 interviewer was assigned per station, and
3 as a result of the high correlations among the 4 evaluation questions used during the pilot study, we opted to ask evaluators to simply score ‘the applicant’s overall performance on this station’.
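With one interviewer and one overall rating per station, a candidate's MMI result reduces to combining 10 numbers. The excerpt does not say how station scores were combined. The sketch below assumes an unweighted mean on the 7-point scale used in the pilot, and the ratings are hypothetical. It also shows how a single disastrous station moves the total only modestly.

```python
from statistics import fmean
from typing import Dict


def mmi_score(station_ratings: Dict[str, int], scale_max: int = 7) -> float:
    """Average one overall rating per station into a single MMI score.

    Assumptions: ratings are on a 1..scale_max scale (the pilot used 7-point
    scales) and all stations are weighted equally.
    """
    if not station_ratings:
        raise ValueError("no station ratings supplied")
    for station, rating in station_ratings.items():
        if not 1 <= rating <= scale_max:
            raise ValueError(f"{station}: rating {rating} outside 1..{scale_max}")
    return fmean(station_ratings.values())


# Hypothetical ratings for one candidate; station 6 went badly.
ratings = {f"station_{i}": r for i, r in enumerate([5, 6, 4, 7, 5, 3, 6, 5, 4, 6], start=1)}
print(f"MMI score: {mmi_score(ratings):.2f}")
```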
Reliability analyses
The variance due to the candidate–station interaction was 5 times that due to the candidates themselves. This, too, points to context specificity.
Furthermore, the variance attributable to the candidate–station interaction was 5 times greater than that attributable to the candidates themselves, further supporting the hypothesis that context specificity negatively affects traditional interviews.
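The 5:1 ratio can be turned into a rough reliability curve. This is a simplified sketch, not the paper's analysis. With one interviewer per station, the candidate x station interaction is confounded with residual error, and here it is treated as the only error term, ignoring any other sources. Under that assumption, the reliability of an n-station mean is n / (n + 5).

```python
def g_for_stations(n_stations: int, error_to_true_ratio: float) -> float:
    """Reliability of an n-station mean when the only error is station-specific.

    With true-score variance v and candidate x station variance k*v,
    G = v / (v + k*v/n) = n / (n + k).
    """
    return n_stations / (n_stations + error_to_true_ratio)


def stations_needed(target: float, error_to_true_ratio: float,
                    max_stations: int = 200) -> int:
    """Smallest number of stations whose G reaches the target."""
    for n in range(1, max_stations + 1):
        if g_for_stations(n, error_to_true_ratio) >= target:
            return n
    raise ValueError("target not reachable within max_stations")


RATIO = 5.0  # candidate x station variance is 5 times candidate variance
print(f"1 station:   G = {g_for_stations(1, RATIO):.2f}")
print(f"10 stations: G = {g_for_stations(10, RATIO):.2f}")
print(f"stations for G >= 0.8: {stations_needed(0.8, RATIO)}")
```

In this toy model a single interview reaches only about 0.17, while 10 stations reach about 0.67. These figures follow from the ratio alone and are not coefficients reported in the study.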
Correlation with other measures
The MMI scores did not correlate highly with any of the other admissions tools currently used in McMaster’s admissions protocol. The correlations between the MMI and the existing admissions tools23 were:
personal interview, r = 0.185;
simulated tutorial, r = 0.317;
undergraduate grade, r = -0.227; and
autobiographical sketch, r = 0.170.
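One way to read "did not correlate highly" is to square each correlation, which gives the proportion of variance the MMI shares with each tool. The sketch uses only the magnitudes of the reported coefficients, since squaring discards the sign.

```python
# Magnitudes of the MMI correlations reported above; squaring discards the sign.
correlations = {
    "personal interview": 0.185,
    "simulated tutorial": 0.317,
    "undergraduate grade": 0.227,
    "autobiographical sketch": 0.170,
}

shared = {tool: r * r for tool, r in correlations.items()}
for tool, r2 in shared.items():
    print(f"{tool:<24} shared variance = {r2:.1%}")
```

Even the largest coefficient (simulated tutorial) corresponds to only about 10% shared variance. This is what the low correlations mean in practice.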
Post-MMI surveys
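In addition, candidates were asked 3 open-ended questions. Greatest benefit of the MMI? The chance to make up in one station for a poor performance in another, and a more balanced view of the candidate's skills and experiences.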
In addition, candidates were asked 3 open-ended questions. In response to the question ‘What do you believe to be the greatest benefits of using the MMI?’, many commented on the opportunity to recover from poor stations and the belief that the MMI should provide ‘a more balanced view of the applicant’s skills and experiences’. Positive comments were also recorded regarding the opportunity to maintain a dialogue with the interviewer and the ‘opportunity to solve and discuss REAL PROBLEMS [sic]’.
What should be improved? What are the weaknesses? Suggestions included chairs between stations, more time in each interview room, and a station assessing group skills.
Candidates were also asked the questions ‘Are there any improvements you would like to see made before the MMI is implemented?’ and ‘What do you believe to be the greatest weaknesses of the MMI?’ Their responses focused primarily on logistical issues, such as ‘including a chair between stations’, lengthening the time for each interview (most often to 10 minutes) and ‘allow[ing] for some discussion at the end, [to provide an] opportunity to go back to a point not adequately covered’. Some commented that the MMI ‘would allow for a shorter interview day’, but that ‘a break half way through would help’. Others noted the lack of an opportunity to reveal group skills, a domain that could potentially be built into future iterations of the MMI.
Interestingly, in contrast to the comments offered by some candidates, examiners tended to suggest that 8 minutes was more than enough time to get a sense of the candidate’s performance. In general, the most consistent comment, raised by approximately a quarter of respondents, was that the examiners would have liked more training beforehand. This could take the form of more information and a longer list of potentially relevant questions in the preparatory package received by all examiners.
There are four key issues in evaluating an assessment protocol.
There are 4 issues that need to be considered when evaluating the efficacy of any assessment protocol:
1 reliability;
2 validity;
3 feasibility, and
4 acceptability.
Reliability: within an acceptable range. The low correlations with other admissions tools reflect context specificity.
The reliability of the MMI has now been shown to be in an acceptable range (0.65–0.81) across 2 studies, using graduate student volunteers and actual applicants to the undergraduate medical programme. While adequate, this reliability might be further improved with examiner training. The low correlations among the various admissions tools, including the MMI, are consistent with the hypothesis that context specificity affects admissions protocols. This further supports the need for a tool that adopts a multiple-observation approach, analogous to that provided by OSCEs when assessing clinical competence.
Validity: a blueprint is needed. We selected 4 domains. We recommend that each school or programme build the values it considers important into its MMI stations.
The blueprinting process undertaken for the generation of stations was intended to maximise the content validity of the MMI. We selected 4 domains that are thought to represent important, non-cognitive characteristics for success in the health sciences. We advocate that specific schools, and specific programmes within those schools, that consider implementing the MMI engage in a similar process, determining the characteristics they value before creating MMI stations. This blueprinting technique might then ensure an optimal match between the curricular tenets of the programme and the characteristics of the individuals accepted into the programme.
Without feasibility and cost-effectiveness, a tool cannot be used:
That being said, even the most reliable and most valid of admissions exercises will not be useful if they do not prove to be feasible and cost-effective. In fact, the issue of cost-effectiveness ranks high among the primary assaults that have been launched against the use of personal interviews.5 For McMaster’s medical programme, approximately 400 applicants are interviewed annually, each of whom requires an hour of interview time (30 minutes for the interview and 30 minutes for scoring and a break). There are 4 people on each interview team, so each personal interview requires 4 person hours per applicant; the entire interview programme therefore requires 1600 person hours in total. Of these, 550 are typically faculty hours, the cost of which amounts to an estimated $27 500 per annum. The use of other non-cognitive tools, particularly the simulated tutorial, increases total interviewer time to about 1800 hours and faculty cost to about $32 000.
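The personal-interview figures can be checked with a few lines of arithmetic. All inputs come from the paragraph above. The per-hour faculty rate is not stated in the text, and here it is only derived by dividing the reported cost by the reported faculty hours.

```python
APPLICANTS = 400
TEAM_SIZE = 4              # interviewers per personal-interview team
HOURS_PER_APPLICANT = 1.0  # 30 min interview + 30 min scoring and break
FACULTY_HOURS = 550        # faculty share of the total, as reported
FACULTY_COST = 27_500      # dollars per annum, as reported

person_hours = APPLICANTS * TEAM_SIZE * HOURS_PER_APPLICANT
faculty_share = FACULTY_HOURS / person_hours
implied_rate = FACULTY_COST / FACULTY_HOURS  # derived here, not stated in the text

print(f"total person-hours:   {person_hours:.0f}")
print(f"faculty share:        {faculty_share:.1%}")
print(f"implied faculty rate: ${implied_rate:.0f}/hour")
```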
Lower cost, and many schools already have experience running OSCEs.
A 10-station MMI (with 10 minutes per station) could be run for only 2 person hours per candidate (including a 20-minute break for all examiners). Assuming the same ratios of faculty versus community personnel, this would require 275 faculty hours at a cost of $13 750 per annum. These values could potentially be reduced even further if it is determined that 10 minutes per station is not required or if fewer stations are used (although the disadvantage of this latter strategy will be poorer reliability). The cost will be increased slightly by the use of SPs, with the absolute value of the increase depending on the number of SP stations used. As one final note on the feasibility of implementing the MMI, most health sciences programmes have considerable experience with mounting OSCEs. This expertise can potentially be used to make the transition from a personal interview to the MMI as smooth as possible.
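The MMI projection can be reproduced the same way. The sketch reads the 2 person-hours per candidate as 100 minutes of station time plus a 20-minute break allotment; that decomposition is our reading of the text. The per-hour faculty rate is derived from the personal-interview figures rather than stated, and simulated-patient costs are left out.

```python
APPLICANTS = 400
BREAK_MINUTES = 20           # examiner break allotted per candidate (our reading)
FACULTY_SHARE = 550 / 1600   # same faculty:community ratio as the personal interview
FACULTY_RATE = 27_500 / 550  # implied dollars per faculty hour (derived, not stated)


def mmi_cost(stations: int = 10, minutes_per_station: int = 10):
    """Return (person-hours per candidate, faculty hours, faculty dollars) per year."""
    hours_per_candidate = (stations * minutes_per_station + BREAK_MINUTES) / 60
    faculty_hours = APPLICANTS * hours_per_candidate * FACULTY_SHARE
    return hours_per_candidate, faculty_hours, faculty_hours * FACULTY_RATE


per_candidate, faculty_hours, dollars = mmi_cost()
print(f"{per_candidate:.1f} person-hours per candidate, "
      f"{faculty_hours:.0f} faculty hours, ${dollars:,.0f} per annum")
```

Calling `mmi_cost(minutes_per_station=8)` or passing fewer stations shows the further savings the authors anticipate. The station-count lever trades cost against reliability.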
Acceptability: are examiners willing to take part? Some found it more tiring than a standard interview. This could be addressed by adjusting the interview length, the breaks or the protocol.
Finally, a note on the acceptability of admissions tools. As the MMI and personal interview do require more human resources than simple reliance on grades, it is important that the individuals who are asked to act as interviewers are willing to participate in the process.
Interviewers in the pilot study were most concerned about the experience being more tiring than the personal interview, because a single person is responsible for each interview. Addressing this concern might require adjusting the protocol, increasing the number or length of breaks, or changing some other aspect of the process.
An equal number of examiners reported that the process was fun.
It should be kept in mind, however, that an equal number of interviewers reported the exercise to be fun and entertaining.
Strengths of the MMI
The anticipated strengths of the MMI are 6-fold:
1 it allows multiple samples of insight into a candidate’s abilities;
2 it dilutes the effect of chance and examiner bias;
3 stations can be structured so that all candidates respond to the same questions and interviewers receive background information a priori;
4 admissions directors have a great deal of flexibility in that stations can be designed with a blueprint of the qualities they would like to select for in mind;
5 candidates can feel confident that they will be given a chance to recover from a disastrous station by moving onto a new, independent interviewer, and
6 fewer resources might be required.
Author information
- Department of Clinical Epidemiology and Biostatistics, Programme for Educational Research and Development, McMaster University, Hamilton, Ontario, Canada. evakw@mcmaster.ca
- PMID: 14996341