Assessment methods for selecting health care professionals: consensus and recommendations from the Ottawa 2010 Conference

Assessment for selection for the health care professions and specialty training: Consensus statement and recommendations from the Ottawa 2010 Conference



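In medicine and the health professions, assessment for selection should meet the same quality standards as the assessment carried out during the course after selection. The literature on selection is limited and is not strongly grounded in theory or concepts.

For written tests, there is evidence that the MCAT has predictive validity for performance in medical school and on licensing examinations.

There is also evidence for the predictive validity of GPA, particularly when it is combined with the MCAT in graduate-entry systems (as in North America). By contrast, there is little evidence on the predictive validity of school leaver scores (as used in Australia and the UK).

Several studies report good predictive validity and reliability for the MMI. Among the other measures used in selection, only personality testing, in which interest is growing, appears to warrant future work.

Widening access to medical school and the health professions is attracting growing attention in connection with the social accountability of the health professions. Traditional selection methods tend to exclude many population groups, yet there is little evidence that newer, non-traditional methods actually widen access. Preparation programmes and outreach programmes show the most promise.

In summary, the areas of consensus on assessment for selection are few. The recommendations are to apply the principles of good assessment and align methods with the curriculum, to use multi-method programmatic approaches, and to use sophisticated measurement models from an interdisciplinary perspective. To meet the social accountability mandate, social inclusion, workforce (distribution) issues and widening access should be embedded in the principles of selection.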

The same quality assurance mechanisms that apply to other high-stakes assessment should apply to selection.

By conceptualising selection as ‘assessment for selection’, the well-developed quality assurance mechanisms associated with high-stakes assessment can be applied to the selection process. These include:

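A clear blueprint for selection.

Evidence from psychometric studies and a theory base.

Congruity between selection, curriculum and assessment.

Clear standards and decision-making procedures.

A focus on the impact of selection.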

. proceeding from a clear blueprint of the content for selection;

. using evidence from psychometric studies and a theory base to inform the selection process;

. developing congruity between selection, curriculum and assessment;

. using clear standard-setting and decision-making procedures; 

. providing a focus on the impact of selection (a variant of the adage that assessment drives learning).

The current position: Written tests

The MCAT is not a perfect predictor; other variables such as diligence, motivation and communication skills need further investigation.

The study concluded that MCAT was not a perfect predictor and other variables such as ‘diligence’, ‘motivation’ and ‘communication skills’ need further investigation.

MCAT and GPA correlate with success in medical school, but they do not predict outcomes well for 'at risk' students.

MCAT and prior GPA scores were correlated with success in medical schools but did not have sufficient ability to define or differentiate the success or failure of students considered ‘at risk’.

The current position: Achievement ratings

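There is no clear evidence that science GPA is more useful than non-science GPA.

Didier et al. reported a method for adjusting GPA to account for differences between institutions.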

There is no clear evidence about the relative merits of GPA in science compared to non-science subjects. Didier et al. (2006) reported on a method to adjust GPA to equate for differences between institutions

McManus et al. found that A-level grades in UK schools predicted medical career choice, whereas the results of a general intelligence test did not.

McManus et al. (2003) found that A level grades for UK schools were predictive of medical career choice but the results of a general intelligence test were not.

The current position: Interviews

Although interviews are widely used, very few studies have defined their psychometric properties.

Despite its ubiquity, there are very few studies defining its psychometric properties. Those that do exist do not indicate that the interview is a robust selection measure.

There is insufficient evidence to establish the reliability of interviews in general.

They concluded that there was not sufficient evidence to establish the reliability of interviews

Interview formats vary widely, particularly in the characteristics they purport to measure.

In their review of the assessment of personal qualities for selection for medicine, Albanese et al. (2003) reached a similar conclusion. They described the results of reliability and validity studies as ‘equivocal’. Furthermore, they indicated a high degree of variability amongst interview formats, particularly the characteristics that they purport to measure

Stansfield and Kreiter suggest one way to improve interview reliability: a three-point scale may be as useful as the commonly used five-point scale.

Stansfield and Kreiter (2007) have indicated at least one way to improve reliability. In their study in one medical school, they found higher reliability for ratings at the high or low ends of a rating scale rather than middle levels. As a result, they argue that a three-point ranking scale may be as useful as the commonly used five-point scale.

The current position: MMIs

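Like the OSCE, the MMI overcomes the problems of poor test–retest reliability and of context specificity, where an attribute measured in one context does not necessarily transfer to another. Test–retest reliability is a better indicator of test quality than inter-rater reliability.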

Eva et al. (2004a) indicate that, like the OSCE, the MMI overcomes the problem of poor test– retest reliability and context specificity where the measurement of an attribute in one context does not necessarily transfer to another. Test–retest reliability provides a better indication of the quality of a test than inter-rater reliability because it focuses on the overall test not just a component of its operation.

Many studies have established that the MMI has good predictive validity and reliability.

Good predictive validity and reliability of the MMI have been established in studies by Eva et al. (2004a, b, 2009), LeMay (2007), Reiter et al. (2007) and Roberts et al. (2009). Eva et al. (2009)

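Other attributes of the MMI: increasing the number of stations improves reliability more than increasing the number of interviewers; shortening stations from 8 to 5 minutes has little effect on reliability; and MMI results appear not to be affected by security violations.

Kumar et al. offered theoretical insights into how interviewers reach their decisions and which biases they are prone to.

There is also evidence that both interviewers and candidates support the process, and that the MMI requires more physical space but fewer planning hours.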

Further studies have demonstrated other attributes of the MMI. The Eva et al. (2004a) study demonstrated that increasing the number of stations had a greater impact on reliability than increasing interviewers. Dodson et al. (2009) demonstrated that reducing station length from 8 to 5 min had little impact on reliability and it has also been shown that the results of MMI appear not to be affected by security violations (Reiter et al. 2006). Kumar et al. (2009) have provided some theoretical insights into how judges arrive at their decisions and the biases to which they are subject. There is also evidence for both interviewer and candidate support of the process (Kumar et al. 2009; Razack et al. 2009) and that, while the MMI may require more physical space, it requires fewer planning hours (Rosenfeld et al. 2008).
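The point that adding stations helps more than adding interviewers can be illustrated with a generalisability-theory decision study. The sketch below is not taken from Eva et al. (2004a): it assumes a design in which interviewers are nested within stations, and the variance components are made-up numbers chosen only for illustration.

```python
# Illustrative D-study for an MMI-like design: candidates crossed with
# stations, interviewers nested within stations.
# All variance components below are HYPOTHETICAL, not values from any study.


def g_coefficient(var_candidate, var_cand_station, var_residual,
                  n_stations, n_raters_per_station):
    """Relative generalisability coefficient for a candidate's mean score.

    Relative error variance =
        var_cand_station / n_stations
        + var_residual / (n_stations * n_raters_per_station)
    """
    error = (var_cand_station / n_stations
             + var_residual / (n_stations * n_raters_per_station))
    return var_candidate / (var_candidate + error)


# Hypothetical components: candidate, candidate-by-station, residual.
VAR_P, VAR_PS, VAR_RES = 0.20, 0.50, 0.30

# Three designs; the last two both use 10 ratings per candidate.
baseline = g_coefficient(VAR_P, VAR_PS, VAR_RES, 5, 1)
more_raters = g_coefficient(VAR_P, VAR_PS, VAR_RES, 5, 2)
more_stations = g_coefficient(VAR_P, VAR_PS, VAR_RES, 10, 1)

print(f"5 stations x 1 interviewer : {baseline:.3f}")
print(f"5 stations x 2 interviewers: {more_raters:.3f}")
print(f"10 stations x 1 interviewer: {more_stations:.3f}")
```

Under these assumptions, doubling the stations shrinks both error terms, while doubling the interviewers shrinks only the residual term, so the same number of extra ratings buys more reliability when it is spent on stations.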

The current position: Other measures

Personal statements and letters of recommendation are also used, but there is no evidence that any of them is reliable or has predictive validity.

Other measures used in the selection process include personal statements, autobiographical statements or letters of recommendation. However, there is no evidence that they are necessarily reliable or have predictive validity. In the Albanese et al’s (2003) review of personal qualities in selection, no research papers could be located on such measures nor could any evidence be found that they measured anything different from interviews

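Interest in personality testing is also growing. Price et al. listed 87 qualities of successful doctors, and psychology has the 'big five' personality characteristics (openness, conscientiousness, extrovertness, agreeableness and neuroticism), but there have been few attempts to apply them in selection.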

There is also growing interest in the application of personality testing used in business or commerce careers for selection. Albanese et al.’s (2003) review points to one of the difficulties with this approach. They point to Price et al.’s (1971) study indicating 87 qualities of successful doctors. There is great variability in the qualities currently assessed through interviews, MMIs and other non-cognitive measures. The psychology literature has shown some acceptance of the ‘big five’ personality characteristics: openness, conscientiousness, extrovertness, agreeableness and neuroticism but there have been few attempts to apply this to selection for the medical and health professions.

 2011;33(3):215-23. doi: 10.3109/0142159X.2011.551560.

Assessment for selection for the health care professions and specialty training: consensus statement and recommendations from the Ottawa 2010 Conference.


Medical Education, Flinders University, GPO Box 2100, Adelaide, South Australia 5064, Australia. david.prideaux@flinders.edu.au


Assessment for selection in medicine and the health professions should follow the same quality assurance processes as in-course assessment. The literature on selection is limited and is not strongly theoretical or conceptual. For written testing, there is evidence of the predictive validity of Medical College Admission Test (MCAT) for medical school and licensing examination performance. There is also evidence for the predictive validity of grade point average, particularly in combination with MCAT for graduate entry but little evidence about the predictive validity of school leaver scores. Interviews have not been shown to be robust selection measures. Studies of multiple mini-interviews have indicated good predictive validity and reliability. Of other measures used in selection, only the growing interest in personality testing appears to warrant future work. Widening access to medical and health professional programmes is an increasing priority and relates to the social accountability mandate of medical and health professional schools. While traditional selection measures do discriminate against various population groups, there is little evidence on the effect of non-traditional measures in widening access. Preparation and outreach programmes show most promise. In summary, the areas of consensus for assessment for selection are small in number. Recommendations for future action focus on the adoption of principles of good assessment and curriculum alignment, use of multi-method programmatic approaches, development of interdisciplinary frameworks and utilisation of sophisticated measurement models. The social accountability mandate of medical and health professional schools demands that social inclusion, workforce issues and widening of access are embedded in the principles of good assessment for selection.


The current situation: Widening access

It is well known that students from rural backgrounds are more likely to practise in rural areas.

It is acknowledged that rural students are more likely to practise in rural locations after graduation. A common approach has been to institute quotas for such groups

Consensus and conclusion

It is worth returning to Regehr's concept of programmatic research. van der Vleuten and Schuwirth, in turn, speak of 'programmatic assessment'.

It may be useful to take a step backwards from the pursuit of unifying theory to consider Regehr’s (2004) concept of programmatic research where ‘communities’ of researchers work together towards an eventual goal of consensus

van der Vleuten and Schuwirth (2005) have argued that thinking about assessment should be moved from a consideration of methods to programmes; another use of the term ‘programmatic’ this time in programmatic assessment. Programmatic assessment concentrates on the overall programme of assessment with a combination of methods, each with their differing psychometric properties, to make decisions about student performance.


(1) Admissions committees and all who have an interest in selection processes should adopt the principles of good assessment in defining the purpose of selection

blueprinting of assessable domains and attributes, 

selecting appropriate formats, 

employing transparent standard setting and decision making, 

and including an evaluation cycle in a programmatic manner.

(2) An integrative approach should apply the principles of good assessment and curriculum alignment along the education and training pathway including the progression hurdles between health professional degrees, prevocational practice and basic and advanced speciality training.

(3) There should be a focus on multi-method programmatic approaches in collecting, analysing, interpreting and reporting data from a range of selection instruments, which are fit for purpose

(4) There needs to be an emphasis on developing interdisciplinary theoretical frameworks that underpin development of both policy and the research agenda.

(5) There is an urgent need for the development of sophisticated measurement models from the family of regression methods which will require application to multi-site high-quality data sets, for increasing the sophistication of predictive validity studies using a range of attributes from selection blueprints, and for a focus on test–retest reliability.

(6) The social accountability of universities demands that social inclusion, workforce issues, consumer choice and widening of access to students of promise are embedded in the principles of good assessment for selection with recognition that there are political (and non-universal) issues that need to be considered in the definition of optimal decisions.

(7) Outreach, targeting strategies, preparation programmes and conditional selection should be considered as core strategies for medical and health professional schools to achieve their widening access missions.


Which admissions measures have succeeded, and which have failed, in predicting future performance as a doctor and professional?

Overview: what’s worked and what hasn’t as a guide towards predictive admissions tool development

Eric Siu, Harold I. Reiter

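Admissions committees and researchers around the world have applied diligence and imagination to developing and implementing methods that predict future performance as a doctor and professional. However, what works for predicting future job performance in most of the academic world may not work in the highly competitive world of medical school applicants. To differentiate within the highly range-restricted pool of medical school aspirants, only the most reliable assessment tools should be used.

Tools that have shown predictive validity for future performance include GPA, aptitude tests such as the MCAT, and non-cognitive testing such as the MMI.

The list of tools that have not met that mark is longer: personal interviews, personal statements, letters of reference, personality testing, emotional intelligence and (so far) situational judgment tests.

From the standpoint of predictive validity, the trends over time towards success or failure of these measures offer insight into the development of future tools.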

 2009 Dec;14(5):759-75. doi: 10.1007/s10459-009-9160-8. Epub 2009 Apr 2.

Overview: what's worked and what hasn't as a guide towards predictive admissions tool development.


McMaster University, 1200 Main Street West, MDCL 3112, Hamilton, ON, L8N 3Z5, Canada.


Admissions committees and researchers around the globe have used diligence and imagination to develop and implement various screening measures with the ultimate goal of predicting future clinical and professional performance. What works for predicting future job performance in the human resources world and in most of the academic world may not, however, work for the highly competitive world of medical school applicants. For the job of differentiating within the highly range-restricted pool of medical school aspirants, only the most reliable assessment tools need apply. The tools that have generally shown predictive validity in future performance include academic scores like grade point average, aptitude tests like the Medical College Admissions Test, and non-cognitive testing like the multiple mini-interview. The list of assessment tools that have not robustly met that mark is longer, including personal interview, personal statement, letters of reference, personality testing, emotional intelligence and (so far) situational judgment tests. When seen purely from the standpoint of predictive validity, the trends over time towards success or failure of these measures provide insight into future tool development.






Predictors of medical school dropout: testing at admission versus prior highest grades

Medical school dropout - testing at admission versus selection by highest grades as predictors

Lotte O’Neill, Jan Hartvigsen, Birgitta Wallstedt, Lars Korsholm & Berit Eika


Very few studies have examined the effect of admission tests on medical school dropout. The main aim of this study was to evaluate how non-grade-based admission testing and grade-based admission relate to subsequent dropout.


This prospective cohort study followed six cohorts, 1544 students in total, admitted to the University of Southern Denmark during 2002–2007. Half of the students were admitted on the basis of prior achievement of highest grades (Strategy 1); the other half took a non-grade-based admission test (Strategy 2).

Social predictor variables (doctor-parent, origin, parenthood, parents living together, parent on benefit, university-educated parents) were examined as well. The outcome of interest was dropout status 2 years after admission. Multivariate logistic regression analysis was used.


Strategy 2 students had a lower risk of dropping out within 2 years (odds ratio 0.56, 95% confidence interval 0.39–0.80). Only the admission strategy, the type of qualifying examination and the priority given to the programme contributed significantly to the final model. Social variables did not predict dropout, and neither did Strategy 2 admission test scores.


Selecting students by admission testing appeared to have an independent, protective effect against dropout.

Powis et al. examined 21 admission interview subscales and subscores as predictors of medical school dropout in a case–control study and found that only one was associated with dropout: the number of negative comments assigned by interviewers to the subscale for supportive and encouraging behaviour.

Powis et al. examined 21 different admission interview subscales and subscores as predictors of dropout in a case–control study, and found only one of these to be significantly associated with dropout. The number of negative comments assigned by interviewers to the subscale for supportive and encouraging behaviour was found to be associated with dropout (OR = 1.65, 95% confidence interval [CI] 1.01–2.70).

 1992 Dec;22(6):692-8.

The structured interview as a tool for predicting premature withdrawal from medical school.


Faculty of Medicine, University of Newcastle, NSW.

By contrast, Urlings-Strop et al. reported a more positive case for non-grade-based admission tests: selected students were more than twice as likely as lottery-admitted students to remain in school. The study also mentioned a third group ('direct access') admitted on the basis of the highest pre-university GPAs, but it did not compare the direct access group with the selected group.

By contrast, Urlings-Strop et al. presented a more optimistic case for the protective effect of non-grade-based admission tests on student dropout. They found that selected students were more than twice as likely as lottery-admitted control subjects to remain in school (relative risk [RR] 2.58, 95% CI 1.59–4.17; p = 0.000). This study also referred to a third admission group (‘direct access’) consisting of students with the highest pre-university grade point averages (GPAs), but did not report a comparison of dropout rates between the ‘direct access’ and ‘selected’ groups.


 2009 Feb;43(2):175-83. doi: 10.1111/j.1365-2923.2008.03267.x.

Selection of medical students: a controlled experiment.


Institute of Medical Education and Research, Erasmus MC, University Medical Centre Rotterdam, Rotterdam, The Netherlands. l.urlings-strop@erasmusmc.nl
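For readers less familiar with how a relative risk and its confidence interval, such as those quoted from Urlings-Strop et al., are obtained from group counts, here is a minimal sketch. The counts are hypothetical and are not the study's data; the interval uses the standard log-scale approximation, which is an assumption about method rather than something stated in the excerpt.

```python
import math


def relative_risk(events_a, total_a, events_b, total_b, z=1.96):
    """Relative risk of group A versus group B with an approximate 95% CI.

    The interval is built on the log scale:
    SE(log RR) = sqrt(1/events_a - 1/total_a + 1/events_b - 1/total_b)
    """
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    rr = risk_a / risk_b
    se_log = math.sqrt(1 / events_a - 1 / total_a
                       + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# HYPOTHETICAL counts: the event occurs in 40 of 200 students in group A
# and in 20 of 200 students in group B.
rr, ci_low, ci_high = relative_risk(40, 200, 20, 200)
print(f"RR = {rr:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1, as in this made-up example, is what a statement such as "more than twice as likely" with a significant p value rests on.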


Descriptions of all variables delivered by the USD Admission Office and Statistics Denmark were scrutinised to check for changes in data collection methods. The variables set forth in the research protocol were either prepared for analysis (educational/USD data) or generated by merging various datasets (social/Statistics Denmark variables). Missing data for the social variables generated from Statistics Denmark data were categorised with the nonevent/reference category and summary tables were produced.

Variables were then examined for co-linearity and zero cells before analyses, by inspection of matrix graph plots, 2 × 2 tables and boxplots.

Individual predictors of dropout were then examined with univariate logistic regression analyses, and variables with p < 0.1 were eventually included in the multivariate models.

Multivariate logistic regression was used to analyse the dichotomous outcome of dropout/non-dropout. Post-estimation diagnostics of models consisted mainly of checking linearity assumptions and influential data points.

Additivity was assumed because we did not want to risk overfitting models by including interactions, due to the relatively large number of potential predictors and the modest number of dropouts.

The linearity of the age variable was checked by inspecting LOWESS (locally weighted scatterplot smoothing) smoothed plots, with the logit transformed probability of dropout on the y-axis against age on the x-axis.

Influential cases or cases for which the model fitted poorly were identified by inspection of deviance residuals, leverage and Pregibon’s delta-beta influence statistic. Influential cases were inspected to establish whether they were outliers on any of the predictor variables in order to assess whether or not they should be removed from analysis.

Key findings

The aim of this study was to examine whether admission strategy was independently associated with dropout while controlling for relevant educational and socio-demographic variables. 

Participation in an admission test, general gymnasium exam and being enrolled on a first-priority programme were all independently associated with a reduced risk of dropout. 

By contrast, socio-demographic variables had little or no independent influence on dropout (Tables 2 and 3). This study is, to the best of our knowledge, the first published study to compare admission testing and pure grade-based admission on the outcome of dropout in medical education.

 2011 Nov;45(11):1111-20. doi: 10.1111/j.1365-2923.2011.04057.x.

Medical school dropout--testing at admission versus selection by highest grades as predictors.


Institute of Sports Science and Clinical Biomechanics, Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark. ldyhrberg@health.sdu.dk



Very few studies have reported on the effect of admission tests on medical school dropout. The main aim of this study was to evaluate the predictive validity of non-grade-based admission testing versus grade-based admission relative to subsequent dropout.


This prospective cohort study followed six cohorts of medical students admitted to the medical school at the University of Southern Denmark during 2002-2007 (n=1544). Half of the students were admitted based on their prior achievement of highest grades (Strategy 1) and the other half took a composite non-grade-based admission test (Strategy 2). Educational as well as social predictor variables (doctor-parent, origin, parenthood, parents living together, parent on benefit, university-educated parents) were also examined. The outcome of interest was students' dropout status at 2 years after admission. Multivariate logistic regression analysis was used to model dropout.


Strategy 2 (admission test) students had a lower relative risk for dropping out of medical school within 2 years of admission (odds ratio 0.56, 95% confidence interval 0.39-0.80). Only the admission strategy, the type of qualifying examination and the priority given to the programme on the national application forms contributed significantly to the dropout model. Social variables did not predict dropout and neither did Strategy 2 admission test scores.


Selection by admission testing appeared to have an independent, protective effect on dropout in this setting.
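The reported odds ratio and interval can be mapped back to the logistic regression scale. The sketch below assumes the interval is a symmetric Wald interval on the log-odds scale (the abstract does not say how it was computed), and the function name is purely illustrative.

```python
import math


def log_odds_from_report(odds_ratio, ci_lower, ci_upper, z=1.96):
    """Recover the coefficient, its standard error and the Wald z statistic.

    Assumes the 95% CI was built as exp(beta +/- z * SE), which is an
    assumption and not something stated in the abstract.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return beta, se, beta / se


# Values quoted in the abstract for Strategy 2 versus Strategy 1.
beta, se, wald_z = log_odds_from_report(0.56, 0.39, 0.80)
print(f"beta = {beta:.3f}, SE = {se:.3f}, z = {wald_z:.2f}")
```

A negative coefficient corresponds to lower odds of dropout, and a |z| well above 1.96 is consistent with an interval that excludes 1.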


Comparing UMAT scores and GPA for predicting performance in medical school

Comparison of UMAT scores and GPA in prediction of performance in medical school: a national study

Phillippa Poole, Boaz Shulruf, Joy Rudland & Tim Wilkinson


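Medical schools continue to look for ways to select students with the characteristics best suited to studying, training in and practising medicine. Tests of cognitive ability are used in combination with prior academic achievement and other tools, but their predictive validity is unknown. This study compared the predictive validity of the Undergraduate Medicine and Health Sciences Admission Test (UMAT) score, GPA, and a combination of the two.


The subjects were students in New Zealand (n = 1346) selected using UMAT scores since 2003. Regression models included demographic data, UMAT scores, admission GPA and performance on routine assessments.


Although the two institutions weighted the UMAT differently and there were minor differences in student demographics and programmes, the results were similar. The predictive power of admission GPA was highest for outcomes in Years 2 and 5 of the 6-year programme, explaining 17–35% of the variance, whereas UMAT score explained less than 10%. The highest predictive power of the UMAT score was 9.9%, for a Year 5 written examination. Combining UMAT score with GPA increased predictive power slightly across all outcomes. Neither UMAT score nor GPA predicted outcomes in the final trainee intern year well, although grading bands for that year were broad and numbers were smaller.


The general cognitive test UMAT explains less of the performance in medical programmes than GPA does, although it adds a small amount of predictive power when combined with GPA. UMAT scores may predict outcomes not examined in this study, so further research is needed.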


Separate analyses were conducted for each programme. Each analysis used regression models to measure the predictive association between UMAT score, admission GPA or combined GPA and UMAT score, and the outcomes listed in Table 1 when background factors were controlled for (age, ethnicity, graduate, rural pathway). For students at the University of Auckland, the interview score was included among the background factors.

An R² multiple linear regression model was used when the dependent variable consisted of continuous scores; a Nagelkerke pseudo R² ordinal regression model was used when the dependent variable was categorical (Distinction, Pass, Fail). Seven regression models were established for each outcome in each programme; an example is given in Table 2. One of these models included the background variables alone. The remaining models included background variables with admission GPA, UMAT score, admission GPA and UMAT score combined, or one of the UMAT sections.

Thus, for each outcome in each year and university, it was possible to quantify the net predictive effect of the UMAT score (overall score or any section score), admission GPA, or both, on outcomes over and above other information available at selection. This was calculated by extracting the percentage of variance explained by the background factors from the total variance explained by the model.
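The subtraction described here is simple enough to write down. In the sketch below, the function name and all R² values are hypothetical placeholders, not figures from the paper; only the arithmetic (model R² minus background-only R²) follows the description above.

```python
def net_predictive_effect(r2_model, r2_background):
    """Variance explained by the added predictor(s) over and above background.

    Both arguments are proportions of variance explained (0 to 1); the model
    is assumed to contain the background factors plus the extra predictor(s).
    """
    if not 0.0 <= r2_background <= r2_model <= 1.0:
        raise ValueError("expected 0 <= r2_background <= r2_model <= 1")
    return r2_model - r2_background


# HYPOTHETICAL R-squared values for one outcome in one programme.
R2_BACKGROUND = 0.05
r2_by_model = {
    "background + GPA": 0.30,
    "background + UMAT": 0.12,
    "background + GPA + UMAT": 0.32,
}

net_effects = {
    name: round(net_predictive_effect(r2, R2_BACKGROUND), 4)
    for name, r2 in r2_by_model.items()
}
for name, net in net_effects.items():
    print(f"{name}: net {net:.1%} of variance")
```

Reporting the net figure rather than the raw R² keeps age, ethnicity and the other background factors from being credited to GPA or UMAT.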

Interactions among the independent variables were not measured because the outcome of interest was the predictive power of these variables. 

Multi-collinearity among the admission GPA and scores on the UMAT sections was measured by the variance inflation factor (VIF). This was < 1.4 for each regression, well below the unacceptable level of 10. There are no published data on the reliability of the UMAT. Although reliability data are calculated for many of the assessment outcomes that contribute to an overall year result, the reliability of the overall result cannot be determined.
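To make the VIF threshold concrete: the variance inflation factor for a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on the others. The sketch below covers only the two-predictor case, where that R² is the squared Pearson correlation; the scores are invented for illustration and the helper names are not from the paper.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x, y):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


def max_r2_for_vif(vif):
    """Largest R^2 among predictors that still keeps the VIF at `vif`."""
    return 1.0 - 1.0 / vif


# HYPOTHETICAL admission GPA and UMAT section scores for six students.
gpa = [6.1, 7.0, 7.4, 8.2, 8.8, 6.5]
umat = [55, 62, 50, 66, 58, 60]

print(f"VIF for these made-up scores: {vif_two_predictors(gpa, umat):.2f}")
print(f"R^2 implied by VIF = 1.4: {max_r2_for_vif(1.4):.3f}")
```

A VIF below 1.4 therefore means that less than about 29% of the variance in any one predictor was shared with the others.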

 2012 Feb;46(2):163-71. doi: 10.1111/j.1365-2923.2011.04078.x.

Comparison of UMAT scores and GPA in prediction of performance in medical school: a national study.


Department of Medicine, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand. p.poole@auckland.ac.nz



Medical schools continue to seek robust ways to select students with the greatest aptitude for medical education, training and practice. Tests of general cognition are used in combination with markers of prior academic achievement and other tools, although their predictive validity is unknown. This study compared the predictive validity of the Undergraduate Medicine and Health Sciences Admission Test (UMAT), the admission grade point average (GPA), and a combination of both, on outcomes in all years of two medical programmes.


Subjects were students (n = 1346) selected since 2003 using UMAT scores and attending either of New Zealand's two medical schools. Regression models incorporated demographic data, UMAT scores, admission GPA and performance on routine assessments.


Despite the different weightings of UMAT used in selection at the two institutions and minor variations in student demographics and programmes, results across institutions were similar. The net predictive power of admission GPA was highest for outcomes in Years 2 and 5 of the 6-year programme, accounting for 17-35% of the variance; UMAT score accounted for < 10%. The highest predictive power of the UMAT score was 9.9% for a Year 5 written examination. Combining UMAT score with admission GPA improved predictive power slightly across all outcomes. Neither UMAT score nor admission GPA predicted outcomes in the final trainee intern year well, although grading bands for this year were broad and numbers smaller.


The ability of the general cognitive test UMAT to predict outcomes in major assessments within medical programmes is relatively minor in comparison with that of the admission GPA, but the UMAT score adds a small amount of predictive power when it is used in combination with the GPA. However, UMAT scores may predict outcomes not studied here, which underscores the need for further validation studies in a range of settings.


Equity in interviews: do personal characteristics (ethnicity, gender, socio-economic status) affect interview scores?

Equity in interviews: do personal characteristics impact on admission interview scores?

Andrew B Lumb, Matthew Homer & Amy Miller


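Research suggests that some social groups are disadvantaged by medical school selection systems. It is not known at which stage of selection this occurs, but social bias could arise at interview, where the parties meet face to face.


We carried out a detailed study of the interviews conducted in one admissions cycle at a UK medical school. Personal characteristics of interviewers and candidates were recorded to examine which of these factors, and which interviewer–candidate pairings, influenced interview scores.


In total, 320 interviewers assessed 734 candidates, and data on 2007 interviewer–candidate interactions were analysed. Interview reliability, estimated with generalisability theory, was 0.82–0.87. For both interviewers and candidates, scores given and received did not differ by gender, ethnicity, socio-economic status or type of school. Scores from staff interviewers and student interviewers did not differ significantly either. The individual staff interviewer groups were too small for statistical analysis, but there were no meaningful differences between interviewers with different specialties or with different levels of interviewing experience.


These findings suggest that the interview is not the stage at which some social groups are disadvantaged in medical school selection. They also support the involvement of senior medical students in interviewing. Although there is little evidence that interview scores help predict future academic or clinical success, most medical schools rely heavily on interviews in selection. Our study shows that interviews do not add social bias to the selection process.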

At interview, the candidates are assessed in five separate areas covering: 

insight into a career in medicine; 


social and cultural awareness; 

non-academic achievements, 

interpersonal skills. 

These five areas are designed to cover the personal qualities of applicants that are regarded by the admissions committee as desirable characteristics in future doctors 

(see http://www.leeds.ac.uk/medicine/admissions/personal.html for further details). 

The constructs explored in these areas have been evolved over several years through continuous development and monitoring by the admissions committee of the medical school.

The reliability of the interviewing process was calculated using variance components MINQUE methods in SPSS Version 15 (SPSS, Inc., Chicago, IL, USA), treating interview scores as the dependent variable and both interviewee and interviewer as random effects in a mixed-effects linear model.7 This allows a generalisability coefficient to be calculated as the proportion of the variance in the interview scores that can be properly attributed to the interviewees, with all non-interviewee variance treated as error. 
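
The coefficient described above can be sketched in a few lines. This is a minimal illustration, not the paper's SPSS procedure: it assumes the variance components have already been estimated (for example by MINQUE), the function name and the numbers are hypothetical, and the option to average error over several interviewers is an assumption of this sketch.

```python
def generalisability_coefficient(var_interviewee, var_error, n_interviewers=1):
    """Share of score variance attributable to interviewees.

    var_error pools every non-interviewee component, as in the text.
    n_interviewers > 1 averages that error over several interviewers
    (an assumption of this sketch, not a detail given in the paper).
    """
    if var_interviewee < 0 or var_error < 0 or n_interviewers < 1:
        raise ValueError("variance components must be >= 0 and n_interviewers >= 1")
    return var_interviewee / (var_interviewee + var_error / n_interviewers)


# Hypothetical variance components, for illustration only.
g_single = generalisability_coefficient(2.0, 2.0)
g_panel = generalisability_coefficient(2.0, 2.0, n_interviewers=3)
print(round(g_single, 2), round(g_panel, 2))
```

With equal interviewee and error variance, a single rating is only half signal; averaging over more interviewers shrinks the error term and raises the coefficient.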

Data analysis was carried out in three separate parts: 

(i) analysis of interviewee performance; 

(ii) analysis of interviewer performance, and 

(iii) analysis of interviewee–interviewer interactions. 

In all three parts, the potential effect of dependency in the data (interview scores are partially nested within candidates and interviewers) has been ignored in order to simplify the statistical analysis. It is therefore possible that any effects that appear to be statistically significant are (slightly) overstated in our findings. However, the substantive nature of the main findings is not affected.

Analysis of interviewee performance

Potential determinants of performance by interviewees were analysed using univariate general linear models 

with total interview score as the outcome variable

gender, ethnicity (White ⁄non-White), school type (independent ⁄ state selective ⁄ state nonselective [as the reference group]) as fixed effects

and socio-economic classification and date of birth as covariate dependent variables.

A main effects-only model indicated that no predictors were playing a significant role in determining the marks awarded to interviewees and explained < 1% of the variation in the data. 

The full factorial model (all main effects and their interactions) was still relatively poor, explaining only approximately 2% of the variation in the interview total mark. 

Thus, most of the variation in the marks was not accounted for by the available predictors. In this model, no main effects were statistically significant and the largest interaction effect was for school type (state selective) versus ethnicity (F(1, 609) = 9.352, p = 0.002, effect size 2%), whereby non-White students from selective schools tended to be awarded slightly lower marks than their White counterparts. The difference lay in the opposite direction for applicants who did not come from such schools.

Analysis of interviewer performance

The potential determinants of marks awarded by interviewers were also analysed using univariate general linear models

with mean interview score awarded by the interviewer as the outcome variable

gender, ethnicity (White ⁄non-White), staff or student and school type as fixed effects

and socio-economic classification as the covariate dependent variable.

A simple model including only main effects found no predictors playing a significant role and explained almost none of the variance in the data. 

The full factorial model, including predictors and all interactions and explaining 4.8% of the variation in mean marks, found a small but significant gender main effect, whereby male interviewers awarded slightly higher marks than females (estimated marginal means 11.1 and 10.7, respectively; F(1, 280) = 3.999, p = 0.047, effect size 1%). There was also evidence in this model of small interaction effects, including school type (independent) with ethnicity: those from independent schools tended to give slightly higher marks to non-White candidates, whereas those not from independent schools tended to give higher marks to White candidates (F(1, 280) = 9.569, p = 0.002, effect size 3%).

A separate analysis pertaining to staff interviewers only was carried out with interviewer experience included in the model as a covariate. However, this variable did not play a significant role in influencing interview scores.

Med Educ. 2010 Nov;44(11):1077-83. doi: 10.1111/j.1365-2923.2010.03771.x.

Equity in interviews: do personal characteristics impact on admission interview scores?


Leeds Institute of Medical Education, School of Medicine, University of Leeds, Leeds, UK. a.lumb@leeds.ac.uk



Research indicates that some social groups are disadvantaged by medical school selection systems. The stage(s) of a selection process at which this occurs is unknown, but at interview, when applicant and interviewer are face-to-face, there is potential for social bias to occur.


We performed a detailed audit of the interview process for a single-entry year to a large UK medical school. Our audit included investigating the personal characteristics of both interviewees and interviewers to find out whether any of these factors, including the degree of social matching between individual pairs of interviewees and interviewers, influenced the interview scores awarded.


A total of 320 interviewers interviewed 734 applicants, providing complete data for 2007 interviewer-interviewee interactions. The reliability of the interview process was estimated using generalisability theory at 0.82-0.87. For both interviewers and interviewees, gender, ethnic background, socio-economic group and type of school attended had no influence on the interview scores awarded or achieved. Staff and student interviewer marks did not differ significantly. Although numbers in each group of staff interviewers were too small for formal statistical analysis, there were no obvious differences in marks awarded between different medical specialties or between interviewers with varying amounts of interviewing experience.


Our data provide reassurance that the interview does not seem to be the stage of selection at which some social groups are disadvantaged. These results support the continued involvement of senior medical students in the interview process. Despite the lack of evidence that an interview is useful for predicting future academic or clinical success, most medical schools continue to use interviews as a fundamental component of their selection process. Our study has shown that at least this arguably misplaced reliance upon interviewing is not introducing further social bias into the selection system.

© Blackwell Publishing Ltd 2010.






Estimating the reliability of behavioural stations and questionnaires in medical school admissions

Reliability estimates: behavioural stations and questionnaires in medical school admissions

Naomi Gafni,1 Avital Moshinsky,1 Orit Eisenberg,1 David Zeigler1 & Amitai Ziv2


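Assessment centres that evaluate the non-cognitive attributes of medical school applicants must produce scores that reflect those attributes as accurately as possible. Reliability coefficients reported for such centres have so far rested on limited samples and single administrations, with no account of the error that arises from retesting or from having several centres measure the same attributes.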


The National Institute for Testing and Evaluation in Israel developed two assessment centres: MOR, used by two medical schools and one dental school, and MIRKAM, used by another medical school. Each centre comprises eight or nine behavioural stations, a standardised biographical questionnaire, and a judgement and decision-making questionnaire. We calculated generalisability coefficients for each centre's stations by year, composite reliability coefficients for each centre as a whole, test-retest correlations for repeat examinees, and the correlation between the two centres.


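Between 2006 and 2009, 2662 examinees took MOR and 2023 took MIRKAM; 1479 took both. The average generalisability coefficients for the stations were 0.69 (MOR) and 0.67 (MIRKAM), and the composite reliability coefficients for the full centres were 0.79 and 0.76. Test-retest correlations for repeaters were 0.59 and 0.43 for the stations and 0.72 and 0.65 for the full assessments. Station scores on MOR and MIRKAM correlated at 0.56.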


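The minimum reliability desirable for high-stakes decision making (0.80) is reached only with 14 or 15 stations plus the questionnaires. Even so, the values obtained are much higher than the reliability of a single interview. The questionnaires contribute substantially to measurement accuracy, and reliability sets an upper bound on validity.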

Med Educ. 2012 Mar;46(3):277-88. doi: 10.1111/j.1365-2923.2011.04155.x.

Reliability estimates: behavioural stations and questionnaires in medical school admissions.


National Institute for Testing and Evaluation (NITE), Jerusalem, Israel. naomi@nite.org.il



Assessment centres used in evaluating the non-cognitive attributes of medical school candidates must generate scores that reflect as accurate a measurement as possible of these attributes. Thus far, reliability coefficients for such centres have been based on limited samples and individual administrations, without reference to the error of variance that may result from retesting, or from the existence of multiple centres designed to measure the same attributes.


The National Institute for Testing and Evaluation in Israel has developed and administered two assessment centres: MOR is used by two medical schools and one dental school, and MIRKAM by another medical school. Each centre comprises eight or nine behavioural stations, a standardised biographical questionnaire, and a judgement and decision-making questionnaire. We calculated generalisability coefficients for each centre's eight or nine stations by year, composite reliability coefficients for the overall assessment centres, test-retest correlation coefficients for repeaters, and a correlation coefficient between the centres.


Between 2006 and 2009, 2662 and 2023 examinees participated in MOR and MIRKAM, respectively; 1479 of these participated in both. The average generalisability coefficients for the stations were 0.69 for MOR and 0.67 for MIRKAM. The composite reliability coefficients for the full centres (behavioural stations plus questionnaires) were 0.79 and 0.76 for MOR and MIRKAM, respectively. The correlations for repeaters, corrected for restriction of range, were 0.59 and 0.43 for MOR and MIRKAM stations, respectively, and 0.72 and 0.65 for the full MOR and MIRKAM assessments, respectively. The correlation between scores on the MOR and MIRKAM stations was 0.56 (0.75 for the overall score).


The minimal reliability desirable for high-stakes decision making (0.80) was obtained only for 14 or 15 stations with questionnaires. Nevertheless, the values obtained are considerably higher than reliability coefficients for single interviews. The questionnaires contribute significantly to the accuracy of the measurement. These reliability measures constitute an upper threshold for measures of validity.
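
How reliability grows with the number of stations can be approximated with the Spearman-Brown prophecy formula. This is a rough sketch under that assumption, not the authors' method: their figure of 14 or 15 stations comes from their own generalisability analysis and includes the questionnaires, so the numbers here will not match exactly. The function names are hypothetical.

```python
def spearman_brown(reliability, factor):
    """Predicted reliability when a test is lengthened by `factor`."""
    return factor * reliability / (1 + (factor - 1) * reliability)


def lengthening_factor(current, target):
    """Factor by which a test must be lengthened to reach `target` reliability."""
    return target * (1 - current) / (current * (1 - target))


# Station-only coefficient reported for MOR (0.69) and the 0.80 target.
factor = lengthening_factor(0.69, 0.80)
for n_stations in (8, 9):
    print(n_stations, "stations ->", round(n_stations * factor, 1), "stations needed")
```

The factor comes out at roughly 1.8, which suggests that close to twice the current number of stations would be needed if stations alone carried the measurement.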

© Blackwell Publishing Ltd 2012.


MCAT Verbal Reasoning score: weaker prediction of medical school performance for students whose first language is not English

MCAT Verbal Reasoning score: less predictive of medical school performance for English language learners

Babbi Winegarden,1 Dale Glaser,2 Alan Schwartz3 & Carolyn Kelly4


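MCAT scores are widely used in selecting medical school entrants. Applicants who learned English as a second language may be at a disadvantage when tested in a non-native language. Earlier research found significant differences in MCAT Verbal Reasoning (VR) scores between English language learners (ELLs), defined as applicants who learned English after the age of 11, and non-ELL examinees. This study asked whether the relationship between VR scores and medical school performance differs between ELL and non-ELL students.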


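MCAT VR scores and performance outcomes (grades, examination scores and so on) for 924 students who matriculated in 1998-2005 (graduating 2002-2009) were obtained from University of California San Diego School of Medicine admissions files and the Association of American Medical Colleges database. Regression models tested whether MCAT VR scores predicted medical school performance equally well for ELLs and non-ELLs.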


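ELL status significantly changed how well the VR score predicted several outcomes, including pre-clerkship grades, academic distinction, USMLE Step 2 CK scores and two clerkship shelf examinations. For these outcomes, VR scores correlated more strongly with medical school performance among non-ELL students than among ELL students.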


MCAT VR scores should be used with more caution when assessing ELL applicants to medical school.

Data analysis

A hierarchical approach was used for testing a moderated multiple regression model. For each outcome variable, we first entered ELL status (ELL or non-ELL) and MCAT VR scores as predictors, and then, in a second step, entered the interaction between the two predictors into the model. Scores on the VR sub-test were centred in order to minimise non-essential collinearity.10,11

Linear regressions were performed when outcomes were continuous and expected to be normally distributed. When the outcome variable was binary (e.g. pass ⁄ fail), multiple logistic regression was conducted. These analyses were conducted using SPSS Version 18.0.2 (SPSS, Inc., Chicago, IL, USA). When outcomes were counts of unusual events (e.g. number of failed courses), zero-inflated Poisson regression was performed using Mplus Version 6.1 (Muthén & Muthén, Los Angeles, CA, USA).

In these models, the key test for the differential predictive ability of the VR sub-test on medical school outcomes for ELLs and non-ELLs is whether the interaction term is significant. In the coding of our data, a significant negative interaction would be interpreted as a finding that the relationship between VR score and outcome is stronger for non-ELLs than for ELLs. When interactions were significant, we examined the associations between VR score and outcome within each group.
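
The moderated model for a continuous outcome can be sketched without any statistics package. The sketch below rests on assumptions not stated in the text: ELL is coded 1 and non-ELL 0, the data are a made-up toy set, and the function names are hypothetical. It uses the fact that, for ordinary least squares with a two-level moderator, the full model (VR, ELL, VR x ELL) gives the same coefficients as fitting a separate line in each group, so the interaction term is simply the difference between the two slopes. It does not compute significance tests.

```python
from statistics import mean


def ols_line(x, y):
    """Slope and intercept of a simple least-squares line."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def moderated_fit(vr, ell, outcome):
    """Coefficients of outcome ~ VR_centred + ELL + VR_centred x ELL."""
    grand_mean = mean(vr)
    vr_c = [v - grand_mean for v in vr]  # centring, as described in the text
    fits = {}
    for group in (0, 1):  # 0 = non-ELL, 1 = ELL (coding assumed here)
        xs = [x for x, e in zip(vr_c, ell) if e == group]
        ys = [y for y, e in zip(outcome, ell) if e == group]
        fits[group] = ols_line(xs, ys)
    (b_non, a_non), (b_ell, a_ell) = fits[0], fits[1]
    return {
        "intercept": a_non,
        "vr": b_non,                 # VR slope for non-ELLs
        "ell": a_ell - a_non,        # group difference at the mean VR score
        "vr_x_ell": b_ell - b_non,   # negative: weaker VR slope for ELLs
    }


# Toy data, invented for illustration only.
vr = [6, 8, 10, 12, 6, 8, 10, 12]
ell = [0, 0, 0, 0, 1, 1, 1, 1]
outcome = [70 + 2.0 * (v - 9) if e == 0 else 68 + 0.5 * (v - 9)
           for v, e in zip(vr, ell)]
coefs = moderated_fit(vr, ell, outcome)
print(coefs)
```

Under this coding a negative `vr_x_ell` coefficient matches the interpretation given above: the VR score relates more strongly to the outcome for non-ELLs than for ELLs.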

Med Educ. 2012 Sep;46(9):878-86. doi: 10.1111/j.1365-2923.2012.04315.x.

MCAT Verbal Reasoning score: less predictive of medical school performance for English language learners.


Division of Medical Education and Department of Psychiatry, University of California San Diego, La Jolla, California 92093-0092, USA. bwinegarden@ucsd.edu



Medical College Admission Test (MCAT) scores are widely used as part of the decision-making process for selecting candidates for admission to medical school. Applicants who learned English as a second language may be at a disadvantage when taking tests in their non-native language. Preliminary research found significant differences between English language learners (ELLs), applicants who learned English after the age of 11 years, and non-ELL examinees on the Verbal Reasoning (VR) sub-test of the MCAT. The purpose of this study was to determine if relationships between VR sub-test scores and measures of medical school performance differed between ELL and non-ELL students.


Scores on the MCAT VR sub-test and student performance outcomes (grades, examination scores, and markers of distinction and difficulty) were extracted from University of California San Diego School of Medicine admissions files and the Association of American Medical Colleges database for 924 students who matriculated in 1998-2005 (graduation years 2002-2009). Regression models were fitted to determine whether MCAT VR sub-test scores predicted medical school performance similarly for ELLs and non-ELLs.


For several outcomes, including pre-clerkship grades, academic distinction, US Medical Licensing Examination Step 2 Clinical Knowledge scores and two clerkship shelf examinations, ELL status significantly affects the ability of the VR score to predict performance. Higher correlations between VR score and medical school performance emerged for non-ELL students than for ELL students for each of these outcomes.


The MCAT VR score should be used with discretion when assessing ELL applicants for admission to medical school.

© Blackwell Publishing Ltd 2012.






Using an internet-based MMI to select medical and dental students

Internet-based multiple mini-interviews for candidate selection for graduate entry programmes

David Tiller, Deborah O’Mara, Imogene Rothnie, Stewart Dunn, Lily Lee & Chris Roberts


The University of Sydney graduate medical and dental programmes have used multiple mini-interviews (MMIs) since 2006. In 2011, international candidates were interviewed over Skype (internet-based MMI, iMMI), while local candidates attended the MMI in person. We examined whether scores from the two formats were equivalent, and assessed the feasibility, cost-effectiveness and acceptability of the iMMI.


We compared iMMI results for 2011 international candidates with MMI scores for 2009 international candidates and for 2011 local candidates, using ANOVA to compare the two formats. Acceptability was gauged from feedback given by candidates and interviewers, and cost savings were estimated.


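There was no significant difference between 2011 iMMI scores and 2009 MMI scores, nor between local and international candidates' scores in 2011. Scores for international candidates showed greater variation. Using generalisability theory, the reliability of the nine-question iMMI was 0.76, against 0.70 for the MMI. The iMMI ran smoothly, and candidates and interviewers gave positive feedback on its format and delivery. Cost savings were estimated at over AU$50 000, an 84% saving.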


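We believe this is the first report of an internet-based MMI used for a high-stakes interview. Interviewers were able to reach valid and reliable decisions through the iMMI, participants found the process acceptable, and results were comparable to the in-person MMI at lower cost. The slightly wider variance in iMMI scores needs further study.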

Equivalence of MMI and iMMI scores

Two separate one-way analyses of variance (ANOVAs) were used to investigate the equivalence of the two formats by exploring whether the medium of interviewing resulted in significantly different mean scores for the in-person MMI and the internet-based iMMI.

In the first analysis, the iMMI scores for international candidates in 2011 were compared with the in-person MMI scores for international candidates in 2009. To ensure that differences in mean scores were not attributable to the different application process for the two time periods and differences in the MMI questions, a comparison was also made for both cohorts for 2011. Thus, the scores for the iMMI for international candidates in 2011 were compared with the scores for the in-person MMI for local candidates in 2011. In accordance with the assumptions for ANOVA, the homogeneity of variance between each comparison was tested according to Levene’s statistic. Where the two groups were shown to have unequal variance, the non-parametric Brown–Forsythe median test was used to assess the difference between the average scores. Eta squared was used to examine the proportion of variance associated with the main effects assessed in the ANOVA.
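
The one-way ANOVA and eta squared used in these comparisons can be computed by hand. The following is a minimal sketch with made-up scores; the function name is hypothetical, and it covers neither Levene's test nor the Brown-Forsythe alternative mentioned above.

```python
def one_way_anova(groups):
    """Return (F statistic, eta squared) for a list of score lists."""
    all_scores = [s for g in groups for s in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of scores around their own group mean.
    ss_within = sum((s - m) ** 2
                    for g, m in zip(groups, group_means) for s in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared


# Invented scores standing in for an iMMI group and an in-person MMI group.
f_stat, eta_sq = one_way_anova([[1, 2, 3], [2, 3, 4]])
print(round(f_stat, 3), round(eta_sq, 3))
```

Eta squared is the share of total score variation that lies between the groups, which is why it is read as the proportion of variance associated with the interview format.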

The equality of the two formats was also investigated through a variance components analysis using a minimum norm quadratic unbiased estimation in order to compare effects for the 2011 iMMI candidates and the in-person MMI. We assumed that all effects were random given the unbalanced design. We estimated the variance due to ‘candidate’, ‘question’, ‘interviewer’ and ‘question X interviewer’. Interviewers varied in the number of circuits they participated in, ranging from one to 15 for both formats, with an average of four circuits for the iMMI and three for the MMI. However, the interaction of ‘candidate X interviewer’ and ‘candidate X question’ could not be separated from each other because interviewers were confounded with MMI questions within each candidate and, thus, remained in the error term. There were insufficient candidates per circuit of individual questions to include ‘circuit’ as a variable in the analysis.

Reliability of the iMMI

A generalisability coefficient was calculated to provide the estimate of reliability for the iMMI and the MMI. The formula to determine the generalisability coefficient has been published elsewhere.4

(4 Roberts C, Walton M, Rothnie I, Crossley J, Lyon P, Kumar K, Tiller D. Factors affecting the utility of the multiple mini-interview in selecting candidates for graduate-entry medical school. Med Educ 2008;42 (4):396–404.)

Med Educ. 2013 Aug;47(8):801-10. doi: 10.1111/medu.12224.

Internet-based multiple mini-interviews for candidate selection for graduate entry programmes.


The University of Sydney, Sydney, Australia.



Multiple mini-interviews (MMIs) have been used by The University of Sydney graduate medical and dental programmes since 2006. In 2011, interviews with international candidates were conducted using Skype (iMMI), whereas interviews with local candidates were conducted in person. We determined whether the MMI scores derived from both methods were comparable. We describe the feasibility, acceptability and cost-effectiveness of the iMMI.


We compared 2011 international student internet-based iMMI results with data from 2009 international student MMIs and 2011 local student MMIs. Analyses of variance (ANOVAs) were used to investigate equivalence of the two formats by exploring whether the medium of interviewing resulted in significantly different mean scores and variance for the in-person MMI and the iMMI. Acceptability of the process was informed by feedback surveys from interviewers and candidates, and cost savings were estimated.


No significant difference was found between the 2011 iMMI scores for international candidates and MMI scores in 2009 (p > 0.05). There was no significant difference between the MMI scores for local and international candidates in 2011 (p > 0.05); the MMI scores for international candidates had greater variation (p < 0.01). Using generalisability theory, the reliability of the nine-question iMMI was 0.76 and for the MMI was 0.70. Delivery of the iMMI occurred smoothly and candidates and interviewers gave positive feedback on its format and delivery. Cost savings have been estimated to be over AU$50 000, representing an 84% saving.


We believe this is the first study reporting an internet-based MMI for a high stakes interview. We have shown that interviewers were able to make valid and reliable decisions about candidates through the iMMI in a process that was acceptable to participants, producing comparable results to the in-person MMI with a saving of resources. The slightly wider variance in iMMI scores warrants further investigation.

© 2013 John Wiley & Sons Ltd.

[PubMed - in process]

How well medical school admissions criteria predict residency ranking four years later

Effectiveness of medical school admissions criteria in predicting residency ranking four years later

Christopher Peskun, Allan Detsky & Maureen Shandling


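Medical schools in Canada put great effort into selecting students from a large pool of qualified applicants. Most schools carry out non-cognitive assessments, which target personal characteristics important to the practice of medicine. We analysed the academic and non-academic admission assessments at the University of Toronto to see how well they predict ranking by Internal Medicine and Family Medicine residency programmes.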


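The sample comprised students who entered the University of Toronto between 1994 and 1998 and who, in their graduating year, applied through the Canadian resident matching programme to Internal Medicine or Family Medicine at the University of Toronto. We assessed how well admissions variables predicted performance in medical school and residency ranking.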


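Ranking in Internal Medicine correlated significantly with undergraduate grade point average and with the non-cognitive assessment at admission. It also correlated with second-year OSCE score, Internal Medicine clerkship grade and final grade in medical school.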

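Ranking in Family Medicine correlated with the admissions interview score, second-year OSCE score, Family Medicine clerkship grade, the Internal Medicine clerkship ward evaluation and final grade in medical school.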


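Both the cognitive and the non-cognitive factors assessed at admission were important in predicting success after graduation. The non-cognitive assessment showed its value as an admissions criterion by predicting ranking by residency programmes.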

KEYWORDS *school admission criteria; *schools, medical; Ontario; internship and residency ⁄ *standards; clinical competence ⁄ *standards; education, medical, undergraduate ⁄ *standards.


What is already known on this subject

Undergraduate GPA and MCAT scores are correlated with academic performance in medical school

What this study adds

Non-cognitive measures assessed at admission to medical school correlate with non-cognitive performance measures in medical school as well as overall residency ranking

Suggestions for further research

Methods for optimising cognitive and noncognitive evaluation of students at the time of medical school application to predict future success in Medicine

Statistical techniques

Using both univariate and multivariate techniques the analyses described above were completed. The numerical rankings of candidates applying to Internal Medicine were collapsed into 3 approximately equal categories: rank 1–40, 41–90 or >90. These categories were chosen based on the fact that those in the top category had a very high likelihood of acceptance, whereas those in the lowest category were certain not to be accepted. As a result of this categorical designation, a logistic regression model based on binary collapse was utilised during analysis.

Because there was less predictability in the matching process in Family Medicine, ranking was weighted equally from highest to lowest and was treated as a continuous variable. A linear regression model was therefore used in the analysis involving Family Medicine residency rank as the outcome variable.

The results are expressed in terms of slope of the regression line. As a result of these differences in data and statistical analysis, results for Internal Medicine are expressed throughout in terms of odds ratios, while those for Family Medicine are expressed in terms of slope. The units of the odds ratios for Internal Medicine results and the slopes for Family Medicine results were standardised for 1 standard deviation (SD) of the variable of interest. 

Through this standardisation, comparison of the results between both programmes is possible. A small proportion of candidates had applied for postgraduate training in both Internal Medicine and Family Medicine. Due to the blinded nature of the residency application process, these applicants had the ability to be ranked equally and independently by each residency programme and as such were considered to be independent members of both cohorts.

For all analyses involving the continuous outcome variables of OSCE score and final grade in medical school the results were expressed in terms of slope. The units of slopes in these cases were once again standardised for 1 SD of the variable of interest.
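
Standardising to 1 SD, as described above, is a simple rescaling of the fitted coefficients. This is a minimal sketch with hypothetical coefficients and SDs, assuming the logistic coefficient is on the log-odds scale per raw unit of the predictor; the function names are not from the paper.

```python
import math


def odds_ratio_per_sd(log_odds_per_unit, sd):
    """Odds ratio for a 1 SD increase in the predictor (logistic model)."""
    return math.exp(log_odds_per_unit * sd)


def slope_per_sd(slope_per_unit, sd):
    """Change in the outcome for a 1 SD increase in the predictor (linear model)."""
    return slope_per_unit * sd


# Hypothetical values, for illustration only.
print(round(odds_ratio_per_sd(0.5, 2.0), 3))  # Internal Medicine style result
print(round(slope_per_sd(-3.0, 0.4), 3))      # Family Medicine style result
```

Putting both effect measures on a per-SD footing is what lets predictors measured on different scales (GPA, interview score, OSCE score) be compared within each programme.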

Med Educ. 2007 Jan;41(1):57-64.

Effectiveness of medical school admissions criteria in predicting residency ranking four years later.


University of Toronto, Canada. chris.peskun@utoronto.ca



Medical schools across Canada expend great effort in selecting students from a large pool of qualified applicants. Non-cognitive assessments are conducted by most schools in an effort to ensure that medical students have the personal characteristics of importance in the practice of Medicine. We reviewed the ability of University of Toronto academic and non-academic admission assessments to predict ranking by Internal Medicine and Family Medicine residency programmes.


The study sample consisted of students who had entered the University of Toronto between 1994 and 1998 inclusive, and had then applied through the Canadian resident matching programme to positions in Family or Internal Medicine at the University of Toronto in their graduating year. The value of admissions variables in predicting medical school performance and residency ranking was assessed.


Ranking in Internal Medicine correlated significantly with undergraduate grade point average (GPA) and the admissions non-cognitive assessment. It also correlated with 2nd-year objective structured clinical examination (OSCE) score, clerkship grade in Internal Medicine, and final grade in medical school. Ranking in Family Medicine correlated with the admissions interview score. It also correlated with 2nd-year OSCE score, clerkship grade in Family Medicine, clerkship ward evaluation in Internal Medicine and final grade in medical school.


The results of this study suggest that cognitive as well as non-cognitive factors evaluated during medical school admission are important in predicting future success in Medicine. The non-cognitive assessment provides additional value to standard academic criteria in predicting ranking by 2 residency programmes, and justifies its use as part of the admissions process.


Comparing the progress of students admitted openly and students admitted by test

Progress of medical students after open admission or admission based on knowledge tests

Gilbert Reibnegger, Hans-Christian Caluba, Daniel Ithaler, Simone Manhal, Heide Maria Neges & Josef Smolle


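University admission in Austria is generally open to anyone who has completed secondary school, but a 2005 European Court decision made it legally possible to apply additional admission criteria in some fields, including medicine and dentistry. We examined how this change affected the temporal pattern of medical students' progress through their studies.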


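All 2532 students admitted to the human medicine programme at the Medical University of Graz between 2002/03 and 2007/08 were included. Non-parametric and semi-parametric survival analysis techniques were used to compare the time taken to complete the first two semesters before and after admission tests were introduced. We also examined the temporal pattern of dropout before that goal was reached. Sex, age and nationality were considered as potential confounders.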


The cumulative probability of study success was far higher in selected students than in openly admitted students. Only 20.1-26.4% of openly admitted students completed the first two semesters within one year, compared with 75.6-91.9% of those admitted by test.


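This analysis confirms that performance-based selection substantially improves study progress in medical school compared with open admission. The proportion of dropouts also fell markedly. Admission tests therefore save considerable costs, in both student time and public resources.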

Statistical methods

We decided to measure the effects of different modes of student admission on study progress by identifying the length of time that elapsed between a defined starting point (i.e. the start of term in the year of enrolment) and a defined endpoint (i.e. the time of successful completion of the first part of the study programme). 

When analysing such ‘waiting time’ data, the application of ordinary statistical methods is frequently hindered by the presence of so-called censored data: students differ substantially in their individual study progress; some of them succeed within the expected time, but others, for quite diverse reasons, need more time to achieve the same goal. 

In addition, a certain proportion of students ‘disappear’ from study: they change to another programme or university, or they drop out of study altogether. These individuals who do not reach the defined endpoint within the period of investigation contribute information to the study for a certain amount of time but not thereafter and are hence referred to as ‘censored’ observations. Commonly used multivariate techniques such as linear or logistic regression analysis cannot handle censored data and, thus, typically are not adequate for analysing waiting times properly. 

Consequently, for our investigation into data on study progress, we adopted statistical methods from the field of survival analysis.6 The non-parametric product limit technique by Kaplan and Meier7 was used to compute the cumulative probabilities for the study success of defined categories of students. 

In ‘normal’ survival studies, the results of Kaplan–Meier calculations are usually represented by depicting the cumulative probability of survival as a step function decreasing from 100% to smaller percentages, as observation time progresses. In such studies, the endpoint is usually a negative event, such as death. As the endpoint in our study is a desirable event, namely, the successful completion of the first part of the study programme, by contrast with conventional survival curves we decided to represent the results as ‘one minus survival’ curves (i.e. cumulative probabilities of success start at 0% and increase with observation time). 
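
The ‘one minus survival’ curve can be sketched directly from the product limit formula. This is a minimal illustration with invented data, not the authors' software: the function name is hypothetical, time is in arbitrary units, and an event means successful completion while a censored student left or was still studying when observation ended.

```python
def one_minus_survival(times, completed):
    """Kaplan-Meier cumulative probability of success.

    times: time to completion or to censoring for each student.
    completed: True if the student completed (event), False if censored.
    Returns a list of (time, cumulative probability of success) steps.
    """
    data = sorted(zip(times, completed))
    at_risk = len(data)
    survival = 1.0  # probability of NOT yet having completed
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [done for tt, done in data if tt == t]
        n_events = sum(tied)
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, 1 - survival))
        # Both completers and censored students leave the risk set.
        at_risk -= len(tied)
        i += len(tied)
    return curve


# Invented data: five students, two censored.
curve = one_minus_survival([2, 2, 3, 4, 5], [True, True, False, True, False])
print(curve)
```

Censored students stay in the denominator until the moment they disappear, which is how they contribute information for a certain amount of time but not thereafter.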

Differences of these cumulative probabilities among different categories were tested by the generalised likelihood ratio method (Breslow chi-squared statistic). The semi-parametric proportional hazards model of Cox8 was then employed in order to study the effect of potential predictor variables in a multivariate manner and to identify the relative strength of each individual predictor variable in the context of all other variables.

The ‘hazard’ can be described as the instantaneous probability that an individual will experience an event at time t while this individual is at risk for an event.

In survival analysis, we generally distinguish between non-parametric, semi-parametric and parametric methods.

For example, the Kaplan–Meier product limit method does not make any assumption about the underlying hazard function (‘baseline hazard’), but in a completely empirical way computes the cumulative probabilities of the terminating events merely from the data at hand: it is a non-parametric approach. 

Likewise, the Cox model does not make assumptions about the baseline hazard, but the effect of covariates is modelled in a parameterised fashion. The parameters are estimated from the data and allow quantification of the relative strength of the respective covariate (predictor variable). Therefore, the Cox technique is called a semi-parametric approach.

A parametric model provides an explicit mathematical model for the baseline hazard assuming one of several possible distribution models (exponential distribution, Weibull distribution, Gompertz distribution and others) with adjustable parameters and, if appropriate, allows not only the estimation of the relative strengths of predictor variables, but also the prediction of the cumulative probabilities as a function of time by means of an analytic expression.
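
For reference, the three ideas above can be written compactly. These are standard textbook forms rather than equations taken from the paper; the symbols (T for the time to the event, x for covariates, beta for their coefficients, lambda and k for distribution parameters) are notation assumed here, and the Weibull line uses one common parameterisation among several.

```latex
% Hazard: instantaneous event rate at time t among those still at risk
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}

% Cox model: unspecified baseline hazard h_0(t), parameterised covariate effects
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p)

% Parametric baselines: exponential (constant hazard) and Weibull
h_0(t) = \lambda, \qquad h_0(t) = \lambda k\, t^{k-1}
```

In the Cox form, exp(beta_j) is the hazard ratio for a one-unit increase in covariate x_j, which is what allows the relative strength of each predictor to be quantified.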

 2010 Feb;44(2):205-14. doi: 10.1111/j.1365-2923.2009.03576.x. Epub 2010 Jan 5.

Progress of medical students after open admission or admission based on knowledge tests.


Medical University of Graz, Graz, Austria. gilbert.reibnegger@medunigraz.at



Although admission to university in Austria is generally open for applicants who have successfully completed secondary school, in some areas of study, including human medicine and dentistry, the selection of students by additional criteria has become legally possible as a result of a decision by the European Court in 2005. We studied the impact of this important change on the temporal pattern of medical students' progress through the study programme.


All 2532 regular students admitted to the diploma programme in human medicine at the Medical University of Graz during the academic years 2002/03-2007/08 were included in the analysis. Non-parametric and semi-parametric survival analysis techniques were employed to compare the time required to complete the first two study semesters (first part of the curriculum) before and after the implementation of admission tests. Temporal patterns of dropout before this goal was achieved were also investigated. Sex, age and nationality of students were assessed as potential confounding variables.


The cumulative probability of study success was dramatically better in selected students versus those who were admitted openly (P < 0.0001). Whereas only 20.1-26.4% of openly admitted students completed the first two study semesters within the scheduled time of 1 year, this percentage rose to 75.6-91.9% for those selected by admission tests. Similarly, the cumulative probability for dropping out of study was also significantly lower in selected students (P < 0.0001). By univariate as well as multivariate techniques, student nationality, age and sex were also identified as partly significant, albeit weak, predictors.


The analysis convincingly demonstrates that, by contrast with open admission, performance-based selection of medical students significantly raises the probability of successful study progress. Additionally, the proportion of dropouts is significantly reduced. Thus, admission tests save considerable costs, in terms of both student time and public resources.


Multiple Mini-Interviews (MMI) predict performance in clinical clerkships and on the medical licensing examination

Multiple mini-interviews predict clerkship and licensing examination performance

Harold I Reiter,1 Kevin W Eva,2 Jack Rosenfeld3 & Geoffrey R Norman2


The Multiple Mini-Interview (MMI) has previously been shown to correlate positively with early medical school performance. The data have now matured enough to allow comparison with clerkship evaluations and licensing examination results.


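Of 117 applicants to the Michael G DeGroote School of Medicine at McMaster University, 45 were admitted and followed through clerkship evaluations and Part I of the MCCQE. MMI scores, traditional non-cognitive measures and undergraduate grade point average (uGPA) were available for all 117. Clerkship evaluations consisted of clerkship summary ratings, an OSCE and a progress test score. The MCCQE includes subsections relevant to medical specialties as well as a component covering broader legal and ethical issues (CLEO/PHELO).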


The MMI was a good predictor of OSCE scores, clerkship encounter cards and clerkship performance ratings. On the MCCQE Part I, the MMI predicted CLEO/PHELO scores and clinical decision-making (CDM) scores. None of these outcomes was predicted by the other non-cognitive measures or by uGPA; uGPA predicted progress test scores and the multiple-choice, specialty-specific subsections of the MCCQE Part I.


The MMI can complement existing pre-admission cognitive measures in predicting clerkship outcomes and scores on the Canadian national licensing examination.

KEYWORDS clinical clerkship ⁄ *standards; clinical competence ⁄ *standards; *licensure, medical; Ontario; school admission criteria; schools, medical


What is already known on this subject

The Multiple Mini-Interview (MMI) is a feasible, acceptable and reliable admissions protocol that is predictive of pre-clerkship OSCE performance.

What this study adds

This study provides further support for the MMI as a measure of non-cognitive (i.e. personal) qualities in medical school applicants. Relative to other admissions tools, the MMI was the best predictor of intramural clinical performance ratings and ethical ⁄ clinical decision-making scores on the Canadian national licensing examination.

Suggestions for further research

The relationship between MMI scores and scores on the OSCE-based component of the Canadian licensing examination has yet to be considered, as does the adequacy of the MMI for making postgraduate admissions decisions.

1 Does the MMI predict clinical clerkship performance?

2 Does the MMI predict national licensing examination performance?

3 How does the predictive validity of the MMI compare with that of more traditional admission measures of professional qualities?

4 How does the predictive validity of the MMI compare with that of the uGPA?


We examined the correlation between admissions measures, in-course measures, and the MCCQE Part I using Pearson’s correlation coefficients.

Regression analyses were then performed to determine which admissions tools were statistically predictive of each outcome when scores on the other admissions tools were taken into account (i.e. to determine the independent predictability of each admissions tool).
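The two analysis steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the scores are made up, the function names are hypothetical, and the two-predictor formula for standardised regression weights is the textbook one rather than anything taken from the paper. It shows how a predictor's independent contribution can differ from its simple correlation once a second admissions tool is taken into account.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def standardized_betas(r_y1, r_y2, r_12):
    """Standardised regression weights for two predictors.

    r_y1, r_y2: correlation of each predictor with the outcome
    r_12:       correlation between the two predictors
    """
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Hypothetical scores for five students (not data from the study)
mmi = [4.0, 5.0, 6.0, 7.0, 8.0]
ugpa = [3.9, 3.5, 3.8, 3.6, 3.7]
osce = [60.0, 66.0, 65.0, 72.0, 75.0]

r_mmi = pearson_r(mmi, osce)
r_gpa = pearson_r(ugpa, osce)
r_between = pearson_r(mmi, ugpa)
beta_mmi, beta_gpa = standardized_betas(r_mmi, r_gpa, r_between)
print(f"r(MMI, OSCE) = {r_mmi:.2f}, r(uGPA, OSCE) = {r_gpa:.2f}")
print(f"standardised weights: MMI = {beta_mmi:.2f}, uGPA = {beta_gpa:.2f}")
```

When the two predictors are uncorrelated, the standardised weights equal the simple correlations; the more the predictors overlap, the more the weights diverge from them.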

 2007 Apr;41(4):378-84.

Multiple mini-interviews predict clerkship and licensing examination performance.


Department of Oncology, McMaster University, Hamilton, Ontario, Canada.



The Multiple Mini-Interview (MMI) has previously been shown to have a positive correlation with early medical school performance. Data have matured to allow comparison with clerkship evaluations and national licensing examinations.


Of 117 applicants to the Michael G DeGroote School of Medicine at McMaster University who had scores on the MMI, traditional non-cognitive measures, and undergraduate grade point average (uGPA), 45 were admitted and followed through clerkship evaluations and Part I of the Medical Council of Canada Qualifying Examination (MCCQE). Clerkship evaluations consisted of clerkship summary ratings, a clerkship objective structured clinical examination (OSCE), and progress test score (a 180-item, multiple-choice test). The MCCQE includes subsections relevant to medical specialties and relevant to broader legal and ethical issues (Population Health and the Considerations of the Legal, Ethical and Organisational Aspects of Medicine [CLEO/PHELO]).


In-programme, MMI was the best predictor of OSCE performance, clerkship encounter cards, and clerkship performance ratings. On the MCCQE Part I, MMI significantly predicted CLEO/PHELO scores and clinical decision-making (CDM) scores. None of these assessments were predicted by other non-cognitive admissions measures or uGPA. Only uGPA predicted progress test scores and the MCQ-based specialty-specific subsections of the MCCQE Part I.


The MMI complements pre-admission cognitive measures to predict performance outcomes during clerkship and on the Canadian national licensing examination.






Adjusting medical school admission: using situational judgement tests to assess interpersonal skills

Adjusting medical school admission: assessing interpersonal skills using situational judgement tests

Filip Lievens


Although medical school curricula demand both cognitive and non-cognitive abilities, current admission systems often include only cognitively oriented tests. Situational judgement tests (SJTs) may offer a new approach to assessing interpersonal skills in medical school selection. This study examined the validity of a video-based SJT for assessing interpersonal skills by relating it to a variety of outcome measures.


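The study used a longitudinal, multiple-cohort design to examine anonymised admission examination results and medical school outcomes. It drew on data from the Flemish medical school admission examination between 1999 and 2002, covering the 5444 candidates who sat the examination. Outcome measures were first-year GPA, GPA in interpersonal communication courses, GPA in non-interpersonal courses, Bachelor's degree GPA, Master's degree GPA and final-year GPA (after 7 years). For students pursuing general practice, additional outcomes collected 9 years after the examination were supervisor ratings, an OSCE, a general practice knowledge test and a case-based interview.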


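Interpersonal skills as assessed by the SJT added significantly to the prediction of GPA in interpersonal communication courses, doctor performance, OSCE performance and case-based interview scores. For the other outcomes, cognitive tests were the better predictors. Female candidates significantly outperformed male candidates on the SJT (d = -0.26). The interpersonal SJT was perceived as significantly more job-related than the cognitive tests (d = 0.55).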


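Video-based SJTs can serve as measures of procedural knowledge about interpersonal behaviour and show promise as a complement to cognitive test scores. The fact that communication skills can be trained during medical school does not remove the need to consider them at selection. Further research is needed on the use of SJTs in other cultures and student populations.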

 2013 Feb;47(2):182-9. doi: 10.1111/medu.12089.

Adjusting medical school admission: assessing interpersonal skills using situational judgement tests.


Department of Personnel Management and Work and Organisational Psychology, Ghent University, Ghent, Belgium. filip.lievens@ugent.be



Today's formal medical school admission systems often include only cognitively oriented tests, although most medical school curricula emphasise both cognitive and non-cognitive factors. Situational judgement tests (SJTs) may represent an innovative approach to the formal measurement of interpersonal skills in large groups of candidates in medical school admission processes. This study examined the validity of interpersonal video-based SJTs in relation to a variety of outcome measures.


This study used a longitudinal and multiple-cohort design to examine anonymised medical school admissions and medical education data. It focused on data for the Flemish medical school admission examination between 1999 and 2002. Participants were 5444 candidates taking the medical school admission examination. Outcome measures were first-year grade point average (GPA), GPA in interpersonal communication courses, GPA in non-interpersonal courses, Bachelor's degree GPA, Master's degree GPA and final-year GPA (after 7 years). For students pursuing careers in general practice, additional outcome measures (9 years after sitting examinations) included supervisor ratings and the results of an interpersonal objective structured clinical examination (OSCE), a general practice knowledge test and a case-based interview.


Interpersonal skills assessment carried out using SJTs had significant added value over cognitive tests for predicting interpersonal GPA throughout the curriculum, doctor performance, and performance on an OSCE and in a case-based interview. For the other outcomes, cognitive tests emerged as the better predictors. Females significantly outperformed males on the SJT (d = -0.26). The interpersonal SJT was perceived as significantly more job-related than the cognitive tests (d = 0.55).


Video-based SJTs as measures of procedural knowledge about interpersonal behaviour show promise as complements to cognitive examination components. The interpersonal skills training received during medical education does not negate the selection of students on the basis of interpersonal skills. Future research is needed to examine the use of SJTs in other cultures and student populations.

© Blackwell Publishing Ltd 2013.


Admission criteria and diversity in medical schools

Admission criteria and diversity in medical school

Lotte O’Neill,1 Maria C Vonsild,2 Birgitta Wallstedt2 & Tim Dornan3


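The under-representation of students from lower socio-economic backgrounds in medical schools is an important social issue. So far there is little evidence that changing admission strategies increases the diversity of the medical student population. Denmark introduced an 'attribute-based' admission track so that students who would struggle to gain admission on the 'grade-based' track could be admitted on the basis of attributes other than academic performance. The aim of this study was to examine differences in the social composition of students admitted through each track.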


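This prospective cohort study included 1074 students admitted between 2002 and 2007. Of these, 454 entered through the grade-based track and 620 through the attribute-based track. To assess the social mix within each track, information was collected on social indices known to be associated with educational attainment in Denmark (ethnic origin, father's education, mother's education, parenthood, parents living together, parent in receipt of social benefits).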


Selection strategy made no statistically significant difference to the social composition of the medical student population.


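The choice of admission track did not matter much for widening access to medical school or increasing social diversity. Securing a diverse applicant pool appears to be a better strategy for increasing diversity in the student body.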

 2013 Jun;47(6):557-61. doi: 10.1111/medu.12140.

Admission criteria and diversity in medical school.


Centre of Medical Education, Aarhus University, Aarhus, Denmark. lotte@medu.au.dk



The under-representation in medical education of students from lower socio-economic backgrounds is an important social issue. There is currently little evidence about whether changes in admission strategies might increase the diversity of the medical student population. Denmark introduced an 'attribute-based' admission track to make it easier for students who may not be eligible for admission on the 'grade-based' track to be admitted on the basis of attributes other than academic performance. The aim of this research was to examine whether there were significant differences in the social composition of student cohorts admitted via each of the two tracks during the years 2002-2007.


This prospective cohort study included 1074 medical students admitted during 2002-2007 to the University of Southern Denmark medical school. Of these, 454 were admitted by grade-based selection and 620 were selected on attributes other than grades. To explore the social mix of candidates admitted on each of the two tracks, respectively, we obtained information on social indices associated with educational attainment in Denmark (ethnic origin, father's education, mother's education, parenthood, parents living together, parent in receipt of social benefits).


Selection strategy (grade-based or attribute-based) had no statistically significant effect on the social diversity of the medical student population.


The choice of admission criteria may not be very important to widening access and increasing social diversity in medical schools. Attracting a sufficiently diverse applicant pool may represent a better strategy for increasing diversity in the student population.

© 2013 John Wiley & Sons Ltd.






Residency selection strategies and doctor performance: a meta-analysis

Associations between residency selection strategies and doctor performance: a meta-analysis

Stephanie Kenny, Matthew McInnes & Vivek Singh


The purpose of this study was to use meta-analysis to identify which of the information available at resident selection is associated with performance as a resident or doctor.


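Multiple electronic databases were searched. Two reviewers independently selected studies that met the inclusion criteria and extracted data in duplicate, resolving disagreements by consensus. Risk of bias was assessed with a customised bias assessment tool. Measures of association were converted to a common effect size (Hedges' g). Meta-analysis was performed with a random-effects model for each selection strategy and all outcomes without pooling. A sensitivity analysis with pooled effect sizes was performed for each selection strategy-outcome pair.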


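Eighty studies covering a total of 41 704 participants were included in the meta-analysis. Seventeen different selection strategies and 17 outcomes were assessed. The strongest positive associations were between examination-based selection strategies, such as USMLE Step 1, and examination-based outcomes, such as in-training examination scores. Medical school marks showed moderate positive associations with both examination-based and subjective outcomes. Selection tools such as interviews and reference letters showed minimal or no association.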


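Standardised examination scores and medical school grades showed the strongest associations with current measures of doctor performance. Reference letters and interviews showed weak associations. Under current evaluation methods, objective selection strategies appear to be the most useful. However, studies of long-term doctor performance are scarce.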

 2013 Aug;47(8):790-800. doi: 10.1111/medu.12234.

Associations between residency selection strategies and doctor performance: a meta-analysis.


Department of Medical Imaging, Ottawa Hospital, University of Ottawa, Ottawa, Ontario, Canada.



The purpose of this study was to use meta-analysis to establish which of the information available to the resident selection committee is associated with resident or doctor performance.


Multiple electronic databases were searched to 4 September 2012. Two reviewers independently selected studies that met the present inclusion criteria and extracted data in duplicate; disagreement was resolved by consensus. Risk for bias was assessed using a customised bias assessment tool. Measures of association were converted to a common effect size (Hedges' g). Meta-analysis was performed using the random-effects model for each selection strategy and all outcomes without pooling. Sensitivity analysis for each selection strategy-outcome pair was performed with pooling of effect size.
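To make the two computational steps in this paragraph concrete, here is a minimal Python sketch of (a) a standardised mean difference with Hedges' small-sample correction and (b) random-effects pooling. The abstract does not say which random-effects estimator was used; the DerSimonian-Laird estimator below is an assumption chosen because it is the most common one. The function names and all numbers are hypothetical.

```python
import math


def hedges_g(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    """Standardised mean difference with Hedges' small-sample correction.

    Returns (g, approximate sampling variance of g).
    """
    df = n_1 + n_2 - 2
    pooled_sd = math.sqrt(((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df)
    d = (mean_1 - mean_2) / pooled_sd
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # small-sample correction factor
    var_d = (n_1 + n_2) / (n_1 * n_2) + d ** 2 / (2.0 * (n_1 + n_2))
    return j * d, j ** 2 * var_d


def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling (assumed estimator).

    Returns (pooled effect, standard error, between-study variance tau^2).
    """
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    tau2 = 0.0
    if len(effects) > 1:
        # Cochran's Q measures heterogeneity around the fixed-effect mean
        q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


# Hypothetical effect sizes (g) and variances from three studies
pooled, se, tau2 = random_effects_pool([0.60, 0.35, 0.80], [0.04, 0.02, 0.09])
print(f"pooled g = {pooled:.2f} (SE {se:.2f}), tau^2 = {tau2:.3f}")
```

Under a random-effects model each study's weight is 1 / (its own variance + tau^2), so large studies dominate less than they would under a fixed-effect model when the studies disagree.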


Eighty studies involving a total of 41 704 participants were included in the meta-analysis. Seventeen different selection strategies and 17 outcomes were assessed across these studies. The strongest positive associations referred to examination-based selection strategies, such as the US Medical Licensing Examination (USMLE) Step 1, and examination-based outcomes, such as scores on in-training examinations. Moderate positive associations were present for medical school marks and both examination-based and subjective outcomes. Minimal or no associations were seen for the selection tools represented by interviews, reference letters and deans' letters.


Standardised examination performance and medical school grades show the strongest associations with current measures of doctor performance. Deans' letters, reference letters and interviews all show a lower than expected strength of association given the relative value often assigned to them during resident doctor selection. Objective selection strategies are potentially the most useful to residency selection committees based on current evaluative methods. However, reports in the literature of validated long-term doctor performance outcomes are scant.

© 2013 John Wiley & Sons Ltd.

(Source: http://admissions.berkeley.edu/selectsstudents)

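Over the past decade, individual medical schools, supported by the AAMC,

have been working to broaden the frame of reference used to evaluate medical school applicants.

These efforts have converged under the name "holistic review", an approach endorsed by the U.S. Supreme Court in 2003:

"a highly individualized, holistic review of each applicant's file,

giving serious consideration to all the ways an applicant might contribute to a diverse educational environment."

Under this approach, a medical school

"seriously considers each applicant's promise of making a notable contribution by way of a particular strength, attainment, or characteristic."

Each factor, such as undergraduate GPA, MCAT score, or leadership roles in volunteer service organizations,

is evaluated in the context of the applicant's complete portfolio of information.

BUSM, which began its transition to holistic review in 2003, uses structured interviewing,

training of faculty and staff, and systematic analysis of data

to minimize the influence of conscious and unconscious bias.

Students selected this way were culturally, linguistically, racially, and ethnically more diverse than previous classes,

and, judged by GPA and MCAT scores, they were at least as well prepared academically.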

Modern medicine has been characterized by rapid and accelerating progress in biomedical sciences as the foundation for clinical practice. In 1910, the Flexner Report established these sciences as the core of medical education.1 

Admissions committees at U.S. medical schools have, for the past century, focused their attention largely on predictors of success in the foundational science curriculum, relying heavily on academic performance in the biologic and physical sciences and scores on the Medical College Admission Test (MCAT) in selecting applicants for medical school.

Over the past decade, individual medical schools, supported by the Association of American Medical Colleges (AAMC), have been working to expand the frame of reference for evaluating applicants for medical school. These efforts have come together under the “holistic review” rubric endorsed by the U.S. Supreme Court in 2003: “highly individualized, holistic review of each applicant’s file, giving serious consideration to all the ways an applicant might contribute to a diverse educational environment.” Under such an approach, a school “seriously considers each ‘applicant’s promise of making a notable contribution to the class by way of a particular strength, attainment, or characteristic — e.g., an unusual intellectual achievement, employment experience, nonacademic performance, or personal background.’”3

The AAMC Holistic Review Project has defined holistic review in medical school admissions as “a flexible, individualized way of assessing an applicant’s capabilities by which balanced consideration is given to experiences, attributes, and academic metrics . . . and, when considered in combination, how the individual might contribute value as a medical student and future physician.” 4

Each factor, be it the undergraduate grade-point average (GPA), the MCAT score, or the leadership roles assumed in volunteer service organizations, is evaluated in the context of the complete portfolio of information available about the applicant. That is, a given level of accomplishment for one applicant may look very different in the context of another applicant with a different life story.

In 2003, the Boston University School of Medicine (BUSM) became one of a number of U.S. medical schools to launch a systematic transition from a traditional admissions model based largely on the review of academic metrics to a comprehensive, holistic review process. It was a slow and deliberative transition, but by 2008, changes in the BUSM admissions program were clear and substantial, and the effects were evident in the entering class of 2009.

The table shows one such tool: a list of desirable traits for physicians matched with the elements of applicant data that reveal or predict those traits. Direct measures of these traits are often unavailable, so proxies are used. Holistic review is an information-hungry process;

The BUSM program uses structured interviewing, rigorous training of participating faculty and staff, and systematic evaluation of data elements, all of which minimize the influence of conscious and unconscious bias. 

Since BUSM became engaged in holistic review, the profile of its entering class has changed dramatically.

Students are culturally, linguistically, racially, ethnically, and demographically more diverse than previous classes, and according to the standard measures of undergraduate GPA and MCAT score, they are at least as well prepared academically.

 2013 Apr 25;368(17):1565-7. doi: 10.1056/NEJMp1300411. Epub 2013 Apr 10.

Holistic review--shaping the medical profession one applicant at a time.


Boston University School of Medicine, Boston, USA.


(Source: http://www.nytimes.com/2011/07/18/opinion/l18docs.html?_r=0)


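It is well known that, alongside academic achievement, non-cognitive (or non-academic) attributes matter in medical school selection. The multiple mini-interview (MMI) was developed at McMaster University in 2004, and Dundee has since used it as its primary pre-admissions measure of these attributes.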

It is widely accepted that so-called ‘non-cognitive’ or ‘non-academic’ attributes (such as interpersonal skills and moral reasoning) are important for medical school selection in addition to academic achievement.1 Developed and introduced at McMaster in 2004, Dundee has since adopted the multiple mini-interview (MMI) as the primary pre-admissions measure for this purpose. Other schools in the UK are increasingly following suit.

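In an MMI, candidates rotate through a series of stations in an OSCE-like format, so that a broad range of personal characteristics can be assessed from multiple snapshots of behaviour. This type of interview was first introduced by Eva et al. to meet the need for an interview process with robust psychometric properties.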

MMIs aim to assess a broad array of candidates’ personal characteristics through ratings from multiple snapshots of behaviour in an objective structured clinical examination (OSCE)-like rotational approach. This type of interview was first introduced by Eva et al.2 because of the need for an interview process with robust psychometric properties, unlike most traditional interviews.

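By sampling more content with multiple independent interviewers, the MMI has been shown to give a more accurate picture of a candidate's behaviour. With supporting evidence from the USA, Australia and the UK, MMIs continue to spread worldwide.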

By testing a larger content sample with multiple independent interviewers, MMIs have demonstrated that they can offer a more accurate picture of a candidate’s behaviour.3 With compelling evidence on reliability and other satisfactory psychometric properties from the USA, Australia and the UK,2,4–8 MMIs continue to be adopted across medical and dental schools worldwide.

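Attention has now turned to whether the MMI predicts performance in medical school and beyond. A number of studies have shown that MMI results have statistically significant and practically relevant relationships with future performance.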

Attention has now shifted to the ability of MMIs to predict performance in medical school and beyond. A number of studies have demonstrated that they show statistically significant and practically relevant relationships with future performance.9–11

Although these studies demonstrated predictive validity, most of them were based on the same small Canadian cohort, which is a limitation.

Although these studies have successfully demonstrated predictive validity, it is clear that more research is needed as the majority of this work was based on the same small Canadian cohort.

The predictive validity of the MMI therefore needs to be tested in larger cohorts and outside North America.

Therefore, the body of evidence examining the predictive validity of MMIs would benefit from an analysis of different and larger cohorts and from outside of North America.

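It is certainly worthwhile to examine the usefulness of the MMI, but the same expectation should apply to every admissions measure. It is therefore important to examine how well the MMI predicts performance relative to other pre-admissions measures.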

Although it is certainly beneficial to consider the usefulness of MMIs, the same expectation should be set for all admissions measures.9 It is therefore important to consider the predictive ability of MMIs relative to other pre-admissions measures.

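Ferguson et al. and Siu and Reiter found a lack of evidence that personal statements or references have any value in predicting success in medical school. Wright and Bradley found that scores derived from the personal statement not only failed to predict medical school examination performance but were also biased in favour of applicants from more advantaged socio-economic backgrounds.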

Ferguson et al.13 and Siu and Reiter14 reviewed predictors of success in medical school and found that there was a lack of evidence that personal statements or references have any predictive value in subsequent achievement. Wright and Bradley15 also found that not only did scores derived from the personal statement fail to predict medical school examination performance, but they were also biased towards those from more advantaged socio-economic backgrounds.

The UKCAT is an intelligence test designed to assess a range of mental abilities that medical and dental schools have identified as important.

The UKCAT (http://www.ukcat.ac.uk) is an intelligence test used to ‘assesses a range of mental abilities identified by university medical and dental schools as important’.16

The literature suggests that the MCAT, GAMSAT and BMAT each predict future performance to some extent, but comparable results have not yet been reported for the UKCAT.

Though the literature suggests that the MCAT, GAMSAT and BMAT each have some success at predicting future performance, this level of success has not been replicated so far with the UKCAT.

Statistical Analysis

Correlations were adjusted for range restriction and are referred to in this study as ‘unrestricted’ correlations. Statistical significance was determined prior to correcting the correlations. This adjustment is common in predictive validity studies and is carried out to counter correlation underestimates when the observed sample is not representative of the population of interest.21,22
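The excerpt does not state which correction formula was applied. One widely used option for direct range restriction is Thorndike's Case II formula, sketched below purely as an assumption. The function name and the example numbers are hypothetical; `sd_unrestricted` stands for the predictor's standard deviation among all applicants and `sd_restricted` for its standard deviation among the admitted students.

```python
import math


def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Thorndike Case II correction for direct range restriction.

    r_restricted:    correlation observed in the selected (admitted) sample
    sd_unrestricted: predictor SD in the full applicant pool
    sd_restricted:   predictor SD in the selected sample
    """
    u = sd_unrestricted / sd_restricted
    return (r_restricted * u) / math.sqrt(1.0 + r_restricted ** 2 * (u ** 2 - 1.0))


# Hypothetical example: admitted students vary less on the predictor than
# applicants do, so the observed correlation understates the relationship.
observed = 0.30
corrected = correct_for_range_restriction(observed, sd_unrestricted=10.0, sd_restricted=6.0)
print(f"observed r = {observed:.2f}, unrestricted estimate = {corrected:.2f}")
```

When the two standard deviations are equal the formula returns the observed correlation unchanged; the narrower the selected sample, the larger the upward adjustment.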

The strengths of correlations were compared using Cohen’s effect size interpretations (small 0.10, medium 0.30, large 0.50)24 and the US Department of Labour, Employment Training and Administration’s guidelines for interpreting correlation coefficients in predictive validity studies (‘unlikely to be useful’ < 0.11; ‘dependent on circumstances’, 0.11–0.20; ‘likely to be useful’ 0.21– 0.35; ‘very beneficial’ > 0.35).25
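The two interpretation schemes quoted above amount to simple threshold lookups, which the sketch below spells out. The function names are hypothetical, the 'negligible' label for values below Cohen's 'small' benchmark is an addition for completeness rather than part of the quoted text, and coefficients are rounded to two decimals because the quoted bands (0.11-0.20, 0.21-0.35) are stated at that precision.

```python
def cohen_label(r):
    """Cohen's benchmarks for a correlation (small 0.10, medium 0.30, large 0.50)."""
    size = round(abs(r), 2)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # label assumed here; not part of the quoted scheme


def validity_guideline_label(r):
    """Bands quoted in the text for interpreting predictive validity coefficients."""
    size = round(abs(r), 2)
    if size > 0.35:
        return "very beneficial"
    if size >= 0.21:
        return "likely to be useful"
    if size >= 0.11:
        return "dependent on circumstances"
    return "unlikely to be useful"


# Range endpoints reported in the Dundee abstract (observed and unrestricted)
for r in (0.18, 0.34, 0.23, 0.50):
    print(r, cohen_label(r), "/", validity_guideline_label(r))
```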


This study provides important evidence for the validity of the MMI by showing that it was the most consistent predictor of success in medical school examinations across two separate cohorts.

This study does provide important evidence of the validity of the MMIs by demonstrating that it was the most consistent predictor of success in medical school examinations across two separate cohorts and years.

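Although these correlations are moderate in size, it has been argued that even modest predictive validity can add considerable value to a selection system in which applicants greatly outnumber places and sound selection decisions matter. After adjustment for range restriction, the coefficients fall into the 'likely to be useful' or 'very beneficial' categories. Correlations were largest for OSCE assessments, perhaps because both assess common components such as communication skills or the ability to perform under pressure.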

Although the size of these correlations can be described as moderate, it has been asserted that measures with even modest predictive validity could add considerable value to selection systems where the ratio of applicants to places is large and the importance of sound selection decisions is high.28 After adjusting for range restriction, these coefficients can be described as ‘likely to be useful’ or ‘very beneficial’.25 Correlations were largest in OSCE assessments, perhaps because certain components are common in both, such as communication skills, or even more generally an ability to ‘perform under pressure’.

 2013 Jul;47(7):717-25. doi: 10.1111/medu.12193.

Predictive validity of the Dundee multiple mini-interview.


Division of Clinical and Population Sciences and Education, University of Dundee, Dundee, UK.



The multiple mini-interview (MMI) is the primary admissions tool used to assess non-cognitive skills at Dundee Medical School. Although the MMI shows promise, more research is required to demonstrate its transferability and predictive validity, for instance, relative to other UK pre-admissions measures.


Applicants were selected for interview based on a combination of measures derived from the Universities and Colleges Admissions Service (UCAS) form (academic achievement, medical experience, non-academic achievement and references) and the UK Clinical Aptitude Test (UKCAT) in 2009 and 2010. Candidates were selected into medical school according to a weighted combination of the UKCAT, the UCAS form and MMI scores. Examination scores were matched for 140 and 128 first- and second-year students, respectively, who took the 2009 MMIs, and 150 first-year students who took the 2010 MMIs. Pearson's correlations were used to test the relationships between pre-admission variables, examination scores and demographic variables, namely gender and age. Statistically significant correlations were adjusted for range restrictions and were used to select variables for multiple linear regression analysis to predict examination scores.


Statistically significant correlations ranged from 0.18 to 0.34 and 0.23 to 0.50 unrestricted. Multiple regression confirmed that MMIs remained the most consistent predictor of medical school assessments. No scores derived from the UCAS form correlated significantly with examination scores.


This study reports positive findings from the largest undergraduate sample to date. The MMI was the most consistent predictor of success in early years at medical school across two separate cohorts. UKCAT and UCAS forms showed minimal or no predictive ability. Further research in this area appears worthwhile, with longitudinal studies, replication of results from other medical schools and more detailed analysis of knowledge, skills and attitudinal outcome markers.

© 2013 John Wiley & Sons Ltd.
