Factors affecting the utility of the multiple mini-interview in selecting candidates for graduate-entry medical school

Chris Roberts,1 Merrilyn Walton,1 Imogene Rothnie,1 Jim Crossley,2 Patricia Lyon,1 Koshila Kumar1 & David Tiller3




Introduction

Worldwide, medical schools aim to select the best students into their programmes and consequently expect to produce good doctors. Low attrition rates mean, however, that, once admitted, most students graduate as doctors regardless of their personal and professional characteristics [1]. Selection procedures are arguably the most high-stakes, highly stressful and resource-intensive of all medical school assessments. They generally include some measure of academic ability (the ‘marks’) and some measure of a candidate’s personability as assessed in an interview or letter [2]. Student ‘marks’ reflect past performance over a number of years of previous education and are consistently the best predictor of future performance, whether at medical school or, for example, in North American licensing examinations [2].


It takes more than good marks to make a good doctor, and most schools attempt to assess values, commitment and other important non-cognitive characteristics of candidates in some form of interview. However, the interview is of limited value in predicting anything about future performance, either as a student or as a doctor [2,3], which undermines its fairness as an important part of admissions procedures [3]. Psychometric studies of interviews have produced highly variable results, largely because of differing definitions of reliability and differing research methodologies, and only a few studies have used a generalisability approach [3–6].


The multiple mini-interview (MMI) [5] is a relatively new assessment tool which addresses concerns about interview reliability. It uses the objective structured clinical examination (OSCE) format, and so avoids the issues of the long interview (cf. the long case in clinical competence), where much of the observed mark of the candidate relates to biases from the limited interview content and the interviewer panel [3]. It challenges the notion that the interview can test ‘stable qualities within candidates that have a high probability of occurring in an infinite range of contexts’ [1] by confirming the issue of context specificity [5,6]. Because the MMI tests a larger sample of both content and independent interviewers than a single interview can, more reliable generalisations about a candidate’s behaviour can be made.


Pilots of MMIs conducted outside the original centre have developed their own frameworks to establish the preferred non-cognitive characteristics of entry-level students [7]. In our study we assumed that we were measuring candidate behaviours that have been variously linked to frameworks of professionalism which cut across the medical education continuum [8]. These behaviours have been called pre-professionalism, to reflect the potential of entry-level students for professionalism. The McMaster University (Hamilton, ON, Canada) MMI also claims predictive validity, in that it correlates well with future performance as a medical student [9] and as a doctor [10].


The aim of our study was to establish whether interviewers can make reliable and valid decisions about applicants when selecting candidates for entry to a graduate-entry medical programme, using a pre-professionalism framework and the MMI format. Secondly, we wanted to know which features of the MMI were most useful in guiding admissions committees to focus their resources in making robust decisions about candidates.



Methods  

Data came from a high-stakes admissions procedure. Content validity was assured by using a framework based on international criteria for sampling the behaviours expected of entry-level students. A variance components analysis was used to estimate the reliability and sources of measurement error. Further modelling was used to estimate the optimal configurations for future MMI iterations.



Results  

This study refers to 485 candidates, 155 interviewers and 21 questions taken from a pre-prepared bank. For a single MMI question and 1 assessor, 22% of the variance between scores reflected candidate-to-candidate variation. The reliability for an 8-question MMI was 0.7; to achieve 0.8 would require 14 questions. Typical inter-question correlations ranged from 0.08 to 0.38. A disattenuated correlation with the Graduate Australian Medical School Admissions Test (GAMSAT) subsection ‘Reasoning in Humanities and Social Sciences’ was 0.26.
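The link between the single-question figure and the 8- and 14-question reliabilities can be approximated with the Spearman-Brown projection. The sketch below is illustrative only. It assumes a simplified one-facet design in which the reported 22% is the candidate share of score variance for one question with one assessor, and the remaining 78% is treated as undifferentiated error. The authors' variance components model is more detailed, so the function names and this simplification are assumptions rather than the paper's method.

```python
def projected_reliability(candidate_share: float, n_questions: int) -> float:
    """Spearman-Brown projection of reliability for an n-question MMI.

    candidate_share: proportion of single-question score variance that is
    due to true differences between candidates (0.22 in the Results above).
    Assumption: all remaining variance is error that averages out across
    questions, with one assessor per question.
    """
    p = candidate_share
    return n_questions * p / (n_questions * p + (1.0 - p))


def questions_for_target(candidate_share: float, target: float) -> float:
    """Number of questions (not rounded) needed to reach a target reliability."""
    p = candidate_share
    return target * (1.0 - p) / (p * (1.0 - target))


print(f"{projected_reliability(0.22, 8):.2f}")    # 0.69
print(f"{projected_reliability(0.22, 14):.2f}")   # 0.80
print(f"{questions_for_target(0.22, 0.8):.1f}")   # 14.2
```

Under these assumptions the projection gives about 0.69 for 8 questions and 0.80 for 14, which agrees with the reported values after rounding. The small gap between 14.2 and the reported 14 questions is expected, since the 22% figure is itself rounded.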










Conclusions

In a high-stakes admissions procedure performed outside the original centre, on a large sample, using generalisability theory:

    • We confirmed that the MMI is a moderately reliable method of assessment.
    • We established the construct validity of the MMI by showing a small positive correlation with GAMSAT section scores for ‘Reasoning in Humanities and Social Sciences’ and ‘Written Communication’.
    • The largest source of identifiable measurement error relates to aspects of interviewer subjectivity, suggesting that further training of interviewers would be beneficial.
    • Applicant performance on one question did not correlate strongly with performance on another question, demonstrating the importance of context specificity when testing professional behaviours.
    • Because of the size of the measurement error, multiple mini-interviews must have a sufficient number of questions to allow precise comparison of candidates for ranking purposes.
    • We demonstrated that a significant proportion of students with high GPAs and GAMSAT scores can fail an MMI.


Further research is required into the construct and predictive validity of the MMI in order to justify its long-term use, and to establish the impact of training on measurement error through careful experimental design.




 2008 Apr;42(4):396-404. doi: 10.1111/j.1365-2923.2008.03018.x.

Abstract

CONTEXT:

We wished to determine which factors are important in ensuring that interviewers are able to make reliable and valid decisions about the non-cognitive characteristics of candidates when selecting them for entry into a graduate-entry medical programme using the multiple mini-interview (MMI).

CONCLUSIONS:

The MMI is a moderately reliable method of assessment. The largest source of error relates to aspects of interviewer subjectivity, suggesting interviewer training would be beneficial. Candidate performance on 1 question does not correlate strongly with performance on another question, demonstrating the importance of context specificity. The MMI needs to be sufficiently long for precise comparison for ranking purposes. We supported the validity of the MMI by showing a small positive correlation with GAMSAT section scores.

PMID: 18338992 [PubMed - indexed for MEDLINE]

