How effective is each selection method? A systematic review (Med Educ, 2016)
How effective are selection methods in medical education? A systematic review
Fiona Patterson,1 Alec Knight,2 Jon Dowell,3 Sandra Nicholson,4 Fran Cousans2 & Jennifer Cleland5
INTRODUCTION
In practice, selection in medical education is often driven by political considerations and key stakeholders. These influences can produce resistance to any move away from 'traditional' measures, even where there is compelling evidence for the change, and they make evidence-based selection difficult. However, Kreiter and Axelson's non-systematic review of the last 25 years notes that effective educational interventions yield learning gains with effect sizes below 0.20, whereas evidence-based selection is far more powerful: well-designed selection tools achieve gains of more than 1 SD.
Indeed, selection for medical education internationally is frequently driven by political considerations and the preferences of key stakeholders.1 Such influences may result in resistance against any move away from ‘traditional’ measures despite compelling evidence to do so, often to the detriment of evidence-based selection practices. However, Kreiter and Axelson’s2 non-systematic review of medical admissions research and practice in the last 25 years noted that effective educational interventions typically produce only small gains in learning (effect sizes generally below 0.20), whereas evidence-based selection is comparatively far more powerful, with well-designed selection tools achieving performance gains exceeding one standard deviation.
Prior academic attainment has been, and remains, the primary basis for selection, and it is typically assessed at the initial screening stage.
Prior academic attainment has generally been, and continues to be, the primary basis for selection and is usually assessed at an initial screening stage.3
There are, however, several concerns about this approach. First, previous reviews found that academic performance is a good but not perfect predictor of performance, explaining only about 23% of the variance in undergraduate medical education (UME) and 6% in postgraduate medical education (PGME).
However, there are several concerns about this approach. Firstly, previous reviews have concluded that academic performance is a good, but not perfect, predictor of performance, accounting for approximately 23% of the variance in performance in undergraduate medical training and 6% in postgraduate performance.4
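To put the variance figures above on the more familiar correlation scale, the short sketch below converts a proportion of variance explained into the implied correlation coefficient. It assumes that each percentage is the square of a single correlation (r squared), which the text does not state explicitly; the function name `implied_correlation` is illustrative only.

```python
import math


def implied_correlation(variance_explained: float) -> float:
    """Return |r| for a given proportion of variance explained (r squared).

    Assumption: the proportion comes from a single predictor, so r = sqrt(R^2).
    """
    if not 0.0 <= variance_explained <= 1.0:
        raise ValueError("variance_explained must be between 0 and 1")
    return math.sqrt(variance_explained)


# Figures quoted in the text: about 23% (undergraduate) and 6% (postgraduate).
undergraduate_r = implied_correlation(0.23)
postgraduate_r = implied_correlation(0.06)

print(f"undergraduate: r is about {undergraduate_r:.2f}")
print(f"postgraduate:  r is about {postgraduate_r:.2f}")
```

Under that assumption, the two figures correspond to correlations of roughly 0.48 and 0.24.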
Second, although academic achievement is consistently shown to be a good predictor of performance in medical school, far less research has historically gone into methods that reliably assess important non-academic attributes, interests and motivational qualities.
Secondly, although academic achievement is consistently shown to be a good predictor of performance in medical school,5 historically substantially less attention has been paid to researching methods that reliably evaluate important non-academic personal attributes, interests and motivational qualities.
Third, there are few longitudinal cohort studies of what predicts success after qualification.
Thirdly, there has been a dearth of longitudinal cohort studies examining the predictors of success after qualification.
The fairness of medical school admissions and of selection for specialty training has attracted strong public interest and frequent criticism.
Medical school admissions processes and selection for specialty training attract strong public interest and often criticism regarding fairness.7–9
METHODS
Data sources
We conducted a formal literature search using the criteria specified in Table S1 (online).
Study selection and inclusion and exclusion criteria
Assessment of study type, quality and selection method
The research questions and evidence quality categories are shown in Table 1. Muir and Grey's ‘salience’ and ‘safety’ categories were removed.
The research questions and evidence quality categories are displayed in Table 1. In relation to the different research questions under investigation, we removed Muir and Grey’s (1996)10 ‘salience’ and ‘safety’ categories as they were not relevant to our context.
Each study was assessed on the following.
Therefore, we examined each study in relation to four research questions concerning, respectively:
- effectiveness;
- procedural issues;
- acceptability, and
- cost-effectiveness.
This was meant to address the implicit assumption that predictive validity is the most important measure of a selection method's effectiveness. The success of a selection tool also depends on other factors, such as its accessibility, ease of implementation and acceptability to key stakeholders.
This approach was intended to address the assumption implicit in much previous research that predictive validity is the most important measure of the effectiveness of a selection method; we acknowledge that the success of a selection tool may be determined by a range of additional factors, including its accessibility, ease of implementation and the extent to which it is viewed as acceptable by key stakeholders.
RESULTS
For a full list and description of all papers identified in the review, refer to Tables S2 and S3 (online).
Type of evidence
Effectiveness
Procedural issues
Acceptability
Cost-effectiveness
Aptitude tests
Summary
Evidence on the usefulness of aptitude tests in student selection is mixed and depends heavily on which test was studied, so general conclusions about aptitude tests are hard to draw. For example, some studies support their predictive validity, while others report that particular aptitude tests lack it. The evidence on fairness is similarly mixed: some studies find that certain groups score higher, and others do not. Evidence on equity across groups of medical school applicants (by sex, age, language status and socio-economic status) varies. Other evidence suggests that aptitude tests are equitable with respect to candidate background, are little affected by coaching and remain stable over time, with the UMAT as a possible exception. It is therefore important to evaluate each aptitude test in its own right.
Mixed evidence exists among researchers on the usefulness of aptitude tests in medical student selection and findings largely depend on the specific aptitude test studied; hence commenting on the generality of findings is problematic. For example, some studies support the predictive validity of aptitude tests, but other research suggests that some specific aptitude tests lack predictive validity. Mixed evidence also exists on the fairness of aptitude tests, with some research suggesting that certain groups score more highly on aptitude tests than other groups, whereas other research suggests that this is not the case. For example, there is varied evidence on the equity of aptitude tests for different groups of medical school applicants (e.g. according to sex, age, language status and socio-economic status).11,15,20,24,46–50 Other evidence suggests that aptitude tests are equitable with respect to candidate background, are affected relatively little by candidate coaching, and remain stable over time,20,24,44,50–52 with the possible exception of the UMAT.30 It is therefore important to evaluate each aptitude test in its own right in order to draw conclusions on the quality of the tool.
Academic records
Summary
There is consensus among researchers that academic records provide useful information for medical school selection. Research generally shows that prior academic attainment is predictive: the stronger the academic record, the more likely the student is to succeed in medical school. However, there is concern that its discriminatory power is diminishing as more applicants achieve top grades. Long-term follow-up evidence that applicants with higher grades become better doctors is also lacking. Moreover, Milburn notes that over-reliance on A-level results in the UK may distort the social intake of universities, and that selecting medical students on academic attainment alone may neglect important non-academic factors.
There is a high level of consensus among researchers that academic records provide useful information to inform medical student selection. Research generally suggests that prior academic attainment has predictive power, meaning that those with stronger academic records are more likely to succeed in medical school. However, there is concern that the discriminatory power of prior academic attainment may be diminishing as increasing numbers of medical school applicants have top grades. There is also a lack of long-term follow-up data to provide evidence that medical school applicants with higher grades go on to become better physicians. Moreover, Milburn8 notes that over-reliance on A-level results in the UK may create a distorted social intake to universities, and recruiting medical students solely on the basis of academic attainment may neglect important non-academic factors required for success in medical school and beyond.
Personal statements
Effectiveness
Evidence on predictive validity is mixed. Some evidence supports the predictive validity of personal statements for dropout rates, performance in internal medicine and clinical aspects of training, but other studies report that personal statements are less reliable than other commonly used selection tools and do not predict success in medical school. Some authors suggest, however, that personal statements make applicants aware of the characteristics of the medical degree they are applying to and so help them make a more informed decision.
Evidence on the predictive validity of personal statements is varied. Although some evidence has been found for the predictive validity of personal statements for medical school dropout rates,65 performance on internal medicine14 and clinical aspects of training,66 several others have reported that personal statements have low reliability compared with other commonly used selection instruments70 and are not predictive of subsequent success at medical school.2,71–73 Some authors suggest, however, that personal statements may have some value for making applicants aware of the characteristics of the medical degree they are applying to, which may help them to make a more informed decision to apply.73
Procedural issues
Procedural factors affect the reliability and validity of personal statements. Applicants may use the personal statement to present themselves in ways they think will appeal to admissions committees, which may not be accurate. The information captured is therefore partial and subjective. Factors that may affect effectiveness include how early the statement is submitted relative to the deadline, the marking method, and on-site versus off-site completion. Finally, one article noted that UK medical schools use personal statements in different ways: some use them formally in selection decisions, whereas others ignore them out of concern that they may unfairly bias decisions.
Evidence suggests that a number of procedural factors affect the reliability and validity of personal statements. Medical school candidates may use personal statements to present themselves in ways they believe are attractive to admission committees, which may not necessarily be accurate.74,75 Hence, the information captured by personal statements is likely to be both partial and subjective in nature. Factors that may affect the effectiveness of the selection method include the earliness of submission in relation to a deadline,76 marking method, and on-site versus off-site completion.77 Finally, one article highlighted the fact that personal statements are used differentially by different UK medical schools.78 Some medical schools use the information formally in making selection decisions, whereas others ignore this information out of concern that it may unfairly bias selection decisions.
Acceptability
Research has identified potential sources of data contamination in personal statements, including candidates' prior expectations, the length of time spent completing submissions, and input from third parties. Other research has commented on political validity and stakeholder satisfaction. Stevens et al. found that about 60% of students considered personal statements suitable for medical school admission, whereas Elam et al. reported that the contents of application forms are very unlikely to exert any significant influence on admissions committee decisions. White et al. argued that candidates present themselves as they believe candidates are expected to appear rather than as genuine reflections of themselves. Likewise, Kumwenda et al. found that most applicants believed others stretched the truth, and that a proportion believed statements were unlikely to be checked for accuracy.
Research has highlighted potential sources of data contamination in personal statements, including candidates’ prior expectations, the length of time spent completing submissions, and input to submissions from third parties. Other research14,74 has commented on the political validity and stakeholder satisfaction of personal statements in medical student selection. Whereas Stevens et al.45 found that approximately 60% of students thought that personal statements were suitable to use for admission to medical school, Elam et al.13 reported that the contents of medical school candidates’ application forms are very unlikely to exert any significant influence on decisions made by admissions committees. White et al.74 also argued that medical school candidates present themselves in ways that they believe are expected of candidates, rather than in ways that are genuine reflections of themselves. Likewise, Kumwenda et al.79 found that most medical school applicants believed that others stretched the truth in their personal statements, and a proportion of applicants believed it was unlikely that statements were checked for accuracy.
Summary
Evidence on the effectiveness of personal statements is mixed at best: little evidence supports their predictive validity, and many studies find them lacking in reliability and validity. They remain widely used in medical school selection worldwide even though their effectiveness is affected by numerous extraneous factors. Their content may also unfairly cloud the judgement of those making selection decisions.
Evidence on the effectiveness of personal statements in medical student selection is mixed at best. Little evidence exists to support the predictive validity of personal statements, and a large volume of research evidence suggests that the selection method lacks reliability and validity. Personal statements remain widely used in medical school selection worldwide, despite concerns that the effectiveness of the selection method is influenced by numerous extraneous factors. The content of personal statements may also unfairly cloud the judgement of individuals making selection decisions.
References
Summary
There is good consensus that references are neither reliable nor valid. Even so, they remain a common feature of medical school selection. Including references may therefore be unhelpful, and the resources would be better spent on selection methods with an evidence base for their reliability and validity.
There is a good level of consensus that references are neither a reliable nor a valid tool for selecting candidates for medical school. Despite these findings, references remain a common feature of medical school selection worldwide. To this extent, the inclusion of references in medical school admission processes may be unhelpful and may use valuable resources that could be directed more usefully to selection methods with evidentially based reliability and validity.
Situational judgement tests (SJTs)
Summary
There is good evidence that SJTs, when properly constructed, are reliable, valid, cost-effective and acceptable. SJTs are complex to develop, with a wide range of options for item format, instructions and scoring. When these options are calibrated appropriately, the evidence points to the strength of SJTs for assessing non-academic attributes in medical student selection.
There is a good level of consensus among researchers that SJTs, when properly constructed, can form a reliable, valid, cost-effective and acceptable element of medical school selection systems. SJTs are complex to develop and there is a wide range of options available in relation to item formats, instructions and scoring. When these options are calibrated appropriately, research evidence points to the strength of SJTs in medical student selection for assessing non-academic attributes.
Personality and emotional intelligence
Summary
Broadly, researchers agree that some personality domains are significantly associated, positively or negatively, with performance in medical school. The relationships are often complex, however: conscientiousness, for example, is positively associated with knowledge-based assessment but negatively associated with some clinical aspects of assessment. This suggests that the criterion constructs need closer attention when personality-based selection tools are reviewed. Personality assessment can be cost-effective and can be combined with methods such as interviews, in which responses can be probed further. Selectors should be aware that evidence on long-term predictive validity beyond medical school is lacking, and that there is concern that personality assessment may narrow the diversity of students entering medicine. Research on the predictive validity of emotional intelligence (EI) assessment is sparse and at a very early stage.
Taken broadly, there is a relatively high level of consensus among researchers that some domains or traits of personality are significantly positively or negatively associated with aspects of performance in medical school. However, the associations between personality domains and medical school performance are often complex, as is demonstrated by evidence that conscientiousness may be positively associated with knowledge-based assessment, but negatively associated with some clinical aspects of medical school assessment. This suggests that closer attention to the criterion constructs should also be considered when reviewing personality-based selection tools. Personality assessment can be cost-effective and may be used in combination with an interview method in which applicant responses can be probed further. Recruiters should be aware that there is a relative dearth of evidence regarding the long-term predictive validity of personality assessment beyond medical school, and that there has been some concern that personality assessment may narrow the diversity of types of individuals entering medical education and training. Research on the predictive validity of EI assessment was sparse and at a very early stage of development.
Interviews and multiple mini-interviews
Type of evidence
Effectiveness
Despite some evidence to the contrary, the balance of evidence is that the traditional interview lacks predictive validity and is not a robust method of student selection. Edwards et al. found that poorer interview performance was associated with higher medical school grades. The mixed findings may reflect the variety of interview methods, which range from relatively unstructured interviews to highly structured panel interviews. Eva and Macala found no difference in the reliability of interviewer ratings between unstructured and structured MMI stations, although behavioural indicator stations differentiated between candidates more reliably than other station types.
Despite some evidence to the contrary,14,16,33,123–130 the balance of evidence suggests that generally, the traditional interview is not a robust method of selecting medical students, and lacks predictive validity.4,9,28,80,131–137 Edwards et al.17 found that poorer interview performance was associated with higher medical school grade point average (GPA). The mixed findings on the effectiveness of interviews may reflect substantial differences in interview methods, which range from relatively unstructured individual interviews to highly structured panel interviews. However, Eva and Macala138 found no difference between the reliability of interviewer ratings in unstructured and structured multiple mini-interview (MMI) stations, although behavioural indicator stations differentiated between candidates more reliably than other station types.
Findings on MMIs are more consistent than those on traditional interviews; for example, their psychometric properties are usually reported to be adequate. Uijtdehaage and Parker found that the reliability of an MMI improved when an easy station was replaced with a more challenging one and relative, rather than absolute, ratings of candidates were used. However, Hissbach et al. found that rater bias had a greater effect on applicant scores than systematic differences in candidate performance. Although some attributes, such as communication skills, are commonly said to be assessed by MMIs, there is little clarity about what the different interview approaches actually measure. Construct validity evidence for MMIs remains exploratory, although, irrespective of design, the relationship between MMIs and academic measures is small to absent. Moreover, tightly standardised face-to-face interviews may not be comparable with scenario-based MMI stations that use standardised role actors, and the dimensionality of MMI stations (whether an MMI can measure more than one construct per station) is debated.
The findings from research on MMIs tend to be more directionally consistent than those from research on traditional interviews: for example, the psychometric properties of MMIs are usually reported to be adequate.44,139–146 Uijtdehaage and Parker146 found that the reliability of an MMI was improved by replacing an easy station with a more challenging one, and using relative, rather than absolute, ratings of candidate performance. However, Hissbach et al.147 found that rater bias had a greater effect on applicant scores than systematic differences in candidate performance. There is little clarity about what is being measured within the different approaches described, although some attributes, such as communication skills, are commonly purported to be assessed by MMIs. Construct validity evidence for MMIs remains exploratory and largely inconclusive, although irrespective of design differences, the relationships between MMIs and academic measures are small to absent.145 Moreover, tightly standardised face-to-face interviews may not be comparable with scenario-based MMI stations utilising standardised role actors, and the dimensionality of MMI stations (i.e. whether MMIs can measure more than one construct per station/interview question) has been debated in the literature.145
Procedural issues
Schools differ in the length, panel composition, structure, content and scoring of their interviews, which may explain the mixed findings on reliability and validity. Other evidence suggests that candidate performance can be significantly affected by coaching. Although many authors report implementing MMIs successfully, using interviews brings logistical difficulties relating to the range and type of questions, as well as interviewer subjectivity. Uijtdehaage and Parker summarised that ‘implementing an MMI was feasible but a daunting task’.
Schools differ significantly in terms of the length, panel composition, structure, content and scoring methods for interviews. The differential usage of the interview method in medical student selection may underlie the mixed findings on both the reliability and validity of interviews reported above. Other research evidence suggests that candidate performance may be significantly affected by coaching.30 Using interviews in a selection process also presents logistical difficulties relating to the range and type of questions155 and interviewer subjectivity,51,143,156,157 although numerous authors report on the successful implementation of MMIs into their medical school admission processes.44,146 Uijtdehaage and Parker summarised that ‘implementing an MMI was feasible but a daunting task’.146
Acceptability
Most research reports that applicants and interviewers view the interview process positively, and there is tentative evidence that MMIs and more structured interviews are preferred to less structured ones. Some evidence suggests that applicants prefer schools that conduct interviews. Campagna-Vaillancourt et al. found that most applicants and assessors regarded the MMI as appropriate for assessing a range of competencies, considered it fair and preferred it to a traditional interview. Introducing an MMI in stages may foster institutional acceptance. Standardised interviews can also be adapted for postgraduate selection, where they are acceptable to both international medical graduates (IMGs) and interviewers.
Most research reports that applicants and interviewers tend to view the interviewing process positively,44,45,60,146 and there is tentative evidence that MMIs and more structured interviews are preferred over less structured methods.138,158 Some evidence suggests that aspiring medical students may prefer the schools that conduct interviews.159 Campagna-Vaillancourt et al.144 found that the majority of applicants and assessors perceived an MMI to be appropriate to assess a range of competencies and considered it to be a fair process, as well as being preferable to a traditional interview. The staged introduction of an MMI into a selection process may foster institutional acceptance of the method.160 Standardised interviews can also be adapted for use in postgraduate medical selection to measure characteristics that are considered important and acceptable to both international medical graduates and interviewers.139,141,161
Cost-effectiveness
The cost-effectiveness of MMIs is generally reported to be good, although interviews cost considerably more than machine-marked tests, and MMIs cost more than traditional interviews because of station development and actor payments. Value for money may be improved by reviewing the number of stations and reducing it where reliability is not affected. However, some research suggests that adding stations or questions improves reliability more than adding interviewers. Roberts and colleagues estimated that reaching a Cronbach's alpha of 0.80 for high-stakes assessment requires 14 stations with one interviewer each, or between seven and 12 stations with two interviewers each. Dodson et al. found that shortening stations from 8 to 5 minutes saves resources with minimal effect on applicant ranking and test reliability. Knorr and Hissbach concluded that no general recommendation on the minimum number of MMI stations can be made.
The cost-effectiveness of MMIs is generally reported to be good,154 although comparatively interviews are significantly more costly than machine-marked tests, and MMIs are more expensive than traditional interviews because they incur increased costs for station development and actor payments.145,146 Value for money may be improved by examining the number of stations in an MMI, and reducing the number of stations if reliability is not affected. However, some research suggests that increasing the number of questions or stations in MMIs increases reliability more than increasing the number of interviewers.143,145,162 Indeed, Roberts and colleagues estimated that to reach a Cronbach’s coefficient alpha of 0.80 for high-stakes assessment, MMIs must include 14 stations if each is manned by a single interviewer. This number could be reduced to between seven and 12 stations if each station is manned by two interviewers.143 Alternatively, Dodson et al.163 found that reducing the duration of MMI stations from 8 to 5 minutes conserves resources with minimal effect on applicant ranking and test reliability. Knorr and Hissbach145 concluded in their systematic review that no general recommendation for the minimum number of MMI stations can be derived from the literature at present.
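The trade-off between the number of stations and reliability described above can be illustrated with the Spearman-Brown prophecy formula. This is a sketch under a simplifying assumption, not the method used in the cited studies: it treats stations as parallel measurements and takes the figure of 0.80 for 14 single-interviewer stations as the reliability of the whole test. The function names are illustrative.

```python
def spearman_brown(r_single: float, n_stations: int) -> float:
    """Predicted reliability of a test made of n parallel stations."""
    return n_stations * r_single / (1 + (n_stations - 1) * r_single)


def single_station_reliability(test_reliability: float, n_stations: int) -> float:
    """Invert Spearman-Brown: reliability of one station given the whole test."""
    return test_reliability / (n_stations - (n_stations - 1) * test_reliability)


# Assumption taken from the text: 14 single-interviewer stations reach 0.80.
r1 = single_station_reliability(0.80, 14)

# Predicted reliability for shorter circuits under the parallel-stations assumption.
for n in (5, 7, 10, 12, 14):
    print(f"{n:2d} stations -> predicted reliability {spearman_brown(r1, n):.2f}")
```

Under these assumptions a single station has a reliability of about 0.22, and halving the circuit to seven single-interviewer stations lowers the predicted reliability to about 0.67, which is consistent with the point that shorter circuits need a second interviewer per station to stay near 0.80.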
Tiller et al. found that conducting an MMI via Skype substantially reduced cost and time for candidates.
Tiller et al.164 found that cost and time savings for candidates were substantial when an MMI was conducted online via Skype rather than in person, although further research is required regarding the impact on fidelity of the lack of a face-to-face encounter.
Summary
Interviews are among the most widely used selection tools. The evidence suggests that traditional interviews lack the reliability and validity expected of a tool for high-stakes decisions, and that MMIs improve on both. More theory-driven research is needed on the predictive and construct validity of MMIs, particularly on which constructs can be assessed accurately. More evidence, informed by validation studies, is needed on the appropriateness of the criteria assessed in interviews. Cost-effectiveness should be evaluated, along with alternative approaches to scoring and alternative uses of scores (such as minimum threshold criteria). Use of the MMI has spread rapidly in recent years as evidence of its reliability has accumulated, but construct validity and dimensionality remain problematic. Schools need to understand better what they intend to measure and what they actually measure. The impact of the MMI on candidates (fairness, performance, coaching effects and so on) is an important practical concern for design decisions such as question rotation.
Interviews are among the most widely used tools in selection for medical school admission. Evidence suggests that traditional interviews lack the reliability and validity that would be expected of a selection instrument in a high-stakes selection setting. Evidence also suggests that MMIs offer improved reliability and validity over traditional interview approaches. Further theory-driven research is warranted, however, in relation to the predictive and construct validity of the MMI method, particularly with respect to the constructs that can be assessed accurately (e.g. communication, critical thinking, empathy, etc.). More evidence is required regarding the appropriateness of criteria that can be assessed in interviews and should be informed by validation studies. In addition, the cost-efficiency and utility of MMIs should be evaluated, along with alternative approaches to scoring and alternative uses of scores (including any minimum threshold criteria). The use of MMIs has spread rapidly in recent years as they can be designed as a reliable selection method. However, issues surrounding the construct validity and dimensionality of MMIs remain problematic: it is critically important that schools better understand what they are seeking to measure, and actually are measuring, with this approach. The impact of the MMI on candidates (in terms of fairness, performance, coaching effects, etc.) is an outstanding practical concern that should influence design decisions such as question rotation.
Selection centres (SCs)
Summary
Overall, research on the utility of SCs is sparse. Evidence of predictive validity is stronger for postgraduate selection, and more research is needed for medical school selection.
Overall, research on the utility of SCs for medical student selection was relatively sparse. Evidence on the predictive validity of SCs for postgraduate selection is stronger, although further evidence is required to build a case for their predictive validity in medical school selection.
DISCUSSION
Summary of key findings
There is over-reliance on cross-sectional designs and a focus on reliability rather than validity, so a method may be ‘reliably wrong’. Although some studies address predictive validity, few examine construct validity (what is being measured) or cost-effectiveness. Long-term follow-up studies were scarce across the 18 years reviewed, although longitudinal evidence has increased over the last 2 years.
There is an over-reliance on cross-sectional study designs and a general focus on reliability estimates as indicators of quality rather than aspects of validity (a method may have high reliability but be ‘reliably wrong’25). Although some studies have addressed issues relating to predictive validity, very little research has explored construct validity issues (i.e. what is being measured) and the relative cost-effectiveness of selection methods. During the 18 years covered by this review, there have been remarkably few long-term evaluation studies; however, we note that over the last 2 years there has been an increase in the amount of longitudinal evidence emerging in this area.
Few studies examine the selection system as a whole, including the relative contributions of the different methods and the effects of weighting, when methods are used in combination.
There remain comparatively few studies examining selection system design overall and the relative contributions of the various selection methodologies (and the impacts of various weightings) when methods are used in combination (as is the norm in medical school selection172,173).
There are, however, clear messages about reliability, validity and effectiveness. Academic attainment remains a common feature of most selection policies, and the evidence supporting its continued use remains strong. The evidence indicates that structured interviews, MMIs, SJTs and SCs are more effective and generally fairer than traditional interviews, personal statements and references. Evidence on the effectiveness and fairness of aptitude tests is mixed and depends on the test. This is largely because there is no agreed framework for what ‘aptitude’ means: tests currently range from assessments of ‘pure’ cognitive ability (e.g. the UKCAT) to academic tests (e.g. the BMAT). It is therefore difficult to assess systematically the relative contributions of different aptitude tests.
There are, however, some clear messages about the comparative reliability, validity and effectiveness of various selection methods. The academic attainment of candidates remains a common feature of most selection policies and the strength of evidence in support of it continuing to do so remains strong. The extant evidence paints a relatively clear picture illustrating that structured interviews or MMIs, SJTs and SCs are more effective methods and generally fairer than traditional interviews, references and personal statements. Evidence is currently mixed regarding the effectiveness and fairness of aptitude tests, depending on the tool in question. This stems largely from the fact that there is no currently agreed framework that specifies what is meant by aptitude; at present tests range from assessments of ‘pure’ cognitive ability (e.g. the UKCAT) to academic tests (e.g. the BMAT). As such, it is difficult to systematically assess the relative contributions of different aptitude tests, and of aptitude tests within a wider selection system.
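Findings on acceptability are also mixed and may be influenced by political issues, including differing stakeholder views, differences in the philosophies of medical students and medical schools, and the way a tool is implemented within a selection system.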
The picture regarding the acceptability of various selection methods is also mixed, and may be influenced by a variety of political issues including differing stakeholder views, variations in the philosophies of both medical students and medical schools, and the ways in which the tool is implemented as part of a selection system.
In judging the papers in this review, it became clear that some terms cover a broad spectrum of methods. The quality of an instrument can vary greatly with its design, so each design needs to be evaluated individually before conclusions about effectiveness are drawn.
When judging the papers in this review, it was clear that some terms cover a broad spectrum of methods: MMIs, SJTs, aptitude tests, personality assessments and SCs are measurement methods that comprise a multitude of different design parameters. Depending on the design, this may significantly alter the quality of the instrument to the extent that each needs to be individually evaluated before conclusions about its effectiveness can be reached.
Implications for theory
A persistent problem in selection research is the question of which outcomes the selection methods are meant to predict. For example, the association between conscientiousness and performance gives mixed results depending on whether the outcome is early performance in medical school or later (clinical) performance. In addition, the outcome measures used to evaluate selection methods mostly reflect attainment and maximal performance (medical school achievement, licensure examination performance) rather than clinical practice and typical (day-to-day) performance.
A persistent problem with selection research relates to the issue of which outcomes we are trying to predict by using various selection methods.59 For example, to illustrate this criterion problem, when exploring the association between conscientiousness and performance outcomes, we find mixed results when examining outcomes relating to early examination performance in medical school and performance within clinical practice in later years. Furthermore, our review also highlights that outcome measures used to evaluate selection methods most often focus on indicators of attainment and maximal performance (e.g. medical school achievements, performance in licensure examinations) rather than indicators relating to clinical practice and typical (day-to-day) in-role job performance.
A clear framework of outcome criteria is needed for judging the relative accuracy of selection methods.
In judging the evidence for the relative accuracy of selection methods, it becomes apparent that a clear framework of outcome criteria with which to interpret the research evidence and compare selection methods, both individually, and within a selection system, has yet to be established;
In addition, research has focused mainly on predictive validity rather than on what each tool actually measures (construct validity), which raises the question of how a method can add value to a selection system when the constructs it measures are unknown. This applies particularly to the MMI: despite its recent popularity, the attributes that selectors use it to assess are inconsistent, and the evidence on construct validity remains inconclusive.
In addition, evidence regarding the effectiveness of some methods has focused predominantly on the predictive validity of the tool, rather than on assessing precisely what different methods are measuring (i.e. construct validity); this raises the question of how a method can be considered to add value to a selection system if the constructs it is measuring are unknown. This is particularly the case for MMI research, in which, despite the method’s increasing popularity in recent years, there is a lack of consistency regarding the attributes selectors are using MMIs to assess for and, relatedly, evidence regarding construct validity remains inconclusive.
지원자의 역량의 지표로 무엇을 봐야 하는가는 medical career의 어느 지점을 기준으로 보느냐에 따라서 달라질 수 있다. 따라서 구체적인 역할에 따라서 지원자를 평가하는 선발 준거가 다양해지고 달라지는데, 여기에는 학업적, 비학업적 지표가 모두 포함된다. 어떤 요인이 UME에는 중요한 예측인자로 나올 수 있지만 임상 수행능력에서는 반대로 작용할 수도 있다. 따라서 서로 다른 선발 방법은 서로 다른 단계마다 서로 다른 방식으로 사용되어야 한다. 예컨대 SJT는 의과대학 초기 수행능력과는 예측력이 낮으나(주로 학업에 초점이 맞춰지므로), clinical practice에 있어서는 더 예측력이 높다. 의학 분야의 선발시스템 설계 어려움은 학업적, 비학업적 자질을 아우르는, 학부선발에서 신뢰도와 타당도가 있는 것과 수 년이 지난 전공의 수련에서 신뢰도와 타당도가 있는 것에 대한 연구 근거를모두 포함시켜야 하는 것이다.
It is clear that indicators of competence for entrance to medical training and practice are likely to be different at different points in a medical career; thus, applicants are judged on multiple selection criteria depending on the specific role, which may include varying combinations of academic and non-academic indicators of aptitude. A factor may be identified as an important predictor for undergraduate training, but may actually hinder some aspects of performance in clinical practice.59,66 As such, different selection methods may predict differently at different stages: for example, an SJT may be less predictive of performance in the early years at medical school (which tends to be more academically-focused), but significantly more predictive of performance outcomes when trainees enter clinical practice.28,174 A major challenge within medicine is to integrate the research evidence to inform the design of selection systems that are reliable and valid (and weighted appropriately) from undergraduate selection through to selection for specialty training after many years of education, for both academic and non-academic qualities.
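More theory-driven research is therefore needed to establish what a 'competent' physician is. A unified taxonomy of performance indicators should be built and used as markers of short- and long-term predictive validity. For example, some researchers argue that medical school selection should select in on the basis of academic attainment and select out on the basis of non-academic skills. It is also argued that non-academic attributes play a larger role in PGME selection and that their weighting may differ by specialty. For example, empathy and communication matter in general practice and paediatrics, whereas vigilance and situational awareness matter in anaesthesia.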
Hence, there is a need for more theoretically driven, future-oriented research aimed at identifying what a ‘competent’ physician is at the various stages of training and practice. This will allow researchers and practitioners to move towards crafting a unified taxonomy of performance indicators which may be used as markers in short- and long-term predictive validity studies of selection methods. For example, some researchers suggest that from undergraduate selection onwards, medical students should be selected in on the basis of academic attainment and selected out on the basis of non-academic skills and attributes.175 It could be argued that non-academic attributes and skills should therefore play a much larger role in postgraduate selection and the weighting of these may differ depending on the specialty. For example, research from job analysis studies shows that empathy and communication are weighted more heavily for selection into general practice176 and paediatrics, whereas vigilance and situational awareness carry more weight in anaesthesia.177
Practical implications
Implications for practice
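SJTs and MMIs predict inter- and intrapersonal (non-academic) attributes more validly than references or personal statements. SJTs and MMIs may be complementary: SJTs efficiently assess a broader range of constructs, whereas MMIs involve a face-to-face encounter. Although costly, structured interviews allow applicant responses to be probed further and in more depth.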
Our review shows that SJTs and MMIs are more valid predictors of inter- and intrapersonal (non-academic) attributes than personal statements or references. Situational judgement tests (SJTs) and MMIs may be complementary: whereas SJTs can measure a broader range of constructs efficiently as they can be machine-marked, MMIs, by contrast, involve a face-to-face encounter. Although expensive, structured interviews (including MMIs) allow applicant responses to be probed further and in more depth.
At present, the picture for aptitude tests and cognitive factors is less clear.
At present, the picture for aptitude tests and cognitive factors is less clear as a result of
- the large number of aptitude tests and the differences between those that are currently available,
- the diverse outcome measures against which performance on aptitude tests is compared (to assess validity, see the ‘criterion problem’ discussed above),
- the multiple ways in which aptitude tests are implemented, and
- the mixed nature of the evidence on the effectiveness of aptitude testing.
There is also evidence that some aptitude tests favour certain applicants.
There is also some evidence that some aptitude tests may favour certain types of candidate,46 which may have unfavourable implications for fairness and widening access to medicine.
Challenges in interpreting and applying the evidence on selection methods include the following:
The challenges of interpreting and applying evidence of selection methods include
- Limited longitudinal data: the relative lack of longitudinal data,
- No agreed outcome criteria: lack of an agreed-upon framework of outcome criteria, and
- Institutional differences: institutional differences (including in available resources, curricula and philosophies of what a high-performing medical student is considered to be).
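Kreiter and Axelson point out that the complexity of admissions goals is itself an obstacle: social justice, educational equality, health care and political outcomes are often competing goals. When judging the quality and effectiveness of selection methods, it should be recognised that some criteria compete with one another. For example, acceptability to stakeholders or assessors may be high while the validity evidence is poor. Similarly, the validity evidence for SCs is strong, but they are costly and therefore hard to use. Medical schools should accordingly consider the context in which the selection system operates when judging the quality and effectiveness of selection tools.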
Kreiter and Axelson2 acknowledge that the complexity of admissions goals may also be an obstacle to evidence-based progress in medical school admissions because concerns regarding social justice, educational equality, health care and political outcomes are broad and frequently competing. When judging the quality and effectiveness of selection methods, it is noteworthy that some criteria may compete with one another. For example, the stakeholder acceptability of referees’ reports in selection is generally high, but the evidence for their validity is poor. Similarly, regarding other criteria, the evidence for the validity of SCs is high, but they are relatively costly to implement. In this respect, when judging the quality and effectiveness of different selection methods, medical schools and employers may choose to weight different features depending on the context within which the selection system is operating.
Susceptibility to coaching is a common concern for every selection tool.
A common central concern for any selection tool is susceptibility to coaching. Research over the last 10 years has increasingly focused on this issue, probably because there has been increasing emphasis on how to validly assess non-academic attributes in selection for medical education.
- Personal statements: susceptible to coaching, and sold by companies in many countries. In particular, personal statements are at significant risk of being influenced by coaching, or indeed of being written by somebody other than the applicant; a brief online search reveals a large number of companies internationally that sell pre-written personal statements.
- SJTs: no effect of coaching found. With regard to SJTs, recent studies have found no effects of commercial coaching on SJT scores or the predictive validity of SJTs.87,178 However, ongoing research is required to assess the coachability of the full range of non-academic selection tools in greater depth.
Future research agenda
Scoping a future research agenda
Firm conclusions are difficult to draw.
It is clear from our review that it is challenging to draw firm conclusions regarding the relative strength of the different tools given the variety in the quality and design of the currently available research evidence: at present there are insufficient data, and medical education providers’ agendas are too diverse, to propose a fully comprehensive framework for international best practice in medical selection methods.
Well-designed studies are needed.
There is a clear need for well-planned studies focusing on the long-term follow-up of medical students, tracking students from admission through to assessments in more senior training posts in clinical practice, at the point of licensure and beyond.
Research on widening access and diversity is needed.
Within the broader sphere of issues of fairness in selection, more research exploring issues of widening access and diversity is required, whether it refers to race, ethnicity or social class, as this remains a challenge within medical school admissions globally, and it is becoming increasingly important politically to reflect society within the health care professions.179,180
O’Neill et al. reported that selection method has no significant effect on social diversity and that diversifying the applicant pool matters more. Conclusions so far remain tentative.
O’Neill et al.181 found no significant effect of selection method on social diversity in the medical student population, and suggest that the attraction of a sufficiently diverse applicant pool is more important for widening access than which selection tool is used. Therefore, only tentative conclusions can be drawn.
Prior educational attainment is called the 'academic backbone' of medical education because of its high predictive validity, but research is needed on how 'contextual data' can be used.
Whereas traditional markers of prior educational attainment have been called the ‘academic backbone’ of medical education because they are highly predictive of subsequent performance both at medical school and beyond, there is a need to explore how ‘contextual data’ can be used to allow the social and educational backgrounds of applicants to be taken into consideration alongside their educational achievements.
The term 'non-cognitive' is problematic because it implies 'not thinking'.
A key criticism of selection research is that there is a distinct lack of theory-driven studies that examine issues related to validity and the constructs being measured and that, more broadly, acknowledge contemporary models of adult intellectual development and skill acquisition, or attempt to integrate cognitive and non-cognitive factors.172,173 The term ‘non-cognitive’ is in itself problematic as it arguably implies ‘not thinking’.
The authors propose the following:
In summary, we propose the following priorities for a future research agenda over the next 50 years in order to enable schools and employers to make evidence-based decisions about which selection tools to use and why:
1 longitudinal research exploring predictive validity and following students throughout the course of their careers within education, training and practice;
2 research enabling greater understanding of how selection tools may impact on widening access and diversity agendas, and
3 theory-driven studies of the construct validity of both academically and non-academically oriented selection methods and selection systems that will help us to understand what we are assessing for in both the short and long terms.
Finally, we propose that the following five considerations will be integral in shaping the direction of medical education research over the next 50 years:
1. Medical school admissions will remain highly competitive.
1. Medical school admissions will remain highly competitive. The prestige of being a physician is likely to continue to drive a high applicant-to-selection ratio in medical school selection internationally over the next 50 years. However, this is unlikely to be true in all postgraduate specialties; some medical career pathways may be perceived to be of higher status and will therefore be more competitive than others. Medical selection may become part of a process to facilitate recruitment into areas of most need. This may, in turn, require varying emphasis on selection for specific attributes and competencies: one size is unlikely to fit all.
2. There will be greater focus on non-academic competencies.
2. There will be an increased focus on, and value of, non-academic attributes and skills in medical selection, aligned with what wider society wishes from its physicians. The role of the physician’s own well-being and resilience, and how these can best be selected for, then supported and developed, will be of increasing importance. Trainees’ expectations of their work–life balance will also be integral to medical selection over the next 50 years. Consideration must be given during selection to the discourse around how we encourage new generations of medical students to expend discretionary effort in future. This is strongly related to:
3. The capability to lead multidisciplinary teams and to build a culture of 'everyday' innovation with limited resources
3. a growing focus on capability to lead multidisciplinary teams, and building a culture of ‘everyday’ innovation in an environment of reduced resources.
4. Rather than a focus on one or two 'innovators', commitment from every team member is needed
4. Rather than a focus on just one or two people in a team, who are touted as the ‘innovators’, there is likely to be an increased onus on all health care professionals to innovate and provide leadership in order to engage multiprofessional teams and to continue to deliver high-quality and compassionate care in a climate of ongoing health care spending cuts.185,186 This may represent a significant change in how applicants to medical education are selected. This, in turn, relates to:
5. Securing a wider applicant pool
5. a focus on attracting a wider selection pool and recruiting a more diverse workforce, reflecting a philosophical shift towards acknowledging that non-traditional students may be able to align themselves with patients from diverse backgrounds and also contribute to the education of their peers by acting to challenge the current medical culture.187,188 Bringing such ‘non-traditional’ applicants into the health care system may promote, and indeed necessitate, innovative working practices. However, as we have discussed elsewhere,180 there is currently a multitude of unanswered questions on how this may be best implemented and how outcomes can be measured in a reliable and valid way.
Med Educ. 2016 Jan;50(1):36-60. doi: 10.1111/medu.12817.
How effective are selection methods in medical education? A systematic review.
- 1Department of Organisational Psychology, City University, London, UK.
- 2Work Psychology Group, Derby, UK.
- 3School of Medicine, University of Dundee, Dundee, UK.
- 4Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK.
- 5School of Medicine and Dentistry, University of Aberdeen, Aberdeen, UK.
Abstract
CONTEXT:
Selection methods used by medical schools should reliably identify whether candidates are likely to be successful in medical training and ultimately become competent clinicians. However, there is little consensus regarding methods that reliably evaluate non-academic attributes, and longitudinal studies examining predictors of success after qualification are insufficient. This systematic review synthesises the extant research evidence on the relative strengths of various selection methods. We offer a research agenda and identify key considerations to inform policy and practice in the next 50 years.
METHODS:
A formalised literature search was conducted for studies published between 1997 and 2015. A total of 194 articles met the inclusion criteria and were appraised in relation to: (i) selection method used; (ii) research question(s) addressed, and (iii) type of study design.
RESULTS:
Eight selection methods were identified: (i) aptitude tests; (ii) academic records; (iii) personal statements; (iv) references; (v) situational judgement tests (SJTs); (vi) personality and emotional intelligence assessments; (vii) interviews and multiple mini-interviews (MMIs), and (viii) selection centres (SCs). The evidence relating to each method was reviewed against four evaluation criteria: effectiveness (reliability and validity); procedural issues; acceptability, and cost-effectiveness.
CONCLUSIONS:
Evidence shows clearly that academic records, MMIs, aptitude tests, SJTs and SCs are more effective selection methods and are generally fairer than traditional interviews, references and personal statements. However, achievement in different selection methods may differentially predict performance at the various stages of medical education and clinical practice. Research into selection has been over-reliant on cross-sectional study designs and has tended to focus on reliability estimates rather than validity as an indicator of quality. A comprehensive framework of outcome criteria should be developed to allow researchers to interpret empirical evidence and compare selection methods fairly. This review highlights gaps in evidence for the combination of selection tools that is most effective and the weighting to be given to each tool.
© 2015 John Wiley & Sons Ltd.
- PMID: 26695465 [PubMed - in process]