
Narrative information obtained during student selection predicts problematic study behavior (Med Teach, 2016)

Meded. 2016. 11. 14. 04:21

Narrative information obtained during student selection predicts problematic study behavior

MIRJAM G. A. OUDE EGBRINK & LAMBERT W. T. SCHUWIRTH

Maastricht University, The Netherlands

Introduction

Until recently, the focus was on predictors of cognitive academic performance. It is now clear, however, that non-cognitive qualities also matter for future medical students and doctors.

Until recently, the focus has been primarily on predictors of cognitive academic performance (Salvatori 2001; Siu & Reiter 2009). Nowadays, however, it is clear that, besides cognitive skills, non-cognitive qualities are important competencies of future medical students and doctors.

The MMI is now in use.

Recently, studies of the so-called multiple mini-interview (MMI) have shown that multiple individual human judgments of non-cognitive skills, when combined, predict future performance in a sufficiently reliable way.

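In 2007, Maastricht University began using the MMI in the selection for its P-CI programme. The selection process yields a ranking list that reflects how suitable each applicant is predicted to be for performing successfully in the research master.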

In 2007, the MMI method was introduced as part of the selection procedure for the four-year medical research master Physician-Clinical Investigator (P-CI) at Maastricht University (Guyaux et al. 2010). The selection procedure results in a ranking list, representing differences in predicted suitability to perform successfully in this research master.

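Most selected students succeed in both cognitive and non-cognitive respects, but some show problematic behavior. Clearly, these problems were predicted neither by the MMI scores nor by any other part of the selection procedure. In theory, the narrative information that interviewers write down during the MMIs, which is stored in the student files, might be a better predictor of future behavior.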

Although most selected students are successful in both cognitive and non-cognitive aspects of the study, some encounter professional lapses or problematic study behavior. Clearly, these problems were not predicted by the MMI scores or any other part of the selection procedure. Theoretically, the narrative information that is written down by the interviewers during the MMIs and stored in the student files could be a better predictor of such problems and could constitute a useful resource for the student mentors (called counselors in the P-CI master), but till now this information has been unused.

Methods

Context

The four-year P-CI research master is a graduate-entry program that enables students to become a medical doctor as well as a clinical investigator. This combination makes it a challenging program for the students. Each year, a selection procedure decides which 30 students are allowed to enter this master.

They must have finished a biomedical bachelor with good results; GPAs as well as a cognitive test are taken into account in the first part of the selection procedure.
The second part consists of MMIs on different topics, such as motivation, past performance, empathy and communication skills. The applicants’ performances on each individual interview are graded independently by the interviewers as being ‘sufficient’, ‘doubtful’ or ‘insufficient’, and the combination of all individual scores adds up to a ranking list. In each station, interviewers also make notes that are not used in the procedure itself; both notes and grading are completed in the time interval between individual interviews. The notes are stored for possible use in appeals, to underpin the interviewers’ judgments.

Students and counselors

In this study, we focused on students who enrolled into the P-CI master in 2007 (cohort 2007; n = 30) and 2008 (cohort 2008; n = 30). In this master, each student is assigned to a counselor at the start of the first year, who mentors the student on an individual basis throughout his/her study. Each counselor typically takes care of 3–8 students per cohort. Every year, student and counselor meet at least four times.

Seven counselors mentored the 60 students in cohorts 2007 and 2008 (five in cohort 2007 and six in cohort 2008; four of them were active in both cohorts). In the end, 54 out of 60 students have finished their study within four to five years, while one student is currently finishing the last part.

Study design

This retrospective exploratory study was subdivided into three parts.

First, the seven counselors were asked to name the three most prevalent non-cognitive problems they encountered in ‘their’ students, and grade them (3-2-1) to indicate their frequency of occurrence (3 = most frequent). From their reactions the two most highly-graded problems were selected for further analysis.
Second, two independent and blinded investigators (MoE and LS) analyzed the de-identified notes written down during the MMIs of 15 randomly chosen students out of the total of 55, and identified what they thought to be possible indicators for these two most frequent non-cognitive problems.
Third, a case-control study design was used. The counselors were asked to identify the students who exhibited either one or both of these non-cognitive problems during their study (cases). The notes of their MMIs were de-identified and screened by the same two independent and blinded investigators (MoE and LS) to investigate whether the proposed indicators of these problems were indeed present. As a control, the MMI notes of a similar number of control students from the same cohorts (without the identified non-cognitive problems) were screened for the presence of these indicators as well.

Results

Part 1: The two most prevalent non-cognitive problems

Planning problems

Planning difficulties related to problems with

time management,
underestimation of study load, and
problematic prioritizing of tasks.

Self-reflection problems

Self-reflection-related problems were addressed as

insufficient awareness of (the consequences of) own functioning,
indications of defensive behavior, and
insufficient or non-effective actions to improve this.

Part 2: Indicators in MMI notes

The narrative information that was written down during MMIs with 15 randomly chosen students was analyzed to investigate whether indications for the two most prevalent non-cognitive problems were already present during the selection procedure preceding the master.

In the MMI notes of five students both investigators found no indicators at all for the two non-cognitive problems. In the MMI notes of the other 10 students one or more potential indicators were found. In four of them potential indicators for both planning-related and self-reflection-related problems were present.

As a result of this analysis, a limited number of potential indicators for planning-related and self-reflection-related problems were identified (Table 2).

Part 3: Case-control study

Based on the above-mentioned findings, a case-control study was performed to investigate how predictive these indicators were for planning-related and/or reflection-related problems during the research master P-CI.

The seven counselors identified 23 students who exhibited planning-related and/or reflection-related problems during their study (cases).

Thirteen students had planning-related problems, while
six had reflection-related problems; another
four students showed problems in both domains.

Altogether, the data indicate a statistically significant association between the presence of indicators for planning-related problems in MMI notes and the actual occurrence of such problems during the subsequent study (Table 3A: odds ratio 9.33; 95% confidence interval 2.12–41.07; p = 0.003). No such evidence was found for self-reflection-related problems (Table 3B: odds ratio 1.39; 95% confidence interval 0.29–6.68).
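For readers who want to see how an odds ratio and its confidence interval follow from a 2x2 case-control table, here is a minimal Python sketch. It assumes the common log (Woolf) approximation for the interval; the post does not say which method the authors used, and the counts below are hypothetical because the cell counts of Table 3A are not reproduced here. The function name `odds_ratio_with_ci` is made up for this illustration.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a 2x2 case-control table.

    a: cases with the indicator      b: cases without the indicator
    c: controls with the indicator   d: controls without the indicator
    All four counts must be non-zero for this approximation.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the log (Woolf) approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts for illustration only (NOT the data in Table 3A).
odds_ratio, lower, upper = odds_ratio_with_ci(a=10, b=5, c=5, d=10)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1, as reported for the planning-related indicators, is what marks the association as statistically significant; an interval that spans 1, as for the self-reflection indicators, does not.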

Discussion

Normally, the selection stage is used only to decide who is admitted and who is rejected. In this study, information obtained during selection was used to predict future problematic behavior.

As a result, the selection procedure is merely used to decide on who is admitted and who is not. In the current study, we propose to use narrative information obtained during selection interviews to predict future problems.

This would allow early and dedicated counseling and remediation that help selected students succeed. Selection would then serve not only as assessment-of-learning but also as assessment-for-learning.

This may enable early and dedicated counseling and remediation to improve the selected students’ study success. This way, selection will not only serve as an assessment-of-learning measure but also as a first assessment-for-learning step (Shepard 2000; Schuwirth & Van der Vleuten 2011).

Counseling that starts at the very beginning of a study career makes educational and even therapeutic interventions possible. Unorganized students struggle with the workload and pressure of progressing through a predetermined timetable. Beyond aptitude, time management and prioritizing are important for academic achievement, and organized studying is linked to both progress and success. Early and dedicated counseling should therefore prevent or reduce planning-related study problems and improve study success.

With the current knowledge, however, counseling can be more focused right from the beginning of a study career, enabling specific educational and even therapeutic interventions. Literature shows that unorganized students suffer most from workload and pressure of progressing in their studies according to a predetermined timetable (Ruohoniemi et al. 2010). More than aptitude, time management and prioritizing are important for academic achievement (West & Sadoski 2011). Organized studying appears to be related to both study progress and success (Rytkonen et al. 2012). Therefore, early and dedicated counseling will help to prevent or diminish planning-related study problems and, as a consequence, improve study success.

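Adequate self-reflection is essential for healthcare professionals. This is why a key goal of our portfolio and counseling system is to make students aware of the importance of self-reflection and to stimulate the development of their reflective skills.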

Adequate self-reflection is nowadays considered an essential attribute of competent healthcare professionals. This is why it is one of the important goals of our portfolio and counseling system to increase students’ awareness of the importance of self-reflection and to stimulate development of their reflective skills (Driessen et al. 2005).

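Drop-out or delay during medical school, despite the effort put into selection, is a cause for concern: for the students, who suffer personal distress; for the university, which spends a disproportionate amount of time and energy on struggling students; and for society, which bears the burden of the public funds spent on these students.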

Drop-out from or delay during medical school, in spite of selection efforts, is a cause for concern (Yates 2011; Stratton & Elam 2014). This is the case

for the students involved who suffer from personal distress,
for the university that is faced with a disproportionate amount of time and energy spent on struggling students, and
for society that has to bear the financial burden for drop-out and delayed students in countries where they receive public funding.

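Indeed, making additional use of selection data is also attractive from a financial point of view. In countries such as the Netherlands, where education is publicly funded, preventing delay or drop-out offsets a considerable share of the costs.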

Indeed, the additional use of selection data is attractive from a financial perspective. In countries like the Netherlands, where education is publicly funded, the gains of avoiding delay or drop-out will compensate largely for the costs of a selection procedure and counseling system.

Siu E, Reiter HI. 2009. Overview: What’s worked and what hasn’t as a guide towards predictive admissions tool development. Adv Health Sci Educ Theory Pract 14:759–775.

Stratton TD, Elam CL. 2014. A holistic review of the medical school admission process: examining correlates of academic underperformance. Med Educ Online 19:22919.

Med Teach. 2016 Aug;38(8):844-9. doi: 10.3109/0142159X.2015.1132410. Epub 2016 Jan 25.

Narrative information obtained during student selection predicts problematic study behavior.

Oude Egbrink MG1, Schuwirth LW1.

Author information

1a Maastricht University, The Netherlands.

Abstract

INTRODUCTION:

Up to now, student selection for medical schools is merely used to decide which applicants will be admitted. We investigated whether narrative information obtained during multiple mini-interviews (MMIs) can also be used to predict problematic study behavior.

METHODS:

A retrospective exploratory study was performed on students who were selected into a four-year research master's program Physician-Clinical Investigator in 2007 and 2008 (n = 60). First, counselors were asked for the most prevalent non-cognitive problems among their students. Second, MMI notes were analyzed to identify potential indicators for these problems. Third, a case-control study was performed to investigate the association between students exhibiting the non-cognitive problems and the presence of indicators for these problems in their MMI notes.

RESULTS:

The most prevalent non-cognitive problems concerned planning and self-reflection. Potential indicators for these problems were identified in randomly chosen MMI notes. The case-control analysis demonstrated a significant association between indicators in the notes and actual planning problems (odds ratio: 9.33, p = 0.003). No such evidence was found for self-reflection-related problems (odds ratio: 1.39, p = 0.68).

CONCLUSIONS:

Narrative information obtained during MMIs contains predictive indicators for planning-related problems during study. This information would be useful for early identification of students-at-risk, which would enable focused counseling and interventions to improve their academic achievement.

PMID: 26805655
DOI: 10.3109/0142159X.2015.1132410



A new method for group decision making in medical trainee selection (Med Educ, 2016)

Meded. 2016. 11. 10. 17:07

A new method for group decision making and its application in medical trainee selection

James R Kiger & David J Annibale

INTRODUCTION

The criteria by which a medical school or residency programme selects applicants rest partly on test scores or grades. In almost every case, however, these numerical data are combined with, or even outweighed by, factors that are hard to quantify, such as interview performance, leadership and prior experience. In the end, every programme must somehow distil all this information into a simple list in order of preference. To do so, groups often use pseudo-quantitative scoring systems, which are mathematically unsound and can be counterproductive.

The criteria by which a medical school or residency training programme selects its preferred applicants may, in part, rely on test scores or grades. In almost every case, however, these numerical data are combined with, if not superseded by, considerations that are difficult to quantify, such as interview performance, leadership traits and prior experience. In the end, every school or programme must find a way of distilling all this information into a simple list of applicants in order of preference. To achieve this goal, groups often rely on pseudo-quantitative scoring systems that are mathematically unsound and may be counterproductive to the collaborative process of making a list.

Our training programme uses the NRMP. The NRMP was introduced in 1952, at a time of growing confusion and frustration among medical students and residency programmes. A centralised body took on the task of assigning all graduating medical students to the available residency spots. The NRMP system has held its place for 60 years and has expanded to more specialties and subspecialties.

Our subspecialty training programme uses the National Resident Matching Program (NRMP) for applicant selection. The NRMP was formed in 1952 in response to escalating confusion and exasperation on the part of medical students and residency programmes. This centralised body assumed the task of sorting all of the nation’s graduating medical students into available residency spots.2 The NRMP system has stood relatively unchanged for more than 60 years, and has expanded to cover more specialties and subspecialties.

Applicants and training programmes each submit their own rank-order list to the NRMP, which uses a 'deferred acceptance' algorithm to sort applicants so that stable and optimal results are achieved. For applicants, ranking is demanding but fundamentally a personal matter. For training programmes, it is more complicated: they must decide how to integrate quantitative data with qualitative characteristics, and how to turn the subjective input of multiple interviewers into a final rank-order list. The imprecision of this step has been documented in several published reports.

Applicants and training programmes both submit rank-order lists to the NRMP, which employs a ‘deferred acceptance’ algorithm to sort the applicants into training positions such that stable and optimal results are achieved.2,3 For applicants, creating a rank order may be taxing, but is a fundamentally personal matter. For training programmes, generating a rank-order list may be significantly more complicated. Each programme must decide how to integrate objective quantitative data (test scores, grades, etc.) with qualitative characteristics (volunteer work, written statements, etc.) and the subjective opinions of multiple interviewers into a final rank-order list. The imprecision of this process is highlighted by published reports that have demonstrated the lack of correlation between information gathered during the interview process, the position of applicants on a programme’s rank-order list, and future resident performance.4–8

ERAS, provided by the AAMC, includes a pseudo-quantitative method for generating a rank-order list. Interviewers place applicants on a Likert-type rating scale (1 to 9), and the averaged scores per applicant produce a preliminary ranking. The ERAS system is one example of a Likert scale-based system.

The Electronic Residency Application Service (ERAS), provided by the Association of American Medical Colleges (AAMC), incorporates a pseudo-quantitative method to generate a rank-order list. Interviewers assign applicants scores on a Likert-type rating scale (integers of 1–9), and averaged scores for applicants are sorted to create a preliminary rank-order list. This ERAS system is simply one example of a Likert scale-based system.

Pseudo-quantitative methods of this kind have several fundamental problems.

Pseudo-quantitative methods such as this are beset by a number of fundamental problems:

1 Score distributions differ between interviewers.
the scores assigned by different interviewers are differently distributed;
2 A given number does not mean the same thing to every interviewer.
numeric scores have no consistent meaning for interviewers (e.g. an interviewer who gives consistently lower scores may view a score of 7 points as signifying an excellent candidate, whereas another interviewer may view the same score as indicating an average candidate);
3 The scores are ordinal data on an arbitrary scale and are unsuited to arithmetic operations.
Likert scale-type scores are ordinal data on an arbitrary scale; it is inappropriate to perform arithmetic operations, such as the calculation of means, on such data,9–11 and
4 Each applicant is interviewed by only some faculty members, and each faculty member interviews only some applicants.
candidates are interviewed only by a subset of faculty staff, and each faculty member may interview only a subset of candidates. Any particular candidate’s final score may be altered substantially by the inclusion or exclusion of an interviewer who gives consistently high or low scores.
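The effect described in points 1, 2 and 4 can be illustrated with a small Python sketch. The scores are hypothetical and the interviewer and candidate labels are invented for this illustration; nothing here comes from the study data.

```python
from statistics import mean

# Hypothetical scores on the 1-9 scale (illustration only, not study data).
scores = {
    "strict interviewer": {"X": 7, "Z": 5},   # prefers X to Z
    "lenient interviewer": {"Z": 9, "Y": 8},  # prefers Z to Y
}

# Collect every score each candidate received.
by_candidate = {}
for given in scores.values():
    for candidate, score in given.items():
        by_candidate.setdefault(candidate, []).append(score)

# Average the scores and sort from highest to lowest mean.
averages = {c: mean(v) for c, v in by_candidate.items()}
ranking = sorted(averages, key=averages.get, reverse=True)
print(averages)
print(ranking)
```

The two interviewers' own comparisons imply the order X, Z, Y, yet averaging puts Y first, simply because Y was seen only by the interviewer who scores high.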

Because of these problems, we re-evaluate the ERAS-generated ranking in group discussion before submitting it to the NRMP. During the discussion, scores are adjusted so that the ranking fits the 'group opinion'. Such a discussion is of course open to the influence of a vocal minority, and the views of those who cannot attend are left out.

Given these problems, our programme has had to re-evaluate the preliminary ERAS-generated rank-order list in group discussions prior to submission to the NRMP. During such discussions, scores are modified to force the rank list to conform to the ‘group opinion’. Of course, this group opinion may be unduly influenced by a vocal minority, and those who are unable to attend are left out of the discussion.

Mathematical approaches to improving the rank-ordering process have been proposed.

Others have suggested different mathematical methods to improve the rank-ordering process.

One approach is to have interviewers compile individually ordered preference lists of applicants, instead of assigning scores. Both Chew et al. and Collins et al. suggest applying a formula to individual rank lists to create scores that can then be averaged.12,13
These systems resemble the Borda voting system in which each voter gives each candidate a number of points proportional to that candidate’s place on the voter’s list.14 These systems are hampered by the fact that the score derived from any given voter is dependent on the number of candidates seen by that voter.
A recent article by Ross and Moore suggests retaining scores, but comparing candidates pairwise and assigning a ‘win percentage’ to each in a system similar to that used in sports ranking.15

We set out several design principles.

We proposed a set of design principles to which an optimal system should adhere:

1 the opinions of all interviewers will carry equal weight;
2 the rank-order list will not be influenced by which interviewers meet any individual candidate;
3 interviewers will compare only applicants whom they have met;
4 the system will not depend on scores assigned on an arbitrary scale, and
5 the final ordering will be transparent and reproducible.

METHODS

Algorithm development

We developed an algorithm termed ‘collaborative unbiased rank list integration’ (CURLI).

It consists of four steps.

The CURLI algorithm involves four steps:

1 each interviewer submits a personal ranked preference list of the applicants he or she has met or reviewed;
2 each personal rank-order list is used to generate a pairwise preference table of applicants;
3 the individual preference tables are summed to generate a composite preference table, and
4 a sorting algorithm is applied to the composite preference table to generate a final rank-order list.

The basic result is this: if applicants A and B were both interviewed by a subset of faculty members, and more of those interviewers prefer A to B, then A is placed higher on the preference list. This holds regardless of how many interviews any faculty member conducted or of any individual scoring bias.

The fundamental result of the CURLI algorithm is as follows: if applicants A and B are both interviewed by a subset of faculty members, and candidate A is preferred to candidate B by a majority of those interviewers, then candidate A will appear higher on the final preference list. This is unaffected by how many interviews any specific faculty member conducts or any individual scoring biases.

Personal rank-order lists

The fundamental change for interviewers is that instead of scoring applicants on an arbitrary scale, they are asked to maintain a personal ranked preference list of the applicants they have interviewed. Interviewers include only applicants they have met, conforming to design principles 2 and 3 above. Interviewers no longer assign arbitrary scores, removing the undue influence exerted by interviewers who give consistently high or low scores, satisfying principles 1 and 4.

Pairwise preference tables

Each pairwise comparison is entered as 1 or −1 depending on which applicant is preferred.

Each interviewer’s ranked preference list is converted to a preference table, which is populated by the numbers 1 or −1 depending upon which applicant appears higher on that preference list. No values are assigned to applicants the interviewer did not meet. A preference list implies a comparison between all possible pairs of applicants on that list. Applicants appearing higher on the rank-order list are preferred to all applicants ranked below them. Therefore, a rank-order list of size n contains (n × [n − 1])/2 pairwise comparisons between applicants.

Of four applicants A, B, C and D, the interviewer did not meet C and ranked the other three in the order B, D, A.

For example, imagine there are four applicants: A, B, C and D. An interviewer meets all but applicant C, and submits the following rank-order list: B–D–A.

Table 1 shows the preference table generated from this list.
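Since Table 1 is not reproduced in this post, the following Python sketch shows one way such a table could be built from the list B–D–A. It is an illustration, not the authors' implementation: the function name `preference_table` is invented, and the sign convention (−1 when the row applicant is preferred to the column applicant, 1 otherwise) is an assumption inferred from the sorting description, where a positive cell means the lower-ranked applicant is preferred.

```python
from itertools import combinations

def preference_table(ranked):
    """Pairwise table for one interviewer's ranked list (best applicant first).

    Keys are (row, column) pairs. Assumed convention: -1 means the row
    applicant is preferred to the column applicant, 1 means the opposite.
    Applicants the interviewer did not meet simply have no cells.
    """
    table = {}
    for higher, lower in combinations(ranked, 2):
        table[(higher, lower)] = -1
        table[(lower, higher)] = 1
    return table

table = preference_table(["B", "D", "A"])
n_comparisons = len(table) // 2  # each comparison fills two mirrored cells
for cell, value in sorted(table.items()):
    print(cell, value)
```

A list of three applicants gives 3 × 2 / 2 = 3 comparisons, and applicant C, whom this interviewer did not meet, appears in no cell.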

Composite preference table

A composite preference table is computed simply by adding all of the individual preference tables.

For example, four interviewers (I, II, III and IV) provide the following rank lists for four applicants:

Interviewer I: B–D–A;
Interviewer II: C–B–A–D;
Interviewer III: B–C–D–A, and
Interviewer IV: C–D–B.

Table 2 shows the resulting four individual preference tables. Table 3 shows the composite preference table yielded by the sum for each cell.
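The summation can be sketched in Python for the four example lists. This is a hedged illustration: the helper names are invented, and the sign convention (a negative cell means the row applicant is preferred to the column applicant) is an assumption inferred from the sorting description rather than taken from Tables 2 and 3, which are not reproduced here.

```python
from collections import Counter
from itertools import combinations

def preference_table(ranked):
    """Pairwise table for one ranked list; -1 means row preferred to column (assumed)."""
    table = {}
    for higher, lower in combinations(ranked, 2):
        table[(higher, lower)] = -1
        table[(lower, higher)] = 1
    return table

def composite_table(rank_lists):
    """Add the individual tables cell by cell; missing cells count as 0."""
    total = Counter()
    for ranked in rank_lists:
        total.update(preference_table(ranked))  # Counter.update adds values
    return total

rank_lists = [
    ["B", "D", "A"],       # Interviewer I
    ["C", "B", "A", "D"],  # Interviewer II
    ["B", "C", "D", "A"],  # Interviewer III
    ["C", "D", "B"],       # Interviewer IV
]
composite = composite_table(rank_lists)

applicants = ["A", "B", "C", "D"]
for row in applicants:
    print(row, [composite[(row, col)] for col in applicants])
```

Under this convention the cell for row C, column B is −1: two interviewers (II and IV) prefer C and one (III) prefers B.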

Sorting

A modified bubble-sort algorithm was applied to the composite table.

A sorting algorithm is applied to the composite preference table to obtain the final rank-order list. For our programme, we applied a modified bubble-sort algorithm to the composite table.16 An initial unsorted list is generated. Each applicant is compared with the applicant immediately below on the rank list by checking the corresponding value in the composite preference table. If the lower-ranked applicant is preferred (i.e. the value in the cell is > 0), the order of the two applicants is swapped. This is continued until no more pairs of applicants are swapped. In the ideal scenario, the re-sorted list will yield a composite preference table with all negative values in the upper triangle.

Re-sorting yields Table 4.

For our example, the final sorted rank list is: C–B–D–A. Re-sorting the preference table to reflect this order gives a matrix with a fully negative upper triangle which indicates that every applicant is preferred by a majority of interviewers to all the applicants below them on the list (Table 4).
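The sorting step can be sketched in Python as a plain bubble sort driven by the composite table. This is an illustration under assumptions: the article's specific modifications to bubble sort are not described in this post, the function names are invented, and the cap on the number of passes is an added safeguard against endless swapping if preferences form a cycle.

```python
from collections import Counter
from itertools import combinations

def composite_table(rank_lists):
    """Sum of pairwise tables; a negative cell means the row applicant is preferred."""
    total = Counter()
    for ranked in rank_lists:
        for higher, lower in combinations(ranked, 2):
            total[(higher, lower)] -= 1
            total[(lower, higher)] += 1
    return total

def curli_sort(applicants, table):
    """Swap adjacent applicants while the lower one is preferred (cell > 0)."""
    order = list(applicants)
    for _ in range(len(order) ** 2):  # safeguard: stop even if preferences cycle
        swapped = False
        for i in range(len(order) - 1):
            upper, lower = order[i], order[i + 1]
            if table[(upper, lower)] > 0:
                order[i], order[i + 1] = lower, upper
                swapped = True
        if not swapped:
            break
    return order

rank_lists = [
    ["B", "D", "A"],       # Interviewer I
    ["C", "B", "A", "D"],  # Interviewer II
    ["B", "C", "D", "A"],  # Interviewer III
    ["C", "D", "B"],       # Interviewer IV
]
table = composite_table(rank_lists)
final_order = curli_sort(["A", "B", "C", "D"], table)
print(final_order)
```

Starting from the unsorted list A, B, C, D, the sketch arrives at C, B, D, A, the order given in the article, and every cell above the diagonal of the re-sorted table is negative.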

If the same example were run with a Borda voting scheme, where applicants are ordered by the points they collect, B could come out on top even though two interviewers preferred C.

If one imagines running the same example with a Borda voting scheme, for instance, in which each applicant is awarded points based on his or her position on each list, it is possible that applicant B may have been ranked highest, although two of the three interviewers who directly compared applicants B and C preferred applicant C.
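A short Python sketch makes the Borda comparison concrete. The point rule is an assumption chosen for illustration (the top applicant on a list of k names receives k points, the next k − 1, and so on); the article does not specify which Borda variant it has in mind, and other variants can give different totals.

```python
from collections import Counter

rank_lists = [
    ["B", "D", "A"],       # Interviewer I
    ["C", "B", "A", "D"],  # Interviewer II
    ["B", "C", "D", "A"],  # Interviewer III
    ["C", "D", "B"],       # Interviewer IV
]

# Assumed Borda rule: on a list of k applicants, position 1 earns k points,
# position 2 earns k - 1 points, and the last position earns 1 point.
points = Counter()
for ranked in rank_lists:
    for position, applicant in enumerate(ranked):
        points[applicant] += len(ranked) - position

borda_order = [applicant for applicant, _ in points.most_common()]
print(points.most_common())
```

Under this rule B collects 11 points and C collects 10, so B heads the list although interviewers II and IV, two of the three who compared them directly, ranked C above B.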

Methodology

We implemented this new ranking algorithm during the 2013 neonatal-perinatal fellowship match. All faculty members and fellows were instructed to maintain a personal ranked preference list of the applicants they interviewed. They were also asked to assign a score of 1–9 to each participant as had been done in previous years, as per the ERAS system. These ‘shadow’ scores were used to compare the outcome of the CURLI algorithm with the results that would have been generated by the old Likert scale-based method.

RESULTS

During the trial year 14 applicants were interviewed, and 19 faculty members and fellows served as interviewers. Figure 1 shows the minimum, maximum, median and interquartile ranges for the scores assigned by each individual interviewer.

Interviewers used only part of the scoring range, and 86% of scores were 6 or higher.

On average, each interviewer scored nine applicants. All interviewers utilised a truncated part of the scoring range at the top of the scale. Of 162 total scores assigned, 139 (86%) were ≥ 6. The median score assigned by each interviewer ranged between 6 and 8.

There was discordance within individual interviewers. Of the 162 scores given, 23 were out of order relative to the rank-order list of the interviewer who gave them.

We observed discordance between individual interviewers’ assigned scores and their final assessments of an applicant’s desirability. Collectively, the interviewers assigned a total of 162 scores, 23 (14%) of which were out of order in relation to the rank-order list of the interviewer who had given them.

Under the new CURLI algorithm, nine of the 14 applicants ended up at a different position on the ranking list.

by the new CURLI algorithm. Of the 14 applicants, nine would have been assigned to a different place on the final ranking list.

In the previous three years, our division held two 2-hour meetings to adjust the preliminary list; this time a single 1-hour meeting was enough, and no applicant changed position.

In the prior 3 years, our division had scheduled two 2-hour meetings to discuss and modify the preliminary rank-order list. In this trial year, we required only a single 1-hour meeting to achieve consensus. No candidates were moved as a result of that discussion. Figure 2 shows the relationships between the preliminary rank-order list and the final rank-order list for 2013 and the prior 2 years. The changes reflect the alterations made during the divisional meeting. In 2011 and 2012, the positions of nine of 14 applicants, and 13 of 16 applicants, respectively, were moved on the final list.

DISCUSSION

From an administrative point of view, meeting time fell from 4 hours to 1 hour and the ranking did not change. Displaying the composite preference table ensured transparency.

From an administrative perspective, the new method reduced meeting time from 4 hours to 1 hour, during which no changes were made to the rank-order list. During that meeting the composite preference table was displayed, providing complete transparency.

The CURLI algorithm has several advantages. It is reproducible and transparent, it is less vulnerable to pressure from a minority who want to change an applicant's position, and it reduces the unfairness caused by intrinsic differences between interviewers.

We suggest that our CURLI algorithm has numerous theoretical benefits that are borne out in practice. It is reproducible and transparent. There is reduced vulnerability to pressure from a minority of participants to change a candidate’s rank position, and the inequality imposed by intrinsic differences in scoring among interviewers is removed.

The CURLI algorithm offers clear advantages. In Borda voting schemes and similar methods, applicants receive points for their place on each list, or their rank numbers are averaged.

Compared with other options that have been proposed, we feel that the CURLI method offers clear advantages. Borda voting schemes, and similar methods, introduce a process whereby applicants receive points for their place on each list, or in which the rank number on each list is averaged.12–14

Such methods may give satisfactory results when every interviewer sees every applicant, but they become problematic when each interviewer sees only some applicants. An interviewer might, for example, see only a few applicants who all happen to be among the least desirable. Under Borda-like methods, the top-ranked applicant on that list gains a large advantage. CURLI makes only relative comparisons, so it does not have this problem.

These methods may yield satisfactory results if all interviewers see all applicants (i.e. every individual preference list is full), but in cases like ours in which each interviewer sees only a subset of applicants, these methods are problematic and allow bias. Take, for example, an interviewer who interviews only a few applicants, all of whom happen to be among the least desirable. Under the Borda-like methods, the top-ranked applicant on this list will obtain a huge advantage in points or rank, even though that applicant may actually not be desirable compared with all the other applicants that particular interviewer did not see. As the CURLI method uses the rank lists only to make pairwise comparisons between applicants the interviewer actually saw, it suffers no such bias.

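Other pairwise comparison methods exist, but they are less transparent and more cumbersome than CURLI. Most interviewers struggled to maintain even internal consistency in their scores. CURLI removes arbitrary scores altogether.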

Other pairwise comparison methods have been proposed, but we feel they are less transparent and more cumbersome than our CURLI method.15 As our case study highlights, the majority of interviewers failed to maintain even internal consistency in their score assignment during one interview season. The CURLI method we have described dispenses with arbitrary scores entirely.

The method may also be applicable to the scoring of knowledge or clinical reasoning assessments, such as script concordance testing (SCT).

We believe this method may find further application in medical training in the scoring of knowledge or clinical reasoning assessment tools, such as script concordance testing.17

Med Educ. 2016 Oct;50(10):1045-53. doi: 10.1111/medu.13112.