Medical School Admissions: Enhancing the Reliability and Validity of an Autobiographical Screening Tool (Acad Med, 2006)

Kelly L. Dore, Mark Hanson, Harold I. Reiter, Melanie Blanchard, Karen Deeth, and Kevin W. Eva





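Like many other schools, the Michael G. DeGroote School of Medicine at McMaster University invites applicants to interview on the basis of their undergraduate GPA (uGPA) and an autobiographical submission (ABS) that they write. Research shows that the reliability and validity of the uGPA are fairly stable, whereas those of the ABS are weak.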

Like many schools, the Michael G. DeGroote School of Medicine at McMaster University invites candidates to interview based on grade point average (uGPA) and a candidate-written autobiographical submission (ABS). Local research has demonstrated strong reliability and validity for uGPA [2,3], but the reliability of the ABS has been weak [2].


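The ABS consists of five questions covering the applicant's personal experiences, fit with McMaster, and suitability for a career in medicine. Each ABS is stripped of personal identifiers and then scored by three independent raters (one health science faculty member, one community member, and one medical student).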

The ABS is composed of five questions designed to evaluate noncognitive characteristics such as applicants’ personal experiences, suitability for McMaster and suitability for a career in medicine. Each applicant’s five ABS questions, stripped of any personal identifiers, are scored by three independent raters: one health science faculty member, one community member, and one medical student.


Each rater scores 30 to 60 ABS submissions, and as many as 150 raters take part each year.

Each rater scores 30–60 ABS submissions and upwards of 150 raters participate annually.



Non-independence of the ratings


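ABS scores have been found to correlate very weakly with performance in medical school, and likewise with national licensing examination (NLE) scores. One reason is that the interrater reliability of ABS scoring is low, at 0.45. The ABS does, however, show high internal consistency. High internal consistency is usually read as a sign that a scale is reliable, but here it may instead indicate that the scores given to one candidate's individual questions are not independent of one another; in other words, a halo effect may be at work. If a rater's impression of a candidate's first answer colors the scoring of the second and third answers, then the candidate's score is set by that first impression rather than by the average of separately judged answers. This matters because, functionally, only three observations (one per rater) are being collected, not fifteen (raters x questions).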

Scores on the ABS have been shown to correlate poorly with performance both within medical school and on the national licensing examinations written postgraduation [2]. One reason identified by Kulatunga-Moruzi and Norman is that the interrater reliability of ABS scoring is less than adequate (0.45). The ABS has, however, been seen to have high internal consistency (0.88). Although high internal consistency may be seen as supportive of the reliability of a measure, it may in fact be a negative indication that the scores assigned to the individual questions do not provide independent measures of the applicant. That is, the halo effect may be afflicting this measure; if performance on the first question influences the raters’ perceptions of performance on subsequent questions, then the initial overall impression of the candidate will determine the scores assigned to individual questions rather than the individual questions summing to provide a global assessment. This is an important distinction, because it would indicate that, functionally, only three observations (from three raters) are being collected in the current system instead of the desired fifteen.
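To make the halo argument concrete, the sketch below computes Cronbach's alpha, one common internal-consistency coefficient, for two sets of made-up scores from a single rater. It is only an illustration: the excerpt does not say which coefficient produced the 0.88 figure, and the function name and both score tables are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of per-question scores (one row per candidate)."""
    k = len(rows[0])  # number of questions
    item_variances = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores from ONE rater: four candidates x five questions.
# Halo-like pattern: each candidate gets nearly the same score on every question.
halo_scores = [
    [6, 6, 6, 5, 6],
    [3, 3, 4, 3, 3],
    [5, 5, 5, 5, 4],
    [2, 2, 2, 3, 2],
]
# Less uniform pattern: each question is judged more on its own merits.
varied_scores = [
    [6, 4, 5, 3, 5],
    [3, 4, 2, 5, 3],
    [5, 3, 6, 4, 4],
    [2, 5, 3, 4, 2],
]

alpha_halo = cronbach_alpha(halo_scores)
alpha_varied = cronbach_alpha(varied_scores)
print(f"halo-like: {alpha_halo:.2f}, varied: {alpha_varied:.2f}")
```

In this toy data the halo-like table yields an alpha close to 1, while the varied table yields a much lower value. A very high alpha is therefore what one would expect to see if a rater were effectively assigning one global impression five times.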


To find out whether this was a problem, the direction in which ratings were collected was changed.

To test whether or not this was an issue, we altered the direction in which ratings were collected.


Non-independence of the ratees


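Undoubtedly, a small number of applicants have someone else write their ABS. More commonly, applicants show their ABS to friends, family, current students, or physicians and revise it based on the feedback.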

Undoubtedly a small percentage of candidates are less than scrupulous and hire ghostwriters in an attempt to generate a more appealing ABS. More commonly, however, candidates will pass their submissions around to friends, family, current students, or practicing physicians for feedback to improve the submission.



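This raises two measurement problems. First, as long as there is a ceiling on how good an ABS can look, and assuming that feedback does improve submissions, the submissions end up more homogeneous than the candidates, which lowers the maximum achievable reliability and validity. Second, even without this restriction of range, validity is in question, because it is unclear whether the score reflects the candidate or the candidate's support system.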

It creates a pair of measurement problems. First, given that there is an upper limit on how good an ABS can appear, and assuming that the collection of feedback results in improvement, the submissions may end up being more homogeneous than the candidates, thus lowering the maximum achievable reliability and validity. Second, even without restriction of range, the validity itself must be questioned as it becomes questionable whether one is discriminating between candidates or between candidate support systems.


Method



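The ABS written onsite was comparable with, but not identical to, the ABS submitted in advance. Its questions focused on ethical decision making, advocacy, and personal experiences.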

The onsite ABS questions were comparable with, but not identical to, the noninvigilated questions participants answered offsite, with questions focusing on ethical decision making, advocacy, and personal experiences.



The ABS submissions of 30 randomly selected applicants were scored with each of the two methods.

For a subset of 30 randomly selected candidates, two scoring methods were compared for each ABS.
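The two scoring methods differ only in the order in which a rater works through the same set of answers: vertical scoring (the traditional method) takes each candidate in turn and scores all of that candidate's questions, while horizontal scoring (the new method) takes each question in turn and scores all candidates on it. A minimal sketch of the two orderings follows; the function names and the candidate and question labels are hypothetical.

```python
def vertical_order(candidates, questions):
    """Traditional method: finish all questions for one candidate before moving on."""
    return [(c, q) for c in candidates for q in questions]


def horizontal_order(candidates, questions):
    """New method: score every candidate on one question before moving on."""
    return [(c, q) for q in questions for c in candidates]


candidates = ["A", "B", "C"]      # hypothetical anonymized candidates
questions = ["Q1", "Q2"]          # hypothetical question labels

print(vertical_order(candidates, questions))
print(horizontal_order(candidates, questions))
```

Both orderings cover exactly the same (candidate, question) pairs. Under the horizontal ordering a rater never reads two answers from the same candidate back to back, which is what makes it harder for an impression formed on one answer to carry over to the next.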



Results


 

Scores on the ABS submitted in advance were higher than scores on the ABS written onsite. A significant interaction shows that this main effect came from the traditional (offsite, vertical) scoring condition.

The scores for the ABS completed offsite (mean = 4.4) were significantly higher than those completed onsite (mean = 4.1; F = 5.7, p < .05). A significant interaction between site and scoring method (p < .01) revealed that this main effect was driven by a higher mean score in the traditional (offsite, vertical) scoring method (mean = 4.7) relative to the other three groups (mean = 4.0 to 4.2).


Interrater reliability was high for the onsite ABS. For the offsite ABS, it was moderate with horizontal scoring but poor with vertical scoring.

When the interrater reliability was assessed, it was found to be high for ABS’s completed onsite (0.81 with vertical scoring, 0.78 with horizontal scoring). However, the offsite ABS interrater reliability was moderate when horizontal scoring was used (0.69), but poor when vertical scoring was used (0.03).
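As a rough illustration of what an interrater coefficient captures, the sketch below averages the pairwise Pearson correlations among three raters' total scores for the same candidates. This is a deliberately simple stand-in: the excerpt does not state how the reported coefficients were computed, and the function names and all ratings here are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(ratings_by_rater):
    """Average Pearson correlation over every pair of raters."""
    pairs = list(combinations(ratings_by_rater, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


# Hypothetical total scores: three raters x five candidates.
agreeing = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [1, 3, 2, 5, 4]]
disagreeing = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [5, 1, 4, 2, 3]]

r_agree = mean_pairwise_correlation(agreeing)
r_disagree = mean_pairwise_correlation(disagreeing)
print(f"agreeing raters: {r_agree:.2f}, disagreeing raters: {r_disagree:.2f}")
```

Note that the second rater in both tables scores every candidate one point higher than the first yet correlates perfectly with the first: a correlation-based index rewards raters who rank candidates the same way, not raters who use the same absolute numbers.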


Most importantly, ABS scores correlated more strongly with the MMI under horizontal scoring than under vertical scoring.

Perhaps more importantly, the ABS scores correlated better with the MMI when the horizontal scoring method was used (r = 0.44 offsite and 0.65 onsite) relative to when the vertical scoring method was used (r = 0.12 offsite and 0.28 onsite).


Discussion


The higher internal consistency seen with vertical scoring supports the concern about a halo effect.

The higher internal consistency achieved using the vertical scoring method provides evidence for our concern that the halo effect may have been biasing ABS assessments.


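Considered in isolation, the ABS performed best when it was written onsite under invigilated, time-limited conditions and scored horizontally. Invigilation preserves the independence of the ratees. The ABS is not used in isolation, however. When the MMI and the onsite ABS are compared, the MMI is preferable for several reasons. First, in overall generalizability the MMI is at least as strong as the onsite ABS. Second, in terms of predictive validity, the MMI shows significant positive correlations with in-school measures. Third, scoring the onsite ABS takes rater time after the interview day and so delays decisions, whereas MMI scores are available on the day itself.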

Seen in a vacuum, the method of ABS administration that performed best is clearly application of the horizontal scoring method to submissions collected in onsite, invigilated, time-controlled circumstances. Invigilation ensures independence of the ratees. However, the ABS does not function in a vacuum. Given the choice between MMI and onsite ABS, the MMI is preferred for a number of reasons. First, in terms of overall test generalizability, the MMI is at least as strong as the onsite ABS [4]. Second, with respect to predictive validity, the MMI has demonstrated significant positive correlation with in-school measures [5] and national licensing examination scores [6]. Third, scoring of onsite ABS’s requires rater time subsequent to the date of interview, thus delaying decision making, whereas MMI scores are available immediately on that date.



 







Acad Med. 2006 Oct;81(10 Suppl):S70-3.

Medical school admissions: enhancing the reliability and validity of an autobiographical screening tool.

Author information

  • Program for Educational Research and Development, McMaster University, MDCL 3510, 1200 Main Street West, Hamilton, Ontario, L8N 3Z5, Canada. kelly.dore@learnlink.mcmaster.ca

Abstract

BACKGROUND:

Most medical school applicants are screened out preinterview. Some cognitive scores available preinterview and some noncognitive scores available at interview demonstrate reasonable reliability and predictive validity. A reliable preinterview noncognitive measure would relax dependence upon screening based entirely on cognitive tendencies.

METHOD:

In 2005, applicants interviewing at McMaster University's Michael G. DeGroote School of Medicine completed an offsite, noninvigilated Autobiographical Submission (ABS) preinterview and another onsite, invigilated ABS at interview. Traditional and new ABS scoring methods were compared, with raters either evaluating all ABS questions for each candidate in turn (vertical scoring, the traditional method) or evaluating all candidates for each question in turn (horizontal scoring, the new method).

RESULTS:

The new scoring method revealed lower internal consistency and higher interrater reliability relative to the traditional method. More importantly, the new scoring method correlated better with the Multiple Mini-Interview (MMI) relative to the traditional method.

CONCLUSIONS:

The new ABS scoring method revealed greater interrater reliability and predictive capacity, thus increasing its potential as a screen for noncognitive characteristics.

PMID: 17001140 [PubMed - indexed for MEDLINE]

