Rater and occasion impacts on the reliability of pre-admission assessments (Med Educ, 2009)

Rick D Axelson & Clarence D Kreiter

CONTEXT





Papers published over the past five years have developed and evaluated alternatives to the medical school pre-admission interview (MSPI). Several medical schools have already replaced the traditional MSPI with OSCE-style interview formats such as the MMI, which assess applicants' non-cognitive attributes and supplement measures of academic achievement (e.g. GPA) and cognitive aptitude (e.g. MCAT). Existing research finds these formats more reliable than the MSPI.

However, research published over the last 5 years has documented the development and evaluation of new assessment techniques that are designed as alternatives to the MSPI. This literature suggests that a number of medical schools have already replaced the traditional MSPI with objective structured clinical examination (OSCE)-style measurement methods. Similar in function to the MSPI, station-based OSCE methods such as the multiple mini-interview (MMI)1,2 attempt to assess applicants' non-cognitive attributes and generate scores that are used to supplement measures of undergraduate academic achievement (e.g. grade point average [GPA]) and cognitive aptitude (e.g. Medical College Admission Test [MCAT] score) in making admission decisions. Research suggests that, compared with the traditional MSPI, these new techniques yield summary scores with superior reliability.3–6


Although adopting these new methods may improve the reliability and validity of the non-cognitive information obtained about applicants, an OSCE-style interview requires medical schools to substantially redesign both the interview process and the applicant's campus-visit programme. Because the OSCE format uses multiple stations with multiple independent raters, many applicants, raters and standardised participants must gather on one campus at the same time, significantly changing the nature, function and cost of both the interview and the campus visit.

Although adopting this new method may improve the reliability and validity of the non-cognitive information obtained about an applicant, medical colleges must weigh this against the fact that OSCE-style assessments will require a significant restructuring of both the pre-admission interview process and the applicant's campus visit. Because the OSCE format requires multiple independently rated performances of each applicant in response to station challenges, logistics necessitate that a large number of applicants, raters and standardised participants be simultaneously present at one location on campus for the administration of this type of assessment. The changes required by an OSCE-style assessment can alter the nature, function and costs of both the interview and the campus visit.


Schools that use the interview to show applicants the strengths of the institution typically run relatively unstructured interviews with small groups of students. Although some have claimed the added expense is small, for these schools adopting an OSCE-style interview would substantially increase costs.

In schools that currently use the interview to familiarise applicants with the positive attributes of the institution, interviews generally require smaller groups of students and are typically conducted in relatively informal, less structured sessions. For these schools, implementing an OSCE-style approach would require significant restructuring of existing recruitment and admissions procedures and would probably increase their cost. Although some have maintained that the added expense is small,8 changing from the MSPI to an OSCE-style format will undoubtedly incur considerable development costs and alter the nature and function of the applicant's pre-admission campus visit.


Because the scenarios, tasks and interview questions used in an admissions OSCE depend on how the construct being assessed is defined and conceptualised, what happens within each station varies considerably across schools.

Because the scenarios, tasks and interview questions used in an admission OSCE differ depending on how assessment designers define and conceptualise the construct being assessed, there is considerable variability across schools in what transpires within the stations presented.


To build content for OSCE-style assessments, stations are designed around frameworks such as professionalism or job analysis, but all of these approaches tend to yield reliable scores. Despite differences in station content, every OSCE-style method has multiple independent raters evaluating each applicant. Because G studies and D studies of performance-based assessment consistently show that reliability rises as the number of independently rated behavioural samples increases, the high reliability obtained with the MMI is not surprising. Moreover, because validity generalisation theory holds that reliability sets the ceiling on attainable validity, improved validity can also be expected.

Approaches to shaping the content of the OSCE-style assessment have focused on designing station challenges that fit within a framework of professionalism, job analysis or another domain, but all tend to yield reliable scores. Yet, despite content differences, all OSCE-style methods are similar in that they elicit multiple independently rated applicant performances. Given that generalisability (G) studies and decision (D) studies of performance-based assessment scores have consistently demonstrated that increasing the number of independently rated behavioural samples also efficiently increases reliability, the positive reliability outcome from the use of the MMI is not surprising. Further, as validity generalisation theory suggests that reliability governs the maximum attainable validity,9 the positive validity outcomes are also expected.


However, there is some misunderstanding in interpreting G study results obtained from the MMI. Roberts et al. recently estimated a G coefficient of 0.70 for a composite of eight stations. Although this reliability is comparable to other MMI studies and far superior to the MSPI, Roberts et al. attributed it mainly to 'interviewer subjectivity'. Even if that were true, their G study does not support the conclusion. Moreover, as we will show, merely increasing the number of raters within a single interview room does not raise reliability to the level Roberts reported. If rater subjectivity were the main source of error, simply adding raters to a single station or running a panel interview should yield MMI-level reliability. The high reliability in Roberts' G study is more likely attributable to the multiple independently rated occasions.

It should be pointed out, however, that there remains some misunderstanding regarding the interpretation of G study results derived using MMI scores. In a recent example, Roberts et al.3 published a G study of an MMI trial and estimated a G coefficient of 0.70 for a score summarising performance on an eight-station MMI. Although this reliability result is consistent with other studies of the MMI and is far superior to results obtained with the MSPI,10 Roberts et al.3 interpreted the results as suggesting that ‘interviewer subjectivity’ is the most important determinant governing the level of obtained reliability. Although this may be true, their G study does not support this conclusion. Further, as we will show, increasing the number of raters for a single encounter does not yield reliabilities to the level reported by Roberts and his colleagues.3 If rater subjectivity were the primary source of error, simply adding raters to a single station or panel interview would achieve reliabilities similar to those reported for the MMI. In the G study reported by Roberts et al.,3 it seems much more likely that the high level of reliability can be primarily attributed to the number of independently rated occasions on which the applicant was allowed to perform.


To understand the validity of the MMI, it helps to study why OSCE-style techniques yield such high reliability.

In understanding MMI validity, it is informative to study why OSCE-style techniques yield these high reliabilities. Do they emanate from the number of raters, the unique challenges presented by the MMI assessment or, alternatively, from the OSCE-style measurement format that affords multiple opportunities to perform? To help address these issues, the present study examines whether independent replications of the MSPI are likely to positively impact reliability. If the MSPI achieves dramatically improved reliabilities with a simple strategic restructuring, this may also imply that this modified MSPI is a useful intermediate approach for those who are currently unable to implement MMIs.



METHODS


Each interviewee participated in a 25-minute interview conducted by two faculty members.

  • Interviews began with a structured component (5-point scale)
    Interviews began with a structured component, in which candidates were read and responded to a series of four predetermined questions. Answers to each question were independently and immediately scored by the interviewers on a scale of 1–5 (5 = excellent, 1 = poor) using an established scoring rubric.
  • Followed by an unstructured interview (5-point scale)
    Following the completion of the structured questions, the interview was opened to a free-flowing, unstructured exchange between the faculty interviewers and the candidate on any questions or topics of interest. At the end of the interview, each faculty interviewer independently assigned a score for the unstructured portion of the interview on a scale of 1–5 (5 = excellent, 1 = poor).
  • Roughly equal time was spent on the two parts
    On average, equal amounts of interview time were spent on the structured and unstructured parts of the interview during the study period (2003–2007).



Across the 5 years, 168 applicants were interviewed twice in consecutive years. As the faculty interviewers were drawn from a large pool (n > 150) and assigned to an applicant in a 'pseudo random' fashion, it is very unlikely that students who interviewed in consecutive years encountered the same interviewers. Consequently, a random model with rater (r) nested within both person (p) and occasion (o) and person crossed with occasion (r : [p · o]) was used to estimate variance components (VCs) for those applicants who interviewed twice.
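The r : (p · o) design described above implies a standard generalisability-theory score decomposition; the notation below is a textbook sketch added for illustration, not reproduced from the paper:

```latex
% Observed score for person p, rater r, occasion o,
% with raters nested within person-by-occasion cells:
X_{pro} = \mu + \nu_p + \nu_o + \nu_{po} + \nu_{r:po}

% Projected generalisability coefficient for a design with
% n_o occasions and n_r raters per occasion (relative error
% includes only effects that interact with persons):
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{po}/n_o + \sigma^2_{r:po}/(n_o \, n_r)}
```

Because raters are nested within occasions, adding raters shrinks only the σ²_{r:po} term, while adding occasions shrinks both error terms, which is why occasions dominate the D-study projections.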












Effect of increasing the number of stations > effect of increasing the number of raters

As shown in Fig. 2, increasing the number of interview occasions is much more effective than increasing the number of raters within an occasion. 

  • For example, the reliability estimate for one rater for one occasion is 0.23, but rises to 0.73 for nine occasions each with one rater. 
  • However, when the number of raters for a single occasion is increased, the reliability, estimated at 0.23 for one rater, increases to only 0.36 for nine raters.
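The reliability arithmetic above can be sketched as a D-study projection. The variance components below are illustrative values back-solved to approximately reproduce the reported coefficients (0.23, 0.73, 0.36); they are not the paper's estimated components.

```python
# D-study projection for the rater-nested-within-(person x occasion) design.

def g_coefficient(v_p, v_po, v_rpo, n_o, n_r):
    """Projected G coefficient for n_o occasions with n_r raters each.

    Adding raters divides only the rater:(p x o) term; adding occasions
    divides both error terms, so occasions are far more effective.
    """
    relative_error = v_po / n_o + v_rpo / (n_o * n_r)
    return v_p / (v_p + relative_error)

# Illustrative variance components (person, person-x-occasion, rater:(p x o)),
# chosen so the projections roughly match the values reported in the text.
v_p, v_po, v_rpo = 1.00, 1.58, 1.77

print(round(g_coefficient(v_p, v_po, v_rpo, n_o=1, n_r=1), 2))  # one rater, one occasion
print(round(g_coefficient(v_p, v_po, v_rpo, n_o=9, n_r=1), 2))  # nine occasions, one rater each
print(round(g_coefficient(v_p, v_po, v_rpo, n_o=1, n_r=9), 2))  # one occasion, nine raters
```

Under these assumed components, nine single-rater occasions project far higher reliability than nine raters on one occasion, mirroring Fig. 2.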



DISCUSSION


Multiple single-rater MSPIs alone can substantially raise reliability.

These results suggest that the reliability of a score reflecting the summary of performances on multiple single-rater MSPIs is likely to be quite high and that a simple modification of the panel interview might substantially improve the quality of interview scores. For those schools that are reluctant to implement an MMI, a restructured MSPI might prove to be an effective intermediate approach.


Increasing the number of questions or raters within a single interview is not as effective as replicating the entire interview.

As G studies clearly indicate that increasing the number of questions or raters within a single interview will not enhance reliability in the same way as replicating the entire interview process,10 changes to a single-interview format are unlikely to provide an efficient means of enhancing reliability.


This is the recommended approach.

In summary, this study indicates that replicating a number of brief interviews, each with one rater, is likely to be superior to the often recommended panel interview approach and may offer a practical, low-cost method for enhancing MSPI reliability.







Med Educ. 2009 Dec;43(12):1198-202. doi: 10.1111/j.1365-2923.2009.03537.x.


Author information

  • Department of Family Medicine, University of Iowa, Iowa City, USA. rick-axelson@uiowa.edu

Abstract

CONTEXT:

Some medical schools have recently replaced the medical school pre-admission interview (MSPI) with the multiple mini-interview (MMI), which utilises objective structured clinical examination (OSCE)-style measurement techniques. Their motivation for doing so stems from the superior reliabilities obtained with the OSCE-style measures. Other institutions, however, are hesitant to embrace the MMI format because of the time and costs involved in restructuring recruitment and admission procedures.

OBJECTIVES:

To shed light on the aetiology of the MMI's increased reliability and to explore the potential of an alternative, lower-cost interview format, this study examined the relative contributions of two facets (raters, occasions) to interview score reliability.

METHODS:

Institutional review board approval was obtained to conduct a study of all students who completed one or more MSPIs at a large Midwestern medical college during 2003-2007. Within this dataset, we identified 168 applicants who were interviewed twice in consecutive years and thus provided the requisite data for generalisability (G) and decision (D) studies examining these issues.

RESULTS:

Increasing the number of interview occasions contributed much more to score reliability than did increasing the number of raters.

CONCLUSIONS:

Replicating a number of interviews, each with one rater, is likely to be superior to the often recommended panel interview approach and may offer a practical, low-cost method for enhancing MSPI reliability. Whether such a method will ultimately enhance MSPI validity warrants further investigation.

PMID: 19930511 [PubMed - indexed for MEDLINE]

