Effect of test security violations on the MMI (Med Educ, 2006)

The effect of defined violations of test security on admissions outcomes using multiple mini-interviews

Harold I Reiter, Penny Salvatori, Jack Rosenfeld, Kien Trinh & Kevin W Eva






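In November 2001 the first MMI pilot project was completed. Modelled on the OSCE format, a 6-station MMI with 18 faux applicants showed promising generalisability (reliability), acceptability and feasibility. Subsequent large-scale studies of actual applicants in 2002 and 2003 reconfirmed these conclusions and provided preliminary evidence of predictive validity.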

In November 2001 the first pilot project of a multiple mini-interview (MMI) process for student admissions was completed.1 Modelled after the objective structured clinical examinations (OSCEs), a 6-station MMI with 18 faux-applicants generated promising data regarding overall test generalisability (reliability), acceptability and feasibility. Results of subsequent large-scale studies of actual medical school applicants in April 2002 and 2003 confirmed prior conclusions and generated preliminary data demonstrating predictive validity.1–3


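As the MMI moved toward implementation, concerns arose about test security, and these concerns are real. The integrity of the interview process depends on 2 critical factors: how high the stakes are, and what obstacles limit breaches of integrity. Admission is an extremely high-stakes examination and there are many ways to compromise its integrity, so MMI security is under threat: there is motive, method and opportunity. The station stems are likely to become available to the public, but it is uncertain whether such unscrupulous conduct gains anything.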

With the anticipated move towards MMI implementation, concerns arose regarding test security. Cause for concern is real. The integrity of the interview process, like any other evaluation in academia, is endangered to a greater or lesser extent based upon 2 critical factors. How high are the stakes involved? What obstacles are in place to limit the extent of breaches of academic integrity? As a result, the MMI provides an attractive target for such breaches. 

  • There is motive, with the exceedingly high stakes of career-making in the balance. 
  • There is method, with the explosion of communication technology decreasing obstacles to information dissemination. 
  • There is also opportunity, with the stems of interview stations available, of necessity, to those applicants undergoing the MMI. 

Thus the availability of stems to the general populace is anticipated. It remains far less certain whether anything is gained by such unscrupulous conduct.


More broadly, the key security question is how large the impact of a violation is. Of 18 studies of clinical skills assessment, 6 reported a statistically significant improvement, 4 a limited improvement and 8 no difference.

More broadly, the issue is one of determining the impact of security violations on perceived competence levels. Literature exists outlining this impact in the domain of clinical skills assessment. Of 18 studies, 6 showed a statistically significant improvement in performance after test security violations,4–9 4 showed limited benefits10–13 and 8 revealed no difference.14–21


In their review of this literature, Swanson et al. called for methodological improvement and identified 4 key methodological considerations for studies of this kind.

Swanson et al.,22 in their review of this literature, promoted the need for methodological improvements in this area. They described 4 key methodological aspects to ensure when designing these types of studies. As applied to the MMI, these are as follows.


  • 1 Subgroups of interviewees must be comparable; those being rated must be randomly assigned.
    Subgroups of applicants being interviewed must be comparable, achievable using random assignment of those being rated. 
  • 2 It must be certain that the security violation actually occurred.
    The violation(s) must be known to have occurred. 
  • 3 The sample size must be large enough (the study must have sufficient power) for the presumed impact to be detectable statistically.
    The study must have sufficient power, in terms of sample size, to enable any presumptive impact to be identifiable statistically. 
  • 4 The assessment tool must be sufficiently reliable.
    The tool must be sufficiently reliable for true shifts in the ability to perform to be detectable.




STUDY 1


Methods


57 applicants took part voluntarily in an MMI trial after completing their traditional interviews.

A total of 57 applicants to the MD programme participated in a voluntary trial run of the MMI after their traditional interviews were completed.


Half of the volunteers received the stems of all 9 stations 2 weeks in advance; the other half did not.

Two weeks in advance of the interview date, half of the volunteers were provided copies of all 9 station stems via electronic mail. Access to these 9 station stems remained restricted from the other half of the volunteers.
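Swanson et al.'s first criterion asks for comparable subgroups obtained by random assignment. The sketch below shows how 57 volunteers could be split at random into a stem-receiving group and a naive group. The helper name `split_volunteers`, the applicant IDs and the seed are hypothetical, and the source does not describe how its own randomisation was carried out.

```python
import random


def split_volunteers(applicant_ids, seed=None):
    """Randomly split applicants into (receives_stems, naive) groups.

    Hypothetical helper: shuffles a copy of the IDs and cuts it in half.
    With an odd number of applicants the naive group gets the extra one.
    """
    ids = list(applicant_ids)          # copy so the caller's sequence is untouched
    random.Random(seed).shuffle(ids)   # seeded RNG makes the split reproducible
    half = len(ids) // 2
    return ids[:half], ids[half:]


# 57 hypothetical volunteer IDs, as in Study 1
stems_group, naive_group = split_volunteers(range(1, 58), seed=2004)
print(len(stems_group), len(naive_group))  # 28 29
```

The groups reported in the Results (24 and 33) are not an exact halving of 57, and the source does not explain the imbalance. The sketch only illustrates the principle of random allocation.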


Two examiners rated global performance on a 7-point scale.

Two examiners provided a global performance rating for each candidate at each station using an anchored 7-point scale. 
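The source does not say how per-station ratings were combined into the candidate means reported in the Results (e.g. 4.97), which sit on the same 1–7 scale as individual ratings. The sketch below assumes the simplest option: an equally weighted mean of every examiner's rating across all stations. This is an assumption, not the paper's stated method, and `candidate_score` is a hypothetical name.

```python
from statistics import mean


def candidate_score(ratings_by_station):
    """Average all anchored 1-7 global ratings a candidate received.

    ratings_by_station: list of per-station lists, one rating per examiner
    (two examiners per station in Study 1). Assumes equal weighting of
    every rating, which the source does not confirm.
    """
    all_ratings = [r for station in ratings_by_station for r in station]
    if any(not 1 <= r <= 7 for r in all_ratings):
        raise ValueError("ratings must lie on the 1-7 anchored scale")
    return mean(all_ratings)


# Hypothetical candidate: 9 stations x 2 examiners
ratings = [[5, 6], [4, 5], [5, 5], [6, 6], [4, 4], [5, 6], [5, 4], [6, 5], [5, 5]]
print(round(candidate_score(ratings), 2))  # 5.06
```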



Results


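24 applicants received the station content 2 weeks in advance. The means differed by 0.06; for this difference to be statistically significant, a sample of 1495 per group would be needed.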

Twenty-four applicants received the station summaries 2 weeks in advance of their participation. The mean score of these participants was 4.97 (SD = 0.46). The 33 applicants who did not receive the stations in advance achieved a mean score of 4.91 (SD = 0.67). This difference is not statistically significant; F(1,55) = 0.19, MSE = 6.22, P > 0.65. To reveal a difference of 0.06 to be significant with the pooled standard deviation of 0.58 would require a sample size of 1495 per group.
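The 1495-per-group figure is the kind of number a standard two-sample sample-size calculation produces. The paper does not report its α, power or method. The sketch below therefore assumes a two-sided α = 0.05, 80% power and the usual normal-approximation formula n = 2((z(1−α/2) + z(power))·σ/δ)². The helper name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(diff, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group needed to detect `diff` between
    two independent means with common SD `sd` (two-sided test, normal
    approximation). The alpha and power defaults are assumptions."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)             # about 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / diff) ** 2)


# Study 1: difference 0.06, pooled SD 0.58
print(n_per_group(0.06, 0.58))  # 1467
```

Under these assumptions the result is roughly 1,470 per group. That is the same order as the reported 1495 but not identical, and the gap may reflect assumptions or rounding not reported in the source.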


Discussion


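The difference between groups was in the worrying direction, but it was very small: a very large number of applicants would be needed for it to reach significance. In other words, it is clinically unimportant.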

While the difference between groups is in the direction that would cause concern, it is so minuscule that 7 times the number of participants that the MD programme interviews typically would be required to show the difference to be significant, thereby suggesting that the result would be clinically unimportant even if large enough samples were drawn.




STUDY 2


Methods


Conducted in March and April 2004 as a real, high-stakes MMI.

The second study occurred in March/April 2004 with the first real high-stakes implementation of the MMI.


24 stations were developed and run over 2 days, with 12 stations each day. One examiner per station rated performance on a 7-point scale.

Twenty-four stations were developed, with 12 used on each of 2 interview dates. The 24 stations again focused upon personal quality domains. The system of scoring remained similar to that described above, with the exception that only 1 examiner was present per station.


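Two of the 12 stations were pilot stations whose scores did not count toward the total. Half of the applicants received the stem of one pilot station in advance, and the other half received the stem of the other. On interview day, applicants were told that some stations were included for pilot purposes and would not affect admission decisions, but not which ones. Repeated-measures t-tests compared scores on the station seen in advance with scores on stations the applicants had not seen.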

Once again an intentional security violation was introduced, this time by using 2 of the 12 stations as pilot stations, scores on which did not count toward the admissions decision. Half the applicants received 1 of the 2 pilot stations with their mailed letter inviting them to interview; in a covering letter they were told to expect to encounter that particular station during their interview. The other half of the applicants received the other pilot station in the same manner. On the day of the interview applicants were told that some stations were included for pilot purposes and that these stations would not count towards their admissions decision; they were not told, however, which stations fell into this category. Repeated measures t-tests were used to compare scores on the station seen in advance to scores received on stations to which applicants were naive.
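A repeated-measures (paired) t-test compares each applicant with themselves: it looks at the within-applicant difference between the score on the station seen in advance and the score on naive stations. The sketch below uses hypothetical data, and `paired_t` is a hypothetical name. It also assumes each applicant's naive score is a single number (for example, a mean over naive stations), because the source does not detail how naive scores were summarised.

```python
import math
from statistics import mean, stdev


def paired_t(seen, naive):
    """Paired t statistic and degrees of freedom for within-applicant
    differences (seen-in-advance score minus naive score)."""
    if len(seen) != len(naive) or len(seen) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [s - n for s, n in zip(seen, naive)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    se = sd / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1


# Hypothetical scores for 4 applicants on the 7-point scale
seen = [5, 4, 6, 5]
naive = [4, 4, 5, 5]
t, df = paired_t(seen, naive)
print(round(t, 3), df)  # 1.732 3
```

Turning t into a P value needs the t distribution (for example from SciPy), which the standard library does not provide.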


Results


Mean scores.

The mean overall performance score received by candidates per station was 4.94 (SD = 1.10). The overall test–retest reliability of this 12-station MMI with 1 examiner per station was 0.70.


Discussion


Despite the high stakes, and despite the stems being provided 2 weeks in advance, there was no benefit.

Despite the high stakes nature of this interview process and the fact that stations were delivered 2 weeks in advance with clear indication that they would be included in the interview, we again witnessed no benefit of prior exposure in the performance ratings assigned.


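However, several examiners who were unaware of the intervention remarked spontaneously that some applicants' responses seemed over-rehearsed. This suggests a possible mechanism by which advance knowledge of the stations loses its benefit.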

Anecdotally, a number of examiners, each of whom were blinded to the intervention, noted spontaneously that some responses seemed too rehearsed, potentially providing insight into the mechanism by which potential benefits of prior knowledge of the stations are lost.




STUDY 3 


Methods


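Of the occupational therapy (OT) interviewees, 38 had also applied to physiotherapy (PT). These 38 applicants sat the same 7-station MMI twice: for OT selection in the morning and for PT selection in the afternoon. They therefore had not only the station stems but also actual experience of the stations. Interviewers in this programme did not know which candidates were repeat interviewees. Interviewers rated overall performance and suitability for the profession, each on a 7-point scale. Because the two ratings correlated above 0.95, only the overall performance score is compared.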

Of the interviewees for occupational therapy seats, 38 also interviewed for physiotherapy seats. These 38 applicants underwent the same 7-station MMI for both interviews (OT in the morning and PT in the afternoon). They were therefore privy not only to the stems of the MMI stations, but also potentially gained benefit from the experience of working through the 7 stations with an interviewer. As before, the stations focused on personal quality domains and were globally scored using a 7-point anchored scale. Interviewers in this programme were blinded to the candidates being repeated interviewees. They were asked to assign ratings of each candidate's overall performance and to provide a 7-point gut opinion of the person as a candidate for the profession/programme. The correlation between these two scores was greater than 0.95, so only the overall performance score will be reported for the sake of comparison with the first 2 studies outlined above.


The means differed by 0.01; over 29,000 per group would be needed for the difference to be significant.

The mean score provided to the sample of 38 applicants during their interview for the OT programme was 3.46 (SD = 0.43). The mean score provided to the same group during their interview for the PT programme was 3.45 (SD = 0.44). This difference is not statistically significant; t(37) = 0.14, P > 0.8. To reveal a difference of 0.01 to be significant with the pooled standard deviation of 0.43 would require a sample size of over 29 000 per group.
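The source states neither the significance level nor the power. Assuming a two-sided α = 0.05 and 80% power, and treating the comparison as two independent groups as the text does, the 29 000 figure follows from the normal-approximation sample-size formula:

```latex
n \approx 2\left(\frac{(z_{0.975} + z_{0.80})\,\sigma}{\delta}\right)^2
  = 2\left(\frac{(1.960 + 0.842)\times 0.43}{0.01}\right)^2
  \approx 2\,(120.5)^2 \approx 29\,000
```

This agrees with the reported "over 29 000 per group". Study 3 is strictly a within-subject comparison, for which a paired-design calculation based on the SD of the differences would apply; the source reports the independent-groups figure.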






GENERAL DISCUSSION



Developing MMI stations is labour intensive and takes several steps. It produces the following written materials.

MMI station development can be labour intensive, requiring several steps.24 The written product consists of:


1 Stem: A station stem entitled Instructions for the Applicant 

2 Station guide (for the interviewer): A station guide entitled Instructions for the Observer 

3 In-depth review of the station's background and theory: An in-depth review of the station implications entitled Background and Theory.

4 Score sheet: A station score sheet. 


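Of these 4 documents, the first is available to the applicant at the station; the other 3 are available only to the examiner. All documents are closely guarded. Examiners receive general MMI training well in advance, but they receive station-specific training only on the morning of the interview. Applicants first see the stems when the MMI begins. Even so, having no paper or pen has not prevented them from reconstructing and publishing the stems.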

The first of these 4 is available to the applicant throughout the station; the other 3 are available to the observer only. All documents are jealously guarded. While observers receive general MMI training well in advance, they remain station-naive until the morning of the interview date when they receive station-specific training. Once the MMI commences, the interviewed applicants become privy to the station stems. Their lack of paper and pen has not significantly constrained the subsequent publication of those stems.


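The March 2004 MMI illustrated the security challenges posed by hand-held computers, wireless communication and the internet. An interviewed applicant posted the first comment describing how the MMI was run on a forum website 7 minutes after the MMI ended. Within the following weeks, reasonably accurate reconstructions of all 24 station stems were available on the same site.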

The practical administration of the MMI in March 2004 provided a sample of security challenges in the age of hand-held computers, wireless communication and the internet. The first comment about the MMI by an interviewed applicant, informing others about how the MMI was run, was posted in a forum website 7 minutes after MMI completion.23 In the subsequent weeks, after interviews were completed, reasonably accurate descriptions of all 24 MMI station stems could be viewed on the same site. 


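The rapid spread of the stems is therefore hardly surprising, but it is apparently not a cause for concern for MMI implementation. The effect of leaked checklists and Background and Theory documents remains unknown. The more practical concern, leaked station stems, appears misplaced.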

The rapid publication of these stems was therefore hardly surprising. Nor, apparently, is it particularly unnerving for prospective MMI implementation. While the effects of security violations of station checklists and background and theory remain unknown, the more practical concern regarding security violation of station stems appears misplaced.


One plausible explanation for this counter-intuitive result is that MMI stations, unlike OSCEs and other knowledge/ability tests, are designed so that there is no single correct answer. This lets the interviewer challenge whatever response the applicant gives. Preparing for every possible challenge is very difficult, so an applicant who tries to force a pre-planned agenda onto the discussion may actually perform worse.

One plausible explanation of this counter-intuitive result is that MMI stations, unlike OSCEs and other knowledge/ability tests, are designed to guard against the possibility of there being 1 correct answer, thereby allowing the interviewer to challenge any response provided by the candidate. It would be very difficult to prepare responses for every possible challenge, thus resulting in poorer performance if a candidate attempts to force a pre-planned agenda on the discussion.


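In these studies the security violation was limited to the stems. Since a 2-week window of opportunity produced no improvement, the length of time applicants have the stems does not appear to matter.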

In these studies the extent of the violation was limited to the availability of the stem. Our results show that there was no score enhancement despite the 2-week window of opportunity. It appears that time delay is not an issue.


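Less clear is the effect of the extent of the violation. In the first two studies, leaking only the stems had no significant effect. The more extensive test–retest violation in the third study likewise had no effect on scores, but this may reflect its short time delay (only a few hours).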

Less clear is the influence of the extent of the violation. One5 of 2 OSCE studies5,15 with egregious and identifiable violations suggested that extent of violation is a critical factor. In the first 2 MMI studies broadcasting of the stem alone, a more limited violation, had no significant impact on test scores. The more extensive, test–retest violation of the third MMI study also failed to demonstrate any impact on scores. However, this may have been a result of the short time delay (several hours only), and thus short potential responsive preparatory time between information access and the retest.


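Neither the extent of the violation nor the time delay alone is sufficient to raise MMI scores, although together they might be. Alternatively, the absence of a single correct answer in the MMI may rule out unwanted performance gains altogether.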

Alone, neither factor is sufficient to enhance MMI scores. Together, they may be sufficient. Alternatively, the absence of correct answers on MMI performance might result in no unwanted performance modification, even in a setting combining both more extensive security violations and greater time delay between violation and subsequent performance.




CONCLUSIONS


The stages from applying to medical school to specialist certification:

From 

  • application to medical school through to its successful completion,24 
  • national licensing examinations for general medical licence,25 
  • entry into one's preferred speciality training26 and to 
  • speciality certification,27 


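Of these, by far the highest hurdle is admission to medical school: only 3.8% of applicants to McMaster's undergraduate medical programme were admitted in 2004. Of those admitted, 99% graduate. On first sitting, Canadian graduates pass Part I of the LMCC examination at a rate above 95% and Part II at 91%. They have an 88% likelihood of being chosen by their preferred speciality training programme, and 91% pass the Royal College specialty certification examination on first attempt.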

the single greatest hurdle in terms of likelihood of success is, overwhelmingly, admission to medical school. Only 3.8% of applicants to the McMaster University Undergraduate Medical Program were admitted in 2004.24 Of those who enter the programme, 99% graduate.25 Canadian graduates nationally enjoy a greater than 95% success rate on Part I and 91% success rate on Part II, respectively, of the Licentiate Medical Council of Canada examination upon their first sitting of each examination.26 They also enjoy an 88% likelihood of being chosen by the preferred speciality training programmes in Canada27 and a 91% first attempt success rate on Royal College fellowship speciality certification examinations.28
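To make the comparison concrete, the later-stage rates quoted above can be chained together. This is a rough illustration only, not a calculation from the paper. It treats each rate as an independent, sequential step, uses 95% as a floor for the "greater than 95%" Part I rate, and mixes pass rates with the 88% preferred-speciality rate.

```python
import math

ADMISSION_RATE = 0.038  # McMaster, 2004, as quoted in the text

# Later-stage rates as quoted in the text (simplifying assumptions above)
later_stages = {
    "graduate from the MD programme": 0.99,
    "LMCC Part I, first sitting (lower bound)": 0.95,
    "LMCC Part II, first sitting": 0.91,
    "chosen by preferred speciality programme": 0.88,
    "Royal College certification, first attempt": 0.91,
}

p_later = math.prod(later_stages.values())
print(f"all later stages: {p_later:.1%}")             # all later stages: 68.5%
print(f"admission alone:  {ADMISSION_RATE:.1%}")      # admission alone:  3.8%
print(f"ratio: {p_later / ADMISSION_RATE:.0f}x")      # ratio: 18x
```

Even under these crude assumptions, clearing every later stage is roughly 18 times as likely as clearing the admission stage alone.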


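In the American and Canadian systems, once students are admitted to medical school, they are unlikely to fail to qualify in their preferred speciality. This is not meant to belittle the efforts of medical students and residents. Rather, it recognises that the odds favour them at the later stages, whereas the same effort applied at the admission stage yields far lower success rates.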

In the American and Canadian systems, failure to complete medical school through failure to obtain one's preferred speciality or family practice certification remains unlikely. This is not meant to denigrate the Herculean efforts and enormous talents required on the part of dedicated medical students and residents, but rather to recognise that the odds are very much in their favour for those later stages. That same effort, talent and dedication, expended at the level of admission to medical school, combine for far lower success rates.


Under normal circumstances, including security violations that expose the station stems, confidence in MMI outcomes can be maintained.

Under normal circumstances, including the potential security violation of distribution of station stems, confidence in the veracity of MMI outcomes can be maintained.




24 Ontario Medical School Application Service Statistical Summary 2004. Ontario: Ontario Universities' Application Centre, Council of Ontario Universities, 8 October 2004.









 2006 Jan;40(1):36-42.

The effect of defined violations of test security on admissions outcomes using multiple mini-interviews.

Author information

  • 1Dept. of Clinical Epidemiology and Biostatistics, McMaster University, 1200 Main Street West, Hamilton, Ontario L8Z 3N5, Canada.

Abstract

INTRODUCTION:

Heterogeneous results exist regarding the impact of security violations on student performances in objective structured clinical examinations (OSCEs). Three separate studies investigate whether anticipated security violations result in undesirable enhancement of MMI performance ratings.

METHODS:

Study 1: low-stakes: MMI station stems provided to a random half of 57 medical school applicants 2 weeks in advance of participation in a research study. Study 2: high-stakes: 384 medical school applicants sat a 12-station MMI to determine admission. Each half received 1 of 2 pilot MMI station stems 2 weeks in advance. Study 3: high-stakes: 38 interviewees with dual applications to occupational therapy and physiotherapy experienced the same 7-station MMI twice on the same date.

RESULTS:

No statistically significant differences in MMI performances were detected.

CONCLUSIONS:

Predictable violations of MMI security do not unduly influence applicant performance ratings.

PMID: 16441321
 

