The Morality of Medical School Admissions (Adv Health Sci Educ Theory Pract. 2004)

Editorial – The Morality of Medical School Admissions

GEOFF NORMAN




This issue is not quite a theme issue, but it carries three original articles and a Reflections piece on the admissions process. Rather than focusing on reliability and validity, as many earlier articles did, they critically examine the values built into admissions and how those values are operationalized.

This issue is not exactly a theme issue, but we do have three original articles and a Reflections piece (Eva & Reiter, 2004; Ginsburg, Schreiber & Regehr, 2004; Kreiter et al., 2004; Marrin et al., 2004) all related to the admissions process. In combination, these articles present a unique perspective, focussing not simply on the reliability and validity of the process, as many articles have done before (Salvatori, 2001; Kulatunga-Moruzi & Norman, 2002), but instead critically examining the values inherent in the process and the operationalization of these values in admissions measures. 


Eva's point deserves reinforcing: needing more than good marks does not force a choice between good marks and good people. That choice would arise only if marks and personal qualities were perfectly inversely correlated. In Eva's data the correlation is +0.15, so many candidates have both, and those are the ones to select.

As Eva has discussed, one dilemma is that, while all will accept that ‘it takes more than high marks to make a good doctor’, we continue to act as if the ‘high marks’ and the ‘more than’ are mutually exclusive. Let me take this opportunity to reinforce a point that Eva makes which is often forgotten. That it takes more than good marks to make a good clinician does not imply that we will inevitably find ourselves having to choose between good marks and nice persons. That strategy would only be necessary if marks and charm were perfectly inversely correlated. Then we really would not be able to find someone who has both. But as Eva shows, marks and personality are not inversely related; if anything they are positively correlated. In Eva’s data set, the correlation was +0.15. Even with a zero correlation, there are still lots of candidates who have both marks and charm. The implication Eva draws from this is worth reiterating. It need not be a choice between high marks and nice persons. There are lots of candidates who have both, and these are the ones we should select.
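The claim that a zero or weakly positive correlation still leaves many candidates with both qualities can be checked with a small simulation. The sketch below is illustrative only: it assumes marks and personal qualities behave like standard normal scores with correlation `r`, and the function name and the top-quartile cutoff are hypothetical choices, not anything taken from Eva's data.

```python
import math
import random


def share_with_both(r, cutoff=0.25, n=100_000, seed=0):
    """Estimate the share of applicants in the top `cutoff` fraction on BOTH traits.

    Assumption (not from the source): marks and personal qualities are
    standard normal scores with correlation r.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        marks = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # Build a second score whose correlation with `marks` is r.
        charm = r * marks + math.sqrt(1.0 - r * r) * noise
        pairs.append((marks, charm))
    k = int(n * cutoff)
    # Cut-off values that mark the top k applicants on each trait.
    marks_cut = sorted(p[0] for p in pairs)[n - k]
    charm_cut = sorted(p[1] for p in pairs)[n - k]
    both = sum(1 for m, c in pairs if m >= marks_cut and c >= charm_cut)
    return both / n


if __name__ == "__main__":
    for r in (-1.0, 0.0, 0.15):
        print(f"r = {r:+.2f}: share in top quartile on both = {share_with_both(r):.3f}")
```

Under these assumptions, a zero correlation puts the share near 0.25 × 0.25, or about 6% of the pool, a small positive correlation raises it, and it falls to zero only when the two traits are perfectly inversely related.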


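Moving from endorsing that statement to actual practice, however, is a slippery slope. Marks are easy to assess, despite some concern about standardization across programs, universities and countries. Non-cognitive factors are much harder to measure.

  • For autobiographical letters, fraud is not merely a possibility; it is a well-documented fact.

  • The reliability of measures such as the interview is questionable, as Kreiter shows in this issue.

  • In Kulatunga-Moruzi's study, the predictive validity of interviews and letters was very low, even for non-cognitive outcomes such as communication skills.

It is no great exaggeration to say that, beyond marks, many schools are running a very elaborate, labour-intensive and expensive lottery.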

However, once you go beyond endorsement of the statement to actual practice, it becomes a very slippery slope. We can easily assess marks, albeit with some concern about standardization across programs, universities and countries. However, when we turn to measures of non-cognitive factors, measurement is much more difficult. For autobiographical letters, fraud is not a possibility; it is a well-documented fact. The reliability of measures like the interview is questionable, as shown by Kreiter in this issue. And in Kulatunga-Moruzi’s study, the predictive validity of interview and letter, even in predicting non-cognitive measures like communication skills, was very low. It is not too big a stretch to suggest that, once we go beyond marks, many of our schools are engaged in the process of conducting a very elaborate, labour-intensive, and expensive lottery. 


Some jurisdictions have recognized this and acted. The Dutch have selected students through a state-run lottery since the 1970s, with higher marks earning better odds. A few schools add a letter or interview, but most do not.

Some jurisdictions have recognized this and done something about it. The Dutch, widely admired for their pragmatism, have been selecting students using a state-run lottery, where the higher your marks the more tickets you get, since the 1970s. Some schools attempt an additional letter or interview, but most do not. 
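A lottery in which higher marks earn more tickets can be sketched in a few lines. This is a hypothetical illustration: the source does not describe how the Dutch system converts marks into tickets, so the ticket counts, the function name `weighted_lottery`, and the draw-without-replacement rule are all assumptions.

```python
import random


def weighted_lottery(applicants, places, seed=None):
    """Draw up to `places` distinct winners; each applicant holds some tickets.

    `applicants` maps a name to a positive ticket count. The counts are
    hypothetical; the source only says higher marks earn more tickets.
    """
    rng = random.Random(seed)
    remaining = dict(applicants)
    winners = []
    while remaining and len(winners) < places:
        names = list(remaining)
        weights = [remaining[name] for name in names]
        # Chance of being drawn is proportional to tickets still in the pool.
        pick = rng.choices(names, weights=weights, k=1)[0]
        winners.append(pick)
        del remaining[pick]  # a winner cannot be drawn twice
    return winners


if __name__ == "__main__":
    # Hypothetical pool: stronger marks map to more tickets.
    pool = {"A": 1, "B": 2, "C": 3, "D": 4}
    print(weighted_lottery(pool, places=2, seed=1))
```

The design keeps chance explicit: marks shift the odds, but no applicant is told that a rejection reflects a personal failing.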


The rest of us keep relying on letters and interviews despite our misgivings. According to Nayer (1992), more than 99% of North American medical schools still use an interview, apparently ignoring ample data on its limited value. One regrettable side effect of this attachment is a near-universal condemnation of marks in admissions.

But the rest of us appear to soldier on with our letters and interviews, despite the misgivings we may have. As Nayer (1992) indicated, more than 99% of North American medical schools still rely on an interview, apparently choosing to ignore a wealth of data suggesting that it is of limited value. Regrettably, one other negative consequence of this romantic adherence to the ritual of the interview is an almost universal condemnation of the use of marks for admissions. As Best (1989) said, quoted by Powis (2004): 


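Best's claim, as quoted by Powis: equating high marks in an examination at the end of adolescence with a humane, caring profession is a leap of logic, sustained only because no one has an alternative strong enough to displace the 'high enough mark' method.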

‘The leap of logic that equates high marks in an examination at the terminal end of adolescence with a humane and caring medical profession is a nonsense, but is sustained because nobody has any other solution which is strong enough to combat… the ‘‘high enough mark’’ method.’ 



The first half of Best's claim is easy to dispute. Marks are not 'nonsense': they reflect years of performance and index intelligence, motivation and mastery of the subjects that underpin medicine, so it is unsurprising that past performance consistently predicts future performance.

It is easy to take issue with the first half of the method. A focus on marks is not ‘nonsense’. Marks reflect a number of years of university or high school performance. As such, they are an index of both intelligence and motivation, as well as mastery of subject areas essential for understanding the concepts underpinning medicine. It is not, therefore, surprising that marks (i.e., past performance) are consistently the best predictor of future performance, whether it is marks in university predicting 

  • medical school performance and performance on licensing examinations (Kulatunga-Moruzi & Norman, 2002; Salvatori, 2001), 

  • performance on licensing examinations predicting performance in specialties (Case & Swanson, 1993), or 

  • marks on specialty examinations predicting performance in practice (Norcini, Lipner & Kimball, 2002; Ramsey et al., 1989). 


The correlations are far from trivial (typically 0.3 to 0.6) and scale with how closely the content matches (Case & Swanson, 1993). Nor is this only tests predicting tests: the practice outcomes were peer ratings in one study and post-MI patient mortality in the other. There are exceptions, such as the structured interview that best predicted withdrawal from medical school (Neame, Powis & Bristow, 1992) but did not predict intern performance (Rolfe et al., 1995).

Moreover, the correlations are far from trivial, typically in the range from 0.3 to 0.6, with the magnitude of the correlation directly related to the matching of content (Case & Swanson, 1993). Further, it is not just a case of tests predicting tests – the practice measures were, in one study, peer ratings, and in the other, patient mortality rates post-MI. Of course, there are exceptions; for example, Neame, Powis & Bristow (1992) showed that a structured interview was the best predictor of withdrawal from medical school. But it did not predict intern performance (Rolfe et al., 1995). 
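One common way to read correlations of this size, offered here as an illustrative aside and not as part of the editorial's argument, is the share of outcome variance that a linear predictor accounts for, which is the square of the correlation:

```latex
r = 0.3 \;\Rightarrow\; r^2 = 0.09, \qquad r = 0.6 \;\Rightarrow\; r^2 = 0.36
```

On that reading, the reported range corresponds to roughly 9% to 36% of the variance in the later outcome.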


The second half is what really matters. Because marks predict performance more consistently than non-cognitive measures, the answer is not to de-emphasize marks but to develop better measures of the other characteristics we have measured poorly. The multiple mini-interview (MMI; Eva et al., 2004), with its OSCE-like sampling, shows real promise, and it is solid research of this kind, not empty polemic, that can change things.

But it is the second half of the quote that really matters. Given that marks do predict performance, with a consistency and at a level that cannot be matched by non-cognitive measures, it makes no sense to pine for some utopian future time when they will not count. The solution does not come from de-emphasizing marks, but from developing better measures of other characteristics that are equally important, but poorly measured. The multiple mini-interview (MMI; Eva et al., 2004, in press), with its OSCE-like sampling strategy, appears to have real promise. But whether the MMI ultimately wins or loses, it is solid research like this, rather than empty polemics about the sorry state of affairs, that has real potential to change things. 


Ignorance of this research may sometimes be forgivable, since busy admissions officers cannot be expected to keep up with the literature. Here, though, the harm goes beyond wasted resources and amounts to negligence, which is where the 'morality' of the title comes in.

Ignorance of this substantial research may be forgivable under some circumstances; after all we cannot expect busy admissions officers to also keep up with the literature. But in this case, I believe the damage amounts to more than just squandering resources; it is nothing less than negligence. This is where the ‘morality’ part of my title emerges.


Consider Betsy, whom I met at a psychology party some years ago. She had just finished a Ph.D. in psychology, was charming, outgoing and plainly intelligent, and had again failed to get into medical school, convinced she had 'blown' the interview. I told her to 'buy another lottery ticket', but however often I repeated that the fault lay in a seriously flawed process and not in her character, she could not stop internalizing the failure.

May I introduce you to Betsy. I met her at a Psychology party some years ago. She had just completed her Ph.D. in psychology, and was hoping to go to med school. She was, and is, charming, outgoing, and obviously very intelligent. We started talking because she knew I was involved in the med school, and she had just tried, and failed, to get into medical school – again. She was devastated, and thought she knew why – she had ‘blown’ the interview. What should she do, she asked. ‘Buy another lottery ticket’, I replied. But I did not understand – she had failed the interview. There was something wrong with her. Wasn’t there something she could do to remediate her obvious failings? As I continued to reiterate that it was not her fault, that it was not a character defect, that it was simply a seriously flawed process, I began to realize that, although at an intellectual level she must have understood my arguments, at another level, she could not stop internalizing her failure.


Therein lies the real evil. It is not only that interviews and letters waste faculty and community resources, that the true cost of volunteer time is enormous, or that the benefit for selecting better students is highly dubious, although any one of these would justify questioning these measures. It is that, in choosing the 10% deemed worthy, we tell the other 90% that they are unworthy, not good enough, and personally flawed.

And therein lies the real evil. It is not that admissions interviews and letters are a costly waste of faculty and community resources; it is not that the real cost of admissions when you factor in this volunteer time is enormous; it is not that the benefit of this whole process in terms of selecting better medical students is highly dubious. Any one of these things would be sufficient to challenge the unquestioning use of these measures. But in the course of selecting the 10% who are worthy of admission (and hence guaranteed an esteemed and well-paid place in society), we are telling the other 90% 

that they are unworthy; 

that they are not good enough, 

that they have personal failings. 


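Yet our evidence is probably little better than a horoscope. At least applicants judged by horoscope would mostly not be devastated to learn that their personal qualities fell short (and those who do believe their horoscopes may truly be unworthy of admission).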

Yet the evidence we have is likely little better than a horoscope. At least if we judged them by a horoscope, most would not be devastated to learn that their personal qualities did not measure up (and those who did believe their horoscopes may truly be unworthy of admission!). 


Why, in the era of Best Evidence Medical Education, do admissions committees cling to discredited methods such as autobiographical letters, when evidence of their limited value has accumulated for decades? Had the resources spent on this professional crap shoot gone to creative alternatives like the Multiple Mini-Interview described in Eva's article, the problem might have been solved long ago.

Why, in this era of Best Evidence Medical Education, do admissions committees cling to discredited methods like autobiographical letters? Evidence of their limited value has been available for decades. If, instead of expending enormous resources to continue this professional crap shoot, we had diverted funds to creative alternatives like the Multiple Mini-Interview, described in Eva’s article, we could have had the issue resolved a long time ago.


We can and must do better. We owe it to future generations of applicants, and to ourselves, who may well benefit from their care.

We can, and must, do better. We owe it to future generations of applicants, and to ourselves, who may well be beneficiaries of their care.







The morality of medical school admissions. PMID: 15222333 [Indexed for MEDLINE]

Are Students Who Majored in the Social Sciences and Humanities (SSH) Marginalized in the Medical School Admission Process? Review and Contextualization (Acad Med, 2014)

Is Social Sciences and Humanities (SSH) Premedical Education Marginalized in the Medical School Admission Process? A Review and Contextualization of the Literature 

Justin N. Hall, MSc, MPH, Nicole Woods, PhD, and Mark D. Hanson, MD, MEd, FRCPC




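MCAT2015 reflects expert opinion on the roles of the natural sciences, the behavioral and social sciences, and the humanities in selecting medical students.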

These reports and the MCAT2015 reflect current expert opinion regarding the contributions of natural sciences, behavioral and social sciences, and the humanities to medical student selection.



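A long-running debate in U.S. premedical education asks whether SSH majors are as well prepared for medical school as science majors and perform as well once there. In 1910 Flexner proposed changes to admission policy alongside his reforms of medical education: physics, biology, chemistry and laboratory experience were to be the foundation of premedical study, and students weak in them were unqualified. That ethos spread through U.S. and Canadian medical schools, most of which require science and mathematics prerequisites.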

There is a long-standing debate regarding U.S. premedical education. That is, are students who majored in SSH disciplines as prepared as their science major counterparts, and do they perform as well in medical school? In his 1910 report, Flexner5 recommended sweeping changes to medical education including medical school admission policy. He described physics, biology, and chemistry, including laboratory experience, as foundational to the premedical curriculum, asserting that poor performance in these disciplines could identify unqualified applicants.6 Flexner’s ethos continues to permeate U.S. and Canadian medical schools, as most require some combination of prerequisite courses in science and mathematics.


At the centenary of the Flexner report, scholars revisited the place of the humanities in premedical education. Riggs argued that missing humanities prerequisites should bar admission just as missing science prerequisites do. In 1978 Thomas criticized the damage admission policies did to premedical curricula, which emphasized science at the expense of a liberal arts education. He wanted premedical study to center on the literary classics, with in-depth science left to medical school, and the MCAT either dropped or refocused on literature, the humanities and history.

On the centenary of the Flexner report, scholars revisited the importance of humanities within premedical education.7,8 Riggs7 opined that the absence of humanities prerequisites should preclude medical school admission as does the absence of science prerequisites. In 1978, Thomas9 decried the detrimental impact of medical school admission policies on the premedical curriculum, criticizing premedical education’s emphasis on sciences at the expense of a liberal arts education. He suggested that premedical education should focus on the classics in literature, with in-depth study of sciences saved for medical school itself. He also suggested that the MCAT either be dropped entirely or changed to lessen the focus on sciences and increase the focus on literature, humanities, and history.



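Science prerequisites nonetheless remain central to admission policy. Of the 10 most commonly required premedical courses in the AAMC guidebook, 8 are science or mathematics (e.g., physics, organic and inorganic chemistry) and only 2 are SSH (English and humanities). Muller and Kase report, from AAMC data, that fewer than 18% of students matriculating in 2009 were SSH majors.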

However, science prerequisites continue to be emphasized in medical school admission policies today. The AAMC Medical School Admission Requirements Guidebook identifies the 10 most common premedical course requirements,11 8 of which are science and mathematics courses (e.g., physics, organic and inorganic chemistry); only 2 are SSH courses (English and humanities). Muller and Kase12 report that, according to AAMC data, less than 18% of medical students matriculating in 2009 were SSH majors.


Given this long-standing debate over SSH premedical education, we ask the following.

Given the long-standing debate about SSH premedical education, we pose the following question:

  • Does SSH premedical education have a role in today’s medical school admission process?
  • That is, should there be prerequisite SSH course requirements for all applicants?
  • Should there be standards for how medical school admission committees consider and compare applicants with majors in SSH versus those with majors in the sciences?



Method





Literature review: Scope and criteria for inclusion



Literature review: Search terms and selection process


 

Results


Part 1: Review of the literature


An American experience.


The Humanities and Medicine Program (HuMed) at Mount Sinai School of Medicine (MSSM) offers the most compelling evidence for the advancement of SSH premedical education.12,14 HuMed is designed as

  • Admission stream: an SSH-specific admission stream, offering guaranteed medical school admission (contingent on successful completion of an undergraduate degree) to some second- and third-year students majoring in the humanities or social sciences.
  • Grade requirements: Students are required to maintain a minimum grade point average (GPA) of 3.5 in addition to earning a “B” grade in biology and general chemistry.
  • Courses not required: They are not required to take organic chemistry, physics, calculus, or the MCAT exam.12,14
  • Application materials: Other components of HuMed admission include high school and university transcripts, two personal essays, three letters of reference, SAT scores, and two interviews.12
  • After acceptance: Once accepted, HuMed students must spend eight weeks after their third undergraduate year gaining clinical experience, attending seminars on medical topics such as bioethics and health policy, and taking an accelerated course on the “Principles of Organic Chemistry and Physics Related to Medicine.”12
  • Summer program: HuMed students are invited to a summer enrichment program before commencing medical school to familiarize themselves with clinical sciences teaching.12

Equivalent on most performance measures; better on the NBME Part II Psychiatry subtest.

Yens and Stimmel15 found that nonscience majors performed on par or better than their peers with science premedical education on the majority of performance measures and were significantly more likely to perform at the superior level of the National Board of Medical Examiners (NBME) Part II Psychiatry subtest.


Comparison results

In the two HuMed studies included in our review,12,14

  • Academic standing: no significant academic disadvantage was reported for HuMed students in terms of higher rates of serious academic difficulty in the first or second year of medical school12 or attrition rates.14
  • More nonscholarly leaves: HuMed students did, however, take significantly more nonscholarly leaves for personal, academic, or psychiatric reasons,12
  • Insufficient commitment in some: with a subset of HuMed students not performing well academically because they lacked the commitment necessary for medical education.14
  • Weak preclerkship performance: Preclerkship performance ratings (United States Medical Licensing Examination [USMLE] Step 1 plus first- and second-year basic science courses)14 were weak.
  • Equivalent clerkship performance: Yet clerkship performance ratings (Comprehensive Clinical Assessment [COMPASS] II, clerkship performance)12 indicated basic equivalency, and those HuMed students with multiple clerkship honors often were those who had experienced USMLE Step 1 or basic science performance difficulties.14
  • Preference for psychiatry and primary care: HuMed students demonstrated enhanced performance outcomes and a predilection for psychiatry and primary care specialties.
  • Stronger psychiatry and pediatric clerkships: HuMed students excelled in psychiatry12,14 and pediatric14 clerkships, and
  • Primary care and psychiatry residencies: they were more likely than other medical students to select primary care and psychiatry residencies.12
  • Similar research distinction: HuMed and other medical students attained similar graduation research distinctions, but
  • More research fellowships: HuMed students were significantly more likely than other students to be recipients of Doris Duke Clinical Research Fellowships and to undertake a research year.12


Academic performance.


Basic academic performance is similar.

Generally, these studies reported basic equivalency in academic performance irrespective of premedical education. Outcomes compared included

  • GPAs for each year of medical school,
  • NBME and USMLE scores,
  • delayed graduation and attrition rates,
  • rates of academic difficulty, and
  • average class ranking (see Table 1).


Students who had not taken advanced science courses earned similar first-year grades, with biochemistry the exception.

Caplan and colleagues22 found that the grades of first-year medical students who had not taken advanced science courses were equal to those of students with stronger premedical science backgrounds. The exception was biochemistry, where those with prior biochemistry experience had significantly higher grades. Koenig18 reported that although there was no significant difference between broadly prepared and science-focused medical students’ NBME Part I scores, science-focused students achieved higher mean scores for three science subtests, and broadly prepared students achieved higher mean scores for the behavioral sciences subtest.


Many studies report similar attrition rates, though some report different results.

Although multiple studies reported similar attrition rates,14,16,27 other studies have reported contrasting findings.



Clinical performance.


Equivalent.

Multiple studies reported basic equivalency of clinical competence across medical students, residents, or physicians irrespective of premedical education.16,19,20,27,28,31,32 Outcomes compared included

  • performance in clinical clerkships,
  • first-year residency clinical performance ratings,
  • humanism scores,
  • patient-centered attitudes, and
  • various clinical competencies (see Table 1).

HuMed students did better in psychiatry and pediatric clerkships.

HuMed students’ clerkship competencies were basically equivalent with those of non-HuMed students except for HuMed students’ excellent performance in psychiatry and pediatric clerkships.12,14




Commercialization of and globalization of admission processes.


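The admission marketplace includes the MCAT, the MMI and test preparation courses. Tompkins notes that although proprietary medical schools disappeared after Flexner, commercial interest has moved to test preparation, and MCAT preparation is a multimillion-dollar business. From 2010 to 2012, 61.3% to 65.3% of matriculants reported using MCAT preparation courses, and more than one-third took the MCAT more than once. ProFitHR offers customizable MMI materials.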

This medical school admission marketplace comprises tools including the MCAT exam,39 the MMI,37,40 and test preparation courses.41–43 Tompkins43 notes that although proprietary medical schools may have disappeared post Flexner, business interests now focus on test preparation courses. MCAT preparation courses are one facet of this multi-million-dollar business. In 2011, 91,600 MCAT exams were completed.44 Between 2007 and 2009, MCAT computer-based practice test sales increased by almost 50%.41,45 Moreover, from 2010 to 2012, 61.3% to 65.3% of matriculating medical students reported using MCAT preparation courses, and more than one-third of medical school matriculants annually report taking the MCAT exam multiple times.46 The company ProFitHR has monetized the MMI, offering customizable MMI materials that can be applied across a range of tasks with applicants to medical, pharmacy, veterinary, and dental schools.37,40


U.S. medical schools are expanding through international collaborative ventures.

U.S. medical schools are expanding48 with global collaborative ventures such as the

  • Weill Cornell Medical College in Doha, Qatar;
  • the DUKE-NUS Medical School and Research Center in Singapore; and
  • the Medical School for International Health (Ben Gurion University), a joint venture with Columbia University Medical Center in Beer-Sheva, Israel— all of which employ the MCAT exam.50–52


Discussion and Conclusions


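Students with SSH backgrounds perform on par with other students academically, clinically and in research, but with somewhat different patterns. Research distinction is similar, for example, yet SSH students show more interest in clinical research and a stronger preference for psychiatry and primary care careers.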

The studies reviewed indicate that the academic, clinical, and research performance of medical students with SSH premedical education is equivalent to that of other medical students, but different patterns of competencies exist. For example, although research distinction is similar, increased clinical research interest is associated with SSH background.12 Enhanced performance outcomes in and career preferences for psychiatry and primary care specialties such as pediatrics are reported for students with SSH backgrounds.12,14 Notably, these career preferences may present a health human resourcing opportunity to address long-standing primary care and psychiatry physician shortages.33,34


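MCAT2015 gives more weight to the social and behavioral sciences, but that does not seem to go far enough. Applicants can do well by strategically taking a few SSH courses, likely only introductory psychology and sociology, without reading or studying SSH broadly. The current MCAT predicts clinical performance, as does the MMI, and both assess potential for clinical skills regardless of SSH background. Since clinical skills are an important admission outcome, continued reliance on the MCAT and MMI may leave SSH premedical education with less and less of a role.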

Although the MCAT2015 exam blueprinting process recognized the importance of the humanities and of social and behavioral sciences,4 we believe it does not go far enough, and applicants will behave strategically in their SSH course selections and readings to remain competitive for the MCAT2015 exam, likely narrowly selecting introductory psychology and sociology courses, rather than selecting and reading broadly within SSH as predicted.4 The current MCAT exam predicts clinical performance outcomes,41 as does the MMI.37 The MCAT exam and MMI enable selection of medical students with the potential for strong clinical skills, irrespective of SSH premedical education status. Because clinical skills are an important admission outcome, continued reliance on these tools may render SSH premedical education irrelevant.




The MMI is widely used, but whether SSH counts as legitimate premedical preparation is still debated.

While the MMI is being implemented across the United States, Canada, and globally, SSH premedical education’s acceptance as legitimate premedical preparation continues to be debated.


SSH premedical education is unlikely to replace the emphasis on premedical science, but three approaches could make better use of its benefits.

Going forward, it is unlikely that SSH premedical education will supplant the emphasis on science premedical education in the medical school admission process. However, three approaches could be explored to better incorporate the beneficial outcomes of SSH premedical education.

 

First, medical schools whose mission is to train primary care physicians can actively recruit SSH majors or students with SSH concentrations.

First, medical schools with a mission to graduate primary care physicians could actively recruit prospective applicants with SSH majors or concentrations.



Second, schools can adopt an SSH admission stream, as with HuMed. Such a stream can shift culture as faculty repeatedly see these students do as well as their peers; MSSM is in fact expanding HuMed into its FlexMed program. Schools may need to give SSH students extra early support in science-based preclerkship courses, but in the end they graduate on par with, or ahead of, their peers.

A second approach would be adoption of an SSH admission stream as one component of a school’s overall admission process, similar to MSSM’s successful approach with its HuMed Program. This HuMed approach may foster, as it has at MSSM,54,55 a cultural shift as faculty consistently experience the selection of medical students with graduation success on par with (or sometimes better than) their peers with premedical science education. Indeed, MSSM is transforming the HuMed program with its expanded FlexMed program.55 Although a school’s culture may have to shift to accommodate medical students with an SSH background who need early, additional support for science-based preclerkship courses, ultimately these students will excel and graduate alongside their peers.


SSH-specific streams bring further benefits: lower premedical education costs and more students interested in clinical research. HuMed applicants need not take the MCAT, which removes the cost of the exam and its preparation. They attain similar research distinction but receive more Doris Duke Clinical Research Fellowships, a program aimed at clinical rather than basic science research that complements MD-PhD programs.

SSH-specific admission streams like HuMed have additional benefits: decreased premedical educational costs and increased student interest in clinical research.12 The HuMed model12,14 does not require its applicants to take the MCAT, which eliminates the premedical education cost barrier of the MCAT exam and its associated preparation courses.41 HuMed students experience similar research distinction as their peers but significantly more often receive Doris Duke Clinical Research Fellowships.12 This fellowship program is specifically designed for medical students interested in clinical research, not basic science research, and serves as a complement for MD-PhD programs, which commonly attract students with basic science research interest.56



Third, schools should go beyond requiring only one or a few SSH prerequisite courses, which are unlikely to produce the intended educational effect. The benefits were associated with an SSH major or course concentration, not with simply having taken SSH courses.

In a third approach, medical schools could revise their admission policies to include more than a single or limited number of SSH course prerequisites, as single or limited SSH course prerequisites policies may not have the intended educational outcomes. SSH premedical education with either an SSH major or course concentration, not simply SSH course counts, was associated with the beneficial outcomes noted in our review.


11 Association of American Medical Colleges. Chapter 2: Building a strong foundation: Your undergraduate years. In: Medical School Admission Requirements (MSAR), 2012–2013, United States and Canada. Washington, DC: Association of American Medical Colleges; 2012.



Table 1 Characteristics of the 20 Studies Included in a Review of the Literature on How Social Sciences and Humanities (SSH) Premedical Education Affected Performance During and/or After Medical School




Acad Med. 2014 Jul;89(7):1075-86. doi: 10.1097/ACM.0000000000000284.

Is social sciences and humanities (SSH) premedical education marginalized in the medical school admission process? A review and contextualization of the literature.

Author information

  • Mr. Hall is a third-year medical student and Leadership Education and Development (LEAD) Program scholar, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada. Dr. Woods is a scientist, The Wilson Centre, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada. Dr. Hanson is associate dean, Admissions and Student Finances, Undergraduate Medical Education, and associate professor, Department of Psychiatry, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada.

Abstract

PURPOSE:

To investigate the performance outcomes of medical students with social sciences and humanities (SSH) premedical education during and beyond medical school by reviewing the literature, and to contextualize this review within today's admission milieu.

METHOD:

From May to July 2012, the lead author searched the PubMed, MEDLINE, and PsycINFO databases, and reference lists of relevant articles, for research that compared premedical SSH education with premedical sciences education and its influence on performance during and/or after medical school. The authors extracted representative themes and relevant empirical findings. They contextualized their findings within today's admission milieu.

RESULTS:

A total of 1,548 citations were identified with 20 papers included in the review. SSH premedical education is predominately an American experience. For medical students with SSH background, equivalent academic, clinical, and research performance compared with medical students with a premedical science background is reported, yet different patterns of competencies exist. Post-medical-school equivalent or improved clinical performance is associated with an SSH background. Medical students with SSH backgrounds were more likely to select primary care or psychiatry careers. SSH major/course concentration, not SSH course counts, is important for admission decision making. The impact of today's admission milieu decreases the value of an SSH premedical education.

CONCLUSIONS:

Medical students with SSH premedical education perform on par with peers yet may possess different patterns of competencies, research, and career interests. However, SSH premedical education likely will not attain a significant role in medical school admission processes.

PMID: 24826852 [PubMed - indexed for MEDLINE]


Exploring the consequences of combining medical students with and without a background in biomedical sciences (Medical Education, 2014)

Rachel H Ellaway,1 Amanda Bates,2 Suzanne Girard,2 Deanna Buitenhuis,2 Kyle Lee,2 Aidan Warton,3 Steve Russell,3 Jill Caines,3 Eric Traficante3 & Lisa Graves1,4



CONTEXT:

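Medical schools mostly admit students with strong science backgrounds. Earlier studies have shown that students from social science backgrounds can achieve at a similar level in medical school, but little research has examined what it is like to be a 'non-science' student over time.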

Medical schools have tended to admit students with strong backgrounds in the biomedical sciences. Previous studies have shown that those with backgrounds in the social sciences can be as successful in medical school as those with science backgrounds. However, the experience of being a 'non-science' student over time has not been well described.


METHODS:

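A mixed-methods study was conducted. A survey and focus groups were used to capture individual experiences: students' self-identification, their self-perceived preparedness and stress, and their experiences across the whole of medical school. Survey results were summarised with descriptive statistics, and focus group results and unstructured data were analysed for common themes. End-of-module and end-of-year examination results were also analysed.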

A mixed-methods study was developed and run with the aim of elucidating the personal experiences of science and non-science students at our institution. Data were generated from a student survey that focused on participants' self-identification as science or non-science students, and on their sense of preparedness and stress, and from a series of student focus groups exploring participants' experiences of science and non-science issues in all aspects of their training. Descriptive statistics were generated for structured survey data. Focus group data and unstructured survey data were analysed to identify common themes. End-of-module and end-of-year examination data for the four class cohorts in the programme were also analysed to compare science and non-science student performance over time.


RESULTS:

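There were clear differences between the two groups. Preparedness and stress levels differed, as did examination results, but the examination gap had largely disappeared by the third year. Having both groups in the same class affected students in different ways and to different degrees, and the resulting disruption also faded as the performance gap narrowed.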

There were clear differences between the experiences and performance of science and non-science students. We found dichotomies in students' self-reported sense of preparedness and stress levels, and marked differences in their examination performance, which diminished over time to converge around the third year of their studies. Combining science and non-science students in the same class affected the students to different extents and in different ways. The potential disruption of mixing science and non-science students diminished as their levels of performance converged.


CONCLUSIONS:

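The psychosocial stress experienced by non-science students affected both their academic and their personal lives. This has implications for how these students should be supported and how curricula should be adapted to benefit all students.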

The psychosocial stress experienced by non-science students and the challenges it posed, in both their academic and their personal lives, have implications for how such students should be supported, and how curricula can be configured to afford quality learning for all medical students.





Classification of students

Categorising students according to their academic backgrounds is an essential part of studying this phenomenon, and prior studies have employed similar models. For instance, Yens and Stimmel categorised students as having a major in ‘traditional science’ (e.g. biology, chemistry, zoology), ‘other science’ (e.g. psychology, sociology) or ‘humanities’ (e.g. English, philosophy), noting that the performances of students in the ‘other science’ and ‘humanities’ categories were very similar.[2] Dickman et al.[3] defined a ‘non-science’ group based on students’ majors in subjects such as English, history, psychology and sociology (roughly 27% of the class); the remaining students were given the ‘science’ category and had backgrounds in subjects such as zoology, biology, chemistry and physics. We chose to adopt a similar dichotomised model using the terms ‘science’ and ‘non-science’. However, rather than defining the inclusion criteria for either category in advance, we allowed these definitions to emerge from our data.




Previous research on performance

Many studies have compared the performance of science and non-science students.

    • Gough observed that whereas student examination performance differed in Years 1 and 2, it had become indistinguishable by Year 4 of the programme.[1] 
    • Dickman et al.[3] found no significant differences between science and non-science students in examination performance, licensing examination performance, and residency selection. 
    • Craig et al.[10] grouped students into five categories based on their backgrounds (health professions, biomedical sciences, other biology, physical sciences, and non-science) and compared their performance on different kinds of science-based examination. They found differences between all groups, with non-science students performing less well than students in the other categories. The differences among all groups diminished over time.[10]


A caveat on convergence

This issue of convergence should be considered with some caution. Although several studies found no significant difference in final performance, they also noted that both science and non-science students had comparable MCAT scores,[2, 12] indicating that non-science students in these studies had studied sufficient biomedical science material to pass the test, thereby reducing the differences between the two groups. Although the MCAT is a prerequisite for entry to many schools in the USA and Canada, some medical schools have opted not to use the MCAT (or an equivalent) that necessarily selects candidates with a science background over those with other backgrounds,[13] thereby removing any normalising effects it might have.


Studies beyond examination scores

Not all studies have concentrated on the examination performance of science and non-science students. Ferrier and Woodward looked at the attitudes of students at the then relatively new McMaster University medical school and reported: ‘…academic background has been found to be of little influence on graduates’ perceptions of the […] programme.’[14] However, a more recent study found that although non-science students were able to cope with the demands of medical school, ‘students who had a non-science background prior to entering medicine were significantly less positive than those students who had a science background’.[11] This suggests that, although non-science students are able to catch up with their science-educated colleagues, there are clear differences in the personal experiences and expectations of the two groups. Indeed, several papers have identified science students as having an intrinsic advantage in their studies,[4, 5, 15] which in turn raises equity of access issues.[16]


Limited research on how the groups affect each other

Although these studies have compared the grades and performance of science and non-science students, relatively little attention has been focused on the impact of science and non-science students on each other, or on how their personal experiences differ. One study that did look at how different students approached their studies concluded that: ‘…liberal arts graduates favoured “discussing issues” over “memorising facts” and “problem solving”. These points likely reflect students' familiarity and comfort with different pedagogies in their undergraduate settings.’[17]





Analytical methods

All focus groups were audio-recorded and transcribed. Free-text responses were combined with the focus group transcripts. Thematic analysis involved seven reviewers (RHE, AB, JC, ET, KL, SG, and Boxhill). The six student reviewers independently generated a narrative interpretation of the transcripts and the faculty lead performed a line-by-line open coding thematic analysis. The analyses were shared among the review team for comment. These comments were assimilated into a single thematic framework that accommodated and linked the various concepts and themes identified in the earlier stages. This framework was further reviewed and adapted to accommodate the remaining differences between reviewer perspectives. Descriptive statistics were generated for structured responses from the survey. Examination data were analysed by Theme across all 4 years of the programme for each cohort (for which data were available) and descriptive statistics generated.







Science and non-science students' personal experiences were different

    • Although science and non-science students did not necessarily represent two completely dichotomous groups, they did tend to self-identify as belonging to one or other category. Student perceptions of this difference in others seemed to diminish over time:
    • However, the stigma of being ‘non-science’ may last longer:
    • The difference between science and non-science students was emphasised in sessions in which non-science students were seeing material for the first time and science students were reviewing material they had previously been taught:
    • This difference was amplified by the fact that non-science students had much less time to learn this material than their peers had originally had:
    • Although students perceived the difference between science and non-science academic backgrounds in themselves and their peers, they did not necessarily see a non-science background as an intrinsic disadvantage:



Science and non-science students had different approaches to study

    • Setting aside individual learning preferences, students from science and non-science backgrounds tended to approach their learning in different ways. In general, science students reported having been taught to study details within single concepts, whereas non-science students were trained to look for general themes crossing multiple concepts. Approaches to learning that were unfamiliar added to students' stress levels:
    • Science and non-science students sometimes helped each other with unfamiliar material:
    • Sometimes this division of labour led to a positive group dynamic:
    • Unfamiliar language, terminology and presentation styles were more troubling for non-science students than unfamiliar content:



Differences between science and non-science students changed the learning environment

    • Different approaches to learning were more apparent in small-group learning contexts, in which students were more dependent on one another:
    • Differences in the needs and behaviours of science and non-science students sometimes led to a negative group dynamic:
    • Different group dynamics determined students' ability to help others or to be helped:




Specific non-science student concerns

    • Many non-science students were worried they were falling further behind their science peers:
    • Non-science students' attention to learning biomedical science material could lead them to miss out on other aspects of medical school life:
    • Some non-science students became increasingly solitary as they sought to catch up with their science peers:
    • Non-science students sometimes voiced concerns about the impact they had on their science peers, for instance limiting in-depth discussion of biomedical science topics:
    • Non-science students also described being particularly stressed in their first year:
    • A number of non-science students had not known where to start to prepare for medical school:
    • Several non-science students suggested that they might have prepared differently had they known what would be required of them once they started:



Specific science student concerns

    • Some science students acknowledged their advantage:
    • Others acknowledged that a science perspective could be both a strength and a weakness:
    • Some science students expressed sympathy for the struggles of non-science students:
    • Some science students reported helping their non-science colleagues:
    • Other science students were not always well disposed to their non-science colleagues:



Examination data







Implications for medical education

Although this was a single-institution study, its findings have broader implications for medical education.


Firstly, schools and programmes need to be more sensitive to the personal experiences of students from non-science backgrounds, particularly in the initial stages of their training. This is not to say that all non-science students will struggle to the same extent, but the hidden curriculum of non-science students who must stoically accept their additional workload and associated stress should be acknowledged and, where appropriate, addressed.


Secondly, group cohesiveness is an indicator of the quality of learning for members of that group.[24] Teachers need to be aware of the potential for disruption that the combining of science and non-science students in a single group can bring. Positive group dynamics should be encouraged, which should include the building of a shared recognition of the different strengths of those in the group, and students should be encouraged to help each other according to these strengths.


Thirdly, teachers should seek to address common learning challenges for their non-science students. For example, a recurring issue identified in this study was non-science students' struggle to understand how biomedical science material was spoken in terms of its style, syntax and underlying assumptions. Supporting non-science students' orientation to these unfamiliar discourses would go some way to making their journeys easier.


Fourthly, schools should consider their support for non-science students before they start school as the challenges of access to medical training clearly continue after their places are confirmed. Although this would probably fall short of classes or other face-to-face activities, support could be provided in the form of online study materials such as self-assessment quizzes linked to study materials and primers. We are planning a follow-up study to explore this area further.


We should note that we found little indication that significant changes to the curriculum as a whole were either necessary or desired. We identified issues around the accessibility of the programme rather than problems with the programme itself. We should also be clear that the issues we have identified are not about the shortcomings of either non-science or science students, but, rather, about the impact of combining them in a single class. A non-science background should not be a barrier to medical school admission, as others have noted.[11, 13]


Not all medical schools are alike. Some schools may concentrate on creating physician scientists, whereas others may have more of a community focus.[25] Community-focused and socially accountable schools often seek to open access to medicine to previously under-represented populations[26] and the findings of this study may therefore be more relevant to these institutions. However, any institution that admits non-science students should consider our findings from the perspectives of both student welfare and curriculum design and delivery.







Med Educ. 2014 Jul;48(7):674-86. doi: 10.1111/medu.12496.

Exploring the consequences of combining medical students with and without a background in biomedical sciences.

Author information

  • 1Undergraduate Medical Education, Northern Ontario School of Medicine, Sudbury, Ontario, Canada.

© 2014 John Wiley & Sons Ltd.

PMID: 24909529 [PubMed - indexed for MEDLINE]


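Analysis of the characteristics of applicants to one medical school based on self-introduction letters and professors' recommendation letters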

Analysis of characteristics shown in self introduction letter and professor’s recommendation letter

Sang Hyun Kim

Department of Microbiology, Kangwon National University School of Medicine, Chuncheon, Korea




Introduction


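A self-introduction letter is written mainly for admission or employment. It was introduced as an admission criterion by universities that, in pursuing diversification and specialisation, want to select students suited to the characteristics of each admission unit or to select distinctive talent [1]. Self-introduction letters and professors' recommendation letters belong to a range of multifaceted information that also includes parent recommendations, peer recommendations, school records, and interviews, and they are commonly used in admission to Korean graduate medical schools. Their use had been limited, but self-introduction letters grew in importance after the admissions officer system was launched in 2007, centred on 10 universities. As of 2013, 25 of the 26 graduate medical schools required a self-introduction letter in early admission and 9 required a professor's recommendation letter, so both are regarded as essential admission elements.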


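A self-introduction letter is a self-report in which students describe their self-concept, special talents, values, and activities and achievements in and out of school. It reveals characteristics such as logical reasoning and creativity that are hard to capture with quantitative tools such as grades, and so helps evaluators assess students qualitatively [2]. It typically includes the student's evaluation of their own abilities, strengths and weaknesses, areas of interest, motivation for applying, and future study plans [2]. Letters used in university admission may also pose questions to oneself about one's abilities and dreams and describe what the applicant did to resolve them [3]. For evaluators, the letter therefore helps in understanding the student in advance and is of practical help in preparing questions for the oral interview [4]. A self-introduction letter written for admission has several features that distinguish it from writing intended for socialising or reflection [5].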

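First, it is not subjective writing like an essay but writing of a public nature, so it should be written objectively, avoiding overly familiar or idiosyncratic expressions [5]. In this respect the self-introduction letter can be seen as lying midway between expressive writing and communicative (transactional) writing in James Britton's classification of writing [6].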

Second, it is purposeful writing meant to persuade, so it should be written with the evaluator's intent in mind and should convey positive information about the applicant [5].

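Third, its length is fixed. Most graduate medical schools ask for up to 10 items, with character limits of 400 to 1,000, so applicants must choose words efficiently to present themselves effectively within the limited space. The content that universities ask for falls into four broad areas: passion for the major (motivation for applying to the university [department], areas of interest, study plan), aptitude for the department (why the applicant should be selected), personal ability (qualities, potential, self-directed learning ability), and personal character (service, willingness to take on challenges, interpersonal relationships) [5].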


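Meanwhile, teacher recommendation letters are widely used in selecting students for gifted education, as one way to overcome the bias that exists among quantitative tests such as intelligence and achievement tests used in identifying gifted students [7]. A professor's recommendation letter is effective for describing an applicant from multiple angles and helps to complement the existing uniform, grade-centred selection methods that emphasise cognitive aspects. In graduate medical school admission too, the professor's recommendation letter is valuable as a way of compensating for the limits of standardised measures such as undergraduate grades and English scores. The self-introduction letter shows the behavioural characteristics of a student who is applying in order to become a doctor, and the recommendation letter shows the behavioural characteristics of the applicant as observed by a professor over a long period. Because it is written from the professor's perspective, the recommendation letter may be subjective, but it can carry considerable official weight if exact criteria are given for each item and it is written objectively [8]. Recommendation letters generally cover a student's character, fit with the major, potential for development, and creativity. Those written for graduate medical school admission cover basic qualities as a medical student (diligence, morality, spirit of service), potential for development (creativity, leadership, challenging spirit, interpersonal relationships), and suitability for practising medicine (enthusiasm for medicine, preparedness, learning ability).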


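As of 2013, 25 of the 26 graduate medical schools required a self-introduction letter in early admission and 9 required a professor's recommendation letter. Given how widely both are used, this study closely examined the items and admission units of the self-introduction letters and recommendation letters currently used in graduate medical school admission, in order to explore how they could serve as fairer and more objective material in selecting students. The content that applicants and professors described in these documents was divided into cognitive, affective, and social behavioural characteristics and analysed by frequency, and the correlations between the characteristics in the self-introduction letters and those in the recommendation letters were analysed.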





Subjects and Methods


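1. Study subjects

The subjects were 109 applicants for 2013 admission to Kangwon National University School of Medicine: 40 general-track and 52 special-track applicants from the early admission round, and 17 M.D.-Ph.D. programme applicants from the regular admission round (2 applicants with incomplete documents were excluded).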


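2. Methods

Of 24 behavioural characteristics of mathematically and scientifically gifted students, those specific to mathematics or science were excluded, and the 17 characteristics generally applicable to intellectually able adults were used for the analysis [9,10,11,12,13,14]. The content mentioned by applicants and professors was coded into 5 cognitive characteristics (intellectual curiosity, problem-solving ability, creativity, self-directed learning ability, communication ability), 9 affective characteristics (medical aptitude, self-confidence, challenging spirit, versatility, perfectionism, self-concept, intrinsic motivation, proactiveness, sense of purpose), and 3 social characteristics (morality, sociability, leadership) (Table 1, Appendix 1, 2, 3).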
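
As a rough illustration of the coding scheme described above (5 cognitive, 9 affective, and 3 social characteristics), the following Python sketch stores the scheme as a dictionary and tallies coded mentions by trait and by category. The English trait labels, the `tally` helper, and the example mentions are all assumptions made for this sketch; they are not taken from the study's appendices or data.

```python
from collections import Counter

# Coding scheme: 5 cognitive, 9 affective, 3 social characteristics.
# The English labels are translations chosen for this sketch (assumption).
CODING_SCHEME = {
    "cognitive": [
        "intellectual curiosity", "problem-solving ability", "creativity",
        "self-directed learning ability", "communication ability",
    ],
    "affective": [
        "medical aptitude", "self-confidence", "challenging spirit",
        "versatility", "perfectionism", "self-concept",
        "intrinsic motivation", "proactiveness", "sense of purpose",
    ],
    "social": ["morality", "sociability", "leadership"],
}

# Reverse lookup: trait label -> category name.
TRAIT_TO_CATEGORY = {
    trait: category
    for category, traits in CODING_SCHEME.items()
    for trait in traits
}


def tally(coded_mentions):
    """Count mentions per trait and per category.

    coded_mentions: iterable of trait labels, one per coded mention.
    Raises KeyError if a label is not part of the scheme.
    """
    trait_counts = Counter()
    category_counts = Counter()
    for trait in coded_mentions:
        category = TRAIT_TO_CATEGORY[trait]
        trait_counts[trait] += 1
        category_counts[category] += 1
    return trait_counts, category_counts


# Hypothetical coded mentions from one letter (made-up, not study data).
example_mentions = ["medical aptitude", "sociability", "medical aptitude", "creativity"]
trait_counts, category_counts = tally(example_mentions)
print(trait_counts.most_common(1))
print(dict(category_counts))
```

Keeping the scheme in one dictionary means the per-trait and per-category frequencies are always derived from the same mapping, which is one simple way to keep the two levels of frequency analysis consistent.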


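3. Data analysis

The behavioural characteristics of students that appeared in the self-introduction letters and recommendation letters were analysed by frequency for each item. A χ2 test was used to assess the significance of differences in frequency among the characteristics, and Pearson's product-moment correlation coefficient was used to examine the correlations among them. All statistical analyses were performed with SPSS version 21.0 (IBM SPSS Inc., Chicago, USA).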
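
The two statistics named above can be sketched in plain Python. This is only an illustration of the textbook formulas, not the study's SPSS procedure: the function names, the 2 x 2 table, and the paired counts are hypothetical, and the 3.841 threshold is the standard χ2 critical value for 1 degree of freedom at the 0.05 level.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical 2 x 2 table: rows = document type, columns = trait mentioned / not mentioned.
made_up_table = [[30, 10], [20, 20]]
statistic, dof = chi_square_statistic(made_up_table)

# Critical value of the chi-square distribution for dof = 1 at alpha = 0.05.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841
significant = dof == 1 and statistic > CRITICAL_VALUE_DF1_ALPHA_05

# Hypothetical paired counts per applicant (self-introduction letter vs recommendation letter).
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```

In practice a statistics package would also report an exact p value; the comparison against a fixed critical value here simply keeps the sketch free of external dependencies.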








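Results

1. Behavioural characteristics in the self-introduction letters of all applicants

2. Behavioural characteristics in the self-introduction letters of early-admission applicants and M.D.-Ph.D. programme applicants

3. Behavioural characteristics in the self-introduction letters of male and female applicants

4. Behavioural characteristics of applicants in professors' recommendation letters

5. Correlation analysis of behavioural characteristics in self-introduction letters and recommendation letters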














Discussion


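Through content analysis of the self-introduction letters and recommendation letters, we identified the behavioural characteristics that students seeking admission described about themselves, and the behavioural characteristics of prospective medical professionals as described from an objective standpoint by professors who had observed the students closely over a long period (3 years and 7 months on average). Rich personal development is said to be achieved through interaction between the cognitive domain and the social-affective domain [15]. Gifted individuals are said to develop, refine, and internalise values, fairness, and a sense of justice from an early age, and to understand readily the rights and feelings of others [2]. Viewing graduate medical school applicants by the standards applied to the gifted, the qualities of a doctor are formed through interaction among the cognitive, affective, and social domains, so cognitive, affective, and social characteristics were used as the criteria for analysing behavioural characteristics.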


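Medical aptitude was the most frequently mentioned characteristic in both the self-introduction letters and the recommendation letters, showing that professors and students alike treat it as important. Students applying to graduate medical school emphasised above all that the characteristics of the programme suited them, and professors emphasised above all that the applicant was suited to the professional characteristics of a doctor. Among social characteristics in the recommendation letters, sociability was mentioned most often, indicating that professors value good adaptation to society and smooth interpersonal relationships. Self-concept was not mentioned in the recommendation letters and appeared only in the self-introduction letters, presumably because awareness of oneself surfaces only in a self-written document. Students most often mentioned affective characteristics, which best show their inner qualities, whereas professors, who find these hard to verify, more often mentioned the cognitive and social characteristics that are easier to observe from outside. Although it may be somewhat difficult for students to judge their own creativity, creativity was mentioned least often (31 times), which suggests that creativity, a quality related to unconscious, intuitive, and insightful thinking, may be lacking in applicants to graduate medical school. All cognitive characteristics (intellectual curiosity, problem-solving ability, creativity, self-directed learning ability, communication ability) were mentioned more often in the self-introduction letters than in the recommendation letters, with a significant difference in frequency (p<0.05). In addition, M.D.-Ph.D. programme applicants mentioned cognitive, affective, and social characteristics significantly more often in their self-introduction letters than early-admission applicants did (p<0.05). Male applicants mentioned the affective characteristics of medical aptitude and versatility more often than female applicants, whereas female applicants mentioned challenging spirit and morality more often, confirming a significant difference between male and female applicants in how self-introduction letters are written (p<0.05).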


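Applicants described about three times as much about their own characteristics in the self-introduction letters they wrote as professors did in the recommendation letters. Students try to express their characteristics as fully as possible when writing about themselves. Professors, by contrast, often have to write letters for several students rather than one, and write without sufficient accurate observation and understanding of the student, so their description of behavioural characteristics is much thinner. Cognitive characteristics were mentioned most often in the recommendation letters, indicating that professors mainly observe and describe cognitive characteristics when assessing applicants. This suggests that a systematic approach is needed at the school level so that professors can gain a balanced understanding of students' affective and social characteristics during routine student advising and guidance. Professors tended to emphasise students' passion for study and learning ability, good interpersonal relationships, and communication ability. Correlation analysis of the sub-elements showed that the more cognitive characteristics were described in a student's self-introduction letter, the more cognitive characteristics were emphasised in the recommendation letter, and the more affective characteristics were described in the self-introduction letter, the more cognitive and affective characteristics were emphasised in the recommendation letter (Table 3).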


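Self-introduction letters and recommendation letters tend to be undervalued in admissions because of doubts about the reliability of their content, the high likelihood that writing skill will influence the evaluation, and the difficulty of achieving consistent scoring across evaluators [12]. However, they are clearly valuable material, because they make it possible to collect and use varied information about students that is hard to assess from quantitative data such as university grades and English scores alone. The content and format of the self-introduction letter are sometimes adapted by each university to select students who fit its educational philosophy. Even so, university authorities would do well to provide specific and consistent criteria for writing the letter in advance, so that applicants' potential can be assessed from multiple angles and students suited to the university can be selected [3].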


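Meanwhile, several principles have been proposed for using the multifaceted information collected when gifted students are selected through observation and recommendation. Among them, the principles of holistic perspective, contextuality, verifiability, individuality, and group deliberation can also be applied to the self-introduction letters and recommendation letters used in selecting graduate medical school students [16].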


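    • The principle of holistic perspective means collecting as many kinds of evaluative information as possible and making a comprehensive qualitative judgement by cross-checking them. Experience records, recommendation content, and other material, including cognitive and affective aspects, are considered comprehensively, and group discussion among the evaluating professors makes it possible to grasp the student's characteristics objectively without reducing them to a score [16].
    • The principle of contextuality means examining what influences shaped the student's present attainment, given that applicants grow up under varied cultural and economic influences [16].
    • The principle of verifiability means relying only on empirical, objective verification of the material, because applicants tend to show off their abilities in self-introduction letters and recommendation letters [16].
    • The principle of individuality means evaluating and gauging each student's unique character, potential, and competencies individually [16].
    • The principle of group deliberation means that judgement is made collectively, through effective and open discussion among the evaluating professors based on a comprehensive understanding of the facts presented in the self-introduction letters and recommendation letters, rather than by the subjective evaluation of an individual professor [16].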


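If self-introduction letters and recommendation letters are evaluated using these principles, they can serve as valuable material in graduate medical school admission. Alongside this, continued research is needed on ways of writing self-introduction letters and recommendation letters that allow students' potential to be assessed fairly and objectively.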










Korean J Med Educ. 2013;25(3).

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Holistic Review

Holistic review is a flexible, individualized way of assessing an applicant’s capabilities by which balanced consideration is given to experiences, attributes, and academic metrics and, when considered in combination, how the individual might contribute value as a medical student and physician.

About Holistic Admissions

A core element of holistic admissions involves widening the lens through which we view applicants, recognizing and valuing different dimensions that shape each individual. The Project’s Experiences-Attributes-Academic Metrics (E-A-M) model translates that concept into a useful tool and provides admissions staff and committee members with a shared framework for thinking broadly about diversity, identifying mission-based criteria that take into account the whole applicant, and sparking thinking about applicants as future physicians, rather than merely as prospective students.

An integrated holistic admissions process incorporates four core principles at each stage: screening, interview, and selection. These four core principles emphasize the importance of giving individualized consideration to every applicant and provide operational guidance to ensure that admissions processes and criteria are both mission- and evidence-based, promote diversity, and use a balance of experiences, attributes, and academic metrics.

Four Core Principles

1. In a holistic admissions process, selection criteria are broad-based, clearly linked to school mission and goals, and promote diversity as an essential element to achieving institutional excellence.

2. A balance of experiences, attributes, and academic metrics (EAM) is

  • Used to assess applicants with the intent of creating a richly diverse interview and selection pool and student body;

  • Applied equitably across the entire candidate pool; and

  • Grounded in data that provide evidence supporting the use of selection criteria beyond grades and test scores.

3. Admission staff and committee members give individualized consideration to how each applicant may contribute to the medical school learning environment and practice of medicine, weighing and balancing the range of criteria needed in a class to achieve the outcomes desired by the school.

4. Race and ethnicity may be considered as factors when making admission-related decisions only when such consideration is narrowly tailored to achieve mission-related educational interests and goals associated with student diversity, and when considered as part of a broader mix of factors, which may include personal attributes, experiential factors, demographics, or other considerations.*

*Under federal law (and where permitted by state law)


Additional Advancing Holistic Review Initiative Resources

Learn more about the AAMC's work in holistic review.


<Source: https://www.aamc.org/admissions/admissionslifecycle/409104/prepholisticreview.html>



Core Competencies for Entering Medical Students

The 15 Core Competencies for Entering Medical Students (defined below) have been endorsed by the AAMC Group on Student Affairs (GSA) Committee on Admissions (COA). The competencies fall into four categories: Interpersonal, Intrapersonal, Thinking and Reasoning, and Science.

Interpersonal Competencies

Service Orientation: Demonstrates a desire to help others and sensitivity to others’ needs and feelings; demonstrates a desire to alleviate others’ distress; recognizes and acts on his/her responsibilities to society, locally, nationally, and globally.

Social Skills: Demonstrates an awareness of others’ needs, goals, feelings, and the ways that social and behavioral cues affect people’s interactions and behaviors; adjusts behaviors appropriately in response to these cues; treats others with respect.

Cultural Competence: Demonstrates knowledge of socio-cultural factors that affect interactions and behaviors; shows an appreciation and respect for multiple dimensions of diversity; recognizes and acts on the obligation to inform one’s own judgment; engages diverse and competing perspectives as a resource for learning, citizenship, and work; recognizes and appropriately addresses bias in themselves and others; interacts effectively with people from diverse backgrounds.

Teamwork: Works collaboratively with others to achieve shared goals; shares information and knowledge with others and provides feedback; puts team goals ahead of individual goals.

Oral Communication: Effectively conveys information to others using spoken words and sentences; listens effectively; recognizes potential communication barriers and adjusts approach or clarifies information as needed.

Intrapersonal Competencies

Ethical Responsibility to Self and Others: Behaves in an honest and ethical manner; cultivates personal and academic integrity; adheres to ethical principles and follows rules and procedures; resists peer pressure to engage in unethical behavior and encourages others to behave in honest and ethical ways; develops and demonstrates ethical and moral reasoning.

Reliability and Dependability: Consistently fulfills obligations in a timely and satisfactory manner; takes responsibility for personal actions and performance.

Resilience and Adaptability: Demonstrates tolerance of stressful or changing environments or situations and adapts effectively to them; is persistent, even under difficult situations; recovers from setbacks.

Capacity for Improvement: Sets goals for continuous improvement and for learning new concepts and skills; engages in reflective practice for improvement; solicits and responds appropriately to feedback.

Thinking and Reasoning Competencies

Critical Thinking: Uses logic and reasoning to identify the strengths and weaknesses of alternative solutions, conclusions, or approaches to problems.

Quantitative Reasoning: Applies quantitative reasoning and appropriate mathematics to describe or explain phenomena in the natural world.

Scientific Inquiry: Applies knowledge of the scientific process to integrate and synthesize information, solve problems and formulate research questions and hypotheses; is facile in the language of the sciences and uses it to participate in the discourse of science and explain how scientific knowledge is discovered and validated.

Written Communication: Effectively conveys information to others using written words and sentences.

Science Competencies

Living Systems: Applies knowledge and skill in the natural sciences to solve problems related to molecular and macro systems including biomolecules, molecules, cells, and organs.

Human Behavior: Applies knowledge of the self, others, and social systems to solve problems related to the psychological, socio-cultural, and biological factors that influence health and well-being.



<Source: https://www.aamc.org/admissions/admissionslifecycle/409090/competencies.html>




Supporting the Admissions Lifecycle

Admissions is often the first encounter that aspiring physicians have with each medical school, and it serves as the gateway into the medical profession. We are all committed to identifying and fostering a capable, diverse, and compassionate future physician workforce. While medical school admissions policies and processes vary based on your institutional mission and goals, there are still many shared elements. This page provides a centralized repository of AAMC tools and resources to support you in designing your admissions process, evaluating applicants, finalizing your matriculating classes with the AAMC, and reviewing and refining your process after each cycle.

<Source: https://www.aamc.org/admissions/admissionslifecycle/>







Narrative information obtained during student selection predicts problematic behavior (Med Teach, 2016)

Narrative information obtained during student selection predicts problematic study behavior

MIRJAM G. A. OUDE EGBRINK & LAMBERT W. T. SCHUWIRTH

Maastricht University, The Netherlands





Introduction


Until recently, the focus was on predictors of cognitive academic performance. It is now clear, however, that non-cognitive qualities also matter for future medical students and doctors.

Until recently, the focus has been primarily on predictors of cognitive academic performance (Salvatori 2001; Siu & Reiter 2009). Nowadays, however, it is clear that, besides cognitive skills, non-cognitive qualities are important competencies of future medical students and doctors.


MMIs are now in use.

Recently, the so-called multiple mini-interview (MMI) has shown that multiple individual human judgments of non-cognitive skills, when combined, predict future performance in a sufficiently reliable way.


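In 2007, Maastricht University began using MMIs in the selection procedure for the P-CI programme. The procedure produces a ranking list that reflects predicted suitability to perform successfully in this research master.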

In 2007, the MMI method was introduced as part of the selection procedure for the four-year medical research master Physician-Clinical Investigator (P-CI) at Maastricht University (Guyaux et al. 2010). The selection procedure results in a ranking list, representing differences in predicted suitability to perform successfully in this research master.


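Most selected students succeed in both cognitive and non-cognitive respects, but some show problematic behavior. Clearly, these problems were not predicted by the MMI scores or by any other part of the selection procedure. In theory, the narrative information that interviewers write down during the MMIs, which is stored in the student files, might predict such behavior better.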

Although most selected students are successful in both cognitive and non-cognitive aspects of the study, some encounter professional lapses or problematic study behavior. Clearly, these problems were not predicted by the MMI scores or any other part of the selection procedure. Theoretically, the narrative information that is written down by the interviewers during the MMIs and stored in the student files could be a better predictor of such problems and could constitute a useful resource for the student mentors (called counselors in the P-CI master), but till now this information has been unused.



Methods


 

Context


The four-year P-CI research master is a graduate-entry program that enables students to become a medical doctor as well as a clinical investigator. This combination makes it a challenging program for the students. Each year, a selection procedure decides which 30 students are allowed to enter this master.

  • They must have finished a biomedical bachelor with good results; GPAs as well as a cognitive test are taken into account in the first part of the selection procedure.

  • The second part consists of MMIs on different topics, such as motivation, past performance, empathy and communication skills. The applicants’ performances on each individual interview are graded independently by the interviewers as being ‘sufficient’, ‘doubtful’ or ‘insufficient’, and the combination of all individual scores adds up to a ranking list. In each station, interviewers also make notes that are not used in the procedure itself; both notes and grading are completed in the time interval between individual interviews. The notes are stored for possible use in appeals, to underpin the interviewers’ judgments.


Students and counselors


In this study, we focused on students who enrolled into the P-CI master in 2007 (cohort 2007; n = 30) and 2008 (cohort 2008; n = 30). In this master, each student is assigned to a counselor at the start of the first year, who mentors the student on an individual basis throughout his/her study. Each counselor typically takes care of 3–8 students per cohort. Every year, student and counselor meet at least four times.


Seven counselors mentored the 60 students in cohorts 2007 and 2008 (five in cohort 2007 and six in cohort 2008; four of them were active in both cohorts). In the end, 54 out of 60 students have finished their study within four to five years, while one student is currently finishing the last part.


Study design


This retrospective exploratory study was subdivided into three parts.

 

  • First, the seven counselors were asked to name the three most prevalent non-cognitive problems they encountered in ‘their’ students, and grade them (3-2-1) to indicate the frequency of occurrence (3 = most frequent). From their reactions the two most highly-graded problems were selected for further analysis.

  • Second, two independent and blinded investigators (MoE and LS) analyzed the de-identified notes written down during the MMIs of 15 randomly chosen students out of the total of 55, and identified what they thought to be possible indicators for these two most frequent non-cognitive problems.

  • Third, a case-control study design was used. The counselors were asked to identify the students who exhibited either one or both of these non-cognitive problems during their study (cases). The notes of their MMIs were de-identified and screened by the same two independent and blinded investigators (MoE and LS) to investigate whether the proposed indicators of these problems were indeed present. As a control, the MMI notes of a similar number of control students from the same cohorts (without the identified non-cognitive problems) were screened for the presence of these indicators as well.



Results


Part 1: The two most prevalent non-cognitive problems


Planning difficulties related to problems with

  • time management,

  • underestimation of study load, and

  • problematic prioritizing of tasks.

 

Self-reflection-related problems were addressed as

  • insufficient awareness of (the consequences of) own functioning,

  • indications of defensive behavior, and

  • insufficient or non-effective actions to improve this.

 

 


 

Part 2: Indicators in MMI notes


The narrative information that was written down during MMIs with 15 randomly chosen students was analyzed to investigate whether indications for the two most prevalent non-cognitive problems were already present during the selection procedure preceding the master.



In the MMI notes of five students both investigators found no indicators at all for the two non-cognitive problems. In the MMI notes of the other 10 students one or more potential indicators were found. In four of them potential indicators for both planning-related and self-reflection-related problems were present.



As a result of this analysis, a limited number of potential indicators for planning-related and self-reflection-related problems were identified (Table 2).

 

 


 


Part 3: Case-control study


Based on the above-mentioned findings, a case-control study was performed to investigate how predictive these indicators were for planning-related and/or reflection-related problems during the research master P-CI.


The seven counselors identified 23 students who exhibited planning-related and/or reflection-related problems during their study (cases).

  • Thirteen students had planning-related problems, while

  • six had reflection-related problems; another

  • four students showed problems in both domains.


Altogether, the data indicate a statistically-significant association between the presence of indicators for planning-related problems in MMI notes and the actual occurrence of such problems during the subsequent study (Table 3A: odds ratio 9.33; 95% confidence interval 2.12–41.07; p = 0.003). No such evidence was found for self-reflection-related problems (Table 3B: odds ratio 1.39; 95% confidence interval 0.29–6.68).
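
As a reading aid, the sketch below shows how an odds ratio and a 95% confidence interval can be derived from a 2×2 case-control table. It assumes the common Wald (Woolf) interval on the log odds ratio; the excerpt does not say which method the authors used. The counts are hypothetical and are not taken from Table 3A, which is not reproduced in these notes. They were chosen only so that the output lands near the reported figures. The function name `odds_ratio_ci` is likewise illustrative.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald (Woolf) confidence interval for a 2x2 table.

    a: cases with the indicator      b: cases without the indicator
    c: controls with the indicator   d: controls without the indicator
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (NOT the study's Table 3A).
or_, lo, hi = odds_ratio_ci(a=14, b=3, c=9, d=18)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

With so few students per cell the interval is very wide, which is why the planning-related estimate, although statistically significant, is imprecise.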

 

 


 

 

Discussion


The selection stage is usually used only to decide whom to admit and whom to reject. In this study, information obtained during selection was used to predict future problem behavior.

As a result, the selection procedure is merely used to decide on who is admitted and who is not. In the current study, we propose to use narrative information obtained during selection interviews to predict future problems.


This would make early and dedicated counseling and remediation possible, helping selected students to succeed. Selection would then serve not only as assessment-of-learning but also as assessment-for-learning.

This may enable early and dedicated counseling and remediation to improve the selected students’ study success. This way, selection will not only serve as an assessment-of-learning measure but also as a first assessment-for-learning step (Shepard 2000; Schuwirth & Van der Vleuten 2011).


Counseling that starts at the beginning of a study career makes educational and therapeutic interventions possible. Unorganized students struggle with the workload and the pressure of progressing according to a predetermined timetable. Beyond aptitude, time management and prioritizing matter for academic achievement. Organized studying is related to both progress and success. Early and dedicated counseling should therefore prevent or reduce planning-related study problems and improve study success.

With the current knowledge, however, counseling can be more focused right from the beginning of a study career, enabling specific educational and even therapeutic interventions. Literature shows that unorganized students suffer most from workload and pressure of progressing in their studies according to a predetermined timetable (Ruohoniemi et al. 2010). More than aptitude, time management and prioritizing are important for academic achievement (West & Sadoski 2011). Organized studying appears to be related to both study progress and success (Rytkonen et al. 2012). Therefore, early and dedicated counseling will help to prevent or diminish planning-related study problems and, as a consequence, improve study success.


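Adequate self-reflection is important for healthcare professionals. This is why a major goal of our portfolio and counseling system is to make students aware of the importance of self-reflection and to stimulate the development of their reflective skills.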

Adequate self-reflection is nowadays considered an essential attribute of competent healthcare professionals. This is why it is one of the important goals of our portfolio and counseling system to increase students’ awareness of the importance of self-reflection and to stimulate development of their reflective skills (Driessen et al. 2005).


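Drop-out or delay during medical school despite selection efforts is a cause for concern: for the students, who suffer personal distress; for the university, which spends a disproportionate amount of time and energy on struggling students; and for society, which bears the burden of the public funds spent on these students.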

Drop-out from or delay during medical school, in spite of selection efforts, is a cause for concern (Yates 2011; Stratton & Elam 2014). This is the case

  • for the students involved who suffer from personal distress,

  • for the university that is faced with a disproportionate amount of time and energy spent on struggling students, and

  • for society that has to bear the financial burden for drop-out and delayed students in countries where they receive public funding.


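Indeed, making more use of selection data is also attractive from a financial perspective. In countries such as the Netherlands, where education is publicly funded, avoiding delay or drop-out offsets a substantial share of the costs.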

Indeed, the additional use of selection data is attractive from a financial perspective. In countries like the Netherlands, where education is publicly funded, the gains of avoiding delay or drop-out will compensate largely for the costs of a selection procedure and counseling system.


Siu E, Reiter HI. 2009. Overview: What’s worked and what hasn’t as a guide towards predictive admissions tool development. Adv Health Sci Educ Theory Pract 14:759–775.


Stratton TD, Elam CL. 2014. A holistic review of the medical school admission process: examining correlates of academic underperformance. Med Educ Online 19:22919.



 

 

 






Med Teach. 2016 Aug;38(8):844-9. doi: 10.3109/0142159X.2015.1132410. Epub 2016 Jan 25.

Narrative information obtained during student selection predicts problematic study behavior.

Author information

  • Maastricht University, The Netherlands.

Abstract

INTRODUCTION:

Up to now, student selection for medical schools is merely used to decide which applicants will be admitted. We investigated whether narrative information obtained during multiple mini-interviews (MMIs) can also be used to predict problematic study behavior.

METHODS:

A retrospective exploratory study was performed on students who were selected into a four-year research master's program Physician-Clinical Investigator in 2007 and 2008 (n = 60). First, counselors were asked for the most prevalent non-cognitive problems among their students. Second, MMI notes were analyzed to identify potential indicators for these problems. Third, a case-control study was performed to investigate the association between students exhibiting the non-cognitive problems and the presence of indicators for these problems in their MMI notes.

RESULTS:

The most prevalent non-cognitive problems concerned planning and self-reflection. Potential indicators for these problems were identified in randomly chosen MMI notes. The case-control analysis demonstrated a significant association between indicators in the notes and actual planning problems (odds ratio: 9.33, p = 0.003). No such evidence was found for self-reflection-related problems (odds ratio: 1.39, p = 0.68).

CONCLUSIONS:

Narrative information obtained during MMIs contains predictive indicators for planning-related problems during study. This information would be useful for early identification of students-at-risk, which would enable focused counseling and interventions to improve their academic achievement.

PMID: 26805655

DOI: 10.3109/0142159X.2015.1132410


A new method for group decision making in medical trainee selection (Med Educ, 2016)

A new method for group decision making and its application in medical trainee selection

James R Kiger & David J Annibale






INTRODUCTION


The criteria by which medical schools and residency programmes select applicants rely partly on test scores and grades. In most cases, however, these numerical data are combined with, or even superseded by, considerations that are hard to quantify, such as interview performance, leadership and prior experience. Every programme must ultimately distil all this information into a simple list in order of preference. To do so, programmes often use pseudo-quantitative scoring systems that are mathematically unsound and can be counterproductive.

The criteria by which a medical school or residency training programme selects its preferred applicants may, in part, rely on test scores or grades. In almost every case, however, these numerical data are combined with, if not superseded by, considerations that are difficult to quantify, such as interview performance, leadership traits and prior experience. In the end, every school or programme must find a way of distilling all this information into a simple list of applicants in order of preference. To achieve this goal, groups often rely on pseudo-quantitative scoring systems that are mathematically unsound and may be counterproductive to the collaborative process of making a list.


Our subspecialty training programme uses the NRMP. The NRMP was introduced in 1952, at a time of growing confusion and frustration among medical students and residency programmes. A centralised body took on the task of assigning all graduating medical students to available residency spots. The NRMP system has held its place for 60 years and has expanded to more specialties and subspecialties.

Our subspecialty training programme uses the National Resident Matching Program (NRMP) for applicant selection. The NRMP was formed in 1952 in response to escalating confusion and exasperation on the part of medical students and residency programmes. This centralised body assumed the task of sorting all of the nation’s graduating medical students into available residency spots.2 The NRMP system has stood relatively unchanged for more than 60 years, and has expanded to cover more specialties and subspecialties.


 

Applicants and training programmes each submit their own rank-order lists to the NRMP, which uses a ‘deferred acceptance’ algorithm to sort applicants so that stable and optimal results are achieved. For applicants, ranking is taxing but fundamentally a personal matter. For training programmes, it is more complicated: they must decide how to integrate quantitative data with qualitative characteristics, and how to turn the subjective opinions of multiple interviewers into a final rank-order list. Published reports have documented the imprecision of this process.

Applicants and training programmes both submit rank-order lists to the NRMP, which employs a ‘deferred acceptance’ algorithm to sort the applicants into training positions such that stable and optimal results are achieved.2,3 For applicants, creating a rank order may be taxing, but is a fundamentally personal matter. For training programmes, generating a rank-order list may be significantly more complicated. Each programme must decide how to integrate objective quantitative data (test scores, grades, etc.) with qualitative characteristics (volunteer work, written statements, etc.) and the subjective opinions of multiple interviewers into a final rank-order list. The imprecision of this process is highlighted by published reports that have demonstrated the lack of correlation between information gathered during the interview process, the position of applicants on a programme’s rank-order list, and future resident performance.4–8



ERAS, provided by the AAMC, incorporates a pseudo-quantitative method for generating a rank-order list. Interviewers score applicants on a Likert-type rating scale (1–9), and the averaged scores yield a preliminary ranking. The ERAS system is one example of a Likert scale-based system.

The Electronic Residency Application Service (ERAS), provided by the Association of American Medical Colleges (AAMC), incorporates a pseudo-quantitative method to generate a rank-order list. Interviewers assign applicants scores on a Likert-type rating scale (integers of 1–9), and averaged scores for applicants are sorted to create a preliminary rank-order list. This ERAS system is simply one example of a Likert scale-based system.



Pseudo-quantitative methods of this kind have several fundamental problems.

Pseudo-quantitative methods such as this are beset by a number of fundamental problems:


  • 1 the scores assigned by different interviewers are differently distributed;

  • 2 numeric scores have no consistent meaning for interviewers (e.g. an interviewer who gives consistently lower scores may view a score of 7 points as signifying an excellent candidate, whereas another interviewer may view the same score as indicating an average candidate);

  • 3 Likert scale-type scores are ordinal data on an arbitrary scale; it is inappropriate to perform arithmetic operations, such as the calculation of means, on such data,9–11 and

  • 4 candidates are interviewed only by a subset of faculty staff, and each faculty member may interview only a subset of candidates. Any particular candidate’s final score may be altered substantially by the inclusion or exclusion of an interviewer who gives consistently high or low scores.
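
To make problems 1 and 4 concrete, here is a small sketch with invented scores. The interviewer labels, applicant labels and numbers are all hypothetical. One interviewer scores low and meets applicants X and Y; another scores high and meets Y and Z. Averaging puts Y above X, although the only interviewer who met both preferred X.

```python
from statistics import mean

# Hypothetical scores on a 1-9 scale; each interviewer meets only a subset.
scores = {
    "harsh":   {"X": 6, "Y": 5},   # met X and Y, prefers X
    "lenient": {"Y": 9, "Z": 8},   # met Y and Z, never met X
}


def mean_scores(scores_by_interviewer):
    """Average every applicant's scores across the interviewers who met them."""
    collected = {}
    for given in scores_by_interviewer.values():
        for applicant, score in given.items():
            collected.setdefault(applicant, []).append(score)
    return {applicant: mean(values) for applicant, values in collected.items()}


averages = mean_scores(scores)
ranking = sorted(averages, key=averages.get, reverse=True)
print(averages)
print(ranking)
```

Y's average is pulled up purely by which interviewer happened to see Y, which is the distortion described in point 4.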


Because of these problems, our programme re-evaluates the ERAS-generated ranking in group discussion before submitting it to the NRMP. During the discussion, scores are adjusted so that the ranking conforms to the ‘group opinion’. This group opinion can of course be swayed by a vocal minority, and those who cannot attend are left out of the discussion.

Given these problems, our programme has had to re-evaluate the preliminary ERAS-generated rank-order list in group discussions prior to submission to the NRMP. During such discussions, scores are modified to force the rank list to conform to the ‘group opinion’. Of course, this group opinion may be unduly influenced by a vocal minority, and those who are unable to attend are left out of the discussion.


Other mathematical approaches to improving the rank-ordering process have been proposed.

Others have suggested different mathematical methods to improve the rank-ordering process.

  • One approach is to have interviewers compile individually ordered preference lists of applicants, instead of assigning scores. Both Chew et al. and Collins et al. suggest applying a formula to individual rank lists to create scores that can then be averaged.12,13

  • These systems resemble the Borda voting system in which each voter gives each candidate a number of points proportional to that candidate’s place on the voter’s list.14 These systems are hampered by the fact that the score derived from any given voter is dependent on the number of candidates seen by that voter.

  • A recent article by Ross and Moore suggests retaining scores, but comparing candidates pairwise and assigning a ‘win percentage’ to each in a system similar to that used in sports ranking.15



We set out several design principles.

We proposed a set of design principles to which an optimal system should adhere:


  • 1 the opinions of all interviewers will carry equal weight;

  • 2 the rank-order list will not be influenced by which interviewers meet any individual candidate;

  • 3 interviewers will compare only applicants whom they have met;
  • 4 the system will not depend on scores assigned on an arbitrary scale, and
  • 5 the final ordering will be transparent and reproducible.



METHODS


Algorithm development


We developed an algorithm termed ‘collaborative unbiased rank list integration’ (CURLI).


The CURLI algorithm involves four steps:


  • 1 each interviewer submits a personal ranked preference list of the applicants he or she has met or reviewed;

  • 2 each personal rank-order list is used to generate a pairwise preference table of applicants;
  • 3 the individual preference tables are summed to generate a composite preference table, and

  • 4 a sorting algorithm is applied to the composite preference table to generate a final rank-order list.


The basic result is this: if applicants A and B are both interviewed by a subset of faculty members, and A is preferred to B by a majority of those interviewers, A appears higher on the final preference list. This holds regardless of how many interviews any faculty member conducts or of any individual scoring bias.

The fundamental result of the CURLI algorithm is as follows: if applicants A and B are both interviewed by a subset of faculty members, and candidate A is preferred to candidate B by a majority of those interviewers, then candidate A will appear higher on the final preference list. This is unaffected by how many interviews any specific faculty member conducts or any individual scoring biases.


Personal rank-order lists


The fundamental change for interviewers is that instead of scoring applicants on an arbitrary scale, they are asked to maintain a personal ranked preference list of the applicants they have interviewed. Interviewers include only applicants they have met, conforming to design principles 2 and 3 above. Interviewers no longer assign arbitrary scores, removing the undue influence exerted by interviewers who give consistently high or low scores, satisfying principles 1 and 4.


Pairwise preference tables


For each pairwise comparison, 1 or −1 is entered depending on which applicant is preferred.

Each interviewer’s ranked preference list is converted to a preference table, which is populated by the numbers 1 or −1 depending upon which applicant appears higher on that preference list. No values are assigned to applicants the interviewer did not meet. A preference list implies a comparison between all possible pairs of applicants on that list. Applicants appearing higher on the rank-order list are preferred to all applicants ranked below them. Therefore, a rank-order list of size n contains (n × [n − 1])/2 pairwise comparisons between applicants.



Suppose that, of four applicants A, B, C and D, the interviewer did not meet C and ranks the other three in the order B, D, A.

For example, imagine there are four applicants: A, B, C and D. An interviewer meets all but applicant C, and submits the following rank-order list: B–D–A.


Table 1 shows the preference table generated from this list.
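
Table 1 is not reproduced in these notes, so the sketch below rebuilds a preference table for the list B–D–A. The sign convention is an assumption inferred from the sorting step: the cell for (row, column) holds −1 when the row applicant is ranked above the column applicant and +1 when the column applicant is ranked above. The function name `preference_table` and the dictionary layout are illustrative choices, not the authors' implementation.

```python
from itertools import combinations


def preference_table(rank_list):
    """Pairwise preference table for one interviewer's ranked list.

    Assumed convention: table[(row, col)] is -1 if `row` is ranked above
    `col`, and +1 if `col` is ranked above `row`. Applicants the interviewer
    did not meet get no entry at all.
    """
    table = {}
    # combinations() keeps list order, so `higher` always precedes `lower`.
    for higher, lower in combinations(rank_list, 2):
        table[(higher, lower)] = -1
        table[(lower, higher)] = +1
    return table


applicants = ["A", "B", "C", "D"]
table = preference_table(["B", "D", "A"])

# Print the table; "." marks cells with no comparison (including C's row and column).
for row in applicants:
    cells = [str(table.get((row, col), ".")).rjust(3) for col in applicants]
    print(row, "".join(cells))
```

A list of three applicants gives 3 × 2 / 2 = 3 pairwise comparisons, each stored once in each direction, so the dictionary holds six entries.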



Composite preference table

A composite preference table is computed simply by adding all of the individual preference tables.


For example, four interviewers (I, II, III and IV) provide the following rank lists for four applicants:


  • Interviewer I: B–D–A; 

  • Interviewer II: C–B–A–D; 

  • Interviewer III: B–C–D–A, and

  • Interviewer IV: C–D–B.


Table 2 shows the resulting four individual preference tables. Table 3 shows the composite preference table yielded by the sum for each cell.

 

 


 

 

Sorting


A modified bubble-sort algorithm was applied to the composite table.

A sorting algorithm is applied to the composite preference table to obtain the final rank-order list. For our programme, we applied a modified bubble-sort algorithm to the composite table.16 An initial unsorted list is generated. Each applicant is compared with the applicant immediately below on the rank list by checking the corresponding value in the composite preference table. If the lower-ranked applicant is preferred (i.e. the value in the cell is > 0), the order of the two applicants is swapped. This is continued until no more pairs of applicants are swapped. In the ideal scenario, the re-sorted list will yield a composite preference table with all negative values in the upper triangle.


Re-sorting gives Table 4.

For our example, the final sorted rank list is: C–B–D–A. Re-sorting the preference table to reflect this order gives a matrix with a fully negative upper triangle which indicates that every applicant is preferred by a majority of interviewers to all the applicants below them on the list (Table 4).
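
The four steps can be traced end to end on the worked example. The sketch below is a minimal reading of the description, not the authors' code: it uses a plain bubble sort (the paper's modifications are not given in this excerpt), assumes the convention that a positive cell means the column applicant is preferred to the row applicant, and caps the number of passes as a guard against cyclic preferences. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def composite_table(rank_lists):
    """Sum the individual pairwise tables (steps 2 and 3).

    Assumed convention: composite[(row, col)] > 0 means `col` is preferred
    to `row` by a majority of the interviewers who compared them.
    """
    composite = defaultdict(int)
    for rank_list in rank_lists:
        for higher, lower in combinations(rank_list, 2):
            composite[(higher, lower)] -= 1
            composite[(lower, higher)] += 1
    return composite


def curli_sort(applicants, composite):
    """Bubble-sort applicants using the composite table (step 4)."""
    order = list(applicants)
    for _ in range(len(order)):            # pass cap: an added safeguard
        swapped = False
        for i in range(len(order) - 1):
            upper, lower = order[i], order[i + 1]
            if composite[(upper, lower)] > 0:   # lower-ranked one is preferred
                order[i], order[i + 1] = lower, upper
                swapped = True
        if not swapped:
            break
    return order


rank_lists = [
    ["B", "D", "A"],        # Interviewer I
    ["C", "B", "A", "D"],   # Interviewer II
    ["B", "C", "D", "A"],   # Interviewer III
    ["C", "D", "B"],        # Interviewer IV
]
composite = composite_table(rank_lists)
final = curli_sort(["A", "B", "C", "D"], composite)
print(final)  # ['C', 'B', 'D', 'A']
```

Every pair in this example has a clear majority, so the sort settles after two passes. If preferences formed a cycle (A over B, B over C, C over A), a plain bubble sort would have no stable end point, which is why the pass cap is included here.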



If the same example were run with a Borda voting scheme, in which applicants are ordered by the points they earn, B could come out highest even though two interviewers preferred C.

If one imagines running the same example with a Borda voting scheme, for instance, in which each applicant is awarded points based on his or her position on each list, it is possible that applicant B may have been ranked highest, although two of the three interviewers who directly compared applicants B and C preferred applicant C.
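
Whether B actually overtakes C depends on the point scheme, which the text leaves open. The sketch below assumes one simple variant: with four applicants overall, the top of each list earns 3 points, the next 2, and so on, and points are summed across lists. Under that assumption B does finish ahead of C on the same four rank lists.

```python
from collections import Counter


def borda_totals(rank_lists, n_applicants):
    """Sum Borda-style points over all lists.

    Assumed scheme: position 0 on any list earns n_applicants - 1 points,
    position 1 earns n_applicants - 2, and so on. Applicants missing from a
    list earn nothing from that interviewer.
    """
    totals = Counter()
    for rank_list in rank_lists:
        for position, applicant in enumerate(rank_list):
            totals[applicant] += n_applicants - 1 - position
    return totals


rank_lists = [
    ["B", "D", "A"],        # Interviewer I
    ["C", "B", "A", "D"],   # Interviewer II
    ["B", "C", "D", "A"],   # Interviewer III
    ["C", "D", "B"],        # Interviewer IV
]
totals = borda_totals(rank_lists, n_applicants=4)
print(totals.most_common())  # [('B', 9), ('C', 8), ('D', 5), ('A', 2)]
```

B appears on all four lists and C on only three, so B collects more points. Other Borda variants, such as averaging points per list, can give a different order, which is part of the paper's objection to such schemes.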

 

 



Methodology



We implemented this new ranking algorithm during the 2013 neonatal-perinatal fellowship match. All faculty members and fellows were instructed to maintain a personal ranked preference list of the applicants they interviewed. They were also asked to assign a score of 1–9 to each participant as had been done in previous years, as per the ERAS system. These ‘shadow’ scores were used to compare the outcome of the CURLI algorithm with the results that would have been generated by the old Likert scale-based method.



RESULTS


During the trial year 14 applicants were interviewed, and 19 faculty members and fellows served as interviewers. Figure 1 shows the minimum, maximum, median and interquartile ranges for the scores assigned by each individual interviewer.

 

 


 

Interviewers used only part of the scoring range, and 86% of scores were 6 or higher.

On average, each interviewer scored nine applicants. All interviewers utilised a truncated part of the scoring range at the top of the scale. Of 162 total scores assigned, 139 (86%) were ≥ 6. The median score assigned by each interviewer ranged between 6 and 8.


There was discordance within individual interviewers. Of the 162 scores assigned, 23 were out of order relative to the interviewer’s own rank list.

We observed discordance between individual interviewers’ assigned scores and their final assessments of an applicant’s desirability. Collectively, the interviewers assigned a total of 162 scores, 23 (14%) of which were out of order in relation to the rank-order list of the interviewer who had given them.


Under the new CURLI algorithm, nine of the 14 applicants were assigned to a different place on the ranking list.

… by the new CURLI algorithm. Of the 14 applicants, nine would have been assigned to a different place on the final ranking list.

 

 

In the previous 3 years, our division held two 2-hour meetings to adjust the preliminary list; this time a single 1-hour meeting sufficed, and no applicant’s rank changed.

In the prior 3 years, our division had scheduled two 2-hour meetings to discuss and modify the preliminary rank-order list. In this trial year, we required only a single 1-hour meeting to achieve consensus. No candidates were moved as a result of that discussion. Figure 2 shows the relationships between the preliminary rank-order list and the final rank-order list for 2013 and the prior 2 years. The changes reflect the alterations made during the divisional meeting. In 2011 and 2012, the positions of nine of 14 applicants, and 13 of 16 applicants, respectively, were moved on the final list.

 

 


 

 

DISCUSSION


From an administrative perspective, meeting time fell from 4 hours to 1 hour and the ranking did not change. Displaying the composite preference table ensured transparency.

From an administrative perspective, the new method reduced meeting time from 4 hours to 1 hour, during which no changes were made to the rank-order list. During that meeting the composite preference table was displayed, providing complete transparency.


The CURLI algorithm has several advantages. It is reproducible and transparent. It is less vulnerable to pressure from a minority to change an applicant’s rank, and it removes the unfairness caused by intrinsic differences among interviewers.

We suggest that our CURLI algorithm has numerous theoretical benefits that are borne out in practice. It is reproducible and transparent. There is reduced vulnerability to pressure from a minority of participants to change a candidate’s rank position, and the inequality imposed by intrinsic differences in scoring among interviewers is removed.


The CURLI algorithm has clear advantages. In Borda voting schemes and similar methods, applicants receive points for their list positions, or their rank numbers are averaged.

Compared with other options that have been proposed, we feel that the CURLI method offers clear advantages. Borda voting schemes, and similar methods, introduce a process whereby applicants receive points for their place on each list, or in which the rank number on each list is averaged.12–14


These methods may give satisfactory results when every interviewer sees every applicant, but they are problematic when each interviewer sees only a subset. An interviewer might, for example, see only a few applicants who all happen to be among the least desirable. Under Borda-like methods, the top-ranked applicant on that list gains a large advantage. CURLI makes only relative comparisons, so it has no such problem.

These methods may yield satisfactory results if all interviewers see all applicants (i.e. every individual preference list is full), but in cases like ours in which each interviewer sees only a subset of applicants, these methods are problematic and allow bias. Take, for example, an interviewer who interviews only a few applicants, all of whom happen to be among the least desirable. Under the Borda-like methods, the top-ranked applicant on this list will obtain a huge advantage in points or rank, even though that applicant may actually not be desirable compared with all the other applicants that particular interviewer did not see. As the CURLI method uses the rank lists only to make pairwise comparisons between applicants the interviewer actually saw, it suffers no such bias.


Other pairwise comparison methods exist, but they are less transparent and more cumbersome than CURLI. Most interviewers failed to maintain even internal consistency in their scoring. CURLI removes the possibility of arbitrary scores entirely.

Other pairwise comparison methods have been proposed, but we feel they are less transparent and more cumbersome than our CURLI method [15]. As our case study highlights, the majority of interviewers failed to maintain even internal consistency in their score assignment during one interview season. The CURLI method we have described dispenses with arbitrary scores entirely.



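The method could also be applied to scoring knowledge assessments, clinical reasoning assessments, script concordance testing (SCT), and similar tools.

We believe this method may find further application in medical training in the scoring of knowledge or clinical reasoning assessment tools, such as script concordance testing [17].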


 





 2016 Oct;50(10):1045-53. doi: 10.1111/medu.13112.

new method for group decision making and its application in medical trainee selection.

Author information

  • 1Department of Pediatrics, Medical University of South Carolina, Charleston, South Carolina, USA. kiger@musc.edu.
  • 2Department of Pediatrics, Medical University of South Carolina, Charleston, South Carolina, USA.

Abstract

CONTEXT:

The problems associated with generating a collaborative ranked preference list represent a common source of dilemma in academic medicine and medical education. Such issues present during the process of choosing among applicants to medical schools, during the selection of postgraduate trainees, and in the course of performance assessments and the prioritising of financial expenditures. Currently, most institutions use pseudo-quantitative methods, such as the averaging of scores awarded on an arbitrary scale. These methods are mathematically problematic and may not accurately reflect group opinion.

METHODS:

The present authors developed a novel algorithm for creating a collaborative preference list that generates and sorts a matrix of pairwise comparisons between applicants or choices without placing any reliance on arbitrary Likert scale-type scores. This method achieves equality in influence across individual assessors, as well as transparency and reproducibility. The authors report a case study of their experience using this new algorithm in the 2013 neonatal-perinatal fellowship match.

RESULTS:

When used by this group in the selection of fellowship trainees, the method proposed here allowed for greater efficiency and created a rank-order list that did not require reshuffling or significant debate. A survey of faculty staff and fellows showed much higher levels of satisfaction with the new algorithm and a unanimous desire to use the new algorithm in the future, in preference to a score-based system.

CONCLUSIONS:

The algorithm developed and described here may reduce arbitrariness in processes that require the collaborative creation of a preference list. This method may have wide applicability in medical education and training, and beyond. The present authors' experience of using this algorithm during the National Resident Matching Program match showed improved perceptions of fairness, ease of use and efficiency.

PMID: 27628721

DOI: 10.1111/medu.13112
[PubMed - in process]


Identifying the bad apples (Adv in Health Sci Educ, 2015)


Geoff Norman




Thirty-five years ago, two social psychologists wrote a book called “Human Inference”. In it they showed how vulnerable human judgment and behaviour are to a variety of contextual variables. One of these is the “vividness hypothesis”: a single vivid experience shapes social attitudes even in the face of very clear statistical evidence.

Thirty-five years ago, two social psychologists, Richard Nisbett and Lee Ross, wrote a classic book called “Human Inference: Strategies and Shortcomings of Social Judgment” (1980). In that book, they demonstrated how human judgments and actions are vulnerable to many contextual variables. One particular shortcoming they labeled the “vividness hypothesis”: “A single vivid instance can influence social attitudes when pallid statistics of far greater evidential value do not” (p. 57).


Evidence of this psychological bias abounds.

  • Crime rates have been falling steadily in all developed countries for 25 years.
    Politicians continue to garner votes claiming they are “tough on crime” despite the fact that crime rates have been steadily declining in all developed countries for 25 years; as one example of many, homicide rates in Canada are half what they were in 1999.

  • How serious are air crashes? There are only about 1/3 as many as in 1970, even though passenger miles flown have increased five-fold.
    How bad was 2014 for air crashes? Remember MH370 and MH17? In fact, the number of civil aviation crashes was the lowest on record, and about 1/3 of what it was in 1970, despite a five-fold increase in passenger miles flown.

  • How many people are killed by jihadists? Violent death rates have been declining steadily for millennia.
    What about all those people killed by jihadists? Violent death rates have been on a steady decline for millennia (Pinker 2011).


Consider the case of Dr. Harold Shipman, a British GP. It was followed by calls to reform the educational process so that such things never happen again. Since then we have seen growing interest in “non-cognitive” factors, particularly professionalism.

Dr. Harold Shipman was a British GP who is estimated to have killed 250 of his patients and was eventually convicted of 15 murders. The publicity surrounding his trial and conviction led to calls to reform the educational process so that such things do not happen again (Powis 2015). In particular, we have seen increased focus on “non-cognitive” factors, particularly professionalism.


van Mook et al. review several issues related to identifying and remediating “dyscompetent” residents. Research suggests that unprofessional behaviour in practice can be predicted from performance in medical school.

In a review article, van Mook et al. (2014) examine the multiple issues related to the identification and remediation of “dyscompetent” residents, particularly in the area of professionalism. And an original study by Santen et al. (2014) extends the findings of two landmark studies by Papadakis et al. (2004, 2008), which showed that unprofessional behavior in practice was apparently predictable from performance in medical school. Both studies used a “case control” design.


Santen et al. replicated and extended these results using the routine student assessment data that come before promotion committees.

The Santen et al. (2014) study in this issue replicates and extends these findings by examining the routine student assessments arising from promotion committees, instead of creating a system geared to identifying professionalism issues.


Both studies are case-control studies, a design commonly used when the outcome of interest (developing cancer, dying) is infrequent. In the Papadakis study, 70 of 6330 graduates were disciplined by the California state board, a prevalence of about 1.1%. In other words, 6260 were not.

Both studies use a case–control design. Case–control studies are frequently used when the outcome of interest, such as developing cancer or dying, is infrequent. This design certainly applies in the Papadakis study: of 6330 graduates from UCSF over the time interval, 70 were disciplined by the California state board, a prevalence of about 1.1%. And, of course, 6260 were not.


And here is the crux. The “disease” we are screening for, unprofessionalism, has a very low prevalence. In this situation even a very good diagnostic test yields true positives that are swamped by false positives. Screening would pick up only 38% (27) of the 70 cases, and in the process 1190 other graduates would be wrongly labelled as unprofessional. The PPV is only 27/(1190 + 27) = 2.2%. In the Santen study too, only 140 of more than 2000 graduates had poor performance in medical school.

And there’s the rub. The disease we’re screening for, documented unprofessionalism, has a very low prevalence. Under these circumstances, even very good diagnostic tests result in true positive cases that are swamped by false positives. Working through this example, if we used documented concerns as a medical student as a screening test to decide if a graduate should be allowed to proceed, we would detect 38% of the bad apples, or 27; but we would incorrectly label 0.19 × 6260 = 1190 other graduates as unprofessional. The positive predictive value of the test is 27/(1190 + 27) = 2.2%. Similar data arise in the Santen study, where review of 20 years’ data, involving over 2000 graduates, showed that 140 had poor performance in school, and only 29 were subsequently sanctioned by the state medical board.
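
The positive predictive value calculation above can be reproduced in a few lines. This is a minimal sketch: the function name is illustrative, and the inputs are the sensitivity (38%), false-positive rate (19%) and group sizes (70 and 6260) quoted in the paragraph.

```python
def positive_predictive_value(sensitivity, false_positive_rate, n_cases, n_controls):
    """Share of flagged people who are true cases."""
    true_positives = sensitivity * n_cases              # cases correctly flagged
    false_positives = false_positive_rate * n_controls  # controls wrongly flagged
    return true_positives / (true_positives + false_positives)

# Figures quoted in the text for the Papadakis data
ppv = positive_predictive_value(
    sensitivity=0.38, false_positive_rate=0.19, n_cases=70, n_controls=6260
)
print(f"PPV = {ppv:.1%}")
```

Because the controls outnumber the cases by roughly 90 to 1, even a modest false-positive rate dominates the denominator.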


So in the Papadakis study, of every 100 students who would have been denied graduation under such a policy, only 2 would end up reported to the state board.

So in the Papadakis study, for every 100 students who would have been denied graduation, if they had proceeded to implement a policy based on documented concerns, only two would end up reported to the State board.


 

To add an economic argument: if it costs $100,000 a year to train a doctor, this implies a social cost of $400,000 × 98 ≈ $40 million, because these 98 students behaved satisfactorily yet were flagged as unsatisfactory and could not graduate.

If you want to put an economic spin on it, if it costs $100,000/year to educate a doctor, that policy would result in a social cost of $400,000 × 98 ≈ $40 million of education costs, based on the number of satisfactory students who had an unsatisfactory rating and then could not graduate, without even considering lost income in practice.


But unprofessional behaviour does not appear overnight, and it may be detectable at the time of admission. This is the argument of Powis and others who advocate using personality tests at admissions.

But if these unprofessional behaviours are longstanding, perhaps they are detectable at the time of admissions. This is the promise held out by Powis, who has argued repeatedly for the more widespread use of personality tests at admissions (2003, 2009, 2015).


In fact, there is evidence on the usefulness of such a policy. In a 2007 study, Papadakis used CPI personality test results and showed a difference in mean scores of 2.1 SD. So far so good.

In fact, there is some useful evidence to inform this policy. Papadakis, in another published study (2007), looked at performance on a personality test (the California Psychological Inventory), using a subsample from her earlier study that had undergone the psychological testing as part of admissions. The sample was 19 cases (from the original 70) who had difficulty with the state board, and 26 controls (of 196 sampled from 6260), all of whom had taken the CPI as part of admission to medical school. For the total score, the mean of the cases was 156 (SD = 14.7); for the controls 181 (SD = 11.7). This means that the case mean was 25/11.7 = 2.1 SDs below the control mean. So far so good.


Now imagine using these scores for selection. We can work out the proportions, keeping in mind that 70 people will eventually get into trouble and 6260 will not.

Now let us imagine using these data for selection, by establishing a threshold score that students must attain to be considered for admission—a policy directly advocated by Powis (2015). We can look at the proportion of each group who are accepted or rejected, keeping in mind that our real denominator is 70 cases who will eventually get in trouble with the state board, and 6260 controls who won’t.


If we set the threshold at a score of 156, the “case” mean, we will miss 50% (35) of the cases. That threshold sits at -2.1 SD for the controls, so 2% of the controls, or 125 people, are falsely labelled. That is 125/(125 + 35), or 81%, who would in fact have no problems.

If we were to set the threshold at 156, the “case” mean, then we’ll miss 50% of the cases, 35. And this is a z score of -2.1 for the controls, so we’ll falsely label 2% of the controls, 125, as unprofessional. And 81% (125/(125 + 35)) of the people we have labeled would not have any problems in practice.


Clearly, a test that detects only 50% of the cases is of little value. So raise the sensitivity to 90%. That catches 63 of the 70 cases, but 1315 controls are flagged along with them. In other words, 95% of those flagged would have no later problems.

Clearly, a test that only detects 50% of the cases is of little value. So let’s rack it up to a sensitivity of 90%, which is a Z value on the “Cases” distribution of 1.28. We will detect 63 of 70 cases. That means the threshold, in Z units on the “control” distribution, is (-2.1 + 1.28) = -0.82, which equates to 21% of the Control distribution below the threshold, or 1315. In short, similar to the previous calculation, 1315/(1315 + 63) = 95% of the individuals identified by a low score on the psychological test would not have any further problems in practice.
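
The threshold reasoning above can be sketched with the standard normal distribution. This is an illustration under stated assumptions, not the editorial's own calculation: the function name is hypothetical, both groups are treated as normal with a common SD (as the text's z-score arithmetic implicitly does), and exact normal tail areas are used instead of the rounded 2% and 21%, so the counts differ slightly from those quoted.

```python
from statistics import NormalDist

def screening_outcome(sensitivity, separation_sd=2.1, n_cases=70, n_controls=6260):
    """Return (true positives, false positives, share of flagged who are controls).

    Assumes case and control scores are normal with equal SD and that the
    case mean lies `separation_sd` SDs below the control mean.
    """
    standard = NormalDist()  # mean 0, SD 1
    # Threshold position in SD units relative to the case mean
    z_on_cases = standard.inv_cdf(sensitivity)
    # The same threshold expressed relative to the control mean
    z_on_controls = z_on_cases - separation_sd
    false_positive_rate = standard.cdf(z_on_controls)
    true_pos = sensitivity * n_cases
    false_pos = false_positive_rate * n_controls
    return true_pos, false_pos, false_pos / (true_pos + false_pos)

for sens in (0.50, 0.90):
    tp, fp, share = screening_outcome(sens)
    print(f"sensitivity {sens:.0%}: {tp:.0f} cases flagged, "
          f"{fp:.0f} controls flagged, {share:.0%} of flagged are controls")
```

Raising sensitivity moves the threshold up into the bulk of the control distribution, which is why the number of falsely flagged controls grows so quickly.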

 


 

Clearly, using personality tests to detect those who will eventually be disciplined by the state board comes at a serious cost.

Clearly, any attempt to identify individuals who will be eventually subject to report to the State disciplinary board using personality tests comes at a serious cost in terms of denying access to many who would not have problems.


The premise of this approach is that cognitive measures alone are insufficient to identify those who will later cause problems. But is that really so?

The underlying premise of this approach is that cognitive measures are inadequate to identify individuals who will become problems in practice. But is this necessarily the case?


Tamblyn et al. studied the validity of the MCCQE, which has two parts: a written examination and an OSCE. For predicting communication complaints, the relative risk (RR) for the bottom quartile of the OSCE was 1.43, and for the written exam 1.34. For predicting quality-of-care complaints, the RR was 1.38 for the communication score and 1.54 for the written exam. So however much they are disparaged, cognitive measures are important predictors of practice performance. The same conclusion emerges from Teherani et al., who asked whether residents' postgraduate performance predicts later disciplinary action. Performance was measured in two ways: ABIM in-training evaluations and the ABIM certification examination. The hazard ratio for discipline charges looked impressive at about 1.9, and the ABIM certification examination was about 1.7. Here too the prevalence was about 1%.

Tamblyn et al. (2007) have studied the validity of the Medical Council of Canada Qualifying Examination in predicting complaints (quality of care and communication skills) to provincial licensing bodies. The MCCQE examination has two parts: a written examination, primarily multiple choice, completed at graduation, and an OSCE completed 1 year later. In terms of predicting communication complaints, the relative risk of a complaint for a communication skill performance in the bottom quartile of the OSCE was 1.43; for the written exam score it was 1.34. For quality of care complaints, the relative risks were 1.38 for communication skills and 1.54 for the written test. (Relative risks for the data gathering and problem solving parts of the OSCE ranged from 0.97 to 1.13, predicting nothing.) So it appears that, however much they are disparaged, cognitive measures of performance are an important predictor of practice performance. The same conclusion came from a study by Teherani et al. (2005), who looked at postgraduate performance of residents as a predictor of disciplinary action in practice. Performance was measured two ways: by American Board of Internal Medicine in-training evaluations, and by the ABIM certification examination. Again, the hazard ratio in predicting discipline charges looked impressive, at about 1.9. However, the ABIM certification examination was not far behind at 1.7. And as before, with a prevalence of disciplinary action of about 1% in this sample, the results do not support the use of either measure as a “diagnostic test”.


Another assumption in discussions of assessing “non-cognitive” attributes or personality at admissions is the either-or hypothesis: that the admissions committee must make a Faustian choice between applicants with good personal qualities and applicants with strong academic ability. Such a choice would arise only if personality and academic ability were negatively correlated.

One other assumption pervades discussion of assessing “non-cognitive” or personality at admissions: the “either-or” hypothesis. It is presumed that the admissions committee must make a Faustian choice between selecting someone who is personable, professional and compassionate, or someone who is academically top-tier. Such a choice would only be necessary if there were a strong negative correlation between personal qualities and academic performance. But is there?


One study showed a negative association between high school grades and interview scores. In more recent MMI studies, however, one correlation was -0.21 and the other 0.07. Wherever the truth lies, there seems to be no obstacle to selecting on both academic excellence and interpersonal skills. Moreover, for the NEO-5 personality test, currently the leading personality measure, the only consistent relationship with other measures is a moderate positive relationship between conscientiousness and grades.

One study (Powis and Bristow 1997) showed a significant negative association between scores on a personal interview and high school grades. However, two more recent studies examined the relation between the MMI (a well-validated measure of non-cognitive skills) and university GPA. In the first study (Eva et al. 2004) the correlation was -0.21; in the second (Kulasegaram et al. 2010) the correlation was +0.07. Wherever the true correlation lies, it would appear that there should be no problem identifying students who have both academic excellence and interpersonal skills. Moreover, when one examines the constructs measured by the current state-of-the-art personality test, the Neo-5 personality test, about the only consistent relationship with other measures that has emerged is a moderate positive relationship between conscientiousness and grades (Kulasegaram et al. 2010).


It is perfectly appropriate for admissions strategies to use both academic and interpersonal measures. It is not appropriate to force a choice between the two. And the expectation that we could build a test to diagnose the rare disease of “unprofessionalism” is misguided.

It is perfectly appropriate to devise admissions strategies, in-course performance indices, and certification procedures that include both academic and interpersonal measures. It is not appropriate to force a choice between one and the other. And it is folly to presume that we will ever be able to create an adequate diagnostic test for the ultimately rare disease of unprofessionalism.



 



Powis, D. (2015). Selecting medical students: An unresolved challenge. Medical Teacher, 37, 252–260.



 2015 May;20(2):299-303. doi: 10.1007/s10459-015-9598-9.

Identifying the bad apples.

Author information

  • 1McMaster University, Hamilton, ON, Canada, norman@mcmaster.ca.
[PubMed - indexed for MEDLINE]


Validating MMI scores: are we measuring multiple attributes? (Adv in Health Sci Educ, 2014)

Tom Oliver • Kent Hecker • Peter A. Hausdorf • Peter Conlon






Introduction


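The MMI is an interview method for assessing applicants' non-cognitive attributes. Traditional (less structured) interviews have been reported to have poor reliability and validity, whereas the MMI has sufficient reliability and correlates significantly with performance in school and on licensure exams. In addition, total MMI scores show discriminant validity from cognitive measures such as GPA, suggesting that the MMI measures something different.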

The multiple mini-interview (MMI) is an interview method used in health professional school selection to assess the non-cognitive attributes of applicants (Eva et al. 2004). Whereas more traditional—and often less-structured—interviews have been found to have poor reliability and validity in health professional school selection (Kreiter et al. 2004; Eva et al. 2004; Albanese et al. 2003; Edwards et al. 1990), previous studies have found MMI scores to have sufficient reliability and to be significantly correlated to performance in school and licensure exams (Eva et al. 2009, 2012; Hecker and Violato 2011; Reiter et al. 2007). In addition, there is consistent evidence for the discriminant validity of total MMI scores from ratings of cognitive skill such as incoming grade point average (GPA; Eva et al. 2004, 2009, 2012; Reiter et al. 2007), which suggests that the MMI is measuring something other than cognitive skill.


Non-cognitive attributes assessed by the MMI


Non-cognitive attributes cover a wide range of things. MMIs are designed so that raters assess candidates on multiple constructs both within a station and across stations. However, different non-cognitive attributes assessed within the same station have been found to be highly correlated, so it is common to use total scores instead of construct-based scores.

Non-cognitive attributes can include a variety of individual differences related to attitudes, personality traits, and motivations (Schmitt et al. 2009). MMIs have been designed to have raters assess candidates on multiple constructs (e.g. oral communication and moral reasoning) both within and across interview stations. However, MMI measures of different non-cognitive attribute constructs assessed within the same station have been found to be highly correlated (Eva et al. 2004; Lemay et al. 2007; Roberts et al. 2009). As a result, it is common practice to report total scores (i.e. the average score across all measures) within each station instead of construct-based scores.


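Two constructs an MMI can measure are oral communication (OrCo) and problem evaluation (PrEv). OrCo is the ability to convey verbal messages constructively; PrEv is the ability to identify problems and take the perspectives of various stakeholders into account in decision making and judgment.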

Two of the more distinct interpersonal constructs that an MMI can attempt to measure are oral communication and problem evaluation. Oral communication is the ability to convey verbal messages constructively; and problem evaluation is the ability to identify and take into account multiple perspectives from various different stakeholders in decision making and judgment.


Raters can observe and rate candidates' OrCo and PrEv abilities.

Raters have an opportunity to observe and rate the candidate on

  • the clarity of their language and confidence in their conveyed verbal response (oral communication), and

  • the breadth and depth to which they can explore underlying issues within cases and correctly balance pros and cons for the situation (problem evaluation).


MMI measures related to personality characteristics


Three studies have examined the relationship between total MMI scores and personality measures. Their results are mixed.

Three exploratory studies have investigated the relationship between total MMI scores and personality measures (Griffin and Wilson 2012; Jerant et al. 2012; Kulasegaram et al. 2010). These studies found mixed evidence.


Two personality traits that affect health professionals' interpersonal performance are emotionality and extraversion.

Two personality traits that are likely to be related to health professionals’ interpersonal performance are emotionality and extraversion (Ashton and Lee 2007).

  • High emotionality: people who have high emotionality tend to feel empathy and sentimental attachments with others, are sensitive to dangerous and stressful situations, and feel dependent on the emotional support from others;

  • High extraversion: people who are extraverted tend to feel confident when leading or addressing groups of people, enjoy social gatherings and interactions, and frequently experience positive feelings of enthusiasm and energy.


Extraversion is a trait that includes tendencies such as acting confident with others and expressing enthusiasm and energy (Ashton and Lee 2007).


Emotionality is a trait that includes tendencies such as being familiar with the anxieties and fears that come with stressful situations, and feeling emotional connections with others (Ashton and Lee 2007).


MMI measures related to future performance


 

The MMI measures the following (OrCo and PrEv):

MMI measures of two distinct constructs (oral communication and problem evaluation) and

The communication skill interview measures the following:

communication skill interview scores of students’

  • effectiveness in building a relationship (i.e. build a patient’s or client’s feelings of rapport and trust with the practitioner) and

  • effectiveness in explaining and planning (i.e. build a patient’s or client’s understanding and motivation to support an action plan; Silverman et al. 2005).


OrCo is expected to relate to building a relationship, and PrEv to explaining and planning.

Oral communication should be more closely related to building a relationship and problem evaluation should be more closely related to explaining and planning.


Research objectives and hypotheses


H1 Given the explicit measurement and distinctiveness of oral communication and problem evaluation, there will be a stronger model fit for a 2-factor solution for MMI scores than for a 1-factor solution. 


H2a Oral communication MMI scores will be positively related to building the relationship score in a communication interview.

H2b Problem evaluation MMI scores will be positively related to explaining and planning scores in a communication interview.


H3a Oral communication MMI scores will be positively related to extraversion scores measured by the HEXACO-PI-R-60 (Ashton and Lee 2009). 


H3b Problem evaluation MMI scores will be positively related to emotionality scores measured by the HEXACO-PI-R-60 (Ashton and Lee 2009).


 

Method


Sample


Measures


MMI


The MMI consisted of eight 10-min stations, with two raters per station who each independently rated the participants on two constructs. The development of the MMI followed the description outlined in Hecker et al. (2009). The majority of the stations were developed at the University of Calgary, Canada and modified by the admissions committee at OVC.


The eight stations were meant to assess oral communication and problem evaluation for a range of issues relevant to success as a veterinarian. These issues were ethical and moral (2 stations), interpersonal (3 stations), intrapersonal (1 station), and professional (2 stations). At each station raters scored candidates on two items, one for each construct. Each item was scored on a scale of 1–5 (1 = unacceptable; 3 = meets expectations; 5 = exceptional).


Communication interview


 

The study used standardized clinical communication interviews designed by Adams and Ladner. Each participant was assessed on the use of effective communication skills at two interview stations. Little medical knowledge was required. The simulated client rated the participant immediately afterwards on 7 items (4 on building the relationship, 3 on explaining and planning).

The standardized clinical communication interviews were initially designed by Adams and Ladner (2004) with the consultation of practicing veterinarians. Each participant participated in two communication interview stations designed to assess participants’ use of effective communication skills. Medical and technical knowledge requirements were minimal. The simulated client rated the participant immediately after each station on 7 items (using a 9 point scale) meant to assess two constructs, building the relationship (4 items) and explaining and planning (3 items).

 

The two scores were converted to T-scores to adjust for differences between the simulated clients at different stations. The mean T-score across the two stations was then calculated.

The two scores within each station were converted to a T-score to account for differences in simulated client scores between stations (Howell 2002). The mean of each participant’s T-score across the two stations was calculated to use as his/her building the relationship and explaining and planning score.
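
The T-score step can be sketched as follows. A T-score rescales a z-score to a mean of 50 and an SD of 10. The station data below are hypothetical, and standardizing across participants within each station using the sample SD is an assumption, since the excerpt does not spell out these details.

```python
from statistics import mean, stdev

def t_scores(raw_scores):
    """Convert raw scores to T-scores: 50 + 10 * z, with z taken within the list."""
    m = mean(raw_scores)
    s = stdev(raw_scores)  # sample SD; an assumption, not stated in the excerpt
    return [50 + 10 * (x - m) / s for x in raw_scores]

# Hypothetical 'building the relationship' ratings for four participants,
# given by a different simulated client at each station
station_1 = [6.0, 7.5, 8.0, 5.5]
station_2 = [4.0, 6.0, 6.5, 3.5]

t1 = t_scores(station_1)
t2 = t_scores(station_2)

# Each participant's final score is the mean T-score across the two stations
combined = [(a + b) / 2 for a, b in zip(t1, t2)]
print([round(c, 1) for c in combined])
```

Standardizing within each station puts a lenient and a strict simulated client on the same footing before the two stations are averaged.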


Personality


The personality traits of emotionality and extraversion were measured with the HEXACO-PI-R-60 (Ashton and Lee 2009).


Analysis


 

Results













Discussion


The major findings from this study were:


1. The two-factor model was supported. However, the OrCo and PrEv constructs were highly correlated, both within the model and in the observed data.

1. There was support for a two factor model; however, the oral communication and problem evaluation constructs were highly correlated both within the model (.87) and in the correlation analyses with the actual data (.73; Table 4).


2. The OrCo score was significantly correlated with extraversion and with building the relationship.

2. Oral Communication MMI score was significantly correlated with extraversion (small but significant) and building the relationship scores, supporting Hypotheses 2a and 3a.


3. The PrEv score was not significantly correlated with emotionality, but it was significantly correlated with building the relationship and with explaining and planning.

3. Problem evaluation MMI score was not significantly related to emotionality score but did correlate with building the relationship (not hypothesized) and explaining and planning, thus supporting Hypothesis 2b but not Hypothesis 3b.


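4. The total MMI score had a small but significant correlation with extraversion, and significant correlations with building the relationship and with explaining and planning.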

4. Total MMI score had a weak but significant correlation with extraversion, and significant correlations with building the relationship and explaining and planning.


The two-factor model fit better, but the two factors were highly correlated. Caution is therefore needed before concluding that two truly distinct factors are being measured.

While there was a stronger and significantly better model fit for a two factor model (Fig. 1) than a one factor model, the two constructs were highly correlated (.87). Thus while there was support for a two factor model, caution must be taken in concluding that we are measuring two truly distinct factors as there was weak evidence for discriminant validity between the two construct scores.


The practical implications of these findings are as follows.

Practical implications of these findings: there is evidence for

  • investing the time and effort in MMI station construction,

  • creating appropriate scoring rubrics based upon attributes known to be predictive for future performance and

  • conducting rater training to ensure appropriate and fair assessment of the candidate.


The second research objective was to see whether the MMI measures fit within the nomological network for non-cognitive constructs. Interestingly, although the OrCo and PrEv MMI scores were highly correlated, their correlations with other non-cognitive constructs differed.

The second research objective was to test whether the MMI measures fit within the nomological network for non-cognitive constructs. Interestingly, even though the MMI scores of oral communication and problem evaluation were highly related, they were found to have different relationships to other measures of non-cognitive constructs.


As in Hypothesis 2b, the MMI PrEv score was significantly related to explaining and planning. However, the hypothesized relationship between the MMI PrEv score and emotionality was not found.

Consistent with our hypothesis (2B), the MMI problem evaluation rating had a significant positive relationship with explaining and planning. However, the hypothesized relationship between the MMI problem evaluation measure and emotionality was not found (hypothesis 3B).

 

One explanation is that the MMI did not adequately measure students' ability to empathize. The current MMI measures how well students recognize others' points of view, but not how they express empathy towards others. This limitation could be addressed by including stations that involve more direct interaction.

One explanation is that the MMI did not effectively measure students’ ability to empathize with others. The current MMI measured students’ ability to recognize the points of view of others, but it did not measure how students would express feelings of empathy towards others. One way that this could be done is to include stations that require candidates to engage more directly in an interaction.


Another explanation is that emotionality is a broad personality trait, covering several facets of personality. Narrow traits are often found to be more predictive than broad traits when there is a strong conceptual link to a specific criterion.

Another explanation is that emotionality is a broad personality trait. As a broad personality trait, emotionality measures a broad range of individual attributes (e.g. empathy towards others, sensitivity to physical harm). Narrow traits are often found to be more predictive than broad traits when there is a strong conceptual link to a specific criterion (Rothstein and Goffin 2006; Tett et al. 1991).


Any valid selection process should turn the general applicant pool into a more homogeneous group, but that homogeneity should apply only to the characteristics related to success in school or on the job.

Any valid selection process should lead to the selection of a more homogenous group of successful candidates from the general applicant pool, wherein the successful applicants are homogenous only on the characteristics that lead to in-school or in-job success.


Thus, if there are personality traits that improve clinical interviews or health outcomes, MMI scenarios should be designed to assess them.

Thus, if these are personality traits that can lead to better performance within the clinical interview and potentially better health outcomes, then it can be argued MMI scenarios should be designed to assess attributes related to these traits.



 

Ashton, M. C., & Lee, K. (2007). Empirical, theoretical, and practical advantages of the HEXACO model of personality structure. Personality and Social Psychology Review, 11, 150–166.


Jerant, A., Griffin, E., Rainwater, J., Henderson, M., Sousa, F., Bertakis, K. D., et al. (2012). Does applicant personality influence multiple mini-interview performance and medical school acceptance offers? Academic Medicine, 87, 1–10.




 2014 Aug;19(3):379-92. doi: 10.1007/s10459-013-9480-6. Epub 2014 Jan 22.

Validating MMI scores: are we measuring multiple attributes?

Author information

  • 1University of Guelph, Guelph, Canada.

Abstract

The multiple mini-interview (MMI) used in health professional schools' admission processes is reported to assess multiple non-cognitive constructs such as ethical reasoning, oral communication, or problem evaluation. Though validation studies have been performed with total MMI scores, there is a paucity of information regarding how well MMI scores differentiate the constructs being measured, the relationship between MMI scores (construct or total) and personality characteristics, and how well MMI scores (construct or total) predict future performance in practice. Results from these studies could assist with MMI station development, rater training, score interpretation, and resource allocation. The purpose of this study was to investigate the validity of MMI construct scores (oral communication and problem evaluation), and their relationship to personality measures (emotionality and extraversion) and specific scores from standardized clinical communications interviews (building the relationship and explaining and planning). Confirmatory factor analysis results support a two factor MMI model, however the correlation between these factors was .87. Oral communication MMI scores significantly correlated with extraversion (r c = .25, p < .05), but MMI scores were not related to emotionality. Scores for building a relationship were significantly related to MMI oral communication scores, (r c = .46, p < .001) and problem evaluation scores (r c = .43, p < .001); scores for explaining and planning were significantly related to MMI problem evaluation scores (r c = .36, p < .01). The results provide validity evidence for assessing multiple non-cognitive attributes during the MMI process and reinforce the importance of developing MMI stations and scoring rubrics for attributes identified as important for future success in school and practice.

PMID: 24449121 [PubMed - in process]


Conditional reliability of admissions interview ratings: extreme ratings are the most informative (Med Educ, 2007)

R Brent Stansfield1 & Clarence D Kreiter2





INTRODUCTION


Interviewers typically rate applicants on Likert-type scales1 that yield quantitative, but unreliable, measures.4,6


There is little evidence for the predictive validity of interviews.


Unreliable interview scores may not arise from invalid interviewing processes, but rather from the treatment of ratings as homogeneously informative measures. Imagine an interviewer able to identify stellar candidates, but unable to distinguish mediocre from poor ones; his high scores would be more informative than his low scores. Despite this validity, his ratings would have low reliability overall. The proper use of his ratings would account for conditional reliability: the reliability of different scale ranges.


Other investigations of conditional reliability have found heterogeneity of error variance in Likert-type scales. Use of midpoint responses on political opinion questions may represent ‘undecided’ or ‘never thought about it’ as opposed to ‘neutral’ or ‘neither agree nor disagree’.13 This suggests less certainty, and therefore a higher standard error of measurement, in midpoint responses than in non-midpoint responses. A study of education graduate students’ responses on an anxiety scale raised Cronbach’s alpha from 0.70 to 0.94 merely by treating midpoint responses as missing data.14


METHODS


Participants: observed and simulated


Observed set 1


Observed set 2


Simulated set


Analysis




RESULTS


Observed set 1 is more reliable than the simulated set


Low and high ratings are more reliable


Weighting low and high responses improves validity



DISCUSSION



Raters tend to agree more about the lowest and highest quality applicant interviews. This agreement is not a mathematical artefact: the simulated set contains much more inter-rater disagreement at extreme ratings than observed sets 1 or 2 (Fig. 2). Raters tend to disagree more than chance about applicants whom 1 rater has deemed average. These moderate ratings are actually ‘negatively reliable’, suggesting an invalid use of the modal response, perhaps denoting ‘I don’t know’ rather than ‘average applicant’. If so, these large inter-rater disagreements reflect differences in confidence rather than substance. As raters rarely use levels 1 and 2, the modal level 4 is effectively the midpoint on a 3-point scale; these results mirror those finding midpoint responses unreliable.13,14
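The comparison between observed and chance-level disagreement can be sketched roughly as follows. This is a simplified illustration and not the authors' Monte Carlo procedure: it assumes every applicant was rated by exactly two interviewers, measures disagreement as the mean absolute gap to the partner's rating at each rating level, and builds the chance baseline by shuffling partner ratings. The function names and data layout are hypothetical.

```python
import random
from collections import defaultdict


def disagreement_by_level(pairs):
    """Mean absolute gap to the partner's rating, grouped by rating level.

    `pairs` is a list of (rating_a, rating_b) tuples, one per applicant.
    Each pair is counted once from each rater's point of view.
    """
    gaps = defaultdict(list)
    for a, b in pairs:
        gaps[a].append(abs(a - b))
        gaps[b].append(abs(a - b))
    return {level: sum(g) / len(g) for level, g in sorted(gaps.items())}


def chance_baseline(pairs, n_shuffles=1000, seed=0):
    """Average disagreement per level when partner ratings are shuffled.

    Assumption: random re-pairing stands in for the simulated set.
    """
    rng = random.Random(seed)
    first = [a for a, _ in pairs]
    second = [b for _, b in pairs]
    totals = defaultdict(float)
    for _ in range(n_shuffles):
        rng.shuffle(second)
        shuffled = disagreement_by_level(list(zip(first, second)))
        for level, gap in shuffled.items():
            totals[level] += gap
    return {level: totals[level] / n_shuffles for level in sorted(totals)}
```

A level whose observed disagreement sits below its shuffled baseline is one where raters agree more than chance; a level above the baseline behaves like the moderate ratings described in the passage.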


More importantly, these results suggest that ignoring moderate interview ratings entirely during the admissions process is preferable to using them when computing larger weighted sum scores. Introducing unreliable measures into weighted averages with reliable ones can compromise the reliability of the resulting score.6 Treating all moderate responses as missing data eliminates the impact of the noise in those responses, while allowing extreme scores (which in these data have some predictive validity) to influence applicants’ relative standings.
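Treating moderate responses as missing data might look like the sketch below. It is only an illustration under stated assumptions: the choice of which levels count as "moderate" (here level 4, the modal rating mentioned above) is a hypothetical default, and the function name is not taken from the paper.

```python
from statistics import mean


def extreme_only_score(ratings, moderate_levels=(4,)):
    """Average an applicant's interview ratings, ignoring moderate ones.

    `moderate_levels` is an assumed parameter: ratings at these levels are
    treated as missing data. Returns None when no extreme rating remains,
    so the interview contributes nothing for that applicant.
    """
    kept = [r for r in ratings if r not in moderate_levels]
    return mean(kept) if kept else None
```

Returning `None` rather than a neutral number keeps "no informative rating" distinct from "average applicant", which is the distinction the passage argues for.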


7 Kreiter CD, Gordon JA, Elliott S, Callaway M. Recommendations for assigning weights to component tests to derive an overall course grade. Teach Learn Med 2004;16:133–8.









 2007 Jan;41(1):32-8.

Conditional reliability of admissions interview ratings: extreme ratings are the most informative.

Author information

  • 1Department of Medical Education, University of Michigan, Ann Arbor, Michigan 48109, USA. rbent@umich.edu

Abstract

CONTEXT:

Admissions interviews are unreliable and have poor predictive validity, yet are the sole measures of non-cognitive skills used by most medical school admissions departments. The low reliability may be due in part to variation in conditional reliability across the rating scale.

OBJECTIVES:

To describe an empirically derived estimate of conditional reliability and use it to improve the predictive validity of interview ratings.

METHODS:

A set of medical school interview ratings was compared to a Monte Carlo simulated set to estimate conditional reliability controlling for range restriction, response scale bias and other artefacts. This estimate was used as a weighting function to improve the predictive validity of a second set of interview ratings for predicting non-cognitive measures (USMLE Step II residuals from Step I scores).

RESULTS:

Compared with the simulated set, both observed sets showed more reliability at low and high rating levels than at moderate levels. Raw interview scores did not predict USMLE Step II scores after controlling for Step I performance (additional r2 = 0.001, not significant). Weighting interview ratings by estimated conditional reliability improved predictive validity (additional r2 = 0.121, P < 0.01).

CONCLUSIONS:

Conditional reliability is important for understanding the psychometric properties of subjective rating scales. Weighting these measures during the admissions process would improve admissions decisions.

PMID: 17209890 [PubMed - indexed for MEDLINE]


Enhancing the Reliability of the Multiple Mini-Interview for Selecting Prospective Health Care Leaders (Acad Med, 2011)

Sebastian Uijtdehaage, PhD, Lawrence “Hy” Doyle, EdD, and Neil Parker, MD





The current crisis in providing effective and accessible health care in the United States has spawned a number of dual-degree leadership programs for medical undergraduates.1

  • In 2005, the University of California (UC) initiated an ambitious initiative, the Program in Medical Education (PRIME), to increase enrollment in its medical schools in order to address the needs of California’s disadvantaged populations.2,3
  • In 2007, at the David Geffen School of Medicine at UCLA, UCLA-PRIME was developed as a five-year dual-degree program focused on the development of leadership skills in 18 medical students per year whose career goals would be to improve health care for the disadvantaged and medically underserved.


The selection of future physicians, however, often fails on several accounts.4

  • First, the cognitive record of the applicant, that is, grade point average (GPA) and Medical College Admission Test (MCAT) scores, commonly overrides any consideration of noncognitive attributes in decisions to admit.5
  • Second, the noncognitive qualities sought in applicants are unclear, remain implicit, and are not necessarily agreed on by stakeholders.
  • Third, even if a set of desirable noncognitive qualities for candidates is clear and agreed on, reliable and valid assessment methods are scarce. This is particularly true for characteristics such as altruism, empathy, and leadership.
  • Furthermore, the entire admissions process is rarely transparent or uniformly applied.


Unfortunately, admissions interviews are, like many other assessments, prone to “context specificity.”7 That is, the performance of an applicant during the interview may depend to an important extent on the particular interviewer, the specific questions asked, or other factors irrelevant to the applicant’s suitability. Indeed, Kreiter and colleagues8 studied the variance components of admissions interview scores and found that the variance component attributable to applicants was smaller than the variance component attributable to the applicant-by-occasion interaction. These and similar findings imply that traditional interviews may have inadequate reliability and, thus, questionable validity.


The initial MMI study by Eva and colleagues12 was conducted on graduate students, a relatively heterogeneous group compared with a pool of medical school applicants. This may have inflated their reliability results. As Eva and colleagues22 put forth in a subsequent article, “the reliability and validity of any assessment strategy is dependent on the context in which the strategy is applied and the content of the assessment.” In other words, the promising psychometric properties of the MMI may not necessarily hold up for a more homogeneous pool of applicants who have been selected for consideration on the basis of a more specific set of attributes.



Method


First, we generated an inventory of the desirable characteristics of UCLA-PRIME candidates with a focus on leadership and commitment to disadvantaged populations using a Delphi approach among stakeholders (program administrators, deans, faculty members, and community leaders). We described the details of the Delphi study elsewhere.23 Characteristics that were deemed essential for the PRIME program included

  • commitment to and experience with underserved populations,
  • cultural sensitivity,
  • leadership potential,
  • maturity, and
  • being an effective team member.


Study 1 (2009)



In 2009, we created a panel of 28 interviewers consisting of 18 faculty members, 6 medical students, and 4 community members.


  • On the day of the MMI, we handed out the scenarios and a list of applicants to the interviewers.
  • The interviewers practiced the scenarios with each other before the applicants arrived.
  • We instructed the interviewers to rate the overall performance of the applicant using a seven-point Likert scale (1 = unsatisfactory; 7 = outstanding).
  • Specifically, we asked them to “consider the applicant’s communication skills, strength of the argument, and suitability for the medical profession.”
  • We strongly encouraged the interviewers to use the full rating scale, recognizing that interviewees had been selected from a very large pool of applicants and exceeded all other admissions requirements. Interviewers scored the applicants immediately after each interview.
  • They could adjust their scoring after they completed interviewing the entire cohort.
  • A total score was calculated for each applicant by summing the scores for individual stations. Thus, total scores could range from 12 through 84.
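The total-score rule in the last step can be sketched as follows. This is a minimal illustration assuming 12 stations each rated from 1 to 7, which is what the stated range of 12 through 84 implies; the constant and function names are hypothetical.

```python
N_STATIONS = 12               # assumed from the 12-to-84 total range
SCALE_MIN, SCALE_MAX = 1, 7   # seven-point Likert scale


def total_mmi_score(station_scores):
    """Sum one applicant's station ratings after basic validation."""
    if len(station_scores) != N_STATIONS:
        raise ValueError(f"expected {N_STATIONS} station scores")
    if any(not SCALE_MIN <= s <= SCALE_MAX for s in station_scores):
        raise ValueError("each rating must lie between 1 and 7")
    return sum(station_scores)
```

An applicant rated 1 at every station totals 12 and one rated 7 at every station totals 84, matching the range given above.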




Study 2 (2010)

 

Several changes were made:

  • First, we moved the MMI venue to our education building and used adjacent rooms typically used for small-group teaching of medical students. The applicants could familiarize themselves with the layout of the facility before commencing the MMI.
  • Second, we replaced an easy station (Station 9, “How did you prepare for this interview?”) with a perhaps more challenging task in which applicants were asked to describe student characteristics desirable for the PRIME program. Difficulty level was not assessed formally but was suggested by the fact that interviewers had difficulty differentiating performance of the applicants in the original station. The remaining 11 stations were the same as in 2009.
  • Third, we asked the interviewers to rate the performance of an applicant relative to the pool of all applicants. Accordingly, we changed the seven-point Likert-scale anchors to a normative scoring rubric (1 = bottom 15%; 4 = middle 50%; 7 = top 15%).
  • Finally, we changed the wording of two stations that previously led to confusion among some applicants. In 2009, one station asked the applicants to discuss “surgeons’ mortality rates.” A few applicants proceeded to discuss the mortality rate of surgeons and not their patients. In 2010, we changed the prompt to “surgeons’ patient mortality rates.” In another station, we replaced the term “SARS epidemic” with the more recent “H1N1 epidemic” but left the crux of the station the same.





Results


Study 1 (2009)


The distribution of the total MMI scores, however, was skewed toward the maximum score, suggesting that interviewers had difficulty using the lower range of the rating rubric (Figure 1).

 

 


 

Study 2 (2010)

 

 





 


Discussion


Our study showed that the MMI can be effectively used to assess a homogeneous group of applicants and that its reliability can be enhanced with minor changes in protocol.


Reliability of the first MMI implementation in 2009 was 0.58, lower than reported elsewhere. Our interviewees were a relatively homogeneous group of applicants because initial screening considered primary and secondary application information that demonstrated a strong commitment to disadvantaged populations. This homogeneity and the smaller sample size may have resulted in comparatively less variability among the interviewees and could have suppressed the reliability of the overall MMI assessment as estimated by the generalizability coefficient.


We made a few changes in the 2010 implementation of the MMI process that, all taken together, seemed to have contributed to a substantial improvement of the reliability. One such change was the replacement of a seemingly “easy” station (determined at face value) with a more challenging one. To facilitate discrimination between applicants, the stations must have an optimal level of difficulty. Item response theory suggests that items of median difficulty best discriminate between groups with either high or low magnitude of a latent trait.28


And, indeed, our analysis showed that an easy station simply “added noise to the signal.” When we recalculated the reliability excluding Station 9, the reliability improved; it did not decrease, as one would expect when taking away one assessment point.
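The idea of recalculating reliability with one station left out can be sketched as below. Note the substitution: the study estimated reliability with a generalizability coefficient, whereas this sketch uses Cronbach's alpha as a simpler stand-in, so the numbers would not match the paper's. The function names and the applicants-by-stations list layout are assumptions.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for `scores`: one list of station ratings per applicant."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_without_station(scores, station_index):
    """Alpha recomputed after dropping one station (0-based index)."""
    reduced = [row[:station_index] + row[station_index + 1:] for row in scores]
    return cronbach_alpha(reduced)
```

If dropping a station raises the coefficient, that station was contributing more noise than signal, which is the pattern reported for Station 9.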


In our 2010 study, the interviewers seemed better able to use the full range of the rating scale after we changed its anchors to “bottom 15%,” “bottom 30%,” “middle 50%,” etc., and asked interviewers to rate an applicant’s performance relative to the pool of all applicants. Thus, we encouraged rank-ordering of candidates with a more normative approach of scoring. Interviewers could adjust their scoring after having seen a cohort of 13 applicants (and this was allowed in the 2009 study as well).



We found that implementing MMIs was feasible but a daunting task nonetheless.


Clearly, the MMI requires extensive human resources. In a recent cost-efficiency analysis, Rosenfeld et al29 found that the MMI requires more upfront preparation (securing space, identifying appropriate interview questions, interviewer training, etc.) compared with the traditional interview process. This cost, however, was offset by considerably fewer hours required of each person to assess a pool of applicants. We would note that the time saving is even more considerable if the time spent by interviewers in writing reports and attending committee meetings in which applicants are discussed is taken into account.



Our study has several limitations. First, we did not assess the validity of the MMI process even though one could argue that blueprinting the MMI stations based on our Delphi study provided an acceptable level of content validity.


Research in this area is hampered by the ubiquitous but ill-defined term “noncognitive characteristics.” As Norman32 pointed out, the umbrella term “noncognitive skills” is used to describe those characteristics that MCAT score or GPA do not reflect, such as tacit knowledge, communication skills, emotional intelligence, and stable personality traits. We feel that admissions committees must explicitly define those qualities they deem essential for a successful medical school career and subsequent practice and that are in concordance with the institution’s philosophy and goals.




 




 

 



1 Crites GE, Ebert JR, Schuster RJ. Beyond the dual degree: Development of a five-year program in leadership for medical undergraduates. Acad Med. 2008;83:52–58. http://journals.lww.com/academicmedicine/Fulltext/2008/01000/Beyond_the_Dual_Degree__Development_of_a_Five_Year.8.aspx. Accessed April 28, 2011.



26 Crossley J, Russell J, Jolly B, et al. ‘I’m pickin’ up good regressions’: The governance of generalisability analyses. Med Educ. 2007;41:926–934.



34 Ko M, Edelstein RA, Heslin KC, et al. Impact of the University of California, Los Angeles/Charles R. Drew University Medical Education Program on medical students’ intentions to practice in underserved areas. Acad Med. 2005;80:803–808. http://journals.lww.com/academicmedicine/Fulltext/2005/09000/Impact_of_the_University_of_California,_Los.4.aspx. Accessed April 28, 2011.








 2011 Aug;86(8):1032-9. doi: 10.1097/ACM.0b013e3182223ab7.

Enhancing the reliability of the multiple mini-interview for selecting prospective health care leaders.

Author information

  • 1Center for Educational Development and Research, David Geffen School of Medicine, University of California, Los Angeles, USA. bas@mednet.ucla.edu

Abstract

PURPOSE:

The David Geffen School of Medicine at UCLA Program in Medical Education (UCLA-PRIME) used a 12-station multiple mini-interview (MMI) circuit to assess applicants. The authors sought to determine the reliability of the MMI, potential bias in scores, and the degree of acceptance by interviewers and applicants.

METHOD:

In 2009, 28 interviewers interviewed a cohort of 76 applicants. An anonymous survey assessed interviewers' and applicants' satisfaction with the MMI process and perceived bias. Psychometric properties were determined with generalizability and decision theory. The process was repeated the following year with a new cohort of 78 applicants and minor modifications aimed at improving reliability.

RESULTS:

The MMI format was well received by both applicants and interviewers. No bias based on gender or disadvantaged status was found. The preliminary reliability of the MMI in 2009 was 0.58, lower than reported in previous studies, but improved in 2010 to 0.71 after an easy station was replaced with a more challenging one and a new scoring rubric was introduced.

CONCLUSIONS:

This interview technique proved to be reliable and was seen as transparent, uniform, and fair. The predictive validity of this process remains to be determined.

PMID: 21694560 [PubMed - indexed for MEDLINE]


Reflecting the Relative Values of Community, Faculty, and Students in the Admissions Tools of Medical School (Teach Learn Med, 2005)

Harold I. Reiter, Kevin W. Eva

McMaster University Department of Clinical Epidemiology and Biostatistics Hamilton, Ontario, Canada






In the concluding years of the second millennium, efforts were under way in both the United States and Canada not only to define the attributes desirable in our physicians but also to foster curricular and evaluative processes to enhance those attributes at the postgraduate and practice levels.

  • In the United States, efforts by the American Board of Medical Specialties and by the Accreditation Council for Graduate Medical Education produced a document describing the six competencies expected of physicians.1
  • A parallel movement in Canada, arising from the project “Educating Future Physicians of Ontario,”2 led to the creation of CanMEDS 2000 and its seven roles of the physician.3
  • From a global perspective, the Core Committee of the Institute for International Medical Education has grouped the essentials that physicians must have under seven competence domains.4


The emphasis on personal, as opposed to cognitive, qualities in that review is no accident. As clearly demonstrated in an earlier, separate literature review6 of admissions tools to health professional schools, traditional tools for the evaluation of cognitive qualities have largely succeeded, although those evaluating personal qualities, with rare exception, have failed to demonstrate reliability and validity.


A significant step in the development of those tools was taken with the advent of the Multiple Mini-Interview (MMI).7


Methods


Adapting the roles, competencies, and competence domains outlined in Table 1 in conjunction with the literature on admissions and local discussion, we created a list of seven personal characteristics that could be conceived to be relevant in an undergraduate admissions context. These characteristics, along with the definitions provided to participants, are illustrated in Table 2. Participants were told that these definitions were not comprehensive but that they should serve as a guide.


Following the paired comparison approach,12 a questionnaire was created by listing all pairs of these 7 characteristics (e.g., collaborative versus ethical) and randomizing the order in which the items were presented. Participants were given the following instruction.


For each pair of characteristics outlined below, please circle the characteristic that you consider more important in determining who should be admitted to the Undergraduate MD Program at McMaster University. You must choose one characteristic from each pair, or your responses will not be analyzed. Definitions for each characteristic are provided on the preceding page.
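Building such a questionnaire can be sketched in a few lines: listing every unordered pair of 7 characteristics gives 7 × 6 / 2 = 21 items, which are then shuffled. This is an illustration only; the function name is hypothetical, and since only "collaborative" and "ethical" are named in this excerpt, any other labels passed in would be placeholders.

```python
import random
from itertools import combinations


def build_questionnaire(characteristics, seed=None):
    """Return every unordered pair of characteristics in random order."""
    pairs = list(combinations(characteristics, 2))
    rng = random.Random(seed)  # seed is optional, for reproducible ordering
    rng.shuffle(pairs)
    return pairs
```

Because the pairs are unordered, each characteristic appears in exactly six items when seven characteristics are supplied.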


Participants responded to 21 pairings; the task required approximately 10 min to complete. From these data, the probability of each item being selected was determined and converted to z scores to determine the relative importance of each of the seven characteristics on an interval level scale.

  • Negative z scores do not indicate that the characteristic is viewed as unimportant; undoubtedly each of the items included are valued to some extent.
  • Rather, negative z scores simply indicate that the characteristic is less important relative to the other options provided.
  • For example, imagine only two items, A and B, were included in the study, both of which are considered important characteristics. If item A was selected as more important than item B 60% of the time, the probability of selecting item A (0.6) would convert to a z score of 0.26 for item A and the probability of selecting item B (0.4) would convert to a z score of –0.26 for item B (see Streiner & Norman13 for an accessible description of the analyses).
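The probability-to-z conversion in the example can be sketched with the inverse of the standard normal cumulative distribution. This is an assumed reading of the procedure rather than the authors' code, and the function name is hypothetical. For p = 0.6 the standard normal quantile is about 0.25, close to the ±0.26 quoted above.

```python
from statistics import NormalDist


def probability_to_z(p):
    """Convert a selection probability (0 < p < 1) to a standard normal z score."""
    return NormalDist().inv_cdf(p)
```

The conversion is symmetric, so an item chosen 40% of the time receives the negative of the score for an item chosen 60% of the time; probabilities of exactly 0 or 1 have no finite z score and raise an error.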





Results


The resultant z score comparisons were remarkably uniform regardless of whether the group under consideration was from community, faculty, or the student body. Similarly, homogeneity was observed on comparing those with more or less intimate administrative level of involvement.

 


 

Discussion



The cost of a misstep in admissions is dramatic. Aside from the potential damage unleashed on society, the financial cost of undergraduate medical education approximates $90,000 (US) annually per student.14,15 Given the reasonable expectation of uniformly successful decision making and the high cost of education, any further significant expenditure of time, money, and effort to remediate errors of judgment by the admissions office is unacceptable.



Over the last 50 years, the expectation of significant differences in perspective between community, faculty, and students has promulgated a seismic shift in representation on admissions committees. Between 1957 and 1971, the presence of students on admissions committees of schools affiliated with the Association of American Medical Colleges swung sharply upward, from near nonexistence to 56% (41/73) of committees responding to the survey indicating a student presence.16 This presence continued to rise to 74% (64/86) by the time a similar survey was conducted in 1982.17 The rise of community influence was more delayed, but nevertheless forthcoming. Even by the time of the 1971 survey, only 3% (2/73) of committees reported a community stakeholder presence, although this appears to have risen by the 1982 survey (27% of responding committee memberships arose from non medical–nonprofessional backgrounds in that survey).


The existence of differences in perspective to warrant these shifts is less clear. A comparison of rank order of the relative importance of particular defined domains was conducted between community members versus members of the Admissions Committee of the University of Massachusetts Medical School (UMMS).18 The study reported that the “results of the rank-ordering of criteria indicate commonalities in outlook and approach between the [community member] conferees and the UMMS Admissions Committee despite the fact that the ranking of the characteristics was done independently” (p. 640). The methodology used by UMMS was, in contrast to the paired comparison analysis described here, far more resource intensive and included a much smaller sample size of stakeholders (n = 20).


These results can now be used to guide the development of admissions protocols, particularly the MMI, ensuring that the stations are designed to preferentially emphasize ethical decision-making and communication skills.

 






 2005 Winter;17(1):4-8.

Reflecting the relative values of community, faculty, and students in the admissions tools of medical school.

Author information

  • 1McMaster University, Department of Clinical Epidemiology and Biostatistics, Hamilton, Ontario L8N 325, Canada.

Abstract

BACKGROUND:

In defining the characteristics of medical students that society and the medical profession find desirable, little effort has been spent assessing the relative value of the dozens of characteristics that have been identified. Furthermore, many institutions go to great lengths to ensure equal representation across stakeholder groups in an effort to maximize the heterogeneity of the pool of students accepted to study medicine; however, the extent to which different stakeholders value different characteristics has yet to be determined.

PURPOSE:

This study was an attempt to assess the relative value of the characteristics of medical students that society and the medical profession find desirable.

METHODS:

Using documents created internationally to identify the core competencies of medical personnel, a series of 7 characteristics were generated for inclusion in a study that adopted the paired comparison technique. Of 347 surveyed, 292 respondents indicated the rank ordering they would assign to each characteristic by circling the more important characteristic in all possible pairings.

RESULTS:

Overwhelmingly, "ethical" was deemed to be the most important characteristic on which selection tools should be based. Surprisingly, the pattern of responses was highly consistent regardless of stakeholder group and degree of affiliation with the undergraduate medical program.

CONCLUSIONS:

The generalizable features of this study not only include the empirical findings but also demonstrate useful survey protocol that can be adapted by any admission committee to guide the generation of an institution-specific admissions blueprint. A novel protocol that provides the necessary flexibility is discussed.

PMID: 15691807 [PubMed - indexed for MEDLINE]


How effective is each selection method? A systematic review (Med Educ, 2016)

How effective are selection methods in medical education? A systematic review

Fiona Patterson, Alec Knight, Jon Dowell, Sandra Nicholson, Fran Cousans & Jennifer Cleland




INTRODUCTION


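In practice, selection in medical education is often driven by political considerations and key stakeholders. Such influences can produce resistance to any move away from ‘traditional’ measures, even when there is compelling evidence for change, and they make evidence-based selection difficult. However, Kreiter and Axelson's non-systematic review of the last 25 years found that effective educational interventions yield learning gains with effect sizes below 0.20, whereas evidence-based selection is far more powerful: well-designed selection tools produce improvements of more than 1 SD.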

Indeed, selection for medical education internationally is frequently driven by political considerations and the preferences of key stakeholders.1 Such influences may result in resistance against any move away from ‘traditional’ measures despite compelling evidence to do so, often to the detriment of evidence-based selection practices. However, Kreiter and Axelson’s2 non-systematic review of medical admissions research and practice in the last 25 years noted that effective educational interventions typically produce only small gains in learning (effect sizes generally below 0.20), whereas evidence-based selection is comparatively far more powerful, with well-designed selection tools achieving performance gains exceeding one standard deviation.
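
To make the contrast between an effect size of 0.20 and a gain of one standard deviation concrete, here is a minimal Python sketch. It assumes that both figures are standardized mean differences and that performance is normally distributed; neither assumption is stated in the review, and the helper name `percentile_after_shift` is illustrative only.

```python
from statistics import NormalDist


def percentile_after_shift(effect_size_sd):
    """Percentile of the original distribution reached by the shifted mean.

    Assumes normally distributed performance and treats the effect size as
    a standardized mean difference (in SD units).
    """
    return NormalDist().cdf(effect_size_sd)


intervention = percentile_after_shift(0.20)  # upper bound quoted for interventions
selection = percentile_after_shift(1.00)     # lower bound quoted for selection tools
```

Under these assumptions, a gain of 0.20 SD moves the average learner from the 50th to roughly the 58th percentile, whereas a gain of 1 SD moves the average to roughly the 84th percentile.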


Prior academic attainment has been, and remains, the main basis for selection, and it is usually assessed at the initial screening stage.

Prior academic attainment has generally been, and continues to be, the primary basis for selection and is usually assessed at an initial screening stage.3


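However, there are several concerns about this approach. First, previous reviews found that academic performance is a good but imperfect predictor of performance, explaining only about 23% of the variance in undergraduate medical education (UME) and 6% in postgraduate medical education (PGME).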

However, there are several concerns about this approach. Firstly, previous reviews have concluded that academic performance is a good, but not perfect, predictor of performance, accounting for approximately 23% of the variance in performance in undergraduate medical training and 6% in postgraduate performance.4
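
As a rough guide to what these variance figures mean, the sketch below converts a proportion of variance explained into the magnitude of the implied correlation. It assumes the quoted percentages are squared correlations (r²), which is the usual reading but is not stated explicitly in the excerpt; the function name is illustrative.

```python
import math


def implied_correlation(variance_explained):
    """Magnitude of the correlation implied by a proportion of variance explained.

    Assumes the proportion is a squared correlation (r^2); the sign of r
    cannot be recovered from r^2 alone.
    """
    if not 0.0 <= variance_explained <= 1.0:
        raise ValueError("variance_explained must be between 0 and 1")
    return math.sqrt(variance_explained)


r_undergraduate = implied_correlation(0.23)  # 23% of variance, undergraduate training
r_postgraduate = implied_correlation(0.06)   # 6% of variance, postgraduate performance
```

On this reading, the undergraduate figure corresponds to a correlation of about 0.48 and the postgraduate figure to about 0.24.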


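Second, although academic achievement has consistently been shown to be a good predictor of medical school performance, far less research has historically gone into methods that reliably assess important non-academic attributes, interests and motivational qualities.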

Secondly, although academic achievement is consistently shown to be a good predictor of performance in medical school,5 historically substantially less attention has been paid to researching methods that reliably evaluate important non-academic personal attributes, interests and motivational qualities.


Third, longitudinal cohort studies are lacking.

Thirdly, there has been a dearth of longitudinal cohort studies examining the predictors of success after qualification.


The fairness of medical school admissions and of selection for specialty training has attracted strong public interest and frequent criticism.

Medical school admissions processes and selection for specialty training attract strong public interest and often criticism regarding fairness.7–9






METHODS


Data sources


We conducted a formal literature search using the criteria specified in Table S1 (online).


Study selection and inclusion and exclusion criteria


Assessment of study type, quality and selection method


 

The research questions and evidence quality categories are shown in Table 1. Muir and Grey's ‘salience’ and ‘safety’ categories were removed.

The research questions and evidence quality categories are displayed in Table 1. In relation to the different research questions under investigation, we removed Muir and Grey’s (1996)10 ‘salience’ and ‘safety’ categories as they were not relevant to our context.


Each study was assessed on the following.

Therefore, we examined each study in relation to four research questions concerning, respectively:

  • effectiveness;
  • procedural issues;
  • acceptability, and
  • cost-effectiveness.

 

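This was intended to address the implicit assumption that predictive validity is the most important measure of a selection method's effectiveness. The success of a selection tool also depends on other factors, such as its accessibility, ease of implementation, and the extent to which key stakeholders find it acceptable.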

This approach was intended to address the assumption implicit in much previous research that predictive validity is the most important measure of the effectiveness of a selection method; we acknowledge that the success of a selection tool may be determined by a range of additional factors, including its accessibility, ease of implementation and the extent to which it is viewed as acceptable by key stakeholders.



RESULTS


For a full list and description of all papers identified in the review, refer to Tables S2 and S3 (online).


Type of evidence


Effectiveness


Procedural issues


Acceptability


Cost-effectiveness


 

Aptitude tests



Summary


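Evidence on the usefulness of aptitude tests in student selection is mixed and depends heavily on which test was studied, so it is difficult to draw general conclusions. For example, some studies support the predictive validity of aptitude tests, while others report that certain tests lack it. The evidence on fairness is similarly mixed: some studies find that certain groups score higher, and others do not. Evidence on equity across applicant groups (sex, age, language status and socio-economic status) varies. Other evidence suggests that aptitude tests are equitable regardless of candidate background, are little affected by coaching, and remain stable over time, with the UMAT as a possible exception. Each aptitude test therefore needs to be evaluated in its own right.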

Mixed evidence exists among researchers on the usefulness of aptitude tests in medical student selection and findings largely depend on the specific aptitude test studied; hence commenting on the generality of findings is problematic. For example, some studies support the predictive validity of aptitude tests, but other research suggests that some specific aptitude tests lack predictive validity. Mixed evidence also exists on the fairness of aptitude tests, with some research suggesting that certain groups score more highly on aptitude tests than other groups, whereas other research suggests that this is not the case. For example, there is varied evidence on the equity of aptitude tests for different groups of medical school applicants (e.g. according to sex, age, language status and socio-economic status).11,15,20,24,46–50 Other evidence suggests that aptitude tests are equitable with respect to candidate background, are affected relatively little by candidate coaching, and remain stable over time,20,24,44,50–52 with the possible exception of the UMAT.30 It is therefore important to evaluate each aptitude test in its own right in order to draw conclusions on the quality of the tool.





Academic records


Summary


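Researchers agree that academic records provide useful information for medical school selection. Findings generally show that prior academic attainment is predictive: the stronger the academic record, the more likely the applicant is to succeed in medical school. However, there is concern that its discriminatory power is diminishing as more applicants present top grades. Long-term follow-up evidence that applicants with higher grades become better doctors is also lacking. Furthermore, Milburn notes that over-reliance on A-level results in the UK may distort the social intake of universities, and that selecting solely on academic attainment may neglect important non-academic factors.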

There is a high level of consensus among researchers that academic records provide useful information to inform medical student selection. Research generally suggests that prior academic attainment has predictive power, meaning that those with stronger academic records are more likely to succeed in medical school. However, there is concern that the discriminatory power of prior academic attainment may be diminishing as increasing numbers of medical school applicants have top grades. There is also a lack of long-term follow-up data to provide evidence that medical school applicants with higher grades go on to become better physicians. Moreover, Milburn8 notes that over-reliance on A-level results in the UK may create a distorted social intake to universities, and recruiting medical students solely on the basis of academic attainment may neglect important non-academic factors required for success in medical school and beyond.


Personal statements


Effectiveness


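Evidence on predictive validity is mixed. Some evidence supports the predictive validity of personal statements for dropout rates, internal medicine performance and clinical aspects of training, but other studies report that personal statements are less reliable than other commonly used selection tools and do not predict success in medical school. Some authors suggest, however, that personal statements make applicants aware of the characteristics of the medical degree they are applying to and help them make a more informed decision.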

Evidence on the predictive validity of personal statements is varied. Although some evidence has been found for the predictive validity of personal statements for medical school dropout rates,65 performance on internal medicine14 and clinical aspects of training,66 several others have reported that personal statements have low reliability compared with other commonly used selection instruments70 and are not predictive of subsequent success at medical school.2,71–73 Some authors suggest, however, that personal statements may have some value for making applicants aware of the characteristics of the medical degree they are applying to, which may help them to make a more informed decision to apply.73


 

Procedural issues


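Procedural factors affect the reliability and validity of personal statements. Applicants may use the statement to present themselves in ways they believe will appeal to the admissions committee, which may not accurately reflect who they are, so the information captured is partial and subjective. Factors affecting effectiveness include how early the statement is submitted relative to the deadline, the marking method, and on-site versus off-site completion. Finally, one study noted that UK medical schools use personal statements in different ways: some use them formally in selection decisions, whereas others ignore them out of concern that they may unfairly bias selection.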

Evidence suggests that a number of procedural factors affect the reliability and validity of personal statements. Medical school candidates may use personal statements to present themselves in ways they believe are attractive to admission committees, which may not necessarily be accurate.74,75 Hence, the information captured by personal statements is likely to be both partial and subjective in nature. Factors that may affect the effectiveness of the selection method include the earliness of submission in relation to a deadline,76 marking method, and on-site versus off-site completion.77 Finally, one article highlighted the fact that personal statements are used differentially by different UK medical schools.78 Some medical schools use the information formally in making selection decisions, whereas others ignore this information out of concern that it may unfairly bias selection decisions.


Acceptability


Research has identified possible sources of data contamination in personal statements, including candidates' prior expectations, the length of time spent completing submissions, and input from third parties. Other research has commented on political validity and stakeholder satisfaction. Stevens et al. found that about 60% of students considered personal statements suitable for medical school admission, whereas Elam et al. reported that the contents of application forms are very unlikely to exert any significant influence on admissions committee decisions. White et al. argued that candidates present themselves as they believe candidates are expected to, rather than as genuine reflections of themselves. Similarly, Kumwenda et al. found that most applicants believed others stretched the truth, and a proportion believed statements were unlikely to be checked for accuracy.

Research has highlighted potential sources of data contamination in personal statements, including candidates’ prior expectations, the length of time spent completing submissions, and input to submissions from third parties. Other research14,74 has commented on the political validity and stakeholder satisfaction of personal statements in medical student selection. Whereas Stevens et al.45 found that approximately 60% of students thought that personal statements were suitable to use for admission to medical school, Elam et al.13 reported that the contents of medical school candidates’ application forms are very unlikely to exert any significant influence on decisions made by admissions committees. White et al.74 also argued that medical school candidates present themselves in ways that they believe are expected of candidates, rather than in ways that are genuine reflections of themselves. Likewise, Kumwenda et al.79 found that most medical school applicants believed that others stretched the truth in their personal statements, and a proportion of applicants believed it was unlikely that statements were checked for accuracy.


 

Summary



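Evidence for the effectiveness of personal statements is mixed at best: there is little support for their predictive validity, and many studies report that they lack reliability and validity. Personal statements remain widely used in medical school selection worldwide, even though their effectiveness is affected by numerous extraneous factors. Their content may also unfairly cloud the judgement of those making selection decisions.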

Evidence on the effectiveness of personal statements in medical student selection is mixed at best. Little evidence exists to support the predictive validity of personal statements, and a large volume of research evidence suggests that the selection method lacks reliability and validity. Personal statements remain widely used in medical school selection worldwide, despite concerns that the effectiveness of the selection method is influenced by numerous extraneous factors. The content of personal statements may also unfairly cloud the judgement of individuals making selection decisions.



References


Summary


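There is ample evidence that references are neither reliable nor valid, yet they remain a common tool in medical school selection. Including references is therefore unlikely to help, and the resources they consume would be better spent on other selection methods.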

There is a good level of consensus that references are neither a reliable nor a valid tool for selecting candidates for medical school. Despite these findings, references remain a common feature of medical school selection worldwide. To this extent, the inclusion of references in medical school admission processes may be unhelpful and may use valuable resources that could be directed more usefully to selection methods with evidentially based reliability and validity.




SJT

Situational judgement tests


Summary


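There is good evidence that SJTs, when well constructed, are reliable, valid, cost-effective and acceptable. SJTs are complex to develop, with many options for item format, instructions and scoring. When these options are calibrated appropriately, the evidence points to the strength of SJTs for assessing non-academic attributes in medical school selection.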

There is a good level of consensus among researchers that SJTs, when properly constructed, can form a reliable, valid, cost-effective and acceptable element of medical school selection systems. SJTs are complex to develop and there is a wide range of options available in relation to item formats, instructions and scoring. When these options are calibrated appropriately, research evidence points to the strength of SJTs in medical student selection for assessing non-academic attributes.




Personality and emotional intelligence


Summary


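Broadly, researchers agree that some personality domains are significantly associated, positively or negatively, with medical school performance. The relationship is often complex, however: conscientiousness, for example, is positively associated with knowledge-based assessment but negatively associated with some clinical assessments. This suggests that the criterion constructs need closer examination when personality-based selection tools are reviewed. Personality assessment is cost-effective and can be combined with other methods, such as interviews, in which responses can be probed further. Selectors should be aware that evidence for long-term predictive validity beyond medical school is lacking, and that personality assessment may narrow the diversity of students entering medicine. Research on the predictive validity of emotional intelligence (EI) is sparse and at a very early stage.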

Taken broadly, there is a relatively high level of consensus among researchers that some domains or traits of personality are significantly positively or negatively associated with aspects of performance in medical school. However, the associations between personality domains and medical school performance are often complex, as is demonstrated by evidence that conscientiousness may be positively associated with knowledge-based assessment, but negatively associated with some clinical aspects of medical school assessment. This suggests that closer attention to the criterion constructs should also be considered when reviewing personality-based selection tools. Personality assessment can be cost-effective and may be used in combination with an interview method in which applicant responses can be probed further. Recruiters should be aware that there is a relative dearth of evidence regarding the long-term predictive validity of personality assessment beyond medical school, and that there has been some concern that personality assessment may narrow the diversity of types of individuals entering medical education and training. Research on the predictive validity of EI assessment was sparse and at a very early stage of development.



Interviews and multiple mini-interviews


Type of evidence



Effectiveness


 

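Despite some evidence to the contrary, the balance of evidence indicates that the traditional interview lacks predictive validity and is not a robust method of selecting students. Edwards et al. found that poorer interview performance was associated with higher medical school grades. The mixed evidence on effectiveness may reflect the variety of interview methods, which range from relatively unstructured interviews to highly structured panel interviews. Eva and Macala found no difference in the reliability of interviewer ratings between unstructured and structured MMI stations, although behavioural indicator stations were more reliable than other station types.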

Despite some evidence to the contrary,14,16,33,123–130 the balance of evidence suggests that generally, the traditional interview is not a robust method of selecting medical students, and lacks predictive validity.4,9,28,80,131–137 Edwards et al.17 found that poorer interview performance was associated with higher medical school grade point average (GPA). The mixed findings on the effectiveness of interviews may reflect substantial differences in interview methods, which range from relatively unstructured individual interviews to highly structured panel interviews. However, Eva and Macala138 found no difference between the reliability of interviewer ratings in unstructured and structured multiple mini-interview (MMI) stations, although behavioural indicator stations differentiated between candidates more reliably than other station types.




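Research on MMIs is more consistent than research on traditional interviews; for example, their psychometric properties are usually reported to be adequate. Uijtdehaage and Parker found that MMI reliability improved when an easy station was replaced with a more challenging one and when relative, rather than absolute, ratings of candidates were used. However, Hissbach et al. found that rater bias had a greater effect on applicant scores than systematic differences in candidate performance. Although some attributes, such as communication skills, are commonly said to be assessed by MMIs, there is little clarity about what the different approaches actually measure. The construct validity of MMIs is still under investigation, although the relationship between MMIs and academic measures is small or absent regardless of design. Moreover, tightly standardised face-to-face interviews may not be comparable with scenario-based MMI stations that use standardised actors, and the dimensionality of MMI stations (whether an MMI can measure more than one construct per station) remains under debate.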

The findings from research on MMIs tend to be more directionally consistent than those from research on traditional interviews: for example, the psychometric properties of MMIs are usually reported to be adequate.44,139–146 Uijtdehaage and Parker146 found that the reliability of an MMI was improved by replacing an easy station with a more challenging one, and using relative, rather than absolute, ratings of candidate performance. However, Hissbach et al.147 found that rater bias had a greater effect on applicant scores than systematic differences in candidate performance. There is little clarity about what is being measured within the different approaches described, although some attributes, such as communication skills, are commonly purported to be assessed by MMIs. Construct validity evidence for MMIs remains exploratory and largely inconclusive, although irrespective of design differences, the relationships between MMIs and academic measures are small to absent.145 Moreover, tightly standardised face-to-face interviews may not be comparable with scenario-based MMI stations utilising standardised role actors, and the dimensionality of MMI stations (i.e. whether MMIs can measure more than one construct per station/interview question) has been debated in the literature.145



Procedural issues


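Schools differ in the length, panel composition, structure, content and scoring of their interviews. This variation may explain the mixed findings on reliability and validity. Other evidence suggests that candidate performance can be significantly affected by coaching. Although many authors report successfully implementing MMIs, using interviews brings logistical difficulties related to the range and type of questions, as well as interviewer subjectivity. Uijtdehaage and Parker summarised that ‘implementing an MMI was feasible but a daunting task’.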

Schools differ significantly in terms of the length, panel composition, structure, content and scoring methods for interviews. The differential usage of the interview method in medical student selection may underlie the mixed findings on both the reliability and validity of interviews reported above. Other research evidence suggests that candidate performance may be significantly affected by coaching.30 Using interviews in a selection process also presents logistical difficulties relating to the range and type of questions155 and interviewer subjectivity,51,143,156,157 although numerous authors report on the successful implementation of MMIs into their medical school admission processes.44,146 Uijtdehaage and Parker summarised that ‘implementing an MMI was feasible but a daunting task’.146




Acceptability


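Most studies report that applicants and interviewers view the interview process positively, and there is tentative evidence that MMIs and more structured interviews are preferred over less structured ones. Some evidence suggests that applicants prefer medical schools that conduct interviews. Campagna-Vaillancourt et al. found that most applicants and assessors considered the MMI appropriate for assessing a range of competencies, fair, and preferable to a traditional interview. Introducing an MMI in stages may increase its institutional acceptance. Standardised interviews can also be used in PGME selection and are acceptable to both international medical graduates (IMGs) and interviewers.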

Most research reports that applicants and interviewers tend to view the interviewing process positively,44,45,60,146 and there is tentative evidence that MMIs and more structured interviews are preferred over less structured methods.138,158 Some evidence suggests that aspiring medical students may prefer the schools that conduct interviews.159 Campagna-Vaillancourt et al.144 found that the majority of applicants and assessors perceived an MMI to be appropriate to assess a range of competencies and considered it to be a fair process, as well as being preferable to a traditional interview. The staged introduction of an MMI into a selection process may foster institutional acceptance of the method.160 Standardised interviews can also be adapted for use in postgraduate medical selection to measure characteristics that are considered important and acceptable to both international medical graduates and interviewers.139,141,161


Cost-effectiveness


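The cost-effectiveness of MMIs is generally reported to be good, although interviews cost more than machine-marked tests and MMIs cost more than traditional interviews because of station development and actor payments. Value for money may be improved by examining the number of stations and reducing it where reliability is not affected. However, some research suggests that increasing the number of stations or questions improves reliability more than increasing the number of interviewers. Roberts and colleagues estimated that reaching a Cronbach's alpha of 0.80 for high-stakes assessment requires a 14-station MMI with one interviewer per station; this can be reduced to between 7 and 12 stations with two interviewers per station. Dodson et al. found that shortening MMI stations from 8 to 5 minutes saves resources with minimal effect on applicant ranking and test reliability. Knorr and Hissbach concluded that no general recommendation on the minimum number of MMI stations can be made.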

The cost-effectiveness of MMIs is generally reported to be good,154 although comparatively interviews are significantly more costly than machine-marked tests, and MMIs are more expensive than traditional interviews because they incur increased costs for station development and actor payments.145,146 Value for money may be improved by examining the number of stations in an MMI, and reducing the number of stations if reliability is not affected. However, some research suggests that increasing the number of questions or stations in MMIs increases reliability more than increasing the number of interviewers.143,145,162 Indeed, Roberts and colleagues estimated that to reach a Cronbach’s coefficient alpha of 0.80 for high-stakes assessment, MMIs must include 14 stations if each is manned by a single interviewer. This number could be reduced to between seven and 12 stations if each station is manned by two interviewers.143 Alternatively, Dodson et al.163 found that reducing the duration of MMI stations from 8 to 5 minutes conserves resources with minimal effect on applicant ranking and test reliability. Knorr and Hissbach145 concluded in their systematic review that no general recommendation for the minimum number of MMI stations can be derived from the literature at present.
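
The link between station count and reliability can be sketched with the Spearman-Brown prophecy formula. This is an illustrative substitute, not the method used by Roberts and colleagues (the excerpt does not say how their estimate was derived). The sketch assumes that stations are interchangeable (parallel) and that the quoted alpha of 0.80 can be treated as the reliability of a 14-station total score; the function names are hypothetical.

```python
def spearman_brown(single_station_reliability, n_stations):
    """Projected reliability of a total score over n parallel stations."""
    r = single_station_reliability
    return n_stations * r / (1 + (n_stations - 1) * r)


def implied_single_station_reliability(total_reliability, n_stations):
    """Invert Spearman-Brown: per-station reliability implied by a total."""
    big_r = total_reliability
    return big_r / (n_stations - (n_stations - 1) * big_r)


# Assumption: alpha = 0.80 is reached with 14 single-interviewer stations.
r1 = implied_single_station_reliability(0.80, 14)

# Projected reliability if the same kind of station were used 7, 12 or 14 times.
projected = {k: spearman_brown(r1, k) for k in (7, 12, 14)}
```

The formula does not model a second interviewer per station, so it cannot reproduce the 7 to 12 station range quoted for two-interviewer stations; it only shows how reliability falls as single-interviewer stations are removed.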


Tiller et al. showed that an MMI can be conducted via Skype to reduce cost and time.

Tiller et al.164 found that cost and time savings for candidates were substantial when an MMI was conducted online via Skype rather than in person, although further research is required regarding the impact on fidelity of the lack of a face-to-face encounter.



Summary


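Interviews are among the most widely used selection tools. The evidence suggests that traditional interviews lack the reliability and validity needed for high-stakes decisions and that MMIs offer improved reliability and validity. More theory-driven research is needed on the predictive and construct validity of MMIs, particularly on which constructs can be measured accurately. More evidence, informed by validation studies, is needed on the appropriateness of the criteria assessed in interviews. Cost-effectiveness should be evaluated, along with alternative approaches to scoring and alternative uses of scores (such as minimum threshold criteria). MMIs have spread rapidly in recent years as evidence of their reliability has accumulated, but issues of construct validity and dimensionality remain problematic. Schools need to understand better what they intend to measure and what they actually measure. The impact of the MMI on candidates (fairness, performance, coaching effects and so on) is an important practical concern for design decisions such as question rotation.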

Interviews are among the most widely used tools in selection for medical school admission. Evidence suggests that traditional interviews lack the reliability and validity that would be expected of a selection instrument in a high-stakes selection setting. Evidence also suggests that MMIs offer improved reliability and validity over traditional interview approaches. Further theory-driven research is warranted, however, in relation to the predictive and construct validity of the MMI method, particularly with respect to the constructs that can be assessed accurately (e.g. communication, critical thinking, empathy, etc.). More evidence is required regarding the appropriateness of criteria that can be assessed in interviews and should be informed by validation studies. In addition, the cost-efficiency and utility of MMIs should be evaluated, along with alternative approaches to scoring and alternative uses of scores (including any minimum threshold criteria). The use of MMIs has spread rapidly in recent years as they can be designed as a reliable selection method. However, issues surrounding the construct validity and dimensionality of MMIs remain problematic: it is critically important that schools better understand what they are seeking to measure, and actually are measuring, with this approach. The impact of the MMI on candidates (in terms of fairness, performance, coaching effects, etc.) is an outstanding practical concern that should influence design decisions such as question rotation.






Selection centres


Summary


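Overall, research on the utility of selection centres (SCs) is sparse. Evidence for the predictive validity of SCs is stronger in postgraduate selection, and more research is needed.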

Overall, research on the utility of SCs for medical student selection was relatively sparse. Evidence on the predictive validity of SCs for postgraduate selection is stronger, although further evidence is required to build a case for their predictive validity in medical school selection.






DISCUSSION


Summary of key findings


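There is an over-reliance on cross-sectional designs and a focus on reliability rather than validity, which risks results that are ‘reliably wrong’. Although some studies have addressed predictive validity, few have examined construct validity (what is being measured) or cost-effectiveness. Although the review covers 18 years of research, long-term follow-up studies are scarce, though their number has increased over the last 2 years.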

There is an over-reliance on cross-sectional study designs and a general focus on reliability estimates as indicators of quality rather than aspects of validity (a method may have high reliability but be ‘reliably wrong’25). Although some studies have addressed issues relating to predictive validity, very little research has explored construct validity issues (i.e. what is being measured) and the relative cost-effectiveness of selection methods. During the 18 years covered by this review, there have been remarkably few long-term evaluation studies; however, we note that over the last 2 years there has been an increase in the amount of longitudinal evidence emerging in this area.


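Few studies examine the selection system as a whole, including the relative contributions of the different methods and the effect of their weightings, when several methods are used in combination.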

There remain comparatively few studies examining selection system design overall and the relative contributions of the various selection methodologies (and the impacts of various weightings) when methods are used in combination (as is the norm in medical school selection172,173).


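There are nevertheless clear messages about reliability, validity and effectiveness. Academic attainment remains a common feature of most selection policies, and the evidence supporting its continued use is strong. The evidence indicates that structured interviews, MMIs, SJTs and SCs are more effective and fairer than traditional interviews, personal statements and references. Evidence on the effectiveness and fairness of aptitude tests is mixed and depends on the test. This is probably because there is currently no agreed framework for what ‘aptitude’ means; tests range from assessments of ‘pure’ cognitive ability (UKCAT) to academic tests (BMAT). This makes it difficult to assess the relative contributions of different aptitude tests systematically.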

There are, however, some clear messages about the comparative reliability, validity and effectiveness of various selection methods. The academic attainment of candidates remains a common feature of most selection policies and the strength of evidence in support of it continuing to do so remains strong. The extant evidence paints a relatively clear picture illustrating that structured interviews or MMIs, SJTs and SCs are more effective methods and generally fairer than traditional interviews, references and personal statements. Evidence is currently mixed regarding the effectiveness and fairness of aptitude tests, depending on the tool in question. This stems largely from the fact that there is no currently agreed framework that specifies what is meant by aptitude; at present tests range from assessments of ‘pure’ cognitive ability (e.g. the UKCAT) to academic tests (e.g. the BMAT). As such, it is difficult to systematically assess the relative contributions of different aptitude tests, and of aptitude tests within a wider selection system.


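Findings on the acceptability of selection methods are also mixed, owing to various political issues: differing stakeholder views, differences in the philosophies of medical students and medical schools, and the way a tool is implemented.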

The picture regarding the acceptability of various selection methods is also mixed, and may be influenced by a variety of political issues including differing stakeholder views, variations in the philosophies of both medical students and medical schools, and the ways in which the tool is implemented as part of a selection system.


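In judging the papers in this review, it was clear that some terms cover a broad spectrum of methods. The quality of an instrument can vary greatly with its design, so each design needs to be examined individually before conclusions about effectiveness are drawn.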

When judging the papers in this review, it was clear that some terms cover a broad spectrum of methods: MMIs, SJTs, aptitude tests, personality assessments and SCs are measurement methods that comprise a multitude of different design parameters. Depending on the design, this may significantly alter the quality of the instrument to the extent that each needs to be individually evaluated before conclusions about its effectiveness can be reached.


Implications for theory


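A persistent problem in selection research concerns the outcomes that selection tools are meant to predict. As an illustration of this criterion problem, the association between conscientiousness and performance is mixed depending on whether early medical school outcomes or later (clinical) outcomes are examined. In addition, the outcome measures used to evaluate selection tools focus on attainment and maximal performance (medical school achievement, licensure examination performance), which may differ from clinical practice and typical (day-to-day) performance.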

A persistent problem with selection research relates to the issue of which outcomes we are trying to predict by using various selection methods.59 For example, to illustrate this criterion problem, when exploring the association between conscientiousness and performance outcomes, we find mixed results when examining outcomes relating to early examination performance in medical school and performance within clinical practice in later years. Furthermore, our review also highlights that outcome measures used to evaluate selection methods most often focus on indicators of attainment and maximal performance (e.g. medical school achievements, performance in licensure examinations) rather than indicators relating to clinical practice and typical (day-to-day) in-role job performance.


A clear framework of outcome criteria is needed for judging the accuracy of selection methods.

In judging the evidence for the relative accuracy of selection methods, it becomes apparent that a clear framework of outcome criteria with which to interpret the research evidence and compare selection methods, both individually, and within a selection system, has yet to be established;



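Research has also focused mainly on predictive validity, with less attention to what each instrument measures (construct validity), which raises the question of how a method can add value to a selection system if the constructs it measures are unknown. This is especially true of the MMI: despite its recent popularity, a lack of consistency in the attributes MMIs are used to assess means that evidence on construct validity remains inconclusive.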

In addition, evidence regarding the effectiveness of some methods has focused predominantly on the predictive validity of the tool, rather than on assessing precisely what different methods are measuring (i.e. construct validity); this raises the question of how a method can be considered to add value to a selection system if the constructs it is measuring are unknown. This is particularly the case for MMI research, in which, despite the method’s increasing popularity in recent years, there is a lack of consistency regarding the attributes selectors are using MMIs to assess for and, relatedly, evidence regarding construct validity remains inconclusive.



Which indicators of competence matter depends on the point in a medical career at which applicants are judged. Selection criteria therefore vary with the specific role and combine academic and non-academic indicators. A factor that is an important predictor for undergraduate medical education (UME) may actually work against performance in clinical practice, so different selection methods may predict differently at different stages. For example, an SJT may predict early medical school performance poorly (because that stage is mainly academically focused) but predict performance in clinical practice better. The challenge in designing selection systems in medicine is to integrate the research evidence on what is reliable and valid, for both academic and non-academic qualities, from undergraduate selection through to selection for specialty training many years later.

It is clear that indicators of competence for entrance to medical training and practice are likely to be different at different points in a medical career; thus, applicants are judged on multiple selection criteria depending on the specific role, which may include varying combinations of academic and non-academic indicators of aptitude. A factor may be identified as an important predictor for undergraduate training, but may actually hinder some aspects of performance in clinical practice [59,66]. As such, different selection methods may predict differently at different stages: for example, an SJT may be less predictive of performance in the early years at medical school (which tends to be more academically-focused), but significantly more predictive of performance outcomes when trainees enter clinical practice [28,174]. A major challenge within medicine is to integrate the research evidence to inform the design of selection systems that are reliable and valid (and weighted appropriately) from undergraduate selection through to selection for specialty training after many years of education, for both academic and non-academic qualities.


More theory-driven research is therefore needed to identify what a ‘competent’ physician is. A unified taxonomy of performance indicators should be built and used as markers in short- and long-term predictive validity studies. For example, some researchers argue that medical students should be selected in on academic attainment and selected out on non-academic skills. It has also been argued that non-academic attributes should play a larger role in postgraduate (PGME) selection, with weightings that differ by specialty: empathy and communication matter more in general practice and paediatrics, whereas vigilance and situational awareness matter more in anaesthesia.

Hence, there is a need for more theoretically driven, future-oriented research aimed at identifying what a ‘competent’ physician is at the various stages of training and practice. This will allow researchers and practitioners to move towards crafting a unified taxonomy of performance indicators which may be used as markers in short- and long-term predictive validity studies of selection methods. For example, some researchers suggest that from undergraduate selection onwards, medical students should be selected in on the basis of academic attainment and selected out on the basis of non-academic skills and attributes [175]. It could be argued that non-academic attributes and skills should therefore play a much larger role in postgraduate selection and the weighting of these may differ depending on the specialty. For example, research from job analysis studies shows that empathy and communication are weighted more heavily for selection into general practice [176] and paediatrics, whereas vigilance and situational awareness carry more weight in anaesthesia [177].



Implications for practice


SJTs and MMIs are more valid predictors of inter- and intrapersonal (non-academic) attributes than references or personal statements. The two may be complementary: SJTs assess a broader range of constructs efficiently, whereas MMIs involve a face-to-face encounter. Although costly, structured interviews allow applicant responses to be probed further and in more depth.

Our review shows that SJTs and MMIs are more valid predictors of inter- and intrapersonal (non-academic) attributes than personal statements or references. Situational judgement tests (SJTs) and MMIs may be complementary: whereas SJTs can measure a broader range of constructs efficiently as they can be machine-marked, MMIs, by contrast, involve a face-to-face encounter. Although expensive, structured interviews (including MMIs) allow applicant responses to be probed further and in more depth.


The picture for aptitude tests and cognitive factors remains unclear, for several reasons.

At present, the picture for aptitude tests and cognitive factors is less clear as a result of
  • the large number of aptitude tests and the differences between those that are currently available,
  • the diverse outcome measures against which performance on aptitude tests is compared (to assess validity, see the ‘criterion problem’ discussed above),
  • the multiple ways in which aptitude tests are implemented, and
  • the mixed nature of the evidence on the effectiveness of aptitude testing.

 

There is also evidence that some aptitude tests favour certain types of candidate.

There is also some evidence that some aptitude tests may favour certain types of candidate [46], which may have unfavourable implications for fairness and widening access to medicine.


Interpreting and applying the evidence on selection methods faces several challenges.

The challenges of interpreting and applying evidence of selection methods include

  • the relative lack of longitudinal data,
  • lack of an agreed-upon framework of outcome criteria, and
  • institutional differences (including in available resources, curricula and philosophies of what a high-performing medical student is considered to be).

Kreiter and Axelson point out that the complexity of admissions goals is itself an obstacle: social justice, educational equality, health care and political outcomes are often competing goals. When judging the quality and effectiveness of selection methods, it should be recognised that some criteria compete with one another. For example, a method may be highly acceptable to stakeholders yet have poor validity evidence. Similarly, the validity evidence for SCs is strong, but they are costly and therefore hard to use. Medical schools should thus take into account the context in which their selection system operates when judging the quality and effectiveness of selection tools.

Kreiter and Axelson [2] acknowledge that the complexity of admissions goals may also be an obstacle to evidence-based progress in medical school admissions because concerns regarding social justice, educational equality, health care and political outcomes are broad and frequently competing. When judging the quality and effectiveness of selection methods, it is noteworthy that some criteria may compete with one another. For example, the stakeholder acceptability of referees’ reports in selection is generally high, but the evidence for their validity is poor. Similarly, regarding other criteria, the evidence for the validity of SCs is high, but they are relatively costly to implement. In this respect, when judging the quality and effectiveness of different selection methods, medical schools and employers may choose to weight different features depending on the context within which the selection system is operating.


Susceptibility to coaching is a common concern for every selection tool.

A common central concern for any selection tool is susceptibility to coaching. Research over the last 10 years has increasingly focused on this issue, probably because there has been increasing emphasis on how to validly assess non-academic attributes in selection for medical education.

  • Personal statements: susceptible to coaching, and sold pre-written by companies internationally. In particular, personal statements are at significant risk of being influenced by coaching, or indeed of being written by somebody other than the applicant; a brief online search reveals a large number of companies internationally that sell pre-written personal statements.
  • SJTs: no effect of coaching found. With regard to SJTs, recent studies have found no effects of commercial coaching on SJT scores or the predictive validity of SJTs [87,178]. However, ongoing research is required to assess the coachability of the full range of non-academic selection tools in greater depth.

 

Scoping a future research agenda



Firm conclusions are difficult to draw.

It is clear from our review that it is challenging to draw firm conclusions regarding the relative strength of the different tools given the variety in the quality and design of the currently available research evidence: at present there are insufficient data, and medical education providers’ agendas are too diverse, to propose a fully comprehensive framework for international best practice in medical selection methods.


Well-designed studies are needed.

There is a clear need for well-planned studies focusing on the long-term follow-up of medical students, tracking students from admission through to assessments in more senior training posts in clinical practice, at the point of licensure and beyond.


Research on widening access and diversity is needed.

Within the broader sphere of issues of fairness in selection, more research exploring issues of widening access and diversity is required, whether it refers to race, ethnicity or social class, as this remains a challenge within medical school admissions globally, and it is becoming increasingly important politically to reflect society within the health care professions [179,180].


O’Neill et al. found that selection method had no significant effect on social diversity, and suggested that diversifying the applicant pool matters more. For now, conclusions remain tentative.

O’Neill et al. [181] found no significant effect of selection method on social diversity in the medical student population, and suggest that the attraction of a sufficiently diverse applicant pool is more important for widening access than which selection tool is used. Therefore, only tentative conclusions can be drawn.



Prior educational attainment has been called the ‘academic backbone’ of medical education because of its high predictive validity, but research is needed on how ‘contextual data’ can be used alongside it.

Whereas traditional markers of prior educational attainment have been called the ‘academic backbone’ of medical education because they are highly predictive of subsequent performance both at medical school and beyond, there is a need to explore how ‘contextual data’ can be used to allow the social and educational backgrounds of applicants to be taken into consideration alongside their educational achievements.


The term ‘non-cognitive’ is problematic because it implies ‘not thinking’.

A key criticism of selection research is that there is a distinct lack of theory-driven studies that examine issues related to validity and the constructs being measured and that, more broadly, acknowledge contemporary models of adult intellectual development and skill acquisition, or attempt to integrate cognitive and non-cognitive factors [172,173]. The term ‘non-cognitive’ is in itself problematic as it arguably implies ‘not thinking’;




The review proposes the following priorities.

In summary, we propose the following priorities for a future research agenda over the next 50 years in order to enable schools and employers to make evidence-based decisions about which selection tools to use and why:


1 longitudinal research exploring predictive validity and following students throughout the course of their careers within education, training and practice;


2 research enabling greater understanding of how selection tools may impact on widening access and diversity agendas, and


3 theory-driven studies of the construct validity of both academically and non-academically oriented selection methods and selection systems that will help us to understand what we are assessing for in both the short and long terms.




Finally, we propose that the following five considerations will be integral in shaping the direction of medical education research over the next 50 years:



 

1. Medical school admissions will remain highly competitive. The prestige of being a physician is likely to continue to drive a high applicant-to-selection ratio in medical school selection internationally over the next 50 years. However, this is unlikely to be true in all postgraduate specialties; some medical career pathways may be perceived to be of higher status and will therefore be more competitive than others. Medical selection may become part of a process to facilitate recruitment into areas of most need. This may, in turn, require varying emphasis on selection for specific attributes and competencies: one size is unlikely to fit all.



2. There will be an increased focus on, and value of, non-academic attributes and skills in medical selection, aligned with what wider society wishes from its physicians. The role of the physician’s own well-being and resilience, and how these can best be selected for, then supported and developed, will be of increasing importance. Trainees’ expectations of their work–life balance will also be integral to medical selection over the next 50 years. Consideration must be given during selection to the discourse around how we encourage new generations of medical students to expend discretionary effort in future. This is strongly related to:




3. a growing focus on capability to lead multidisciplinary teams, and building a culture of ‘everyday’ innovation in an environment of reduced resources.



4. Rather than a focus on just one or two people in a team, who are touted as the ‘innovators’, there is likely to be an increased onus on all health care professionals to innovate and provide leadership in order to engage multiprofessional teams and to continue to deliver high-quality and compassionate care in a climate of ongoing health care spending cuts [185,186]. This may represent a significant change in how applicants to medical education are selected. This, in turn, relates to:


5. a focus on attracting a wider selection pool and recruiting a more diverse workforce, reflecting a philosophical shift towards acknowledging that non-traditional students may be able to align themselves with patients from diverse backgrounds and also contribute to the education of their peers by acting to challenge the current medical culture [187,188]. Bringing such ‘non-traditional’ applicants into the health care system may promote, and indeed necessitate, innovative working practices. However, as we have discussed elsewhere [180], there is currently a multitude of unanswered questions on how this may be best implemented and how outcomes can be measured in a reliable and valid way.













Med Educ. 2016 Jan;50(1):36-60. doi: 10.1111/medu.12817.

How effective are selection methods in medical education? A systematic review.

Author information

  • ¹Department of Organisational Psychology, City University, London, UK.
  • ²Work Psychology Group, Derby, UK.
  • ³School of Medicine, University of Dundee, Dundee, UK.
  • ⁴Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK.
  • ⁵School of Medicine and Dentistry, University of Aberdeen, Aberdeen, UK.

Abstract

CONTEXT:

Selection methods used by medical schools should reliably identify whether candidates are likely to be successful in medical training and ultimately become competent clinicians. However, there is little consensus regarding methods that reliably evaluate non-academic attributes, and longitudinal studies examining predictors of success after qualification are insufficient. This systematic review synthesises the extant research evidence on the relative strengths of various selection methods. We offer a research agenda and identify key considerations to inform policy and practice in the next 50 years.

METHODS:

A formalised literature search was conducted for studies published between 1997 and 2015. A total of 194 articles met the inclusion criteria and were appraised in relation to: (i) selection method used; (ii) research question(s) addressed, and (iii) type of study design.

RESULTS:

Eight selection methods were identified: (i) aptitude tests; (ii) academic records; (iii) personal statements; (iv) references; (v) situational judgement tests (SJTs); (vi) personality and emotional intelligence assessments; (vii) interviews and multiple mini-interviews (MMIs), and (viii) selection centres (SCs). The evidence relating to each method was reviewed against four evaluation criteria: effectiveness (reliability and validity); procedural issues; acceptability, and cost-effectiveness.

CONCLUSIONS:

Evidence shows clearly that academic records, MMIs, aptitude tests, SJTs and SCs are more effective selection methods and are generally fairer than traditional interviews, references and personal statements. However, achievement in different selection methods may differentially predict performance at the various stages of medical education and clinical practice. Research into selection has been over-reliant on cross-sectional study designs and has tended to focus on reliability estimates rather than validity as an indicator of quality. A comprehensive framework of outcome criteria should be developed to allow researchers to interpret empirical evidence and compare selection methods fairly. This review highlights gaps in evidence for the combination of selection tools that is most effective and the weighting to be given to each tool.

© 2015 John Wiley & Sons Ltd.

PMID: 26695465


The elusive goal of social inclusion in medical selection (Med Educ, 2013)

The elusive grail of social inclusion in medical selection

Nancy Sturman & Malcolm Parker






In this issue, O’Neill et al. report that the social composition of Danish medical students did not differ whether they were selected on school-leaving grades or through an ‘attribute-based’ track. The latter was intended for students whose academic attainment may have been limited by socio-economic factors, giving them a chance of entry on the basis of ‘other valuable qualifications and attributes’.

In this issue, a study by O’Neill and colleagues reports that the social composition of Danish medical students was similar whether they were selected according to school-leaving grades or on an ‘attribute-based’ track [1]. The latter was designed to afford students whose academic grades may have been limited by socio-economic disadvantage, a chance of entry on the basis of ‘other valuable qualifications and attributes’ [1].


The 2010 Ottawa Conference Consensus Statement on selection argued that wider social and cultural inclusion, reflecting the patient populations to be served, has a ‘political’ validity because under-representation is tantamount to discrimination. This equity argument is accompanied by related claims: that therapeutic relationships and patient outcomes improve when patients and doctors are better ‘matched’, and that differences in social class between doctors and patients underlie communication difficulties and the delivery of inferior treatment to patients of lower socio-economic status (SES).

The 2010 Ottawa Conference Consensus Statement on selection for the health care professions argued that wider social and cultural inclusion to reflect the patient populations to be served has a ‘political’ validity in that under-representation is tantamount to discrimination [4]. This argument for equity is accompanied by other equity-related concerns, including the proposition that therapeutic relationships and patient outcomes are strengthened by better ‘matching’ of patients and doctors [5]. Furthermore, there is at least some evidence that differences in social class between doctors and patients underlie difficulties in communication and the delivery of inferior treatment to patients of lower socio-economic status [6].


Clearly, improving care and outcomes for disadvantaged patients falls within the social accountability agenda of medical schools. In increasingly pluralist societies, however, there are many under-represented or disadvantaged social groups, so a ‘social inclusion’ agenda may become arbitrary, unwieldy, or both. Nor should it be assumed that doctors from disadvantaged backgrounds will be more likely to treat disadvantaged patients; most of them move into a higher socio-economic bracket after qualifying. An alternative to recruiting more medical students from lower SES groups is to train all medical students in cultural and social competence, including effective communication with different social groups. Because almost all doctors will treat patients from backgrounds different from their own, this training should be fundamental to every medical school programme. It is also worth noting that medicine is not the only career through which to promote health and provide effective care in disadvantaged communities.

Clearly, improving care and outcomes for disadvantaged patients falls within the social accountability agenda of medical schools. However, in increasingly pluralist societies, there are many under-represented and disadvantaged social groups which might reasonably lay claim to inclusion in medical student quotas. A ‘social inclusion agenda’ might become arbitrary or unwieldy, and perhaps both. It should also not be assumed that doctors from disadvantaged backgrounds will be more likely to work with disadvantaged patients. Most of these doctors will themselves shift to a higher socio-economic bracket after medical qualification [6]. An alternative to recruiting more medical students from lower socio-economic strata is to train all medical students in cultural and social competence, including effective communication with different social groups [7]. As almost all doctors will work with patients from backgrounds which differ from their own at some stage during their careers, this training should be fundamental to all medical school programmes. It should also be noted that medicine is not the only career open to talented young people with a commitment to promoting health and providing health care effectively in disadvantaged communities.


A successful social inclusion agenda is also costly for medical education. Medical courses are demanding, and students with lower academic qualifications are more likely to struggle academically and sometimes socially. Upstream strategies that help these students become more competitive at selection, together with academic support programmes during medical school, appear to help. However, pre-medical ‘pipeline’ and special preparation programmes, and the curricular effort needed to retain and support these students, carry considerable costs.

A successful social inclusion agenda for medical education is also costly. Medical courses are challenging, and students with lower academic qualifications are more likely to struggle academically and sometimes also socially [8]. Deliberate upstream strategies to support students from identified groups to become more competitive at selection [9], as well as targeted academic support programmes during medical school training [10], appear to be successful. However, the costs of pre-medical ‘pipeline’ and special preparation programmes within these comprehensive strategies of recruitment, retention and support in the curriculum are considerable [5].


Are there other arguments for social inclusion? One is that greater diversity in student cohorts produces a socially and intellectually richer educational environment in which traditional, potentially harmful paradigms of medical culture are more likely to be challenged. This argument is intuitively compelling and hard to refute. However, there may be more cost-effective ways to address the harmful institutional structures, hierarchical relationships and ethical lapses that pervade the hidden curriculum of medical education.

Are there other arguments for the social inclusion agenda? There is also the suggestion that greater diversity in student cohorts is likely to produce a socially and intellectually richer educational environment in which traditional, potentially harmful paradigms of medical culture are more likely to be challenged. This argument is also intuitively compelling and probably irrefutable. However, there may be other, more cost-effective strategies for addressing the harmful institutional structures, hierarchical relationships and ethical lapses that have been identified as comprising a pervasive hidden curriculum in medical education.



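Applying to medical school is a high-stakes decision for applicants and their families, and any selection process will both attract gaming from applicants and reject some who have the potential to become excellent doctors. Medical student selection and training are already expensive and resource-intensive. The pursuit of plausible but elusive social accountability goals should therefore be reflective, finite and practical.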

The stakes are high for medical applicants and their families, and any selection process will both attract its share of gaming from applicants and deny admission to some with the potential to become excellent doctors. Medical student selection and training are already expensive and resource-intensive. Quests for plausible but elusive social accountability goals should therefore be reflective, finite and practical.




Cohort and other studies may show that the goal of social inclusion in medical selection has benefits; or they may show that, despite its apparent political validity, it is impractical, misplaced and unaffordable.

Cohort and other studies may demonstrate benefits, or they may show that, despite its apparent political validity, the quest for social inclusion in medical selection is impractical, misplaced and unaffordable.





Med Educ. 2013 Jun;47(6):542-4. doi: 10.1111/medu.12211.

The elusive grail of social inclusion in medical selection.

Author information

  • ¹School of Medicine, University of Queensland, 8th Floor, Health Sciences Building, Royal Brisbane Hospital, Herston, Brisbane, Queensland 4068, Australia. n.sturman1@uq.edu.au


Medical school admissions: enhancing the reliability and validity of the autobiographical submission (Acad Med, 2006)

Medical School Admissions: Enhancing the Reliability and Validity of an Autobiographical Screening Tool

Kelly L. Dore, Mark Hanson, Harold I. Reiter, Melanie Blanchard, Karen Deeth, and Kevin W. Eva





Like many other schools, the Michael G. DeGroote School of Medicine at McMaster University invites applicants to interview on the basis of uGPA and a candidate-written autobiographical submission (ABS). Local research shows strong reliability and validity for uGPA, but weak reliability for the ABS.

Like many schools, the Michael G. DeGroote School of Medicine at McMaster University invites candidates to interview based on grade point average (uGPA) and a candidate-written autobiographical submission (ABS). Local research has demonstrated strong reliability and validity for uGPA [2,3], but the reliability of the ABS has been weak [2].


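The ABS consists of five questions covering applicants’ personal experiences, suitability for McMaster and suitability for a career in medicine. Each applicant’s ABS is stripped of personal identifiers and scored by three independent raters: one health science faculty member, one community member and one medical student.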

The ABS is composed of five questions designed to evaluate noncognitive characteristics such as applicants’ personal experiences, suitability for McMaster and suitability for a career in medicine. Each applicant’s five ABS questions, stripped of any personal identifiers, are scored by three independent raters: one health science faculty member, one community member, and one medical student.


Each rater scores 30 to 60 ABS submissions, and more than 150 raters take part each year.

Each rater scores 30–60 ABS submissions and upwards of 150 raters participate annually.



Non-independence of the ratings


ABS scores correlate poorly with performance in medical school and on the national licensing examination (NLE). One reason is that the interrater reliability of ABS scoring is low (0.45). The ABS does, however, show high internal consistency. High internal consistency is usually taken as evidence of a reliable scale, but here it may instead indicate that the scores given to the individual questions are not independent measures of the applicant; that is, a halo effect may be at work. If a candidate’s answer to the first question colours the rater’s view of the answers to the later questions, the rater’s first impression, rather than the sum of separately judged answers, determines the candidate’s score. This matters because, functionally, only three observations (one per rater) are being collected rather than fifteen (raters × questions).

Scores on the ABS have been shown to correlate poorly with performance both within medical school and on the national licensing examinations written postgraduation [2]. One reason identified by Kulatunga-Moruzi and Norman is that the interrater reliability of ABS scoring is less than adequate (0.45). The ABS has, however, been seen to have high internal consistency (0.88). Although high internal consistency may be seen as supportive of the reliability of a measure, it may in fact be a negative indication that the scores assigned to the individual questions do not provide independent measures of the applicant. That is, the halo effect may be afflicting this measure; if performance on the first question influences the raters’ perceptions of performance on subsequent questions, then the initial overall impression of the candidate will determine the scores assigned to individual questions rather than the individual questions summing to provide a global assessment. This is an important distinction, because it would indicate that, functionally, only three observations (from three raters) are being collected in the current system instead of the desired fifteen.
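
The difference between internal consistency and truly independent observations can be made concrete with a small sketch. The Python below computes Cronbach's alpha, the usual internal-consistency coefficient (the source does not say which coefficient produced the 0.88, so that choice is an assumption), and uses the Spearman-Brown formula to show how the reliability of an averaged score depends on the number of independent observations. Both rating matrices and the single-observation reliability of 0.2 are invented for illustration; they are not data from the study.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a candidates x questions matrix (list of rows)."""
    k = len(scores[0])  # number of questions
    item_variances = [pvariance(col) for col in zip(*scores)]
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def spearman_brown(single_reliability, n_observations):
    """Reliability of the mean of n parallel, independent observations."""
    r = single_reliability
    return n_observations * r / (1 + (n_observations - 1) * r)


# Hypothetical ratings by ONE rater: 5 candidates x 5 questions.
# Halo-like pattern: each candidate gets almost the same score on every question.
halo_like = [
    [6, 6, 6, 5, 6],
    [3, 3, 4, 3, 3],
    [5, 5, 5, 5, 4],
    [2, 2, 2, 3, 2],
    [4, 4, 4, 4, 4],
]
# Hypothetical ratings in which each question is judged on its own merits.
varied = [
    [6, 4, 5, 3, 5],
    [3, 4, 2, 5, 3],
    [5, 3, 5, 4, 4],
    [2, 4, 3, 2, 5],
    [4, 5, 3, 4, 2],
]

alpha_halo = cronbach_alpha(halo_like)
alpha_varied = cronbach_alpha(varied)

# Assumed reliability of a single independent observation (illustrative only).
r_single = 0.2
rel_3 = spearman_brown(r_single, 3)    # three raters, questions not independent
rel_15 = spearman_brown(r_single, 15)  # three raters x five independent questions

print(f"alpha, halo-like ratings: {alpha_halo:.2f}")
print(f"alpha, varied ratings:    {alpha_varied:.2f}")
print(f"reliability of mean of 3 observations:  {rel_3:.2f}")
print(f"reliability of mean of 15 observations: {rel_15:.2f}")
```

Alpha alone cannot tell a halo effect from candidates who really are uniformly strong or weak; it only shows that the question scores move together. That is why the authors test the idea by changing how the ratings are collected rather than by inspecting the coefficient.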


To test whether this was a problem, the direction in which ratings were collected was changed.

To test whether or not this was an issue, we altered the direction in which ratings were collected.


Non-independence of the ratees


Undoubtedly, a small number of applicants have a ghostwriter produce their ABS. More commonly, applicants show their ABS to friends, family, current students or physicians and ask for feedback.

Undoubtedly a small percentage of candidates are less than scrupulous and hire ghostwriters in an attempt to generate a more appealing ABS. More commonly, however, candidates will pass their submissions around to friends, family, current students, or practicing physicians for feedback to improve the submission.



This raises two measurement problems. First, given that there is an upper limit on how good an ABS can look, and assuming that feedback improves it, submissions become more homogeneous than the candidates themselves, which lowers the maximum achievable reliability and validity. Second, even without this restriction of range, validity is in doubt because it is unclear whether the tool discriminates between candidates or between candidates’ support systems.

This creates a pair of measurement problems. First, given that there is an upper limit on how good an ABS can appear, and assuming that the collection of feedback results in improvement, the submissions may end up being more homogeneous than the candidates, thus lowering the maximum achievable reliability and validity. Second, even without restriction of range, the validity itself must be questioned as it becomes questionable whether one is discriminating between candidates or between candidate support systems.
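
The first problem, loss of spread, can be illustrated with the classical formula for direct range restriction. This is only a sketch under assumptions the source does not state: homogenisation through feedback is treated as if it were direct restriction on the ABS score, the relationship with the outcome is assumed linear and homoscedastic, and the function name and all numbers are hypothetical.

```python
import math


def restricted_correlation(full_r, sd_ratio):
    """Expected correlation after direct range restriction on the predictor.

    full_r   -- correlation in the unrestricted group
    sd_ratio -- restricted SD / unrestricted SD of the predictor (0 < ratio <= 1)
    """
    return full_r * sd_ratio / math.sqrt(1 - full_r**2 + (full_r * sd_ratio) ** 2)


# Hypothetical: a validity of 0.5 in the full group, with the spread of
# submission scores shrinking as feedback makes submissions more alike.
for ratio in (1.0, 0.75, 0.5):
    observed = restricted_correlation(0.5, ratio)
    print(f"SD ratio {ratio:.2f}: observed r = {observed:.2f}")
```

Under these assumptions the observed correlation falls as the spread of scores narrows, even though nothing about the candidates themselves has changed.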


Method



The onsite ABS questions were comparable with, but not identical to, those answered in advance, and focused on ethical decision making, advocacy and personal experiences.

The onsite ABS questions were comparable with, but not identical to, the noninvigilated questions participants answered offsite, with questions focusing on ethical decision making, advocacy, and personal experiences.



The ABS submissions of 30 randomly selected candidates were scored with both methods.

For a subset of 30 randomly selected candidates, two scoring methods were compared for each ABS.



Results


 

Scores for the ABS submitted in advance were higher than for the ABS written onsite. A significant interaction showed that this main effect was driven by the traditional (offsite, vertical) scoring condition.

The scores for the ABS completed offsite (mean = 4.4) were significantly higher than those completed onsite (mean = 4.1; F = 5.7, p < .05). A significant interaction between site and scoring method (p < .01) revealed that this main effect was driven by a higher mean score in the traditional (offsite, vertical) scoring method (mean = 4.7) relative to the other three groups (mean = 4.0 to 4.2).


Interrater reliability was high for the onsite ABS. For the offsite ABS, it was moderate with horizontal scoring but poor with vertical scoring.

When the interrater reliability was assessed, it was found to be high for ABS’s completed onsite (0.81 with vertical scoring, 0.78 with horizontal scoring). However, the offsite ABS interrater reliability was moderate when horizontal scoring was used (0.69), but poor when vertical scoring was used (0.03).


Most importantly, ABS scores correlated more strongly with the MMI under horizontal scoring than under vertical scoring.

Perhaps more importantly, the ABS scores correlated better with the MMI when the horizontal scoring method was used (r = 0.44 offsite and 0.65 onsite) relative to when the vertical scoring method was used (r = 0.12 offsite and 0.28 onsite).


Discussion


The higher internal consistency under vertical scoring supports the concern about a halo effect.

The higher internal consistency achieved using the vertical scoring method provides evidence for our concern that the halo effect may have been biasing ABS assessments.


Considered in isolation, the ABS performed best when written onsite under invigilated, time-limited conditions and scored horizontally; invigilation preserves the independence of the ratees. But the ABS does not operate in isolation. Compared with the onsite ABS, the MMI is preferable for several reasons. First, its overall generalizability is at least as strong as that of the onsite ABS. Second, in terms of predictive validity, the MMI shows significant positive correlations with in-school measures. Third, scoring the onsite ABS takes rater time after the interview date and delays decisions, whereas MMI scores are available the same day.

Seen in a vacuum, the method of ABS administration that performed best is clearly application of the horizontal scoring method to submissions collected in onsite, invigilated, time-controlled circumstances. Invigilation ensures independence of the ratees. However, the ABS does not function in a vacuum. Given the choice between MMI and onsite ABS, the MMI is preferred for a number of reasons. First, in terms of overall test generalizability, the MMI is at least as strong as the onsite ABS [4]. Second, with respect to predictive validity, the MMI has demonstrated significant positive correlation with in-school measures [5] and national licensing examination scores [6]. Third, scoring of onsite ABS’s requires rater time subsequent to the date of interview, thus delaying decision making, whereas MMI scores are available immediately on that date.



 







Acad Med. 2006 Oct;81(10 Suppl):S70-3.

Medical school admissions: enhancing the reliability and validity of an autobiographical screening tool.

Author information

  • ¹Program for Educational Research and Development, McMaster University, MDCL 3510, 1200 Main Street West, Hamilton, Ontario, L8N 3Z5, Canada. kelly.dore@learnlink.mcmaster.ca

Abstract

BACKGROUND:

Most medical school applicants are screened out preinterview. Some cognitive scores available preinterview and some noncognitive scores available at interview demonstrate reasonable reliability and predictive validity. A reliable preinterview noncognitive measure would relax dependence upon screening based entirely on cognitive tendencies.

METHOD:

In 2005, applicants interviewing at McMaster University's Michael G. DeGroote School of Medicine completed an offsite, noninvigilated, Autobiographical Submission (ABS) preinterview and another onsite, invigilated, ABS at interview. Traditional and new ABS scoring methods were compared, with raters either evaluating all ABS questions for each candidate in turn (vertical scoring, the traditional method) or evaluating all candidates for each question in turn (horizontal scoring, the new method).

RESULTS:

The new scoring method revealed lower internal consistency and higher interrater reliability relative to the traditional method. More importantly, the new scoring method correlated better with the Multiple Mini-Interview (MMI) relative to the traditional method.

CONCLUSIONS:

The new ABS scoring method revealed greater interrater reliability and predictive capacity, thus increasing its potential as a screen for noncognitive characteristics.

PMID: 17001140 [PubMed - indexed for MEDLINE]


The MMI in student selection for health professions education and training: a systematic review (Med Teach, 2013)

The Multiple Mini-Interview (MMI) for student selection in health professions training – A systematic review

ALLAN PAU1, KAMALAN JEEVARATNAM2, YU SUI CHEN1, ABDOUL AZIZ FALL1, CHARMAINE KHOO1 &

VISHNA DEVI NADARAJAH1

1International Medical University, Malaysia, 2Royal College of Surgeons in Ireland, Perdana University, Malaysia







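Selecting students for health professions training programmes is a high-stakes decision. Panel or board interviews are commonly used, but the evidence suggests they have only limited ability to predict academic or clinical performance.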

Admissions to health professions training programmes are high stake decisions. The panel or board interview is commonly used to aid this decision (Edwards et al. 1990), although the evidence suggests its limited ability to predict academic or clinical performance in health care disciplines (Goho & Blackman 2006).


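For example, Dixon et al. reviewed the panel interview and found that structure and scoring anchors affect its reliability and validity. Wilkinson et al. argued that panel interviews have little predictive value and that the “threat” of an interview may dissuade some potential applicants, and concluded that GPA is the best predictor of academic performance.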

For example, Dixon et al. (2002), in their review of the panel interview, commented that structure and scoring anchors affect its reliability and validity. Wilkinson et al. (2008) argued in their study that panel interviews have little predictive value, added that the “threat” of an interview may even dissuade some potential applicants, and concluded that GPA (grade point average from the student's pre-entry qualification) has the best predictive value for academic performance.


Structuring the interview improves its acceptability and reliability. The MMI is a highly structured student selection method.

Structuring the interview has been reported to enhance its acceptability and reliability (Patrick et al. 2001). The Multiple Mini-Interview (MMI) is a highly structured student selection method designed to resemble the Objective Structured Clinical Examination (OSCE) (Eva et al. 2004c).


By sampling candidates' competencies widely, the MMI gives a more accurate picture of their overall ability.

The MMI, therefore, allows a wide sampling of candidates’ competencies in order to gain a more accurate picture of their overall ability.



Methods



 

Results



Characteristics of studies reviewed



Features of the MMI


  • The number of stations used in the studies reviewed ranged from 4 to 12, with 10 studies using a 10-station MMI, 6 using 12, 5 using 8, and the remaining 9 using 4, 7, 9 or 11 stations.
  • Fourteen of the studies used one assessor per station while 4 used 2 assessors, and the remaining 12 did not report the number of assessors per station.
  • Most studies used faculty as assessors, while some used a combination of faculty and community practitioners (Hecker & Violato 2011) and others included students (Brownell et al. 2007).


  • The range of time at each station was 5 to 15 min with a mode of 8 min. Eleven studies reported using 8-min stations, five using 7-min, three using 10-min, one using 5-min and one 15-min stations.
  • Two studies tested the effect of different lengths of time at stations; one comparing eight and six minutes (Cameron & Mackeigan 2012) and the other eight and five minutes (Dodson et al. 2009). Seven did not report the time at each station.

 

  • The typical (modal) MMI has 10 stations, each lasting eight minutes and rated by one assessor.

 


 

Feasibility


Three studies reported on the feasibility of the MMI. Two reported that it did not require more examiners than the panel interview, did not cost more, and that the interviews could be completed over a short period of time (Brownell et al. 2007; Finlayson & Townson 2011). Another study reported that it provided a positive experience for interviewers as well as applicants (Eva et al. 2004c).





Acceptability


Of the 30 studies reviewed, 14 reported on the acceptability of the MMI. Some authors reported that the MMI was acceptable to interviewees and interviewers because it was perceived as fair (Razack et al. 2009), transparent (Uijtdehaage et al. 2011) and providing opportunities for the interviewees to regain composure if they had problems with a previous station (Kumar et al. 2009). Positive experience for both applicants and examiners has also been reported (Eva et al. 2004c).


Acceptability was also determined as free from gender and cultural bias (Brownell et al. 2007), and socio-economic disadvantage (Uijtdehaage et al. 2011) or benefit of previous coaching (Griffin et al. 2008). Griffin et al. (2008) reported that previous coaching, as disclosed by applicants, had no effect on UMAT or MMI scores. Applicants who had previous MMI experience improved their subsequent performance in the same stations but not in new stations. 



Preference for station length differed between interviewers and interviewees, with the former judging six minutes to be “just right” and eight minutes to be “a bit long”, and the latter preferring longer time (Cameron & Mackeigan 2012). One study reported that graduate candidates outperformed school-leavers (Dowell et al. 2012) while another reported no difference between graduate and school-leaver applicants (O’Brien et al. 2012).


Acceptability of the MMI was compared to that of the panel or standard interview by O’Brien et al. (2011) for graduate and school-leaver applicants to 4-year and 5-year medical training programmes. The 5-year candidates, generally school-leaver applicants, reportedly felt that the MMI gave a more accurate picture of their abilities and that the panel interview was more difficult. In contrast, the 4-year candidates felt the MMI was more difficult.



Reliability


Eighteen studies reported on the reliability of the MMI. Intra-station reliability was reported to reach 0.98 by Lemay et al. (2007). The inter-item reliability (i.e. the internal consistency of the three scores assigned within any one station) and the inter-rater reliability within stations have also been reported to be very high by Dore et al. (2010). However, Finlayson & Townson (2011) conducted a 4-station MMI, each station lasting 15 min, and reported inter-rater reliability ranging from 0.50 to 0.69 for three stations, and 0.10 for one station.



Generally the reported reliability ranged from moderate (Roberts et al. 2008) to acceptable (Dore et al. 2010) to high (Lemay et al. 2007), with Cronbach’s alpha ranging from 0.69 to 0.98. However, Finlayson & Townson (2011) reported inter-station reliability ranging from 0.45 to 0.47. Other researchers have also reported low inter-station correlations (Lemay et al. 2007).


Using generalisability analysis, Hecker & Violato (2011) reported a G coefficient of 0.79 for seven stations with two assessors. A decision (D) study indicated that G = 0.81 can be achieved from ten stations with one assessor. Similarly, in Dore et al.’s (2010) study, G = 0.55 to 0.72 for seven stations increased to G = 0.64 to 0.79 with 10 stations in a D-study.
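
The effect of adding stations can be sketched with the Spearman-Brown prophecy formula, which is what a D-study reduces to when stations are the only measurement facet. That single-facet simplification is an assumption made here, not something the review states, and the function name `project_g` is illustrative. Under that assumption the sketch reproduces the Dore et al. (2010) projection from seven to ten stations; it does not model designs that also vary the number of assessors, such as Hecker & Violato (2011).

```python
def project_g(g_observed: float, n_observed: int, n_target: int) -> float:
    """Project a reliability (G) coefficient to a different number of stations.

    Assumes a single-facet design, where the D-study projection is the
    Spearman-Brown prophecy formula.
    """
    k = n_target / n_observed  # lengthening factor
    return k * g_observed / (1 + (k - 1) * g_observed)


# Dore et al. (2010): G = 0.55 to 0.72 observed with seven stations
low = project_g(0.55, n_observed=7, n_target=10)
high = project_g(0.72, n_observed=7, n_target=10)
print(f"{low:.2f} to {high:.2f}")  # 0.64 to 0.79
```

The same function can be run in reverse (a smaller `n_target`) to estimate how much reliability is lost by shortening the circuit.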



Validity



Content validity.


The validity of the MMI was discussed in 17 of the 30 studies. One key observation was that the MMI scores did not correlate with traditional admission tools scores such as the personal interview (r = 0.185), undergraduate grades (r = 0.317), simulated tutorial (r = 0.227) and autobiographical sketch (r = 0.170) (Eva et al. 2004c). Other studies reported that the MMI did not correlate with pre-entry academic qualifications, such as the GPA scores (Hecker et al. 2009), pre-pharmacy average (PPA) (r = 0.025) or Pharmacy College Admission Test (PCAT) (r = 0.042) (Cameron & Mackeigan 2012), GAMSAT (0.04) and UK Clinical Aptitude Test (UKCAT) (0.00) (O’Brien et al. 2011).

 

However, positive association with certain cognitive skills, such as the GAMSAT scores for “Reasoning in Humanities and Social Sciences” (r = 0.26) and “Written Communication” (r = 0.26) (Roberts et al. 2008), and cognitive reasoning skills (Roberts et al. 2009) have been reported, as well as correlation with an autobiographical submission focusing on ethical decision making (r = 0.65) (Dore et al. 2006). The MMI was not reported to be associated with emotional intelligence (Yen et al. 2011).




Predictive validity.


For medical students, MMI performance at admission was the best predictor for subsequent OSCE as well as clerkship performance (Eva et al. 2004a). Validity against future non-cognitive assessment was investigated by Eva et al. (2009), who reported that MMI performance at admission was statistically significantly predictive of performance at future examinations, such as the percentage of stations passed in the MCCQE (Medical Council of Canada Qualifying Examination) Part II.

 

However, a cross-sectional study investigating the association between MMI performance of medical residency applicants and their MCCEE (Medical Council of Canada Evaluating Examination) and MCCQE I scores reported low, non-significant correlations, and also non-significant correlation with MCCQE II scores (Hofmeister et al. 2009). In a more recent study, Eva et al. (2012) reported that better MMI performance at entry to medical school was predictive of higher MCCQE scores.



Discussion



The key findings were that the MMI was

  • (i) practically feasible in terms of efficient utilisation of time, costs and human resources when compared to the panel interview;
  • (ii) generally acceptable to both interviewees and interviewers;
  • (iii) generally reliable with acceptable Cronbach’s alpha and G-coefficient values; and
  • (iv) predictive of future performance in certain aspects of medical council examinations.



Developing the stations and conducting the interviews require expertise, so the initial preparation costs may be high.

Expertise is also necessary in developing the stations and conducting the interviews. Therefore the initial preparatory costs to develop the MMI are likely to be high (Rosenfeld et al. 2008).



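Kumar et al. found that the scenario-based MMI makes rehearsing or coaching responses harder. MMI performance has also been shown to be unrelated to self-reported prior coaching, so applicants without access to coaching are not disadvantaged.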

Kumar et al. (2009) identified that the scenario-based nature of the MMI made rehearsal and coaching of responses harder, and indeed it has been reported that performance at the MMI is not associated with self-reported previous coaching (Griffin et al. 2008), and therefore does not disadvantage applicants with no access to coaching.



Intra-station and inter-rater reliability are high and inter-station reliability is low, because different stations test different attributes.

For example, it is expected that intra-station and inter-rater reliability would be high and inter-station reliability low (Lemay et al. 2007; Dore et al. 2010) since different stations may test different attributes.


However, reliability appears to be related to the number of stations or interviewers, and also to the content of each station.

However, reliability would appear to be associated with number of stations or interviewers (Hecker & Violato 2011), and the content of each station (Lemay et al. 2007).


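MMI reliability is acceptable, as recognised in the Ottawa 2010 Conference consensus statement on selection for the health care professions. Interviewer subjectivity is the largest source of measurement error, which suggests that interviewer training would help.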

The reliability of the MMI has generally been reported to be acceptable. This has been recognised by the Ottawa 2010 Conference in a consensus statement on assessment for selection for the health care professions (Prideaux et al. 2011). Interviewer subjectivity is the largest source of measurement error, suggesting that interviewer training could be helpful (Roberts et al. 2008).


Most studies show that MMI performance is unrelated to pre-entry achievement (GPA, MCAT, GAMSAT). This suggests that the MMI assesses non-cognitive attributes.

Most studies reported that MMI performance was not associated with pre-entry academic qualifications such as GPA, MCAT and GAMSAT scores. This suggests that the MMI is capable of testing non-cognitive attributes, such as

  • professionalism (Hofmeister et al. 2009),
  • legal, ethical and organisational skills (Eva et al. 2009),
  • motivation, interest in medicine, decision making skills, ability to debate a complex issue (O’Brien et al. 2011),
  • empathy, moral and ethical reasoning, motivation and preparedness to study medicine, teamwork and leadership, honesty and integrity (Till et al. 2013), and
  • advocacy, ambiguity, collegiality and collaboration, cultural sensitivity, responsibility and reliability (Lemay et al. 2007).


Lemay JF, Lockyer JM, Collin VT, Brownell AK. 2007. Assessment of non-cognitive traits through the admissions multiple mini-interview. Med Educ 41(6):573–579.











 2013 Dec;35(12):1027-41. doi: 10.3109/0142159X.2013.829912. Epub 2013 Sep 20.

The Multiple Mini-Interview (MMI) for student selection in health professions training - a systematic review.

Author information

  • 1International Medical University , Malaysia.

Abstract

BACKGROUND:

The Multiple Mini-Interview (MMI) has been used increasingly for selection of students to health professions programmes.

OBJECTIVES:

This paper reports on the evidence base for the feasibility, acceptability, reliability and validity of the MMI.

DATA SOURCES:

CINAHL and MEDLINE

STUDY ELIGIBILITY CRITERIA:

All studies testing the MMI on applicants to health professions training.

STUDY APPRAISAL AND SYNTHESIS METHODS:

Each paper was appraised by two reviewers. Narrative summary findings on feasibility, acceptability, reliability and validity are presented.

RESULTS:

Of the 64 citations identified, 30 were selected for review. The modal MMI consisted of 10 stations, each lasting eight minutes and assessed by one interviewer. The MMI was feasible, i.e. did not require more examiners, did not cost more, and interviews were completed over a short period of time. It was acceptable, i.e. fair, transparent, free from gender, cultural and socio-economic bias, and did not favour applicants with previous coaching. Its reliability was reported to be moderate to high, with Cronbach's alpha = 0.69-0.98 and G = 0.55-0.72. MMI scores did not correlate to traditional admission tools scores, were not associated with pre-entry academic qualifications, were the best predictor for OSCE performance and statistically predictive of subsequent performance at medical council examinations.

CONCLUSIONS:

The MMI is reliable, acceptable and feasible. The evidence base for its validity against future medical council exams is growing with reports from longitudinal investigations. However, further research is needed for its acceptability in different cultural context and validity against future clinical behaviours.

PMID: 24050709 [PubMed - indexed for MEDLINE]


Effect of changing the weighting of academics, experiences, and MMI-measured competencies on the race/ethnicity of accepted cohorts (Acad Med, 2015)

The Effect of Differential Weighting of Academics, Experiences, and Competencies Measured by Multiple Mini Interview (MMI) on Race and Ethnicity of Cohorts Accepted to One Medical School

Carol A. Terregino, MD, Meghan McConnell, PhD, and Harold I. Reiter, MD






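In medical education there has been a call for broad strategies that go beyond measuring the diversity of trainees or how closely their proportions mirror the population. Increasing the diversity of the health care workforce is one approach to reducing disparities between groups. Cohen et al. cite improved access and optimal management of the health care system, in addition to equity and fairness, as pragmatic reasons for achieving workforce diversity.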

Within the context of medical education, there has been a call for broad strategies extending beyond measures of the compositional diversity of trainees or representational ratios.2 Enhancing diversity in the health care workforce has been proposed as one approach to address those group disparities.3 Cohen et al3 cite increasing access and ensuring optimal management of the health care system, in addition to issues of equity and fairness, as pragmatic reasons for attaining workforce diversity.


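Increasing trainee diversity matters for raising educational quality for all students, improving access to health care for rural, inner-city, and minority populations, and accelerating progress in medical and public health research. Reliance on GPA and MCAT scores may severely constrain diversity in medicine, and researchers have argued that MMI-based selection may promote diversity.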

Increasing trainee diversity is important for shaping educational quality for all students, increasing access to health care in rural, inner-city, and minority populations, and accelerating advances in medical and public health research.22 Reliance on GPAs and MCAT scores may severely constrain diversity within medicine,23,24 and researchers have argued that basing admission selections on MMI scores may promote applicant diversity.17,25


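Changes in accreditation standards have also drawn medical schools' attention to diversity. The Holistic Review Project is a model that encourages considering both academic and personal competencies in selection, and in response RWJMS introduced a more holistic screening process.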

Changes in accreditation requirements reflect the enhanced attention to diversity expected of all medical schools.26 The Holistic Review Project has articulated a model that promotes the consideration of both academic and personal competencies in the application process.27 In response, Rutgers Robert Wood Johnson Medical School (RWJMS) began to implement a more holistic screening process;


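Importantly, applicants are admitted solely on the basis of their MMI scores. MCAT data help establish a minimum level of academic ability. Using data from 11 medical schools, Julian showed that academic difficulty is very unlikely unless MCAT scores fall below 8 for biological sciences, 7 for physical sciences, and 6 for verbal reasoning. These findings suggest that, once a reasonable academic threshold is exceeded, admissions should attend less to academic performance and more to core personal competencies.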

Importantly, applicants are admitted exclusively on the basis of their MMI scores. MCAT data support this reliance on academic thresholds. Using data from 11 schools, a study by Julian28 demonstrated that the risk of academic difficulties remained very low until entering students’ MCAT scores fell below 8 for biological sciences, 7 for physical sciences, and 6 for verbal reasoning. These findings suggest that for students exceeding acceptable academic thresholds, selection procedures should be less concerned with academic performance and more concerned with core personal competencies performance.


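In support of this hypothesis, the first RWJMS cohort whose final admission decisions were based on the MMI alone performed as well as previous cohorts in first- and second-year courses and on USMLE Step 1. In addition, this cohort's MMI scores predicted the core personal competencies assessed during medical school (reliability, integrity, service/sensitivity to diversity).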

In support of this hypothesis, the first cohort at RWJMS whose final admissions decision was based solely on MMI scores performed equivalently in first- and second-year courses and on United States Medical Licensing Examination (USMLE) Step 1 relative to previous cohorts admitted on the basis of traditional interviews, academic scores, and experiences. Additionally, the MMI scores from this first cohort predicted scores for students’ core personal competencies assessed in medical school (reliability, integrity, service/sensitivity to diversity).29


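We examined whether academic measures, experience scores, and personal competency scores differed by applicants' self-reported race/ethnicity, and whether changing the weighting of these scores could affect the diversity of the entering class.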

Specifically, we examined whether academic measures (GPA, MCAT), experience scores (service, clinical, and research [SCR]), and personal competencies scores (MMI) varied as a function of applicants’ self-reported race/ethnicity, and whether change in weighting of scores would impact diversity by altering the demographic composition of the entering classes.



Method


Setting, study population, application screening process


Retrospective study

This is a retrospective study of previously collected and recorded data for the RWJMS admissions process for entering classes 2011–2013.


Academic criteria

We determined that applicants screened for MMI were academically and experientially prepared, based on threshold criteria previously set by the RWJMS Admissions Committee (

    • total GPA > 3.0, 
    • total MCAT > 22, 
    • MCAT biological science score > 8, and 
    • no other MCAT score < 6).


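Service, clinical exposure, research, the personal essay, and letters of recommendation were each rated on a 5-point scale (3 = acceptable for an applicant).

    • A research rating of 5: a peer-reviewed presentation or publication
    • A service rating of 5: founding a service organization; a rating of 3: regular involvement in a service organization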

We scored service, clinical exposure, research, the personal essay, and letters of recommendation on a 1–5 Likert scale. The scale was developed so that a score of 3 is an acceptable score for an applicant. For example, a research rating of 5 would indicate that the research experience culminated in a peer-reviewed presentation or publication. With respect to service, regular involvement in a service organization would be rated 3, whereas founding a service organization would be rated 5.


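The summed screening scores were not used to rank applicants, only as thresholds below which no interview was offered. The SCR score informed screening decisions rather than dictating them because, for example, some students had no research experience. Applicants who met the academic criteria and scored at least 3 for SCR, personal essay, and letters were considered for interview. After that, GPA, MCAT, experience screening scores, essays, and letters were not considered again.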

The sums of the screening scores were not used to rank applicants but served as threshold scores below which an interview would not be offered. An SCR score was developed to inform but not dictate screening decisions, as some students did not have research experience. We considered for interview only applicants who met the academic criteria and who had SCR, personal essay, and letters scores of at least 3. We did not revisit the GPA, MCAT, experiences screening scores, essays, and letters after applicants were selected for interview.
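
The screening rules described above can be sketched as a simple threshold check. This is an illustrative simplification, not the committee's actual procedure: the `Applicant` class, its field names, and both function names are hypothetical, and the SCR score is treated as a strict cutoff even though the text says it informed rather than dictated decisions.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    gpa: float
    mcat_total: int
    mcat_bio: int
    mcat_other: list[int]  # remaining MCAT section scores
    scr: float             # service/clinical/research score (1-5)
    essay: float           # personal essay score (1-5)
    letters: float         # letters of recommendation score (1-5)


def meets_academic_thresholds(a: Applicant) -> bool:
    """Thresholds as listed in the text: GPA > 3.0, MCAT > 22, biology > 8, no other section < 6."""
    return (
        a.gpa > 3.0
        and a.mcat_total > 22
        and a.mcat_bio > 8
        and all(score >= 6 for score in a.mcat_other)
    )


def eligible_for_interview(a: Applicant) -> bool:
    # Screening scores act only as thresholds; they are never used to rank applicants.
    return meets_academic_thresholds(a) and min(a.scr, a.essay, a.letters) >= 3


# Hypothetical applicants, for illustration only
strong = Applicant(gpa=3.6, mcat_total=30, mcat_bio=10, mcat_other=[10, 10], scr=3, essay=4, letters=3)
weak_essay = Applicant(gpa=3.9, mcat_total=35, mcat_bio=12, mcat_other=[12, 11], scr=4, essay=2, letters=5)
print(eligible_for_interview(strong), eligible_for_interview(weak_essay))  # True False
```

Note that the second applicant has the higher GPA and MCAT but is screened out, which reflects the point that these scores set a floor rather than a ranking.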





The MMI process, committee deliberations, admissions decisions


The MMI has six stations. The interview stems used on a given interview day are used only on that day.

The MMI process at RWJMS consists of a six-station MMI. Each station consists of a behavioral descriptor or situational judgment-type interview stem addressing a specific AAMC COA core personal competency4 or combination of competencies. All interview stems are unique on a given interview day and written by one of the authors (C.A.T.). The MMI process at RWJMS employs only the 30 members of the standing committee, who participate in modified frame-of-reference training prior to the sessions. Extensive interviewer training allows for the assumption of adequate reliability with a six-station MMI.


Table 1

Table 1 demonstrates the behaviorally anchored rating scale for communication.




In each station, interviewers evaluate applicants on the basis of

    • communication,
    • content/argument, and
    • overall global impression

using a behaviorally anchored 1–5 Likert scale.



Statistical analysis


We ran “what-if” analyses with different weightings. Because the measures were on different scales, they were converted to z scores before the alternative weightings were applied.

In addition to comparing differences in mean performance scores as a function of applicant self-reported race/ethnicity, we also conducted a series of “what-if” analyses to determine whether alternative weighting methods would have changed final admissions decisions and entering class composition. Because the different performance measures are on different numeric scales, we converted performance measures (GPA, MCAT, SCR score, and MMI) to z scores before implementing alternative weighting schemes.
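
A “what-if” analysis of this kind can be sketched as follows: standardize each measure, form a weighted composite, and see who lands in the accepted group under each weighting. The data, the weights, and the function names below are hypothetical and chosen only to show the mechanics; they are not the study's data or code.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of scores to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def composite(measures, weights):
    """Weighted sum of z-scored measures, one composite score per applicant."""
    z = {name: z_scores(vals) for name, vals in measures.items()}
    n = len(next(iter(measures.values())))
    return [
        sum(weights.get(name, 0.0) * z[name][i] for name in measures)
        for i in range(n)
    ]


def top_ranked(measures, weights, n_accept):
    """Indices of the n_accept applicants with the highest composite score."""
    scores = composite(measures, weights)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:n_accept])


# Hypothetical toy data for four applicants (not from the study)
measures = {
    "gpa": [3.9, 3.4, 3.7, 3.2],
    "mmi": [9.0, 11.5, 10.0, 12.0],
}
mmi_only = top_ranked(measures, {"mmi": 1.0}, n_accept=2)
mixed = top_ranked(measures, {"gpa": 0.5, "mmi": 0.5}, n_accept=2)
print(mmi_only, mixed)  # [1, 3] [1, 2]
```

In this toy example, adding a 50% GPA weight replaces the applicant with the highest MMI score by one with a higher GPA, which is the kind of shift in cohort composition the analysis is designed to detect.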





Results



Traditional performance measures


The interaction between applicant and MMI station explained 33% of the variance. This interaction means that applicants' performance varied from station to station, that is, context specificity.

The interaction between applicant and MMI station accounted for the second largest amount of variance (33%). This interaction effect indicates that applicant performance varied across MMI stations, an effect commonly referred to as “context specificity.”15







Relation of traditional performance measures to applicant diversity







“What-if” analyses: The effects of alternative weighting of performance measures on race/ethnicity composition of accepted applicants


The proportion of URIM applicants accepted ranged from 57% to 22% depending on the weighting.

The proportion of URIM applicants accepted into the undergraduate medical program would have declined from 57% (100% MMI) to as low as 22% (50% MCAT/50% MMI), depending on the weighting.









Discussion


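Our results suggest that weighting the MMI more heavily than traditional academic or experience scores would increase racial/ethnic diversity. To our knowledge, this is the only report from a U.S. medical school showing that the MMI, unlike the MCAT or GPA, is neutral with respect to URIM applicants.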

Our findings suggest that increasing use of MMI scores in admission decisions may enhance racial/ethnic diversity among entering medical students, relative to reliance on traditional academic measures and experience scores. To our knowledge this is the only report from a U.S. medical school showing the neutrality of the MMI for underrepresented applicants, contrary to the MCAT or GPA.31


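There was no difference in MMI performance between URIM and non-URIM applicants, consistent with a small Canadian study. Extrapolating from that study is limited by its size and by the very different social and cultural backgrounds of the United States and Canada.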

Our results revealed no statistically significant difference in MMI performance between URIM and non-URIM groups, a finding consistent with a small Canadian study on five aboriginal applicants.25 Extrapolation from that study, however, is limited because of the size of that study, and the very different social and cultural backgrounds of the United States and Canada.


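Looking only at the racial/ethnic makeup of the top 45% of ranked students, the change is even more dramatic. Reiter et al. analysed MMI results from six Canadian medical schools, focusing on the MMI's effect on enhancing diversity, increasing access to medical school, and neutralising the effect of academic variables. McMaster's approach was 60% GPA and 40% autobiographical questionnaire for interview invitations, and 70% MMI and 30% GPA for final selection. In the Canadian study, these different weightings did not affect the accepted cohorts when compared by household income or community size.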

The change in racial/ethnic makeup of the top 45% ranked students who would be offered acceptance is even more surprising. Reiter et al17 combined MMI results of six Canadian medical schools over two years, focusing on MMI effect on enhancing diversity, increasing access to medical school, and neutralizing the effect of academic variables. McMaster’s formulaic approach to invitation for interview was 60% GPA and 40% autobiographical questionnaire, and postinterview selection was 70% MMI score and 30% GPA. The Canadian study found that these differential weighting schemes did not impact the diversity of accepted cohorts, as measured by income and community size.17











 2015 Dec;90(12):1651-7. doi: 10.1097/ACM.0000000000000960.

The Effect of Differential Weighting of Academics, Experiences, and Competencies Measured by Multiple Mini Interview (MMI) on Race and Ethnicity of Cohorts Accepted to One Medical School.

Author information

  • 1C.A. Terregino is senior associate dean for education and associate dean for admissions, Rutgers Robert Wood Johnson Medical School, Piscataway, New Jersey. M. McConnell is assistant professor, Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada. H.I. Reiter is professor, Department of Oncology, McMaster University, Hamilton, Ontario, Canada.

Abstract

PURPOSE:

To examine whether academic scores, experience scores, and Multiple Mini Interview (MMI) core personal competencies scores vary across applicants' self-reported ethnicities, and whether changes in weighting of scores would alter the proportion of ethnicities underrepresented in medicine (URIM) in the entering class composition.

METHOD:

This study analyzed retrospective data from 1,339 applicants to the Rutgers Robert Wood Johnson Medical School interviewed for entering classes 2011-2013. Data analyzed included two academic scores (grade point average [GPA] and Medical College Admission Test [MCAT]), service/clinical/research (SCR) scores, and MMI scores. Independent-samples t tests evaluated whether URIM ethnicities differed from non-URIM across GPA, MCAT, SCR, and MMI scores. A series of "what-if" analyses were conducted to determine whether alternative weighting methods would have changed final admissions decisions and entering class composition.

RESULTS:

URIM applicants had significantly lower GPAs (P < .001), MCATs (P < .001), and SCR scores (P < .001). However, this pattern was not found with MMI score (non-URIM 10.4 [1.6], URIM 10.4 [1.3], P = .55). Alternative weighting analyses show that including academic/experiential scores impacts the percentage of URIM acceptances. URIM acceptance rate declined from 57% (100% MMI) to 43% (10% GPA/10% MCAT/10% SCR/70% MMI), 39% (30% GPA/70% MMI), to as low as 22% (50% MCAT/50% MMI).

CONCLUSIONS:

Sole reliance on the MMI for final admissions decisions, after threshold academic/experiential preparation are met, promotes diversity with the accepted applicant pool; weighting of "the numbers" or what is written about the application may decrease the acceptance of URIM applicants.

PMID: 26488572 [PubMed - in process]


Medical school admission interviews: are structured interviews more reliable than unstructured interviews? (Teaching and Learning in Medicine, 2010)

Medical School Preadmission Interviews: Are Structured Interviews More Reliable Than Unstructured Interviews?


Rick Axelson and Clarence Kreiter

Department of Family Medicine, University of Iowa, Iowa City, Iowa, USA

Kristi Ferguson

Office of Consultation and Research in Medical Education, University of Iowa, Iowa City, Iowa, USA

Catherine Solow and Kathi Huebner

Office of Student Affairs and Curriculum, Carver College of Medicine, Iowa City, Iowa, USA





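One common way to improve score reliability is to use a structured interview. The degree of structure is defined by the use of a scoring rubric, question standardization, the use of probing, and other factors. However, direct evidence supporting structured interviews is sparse, and a study by Kreiter et al. suggests there is no logical rationale, in terms of fairness or reliability, for presenting the same questions to every applicant. This finding may seem counterintuitive, but interview questions should be regarded as a random measurement facet, and sampling a small number of questions effectively equates for question difficulty across applicants.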

One commonly advocated method for enhancing score reliability is to use a structured interview format.3,4 The level of interview structure is defined by the use of a scoring rubric, question standardization, the use of probing, and other factors. There is, however, little direct evidence to support the practice of using structured interviews, and a recent study by Kreiter et al.5 suggests there is no logical rationale related to fairness or reliability that would support presenting the same questions to all applicants. This finding may appear counterintuitive; however, it is easily demonstrated that interview questions should be regarded as a random measurement facet and that sampling a small number of questions effectively equates for question difficulty across applicants.









METHODS


Two faculty members conduct a 25-minute interview. Each faculty interviewer assesses eight to ten applicants per year.

The University of Iowa Roy J. and Lucille A. Carver College of Medicine (UICCOM) is a public medical school with a total enrollment of 572 students. As part of the application process to UICCOM, particularly well-qualified candidates are selected to participate in a 25-min interview with two faculty members. A pool of faculty interviewers is recruited by the director of medical admissions each year (average interviewers used per year are approximately 150) to conduct the interviews. Each faculty member interviews approximately eight to ten applicants per year.


The interview was divided into two parts.

Hence, there are two parts to the interview: 

  • (a) a structured component, where candidates are read predetermined questions and their responses are scored on a scale from 1 to 5 using an established scoring rubric, and
  • (b) an unstructured component, where there is a free-flowing exchange between faculty and the candidate on any appropriate topic of interest to the faculty interviewer and/or the candidate.

The unstructured part is also rated on a 5-point scale, with no explicit scoring rubric; the anchors are 5 (excellent) and 1 (poor).

Scores ranging from 1 to 5 are also awarded on this unstructured portion of the interview but, given the variable nature of these exchanges, are not guided by explicit scoring rules or rubrics. For each of these 5-point rating scales, the anchors are 5 (excellent) and 1 (poor).


The interview protocol is as follows.

  • Each interview begins with a highly structured component that asks the same four standard questions of all applicants being interviewed.
  • Questions vary somewhat from year to year, but they are drawn from a standard pool of interview questions.1 In general, these questions ask about
    • applicants’ motivation for pursuing a career in medicine,
    • how they might deal with various challenges encountered in practicing medicine, and
    • how applicants’ experiences and/or attributes will enable them to be outstanding physicians.
  • Immediately following the applicant response to a question, the two faculty raters, guided by a scoring rubric, independently rate each of the applicant’s responses before moving on to the next question.
  • Interviewers are not allowed to probe or ask follow-up questions.
  • There is no time limit set for responses to each question; candidate responses are typically about 2 to 3 min per question.
  • After all the structured questions are completed, the remaining minutes of the interview are devoted to an open conversation with the applicant.

Interviewer training

  • All first-time faculty interviewers are trained: Training is provided for all first-time interviewers. 
  • Protocol, scoring rubrics for structured questions, and practice scoring with sample videos: Training sessions provide faculty with an overview of the interview protocol, scoring rubrics for structured questions, and an opportunity to score some sample (fictitious) videotaped responses using the scoring rubrics. 
  • Discussion with the trainer after each sample video: After each sample response is viewed and scored, faculty discuss their rationale for awarding a given score with the trainer. 
  • Review of the unstructured part, stressing appropriate and inappropriate topics: Trainers also review the protocol for the unstructured portion of the interview, emphasizing what are considered appropriate and inappropriate topics for discussion. 
  • First-time interviewers receive feedback from an observer to ease adjustment to actual sessions: To facilitate adjustment to actual interview sessions, first-time interviewers receive feedback from an observer who is present during their initial day of interviewing. 
  • Observers are experienced interviewers who give feedback wherever new interviewers may be straying: Observers are experienced interviewers who provide feedback regarding any areas where new interviewers may be straying from the established interview protocols and scoring procedures.


Variance components

Table 5 shows variance components and reliability obtained from two complete interview occasions, each employing a structured and an unstructured format [p• × o•], and provides information related to a complete replication of an interview using both the structured and unstructured format. 






The proportion of person variance for the structured format was 22% compared with 30% for the unstructured format and implies the unstructured format will yield more consistent scores across replications. The universe score correlation between the formats was .82, suggesting the formats may not assess identical attributes of the applicant. 
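To make the comparison concrete, the sketch below treats each format's person-variance share as a single-replication dependability index and projects it to the mean of several replications with the usual Spearman-Brown-type formula. The function name `reliability_of_mean` and the assumption that all non-person variance counts as error are illustrative choices, not values or methods taken from Table 5.

```python
def reliability_of_mean(person_share: float, n_replications: int) -> float:
    """Dependability of the mean of n replications.

    Assumption (not from the paper): total variance is scaled to 1 and
    everything that is not person variance is treated as error.
    """
    error_share = 1.0 - person_share
    return person_share / (person_share + error_share / n_replications)


# Person-variance shares reported in the text.
STRUCTURED_SHARE = 0.22
UNSTRUCTURED_SHARE = 0.30

for label, share in [("structured", STRUCTURED_SHARE),
                     ("unstructured", UNSTRUCTURED_SHARE)]:
    for n in (1, 2, 4):
        value = reliability_of_mean(share, n)
        print(f"{label:12s} n={n}: {value:.3f}")
```

With two replications this gives roughly 0.36 for the structured format and 0.46 for the unstructured format, so the ordering of the two formats is preserved as replications are added.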



DISCUSSION


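Contrary to previous findings, the unstructured format proved more reliable, both in terms of interrater agreement and in the random replications (test–retest) analysis. Furthermore, the two formats appear to assess related but distinct constructs. The universe score correlation and the Person × Format interaction show that the two formats do not measure identical constructs related to the applicant. Finally, reliability can be increased by combining the two measures (structured + unstructured). 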

Contrary to the predominant view in the research literature, we found that the unstructured format was more reliable from both an interrater agreement perspective and in the random replications (test–retest) analysis. Further, it appears that the different formats are measuring related, yet distinct, constructs. The universe score correlation (ru = .82) and Person × Format interaction indicated that the two formats do not measure identical constructs related to the applicant. Last, we found that reliability can be increased by combining the two measures into a composite score. An examination of weighted composite scores indicates a sum score with approximately equal weights on both formats maximizes reliability and the information obtained.
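The gain from combining the two formats can be illustrated with a small sketch. It uses only the figures quoted in these notes (person-variance shares of 0.22 and 0.30, universe score correlation of .82) and adds two simplifying assumptions that are not from the paper: each format's total variance is scaled to 1, and the error components of the two formats are uncorrelated. The function `composite_reliability` is a hypothetical helper, so the numbers it produces should not be read as the study's results.

```python
import math


def composite_reliability(w_structured: float, w_unstructured: float,
                          person_s: float = 0.22, person_u: float = 0.30,
                          r_universe: float = 0.82) -> float:
    """Reliability of a weighted sum of two format scores.

    Assumptions (illustrative only): unit total variance per format and
    uncorrelated errors between formats.
    """
    error_s = 1.0 - person_s
    error_u = 1.0 - person_u
    # Covariance of universe scores implied by the universe score correlation.
    universe_cov = r_universe * math.sqrt(person_s * person_u)
    universe_var = (w_structured ** 2 * person_s
                    + w_unstructured ** 2 * person_u
                    + 2 * w_structured * w_unstructured * universe_cov)
    error_var = w_structured ** 2 * error_s + w_unstructured ** 2 * error_u
    return universe_var / (universe_var + error_var)


for weights in [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]:
    print(weights, round(composite_reliability(*weights), 3))
```

Under these assumptions the equally weighted composite comes out near 0.39, above either format alone (0.22 and 0.30), which is the qualitative pattern the authors describe.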








 2010 Oct;22(4):241-5. doi: 10.1080/10401334.2010.511978.

Medical school preadmission interviews: are structured interviews more reliable than unstructured interviews?

Author information

  • 1Department of Family Medicine, University of Iowa, Iowa City, Iowa 52242, USA. rick-axelson@uiowa.edu

Abstract

BACKGROUND:

The medical education research literature consistently recommends a structured format for the medical school preadmission interview. There is, however, little direct evidence to support this recommendation.

PURPOSE:

To shed further light on this issue, the present study examines the respective reliability contributions from the structured and unstructured interview components at the University of Iowa.

METHODS:

We conducted three univariate G studies on ratings from 3,043 interviews and one multivariate G study using responses from 168 applicants who interviewed twice.

RESULTS:

Examining interrater reliability and test-retest types of reliability, the unstructured format proved more reliable in both instances. Yet, combining measures from the two interview formats yielded a more reliable score than using either alone.

CONCLUSIONS:

At least from a reliability perspective, the popular advice regarding interview structure may need to be reconsidered. Issues related to validity, fairness, and reliability should be carefully weighed when designing the interview process.

PMID: 20936568 [PubMed - indexed for MEDLINE]


Past-behavioural versus situational interviews in an MMI for intern selection: a comparison of reliability and acceptability (BMC Med Educ, 2015)

Past-behavioural versus situational questions in a postgraduate admissions multiple mini-interview: a reliability and acceptability comparison

Hiroshi Yoshimura1,2,3*, Hidetaka Kitazono2, Shigeki Fujitani2, Junji Machi2,3, Takuya Saiki4, Yasuyuki Suzuki4 and Gominda Ponnamperuma5






Settings and participants


Overview of TBUIMC; specialties; mission; educational goals; MMI administration

TBUIMC is a Japanese general hospital, which newly introduced three specialty training programmes: internal medicine, surgery, and emergency medicine. To accomplish the trans-specialty mission of ‘fostering high-quality generalist physicians providing holistic patient care’, the educational committee of TBUIMC decided to introduce the Accreditation Council for Graduate Medical Education (ACGME) six general competencies [36] as educational outcomes. In 2013, the MMI took place at the partitioned TBUIMC conference room, in three separate weekends. Of the 26 candidates who applied for the TBUIMC programmes, 13, 10, and 3 were invited for the MMI on the first, the second, and the third day of the MMI, respectively.


Interview administration; candidates; examiners

Three separate days were set for candidates’ convenience, having better access to selection opportunities in TBUIMC; this facilitated the recruitment process. All candidates were Japanese medical graduates, whose level of training ranged from Post Graduate Year (PGY)-2 to PGY-4. They were either in the second year of, or had concluded the two-year National Obligatory Initial Postgraduate Clinical Training Programme (NOIPCTP), following their graduation from Japanese medical schools, and the Japanese National Licensure Examination [37]. A total of 18 examiners, including TBUIMC’s educational committee members (most of whom were US specialty board certified) and clinical supervisors, were all Japanese physicians in the aforementioned three specialties. All candidates, regardless of their applying specialties or the PGY level, were examined by all examiners, who were randomly allocated to the stations. All examiners stayed within the same station, on all three days.


Intervention


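Of the six ACGME competencies, medical knowledge was excluded; each of the remaining five competencies was assigned one station; each competency has 2 to 8 sub-domains; two examiners per station; the STAR approach was used for PBQs. For SQs, examiners were not allowed to probe on their own.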

To base stations on the competencies of the ACGME, except ‘medical knowledge’, 5 stations were created to assess one competency (domain) per station. Out of the 2 to 8 sub-domains in each competency [36], two sub-domains (one for the PBQ, and the other for the SQ) per station were selected so that one PBQ followed by one SQ was administered within the same station (Table 1). The same questions were asked from all candidates. Two examiners were assigned to one station and they alternated questioning roles. 

  • In PBQs, Situation-Task-Action-Result (STAR) approach was applied for guiding interviews [38]. 
  • In SQs, presenting a scenario with a dilemma and making the candidates describe what they would do, in a situation where the candidate had to choose between two or more mutually exclusive courses of action [21,22] were followed by structured probing [27]. Examiners were not allowed to probe independently. 

A sample of instructions to examiners for one of the stations is shown in Table 2.




Interview guide





Fewer than 10 stations may be sufficiently reliable. Factors other than question format probably contributed as well.

The current study suggests that less than 10 stations of the MMI with one examiner per station may be sufficiently reliable. In addition to the question format, other structuring processes may have contributed to this, e.g. 

  • Established framework: basing stations on an established competency framework; 
  • Minimal unnecessary rapport building: minimising unnecessary rapport building between examiners and candidates; 
  • Same planned questions: asking exactly the same questions from each candidate with planned probing; 
  • Three distinguishable rating rubrics: using three distinguishable rating rubrics; 
  • Ratings on detailed anchors: rating candidates on points anchored with detailed descriptors; and 
  • Examiner training: providing examiner training. 

These structuring efforts likely helped reduce the number of stations.

These structuring efforts would help reduce the number of stations, especially where only limited examiner resources are available for a relatively smaller number of candidates.


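The highly structured station interview format may have contributed to the positive but modest reactions of examiners and candidates. Interestingly, this study shows contrasting reactions to SQs and PBQs: candidates preferred SQs, whereas examiners preferred PBQs. Notably, all participants considered the current MMI fair, and most noted the importance of using both SQs and PBQs. 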

As non-medical personnel selection studies have suggested [27], the highly structured nature of the station interview formats and other structuring efforts in the present study may be responsible for the positive but modest candidate and examiner reaction compared with previous studies [1,7-9,11-15]. Interestingly, this study also indicates contrasting acceptability for SQs and PBQs amongst candidates and examiners, i.e. SQs being more favourable for candidates as opposed to PBQs being more favourable for examiners. Of particular note, all participants admitted fairness of the current MMI and most expressed importance of using both SQs and PBQs. As to how best PBQs and SQs could be combined, the participant reactions could be used as a guide for generating a discussion on both question formats at a given level (undergraduate or postgraduate [foundation, specialty, or subspecialty]) of admissions MMIs in the future, as is being discussed in the area of SSPIs in non-medical personnel selection [27].

























 2015 Apr 14;15:75. doi: 10.1186/s12909-015-0361-y.

Past-behavioural versus situational questions in a postgraduate admissions multiple mini-interview: a reliability and acceptability comparison.

Author information

  • 1Educational Committee, Prefectural Okinawa Nanbu and Children's Medical Centre, Haebaru Town, Okinawa Prefecture, Japan. yoshimura.hiroshi@gmail.com.
  • 2Educational Committee, Tokyo Bay Urayasu-Ichikawa Medical Centre, Urayasu City, Chiba Prefecture, Japan. yoshimura.hiroshi@gmail.com.
  • 3Department of Surgery, University of Hawaii, John A. Burns School of Medicine, Honolulu, State of Hawaii, USA. yoshimura.hiroshi@gmail.com.
  • 4Educational Committee, Tokyo Bay Urayasu-Ichikawa Medical Centre, Urayasu City, Chiba Prefecture, Japan. hkitazono@gmail.com.
  • 5Educational Committee, Tokyo Bay Urayasu-Ichikawa Medical Centre, Urayasu City, Chiba Prefecture, Japan. shigekifujitani@gmail.com.
  • 6Educational Committee, Tokyo Bay Urayasu-Ichikawa Medical Centre, Urayasu City, Chiba Prefecture, Japan. junji@hawaii.edu.
  • 7Department of Surgery, University of Hawaii, John A. Burns School of Medicine, Honolulu, State of Hawaii, USA. junji@hawaii.edu.
  • 8Medical Education Development Centre, Faculty of Medicine, Gifu University, Gifu City, Gifu Prefecture, Japan. saikitak@gifu-u.ac.jp.
  • 9Medical Education Development Centre, Faculty of Medicine, Gifu University, Gifu City, Gifu Prefecture, Japan. ysuz@gifu-u.ac.jp.
  • 10Faculty of Medicine, University of Colombo, Colombo, Western Province, Sri Lanka. gomindap@hotmail.com.

Abstract

BACKGROUND:

The Multiple Mini-Interview (MMI) mostly uses 'Situational Questions' (SQs) as an interview format within a station, rather than 'Past-Behavioural Questions' (PBQs), which are most frequently adopted in traditional single-station personal interviews (SSPIs) for non-medical and medical selection. This study investigated reliability and acceptability of the postgraduate admissions MMI with PBQ and SQ interview formats within MMI stations.

METHODS:

Twenty-six Japanese medical graduates, first completed the two-year national obligatory initial postgraduate clinical training programme and then applied to three specialty training programmes - internal medicine, general surgery, and emergency medicine - in a Japanese teaching hospital, where they underwent the Accreditation Council for Graduate Medical Education (ACGME)-competency-based MMI. This MMI contained five stations, with two examiners per station. In each station, a PBQ, and then an SQ were asked consecutively. PBQ and SQ interview formats were not separated into two different stations, or the order of questioning of PBQs and SQs in individual stations was not changed due to lack of space and experienced examiners. Reliability was analysed for the scores of these two MMI question types. Candidates and examiners were surveyed on this experience.

RESULTS:

The PBQ and SQ formats had generalisability coefficients of 0.822 and 0.821, respectively. With one examiner per station, seven stations could produce a reliability of more than 0.80 in both PBQ and SQ formats. More than 60% of both candidates and examiners felt positive about the overall candidates' ability. All participants liked the fairness of this MMI when compared with the previously experienced SSPI. SQs were perceived more favourable by candidates; in contrast, PBQs were perceived more relevant by examiners.

CONCLUSIONS:

Both PBQs and SQs are equally reliable and acceptable as station interview formats in the postgraduate admissions MMI. However, the use of the two formats within the same station, and with a fixed order, is not the best to maximise its utility as an admission test. Future studies are required to evaluate how best the SQs and PBQs should be combined as station interview formats to enhance reliability, feasibility, acceptability and predictive validity of the MMI.

PMID: 25890189 [PubMed - indexed for MEDLINE]; PMCID: PMC4427914


How the outcomes of MMI-based selection relate to medical school applicants' race, ethnicity, and socioeconomic status (Acad Med, 2015)

How Medical School Applicant Race, Ethnicity, and Socioeconomic Status Relate to Multiple Mini-Interview–Based Admissions Outcomes: Findings From One Medical School

Anthony Jerant, MD, Tonya Fancher, MD, MPH, Joshua J. Fenton, MD, MPH, Kevin Fiscella, MD, MPH, Francis Sousa, MD, Peter Franks, MD, and Mark Henderson, MD





Few studies have examined how underrepresented racial/ethnic minority (URM) applicants and lower-SES applicants are affected by adoption of the MMI. This matters given that URM and lower-SES students are disproportionately few in U.S. medical schools.

Little studied is how underrepresented racial/ethnic minority (URM) and lower socioeconomic status (SES) applicants may be affected by adoption of the MMI. This is a key issue given that U.S. medical schools admit disproportionately few URM and lower SES individuals.6–8


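Traditional unstructured interviews have long been criticized as vulnerable to interviewer bias. Implicit biases that work against racial/ethnic minority and lower-SES applicants are common in the United States, including among physicians. The effect of bias in interviews can be reduced by increasing structure (removing ambiguity so that judgments follow a standardized structure), and the influence of individual biases can be diluted by pooling the ratings of multiple raters.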

A long-recognized problem with traditional nonstructured interviews is vulnerability to interviewer biases triggered by various applicant characteristics.17–22 Implicit (i.e., unconscious) biases disfavoring racial/ethnic minority and lower SES persons are common in U.S. society,23 including among physicians.24 The effects of bias during interviews can be reduced by increasing structure (removing ambiguity and, therefore, the tendency to rely on stereotype-driven judgments) and pooling evaluations from multiple raters (potentially diluting or offsetting individual biases).20,25–27


To our knowledge, only three studies have examined the association of URM status or SES with MMI performance.

Only three studies to our knowledge have explored the associations of medical school applicants’ racial/ethnic minority status or SES with MMI performance.


No studies have examined whether race/ethnicity influences acceptance after the MMI, or whether race/ethnicity or SES influences MMI invitation.

To our knowledge, no studies have examined whether applicants’ race/ethnicity influences acceptance following MMI participation, or whether race/ethnicity or SES influences the likelihood of being invited to an MMI.



Method


Application, screening, and MMI invitation and scheduling


Applications were evaluated for MMI invitation on the following basis:
Faculty evaluated secondary applications for invitation to an MMI based on cumulative GPA and MCAT scores, personal statements, extracurricular activities, recommendation letters, and other characteristics that could contribute to fulfilling the educational and service missions of the school.

MMI process and scoring


Two minutes to read, eight minutes to respond; ten stations on the following topics

The MMI consisted of 10 individual 10-minute stations. At each station, applicants had 2 minutes to read a brief set of instructions, and 8 minutes to address the assigned tasks on entering the room. Nine stations assessed skills in the following domains: 

    • integrity/ethics, 
    • professionalism, 
    • interpersonal communication, 
    • diversity/cultural awareness, 
    • teamwork, 
    • ability to handle stress, and 
    • problem solving. 
    • An additional station asked applicants to explain their choice to pursue a career in medicine

Most stations were adapted from content developed at McMaster University and marketed by ProFitHR.34


A single trained rater, unaware of the applicant's AMCAS application information, was assigned to each station.

A single trained rater, blinded to participants’ AMCAS application information, attended each station.


A total of 216 different raters

There were 216 different raters during the study period; 

    • Mean number of stations rated: the mean number of MMI stations that each evaluated was 104 (standard deviation [SD] 61.9; range 8–276). 
    • Women: Women made up 61% of raters. 
    • Rater background: Rater professional backgrounds were as follows: physicians, 31%; medical students, 15%; other clinicians (e.g., nurses), 11%; basic science faculty, 6%; patients, 2%; and various nonclinician leaders (e.g., deans), professionals (e.g., lawyers), and high-level administrative staff (e.g., curriculum manager), 35%. 


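Rater backgrounds were varied because diverse perspectives were thought to help select physicians who will work effectively with people from all walks of life. Mandatory rater training consisted of a one-hour course covering a review of the admissions process, rater roles and duties, and the need to avoid raising protected-class issues.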

The range of rater backgrounds reflected the conviction that diverse perspectives are helpful in selecting future physicians who will be able to work effectively with people from all walks of life. Mandatory rater training included a one-hour course reviewing the admissions process, rater roles and duties, and the need to avoid pursuing protected class issues (e.g., race/ethnicity, gender).36


Rating at each station (four-point scale)

At each station, raters scored overall applicant performance using an anchored four-point scale: 

    • 0, < 25th percentile performance (relative to other applicants); 
    • 1, 25th–50th percentile; 
    • 2, 51st–75th percentile; or 
    • 3, > 75th percentile. 

Raters were also asked to consider both the applicant's communication abilities and the content of their statements. 

Raters were instructed to consider both the applicant’s communication abilities and the content (e.g., comprehensiveness) of their statements in assigning ratings. The total MMI score was the mean of each applicant’s individual station scores. Scale internal consistency (Cronbach alpha = 0.67) was comparable to that observed in other MMI studies.2,18,37–41
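The two quantities in this paragraph, the total MMI score and Cronbach's alpha, can be sketched as follows. The score matrix is made up for illustration (rows are applicants, columns are stations, ratings on the 0–3 scale), and the helper names `total_mmi_score` and `cronbach_alpha` are hypothetical; the authors' own computation may have differed in detail.

```python
from statistics import mean, variance


def total_mmi_score(station_scores):
    """Total MMI score as described in the text: the mean of station ratings."""
    return mean(station_scores)


def cronbach_alpha(score_matrix):
    """Cronbach's alpha for a complete applicants-by-stations matrix.

    Uses sample variances; assumes no missing ratings.
    """
    n_items = len(score_matrix[0])
    # Variance of each station's ratings across applicants.
    item_variances = [variance(column) for column in zip(*score_matrix)]
    # Variance of each applicant's summed rating.
    total_variance = variance([sum(row) for row in score_matrix])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: 4 applicants x 3 stations (not study data).
hypothetical_scores = [
    [3, 3, 2],
    [2, 2, 2],
    [1, 1, 0],
    [0, 1, 1],
]

print([round(total_mmi_score(row), 2) for row in hypothetical_scores])
print(round(cronbach_alpha(hypothetical_scores), 3))
```

Alpha rises when applicants are ranked similarly across stations, so a value such as the reported 0.67 indicates moderate consistency across the ten stations.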



Acceptance recommendation


Subsequently, the committee made one of the following recommendations: reject, low waitlist, high waitlist, or offer acceptance.


URM status


URM status was determined from AMCAS application information.

We determined URM status (URM [black, Southeast Asian, Native American, or Pacific Islander race and/or Hispanic ethnicity] versus not [all other responses]) from self-reported race/ethnicity information in the AMCAS application.



Socioeconomic disadvantage


An SES measure was developed from AMCAS application information.

We developed a composite measure of SES using self-reported information in the AMCAS application,


The following information was used:

The following predictors (yes/ no items except where indicated) were significant and maximized the area under the receiver operating characteristic curve (0.95):

    • fee assistance received for medical school application (yes/no); 
    • childhood spent in an underserved area; 
    • family recipients of family assistance program; 
    • income level category of applicant’s family (< $25,000; $25,000 to < $50,000; $50,000 to < $75,000; or > $75,000); 
    • applicant contributed to family income; 
    • any financial-need-based scholarship(s) in paying for postsecondary education; 
    • percentage of postsecondary education costs contributed by the family; and 
    • parents’ highest level of educational attainment (< high school, high school graduate, some college, or college graduate).


Applicant characteristics



MMI invitation




MMI score



Acceptance recommendation





Discussion


URM applicants were no less likely than non-URM applicants to receive an MMI invitation, scored similarly on the MMI, and were just as likely to be recommended for acceptance.

Further, URM applicants were no less likely than non-URM applicants to receive an MMI invitation, performed similarly on the MMI, and were just as likely to be recommended for acceptance.


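The similar MMI scores of URM and non-URM applicants suggest that a structured interview incorporating the perspectives of multiple raters may be less vulnerable to individual raters' implicit biases. Although we did not measure raters' implicit bias, the literature shows that such bias is widespread in U.S. society, physicians and other professionals included, and that it affects the outcomes of employment interviews in many fields, including medicine. Implicit bias was therefore probably present among our raters. Its net effect was not significant, however, since MMI scores did not differ between URM and non-URM applicants. Because the low proportion of URMs in medicine is a widely acknowledged problem, bias against URM applicants may have been offset by bias in their favor among raters trying to address the lack of racial/ethnic diversity.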

The similar MMI scores for URM and non-URM participants support the notion that structured interview processes that incorporate the perspectives of multiple evaluators like the MMI may be less vulnerable to the effects of individual evaluator implicit biases.20,25–27 Although we did not measure rater implicit biases regarding racial/ethnic minorities, such biases have been documented to be pervasive in U.S. society, including among physicians and other professionals,23,24 and can affect the outcomes of employment interviews in various fields including medicine.17,19–22 Thus, it is likely that implicit biases were present among our raters; however, they did not exert a significant net influence, given that mean MMI scores did not differ between URM and non-URM applicants. Because lack of URMs in medicine is a widely acknowledged problem,6,7,13,33,42–44 it is possible that biases against URM applicants were offset by ratings biased in favor of URM applicants, made by raters seeking to address limited racial/ethnic diversity in the physician workforce.


In contrast, lower-SES applicants received lower MMI scores.

In this context, our finding that lower SES applicants had worse adjusted MMI performance may be cause for concern. 


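Nonetheless, the effect of lower SES on MMI scores was small: the score declined by about 0.12 points across the 0–1 range of the SES score. The lower MMI scores were also more than offset by a greater likelihood of MMI invitation and acceptance recommendation. These results may reflect a shift away from a purely metric-based applicant review toward the more holistic process advocated by the AAMC.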

Nonetheless, the decrement in MMI performance with decreasing SES in our study was small: The MMI score (scale of 0–3 points) declined by a mean of 0.12 points across the 0–1 range of the SES score. Further, the lower MMI scores among lower SES applicants were more than offset by their greater likelihood of being invited to an MMI and recommended for acceptance. These findings may reflect the ongoing shift from a purely metric-based applicant review process toward the more holistic process advocated by the Association of American Medical Colleges.12,15


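Lower-SES applicants may have fewer of the life experiences that build the skills the MMI assesses. Similar reasoning has been offered for the lower MCAT scores among such applicants. Less affluent applicants may have done more paid work during postsecondary education, but those jobs probably did not involve MMI-type selection procedures. Lack of experience with MMI-type selection can be a disadvantage in the medical school MMI, given that prior experience with a particular interview format is associated with higher scores in similar interviews. Lower-level jobs are also unlikely to foster the higher-level communication, critical thinking, and problem-solving skills the MMI requires, and the time spent on such jobs may hinder the development of those skills.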

Lower SES applicants may have fewer life experiences bolstering skills assessed by the MMI. Similar reasoning has been suggested to explain the lower MCAT scores among such applicants.45 Although less affluent applicants are more likely to report paid employment during postsecondary education, their financial circumstances may require taking jobs that do not require MMI-type preemployment screening. Lack of prior experience with MMI-type screening may be a disadvantage in the medical school MMI because prior experience with a particular interview format is associated with better future performance with that format.46 Lower-level jobs also may not facilitate the higher-level communication, critical thinking, and problem-solving skills the MMI assesses, and the time required for such jobs may limit participation in pursuits that build such skills (e.g., scholarly presentations, volunteer clinic work).


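Prior work indicates that use of language unfamiliar to the rater can bias ratings downward. Applicants' verbal skills shape interviewers' immediate impressions and may thereby affect final appraisals. SES-based disparities in the physician workforce have received less attention than racial/ethnic disparities. It is therefore less likely that raters consciously tried to bias their ratings in favor of lower-SES applicants. 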

Prior work indicates that applicant factors such as use of language unfamiliar to the typical rater could trigger a biased low rating.20,21 Applicants’ verbal skills have been shown to determine immediate interviewer impressions and, in turn, final appraisals.49 The issue of SES-based physician workforce disparities has received less attention than race/ethnicity-based disparities.6 Thus, it is less likely that raters consciously biased their evaluations in favor of lower SES applicants to address SES-based physician workforce disparities.


34 Advanced Psychometrics for Transitions Inc. Welcome to ProFitHR. http://www.profithr.com/. Accessed April 4, 2015.




















 2015 Dec;90(12):1667-74. doi: 10.1097/ACM.0000000000000766.

How Medical School Applicant Race, Ethnicity, and Socioeconomic Status Relate to Multiple Mini-Interview-Based Admissions Outcomes: Findings From One Medical School.

Author information

  • 1A. Jerant is professor, Department of Family and Community Medicine, Center for Healthcare Policy and Research, University of California, Davis, School of Medicine, Sacramento, California. T. Fancher is associate professor, Division of General Internal Medicine, Department of Internal Medicine, University of California, Davis, School of Medicine, Sacramento, California. J.J. Fenton is associate professor, Department of Family and Community Medicine, Center for Healthcare Policy and Research, University of California, Davis, School of Medicine, Sacramento, California. K. Fiscella is professor, Department of Family Medicine, University of Rochester School of Medicine and Dentistry, Rochester, New York. F. Sousa is assistant dean, Admissions and Student Development, and volunteer clinical professor, Department of Internal Medicine, University of California, Davis, School of Medicine, Sacramento, California. P. Franks is professor, Department of Family and Community Medicine, Center for Healthcare Policy and Research, University of California, Davis, School of Medicine, Sacramento, California. M. Henderson is associate dean, Admissions and Outreach, and professor, Division of General Medicine, Department of Internal Medicine, University of California, Davis, School of Medicine, Sacramento, California.

Abstract

PURPOSE:

To examine associations of medical school applicant underrepresented minority (URM) status and socioeconomic status (SES) with Multiple Mini-Interview (MMI) invitation and performance and acceptance recommendation.

METHOD:

The authors conducted a correlational study of applicants submitting secondary applications to the University of California, Davis, School of Medicine, 2011-2013. URM applicants were black, Southeast Asian, Native American, Pacific Islander, and/or Hispanic. SES from eight application variables was modeled (0-1 score, higher score = lower SES). Regression analyses examined associations of URM status and SES with MMI invitation (yes/no), MMI score (mean of 10 station ratings, range 0-3), and admission committee recommendation (accept versus not), adjusting for age, sex, and academic performance.

RESULTS:

Of 7,964 secondary-application applicants, 19.7% were URM and 15.1% self-designated disadvantaged; 1,420 (17.8%) participated in the MMI and were evaluated for acceptance. URM status was not associated with MMI invitation (OR 1.14; 95% CI 0.98 to 1.33), MMI score (0.00-point difference, CI -0.08 to 0.08), or acceptance recommendation (OR 1.08; CI 0.69 to 1.68). Lower SES applicants were more likely to be invited to an MMI (OR 5.95; CI 4.76 to 7.44) and recommended for acceptance (OR 3.28; CI 1.79 to 6.00), but had lower MMI scores (-0.12 points, CI -0.23 to -0.01).
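One way to read these results is to check whether each 95% confidence interval excludes its null value: 1 for an odds ratio, 0 for a score difference. The short sketch below applies that reading to the intervals quoted above. The helper `excludes_null` is a hypothetical name, and this is a reading aid rather than the authors' analysis.

```python
def excludes_null(ci_low: float, ci_high: float, null_value: float) -> bool:
    """True when the interval does not contain the null value."""
    return not (ci_low <= null_value <= ci_high)


# (label, CI low, CI high, null value) taken from the abstract's results.
reported_intervals = [
    ("URM: MMI invitation (OR)", 0.98, 1.33, 1.0),
    ("URM: MMI score (difference)", -0.08, 0.08, 0.0),
    ("URM: acceptance (OR)", 0.69, 1.68, 1.0),
    ("Lower SES: MMI invitation (OR)", 4.76, 7.44, 1.0),
    ("Lower SES: acceptance (OR)", 1.79, 6.00, 1.0),
    ("Lower SES: MMI score (difference)", -0.23, -0.01, 0.0),
]

for label, low, high, null in reported_intervals:
    verdict = "association" if excludes_null(low, high, null) else "no clear association"
    print(f"{label}: {verdict}")
```

All three URM intervals contain their null value, while all three lower-SES intervals exclude it, which matches the abstract's wording.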

CONCLUSIONS:

MMI-based admissions did not disfavor URM applicants. Lower SES applicants had lower MMI scores but were more likely to be invited to an MMI and recommended for acceptance. Multischool collaborations should examine how MMI-based admissions affect URM and lower SES applicants.

PMID: 26017355 [PubMed - in process]


The relationship between interviewer characteristics and ratings in the MMI (Acad Med, 2004)

The Relationship between Interviewers’ Characteristics and Ratings Assigned during a Multiple Mini-Interview

Kevin W. Eva, PhD, Harold I. Reiter, MD, MSc, Jack Rosenfeld, PhD, and Geoffrey R. Norman, PhD






The MMI provides a reliable estimate of candidates' performance, but attention must be paid to the biases that may arise from the different vantage points of heterogeneous raters.

This Multiple Mini-Interview (MMI) has been shown to provide a reliable estimate of candidates’ performance,1 but the new protocol demands that attention be paid to the biases that might arise as a result of the different vantage points held by heterogeneous raters.



Background


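The problem is content specificity. In selection decisions, as Albanese et al. note, "one is most interested in stable qualities that have a high probability of occurrence in an almost infinite number of different situations." Although it is debated whether such "stable qualities" exist, it has become clear in many contexts that average performance across many encounters is a more generalizable indication of an individual's qualities than performance in any single encounter.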

The problem is one of content specificity. In making selection decisions, as indicated by Albanese et al. “one is most interested in stable qualities that have a high probability of occurrence in an almost infinite number of different situations.”2,p.317 Although debate exists regarding whether such “stable qualities” exist, it has become clear in various contexts that the average performance an individual displays over the course of many encounters is a more generalizable indication of that individual’s qualities than is any single encounter.5


MMI

The Multiple Mini-Interview


Although the MMI is essentially an OSCE used for admissions, we changed the name because the judgments are not objective and the stations are intentionally nonclinical.

Although essentially an admissions OSCE, we have opted to change the name of the protocol to make explicit the facts that the judgments are not objective and the stations are intentionally nonclinical.


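This process should be informed by the educational philosophy of the institution in which the admissions committee works and, more broadly, by documents that describe the key competencies of practicing physicians. A process for doing so has been developed by Reiter and Eva.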

This process should be informed by the educational philosophy adopted by the institution in which the admissions committee works as well as broader documents that outline the key competencies of practicing physicians.6,7 A process for doing so has been developed by Reiter and Eva.8


기존의 연구를 살펴보면, MMI는 지원자의 역량에 대한 신뢰도높은 평가를 가능하게 해준다. 전반적인 검사의 신뢰도는 스테이션당 평가자보다 스테이션의 숫자를 늘릴 때 더 향상되며, 지원자와 평가자 모두에게 긍정적인 평가를 받는다. 그러나 아직 남겨진 질문은 교수와 비-교수 사이에 평가가 서로 다른가 하는 것이다. McMaster에서 다양성(heterogeneity)는 언제나 근본적인 원칙이었는데, 왜냐하면 학생들의 경험의 폭을 넓혀주는 것이 학업적 경험을 더 풍요롭게 해준다고 믿기 때문이다. 학생들의 다양성을 최대화하기 위하여 면접관들은 다양한 인구집단에서 선발되어왔는데, 여기에는 교수, 학생, 지역사회인사 등이 다 포함된다. 우리가 한 스테이션당 한 명의 면접관을 배치하기 때문에, 교수와 지역사회인사의 평가향상이 서로 일치하는가를 보는 것이 중요하다.

Previous research has shown that the MMI provides a reliable assessment of candidates’ abilities, that the overall test reliability improves to a greater extent by maximizing the number of stations rather than by maximizing the number of observers per station, and that the MMI is viewed positively by both candidates and examiners alike.1 Remaining unanswered, however, is the question of whether faculty members and nonfaculty members are distinguishable by their ratings. At McMaster, heterogeneity has always been a fundamental principle because it is believed that breadth of experiences across students enriches the scholastic experience.9 To try to maximize heterogeneity across students, interviewers have traditionally been drawn from various populations, including faculty members, medical students, and individuals from the community at large. As we propose assigning a single interviewer to each station, the question of whether faculty members and individuals from the community assign performance ratings consistent with one another becomes an increasingly important question.
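The claim that adding stations helps reliability more than adding observers per station can be illustrated with a decision-study style calculation from Generalizability Theory. The sketch below assumes a design in which raters are nested within stations; the function name and all variance components are hypothetical values chosen for illustration and are not taken from the paper.

```python
def g_coefficient(var_candidate, var_cand_station, var_residual,
                  n_stations, n_raters):
    """Projected G coefficient for candidates x (raters nested in stations).

    var_candidate    : variance due to true candidate differences
    var_cand_station : candidate-by-station interaction (context specificity)
    var_residual     : rater-within-station and residual error
    All three inputs are hypothetical variance components.
    """
    # Station-linked error shrinks only with more stations; rater-linked
    # error shrinks with the total number of ratings collected.
    error = (var_cand_station / n_stations
             + var_residual / (n_stations * n_raters))
    return var_candidate / (var_candidate + error)


if __name__ == "__main__":
    # Same total of 18 ratings per candidate, spent two different ways.
    components = dict(var_candidate=0.2, var_cand_station=0.5, var_residual=0.3)
    print(g_coefficient(**components, n_stations=9, n_raters=2))
    print(g_coefficient(**components, n_stations=18, n_raters=1))
```

With these assumed components, spending the same 18 ratings on 18 single-rater stations gives a higher projected coefficient than 9 two-rater stations, because the candidate-by-station term is divided only by the number of stations.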



방법

METHOD


참가자

Participants


In addition, 18 health sciences faculty members and 18 community members drawn from the legal profession and human resource departments of both local businesses and the university were recruited to act as examiners. In two instances, faculty members had to withdraw; they were replaced with current medical students.


절차

Procedure


On the study weekend, three sessions were run sequentially on each of two days with a 40-minute break for the examiners between sessions. Two examiners were assigned to each station. 

    • 3개는 교수만 Three of the nine stations were staffed by two faculty members, 
    • 3개는 지역사회인사만 three by two community members, and 
    • 3개는 교수와 지역사회인사 각 1명씩 three by one member of each group. 

Before the first MMI on each day the authors of this article met with the examiners to ensure that the procedure was clear, to answer any last-minute queries, and to reinforce that the ratings should be assigned independently.



결과

RESULTS


점수

Scores

internal consistency는 높음. 총점만 사용하기로 함.

Table 1 shows the average score and standard deviation assigned to candidates for each of the four items on the evaluation form. The internal consistency (i.e., the average relationship between pairs of questions) was found to equal .96, indicating a high degree of redundancy. As a result, only the “overall performance” score was used in subsequent analyses.
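One common index of internal consistency is Cronbach's alpha. The source describes its statistic only as "the average relationship between pairs of questions" and does not name a formula, so using alpha here is an assumption. The following minimal sketch takes one row of item ratings per candidate; the function name and any input data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows, one row of k item ratings per candidate."""
    k = len(scores[0])
    # Variance of each item across candidates.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the summed score across candidates.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

When the four items move together almost perfectly, the summed-score variance dominates the item variances and alpha approaches 1, which is the sense in which a value such as .96 signals redundancy among the items.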



To determine whether the ratings faculty members assigned were biased relative to those community members assigned, a repeated measures ANOVA was performed on the data collected within the three stations that were staffed by both a community and a faculty member. The mean score assigned by faculty members (4.66) bordered on being significantly less than that assigned by community members (4.96; F(1,53) = 3.972, mean squared error = 1.790, p < .06).
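For a within-subject factor with only two levels (faculty rater versus community rater), a repeated measures ANOVA is equivalent to a paired t test, with F = t² on (1, n − 1) degrees of freedom; 54 candidates give the 53 error degrees of freedom reported above. The sketch below assumes one faculty score and one community score per candidate (for example, averaged over the three mixed stations, which is an assumption about the analysis rather than something the source states). The function name is hypothetical.

```python
from statistics import mean, stdev


def paired_f(faculty_scores, community_scores):
    """F statistic and degrees of freedom for a two-level repeated measures comparison."""
    diffs = [f - c for f, c in zip(faculty_scores, community_scores)]
    n = len(diffs)
    # Paired t statistic: mean difference over its standard error.
    t = mean(diffs) / (stdev(diffs) / n ** 0.5)
    return t * t, (1, n - 1)
```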




신뢰도 분석

Reliability Analysis



평가자의 특성과 평가 점수와의 관계

The Relationship between Interviewers’ Characteristics and Ratings


두 명의 지역사회인사가 들어간 경우 일반화가능도는 가장 높은 경우 0.58정도였다. 두 명의 교수가 들어간 곳에서는 0.46, 한 명의 교수와 한 명의 지역사회인사가 들어간 경우는 0.31이었다. 각각 짝을 지어 보았을 때 그 차이는 통계적으로 유의했다.

The generalizability for the three stations that were staffed by two community members was highest at .58. The three stations that were staffed by two faculty members revealed the second highest generalizability (.46). Least reliable were the three stations that were staffed by one member of each group (generalizability = .31). Each pairwise difference is statistically significant: .58 versus .46, z(106) = 2.78, p < .05; .46 versus .31, z(106) = 3.12, p < .05; .58 versus .31, z(106) = 5.90, p < .05.


어떤 경우든 MMI의 일반화가능도는 각각 1명씩 들어간 경우 가장 낮았고, 둘 간에 larger inconsistency가 있음을 의미한다.

In either case, the generalizability of the MMI appears to be lowest among stations evaluated by one community member and one faculty member, suggesting that there are larger inconsistencies in the way that community members rate candidates relative to the way that faculty members rate candidates than there are within either group of raters.



Post-MMI Surveys








DISCUSSION


면접이 지원자의 성격을 안정적이고 일반화가능한 수준으로 측정하기 위해서 평가자간 신뢰도를 보여주는 것 만으로는 충분한 근거가 되지 않음을 보여준다. 반면, 지원자가 이 면접과 저 면접 사이에 예측불가능한 형태로 엄청난 차이를 보여준다는 것을 제시한다. 그 결과 한 면접에서의 결과는 다음 면접에서의 결과를 거의 예측해주지 못한다.

These findings suggest that the demonstration of adequate interrater reliability, which has been used in the past as an argument for standardized interviews, is insufficient evidence to ensure that an interview is measuring stable and generalizable applicant characteristics. By contrast, the findings suggest that applicants will vary considerably, in unpredictable fashion, from one interview to another. Consequently, the scores derived from any one interview will be a poor predictor of performance in a second interview.


적어도 이 결과는 Ferrier 등이 주장한 '다양한 평가자가 더 다양한 학생군을 만든다'라는 것을 지지한다. 교수와 지역사회인사가 준 평균점수의 차이는 더 많은 평가자 훈련을 통해서 극복가능하겠지만, 점수 차이의 절대값은 각 그룹에 속한 평가자가 동등한 비율로 있다면 문제가 되지는 않을 것이다. 

At the very least these results support Ferrier et al.’s9 claim that using heterogeneous raters may result in a more heterogeneous class. The difference we observed in the mean scores faculty and community raters provide may be overcome with further training, but the absolute difference in scores will not matter as long as all circuits contain an equal proportion of examiners from each group. It should be noted that the distinction drawn in this study between raters of different backgrounds is very broad.


MMI의 또 다른 장점은 Edward 등이 밝힌 네 가지 입학면접의 목적을 (굳이 한 차례의 면접에 뒤섞지 않고서도) 달성할 수 있다는 것이다. (정보 수집, 의사 결정, 확인, 모집) 또한 전통적인 면접에서 지적된 시간의 비효율적 사용 문제도 극복할 수 있다.

Additional advantages to the MMI include the potential to achieve the four purposes of admissions interviews identified by Edwards et al.4 (i.e., information gathering, decision making, verification, and recruitment) without confounding these purposes within a single interview (e.g., one station could be designed as a recruitment station without the goal of attracting the best candidates affecting the rest of the interview process). The MMI also corrects for the inefficient use of time that has been identified by Litton-Hawes et al.12 as a problem in more traditional interviews.


"깐깐한" 혹은 "널럴한" 면접관에게 배정될 가능성이 무작위였지만 더 많은 수의 평가자에 의해 평가되면 이 효과는 사라질 것이다.

Similarly, any chance effects of being randomly assigned to an “easy” or “hard” panel of interviewers will be diluted with the MMI as candidates are exposed to a greater number of examiners.


왜 지역사회인사의 평가가 교수들의 평가보다 더 less consistent 할까?

Of further interest is the finding that community members’ ratings were less consistent with those provided by faculty members than were the ratings provided within either group.




8. Reiter HI, Eva KW. Reflecting the relative values of community, faculty, and students in the admissions tools of medical school. Submitted manuscript.


Background: In defining the characteristics of medical students that society and the medical profession find desirable, little effort has been spent assessing the relative value of the dozens of characteristics that have been identified. Furthermore, many institutions go to great lengths to ensure equal representation across stakeholder groups in an effort to maximize the heterogeneity of the pool of students accepted to study medicine; however, the extent to which different stakeholders value different characteristics has yet to be determined. 


Purpose: This study was an attempt to assess the relative value of the characteristics of medical students that society and the medical profession find desirable. 


Methods: Using documents created internationally to identify the core competencies of medical personnel, a series of 7 characteristics were generated for inclusion in a study that adopted the paired comparison technique. Of 347 surveyed, 292 respondents indicated the rank ordering they would assign to each characteristic by circling the more important characteristic in all possible pairings. 


Results: Overwhelmingly, “ethical” was deemed to be the most important characteristic on which selection tools should be based. Surprisingly, the pattern of responses was highly consistent regardless of stakeholder group and degree of affiliation with the undergraduate medical program. 


Conclusions: The generalizable features of this study not only include the empirical findings but also demonstrate useful survey protocol that can be adapted by any admission committee to guide the generation of an institution-specific admissions blueprint. A novel protocol that provides the necessary flexibility is discussed.














 2004 Jun;79(6):602-9.

The relationship between interviewers' characteristics and ratings assigned during a multiple mini-interview.

Author information

  • 1Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada. evakw@mcmaster.ca

Abstract

PURPOSE:

To assess the consistency of ratings assigned by health sciences faculty members relative to community members during an innovative admissions protocol called the Multiple Mini-Interview (MMI).

METHOD:

A nine-station MMI was created and 54 candidates to an undergraduate MD program participated in the exercise in Spring 2003. Three stations were staffed with a pair of faculty members, three with a pair of community members, and three with one member of each group. Raters completed a four-item evaluation form. All participants completed post-MMI questionnaires. Generalizability Theory was used to examine the consistency of the ratings provided within each of these three subgroups.

RESULTS:

The overall test reliability was found to be .78 and a Decision Study suggested that admissions committees should distribute their resources by increasing the number of interviews to which candidates are exposed rather than increasing the number of interviewers within each interview. Divergence of ratings was greater within the pairing of community member to faculty member and least for pairings of community members. Participants responded positively to the MMI.

CONCLUSION:

The MMI provides a reliable protocol for assessing the personal qualities of candidates by accounting for context specificity with a multiple sampling approach. Increasing the heterogeneity of interviewers may increase the heterogeneity of the accepted group of candidates. Further work will determine the extent to which different groups of raters provide equally valid (albeit different) judgments.

PMID: 15165983


MMI 시험 특성: 지원자에게 상상하길 요구하기보다는 회상하길 요구하라(Med Educ, 2014)

Multiple mini-interview test characteristics: ‘tis better to ask candidates to recall than to imagine

Kevin W Eva1 & Catherine Macala2





MMI는 그 정의상 일련의 독립적 관찰을 통해 지원자에 대한 정보를 얻으며(대개 인터뷰의 형태로), 선발을 하는 주체가 되는 기관의 목적이나 이상(desires), 그리고 선발된 학생이 장차 될 전문직의 특성을 바탕으로 blueprint를 만든다. 따라서, MMI는 어떤 평가의 도구나 수단이라기보다는 평가의 프로세스로 봐야 한다. 따라서 "MMI는 무엇을 위해서 하는것인가?"라는 질문은 무의미하며, implementation에 따라서 완전히 달라질 수 있기 때문이다.

By definition, it involves collecting (and aggregating across) a series of brief independent observations of the candidate (typically in the form of interviews), preferably blueprinted against the goals and desires of both the institution making the selection and the profession to which the candidate is applying. As a result, the MMI should be considered a process of assessment rather than a tool or instrument, and generic questions such as ‘For what does the MMI select?’ are meaningless because the answer is entirely dependent on implementation.


MCQ를 가지고 다양한 내용을 대표하는 시험을 만들 수 있는 것처럼, 매우 다양한 스테이션들로 MMI를 구성할 수 있다.

Just as one can populate a multiple-choice question (MCQ) examination with questions representative of diverse content areas, one can populate an MMI with highly variable stations.


기존 연구를 살펴보면 일반적인 원칙들을 발견할 수 있다. 신뢰도에 대해서는 관찰의 횟수를 증가시키면 신뢰도가 증가하는데, 10~12개 스테이션에서 plateau에 도달하며, 스테이션당 시간을 늘리는 것의 장점은 별로 없고, 각 상황마다 평가자의 수를 늘리는 것보다는 여러 개의 독립적 상황에 대한 수행능력을 관찰하는 것이 더 효과가 좋다.

Research has identified general principles, including that the reliability of measurement improves with increasing number of observations, often reaching a plateau in the 10–12 range,2 that extending the length of the interactions has little discernible benefit,3 and that observing performance across independent situations has a greater beneficial impact on the reliability of measurement than does incorporating the opinions of multiple raters within each situation.4,5
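The diminishing return from adding observations can be sketched with the Spearman-Brown prophecy formula, which projects the reliability of an average over n parallel observations from the reliability of a single one. Whether the cited studies used this formula is not stated in the source, and the single-station reliability of 0.2 used in the usage example is a hypothetical value.

```python
def spearman_brown(single_reliability, n):
    """Projected reliability of the mean of n parallel observations."""
    return n * single_reliability / (1 + (n - 1) * single_reliability)


if __name__ == "__main__":
    # Hypothetical single-station reliability; print the projected curve.
    for n in (1, 2, 4, 8, 10, 12, 16, 20):
        print(n, round(spearman_brown(0.2, n), 3))
```

Under this assumption, each added station contributes less than the previous one, so the curve flattens as the number of stations grows, which is consistent with the plateau described above.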



배경

Background


MMI 프로세스는 크게 두 가지에 토대를 둔다. Sampling과 Structure

The MMI process was largely designed on two foundations: sampling and structure.


Sampling이 중요하다는 것은 인간 행동에 대한 trait-based model에 대한 우려로부터 출발했다. 사람을 묘사하는데 쓰이는 단어(똑똑한, 달변의, 전문적인)는 변하지 않는 특성인 것처럼 묘사하지만, 실제 행동을 보면 매우 맥락-특이적이다.

The priority placed on sampling is drawn from empirically derived concerns about trait-based models of human behaviour.6 Whereas the adjectives we use to describe people (e.g. ‘smart’, ‘eloquent’, ‘professional’) imply unwavering features of the individual, behaviour has been shown repeatedly to be context-specific.7


한 가지 임상상황에 대한 단일한 관찰결과가 의미하는 바는 한 사람의 지식에 대해서 한 문항의 MCQ가 말해주는 것과 다를 바가 없다.

One observation tells us no more about an individual’s clinical prowess than one MCQ answer tells us about the extent of an individual’s knowledge base.


8분짜리 면접이 지원자의 능력에 대해 충분히 모든 측면을 보여주지 않는다는 주장과 달리, 우리는 이것을 logistic한 필요에 따른 (약점이 아니라) 강점이라고 본다. 여러 연구를 보면 더 긴 면접시간의 가치는 그저 환상일 뿐이며, 이는 지원자에 대한 면접관의 인상은 매우 빠른 시간내에 형성되기 때문이다. 더 나아가서 시간이 더 많을 경우 지원자가 애초에 면접에서 의도한 방향과 다른 방향으로 비틀어버릴 기회를 준다.

Contrary to the argument that 8-minute selection interviews do not allow sufficient time to yield a full perspective on a candidate’s ability, we view this logistic necessity as a strength rather than a liability. A variety of studies have demonstrated that the added value of longer interviews is illusory as examiners tend to form impressions very quickly.9,10 Further, more time yields greater opportunity for the applicant to sway the conversation to issues that are distinct from the intended focus of the interview.11


9 Ambady N, Bernieri F, Richeson J. Toward a histology of social behaviour: judgmental accuracy from thin slices of the behavioural stream. Adv Exp Soc Psychol 2000;32:201–72.

10 Ambady N, Rosenthal R. Thin slices of expressive behaviour as predictors of interpersonal consequences: a meta-analysis. Psychol Bull 1992;111:256–74.


두 번째 토대인 Structure의 가치는 조금 덜 명확하다. MMI가 처음 만들어졌을 때, panel-based 면접은 면접자간 신뢰도 차이가 크지만 면접이 구조화되면(구체적인 문항을 주면) 더 나아진다고 했다. 비록 직관적으로는 그럴 듯 하지만, 최근의 연구 결과를 보면 이 가정에 대한 의문을 갖게 한다. Kreiter 등은 기존 연구는 간접적 비교만 한다고 지적했다. 다섯 개의 구조화된 질문으로 구성된 25분짜리 의과대학입학면접으로부터 일반화가능도 분석을 통해서 '질문'에 기인하는 variance가 무시할 만한 정도라고 밝혔다. 이로부터 저자들은 다수의 질문을 통해서(즉 sampling을 늘려서) 문항 간 난이도에서 오는 차이를 상쇄시킬 수 있기에, 문항의 구조화에서 얻을 수 있는 장점이 없다는 결론에 이르렀다. 몇 년 후, 같은 기관의 면접에서 비구조화 요소가 구조화 면접에 추가되었고, Axelson은 그 결과로부터 구조화 요소보다 평가자간, 평가-재평가 신뢰도가 높다고 보고했다. 결론은 모호하다.

The value of the second foundation, structure, however, has become less clear over time. When the MMI was created, the literature on panel-based interviewing practices revealed that the inter-rater reliability of such exercises was highly variable, but tended to be greater when interviews were structured by giving interviewers a specific set of questions.16 This remains intuitively appealing, but recent research has led us to question this assumption. Kreiter et al.17 critiqued the literature for offering only indirect comparisons. Using data collected from a set of 25-minute medical school selection interviews containing five structured questions, they used generalisability analyses to illustrate that the variance attributable to question had a negligible influence on the reliability observed. These findings led the authors to argue that asking multiple questions (i.e. increased sampling) washes out differences in difficulty level across questions such that structuring questions offers no advantage. A few years later, an unstructured component was added to the end of the structured interview at the same institution, and Axelson et al.18 reported that resulting scores had greater inter-rater and test–retest reliability than the structured component. As the authors noted, it is unclear whether the performance of the unstructured interview derived from the fact that it followed the structured interview or whether the benefit of such structuring is illusory.


구조화 스테이션을 만드는 것은 MMI 프로세스 도입에 가장 큰 장애라는 점에서 이 질문은 대단히 중요하다. 시험 보안에 관한 우려가 많은 대학으로 하여금 (그것을 예방하고자) 스테이션의 데이터베이스를 구축하거나 구입하게 만들었다(비록 시험 보안 위반에 대한 영향력은 확실하지 않더라도). 만약 MMI의 장점이 구조화와 무관하다는 결론에 이른다면, 즉, 주로 sampling의 효과만 있다면, MMI를 도입하는 비용이 크게 절감될 것이다.

This is an important question because the creation of structured stations is one of the primary barriers to adoption of the MMI process.19 Concern about test security breaches derived from the repeated use of set questions has led most institutions we have encountered to generate or purchase a database of stations to reduce this risk (although the impact of such breaches remains questionable20,21). If the benefits that have been observed to accrue from the adoption of MMI practices are unrelated to structure and, instead, are derived dominantly from the sampling it promotes, then the cost inherent in creating an MMI might be substantially reduced.


MMI에서 가장 흔한 타입의 스테이션은 어떤 이슈와 관련하여 면접관과 토론하게 하는 것인데, 이 때 '관련성'의 정의는 그 기관이 만든 blueprint에 달려있으며, 공개되어있는 예시들을 보면 주로 지원자가 경험하게 될 상황과 관련된 딜레마를 제시하는 경우가 많다. 조직/산업 관련 심리연구 문헌을 보면 그러한 면접 대화는 경험-기반(과거 경험을 떠올리게 하기) 이거나 상황-기반(맞닥뜨릴 상황을 상상하게 하기)이다. 어떤 종류의 면접이 더 효과적인지에 대해서 많은 논란이 있었다. 

The most common type of MMI station involves asking a candidate to discuss an issue of relevance with an examiner. The definition of ‘relevance’ depends on the blueprint the institution establishes, but published examples indicate a tendency towards describing a dilemma about which the candidate is expected to engage in dialogue. The organisational and industrial psychology literature defines such dialogues as generally being ‘experience-based’ (i.e. candidates are required to recall their particular experiences and the behaviours they demonstrated) or ‘situation-based’ (i.e. candidates are required to imagine and describe what they would do if they were to encounter a particular situation).22 There has been considerable debate in this literature regarding which type of interview is most effective.


상황-기반 면접을 선호하는 사람들은 면접이 미래지향적으로 이뤄져야 하며, 과거에 유사한 경험이 없던 지원자라도 주어진 상황에서 자신의 인적특성을 보여줄 기회가 있어야 한다고 주장한다. 

반면 경험-기반 면접을 선호하는 사람들은 과거의 행동이 미래 행동의 가장 정확한 예측인자라고 주장하며, 가상적 상황을 지양하고 과거의 경험에 초점을 둬야 한다고 말한다. 


인상-관리(자기가 어떻게 보이는지를 관리하는 것)이 면접 상황에 따라서 서로 다르게 나타나는데, 상황-기반 면접에서는 환심을 사려는 방향(호감을 유발하고 의견을 동조하게 하는) 으로 나타나며, 경험-기반 면접에서는 자기-홍보 (자신의 성공이 다른 요인보다 스스로의 능력 덕분이다)가 주로 나타난다.

Those who favour situation-based interviewing argue that structure is important and that interviews should be future-oriented so that interviewees without previous experience in a given context are granted the opportunity to demonstrate their personal qualities; those who favour experience-based interviewing argue that past behaviour is most predictive of future behaviour and, as a result, one should avoid discussion of the hypothetical and focus on previous experience.20 Impression management (i.e. attempts to control the image one projects) appears to take place in different ways according to interview type, with situation-based interviews tending to induce ingratiating tactics (i.e. behaviours aimed at inducing liking, such as opinion conformity) and experience-based interviewing tending to induce self-promotion (i.e. behaviours aimed at indicating that one’s success is attributable to competence rather than other factors).11



참가자 Participants


4개 서킷, 12개 스테이션, 48명 평가자

Four distinct circuits of 12 stations required the participation of 48 examiners.



문항 Materials


모든 스테이션은 CanMEDS 프레임워크에 기반. 

All stations were focused upon the Professional role promoted within the CanMEDS framework presented by the Royal College of Physicians and Surgeons of Canada.25 


네 개의 SJ스테이션은 이후 training기간 동안 발생할 수 있는 상황에 대해서 그 상황을 상상하고 어떻게 할지를 물었음.

Four SJ stations were designed around this role, the operational definition being that the station had to present a situation that could plausibly occur during medical training and would require the candidate to imagine and discuss what he or she would do in that situation.


4명의 평가자, 문 앞에 설명, 스테이션 목적에 관한 한 쪽 짜리 설명, 스테이션당 6개까지 문항. 대화를 진행할 것(스크립트처럼 질문만 하지 말고) 질문은 대화를 하는데 도움을 주는 정도. CanMEDS에 대한 설명. 평가지. 6점척도로 세 가지에 대해서 평가 (i) communication skills, (ii) reasoning ability, and (iii) professionalism. 

This information was provided to the four examiners who were assigned to that station (one per circuit) and posted on the doors of their rooms for candidates to read. In addition, examiners were given one page of information outlining both the intent of the station and a list of up to six questions they could ask the candidate. They were told that they should engage in actual dialogue with candidates rather than treating the list of questions as a script (i.e. the questions were presented simply as prompts that examiners might find useful if conversation stalled). Examiners were also given a page of background information outlining aspects of the CanMEDS competencies that were relevant to the situation described, along with a copy of the scoresheet on which they were to offer their assessment. None of the background information or prompting questions contained content that was specific to the instructions given to candidates and thus the same information could be given to examiners in other experimental conditions. The scoresheet consisted of a series of 6-point scales (1 = weak, 2 = below average, 3 = average, 4 = very good, 5 = excellent, 6 = exceptional) on which examiners were asked to rate each candidate’s (i) communication skills, (ii) reasoning ability, and (iii) professionalism. Brief definitions were provided for each quality.


네 개의 BI 스테이션을 위해서 SJ 스테이션을 약간 modify함. 

To generate the four BI stations, each of the SJ stations was modified so that the candidate was instructed to think of a time in which he or she had experienced a situation analogous to the scenario presented in the SJ station.


다른 정보는 SJ 스테이션과 동일

All other information provided to the examiners on these stations was identical to that provided to the SJ station interviewers with the exception of minor wording revisions to ensure that the grammar remained appropriate.


FF스테이션에 대해서는 지원자의 적합성을 평가할 수 있는 대화를 하라고 함. 

To generate the four FF stations, examiners were told simply that we wanted them to conduct a conversation that would help them evaluate the candidate’s suitability for the Professional role. They were given the same background information as used in other stations, but the prompting questions were removed. The station instruction, as presented to candidates, said simply:



절차 

Procedure


지원자는 무작위 배정 

Candidates were randomly assigned to a circuit and a starting station.
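A random assignment of this kind can be sketched as drawing, without replacement, from the available (circuit, starting station) slots. Four circuits of 12 stations give 48 slots, enough for the 41 candidates reported in the abstract. The function name, the candidate labels, and the rule that no two candidates share a starting slot are assumptions made for illustration; the source does not describe the assignment mechanism.

```python
import random


def assign_candidates(candidates, n_circuits=4, n_stations=12, seed=None):
    """Map each candidate to a distinct (circuit, starting_station) pair at random."""
    slots = [(circuit, station)
             for circuit in range(1, n_circuits + 1)
             for station in range(1, n_stations + 1)]
    if len(candidates) > len(slots):
        raise ValueError("more candidates than available starting slots")
    rng = random.Random(seed)  # seeded only so the draw can be reproduced
    rng.shuffle(slots)
    # zip stops at the shorter list, leaving any surplus slots unused.
    return dict(zip(candidates, slots))
```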


2분 지시문 숙지, 7분 후 종료, 옆 방 이동. 스테이션 간 3분이 있어서 1분은 지원자 설문 작성, 2분은 다음 스테이션 지시문 숙지

At the start of the MMI, candidates were given 2 minutes to read the first station, after which a buzzer was sounded to alert them to enter the interviewing rooms. Seven minutes later, another buzzer was sounded to indicate that the interview was complete and that the candidate should move to the next station. From this point onward, a pause of 3 minutes was provided between stations and candidates were asked to spend 1 minute completing a candidate survey about the preceding station and 2 minutes reading and preparing for the next station.



분석 

Analysis


맥락-특이성은 Applicant x Station 상호작용에 의해서 나타난다. 연구 디자인 상 평가자의 영향을 분리해내기 어렵게 만들며 따라서 순수한 맥락-특이성은 불가능하다. 이러한 연구 설계는 세 가지 이유에 근거한다.

Context specificity is generally indicated by a large Applicant X Station interaction. The design of this study did not allow us the capacity to separate rater influences from station influences and therefore a pure test of context specificity is not available. This design decision was based on three reasons: 

  • 평가자 효과는 모든 실험조건에서 나타난다.
    (i) rater effects are likely to be present in all experimental conditions; 
  • 한 스테이션에 한 명의 평가자를 두는 것은 MMI나 OSCE에서 흔한 일이다. 
    (ii) the inclusion of one examiner per station is common practice in MMIs, objective structured clinical examinations (OSCEs) and other comparable assessment activities, and 
  • 기존 연구들을 보면 평가자의 variance는 station variance에 비해서 기여하는 바가 작다.
    (iii) previous work has robustly indicated that rater variance tends to contribute little error relative to station variance.4,5



RESULTS


신뢰도 Reliability


Applicant x Station error가 가장 컸고, 그 다음은 Residual error, 그 다음은 Applicant 였다.

Table 1 reveals that the dominant source of variance in all cases was the Applicant X Station interaction. The residual error (Item X Station X Applicant [Circuit]) was next most dominant, followed by Applicant differences, which accounted for 10.0–18.7% of the variance.


Applicant에 따른 variance는 BI > SJ > FF 순이었는데, 이는 BI 스테이션이 지원자간 변별에 가장 뛰어남을 보여준다. Station, Item, Circuit의 main effect와 그것들의 상호작용은 무시할만한 수준이었음. 

The variance attributable to Applicant declined from BI to SJ and then to FF stations, suggesting that BI stations offered better capacity to consistently discriminate between applicants relative to the other forms of interview. The main effects of Station, Item and Circuit, and their interactions, were negligible, generally contributing < 3% of the variance in scores.


스테이션간 신뢰도는 스테이션간 평가 결과가 일관되는가에 대한 것으로, BI가 가장 우수하다.

Inter-station reliability, reflecting the extent to which the scores assigned are consistent across stations, suggested that BI stations allowed better measurement than SJ or FF stations.




실제 MMI 결과와의 비교

Relationship to the actual admission MMI

SJ, r = 0.45; BI, r = 0.57, and FF, r = 0.42.

The correlations between the average of the four stations within each station type and the average of the 9-station MMI used for the actual admission decision were: SJ, r = 0.45; BI, r = 0.57, and FF, r = 0.42.



수용가능성

Acceptability


지원자에서 지원자들이 FF가 더 어렵고, 더 긴장을 느낌

In general, candidates considered the FF stations to be more challenging and more anxiety-provoking than either the SJ or BI stations (Table 4). 


평가자의 관점은 유형간 큰 차이가 없었음.

In general, examiners’ perceptions of their ability to assess candidate performance and the amount of strain MMI stations placed on candidates were insensitive to station type, although BI stations were rated relatively low on one question (Table 5).



결론

DISCUSSION


평가프로세스의 질을 평가하기 위한 도구의 다양한 측면이 잘 align 되어있지 않아(신뢰도를 높이면 활용가능도가 떨어짐), 적절한 협상을 하게 된다. 우리는 다양한 결과가 internally 그리고 validity study에 대해서 일관된 결과를 낸다는 것에 놀랐다. 다양한 관찰을 모으는 것 만으로도 중등도의 신뢰도는 도달할 수 있지만(FF 에서 G=0.66), 스테이션을 구조화하는 것은 acceptability는 물론 신뢰도에 있어서도 이득이 있었다. 다만 신뢰도에 대해서는 BI에 대해서만 이득이 있었다. SJ가 신뢰도 측면에서 BI와 같다고 하더라도, feasibility (만들기 쉬움)과 동등한 수용가능성을 고려하면 BI를 쓰는 것이 낫다.

Given that the various aspects of utility used to assess the quality of assessment processes commonly do not align (e.g. increasing reliability tends to decrease feasibility), thereby requiring that compromises are made,14 we were surprised by the extent to which the various outcomes considered yielded consistent conclusions both internally and with respect to validity studies that have been conducted in other domains of selection. Although moderate reliability can be achieved simply by aggregating across many observations (G = 0.66 in the FF condition), there did appear to be some benefit from the structuring of stations in terms of both acceptability and reliability, the latter being true only when BI techniques were used (G = 0.77). Even if SJ stations were to be considered equal to BI stations in terms of their reliability, the greater feasibility (i.e. ease of generation) and equivalent acceptability of BI stations would support the prioritising of their use.


추측하건대, BI를 사용하면 - 자신의 경험을 성찰하게 만들고 - MMI 사용에 대한 초창기의 비판 - 지원자가 자신의 과거 자서전적 내용을 설명할 기회가 없다 - 도 극복할 수 있다.

Speculatively, the use of BI stations, which require candidates to reflect on and discuss personal experiences they have had, may also help MMI administrators to address one of the more robust early criticisms of the MMI process, which claims that candidates desire an opportunity to present autobiographical details during their interview.1







17 Kreiter CD, Solow C, Brennan RL, Yin P, Ferguson K, Huebner K. Examining the influence of using same versus different questions on the reliability of the medical school preadmission interview. Teach Learn Med 2006;18 (1):4–8.


18 Axelson R, Kreiter C, Ferguson K, Solow C, Huebner K. Medical school preadmission interviews: are structured interviews more reliable than unstructured interviews? Teach Learn Med 2010;22 (4):241–5.


20 Reiter HI, Salvatori P, Rosenfeld J, Trinh K, Eva KW. The effect of defined violations of test security on admissions outcomes using multiple mini-interviews. Med Educ 2006;40:36–42. 


21 Griffin B, Harding DW, Wilson IG, Yeomans ND. Does practice make perfect? The effect of coaching and retesting on selection tests used for admission to an Australian medical school. Med J Aust 2008;189:270–3.


23 Taylor PJ, Small B. Asking applicants what they would do versus what they did do: a meta-analytic comparison of situational and past behaviour employment interview questions. J Occup Organ Psychol 2002;75 (3):277–94.


24 Klehe U-C, Latham G. What would you do – really or ideally? Constructs underlying the behaviour description interview and the situation interview in predicting typical versus maximum performance. Hum Perform 2006;19:357–82.


















 2014 Jun;48(6):604-13. doi: 10.1111/medu.12402.

Multiple mini-interview test characteristics: 'tis better to ask candidates to recall than to imagine.

Author information

  • 1Centre for Health Education Scholarship, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada.

Abstract

CONTEXT:

The multiple mini-interview (MMI), used to facilitate the selection of applicants in health professional programmes, has been shown to be capable of generating reliable data predictive of success. It is a process rather than a single instrument and therefore its psychometric properties can be expected to vary according to the stations generated, the alignment between the stations and the qualities an institution prioritises, and the outcomes used. The purpose of this study was to explore the MMI's test characteristics when station type is manipulated.

METHODS:

A 12-station MMI was established in which four stations were presented in three different ways. These included: situational judgement (SJ) stations, in which applicants were asked to imagine what they would do in specific situations; behavioural interview (BI) stations, in which applicants were asked to recall what they did in experienced situations, and free form (FF) stations, which were unstructured in that the examiner was simply given a brief explanation of the intent of the station without further guidance on how to conduct the discussion. Four circuits of the 12 stations were run with one examiner within each station. Candidates and examiners were surveyed regarding their experience. The reliability of the scores derived from the assessment was analysed separately for each station type.

RESULTS:

A total of 41 medical school candidates participated after completing the regular admission process. Although the score assigned did not differ across station type, BI stations more reliably differentiated between candidates (g = 0.77) than did the other station types (SJ, g = 0.69; FF, g = 0.66). The correlation between actual MMI scores and BI stations was also greatest (BI, r = 0.57; SJ, r = 0.45; FF, r = 0.42). Candidates' opinions indicated that FF stations were more anxiety-provoking, less clear, and more difficult than structured stations (SJ and BI stations). Examiner opinions indicated equivalence on these measures.

CONCLUSIONS:

The results suggest that structuring stations has value, although that value was gained only through the use of BI stations, in which candidates were asked to recall and discuss a specific experience of relevance to the purpose of the interview station.

© 2014 John Wiley & Sons Ltd.

PMID: 24807436


Should MMI scores be adjusted for interviewer stringency or leniency? (Med Educ, 2010)

Should candidate scores be adjusted for interviewer stringency or leniency in the multiple mini-interview?

Chris Roberts,1 Imogene Rothnie,2 Nathan Zoanetti3 & Jim Crossley4






Theoretical framework for interviewer performance


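There are broadly three kinds of interviewer-related error: (1) stringency/leniency, (2) interviewer subjectivity (candidate-specific and question-specific), and (3) interactions.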

There are broadly three areas of interviewer-related error within the MMI,1,4,8 which are expanded upon in Fig. 1.



However, because of the complex assessment design, no set of MMI data has so far allowed the first-order effects and their second-order interactions to be estimated precisely. This is essentially because interviewers are nested within questions in large-scale interviewing plans. Current estimates put candidate-to-candidate variance at 22% to 25%. MMI question difficulty accounts for 0–3%, and among interviewer-related factors stringency/leniency accounts for 14%.

However, because of the designs inherent in complex assessment procedures,6 no set of MMI data has thus far allowed for precise estimates of each first-order effect and their second-order interactions using G theory. This is because of confounding within the naturalistic large-scale interviewing plan, in which interviewers are usually nested in MMI questions. Current estimates suggest candidate-to-candidate variance ranges from 22%4 to 25%.1 MMI question difficulty variance is in the range of 0–3%.1,4 Of the interviewer-related factors, interviewer stringency ⁄ leniency accounts for 14% of error,4


Interviewer candidate-specific subjectivity has been estimated at as much as 45% in one study.

Variance reflecting interviewer candidate-specific subjectivity has been estimated to be as high as 45% in a study of assessments which used two interviewers within each station.8


Kumar et al. have provided preliminary insight into the tensions that arise when MMI interviewers make their judgements.

Kumar et al.9 have provided some preliminary insights into the tensions that arise in the process of making such decisions. 

  • The value of independent decision making versus consensus on the standard expected of entry-level students
    These highlight, firstly, the contrast between appreciation of independent decision making and the need to achieve a consensus around the standards expected of entry-level students.
  • Whether interviewers feel they are assessing entry-level reasoning skills as opposed to communication skills
    The second source of tension concerns the extent to which interviewers may feel they are assessing entry-level reasoning skills in professionalism domains compared with communications skills.
  • How can interviewers overcome their subjectivity towards particular candidates?
    The third source relates to how interviewers overcome their subjectivity towards certain candidates and
  • How should they handle their concerns about 'failing' candidates?
    the fourth to how they handle their concerns over ‘failing’ candidates.
  • Candidates actively use their interaction with interviewers to elicit a favourable judgement of themselves, which is not necessarily related to the quality of their answers.
    Finally, candidates are actively interacting with interviewers using their impression management skills to promote a favourable decision for themselves, which is not necessarily related to the quality of their answers.9



Methodological approaches


Use of IRT

Researchers have turned to item response theory (IRT)11 to provide this opportunity.


Use of MFRM

Roberts et al.12 applied multi-faceted Rasch modelling (MFRM) to the MMI, but they focused on differences in the performance of MMI questions in an item bank rather than on differences between the interviewers themselves. However, they did note that questions appeared to be measuring a unidimensional construct, ‘entry-level reasoning skills in professionalism’, as suggested by a good fit to the IRT model.12 The consistency of judgements within and between judges and candidates has been the focus of a number of papers.13–17 IRT software such as FACETS provides easily derived estimates of candidate ability, interviewer stringency ⁄ leniency and question difficulty.


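The 'observed average score' (OAS) is based on raw scores, whereas the 'fair average score' (FAS) is the score that would be expected if the elements of all other facets were at their average values. In this setting the FAS is the score adjusted for interviewer stringency/leniency and question difficulty.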

An ‘observed average score’ is the average rating based on raw scores received by the candidate. The ‘fair average score’ is the measure that would have been observed if all the measures of the other elements on all other facets had been located at the average measure.18 In this setting, the fair average for candidates is the score that has been adjusted for interviewer stringency ⁄ leniency and question difficulty.
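
The fair average definition above can be made concrete with a small sketch. It assumes a many-facet rating scale model in which the log-odds of adjacent rating categories equal candidate ability minus interviewer stringency minus question difficulty minus a category threshold, with interviewer and question measures centred at zero. The function names, threshold values and example measures are hypothetical; they are not taken from the study or from the FACETS software.

```python
import math

def category_probabilities(ability, stringency, difficulty, thresholds):
    """Category probabilities under an assumed many-facet rating scale model (logits)."""
    # Running sums of (ability - stringency - difficulty - threshold) are the log-numerators.
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + ability - stringency - difficulty - tau)
    peak = max(log_numerators)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]

def expected_rating(ability, stringency, difficulty, thresholds, lowest=1):
    """Model-expected rating for one candidate, one interviewer and one question."""
    probs = category_probabilities(ability, stringency, difficulty, thresholds)
    return sum((lowest + k) * p for k, p in enumerate(probs))

def fair_average(ability, thresholds, lowest=1):
    """Expected rating when the other facets sit at their average (zero) measure."""
    return expected_rating(ability, 0.0, 0.0, thresholds, lowest)

THRESHOLDS = [-1.0, 0.0, 1.0]  # hypothetical thresholds for a 4-point scale
ability = 1.2                  # hypothetical candidate measure in logits

# Expected rating from a stringent interviewer asking a hard question (hypothetical values).
with_hawk = expected_rating(ability, stringency=0.8, difficulty=0.3, thresholds=THRESHOLDS)
fair = fair_average(ability, THRESHOLDS)
print(f"expected with stringent interviewer: {with_hawk:.2f}, fair average: {fair:.2f}")
```

For the same ability, the expected rating under a stringent interviewer is lower than the fair average, which is the direction of adjustment described for candidates who met hawkish interviewers.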


McManus et al. showed that adjusting for stringency/leniency left 95.9% of outcomes unchanged, while 2.6% of candidates who failed on raw marks would pass and 1.5% who passed on raw marks would fail after adjustment. Harasym estimated that as many as 11% of candidates could be affected.

For example, in the case of a clinical examination for entry into a professional college, McManus et al.14 found that if examination scores were adjusted for examiner stringency ⁄ leniency and the same pass mark was kept, the outcome for 95.9% of candidates would be unchanged using adjusted marks, whereas 2.6% of candidates would pass, although they had failed on the basis of raw marks, and 1.5% of candidates would fail, despite having passed on the basis of raw marks. However, Harasym17 estimated that as many as 11% of candidates in an MMI might be affected by adjusting for interviewer stringency ⁄ leniency,




Psychometric analysis


Software

Multi-facet Rasch modelling was used in FACETS Version 3.65 (Winsteps.com, Chicago, IL, USA) to perform a concurrent estimation of several independent first-order facets and their associated error variances. A model was specified that included identification of the individual facets, the rating scale and how the interviewer was expected to interact with the rating scale.



Setting


Details of the MMI design principles have been reported elsewhere.4,9,12 Candidates were applying to a 4-year, graduate-entry, problem-based learning (PBL) programme. From 2007 onwards, candidates were applying for medicine or dentistry or both. The MMI in this study was designed to assess entry-level reasoning skills in professionalism and had eight stations, with each candidate rotating through the circuit and meeting a different single interviewer at each station. Questions were sourced from a preprepared bank and took the format of a non-clinical scenario followed by structured prompts. Each question had five prompts marked with a 4-point Likert scale, giving a total of 20 raw marks per station and 160 for the whole assessment. In this design, although the performance of a candidate on any particular MMI question was assessed once only by a single interviewer, the total performance was rated by eight interviewers. Furthermore, each MMI question was assessed by several interviewers during the course of the MMI process. This created a network through which every parameter was linked to every other parameter with these connecting observations, allowing the measures estimated from the observations to be placed on one common scale.11 This naturalistic interviewing plan also allowed for the partially nested G study design.4



Interviewers

Each interviewer interviewed a median of 22 candidates. There were 89 faculty members, 47 community members and 39 graduates.

Each interviewer had interviewed a median of 22 candidates (SD 18.44, range 4–121). Complete details were available in the database for 117 interviewers. Of the 207 used, 88 interviewers were known to be male and 95 were known to be female. Twenty-two were aged 18–34 years, 27 were aged 35–44 years and 68 were aged > 45 years. They included 89 faculty members, 47 community members and 39 graduates.


MFRM

Multi-facet Rasch modelling


Moving up the Y axis, interviewers become more stringent, candidates more able and questions more difficult.

Reading the ruler (Fig. 2) from bottom to top shows increasing interviewer stringency, increasing candidate ability and increasing question difficulty.


Both Fig. 2 and Table 1 show that interviewers are more variable than MMI questions.

Both Fig. 2 and Table 1 show that interviewers are more variable than MMI questions and the spread of interviewers is nearly 3.5 times that of MMI questions.


Interviewer J over-fits the model, with ratings that are too predictable, which suggests a possible halo effect; Interviewer G under-fits the model, with too much randomness in scoring.

Interviewer J appeared to be over-fitting the model and his or her ratings were too predictable, suggesting a halo effect. Interviewer G seems to be under-fitting the model with too much randomness in his or her scoring.







Making adjustments for interviewer leniency and question difficulty


Candidate E met stringent interviewers and so has a lower observed average score (3.50) but a higher fair average score (3.64).

Here, candidate E has a lower observed average score of 3.50, but a higher fair average score of 3.64 because he or she answered harder MMI questions and saw more stringent interviewers.


If fair average scores are used instead of observed average scores, 31 of the 270 candidates offered a place (11.5%) would no longer be offered one. Importantly, this is a two-way movement: other candidates would be offered those places instead.

Let us assume a scenario in which the fair average rather than observed average scores are used to rank the candidates. In our situation, in which 270 student places were on offer, if the MMI were the sole determinant of ranking, 31 of 270 (11.5%) candidates who were offered a place on the basis of their observed score rankings would not have been offered a place on the basis of their fair average rankings. This is a two-way movement.
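
The two-way movement can be illustrated with a minimal sketch that ranks candidates once by observed average and once by fair average, then lists who loses a place. Only candidate E's two scores (3.50 and 3.64) and the 31-of-270 figure come from the text; the other candidates, their scores, the number of places and the helper name `displaced_candidates` are hypothetical.

```python
def displaced_candidates(observed, fair, places):
    """Candidates selected on observed scores who would not be selected on fair averages."""
    def selected(scores):
        # Highest score first; ties are broken by candidate id so the result is deterministic.
        ranked = sorted(scores, key=lambda cand: (-scores[cand], cand))
        return set(ranked[:places])
    return selected(observed) - selected(fair)

# Hypothetical cohort of five candidates competing for three places.
observed = {"A": 3.70, "B": 3.60, "C": 3.55, "D": 3.52, "E": 3.50}
fair = {"A": 3.66, "B": 3.58, "C": 3.49, "D": 3.47, "E": 3.64}

lost = displaced_candidates(observed, fair, places=3)
print("lose a place after adjustment:", sorted(lost))

# The proportion reported in the text: 31 of 270 places change hands.
print(f"{31 / 270:.1%}")
```

In this toy cohort candidate C drops out and candidate E moves in, so every place lost by one candidate is gained by another.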





Interviewer goodness-of-fit statistics


For the interviewer, the infit mean square statistic ranged from 0.74 to 1.58 (mean 1.03, SD 0.74). This was a high-stakes assessment similar to a clinical rating situation, and the values were well within the accepted lower- and upper-control limits of 0.5 and 1.7 that indicate acceptable model fit.19
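
As a rough sketch of what the infit statistic measures, the code below computes an information-weighted mean square (sum of squared residuals divided by the sum of model variances) for one interviewer's ratings and compares it with the 0.5 and 1.7 control limits quoted above. The rating scale model, thresholds and ratings are assumptions made for illustration; FACETS performs its own estimation and this is not its implementation.

```python
import math

def category_probabilities(measure, thresholds):
    """Rating scale model probabilities; measure = ability - stringency - difficulty (logits)."""
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + measure - tau)
    peak = max(log_numerators)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]

def infit_mean_square(observations, thresholds, lowest=1):
    """observations: iterable of (rating, measure) pairs for a single interviewer."""
    squared_residuals = 0.0
    model_variance = 0.0
    for rating, measure in observations:
        probs = category_probabilities(measure, thresholds)
        expected = sum((lowest + k) * p for k, p in enumerate(probs))
        variance = sum((lowest + k - expected) ** 2 * p for k, p in enumerate(probs))
        squared_residuals += (rating - expected) ** 2
        model_variance += variance
    return squared_residuals / model_variance

def fit_flag(infit, lower=0.5, upper=1.7):
    """Classify fit using the control limits quoted in the text."""
    if infit < lower:
        return "over-fit (too predictable)"
    if infit > upper:
        return "under-fit (too random)"
    return "acceptable"

THRESHOLDS = [-1.0, 0.0, 1.0]            # hypothetical 4-point scale
muted = [(2, 0.0), (3, 0.0)] * 4         # hypothetical ratings hugging the expected value
erratic = [(1, 0.0), (4, 0.0)] * 4       # hypothetical ratings at the extremes

for label, data in (("muted", muted), ("erratic", erratic)):
    value = infit_mean_square(data, THRESHOLDS)
    print(label, round(value, 2), fit_flag(value))
```

Values near 1 mean the ratings vary about as much as the model expects; the two hypothetical interviewers fall outside the limits in opposite directions.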



Number of candidates examined

Interviewer stringency showed a significant negative correlation with the number of candidates interviewed; that is, interviewers who interviewed more candidates tended to be more lenient. This is the opposite of what McManus et al. found.

Interviewer stringency ⁄ leniency showed a significant but inverse correlation with the number of candidates examined (r = −0.21, n = 207, p = 0.002). Thus, interviewers who interviewed more candidates tended to be somewhat more lenient. McManus et al.14 found examiners became more stringent with more candidates. Our finding contrasts with this, but we do not have data to show whether more lenient interviewers participated in more assessments or whether more interviewing caused interviewers to become more lenient.
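
The reported significance can be sanity-checked from r and n alone. The sketch below converts the correlation to a t statistic and uses a normal approximation for the two-sided p-value, which is reasonable with 205 degrees of freedom. The function name is hypothetical and the approximation is an assumption; the authors' actual test is not described in the excerpt.

```python
import math

def correlation_t_and_p(r, n):
    """t statistic for a Pearson r, with a normal-approximation two-sided p-value."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # Two-sided tail area of the standard normal; adequate when n - 2 is large.
    p_two_sided = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p_two_sided

t, p = correlation_t_and_p(-0.21, 207)
print(f"t = {t:.2f}, approximate two-sided p = {p:.4f}")
```

The approximate p-value lands close to the reported 0.002, so a weak correlation of this size is still distinguishable from zero with 207 interviewers.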



Implications


Translating IRT output into variance components is important. This passage concerns the use of MFRM for that purpose.

The translation of IRT output into variance components is important. Some have reported a number of limitations in applying IRT models to assessments which measure the performance of skills or behaviours, as in the MMI.14 These arose because of claims that the MFRM analysis could not take into account the second-order effects of interviewer-by-station, interviewer-by-candidate and candidate-by-station variance. There was concern that, as in an incorrectly designed G study,6 error would be apportioned wrongly and hence any calculation of reliability or standard error of measurement was likely to be inflated. The use of MFRM to isolate variance components is very new and there has been some misunderstanding in the medical education literature about how they can be estimated and reported with software such as FACETS. This has inflated reliability estimates undermining the credibility of the IRT method for this type of assessment. For example, McManus et al.14 reported variation between examinees in a clinical examination for entry into a professional college as an unrealistic 87%. This resulted from a calculation which partly assumed that the three first-order effects of examiner, item and person were proportions of 100% and thus neglected to take account of the bias or interactions and the residuals that MFRM also reports.
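
The normalisation pitfall described above can be shown with simple arithmetic. The sketch uses the variance percentages reported in this paper's abstract (candidate 19.1%, interviewer 8.9%, interviewer question-specific 5.1%, question 2.6%) and treats everything else as interactions and residual. It illustrates the general mistake of rescaling first-order effects to 100%; it is not a reconstruction of the McManus et al. calculation, and the variable names are hypothetical.

```python
# Percent of total variance for a single rating, as reported in the abstract.
components = {
    "candidate ability": 19.1,
    "interviewer stringency/leniency": 8.9,
    "interviewer question-specific stringency/leniency": 5.1,
    "question difficulty": 2.6,
}

# Whatever is not listed is taken here to be interactions plus residual (an assumption).
remainder = 100.0 - sum(components.values())

# Naive approach: treat the three first-order effects as if they summed to 100%.
first_order = ["candidate ability", "interviewer stringency/leniency", "question difficulty"]
naive_candidate_share = 100.0 * components["candidate ability"] / sum(
    components[name] for name in first_order
)

print(f"candidate share of total variance: {components['candidate ability']:.1f}%")
print(f"candidate share if first-order effects are rescaled to 100%: {naive_candidate_share:.1f}%")
print(f"interactions and residual left out by the naive approach: {remainder:.1f}%")
```

Dropping the interactions and residual more than triples the apparent candidate share, which is how an unrealistically high figure can arise.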


FACETS can be used to decompose the variance components.

An iterative relationship between the FACETS software developer and the educational research measurement community has ensured that later iterations of FACETS are able to provide the decomposition of variance components, including interactions, with naturalistic data.


In MMI training, interviewers ask whether they should be given feedback on who is a 'hawk' and who is a 'dove'. However, stringency/leniency in the MMI appears to be a relatively stable characteristic whether measured by IRT or by G theory, which matches the findings of McManus et al. The implication, as McManus et al. suggest, is that interviewers should carry on as they always have rather than try to correct their stringent or lenient tendencies.

In MMI training, interviewers often ask whether they should be given feedback on which of them are ‘hawks’ and which are ‘doves’ so that they can try to correct their tendencies to mark higher (leniently) or lower (stringently) on the rating scale. The finding that interviewer stringency ⁄ leniency seems to be a stable characteristic in the MMI, whether measured by IRT or by G theory, is remarkable and echoes the findings of McManus et al.14 in examiner stringency in clinical rating situations. The implication, as McManus et al.14 suggest, is that interviewers should not try to correct their hawkish or dove-like tendencies, but should instead continue to behave as they have always done.


As Kumar et al. point out, there has been little theoretical development on interviewers' experience of the MMI process or on the impact of training.

As Kumar et al.9 have noted, theoretical development in the area of interviewers’ experience of the process and impact of training is lacking.



13 Downing SM. Threats to the validity of clinical teaching assessments: what about rater error? Med Educ 2005;39:353–5.





















 2010 Jul;44(7):690-8. doi: 10.1111/j.1365-2923.2010.03689.x.

Should candidate scores be adjusted for interviewer stringency or leniency in the multiple mini-interview?

Author information

  • 1Sydney Medical School-Northern, University of Sydney, Sydney, New South Wales, Australia. christopher.roberts@sydney.edu.au

Abstract

CONTEXT:

There are significant levels of variation in candidate multiple mini-interview (MMI) scores caused by interviewer-related factors. Multi-facet Rasch modelling (MFRM) has the capability to both identify these sources of error and partially adjust for them within a measurement model that may be fairer to the candidate.

METHODS:

Using FACETS software, a variance components analysis estimated sources of measurement error that were comparable with those produced by generalisability theory. Fair average scores for the effects of the stringency/leniency of interviewers and question difficulty were calculated and adjusted rankings of candidates were modelled.

RESULTS:

The decisions of 207 interviewers had an acceptable fit to the MFRM model. For one candidate assessed by one interviewer on one MMI question, 19.1% of the variance reflected candidate ability, 8.9% reflected interviewer stringency/leniency, 5.1% reflected interviewer question-specific stringency/leniency and 2.6% reflected question difficulty. If adjustments were made to candidates' raw scores for interviewer stringency/leniency and question difficulty, 11.5% of candidates would see a significant change in their ranking for selection into the programme. Greater interviewer leniency was associated with the number of candidates interviewed.

CONCLUSIONS:

Interviewers differ in their degree of stringency/leniency and this appears to be a stable characteristic. The MFRM provides a recommendable way of giving a candidate score which adjusts for the stringency/leniency of whichever interviewers the candidate sees and the difficulty of the questions the candidate is asked.

PMID: 20636588


Comparison of situational and behavior description interview questions for higher-level positions (Personnel Psychology, 2001)

COMPARISON OF SITUATIONAL AND BEHAVIOR DESCRIPTION INTERVIEW QUESTIONS FOR HIGHER-LEVEL POSITIONS


ALLEN I. HUFFCUTT Department of Psychology Bradley University
JEFF A. WEEKLEY Kenexa

WILLI H. WIESNER, TIMOTHY G. DEGROOT Department of Psychology McMaster University
CASEY JONES Kenexa

 

 

 

 

 

Pulakos and Schmitt hypothesized that situational interviews (SI) are less effective than behavior description interviews (BDI) for higher-level positions. To evaluate their hypothesis we analyzed data from two new structured interview studies. Both involved selection for higher-level positions, with SI and BDI questions matched to assess the same job characteristics. The results show that the SI is less predictive of performance for such positions. Moreover, although the SI and BDI questions were matched to the same job characteristics, their correspondence was very low, and BDI ratings were related to Extroversion. Possible reasons for the lower effectiveness of the SI are discussed.

Based on a study of federal investigative agents, Pulakos and Schmitt (1995) hypothesized that situational interviews are less effective for higher-level positions than behavior description interviews. To evaluate their hypothesis we analyzed data from 2 new structured interview studies. Both of these studies involved higher-level positions, a military officer and a district manager respectively, and had matching SI and BDI questions written to assess the same job characteristics. Results confirmed that situational interviews are much less predictive of performance in these types of positions. Moreover, results indicated very little correspondence between situational and behavior description questions written to assess the same job characteristic, and a link between BDI ratings and the personality trait Extroversion. Possible reasons for the lower situational interview effectiveness are discussed.

 

 


 


The two most popular formats for modern structured interviews are the SI and the BDI. In an SI, applicants are asked how they would respond to hypothetical job situations. The SI is grounded in goal-setting theory, which assumes that intentions (goals) are the immediate precursor of actions.

Situational and behavior description interviews have emerged as the two most popular formats for constructing modern structured interviews (Campion, Palmer, & Campion, 1997; Harris, 1989). In a situational interview (SI) applicants are given hypothetical job situations and asked to indicate how they would respond (Latham, Saari, Pursell, & Campion, 1980). Situational interviews are grounded in goal-setting theory, particularly in that intentions (i.e., goals) are the immediate precursor of a person’s actions (Latham, 1989).

 

In a BDI, applicants are asked about relevant past experiences; the BDI rests on the premise that past behavior is the best predictor of future behavior.

In a behavior description interview (BDI) applicants are asked to relate actual incidents from their past relevant to the target job (Janz, 1982). Behavior description interviews are grounded in the premise that the past is the best predictor of the future (Janz, 1989).

 

However, the study by Pulakos and Schmitt points to a possible threat to the validity scenario above. They developed an SI and a BDI for a fairly complex job and, in a sample of 216, found correlations with performance evaluations of -0.02 for the SI and 0.32 for the BDI. What is particularly important about the study is their hypothesis that the SI may not be effective for higher-level positions; if true, this has considerable implications for the science and practice of structured interviewing.

However, a study by Pulakos and Schmitt (1995) suggests a possible caveat to the above validity scenario. They developed both situational and behavior description interviews for a fairly complex position, a federal investigative agent. In a sample of 216 incumbents (108 for each format), the correlations with performance evaluations were -0.02 for the SI and 0.32 for the BDI. What is particularly important about this study is their hypothesis that situational interviews may not be as effective for higher-level positions as they are for lower-level positions. If true, this has very important implications for the science and practice of structured interviewing.

 





So why would the SI be less effective than the BDI in interviews for higher-level positions? The first explanation is that SI questions are too simple for these candidates. Analysis of the means and standard deviations shows that this was not the case in any of the three studies; if anything, the SI questions tended to have slightly better properties than the BDI questions. The second possible explanation is that answers to SI questions are harder to rate for higher-level positions. Judging from interrater reliability, this too is unlikely. The third explanation is that BDI questions are valid because they capture job performance in the current or a recent position. The Pulakos and Schmitt study and our second study do not support this idea. Another possibility is that the SI and the BDI assess different constructs. Our results show that, at least for higher-level positions, SI and BDI results do not correspond well even when the questions target the same job characteristics. This low correspondence is an important finding that has not been reported elsewhere in the interview literature. One implication is that the SI and the BDI should not be treated as interchangeable methods of measurement; it is better to regard them as separate testing devices that assess different constructs. A second implication is that, whatever constructs the BDI captures, for higher-level positions it is superior to the SI.

So why would situational interviews be less effective than behavior description interviews for higher-level positions? The first possible explanation is that SI questions are just too simple for higher-level positions. Analysis of the means and standard deviations suggests that this was not the case in any of the three studies. Rather, it was not uncommon for the SI questions to have slightly better properties than the BDI questions. The second possible explanation is that responses to SI questions are more difficult to rate with higher-level positions. Analysis of interrater reliability data in all three studies again suggests that this was not the case. The third possible explanation is that BDI questions are valid because they capture job performance in either the current or a recent position, an explanation which is particularly viable in concurrent designs. Data available in Pulakos and Schmitt (1995) and in our second study (both of which were concurrent) does not support this idea either. Another possible explanation is that SI and BDI ratings tend to capture different constructs. Our results strongly suggest that SI and BDI questions written to assess the same job characteristics do not tend to correspond, at least not for higher-level positions. This lack of correspondence is an important finding, one we are unaware of anywhere else in the interview literature. One implication of this finding is that situational and behavior description formats probably should not be considered as alternate methods of measurement. Rather, it might be more appropriate to view them as separate testing devices, ones which for the most part capture different constructs. A second implication is that whatever constructs BDI questions tend to capture for higher-level positions are more predictive of performance than whatever constructs SI questions tend to capture.

 

Finally, there is a methodological issue concerning SI validity that should be mentioned. In the Pulakos and Schmitt study, some candidates tried to consider every possible contingency while others gave only superficial responses. Because the latter answers were still correct, candidates who answered through more complex thinking did not necessarily receive higher ratings, even though they arguably showed themselves to be better candidates. The implication of Pulakos and Schmitt's observation is that the SI scoring system is better suited to jobs of lower complexity. Benchmark answers from SI studies show that SI scoring focuses strictly on what action the candidate would take, not on why or how they would take it. A focus on overt actions may be perfectly adequate for lower-level positions. For higher-level positions, however, how candidates arrived at a particular action and why they chose it may be just as important as the action itself.

Last, there is a methodological issue related to SI validity in higher-level positions that warrants mention. During the Pulakos and Schmitt (1995) study it was observed that some candidates thought through every possible contingency when answering the SI questions and other applicants gave more superficial responses. Because the latter answers were still essentially correct, candidates engaging in more complex thought did not necessarily receive higher ratings even though a case could be made that they represented better job candidates. The implication of Pulakos and Schmitt’s (1995) observation is that the standard SI scoring system may be better suited for jobs of lower complexity. An examination of the benchmark answers provided as examples in several SI studies illustrates the tendency for SI scoring to be based strictly on what overt action candidates would take (e.g., Campion et al., 1994, Latham & Saari, 1984), not on why they would take that action or how they arrived at that action. A focus upon overt actions may be perfectly adequate (and even preferred) for lower-level positions. But for higher-level positions knowing how candidates arrived at a particular action and why they chose that action is often just as important as the action itself.


It is also important that interviewers are typically not allowed to probe answers to SI questions, even though probing is where the reasons for choosing an action would be most likely to emerge. There is as yet no direct evidence that the SI scoring system is the problem; further research is needed.

It is also important to point out that interviewers typically are not allowed to probe responses to SI questions, and probing is where information related to why they choose a particular action would be most likely to emerge. Admittedly we do not have direct evidence at the present time that the standard SI scoring system is the culprit. Nonetheless, this issue and its implications are important enough to warrant investigation of modifications to the SI scoring system in future research.

 


In summary, the results of this study make the following contributions. Most importantly, they support Pulakos and Schmitt's hypothesis that the SI does not work as well for higher-level positions. Taking their original study together with the two new studies, interview developers can be advised to exercise caution when using the SI for higher-level positions. This recommendation carries particular weight because all three studies compared the SI and the BDI directly. In addition, ratings on SI and BDI questions corresponded very poorly. Interestingly, correspondence between SI and BDI scores was high in an interview for a lower-level position. Finally, the association between BDI ratings and Extroversion scores suggests that verbal presentation skills may have had a substantial influence.

In summary, results of this investigation contribute to the interview literature in several ways. Probably the most important contribution is they support Pulakos and Schmitt’s (1995) hypothesis that situational interviews do not tend to work as well for higher-level positions. Based on the combined results of their original study and our two new studies, the formal recommendation can now be made that interview developers should exercise considerable caution when using the standard SI format for higher-level positions. What makes this recommendation particularly viable is that all three of these studies involved direct comparison of situational and behavior description interviews for the same position. In addition, our results suggest a strong lack of correspondence between SI and BDI questions written to assess the same job characteristics in higher-level positions. What is interesting is that the one published study which involved a direct comparison of SI and BDI validity for the same lower-level position found a much higher correspondence (Campion et al., 1994). Finally, our results suggest an association between BDI ratings and Extroversion scores, which may point to a larger influence from verbal presentation skills.

 

 

 

 

 

 


COMPARISON OF SITUATIONAL AND BEHAVIOR DESCRIPTION INTERVIEW QUESTIONS FOR HIGHER-LEVEL POSITIONS

  1. ALLEN I. HUFFCUTT1,*, 
  2. JEFF A. WEEKLEY2, 
  3. WILLI H. WIESNER3,
  4. TIMOTHY G. DEGROOT3 and
  5. CASEY JONES2

Article first published online: 7 DEC 2006

DOI: 10.1111/j.1744-6570.2001.tb00225.x

Personnel Psychology


Volume 54, Issue 3, pages 619–644, September 2001


Asking applicants 'What would you do?' versus 'What did you do?': a meta-analytic comparison of situational and past behaviour employment interview questions (J Occup Organ Psychol, 2002)

Asking applicants what they would do versus what they did do: A meta-analytic comparison of situational and past behaviour employment interview questions


Paul J. Taylor1* and Bruce Small2

1Chinese University of Hong Kong and University of Waikato, Hamilton, New Zealand 2AgResearch, Hamilton, New Zealand

 

 

 

 

 

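Criterion-related validity and inter-rater reliability were analysed for structured interviews using situational questions (SQ: what would you do in the following situation?) or past behaviour questions (PBQ: can you think of a time when ...? what did you do?). Reliabilities and validities were further compared according to whether descriptively-anchored rating scales (DARS) were used and, for validities, across levels of job complexity.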

Criterion-related validities and inter-rater reliabilities for structured employment interview studies using situational questions (e.g. ‘‘Assume that you were faced with the following situation . . . what would you do?’’) were compared meta-analytically with studies using past behaviour questions (e.g. ‘‘Can you think of a time when . . . what did you do?’’). Validities and reliabilities were further analysed in terms of whether descriptively-anchored rating scales were used to judge interviewees’ answers, and validities for each question type were also assessed across three levels of job complexity.

 

Both SQ and PBQ showed high validity, but PBQ studies using descriptively-anchored rating scales (DARS) showed substantially higher validity than SQ studies using DARS (.63 versus .47). Question type (SQ versus PBQ) was found to moderate interview validity even after controlling for the use of rating scales. There was no support for the hypothesis that SQ are less valid for high-complexity jobs.

While both question formats yielded high validity estimates, studies using past behaviour questions, when used with descriptively-anchored answer rating scales, yielded a substantially higher mean validity estimate than studies using the situational question format with descriptively-anchored answer rating scales (.63 versus .47). Question type (situational versus past behaviour) was found to moderate interview validity, after controlling for whether studies used answer rating scales. No support was found for the hypothesis that situational questions are less valid for predicting job performance in high-complexity jobs.

 

Sample-weighted mean inter-rater reliabilities were similar for SQ and PBQ when descriptively-anchored rating scales were used, and slightly lower for PBQ studies without such scales.

Sample-weighted mean inter-rater reliabilities were similar for both situational and past behaviour questions, provided that descriptively-anchored rating scales were used (.79 and .77, respectively), although they were slightly lower (.73) for past behaviour question studies lacking such rating scales.
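
A sample-weighted mean simply weights each study's estimate by its sample size, so larger studies count for more. The sketch below shows the arithmetic with hypothetical studies; the values and the function name are made up for illustration and are not the studies included in the meta-analysis.

```python
def sample_weighted_mean(estimates):
    """estimates: list of (estimate, sample_size) pairs, one per study."""
    total_n = sum(n for _, n in estimates)
    return sum(value * n for value, n in estimates) / total_n

# Hypothetical inter-rater reliabilities and sample sizes from three studies.
studies = [(0.80, 120), (0.70, 60), (0.85, 20)]

weighted = sample_weighted_mean(studies)
unweighted = sum(value for value, _ in studies) / len(studies)
print(f"sample-weighted mean: {weighted:.3f}, unweighted mean: {unweighted:.3f}")
```

The two means differ here because the smallest study reports the highest reliability and is down-weighted accordingly.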

 

 


 

 

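Models of the determinants of job performance provide the basis for expecting past behaviour questions (PBQ) to outperform situational questions (SQ). There are two groups of performance determinants: 'can do' and 'will do'.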

In contrast, models of the determinants of job performance provide a basis for expecting past behaviour questions to exhibit superior criterion-related validity over situational questions. While various theorists have specified somewhat different variables as performance determinants (see Blumberg & Pringle, 1982; Campbell, 1990; McCloy, Campbell, & Cudeck, 1994; Vroom, 1964), all have in common two fundamental groups of performance determinants: ‘can do’ and ‘will do’ variables.

 

  • ‘Can do’ variables include job knowledge, skills and abilities, while
  • ‘will do’ variables primarily concern workers’ motivation to perform.

 

In the performance determinants model of Campbell and colleagues, performance is a function of three variables:

  • (1) declarative knowledge,
  • (2) procedural knowledge and skills, and
  • (3) motivation

In Campbell and colleagues’ performance determinants model (Campbell, 1990; McCloy et al., 1994), for example, performance is viewed as a function of three variables: (1) declarative knowledge, (2) procedural knowledge and skills, and (3) motivation.

 

Declarative knowledge is a necessary but not sufficient condition for procedural knowledge and skills, while motivation is a direct determinant of performance.

Declarative knowledge is seen as a necessary, though insufficient, condition for procedural knowledge and skills, while motivation is a direct determinant of performance.

 

 

Measures of maximal performance are theorized to be a function of declarative and procedural knowledge, whereas measures of typical performance (such as supervisory ratings) are a function of all three determinants. The difference between the two is therefore that maximal performance measures cannot capture individuals' motivation to apply their knowledge and skills in day-to-day work, because everyone is motivated to perform well in a testing situation.

Measures of maximal performance, such as job knowledge tests and work sample tests, have been theorized to be a function of declarative knowledge and procedural knowledge/skills, while measures of typical performance, such as supervisory ratings of job performance, are believed to be a function of all three performance determinants (McCloy et al., 1994). Thus the critical difference between maximal and typical measures of performance is that maximal performance measures fail to assess differences in individuals’ motivation to apply knowledge and skills to day-to-day job performance, since all performers are motivated to perform well during the testing situation.

 

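We believe that the distinction between maximal performance (MP) and typical performance (TP), and their relationship to the determinants of job performance, bears on how interview questions are formatted.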

We believe that the distinction between measures of maximal and typical performance, and their relationship to the determinants of job performance, are relevant to how structured interview questions are formatted.

 

Situational questions (SQ), like simulations, are measures of maximal performance and can assess declarative and procedural knowledge. However, because interviewees are all motivated to give the best possible answer, SQ cannot assess their motivation for day-to-day performance.

Situational questions, like simulations, work samples and situational judgment tests, are measures of maximal performance, and so they can assess declarative knowledge and procedural knowledge/skills; but since interviewees are all motivated to provide the best answer possible, answers do not necessarily reflect interviewees’ motivation to apply that knowledge/skill to day-to-day job performance. For example, an interviewee who is able to describe the appropriate response to a hypothetical situation certainly demonstrates the requisite knowledge/skill, but it remains uncertain whether the individual would actually apply that knowledge/skill in an actual job situation.

 

Past behaviour questions (PBQ), in contrast, are more likely to assess typical performance because they focus on day-to-day situations the candidate has actually faced, so they can assess all three determinants, including motivation. A candidate who reports having responded effectively in a past situation shows the necessary knowledge and skills and also sufficient motivation to apply them. Since typical performance is usually the criterion of interest in employment settings, PBQs could be a more accurate indicator of future performance.

Past behaviour questions, however, are more likely to assess typical performance since they focus on candidates’ responses to the day-to-day situations that candidates have faced, and so they can assess all three performance determinants (including motivation). Interviewees who report that they have responded effectively in a past situation demonstrate both the necessary knowledge and skills, and also that they were sufficiently motivated to apply their knowledge/skills in that situation. Since the criterion of interest in employment settings is usually typical performance, past behaviour questions could be expected to provide a more accurate indication of future job performance than situational questions.

Asking applicants what they would do versus what they did do: A meta-analytic comparison of situational and past behaviour employment interview questions

Paul J. Taylor and Bruce Small

Article first published online: 16 DEC 2010

DOI: 10.1348/096317902320369712

Journal of Occupational and Organizational Psychology, Volume 75, Issue 3, pages 277–294, September 2002

The Evolving Medical School Admissions Interview (AAMC, 2011)

Analysis in Brief

Many studies have examined the reliability and validity of medical school admissions interviews, but no description of the typical interview process has been published in nearly 20 years.

While numerous studies have examined the reliability and validity of medical school admissions interviews, a description of the typical interview process has not been published in nearly 20 years.2,3

 

Twenty years ago, the typical admissions interview was a one-on-one interview with faculty or staff. Interviewers received little guidance on question content and used graphic rating scales. Since then, many schools have adopted techniques such as semi-structured interviews and the MMI. In 2011, more than 80,000 admissions interviews were conducted.

Twenty years ago, the typical admissions interview was characterized by one-on-one interviews conducted by faculty and staff. Interviewers received little guidance about the content of questions but were required to use graphic rating scales to evaluate applicants. Since then, many medical school admissions interviews use techniques such as semi-structured interviews4 and the Multi-Mini Interview (MMI).5,6 In 2011, admissions committees conducted over 80,000 admissions interviews (median = 566; range = 87 to 1,438).7

 

 

How is the interview pool selected?

 

Admissions committee members (64%) and staff (56%) review applications to decide whom to interview; only 12% of schools use computer-based algorithms. At 69% of schools, two or more people review each applicant's information, and at 53% the review takes 15 minutes or more. Both academic (undergraduate GPA, MCAT) and non-academic (service, personal statements) data are used, but academic data usually carry more weight at this stage.

More than half of admissions officers reported that admissions committee members (64%) and staff (56%) review application materials to decide which applicants to interview; only 12 percent reported that their schools use computer-based algorithms to make this decision.8 Sixty-nine percent of respondents indicated that two or more people review each applicant’s information. At most schools (53%), the review takes 15 minutes or more. The companion AIB indicates that both academic (e.g., undergraduate GPAs and MCAT scores) and non-academic (e.g., medical community service, personal statements) data are used to select the interview pool; however, more weight is given to academic data at this stage in the admissions process.

 

 

What process is used to conduct a typical admissions interview?

 

As was the case 20 years ago, about 83% report that faculty, staff and sometimes medical students conduct one-on-one interviews. At 59% of schools each applicant has two interviews. At more than 50%, interviewers review personal statements, letters of evaluation, MCAT scores and undergraduate GPAs before or during the interview. Interviews typically last 30 to 44 minutes.

As was the case 20 years ago, many admissions officers (83%) indicate faculty and staff and, in some cases medical students, conduct one-on-one interviews. Fifty-nine percent of schools conduct two interviews with each interviewee. At more than 50 percent of schools, interviewers review personal statements, letters of evaluation, MCAT scores, and undergraduate GPAs prior to or during the interview.9 Admissions interviews typically last between 30 and 44 minutes each.

 

Today's admissions interview is more structured than in the past. Sixty-four percent of schools give interviewers general guidance on question content, and most use a standard rating process to evaluate applicants.

Results show that the current admissions interview is more structured than it was in the past. Sixty-four percent of schools provide general guidance to interviewers about the content of the questions they should ask. Similarly, most employ a standard rating process to evaluate applicants during the interview.

 

What characteristics are assessed in the typical admissions interview?

 

Fewer than 50 percent of schools ask about applicants' academic content knowledge (biology, chemistry, psychology and so on) in the interview.

Less than 50 percent of respondents indicated that their interviews include questions about applicants’ academic content knowledge (e.g., biology, chemistry, psychology, etc.).10

Discussion

 

Medical schools use the interview almost exclusively to assess applicants' personal characteristics, probably because these are hard to assess with the other admissions tools available earlier in the process.

These data also show that medical schools use the interview, almost exclusively, to assess applicants’ personal characteristics. Reliance on the interview is likely due to the difficulty of assessing personal characteristics with other admissions tools currently available earlier in the admissions process.

Dunleavy, D. M., & Whittaker, K. M. (2011). The evolving medical school admissions interview. AAMC Analysis in Brief, 11(7), 1-2.

The effect of defined violations of test security on admissions outcomes using multiple mini-interviews (Med Educ, 2006)

Harold I Reiter, Penny Salvatori, Jack Rosenfeld, Kien Trinh & Kevin W Eva






The first MMI pilot project was completed in November 2001. Modelled on the OSCE, a 6-station MMI with 18 faux-applicants produced promising data on generalisability (reliability), acceptability and feasibility. Large-scale studies of actual applicants in 2002 and 2003 confirmed these conclusions and produced preliminary evidence of predictive validity.

In November 2001 the first pilot project of a multiple mini-interview (MMI) process for student admissions was completed.1 Modelled after the objective structured clinical examinations (OSCEs), a 6-station MMI with 18 faux-applicants generated promising data regarding overall test generalisability (reliability), acceptability and feasibility. Results of subsequent large-scale studies of actual medical school applicants in April 2002 and 2003 confirmed prior conclusions and generated preliminary data demonstrating predictive validity.1–3


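The move towards MMI implementation raises concerns about test security, and the concern is real. How far the integrity of the interview process is endangered depends on 2 critical factors: how high the stakes are, and what obstacles limit breaches of academic integrity. Admission is extremely high-stakes and the obstacles are few, so the MMI is an attractive target: there is motive, method and opportunity. Station stems are likely to reach the general public, but it is far less certain whether anything is gained by such unscrupulous conduct.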

With the anticipated move towards MMI implementation, concerns arose regarding test security. Cause for concern is real. The integrity of the interview process, like any other evaluation in academia, is endangered to a greater or lesser extent based upon 2 critical factors. How high are the stakes involved? What obstacles are in place to limit the extent of breaches of academic integrity? As a result, the MMI provides an attractive target for such breaches. 

  • There is motive, with the exceedingly high stakes of career-making in the balance. 
  • There is method, with the explosion of communication technology decreasing obstacles to information dissemination. 
  • There is also opportunity, with the stems of interview stations available, of necessity, to those applicants undergoing the MMI. 

Thus the availability of stems to the general populace is anticipated. It remains far less certain whether anything is gained by such unscrupulous conduct.


More broadly, the question is how much security violations affect measured performance. Of 18 studies in clinical skills assessment, 6 showed a statistically significant improvement, 4 showed limited benefits and 8 revealed no difference.

More broadly, the issue is one of determining the impact of security violations on perceived competence levels. Literature exists outlining this impact in the domain of clinical skills assessment. Of 18 studies, 6 showed a statistically significant improvement in performance after test security violations,4–9 4 showed limited benefits10–13 and 8 revealed no difference.14–21


In their review of this literature, Swanson et al. called for methodological improvements and described 4 key methodological considerations for designing such studies.

Swanson et al.,22 in their review of this literature, promoted the need for methodological improvements in this area. They described 4 key methodological aspects to ensure when designing these types of studies. As applied to the MMI, these are as follows.


  • 1 Subgroups of applicants being interviewed must be comparable, achievable using random assignment of those being rated.
  • 2 The violation(s) must be known to have occurred.
  • 3 The study must have sufficient power, in terms of sample size, to enable any presumptive impact to be identifiable statistically.
  • 4 The tool must be sufficiently reliable for true shifts in the ability to perform to be detectable.



STUDY 1


Methods


Fifty-seven applicants took part voluntarily in a trial MMI after completing their traditional interviews.

A total of 57 applicants to the MD programme participated in a voluntary trial run of the MMI after their traditional interviews were completed.


Half of the volunteers received all 9 station stems 2 weeks in advance; the other half did not.

Two weeks in advance of the interview date, half of the volunteers were provided copies of all 9 station stems via electronic mail. Access to these 9 station stems remained restricted from the other half of the volunteers.


Two examiners gave each candidate a global performance rating on a 7-point scale.

Two examiners provided a global performance rating for each candidate at each station using an anchored 7-point scale.



Results


Twenty-four applicants received the station stems 2 weeks in advance. Their mean score was 0.06 higher, a difference that would need a sample of 1495 per group to reach statistical significance.

Twenty-four applicants received the station summaries 2 weeks in advance of their participation. The mean score of these participants was 4.97 (SD = 0.46). The 33 applicants who did not receive the stations in advance achieved a mean score of 4.91 (SD = 0.67). This difference is not statistically significant; F(1,55) = 0.19, MSE = 6.22, P > 0.65. To reveal a difference of 0.06 to be significant with the pooled standard deviation of 0.58 would require a sample size of 1495 per group.


Discussion


The difference between groups was in the direction that would cause concern, but it was so small that a very large sample would be needed for it to reach significance, which suggests it is of no practical importance.

While the difference between groups is in the direction that would cause concern, it is so minuscule that 7 times the number of participants that the MD programme interviews typically would be required to show the difference to be significant, thereby suggesting that the result would be clinically unimportant even if large enough samples were drawn.



STUDY 2


Methods


The second study took place in March/April 2004, during the first real high-stakes use of the MMI.

The second study occurred in March/April 2004 with the first real high stakes implementation of the MMI.


Twenty-four stations were developed, 12 for each of 2 interview dates. One examiner per station rated candidates on the 7-point scale.

Twenty-four stations were developed, with 12 used on each of 2 interview dates. The 24 stations again focused upon personal quality domains. The system of scoring remained similar to that described above, with the exception that only 1 examiner was present per station.


Two of the 12 stations served as pilot stations whose scores did not count toward the admissions decision. Half the applicants received one pilot station in advance and the other half received the other. On the interview day applicants were told that some stations were pilots that would not affect admission, but not which ones. Repeated measures t-tests compared scores on the station seen in advance with scores on stations to which applicants were naive.

Once again an intentional security violation was introduced, this time by using 2 of the 12 stations as pilot stations, scores on which did not count toward the admissions decision. Half the applicants received 1 of the 2 pilot stations with their mailed letter inviting them to interview; in a covering letter they were told to expect to encounter that particular station during their interview. The other half of the applicants received the other pilot station in the same manner. On the day of the interview applicants were told that some stations were included for pilot purposes and that these stations would not count towards their admissions decision; they were not told, however, which stations fell into this category. Repeated measures t-tests were used to compare scores on the station seen in advance to scores received on stations to which applicants were naive.


Results


Mean score per station and overall reliability.

The mean overall performance score received by candidates per station was 4.94 (SD = 1.10). The overall test–retest reliability of this 12-station MMI with 1 examiner per station was 0.70.


Discussion


Despite the high stakes and the 2 weeks of advance notice, prior exposure brought no benefit.

Despite the high stakes nature of this interview process and the fact that stations were delivered 2 weeks in advance with clear indication that they would be included in the interview, we again witnessed no benefit of prior exposure in the performance ratings assigned.


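Anecdotally, however, some examiners, who were unaware of the intervention, remarked spontaneously that certain candidates seemed over-rehearsed, which suggests a possible mechanism for why knowing a station in advance brings little benefit.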

Anecdotally, a number of examiners, each of whom were blinded to the intervention, noted spontaneously that some responses seemed too rehearsed, potentially providing insight into the mechanism by which potential benefits of prior knowledge of the stations are lost.



STUDY 3


Methods


Of those interviewed for occupational therapy (OT), 38 also applied to physiotherapy (PT). These 38 sat the same 7-station MMI twice, for OT in the morning and PT in the afternoon, so they not only knew the stems but had actually worked through the stations. Stations were scored on a 7-point scale, and interviewers did not know who the repeat interviewees were. Interviewers rated both overall performance and suitability for the profession; because the two correlated above 0.95, only overall performance is reported.

Of the interviewees for occupational therapy seats, 38 also interviewed for physiotherapy seats. These 38 applicants underwent the same 7-station MMI for both interviews (OT in the morning and PT in the afternoon). They were therefore privy not only to the stems of the MMI stations, but also potentially gained benefit from the experience of working through the 7 stations with an interviewer. As before, the stations focused on personal quality domains and were globally scored using a 7-point anchored scale. Interviewers in this programme were blinded to the candidates being repeated interviewees. They were asked to assign ratings of each candidate’s overall performance and to provide a 7-point gut opinion of the person as a candidate for the profession/programme. The correlation between these two scores was greater than 0.95, so only the overall performance score will be reported for the sake of comparison with the first 2 studies outlined above.


The means differed by 0.01; a sample of over 29 000 per group would be needed for this difference to be significant.

The mean score provided to the sample of 38 applicants during their interview for the OT programme was 3.46 (SD = 0.43). The mean score provided to the same group during their interview for the PT programme was 3.45 (SD = 0.44). This difference is not statistically significant; t(37) = 0.14, P > 0.8. To reveal a difference of 0.01 to be significant with the pooled standard deviation of 0.43 would require a sample size of over 29 000 per group.
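
The sample sizes quoted for Studies 1 and 3 can be approximated with the standard normal-approximation formula for comparing two means, n = 2(z_{1-α/2} + z_{1-β})² σ² / δ² per group. The sketch below assumes a two-sided α of 0.05 and 80% power; the excerpt does not state which values or formula the authors used, and `n_per_group` is a hypothetical helper name. Under those assumptions the Study 3 inputs give a little over 29 000 per group, in line with the text, while the Study 1 inputs give a figure somewhat below the reported 1495; the gap may reflect rounding or a slightly different formula.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size to detect a mean difference `delta`.

    Normal approximation for a two-sided, two-sample comparison.
    alpha and power are assumed values, not taken from the paper.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


study1 = n_per_group(delta=0.06, sd=0.58)  # Study 1: difference 0.06, pooled SD 0.58
study3 = n_per_group(delta=0.01, sd=0.43)  # Study 3: difference 0.01, pooled SD 0.43
print(study1, study3)
```

Because the required n grows with 1/δ², shrinking the difference from 0.06 to 0.01 multiplies the sample size by roughly 36 before the change in SD is taken into account.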





GENERAL DISCUSSION



Developing MMI stations is labour intensive and involves several steps. It yields the following written products.

MMI station development can be labour intensive, requiring several steps.24 The written product consists of:


1 A station stem entitled Instructions for the Applicant

2 A station guide entitled Instructions for the Observer

3 An in-depth review of the station implications entitled Background and Theory

4 A station score sheet


The first of the 4 documents is available to the applicant; the other 3 are for the observer only. All are closely guarded: observers receive general MMI training well in advance but see station-specific information only on the morning of the interview. Applicants first see the stems once the MMI begins, yet their lack of pen and paper has not stopped the stems from being reconstructed and published afterwards.

The first of these 4 is available to the applicant throughout the station; the other 3 are available to the observer only. All documents are jealously guarded. While observers receive general MMI training well in advance, they remain station-naive until the morning of the interview date when they receive station-specific training. Once the MMI commences, the interviewed applicants become privy to the station stems. Their lack of paper and pen has not significantly constrained the subsequent publication of those stems.


The March 2004 MMI illustrated the security challenges posed by modern communication technology. The first post by an interviewed applicant, telling others how the MMI was run, appeared on a forum website 7 minutes after the MMI ended. Within weeks, reasonably accurate descriptions of all 24 station stems were available on the same site.

The practical administration of the MMI in March 2004 provided a sample of security challenges in the age of hand-held computers, wireless communication and the internet. The first comment about the MMI by an interviewed applicant, informing others about how the MMI was run, was posted in a forum website 7 minutes after MMI completion.23 In the subsequent weeks, after interviews were completed, reasonably accurate descriptions of all 24 MMI station stems could be viewed on the same site. 


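The rapid spread of the stems is therefore hardly surprising, nor does it appear particularly worrying for prospective MMI implementation. The effect of leaked checklists and background-and-theory documents remains unknown, but the more practical concern about leaked station stems appears misplaced.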

The rapid publication of these stems was therefore hardly surprising. Nor, apparently, is it particularly unnerving for prospective MMI implementation. While the effects of security violations of station checklists and background and theory remain unknown, the more practical concern regarding security violation of station stems appears misplaced.


One plausible explanation for this counter-intuitive result is that MMI stations, unlike OSCEs and other knowledge or ability tests, are designed so that there is no single correct answer, which lets the interviewer challenge whatever the candidate says. Preparing for every possible challenge is very difficult, so a candidate who tries to force a pre-planned agenda on the discussion may actually perform worse.

One plausible explanation of this counter-intuitive result is that MMI stations, unlike OSCEs and other knowledge/ability tests, are designed to guard against the possibility of there being 1 correct answer, thereby allowing the interviewer to challenge any response provided by the candidate. It would be very difficult to prepare responses for every possible challenge, thus resulting in poorer performance if a candidate attempts to force a pre-planned agenda on the discussion.


In these studies the violation was limited to the stem. No score improvement appeared despite the 2-week window, so the length of time available does not seem to matter.

In these studies the extent of the violation was limited to the availability of the stem. Our results show that there was no score enhancement despite the 2-week window of opportunity. It appears that time delay is not an issue.


Less clear is the influence of the extent of the violation. In the first 2 studies, release of the stem alone had no significant impact, and the more extensive test-retest violation of the third study also had none. The third result, however, may reflect the short time delay.

Less clear is the influence of the extent of the violation. One (ref. 5) of 2 (refs. 5, 15) OSCE studies with egregious and identifiable violations suggested that extent of violation is a critical factor. In the first 2 MMI studies broadcasting of the stem alone, a more limited violation, had no significant impact on test scores. The more extensive, test–retest violation of the third MMI study also failed to demonstrate any impact on scores. However, this may have been a result of the short time delay (several hours only), and thus short potential responsive preparatory time between information access and the retest.


Alone, neither the extent of the security violation nor the time delay is enough to raise MMI scores; together they might be. Alternatively, the absence of correct answers in the MMI may rule out unwanted performance gains altogether.

Alone, neither factor is sufficient to enhance MMI scores. Together, they may be sufficient. Alternatively, the absence of correct answers on MMI performance might result in no unwanted performance modification, even in a setting combining both more extensive security violations and greater time delay between violation and subsequent performance.



CONCLUSIONS


Of all the stages from medical school application to speciality certification, the single highest hurdle is admission to medical school. Only 3.8% of applicants to McMaster's medical programme were admitted in 2004. Of those who enter, 99% graduate; Canadian graduates pass Part I of the Medical Council of Canada examination at a rate above 95% and Part II at 91%; 88% are chosen by their preferred speciality training programme, and 91% pass the speciality certification examination at the first attempt.

From application to medical school through to its successful completion,24 national licensing examinations for general medical licence,25 entry into one’s preferred speciality training26 and to speciality certification,27 the single greatest hurdle in terms of likelihood of success is, overwhelmingly, admission to medical school. Only 3.8% of applicants to the McMaster University Undergraduate Medical Program were admitted in 2004.24 Of those who enter the programme, 99% graduate.25 Canadian graduates nationally enjoy a greater than 95% success rate on Part I and 91% success rate on Part II, respectively, of the Licentiate Medical Council of Canada examination upon their first sitting of each examination.26 They also enjoy an 88% likelihood of being chosen by the preferred speciality training programmes in Canada27 and a 91% first attempt success rate on Royal College fellowship speciality certification examinations.28
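
The contrast drawn here can be made concrete by multiplying the stage-wise rates quoted above. This is only a rough sketch: it assumes the stages are independent and that rates drawn from different cohorts and sources can be chained, and it treats the text's "greater than 95%" as exactly 0.95. The variable names are illustrative.

```python
from math import prod

# Stage-wise success rates quoted in the text.
admission_rate = 0.038  # admitted to the McMaster programme, 2004
post_admission_rates = {
    "graduate from medical school": 0.99,
    "pass Part I of the Licentiate examination": 0.95,  # "greater than 95%"; 0.95 used as a floor
    "pass Part II of the Licentiate examination": 0.91,
    "chosen by preferred speciality programme": 0.88,
    "pass speciality certification": 0.91,
}

# Assumption: stages are independent, so clearing all of them is the product of the rates.
all_later_stages = prod(post_admission_rates.values())

print(f"admission: {admission_rate:.1%}")
print(f"all later stages combined: {all_later_stages:.1%}")
```

Even compounded across five stages, the post-admission figure stays far above the admission rate, which is the point the authors make.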


In the American and Canadian systems, a student who has been admitted is unlikely to fail anywhere between completing medical school and obtaining the preferred speciality or family practice certification. This is not to denigrate the efforts of medical students and residents; the same effort, talent and dedication simply yield far lower success rates at the admission stage.

In the American and Canadian systems, failure to complete medical school through failure to obtain one’s preferred speciality or family practice certification remains unlikely. This is not meant to denigrate the Herculean efforts and enormous talents required on the part of dedicated medical students and residents, but rather to recognise that the odds are very much in their favour for those later stages. That same effort, talent and dedication, expended at the level of admission to medical school, combine for far lower success rates.


Even if station stems leak, as can be expected under normal circumstances, MMI results remain trustworthy.

Under normal circumstances, including the potential security violation of distribution of station stems, confidence in the veracity of MMI outcomes can be maintained.




24 Ontario Medical School Application Service Statistical Summary 2004. Ontario: Ontario Universities’ Application Centre, Council of Ontario Universities, 8 October 2004.









Med Educ. 2006 Jan;40(1):36-42.

Author information

  • Dept. of Clinical Epidemiology and Biostatistics, McMaster University, 1200 Main Street West, Hamilton, Ontario L8Z 3N5, Canada.

Abstract

INTRODUCTION:

Heterogeneous results exist regarding the impact of security violations on student performances in objective structured clinical examinations (OSCEs). Three separate studies investigate whether anticipated security violations result in undesirable enhancement of MMI performance ratings.

METHODS:

Study 1: low-stakes: MMI station stems provided to a random half of 57 medical school applicants 2 weeks in advance of participation in a research study. Study 2: high-stakes: 384 medical school applicants sat a 12-station MMI to determine admission. Each half received 1 of 2 pilot MMI station stems 2 weeks in advance. Study 3: high-stakes: 38 interviewees with dual applications to occupational therapy and physiotherapy experienced the same 7-station MMI twice on the same date.

RESULTS:

No statistically significant differences in MMI performances were detected.

CONCLUSIONS:

Predictable violations of MMI security do not unduly influence applicant performance ratings.

PMID: 16441321


Description of the Interview Process in Selecting Students for Admission to U.S. Medical Schools (J Med Educ. 1981)

James B. Puryear, Ph.D., and Lloyd A. Lewis, Ph.D.



According to 1981-82 data, 99 percent of all medical schools use the interview, which shows that admissions committees rely on it to help select students.

According to data obtained from Medical School Admission Requirements, 1981-82 (5), 99 percent of all medical schools use the interview in the selection process. This widespread use further supports the idea that medical school admissions committees rely on the interview to help select students.


Despite this reliance, Fruen notes that there has been little research on the interview.

Despite this reliance, according to Fruen (6), there has been surprisingly little research on the interview in the medical school admissions setting.


Only eight articles on the topic appeared over the preceding 10 years.

The authors found only eight articles in the Journal of Medical Education during the last 10 years that concentrated on the subject of medical student admission interviews.



Use of the Interview


Ninety-nine percent use the interview. On a four-point scale, 98 percent rated it as important to some degree, and 72 percent as very important.

Ninety-nine percent of the respondents (106 of 107) indicated that they used the interview in student selection. In order to ascertain just how much the interview is used, three questions were asked. On a four-point scale, ranging from very important to unimportant, an overwhelming majority (98 percent) of the respondents who used the interview indicated that the interview was important to some degree. Seventy-two percent said it was very important.



The Interview Format


Eighty-five percent use one-to-one interviews.

Eighty-five percent of the respondents indicated that their interviews were "one to one" (that is, one interviewer and one interviewee).


All schools use faculty and staff as interviewers; 72 percent also use students and 34 percent use alumni.

One-hundred percent of the responding schools using interviews indicated that faculty and staff members were used to interview applicants, while 72 percent of the schools used students also as interviewers. Alumni also interviewed applicants in 34 percent of the schools.


About half have a standard set of questions.

About half (47 percent) of the medical schools which interview had a standard set of questions or areas of inquiry that all interviewers included in their interview sessions.


Forty-six percent hold two separate interviews per applicant.

Forty-six percent of the interviewing schools required applicants to be interviewed in two separate interviews,



Interview Administration


Ninety-two percent train or at least brief their interviewers, and 94 percent require a written report on the interview for the applicant's file.

Ninety-two percent of the interviewing schools trained or at least briefed their interviewers in some way on interviewing applicants. A similar percentage (94 percent) required a written report on the interview for the applicant's file.


Seventy-six percent interview only on campus; 24 percent interview both on and off campus.

It was found that most (76 percent) of the schools which interview held interviews only on campus, while the rest (24 percent) interviewed students on and off campus.



Implications

Most medical schools consider the interview important, yet few have a means of evaluating the effectiveness of their own interviews.

Obviously, most medical schools consider the interview important in the selection of students. However, medical schools have apparently not, for the most part, established a means of evaluating the effectiveness of their own interviews.












J Med Educ. 1981 Nov;56(11):881-5.

Abstract

A survey was made of the medical schools in the United States to obtain a description of the interview process used in the selection of first-year medical students. The following questions were the basis for the study: What is the role of the interview in the selection of medical students? What is the nature of the interview process? How is the interview administered? An 87 percent response rate was obtained. The results indicated that 99 percent of the responding medical schools use interviews in evaluating students for medical school admission, and the interview ranks second only to the grade-point average in importance among four selection factors. The interview is usually in a one to one setting, with each applicant having two separate interviews. All schools use faculty and staff members in interviewing, and usually at least one admissions committee member interviews each applicant. Usually interviews are conducted on the campus of the school. Implications drawn from the results indicate a need for a quantification of methods to incorporate the interview into the selection process.

PMID: 7299795


Modified Personal Interviews: Resurrecting Reliable Personal Interviews for Admissions? (Acad Med, 2012)

Mark D. Hanson, MD, MEd, Kulamakan Mahan Kulasegaram, Nicole N. Woods, PhD, Lindsey Fechtig, and Geoff Anderson, MD, PhD




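Personal interviews in particular have low overall reliability (poor agreement among interviewers and poor consistency across interview occasions), which limits their predictive power.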

Particularly, personal interviews have low overall reliability— lack of agreement among interviewers and lack of consistency across different interview occasions—which, in turn, limits their predictive power.3,4


One common remedy is to sample the performance independently several times, the multiple independent sampling (MIS) method.

One common solution to increase the reliability of a performance measurement is to assess samples of the performance independently multiple times—that is, the multiple independent sampling (MIS) method.
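
The gain from multiple independent sampling is commonly estimated with the Spearman-Brown prophecy formula. The excerpt does not name this formula; it is used here as a standard psychometric approximation that assumes the samples are parallel and independent. The single-interview reliability of 0.30 below is a hypothetical value chosen for illustration, not a figure from the study.

```python
def spearman_brown(single_reliability, k):
    """Projected reliability of the mean of k parallel, independent samples."""
    r = single_reliability
    return k * r / (1 + (k - 1) * r)


single = 0.30  # hypothetical reliability of one interview (illustrative only)
for k in (1, 2, 4, 10):
    print(k, round(spearman_brown(single, k), 2))
```

Under these assumptions reliability rises quickly over the first few samples and then flattens, which is why the number of independent samples matters more than the length of any single one.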


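The most prominent example is the MMI, which uses highly structured, scenario-based interviews. The AAMC recently reported that most schools still use the personal interview rather than the MMI, despite the MMI's high reliability and moderate validity and the personal interview's psychometric limitations.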

The most notable use of this measurement technique is the Multiple Mini-Interview (MMI),6,7 which uses up to 10 highly structured, scenario-based interviews to assess applicants. Interestingly, the Association of American Medical Colleges recently reported that a preponderance of schools use the admissions personal interview, not the MMI,1 despite not only the evidence regarding the MMI’s high reliability and moderate validity7–9 but also the critical psychometric limitations of the personal interview.2–4


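Uptake of the MIS method is low because of its heavy resource requirements (10 interviewers are needed for reliable scores) and its potential effect on recruitment (through changes to recruitment-focused campus activities). The flexibility and intuitive simplicity of the personal interview may also make admissions committees reluctant to give it up.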

Two critical factors contributing to the low uptake of the MIS method to admissions interviews are the aforementioned high resourcing requirements (10 interviewers needed to attain reliable scores) and the potential effects on recruitment (due to the associated alterations to campus recruitment-focused activities).10 Additionally, the flexibility and intuitive simplicity of the personal interview may make admissions committees (and interviewers) reluctant to abandon it all together.


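Axelson and Kreiter investigated applying MIS to the personal interview itself. In their 2009 study of applicants interviewed in two consecutive years, they estimated that a traditional interview format could reach reasonable reliability by using fewer interviewers per panel and more separate interviews. Admissions committees might therefore improve reliability with multiple, brief, single-rater interviews instead of relying on many structured scenarios.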

Axelson and Kreiter10 investigated the application of MIS to the admissions personal interview itself. In their 2009 investigation, they reviewed the multiple interview scores of applicants who had been interviewed twice in consecutive years by a panel of two interviewers for admission to medical school. They estimated, after analyzing the scores of 168 candidates across four years who had interviewed twice, that reasonable reliability could be achieved using a traditional personal interview format by reducing the number of interviewers in the panel while increasing the number of separate personal interviews. Thus—instead of relying on a large number of structured scenarios—admissions committees might be able to depend on multiple, brief, single-rater interviews to enhance the reliability of the personal interview.


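This study is the first prospective empirical test of the reliability of such a format, the modified personal interview (MPI), which applies the MIS method.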

We report here the first prospective empirical test of the reliability of a similar modification to the admissions personal interview format using an MIS methodology named the modified personal interview (MPI).



Method



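First-year students were informed about the LEAD program and its selection process. The attributes expected of successful LEAD candidates were derived, communicated to potential applicants, and used to build the questions asked during the MPI process.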

We informed the first-year students about LEAD and its selection process via announcements made during class and notifications sent over e-mail. The selection process constituted submission of written materials followed by, for a selected subset of candidates, the MPI process. We derived the attributes of successful LEAD candidates from the literature on leadership11,12 and through LEAD faculty consensus. These desired attributes were communicated to the pool of potential applicants and blueprinted onto (aligned with) questions asked during the MPI process.



Written submission materials (three components)

The written submission materials comprised three components: 

    • a two-page curriculum vitae (CV) summarizing applicants’ academic and leadership experiences, 
    • three brief descriptions of leadership experiences reported in the CV, and 
    • a brief vision statement of leadership goals and career aspirations.

The MPI process

Four interview rooms, 10 to 12 minutes each, four raters; the raters developed behavioral description questions.

Candidates who proceeded on to the interview stage moved among four interview rooms to complete the MPIs in succession. Each MPI was about 10 to 12 minutes long; a few interviews were longer at the discretion of the faculty interviewer. The four interviewers, all of whom had participated in the review of the written materials, framed all questions as behavioral descriptive questions which have strong validity in assessing personal characteristics.13–15


The raters were briefed on the MPI format and trained on the focus of the interviews. Three of the interviews were semistructured, and the raters had a list of questions in advance.

Interviewers received training on the MPI format and on the focus of the interviews. Three of the interviews were semistructured, and the interviewers used a list of predetermined questions.


The four raters each rated three common attributes and one MPI-specific attribute.

All four interviewers rated three common attributes—maturity, communication skills, and interpersonal skills—and a fourth attribute unique to their MPI.


The raters scored each attribute on a five-point scale, for a total of 20 points.

The interviewers evaluated each attribute as a separate item on a five-point Likert-type scale (1 = poor, 2 = good, 3 = very good, 4 = excellent, and 5 = outstanding) to increase the scoring range available to interviewers. All items were totaled for a final MPI score out of 20, and overall total scores were used for selection.
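
The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attribute key names, the example ratings, and the reading of "overall total" as the sum of a candidate's four MPI scores (maximum 80) are hypothetical and are not taken from the study.

```python
from typing import Dict, List

# Scale anchors as given in the text.
SCALE = {1: "poor", 2: "good", 3: "very good", 4: "excellent", 5: "outstanding"}


def mpi_score(ratings: Dict[str, int]) -> int:
    """Total one interviewer's four attribute ratings (maximum 20)."""
    if len(ratings) != 4:
        raise ValueError("expected four attribute ratings per MPI")
    for attribute, value in ratings.items():
        if value not in SCALE:
            raise ValueError(f"{attribute}: rating {value} is outside the 1-5 scale")
    return sum(ratings.values())


def overall_total(interviews: List[Dict[str, int]]) -> int:
    """Sum the MPI scores across a candidate's interviews (assumed aggregation)."""
    return sum(mpi_score(ratings) for ratings in interviews)


# Hypothetical ratings for one candidate across four MPIs.
# "mpi_specific" stands in for the fourth attribute unique to each MPI.
candidate = [
    {"maturity": 4, "communication": 5, "interpersonal": 4, "mpi_specific": 3},
    {"maturity": 3, "communication": 4, "interpersonal": 4, "mpi_specific": 4},
    {"maturity": 5, "communication": 5, "interpersonal": 4, "mpi_specific": 4},
    {"maturity": 4, "communication": 3, "interpersonal": 4, "mpi_specific": 5},
]

per_mpi = [mpi_score(ratings) for ratings in candidate]
print(per_mpi, overall_total(candidate))
```

Keeping the four attribute ratings as separate items, rather than a single global rating, is what allows questions nested within MPIs to be treated as their own facet in the reliability analysis.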



Results


Sixteen applicants; 10 took the MPIs and 8 were selected. The interviews took three hours in total.

Sixteen candidates submitted initial applications to LEAD. Of these, we selected 10 for the MPI stage, 8 of whom were selected for the program. The entire set of MPIs was completed in three hours in one afternoon.


58% of the variance was attributable to pi and pq:i.

The majority of variance among MPI scores (58%) was attributable to the participant–interview interaction (pi) as well as the participant–question interaction nested within MPI (pq:i), which suggests that these facets caused random error in the assessment of applicants.


Overall reliability was 0.79.

Overall reliability of the MPI component and subsequent average MPI reliability was 0.79. The reliability of questions nested within MPIs (q:i) was 0.97.






Discussion and Conclusions


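Reliability improves when MIS is applied in the MPI format. Even with only four MPIs, reliability was above 0.7. A total of 8 faculty hours was spent. Interviewing a similar number of applicants in the traditional way would require 13 faculty hours.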

This report provides some evidence that MIS as applied within the MPI format is a reliable selection strategy. High reliability was achieved with just four MPIs, and a d-study revealed that future MPIs can achieve reliability greater than 0.7 with only three MPIs. A total of only 8 faculty hours was spent conducting the MPI process. A comparable traditional admissions personal interview of 40 minutes’ duration with two interviewers would take more than 13 faculty hours (66% more time) for the same number (n = 10) of applicants.
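
The faculty-time comparison above is simple arithmetic, and a short Python sketch makes it easy to check. One assumption is added here: each MPI is taken to last 12 minutes (the upper end of the reported 10 to 12 minute range), which reproduces the 8 faculty hours reported. The function name `faculty_hours` is illustrative only.

```python
def faculty_hours(applicants: int, interviews_each: int,
                  minutes_per_interview: float, interviewers_per_interview: int) -> float:
    """Faculty time in hours spent interviewing a cohort of applicants."""
    total_minutes = (applicants * interviews_each
                     * minutes_per_interview * interviewers_per_interview)
    return total_minutes / 60


# MPI format: 10 applicants, four single-rater interviews of 12 minutes (assumed length).
mpi = faculty_hours(10, 4, 12, 1)

# Traditional format: 10 applicants, one 40-minute interview with a two-person panel.
traditional = faculty_hours(10, 1, 40, 2)

extra_percent = (traditional - mpi) / mpi * 100

print(f"MPI: {mpi:.1f} h, traditional: {traditional:.1f} h, extra: {extra_percent:.1f}%")
```

Under these inputs the traditional format comes to about 13.3 faculty hours against 8 for the MPI, roughly two-thirds more time, in line with the figures quoted in the text.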


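This modification of the traditional interview raises the likelihood that MIS will be adopted. Existing MIS-based formats such as the MMI required 10 independent interviews. Here the MPI reached the threshold with only three interviews. This may be due to the LEAD selection process, because the MPI used in this process assessed only a narrow range of attributes. Other aspects of nonacademic performance had already been assessed at medical school admission. This specific, restricted context may raise interview reliability.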
This modification of the personal interview has the potential to increase the uptake of MIS in admissions interviews. Previous application of MIS in the MMI showed that at least 10 separate interviews were needed to achieve acceptable reliability.6,7 The MPI here met a minimum threshold at 3 interviews. A potential explanation for this finding is the specialized selection context of the LEAD admissions process. The MPI as used in this process focused on a narrow set of attributes related to leadership qualities as determined by LEAD faculty. Other aspects of nonacademic performance had already been assessed in the medical school admissions process. This specialized context also enabled the use of expert raters, which may have further enhanced interview reliability.
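
To give a feel for how reliability depends on the number of interviews, the sketch below uses the Spearman-Brown prophecy formula. This is not the authors' decision study, which rests on generalizability-theory variance components across several facets; Spearman-Brown assumes parallel interviews and a single source of error, so its output is only a rough approximation. The one input taken from the text is the reported reliability of 0.79 with four MPIs.

```python
def single_interview_reliability(observed: float, n_interviews: int) -> float:
    """Invert Spearman-Brown: reliability of one interview implied by n interviews."""
    return observed / (n_interviews - (n_interviews - 1) * observed)


def projected_reliability(single: float, n_interviews: int) -> float:
    """Spearman-Brown prophecy: reliability of the mean of n parallel interviews."""
    return n_interviews * single / (1 + (n_interviews - 1) * single)


# Reported: reliability of 0.79 with four MPIs.
r1 = single_interview_reliability(0.79, 4)

for n in range(1, 7):
    print(n, round(projected_reliability(r1, n), 2))
```

Under these simplifying assumptions, three interviews still project above the 0.7 threshold discussed in the text, while one or two do not. A full decision study could give different values, because it separates the interview, question, and interaction variance components instead of pooling them.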


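This specialization of the selection process also contributes to face validity. Face validity is described as the extent to which applicants believe the application process is relevant to the job (in medical school, how well it predicts performance in the medical school curriculum), and the degree to which applicants accept the interview process is related to this face validity.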

The specialization of this selection process (with interviewers rating applicants’ performances according to a predetermined, defined suite of attributes aligned with a specific physician role, in this case the role of physician leader) also contributes to the face validity of the MPI format. Face validity has been described as the extent to which the applicants believe the application process is relevant to the job in question,16 or, to extrapolate to the medical school context, to the medical school curriculum. Applicant acceptance of admissions processes has been associated with face validity.16


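In this study, the overlap of assessed attributes enhanced content validity. The strong correlation between written application scores and MPI scores probably results from the intentional blueprinting, or mapping, done in developing the LEAD application process. Using the written-application raters as interviewers may also have had an influence.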

In the current study, the overlap of attributes across the written application and MPIs enhanced content validity (and reliability). The strong association between written application scores and MPI performance is likely a result both of the intentional attribute mapping or blueprinting we performed in developing the LEAD application process and of the availability of written application materials during MPI occasions. The use of raters from the written application as interviewers may have also contributed to the strong association of scores across both evaluations, even though we removed all personal identifying information from the candidates’ written application materials.



Thus, we would not expect the recruitment of applicants to decrease through the use of the MPI format.




1 Dunleavy DM, Whittaker KM. The evolving medical school admissions interview. AAMC Analysis in Brief. 2011;11. https://www.aamc.org/download/261110/data/aibvol11_no7.pdf. Accessed June 13, 2012.


10 Axelson RD, Kreiter CD. Rater and occasion impacts on the reliability of pre-admission assessments. Med Educ. 2009;43:1198–1202.


13 Taylor P, Small B. Asking applicants what they would do versus what they did do: A meta-analytic comparison of situational and past behaviour employment interview questions. J Occup Organ Psychol. 2002;75:277–294.


15 Huffcutt AI, Weekley JA, Wiesner WH, Degroot TG, Jones C. Comparison of situational and behavior description interview questions for higher-level positions. Pers Psychol. 2001;54:619–644.
















 2012 Oct;87(10):1330-4.

Modified personal interviews: resurrecting reliable personal interviews for admissions?

Author information

  • 1Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada. mark.hanson@utoronto.ca

Abstract

PURPOSE:

Traditional admissions personal interviews provide flexible faculty-student interactions but are plagued by low inter-interview reliability. Axelson and Kreiter (2009) retrospectively showed that multiple independent sampling (MIS) may improve reliability of personal interviews; thus, the authors incorporated MIS into the admissions process for medical students applying to the University of Toronto's Leadership Education and Development Program (LEAD). They examined the reliability and resource demands of this modified personal interview (MPI) format.

METHOD:

In 2010-2011, LEAD candidates submitted written applications, which were used to screen for participation in the MPI process. Selected candidates completed four brief (10-12 minutes) independent MPIs each with a different interviewer. The authors blueprinted MPI questions to (i.e., aligned them with) leadership attributes, and interviewers assessed candidates' eligibility on a five-point Likert-type scale. The authors analyzed inter-interview reliability using the generalizability theory.

RESULTS:

Sixteen candidates submitted applications; 10 proceeded to the MPI stage. Reliability of the written application components was 0.75. The MPI process had overall inter-interview reliability of 0.79. Correlation between the written application and MPI scores was 0.49. A decision study showed acceptable reliability of 0.74 with only three MPIs scored using one global rating. Furthermore, a traditional admissions interview format would take 66% more time than the MPI format.

CONCLUSIONS:

The MPI format, used during the LEAD admissions process, achieved high reliability with minimal faculty resources. The MPI format's reliability and effective resource use were possible through MIS and employment of expert interviewers. MPIs may be useful for other admissions tasks.

PMID: 22914517 [PubMed - indexed for MEDLINE]

