Trends in national licensing examinations in medicine (Med Educ, 2016)

David B Swanson (1, 2) & Trudie E Roberts (3)






The purpose of licensure is to ensure that doctors have the knowledge and skills necessary to practise medicine safely and effectively. The underlying principle is that licensing should apply to all doctors, regardless of where they trained or the setting in which they practise. With some variation internationally, a doctor must meet several requirements in order to obtain an unrestricted licence to practise, commonly including some or all of the following:


  • graduation from an accredited medical school;

  • the successful completion of a period of practice under supervision, often after medical school graduation, and

  • a passing score on one or more national licensing examinations (NLEs).


We will provide an overview of an example NLE, the United States Medical Licensing Examination (USMLE).


The first three components (Step 1, Step 2 Clinical Knowledge and Step 2 Clinical Skills) are intended to assess readiness for postgraduate training; a passing score on these components is required to sit the final component, Step 3. In the USA, medical students typically take the first three components prior to entry to postgraduate training, although US licensing jurisdictions do not typically require this for entry; international medical graduates (IMGs) must pass these components in order to enter US postgraduate training. Along with successful completion of a specified amount of postgraduate training, passing scores on all components are required to be eligible for an unrestricted licence to practise.




NLEs WILL BECOME MORE COMMON


The number and diversity of medical schools have increased


Over the last two decades, there has been growing concern that the substantial expansion in the number of medical schools has diluted the quality of medical education. In 2003, in a speech to the World Medical Association’s general assembly in Helsinki, Dr Hans Karle, then president of the World Federation of Medical Education, said that the number of medical schools had increased by 54% since 1995. Dr Karle was quoted as saying: ‘The quality of education in some schools is not good enough. Some of these schools are badly needed, but many are being set up simply as businesses to attract students who cannot get into medical schools in their own countries.’5



Boulet et al.6 also reported on the continued rapid expansion of medical schools, 1900 of which are listed in the International Medical Education Directory. Private for-profit schools were particularly popular in the Caribbean region, South America and South Asia, as the case of India illustrates:

      • In India, during the period from 1970 to 2005, whereas the number of public medical schools grew by 36%, the number of private medical schools increased by 1120%.7



There is evidence that the diversity of undergraduate medical education is associated with regional and between-school differences in standards and learning outcomes.

      • In the UK, Wakeford et al.8 reported large differences among medical schools in pass rates on the membership examination for the Royal College of General Practitioners;

      • McManus et al.9 reported a similar pattern of large differences among UK schools on several components of the membership examination for the Royal College of Physicians.

      • Looking at variation in the performance of US schools on USMLE Step 2, Case et al.10 found sizable differences in mean scores; these coexisted with high correlations between scores on Step 2 and within-school measures of student performance, suggesting that the variation observed in Step 2 performance could be attributed to differences in intramural standards across schools. A similar pattern was seen for USMLE Step 1.11

      • In a more recent study, Holtzman et al.12 reported large regional and country-to-country variation on the Step 2 Clinical Knowledge component of the USMLE, confirming a pattern observed almost two decades earlier.13


There are always multiple potential explanations for the results of individual studies, but, taken together, the findings suggest that the diversity of curricular approaches used nationally and internationally results in diversity in learning outcomes.




The medical workforce is increasingly mobile


Competency issues can arise when doctors training and graduating in one country and culture move to practise in another environment.


In Europe, medical migration has been observed since the 1940s and has shown various patterns over the years.

    • Eastern Europe: the migration of doctors from Eastern Europe started before their countries acceded to the European Union following the political transitions of the late 1980s and 1990s.

    • EU enlargement: the enlargement of the European Union, initially in 2004 and subsequently in 2007, has resulted in increased mobility, especially from east to west.15


Across the globe, there are major differences in the formats and content of undergraduate medical education programmes. In Europe the length of time for undergraduate training is mandated by European directive as a minimum of 5 years and 5500 hours. In most European countries, there is no national licensing examination at the end of the period of study, and each medical school has its own graduating examination requirements. There are exceptions, such as Switzerland, which brought in a national licensing assessment in 2004.


In the USA, the medical course, which is usually a second degree, is 4 years in duration. Similarly in Canada, students will have a primary degree or will have studied for at least 1–2 years at university level before entering medical school. Most medical courses in Canada are 4 years in duration; schools that offer 3-year programmes (McMaster and Calgary Universities) are exceptions and their programmes do not include summer breaks. By contrast with Europe, both the USA and Canada have NLEs that all medical graduates, whether they were trained domestically or internationally, are required to pass before they can work. All ASEAN countries other than Brunei Darussalam (i.e. Myanmar/Burma, Cambodia, Indonesia, Laos, Malaysia, Philippines, Singapore, Thailand and Vietnam) operate some form of NLE, and China also has national examinations.


In some countries, quality assurance programmes are used in an effort to achieve outcomes similar to those of NLEs; for example, all medical schools in the UK are accredited by the General Medical Council (GMC). The GMC publishes policy documents advising on the nature and format of assessments. In the current system, GMC-selected teams, which include medical educationalists, clinicians, students and lay representatives, visit schools every 5 years to inspect the course. Although schools that fail to meet the required standards can be closed, in practice this has never happened. If the standards are not met, a series of monitored requirements and recommendations are put in place. Despite this system, the UK is currently considering the implementation of an NLE.16




Performance on NLEs predicts performance in practice


To a degree, performance on NLEs predicts subsequent performance in practice.


Tamblyn et al.17,18 showed that scores on the Medical Council of Canada Qualifying Examination predict such practice behaviours as prescribing patterns and mammography screening rates for family medicine doctors. Tamblyn et al.19 also showed a negative association between patient complaints to medical regulatory authorities for doctors licensed in Ontario or Quebec and scores on a performance-based NLE. Wenghofer et al.20 demonstrated that NLE scores were positively related to peer assessments of the quality of care. In the USA, a few studies21,22 have shown the predictive value of certification examination scores for performance in practice; Lipner et al.23 provide a review of this literature. In turn, certifying examination scores are predicted by scores on NLEs.24–26



The case for NLEs


A number of arguments have been offered in favour of NLEs:

    • they can screen out doctors with significant deficits in important areas of practice;

    • they can provide consistent, objective evidence about competency standards across both public and private medical schools, thereby helping to protect patients;

    • they can be used to ensure the quality of migrant doctors before they commence training or practice in a new country, and

    • they can drive up standards in medical schools: failures on NLEs can be a significant catalyst for quality improvement.


The arguments against NLEs relate to concerns about causing reductions in curricular innovation and diversity in teaching methods. This has not been the experience in Canada or the USA, however, although the use of scores on NLEs by US residency programmes in the selection of postgraduate trainees is well documented and makes the stakes associated with USMLE Step 1 very high from the student’s perspective.



CONTENT SPECIFICITY



The purpose of NLEs is to permit the drawing of inferences about the knowledge and skills of examinees.




Regardless of the assessment method used, performance on one case does not predict performance on other cases very well.29 This phenomenon, commonly termed ‘content (or case) specificity’ in the literature on medical education,30,31 has been recognised for many years.32–35 The same phenomenon, termed ‘task specificity’, has been observed for performance assessments in other areas, including science, mathematics and law.36–38 As a consequence, assessments must be of sufficient length and cover an adequate breadth of material in order to obtain scores that are sufficiently reproducible to support high-stakes decisions.


To quantify what ‘sufficient length’ means, Table 2 provides statistically projected reliability (generalisability) coefficients as a function of testing time for the computer-based components of the USMLE. Values of 0.8 or higher are desirable for examinations on which high-stakes decisions are based.
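The way reliability grows with testing time can be approximated with the Spearman-Brown prophecy formula. The Python sketch below is illustrative only: it assumes that added testing time samples content parallel to the original block, and the starting reliability of 0.50 for a one-hour block is a made-up value, not a figure from Table 2.

```python
def spearman_brown(rel: float, k: float) -> float:
    """Projected reliability when testing time is multiplied by k.

    Assumes the extra time samples content parallel to the original block.
    """
    return k * rel / (1 + (k - 1) * rel)


def length_factor_for(rel: float, target: float) -> float:
    """Factor by which testing time must grow to reach `target` reliability."""
    return target * (1 - rel) / (rel * (1 - target))


if __name__ == "__main__":
    one_hour_rel = 0.50  # hypothetical reliability of a 1-hour block
    for hours in (1, 2, 4, 8):
        print(f"{hours} h: {spearman_brown(one_hour_rel, hours):.2f}")
    print(f"hours needed for 0.80: {length_factor_for(one_hour_rel, 0.80):.1f}")
```

With these assumed inputs, doubling testing time raises projected reliability from 0.50 to about 0.67, and reaching 0.80 takes about four hours, which illustrates why content specificity pushes high-stakes examinations toward long testing sessions.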


The first of these columns refers to the MCQ component of Step 3; the second refers to the computer-based case simulation (CCS) component. As will be discussed further in the next section, the latter format was developed to assess physicians’ patient management skills.40 Indices of CCS reproducibility as a function of testing time are significantly lower than for the MCQ component of Step 3; for this reason, pass/fail decisions for USMLE Step 3 are based upon a composite of MCQ and CCS scores,41 rather than on each component independently.


The causes of content specificity are unclear,35 and research on methods to improve the transfer of skills learned in one situation to another through instructional interventions is underway. The ‘master–apprentice’ approach used in much of clinical instruction may be partially responsible. Clinical rotations take place at disparate sites, both across and within schools, resulting in substantial variability in trainees’ experiences. It is, perhaps, not surprising that learning outcomes may vary idiosyncratically, and content specificity may be the assessment consequence. At the same time, others have suggested that content specificity is an epiphenomenon attributable to measurement artefacts.43,44





COGNITIVE ASSESSMENTS


Bennett45,46 describes three generations of computer-based testing (CBT):


    • In the first generation, the emphasis was largely on building the infrastructure for test delivery. Although this involved substantial investments in testing centres and rapidly evolving computer hardware and software to administer examinations, assessments resembled traditional tests, differing little in design and item format from their paper-and-pencil counterparts and taking limited advantage of the technology.47 The major innovation during this period was the introduction of computer-adaptive testing (CAT), in which software selects items sequentially based on the current estimate of an examinee’s proficiency.48 When used in conjunction with a large pool of items of varying difficulty, CAT makes it possible to efficiently achieve relatively consistent precision of measurement throughout a score scale. This is particularly important for diagnostic testing, which requires the accurate estimation of examinee proficiency across a broad range of distinct content areas. For licensure testing, CAT can also be useful in reducing item exposure and testing time, but, because the reproducibility of pass/fail decisions is most important, the administration of pre-constructed fixed forms targeted at the pass/fail point is often simpler and can be as effective.

    • In the second generation of CBT, incremental changes were made in item formats to include multimedia formats, short constructed responses, and other enhancements to traditional item formats made possible through computer delivery. Often, new item types were incorporated simply because they were different from traditional MCQs, had visual appeal, or were otherwise ‘interesting’, rather than because they targeted key competencies that could not otherwise be measured.

    • The (current) third generation of CBTs has begun to incorporate more complex, theory-based simulations and interactive performance tasks replicating important features of real environments and assessing new skills in more sophisticated ways. There is also great potential for integration of assessments with instruction, sampling performance repeatedly over time.
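The first-generation innovation described above, computer-adaptive testing, can be sketched in a few lines. This Python example is a toy under stated assumptions: it uses a Rasch (one-parameter logistic) item model, picks the unused item with maximum Fisher information at the current proficiency estimate, and updates that estimate by expected a posteriori (EAP) estimation with a standard normal prior. The item pool and the simulated examinee are hypothetical; operational CAT systems add content balancing and exposure control.

```python
import math
from typing import Callable, List, Tuple


def p_correct(theta: float, b: float) -> float:
    """Rasch model probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at proficiency theta."""
    p = p_correct(theta, b)
    return p * (1.0 - p)


def eap_estimate(responses: List[Tuple[float, bool]]) -> float:
    """EAP proficiency estimate on a grid, with a standard normal prior."""
    grid = [i / 10 for i in range(-40, 41)]
    posterior = []
    for t in grid:
        weight = math.exp(-0.5 * t * t)  # unnormalised N(0, 1) prior
        for b, correct in responses:
            p = p_correct(t, b)
            weight *= p if correct else 1.0 - p
        posterior.append(weight)
    return sum(t * w for t, w in zip(grid, posterior)) / sum(posterior)


def run_cat(
    pool: List[float], answer: Callable[[float], bool], n_items: int
) -> Tuple[float, List[Tuple[float, bool]]]:
    """Administer n_items adaptively; return final estimate and response log."""
    theta = 0.0
    remaining = list(pool)
    responses: List[Tuple[float, bool]] = []
    for _ in range(n_items):
        # Select the unused item that is most informative at the current estimate
        b = max(remaining, key=lambda d: information(theta, d))
        remaining.remove(b)
        responses.append((b, answer(b)))
        theta = eap_estimate(responses)
    return theta, responses


if __name__ == "__main__":
    pool = [i * 0.25 for i in range(-12, 13)]  # hypothetical difficulties -3..3
    # Hypothetical examinee: answers every item easier than 1.0 correctly
    theta, log = run_cat(pool, lambda b: b < 1.0, n_items=10)
    print(f"estimate {theta:.2f} after items {[b for b, _ in log]}")
```

Because each item is chosen near the current estimate, the test concentrates measurement where the examinee actually is; for a pass/fail licensure decision, the text above notes that a fixed form targeted at the cut score can achieve much the same effect more simply.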


A relatively recent and interesting application of computer technology in medical education is automated item generation (AIG),54–56 although a more accurate term may be ‘computer-assisted item generation’. Traditionally, MCQs for NLEs were painstakingly written and reviewed by committees of content experts. It can be challenging to recruit sufficient numbers of content experts and expensive to have them travel to a central site to review test items. With AIG, content experts first create item models or templates that highlight item elements (e.g. patient findings in the stem, distractors) to be manipulated. Software is then used to systematically manipulate these elements in each item model to generate hundreds of new (typically quite similar) items, which can then be reviewed by content experts. Initial research has produced promising results, demonstrating that content experts are unable to differentiate items developed in the traditional manner from those produced with AIG.55 In the longer term, AIG could prove to be a useful technique for producing items in large numbers, meeting a significant practical need for NLEs (particularly those given throughout the year) and for formative assessments administered repeatedly during training.
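The item-model idea behind AIG can be sketched as a template whose marked elements are systematically varied. This Python sketch is hypothetical: the stem wording and element values are invented placeholders, only the stem is varied (real item models also derive the answer key and distractors from each element combination), and every generated item would still need review by content experts, as described above.

```python
from itertools import product
from string import Template
from typing import Dict, Iterator, List

# Hypothetical item model: $-placeholders mark the elements to be manipulated
ITEM_MODEL = Template(
    "A $age-year-old $sex presents with $finding. "
    "Which of the following is the most appropriate next step?"
)

# Hypothetical values for each manipulated element
ELEMENTS: Dict[str, List[str]] = {
    "age": ["25", "45", "70"],
    "sex": ["man", "woman"],
    "finding": ["acute chest pain", "shortness of breath on exertion"],
}


def generate_items(model: Template, elements: Dict[str, List[str]]) -> Iterator[str]:
    """Yield one stem per combination of element values (Cartesian product)."""
    names = list(elements)
    for values in product(*(elements[name] for name in names)):
        yield model.substitute(dict(zip(names, values)))


if __name__ == "__main__":
    items = list(generate_items(ITEM_MODEL, ELEMENTS))
    print(len(items), "draft stems for expert review")
    print(items[0])
```

Three element lists of sizes 3, 2 and 2 yield 12 stems; the count grows multiplicatively with each added element, which is why AIG can produce items in large numbers (and also why the resulting items tend to be quite similar).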



The CCS component used in USMLE Step 3 provides an example of Bennett’s45 third-generation computer-based assessment. Its main features are as follows:

  • Each CCS case begins with an opening scenario describing a patient’s location and presentation.57

  • Using free-text entry, the examinee then orders tests, treatments and consultations while advancing the case through simulated time.

  • The system recognises over 12 000 abbreviations, brand names and other terms representing more than 2500 unique actions.

  • The patient’s condition changes according to both the actions taken by the examinee and the patient’s underlying clinical problem.

  • Performance is scored using a computer-automated algorithm that models the judgements that would have been produced by expert clinicians.
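Two of the mechanics listed above, mapping many free-text entries onto a smaller set of unique actions and letting the patient's condition evolve with simulated time and the examinee's orders, can be sketched in Python. Everything here is a hypothetical toy: the synonym table, the single "treating action" and the deadline rule are invented for illustration and do not reproduce the USMLE's case logic, term list or scoring algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical synonym table: several entry strings map to one canonical action
SYNONYMS: Dict[str, str] = {
    "cbc": "complete_blood_count",
    "complete blood count": "complete_blood_count",
    "ecg": "electrocardiogram",
    "ekg": "electrocardiogram",
    "electrocardiogram": "electrocardiogram",
    "asa": "aspirin",
    "aspirin": "aspirin",
}


def normalise_order(text: str) -> Optional[str]:
    """Map a free-text order to a canonical action, ignoring case and spacing."""
    return SYNONYMS.get(" ".join(text.lower().split()))


@dataclass
class SimulatedCase:
    treating_action: str  # hypothetical action that addresses the underlying problem
    deadline_min: int     # hypothetical: untreated past this time, the patient worsens
    clock_min: int = 0
    condition: str = "unchanged"
    orders: List[Tuple[int, str]] = field(default_factory=list)

    def order(self, text: str) -> str:
        """Record a free-text order at the current simulated time."""
        action = normalise_order(text)
        if action is None:
            raise ValueError(f"unrecognised order: {text!r}")
        self.orders.append((self.clock_min, action))  # time-stamped order log
        if action == self.treating_action:
            self.condition = "improving"
        return action

    def advance(self, minutes: int) -> None:
        """Move simulated time forward and update the patient's condition."""
        self.clock_min += minutes
        treated = any(a == self.treating_action for _, a in self.orders)
        if not treated and self.clock_min > self.deadline_min:
            self.condition = "worsening"


if __name__ == "__main__":
    case = SimulatedCase(treating_action="aspirin", deadline_min=60)
    case.order("EKG")
    case.advance(90)
    print(case.condition)
    case.order("ASA")
    print(case.condition, case.orders)
```

The time-stamped order log is the kind of record an automated scoring step could then evaluate; how the real CCS scoring algorithm models expert judgements is not reproduced here.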


Over the next few years, additional computer-based simulation formats are likely to be introduced into NLEs.

  • Response formats can be enhanced to assess skills in: 

    • differential diagnosis;

    • the use of diagnostic studies and therapeutic options;

    • writing orders for admission, discharge and other transfers of care;

    • the management and reconciling of medications;

    • and other common clinical tasks.

  • The incorporation of multimedia and enhancements to response formats implies a potential natural link to work on entrustable professional activities, bridging the gap between competencies and assessment.58,59


It will be tempting to simulate everything as realistically as possible, but much has been learned from the chequered history of written simulations (patient management problems) about the dangers inherent in that approach,29,33 which often result in wasted testing time and scoring problems. For high-stakes assessments, adopting an approach similar to that used in ‘key feature’ problems49,64 seems warranted. To improve measurement efficiency, cases should be kept short and should focus on key clinical decisions that are critical and essential steps in problem resolution and most likely to result in errors and poor patient outcomes in the real clinical environment.



To date, NLEs have almost exclusively been ‘closed-book tests’ assessing the examinee’s ability to recall information, as well as to apply it to make clinical decisions. With CBT, it is now possible to permit examinees to consult online reference materials (‘open-book tests’) during test administration; this is another characteristic of Bennett’s third generation of CBT.45 This will better mimic how doctors currently practise in the real clinical environment, and provide (indirect) assessments of examinees’ abilities to identify the limits of their own knowledge, to quickly access and understand relevant external resources, and to integrate the accessed information into patient care decisions.


OBJECTIVE STRUCTURED CLINICAL EXAMINATIONS IN NLEs


Harden and Gleeson65 first described the OSCE in this journal in 1979. Since then, the use of OSCEs has spread quite quickly, first in school-based assessments and, more recently, in national licensure and qualifying examinations in several countries, including Australia, Canada, South Korea, Switzerland, Taiwan, the UK and the USA.


Many OSCEs still use the ‘classic’ station format: examinees have 4 or 5 minutes to complete a clinical task.


Longer and more complex stations seem more appropriate for advanced trainees and for NLEs in order to assess examinees’ capacity to integrate the constellation of competencies required for the provision of safe and effective patient care.66


Given that evidence for the validity of inferences from simulation-based assessments is sparse,60,63,68 we agree with several of these authors that, firstly, there has been too much focus on increasing the fidelity of simulations and, secondly, the term ‘fidelity’ should be abandoned in favour of ‘physical resemblance’ and ‘functional task alignment’.


Unfortunately, it is still common for OSCE-based tests to include relatively small numbers of stations. As a consequence, scores and pass/fail outcomes on such tests are not very reproducible.


To make more effective use of testing resources, it may be helpful to consider the use of ‘sequential testing’, also known as ‘multi-stage’ or ‘flexi-level’ testing in the general assessment literature,69,70 with OSCEs.71,72 This approach involves the administration of a relatively short initial screening examination that is used to quickly identify examinees who will clearly pass; this group is excused from further testing, and the assessment is continued for the remaining examinees.



WORKPLACE-BASED ASSESSMENTS


The formative use of WBAs will increase; their summative use in NLEs will prove challenging.


In their very useful Association for Medical Education in Europe (AMEE) guide, Norcini and Burch74 describe methods for formative WBAs, including:

  • the mini-clinical evaluation exercise (mini-CEX), 

  • clinical encounter cards, 

  • clinical work sampling, 

  • blinded patient encounters, 

  • direct observation of procedural skills (DOPS), 

  • case-based discussion, 

  • and multi-source feedback (MSF).



Although WBAs more closely reflect everyday practice and can stimulate trainee learning, they are not without challenges. As Norcini and Burch note in their AMEE guide,74 they are best used formatively, not summatively. Recent research has also suggested some mechanisms for improving the reliability and validity of scales used in making WBA judgements by aligning them with the constructs of developing clinical sophistication and ‘entrustability’.79



Unfortunately, the summative use of WBAs can change how trainees and trainers view the assessment. Trainees may choose assessors who are seen as ‘doves’ (lenient assessors) and may pick out cases perceived to be less challenging or in areas in which they feel confident. Similarly, trainers may treat these assessments as ‘tick-box exercises’.80 To address these issues, the UK GMC proposed using two different types of WBA.81

  • One type – referred to as supervised learning events (SLEs) – would be developmental and not used to determine progress. 

  • The second type – referred to as assessments of practice (AoPs) – would be used summatively to inform judgements about trainees’ progress. 

The use of SLEs has now been incorporated into the UK Foundation Programme.81



Although WBA results may reflect real differences in trainees’ competence, there are alternative explanations, including variation in the standards used by faculty staff to make judgements, and differences in the patient populations (case mix) and resources available. Methods for risk adjustment for WBAs are particularly needed.82 However, there is no analytic way to adjust for these confounding factors: the ‘metrics’ on which WBA results are expressed are inherently tied to the clinical contexts in which they are obtained. This is not important for formative assessments, in which the goal is to stimulate learning, but it makes the direct summative use of WBA results in NLEs challenging and difficult to justify.



CONCLUDING THOUGHTS



Whether or not NLEs are cost-effective is a difficult question and probably a matter of perspective.

  • Costs undoubtedly vary substantially across countries as a function of both examination design and numbers tested.85 As an example, it is straightforward to estimate the dollar cost of the USMLE from public information, at least from the examinees’ perspective. An examinee graduating from a US school who passes each USMLE component on the first attempt would pay a total of roughly US$3200 in examination fees; the total fees paid by an analogous international medical graduate are roughly US$800 more.86–88 Even with the addition of travel expenses to test sites, these costs are quite low relative to those of medical school tuition in many countries, although they are still substantial. Combining this cost information with USMLE examinee counts shows that total examination fees, across USMLE components, amounted to approximately US$120 million in 2014.86–88


We have deliberately focused this article on initial licensure, for which the skills to be assessed are relatively homogeneous. Issues for revalidation are much more complex: as doctors’ practices differentiate and evolve, they typically become narrower. Norcini et al.89,90 and Melnick et al.91 provide cogent discussions of the challenges posed.


Clearly, NLEs are subject to many limitations. At best, they can only measure competence to practise, not whether a doctor does (or will) perform competently in practice. As a consequence, good performance on an NLE does not guarantee good performance in practice. However, poor performance on an NLE does suggest that performance in practice may fall below acceptable levels, although other factors, notably systems of care, may compensate for individual doctors’ shortcomings or exacerbate their weaknesses.


16 General Medical Council. National Licensing Examination 2014. http://www.gmc-uk.org/06_National_Licensing_Examination.pdf_57876215.pdf. [Accessed 30 March 2015.]


43 Kreiter CD, Bergus GR. Case specificity: empirical phenomenon or measurement artefact? Teach Learn Med 2007;19:378–81.


53 Holtzman KZ, Swanson DB, Ouyang W, Hussie K, Allbee K. Use of multimedia on the Step 1 and Step 2 Clinical Knowledge components of USMLE: a controlled trial of impact on item Acad Med 2009;84:90–3.


90 Norcini JJ, Lipner RS, Grosso LJ. Assessment in the context of licensure and Med 2013;25:62–7.








Med Educ. 2016 Jan;50(1):101-14. doi: 10.1111/medu.12810.

Trends in national licensing examinations in medicine.

Author information

  • (1) Academic Programmes and Services, American Board of Medical Specialties, Chicago, Illinois, USA.
  • (2) Department of Medical Education, University of Melbourne Medical School, Melbourne, Victoria, Australia.
  • (3) Leeds Institute of Medical Education, University of Leeds, Leeds, UK.

Abstract

CONTEXT:

As a contribution to this special issue commemorating the journal's 50th volume, this paper seeks to explore directions for national licensing examinations (NLEs) in medicine. Increases in the numbers of new medical schools and the mobility of doctors across national borders mean that NLEs are becoming even more important to ensuring physician competence.

OBJECTIVES:

The purpose of this paper is to explore the use of NLEs in the future in the context of global changes in medical education and health care delivery.

METHODS:

Because the literature related to NLEs is so large, we have not attempted a comprehensive review, but have focused instead on a small number of topics on which we think we have something useful to say. The paper is organised around five predicted trends for NLEs.

DISCUSSION:

The first section discusses reasons why we think the use of NLEs will increase in the coming years. The second section discusses the ongoing problem of content specificity and its implications for the design of NLEs. The third section examines the evolution of large-scale, standardised cognitive assessments in NLEs and suggests some future directions. Reflecting the fact that NLEs are, increasingly, attempting to assess more than just knowledge, the fourth section addresses the future of large-scale clinical skills assessments in NLEs, predicting both increases in their use and some shifts in the nature of the stations used. The fifth section discusses workplace-based assessments, predicting increases in their use for formative assessment and identifying some limitations in their direct application in NLEs. The concluding section discusses the cost of NLEs and indulges in some further speculations about their evolution.

© 2015 John Wiley & Sons Ltd.



