Variations in medical school graduating examinations in the United Kingdom: Are clinical competence standards comparable? (Med Teach, 2009)

Peter McCrorie & Katharine A.M. Boursicot







Introduction


Unlike many other countries (Medical Council of Canada 2007; USMLE 2007), in the UK there is no national licensing examination for the medical profession. Every School decides on its own systems of assessment, including its graduating examinations. Graduation from Medical School and the conferment of a University degree in medicine leads automatically (unless there are exceptional circumstances) to the granting of a licence to practise by the General Medical Council (GMC), the medical professional regulatory body (GMC 2007a, b). The underlying assumption here is that the examinations at all the UK Medical Schools guarantee the same levels of minimum clinical and professional competence required for licensure and practice.


There is no formal documentation process whereby all the examinations of UK Medical Schools are recorded and monitored routinely in a central data bank. There are, however, two systems of inspection currently in place relating to Medical School examinations:

  • the Quality Assurance Agency's 'external examiner' system (Quality Assurance Agency for Higher Education 2000) and

  • the GMC's Quality Assurance of Basic Undergraduate Education (QABME) process (General Medical Council 2006).


The external examiner system is a peculiarly British system of quality assurance which applies to all the UK Higher Education Institutions.

  • This system involves the visit of academic/clinicians from one Medical School, to other similar institutions to review examinations, both paper-based and clinical.

  • The course documentation is usually sent to the external examiners prior to their visit, as are examination papers.

  • The extent to which examiners comment on the papers varies widely: some examiners produce detailed, lengthy comments; others simply send brief, anodyne email messages.

  • The choice of external examiners is usually left to individuals within each Medical School, who may invite their friends or people they know and who will generally share their views.

  • The actual functions of external examiners are variable across institutions – in some cases, external examiners are required to act as examiners in clinical examinations while in others the externals simply observe the process. There is no requirement for them to observe any part of the process – merely to review the results of the assessments.

  • External examiners are required to complete reports, the quality of which varies across institutions. Some examiners send in detailed comments on the assessments, others simply tick boxes on a standard comments form (the format of which varies from institution to institution).

  • The hosting institutions can choose to modify their examinations according to the report if they wish, but they can also choose to over-ride the recommendations.

  • The system cannot be described as robust, as the processes are so variable and do not provide formal quantitative comparison of standards of graduates across Medical Schools.



The second system of inspection is the GMC’s Quality Assurance of Basic Undergraduate Education (QABME)

  • which was started in 2005 and involves a general inspection of every UK Medical School’s curriculum, teaching, facilities, student support as well as assessment.

  • This system involves a series of visits to a Medical School over a 6-month period and each School is visited once every 5–7 years.

  • New Medical Schools, or those introducing radical changes, are visited several times each year starting at least a year in advance of the commencement of the new course and continuing until the first cohort of students has graduated, 6 or 7 years later.

  • Preparation for such inspections is time-consuming and each School is validated against the GMC’s published recommendations, Tomorrow’s Doctors (General Medical Council 2003).

  • The outcome of the process is a detailed document, published on the GMC website, which summarizes the findings of the QABME process and includes a series of requirements and recommendations.

  • Any requirements imposed on a Medical School are followed up by the GMC in the next academic year.



The two processes are looking at different things.

  • The external examiner system is designed to evaluate the quality of the assessments run by each Medical School, examiners usually being required to inspect a set of examinations in a single year of the course.

  • The QABME process takes a broader perspective and looks at staffing structure, course management, staff development, curriculum content and design, student support, quality assurance processes as well as assessment design, management and implementation.

 


Both systems are qualitative in nature and do not actually compare the professional and clinical competence of students in a quantitative manner. The two systems rely entirely on the assumption that all visitors have the same internal standards. However, there has been some evidence that standards for clinical examinations do vary significantly across Medical Schools at graduation level (Boursicot et al. 2006, 2007b).





Methods


Administration of the questionnaire

A questionnaire was constructed to collect relevant information about graduation level examinations from Medical Schools and sent to all members of the Association for the Study of Medical Education (ASME) Education Research Group for validation and scoping of opinion. The questionnaire was modified according to comments received and was then sent out to the ASME Council representative of each of the 32 Medical Schools in the UK.

 

This included a number of paired Schools, who have joint Finals examinations (Leicester/Warwick; Liverpool/Lancaster; Manchester/Keele; Newcastle/Durham; Nottingham/Derby), and so the final number of respondents was 27 (100% response rate).


 

Topics covered by the questionnaire

The 13-page questionnaire was in several parts and covered the following:

  • (1) Nature of written Final assessments;

  • (2) Nature of clinical Final assessments; 

  • (3) Nature of other final year assessments (e.g. portfolios, logbooks, attachment reports, RITAs, appraisals, video assessments, audit reports, Special Study Module assessments); 

  • (4) Timing of Finals assessments; 

  • (5) Standard setting;

  • (6) Examiners and examiner training; 

  • (7) Cost; 

  • (8) Views on national examinations and shared assessments.




Results



Written assessments


At the time of carrying out the survey (November 2006), most of the UK Medical Schools used one or more of three formats to assess knowledge in Finals –

  • Extended Matching Questions, EMQs (23),

  • Single Best Answer Multiple Choice Questions, SBAs (16),

  • Short Answer Questions, SAQs/Modified Essay Questions, MEQs (14).

A few Schools still use True/False MCQs (5) or Essays (4).

 

Within this partial uniformity, there is a wide range of test formats. There was variation in:

  • the number of papers (1–6),

  • the length of each paper (1–3 h),

  • the total length of the written papers (2–12 h),

  • the time allocated per MCQ item (0.86–1.5 min per item),

  • the timing of written Finals (June of the penultimate year until June of the Final Year) and

  • the mix of formats used (e.g. from a single MCQ paper to SBAs, EMQs, MEQs and SAQs).

 

Standard setting for pass/fail decisions

  • Twenty-two Schools reported that they set passing standards for written papers using either the Angoff (Angoff 1971) or Ebel (Norcini 2003) methods.

  • Three Schools reported using the Hofstee (Norcini 2003) method.

  • Two Schools had fixed pass marks of 50% and one of 60%.
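As a rough sketch of how an Angoff pass mark for a written paper is derived (the three judges and four items below are invented for illustration, not survey data): each judge estimates the probability that a borderline candidate would answer each item correctly, and the pass mark is the sum of the per-item means.

```python
# Hypothetical Angoff standard-setting exercise.
# Each judge estimates the probability that a *borderline* candidate
# would answer each item correctly; the pass mark is the expected
# total score of that borderline candidate.

judges = [            # rows: judges, columns: items (invented values)
    [0.6, 0.7, 0.5, 0.8],
    [0.5, 0.8, 0.6, 0.7],
    [0.7, 0.6, 0.5, 0.9],
]

n_judges = len(judges)
n_items = len(judges[0])

# Average the judges' estimates item by item
item_means = [sum(row[i] for row in judges) / n_judges
              for i in range(n_items)]

# Angoff pass mark: sum of item means (out of n_items marks)
pass_mark = sum(item_means)
print(f"pass mark: {pass_mark:.2f} / {n_items}")
```

The Ebel and Hofstee procedures elicit judgements differently (Ebel weights items by relevance and difficulty; Hofstee bounds acceptable pass marks and failure rates), but all reduce to aggregating expert judgements into a single cut score.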




Clinical assessments


The vast majority of Schools assess the clinical competence of their students using an Objective Structured Clinical Examination, OSCE (25 Schools; Table 2). However, there is a wide range in:

  • the number of stations (6–48),

  • the length of each station (4, 5, 6, 7, 7.5, 10, 12, 15, 30 or 45 min),

  • the style of the OSCE [traditional circuit format, static observation of video projections or Practical Assessment of Clinical Examination Skills (PACES)-style examination] (MRCP 2008), and

  • the timing of the OSCEs.

 

There was also variation in:

  • the extent to which stations were changed between circuits,

  • what constituted a pass (achieving a set pass mark, passing a minimum number of stations, or both),

  • the style of marksheet (checklist or global),

  • the use of real and/or simulated patients and

  • the range and domains of skills tested.



Other clinical assessments in use include different formats developed for clinical performance assessment (Boursicot et al. 2007a):

  • DOPS (Direct Observation of Procedural Skills) – eight Schools, requiring demonstration of competence in 3–38 skills;

  • the mini Clinical Evaluation exercise (mini-CEX) (Norcini et al. 2003) – eight Schools, still largely in embryonic development, but averaging around six mini-CEXs per year;

  • Objective Structured Long Examination Record (OSLER) – seven Schools, the number of assessments ranging from 1 to 10, each 20–60 min long; and

  • Long Cases – five Schools, competence to be demonstrated in only two long cases.




There was much variation in the definition of what constituted a mini-CEX, an OSLER and a long case; indeed, broadly speaking, they could be considered to be much the same form of assessment.



Four Schools operated sequential testing whereby students who performed well in an OSLER, an OSCE or a series of mini-CEXs were exempt from further testing. Around two-thirds of students usually fell into this category, the remainder having double the amount of testing time, thereby increasing the reliability of the pass/fail decision.
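The reliability gain from doubling testing time is conventionally quantified by the Spearman–Brown formula; as an illustration (the starting reliability of 0.7 is assumed here, not taken from the paper), doubling the test length (k = 2) gives:

```latex
\rho_{k} = \frac{k\,\rho}{1 + (k-1)\,\rho}
         = \frac{2 \times 0.7}{1 + 0.7}
         \approx 0.82
```

so the borderline candidates who sit the extended examination are assessed with appreciably higher reliability than the original test alone would provide.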



Most Schools reported using either the borderline group or borderline regression method to set the passing standard for OSCEs (17). Other methods reported were a modified Angoff procedure (4) and Contrasting Groups (1). Three Schools used grade descriptors, one used global ratings and one used norm referencing at the level of each station with a minimum permitted number of stations failed.
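A minimal sketch of the borderline regression method, with invented data: candidates' checklist scores at a station are regressed on the examiners' global ratings, and the station pass mark is the checklist score the regression predicts for a rating of exactly "borderline".

```python
# Minimal borderline-regression sketch for one OSCE station
# (all numbers invented). Checklist scores are regressed on the
# examiners' global grades (0 = fail, 1 = borderline, 2 = pass,
# 3 = good); the pass mark is the score predicted at "borderline".

grades = [0, 1, 1, 2, 2, 2, 3, 3]         # global ratings per candidate
scores = [8, 11, 13, 15, 16, 18, 20, 22]  # checklist scores per candidate

n = len(grades)
mean_x = sum(grades) / n
mean_y = sum(scores) / n

# Ordinary least-squares fit: score = intercept + slope * grade
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(grades, scores))
         / sum((x - mean_x) ** 2 for x in grades))
intercept = mean_y - slope * mean_x

BORDERLINE = 1
pass_mark = intercept + slope * BORDERLINE
print(f"station pass mark: {pass_mark:.1f}")
```

The borderline group method mentioned alongside it is simpler still: the pass mark is the mean checklist score of the candidates actually rated borderline, with no regression involved.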




Portfolio assessments



Increasingly, Medical Schools were using portfolios in assessment. How they were used varied widely; some were in the style of a record of achievement – a collection of evidence – while some could be described as a reflective diary. Some Schools declared the portfolio a formative assessment; others summative. In two Schools, the portfolio was the only assessment in the final year of the medical course (Davis et al. 2001).




Quality assurance


The number and expertise of External Examiners at the different Schools was variable. Numbers ranged from 3 to over 30. Most were discipline-specific, although seven Schools stated one of their examiners was a medical educationalist (or a clinician with an understanding of assessment in medical education). Most schools stated that their internal examiners were trained for clinical assessments but rarely for written assessments; external examiners were generally assumed not to need training.






 


 

Discussion



An issue is whether we accept that all these different assessments guarantee the same minimum standards of clinical and professional competence at graduation from Medical School, or whether we should introduce a national licensing examination which all graduates have to pass in order to be licensed to practise. Boursicot et al. (2006) found that, in five Medical Schools, pass marks for six identical Finals OSCE stations, derived using a modified Angoff method, varied sufficiently such that students who would have passed at one Medical School would have failed at some of the others.




In 2008, a paper was published (McManus et al. 2008), which showed that graduates from different UK medical schools perform significantly differently in all parts of the MRCP (UK) examinations. One interpretation is that this is a consequence of the variation in standards required by different medical schools across the UK at graduation.



Schuwirth (2007) argued that it is dangerous to make judgements based on single-shot assessments.


 


He believes that, while single-shot assessments can be used to check competencies at particular points in time, certification of a medical student's fitness to practise needs to take into account some element of longitudinal evaluation of a student's progress. This is particularly true for those higher-order skills which are so hard to assess reliably – professionalism, scholarliness and critical thinking.



There is some merit in arguing in favour of a combination of approaches to assessment – perhaps a national licensing examination, and/or a School-based Finals examination, plus some form of in-School measurement of continuous progress, such as a portfolio, or progress tests.



The GMC set up a consultation on the introduction of a national licensing examination (GMC 2007a, b) in the UK and decided not to proceed along this path. The Tooke Report suggested the introduction of a national examination for knowledge, but not clinical skills, at the end of the first Foundation Year (1 year after graduation): this may be a reasonable compromise (Tooke 2007).



Tooke J. 2007. Aspiring to excellence: Independent inquiry into modernising medical careers. London: Universities UK.


 



Med Teach. 2009 Mar;31(3):223-9. doi: 10.1080/01421590802574581.

Variations in medical school graduating examinations in the United Kingdom: are clinical competence standards comparable?

Author information

  • Centre for Medical and Healthcare Education, 4th Floor Hunter Wing, St George's, University of London, Cranmer Terrace, London SW17 0RE, UK. Email: mccrorie@sgul.ac.uk

Abstract

BACKGROUND:

While all graduates from medical schools in the UK are granted the same licence to practise by the medical professional regulatory body, the General Medical Council, individual institutions set their own graduating examination systems. Previous studies have suggested that the equivalence of passing standards across different medical schools cannot be guaranteed.

AIMS:

To explore and formally document the graduating examinations being used in the UK Medical Schools and to evaluate whether it is possible to make plausible comparisons in relation to the standard of clinical competence of graduates.

METHODS:

A questionnaire survey of all the UK medical schools was conducted, asking for details of graduating examination systems, including the format and content of tests, testing time and standard setting procedures.

RESULTS:

Graduating assessment systems vary widely across institutions in the UK, in terms of format, length, content and standard setting procedures.

CONCLUSIONS:

We question whether it is possible to make plausible comparisons in relation to the equivalence of standards of graduates from the different UK medical schools, as current quality assurance systems do not allow for formal quantitative comparisons of the clinical competence of graduates from different schools. We suggest that national qualifying level examinations should be considered in the UK.



Graduates of different UK medical schools show substantial differences in performance on MRCP(UK) Part 1, Part 2 and PACES examinations (BMC Medicine, 2008)

IC McManus, Andrew T Elder, Andre de Champlain, Jane E Dacre, Jennifer Mollon and Liliana Chis






Background



The Education Committee of the General Medical Council (GMC), in its wide-ranging report of June 2006, Strategic Options for Undergraduate Education in the United Kingdom [1], highlighted the lack of information available to assess whether graduates from different UK universities vary significantly in the knowledge, skills or behaviours which are likely to be relevant to their future competence or performance as doctors.



The more recent Tooke Report of October 2007 has also argued, more strongly, that "a national test of knowledge" should be introduced at undergraduate level in UK medical schools, saying that "A national examination would ... encourage development within medical schools, serve as a safeguard when medical schools are developing new curricula, and ensure core knowledge and skills are taught and assessed ([2], p. 126)." 



The Membership of the Royal Colleges of Physicians (MRCP(UK)) examination is:

  • a three-stage, high-stakes, international postgraduate medical assessment, the completion of which forms a critical part of career progression for aspiring physicians in the UK, and is attempted by about 30% of all UK medical graduates.

  • Medical graduates from UK universities and elsewhere sit the first part of the examination as early as 18 months after graduation and most complete the third and final part within a further 3 years.

  • The format of the examination has been described in detail elsewhere [3-8], and details, example questions, marking schemes, etc., can be found at the examination website [9]. Briefly, the examination consists of three parts.

  • Part 1 and Part 2, which are taken sequentially, both consist of best-of-five multiple choice examinations,

    • with Part 1 concentrating on diagnosis, basic management and basic medical science,

    • while Part 2 has longer questions involving more complex data interpretation, including photographic and other visual material, and considers more in-depth issues of diagnosis and management within internal medicine.

    • Both examinations are blue-printed to cover the typical range of acute and chronic conditions presenting in the wide range of patients seen in general medical practice, and the diagnostic, therapeutic and management options which need to be considered.

    • The pass mark is set by Angoff-based criterion-referencing coupled with a Hofstee procedure.

  • The third part of the examination, Part 2 Clinical (PACES), is a clinical examination, similar in some ways to an OSCE,

    • in which candidates rotate around five 20-minute stations, seeing a range of patients and simulated patients, typically two or more at each station, and the candidates are required to interview, examine and discuss management options.

    • Two stations are devoted to communication,

      • with one emphasizing the taking of history and the communication of technical information and

      • the other looking at more difficult communication problems such as breaking bad news or asking permission to take organs for transplantation.

    • Each candidate on each case is assessed separately and independently by two trained examiners, with different examiners at each station.

    • PACES can only be taken after Part 1 and Part 2 have both been passed.





Methods



Main analysis



Additional analysis


The three parts of the examination

The formats of the Part 1, Part 2 and PACES stages of the examination were stable between 2003/2 and 2005/3.

  • The Part 1 examination comprised two separate 3-hour papers each of 100 test items in a one-answer-from-five (best-of-five) format.

  • The written examination of Part 2 comprised two separate 3-hour papers each of 100 questions in a one-answer-from-five (best-of-five) format until the last 2003 diet, when it increased to three 3-hour papers each of 90 questions.

  • The PACES examination comprised a five-station, structured clinical examination lasting 2 hours, incorporating 10 separate clinical encounters each of which was directly observed and assessed by two different and experienced clinician examiners, with each candidate being assessed by 10 examiners in total.

 

Historical changes in question format

There were three diets of Part 1, Part 2 and PACES each year.

  • From 1989/1 to 2002/1 the Part 1 examination consisted of a single paper containing 300 multiple true-false items.

  • From 2002/2 to 2003/1 the Part 1 exam consisted of a similar multiple true-false paper and a separate best-of-five exam with 100 questions.



Examination scores



Background variables




Medical schools



Compositional variables




Pre-admission qualifications




Perceptions of teaching quality




Career interest in hospital medicine




Proportion of graduates taking MRCP(UK) Part 1


In Cambridge, Oxford and Edinburgh, 40%, 40% and 38% of graduates, respectively, took MRCP(UK), compared with 27%, 24% and 23% of graduates of Liverpool, Leicester and Birmingham, respectively.




Performance at MRCGP


MRCGP (Membership of the Royal College of General Practitioners) is the principal postgraduate assessment for doctors in the UK wishing to become general practitioners.




The Guardian analyses




Statistical analysis




Results




Main analysis



Multilevel modelling


 


Effect of background variables




Correlations between examination parts




Medical school effects


 


At the medical school level, performance at Part 2 correlated significantly with performance at Part 1 (r = 0.981, p = 0.004), with the same schools as for Part 1 showing significant differences from the mean.



In the PACES examination, the correlation with performance at Part 1 and Part 2 was a little lower than that found between Part 1 and Part 2, but was also highly significant (Part 1 with PACES: r = 0.849, p = 0.0114; Part 2 with PACES: r = 0.897, p = 0.0096). Four schools performed significantly differently from average, three of which were also significant at Part 1 and Part 2 (Oxford above average, and Dundee and Liverpool below average); in addition, London also performed significantly worse than average, although London graduates had been almost precisely at the average for Parts 1 and 2.

 


 



Analysis of compositional variables


 


In this section we analyse data at the level of the 19 medical schools, and whenever phrases such as 'higher pre-admission qualifications' are used it must be emphasized that this refers to 'medical schools whose candidates have higher pre-admission qualifications' and does not mean 'individual candidates with higher pre-admission qualifications'. Correlations and structural models at the individual and school level may be similar but they need not be [20], and the analyses described here are specifically at the school level of analysis.



Although medical schools with a higher proportion of graduates taking MRCP(UK) tended to have higher pre-admission qualifications (r = 0.833, p = 0.001, n = 19), there was a weaker correlation between a medical school's performance at MRCP(UK) and the proportion of its graduates taking the exam (r = 0.613, p = 0.005, n = 19).




The proportion of graduates taking MRCP(UK) did not predict outcome after pre-admission qualifications were taken into account (β = -0.175, p = 0.559), whereas pre-admission qualifications did predict outcome after taking into account the proportion of graduates taking MRCP(UK) (β = 0.928, p = 0.006). There is therefore no independent effect of the proportion of a school's graduates taking MRCP(UK).




The relationship between all of the variables and Part 1 performance was examined using multiple regression, and only pre-admission qualifications predicted performance at MRCP(UK).


 


Performance in relation to the Guardian assessments



Table 2 shows correlations between the variables reported in the two compilations of data by the Guardian, and outcome at Part 1, Part 2 and PACES. The highest correlations, for both sets of data, are with the entry scores, which are based on university admission criteria. Using a forward-entry multiple regression, in which the entry score based on the 2003–2004 data was entered first, no other variables apart from university admission criteria were significant predictors of Part 1, Part 2 or PACES performance.

 

 


 

 

 




 

Additional analysis

 


 


 



Discussion

 



Our analysis shows that candidates who have trained at different UK medical schools perform differently in the MRCP(UK) examination. In 2003–2005, 91%, 76% and 67% of students from Oxford, Cambridge and Newcastle passed Part 1 at their first attempt, compared with 32%, 38%, 37% and 41% of Liverpool, Dundee, Belfast and Aberdeen graduates, so that, for instance, twice as many Newcastle graduates pass the exam first time compared with Liverpool graduates (odds ratio = 4.3).
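The quoted odds ratio can be checked directly from the two first-attempt pass rates (Newcastle 67%, Liverpool 32%):

```latex
\mathrm{OR}
= \frac{p_{\text{Newcastle}}/(1-p_{\text{Newcastle}})}{p_{\text{Liverpool}}/(1-p_{\text{Liverpool}})}
= \frac{0.67/0.33}{0.32/0.68}
\approx \frac{2.03}{0.47}
\approx 4.3
```

A roughly two-fold difference in pass rates thus corresponds to a 4.3-fold difference in odds, because odds amplify differences in proportions near the middle of the scale.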



At the medical school level, performance at Part 1 correlates almost perfectly with performance at Part 2 (and both are multiple-choice examinations), while performance at PACES, which is a clinical examination, still correlates highly with Parts 1 and 2, although there are some small changes in rank order.



School-leaving examinations are known at the individual level to predict performance in undergraduate medical examinations and in postgraduate careers [23,24]. Although pre-admission academic qualifications correlate significantly with MRCP(UK) Part 1 performance at the medical school level (r = 0.779), that correlation is substantially less than the correlation found between Part 1 and Part 2 of the examination (r = 0.992). Pre-admission qualifications therefore account for about 62% of the accountable variance, leaving about 38% of the school-level variance dependent on other, unknown, factors. It should be emphasized that because sex and ethnic origin have been entered into the multilevel model at an individual level, there can be no differences at medical school level attributable to ethnicity or sex.
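The 62%/38% split can plausibly be reconstructed from the two correlations quoted in this paragraph, treating the near-perfect Part 1–Part 2 correlation as the ceiling on reliably 'accountable' school-level variance:

```latex
\frac{r_{\text{pre-admission}}^{2}}{r_{\text{Part1,Part2}}^{2}}
= \frac{0.779^{2}}{0.992^{2}}
= \frac{0.607}{0.984}
\approx 0.62,
\qquad 1 - 0.62 = 0.38
```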



There are at least three broad types of explanation for the differences we have found: differences in those entering the schools (selection effects); differences in education or training at the school (training effects); or differences owing to students from different schools preferring different postgraduate careers (career preference effects).



At the individual level it is known that A-level results correlate with performance in MRCP(UK) Part 1 [24], and there are also clear differences in the average pre-admission qualifications of applicants receiving offers at different medical schools (see Figure 2). Our analysis of compositional variables leaves little doubt that one-half or more of the variance between schools can be explained by differences in intake, and that is supported by the correlations found with the data reported in the Guardian tables.



In particular, MRCP(UK) performance is about one SD higher than predicted from pre-admission qualifications alone for Leicester, Oxford, Birmingham, Newcastle-upon-Tyne and London, and about one SD lower than expected for Southampton, Dundee, Aberdeen, Liverpool and Belfast. Differences in pre-admission qualifications cannot, however, explain the relative under-performance of London graduates at PACES, compared with Part 1 and Part 2.




Career preference effects would occur if the differential performance of graduates on MRCP(UK) reflects a form of self-selection into different specialities (Parkhouse reported, for instance, that amongst those qualifying between 1974 and 1983, hospital medicine was particularly popular for Oxford, London and Wales graduates, and particularly unpopular for Aberdeen, Dundee and Leicester graduates [25]).



If popularity also equated to status and kudos, then it might be that the most academically gifted students at one school might prefer to go into one particular speciality, whereas at another school they might prefer a different speciality. However, the correlation of performance and the proportion taking the exam was non-significant after pre-admission qualifications were taken into account.




Medical schools may differ in how much 'value' they add, an effect well known in secondary education.

Institutions can differ in the amount of 'value' that they add, an effect well known in secondary education [26].


If career preferences and pre-admission qualifications cannot explain all of the differences, the logical conclusion is that medical schools differ in the quality of their training.

If career preferences and pre-admission qualifications cannot explain all of the differences between medical schools, then a reasonable conclusion is that medical schools also differ in the quality of their training in general medicine.


However, none of the teaching-related measures in the Guardian data correlated with MRCP performance.

However, it is of interest that none of the teaching-related measures in the Guardian compilations correlate with MRCP(UK) performance.



The MRCP examinations are taken early in the career. In our study, evidence that medical school teaching affects performance comes from the finding that recency of graduation predicted performance in all three parts. The coefficient for between-school differences was largest for Part 1 and smallest for PACES, suggesting that the effect of undergraduate education is diluted over time.

The MRCP(UK) examinations are typically taken early in the career. The impact of university teaching on performance is supported by our finding that recency of graduation is a predictor of performance in all three parts of the examination. The coefficient of variation for medical school differences was largest for Part 1 and smallest for PACES, suggesting that postgraduate education dilutes the effects of undergraduate training as time passes.


It is interesting that the more a school's students reported the teaching of medicine to be 'very interesting', the better its graduates did at MRCP. This effect, however, appears secondary to pre-admission qualifications: students at schools with higher pre-admission qualifications also reported the teaching of medicine to be more interesting.

It is interesting that when a university's students are more likely to report that the teaching of medicine is 'very interesting', then graduates subsequently perform better at MRCP(UK). However, that effect does seem to be secondary to pre-admission qualifications, with students from schools with higher pre-admission qualifications also reporting the teaching of medicine to be more interesting.


Another confounder for all medical schools is constant curricular change. However, our additional analysis of Part 1 data, going back to 1989, shows that these results are long-standing and can only partly be explained by the changes in medical education initiated by the GMC's Tomorrow's Doctors.

An additional confounding issue for all schools of medicine is the constant change in curricula. However, our additional analysis of Part 1 data going back to those taking the exam in 1989 (who would have entered medical school in about 1982) shows that the broad pattern of results we have found is long-standing, and therefore could only partly be explained by the changes in medical education initiated by the GMC in Tomorrow's Doctors in 1993 [27].


A more detailed analysis of individual schools shows that relative performance varied little between 1989 and 2005. Among schools that introduced PBL, the effect was evident in some but not others. Despite much-criticised reorganizations, London's performance improved overall. Oxford and Cambridge showed sudden performance increases in the late 1990s, as did Wales. Other schools fluctuated somewhat, but the overwhelming impression is one of constancy, suggesting that curricular and other changes have had little effect on relative performance.

A detailed examination of individual medical schools (see Figures S11a-11e in additional file 1) shows that for many schools there has been little variation in relative performance between 1989 and 2005. Problem-based learning, introduced in Glasgow, Liverpool and Manchester, has had little obvious impact in the latter two schools, although performance did increase in Glasgow. Despite many, much criticised reorganizations in London, performance overall has improved. Oxford and Cambridge both showed sudden increases in performance in the late 1990s, as did Wales. Other schools showed fluctuations, but the overwhelming impression is of constancy rather than change, suggesting that curricular and other changes have had little impact on relative performance of schools.


The MRCP comprises written and clinical examinations; no examination can assess the entire range of knowledge, skills and attitudes required of a physician, although it covers diagnosis and management within internal medicine comprehensively, and PACES assesses a wide range of practical skills.

The MRCP(UK) consists of both written and clinical examinations, and detailed analyses of its rationale and behaviour have been presented elsewhere [3-8]. Of course, the examination does not assess the entire range of knowledge, skills and attitudes necessary to be a successful physician, although it does cover diagnosis and management within internal medicine comprehensively, and the PACES examination assesses a wide range of practical skills.


However, the MRCP cannot assess all of the necessary competencies, and some of those not assessed may also be inculcated better by some medical schools than others.

However, MRCP(UK) cannot assess all of the necessary competencies and it is possible that some of those not assessed are also inculcated better by some medical schools than others, and this possibility must await further evidence from other sources.


 


 

Conclusion



 

The Tooke Report

The Tooke Report of October 2007 [2] stated that British medical education urgently needed,


" ... answers to some fundamental questions. How does an individual student from one institution compare with another from a different institution? Where should that student be ranked nationally? Are there any predictors for later career choices and are these evident in undergraduate training? Which medical schools' students are best prepared for the Foundation Years and, crucially, what makes the difference?" ([2], p. 127)



 

The earlier GMC report also discussed the need for a national assessment but noted the very limited evidence of differences between medical schools. However, absence of evidence is not evidence of absence, and there is good reason to believe that schools differ; in the US, for instance, graduates of different medical schools differ in their likelihood of malpractice claims.

The earlier GMC report of June 2006, Strategic Options for Undergraduate Medical Education [1], had also included a discussion on the potential need to introduce a national medical assessment to ensure that all UK medical graduates have attained an agreed minimum standard of competence. The report also highlighted the very limited evidence that existed to support the contention that significant differences in ability existed between graduates of different UK universities. However, an absence of evidence is not evidence of absence, and there are many reasons to believe that schools might differ [28]; a study in the US, for instance, found that graduates of different medical schools differed in their likelihood of malpractice claims [29]. We believe that our data provide a prima facie case that differences in performance exist between UK medical schools, and thus support the case for the routine collection and audit of performance data of UK medical graduates at all postgraduate examinations, as well as the introduction of a national licensing examination.


 


BMC Med. 2008 Feb 14;6:5. doi: 10.1186/1741-7015-6-5.

Graduates of different UK medical schools show substantial differences in performance on MRCP(UK) Part 1, Part 2 and PACES examinations.

Author information

  • 1Department of Psychology, University College London, Gower Street, London WC1E 6BT, UK. i.mcmanus@ucl.ac.uk

Abstract

BACKGROUND:

The UK General Medical Council has emphasized the lack of evidence on whether graduates from different UK medical schools perform differently in their clinical careers. Here we assess the performance of UK graduates who have taken MRCP(UK) Part 1 and Part 2, which are multiple-choice assessments, and PACES, an assessment using real and simulated patients of clinical examination skills and communication skills, and we explore the reasons for the differences between medical schools.

METHOD:

We perform a retrospective analysis of the performance of 5827 doctors graduating in UK medical schools taking the Part 1, Part 2 or PACES for the first time between 2003/2 and 2005/3, and 22,453 candidates taking Part 1 from 1989/1 to 2005/3.

RESULTS:

Graduates of UK medical schools performed differently in the MRCP(UK) examination between 2003/2 and 2005/3. Part 1 and Part 2 performance of Oxford, Cambridge and Newcastle-upon-Tyne graduates was significantly better than average, and the performance of Liverpool, Dundee, Belfast and Aberdeen graduates was significantly worse than average. In the PACES (clinical) examination, Oxford graduates performed significantly above average, and Dundee, Liverpool and London graduates significantly below average. About 60% of medical school variance was explained by differences in pre-admission qualifications, although the remaining variance was still significant, with graduates from Leicester, Oxford, Birmingham, Newcastle-upon-Tyne and London overperforming at Part 1, and graduates from Southampton, Dundee, Aberdeen, Liverpool and Belfast underperforming relative to pre-admission qualifications. The ranking of schools at Part 1 in 2003/2 to 2005/3 correlated 0.723, 0.654, 0.618 and 0.493 with performance in 1999-2001, 1996-1998, 1993-1995 and 1989-1992, respectively.

CONCLUSION:

Candidates from different UK medical schools perform differently in all three parts of the MRCP(UK) examination, with the ordering consistent across the parts of the exam and with the differences in Part 1 performance being consistent from 1989 to 2005. Although pre-admission qualifications explained some of the medical school variance, the remaining differences do not seem to result from career preference or other selection biases, and are presumed to result from unmeasured differences in ability at entry to the medical school or to differences between medical schools in teaching focus, content and approaches. Exploration of causal mechanisms would be enhanced by results from a national medical qualifying examination.

PMID: 18275598 · PMCID: PMC2265293 · DOI: 10.1186/1741-7015-6-5


Impact of item-writing flaws in multiple-choice questions on student achievement in high-stakes nursing assessments (Med Educ, 2008)

Marie Tarrant1 & James Ware2





INTRODUCTION


If properly constructed, MCQs can test higher levels of cognitive reasoning and can discriminate between high- and low-achieving students. In practice, however, in-house MCQs are often poorly constructed, because few faculty members have adequate education and training in developing high-quality MCQs.

If properly constructed, MCQs are able to test higher levels of cognitive reasoning and can accurately discriminate between high- and low-achieving students.1,3 The reality, however, is that MCQs on many cross-discipline examinations developed in-house are poorly constructed because few teaching faculty have adequate education and training in developing high-quality MCQs.


 

Although guidelines for constructing high-quality MCQs are clearly set out in numerous publications, violations of these guidelines are common.

Although guidelines for constructing high-quality MCQs have been clearly outlined in numerous publications,11–16 violations of these guidelines are nonetheless common.


Item-writing flaws (IWFs) can affect students' performance on MCQs by making items more or less difficult. Although the impact of only a few IWFs has been empirically evaluated, experts broadly agree that IWFs do affect student performance.

Item-writing flaws can affect student performance on MCQs, making items either more or less difficult to answer.12,16,17 Although the impact of only a few item-writing flaws has been empirically evaluated,18–22 experts agree that item-writing flaws do affect student performance.


 

Some IWFs make items easier.

Some flaws, such as

  • the use of absolute terms (e.g. always, never),

  • the use of  all of the above ,

  • making the correct option the longest or most detailed,

  • using word repeats or logical clues in the stem as to the correct answer, and

  • grammatical clues,

...cue the examinee to the correct answer and make items less difficult.12,15–17,20

 

 

Other IWFs make items more difficult.

Furthermore, experts recommend against

  • using items with negatively worded stems (i.e. not, except),

  • unfocused or unclear stems,

  • gratuitous or unnecessary information in the stem, and

  • the  none of the above  option

...as these formats can make questions more difficult.16,18,22

 

 

Complex or K-type items should be avoided because they can be confusing and allow examinees to answer using only partial information.

Complex or K-type item formats that have a range of correct responses and require examinees to select from combinations of responses should also be avoided as they can be confusing and allow examinees to answer questions based on partial information.23,24


On this topic, Downing assessed the quality of examinations given to US medical students and found that 33–46% of MCQs were flawed. As a consequence, 10–25% of examinees classified as failures would have passed had the flawed items been removed.

In the only published studies on this topic, Downing7,8 assessed the quality of examinations given to medical students in a US medical school and found that 33–46% of MCQs were flawed. As a consequence, 10–25% of examinees who were classified as failures would have passed if flawed items had been removed from the tests.7,8





METHODS




MCQs were collected from an undergraduate nursing programme over a 5-year period.

As part of a larger study5 examining the quality of MCQs, we retrieved all high-stakes tests and examinations containing MCQs that had been administered in an undergraduate nursing programme over a 5-year period from 2001 to 2005 (n = 121).


Exclusion criteria

We eliminated tests without item analysis data (n = 54), all of which were administered prior to 2003. We also removed tests that were not summative assessments (n = 10) as these were not considered to be high-stakes tests. To ensure enough flawed items on each test analysed, we removed tests with < 50 items (n = 42). Finally, we removed tests with unacceptably low reliability (r < 0.50) (n = 5). Although higher reliability (r > 0.70) is desirable in high-stakes, classroom-type assessments,25 0.50 is sufficient reliability to allow researchers to draw meaningful conclusions about individual achievement.7 Thus, 10 tests were available for analysis.


Procedure for assessing MCQ quality

Procedures for assessing the quality of the MCQs were rigorous and have been described in detail elsewhere.5 Briefly, MCQs were reviewed for the presence or absence of 32 commonly identified violations11–16 of item-writing guidelines by a 4-person consensus panel consisting of expert clinicians and trained item writers. Items were classified as 'flawed' if they contained ≥ 1 of the assessed violations of good item writing. Items were classified as 'standard' if they did not contain any of the assessed item-writing violations.8 In total, 15 of the 32 common violations were found in the reviewed papers (Appendix). Each panel member reviewed each question independently; discordance on violations occurred on 13.1% (n = 87) of the items. These items were further discussed until consensus was reached among panel members. Item analysis data were retrieved only after all item classification was complete.


Two separate scales were computed

For each test, 2 separate scales were computed:

  • a total scale (flawed items included), which reflected the characteristics of the test as it was administered, and

  • a standard scale (flawed items excluded), which reflected the characteristics of a hypothetical test that included only the unflawed items.8
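The two scales can be sketched in code. This is a minimal illustration, not the study's own item-analysis software: the 0/1 response matrix, the flagged-item indices and the examinee counts below are all invented, and only the 50% pass mark follows the programme described later in the text.

```python
# Sketch of the 'total' scale (all items) vs. the 'standard' scale
# (unflawed items only), scored as percentages from a 0/1 response matrix.
# All data below are invented for illustration.

def scale_scores(responses, flawed):
    """Percent scores per examinee on the total and standard scales."""
    n_items = len(responses[0])
    keep = [j for j in range(n_items) if j not in flawed]  # unflawed items
    total = [100.0 * sum(row) / n_items for row in responses]
    standard = [100.0 * sum(row[j] for j in keep) / len(keep)
                for row in responses]
    return total, standard

def pass_rate(scores, cut=50.0):
    """Proportion of examinees at or above the criterion-referenced cut."""
    return sum(s >= cut for s in scores) / len(scores)

# 4 hypothetical examinees x 5 items; items 3 and 4 flagged as flawed.
resp = [
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 1, 0, 1],
]
total, standard = scale_scores(resp, flawed={3, 4})
```

Removing the flawed items changes both individual percent scores and the pass rate, which is exactly the total-versus-standard comparison the study reports.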

 

Characteristics of the included tests

All items were 4- option, single-best answer questions with no penalty for incorrect answers. All tests were pencil-and-paper format and were completed in-person by examinees. No computer-based tests were assessed. Test results were computed using optical scanning sheets and a customised software program. In the undergraduate nursing programme, criterion-referenced assessment is used and pass scores are set at 50%.



The following data were computed

For the 2 scales assessed in this study (total and standard), the following data were computed:

  • mean item difficulty;

  • mean item discrimination;

  • raw test scores;

  • percent scores;

  • Kuder-Richardson 20 reliability (KR-20);

  • the number and proportion of examinees passing, and

  • the number and proportion of high-achieving examinees (those scoring ≥ 80%).
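For reference, the KR-20 reliability named above can be computed directly from a 0/1 response matrix. This is the standard textbook formula (using population variance), not code from the study, and the tiny matrix is invented:

```python
# Kuder-Richardson 20 (KR-20): internal-consistency reliability for
# dichotomously scored tests. k = number of items, p_j = proportion
# correct on item j, var_t = variance of total scores.

def kr20(responses):
    k = len(responses[0])
    n = len(responses)
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    totals = [sum(row) for row in responses]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance
    return (k / (k - 1)) * (1 - sum(pj * (1 - pj) for pj in p) / var_t)

# Tiny invented example: 3 examinees x 2 items.
r = kr20([[1, 1], [1, 0], [0, 0]])
```

With real tests (50+ items, as in this study), values of 0.50-0.70+ are the range discussed in the exclusion criteria above.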

 

To enhance comparisons, we also computed the mean item difficulty and mean item discrimination for flawed items but did not calculate total scores for this scale.

 

  • Item difficulty is the proportion of examinees answering the question correctly, with lower values reflecting more difficult questions.26

  • Item discrimination is computed using the point-biserial correlation coefficient, or the correlation between the item and total test score.26 Item discrimination is a measure of how effectively an item discriminates between high- and low-ability students.13 Items with higher discrimination values are more desirable.

 

Item-analysis was conducted using IDEAL 4.1, an item-analysis program (IDEAL-HK, Hong Kong, China).27 All other data analysis was conducted using STATA Version 9.2 (Stata Corporation Inc., College Station, TX, USA).28



RESULTS




DISCUSSION


The tests reviewed contained an unacceptably high proportion of flawed items.

There was an unacceptably high level of flawed items in the tests we reviewed.


Well-constructed MCQs are time-consuming and difficult to write. If teachers lack the necessary skills, the institution has a responsibility to provide appropriate training so that valid and reliable assessments are possible. Research has shown that training substantially improves MCQ quality.

Well constructed MCQ items are time-consuming and difficult to write. Therefore, if teachers responsible for assessment and evaluation lack the necessary skills, it is the responsibility of the academic institutions that employ them to provide the necessary training and instruction to enable them to develop valid and reliable assessments.29 Research has shown that training substantially improves the quality of MCQs developed by teaching faculty.6,9 



Overall, the results of this study show a complex relationship between flawed items and student achievement. First, mean difficulty scores show that flawed items were not substantially more or less difficult than standard items.

Overall, the results of this study show a complex interaction between flawed items and student achievement. First, mean difficulty scores show that flawed items were not substantially more or less difficult over the 10 tests than were standard items.


This is not surprising given the varying effects IWFs can have on students' responses to MCQs and the differing frequencies of the various flaws.

This is not surprising given the varying effects that flawed items can have on student responses on MCQs and the different frequencies of various item-writing flaws on the tests examined.



Second, although flawed items were not substantially less difficult across the 10 tests, more examinees passed the total scale than the standard scale, indicating that borderline students benefited from flawed items.

Second, although flawed items across all 10 tests were not substantially less difficult, more examinees were able to pass the total scales compared with the standard scales (94.5% versus 90.9%). This indicates that borderline students benefit from flawed items.


Third, fewer students scored 80% or above on the total scale than on the standard scale, showing that flawed items negatively affect high-achieving students. These students likely rely on knowledge and reasoning rather than testwiseness in high-stakes tests, and so are unfairly penalised by flawed items. These findings illustrate the impact of construct-irrelevant variance (CIV) on student achievement.

  • Testwiseness refers to 'behaviours that allow examinees to guess or deduce correct answers without knowing the material, thereby increasing their test scores'.17

  • CIV refers to 'the introduction of extraneous variables (i.e. item-writing flaws, test-wiseness) that are irrelevant to the construct being measured and which can increase or decrease test scores for some or all examinees'.17,30

Third, fewer examinees scored ≥ 80% on the total scales when compared with the standard scales (14.6% versus 21.0%), demonstrating that flawed test items negatively impact high-achieving students. These students may be more likely to rely on knowledge and reasoning to answer questions on high-stakes assessments and less likely to rely on test-wiseness, and thus are unfairly penalised when questions are flawed. Test-wiseness refers to behaviours that allow examinees to guess or deduce correct answers without knowing the material, thereby increasing their test scores.17 These findings clearly illustrate the impact of construct-irrelevant variance on student achievement. Construct-irrelevant variance refers to the introduction of extraneous variables (i.e. item-writing flaws, test-wiseness) that are irrelevant to the construct being measured and which can increase or decrease test scores for some or all examinees.17,30


The flawed items reviewed in this study also had lower discriminating power. When borderline students' scores are artificially inflated and high-achieving students' scores are lowered, an assessment loses its discriminating power and differentiates student achievement less well.

The flawed test items reviewed in this study had lower discriminating power than did standard items. When test scores for borderline students are artificially inflated and scores for high-achieving students are lowered, assessments lose their discriminating power and there is less differentiation in student achievement.


These findings differ from previous research, which found that flawed items negatively affect pass rates. This is unsurprising given the wide variation in the types and frequencies of IWFs. What is consistent with other findings is that flawed items perform worse than unflawed items and negatively affect student achievement and discrimination.

Findings from this study differ from those of previous research that has found flawed items to negatively affect examinee pass rates.7,8 This is not surprising, however, when we consider the wide variation in types of item-writing violations and the differing frequencies of these violations on various tests and examinations. What is consistent with other findings is that flawed items perform worse than unflawed items and negatively affect student achievement and discrimination.


In professional programmes such as nursing, teachers are accountable to many stakeholders, including licensing bodies and the public; student performance on high-stakes tests therefore has serious consequences for both examinees and patients.

In professional programmes such as nursing, teachers are accountable to many stakeholders, including licensing bodies and the public.2,31 Thus student performance on high-stakes tests can have serious consequences for both examinees and patients.17



CONCLUSIONS


 

Unfortunately, IWFs are all too common across many disciplines. We have shown the unintended consequences of flawed items for borderline students, and also their effect on high-achieving students, which had not been demonstrated before. One might assume that if IWFs benefit borderline students they would raise everyone's scores, but our findings suggest otherwise.

The presence of item-writing violations is unfortunately all too common in teacher-developed examinations across many disciplines. We have shown the unintended consequences of using flawed items for borderline students. We also examined the impact of flawed items on high-achieving students, something that has not been done previously. One might naturally assume that if item-writing flaws benefit borderline students, they would benefit all students, raising test scores for everyone. Our findings suggest otherwise.





7 Downing SM. Construct-irrelevant variance and flawed test questions: do multiple-choice item-writing principles make any difference? Acad Med 2002;77 (Suppl):103–4.


8 Downing SM. The effects of violating standard item-writing principles on tests and students: the consequences of using flawed test items on achievement examinations in medical education. Adv Health Sci Educ 2005;10:133–43.


9 Jozefowicz RF, Koeppen BM, Case S, Galbraith R, Swanson D, Glew RH. The quality of in-house medical school examinations. Acad Med 2002;77:156–61.


11 Case SM, Swanson DB. Constructing Written Test Questions for the Basic and Clinical Sciences, 3rd edn. Philadelphia: National Board of Medical Examiners 2001;19–29.


16 Haladyna TM, Downing SM, Rodriguez MC. A review of multiple-choice item-writing guidelines for classroom assessment. Appl Meas Educ 2002;15:309–34.


17 Downing SM. Threats to the validity of locally developed multiple-choice tests in medical education: construct-irrelevant variance and construct under-representation. Adv Health Sci Educ 2002;7:235–41.




Med Educ. 2008 Feb;42(2):198-206. doi: 10.1111/j.1365-2923.2007.02957.x.

Impact of item-writing flaws in multiple-choice questions on student achievement in high-stakes nursing assessments.

Author information

  • 1Department of Nursing Studies, Faculty of Medicine, University of Hong Kong, Hong Kong, China. tarrantm@hku.hk

Abstract

CONTEXT:

Multiple-choice questions (MCQs) are frequently used to assess students in health science disciplines. However, few educators have formal instruction in writing MCQs and MCQ items often have item-writing flaws. The purpose of this study was to examine the impact of item-writing flaws on student achievement in high-stakes assessments in a nursing programme in an English-language university in Hong Kong.

METHODS:

From a larger sample, we selected 10 summative test papers that were administered to undergraduate nursing students in 1 nursing department. All test items were reviewed for item-writing flaws by a 4-person consensus panel. Items were classified as 'flawed' if they contained > or = 1 flaw. Items not containing item-writing violations were classified as 'standard'. For each paper, 2 separate scales were computed: a total scale which reflected the characteristics of the assessment as administered and a standard scale which reflected the characteristics of a hypothetical assessment including only unflawed items.

RESULTS:

The proportion of flawed items on the 10 test papers ranged from 28-75%; 47.3% of all items were flawed. Fewer examinees passed the standard scale than the total scale (748 [90.6%] versus 779 [94.3%]). Conversely, the proportion of examinees obtaining a score > or = 80% was higher on the standard scale than the total scale (173 [20.9%] versus 120 [14.5%]).

CONCLUSIONS:

Flawed MCQ items were common in high-stakes nursing assessments but did not disadvantage borderline students, as has been previously demonstrated. Conversely, high-achieving students were more likely than borderline students to be penalised by flawed items.



OSCE FOR THE MEDICAL LICENSING EXAMINATION IN KOREA (Kaohsiung J Med Sci, 2008)

Yoon-seong Lee

Office of Medical Education, Seoul National University College of Medicine, Seoul, Korea.




BACKGROUND


Korea's NHPLEB is akin to the NBME in the USA and is responsible for the Skills and Attitude Test of the Medical Licensing Examination. As in Taiwan, the KMLE has been a written examination for more than 60 years and is given once a year.

The National Health Personnel Licensing Examination Board (NHPLEB), which is akin to the National Board of Medical Examiners (NBME) in the USA, is responsible for the Skills and Attitude Test of the Medical Licensing Examination. As in Taiwan, the Korean Medical Licensing Exam has been a written exam for more than 60 years, and is usually given once a year.


In 2006, Korea's Ministry of Health and Welfare announced the introduction of a skills test. The first skills test for the medical licence was to take place in late 2009, and the Government's announcement was made at the request of the KSME and the NHPLEB. Experience with a skills test had been accumulating since 2004, when one was introduced for foreign-trained doctors seeking a Korean licence. Some suspected the Government intended to raise a barrier against foreign doctors, but the aim was to ensure the clinical competence of foreign graduates. The NHPLEB's skills-test task force presented an outline of the OSCE in December 2005.

In June 2006, the Minister of Health and Welfare of the Korean Government declared that the skill test will be applied, starting with graduates in 2009. This means that the first Skill Test for Medical License will take place in late 2009 or January 2010. The Minister's declaration was made at the request of the KSME and the NHPLEB. We have also had some experience with a skill test since 2004, which has been used for foreign doctors who want a Korean license [3]. Some people raised the possibility that the Government intended to create a barrier against foreign doctors coming to our country. However, it was introduced to ensure the clinical competence of foreign graduates. The Task Force Team for the Skill Test of NHPLEB prepared and presented the outline of an OSCE in December 2005 [4].

 

 


OUTLINE


The outline is as follows:



1. The Medical Licensing Examination will consist of Clinical Skills (CS) and Medical Knowledge (MK, including basic and clinical medical sciences) examinations. The examinee who passes the CS examination (September–November 2009) will be eligible for the MK examination (January 2010).


2. The OSCE will have 12 stations (10 minutes per station). 


3. Of the 12 stations, six stations will have a standardized patient (SP) and will assess clinical skills as well as communication skills and clinical reasoning. Each station will take 10 minutes.


4. Six stations are for procedural skills, such as simple sutures, venous blood sampling, and measuring blood pressure.


5. A center will have two sets of 12 stations and will run three cycles per day, which will cover 72 examinees per day. To examine up to 3,600 examinees a year, it will take approximately 50 working days.


6. The assessor will be a professor or a physician with a checklist in each station. The cut-off level will be determined by the modified Angoff method.


7. For a single test day, 28 SPs (seven SPs ×2 turns/ day×2 sets) and 28 assessors will be needed.
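As a rough illustration of how an Angoff-type cut-off (item 6 above) works: each judge estimates the probability that a borderline, minimally competent examinee would succeed on each checklist item, and the cut score is the average of those estimates. The panel size and ratings below are invented, and the details of the NHPLEB's "modified" variant (e.g. whether judges iterate or see performance data) are not specified in this outline.

```python
# Angoff-style standard setting sketch: cut score = mean of the judges'
# borderline-examinee probability estimates, expressed as a percentage.

def angoff_cut(judgements):
    """judgements[j][i] = judge j's estimate for item i; returns a % cut."""
    per_judge = [sum(row) / len(row) for row in judgements]  # mean per judge
    return 100.0 * sum(per_judge) / len(per_judge)

# Three hypothetical judges rating a 5-item station checklist:
ratings = [
    [0.8, 0.6, 0.7, 0.5, 0.9],
    [0.7, 0.5, 0.6, 0.6, 0.8],
    [0.9, 0.6, 0.8, 0.4, 0.8],
]
cut = angoff_cut(ratings)  # percentage cut score for this station
```

In practice each station's checklist would be rated this way and the station or test-level cut derived from the pooled estimates.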


 

Figure 2. Scheme of the objective structured clinical examination stations.


OSCE STATIONS


Applicants will spend 10 minutes in the long station, and 5 minutes in the short station. An additional 5 minutes will be for the interstation written exercise. Twelve applicants will be tested in each cycle. Six will start with long stations and six with short stations, with 120 minutes for a cycle and 5 minutes for a break.


For the attitudes and skill testing (long stations), we have developed 56 clinical situations. The list of clinical situations or settings is now available to the medical schools and students, but the checklist will not be available (Table 1).


The applicants will be assessed on their competencies of communication skills, interviewing skills, history-taking, brief physical examination, and ordering laboratory tests. The assessor will be a professor or a physician who has been trained. The assessor will be at each station and will use a structured checklist.


For the short stations assessing basic procedural skills, we have chosen 40 items such as simple suture of laceration, rectal examination, Foley catheter insertion, application of splint (Table 2). The short station will be of 5 minutes duration.


The interstation exam is optional, and will be associated with the preceding long station. Questions will be related to the clinical situation presented by the SP, and will focus on decision-making, differential diagnosis, further diagnostic plan, and patient management, for example. It is a written test and will take 5 minutes. If there is no interstation assessment, the applicant will take a 5-minute rest before the basic skill test.




TEST DAY SCHEDULE



The test center will have two sets of identical stations. Each set will run 3 cycles/day (morning, midday, afternoon). Each cycle will take 3 hours (180 minutes); 30 minutes for orientation before the test + 120 minutes for the actual test + 5-minute break + 25 minutes for changes.


The OSCE center will have two sets of 12 stations, running 3 cycles/set/day, and 12 applicants for a cycle. Therefore, the center can assess 72 applicants each day.


There are usually fewer than 3,600 applicants each year. Accordingly, we estimate about 50 test days are required between September and November.


SPS AND ASSESSORS


Each SP will work for half a day, i.e. an SP will be in the test for one half cycle, 18 applicants a day, two turns a day. Each cycle involves six long stations with an SP. Therefore, for each test day, 14 SPs are needed (six stations with one reserve SP×2 turns).


Twenty-six assessors (or raters) are needed each day. An assessor is present at each of the 12 stations, with one reserve assessor, and two turns per day.
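The throughput and staffing arithmetic in this and the preceding section can be checked directly. The sketch below follows the figures given here (one reserve SP and one reserve assessor per turn, two turns per day); note that item 7 of the outline quotes slightly different SP/assessor totals, so these counts should be read as one of the two versions in the source.

```python
import math

SETS = 2                  # two identical sets of 12 stations
CYCLES_PER_DAY = 3        # morning, midday, afternoon
APPLICANTS_PER_CYCLE = 12

per_day = SETS * CYCLES_PER_DAY * APPLICANTS_PER_CYCLE  # applicants per day
days_needed = math.ceil(3600 / per_day)                 # days for 3,600 applicants

# Per test day, as described in this section:
# 6 SP long stations + 1 reserve SP, 2 turns; 12 assessed stations
# + 1 reserve assessor, 2 turns.
sps_per_day = (6 + 1) * 2
assessors_per_day = (12 + 1) * 2
```

This reproduces the 72 applicants/day and roughly 50 test days quoted above, and the 14 SPs and 26 assessors quoted in this section.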



 

 





Kaohsiung J Med Sci. 2008 Dec;24(12):646-50. doi: 10.1016/S1607-551X(09)70030-0.

OSCE for the Medical Licensing Examination in Korea.

Author information

  • 1Office of Medical Education, Seoul National University College of Medicine, Seoul, Korea. yoonslee@snu.ac.kr

Abstract

Objective structured clinical examinations (OSCEs) will be introduced in the Medical Licensing Examination in Korea next year. To evaluate the competency of new medical graduates, a written examination is not sufficient to test the clinical skills and attitudes of medical school graduates. The Korean Society of Medical Education and National Health Personnel Licensing Examination Board have been preparing for OSCEs to be included in the licensing examination for a number of years, following the declaration by the Minister of Health and Welfare, of the Korean Government. One center in Seoul will provide two identical sets of stations. The OSCE will have 12 stations. Six short stations will test procedural techniques and skills, and six long stations will feature standardized patients. The test items for the short stations and the clinical presentations of the long stations will be made available to applicants. However, the checklists will not be made available. It is hoped that the OSCE will raise the standard of competencies of new medical doctors and change clinical education in the medical schools.



유럽면허시험 - 유일한 나아갈 길 (Med Teach, 2009)

European licensing examinations – The only way forward

JULIAN C. ARCHER

Peninsula College of Medicine & Dentistry, UK





 

의학교육은 global endeavor이다.

Medical education is a global endeavour leading to initiatives such as the World Federation for Medical Education (http://www.ifmsa.org/scome/wiki/index.php?title=World_Federation_for_Medical_Education_(WFME)) and a virtual global medical school, IVIMEDS (http://www.ivimeds.org/). Regionally, including in Europe, medical specialties are working closer together to support common goals, including in areas of assessment (European Respiratory Society 2008). Despite this there is a growing debate about the introduction of national, not regional or international, licensing examinations (Tooke 2007; General Medical Council 2008). Europe should look to develop a regional approach.


의과대학의 퍼포먼스를 표준화하려는 external peer review process가 있다. 그러나 영국에서의 졸업시험에 관한 최근 연구결과를 보면 대학마다 qualifying exam에 대한 서로 다른 접근법을 갖고 있음을 보여준다. 이것의 영향력은 졸업 후에까지 이어져서 서로 다른 의과대학을 졸업한 학생은 이후 postgraduate exam에서도 유의미하게 다른 수행능력 차이를 보여준다. 최근에 졸업한 의사들은 국가 수준에서 공정하게fairly 비교되지 못하고 있다. 따라서 national postgraduate training program에서 유럽 내 여러 국가간 졸업생을 비교하려는 어떤 시도도 더 confounded될 수 밖에 없다.

There are external peer review processes that attempt to standardize medical school performance, including assessment processes. However, it is known from recent studies of final examinations in the UK that medical schools often have several different approaches towards their qualifying examinations (McCrorie et al. 2008), including differences in standard setting processes (Boursicot et al. 2007). The impact of these may permeate throughout a professional’s life, as graduates from different medical schools appear to perform significantly differently in subsequent postgraduate examinations (Wakeford et al. 1993; McManus et al. 2008). Newly qualified doctors are not compared fairly at a national level (Ricketts & Archer 2008). Therefore any attempt to compare graduates from across Europe when recruiting to a national postgraduate training programme is additionally confounded.


유럽면허시험은 여러 유의미한 장점이 있다. 공정한 평가는 높은 신뢰도를 필요로 하며, 높은 신뢰도를 얻기 위해서는 표준화와 구조화가 필요하다.

A European licensing examination would provide a number of significant benefits. Fair assessment requires good reliability and this in turn requires standardization and structure (van der Vleuten 1996).


또 다른 장점은 자원을 pool하는 능력이다. NBME나 ABIM이 대표적인 사례일 것이다. 이들은 평가방법론의 중대한 발전을 이끌어왔다.

 A further major benefit would be the ability to pool resources. In the US, the National Board of Medical Examiners and the American Board of Internal Medicine have led the way in large scale assessment design and implementation(Melnick 2009). This has led to significant advances in assessment methodology helping to inform the wider literature (Schuwirth 2007). 


Coordinated approach는 기준을 설정하고 질을 보장하는 robust, defensible한 방법이다.

A coordinated approach would lead to a robust, defensible way to define, implement, standard-set and quality-assure assessment.


공통의 평가를 사용하는 전략은 공통의 의과대학 교육과정을 support하며, 모든 이해관계자들에게 이러한 정보가 제공될 수 있다. 공통의 교육과정과 면허시험은 증가하는 사립 의과대학에 대한 quality assurance를 가능하게 해줄 것이다.

A common assessment strategy would help to support a common medical curriculum for Europe and an opportunity for this to be informed by all stakeholders including patients and employers (Wass 2005). A common curriculum and licensing examination would also allow the quality assurance of the increasing number of private medical schools. 



IMG의사들이 종종 문화적 차이에 잘 통합되지 못하는 것이 이후 평가에서 수행능력이 떨어지는 이유로 언급되고는 한다. psycho-social skill의 평가가 포함된 잘 설계된 면허시험은 전 유럽에 걸쳐 의사에게 기대되는 것이 무엇인지를 모든 지원자들에게 guide해줄 수 있을 것이다.

IMG doctors often integrate poorly across cultural differences, perhaps explaining poorer performances in subsequent assessments such as Multi-source Feedback (Archer 2007). A well designed licensing examination that includes the assessment of psycho-social skills could help guide all candidates in what is expected of doctors working across Europe.


가장 중요한 것은 patient care를 향상시킨다는 초기단계 근거이다.

Most importantly of all there is some initial evidence that licensing examinations improve patient care.


언어는 장애가 되어서는 안된다. 캐나다 면허시험은 영어와 불어로 동시에 제공된다.

Language should not be seen as a barrier. The Canadian licensing examination is implemented simultaneously in both English and French.


 

의과대학간 다양성이 사라지고 이것이 혁신을 저해할 것이라는 우려가 있다. 그러나 미국에서는 그러하지 않았다. 비록 일부 replacement strategy가 있을 수는 있겠지만, 면허시험은 local assessment를 대체하는 것이 아니고 더 강화시키는 것이다.

Concerns might be expressed about a reduction in diversity across medical school teaching methodology and that this in turn would stifle innovation. This has not been the experience in the US. The licensing examination should be seen as augmenting and not replacing local assessment, although it is likely that some replacement strategy would be adopted by most institutions.


면허시험이 one-off 지식검사로서 feasibility와 비용절감에만 신경쓸 것이라는 우려도 있다. 지식에 대한 검사는 평가에서 중요한 부분이긴 하지만, progress testing과 같은 방법으로 가장 잘 달성될 수 있다.

Concerns might be raised that a licensing examination would most likely consist of a one-off knowledge test to achieve feasibility and minimize costs. Testing of knowledge would no doubt be an important part of any assessment approach but this might be best achieved over a number of occasions using methods such as progress testing (McHarg et al. 2005).


현재의 psychometric의 한계에 대한 논의에 더해서 평가에 따른 결정은 평가'프로그램'에 의해서 이뤄져야지 평가'방법method'에 의해서 이뤄져서는 안된다는 것이 인정되고 있다. 평가의 다른 영역에 대한 관심이 높아지면서, 유럽은 complementary한 평가방법을 찾고 다듬을 수 있을 것이다.

In addition there is discussion in the literature about the limitations of current psychometric approaches (Schuwirth & van der Vleuten 2006) and an acknowledgement that assessment decisions should be informed by assessment programmes, not methods (van der Vleuten & Schuwirth 2005). With an increasing interest in other dimensions of assessment (Govaerts et al. 2007), Europe would be well placed to explore and refine complementary assessment methodologies and methods.




 2009 Mar;31(3):215-6.

European licensing examinations--the only way forward.

Author information

  • 1Peninsula College of Medicine & Dentistry, University of Plymouth Campus, Drake Circus, Plymouth PL4 8AA, UK. julian.archer@pms.ac.uk
PMID: 19811118


북미의 의사면허시험: 외부 감사External audit의 가치는? (Med Teach, 2009)

Licensing examinations in North America: Is external audit valuable?

DONALD E. MELNICK

National Board of Medical Examiners, Philadelphia, USA




미국

In the United States, the United States Medical Licensing Examination (USMLE), sponsored, developed and administered by the Educational Commission for Foreign Medical Graduates, the Federation of State Medical Boards and the National Board of Medical Examiners, is required for entry into practice of all international medical graduates and US graduates holding the MD degree.

 

캐나다

In Canada, the Medical Council of Canada’s (MCC) qualifying examinations have the same function, leading to the Licentiate of the MCC, a credential recognized for licensure in Canada.

 

의무는 아니지만 ABMS, RCPSC, CFPC는 추가적인 audits of the competence를 제공한다.

While not mandatory for medical practice in the US, the national specialty certifying examinations provided by the members of the American Board of Medical Specialties, the Royal College of Physicians and Surgeons of Canada and the College of Family Physicians of Canada provide additional audits of the competence of doctors entering specialty practice.


미국과 캐나다의 면허시험은 명쾌한 설계규격에 따라 만들어진다. 이 설계규격에는 검사되어야 할 내용과 역량을 전문가 합의로 정의하여 만든 의료행위에 관한 descriptive data가 있다. Blueprint에 따라 출제하여 시험 형식에 따라, 시기time에 따라 일관성을 유지한다. 현재 미국과 캐나다 모두에서 시험에 포함되는 내용으로는..

Both US and Canadian licensing examinations are built to explicit design specifications. These specifications incorporate descriptive data about medical practice with expert consensus in defining the content and competencies to be tested. Careful adherence to examination blueprints assures consistency of examinations across test forms and across time. Some of the content categories currently included in both US and Canadian licensing exams are:

  • understand and apply fundamental science,

  • understand and apply clinical science,

  • diagnosis,

  • patient management,

  • communication skills,

  • history and physical examinations,

  • critical appraisal,

  • law and ethics and

  • health promotion and maintenance.


북미의 면허시험은 점차 competency framework에 초점을 두고 있다.

The North American licensing examinations are increasingly focused on a broad competency framework.


일반적으로 인정받을 수 있는 competency framework를 사용함으로써 기존의 도구로 '평가가 용이한' 영역 뿐 아니라 다양한 중요한 역량을 평가할 수 있는 평가도구 개발을 자극한다.

Use of a generally accepted competency framework stimulates the development of assessment tools that assess the range of important competencies rather than just the domains that are easy to assess with existing tools.


이 국가적 프로그램들national programs에서는 모든 관련된 이해관계자들이 의료행위에 진입하기 위한 기준에 대한 합의를 개발하는데 포함시킨다. 설계규격과 정책, 시험 내용, 기준 등의 설정과정에 참여한다.

These national programs provide a vehicle for all relevant stakeholders to engage in development of consensus about standards for entry into practice. Stakeholders engaged in the USMLE include

  • clinicians,

  • academics,

  • regulators,

  • patients and students;

they are engaged in the process of developing design specifications and policies, test content and performance standards.



 

구성요소

USMLE has four examination components administered during medical school and at the point of licensure, and

the MCC examination has two components administered at the end of medical school and at the point of licensure.

 

IMG정책

In the US, international graduates must pass all components of USMLE; the first three components are required before entry into graduate medical education.

In Canada, international graduates complete a separate evaluating examination to become eligible for the MCC qualifying examinations.

 

문항 유형

Both US and Canadian examinations include:

  • enhanced multiple-choice questions (including items enriched with multimedia stimuli, low-fidelity simulation through sets of items around a single clinical case, multistep items requiring synthesis and application of knowledge to clinical problems);

  • standardized-patient based assessments focusing on interpersonal, communication and clinical skills as well as professional behaviour; and

  • assessment of medical problem solving and patient management using innovative test formats, such as the computer-based patient care simulation used in USMLE Step 3.


비용

USMLE administers about 141,000 tests annually; MCC administers about 10,000. USMLE occupies about 39 h in four test sessions; MCC occupies about 10.5 h in two test sessions. The aggregate cost of USMLE to examinees is $2700; MCC aggregate cost is Canadian $2150.



미국과 캐나다 의료에 대한 가치는 아래의 여섯 가지이다.

Their value to American and Canadian medicine can be summarized in six key arguments:


  • . 공통의 스탠다드의 평등 Equity of common standards. 

  • . 외부 감사를 통한 투명성과 책무성 External audit providing transparency and accountability. 

  • . 혁신을 위한 경계가 있는 환경 Providing a bounded environment for innovation. 

  • . 근거-기반 교육과 규제를 위한 자료 제공 Providing data for evidence-based education and regulation. 

  • . 근거-기반 노동력 유동성 Encouraging evidence-based workforce mobility. 

  • . 양질의 평가 촉진 Fostering high quality assessment.


공통의 국가시험시스템은 공통의 기준에 따르는 평등함을 제공한다. 국가시험이 도입되기 전에는 미국의 면허는 각 주의 권한이었다. 1964년 20개의 각 주별 시험을 검토한 결과, 시험의 내용과 타당도에 상당한 차이가 있다는 우려가 제기되었다. 또 다른 연구에서는 주별로 치러지는 시험의 합격률을 검토하였는데 3개의 주는 10년간 불합격자가 아무도 없었고, 6개주에서는 최소한 10년 중 8년 이상에서 불합격자가 없었다. 서로 다른 다양한 그룹이 '별도로, 하지만 동등하게' 평가될 수 있다는 정책주장은 (다른 나라나 문화에서와 마찬가지로) 미국에서도 완전히 실패한 것이었다. 단일한 국가시험은 의사들의 equity를 보장해주며, 교육적/지역적 배경을 비롯하여 '역량'의 결정에 대해 무관한 다른 여러 요인의 영향과 combating 해준다.

Common national assessment systems provide the equity inherent in common standards. Prior to the universal adoption of a national examination system in the US, licensure was based on assessments provided by each state licensing authority. A study assessing 20 of these examinations in 1964 documented high variability and concerns for the quality of test content and validity (Derbyshire 1965). Another study reviewed pass rates for state-based examinations from 1954 to 1964; three states had no failures in this 10-year period, and an additional six states had no failures for at least 8 of the 10 years studied (Miller 1964). Policies that assert that different groups can be treated ‘separately but equally’ have been utter failures in US race relations as in many other countries and cultures. A single national examination system assures equity for doctors, combating the influence of different perceptions of educational or regional background or other irrelevant factors on decisions about competence.


미국의 대통령이었던 로널드 레이건은 핵 비무장화에 대해서 '신뢰하지만 검증하라'라는 단순한 철학으로 접근하였다. 다수의 사회적 시스템은 '외부 감사external audit'의 가치가 정직성, 투명성, 신뢰에 있다고 인정한다. 의료전문직의 동기와 신뢰에 대한 공공의 회의감이 늘어나는 이 시대에, 국가면허시험은 공정하게 적용될 수 있는 단순하고 투명한 수단을 제공하여 환자들이 어떤 의사를 만나든 최소한의 역량 스탠다드를 보장할 수 있게 해준다. 주마다 지역마다 교육기관마다 서로 다른 복잡한 평가 시스템은 한 시스템에서 평가받은 의사가 다른 맥락에서도 동일하게 평가받을 수 있다는 공통의 assurance를 제공해주지 못한다.

US President Ronald Reagan approached nuclear disarmament with a simple philosophy: ‘trust but verify’. Many societal systems recognize the value of external audit in supporting honesty, transparency and trust. In an era of increasing public scepticism about the motives and trustworthiness of the medical profession, national licensing examinations equitably applied provide a simple, transparent means of assuring our patients that doctors have met minimum standards of competency. Complex systems of assessment that vary by state, region or educational institution fail the test of transparency, providing no common assurance that doctors assessed in one system have met the same standard as those assessed in another context.


플렉스너 보고서 시절부터 지금까지 MRCGP, MRCP and PACES 시험에서 의과대학간 systematical한 차이가 있음이 보고되어 왔으며, 여러 근거들은 일관되게 서로 다른 의과대학에서는 서로 다른 결과물이 나온다는 것을 보여준다.

From the Flexner report in the US in 1910 through recent studies documenting systematically different performance based on medical school attended on the MRCGP, MRCP and PACES examinations (Wakeford et al. 1993; McManus et al. 2008), evidence consistently demonstrates that different medical schools produce different results.


당연히, 국가시험이 교육 프로세스에서 핵심적 부분으로서의 'evaluation of progress'를 대체하지는 못한다. 그러나 external audit 없이는 교육 프로세스에 통합되어있는 평가에 대한 신뢰와 여러 기관 간 차이를 verify하지 못할 것이다.

Of course, national examinations are not a substitute for effective evaluation of progress as an integral part of the educational process. However, without external audit like that provided by a national assessment system, trust in assessment that is integrated into the educational process and differs from institution to institution will be impossible to verify.


잘 설계된 국가평가시스템은 boundary conditions, 즉 의료행위를 하는 개개인에게 기대되는 최소한의 요건을 설정해준다. 이 core expectation을 개인과 기관 수준에서 일관성을 만족시킬 수 있는 효과적인 도구를 제공한다. 환자는 어느 곳에 있든 그들을 진료하는 의사가 진료에 필요한 최소한의 지식/술기/행동을 마스터했기를 기대할 권리가 있다.

Well designed national assessment systems provide boundary conditions, establishing minimum expectations for individuals wishing to practice medicine. They offer an effective tool in assuring consistency in meeting those core expectations at the individual and institutional levels. Patients everywhere have the right to expect that any doctor they consult has demonstrated minimum levels of mastery of the knowledge, skills and behaviours necessary to practice.




2004년 미국(과 캐나다)에서 광범위한 이해관계자들이 임상술기와 관련된 역량의 평가를 국가시험에 포함시킬 것을 합의하에 지시하였다. 의학교육에서 임상술기를 가르치고 평가하는 것이 핵심적 요건core expectation이라는 오래된 합의에도 불구하고 (의과대학 인증기준에는 1990년대부터 포함되어 있었음), 임상술기에 대한 평가 도입은 2004년에야 이루어졌을 당시 1/3의 의과대학에서는 학생의 clinical skill을 평가할 공식적인 시스템을 갖추고 있지 않았다. USMLE에 도입되고 1년이 지났을 때 비록 접근법은 서로 달랐지만 거의 모든 학교에서 그러한 시스템을 갖추었다. 국가적 스탠다드가 존재함으로써 boundary condition이 설정되었으며, 모든 교육기관이 core expectation을 만족시킬 것이 권고되었고, 동시에 학교마다 매우 다양하고 혁신적 접근법을 허용하였다.
In 2004 in the US (a decade earlier in Canada), broad stakeholder consensus directed that the national examinations incorporate assessment of competencies related to clinical skills. Despite years of agreement that teaching and assessing clinical skills within medical education is a core expectation (as reflected through medical school accreditation standards since the early 1990s), at the time of the implementation of clinical skills assessment in USMLE in 2004, nearly one-third of medical schools did not have formal systems in place to assess their students’ clinical skills. Within 1 year of the USMLE clinical skills examination implementation, nearly all US schools had implemented such systems, although through very diverse approaches (Gilliland et al. 2008). The presence of national standards sets boundary conditions that encourage all educational institutions to meet core expectations while, at the same time, permitting highly variable and innovative approaches within the schools.



국가면허시험은 근거-기반 교육과 규제를 위한 자료를 제공한다. 여러 출판된 연구를 보면 국가시험의 수행능력과 임상에서의 수행능력이 정적 관계를 가진다.

National assessments for licensure provide data for evidence-based education and regulation. Several published studies show positive relationships between performance on national examinations and clinical performance (Tamblyn et al. 1998, 2002, 2007; Norcini et al. 2002; Papadakis et al. 2005; Holmboe et al. 2008).

  • Tamblyn and her colleagues have demonstrated clear relationships between performance on Canadian licensing examinations and subsequent clinical performance in primary care over at least 10 years of practice.

  • They have also demonstrated a relationship between assessment of clinical skills in a licensing examination and subsequent complaints to licensing authorities.

  • Norcini and colleagues and Holmboe and colleagues have also shown positive relationships between scores on the internal medicine certifying and recertifying examinations and subsequent measures of clinical performance.

  • Studies by Papadakis and colleagues, using the very crude outcome measure of disciplinary actions by licensing authorities, showed significantly lower licensing examination scores among disciplined physicians.



이 연구들은 지식과 술기가 부족한 의사를 가려내기 위한 시험의 활용을 지지하는 근거를 제공한다. 이러한 자료들을 모으는 것은 국가시험이 없이는 불가능할 것이다. 국가시험에 기반한 연구는 개개의 기관과 개별 의사들에게 중요한 benchmarking을 제공해주며, 특히 시험이 교육 프로세스의 기대 성과와 잘 align된 경우 더 그러하다. 

These studies provide evidence to support the use of examinations to identify those who may lack knowledge and skills to practice effectively. Aggregation of such data would not be possible without national assessment systems. Studies based on national examination data provide valuable benchmarking for individual institutions and indivi- dual doctors, particularly when the examinations are well aligned with the expected outcomes of the educational process.


국가시험은 의료인력의 유동성을 지원해준다. 인력의 유동성은 물론 국가시험 없이도 가능하지만 (유럽처럼) 국가시험은 근거-기반 portability를 제공해준다.

National examinations support mobility of the medical workforce. Of course, such mobility can occur without national examination programs, as is the case in Europe today; however, national examinations allow evidence-based portability of practice credentials.


마지막으로, 국가시험은 자원의 aggregation을 가능하게 해주는데 (인적자원과 경제적 자원), 이는 양질의 평가를 위해서 필요하다. 의사국가시험이 들어가는 비용이 높아 보이지만 aggregate cost는 수백 수천의 기관이 개별적으로 투입하는 비용의 총합보다 훨씬 적다.

Finally, national examinations allow the aggregation of resources, both human and fiscal, necessary for high quality assessment. While the national examination systems described here are expensive, their aggregate cost is much less than the combined cost of assessment in the hundreds or even thousands of institutions involved in education of doctors.


많은 poorly designed 평가는 신뢰성있는 정보를 제공해주지 못하고, 따라서 학습과 프로그램의 효과성 향상에 기여하는 바가 적다. 그러나 신뢰도가 갖춰지더라도 이는 필요조건이지 충분조건은 아니다. poorly designed된 경우, local이든 national이든, 학습자와 교수자의 행동을 distort할 수 있다.

Many poorly designed assessments do not provide reliable information and are, therefore, of little utility in guiding learning or program effectiveness. However, while reliability is requisite, it is not sufficient. Poorly designed assessment systems, whether local or national, may distort the behaviour of learners and teachers (Newble & Jaeger 1983).


적절한 지식과 전문성을 가지고 문제를 해결해야만 평가가 효과적일 것이다.

Assessment is an art and science apart from medicine, and it will be effective only when adequate knowledge and expertise are brought to bear on its challenges.






 2009 Mar;31(3):212-4.

Licensing examinations in North America: is external audit valuable?

Author information

  • 1National Board of Medical Examiners, Philadelphia 19104, USA. dmelnick@nbme.org

Abstract

The United States and Canada both have long-standing, highly developed national systems of assessment for medical-licensure based outside the institutions of medical education. This commentary reviews those programs and explores some of the reasons for their implementation and retention for nearly a century. The North American experience may be relevant to dialog about national or European assessments for medical practice.

PMID: 19811117


USMLE Step 1과 Step 2에서 멀티미디어의 활용 (Acad Med, 2009)

Use of Multimedia on the Step 1 and Step 2 Clinical Knowledge Components of USMLE: A Controlled Trial of the Impact on Item Characteristics

Kathleen Z. Holtzman, David B. Swanson, Wenli Ouyang, Kieran Hussie, and Krista Allbee





1990년대 초반 USMLE가 도입된 이후, 모든 세 Step 모두 환자 vignettes의 형태로 된 MCQ를 활용하였다. 주어진 상황을 기초과학 관점에서 해석하고, 진단을 내리고, 다음 단계를 특정하는 문항들이었다. 이러한 문항은 의사결정능력 평가를 위한 low-fidelity clinical simulation의 하나이다. CBT가 발전하면서 patient situation의 authenticity는 느리지만 꾸준히 증가되어왔고 멀티미디어를 문항 줄기에 포함시킴으로써 fidelity를 더 향상시키고자 하였다.

Since the introduction of the United States Medical Licensing Examination (USMLE) in the early 1990s, all three Steps have commonly used multiple-choice questions (MCQs) that take the form of patient vignettes describing a clinical situation and challenge examinees to interpret the situation from a basic science perspective, reach a diagnosis, or specify the next step in patient care. Such test items are reasonably viewed as low-fidelity clinical simulations designed to assess medical decision-making skills.1 Since the advent of computer-based testing (CBT) for USMLE in 1999, the authenticity with which patient situations are described has slowly but steadily increased, and further improvements are planned as all three Steps incorporate multimedia into item “stems,” enriching the fidelity with which patient findings can be presented.




방법

Method


심음

Heart sounds


Using these lists, NBME staff worked with an external vendor to obtain recorded auscultation findings and develop an interactive Flash-based format (Figure 1) for examinees to use in eliciting auscultation results and viewing related physical findings (i.e., movement of the chest and neck veins).



 


 

Step 2 CK study items


Step 1 study items


절차

Procedure


USMLE는 점수와 상관없는unscored 문항을 점수를 내는scored 문항과 함께 포함시켜서 실제로 사용하기 전에 문항에 대한 점검을 한다. 응시자들은 이에 대해 사전에 안내받는다.

USMLE routinely includes unscored material intermingled with scored test items to obtain information about (pretest) items and new item formats prior to scored use, and examinees are notified prospectively about this practice in registration materials.



문항 특성

Item characteristics studied


For each version of each study item, six indices were calculated from USMG and IMG item responses.

  • The first was the item difficulty (P value), calculated as the proportion of examinees who responded to the item correctly.

  • The second was a logit transform of the item difficulty, log[p / (1 − p)], where p is the item difficulty. This nonlinear transformation is commonly used because the “distance” from an item difficulty of 0.50 to 0.60 is much smaller than the “distance” from 0.85 to 0.95.

  • The third was an index of item discrimination: the item-total (biserial) correlation, calculated as the correlation between the item (scored 0/1 for incorrect/correct) and the reported total score.

  • The fourth was an r-to-z transformation of the biserial correlation, also commonly used to correct for nonlinearities in the magnitude of correlation coefficients.

  • The fifth index was the mean response time in seconds, and

  • the sixth was the mean of the natural logs of response times, a transformation commonly used to normalize response times.



결과

Results


 


난이도

Item difficulty


Table 1 provides means for USMGs and IMGs for item difficulty, item discrimination, and response time for both Step 1 and Step 2 CK study items.



변별도

Item discrimination


멀티미디어 활용 문항의 변별도가 더 낮았다.

Multimedia items were less discriminating than matched text versions for both groups in each Step;


응답 시간

Response time


응답시간의 차이는 매우 컸다.

Differences in mean response times were very large from a practical perspective, with multimedia versions of items requiring, on average, 30 to 60 seconds longer for a response than text versions (P < .0001 for both groups in each Step).


고찰

Discussion


청진 소견을 멀티미디어로 제시하는 것은 문항의 난이도와 응답시간에 상당한 영향을 주었으며, 문항의 변별도에 대한 영향력은 중등도였다. 응시자들은 청진소견이 authentic, undigested 형태로 주어졌을 때보다 텍스트로 주어졌을 때 더 쉽게 해석하였다. 평균적으로 응시자들이 심음을 청진하게 하는 것은 문항당 50초 정도 응답시간을 증가시켰다.

Use of multimedia for presentation of auscultation findings has a sizable impact on item difficulty and response time, as well as a more modest impact on item discrimination. Examinees can more readily interpret auscultation findings described textually using standard medical terminology than the same findings presented in a more authentic, undigested format. On average, requiring examinees to listen to heart sounds, rather than read medical terminology accurately interpreting them, increased response times by roughly 50 seconds per item.



 



 2009 Oct;84(10 Suppl):S90-3. doi: 10.1097/ACM.0b013e3181b37b0b.

Use of multimedia on the step 1 and step 2 clinical knowledge components of USMLE: a controlled trial of theimpact on item characteristics.

Author information

  • 1National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA 19104, USA. kholtzman@nbme.org

Abstract

BACKGROUND:

During 2007, multimedia-based presentations of selected clinical findings were introduced into the United States Medical Licensing Examination. This study investigated the impact of presenting cardiac auscultation findings in multimedia versus text format on item characteristics.

METHOD:

Content-matched versions of 43 Step 1 and 51 Step 2 Clinical Knowledge (CK) multiple-choice questions describing common pediatric and adult clinical presentations were administered in unscored sections of Step 1 and Step 2 CK. For multimedia versions, examinees used headphones to listen to the heart on a simulated chest while watching video showing associated chest and neck vein movements. Text versions described auscultation findings using standard medical terminology.

RESULTS:

Analyses of item responses for first-time examinees from U.S./Canadian and international medical schools indicated that multimedia items were significantly more difficult than matched text versions, were less discriminating, and required more testing time.

CONCLUSIONS:

Examinees can more readily interpret auscultation findings described in text using standard terminology than those same findings presented in a more authentic multimedia format. The impact on examinee performance and item characteristics is substantial.



선진국에서 대규모 면허시험의 효과: Systematic review (BMC Med Educ, 2016)

The impact of large scale licensing examinations in highly developed countries: a systematic review

Julian Archer1* , Nick Lynn1, Lee Coombes2, Martin Roberts1, Tom Gale1, Tristan Price1 and Sam Regan de Bere1




배경

Background


의료 규제는 역사적으로 자기 자신을 '의사'라고 부를 수 있는지를 확립하는 것이었고, '미용사'나 '사기꾼'을 배제시키는 것이었다. 그러나 의료 규제는 이러한 정적인 접근에서 벗어나 (단순히 등록을 마치면 되는 것에서 벗어나) 더 역동적이고 미래적prospective한 것으로 바뀌었다.

Medical regulation has historically involved establishing who is appropriately qualified to call themselves a medical doctor and keeping certain people, such as “barbers” [1] and “charlatans” [2], out. But medical regulators have moved away from this static approach, of simply holding a register, to a more dynamic and prospective one.


관심의 많은 부분이 의료행위를 시작하는 시점에 가있다. 의과대학에서 근무지로 이행하는 이 시점은 IMG가 의료인력에 편입되는 순간이기도 하다.

Much of the attention has perhaps understandably focused at the beginning of clinical practice. This transition point from medical school into the workplace is also often a point at which international medical graduates enter the workforce.


왜  면허시험이 중요하다고 여겨지는지는 이해하기 쉽다. 면허시험이 시행되는 시점은 의과대학이 졸업생을 배출하는 순간이다. 요구 기준을 충족시키는 사람만이 그 사법권 내에서 의료행위를 할 수 있다. 따라서 NLE를 지지하는 사람들은 국민집단이 '안전하게' 진료를 할 수 있는 사람만이 '자격을 갖추게 됨'을 reassure하게 해준다고 주장한다.

It is easy to understand why the concept of a licensing exam is hailed as important. They sit at the point at which medical schools graduate their students. Only those who achieve the required standards are then allowed to practice in the jurisdiction. In this way, the advocates argue, a nation’s population is reassured that only capable doctors who can practice safely are qualified.


그러나 NLE가 정확히 어떤 형식을 따르는지, 어떤 내용을 커버해야 하는지, 누구를 평가해야 하는지 등은 논쟁의 영역으로 남아있고, 특히 의사와 여러 의료직들이 국과와 지역간 경계를 넘어서 다니기 때문에 더 그러하다.

But exactly what form NLEs should take, what they should cover and who they should assess remains a source of debate [5–11], as doctors and other healthcare workers increasingly wish to move across national or regional (state) boundaries [5, 12–16].


영국은 현재 NLE가 없으나, 역사적으로 external examiner에 의존해왔다. GMC와 같은 기관에서 나온 방문 의학교육자들이 의과대학의 질을 확인한다. 국외에서 들어온 의사들은 다른 경로를 따르게 되는데 PLAB라는 시험을 본다.

The United Kingdom (UK) does not currently have a NLE but has historically relied on external examiners – visiting medical educators from other organizations – and General Medical Council (GMC) inspections to assure quality across UK medical schools. Doctors from overseas take a different route into licensure in the UK; predominantly through the Professional and Linguistic Assessments Board (PLAB) examination [17].


그러나 2014년 말, GMC는 NLE를 도입하겠다고 발표했고, 2015년에는 모든 영국 졸업생과 비-EEA 졸업생이 응시해야 하는 MLA를 2021년까지 도입하기 위한 timeframe을 내놓았다. 유럽법이 EEA졸업생은 면제 범위에 둔다.

However at the end of 2014 the GMC announced that it planned to establish a NLE and in June 2015 it laid out a timeframe for the introduction of a ‘Medical Licensing Assessment’ (MLA) which will ultimately be taken by all UK graduates and non-European Economic Area (EEA) graduates who wish to practice in the UK by 2021 [18]. As European Law stands EEA graduates will be exempt under freedom of movement legislation [19].

 


 


방법

Methods


Data sources and searches


Study selection


Data extraction, synthesis and analysis


 

결과

Results



우리는 문헌에서 수많은 논란을 찾았지만 근거는 그것보다 훨씬 덜 찾았다. 우리가 문헌들을 타당도 프레임워크에 배치하였을 때 73개 문헌 중 24개만이 면허시험에 대한 타당도 근거를 가지고 있었다. 남은 문헌은 informed opinion이거나 editorials이거나 단순히 논쟁을 지속시키는데 기여한 것일 뿐이었다.

We found a lot of debate in the literature but much less evidence. After we mapped the papers to the validity framework, only 24 of the 73 papers were found to contain validity evidence for licensing examinations. The remaining 50 papers consisted of informed opinion, editorials, or simply described and contributed to the continuing debate. We summarize the overall review process in Fig. 1.

 


 

표2는 24개 문헌이다. 이 중 내용타당도에 대한 것은 4개, 응답프로세스에 대한 것은 3개, 내적구조에 대한 것은 4개였다.

Table 2 summarizes the 24 papers mapped to the validity framework. Of these reviewed papers only four offered evidence for content validity [27–30], three for response process [27, 29, 31], and four for internal structure [27–29, 32].


다른 variable과의 관련성 근거

Relationship to other variables as evidence for validity


relationship to other variables를 본 것은 세 종류로 나누었다. 과거와 미래의 퍼포먼스, 환자 성과/불만과의 관련성, 자국내 수련받은 의사와 IMG와의 차이

The papers that explored the relationship to other variables, as evidence for validity, we sub-grouped into three areas of enquiry: prior and future performance by individuals in examinations; relationship to patient outcomes and complaints; and specifically the variation in performance between home-trained doctors and IMGs.


1. Students who did well in medical school also did well in later examinations.

First, several authors explored the relationship between medical school examination performance and subsequent established large scale testing e.g. the USMLE [28, 33–35]. Overall they found, perhaps not surprisingly, that those who do well in medical school examinations also do well in subsequent testing.


2. The evidence is mixed; one review concluded there is little evidence that NLEs affect quality of care.

Second, there is mixed evidence on the relationship with other variables when NLE test scores are compared with criterion based outcomes around complaints and patient welfare. Sutherland & Leatherman concluded in a 2006 review that “there is little evidence available about [national licensing examinations’] impact on quality of care” across the international healthcare system [37].


3. IMGs perform somewhat less well. This could reflect limited English proficiency, but a Swiss study found that the areas in which IMGs lagged were not ‘communication skills’.

Third, a series of papers each demonstrated that IMGs do less well in large scale testing [32, 35, 40, 41]. Some argue that the differences were due to a lack of proficiency in spoken English [32, 40], but a paper from Switzerland found that while IMGs did less well than Swiss candidates in their Federal Licensing Examination, the IMGs’ lower scores were in areas other than communication skills [27].

 


 

Consequential validity


An important point in this review is the lack of evidence that patient outcomes improve with the introduction of an NLE.

An important finding of this review is the lack of evidence that patient outcomes improve as a consequence of the introduction of national licensing exams.


Although the findings of Norcini and Tamblyn argue admirably for the importance of testing, their claims reach only correlation, not causation. In other words, there is evidence that better doctors do better on NLEs, but none that doctors improve as a result of NLEs; such before-and-after comparisons are missing from the literature. A further confounder is that those who score well on the USMLE get better jobs at better institutions.

Although the aforementioned studies by Norcini et al. [38] and Tamblyn et al. [39] demonstrate excellent arguments for the importance of testing, and medical education more generally, their findings are limited to establishing correlations between testing and outcomes and not causation. In other words, there is evidence that better doctors do better in NLEs, but not that doctors improve as a consequence of introducing NLEs; this kind of before and after evidence is absent in the extant literature. One confounding factor to a causal link between testing performance and subsequent care is the fact that those who do well in the USMLE get the better jobs in the better institutions [36, 42].


Taken as a whole, some authors argue that NLEs are no real barrier to entering the profession and therefore do not protect the public; for example, almost everyone who takes the USMLE ultimately passes.

Overall, some authors argue that NLEs are not real barriers to entry into the profession, and therefore do not protect the public. For example nearly everyone who takes the USMLE passes it in the end [43].


There is no clear picture of how NLEs affect medical school curricula. In one study a third of respondents reported changes to the “objectives, content, and emphasis” of their curriculum. Focusing on a new component added to an already established NLE, that study asked whether NLEs can direct medical schools’ attention to nationally identified skills shortages, and cited its clinical skills component as such a case.

There is no clear picture from the literature as to the impact of NLEs on the medical school curricula. One study found that over one third of respondents reported changes to the “objectives, content, and/or emphasis of their curriculum” (p.325) [46]. While the study focuses only on the introduction of one new component of a licensure exam, within an already well established NLE, it does raise the question of whether NLEs can be used to focus medical schools’ attention on nationally identified skills/knowledge shortages, as appears to be the case with the clinical skills component in this study [46].


At the same time, however, there is the question of whether NLEs encourage uniformity or stifle innovation in curriculum design. Yet, apart from the case of Florida dentists, there is no evidence that NLEs encourage homogeneity.

At the same time, however, this raises the question that NLE exams may encourage homogeneity or a lack of innovation in curriculum design. Yet aside from one dental example in Florida [34], there appears to be no empirical evidence that NLEs encourage homogeneity.




Discussion


For example, studies have shown that candidates with lower NLE scores end up working in less respected or poorer-performing institutions. Moreover, a review of the effect of regulation on healthcare found the evidence “sparse” that NLE pass scores predict patient care or future disciplinary action. Our review supports these findings.

For example, studies demonstrate that candidates with lower NLE scores tend to end up working in less respected institutions [36, 42] and poorer performing organizations [51]. Moreover, a comprehensive review on the role of regulation in improving healthcare by Sutherland and Leatherman [37] found “sparse” evidence to support the claims that NLE pass scores are a predictor of patient care or future disciplinary actions. Our review supports that conclusion.



There is validity evidence for a correlation, if not causation, between NLEs and doctor performance, and this may be an argument for NLEs, but it depends on the NLE’s purpose. Schuwirth recently said that “the purpose of national licensing is to reassure the public that licensed doctors are safe, independent practitioners”. Similarly, Swanson and Roberts describe the role of NLEs as reassuring patients, the public, and employing organisations that, wherever a doctor trained, a minimum level of competence is assured. But as Schuwirth notes, public reassurance is “at least partly, based on public perception”. The danger here is a potential gap between what the public and policy-makers perceive NLEs to do and what they actually achieve. Misplaced faith that NLEs improve patient safety, when all they actually do is reassure the public, distracts attention from other important aspects of medical regulation.

That there is validity evidence for the correlation, as opposed to causation, between NLEs and doctors’ performance may in itself be an argument for national licensing [4], but this will depend on the policy purpose of the NLE. Schuwirth has recently pronounced that, “In essence the purpose of national licensing is to reassure the public that licensed doctors are safe, independent practitioners” [52]. Similarly, Swanson and Roberts point to the role of NLEs in “reassuring patients, the public, and employing organisations that, regardless of where their doctor trained, they can be sure of a minimum level of competence” [4]. However, as Schuwirth notes, public reassurance is, “at least partly, based on public perception” [52]. The danger here is a potential disjuncture between what the public, and indeed policy-makers, perceive that NLEs do, and what they actually achieve; misplaced trust in the impact of national licensing to enhance patient safety, when what they actually do is simply reassure the public, may potentially divert attention from other important aspects of medical regulation.


Lastly, there are hard questions about inclusion, exclusion, and fairness for IMG doctors. In Sweden, IMGs’ experiences suggest the system actively disadvantages competent IMGs; they saw the Swedish system as flawed, overlong, and frustrating. Several Canadian studies have highlighted similar difficulties.

Lastly, there are difficult questions raised about inclusion, exclusion, and fairness in respect to IMG doctors [14]. In Sweden, which has a regulatory system similar to other countries across Europe and elsewhere, IMGs’ experiences suggest that the Swedish system may actively disadvantage competent IMG practitioners; participants viewed the Swedish system as flawed, overlong, and frustrating [44]. Such difficulties have also been highlighted by a number of Canadian studies [13, 53], providing some descriptive evidence of the way in which practitioners, provincial licensing authorities, and employers use the system to balance the demands arising from physician shortages, making it difficult for both IMGs and those that employ them to negotiate the licensing system.



Strengths and weaknesses of the study


 

 

Implications for clinicians and policymakers


 

The evidence base is weak for supporters of NLEs as well as for opponents.

The weakness of the evidence base exists for those who argue against national licensure examinations [7], as well as for those who advocate such a system [55–57].


Unanswered questions and future research


There is strong statistical evidence that IMGs do less well, but the reasons are unclear.

Whilst a strong body of statistical evidence exists to show IMGs perform less well in licensure examinations than candidates from the host countries [27, 59], the reasons for this phenomenon remain unclear.


The evidence that introducing an NLE improves doctor performance or patient safety is weak, whereas the correlation between test results and overall performance is strong. The benefit of introducing an NLE therefore depends on how effectively the existing regulatory system works; policy-makers and regulators should move beyond a one-size-fits-all approach and weigh the evidence for NLEs in light of the existing regulatory system.

We have argued that the evidence for NLEs improving doctor performance and patient safety as a consequence of their introduction is weak, whereas the evidence for a correlation between test results and overall performance is strong. As such, the relative benefits of introducing a NLE may well be contingent upon the efficacy of existing regulatory systems. Policy-makers and regulators may therefore consider moving beyond a one size fits all approach to NLEs; evidence should be examined in light of existing regulatory systems.


Conclusions


The main conclusion of our review is that the debate on licensure examinations is characterized by strong opinions but is weak in terms of validity evidence.


9. Ricketts C, Archer J. Are national qualifying examinations a fair way to rank medical students? Yes. BMJ. 2008;337:a1282.


51. Noble ISG. Are national qualifying examinations a fair way to rank medical students? No. BMJ. 2008;337:a1279.



49. McMahon GT, Tallia AF. Perspective: Anticipating the challenges of reforming the United States medical licensing examination. Acad Med. 2010;85(3):453–6.


26. Cook DA, Lineberry M. Consequences validity evidence: evaluating the impact of educational assessments. Acad Med. 2016;91(6):785–95.



Ahn D, Ahn S. Reconsidering the cut score of the Korean National Medical Licensing Examination. 2007.



 

 




The effects of violating standard item writing principles on tests and students: the consequences of using flawed test items on achievement examinations in medical education.

Author information

  • 1Department of Medical Education (MC 591), College of Medicine, University of Illinois at Chicago, 60612-7309, USA. sdowning@uic.edu

Abstract

The purpose of this research was to study the effects of violations of standard multiple-choice item writing principles on test characteristics, student scores, and pass-fail outcomes. Four basic science examinations, administered to year-one and year-two medical students, were randomly selected for study. Test items were classified as either standard or flawed by three independent raters, blinded to all item performance data. Flawed test questions violated one or more standard principles of effective item writing. Thirty-six to sixty-five percent of the items on the four tests were flawed. Flawed items were 0-15 percentage points more difficult than standard items measuring the same construct. Over all four examinations, 646 (53%) students passed the standard items while 575 (47%) passed the flawed items. The median passing rate difference between flawed and standard items was 3.5 percentage points, but ranged from -1 to 35 percentage points. Item flaws had little effect on test score reliability or other psychometric quality indices. Results showed that flawed multiple-choice test items, which violate well established and evidence-based principles of effective item writing, disadvantage some medical students. Item flaws introduce the systematic error of construct-irrelevant variance to assessments, thereby reducing the validity evidence for examinations and penalizing some examinees.



Quality assurance of item writing: During the introduction of multiple choice questions in medicine for high stakes examinations (Med Teach, 2009)


JAMES WARE1 & TORSTEIN VIK2

1Health Sciences Centre, Kuwait University, Kuwait, 2Norwegian University of Science and Technology (NTNU), Norway





Introduction


It was also felt appropriate to find markers of excellence and build them into the training process.

It was also felt appropriate to look for markers of excellence and build them into the training process.

  • Susan Case and David Swanson’s NBME web monograph gives useful guidelines and tips how this goal might be achieved (Case & Swanson 2004).

  • Other sources include a series of papers written by Haladyna and Downing (1989), who have also produced evidence suggesting that item writing flaws (IWFs) can prejudice the outcome of high stakes examinations (Haladyna & Downing 1989; Downing 2005).

  • It is an important area of educational research that has stimulated further examination in the health sciences (Tarrant & Ware 2008).


Methods


All Norwegian medical schools run six-year curricula, and there is no NLE.

All Norwegian medical schools have a 6-year curriculum; however, there is no national licensing examination, and each school has its own specific educational model.


K1 denotes recall and comprehension, K2 application and reasoning. The arbitrarily chosen goal was for at least 50% of items to be K2.

Particular emphasis was put on recognising items testing lower levels of cognition, K1 (recall and comprehension), and higher levels, K2 (application and reasoning). This is a modification of a proposal made by Irwin and Bamber (1982) and also accords with the classification used by the IDEAL Consortium (Prideaux & Gordon 2002). An arbitrary goal was set that NTNU examinations be delivered with at least 50% K2 items.



Results were computed without negative marking; item analysis was carried out with the IDEAL software.

Following the delivery of each MCQ paper the results were computed without negative marking (Downing 2003). Item analysis was carried out using the IDEAL software (vers. 4.1). The post hoc reviews confirming the results were done with the aid of these data. The item statistics are based on classical test theory and the output is presented in the format shown in Table 1 (Osterlind 1998). Also available on the output sheet are the mean group result, variance, SD, Kuder–Richardson reliability and SE of measurement. Data have been used both from the performance data of each item and also the whole test, particularly p-values and upper-lower item discrimination based on the top and bottom 27% of candidates. The frequency of candidates marking each option was also noted, and where ≥5% of candidates marked a distracter it was determined to be functional. A further analysis was carried out after the removal of items with p-values ≥85%.
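The classical test theory statistics described here can be sketched in a few lines of Python. This is an illustrative reimplementation, not the IDEAL software itself: `item_analysis` and its argument layout are hypothetical, and candidates are assumed to be pre-sorted by total test score.

```python
from typing import List, Tuple


def item_analysis(responses: List[str], key: str,
                  options: str = "ABCDE") -> Tuple[float, float, List[str]]:
    """Classical test theory statistics for one MCQ item.

    responses: each candidate's chosen option, ordered by total test
    score (highest-scoring candidate first); key: the correct option.
    Returns (difficulty p-value, upper-lower discrimination,
    list of functioning distracters).
    """
    n = len(responses)
    # item difficulty: proportion of all candidates answering correctly
    p = sum(r == key for r in responses) / n
    # upper-lower discrimination on the top and bottom 27% of candidates
    k = max(1, round(0.27 * n))
    upper = responses[:k]
    lower = responses[-k:]
    disc = (sum(r == key for r in upper) - sum(r == key for r in lower)) / k
    # a distracter "functions" if at least 5% of candidates choose it
    functional = [o for o in options if o != key
                  and sum(r == o for r in responses) / n >= 0.05]
    return p, disc, functional
```

For the 27% groups, `round` is one reasonable convention; the actual package may truncate or round differently.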


Discrimination was divided into three bands.

Arbitrary levels of discrimination were used to create ranges which reflect three levels of the discrimination power:

  • >0.40, excellent;

  • 0.30–0.39, good and

  • 0.15–0.29, moderate.

 

Values below 0.15 were regarded as having no meaningful discrimination.

Below 0.15 was considered as having no discrimination power of significance.
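The banding above can be expressed as a small helper. A sketch only: the text leaves the 0.39–0.40 boundary open, so the greater-than-or-equal cut-offs here are an assumption.

```python
def classify_discrimination(d: float) -> str:
    """Map an upper-lower discrimination index to the arbitrary bands
    used in the text (boundary handling at 0.40 is assumed)."""
    if d >= 0.40:
        return "excellent"
    if d >= 0.30:
        return "good"
    if d >= 0.15:
        return "moderate"
    return "none"
```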



Results


The longest option was the commonest IWF; the others were as follows.

The longest option was the commonest IWF (55%) and the four others were: word repeats in vignette and correct option (2/31), logical clues (4/31), sentence completions (6/31) and a negatively worded question (2/31).



Discussion


Much is still debatable about what counts as an item-writing violation, and institutions need to set their own rules. NTNU chose not to use negative marking, with good reason.

There still remains much discussion about what constitutes an item writing violation, with available empiric data being rather few (Haladyna & Downing 1989). Notwithstanding the controversies, it still remains reasonable to set rules for an institution, and we believe the flaws listed in Appendix 2 are worth avoiding until such time as we know that their inclusion does not affect the test outcome. This becomes more important when the guessing factor, inherent in any selected response item format test, is accounted for. At NTNU negative marking was not used, and there is good evidence for avoiding such a strategy (Downing 2003).


The five criteria for high-stakes examinations are as follows.

The five criteria we would recommend for high stakes end of course (viz., graduation) or year-end examinations are the following:

  • 1. Strong adherence to an in-house style: for NTNU see Appendix 1.

  • 2. The proportion of K2 items is at or above 50%.

  • 3. Greater than or equal to 50% of all distracters shall be functioning at the 5% level.

  • 4. Greater than or equal to 60% of items shall have moderate or better discrimination using set ranges.

  • 5. The frequency of IWFs agreed for the institution shall be <10%.
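The four quantitative criteria (2–5) can be checked mechanically; criterion 1, adherence to an in-house style, needs human review. A hedged sketch with a hypothetical function name:

```python
def quality_gate(pct_k2: float, pct_functional_distracters: float,
                 pct_items_discriminating: float, pct_iwf: float) -> dict:
    """Check an examination's summary statistics against criteria 2-5.
    All arguments are percentages in the range 0-100."""
    return {
        "K2 items >= 50%": pct_k2 >= 50,
        "functioning distracters >= 50%": pct_functional_distracters >= 50,
        "moderate+ discrimination >= 60%": pct_items_discriminating >= 60,
        "IWFs < 10%": pct_iwf < 10,
    }
```

Returning a dict of named booleans keeps the report readable when a paper fails only one criterion.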




Five options were used, although there is evidence that four or even three function just as well; whatever the number, the decision is fairly arbitrary. After removing items with a proportion correct of 0.85 or higher, the desirable proportion of functional distracters was taken to be above 50%.

NTNU chose to use five options, although there is evidence that four or even three option MCQs function as well (Haladyna & Downing 1993). Whatever number is chosen, and this may be a quite arbitrary decision, an important part of quality assurance is to determine that the number of options that function justifies the number set as a policy. We believe that after removing items with p-values ≥0.85 the desirable functional distracter proportion should be >50%.


The Maastricht school holds that the cognitive level tested is determined by an item's content, not its format, and that using vignettes does not guarantee a K2 item.

The influential Maastricht school (Schuwirth & Van der Vleuten 2003) stresses that item format is not the arbiter of the cognitive level tested, but rather the content: using vignettes does not guarantee an item testing at K2.

 

 


 


Case S, Swanson DW. 2004. Item writing manual: Constructing written test questions for the basic and clinical sciences, National Board of Medical Examiners publications, retrieved in 2004, from: http://www.nbme.org/ aboutitem/writing.asp


Tarrant M, Ware J. 2008. The impact of item writing flaws in multiple-choice questions on student achievement in high-stakes nursing assessments. Med Educ 43:198–206.


Downing SM. 2003. Guessing on selected-response examinations. Med Educ 37:670–671.


Downing SM. 2005. The effects of violating standard item writing principles on tests and students: The consequences of using flawed items on achievement examinations in medical education. Adv Health Sci Educ 10:133–143.




Appendix 2


IWFs to be avoided


1. Grammatical clues, found when using sentence completions. The option with an incorrect grammatical flow is automatically eliminated by most candidates.


2. Logical clues, based on information in the stem also being used in the correct keyed option. Test wise candidates are quick to spot this flaw. 


3. Words repeat, where the stem has a complete or part of a word that is clearly identified in the correct keyed option. 


4. Convergence cues, usually based on multiple facts used in the options. The good candidate quickly adds up these facts and finds the correct option having most repeaters in it. Or, where more than two options deal with similar areas to the exclusion of others, which are the distracters and then serve little purpose. 


5. The longest option is the correct keyed option because of the number of qualifying statements added to justify it as the best choice. 


6. Lost sequence in presentation of data, failure to use ranges and mixed units, as well as overlapping data, or no normal values given. All these flaws add to the uncertainty and, therefore, become confusing. 


7. Use of absolute terms such as never, always, only etc which are seldom appropriate qualifiers for clinical statements and the option is eliminated by a good candidate. 


8. Use of vague terms such as frequently, occasionally or rarely (among others) which then cause uncertainty and are usually eliminated as being fillers. 


9. Use of negative(s) in the question. These items are frequently misunderstood, as one is not expecting the formulation to be in the negative. Alternatively, the correct option is so implausible that it cannot apply under any circumstance. 


10. Use of EXCEPT in the stem as part of the question formulation. Although seldom confusing, these items often identify the correct keyed option as being out of sequence with the others, without the use of any knowledge. 


11. The use of none or all of the above (NOTA or AOTA) as the last option. Options written to satisfy these absolutes are problematic: NOTA often provides clues, while AOTA rewards partial information.


12. Failure to pass the Hand Cover Test (HCT) increases uncertainty about the question being asked, or leaves the examinee guessing. 


13. Unclear language, ambiguities, gratuitous information, vignette not required etc. 


14. Use of interpreted data. Not infrequently a complex vignette is followed by a reference to the condition, disease or diagnosis followed by a question which requires no reference to the information given in the vignette, only knowledge of the condition. 


15. Inaccurate information, including implausible options.






Med Teach. 2009 Mar;31(3):238-43. doi: 10.1080/01421590802155597.

Quality assurance of item writing: during the introduction of multiple choice questions in medicine for high stakes examinations.

Author information

  • 1Faculty of Medicine, Health Sciences Centre, Kuwait University, Safat, Kuwait. jamesw@hsc.edu.kw

Abstract

BACKGROUND:

One Norwegian medical school introduced A-type MCQs (best one of five) to replace more traditional assessment formats (e.g. essays) in an undergraduate medical curriculum. Quality assurance criteria were introduced to measure the success of the intervention.

METHOD:

Data collection from the first four year-end examinations included item analysis, frequency of item writing flaws (IWF) and proportion of items testing at a higher cognitive level (K2). All examinations were reviewed before and after delivery and no items were removed.

RESULTS:

Overall pass rates were similar to previous cohorts examined with traditional assessment formats. Across 389 items, the proportion of items with ≥5% of candidates marking two or more functioning distracters was ≥47.5%. After removal of items with high p-values (≥85%), this item distracter proportion became >75%. With each successive year in the curriculum the proportion of K2 items used rose steadily to almost 50%. 31/389 (8%) items had IWFs. 65% of items had a discriminatory power ≥0.15.

CONCLUSIONS:

Five item quality criteria are recommended: (1) adherence to an in-house style, (2) item proportion testing at K2 level, (3) functioning distracter proportion, (4) overall discrimination ratio and (5) IWF frequency.



National licensing examinations and their challenges (Journal of Health Specialties, 2013)

Cees PM van der Vleuten

School of Health Professions Education, Maastricht University, P.O. Box 616, 6200 MD Maastricht, The Netherlands





Many countries struggle with introducing NLEs at undergraduate and postgraduate levels.

There are many countries in the world struggling with introducing national licensing examinations at both undergraduate and postgraduate levels.


The common demand for an NLE stems from heterogeneity among training programmes, and among the doctors they train, within a country's medical education continuum. The Netherlands runs very homogeneous programmes: strongly regulated state medical schools with equally regulated exit examinations, nationally based undergraduate programmes following legally defined outcomes, and a very strict national accreditation system for all postgraduate training, which together make the homogeneity of Dutch graduates easy to understand.

A common need for national licensing exams depends on the heterogeneity of the training programs in the medical education continuum of a country and in the training programs preparing learners to enter that continuum. Countries may vary substantially in terms of the heterogeneity of training. The Netherlands is an example of a country with very homogeneous training programs. There are state schools with strongly regulated programs leading to equally regulated exit exams. Thereafter comes state-based undergraduate medical training that follows legally defined program outcomes. Combine this with a very strict national accreditation system for all postgraduate training programs governed by the Dutch equivalent of royal colleges elsewhere, and it is easy to understand the homogeneous character of the products from Dutch medical schools.


From the Dutch perspective, would a national examination add much to this continuum? Probably not. The situation in Saudi Arabia is very different. Over the last decade the number of medical schools, public and private, has trebled. Saudi Arabia has no national curriculum for medical education, so training programmes and even their outcomes vary, and secondary education (public and private) varies enormously too, so many medical schools run a one-year pre-professional programme to bring all entrants to the same level before they start medicine. Even now the SCHS is beginning an ambitious overhaul of postgraduate training accreditation, long neglected. Quality control in the form of a national examination can therefore be justified in the Saudi situation.

From a Dutch perspective, will national examinations add much to this training continuum? Probably not. The situation in Saudi Arabia seems very different. Over the last 10 years there has been a trebling of medical schools in the Kingdom,[6] both federally sponsored and private. Saudi Arabia does not have a national curriculum for medical training hence there is diversity in the existing training programs and even outcomes. There is also wide diversity in secondary school education available in the Kingdom, both private and federal, and many medical schools include the equivalent of a 1 year pre-professional program to ensure that all students admitted are brought to the same level before starting their medical studies. Even today the Saudi Commission for Health Specialties is starting an ambitious overhaul of the accreditation of postgraduate training, which has long been neglected. So in Saudi Arabia the educational continuum is very heterogeneous. Therefore, quality control in the form of national examinations seems to be justified in the Saudi situation.


The greater freedom given to UK medical schools helps them address individual needs within the whole healthcare system, rather than taking a one-size-fits-all approach.

Greater freedoms, as espoused by the UK medical schools, can allow for schools to address individual needs within the whole healthcare system and not just one size, of product, fits all approach.


The SCHS decided to support developing an NLE that all Saudi medical graduates must take. Many challenges remain, the pivotal one being production of high-quality items; the rest are technical, though not easy. There are basically two routes: ‘buying’ items from a larger testing institution, or developing licensing items in house, the route Saudi Arabia chose. The first is relatively quick and easy, and matching the quality of an established testing service's existing items may take years; nevertheless the desirable route is the one chosen, fundamentally for three reasons: ownership, capacity building, and sustainability. Furthermore, assessment and teaching are two sides of the same coin, so any NLE must be linked to the teaching programmes, which is hard to achieve with an externally purchased off-the-shelf exam.

The Saudi Commission for Health Specialties has made the decision to support the development of a National Licensing Examination to be taken by every medical graduate in the Kingdom. However, there will be many challenges ahead, of which the pivotal one is the production process of high quality test material. All other challenges are technical, but not to be taken lightly. There are basically two routes. The first is to actually ‘buy’ a licensing exam from one of the larger testing institutions around the world. The second is to develop licensing exams in house, and the Saudi Commission has chosen this route. The first route would have been relatively easy and can be done fairly quickly and it is suggested that it may be difficult to match the quality of the established testing services, at least for several years. However, the desirable approach is the one chosen by the Saudi Commission. There are many arguments but fundamentally they come down to three: ownership, capacity building and sustainability. Moreover, because assessment and teaching are two sides of the same coin, there must be an obvious linkage between any form of licensing examination and the teaching programs. This becomes more difficult to achieve when an off the shelf exam is purchased.


Ownership of testing means tailoring items to local circumstances, and it will drive the education system in a desired direction. Continual attuning of the training programme to the assessment, and vice versa, is needed.

Ownership of testing is about tailoring the assessment to local circumstances and it will then drive the teaching system in a desired direction (according to the adagium “assessment drives learning”).[7] There will be an ongoing process of attuning the training program to the assessment and vice versa.


6. Bajammal S, Zaini R, Abuznadah W, Al-Rukban M, Aly SM, Boker A, et al. The need for national medical licensing examination in Saudi Arabia. BMC Med Educ 2008;8:53.








Assessment in the Context of Licensure and Certification (Teach Learn Med, 2013)

John J. Norcini

Foundation for Advancement of International Medical Education and Research, Philadelphia,

Pennsylvania, USA

Rebecca S. Lipner and Louis J. Grosso

American Board of Internal Medicine, Philadelphia, Pennsylvania, USA





Over the past 25 years three major forces have influenced licensure and certification.

Over the past 25 years, three major forces have had a significant influence on licensure and certification:

  • the shift in focus from educational process to educational outcomes,

  • the increasing recognition of the need for learning and assessment throughout a physician’s career, and

  • the changes in technology and psychometrics that have opened new vistas for assessment.


In response to these forces, licensure and certification programmes have changed as follows.

To respond to these forces, licensure and certification programs have

(a) improved the ways in which their examinations are constructed, scored, and delivered;

(b) expanded their repertoire of methods of assessment; and

(c) invested in research intended to validate their decisions.


FORCES INFLUENCING LICENSURE AND CERTIFICATION


Educational Outcomes


In the early 1990s the general focus of education shifted from process to outcomes; medical education, driven by regulators worldwide, began gradually to adopt this approach.

In the early 1990s, the focus in general education shifted from process to outcome.1 Driven by the regulatory bodies around the world, medical education has begun to gradually adopt this approach.


Good assessment is central to the outcomes movement because it is what confirms that the appropriate results have been achieved. The introduction of outcomes therefore profoundly affected how licensing and certifying bodies think about the assessments on which they base their decisions. In particular, competencies such as communication with patients, interpersonal skills, and professionalism had historically not been assessed, and the past 25 years have seen new assessment methods introduced.

Good assessment is the linchpin of the outcomes movement, as it is the means for ensuring that the appropriate results have been achieved. As such, the introduction of outcomes has had a profound effect on how the licensing and certifying bodies conceive of the assessments on which they base their decisions. In particular, competencies such as communications with patients, interpersonal skills, and professionalism had not historically been included in the assessments of the licensing and certifying bodies. Over the past 25 years, this has led to research in, and the introduction of, new methods of assessment.



Need for Lifelong Learning and Assessment


Four pieces of evidence support the lifelong need for learning and assessment.

The lifelong need for learning and assessment is supported by four key pieces of scientific evidence related to patient care.

  • First, McGlynn and colleagues showed that patients receive only about half of the care that they should for a variety of chronic and acute conditions.5

  • Second, Choudhry and colleagues showed that continued learning and assessment throughout a physician’s career was critical because physicians’ knowledge, skills, and compliance with evidence-based patient care decline as a function of time.6

  • Third, researchers found that physicians are unable to accurately self-assess their strengths and weaknesses with respect to patient care, but with appropriate learning tools to help physicians identify and fill the competency gaps, effective change can occur.7–10

  • Fourth, patients believe that physicians are being assessed every few years and are keeping pace with medical advances.11,12


ABMS의 MOC 프로그램은 '평생 인증'과 달리 지속적인 활동을 요구하는 것으로, 지속적인 학습과 정기적인 평가가 의사들이 근거-중심 환자진료를 커리어에 걸쳐 꾸준히 따라가고 있음을 대중에게 확신시켜주는 데 필요함을 인정하는 것이다. 유사하게, 면허 역시 '평생의 특권'에서 'MOL'로 옮겨가고 있으며, 정기적으로 의사들이 자신의 면허를 갱신하게끔 한다.

The ABMS Maintenance of Certification (MOC) programs, which now require ongoing activity as opposed to lifetime certification, acknowledge that continuous learning and periodic assessment are important to assuring the public that physicians are keeping pace with evidence-based patient care throughout their careers.13 Likewise, medical licensure has begun to transition from an initial privilege to Maintenance of Licensure, which periodically requires physicians to renew their license.14



테크놀로지와 Psychometrics

Technology and Psychometrics


지난 25년간 시스템의 상호연결성과 컴퓨팅 능력에 엄청난 진전이 있었고, 의사가 근무하는 방식 역시 25년 전과는 매우 달라졌다.

The past 25 years have also seen tremendous advances in computing power and the interconnectivity of systems.15 As a result, the way that physicians work is quite different than it was 25 years ago, and this has led to the incorporation of technological advances into new assessment approaches.


문항 분석, 시험 분석의 영역에서는 IRT가 발전하였다.

In the areas of item analysis, exam analyses, and scoring, the major advance has been the growth of Item Response Theory (IRT).18 IRT is...

  • a sophisticated mathematical model that provides precise estimates of examinee ability and evaluates how well assessments and individual items work without assuming equal difficulty of items.
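A minimal sketch can make this concrete. The code below implements the two-parameter logistic (2PL) form, one common IRT model; the discrimination and difficulty values are hypothetical, chosen only to show that items need not be assumed equally difficult:

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT model: probability that an examinee of ability `theta`
    answers an item with discrimination `a` and difficulty `b` correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information the item contributes at ability `theta`;
    it peaks where item difficulty matches examinee ability."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# For an average examinee (theta = 0), an item of matched difficulty
# (b = 0) is more informative than an easy one (b = -1), even though
# the two items are not equally difficult.
matched = item_information(0.0, 1.5, 0.0)
easy = item_information(0.0, 1.5, -1.0)
```

This difficulty-aware view of precision is what distinguishes IRT from classical scoring, where every correct answer counts equally.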


면허와 인증에 미치는 힘에 대한 반응

RESPONSES TO THE FORCES INFLUENCING LICENSURE AND CERTIFICATION


시험 구성/채점/시행의 향상

Improved Test Construction, Scoring, and Delivery


 

Adaptive testing.


더 짧은 시험 시간 내에 더 정확한 결과를 알려준다. IRT 방법을 활용하여 피험자의 능력이 시험 프로세스 내내 추정되고, 추정이 이뤄질 때마다 최대한의 정보를 줄 수 있는 문항이 제공되며, 특정 피험자가 언제 시험을 끝낼지는 정지규칙stopping rule(능력 추정의 정확도에 대한 충분한 신뢰)에 따라 결정된다.

Although there are several approaches to adaptive testing (e.g., pure or multistage), they all offer the advantage of shorter testing times with higher levels of precision than standard, fixed length tests. Adaptive testing typically employs IRT methodology.19 Estimates of examinee ability are made throughout the testing process. Each time an estimate is made items that provide maximum information are administered, and stopping rules (i.e., sufficient confidence in the accuracy of an ability estimate) are used to determine when the exam ends for a particular examinee.
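The loop described here can be sketched as follows. This is a toy illustration, not an operational CAT engine: the 2PL model, the grid-search ability estimate, the item bank, the simulated examinee, and the stopping threshold are all invented for the example:

```python
import math
import random

def p_correct(theta, a, b):
    """2PL IRT response probability."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def mle_theta(responses):
    """Crude grid-search maximum-likelihood ability estimate."""
    grid = [t / 20.0 for t in range(-60, 61)]  # abilities -3.0 .. 3.0
    def loglik(theta):
        return sum(math.log(p_correct(theta, a, b) if right
                            else 1.0 - p_correct(theta, a, b))
                   for (a, b), right in responses)
    return max(grid, key=loglik)

def adaptive_test(bank, answer, se_target=0.6, max_items=15):
    """Core CAT loop: after each response, re-estimate ability, administer
    the unused item with maximum information there, and stop once the
    standard error (1 / sqrt(total information)) meets the target."""
    responses, theta, remaining = [], 0.0, list(bank)
    while remaining and len(responses) < max_items:
        item = max(remaining, key=lambda ab: information(theta, *ab))
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta = mle_theta(responses)
        total_info = sum(information(theta, a, b) for (a, b), _ in responses)
        if 1.0 / math.sqrt(total_info) < se_target:  # stopping rule
            break
    return theta, len(responses)

random.seed(1)
bank = [(2.0, b / 4.0) for b in range(-8, 9)]      # hypothetical item bank
simulee = lambda item: random.random() < p_correct(0.5, *item)
theta_hat, n_used = adaptive_test(bank, simulee)
```

The key design point is that item selection and termination both depend on the running ability estimate, which is why adaptive tests can be shorter than fixed forms at comparable precision.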



Test scoring.


테크놀로지 발전으로 피험자가 시험을 보면서 취하는 대부분의 행동을 잡아낼 수 있게 되었고, 복잡한 채점과 분석 방법이 가능해졌다. 얼마나 시간을 쓰는가, 답을 얼마나 바꾸는가, 얼마나 검산(검토)를 하는가, 누가 먼저 시험을 종료하는가 등.

Technology now enables the capture of most examinee actions during a computer exam, and it, in turn, has allowed for more sophisticated scoring and analyses. This includes a window into the amount of time examinees spend on items and their test-taking behaviors (e.g., frequency of changing answers, items marked for review, and the order of examination completion).
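As an illustration of the kind of process data involved, the sketch below derives per-item time, answer changes, and review marks from a hypothetical event log; the event format and field names are invented, not those of any real testing platform:

```python
from collections import defaultdict

# Hypothetical event stream as a testing platform might record it:
# (item_id, event, seconds since the exam started).
events = [
    (1, "view", 0.0), (1, "answer:B", 12.4),
    (2, "view", 13.0), (2, "answer:C", 70.2), (2, "answer:D", 75.9),
    (3, "view", 76.5), (3, "mark_for_review", 78.0), (3, "answer:A", 130.1),
]

def summarize(events):
    """Derive per-item test-taking behaviors from the raw log:
    time spent on the item, answer changes, and review marks."""
    stats = defaultdict(lambda: {"time": 0.0, "changes": 0, "marked": False})
    first_seen, answers = {}, defaultdict(list)
    for item, event, t in events:
        first_seen.setdefault(item, t)
        stats[item]["time"] = t - first_seen[item]
        if event.startswith("answer:"):
            answers[item].append(event)
        elif event == "mark_for_review":
            stats[item]["marked"] = True
    for item, given in answers.items():
        stats[item]["changes"] = len(given) - 1  # changes beyond first answer
    return dict(stats)

summary = summarize(events)
```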


이러한 복잡한 과업을 위한 측정방법(rule-based logic and regression-based and Bayesian algorithms)도 더 복잡해졌다.

Measurement solutions to scoring these complex tasks, such as rule-based logic and regression-based and Bayesian algorithms, have become more sophisticated and effective in medical licensure.20,21




Test design.


시험 설계를 위한 평가공학의 도입이 가능해졌고, 다양한 conceptual design framework이 있다. 근거를 모으고 해석하는 방식이 평가의 확장된 목적과 더 직접적으로 연관된다.

Implementation of assessment engineering approaches to building tests is now possible, and the conceptual design framework can support various methods.22 It helps ensure that the way in which evidence is gathered and interpreted is more directly linked to the intended purpose of the assessment.


다른 분야와 마찬가지로 연산능력의 향상이 기여하였는데, 시험을 컴퓨터에서 시행하게 되고 정보를 자동으로 수집하면서 문항 유형을 더 realistic하게 만들 수 있다.

Like advances in other areas, improvements in computing have opened the door to advances in the authenticity of the assessment. Delivery of exams on the computer and the automatic capture of the data allow for more realistic features and item types.


마지막으로, 미국에서 EHR 도입을 강제하는 것은 의사의 진료행위 전반에 걸친 평가를 가능하게 한다. 잘 개발된 결정-지원 도구의 활용은(왓슨 등) 평가 접근이 더 달라질 것이다. 

Finally, the mandate for implementing electronic health records in the United States will enhance the ability to assess practice performance across the breadth of a physician’s practice. Use of well-developed tools that provide decision support, such as Watson and Isabel, might further change the approach to assessment.23,24


문항 개발과 조합

Item development and test assembly.


AIG가 등장했다. 또한 ATA는 내용/측정특성/보안 등을 고려하여 구체적인 조합을 만들어준다.

In terms of item development, efficiencies have been gained through the Automated Item Generation approach.25 Advances in technology have also enabled the use of sophisticated Automated Test Assembly routines to ensure that tests are built to a specific set of specifications including content, measurement properties, and security considerations.
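A toy sketch of the assembly idea, assuming a hypothetical item pool and blueprint (operational ATA systems typically solve this as a mixed-integer programming problem rather than greedily):

```python
# Hypothetical mini item pool: (item_id, content_area, information_at_cut).
pool = [
    ("q1", "cardiology", 0.62), ("q2", "cardiology", 0.48),
    ("q3", "pharmacology", 0.55), ("q4", "pharmacology", 0.30),
    ("q5", "ethics", 0.41), ("q6", "ethics", 0.58),
]

def assemble(pool, blueprint):
    """Greedy assembly: for each content area, pick the required number of
    items with the most measurement information at the pass/fail cut score.
    This toy version honors only content and information constraints."""
    form = []
    for area, n_required in blueprint.items():
        candidates = sorted((item for item in pool if item[1] == area),
                            key=lambda item: item[2], reverse=True)
        if len(candidates) < n_required:
            raise ValueError(f"pool cannot satisfy blueprint for {area!r}")
        form.extend(candidates[:n_required])
    return form

blueprint = {"cardiology": 1, "pharmacology": 1, "ethics": 2}
form = assemble(pool, blueprint)
```

A real routine would also balance exposure rates and overlap between forms, which is where the security considerations mentioned above enter.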



데이타 법의학

Data forensics.


레코딩 장치가 더 작고 강력해져서 부정행위의 방식도 다양해졌다. 면허와 인증과 관련된 stake가 크기 때문에, 측정 커뮤니티의 주된 대응은 데이타 법의학이었다. 이는 통계적 절차를 이용해 퍼포먼스와 응시 행태에서의 이상 징후(예: 지나치게 빠른 응답)를 발견하며, 이상이 발견되면 추가 조사가 필요하다.

Small recording devices and more powerful, subtle ways of communicating have enabled unprofessional behaviors such as cheating or illegally obtaining exam content. The stakes associated with licensure and certification have grown such that jobs and financial incentives are closely linked to successfully maintaining credentials. The science of data forensics has been the major response of the measurement community. Data forensics employ statistical procedures to identify anomalies in exam performance and behavior (e.g., rapid response to questions) that require further investigation.
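One of the simplest such procedures, screening response latencies for implausibly fast answers, can be sketched as follows; the times and thresholds are hypothetical:

```python
# Hypothetical per-item response times (seconds) for one examinee.
response_times = [42.0, 55.1, 3.2, 61.7, 2.8, 49.9, 58.3, 3.1, 47.2, 52.6]

def flag_rapid_responses(times, floor_seconds=10.0, max_fraction=0.1):
    """Screen for anomalously fast answers: responses below a plausibility
    floor can indicate item pre-knowledge or disengagement; the record is
    flagged for further investigation if too large a share is rapid."""
    rapid = [t for t in times if t < floor_seconds]
    return len(rapid) / len(times) > max_fraction, rapid

flagged, rapid = flag_rapid_responses(response_times)
```

As the text notes, such a flag is only a trigger for investigation, not proof of misconduct.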



새로운 평가방법

New Methods of Assessment



적절하게 도입된다면 구술시험은 MCQ의 정보를 보완해줄 수 있다. 그러나 outcome movement에 따라 임상스킬, 커뮤니케이션, 프로페셔널리즘 등을 평가해야할 요구가 높아졌다. 초기의 면허와 인증에서 규제기구는 시뮬레이션을 더 강조하는 방식으로 응답하였다.

When deployed properly, the oral examination can supplement the information derived from MCQs. With the outcomes movement, however, there followed the need to assess a wider array of competences including clinical skills, communication with patients, and professionalism.26 For initial licensure and certification, the regulatory bodies responded by increasing emphasis on simulation in their examination processes.27



시뮬레이션

Simulation.


SP와 컴퓨터/마네킹 방식이 있다.

Two types of simulation are now included in licensure and certification: standardized patients28,29 and computer/mannequin-based simulations.30


처음 SP를 활용한 곳은 MCC에서 1993년

One of the first standardized-patient-based assessments used as part of licensure was offered by the Medical Council of Canada in 1993.


컴퓨터/마네킹 기반의 시뮬레이션에서 좋은 사례는 ABIM의 interventional cardiology

Computer/mannequin-based simulation includes a wide variety of different methods but one example is the device used as part of the American Board of Internal Medicine’s (ABIM) maintenance of competence for interventional cardiology.31



근무지 기반 평가

Workplace-based assessment.


아무리 복잡해져도 시뮬레이션이 환자 대면을 대체할 수는 없다.

As sophisticated as it has become, simulation is still not a substitute for real patient encounters.


면허와 인증 기구는 근무지에서의 평가 필요성을 두 가지 다른 방향에서 바라보았다.

The licensing and certifying authorities have taken the need for assessment in the workplace in two different directions.

    • For trainees, they have delegated responsibility for assessment to training program directors and developed and/or researched tools to support them in their efforts. Specifically, chart-stimulated recall (called case-based discussion in the United Kingdom) was developed by the American Board of Emergency Medicine and the mini-CEX was developed by the ABIM.32,33

    • For practicing doctors, the licensing and certifying boards have begun to explore the use of practice performance as a basis for assessment. ABIM has developed a series of Performance Improvement Modules. These web-based tools lead doctors through a review of their patient data, often for a specific condition.



타당도 연구

Increased Validation Research


MOC에 연관되는 stake는 높으며, hospital privileges를 부여하기에 앞서 credential을 요구하는 병원의 숫자도 크게 늘어났다. stake가 높아짐에 따라 프로그램의 가치에 대한 논쟁이 늘어났고, 이것이 실제 진료 세계를 얼마나 잘 반영하는가에 대한 문제 제기도 있었다. 이에 따라 이러한 프로그램에서 내려지는 결정을 타당화validating하기 위한 연구가 이뤄졌다.

The stakes associated with maintenance of certification are high and we have witnessed a significant increase in the number of hospitals requiring the credential before granting hospital privileges.38 As the stakes grow, the value of the programs are debated and how well they reflect the real world of practice is challenged.39 As such, significant research has been targeted at validating the decisions made in these programs.



Validity에 대한 이해는 시간에 따라 진화해왔다. 

Our understanding of the concept has evolved significantly over the years, and

  • Construct validity라는 우산 아래 통합 in 1995 Messick unified validity under the umbrella of construct validity.41

  • 표준화/일반화/외삽/결정 Kane’s more modern validity theory is conceptualized as four structured arguments: standardization, generalization, extrapolation, and decision rules.42

특히 외삽(평가 조건과 준거 행동간의 관계), 결정(개개인에 대한 평가 결정의 결과에 대한 것)이 중요하다.

In particular, practical implications flow from extrapolation, which involves the relationship between assessment conditions and criterion behaviors, and decision rules, which relate to the consequences of assessment decisions for individuals.


지난 25년간 타당도 연구는 이 practical implication을 네 가지 방향으로 확장해왔다.

Over the last 25 years validity research has expanded to address these practical implications in four ways:

  • (a) 평가나 프로그램을 이해관계자들이 받아들이는 정도 the acceptability of the assessment or program to stakeholders,

  • (b) 이해관계자들이 학습하고 발전하게끔 encourage하는 정도 the extent to which stakeholders are encouraged to learn and improve,

  • (c) 프로그램 내에서의 수행능력과 외부 척도(다른 평가, 진료 특성)과의 관련성 정도 the extent to which there is a relationship between performance in the programs and external measures such as other assessments or practice characteristics, and

  • (d) 진료중의 평가와 퍼포먼스에 의해서 측정되는 퍼포먼스와의 관계 정도 the extent to which there is a relationship between performance as measured by the assessment and performance in practice.43


연구의 사례

  • Examples of these types of research include studies showing (a) public acceptability in that the participation rates in the programs are high and the public expects that physicians will be assessed over their careers.38,44,45

  • Examples of studies (c) examining the relationship between assessments and external measures include one study showing a positive relationship between those certified in Internal Medicine and annual income as well as career satisfaction and another study showing that higher MOC scores are correlated with greater electronic resource use.47,48

  • Examples include a study showing that (d) certified physicians do better on mortality for acute myocardial infarction or congestive heart failure and those with higher MOC scores also do better on processes of care for diabetes and mammography screening.49,50




CONCLUSION



6. Choudhry NK, Fletcher RH, Soumerai SB. Systematic review: The relationship between clinical experience and quality of health care. Annals of Internal Medicine 2005;142:260–73.






 2013;25 Suppl 1:S62-7. doi: 10.1080/10401334.2013.842909.

Assessment in the context of licensure and certification.

Author information

  • 1a Foundation for Advancement of International Medical Education and Research , Philadelphia , Pennsylvania , USA.

Abstract

Over the past 25 years, three major forces have had a significant influence on licensure and certification: the shift in focus from educational process to educational outcomes, the increasing recognition of the need for learning and assessment throughout a physician's career, and the changes in technology and psychometrics that have opened new vistas for assessment. These forces have led to significant changes in assessment for licensure and certification. To respond to these forces, licensure and certification programs have improved the ways in which their examinations are constructed, scored, and delivered. In particular, we note the introduction of adaptive testing; automated item creation, scoring, and test assembly; assessment engineering; and data forensics. Licensure and certification programs have also expanded their repertoire of assessments with the rapid development and adoption of simulation and workplace-based assessment. Finally, they have invested in research intended to validate their programs in four ways: (a) the acceptability of the program to stakeholders, (b) the extent to which stakeholders are encouraged to learn and improve, (c) the extent to which there is a relationship between performance in the programs and external measures, and (d) the extent to which there is a relationship between performance as measured by the assessment and performance in practice. Over the past 25 years, changes in licensure and certification have been driven by the educational outcomes movement, the need for lifelong learning, and advances in technology and psychometrics. Over the next 25 years, we expect these forces to continue to exert pressure for change which will lead to additional improvement and expansion in examination processes, methods of assessment, and validation research.



의사면허시험에 관한 논쟁(Regulation & Governance,2016)

The medical licensing examination debate

Julian Archer, Nick Lynn, Martin Roberts, Tom Gale, Sam Regan de Bere

The Collaboration for the Advancement of Medical Education Research and Assessment (CAMERA), Plymouth University, Peninsula Schools of Medicine

and Dentistry, Plymouth, UK

Lee Coombes

Cardiff University School of Medicine, Cardiff, Wales, UK







도입

Introduction


헬스케어 규제는 공공을 보호하고 보건의료 전문직 내에서 기준을 따르도록 하는 책임이 있다. 한 가지 접근은 NLE이다.

Healthcare regulators around the world are charged with protecting the public and supporting standards within healthcare professions. One approach to this is the adoption of a national licensing examination (NLE).


이러한 시험을 도입하고 실행하는 방식은 국가마다 다르다.

The adoption and delivery of these examinations varies around the world.

  • 자국 내 졸업생만 NLE 치름
    Some countries have established NLEs exclusively for their home graduates as a method of quality assurance for both graduates and medical schools. These include Bahrain, Croatia, Germany, Poland, Qatar, and Switzerland (Seyfarth et al. 2010).

  • 해당 사법권 내에서 진료하고자 하는 모든 의사는 NLE를 봐야 함.
    Others, such as Canada, Chile, Hong Kong, Japan, Korea, the United Arab Emirates, and the United States (US), ask all prospective doctors who hope to work in their jurisdiction to undergo a NLE process, with North America dominating the literature and methodology (Sutherland & Leatherman 2006).

  • IMG에 대해서만 인증프로세스와 더불어 추가적으로 면허시험을 보는 국가. 언어능력과 진료를 위한 기본적 자격을 위한 validated documentation의 목적
    A third distinct group of countries exclusively involve international medical graduates (IMGs) who undergo an additional examination alongside an accreditation process, where they are required to provide evidence of language competence and validated documentation of their primary qualifications in order to practice in the new country. These include Australasia, parts of Europe (Sonderen et al. 2009), including Sweden (Musoke 2012), and the United Kingdom (UK) (Kovacs et al. 2014).


NLE는 다음과 같은 함의를 지닌다. 전 세계적인 의사 부족이 있는 상황에서, 추가적인 시험은 인력 계획에 허들을 더하는 것이다. NLE는 최하위 1/4에 해당하는 점수를 받은 지원자들에게 있어서, IMG든 국내 대학 졸업생이든, 미래 커리어 전망을 어둡게 한다. 반면, 높은 점수를 받은 사람들은 가장 좋은 자리를 확보하기 마련이다. NLE는 졸업생의 건강에도 영향을 미치며, 면허시험의 결과로 "스트레스와 번아웃"을 겪는다고 호소하는 졸업생이 많다.

However, such systems are not without implications. There is a worldwide shortage of physicians, thus, additional examination hurdles have workforce planning implications (Leitch & Dovey 2010). NLEs may also have implications for candidates if they achieve scores in the lowest quartile, resulting in poorer career prospects for both home graduates and international medical graduates (IMGs). In contrast, those who achieve the highest scores are likely to secure the best posts (Noble 2008; Green et al. 2009; Kenny et al. 2013). NLEs may also have an impact on the health of graduates, many of whom complain of “stress and burnout” as a result of the process (McMahon & Tallia 2010).


2015년 GMC는 2021년까지 새로운 의학면허평가(Medical Licensing Assessment)를 개발하여 궁극적으로 도입할 계획이라고 공지하였다. 이는 현재 영국에서 진료하고자 하는 IMG만 평가하는 방식에서, 영국에서 진료하고자 하는 모든 의사를 평가하는 방식으로 바꾸는 것이다. 다만 현행 유럽법상 EEA와 스위스 졸업생, EC 권리 보유자 등은 '이동의 자유' 법제 하에서 면제될 것이다. 그럼에도 GMC는 '유럽 국적자까지 포괄하려는 강한 열망이 있으며, 계속 옵션을 탐색할 것'이라고 언급했다.

In 2015, the General Medical Council (GMC), which regulates doctors in the UK, announced that they were planning to develop and ultimately implement a new Medical Licensing Assessment by 2021 (General Medical Council 2015a). This will transform the UK from only currently assessing IMGs if they wish to enter the UK to practice (General Medical Council 2015b), to assessing all doctors who wish to practice – although as European Law stands, European Economic Area and Swiss graduates, and European Commission (EC) rights holders will remain exempt under freedom of movement legislation (Gullard 2015) and the GMC has stated that “it is our strong aspiration also to cover European nationals and we will continue to explore options” (Hulf & Hart 2015, p.3).


방법

Methods


We conducted an extensive appraisal of the literature following accepted guidance for narrative synthesis within systematic reviews (Popay et al. 2006).


결과

Results


 

우리는 의사국가시험에 관한 국가 수준 혹은 대규모 수준의 담론이 양극화polarized된 것을 확인하였다. 이를 지지하는 주장과 반대하는 주장은 상대적으로 작은 숫자의 주제들에 머물러 있었고, 근거하고 있는 연구의 숫자도 적었다.

We found that the debate around national and other large scale licensing examinations is polarized (Ferris 2006; Neilson 2008) with the arguments for and against limited to a relatively small number of core themes based on a limited number of papers (Ricketts & Archer 2008; Harden 2009; Melnick 2009).


NLE는 새로운 것이 아니며, 어떤 사람들은 NLE의 오래 된 수명 그 자체가 그것의 가치를 보여주는 것이라 한다. 지지자들은 시험은 모든 사람에게 최소한의 기준을 요구하므로 모든 졸업생이 전문직에 들어서기 전에 그 기준을 달성해야 하고, "공평"해야만 한다고 주장한다. 따라서 신뢰도는 표준화와 구조화를 통해서 향상될 수 있다. 모든 지원자가 같은 시험을 보지 않는다면 지원자 간 공정하게 구별discriminate할 수 없을 것이다. 예컨대 학생들은 어떤 의과대학에 다녔느냐에 따라 대규모 시험에서 서로 다른 수행능력을 보여주고, 이 차이는 온전히 학생의 능력으로만 설명되는 것은 아니다. 다른 말로 하자면, 의과대학의 수련 과정의 차이가 반영된 것이다. NLE는 공통의 의과대학 교육과정을 지지해주고, 이 교육과정에 대한 정보를 이해관계자(환자와 고용인 포함)들에게 더 잘 전달해준다.

National licensing examinations are not new and, indeed, some suggest that their longevity is, in itself, now an endorsement of their value (Neumann & Macneil 2007; Melnick 2009). Proponents argue that testing everyone to a minimum required standard, which all graduates must achieve to enter their chosen profession, must be “fair” (Cosby 2006; Melnick 2009). After all, reliability is improved by standardization and structure (van der Vleuten 1996). If all candidates do not undertake the same examination, one cannot fairly discriminate between them (Lee 2008). For example, medical students perform differently in large scale testing depending on which medical school they attended (Boursicot et al. 2007), and this difference cannot be entirely explained by underlying student ability (McManus et al. 2008; McManus & Wakeford 2014). In other words, differences in medical school training matter. NLEs might help to support a common medical curriculum and provide an opportunity for this curriculum to be better informed by stakeholders, including patients and employers (Wass 2005).


그러나 점차 자국 내 졸업생이 - 언어와 시스템에 익숙하여 - 해외 졸업생보다 지속적으로 잘 한다는 근거가 축적되고 있다.

However, there is a growing body of evidence that home graduates, familiar with the language and systems, perform consistently better than graduates trained elsewhere (Holtzman et al. 2014; Tiffin et al. 2014).


어떤 사람들은 NLE의 기여가 적을거라고 하지만, 만약 공정하게 시행된다면 NLE는 의사의 (국가 간) 이동을 촉진시킬 것이다.

If fairly administered, NLEs should promote the movement of doctors, although some believe that such examinations do little to increase the mobility of health professionals (Cooper 2005; Philipsen & Haynes 2007; Archer 2009).


NLE를 지지하는 사람들은 시험 프로세스와 관련된 프로세스가 대규모 시험의 로지스틱스에 따라 향상될 수 있으며, 이 경우 자원과 전문가들을 pooling할 수 있기 때문이다. 더 나아가 규모의 경제를 시행하면 경제적으로도 유리하다.

Advocates of NLEs infer that the standards associated with the testing process are improved as a result of the logistics of staging large-scale examinations, which necessarily require the pooling of resources and expertise (Neumann & Macneil 2007; Lehman & Guercio 2013). Furthermore, it may also be financially advantageous by economies of scale (Lehman & Guercio 2013).


교육 평가와 이론이 빠르게 발전하고 있는 가운데, 대부분의 전문가들은 USMLE나 MCCQE를 세계를 선도하는 사례로 여긴다. 이러한 주장을 뒷받침하는 실증적 근거도 많다. 그러나 NLE가 환자 안전을 보장하고 더 나은 환자 진료로 직접 이어진다는 추가적 주장을 하기는 어렵다. Norcini 등이 USMLE 점수와 특정 환자 성과와의 상관관계를, Tamblyn 등이 이후의 불만 제기와의 상관관계를 보여주었으나, 아직 인과관계에 대한 근거는 없다.

There is little doubt that while educational assessment and theory evolves at a rapid pace, most specialists regard the United States Medical Licensing Examination (USMLE) and the Medical Council of Canada Qualifying Examination (MCCQE) as world-leading examples (Bajammal et al. 2008; Lillis et al. 2012; Norcini et al. 2014). This claim is supported by good empirical evidence (Stewart et al. 2005; Hecker & Violato 2008; Committee to Evaluate the USMLE Program 2008; Margolis et al. 2010; Lillis et al. 2012; Guttormsen et al. 2013). Yet difficulties arise from the additional claims that a national licensing examination assures patient safety and that this form of assessment directly leads to better patient care (Stewart et al. 2005; Tamblyn et al. 2007; Wenghofer et al. 2009; McMahon & Tallia 2010). While Norcini et al. (2014) established a correlation between USMLE scores and some specific patient outcomes and Tamblyn et al. (2007) a correlation with later complaints, there is no evidence as yet to establish causality (Sutherland & Leatherman 2006; Boulet & van Zanten 2014).


일부 의학교육학자들은 면허시험이 환자를 더 안전하게 만들어준다는 주장은 근거보다는 추정에 의존하고 있다고 말한다. 실제로, 의학교육을 표준화하는 것은 오히려 해로울 수 있으며, 교육과정의 혁신과 발전을 저해할 수 있다.

Some medical educationalists suggest arguments that licensing examinations make patients safer rest on assumptions rather than evidence. In fact, it could even be damaging to standardize medical education, thus reducing innovation and advancement in curricula (Gorsira 2009; Harden 2009; van der Vleuten 2009).


분명히 NLE에 관한 다수의 가정이 있다.

Certainly a number of assumptions have been made about NLEs.

  • 공공이 더 안전해진다는...
    Melnick (2009) argues that the public are safer because assessment specialists provide a credible external audit of the quality of NLEs and, therefore, healthcare workers that are allowed to practice.

  • 지식과 역량의 최소 기준을 정확히 판단할 수 있다는...
    Assessment specialists are able to accurately establish and assess minimum standards of knowledge and competence (Melnick 2009).

  • NLE가 더 나은 환자 진료와 관계된다는...
    Others reason that the statistical correlation between NLE scores, patient care outcomes, and the incidence of disciplinary action in later professional life is evidence that NLEs directly lead to better care (Tamblyn et al. 2007; Wenghofer et al. 2009).


NLE가 낡은 것이고, 사회적/교육적/전문직업적 맥락이 USMLE가 처음 도입되었을 때와는 다르다는 것을 지적하며 NLE에 반대하는 사람들이 있다. "우리는 더 이상 존재하지도 않는 문제(교육 기준도 인증도 전혀 없던 영리 diploma 공장)를 해결하기 위해 100년도 더 전에 설계된 시스템을 보고 있다." Ferris는 역량을 특정 한 시점이 아니라 근무지에서 시간에 걸쳐 평가해야 한다고 주장한다. 다른 사람들도 "(치)의료행위를 위한 최선의 준비는 (치)의료행위이다"라며 동의한다. NLE를 반대하는 사람들은 쉽게 검사가능한 학습성과가 그러한 시험의 초점이 되기 쉬우며, 실제 진료행위에 더 필요한 역량은 blueprint에서 빠지기 쉽다고 지적한다. 이들이 내놓는 대안은 더 정확하고 최신의, 지속적인 역량 근거를 제공하는 근무지 평가(job assessment), appraisal, 전문직 계발이다.

Opponents counter that NLEs are actually outdated and that the social, educational, and professional contexts are different to when, for example, the USMLE was first established: “We see a system designed over 100 years ago to solve a problem that no longer exists – proprietary diploma mills that had no educational standards, or accreditation” (Ferris 2006, p.129). Ferris (2006) argues that we should be testing competence in the workplace over time and not at a point in time. Others agree, stating, “the best preparation for the practice of dentistry is the practice of dentistry” (Calnon 2006, p. 140). Opponents of licensing examinations also suggest that easily testable learning outcomes become the focus of such examinations and that the competences needed more for actual practice are often missing from the blueprint (Neilson 2008; Noble 2008; Harden 2009). Their answer is for more job assessment, appraisal, and professional development to provide more accurate, up-to-date, and ongoing evidence of practitioner competence (Calnon 2006; Waldman & Truhlar 2013; Kovacs et al. 2014).


Neilson은 이 주장을 더 발전시켰다. 최종시험/면허시험/진료적합성시험의 표준화는 교육학자들을 기쁨에 울게 할지 모르지만, 그것이 더 나은 의사를 만든다는 명확한 근거는 없다. 갓 졸업한 의사들의 수련을 담당하는 내 동료와 나는, 의사가 어디서 qualified 되었든지, 그들의 실질적 교육은 환자를 실제로 만나기 시작하면서 시작함을 알고 있다.

Neilson (2008, p. a1783) develops this argument: The standardisation of final, licensing, and fitness to practice examinations may make educationalists weep with joy, but there is no clear evidence that it makes for better doctors. My colleagues and I deal with the immediate postgraduate training of juniors and know that, regardless of where the doctors have qualified, their practical education starts when they start working with patients for real.


궁극적으로, NLE가 실제 진료에 대한 장벽이 되지 못하며, 사실상 시험을 보는 사람이 모두 통과하는 상황에서 더 이상 공공을 보호하지 않는다고 보는 사람도 있다. 이들은 초기 커리어 시험에서 잘 하는 사람이 가장 좋은 직장을 갖는다는 근거가 있음을 인정하면서도, 어떤 의사가 disciplinary hearing에 불려가거나 환자 진료를 잘 못하게 될지 예측하는 것은 수많은 다른 변수와 결과의 영향을 받는다는 연구들을 지적한다. 많은 지지자들 스스로도, 나중에 적발되어 징계받는 의사들 역시 처음에는 NLE를 통과했었음을 지적한다. 마지막으로, NLE는 운영하는 데 돈이 많이 든다.

Ultimately, some believe that NLEs are not real barriers to practice and, therefore, do not protect the public as, for example, nearly everyone who takes the USMLE passes it in the end (Margolis et al. 2010). They also point out that while there is evidence that those who do well in early career examinations go on to get the best jobs (Green et al. 2009; Kenny et al. 2013), there are studies that argue that trying to predict which doctors may appear in disciplinary hearings or administer poor patient care is subject to a myriad of other variables and consequences (Harden 2009; Norman 2015a). Many proponents themselves point out that the doctors who are subsequently identified and disciplined still initially passed the NLE (Tamblyn et al. 2007; Wenghofer et al. 2009; Norcini et al. 2014). Finally, NLEs are expensive to run (Brown et al. 2015).


대서양을 사이에 둔 차이

Transatlantic divide


우리는 북미와 유럽의 차이를 발견하였다.

We discovered a distinct North American/European divide in the arguments in the literature.

  • 북미 North America has the well established USMLE and MCCQE (Bajammal et al. 2008; Lillis et al. 2012; Norcini et al. 2014), with the literature focusing on the psychometric/measurement aspects of the examinations (Stewart et al. 2005; Hecker & Violato 2008; CEUP 2008; Margolis et al. 2010; Lillis et al. 2012; Guttormsen et al. 2013).

  • 유럽 In Europe, it is more commonly argued that the challenges and practicalities of introducing an equivalence within or across European boundaries would be considerable (Harden 2009), despite imperatives that ensure citizens are able to freely move and work (de Vries et al. 2009). Securing a consensus whereby all doctors in Europe might sit a pan-European licensing examination would be very challenging (Gorsira 2009).

 

전-유럽 면허시험에 반대하는 주장은 로지스틱스에 대한 것만은 아니며, 유럽 전문가들은 더 큰 그림을 우려하고 있다. Harden은 대규모 시험은 쉽게 평가가능한 학습성과를 타겟으로 하는 경향이 있으나, 이것은 의사에게 요구되는 전반적인 역량과의 관련성이 거의 없으며, NLE는 변화와 혁신을 저하시키고 다양성을 인정하지 못한다고 주장한다. van der Vleuten은 유럽에서 NLE가 작동할 수는 있다고 결론내렸지만, 그는 "우리는 학습과 혁신의 힘을 지나치게 훼손시키지 않으면서 원하는 효과를 달성할 수 있는 qualifying system을 어떻게 set up 할 수 있을지 매우 신중하게 고민해봐야 한다"라고 하였다.

The arguments against a European equivalent not only refer to logistics, but European experts are also concerned about the wider picture. Harden (2009) argues that large scale examinations tend to target learning outcomes that can be easily assessed and that these are rarely related to the overall competence required of a doctor, and that NLEs depress change and innovation while failing to recognize diversity. Although van der Vleuten concludes that an NLE might work in Europe, he explains: “…we need to start thinking very carefully about how qualifying systems could be set up to achieve the desired effects without doing too much harm to learning and to innovation power…” (2009, p. 191).


결론

Discussions and Conclusions


모든 잠재적 지원자에게 시행되는 시험은 여러 psychometric 측면의 장점이 있고 신뢰성 있는 결과를 내놓을 것이지만, 그렇게 간단하지는 않다. 지원자의 능력이나 특성은 시간에 따라 일정한 것이 아니며, 유능한 사람도 어려운 환경에서는 수행능력이 저하될 수 있다. 통계적으로 말하면, poor performance는 매우 드물어서 신뢰성 있는 예측이 거의 불가능하며, 전문가들이 NLE의 다양한 장점을 두고 논쟁하지만, 가장 흥미로운 것은 아마도 대서양을 사이에 둔 담론의 차이일 것이다.

An examination that is delivered to all possible candidates also has many psychometric strengths and should produce reliable results; however it is not as simple as that. Candidates’ traits or abilities are not constant over time and good people can perform badly in difficult settings (Weiner 2013). In statistical terms, poor performance is rare, making reliable predictions almost impossible, and while recognized experts argue over the various merits of a national licensing examination, it is perhaps the discourse of a transatlantic divide that is of the most interest (Norman 2015a).


북미는 오랜 역사가 있다. NBME는 1913년 설립되었다. '시험의 문화'는 끊임없이 성장하는 litigious society에 대한 인식에 따라 더 강화되었다. 유럽은 적어도 역사적으로는 평가와 관련된 결정이 local하게 개발되고 시행되었으며, QA는 inspection의 동료평가모델에 기반하고 있다.

North America has a long history of large scale testing in health care; the National Board of Medical Examiners was founded in 1913. The testing culture is grounded and further reinforced by the perception of an ever-growing litigious society. In Europe, at least historically, more assessment decisions have been locally developed and delivered, and quality assurance has been based on a peer review model of inspections.


미국에 기반을 둔 교육기관이 개발한 유용한 프레임워크는 validity를 다섯 가지로 나눈다. 그러나 이 프레임워크에 따라 수집하고 보고하는 타당도 근거는 저마다 다르다. 북미의 psychometrician은 전통적으로 internal structure에 초점을 맞춰왔으며, 이것은 평가는 신뢰도와 같이 통계학적 퍼포먼스에 대한 것이다. 시험의 재생산가능성은 타당도 근거의 근본적인 한 부분으로서, legal challenge를 버텨낼 수 있는 굳건한 일군의 통계학적방법들에 의해서 지지된다.

A useful framework, developed and revised by US based educational organizations, divides validity into five categories: content; response process; internal structure; relationship to other variables; and consequences (Downing 2003). Yet the emphasis for collecting and reporting validity evidence from across this framework appears to vary. North American psychometricians have traditionally focused on internal structure, which deals with the statistical performance of the assessment, including its reliability (Brannick et al. 2011). The reproducibility of a test is a fundamental part of validity evidence, supported by a concrete statistical group of methods that can withstand legal challenge (Cavanaugh 1991).


우리는 NLE에 대한 논쟁이 표준화와 맥락화된 현실 구성이라는, 타당도의 두 '측면' 사이에서 벌어지는 더 넓은 이데올로기적 싸움의 일부라고 본다. 타당도 문헌에서 신뢰도와 다른 타당도 구인은 동일한 프레임워크 내에 존재하지만, 근본적으로 사회-정치적 차원에 의해 형성되는 갈등이 있다. 신자유주의 하에서 국가가 전문직을 거리를 두고 통제하고자 하는 더 넓은 움직임을 배경으로, 평가방법론을 중앙화하고 표준화하여 자기-규제 하에 자율적으로 돌아가게 하려는 욕구가 점점 더 목도된다. 이는 점점 더 위험-회피적이 되는 대중을 안심시키기 위한 접근의 일부이다. 이러한 drive는 '신뢰'받을 수 있는 신뢰성 있는 결과를 확보해야 한다는 압박으로 이어지는데, 타당도의 다른 측면에 의도하지 않은 해로운 영향을 미칠 위험이 있다. 그 결과 프로페셔널리즘과 capability같이 실제로 관심을 가져야 하는 구인을 평가하는 데 어려움을 겪으면서, robust하지만 potentially irrelevant한 평가가 된다.

We would argue that the NLE debate is part of a wider ideological battle between standardization and contextualized reality construction, the two “sides” of validity. Within the validity literature, while reliability and other validity constructs inhabit the same framework, there is a conflict that is fundamentally shaped by socio-political dimensions. With a background of a broader move, under neo-liberalism, as the state seeks to promote its control on professionals from a distance, we increasingly witness a desire to centralize and standardize assessment methodologies so they run autonomously with self-regulation (Rose & Miller 1992). This is part of an approach to assure an increasingly risk-averse public that professionals are safe and fit-for-purpose. This drive puts pressure on the need to secure reliable results that can be “trusted” but at the risk of an unintended detrimental impact on other aspects of validity. The result is an assessment that is robust but potentially irrelevant as it struggles to assess constructs that are of real interest, such as professionalism and capability.


This debate might benefit from shifting the focus from whether or not we should have an NLE to what it should assess. A better understanding of the assessment of professionals would take us in the right direction.

The debate might benefit from refocusing from whether or not we should have an NLE, to what they should assess. A better understanding of the assessment of professionals in the workplace, as opposed to simulated environments, such as OSCEs, might take us in the right direction. Ultimately, we would benefit from attempting to achieve a balance between assessing a breadth of skills and the capacity for such skills in practice and focusing less on psychometric reproducibility.


Ultimately, research is needed on whether NLEs actually safeguard patient safety. Jolly argues that to be convinced of the positive impact of NLEs, we should seek to establish their 'added value' over existing assessment techniques, and ask whether there are alternatives more cost-effective than an NLE.

Ultimately, research needs to focus on whether NLEs add uniquely to assuring patient safety. Jolly (2016), in his recent opinion piece, argues that in order to be convinced of the positive impact of NLEs, we should seek to ascertain their “added value” to existing assessment techniques and explore potentially more cost-effective alternatives to the NLE, such as a more “beefed up” accreditation of the existing assessment process (Jolly 2016, p. 14).


We know that performance on NLEs can be predicted by prior examination success and in turn predicts future success. We also know that, at least in some specific domains, those who score better on NLEs deliver better care; testing is therefore undoubtedly important. However, testing specifically through an NLE remains open to debate, because there is no unequivocal evidence either supporting or refuting their use.

We know that performance in NLEs can be predicted by prior examination success and can, in turn, predict future success (Stewart et al. 2005; Ranney 2006; Hecker & Violato 2008; Tiffin et al. 2014). We also know that those who perform better in NLEs deliver better care, at least in some specific domains (Norcini et al. 2014); therefore, testing is undoubtedly important. However, testing specifically using NLEs remains up for debate as there is a lack of unequivocal evidence to either support or refute their use (Boulet & van Zanten 2014).

 

 


 

Holtzman KZ, Swanson DB, Ouyang W, Dillon GF, Boulet JR (2014) International Variation in Performance by Clinical Discipline and Task on the United States Medical Licensing Examination Step 2 Clinical Knowledge Component. Academic Medicine 89, 1558–1562.


Lee YS (2008) OSCE for the Medical Licensing Examination in Korea. Kaohsiung Journal of Medical Sciences 24, 646–650.


Norcini JJ, Boulet JR, Opalek A, Dauphinee WD (2014) The Relationship Between Licensing Examination Performance and the Outcomes of Care by International Medical School Graduates. Academic Medicine 89, 1157–1162.


Norman G (2015a) Identifying the Bad Apples. Advances in Health Sciences Education 20, 299–303.


Norman G (2015b) The Negative Consequences of Consequential Validity. Advances in Health Sciences Education 20, 575–579.


Wenghofer E, Klass D, Abrahamowicz M, Dauphinee D, Jacques A, Smee S et al. (2009) Doctor Scores on National Qualifying Examinations Predict Quality of Care in Future Practice. Medical Education 43, 1166–1173.





The medical licensing examination debate

Authors

Abstract

National licensing examinations are typically large-scale examinations taken early in a career or near the point of graduation, and, importantly, success is required to subsequently be able to practice. They are becoming increasingly popular as a method of quality assurance in the medical workforce, but debate about their contribution to patient safety and the improvement of healthcare outcomes continues.

A systematic review of the national licensing examination literature demonstrates that there is disagreement between assessment experts about the strengths and challenges of licensing examinations. This is characterized by a trans-Atlantic divide between the dominance of psychometric reliability assurance in North America and the wider interpretations of validity, to include consequences, in Europe. We conclude that the debate might benefit from refocusing on what a national licensing examination should assess: to achieve a balance between assessing a breadth of skills and the capacity for such skills in practice, and focusing less on reproducibility.


Ensuring high-quality care: the role of accreditation (of schools), licensure, certification, and revalidation (Med Educ, 2014)

Ensuring high-quality patient care: the role of accreditation, licensure, specialty certification and revalidation in medicine

John Boulet & Marta van Zanten




Introduction

INTRODUCTION


From the perspective of medical regulators, a poorly performing doctor can be likened to a disease. The primary role of regulation is to ensure patient safety by restricting professional practice to those who have demonstrated competence. Similarly, a well-functioning accreditation system can minimise the production of incompetent doctors.

From the perspective of medical regulators, disease can be thought of as a metaphor for poorly functioning doctors. The primary mandate of regulators is to ensure patient safety by restricting professional practice to only those who have demonstrated competence (i.e. are free from ‘disease’). In a similar way, properly functioning accreditation systems should minimise the production of poorly skilled doctors by improving the education process. Prevention, in this context, is certainly preferable to cure. Given the costs associated with poor health care delivery, it is better to produce highly competent practitioners and ensure, through continuing educational activities, that those who care for patients remain competent to do so.



Accreditation

ACCREDITATION


Accreditation systems have come to be seen by stakeholders as an effective mechanism for assuring the quality of basic medical education curricula and subsequent training programmes. Accreditation means that a designated authority reviews and evaluates an institution's educational programme on a cyclical basis, according to specified criteria and procedures.

Systems of accreditation are frequently viewed by stakeholders (e.g. the public, health care administrators, policymakers) as effective mechanisms for ensuring the quality of basic medical education curricula and subsequent training programmes across the learning continuum. Accreditation can be defined as a process by which a designated authority reviews and evaluates, on a cyclical basis, an educational programme or institution using clearly specified criteria and procedures.


Various organisations around the world have accredited medical education and training programmes. In many countries, for example most of South America and parts of Africa and Asia, the accrediting body reviews higher education institutions as a whole. In some countries (e.g. Australia, Mexico, the UK), separate agencies exist that are specific to professional education programmes such as medicine.

Various organisations around the world accredit medical education and training programmes. In many countries, such as the majority in South America and some in Africa and Asia, accreditation organisations review higher education institutions as a whole. In other countries (e.g. Australia, Mexico, the UK), specialised agencies accredit specific professional education programmes, such as medicine.



An accreditation authority may be part of a government agency (a ministry of education or health) or an independent body, whose decisions are usually officially recognised at government level. It may operate at the provincial, national or cross-national level, and reviews may be mandatory or voluntary. Its authority may encompass all medical education programmes, cover only some (public or private), apply only to institutions teaching in a particular language, or follow other policy requirements.

Accreditation authorities can be part of a country’s government, such as an entity that is directly part of a ministry of education or health, or may be an independent body, the decisions of which are usually officially recognised at government level. Organisations can function at the provincial, national or cross-national (regional) level, and reviews can be mandatory or voluntary. An accreditation organisation’s authority can be broad and encompass all medical education programmes in its jurisdiction, or be limited in scope, and cover only either public or private education programmes, institutions with a specific language of instruction, or institutions that meet certain other policy requirements.



Accreditation in the USA

Accreditation in the USA



The LCME and the COCA

In the USA, the Liaison Committee on Medical Education (LCME) is the nationally recognised accrediting authority for medical education programmes leading to the MD degree in US and Canadian* medical schools, and the Commission on Osteopathic College Accreditation (COCA) accredits schools granting the osteopathic (DO) degree.


The ACGME and the AOA; from 2015 they will operate a single, unified system.

Currently, the Accreditation Council for Graduate Medical Education (ACGME) is the overseeing body responsible for the quality assurance of allopathic GME, and the American Osteopathic Association (AOA) accredits osteopathic residency programmes. These organisations have recently announced that, as of 2015, there will be a single, unified accreditation system for GME programmes in the USA.5


The ACGME's six general competencies.

The ACGME has identified six general competencies deemed essential for residency training: patient care; medical knowledge; practice-based learning and improvement; interpersonal and communication skills; professionalism, and systems-based practice. A meta-analysis of 56 studies reported mixed results, with the authors concluding that there was little evidence that most of the current measurement tools validly assessed the competencies independently of one another.6



Accreditation globally

Accreditation globally



DORA, maintained by FAIMER, lists 104 of the 177 countries with operating medical schools as having accreditation systems for basic medical education. Of these, 42% have accreditation agencies specific to medical education and 58% use agencies that accredit medical programmes as part of higher education institutions. The existence of an accreditation system does not mean that every medical school in that country undergoes accreditation.

As of February 2013, the Directory of Organizations that Recognize/Accredit Medical Schools (DORA) maintained by the Foundation for Advancement of International Medical Education and Research (FAIMER) lists 104 countries with active systems of accreditation for basic medical education (out of 177 countries with currently operational medical schools).7 Of these countries, 42% (n = 44) have accreditation agencies that are specific to medical education, and 58% (n = 60) use agencies that accredit medical programmes as part of higher education institutions. It is important to note that the existence of an accreditation system in a country does not denote that all medical schools in that country are accredited, as the review is sometimes voluntary.



In a 1996 WHO survey, almost two-thirds of medical schools reported being accredited by an external body.

A report based on a 1996 World Health Organization (WHO) survey of ministries of health and deans of medical schools8 indicated that almost two-thirds of medical schools were accredited by an external body.


Another global investigation found that although more than half of all countries with medical schools have a national accreditation system, the nature of the various authorities and the levels of enforcement vary considerably.

Another global investigation of medical education accreditation found that although over half of all countries with medical schools have a national system of accreditation, the nature of the various authorities and levels of enforcement vary considerably.9


A study comparing medical education accreditation systems in nine developing countries found that robust quality assurance procedures are spreading to some developing countries, with protocols similar to those used in the USA. Although the prevalence and characteristics of accreditation systems have been documented, there is little evidence quantifying their utility in improving education.

A study comparing medical education accreditation systems in nine developing countries located throughout the world concluded that the trend towards instituting robust quality assurance procedures was spreading to some developing countries, in which protocols similar to those used in the USA have been developed and implemented.10 Unfortunately, although the prevalence and characteristics of accreditation systems have been documented, there is relatively little evidence to quantify their utility with respect to improving education practices.



Validity of accreditation

Validity of accreditation


The lack of research on the value of accreditation may reflect a number of methodological factors that apply across educational fields. For example, in medical education, in many countries all programmes are accredited, which makes within-country comparisons impossible.

The lack of research related to the value of accreditation is likely to reflect a number of methodological factors applicable across educational fields.11,12 For example, in medical education specifically, in many countries all programmes are accredited (usually based on the same criteria), which precludes within-country comparisons of performance of students or graduates from accredited and non-accredited programmes.


Despite these methodological difficulties, some investigations have shown that accreditation activities may improve medical education, at least in terms of student performance.

Despite these methodological difficulties, some investigations have shown that accreditation activities may improve medical education, at least in terms of the performance of students.

    • In a study of Mexican and Philippine citizens seeking Educational Commission for Foreign Medical Graduates (ECFMG) certification, first-attempt pass rates on all components of the required US Medical Licensing Examination (USMLE) series were higher for individuals who had attended accredited medical schools, compared with their peers who had attended non-accredited schools.13

    • In another study of the performance of all graduates of international medical schools who took the USMLE Step 2 clinical skills (CS) examination during the 5-year (2006–2010) study period, accreditation was positively associated with the Step 2 CS first-attempt pass rate.
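Comparisons like the two above come down to a difference in first-attempt pass proportions between groups. As a rough sketch, a two-proportion z-statistic can check whether such a difference exceeds what chance would suggest; the counts below are invented for illustration and are not figures from the cited studies.

```python
import math

# Two-proportion z-statistic for comparing first-attempt pass rates
# between graduates of accredited and non-accredited schools.
def two_proportion_z(pass_a, n_a, pass_b, n_b):
    p_a, p_b = pass_a / n_a, pass_b / n_b
    p_pool = (pass_a + pass_b) / (n_a + n_b)   # pooled pass rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Invented counts: 820/1000 pass (accredited) vs 700/1000 (non-accredited).
z = two_proportion_z(820, 1000, 700, 1000)
print(round(z, 2))  # ~6.28, far beyond the conventional 1.96 cut-off
```

A real analysis of this kind would also need to account for confounders (admission selectivity, country of training), which is one reason the paper treats such associations cautiously.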


Although these studies show an association between accreditation activities and students' examination success, additional markers of quality are needed, including performance on examinations taken later in the career and actual patient care data. Without these, it is difficult to know whether accreditation processes have an appreciable, long-term impact on education programmes.

Although these studies include some data showing a positive association between accreditation activities and student success on examinations, additional markers of quality, including performance on other examinations taken later in the career and actual patient care data, are required. Without these, it is difficult to know whether accreditation processes have an appreciable, long-term impact on education programmes and, hence, the quality of those who graduate and eventually practise medicine.



In addition to gathering data on the effects of the existence of accreditation systems, the components of accreditation, such as the specific standards used, also need to be investigated. Whereas research to date has described the existence or some general characteristics of accreditation systems, only a few studies have assessed the appropriateness or effectiveness of specific medical education standards.

In addition to gathering data on the effects of the existence of accreditation systems, it is also important to investigate the components of accreditation, such as the specific standards used and protocols employed, that may enhance the quality of the education process.15 Whereas previous investigations focused on describing the existence and some general characteristics of accreditation systems, only a few studies, to our knowledge, have compared or assessed the effectiveness or appropriateness of the specific medical education standards used to make accreditation decisions.16–18


Evidence, albeit limited, exists to support the value of accreditation in basic medical education. Schools must examine their own quality assurance systems and determine their compliance with internal rules and regulations.

Although there is evidence, albeit limited, to support the value of accreditation of basic medical education programmes, some benefits of these systematic reviews may be manifest despite limited data showing marked improvement in student outcomes. Schools must examine their own quality assurance systems and determine their compliance with internal rules and regulations.


Accreditation decisions made by agencies are generally accepted by stakeholders as meaningful and trustworthy. However, because of a lack of transparency in the process, methodological variability, and a lack of standardisation across schools, decisions can sometimes appear suspect or arbitrary. To establish a globally accepted system, the WFME, together with FAIMER, is creating a process for the recognition of accrediting agencies, in effect meta-accreditation.

Accreditation decisions made by agencies around the world are usually considered credible and are accepted by stakeholders as meaningful and trustworthy. Nevertheless, because of a lack of transparency in the process, variability in methodology, or other issues related to a lack of standardisation across schools, the decisions can sometimes appear capricious and arbitrary. In order to address the need for a globally accepted system for ensuring the quality of accreditation systems themselves, the World Federation for Medical Education (WFME), in conjunction with FAIMER, has formulated policies and procedures for the recognition of agencies accrediting medical schools, an endeavour in meta-accreditation.19



Incentives for accreditation (cost/benefit)

Incentives for accreditation (cost/benefit)



In the USA, accreditation of basic medical education is technically voluntary, but a school must be accredited for its graduates to enter GME and obtain a licence to practise.

In the USA, the accreditation of basic medical education programmes is technically voluntary, but a school must be accredited in order for graduates to enter GME and obtain licensure to practise.

    • For example, in Mexico, students at accredited schools are provided with enhanced clinical clerkship opportunities compared with their peers at non-accredited institutions. The purpose of the US Department of Education National Committee on Foreign Medical Education and Accreditation (NCFMEA) is to review the standards used by foreign countries to accredit medical schools and determine whether those standards are comparable with standards used to accredit medical schools in the USA.20 Thus, medical schools located outside the USA that seek to attract US citizen students are given incentives to seek accreditation by an agency that has been deemed comparable by the NCFMEA.


The case of India

In other countries, such as India, obtaining accreditation by a voluntary agency (e.g. the National Assessment and Accreditation Council [NAAC]) in addition to the mandatory accreditation by the Medical Council of India (MCI) carries some prestige in a crowded field of medical education programmes. A school considering voluntary accreditation needs to weigh the direct cost of seeking the accreditation review and the costs associated with making the necessary changes dictated by the standards against the indirect value of obtaining a secondary accreditation status.



Doctor mobility

Doctor mobility


Accreditation of medical education also aids doctor mobility: when multiple schools are accredited by a common agency, or when multiple agencies mutually recognise one another's decisions, students gain mobility options between schools, such as the transfer of credits.

Accreditation of medical education can also aid in doctor mobility, as accreditation of multiple schools by a common agency, or mutual recognition of accreditation decisions across multiple agencies, can enhance options for student mobility between schools, such as the transfer of credits.



Licensure, certification and revalidation of credentials

LICENSURE, CERTIFICATION AND REVALIDATION OF CREDENTIALS


In medicine, at least in most countries and jurisdictions, practice is reasonably strictly regulated. Licensure, generally granted by governments, is required at the national or regional level for initial entry into the profession. Certification, by contrast, is conferred by non-governmental bodies and typically connotes a higher level of qualification. In many parts of the world, a doctor obtains an initial licence to practise and then obtains certification from a specialty society or board.

In medicine, at least in most countries and jurisdictions, there are reasonably strict practice regulations.

  • Licensure (or registration), which is generally granted by governments, at either the national or regional level, is necessary for initial entry into the profession. 

  • Certification, by contrast, is usually conferred by a non-governmental agency, and typically connotes a higher level of qualification.

In many areas of the world, a doctor can obtain an initial licence to practise medicine and then specialise, obtaining a certificate from a specialty society or board.



Just as licensure and certification differ between countries, the criteria on which licensure and certification decisions are made also differ considerably. Moreover, licensure and certification criteria can, and should, change over time to keep pace with advances in medicine and changing patient needs. In general, licensure and certification involve some form of credentialing and assessment. For licensing purposes, credentialing requires the following:

Just as licensure and certification processes vary around the world, the criteria upon which licensure and certification decisions are granted can be quite different.21–23 Moreover, to keep up with advances in medicine and changing patient needs, licensure and certification criteria can, and should, be modified over time.24 In general, licensure and certification involve some form of credentialing and assessment.


For licensing purposes, credentialing can entail, amongst other criteria, 

  • confirmation of medical school attendance and graduation

  • recognition of the medical school (e.g. accreditation), and

  • verification of the medical school diploma. 


In addition to credentialing, most if not all licensure and certification bodies have some type of assessment process. Both the credentialing and assessment processes are designed to ensure that candidates seeking licensure and certification have met specific standards, and to protect the public from unqualified practice.

In addition to credentialing, most, if not all, licensing and certification (or registration) bodies have some sort of assessment process. In the USA, initial licensure is dependent on successful completion of the USMLE or the Comprehensive Osteopathic Medical Licensing Examination (COMLEX-USA). Subsequent board certification (or registration) may also involve a number of assessments, including in-training examinations and specialty board examinations. Both the credentialing and assessment processes are designed to ensure that candidates seeking licensure and certification have achieved specific standards, and thus help to protect the public from unqualified practitioners.




Historically, licensure and certification, once granted, lasted the lifetime of the doctor. There is now a general movement towards maintenance of licensure (MoL) and maintenance of certification (MoC). In the UK and elsewhere, this process is called revalidation. This 're-registration' occurs on a periodic schedule (e.g. every 5 or 10 years) and can have many components, including CME, CPD, peer and patient assessments, and examinations.

Historically, both licensure and certification (or registration) have been granted for the lifetime of the doctor.2 Today, there is a general movement towards maintenance of licensure (MoL) and maintenance of certification (MoC) requirements. In the UK and many other countries, this process is typically referred to as revalidation. This ‘re-registration’ of doctors, which is typically on a periodic schedule (e.g. every 5 or 10 years), can have many components, including requirements for continuing medical education (CME) or continuous professional development (CPD), peer and patient assessments, and various types of examination.



Recently, there has been discussion of how to regulate doctors re-entering clinical practice. The central question is whether those who left the profession, whether voluntarily or as a result of disciplinary action, are fit to return.

Recently, there have been discussions of the regulatory challenges associated with doctor re-entry into clinical practice.23,28 Here, the impetus is to ensure that doctors who have left the profession, either voluntarily or because of disciplinary action, are fit to return.


Although licensure, certification and revalidation have been adopted in most countries, there are no internationally accepted best practices; instead, each government maintains its own standards. Nevertheless, depending on how they are implemented and on the rigour with which the specific criteria are enforced, these processes can have both positive and negative consequences.

Although licensing, certification and revalidation of doctors are accepted practice in most nations, there are no globally accepted ‘best practices’. Instead, local governments (or specialty societies) maintain their own standards. Nevertheless, depending on its implementation and the rigour with which specified criteria are enforced, the application of licensure (or certification or registration or revalidation) processes can have both positive and negative consequences.



Validity of licensure and certification

Validity of licensure and certification (scores and decisions)



The primary concern is the match between what is assessed and what doctors actually do in practice. Assessments for licensure should clearly reflect the competencies patients expect of their doctors. The nature and types of problems typically found in practising doctors also indicate what regulatory assessment is needed. Often, to support content validity, actual practice data are used to inform the test blueprint. Nevertheless, various aspects of medical care (e.g. the use of health information technology) need to be better assessed as part of the regulatory process.

Of primary concern is the match between what is assessed and what doctors actually do in practice. Assessments for licensure should clearly reflect competencies that patients expect of their doctors.31 The assessment needs for regulation can also be informed by the nature and types of problems typically seen in practising doctors.32 Often, to support content validity, actual practice data are used to help inform the test blueprint (i.e. how examination content is distributed in a test form). Nevertheless, there continue to be various aspects of medical care (e.g. the use of health information technology) that need to be better assessed as part of the regulatory process.33


Although much work has gone into ensuring that licensure examinations contain relevant content, and new simulation-based assessments can measure competencies that were previously difficult to assess, other validity evidence is often lacking or, at best, discouraging. Some researchers have questioned the validity of using USMLE scores for residency selection. However, the relationship between licensure scores and postgraduate performance is only one, and probably a relatively weak, link in the validity chain. What is really needed is an indication of the quality of care these individuals provide once they hold an unrestricted licence. Unfortunately, such research is sparse, mainly observational and descriptive, and does not permit causal interpretation. By contrast, for certification (and recertification), several studies have shown that advanced standing within the medical profession is associated with the provision of better patient care.

Although much work has been conducted to ensure that licensure assessments contain relevant content, and new simulation-based assessment methods can measure certain competencies that were difficult to measure previously,34 other validity evidence is often lacking or, at best, discouraging. For example, in the USA, some researchers have questioned the validity of USMLE scores for making medical residency selections.35 However, the relationship between these licensure assessment scores and postgraduate performance is only one, and probably a relatively weak, link in the validity chain. What is really needed is some indication of the quality of care these individuals provide after receiving an unrestricted licence to practise medicine. Unfortunately, the research evidence linking regulatory interventions and quality of care is sparse, mainly observational and descriptive, and does not, for the most part, allow for causal interpretations.36 By contrast, with respect to certification (or recertification), there have been several studies to show that advanced standing within the medical profession (i.e. specialisation) is associated with the provision of better patient care.37–39


For initial licensure, several studies have shown that licensure examination scores are related to future practice performance. Although these studies are informative and provide some evidence that performance on licensure examinations extrapolates to practice, they are far from complete.

For initial licensure, several studies have shown that licensure examination scores are related to future practice performance.42–44 Although these studies are informative, and provide some evidence to suggest that performance on licensure examinations extrapolates to practice, they are far from complete.
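The studies cited here typically express the link between licensure scores and later practice performance as a correlation. The sketch below computes a Pearson correlation from invented score/rating pairs; the numbers are illustrative assumptions, not data from the studies above.

```python
# Pearson correlation between licensure examination scores and a later
# measure of practice performance (e.g. supervisor ratings).
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented data: exam scores and later performance ratings for 7 doctors.
scores = [190, 205, 210, 221, 228, 240, 251]
ratings = [3.1, 3.4, 3.2, 3.6, 3.5, 3.9, 4.0]
print(round(pearson_r(scores, ratings), 2))
```

Even a strong correlation of this kind remains, as the text notes, only one link in the validity chain: it says nothing about the quality of care delivered after an unrestricted licence is granted.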


Introducing new assessments as part of licensure and certification will undoubtedly have a consequential impact. Provided the assessments are well constructed and meaningfully related to practice, candidates will become better practitioners through preparing for them. This effect, based primarily on CME, is probably more pronounced in revalidation activities.

There is, no doubt, a consequential impact of introducing new assessments as part of the licensure and certification (or registration) process.46 Provided that the assessments are well constructed and meaningfully related to practice, candidates will prepare, making them better practitioners. This effect, based primarily on CME, is probably more pronounced in revalidation activities.27,47




Costs and benefits

Costs and benefits


Most licensure, certification and revalidation processes are very expensive.

Most licensure, certification and revalidation processes are very expensive, and costs are typically borne by the candidate. As an example, the total cost of the USMLE (Step 1, Step 2 Clinical Knowledge, Step 2 Clinical Skills), necessary for licensure in the USA in all jurisdictions, is approximately US$2320.48 On top of the cost of typical licensure assessments, it is also common to have a recurring fee associated with the granting and maintenance of the licence. For those doctors who seek specialty certification, there are additional expenses associated with obtaining and maintaining this status. Finally, with the possible introduction of simulation-based assessments for the MoC49 and retraining of doctors for medical licensure,50 the expense borne by the individual doctor is likely to rise.


Although the costs of licensure, certification and revalidation can be prohibitive, the cost of oversight must be weighed against the cost society would bear in its absence. Some data show that regulatory systems yield better-performing doctors, and there is even more evidence linking specialty board certification or registration to better patient care, but further research is needed.

Although licensure, certification and revalidation costs can be prohibitive, the costs of oversight must be weighed against the potential cost of its lack to society (e.g. poor patient care). There are some data to show that regulatory (licensure) systems yield better performing doctors,51,52 and even more evidence linking specialty board certification or registration to better patient care,53–55 but there appear to be no comprehensive studies of their costs and benefits.


Part of the problem here may reflect the difficulty of relating the costs of poor care to the individually licensed doctor.

Here, part of the problem may reflect the difficulty of relating the costs of (poor) care with the individually licensed doctor.



Another cost-related issue is the need to maintain accurate databases of doctors' qualifications. Efforts to build national practitioner databases have been made in the USA and elsewhere. To judge the efficacy of any regulatory system, comprehensive, longitudinal data are indispensable.

Another issue related to cost rests with the need to maintain accurate databases concerning doctor qualifications. In the USA and elsewhere, efforts have been made to construct national practitioner databases.57 To judge the efficacy of any regulatory system, it is imperative to have comprehensive, longitudinal data on all doctors within a jurisdiction, including qualifications and practice characteristics.



Doctor mobility

Doctor mobility


For an individual traveller, renting a car on arrival in a foreign country is a routine matter. Although an international driving permit is required in some countries, rental companies usually demand only a licence that is valid in the jurisdiction in which the renter resides. Treating patients is of course more complex than driving, but the globalisation of medicine may one day motivate authorities to establish a common licensure pathway.

For individuals who travel, it is fairly straightforward to arrive at an airport and rent a car. Although international driver’s licences can be procured in some parts of the world, rental car companies typically demand only a licence that is valid in the jurisdiction in which the renter resides. Although treating patients is certainly more complex than driving a car, the globalisation of medicine, including the desires of patients to travel across national borders to obtain health care, may one day motivate authorities to establish a common licensure pathway.62,63



Although mobility can be enhanced by centralised regulatory structures, there are drawbacks. First, the practice of medicine differs from region to region. On validity grounds, creating a unified examination that spans a large geographic region may be not only difficult but also indefensible. Even so, it would still be prudent and efficient for licensing authorities to share best practices and, where applicable, examination content.

Although mobility can be enhanced by the introduction of centralised regulatory structures (e.g. national or international licensing examinations), there are some potential drawbacks. First and foremost, the practice of medicine can be quite different from one region to another. Based on validity considerations, creating a unified examination (for licensure or certification) that spans a large geographic region, or even a continent, may be quite difficult and, perhaps, not defensible. Even so, it would still be prudent, and efficient, for licensing authorities to share best practices and, when applicable, examination content.64



Conclusions

CONCLUSIONS


However patient care is delivered in the future, the competencies deemed essential will continue to evolve, and sound regulation is best achieved by sharing resources (including research findings and data) and using the available evidence to develop best practices.

However, regardless of how patient care is delivered in the future, or the evolution of competencies deemed essential for practice, the development of sound regulatory practices can be best accomplished by sharing resources, including research findings and data, and using the available evidence from around the world to develop best practices.


Systems are needed to evaluate and quantify the outcomes of accreditation, licensure, certification and revalidation programmes.

Systems to evaluate and quantify the outcomes of accreditation, licensure, certification and revalidation programmes are needed.



9 van Zanten M, Norcini JJ, Boulet JR, Simon F. Overview of accreditation of undergraduate medical education programmes worldwide. Med Educ 2008;42: 930–7.





Med Educ. 2014 Jan;48(1):75-86. doi: 10.1111/medu.12286.

Ensuring high-quality patient care: the role of accreditation, licensure, specialty certification and revalidation in medicine.

Author information

  • Foundation for Advancement of International Medical Education and Research (FAIMER), Philadelphia, Pennsylvania, USA.

Abstract

CONTEXT:

The accreditation of medical school programmes and the licensing and revalidation (or recertification) of doctors are thought to be important for ensuring the quality of health care. Whereas regulation of the medical profession is mandated in most jurisdictions around the world, the processes by which doctors become licensed, and maintain their licences, are quite varied. With respect to educational programmes, there has been a recent push to expand accreditation activities. Here too, the quality standards on which medical schools are judged can vary from one region to another.

OBJECTIVES:

Given the perceived importance placed by the public and other stakeholders on oversight in medicine, both at the medical school and individual practitioner levels, it is important to document and discuss the regulatory practices employed throughout the world.

METHODS:

This paper describes current issues in regulation, provides a brief summary of research in the field, and discusses the need for further investigations to better quantify relationships among regulatory activities and improved patient outcomes.

DISCUSSION:

Although there is some evidence to support the value of medical school accreditation, the direct impact of this quality assurance initiative on patient care is not yet known. For both licensure and revalidation, some investigations have linked specific processes to quality indicators; however, additional evaluations should be conducted across the medical education and practice continuum to better elucidate the relationships among regulatory activities and patient outcomes. More importantly, the value of accreditation, licensure and revalidation programmes around the world, including the effectiveness of specific protocols employed in these diverse systems, needs to be better quantified and disseminated.

© 2013 John Wiley & Sons Ltd.

PMID: 24330120. DOI: 10.1111/medu.12286.


Trends in national licensing examinations in medicine (Med Educ, 2016)

David B Swanson1,2 & Trudie E Roberts3






The purpose of licensure is to ensure that doctors have the knowledge and skills necessary to practise medicine safely and effectively. The underlying principle is that licensing should apply to all doctors, regardless of where they trained or the setting in which they practise. With some variation internationally, a doctor must meet several requirements in order to obtain an unrestricted licence to practise, commonly including some or all of the following:


  • graduation from an accredited medical school;

  • the successful completion of a period of practice under supervision, often after medical school graduation, and

  • a passing score on one or more NLEs.


We will provide an overview of an example NLE, the United States Medical Licensing Examination (USMLE).


The first three components are intended to assess readiness for postgraduate training; a passing score on these components is required to sit the final component, Step 3. In the USA, medical students typically take the first three components prior to entry to postgraduate training, although US licensing jurisdictions do not typically require this for entry; international medical graduates must pass these components in order to enter US postgraduate training. Along with successful completion of a specified amount of postgraduate training, passing scores on all components are required to be eligible for an unrestricted licence to practise.




NLES WILL BECOME MORE COMMON


The number and diversity of medical schools has increased


Over the last two decades, there has been growing concern that the substantial expansion in the number of medical schools has diluted the quality of medical education. In 2003, in a speech to the World Medical Association's general assembly in Helsinki, Dr Hans Karle, then president of the World Federation of Medical Education, said that the number of medical schools had increased by 54% since 1995. Dr Karle was quoted as saying: 'The quality of education in some schools is not good enough. Some of these schools are badly needed, but many are being set up simply as businesses to attract students who cannot get into medical schools in their own countries.'5



Boulet et al.6 also reported on the continued rapid expansion of medical schools, 1900 of which are listed in the International Medical Education Directory. Private for-profit schools were particularly popular in the Caribbean region, South America and South Asia.

      • In India, during the period from 1970 to 2005, whereas the number of public medical schools grew by 36%, the number of private medical schools increased by 1120%.7



There is evidence that the diversity of undergraduate medical education is associated with regional and between-school differences in standards and learning outcomes.

      • In the UK, Wakeford et al.8 reported large differences among medical schools in pass rates on the membership examination for the Royal College of General Practitioners;

      • McManus et al.9 reported a similar pattern of large differences among UK schools on several components of the membership examination for the Royal College of Physicians.

      • Looking at variation in the performance of US schools on USMLE Step 2, Case et al.10 found sizable differences in mean scores; these coexisted with high correlations between scores on Step 2 and within-school measures of student performance, suggesting that the variation observed in Step 2 performance could be attributed to differences in intramural standards across schools. A similar pattern was seen for USMLE Step 1.11

      • In a more recent study, Holtzman et al.12 reported large regional and country-to-country variation on the Step 2 Clinical Knowledge component of the USMLE, confirming a pattern observed almost two decades earlier.13


There are always multiple potential explanations for the results of individual studies, but, taken together, the findings suggest that the diversity of curricular approaches used nationally and internationally results in diversity in learning outcomes.




The medical workforce is increasingly mobile


Competency issues can arise when doctors training and graduating in one country and culture move to practise in another environment.


In Europe, medical migration has been observed since the 1940s and has shown various patterns over the years.

    • The migration of doctors from Eastern Europe started before the accessions of their countries that resulted from the political transitions of the late 1980s and 1990s.

    • The enlargement of the European Union, initially in 2004 and subsequently in 2007, has resulted in increased mobility, especially from east to west.15


Across the globe, there are major differences in the formats and content of undergraduate medical education programmes. In Europe the length of time for undergraduate training is mandated by European directive as a minimum of 5 years and 5500 hours. In most European countries, there is no national licensing examination at the end of the period of study, and each medical school has its own graduating examination requirements. There are exceptions, such as in Switzerland, which brought in a national licensing assessment in 2004.


In the USA, the medical course, which is usually a second degree, is 4 years in duration. Similarly in Canada, students will have a primary degree or will have studied for at least 1–2 years at university level before entering medical school. Most medical courses in Canada are 4 years in duration; schools that offer 3-year programmes (McMaster and Calgary Universities) are exceptions and their programmes do not include summer breaks. By contrast with Europe, both the USA and Canada have NLEs that all medical graduates, whether they were trained domestically or internationally, are required to pass before they can work. All ASEAN countries other than Brunei Darussalam (i.e. Myanmar/Burma, Cambodia, Indonesia, Laos, Malaysia, Philippines, Singapore, Thailand and Vietnam) operate some form of NLE, and China also has national examinations.


In some countries, quality assurance programmes are used in an effort to achieve outcomes similar to those of NLEs; for example, all medical schools in the UK are accredited by the General Medical Council (GMC). The GMC publishes policy documents advising on the nature and format of assessments. In the current system, GMC-selected teams, which include medical educationalists, clinicians, students and lay representatives, visit schools every 5 years to inspect the course. Although schools that fail to meet the required standards can be closed, in practice this has never happened. If the standards are not met, a series of monitored requirements and recommendations are put in place. Despite this system, the UK is currently considering the implementation of an NLE.16




Performance on NLEs predicts performance in practice


To a degree, performance on NLEs predicts subsequent performance in practice.


Tamblyn et al.17,18 showed that scores on the Medical Council of Canada Qualifying Examination predict such practice behaviours as prescribing patterns and mammography screening rates for family medicine doctors. Tamblyn et al.19 also showed a negative association between patient complaints to medical regulatory authorities for doctors licensed in Ontario or Quebec and scores on a performance-based NLE. Wenghofer et al.20 demonstrated that NLE scores were positively related to peer assessments of the quality of care. In the USA, a few studies21,22 have shown the predictive value of certification examination scores for performance in practice; Lipner et al.23 provide a review of this literature. In turn, certifying examination scores are predicted by scores on NLEs.24–26



The case for NLEs


A number of arguments have been offered in favour of NLEs:

    • they can screen out doctors with significant deficits in important areas of practice;

    • they can provide consistent, objective evidence about competency standards across both public and private medical schools, thereby helping to protect patients;

    • they can be used to ensure the quality of migrant doctors before they commence training or practice in a new country, and

    • they can drive up standards in medical schools: failures on NLEs can be a significant catalyst for quality improvement.


The arguments against NLEs relate to concerns about causing reductions in curricular innovation and diversity in teaching methods. This has not been the experience in Canada or the USA, however, although the use of scores on NLEs by US residency programmes in the selection of postgraduate trainees is well documented and makes the stakes associated with USMLE Step 1 very high from the student's perspective.



CONTENT SPECIFICITY



The purpose of NLEs is to permit the drawing of inferences on the knowledge and skills of examinees.




Regardless of the assessment method used, performance on one case does not predict performance on other cases very well.29 This phenomenon, commonly termed 'content (or case) specificity' in the literature on medical education,30,31 has been recognised for many years.32–35 The same phenomenon, termed 'task specificity', has been observed for performance assessments in other areas, including science, mathematics and law.36–38 As a consequence, assessments must be of sufficient length and cover an adequate breadth of material in order to obtain scores that are sufficiently reproducible to support high-stakes decisions.


To quantify what 'sufficient length' means, Table 2 provides statistically projected reliability (generalisability) coefficients as a function of testing time for the computer-based components of the USMLE. Values of 0.8 or higher are desirable for examinations on which high-stakes decisions are based.
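Projections of reliability as a function of testing time are conventionally made with the Spearman–Brown prophecy formula, which estimates the reliability of a test whose length is multiplied by a factor k. A minimal sketch is below; the starting coefficient is invented for illustration and is not taken from Table 2 or from USMLE data.

```python
def spearman_brown(r, k):
    """Project reliability when test length is multiplied by factor k
    (Spearman-Brown prophecy formula)."""
    return k * r / (1 + (k - 1) * r)

# Hypothetical example: a 1-hour test form with reliability 0.60
r_one_hour = 0.60
for k in (2, 4, 8):
    projected = spearman_brown(r_one_hour, k)
    print(f"{k} hours of testing -> projected reliability {projected:.2f}")
```

Doubling a 0.60-reliability test projects to 0.75, and it takes roughly eight times the testing time to clear the 0.8 threshold mentioned above from a weak starting point, which is why high-stakes examinations are long.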


The first of these columns refers to the MCQ component of Step 3; the second refers to the computer-based case simulation (CCS) component. As will be discussed further in the next section, the latter format was developed to assess physicians' patient management skills.40 Indices of CCS reproducibility as a function of testing time are significantly lower than for the MCQ component of Step 3: for this reason, pass/fail decisions for USMLE Step 3 are based upon a composite of MCQ and CCS scores,41 rather than on each component independently.


The causes of content specificity are unclear35 and research on methods to improve the transfer of skills learned in one situation to another through instructional interventions is underway. The 'master–apprentice' approach used in much of clinical instruction may be partially responsible. Clinical rotations take place at disparate sites, both across and within schools, resulting in substantial variability in trainees' experiences. It is, perhaps, not surprising that learning outcomes may vary idiosyncratically, and content specificity may be the assessment consequence. At the same time, others have suggested that content specificity is an epiphenomenon attributable to measurement artefacts.43,44





COGNITIVE ASSESSMENTS


Bennett45,46 describes three generations of computer-based testing (CBT).


    • In the first generation, the emphasis was largely on building the infrastructure for test delivery. Although this involved substantial investments in testing centres and rapidly evolving computer hardware and software to administer examinations, assessments resembled traditional tests, differing little in design and item format from their paper-and-pencil counterparts and taking limited advantage of the technology.47 The major innovation during this period was the introduction of computer-adaptive testing (CAT) in which software selects items sequentially based on the current estimate of an examinee's proficiency.48 When used in conjunction with a large pool of items of varying difficulty, CAT makes it possible to efficiently achieve relatively consistent precision of measurement throughout a score scale. This is particularly important for diagnostic testing, which requires the accurate estimation of examinee proficiency across a broad range of distinct content areas. For licensure testing, CAT can also be useful in reducing item exposure and testing time, but, because the reproducibility of pass/fail decisions is most important, the administration of pre-constructed fixed forms targeted at the pass/fail point is often simpler and can be as effective.

    • In the second generation of CBT, incremental changes were made in item formats to include multimedia formats, short constructed responses, and other enhancements to traditional item formats made possible through computer delivery. Often, new item types were incorporated simply because they were different from traditional MCQs, had visual appeal, or were otherwise 'interesting', rather than because they targeted key competencies that could not otherwise be measured.

    • The (current) third generation of CBTs has begun to incorporate more complex, theory-based simulations and interactive performance tasks replicating important features of real environments and assessing new skills in more sophisticated ways. There is also great potential for integration of assessments with instruction, sampling performance repeatedly over time.
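The CAT item-selection idea described in the first-generation bullet above can be sketched in a few lines: under a one-parameter (Rasch) model, Fisher information p(1 − p) is maximised by the unused item whose difficulty is closest to the current proficiency estimate. This is a deliberately minimal illustration, not the selection rule of any actual NLE, and the item pool is invented.

```python
import math

def prob_correct(theta, b):
    """Rasch (1PL) probability that an examinee of proficiency theta
    answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def next_item(theta, pool):
    """Select the item maximising Fisher information p * (1 - p),
    i.e. the item whose difficulty is nearest theta under 1PL."""
    def information(b):
        p = prob_correct(theta, b)
        return p * (1 - p)
    return max(pool, key=information)

# Hypothetical pool of item difficulties on the same logit scale as theta
pool = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(next_item(0.3, pool))  # picks the difficulty closest to theta = 0.3
```

In an operational CAT, theta would be re-estimated after each response and the loop would continue until a stopping rule (precision or item count) is met; exposure control and content balancing constrain the simple maximum-information choice shown here.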


A relatively recent and interesting application of computer technology in medical education is automated item generation (AIG),54–56 although a more accurate term may be 'computer-assisted item generation'. Traditionally, MCQs for NLEs were painstakingly written and reviewed by committees of content experts. It can be challenging to recruit sufficient numbers of content experts and expensive to have them travel to a central site to review test items. With AIG, content experts first create item models or templates that highlight item elements (e.g. patient findings in the stem, distractors) to be manipulated. Software is then used to systematically manipulate these elements in each item model to generate hundreds of new (typically quite similar) items, which can then be reviewed by content experts. Initial research has produced promising results, demonstrating that content experts are unable to differentiate items developed in the traditional manner from those produced with AIG.55 In the longer term, AIG could prove to be a useful technique for producing items in large numbers, meeting a significant practical need for NLEs (particularly those given throughout the year) and for formative assessments administered repeatedly during training.
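The template-manipulation step at the heart of AIG can be illustrated with a toy item model: the expert marks which stem elements vary, and software enumerates the combinations for later expert review. Everything in this sketch (the template text, the element values) is hypothetical and much simpler than real item models.

```python
from itertools import product

# Hypothetical item model: {age} and {finding} are the manipulable elements
template = ("A {age}-year-old patient presents with {finding}. "
            "Which of the following is the most appropriate next step?")

ages = [25, 45, 70]
findings = ["chest pain", "shortness of breath"]

# Systematically vary the elements to generate candidate items
items = [template.format(age=a, finding=f)
         for a, f in product(ages, findings)]

print(len(items))  # 6 candidate stems awaiting expert review
```

Real item models also vary distractors and constrain combinations so that only clinically coherent stems are produced; the combinatorial enumeration above is what lets a single reviewed model yield hundreds of items.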



The CCS component used in USMLE Step 3 provides an example of Bennett's45 third-generation computer-based assessment.

  • Each CCS case begins with an opening scenario describing a patient's location and presentation.57

  • Using free-text entry, the examinee then orders tests, treatments and consultations while advancing the case through simulated time.

  • The system recognises over 12 000 abbreviations, brand names and other terms representing more than 2500 unique actions.

  • The patient's condition changes according to both the actions taken by the examinee and the patient's underlying clinical problem.

  • Performance is scored using a computer-automated algorithm that models the judgements that would have been produced by expert clinicians.


Over the next few years, additional computer-based simulation formats are likely to be introduced into NLEs.

  • Response formats can be enhanced to assess skills in:

    • differential diagnosis; the use of diagnostic studies and therapeutic options;

    • writing orders for admission, discharge and other transfers of care;

    • the management and reconciling of medications,

    • and other common clinical tasks.

  • The incorporation of multimedia and enhancements to response formats implies a potential natural link to work on entrustable professional activities, bridging the gap between competencies and assessment.58,59


It will be tempting to simulate everything as realistically as possible, but much has been learned from the chequered history of written simulations (patient management problems) about the dangers inherent in that approach,29,33 which often result in wasted testing time and scoring problems. For high-stakes assessments, adopting an approach similar to that used in 'key feature' problems49,64 seems warranted. To improve measurement efficiency, cases should be kept short and should focus on key clinical decisions that are critical and essential steps in problem resolution and most likely to result in errors and poor patient outcomes in the real clinical environment.



To date, NLEs have almost exclusively represented 'closed-book tests' assessing the examinee's ability to recall information, as well as to apply it to make clinical decisions. With CBT, it is now possible to permit examinees to consult online reference materials ('open-book tests') during test administration; this is another characteristic of Bennett's third generation of CBT.45 This will better mimic how doctors currently practise in the real clinical environment, and provide (indirect) assessments of examinees' abilities to identify the limits of their own knowledge, to quickly access and understand relevant external resources, and to integrate the accessed information into patient care decisions.


OBJECTIVE STRUCTURED CLINICAL EXAMINATIONS IN NLES


Harden and Gleeson65 first described the OSCE in this journal in 1979. Since then, the use of OSCEs has spread quite quickly, first in school-based assessments and, more recently, in national licensure and qualifying examinations in several countries, including Australia, Canada, South Korea, Switzerland, Taiwan, the UK and the USA.


Many OSCEs still use the 'classic' station format: examinees have 4 or 5 minutes to complete a clinical task.


Longer and more complex stations seem more appropriate for advanced trainees and for NLEs in order to assess examinees' capacity to integrate the constellation of competencies required for the provision of safe and effective patient care.66


Given that evidence for the validity of inferences from simulation-based assessments is sparse,60,63,68 we agree with several of these authors that, firstly, there has been too much focus on increasing the fidelity of simulations and, secondly, the term 'fidelity' should be abandoned in favour of 'physical resemblance' and 'functional task alignment'.


Unfortunately, it is still common for OSCE-based tests to include relatively small numbers of stations. As a consequence, scores and pass/fail outcomes on such tests are not very reproducible.


To make more effective use of testing resources, it may be helpful to consider the use of 'sequential testing', also known as 'multi-stage' or 'flexi-level' testing in the general assessment literature,69,70 with OSCEs.71,72 This approach involves the administration of a relatively short initial screening examination that is used to quickly identify examinees who will clearly pass; this group is excused from further testing, and the assessment is continued for the remaining examinees.
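The two-stage decision logic of sequential testing can be sketched as follows. This is a schematic illustration only; the cut-scores and outcomes are invented, and a real implementation would set the screening cut-off high enough that false passes are acceptably rare.

```python
def sequential_decision(screen_score, screen_cutoff, run_full_test):
    """Two-stage OSCE decision: examinees who clearly pass a short
    screening set of stations are excused; everyone else continues
    with the remaining stations, whose result decides the outcome.
    `run_full_test` is a callable returning 'pass' or 'fail'."""
    if screen_score >= screen_cutoff:
        return "pass"          # clear pass on the screen alone
    return run_full_test()     # borderline: administer more stations

# Hypothetical examples (proportion-correct scores, cut-off 0.75)
print(sequential_decision(0.85, 0.75, lambda: "fail"))  # excused early
print(sequential_decision(0.60, 0.75, lambda: "fail"))  # full test decides
```

The saving comes from the first branch: the (typically large) group of clearly competent examinees consumes only the short screen, so examiner and simulated-patient time is concentrated on borderline candidates.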



WORKPLACE-BASED ASSESSMENTS


The formative use of WBAs will increase; their summative use in NLEs will prove challenging.


In their very useful Association for Medical Education in Europe (AMEE) guide, Norcini and Burch74 describe methods for formative WBAs, including

  • the mini-clinical evaluation exercise (mini-CEX), 

  • clinical encounter cards, 

  • clinical work sampling, 

  • blinded patient encounters, 

  • direct observation of procedural skills (DOPS), 

  • case-based discussion, 

  • and multi- source feedback (MSF).



Although WBAs more closely reflect everyday practice and can stimulate trainee learning, they are not without challenges. As Norcini and Burch note in their AMEE guide,74 they are best used formatively, not summatively. Recent research has also suggested some mechanisms for improving the reliability and validity of scales used in making WBA judgements by aligning them with the constructs of developing clinical sophistication and 'entrustability'.79



Unfortunately, the summative use of WBAs can change how trainees and trainers view the assessment. Trainees may choose assessors who are seen as 'doves' and may pick out cases perceived to be less challenging or in areas in which they feel confident. Similarly, trainers may treat these assessments as 'tick-box exercises'.80 To address these issues, the UK GMC proposed using two different types of WBA.81

  • One type – referred to as supervised learning events (SLEs) – would be developmental and not used to determine progress. 

  • The second type – referred to as assessments of practice (AoPs) – would be used summatively to inform judgements about trainees’ progress. 

The use of SLEs has now been incorporated into the UK Foundation Programme.81



Although the WBA results may reflect real differences in the trainees' competence, there are alternative explanations, including variation in the standards used by faculty staff to make judgements, and differences in the patient populations (case mix) and resources available. Methods for risk adjustment for WBAs are particularly needed.82 However, there is no analytic way to adjust for these confounding factors: the 'metrics' on which WBA results are expressed are inherently tied to the clinical contexts in which they are obtained. This is not important for formative assessments in which the goal is to stimulate learning, but it makes the direct summative use of WBA results in NLEs challenging and difficult to justify.



CONCLUDING THOUGHTS



Whether or not NLEs are cost-effective is a difficult question and probably a matter of perspective.

  • Costs undoubtedly vary substantially across countries as a function of both examination design and numbers tested.85 As an example, it is straightforward to estimate the dollar cost of the USMLE from public information, at least from the examinees' perspective. An examinee graduating from a US school who passes each USMLE component on the first attempt would pay a total of roughly US$3200 in examination fees; the total fees paid by an analogous IMG are roughly US$800 more.86–88 Even with the addition of travel expenses to test sites, these costs are quite low relative to those of medical school tuition in many countries, although they are still substantial. The combining of cost information with USMLE examinee counts showed that total examination fees, across USMLE components, amounted to approximately US$120 million in 2014.86–88


We have deliberately focused this article on initial licensure, for which the skills to be assessed are relatively homogeneous. Issues for revalidation are much more complex and, as doctors' practices differentiate and evolve, typically become narrower. Norcini et al.89,90 and Melnick et al.91 provide cogent discussions of the challenges posed.


Clearly, NLEs are subject to many limitations. At best, they can only measure competence to practise, not whether a doctor does (or will) perform competently in practice. As a consequence, good performance on an NLE does not guarantee good performance in practice. However, poor performance on an NLE does suggest that performance in practice may fall below acceptable levels, although clearly other factors, notably systems of care, may compensate for individual doctors' shortcomings or exacerbate their weaknesses.


16 General Medical Council. National Licensing Examination 2014. http://www.gmc-uk.org/06_ National_Licensing_Examination.pdf_57876215.pdf. [Accessed 30 March 2015.]


43 Kreiter CD, Bergus GR. Case specificity: empirical phenomenon or measurement artefact? Teach Learn Med 2007;19:378–81.


53 Holtzman KZ, Swanson DB, Ouyang W, Hussie K, Allbee K. Use of multimedia on the Step 1 and Step 2 Clinical Knowledge components of USMLE: a controlled trial of impact on item characteristics. Acad Med 2009;84:90–3.


90 Norcini JJ, Lipner RS, Grosso LJ. Assessment in the context of licensure and certification. Teach Learn Med 2013;25:62–7.








Med Educ. 2016 Jan;50(1):101-14. doi: 10.1111/medu.12810.

Trends in national licensing examinations in medicine.


Abstract

CONTEXT:

As a contribution to this special issue commemorating the journal's 50th volume, this paper seeks to explore directions for national licensing examinations (NLEs) in medicine. Increases in the numbers of new medical schools and the mobility of doctors across national borders mean that NLEs are becoming even more important to ensuring physician competence.

OBJECTIVES:

The purpose of this paper is to explore the use of NLEs in the future in the context of global changes in medical education and health care delivery.

METHODS:

Because the literature related to NLEs is so large, we have not attempted a comprehensive review, but have focused instead on a small number of topics on which we think we have something useful to say. The paper is organised around five predicted trends for NLEs.

DISCUSSION:

The first section discusses reasons why we think the use of NLEs will increase in the coming years. The second section discusses the ongoing problem of content specificity and its implications for the design of NLEs. The third section examines the evolution of large-scale, standardised cognitive assessments in NLEs and suggests some future directions. Reflecting the fact that NLEs are, increasingly, attempting to assess more than just knowledge, the fourth section addresses the future of large-scale clinical skills assessments in NLEs, predicting both increases in their use and some shifts in the nature of the stations used. The fifth section discusses workplace-based assessments, predicting increases in their use for formative assessment and identifying some limitations in their direct application in NLEs. The concluding section discusses the cost of NLEs and indulges in some further speculations about their evolution.

© 2015 John Wiley & Sons Ltd.

[PubMed - indexed for MEDLINE]



National, European licensing examinations or none at all? (Med Teach, 2009)

C. P. M. VAN DER VLEUTEN

Maastricht University, The Netherlands




The unification of Europe in higher education in general and in medical education in particular is progressing. Since the Bologna declaration most of the European countries are in the process of introducing a two-cycle training programme in higher education that also affects medical training programmes (Patricio et al. 2008). Within the European legislature there is free mobility of professionals across the borders of Europe. To control for quality of education, initiatives are taken across Europe to accredit higher education programmes (Dittrich et al. 2004), including medical training programmes (van Zanten et al. 2008).


Part of modern Europe is also the trend of mobility of the workforce (Jinks et al. 2000). The question is: what is the voice of the patient in all this? How are the rights of the patient protected against this mobility? European country medical schools produce different quality of medical graduates (McManus et al. 2008). What happens if we look at this from a European perspective? How different is a graduate coming from a medical school in Athens as compared to one from Tampere? Or from London compared to Warsaw?



One way of dealing with this issue of quality of graduates is to follow the North-American route of licensure examinations. However, when I come to North America and explain that licensing examinations do not exist in my country, they sometimes congratulate me.


The question is whether standardized tests lead to standardized programmes. In the recent assessment literature one speaks of a paradigm shift by moving away from the 'testing culture' towards an 'assessment culture' (Gielen et al. 2003). Here we change from assessment of learning towards assessment for learning. By using the driving effect of assessment strategically, it is advocated to embed assessment as much as possible with the learning itself (Wilson & Sloane 2000). Licensure examinations by definition separate the assessment from the ongoing learning.



Finally, this issue of Medical Teacher contains a study by Peter McCrorie and Katherine Boursicot exploring the graduating assessment systems across the UK in terms of content, format, testing time and standard setting procedures (McCrorie & Boursicot 2009). They found wide variation in these systems and conclude that any comparison of graduates is really impossible. Even worse, they cite one of their own studies in which the same procedure used in different medical schools to set passing scores for the same OSCE stations resulted in such widely differing passing scores that students who would have passed in one medical school would have failed in another (Boursicot et al. 2006). They therefore conclude it is time to reassure the public by introducing national qualifying examinations.
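The Boursicot et al. finding is easier to see with numbers. Below is a minimal, hypothetical sketch of one common OSCE standard-setting procedure, the borderline-group method: a station's pass mark is the mean checklist score of the candidates whom examiners globally rate as 'borderline'. Because that global judgement is made by each school's own examiner panel, the identical station can end up with different cut scores. All names and figures here are invented, and this is not necessarily the exact procedure used in the cited study.

```python
# Illustrative sketch (invented data): borderline-group standard setting.
# Pass mark = mean checklist score of candidates rated "borderline".

def borderline_group_pass_mark(checklist_scores, global_ratings):
    """Mean checklist score of the candidates globally rated 'borderline'."""
    borderline = [score for score, rating in zip(checklist_scores, global_ratings)
                  if rating == "borderline"]
    return sum(borderline) / len(borderline)

# The same station and the same candidate checklist scores (out of 20)...
scores = [8, 10, 11, 12, 13, 15, 17, 18]

# ...but two schools' examiner panels draw the "borderline" line differently.
panel_a = ["fail", "borderline", "borderline", "pass", "pass", "pass", "pass", "pass"]
panel_b = ["fail", "fail", "borderline", "borderline", "borderline", "pass", "pass", "pass"]

cut_a = borderline_group_pass_mark(scores, panel_a)  # (10 + 11) / 2 = 10.5
cut_b = borderline_group_pass_mark(scores, panel_b)  # (11 + 12 + 13) / 3 = 12.0
print(cut_a, cut_b)
```

With these invented ratings, a candidate scoring 11/20 passes under panel A's cut score (10.5) but fails under panel B's (12.0), which is exactly the kind of divergence the study reports.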


Archer J. 2009. European licensing examinations – The only way forward. Med Teach 31:215–216.


McManus IC, Elder AT, De Champlain A, Dacre JE, Mollon J, Chis L. 2008. Graduates of different UK medical schools show substantial differences in performance on MRCP(UK) Part 1, Part 2 and PACES examinations. BMC Med 6:5.


McCrorie P, Boursicot K. 2009. Variations in medical school graduating examinations in the United Kingdom: Are clinical competence standards comparable? Med Teach 31:223–229.


Melnick D. 2009. Licensing examinations in North America: Is external audit valuable? Med Teach 31:212–214.




Med Teach. 2009 Mar;31(3):189-91.

National, European licensing examinations or none at all?



The need for national licensing examinations (Med Educ, 2007)

Lambert Schuwirth





It is interesting to see how some developments in assessment in the USA and UK seem to have gone in opposite directions. The USA has a longstanding tradition of high-quality national licensing examinations, whereas, in the UK, each medical school has traditionally been responsible for its own examination programmes. Although UK medical schools use external examiners, the country has no national licensing examinations.



The debate between the systems (a fully centralised national licensing system versus a decentralised in-school assessment system) needs to incorporate a myriad of arguments.



SINGLE-SHOT ASSESSMENT VERSUS CONTINUOUS ASSESSMENT PROGRAMMES


Yet most licensing approaches apply a 'single-shot' technique with 1, 2 or 3 tests, each of which has to be passed. A candidate who passes such tests is considered to have reached an adequate standard in terms of the subject matter tested and will not be tested again. In terms of competences, this constitutes a misalignment between assessment and education. By contrast, there are a substantial number of elements involved in becoming a medical expert that can and probably should be tested definitively at certain time-points.




LIMITED VERSUS EXTENDED MEASUREMENTS IN TIME


No matter how carefully designed the single-shot assessment is, it always represents a limited sample in time. Progress test results in the Netherlands, where 4 medical schools jointly produce and administer progress tests, show large variations between the relative positions of each medical school depending on the test moment, the year groups compared and the cohorts of students.2
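The sampling argument above can be illustrated with a toy simulation (all parameters invented, not drawn from the Dutch progress-test data): give two cohorts of near-identical true ability the 'same' test on ten occasions, each time with fresh measurement error, and the apparent ranking of the schools need not stay stable.

```python
# Toy simulation of single-shot testing as "a limited sample in time".
# All parameters are invented for illustration.
import random

random.seed(1)

# Two hypothetical schools whose cohorts barely differ in true ability.
school_x = [60 + random.gauss(0, 10) for _ in range(50)]
school_y = [61 + random.gauss(0, 10) for _ in range(50)]

def observed_mean(true_abilities, error_sd=8.0):
    """One test administration: each true score is observed with test error."""
    noisy = [ability + random.gauss(0, error_sd) for ability in true_abilities]
    return sum(noisy) / len(noisy)

# Rank the two schools on ten separate "test moments".
leaders = ["X" if observed_mean(school_x) > observed_mean(school_y) else "Y"
           for _ in range(10)]
print(leaders)  # which school appears 'ahead' can vary by administration
```

The point of the sketch is not the particular numbers but that a ranking taken at any one test moment is itself a noisy sample.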



HIGHLY STRUCTURED VERSUS FLEXIBLE ASSESSMENT APPROACHES


By necessity, all national licensing assessment approaches I know of are highly structured. Moreover, they are rather inflexible in that the same test is given to all candidates.


In-school examinations, however, need to provide additional information. Firstly, it is important to determine which educational intervention is needed to improve optimally a student's competence (we might call this a 'therapeutic' decision). The second decision involves which type of additional information or assessment is needed for an individual student to determine his or her progression to a next phase (a sort of 'diagnostic' decision).



LOW-LEVEL VERSUS HIGH-LEVEL ABILITIES



I use these terms here because they describe so succinctly what is meant, but I think the terms 'low' and 'high' are inadequate. I disagree with their implication that one is more important than the other.


National tests typically focus on 'low' level skills, whereas currently there is growing interest in higher order skills and competences, such as 'professionalism', 'scholarship', the ability to think as a 'scientist', etc. Regardless of how well designed it is, any highly structured examination (even a national OSCE) is of limited value in this respect.


CENTRALISATION VERSUS DECENTRALISATION OF EXPERTISE


A system with national licensing requires a well equipped licensing body. This is often an organisation in which many leading experts in the field are employed. The impact of such groups of experts on research and development in assessment is huge. Consider the enormous contributions of, for example, the National Board of Medical Examiners or the American Board of Internal Medicine to our knowledge and understanding of assessment. In a decentralised situation this expertise is, at best, dispersed through the country or not available at all.



I would like to suggest the notion that the results of licensing examinations are more useful for comparing curricula than for comparing the results of individual students, and that comparisons on grade point averages from in-school assessments only are useless in ensuring and controlling the quality of medical graduates in a country.







Med Educ. 2007 Nov;41(11):1022-3.

The need for national licensing examinations.




Five myths and the case against a European or national licensing examination (Med Teach, 2009)

RONALD M. HARDEN

University of Dundee, UK






The case for a European or national examination is a seductive one. What have not been discussed sufficiently, however, are the associated disadvantages or side effects.





Myth 1: A European or national examination will ensure that the candidate is assessed in important areas of medical practice



No! A centralized examination will emphasize not necessarily the most important learning outcomes but instead those which can be readily assessed at a national or European level. The focus will be on written examinations, often with multiple choice questions, where knowledge rather than clinical skills or attitudes are assessed. As Schuwirth (2007, p. 1023) argued, 'National tests typically focus on "low" level skills, whereas currently there is growing interest in higher order skills and competences, such as "professionalism", "scholarship", the ability to think as a "scientist", etc.'. With the move to competence assessment programmes, we need a combination of different assessment methods including both traditional and new forms of assessment, with the criteria used to evaluate their quality derived from both psychometrics and edumetrics (Baartman et al. 2007). This combination is unlikely to be found with a central examination.



A problem is that in the past we have emphasized, in the measurement debate, the concept of reliability. We need now to focus more on Fredericksen and Collins' (1989) concept of 'systemic validity', which they define in terms of inducing curricular and instructional changes in the education system.


A centralized examination, assessing as it does less important learning outcomes, is likely to influence and distort a teacher's teaching and the students' study patterns. Current moves to increase the emphasis in the curriculum on 'soft skills', such as the ability to generate fresh and original ideas, to work in teams, and to be empathetic, are undercut when assessment regimes largely test the recall and manipulation of facts (Cole 2007).



Myth 2: A European or national examination is likely to encourage change and lead to improvements in assessment practice


No! The evidence is that where there is a system of centralized examinations, innovation in assessment is hindered.


The NBME is probably the best example of a central team of assessment experts and it has made significant contributions internationally to advances in the practice of assessment and to our understanding of the field. Because, by necessity, its priorities have to be different, however, such a body is not well placed to work at the cutting edge of assessment practice and to lead new initiatives in the field. Peter Scoles highlighted, at a session on the NBME position on the Gateway Examination in the USA, that 6–10 years was required to change the process for the USMLE. It was 30 years after the objective structured clinical examination (OSCE) was first introduced in the final examination at the medical school in Dundee (Harden et al. 1975), and many years after it was introduced by a number of other schools, that it was adopted as an approach to be used in a centrally organized examination.



There is a fundamental concern about a central examination and its impact on progress with assessment practice. A central examination by its very nature perpetuates an outmoded view of assessment and encourages the misconception that students are best evaluated at one point in time and that assessment is something separate from the curriculum and the teaching programme. As Schuwirth (2007, p. 1022) put it, 'in terms of competences, this (single-shot assessment) constitutes a misalignment between assessment and education'.



Looking to the future, Schuwirth and Van der Vleuten (2006, p. 18) suggest that 'Assessment will be less viewed as an external measurement of the results of the educational process but more as an integral part of the process'. They argue that 'Assessment is no longer seen exclusively as a psychometric measurement problem, but more as an educational design problem. This implies that the purpose of assessment is not merely to determine whether a candidate is up to standard, but more how the information about the candidate's competence can best be used to tailor the teaching or the courses to individual needs'. This is most likely to be achieved when assessment is integrated into the curriculum and is not a single-shot, end-of-course national or international test.




Myth 3: In an age of standards and globalization, a European or national examination will meet our needs by encouraging uniformity


No! Undoubtedly there are core competencies that are likely to be common to all schools and where a common assessment might be appropriate, but there are also important and significant differences.



Indeed in the UK, the General Medical Council has included 'promoting equality and diversity' among the aims of its Quality Assurance Programme. The problem, as defined by Carney (2003), is that the use of tests promoted by international donor nations has resulted in a global emphasis on cognitive achievement without a consideration of what local values might define as good education.



Myth 4: A national examination is of value as an indicator that helps to track the performance of the system


Smith (1995) identified a 'huge number of instances of unintended behavioural consequences of the publication of performance data' and described problems such as 'tunnel vision, sub-optimization, myopia, measure fixation, gaming, ossification, misinterpretation and misrepresentation'.


Fitz-Gibbon and Tymms (2002) argue that an indicator may not reflect the complexity of the situation and that the system will be undermined as gaming takes hold, with the efforts of the institution focussed on the indicator (student performance in the national examination, for example) at the expense of other issues.


The study of Hecker and Violato (2007, p. 112) demonstrates the problems that can arise when performance in a national examination is used as an indicator for curriculum evaluation. Based on USMLE scores, they made the sweeping conclusion that 'changing curricula in medical education reform is not likely to have much impact on improvement in student achievement'. They did recognize, however, that in their study the dependent variables were licensing exam scores, which may not fully assess important physician characteristics such as empathy, problem solving, clinical reasoning and so on, that are learning objectives of some medical schools.


This problem with standardized tests was highlighted in 'Engines for Education' (Engines for Education 2008): 'The problem with standardized tests and the fixed curricula they engender is their tendency to kill off the kind of education that matters most. But who can blame a teacher or school for orienting the lesson towards helping students pass those tests with high marks? The temptation to teach students to do well on standardized tests is almost unavoidable when performance on such tests is how entire school systems are evaluated.'




Myth 5: A European or national examination will lead to safer medical practice and protect the public from substandard practitioners



No! There is no evidence that the absence of a centralized examination leads to the graduation of substandard doctors (Noble 2008), nor that the introduction of such an examination will lead to an improvement in the practice of medicine. There is no evidence that the graduate of a UK school, where there is no national examination, is in some way inferior in terms of their competence to practise medicine than the graduate of an American school who has passed the relevant national examinations.


Papadakis et al. (2005) found that disciplinary action taken against practising doctors by medical boards in the USA was more strongly associated with records of unprofessional behaviour in medical school than with their performance as measured by a national examination.



Conclusions


Given the lack of evidence as to the overall benefits, one has to ask whether the significant resources required to implement a European or national examination might be used in other ways to improve assessment practice. This can be considered as a continuum with stages:



(1) Assessment procedures in the medical school are completely independent with no external standards


(2) The school’s assessment procedures are subject to external inspection and approval in line with specified standards (such as those offered by the WFME).


 (3) Examinations in a school are monitored by external examiners including student pass/fail decisions (the current practice in the UK, with the process quality assured by the General Medical Council). 


(4) Assessment instruments and some questions are shared between schools and common questions are introduced in examinations.


(5) Students are required to take and pass both an internal school examination and an external European or national examination.


(6) Only a single European or national examination is taken by the student.




Noble ISG. 2008. Are national qualifying examinations a fair way to rank medical students? No. BMJ 337:a1279.



Med Teach. 2009 Mar;31(3):217-20.

Five myths and the case against a European or national licensing examination.


Abstract

The introduction of a European licensing examination or national examinations, where these do not already exist, offers significant advantages. These are more than offset, however, by the disadvantages and the collateral damage incurred. Five myths about centralizing examinations are explored. Myth 1: The claim that a central examination will ensure that candidates are assessed in important areas of medical practice is unfounded. What tends to be assessed are learning outcomes that can be easily assessed. These are often not the important outcomes related to the overall competence of a doctor. Myth 2: It is claimed that a central examination will lead to improvements in assessment practice. The evidence is that this is not the case and that, in fact, a central examination stifles change and inhibits innovation. Myth 3: A central examination, it is suggested, will meet a need for greater uniformity. There is also an important need to recognize diversity. Myth 4: Central examinations are seen as an indicator that will track the performance of the system. The limitations of the data, however, are usually not recognized and there may be unfortunate and unintended consequences if the results are used in this way. Myth 5: Finally, a major argument proposed for a European or national examination is that it will lead to safer medical practice and that this will protect the patient from substandard practitioners. There is, in fact, no evidence to support this argument. There is a need for further work and new initiatives on standards and quality improvement in assessment approaches. This can be achieved in a number of ways including monitoring the assessment process and sharing tools and assessment approaches between schools.


National licensing examinations, not without dilemmas (Acad Med, 2015)

Lambert Schuwirth







National licensing examinations can play an important role in reassuring the public about the quality of medical graduates, but only to a certain extent. At the risk, then, of kicking in the proverbial open door, I want to highlight some of the dilemmas I think national licensing is facing.



Dilemma 1: the purpose of national licensing


In essence, the purpose of national licensing is to reassure the public that licensed doctors are safe, independent practitioners.


As long as there is a public perception that standardised testing with NLEs is the best way to ensure safe independent practitioners, all is ok, but when the credibility of NLEs as sufficient reassurance fades, a dilemma will occur. Then, the dilemma national licensing agencies face is whether their mission is to produce NLEs or whether it is to reassure the public of the quality of medical education and the doctors.



Surveying the current assessment literature in medical education, it does seem to show a movement away from solely relying on highly structured and standardised testing of knowledge and skills (cf. 2–5). Instead, the focus seems to shift towards what are often called, with a misnomer, 'softer' abilities; abilities like professionalism, communication, reflection, etc. There appears to be a widening gap between assessment developments in medical education and those in national licensing examinations.





Dilemma 2: the quality of collecting and measuring data or quality of interpretation


A Dutch saying (meten is weten) suggests that 'to measure' means 'to know', implying that assessment has to be a process that leads to an objective outcome. Although I do not want to invalidate the usefulness of quantitative information to inform a decision, that information will have to be subjectively interpreted before it can start making sense;6,7 even the most quantitative research paper in medical education still contains more words than numbers, and it contains at least an introduction, methods section and discussion to make sense of the numerical data. In assessment, the notion of objectivity or the value of purely numerical data without subjective interpretations is increasingly being questioned.8,9


These developments seem to create another dilemma for national licensing, namely whether to invest in improving the quality of the data collection methods (like computer adaptive testing) or in improving the quality of sense-making of the data.


Dilemma 3: assessment of learning or assessment for learning


I think that this third dilemma boils down to whether you want to use assessment in a behaviourist or in a constructivist way; whether you seek to induce study behaviour or learning. There is a fundamental misalignment between constructivist views on learning and behaviourist approaches to testing. The problem with a behaviourist approach – and these are not new insights – is that when you take away the reinforcement, the desired behaviour generally fades, so you have to keep on reinforcing (cf. 10).


At this point it is probably good to discuss the common misunderstanding that assessment for learning would be the same as formative assessment with no stakes, and assessment of learning would be the same as summative assessment with high stakes. Assessment is always an evaluative activity and therefore requires subjective human judgements to give meaning to the results, even with standardised testing.


따라서 '학습을 위한 평가'라고 해서 부담이 없다는 의미가 아니다. 실제로, 그것이 제대로 작동하기 위해서는 (어떤 형태로든) 부담이 있어야 한다. 전통적으로 형성평가가 선택사항이고 학생들이 피드백으로 무엇을 하든 상관없다고 보았던 것과 달리, 학습을 위한 평가는 선택사항이 아니며 학생들은 어떻게 그 모든 평가 피드백을 실제 학습 활동에 활용하고 포함시켰는지 보여야 한다. 

So, assessment for learning does not mean that the stakes are low. Actually, in order for it to really work the stakes (in whatever form) have to be considerable. Whereas in the traditional notion of formative assessment the process is completely optional and students can do whatever they want with the feedback, in assessment for learning it is not optional and students will have to demonstrate how they have incorporated all the assessment feedback into their learning activities.5,12



 

딜레마 4: 단일 평가도구 혹은 평가 프로그램

Dilemma 4: single instruments or programmatic assessment


평가 프로그램(programmatic assessment)은 단순히 다양한 평가도구를 사용하는 것이 아니다. 평가 프로그램의 필수적 특징은 한 학생의 역량과 발달에 관한 판단이 언제나 가설-주도적(hypothesis-driven)이며, 다양한 평가방법에서 얻은 정보를 통합하여 개별화된다는 점이다. 이는 임상에서의 진단과 마찬가지이다. 의사가 다양한 진단도구를 활용하여 환자를 진단하듯, 평가 프로그램의 평가자도 다양한 평가방법의 다양한 요소를 활용하여 'dyscompetence'를 정확히 진단해내야 한다.

Programmatic assessment is more than merely using multiple instruments; an essential feature of programmatic assessment is that decisions about a student’s competence and progress are always based on a hypothesis-driven and individualised combination of information from a variety of assessment methods. It is like clinical diagnostics: much like a clinician uses various diagnostics to diagnose ill health an assessor in programmatic assessment will use various elements of various assessment methods to diagnose accurately ‘dyscompetence’.


 

평가 프로그램의 목적은 평가가 학생의 역량에 대한 판단을 내리기 위해서만 하는 것이 아니라, 각 학생에게 최적화된 그 다음 학습활동을 결정해주기 위한 것이다.

In programmatic assessment the purpose of the assessment is not only to come to a judgement of a student’s competence and progress but also to decide about the optimal next learning activities for each student.



로지스틱스 문제를 극복할 수만 있다면, 면허시험에 대한 프로그램적 접근과 '학습을 위한 평가' 접근은 검토할 가치가 있는 대안이다. 따라서 면허기구의 딜레마는 단일 평가도구에 머무를 것인가, 아니면 프로그램적 평가로 옮겨갈 것인가이다.

If the logistical issues could be overcome programmatic approaches and assessment for learning approaches to licensing examinations would be alternatives worth exploring. So, the dilemma for licensing bodies will be whether to stick to single-instrument assessment or to move to programmatic approaches to assessment.


의료에 비유해보자면, 국가 질병 스크리닝 프로그램에 반대하는 것은 아니나, 진료에 있어서 그 스크리닝 프로그램의 위치를 면밀히 살펴보고, 문화적 차이를 고려해야 한다.

To use an analogy with health care: I would not argue against national disease screening programmes, but I would argue for careful consideration of their place in health care and to carefully consider cultural differences.





Med Educ. 2016 Jan;50(1):15-7. doi: 10.1111/medu.12891. PMID: 26695461.

National licensing examinations, not without dilemmas.

Author affiliation: Adelaide, Australia.


의사국가시험이 필요한가, 아니면 필요하지 않은가? 그것이 문제로다 (Med Educ, 2016)

National licensing exam or no national licensing exam? That is the question

Brian Jolly






이 commentary에서 의사국가시험(NLE)가 최근의 몇 가지 걱정스러운 문제들을 해결하는 최선의 방식이라는 NLE 지지자들에게 이의를 제기하고자 한다.

In this commentary I wish to challenge the case that proponents have used to support the idea that national licensing examinations (NLEs) are the best way to tackle a number of recent troubling findings.


첫째, 모든 평가는 규제를 위하여 부과되는 것이 아니라 교육과정 전략의 한 부분이어야 한다는 점, 그리고 어떤 평가든 새로 부과되면 기존의 교육과정 설계 프로세스에 주요한 영향을 줄 것이라는 점이다. 둘째, 문제의 근거로 인용되는 것들은 그다지 명백하지 않다. 셋째, 그렇기 때문에 NLE 중심의 해결책이 적합한지 생각해보기 전에 여러 의과대학 간 벤치마킹 작업이 선행되어야 한다.

First, that any assessment has to be part of a curricular strategy rather than a regulatory imposition, and that to impose any assessment will have a major impact on the existing curriculum design process. Second, that the evidence that is quoted as troubling is far from clear-cut. And third, because of this, more work needs to be done on benchmarking across different schools before we could even begin to think that an NLE-focused solution would be adequate.


우리는 '평가가 학습을 이끈다'는 것을 알고 있다. 이 문구는 의학교육의 예배문(liturgy)에서 매우 두드러진 것이었다. 이는 누구의 능력으로도, 또 일부 사람들의 동기로도 숙달(master)하기 벅찬 학문을 배우려 노력하는 데서 오는 불가피한 결과로 널리 받아들여지고 있다.

We know that ‘assessment drives learning’. It is widely appreciated as the inevitable consequence of trying to learn a discipline that defeats anyone’s capacity, and some people’s motivation, to master it.


현실에서는 '평가는 무엇이 학습되는지를 결정한다'라고 보는 것이 보다 정확할 것이다. 평가가 실제의 교수-학습 프로세스를 이끌 수도 있지만, 대부분은 그러하지 않다. 프로그램의 목표와 관련하여, 평가는 의과대학에 요구되는 중요한 성과, 임상 환경에서의 상호작용, 학업과 공감적이고 공정한 의료 제공에 대한 헌신, 지속적인 전문성 개발을 위한 윤리적·사회적 책임의 정의를 설정하는 것이 아니라 그것에 봉사하는 역할을 해야 한다.

In reality it might be more precise to say that assessment drives what is learned. Although assessment can drive the actual learning and teaching process, it usually does not. In relation to the goals of the programme, assessment is supposed to serve rather than establish the definition of the overarching outcomes required of medical schools, the interactions in the clinical environment, the dedication to study and to delivering compassionate and equitable health care, and the ethical and social responsibility for continual professional development.


평가가 교육과정 전략의 한 부분으로 들어가지 않으면, 그것은 종합적이고 충분한 해결책이 아니다.


'좋은 의사'에 대한 전 세계적 요구에 대응하는 한 방법으로 NLE를 개발하자는 믿음은 불가피한 것으로 굳어져 가는 듯하다. Swanson과 Roberts가 수행한 NLE 트렌드에 대한 리뷰에서도 NLE가 '좋은 것'이라는 가정이 상당히 확고하게 깔려 있다.

The belief that one way to deal with the global need for ‘good doctors’ is to develop national licensing examinations (NLEs) also seems to be crystallising into the realm of being inevitable.3 In Swanson and Roberts’ stimulating and scholarly review of the trends in NLEs there seems to be a fairly firmly held assumption that NLEs are ‘a good thing’.


저자들은 의과대학 간 차이를 보여주는 여러 흥미로운 연구들을 인용하며, 여러 의과대학 졸업생들 사이에 vocational training examinations 혹은 기존의 NLE에서 성취한 정도의 차이가 다양함을 보여주었다.

The authors quote a number of very intriguing studies that show variation between medical schools in the degree of success that their graduates have in vocational training examinations or existing NLEs.4–6


그러나 이들 중 적어도 두 연구는(그 차이가 가장 크게 드러난 연구들은) 의과대학을 졸업한 시점과 시험을 치른 시점 사이에 상당한 기간을 두고 있다. 적어도 1년의 preregistration 시기가 있었고, 그 후 1년 혹은 그 이상이 지난 후에 vocational training에 들어갔다. 내가 근무했던 모든 의과대학의 최종시험 결과는 (역량지향 평가에서 흔히 기대할 수 있듯) 상당히 오른쪽으로 치우쳐(skew) 있었으며, 이는 의과대학을 졸업할 때 학생들이 (적어도 한 의과대학 내에서는) 상당히 균질한 집단임을 시사한다.

However, in at least two of these (the ones with the biggest differences) there is a considerable period between leaving the medical school and taking the examination. There is at least 1 year of preregistration activity, and potentially one or more years after that, before entering vocational training. Every medical school in which I’ve worked has final year assessment data that are always strongly skewed to the right, as would be expected in a competence-oriented assessment, suggesting that the students when they leave medical school are a pretty homogenous group (at least within the school).


따라서 비용이 많이 들고, 대규모이며, 속박하고(constraining) 균질화시키는(homogenising) NLE가 이 시점에 필요하다고 가정하기 전에, 이들 연구에서 가장 잘 하는 학교와 가장 못 하는 학교 간의 더 효율적인 벤치마킹, 그리고 기준(standards)에 대한 집중이 먼저 필요할 것이다.

So maybe we should be looking at more efficient benchmarking between the best and worst performing schools in these studies, and a concentration on standards, before we assume that a costly, large and potentially constraining and homogenising examination is needed at that point.


또한 우리는 평가를 변화를 촉진하는 도구라는 관점으로 바라보아야 하며, NLE의 도입은 이러한 관점을 왜곡한다. 다수의 교육과정 이론가들은 평가를 교육과정 설계의 핵심 요소로 본다. 그러나 여기에 NLE를 덧대어 놓으면 그 교육과정의 다른 필수적 요소에 어떤 영향을 미칠 것이며, 누가 그것을 제공할 것이며, 어떤 linkage와 alignment가 요구되는가?

Additionally, we should be putting assessment in perspective as a tool to promote change, which the imposition of NLEs tends to distort. Numerous curriculum theorists have identified that assessment is best viewed as a vital element of curriculum design. But if we have a superimposed NLE, what impact does this have on the other vital features of local curricula, and who is providing those, and the linkages and alignment that they demand?


수년 전, 전공과목 학회들은 향후 그 학회 구성원이 될 의사들에게 지원자들이 합격해야 하는 시험을 제공하는 것 외에는 주는 것이 없었다. 그러나 이것이 전문성 개발에 있어서 궁색한 접근법이라는 것은 널리 인식되고 있으며, 대부분의 학회들은 이제 CanMEDS와 같은 프레임워크를 가지고 있으며, 이 프레임워크는 National exit exam의 개념이 처음 도입될 당시의 그 어떤 교육과정보다 더 넓은 범위를 아우른다.

Years ago, specialty colleges had very little to offer a prospective member other than a series of assessments that the applicant had to pass. It was universally recognised that this was a rather impoverished approach to specialty development, and almost all colleges now have frameworks (e.g. CanMEDS10) that are much more encompassing than any curricula were when the national exit examination was first conceptualised.


이 프레임워크는 더 넓은 목적과 지향점, 교육훈련 전략과 지원체계를 가지고 있으며, 이와 더불어 더 확대된, 재개발된, 그 질이 보장된 평가법을 포함하고 있다. 더 나아가 work-based assessment와 같은 현재 사용되는 일부 평가법이 의사들을 위한 더 넓은 목표에 대한 평가를 할 수 있지만, 그러한 평가가 표준화된 전 국가적 차원의 형태로 수행가능하지 않다. 따라서 NLE를 의과대학에 그저 툭 던져놓는 것은 퇴행적 방식이다.

These frameworks include broader aims and objectives, training strategies and support mechanisms, alongside expanded, redeveloped and quality-assured assessments. Furthermore, even though some of the current assessment strategies, such as work-based assessment, are capable of tackling the appraisal of those wider goals for practitioners, such assessments would not be deliverable in a standardised national form. So plonking NLEs down on medical schools must be seen as a retrograde step.


이 논문의 저자들은 'NLE에서의 더 나은 수행이 더 나은 환자 진료와 연관된다는 근거는 매우 설득력이 있으며, NLE를 더 광범위하게 활용하는 것을 정당화해준다'고 확신하는 듯하다. 그러나 적어도 대학의 오랜 역사를 지닌 대부분의 선진국에서는, NLE가 현재 이루어지는 것 이상으로 보건의료를 실질적으로 개선할 것인지에 대한 상당한 의심이 있다.

The authors of the paper seem convinced that ‘The evidence that better performance in NLEs is associated with better patient care seems compelling and, we think, aids in justifying more widespread use of NLEs’.3 However, there is some considerable doubt, at least in most developed countries with a long history of established universities, that NLEs will improve health care substantially over and above what is currently done.


Archer 등의 최근 체계적 문헌고찰(systematic review)은 '일부 연구자들은 면허시험이 환자안전을 강화하고 의료의 질을 향상시킨다는 근거를 제시한다고 주장하나, 이러한 주장의 근거는 수행 간 상관관계에 기반한 것으로서 NLE와 환자 결과(outcome)의 개선 사이의 직접적 연결을 입증하는 데 실패하고 있다'고 결론지었다.

A recent systematic review by Archer et al.11 concludes ‘Some authors claim to provide evidence that licensing examinations ensure greater patient safety and improved quality of care .... The evidence for these claims however is based on correlations of performance that fail to establish a direct link between national licensing examinations and improvements in patient outcomes’.


의사에 대한 불만은 점차 늘어나고 있는데, 과연 그 불만이 의과대학에서 준비를 덜 시킨 것 때문일까? 30세 이하의 의사, 새롭게 면허를 딴 의사, 그리고 여성 의사는 더 나이가 많고 전문의를 딴 의사보다 컴플레인을 받을 가능성이 낮다.

Complaints against doctors are rising,12 but are these being caused by inadequate medical school preparation? Doctors under 30, the newly qualified and women have a much lower probability of being complained about than older or specialised doctors.12


더 나아가서, 호주에서 3%의 의료인력이 49%의 컴플레인을 받고 있고, 1%가 1/4의 컴플레인을 받고 있다.

Furthermore, in Australia, 3% of the medical workforce accounts for 49% of complaints, and 1% accounts for a quarter of complaints.13


이는 의과대학간 variability로 인해서 NLE가 중요하다는 주장이 타당하지 않음을 보여준다. 의과대학에서의 컴플레인에 대해서 연구한 것은 거의 드물고, 특정 학교에서 poor한 의사가 집단으로 나온다는 것도 그럴듯하지 않아 보인다. 그러나 고소를 당한 의사에 대한 한 연구에서는 특정 10년의 기간동안 고소를 당한 의사를 많이 배출한 3개의 미국 의과대학은 그 다음 10년에도 비슷한 비율로 그러한 의사를 배출했다는 결과가 있다.

This would seem to make any argument that suggests NLEs are important because of variability across medical schools a lame one. It is rare to find studies of complaints analysed by medical school of origin, and it seems unlikely that there would be clusters of poor doctors emanating from particular schools. However, one study looking at doctors who were sued in three US states found that medical schools that passed more of these doctors in one decade, also passed a similar proportion in the next decade.14

 


또한 우리는 법적 소송의 여부는 지식보다는 'soft skill'에 달려있음을 잘 안다. 따라서, 심지어 OSCE style에서조차 거의 지식-기반의 평가인 NLE는 그 대안이 될 수 있을까?

We also know that litigation is much more related to ‘soft skills’ than knowledge,15 so where are largely knowledge-based, or even OSCE-style, NLEs going to provide that scope?


따라서 NLE의 지위를 보다 잘 이해하려면 NLE의 효과를 (인턴으로 이어지는) 현재의 의과대학-기반 평가 프로세스와 비교할 필요가 있다.

So, to be in a better position to explore NLEs, we should be able to compare the impact of NLEs versus the impact of the current school-based assessment process paired with ensuing internships.


이러한 비판은 NLE의 잠재적 대안 또한 다루어야 할 것이다. 예컨대, (현재 여러 나라에서 '가볍게(light touch)' 다루는 것으로 보이는) 의과대학 내 평가과정에 대한 강화된, 혹은 '지속가능한' 인증이 더 효과적이면서 비용은 덜 들까? 단지 우리가 환자를 실망시킬까 두려워한다는 것, 그리고 우리가 지도를 가지고 있다는 것이, 이전에 다니던 경로 그대로 항로를 정해야 한다는 의미는 아니다. 적어도 아직은 아니다.

Such a critique would also address the potential alternatives to NLEs.16 For example, would beefed up or ‘sustainable’17 accreditation of the assessment process utilised within medical schools (which currently seems to have become more ‘light touch’ in many jurisdictions) be both more effective and less expensive? Just because we are scared of failing our patients, and we have a map, doesn’t mean we have to charter a course along previously travelled routes, at least not quite yet.










Med Educ. 2016 Jan;50(1):12-4. doi: 10.1111/medu.12941. PMID: 26695460.

National licensing exam or no national licensing exam? That is the question.

Author affiliation: Newcastle, New South Wales, Australia.


평가의 영향력에 대한 학생들의 인식: 유사한 교육과정을 가진 두 학교 비교 (Int J Med Educ, 2011)

Students’ perceptions of the impact of assessment on approaches to learning: a comparison between two medical schools with similar curricula

Hanan M. Al Kadri1, Mohamed S. Al‐Moamary1, Mohi E. Magzoub1, Chris Roberts2, Cees P.M. van der Vleuten3



Crooks 등은 교육과정에서 의도한 목표와 평가과정을 통해서 정의되는 목표 사이의 불일치를 경고했다. 이 두 종류의 목표를 일치시키는 것을 'constructive alignment'라고 한다.

Crooks et al.10 warned against the possible incongruence between academic objectives as intended by the curriculum and the objectives defined through the assessment process. Synchronization between these two types of objectives is called constructive alignment.


constructive alignment가 달성되었을 때, 학습이 쉽게 일어나게 된다.

When constructive alignment is achieved, it is assumed to be conducive to learning.


교육과정의 alignment가 제대로 되지 않았을 때 일어날 수 있는 결과 중 하나는, 학생들이 시험을 위해서 알아야 한다고 생각하는 것과, 교과목에서 목표로 기술된, 의도한 목표 사이에 반복적인 불일치가 생기면서 '잠재 교육과정'이라고 불리는 국소적 문화를 형성하는 것이다.

Therefore, one of the consequences of curriculum misalignment is that repeated discrepancies between what students perceive that they need to know for assessment purposes and the stated course objectives can potentially lead to a local culture, whereby a hidden curriculum13 is created.


피드백과 형성평가가 학생의 성취에 강력한 힘을 발휘한다는 결과를 얻었다.

Researchers have concluded that feedback and formative assessment produce the most powerful effect on student achievement.16, 17


교실의 규모를 줄이거나 교사의 내용 지식을 늘리는 것보다도 형성평가가 학생의 성취 향상에 더 큰 역할을 하는 것으로 보인다. 반면, 총괄평가는 학생들의 성취에 대한 근거를 끌어내고, 학생 간 능력 차이를 보여주는 역할을 한다.

Formative assessment appears to play a larger role in increasing student achievement than does a reduction in class size or an increase in teachers' content knowledge.18 On the other hand, summative assessment is a proven way to elicit evidence of students' achievement and to demonstrate differences in ability between students.


학생의 학습에 대한 접근에 있어서 개인적 영향에 대한 연구가 부족하다.

The third area of research where there is a lack of clarity is the effect of personal influences on students’ approaches to learning.


임상실습기간 중 Sydney Medical School (SMS)에서는 임상 블록마다 요구하는 것이 다르다.

The assessment process during the clinical years in SMS was characterized by different requirements for each clinical block.


문화는 학생이 배우고 공유하고 전달하는 신념, 행동, 태도, 실천을 모두 아우르는 것이며, 학생의 '문화적 정체성'은 문화/성별/사회/경제/종교/정치적 특성의 복합적 결과물이다.

Furthermore, culture is a personal factor that encompasses students’ beliefs, behaviors, attitudes, and practices that are learned, shared and passed on.23 Students’ sense of a “cultural identity”24 is derived from a complex mixture of cultural, gender, social, economic, religious, and political affiliations.


King Saud Bin Abdulaziz University for Health Sciences, College of Medicine (KSAU-HS, COM)의 평가는 블럭단위이며 블럭마다 비슷하다.

The assessment program for the third and fourth year of the curriculum at KSAU-HS, COM was block-based and was similar from block to block.



연구방법

Methods


Study Setting




Study Design


A qualitative approach using thematic analysis25 was used to generate a rich understanding of the full range of opinions and experiences of students when they are exposed to the implemented assessment. Our assumption was that students of different cultural background were influenced in their approach to learning by different personal and contextual factors. In interpreting our data we used a theoretical framework based on the work of Biggs11 and Ramsden2, 12 describing the interactive relationship among student factors, teaching context, the on-going approaches to a particular task and student learning outcomes.


Study Population


The study participants were students who were in the last two years of the curriculum. This convenience sampling was undertaken to gain students’ common experiences and perceptions of the various methods of assessment imple- mented during this phase of the curriculum.


Twenty eight students and all supervisors agreed to participate in the semi-structured individual interviews.


Data Collection


Semi-structured individual interviews and open-ended questions were conducted with students and supervisors.


Each interview lasted from 30-45 minutes. Interviews were recorded on audiotape and transcribed verbatim.



Analysis


Interview data was examined in-depth aiming to obtain the emerging themes. Initial coding revealed a number of basic themes that were arranged to form organizing themes. Subsequently, organizing themes were iteratively discussed between authors and were renegotiated when differences existed. After further analysis, the organizing themes were condensed into the three global themes discussed in this paper.25




  • Students Personal Perceptions of Assessment Function
    • Summative Assessment
    • Formative Assessment
  • Students Perception of Learning Outcomes
  • Student Perception of Authentic Assessment in the Workplace




두 그룹의 문화적 차이는 학습과 학습방법에 대한 인식에 영향을 주는 것으로 보인다.

The cultural differences between the two studied groups appear to have influenced their perception of learning and their approach to their studying.23


이는 서로 다른 문화권에 기인하는 교육적 경험과 교육 시스템의 차이를 반영한다고 볼 수도 있다. 예를 들어, 아시아 학생들은 독립적 학습과 교수자의 관리와 지도가 적은 교육환경에 적응하는 것을 어려워한다.

This partly reflects prior educational experience and the prevailing educational systems in differing cultures. For example, in one study, Asian students tend to have difficulty adjusting to an educational environment that is characterized by independent learning and less instructor supervision and guidance.30 


본 연구에서 두 그룹 모두 총괄평가가 '열심히 공부'하게 하는 동기로서 중요하다는 것에 동의했다. 그러나 SMS 학생과 달리 KSAU-HS 학생은 총괄평가를 더 스트레스가 크고 불안을 유발하는 경험으로 인식했으며, 이로 인해 산발적이고 표면적인 독해(reading)를 하게 된다고 응답했다. 따라서 총괄평가시스템은 학생의 학습에 긍정적 측면과 부정적 측면을 모두 가질 수 있으며, 이는 학생들의 인식에 따라 달라진다. 총괄평가를 준비하고 치르는 과정에서 KSAU-HS 학생들이 보인 행동은 '시험 불안(test anxiety)'으로 알려진 것으로서, 약 10%의 학생이 시험 불안을 겪으며, 이는 수행능력과 정서적 안녕에 좋지 않은 영향을 줄 수 있다. 이러한 문제는 KSAU-HS 학생들에게만 나타나는 것이 아니며 미국, 호주, 중국, 영국, 독일, 인도, 이탈리아, 네덜란드, 파키스탄, 터키 등에서도 보고되었다.

In this study, both studied groups acknowledged the importance of summative assessment as a motivator for hard work. However, in contrast to SMS students, the KSAU-HS, COM students perceived summative assessment as a stressful and anxiety-provoking experience that led them to engage in sporadic and superficial reading. Therefore, the summative system may have both positive and negative influences on students’ learning. These influences depend upon students’ perceptions. The behavioral symptoms displayed by the KSAU-HS, COM students during preparation for and completion of summative exams might be what is known as “test anxiety”. Based upon the literature, about 10% of students suffer from test anxiety, which compromises their performance and emotional well-being.32, 33 This problem is not specific to KSAU-HS, COM students. Severe test anxieties have been reported for medical trainees in many countries with different social and financial backgrounds, including the United States, Australia, China, England, Germany, India, Italy, the Netherlands, Pakistan and Turkey.34


시험 불안의 주요 이유는 성적이었다.

It was found that the primary source of test anxiety was exam grades.35 


따라서 사우디 대학의 총괄평가의 특성은 학생들에게 인지적 측면과 정서적 측면에 부정적 영향을 미치며, 예컨대 집중력을 떨어뜨리고, 집중을 저해하고, 암기한 지식의 인출에 어려움을 겪게 한다. 이러한 문제는 결국 학습 습관에 안좋은 영향을 주는데, 더 피상적인 학습만 하게 한다.

Therefore, the summative characteristics of the Saudi college assessment program may have led students to experience adverse cognitive and emotional effects, including impaired attention, problems with focusing and difficulties with the retrieval of stored knowledge.36 These problems, in turn, may have adversely affected their study habits,37 leading to a more superficial approach to learning.


SMS 학생들은 KSAU-HS 학생들에 비해 총괄평가를 덜, 형성평가를 더 자주 경험했다. 총괄-형성 평가 사이의 균형이 총괄평가에 대한 불안을 덜어주는 것으로 보인다. 더 나아가 SMS 학생들이 이러한 불안에 대처하는 데 문화적 요인이 영향을 주었을 가능성, 그리고 그것이 평가 절차에 대한 인식에 미치는 영향도 배제할 수 없다. SMS의 평가 프로그램은 딱 적절한 정도의 스트레스를 유발하여 집중력을 높이고 기억력을 향상시켜 결과적으로 더 나은 평가결과를 가져온 것으로 보인다. 우리의 자료를 보면 학생들은 총괄평가에서 받는 스트레스를 어느 정도까지는 긍정적으로 인식하며, 그 정도를 넘어서면 부정적 영향을 받는다. 따라서 총괄평가에 지나치게 초점을 두어 생기는 부정적 스트레스는 적절한 빈도의 형성적 피드백으로 균형을 맞추어야 한다.

The SMS students were subjected to less frequent summative assessment but to more formative assessment compared to KSAU-HS, COM students. It appeared that the balance between summative and formative assessment in the SMS program reduced the negative anxiety effects of summative assessment. Moreover, the possible influence of cultural factors in enabling SMS students to cope with this anxiety and its subsequent effect on their perception of the assessment process cannot be ruled out. The assessment program in SMS may have created just the right degree of stress, leading to focused attention, improved memory38 and better overall results. Our data suggest that students perceive stress from summative exams as beneficial to a certain degree, beyond which it has negative effects. Therefore, negative stress resulting from an over focus on summative assessment should be balanced with adequate frequency of formative feedback.


학습목표, 평가, 교육, 학습활동에 대한 교육과정 alignment가 학생들이 더 넓게 성취하고, 교육적 핵심 개념에 더 직접적으로 노출되도록 해준다. 그러나 많은 학생들은 자기 자신의 목표를 따로 가지게 되는데, 이것이 바로 잠재 교육과정이다. 이러한 현상은 SMS학생들 사이에 흔했는데, 이는 구조화가 덜 되어있고, 평가가 형성평가적 특징이 강하기 때문인 것으로 보인다. KSAU는 총괄평가가 많고 매우 구조화되어있어서, 교육과정의 목표에 기반한 학습을 촉진시키고 잠재 교육과정이 발생할 가능성을 줄여준다.

Curriculum alignment of learning objectives, assessment and teaching and learning activities help students to achieve broad and direct exposure to core educational concepts.41 However, many students were actually selecting their own study objectives, thereby creating a hidden curriculum.13 The prevalence of this phenomenon among SMS students may have been a result of the less structured and formative nature of assessment compared with the KSAU-HS, COM assessment program. The KSAU-HS, COM assessment program, with its frequent summative assessments and highly structured format, was more successful in stimulating curriculum objective-based learning and in reducing the tendency to create a hidden curriculum.


사우디아라비아의 전공의 수련은 학부시절 누적성적에따라 매우 달라진다. 그 결과 학생들은 고득점에 대해서 경쟁적 태도를 가지게 되고 이러한 태도가 학습목표를 두고 '도박'을 할 가능성을 낮춘다.

A student’s prospects for residency training in Saudi Arabia are greatly dependent on their accumulative assessment grades. Consequently, students develop a very competitive attitude with the goal of achieving high scores, and this attitude makes them less likely to gamble in selecting their study objectives.


지필고사에 비해서 학생들의 임상평가에 대한 인식은 비슷했다. 학생들은 근무지기반 평가를 긍정적으로 평가했고, 학습에 도움이 되며, 큰 가치를 지닌다고 했다. 근무지기반평가는 학생들이 보다 능숙한 의사가 되도록 도와주고 더 나은 학습접근법에 자극제가 된다고 했다.

There were similarities in students’ perceptions of clinical assessment as opposed to written assessment between the two studied groups. Students particularly appreciated work based assessment that was conducive to learning and held significant value for them. Work-based assessment was perceived by the students as leading to more skilled doctors and was a stimulant for better approaches to learning.


본 연구의 결과와 관련해서 교육과 임상평가에 사용할 수 있는 시간의 중요성은 다른 연구에서도 확인된 바 있다. 기관 간 차이가 보고된 바 있는데, 본 연구에서 제한된 시간과 자원이 양질의 교육활동에 장애물이 됨을 보여주었다.

The results of this study regarding the importance of available time for teaching and clinical assessment are also confirmed by other studies.43, 44 Variations between institutions and supervisors in the acceptance of responsibility for clinical teaching and time allocated for supervision have been reported. In this study, limited time and resources for clinical teaching were regarded as barriers to high-quality teaching performance. This result was not affected by the various differences between the two groups.


23. Tervalon M. Components of culture in health for medical students' education. Acad Med. 2003;78:570-6.


25. Attride-Stirling J. Thematic networks: an analytic tool for qualitative research. Qualitative Research. 2001;1:385- 405.










Int J Med Educ. 2011;2:44-52. Published online 2011 May 27. doi: 10.5116/ijme.4ddb.fc11. PMCID: PMC4205515.

Students' perceptions of the impact of assessment on approaches to learning: a comparison between two medical schools with similar curricula

Abstract

Objectives: The aim of the study was to investigate students' perceptions of assessment and the resulting learning styles.

Methods

Qualitative semi-structured interviews were conducted with 14 students and 8 clinical supervisors from Sydney Medical School and 12 students and 13 clinical supervisors from King Saud bin Abdulaziz University. Both institutions have similar curricula but a different assessment approach. The interviews were transcribed and analyzed using thematic analysis. Interview transcripts were stored and analyzed using ATLAS.ti.

Results

Three themes emerged from analyses of the interviews: the function of assessment, learning outcomes and, finally, authentic assessment in the clinical environment. A model is presented to show the relationship between contextual and different personal factors and students’ perceptions of the impact of assessment on learning styles.

Conclusions

Cultural differences and emotions can affect students’ perceptions of assessment and learning styles. A combination of formative and summative assessment based on learning objectives is required. This combination should take into consideration students’ cultural background, values and the implemented education system. This balance should be sufficient to motivate students in order to maintain their focus and attention, and reduce the potential negative impacts of a hidden curriculum. The experience of authentic assessment was a powerful motivator for students’ approaches to learning.

Keywords: Assessment methods, learning approaches, cultural differences


성과중심교육에서 학생평가

Assessment in Outcome-Based Education

임선주

부산대학교 의학전문대학원 의학교육실

Sun Ju Im

Department of Medical Education, Pusan National University School of Medicine, Yangsan, Korea






서 론

성과중심교육에서 ‘성과(outcome)’는 교육과정을 이수한 학생들이 달성해야 하는 역량(competency)을 의미하고, 교수는 학생들에게 현재의 수준에 대해 지속적으로 피드백을 제공함으로써 모든 학생들이 성과에 도달할 수 있도록 해야 한다(Shumway et al., 2003). 이 과정에서 평가는 학생과 교수에게 현재의 수준에 대한 정보를 제공하는 핵심적인 역할을 하며, 적절한 평가 없이 성과중심교육은 성공하기 어렵다. 지금까지 의학교육현장에서 평가는, 교육결과인 성과에 대하여 고려한 것이 아니라, 교수가 중요하다고 생각하는 지식 위주의 학습목표에 대하여 지필시험으로 학생을 평가하고 있다. 학생에게 현재 수준에 대하여 알려주고 피드백을 제공하는 의도보다는, 진급과 졸업 여부를 결정하는 종합평가로서의 목적이 강하다(Carraccio et al., 2002). 이러한 현실에서 교육의 가치를 학생들의 역량 향상에 두고 있는 성과중심교육의 도입은 평가방법과 체계에 대한 변화를 요구한다.


성과중심교육을 설계하는 과정은, 우선 학습성과를 선정한 다음, 교육과정을 설계하기 이전에 평가기준과 방법을 먼저 설계하는 것을 권장한다. 이것은 학습목표 설정, 교육과정 설계, 평가방법 설계의 순서로 일어나는 전통적인 설계방법과 비교하여, 후향적으로 설계하는 방식을 취한다(Dent & Harden, 2009). 평가는 성과와 직접적인 연관이 있으며 평가설계는 성과중심교육 설계의 초기에 일어난다. 평가를 설계하기 위해서는 왜 평가해야 하고, 무엇을 평가하며, 어떻게 평가할 것인지 고려해야 하며, 평가도구의 신뢰도, 타당도, 실현가능성을 생각해 볼 필요가 있다. ‘무엇’을 평가하느냐에 해당하는 것이 ‘성과 또는 역량’으로 볼 수 있으며, 성과의 특성을 잘 측정할 수 있는 방법을 선택하는 것이 중요하겠다.


일반적으로 평가는, 

  • 평가의 목적(피드백을 주기 위한 형성평가인가, 서열을 결정하기 위한 총괄평가인가?), 
  • 평가방법의 선택(지필시험으로 판단할 것인가, 수행평가로 판단할 것인가?), 
  • 평가도구의 개발(수행평가를 시행한다면 체크리스트, 루브릭, 평정척도 중 어떤도구를 어떻게 개발할 것인가?) 및 
  • 기준설정(합격/불합격을 어떻게결정할 것인가?)의 사항을 고려한다.


본 연구에서는 위와 같은 일반적인 평가에서의 고려사항에 대하여 문헌고찰을 통해 성과중심교육에서 평가의 특성을 알아보고자 하였다. 또한, 성과중심교육을 조기에 도입한 대학에서 시행한 평가방법과 이들 대학에서 제시한 모형을 통하여 성과를 판단하는 적절한 평가방법과 도구를 살펴보고자 하였다. 마지막으로, 일반적인 기준설정방법과 임상수행평가에서 기준설정방법에 대해 고찰하였다.



성과중심교육에서 평가의 특성


평가의 목적, 방법 및 기준설정과 관련하여 성과중심교육에서 평가의 특성은 다음과 같다.


첫째, 성과중심교육에서 평가의 목적은 학생들에게 단계별로 향상 정도를 알려주기 위한 것이다. 평가는 목적에 따라, 학생의 학업성취도를 평가하여 서열을 정하기 위한 총괄평가(summative assessment)와 학생에게 학업성취도에 대한 정보를 제공함으로써 장점과 취약점에 대한 피드백을 주기 위한 형성평가(formative assessment)로 나눌 수 있다. 성과중심교육에서는 학생이 졸업할 때 갖추어야 할 역량을 정의한 다음(exit outcome), 이 목표에 도달하기 위한 단계별 수준을 정하게 된다. 이것은 학년이 올라감에 따라 교육내용의 폭이 넓어지고 깊이가 깊어지도록 교육과정을 설계하는 나선형 교육과정과 일치한다(Harden, 1999). 이러한 교육과정에서 학생이 최종 졸업성과를 달성할 수 있도록 단계별로 향상 정도를 파악하는 것이 성과중심교육에서 평가의 목적이라 할 수 있다(Harden, 2007; Holmboe et al., 2010). 평가의 목적이 학생에게 향상 정도를 알려주기 위한 것이라면, 평가는 총괄적이기보다 형성평가의 성격을 띤다. Carraccio et al. (2002)은 성과중심교육에서 총괄평가보다 형성평가가 중요함을 강조하였고, Burch et al. (2006)은 형성평가를 통해 임상실습에서 학습을 향상시켰다고 보고하였다. 그 밖에 성과중심교육에서 평가의 목적은 교육과정에 대한 피드백을 제공함으로써 교육과정의 개선을 위해 시행하며, 우수한 교육프로그램의 제공은 대학의 사회적 책무성과도 연관되며, 이것은 보편적인 평가의 목적과도 일치한다.


둘째, 성과중심교육에서 평가는 지속적으로 자주 일어난다(Holmboe et al., 2010). 학생들의 역량을 향상시키기 위해 피드백을 제공하기 위해서 평가의 빈도가 증가한다. 이러한 예로 수업에서는 퀴즈나 이해도에 대한 점검을 시행할 수 있으며 임상실습현장에서 수행에 대한 피드백을 지속적으로 시행할 수 있다. 평가는 교수와 학습이 일어나는 수업·실습 중에 빈번히 일어나게 되므로 평가-수업 간 경계가 종종 불분명하게 된다.


셋째, 역량의 특성에 따라 합당한 방법으로 성과를 평가한다. 평가방법은 지필시험(written examination), 임상수행평가(clinical or practical assessments), 관찰(observation), 포트폴리오와 다른 기록평가(portfolio, other records of performance), 동료평가와 자기평가(peer report and self-report)의 다섯 가지로 분류할 수 있다(Shumway et al., 2003). 이 중 성과에 명시된 역량을 잘 나타낼 수 있는 평가방법을 선택하여야 하며, 각 평가방법의 타당도, 신뢰도, 적용효과, 비용 등을 고려하여야 한다.


넷째, 같은 역량에 대해서 다양한 평가방법으로 성과를 평가한다. Miller (1990)는 어떤 단일한 평가방법으로 의사의 전문적인 수행과 같은 복합적 능력을 판단할 수 없다고 하였다. 예를 들어 의학지식 역량은 보편적으로 지필시험으로 평가하나, 포트폴리오, 관찰, 수행평가를 통해서도 판단할 수 있다(Shumway et al., 2003).


다섯째, 성과는 궁극적으로는 실제 상황에서 행하는 것을 의미하므로 수행평가가 강조된다(Gruppen et al., 2012; Holmboe et al., 2010). 저학년 시기의 성과는 지식적인 내용을 습득하는 것일 수 있지만, 임상실습을 수행하는 3, 4학년 학생들의 평가는 실제 상황에서 수행하는 것을 목표로 해야 한다. 따라서 임상실습 상황에서는 실제 수행을 평가하도록 권장하며, 불가능한 경우 실제 상황과 비슷하게 조성하여 수행평가를 시행해야 한다.


여섯째, 성과를 판단하기 위해서 양적 평가와 더불어 질적 평가를 시행한다(Holmboe et al., 2010; Shumway et al., 2003). 의료윤리와 전문직업성처럼 양적 평가가 어려운 성과항목들은 질적 평가가 합당하다(Davis et al., 2001; Smith et al., 2007). 질적 평가는 신뢰도를 확보하는 일이 중요하며, 삼각기법(triangulation), 지속적이고 잦은 평가, 평가자 훈련을 통해 객관성을 확보할 수 있다(Shumway et al., 2003). 삼각기법은 여러 평가자가 다양한 각도에서 평가하는 것을 의미하며, 한 번의 시점에서 평가한 것은 높은 신뢰도를 확보할 수 없으므로 지속적으로 자주 평가해야 하며, 평가자 간의 편차를 줄이기 위해서는 평가자 훈련을 시행할 수 있다.
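평가자 간 편차는 예컨대 Cohen's kappa와 같은 일치도 지표로 수량화할 수 있다. 아래는 두 평가자가 같은 학생들을 합격/불합격으로 판정했다고 가정한 최소한의 스케치이며, 평정값은 모두 설명을 위해 만든 가상의 값이다.

```python
from collections import Counter

# Sketch: Cohen's kappa as one way to quantify agreement between two raters
# who scored the same students pass/fail; the ratings below are invented.

def cohens_kappa(r1, r2):
    """Chance-corrected agreement between two lists of categorical ratings."""
    assert len(r1) == len(r2) and r1
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: probability both raters independently pick the same category.
    expected = sum(c1[c] * c2[c] for c in set(r1) | set(r2)) / (n * n)
    return (observed - expected) / (1 - expected)

rater1 = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohens_kappa(rater1, rater2), 3))
```

이렇게 산출한 일치도를 평가자 훈련 전후로 비교하면, 훈련이 실제로 편차를 줄였는지 점검하는 데 활용할 수 있다.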


일곱째, 성과중심교육에서 평가기준은 일반적으로 절대적 기준(standards, criterion)을 사용한다(Gruppen et al., 2012; Holmboe et al., 2010). 성과중심교육에서 합격/불합격을 결정하기 위해 평가기준을 설정하는 데 한 가지 뚜렷한 방법은 없지만, 대체로 다른 학생들의 수행수준과 관련이 없는 기준을 사용한다. 이것은 모든 학생은 학습 후에 명시된 성과수준에 도달해야 하는 것을 의미한다.
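절대적 기준설정의 대표적인 예로 수정 Angoff 방법의 계산을 간단히 스케치하면 다음과 같다. 각 패널 위원이 '경계선 수준(borderline)' 학생이 각 문항을 맞힐 확률을 추정하고, 그 평균으로 합격선을 정한다. 아래의 문항과 추정치는 설명을 위해 임의로 만든 값이다.

```python
# Sketch of a modified Angoff standard-setting calculation (criterion-referenced):
# each judge estimates the probability that a borderline examinee answers each
# item correctly, and the cut score is the mean across judges and items.
# The items and ratings below are invented for illustration.

ratings = {  # item -> one probability estimate per judge
    "item1": [0.7, 0.6, 0.8],
    "item2": [0.5, 0.5, 0.4],
    "item3": [0.9, 0.8, 0.9],
}

def angoff_cut_score(ratings: dict) -> float:
    """Cut score as a percentage: mean of per-item borderline probabilities."""
    item_means = [sum(r) / len(r) for r in ratings.values()]
    return 100.0 * sum(item_means) / len(item_means)

print(round(angoff_cut_score(ratings), 1))
```

이 방식은 다른 학생들의 점수 분포와 무관하게 합격선을 정하므로, 본문에서 말한 절대적 기준의 한 구현 예가 된다.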


여덟째, 성과중심교육의 평가에 교수와 학생이 적극적으로 참여한다(Holmboe et al., 2010). 성과중심교육에서 다양한 평가가 지속적으로 이루어지게 되므로 교수의 이해와 참여가 필수적이다. 학생 또한 자신이 부족한 역량을 스스로 파악하고 보완하기 위해, 달성해야 하는 성과기준뿐만 아니라 그것을 판단하는 기준과 평가에도 깊이 관여해야 한다.


현재의 교육과정은 구조 또는 과정을 강조한 교육과정으로서(structured and process-based education), 성과중심교육에서 평가의 특성과 비교하여 Table 1에 제시하였다(Carraccio et al., 2002; Gruppen et al., 2012; Holmboe et al., 2010). 과정중심교육의 현재 교육과정에서 평가는 종합평가의 의도로서 과정 종료시점에 간헐적으로 시행하는 경우가 많다. 또, 학습성과는 고려하지 않고 지필시험 방법으로 지식습득에 대하여 평가하는 경우가 대부분이다. 질적 평가는 드물고 학생들의 상대적인 결과에 따라 판정하는 경우가 많으며 학생은 평가기준이나 과정을 잘 알지 못한 채 통보받는 경우가 많다. 이렇게 현재 시행 중인 평가의 특성과 차이를 보이므로 새로운 성과중심교육의 도입을 위해서는 평가체계의 개선이 필요하다.




성과중심교육에서 평가방법의 선택


유럽의학교육학회(Association for Medical Education in Europe)에서 발간한 성과평가에 대한 지침을 중심으로, 평가방법의 종류와 성과에 합당한 평가방법을 살펴보고자 한다(Shumway et al., 2003).



1. Classification of assessment methods


The guide classifies assessment methods into five categories and presents their strengths, weaknesses, and educational impact.

  • Written assessments are used to assess the recall or application of knowledge; single-best-answer or extended-matching multiple-choice questions (MCQ) are in common use. Written tests are highly reliable, but they tend to assess surface knowledge rather than deep understanding and differ from actual performance.
  • Clinical/practical assessments are highly reliable tests of clinical skill competencies. Examinations using standardized patients or simulation can represent difficult clinical situations and provide immediate feedback, but in preparing for them students are trained to perform fragments of clinical skills, leaving the overall flow of a skill poorly understood, and they entail substantial costs such as standardized-patient training.
  • Observation can be conducted during problem-based learning or clinical clerkships and is well suited to assessing competencies such as communication skills and attitudes. It is scored with checklists or rating scales and is an effective way to give students feedback, but differences among faculty raters make reliability hard to secure.
  • Portfolios contain records of what was done together with reflection on that performance; they can be a useful way to judge outcomes such as critical thinking and self-assessment (Davis et al., 2001), though methods such as triangulation are needed to secure reliability. Logbooks are similar to portfolios but, being mere records of students' experiences, have many limitations as assessment, and it is unclear whether students record them accurately.
  • Peer and self-assessment can be used alongside faculty assessment and are useful for assessing attitudes and communication skills, but training is required to secure reliability, and an educational climate built on trust is a prerequisite. When selecting an assessment method, one should choose the method appropriate to the nature of the outcome being measured and weigh each method's strengths, weaknesses, and the practical considerations of applying it in teaching.


2. Matching assessment methods to outcomes


Through the AMEE guide, Shumway et al. (2003) proposed a model for selecting assessment methods by linking Miller's assessment pyramid to outcomes.


Miller (1990) argued that students should be assessed, according to the purpose of the assessment, at four distinct levels: knows, knows how, shows how, and does. Miller's pyramid can be mapped to assessment methods: written tests such as MCQs correspond to knows and knows how; clinical performance tests such as the objective structured clinical examination (OSCE) correspond to shows how; and observation of practice in real settings or portfolio assessment corresponds to does (Figure 1).


Linking Miller's pyramid with the 12 outcomes of the University of Dundee School of Medicine yields Figure 2: the medical sciences and investigation and management correspond to knows and knows how; clinical skills and communication to shows how; and ethics and the role of the doctor to does.







Table 2 presents the assessment methods that best measure the nature of each outcome. Clinical skill competencies, including history taking and physical examination, can be judged not only with performance tests such as the OSCE but also through observation or logbooks during clerkships. Medical knowledge can be assessed with written tests, but also through portfolios, observation, and performance tests; attitudes and ethics are best judged in real clinical situations.






3. Assessment in clinical clerkships


Because outcome-based education emphasizes performance in real clinical situations, assessment during clerkships is all the more important and valuable. Communication skills and clinical skill competencies can be assessed with clinical performance examinations; in Korea, the introduction of the national clinical skills examination in 2009 has established performance testing as a form of clerkship assessment. However, clinical performance examinations using trained standardized patients assess fragments of clinical skills, are limited in judging the overall flow of patient care, and differ from actual performance (Shumway et al., 2003).


Davis & Harden (2003) emphasized assessing performance in real clinical situations and proposed new approaches such as workplace-based assessment, in which assessment occurs continuously as formative assessment in the daily practice setting and feedback drives improvement in students' performance. Burch et al. (2006) reported that sustained workplace-based assessment during clerkships let students recognize their own competency levels, motivated learning, and encouraged active participation in clerkship activities, and that faculty likewise found formative assessment during clerkships effective in promoting learning. Feedback is effective when it matches students' needs and addresses important issues concretely; faculty development programs may be needed to deepen clinical teachers' understanding of assessment (Norcini & Burch, 2007).


Workplace-based assessment methods include direct observation of performance, case-based discussion, and multisource 360° feedback from peers, nurses, and patients (Miller & Archer, 2010).


Portfolios can assess competencies, such as professionalism and lifelong learning, that are difficult to measure with conventional methods. Portfolio assessment encompasses records of activities, evidence of achievement, and reflection on learning experiences. Friedman Ben David et al. (2001) proposed a procedure for successful portfolio assessment:

  • 1) clarify the purpose of the portfolio assessment,
  • 2) decide which competencies to assess,
  • 3) determine the evidence to be collected,
  • 4) develop a scoring rubric,
  • 5) train the faculty involved,
  • 6) plan the assessment procedure,
  • 7) inform the students,
  • 8) develop assessment guidelines,
  • 9) establish evidence of reliability and validity, and
  • 10) set up a procedure for improvement.



4. Practical application in lectures and clerkships


Consider how Brown Medical School assessed its nine competencies (Smith et al., 2003).

  • First, communication skills can be assessed by observing communication directly in small-group classes or by video recording. In practice, they were assessed in anatomy through presentations and question-and-answer during dissection, and narrative peer assessment of communication skills in an elective course was reported to be effective. During clerkships, communication skills were assessed by direct observation or OSCE.
  • Second, clinical skills were practiced with standardized patients for feedback in the second-year introduction to clinical medicine and were assessed in clerkships by direct observation and OSCE.
  • Third, the ability to apply basic science knowledge was assessed in problem-based learning (PBL) sessions or with clinically contextualized MCQ and modified essay question (MEQ) items.
  • Fourth, competencies in diagnosis, treatment, and prevention can be assessed in PBL sessions and were assessed in clerkships by written or oral examinations and OSCEs.
  • Fifth, lifelong learning was assessed by evaluating students' preparation of learning tasks in PBL, or by clerkship thesis advisors observing students over an extended period.
  • Sixth, self-awareness, self-care, and personal growth were likewise assessed by mentors through long-term interaction with students.
  • Seventh, the social context of health care was assessed through reports, oral examinations, exhibits, and direct faculty observation in reflection sessions; community-based programs or projects and presentations to lay audiences were also used.
  • Eighth, moral reasoning and clinical ethics were assessed with written assignments analyzing ethical issues, by the approach taken to ethical problems in OSCEs, and sometimes by mentors.
  • Ninth, problem solving, meaning the ability to set priorities and handle complex tasks effectively in a hospital organization, was assessed through analysis of clinical scenarios or written examinations, through how students handled genuinely complex tasks during clerkships, and through task-management ability during the student internship.



5. Standard setting


Judging whether an outcome has been achieved requires aggregating the collected assessment results and setting a cut score, which is complex and difficult. Standard setting must consider both the type of standard and the method used to set it.


Standards fall into two types: relative (norm-referenced) and absolute (criterion-referenced). Relative standards are appropriate when selecting a fixed number or percentage of examinees based on their results and are easy to operate, but they make it difficult to judge achievement of competencies (Norcini, 2003; Ricketts, 2009).


Standard-setting methods include the fixed percentage method (a relative method), which passes a set proportion of students; the contrasting groups method (an examinee-centered absolute method), in which judgments are made about a random sample of students and the score distributions of the passing and failing groups are graphed and compared; the Angoff and Ebel methods (test-centered absolute methods), which judge how borderline students would perform based on item content; and the Hofstee method, a compromise between relative and absolute approaches (Norcini, 2003).
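The test-centered procedures can be made concrete with a small sketch. In the Angoff method, each judge estimates, for every item, the probability that a borderline examinee answers it correctly; the cut score is the sum of the per-item means. The judge data below are invented for illustration:

```python
def angoff_cut_score(judgments):
    """judgments[j][i]: judge j's estimated probability that a borderline
    examinee answers item i correctly. Returns the cut score in items."""
    n_items = len(judgments[0])
    # Average the judges' estimates for each item, then sum over items.
    item_means = [
        sum(judge[i] for judge in judgments) / len(judgments)
        for i in range(n_items)
    ]
    return sum(item_means)

# Three hypothetical judges rating a five-item test.
judges = [
    [0.6, 0.8, 0.5, 0.9, 0.7],
    [0.5, 0.7, 0.6, 0.8, 0.6],
    [0.7, 0.9, 0.4, 0.9, 0.8],
]
print(round(angoff_cut_score(judges), 2))  # → 3.47
```

The result is the number of items (out of five here) a borderline examinee is expected to answer correctly; examinees scoring below it fail.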


Although these standard-setting methods were developed for written tests, they can be adapted for clinical performance examinations (Boulet et al., 2003). Applying the Angoff or Ebel method, the checklist for the case used in each station is reviewed to set a minimum performance standard, and the station scores are summed to set the cut score. The contrasting groups method works similarly: a score separating pass from fail is defined for each case, and the scores across all stations are summed to set the cut score.


In outcome-based education, assessment aims to judge whether a competency has been attained rather than to rank students, so an absolute standard seems appropriate. However, George et al. (2006) showed that results differ according to the type of standard and the setting method, and each approach has strengths and weaknesses. Criterion-referenced (absolute) judgments based on predetermined cut-off scores show fluctuation in failure rates, whereas norm-referenced judgments show fluctuation in the cut score itself, so some argue that combining the two is advantageous (Cohen-Schotanus & van der Vleuten, 2010).
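One concrete way to combine relative and absolute judgments is the Hofstee compromise mentioned earlier: judges bound both the acceptable cut score (k_min to k_max) and the acceptable failure rate (f_min to f_max), and the cut is placed where the observed failure rate comes closest to the line running from (k_min, f_max) down to (k_max, f_min). The sketch below uses invented scores and bounds:

```python
def hofstee_cut(scores, k_min, k_max, f_min, f_max):
    """Hofstee compromise: choose the integer cut score within [k_min, k_max]
    whose observed failure rate lies closest to the judges' compromise line."""
    n = len(scores)
    best_cut, best_gap = k_min, float("inf")
    for cut in range(k_min, k_max + 1):
        observed_fail = sum(s < cut for s in scores) / n
        t = (cut - k_min) / (k_max - k_min)  # position along the compromise line
        # Acceptable failure rate falls linearly from f_max to f_min as the cut rises.
        acceptable_fail = f_max - t * (f_max - f_min)
        gap = abs(observed_fail - acceptable_fail)
        if gap < best_gap:
            best_cut, best_gap = cut, gap
    return best_cut

# Ten hypothetical total scores; judges accept a cut between 55 and 70
# and a failure rate between 5% and 30%.
scores = [55, 60, 62, 65, 68, 70, 72, 75, 80, 85]
print(hofstee_cut(scores, 55, 70, 0.05, 0.30))  # → 61
```

Because the cut responds to both the judges' content-based bounds and the actual score distribution, neither the failure rate nor the cut score can drift to an extreme, which is the appeal of the combined approach.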


Standard setting is a matter of reaching a reasonable judgment by integrating diverse views in light of the purpose of the test, the characteristics of the students, and the nature of the competency. One must consider whether the instrument is reliable and whether the pass rate matches competency achievement, and the standard should also be compared against subsequent performance (Norcini, 2003).



Conclusion


Systematic assessment is central to putting outcome-based education into practice. Assessment as currently practiced in medical education differs from assessment in outcome-based education, and the assessment system needs to be improved in light of the following characteristics of outcome-based assessment.


In outcome-based education, assessment is strongly formative, intended to give students feedback on their level of outcome achievement, and must be conducted continuously. Appropriate methods should be selected for each outcome, and the same outcome should be assessed with multiple methods. Because outcomes target students' performance in real situations, performance assessment is especially important, and qualitative as well as quantitative assessment is needed. Judgments of outcome achievement should in principle use absolute standards, but it is important to judge by synthesizing analyses of diverse data and opinions.


These characteristics demand the active participation of faculty. Assessment is needed not only for knowledge but for diverse outcomes such as ethics and lifelong learning, and judging outcomes such as professionalism requires the development of new assessment methods. Faculty must know the strengths and weaknesses of the various methods, apply them appropriately to reach sound judgments, and know how to give feedback effectively. Students play a central role in outcome-based education: they need to understand clearly how outcomes are defined and judged, recognize them as goals that must be achieved, and work to identify and improve competencies that fall short of the standard. To this end, faculty and students should design the outcome-based curriculum and its assessment together to build shared ownership, and support structures and systems should be put in place to ensure excellence.






Assessment plays a vital role in outcome-based education (OBE). This article describes the characteristics of assessment and appropriate assessment instruments for measuring learning outcomes in OBE. Assessment in OBE needs to be formative, continuous, and frequent. Miller's pyramid is useful for selecting the appropriate assessment instruments to reflect a specific outcome; different methods can be applied to evaluate one outcome. Outcomes and competency mean that students must 'do'; therefore, performance tests are emphasized. Qualitative methods as well as quantitative methods are used to evaluate the outcomes of areas such as professionalism or ethics. An absolute criterion-based standard is usually applied to decide whether students pass or fail, but the decision should be based on gathering value judgments and reaching consensus. Active participation of faculty members and students in assessment is crucial.


Keywords: Outcome assessment, Competency-based education


Defining and Assessing Professional Competence 

Ronald M. Epstein, MD; Edward M. Hundert, MD

Author Affiliations: Departments of Family Medicine (Dr Epstein), Psychiatry (Drs Epstein and Hundert), and Medical Humanities (Dr Hundert), University of Rochester School of Medicine and Dentistry, Rochester, NY.




Medical schools, postgraduate training programs, and licensing bodies conduct assessments to certify the competence of future practitioners, discriminate among candidates for advanced training, provide motivation and direction for learning, and judge the adequacy of training programs. Standards for professional competence delineate key technical, cognitive, and emotional aspects of practice, including those that may not be measurable.1,2 However, there is no agreed-upon definition of competence that encompasses all important domains of professional medical practice. In response, the Accreditation Council for Graduate Medical Education defined 6 areas of competence and some means of assessing them3: 

    • patient care (including clinical reasoning), 
    • medical knowledge, 
    • practice-based learning and improvement (including information management), 
    • interpersonal and communication skills, 
    • professionalism, and 
    • systems-based practice (including health economics and teamwork).3




DEFINING PROFESSIONAL COMPETENCE


The authors' definition

Building on prior definitions,1- 3 we propose that professional competence is the habitual and judicious use of communication, knowledge, technical skills, clinical reasoning, emotions, values, and reflection in daily practice for the benefit of the individual and community being served. 

      • Competence builds on a foundation of basic clinical skills, scientific knowledge, and moral development. 
      • It includes 
        • a cognitive function—acquiring and using knowledge to solve real-life problems; 
        • an integrative function—using biomedical and psychosocial data in clinical reasoning; 
        • a relational function—communicating effectively with patients and colleagues; and 
        • an affective/moral function—the willingness, patience, and emotional awareness to use these skills judiciously and humanely (BOX 1). 
      • Competence depends on habits of mind, including attentiveness, critical curiosity, self-awareness, and presence. 
      • Professional competence is developmental, impermanent, and context-dependent.






Acquisition and Use of Knowledge

Evidence-based medicine is an explicit means for generating an important answerable question, interpreting new knowledge, and judging how to apply that knowledge in a clinical setting.4 But Polanyi5 argues that competence is defined by tacit rather than explicit knowledge. 

Tacit knowledge is that which we know but normally do not explain easily, including the informed use of heuristics (rules of thumb), intuition, and pattern recognition. 

The assessment of evidence-based medicine skills is difficult because many of the heuristics used by novices are replaced by shortcuts in the hands of experts,6 as are other clinical skills.7


Personal knowledge is built through experience, but because experience does not automatically translate into learning and competence, cognitive and emotional self-awareness is required.

Personal knowledge is usable knowledge gained through experience.8 Clinicians use personal knowledge when they observe a patient's demeanor (such as a facial expression) and arrive at a provisional diagnosis (such as Parkinson disease) before eliciting the specific information to confirm it. Because experience does not necessarily lead to learning and competence,9 cognitive and emotional self-awareness is necessary to help physicians question, seek new information, and adjust for their own biases.


Integrative Aspects of Care

Professional competence differs from isolated competencies. A student who has each piece of knowledge and each skill needed to treat a patient with a given disease is not necessarily able to care for that patient well.

Professional competence is more than a demonstration of isolated competencies10; "when we see the whole, we see its parts differently than when we see them in isolation."11 


A competent physician must be able to think, feel, and act in an integrated way.

A competent clinician possesses the integrative ability to think, feel, and act like a physician.6,12- 15 Schon16 argues that professional competence is more than factual knowledge and the ability to solve problems with clear-cut solutions: it is defined by the ability to manage ambiguous problems, tolerate uncertainty, and make decisions with limited information.


Competence depends on bringing expert scientific, clinical, and humanistic judgment to bear in clinical reasoning.

Competence depends on using expert scientific, clinical, and humanistic judgment to engage in clinical reasoning.14,15,17,18 Although expert clinicians often use pattern recognition for routine problems19 and hypothetico-deductive reasoning for complex problems outside their areas of expertise, expert clinical reasoning usually involves working interpretations12 that are elaborated into branching networks of concepts.20- 22 


Building Therapeutic Relationships

The patient-physician relationship affects health and recovery from illness, costs, and the outcomes of chronic disease.

The quality of the patient-physician relationship affects health and the recovery from illness,23,24 costs,25 and outcomes of chronic diseases26- 29 by altering patients' understanding of their illnesses and reducing patient anxiety.26 Key measurable patient-centered28 (or relationship-centered)30,31 behaviors include responding to patients' emotions and participatory decision making.29


Medical errors often stem from system failures rather than individual mistakes, so assessments of teamwork and institutional self-assessment can complement the assessment of individuals.

Medical errors are often due to the failure of health systems rather than individual deficiencies.32- 34 Thus, the assessment of teamwork and institutional self-assessment might effectively complement individual assessments.


Affective and Moral Dimensions

These domains may be better assessed by peers or patients.

Moral and affective domains of practice may be evaluated more accurately by patients and peers than by licensing bodies or superiors.35 (...) Recent neurobiological research indicates that the emotions are central to all judgment and decision making,13 further emphasizing the importance of assessing emotional intelligence and self-awareness in clinical practice.1,40- 42


Habits of Mind

Several such habits exist, but they are hard to objectify. On this view, errors in medicine arise from overconfidence that skips past doubt.

Competence depends on habits of mind that allow the practitioner to be attentive, curious, self-aware, and willing to recognize and correct errors.43 Many physicians would consider these habits of mind characteristic of good practice, but they are especially difficult to objectify. (...) Errors in medicine, according to this view, may result from overcertainty that one's impressions are beyond doubt.41,43,44


Context

Competence is context-dependent: it is a relationship among an individual's ability, the task, and the ecology of the health system and clinical context.

Competence is context-dependent. Competence is a statement of relationship between an ability (in the person), a task (in the world),45 and the ecology of the health systems and clinical contexts in which those tasks occur.46,47 This view stands in contrast to an abstract set of attributes that the physician possesses—knowledge, skills, and attitudes—that are assumed to serve the physician well in all the situations that he or she encounters. (...)


Development

Competence is developmental. Which aspects of competence should be acquired at each stage of training is debated, which raises the question of how the assessment of practicing physicians should differ from that of students. Deciding how, and at what stage of training, the patient-physician relationship should be assessed is also difficult. Changes in medical practice can likewise force competence to be redefined.

Competence is developmental. There is debate about which aspects of competence should be acquired at each stage of training. (...) which raises the question of whether assessment of practicing physicians should be qualitatively different from the assessment of a student. Determining how and at what level of training the patient-physician relationship should be assessed is also difficult. (...) Changes in medical practice and the context of care invite redefinitions of competence; for example, the use of electronic communication media48 and changes in patient expectations.49,50




CURRENT MEANS OF ASSESSMENT


Assessment must be viewed from the following perspectives.

Assessment must take into account what is assessed, how it is assessed, and the assessment's usefulness in fostering future learning. In discussing validity of measures of competence in an era when reliable assessments of core knowledge, abstract problem solving, and basic clinical skills have been developed,45,51- 56 we must now establish that they encompass the qualities that define a good physician: the cognitive, technical, integrative, contextual, relational, reflective, affective, and moral aspects of competence. We distinguish between expert opinion, intermediate outcomes, and the few studies that show associations between results of assessments and actual clinical performance.57- 60


We should consider how the assessment process itself can foster future learning.

We consider how the process of assessment might foster future learning. Too often, practitioners select educational programs that are unlikely to influence clinical practice.61 Good assessment is a form of learning and should provide guidance and support to address learning needs. Finally, we address concerns that the medical profession still lacks adequate accountability to the public62 and has not done enough to reduce medical errors.32,63


Within each assessment domain, there are four levels at which a trainee can be assessed.

Within each domain of assessment, there are 4 levels at which a trainee might be assessed (Figure 1).64 

        • The knows level refers to the recall of facts, principles, and theories. 
        • The knows how level involves the ability to solve problems and describe procedures. 
        • The shows how level usually involves human (standardized patient), mechanical, or computer simulations that involve demonstration of skills in a controlled setting. 
        • The does level refers to observations of real practice. 

For each of these levels, the student can demonstrate the ability to imitate or replicate a protocol, apply principles in a familiar situation, adapt principles to new situations, and associate new knowledge with previously learned principles.65






METHODS


Summary of Studies

The three most commonly used assessment methods

The 3 most commonly used assessment methods are 

  • subjective assessments by supervising clinicians, 
  • multiple-choice examinations to evaluate factual knowledge and abstract problem solving,66 and 
  • standardized patient assessments of physical examination and technical and communication skills.67- 69 


The following areas are not being assessed well.

Although curricular designs increasingly integrate core knowledge and clinical skills, most assessment methods evaluate these domains in isolation. Few assessments use measures such as participatory decision making70 that predict clinical outcomes in real practice. Few reliably assess clinical reasoning, systems-based care, technology, and the patient-physician relationship.3,69 The literature makes important distinctions between criteria for licensing examinations and program-specific assessments with mixed formative and summative goals.


Assessing knowledge and problem-solving ability (MCQ)

Evaluation of factual knowledge and problem-solving skills by using multiple-choice questions offers excellent reliability71- 75 and assesses some aspects of context and clinical reasoning. (...) Standardized test scores have been inversely correlated with empathy, responsibility, and tolerance.83 Also, because of lack of expertise and resources, few medical school examinations can claim to achieve the high psychometric standards of the licensing boards.


Overview of the OSCE

The Objective Structured Clinical Examination (OSCE) is a timed multistation examination often using standardized patients (SPs) to simulate clinical scenarios. The roles are portrayed accurately56,84 and simulations are convincing; the detection rate of unannounced SPs in community practice is less than 10%.57,59,85- 89 Communication, physical examination, counseling, and technical skills can be rated reliably if there is a sufficiently large number of SP cases67,90- 100 and if criteria for competence are based on evidence.101 Although few cases are needed to assess straightforward skills, up to 27 cases may be necessary to assess interpersonal skills reliably in high-stakes examinations.102,103 Although SPs' ratings usually correlate with those of real patients,104 differences have been noted.105- 107
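Case counts like the "up to 27 cases" figure are typically derived from classical reliability theory: the Spearman-Brown prophecy formula predicts how test reliability grows as equivalent cases are added. This is a standard psychometric result rather than something stated in the passage, and the per-case reliability of 0.10 below is an invented illustration:

```python
from math import ceil

def spearman_brown(r_single, n_cases):
    """Predicted reliability of an n-case test, given the reliability
    r_single of a single case."""
    return n_cases * r_single / (1 + (n_cases - 1) * r_single)

def cases_needed(r_single, r_target):
    """Smallest number of cases whose predicted reliability reaches r_target."""
    n = r_target * (1 - r_single) / (r_single * (1 - r_target))
    return ceil(round(n, 9))  # round() guards against float noise before ceil()

# With a per-case reliability of 0.10, reaching 0.75 takes 27 cases,
# the same order of magnitude quoted for interpersonal skills.
print(cases_needed(0.10, 0.75))  # → 27
```

The formula makes clear why straightforward skills need few stations while noisy constructs such as interpersonal skills need many: the required case count rises sharply as per-case reliability falls.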


Setting pass/fail standards for the OSCE

Defining pass/fail criteria for OSCEs has been complex.54,108- 111 There is debate about who should rate student performance in an OSCE.112 Ratings by the SP are generally accurate52 but may be hampered by memory failure, whereas external raters, either physicians or other SPs, may be less attuned to affective aspects of the interview and significantly increase the cost of the examination.


Checklists

Checklist scores completed by physician-examiners in some studies improve with expertise of the examinees113 and with the reputation of the training program.90,114 But global rating scales of interpersonal skills may be more valid than behavioral checklists.7,115,116 The OSCE scores may not correlate with multiple-choice examinations and academic grades,90,100,117 suggesting that these tools measure different skills. Clinicians may behave differently in examination settings than in real practice,106,118 and short OSCE stations can risk fragmentation and trivialization of isolated elements of what should be a coherent whole.119 The OSCE also has low test reliability for measuring clinical ethics.120


At the does level, there are few validated strategies.

There are few validated strategies to assess actual clinical practice, or Miller's does level. Subjective evaluation by residents and attending physicians is the major form of assessment during residency and the clinical clerkships and often includes the tacit elements of professional competence otherwise overlooked by objective assessment instruments. Faculty ratings of humanism predicted patient satisfaction in one study.121 However, evaluators often do not observe trainees directly. They often have different standards122,123 and are subject to halo effects124 and racial and sex bias.125,126 Because of interpatient variability and low interrater reliability, each trainee must be subject to multiple assessments for patterns to emerge. Standardized rating forms for direct observation of trainees127- 132 and structured oral examination formats have been developed in response to this criticism.133,134


Profiling by managed-care databases 

Profiling by managed-care databases is increasingly used as an evaluation measure of clinical competence. However, data abstraction is complex140 and defining competence in terms of cost and value is difficult. The underlying assumptions driving such evaluation systems may not be explicit. For example, cost analyses may favor physicians caring for more highly educated patients.141


Peer assessment

Peer ratings are accurate and reliable measures of physician performance.77,142 Peers may be in the best position to evaluate professionalism; people often act differently when not under direct scrutiny.143 Anonymous medical student peer assessments of professionalism have raised awareness of professional behavior, fostered further reflection, helped students identify specific mutable behaviors, and been well accepted by students.35 Students should be assessed by at least 8 of their classmates. The composite results should be edited to protect the confidentiality of the raters.
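The two practical rules in this passage (a minimum of 8 classmate raters, and composites edited to protect rater confidentiality) can be sketched as a small aggregation step. The data model and field names below are invented for illustration:

```python
MIN_RATERS = 8  # minimum number of classmate raters cited in the passage

def composite_peer_report(ratings):
    """ratings: list of (rater_id, score, comment) tuples, scores on a 1-5 scale.
    Returns an anonymized composite, or None when there are too few raters."""
    if len(ratings) < MIN_RATERS:
        return None  # too few raters for a stable, confidential composite
    scores = [score for _, score, _ in ratings]
    return {
        "n_raters": len(scores),
        "mean_score": sum(scores) / len(scores),
        # Rater ids are dropped so the composite protects confidentiality.
        "comments": [text for _, _, text in ratings if text],
    }

ratings = [(i, s, "") for i, s in enumerate([3, 4, 5, 4, 3, 4, 5, 4])]
print(composite_peer_report(ratings)["mean_score"])  # → 4.0
```

Refusing to report below the rater minimum serves both goals at once: the mean is more stable, and no single classmate's judgment can be inferred from the composite.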


Self-assessment

Self-assessments have been used with some success in standardized patient exercises144 and in programs that offer explicit training in the use of self-assessment instruments.145 Among trainees who did not have such training, however, self-assessment was neither valid nor accurate. Rather, it was more closely linked to the trainee's psychological sense of self-efficacy and self-confidence than to appropriate criteria, even among bright and motivated individuals.




COMMENT


Assessment serves personal, institutional, and societal goals (BOX 2). Distinctions between these goals often are blurred in practice. 


Performance can be measured directly, whereas competence is an inferred quality.

Whereas performance is directly measurable, competence is an inferred quality.148 

      • Performance on a multiple-choice test may exceed competence, as in the case of a trainee with a photographic memory but poor clinical judgment. 
      • Conversely, competence may exceed test performance, as in the case of a trainee with severe test anxiety.

Correlation with National Board scores and feedback on graduates' performance can be useful in validating some assessment instruments but should be done with caution. For example, efficiency is highly valued in residents but less so in medical students.





Future Directions



Comprehensive assessments link content across several formats. 

        • Postencounter probes immediately after SP exercises using oral, essay, or multiple-choice questions test pathophysiology and clinical reasoning in context.151,152 
        • Triple-jump exercises152 (consisting of a case presentation, an independent literature search, and then an oral or written postencounter examination) test the use and application of the medical literature. 
        • Validated measures of reflective thinking153 have been developed that use patient vignettes followed by questions that require clinical judgment. These measures reflect students' capacity to organize and link information; also, they predict clinical reasoning ability 2 years later.153 


Combining formats appears to have added value with no loss in reliability.150,154 Ongoing educational outcomes research will show whether composite formats help students learn how to learn more effectively, develop habits of mind that characterize exemplary practice,43 and provide a more multidimensional picture of the examinee than the individual unlinked elements.



Well-functioning health systems are characterized by continuity, partnership between physicians and patients, teamwork between health care practitioners, and communication between health care settings.156,157 

        • The use of time in a continuity relationship can be assessed with a series of SP or real-patient exercises. 
        • To assess partnership, patient assessment, currently used to assess physicians in practice,158 is being tested for students and residents.159,160 These efforts are guided by data showing that patients' ratings of communication and satisfaction correlate well with biomedical outcomes,24,29 emotional distress,161 health care use,25 and malpractice litigation.162 Patient ratings also have the potential to validate other measures of competence.163 
        • Several institutions assess teamwork by using peer assessments. Others use sophisticated mannequins to simulate acute cardiovascular physiological derangements found in intensive care settings164- 169; trainees are graded on teamwork as well as individual problem solving, and statistical adjustments can account for team composition. 
        • Communication between health settings could be assessed at the student level, for example, by grading of their written referral letters.170











 2002 Jan 9;287(2):226-35.

Defining and assessing professional competence.

Abstract

CONTEXT:

Current assessment formats for physicians and trainees reliably test core knowledge and basic skills. However, they may underemphasize some important domains of professional medical practice, including interpersonal skills, lifelong learning, professionalism, and integration of core knowledge into clinical practice.

OBJECTIVES:

To propose a definition of professional competence, to review current means for assessing it, and to suggest new approaches to assessment.

DATA SOURCES:

We searched the MEDLINE database from 1966 to 2001 and reference lists of relevant articles for English-language studies of reliability or validity of measures of competence of physicians, medical students, and residents.

STUDY SELECTION:

We excluded articles of a purely descriptive nature, duplicate reports, reviews, and opinions and position statements, which yielded 195 relevant citations.

DATA EXTRACTION:

Data were abstracted by 1 of us (R.M.E.). Quality criteria for inclusion were broad, given the heterogeneity of interventions, complexity of outcome measures, and paucity of randomized or longitudinal study designs.

DATA SYNTHESIS:

We generated an inclusive definition of competence: the habitual and judicious use of communication, knowledge, technical skills, clinical reasoning, emotions, values, and reflection in daily practice for the benefit of the individual and the community being served. Aside from protecting the public and limiting access to advanced training, assessments should foster habits of learning and self-reflection and drive institutional change. Subjective, multiple-choice, and standardized patient assessments, although reliable, underemphasize important domains of professional competence: integration of knowledge and skills, context of care, information management, teamwork, health systems, and patient-physician relationships. Few assessments observe trainees in real-life situations, incorporate the perspectives of peers and patients, or use measures that predict clinical outcomes.

CONCLUSIONS:

In addition to assessments of basic skills, new formats that assess clinical reasoning, expert judgment, management of ambiguity, professionalism, time management, learning strategies, and teamwork promise a multidimensional assessment while maintaining adequate reliability and validity. Institutional support, reflection, and mentoring must accompany the development of assessment programs.





(Source: http://dartmed.dartmouth.edu/fall12/html/back_to_basics/)





Medical educators have drawn on theories from cognitive and educational psychology to study how the nature of post-learning activities informs long-term retention and the transfer of knowledge.

Medical educators increasingly look to related disciplines, particularly those of cognitive and educational psychology, to inform the critical nature of post-learning activities with respect to, for example, long-term retention and the application or transfer of knowledge.


Studies have examined the appropriateness and effectiveness of learning techniques such as self-explanation (SE) and test-enhanced learning (TEL). Larsen et al. investigated how TEL compares with SE for long-term retention, and what effect combining the two methods has.

Research has aimed to investigate the appropriateness and efficiency of several learning techniques, such as self-explanation (SE) and test-enhanced learning (TEL). In this issue of Medical Education, Larsen et al.1 investigate how TEL compares with SE with regard to long-term retention and explore the added value to be gained by combining the two learning methods.


Tests are usually used for assessment, but they can also improve retention of information, a phenomenon called the testing effect. Retrieval is the central mechanism of this effect, strengthening the pathway to existing knowledge.

Tests are usually used for assessment, but they can directly influence learning by promoting better retention of information, a phenomenon known as the testing effect.2 Retrieval, presumed to be the central mechanism mediating the effect, is seen as an active process that strengthens the pathway to the given knowledge.


Retrieval is thought to involve the active reconstruction of knowledge, prompting self-monitoring and targeted rehearsal of particular memories.

It has been suggested that retrieval may involve the active reconstruction of knowledge3 and may induce targeted rehearsal through self-monitoring.4


Moreover, the more effort put into retrieval, the greater its effect on learning.

it has been noted that the more ‘effort’ students put into the retrieval process, the more effect it has on learning.2,5


Self-explanation means generating explanations to oneself while working through learning material in order to deepen understanding. Generating inferences and creating links between pieces of information restructures knowledge, leading to more coherent and integrated learning. The positive effects of self-explanation have been shown across domains, even without feedback, and are stronger when learners have been trained to self-explain or are given additional prompts and cues.

Self-explanation involves generating explanations to oneself while working through given learning material with the purpose of deepening one’s understanding.6,7 As a result of the generation of inferences, or the creation of new links between pieces of information or prior knowledge, and monitoring, knowledge restructuring takes place, allowing the learner to build a more coherent and integrated representation that facilitates the transfer of learning.6 The positive effect of SE on learning has been shown in various domains and even in the absence of specific content feedback. The effects of SE are increased when students are trained to self-explain8 and by the addition of prompts and cues.9 


의학교육에서 자기설명은 임상실습 기간에 익숙하지 않은 케이스에 대한 진단 수행능력에 도움이 되나, 이미 익숙한 케이스에는 별 도움이 안 되는 것으로 나타났다. 자기설명이 강력하긴 하나, 장기적 효과는 알려져 있지 않다.

In medical education, SE has been shown to have beneficial effects on students’ diagnostic performance on unfamiliar cases during clerkships, but not on familiar ones.10 Despite the robustness of SE, its long-term effects are unknown.


의학의 전문가는 지식을 기반으로 하지만, 단순히 오랜 시간동안 지식을 축적하는 것 이상이다. 의사는 생의학적 지식과 인과관계에 대한 설명, 임상 지식, 질병에 대한 지식들의 상호 네트워크를 구성해서, 질병과 임상표현에 대한 일관된 심적 표상(mental representation)을 가지게 된다.

Medical expertise is knowledge-based, but is much more than an accumulation of retained knowledge over time. It is characterised by complex networks in which biomedical knowledge and causal explanations, clinical knowledge, illness scripts and instances are interconnected and form coherent mental representations of diseases and clinical presentations.11 


의사는 단순히 지식을 적용하거나 기존의 해답을 활용해서 문제를 푸는 사람이 아니다. 더 중요한 것은 지속적으로 변하는 환경과 개별 환자의 특성에 맞춰 해답을 적용하는 것이다. 이는 우수한 의과대학 학생들에게조차 엄청난 양의 학습을 필요로 하며, 다양한 효과적인 학습 기술을 필요로 한다.

Clinicians not only apply knowledge and solve problems using existent solutions, but, more importantly, adapt solutions to the ever-changing context and uniqueness of each patient.12 This represents a tremendous learning challenge, even for highly selected medical students, and thus a variety of effective learning techniques are welcome.


Larsen 등은 의학교육에 있어서 TEL과 SE이 상호 보완적인 역할을 한다는 것에 덧붙여서 이것들의 장기적인 효과를 언급했다. 저자는 TEL과 SE가 합해지면 교육 성과에 엄청난 효과를 낼 수 있다는 것을 보였다.

Larsen et al.’s study1 adds to the evidence for the long-term effects of TEL and SE in medical education, in addition to providing insight into their abilities to complement one another, despite their differences. The authors observed that TEL and SE combined produced the greatest impact on the educational outcome.


실제로 학생들은 '시험에서 답을 할 수 있는 것은 학습자료에 있는 지식을 충분히 갖추고 있다는 것을 의미하지만, 반드시 이해의 수준이 높은 것은 아니다' 라고 말했다. 이것은 SE가 익숙하지 않은 환경에서는 효과적이나 익숙한 환경에서는 그렇지 않다는 연구결과와 일치한다고 볼 수 있다.

In fact, students commented that ‘the ability to answer the question on a test indicated that they had sufficient knowledge of the material and giving an explanation did not add significantly to their understanding’.1 This is in line with previous studies showing that SE works when it is used in unfamiliar but not in familiar contexts.10


의학교육자로서 우리는 학생들이 목적의식이 있는 학습을 하고, 지식을 실습할 수 있는 기회를 주고, 활용할 수 있는 자원을 효과적으로 사용해야 한다. 그러기 위해서 우리는 우리가 가진 형성평가에 TEL을 효과적으로 할 수 있는 요소들이 포함되도록 만들어야 한다. 시험은 반복적으로, 자주, 일정 기간을 두고 이뤄져야 한다.

As medical educators, we should pursue endeavours that provide purposeful learning and practice retrieval opportunities for students, and use available resources efficiently. To do so, we can revisit our formative assessment practices and make sure they incorporate the critical elements that make TEL so effective. Testing should be repeated, frequent and spaced over time.


또한 시험은 효과적인 기억의 인출과 긍정적 피드백 작용을 하도록 만들어져야 한다. 의학교육자들은 실기시험과 자기설명을 활용할 수 있도록 해야 하며, 효율성이 낮은 강조(highlighting) 반복해서 읽기(rereading), 요약하기(summarising)등의 사용은 지양해야 한다.

It should include production tests that require effortful retrieval on the part of students and provide feedback. Medical educators should promote the use of practice testing and self-explanation and discourage the exclusive use of techniques that have proven to be of low utility, such as highlighting, rereading and summarising.13


선생님으로서, 우리는 TEL과 SE에 더 익숙해져야 하며, 학생들이 학습방법에 대한 조언이 필요하다면 같이 이야기할 수 있어야 한다. 이들 두 기술을 활용하는데는 복잡한 기술이나 학습자료가 필요한 것이 아니다.

As teachers, we should become familiar with TEL and SE, and should be able to engage in discussions with students and to give advice on learning and study habits when it is required. The implementation by students of these two techniques does not necessarily require advanced technologies or complex materials, and they can be relatively easy to use. 





Med Educ. 2013 Jul;47(7):641-3. doi: 10.1111/medu.12218.

Back to basics: keeping students cognitively active between the classroom and the examination.

Source: Sherbrooke, Quebec, Canada.

PMID: 23746152 [PubMed - in process]







