Assessment in the Context of Licensure and Certification (Teach Learn Med, 2013)

John J. Norcini

Foundation for Advancement of International Medical Education and Research, Philadelphia, Pennsylvania, USA

Rebecca S. Lipner and Louis J. Grosso

American Board of Internal Medicine, Philadelphia, Pennsylvania, USA





Three major forces have shaped licensure and certification over the past 25 years.

Over the past 25 years, three major forces have had a significant influence on licensure and certification:

  • From process to outcomes: the shift in focus from educational process to educational outcomes,

  • Career-long learning and assessment: the increasing recognition of the need for learning and assessment throughout a physician’s career, and

  • Technology and psychometrics: the changes in technology and psychometrics that have opened new vistas for assessment.


In response, licensure and certification programs have changed in three ways.

To respond to these forces, licensure and certification programs have

(a) Better test construction, scoring, and delivery: improved the ways in which their examinations are constructed, scored, and delivered;

(b) A wider repertoire of assessment methods: expanded their repertoire of methods of assessment; and

(c) Investment in validation research: invested in research intended to validate their decisions.


FORCES INFLUENCING LICENSURE AND CERTIFICATION


Educational Outcomes


In the early 1990s, the general focus of education moved from process to outcomes, and medical education, driven by regulators worldwide, gradually began to follow.

In the early 1990s, the focus in general education shifted from process to outcome.1 Driven by the regulatory bodies around the world, medical education has begun to gradually adopt this approach.


Good assessment is central to the outcomes movement because it is what confirms that the intended results were achieved. The adoption of outcomes therefore strongly shaped how licensing and certifying bodies think about the assessments underlying their decisions. In particular, communication with patients, interpersonal skills, and professionalism had historically been left out of assessment, and new methods for assessing them were introduced over the past 25 years.

Good assessment is the linchpin of the outcomes movement, as it is the means for ensuring that the appropriate results have been achieved. As such, the introduction of outcomes has had a profound effect on how the licensing and certifying bodies conceive of the assessments on which they base their decisions. In particular, competencies such as communications with patients, interpersonal skills, and professionalism had not historically been included in the assessments of the licensing and certifying bodies. Over the past 25 years, this has led to research in, and the introduction of, new methods of assessment.



Need for Lifelong Learning and Assessment


Four lines of evidence support the need for learning and assessment across a whole career.

The lifelong need for learning and assessment is supported by four key pieces of scientific evidence related to patient care.

  • First, patients receive only about half of the care they should for a range of chronic and acute conditions.
    First, McGlynn and colleagues showed that patients receive only about half of the care that they should for a variety of chronic and acute conditions.5

  • Second, physicians’ knowledge, skills, and adherence to evidence-based care decline over time.
    Second, Choudhry and colleagues showed that continued learning and assessment throughout a physician’s career was critical because physicians’ knowledge, skills, and compliance with evidence-based patient care decline as a function of time.6

  • Third, physicians cannot accurately self-assess their strengths and weaknesses, but appropriate learning tools can help them find and close competency gaps.
    Third, researchers found that physicians are unable to accurately self-assess their strengths and weaknesses with respect to patient care, but with appropriate learning tools to help physicians identify and fill the competency gaps, effective change can occur.7–10

  • Fourth, patients believe that physicians are assessed every few years and are keeping up with medical advances.
    Fourth, patients believe that physicians are being assessed every few years and are keeping pace with medical advances.11,12


The ABMS MOC programs now require ongoing activity instead of granting lifetime certification, recognizing that continuous learning and periodic assessment are needed to assure the public that physicians keep up with evidence-based care throughout their careers. Licensure is likewise moving from a lifetime privilege to Maintenance of Licensure (MOL), which requires physicians to renew their license periodically.

The ABMS Maintenance of Certification (MOC) programs, which now require ongoing activity as opposed to lifetime certification, acknowledge that continuous learning and periodic assessment are important to assuring the public that physicians are keeping pace with evidence-based patient care throughout their careers.13 Likewise, medical licensure has begun to transition from an initial privilege to Maintenance of Licensure, which periodically requires physicians to renew their license.14



Technology and Psychometrics


Computing power and the interconnectivity of systems have advanced enormously over the past 25 years, and the way physicians work has changed greatly as a result.

The past 25 years have also seen tremendous advances in computing power and the interconnectivity of systems.15 As a result, the way that physicians work is quite different than it was 25 years ago, and this has led to the incorporation of technological advances into new assessment approaches.


In item and exam analysis, the major development has been Item Response Theory (IRT).

In the areas of item analysis, exam analyses, and scoring, the major advance has been the growth of Item Response Theory (IRT).18 IRT is...

  • a sophisticated mathematical model that provides precise estimates of examinee ability and evaluates how well assessments and individual items work without assuming equal difficulty of items.
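
To make the idea concrete, here is a minimal sketch of one common IRT model, the two-parameter logistic (2PL) model. The article does not say which IRT model the boards use, so the choice of 2PL, the function names, and the item parameters below are all assumptions made for illustration.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct answer
    for ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information the item contributes at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical items, written as (discrimination a, difficulty b).
easy_item = (1.0, -1.0)
hard_item = (1.5, 1.0)

# Each item has its own difficulty, so the same examinee has a
# different chance of success on each one.
for theta in (-1.0, 0.0, 1.0):
    print(theta,
          round(p_correct(theta, *easy_item), 3),
          round(p_correct(theta, *hard_item), 3))
```

An item is most informative for examinees whose ability is close to its difficulty, which is the property adaptive testing relies on.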


RESPONSES TO THE FORCES INFLUENCING LICENSURE AND CERTIFICATION


Improved Test Construction, Scoring, and Delivery


 

Adaptive testing.


Adaptive tests give more precise results in less testing time. Using IRT, the examinee’s ability is re-estimated throughout the test, the items offering the most information are administered after each estimate, and a stopping rule (sufficient confidence in the ability estimate) decides when the exam ends for that examinee.

Although there are several approaches to adaptive testing (e.g., pure or multistage), they all offer the advantage of shorter testing times with higher levels of precision than standard, fixed-length tests. Adaptive testing typically employs IRT methodology.19 Estimates of examinee ability are made throughout the testing process. Each time an estimate is made, items that provide maximum information are administered, and stopping rules (i.e., sufficient confidence in the accuracy of an ability estimate) are used to determine when the exam ends for a particular examinee.
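
The estimate, select, stop loop described above can be sketched as follows. This is an illustration under stated assumptions, not any board's actual procedure: it assumes a 2PL model, an expected a posteriori (EAP) ability estimate on a fixed grid with a standard normal prior, a stopping rule based on the posterior standard deviation, and a made-up item bank and simulated examinee.

```python
import math
import random

def p_correct(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

GRID = [g / 10.0 for g in range(-40, 41)]  # ability grid from -4 to 4

def estimate_ability(responses):
    """EAP estimate and posterior SD under a standard normal prior.
    responses is a list of ((a, b), answered_correctly) pairs."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for (a, b), correct in responses:
            p = p_correct(theta, a, b)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_adaptive_test(bank, answer, se_target=0.4, max_items=20):
    """Administer items until the stopping rule is met."""
    remaining = list(bank)
    responses = []
    theta, se = estimate_ability(responses)
    while remaining and len(responses) < max_items and se > se_target:
        # Pick the unused item with maximum information at the current estimate.
        item = max(remaining, key=lambda ab: information(theta, *ab))
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta, se = estimate_ability(responses)
    return theta, se, len(responses)

# Hypothetical bank: discrimination 1.2, difficulties from -3 to 3.
bank = [(1.2, d / 4.0) for d in range(-12, 13)]
rng = random.Random(0)
true_theta = 1.0  # simulated examinee, for illustration only
theta, se, n_used = run_adaptive_test(
    bank, lambda item: rng.random() < p_correct(true_theta, *item))
print(round(theta, 2), round(se, 2), n_used)
```

Lowering se_target makes the test longer and more precise, while max_items caps the length for examinees whose estimate is slow to settle.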



Test scoring.


Technology now captures most of what examinees do during an exam, which makes more sophisticated scoring and analysis possible: time spent on items, how often answers are changed, which items are marked for review, and the order in which examinees finish.

Technology now enables the capture of most examinee actions during a computer exam, and it, in turn, has allowed for more sophisticated scoring and analyses. This includes a window into the amount of time examinees spend on items and their test-taking behaviors (e.g., frequency of changing answers, items marked for review, and the order of examination completion).


The measurement methods used to score these complex tasks (rule-based logic, regression-based algorithms, and Bayesian algorithms) have also become more sophisticated.

Measurement solutions to scoring these complex tasks, such as rule-based logic and regression-based and Bayesian algorithms, have become more sophisticated and effective in medical licensure.20,21




Test design.


Assessment engineering can now be applied to test building, and its conceptual design framework supports a range of methods. It ties the way evidence is gathered and interpreted more directly to the intended purpose of the assessment.

Implementation of assessment engineering approaches to building tests is now possible, and the conceptual design framework can support various methods.22 It helps ensure that the way in which evidence is gathered and interpreted is more directly linked to the intended purpose of the assessment.


As in other fields, better computing has helped: delivering exams on computer and capturing data automatically allows more realistic item types.

Like advances in other areas, improvements in computing have opened the door to advances in the authenticity of the assessment. Delivery of exams on the computer and the automatic capture of the data allow for more realistic features and item types.


Finally, the mandate to adopt electronic health records (EHRs) in the United States will make it possible to assess performance across the whole of a physician’s practice, and well-developed decision-support tools such as Watson may change assessment further.

Finally, the mandate for implementing electronic health records in the United States will enhance the ability to assess practice performance across the breadth of a physician’s practice. Use of well-developed tools that provide decision support, such as Watson and Isabel, might further change the approach to assessment.23,24


Item development and test assembly.


Automated Item Generation (AIG) has made item development more efficient, and Automated Test Assembly (ATA) builds forms to specifications covering content, measurement properties, and security.

In terms of item development, efficiencies have been gained through the Automated Item Generation approach.25 Advances in technology have also enabled the use of sophisticated Automated Test Assembly routines to ensure that tests are built to a specific set of specifications including content, measurement properties, and security considerations.
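
As a rough picture of what building a test to specifications means, the sketch below assembles a form from an item pool under a content blueprint, a measurement criterion, and a security limit. It uses a simple greedy heuristic, which is a stand-in chosen for brevity; operational routines are usually formulated as constrained optimization problems. The Item fields, the exposure limit, and the pool are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    item_id: str
    content_area: str
    information: float    # information at the cut score (assumed precomputed)
    exposure_rate: float  # share of past examinees who saw the item

def assemble_form(pool, blueprint, max_exposure=0.3):
    """Greedy assembly: fill each content quota with the most informative
    items whose exposure rate is within the security limit."""
    form = []
    for area, quota in blueprint.items():
        eligible = [i for i in pool
                    if i.content_area == area and i.exposure_rate <= max_exposure]
        if len(eligible) < quota:
            raise ValueError(f"not enough eligible items for {area}")
        eligible.sort(key=lambda i: i.information, reverse=True)
        form.extend(eligible[:quota])
    return form

pool = [
    Item("C1", "cardiology", 0.42, 0.10),
    Item("C2", "cardiology", 0.55, 0.45),  # over the exposure limit, excluded
    Item("C3", "cardiology", 0.31, 0.05),
    Item("E1", "endocrinology", 0.38, 0.20),
    Item("E2", "endocrinology", 0.47, 0.15),
]
blueprint = {"cardiology": 2, "endocrinology": 1}
form = assemble_form(pool, blueprint)
print([i.item_id for i in form])  # prints ['C1', 'C3', 'E2']
```

The most informative cardiology item (C2) is left out because it has been seen too often, which shows how content, measurement, and security requirements can pull against each other.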



Data forensics.


Smaller recording devices and more powerful communication tools have opened new ways to cheat. Because the stakes of licensure and certification are high, the measurement community has responded with data forensics, which detects anomalies in performance and test-taking behavior (such as very rapid responses) that call for further investigation.

Small recording devices and more powerful, subtle ways of communicating have enabled unprofessional behaviors such as cheating or illegally obtaining exam content. The stakes associated with licensure and certification have grown such that jobs and financial incentives are closely linked to successfully maintaining credentials. The science of data forensics has been the major response of the measurement community. Data forensics employs statistical procedures to identify anomalies in exam performance and behavior (e.g., rapid response to questions) that require further investigation.
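
One of the simplest screens of this kind looks at the rapid responses mentioned above. The sketch below flags examinees who answer a large share of items faster than a threshold. The 10-second threshold, the 50% cutoff, and the sample data are assumptions for illustration; a flag of this kind is only a prompt for further investigation, not evidence of misconduct.

```python
def flag_rapid_responders(response_times, threshold_seconds=10.0, max_share=0.5):
    """Return {examinee_id: share_of_rapid_answers} for examinees whose share
    of answers faster than threshold_seconds exceeds max_share.
    response_times maps examinee id -> list of per-item times in seconds."""
    flagged = {}
    for examinee, times in response_times.items():
        if not times:
            continue  # nothing recorded for this examinee
        rapid = sum(1 for t in times if t < threshold_seconds)
        share = rapid / len(times)
        if share > max_share:
            flagged[examinee] = share
    return flagged

# Hypothetical per-item response times in seconds.
times = {
    "A": [62.0, 48.0, 75.0, 90.0],
    "B": [5.0, 4.0, 6.0, 70.0],
}
flagged = flag_rapid_responders(times)
print(flagged)  # prints {'B': 0.75}
```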



New Methods of Assessment



Used properly, the oral examination can supplement the information obtained from MCQs. The outcomes movement, however, created a need to assess clinical skills, communication, and professionalism as well. For initial licensure and certification, regulatory bodies responded by placing more emphasis on simulation.

When deployed properly, the oral examination can supplement the information derived from MCQs. With the outcomes movement, however, there followed the need to assess a wider array of competences including clinical skills, communication with patients, and professionalism.26 For initial licensure and certification, the regulatory bodies responded by increasing emphasis on simulation in their examination processes.27



Simulation.


Two types are in use: standardized patients (SPs) and computer/mannequin-based simulation.

Two types of simulation are now included in licensure and certification: standardized patients28,29 and computer/mannequin-based simulations.30


One of the first uses of SPs in licensure was by the Medical Council of Canada (MCC) in 1993.

One of the first standardized-patient-based assessments used as part of licensure was offered by the Medical Council of Canada in 1993.


A good example of computer/mannequin-based simulation is the ABIM’s interventional cardiology assessment.

Computer/mannequin-based simulation includes a wide variety of different methods, but one example is the device used as part of the American Board of Internal Medicine’s (ABIM) maintenance of competence for interventional cardiology.31



Workplace-based assessment.


However sophisticated it becomes, simulation cannot replace encounters with real patients.

As sophisticated as it has become, simulation is still not a substitute for real patient encounters.


Licensing and certifying bodies have addressed the need for workplace assessment in two different ways.

The licensing and certifying authorities have taken the need for assessment in the workplace in two different directions.

    • For trainees, they have delegated responsibility for assessment to training program directors and developed and/or researched tools to support them in their efforts. Specifically, chart-stimulated recall (called case-based discussion in the United Kingdom) was developed by the American Board of Emergency Medicine and the mini-CEX was developed by the ABIM.32,33

    • For practicing doctors, the licensing and certifying boards have begun to explore the use of practice performance as a basis for assessment. ABIM has developed a series of Performance Improvement Modules. These web-based tools lead doctors through a review of their patient data, often for a specific condition.



Increased Validation Research


The stakes attached to MOC are high, and more hospitals now require the credential before granting hospital privileges. As the stakes rise, the value of the programs is debated and how well they reflect real-world practice is questioned, which has prompted research aimed at validating the decisions these programs make.

The stakes associated with maintenance of certification are high and we have witnessed a significant increase in the number of hospitals requiring the credential before granting hospital privileges.38 As the stakes grow, the value of the programs is debated and how well they reflect the real world of practice is challenged.39 As such, significant research has been targeted at validating the decisions made in these programs.



The understanding of validity has evolved over time.

Our understanding of the concept has evolved significantly over the years, and

  • Unification under construct validity: in 1995 Messick unified validity under the umbrella of construct validity.41

  • Standardization/generalization/extrapolation/decision: Kane’s more modern validity theory is conceptualized as four structured arguments: standardization, generalization, extrapolation, and decision rules.42

Extrapolation (the relationship between assessment conditions and criterion behaviors) and decision rules (the consequences of assessment decisions for individuals) matter most in practice.

In particular, practical implications flow from extrapolation, which involves the relationship between assessment conditions and criterion behaviors, and decision rules, which relate to the consequences of assessment decisions for individuals.


Over the past 25 years, validity research has extended these practical implications in four directions.

Over the last 25 years validity research has expanded to address these practical implications in four ways:

  • (a) Stakeholder acceptability: the acceptability of the assessment or program to stakeholders,

  • (b) Encouragement to learn: the extent to which stakeholders are encouraged to learn and improve,

  • (c) Relationship to external measures: the extent to which there is a relationship between performance in the programs and external measures such as other assessments or practice characteristics, and

  • (d) Relationship to performance in practice: the extent to which there is a relationship between performance as measured by the assessment and performance in practice.43


Examples of research

  • Examples of these types of research include studies showing (a) public acceptability in that the participation rates in the programs are high and the public expects that physicians will be assessed over their careers.38,44,45

  • Examples of studies (c) examining the relationship between assessments and external measures include one study showing a positive relationship between those certified in Internal Medicine and annual income as well as career satisfaction and another study showing that higher MOC scores are correlated with greater electronic resource use.47,48

  • Examples include a study showing that (d) certified physicians do better on mortality for acute myocardial infarction or congestive heart failure and those with higher MOC scores also do better on processes of care for diabetes and mammography screening.49,50




CONCLUSION



6. Choudhry NK, Fletcher RH, Soumerai SB. Systematic review: The relationship between clinical experience and quality of health care. Annals of Internal Medicine 2005;142:260–73.






 2013;25 Suppl 1:S62-7. doi: 10.1080/10401334.2013.842909.


Abstract

Over the past 25 years, three major forces have had a significant influence on licensure and certification: the shift in focus from educational process to educational outcomes, the increasing recognition of the need for learning and assessment throughout a physician's career, and the changes in technology and psychometrics that have opened new vistas for assessment. These forces have led to significant changes in assessment for licensure and certification. To respond to these forces, licensure and certification programs have improved the ways in which their examinations are constructed, scored, and delivered. In particular, we note the introduction of adaptive testing; automated item creation, scoring, and test assembly; assessment engineering; and data forensics. Licensure and certification programs have also expanded their repertoire of assessments with the rapid development and adoption of simulation and workplace-based assessment. Finally, they have invested in research intended to validate their programs in four ways: (a) the acceptability of the program to stakeholders, (b) the extent to which stakeholders are encouraged to learn and improve, (c) the extent to which there is a relationship between performance in the programs and external measures, and (d) the extent to which there is a relationship between performance as measured by the assessment and performance in practice. Over the past 25 years, changes in licensure and certification have been driven by the educational outcomes movement, the need for lifelong learning, and advances in technology and psychometrics. Over the next 25 years, we expect these forces to continue to exert pressure for change which will lead to additional improvement and expansion in examination processes, methods of assessment, and validation research.
