Medical school benchmarking – From tools to programmes (Medical Teacher, 2015)

TIM J. WILKINSON1, JUDITH N. HUDSON2, GEOFFREY J. MCCOLL3, WENDY C. Y. HU4, BRIAN C. JOLLY2 & LAMBERT W. T. SCHUWIRTH5,6

1University of Otago, New Zealand, 2University of Newcastle, Australia, 3University of Melbourne, Australia, 4University of Western Sydney, Australia, 5Flinders University, Australia, 6Maastricht University, The Netherlands







BACKGROUND:

Benchmarking among medical schools is necessary, but it can also lead to unintended consequences.

Benchmarking among medical schools is essential, but may result in unwanted effects.


AIM:

To apply a conceptual framework to medical school benchmarking.

To apply a conceptual framework to selected benchmarking activities of medical schools.


METHODS:

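The authors compared the effect of assessment on learning with the effect of benchmarking on medical school education, developed a framework for evaluating benchmarking, and applied it to the major benchmarking activities in Australia and New Zealand.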

We present an analogy between the effects of assessment on student learning and the effects of benchmarking on medical school educational activities. A framework by which benchmarking can be evaluated was developed and applied to key current benchmarking activities in Australia and New Zealand.


RESULTS:

Five key questions were derived.

The analogy generated a conceptual framework that tested five questions to be considered in relation to benchmarking: what is the purpose? what are the attributes of value? what are the best tools to assess the attributes of value? what happens to the results? and, what is the likely "institutional impact" of the results? If the activities were compared against a blueprint of desirable medical graduate outcomes, notable omissions would emerge.


CONCLUSION:

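Medical schools should pursue quality improvement by benchmarking a range of educational activities, and should be able to show other stakeholders that they meet the relevant standards. Benchmarking has positive aspects, but if it relies on only a small number of assessment tools it can also promote unforeseen negative effects.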

Medical schools should benchmark their performance on a range of educational activities to ensure quality improvement and to assure stakeholders that standards are being met. Although benchmarking potentially has positive benefits, it could also result in perverse incentives with unforeseen and detrimental effects on learning if it is undertaken using only a few selected assessment tools.






This has been partially captured in Goodhart’s law, most often stated as: "When a measure becomes a target, it ceases to be a good measure" (Goodhart 1981).



Benchmarking is defined (Meade 1998) as 

the formal and structured process of searching for those practices which lead to excellent performance, the observation and exchange of information about them, their adaptation to meet the needs of one’s own organisation, and the implementation of the amended practice.



Stage 1: Conceptual analogy


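Before carrying out any assessment, its purpose should be made clear. It may be for a simple decision (pass/fail), but it may also be for seeking information, that is, identifying strengths and weaknesses in order to improve.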

Before any assessments are conducted, however, there should be a clear statement of purpose of what the process aims to achieve. The purpose can be purely decision-oriented or alternatively, it can be used informatively: to optimise the information about strengths and weaknesses and how to improve.


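If the purpose is unclear, both the data collected and their interpretation are likely to go wrong. For example, if the purpose of the assessment is to find and improve areas of weakness, students are more likely to reveal their weaknesses. If the assessment is simply for a pass/fail decision, students will try to hide them.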

If these purposes are not clear, then data collection and their interpretation will logically become distorted. For example, if the purpose is to help identify areas for improvement, then a student will be more likely to reveal their weaknesses and therefore guide how their performance can be improved. If the purpose is merely to make a pass/fail decision, then students will prefer to conceal their weaknesses. As the case studies illustrate, collecting data for one purpose and using it for another will only create tensions, distortions and difficulties in data interpretation.


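Validity is not an easy concept, but what the theories to date have in common is that assessors must be clear about what they want to assess, which student attributes they value, and how the assessment will influence learning. The problem arises when the attributes that assessors value are not easily measurable. In that case it is tempting to measure only what is easy to measure, and so miss what really ought to be assessed.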

Validity is not an easy concept and there have been numerous theories concerning how best to establish validity of a measure (Kane 2001). Current theories concur that it is necessary that the assessors have a clear focus on what they want to assess, which student attributes they value and want to capture, and how the assessment influences learning behaviours. Problems arise when student attributes that are valued are not so easily measured. It can be tempting to measure only the easily measurable attributes (because it is easier, more convenient and a feasible starting point) and ignore the important ones.


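A common misconception is to equate "subjectivity" with "low reliability". Assessment necessarily involves expert judgement, because the expert assigns a global value to what is being assessed, so subjectivity is inherent in assessment. Such judgements can still be reliable. Reliability is a matter of sampling, and with good sampling the results can be both reliable and valid.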

It is a common misconception to equate subjectivity with unreliability. The process of assessment requires expert judgement, which is intrinsically subjective as it requires assessors to assign a global value. Such judgements may still be reliable, as reliability is a matter of sampling and many well-sampled judgements will produce reliable – and valid – results (van der Vleuten et al. 1991).
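The claim that reliability follows from sampling can be illustrated with the Spearman-Brown formula, a standard psychometric result that is not taken from the paper itself. The sketch below is only an illustration under stated assumptions: every judgement is assumed to have the same single-judgement reliability and to behave as a parallel measurement, and the function name `pooled_reliability` and the example value 0.3 are hypothetical.

```python
def pooled_reliability(single_reliability: float, n_judgements: int) -> float:
    """Spearman-Brown prediction of the reliability of n pooled judgements.

    Assumes every judgement has the same reliability and that the
    judgements are parallel measurements (an idealisation).
    """
    if not 0.0 <= single_reliability <= 1.0:
        raise ValueError("single_reliability must lie in [0, 1]")
    if n_judgements < 1:
        raise ValueError("n_judgements must be at least 1")
    r = single_reliability
    return n_judgements * r / (1.0 + (n_judgements - 1) * r)


# Hypothetical example: one subjective judgement with modest reliability,
# pooled over an increasing number of independent judgements.
for n in (1, 4, 8, 16):
    print(n, round(pooled_reliability(0.3, n), 2))
```

Under these assumptions the predicted reliability rises steadily as more judgements are sampled, which is the sense in which many subjective judgements can still add up to a reliable result.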





Implications for benchmarking

Benchmarking needs to move towards a programmatic approach.

We argue that discussions on benchmarking should move forward in a manner that is similar to the recent changes in assessment focus described above (van der Vleuten & Schuwirth 2005), namely a move from individual testing methods to a programmatic approach.


The purpose of benchmarking must be clear.

Similarly, if the purpose of benchmarking is not clear, the information will become useless for quality improvement and may be manipulated by stakeholders for strategic gain.


Possible purposes include the following.

Organisational responses to a benchmarking process will depend on whether it is undertaken in the context of quality improvement, competition for limited ongoing training opportunities, or to reassure the public and professional accreditation bodies of student attainment of competency standards.



Stage 2: A framework for evaluating benchmarking


(1) What is the purpose? (What is the evaluation for?)

Just as in scientific research: "invalid data, invalid conclusions" and "decide the purpose, before deciding the method".


(2) What are the attributes of value? (What do we want to evaluate?)

A clear description of the expected outcomes of the programmes being benchmarked is needed, acknowledging that these may differ among medical schools.


(3) What are the best tools to assess the attributes of value? (Which tools best measure what we want to evaluate?)

The tools and measures chosen for benchmarking medical schools do not need to be the same as (and ideally would be different from) those used for assessing medical students.


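The ideal approach to evaluation is to decide its purpose and target before choosing the tools. The same principle applies in research and in clinical care. Research in which the research question is derived from the methodology cannot be called good research, and clinical care in which tests are run first and the differential diagnosis is worked out afterwards cannot be called good care. The investigation plan should follow from the differential diagnosis, and the methodology should be chosen to suit the research question. Furthermore, because no single tool can answer all the complex questions about the quality of a curriculum, a variety of tools must be used.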

This is analogous to good research; the methodology should not drive the research question and, similarly, in good clinical care the choice of investigations should not drive the differential diagnosis. Instead, the differential diagnosis drives the diagnostic plan, and the research question the methodology. Also, single tools never give comprehensive answers to the complex question of the quality of a curriculum, so a combination of tools is needed.


(4) What happens to the results? (How will the results be used?)

Even with the best intentions, good results in the wrong hands can have unwanted negative effects.


(5) What is the likely "institutional impact" of the results? (What effect will there be at the level of the organisation?)

The "educational impact" of assessment on student learning behaviour is well known (Newble & Jaeger 1983; Frederiksen 1984; Wilkinson et al. 2007; Cilliers et al. 2010, 2012a, 2012b, 2013).




Stage 3: Applying the framework: some examples


(1) Medical Schools Outcomes Database.

(2) Australian Medical Schools Assessment Collaboration.

(3) Collaborative progress testing.

(4) Sharing OSCE stations (ACCLAiM).

(5) Sharing a large assessment bank for medical education on an international scale (IDEAL).









In summary, benchmarking, like assessment, is a powerful driver of behaviour. We need to ensure that decisions that are made result in behaviours that improve quality, share good practice and foster innovation.






Med Teach. 2015 Feb;37(2):146-52. doi: 10.3109/0142159X.2014.932902. Epub 2014 Jul 3.

PMID: 24989363 [PubMed - in process]

