Rethinking programme evaluation in health professions education: beyond ‘did it work?’ (Med Educ, 2013)

Faizal Haji,1–3 Marie-Paule Morin2,4 & Kathryn Parker1,5






INTRODUCTION


For nearly 40 years, programme evaluation in the health professions has been shaped by the adoption of the ‘Kirkpatrick hierarchy’. The reason this model is so widely used is self-evident: it provides a clear taxonomy for making evaluative judgements, using an uncomplicated structure that places the outcomes of greatest interest at the top of the hierarchy.

For nearly 40 years, programme evaluation in the health professions has been shaped by the widespread adoption of the Kirkpatrick hierarchy.1,2 The reason for the model’s predominance is evident: it provides a clear taxonomy for making evaluative judgements, utilising an uncomplicated structure that places the outcomes of greatest interest at the top of the hierarchy.


Much of the current consternation relates to the limited impact curricular interventions have had (on the primary outcomes of interest). For example, meta-analyses of CME programmes show only very small effects on higher-level outcomes (physician behaviour and patient care). From the realisation that our programmes have minimal effects on their intended outcomes, one of two conclusions follows: either little of what we do makes a difference, or our existing evaluation models are inadequate to capture the effects we are interested in.

Much of the current consternation relates to the limited impact curricular interventions have had on primary outcomes of interest.7 For instance, meta-analyses of continuing medical education programmes repeatedly show small effects on higher-level outcomes, such as physician behaviours and patient care.8,9 The realisation that our programmes have demonstrated minimal effects on our intended outcomes leads to one of two conclusions: either little of what we do makes a difference, or our existing evaluation models are inadequate to capture the effects we are interested in.


In the wake of the BEME movement, the purpose of programme evaluation has been defined as ‘to place value on an activity’, or to demonstrate its ‘merit or worth’. Our ability to make such judgements rests on evaluating the effectiveness of our programmes, where what we call ‘effective’ is ‘inescapably linked to the outcomes of our educational interventions’. In other words, we consider a ‘successful programme’ to be one that has achieved its predetermined outcomes.

In the wake of the best evidence medical education (BEME) movement that has emerged in recent years,2 we have defined the purpose of programme evaluation to be to place value on an activity,10 or to demonstrate its ‘merit or worth’.11 Our ability to make such judgements rests principally on the evaluation of the effectiveness of our programmes, in which what we define as ‘effective’ is ‘inescapably linked to the outcomes of our educational interventions’.6 In other words, we consider a successful educational programme as one that has achieved its predetermined outcome(s).


However, treating an exclusive reliance on outcomes-based approaches as indispensable is too narrow a view, and cannot account for the complexities of the health professions education context. It is not surprising, then, that in recent years alternative evaluation models considering context, process and theory have appeared in the health professions education literature.

However, an exclusive reliance on outcomes-based approaches as the sine qua non of evaluation is too narrow in scope and cannot account for the complexities of the health professions education context. It is not surprising, therefore, that in recent years an increasing number of published reports in the health professions education literature have utilised alternative models of evaluation that consider factors such as a programme’s context, process and theory.



Evaluation is not only about judging merit or worth; it is also about generating reliable, valid and useful information for curriculum developers seeking to adapt educational programmes to evolving contexts. Likewise, HPE researchers want to generate knowledge that can ‘inform the efforts of others’.

it is not just about judging merit or worth, but also about generating reliable, valid and useful information for curriculum developers seeking to adapt programmes in the light of evolving contexts, and health professions education researchers seeking to generate knowledge that can ‘inform the efforts of others’.13


We must move beyond the ‘imperative of proof’ toward ‘clarification studies’, which additionally ask how and why an intervention does (or does not) work, and seek to establish what else is happening (beyond what was intended).

we must move beyond the ‘imperative of proof’12 to focus on ‘clarification studies’14 that additionally ask how and why our interventions do (or do not) work, and seek to establish what else is happening when our programmes are implemented.


Interestingly, a similar paradigm shift occurred 50 years ago.

Interestingly, a similar paradigm shift occurred within the discipline of educational programme evaluation nearly 50 years ago.



HISTORICAL ROOTS OF EDUCATIONAL PROGRAMME EVALUATION: A PARALLEL TO THE HEALTH PROFESSIONS



Tyler defined quality in terms of ‘a programme’s effectiveness in achieving its predetermined goals’. The Tylerian paradigm was thus a linear, hierarchical approach comparing planned outcomes against objectives defined in advance.

Although early evaluation efforts date back to the late 1800s, programme evaluation as a discipline developed earliest and most intensely within the field of education. Evaluation scholars often cite Ralph Tyler’s coining of the term ‘educational evaluation’ in the 1930s and 1940s as a landmark event in the development of the modern profession and discipline.15,16 Following his work on the Eight-Year Study, Tyler came to view evaluation as the appraisal of an educational programme’s quality.17 Defining quality in terms of a programme’s effectiveness in achieving its predetermined goals,18 the Tylerian paradigm called for a linear, hierarchical approach that compared planned outcomes with objectives defined a priori.


 

When Russia launched Sputnik, the USA was thrown into a national crisis. Educational programmes were expanded across the board, the Tylerian approach was used to define objectives for the new curricula, and national standardised tests assessed them. As capital flowed into education, evaluating programme effectiveness became a priority, and, influenced by the writings of Campbell and Stanley, large-scale field experiments were used to evaluate how far the newly developed curricula achieved their planned outcomes.

The Russian launch of the Sputnik satellite in 1957 precipitated a national crisis in the USA. Reflecting an effort to compete on a global scale, the National Defense Education Act was passed, leading to the rapid expansion of educational programmes in math, science and foreign languages.19 The Tylerian approach was adopted to define objectives for this new curriculum and national standardised tests were created to better reflect these objectives and curricular content. With the infusion of capital into education came a desire to evaluate the effectiveness of these programmes.19 Influenced by the writings of Campbell and Stanley,20 large-scale field experiments were used to evaluate the newly developed curricula with respect to planned outcomes.


However, many of these studies showed ‘no significant difference’. Even when significant effects were reported, there was little information about the nature of the programme or how it was implemented. To scholars of the late 1960s, the prevailing approach answered neither ‘how can programmes be improved?’ nor questions about programme effectiveness.

Despite best efforts, these hugely expensive and widely attempted experimental studies often demonstrated ‘no significant difference’.15 Even when significant results were reported, little information was provided on the nature of the programme and the manner in which it was implemented. Analogous to the results of ‘grand curricular experiments’21 in medical education that have recently been called into question, by the late 1960s it was apparent to leading evaluation scholars that this approach neither provided insight to decision makers on how to improve programmes, nor adequately addressed questions about a programme’s ‘effectiveness’.19


 

Does this sound familiar?

The reader may now be experiencing an uneasy sense of familiarity.



 

Evaluations focused on demonstrating ‘effectiveness’ through an outcome-oriented approach lead us into the same pattern as in the past; as a result, we fail to generate meaningful understanding of what makes programmes succeed or fail.

Focusing evaluations on demonstrating the ‘effectiveness’ of our interventions through an outcome-oriented approach has caused us to fall into the same pattern as evaluation scholars of the past; as a result, we too have failed to generate meaningful understanding of the factors that lead to the success or failure of a programme and the interactions of these factors within the complex, multivariate system that epitomises health professions education.




EVOLUTION OF CONTEMPORARY PARADIGMS IN PROGRAMME EVALUATION: UNDERSTANDING HOW AND WHY PROGRAMMES WORK


Disheartened by the limitations of the traditional evaluation approaches developed in the Tylerian age, scholars set off a series of paradigm shifts beginning in the late 1960s.

As a result of the shortcomings of traditional evaluation approaches that developed in the Tylerian age, many evaluation scholars became disheartened with the status quo. Consequently, beginning in the late 1960s a number of paradigm shifts occurred in the theory and practice of educational evaluation.


 

The first paradigm shift came from scholars who saw the purpose of evaluation as providing improvement-oriented, user-centred information to stakeholders for decision making.

One of the first paradigm ‘shifts’ to occur during this critical period was catalysed by a group of evaluation theorists who believed that the primary purpose of evaluation was to provide improvement-oriented, user-centred information to stakeholders for the purposes of decision making. A number of evaluation models, including

  • Daniel Stufflebeam’s CIPP (context, input, process, product) model,22

  • Robert Stake’s responsive evaluation,23,24 and

  • Michael Patton’s utilisation-focused evaluation (UFE),25

developed within this paradigm.

 

 

For these evaluators, the educational context in which a programme operates plays an important role in the evaluation questions, the evaluation methods, and the interpretation of findings. In addition, by focusing evaluation questions on stakeholders’ needs, these models began to consider not only educational outcomes but also the educational processes involved in programme implementation. In the CIPP model, for example, one asks ‘are we doing it correctly?’ and ‘did we do what we said we would?’

For these evaluators, the educational context in which a programme operates plays a significant role in the articulation of evaluation questions, evaluation methods and interpretation of evaluation findings. In addition, by focusing evaluation questions on the needs of programme stakeholders, these models bring to the fore the importance of considering the educational processes involved in programme implementation, in addition to measuring outcomes.24 For instance, in Stufflebeam’s CIPP model,22 the evaluation of educational processes (the first ‘P’) involves asking ’are we doing it correctly?’ (or, to put it another way, ‘Did we do what we said we would?’) to determine whether programmes are delivered in the manner in which their designers intended.


 

An example: Steinert et al. used the CIPP model to evaluate a faculty development programme on teaching and assessing professionalism, evaluating its context and process.

Additionally, Steinert et al.26 used the CIPP model to evaluate a faculty development programme for teaching and assessing professionalism. Their evaluation of programme context and process, which consisted of surveys and informal interviews with stakeholders, revealed that

  • the identification of core concepts,

  • the provision of a structured framework for teaching and evaluating professionalism, and

  • the analysis of case vignettes in small groups

...were particularly useful for participants.26

 

These factors appear to have contributed to the positive outcomes.

It would appear that these factors contributed to the positive outcomes observed by the authors, which included an increase in medical education activities for trainees directed at professionalism, as well as the incorporation of these concepts into the clinical practice and teaching of faculty participants.26



Reeves and Freeth used a similar model, the 3P (presage, process, product) model, to evaluate an in-service IPE programme. Context- and process-oriented approaches of this kind can show how a programme operated to bring about (or fail to bring about) its intended outcomes.

Utilising a similar model known as the 3P (presage, process, product) model, Reeves and Freeth27 demonstrated the vital importance of contextual factors in relation to a lack of continuity of leadership and engagement of senior management in the failed long-term viability of an in-service interprofessional education programme for community mental health teams. In this way, such context- and process-oriented approaches can provide insight into how programmes operate to bring about (or fail to bring about) their intended outcomes.


 

However, evaluations focused solely on stakeholders’ decision-making needs, while providing important information about programme processes, do not explain why a programme works. So when intended results are not achieved, they offer little explanation. This problem was addressed by the theory-based evaluation paradigm that emerged in the mid-1970s and 1980s. Because its purpose is to understand why programmes succeed or fail, it can be described as ‘knowledge construction’, or as opening the ‘black box’ to reveal mechanisms.

One of the limitations of focusing evaluation solely on the decision-making needs of stakeholders is that although such evaluations may provide valuable information regarding programme processes, they fail to inform us about why programmes work. Thus, if programme processes fail to produce their intended effects, there is little explanation for this result. Fortunately, this issue is addressed by the theory-based evaluation paradigm, which emerged during the mid-1970s and 1980s through the work of Huey-Tsyh Chen, Stewart Donaldson and others.30 The main purpose of theory-based evaluation is to understand why a programme is succeeding or failing, for the purposes of programme improvement and ‘knowledge construction’.31 In other words, theory-based evaluations seek to unpack the ‘black box’ by identifying mechanisms that mediate between programme processes and intended outcomes30 in the hope that these findings can be generalised to similar programmes in similar situations.


An analogy using Newton’s apple:

To illustrate this type of evaluation, consider the law of universal gravitation. In Newton’s classic observation of an apple falling from a tree, the final position of the apple can be viewed as an ‘outcome’; the characteristics of the apple, the tree and the earth as the ‘context’, and the path of the apple as it falls through the air as the ‘process’. However, understanding each of these facets does not adequately inform us about why the apple falls. Only by articulating the existence of an attractive force (gravity) and its action upon the apple can we adequately explain the mechanism by which the apple gets from the tree to the ground. Similarly, proponents of theory-based evaluation argue that to generate an understanding of why programmes operate in the way they do, evaluators must articulate (and subsequently evaluate) a ‘plausible and defensible’32 conceptual framework (i.e. theory) that explains the mechanism by which programme processes lead to outcomes.
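In symbols (an illustrative aside, not part of the original paper), the ‘planned theory’ of this analogy is Newton’s law of universal gravitation:

    F = G \frac{m_1 m_2}{r^2}

where F is the attractive force (the mechanism), m_1 and m_2 are the masses of the apple and the earth (the context), r is the distance between them, and G is the gravitational constant.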



The sequence of a theory-based evaluation:

Thus, a theory-based evaluation would

  • Articulate the conceptual framework
    begin with an articulation of the conceptual framework that underlies programme design,

    • based on the designers’ (and evaluators’) understanding of why the programme will work
      based on programme designers’ and evaluators’ understandings of why the programme will work.

    • the theory thus serves as a prediction of what will happen
      In this way, the theory serves as a prediction and answers the question of what will happen.

  • Based on this prediction, decide the evaluation variables and designs, and implement the programme
    Based on this prediction, pertinent evaluation questions, variables and designs are identified and an implementation plan is generated.

  • First check whether the programme was implemented consistently with the articulated theory; if not, a ‘failure of implementation’ has occurred
    The evaluative phase first considers whether programme implementation is consistent with the articulated theory because when it is not, a ‘failure of implementation’ can occur.30

  • Analyse the causal linkages (between theory and intended outcomes) using randomised experiments and statistical techniques; a minimal sketch follows this list
    Subsequently, through the simultaneous use of randomised experiments and advanced statistical techniques (such as structural equation modelling),16 causal linkages between programme theory and intended outcomes are evaluated.

  • When implementation goes as planned and the intended results are achieved, the evaluator can conclude that the stated theory explains the mechanism linking processes to outcomes
    Thus, when the implementation plan is carried out correctly and the desired results are achieved, the evaluator can conclude that the stated theory accounts for the mechanism linking the programme processes to the observed outcomes.
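The quote stops at naming these techniques. As a purely illustrative sketch (not from the paper) of what probing a causal linkage between a programme process, a theorised mechanism and an outcome can look like in its simplest regression form, consider the following Python example; the variables and synthetic data are invented, and a real evaluation would use the randomised designs or structural equation modelling the authors mention.

```python
# Hypothetical illustration: a minimal mediation-style check of whether a
# theorised mechanism can account for the link between a programme process
# and an outcome. All data are synthetic; variable names are invented.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 200

# Synthetic data: a programme process (e.g. hours of small-group teaching),
# a theorised mechanism (e.g. learner self-efficacy), and an outcome
# (e.g. an assessment score). Here the mechanism carries most of the effect.
process = rng.normal(size=n)
mechanism = 0.6 * process + rng.normal(scale=0.8, size=n)
outcome = 0.5 * mechanism + 0.1 * process + rng.normal(scale=0.8, size=n)

def ols(y, X):
    """Ordinary least squares with an intercept."""
    return sm.OLS(y, sm.add_constant(X)).fit()

total = ols(outcome, process)                                  # total effect
a_path = ols(mechanism, process)                               # process -> mechanism
b_path = ols(outcome, np.column_stack([process, mechanism]))   # direct + mediated

print("total effect of process:             ", round(total.params[1], 2))
print("process -> mechanism (a path):       ", round(a_path.params[1], 2))
print("direct effect, mechanism held fixed: ", round(b_path.params[1], 2))
# If the direct effect shrinks toward zero once the mechanism is included,
# the data are consistent with the theorised mechanism doing the mediating.
```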


In this paradigm, the choice of an appropriate theory depends on what the programme hopes to achieve. For a programme whose objectives are the learning of new staff and behaviour change, theories of learning and behaviour change are the relevant ones. As Hodges and Kuper argue, three classifications of programme theory are possible.

In this paradigm, the selection of an appropriate theory for a given programme is largely dependent on what the programme hopes to achieve. Suppose, for instance, that a programme’s objectives are to facilitate the learning of new staff and to promote behaviour change to ensure best practices are followed. Theories concerning both learning and behaviour change are then likely to be mechanistically relevant to this programme. As Hodges and Kuper argue,33 three classifications of programme theory are applicable in the health professions context:

  • bioscience theories (e.g. theories related to motor learning and cognitive load),

  • learning theories (e.g. situated learning, adult learning theories and socio-cognitive theory), and

  • socio-cultural theories (e.g. critical and politico-economic theories).


The understanding gained from such evaluations is built on existing theories and can extend our understanding of how students learn and of what motivates behaviour change within complex systems. The applicability of this approach to HPE is therefore clear, and many scholars have already called for it.

The understanding garnered from evaluations that are built on these theories can further our understanding of how students learn and what motivates behavioural change within complex systems. Thus, the applicability of this approach to health professions education is clear and, not surprisingly, many scholars have already called for an increase in theory-based approaches.1,34,35




Applying Chen’s model of programme theory to the evaluation of a medical education fellowship programme, Parker et al. showed that programme theory can unearth previously undefined tensions among programme elements.

Interestingly, in an attempt to operationalise Chen’s model of programme theory32 to the evaluation of a medical education fellowship programme, Parker et al.34 demonstrated that the consideration of programme theory may unearth previously undefined tensions within programme elements. In a creative approach to address this tension, the authors engaged a secondary process known as ‘polarity management’,34 drawn from the organisational development literature, to deepen their understanding of this tension and how it related to the core strengths of the programme. In doing so, they were able to unravel another layer of the programme’s theory and thus a better understanding of how not only intended, but also unintended and emergent outcomes of the programme came to be.


 

 

EMBRACING COMPLEXITY: EVALUATING EMERGENCE TO ESTABLISH WHAT (ELSE) HAPPENED


The paradigms discussed so far articulate programme theory, processes and outcomes in advance. On the assumption that the same conditions produce the same results, they can generate ‘simplified’, ‘generalisable’ truths. But an educational intervention is not a singular entity; it consists of myriad dynamic components interacting unpredictably within ever-changing contexts. Unless we consider ‘what is actually happening’, complexity cannot be properly accounted for; that is, unless we ask ‘what (else) happened?’, unintended processes and outcomes go uncaptured.

The paradigms we have discussed thus far rely heavily on articulating programme theory, processes and outcomes in advance of implementation. Based on the assumption that events occurring under the same conditions will produce similar results, this approach attempts to generate simplified, generalisable ‘truths’ that can be broadly applied to curriculum-level interventions.12 Yet educational interventions are not singular entities;36 they consist of a myriad of dynamic components that interact in complex, non-linear ways, influenced by ever-changing contexts, in which unpredictability is the rule. Thus, the lack of consideration of what is actually happening in the moment (as opposed to what we predict will happen based on our goals) limits our capacity to account for complexity. In other words, as these paradigms fail to ask ’what (else) happened?’, they cannot capture the unintended processes and outcomes that emerge as programmes are operationalised. We need only consider the impacts of the hidden curriculum in undergraduate medical education to understand how a failure to capture unintended effects can lead to evaluations that may well miss the mark.



The need to capture emergence was recognised from early on. Michael Scriven presented ‘goal-free evaluation’ in the early 1970s; in his view, the primary purpose of evaluation is to place value on a programme. However, he also argued that evaluators must consider a programme’s actual effects, whether intended or not.

Fortunately, the need to capture ‘emergence’ was recognised by evaluation theorists early in the evolution we have been tracing. Among the first to do so was Michael Scriven, who presented his formulation of ‘goal-free evaluation’ in the early 1970s.16 Scriven’s perspective is that the primary purpose of evaluation is to place value on a programme, or to judge its merit or worth.37 However, he argues that to adequately do so, evaluators must consider the actual effects of a programme, whether they were intended or not.37


This highlights the importance of considering both planned and unplanned outcomes when judging a programme’s effectiveness. The importance of this concept should be obvious to health care professionals: the analogy would be judging a treatment’s effectiveness without ever considering its potential side effects.

it highlights the importance of considering not only planned outcomes, but unplanned (i.e. emergent) ones as well when making judgements regarding a programme’s effectiveness. The importance of this concept should be readily apparent to health care professionals, as the analogous circumstance of judging the effectiveness of a proposed treatment without considering its potential side-effects would obviously be inadequate.


Emergence also rose to prominence in the UFE paradigm, through the concept of ‘developmental evaluation’. As its name suggests, this approach arose to serve the needs of programme designers, who were interested not in making judgements about their programmes but in understanding the implications of what they were developing.

Emergence has also gained prominence in the evolution of the UFE paradigm. This stems principally from Michael Patton’s work in the mid-1990s on the concept of ‘developmental evaluation’.38 As its name suggests, this approach evolved from a desire to better serve the needs of programme designers, who were interested not in making judgements about their programmes, but in understanding the implications of what they were developing. These designers...

‘never expect to arrive at a steady state of programming because they’re constantly tinkering as participants, conditions, learnings, and contexts change…[and] no sooner do they articulate and clarify some aspect of the process than that very awareness becomes an intervention and acts to change what they do’.38



Recognising the limits of traditional UFE in meeting programme designers’ needs, Patton created an approach with no temporal distinction between the development and evaluation processes. The evaluator becomes a core part of the development team, helping designers monitor processes and outcomes as they emerge from a rapidly changing environment. Because the aim is to provide whatever information the developmental process needs, the methodological approach depends on what best fits the situation (qualitative, quantitative, or mixed).

Recognising the limitations of traditional UFE in addressing the needs of programme designers, Patton articulates an approach in which there is no temporal distinction between the development and evaluation process. The evaluator becomes an integral part of the development team,38 helping designers to monitor both processes and outcomes as they emerge from the evolving, rapidly changing environment; evaluation literally occurs in the moment. As the focus is on providing whatever information is needed for the development process, the choice of methodological approach is based on what is most appropriate for the situation. This can include

  • quantitative methods (such as structured, numerically anchored stakeholder surveys),

  • qualitative methods (e.g. interviews, focus groups and structured stakeholder conversations such as those defined by the ORID [objective, reflective, interpretive, decisional] model39)

  • or, in most cases, mixed methods which combine these approaches.


Patton’s formulation answers not only ‘did we do what we planned to do?’ but also ‘how (else) is the programme operating?’: how did participants, designers, other stakeholders and the context itself adapt to the programme, and what additional processes emerged as a result?

Patton’s formulation38 provides an avenue by which to answer not only the question of whether we did what we said we would, but also to establish how (else) the programme is operating. That is, how did participants, programme designers, other stakeholders and the context itself adapt to the programme, and what additional processes emerged as a result?


Capturing these emergent processes matters:

It would be important to capture these ‘emergent processes’ in the evaluation of the programme so that future programmes could be planned to include both new and experienced staff.


 

 

As theory-based evaluation evolved, it too came to embrace ‘emergence’. In the late 1990s, Ray Pawson argued for an approach called ‘realist evaluation’, which acknowledges the ‘messiness of real-world interventions’. In line with the saying that ‘one can never step twice into the same river’, this view holds that a theory determined in advance cannot, by itself, explain all of a programme’s emergent processes.

The evolution of theory-based evaluation has also embraced the notion of emergence in recent years. In the late 1990s, Ray Pawson and colleagues articulated an approach known as ‘realist evaluation’, which acknowledges and accommodates the ‘messiness of real-world interventions’.36 In line with the notion that one can never step twice into the same river (i.e. the context in which our interventions operate is changed merely by our interventions operating within them), the realist view contends that relying solely on programme theories articulated a priori is inadequate to explain the emergent processes and outcomes that result from programme implementation.

 

 

The analogy again:

From our example of the universal law of gravity, consider the instance of a leaf falling from the tree, rather than an apple. It is conceivable that instead of falling directly to the ground, a leaf might land 20 feet away (an emergent outcome). Upon observing the leaf, we recognise that although it eventually falls to the ground, it takes a tortuous route, floating through the air before reaching its final resting place (an emergent process). Our planned theory (gravity) cannot explain how the leaf came to rest so far away from the tree.


Relying only on planned theory is inadequate, because when intended outcomes are not achieved the evaluator has nothing to fall back on. Even when intended outcomes are achieved, multiple mechanisms may explain them. In realist evaluation, programme theory is an ‘iterative, explanation-building process’: observing processes and outcomes in real time, the evaluator derives and constructs one or more theories. This formulation is continuously compared against emerging data and revised. Looking only at emergent theory, however, is not necessarily most efficient; planned and emergent theory should be considered together.

As such, relying solely on planned theory is inadequate because it leaves the evaluator with no recourse when planned outcomes are not achieved (in the absence of an ‘implementation failure’ to explain this finding). Even when intended outcomes are observed, there are often multiple mechanisms that can explain how this came to be. In realist evaluation, articulation of a programme theory is thus viewed as an ‘iterative, explanation-building process’.36 As the programme is implemented, the evaluator observes processes and outcomes as they occur, identifying and constructing one or more theories that might explain the mechanism between them. This formulation is constantly compared against emerging data and iteratively revised to ensure that it explains the findings. As the programme theory is articulated based on what is happening in the moment, we have termed this ‘emergent theory’. However, it is important to note that a realist evaluation that only considers emergent theory may not be the most efficient approach. When programme designers and evaluators have a sense of (some of) the mechanisms at work within a programme, the combined use of planned and emergent theory may provide a better understanding.


The analogy once more:

Returning to our example of the leaf falling from the tree, using this approach we would be able to consider the possibility of other forces, such as wind or air resistance (our emergent theory) acting in concert with gravity (our planned theory) on the leaf to better account for ‘what (else) happened’.



Relatedly, Parker et al. developed a strategy combining the logic model with grounded theory methodology.

In a related approach, Parker et al.1 developed a strategy for evaluating a clinician-scientist training programme using both the logic model and grounded theory methodology to capture planned processes, as well as planned and emergent outcomes.
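As a purely hypothetical sketch of the idea (not Parker et al.’s actual instrument), a logic model can be represented as a structured record of planned elements kept alongside a running log of emergent findings:

```python
# Hypothetical illustration only: a toy representation of a programme
# logic model (inputs -> activities -> outputs -> planned outcomes),
# with emergent outcomes logged separately as they are observed.
# None of these entries come from Parker et al.; they are invented.
logic_model = {
    "inputs": ["protected faculty time", "fellowship funding"],
    "activities": ["mentored research projects", "monthly seminars"],
    "outputs": ["completed projects", "seminar attendance records"],
    "planned_outcomes": ["peer-reviewed publications",
                         "graduates entering clinician-scientist roles"],
}

# Emergent outcomes are appended during delivery, not specified a priori;
# qualitative work (e.g. grounded theory) would be the source of these.
emergent_outcomes: list[str] = []
emergent_outcomes.append("new cross-departmental collaborations")

def summarise(model: dict, emergent: list[str]) -> dict:
    """Contrast what the programme planned with what actually emerged."""
    return {"planned": model["planned_outcomes"], "emergent": emergent}

print(summarise(logic_model, emergent_outcomes))
```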


LESSONS LEARNED: A RECONCEPTUALISATION OF PROGRAMME EVALUATION IN THE HEALTH PROFESSIONS


 

An evaluation strategy that focuses only on planned outcomes is not sufficient. Contemporary evaluation paradigms must be incorporated, because failing to do so leaves us with a linear, inflexible approach that cannot provide the information needed for decision making, knowledge generation, or judging merit and worth.

It is evident that an evaluation strategy that focuses solely on planned outcomes will not be sufficient to meet the needs of programme evaluators in the health professions moving forward; thus, we must continue to incorporate contemporary evaluation paradigms into our evaluation efforts because failure to do so will cause us to promote a linear, inflexible approach to evaluation42 that will not provide the necessary information for decision making, knowledge generation, or judgement of the merit or worth of our interventions.



These elements and the relationships among them are summarised in Fig. 1.

We have summarised these elements, and the relationships among them, in Fig. 1. In brief:

 


1 To address the question of how a programme works, evaluators must capture not only programme outcomes, but also programme processes.


2 An understanding of why a programme works requires an evaluation of programme theory to elucidate the mechanisms that can explain how programme processes lead to programme outcomes.


3 To truly understand what (else) happened, it is not sufficient to rely solely on the articulation of planned theory, processes and outcomes. The evaluator must also capture emergence, in terms of both processes and outcomes, and generate emergent theory to explain what is occurring in articulo (in the moment).


4 It is also important to acknowledge that all programmes operate within an educational context, which must be considered for any evaluation to be complete. Furthermore, simply by virtue of programme delivery, this context will change; capturing this change is essential for the planning of future programme iterations.



The links between the two kinds of theory (planned and emergent), processes, and outcomes are far from linear.

the links between planned and emergent theory, processes and outcomes are far from linear. A planned process may lead to a planned outcome, but can just as easily result in an emergent (unintended) outcome. Similarly, an emergent process may result in an emergent outcome, or by some previously undefined mechanism, the planned outcome as well.



Choosing an approach to programme evaluation is therefore not a treasure hunt for the ‘perfect model’. It is a ‘reflective exercise’ in which evaluators recognise their own inherent biases and settle on the most appropriate combination of approaches. The choice between models and methods thus becomes ‘both/and’ rather than ‘either/or’.

Thus, the choice of an approach to programme evaluation is not so much a treasure hunt for the ‘perfect model’ as it is a reflective exercise in which the evaluator recognises the inherent biases associated with his or her selection and decides on the most appropriate combination of available approaches. In this way, the choice between models (and methods) that emerge from these paradigms should not be viewed as an ‘either/or’ choice, but rather as a ‘both/and’ selection.


 

Finally, we have deliberately avoided being prescriptive in this discussion.

On a final note, although we have purposely avoided being prescriptive in this discussion, we would be remiss if we did not point out the few fundamental changes we feel need to be made to the way we conduct evaluations in the health professions. Firstly, instead of treating evaluation as a snapshot endeavour that occurs after programme delivery, we must consider programme evaluation as a process in and of itself. In parallel with the notion of ‘programmes of assessment’, which has appeared in the health professions literature of late,43 we must similarly move towards ‘programmes of evaluation’.

 

Evaluation must involve multiple stakeholders, use multiple methods, and span the life of the programme, generating a holistic understanding not only of the programme and its effects but also of how the context in which it operates is changed by the programme’s very existence. This requires letting go of the arbitrary distinction between development and evaluation; the two activities are two sides of the same coin.

In recognition of the maxim that the whole is greater than the sum of its parts, evaluations must involve multiple stakeholders, use multiple methods, and occur throughout the life of the programme (right from conception, through to planning, delivery and revision) to generate a holistic understanding not only of the programme and its effects, but also of how the context in which the programme operates is changed by its presence. To do so effectively, we must relinquish our bonds to the arbitrary distinction between the process of development and that of evaluation; instead, we must see these activities as two sides of the same coin, each of which informs the other in a continuous, iterative process that leads to incremental programme change.




CONCLUSIONS


Traditional outcomes-based models are inadequate for the health professions context.

It is clear that programme evaluations using traditional ‘outcomes-based’ models are inadequate for the health professions context.


These elements have allowed us to address not only the fundamental question of whether our programme worked, but also the issues of how it worked, why it worked and what (else) happened.



 




1 Parker K, Burrows G, Nash H, Rosenblum ND. Going beyond Kirkpatrick in evaluating a clinician scientist programme: it’s not ‘if it works’ but ‘how it works’. Acad Med 2011;86(11):1389–96.

24 Curran V, Christopher J, Lemire F, Collins A, Barrett B. Application of a responsive evaluation approach in medical education. Med Educ 2003;37(3):256–66.

33 Hodges BD, Kuper A. Theory and practice in the design and conduct of graduate medical education. Acad Med 2012;87(1):25–33.

34 Parker K, Shaver J, Hodges B. Intersections of creativity in the evaluation of the Wilson Centre Fellowship Programme. Med Educ 2010;44(11):1095–104.





Med Educ. 2013 Apr;47(4):342-51. doi: 10.1111/medu.12091.

Rethinking programme evaluation in health professions education: beyond ‘did it work?’

Author information

  • 1Wilson Centre, University of Toronto, Toronto, Ontario, Canada. faizal.a.haji@gmail.com

Abstract

CONTEXT:

For nearly 40 years, outcome-based models have dominated programme evaluation in health professions education. However, there is increasing recognition that these models cannot address the complexities of the health professions context, and studies employing alternative evaluation approaches are appearing in the literature. A similar paradigm shift occurred over 50 years ago in the broader discipline of programme evaluation. Understanding the development of contemporary paradigms within this field provides important insights to support the evolution of programme evaluation in the health professions.

METHODS:

In this discussion paper, we review the historical roots of programme evaluation as a discipline, demonstrating parallels with the dominant approach to evaluation in the health professions. In tracing the evolution of contemporary paradigms within this field, we demonstrate how their aim is not only to judge a programme's merit or worth, but also to generate information for curriculum designers seeking to adapt programmes to evolving contexts, and researchers seeking to generate knowledge to inform the work of others.

DISCUSSION:

From this evolution, we distil seven essential elements of educational programmes that should be evaluated to achieve the stated goals. Our formulation is not a prescriptive method for conducting programme evaluation; rather, we use these elements as a guide for the development of a holistic 'programme of evaluation' that involves multiple stakeholders, uses a combination of available models and methods, and occurs throughout the life of a programme. Thus, these elements provide a roadmap for the programme evaluation process, which allows evaluators to move beyond asking whether a programme worked, to establishing how it worked, why it worked and what else happened. By engaging in this process, evaluators will generate a sound understanding of the relationships among programmes, the contexts in which they operate, and the outcomes that result from them.

PMID: 23488754
DOI: 10.1111/medu.12091

