The Structure of Program Evaluation: An Approach for Evaluating a Course, Clerkship, or Components of a Residency or Fellowship Training Program

Steven J. Durning, Paul Hemmer & Louis N. Pangaro

Department of Medicine, Uniformed Services University of the Health Sciences, Bethesda, Maryland, USA



Directors of courses, clerkships, and residency programs must be able to ensure not only that individual trainees have achieved the educational goals but also the quality of the program itself. "Program evaluation" goes beyond gathering information on individual trainees: it requires directors to evaluate multiple contributing factors and outcome measures dynamically and continuously.

Directors of courses, clerkships, and residencies are responsible not only for determining whether individual trainees have met educational goals but also for ensuring the quality of the training program itself. This “program evaluation” is more than the aggregate of individual trainee data; it requires academic directors to employ a dynamic, longitudinal evaluation process that tracks multiple contributing factors and outcome measurements.


It is self-evident that academic directors should have a framework for program evaluation.

The desirability and necessity for academic directors to have a framework for program evaluation is evident ...

      • ACGME
      • LCME


Purpose of the article

The purpose of this article is to discuss a framework for program evaluation that has sufficient rigor to satisfy accreditation expectations and still be flexible and responsive to the uniqueness of individual educational programs. (...) Our intent is to demonstrate how practical our framework can be for conducting program evaluation in medical education training programs. (...)


This framework emphasizes the roles of baseline, process, and product measurements, and it stresses both quantitative and qualitative information in judging a program's success. Although the ACGME has recently emphasized "outcomes," we also emphasize "process" measures, both because many outcome measures have uncertain reliability and validity and because outcomes are ultimately needed to refine the process. Finally, we emphasize baseline measurements, which help determine how much of an eventual outcome is attributable to the program rather than to the trainees themselves.

The proposed framework emphasizes the role of baseline, process, and product (outcome) information, both quantitative and qualitative, for describing program “success.” (...) Nevertheless, despite the recent emphasis of the ACGME on “outcomes,” we emphasize the importance of process measurements (such as the number and kinds of patients seen during training and the level of proficiency obtained), as many available outcome measurements have uncertain reliability and validity and because, eventually, the outcomes should be used to refine the curricular process. (...) Finally, we advocate for the inclusion of baseline measurements, which may allow us to determine how much of an eventual outcome depends on the curriculum, as opposed to the prior characteristics of trainees.




Defining the Task of Program Evaluation


What does "success" mean in program evaluation?

In this article, we define “success” for program evaluation (PEv) as achieving information that can relate “inputs” to “outputs” and therefore be used to help understand sources of success or failure. (...) In other words, our goal is to understand how the program is working rather than simple classification of graduates (e.g., competent or not competent).


Program evaluation should begin by defining what "success" means for the specific program. At a minimum, success should be answerable in a dichotomous yes/no fashion, and if possible the degree of success should be describable. In either case, goals and expectations should be clear, specific, and tangible. Some indicators of success are also defined externally (e.g., by the LCME or ACGME).

PEv should begin with a definition of success for the specific program. It is not sufficient to say that “good patient care” or “patient safety” is the goal. The description of success must be construed with sufficiently precise words to allow, at a minimum, the determination of whether success has been met, in a dichotomous yes/no fashion. If possible, it is also desirable to describe degrees of success. In either case, these goals and expectations should be clear, specific, and tangible. (...) However, we all recognize that demonstrating success is often defined externally, by accrediting bodies such as the LCME or ACGME. (...)


After listing goals, objectives for determining success should be constructed.

After listing goals, objectives should be constructed for determining success in achieving the goals. (...)


Defining success, however, is only the first step; understanding how to answer the question is more difficult. There is relatively little literature on the systematic evaluation of educational programs.

However, defining success is only the first step; understanding how to approach the question is more difficult. We have found that relatively little practical guidance exists for systematic evaluation of educational programs; the limited guidance that does exist is restricted to the graduate medical education arena.8,9,10 Indeed, the research pertaining to program evaluation in medical education is less developed than in other educational fields and is largely descriptive.11 (...)





PEv Framework Overview


Evaluation should proceed in three phases.

We advocate a three-phase framework for program evaluation. This framework allows for establishing relationships among baseline, process, and product measurements—Before, During, and After.(...) All of these measurements have often been based on what the graduate does under testing circumstances (such in vitro measurements would include licensure or certifying examinations), but we also want to include in vivo observations such as what trainees do in patient care and graduates do in their practice.

      • Before (baseline) measurements are necessary to determine “how learners change,” and they are especially important to determine the effect of curriculum as opposed to selection of trainees. (...)
      • During (process) measurements are those that monitor the activities of learners during the training program (Table 2). These measurements are often collected prospectively, that is, in real time. (...) “During” measurements, therefore, need to be prioritized for program evaluation purposes so that response to critical, potentially unexpected, information is not delayed or potentially overlooked.
      • After (product) or outcome measurements are, in the clinical research literature, analogous to primary and secondary end points. 
        • Primary end points indicate the overall success of management. (...) 
        • Secondary clinical end points indicate intermediate success or complications. As the majority of outcomes in medical education are as complex as in clinical studies, data gathering should optimally be done through multiple measurements (triangulation). (...)


The importance of process measurements is emphasized.

It is evident that...

      • we emphasize the importance of process measurements; 
      • we believe that we must not ignore systematic process measurements, or baseline measurements, in favor of focusing solely on “outcomes,” for several reasons: 
        • Our current outcomes in medical education are imprecise, 
        • process measures are required by regulating bodies (LCME with ED2, ACGME with outcomes project), and 
        • process measurements are indispensable to explaining the variance in the outcomes (products) of interest.


A variety of measures can be used within the three-phase framework; both quantitative and qualitative methods should be employed.

A variety of measurements can, and should, be used in this three-phase framework (see Table 2). Both qualitative (descriptive) as well as quantitative (numerical scores) assessments can, and in many cases, should be used for program evaluation 14 as the quantitative measurements alone may overlook important findings that are revealed through qualitative analysis. (...)


The framework applies to both existing and new programs.

The basic three-phase structure that we propose for program evaluation readily applies to both existing programs and new programs, curricula, or interventions. (...)


Collecting multiple measurements in each phase requires considerable resources. Program evaluation information should inform multiple decision makers.

Collecting multiple measurements in each phase of our framework can require significant time and human resources. (...) Indeed, program evaluation information should inform multiple decision makers in the program; if this is not the case, then the utility is too limited. (...)


The three-phase framework can also foster collaboration across the medical education continuum.

In addition, our three-phase framework to program evaluation could foster collaboration across the medical education continuum, as the baseline measurements for a 3rd-year clerkship director may comprise the outcome measurements of a 2nd-year course director and the outcome measurements of a clerkship director can serve as baseline measurements for a residency training program director.






Distinctions and Definitions


Desirable attributes of assessment

Next, we define the desirable attributes for the assessment tools that are placed within the model (the micro level) and afterward the desirable attributes of the overall framework (model) at the macro level. Optimally, a tool for assessing the success of a program is feasible, reliable, and valid. 


At the micro level (a brief computational sketch follows the list below):

For our purposes at the micro level, 

      • feasibility means the percentage of possible measurements that are actually obtained and the unit cost per measurement (i.e., cost of printing, mailing, and/or entering survey data); 
      • reliability means the internal consistency of specific assessment tools, and 
      • validity is the confidence that the inferences drawn from the data are true. Thus, in our model, validity will mean that process measurements have a significant, meaningful, predictive association or correlation with outcomes measurements and that outcomes measurements have a similar correspondence with patient outcomes.
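
The sketch below is not from the article; it is a minimal, hypothetical Python illustration of how these three micro-level attributes might be quantified for a single survey-based assessment tool: feasibility as the share of possible measurements actually captured, reliability as internal consistency (Cronbach's alpha), and validity as the association between a process measurement and an outcome measurement. All data and variable names are invented.

```python
# Minimal sketch (hypothetical data) of micro-level feasibility, reliability, and validity.
import numpy as np
from scipy import stats

# Hypothetical end-of-clerkship survey: rows = trainees, columns = items (1-5 scale).
# np.nan marks surveys that were never returned.
ratings = np.array([
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [np.nan] * 4,        # one trainee did not return the survey
    [5, 4, 5, 5],
    [2, 3, 2, 3],
])

# Feasibility: percentage of possible measurements actually obtained.
feasibility = np.mean(~np.isnan(ratings))
print(f"Feasibility (response completeness): {feasibility:.0%}")

# Reliability: internal consistency of the survey items (Cronbach's alpha),
# computed on the complete responses only.
complete = ratings[~np.isnan(ratings).any(axis=1)]
k = complete.shape[1]
item_vars = complete.var(axis=0, ddof=1).sum()
total_var = complete.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")

# Validity (in this framework): a process measurement should show a meaningful
# association with an outcome measurement. Hypothetical example: number of
# patients seen (process) vs. end-of-rotation exam score (outcome).
patients_seen = np.array([24, 31, 18, 40, 27, 35, 22, 29])
exam_scores = np.array([71, 78, 65, 88, 74, 82, 69, 77])
r, p = stats.pearsonr(patients_seen, exam_scores)
print(f"Process-outcome correlation: r = {r:.2f}, p = {p:.3f}")
```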

At the macro level (feasibility):

At the macro level, academic directors optimally should be able to collect the same set of data in the same way for each trainee, at each site, each year, to help ensure that the observations collected for program evaluation purposes are reliable and valid (a small feasibility-check sketch follows the list below). 

      • The overall method for program evaluation must be feasible—measurements must be obtained effectively (method should allow at least 90% of possible observations about trainees, faculty, etc., to be captured each evaluation time), 
      • consistently (observations and ratings are recorded, transferred, and stored without degradation),
      • efficiently (no more than 10% of a course, clerkship, program, or fellowship director's time must be consumed, and no more than 10% of an administrator's time is needed), 
      • economically (cost of program evaluation should be no more than 5% of the operating budget for the course, clerkship, or graduate medical education program), and 
      • securely (trainees are protected from their data being shared, and any risk is minimized).
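
As a rough illustration only, a director could tabulate one evaluation cycle and check it against the macro-level feasibility thresholds listed above (90% capture, 10% of time, 5% of budget). The function and field names below are assumptions for the sketch, not anything defined in the article.

```python
# Minimal sketch of a macro-level feasibility check against the thresholds above.
from dataclasses import dataclass

@dataclass
class EvaluationEffort:
    observations_captured: int      # evaluations actually recorded this cycle
    observations_possible: int      # evaluations that should have been recorded
    director_time_fraction: float   # share of the director's time spent on PEv
    admin_time_fraction: float      # share of an administrator's time spent on PEv
    pev_cost: float                 # direct cost of program evaluation
    operating_budget: float         # operating budget of the course/clerkship/program

def feasibility_report(e: EvaluationEffort) -> dict:
    """Return pass/fail against the listed thresholds (90% / 10% / 10% / 5%)."""
    return {
        "captured >= 90%": e.observations_captured / e.observations_possible >= 0.90,
        "director time <= 10%": e.director_time_fraction <= 0.10,
        "administrator time <= 10%": e.admin_time_fraction <= 0.10,
        "cost <= 5% of budget": e.pev_cost / e.operating_budget <= 0.05,
    }

if __name__ == "__main__":
    effort = EvaluationEffort(152, 160, 0.08, 0.09, 4_000, 120_000)
    for criterion, met in feasibility_report(effort).items():
        print(f"{criterion}: {'met' if met else 'NOT met'}")
```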


At the macro level (validity):

At the macro level, this means that each set of measurements (before, during, and after) adequately reflects the construct appropriate to the framework.(...)


For a program evaluation model to be "valid," it must be reasonable to infer that the process caused, or contributed to, the outcomes (a regression sketch follows the list below).

    • For the program evaluation model to be valid, inferences that the process caused, or contributed to, the desired outcome must be reasonable. The usual standard for causality in clinical medicine is the prospective, randomized, blinded trial. This is difficult to achieve in the educational setting, except for subtotal modifications of the curriculum such as restructuring an individual clerkship or changing a 1-month rotation during residency. 
    • Therefore, validation of our program evaluation model would mean that proposed explanations for how variance in outcomes (skill of our individual graduates, in the aggregate, in taking care of their diabetic patients) was related to specific curricular elements (whether they actually performed the practice-based learning and improvement review of their own care of diabetics) would require two levels of evidence
      • initially statistical demonstration (e.g., through correlation or multiple regression models) and 
      • subsequently improvement in diabetic care given by residents graduating after further modification in their curriculum. The latter has not been published in the literature.
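
Following the diabetes-care example above, a minimal sketch of the first level of evidence (a statistical demonstration through multiple regression) might look like the following. The data are simulated and the variable names are hypothetical, so this illustrates the kind of analysis, not the authors' actual method or results.

```python
# Minimal sketch: relating variance in an outcome to curricular (process) elements
# after adjusting for a baseline measurement, via multiple regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 60  # graduating residents

# Baseline measurement: entering in-training exam score.
baseline_score = rng.normal(70, 8, n)
# Process measurements: whether the resident completed the practice-based learning
# and improvement (PBLI) review of their own diabetic patients, and how many
# diabetic patients they followed during training.
completed_pbli = rng.integers(0, 2, n)
diabetic_patients = rng.poisson(25, n)

# Outcome measurement: aggregate quality-of-care score for diabetic patients
# (simulated so that the process elements genuinely contribute).
outcome = (0.4 * baseline_score + 6.0 * completed_pbli
           + 0.3 * diabetic_patients + rng.normal(0, 4, n))

# Regress the outcome on baseline and process measurements; the process coefficients
# estimate how much of the outcome the curriculum explains beyond trainees' starting point.
X = sm.add_constant(np.column_stack([baseline_score, completed_pbli, diabetic_patients]))
model = sm.OLS(outcome, X).fit()
print(model.summary(xname=["const", "baseline", "pbli_review", "diabetic_pts"]))
```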



PEv Resources and Measurements


We think that it is helpful to define essential and desirable resources for program evaluation at the outset, as this can assist with requests for funding as well as facilitate best use of limited resources. (...)



PEv Practicalities


Timing—Identifying Problems Early

The appropriate timeline for data collection, analysis, and reporting can differ based on the academic program. (...)


We believe that a robust formative evaluation process, based on “during” or process measurements, is a critical component of successful program evaluation. For example, program information that might cause concern, such as whether each intern is seeing a sufficient number of patients, should be collected, analyzed, and reported more frequently than on an annual basis. These concerns might be categorized as red and yellow “flags” (Table 3). (...)
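
As one hedged illustration of how such flags might be operationalized, a "during" measurement could be screened as it arrives. The thresholds and the patient-volume example below are assumptions for the sketch, not the contents of Table 3.

```python
# Minimal sketch: screening a "during" (process) measurement for red/yellow flags
# as it is collected, rather than waiting for an annual review.
from typing import Optional

def patient_volume_flag(patients_seen: int,
                        yellow_below: int = 20,
                        red_below: int = 12) -> Optional[str]:
    """Flag an intern whose patient volume so far this block looks too low."""
    if patients_seen < red_below:
        return "red"      # act now: redistribute patients or adjust the rotation
    if patients_seen < yellow_below:
        return "yellow"   # watch closely: recheck before the end of the block
    return None           # no concern

if __name__ == "__main__":
    midblock_counts = {"intern_A": 25, "intern_B": 15, "intern_C": 9}
    for intern, count in midblock_counts.items():
        flag = patient_volume_flag(count)
        print(f"{intern}: {count} patients -> {flag or 'no flag'}")
```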



Program Evaluators

- Internal evaluators 

- External evaluators 


Visualization of Goals and Objectives

It is not sufficient to say that “good patient care” or “patient safety” is the goal. The description of success must be construed with sufficiently precise words or quantifiable measurements to allow, at least, the determination of whether success has been met, in a dichotomous yes/no fashion.





Limitations of the Approach


    • In the three-phase approach, learners serve as their own control, which is not necessarily an optimal design (compared with a randomized approach) for studying the benefits of a curricular innovation. 
    • Complete data collection may not be feasible for directors of educational programs with limited resources.
    • Success with using this model requires close cooperation from others (registrars, other course and clerkship directors). 
    • Data collection may be constrained by local Institutional Review Boards if you wish to use findings for more than quality control in your own program. 
    • Further, the model illustrates correlation and not causation.
    • As all factors cannot be controlled, the opportunity exists for multiple confounders in data analysis. 
    • Also, many measurements will not have been sufficiently studied to demonstrate reliability and validity in each institution. 
    • Despite these limitations, such a framework can be useful to guide efforts to evaluate the program.


(...) However, the use in the quality assurance literature of a conceptual model similar to what we propose does enhance the validity of our approach.3,4,5 We do believe that our model can be effective at monitoring educational programs in order to improve them. In our program, we monitored and subsequently minimized intersite inconsistencies. Although this did not necessarily lead to any specific changes, successful program evaluation informs the stakeholders and guides their decision making, whether or not those decisions lead to change. We also propose using red and yellow flags, allowing a course, clerkship, residency, and/or fellowship director to identify and remediate potentially harmful curricular and/or teacher anomalies (a definition of quality).



PEv Recommendations

  1. Begin with defining the goal: “I would be happy about my program if I knew that …”
  2. It is essential to list outcome measurements for the key parameters of success. Use triangulation—collect at least two measurements for each domain. We advocate collecting at least three measurements for each phase.
  3. Then list process measurements that you think will lead to successful outcomes.
  4. It is desirable to specify baseline measurements that would attribute success to the learner rather than the program.
  5. It is desirable to include qualitative information along with quantitative data measurements.
  6. Define the needed resources—time, human resources, and money. This will assist with feasibility of program evaluation efforts.
  7. Include red and yellow flags to prioritize responses if unexpected or undesirable process measurements or outcomes are observed.
  8. Define your unit of analysis. As a principle, we recommend using the unit of analysis that would be most likely to reveal problems.
  9. The analysis of data should include measurements of both statistical and functional significance. Decide the functional significance that would constitute success. Decide the statistical significance that would constitute failure.


Teach Learn Med. 2007 Summer;19(3):308-18.

The structure of program evaluation: an approach for evaluating a course, clerkship, or components of a residency or fellowship training program.

Abstract

BACKGROUND:

Directors of courses, clerkships, residencies, and fellowships are responsible not only for determining whether individual trainees have met educational goals but also for ensuring the quality of the training program itself. The purpose of this article is to discuss a framework for program evaluation that has sufficient rigor to satisfy accreditation requirements yet is flexible and responsive to the uniqueness of individual educational programs.

SUMMARY:

We discuss key aspects of program evaluation to include cardinal definitions, measurements, needed resources, and analyses of qualitative and quantitative data. We propose a three-phase framework for data collection (Before, During, and After) that can be used across undergraduate, graduate, and continuing medical education.

CONCLUSIONS:

This Before, During, and After model is a feasible and practical approach that is sufficiently rigorous to allow for conclusions that can lead to action. It can be readily implemented for new and existing medical education programs.

PMID: 17594228

