Evaluating educational programmes (AMEE Education Guide no. 29)

AMEE Education Guide no. 29: Evaluating educational programmes

JOHN GOLDIE

Department of General Practice, University of Glasgow, UK






What is evaluation?


Evaluation is defined in the Collins English Dictionary (1994) as “the act of judgement of the worth of …”. As such it is an inherently value-laden activity. However, early evaluators paid little attention to values, perhaps because they naively believed their activities could, and should, be value free (Scriven, 1983). The purpose(s) of any scheme of evaluation often vary according to the aims, views and beliefs of the person or persons making the evaluation. Experience has shown it is impossible to make choices in the political world of social programming without values becoming important in choices regarding evaluative criteria, performance standards, or criteria weightings (Shadish et al., 1991). The values of the evaluator are often reflected in some of the definitions of evaluation which have emerged, definitions that have also been influenced by the context in which the evaluator operated. Gronlund (1976), influenced by Tyler's goal-based conception of evaluation, described it as “the systematic process of determining the extent to which instructional objectives are achieved”. Cronbach (Cronbach et al., 1980), through reflection on the wider field of evaluation and influenced by his view of evaluators as educators, defined evaluation as “an examination conducted to assist in improving a programme and other programmes having the same general purpose”.


In education the term evaluation is often used interchangeably with assessment, particularly in North America. 

    • While assessment is primarily concerned with the measurement of student performance, 
    • evaluation is generally understood to refer to the process of obtaining information about a course or programme of teaching for subsequent judgement and decision-making (Newble & Cannon, 1994). 

Mehrens (1991) identified two of the purposes of assessment as:

    • to evaluate the teaching methods used;
    • to evaluate the effectiveness of the course.


Assessment can, therefore, be looked upon as a subset of evaluation, its results potentially being used as a source of information about the programme. Indeed, measuring student gain by testing is a widely used evaluation method, although it requires testing students both pre- and post-course.
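To illustrate the pre-/post-course gain idea, the following minimal sketch computes mean raw and normalised gains for a hypothetical cohort. The scores, the 0–100 scale and the normalised-gain formula are assumptions made for this example, not methods prescribed by the guide.

```python
# Illustrative sketch only: quantifying student gain from pre-/post-course
# test scores. The scores, the 0-100 scale and the normalised-gain formula
# are assumptions made for this example, not methods prescribed by the guide.

def mean(values):
    return sum(values) / len(values)

def student_gain(pre_scores, post_scores, max_score=100):
    """Return the mean raw gain and the mean normalised gain
    (gain achieved as a fraction of the gain available)."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("Each student needs a matched pre- and post-course score")
    raw_gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    normalised_gains = [
        (post - pre) / (max_score - pre)
        for pre, post in zip(pre_scores, post_scores)
        if pre < max_score  # avoid division by zero for students already at ceiling
    ]
    return mean(raw_gains), mean(normalised_gains)

if __name__ == "__main__":
    pre = [55, 62, 48, 70, 66]   # hypothetical pre-course scores
    post = [68, 75, 60, 78, 80]  # hypothetical post-course scores
    raw, norm = student_gain(pre, post)
    print(f"Mean raw gain: {raw:.1f} points; mean normalised gain: {norm:.2f}")
```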





History of evaluation


With the realization of the political nature of the decision-making process, educational evaluators began to embrace Cronbach's view of the evaluator as an educator, in that he/she should rarely attempt to focus his/her efforts on satisfying a single decision-maker, but should focus those efforts on “informing the relevant political community” (Cronbach, 1982b). They also realized that, while many of their attempts at evaluation did not work, some did and when they worked programme quality improved to varying degrees. Improvement, even when modest, was recognized to be valuable (Popham, 1988).






Effecting programme evaluation

There are a number of steps to be taken in planning and implementing programme evaluation.


Initiation/commissioning

The initial stage of evaluation is where the institutions or individuals responsible for a programme take the decision to evaluate it. They must decide on the purpose(s) of the evaluation, and who will be responsible for undertaking it. There are potentially numerous reasons for undertaking evaluation. Muraskin (1997) lists some of the common reasons for conducting evaluations and common areas of evaluation activity (Table 1).


Chelimsky & Shadish (1997) suggest that the purposes of evaluation, along with the questions evaluators seek to answer, fall into three general categories:

      • evaluation for accountability;
      • evaluation for knowledge;
      • evaluation for development.


Table 2.  Three perspectives and their positions along five dimensions (Adapted from Chelimsky & Shadish, 1997).






The potential cost of the evaluation often plays a major role in determining the scope of the evaluation and identity of the evaluator(s), as the cost will have to be met from the programme budget, or by seeking additional funding. The question of whether the evaluator should be internal or external to the programme's development and delivery is often considered at this point. In order to produce an effective educational evaluation, Coles & Grant (1985) point out that skills from many disciplines, for example psychology, sociology, philosophy, statistics, politics and economics, may be required. They rightly question whether one individual would have the competence to perform all these tasks, and whether an institution would necessarily have these skills in-house.


Defining the evaluator's role

The evaluator(s), having been appointed, must reflect on his/her role in the evaluation. This is important to establish as it will influence the decision-making process on the goals of the evaluation, and on the methodology to be used. It is at this point that the evaluator decides where, and to whom, his/her responsibility lies, and on the values he/she needs to make explicit. The questions to be asked in the evaluation, and their origin, will be influenced by these decisions.


The ethics of evaluation

Evaluators face potential ethical problems. For example, they have the potential to exercise power over people, which can injure self-esteem, damage reputations and affect careers. They can be engaged in relationships where they are vulnerable to those who award future work. In addition, evaluators often come from the same social class and educational background as those who sponsor the evaluations. The ethics of an evaluation, however, are not the sole responsibility of the evaluator(s). Evaluation sponsors, participants and audiences share ethical responsibilities. House (1995) lists five ethical fallacies of evaluation: (ethical fallacies that evaluators can easily fall into)

      • Clientism—the fallacy that doing whatever the client requests or whatever will benefit the client is ethically correct.
      • Contractualism—the fallacy that the evaluator is obliged to follow the written contract slavishly, even if doing so is detrimental to the public good.
      • Methodologicalism—the belief that following acceptable inquiry methods assures that the evaluator's behaviour will be ethical, even when some methodologies may actually compound the evaluator's ethical dilemmas.
      • Relativism—the fallacy that opinion data the evaluator collects from various participants must be given equal weight, as if there is no bias for appropriately giving the opinions of peripheral groups less priority than that given to more pivotal groups.
      • Pluralism/Elitism—the fallacy of allowing powerful voices to be given higher priority, not because they merit such priority, but merely because they hold more prestige and potency than the powerless or voiceless.


To assist evaluators a number of organizations, including the Joint Committee on Standards for Educational Evaluation (1994), the American Evaluation Association (1995), the Canadian Evaluation Society (1992) and the Australasian Evaluation Society (1995), have issued guidance for evaluators undertaking evaluation. Other authors and professional organizations have also implicitly or explicitly listed ethical standards for evaluators, for example, the American Educational Research Association (1992), Honea (1992), and Stufflebeam (1991). Drawing on these, Worthen et al. (1997) have suggested the following standards could be applied: (standards evaluators should follow)

      1. Service orientation—evaluators should serve not only the interests of the individuals or groups sponsoring the evaluation, but also the learning needs of the programme participants, community and wider society.
      2. Formal agreements—these should go beyond producing technically adequate evaluation procedures to include such issues as following protocol, having access to data, clearly warning clients about the evaluation's limitations and not promising too much.
      3. Rights of human subjects—these include obtaining informed consent, maintaining rights to privacy and assuring confidentiality. They also extend into respecting human dignity and worth in all interactions so that no participants are humiliated or harmed.
      4. Complete and fair assessment—this aims at assuring that both the strengths and weaknesses of a programme are accurately portrayed.
      5. Disclosure of findings—this reflects the evaluator's responsibility to serve not only his/her client or sponsor, but also the broader public(s) who supposedly benefit from both the programme and its accurate evaluation.
      6. Conflict of interest—this cannot always be resolved. However, if the evaluator makes his/her values and biases explicit in an open and honest way clients can be aware of potential biases.
      7. Fiscal responsibility—this includes not only the responsibility of the evaluator to ensure all expenditures are appropriate, prudent and well documented, but also the hidden costs for personnel involved in the evaluation.


Choosing the questions to be asked

The aims of the evaluation depend not only on the interests of the individuals or groups asking the questions, and the purpose(s) of the evaluation, but also on the evaluator's view of his/her role. The work of Cronbach is perhaps the most far-reaching in this area. He views the evaluator's role as educator rather than judge, philosopher-king or servant to a particular stakeholder group. In deciding which questions to ask, he advocates asking both all-purpose and case-particular questions. The all-purpose questions depend on the evaluator's assessment of the leverage associated with a particular issue, the degree of prior uncertainty about the answer, and the degree of possible and desirable reduction in that uncertainty in light of trade-offs among questions, methods and resources. This results in different types of issues prevailing in different programme contexts. The case-particular questions relate to the substantive theories underpinning programme design and investigate why a programme is, or is not, successful; not all stakeholders are interested in such knowledge, as they may desire only outcome knowledge, or knowledge specific to their needs. His views place a heavy burden on the evaluator, as the range of questions generated complicates the methodology (Shadish et al., 1991).


Shadish et al. (1991) supply a useful set of questions for evaluators to ask when starting an evaluation. These cover the five components of evaluation theory and provide a sound practical basis for evaluation planning (boxes 1–5).

      • Box 1: Questions to ask about Social programming
      • Box 2: Questions to ask about use
      • Box 3: Questions to ask about knowledge construction
      • Box 4: Questions to ask about valuing
      • Box 5: Questions to ask about evaluation practice


Designing the evaluation

Having decided what needs to be done the evaluator has to design an appropriate plan to obtain the data required for the purpose(s) of his/her evaluation.


Dimensions of evaluation

Stake (1976) suggested eight dimensions along which evaluation methods may vary (a small planning sketch follows the list): (the eight dimensions of evaluation)

    • Formative–summative: This distinction was first made by Scriven (1967). Formative evaluation is undertaken during the course of a programme with a view to adjusting the materials or activities. Summative evaluation is carried out at the end of a programme. In the case of an innovative programme it may be difficult to determine when the end has been reached, and often the length of time allowed before evaluation takes place will depend on the nature of the change.
    • Formal–informal: Informal evaluation is undertaken naturally and spontaneously and is often subjective. Formal evaluation is structured and more objective.
    • Case particular–generalization: Case-particular evaluation studies only one programme and relates the results only to that programme. Generalization may study one or more programmes, but allow results to be related to other programmes of the same type. In practice results may lend themselves to generalization, and the attempt to formulate rules for case study recognizes that generalizing requires greater control, and more regard to setting and context (Holt, 1981).
    • Product–process: This distinction mirrors that of the formative–summative dimension. In recent years evaluators have been increasingly seeking information in the additional area of programme impact.
      • Process information: In this dimension information is sought on the effectiveness of the programme's materials and activities. Often the materials are examined during both programme development and implementation. Examination of the implementation of programme activities documents what actually happens, and how closely it resembles the programme's goals. This information can also be of use in studying programme outcomes.
      • Outcome information: In this dimension information is sought on the short-term or direct effects of the programme on participants. In medical education the effects on participants’ learning can be categorized as instructional or nurturant. The method of obtaining information on the effects of learning will depend on which category of learning outcome one attempts to measure.
      • Impact information: This dimension looks beyond the immediate results of programmes to identify longer-term programme effects.
    • Descriptive–judgmental: Descriptive studies are carried out purely to secure information. Judgmental studies test results against stated value systems to establish the programme's effectiveness.
    • Pre-ordinate–responsive: This dimension distinguishes between the situation where evaluators know in advance what they are looking for, and one where the evaluator is prepared to look at unexpected events that might come to light as he/she goes along.
    • Holistic–analytic: This dimension marks the boundary between evaluations that look at the totality of a programme and those that look only at a selection of key characteristics.
    • Internal–external: This separates evaluations using an institution's own staff from those that are designed by, or which require to satisfy, outside agencies.
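As a planning aid, the sketch below simply records where a proposed evaluation sits on each of Stake's eight dimensions and checks that every dimension has been considered. The dictionary structure, function name and example profile are assumptions made for this illustration, not terminology from Stake (1976).

```python
# Illustrative sketch only: recording where a planned evaluation sits along
# Stake's (1976) eight dimensions. The data structure and the example profile
# are assumptions made for this example, not part of the guide.

STAKE_DIMENSIONS = {
    "formative-summative": ("formative", "summative"),
    "formal-informal": ("formal", "informal"),
    "case particular-generalization": ("case particular", "generalization"),
    "product-process": ("product", "process"),
    "descriptive-judgmental": ("descriptive", "judgmental"),
    "pre-ordinate-responsive": ("pre-ordinate", "responsive"),
    "holistic-analytic": ("holistic", "analytic"),
    "internal-external": ("internal", "external"),
}

def validate_profile(profile: dict) -> dict:
    """Check that a proposed design takes a recognised position on every dimension."""
    for dimension, options in STAKE_DIMENSIONS.items():
        choice = profile.get(dimension)
        if choice not in options:
            raise ValueError(f"'{dimension}' must be one of {options}, got {choice!r}")
    return profile

# A hypothetical profile for a mid-course review of a new module.
mid_course_review = validate_profile({
    "formative-summative": "formative",
    "formal-informal": "formal",
    "case particular-generalization": "case particular",
    "product-process": "process",
    "descriptive-judgmental": "descriptive",
    "pre-ordinate-responsive": "responsive",
    "holistic-analytic": "analytic",
    "internal-external": "internal",
})
print(mid_course_review)
```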


Choosing the appropriate design

A range of methods, from psychometric measurement at one end to interpretive styles at the other, has been developed. Table 3 provides a list of common quantitative and qualitative methods and instruments available to educational evaluators.


Table 3.  Common quantitative and qualitative methods and instruments for evaluation.


Shadish (1993), building on Cook's (1985) concept that triangulation should be applied not only to the measurement phase but to other stages of evaluation as well, advocates using critical multiplism to unify qualitative and quantitative approaches. He proposes seven technical guidelines for the evaluator in planning and conducting his/her evaluation (a small illustrative sketch follows the list): (guidelines for evaluators when planning and conducting an evaluation)

      1. Identify the tasks to be done.
      2. Identify different options for doing each task.
      3. Identify strengths, biases and assumptions associated with each option.
      4. When it is not clear which of the several defensible options is least biased, select more than one to reflect different biases, avoid constant biases and overlook only the least plausible biases.
      5. Note convergence of results over options with different biases.
      6. Explain differences of results yielded by options with different biases.
      7. Publicly defend any decision to leave a task homogeneous.
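As mentioned above the list, the following sketch illustrates the multiplist idea behind guidelines 2–6: the same quantity (here, a programme's effect on test scores) is estimated with several defensible options carrying different biases, and the convergence of their results is then examined. The data, the choice of estimators and the convergence threshold are assumptions made purely for illustration.

```python
# Illustrative sketch of critical multiplism: estimate the same quantity with
# several defensible options and note where their results converge or diverge.
# The data, the estimators and the convergence threshold are assumptions
# made for this example.

from statistics import mean, median

def trimmed_mean(values, proportion=0.125):
    """Mean after dropping the given proportion of values from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    trimmed = ordered[k:len(ordered) - k] if k else ordered
    return mean(trimmed)

# Hypothetical post-course scores for a programme group and a comparison group.
programme = [68, 75, 60, 78, 80, 72, 66, 90]
comparison = [61, 70, 58, 69, 74, 65, 60, 72]

# Task: estimate the programme effect. Three defensible options, each carrying
# different biases (e.g. sensitivity to outliers).
options = {
    "difference in means": mean(programme) - mean(comparison),
    "difference in medians": median(programme) - median(comparison),
    "difference in trimmed means": trimmed_mean(programme) - trimmed_mean(comparison),
}

for name, estimate in options.items():
    print(f"{name}: {estimate:.1f}")

# Convergence check: do the options, with their different biases, agree?
estimates = list(options.values())
spread = max(estimates) - min(estimates)
print("Converged" if spread < 2.0 else "Diverged - explain the differences")
```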


Approaches to evaluation

With the explosion in the number of approaches in recent years, many of which overlap, a number of attempts have been made to categorize the different evaluation approaches. One of the most useful was developed by Worthen et al. (1997), influenced by the work of House (1976, 1983). They classify evaluation approaches into the following six categories: (the six approaches to evaluation)


      • Objectives-oriented approaches—where the focus is on specifying goals and objectives and determining the extent to which they have been attained.
      • Management-oriented approaches—where the central concern is on identifying and meeting the informational needs of managerial decision-makers.
      • Consumer-oriented approaches—where the central issue is developing evaluative information on ‘products’, broadly defined, for use by consumers in choosing among competing products, services etc.
      • Expertise-oriented approaches—these depend primarily on the direct application of professional expertise to judge the quality of whatever endeavour is evaluated.
      • Adversary-oriented approaches—where planned opposition in points of view of different evaluators (for and against) is the central focus of the evaluation.
      • Participant-oriented approaches—where involvement of participants (stakeholders in the evaluation) is central in determining the values, criteria, needs and data for the evaluation.


These categories can be placed along House's (1983) dimension of utilitarian to intuitionist–pluralist evaluation (Figure 1). Utilitarian approaches determine value by assessing the overall impact of a programme on those affected, whereas intuitionist–pluralist approaches are based on the idea that value depends on the impact of the programme on each individual involved in the programme.


Figure 1. Distribution of the six evaluation approaches on the utilitarian to intuitionist–pluralist evaluation dimension.



Similarly, to provide a complete description of each example would be beyond the scope of this guide. To assist readers in choosing which of the approaches might be most helpful for their needs, the characteristics, strengths and limitations of the six approaches are summarized in Table 5. These are considered under the following headings after Worthen et al. (1997):

      • Proponents—individuals who have written about the approach.
      • Purpose of evaluation—the intended use(s) of evaluation proposed by writers advocating each particular approach or the purposes that may be inferred from their writings.
      • Distinguishing characteristics—key descriptors associated with each approach.
      • Past uses—ways in which each approach has been used in evaluating prior programmes.
      • Contribution to the conceptualization of an evaluation—distinctions, new terms or concepts, logical relationships and other aids suggested by proponents of each approach that appear to be major or unique contributions.
      • Criteria for judging evaluations—explicitly or implicitly defined expectations that may be used to judge the quality of evaluations that follow each approach.
      • Benefits—strengths that may be attributed to each approach and reasons why one might want to use this approach.
      • Limitations—risks associated with use of each approach.


Table 5.  Comparative analysis of the characteristics, strengths and limitations of the six categories (after Worthen et al., 1997).







Interpreting the findings

Having collected the relevant data the next stage in evaluation involves its interpretation. Coles & Grant (1985) view this process as involving two separate, though closely related activities: analysis and explanation.


Meta-evaluation asserts that all evaluations can themselves be evaluated according to publicly justifiable criteria of merit and standards of performance, and that the resulting data can help determine how good an evaluation is. The need for meta-evaluation implies recognition of the limitations of all social science, including evaluation (Hawkridge, 1979). Scriven (1980) developed the Key Evaluation Checklist, a list of dimensions and questions to guide evaluators in this task (Table 6). (evaluation checklist)


Table 6.  Key evaluation checklist.




Having analysed the data, the evaluator needs to account for the findings. In education the researcher accounts for the findings by recourse to the mechanisms embodied in the contributing disciplines of education (Coles & Grant, 1985). As few individuals have expert knowledge of all the fields possibly required, specialist help may be required at this point. This again has resource implications for the evaluation. Shadish et al.'s (1991) questions (boxes 2–5) also offer evaluators salient points to consider when interpreting the results of their evaluation.



Dissemination of the findings

Again Shadish et al.'s questions on evaluation use (box 2) are of value in considering how, and to whom, the evaluation findings are to be reported. Reporting will be in some verbal form, written or spoken, and may be for internal or external consumption. It is important for the evaluator to recognize for which stakeholder group(s) the particular report is being prepared. Coles & Grant (1985) list the following considerations: (points to consider when writing reports)

      1. Different audiences require different styles of report writing.
      2. The concerns of the audience should be reviewed and taken into account (even if not directly dealt with).
      3. Wide audiences might require restricted discussion or omission of certain points.
      4. The language, vocabulary and conceptual framework of a report should be selected or clarified to achieve effective communication.


Evaluators need to present results in an acceptable and comprehensible way. It is their responsibility to persuade the target audience of the validity and reliability of their results. Hawkridge (1979) identified three possible barriers to the successful dissemination of educational research findings: (barriers to disseminating educational research findings)

      1. The problem of translating findings into frames of reference and language which the target audience can understand. However, the danger in translating findings for a target audience is that the evaluator may present them in a less than balanced manner.
      2. If the findings are threatening to vested interests, they can often be politically manoeuvred out of the effective area.
      3. The ‘scientific’, positivistic, approach to research still predominates in most academic institutions, which may view qualitative research methods and findings as ‘soft’, and be less persuaded by their findings. As qualitative methods receive greater acceptance this is becoming less of a problem.


A further problem concerns the ethics of reporting. As Coles & Grant (1985) suggested in their consideration of how, and to whom, to report, information disseminated more widely may need to be censored; for example, information about a particular teacher would not usually be shared with anyone outside a select audience. The evaluator also has to be aware that the potential ramifications of a report may go wider than anticipated, for example into the mass media, where this may not be desired.



Influencing decision-making

As has been touched upon earlier, the initial enthusiasm of the 1970s educational evaluators was soured by the realization of the political nature of the educational decision-making process, and by the inconclusive results that were often obtained. Coles & Grant (1985) suggest the following ways in which evaluators can influence the educational decision-making process: (ways in which evaluators can influence the educational decision-making process)

      1. involving the people concerned with the educational event at all stages of the evaluation;
      2. helping those who are likely to be associated with the change event to see more clearly for themselves the issues and problems together with putative solutions;
      3. educating people to accept the findings of the evaluation, possibly by extending their knowledge and understanding of the disciplines contributing towards an explanation of the findings;
      4. establishing appropriate communication channels linking the various groups of people involved with the educational event;
      5. providing experimental protection for any development, allocating sufficient resources, ensuring it has a realistic life expectancy before judgements are made upon it, monitoring its progress;
      6. appointing a coordinator for development, a so-called change agent;
      7. reinforcing natural change. Evaluation might seek out such innovations, strengthen them and publicize them further.






Abstract

Evaluation has become an applied science in its own right in the last 40 years. This guide reviews the history of programme evaluation through its initial concern with methodology, giving way to concern with the context of evaluation practice and into the challenge of fitting evaluation results into highly politicized and decentralized systems. It provides a framework for potential evaluators considering undertaking evaluation. The role of the evaluator; the ethics of evaluation; choosing the questions to be asked; evaluation design, including the dimensions of evaluation and the range of evaluation approaches available to guide evaluators; interpreting and disseminating the findings; and influencing decision making are covered.
