Course Evaluation in Medical Education (Teaching and Teacher Education, 2007)

Course evaluation in medical education

Jennifer R. Kogan (a), Judy A. Shea (b)






1. Introduction

 

This paper covers the following:

(1) briefly consider the distinctive features of medical education compared to higher education and the implications these differences may have on course evaluation,

(2) present a framework for course evaluations,

(3) review some of the important details that shape the evaluation process,

(4) present key measurement issues that are important in implementing and interpreting course evaluation data, and

(5) briefly discuss opportunities for expanding the scope of medical education research regarding course evaluations.




2. Unique features of medical education


Medical education differs from most of higher education in four important ways that will impact on course evaluation.

  • First, clinical teaching makes up a large share of medical education. The teaching format differs, so course evaluation must differ as well.
    First, a great portion of medical education involves clinical teaching. In this type of teaching, somewhat analogous to advanced independent studies and graduate work using a preceptor model, the look and feel of a course is somewhat different than in a preclinical setting. Thus, the course evaluation will also be somewhat different.

  • Second, structure differs within courses even in the pre-clinical curriculum. Medical school courses combine large lectures with small groups (e.g., PBL), and the faculty who organize a course often deliver only part of the teaching, so the skills and styles of multiple teachers blend into the character of the course (multi-instructor preclinical courses).
    Second, even in the pre-clinical curriculum, there is a difference in the structure within courses. Within medical education, courses are often taught in a combination of large lectures and small groups. Many schools have adopted problem-based learning (PBL) curricula that are dominated by small group learning (Dolmans, DeGrave, Wolfhagen, & van der Vleuten, 2005). Additionally, courses are sometimes organized by an individual who does only a portion of the teaching. Thus, it is important to disentangle what may be features of the course, such as organization and structure, from the teaching skills/styles of multiple teachers. An example of an evaluation system used for multi-instructor preclinical courses may serve as a model (Leamon & Fields, 2005).

  • Third, students have little freedom to choose courses or instructors; outside a few selectives and electives, there is minimal opportunity for choice.
    A third difference between medical education and higher education is students’ freedom to select courses and instructors. In medical education the curriculum is prescribed and outside of a few selectives and electives, students have minimal opportunity to select courses (both pre- clinical and clinical) and/or teachers.
  • Fourth, the structure of courses within the larger curriculum differs.
    The fourth way in which courses within medical education differ from higher education is in the structure of courses within the larger curriculum. 

    • Integrated curricula: not only individual teachers but also the overarching organization must be evaluated.
      Courses such as anatomy, pathology, pharmacology, and pathophysiology are entirely integrated. When designing an evaluation system for such a curriculum, it becomes important not only to evaluate individual teachers, but to recognize the need to evaluate larger issues such as the overarching organization of the curriculum.

    • Multiple instructors: interpretation must account for the variable length of exposure between learner and teacher.
      Separating features of the course from those of the teacher seems particularly important when courses are taught by multiple instructors. Appropriate interpretation of results may very well need to take into account the variable length of exposure between the learner and the teacher. In integrated curricula it is vitally important to make sure course directors and learners similarly define the ‘‘course.’’



3. Framework and basic principles


Fig. 1 depicts a framework that can help to define and organize course evaluation.


The starting point is defining the "course" that is the focus of the evaluation; the "unit of evaluation" can vary.

Naturally, the starting place is defining the ‘‘course,’’ that is, the focus of the evaluation. As discussed previously, the ‘‘unit of evaluation’’ may be a lecture series, may include small group and independent learning sessions, or may be a clinical rotation.


 


 

4. Why evaluate courses?



There are many reasons.

The first question to ask is why a course is being evaluated. There are many important reasons to do so, including:

  • curriculum evaluation,

  • accreditation,

  • feedback to course directors/organisers,

  • improvement of the educational content and educational methods with the aim of improving student learning, and

  • collection of data to facilitate the faculty appointment and promotion process.


"Why" we evaluate influences "what" we evaluate.

Most directly, the ‘‘why’’ answer influences the content (‘‘What’’question). For example,

  • if the primary goal is overall curriculum evaluation, then the questions one asks might be about learners’ perceptions of their preparedness for the course and how well the material within this particular course is integrated with that which was taught previously.

  • If the focus is feedback to faculty, then the balance of questions might be focused on individual teachers’ coherency, ability to present material in an understandable fashion, and receptivity to student questions.

  • If the focus is on course content, it might be decided that the best-informed evaluators (the ‘‘who’’ question) are peers rather than students.



5. What (who) is evaluated?


 

Evaluation of the "hidden curriculum" is increasingly important.

In medical education, evaluation of the ‘‘hidden curriculum’’ (defined as ‘‘the commonly held understandings, customs, rituals, and taken-for-granted aspects of what goes on in the life-space we call medical education ... training institutions, cultural entities and moral communities intimately involved in constructing definitions about what is ‘good’ and what is ‘bad’ in medicine’’ (Haidet, Kelly, & Chou, 2005)) is increasingly being recognized as an important component of curriculum evaluation (Hafferty, 1998; Haidet et al., 2005; Lempp & Seale, 2004).




To date, most course evaluation has focused on "process."

To date, the majority of course evaluation focuses on process. When evaluating processes the content of an evaluation may focus on topics such as

  • organization, availability, clarity and understanding of course objectives,

  • quality of materials such as textbooks and readings, and

  • perhaps fairness/comprehensiveness of the learners’ evaluation methods (i.e., tests).

 

Other features of a course, especially within medical education include

  • learners’ perceptions of appropriate placement of a course within the curriculum and

  • relevance of a course to clinical education. 



Course evaluation can be related to learner ratings of teaching effectiveness (Litzelman, Shea, Wales, & Kogan, 2005).

  • Instruments from higher education: Forms such as the Course Experience Questionnaire (CEQ) used in higher education have demonstrated validity in the medical education setting during preclinical courses (Broomfield & Bligh, 1998).

  • Clinical teaching instruments: With respect to clinical teaching, Litzelman has examined and refined an instrument based on seven categories of teaching effectiveness:

    • establishing a positive learning environment,

    • control of the teaching session,

    • communicating goals to the learner,

    • promoting understanding and retention,

    • evaluation of achievement of goals,

    • feedback to the learner, and

    • promotion of self-directed learning (Litzelman, Stratos, Marriot, & Skeff, 1998).




Should a global assessment of teaching effectiveness be made, or a multi-item, multi-dimensional scale used?

As evaluations are developed to assess teaching effectiveness, it must be determined whether a global assessment of teaching effectiveness will be made or if a multi-item, multi-dimensional scale will be used. 


Either way, a systematic review found that evaluation of clinical teaching comprises two domains (interpersonal and clinical teaching); in contrast, other studies of student ratings of teaching effectiveness suggest that a multi-dimensional assessment is more appropriate.

Whether learners’ evaluations of teachers reflect global or multi-dimensional assessment has been addressed in the medical education literature. A systematic review by Beckman (Beckman, Ghosh, Cook, Erwin, & Mandrekar, 2004) suggests that evaluation of clinical teaching is primarily comprised of two domains: interpersonal and clinical teaching. In contrast, other studies of student ratings of teaching effectiveness suggest that a multi-dimensional assessment is most appropriate (Hayward et al., 1995; James, Kreiter, Shipengrover, & Crosson, 2002; Litzelman et al., 1998).


Interestingly, some students feel they have not been trained to evaluate teaching performance.

Interestingly, some students believe that they have not been trained to evaluate teaching performance (Afonso, Cardoza, Mascarensas, Aranha, & Shah, 2005).


What impact or effectiveness outcomes can include:

Impact or effectiveness outcomes might include

  • determinations of whether the program has improved the educational skills of its students (i.e., approaches to learning, communication skills, information gathering skills),

  • prepared students effectively for their clinical career roles (i.e. professional behavior), or

  • changed the educational environment (scholarship of teaching) (Blumberg, 2003).






6. Who evaluates?


6.1. Current students


 

The most common evaluators.

As mentioned, students currently enrolled on a course are the most common course evaluators.


Ratings can vary with the learner's level of training.

The level of the student can affect course ratings although studies conflict as to whether it is learners earlier or farther along in training that rate faculty more favorably.


There is debate over what point in the year to evaluate.

It is also controversial as to whether time of year impacts ratings of clinical instruction,



6.2. Former students


Students can be asked to evaluate courses or curriculum well after a given course or even once medical training has concluded (DaRosa, Prystowsky, & Nahrwold, 2001; Parrino & Kern, 1994).


6.3. Peers


Peer review has been used for faculty evaluation and for continual course and curriculum improvement. Interestingly, peer and student ratings usually agree. Peer review works best when run by a committee of respected, experienced teachers, when the school supports and respects that committee, and when course directors participate in evaluating their own courses.

Peer review of courses has been used to facilitate faculty evaluation and continual course and curriculum improvement (Burke, Bonaminio, & Walling, 2002; Horowitz, Van Eyck, & Albanese, 1998; Levine, Vanek, Lefferts, Michener, & Weiker, 1988). Interestingly, peer and student course ratings have been shown to be congruent the majority of the time (Horowitz et al., 1998). The peer review system seems to work best when run by a committee of respected, experienced teachers chosen by their peers, when the school faculty supports and respects the committee and its work, and when course directors are involved in evaluating their courses (Horowitz et al., 1998).


Peer review is becoming more common in clinical teaching.

Peer review is becoming more common with clinical teaching, and tools have been developed that can be used reliably across peer evaluators (Beckman, Lee, Rohren, & Pankratz, 2003).

  • Self-directed learning, learning climate, communication of goals, and evaluation are among the most internally consistent domains; teaching enthusiasm is among the most consistently rated.
    Preliminary evidence shows that certain teaching characteristics such as self-directed learning, learning climate, communication of goals, and evaluation are among the most internally consistent domains, with teaching enthusiasm being one of the most consistently rated (Beckman et al., 2003).

  • Peers rate more harshly than students or residents.
    A study by Shores (Shores et al., 2000) demonstrated concurrent validity for a medical school lecture evaluated by students and faculty peers (r = .85), while finding that faculty peers often rate teaching lower than students do. Similarly, peer evaluators scored inpatient faculty teaching lower than resident evaluators did (Beckman, Lee, & Mandrekar, 2004).



6.4. Self-evaluation


A review found that self-evaluations correlate poorly with other evaluations.

A recent review by Eva and Regehr argues that self-evaluations are largely uncorrelated with other evaluations and that the problem stems from a failure to begin with a cogent conceptualisation of the nature and need for self assessment in the daily practice of health care professionals (Eva & Regehr, 2005). 



7. When are courses evaluated?


Courses can be evaluated at many points in time:

  • immediately after a lecture or teaching encounter has occurred,

  • at the end of a course/clinical rotation, or

  • at a point in time well after the course has been completed, for example, at the end of the year or at the end of the training program.


Few studies in medical education address timing. Reliability is higher with continuous assessment than with retrospective ratings, yet end-of-lecture and end-of-course assessments correlate highly.

Within the context of medical education, there have been only a few studies addressing the timing of evaluations and whether evaluations completed at the time of a lecture are congruent with those provided at a later time, such as the end of a course (Shores et al., 2000). Some studies suggest that reliability of ratings is higher when continuous assessment of course content is done rather than assessing lectures in a retrospective fashion (Peluso, Tavares, & D’Elia, 2000). However, the correlation between end-of-lecture global teaching assessments and end-of-course teaching assessments has been shown to be high (Shores et al., 2000).



8. Details of the evaluation process




9. Design of the evaluation form


How the rating form is constructed strongly influences the ratings.

The way in which the rating form is constructed can have significant impact on the ratings made.

  • Global vs. multi-dimensional assessment
    The use of a single item or a few global items versus a multi-dimensional form with many items was discussed earlier.

Other factors are also important.

  • Higher ratings when the positive option is on the left (primacy effect)
    For example, one study showed that students are more likely to give a course positive ratings when the positive side of the scale is on the left. This is known as the ‘‘primacy effect’’: individuals tend to endorse items or statements printed on the left side of the page (Albanese, Prucha, Barnet, & Gjerde, 1997a).

  • More positive responses when behavioral anchors appear only at the scale's extremes
    Additionally, when behavioral anchors or descriptors are left off the middle of the rating scale (appearing only at the extreme ends), students are more likely to evaluate on the positive end of the scale (Albanese, Prucha, & Barnet, 1997b).

  • Negatively vs. positively worded items: negatively worded items perform differently and may reduce scale reliability
    There is also debate as to whether it is important to have both positively and negatively worded items on a rating scale. While common wisdom has been to include both and then reverse-code the negative items, it has also been suggested that negatively worded phrases perform differently than positively worded phrases, and that including negatively worded items might decrease scale reliability (Stewart & Frye, 2004).



10. Anonymity versus confidentiality


Anonymous evaluation is generally preferred. Under anonymity, students and residents rate faculty lower, both on individual items and on overall teaching domains. Without anonymity, residents worry about having to work later with the faculty they rated. In another study, however, faculty ratings were more favorable with anonymous evaluations than with closed in-person debriefing sessions.

Most authors agree that anonymous evaluation systems are preferred to open evaluation systems (Albanese, 2000). One study in the medical education literature suggests that students and residents rate faculty lower (i.e., more harshly) in anonymous evaluations. Ratings are lower on individual items within a form as well as on overall teaching domains. When evaluations are not anonymous, residents worry about the implications of their evaluations since they might have to work with that faculty member again in the future (Afonso et al., 2005). However, in another study (Pelsang & Smith, 2000) faculty ratings were more favorable with anonymous evaluations than closed in-person debriefing sessions.


Maintaining evaluator anonymity is especially hard in medical education because a faculty member may work with only a few students at a given time, or even over a whole year. In such cases, learner risk can be reduced by releasing ratings to faculty only after a critical mass of student responses has accumulated.

In medical education, maintaining anonymity of the evaluator can be particularly challenging given that the faculty member being evaluated may only work with a small group of students at any given time, or even over the course of a year. In these instances, the risk to learners can be minimized if evaluation ratings are given to faculty after a critical mass of students have provided ratings (Albanese, 2000).



11. Required versus voluntary


Of course, response rate can affect the validity of interpretation; handing out forms at the end of a course typically yields 70-80%. To improve response rates, students can be sampled to provide evaluations, and surveying the entire class may provide no better reliability than surveying randomly selected subgroups.

One detail that is important, but has not been well studied, is the implication of having voluntary versus required evaluations. Clearly, the response rate can impact the validity of evaluation interpretation. It is common when handing out forms at the end of a course to get rates close to 70–80%. In order to improve the response rate, especially in a multi-lecturer course, students can be sampled to provide evaluations (Albanese, Schroeder, & Barnes, 1979; Carline & Scher, 1981; Kreiter & Lakshman, 2005). Administering an evaluation instrument to an entire class may not provide better reliability than administering an evaluation instrument to randomly selected subgroups of students (Leamon & Fields, 2005).
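One way to implement such sampling in a multi-lecturer course, sketched with made-up names (the per-student load and seed are arbitrary choices, not from the source), is to assign each student a random subset of lecturers so every lecturer still accumulates ratings without the whole class rating everyone:

```python
import random

def assign_evaluations(students, lecturers, per_student=3, seed=42):
    """Randomly assign each student a subset of lecturers to evaluate."""
    rng = random.Random(seed)
    return {s: rng.sample(lecturers, per_student) for s in students}

students = [f"student_{i}" for i in range(20)]
lecturers = [f"lecturer_{j}" for j in range(6)]
plan = assign_evaluations(students, lecturers)

# Each lecturer's expected number of ratings is 20 * 3 / 6 = 10.
counts = {l: sum(l in chosen for chosen in plan.values()) for l in lecturers}
print(counts)
```

The total rating burden (here 60 forms) is fixed by design, which is the practical appeal of sampling over whole-class administration.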




12. Qualitative versus quantitative


 

Several methods:

  • Focus groups are one qualitative methodology that has been used to gather students’ opinions (Frasier, Slatt, Kowlowitz, Kollisch, & Mintzer, 1997; Lam, Irwin, Chow, & Chan, 2002; Shea, Bridge, Gould, & Harris, 2004). Focus groups are efficient and presumably provide a means to learn something additional from the group conversation that might not be learned in one-on-one interviews.

  • The nominal group technique is an alternative that borrows from qualitative methodology and adds some quantitative assessment (Lloyd-Jones, Fowell, & Bligh, 1999).

  • However, given the complexities of clinical training schedules and the distance from, and infrequency with which, learners congregate at a home base, one-on-one interviews are also a reasonable method to collect course evaluation data.


Rather than a full qualitative design, open-ended questions can be used. Qualitative comments, alone or alongside ratings, reveal individual faculty's strengths and weaknesses, and faculty rankings from quantitative Likert scores and from qualitative comments are similar.

Instead of a full-fledged qualitative design, it is also informative to analyze responses to open-ended questions from students about their clinical teachers’ effectiveness. The qualitative information, used alone or to complement student ratings, can provide detailed information about individual faculty’s strengths and weaknesses (Sierles, 1996). Quantitative scores on Likert ratings and the qualitative assessment of comments about teaching effectiveness result in similar rankings of faculty (Lewis & Pace, 1990).


At this point it would be premature to say that any of these issues have large bodies of research to support particular decisions. However, consensus suggests that multi-item forms, with evaluators' identities kept anonymous from the end user(s), will work well. And the use of qualitative methods in medical education is clearly growing.



13. Measurement issues in course evaluation



14. Validity


What is validity?

Questions of validity are concerned with asking ‘‘have we measured (i.e., evaluated) that which we intended to measure? Do scores behave the way we expect them to?’’


Student evaluations of teaching are generally considered valid.

In general, student evaluations of teaching are believed to be valid. For example, there is a large literature in higher education surrounding the validity of student evaluations of teachers (d’Apollonia & Abrami, 1997; Greenwald, 1997; Greenwald & Gillmore, 1997; Marsh & Roche, 1997; McKeachie, 1997).


Some studies conclude that students learn more from highly rated teachers. However, variables other than teaching effectiveness can also influence students' ratings, including:

A common conclusion from such studies is that higher ratings are modestly correlated with higher achievement: good teaching causes learning (Abrami, Cohen, & d’Apollonia, 1998; d’Apollonia & Abrami, 1997). Path-analytic studies of construct validity examine the effects of variables other than teaching effectiveness that might impact grades and students’ evaluations of teaching, such as

  • prior interest or motivation in the course,

  • grading leniency,

  • workload difficulty,

  • class size,

  • level of course or year in school (Marsh & Roche, 1997).


Students' evaluations of clinical teaching efficacy are similar when different forms are used.

The most common model in medical education is to collect cross-sectional data and use statistics such as correlations to ask how ratings of course/teacher effectiveness compare to scores on another tool (e.g., self-assessment) or grades. For example, two studies looking at convergent validity suggest that students’ evaluations of clinical teaching efficacy are similar when different forms are used (Steiner, Franc-Law, Kelly, & Rowe, 2000; Williams, Litzelman, Babbott, Lubitz, & Hofer, 2002).
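The cross-sectional correlation model described above can be sketched as follows (the ratings for eight teachers on two forms are invented for illustration):

```python
import numpy as np

# Hypothetical mean ratings of the same 8 teachers on two different evaluation forms (1-5 scale)
form_a = np.array([4.2, 3.1, 4.8, 2.9, 3.7, 4.5, 3.3, 4.0])
form_b = np.array([4.0, 3.3, 4.6, 3.1, 3.5, 4.4, 3.6, 3.9])

# A high Pearson correlation between forms would be read as evidence of convergent validity.
r = np.corrcoef(form_a, form_b)[0, 1]
print(round(r, 2))
```

In practice, such a correlation would be computed over many teachers and interpreted alongside the reliability of each form, since unreliability attenuates observed correlations.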


Teacher ratings are related to student outcomes.

Certainly, a persuasive group of studies are those that show that teacher/preceptor ratings are related to student outcomes (e.g., Blue, Griffith, Wilson, Sloan, & Schwartz, 1999; Griffith, Wilson, Haist, & Ramsbottom-Lucier, 1997, 1998; Stern et al., 2000).


However, one study found no systematic relationship between the two.

However, it should be noted that another study found a non-systematic relationship between student grades and teaching ratings (Shores et al., 2000).


Only modest supporting evidence exists.

Overall, there is some evidence to support the validity of interpretations made regarding students' ratings of faculty/teachers.



15. Reliability


 

The most common method is to compute and report a Cronbach’s alpha for multi-item domains (perhaps defined through factor analysis) within an evaluation instrument (Shea & Fortna, 2002).


In a systematic review of 21 instruments to evaluate clinical teaching in medical education, Beckman, Ghosh, et al. (2004) found that factor analysis was the most common method to determine scale dimensionality, followed by estimates of internal consistency of items using Cronbach’s alpha.
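For a multi-item domain, Cronbach's alpha can be computed directly from the rating matrix. A minimal sketch (the ratings are invented; n respondents by k items):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) matrix of item scores."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical: 6 students rating a course on 4 items (1-5 scale)
ratings = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [3, 2, 3, 3],
])
print(round(cronbach_alpha(ratings), 2))  # high alpha: items move together across students
```

A high alpha only shows the items hang together; it does not, by itself, establish that the domain is one of the dimensions a factor analysis would recover.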



16. Reproducibility


Simply put: the more ratings, the more precise and generalizable the score.

Quite simply, the more ratings there are, the more precise and generalizable the score.


In short, multiple evaluations are needed to estimate teaching performance reliably: roughly 8-10 resident or student ratings yield a reasonably reproducible score, though some suggest 20 or more are needed for a reproducibility coefficient of 0.9.

In short, what we have learned with respect to course evaluation is that multiple evaluations of faculty teaching are needed to produce reliable estimates of teaching performance. It has been estimated that anywhere between 8 and 10 resident or student evaluations produce a reasonably reproducible score (Hayward et al., 1995; Irby & Rakestraw, 1981), although some suggest that as many as 20 evaluations are needed to achieve a reproducibility coefficient of 0.9 (Ramsbottom-Lucier et al., 1994).


17. Opportunities




 



Leamon, M. H., & Fields, L. (2005). Measuring teaching effectiveness in a preclinical multi-instructor course: A case study in the development and application of a brief instructor rating scale. Teaching and Learning in Medicine, 17(2), 119–129.






Course evaluation in medical education

  • a Hospital of the University of Pennsylvania, University of Pennsylvania School of Medicine, 3701 Market Street- Suite 640, Philadelphia, PA 19104, USA
  • b Hospital of the University of Pennsylvania, University of Pennsylvania School of Medicine, 1223 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104-6021, USA

Abstract

Course evaluation is integral to medical education. We discuss (1) distinctive features of medical education that impact on course evaluation, (2) a framework for course evaluations, (3) details that shape the evaluation process, (4) key measurement issues important to data gathering and interpretation, and (5) opportunities for expanding the scope of research regarding course evaluations. Drawing from higher education and medical education literature, a great deal is known about course evaluations. High-quality rating scales have been developed. There is evidence that ratings are valid and reproducible given sufficient ratings are gathered, but there remain many areas deserving of more research.

Keywords

  • Program evaluation
  • Curriculum
  • Validity
  • Reliability
  • Medical school
  • Student ratings

