From behaviours to attributions: further concerns regarding the evaluation of professionalism (Med Educ, 2009)

Shiphra Ginsburg,1 Glenn Regehr2 & Maria Mylopoulos3






INTRODUCTION


Evaluating professionalism remains a challenge in medical education, despite a plethora of research in this domain.1,2


This resulted in the development and validation of a theoretical framework that describes students’ responses to professionalism dilemmas based on three major types of rationale:

  • principles of professionalism;

  • implications of proposed actions; and

  • affect (issues related to gut instincts, feelings or personality).4

Within this framework, the terms ‘avowed’, ‘unavowed’ and ‘disavowed’ were adopted to further categorise these principles, implications and affect.


For example, principles were considered avowed if they reflected what the profession explicitly states in professionalism documents or codes of conduct, such as principles of patient care and comfort, honesty, self-regulation, etc.


Some principles were considered unavowed because they are not explicitly stated but seem to be implicitly condoned in the educational setting (e.g. obedience, deference, getting a good education). Regarding the implications of proposed actions, it was found that:

  • some could be considered avowed (those related to patients);

  • some were unavowed (implications for others, such as team members); and

  • some could only be categorised as disavowed, that is, rejected by the profession as legitimate reasons for action (e.g. behaviours based on consideration of what might happen to the students themselves, such as concerns about grades or evaluations).
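To make the categories concrete, the framework can be pictured as a lookup table from a coded rationale to its stance. The Python sketch below is illustrative only. The enum names, the string labels and the `classify`/`tally` helpers are hypothetical and are not the authors' coding instrument. Only the example entries named in the text are included. Affect is listed as a rationale type but has no entries, because the text gives no examples of how affect was categorised.

```python
from enum import Enum
from typing import Optional


class Rationale(Enum):
    """The three major types of rationale described in the framework."""
    PRINCIPLE = "principle of professionalism"
    IMPLICATION = "implication of proposed action"
    AFFECT = "affect"


class Stance(Enum):
    """How the profession positions a given rationale."""
    AVOWED = "avowed"        # explicitly stated in codes of conduct
    UNAVOWED = "unavowed"    # not stated, but implicitly condoned
    DISAVOWED = "disavowed"  # not accepted as a legitimate reason for action


# Hypothetical labels built from the examples in the text; not the authors' codebook.
FRAMEWORK: dict[tuple[Rationale, str], Stance] = {
    (Rationale.PRINCIPLE, "patient care and comfort"): Stance.AVOWED,
    (Rationale.PRINCIPLE, "honesty"): Stance.AVOWED,
    (Rationale.PRINCIPLE, "self-regulation"): Stance.AVOWED,
    (Rationale.PRINCIPLE, "obedience"): Stance.UNAVOWED,
    (Rationale.PRINCIPLE, "deference"): Stance.UNAVOWED,
    (Rationale.PRINCIPLE, "getting a good education"): Stance.UNAVOWED,
    (Rationale.IMPLICATION, "for patients"): Stance.AVOWED,
    (Rationale.IMPLICATION, "for team members"): Stance.UNAVOWED,
    (Rationale.IMPLICATION, "for the student (e.g. grades)"): Stance.DISAVOWED,
}


def classify(rationale: Rationale, label: str) -> Optional[Stance]:
    """Return the stance of a coded rationale, or None if it is not in the table."""
    return FRAMEWORK.get((rationale, label))


def tally(codes: list[tuple[Rationale, str]]) -> dict[Stance, int]:
    """Count how many of one student's coded rationales fall into each stance."""
    counts = {stance: 0 for stance in Stance}
    for rationale, label in codes:
        stance = classify(rationale, label)
        if stance is not None:  # codes missing from the table are skipped
            counts[stance] += 1
    return counts
```

A per-student tally like this is one hypothetical way a coded response could be summarised before being compared with faculty scores.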



In a recent study, 30 faculty staff were presented with five scenarios in which students were faced with professionally challenging situations.


Interestingly, despite this standardisation of the scenarios, there was poor agreement about what constituted appropriate student behaviour. This disagreement occurred not only between faculty members but even within individual faculty members across scenarios. Similar behaviours could be viewed as either professional or unprofessional depending on how the faculty member construed the situation. These results are consistent with other work demonstrating that the same behaviours can be considered differently depending on the perceived rationales behind them.6


In subsequent work, the five scenarios mentioned above were used as either video- or text-based prompts on a professionalism ‘examination’, during which 53 students were presented with each scenario and then asked to respond in writing to three items:

  • what they would do next in the scenario;

  • why they would do it; and

  • what they saw as potential implications of their actions.7


Interestingly, no difference was found in response patterns between the text- and video-prompted groups. Further, students’ responses in the examination setting showed a shift towards more avowed rationales for action compared with patterns arising in the interview context. Even so, students continued to use the full range of rationales in this setting, including those considered to be disavowed.



METHODS


We included students’ responses to three questions:


1 Describe in detail what you would do next if you were the student in the scenario.

2 Why would you do that?

3 What do you think would happen next if you did that?



Faculty marking of examination papers



Twenty faculty members were randomly assigned to one of 10 examiner pairs. Each member of the pair independently marked the responses to the three questions for a single scenario for the 20 students in either the text- or video-prompted group.


Faculty members were instructed to mark students’ responses as if this were a real examination. Each of the three questions was scored out of five.


Staff were told to ‘use whatever criteria seem appropriate to you in deciding how many marks to assign’. We hoped that this relatively unguided instruction would reveal which responses teaching staff felt were more or less professional, based on their experience as attending physicians and supervisors of medical students. In addition, they were asked: ‘Based on this response to this scenario, how professional do you think this student is?’ To answer this question, faculty members were given a 6-point scale ranging from ‘extremely unprofessional’ to ‘extremely professional’ and had 1 week to complete the assignment.




Interviews with faculty



Each faculty member was asked to choose a highly rated and a low-rated student response as a basis for discussing his or her philosophy of marking. In particular, teaching staff were asked why they had assigned certain responses high or low scores, and whether they had weighed the behaviours, reasoning or consequences differentially in the process of deciding a global score.



RESULTS


What was the inter-marker reliability in scoring?

It was low. Pairwise ICCs were highly variable (ranging from -0.40 to 0.89 depending on the pair and the question), with average ICCs across the 10 pairs for Q1–3 and the global rating of 0.41, 0.38, 0.28 and 0.39, respectively (Table 1).
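The sketch below shows one way to compute a pairwise ICC for an examiner pair and average it across pairs. The excerpt does not say which ICC form was used. This sketch assumes ICC(2,1), which is two-way random effects, absolute agreement and single rater in Shrout and Fleiss's notation. The function names and the unweighted averaging across pairs are also assumptions. Under this form, systematic disagreement between two markers can push the ICC below zero.

```python
from statistics import mean


def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has one row per student and one column per marker
    (two columns for an examiner pair).
    """
    n = len(ratings)     # number of students (subjects)
    k = len(ratings[0])  # number of markers
    if n < 2 or k < 2:
        raise ValueError("need at least 2 students and 2 markers")
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares: students, markers, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        return float("nan")  # all ratings identical: ICC is undefined
    return (ms_rows - ms_error) / denom


def average_icc(pairs: list[list[list[float]]]) -> float:
    """Unweighted mean of the pairwise ICCs across examiner pairs (an assumption)."""
    return mean(icc_2_1(p) for p in pairs)
```

Consider a pair in which one marker is always exactly one point higher. For example, `icc_2_1([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])` returns about 0.83 rather than 1.0, because absolute agreement penalises the constant offset.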


One possible explanation for the low correlations relates to faculty members’ expressed difficulty in assigning numerical scores in general. For example, during the interviews, one faculty member described her struggle to give a low score even when she felt it was justified.


Another said he didn’t find it difficult to use ‘extremely’ on the positive end of the scale, but: ‘I find it much harder [to give a score of 1].’ (F18)


He went on to suggest that the language we had chosen might have made things more difficult for faculty, and that a restricted scale might work better:

‘I would have been happier with a 3-point scale, not a 6-point scale. You’re either unprofessional, in the middle, or very professional.’ (F10)



Did faculty differentially weight the action or the rationale?

In the post-scoring interviews, faculty members expressed different views about how much students’ actions (Q1) and rationales (Q2) each contributed to their overall opinions of the students’ professionalism (the global rating). For some members of faculty, the rationale behind the behaviour seemed most important.


By contrast, some faculty felt the behaviour was more important.


Some faculty members went further and expressed the conflict and tension involved in deciding which was more important in a given case.


This illustrates that faculty did not consistently favour one element over the other, whether the behaviour or the reasoning behind it.


For some, the action seemed most important, whereas for others the student’s justification of the action (or demonstrated awareness of the consequences of the action) had greater influence on the score.


Overall, the average correlations of Q1, Q2 and Q3 with the global score were very similar, at 0.79, 0.78 and 0.70, respectively. This indicates that, on average, each question carried about the same weight when faculty members assigned the global score.
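The correlation check can be sketched as follows. The sketch assumes a Pearson correlation between each question's scores (out of 5) and the 6-point global rating, across the students scored by one marker. The helper names are hypothetical. This excerpt does not specify how the authors averaged correlations across markers or pairs, so the sketch stops at a single marker.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


def question_global_correlations(
    question_scores: dict[str, list[float]], global_ratings: list[float]
) -> dict[str, float]:
    """Correlate each question's scores with the global rating, student by student."""
    return {q: pearson(s, global_ratings) for q, s in question_scores.items()}
```

Similar correlations for all three questions, as reported above, would suggest that no single question dominated the global rating on average.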


Did the thematic codes correlate with the scores assigned?

Overall, we found no correlation between the coding scheme and the scores assigned by faculty members (Fig. 2; original coding).




PRELIMINARY DISCUSSION


These results were initially disappointing, as we failed to establish evidence for our main hypotheses. Further, we found no evidence that faculty members placed more weight on student rationales than on their behaviours.


Finally, faculty members did not seem to follow the previously developed theoretical framework when scoring the responses.




SECONDARY ANALYSIS


Grounded theory analysis of faculty interviews


Our goal was to determine the factors faculty members actually weighed and considered when judging students’ responses to the scenarios, and to compare these with the components of the original framework.


Based on these data, we modified our initial coding structure so that only the most seemingly ‘selfish’ implications for students were left as disavowed, as these were unlikely to be deemed legitimate reasons for action. We then recoded the dataset using the modified framework and re-explored the correlations between the modified codes and the scores.
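The recoding rule can be written as a small function. This is a hedged sketch. The `is_selfish` flag stands in for the coders' judgement. The text does not say which category the non-selfish disavowed codes were moved to, so the target is an explicit parameter that defaults to unavowed purely as an assumption. The per-student share of disavowed codes is one hypothetical summary that could then be correlated with faculty scores.

```python
from enum import Enum


class Stance(Enum):
    AVOWED = "avowed"
    UNAVOWED = "unavowed"
    DISAVOWED = "disavowed"


def recode(stance: Stance, is_selfish: bool,
           relaxed_to: Stance = Stance.UNAVOWED) -> Stance:
    """Apply the modified framework to one coded implication for the student.

    Only codes a coder flagged as most 'selfish' stay DISAVOWED; other
    DISAVOWED codes move to `relaxed_to` (an assumed default, since the
    source does not say). AVOWED and UNAVOWED codes are left unchanged.
    """
    if stance is Stance.DISAVOWED and not is_selfish:
        return relaxed_to
    return stance


def disavowed_share(codes: list[Stance]) -> float:
    """Fraction of a student's codes that are DISAVOWED (0.0 if there are none)."""
    if not codes:
        return 0.0
    return sum(c is Stance.DISAVOWED for c in codes) / len(codes)
```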




Results of the secondary analysis


Faculty members’ marking philosophy


Analysis of the faculty interviews yielded three major themes: Insight, Responsibility and Putting the Patient First.


The major theme of Insight evolved to capture teaching staff comments regarding their impressions of how (or even if) students seemed to appreciate the dilemma at hand. As illustrated in Table 2, faculty members liked to see that students:

  • could recognise the dilemma;

  • were aware of alternatives and made an attempt to balance them;

  • had a plan for action;

  • were aware of potential consequences of their actions; and

  • appreciated the big picture.

This illustrates that simply mentioning potential consequences for oneself, such as missing out on an educational opportunity, is not necessarily bad. Such a mention can be acceptable, and may indeed indicate that the student has insight, under two conditions. These consequences must not be weighted more heavily than other considerations, and they must not become the driving force behind the action.


The second major theme, Responsibility, was developed to reflect faculty discussion of students’ overall sense of responsibility. Faculty members wanted to see that students:

  • were behaving honestly;

  • were committed to keeping their word;

  • were prepared to take initiative and act while remaining aware of their limitations;

  • were aware of others’ roles on the team; and

  • did not use the technicalities of being a student to avoid taking action they felt was correct.


The final theme that emerged encompasses faculty members’ comments regarding students’ apparent emphasis on Putting the Patient First, by prioritising patients’ rights and emphasising the importance of trust between patients and caregivers.


What (if anything) is truly disavowed?


As described above, students’ examination papers were recoded so as to leave only the more seemingly egregious responses as disavowed. We hypothesised that these modifications would strengthen the relationship between the codes and actual scoring. However, we still found no meaningful correlations (Fig. 2; modified coding).







DISCUSSION


Our efforts reinforce the supposition that professionalism is a subtle and complex construct that does not reduce well to simple, single-dimensional scales. How attending doctors judge medical students’ professionalism is not only interpretive, subjective and context-dependent; it is also highly consistent with how people judge other people’s behaviours in general. As one faculty member put it:


‘What the student would do can be discordant with why they’re actually doing it… And… since we have no way of measuring intentions, we always have to look at the actions.’ (F14)



This tension has in fact been studied for many years by social psychologists. The relationship between attitudes and behaviours is neither simple nor predictable.


When judging these behaviours, we tend to overemphasise factors related to the person and undervalue factors related to the context or situation in which the behaviours occurred. This is called the fundamental attribution error.14,15 We largely attribute observed behaviours to what we think underlies them, such as the person’s motivation, personality and inner values, but we are often incorrect, for two major reasons. First, we are usually not aware of the multitude of reasons, traits and values that underlie a person’s behaviours. Second, many factors confound the relationship between these and the behaviour itself. Examples include the difficulty involved in performing a particular behaviour and external pressures to act in one way or another.16



It has been demonstrated, for example, that the same behaviour can be judged in radically different ways. However, in these studies the rationales behind the behaviours were unknown and could only be assumed. We hypothesised that making the motivations explicit would improve agreement between faculty members, because they would no longer have to infer a motivation through the process of attribution. To do this, we had students provide the reasons behind their behaviours.


However, we found that inter-rater reliability remained poor and that there was still substantial disagreement among faculty about what was more important in a situation. Further, once faculty markers had decided what they thought, they faced additional difficulties in assigning this a numerical score. It is hardly surprising, then, that agreement was so poor.


What we found was somewhat encouraging. Faculty members seemed to fully appreciate the pressures that students were under and were more forgiving than we had expected about students’ concerns for themselves. As long as these concerns were balanced (and outweighed) by concern for the patient, it was perfectly acceptable for students to mention them, even on an examination. Indeed, mentioning such concerns may actually have worked to students’ advantage. When done properly, faculty perceived these concerns as indicative of insight, which is increasingly recognised as essential to the professional role.8,17 Paradoxically, mentioning a large number of avowed principles did not protect against low scores and may have appeared to represent a knee-jerk response that lacked reflection.


To date, much of the literature on evaluating professionalism has focused on attempts to create reliable, valid assessment tools.2,18 However, in these attempts we have failed to take into account a simple fact. Although we are attending doctors evaluating students, we are also people judging other people’s behaviour, and the psychology involved is really no different from that seen in other social contexts.


However, as others have found,20 presenting faculty attendings with more information about students’ rationales for action did not necessarily make it easier for them to judge the action.


Further training of faculty members is also unlikely to be of much benefit. Goldie et al.20 provided intensive rater training around cases involving ethical dilemmas on which there was ‘broad agreement’ in the literature (referred to as ‘consensus cases’). As they reported, this still resulted in poor agreement between raters.


Instead, it may be more fruitful to explore entirely different alternatives to these types of evaluations, such as those that do not rely on scales at all.






Med Educ. 2009 May;43(5):414-25. doi: 10.1111/j.1365-2923.2009.03335.x.

From behaviours to attributions: further concerns regarding the evaluation of professionalism.

Author information

1 Department of Medicine, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada. shiphra.ginsburg@utoronto.ca

Abstract

OBJECTIVES:

This study aimed to explore faculty attendings' scoring and opinions of students' written responses to professionally challenging situations.

METHODS:

In this mixed-methods study, 10 pairs of faculty attendings (attending physicians in internal medicine) marked responses to a professionalism written examination taken by 40 medical students and were then interviewed regarding their scoring decisions. Quantitatively, inter-rater scoring agreement was calculated for each pair and students' global scores were compared with a previously developed theoretical framework. Qualitatively, interviews were analysed using grounded theory.

RESULTS:

Inter-rater reliability in scoring was poor. There was also no correlation between faculty's scores and our previous theoretical framework; this lack of correlation persisted despite modifications to the framework. Qualitative analysis of faculty attendings' interviews yielded three major themes: faculty preferred responses in which students expressed insight, showed responsibility, and ultimately put the patient first. Faculty also expressed difficulty in deciding what was more important (the behaviour or the rationale behind it) and in assigning numerical scores to students' responses. Interestingly, they did not downgrade students for mentioning implications for themselves as long as these were balanced by other considerations.

CONCLUSIONS:

This study attempted to overcome some of the instability that results when we judge behaviours, by making the rationales behind students' behaviours explicit. However, between-faculty agreement was still poor. This reinforces concerns that professionalism, as a subtle and complex construct, does not reduce easily to numerical scales. Instead of concentrating on creating the 'perfect' evaluation instrument, educators should perhaps begin to explore alternative approaches, including those that do not rely on numerical scales.


PMID: 19422488
DOI: 10.1111/j.1365-2923.2009.03335.x

