Reading between the lines: faculty interpretations of narrative evaluation comments (Med Educ, 2015)

Shiphra Ginsburg,1 Glenn Regehr,2 Lorelei Lingard3 & Kevin W Eva2






INTRODUCTION


There are many circumstances in health professions education in which narrative commentary on a trainee’s performance plays an influential role. For example, on ward-based in-training evaluation reports (ITERs), comments also serve more evaluative purposes, such as communicating to the programme director information that can support decisions about promotion and remediation.2


These uses of experts’ subjective, narrative comments regarding trainee performance – described by Hodges as ‘post-psychometric’ approaches to evaluation3 – have recently been put forward as ‘indispensable for trustworthy decision making in summative assessments’.4


The complexity of interpreting narrative comments is well documented in medical education. In one study of deans’ letters for applicants applying to an emergency medicine residency programme, researchers concluded that the word ‘good’ was actually a code word for ‘below average’. Another study of application packages to a radiology residency found the word ‘excellent’ was never used by medical schools with reference to the top category of students and, for more than half the schools, an ‘excellent’ student could be in the bottom half of the class.6


It appears that ITER comments may have value in predicting performance or need for remediation.8,9 In combination, such studies suggest that there may be a relatively well-understood ‘hidden code’ involved in writing and deciphering assessment language.


Theory from the branch of linguistics known as pragmatics can help us understand how features of language beyond literal meaning are used for communication. Well-known examples of non-literal communication, which is common in English, include irony, sarcasm and metaphor.10 The ability to correctly interpret these non-literal meanings depends heavily on context, including awareness of who is speaking, to whom, in what tone of voice, in what setting, for what purpose, and so forth.



METHODS


The data collected for the analysis described here were generated during interviews of participants immediately after they had completed a ‘narrative ranking’ task which is described in full in Ginsburg et al.7



Materials


Each resident in our IM programme receives approximately eight or nine ITERs per year, each of which contains 19 items rated on a 5-point scale and a box for free-text comments that asks the person completing the ITER to: ‘Provide a general impression of the trainee’s development during this rotation, including general competence, motivation and consultant skills. Please emphasise strengths and areas that require improvement.’


The 63 PGY1 and 63 PGY2 documents were separately assigned to 12 packages of 15 or 16 documents each so that no two packages were alike and each document appeared in three packages. The decision to include 15 or 16 documents for each rater was based on previous work indicating that this is a reasonable number of narratives to categorise and rank-order within a timeframe considered appropriate by participants.7 The decision to use three raters per resident document for each PGY set resulted in a required sample size of 24.
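To make the arithmetic of this design concrete, the sketch below is an illustration only: the study reports the constraints (three raters per document, packages of 15 or 16, no two packages alike), but the actual allocation procedure is not described, so the construction and randomisation here are assumptions.

```python
# Illustrative sketch of the rating-package design described above (assumed
# construction; only the constraints come from the text).
import random
from itertools import combinations

N_DOCS, N_PACKAGES, COPIES = 63, 12, 3      # per PGY cohort

# Arithmetic behind "15 or 16 documents" and "a required sample size of 24":
placements = N_DOCS * COPIES                # 63 x 3 = 189 document placements
base, extra = divmod(placements, N_PACKAGES)
print(base, extra)                          # 15 remainder 9 -> nine packages of 16, three of 15
print(N_PACKAGES * 2)                       # 12 packages/raters per cohort x 2 cohorts = 24 raters

# One way to realise the constraints: give each document three distinct
# packages, always choosing among the least-loaded packages at random.
def build_packages(seed=0):
    rng = random.Random(seed)
    packages = [set() for _ in range(N_PACKAGES)]
    for doc in range(N_DOCS):
        order = sorted(range(N_PACKAGES), key=lambda p: (len(packages[p]), rng.random()))
        for p in order[:COPIES]:
            packages[p].add(doc)
    return packages

packages = build_packages()
assert sorted(len(p) for p in packages) == [15] * 3 + [16] * 9
assert all(a != b for a, b in combinations(packages, 2))   # no two packages alike
assert all(sum(doc in p for p in packages) == COPIES for doc in range(N_DOCS))
```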




Participants and procedure


To be included in the study, physician participants were required to have attended an in-patient IM service at any of our university’s teaching hospitals and to have at least 2 years of experience in evaluating residents. This led to a list of approximately 60 eligible faculty attendings, from which we recruited 24 attending physicians. The resulting sample contained 14 men and 10 women, with an average of 9.3 years of experience (range: 2–33 years).


In a one-to-one setting, participants were oriented to the four categories describing residents’ performance that were developed in a previous study:

      • A = outstanding, excellent, exemplary; 

      • B = solid, safe, may need some fine tuning; 

      • C = borderline, bare minimum, remediable, and 

      • D = unsafe, unacceptable, multiple deficits.7,11

Their first task was to categorise the 15 or 16 residents in their package by placing as many in each category as they wished. They were then asked to rank-order the residents within each category.
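As a concrete way to picture the output of this two-step task, the short sketch below represents one rater's judgements as per-category rankings and flattens them into a single overall order (best first). The resident identifiers and the flattening step are illustrative assumptions, not part of the study protocol.

```python
# Illustrative representation of one rater's categorise-then-rank-order output.
# Resident IDs and the flattening into a single list are assumptions.
from typing import Dict, List

CATEGORY_ORDER = ["A", "B", "C", "D"]   # A = outstanding ... D = unsafe

def overall_rank(judgement: Dict[str, List[str]]) -> List[str]:
    """Flatten per-category rankings into one overall rank order, best first."""
    ranked: List[str] = []
    for category in CATEGORY_ORDER:
        ranked.extend(judgement.get(category, []))   # within-category order preserved
    return ranked

# One hypothetical package, already categorised and rank-ordered by a rater:
judgement = {
    "A": ["R07", "R12"],
    "B": ["R03", "R15", "R01", "R09"],
    "C": ["R06"],
    "D": [],
}
print(overall_rank(judgement))   # ['R07', 'R12', 'R03', 'R15', 'R01', 'R09', 'R06']
```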


Subsequent to this process, each participant was interviewed by a single research assistant, who had qualitative research experience in education but was not involved in any way with our residency programme and was thus unknown to participants. One pilot interview was co-conducted with the lead author, but because no changes were made to the protocol afterwards, this interview was included in our dataset. During each semi-structured interview, participants were asked about the ranking process, how they had decided to place the residents in the four categories and rank-order them, how they had made cut-point decisions (i.e. how they had decided whether to place a resident at the bottom of one category or at the top of another), and what language in the comments had influenced their decisions. They were also asked to provide comments on the ITERs in general. The entire task took approximately 90 minutes per participant and the interview portion lasted 15–30 minutes. Interviews were audiotaped, transcribed and anonymised.



Analysis


The transcripts were analysed using principles of constructivist grounded theory.12 As sensitising concepts, we considered that participants may have been influenced by such factors as the strength of adjectives used, the mention of particular competency domains, and the presence of ‘lukewarm’ language that may be interpreted negatively.13 SG conducted the primary analysis using a line-by-line approach to identify codes that were then grouped into themes. We used a constant comparative approach to coding in an iterative fashion, whereby each transcript was read numerous times to look for confirming or disconfirming examples in a process that continued until the coding structure appeared stable and sufficient (i.e. until no new codes emerged after multiple reads).14 The codebook (the coding framework with definitions and examples) was then presented to three other members of the research team along with several uncoded transcripts. Each team member read the transcripts before reviewing the codebook and provided critical feedback on the codes and their interpretation. No substantive changes to the coding were made during this process; rather, feedback was used to further clarify and define existing codes. NVivo Version 10.0 (QSR International Pty Ltd, Melbourne, Vic., Australia) was used to organise the data and facilitate coding.




RESULTS


Analysis of the 150 pages of interview transcripts resulted in several themes that provide a framework for understanding how participants came to their rank-ordering and categorisation judgements. The overarching theme, which explains how participants read and interpreted the ITER comments, we called ‘reading between the lines’.




Reading between the lines



All participants either directly or indirectly expressed a need to read between the lines when attempting to understand narrative comments:


The word ‘interpret’ or variations thereof were common in participants’ responses:


Participants also noted euphemisms:



Some commented that what appears to be ‘good’ is actually ‘bad’:



The data abounded with such descriptions of how language should not be taken at face value and that the real meaning was implicit:



In sum, these examples demonstrate that language was not taken at face value and that there is an implicit code that was shared, with participants ‘translating’ words consistently based on their past experiences with similar comments.


The specific factors that fed into this code will be explored below. Beforehand, however, it is important to note that the decoding of comments was an active process in which participants indicated that they sought particular language cues. They frequently mentioned scanning for ‘red flags’, both positive and negative, to help them find the relevant cues in a sea of comments.

Numerous red flag words or phrases were consistently identified by participants, suggesting either potential problems (e.g. ‘good’, ‘solid’) or superstars (‘exemplary’, ‘chief resident material’). Table 2 shows further examples.
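As a toy illustration of this scanning strategy (not a method used in the study), the snippet below looks for the flag phrases quoted above in a comment; the substring matching and the sample comments are assumptions made purely for illustration.

```python
# Toy sketch of scanning an ITER comment for 'red flag' phrases.
# Flag phrases are the examples quoted in the text; matching is naive substring search.
NEGATIVE_FLAGS = ["good", "solid"]                         # may signal potential problems
POSITIVE_FLAGS = ["exemplary", "chief resident material"]  # may signal a superstar

def scan_for_flags(comment: str) -> dict:
    """Return the positive and negative flag phrases found in a comment."""
    text = comment.lower()
    return {
        "positive": [p for p in POSITIVE_FLAGS if p in text],
        "negative": [p for p in NEGATIVE_FLAGS if p in text],
    }

print(scan_for_flags("A solid resident with a good knowledge base."))
# {'positive': [], 'negative': ['good', 'solid']}
print(scan_for_flags("Exemplary performance; chief resident material."))
# {'positive': ['exemplary', 'chief resident material'], 'negative': []}
```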




Specific factors influencing judgements


As participants read between the lines of the comments, several specific factors in addition to the language cues described above appeared to influence their judgements (Table 1).




Consistency



Participants regularly reported being influenced by the consistency of the comments: every interview contained multiple references to consistencies over time, across different rotations and evaluators, or across domains.


References to ‘multiple evaluators’ and ‘in every single rotation’ signalled consistent performance. Consistency of performance across domains was also important to participants:


Although the presence of consistently positive comments across rotations and domains was interpreted favourably, participants’ interpretations of inconsistency in comments varied. Inconsistencies were a concern for some participants, to whom they suggested that the resident might be weaker overall.


Exactly how inconsistencies should be reconciled could be a source of tension. For example, one participant struggled to explain why he or she did not put a resident in category D despite negative comments, finally conceding ‘...because someone thinks they’re really good’ (I2).




Competency domains


The domain of competency featured in a comment was also influential to participants’ interpretation and ranking judgements. Comments about knowledge were specifically viewed as markers of excellence, illustrated in the representative assertion that ‘...you can’t be an A without outstanding knowledge’ (I14). Conversely, ‘conspicuous absences’ related to knowledge raised suspicions, particularly if a resident had received comments about how hard he worked but none about his knowledge base. Indeed, sometimes comments about the ‘implicit competency’15 of work ethic (‘hardworking’, ‘great effort’, etc.) were interpreted as ‘those nice things you say about everyone’ (I21) and thus were thought to be particularly unhelpful. However, knowledge was not always the primary trigger for categorising.




Specificity of comments



More specific and detailed comments were interpreted as signs that the writer really knew and had spent time with the resident; therefore, these comments were seen as more credible and carried more weight:



By contrast, generic comments were seen as less credible and were perceived as suspect. Participants felt that they could have been written about anybody and thus did not convey any useful information.



The dislike of generic language may explain the strategy of scanning for red flags. For some, generic comments led to further reading between the lines, potentially resulting in a negative interpretation:



Quantity


Interviewees often remarked on the quantity of comments for a given resident, but seemed to regard this as an indication of the credibility of the comments rather than of resident quality, as lengthy comments were seen for both outstanding and problematic residents. Longer comments gave the impression that greater effort had gone into writing them; therefore, they could be interpreted as indicating how well the resident was known by the writer or how much effort the writer had been willing to expend.



Contextual factors


Three important contextual factors that influenced the interpretation of ITER comments arose: evaluator identity, rotation type, and timing.


Regarding evaluator identity, many participants noted that the style of writing might differ markedly between different attending doctors: some write more, others less; some use flowery language, others are more terse; some use superlative adjectives, others do not. As the evaluator was not known, and it was unlikely that the same person had written more than one comment for a given resident, participants found this a frustrating aspect of the research task:



Many participants also felt that knowing the rotation type was essential to their interpretation. Comments derived from a general internal medicine (GIM) rotation carried more weight than comments obtained from a subspecialty, especially those for which attending blocks are shorter:



The third contextual factor was timing. For example, many participants thought it was important to note the time of year at which certain comments were derived:


Lack of improvement over time might suggest a lack of insight on the resident’s part. By contrast, participants recurrently commented on the use of verbs indicating change (e.g. ‘improving’, ‘developing’, ‘continues’, ‘evolving’) as implying a negative characteristic of resident performance:




General comments about ITERs


Many considered the ITER to represent a means of providing formative feedback so that residents could continue to improve. The word ‘feedback’ arose repeatedly in discussions of the purpose of the ITER. Others noted that, in practice, attending physicians provide much more constructive feedback during the rotation or in a discussion setting and do not necessarily document everything on the form, viewing the ITER as purely summative, a ‘final judgement’ of a resident’s performance.




DISCUSSION


Despite the apparent existence of shared decoding strategies, the use of coded language was not unproblematic. Our participants claimed to struggle with interpreting vague and generic comments, often focused on the resident’s disposition, thereby echoing a study by Lye et al.,16 in which the authors found that the single most common phrase in paediatric clerkship evaluations was ‘pleasant/a pleasure to work with’, a result they considered alarming for its irrelevance to success as a medical student. In that study, comments related to specific clinical skills were found only 31% of the time.16 Similarly, Ginsburg et al., in a content analysis of written comments on IM residents’ ITERs, found that comments about a resident’s ‘attitude or disposition’ were common, along with other commentary not linked directly to competencies.15 The problems associated with the writing of vague, dispositional comments that are subject to (mis)interpretation are not unique to medicine and can be found elsewhere in higher education.17–19


However, as others1 have noted, and as our data show, it is likely that the ITER is serving multiple purposes simultaneously, some of which may involve considerable social complexity. One potential social purpose may be to attend to residents’ ‘face’ (i.e. the positive image a person has of him or herself). According to theories of politeness,20 by emphasising positive skills that are perceived to be of great value to the team – such as being hardworking, pleasant to work with and possessing ‘those other basic qualities that, if you’re a good person, you get’ – faculty attendings may be allowing residents to ‘save face’, or to maintain or enhance their positive self-image. It is possible that faculty members are able to do this because they believe readers share the code for interpreting their comments accurately, and thus they can attend to residents’ face while still sending their intended message.


A second politeness concept that may be relevant here is known as ‘conventional indirectness’ and refers to the use of phrases that, by virtue of convention, ‘have contextually unambiguous meanings which are different from their literal meanings’.20 This can explain why words and phrases such as ‘good’, ‘solid’ and ‘meets expectations’ are understood as intending to convey performance that is borderline or below average without requiring the attending doctor to actually use those undesirable terms.5,6 Although these meanings seem clear to physician readers, it is important to note that residents’ interpretations of these terms are unknown. If residents take the terms at face value, they may not appreciate the degree to which their performance could be improved. If they do not take the terms at face value, the cost of their understanding the code may be the loss of ‘face’ that faculty members seek to help them preserve.


In either case, the data collected in this study clearly indicate that, although it is generally useful, the code is not universal and is difficult to decipher without a full understanding of the author’s context.


Our participants picked up on language cues indicating areas for improvement (or previously enacted change) as reflective of a weaker performance overall. This of course raises issues for resident education. If there is a hidden code but it is imperfectly understood and applied, a resident might look bad if she doesn’t improve, but equally bad if she does. More to the point, she could look bad if her improvement is documented. This highlights the problem of a misalignment between the intended purpose of the ITER and its actual (or perceived) use.




For any assessment instrument, such misalignment can increase the risk for ‘arbitrary judgement’22 and thus it is critically important to understand how the instrument is actually being used and interpreted. Indeed, our participants expressed concern that without knowing who the supervisor was (and whether, for example, he or she documents areas for improvement for all trainees), they were not certain how to interpret these ‘balanced’ comments.


This leads to consideration of an additional concept from linguistics that may help to explain the frustration expressed by attending physicians over not having full knowledge of the context in which the comments arose. Linguistic pragmatists have labelled the idea that contextual information is necessary to understand the meaning of certain words and phrases as ‘deixis’. One type of deixis refers to knowledge of the person, place or time as essential for understanding a narrative.23 Our participants routinely expressed a desire for more information along these lines and felt that, in its absence, they were unable to properly assess the comments. However, this may speak more of their confidence in rank-ordering the residents than of their actual abilities to do so (i.e. these ‘deictic markers’ may represent a perceived necessity rather than an actual need).



In sum, the multiple apparent purposes expected of ITER comments, the idiosyncratic faculty writing styles, and the absence of what is felt to be key information in many ITER comments make it surprising that participants, as demonstrated in previous work, were able to reliably rank-order residents based on comments alone.7 Their strategy of reading between the lines and decoding the written comments appears to have been remarkably consistent across participants.





CONCLUSIONS



Participants’ ability to ‘read between the lines’ explains how they made sense of written comments and how they were able to effectively categorise residents. However, this strategy also suggests a mechanism whereby variable interpretations can easily arise, particularly when contextual information is missing and inferred.


4 Govaerts M, van der Vleuten CPM. Validity in work-based assessment: expanding our horizons. Med Educ 2013;47:1164–74.


7 Ginsburg S, Eva KW, Regehr G. Do in-training evaluation reports deserve their bad reputations? A study of the reliability and predictive ability of ITER scores and narrative comments. Acad Med 2013;88:1539–44.


12 Charmaz K. Coding in grounded theory practice. In: Constructing Grounded Theory: A Practical Guide Through Qualitative Analysis. London: Sage Publications 2009;42–71.


13 Frohna A, Stern D. The nature of qualitative comments in evaluating professionalism. Med Educ 2005;39:763–8.







Med Educ. 2015 Mar;49(3):296-306. doi: 10.1111/medu.12637.

Reading between the lines: faculty interpretations of narrative evaluation comments.

Author information

1 Department of Medicine, University of Toronto, Toronto, Ontario, Canada.

Abstract

OBJECTIVES:

Narrative comments are used routinely in many forms of rater-based assessment. Interpretation can be difficult as a result of idiosyncratic writing styles and disconnects between literal and intended meanings. Our purpose was to explore how faculty attendings interpret and make sense of the narrative comments on residents' in-training evaluation reports (ITERs) and to determine the language cues that appear to be influential in generating and justifying their interpretations.

METHODS:

A group of 24 internal medicine (IM) faculty attendings each categorised a subgroup of postgraduate year 1 (PGY1) and PGY2 IM residents based solely on ITER comments. They were then interviewed to determine how they had made their judgements. Constant comparative techniques from constructivist grounded theory were used to analyse the interviews and develop a framework to help in understanding how ITER language was interpreted.

RESULTS:

The overarching theme of 'reading between the lines' explained how participants read and interpreted ITER comments. Scanning for 'flags' was part of this strategy. Participants also described specific factors that shaped their judgements, including: consistency of comments; competency domain; specificity; quantity, and context (evaluator identity, rotation type and timing). There were several perceived purposes of ITER comments, including feedback to the resident, summative assessment and other more socially complex objectives.

CONCLUSIONS:

Participants made inferences based on what they thought evaluators intended by their comments and seemed to share an understanding of a 'hidden code'. Participants' ability to 'read between the lines' explains how comments can be effectively used to categorise and rank-order residents. However, it also suggests a mechanism whereby variable interpretations can arise. Our findings suggest that current assumptions about the purpose, value and effectiveness of ITER comments may be incomplete. Linguistic pragmatics and politeness theories may shed light on why such an implicit code might evolve and be maintained in clinical evaluation.

PMID: 25693989

DOI: 10.1111/medu.12637

