A philosophical history of programmatic assessment: tracing shifting configurations (Adv Health Sci Educ Theory Pract. 2021)

Meded. 2021. 11. 18. 05:13

A philosophical history of programmatic assessment: tracing shifting configurations
J. Pearce · W. Tavares

Introduction: why a philosophical history?

Programmatic assessment has become ubiquitous in health professions education. The academic literature on the topic is abundant, and medical schools and specialist training colleges are increasingly implementing programmatic assessment in full, or incorporating aspects of the approach into their assessment frameworks. A programmatic approach in health professions education optimises assessment at a programme level by combining assessment evidence. It removes pass/fail decisions from individual assessments (treated as datapoints), ensures that learners are assessed (and given constant feedback) with a variety of methods over time, and requires high-stakes decisions regarding competence to be made by expert judges upon reviewing accumulated evidence. However, as will become clear, this is merely a working definition of programmatic assessment in order to set the scene. We argue that programmatic assessment is a fluid concept, and what constitutes programmatic assessment has changed over time.

Proponents recount narrative stories about the emergence of programmatic assessment from earlier forms of psychometric testing, and outline problems and tensions it has solved (Schuwirth & van der Vleuten, 2019). Medical education textbooks now feature chapters on programmatic assessment (Van der Vleuten et al., 2017, 2020). The 2020 Ottawa Conference prepared its first ‘consensus statement’ on the topic. In short, we have entered an era of assessment in medical education constituted by an almost unquestioned programmatic approach. Approaches to assessment that are not programmatic have become, in a sense, tendentious. Programmatic assessment has become an entrenched idea, now regulating assessment practice. This allows us to reflectively ask when it first emerged and how it evolved into the form we know today.

Purpose and structure of the study

This paper takes a step back from engaging with programmatic principles or operational approaches, as these issues are dealt with elsewhere (Van der Vleuten et al., 2015, 2017; Wilkinson & Tweed, 2018). Instead, we choose to review programmatic assessment from a meta-philosophical and historiographical perspective. We identify three relatively distinctive phases in the history of programmatic assessment:

emergence,
evolution and
entrenchment.

We draw out the philosophical shifts that seem to be occurring between phases, and also examine philosophical issues at moments that appear to be driving shifts in the trajectory. Here, we are referring to non-teleological changes in the idea itself: how it evolved organically without a necessary endpoint or goal.

Relevance of the study

Although methodological approaches to assessment are routinely discussed in the literature, less attention has been given to the philosophical assumptions and commitments that shape assessments. When philosophical positions remain implicit, methodologically focused debates ensue. Given the role philosophical positions have in shaping an understanding of assessment beyond what methods are used to generate data (Tavares et al., 2019), a philosophical probing of programmatic assessment and its history is timely. In tracing the historical trajectory of programmatic assessment, we draw out the shifting, implicit philosophical positions within it.

Basis and justification of the approach

Although a complete and fine-grained history of programmatic assessment from multiple perspectives (e.g., intellectual, social, cultural, economic, political) would be valuable, we wish to proceed from a different historiographical perspective. Rasmussen notes that traditional approaches in the history of science have typically traced the development of theories, focussing on progressive solutions (such as inventions and discoveries) and answers (such as theories and models) to questions within a discipline (Rasmussen, 1997).

Jardine offers an alternative methodology for approaching the history of scientific disciplines, which instead focuses on tracing the shifting questions, problems, practices and presuppositions of inquirers (Jardine, 2000). Jardine’s object of interrogation is the shifting ‘scene of inquiry’:

What questions are being asked by inquirers?
What are the problems that concern them?
What practices, methods and techniques do they draw upon to solve these problems?
What underlying philosophical presuppositions guide inquiry?

These subtleties can get lost over time, be misinterpreted, or be taken up in unintended ways. And yet these subtleties are precisely what we will be analysing in this history of programmatic assessment.

Following the methodology of Jardine, we conduct a historiographical study of programmatic assessment through a meta-philosophical lens, focussing on questions, problems, practices and philosophical presuppositions of inquirers. This methodological approach draws on the spirit of the intellectual tradition of historical epistemology. Historical epistemology is an investigation into “the historical conditions under which, and the means with which, things are made into objects of knowledge” (Rheinberger, 2010, p. 2). Fundamentally, historical epistemology posits that in order to understand what science is philosophically, we must first study its history from a critical perspective. This includes understanding the philosophical conditions and stimuli in a historical trajectory (Tavares et al., 2019).

Such an approach, grounded in perspectivism (Pearce, 2013), argues that there is no claim to unchanging, timeless, objective or absolute forms of rationality. The sciences (and scientific inquiry) become fragmented over time due to historical contingencies. The norms of scientific practice and the content of knowledge are dependent on long and sometimes convoluted histories, which arise due to specific philosophical priorities. We utilise these conceptions for programmatic assessment, underscoring the fluid, cultural practice of inquiry in relation to assessment.

Many aspects of the history of scientific inquiry have been probed from this perspective, including modern cosmology (J. Pearce, 2017), organic chemistry (Klein, 2003), the electron microscope (Rasmussen, 1997) and probability (Hacking, 1975). By way of a brief example, the emergence of the concept of dark matter, now a firmly entrenched physics phenomenon, took a convoluted trajectory. It was proposed in the 1930s to solve a mass discrepancy problem in galaxy rotation curves, but this ‘missing matter’ problem was not seen as problematic by the wider physics community until the 1970s (de Swart et al., 2017). The entrenchment of the idea was driven more by lines of inquiry in cosmological research, the fusion of particle physics and cosmology, and even social and economic factors such as increased investment in astronomy.

We engage with the history of programmatic assessment from our own perspective, carefully investigating the history through a meta-philosophical lens. We must stress that we are being entirely descriptive in this paper, not prescriptive. And in some cases, we must rely on our own inferences, as philosophical positions are not always explicit. Thus, we are only dealing with the ‘what’ and the ‘how’ in this philosophical history, rather than the ‘why’.

Our account is neither meant to be a complete history, nor an objective recount; both of which would go against the spirit of historical epistemology. We are also not conducting a systematic review with specific search protocols, mainly because assessment literature is not philosophically organized, nor is all literature related to programmatic assessment relevant in drawing out shifting configurations in the scene of inquiry. Instead, we used a narrative approach informed by a variety of sources.

In the words of Lingard, we need to excavate “the motivations that underpin” programmatic assessment (Lingard, 2009, p. 627). We do this by focussing on what key players in the history write, say, and do.

Our main sources will be primary literature in medical education journals. These sources are important records of a scene of inquiry (Jardine, 2000) as they detail the kinds of questions, problems and proposed solutions made by assessment scholars and practitioners.
Secondary sources, such as reflective works by proponents of the approach, will also be helpful in highlighting their presuppositions, motivational factors, and the cultural contexts in which they were working.
Where relevant, we will also draw on examples of conference presentations and proceedings as these provide an important record of new and emergent approaches to assessment.
Similarly, we will at times mention textbooks and educational resources as these objects iteratively create and subsequently consolidate a discipline (Badino & Navarro, 2013; Kragh, 2013).
Finally, more popular material, such as websites and online lectures, will also be mentioned, as it is an important indicator of the configurations of a scene.

Emergence: overcoming positivist ‘turmoil’ by shifting to a post-positivist perspective (pre-2005)

Configurations of the scene

Although the precursors to this story can be traced back to the mid-20th Century (stemming from conceptual frameworks concerning measurement approaches to assessment), we choose to limit our interrogation of this history to what is arguably the first emergence of the idea of programmatic assessment. In 2016, Cees van der Vleuten identifies his 1996 paper as the first emergence of programmatic thinking (Van der Vleuten, 2016). However, we note that some of these ideas were foreshadowed in two earlier papers (Norman et al., 1991; Van der Vleuten et al., 1991). In 1996, he begins by describing the current state of the scene of inquiry:

Educational achievement testing is an area of turmoil in the health sciences. Examinations are a constant source of problems for many teachers, curriculum designers and educationalists. The evaluation of student achievement is continually debated at educational meetings, conferences and workshops. It is an area in which tradition, personal values, and experiences tend to dominate discussions. On the other hand, the number of scientific publications on assessment over the last decade has exploded. The number of proposed instruments, each preferably using an intriguing acronym, is countless. (Van der Vleuten, 1996, p. 41)

The paper goes on to present assessment as an optimisation problem. Each assessment method has its own strengths and weaknesses. A utility formula is presented (and often cited), whereby the utility of any assessment method is a product of its reliability, validity, educational impact, acceptability and costs. The main argument is that no one single assessment method will be perfect on all quality criteria. Each assessment moment requires a compromise: trade-offs are inevitable, and decisions will be influenced by contextual factors.
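Written out, the multiplicative model reads as follows. This is a sketch only. The single-letter symbols are our own labels. We also assume that each factor is scored on a 0 to 1 scale where larger is better, so C stands for cost-efficiency rather than raw cost.

```latex
U = R \times V \times E \times A \times C,
\qquad R, V, E, A, C \in [0, 1]
```

The symbols are defined as follows:

- U is utility.
- R is reliability.
- V is validity.
- E is educational impact.
- A is acceptability.
- C is cost-efficiency.

Read literally, the product makes the compromise argument explicit. If any single factor falls to zero, U is zero however high the others are. Because no factor can exceed 1, strength on one criterion cannot make up for a collapse on another.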

The use of the word “turmoil” sketches a scene of confusion amongst practitioners and theorists alike. Medical educators were struggling to find the best assessment methods that would enhance reliability without sacrificing validity and authenticity. It was an era of testing, based in an ever-expanding toolkit of assessments. There were problems such as

context specificity,
challenges to objectivity,
concerns about limited data from some forms of assessment,
spurious data (such as false-positives in test scores), and
concerns relating to the combination of traits in tension with the complex nature of clinical competence.

These problems were those of psychometricians, and in the language of scientific inquiry, these problems were based on positivist principles. But the failure to solve these problems resulted in a tumultuous scene of inquiry that was primed for a novel approach.

We identify Cees van der Vleuten and Lambert Schuwirth’s 2005 paper ‘Assessing professional competence: from methods to programmes’ as a signature moment in the emergence of programmatic assessment (Van der Vleuten & Schuwirth, 2005). At our time of writing, the paper has been cited 1237 times. In 2016, van der Vleuten noted that the paper represented a landmark in his thinking around assessment (Van der Vleuten, 2016). The authors’ explicit aim is to shift the focus away from treating assessment as a “measurement problem”. They talk about “programmatic instructional design” and argue that “assessment is an educational design problem that needs a programmatic approach”. They begin with the utility model, discussing the different aspects of assessment identified in the 1996 paper. They then present a compelling proposition, that “we should not evaluate individual methods, but provide evidence of the utility of the assessment programme as whole” (Van der Vleuten & Schuwirth, 2005, p. 309).

Philosophical presuppositions

During this phase we observe arguments and shifts that place psychometrics and methods at the core, without necessarily making explicit shifts in underlying philosophical positions concerning assessment. That is, programmatic assessment becomes associated with a diversity of methods mainly in order to “measure” competence fully, rather than providing any new ontologies or epistemologies, even though the opportunity for these shifts starts to present itself. For instance, in 2004, Schuwirth and van der Vleuten situate what would become programmatic assessment in a conceptual space dominated by concerns about the validity and reliability of assessment methods and the measurement of competence. Although they did not use the term ‘programmatic’, they pushed the notion that “any good assessment programme consists of a variety of methods” (Schuwirth & van der Vleuten, 2004, p. 975).

Many of the 2005 arguments remain grounded in psychometric and measurement-based thinking, even if the authors explicitly aim to overcome psychometric problems with novel approaches. For example, they argue that reliability is effectively a sampling problem: they collate empirical evidence from eight different studies to demonstrate that reliability estimates all increase with testing time across different assessment formats. One conclusion they draw is that less-structured or standardised assessments “can be entirely or almost as reliable as other more structured and objective measures” (Van der Vleuten & Schuwirth, 2005, p. 312, emphasis in original). Sampling is singled out as the overarching principle to achieve a reliable measure of competence. This is the basis for the proliferation of the pixel metaphor which has come to dominate introductory presentations to programmatic assessment.
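The sampling argument can be made concrete with the Spearman-Brown prophecy formula, a standard classical-test-theory result. We do not claim it is the calculation behind the eight studies cited in the 2005 paper. The minimal Python sketch below makes two assumptions: a hypothetical reliability of 0.5 for one hour of testing, and additional hours that behave as parallel (interchangeable) samples.

```python
"""Spearman-Brown projection: how reliability grows as more is sampled.

Illustrative only: the 0.5 starting reliability is a hypothetical value,
not a figure reported by Van der Vleuten & Schuwirth (2005).
"""


def spearman_brown(rho: float, k: float) -> float:
    """Projected reliability when the amount of sampling is multiplied by k.

    rho: reliability of the original (unit-length) assessment, in [0, 1].
    k:   lengthening factor (e.g. 2.0 = twice the testing time or cases).
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if k <= 0:
        raise ValueError("k must be positive")
    return k * rho / (1 + (k - 1) * rho)


def lengthening_factor(rho: float, target: float) -> float:
    """Factor k by which sampling must grow to move reliability from rho to target."""
    if not 0.0 < rho < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("rho and target must lie strictly between 0 and 1")
    return target * (1 - rho) / (rho * (1 - target))


if __name__ == "__main__":
    base = 0.5  # hypothetical reliability of one hour of testing
    for hours in (1, 2, 4, 8):
        print(f"{hours} h: {spearman_brown(base, hours):.2f}")
    # Prints 0.50, 0.67, 0.80, 0.89: each doubling helps, with diminishing returns.
    print(f"hours needed for 0.80: {lengthening_factor(base, 0.80):.1f}")
```

The formula presupposes a stable true score on which additional samples converge. This is the same measurement-based assumption that the discussion here identifies beneath the sampling argument.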

The argument presented on the issue of validity takes a post-positivist, post-psychometric perspective. It criticizes the tendency towards reductionism in assessment and holds that competence is a multifaceted phenomenon based on a complex integration of skills and abilities, which cannot be broken down into discrete packets of sub-competencies. They argue that good quality assessment requires information from multiple sources and from a variety of methods, and the constructing of overall judgments “by triangulating information across these sources” (Van der Vleuten & Schuwirth, 2005, p. 313). They note that one challenge for assessment developers of the future will be relying on qualitative sources of information and professional judgment in decision-making, while making this process “as rigorous as possible”.

Van der Vleuten and Schuwirth explicitly argue that practitioners in medical education assessment need to move beyond psychometric approaches. They note that the assessment literature of the time was overly geared towards methods and “too preoccupied with exclusively psychometric issues” (Van der Vleuten & Schuwirth, 2005, p. 315). Instead, they argue for a broadening of perspectives on assessment, noting that for the programmatic instructional design approach to work, “‘simple’ psychometric evaluation will not suffice” (Van der Vleuten & Schuwirth, 2005, p. 315). Their problem-solving move involved overcoming or circumventing problems through shifting perspectives rather than solving them with new specific practices or techniques. And yet, interestingly, their most powerful argument for overcoming psychometric approaches is one predicated on psychometric thinking: the issue of reliability and greater sampling. Although psychometric and measurement-based thinking are closely aligned, there are nuanced differences between the two. Ultimately, van der Vleuten and Schuwirth employ an argument grounded in the psychometric in order to overcome reductionist measurement-based tendencies.

There is also a deep philosophical issue relating to pragmatism in programmatic assessment at this time. By underscoring the utility of assessment programmes as a whole, pragmatist undertones emerge. Yet no explicit consideration is given to underlying assumptions that may shape views beyond purpose as a precursor to utility. The way that utility is treated in relation to validity is underdeveloped and scantly considered. The primacy of validity in assessment appears to be downplayed. Referring to elements in the utility model, the authors note that the weighting of criteria would depend on “a specific user in a specific situation” (Van der Vleuten & Schuwirth, 2005, p. 309). However, there is no treatment of how what is valued can vary, and can be influenced, in some cases insidiously, by external factors. Political, conceptual, even economic factors can influence the value placed on utility criteria.

Further, while advocating for a shift from “individual methods to programs”, the founders still seem to emphasize methods, just packaged differently. They advocate for more qualitative judgment in assessment, facilitating a departure from measurement-based approaches. But this is not entirely the case. There remain in these early papers some positivist ontological positions on the nature of competence. For instance, there are underlying assumptions about how we can know the nature of competence through a latent structure, only now the argument is that we can better capture this through a program (a collection of methods) because of the inherent psychometric limitations of any one method.

Even the use of qualitative data, the de-emphasis on objectivity, and the call for “rigorous judgment” do not necessarily abandon (as an example) a latent variable construct position. Multiple sources of information are triangulated to construct an overall judgment, but this occurs in the service of psychometric imperatives, such as reliability through sampling. This is substantially different to the constructivist philosophical ontology that is highlighted later in the history of programmatic assessment.

In summary, many of these philosophical presuppositions remain implicit in this phase of programmatic assessment.

The emphasis on judgment, for example, says nothing about shifts in ontological or epistemological positions in assessment, nor whether earlier assumptions from the assessment literature were being upheld.
Adopting a “qualitative approach that continues to accumulate information until saturation is reached and a decision becomes trustworthy and defensible” is not a direct claim on the nature of what those judgments or claims are about (Van der Vleuten & Schuwirth, 2005, p. 315).

Abandoning, or at least minimizing, the emphasis on statistical psychometrics in favour of professional judgment is also a methodological choice and/or justification strategy. It is not necessarily an explicit philosophical view of what assessment is when interrogated from different views. This may explain why tensions later emerged when programmatic methods were adopted in practice without an appreciation of how they align with particular ontological and epistemological pairings. It is also unclear at this time whether the move to ‘utility of the assessment’ is a pragmatic consideration or one built on substantive philosophical pragmatism.

Upshots

This provided the opportunity for a major turning-point in the philosophical presuppositions we see in the history of programmatic assessment. The continued emphasis on considerations of methods and on sampling for reliability, together with the lack of explicit treatment of philosophical positions, implies that the scene had not entirely shifted. Yet after 2005, the word ‘programmatic’ increasingly appeared in the literature. The 2005 paper neatly packaged a range of approaches to assessment and thinking around instructional design. We acknowledge that, aside from the post-positivist mindset, there are more nuanced or blended philosophical tones appearing in the 2005 paper. For instance, impacts on learning and instructional design elements are presented there. Nevertheless, we contend that it is the intuitive argument based on sampling that was the main driver of the piece.

This argument also brought with it the greatest impact, appearing time and again in the oft-quoted ‘pixel metaphor’: a powerful demonstration of the ‘image resolution’ of a candidate’s “true” competence based on the accumulation of assessment data. Countless conference presentations since have used a pixel metaphor to demonstrate the benefits of sampling assessment information (for an online example, see Van der Vleuten, 2015). In a series of frames, a single pixel becomes multiple pixels and gradually resolves into the famous painting of the Mona Lisa, representing the emergence of a “true” image to be discovered (consistent with positivist assumptions). The metaphorical argument is highly intuitive, and it quickly resonated with medical educators. The message here is that more data equates to a better picture. In a sense, this becomes a surrogate for validity, although there is no clear articulation of the nature of competence in this metaphor. Placed against the pixel metaphor, practices based on psychometric thinking came to be seen as outdated (Crossley, 2006; Hodges, 2013).

2005년은 프로그램 평가의 역사에서 가장 중요한 순간이지만, 등장 이후 아이디어는 빠르게 변형되었고, 새로운 양상을 띠었으며, 다른 요소들이 강조되었다. 이제 2단계에서 조사 현장의 구성을 설명하는 것으로 전환하고 2005년 논문(및 그 업쇼트)이 의료 교육 평가 환경을 어떻게 빠르게 변화시켰는지를 강조하고자 한다.
Although 2005 represents a signature moment in the history of programmatic assessment, after its emergence the idea quickly morphed, took on new facets, and had different elements emphasised. We now shift to describing the configuration of the scene of inquiry in phase two, and highlight how the 2005 paper (and its upshots) rapidly changed the landscape of medical education assessment.

진화: 다양성과 학습이 강조되고, 구성주의/해석주의 진보 (약 2005-2013)
Evolution: diversity and learning underscored, constructivism/interpretivism advances (approx. 2005–2013)

씬(scene) 구성
Configurations of the scene

이 기간의 씬은 [프로그램 평가의 진화 궤적에 영향을 미친 다양한 평가 아이디어]로 특징지어진다. 실무자들이 해결하려고 시도했던 한 가지 주요 문제는 [다양한 유형의 평가 정보를 결합하는 방법]과 인간 판단의 역할이었다. 평가 프로그램에는 깔끔하게 축적되거나 합산될 수 없는 다양한 역량 평가 접근법이 포함되어 있다. 이는 특히 저부담 평가를 다루는 방식이 다양해지면서 더욱 부각되었다. 예를 들어, 2005년 말에 투고되어 2006년 10월에 게재 승인된 한 논문은 임상 수행 평가에 대한 더 넓은 관점을 요구한다(Govaerts et al., 2007). 저자들은 '프로그램적'이라는 용어를 사용하지는 않지만, 2005년 논문을 인용하며 "결과 기반 및 역량 기반 교육에 대한 강조가 커지면서 관련 역량을 통합하는 평가 방법이 선호될 것"이라고 언급했다(Govaerts et al., 2007, 페이지 240).
This period in the scene of inquiry is characterised by a range of assessment ideas that influenced the evolving trajectory of programmatic assessment. One main problem that practitioners were attempting to solve was how to combine different types of assessment information, as well as the role of human judgment. A program of assessment contains varied approaches to assessing competencies that could not be neatly accumulated or added together. This was particularly emphasised by more diversity in the way that low-stakes assessments were being treated. For example, one paper submitted in late 2005 and accepted in October 2006 pushes for broader perspectives on clinical performance assessment (Govaerts et al. 2007). Although the authors do not use the term ‘programmatic’, they cite the 2005 paper and note “the increasing emphasis on outcome-based and competency-based education is likely to favour assessment methods that integrate relevant competencies” (Govaerts et al. 2007, p. 240).

평가에서 후기-환원주의적 관행이 점점 더 옹호되고 있었다(Kim et al., 2006; Ma et al., 2012; Regehr et al., 2007). [평가 판단을 평정, 점수, 등급, 숫자로 변환하는 일반적인 관행]은 [프로그램적 프레임워크에서 의사결정에 필요한 정보의 풍부함을 유지하는 것]에 반하는 것으로 여겨지게 되었다(Schuwirth & van der Vleuten, 2011). 각 평가를 '해치워야(ticked-off)' 할 장애물로 보기보다는, 많은 주관적 판단이 모여 견고한 그림을 제공한다는 개념이 있었다. 이는 서로 다른 출처로부터의 다수의 표본 추출, 그리고 각 출처가 서로 다르더라도 의미가 있다는 주장에 의존했다. 정보의 '삼각측량' 개념(질적 연구에서 차용)과 평가 데이터를 패턴과 별자리(방법 간에 걸쳐, 그러나 역량 내에서)로 다루는 개념이 여기에 연결되었다. 인간의 판단은 "평가 프로세스의 중심(central in the assessment process)"으로 강조되었다(Schuwirth & van der Vleuten, 2011, 페이지 481). 이는 객관성과 순수한 측정의 지배에서 벗어나, 좀 더 주관적이고 구성되며 해석되는 무언가로의 전환을 더욱 분명히 보여주었다. 질적 접근법의 부상은 평가자 인지에 대한 연구처럼 다른 곳에서 옹호되던 평가 접근법과 시기적으로 맞물린 것으로 보인다(Gingerich 등, 2011; Govaerts 등, 2011).
Post-reductionist practices in assessment were increasingly being advocated (Kim et al., 2006; Ma et al., 2012; Regehr et al., 2007). Common practices of converting assessment judgments to ratings, scores, grades and numbers came to be seen as antithetical to maintaining the richness of information required to make decisions in a programmatic framework (Schuwirth & van der Vleuten, 2011). Rather than seeing each assessment as a hurdle to be ‘ticked-off’, there was a notion that many subjective judgments provide a robust picture. This relied on a multitude of samples from disparate sources, and on claims that each was meaningful even if different. Notions of ‘triangulation’ of information (borrowed from qualitative research), and treating assessment data as patterns and constellations (across methods but within competencies) were connected to this. Human judgment was underscored as “central in the assessment process” (Schuwirth & van der Vleuten, 2011, p. 481). This further signalled the shift away from the dominance of objectivity and pure measurement, to something more subjective, constructed and interpreted. The rise of qualitative approaches appeared to coincide with approaches to assessment that were being advocated elsewhere, such as research on rater cognition (Gingerich et al., 2011; Govaerts et al., 2011).

이 단계의 프로그램 평가 진화에서 나타난 주요한 혁신은 [학습에 대한 강한 강조]였다. 2011년, 슈비르트와 판 데르 블뢰텐은 "프로그램적 평가: 학습에 대한 평가(assessment of learning)에서 학습을 위한 평가(assessment for learning)로"를 발표했다. 이 논문은 의학교육 외부에서 확립된 평가 문헌을 활용했다. 그들은 assessment for learning을 "평가 과정이 교육 과정에 불가분하게 내재되어 있고, 정보가 최대한 풍부하며, 각 학생의 학습을 그 능력의 최대치까지 이끌고 육성하는 접근법"으로 제시한다(Schuwirth & van der Vleuten, 2011, 페이지 478). 이것은 프로그램 평가의 개념이 어떻게 발전하고 있었는지를 보여주는 분명한 지표이다.
A major innovation in this phase of the evolution of programmatic assessment was the strong emphasis placed on learning. In 2011, Schuwirth and van der Vleuten published: “Programmatic assessment: From assessment of learning to assessment for learning”. This paper drew on established assessment literature from outside of medical education. They present assessment for learning as “an approach in which the assessment process is inextricably embedded within the education process, which is maximally information-rich, and which serves to steer and foster the learning of each individual student to the maximum of his/her ability” (Schuwirth & van der Vleuten, 2011, p. 478). This is a clear indication of how the notion of programmatic assessment was evolving.

프로그램 평가의 개념이 처음 등장했을 때 이미 나타났던 '학습'이라는 요소는 이 시기에 더욱 성장하고 강조되었다. 학생에게 효과적인 피드백을 제공하는 것만으로는 충분하지 않았으며, 평가 프로그램은 "학생 개개인의 필요에 맞게 특별히 조정되어야" 했다(Schuwirth & van der Vleuten, 2011, 페이지 481). 학생의 학습과 진급을 둘러싸고 '맞춤형 조언', '교정(remediation)', '멘토', '치료적 결정', '예후적 결정' 등의 표현이 등장한다. 이 ['학습을 위한'] 요소는 프로그램적 평가와 명시적으로 연결되는데, 프로그램적 평가란 "평가 프로그램을 사용하여 다양한 출처의 정보를 수집하고 결합함으로써 학생 개개인의 강점과 약점에 대한 정보를 제공하고, 이를 통해 학습을 최적화하려는 정보가 풍부한 접근법"이다(Schuwirth & van der Vleuten, 2011, 페이지 482).

The learning element of programmatic assessment, which appeared in the original emergence of the notion, was nourished and underscored here. Effective feedback to students was not enough—assessment programmes needed to be “tailored specifically to the individual needs of each student” (Schuwirth & van der Vleuten, 2011, p. 481). Phrases such as “tailored advice”, “remediation”, “mentors”, “therapeutic decisions”, and “prognostic decisions” around student learning and progression appear. This ‘for learning’ element is linked explicitly to programmatic assessment, as “an information-rich approach in which a programme of assessment is used to collect and combine information from various sources to inform about the strengths and weaknesses of each individual student, with the purpose to optimise their learning.” (Schuwirth & van der Vleuten, 2011, p. 482).

Van der Vleuten 등의 2012년 논문은 "프로그래밍 평가의 실천을 위한 모델"을 제시한다(Van der Vleuten 등, 2012). 이 논문은 2011년의 학습 강조를 통합하여, 실제로 프로그램 평가가 어떻게 보일 수 있는지에 대한 가장 완전한 그림을 제시한다. 저자들은 "학습자의 성취, 선발, 진급에 대한 견고한 의사 결정과 더불어 "학습을 위한 평가"라는 목적"을 가진 모델을 제시한다. 그들은 학습자 성찰과 계획, 성찰 주변의 사회적 상호작용, 학습 과제가 평가 과제이고 마스터 과제에 대한 인증 데이터 포인트의 구성요소에 의해 연결된 다양한 훈련, 평가 및 지원 활동을 제시한다. 이들은 이 모델이 [가장 합목적적이고, 학습을 최적화하며, 데이터의 의미에 손상을 주지 않으며, 신뢰할 수 있고, 견고한 고부담 의사결정을 할 수 있다]고 주장한다(Van der Vleuten et al., 2012, 페이지 211).
A 2012 paper by van der Vleuten et al. presents “a model for programmatic assessment in action” (Van der Vleuten et al., 2012). This paper incorporates the learning emphasis from 2011 and presents the most complete picture of what programmatic assessment might look like in practice. The authors present a model that has the explicit “purpose of assessment for learning, with robust decision making on learners’ achievements, selection and promotion” (Van der Vleuten et al. 2012, p. 209). They present a range of different training, assessment and supporting activities, that are linked by components of learner reflection and planning, social interactions around reflection, learning tasks being assessment tasks, and certification data-points for mastery tasks. They argue that their model is optimally fit for purpose, optimises learning, makes no compromises on the meaningfulness of the data, and allows for credible and robust high-stakes decision-making (Van der Vleuten et al., 2012, p. 211).

이는 프로그램적 평가가 [샘플링을 통한 더 나은 신뢰도]에 관한 것에서 그 이상의 무언가로 전환되었음을 의미하며, 심지어 그 이전의 우선순위(priorities)를 탈-강조하기까지 했다. 모든 것을 아우른다는 이러한 주장(즉, 프로그램적 평가가 이중 목적을 가질 수 있다는 것)은 직관적으로 매력적인 개념적 주장이었고, 널리 받아들여진 것으로 보인다. 그러나 이러한 주장을 더 자세히 탐구한 후속 연구에서 드러났듯이, 경험적 근거는 여전히 제한적이었다(Heeneman 등, 2015). 2012년, Van der Vleuten 등은 [비용과 자원, 관료주의, 사소화와 환원주의, 법적 제약, 그리고 미지의 것] 등 프로그램적 평가가 직면할 몇 가지 과제를 예상했다(Van der Vleuten et al., 2012, 페이지 211–212). 그러나 그들은 또한 scene of inquiry에 열린 "여러 가지(manifold)" 기회와 "무한한 수의 연구 가능성"에 분명히 고무되어 있었다(Van der Vleuten et al., 2012, 페이지 212).
This signalled the shift of programmatic assessment from being about better reliability through sampling to something more, even de-emphasizing those earlier priorities. These claims of being all encompassing (i.e., programmatic assessment could be dual purposed) were intuitively appealing conceptual arguments and, it seems, widely taken up. However, empirical arguments were still limited, as evidenced by subsequent research that has explored these claims in more detail (Heeneman et al., 2015). In 2012, van der Vleuten et al. anticipate several challenges that programmatic assessment will be confronted with, such as costs and resources, bureaucracy, trivialisation and reductionism, legal restrictions, and the unknown (Van der Vleuten et al. 2012, pp. 211–212). But they are also clearly excited by the “manifold” opportunities and “infinite number of research possibilities” that have opened up in the scene of inquiry (Van der Vleuten et al., 2012, p. 212).

철학적 전제
Philosophical presuppositions

이 단계에서는 [질적 연구에서 informed된, 구성주의와 해석주의에 강하게 기초한] 사고가 분명하게 출현하였다. 비록 프로그램 평가의 초창기 도입시에도 이러한 개념들이 존재했더라도, 이 단계에서 이러한 요소들이 발전하고 강화되어 프로그램 평가의 구성주의 온톨로지의 기둥이 되었다. 이것이 단지 지엽적으로 교육 연구에서 일어나고 있는 일 때문인지 아니면 아마도 후기 실증주의 움직임에 대한 반응이었는지는 불분명하다. 예를 들어, 2007년에, 고바르트 외 연구진은 "인지, 동기 부여, 의사결정 이론의 요소들을 현장 기반 평가에 통합하는" "구성주의, 사회심리학적 관점"을 명시적으로 요구한다(Govaerts 외 2007, 페이지 252).

In this phase there was a clear emergence of thinking informed by qualitative research, strongly grounded in and informed by constructivism and interpretivism. Even if these notions were nascent in the original introduction of programmatic assessment, these elements advanced and were strengthened here, becoming pillars of programmatic assessment’s constructivist ontology. It is unclear whether this was solely due to what was happening peripherally in education research, or whether it was perhaps a reaction to the preceding post-positivist moves. For example, in 2007, Govaerts et al. explicitly call for a “constructivist, social-psychological perspective” that “integrates elements of theories of cognition, motivation and decision making” into workplace-based assessments (Govaerts et al. 2007, p. 252).

고바에르츠와 판 데르 블뢰텐은 이후 이를 "구성주의-해석주의 평가 프레임워크"로 제시한다(Govaerts & van der Vleuten, 2013). 본질적으로, 이 견해는 평가를 [사회적으로 구성되고 가치판단적인(value-laden) 것]으로 본다. 평가자는 평가 과정에 [자신의 신념과 가치]를 가져온다. 이 과정은 무시될 수 없으며, 이런 의미에서 평가 판단은 결코 '객관적'일 수 없다. 인간의 판단은 개인차가 큰(idiosyncratic) 것으로 인식되었지만, 동시에 틀릴 수도 있었다. 따라서 프로그램 평가에서의 의사결정은 [평가의 특정 순간]과 분리될(offset) 필요가 있었다. 평가로부터 한 단계 떨어진 위원회는 이러한 검토 과정을 보다 믿을 만하고(credible) 신뢰할 수 있게(trustworthy) 만드는 데 도움이 되었다(Driessen et al., 2005; Schuwirth & van der Vleuten, 2011, 페이지 481). 하지만 (존재론적으로) [역량이 무엇인지에 대한 개념]은 덜 강조되어, 인식론적 쌍과 존재론적 쌍 사이에 어느 정도의 모호함이 남았다. 예를 들어, 삼각측량과 주관적 판단을 활용한다는 것은 역량을 사회적으로 구성된 것으로 보았다는 뜻인가, 아니면 진정한 역량에 더 가까이 근사한다는 뜻인가? 그러한 질문들은 직접적으로 다루어지지 않았다.

Govaerts and van der Vleuten later present this as a “constructivist-interpretivist assessment framework” (Govaerts & van der Vleuten, 2013). Essentially, this view sees assessment as socially constructed and value laden. Assessors bring their own beliefs and values to the assessment process. This process cannot be neglected, and assessment judgments can never, in this sense, be ‘objective’. Human judgment was recognised as idiosyncratic, but fallible. Thus, decision-making in programmatic assessment needed to be offset from specific moments of assessment. Committees one-step removed from assessments helped to make this process of review more credible and trustworthy (Driessen et al., 2005; Schuwirth & van der Vleuten, 2011, p. 481). Notions of what competence is—ontologically—were less emphasized leaving some degree of blurring between epistemological and ontological pairings. For example, did leveraging triangulation and subjective judgements mean competence was viewed as socially constructed, or closer approximations of true competence? Such questions were not directly addressed.

이에 대한 다른 측면으로는 [능동적 참여의 과정]으로서 학습을 강조하는 [사회문화적 학습 요소]가 있다. 평가와 학습 사이의 경계는 의도적으로 모호해졌다. 학습은 평가 활동에 내재되었다. 학습자들은 "멘토/코치 같은 조력자(supporting actors)"에게 의존하게 되었다. 프로그램적 평가 문헌에서 "코치"라는 용어가 "멘토"와 함께 사용된 것은 이것이 처음이었다(Van der Vleuten et al., 2012, 페이지 211). 2011년 슈비르트와 판 데르 블뢰텐의 논문은 "[학습에 대한 새로운 사회구성주의 이론]과 [교육 과정의 성과 지표로서 역량이라는 개념]의 출현"을 언급하고 있다(Schuwirth & van der Vleuten, 2011, 페이지 478). 그들은 문헌에서 나타나는 "평가를 설계하고 사용하는 방식의 급진적 변화"를 되풀이하며, 이것이 "전통적 접근법에 맞서는, 절실히 필요한 반정립적(antithetic) 운동"이라고 강조한다(Schuwirth & van der Vleuten, 2011, 페이지 478).
The other side to this was the socio-cultural learning element, that emphasises learning as a process of active participation. The boundary between assessment and learning was deliberately blurred—learning was embedded into assessment activities. Learners came to rely on “supporting actors, such as mentors/coaches”. This was the first time we could note the term “coach” being used at the same time as “mentor” in the programmatic literature (Van der Vleuten et al., 2012, p. 211). The 2011 Schuwirth and van der Vleuten paper makes reference to the “emergence of new—social constructivist—theories on learning and the notion of competencies as outcome indicators of the educational process” (Schuwirth & van der Vleuten, 2011, p. 478). They echo the “radical changes in the way we set up and use assessment” from the literature and stress that this is a “highly needed antithetic movement against the traditional approaches” (Schuwirth & Van der Vleuten, 2011, p. 478).

['진짜 점수', '오류' 등의 개념]과 [이와 연관된 평가 방법]을 버리는 것에서 보듯, 구성주의/해석주의의 렌즈를 통한 프로그램적 평가의 입지가 강하게 자리잡았다. 이것은 흥미로운데, 당시 더 넓은 평가 커뮤니티는 아직 이를 따르지 않았기 때문이다. 이 시기의 프로그램적 평가 문헌 일부에는 후기-심리측정적 사고의 요소가 있다. 판 데르 블뢰텐 등은 "개별 평가 도구에 대한 배타적으로 심리측정학 중심의 담론을 넘어서기를" 희망했는데, 그들이 보기에 "심리측정학 담론은 불완전"했기 때문이다(Van der Vleuten et al., 2012, 페이지 212). 고바에르츠와 판 데르 블뢰텐은 2013년에 발표한 영향력 있는 논문에서 이러한 주제를 이어가며 "숫자 평정과 표준화된 평가는 역량 평가에 대한 프로그램적 접근에서 가치 있는 요소"라는 입장을 유지했다(Govaerts & van der Vleuten, 2013). 그러나 그들은 "평가 프로그램에서 양적 접근법과 질적 접근법의 신중한 균형을 목표로 해야 한다"고 제안한다(Govaerts & van der Vleuten, 2013, 페이지 1172).
There was a strong positioning of programmatic assessment through the lens of constructivism/interpretivism, such as abandoning concepts of ‘true scores’, ‘error’ and the assessment methods associated with them. This is interesting, as the broader assessment community had not, at this time, followed suit. There are elements of post-psychometric thinking in some of the programmatic assessment literature at this time. Van der Vleuten et al. hoped to “move beyond the exclusively psychometrically driven discourse of individual, assessment instruments”–because as they saw it, “psychometric discourse is incomplete” (Van der Vleuten et al., 2012, p. 212). Govaerts and van der Vleuten continued these themes in an influential paper published in 2013, maintaining that “numerical ratings as well as standardised assessments are valuable elements in programmatic approaches to competence assessment” (Govaerts & van der Vleuten, 2013). However, they propound that “we should aim for careful balancing of quantitative and qualitative approaches in our assessment programmes” (Govaerts & van der Vleuten, 2013, p. 1172).

이 단계 동안의 학술 논문은 [방법에 대한 논의]에서 [방법론적 주장을 뒷받침하는 보다 명시적인 철학적 토대]로 옮겨가는 것으로 보인다. 이는 다소 암묵적이기는 하지만, 상충하는(competing) 철학적 입장을 들여오기도 했다. 타당성에 대한 고려는 때때로 이러한 변화의 초석이었다. 역량에 대한 개념은 이론적, 철학적 관점에서 더 잘 설명되었고, 평가 활동을 이끄는 근본적 가정과 맞추어졌다. 그러나 타당성은 그 논의에서 명시적으로 다루어지지 않았다. 실제로 타당성은 포괄적인 개념이라기보다는 효용(utility) 모델의 한 변수일 뿐이었고, 프로그램적 평가가 급진적으로 탈-강조한(de-emphasized) 변수였다. 프로그램 평가의 이 진화 단계에서는 [실용주의적 개념]이 더 중요했던 것으로 보인다. 예를 들어, 구성주의/해석주의를 표방하면서도, '뭐든 된다(anything goes)'는 접근에 반대하고 "진리(Truth)"가 아닌 "주장"의 정당성과 방어 가능성을 옹호하는 등 [실용주의]의 함의가 깔려 있다.

The academic papers during this phase appear to shift from discussions of methods to more explicit philosophical underpinnings that support methodological arguments. This also introduced, although somewhat implicitly, competing philosophical positions. Validity considerations were at times the cornerstone to these shifts. Notions of competence were becoming better elucidated from a theoretical and philosophical perspective and they were matched to the underlying assumptions driving assessment activities. And yet validity was not explicitly part of that discussion. Indeed, validity was only one parameter in the utility model rather than an overarching concept, and a parameter that programmatic assessment radically de-emphasized. Pragmatist notions seem to matter more in this evolutionary phase of programmatic assessment. While there is a claim toward constructivism/interpretivism, there are undertones of pragmatism, for example arguing against an ‘anything goes’ approach and instead arguing for the justifications and defensibility of claims, not of Truths.

업샷
Upshots

이 시기의 흥미로운 결과 중 하나는 [프로그램 평가를 시행한 경험]을 제시한 최초의 논문들이었다. 2013년의 한 논문은 프로그램 평가가 "실행하기 쉽지 않은 것으로 드러났다"고 언급했다(Bok 등, 2013). 실제로 프로그램 평가의 문화적 요소가 가장 어려워 보였다. 예를 들어, 교수개발과 학생 교육에 대한 관심이 부족했다. 학생들은 저부담 평가조차 총괄평가처럼 느꼈다. 프로그램 평가의 학습 요소는 모든 이해관계자에게 새로운 사고방식을 요구할 것이며, 프로그램 평가를 실행하는 것은 어려울 것이라는 점이 분명해졌다.
One interesting upshot from this period was the first papers that presented experiences from implementing programmatic assessment. A 2013 paper noted that programmatic assessment “proved not easy to implement” (Bok et al., 2013). Indeed, the cultural elements of programmatic assessment seemed to be the most challenging. For instance, insufficient attention was placed on faculty development and training for students. Students found that even the low-stakes assessments felt summative. It was clear that the learning elements of programmatic assessment would require a new way of thinking from all stakeholders, and that implementing programmatic assessment would be challenging.

[프로그램 평가]를 위한 경험적 타당성 주장은 아직 한창 진행 중이었다. 그때까지 프로그램적 평가를 지지하는 많은 주장은 개념적이고 이론적이었다(Schuwirth & van der Vleuten, 2012). 맥락적 증거보다는 프로그램적 평가의 특징이 근거를 대신하게 되었다. [학습]에 대한 강조와 [학습]으로의 전환은 교수설계를 앞세우면서 [타당도 주장]을 탈-강조하였다. 그리하여 실제로 이 두 가지가 충돌할 경우 어디에 중점을 두어야 하는지에 대한 불확실성이 남았다(방어 가능성(defensibility)을 지원하는 활동은 학습(learning)을 지원하는 활동과 대립될 수 있다).
Empirical validity arguments for programmatic assessment were very much in progress. Many of the arguments in support of programmatic assessment had been conceptual and theoretical (Schuwirth & van der Vleuten, 2012). Features of programmatic assessment, rather than contextual evidence, had come to serve as surrogates. The emphasis on, and transition to, learning de-emphasized validity arguments in favour of instructional designs. This left some uncertainty about where, in practice, to place emphasis if and when the two were in conflict (activities supporting defensibility contrasted with activities supporting learning).

의과대학이 이미 학생 학습의 비계, 수준 높은 피드백 제공, 멘토링에 집중하고 있었음에도 불구하고, 이 기간은 이러한 고려 사항을 평가 고려의 최전선에 올려놓았다. 확실히, 평가 프로그램을 보는 관점은 [부분의 합]보다는 [전체whole 측면]에 가까웠다. 많은 사람들에게 평가는 더 이상 [측정의 문제]라거나 [[합격 점수]에 대해서 순위를 매기려는 시도]가 아니었다. 평가는 이제 [학습을 더 넓은 관점에서 보는 복잡한 구성 요소]였고, [여러 형식과 맥락에 걸쳐 분포된 것]이었다. 또한 교수진에서 요구되는 평가의 양, 학습에 대한 평가의 극단적 강조와 관련된 실질적인 문제, 비용 관련 등과 같은 [평가에서 긴장감]이 나타났다. 그러나, 프로그래밍 방식의 평가는 여전히 많은 사람들에게 새로운 아이디어였다. 프로그램 평가의 다음 단계는 그것이 오늘날 의학 교육에서 평가 이론과 실습을 점점 더 규제하는 확고한 개념이 되는 것을 보았다.
Even if medical schools were already focusing on scaffolding student learning, providing high-quality feedback, and mentoring, this period brought these considerations to the forefront of assessment considerations. Certainly, assessment programs were being viewed more in terms of their whole, rather than in terms of the sum of their parts. For many, assessment was no longer a measurement problem, or an endeavour used to rank candidates against cut scores. Assessment was now an intricate component of a broader perspective on student learning, and something that was distributed across multiple formats and contexts. There also emerged tensions in assessment, for example with the volume of assessment required in faculties, practical issues regarding the extreme emphasis on assessment for learning, cost implications, and so on. However, programmatic assessment was still, to many, a new idea. The next phase of programmatic assessment’s trajectory saw it become an entrenched notion that increasingly regulates assessment theory and practice in medical education today.

굳게 자리잡기: 풍부한 서술, 학문적 통합, 그리고 철학적 전제의 모호함(약 2013-2020)
Entrenchment: rich narratives, disciplinary consolidation and the blurring of philosophical presuppositions (approx. 2013–2020)

씬(scene) 구성
Configurations of the scene

프로그램 평가의 궤적에서 가장 최근의 단계는 [학문적 통합(disciplinary consolidation)]이라 부르는 것으로 절정에 이른다. 그러나 이에 앞서 먼저 ['정보의 풍부함']을 2013년 이후 고착화된 하나의 실질적인 요소로 파악할 수 있다. 이전 단계에서 [정보의 다양성(diversity)]과 [삼각측량(triangulation)]이라는 개념이 등장했지만, [풍부함(richness)]과 [유의미성(meaningfulness)]에 대한 강조는 그 이후에 나타난 것으로 보인다.

2013년, Govaerts와 van der Vleuten은 "성과에 대한 풍부하고 서술적인 평가"를 "학습을 극대화하기 위해 평가 시스템의 형성적 기능을 강화"하고, "신뢰할 수 있는 의사결정"을 보장하기 위한 "필수불가결한" 평가 데이터로 요구하였다(Govaerts & van der Vleuten, 2013, 페이지 1171–1172).
그들은 이 개념을 "숫자에서 단어로의 전환"이라고 포장한다(Govaerts & van der Vleuten, 2013, 페이지 1172).

This most recent phase in the trajectory of programmatic assessment culminates in what we term disciplinary consolidation. But before this, first we identify ‘information richness’ as one substantive element that became entrenched after 2013. Although the notions of information diversity and triangulation appeared in the previous phase, it seems that the emphasis on richness and meaningfulness emerged later.

In 2013, Govaerts and van der Vleuten call for “rich, narrative evaluations of performance” to “enhance the formative function of the assessment system to maximise learning” and as “indispensable” assessment data to ensure “trustworthy decision making” (Govaerts & van der Vleuten, 2013, pp. 1171–1172).
They package this notion as “a shift from numbers to words” (Govaerts & van der Vleuten, 2013, p. 1172).

흥미롭게도, 이러한 추진은 더 넓은 의학 교육 문헌과 병행되었으며, 2013년에는 Hodges의 기념비적 논문 제목에서 '포스트 사이코메트릭 시대'라는 용어가 처음 사용되었다(Hodges, 2013). 이후 수많은 저자들은 우리가 지금 주관적이고 질적인 데이터가 점점 더 중시되는 시대에 살고 있다고 주장했다. 반구조화 인터뷰 및 기타 접근법에서 '풍부한(rich)' 데이터와 '두꺼운(thick)' 데이터를 도출하는 것과 같은 질적 연구 방법론의 개념은 새로운 통찰을 낳았다(Bearman, 2019; Schultze & Avital, 2011). 그럼에도 불구하고, 우리와 다른 연구자들이 이전에 지적했듯이, 반-심리측정적(anti-psychometric) 개념이 문헌을 포화시키기 시작했다(Pearce, 2020; Schoenherr & Hamstra, 2016).

Interestingly, this push was paralleled in the wider medical education literature, and 2013 was the first time the term ‘post-psychometric era’ was used in the title of Hodges’ seminal paper (Hodges, 2013). Since then, numerous authors have claimed that we are now living in an era where subjective and qualitative data are increasingly valued. Notions from qualitative research methodologies, such as eliciting ‘rich’ and ‘thick’ data from semi-structured interviews and other approaches have generated new insights (Bearman, 2019; Schultze & Avital, 2011). And yet, as we and others have previously noted, anti-psychometric conceptions have begun to saturate the literature (Pearce, 2020; Schoenherr & Hamstra, 2016).

일부에서는 이제 [숫자와 등급]이 [기술어(descriptor)와 서술(narrative)]에 비해 의미가 없다는 믿음이 강하다(Cook 등 2016; Ginsburg 등 2017; Hanson 등 2013). 프로그램적 관점에서 보면, 평가는 풍부하고 의미 있는 평가 데이터를 요구한다. 그리고 이를 위해서는 점점 더 서술적 정보가 필요하다. 이러한 움직임의 동인은 프로그램 평가에서의 결정이 믿을 만하고 신뢰할 수 있도록 보장하는 방법의 문제, 그리고 평가자의 개인차가 큰 해석(번역) 과정과 관련이 있는 것으로 보인다. 진급위원회나 역량위원회가 제시 방식에서 풍부하지도 의미 있지도 않은 데이터를 바탕으로 고부담 결정을 내리기는 어렵다는 주장이다. 그러나 이것이 [데이터가 오로지 질적이어야 한다]는 것을 의미한다고 가정한 것은 잘못이었을 수 있다(Pearce, 2020). 데이터가 어떻게 수집, 축적, 집계, 제시되는지와 관계없이, 그리고 그 설계가 질적이든 양적이든, [유의미성(meaningfulness)]은 별개의 측면(facet)이며, 철학적 지향의 함수이다. 프로그램적 평가가 걸어온 궤적으로 인해 풍부한 서술적 질적 데이터(관련 방법을 통해 생성됨)가 가장 높이 평가받게 된 것은 역사의 흥미로운 변덕이다. 아이러니하게도, 방법이 의미보다 우선시되어 왔다.
There is now a strong belief in some circles that numbers and grades are meaningless compared with descriptors and narratives (Cook et al. 2016; Ginsburg et al. 2017; Hanson et al. 2013). Assessment, when considered from a programmatic mindset, requires the assessment data to be rich and meaningful. And this, increasingly, requires narrative information. The driver of this move appears to be connected with the problem of how to make sure that decisions in programmatic assessment are credible and trustworthy, as well as with the idiosyncratic translational processes of assessors. The argument is that it is difficult for a progression or competence committee to make a high-stakes decision based on data that are not rich or meaningful in their presentation. However, it may have been misguided to assume that this necessitates that the data be solely qualitative (Pearce, 2020). Irrespective of how data are collected, accumulated, aggregated and presented, and regardless of whether their design is qualitative or quantitative, meaningfulness is a separate facet and a function of philosophical orientations. It is an interesting vagary of history that, due to the trajectory taken by programmatic assessment, rich narrative qualitative data (generated through associated methods) have become most highly valued. Ironically, methods have been prioritized over meaning.

이 단계에서 문헌을 포화시킨 또 다른 주요 이슈는 [실무자가 어떻게 프로그램적 평가를 실행해야 하는지]에 대한 것이다. Bok 등의 경험 이후, 실행의 과제가 scene of inquiry의 초미의 관심사로 떠올랐다. 프로그램적 평가에 관한 기념비적인 '12가지 팁' 논문이 2015년에 발표되었다(Van der Vleuten et al. 2015). 이 논문은 2012년 논문에 따라 프로그램적 평가를 '학습을 위한 프로그램적 평가(programmatic assessment-for-learning)'로 제시하였다. 이 논문은 2005년 이후 프로그램적 평가가 표현되어 온 방식의 여러 진화를 통합하는 데 기여했으며, [학습자 중심 교육적 요소, 의미 있는 피드백과 멘토링, 그리고 중요하게는 프로세스 관련 고려사항과 실행 과제]를 강조한다.
The other main issue that saturates the literature in this phase is how practitioners should go about implementing programmatic assessment. After the experience of Bok et al., the challenges of implementation came to be a pressing concern in the scene of inquiry. A seminal ‘Twelve Tips’ paper on programmatic assessment was published in 2015 (Van der Vleuten et al. 2015). This paper presented programmatic assessment in line with the 2012 paper as ‘programmatic assessment-for-learning’. The paper serves to consolidate many of the evolutions in the way programmatic assessment had been expressed since 2005, underscoring the learner-centred pedagogy elements, the meaningful feedback and mentoring aspects, and importantly the process related considerations and implementation challenges.

이 논문과 함께, 2017년에 출판된 중요한 책 챕터(Van der Vleuten et al., 2017)는 프로그램적 평가를 하나의 하위 분야로 통합하는 데 결정적인 역할을 했다. 역사학자들은 [교과서와 교육 자원]이 scene of inquiry의 중요한 참조점이 되기 때문에 반복적으로 학문 분야를 만들고 이후 통합한다고 주장해왔다(Badino & Navarro, 2013; Kragh, 2013). Harden과 Hunt의 A Practical Guide for Medical Teachers에 실린 이 챕터는 설득력 있는 내러티브를 들려준다. 즉, 프로그램적 평가는 의학 교육 평가에서 혁신적인 대안적 접근법으로 제시되며, '전통적 접근법'과 전략적으로 대비된다. 이 챕터는 의학 교육을 종합적으로 다룬 교과서 속에서 [프로그램적 평가]의 위치를 공고히(crystalize) 하였다. 현재 더 많은 교과서 챕터가 등장하고 있다. 예를 들어, 2020년에 출판된 Assessment in Health Professions Education에는 프로그램적 평가에 관한 챕터가 수록되어 있다(Van der Vleuten 등, 2020).
Along with this paper, an important book chapter published in 2017 (Van der Vleuten et al., 2017) played a crucial role in consolidating programmatic assessment as its own sub-discipline. Historians have argued that textbooks and educational resources iteratively create and subsequently consolidate a discipline, as they become an important reference point for a scene of inquiry (Badino & Navarro, 2013; Kragh, 2013). The book chapter in Harden and Hunt’s A Practical Guide for Medical Teachers recounts a compelling narrative—programmatic assessment is presented as an innovative and alternative approach in medical education assessment, strategically positioned against ‘traditional approaches’ to assessment. This chapter crystalizes programmatic assessment in a comprehensive textbook on medical education. Further textbook chapters are now appearing. For instance, Assessment in Health Professions Education published in 2020 features a chapter on programmatic assessment (Van der Vleuten et al., 2020).

최근 몇 년간 [진부하고 문제가 많으며 전통적인 평가 방식]을 극복한 [학습과 혁신의 승리로서 프로그램 평가]의 이야기를 되짚어보는 내러티브가 이어지고 있다.

'시험'이 어떻게 '배움을 위한 프로그래밍식 평가'가 되었는지에 대한 선구자들의 논문이 발표되었다(슈워스 & 반 데르 블뢰텐, 2019).
판 데르 블뢰텐의 또 다른 논문은 2005년 논문을 다시 살펴보며 프로그램적 사고가 의학 교육 평가라는 scene of inquiry에 어떤 영향을 미쳤는지 설명한다(Van der Vleuten, 2016).
반 데르 블뢰텐의 여러 컨퍼런스 기조연설은 의료 교육에서의 평가에 대한 이야기부터, 실무자들이 직면한 문제들, 최선의 방법을 찾기 위한 고군분투, 그리고 이러한 문제들을 극복하기 위한 프로그래밍 방식으로의 사고로의 전환까지를 다시 다루었다.
이러한 강연의 비디오는 유튜브와 같은 동영상 플랫폼에서 쉽게 찾아볼 수 있으며, 링크는 판 데르 블뢰텐의 개인 웹사이트에서도 제공된다(Van der Vleuten n.d.).

In recent years, a narrative that recounts the story of programmatic assessment as a victory for learning and innovation overcoming tired, problematic and traditional approaches to assessment has continued.

A paper by the pioneers on how ‘testing’ has become ‘programmatic assessment for learning’ was published (Schuwirth & van der Vleuten, 2019).
Another by van der Vleuten revisits the 2005 paper to recount how programmatic thinking has affected the scene of inquiry in medical education assessment (Van der Vleuten, 2016).
Multiple conference keynotes by van der Vleuten have retold the story of assessment in medical education, from the problems practitioners faced, the struggle to find the best methods, to the shift to thinking programmatically to overcome these problems.
Videos of these lectures are readily available on video platforms such as YouTube, with links provided on van der Vleuten’s personal website (Van der Vleuten n.d.).

우리는 이러한 자원의 가치나 질에 대해 어떠한 판단을 하려는 것이 아님을 강조하고자 한다. 우리는 단지 이러한 자원들이 어떠한 프로그래밍적 평가를 하위-학문분야로서 공고히 하는지를 보여주기 위해 기술하는 것이다. 현재까지는 프로그램적 평가에서 [철학적 영향이나 의미]를 함축적이고 불확실하게 남겨두고 있으며, [철학적 입장] 뿐만 아니라 [교수설계 및 타당도]에 대한 관점이 어떻게 흡수되고 있는지에 대해서 불완전하다.

We should stress that we are making no judgments on the value or quality of these resources. We are simply being descriptive to highlight how these resources all add to the entrenchment of programmatic assessment as its own sub-discipline in a way that may be incomplete by leaving philosophical influences or implications implicit and uncertain, and by blending (in some cases blurring) those philosophical positions as well as perspectives on instructional design and validity, in how these are taken up.

마지막으로, 프로그램적 평가의 '학문적 통합'을 보여주는 또 다른 대표적인 사례는 전 세계 의학 교육 컨퍼런스에서 프로그램 평가를 다루는 별도의 스트림이 등장한 것이다. 유럽 의학교육 협회(AMEE) 회의, 유럽 의학평가위원회(EBMA) 회의, 의학 및 보건의료 전문직 역량 평가에 관한 오타와 회의와 같은 주요 회의들이 현재 모두 프로그램 평가 전용 스트림(세션)을 운영하고 있다. 2020년 오타와 컨퍼런스는 프로그램 평가가 '합의문' 과제 중 하나로 선정된 첫 사례이기도 하다. 이제 scene of inquiry에서 프로그램적 평가는 굳게 자리를 잡았다.
Finally, another exemplar of the ‘disciplinary consolidation’ of programmatic assessment is the emergence of entire streams on programmatic assessment at medical education conferences worldwide. Major conferences such as the Association for Medical Education in Europe (AMEE) conference, the European Board of Medical Assessors (EBMA) conference, and Ottawa Conferences on the Assessment of Competence in Medicine and the Healthcare Professions, now all run streams dedicated to programmatic assessment. The 2020 Ottawa Conference was also the first time that programmatic assessment was selected as one of its ‘consensus statement’ undertakings. Programmatic assessment is now entrenched in the scene of inquiry.

철학적 전제
Philosophical presuppositions

이전에 나타난 구성주의/해석주의 철학적 존재론은 이제 확고히 자리잡은(entrenched) 철학적 전제가 되었다. 이것은 판 데르 블뢰텐 등에 의해 [명시적으로 주장]되었다: "근간이 되는 학습 개념이 구성주의적이라면, 학습을 위한 프로그램적 평가는 훈련 연속체의 어느 부분에나 적용될 수 있다"(Van der Vleuten et al., 2015, 페이지 641). 그러나 이전 단계에 존재했던 [초기의 철학적 실용주의]가 가장 최근 시기에 강조된 것으로 보인다. 이 단계에서 효용(utility)에 대한 강조는 실용적(pragmatic) 고려사항에 기초한다. 2017년 교과서 챕터의 요약은 평가계의 많은 사상가들과 마찬가지로 [평가를 최적화 문제]로 설명한다(Van der Vleuten et al., 2017, 페이지 302). 이러한 표현은 1996년 논문과 그 논문의 효용 공식(utility formula)을 떠올리게 하는데, 다만 이제는 공식의 변수(parameter)가 암묵적으로 달라졌을 뿐이다. 풍부한 정보 수집과 마찬가지로 학습과 피드백이 강조된다. 실무자는 프로그램적 접근을 실행할 때 실용적일 필요가 있음이 분명히 제시된다. 이러한 고려는 [교육생/학습자의 진급]에 대해 [방어 가능하고 정당화 가능한 결정을 내릴 수 있는 능력]을 탐구자(inquirer)에게 제공할 [필요하고도 합리적인 양의 다양한 평가 데이터]에 의해 뒷받침될 것이다.
The constructivist/interpretivist philosophical ontology that previously emerged became an entrenched philosophical presupposition. This is explicitly propounded by van der Vleuten et al.: “Programmatic assessment-for-learning can be applied to any part of the training continuum, provided that the underlying learning conception is constructivist” (Van der Vleuten et al., 2015, p. 641). However, it appears that the nascent philosophical pragmatism that was present in the previous phase is emphasised in this most recent period. In this phase, the emphasis on utility is built on pragmatic considerations. The summary of the 2017 textbook chapter explicates assessment as an optimization problem, in line with many thinkers in assessment circles (Van der Vleuten et al., 2017, p. 302). This language harks back to the 1996 paper and its utility formula, only now the formula parameters are implicitly different. Learning and feedback are emphasised, as is the gathering of rich information. It is clearly articulated that practitioners need to be pragmatic in executing a programmatic approach, and these considerations will be informed by the requisite and reasonable volumes of varied assessment data that will afford inquirers the capacity to make defensible and justified decisions about trainee/learner progress.
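The 1996 utility formula mentioned above is commonly summarised as a product of five criteria, each of which can be weighted: reliability, validity, educational impact, acceptability and cost-efficiency.

The sketch below is a minimal illustration, not a reproduction of the original paper. Scores sit on an arbitrary 0 to 1 scale. The criterion identifiers and the "learning-first" weighting are hypothetical. They are chosen only to show how the parameters can be implicitly re-weighted while the formula itself stays the same.

```python
from math import prod

# The five criteria of the utility model; all numbers below are hypothetical.
CRITERIA = (
    "reliability",
    "validity",
    "educational_impact",
    "acceptability",
    "cost_efficiency",
)


def utility(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Conceptual utility: the product of weighted criterion scores.

    Criteria missing from `weights` get a weight of 1.0. Because the terms
    multiply, a criterion scored at zero drives the whole utility to zero.
    """
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return prod(weights.get(c, 1.0) * scores[c] for c in CRITERIA)


scores = {c: 0.8 for c in CRITERIA}
even: dict[str, float] = {}  # every criterion weighted 1.0
# Hypothetical re-weighting that favours educational impact over reliability.
learning_first = {"educational_impact": 1.5, "reliability": 0.7}

print(round(utility(scores, even), 4))
print(round(utility(scores, learning_first), 4))
```

The multiplicative form is the design choice worth noticing. Strong scores on some criteria cannot compensate for a zero on another. A shift in emphasis toward learning therefore shows up as different weights, not as a different structure.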

The philosophical pragmatism underscored here is well buttressed in the literature by a proliferation of new assessment metaphors and analogies. These metaphors are thinking tools or heuristics for interested educators and assessment practitioners. It is hard to argue with such statements that appeal to common sense and notions of ‘being pragmatic’. For example,

Schuwirth et al. draw five specific analogies between healthcare and programmatic assessment, suggesting that current thinking in healthcare can actually be enacted in assessment systems (Schuwirth et al., 2017).
Uijtdehaage and Schuwirth suggest that the process of programmatic assessment can be thought of in terms of the role of a vocal coach: helping “a singer achieve his or her utmost potential (by giving frequent feedback) but eventually” making “a summative decision whether the singer can join the choir or can be the soloist” (Uijtdehaage & Schuwirth, 2018, p. 350).
Tweed and Wilkinson draw parallels in clinical decision-making and jury decision-making with decision-making in programmatic assessment, exploring ways to aggregate information to make progression decisions that are robust and defensible (Tweed & Wilkinson, 2019).
They also compare programmatic assessment to a drug “about to enter Phase IV trials” (Wilkinson & Tweed, 2018, p. 191), suggesting that it is time to see how programmatic assessment can be applied more widely and in varied contexts, noting that elements of programmatic assessment can be implemented where feasible, rather than practitioners being constrained by an ‘all-or-nothing’ approach.
This move is further echoed by Pearce and Prideaux who speak of “programmatic thinking” and how it can be applied in post-graduate medical education (Pearce & Prideaux, 2019).

Upshots

Programmatic assessment has become its own philosophical approach. It is, in a sense, its own paradigm and it tells its own historical narrative. In order to become a disciple of programmatic assessment, one can learn about the history of assessment in medical education—its obstacles, problems, pitfalls, and solutions. We don’t use this pious terminology in a pejorative sense, just in a descriptive sense. In order to become a convert, one must first appreciate its historical narrative and subscribe to its philosophical underpinnings. Programmatic assessment resonates with educators. The metaphors deployed facilitate trust and devotion. But operationalizing programmatic assessment remains heavily dependent on people and culture. Stakeholders need to believe it for it to work well, hence the crucial calls for engagement and buy-in.

Programmatic assessment ideas have become entrenched, and metaphors and ‘hard to argue statements’ have permeated the scene of inquiry. Lingard has written about “god terms” in health professions education (Lingard, 2009), and it seems that programmatic assessment has reached this status. However, the jury (to borrow one metaphor) is still out on whether these ideas and associated claims bear out empirically.

Is validity enhanced through more assessment data, rich information, decision-making committees and programmatic processes?
Does programmatic assessment work in cultural contexts away from the Netherlands, where it first emerged?

Issues around change management and implementation strategies are becoming pressing questions for inquirers. Features of programmatic assessment (and key messages) such as the collection of diverse data are being treated as evidence for or evidence of validity without more traditional or recommended approaches to demonstrating validity. Different medical schools are exploring ways of implementing aspects of programmatic thinking and seeing what works for them (Pearce et al., 2021). Researchers are conducting empirical research into the impact of programmatic assessment on student learning (Heeneman et al., 2015), and how it is conceptually instantiated for teachers and learners (Schut et al. 2018, 2020). Researchers remain excited by manifold opportunities.

Concluding reflections

Drawing upon the intellectual tradition of historical epistemology, we have attempted to critically trace the shifting configurations of programmatic assessment in medical education, underscoring the fluid, cultural practice of inquiry in relation to assessment. We remind the reader that we were focusing on the ‘what’ and the ‘how’ of this history, rather than the ‘why’.

We actively encourage others to engage with the literature in a similar way to determine whether our account is accurate.
Although we have made some suggestions throughout as to why the trajectory took a certain path over other possible paths, we hope others carefully investigate programmatic assessment (and indeed, assessment more broadly) from different critical meta-philosophical perspectives.

We argued that new light would be shed on current assessment practices by interrogating them philosophically. In approaching the history of programmatic assessment by focusing on the scene of inquiry—the shifting questions, problems, practices and presuppositions of inquirers (Jardine, 2000) —historical and philosophical roots of the idea have been elucidated. This may resolve, or at least, reveal why certain tensions emerge in practice. Hopefully we have succeeded in “excavating the motivations that underpin” programmatic assessment and opened “a space for an adaptive and flexible discourse” (Lingard, 2009, p. 627). Supplementary figure 1 summarizes this philosophical history. We conclude by offering some reflections on what we see as important points to emerge from our historical probing, and finally suggest ‘what next’ for programmatic assessment in light of this endeavour.

Perspectival implications

We have attempted to draw out the shifting, implicit arguments shaping the trajectory of programmatic assessment by tracing its philosophical trajectory. We have highlighted how philosophical assumptions have shifted mainly as a way of thinking about and substantiating conceptual and methodological arguments. This illuminates why perceived benefits, as well as certain tensions, contradictions and vulnerabilities may appear in programmatic assessment. Several of these issues arise depending on which ontological and epistemological vantage point is taken. In short, this perspectivist lens (Pearce, 2013) means that different philosophical positions may and will lead to different interpretations. We offer three examples of this:

Some may consider that there is a propensity for bias in decision-making in programmatic assessment. Although the pixel metaphor is powerful, pixels have a tendency to become fixed when saturation of information is reached. Once assessors can see that the image is the Mona Lisa (or a struggling student), this unitary and fixed picture will influence future decisions regarding this student. These kinds of biases and issues of fairness in assessment are precisely what psychometric approaches were meant to deal with. This exemplifies that as philosophical presuppositions evolve in assessment, features of previous conceptions may become lost if care is not taken to build upon the strengths of earlier positions.
There are challenges posed by the temporal component of assessment datapoints. Learning changes with time, and the development of knowledge, skills and competencies is neither consistent nor linear. This is another tension in a programmatic approach, depending on your philosophical outlook. Some may see the aggregation of disparate forms of data as an ersatz defensibility if they approach the problem with a measurement and/or post-positivist mindset, while others will see this process of triangulation as a strength of the approach if they are working from the position of constructivism/interpretivism. Again, this highlights that assessment is itself a perspectival process, which brings other challenges.
When viewed through a psychometric lens, programmatic assessment has many pitfalls in relation to issues such as construct representation and construct irrelevance, which continue to dominate discussions in educational measurement circles (Newton, 2020). But disciples of programmatic assessment will utilise constructivist metaphorical strategies to shut down such psychometrically informed positions, arguing that this is a misguided approach to assessment in medical education. What we are attempting to elucidate here is that divergent philosophical presuppositions are creating tensions for the community.

We are not advocating any of the above arguments or taking any sides. We are merely attempting to highlight why philosophical presuppositions are so important. The implications in the scene of inquiry, and decisions that practitioners take, will be perspectival: driven by philosophical presuppositions, situated in historical and local contexts. Others have noted that the way we respond to tensions in assessment determines the fate of assessment policy and practice (Govaerts et al. 2019). We similarly advocate that philosophical presuppositions simply be made explicit before subscribing to a position, in the hope that through dialogue and critical reflection, informed philosophical decisions and actions can be made.

Rethinking utility and its pragmatist foundations

The notion of utility is a key thread that runs through programmatic thinking, although its precise manifestation appears to shift over time. Programmatic assessment was founded on the notion of utility in assessment, which appears to be an instantiation of pragmatism in action, although what pragmatism means in this context has not been clearly articulated. More than this, utility has become a methodological focus: a way of approaching assessment philosophically by requiring the clear articulation of purpose in assessment and a substantive justification of assessment practice (Pearce, 2020; Tavares et al. 2019).

We accept that some debate regarding philosophical outlooks has entered programmatic discourse, especially in relation to the interrogation and justification of methodological choices, for example the requirement to collect and collate more diverse assessment data to inform progression or competence committee deliberations. However, this requirement was suggested prior to any explicit discussion of interpretivist or constructivist principles. Discussions about underlying assumptions and philosophical commitments have been almost absent, or have not been taken up by researchers as a priority.

The need to carefully elucidate philosophical presuppositions

In cases where programmatic assessment is being adopted in health professions education, we caution against its adoption devoid of philosophical considerations. Practitioners are able to leverage the utility of programmatic assessment without knowing (or even needing to know) exactly what assumptions and underlying commitments are at play. But this can lead to a blurring of philosophical positions in practice, and convoluted instantiations for progression committees to navigate. The way forward would be for adopters of programmatic assessment to carefully elucidate the philosophical presuppositions they hold and to justify such perspectives for the assessment context. Overall, we call for more attention to the philosophical drivers of programmatic assessment theory and practice, and stress the need for them to be made explicit.

The blurring of assessment boundaries

The blending of instructional design and assessment by the founders of programmatic assessment was a smart move. It would be foolish to suggest that education can flow without considering the impact of assessment, and in practice there is a strong case to be made for individualizing assessment where possible. However, it may be that this dual purposing is a thorn in the side for programmatic assessment, especially given adjacent research that suggests this may be problematic (Duitsman et al., 2019; Heeneman et al., 2015; Tavares et al., 2020). Regardless, programmatic assessment has clearly blurred traditional assessment boundaries and forced researchers and educationalists to consider the wider context in which assessment takes place.

Where to next for programmatic assessment?

We don’t wish to speculate as to what the future trajectory holds for programmatic assessment, although we would like to finish by making some suggestions regarding some potentially productive future research avenues and development opportunities for programmatic assessment. These are:

(i) to encourage more probing of underlying philosophical positions in programmatic assessment, and indeed, in assessment in general;

(ii) to encourage practitioners to make assumptions and commitments more explicit when enacting programmatic assessment;

(iii) to resolve the potential tension that has arisen due to the dual purposing noted above; and

(iv) to suggest that a carefully considered and robustly articulated philosophical pragmatism may be the best way forward for programmatic assessment in health professions education.

Adv Health Sci Educ Theory Pract. 2021 Oct;26(4):1291-1310.

doi: 10.1007/s10459-021-10050-1. Epub 2021 Apr 24.

A philosophical history of programmatic assessment: tracing shifting configurations

J Pearce 1, W Tavares 2


Affiliations

1Tertiary Education (Assessment), Australian Council for Educational Research, 19 Prospect Hill Road, Camberwell, VIC, 3124, Australia. jacob.pearce@acer.org.

2The Wilson Centre and Post-MD Education. University Health Network and University of Toronto, Toronto, ON, Canada.

PMID: 33893881

Abstract

Programmatic assessment is now well entrenched in medical education, allowing us to reflect on when it first emerged and how it evolved into the form we know today. Drawing upon the intellectual tradition of historical epistemology, we provide a philosophically-oriented historiographical study of programmatic assessment. Our goal is to trace its relatively short historical trajectory by describing shifting configurations in its scene of inquiry, focusing on questions, practices, and philosophical presuppositions. We identify three historical phases: emergence, evolution and entrenchment. For each, we describe the configurations of the scene; examine underlying philosophical presuppositions driving changes; and detail upshots in assessment practice. We find that programmatic assessment emerged in response to positivist 'turmoil' prior to 2005, driven by utility considerations and implicit pragmatist undertones. Once introduced, it evolved with notions of diversity and learning being underscored, and a constructivist ontology developing at its core. More recently, programmatic assessment has become entrenched as its own sub-discipline. Rich narratives have been emphasised, but philosophical underpinnings have been blurred. We hope to shed new light on current assessment practices in the medical education community by interrogating the history of programmatic assessment from this philosophical vantage point. Making philosophical presuppositions explicit highlights the perspectival nature of aspects of programmatic assessment, and suggests reasons for perceived benefits as well as potential tensions, contradictions and vulnerabilities in the approach today. We conclude by offering some reflections on important points to emerge from our historical study, and suggest 'what next' for programmatic assessment in light of this endeavour.

Keywords: Assessment; Historical epistemology; History of assessment; Philosophical positions; Programmatic assessment.
