합목적적 평가 프로그램 설계를 위한 전문가 가이드라인 (BMC Med Educ, 2012)

Expert validation of fit-for-purpose guidelines for designing programmes of assessment

Joost Dijkstra1*†, Robert Galbraith2, Brian D Hodges3, Pauline A McAvoy4, Peter McCrorie5, Lesley J Southgate5, Cees PM Van der Vleuten1, Val Wass6 and Lambert WT Schuwirth1,7




배경

Background


퀄리티에 대한 서로 다른 목표와 관점,

different aims and adopting diverse viewpoints on quality,


  • 심리측정 관점에서 퀄리티는 거의 전적으로 의사결정 조합의 신뢰도와 "타당성에 대한 통일된 관점"으로 정의되어 왔다 [9-13].

  • 교육적 관점에서는 목표와 교수의 연계(alignment), 그리고 바람직한 학습행동을 자극하기 위한 평가의 활용에 초점이 있었다 [14-16].

From a psychometric perspective quality has been almost exclusively defined as the reliability of combinations of decisions and a “unified view of validity” [9-13]. 

From an educational perspective the focus has been on the alignment of objectives, instruction, and on using assessment to stimulate desirable learning behaviour [14-16].


  • 다른 연구에서 Baartman [17]은 역량 기반 교육을 퀄리티의 기반으로 삼았고, 진정성(authenticity)과 유의미성 같은 교육 기반 기준을 기존의 심리측정 기준에 추가할 것을 제안했다.

In another study Baartman [17] took competency-based education as a basis for quality, and proposed adding education-based criteria, such as authenticity and meaningfulness, to the established psychometric criteria.


대부분의 연구는 평가가 이미 수행된 이후에 평가 품질을 판정합니다. 불행하게도 이는 양질의 평가 프로그램을 구축하려는 설계자에게 큰 도움이 되지 않습니다.

Most of this research determines assessment quality afterwards, when assessment has already taken place. Unfortunately, this does not provide assessment designers with much support when they intend to construct a high-quality programme.


반면에 보다 광범위한 교육 수준의 지침도 있는데, 예를 들어 the Standards for educational and psychological testing이 있다 [18]. 그러나 이러한 표준은 평가 프로그램이 아니라 주로 단일 시험(즉, 측정 도구)에 집중한다. 그리고 표준이 전문가 판단에 열려 있고 맥락상의 차이(예: 규정)를 인정함에도 불구하고, 여전히 특정한 시험 틀과 '학습에 대한 평가(assessment of learning)'의 관점에서 공식화되어 있다 [19].

On the other hand guidance is available at a broader educational level, e.g., the Standards for educational and psychological testing [18]. But these standards focus predominantly on single tests (i.e. the measuring instrument) instead of on programmes of assessment. And, despite the standards being open to expert judgement and acknowledging contextual differences (e.g. in regulations), they are still formulated from a specific testing framework and from the perspective of assessment of learning [19].


프레임워크는 여러 계층으로 나뉘며, 이해관계자와 인프라(외부 계층)라는 맥락 안에 놓입니다. 출발점은 프로그램의 목적(프레임워크의 핵심 요소)입니다. 이 목적을 중심으로 5개의 층(차원)이 구분됩니다.

  • (1) 프로그램 실행(Programme in action)은 프로그램의 핵심 활동, 즉 정보 수집, 정보의 결합과 가치 판단, 후속 조치 수행을 설명합니다.

  • (2) 프로그램 지원은 시험 제작 및 교수 개발의 개선, 이해관계자 수용성 확보와 이의제기 가능성 마련 등 현행 평가 프로그램을 최적화하기 위한 활동을 설명합니다.

  • (3) 프로그램 문서화는 방어 가능한 프로그램을 달성하고 조직 학습을 포착하는 데 필요한 활동을 설명합니다. 여기에는 규칙과 규정, 학습 환경, 도메인 매핑이 포함됩니다.

  • (4) 프로그램 개선은 프로그램이 시행된 후 평가 프로그램의 재설계를 목표로 하는 차원을 포함합니다. 관련 활동은 R&D와 변화 관리입니다.

  • (5) 마지막 층인 프로그램 정당화는 효과성, 효율성, 수용성을 고려하여 프로그램의 목적이 달성되었다는 증거를 제공하기 위한 활동을 설명합니다.

The framework is divided into several layers and is placed in the context of stakeholders and infrastructure (outer layer). The starting point is the purpose of the programme (key element in the framework). Around the purpose, 5 layers (dimensions) were distinguished. 

  • (1) Programme in action describes the core activities of a programme, i.e. collecting information, combining and valuing the information, and taking subsequent action. 

  • (2) Supporting the programme describes activities that are aimed at optimizing the current programme of assessment, such as improving test construction and faculty development, as well as gaining stakeholder acceptability and possibilities for appeal. 

  • (3) Documenting the programme describes the activities necessary to achieve a defensible programme and to capture organizational learning. Elements of this are: rules and regulations, learning environment, and domain mapping. 

  • (4) Improving the programme includes dimensions aimed at the re-design of the programme of assessment, after the programme is administered. Activities are R&D and change management. 

  • (5) The final layer justifying the programme describes activities that are aimed at providing evidence that the purpose of the programme is achieved taking account of effectiveness, efficiency, and acceptability.





Method


Study design


The development and validation of design guidelines were divided into four phases, 

    • starting with a brainstorm phase to generate ideas using a core group of experts (JD, CvdV and LWTS), 

    • followed by a series of discussions with a wider group of international experts to elaborate on this brainstorm. 

    • Next in a refinement phase, the design guidelines were fine-tuned based on the analysis of the discussions. 

    • Finally a member check phase was initiated to validate the guidelines based on expert consensus.


Participants


Procedure and data analysis


The brainstorm was done by the research team (JD, CvdV, LWTS) based on their experience and data from the preceding study [5]. This resulted in a first draft of the set of guidelines, which served as a starting point for the discussion phase. The discussion took place in multiple (Skype®) interviews with the participants. Individual interviews were held with each participant and led by one researcher (JD) with the support of a second member of the research team (either CvdV or LWTS). The interview addressed the first draft of guidelines and was structured around three open questions: 

    • 1. Is the formulation of the guidelines clear, concise, and correct?

    • 2. Do you agree with the guidelines? 

    • 3. Are any specific guidelines missing? 

The interviews were recorded and analysed by the research team to distil a consensus from the various opinions, suggestions, and recommendations. One researcher (JD) reformulated the guidelines; to avoid overly close adherence to the initial formulations, the interview data (expert suggestions) were taken as the starting point. The goal of the new formulation was to represent the opinions and ideas expressed by the experts as accurately as possible. Peer debriefing was done by the research team (JD, CvdV, & LWTS) to check the reformulation and reach initial consensus. After formulating a complete and comprehensive set of guidelines, a member-check procedure was conducted by e-mail. All participants were sent the complete set for final review and all responded. No content-related issues had to be resolved; some wording issues were resolved, and a final consensus document was generated.


Results



일반사항

General



I) 결정(및 그 결과)은 그것이 근거하고 있는 정보의 품질에 비례해야 한다.

I) Decisions (and their consequences) should be proportionate to the quality of the information on which they are based.


II) 설계 프로세스의 모든 결정은 가급적 과학적 증거 또는 모범 사례의 증거로 뒷받침되어야 한다. 평가 프로그램을 설계할 때 내린 선택을 뒷받침하는 증거가 없다면, 그 결정은 연구 우선순위가 높은 것으로 식별되어야 한다.

II) Every decision in the design process should be underpinned, preferably by scientific evidence or evidence of best practice. If evidence is unavailable to support the choices made when designing the programme of assessment, the decisions should be identified as high priority for research.


III) 평가 프로그램에서 활동을 수행하기 위해서는 특정 전문성을 이용할 수 있어야 (또는 확보해야) 한다.

III) Specific expertise should be available (or sought) to perform the activities in the programme of assessment.




영역별 핵심 가이드라인 

Salient guidelines per dimension in the framework



목적, 이해관계자, 구조

Purpose, stakeholders, and infrastructure


A1 평가 프로그램의 하나의 핵심 목적이 공식화되어야 한다.

A1 One principal purpose of the assessment programme should be formulated.


A4 평가 프로그램에 대한 기회와 제한 사항은 초기 단계에서 식별되어 설계 과정에서 고려되어야 한다.

A4 Opportunities as well as restrictions for the assessment programme should be identified at an early stage and taken into account in the design process.


A7 다양한 이해관계자가 설계 프로세스에 참여하는 수준은 프로그램의 목적과 이해관계자 자신의 필요에 따라 결정되어야 한다.

A7 The level at which various stakeholders participate in the design process should be based on the purpose of the programme as well as the needs of the stakeholders themselves.




프로그램 실행 

Programme in action


B1 프로그램의 평가 요소를 선택할 때에는 그것이 평가 프로그램의 목적에 기여하는 정도가 지침 원칙이어야 한다.

B1 When selecting an assessment component for the programme, the extent to which it contributes to the purpose(s) of the assessment programme should be the guiding principle.


B14 서로 다른 평가 요소에서 얻어진 정보의 결합은 목적, 내용 또는 데이터 패턴으로 정의된 유의미한 실체에 기초하여 정당화되어야 한다.

B14 Combination of the information obtained by different assessment components should be justified based on meaningful entities either defined by purpose, content, or data patterns.


B21 정보는 평가의 목적과 관련하여 관련 이해관계자에게 최적의 방식으로 제공되어야 한다.

B21 Information should be provided optimally in relation to the purpose of the assessment to the relevant stakeholders.



프로그램 지원

Supporting the programme



C4 평가 구성 요소의 제작을 지원하려면 도메인 전문성과 평가 전문성이 필요하다.

C4 Support for constructing the assessment components requires domain expertise and assessment expertise.


C6 고부담 평가일수록 절차가 더 강력해야 한다.

C6 The higher the stakes, the more robust the procedures should be.


C8 프로그램의 수용(acceptance)이 광범위하게 모색되어야 한다.

C8 Acceptance of the programme should be widely sought.




프로그램 문서화

Documenting the programme


D9 도메인 맵은 평가 프로그램에서 도메인을 최적으로 표현해야 한다.

D9 A domain map should be the optimal representation of the domain in the programme of assessment.


프로그램 개선

Improving the programme


E1 정기적이고 반복적인 평가 및 개선 프로세스가 마련되어 피드백 루프를 닫아야 한다.

E1 A regular and recurrent process of evaluation and improvement should be in place, closing the feedback loop.


E4 변화를 위한 모멘텀은 포착하거나, 필요한 우선순위 부여 또는 외부 압력을 통해 만들어내야 한다.

E4 Momentum for change has to be seized or has to be created by providing the necessary priority or external pressure.



프로그램 정당화

Justifying the programme


F2 새로운 시도(개발)에는 평가, 바람직하게는 과학적 연구가 수반되어야 한다.

F2 New initiatives (developments) should be accompanied by evaluation, preferably scientific research.


F6 비용-편익 분석은 프로그램의 목적에 비추어 정기적으로 이루어져야 한다. 장기적으로는 보다 자원 효율적인 대안을 탐색하는 선제적 접근을 채택해야 한다.

F6 A cost-benefit analysis should be made regularly in light of the purposes of the programme. In the long term, a proactive approach to search for more resource-efficient alternatives should be adopted.


F10 기밀성과 정보 보안은 적절한 수준에서 보장되어야 한다.

F10 Confidentiality and security of information should be guaranteed at an appropriate level.




고찰 및 결론 

Discussion and conclusion


최대한 포괄적이고자 노력하면서 우리는 과잉 포함의 위험이 있음을 인정합니다. 우리는 평가 프로그램을 설계할 때 이러한 가이드라인을 신중하게 적용해야 한다는 점을 강조하고자 합니다. 맥락은 서로 다르며 모든 가이드라인이 모든 상황에서 적절한 것은 아닐 수 있음을 인정하고, 또 강조합니다. 따라서 평가 프로그램을 설계한다는 것은 어떤 가이드라인을 다른 가이드라인보다 우선할지 선택하는 것을 포함하여, 의도적인 선택과 타협을 한다는 것을 의미합니다. 그럼에도 불구하고 우리는 이 가이드라인 세트가 평가 프로그램의 프레임워크와 결합되어 설계자가 평가 프로그램의 복잡한 역동성에 대한 개요를 유지할 수 있게 한다고 생각합니다. 상호 연관된 가이드라인 세트는 설계자가 문제 영역을 미리 내다보는 데 도움을 주며, 그렇지 않으면 이러한 영역은 실제 문제가 발생할 때까지 암묵적으로 남아 있게 됩니다.

In trying to be as comprehensive as possible we acknowledge the risk of being over-inclusive. We would like to stress that when designing a programme of assessment, these guidelines should be applied with caution. We recognise and indeed stress that contexts differ and not all guidelines may be relevant in all circumstances. Hence, designing an assessment programme implies making deliberate choices and compromises, including the choice of which guidelines should take precedence over others. Nevertheless, we feel this set combined with the framework of programmes of assessment enables designers to keep an overview of the complex dynamics of a programme of assessment. An interrelated set of guidelines aids designers in foreseeing problematic areas, which otherwise would remain implicit until real problems arise.



Additional file 1 Addendum complete set of guidelines - BMC Med Educ - final.doc. This addendum contains the set of 72 guidelines developed and validated in this study.








Introduction

GENERAL GUIDELINES

I           Decisions (and their consequences) should be proportionate to the quality of the information on which they are based.

II         Every decision in the design process should be underpinned, preferably by scientific evidence or evidence of best practice. If evidence is unavailable to support the choices made when designing the programme of assessment, the decisions should be identified as high priority for research.

III        Specific expertise should be available (or sought) to perform the activities in the programme of assessment.

PURPOSE OF THE PROGRAMME

A1       One principal purpose of the assessment programme should be formulated.

A2       Long-term and short-term purposes should be formulated. But the number of purposes should be limited.

A3       An overarching structure which projects the domain onto the assessment programme should be constructed.

INFRASTRUCTURE

A4       Opportunities as well as restrictions for the assessment programme should be identified at an early stage and taken into account in the design process.

A5       Design decisions should be checked against consequences for the infrastructure. If necessary compromises should be made, either adjusting the purpose(s) of the assessment programme or adapting the infrastructure.

STAKEHOLDERS

A6       Stakeholders of the assessment programme should be identified and a rationale provided for including the expertise of different stakeholders (or not) and the specific role(s) which they should fulfil.

A7       The level at which various stakeholders participate in the design process should be based on the purpose of the programme as well as the needs of the stakeholders themselves.

PROGRAMME IN ACTION

Collecting Information

B1       When selecting an assessment component for the programme, the extent to which it contributes to the purpose(s) of the assessment programme should be the guiding principle.

B2       When selecting an assessment (component or combination), consideration of the content (stimulus) should take precedence over the response format.

B3       The assessment should sample the intended cognitive, behavioural or affective processes at the intended level.

B4       The information collected should be sufficiently informative (enough detail) to contribute to the purpose of the assessment programme.

B5       The assessment should be able to provide sufficient information to reach the desired level of certainty about the contingent action.

B6       The effect of the instruments on assessee behaviour should be taken into account.

B7       The relation between different assessment components should be taken into account.

B8       The overt and covert costs of the assessment components should be taken into account and compared to alternatives.

B9       Assessment approaches that work well in a specific context (setting) should be re-evaluated before implementation in another context (setting).

B10     A programme of assessment should deal with error and bias in the collection of information. Error (random) is unpredictable and should be reduced by sampling (strategies). Bias (systematic) should be analysed and its influence should be reduced by appropriate measures.

B11     Any performance categorisation system should be as simple as possible.

B12     When administering an assessment (component), the conditions (time, place, etc.) and the tasks (difficulty, complexity, authenticity, etc.) should support the purpose of the specific assessment component.

B13     When scheduling assessment, the planning should support instruction and provide sufficient opportunity for learning.

Combining Information

B14     Combination of the information obtained by different assessment components should be justified based on meaningful entities either defined by purpose, content, or data patterns.

B15     The measurement level of the information should not be changed.

B16     The consequences of combining information obtained by different assessment components, for all stakeholders, should be checked.

Valuing Information

B17     The amount and quality of information on which a decision is based should be in proportion to the stakes.

B18     A rationale should be provided for the standard setting procedures.

Taking Action

B19     Consequences should be proportionally and conceptually related to the purpose of the assessment and justification for the consequences should be provided.

B20     The accessibility of information (feedback) to stakeholders involved should be defined.

B21     Information should be provided optimally in relation to the purpose of the assessment to the relevant stakeholders.

SUPPORTING THE PROGRAMME

Construction Support

C1       Appropriate central governance of the programme of assessment should be in place to align different assessment components and activities.

C2       Assessment development should be supported by quality review to optimise the current situation (Programme in Action), appropriate to the importance of the assessment.

C3       The current assessment (Programme in Action) should be routinely monitored on quality criteria.

C4       Support for constructing the assessment components requires domain expertise and assessment expertise.

C5       Support tasks should be well-defined and responsibilities should lie with the right persons.

Political and Legal Support

C6       The higher the stakes, the more robust the procedures should be.

C7       Procedures should be made transparent to all stakeholders.

C8       Acceptance of the programme should be widely sought.

C9       Protocols and procedures should be in place to support appeal and second opinion.

C10     A body of appeal should be in place.

C11     Safety net procedures should be in place to protect both assessor and assessee.

C12     Protocols should be in place to check (the programme in action) on proportionality of actions taken and carefulness of assessment activities.

DOCUMENTING THE PROGRAMME

Rules and Regulations (R&R)

D1       Rules and regulations should be documented.

D2       Rules and regulations should support the purposes of the programme of assessment.

D3       The impact of rules and regulations should be checked against managerial, educational, and legal consequences.

D4       In drawing up rules and regulations one should be pragmatic and concise, to keep them manageable and avoid complexity.

D5       R&R should be based on routine practices and not on incidents or occasional problems.

D6       There should be an organisational body in place to uphold the rules and regulations and take decisions in unforeseen circumstances.

Learning Environment

D7       The environment or context in which the assessment programme has to function should be described.

D8       The relation between educational system and assessment programme should be specified.

Domain Mapping

D9       A domain map should be the optimal representation of the domain in the programme of assessment.

D10     A domain map should not be too detailed.

D11     Starting point for a domain map should be the domain or content and not the assessment component.

D12     A domain map should be a dynamic tool, and as a result should be revised periodically.

IMPROVING THE PROGRAMME

R&D

E1        A regular and recurrent process of evaluation and improvement should be in place, closing the feedback loop.

E2        If there is uncertainty about the evaluation, more information about the programme should be collected.

E3        In developing the programme (re-design) again improvements should be supported by scientific evidence or evidence of best practice.

Change Management

E4        Momentum for change has to be seized or has to be created by providing the necessary priority or external pressure.

E5        Underlying needs of stakeholders should be made explicit.

E6        Sufficient expertise about change management and about the local context should be sought.

E7        Faculty should be supported to cope with the change by providing adequate training.

JUSTIFYING THE PROGRAMME

Effectiveness

Scientific Research

F1        Before the programme of assessment is designed, evidence should be reviewed.

F2        New initiatives (developments) should be accompanied by evaluation, preferably scientific research.

External Review

F3        The programme of assessment should be reviewed periodically by a panel of experts.

F4        Benchmarking against similar assessment programmes (or institutes with similar purposes) should be conducted to judge the quality of the programme.

Efficiency: cost-effectiveness

F5        In order to be able to justify the resources used for the assessment programme, all costs (in terms of resources) should be made explicit.

F6        A cost-benefit analysis should be made regularly in light of the purposes of the programme. In the long term, a proactive approach to search for more resource-efficient alternatives should be adopted.

Acceptability: political-legal justification

F7        Open and transparent governance of the assessment programme should be in place and can be held accountable.

F8        In order to establish a defensible programme of assessment there should be one vision (on assessment) communicated to external parties.

F9        The assessment programme should take into account superseding legal frameworks.

F10      Confidentiality and security of information should be guaranteed at an appropriate level.




BMC Med Educ. 2012 Apr 17;12:20. doi: 10.1186/1472-6920-12-20.

Expert validation of fit-for-purpose guidelines for designing programmes of assessment.

Author information

1
Department of Educational Development and Research, Maastricht University, Maastricht, The Netherlands. Joost.dijkstra@maastrichtuniversity.nl

Abstract

BACKGROUND:

An assessment programme, a purposeful mix of assessment activities, is necessary to achieve a complete picture of assessee competence. High quality assessment programmes exist; however, design requirements for such programmes are still unclear. We developed guidelines for design based on an earlier developed framework which identified areas to be covered. A fitness-for-purpose approach defining quality was adopted to develop and validate guidelines.

METHODS:

First, in a brainstorm, ideas were generated, followed by structured interviews with 9 international assessment experts. Then, guidelines were fine-tuned through analysis of the interviews. Finally, validation was based on expert consensus via member checking.

RESULTS:

In total 72 guidelines were developed and in this paper the most salient guidelines are discussed. The guidelines are related and grouped per layer of the framework. Some guidelines were so generic that they are applicable in any design consideration. These are: the principle of proportionality, rationales should underpin each decision, and the requirement of expertise. Logically, many guidelines focus on practical aspects of assessment. Some guidelines were found to be clear and concrete, others were less straightforward and were phrased more as issues for contemplation.

CONCLUSIONS:

The set of guidelines is comprehensive and not bound to a specific context or educational approach. From the fitness-for-purpose principle, guidelines are eclectic, requiring expertise judgement to use them appropriately in different contexts. Further validation studies to test practicality are required.

PMID: 22510502
PMCID: PMC3676146
DOI: 10.1186/1472-6920-12-20

