A ChatGPT Prompt for Writing Case-Based Multiple-Choice Questions (Spanish Journal of Medical Education, 2023)
Yavuz Selim Kıyak

 

Introduction

Medical schools use various written assessment methods to evaluate the higher-order skills of medical students, particularly clinical reasoning. These methods include case-based multiple-choice questions (MCQs), key-feature questions, and script concordance tests, among others (1). Among these, the case-based MCQ stands out as both highly effective and commonly used (2). However, writing high-quality items (questions) places a significant burden on the limited resources of medical schools.

More specifically, writing a single context-rich MCQ designed to assess higher-order skills demands a significant investment of a medical teacher's time, often extending into hours (3). The effort required per item is compounded by the large number of MCQs that medical education demands: even a question bank developed solely for progress testing needs thousands of questions (4). Hence, the key challenge for medical schools is writing a large quantity of high-quality MCQs.

Existing Solutions

The high volume of MCQs needed in education has led researchers to develop new methods that use software to generate a large number of items at once. Automatic Item Generation (AIG) was developed for this purpose as "a process of using models to generate items using computer technology" (5). Template-based AIG starts with a cognitive model, which captures the thinking process of subject matter experts when they face a clinical problem. Subject matter experts then create an item model for a disease or problem based on that cognitive model. In the item model, a template MCQ is built, and software assigns the appropriate words to the appropriate parts of the item according to the specifications provided by the subject matter experts (6). Because a single item model can yield hundreds of MCQs, template-based AIG is more feasible than traditional item writing.
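The word-assignment step described above can be illustrated with a minimal Python sketch. It is not the software used in the cited studies: the stem template, the slot names, and the bracketed placeholder values are all hypothetical, and a real item model would also encode the constraints from the cognitive model that determine which combinations are clinically valid and which option is correct.

```python
from itertools import product

# Hypothetical item model: a stem template plus the values each slot may take.
# The bracketed complaints are placeholders, not clinical content.
STEM_TEMPLATE = (
    "A {age}-year-old {sex} presents with {complaint}. "
    "What is the BEST next step in management?"
)

SLOT_VALUES = {
    "age": [35, 55, 70],
    "sex": ["man", "woman"],
    "complaint": ["[COMPLAINT A]", "[COMPLAINT B]"],
}


def generate_stems(template, slot_values):
    """Fill the template with every combination of slot values."""
    names = list(slot_values)
    stems = []
    for combo in product(*(slot_values[name] for name in names)):
        stems.append(template.format(**dict(zip(names, combo))))
    return stems


stems = generate_stems(STEM_TEMPLATE, SLOT_VALUES)
print(len(stems))  # 3 ages * 2 sexes * 2 complaints = 12 stems
print(stems[0])
```

Even this toy model shows why one item model can produce many items: the number of generated stems is the product of the number of values in each slot.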

Template-based AIG has been very successful over the last ten years. Successful implementations have been reported in various languages, including English, French, Chinese, Spanish, Korean, and Turkish (6,7). Furthermore, automatically generated questions have been used in the national medical licensure exam in Canada (8).

Despite its superiority over traditional item writing, template-based AIG still depends on human effort, because subject matter experts must develop the cognitive models and item models. Although generating items with artificial intelligence (AI) requires less effort from test developers, researchers have preferred template-based AIG because AI produced unsatisfactory MCQs compared with template-based methods (6).

AI for Generating MCQs

Non-template-based methods in AIG rely on natural language processing (NLP), which has been used in the past but did not produce satisfactory MCQs (6). However, NLP-based AI solutions such as the generative pretrained transformer (GPT), a large language model (LLM), appear to be about to change this, because LLMs such as GPT-3.5 have succeeded in “hacking our languages” (9).

ChatGPT, a variant of GPT-3.5, was publicly introduced in November 2022. It passed the one-million-user mark in only five days (10) and has affected health professions education as well (11–14). As a result, some medical education researchers have proposed the release date of ChatGPT as the dividing line between the pre-ChatGPT era and the post-ChatGPT era (15). This success was probably due to GPT-3's data corpus being ten times the size of those of prior models (16), and to ChatGPT's free and public release (15).

In our experience, generating high-quality MCQs with ChatGPT depends on the quality of the prompt. We therefore developed a prompt by revising one that Esh Tatla developed for medical students (17). Although the original prompt was useful, it needed revisions to generate MCQs that align more closely with the principles of constructing multiple-choice items in medical education. The resulting prompt, arrived at through continuous testing and refinement, is presented in Table 1. It also asks for explanations of the answers, which may help test developers and medical students better understand each item.

To use this prompt, the user only needs to fill in the two parts written in capital letters:

  1. [PLEASE INSERT A TOPIC]
  2. [PLEASE INSERT A DIFFICULTY LEVEL (E.G. EASY, DIFFICULT)]

In the first part, the user inserts a topic, which can also be a learning objective of a training program in a medical school. The second part sets the difficulty of the MCQ.
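Filling the two placeholders can also be done programmatically, which may help when many topics have to be processed. The sketch below is one possible way to do it in Python and is not part of the original paper: the function name `fill_prompt` is hypothetical, and `EXCERPT` is an abbreviated stand-in for the full prompt in Table 1 (first sentence and last line only). Sending the filled prompt to ChatGPT is deliberately left out.

```python
import re

# The two capitalized placeholders from the prompt. The difficulty pattern
# accepts any text between the label and the closing bracket.
TOPIC_SLOT = re.compile(r"\[PLEASE INSERT A TOPIC\]")
DIFFICULTY_SLOT = re.compile(r"\[PLEASE INSERT A DIFFICULTY LEVEL[^\]]*\]")


def fill_prompt(template: str, topic: str, difficulty: str) -> str:
    """Replace both placeholders; raise if either one is missing."""
    filled, n_topic = TOPIC_SLOT.subn(lambda m: topic, template)
    filled, n_difficulty = DIFFICULTY_SLOT.subn(lambda m: difficulty, filled)
    if n_topic == 0 or n_difficulty == 0:
        raise ValueError("template is missing a placeholder")
    return filled


# Abbreviated excerpt of the prompt, for illustration only.
EXCERPT = (
    "You are developing a question bank for medical exams focusing on the "
    "topic of [PLEASE INSERT A TOPIC].\n"
    "Difficulty level: [PLEASE INSERT A DIFFICULTY LEVEL (E.G. EASY, DIFFICULT)]"
)

filled = fill_prompt(
    EXCERPT,
    topic="managing essential hypertension in a primary care setting",
    difficulty="Easy",
)
print(filled)
```

Raising an error when a placeholder is missing guards against silently submitting a prompt that still contains the capitalized instructions.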

Table 2 shows an MCQ generated using the prompt with “managing essential hypertension in a primary care setting” as the topic and “Easy” as the difficulty level. The conversation is accessible at https://chat.openai.com/share/4342df6f-7cbe-43df-98dc-b110b53983e3 (Free Research Preview, September 25 Version).

To the author, a physician with a PhD in medical education, the MCQ in Table 2 seems both plausible and well constructed. While a thorough analysis of the item's quality remains to be carried out, it is striking that such an MCQ can be generated within seconds simply by entering a prompt and pressing a button, which is an extraordinary achievement. In the past, writing such MCQs required extensive time from subject matter experts. Today, with an internet connection, an average computer, and a detailed prompt, we can generate MCQs in a few seconds. This is why some researchers call the period that began in November 2022 the post-ChatGPT era.

Recommendations for Future Research

Automatic MCQ generation using ChatGPT raises many research questions that need to be answered. Some of them are:
• Do subject matter experts consider the MCQs acceptable in terms of scientific and clinical knowledge?
• Is the prompt capable of generating MCQs that meet psychometric standards?
• How does the quality of MCQs generated by ChatGPT compare with that of MCQs created by human experts?
• How do the cognitive levels of questions generated by ChatGPT align with the intended learning objectives and Bloom's taxonomy?
• Do the intended difficulty levels of the MCQs align with their actual difficulty in real assessment settings?
• Can the prompt be effectively used to generate MCQs in various health professions education settings?
• Do the MCQs need editing by humans before they are incorporated into exams?
• Do the explanations serve as effective feedback for medical students in formative assessments?
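As an illustration of how the difficulty question could be approached, the sketch below computes the classical item difficulty index, the proportion of examinees who answer an item correctly, so that it can be set beside the difficulty label requested in the prompt. The response data and item names are hypothetical and invented for the example. The paper does not prescribe this analysis, and any cut-off for what counts as "easy" or "difficult" would be a further assumption.

```python
def item_difficulty(responses):
    """Proportion of examinees who answered the item correctly (0.0 to 1.0)."""
    if not responses:
        raise ValueError("no responses")
    return sum(1 for correct in responses if correct) / len(responses)


# Hypothetical scored responses (True = correct) from eight examinees.
scored = {
    "item_labelled_easy": [True, True, True, False, True, True, True, True],
    "item_labelled_difficult": [False, True, False, False, True, False, False, False],
}

observed = {name: item_difficulty(r) for name, r in scored.items()}
for name, p in observed.items():
    print(f"{name}: proportion correct = {p:.3f}")
```

A higher proportion correct means an easier item, so an item requested as "Easy" would be expected to show a higher value than one requested as "Difficult" when both are taken by the same group.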

Conclusions

• ChatGPT provides an exciting opportunity to generate case-based MCQs in a few seconds.
• Researchers should conduct studies to evaluate the quality of MCQs generated using ChatGPT.


Table 1. A prompt for generating MCQs in medical education. 

You are developing a question bank for medical exams focusing on the topic of [PLEASE INSERT A TOPIC]. Please generate a high-quality single best answer multiple-choice question. Follow the principles of constructing multiple-choice items in medical education. Generate the questions using the following framework:

Case (write as a single narrative paragraph without providing each part separately): 
Patient details (gender/age)
Presenting complaint
Relevant clinical history
Physical examination findings
Diagnostic test results (optional)

Question stem: [Insert relevant information from the above sections without compromising the answer]
Acceptable question style: Ask for the BEST answer, NOT one that is TRUE/FALSE.

Answer options:
[Insert plausible answer option]
[Insert plausible answer option]
[Insert plausible answer option]
[Insert plausible answer option]
[Insert plausible answer option]

Explanation:
• Identify and explain the correct answer.
• Explain why this is the most appropriate answer based on evidence-based guidelines or expert consensus.
• Briefly explain why the other answer options are less correct or incorrect.

Difficulty level: [PLEASE INSERT A DIFFICULTY LEVEL (E.G. EASY, DIFFICULT)]


Table 2. An MCQ generated by using the prompt in ChatGPT.

Case:

Mr. Johnson, a 55-year-old male, presents to the primary care clinic with a chief complaint of recurrent headaches. He reports a sedentary lifestyle and a family history of hypertension. His blood pressure readings during previous routine check-ups have consistently been elevated. Mr. Johnson denies any chest pain, shortness of breath, or visual disturbances. His past medical history is significant for obesity and a sedentary occupation. On physical examination, his blood pressure is 150/92 mmHg, and his BMI is 31. Laboratory results show normal renal function and lipid profile.  

Question stem:

Given Mr. Johnson's presentation with recurrent headaches, a family history of hypertension, and consistently elevated blood pressure readings, what is the BEST initial management strategy in a primary care setting? 

Answer options:

A. Initiate lifestyle modifications, including dietary changes and increased physical activity.
B. Start antihypertensive medication immediately to lower blood pressure.
C. Order a 24-hour ambulatory blood pressure monitoring for further assessment.
D. Recommend immediate admission to the hospital for comprehensive evaluation.
E. Suggest over-the-counter analgesics for headache relief and reassess in six months.

Explanation:

The correct answer is A. Initiate lifestyle modifications, including dietary changes and increased physical activity.

Explanation:

Hypertension management guidelines emphasize lifestyle modifications as the initial approach in patients with uncomplicated essential hypertension. For individuals like Mr. Johnson, who presents with elevated blood pressure, obesity, and a sedentary lifestyle, initiating lifestyle changes is crucial. These changes should include adopting a heart-healthy diet (such as the DASH diet), increasing physical activity, and weight reduction. Lifestyle modifications have been shown to significantly contribute to blood pressure control and may obviate the need for immediate pharmacologic intervention.

Option B (Start antihypertensive medication immediately) is not the best initial approach unless there are compelling indications for immediate treatment, which are not evident in Mr. Johnson's case. Option C (Order a 24-hour ambulatory blood pressure monitoring) may be considered in certain situations but is not the primary step in the management of essential hypertension. Option D (Recommend immediate admission to the hospital) is not warranted in the absence of hypertensive emergencies. Option E (Suggest over-the-counter analgesics) does not address the underlying hypertension and is not the primary management strategy for this patient. 


 


Abstract

A significant challenge faced by medical schools is the effortful process of writing a large quantity of high-quality case-based multiple-choice questions (MCQs) to assess the higher-order skills of medical students. The demand for a high volume of MCQs in education has led to the development of Automatic Item Generation (AIG), specifically template-based AIG, in which subject matter experts create cognitive and item models that software uses to generate hundreds of MCQs at once. It has demonstrated significant success in various languages and has even been incorporated into national medical licensure exams. However, this method still depends heavily on the efforts of subject matter experts. This paper introduces a detailed ChatGPT prompt for quickly generating case-based MCQs and poses important research questions for future exploration of ChatGPT's potential in generating items. It signals the beginning of the artificial intelligence era in medical education and encourages health professions education researchers to delve deeper into that potential.
