Multiple mini-interviews: same concept, different approaches

Mirjana Knorr & Johanna Hissbach



The comprehensive literature on the reliability of different types of MMI and their efficiency in the use of interviewing time strongly supports the superiority of the MMI method over conventional interview methods.


Because of the many factors that can be varied, results are not easily transferable from one MMI to another. Based on the published research, we can give some recommendations for aspects of design with regard to reliability values and costs. It is important to note that measures to increase reliability are often accompanied by an increase in costs. To date, only vague statements can be made concerning validity.


Recommendations for reliability

  • Increasing the number of stations, the number of interviewers per station or the number of items will enhance an MMI's reliability. Raising the number of stations is the most advisable of these three options.
  • A station time of 5–6 minutes is sufficient.
  • The use of skills-based rater training that includes mock interviews can improve rater agreement.
  • The use of normative anchored rating scales rather than descriptive adjectives (e.g. ‘poor’ or ‘outstanding’) will encourage raters to make use of the full rating scale.
  • Stations that are too easy or too difficult should be excluded because they do not allow for the differentiation of candidates according to ability.
  • A pleasant atmosphere for candidates should be ensured.
  • With regard to variation in station type, research so far suggests no difference in reliability between one-to-one and interactive stations.
  • The addition of written tasks (e.g. questionnaires or writing stations) does not guarantee an increase in reliability.


Recommendations for validity

  • Users should be aware that an MMI is best suited to the assessment of factors that are not captured by established admissions criteria such as GPA and admission tests.
  • The breadth and narrowness of constructs for MMI attributes and external criteria should be considered.


Recommendations for costs

  • The extra costs implied by station development and the use of actors should be considered in any change from a conventional interview format to an MMI format.
  • Written tasks can be used to save costs of interviewers or actors.
  • The use of an internet-based MMI (iMMI) to save facility- and travel-related costs should be considered.


OBJECTIVES:

Many educational institutions are replacing conventional interviews with the MMI. The MMI has high reliability and is less affected by interviewer bias. Because the MMI can be applied differently depending on each institution's circumstances, the question of under which conditions it works best remains open.

Increasing numbers of educational institutions in the medical field choose to replace their conventional admissions interviews with a multiple mini-interview (MMI) format because the latter has superior reliability values and reduces interviewer bias. As the MMI format can be adapted to the conditions of each institution, the question of under which circumstances an MMI is most expedient remains unresolved. This article systematically reviews the existing MMI literature to identify the aspects of MMI design that have impact on the reliability, validity and cost-efficiency of the format.


METHODS:

Three electronic databases (OVID, PubMed, Web of Science) were searched for any publications in which MMIs and related approaches were discussed. Sixty-six publications were included in the analysis.


RESULTS:

Forty studies reported on reliability. Increasing the number of stations is generally more effective at improving reliability than increasing the number of raters per station. Other measures include excluding stations that are too easy, using normative anchored ratings, and conducting skills-based rater training. Thirty-one studies addressed validity; regardless of study design, the relationship between the MMI and academic measures was very small or absent. The McMaster medical school MMI predicted performance in medical school and on licensing examinations. Findings on construct validity are not yet clear. The most important cost factors were station development and payments to actors.

Forty studies reported reliability values. Generally, raising the number of stations has more impact on reliability than raising the number of raters per station. Other factors with positive influence include the exclusion of stations that are too easy, and the use of normative anchored rating scales or skills-based rater training. Data on criterion-related validities and analyses of dimensionality were found in 31 studies. Irrespective of design differences, the relationship between MMI results and academic measures is small to zero. The McMaster University MMI predicts in-programme and licensing examination performance. Construct validity analyses are mostly exploratory and their results are inconclusive. Seven publications gave information on required resources or provided suggestions on how to save costs. The most relevant cost factors that are additional to those of conventional interviews are the costs of station development and actor payments.


CONCLUSIONS:

By analysing the MMI literature, recommendations can be made for highly reliable and cost-effective MMIs, but not all of the important factors have yet been identified. Further research is needed on topics such as dimensionality and construct validity, and the predictive validity of MMIs.

The MMI literature provides useful recommendations for reliable and cost-efficient MMI designs, but some important aspects have not yet been fully explored. More theory-driven research is needed concerning dimensionality and construct validity, the predictive validity of MMIs other than those of McMaster University, the comparison of station types, and a cost-efficient station development process.




Brief history and current status (in use since 2002; USA, Canada, Australia, UK, Europe, the Middle East, etc.)

Over the past 10 years a specific form of admission interview, the multiple mini-interview (MMI), has enjoyed increasing popularity in the health sciences field following the criticism of conventional admission interviews for their unsatisfactory reliability.[1, 2] Originally introduced at McMaster University, in Hamilton, Ontario, Canada, in 2002,[3] the MMI has found widespread application at different medical schools and in other health sciences programmes in the USA, Canada, Australia and the UK, as well as in other European and Middle Eastern countries.



Core characteristics

The core characteristic of the MMI is a multiple independent sampling methodology.[4] In a manner similar to that of an objective structured clinical examination (OSCE), each candidate rotates through several short standardised interview stations.[5] A candidate thus has several independent encounters with different interviewers instead of one single panel interview.


The aim is to improve reliability. The MMI can be adapted to the constructs one intends to measure; in other words, the MMI is not a scale for measuring anything specific but rather one method of measurement.

The MMI aims to enhance reliability by taking into account the problem of interviewer bias, as well as the context specificity of a candidate's performance.[5, 6] The number of stations and interviewers, the station content and the scoring system are flexible and vary considerably among different institutions. Most notably, an MMI is adjustable to the constructs being measured, although most authors design their MMIs to capture a set of dimensions predominantly described in the literature as ‘non-cognitive’ attributes. Consequently, the MMI is an assessment method or process rather than a clearly defined measure.[7-10]


The MMIs adopted by individual institutions vary widely; the research questions of this study are given below.

From a practical point of view, especially for an institution that is considering implementing an MMI into its admissions procedure, a highly relevant issue concerns which aspects of the format should be considered in order to design a successful (i.e. reliable, valid and cost-efficient) MMI. In the context of the wide range of approaches to the MMI, our goal is to shed light on responses to the following questions:

    • Which factors can be varied in an MMI design?
    • Which variations of these factors contribute towards an MMI that is successful in terms of reliability, validity and cost-efficiency?


We address these questions in a systematic review of the MMI literature. Based on our findings, we provide recommendations for the design of an MMI and an outlook on directions for future research. This article therefore takes a perspective that differs from that of another recently published systematic review which concentrated on the common features of MMIs.[11] By contrast, this review aims to add to the existing literature by explicitly focusing on the differences among approaches to the MMI and the effects of these variations.




Design of the MMI

The attributes that are to be measured, the types of station used, the details of the MMI process, and finally the scoring system specify an MMI design. All of these factors provide possibilities for variation. In order to answer our first research question (‘Which factors can be varied in an MMI design?’), we looked at the different variations that were reported in the literature.


Attributes: the number and kinds of personal characteristics assessed

The development of a programme-specific MMI starts with the definition of the characteristics that are to be measured. Only a few authors have described their approach to identifying these core characteristics through literature research and stakeholder analysis.[14, 15] In the reviewed literature, lists of core characteristics range in length between three[16, 17] and 19[18] attributes. Some attributes are more commonly used (e.g. communication skills) and others are more programme-specific (e.g. leadership potential[19]).


Stations: the tasks performed at each station

Station development is based on the selected set of attributes. Usually, candidates are asked to discuss a topic or dilemma with an interviewer, to answer standardised questions, to interact with a trained actor or to collaborate with one or several other candidates. Some MMIs also include problem-solving tasks,[15] presentations,[20-22] prioritising tasks,[22] creative tasks,[23] film clips,[24, 25] writing samples[26] or debriefing stations in which the candidate's performance at a previous station is discussed.[12, 13]


MMI process: number of stations, total interview days, number of simultaneous circuits, preparation time before entering, and time inside the station

The usual number of stations varies between six[23, 27] and 12.[8, 9, 19, 28-34] There have also been reports of MMIs with only three stations. All of these MMIs were used to select candidates at a later point in their careers as they applied for residency or junior doctor posts.[16, 17, 20, 35] The number of interview days ranges between one[33, 36] and 11[37] with up to four sets per day[9, 38, 39] and up to seven simultaneous circuits per set.[9] Candidates are given between 30 seconds[15] and 3 minutes[20] to read the scenario and prepare for the task. Station duration is between 5 minutes[16, 17, 22, 39, 40] and 15 minutes.[21]


Scoring system

A candidate's performance is usually rated on 4-[41-46] to 10-point[21, 32-34, 47-49] anchored Likert scales by one[3-5, 7, 12, 13, 15, 18, 25, 29, 30, 32-34, 36, 37, 39, 41-48, 50-57] or two[5, 12, 13, 15-18, 20, 21, 24, 30, 33, 35, 38, 39, 53, 55, 58-60] interviewers at each station. The station score is either measured on one single scale[3, 5, 16, 17, 19, 22, 26, 27, 30, 34, 45, 48-50, 52, 61] or is formed by the summation or aggregation of several subscales.[4, 5, 7, 12, 13, 20, 21, 23-25, 32, 33, 37-39, 41-44, 46, 47, 51, 54, 55, 57, 58, 60, 62, 63] In some cases, these additional subscales are meant to help raters make a decision on the overall station score, but are not considered in the total score.[15, 29] Subscales can be applied at every station[4, 12, 21, 23, 24, 33, 38, 39, 47, 55, 58] or they can be station-specific.[4, 15, 37, 38, 41-44, 46, 57, 58, 60, 62]
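To make these scoring options concrete, the sketch below shows one hypothetical way such ratings might be aggregated. The station names, subscale counts and averaging scheme are all invented for illustration and do not describe any specific reviewed MMI:

```python
# Hypothetical MMI scoring: two raters per station, several subscales each.
# Station score = mean over raters of each rater's mean subscale rating;
# total score = sum of station scores.
ratings = {
    "ethical_dilemma": [[3, 4, 4], [4, 4, 3]],  # two raters, three subscales
    "role_play":       [[2, 3],    [3, 3]],     # two raters, two subscales
}

def station_score(rater_subscales):
    """Average each rater's subscale ratings, then average across raters."""
    per_rater = [sum(r) / len(r) for r in rater_subscales]
    return sum(per_rater) / len(per_rater)

total = sum(station_score(r) for r in ratings.values())
print(round(total, 2))  # 3.67 + 2.75 ≈ 6.42
```

Designs that use subscales only as decision aids for the rater would simply omit them from the total; designs with a single overall scale per station would have one-element subscale lists.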


The list of variations leads to the second question (‘Which variations of these factors contribute towards an MMI that is successful in terms of reliability, validity and cost-efficiency?’). The following sections give an overview of the possible ranges of reliability, validity and costs as reported in the literature, as well as findings related to the impact of design differences.









Reliability

Methods used to study reliability. Forty of the studies reviewed reported an estimation of reliability. In 22 of these, calculations were based on generalisability theory. Generalisability theory is a framework that combines assumptions of classical test theory with analysis of variance procedures.[64] Within this framework, generalisability studies (G studies) provide estimates of variance components and measurement error which form the basis for the calculation of the overall reliability (generalisability [G] coefficient). Based on G study results, decision studies (D studies) estimate the impact of different hypothetical MMI designs on reliability. Additional measures reported in the reviewed literature were correlations, intraclass correlations (ICCs) and Cronbach's alpha. Table 1 gives a structured overview of reported reliability values sorted by different types of reliability.
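For readers unfamiliar with generalisability theory, here is a minimal sketch of how a relative G coefficient follows from estimated variance components, assuming the simplest candidates-crossed-with-stations design; the variance figures are invented for illustration and are not from any reviewed study:

```python
def g_coefficient(var_candidate, var_interaction_error, n_stations):
    """Relative G coefficient for a candidates x stations design:
    candidate (true-score) variance divided by candidate variance
    plus measurement error averaged over the number of stations."""
    relative_error = var_interaction_error / n_stations
    return var_candidate / (var_candidate + relative_error)

# Hypothetical variance components (illustrative only)
g = g_coefficient(var_candidate=0.25, var_interaction_error=1.0, n_stations=10)
print(round(g, 2))  # 0.25 / (0.25 + 0.10) ≈ 0.71
```

A D study simply re-evaluates this expression for hypothetical values of `n_stations`, projecting how reliability would change under alternative designs.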


G study findings. A closer look at the reported G study results shows that the proportion of variance attributable to candidate differences varies between 10%[19] and 74%,[59] although in most studies candidates accounted for < 30% of the variance. As an MMI aims to detect systematic differences between candidates, the goal for an ideal MMI would be to increase this proportion of intended variance and to reduce unwanted variance (i.e. rater, station, error).


Ways to improve reliability. The influence of MMI design variations on reliability was analysed in various studies. D studies show that overall generalisability can be increased by adding more stations or more raters to each station.[41] More specifically, increasing the number of stations appears to have greater impact on reliability than increasing the number of interviewers within each station.[5, 20, 33, 60, 65] Additionally, Hanson et al.[4] found that reducing the number of items nested within raters had less impact on reliability than reducing the number of stations. One way to raise the number of stations without increasing the overall interviewing time is by reducing station time.[41] To date, two studies have shown that 5–6 minutes can be sufficient to reliably assess a candidate's performance.[49, 52] Another possible method of increasing reliability without raising the number of active stations is to include questionnaires such as the Judgement and Decision-making Questionnaire and the Biographical Questionnaire used in MOR.[12, 13] However, reliability could not be increased by the addition of a writing station.[26]
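The stations-versus-raters finding can be illustrated with a small D-study sketch for a fully crossed candidate × station × rater design. The variance components below are invented for illustration only and are not taken from any reviewed study:

```python
def projected_g(var_c, var_cs, var_cr, var_res, n_s, n_r):
    """Projected relative G coefficient: candidate variance over candidate
    variance plus error terms averaged over n_s stations and n_r raters."""
    error = var_cs / n_s + var_cr / n_r + var_res / (n_s * n_r)
    return var_c / (var_c + error)

# Hypothetical variance components (illustrative only)
base = dict(var_c=0.20, var_cs=0.60, var_cr=0.05, var_res=0.40)
g_8x1 = projected_g(**base, n_s=8, n_r=1)    # 8 stations, 1 rater each
g_16x1 = projected_g(**base, n_s=16, n_r=1)  # double the stations
g_8x2 = projected_g(**base, n_s=8, n_r=2)    # double the raters instead
print(g_16x1 > g_8x2 > g_8x1)  # True: extra stations help more here
```

Because the candidate × station component dominates the error in this example, spreading measurement over more stations shrinks the largest error term; with different variance components the ordering could change, which is exactly why D studies are run per design.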


Other factors that influence reliability. As Uijtdehaage et al.[19] point out, factors other than the number of stations and raters per station also contribute to the differences in reliability estimations. For example, the exclusion of very easy stations, the change to a normative anchored rating scale, and a less intimidating atmosphere may enhance overall generalisability.[19] Interviewer bias can be significantly reduced by changing from an information-based to a more skills-based form of rater training.[25] A comparison of reliability values for two different station types (one-to-one versus interactive) showed that both achieved similar reliabilities if each station type was represented by the same number of stations.[37] Finally, by comparing a fixed-effects design (G = 0.90) with a random-effects design (G = 0.68), Sebok et al.[60] demonstrated that differences in reliability estimations can also stem from different assumptions in the statistical model.






Validity

Of 31 studies that reported validity measures, 27 provided indicators for the criterion-related validity of the respective MMI. Dimensionality was analysed in five studies. Content and face validity were usually established by blueprinting processes and evaluation surveys.


Criterion-related validity. All reported criterion-related validities are summarised in Table 2. The criteria can be structured into three main clusters: psychological constructs, other measures relevant for admission, and performance measures (in-programme or post-graduation). Only McMaster University provides results for all three of these clusters (see Table 2). Siu and Reiter explained their observation that correlations between MMI results and different performance measures tended to increase over time by the fact that later assessments put a greater emphasis on ‘non-cognitive’ domains.[66]


Exploratory factor analysis. The literature reviewed included two studies that described exploratory factor analyses (EFA) based on MMI subscores. Lemay et al. reported a 10-factor solution in which each factor represented one of 10 stations.[47] Hecker et al. performed three separate factor analyses, all of which included age and grade point average (GPA).[58] They found a three-factor solution (moral and ethical values, interpersonal ability, academic ability) based on station-specific subscales, a single-factor solution based on communication skill scores assessed at all stations, and a two-factor solution (economics, interpersonal ability) based on critical thinking skills also assessed at all stations.[58] Although MMI design and methods differed, both studies suggest a multidimensional structure for their MMI.


Analyses using item response theory. A second branch of research focused on item response theory (IRT) to examine the dimensionality of different MMIs. These studies report a good fit of questions or items to an assumed unidimensional construct. This construct was defined as ‘pre-professionalism’ and ‘entry-level reasoning skills in professionalism’,[41, 43, 44] ‘latent professional potential’[18] or simply ‘professionalism’.[60] However, the exact nature of the underlying attributes remains unknown.[60]











Cost-efficiency

The MMI is cost-effective. Required costs and resources were mentioned in seven of the 66 publications reviewed. Cost analyses suggest that, compared with traditional interview formats, the MMI format is especially efficient in reducing interviewing time (i.e. the number of hours required to interview all candidates). It thus allows a larger number of candidates to be interviewed in a shorter time period.[5, 21, 22, 32, 67]


Sources of additional cost. Additional costs most notably arise from the development of the blueprint and the MMI stations.[32] Researchers at McMaster University estimated 3 hours and costs of US$50 for the development of a single station.[32] In a recently published study, Hissbach et al.[68] reported much higher costs associated with their procedure. Station development represented a significant cost factor and implied a cost of approximately US$2000 per station. The high expenses reflect the station development time of 40 hours per station including test runs.[68]


Ways to cut costs. Measures to cut costs include the reduction of station time,[5] the reduction of the number of stations,[5] and the implementation of an internet-based version of the MMI (iMMI) to save costs for international applicants.[46] The inclusion of writing stations allows more candidates to be interviewed without increasing interviewer hours.[26] However, costs will increase if an MMI includes stations with simulated patients.[5, 22]




Discussion

From its introduction, the MMI procedure was intended to be adjustable to the requirements and conditions of different institutions and programmes. As with every assessment method, it is interesting to learn more about the conditions under which it works best. Consequently, our goal was to take a closer look at the impact of design changes on reliability, validity and cost-efficiency based on 66 studies.


With regard to our first question, the analysis revealed great variability in MMI designs in terms of attributes, station types, process details (number of stations, sets, circuits and days, time to prepare, station duration), and scoring system (type and usage of scales and subscales, scale range, number of raters) among institutions and also between subsequent years at the same institution. The wide range demonstrates that the MMI is indeed adjustable in many aspects.


Our second question was concerned with the impact of design changes on reliability, validity and cost-efficiency. In terms of the number of studies, reliability is the most studied of these three criteria so far, followed by validity and costs. In consequence, most conclusions can be drawn with regard to reliability, but there are further aspects that could be explored. Analyses of validity and costs provide some insight, but also raise further questions. In the subsequent paragraphs we will discuss each of the three aspects.


Reliability

Reliability is a strong point of the MMI procedure. Despite considerable differences in design, most values of internal consistency as well as overall generalisability are satisfactory, although a possible publication bias must be kept in mind. The multi-station approach, which is the core element of all MMI formats, allows a satisfactory level of reliability to be achieved by raising the number of stations. In addition, the MMI literature already provides useful information on the impacts of other aspects of MMI design on reliability, such as the number of items, station time, station type, station difficulty and type of rating scale.


It is difficult to draw conclusions about the optimal number of stations. Nevertheless, as a result of the limited comparability of reliability values between studies, a general recommendation for a minimum number of stations cannot be derived from the MMI literature. For instance, it is difficult to give the exact reasons why a four-station MMI[4] with one rater per station yielded a higher generalisability value than an eight-station MMI.[41] Given the large impact of different model assumptions on reliability estimations, as demonstrated by Sebok et al.,[60] one must be very cautious in interpreting reliability values from different studies. Similarly, the MMI designs lead to differences in the separation of variance components. Therefore, it is also difficult to compare proportions of variance between studies. For these reasons, reports that concentrate on a specific MMI design and analyse how systematic changes influence reliability, as does the study by Uijtdehaage et al.,[19] are of high value.


Inter-rater reliability. Results concerning other types of reliability show a mixed picture. Inter-rater reliabilities for MMI designs that include two raters at each station are moderate to satisfactory. The selection of raters and the quality of their training may be the most relevant factors in terms of a positive influence on inter-rater reliability and a reduction in systematic and unsystematic rater variance.


Inter-station reliability and inter-item reliability within attributes. Low inter-station reliabilities, as well as low inter-item reliabilities within attributes, are typically explained by the content and context specificity of a performance.[5, 41, 47] However, estimates differ between studies and attributes. The higher reliability values for communication skills than for teamwork reported by Dowell et al.[37] may be explained by the fact that communication skills were measured at more stations than teamwork.


Based on high inter-item reliabilities for items within a station, some authors suggested that the overall station score might be sufficient and subscores might not be needed.[24, 55] This raises the question of whether it is possible to measure several distinct constructs at one station after all.



Validity

Correlations are higher with non-academic than with academic measures; the MMI measures something different. The tendency towards weak to non-existent relationships with predominantly academic measures (e.g. GPA, science subtests of the Medical College Admission Test [MCAT]) and weak to moderate correlations with less academic measures (e.g. the MCAT verbal subtest, other admission tools) suggests that MMIs cannot replace conventional admission tools, but, rather, measure something different.


Little is known about predictive validity beyond McMaster; some studies report different findings. While several publications from McMaster University support the incremental validity of the McMaster MMI in predicting in-programme and licensing examination performance,[3, 7, 8, 50] little is known about the predictive validity of other MMIs. Higher correlations between the MMI and the CLEO (Considerations of the Legal, Ethical and Organisational Aspects of Medicine)/PHELO (Population Health and Ethical, Legal and Organisational Aspects of Medicine) area of the Medical Council of Canada Qualifying Examination (MCCQE) do not seem surprising given that the McMaster MMI puts an emphasis on ethical decision making.[7] The non-significant correlations between MMI results and licensing examinations reported by Hofmeister et al.[33] seem to differ from McMaster findings. These results may reflect differences in attributes measured by the specific MMI, but differences in sample types must also be considered.


Construct validity and dimensionality; CFA may be more suitable than EFA. Analyses that aimed to investigate the construct validity and dimensionality of different MMIs are characterised by their explorative nature. Broadness of constructs is a relevant point that needs to be considered, as Griffin and Wilson demonstrated that several Big Five sub-facets showed significant correlations with MMI performance even when the superordinate factor did not.[63] Because of the possibility of higher- and lower-order factors, the multidimensional structure found in EFAs and the unidimensional construct described in IRT studies should not be interpreted as contradictory. However, both EFA studies have methodological weaknesses (e.g. data that would allow checking for cross-loadings are missing; age and GPA have been included in the analysis) and confirmatory factor analysis (CFA) would be a more suitable approach to the testing of prior assumptions about dimensionality.


No definitive answer on validity yet. Overall, there is no comprehensive picture of MMI validity as yet. Researchers need to draw on definitions of the measured attributes and their theoretical assumptions to explain why an MMI result is related to an external measure. Therefore, to derive further general recommendations, more theory-driven research on the construct and predictive validity of different MMIs is needed.



Cost-efficiency

Station development is a major cost. Station development was identified as an important additional resource requirement for an MMI.[32] The comparison of station development costs reported by McMaster University and by Hamburg Medical School reveals a large gap. Given that McMaster University focuses on one attribute (ethical decision making) and additionally considers communication skills and collaborative ability,[7] it may be that institutions that intend to measure a broader construct face much higher costs for station development. However, the issue of how reliable and valid stations can be developed efficiently should be further explored.


Adding stations to improve reliability raises costs accordingly; written tasks may save costs but do not raise reliability. Reliability, validity and costs are interdependent factors. If the number of stations is increased to enhance reliability or to measure a broader set of attributes, the costs of station development and staff will rise accordingly. The inclusion of writing stations might save the costs incurred by the use of interviewers for an additional interview station, but does not seem to increase reliability.[26] Given that the use of simulated patients represents another additional cost factor,[32] it would be interesting to learn more about the value of simulation-based stations in terms of reliability and validity in comparison with interview stations.



Directions for future research

There is no clear criterion separating ‘cognitive’ from ‘non-cognitive’ competencies; an academic versus non-academic distinction may be better. The MMI literature is still vague in the terminology it uses to describe what it is that MMIs measure. Originally, Eva et al. labelled these competencies ‘non-cognitive attributes’[5] in order to draw a line between an MMI and conventional admissions criteria that measure ‘cognitive attributes’.[7] However, there is no definitive method of assigning specific attributes to the ‘cognitive’ or the ‘non-cognitive’ domain and the use of these terms has already been questioned by others.[43, 69] The distinction between academic and non-academic attributes may be more adequate, although it would still imply a strict dichotomy. Alternatively, as IRT studies suggest, a very broad comprehensive construct of ‘potential for professionalism’, embracing various specific MMI-tested characteristics, might be assumed.[18, 43, 60]


What exactly is being measured? Another question relates to the nature of the attributes being measured and to what the rating scales assess. Eva et al.[7] point out that these competencies should not be thought of as traits because of the context specificity[6] of a behaviour. As latent state–trait theory suggests, the behaviour of a person in a given situation depends on the characteristics of the person, the characteristics of the situation and the interaction between these two sets of characteristics.[70] Stations may be understood as representing different situations. Alternatively, they may be seen as representing different methods if the MMI represents a multitrait–multimethod approach.[71] The variety in the application of measurement scales and different approaches to the analysis of dimensionality (e.g. one station measuring only one distinct attribute versus several stations measuring several attributes) indicates uncertainty about what information can be derived from the behaviours shown in MMI stations.
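One way to formalise this latent state–trait view is the standard decomposition of an observed rating into person, situation and interaction components (generic notation chosen here for illustration, not taken from the reviewed studies):

```latex
Y_{ps} = \mu + \tau_p + \beta_s + (\tau\beta)_{ps} + \varepsilon_{ps}
```

where \mu is the grand mean, \tau_p the stable (trait-like) contribution of person p, \beta_s the contribution of station (situation) s, (\tau\beta)_{ps} their interaction, and \varepsilon_{ps} residual error. Reading stations as situations places the interest on \tau_p; reading them as methods shifts the interpretation towards a multitrait–multimethod matrix.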


Dowell et al. state that ‘if overall MMI scores do predict a significant portion of the variance in medical school performance, the issue of construct validity will become less important’.[37] Nevertheless, both predictive and construct validity benefit from prior theoretical assumptions that can explain high or disappointing values. If a priori assumptions about the nature and the theoretical connection between attributes exist, structural equation modelling (SEM) might provide a useful statistical method with which to test these assumptions.


The ‘construct validity problem’ which is discussed in the assessment centre (AC) literature raises similar questions. Given the similarities between ACs and MMIs (multi-station approach, behavioural ratings), the AC literature might provide a good starting point for the further analysis of MMI construct validity (see Bowler and Woehr[72] for a recent meta-analysis and Lance[73] for a recent review on AC construct validity).




2014 Dec;48(12):1157-75. doi: 10.1111/medu.12535.

Multiple mini-interviews: same concept, different approaches.

Author information

  • 1University Medical Centre Hamburg-Eppendorf, Hamburg, Germany.


© 2014 John Wiley & Sons Ltd.

