Method and reporting quality in health professions education research: a systematic review (Med Educ, 2011)

David A Cook,1,2 Anthony J Levinson3 & Sarah Garside3






INTRODUCTION



In addition to the issue of confounding, however, the question of whether weak methods bias the result itself also requires resolution: do results from weaker studies overestimate or underestimate the true effect? Some evidence in clinical medicine suggests an association between certain study quality measures and study outcomes, whereas evidence from other studies does not.3,4


Consumers of the medical literature also need transparent and complete reporting. Several expert panels have developed guidelines to facilitate good reporting (e.g.

  • Consolidated Standards of Reporting Trials [CONSORT],7

  • Transparent Reporting of Evaluations with Non-randomised Designs [TREND],8

  • Strengthening the Reporting of Observational Studies in Epidemiology [STROBE]9),

yet reporting quality remains suboptimal.10–12 Four studies of reporting quality in the medical education literature13–16 have identified deficiencies. A number of recent studies in clinical medicine have likewise identified room for improvement in adherence to reporting guidelines.11,12,17–22


Systematic reviews in medical education are beginning to evaluate study quality23,24 and a few studies25–28 have focused specifically on methodological quality.



Finally, at least three measures have been used in systematic reviews of medical education research to evaluate the methodological quality of disparate quantitative designs:

  • the Medical Education Research Study Quality Instrument (MERSQI),25,26

  • the Newcastle–Ottawa Scale24,30 and

  • the Best Evidence in Medical Education (BEME) global rating.23







METHODS


Study eligibility and selection


We included studies from a recent systematic review of Internet-based instruction.24,31 We defined Internet-based instruction as computer-assisted instruction that uses the Internet or a local intranet as the means of delivery.


Data extraction


The first step in determining reporting quality required the selection of a quality standard. We considered three different guidelines: the TREND,8 CONSORT for non-pharmacological treatments32 and STROBE9 statements.


We selected the STROBE statement 

  • because most of the studies in our sample were observational (hence many elements of CONSORT did not apply) and 

  • because the STROBE guidelines have now been endorsed by over 100 journals (http://www.strobe-statement.org). 

We used the ‘more informative abstract’ headings for coding abstract completeness.33


To extract data on methodological quality, we used a modification of the Newcastle–Ottawa Scale (m-NOS),24 the MERSQI25 and the BEME global rating.23 


We also extracted data on ethical issues (institutional review board approval and participant consent) and study conclusions (our interpretation and our impression of the authors’ interpretation of whether the study results favoured the study intervention, the comparison intervention or neither).




Data synthesis and analysis


We calculated inter-rater agreement on quality codes using the intraclass correlation coefficient (ICC) for a single rater.34 We determined frequency of presence for all STROBE elements and ethical issues. To enable correlation with other quantitative measures, we calculated a completeness of reporting index for each main section (Title/Abstract, Introduction, Methods, Results, Discussion) reflecting the percentage of elements present in that section.
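
As an illustration of the agreement statistic mentioned above, the sketch below computes a single-rater ICC under a one-way random-effects model, often written ICC(1,1). The text does not say which ICC form was used, so the model choice, the function name `icc_single_rater` and the data layout are assumptions made for this example only.

```python
from statistics import mean


def icc_single_rater(ratings):
    """One-way random-effects ICC for a single rater (assumed form, ICC(1,1)).

    `ratings` is a list of rows, one row per rated article; each row holds
    the codes given by every rater (same number of raters in every row).
    """
    n = len(ratings)      # number of rated articles
    k = len(ratings[0])   # number of raters per article
    grand_mean = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]

    # Between-article and within-article mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical codes from two raters who agree on every article.
perfect_agreement = icc_single_rater([[1, 1], [2, 2], [3, 3]])
```

With identical codes from both raters the within-article variance is zero and the coefficient equals 1; systematic disagreement pushes it towards or below zero.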


We reported individual frequencies or scores for each item on the m-NOS and MERSQI, and calculated mean total scores for the m-NOS, MERSQI and BEME scales.




RESULTS


Trial flow


Additional data were extracted from 133 of the 266 articles

We identified 266 articles reporting comparative studies of Internet-based instruction involving 32 928 learners. For reasons of feasibility, we randomly selected half (n = 133) for additional data extraction on reporting quality and previously uncoded methodological features.


Quality of reporting


Reports shorter than 500 words were excluded

Of the 133 articles, we excluded from reporting quality analyses three very short reports (< 500 words) with restricted journal requirements (e.g. no references permitted). We present reporting quality for the remaining 130 articles in Fig. 1; details are reported in Appendix S1.


Overall score

The overall reporting index (sum of the five individual reporting indices, maximum 500) was 253 ± 90, ranging from 39 to 486 for individual articles.
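
The index arithmetic can be sketched as follows: each section index is the percentage of that section's reporting elements found in the article, and the overall index is the sum of the five section indices. The element counts and the names `section_index` and `overall_reporting_index` are hypothetical; the real STROBE checklist has more items per section.

```python
SECTIONS = ("Title/Abstract", "Introduction", "Methods", "Results", "Discussion")


def section_index(elements_present):
    """Percentage (0-100) of a section's reporting elements that are present."""
    return 100.0 * sum(elements_present) / len(elements_present)


def overall_reporting_index(article):
    """Sum of the five section indices, so the maximum possible value is 500."""
    return sum(section_index(article[name]) for name in SECTIONS)


# Hypothetical coding of one article: True means the element was reported.
example_article = {
    "Title/Abstract": [True, False],              # 50%
    "Introduction": [True, True],                 # 100%
    "Methods": [True, False, False, True],        # 50%
    "Results": [False, False, True, False],       # 25%
    "Discussion": [True, False, False, False],    # 25%
}

example_index = overall_reporting_index(example_article)  # 250.0
```

Because every section contributes equally regardless of how many elements it contains, a short section such as the Introduction carries the same weight in the total as the Methods.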



Objectives / design / IRB / consent

    • Although 120 of 130 articles (92%) clearly described the Internet-based intervention

      • only 61 of 93 (66%) articles with a comparison arm clearly described the comparison intervention

      • Sixty-nine articles (53%) stated the study design and 

      • 22 (17%) reported sample size calculations

      • Seventy-five articles (58%) noted either institutional review board evaluation (56 studies, 43%) and/or

      • participant consent (57 studies, 44%).


Participants

    • Fifty-five studies (42%) reported the number of subjects eligible for participation

      • 103 (79%) reported follow-up rates and 

      • 17 (13%) provided a CONSORT-style flow diagram



Statistical significance

    • Although 114 articles (88%) reported p-values

      • only 82 (63%) reported both the mean and a measure of variance (e.g. SD or standard error of the mean) and 

      • only 11 (8%) reported the CI for the difference between means.



Analysis of strengths and limitations

    • Study limitations and strengths were infrequently acknowledged, 

      • with 66 articles (51%) commenting on sources of potential bias, 

      • 44 (34%) mentioning precision (i.e. adequacy of sample size), and 

      • 65 (50%) discussing the magnitude of effect or potential confounders

    • Even fewer articles (n = 29, 22%) interpreted study results in light of limitations.



Reporting quality was somewhat better for RCTs

The reporting quality of RCTs was somewhat higher than that of observational studies for most individual elements (Appendix S1). Reporting indices for all sections were significantly higher for RCTs than for observational studies (p < 0.001).



Reporting quality improved over time

Reporting quality improved over time (p = 0.002). The mean ± SD overall reporting index rose 

    • from 212 ± 81 for studies reported during 1996–2001 (n = 27) 

    • to 235 ± 82 for those reported in 2002–2004 (n = 35), 

    • 261 ± 84 for those reported in 2005–2006 (n = 42) and 

    • 307 ± 97 for those reported in 2007–2008 (n = 26).





Interpretation of study results


We coded our interpretation of the study results as favouring the study (Internet-based) intervention, favouring the comparison, or neutral, and also coded our impression of the authors’ interpretations. Where the two differed, the disagreements generally involved author interpretations of 

  • neutral results (our impression) as favouring the study intervention (n = 8), or of results 

  • favouring the comparison intervention (our impression) as neutral (n = 2) or 

  • favouring the study intervention (n = 1). 

  • In one instance we interpreted results as favouring the study intervention and the authors interpreted them as neutral.





Methodological quality


We rated the methodological quality of 133 articles using three previously described scales (Table 1).



Positive correlations among the three scales

We found high correlations (all p < 0.0001) between MERSQI scores and m-NOS (ρ = 0.73) and BEME (ρ = 0.62) scores, and moderate correlation between m-NOS and BEME (ρ = 0.57) scores.
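
A correlation between two quality scales can be sketched as below, assuming the reported coefficients are Spearman rank correlations (the text gives the symbol but does not name the method). The helper names and the example scores are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for four studies on two quality scales.
rho = spearman_rho([9.5, 11.0, 11.0, 14.5], [2, 3, 4, 6])
```

Ranking first makes the coefficient insensitive to the different score ranges of the scales, which is useful when instruments with different maxima are compared.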








Association between methodological quality and effect size


Using the 209 studies reporting knowledge outcomes (25 397 learners), we explored associations between methodological quality and ES in three ways.




Subgroup meta-analysis


Do methodological differences affect the results of a meta-analysis?

Firstly, to understand how methodological differences might affect the results of a meta-analysis, we performed meta-analyses on methodological quality subgroups. For controlled studies with no intervention, we found lower ESs for studies with two or more (versus one) groups and studies in which learners were not blinded to the study hypothesis. We also found lower quality associated with higher ESs for sample representativeness and selection of the comparison group, but 95% CIs overlapped. Media-comparative studies demonstrated a consistent association between lower quality and higher ESs for all features except sample representativeness, allocation concealment and participants blinded to study hypothesis, but differences were relatively small (< 0.2 SDs) and CIs overlapped substantially. By contrast, studies comparing two computer-based interventions showed higher ESs for all high-quality features except allocation concealment, although again CIs showed substantial overlap.
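
The subgroup comparison can be sketched as pooling ESs separately within each quality subgroup and checking whether the 95% CIs overlap. This is a simplified illustration using fixed-effect inverse-variance weights; the pooling model actually used in the review is not stated in this excerpt, and the function names and numbers are hypothetical.

```python
from math import sqrt


def pooled_effect(effects, variances):
    """Inverse-variance (fixed-effect) pooled ES with an approximate 95% CI.

    Returns a tuple (pooled, ci_low, ci_high).
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * es for w, es in zip(weights, effects)) / sum(weights)
    se = sqrt(1.0 / sum(weights))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se


def intervals_overlap(a, b):
    """True when two (pooled, ci_low, ci_high) results have overlapping CIs."""
    return a[1] <= b[2] and b[1] <= a[2]


# Hypothetical subgroups: ESs and their sampling variances.
high_quality = pooled_effect([0.2, 0.4], [0.01, 0.01])
low_quality = pooled_effect([0.4, 0.6], [0.01, 0.01])
cis_overlap = intervals_overlap(high_quality, low_quality)
```

Overlapping intervals, as in this made-up example, are the situation the text describes for most quality features: the point estimates differ, but the subgroups cannot be clearly separated.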



Meta-regression


Are study features associated with knowledge outcomes?

Secondly, we performed meta-regression to identify study features independently associated with knowledge outcomes (see details in Appendix S1). In the analysis of controlled studies with no intervention (n = 126), only the number of groups demonstrated a significant association: studies with two or more groups had a lower average ES than single-group studies (difference −0.35, 95% CI −0.61 to −0.08; p = 0.012).



Deviation from pooled estimate


Among the nine quality features examined (Table 2), only the number of groups demonstrated a statistically significant difference (difference from pooled ES was 0.83 for single-group studies and 0.49 for two-group studies; between-subgroup difference = 0.34, 95% CI 0.07–0.61; p = 0.013). This indicates that results from single-group pre/post-test studies differ from the pooled estimate (study results either greater than or less than the pooled ES) by about one-third SD more than results from two-group studies.
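
One way to read this analysis is as a comparison of average absolute deviations: for each subgroup, take the distance of every study's ES from that subgroup's pooled ES, average the distances, and compare the averages between subgroups. The sketch below follows that reading; the weighting scheme, the function names and the numbers are assumptions for illustration, not the review's actual computation.

```python
def pooled_es(effects, weights):
    """Weighted mean ES for one subgroup (weights are hypothetical here)."""
    return sum(w * es for w, es in zip(weights, effects)) / sum(weights)


def mean_deviation_from_pooled(effects, weights):
    """Average absolute distance of each study's ES from the subgroup's pooled ES."""
    pooled = pooled_es(effects, weights)
    return sum(abs(es - pooled) for es in effects) / len(effects)


# Hypothetical ESs with equal weights in each subgroup.
single_group = mean_deviation_from_pooled([0.2, 1.8], [1.0, 1.0])
two_group = mean_deviation_from_pooled([0.6, 1.4], [1.0, 1.0])
between_subgroup_difference = single_group - two_group
```

Taking absolute values matters here: a design whose results scatter widely above and below the pooled ES would show no signed bias on average, yet still deviates more than a design whose results cluster tightly.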





DISCUSSION


We found reporting quality to be generally suboptimal.


Discussion sections were particularly prone to reporting deficiencies: not only were the summary of results, study limitations and integration with other studies infrequently identified, but we disagreed with authors in the interpretation of study results 9% of the time and in these disagreements the study authors nearly always favoured the study intervention. We also found moderate to high correlation between three measures of methodological quality and found that studies with higher methodology scores had higher reporting indices.


The magnitude of difference between ESs in individual studies and the pooled estimate across studies was similar for high- and low-quality experimental study designs, except that one-group pre/post-test studies deviated more than two-group studies. Meta-regression found a similar effect for number of groups even after adjusting for other method features. In subgroup analyses we found no consistent pattern for no-intervention comparative studies and media-comparative studies; however, comparisons of two computer-based interventions nearly always revealed larger ESs for higher-quality studies.




Strengths and limitations




Comparison with other studies



Other studies in medical education have described suboptimal reporting quality focusing on the abstract,37 introduction15 and selected methods.13–16,38 By using the comprehensive STROBE framework, the present study expands on these, particularly on the reporting of results and subsequent discussion.


As has been reported in clinical medicine,11,12,22 it appears that reporting quality in medical education experimental research has improved over time. This may reflect increased author training or awareness, or changes in journal policies.




Implications


The association between reporting and methodological quality may simply mean that researchers capable of employing stronger methods have superior writing skills. It may also mean that superior reporting allows readers to discern more clearly a study’s methodological rigor.


Precisely which reporting elements are most important depends on the perceptions and purposes of the individual consumer. This variety of purposes underscores the importance of complete reporting, which, in turn, validates the need for guidelines that define essential reporting elements. Rote adherence to guidelines will not compensate for poor-quality research or inferior writing skills, but inclusion of the elements listed in guidelines such as the STROBE, CONSORT or TREND statements will enable a wide range of consumers to understand and apply the study results.


Guidelines help,39 but will be insufficient on their own.10,40,41

  • Hands-on editing has been shown to improve reporting quality.42–45

  • Rigorously enforced journal policies on required reporting elements such as human subjects protections also contribute.15

  • Ultimately, it will fall to reviewers and editors not only to raise the bar, but to help authors develop the skills they need to vault it.


High methodological and reporting quality does not guarantee valid interpretations and conclusions. The finding that our interpretations differed from those of some study authors suggests confirmation bias, the tendency to interpret results as favouring a more desirable conclusion. Nearly 20 years ago, Cohen and Dacanay reported nearly identical bias towards new technologies.46 More recently, Colliver and McGaghie noted over-interpretation of study results.47



How should researchers grade the quality of quantitative research in medical education? Although there is no reference standard with which to make comparison, the high correlation between the MERSQI and m-NOS scores suggests they may be superior to the global BEME score. Although the MERSQI and m-NOS cover similar domains, individual scale items address these domains differently.

  • The m-NOS entails more rater subjectivity, which enhances flexibility for different study designs but increases the risk of reviewer error or bias, as reflected in the generally higher rater agreement for the MERSQI.

  • The MERSQI has accumulated considerable evidence to support the validity of scores for summarising the quality of large numbers of published studies25 and this may confer an advantage for such applications.



It appears that single-group pre/post-test studies may overestimate the ES, as might be expected given the multiple validity threats from which this design suffers.1 Although our findings merit confirmation in other samples, the absence of other clear associations between study methods and ESs calls into question the conventional wisdom that better methods provide quantitative estimates closer to truth. Although we found little difference in ES between randomised and observational studies, only randomised designs permit a clear causal link between the intervention and the outcome. Nonetheless, good evidence can be accumulated using a variety of study methods. We believe that researchers should focus first on asking important research questions and then on minimising the threats to valid study interpretation, rather than embracing a specific research design.






Med Educ. 2011 Mar;45(3):227-38. doi: 10.1111/j.1365-2923.2010.03890.x.

Method and reporting quality in health professions education research: a systematic review.

Author information

1
Division of General Internal Medicine, College of Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA.

Abstract

CONTEXT:

Studies evaluating reporting quality in health professions education (HPE) research have demonstrated deficiencies, but none have used comprehensive reporting standards. Additionally, the relationship between study methods and effect size (ES) in HPE research is unknown.

OBJECTIVES:

This review aimed to evaluate, in a sample of experimental studies of Internet-based instruction, the quality of reporting, the relationship between reporting and methodological quality, and associations between ES and study methods.

METHODS:

We conducted a systematic search of databases including MEDLINE, Scopus, CINAHL, EMBASE and ERIC, for articles published during 1990-2008. Studies (in any language) quantifying the effect of Internet-based instruction in HPE compared with no intervention or other instruction were included. Working independently and in duplicate, we coded reporting quality using the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement, and coded study methods using a modified Newcastle-Ottawa Scale (m-NOS), the Medical Education Research Study Quality Instrument (MERSQI), and the Best Evidence in Medical Education (BEME) global scale.

RESULTS:

For reporting quality, articles scored a mean±standard deviation (SD) of 51±25% of STROBE elements for the Introduction, 58±20% for the Methods, 50±18% for the Results and 41±26% for the Discussion sections. We found positive associations (all p<0.0001) between reporting quality and MERSQI (ρ=0.64), m-NOS (ρ=0.57) and BEME (ρ=0.58) scores. We explored associations between study methods and knowledge ES by subtracting each study's ES from the pooled ES for studies using that method and comparing these differences between subgroups. Effect sizes in single-group pretest/post-test studies differed from the pooled estimate more than ESs in two-group studies (p=0.013). No difference was found between other study methods (yes/no: representative sample, comparison group from same community, randomised, allocation concealed, participants blinded, assessor blinded, objective assessment, high follow-up).

CONCLUSIONS:

Information is missing from all sections of reports of HPE experiments. Single-group pre-/post-test studies may overestimate ES compared with two-group designs. Other methodological variations did not bias study results in this sample.

PMID: 21299598

DOI: 10.1111/j.1365-2923.2010.03890.x
[Indexed for MEDLINE]

