The DREEM, part 2: psychometric properties in an osteopathic student population (BMC Med Educ, 2014)

Brett Vaughan1,2*†, Jane Mulcahy1† and Patrick McLaughlin1,2† 





Background


The Dundee Ready Educational Environment Measure (DREEM) was developed by Roff et al. [1].



Additional studies using the DREEM have been published since this review [3-13]; however, other than Hammond et al. [3], none investigated the psychometric properties of the measure.


A number of authors have investigated the factor structure of the DREEM and have failed to reproduce the 5-factor structure [3,15-17]. Hammond et al. [3] highlighted a number of issues with the psychometric properties of the DREEM. Their research indicated that in order to produce a fit for the original 5-factor model of the DREEM, 17 of the 50 items had to be removed. Yusoff [18] produced five models of the DREEM using confirmatory factor analysis (CFA) (plus one one-factor model and an analysis of the original five-factor model) in a Malaysian medical student population. Model fit was only achieved with the original five-factor model with 17 items. An exploratory factor analysis conducted by Jakobsson et al. [17] revealed solutions with between five and nine factors in their study of a Swedish version of the DREEM. These authors [3,17,18] have concluded that the internal consistency and construct validity of the measure are not stable, and that the model itself may need to be revised.



Methods


Study design


Participants


Data collection


Data analysis


Data were entered into SPSS for Mac (IBM Corp, USA) for analysis. A flow diagram outlining the data analysis process is shown in Figure 1. The data were transformed and a CFA was performed on the data set, first with the 5-factor structure identified by Roff et al. [19] and then with the 5-factor structure model proposed by Hammond et al. [3]. The SPSS data file was transferred to AMOS Version 21 (IBM Corp) for the CFA calculation using the Maximum Likelihood Estimation method. CFA investigates the fit of the data to the constructed model, and presents relationships between the data in the model and estimations of error. In the CFA a range of model fit statistics are generated to describe how the data fit the model being tested. Readers are encouraged to access Brown [20] and Schreiber et al. [21], who present further detail about the CFA process and the fit statistics. As the data were not normally distributed, a bootstrapping procedure was applied for each of the two models, with 1000 iterations of the data generated. No changes to either of the models were made based on the results of the CFA.
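The resampling idea behind a bootstrap with 1000 iterations can be sketched in a few lines of Python. This is a generic illustration only, not the procedure AMOS applies to a structural model: the function names, the example responses and the choice of the mean as the bootstrapped statistic are all assumptions made for the sketch.

```python
import random
from statistics import mean

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Mean of each bootstrap resample (same size as the sample, drawn with replacement)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple percentile interval taken from the sorted bootstrap statistics."""
    ordered = sorted(values)
    lo_idx = int(lower * (len(ordered) - 1))
    hi_idx = int(upper * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]

# Hypothetical responses to one item scored 0-4 (not data from the study)
scores = [2, 3, 3, 4, 0, 1, 4, 4, 2, 3]
means = bootstrap_means(scores)
low, high = percentile_interval(means)
```

Because each resample is drawn from the observed responses themselves, the interval does not rely on the data being normally distributed, which is the usual motivation for bootstrapping skewed questionnaire data.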



Given the authors of the DREEM have recommended calculating a total score for the scale, a Rasch analysis is appropriate [22]. Rasch analysis provides a mathematical model of the data that is independent of the sample, rather than the sample dependent calculation used in classical test theory [23,24]. In this analysis, data are fitted to the Rasch mathematical model as closely as possible [23]. The data were converted in SPSS to an ASCII format and imported into the RUMM2030 (RUMM Laboratory, Australia) program for Rasch analysis, where the polytomous Partial Credit Model was used.
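As a rough illustration of the polytomous Partial Credit Model, the sketch below computes the probability of each response category for one item, given a person location and a set of item thresholds in logits. The threshold values and the function name are hypothetical and are not estimates from the present data; RUMM2030's own estimation routines are not reproduced here.

```python
import math

def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the Partial Credit Model.

    theta      -- person location in logits
    thresholds -- item thresholds in logits; m thresholds give m + 1 categories
    """
    # Numerator exponent for category k is the sum of (theta - threshold_j)
    # over the first k thresholds; category 0 has an empty sum of 0.
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    peak = max(exponents)  # subtracted for numerical stability
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical, ordered thresholds for a five-category item scored 0-4
thresholds = [-1.5, -0.5, 0.5, 1.5]
probs = pcm_probabilities(0.0, thresholds)
at_first_threshold = pcm_probabilities(-1.5, thresholds)
```

When the person location equals a threshold, the two adjacent categories are equally likely, which is the property used when thresholds are inspected for ordering.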


The RUMM2030 program produces three model fit statistics in order to determine the fit to the model. The first is an item-trait chi-square (χ2) statistic demonstrating the invariance across the trait being measured. A statistically significant Bonferroni-adjusted χ2 indicates misfit to the Rasch model. The other two statistics relate to the item-person interaction, where the data are transformed to approximate a z distribution. A fit to the Rasch model is indicated by a mean of 0 and a standard deviation (SD) of 1. Further, individual item and person statistics are presented as residuals and a χ2 statistic. Residual SDs greater than ±2.5 and/or significant Bonferroni-adjusted χ2 statistics indicate poor item fit, and residual SDs greater than ±2.5 indicate a poorly fitting person(s). Person fit issues can produce misfitting items [25]. Internal consistency of the scale is calculated using the Person Separation Index (PSI), which is the ratio of true variance to observed variance using the logit scores [22]. The minimum PSI is 0.70 for group use, which indicates acceptable internal consistency [22].
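The ratio of true to observed variance can be sketched as follows, assuming the commonly used form in which true variance is estimated as the observed variance of the person estimates minus the mean squared standard error. The function name and the example values are hypothetical, and RUMM2030 may handle details such as extreme scores differently.

```python
from statistics import mean, variance

def person_separation_index(person_estimates, standard_errors):
    """PSI as (observed variance - mean error variance) / observed variance."""
    observed = variance(person_estimates)                  # variance of logit estimates
    error = mean([se ** 2 for se in standard_errors])      # mean squared standard error
    return (observed - error) / observed

# Hypothetical person locations (logits) and their standard errors
estimates = [-1.0, 0.0, 1.0]
errors = [0.5, 0.5, 0.5]
psi = person_separation_index(estimates, errors)
acceptable_for_group_use = psi >= 0.70
```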


Examination of the fit of each item to the Rasch model is undertaken by observing the item thresholds and category probability curves. The threshold is the point at which there is an equal probability of the respondent selecting one option over another, in order (i.e. 2 or 3 on the item scale, not 1 or 3). RUMM2030 provides two graphical approaches for observation of the thresholds, a threshold map and the category probability curve. Disordered thresholds can exist where respondents are not selecting the responses in an ordered fashion. This can sometimes be resolved by rescoring the item in order to collapse one or more scale response options into one score. An example of this rescoring is where the original scale scoring was 1, 2, 3, 4, 5 with a disordered threshold; the item may be rescored as 1, 1, 2, 3, 4 for example. To resolve the disordering, RUMM2030 requires that scale options are coded sequentially.
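The rescoring described above amounts to a simple recoding of response categories. A minimal sketch, using the 1, 1, 2, 3, 4 example from the text, is given below; the function name is hypothetical, and the check for sequential coding mirrors the requirement stated for RUMM2030 rather than reproducing any RUMM2030 feature.

```python
def collapse_categories(responses, mapping):
    """Recode responses with `mapping`, insisting that the new codes are sequential."""
    new_codes = sorted(set(mapping.values()))
    expected = list(range(new_codes[0], new_codes[0] + len(new_codes)))
    if new_codes != expected:
        raise ValueError("rescored categories must be coded sequentially")
    return [mapping[r] for r in responses]

# Original scoring 1, 2, 3, 4, 5 rescored as 1, 1, 2, 3, 4:
# the two lowest categories are collapsed into a single score.
mapping = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4}
rescored = collapse_categories([1, 2, 3, 4, 5, 2, 1], mapping)
```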


Person fit issues are examined using the fit residual SD. If the SD is between −2.5 and +2.5 then the person’s response to the scale is deemed to fit the Rasch model. Generally, persons whose responses are outside of this range are removed from the analysis.
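The screening rule can be expressed as a small filter. The sketch assumes the fit residuals have already been exported from the Rasch software; the person identifiers and values are hypothetical, and treating the ±2.5 limits as inclusive is an assumption.

```python
def split_by_person_fit(fit_residuals, limit=2.5):
    """Separate persons whose fit residual lies within +/- limit from those outside it."""
    fitting = {pid: r for pid, r in fit_residuals.items() if -limit <= r <= limit}
    misfitting = {pid: r for pid, r in fit_residuals.items() if pid not in fitting}
    return fitting, misfitting

# Hypothetical person fit residuals keyed by respondent ID
residuals = {"p01": 0.4, "p02": -2.9, "p03": 2.5, "p04": 3.1}
kept, removed = split_by_person_fit(residuals)
```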


Once any person and item issues have been resolved, differential item functioning (DIF) is examined. DIF is where the response to an item on the scale is consistently dependent upon a factor outside of that being measured on the scale (e.g. age, gender). In RUMM2030, DIF can be viewed graphically and in table form. In the table, a Bonferroni-adjusted statistically significant p-value indicates a significant main effect for that factor. RUMM2030 provides the opportunity to split items affected by DIF in order to score the item based on the factor affecting the item [25]. This may produce different subscale or total scale scores. Where DIF is undesirable, the item may need to be removed from the scale.
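A Bonferroni adjustment divides the nominal significance level by the number of tests performed. The sketch below applies that rule to a set of per-item p-values; the item labels, the p-values and the nominal level of 0.05 are assumptions for illustration and are not results from the study.

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Return the adjusted threshold and, for each item, whether its p-value falls below it."""
    threshold = alpha / len(p_values)  # Bonferroni-adjusted significance level
    flags = {item: p < threshold for item, p in p_values.items()}
    return threshold, flags

# Hypothetical main-effect p-values for one person factor across five items
p_values = {"item 2": 0.0004, "item 6": 0.03, "item 10": 0.2, "item 15": 0.008, "item 17": 0.6}
threshold, flags = bonferroni_flags(p_values)
```

With five tests the adjusted level is 0.05 / 5 = 0.01, so a p-value of 0.03 that would be significant on its own is no longer flagged.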


Residual correlations are then calculated to observe whether there is local dependency. Local dependency is where one item on the scale correlates with another, inflating the PSI. In RUMM2030, items that have a correlation of 0.20 or more are examined. Where there is a substantial change in the PSI (often a decrease), removal of one of the items is often required. When all scale issues have been resolved, a principal components analysis is undertaken to assess the unidimensionality of the scale. Unidimensionality is an underlying assumption of the Rasch model [22,26]. Performing a paired t-test on the items loading on the first factor (or Rasch factor) allows for the examination of whether the person estimate for the first factor differs from that of all of the items combined. When the person estimate is the same for the first factor and all scale items, the scale is determined to be unidimensional. Unidimensionality is a desirable outcome for scales of this type as it indicates that the scale is measuring a single underlying construct. Tennant & Conaghan [22] provide an overview of testing for dimensionality.
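The local dependency check can be sketched as a pairwise correlation of item residuals with the 0.20 cut-off mentioned above. The residual values and item labels are hypothetical, and the Pearson correlation is written out by hand so that the example needs only the standard library.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def flag_dependent_pairs(residuals, cutoff=0.20):
    """Item pairs whose residual correlation is at or above the cut-off."""
    flagged = []
    for a, b in combinations(sorted(residuals), 2):
        r = pearson(residuals[a], residuals[b])
        if r >= cutoff:
            flagged.append((a, b, r))
    return flagged

# Hypothetical standardised residuals for three items across five persons
residuals = {
    "item_a": [0.5, -0.2, 1.1, -0.9, 0.3],
    "item_b": [0.6, -0.1, 0.9, -1.0, 0.2],
    "item_c": [-0.4, 0.8, -0.3, 0.1, -0.6],
}
flagged = flag_dependent_pairs(residuals)
```

A flagged pair is only a candidate for removal; as described above, the decision rests on how the PSI changes when one of the two items is dropped.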



Results


Table 1.



Overall scale


Statistics for the CFA for both models are presented in Table 2. The data from the present study did not fit either model for any of the fit statistics. The path models generated by AMOS are at Additional file 1 (Roff et al. [19] scale) and Additional file 2 (Hammond et al. [3] scale).



The data did not fit the Rasch model as demonstrated by the statistically significant χ2 value (p < 0.0001). The PSI (0.922) indicated internal consistency of the DREEM. The standard deviation fit residuals for both items (1.86) and persons (1.93) were greater than 1.5 indicating that both the DREEM items and person responses did not fit the Rasch model. Poor fit residuals (>2.5) were noted for items 9, 7, 19, 27, 28 and 50, along with statistically significant χ2 values for items 16, 25, 27, 28 and 35, indicating a poor fit of these items to the Rasch model. Disordered thresholds were observed for items 1, 2, 5–7, 12–16, 18–24, 27, 28, 33–35, 38, 40–45, 47 and 49. Forty-four persons also failed to fit the Rasch model. Differential item functioning was analysed for each item. Age and receiving a government allowance did not impact on any items. Gender (item 45), employment (item 31) and year level (items 2, 6, 10, 15, 17, 18, 20, 22, 24, 26, 28, 31, 38, 40, 50) demonstrated DIF. Six separate Rasch analyses failed to produce a satisfactory unidimensional model fit.


Subscale Rasch analysis


Perception of teaching


The item-trait interaction was statistically significant (p = 0.000043), suggesting misfit between the data and the Rasch model. The PSI was 0.853, indicating acceptable internal consistency. Person fit was acceptable (fit residual SD = 1.30); however, the item fit residual SD was 1.69, beyond the recommended cut-off of 1.50.


Four separate analyses were conducted that included recoding of item response scales, and deletion of misfitting persons and items. A model fit was achieved through the deletion of 5 of 12 items (1, 7, 25, 38, 44) and the removal of data for 25 of 245 misfitting persons. No recoding of the remaining scale items was necessary. The model fit statistics were χ2 =0.694, PSI =0.819, item fit residual SD=1.25, and person fit residual SD=0.94. The remaining items were 13, 16, 20, 22, 24, 47, and 48 (Table 3). No threshold disordering was present and the residual correlations did not indicate any local dependency. Age, gender, employment status, and receiving a government allowance did not demonstrate DIF. Items 22 and 24 demonstrated DIF for year level. PCA demonstrated a unidimensional scale.


Perception of Teachers


Academic self-perception


Perception of Atmosphere


Social self-perception




Rasch-analysed DREEM


The 23 items from the Rasch-analysed subscales were then reanalysed as one whole scale using the Rasch model, in order to determine if the modified 5-factor DREEM was unidimensional. The scale fit statistics were: chi-square (χ2 < 0.001), PSI (0.872), item fit residual (SD = 1.79) and person fit residual (SD = 1.51). These statistics indicate a poor fit to the Rasch model. Items 26, 46 and 50 demonstrated a poor fit and disordered thresholds were observed for items 13, 16, 20, 21, 22, 24, 34 and 47. Twenty-three misfitting persons were also identified.


Four Rasch models were generated. Ten items (3, 4, 10, 16, 20, 26, 31, 46, 47, 50) were deleted, as were data for 28 persons. Rescoring of item 21 was required in order to resolve the threshold disordering: rather than being scored as 0, 1, 2, 3, 4, the item was scored 0, 0, 1, 2 and 3 (Figure 2). DIF for age was observed for item 37 and DIF for year level was observed for items 22, 24, and 37. Item 37 was subsequently deleted and this also resolved the DIF for item 22. The scale fit statistics following these modifications were: chi-square (χ2 = 0.421), PSI (0.859), item fit residual (SD = 1.04) and person fit residual (SD = 1.00). No residual correlations were observed and the PCA indicated the scale was unidimensional. The 12-item version of the DREEM can be summed to produce a total score for the scale. The threshold map for the revised scale is at Figure 3. The person-item threshold is displayed at Figure 4 and demonstrates a mean person location of 1.578.






Discussion


This study has presented the psychometric properties of the DREEM using both classical test theory and item response theory. Data from the student population in the osteopathy program at VU fitted neither the a-priori 5-factor structure of the DREEM proposed by Roff et al. [1] nor the factor structure proposed by Hammond et al. [3]. Hammond et al. [3] noted that 17 items had fit indices less than 0.7 and should be removed. However, their modelling suggested that 18 items had indices less than 0.7, and the model at Additional file 2 represents the deletion of 18 items. Significant changes to both models would be required in order to develop a model that fits the current data. Yusoff [18] demonstrated, in a sample of Malaysian medical students, that a shortened version (17 items) of the DREEM was required in order to fit the original five-factor structure. These results support the need for further analysis of the structure of the DREEM using item response theory, as undertaken in the present study.


Rasch analysis provides an opportunity to analyse a scale independently of the responses and can play a part in refining a scale to enhance its psychometric value. Data are fitted to the Rasch mathematical model rather than to an a-priori structure (i.e. CFA) or to a factor structure identified through an exploratory factor analysis. Data from the DREEM were entered into the Rasch model and no suitable model could be identified for the full 50-item scale. Hammond et al. [3] suggested that the DREEM may be a one-factor scale. The lack of fit to the Rasch model suggests the 50-item DREEM is not unidimensional and is possibly not measuring the underlying a-priori construct of educational environment. This result also suggests that summing the result of each item into a total score for the DREEM may be problematic and not sound practice [27].



The PSI for the 50-item version of the DREEM was 0.922, indicating that the scale is internally consistent. This is in agreement with a range of authors who have reported Cronbach’s alpha scores of 0.75 [28], 0.87 [5], 0.89 [29], 0.90 [6,30], 0.912 [31], and 0.93 [7,18,32,33]. This variability in alpha scores demonstrates the sample-dependent nature of the statistic and supports the need for authors to continue to investigate the psychometrics of the DREEM. Additionally, the alpha scores over 0.90 suggest that there may be redundancy in the DREEM items, as items that correlate strongly with each other can inflate the score.
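For readers unfamiliar with how Cronbach’s alpha is obtained, a minimal sketch of the standard formula is given below. The responses are hypothetical and the function name is an assumption; the example simply shows how perfectly correlated items drive alpha to its maximum, which is the redundancy concern raised above.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of items, each a list of scores (one per respondent)."""
    k = len(item_scores)
    item_variance_sum = sum(variance(item) for item in item_scores)
    totals = [sum(person) for person in zip(*item_scores)]  # total score per respondent
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Two hypothetical items answered by four respondents
alpha_partial = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])
# Two identical items: complete redundancy
alpha_redundant = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]])
```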



In order to establish whether there were multiple dimensions to the DREEM, Rasch analyses were conducted for the items on each of the subscales identified by Roff et al. [19]. Rasch model fit was achieved for each of the 5 subscales with varying degrees of quality of fit. The strongest fit was the Perception of Teaching subscale, where the PSI was over 0.80. This is consistent with previous research where the Cronbach’s alpha score for this subscale is often strong. For example, Hammond et al. [3] and De Oliveira Filho et al. [34] reported alpha scores of 0.80 and 0.82 respectively in medical student populations, although Ostapczuk et al. [9] reported an alpha score of 0.70 in a German dental student population. In the current study the refinement of this subscale has not significantly impacted internal consistency and it would appear that the remaining 7 items provide a measure of students’ perception of teaching. As a single subscale, it is unidimensional and the responses to each item can be summed to create a total score.


In the current study there were some instances of items being affected by year level in the course. Items 22 (The teaching is sufficiently concerned to develop my confidence) and 24 (The teaching time is put to good use) both demonstrated DIF for year level. These two items, along with a number of other items, exhibited different response patterns between year 2 students and all other year levels. In part 1 of this study, it was identified that there were a number of issues with the students’ perception of the osteopathy program in year 2. This difference in perception may be the reason for the presence of DIF with these items, and future studies should investigate whether student year level affects these items.


A number of issues were identified that related to the items themselves and the way that participants completing the measure responded to them. It is possible that the issue may lie in the wording of the item, or the use of a ‘neutral’ response category. Negatively worded or phrased items are potentially problematic [35], although they are used to avoid systematic response bias issues.


The phraseology can mean that the interpretation of the item varies from person to person in a non-uniform manner and can impact on the psychometrics of a scale, both in CTT and IRT. In the present study, a number of negatively phrased items were retained in the brief version of the measure. An example is item 11 (The teaching is too teacher-centred) in the Perception of Teaching subscale. Conversely, item 9 (The teachers are authoritarian) was removed. Students possibly perceived this to be quite a strong statement and responded to the item in such a way that it did not fit the expected score for that item.


It has been previously suggested that there may be issues with neutral response options in self-report measures, and some have counselled against this method of scoring [36,37]. Kulas et al. [38] have suggested this is particularly so if the respondents do not perceive the response category to be a true mid-point of the statements on the scale. These authors suggested the term ‘dumping ground’ for this type of midpoint, as respondents are likely to use it when they are unsure of how to respond to the item.


In the current study it is possible that a number of items were removed during the Rasch analysis because it was not possible to rescore them. Responses can only be collapsed together to rescore the item where they are similar to an adjacent response option [37]. For example, in the DREEM the ‘strongly disagree’ and ‘disagree’ responses can be collapsed together, as they both represent disagreement. It is not possible to collapse another response with the neutral response category. Item 21 in the present study provides an example of this rescoring and the effect is demonstrated at Figure 2.


Five-factor DREEM


The present study has developed a modified version of the DREEM based on individual Rasch analyses of the 5 subscales (Table 3). Each subscale was analysed to determine if the items were measuring the construct of each factor. Fit to the Rasch model was achieved for all 5 subscales, although the PSI for 4 out of the 5 subscales was below the acceptable level of 0.70 [22]. Only the Perception of Teaching subscale achieved an acceptable PSI. The authors suggest caution should be applied if the original version of the DREEM (50 items) is to be used, given the low PSI values for the majority of the subscales. These low PSI values highlight the difference between the sample-dependent nature of Cronbach’s alpha and the sample-independent PSI.


If subsequent studies are to use this 5-factor version of the DREEM, a total score should not be generated, given that it consists of 5 independent unidimensional scales. However, the items within each subscale can be summed. As in the present study, the studies by Hammond et al. [3] and Yusoff [18] removed a substantial number of items from the DREEM in order to develop a psychometrically sound scale.


It is important to review the items that have been removed, as

    • they may investigate important aspects of the educational climate, and could be reworded;

    • or the scoring could be adapted for all items, to remove the neutral response category, or the items could be collapsed with other similar items.


Although negatively worded items have been reported to be problematic, there are items in the Rasch-analysed subscales that are still worded in this manner.


‘Short-form’ DREEM


Model fit was achieved for a 12-item scale that was unidimensional. Item 24 (The teaching time is put to good use) demonstrated DIF for age with those between 18–20 years of age. There is very little overlap with the 17-item scale developed by Yusoff [18], with only items 22, 24 and 30 appearing in both the short-form version of the DREEM developed in the present study and that developed by Yusoff. A possible reason for this difference is the use of CTT in the Yusoff [18] study and the use of IRT in the present study. Further work to validate the 17-item scale from the Yusoff study is required.


The 12-item version of the DREEM developed in the present study could potentially be used as a ‘short-form’ version given that it contains items from each of the original DREEM subscales, except Social self-perception. The advantage of such a scale is the potential acceptability to respondents and efficiency of administration. This revised version of the DREEM can be summed to produce a total score for the entire scale given its unidimensional nature [39]. Validation of the short-form version is required before it is used as a valid and reliable measure to evaluate the learning environment in health science student populations. Preferably this should be undertaken with additional student samples, through comparison with responses to the full DREEM [26].


Conclusions


Two versions of the DREEM were produced with the Rasch analysis. One version is based on the 5-factor scale proposed by the original authors and the other is a 12-item ‘short-form’ version.


Those institutions that use the DREEM are encouraged to investigate and report the psychometric properties of the measure in their own populations. Caution is also advised when calculating a total score for the 50-item DREEM, given that the results of the present study suggest it is not measuring the single underlying construct of educational environment.







BMC Med Educ. 2014 May 20;14:100. doi: 10.1186/1472-6920-14-100.

The DREEM, part 2: psychometric properties in an osteopathic student population.

Author information

1
Discipline of Osteopathic Medicine, College of Health & Biomedicine, Victoria University, Melbourne, Australia. brett.vaughan@vu.edu.au.

Abstract

BACKGROUND:

The Dundee Ready Educational Environment Measure (DREEM) is widely used to assess the educational environment in health professional education programs. A number of authors have identified issues with the psychometric properties of the DREEM. Part 1 of this series of papers presented the quantitative data obtained from the DREEM in the context of an Australian osteopathy program. The present study used both classical test theory and item response theory to investigate the DREEM psychometric properties in an osteopathy student population.

METHODS:

Students in the osteopathy program at Victoria University (Melbourne, Australia) were invited to complete the DREEM and a demographic questionnaire at the end of the 2013 teaching year (October 2013). Data were analysed using both classical test theory (confirmatory factor analysis) and item response theory (Rasch analysis).

RESULTS:

Confirmatory factor analysis did not demonstrate model fit for the original 5-factor DREEM subscale structure. Rasch analysis failed to identify a unidimensional model fit for the 50-item scale; however, model fit was achieved for each of the 5 subscales independently. A 12-item version of the DREEM was developed that demonstrated good fit to the Rasch model; however, there may be an issue with the targeting of this scale given that the mean item-person location was greater than 1.

CONCLUSIONS:

Given that the full 50-item scale is not unidimensional, those using the DREEM should avoid calculating a total score for the scale. The 12-item 'short-form' of the DREEM warrants further investigation, as does the subscale structure. To confirm the reliability of the DREEM as a measure to evaluate the appropriateness of the educational environment of health professionals, further work is required to establish the psychometric properties of the DREEM with a range of student populations.

PMID: 24884704

PMCID: PMC4050100

DOI: 10.1186/1472-6920-14-100

