Evaluating Reliability and Validity Evidence for Merrill's 2007 5 Star Instrument
Author(s): Cropper, Max Hale
Abstract: Merrill’s 2007 5 Star instrument, hereafter called the M-5 Star, is based on Merrill’s well-recognized First Principles of Instruction. However, the instrument has not been tested for reliability and validity. In a pilot study, Cropper’s version of the instrument (C-5 Star) showed some reliability and validity evidence, but the M-5 Star needed similar evidence. To address this gap in the literature, this study assesses the reliability and validity evidence for the M-5 Star. Raters were drawn from a graduate course in online course evaluation and asked to rate a sample (N = 6) of exclusively online university classes using the M-5 Star and three comparison instruments. The comparison instruments also purport to examine course quality but lack the M-5 Star’s emphasis on instructional strategies. Interrater reliability evidence for the M-5 Star and the comparison instruments was moderate to substantial (M-5 Star ICC = .56, p = .001; Texas IQ ICC = .43, p = .001; WebCT ICC = .75, p = .001; SREB ICC = .53, p = .001). However, the interrater reliability evidence is tentative because rater pair scores were averaged, biasing the scores toward agreement and inflating the ICC. Low correlations between the M-5 Star (the criterion) and the comparison measures provide divergent validity evidence that the M-5 Star measures a different core concept of quality in online classes. The M-5 Star’s correlation with Texas IQ was r = .39, p = .44 (r² = .15), with WebCT was r = .44, p = .38 (r² = .20), and with SREB was r = .43, p = .39 (r² = .19). In addition to the divergent validity analysis, a content validity index (CVI) analysis was undertaken using experts in the area of First Principles. Apart from a few rubric items, the experts’ CVI ratings support close alignment between the M-5 Star and Merrill’s First Principles of Instruction. Of the 63 individual M-5 Star items, 56 (89%) received high scores on Aiken’s CVI that were significant at the .10 level. Study limitations are discussed at length, along with directions for future research and the practical and scholarly significance of the study.
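As an illustration of the item-level content validity index mentioned in the abstract, the following is a minimal sketch of Aiken's V coefficient for a single rubric item. The function name `aiken_v`, the 1-to-4 rating scale, and the expert ratings are all hypothetical and are not taken from the study. The sketch also omits the significance test that the study applied at the .10 level.

```python
def aiken_v(ratings, lo, hi):
    """Aiken's V for one item: V = S / (n * (c - 1)).

    ratings: one integer rating per expert (hypothetical data)
    lo, hi:  lowest and highest points on the rating scale,
             so c - 1 equals hi - lo
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    if any(r < lo or r > hi for r in ratings):
        raise ValueError("every rating must lie within [lo, hi]")

    # S sums each expert's distance above the lowest scale point.
    s = sum(r - lo for r in ratings)
    return s / (len(ratings) * (hi - lo))


# Hypothetical example: five experts rate one item on a 1-4 relevance scale.
example_ratings = [4, 4, 3, 4, 4]
v = aiken_v(example_ratings, lo=1, hi=4)
print(f"Aiken's V = {v:.2f}")
```

V ranges from 0 (every expert chose the lowest scale point) to 1 (every expert chose the highest), so values near 1 indicate that the experts judged the item to be closely aligned with the construct.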