Table 1 shows the descriptive statistics and the results of the normality assessment. The mean item scores ranged from 3.35 (item 2) to 3.91 (item 7). The skewness and kurtosis values were low, indicating no evidence of departure from the normal distribution. All items were strongly related to the scale, since the corrected item-total correlation was 0.50 or higher for every item.
Unidimensionality, monotonicity, and local independence
Three assumptions must be tested before IRT can be applied [18]. First, unidimensionality was tested through principal component analysis (PCA), which clearly extracted one component with an eigenvalue of 3.82, explaining 54.57% of the variance (for more detail, see Table A1 and Fig. A1 in Additional file 1). Second, local independence was tested by checking the residual correlations between pairs of items using Yen’s Q3 statistic [20]. Chen and Thissen [21] suggest that local independence is questionable when a residual correlation exceeds 0.20. The results showed that the correlations between several item pairs were slightly above this threshold, with a maximum of 0.26 (see Table A2 in Additional file 1). Given that the main recommendation for preventing local dependence is careful instrument development and positive wording of the items, both of which are met in the case of the Understanding Society survey and the SWEMWBS, the items can be considered locally independent. Finally, all items were monotonically increasing, meaning that choosing a higher category on the response scale indicated a higher level of mental well-being. It can therefore be concluded that all three assumptions for IRT analysis (unidimensionality, monotonicity, and local independence) are met.
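The residual-correlation check described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function name `q3_flags`, the simulated residuals, and the random seed are assumptions made for the example, and in practice the expected scores would come from the fitted IRT model.

```python
import numpy as np

def q3_flags(observed, expected, threshold=0.20):
    """Correlate item residuals (Q3) and list item pairs above the threshold."""
    residuals = np.asarray(observed, dtype=float) - np.asarray(expected, dtype=float)
    q3 = np.corrcoef(residuals, rowvar=False)  # items are columns
    n_items = q3.shape[0]
    flagged = [
        (i, j, float(q3[i, j]))
        for i in range(n_items)
        for j in range(i + 1, n_items)
        if q3[i, j] > threshold
    ]
    return q3, flagged

# Hypothetical data: 5000 respondents, 4 items; items 0 and 1 share a nuisance factor.
rng = np.random.default_rng(0)
noise = rng.normal(size=(5000, 4))
shared = rng.normal(size=5000)
noise[:, 0] += shared
noise[:, 1] += shared
expected = np.full((5000, 4), 3.0)  # stand-in for model-implied item scores
observed = expected + noise

q3, flagged = q3_flags(observed, expected)
for i, j, r in flagged:
    print(f"items {i} and {j}: Q3 = {r:.2f}")
```

In this toy setup only the pair that shares the extra factor should be flagged; the 0.20 cut-off is the one cited in the text.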
IRT analysis
First, the fit of the GPCM and the GRM was compared. Three fit indices were used for evaluation: log-likelihood, Akaike information criterion (AIC), and Bayesian information criterion (BIC). The results showed that the GRM was preferable since its log-likelihood was higher (GRM = −64629.06 vs. GPCM = −65555.10) and its AIC (GRM = 129328.10 vs. GPCM = 131180.20) and BIC (GRM = 129573.40 vs. GPCM = 131425.50) were lower. The GRM was therefore applied in the next steps of the IRT analysis.
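The relationship between the fit indices can be made concrete with a short sketch. It assumes that each model has 35 free parameters (seven items, each with one discrimination and four threshold parameters); this count is inferred, not stated in the text, and the function names are illustrative.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion: -2 logL + 2k."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 logL + k ln(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Log-likelihoods reported in the text.
LOGLIK = {"GRM": -64629.06, "GPCM": -65555.10}

# Assumption: 7 items x (1 discrimination + 4 thresholds) = 35 parameters per model.
N_PARAMS = 7 * (1 + 4)

for model, loglik in LOGLIK.items():
    print(model, round(aic(loglik, N_PARAMS), 2))
```

Under this assumption the computed AIC values agree with the reported ones to within rounding. Because both models then carry the same penalty term, AIC and BIC must rank them in the same order as the log-likelihood.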
The discrimination parameter (a) and threshold parameters (b) estimated from the GRM appear in Table 2. All items discriminate very well: according to the guidelines by Baker [19], parameter a can be considered “high” for item 1 and “very high” for the other items. The highest value was 3.04 (item 5), and the lowest was 1.33 (item 1). These results are illustrated by the IIFs (Fig. 1), in which the curve for the least discriminating item, item 1, lies lowest and is the flattest. By contrast, the items with the largest values of parameter a carry more information and have curves with more pronounced variation in shape. The values of the discrimination parameter thus correspond to the informative contribution, i.e., the most discriminating items are also the most informative and vice versa.
Table 2 Discrimination and threshold parameters for SWEMWBS
Fig. 1 Item information functions (IIFs) for seven items of the SWEMWBS with a vertical line at θ = 0
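The link between discrimination and information can be sketched with the standard item information formula for the graded response model. The two a values below are the lowest and highest reported in the text, but the thresholds are hypothetical and shared by both items so that only the effect of a is visible. The logistic metric without a scaling constant is also an assumption.

```python
import math

def grm_item_information(theta, a, thresholds):
    """Item information for the graded response model (logistic metric)."""
    # Cumulative probabilities P(X >= k), padded with 1 and 0 at the ends.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p_cat = cum[k] - cum[k + 1]  # probability of this category
        slope = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        info += slope ** 2 / p_cat   # (dP/dtheta)^2 / P
    return info

# Hypothetical thresholds, not taken from Table 2.
THRESHOLDS = [-2.0, -1.0, 0.0, 1.0]

for theta in (-3, -2, -1, 0, 1, 2, 3):
    low = grm_item_information(theta, 1.33, THRESHOLDS)
    high = grm_item_information(theta, 3.04, THRESHOLDS)
    print(f"theta = {theta:+d}: a = 1.33 -> {low:.2f}, a = 3.04 -> {high:.2f}")
```

With identical thresholds, the item with the larger a yields a higher and more sharply varying information curve, which matches the pattern described for the IIFs.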
Four difficulty (threshold) parameters were estimated for each SWEMWBS item, 28 in total. Most (18) have a negative value and the remaining 10 a positive value, indicating that the scale is better able to measure and discriminate between respondents with negative values of the latent trait, i.e., low mental well-being. The values of parameter b ranged from −3.07 (item 7) to 1.91 (item 1). The interpretation of the values can be illustrated using item 1: the threshold b1 = −2.85 means that a respondent with a latent trait level of θ = −2.85 has a 50% chance of answering item 1 with category 2 or higher; a respondent with θ = −1.50 has a 50% chance of answering with categories 3 to 5 rather than categories 1 or 2; a respondent with θ = 0.05 has a 50% chance of answering with categories 4 or 5 rather than categories 1 to 3; and a respondent with θ = 1.91 has a 50% chance of choosing category 5 rather than categories 1 to 4. The results also showed differences in difficulty between categories on the response scale, both across and within items. Across items, respondents need different levels of the latent trait to select a particular category. For example, to have a 50% chance of choosing category 5, a respondent needs a latent trait value of 0.79 for item 7 but 1.91 for item 1.
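The 50% interpretation of the thresholds follows directly from the cumulative response function of the graded response model, as the sketch below shows for item 1. The parameter values are those reported in the text; the function name and the use of the logistic metric without a scaling constant are assumptions.

```python
import math

A_ITEM1 = 1.33                        # discrimination of item 1 (from the text)
B_ITEM1 = [-2.85, -1.50, 0.05, 1.91]  # thresholds b1..b4 of item 1 (from the text)

def prob_at_or_above(theta, a, b):
    """P(response in category k or higher | theta) for the threshold b below category k."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# At theta equal to a threshold, the chance of responding at or above it is one half.
for k, b in enumerate(B_ITEM1, start=2):
    p = prob_at_or_above(b, A_ITEM1, b)
    print(f"theta = {b:+.2f}: P(category >= {k}) = {p:.2f}")
```

Moving θ above a threshold pushes the corresponding probability above 0.5, and the discrimination parameter controls how quickly it does so.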
Differences within items are demonstrated by the unequal distances between adjacent thresholds on the response scale. For example, for item 2, the difference between thresholds b1 and b2 is −1.12, between b2 and b3 −1.42, and between b3 and b4 −1.86. From this perspective, item 5 showed the best functioning, with more evenly spaced differences of −1.13, −1.07, and −1.43. The results described above are also supported graphically by the ICCs (see Fig. 2).
Fig. 2 Item characteristic curves (ICCs) for each item of the SWEMWBS
The functioning of the scale as a whole is shown in Fig. 3. The TIF indicates that the scale functions very well, especially between −2.30 and 1.40 on the latent trait continuum, the range within which the standard error is also smallest. The figure also illustrates the earlier finding that the SWEMWBS performs better on the left side, i.e., at negative values of the latent trait.
Fig. 3 Test information function (TIF) and standard error for SWEMWBS
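The standard error is smallest where the TIF is highest because, in IRT, the standard error of the trait estimate is the reciprocal square root of the test information. The sketch below illustrates this with hypothetical information values that are not read from Fig. 3; the function name is illustrative.

```python
import math

def standard_error(information):
    """Standard error of the trait estimate implied by the test information."""
    if information <= 0:
        raise ValueError("information must be positive")
    return 1.0 / math.sqrt(information)

# Hypothetical TIF values, not taken from Fig. 3.
for info in (1.0, 4.0, 9.0):
    print(f"I(theta) = {info:.0f} -> SE = {standard_error(info):.2f}")
```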
The last part of the IRT analysis evaluated the functioning of the response scale using the CCCs for each SWEMWBS item (see Fig. 4). These curves show how well each response category performs, both in the context of the whole scale and, in particular, at the transitions between adjacent categories. Ideally, each category should be the most probable response in some part of the latent trait continuum, meaning that it should have a clear peak and not be overlapped by another category along its entire length. Relatedly, as the value of the latent trait increases, each higher category should become more probable than the category below it [22]. The slope (parameter a) and location (parameter b) of the curves are also important for interpretation. Specifically, the greater the slope, the higher the peak of the curve and the more rapidly the probability of selecting a given category changes along the latent trait continuum θ [23]. The results show that the response scales of all SWEMWBS items function well and that the number of response categories is adequate. In the CCCs of all items, each category has a clear peak, indicating that respondents are able to distinguish well between all five categories and use them appropriately. In terms of curve shape, items 4 and 5 perform best since their slopes are steeper, whereas item 1 performs slightly worse in this respect.
Fig. 4 Category characteristic curves (CCCs) for each item of the SWEMWBS
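The criterion that each category should be the most probable response somewhere on the latent trait can be checked numerically. The sketch below does this for item 1, using the discrimination and threshold values reported in the text. The function names, the grid from −4 to 4, and the logistic metric are assumptions made for illustration.

```python
import math

A_ITEM1 = 1.33                        # discrimination of item 1 (from the text)
B_ITEM1 = [-2.85, -1.50, 0.05, 1.91]  # thresholds b1..b4 of item 1 (from the text)

def category_probabilities(theta, a, thresholds):
    """Probabilities of categories 1..m+1 under the graded response model."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

def modal_categories(a, thresholds, lo=-4.0, hi=4.0, step=0.1):
    """Categories that are the most probable response somewhere on the grid."""
    modal = set()
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        probs = category_probabilities(lo + i * step, a, thresholds)
        modal.add(probs.index(max(probs)) + 1)  # categories are numbered from 1
    return modal

print(sorted(modal_categories(A_ITEM1, B_ITEM1)))
```

For item 1, all five categories are the most probable response somewhere on this grid, which is consistent with the description of the CCCs even for the least discriminating item.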
Reliability and criterion-related validity
Reliability was assessed using Cronbach’s alpha (α) and McDonald’s omega (ω), which were estimated at 0.858 and 0.857, respectively. To assess the criterion validity of the SWEMWBS, its correlations with other relevant measures were calculated. The results in Table 3 indicate very good criterion validity, as the correlations with all measures can be considered large (|r| ≥ 0.30) by the criterion of Gignac and Szodorai [24]. The largest negative correlation (−0.59) was found with the GHQ-12 scale, which is consistent with previous studies (e.g., [8, 12, 25, 26]), because this instrument measures psychological distress, the opposite of mental well-being. By contrast, a high positive correlation was found with overall life satisfaction (0.47). The lowest positive correlation, 0.31, was identified for the subjective assessment of general health. This positive relationship has also been supported in other studies (e.g., [12, 26]). It should be noted, however, that although the correlations are high, these are not identical concepts, and it is necessary to distinguish between mental well-being and other concepts such as life satisfaction, subjective health, positive and negative emotions, and distress, as these are usually only partial components of well-being.
Table 3 Correlations of the SWEMWBS with other relevant measures
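For readers who wish to see how the Cronbach’s alpha coefficient reported in this section is computed, a minimal sketch follows. The response matrix is hypothetical and is not survey data, and the function name is illustrative. McDonald’s omega requires a fitted factor model and is not covered here.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the sum score
    return n_items / (n_items - 1) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical responses (4 respondents, 3 items), not survey data.
toy = [[2, 3, 3],
       [4, 4, 5],
       [3, 3, 4],
       [5, 4, 5]]
print(round(cronbach_alpha(toy), 3))
```

Alpha approaches 1 when the items rise and fall together across respondents and drops to 0 when they are uncorrelated.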