Psychometric Properties of Attention Measures in Young Children with Neurofibromatosis Type 1: Preliminary Findings

Although it has been demonstrated that young children with NF1 have attention and executive difficulties (Casnar & Klein-Tasman, 2017; Templer et al., 2013), the psychometric properties of the tools used to measure these domains have not been established with this population. These data are necessary given the high rate of attention problems in the NF1 population (Beaussart et al., 2018; Casnar & Klein-Tasman, 2017; Plasschaert et al., 2016), differences in clinical features between children with NF1 and those with idiopathic ADHD (Huijbregts, 2012), and the call for these data to inform clinical trials and developmental research (Klein-Tasman et al., 2021; Walsh et al., 2016). In this study, we reported preliminary data on the reliability and validity of attention measures, including a performance-based task (DAS-II DF), several computerized measures (Cogstate, NIH Toolbox, and K-CPT 2), and two parent-report measures (BRIEF and Conners Inattention/Hyperactivity Scale) in a small sample of young children with NF1.

Summary of Preliminary Psychometric Properties

Cogstate

While the completion rate of the Cogstate Identification task (a measure of attention) was high, almost half of the sample did not pass a validity integrity check. The Identification task demonstrated poor agreement and moderate consistency across time points. In terms of validity, there was generally more support: the Identification task had some associations with the other computerized measures of attention, but minimal relations with parent report of attention and executive function. Importantly, the task was more strongly related to general intellectual abilities than to parent-reported attention and executive behavioral concerns. Thus, when using this task, one must consider the effect that intellectual functioning has on performance. The Identification task did not yield significant practice effects.

NIH Toolbox

The NIH Toolbox DCCS, Flanker, and LSWM tasks were examined in the present study. The children in our sample were generally able to complete the DCCS successfully, and the task did not show significant practice effects. However, performance on the DCCS differed considerably between time 1 and time 2 in terms of both agreement (ICC) and consistency (Pearson r). Clinicians and researchers should use this measure longitudinally with caution. There was considerable support for convergent validity of the DCCS, as it was generally related to other computerized measures and parent report. However, there was weak evidence of discriminant validity for this measure: the DCCS was more strongly related to general intellectual abilities and fund of vocabulary knowledge than to many of the attention and executive measures.
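The distinction between consistency (Pearson r) and agreement (ICC) that recurs throughout these findings can be illustrated with a short sketch. The scores below are invented for illustration only (they are not study data): a uniform practice gain preserves every child's rank order, so Pearson r stays at 1.0, while ICC(A,1), which penalizes systematic shifts between administrations, drops well below it.

```python
# Illustrative only: hypothetical test-retest scores, not study data.

def pearson_r(x, y):
    """Pearson correlation: sensitive to rank-order consistency only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def icc_a1(x, y):
    """ICC(A,1): two-way random effects, absolute agreement, single measures."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    subj_means = [(a + b) / 2 for a, b in zip(x, y)]
    rater_means = [sum(x) / n, sum(y) / n]
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_rater = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((v - grand) ** 2 for v in x + y)
    ss_err = ss_total - ss_subj - ss_rater
    ms_subj = ss_subj / (n - 1)
    ms_rater = ss_rater / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_subj - ms_err) / (
        ms_subj + (k - 1) * ms_err + k * (ms_rater - ms_err) / n
    )

time1 = [85, 90, 95, 100, 105, 110]
time2 = [s + 10 for s in time1]  # uniform 10-point gain at retest

print(round(pearson_r(time1, time2), 3))  # 1.0: rank order fully preserved
print(round(icc_a1(time1, time2), 3))     # 0.636: absolute agreement suffers
```

This is why a task can look consistent (children keep their relative standing) yet show poor agreement (scores shift systematically between administrations), as was observed here for the DCCS.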

Our sample had a high completion rate for the Flanker, and the task did not show significant practice effects. This measure demonstrated acceptable reliability in terms of both consistency (Pearson r) and agreement (ICC). Although the Flanker demonstrated evidence for convergent validity with other computerized measures, it had minimal relations with parent-reported attention, and the pattern of associations indicated that this task was highly related to general intellectual abilities, more so than to measures of attention.

Many of the young children in the present sample had difficulty with the LSWM task, as evidenced by the low completion rate. On this task, children first had to pass practice trials in which they ordered animals by size. Many children in our sample were unable to do so, and thus no data from this task were generated for almost half of the participants. The LSWM task had low agreement (ICC) and moderate consistency (Pearson r) between time 1 and time 2 scores, but these findings should be interpreted with caution given the low completion rate. The LSWM task was related to other measures in the present study, though it was unrelated to most parent-reported attention abilities. Given that the LSWM is a working memory measure, it is not surprising that its associations with the attention measures were weaker. Evidence for discriminant validity of the LSWM was mixed: its associations with vocabulary (TPVT) were low, but its associations with general intellectual abilities (GCA) were not. Finally, the LSWM did not demonstrate practice effects.

Of the NIH Toolbox measures, the Flanker demonstrated the highest agreement and consistency between scores at time 1 and time 2. In terms of validity, all of the NIH Toolbox tasks were related to other measures of attention and thus have some support for convergent validity. However, both the Flanker and the DCCS showed patterns of associations that were stronger with measures of intellectual and vocabulary ability than with measures of attention or executive ability. The LSWM had stronger evidence for discriminant validity than the Flanker and DCCS. None of the NIH Toolbox tasks showed practice effects.

K-CPT 2

Similar to the Cogstate, although a large portion of our sample was able to successfully complete the K-CPT 2, about 40% of participants did not pass the validity check. The outcome measures of the K-CPT 2 yielded a wide range of test-retest interpretations. Omissions had the highest agreement (as indicated by ICC values) between time 1 and time 2 scores and was the only score across all measures in this study to fall in the good-to-excellent range. In terms of consistency (as indicated by Pearson r), all scores except Variability demonstrated moderate-to-strong reliability. There was considerable support for convergent validity. First, each score had several correlations with the other computerized measures. Second, many of the scores were also at least moderately related to most parent-reported attention and executive symptoms, with the exception of Commissions, Variability, and HRT BC. Support for discriminant validity was somewhat mixed, as Commissions, HRT SD, and Variability each had stronger correlations with measures of intellectual ability than with parent-reported attention symptoms. Analyses of practice effects indicated that, overall, the K-CPT 2 yielded practice effects only for Omissions. Additionally, Variability was significantly related to age at time 1, but not at time 2, which may suggest that practice does indeed play a role in Variability scores. Thus, researchers and clinicians are advised to interpret decreases in Omissions over time in children with NF1 with caution.

Overall, given its high reliability and consistent relations with other measures of attention, Omissions emerged as a strong preliminary metric of attention difficulties in young children with NF1, as long as practice effects are controlled for. Indeed, an avenue for future research is to include a control group so that it is possible to compare improvements in Omissions across time points against a group of unaffected children, to investigate whether the improvements exceed what would be expected from practice alone. Future research should also investigate whether practice effects are present at longer test-retest intervals.

DAS-II Digits Forward

In the present study, the DAS-II DF showed good test-retest reliability using both ICC and Pearson r values and did not demonstrate significant practice effects. The task largely showed evidence for discriminant validity, as evidenced by its very low relation with vocabulary knowledge and moderate relation with intellectual abilities. There was general support for convergent validity of the DF task, particularly in its relations with parent reports of attention difficulties. Interestingly, this task was moderately related to Commissions and weakly related to Omissions. This pattern is unexpected given previous literature showing that elevations in Omission errors are more common than in Commission errors in NF1 (Heimgärtner et al., 2019; Arnold et al., 2018). This may suggest that the DF task is less sensitive to sustained attention difficulties and more sensitive to impulsivity difficulties in NF1, the latter being a less frequently reported difficulty. It may also help explain why the DF task identified fewer difficulties than the Identification task and HRT SD on the K-CPT 2; it may tap into a facet of attention that is less impaired than those captured by the other measures. Alternatively, the fewer evident difficulties on this task may be due to the mode of administration. On a computerized measure, a child must maintain attention for an extended time, typically with minimal prompting. On a traditional digits forward task, by contrast, children interact with a test administrator, which may be more engaging and offers more opportunities for behavioral management techniques to help them finish the task.

BRIEF

With the exception of Inhibit, which was in the moderate range, each BRIEF score’s ICC value was in the good-to-excellent range of test-retest reliability. In terms of Pearson r, all scores demonstrated good consistency. No subscales showed a significant practice effect.

Conners Inattention/Hyperactivity

The Conners Inattention/Hyperactivity scale had the highest test-retest reliability of all of the measures in the present study. However, this scale did demonstrate a significant practice effect, with scores tending to worsen over time. Thus, we urge caution when interpreting changes (or the lack thereof) on this scale across a similar timeframe when working with young children with NF1. Given the increase in impairment over time, improvements as a result of intervention may not be captured unless the amount of change in ratings is compared with a control group.
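A systematic shift of this kind between two administrations is typically tested with a paired-samples t-test. The sketch below uses invented T-scores (not study data, and not the analysis pipeline of the present study) to show the basic computation; a reliably nonzero t statistic indicates that scores moved in one direction between time 1 and time 2.

```python
# Illustrative only: hypothetical parent-rating T-scores, not study data.

import math

def paired_t(x, y):
    """Return the paired-samples t statistic and its degrees of freedom."""
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    se = math.sqrt(var_d / n)  # standard error of the mean difference
    return mean_d / se, n - 1

# Hypothetical ratings where scores worsen (increase) at retest
time1 = [55, 60, 58, 62, 57, 61, 59, 63]
time2 = [58, 63, 60, 66, 59, 65, 61, 67]

t, df = paired_t(time1, time2)
print(f"t({df}) = {t:.2f}")  # positive t: ratings rose systematically
```

In practice the t statistic would be compared against the t distribution with n − 1 degrees of freedom (e.g., via `scipy.stats.ttest_rel`) to obtain a p-value.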

Implications

Given that the measures investigated demonstrated varying degrees of reliability and validity, there may not be a one-size-fits-all measure for use with this population. Clinicians and researchers must be cautious in their selection of measures and interpretation of data when using these measures with young children with NF1. When prioritizing test-retest reliability, such as in longitudinal research, the performance-based indices with the highest agreement are Omissions and the Flanker, which would thus be appropriate measures for use with young children with NF1. However, the parent-report measures were largely more reliable, particularly the Conners Early Childhood Inattention/Hyperactivity scale.

There was general support for validity across the measures, though Commissions was mostly unrelated to the other computerized measures and the parent-report measures. Importantly, many of these measures demonstrated stronger associations with cognitive functioning than with other attention or executive measures, especially the DCCS and Commissions. However, the statistical significance of these differences was not tested due to the small size of the present sample. Considering the evidence for convergent and discriminant validity together, Detectability appears to be strongly related to attention in our sample.

It is also important to consider and reflect on the high proportion of participants who were either unable to complete the tasks or did not pass validity checks. Typically, this would indicate that the performance on a task is uninterpretable; however, it may be the case that the validity check in and of itself is clinically relevant and related to the high estimates of attention deficits in this population (Hyman et al., 2005; Templer et al., 2013). Clinicians and researchers should be aware of the high rates of young children with NF1 not passing validity checks, and not necessarily discount performance when an integrity check is not met. Future research with a larger sample should investigate whether young children with NF1 who do not pass validity indicators have higher rates of attention deficits than those who do pass.

Characterization of Difficulties

There was evidence that children with NF1 are vulnerable to difficulties across many of the measures of attention and executive functioning included here. The mean performance of the sample on Identification, Detectability, Perseverations, HRT, Variability, and HRT ISI was one standard deviation above the normative mean, indicating difficulty discriminating between targets and non-targets as well as slow and inconsistent responding. The mean performance of our sample suggested that the participants were inattentive and lacked vigilance on the K-CPT 2. This is consistent with previous reports of the performance of young children on the First Edition of the K-CPT (Arnold et al., 2018; Sangster et al., 2011) and another continuous performance task (Heimgärtner et al., 2019). Furthermore, mean performance on Omissions and HRT SD was two standard deviations above the normative mean, further emphasizing the sample’s difficulties with inattention and inconsistent performance throughout testing. Commissions, which can be an indicator of impulsivity (Halperin et al., 1991), was, on the other hand, within the average range for the sample. This general profile of difficulty sustaining attention, but minimal difficulty with impulsivity, is consistent with previous findings using both performance-based and parent-report measures of attention difficulties (Arnold et al., 2018; Payne et al., 2012; Sangster et al., 2011). Thus, the present performance-based findings provide further support for inattention being a central difficulty for young children with NF1.

Fewer difficulties were evident on the NIH Toolbox measures, the Shift scale, and the DAS-II DF task, with mean performances in the average range. The Flanker is a measure of executive attention, which largely overlaps with executive function (Zelazo et al., 2013). Performance within the typical range suggests that, on average, our sample demonstrated age-appropriate cognitive control. Performance on the DCCS provides further support for age-typical executive abilities, as it is thought to measure cognitive flexibility (Zelazo et al., 2013). Average performance on the DAS-II DF task is consistent with a previous study using the same task with a different sample of young children with NF1 (Casnar & Klein-Tasman, 2017). Thus, overall performance suggests that executive difficulty may be less evident in young children with NF1. Further research on the timeline of the emergence of executive challenges on assessment measures is warranted.

Although mean performance on the LSWM task was in the average range for those who completed this task, it is important to recognize that almost half of the sample was not able to complete the task because they did not pass the practice trials. In the practice trials, the participants are asked to name the animals on the screen in size order, and then to practice repeating them in size order without the stimuli on the screen. If they are unable to do so, the task discontinues. Understanding size and order involves fundamental math and relational vocabulary concepts. Since the rates of learning disabilities are high in the NF1 population (Hyman et al., 2005), this task may not have been developmentally appropriate for the young children in the sample. Additionally, the low rate of completion could be due to working memory being a core deficit in NF1 (Templer et al., 2013). It could be that the LSWM demanded too great a working memory load for the young children in this sample, even on the practice trials. Thus, the children in our sample who were able to complete the task may have fewer cognitive difficulties than those who were unable to, which would inflate the mean performance score. In any case, the reasons for difficulty completing the LSWM are likely heterogeneous.

Limitations and Future Directions

This study is not without limitations. First, this pilot study is underpowered and limited by a small sample; however, there were previously no available psychometric data to inform research and clinical practice with young children with NF1. This study also did not include a control group of unaffected children for comparison, though normative data do exist for typically developing children. Using normative data is helpful because it offers large samples stratified to match the most recent census. Most notably, however, the testing conditions, including the length of the battery, likely varied substantially from the normative data collection procedures: our sample likely had a longer study visit with many more measures than the normative sample, which could affect the data through fatigue. Our sample is also largely white, which may limit the generalizability of our findings.

Future research should expand upon the present study to include a larger, more nationally representative sample of children with NF1. Another avenue for future research is to investigate the role of persistence, motivation, and effort in the completion of these tasks in young children with NF1. More generally, there is a need for additional psychometric research across a broader age range in the NF1 population. Many of the measures in the present study also provide normative data for older children and into adulthood. The reliability and validity of these measures may change with age, especially since executive dysfunction tends to worsen with age in NF1 (Beaussart et al., 2018). Furthermore, different measures of attention and executive abilities are used with older children and adolescents, such as the Conners Continuous Performance Test, Third Edition (CPT-3; Conners, 2008), whose psychometric properties should be investigated in older children with NF1. Given how prevalent attention and executive difficulties are in this population, it is vital that this line of research continues to ensure that appropriate tools are being used to measure these difficulties across development in NF1.

Conclusions

Children with NF1 can exhibit difficulties with attention and executive function from a young age (Casnar & Klein-Tasman, 2017), and these difficulties can differ from those seen in idiopathic ADHD, yet the validity and reliability of attention measures in this population were previously unknown. These preliminary psychometric findings shed some light on which measures may be most effective at capturing these challenges. There may not be a one-size-fits-all measure of attention for use with young children with NF1, though the present analyses may lend insight into best practices for constructing clinical or developmental study batteries for this population. When choosing a measure for a clinical or research setting, it is important to consider the goal of the assessment, whether to prioritize test-retest reliability and the absence of practice effects, and whether it is more important to choose a measure with considerable support for validity. In general, the BRIEF and K-CPT 2 emerged as strong measures for use with young children with NF1, particularly because they offer a variety of scores that tended to be both reliable and supported by evidence of validity. However, Omissions on the K-CPT 2 and Emotional Control on the BRIEF may have practice effects and should thus be used with caution, especially in clinical research. Additionally, our findings confirm previous work showing inattention to be a central concern for young children with NF1. It is therefore imperative that professionals use appropriate, reliable, and valid tools when assessing inattention in this population.
