
Research in Developmental Disabilities

Music-based Autism Diagnostics (MUSAD) – A newly developed diagnostic measure for adults with intellectual developmental disabilities suspected of autism

Thomas Bergmann a,*, Tanja Sappok a, Albert Diefenbacher a, Sibylle Dames b, Manuel Heinrich c, Matthias Ziegler c, Isabel Dziobek d

a Protestant Hospital Königin Elisabeth Herzberge, Herzbergstrasse 79, 10365 Berlin, Germany
b Statistics – Joint Masters Program Berlin, Freie Universität Berlin, Garystr. 21, 14195 Berlin, Germany
c Faculty of Life Sciences/Department of Psychology, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
d Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany

ARTICLE INFO

Article history:
Received 25 January 2015
Received in revised form 20 April 2015
Accepted 28 May 2015
Available online

Keywords:
Autism
Diagnostics
Intellectual disability
Assessment
Music therapy

ABSTRACT

The MUSAD was developed as a diagnostic observational instrument in an interactional music framework. It is based on the ICD-10/DSM-5 criteria for autism spectrum disorder (ASD) and was designed to assess adults on a lower level of functioning, including individuals with severe language impairments. This study aimed to evaluate the psychometric properties of the newly developed instrument. Methods: Calculations were based on a consecutive clinical sample of N = 76 adults with intellectual and developmental disabilities (IDD) suspected of ASD. Objectivity, test-retest reliability, and construct validity were calculated and a confirmatory factor analysis was applied to verify a reduced and optimized test version. Results: The structural model showed a good fit, while internal consistency of the subscales was excellent (ω > […] .82 and item-total correlation from .21 to .85. Objectivity was assessed by comparing the scorings of two external raters based on a subsample of n = 12; interrater agreement was .71 (ICC 2, 1). Reliability was calculated for four test repetitions: the average ICC (3, 1) was .69. Convergent ASD measures correlated significantly with the MUSAD, while the discriminant Modified Overt Aggression Scale (MOAS) showed no significant overlap. Conclusion: Confirmation of factorial structure and acceptable psychometric properties suggest that the MUSAD is a promising new instrument for diagnosing ASD in adults with IDD.

© 2015 Elsevier Ltd. All rights reserved.

* Corresponding author at: Evangelisches Krankenhaus Königin Elisabeth Herzberge, Herzbergstrasse 79, 10365 Berlin, Germany. Tel.: +49 30 5472 4951; fax: +49 3091741523. E-mail address: t.bergmann@keh-berlin.de (T. Bergmann).

1 Tel.: +49 30 5472 0; fax: +49 30 5472 2000.

http://dx.doi.org/10.1016/j.ridd.2015.05.011
0891-4222/© 2015 Elsevier Ltd. All rights reserved.

T. Bergmann et al. / Research in Developmental Disabilities 43–44 (2015) 123–135

1. Introduction

Autism Spectrum Disorder (ASD) is a frequently co-occurring condition in individuals with intellectual developmental disabilities (IDD). The prevalence of IDD within the autism spectrum is estimated to be between 30% and 35% (Centers for Disease Control and Prevention, 2012; Fombonne, 2003b). Despite the clinical relevance of this group due to high rates of comorbid challenging behaviors leading to an above-average administration of antipsychotics (McCarthy et al., 2010; Sappok, Budczies, et al., 2014), and frequent admissions to inpatient treatment (Tsakanikos, Costello, Holt, Sturmey, & Bouras, 2007), research activities focusing on adults in the low-functioning range of the autism spectrum are rare (Matson & Shoemaker, 2009). There is a lack of diagnostic standards assessing ASD in adults with IDD, especially in those with limited language skills (Bölte & Poustka, 2005, 2005; Matson & Shoemaker, 2009).

Generally, ASD seems to be under-diagnosed in adulthood (Brugha et al., 2011); reasons for this may be the change of diagnostic criteria over the decades, increasing sensitivity to ASD in children, or individual adaptation to social demands. In adults with IDD, diagnostics are further complicated by, for example, limited self-report and a lack of information about early child development due to loss of contact with families. Symptom overlap with schizophrenia, long-term hospitalization, severe sensory impairments, and IDD itself may lead to misinterpretation and wrong treatment concepts (Akande, Xenitidis, Roberston, & Gorman, 2004; Sappok, Bergmann, Kaiser, & Diefenbacher, 2010). In cases of suspected ASD, comprehensive diagnostics are the basis for adequate treatment and support, enhancing health, reducing challenging behaviors, developing social and emotional skills, and leading to a better quality of life.

For children and young people, a large number of ASD screening tools and a diagnostic gold standard allow for valid diagnostic statements in cases where ASD is suspected. The gold standard combines a parental interview, the Autism Diagnostic Interview-Revised (ADI-R; Lord, Rutter, & Le Couteur, 1994), with a play- and interview-based behavioral observation, the Autism Diagnostic Observation Schedule (ADOS; Lord et al., 1989). In recent years, increasing numbers of specific tools and questionnaires have been developed to screen for ASD in adults with IDD, such as the Pervasive Developmental Disorder in Mental Retardation Scale (PDD-MRS; Kraijer & Bildt, 2005), the Autism Spectrum Disorders – Diagnosis for intellectually disabled Adults (ASD-DA; Matson, Wilkins, Boisjoli, & Smith, 2008), the Diagnostic Behavioral assessment for Autism Spectrum disorder – Revised (DIBAS-R; Sappok, Gaul, et al., 2014), and the Autism Check List (ACL; Sappok, Heinrich, & Diefenbacher, 2014). Nevertheless, there is a lack of diagnostic standards and of a specific measure for structured behavioral observation in adults with IDD and severe language impairment. Even though the ADOS is generally applicable in adults with IDD (Berument et al., 2005; Sappok, Diefenbacher, et al., 2013), childlike materials and prompts seem to be inappropriate in assessing adults. Additionally, limited feasibility is reported, correlating with the severity of IDD and speech impairments (Bergmann, Sappok, Diefenbacher, & Dziobek, 2015; Sappok, Diefenbacher, et al., 2013). In light of the lack of specific ASD diagnostic measures for adults on a lower level of functioning, valid procedures based on nonverbal communication are highly desirable.

Musical interaction as a nonverbal means of communication and an adult form of play may build a framework to assess ASD in adults with IDD (Wigram, 2000). The strong connection between music and ASD was described early on by Kanner (1943) in terms of exceptional musical interests and abilities, and has been supported recently by a Cochrane review of music therapy in the treatment of children with ASD (Geretsegger, Elefant, Kim, & Gold, 2014). But what kind of diagnostic information might music provide? Interactional skills, social affect, and reciprocity can be observed in joint music-making, including in individuals with severe language impairments. Stereotyped, restricted, and repetitive behaviors and interests may occur in musical exploration and expression. Multisensory aspects of musical instruments (auditory, visual, haptic, olfactory) allow the investigation of abnormal sensory interests. Motor coordination and mannerisms can be seen in the ways in which instruments are handled (right-left drum beat) and in movements like tapping and dancing. Overall, most behavioral characteristics listed as core symptoms of ASD in the ICD-10 and DSM-5 can be observed in musical action and interaction (Bergmann et al., 2015). Given the strong association between music and ASD, several assessment tools and two explicit music-based diagnostic instruments have been developed in the field of ASD: Wigram’s Harper House Music Therapy Assessment (1999) and the Music Therapy Diagnostic Assessment (MTDA; Oldfield, 2004). Both were developed to assess children and lack a comprehensive psychometric verification.

In 2009, the Music-based Scale for Autism Diagnostics (MUSAD) was designed in a music therapeutic setting alongside further specific instruments for ASD screening in this group of patients (Sappok, Gaul, et al., 2014; Sappok, Heinrich, et al., 2014). The instrument was developed along the ICD-10 research criteria for autism (F84.0, F84.1), taking into account the latest changes made in the DSM-5 (Bergmann et al., 2015). The concept is comparable to the ADOS, using prompts to provoke diagnostically relevant behaviors that are coded on a 4-point Likert scale regarding the severity of symptom expression in the autistic spectrum. Ten predefined active musical interactional situations were used to create a playful, naturalistic, and age-appropriate framework, also engaging non-speakers in a diagnostic assessment. The implementation procedure is as follows: 1. Free play (warm-up); 2. Piano (joint attention); 3. Gongs (dynamic/affective attunement); 4. Congas (musical dialog); Break; 5. Sing a song (socio-emotional togetherness); 6. Ocean drum (contact via instrument, imagination); 7. Symbolic instruments (pretend play); 8. Music selection (asking for help); 9. Balloon game (turn-taking); and 10. Dancing together (bodily synchronization). The conga situation may provide an example of the entire diagnostic work-up:


1. Implementation: The investigator initiates joint play with a common pulse, followed by slight tempo changes and a return to a stable basic beat. The next task is to hit the drum with alternating right and left hands. Simple motifs and breaks follow in order to initiate interplay; if the client does not react or stops, he or she is supported verbally and gesturally. Finally, a drum roll in crescendo, with the suspense released in a final blow, invites the client to share affectivity.

2. Description: To change perspective, the investigator is first asked to describe the client’s behaviors in free text along predetermined observation priorities: in this case, motor coordination, imitation skills, metric synchronization and variations, turn-taking skills, social reciprocity, and shared joy.

3. Scoring: Task-specific items like “Rhythmic synchronization of tempo changes” or overall items like “Joy in playing together” are scored on a 4-point scale in order to operationalize ASD-related behaviors (for a complete description see Bergmann et al., 2015).

Eighty-eight items are grouped in five domains according to ASD main characteristics, including (1) social interaction, (2) communication, (3) stereotyped and repetitive behaviors, (4) sensory-motor issues, and (5) affective dysregulation. Temper tantrums, aggression, and self-injury are mentioned as “nonspecific problems” in the ICD-10, so we added the last domain (affective dysregulation) as a possible ASD marker. Regarding the future development of diagnostic criteria and diagnostic understanding of ASD, motor issues were also included. The scale consists of two modules, one for verbal and one for non-verbal individuals. However, the two modules differ only in the domain of social communication (2), which involves either a set of verbal items or an alternative nonverbal item set; the choice of module does not affect how the investigation is conducted. The inclusion of low-level musical interventions without the requirement to imitate (Bergmann et al., 2011; Schumacher & Calvet, 2007) and a course of tasks with increasing demands on social and physical contact was developed in order to decrease irritability and rejection in individuals with a low level of functioning facing an unfamiliar environment. In a previous study, a feasibility of 95% was achieved when applying the MUSAD in adults with IDD (Bergmann et al., 2015), which is considerably higher than the feasibility for the ADOS in this group, which was reported at 81% (Berument et al., 2005; Sappok, Diefenbacher, et al., 2013).

The primary objective of the present study was to examine the MUSAD along the main criteria for test quality, i.e., objectivity, reliability, and validity based on a clinical sample. The secondary aim was the improvement of test economy by reducing the number of items.

2. Material and methods

2.1. Procedure

Data collection with the newly developed MUSAD was conducted at a psychiatric department that specialized in mental health care for adults with IDD in Berlin, Germany. This service consists of an inpatient and an outpatient unit and offers assessment and treatment for adults with IDD and mental disorders and/or severe challenging behaviors. Given this setting, all participants in this study had an additional mental or behavioral problem on admission. In cases of suspected ASD, the diagnostic assessment, including the MUSAD investigation, was made after remission of acute exacerbation of the psychiatric illness, mainly at the outpatient clinic. Diagnostic classification, including ASD and severity of IDD, was conducted in accordance with the diagnostic research criteria for mental disorders proposed by ICD-10 (World Health Organization, 2008). The MUSAD assessment took place in a large room equipped with a predefined set of standard music therapy instruments arranged according to the course of musical-interactional diagnostic situations (Bergmann, Sappok, Diefenbacher, & Dziobek, 2012). The administration of the procedure was based on a manual with prescribed interventions for each musical-interactional situation. The MUSAD manual is part of the unpublished test draft (for a complete description see also Bergmann et al., 2015).

All sessions were videotaped to allow better subsequent diagnostic behavioral observation and scoring as well as to gain material to assess interrater reliability. All investigations were carried out by the test developer, who was blinded to the final diagnosis; all scorings had been carried out before the final clinical diagnosis was made. Although the diagnostic team was blinded to the MUSAD scoring, single video sequences were used to support diagnostic decision-making in cases of diagnostic uncertainty. However, no information about the scoring of certain items or overall sum scores was provided.

ASD diagnoses were assigned by a multidisciplinary team consensus conference according to the ICD-10 diagnostic research criteria for autism or atypical autism (F84.0/F84.1). If no information about the developmental history could be obtained, atypical autism was diagnosed. The multidisciplinary team consisted of at least one psychiatrist, a clinical psychologist, a special-needs caregiver, therapists, and a member of the nursing staff who was experienced in the fields of IDD and ASD. Diagnostic classification was based on all available information, including medical histories, psychiatric and physical examinations, video-based behavior analyses across a variety of contexts, and various standardized measures such as the PDD-MRS (Kraijer & Bildt, 2005; Kraijer & Melchers, 2003), the German version of the Social Communication Questionnaire-current (FSK-aktuell; Bölte, Poustka, Rutter, Bailey, & Lord, 2006), the ACL (Sappok, Heinrich, et al., 2014), and, in cases of diagnostic uncertainty, the ADOS (Bölte, Rutter, Le Couteur, & Lord, 2006) and/or the ADI-R (Bölte, Rutter, et al., 2006). The SCQ-current was completed by an informant from the patient’s private living environment; the PDD-MRS, the ADOS, and the ADI-R were completed by a psychologist (H.K.) who was not involved in the study. Existing data from diagnostic procedures were used, and these procedures were performed with the informed consent of the patients as a part of routine patient care (National Hospital Law § 25.1, version 18.09.2011). This study was part of a larger study on the development and adaptation of instruments for ASD diagnosis in this group (Sappok, Budczies, et al., 2013; Sappok, Gaul, et al., 2014; Sappok, Heinrich, et al., 2014), which was approved by the local ethics committee and was conducted according to the recommendations of the Declaration of Helsinki.

2.2. Sample

In the period between 1/2010 and 12/2011 the MUSAD was applied to 91 adult patients who were consecutively included in the diagnostic procedure. Either ASD diagnostics were part of the treatment contract or ASD was suspected as a result of clinical behavioral observation in the inpatient or outpatient setting, conspicuous biographical disclosures, and/or unclear previous findings. Inclusion criteria were age >18 years and the presence of an IDD (ICD-10: F70-73). There were no further selection criteria except for limitations like logistic problems or missing consent documents, resulting in an ad-hoc sample reflecting clinical reality. In our in- and outpatient unit, the study participants did not receive regular music therapy or any form of musical training allowing the development of musical skills in advance of the MUSAD investigation. During the three months preceding the study, the procedure was tested and slightly edited based on observations with 11 patients. Of the 80 cases included in this study, four were excluded due to profound sensory impairments and rejection of the procedure. All calculations were based on the remaining N = 76. For demographic and clinical characteristics of the study sample, see Table 1.

ASD was diagnosed in 50 participants (66%), while the remaining 26 participants (34%) did not show ASD but were diagnosed with schizophrenia, mood disorders, attachment disorders, sensory deficits, obsessive-compulsive behaviors, attention deficit hyperactivity disorder, or challenging behaviors on a background of IDD. The gender distribution in the ASD group was 42 men to eight women, reflecting the well-known accumulation of ASD in males (Fombonne, 2003a). A similarly unequal distribution (21 to 5) was also found in the non-ASD group, suggesting an increased suspicion of autism in males. Eighteen participants were non-verbal, and 15 were able to speak only in single words, so a high proportion (43%) of the entire sample had profound expressive language impairments. Consistent with the close association between autism and deficits in verbal and nonverbal communication, profound speech impairments were found in 52% of the ASD group compared with 27% of individuals with IDD only. There were no significant differences in any clinical or demographic characteristic between the ASD and the IDD only group.

2.3. Measures

Convergent scales to screen for ASD used in this study are listed below, followed by more elaborate diagnostic procedures and discriminant measures.

Table 1
MUSAD sample characteristics.

Columns: Characteristic | Total (N = 76) | ASD (n = 50) | Non-ASD (n = 26) | p-value a
Rows: Gender (females); Age (years), M (SD); IQ, intellectual disability; Speech level (1 = short sentences, 2 = max 3-word phrases, 3 = single words, 4 = no speech); MUSAD-module choice (1 = non-verbal)
[Cell values are not recoverable here. Surviving p-values, in source order: .938 b, 1.000, .406, .439, .362, .238, .266]

Note: M, mean; SD, standard deviation.
b Unpaired two-sample t-test (two-sided, uncorrected for multiple comparisons).


2.3.1. Convergent ASD measures

The Pervasive Developmental Disorder in Mental Retardation Scale (PDD-MRS) is the first interview-based screening questionnaire specifically designed to assess ASD in individuals with ID (Kraijer & Bildt, 2005). The comprehensive norming study carried out by Kraijer and Bildt (2005) found strong support for the diagnostic value in terms of sensitivity and specificity over the complete ranges of ID (mild to profound) and age (2–55 years). The PDD-MRS consists of 12 items related to current behavior. In this study, it was used for a structured interview with an important reference person, conducted by a psychologist (H.K.) experienced in the field of ASD and ID.

The Social Communication Questionnaire (SCQ) was designed to screen for ASD from childhood to early adulthood (4–18 years) and includes 40 binary items assessing current or lifetime behavior (Rutter, Bailey, & Lord, 2003). In this study, the German form of the SCQ-current was applied (Bölte, Poustka, et al., 2006). Several studies have shown that the SCQ is a useful ASD screening tool in toddlers, young children, and adolescents (e.g., Allen, Silove, Williams, & Hutchins, 2007; Chandler et al., 2007; Oosterling et al., 2010). However, there are conflicting results regarding the optimal cut-off point for use in adults with ID (Brooks & Benson, 2013; Sappok, Diefenbacher, Gaul, & Bölte, 2014). The SCQ was completed by professional caregivers.

The Autism Diagnostic Observation Schedule (ADOS; Lord et al., 1989 ) is a semi-structured observational instrument using play and interview situations to assess social and communicative abilities in individuals suspected of having ASD. It consists of four modules, oriented to the client’s expressive language level; modules 1 and 2 were developed for young children. Despite the child-oriented play materials, the ADOS is also applicable in adults with impaired language skills; however, a dropout rate of 15–19% is reported for this population ( Bergmann et al., 2015; Berument et al., 2005; Sappok, Diefenbacher, et al., 2013 ). Furthermore, the ADOS tended to be over-inclusive (sensitivity 100%; specificity 45%), although its psychometric properties could be improved by using a revised algorithm (sensitivity 94%; specificity 65%: Sappok, Diefenbacher, et al., 2013 ). The ADOS was administered by two psychologists (I.D., K.R.) experienced in the field: I.D. conducted most investigations and was certified in its use by an official ADOS/ADI-R trainer.
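The sensitivity and specificity figures quoted above follow from a two-by-two comparison of test outcome against clinical diagnosis. The sketch below shows the arithmetic only; the counts are hypothetical and were chosen to illustrate an over-inclusive instrument, not taken from any of the cited studies.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of individuals with the condition who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of individuals without the condition who test negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (assumption): every true case is detected,
# but more than half of the non-cases are flagged as well.
tp, fn, tn, fp = 30, 0, 9, 11

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

A revised scoring algorithm typically trades a little sensitivity for a gain in specificity, which is the pattern described for the ADOS above.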

2.3.2. Discriminant scales

The Modified Overt Aggression Scale (MOAS) is a rating scale assessing the intensity of different forms of displayed aggression (verbal, toward objects, toward the self, and toward others: Knoedler, 1989; Yudofsky, Silver, Jackson, Endicott, & Williams, 1986). First evaluations of the interrater reliability in adults with IDD indicated good values for the MOAS overall score (Oliver, Crawford, Rao, Reece, & Tyrer, 2007).

The Aberrant Behavior Checklist (ABC; Aman, Singh, Stewart, & Field, 1985) is a symptom checklist for assessing problem behavior in children and adults with IDD. It consists of 58 items and was empirically developed by factor analysis on data from about 1000 residents. Its five subscales are (1) Irritability, Agitation; (2) Lethargy, Social Withdrawal; (3) Stereotypic Behavior; (4) Hyperactivity, Noncompliance; and (5) Inappropriate Speech. They were used in an evaluation to measure the acute effects of risperidone on severe behavior problems in children with ASD (McCracken et al., 2002). Although the scale assesses autism-related behaviors like irritability, tantrums, and aggression, it does not reflect the full range of ASD core symptoms and was therefore included to gain information about the discriminant validity of the MUSAD.

The MOAS and the ABC were handed to close carers, i.e., relatives, caring staff, or special-needs carers, to assess the participants’ current behaviors.

2.4. Data analysis

2.4.1. Factorial validity and construct reliability

A confirmatory factor analysis (CFA) was calculated to assess the factorial validity of the five-dimensional MUSAD test draft. In order to calculate with maximum sample size, we did not build subgroups according to verbalization. Because the MUSAD communication domain (2) consists of two different item sets for verbal and nonverbal individuals, we excluded these module-dependent items. Before calculating the CFA based on all items applicable to individuals with and without verbal abilities, several steps were conducted to further strengthen the factorial model. Because the DSM-5 was released after the MUSAD development period, items were regrouped according to the new ASD diagnostic criteria. This concerned, in particular, items assessing sensory issues within the sensory-motor domain. In line with the DSM-5 model, they were included in the cluster of stereotyped, restricted and repetitive patterns of behavior, interests, or activities. The regrouping process resulted in the following theoretically assumed factors: (1) social interaction, (2) stereotyped, restricted and repetitive behaviors and interests, (3) motor issues, and (4) affective dysregulation. In the next step, considering the high coding effort for the full item set (Bergmann et al., 2015), items with high content redundancy were deleted. In addition, we decided to remove the affective dysregulation domain to further reduce coding effort. Finally, we tested a three-factorial model (F1: social interaction; F2: stereotypies and sensory issues; F3: motor coordination) with 37 items.

Confirmatory factor analyses were conducted with Mplus (Version 6.11) using a weighted least squares mean and variance adjusted (WLSMV) estimator (Muthén, Du Toit, & Spisic, 1997). WLSMV is the preferred estimation method when indicator variables are ordinal and uses polychoric correlations. Missing data were dealt with using the full information maximum likelihood (FIML) method in Mplus. Model fit was evaluated based on a set of different recommendations (Beauducel & Wittmann, 2005; Heene, Hilbert, Draxler, Ziegler, & Bühner, 2011; Hu & Bentler, 1999; Yu, 2002). Consequently, besides the global model test using the chi-square statistic, the root mean square error of approximation (RMSEA), the weighted root mean square residual (WRMR), and the comparative fit index (CFI) were used to evaluate model fit. The following cutoffs were applied: RMSEA < .06, CFI > .95, WRMR < 1.0.

The reliability of the subscales was calculated in terms of internal consistency using McDonald’s omega (McDonald, 1999). Omega is calculated using factor loadings and error variances of single indicators. In contrast to Cronbach’s alpha, omega is characterized as a better estimator for scale reliability when the assumption of essentially tau-equivalent indicators (same factor loadings) is not met (Zinbarg, Revelle, Yovel, & Li, 2005).
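As a minimal sketch of how omega can be obtained from factor loadings, the code below assumes a single-factor model with standardized loadings and uncorrelated errors, so that each error variance equals 1 − λ². The loadings are made up for illustration, and the alpha function uses the covariances implied by that same model rather than raw data; it is not the Mplus computation used in the study.

```python
from typing import Sequence


def mcdonald_omega(loadings: Sequence[float]) -> float:
    """Omega for one factor: (sum of loadings)^2 over total variance.

    Assumes standardized loadings, so error variance is 1 - loading^2.
    """
    true_var = sum(loadings) ** 2
    error_var = sum(1.0 - lam ** 2 for lam in loadings)
    return true_var / (true_var + error_var)


def alpha_from_loadings(loadings: Sequence[float]) -> float:
    """Cronbach's alpha computed from the model-implied covariance matrix."""
    k = len(loadings)
    # Standardized items have variance 1; covariances are products of loadings.
    total_var = k + sum(loadings) ** 2 - sum(lam ** 2 for lam in loadings)
    return k / (k - 1) * (1.0 - k / total_var)


equal = [0.7, 0.7, 0.7, 0.7]    # tau-equivalent indicators (hypothetical)
unequal = [0.9, 0.8, 0.5, 0.4]  # congeneric indicators (hypothetical)

for name, lams in (("equal", equal), ("unequal", unequal)):
    print(name, round(mcdonald_omega(lams), 3), round(alpha_from_loadings(lams), 3))
```

With equal loadings the two coefficients coincide; with unequal loadings alpha falls below omega, which is the situation the paragraph above refers to.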

2.4.2. Descriptive item analysis

Item means, standard deviations, item difficulties, and part-whole corrected item total correlations were calculated for each subscale separately and for all items integrated in the final CFA model.
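The two less familiar statistics can be sketched as follows. The values reported in Table 2 are consistent with item difficulty being the item mean divided by the maximum score of 3 (for example, 1.55 / 3 ≈ .52); the part-whole correction correlates each item with the sum of the remaining items of its subscale. The ratings below are hypothetical, and the helper names are illustrative only.

```python
from math import sqrt
from typing import List, Sequence

MAX_SCORE = 3  # assumption: every item is coded 0, 1, 2, or 3


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ValueError if either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def item_difficulty(scores: Sequence[float]) -> float:
    """Item mean expressed as a proportion of the maximum score."""
    return mean(scores) / MAX_SCORE


def corrected_item_total(data: List[List[float]], item: int) -> float:
    """Correlate one item with the sum of the other items on the subscale."""
    item_scores = [row[item] for row in data]
    rest_scores = [sum(row) - row[item] for row in data]
    return pearson_r(item_scores, rest_scores)


# Hypothetical ratings: rows are participants, columns are items of one subscale.
ratings = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 1, 3],
    [3, 2, 3, 2],
    [3, 3, 3, 3],
]
for i in range(len(ratings[0])):
    column = [row[i] for row in ratings]
    print(i, round(item_difficulty(column), 2),
          round(corrected_item_total(ratings, i), 2))
```

Removing the item from the total before correlating avoids inflating the coefficient, which matters most on short subscales.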

2.4.3. Interrater agreement

The consistency of scores across raters was evaluated based on a stratified random sample of n = 12 individuals. Patients’ key characteristics, i.e., ASD, speech level, and gender, reflected the total sample (9 ASD – 3 non-ASD; 7 verbal – 5 nonverbal; 10 male – 2 female; age span: 20–52, M = 36, SD = 10.7). Video-based scoring was carried out by two external blinded raters. One was a music therapist, the other a psychologist experienced in diagnostics of Asperger syndrome in adults (Charité Autism Outpatient Clinic, Berlin). Consensus ratings, including the test developer’s scoring, were performed in advance to aid familiarization with the instrument. During the independent scoring of the subsample, every third case was used for calibration by a consensus discussion after data entry. Concordance between scorings was calculated using the intraclass correlation coefficient, two-way random model, and single measure using the available scores of all 37 items integrated in the final CFA model (ICC 2, 1; Shrout & Fleiss, 1979). In order to assess the objectivity of the test developer’s estimations, the ICC among all three raters was calculated.

2.4.4. Test-retest reliability

Measurement accuracy of the instrument was calculated based on test repetitions of four randomly selected cases, conducted and scored by the test developer. Investigations were carried out in the same environment; a time interval of at least three months was chosen to minimize habituation and memory effects. One participant was diagnosed with autism (F84.0), the remaining three with atypical autism (F84.11; weaker expression of symptoms). The degree of IDD ranged from mild to profound and one person was non-verbal. Due to the ordinal data, the agreement between test and retest was measured using the intraclass correlation coefficient, two-way mixed model, and single measure using the available item scores of all 37 items integrated in the final CFA model (ICC 3, 1; Shrout & Fleiss, 1979).
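Both coefficients named in Sections 2.4.3 and 2.4.4 come from the same two-way analysis of variance and differ only in whether rater (or occasion) variance enters the denominator. The sketch below follows the mean-square formulas of Shrout and Fleiss (1979) for complete data; the 6 × 4 ratings table is the small example commonly reproduced from that paper, not MUSAD data, and how rows and columns were arranged in the present study is not specified here.

```python
from typing import List, Tuple


def icc_single(ratings: List[List[float]]) -> Tuple[float, float]:
    """Return (ICC(2,1), ICC(3,1)) for an n_targets x k_raters table.

    Assumes a complete table with no missing ratings.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-targets mean square
    jms = ss_cols / (k - 1)               # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square

    # ICC(2,1): raters treated as random, so rater variance counts as error.
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    # ICC(3,1): raters treated as fixed, so rater variance is ignored.
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    return icc21, icc31


example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc21, icc31 = icc_single(example)
print(f"ICC(2,1) = {icc21:.2f}, ICC(3,1) = {icc31:.2f}")
# prints: ICC(2,1) = 0.29, ICC(3,1) = 0.71
```

The gap between the two values in this example comes from systematic differences between raters, which ICC(2,1) penalizes and ICC(3,1) does not.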

2.4.5. Convergent and discriminant validity

To obtain evidence on how far the MUSAD scores can be interpreted as reflecting autism symptomatology, the MUSAD raw values (sum score) including 37 selected items were correlated with the test values of other measures applied in this study. For the calculation of convergent validity, two ASD screening instruments were chosen: the SCQ and the PDD-MRS (described above). Despite the small sample size, correlations with the different ADOS modules were also made to gain additional information (ADOS module 1 – MUSAD nonverbal, n = 17; ADOS module 2 – MUSAD verbal, n = 7). To assess discriminant validity, the ABC and the MOAS were chosen, neither of which would be expected to be highly associated with core ASD symptomatology. Calculations were made using Pearson’s correlation coefficient.
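The logic of this comparison can be illustrated with a short sketch: a sum score is correlated once with a measure of the same construct and once with a measure of a different construct. All scores below are invented for illustration, and the variable names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six participants (assumption, not study data).
musad_sum = [22, 35, 41, 58, 63, 70]  # observational sum score
convergent = [4, 7, 8, 12, 13, 15]    # e.g. an ASD screening score
discriminant = [9, 3, 11, 5, 10, 4]   # e.g. an aggression rating

print("convergent r   =", round(pearson_r(musad_sum, convergent), 2))
print("discriminant r =", round(pearson_r(musad_sum, discriminant), 2))
```

Evidence for construct validity is the pattern as a whole: high correlations with the convergent measures together with low correlations with the discriminant ones.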

3. Results

3.1. Factorial validity and construct reliability

The chi-square test of exact model fit was significant: χ²(df = 626) = 822.8, p < .001. However, the fit indices of the CFA for all 37 module-independent items were CFI = .97, RMSEA = .06, 90% CI [.05, .07], and WRMR = 1.02, indicating a good model fit. Possible reasons for model misfits are offered in Section 4. Standardized factor loadings for the social interaction domain (F1) ranged between .62 and .93 (Mdn = .85), while those for the domain of stereotyped, restricted and repetitive behaviors (F2) were between .43 and .93 (Mdn = .76), and those for the motor coordination domain (F3) were between .56 and .94 (Mdn = .92). All factor loadings for all domains were significant, with p < .001 except for MUS306: particular interest in parts of objects (p = .003). The latent factor intercorrelations ranged between .74 and .88.
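For orientation, the reported indices can be set against the cutoffs stated in Section 2.4.1. The sketch below also recomputes an RMSEA point estimate from the chi-square statistic using the textbook formula sqrt(max(χ² − df, 0) / (df · (N − 1))); this formula is an assumption, since the exact computation Mplus applies under WLSMV may differ slightly.

```python
from math import sqrt
from typing import Dict


def rmsea(chi2: float, df: int, n: int) -> float:
    """Textbook RMSEA point estimate (assumed formula, see lead-in)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def meets_cutoffs(cfi: float, rmsea_value: float, wrmr: float) -> Dict[str, bool]:
    """Apply the cutoffs CFI > .95, RMSEA < .06, WRMR < 1.0 literally."""
    return {
        "CFI": cfi > 0.95,
        "RMSEA": rmsea_value < 0.06,
        "WRMR": wrmr < 1.0,
    }


# Values as reported in the text above.
print("RMSEA from chi-square:", round(rmsea(822.8, 626, 76), 3))
print(meets_cutoffs(cfi=0.97, rmsea_value=0.06, wrmr=1.02))
```

Read literally, CFI clears its cutoff, while the rounded RMSEA sits at its cutoff and WRMR lies marginally above it, which matches the text's reference to possible model misfit.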

The construct reliability of each subscale, measured with McDonald’s omega, was >.90 (ω1 = .98, ω2 = .94, and ω3 = .92). Fig. 1 contains the first order factor model including all standardized loadings, residuals, and factor intercorrelations.

3.2. Descriptive item analysis

Means, standard deviations, item difficulties, part-whole corrected item total correlations, and factor loadings for all items integrated in the final CFA model are summarized in Table 2. Item mean values for discrete scale values from 0 to 3 ranged from 0.12 (MUS306 particular interest for parts of objects) to 2.45 (MUS106 reactive social smile): thus, item difficulties ranged

Fig. 1. MUSAD first order factor model. Method: WLSMV estimator, polychoric correlation (ordinal indicators), missing-pattern = 41, number of free parameters = 148. F1 = social interaction; F2 = stereotyped, restricted, repetitive behaviors, interests, or activities; F3 = motor coordination. Model fit: χ²(df = 626, n = 76) = 822.82, p < 0.001; CFI = 0.97; RMSEA = 0.06; WRMR = 1.02.

T. Bergmann et al. / Research in Developmental Disabilities 43–44 (2015) 123–135


Table 2
Descriptive item statistics and standardized factor loadings.

| Code | Description | M | SD | p_i | r_i(x-i) | λ_Std |
|---|---|---|---|---|---|---|
| F1: social interaction (ω1 = .98) | | | | | | |
| MUS101 | Eye gaze when contacting | 1.55 | 1.06 | .52 | .69 | .86 |
| MUS102 | Eye contact during play or activity | 1.75 | 0.88 | .58 | .85 | .88 |
| MUS103 | Reaction to directed gaze | 1.67 | 1.04 | .56 | .55 | .83 |
| MUS105 | Socially directed facial expression | 2.17 | 0.82 | .72 | .71 | .83 |
| MUS106 | Reactive social smile | 2.45 | 0.91 | .82 | .72 | .93 |
| MUS107 | Modulation of facial expressions | 2.04 | 0.82 | .68 | .78 | .89 |
| MUS108 | Gestures to regulate musical interaction | 1.95 | 0.87 | .65 | .82 | .93 |
| MUS109 | Bodily alignment in interaction | 1.22 | 0.76 | .41 | .71 | .81 |
| MUS110 | Body tension and posture to communicate emotional states | 1.83 | 0.74 | .61 | .70 | .73 |
| MUS111 | Degree of reciprocity | 1.88 | 0.79 | .63 | .63 | .88 |
| MUS113 | Motivation for making interpersonal contact | 1.80 | 1.06 | .60 | .78 | .89 |
| MUS114 | Reaction to contact offers | 1.75 | 0.93 | .58 | .84 | .90 |
| MUS118 | Joy in playing together | 2.08 | 0.96 | .69 | .72 | .84 |
| MUS119 | Response to directing the attention | 1.53 | 1.03 | .51 | .59 | .81 |
| MUS120 | Integration of the investigator into client’s own play | 1.66 | 0.76 | .55 | .79 | .88 |
| MUS122 | Quality of interpersonal motor synchronization | 2.00 | 0.89 | .67 | .76 | .83 |
| MUS205c | Rhythmic synchronization in tempo changes | 1.80 | 0.92 | .60 | .43 | .62 |
| MUS508 | Emotional response to physical contact | 1.69 | 0.69 | .56 | .77 | .70 |
| F2: stereotyped, restricted, and repetitive behaviors (ω2 = .94) | | | | | | |
| MUS116 | Regulation of proximity and distance | 1.04 | 0.82 | .35 | .62 | .82 |
| MUS206c | Imagination and creativity | 1.86 | 0.69 | .62 | .56 | .88 |
| MUS301 | Mannerisms | 0.93 | 1.00 | .31 | .56 | .61 |
| MUS302 | Complex stereotypic motorics | 1.28 | 0.92 | .43 | .47 | .67 |
| MUS303 | Unusual interests and savant talents | 0.88 | 0.93 | .29 | .37 | .43 |
| MUS304 | Non-functional play | 0.51 | 0.60 | .17 | .52 | .79 |
| MUS305 | Sensory interest in objects, persons and own body | 0.81 | 0.94 | .27 | .65 | .93 |
| MUS306 | Particular interest in parts of objects | 0.12 | 0.33 | .04 | .21 | .51 |
| MUS307 | Restricted patterns of interest/playing styles | 0.62 | 0.81 | .21 | .63 | .83 |
| MUS308 | Compulsive-ritualized handling of instruments/objects | 0.95 | 0.71 | .32 | .50 | .52 |
| MUS309 | Stereotypy in musical expression and play gesture | 1.69 | 0.80 | .56 | .75 | .88 |
| MUS401 | Self-stimulation with objects/own body | 0.65 | 0.90 | .22 | .51 | .84 |
| MUS501 | Flexibility of musical dynamics | 1.94 | 0.67 | .65 | .50 | .72 |
| MUS506 | Self-injurious behavior | 0.17 | 0.47 | .06 | .41 | .72 |
| F3: motor coordination (ω3 = .92) | | | | | | |
| MUS411 | Right-left coordination | 1.32 | 0.96 | .44 | .58 | .92 |
| MUS412 | Integration of the extremities into the body image | 1.44 | 0.87 | .48 | .78 | .94 |
| MUS413 | Dexterity in the handling of instruments | 1.25 | 0.88 | .42 | .68 | .78 |
| MUS414 | General assessment of motor coordination | 1.59 | 0.88 | .53 | .81 | .92 |
| MUS415 | Abnormal gait pattern | 0.55 | 0.76 | .18 | .34 | .56 |

Note: M, mean; SD, standard deviation; p_i, item difficulty; r_i(x-i), part-whole corrected item-total correlation calculated for each subscale separately; λ_Std, standardized factor loading as a result of confirmatory factor analysis using WLSMV estimator; ω, McDonald’s omega as a measure of construct reliability.

.82. Item-total correlations ranged from .21 (MUS306 particular interest for parts of objects) to .85 (MUS102 eye contact during play or activity).
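The two item statistics described here can be sketched in a few lines. The item difficulties in Table 2 are consistent with dividing the item mean by the maximum score of 3; the part-whole corrected item-total correlation correlates each item with the sum of the remaining items of its subscale. The helper names and the toy score matrix below are hypothetical and serve only as an illustration, not as the authors' actual procedure or data.

```python
import math

MAX_SCORE = 3  # items are coded on a discrete scale from 0 to 3


def item_difficulty(mean, max_score=MAX_SCORE):
    """Difficulty index: item mean expressed as a share of the maximum score."""
    return mean / max_score


def pearson(x, y):
    """Plain Pearson correlation of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(scores, item):
    """Correlate one item with the sum of all *other* items (part-whole correction)."""
    item_scores = [row[item] for row in scores]
    rest_scores = [sum(row) - row[item] for row in scores]
    return pearson(item_scores, rest_scores)


# Item means taken from Table 2 (MUS106 and MUS306).
print(round(item_difficulty(2.45), 2), round(item_difficulty(0.12), 2))

# Hypothetical toy data: 4 persons (rows) x 3 items (columns), codes 0..3.
toy_scores = [[0, 0, 1], [1, 1, 1], [2, 2, 2], [3, 3, 3]]
r_item0 = corrected_item_total(toy_scores, 0)
print(round(r_item0, 2))
```

Removing the item from the total before correlating avoids inflating the coefficient, which matters most for short subscales such as F3 with only five items.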

3.3. Interrater agreement

For the scorings of the two blinded raters, the ICC (2, 1) was .71, 95% CI [.59, .82], based on the final version including 37 items. The ICC (2, 1) across three raters (2 blinded raters and the test developer) was .67, 95% CI [.62, .72].

3.4. Test-retest reliability

The ICCs (3, 1) for the four test-retest pairs ranged between .45 and .80, resulting in an average ICC (3, 1) of .69 based on the final MUSAD version. For details see Table 3.
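The two ICC variants reported in Sections 3.3 and 3.4 can be illustrated with a minimal sketch based on the usual two-way ANOVA mean squares. It assumes the common single-measure definitions: ICC (2, 1) treats raters as random and penalizes systematic rater differences, while ICC (3, 1) treats them as fixed. Whether the authors used exactly these variants and which software they used is not stated in this section; the function name and the rating matrix are hypothetical toy material, not study data.

```python
def icc_two_way(ratings):
    """Return (ICC(2,1), ICC(3,1)) for a targets x raters matrix.

    ratings: list of rows, one row per rated target, one column per rater
    (or per measurement occasion in a test-retest design).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    icc21 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc31 = (msr - mse) / (msr + (k - 1) * mse)
    return icc21, icc31


# Hypothetical toy matrix: 6 targets rated by 4 raters.
toy_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc21, icc31 = icc_two_way(toy_ratings)
print(f"ICC(2,1) = {icc21:.2f}, ICC(3,1) = {icc31:.2f}")
```

In the toy matrix the raters differ strongly in their mean level, so ICC (2, 1) comes out much lower than ICC (3, 1); when raters share the same scoring level, the two coefficients converge.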

3.5. Convergent and discriminant validity

The MUSAD total score was significantly correlated with the sum scores of the PDD-MRS, r(37) = .55, p < .001, one-sided, whereas the correlation between MUSAD and SCQ was moderate, r(63) = .32, p = .005, one-sided. Both ASD screeners were part of the internal diagnostic standard. A strong to very strong positive relationship was found when correlating ADOS

Table 3
Test-retest reliability.

Columns: Case no. | Age, gender | Module | ASD (severity) | Retest interval | ICC (3, 1) absolute | 95% CI
Recoverable entries: case no. 37; 45 years, male; module 2; F84.11; (severe); 95% CI [.62, .90]. Retest intervals: 139 days = 4.6 months; 162 days = 5.3 months; 400 days = 13.2 months; 126 days = 4.1 months. Total: 20–48 years; mild/severe.

modules 1 and 2 with the MUSAD sum score (M1: r(15) = .58; M2: r(5) = .83, p < .05, one-sided). Analyses of discriminant validity revealed a non-significant and negligible relationship between the MUSAD and the MOAS (r(56) = .15, p = .252), whereas a weak positive relationship was found with the ABC (r(59) = .28, p = .03).
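To give a feel for the precision of such validity coefficients, the following sketch computes an approximate confidence interval for a correlation via the Fisher z-transformation. It is purely illustrative and not part of the reported results: it assumes that the degrees of freedom in r(37) correspond to n − 2, it yields a two-sided interval whereas the tests above are one-sided, and the function name `fisher_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate two-sided confidence interval for a Pearson correlation."""
    z = math.atanh(r)                    # Fisher z-transformation
    se = 1.0 / math.sqrt(n - 3)          # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# MUSAD vs. PDD-MRS: r(37) = .55, i.e. n = 39 under the df = n - 2 assumption.
lo, hi = fisher_ci(0.55, 37 + 2)
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")
```

The same function applied to the ADOS module 2 correlation, which rests on very few cases, would give a much wider interval, in line with the caution about small subsamples.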

4. Discussion

In this paper we present results, generated in a clinical sample of 76 individuals with IDD and suspected ASD, that support the objectivity, reliability, and construct validity of a newly developed ASD diagnostic observational instrument based on interactions in a music framework (MUSAD). One aim of the current study was to test the factorial validity of the instrument.

Before running a CFA to verify the factorial structure of the MUSAD draft version, we slightly modified the model according to the proposed two-dimensional ASD symptom structure (factor 1: social communication/interaction; factor 2: restricted/repetitive patterns of behavior, including sensory issues) of the DSM-5. These changes seemed necessary, since the test draft was developed before the DSM-5 was released. The two-dimensional ASD model as defined in the current DSM-5 is supported by numerous studies (Frazier et al., 2012; Guthrie, Swineford, Wetherby, & Lord, 2013; Mandy, Charman, & Skuse, 2012) and shapes the current understanding of ASD. Because the MUSAD also assesses motor issues, a third factor (motor coordination) was integrated in the tested model. Impairments in motor coordination are not included as ASD core symptomatology in the ICD-10 and DSM-5 manuals, but are discussed as being cardinal ASD symptoms in current literature (Fournier, Hass, Naik, Lodha, & Cauraugh, 2010; Heasley, 2012; Hilton, Zhang, White, Klohr, & Constantino, 2011).

As a result of the CFA, the RMSEA and CFI indicated a good fit, while the WRMR exceeded the recommended threshold of 1.0 (Yu, 2002). However, the WRMR is seen as an experimental fit index (Muthen, 2014), so we gave more weight to the CFI and RMSEA in the evaluation. The χ² test was significant, indicating no exact model fit and the need to investigate potential reasons for misfit (Heene et al., 2011). Modification indices (MIs) were evaluated to offer possible explanations. Specifying correlated errors between some items assessing the use of mimicking to communicate affective states would improve model fit. This may indicate that the social communication factor (SC) is more complex, with at least a potential lower order factor of sharing affect. Moreover, specifying correlated errors between items with the same task (dancing) as a shared focus of observation would improve model fit. This may be due to shared method variance of different indicators. In addition, the MIs indicated improved model fit when allowing some motor items to load on the restrictive/repetitive behaviors factor (RBB); thus, the CFA approach with cross-loadings set to zero may be too restrictive. However, we decided not to specify these paths, in order to avoid over-fitting and to preserve the model’s parsimony for replication in a validation sample.

We found high factor correlations between .74 and .88. Thus, the constructs are not independent of each other. In other factor analytical studies, comparably high correlations between the two characteristic domains of SC and RBB were found (Frazier et al., 2014; Matson et al., 2008). In addition, these high correlations in our study are at least to some degree due to sample characteristics. Model fit was evaluated in a clinical sample of individuals with IDD and suspicion of ASD. Therefore, all individuals probably show higher symptom loads in social interaction deficits than a sample from the general population would. Furthermore, motor deficits are common in individuals with IDD, which may explain the high correlations between all factors. In addition, the non-specified cross-loadings can explain in part the high factor correlation between the RBB and the motor coordination factor.

Taken together, the CFA supports factorial validity of the slightly modified three-dimensional MUSAD draft version including 37 items. Analysis of the factorial validity of the verbal set of items (MUS2a) and the corresponding nonverbal set (MUS2b) will be performed in a follow-up study. Future studies may allow us to perform a nested factor analysis with a greater sample size to first investigate whether impairments in motor coordination may fit into the RBB domain on a higher hierarchical level, and second to evaluate the presence of lower order social communication domains.

Construct reliability of the three domains SC, RBB, and motor coordination was excellent, as indicated by McDonald’s omegas ≥ .92 (Cicchetti et al., 2011). Values of Cronbach’s alpha >.8 have been reported for the IDD-specific PDD-MRS (Kraijer & Bildt, 2005), the ASD-DA (Matson et al., 2008), and the DIBAS-R (Sappok, Gaul, et al., 2014).

Item difficulties of all 37 items included in the CFA showed high variability and ranged between .04 (particular interest for parts of objects) and .82 (reactive social smile). Varying difficulties are desirable in assessing a broad spectrum of autism-related behaviors and a divergent group of individuals with mild to profound IDD. In particular, items with high difficulties are probably useful in displaying differences between individuals with a high symptom load. Overall, we found high item-total correlations: only item MUS306 (particular interest for parts of objects) was below the threshold of .30 (Ferketich, 1991; Maltby, Day, & Macaskill, 2010). At the same time, this item is the most peripheral one with the highest difficulty. For reasons discussed above, we decided to keep this item under observation in further scale development.

To assess objectivity, interrater agreement was calculated in terms of ICC. Agreement between the two external raters (ICC = .71) as well as between three raters including the test developer (ICC = .67) is “good” according to the cutoffs recommended by Cicchetti and Prusoff (1983). This is indicative of sufficient objectivity in the test developer’s ratings. However, compared to the results of the ADOS pilot study (Lord et al., 2000), with more than 80% exact agreement among raters and across all modules, the MUSAD results leave room for improvement. Such improvements in interrater reliability could be achieved by rater training including consensus conferences, further item selection, revision of the 4-point coding descriptions, and a more stringent execution of the MUSAD.

In assessing the stability of the MUSAD over time, the mean value of four test-retest correlations (ICC = .69) was acceptable. The ADOS showed similar results, with an ICC of .82 for the SC domain and .56 for RRB (Lord et al., 2000).

Assessing convergent validity by calculating correlations between the MUSAD total scores and the SCQ, the PDD-MRS and the ADOS resulted, as expected, in significant positive correlations ranging from .32 to .83. Overall, these results support the contention that the MUSAD measures ASD symptoms. Looking more closely at the convergent scales, the highest correlations were found with ADOS modules 1 and 2 (r = .58 and .83, respectively). This may be due to the conceptual comparability of the MUSAD and ADOS in assessing situation-specific interactional behaviors, but due to small sample sizes, the data should be interpreted with caution. A correlation of r = .55 was found with the PDD-MRS, which is similar to the convergent validity of the ASD-DA (Matson et al., 2008) and the DIBAS-R (Sappok, Gaul, et al., 2014), both IDD-specific ASD screeners. The comparatively moderate agreement of the MUSAD with the SCQ sum score may be explained by the fact that the SCQ is designed to assess children and does not capture all aspects of an adult ASD/ID phenotype, or by reduced parallelism due to the postponement of MUSAD items assessing verbal communication.

In assessing discriminant validity, the low correlation with the MOAS supports the independence of the measured constructs. However, unexpectedly, we found a weak positive relationship between MUSAD and ABC scores. Interestingly, closer examination shows that two of five ABC domains cover ASD core features that are also covered by the MUSAD, i.e., lethargy/social withdrawal and stereotyped behavior, while another two cover additional ASD behavioral characteristics, i.e., irritability and hyperactivity/noncompliance. The ABC is a widely used measure in ASD treatment studies (Kaat, Lecavalier, & Aman, 2014), and has been used diagnostically to identify additional ASD in individuals with Down’s Syndrome (Ji, Capone, & Kaufmann, 2011). These strong associations between behaviors assessed with the ABC and ASD in individuals with IDD suggest that the ABC is less useful for assessing discriminant validity in further studies. Significant correlations between the MUSAD and ASD measures on the one hand and a negligible correlation with the MOAS on the other indicate construct validity of the MUSAD, i.e., the potential to detect the adult IDD phenotype of ASD.

To summarize, the MUSAD showed adequate psychometric properties in terms of reliability, objectivity and factorial validity. However, interrater reliability should be improved and tested in further scale development. Compared to other music-based approaches in diagnosing ASD, the development of the MUSAD allowed a comprehensive psychometric verification in this field for the first time with an adequate sample size. Wigram’s Harper House Music Therapy Assessment (1999) is documented by a single case study and is based on Bruscia’s Improvisation Assessment Profiles (IAP; Bruscia, 1987), which have not yet been validated at all. Oldfield’s Music Therapy Diagnostic Assessment (MTDA; Oldfield, 2004) did not allow a comprehensive test-theoretical verification due to a small sample size of N = 30. Both instruments were developed to assess children, and not adults, suspected of having ASD. Therefore, within music-based approaches to diagnosing ASD, the MUSAD is the first instrument that strives for adequate psychometric properties, and it is the first music-based approach to assess adults with IDD. Compared to the ADOS, the strength of the MUSAD concept is its slightly more flexible approach, which makes the situation more suitable and allows for assessment of individuals who are difficult to assess, such as those with severe to profound intellectual disability (Bergmann et al., 2015). Since further scale development intends to develop an algorithm, the MUSAD may become an additional valuable source of relevant diagnostic information and may improve the diagnostic process for ASD in adults with IDD, especially in those with limited language skills.

4.1. Limitations

The consecutive clinical ad-hoc sample of adults with suspected ASD investigated here resulted in a blurred separation of groups by assessing “borderline autistic” individuals and an unbalanced ratio of n = 50 ASD vs. n = 26 non-ASD. Due to the lack of a balanced and clearly separated control group of individuals with IDD only, the investigation of diagnostic validity, including ROC analysis and the calculation of a diagnostic algorithm, was postponed. In factor analysis a sample size of more than 100 is recommended (MacCallum, Widaman, Zhang, & Hong, 1999). The smaller sample size of N = 76 is explained by the monocentric study design in this early phase of test development and the high examination effort of behavioral observational instruments. Against the backdrop of only three given factors, the absence of estimation problems, excellent internal consistency, and fit indices indicating good model fit, the final CFA model seems to be sufficiently robust despite a sample size <100. Because the communication domain, with its module-dependent verbal item set and its non-verbal counterpart, was postponed, further investigations of an expanded factorial model will necessarily be based on new data.


5. Conclusion

The results of this study indicate that a music-based framework seems appropriate for assessing ASD symptomatology in adults with IDD. Conceptualized as a specific ASD observational instrument, the MUSAD proved useful in diagnosing even highly affected individuals with limited language skills. Indications of its objectivity, reliability, and factorial and construct validity were found. Study results will be verified by piloting the MUSAD in an IDD-, gender- and age-matched sample to assess diagnostic validity.

Conflicts of interest

This research was in part supported by Stiftung Irene, gemeinnützige Stiftung zum Wohle autistischer Menschen, Hamburg, Germany. The funding body had no influence on the design of the study, the writing of the manuscript, or the decision to submit the paper for publication.

Acknowledgements

We wish to thank Kai Reimers for data collection and Linda Westphal for scoring as an independent expert. Special thanks to our patients and their legal custodians for their participation in the study.

References