Attention and grouping

  • Presented as “The role of attention in the perception of grouping structure” at the International Conference of Students of Systematic Musicology (SysMus).
Figure 1


Existing models of the perception of musical structure mostly do not account for the fact that listeners’ hearings are known to vary substantially: the same passage can be interpreted differently by different listeners, or by the same listener at different times. Attention—the deliberate or unconscious focus a listener may place on a particular aspect of the music, such as its melody or rhythm—seems to play a role in the perception of structure, but whether it is an important cause of grouping preferences or the product of them is unclear. We study the influence that paying attention to musical features (including harmony, melody, rhythm and timbre) has on grouping decisions. The experiments use composed musical stimuli exhibiting changes in particular features by design; some stimuli exhibit a single change, while others exhibit changes in different features at different times, leading to ambiguous segment boundaries and groupings.

We first tested whether our subjects were able to correctly associate changes with musical features, to establish that their understanding of the stimuli was multidimensional and not purely holistic. Second, we tested whether an explicit instruction to focus on a feature increased the salience of boundaries marked by a change in that feature. Finally, we tested whether focusing on a feature would make groupings according to that feature preferable. To do so, we asked subjects to perform a distractor pattern-detection task that directed their attention to a particular feature. They then heard ambiguous stimuli, which had structure AAB and ABB with respect to two different features, and indicated their preferred grouping.

The results showed that listeners were skilled at identifying changes, that correctly directed attention boosted the salience of changes, and that focusing on a feature could indeed cause a listener to prefer one grouping over another. Whereas level of musical training greatly affected responses in the first two experiments, its effect was not significant in the third, suggesting that attention is a general mechanism guiding grouping preferences.


Structure in music involves both segmentation and grouping processes. For example, a listener may infer a boundary between a verse and a chorus of a pop song if it is marked by a change in melody (segmentation); and they may group all the verses together because they are (varied) repetitions of each other (grouping).

Models of grouping structure usually try to predict how music is segmented from the ground up: what are the shortest patterns that are perceived (e.g., gestures, motives), how are these arranged to build larger patterns, and so on up to the scale of the piece. However, in contrast to this “bottom-up” approach, “top-down” factors, which are associated with the listener rather than the music, are also recognised as important. A listener’s education and their familiarity with a given piece of music have been shown to affect the kinds of groupings they will perceive. But what else? Might the simple factor of what a listener is paying attention to play a role?


These are the questions we investigated: first, are listeners able to attend to different features within a piece of music? Second, does the salience of a change in music increase when one is focusing on the feature that changes? Third, does focusing on a feature make a listener more likely to group sections in accordance with how that feature changes?


In vision, the relationship between attention and perception has been studied very closely, thanks especially to eye-tracking technology. But comparable “ear-tracking” technology for music is not possible, so we cannot measure a listener’s attention as a dependent variable; instead, we must manipulate it as an independent one.


To address our questions, we needed a set of musical stimuli with specific kinds of musical changes in specific patterns. (In real music, there are usually many factors changing at once and their relative importance is tough to disentangle.) We composed a set of three musical “environments,” manipulating four features: harmony (as chord progressions), melody, rhythm and timbre. Each environment has two voices, and each voice has four possible patterns. By combining these parts together, we can create excerpts of music that exhibit any pattern of changes that we want. For example, here are three clips from different environments, each of which has structure AABB with respect to harmony, ABAB with respect to timbre, and ABBA with respect to rhythm.
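The combinatorial design described above can be sketched in a few lines of Python. This is only an illustration, not the software used to build the stimuli: the function names `build_excerpt` and `boundaries`, and the representation of each feature's pattern as a string of letters, are assumptions made for the example.

```python
def boundaries(pattern):
    """Return the segment indices at which the label changes.

    Index i means a boundary between segment i-1 and segment i (0-based).
    """
    return [i for i in range(1, len(pattern)) if pattern[i] != pattern[i - 1]]


def build_excerpt(feature_patterns):
    """Turn {feature: 'AABB', ...} into one label dictionary per segment.

    All patterns must describe the same number of segments.
    """
    lengths = {len(p) for p in feature_patterns.values()}
    if len(lengths) != 1:
        raise ValueError("all feature patterns must have the same length")
    n_segments = lengths.pop()
    return [
        {feature: pattern[i] for feature, pattern in feature_patterns.items()}
        for i in range(n_segments)
    ]


# The example from the text: AABB in harmony, ABAB in timbre, ABBA in rhythm.
excerpt_spec = {"harmony": "AABB", "timbre": "ABAB", "rhythm": "ABBA"}
segments = build_excerpt(excerpt_spec)
change_points = {f: boundaries(p) for f, p in excerpt_spec.items()}
```

In this representation, each feature marks its own set of boundaries within the same excerpt, which is what makes the segmentation ambiguous.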


Experiment 1: Change identification

Is listening “multi-dimensional”? That is, are listeners aware of the independence of musical features and able to track them individually? Participants hear stimuli with pattern AB (i.e., with one feature changing in the middle) and are asked which feature changed. We expect them to answer better than chance. (In light of previous studies this is not a controversial question, but the experiment is useful for assessing the skills of our participants and the comprehensibility of our composed stimuli.)

Experiment 2: Salience judgement

Are changes in music more salient when you are paying attention to the feature that changes? We ask participants to focus on a single feature while listening to a short clip, and ask them to rate the salience of the change; the change may or may not match their focus. We expect salience to increase when the focus matches.

Experiment 3: Pattern detection & grouping preference

Does focusing on a feature make a listener more likely to analyze a piece according to that feature? We direct each listener’s focus by presenting them with a pattern and asking them to determine whether the pattern occurs in a longer excerpt. We then ask them which grouping of the excerpt they prefer, and expect them to prefer the analysis that matches the focal feature.
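The hypothesis for this experiment can be stated as a small Python sketch. It assumes a three-segment ambiguous stimulus whose structure is AAB in one feature and ABB in another; the feature assignment in the example, and the function names `implied_grouping` and `predicted_preference`, are hypothetical and chosen only for illustration.

```python
def implied_grouping(pattern):
    """Group consecutive segments that share a label.

    'AAB' -> [[0, 1], [2]] and 'ABB' -> [[0], [1, 2]] (0-based segment indices).
    """
    if not pattern:
        return []
    groups = [[0]]
    for i in range(1, len(pattern)):
        if pattern[i] == pattern[i - 1]:
            groups[-1].append(i)   # same label: extend the current group
        else:
            groups.append([i])     # label changed: start a new group
    return groups


def predicted_preference(stimulus, focal_feature):
    """Grouping that the hypothesis predicts for a listener attending to focal_feature."""
    return implied_grouping(stimulus[focal_feature])


# Hypothetical ambiguous stimulus: the two features imply conflicting groupings.
stimulus = {"melody": "AAB", "rhythm": "ABB"}
prefers_if_melody = predicted_preference(stimulus, "melody")
prefers_if_rhythm = predicted_preference(stimulus, "rhythm")
```

If attention had no effect, a listener's choice between the two groupings should not depend on which feature the pattern-detection task directed them to.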



We obtained positive results for each experiment, and every difference was significant:

  • In Experiment no. 1, participants identified the correct changing feature at a rate of 86%, above the chance level of 25%.
  • In Experiment no. 2, participants rated changes as more salient when they were paying attention to the feature that changed (0.63 compared to 0.15 on a scale from 0 to 1).
  • In Experiment no. 3, participants preferred the grouping implied by the feature they were focused on 65% of the time, up from a chance level of 50%.
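The chance levels quoted above follow from the number of response options: four features in Experiment no. 1 and two groupings in Experiment no. 3. One common way to compare an observed rate against chance is a one-sided exact binomial test, sketched below with the standard library only. This is not necessarily the test used in the study, and because trial counts are not reported here, none are assumed; the function names are illustrative.

```python
from math import comb


def chance_level(n_options):
    """Probability of a correct response when guessing uniformly among n_options."""
    return 1.0 / n_options


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided exact p-value
    for observing at least k successes in n trials under chance rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


chance_exp1 = chance_level(4)  # four candidate features
chance_exp3 = chance_level(2)  # two candidate groupings
```

Given the actual number of trials `n` and of focus-consistent responses `k`, `binomial_tail(k, n, chance_exp3)` gives the probability of a result at least that extreme under pure guessing.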

We expected the results to be affected by participants’ level of musical training, and this was borne out by the data (see Figure below). In Experiment no. 1 (top left), greater musical training correlated with greater accuracy in identifying the feature that changed. In Experiment no. 2 (top right), the difference in salience between the two conditions (“match” and “wrong”) was larger when musical training was greater.

In Experiment no. 3, musical training again correlated with participants’ accuracy in detecting whether the pattern was present or not (bottom left). However, there was no correlation in Experiment no. 3 between musical training and how often participants preferred the grouping implied by the focal feature (bottom right).

Thus, while musical training improves one’s ability to articulate what kinds of changes occur in music, it does not change the fact that paying attention to a feature can lead one to group according to changes in that feature. This bolsters the view that attention is a fundamental mechanism that guides how listeners interpret grouping structure in music.