Convention and Representation in Music

  • Hannah H. Kim (Macalester)


In philosophy of music, formalists argue that pure instrumental music is unable to represent any content without the help of lyrics, titles, or dramatic context. In particular, they deny that music’s use of convention counts as a genuine case of representation because only intrinsic means of representing count and conventions are extrinsic to the sound structures making up music. In this paper, I argue that convention should count as a way for music to genuinely represent content for two reasons. First, the view that only intrinsic ways of representing count is too stringent. If use can ground meaning in language, then use might ground meaning (and representation) in music, too. Second, even if we were to insist on intrinsic features, convention should count as a way for music to genuinely represent because convention is an intrinsic feature of music. Without knowledge of musical systems and encultured listening, music wouldn’t even be recognized as music, so convention is already baked into our listening practices.

Keywords: music, representation, convention, language, aesthetics

How to Cite:

Kim, H. H., (2023) “Convention and Representation in Music”, Philosophers' Imprint 23: 5. doi:



Published on
25 Jul 2023
Peer Reviewed

I. Introduction

In the eighteenth century, “pure” instrumental music — music without text, title, program, or dramatic setting — became popular, and this uncoupling of music from words led to new questions regarding instrumental music’s nature and capacity. Eduard Hanslick famously argued that music “consists of tone successions, tone forms” and that “these have no content other than themselves”, thereby founding musical formalism (2018, 109). Arguing against the emotivists who found meaning and beauty in music’s ability to express or represent emotions, Hanslick argued that non-programmatic instrumental music is pure sonic structure with no inherent meaning: internally complete, and without reference to anything beyond itself. The beauty and uniqueness of music, he argued, have to do with the forms embodied by the music, from basic schemes outlining how melodies are presented to work-specific instances of orchestration, instrument arrangement, repetition, and variation.

Setting aside normative questions regarding the proper appreciation of music and the source of music’s value, I’d like to probe the descriptive question: is music capable of representing or otherwise conveying content? Insofar as music’s content is sonically moved forms, Hanslick answers that neither can instrumental music represent any other content, including emotional content, nor is it music’s aesthetic point to do so (Hanslick 2018, 16, 32, 42; Kivy 1989, 53). Roger Scruton (1997, 273) agrees, writing that music is “an abstract art, with no power to represent the world”. Nick Zangwill (2004) also agrees, though he emphasizes that music need not represent emotions to be experienced as meaningful.

Kivy (2009, 74), presenting what he calls enhanced formalism, concedes that some music can express emotions but argues that emotive properties of music are “perceptual, phenomenological properties, not semantic or representational ones”. Though Kivy’s brand of formalism is discussed as a “contour theory” that focuses on the parallel between musical features and typical expressions of emotions, Kivy also acknowledges the role convention plays in how music is seen as expressive. “If I have lavished more attention on ‘contour’ than on ‘convention’”, he writes, “it is not because I think the contribution of the latter to musical expressiveness is any the less” (1989, 133).

In this paper, I’ll be pushing Kivy’s line of thought to challenge the formalist argument that convention doesn’t count as a way for pure instrumental music to represent content. By ‘convention’, I mean practices or understandings that are artificial, invented, or optional, as opposed to natural, fundamental, and mandatory (Goodman 1989). This definition suffices to understand what’s at stake between the formalists and myself since it focuses on developed practices or associations that surround music qua structure of sounds.

The formalist stance on music, convention, and representation might be summarized as:1

  1. If medium X represents content, X uses only intrinsic features of X to do so.

  2. Music must use convention to represent content.

  3. Convention is not intrinsic to music.

    • C1. Music does not use only intrinsic features to represent content.

    • C2. Music does not represent content.

I’ll show that 1 is questionable and that 3 is false, concluding that convention should count as a way for music to genuinely represent content.

Formalists demand that music be “intrinsically meaningful”, focusing on what music can represent “by itself”. They say that a medium’s intrinsic features, independent of extrinsic features’ meddling, must be sufficient to trigger an understanding of what’s represented. Hanslick argues that music cannot represent content because there is no necessary link between tones and the ideas or emotions they represent. Associating cheerfulness with G major, he writes, is merely a “physiologic-psychological correlation”, which is only our interpretation, not a feature of the tone itself — so it doesn’t count as an instance of representation. Scruton, too, writes that a medium is capable of representation only if “unaided” or “unprompted” engagement with it can result in comprehension (1997, 281). He uses this criterion to deny music’s representational capacity, arguing that a listener couldn’t grasp any content without the title, lyrics, or dramatic context providing hints. Kivy also writes that the issue is about what music itself, when listened to as pure sonic form, is capable of (1989, 223).

But 1 is questionable, because the view that only intrinsic features can lead to genuine representation is too stringent a requirement. Not even language would pass the test! Pragmatics and ordinary language philosophy show how use can ground or generate meaning. Instead of working in fundamentally different ways, music and language might be on a kind of spectrum on how meaning gets conveyed through sound. The existence of this kind of continuum between speech and music isn’t necessarily problematic (Hamilton 2007), and I aim to support this by highlighting the role extrinsic features play for both language and music. Music, like language, can represent content by being put to a particular use. Music can be “about” things and “tell” simple stories about, say, resolution after a struggle, even without words, dramatic settings, titles, or other accompanying details.

Since conventions are practices or understandings, a formalist interested only in music’s sonic structure would consider them extrinsic to music. However, 3 is false because conventions play an ineliminable role in making music and endowing it with the kinds of structural qualities formalists care about. Empirical studies show that narrative listening — hearing “stories” or plot-like content in a passage of music — is rampant, and the musical medium is able to “carry” content in virtue of the ways in which we’ve learned to listen to certain music. Factors surrounding our listening practices, such as culture (awareness of musical systems) and conventions, are not external influences that taint our pure enjoyment of music. Without conventions, we wouldn’t even be able to properly hear and recognize aspects of the sonic structure of music. Encultured listening plays a necessary role in making sense of musical meaning; insisting that only what music can “intrinsically” represent counts as genuine representation is to artificially limit sources of meaning not only for music, but also for language.

Music’s representational capacity matters because it limits the scope of what musical formalism can be arguing. Formalist claims about what music can and can’t do should be understood normatively. Instead of a descriptive view detailing what music is and isn’t capable of, formalism should be taken as a prescriptive view regarding the proper appreciation of music or what music ought to do to be meritorious. Alternatively, formalism should only be taken to apply to a small subset of Western instrumental music, namely, absolute music from the eighteenth and nineteenth centuries which explicitly rejects representation. If any music can come to represent content, then less and less purely instrumental music can be safely described as absolute. This is a significant result, since Kivy (1989, 235) maintains that only a small amount of instrumental and vocal music in the Western tradition has representational, narrative, semantic, or other extra‐musical content, leaving the majority of Western music “absolute”.

In the next section, I’ll argue that philosophers comparing music to language have been implicitly assuming a compositional semantic model of language — the view that language units like phrases and sentences become meaningful due to their constituent word meanings adding up. Challenging this assumption makes it easier to see how language and music both rely on use and convention. In section III, I’ll argue against the formalist claim that conventions are extrinsic to music by showing just how inextricable they are to hearing music as music and noticing the kinds of features that formalists value. Section IV will provide examples of conventions in music and defend convention-driven musical representation against formalists’ worries. I’ll then conclude in section V with a comparison to literary formalism and upshots of music’s newly acknowledged representational capacity.

II. An Assumption About Language

Let me first clarify what I mean when I say music can represent content. Philosophers continue to debate just what representation is, but for my purposes, I use the word ‘represent’ in the basic, ordinary, intuitive sense: Leonardo da Vinci’s painting Mona Lisa represents a woman, a green light at an intersection represents permission to proceed, and ‘it is snowing outside’ represents a cold, snowy weather condition. Anyone who sees the Mona Lisa can see its represented content, and anyone who speaks English can understand the meaning of the weather utterance.

I argue that some music is such that anyone can make out the represented content (given the right enculturation, as we’ll see below). The content might be simple, and sophisticated knowledge of music might help, but the point is that music can represent content in ways that language does. If you don’t like to say that language represents content — perhaps because language just has content, or communicates content, or conveys content — then feel free to pick your preferred verb and apply it to music. The argument is that whatever we think language is doing, music can do, too, through similar mechanisms, though its representational capacity might be less fine-grained than language’s is.

Many philosophers, including Kivy (1989, 228–229), Scruton (1997, 286), Susanne Langer (1957, 232, 240), and Ann Clark (1982, 198) draw an analogy between music and language but conclude that music is not a kind of language because it lacks literal meanings. According to them, music is not a language because it lacks dictionary meaning or an assigned connotation with fixed import. In psychology of music, a foundational axiom is that music is an abstract stimulus, not “mired in referential content like language” (Besson and Friederici 1998, 5). Hanslick, too, notes the affinity between music and language — we speak of ‘thoughts’ in musical works and call a rationally closed group of tones a ‘phrase’ — but concludes that music and language are fundamentally different because sound is only a means for an end in language whereas in music, sound is itself an end (2018, 60).

However, there’s an implicit understanding of language that is being taken for granted here. When philosophers deny that music is a kind of language, they assume a compositional model of semantics: the view that words, phrases, or sentences gain their meaning in virtue of the “dictionary meaning” of each word and the way word parts fit together. In this model, a sentence is like a structure with many Lego pieces (words) and it is the “literal meaning” of the constituent words that gives rise to the overall “meaning” of the entire structure (the sentence). The compositional model assumes the smaller parts to compose the larger meaning, and this view of language explains why language might be representative in ways that music cannot be; words have fixed meanings “in themselves” whereas tones do not have direct connections to any particular meanings “in themselves”.

However, this model of semantics is far from uncontroversial, and notably, proponents of ordinary language philosophy challenge the idea that words, let alone sentences, have, and build on, “fixed” or “literal” meanings.2 In Philosophical Investigations, Ludwig Wittgenstein shows how patterns of our lives contribute to our understanding of linguistic meaning.3 Depending on the context, ‘slab’ can refer to a flat piece of stone or convey a request for the flat piece of stone. In philosophy of language, pragmatics asks how and why the communicated content of our utterances sometimes goes beyond the literal meaning of words used. The meaning of words, context, and speaker intention all contribute to the meaning of an utterance, and according to ordinary language philosophy, use “grounds” our language meanings in the sense that it creates the stability on which our linguistic meaning can rest. As Toril Moi elaborates in Revolution of the Ordinary, we discover what words mean by looking at their use. Because use is systematic, public, and shared, we can create a map, an outline, of the “grammar” of the relevant words, expressions, or language-games. In this exercise, world and words are intertwined. To understand the meaning of our utterances, it is necessary to understand the situations, habits, and needs that give rise to them. We know the meaning of an utterance not because we all grasp some independently existing meaning that is associated with the words making up the utterance, but because we’re sensitive to the ways in which particular words are used in particular contexts.

If convention can be understood as a practice that arises through shared understandings and repetition, it seems that music, too, can be used in the way language is used. Our composing, performing, and listening practices would count as the “use” of music, and our reactions to them, whether cognitive, emotional, or somatic, might constitute what the music “represents”. If a particular melody is played every time there is cause for celebration, the next time the melody is played, we might find its rhythm to represent “celebration” (e.g., the Happy Birthday melody). If a musical passage doesn’t resolve back to the tonic, we might take its melody to represent incompleteness given musical conventions. Philosophers argue that music is not a kind of language, because it lacks assigned connotations and fixed import — but we might think that language lacks these things, too! If so, it is unclear what the boundary between music and language really is. Music has a “grammar” of its own, its own syntax that renders notations well-formed (or ill-formed), and its “semantics”, like language, might be extracted from use.

To be clear, the argument isn’t that music is a kind of language.4 Of course music’s representative capacity isn’t as specific as language’s is, and in language, the meaning of single words can be fixed (such that dictionaries exist and are routinely helpful).5 Music, on the other hand, has no “words” (the smallest unit of meaning), and we don’t have a principled way of telling apart musical “words” from musical “sentences” (Kania 2020, 114, fn32; Clark 1982, 201). I’m happy to concede this difference between language and music, but music’s lack of discernible words doesn’t preclude its “phrases” from being meaningful or representational.

We might also say that music and language simply have different ways of representing, music perhaps admitting more layers (e.g., rhythm, timbre, pitch) while lacking anything like lexical boundaries. Perhaps music need not be like language in terms of having discernible word-like units; perhaps music and language are two media along the same spectrum where sound and symbols can represent something beyond themselves. Many content-representing mechanisms that we attribute to language (e.g., convention) are also found in music. Though music has no words, its keys can convey mood (minor keys conventionally convey sadder moods than major keys do),6 and hearing the chromatic scale brings to mind something ascending or descending. Garry Hagberg (2021, 369) writes that “semantic content is not hermetically sealed within the linguistic” and that music can awaken “constellations of connotations”, which, once mobilized, “can carry semantic weight”. Semantic content can emerge out of the process of reflective engagement. We simply don’t experience music as a pure structure;7 music immediately recommends certain contents to our minds while we’re listening. We can’t help but hear the loaded “meaning” in music, like languages we know.

Music, like language, must be learned, and both can be experienced meaningfully (Kania 2020, 113). Athanasopoulos et al. (2021) found that harmonic style is sufficient to effect a music’s seeming expressivity in both Western and non-Western music. Listeners tend to understand and agree on what emotive terms would apply to musical passages (or at least agree on a reasonable range of descriptions); nobody would describe the opening of Beethoven’s Fifth Symphony as peaceful and idyllic. Margulis et al.’s (2019) study found that participants attributed consistent narrative content to wordless music. Study participants — both educated and non-educated, Western and Eastern, urban and rural — found “stories” in the musical excerpts. Margulis et al. conclude that “narrative listening is a fundamental mode of engaging with music, with enculturation determining which specific musical patterns seem to tell stories” (2019, 6).

Text setting (writing lyrics to already-existing melodies) points to a shared sense of what kinds of content or meaning cohere with what kind of music. The practice of associating texts to already-existing music would be futile (or arbitrary and therefore aesthetically ineffective) if there weren’t a shared sense of how a particular passage might be described. Composers must have a “general agreement” on appropriate music-wording pairings, and competent listeners must have a general sense of whether the composers have gotten them “right” or not.

In this section, I’ve argued that music’s representative capacity is easier to acknowledge once we remove a barrier that clouds its similarity to language. Moving away from the compositional model of semantics shows how use and convention shape linguistic meaning, and if we think ordinary language philosophers are right to point to the close connection between language and forms of life, then we can point to conventions surrounding music, too, as a source of musical meaning and representation. In the next section, I’ll show why insisting on intrinsic representation is too stringent a requirement for something to genuinely represent.

III. Questioning the Intrinsic/Extrinsic Divide

Formalists acknowledge that sometimes music seems meaningful, but they dismiss its genuine ability to represent content. As we saw, they demand music be “intrinsically meaningful”, focusing on what music can represent “by itself” such that “unaided” listeners can grasp its meaning. While formalists admit that musical passages can be correlated with emotions or ideas (à la Hanslick and Kivy), they maintain that whatever we perceive can’t be a feature of the tones themselves, and therefore fails to qualify as a genuine instance of representation.

In the same vein, formalists might accept that conventions can endow meaning to music without that counting as an instance of musical representation. Like texts or accompanying dramatic context, conventions are considered extrinsic to music — so, the argument goes, those meanings or content associated with music are not internal or structural to the music itself. Music must be able to convey content “on its own” for it to count as a genuine instance of representation.

Though I’ll eventually argue that convention’s deep connection to music makes it more plausible that convention is baked into — that is, intrinsic to — any music, I first want to argue that this formalist demand that music represent things “on its own” is too strong. The question I want to press is this: what does it mean for a tone or a rhythm to represent something on its own? Does art in any medium do this?8 Would we ask of a poem or a painting whether it can represent anything “on its own”? At some level, the question seems silly — insofar as a poem relies on lexical semantics, audible qualities of the text, visual formatting of the text, and the history of its particular form, it is clear that the poem doesn’t represent anything “in itself” or “on its own”. Understanding the content or meaning of the poem requires linguistic understanding, historical awareness, and psychological associations around sound, shape, and affect. Why do we think music is so unlike poetry as to be able to represent anything “on its own”? Not even language would pass this test, but we consider language capable of representing content. So we shouldn’t require music to be able to represent anything “on its own” in order to acknowledge its genuine representative capacity. Conventions and use surrounding music should count as legitimate ways to represent content.

Jason Leddington (2021) pushes back on the idea that a medium is capable of representation only if an “unaided” engagement with the work results in comprehension, arguing that whether something is aided or unaided is audience-relative (356). I think this is right, but since Leddington discusses this idea in the context of pictorial representation, I’ll provide examples in the next section that show that whether a musical engagement is aided or unaided is also an audience-relative matter. Properly hearing music and recognizing its elements require familiarity with the conventions that surround a given musical system. Conventions, then, are intrinsic to music in an important sense. Without convention, we wouldn’t even recognize music as music, let alone hear the kinds of structural qualities that formalists consider central to our music appreciation.

IV. Convention in Music

Let me ground us in the phenomena by providing two musical examples with conventions at play. Bach’s Chromatic Fantasia and Fugue in D Minor features a Picardy third as the minor key is resolved with a major chord at the end of the first section. The technical aspects of the first theme are impressive, but given the unexpected major chord in a minor key, we also can’t help but hear a frantic struggle coming to a positive end. This is a simple narrative that we can discern from the melody, and the music is capable of representing franticness with rapid chromatics and unexpected resolution with a Picardy third.

It is also conventional for classical music to repeat many of its melodic phrases. In fact, ethnomusicologists consider repetition among the few universally shared musical characteristics in the world. Sometimes, musical compositions take advantage of this expected repetition to convey a story-like progression. In Mozart’s Fantasia in D Minor, a melody led by leisurely triplets is followed by a quarterly eighth-note based melody, then staccatos, a reverting to the eighth-notes, then finally a loud chromatic outburst. The transition back and forth between the quarterly eighth-notes and the staccatos before reaching the chromatics without ever returning to the original triplets helps us “read” from the rhythmic patterns. What’s represented is a development of an unstable situation and an unsuccessful attempt to control the situation, whether it be an inner emotional life or an external event surrounding a community.

The argument isn’t that hearing representations is necessary for the enjoyment of Bach or Mozart, only that these pieces can represent, say, an emotionally charged day of an anxious person. The suggestion is that just as conventions can manage to convey content in the case of language, an artful exploitation of them can manage to convey content in the case of music, too.

We are happy to allow enculturation and usage to render language meaningful. Nouns like ‘paywall’ and ‘selfie’ were newly coined from established words to capture new facets of modern life. ‘Zoom’ used to be a verb meaning to visually magnify, but common usage has changed the meaning of the word to now include “to meet virtually”. These examples illustrate just how routinely convention introduces new content to a word. In the same way, existing musical elements can gain new meaning through the ways in which they’re used. We’ve seen how Mozart uses conventional repetition to convey a narrative, and how Bach uses associations between major/minor keys and mood to suggest an unexpected happy ending with the Picardy third. New musical practices, such as EDM drops, can take hold as established practices and come to routinely represent a new level of intensity.

In The Corded Shell, Kivy devotes a whole chapter to the role convention plays in helping music express content. Being a competent listener, Kivy writes, requires knowledge of conventions and associations in one’s musical culture. Making emotive distinctions in music is, in part, learning how to hear what one’s culture conditions one to hear. We find descriptions of music more or less appropriate depending on our recognition of musical “tags”, associations between music and certain content that have accrued over hundreds of years. For instance, when we hear a passage with descending pitch, we visualize something “coming down” (say, someone sliding down the stairs), and a brass fanfare is likely to bring to mind something royal or official.

Despite admitting conventions’ role in musical expressiveness, Kivy curiously only mentions two modes of analyzing or relating to music, writing that either “description of music can be respectable, ‘scientific’ analysis, at the familiar cost of losing all humanistic connections; or it lapses into its familiar emotive stance at the cost of becoming, according to the musically learned, meaningless subjective maundering” (1989, 9). But the received distinction between dry technical description and idiosyncratic subjective reaction doesn’t exhaust the ways in which we engage with music. There are intersubjective grounds on which we can talk about music that are grounded on objective facts about form and psychology. There is room between the scientific and the subjective.

Analysis relying on convention and enculturation fills the gap: an intersubjective, socially derived interpretation of music that relies neither on “objective” descriptions (e.g., wavelengths) nor idiosyncratic “subjective” descriptions (e.g., personal memories). A particular music being open to interpretation doesn’t condemn it to being “hopelessly relativistic” or “merely whimsical” (Bicknell 2002, 160). Convention and enculturation provide the grounds on which intersubjective agreement can be found. Though there are ongoing discussions about how exactly convention confers meaning, what is important for the purpose of our discussion about conventions is that they are widely shared and understood among the culturally competent, and that they produce reliably similar content detection in wordless music.

Conventions and encultured listening practices abound. Sometimes, historical contingencies explain the musical associations we make. For example, the snare drum evokes the military, given cross-cultural use of drums in war. At other times, conventions seem to track some intuitive psychological connection between sound and meaning: syncopation feels more informal than formal; passages with chromatic mediants bring to mind words like “wonder” and “awe”; and the open, slowly changing harmonies from Aaron Copland have been said to conjure up the notion of “America”. Harmonic, melodic, and rhythmic features determine what is represented in music, and it is convention and familiarity with cultural associations that link those musical elements to extra-musical content.

Kivy (1989) argues that recognizing music as music requires using convention from the get-go. In the context of mereology, C. S. Sutton (2012) also argues that extrinsic factors like intention can determine and ground object kinds. A mere lump of clay, when paired with appropriate extrinsic features such as intentions or conventions around statues, gives rise to a new object: a statue. We might apply this insight to music, too: mere noise becomes music only when extrinsic features (intentions or conventions) enter the picture. This means that without convention, there would not be music at all, and any discussion of what music is capable of “on its own” “without convention” would be a non-starter, since music’s very existence is grounded in culturally recognized conventions.

Recognizing musical elements and interpreting those elements crucially rely on one’s cultured experience of music, including knowledge of conventions surrounding a given musical system.9 An anecdote might help illustrate the point.10 During a charity concert in the U.S., the crowd applauded as renowned sitar player Ravi Shankar finished tuning his instrument. To this, he replied, “Thank you. If you appreciate the tuning so much, I hope you’ll enjoy the playing more”. There is a recording of this incident on YouTube, and having seen it, I can understand why Shankar’s tuning and warm-up might have sounded like a mini performance. He plays what sounds like arpeggios, and the audience might have simply thought that Northern Indian music features short compositions. But the point I want to illustrate is that without the relevant knowledge surrounding Hindustani classical music, or at least being exposed to a musical style less familiar to the average American audience, the music appreciators weren’t even capable of correctly discerning where the tuning ended and where the music began. Without the requisite musical familiarity, the listeners couldn’t even distinguish music from mere noise.

Bicknell (2002) thinks there’s something about the form of the music that affords intersubjective agreement, but the ability to make out music’s form itself requires familiarity with a given musical system. So it must be something prior to form — something like convention — that leads to our shared intersubjective experience of music. Listeners tend to attribute more tension to musical systems they aren’t familiar with, and listeners from different cultural backgrounds use different adjectives to describe the same musical excerpts (Margulis et al. 2019). For instance, atonal music from Anton Webern was described with words like “horror”, “murder”, and “paranoia” by U.S. study participants, whereas Chinese participants described the music with words like “happy”, “playful”, and “friends” (Margulis et al. 2019). U.S. listeners focused on the atonality because most Western music is tonal, whereas Chinese listeners, without the presumed tonal framework, were able to focus on other aspects of the music, such as staccatos. Similarly, in a study involving U.K. and “unwesternized” Pakistani ethnic groups, Athanasopoulos et al. (2021) found that non-Western listeners didn’t rate major and minor keys as significantly different from each other and didn’t hear their different emotional valences. These findings show that recognizing and interpreting aspects of a given music rely on familiarity with rules and conventions that make up a musical culture.

This also challenges the idea that musical appreciation can be aided or unaided in general. What we notice in music is determined more by cultural tradition than by the inherent qualities of the music, and what passages we find content-laden — and how we determine those contents — are also sensitive to enculturation. There’s even evidence for bimusicalism, a familiarity with two musical systems which allows listeners to pick out musical elements with more sensitivity and recall passages with more ease.

Our listening experience is pre-structured by the categories and schemas developed from prior listening experiences (Polite 2017, 99). Without conventions, music would be mere noise. What organizes and structures noises as music are the conventional familiarities that we have learned to apply to them. It would be one thing to argue that judgments of music’s aesthetic merit ought to consider only the intrinsic features of the music. But when it comes to representative capacity, extrinsic features ought to count as well, since “music alone” can’t even exist independently, let alone have its elements perceived. We can’t talk about formal features of music without also talking about the means of interpreting them, and to talk about interpreting music is to bring in conventions.

The general spirit of my point is that we should think about the way we engage with art instead of focusing on the art object alone. It seems that formalists, in their desire for rigorous research on music, swing too far in the other direction and foreground the art object, leaving behind the subject who is experiencing the object. But music is something we create, consume, and contemplate, and not something that is enjoyed in the abstract apart from its performative context.11 As such, when we theorize about music, a central notion should be the ways in which we engage with it, instead of talking about works as if they were objects in a vacuum that we appreciate.

It is, of course, important to theorize about art in a way that goes beyond individual idiosyncrasies. But neglecting the perceiving subject might have led to underappreciating the practice-based meaning-making we naturally fall into when listening to music, including the role convention and culture play in rendering music representationally capable. There is no good reason to reject conventions as something external to the way music works, especially since recognizing music as music, as discussed, relies on conventions to begin with.

V. Conclusion

So far, I have argued that the musical medium, like the linguistic medium, can represent content by convention. In both music and language, convention isn’t just an idiosyncratic external imposition. How does this relate to what the formalists believe?

Formalists would be happy to admit that when it comes to program music, awareness of representational features and content might be needed for a full appreciation of the work (Leddington 2021, 354). But I’m arguing that even non-programmatic music can represent content by way of convention. Formalists aligned with Kivy might also happily admit that music has expressive properties that are intrinsic to it. But I’m pushing for more. Putting aside the related claim that there might not even be a clear distinction between expressivity and representation, I’m arguing that music’s intrinsic features can represent content in addition to expressing it.

If conventions manage to endow music with representative capacity, then Hanslick and his defenders are wrong to argue that pure instrumental music can’t represent any content. They may argue that music is best enjoyed if appreciated apart from its representational capacity, but strictly speaking, they’d be in error if they aim to argue that pure instrumental music literally lacks the capacity to represent anything. Formalism can at best be interpreted as a normative claim concerning what music should do, and not as a descriptive claim concerning what music can and cannot do. Perhaps the formalist claim should be understood as the thesis that music qua art need not represent content. This is consistent with the claim that a representational understanding of music isn’t necessary for appreciating a work of music (Scruton 1997, 250–252). To support this conclusion, let me draw a parallel between musical formalism and literary formalism.

Like musical formalism, literary formalism argues that literary criticism is about details of the language itself, not character or plot.12 Critics were to establish the formal patterns of the work as the merit-raising feature, focusing on the language instead of characters’ psychologies or events within the work (Knights 1933, 28, 50n, 64). What was intellectually and aesthetically interesting about literature was not the content of the story but the form of its telling. Emotional responses to the work were amateurish, a distraction at best, and a genuine appreciation of literature required a “necessary aloofness from a work of art” (Knights 1933, 28). Moi argues that literary formalism, at least as initially articulated by L. C. Knights in his inaugural “How Many Children Had Lady Macbeth?”, had a specific aesthetic and professional agenda, aiming to redirect criticism in a direction that valorized modernist literature (Anderson, Felski, and Moi 2019, 4).

Musical formalism, too, is about redirecting critics’ and listeners’ attention to features that render pure instrumental music valuable. We can say that music’s unique beauty or value doesn’t depend on representation or narrativization, but it is false to say that pure instrumental music is unable to represent content or tell simple stories without the introduction of irrelevant, idiosyncratic, and subjective impositions. The ‘can’t’ in the formalist slogan “music can’t represent content” isn’t about ability, but about fittingness; it has the same force as the ‘can’t’ in “You can’t swim in this pool without a swimming suit”. Musical formalism, like literary formalism, should only be applied normatively. As a view that highlights noteworthy aspects of a work, formalism recommends certain ways of engaging with a work.

Since absolute music is defined as music without representational, narrative, semantic, or other extra-musical content, the conclusion to be drawn isn’t that absolute music isn’t so absolute, but that pure instrumental music — music without text, title, program, or dramatic setting — is not always absolute music. In a similar vein, Gregory Karl and Jenefer Robinson (2015) have argued that most music lies somewhere between “absolute music” and “program music”. If any music can come to represent content via convention or some other means, then less and less purely instrumental music can be safely described as absolute — a significant result since Kivy claims most pure instrumental music of the Western canon is absolute.

Of course, the possibility of possessing content does not mean that particular musical works do in fact possess it. The point is not that pure instrumental music always possesses content, but that some of it can and does represent content. The formalist project, then, might be understood as the exploration of instrumental musical art that resists such representational interpretations. In this case, I have provided a principled way to make this distinction between two bodies of instrumental musical art: those that invite representational listening through convention and other means, and those that don’t.


  1. Thanks to an anonymous referee for help with this formulation.
  2. Jeanette Bicknell (2003, 189) makes a similar methodological point, arguing that if we’re going to compare music to language, we ought to “cast as wide a net as possible” to “examine how language users actually communicate”. Constantijn Koopman and Stephen Davies (2001, 261) mention in passing that “ordinary language allows for a more generous use of ‘meaning’”. See also Chung (2019), who analyzes musical performances as Austinian performative utterances.
  3. See for instance PI, section I, parts 1–20.
  4. Stephen Davies (1994) argues that it’s not useful to compare music to language in respect to its meaning and that doing so might be more misleading than illuminating.
  5. Thanks to an anonymous referee for this point.
  6. Philip Ball (2010) argues that the connection between minor key and sad mood “might simply be a matter of cultural convention rather than an innate property of the music” given that “some cultures, such as Spanish and Slavic, use minor keys for happy music, and some cultures — including Europe before the Renaissance, not to mention the ancient Greeks — don’t link minor keys to sadness”. George Athanasopoulos et al. (2021) and Elizabeth Margulis et al. (2019), to be discussed more shortly, also suggest that tone-mood connections might be culture-specific given that Pakistani and Chinese study participants didn’t connect atonality or minor key to negative emotions.
  7. Bergeron and Lopes (2008) give yet another reason to believe this by showing how visual information shapes our sense of music’s expressiveness.
  8. Thanks to Ted Gracyk for this point.
  9. This isn’t a new point, but I think it hasn’t been properly applied to music’s representational capacity. Richard Kuhns (1978, 121) writes that only the capable listener picks up on objects, feelings, events, relationships, and other musics being referred to and represented by a piece of music. Bicknell (2003, 187) compares musical literacy to verbal literacy. Koopman and Davies (2001, 265) write that coming to experience music as meaningful “results from a (largely unconscious) learning process in which we become acquainted with the conventions of the musical tradition to which it belongs”. Hagberg (2021, 369) compares music appreciation to chess, arguing that, like a checkmate, music’s seeming meaningfulness can’t be isolated from shared cultural frameworks. Bence Nanay (2021, 351) discusses the role of fluency in our enjoyment of music, which requires being embedded in a particular musical practice.
  10. Thanks to Brandon Polite for telling me about this incident.
  11. Bicknell (2002, 254) highlights the importance of taking people’s listening experiences (i.e., the “uses” of music) seriously, suggesting that we ought to take people’s representational, meaning-bearing, and even semantic experiences of music seriously, especially if such reports come from performers, composers, musicologists, and philosophers.
  12. L. C. Knights’ “How Many Children Had Lady Macbeth?” is to literary formalism what Hanslick’s On the Musically Beautiful is to musical formalism.

Works Cited

Anderson Amanda, Felski Rita, and Moi Toril. (2019). Character: Three Inquiries in Literary Studies. University of Chicago Press.

Athanasopoulos George, Eerola Tuomas, Lahdelma Imre, Kaliakatsos-Papakostas Maximos. (2021). Harmonic Organisation Conveys Both Universal and Culture-Specific Cues for Emotional Expression in Music. PLOS ONE 17(2): e0244964.

Ball Philip. (2010). Does a Minor Key Give Everyone the Blues? Nature.

Bergeron Vincent and Lopes Dominic McIver. (2008). Hearing and Seeing Musical Expression. Philosophy and Phenomenological Research 78(1): 1−16.

Besson Mireille and Friederici Angela D. (1998). Language and Music: A Comparative View. Music Perception 16: 1–9.

Bicknell Jeanette. (2002). Can Music Convey Semantic Content? A Kantian Approach. The Journal of Aesthetics and Art Criticism 60(3): 253−261.

Bicknell Jeanette. (2003). The Problem of Reference in Musical Quotation: A Phenomenological Approach. The Journal of Aesthetics and Art Criticism 59(2): 185−191.

Chung Andrew J. (2019). What is Musical Meaning? Music Theory Online 25(1).

Clark Anne. (1982). Is Music a Language? The Journal of Aesthetics and Art Criticism 41(2): 195−204.

Davies Stephen. (1994). Musical Meaning and Expression. Cornell University Press.

Goodman Nelson. (1989). “Just the Facts, Ma’am!”. In Relativism: Interpretation and Confrontation, Michael Krausz (ed.), Notre Dame: University of Notre Dame Press.

Hagberg Garry L. (2021). Kivy’s Mystery: Absolute Music and What the Formalist Can (or Could) Hear. The Journal of Aesthetics and Art Criticism 79(3): 366−376.

Hamilton Andy. (2007). Music and the Aural Arts. British Journal of Aesthetics 47: 46−63.

Hanslick Eduard. (2018). On the Musically Beautiful, Lee Rothfarb and Christoph Landerer (trans.). Oxford University Press.

Kania Andrew. (2020). Philosophy of Western Music: A Contemporary Introduction. Routledge.

Karl Gregory and Robinson Jenefer. (2015). Yet Again, ‘Between Absolute and Programme Music’. British Journal of Aesthetics 55: 19−37.

Kivy Peter. (1989). Sound Sentiment: An Essay on the Musical Emotions (Including the Complete Text of The Corded Shell). Temple University Press.

Kivy Peter. (2009). Antithetical Arts: On the Ancient Quarrel Between Literature and Music. Oxford: Oxford University Press.

Knights L. C. (1933). How Many Children Had Lady Macbeth? An Essay in the Theory and Practice of Shakespeare Criticism. Cambridge. Reprinted in Explorations, New York University Press, 1964, 15−54.

Koopman Constantijn and Davies Stephen. (2001). Musical Meaning in a Broader Perspective. Journal of Aesthetics and Art Criticism 59(3): 261−273.

Kuhns Richard. (1978). Music as a Representational Art. British Journal of Aesthetics 18(2): 120−125.

Langer Susanne K. (1957). Philosophy in a New Key: A Study in the Symbolism of Reason, Rite, and Art. Harvard University Press.

Leddington Jason P. (2021). Sonic Pictures. Journal of Aesthetics and Art Criticism 79(3): 354−365.

Margulis Elizabeth, Wong Patrick C. M., Simchy-Gross Rhimmon, and McAuley J. Devin. (2019). What the Music Said: Narrative Listening Across Cultures. Palgrave Communications 5: 146.

Moi Toril. (2017). Revolutionary of the Ordinary: Literary Studies after Wittgenstein, Austin, and Cavell. University of Chicago Press.

Nanay Bence. (2021). Looking for Profundity (in All the Wrong Places). Journal of Aesthetics and Art Criticism 79(3): 344−353.

Polite Brandon. (2017). Prelude to a Theory of Musical Representation. Revista Música 17: 89−108.

Scruton Roger. (1997). The Aesthetics of Music. Clarendon Press.

Sutton C. S. (2012). Colocated Objects, Tally-Ho: A Solution to the Grounding Problem. Mind 121(483): 703−730.

Wittgenstein Ludwig. (2009). Philosophical Investigations, G. E. M. Anscombe, P. M. S. Hacker, and Joachim Schulte (trans.). Wiley-Blackwell.

Zangwill Nick. (2004). Against Emotion: Hanslick was Right about Music. British Journal of Aesthetics 44: 29−43.