Article
Author: Emar Maier (University of Groningen)
I argue that emojis are essentially little pictures, rather than words, gestures, expressives, or diagrams. [emoji] means that the world looks like that, from some viewpoint. I flesh out a pictorial semantics in terms of geometric projection with abstraction and stylization. Since such a semantics delivers only very minimal contents, I add an account of pragmatic enrichment, driven by coherence and non-literal interpretation. The apparent semantic distinction between emojis depicting entities (like [entity emoji]) and those depicting facial expressions (like [face emoji]) I analyze as a difference between truth-conditional and use-conditional pictorial content: [entity emoji] depicts what the world of evaluation looks like, while [face emoji] depicts what the utterance context looks like.
How to Cite: Maier, E. (2023) “Emojis as Pictures”, Ergo an Open Access Journal of Philosophy. 10(0). doi: https://doi.org/10.3998/ergo.4641
Wittgenstein is sometimes (half-)jokingly credited with the invention of emojis, on the basis of the following excerpt from his Lectures on Aesthetics:1
If I were a good draughtsman, I could convey an innumerable number of expressions by four strokes.
Such words as “pompous” and “stately” could be expressed by faces. Doing this, our descriptions would be much more flexible and various than they are expressed by adjectives. […] I could instead use gesture or […] dancing. In fact, if we want to be exact, we do use gesture or facial expression. (Wittgenstein 1966)
The suggestion is that drawing stylized faces would be a useful addition to written language, as it would provide an efficient way to express certain meanings,2 especially those kinds of meanings that are usually conveyed by gesture or facial expression, or else, less efficiently, or less precisely, by evaluative adjectives. In the context of other famous remarks like “the human body is the best picture of the human soul” (Wittgenstein 1958: 178), this could be taken as suggesting the view that such face drawings, like gestures and facial expressions, are expressives, that is, meaningful signs that do not directly contribute to truth conditions, but rather express something non-propositional, like the speaker’s emotional state.
There is no doubt that some modern emojis are used roughly as Wittgenstein envisages for his face sketches. Not surprisingly, some of the points Wittgenstein makes are echoed in recent linguistic analyses of emojis. In particular, we see suggestions that emojis are like gestures (Gawne & McCulloch 2019; Pasternak & Tieu 2022; Pierini 2021), and that face emojis in particular are expressives (Grosz, Kaiser, & Pierini 2021; Grosz, Greenberg, De Leon, & Kaiser 2023). More generally, emojis are often considered to function somewhat like words (Barach, Feldman, & Sheridan 2021; King 2018; Scheffler, Brandt, de la Fuente, & Nenchev 2022; Tang, Chen, Zhao, & Zhao 2020).
What many of the modern linguistic approaches share is that they treat emojis, like regular words, as symbols: a mode of signification loosely characterized as conventional, non-natural, arbitrary, and/or learned. The obvious alternative to this symbolic view is one that treats emojis, like pictures, as icons: a mode of signification loosely characterized as based on resemblance between form and content (Peirce 1868). Such a view is apparently taken for granted by some semioticians (Cohn, Engelen, & Schilperoord 2019; Danesi 2016), but it is never explicitly argued for or made very precise. In this paper I propose, formalize, and defend such an iconic semantics for emojis. More specifically, I argue that emojis are simply little pictures. That is, like photographs and drawings, they are used to depict “what the world looks like”.3
A pictorial account of emojis promises several advantages over rival symbolic accounts. First, it sees the use of emojis as continuous with other, more obvious picture-text integrations, like stickers, gifs, and memes in modern internet communication, but also like illustrated books, instruction manuals, and comics, and of course Wittgenstein’s face drawings. This allows me for instance to borrow a fully general pragmatic model of picture-text composition (originally proposed for analyzing comics) and apply it to the use of emojis in Section 3.
Second, the pictorial account can explain creative, non-canonical emoji usage, like a use of the violin emoji to illustrate a cello performance, (1-a), or the use of the “persevering face” emoji to express a host of seemingly unrelated emotions (frustration, sadness) and activities involving closing the eyes (praying, pretending to be asleep), united only by the fact that someone looks like that.
(1) | a. | thanks to @anonymous for chatting to me about reaching a global audience online during lockdown with his stunning cello performances [violin emoji] |
| b. | Oh man… that too? They stole that too? [persevering face emoji] |
| c. | Behind every quiet person there is sad untold story [persevering face emoji] |
| d. | I need a guy that’s ready for a serious relationship [persevering face emoji] |
| e. | I always close my eyes and pretend to be sleeping [persevering face emoji] |
To deal with such a range of conceptually distinct uses, a symbolic account would have to assume that the lexical item [violin emoji] or [persevering face emoji] is multiply ambiguous, while the pictorial account readily predicts these creative usages from the generic pictorial meaning, that is, that “it looks roughly like this”. We’ll explore creative uses in Section 4.
Third, and somewhat more speculatively, the pictorial account suggests a natural explanation of the rise of gender and skin-tone modifiers: on certain specific occasions, [modified runner emoji] or [modified gesture emoji] may simply more closely resemble what they depict (the runner I’m describing, or my use of the gesture) than the default [runner emoji] or [gesture emoji], respectively. A symbolic account that analyzes [gesture emoji] as a lexical item expressing the speaker’s approval would presumably assign the exact same meaning to its skin-tone variants and thus have a harder time explaining their different distributions and felicity conditions. We’ll revisit this argument in Section 5.
Having made my sales pitch, I should add that there are also some limitations and caveats to the scope of my proposal. Emojis are not a wholly homogeneous class, and my pictorial account will not treat all 3,521 emojis in the current Unicode standard uniformly. First, I’m assuming, with Grosz et al. (2021), a semantic distinction between emojis that depict facial expressions and hand gestures and those that depict other entities and eventualities. I propose to model that semantic distinction within my overall pictorial framework as follows: while the entity/event emojis depict what the world of evaluation looks like, the face/hand emojis tend to depict what the utterance context looks like. A face emoji thus essentially depicts what the actual speaker looks like while producing the utterance. The Wittgensteinian intuition that face emojis are expressive is then explained by the further assumption that the human facial expressions (or hand gestures) depicted are themselves meaningful signs, and that the pictorial meaning layer naturally composes with the expressive gesture, in a way to be made precise in Section 5.
I’m also leaving open the possibility that there is some subclass of emojis that are best analyzed as symbols. For instance, [recycling symbol emoji], the fifth most common emoji on Twitter according to emojitracker.com, is conventionally used to denote recycling or, more commonly, retweeting, but it’s not obviously a picture of either of these activities: it doesn’t resemble them. Similarly, the Belgian flag emoji doesn’t resemble the country it stands for, but refers to it by an arbitrary convention, much like the English word “Belgium”, or an actual Belgian flag. Some emojis in the Emojipedia4 category of “Symbols” fall in the grey area between picture and symbol: is [heart emoji] just a conventional symbol of love, or is it perhaps in some sense a stylized (see Section 2.2) picture of a human heart, which by (fossilized) metonymic extension (see Section 4.3) is associated with love and positive emotions? For the purposes of this paper I’m happy to concede that there may be a subclass of symbolic emojis, with fuzzy borders. Still, there might be an alternative analysis that treats (some of) these symbolic emojis as pictures as well.5 Following the two-stage meaning composition account I propose for face emojis in Section 5, briefly hinted at above, we might say that flag emojis are quite literally pictures of flags, which in turn are genuine symbols of countries. This would leave us with the task of explaining the apparent transparency of the pictorial layer here. While I do offer such an explanation for the apparent transparency of face emojis (Section 5), I will not pursue this route for flag and other symbol emojis here.
In addition to pictorial and symbolic emojis, and some in the grey area in between, there are also picture-symbol hybrids. For instance, [sleepy face emoji] depicts a sleepy face with a giant snot bubble coming from the left nostril. The snot bubble here is a convention borrowed from Japanese manga and anime that symbolizes that a character is sleeping (Cohn 2013). We can analyze such mixtures by syntactically separating the symbolic elements from the pictorial elements, for instance as described for speech balloons and other picture-symbol hybrids in comics by Maier (2019). I will not discuss this matter further here.
Finally, for reasons of space I’ll restrict attention to the use of emojis as separate discourse units, that is, typically inserted after a sentence, or as a stand-alone discourse move:
(2) | Great idea count me in I’m on my way |
That is, I’m disregarding “pro-speech emojis” (Pierini 2021), that is, emojis that are syntactically integrated into a sentence and “replace” a specific word or concept, like “love” and “present” in (3-a), “happy” in (3-b), and sometimes even a specific (English) word sound or shape, in rebus-like fashion, like in (3-c).6
(3) | a. | keep doing what you need to do, [emoji for “love”] u bro if I was in Detroit I’d give you a [emoji for “present”]. |
| b. | Our project eventually succeeded, and I felt very [emoji for “happy”] (Tang et al. 2020) |
| c. | In the [rebus emoji] of her hand. (Scheffler et al. 2022) |
With the above restrictions and caveats in place, we are left with the claim that a significant portion of emoji uses, the exact boundaries of which remain vague, but including many uses of emojis for animals, plants, objects, activities, hand gestures, people, facial expressions, are wholly or primarily pictorial. In this paper I will explicate in some detail what a pictorial semantics (and pragmatics) for emojis might look like.
The paper is structured as follows. In Section 2 I propose a formal semantic account of pictorial content in terms of geometric projection. In Section 3 I explain how the rather minimal projective semantics is enriched in the context of a discourse, building on insights from the study of linguistic discourse structure and coherence relations. In Section 4 I address what I call the pictorial overdetermination challenge: an emoji like [car emoji] depicts a certain type of two-door red car, but can be used to denote cars of any color, make and model. The solution I propose invokes a pragmatic process of figurative interpretation, extending the basic projective contents via metaphoric and metonymic interpretation. In Section 5, finally, I turn to the second fundamental challenge facing a pictorial account: how to explain the apparent expressivity of face and hand emojis? I propose an extension of the Kaplanian account of use-conditional meaning to pictures and then show how the use-conditional pictorial content of an emoji naturally combines with the use-conditional content of the depicted facial expression or gesture to yield the observed expressive use of face and hand emojis.
Pictures are representations. They represent the world as being a certain way. Hence, just like utterances can be true or false with respect to a given world, we could say that a picture is true or false with respect to a certain world, or more colloquially, that a picture is an accurate or inaccurate picture of (a part of) that world. If we can capture a workable notion of pictorial truth we can define the semantic content of a picture as the set of worlds that the picture is true of (in the same way that we also define the content of an utterance in possible worlds semantics) in order to get a proper investigation of the semantics and pragmatics of pictures off the ground.
Intuitively, pictorial truth, unlike linguistic truth, is a matter of resemblance: a given picture is true of a world iff it resembles part of that world. On reflection, resemblance has turned out to be too vague, and arguably neither sufficient nor necessary for pictorial truth (Goodman 1976; Greenberg 2013). The geometrical notion of a projection function has been used with some success as a replacement for resemblance in pictorial semantics (Abusch 2020; Greenberg 2021).
More technically, a geometric projection is a recipe for turning a 3D scene into a 2D pictorial representation of that scene. It’s a function, Π, mapping a possible world w and a viewpoint v (formally, a vector located at a certain spatiotemporal location, intuitively representing the gaze direction of some viewer/camera located somewhere in w) onto a picture p: Π(w,v) = p.
There are many different such recipes that qualify as geometric projection functions. One well-known example Π takes the world and the viewpoint, and (i) puts a white picture plane perpendicular to the viewpoint direction vector, (ii) draws all “projection lines” connecting some part of an edge of an object in the world to the viewpoint origin, and (iii) marks in black every point where a projection line intersects the picture plane. This procedure will generate a linear perspective, black and white drawing.
(4) | [Figure not recovered: illustration of linear perspective projection onto a picture plane]
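The three-step recipe for a linear perspective projection can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions of my own, not part of the formal proposal: the world is reduced to a finite list of 3D points lying on object edges, the picture plane sits at unit distance from the viewpoint origin, and the picture is a set of marked grid cells. The names `project`, `edge_points`, and `scale` are hypothetical.

```python
import math
from typing import FrozenSet, Iterable, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[int, int]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec3) -> Vec3:
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def project(edge_points: Iterable[Vec3], origin: Vec3, direction: Vec3,
            up: Vec3 = (0.0, 1.0, 0.0), scale: int = 10) -> FrozenSet[Pixel]:
    """Toy linear perspective: returns the set of marked cells on the plane.

    Assumes `direction` is not parallel to `up` (otherwise the plane axes
    are undefined and normalization divides by zero).
    """
    gaze = _unit(direction)
    # (i) picture plane perpendicular to the gaze, spanned by right/true_up
    right = _unit(_cross(up, gaze))
    true_up = _cross(gaze, right)
    marked = set()
    for point in edge_points:
        rel = _sub(point, origin)
        depth = _dot(rel, gaze)
        if depth <= 0:
            continue  # points behind the viewpoint are not depicted
        # (ii) follow the projection line from the point to the origin and
        # (iii) mark where it crosses the plane at unit distance
        x = _dot(rel, right) / depth
        y = _dot(rel, true_up) / depth
        marked.add((round(x * scale), round(y * scale)))
    return frozenset(marked)


viewpoint_origin, gaze_direction = (0, 0, 0), (0, 0, 1)
near_small = project([(1, 1, 2)], viewpoint_origin, gaze_direction)
far_large = project([(2, 2, 4)], viewpoint_origin, gaze_direction)
```

In this toy setting a nearby point and a point twice as far away along the same projection line yield the same picture, which is one simple way in which a picture underdetermines the world it depicts.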
More complicated projection functions might include rules for representing colors, distinguishing edges and surfaces in the world, or some additional distortion, abstraction, and stylization transformations to create depictions that deviate more or less from “photorealistic” projection in different ways.
(5) | [Figure not recovered: examples of projections with color, abstraction, and stylization]
When we know how to turn some part of the world into a picture, using a Π and a v, we can define when a picture is true:
(6) | A picture $p$ is true of a world $w$ relative to a viewpoint $v$ iff $\Pi(w, v) = p$.
Here we’re assuming the projection function Π to be fixed, that is, provided by the context, just as in the linguistic domain we assume the language to be given pre-semantically, that is, before computing the truth value of an utterance. In other words, we can think of the projection function as the pictorial analogue of a language (Giardino & Greenberg 2015; Greenberg 2013).
From the pictorial truth definition in (6) we can define various candidate notions or levels of pictorial content. A natural analogue of classic propositional content results from existential closure over the viewpoint: given a pictorial language Π, a picture expresses the proposition that there is a viewpoint from where the world projects onto that picture:
(7) | $\llbracket p \rrbracket_{\Pi} = \{\, w \mid \exists v\, [\Pi(w, v) = p] \,\}$
Alternatively, for different purposes we may need other notions of content, for example, analogues of centered/diagonal propositions (sets of world-viewpoint pairs, Rooth and Abusch 2017) or horizontal contents (sets of worlds, i.e., assuming a fixed, contextually given viewpoint). I’ll eventually introduce a dynamic semantic notion of pictorial content, in terms of information states.7 To avoid potential difficulties delimiting and quantifying over the space of possible pictorial (or linguistic) languages, we will assume that a Π is pre-semantically given. How an interpreter arrives at this Π is then a matter of contextual pragmatic inference that we will only talk about informally.
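The difference between these notions of content can be made concrete over a small finite model. The following Python sketch is purely illustrative and rests on assumptions of my own: worlds are sets of 3D grid points, there are only three viewpoints (one per axis), and the stand-in projection simply drops the coordinate along the viewing axis. None of the names (`proj`, `content`, `centered_content`, `WORLDS`) come from the formal literature.

```python
from typing import Dict, FrozenSet, Set, Tuple

Point3 = Tuple[int, int, int]
Picture = FrozenSet[Tuple[int, ...]]
World = FrozenSet[Point3]

# Hypothetical viewpoints: looking along the x, y, or z axis.
AXES = {"x": 0, "y": 1, "z": 2}


def proj(world: World, viewpoint: str) -> Picture:
    """Stand-in projection: drop the coordinate along the viewing axis."""
    drop = AXES[viewpoint]
    return frozenset(
        tuple(c for i, c in enumerate(pt) if i != drop) for pt in world
    )


def is_true(picture: Picture, world: World, viewpoint: str) -> bool:
    """Pictorial truth relative to a world and a viewpoint."""
    return proj(world, viewpoint) == picture


def content(picture: Picture, worlds: Dict[str, World]) -> Set[str]:
    """Propositional content: existential closure over the viewpoint."""
    return {
        name for name, w in worlds.items()
        if any(is_true(picture, w, v) for v in AXES)
    }


def centered_content(picture: Picture,
                     worlds: Dict[str, World]) -> Set[Tuple[str, str]]:
    """Centered content: the set of verifying world-viewpoint pairs."""
    return {
        (name, v) for name, w in worlds.items() for v in AXES
        if is_true(picture, w, v)
    }


WORLDS: Dict[str, World] = {
    "w_rod_z": frozenset({(0, 0, 0), (0, 0, 1)}),  # a rod along the z axis
    "w_rod_x": frozenset({(0, 0, 0), (1, 0, 0)}),  # a rod along the x axis
    "w_dot": frozenset({(0, 0, 0)}),               # a single point
}

dot_picture: Picture = frozenset({(0, 0)})
segment_picture: Picture = frozenset({(0, 0), (0, 1)})
```

Here `content(dot_picture, WORLDS)` contains all three worlds, since each rod looks like a dot when viewed end-on, while `centered_content` additionally records which viewpoint does the verifying.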
Finally, it is worth noting that the purely projective pictorial content defined in (7) is a rather minimal notion of semantic content, what Kulvicki (2006) calls “barebones content”, to be further enriched to what he calls “fleshed out content” (see Greenberg 2021 for a related view involving different levels of pictorial semantic content). In this paper I side with Abusch (2020) and stick with the barebones semantics in (7). I relegate the more fleshed out content derivation to the levels of discourse processing (see Section 3) and pragmatics (see Section 4).
If emojis are pictures, they are not very “realistic”, but rather “abstract” or “stylized”. In the geometric projection framework the differences between, say, a simple line drawing and a full color photograph can be thought of as corresponding to different parameter settings inside the projection function. A line drawing projection ignores colors, shadows, and other properties of surfaces, and instead focuses only on (clear, relatively sharp) edges of objects. Qualitatively very different scenes (solid blue cube on wooden table lit from above, transparent glass cube on metal surface lit from the left, etc.) could thus give rise to the same abstract line drawing.
Now, apart from ignoring surface textures, opacity, colors, and shadows (let’s call this “abstraction”), a typical line drawing also simplifies the geometry of the edges that the basic linear projection algorithm would give us. Let’s call this “stylization”. By stylization, slightly crooked edges and small imperfections might be represented by perfectly straight lines on the picture plane. We could also stylize our depiction by approximating any shape projected on the plane with the closest simple polygon (with fewer than 37 sides, say).8 We’ll consider such approximative geometric transformations part of the projection function (Abusch 2012; Greenberg 2021). With a properly abstract and stylized projection function, a simple wire cube drawing like [wire cube drawing] would be true of not just different geometric worlds where we’re looking at an actual floating Platonic cube (of arbitrary color and size), but also of worlds like ours where we’re looking at a shape that is roughly cube-like, like a sugar cube, a Rubik’s cube, or a dented cardboard box.
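One way to picture such a stylization transformation is as a line-simplification step applied to the projected outline. The sketch below uses the Ramer-Douglas-Peucker algorithm, a standard polyline simplification technique that I am swapping in purely for illustration; it is not claimed to be the transformation assumed in the pictorial semantics literature. The function name `stylize` and the `tolerance` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]


def point_segment_distance(p: Point2, a: Point2, b: Point2) -> float:
    """Distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def stylize(points: Sequence[Point2], tolerance: float) -> List[Point2]:
    """Ramer-Douglas-Peucker simplification of a projected polyline.

    Deviations smaller than `tolerance` are smoothed away, so a slightly
    crooked edge comes out as a single straight line.
    """
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point that deviates most from the straight edge.
    index, deviation = max(
        ((i, point_segment_distance(points[i], first, last))
         for i in range(1, len(points) - 1)),
        key=lambda pair: pair[1],
    )
    if deviation <= tolerance:
        return [first, last]
    # Otherwise keep that point as a corner and simplify both halves.
    left = stylize(points[: index + 1], tolerance)
    right = stylize(points[index:], tolerance)
    return left[:-1] + right


crooked_edge = [(0, 0), (1, 0.05), (2, -0.04), (3, 0.02), (4, 0)]
dented_corner = [(0, 0), (2, 0.03), (4, 0), (4.02, 2), (4, 4)]
straightened = stylize(crooked_edge, tolerance=0.1)
corner = stylize(dented_corner, tolerance=0.1)
```

With a tolerance of 0.1 the crooked edge collapses to its two endpoints, while the dented corner keeps exactly one corner point. A larger tolerance corresponds to a more heavily stylized pictorial dialect.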
If emojis are pictures, as I maintain, they are clearly more like line drawings than like photos, with the relevant projection function involving multiple types of abstraction and approximative stylization transformations on top of a basic linear projection algorithm. Take a common object emoji, like the “wrapped present” emoji. Here are a few instantiations of this emoji in different emoji sets.
(8) | [Figure not recovered: the “wrapped present” emoji as rendered by Apple, OpenMoji, and HTC]
Apple appears to be using a more photorealistic type of projection and HTC a more stylized one. More specifically, in our projective semantics terminology, we would say that Apple’s projection function, Π_Apple, seems to involve a standard linear perspective;9 uses a range of different colors to mimic a smooth, somewhat shiny light brown or gold surface; and marks shadows and shiny edge highlights as if light is falling on the object from top left.
The OpenMoji projection seems to involve more abstraction and stylization. The box and the bow for instance are entirely symmetrical, suggesting that Π_OpenMoji ignores the precise shape and location of the bow in the basic projection image in favor of an approximation. There are only a few colors, again suggesting approximation, and all edges are marked uniformly in thick black lines. On the other hand, the dropdown shadow from the lid is still preserved. The type of perspective in OpenMoji’s projection remains unclear because a full frontal viewpoint seems to have been chosen.
HTC, finally, uses the same canonical viewpoint as OpenMoji and builds similar color, texture, and symmetry abstractions into its projection function, though with a slightly different edge processing, and now even ignoring all shadows.
Note that these informal (abductive) inferences about the nature of the three Π’s, drawn on the basis of just one image each, are all defeasible: Apple might in principle have intended to depict a crooked, multicolored box through a parallel method of projection, without shading; and HTC’s projection function might be sensitive to shadows and shading but we don’t notice because the scene had a light source near the viewpoint, etc. In fact, all three pictures might in principle be high resolution photographs of more or less abstract drawings of objects (Greenberg 2013; Kulvicki 2013).
As discussed also at the end of Section 2.1, we’ll ignore this general projection uncertainty and assume that context, common sense and experience with pictures of common objects pragmatically (or in any case, pre-semantically) narrow down the space of possible parameter settings to something like the assumptions laid out above.
At the HTC level of stylization and abstraction we might be tempted to consider an alternative account, as suggested by an anonymous referee, viz., that emojis are diagrams. That is, [wrapped present emoji] denotes a wrapped gift box in roughly the abstract but still iconic way that overlapping, partly greyed out circles in a Venn diagram might denote that all men are mortal. So how exactly do diagrams differ from pictures, icons, and symbols?
Giardino and Greenberg (2015) define iconicity as representation by virtue of “a kind of ‘direct’ or ‘natural’ correspondence between the spatial structure of the sign and the internal structure of the thing it represents.” Pictures fall under this definition, with the relevant correspondence provided by (the inverse of) the projection function. Importantly, pictorial correspondence is essentially viewpoint-dependent: a given linear perspective photograph may be a true depiction of a world from some specific viewpoint, but false from another viewpoint. According to Giardino and Greenberg, such viewpoint-dependence is what sets pictures apart from other icons, most saliently diagrams: a Venn diagram like (9) may convey, in virtue of a natural correspondence between circle overlap and set intersection, the proposition that all humans are mortal, regardless of any specific viewpoint or perspective.
(9) | [Figure not recovered: Venn diagram conveying that all humans are mortal]
Back to emojis. In the description of the projective content of the HTC wrapped present emoji in 2.2 I’ve assumed that it’s a stylized depiction from a canonical, frontal viewpoint. One might argue that if we incorporate such a fixed full frontal viewpoint into Π_HTC we’d technically end up with a viewpoint-independent projection, which by Giardino and Greenberg’s (2015) definition might already technically put us outside the pictorial domain. However, viewpoint-independence doesn’t seem to suffice to make an icon a diagram, for then interactive VR games or 3D marble sculptures would be diagrams as well.10 Following Hagen (1986) I will extend the label “pictorial” to include forms of projection that stipulate a canonical viewpoint. This leaves it an open question whether [HTC wrapped present emoji], while clearly iconic, is best thought of as a picture or a diagram.
Perhaps the diagrammarian’s case is strongest for face emojis. In our terminology, [face emoji] and [face emoji] are iconic in the sense that they denote facial expressions and/or emotions (we return to this matter in Section 5) by virtue of a “natural” correspondence between shapes in the sign (specifically the shapes of mouth and eyes) and properties of the speaker’s face and/or emotional state. Although this paper is primarily concerned with defending this iconic account against symbolic accounts, let me briefly explain my reasons for going one step further and pinning down the correspondence in question as pictorial, as cashed out in terms of the fairly well-understood geometric notion of a projection function.
First, to the extent that it makes intuitive sense to consider Apple’s colorful and detailed [emoji] a picture, it makes sense to try and extend that approach, if possible, to more abstract emojis like [emoji] and other emoji sets like HTC’s, and perhaps even more abstract representations like emoticons :). On the account presented in 2.2 above, [emoji] and [emoji] simply exemplify different pictorial dialects, characterized in terms of projection functions with different types of stylization and abstraction built in. Going in the other direction on the abstraction scale, we already noted in Section 1 that the pictorial account also provides an intuitive link between emojis and, say, Whatsapp stickers or gifs that often include drawings, photos, or videos that are uncontroversially projective and viewpoint-dependent in nature.
In Section 1, while comparing our pictorial account with the symbolic alternative, we also mentioned two other potential advantages of the pictorial view: (i) it correctly predicts the flexibility and creativity of emojis (e.g., [violin emoji] denoting a cello performance, or [persevering face emoji] denoting a wide variety of conceptually unrelated emotions and activities associated with a face that looks like that); and (ii) it can easily make sense of the rise of skin-tone and gender modifiers. It is not obvious whether a diagrammatic account would be similarly well positioned to explain these two phenomena: the explanations sketched earlier at least require a much more vision-like type of form-meaning correspondence than we find in, say, Venn diagrams (for which, indeed, creative interpretations and skin-tone/gender modifiers seem rather unlikely).
It all comes down to how exactly the diagrammarian spells out the “natural” correspondence between emoji and denotation in a way that is not projective or pictorial but instead distinctly diagram-like. The deeper problem is that it’s not clear what counts as “diagram-like”. Beyond detailed accounts of some specific logical and mathematical diagram systems (Shin, Lemon, & Mumma 2018), there is, as far as I’m aware, no general, formally precise, positive characterization of diagrammatic representation. Hence, the view that (some) emojis are diagrams is less informative than describing geometric projection functions with stylization, abstraction, and canonical viewpoints. In this paper I’ll henceforth restrict attention to the pictorial view.
The pictorial semantics I have proposed for emojis is incredibly minimal:
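(10) | $\llbracket \text{[wrapped present emoji]} \rrbracket = \{\, w \mid \exists v\, [\Pi_{\text{Apple}}(w, v) = \text{[wrapped present emoji]}] \,\}$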
As we saw in 2.2, already some defeasible presemantic reasoning about the underlying Π_Apple is required to get even this much. A lot more pragmatic reasoning is needed to turn this basic pictorial content into something worth adding to an actual tweet or text. I follow Grosz et al. (2021) and Kaiser and Grosz (2021) in appealing to coherence and discourse structure as a crucial factor in the pragmatics of emojis, but my reliance on pragmatic enrichment will be somewhat more radical, in part due to my much more minimal, pictorial semantics.
Hobbs (1979) famously proposed a systematic theory of discourse interpretation where maximizing coherence is a driving force behind various pragmatic inferences in communication and textual interpretation. Consider the simple discourse in (11):
(11) | I missed another Zoom meeting this morning. My internet was out. |
We don’t merely interpret this as a conjunction of two eventualities occurring (missing a meeting and the internet being down), but almost automatically infer some kind of causal link between the two: I missed a meeting because my internet was out. Depending on the nature of the eventualities described we can infer different relations between them. While in (11) we inferred a relation commonly known as Explanation, in (12) we’ll likely infer a different one called Result.
(12) | I missed another Zoom meeting this morning. They fired me. |
There are a number of more or less formalized theoretical frameworks describing the inference of these so-called coherence relations (Asher & Lascarides 2003; Kehler 2002; Mann & Thompson 1988). In all of them it is assumed that there is a certain finite number of such relations, ultimately grounded in “more general principles of coherence that we apply in attempting to make sense out of the world we find ourselves in” (Hobbs 1990). Following Pagin (2014) and Cohen and Kehler (2021) I refer to coherence-driven inferences as a form of “pragmatic enrichment” of the more minimal underlying semantic content.
The most comprehensive coherence theory, that is also immediately compatible with the formal semantic machinery we’ve introduced thus far, is called Segmented Discourse Representation Theory (SDRT, Asher and Lascarides 2003). In SDRT, discourse relations like Result, Explanation, Contrast, Background, and Narration are represented at a level of discourse representation that extends a given dynamic semantic framework, typically Discourse Representation Theory (DRT, Kamp 1981). The relata are elementary discourse units, typically corresponding to sentences or clauses that express propositions, typically describing the existence of certain events or states (Davidson 1967). Using special propositional discourse referents (π1, π2, …) to label these elementary units and using DRT to represent their semantic contents, we get Segmented Discourse Representation Structures (SDRS) like (13):
(13) | [Figure not recovered: SDRS box for (12), with units π1 and π2 related by Result(π1, π2)]
In this traditional box notation, the outer box is an SDRS proper. It describes two discourse units, π1 and π2, as related by the Result relation: π2 is the result of π1. The smaller boxes are DRSs; they represent the contents of the individual discourse units. The first discourse unit, π1, corresponding to the first sentence of the discourse in (12), is characterized by this DRS box as (i) contributing two discourse referents, viz., an event e1 and an individual x, and (ii) ascribing a number of properties and relations to these discourse referents, viz., that e1 is an event of missing, that the agent of e1 (the person who is missing something) is i (a special indexical discourse referent picking out the actual speaker), etc.
The model-theoretic interpretation of coherence conditions in SDRT can be formalized as an extension of the semantics of DRT, which is a dynamic extension of first-order logic. For instance, Narration holds between two units iff the information carried by both units is true (or, more dynamically: if both units update the common ground consecutively) and the main event described by the first unit (e_π1) immediately precedes (or “occasions”, notation: ≺) the main event described by the second. Note that this semantics presupposes that both units introduce a main event. In standard (S)DRT notation, K_π is the DRS box associated with unit π, and ⊕ stands for DRS merge,11 the DRT way of dynamic information updating (spelled out at the representational level of the DRS).
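(14) | $\llbracket \mathit{Narration}(\pi_1, \pi_2) \rrbracket = \llbracket K_{\pi_1} \oplus K_{\pi_2} \oplus [\ \mid e_{\pi_1} \prec e_{\pi_2}\,] \rrbracket$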
In words, (14) says that interpreting two units (π1, π2) joined by Narration means roughly that we add the information of both units together (i.e., we join the sets of discourse referents they introduce, and we join the sets of conditions they impose on them), and then add a new condition that says that the main event described in the first unit immediately precedes that described in the second.
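The merge-plus-condition idea behind Narration can be sketched at the representational level in a few lines of Python. This is a simplified illustration under assumptions of my own: a DRS is just a pair of sets (discourse referents and conditions, with conditions encoded as tuples) plus a designated main event, and the model-theoretic side is left out entirely. The class and function names (`DRS`, `merge`, `narration`) are hypothetical and not taken from any SDRT implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

Condition = Tuple[str, ...]


@dataclass(frozen=True)
class DRS:
    """A toy DRS: a universe of referents and a set of conditions."""
    referents: FrozenSet[str]
    conditions: FrozenSet[Condition]
    main_event: Optional[str] = None


def merge(k1: DRS, k2: DRS) -> DRS:
    """DRS merge: join the universes and join the condition sets."""
    return DRS(
        referents=k1.referents | k2.referents,
        conditions=k1.conditions | k2.conditions,
        main_event=k2.main_event or k1.main_event,
    )


def narration(k1: DRS, k2: DRS) -> DRS:
    """Merge both units and add that the first main event precedes the second."""
    if k1.main_event is None or k2.main_event is None:
        # Narration presupposes that both units introduce a main event.
        raise ValueError("Narration requires a main event in both units")
    merged = merge(k1, k2)
    precedence: Condition = ("precedes", k1.main_event, k2.main_event)
    return DRS(
        referents=merged.referents,
        conditions=merged.conditions | {precedence},
        main_event=k2.main_event,
    )


# Hypothetical two-unit discourse: x chases y, then x catches y.
chase = DRS(
    referents=frozenset({"e1", "x", "y"}),
    conditions=frozenset({("chase", "e1"), ("agent", "e1", "x"),
                          ("theme", "e1", "y")}),
    main_event="e1",
)
catch = DRS(
    referents=frozenset({"e2"}),
    conditions=frozenset({("catch", "e2"), ("agent", "e2", "x"),
                          ("theme", "e2", "y")}),
    main_event="e2",
)
story = narration(chase, catch)
```

The resulting `story` contains the referents and conditions of both units plus the single added condition `("precedes", "e1", "e2")`. A unit without a main event (a pure state description, say) makes `narration` fail, mirroring the presupposition noted above.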
The introduction of coherence relations in the discourse interpretation process (i.e., the step by step construction of an SDRS from a sequence of utterances) is guided by a global constraint that seeks to maximize overall discourse coherence (i.e., add as many coherence relations as possible) and a number of defeasible pragmatic inference rules. For instance, if one unit π1 introduces a state and a subsequent (structurally accessible) unit π2 introduces an event, then all else being equal we can add “Background (π1, π2)” to the SDRS under construction. We’ll skip over all details of context change composition in the model-theoretic semantics, accessibility, complex discourse units, etc.; see Geurts, Beaver, and Maier (2020) for a gentle introduction to DRT and SDRT, and Asher and Lascarides (2003) for details.
We’re interested in the application of coherence theory, and SDRT in particular, as a way of modeling pragmatic enrichment with partly or wholly pictorial discourses. First let’s rephrase Maier & Bimpikou’s (2019) DRT style analyses of purely pictorial narratives like (15) into the SDRT framework, by viewing the panels as elementary discourse units.12
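(15) | [Figure not recovered: two-panel pictorial narrative in which a policeman chases and then catches a squirrel]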
The basic assumption behind Maier & Bimpikou’s (2019) PicDRT is that pictures are like elementary discourse units, that is, they express information about what the world looks like. As outlined above, to interpret a sequence of propositional units (pictorial or linguistic) as a coherent narrative means that we infer coherence relations. In this case, and in many panel transitions in many comics, the inferred coherence relation defaults to Narration: the policeman is chasing a squirrel and then he catches it.
(16) |
The first thing to note is that, unlike in competing dynamic semantic accounts of pictorial discourse (Abusch & Rooth 2017; Wildfeuer 2019), the semantic representation in (16) literally contains pictures as constituents of its DRS boxes. This is in line with the original motivation of (S)DRT as a model of human cognitive discourse processing, with the (S)DRS approximating a dynamically changing structured mental representation âin the hearerâs headâ (Kamp 1981). The idea behind PicDRT representations like (16) is that human mental representations can be partly symbolic (as modeled by discourse referents and first-order logical formulas like âagent (e1, x)â), but also partly iconic and even pictorial (as modeled by the inclusion of picture conditions).
Zooming in on the pictorial DRS components in (16), Maier & Bimpikou (2019), inspired by Abusch (2012), add a âsyntacticâ level of processing where pictures are labeled with viewpoint referents (v1, v2), and what they call salient regions of interest in the picture are labeled with individual discourse referents (x1, y2). A preliminary DRS representation of the first panel, with 2 salient regions introducing discourse referents, would be model-theoretically interpreted as in (17). Note that f is a partial assignment function, mapping discourse referents onto elements in the modelâs domain (i.e., f maps x1 to an individual, v1 to a viewpoint vector, and e1 to an event). The (dynamic) semantic content of a DRS is an âinformation stateâ, that is, the set of worldâassignment pairs that verify the DRS (Nouwen, Brasoveanu, van Eijck, & Visser 2022).
(17) |
Paraphrasing informally, the DRS in (17) contributes the information that (i) there is a certain viewpoint from which the world looks like the whole picture (i.e., Î (w, f (v1)) = ); (ii) thereâs an individual that, when projected from that same viewpoint, looks like the bluish region (i.e., Î (f(x1), f(v1)) = ); and (iii) thereâs another individual that looks like the smaller brownish region (i.e., Î (f(y1), f(v1)) = ).
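To make the notion of an information state more tangible, here is a toy sketch that computes the set of world–assignment pairs verifying two region conditions. Everything in it is an assumption made for illustration: the two worlds, the single viewpoint, the string labels standing in for picture regions, and the `project` stand-in for the projection function (which here ignores the viewpoint). The whole-picture condition (i) is left out for brevity.

```python
from itertools import product

# Hypothetical toy model: a world maps each individual to how it looks.
WORLDS = {
    "w1": {"a": "bluish region", "b": "brownish region"},
    "w2": {"a": "bluish region", "b": "greenish region"},
}
VIEWPOINTS = ["v_front"]  # a single viewpoint keeps the example small


def project(world, individual, viewpoint):
    """Stand-in for projection; this toy version ignores the viewpoint."""
    return WORLDS[world][individual]


def information_state(conditions):
    """All (world, assignment) pairs that verify every (referent, region) condition."""
    referents = [referent for referent, _ in conditions]
    state = set()
    for w in WORLDS:
        for v in VIEWPOINTS:
            for values in product(WORLDS[w], repeat=len(referents)):
                f = dict(zip(referents, values))
                if all(project(w, f[r], v) == region for r, region in conditions):
                    f["v1"] = v  # the assignment also fixes the viewpoint referent
                    state.add((w, tuple(sorted(f.items()))))
    return state


state = information_state([("x1", "bluish region"), ("y1", "brownish region")])
```

Only the first toy world survives, because the second contains nothing that projects onto the brownish region.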
At a post-semantic level, based on general world-knowledge, genre, and background information about what things look like under common projections, properties and relations may be freely predicated of these discourse referents (e.g., âpoliceman (x1)â), as a form of âfree pragmatic enrichmentâ (Recanati 2010). Moreover, different discourse referents from different pictures can be equated (e.g., x2 = x1âa free pragmatic pictorial analogue of anaphora resolution, Abusch 2012). In addition to the discourse unit labels (Ï1, Ï2) and coherence relations (âNarration(Ï1, Ï2)â) sketched in (17), we further add to the post-semantic enrichment stage the introduction of event discourse referents (e1,e2âŻ). Note that this last enrichment is crucially driven by the semantics of Narration, which, as defined in (14), presupposes that both units introduce an event discourse referent.
(18) |
The model-theoretic interpretation of (18), the post-semantically enriched version of the SDRS in (17), is a straightforward enrichment of the information state in (17) involving only standard DRT and SDRT semantics (but I will skip over the formalities here).
In sum: the semantics proper of a single picture is very minimal, basically, “the world looks like this at some point in space and time”. When presented with a few pictures in a seemingly deliberate sequence we go beyond mere conjunction of those minimal propositions (the world looks like this at some point and like that at some point), just as we do when presented with a series of utterances.13 The sequencing thus triggers a cognitive enrichment process that crucially involves the inference of various coherence relations in order to satisfy a global desire for maximal coherence.
Since the coherence-driven enrichment mechanism (here formalized in SDRT) thus applies uniformly to verbal and visual discourse, it should be well suited to modeling multimodal mixtures, like comics with textual elements (Wildfeuer 2019), illustrations with captions (Rooth & Abusch 2019) or tags (Greenberg 2019), and film (Cumming, Greenberg, & Kelly 2017; Wildfeuer 2014). If emojis are pictures, semantically, this same machinery should help us enrich the communicative content of emojis in relation to the surrounding text and/or other emojis.
Weâve assigned a minimal, pictorial content to object emojis like that can be roughly paraphrased as âthere is a viewpoint near where there is some object that looks like that.â This semantic content is a proposition, or, if we follow the DRT approach sketched in the previous section, the dynamic equivalent of a proposition, viz. an information state.14 Following Lascarides and Stoneâs (2009) original analysis of speechâgesture integration, but more directly following Grosz et al.âs (2021) analysis of activity emojis, these propositional emoji uses can be analyzed as discourse units in their own right, alongside the textual units.
(19) | Ï1: Happy Birthday! | Ï2: Iâm coming over this afternoon | Ï3: |
Maximizing coherence means inferring coherence relations between these discourse units. Ï2 contributes the existence of an event of the speaker coming over in the afternoon of the utterance day, while Ï3 contributes, roughly, the existence of a gold/brown box with a red bow at some point in space and time. The conjunction of those two pieces of information as such is not a coherent discourse, so we infer a relation, probably a causal relation (the box is the reason for the visit), which in SDRT would be called âExplanationâ:
(20) |
The semantics of Explanation, like Narration, (14), demands two events, and says that the second unitâs main event causes the firstâs. Thus, the inference of an Explanation relation (to increase coherence), triggers the further inference that the picture depicts not just what the world looks like, but contributes an event. But how does a picture of a box depict an event?15
At this point I defer to what cognitive scientists call general cognitive schemas, scripts, or frames (Fillmore 2008): things that look like that, that is, nicely wrapped boxes with bows, typically contain gifts, and gifts are typically quite saliently involved in events of giving and receiving. Note that this is the same reasoning that gave rise to the inference that the entity depicted by the mostly blue shape in the comic in (15) is (probably) a police officer and that the inferred positions of his arms and legs are (probably) snapshots of him running (rather than assuming a weird pose and floating in the air).
With all the defeasible pragmatic enrichments above, the coherent interpretation of the tweet in (19) now looks something like this:
(21) |
Paraphrasing the interpretation of the SDRS in (21): thereâs an event of the speaker coming over and an event of giving a present that looks like this, , and the latter event explains the former.
But does the gift really have to look just like that, that is, in a gold-colored box with a red bow? Of course, weâre assuming that Appleâs projection function includes some abstraction and stylization, leading to some indeterminacy about the actual size, shape, color, lighting, and background of the box that is depicted, but what if itâs a blue box with a yellow bow, or a ball wrapped in newspaper without a bow, or even just an electronic gift certificate? As it stands, the pictorial account would make the speaker a liar16 if she meant to give such a gift, which is obviously absurd. The emoji can be used for almost any kind of gift, it doesnât really have to look like this . I consider this the fundamental challenge for a truly pictorial account of emojis, and I address it in the next section.
To illustrate our discourse semantics framework once more, consider another example tweet:
(22) | Iâm coming over this afternoon |
In this variation of (19) the emoji still clearly counts as a separate discourse unit, but now the connection is likely one of Elaboration (the event of my coming over involves a car) rather than Explanation.17
(23) |
In words: thereâs an event of the speaker coming over and that event includes the event of driving a car that looks like this. The phrase âlooks likeâ as always has to be understood in terms of projection.
Since Appleâs emojis tend to have various different colors that in many ways seem to reflect the actual colors of the depicted objects with some degree of faithfulness,18 we might reasonably assume that Appleâs projection function approximates actual color (within a finite set of fixed color codes). But then, counterintuitively, (22) would be false if the speaker drives a silver car.19
A first attempt at addressing this color mismatch problem in particular is to assume that color is not in fact preserved in the relevant projections. Instead, just as we’re already abstracting away from details of shape, texture, shadow, etc., the Apple projection apparently also ignores most colors and just maps certain real-world surfaces to a default red. It’s tricky to define exactly which colors get mapped to red, and which to black, white, and the various shades of grey and/or blue that occur in this particular car emoji, and how these color marking rules should be adjusted for different categories of objects and their emoji depictions. These may just be technicalities, in which case it’s worth noting that the resulting projection would intuitively count as a pictorial mapping, in keeping with our starting point that the car emoji is a car picture.
However, this leaves us with a parallel problem of shape mismatches. What if the intended present is a round object wrapped in newspaper? Or the car is a big black 4 door BMW that looks nothing like ? Extending the color solution outlined above to shapes would mean that Appleâs projection specifies a fixed shape and then effectively maps every car to that shape. Now note that such a projection would be essentially concept-based in the sense that anything that falls under the concept of âcarâ, no matter what it looks like, gets mapped onto the car emoji. This would take us away from pictorial representation and well into symbolic territory. In fact, this projection function is literally just the inverse of the lexical semantic meaning of the English word car, so weâd end up with a symbolic rather than an iconic account.
Before I present a solution to the overdetermination challenge that avoids going symbolic like this, letâs first explore this alternative.
As alluded to in Section 1, the apparent default view of emoji semantics treats emojis as essentially an extension of the lexicon of a certain genre of written language. In (Grosz et al. 2021: 348), for instance, activity emojis are said to âserve as free-standing event descriptions, whose core argument is anaphoricâ. Although they try to remain agnostic on the details of the lexical entry of an activity emoji like (because it has to cover both playing the violin and being a violinist, among other things), their notion of an event description that contains an argument that is moreover anaphoric, points in the direction of a language-like, that is, symbolic, lexicon extension, rather than a strictly pictorial semantics like the one Iâm defending here.20
Interestingly, when we look at the variety of uses of our limited set of emojis, it becomes apparent that the symbolic approach also runs into an overdetermination problem. While it avoids overdetermining what the object looks like, it overdetermines its conceptual classification. In some cases, really does denote something that looks like that without being a gift. For instance, in the tweet in (24) the advertiser is most likely creatively using the âwrapped presentâ emoji to elaborate on the event of collecting parcels, relying on the fact that parcels are often boxes that kind of look like that.
(24) | We deliver in South Africa via pep store for R59.95 or you can collect your parcels at Ferndale, Randburg |
A symbolic account that treats as a word thatâs a synonym of âgiftâ or âwrapped presentâ or that otherwise ties the meaning symbolically to the concept of a gift,21 would predict that (24) would be infelicitous if the author intended to include people simply ordering stuff for themselves rather than as a gift.
In this particular case, the symbolist might object that the use in (24) is infelicitousâthe author should have chosen to illustrate the general concept of parcels or delivery.22 But the same kind of creative usage of course happens when there is no better alternative emoji. For instance, on a symbolic account, probably denotes something involving a violin. But that would exclude uses where it denotes playing a viola, or a cello (neither of which have their own dedicated emoji).
(25) | Double-cello action in #Arenskyâs beautiful quartet has inspired our two cellists to treat the audience to some bonus duos⯠|
In light of (25) the symbolist might propose âbleachingâ their lexical entry to accommodate cellos (e.g., to âbowed classical instrumentâ), but there will always be new, unforeseen use cases that donât quite fit any proposed lexical definition (e.g., bring your ukulele! ). Weâll discuss a more extreme case in the next subsection and suggest an appeal to metaphoric and other figurative meaning extensions to deal with such emoji usages. This appeal is in principle open to both symbolic and pictorial accounts, but Iâll argue that it works best with the pictorial account.
A good illustration of the flexibility of emoji meaning involves the well-known use of and to refer to somewhat taboo body parts and events involving them. On a symbolic account, we might theoretically give a lexical semantic interpretation of that includes both eggplants and male genitalia. But probably a more intuitive approach would have eggplants as the literal meaning and derive the other use as a secondary,23 non-literal meaning somehow. But what kind of non-literal meaning is this?
Lakoff (1993) uses the term âimage metaphorâ to describe a metaphorical interpretation based on visual resemblance between the literal meaning (âsourceâ) and the metaphorical interpretation (âtargetâ). He illustrates the phenomenon with linguistic examples like (26):
(26) | a. | My wife⯠whose waist is an hourglass. (André Breton, cited by Lakoff 1993) |
b. | His toes were like the keyboard of a spinet. (Rabelais, cited by Lakoff 1993) | |
c. | The road snaked through the desert (Barnden 2010) |
Note for instance that the waist in (26-a) is not in any way conceptually related to an hourglassâit doesnât help keep time, for instanceâit just looks like one.
Examples like (26) show the need for a general account of image metaphor, wholly independently of emoji or pictorial semantics. Without getting into the details here, weâll assume such an independent account, say Lakoffâs, or, more conveniently integrated within the kind of SDRT framework weâre already using, Agerri, Barnden, Lee, and Wallingtonâs (2007). Now, proponents of both the pictorial and the symbolic account could appeal to that account to explain the common non-literal interpretation of the eggplant emoji. Interestingly, since the symbolist makes the eggplant emoji roughly synonymous with a linguistic utterance of â(thereâs an) eggplantâ we would expect the eggplantâpenis metaphor to occur linguistically as wellâand occasionally it does:
(27) | The Warri pikin took to his IG account this morning to flaunt his eggplant in wet white underwear.24 |
Although examples like (27) are not hard to google, they really are not that common either. In fact, itâs quite possible that many of the linguistic instantiations of this particular image metaphor are derivative on the widespread emoji usage.
On my pictorial account of , we can readily explain why the emoji instantiation is so much more prominent than its linguistic cousin. The emoji literally tells us what the world looks like, and hence, when we get to the level of pragmatic enrichment, this pictorial content is readily extended with further image-based inferences. In cognitive processing terms, the basic pictorial semantics would predict engagement of the visual system in semantic processing already,25 so we can expect image-metaphoric pragmatic processing to be a natural follow-up. That is, to borrow more cognitive processing terminology, the image-metaphoric pragmatics is primed by the pictorial semantics.26
Generalizing beyond , the interpretation process I propose is as follows. The literal semantics of the emoji is projective: from some viewpoint, the world projects onto this, , which means it contains a red car shaped object. With this minimal meaning in hand (e.g., (mentally) represented in the form of a basic pictorial DRS condition), we enter into the realm of pragmatics, which includes various kinds of pragmatic enrichments, including the inference of coherence relations, properties, events, (as described in Section 3) but also, typically, finding a non-literal meaning whenever that fits the context better.2728
In the eggplant case, deriving this figurative meaning involved a rather pure image metaphor, for example, mapping the depicted vegetable to a body part on the basis of a visual resemblance. In other cases, the metaphor may be partly based on conceptual similarity. This is unproblematic for both symbolic and pictorial accounts, because since Lakoff and Johnson (2003) it is commonly accepted that metaphors involve similarities or analogies at the level of semantic content (i.e., mental concepts, in their cognitive semantic framework), rather than at a strictly linguistic level. This means that even if image metaphors are the most natural complements to our proposed pictorial semantics, any other kind of figurative interpretation we find in text or speech can in principle be applied to emojis as wellâafter all, on the projective account, pictures and linguistic utterances express similar semantic contents (viz., possible worlds propositions, or their dynamic equivalents, information states).
Letâs apply this to the examples illustrating the pictorial overdetermination challenge in Section 4.1. In the case of referring to a big black BMW, the interpreter may map the depicted red car shaped object to various makes and models of cars on the basis of a mixture of resemblance and conceptual similarity (viz., all being cars). In the case of referring to an electronic gift certificate, we have to move beyond image metaphor altogether and assume a purely conceptual mapping (from box with bow to gift card, on the basis of both being gifts).
Finally, in addition to metaphor, there are many metonymic uses of emojis that likewise require non-resemblance-based mappings. For instance, (literally) depicts an old-fashioned camera, but can be used metonymically to denote photos, and literally depicts a half avocado, but can be used metonymically to denote avocados generally, or healthy vegan food more generally (28-a), or even the typical consumers of said food (28-b).
(28) | a. | Forever mad at myself for taking so long to go vegan |
b. | Proud to be #Hipster |
To sum up, the pictorial account of emojis suggests a continuity between emoji semantics and image-metaphoric pragmatics, which correctly predicts the widespread use of image metaphors in emoji usage (see ). In addition, pragmatic emoji interpretation on my pictorial accountâas on any symbolic alternativeâmay also invoke (partly) conceptual metaphor, or various forms of metonymy. All kinds of non-literal meanings can be associated with any concept, whether itâs introduced symbolically by a word, or pictorially by a painting, animated gif, sticker, or emoji.
A proper description and formalization of “(vague) resemblance” (either projectively or non-projectively), of “conceptual similarity”, of hybrid image-based/conceptual metaphors, of the metonymy–metaphor distinction, of the integration of metaphoric meaning extensions in the coherence-driven SDRT account of pragmatic enrichment, and of the conventional entrenchment and eventual lexicalization of stale metaphors/metonyms over time, is all well beyond the scope of the current paper. To defend these substantial omissions I can only point out that accounts of all these phenomena are already independently needed for the proper analysis of any figurative meaning in any kind of language, and hence are in no way tied to the interpretation of emojis or pictures specifically.
Up until this point we have been focusing almost entirely on a specific subclass of emojis, viz., those depicting familiar, concrete objects. In actual usage, however, object emojis are decidedly less common than emojis depicting expressive parts of the body, especially faces and hands. According to emojitracker, the top 20 emojis include 14 face and 2 hand emojis, and 0 object or event emojis (there are also a recycling symbol and 3 types of heart emojis, which I already put aside as potentially symbolic rather than pictorial in Section 1).
Apart from some âsymbolic modifiersâ (like the heart-shaped eyes in , for which in Section 1 I deferred to Maier 2019), on my account these face emojis are just as pictorial as the object emojis discussed above, or as animated gifs, cartoons, or manga panels. Nonetheless, they are known to interact somewhat differently with the surrounding text. According to Grosz et al. (2021; in press), Kaiser and Grosz (2021), face emojis are expressives, meaning that they are used to express the speakerâs emotional state, roughly the same way verbal expressives do. Thus, the two utterances in (29) mean roughly the same: Kate said that Sue sent the report and I have a negative emotional attitude about that.
(29) | a. | kate said that sue sent the report to ann (Grosz et al. 2021) |
b. | kate said that sue sent the f cking report to ann (Grosz et al. 2021) |
Potts (2007) lists the defining characteristics of expressives: (i) their contribution is hard to paraphrase precisely in non-expressive terms; (ii) they are speaker-oriented (“indexical”) and (hence) unaffected by embeddings (though in some special cases they may be subject to pragmatic “perspective shift” in the sense of Amaral, Roberts, and Smith 2007; Harris and Potts 2010); and (iii) they are infinitely gradable (e.g., by varying intonation or repetition).
Emojis satisfy Pottsâs characteristics. Regarding (i), the in (29-a) indicates a negative attitude, but the linguistic paraphrase I gave above is just a rough approximation, not by any means semantically or pragmatically equivalent. Regarding (ii), Grosz et al. (2021) show that face emojisâunlike activity emojisâtend to express the emotional state of the producer of the utterance, while activity emojis can be anchored to any salient entity, depending on what connection creates the most coherent output.
(30) | a. | Sueâs on her way now | |
â ⯠and {Iâm/*sheâs} happy about that | |||
b. | Sueâs on her way now | ||
â ⯠and {*Iâm/sheâs} traveling by car |
Whatâs more, face emojis tend to project out of embeddings, while activity emojis can also be interpreted under negation:
(31) | a. | If I had gone, Iâd have missed Ada | |
â Iâm happy (that I didnât go, because now I could hang out with my friend Ada) | |||
âÌž If Iâd gone, Iâd have been happy (because then Iâd have missed that annoying Ada) | |||
b. | By now, Sue hasnât trained for months (Grosz et al. 2021) | ||
â surfing is part of the training29 |
Furthermore, Kaiser and Grosz (2021) show experimentally that face emojis are not always anchored to the actual speaker but, like linguistic expressives, may indeed be subject to a perspective shift. For instance, in constructions with a salient third-person experiencer argument, the face emoji may be interpreted as conveying either the speaker’s or the experiencer’s attitude:
(32) | Anna admired Betty | |
â Iâm glad about that | ||
â Anna has a positive attitude |
Regarding (iii), while face emojis themselves are not as flexible as Wittgensteinâs suggestion of drawing expressive faces by hand, their emotive content can be scaled indefinitely by creating sequences of similar or the same emojis (McCulloch & Gawne 2018):
(33) | Omgggggg heâs so cute |
The linguistic data reviewed above strongly suggest, first, that face emojis are semantically different from the object and event emojis that we discussed in the previous section and, second, that they seem to be expressives. In this section I reconcile these observations with my primary claim that emojis (face, object, and event emojis alike) are pictures. This requires that we first get clear on what expressives are and how to analyze them semantically (viz., in terms of use conditions). I then argue that many facial expressions and hand gestures are really expressives, and that face and hand emojis are “use-conditional pictures” of such expressive gestures.
Expressivism is the view that some linguistic constructions can express meaningful semantic content that does not contribute to the derivation of truth-conditional content. Philosophers and linguists, more or less independently of each other, have provided expressivist accounts of ethical and esthetic vocabulary, knowledge ascriptions, mental state self-ascriptions, exclamatives, epithets, slurs, etc. What exactly is expressed by these constructions or statements containing them is a matter of debate, ranging from the emotional state of the speaker (Ayer 1936; Potts 2007; Stevenson 1944) to a more abstract semantic notion like use-conditional content (Charlow 2015; Gutzmann 2015; Kaplan 1999; Predelli 2013). While Grosz et al. (2021) opt for a more Pottsian (2007) analysis (defining emotive content in terms of real intervals signifying emotional valence), Iâll introduce and adopt the latter, more minimalistic approach to expressive content, which has the benefit of not forcing the semanticist to make any assumptions about the underlying cognitive architecture of emotions.
The use-conditional analysis of expressives can be traced back, again, to Wittgenstein:
We ask, âWhat does âI am frightenedâ really mean, what am I referring to when I say it?â And, of course, we find no answer, or one that is inadequate. The question is: âIn what sort of context does it occur?â (Wittgenstein 1958)
In other words, expressive utterances are not amenable to a standard compositional semantic treatment in terms of reference and truth. âI am frightenedâ is not so much a (truth-evaluable) assertion about what the world is like, but rather an expression of the speakerâs emotional state. Instead of trying to capture the propositional content, that is, the set of worlds where the sentence is true, we should look for the âcontexts of useâ. While Wittgenstein himself takes this idea much further, turning âmeaning is useâ into a general characterization of linguistic meaning across the board, Kaplan (1999) offers a way to isolate this insight about the meaning of expressive vocabulary and integrate it into an otherwise traditional formal semantic framework.
There are words that have a meaning, or at least for which we can give their meaning, words like âfortnightâ and âferalâ. There are also words that donât seem to have a meaning, words like [âouchâ and âoopsâ]. If the latter have a meaning, theyâre at least hard to define. Still, they have a use, and those who know English know how to use them. (Kaplan 1999)
Gutzmann (2015) works out the details of semantic composition, adding significant extensions to Kaplanâs program. Iâll adopt some of Gutzmannâs implementation and notation below. The general idea is that in semantics we encounter two types of content: descriptive (or truth-conditional) and expressive (or use-conditional). Some expressions carry only descriptive content (âflowerâ, âwalkâ, âeveryâ) and combining them into a sentence will give us its truth conditions, in linguistics typically captured as a possible worlds proposition. In the following weâll use to denote the descriptive content of a term α.
To deal with indexicals, Kaplan (1989) had already introduced a second semantic parameter, c, to the semantics. That way we can model the truth-conditional proposition expressed by an utterance of an expression in a context, (34-b), as well as the more abstract “character” modelling the descriptive linguistic meaning of the sentence, (34-c). Notation: sp_c and ad_c denote the speaker/writer of context c and the hearer/reader/interpreter/addressee of c, respectively.
(34) |
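The relation between context, proposition, and character can be sketched as a function from contexts to functions from worlds to truth values. The sketch below rests on assumptions: the sentence “I am hungry” is a hypothetical example and not necessarily the one used in (34), toy worlds are plain dictionaries, and the names `Context` and `character_i_am_hungry` are introduced only for illustration.

```python
from typing import Callable, NamedTuple


class Context(NamedTuple):
    speaker: str    # plays the role of the speaker/writer of the context
    addressee: str  # plays the role of the hearer/reader of the context


World = dict  # toy world: maps "hungry" to the set of hungry individuals
Proposition = Callable[[World], bool]


def character_i_am_hungry(c: Context) -> Proposition:
    """Character of the hypothetical sentence 'I am hungry': context -> proposition."""
    return lambda w: c.speaker in w["hungry"]


w = {"hungry": {"ada"}}
said_by_ada = character_i_am_hungry(Context("ada", "stella"))
said_by_stella = character_i_am_hungry(Context("stella", "ada"))
```

The same character yields different propositions in different contexts: evaluated at the toy world `w`, the first is true and the second false.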
As we saw in the quotation above, the starting assumption of Kaplan’s (1999) expressivism was that there are some expressions that do not contribute to this truth-conditional content (or character), but instead express content we can model in terms of use conditions. Take Kaplan’s central example: “Oops”. While it’s weird to judge “oops” as either true or false, we can judge whether a particular “oops” was uttered felicitously on a given occasion by a given speaker. For instance, in a context where someone just saw a car run over and kill their family’s beloved pet, an “oops” would be infelicitous, because the word “oops” seems reserved for what Kaplan calls “minor mishaps”, as captured in the use condition in (35-a). We then define use-conditional content as a set of contexts, namely those where the expression is felicitously used, (35-b).
(35) | a. | use condition: an utterance of “oops” is felicitous in c iff the speaker just observed a minor mishap in c |
b. | use-conditional content: | |
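The step from a use condition to use-conditional content can be sketched as follows. This is a minimal illustration under stated assumptions, not Kaplan’s or Gutzmann’s formalization: contexts are reduced to a speaker plus a label for what the speaker just observed, and `Context`, `oops_is_felicitous`, and `use_conditional_content` are hypothetical names.

```python
from typing import NamedTuple


class Context(NamedTuple):
    speaker: str
    observed: str  # what the speaker just observed, e.g. "minor mishap"


def oops_is_felicitous(c: Context) -> bool:
    """Use condition in the spirit of (35-a)."""
    return c.observed == "minor mishap"


def use_conditional_content(use_condition, contexts):
    """The set of contexts in which the expression is felicitously used."""
    return {c for c in contexts if use_condition(c)}


# A hypothetical, finite stock of contexts.
CONTEXTS = [
    Context("ada", "minor mishap"),
    Context("stella", "tragedy"),
    Context("kate", "nothing"),
]
oops_content = use_conditional_content(oops_is_felicitous, CONTEXTS)
```

Just as a proposition is a set of worlds, the resulting content is a set of contexts; the pet example from the text corresponds to the excluded “tragedy” context.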
Gutzmann goes on to set up a type system to model the compositional contributions of hybrid and subsentential expressives, but first he introduces a nice fracture notation for Logical Forms (LF) that puts expressives (and their use-conditional interpretations) on top, and descriptions (and their truth-conditional interpretations) below.30
(36) | |
The full linguistic meaning of a sentence with some expressives is thus a pair consisting of a use-conditional content and a truth-conditional character.31
Emblematic gestures like the middle finger or waving goodbye are sometimes said to be non-verbal expressives (Ebert 2014). Indeed, we can easily verify this by checking off Pottsâs (2007) criteria, the way we already did for emojis above. For instance, the meaning of the middle finger gesture concerns the speakerâs attitude (towards their addressee); it is surely negative but hard to pin down with a purely descriptive paraphrase; and it can be graded continuously by exaggerating or repeating the gesture (or combining it with facial expressions or verbal expressives).32 Since these emblematic gestures are as much conventional, intentional, symbolic, and hence as âlanguage-likeâ as Kaplanâs verbal examples âouchâ and âoopsâ, it makes sense to analyze them semantically on a par, that is, as contributing use-conditional content.
(37) | a. | use condition: a use of the middle finger gesture is felicitous iff the speaker is very annoyed at the addressee |
b. | use-conditional content: | |
Crucially, as with verbal expressives, there are both felicitous and infelicitous uses of the middle finger. Someone who gives their neighbor the finger to greet her is doing something wrong, or at least breaking an established convention, as is someone who gives someone the finger while she is angry at someone else. Hence, the use-conditional content provided by the definition in (37) is non-trivial and arguably approximates the gestureâs core linguistic meaning.
Now, the same considerations apply to (some) facial expressions. Though a smile, unlike the middle finger gesture, is to some extent more natural and perhaps even culturally universal (Darwin 1872), and not always intentional or conscious, we can still say that it is felicitous if the speaker has a friendly disposition towards the addressee, and infelicitous otherwise.33 Notation: Iâm using an âoverlineâ notation borrowed from sign language studies to denote co-speech gestures.
(38) |
Facial expressions and face emojis both behave like expressives, as we verified by checking off Potts’s criteria in 5.3 and 5.1, respectively. But only face emojis are at the same time pictures. I’ve proposed viewing face emojis as pictures of facial expressions, which are in turn expressives. But this does not immediately explain why the emojis themselves behave like expressives.
To close the gap between “picture of expressive” and “expressive behavior” we could first try to appeal to some kind of pictorial transparency. A representational system is called transparent iff in that system a representation of a representation of X is itself (interpreted as) a representation of X (Kulvicki 2003; 2006). Some forms of pictorial representation are indeed sometimes viewed as transparent: a drawing of a drawing of a mountain is, arguably, in some cases, also a drawing of a mountain.34 Linguistic description, by contrast, is not usually transparent: a linguistic description of a linguistic description of a mountain is a description of a sequence of words, not of a mountain. What we really need for our current purposes is a cross-medial transparency principle that allows us to infer that a picture of a sentence or gesture expresses what that sentence or gesture expresses.
One complication in arguing for such a principle involves indexicality: a painting of an inscription that reads “I love you”, if indeed it expresses anything about love, does not necessarily express the painter’s love; likewise, a photo of Ada giving Stella the finger expresses not the photographer’s negative emotion, but (at best) Ada’s. Yet, as argued in 5.1, we need to account for the observation that a use of a face emoji, a picture of a facial expression, expresses the negative emotions of the current speaker. To get this right the transparency-theorist should then stipulate that face emojis depict the speaker, along with stipulating the relevant cross-medial transparency.
I prefer a slightly different route. I propose to extend Kaplan’s and Gutzmann’s distinction between descriptive words (with truth-conditional content) and expressive words (with use-conditional content) to the pictorial domain. While [entity emoji] depicts what the world looks like, [face emoji] depicts what the context looks like. More precisely, let’s capture the meaning of an “expressive picture” like [face emoji] in use-conditional terms:
(39) | use condition: a use of [face emoji] is felicitous in c iff c looks like this: [face emoji] |
Instead of saying that the picture is true of a world (and then letting pragmatic enrichment determine where, when, and how in the world things look that way), (39) defines when a picture is felicitously used in an utterance context. Saying that the context looks a certain way should be understood as saying that the world of the utterance context, seen from a canonical viewpoint associated with the utterance context, projects onto the given picture. I’ll assume that each utterance context determines a default, canonical viewpoint, v_c, which is the viewpoint that corresponds to someone looking straight at the current utterer.
(40) |
Paraphrasing (40): the use-conditional content of the face emoji is the set of utterance contexts in which the speaker looks like this: [face emoji].
To be sure, an actual tweeter doesn’t always actually wear such a big grin on her face while typing a [face emoji]. What (40) says is that she conveys (in a use-conditional way) to her addressee that she has such a facial expression. In other words, by using [face emoji] the speaker presents herself as looking that way, but as with any form of linguistic communication that presentation may involve some pretense, exaggeration, or even insincerity or deception.35
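One way to spell out this paraphrase formally, as a sketch under stated assumptions: $w_c$ is the world of the utterance context $c$, $v_c$ is its canonical viewpoint, $p$ stands for the emoji picture itself, and $\Pi$ is a placeholder name for the stylized projection function. The symbols $\Pi$, $p$, $w_c$, and the superscript $u$ are illustrative and need not match the notation of the original (40).

```latex
% Hedged reconstruction of the use-conditional content paraphrased in (40).
% \Pi, p, and w_c are hypothetical names introduced for illustration.
\[
  [\![ p ]\!]^{u}
  \;=\;
  \{\, c \mid \Pi(w_c, v_c) = p \,\}
\]
% Read: the set of contexts c whose world w_c, seen from the canonical
% viewpoint v_c, projects onto the picture p.
```

Because $v_c$ is fixed by the context as the viewpoint of someone looking straight at the utterer, membership in this set amounts to the speaker of $c$ looking like the picture.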
We can now analyze descriptive text adorned with expressive face emojis as follows:
(41) |
The fact that the picture depicts the context from its canonical viewpoint v_c now gives us the observed speaker orientation of face emojis. But we don’t yet see any of the expected emotional content in (41).
Instead of appealing to a general stipulation of cross-medial transparency, we can actually derive the emotional content of the emoji pragmatically by simply combining the use-conditional pictorial content in (41) and the use-conditional content of the smile. Let’s consider, step by step, what the use of the emoji in (41) is communicating to the receiver of the text message. Assuming, in Gricean fashion, that the speaker is cooperative, their use of the picture in c must have been felicitous, which entails that the context, or more specifically the speaker, looks like [face emoji] (where “looks like” is understood projectively, relative to Apple’s stylized projection function). The speaker’s looking like that smiley face plausibly entails that the speaker of the context was smiling.36 Finally, by the use-conditional semantics of smiling (a smile is felicitous in c iff sp_c has a positive disposition towards ad_c in c, = (38)) we can infer that a (cooperative) speaker who is smiling has a positive disposition towards their addressee. By this chain of rational reasoning, we have effectively composed the use-conditional pictorial semantics of face emojis with that of facial expressions and thereby pragmatically derived exactly the kind of transparency we need.
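The chain of reasoning just described can be laid out schematically. This is only an illustrative rendering: $p$ stands for the face emoji picture, the bracket notation with superscript $u$ is an assumed shorthand for use-conditional content, and steps (iii) and (iv) are plausible pragmatic inferences rather than strict logical entailments.

```latex
% Schematic layout of the pragmatic derivation; labels and notation are illustrative.
% Requires amsmath for align* and \text.
\begin{align*}
  &\text{(i)}   && \text{the use of } p \text{ in } c \text{ is felicitous}
      && \text{(Gricean cooperativity)} \\
  &\text{(ii)}  && c \in [\![ p ]\!]^{u}\text{, i.e. } sp_c \text{ looks like } p \text{ from } v_c
      && \text{(by (i) and the pictorial use condition)} \\
  &\text{(iii)} && sp_c \text{ is smiling in } c
      && \text{(plausible consequence of (ii))} \\
  &\text{(iv)}  && sp_c \text{ has a positive disposition towards } ad_c \text{ in } c
      && \text{(by (iii) and the use condition for smiles, (38))}
\end{align*}
```

The emotional content thus enters only at step (iv), through the independently motivated semantics of the smile, which is why no separate transparency stipulation is needed.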
The two-stage pictorial account of face emojis presented above, though consisting of a few more moving parts than an expressive symbol account like Grosz et al.’s (2023), retains the general benefits of a pictorial account already listed in Section 1. Let’s briefly revisit these benefits, applied specifically to expressive (face and hand gesture) emojis.
First, the pictorial account treats face emojis as continuous with other pictures of expressive facial expressions, like the one in (42), which are likewise typically used to express the message author’s emotional state.37
(42) |
Second, it naturally predicts the kind of creative usage of emojis like [face emoji], which can be used to indicate a host of psychologically unrelated but outwardly similar-looking states, from physical exertion to helplessness, as illustrated in (1) (Section 1). Instead of a multiply ambiguous term, the pictorial account treats it simply as a picture of what the speaker looks like, leaving the exact nature of the underlying physical or mental state unspecified, only to be filled in pragmatically in context.
Third, on lexical expressive accounts like Grosz et al. (2021; 2023) the recent rise of gender- and skin-tone-specific emoji variants is quite puzzling. Let’s look at a hand gesture emoji, [emoji], since face emojis don’t have skin-tone-specific versions (yet). For the symbolist, [emoji] literally (if use-conditionally or emotively) expresses that the speaker is giving approval (see Grosz et al. 2023 for a very detailed account along these lines). But then why would we want a [skin tone variant] and a [skin tone variant] to evidently express the exact same thing? On the pictorial account the use of a skin-tone-matched emoji variant is only natural, given the pictorial use condition: [emoji] is used felicitously by me now if my hand now projects onto [emoji]. My hand projects more straightforwardly (i.e., with fewer abstraction and stylization transformations) onto a skin-tone-matching version than onto a default yellow version, and it arguably doesn’t project at all onto a clearly mismatched one. Exactly the features of the pictorial analysis that gave rise to the pictorial overdetermination challenge (Section 4.1) now explain the use of skin-tone-matching emojis.38
I have argued that emojis can be analyzed quite literally as “little pictures”. Not lexical expressives, typographic gestures, anaphoric event descriptions, or diagrams, but pictures that, like photos or drawings, inform us what the world looks like. I thus proposed a formal semantics of emojis in terms of geometric projection, as used also to model the semantic interpretation of pictures and visual narratives (Abusch 2012; Greenberg 2013).
A lot of the communicative work that emojis do in computer-mediated communication, for example, elaborating on eventualities described in the text or expressing speaker emotions, relies on various kinds of pragmatic inferences on the basis of the rather minimal semantic content provided by the geometric semantics. I’ve discussed pragmatic enrichment through the inference of coherence relations and their presupposed events, and through metaphor and metonymy. When it comes to face and hand emojis I’ve discussed how to pragmatically link a use-conditional picture semantics with a use-conditional semantics of facial expressions and other gestures.
On the account developed here, emojis and text can combine to form genuinely multimodal discourse. The text–picture integration analysis I’ve proposed here immediately extends to the use of other arguably pictorial elements commonly inserted in text messages, like emoticons or ASCII drawings, but also more obviously pictorial elements like stickers and animated gifs. The framework is also partly continuous with, and indeed inspired by, semantic accounts of multimodal text–image combinations in more static print media, like comics or instruction manuals. In Section 5 we went beyond those static types of multimodality by looking at the expressive usage of the subclass of emojis depicting faces and hands, which is mainly useful in interactive communication like chat, text, or Twitter. Here my account incorporates insights from semantic accounts of expressivity in (spoken) language and gesture.
Many issues in the semantics and pragmatics of emojis remain wide open. On the semantic side I’d like to gain a better understanding of the large grey area between arguably pictorial emojis ([emoji], [emoji], [emoji]) and arguably symbolic emojis ([emoji], [emoji]), and of the integration of symbolic and pictorial elements inside a single emoji ([emoji]). On the pragmatic side I’d like first and foremost to gain a better understanding of the different types of figurative interpretations that we ultimately have to appeal to in order to extend the use of emojis beyond the entities, people, or events they literally depict. Since these are big issues that have deserved formal philosophical and linguistic scrutiny independently of emojis, I will leave it at this for now.
(i) | I really like you, you fucking asshole. |
This work is supported by NWO Vidi grant 276-80-004 (FICTION). Thanks to Tatjana Scheffler, Dorit Abusch, Patrick Grosz, Sarah Zobel, Dolf Rami and online audiences at Sinn und Bedeutung 25, the Bochum Language Colloquium, and the VICOM workshop at DGfS 2022 for discussion. Huge thanks to two anonymous journal referees that provided extensive and very constructive comments.
1 Abusch, Dorit (2012). Applying Discourse Semantics and Pragmatics to Co-reference in Picture Sequences. Proceedings of Sinn und Bedeutung, 17, 9–25.
2 Abusch, Dorit (2020). Possible-Worlds Semantics for Pictures. In Daniel Gutzmann, Lisa Matthewson, Cécile Meier, Hotze Rullmann, and Thomas Ede Zimmermann (Eds.), The Wiley Blackwell Companion to Semantics (1–31). John Wiley & Sons.
3 Abusch, Dorit and Mats Rooth (2017). The Formal Semantics of Free Perception in Pictorial Narratives. Proceedings of the Amsterdam Colloquium, 21, 85–95.
4 Agerri, Rodrigo, John Barnden, Mark Lee, and Alan Wallington (2007). On the Formalization of Invariant Mappings for Metaphor Interpretation. Proceedings of ACL, 45, 109–112.
5 Altshuler, Daniel and Julian Schlöder (2021). If Pictures Are Stative, What Does This Mean for Discourse Interpretation? Proceedings of Sinn und Bedeutung, 25, 19–36.
6 Amaral, Patricia, Craige Roberts, and E. Allyn Smith (2007). Review of the Logic of Conventional Implicatures by Chris Potts. Linguistics and Philosophy, 30(6), 707–49. http://doi.org/10.1007/s10988-008-9025-2
7 Asher, Nicholas and Alex Lascarides (2003). Logics of Conversation. Cambridge University Press.
8 Ayer, Alfred (1936). Language, Truth and Logic. Victor Gollancz.
9 Barach, Eliza, Laurie Feldman, and Heather Sheridan (2021). Are Emojis Processed like Words?: Eye Movements Reveal the Time Course of Semantic Processing for Emojified Text. Psychonomic Bulletin & Review, 28(1), 978–91. http://doi.org/10.3758/s13423-020-01864-y
10 Barnden, John (2010). Metaphor and Metonymy: Making Their Connections More Slippery. Cognitive Linguistics, 21(1), 1–34.
11 Brasoveanu, Adrian and Jakub Dotlacil (2015). Incremental and Predictive Interpretation. Semantics and Linguistic Theory (SALT), 25(1), 57–81.
12 Charlow, Nate (2015). Prospects for an Expressivist Theory of Meaning. Philosophers’ Imprint, 15(23), 1–43.
13 Cohen, Jonathan and Andrew Kehler (2021). Conversational Eliciture. Philosophers’ Imprint, 21(12), 1–26.
14 Cohn, Neil (2013). Beyond Speech Balloons and Thought Bubbles: The Integration of Text and Image. Semiotica, 2013(197), 35–63. http://doi.org/10.1515/sem-2013-0079
15 Cohn, Neil, Jan Engelen, and Joost Schilperoord (2019). The Grammar of Emoji? Constraints on Communicative Pictorial Sequencing. Cognitive Research: Principles and Implications, 4, Article 33. http://doi.org/10.1186/s41235-019-0177-0
16 Cumming, Samuel, Gabriel Greenberg, and Rory Kelly (2017). Conventions of Viewpoint Coherence in Film. Philosophers’ Imprint, 17(1), 1–29.
17 Danesi, Marcel (2016). The Semiotics of Emoji: The Rise of Visual Language in the Age of the Internet. Bloomsbury.
18 Darwin, Charles (1872). The Expression of the Emotions in Man and Animals. John Murray.
19 Davidson, Donald (1967). The Logical Form of Action Sentences. In Nicholas Rescher (Ed.), The Logic of Decision and Action (81–95). University of Pittsburgh Press.
20 Ebert, Cornelia (2014). The Non-at-Issue Contributions of Gestures and Speculations about Their Origin. Slides. Retrieved from author’s homepage http://www.cow-electric.com/neli/talks/CE-demonstration-stuttgart.pdf
21 Ekman, Paul, Richard Davidson, and Wallace Friesen (1990). The Duchenne Smile: Emotional Expression and Brain Physiology: II. Journal of Personality and Social Psychology, 58(2), 342–53. http://doi.org/10.1037/0022-3514.58.2.342
22 Fillmore, Charles (2008). Frame Semantics. In Dirk Geeraerts (Ed.), Cognitive Linguistics: Basic Readings (373–400). De Gruyter Mouton.
23 Gawne, Lauren and Gretchen McCulloch (2019). Emoji as Digital Gestures. Language@Internet, 17(2). https://www.languageatinternet.org/articles/2019/gawne
24 Geurts, Bart, David Beaver, and Emar Maier (2020). Discourse Representation Theory. In Edward Zalta (Ed.), The Stanford Encyclopedia of Philosophy. https://plato.stanford.edu/archives/spr2020/entries/discourse-representation-theory/
25 Giardino, Valeria and Gabriel Greenberg (2015). Introduction: Varieties of Iconicity. Review of Philosophy and Psychology, 6(1), 1–25. http://doi.org/10.1007/s13164-014-0210-7
26 Ginzburg, Jonathan, Chiara Mazzocconi, and Ye Tian (2020). Laughter as Language. Glossa: A Journal of General Linguistics, 5(1), 1–51. http://doi.org/10.5334/gjgl.1152
27 Goodman, Nelson (1976). Languages of Art: An Approach to a Theory of Symbols. Hackett Publishing.
28 Greenberg, Gabriel (2013). Beyond Resemblance. Philosophical Review, 122(2), 215–87. http://doi.org/10.1215/00318108-1963716
29 Greenberg, Gabriel (2019). Tagging: Semantics at the Iconic/Symbolic Interface. Proceedings of the Amsterdam Colloquium, 22, 11–20.
30 Greenberg, Gabriel (2021). Semantics of Pictorial Space. Review of Philosophy and Psychology, 12(1), 847–87. http://doi.org/10.1007/s13164-020-00513-6
31 Grosz, Patrick, Elsi Kaiser, and Francesco Pierini (2021). Discourse Anaphoricity and First-Person Indexicality in Emoji Resolution. Proceedings of Sinn und Bedeutung, 25(1), 340–57.
32 Grosz, Patrick, Gabriel Greenberg, Christian De Leon, and Elsi Kaiser (2023). A Semantics of Face Emoji in Discourse. Linguistics and Philosophy, 46, 905–957. https://doi.org/10.1007/s10988-022-09369-8
33 Gutzmann, Daniel (2015). Use-Conditional Meaning: Studies in Multidimensional Semantics. Oxford University Press.
34 Hagen, Margaret (1986). Varieties of Realism: Geometries of Representational Art. Cambridge University Press.
35 Harris, Jesse and Christopher Potts (2010). Perspective-Shifting with Appositives and Expressives. Linguistics and Philosophy, 32(6), 523–52. http://doi.org/10.1007/s10988-010-9070-5
36 Hobbs, Jerry (1979). Coherence and Coreference. Cognitive Science, 3(1), 67–90. http://doi.org/10.1207/s15516709cog0301_4
37 Hobbs, Jerry (1990). Literature and Cognition. CSLI.
38 Kaiser, Elsi and Patrick Grosz (2021). Anaphoricity in Emoji: An Experimental Investigation of Face and Non-Face Emoji. Proceedings of the Linguistic Society of America, 6(1), 1009–23. http://doi.org/10.3765/plsa.v6i1.5067
39 Kamp, Hans (1981). A Theory of Truth and Semantic Representation. In Jeroen Groenendijk, Theo Janssen, and Martin Stokhof (Eds.), Formal Methods in the Study of Language (277–322). Mathematical Centre Tracts.
40 Kaplan, David (1989). Demonstratives. In Joseph Almog, John Perry, and Howard Wettstein (Eds.), Themes from Kaplan (481–614). Oxford University Press.
41 Kaplan, David (1999). The Meaning of “Ouch” and “Oops”: Explorations in the Theory of Meaning as Use. Unpublished manuscript. http://eecoppock.info/PragmaticsSoSe2012/kaplan.pdf
42 Kehler, Andrew (2002). Coherence, Reference, and the Theory of Grammar. University of Chicago Press.
43 King, Alex (2018). A Plea for Emoji. American Society for Aesthetics Newsletter, 38(3), 1–3.
44 Kulvicki, John (2003). Image Structure. The Journal of Aesthetics and Art Criticism, 61(4), 323–40.
45 Kulvicki, John (2006). Pictorial Representation. Philosophy Compass, 1(6), 535–46.
46 Kulvicki, John (2013). Images. Routledge.
47 Lakoff, George (1993). The Contemporary Theory of Metaphor. In A. Ortony (Ed.), Metaphor and Thought (202–51). Cambridge University Press.
48 Lakoff, George and Mark Johnson (2003). Metaphors We Live By. University of Chicago Press.
49 Lascarides, Alex and Matthew Stone (2009). A Formal Semantic Analysis of Gesture. Journal of Semantics, 26(4), 393–449. http://doi.org/10.1093/jos/ffp004
50 Maier, Emar (2019). Picturing Words: The Semantics of Speech Balloons. Proceedings of the Amsterdam Colloquium, 22, 584–92.
51 Maier, Emar and Sofia Bimpikou (2019). Shifting Perspectives in Pictorial Narratives. Sinn und Bedeutung, 23(2), 91–106. http://doi.org/10.18148/sub/2019.v23i2.600
52 Mann, William and Sandra Thompson (1988). Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text, 8(3), 243–81.
53 McCulloch, Gretchen and Lauren Gawne (2018). Emoji Grammar as Beat Gestures. Proceedings of the International Workshop on Emoji Understanding and Applications in Social Media, 1(1), 1–4.
54 Nouwen, Rick, Adrian Brasoveanu, Jan van Eijck, and Albert Visser (2022). Dynamic Semantics. In Edward N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy. https://plato.stanford.edu/archives/fall2022/entries/dynamic-semantics/
55 Pagin, Peter (2014). Pragmatic Enrichment as Coherence Raising. Philosophical Studies, 168(1), 59–100. http://doi.org/10.1007/s11098-013-0221-8
56 Pasternak, Robert and Lyn Tieu (2022). Co-Linguistic Content Inferences: From Gestures to Sound Effects and Emoji. Quarterly Journal of Experimental Psychology, 75(10), 1828–43. http://doi.org/10.1177/17470218221080645
57 Peirce, Charles S. (1868). On a New List of Categories. Proceedings of the American Academy of Arts and Sciences, 7(1), 287–98.
58 Pierini, Francesco (2021). Emojis and Gestures: A New Typology. Proceedings of Sinn und Bedeutung, 25, 720–32. http://doi.org/10.18148/sub/2021.v25i0.963
59 Potts, Christopher (2007). The Expressive Dimension. Theoretical Linguistics, 33(2), 165–98. http://doi.org/10.1515/TL.2007.011
60 Predelli, Stefano (2013). Meaning without Truth. Oxford University Press.
61 Recanati, Francois (2010). Pragmatic Enrichment. In Delia Fara and Gillian Russell (Eds.), Routledge Companion to Philosophy of Language (67–78). Routledge.
62 Rooth, Mats and Dorit Abusch (2017). Picture Descriptions and Centered Content. Proceedings of Sinn und Bedeutung, 21(2), 1051–64.
63 Rooth, Mats and Dorit Abusch (2019). Indexing across Media. Proceedings of the Amsterdam Colloquium, 22, 612–24.
64 Scheffler, Tatjana, Lasse Brandt, Marie de la Fuente, and Ivan Nenchev (2022). The Processing of Emoji-Word Substitutions: A Self-Paced-Reading Study. Computers in Human Behavior, 127(1), 1–11. http://doi.org/10.1016/j.chb.2021.107076
65 Schlenker, Philippe (2018). Visible Meaning: Sign Language and the Foundations of Semantics. Theoretical Linguistics, 44(3–4), 123–208. http://doi.org/10.1515/tl-2018-0012
66 Shin, Sun-Joo, Oliver Lemon, and John Mumma (2018). Diagrams. In Edward N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy. https://plato.stanford.edu/archives/win2018/entries/diagrams/
67 Stevenson, C. L. (1944). Ethics and Language. Yale University Press.
68 Tang, Mengmeng, Bingfei Chen, Xiufeng Zhao, and Lun Zhao (2020). Processing Network Emojis in Chinese Sentence Context: An ERP Study. Neuroscience Letters, 722(1), 134815. http://doi.org/10.1016/j.neulet.2020.134815
69 Viebahn, Emanuel (2019). Lying with Pictures. The British Journal of Aesthetics, 59(3), 243–57. http://doi.org/10.1093/aesthj/ayz008
70 Wildfeuer, Janina (2014). Film Discourse Interpretation: Towards a New Paradigm for Multimodal Film Analysis. Routledge.
71 Wildfeuer, Janina (2019). The Inferential Semantics of Comics Panels and Their Meanings. Poetics Today, 40(2), 215–34. http://doi.org/10.1215/03335372-7298522
72 Wittgenstein, Ludwig (1958). Philosophical Investigations. Basil Blackwell.
73 Wittgenstein, Ludwig (1966). Lectures and Conversations on Aesthetics, Psychology and Religious Belief. Cyril Barrett (Ed.). Basil Blackwell.