Mapping Out Narrative Structures and Dynamics Using Networks and ...

7 downloads 932 Views 6MB Size Report
Mar 24, 2016 - CL] 24 Mar 2016 ... ical setting of “a galaxy far, far away” is presented, the ..... S2, and S3 in Fig. ??) that suggest they are the Expo-.
Mapping Out Narrative Structures and Dynamics Using Networks and Textual Information Semi Min and Juyong Park

arXiv:1604.03029v1 [cs.CL] 24 Mar 2016

Graduate School of Culture Technology, KAIST, Daejeon, Korea 34141 Human communication is often executed in the form of a narrative, an account of connected events composed of characters, actions, and settings. A coherent narrative structure is therefore a requisite for a well-formulated narrative – be it fictional or nonfictional – for informative and effective communication, opening up the possibility of a deeper understanding of a narrative by studying its structural properties. In this paper we present a network-based framework for modeling and analyzing the structure of a narrative, which is further expanded by incorporating methods from computational linguistics to utilize the narrative text. Modeling a narrative as a dynamically unfolding system, we characterize its progression via the growth patterns of the character network, and use sentiment analysis and topic modeling to represent the actual content of the narrative in the form of interaction maps between characters with associated sentiment values and keywords. This is a network framework advanced beyond the simple occurrence-based one most often used until now, allowing one to utilize the unique characteristics of a given narrative to a high degree. Given the ubiquity and importance of narratives, such advanced network-based representation and analysis framework may lead to a more systematic modeling and understanding of narratives for social interactions, expression of human sentiments, and communication.

I.

INTRODUCTION

Recent advances in quantitative methodologies for the modeling and analyses of large-scale heterogeneous data have enabled novel understanding of various complex systems from the social, technological, and biological domains [1]. The field of application is also rapidly expanding, now including the traditional academic fields of cultural studies humanities. It is allowing researchers to obtain novel answers to both long-standing problems by finding complex patterns that were previously hidden. Recent examples include high-throughput analyses of language and literature based on massive digitization of books (e.g., Project Gutenberg [2] and Google Books) and proliferation of social media [1, 3], emergent processes in cultural history [4], and scientific analysis of art [5]. A theoretical data modeling and analysis framework that has attracted attention for cultural studies is networks [4, 6]. Network science attempts to understand the structure and behavior of a complex system from the connection and interaction patterns between its components [7–10]. Owing to its flexibility as a modeling framework, network science has led to a novel understanding of many systems that are not only easily recognizable as a network such as the Worldwide Web [11, 12], the Internet [13], but also those that have been extensively studied in non-network contexts such as biological systems or social organizations [14, 15]. In this paper we propose a network science-based framework for a cultural system that is ubiquitous in society and boasts a long history of study but we believe still can benefit from one: Narratives. Narratives (or stories) are important in that they are the most common way in which we communicate and recount our experiences. The connection between networks and narratives can also be seen in the very definition of the word: The New Oxford

American Dictionary, for instance, defines narrative as “a spoken or written account of connected events.” This suggests that using networks may help us understanding how the various building blocks of narratives are weaved to become a coherent structure for effective delivery of messages and arousal of emotions. This way of thinking about narratives is deeply correlated with an interesting recent movement in literary studies named “distant reading” proposed by Moretti [16–18]. Distant reading is an approach to literature based on processing large amounts of literary data to devise and construct general “models” of narratives to understand them as a class, in contrast to reading each work very closely (hence the term “distant”) to understand it. A model constructed through reduction and abstraction, the reasoning goes, would enable us to grasp the general underlying structures and patterns of a class of complex objects called narrative, as an X-ray machine would allow us to understand the general skeletal features of the human body. To many of us this way of thinking is familiar as the very principle of research in the natural sciences: To understand a system, one collects data and performs statistical analysis based on abstract models to gain an understanding of the general characteristics of the system. A model of a system has the following characteristics. An abstract representation or notion of a system, a model necessarily excludes some features of the system it is representing. A random exclusion of features, of course, is unlikely to result in a useful model; it is important to make a judicious choice on which features to retain and which to exclude so that the model incorporates important or essential of the system. Of course, it is very difficult to know beforehand which is the best choice of features. One practical starting point can be a common description of the system by people, since such a description is already a type of mental representation which can be viewed as a model, however rudimentary. The network

2 model of a narrative, from this perspective, appears to be sensible and immediately understandable; in many instances when we recount a story, we focus prominently on the characters and their relationships. Take the Star Wars movie franchise, for instance, the top grossing space opera in modern times [19, 20]. Once the generic physical setting of “a galaxy far, far away” is presented, the story progresses via the character’s actions, adventures, and relationships and interactions with others; that Leia Organa and Luke Skywalker are twins play an crucial role in their fate, and the revelation via “I am your father” is perhaps the most memorable scene in the narrative. In addition to these individual relationships, group-level relationships are important as well for the story, such as the Empire versus the Rebel Alliance, the dark side versus the light side of the Force, etc. Examples abound from history: The story of Oedipus that precedes Star Wars in terms of shocking familial revelation; Dexter, a favorite American TV show of one of the authors, is a series of episodes that portray the titular character navigating his social world of his sibling, family, and rival criminals [50] These examples all function as empirical bases for approaching narratives from the network modeling framework. One of the earliest models proposed for narratives in the distant reading philosophy introduced above, in fact, was networks. Moretti applied the network framework to Shakespeare’s Hamlet for detecting specific regions in the plot, and performed many experiments such as extracting specific nodes in the network of characters to observe changes and make comparisons between different networks. Other network-based studies of narrative include the study of the community structure of the character network in Victor Hugo’s Les Mis´erables [21], the social networks of characters based on conversation in 19th-century British novels [22], networks of mythologies and sagas [23–25], and more recently, a technique for dialog detection in novels applied to writer J. K. Rowling’s Harry Potter series [26]. While these serve to demonstrate the scientific community’s interest in networkbased understanding of narratives, most works are limited to the story of the static topological properties of networks found in the said stories, when we know that narratives are essentially dynamically progressing entities, and the text itself is a source of much information that can be used extensively. Given the wide range of analytical and computational tools that constitute network science, we believe there is much opportunity for further studies of narratives using networks that take into account such essential aspects. This paper is intended to be one such attempt inspired by those works, hopefully laying out potential future directions in utilizing network science and computational linguistics for understanding the dynamics of narratives in a systematic manner.

II. A.

MATERIALS AND METHODS

Material: Victor Hugo’s Les Mis´ erables

We analyze Victor Hugo’s novel Les Mis´erables using the methods introduced in this paper. Set around the popular uprising in Paris in 1832 CE, Les Mis´erables is known for its vivid depiction of the conditions of the tumultuous times and intuition into the human psyche via multiple intersecting plots involving richly developed characters [27]. Its main plot follows fugitive Jean Valjean’s trajectory that shows him transform into a force for good while being constantly haunted by his criminal past. During his journey he interacts with many characters, some helpful and friendly, and others antagonistic and hostile. The most important characters of the novel include the following: • Fantine: A young woman abandoned with daughter Cosette early in the novel. She later leaves Cosette in the care of the Thenadiers, who then abuse her. She is rescued by Valjean when Javert arrests her on charge of assaulting a man. • Cosette: Fantine’s daughter, later adopted by Valjean. Under Valjean’s care she grows into a beautiful woman, and falls in love with Marius. • Marius: A young man associated with the “Friends of the ABC (Les Amis de l’ABC in French),” a group of revolutionaries. He is critically wounded at the barricade, but is rescued by Valjean. He later marries Cosette. • Javert: A police inspector in a relentless pursuit of Valjean. After being rescued by Valjean at the barricades and realizing the immorality of the old French system he has served loyally, he commits suicide. • Thenadier : A wretched man who abuses young Cosette. A lifetime schemer of robbery, fraud, and murder, he conspires to rob Valjean until Marius stops him, and gets arrested by Javert. B.

Method: Interacting Timelines and Network Construction

The widely-accepted essential building blocks of a narrative are characters (also called agents or actants), events, and the causal or temporal relationships that weave them together [28, 29]. An interrelated sequence composed of those elements is called a plot which may be viewed as the backbone of a narrative. A narrative may also be broken down into formal units such as acts, scenes, chapters, etc. [30]. Historically there have been many attempts to establish a general form of narrative structure, of which a well-known example is Aristotle’s three-act plot structure theory. It states that Act One

3 presents the central theme and questions, followed by Acts Two and Three that present major turning points and conclusion. Variant forms exist such as the four-act structure theory [31, 32]. While these have existed for a long time and been widely applied, we find it difficult to imagine that there is an a priori reason for all narratives to consist of three or four parts. Then, can we deduce the structure of a narrative from the narrative itself? It appears that the increasing availability of narrative texts in digital format and analytical methods for data analysis offer an opportunity for a new look at narrative structures, and the formulation of a flexible framework that can properly capture the complexity of a given narrative. An interesting pair of concepts helpful for picturing the content of a narrative that serve as the basis for formalism was given by Propp [33] who, while trying to establish a symbolic notation-based formalism for Russian folktales, proposed that narrative content consists of two layers that he labeled the fabula and the sjuzet. The fabula refers to the entire world that contains the narrative, while the sjuzet refers to those elements of that world explicitly presented to the audience. For instance, if the narrative is depicting a man dining with his family in his home, the sjuzet comprises the man and his family (the characters), the act of dining (the event), and his home (the place), while the fabula is all of the above plus the rest of the story world such as the man’s colleagues at work, their concurrent actions and whereabouts, etc. The sjuzet therefore can be considered the part of the story world currently under observation, and the rest of the fabula the part that “operates” in the background. Each component of the fabula may or may not become sjuzet (explicitly presented to the audience) at another point in the narrative, but they are nevertheless indispensable for the consistency of the story world and future plot development via implicit action. We start by representing a narrative as a set of character timelines, basically the record of a character’s appearances in the narrative. The point of appearance is marked in narrative units which can be scenes, chapters, etc., shown in Fig. 1. In our paper we follow the convention used in the construction of the character network from Victor Hugo’s Les Mis´erables in Ref. [21]: two characters are connected in the network when they appear in the same narrative unit. An interaction defined in this fashion would be more general than direct conversations, as it could include a common experience or shared space in addition to a conversation. The narrative format can also present some practical issues in defining an interaction. In a play or movie script, for instance, it would be much easier to identify a conversation between characters as an interaction, which would be more explicit and narrower in scope. An online resource named moviegalaxies provides a collection of social networks of characters in hundreds of movies built in this way, along with static network properties such as the diameters and clustering coefficients [34]. Using this narrower defini-

tion in a novel is potentially problematic: it is difficult to detect conversations in a novel (though some advances have been recently made [26]), but more fundamentally it would miss non-verbal interactions which exist abundantly in a novel. For this reason, it is difficult to state at this point which would be a better approach. Perhaps a comparison study could be illustrating, although it is out of the scope of this work [18]. The rest of this paper is dedicated to exploring what the character network based on Fig. 1 can tell us about the narrative structure and how it progresses. The methodology will be demonstrated using the English translation of Victor Hugo’s Les Mis´erables [35], although it will be clear that the formalism itself applicable to any comparable narrative. Our choice of Hugo’s work is based on its stature as a classic known for a set of richly developed characters [27], familiarity in network science [21], and the free availability of the complete text on Project Gutenberg. Using the original French version would be ideal, but we point out that the network construction according to Fig. 1 is unaffected, and the wider availability of advanced computational linguistic tools for the English language does provide advantages for incorporating the text for enriched analyses that will be demonstrated in the latter part of the work. In Fig. 1 (B) we show the narrative units in Les Mis´erables on several levels. From top to bottom, they are the Chapters (colored according to their Sentiment Polarity Index defined in Sec. II C 1), Books (groups of Chapters), Sequences (groups of Books), and Volume (even larger groups of Books). All but the Sequences, whose definition is given later in Sec. III 4, are by the author’s designation. It is reasonable to assume that the author intended each unit to represent a theme or subplot. The five Volumes of Les Mis´erables, for instance, are titled “Fantine,” “Cosette,”, “Marius,” “The Idyll in the Rue Plumet and the Epic in the Rue St. Denis,”, and “Jean Valjean,” indicating their central character or plot. Since the different narrative units offer a varying degree of resolution of the narrative, one may again choose the one that is most useful for their purposes. For our goal of studying the complexity of Les Mis´erables, however, the five Volumes appear too few; we therefore choose to work with the Chapters (of which there are 365) for most purposes, and the Sequence in a later analysis.

1.

Network Topology and Growth Patterns

From the network of characters built based on the Interacting Timelines of Fig. 1 (A) we can measure various static network properties. But a narrative is essentially a dynamical system that unfolds in time; what interests a reader is how the story is told in time, not necessarily the final, static network of characters. We need to study how the network grows over time and what we can learn about the narrative from it. This is because the network growth is essentially coupled to the narrative flow:

4 A Narrative Units, Interacting Character Timelines, and Character Network Construction

Network Construction

Character Timelines

Narrative Units

Characters

B Valjean Book1

2

Sequence 1

3

4

5

2 Volume 1

6

7

3

8

10

4

11

12 13

5 Volume 2

14

18

19

20

21

22

6

7

8

9

10

23

24

26

11

27 28 29

12

Volume 3

30 32 33 35 36

13 14

15

16

Volume 4

38

39

40

17 18

37

19

20

42 43 44 45 46 4748

21 Volume 5

FIG. 1: Interacting Timeline Framework for Network Modeling of Narratives. (A) Construction of the character network from a narrative. We represent the narrative as a set of character timelines, the record of appearances of the characters in narrative units (e.g. chapters, scenes, etc.). An interaction can be defined as co-appearance in a narrative unit. (B) A narrative unit is not unique. One may use the author’s designation (i.e. the Volumes, Books, or Chapters in a novel) or define a new one such as the Sequence based on the unit-to-unit continuity of character compositions (defined in Sec. III 4). The narrative units in Victor Hugo’s Les Mis´erables are shown here, from the finest (Chapters, top) to the coarsest (Volumes, bottom).

Starting from an empty network in the beginning of the narrative, the network grows as new characters are introduced and interact with others. In this sense, we can say that the temporal growth of the network is intimately connected to the concept of the so-called narrative stages. A common classification of narrative stages includes Exposition, Rising Action, Climax, Falling Action, Resolution, etc., named according to their role and nature [36] . For example, the Exposition stage introduces the characters and the space they inhabit. Once the motives and allegiances of the characters are presented, in the Rising Action the characters begin to struggle against each other until all conflicts are resolved through the later stages. We study the network growth pattern on two levels. First, on the aggregate level, we measure the growth the

number of nodes n and edges m of the network. Second, on the individual character level, we measure two values, appearance a (the number of chapters in which a character makes an appearance) and degree k of the characters.

C.

Method: Sentiment Analysis and Topic Modeling

An analysis focused solely on the network topology leaves out an essential component of a narrative, the text. This is important because a narrative is in essence much more than a record of who-meets-whom; in the form of text, a narrative contains the details that can vary signif-

5

A Sentiment Analysis

B Topic Modeling

D

Negative Words

Positive Words

Hate Pain Sad …

Admire Happy Love …

War

Sentiment Polarity Index (SPI)

-1

Violence

Conflict

Military

Topic 1

+1

Museum

Exhibition

W

T

Topic 2

C Chapter Sentiments in Les Misérables Ch. 12: Myriel, a virtuous man, is introduced.

Ch. 131: Writer muses on religion and cloisters

Ch. 31: Fantine goes on a picnic with friends.

Ch. 259: Cosettes and Marius fall in love.

Sentiment Polarity Index

0.8 0.6 0.4 0.2 0 -0.2 -0.4 -0.6

0

5 0

10 0

15 0

20 0

30 0

35 Chapters 0

Ch. 261: Cosette and Marius part.

Ch. 51: Fantine falls into misery.

Ch. 22: Valjean nearly drowns.

25 0

Ch. 81: Writer describes Battle of Waterloo.

FIG. 2: Sentiment Analysis and Topic Modeling of Narratives. (A) The principle of sentiment analysis. Words associated with positive or negative sentiments contribute towards the Sentimental Polarity Index (SPI) of the text ranging from −1 (most negative) to +1 (most positive). (B) The principle of topic modeling. Clusters of words detected from a set of texts that tend to appear together are identified as the topics. (C) SPIs of the chapters of Les Mis´erables. Vertical gray bars indicate the 21 Sequences of Les Mis´erables. Each sequence is colored according to the sign of the mean SPI of its constituent chapters (blue for positive, and red for negative). We compare the SPI and content for eight chapters in the narrative: Positive chapters depict uplifting characters or events (e.g., introduction of Myriel, a man of great character in Chapter 12) and happy events (e.g., Fantine going on a picnic, Cosette and Marius falling in love, etc.), while negative chapters depict pain and suffering (e.g., Valjean nearly drowning, Fantine in misery, war, lovers parting, etc.)

6 icantly between interactions [32, 33]. In Les Mis´erables, for instance, the nature of Valjean’s relationships to different characters that is at the center of its drama – at the same time a savior and protector to Cosette, and a fugitive criminal to Javert – is wholly missing in the simple appearance-based network. This means that leveraging the actual text of the narrative may lead to a richer and proper understanding of the narrative, which we perform by using some tools developed in computational linguistics. Here we utilize two: The first tool is Sentiment Analysis that identifies the positive and negative sentimental qualities of a text, which allows us to study the sentimental states of character relationships and the build-up and the resolution of tension in the narrative. The second tool is Topic Modeling that identifies the topics inside the novel, which allow us to associate the characters with the topics at different points in the narrative that define the characters’ states, and quantify the impact of events on the characters. 1.

Sentiment Analysis

Sentiment Analysis, also called Mood Analysis or Opinion Mining, is a technique for determining the sentimental qualities of a given text based on the words it contains. Its origin can be traced back to an attempt in the 1990’s to translate written reviews of products into numerical rating scores: To this day it is common to produce a numerical Sentiment Polarity Index (SPI) of a given text that shows its positive or negative quality. Basically it count the words of known positive or negative sentimental states from a text to produce SPI. For instance, words such as “admire,” “happy,” and “love” contribute to the text’s positive sentiment, where as “hate,” “pain,” and “sad” would contribute to its negative sentiment. (See Fig. 2 (A)). We note an interesting connection to the Western literary tradition of the generic division of drama into comedy and tragedy, often stylized using two masks – the laughing that represents Thalia, the Muse of comedy in Greek and Roman mythology, and weeping one that represent Melpomene the Muse of tragedy, also shown in Fig. 2 (A). Here we use the LIWC (Linguistic Inquiry and Word Count) [37] program, one of several available [38], to determine the SPI of the chapters of Les Mis´erables. LIWC actually returns two separate values, π ≥ 0 and ν ≥ 0, for the positive and negative sentiments for the input text, which we combine, for convenience, into a single SPI variable

From Σ = {σ} we can compute SPIs of the characters and character pairs using the timeline framework in Fig. 1 (A). If a character α, for instance, has appeared in Chapters 1, 2, and 100, we define Σ[α] = {σ1 , σ2 , σ100 } to be the SPI set of α, from which we can calculate quantities such as the character’s average character SPI, σ[α] = (σ1 + σ2 + σ100 )/3. The SPI of a character pair is similar: if two characters α and β have co-appeared in Chapters 2 and 100, for instance, their average SPI is σ[α, β] = (σ2 + σ100 )/2.

2.

Topic Modeling

Our second example of incorporating textual information for network-based narrative study is topic modeling. We will see that it allows us to determine the “topical state” of a character at any given point in the narrative and map out a detailed picture of interaction between characters. Topic modeling is a method for extracting clusters of correlated keywords from a set of documents that can be identified as separate “topics” of the texts. The basic idea is presented in Fig. 2 (B) through a tripartite network composed of three layers of nodes: D of documents, W of words, and T of topics. The goal is to find T , essentially “bags of words” appearing often together in documents, from the text data consisting of D and W. Many studies have reported the success of topic modeling in identifying word sets that match the human understanding of groups of texts, and its practical applicability to problems like word sense induction [39–41]. Here we employ the Non-Negative Matrix Factorization (NNMF), famously used for identifying distinguishable parts in images [42–45]. It decomposes the word– document TF-IDF (Term-Frequency – Inverse Document Frequency) matrix M (dim(M ) = |W| × |D|) into QH, product of two matrices Q and H such that dim(Q) = |W| × |T | and dim(Q) = |T | × |D|. The number of topics |T | is an input parameter typically set to be smaller than |W| and |D| [51]. We can then interpret the matrix Q = {qij } as the association strength between word i and topic j, and H = {hjk } as that between topic j and chapter k. Using the framework of Fig. 1 (A) we can again define the character-topic association strength tαk between character α and topic k as follows:

P  σ ≡ log10

 π+1 . ν+1

(1)

Defined in this way, σ > 0 when the text is net positive (π > ν), σ = 0 when neutral (π = ν), and σ < 0 when net negative (π < ν). We now have a set of values Σ = {σ1 , . . . , σc }, where c = 365 is the number of chapters in Les Mis´erables.

tαk ≡ P



hjk

k∈T ,j∈Cα

hjk

,

(2)

where Cα is the set of chapters where character α appears in. It is the normalized sum of all topic-chapters associations hjk from the chapters featuring character α. We used the scikit-learn Python machine learning package to perform NNMF.

7 III. 3.

RESULTS

Network Topology and Narrative Structure

This section is a summary of our previous work [47]. We start by constructing the network of characters based on Fig. 1. The final network of Les Mis´erables contains 63 characters after very minor ones are excluded. Drawing an edge between two characters if they have appeared in a chapter together results in m = 504 edges. The network is shown in Fig. 3. In it, 25.8% of the character pairs are connected, the mean geodesic length is 1.85, the network diameter is 4 (between the pair of Babet and Geborand, and 17 other pairs of relatively minor characters), and the clustering coefficient is 0.77 [52]. Based on the distinction between different stages in a narrative, we can assume that n and m would not simply increase linearly in time but nonlinearly in accordance with the nature of the stages. In Fig. 4 we show the growth of n and m along the narrative time measured in chapters. As expected, the growth is not linear, especially for the number of nodes n. After the first batch of characters are introduced at the beginning of the narrative, there are specific points in the narrative where many new characters are introduced simultaneously (noted S1, S2, and S3 in Fig. ??) that suggest they are the Exposition stages. An inspection of the actual story confirms this: • Stage S1 : Fantine’s friends are introduced as her happy days are depicted. • Stage S2 : Valjean’s former fellow prison inmates testify during the trial of the fake Valjean. • Stage S3 : “The Friends of ABC” (young progressives) are introduced, shown debating various social issues of the day. There is also a stretch of chapters (S4 ) where the network shows little growth. This part largely coincides with Volume 2 (“Cosette”) of the novel composed of those chapters that contain no narrative progression (i.e. the author digresses to discuss the battle of Waterloo, religion, the vagrant children of Paris, etc.) or that show no network growth, being mainly about Valjean and Cosette’s flight from the pursuit of Thenadier while avoiding people in general. Finally, near the end of the narrative at S5, it is the number of edges m that lead the growth of the network while n shows little increase. This – new edges being created between existing nodes without the addition to new ones – implies a convergence of the characters into a common environment: this part in fact describes the scene at the barricade where nearly all major characters (who have been introduced before) converge. In Fig. 5 we show the appearance a and degree k of the individual characters. The final histogram of a is shown in Fig. 5 (A). It has a skewed distribution with many

characters appearing in a handful of chapters and a few characters appearing in many chapters, for instance Marius appears in 122 chapters, Valjean in 121, and Cosette in 97, whereas the mean and the median are 19.3 and 9 respectively, nearly an order of magnitude smaller than the most frequent characters. In Fig. 5 (C) we show the temporal growth of each character’s cumulative appearance. Although Marius and Valjean are similar in the total appearances (122 and 121, respectively), how these values are reached are very different. Valjean first appears in the beginning of the novel, then with regularity until there is a noticeable absence between chapters 160 and 233 (indicated by a plateau). During Valjean’s absence, Marius, making his first appearance in chapter 170, takes the center stage in the novel and appears in almost every chapter until he overtakes Valjean in appearance. This is a direct reflection of the structure of Les Mis´erables: the first part is mainly about Valjean (with Marius absent), the second part is mainly about Marius (with Valjean absent), and the final part features both as major characters. The degree k (Fig. 5 (B) and (D)), on the other hand, differs in interesting ways from a. The three highest-degree nodes are Valjean (k = 43), Cosette (k = 41), and Javert (k = 39), whereas Marius is down to k = 34. The degree therefore captures the nature of the social sphere around a character that appearance alone cannot tell: Valjean is a well-travelled character linking many different spheres of the story world, whereas Marius associates with a narrow pool of characters (namely the young fellow rebels) and his love interest Cosette.

4.

Sentiment Analysis and Narrative Progression

The chapter sentiments are shown in Fig. 2 (C). We also study how the sentiments and content of chapters match: Positive chapters tend to depict uplifting characters (e.g. Myriel, a virtuous man) and events (e.g., Fantine going on a picnic, Cosette and Marius falling in love, etc.), whereas negative chapters depict pain and suffering (e.g., Valjean nearly drowning, Fantine in misery, war, lovers parting, etc.). We also see alternating clusters of positive and negative chapters, indicating a certain pattern of emotional fluctuations. This is reminiscent of an interpretation of narrative as a metaphor for life that fluctuates between contradictory states of harmony and peace, and tension and fear [46]. We also note that the average chapter SPI is σ = 0.06 ± 0.01, i.e. net positive. We believe this is an example of the so-called “Pollyanna effect” referring to a universal positivity bias in human language [3]. We show the sentiments σ for select characters and pairs in Fig. 6. In Fig. 6 (A) we show ten characters – five major (frequently appearing) and five minor (infrequently appearing) – for comparison. While their average values are positive (due to the Pollyanna effect), the joyless Javert is more negative than other main characters such as Marius, Valjean, and Cosette. Nevertheless, ma-

8

VII

Pontmercy

Gillenormand

Fauchelevent

Cosette

Hucheloup

Mabeuf

Enjolras

Thenardier

Feuilly

VI

Valjean

Simplice

Zephine

Tholomyes

Dahlia

Listolier Listolier

Fantine

Fameuil Fameuil

Favourite

I Myriel

Courfeyrac

Grantaire

III

Champmathieu

Bahorel

Combeferre

Brevet

Chenildieu

Prouvaire

Joly

Bossuet

II

Cochepaille

Javert

Marius

Gavroche

Gervais

Baptistine

Babet

IV

Magloire

Brujon

Toussaint

Claquesous

Montparnasse Magnon

Eponine V

FIG. 3: The Character Network of Les Mis´ erables and Its Community Structure. The character network of Les Mis´erables. The node radius is proportional to its degree (the number of its neighbors). The network shows many common characteristics of a social network such as the small-world property and the community structure. The node color indicates the community to which it belongs (we identify seven, labeled I to VII), while the edge color indicates the sign of the cosentiments of the character pair (blue for positive, and red for negative), defined and discussed further in Sec. III 4.

Number of Nodes

Number of Links 500

60

S4 S3

400

S2 40 300

S1

30

S5 200

Number of edges m

Number of nodes n

50

20

100 10

Nodes Edges Links

0

0

50

100

150

200

250

300

0

350

Narrative Time Narrative Time (Chapters)

FIG. 4: Growth Patterns of the Character Networks The number of characters n and the number of edges m in Les Mis´erables grow in a nonlinear fashion, indicating that different stages in narratives contribute differently to the network growth via character introduction or formation of new connections.

9

88

(B)Unweighted Degree Histogram (B) Degree Distribution 8

Number of Characters

77

7

66

6

55

5

7

5

4

33

22

22

11

11 10 20 30 40 50 60 70 80 90 100 110 120 10 20 30 40 50 60 70 80 90 100 110 120

15 15

4

33

0 0

20 20

6

44

0 0

(C) Weighted Deg 25 25

8

10 10

55

00

00 00

55

1010

1515

2020

2525

3030

3535

Appearance

4040

Valjean

Valjean

6000

45

Valjean Cosette

Marius

5000 Javert Boussuet, Marius, Thenardier, Fantine 4000 Courfeyrac

40

100

Cosette

Marius

35

80

30

60

25 Thenardier Courfeyrac 20 Javert

3000

15

2000

40

Fantine Bossuet

100

(F) Weighted Deg

50 120

10

Degree

(D)Degree Cumulative Degree (E)

(C) Appearance Cumulative Appearance (D)

0 300 600 900 1200 1500

Histogram (A) Appearance Distribution

10

20

1000 5 0

0

0

50

100

150

200

250

300

350

Chapters Narrative Time

0

0

50

100

150

200

250

300

350

0

50

Chapters Narrative Time

FIG. 5: Centralities of Characters in Les Mis´ erables Histograms of (A) appearance and (B) unweighted degrees of the characters of Les Mis´erables. The histograms are relatively skewed, with some characters having high values and many having small values. The three most frequently appearing characters are Marius (122), Valjean (121), and Cosette (97), while the three highest-degree characters are Valjean (43), Cosette (41), and Javert (39), and the three highest-weighted degree characters are Valjean (5203), Marius (4148), and Cosette (3977). The discrepancies indicate the differences in the characteristics of their social networks. (C) and (D) are the growths of these quantities for each character, showing the differing points at which the characters are actively depicted.

jor characters experience a wider range of SPIs than the minor ones, which we believe indicates their sentimental complexity. In the figure we see that Valjean appears frequently in both positive and negative chapters, showing his role as the carrier of varying sentimental states, in contrast to short-lived minor ones. In Fig. 6 (B) we show the SPIs of a number of character pairs. Valjean understandably shows a higher average SPI when with his adoptive daughter Cosette than with his archnemesis Javert, although the wide range of SPIs again indicate the sentimental complexity of the leading character

pairs. The Pollyanna effect still stands true here; in general, the average SPI of character pairs (dotted line) is a positive value at σ0 = 0.07. Therefore it is sensible to define the cosentiment of a character pair (α, β) to be σ[α, β] − σ0 . This quantity was already used for edge colors in Fig. 3. We can also use this to study the sentimental states within and between communities, shown in Fig. 6 (C). In the figure, the diagonal elements show the fractions of positive and negative edges inside the communities, whereas the off-diagonal elements show those between two communities. The circle radius indicates the

100

150

10 logarithm of the number of edges. Communities II and VI are in general the most negative inside, showing the harsh and tragic nature of the common experiences of the prisoners and revolutionaries. To the contrary, Communities V and VII are the most positive inside. Between communities, II and VI are the most negative, due to Javert’s presence at the tragic barricade scene with the revolutionaries. We now study the sentimental qualities of the network and how they change along the narrative progression. It is shown in Fig. 7, where each panel corresponds to a Sequence of the novel first introduced in Fig. 1 (B). The definition and rationale for the Sequence are as follows: Sometimes a plot or a storyline may span multiple consecutive narrative units, which makes it reasonable to bundle them into a larger one. To achieve it we need to determine the similarity between subsequent narrative units. One possibility we use here is the character composition; consecutive units belonging to the same or highly similar plots are likely to contain similar characters. Specifically, starting from the 40 Books of Les Mis´erables(excluding eight that contain no characters), we bundle the consecutive ones whose characters are similar above a prescribed threshold. Using the cosine similarity (although others such as the Jaccard index may be used) and setting the threshold to be the average similarity (0.49) between consecutive book pairs, we end up with the 21 Sequences shown in Fig. 7. We also show the fraction of negative and positive edges. The correlation between sentimental fluctuations and narrative flow are perhaps the best understood from Fig. 7 by studying Marius and his revolutionary friends. When they are first introduced in Sequence 8, the sentiment is overwhelmingly positive, reflecting the air of optimism from their cause. Such initial positivity is not long-lived, however, as they have to struggle with their adversaries in subsequent Sequences 11, 14, and 15. After they overcome these challenges they briefly regain their positive sentiment (Sequence 16), but then are thrust into the most tragic and climactic circumstances (Sequences 17–20) that show high negativity. Finally, at the end of the novel (Sequence 21) the resolution is reached showing a highly positive sentiment. The fluctuations between positive and negative in this fashion are known to be by design [46]

5.

Topic Modeling and Mapping Interaction Dynamics via Topical States

We set |T | = 50. The results for all 50 are given in Fig. 8. The keywords (the strongest ones are in bold) tell us that the topics are often about the characters (e.g. T1, T2, and T3), places (e.g. T11, T20, and T25), or events (e.g. T7, T22, and T42). The character–topic associations {tαk } are visualized in Fig. 9 (A) for Valjean and Marius, scaled so that the strongest topic fills the space between the two circles. The five strongest topics for

each characters are T1, T4, T3, T2, and T7 for Vajean, and T2, T1, T3, T4, and T14 or Marius. From Fig. 8 we see that they are about themselves and related characters or actions (valjean, escape, marius, eponine, etc.). We can also use them to identify topics associated with the communities by summing up the tαk over the chapters that contain two or more of the members of the community, which are shown in Fig. 9 (B). The topics shown are relevant to multiple members of the group, for instance, characters from inside the community (e.g., T1 and T4 for Community I) or outside (e.g., T2 and T29 for Community I), or the events or places, for instance T41 (the trial) for Community II of Javert and Valjean’s fellow prison inmates. We now introduce an interesting use of topics for representing narrative dynamics. An impactful event in a person’s life is one that brings about significant changes in the person’s state. This means that even in a narrative, if one could define a character’s state at a given point, one could measure the impact or significance of an event by comparing the states from before and after the event. We use topic modeling to do exactly this, by interpreting tαk as the topical state of the character. The idea is straightforward: Since an associated topic indicates the action, events, interactions, etc. taking place in the character’s presence, it can be understood as telling us the situation or the state of the character. While Fig. 9 (A) shows the topical states averaged over the entire novel, we can define a character’s topical state at a given point in the narrative by obtaining the topical associations from the corresponding chapter(s). As an example now study the impact that the interactions between Marius and Valjean have on the character’s states. For simplicity we consider Valjean and Marius to be interacting largely two times in Les Mis´erables, prompting us to partition the novel into the following four phases: 1. Phase I (Chapters 1 to 233): Before the first interaction. Valjean and Marius lead separate lives. 2. Phase II (Chapters 234 to 266): The first interaction take place. Marius falls in love with Cosette, causing Valjean to become anxious about losing her. 3. Phase III (Chapters 272 to 295): Valjean is absent from the narrative, so no interaction takes place. Marius parts from Cosette, then joins the revolutionaries at the barricade. 4. Phase IV (Chapters 296 to the end of the narrative): The second interaction takes place. Marius gets injured at the barricade, then is rescued by Valjean. Cosette and Marius marry. Valjean dies. Our strategy now is to observe the changes in characters’ states tαk . We then use them to understand the details of the interaction dynamic. First, the changes in tαk for the characters at the end of each phase are shown in Fig. 10 (A), obtained by subtracting the tαk immediately

11 B Sentiment Polarity Indices of Character Pairs Cosette.Valjean Cosette.Marius

Listolier Fameuil Cochepaille

Chapter Average Mean

Chenildieu

Fameuil.Listolier Dahlia.Listolier Dahlia.Fameuil Chenildieu.Cochepaille

Character Pair Average Mean

Chenildieu.Javert

Median

0.4

0.2

0.0

SPI

Appearances by Valjean

0.6 1

SPI

−0.2

−0.4

Median SPI

0.4 0.2

SPI

00 -0.2 -0.5 -0.4 -0.6 -1

Appearances by Valjean and Javert

0.6

0.4 0.5 0.2

0.4

Marguerite

Javert.Valjean Gillenormand.Marius

0.2

Thenardier

Marius.Valjean

0.0

Javert

−0.2

Cosette

−0.4

Major Characters

Marius Valjean

Minor Characters

Minor Characters

Major Characters

A Sentiment Polarity Indices of Individual Characters

0 -0.2 -0.4

0

50

100

150

200

250

300

-0.6

350

0

50

100

150

200

250

Chapters

300

350

Chapters

C Intra- and Inter-community Cosentiments

Communities I

II

III

IV

V

VI

VII

I

VII

VI

II

II III

I

III

IV

IV V

12 edges 22 edges

V

55 edges VI

Fraction of Negative Edges VII

Fraction of Positive Edges

FIG. 6: Sentiments of Characters, Character Pairs, and Communities. (A) Sentiment Polarity Indices (SPIs) for the characters of Les Mis´erables. The yellow boxes indicate the SPI ranges of 50% of the chapters around each character’s median (25% below, 25% above). The leading characters (higher in the plot) feature a wider range of SPIs than the marginal ones (lower in the plot), reflecting their role in the sentimental fluctuations of the narrative. The SPIs of the chapters in which Valjean appears are shown below. (B) SPIs for character pairs. Valjean indeed shows an higher SPI when together with prot´eg´ee Cosette than pursuer Javert, although SPIs for leading characters again show a wide range. (C) The intra- and inter-community cosentiments. Communities II and VI are in general the most negative inside, due to the fact that prisoners and revolutionaries share difficult and tragic experiences (harsh prison terms and death at the barricade). Communities V and VII are the most positive inside. Between communities, II and VI are the most negative, due to Javert’s presence at the barricade with the revolutionaries.

before the interactions from that from immediately after. At the end of Phase I, Valjean is the most strongly associated with T1, T5, T21, T47, and T29, whereas Marius is with T2, T14, T8, T32, and T37 which represent their

trajectories up to that point according to Fig. 8. They share no common topics, as expected from the lack of any interaction up to that point – in fact, the correlation between their {tαk } is negative at −0.20 ± 0.01. At the

12 Count

Geborand

Baptistine

Magloire

Myriel

Bossuet

Cravatte

Philippe

Jacquin

1

6

Isabeau

Valjean Brevet

Eponine

18

Fameuil

Listolier

Eponine

Javert

Courfeyrac

Myriel

2

Fauchelev

7

19

13

Thenardier

Valjean

Volume 11 Volume Victurnien

Thenardier

Bamatabois

Fantine

Cosette

Marguerite

Philippe

Favourite

Zephine

Dahlia

Tholomyes Bossuet

Pontmercy

Fantine enjoys her life with her friends, but falls into a miserable one. Valjean saves her from Javert.

Montparnasse

Enjolras

Toussaint

Fantine

Mabeuf Marius Gillenormand Lieutenant

Valjean

Cosette Gillenormand

Marius, estranged from his grandfather for his liberal views, leaves his family.

Lieutenant

Marius

Gavroche

Cosette fears for her safety when a stranger follows her in the garden, (who turns out to be Marius).

Volume 4

Fantine

12

Jacquin

Myriel is a positive influence on Valjean by forgiving him when he steals his silverware.

Vaubois

Magnon

Marius

Valjean

Cosette Marius Toussaint

Simplice

Fauchelevent

Perpetue

Douai

Chenildieu

Javert Bossuet

Cochepaille

Gervais

Champmathieu

Valjean

Cosette

Fantine

Scaufflaire

Enjolras

Joly Bahorel

Montparnasse

Eponine

Javert

Brevet

3

Bamatabois

Thenardier

Valjean

Philippe

Boulatruelle

Bahorel

Mabeuf

Claquesous

Brujon

Babet

Montparnasse

Toussaint

4

9

21

Eponine

Thenardier

Valjean

Fauchelevent

Cosette

Burgon

Philippe

Fantine

Javert

Prouvaire

Javert

Bossuet

Joly

Gavroche Grantaire

Combeferre Marius

Feuilly

Bahorel

Courfeyrac Enjolras Prouvaire

Hucheloup

5

10

16

Joly

Bahorel

Bossuet

11

Gueulemer

Javert

17

Babet Javert Montparnasse

Brujon

Magnon

Mabeuf

Claquesous

Pontmercy

Thenardier

Eponine

Marius

Combeferre

Feuilly

Bossuet

Enjolras

Prouvaire

Grantaire

Boulatruelle

Courfeyrac

Fantine

Cosette

Pontmercy

Marius

Thenardier attacks Valjean, who is saved by Marius and later turns out.

Philippe

Enjolras

Marius, despondent upon finding Cosette gone, joins the barricade.

Fraction of Negative Edges Fraction of Positive Edges

Positive Cosentiment

Negative Cosentiment

The revolutionaries build barricades and are filled with optimism.

Mabeuf

Marius develops a feeling for a woman he sees at the Luxembourg Gardens, (who later turns out to be Cosette).

Marius

Courfeyrac

Valjean saves Cosette from Thenardier and flees to Paris. Pursued by Javert, they take refuge in the in the Petit-Picpus convent.

Gribier

Volume 2

Thenardier

Valjean fakes his own death by falling into the ocean, and buried his treasure in Montfermeil.

Pontmercy Fauchelevent

Valjean

Courfeyrac

Marguerite

15

Lieutenant

Courfeyrac

Pontmercy

Mabeuf

Volume 3 Volume 3

Montparnasse

Enjolras

Marius Gillenormand

Lieutenant

Enjolras

Eponine

Cosette Marius Thenardier

Magnon

Marius befriends Mabeuf, who helps him find a job.

Feuilly

Gillenormand

Gavroche

Enjolras

Combeferre Bossuet

Mabeuf Grantaire

Courfeyrac

Fantine

Gavroche

Eponine

Gillenormand Pontmercy

Thenardier

Cosette Marius Valjean Javert

The bloodshed is over. Cosette and Marius got married. Valjean dies in piece.

Boulatruelle

Claquesous

Champmathieu

Toussaint

Fauchelevent

Baroness

Prouvaire

Valjean decides to leave Paris. Marius wants to marry Cosette, but his grandfather suggests he make her his mistress. Thenardier attempts to attack Valjean again, but is thwarted by Eponine.

Javert

Combeferre

Volume 4 Volume 4

14

20

Grantaire

Prouvaire

Bahorel

Courfeyrac Bossuet Mabeuf

Enjolras

Cosette

8

Pontmercy

Marius

Prouvaire

Courfeyrac

Bossuet

Valjean reveals his true identity to save a man. Fantine dies.

Legle

Grantaire

Tholomyes

Combeferre Feuilly

Babet

Valjean

Fauchelevent

Thenardier and his ilk escape from the prison with the aid of Gavroche.

Pontmercy

Marius

Toussaint

Magnon

Gillenormand

Joly

FeuillyHucheloup

Combeferre

Valjean saves Marius at the barricade. A massacre takes place.

Gavroche

Thenardier

Volume 5

Gillenormand Eponine Thenardier Brujon Magnon Gavroche

Marius befriends Friends of ABC, young revolutionaries, who have hopes for a better France.

18

Gillenormand

Toussaint

Gillenormand

Thenardier

Fauchelevent

Toussaint

Valjean

Gavroche delivers a letter from Marius to Cosette, which is intercepted by Valjean.

20 Volume5

Gillenormand, Marius’ grandfather, is introduced.

Javert

Thenardier

Cosette Valjean

Fauchelevent

Cosette

Marius

Courfeyrac

Mabeuf

Javert

Gavroche

Combeferre

Pontmercy

Valjean worries that he might lose Cosette who has feelings for Marius.

Bahorel

Enjolras

Prouvaire

Bossuet

Hucheloup

Joly

Feuilly

Marius escapes a near death with help from Eponine, who dies

Volume4

FIG. 7: Network Snapshots Showing Sentiments and Narrative Flow Snapshots of character networks in the 21 Sequences of Les Mis´erables. Edges are colored according to the cosentiment between the characters. The fractions of positive and negative edges are indicated in each snapshot, along with the summary of major plots in the Sequence. The sentimental fluctuations often reflect the build up of drama, tension, and resolution.

13 Topics Topic 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Keywords Topic valjean , jean, lantern, light, perceive 26 marius , father, say, letter, know 27 enjolras , barricade, gun, combeferre, insurgent 28 cosette , say, toussaint, garden, happy 29 bishop , magloire, madame, monseigneur, say 30 thenardier , man, franc, doll, madame 31 revolution , louis, king, philippe, france 32 gillenormand , theodule, grandfather, mademoiselle, aunt 33 child , mother, father, little, girl 34 english , wellington, cuirassier, battle, road 35 rue , street, du, des, calvaire 36 javert , police, inspector, mayor, spy 37 wall , door, grating, large, house 38 bench , girl, marius, luxembourg, young 39 love , soul, heart, god, destiny 40 jondrette , leblanc, benefactor, door, say 41 bed, sleep, candlestick, open, bishop 42 social , undermine, civilization, society, human 43 fauchelevent , coffin, digger, grave, cemetery 44 sewer , paris, subterranean, bruneseau, city 45 madeleine , sur, mayor, town, cart 46 waterloo , napoleon, wellington, battle, blucher 47 convent , nun, prioress, mother, holy 48 barricade , insurrection, paris, saint, faubourg 49 garden , night, flower, forest, grass 50

Keywords fantine , franc, marguerite, thenardiers, victurnien paris , city, make, rome, populace gavroche , montparnasse, elephant, pistol, bahorel sister , fantine, simplice, madeleine, doctor tholomyes , favourite, blachevelle, fantine, dahlia horse , maire, tilbury, scaufflaire, say courfeyrac , bahorel, portress, tragedy, coffer infinite , infinity, philosophy, god, i cloister , convent, monasticism, monastery, community sand , water, fontis, foot, ship gamin , titi, arab, pear, paris grantaire , enjolras, laigle, joly, bossuet et, qu, je, la, le woman , dandy, old, man, spur mur , droit, rue, polonceau, picpus president , district, attorney, champmathieu, jury revolt , war, insurrection, uprising, riot brujon , babet, montparnasse, eponine, claquesous esprit , luc, century, vulgar, age mabeuf , plutarque, book, flora, old hucheloup , rue, wine, barricade, shop boulatruelle , forest, treasure, road, mender let, arab, doctor, populace, indignantly slang , language, le, historian, word vi, plank, eighty, voice, shovelful

FIG. 8: Complete List of Topics for Les Mis´ erables. 50 Topics of Les Mis´erables found via Non-Negative Matrix Factorization (NNMF). Strongly associated keywords are also listed (the strongest keywords in bold). The topics are frequently about the characters (e.g. T1, T2, and T3), places (e.g. T11, T20, and T25), or events (e.g. T7, T22, and T42).

end of Phase II after their first interaction the correlation increases to 0.42 ± 0.01, showing that an interaction works to correlate the character states. At the end of Phase III (no interaction) it decreases again slightly to 0.33 ± 0.001. At the end of Phase IV where they interact again for the final time and quite extensively it reaches its highest value of 0.70 ± 0.01. These show that an interaction functions to assimilate the characters’ states, and an inspection of the changes ∆tαk provides us with more detail of this assimilation dynamics. For simplicity, we again focus on the five topics (for each character) that gain the most in strength after each phase, shown in Fig. 10 (A). After the first interaction, we find that the five such topics for Valjean are T4, T2, T1, T7, and T25, whereas for Marius they are T1, T4, T25, T7, and T45. When we compare the strongly associated topics from before and after the interactions, we find there are some that we can interpret as having been transferred from one character to the other. An example is T2 (marius), the strongest one with Marius before Phase II, which gains the most for Valjean after. The same goes for T1 (valjean), this time from Valjean to Marius. Second, there are topics that have entered the characters states exogenously, i.e. those that not strongly associated with either character. They represent new common experiences or interests that occur during the interactions: T4 (cosette), T7 (revolution), and T25 (garden) are such cases. They again reflect the story accurately: Cosette

becomes the focal point of both characters, as a new love interest for Marius that causes severe anxiety to Valjean. Some topics enter only one character’s state, such as T45 (mabeuf) which is about a character Mabeuf who shares his story with Marius at the barricade, but has little to do with Valjean – Valjean’s topical state indeed has near-zero component of T45. Next, during Phase III, T11 (rue), T24 (barricade), T46 (hucheloup), T42 (revolt), and T28 (gavroche) gain the most strength for Marius, reflecting the events and the characters he experiences during that time. Valjean is absent. Finally, during Phase IV, T3 (enjolras), T2, T28 (gavroche), T24 (barricade), and T1 gain the most strength with Valjean, whereas topics T3 T1, T28, T35 (sand), and T12 (javert) gain the most strength with Marius. Note how the directionality of T28 and T24 from Marius to Valjean reflects the actual way things happen between the characters: Gavroche (T28), a friend of Marius’, carries a letter from Marius to Valjean that motivates Valjean to join the barricade (T24) in search of Marius. Our discussion here about topic transfers and entry can be systematically visualized as in Fig. 10 (B) on top of the basic interaction timeline first introduced in Fig. 1, showing that the textual information indeed allows to construct a much more detailed picture of an interaction than a simple occurrence-based network construction. The results provided in this section, by showing that the story of a narrative can be identified, quantified, an-

14 A Strongly Associated Topics for Valjean and Marius 1 2 3 4

1 2 3 4

Topic 1

Topic 2

7

Valjean

Topics

valjean, jean, lantern, light, perceive, escape, old, wall, day, galley, patrol, come, life, place

Marius

14

Topics

marius, father, say, letter, know, man, come, address, eponine, tell, think, promise, baron, hand

B Strongly Associated Topics for Communities

T2, T8, T1, T4, T12

T3, T41, T12, T1, T31

VII

VI

II

III

I IV

T30, T39, T26, T9, T25

T5, T17, T1, T7, T31

V

T3, T2, T32, T37, T46

T6, T43, T2, T4, T26

T1, T4, T2, T29, T19

FIG. 9: Topics Associated with Valjean, Marius, and The Communities. (A) Topics strongly associated with Valjean (left) and Marius (right). The topic-character association strengths are scaled so that the largest value fills the space between the two circles. Topic T1 is the most relevant to Valjean, while Topic T2 is to Marius. They contain the respective character names as the strongest keywords, but also contain with words closely related to each character. (B) Topics strongly associated with the communities in Fig. 2. The topics can be about the characters inside the community, or even from the outside as long as they are sufficiently associated with multiple members of the community. For instance, T2 (marius), T29 (sister, fantine), and T19 (fauchelevent) are strongly associated with Community I, although the characters belong to other communities. The topics can also be the events involving the community members, for instance T41 (the trial – attorney, jury) for Community II composed of Javert and Valjean’s fellow prison inmates.

alyzed, and visualized by making use of appropriate analytical and computational tools, we believe demonstrate the benefits and opportunities of approaching traditional subjects as narrative from a novel perspective that allows us to find new patterns and gain a richer understanding not readily available previously.

IV.

DISCUSSIONS AND CONCLUSIONS

In this paper we proposed a network-based framework for modeling a narrative by focusing on the characters and their interactions. We started by representing a narrative as a set of interacting character timelines, from which we constructed a growing character network. To legitimize our approach it was necessary to understand how the character network topology and dynamics reflected the narrative structure correctly. We found that

character centralities captured the role and the nature of the social spheres of characters in the narrative, while the temporal growth of network showed distinct phases with differing patterns of increasing nodes or edges depending on whether the narrative was focusing on isolated characters (stagnant growth), expanding the story world by introducing new characters (growth led by number of nodes), or when existing characters converge into the building process to the resolution (growth led by number of edges). An important characteristic of well-written drama is that it evokes emotion in the reader, which in the western literary tradition is conventionally represented by the generic division of drama into comedy and tragedy. This had an interesting connection to a modern computational methodology called sentiment analysis. We found that many characters, especially the central ones, showed significant fluctuations of sentiments during the narrative

15

A Major Topics for Valjean and Marius 47

1

1 2

5

1 2 3

4 7

Valjean

21

29

Absent During Phase III

After Phase II

After Phase I

25 1

2

45

4 7

After Phase I

37

3

42

After Phase III

After Phase II

14

1

46

Topics

24

28

8

Marius

After Phase IV

11

After Marius Phase PhaseIV4

12

Topics

35 32 28

25

28

24

B Timeline of Interaction between Valjean and Marius First Interaction

Valjean

Phase I Sequences 1-5, 7-11

Second Interaction

Phase II Sequences 12-13, 15

Phase IV Sequences18-21

Phase III Sequences16-17

Time

Timeline of a character

24 1

2

4

7

25

Topic transferred from Valjean to Marius

Marius

Marius’ new topics obtained in first meeting

Topic transfer between characters Topic input from outside

45

Newly incorporated topics during the interaction

11

24 42

1

3

28

28 46

12

35

Marius’ new topics obtained in Phase 3

FIG. 10: Mapping Out Interactions Diagrammatically As Dynamic Topic Exchange. Events lead to character transformation, which we quantify via the character’s topical states. With respect to the interaction between Valjean and Marius, we divide Les Mis´erables into four phases. (A) The net changes in the topical states of the characters at the end of each phase, quantified by the differences tαk . After Phase II, T2 (the strongest topic for Marius before) shows a sharp increase for Valjean. Likewise, T1 (Valjean’s strongest topic before) shows a sharp increase for Marius. T4, T7, and T25 increase for both characters, while T45 increases only for Marius. (B) Diagrammatic representation of the changes of Marius’ and Valjean’s topical states as ’topic transfers’ during each phase; Topics can be exchanged between characters (e.g., T1 and T2 during Phase II) or enter either character’s topical state exogenously (e.g., T4, T7, T25, and T45). The dirctions can also reflect those of actual story elements: during Phase IV (Chapters 296–365), Valjean, prompted by a letter from Marius, joins the barricade. This is directly reflected in the transfer of topics T24 and T28 from Marius to Valjean.

16 flow, acting as the carriers of mood and emotions of the narrative. This was true of character relationships as well, and we showed how the sentimental fluctuations correlated with the narrative progression that showed detectable patterns of dramatic tension build-up and resolution. Finally, we used topic modeling as a way to define the state of a character via the topics (keywords) with which they are associated at various points in the narrative. This allowed us to trace quantitatively the changes in characters’ states, and quantify and map out the details of an event or an interaction between characters. We also demonstrated that the flow of topics between characters can reflect the actual story in interesting ways, providing us with a way to systematically represent the patterns of character interactions that previously resided in the text of the narrative. We believe that our paper presents a wide range of ideas for studying narrative structures that merit further exploration using the methods of network science, data analysis, and computational linguistics. Looking further, representing a narrative as a dynamically unfolding system of character networks and interactions also sets the stage for using theories and tools for understanding of

dynamical systems, not only networks. Advances in this area have practical implications as well, such as an improved algorithm for computer-assisted writing and storytelling which no doubt can benefit from a more robust understanding of the patterns of character relationships and interactions. Given the ubiquity and importance of narratives, we hope that future developments based on our work will be beneficial for a wide range of fields including literature, communication, and storytelling.

[1] J.-B. Michel, Y. K. Shen, A. P. Aiden, A. Veres, M. K. Gray, J. P. Pickett, D. Hoiberg, D. Clancy, P. Norvig, J. Orwant, et al., Science 331, 176 (2011). [2] Project Gutenberg, URL https://www.gutenberg.org/. [3] P. S. Dodds, E. M. Clark, S. Desu, M. R. Frank, A. J. Reagan, J. R. Williams, L. Mitchell, K. D. Harris, I. M. Kloumann, J. P. Bagrow, et al., Proceedings of the National Academy of Sciences 112, 2389 (2015). [4] M. Schich, C. Song, Y.-Y. Ahn, A. Mirsky, M. Martino, A.-L. Barab´ asi, and D. Helbing, Science 345, 558 (2014). [5] D. Kim, S.-W. Son, and H. Jeong, Scientific Reports 4, 7370 (2014). [6] D. Park, A. Bae, M. Schich, and J. Park, EPJ Data Science 4, 1 (2015). [7] M. Newman, Networks: An Introduction (Oxford University Press, Pres2010). [8] R. Albert and A.-L. Barab´ asi, Reviews of Modern Physics 74, 47 (2002). [9] D. Easley and J. Kleinberg, Networks, Crowds, and Markets: Reasoning About a Highly Connected World (Cambridge University Press, 2010). [10] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques (Elsevier, New York, 2011). [11] L. A. Adamic and B. A. Huberman, Science 287, 2115 (2000). [12] R. Albert, H. Jeong, and A.-L. Barab´ asi, Nature 401, 130 (1999). [13] J. H. Choi, G. A. Barnett, and B.-S. Chon, Global Networks 6, 81 (2006). [14] S. P. Borgatti and P. C. Foster, Journal of Management 29, 991 (2003). [15] V. Grimm, E. Revilla, U. Berger, F. Jeltsch, W. M. Mooij, S. F. Railsback, H.-H. Thulke, J. Weiner, T. Wie-

gand, and D. L. DeAngelis, Science 310, 987 (2005). [16] F. Moretti, New Left Review 81, 80 (2011). [17] F. Moretti, Distant Reading (Verso, New York, 2013). [18] Pamphlets by Stanford Literary Lab, URL https:// litlab.stanford.edu/pamphlets/. [19] Box office mojo, URL http://www.boxofficemojo.com/. [20] The Numbers, URL http://www.the-numbers.com/. [21] M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004). [22] D. K. Elson, N. Dames, and K. R. McKeown, in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (Association for Computational Linguistics, 2010), pp. 138–147. [23] P. M. Carron and R. Kenna, EPL 99, 28002 (2012). [24] P. Mac Carron and R. Kenna, The European Physical Journal B 86, 1 (2013). [25] D. Kydros, P. Notopoulos, and G. Exarchos, International Journal of Humanities and Arts Computing 9, 115 (2015). [26] M. C. Waumans, T. Nicod`eme, and B. Hugues, PLoS Onene 10, e0126470 (2015). [27] A. Welsh, Nineteenth-Century Fiction 33, 8 (1978). [28] S. Rimmon-Kenan, Narrative Fiction: Contemporary Poetics (Routledge, London, 2003). [29] M. Bal and C. V. Boheemen, Narratology: Introduction to the Theory of Narrative (University of Toronto Press, 2009). [30] H. P. Abbott, The Cambridge Introduction to Narrative (Cambridge University Press, 2008). [31] S. Field, Screenplay: The Foundations of Screenwriting (Delta, New York, 2007). [32] C. Vogler, The Writer’s Journey (Michael Wiese Productions, Seattle, 2007).

Acknowledgments

The authors would like to thank Kyungyeon Moon, Wonjae Lee, and Bong Gwan Jun for helpful comments. This work was supported by the National Research Foundation of Korea (NRF-20100004910 and NRF-2013S1A3A2055285), BK21 Plus Postgraduate Organization for Content Science, and the Digital Contents Research and Development program of MSIP (R018415-1037, Development of Data Mining Core Technologies for Real-time Intelligent Information Recommendation in Smart Spaces).

17 [33] V. Propp, Morphology of the Folktale (University of Texas Press, Austin, Texas, 2010). [34] Moviegalaxies, URL http://www.moviegalaxies.com. [35] Les Mis´erables, URL http://www.gutenberg.org/ files/135/135-h/135-h.htm. [36] G. Freytag, Freytag’s Technique of the Drama: An Exposition of Dramatic Composition and Art (Scholarly Press, 1896). [37] Y. R. Tausczik and J. W. Pennebaker, Journal of language and social psychology 29, 24 (2010). [38] P. Gon¸calves, M. Ara´ ujo, F. Benevenuto, and M. Cha, in Proceedings of the first ACM conference on Online social networks (ACM, 2013), pp. 27–38. [39] D. Jurgens and K. Stevens, in Proceedings of the ACL 2010 System Demonstrations (2010), pp. 30–35. [40] T. Van de Cruys and M. Apidianaki, in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language TechnologiesVolume 1 (2011), pp. 1476–1485. [41] K. Stevens, P. Kegelmeyer, D. Andrzejewski, and D. Buttler, in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (Association for Computational Linguistics, 2012), pp. 952–961. [42] D. D. Lee and H. S. Seung, Nature 401, 788 (1999). [43] D. D. Lee and H. S. Seung, in Advances in Neural Information Processing Systems (2001), pp. 556–562. [44] W. Xu, X. Liu, and Y. Gong, in Proceedings of the 26th annual international ACM SIGIR conference on Re-

[45] [46] [47] [48]

[49] [50]

[51]

[52]

search and development in informaion retrieval (ACM, 2003), pp. 267–273. Y. Zhao and G. Karypis, Machine Learning 55, 311 (2004). R. McKee, Substance, Structure, Style, and the Principles of Screenwriting (HarperCollins, New York, 1997). S. Min and J. Park, in Complex Networks VII: Studies in Computational Intelligence (2016), pp. 257–266. S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications (Cambridge University Press, 1994). P. V. Marsden, Annual Review of Sociology pp. 435–463 (1990). As a mature-rated show, the titular character’s identity and the overarching plot of the drama may be discomforting and too cruel for some readers to describe here; we refer the interested to seek appropriate sources for more information. The decomposition is also approximate in practice, i.e. M ' QH with the difference between M and QH (called reconstruction error) measured by the squared Frobenius norm. Although our network appears denser than typical social networks [48, 49], this is likely due to the fact that most characters of the novel are involved in some common plot while the rest of the story world is pushed into the background.

Suggest Documents