arXiv:1707.03095v3 [cs.CL] 14 Jul 2017
Look Who’s Talking: Bipartite Networks as Representations of a Topic Model of New Zealand Parliamentary Speeches B. Curran1 , K. Higham2 , E. Ortiz3 , D. Vasques Filho1 1 Te P¯ unaha Matatini, University of Auckland, Private Bag 92019, Auckland 1011, New Zealand 2 Te P¯ unaha Matatini, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand 3 Departament de F´ısica de la Mat`eria Condensada, Universitat de Barcelona, Mart´ı i Franqu`es 1, 08028 Barcelona, Spain. YThe authors contributed equally to this work. *
[email protected]
Abstract Quantitative methods to measure the participation to parliamentary debate and discourse of elected Members of Parliament (MPs) and the parties they belong to are lacking. This is an exploratory study in which we propose the development of a new approach for a quantitative analysis of such participation. We utilize the New Zealand government’s digital Hansard database to construct a topic model of parliamentary speeches consisting of nearly 40 million words in the period 2003-2016. A Latent Dirichlet Allocation topic model is implemented in order to reveal the thematic structure of our set of documents. This generative statistical model enables the detection of major themes or topics that are publicly discussed in the New Zealand parliament, as well as permitting their classification by MP. Information on topic proportions is subsequently analyzed using a combination of statistical methods. We observe patterns arising from time-series analysis of topic frequencies which can be related to specific social, economic and legislative events. We then construct a
1/28
bipartite network representation, linking MPs to topics, for each of four parliamentary terms in this time frame. We build projected networks (onto the set of nodes represented by MPs) and proceed to the study of the dynamical changes of their topology, including community structure. By performing this longitudinal network analysis, we can observe the evolution of the New Zealand parliamentary topic network and its main parties in the period studied.
1
Introduction
Topic models have received widespread attention in recent years [1–4] as they have proven to be useful tools for dealing with the vast amount of semantic information that is becoming available. Topic modeling is a set of machine learning techniques that take a collection of documents as input and attempts to discern the themes that pervade them [5]. However, the methods that topic models utilize to search, summarize and understand large electronic archives have rarely been applied to political texts. The New Zealand government has been making parliamentary transcripts (’Hansard’) available in digital format since 2003. Suitable annotation of these transcripts allow them to be used as a corpus for the development of topic models. This comprehensive corpus of political text can then be examined through a number of lenses. Topic models allow us to monitor the ebb and flow of themes that are discussed in parliament over multiple years and associate particular themes with individual Members of Parliament (MPs). This allows the identification of trends of topics that particular parties follow. That is, we may observe which issues are discussed repeatedly with great interest and which cease to be mentioned. In the four parliamentary terms analyzed there was a transition of power from the 5th Labour government (1999-2008) to the 5th National government (2008-). The left-leaning Labour Party and right-leaning National Party have been the two parties sharing power for most of the 20th century. In 1996, the method of electing MPs was changed to a mixed-member proportional (MMP) system and the two major parties were joined by a number of smaller parties. These smaller parties have sometimes held the balance of power, with the left-wing Green Party as the largest of these. A number of textual analyses of political speeches are concerned with finding
2/28
where on the political spectrum a speaker falls (e.g. [6–8]). Topic modeling as applied in our analyses cannot determine the sentiment of a statement or speech. Despite this fact, multiple aspects of politicians’ policy interests can be unraveled with further statistical analysis. Here we construct bipartite networks [9–12], whereby sets of MPs are linked to a set of topics, with each link representing a topic that is of clear interest to a particular MP, based on the content of their parliamentary speeches. We can then decompose such bipartite networks into their two projections: the MP-projection and the topic-projection. The former represents a network where the links between MPs indicate the existence of a mutual interest, and the latter represents a network where links represent topics that co-occur as interests of a particular MP. In this study, we make use only of MP-projections. Measuring properties such as the node degree (i.e. the number of links that connect it to other nodes), homophily [13, 14] and clustering and community structure [15, 16] of these networks provides information about their underlying topology. For instance, one can discover whether or not the typical range of interests of an MP is changing, as well as patterns in this behavior over time. Moreover, we apply community-detection methods [17, 18] in order to find clusters of politicians that share interests, and investigate the partisan composition of these communities. This work is of an exploratory nature, in that our goal is twofold: to present a novel quantitative approach of measuring political activity and to demonstrate the benefits of performing quantitative analysis in a domain normally reserved for qualitative approaches, by using a combination of machine learning and complex networks techniques. The remainder of this paper is organized as follows: the Methods and Data sections 2.2 and 2.3 introduce fundamental aspects of topic modeling and bipartite networks respectively and outline the preparation and organization of our data; Sections 3 and 4 present the results of our analyses alongside a discussion and our conclusions.
3/28
2
Methods and Data
2.1
Data
The semantic data we are using for our analyses are extracted from the New Zealand Hansard database [19]. Hansard presents records of what is said in the debating chamber as debates (a collection of speeches on a particular topic), speeches (individual statements by MPs) or dailies and volumes (collections of speeches over different time periods). By considering only those documents labeled in that database as a ‘speech’ we were able to find out in which topics specific MPs were engaging with. This makes it possible to associate speeches and by extension MPs with topics of interest over time. Once these data are obtained, we observe that many speeches are rather short and contain little topical content. An example is given below, which comes from a committee discussion on the Shop Trading Hours Amendment Bill and was published in Hansard Volume 716 on the 17th of August 2016 [19]:
”CHAIRPERSON (Lindsay Tisch): Just a point: this debate concludes at quarter past and to whoever is speaking at the time, I will be stopping it at that point.”
In an attempt to remove these non-topical speeches, we have removed from our database speeches with 150 words or less, which constitute about 20% of the database. This cut-off is shown in Fig. 1 which presents the distribution of word-counts per speech. This decision is informed by observations of the insufficient topical content of speeches below this threshold.
2.2
Topic Models
The process of topic modeling involves utilizing a set of algorithms that have been developed to understand the underlying thematic structure of a corpus. The simplest and most commonly used topic model is Latent Dirichlet Allocation (LDA) [20]. Within the framework of LDA, each document is a mixture of corpus-wide topics and each topic can be understood as a distribution over keywords. The total number of
4/28
Frequency
3000 2000 1000 0
0
500
1000
1500
Speech size (number of words)
2000
Fig 1. Distribution of word-counts for New Zealand Parliamentary speech delivered in the period February 2003 to August 2016. In blue, the speech sizes used in our analysis. Speeches shorter than 150 words are omitted from our analysis due to lack of topical content, this threshold is indicated by the dashed vertical line. documents comprising the corpus is denoted as D and the total number of topics as K. Additionally, the order of words that comprise the document is not considered, only the frequency with which words appear. From the perspective of LDA, documents are imagined to be the result of a generative process. This is the process by which the model assumes the documents arose given certain hidden variables. The word-distributions per document are observed, while the topic structure – per-word topic assignment, per-document topic proportions and per-corpus topic distributions – are hidden elements. Therefore, the central computational problem for LDA is to infer the hidden structure that likely generated the observed corpus. This means computing the conditional distribution of the hidden variables given what is observed. This conditional distribution is usually referred to as the posterior and can be expressed as
p(β1:K , θ1:D , z1:D |w1:D ) =
p(β1:K , θ1:D , z1:D , w1:D ) p(w1:D )
(1)
where β1:K are all topics in the corpus, θ1:D are the per-corpus topic proportions, z1:D the per-corpus topic assignments and w1:D the whole set of observed words. Unfortunately, computing the posterior is computationally unfeasible and hence needs to be approximated by an inference algorithm. Consequently, topic modeling algorithms are commonly classified as sampling-based algorithms or variational algorithms [21].
5/28
In this work, we used the MALLET software package [22] for the topic modeling component of our analysis. MALLET implements Gibbs sampling [23], which constructs a sequence of random variables in a Markov chain, where each variable is dependent on the previous. The algorithm then assumes that the true posterior distribution is the limiting distribution of this sequence, and obtains an approximation to this posterior using these samples. For a full mathematical description of LDA and a further discussion of the methods used to estimate a posterior, see [21]. LDA assumes the topics are the same for all documents, and only the topic proportions vary. Therefore, MALLET requires an input which specifies the number of topics to be discovered. Choosing this number is critical to the success of a topic model, as too few topics may merge distinct themes, while too many topics may introduce many ”themes” consisting of vocabularies that appear to have nothing in common, or even start splitting topics that were identifiable at smaller input values. For our analysis it is important that the topics are easily identifiable and distinct from one another. We found that 30 topics satisfies these requirements. Identified topics and their corresponding keywords can be found in Table A.2. It is worth noting, however, that some topics (nine of them, corresponding to about 36% of the corpus) appeared to consist mainly of terms that were primarily either procedural or general rhetoric, such as ”proud”, ”hope” or ”nation”. As this language reveals little in the way of substantive interactions, such topics were omitted from our subsequent analyses after networks had been inferred. Fig. 2 shows the remaining topics with their rescaled proportions.
Proportion
0.06 0.04 0.02
Le
gis
la B tion L a Ec u d g e w o t So and nom cia o y Go l we rder De vern lfare v m Lo el op en t m ca l g C ent o v ri En ern me vir me on n t me nt Fin Tax E a Em duc nce p ati Ca loym on F Pr ore nte ent im ig rb ary n a ury in ffa du irs s Ho tries us i He Drung al t gs Tra hca ns re po rt
0.00
Fig 2. Topics identified by an LDA-based topic model of the Hansard speech corpus for the period February 2003 to August 2016 and their respective normalized proportions as a fraction of all the topical content discovered after the omission of nine topics with little specific thematic content. A list of topic and the key words that comprise them can be found in Table A.2 in Supporting Information.
6/28
2.3
Bipartite networks
A bipartite network is mathematically defined as a graph G = {U, V, E}, where U and V are disjoint sets of nodes and E = {(ui , vj ) : ui ∈ U, vj ∈ V } is the set of links connecting these nodes. For our purposes, the sets U and V correspond to the sets of MPs and topics, and the set E represents the links that emerge when an MP speaks sufficiently frequently about a topic. No connections among nodes of the same set are allowed in the bipartite network, that is, MPs are connected only to topics not others MPs and vice versa. Each set of nodes can have independent properties, such as the probability distribution for their nodes degree, or the number of nodes (system size). Once we find the set of topics that a particular MP speaks about ‘often’ enough (this criterion is defined below), these are represented as links between the MP and those topics. After this process is completed for all MPs, we can construct a bipartite network where nodes representing MPs are connected only to nodes representing topics, and vice versa. Bipartite structures play an important role in the analysis of social and economic networks. They are normally used to represent conceptual relations - such as membership, affiliation, collaboration, employment, ownership and others - between two different types of entities within a system [10–12]. Often, we are more interested in one of the types of nodes (e.g. MPs) and, in order to investigate the relationships between them, we create a new network with only these nodes. This new graph is a projection of the original bipartite network. Topic modeling results in a natural bipartite network with projections that can be easily interpreted. The projections of a bipartite network are obtained by connecting nodes which share a common neighbor. That is, if two MPs are both linked to the same topic in the bipartite network, then they are linked in the MP-projection. For a bipartite network, this process results in two completely separated components, each composed exclusively of one type of node (MPs and topics in our case). Fig. 3 shows a schematic drawing of a bipartite network and its possible projections. The edges between nodes in these projections are then weighted, dependent on the number of neighbors the nodes share in the bipartite network. In our analyses we use simple weighting method [24], whereby each edge has a weight that equals the number of
7/28
neighbors the nodes share in the bipartite graph. If two MPs are linked to the same three topics in the bipartite graph, then the edge linking them in the MP-projection will have weight equal to three. (a)
Topics
MP's U MP-projection
1
1
1 1
Topic-projection
(b)
1 2
2
1
(c)
2
1 1
1
1
2
Fig 3. Schematic drawing of the (a) bipartite network and its projections onto set of nodes (b) U , where nodes represent MPs, and (c) V , where nodes represent topics. In (b) and (c) the numbers on top of the links indicate the weights, which correspond to the number of common neighbors that two linked nodes share in the bipartite network. The weighting in these projections offers a way to eliminate edges that represent tenuous links. This is important, as complete subgraphs (where every node is connected to every other node in the subgraph) of MPs are generated by every topic. This means that every MP that speaks about a popular topic is connected to every other MP that speaks about that same topic. The existence of popular topics can make analyses such as community detection challenging in the absence of weighting. In order to build bipartite networks connecting MPs to topics, we looked at the corpus of each MPs speeches in more detail. We considered an MP to be connected to a topic when at least 6.7% of the MP’s speeches over the course of a year was assigned to that topic by MALLET. This occurs when MPs talk about a topic twice as much as would be expected if they were talking about all topics equally within a year. This method, removes topics that MPs only touch on briefly or in passing, which does not indicate engagement with the topic. Finally, MPs that had spoken less than 104 words in the entire term were removed form the network for the lack of significance in the volume of words spoken.
8/28
3 3.1
Results Words spoken
Despite having fewer MPs, opposition parties tend to have a greater total word count than the governing party. Figure 4 shows the total word count for each of the 3 largest parties (as of the 50th parliament) over the course of 4 parliaments. In each parliament, the total word count for opposition parties exceeds that of the governing party. The increase in words spoken does not appear to be driven by any particular MP or small group of MPs (see Supporting Information, Fig. (A.2).
Words (millions)
5.0
National Labour Green others
2.5
0.0
47th
48th
49th
Parliament
50th
Fig 4. Total number of words spoken by each party in their speeches longer than 150 words, for each of the four complete parliamentary terms studied.
3.2
Time Series of Topic Popularity
Allowing a decomposition by party, we ran a topic model on data concatenated by MP and year. The topic proportions obtained over a total number of 30 topics are normalized for each year so that they are comparable across a time span of 14 years. Proceeding this way, we can reproduce the evolution of topic popularity over time at the Parliament and its decomposition for each of the three most represented parties. Clear trends and differences across parties are visible in Fig. 5 and 6. Evolution of proportion of other argued topics appears in Fig. A.1, Supporting Information.
9/28
Topic proportion
All parties
National
0.072
0.072
0.048
0.048
0.024
0.024
2004 2007 2010 2013 2016 Year Foreign affairs
2004 2007 2010 2013 2016 Year
Law and order
Economy
Topic proportion
Labour
Development
Greens
0.072
0.072
0.048
0.048
0.024
0.024
2004 2007 2010 2013 2016 Year
2004 2007 2010 2013 2016 Year
Fig 5. Evolution over time of the proportions in which the topics labeled as Foreign Affairs, Law and order, Economy, and Development, are discussed at the Parliament, and the corresponding decomposition by party. Changes in topics being discussed often correlate with real world events such as the financial crisis of 2008 6), Canterbury and the Christchurch earthquakes of 2010/2011 (5) and more recently the housing crisis from about 2013 (5).
3.3
The Parliamentary Speech Network
The MP-projected networks for the 47th to 50th parliaments resulting from the process described above are shown in Fig. 7. The community structure [18] in these networks is visible, as is the party make-up of these communities. Table 1 shows the number of MPs per party present in each of these four networks. Table 1. Number of Members of Parliament in the three largest parties (as of the 50th Parliament) for each complete parliamentary term studied. Term
National
Labour
Green
Others
47th 48th 49th 50th
26 48 58 61
52 50 47 39
8 7 11 14
29 16 11 11
10/28
Topic proportion
All parties
National
0.072
0.072
0.048
0.048
0.024
0.024
0
2004 2007 2010 2013 2016 Year Environment
0
Housing
Canterbury
Topic proportion
Labour 0.12
0.048
0.08
0.024
0.04
2004 2007 2010 2013 2016 Year
Transport
Greens
0.072
0
2004 2007 2010 2013 2016 Year
0
2004 2007 2010 2013 2016 Year
Fig 6. Evolution over time of the proportions in which the topics labeled as Environment, Housing, Canterbury, and Transport, are discussed at the Parliament, and the corresponding decomposition by party. Note the change in scale in the Topic proportion axis for the Green Party time series to accommodate their notable interest in environmental policy. Before examining the structure of the network, it is important to note that in all networks a number of MPs are not connected to any others. This is a reflection of our methodology. The 6.7% threshold that is applied to filter topics may in fact remove all topics in the unlikely scenario that the MP in question talks about many topics in roughly equal proportion. MPs with diverse interests may not be identified as fitting into any particular community. A striking difference between the MP networks of the first two parliaments (47th and 48th ) and last two (49th and 50th) is the party composition of visible communities. In the first two (Labour government, 2002-2008) the communities are quite party-wise diverse, while in the last two (National government, 2008-2014) there is a close-knit community made up of National Party MPs. This is corroborated by the three largest communities composition, for every term in our analysis, shown in Fig. 8 [17]. In the first two terms, we note heterogeneous, smaller core communities and the absence of a community that is much larger than the others. That changes for the last two terms, specially the last one, where we see one community much larger than the other
11/28
48 th
47 th 75
158
52 61
152 10355
49
42 33 221 216 10
81
204
54 209
15 139 82
180 161
119
70 207 206 140 23 168 40
125
156
16312
92
154
87 211 203 191 177 166 79 128101 45 93 104 197 109 189 201 124 22 155 24 32 84 228 5 20531 71 99 127 74
53
105
198
139
47
122
39
29
Mana Labour Progressive Greens ACT Maori National United Future First
199
188 159
49 th 50
48
227
80
98 33
123 229
94
199
184
66
39
17
57 23 162 165 40 200 118 102 192 16 27 148 26 36 107 5 195 166110 81 73 93 173 76 219 197 15831 29 60 91 1974 211 178 58 160 105 101 150 210 72 175 226 32 224 174 130 6485 89 157 134 214 201225 115 13 38 63 59 79 88 142 206 92 172 28 78 6 133 179 156 170 42 131193 176 202 153 213 209 56 8
116 180 1451905213620315
20
132 186 138
97
226 20
190
52
204 151
23 68
39 106
93
180
121
175 74 157 189 193 81 15 153
29 109
200
50 th 8
1
186
50
88
121
91
86 198 185
111107 225 26 79
165 33
217
176
172
63 175 6 120193153 123 30 13 85223 200 37 210 74 174 148 19 41 150 196 162 206 226 133179 59 14 9 212 214 83 216 157 131 76 141 18438 219 114 60 194 144 18 151 73 64 40 115 224 97 209 77 43 105 57 173
222
10
208
36 138 211
51
41
176
40
98
39
25
227 203 215
216
143 55 10833 152
185
67
43
22
173
126
21 126 143 183 86 141 151
107
179
123
189
198
185
6 63
206162 48 105
125
190
138
202 119 221 225
8
98
113
62
79 75
50
132
163 201 95 31 57 134 148 101 209 87 155 210154 138 47 43 25 166 128 10 158 213 5 122 42 9245 3
66 21 188
195
199 76 156
26 88211 90 104 99
22438
171 115230
187
146 123
229
12
95 90
28 85 130 198
118
84
28 130 106 187 210 88 85 11
202 62 181
113 17
32 53
171
182
225
82
121
108 185
195
137 167 230 117 96 118
147
153 43229 2
197 17
218
10
58 178
23
25
145 16
156 21 56 66 213 116 129 89190
102
134 46 44 118 132 34 183 110
136 7 164 169
72
166
109 112 135
197 35 121 65
227
48
Fig 7. (Color online) MP networks for parliaments 47th to 50th corresponding to the MP-projection of the original bipartite graph. Node size is proportional to the total number of words spoken by each MP over the course of the parliamentary term. Node labels identifying MPs are provided in A.1, Supporting Information. emerging, dominated by MPs of the National Party. Another point worth noticing is the smaller presence of minor parties in the largest communities over time, ending with no presence whatsoever in the largest community of the 50th term network. Also supporting this idea of a close-knit community made up of National Party MPs is Fig. 10(b) that compares homophily between MPs (based on party affiliation) for the empirical data and configuration model networks. The latter is a random model in which we keep the same degree sequence of the empirical one and rewire the links. It shows that if connections were just random, then the expected homophily would be fairly constant and the network would be slightly dissortative. On the other
12/28
National Labour Green others
Number of MPs
40 30 20 10 0
47th
48th
49th
50th
Parliament
Fig 8. Size and party composition of the three largest communities discovered in the MP-projected network for each of the parliaments examined.
Frequency
10
47th
48th
49th
50th
5 0
0
10
20
30 0
10
20
30 0
10
20
Number of MPs per topic
30 0
10
20
30
(a)
Frequency
50
47th
48th
49th
50th
25 0
0
1
2
3
4
0
1
2
3
4
0
1
2
3
Number of topics of interest per MP
4
0
1
2
3
4
(b)
Frequency
40
47th
48th
49th
50th
20 0
0
20
40
60
0
20
40
60
0
20
MPs connectivity
40
60
0
20
40
60
(c)
Fig 9. Frequency distributions of the (a) number of MPs linked to each topic in the original bipartite graph, (b) number of topics linked to each MP in the original bipartite graph, and (c) number of links each MP has to other MPs in the MP-projected network. hand, there is an increasing homophily of the empirical networks, meaning that MPs preferentially share interests with other members of their parties, particularly within
13/28
35
25
15
47th
Empirical network Configuration model
0.10
Homophily
Average degree
All MPs National Labour Green others
48th
49th
Parliament (a)
50th
0.05 0.00
−0.05
47th
48th
49th
Parliament
50th
(b)
Fig 10. Left: Time series of the average degree of MPs in the MP-projected network, decomposed by party. Right: Time series of net homophily in the MP-projected networks for the empirical data and configuration model. It shows that if connections were random, the expected homophily would stay nearly constant and the network would be slightly dissortative. The results show average and standard deviation over 1000 runs. the National Party. The degree distributions in Fig. 9 also tell an interesting story. From Fig 9(a) we see that topics are attracting the interest of more MPs over time. Associate with that is Fig. 9(b) that shows the distribution of topics that MPs spend the most time on during each government. Most MPs speak about 2-3 topics in large proportions over these periods, however for parliaments 47-49 we see a trend towards larger repertoires. From Fig. 9(c), we can see the Labour government in the 47th and 48th parliaments display a fairly sparse network, where most MPs share interests with less than twenty other MPs. The 49th and 50th parliaments appear to show the formation of a compact National Party group (visible in Fig. 7) speaking about the same topics as many others within this group, such that members of this group in the 50th parliament are connected to at least 45 other MPs, most of whom are also members of this group.
4
Discussion
Topic models provide a way to parse human speech and extract themes from large bodies of text that are often difficult and time consuming to analyze manually. In few cases is it more important to gather and process this information than in the speeches of those people that control the legislative and political direction of a country. Topic modeling is unlikely to replace traditional media analysis of political speech, however,
14/28
here we have shown that it is a useful tool in examining larger themes and trends in political discourse. We were able to use topic modeling to track changes in the content of parliamentary speeches across time, and identify features in these time-series that correspond to particular issues or events. In the time period examined, a number of large events influenced political discourse in New Zealand, such as the 2011 Christchurch earthquake, the global economic crisis and changes to local government with the creation of the Auckland ‘Super City’ via the amalgamation of numerous smaller councils (see Fig. A.1) as well as more recently, the housing crisis. Parliamentary discussions around all of these topics were identified, alongside more conventional themes such as the economy, the budget, and social welfare. Breaking down these topics by time and party shows the different emphases parties are putting on topics. For example, we can see that much of the discussion around the developing housing crisis has been pushed by Labour (see Fig. 5) and, to a lesser extent, the Green Party, while the governing National Party showed little additional interest. Conversely, around the time of the economic downturn the National Party spent more of its time talking about economics. In other cases, such as the discourse surrounding the Christchurch earthquake and the governance of the Canterbury region (see Fig. 5), increase in discussion was driven by all parties. Unsurprisingly, the party that spend the most of their time discussing the environment, was the Green Party. Some events are sufficiently large that it would be difficult for a political party to ignore them, such as the Christchurch earthquake which influences the Canterbury topic. Exogenous events such as these force politicians to comment, driving conversation across party lines. Topics where there are major differences in trends suggest endogenous drivers, where discussion is the result of conscious decisions by political parties. The National Party’s apparent indifference to the increasingly vocal opposition parties’ discussion around housing over the period examined would suggest that National were consciously choosing not to engage with this topic, while Labour and Green were consciously choosing to engage. A mixture of mechanisms could also drive changes in the level of discussion. The increase in the discussion of the economy by National appears to occur at the same time as the global financial crisis. It also coincides with the National Party taking
15/28
power. The change in discussion could be attributed to the crisis, or it could be that the governing party simply talks proportionally more about the economy when they first enter office. Continuing this analysis past another change of government sometime in the future would allow us to identify the drivers of this type of pattern. From a basic analysis of the topic proportions we were able to identify an increase in the number of topics under discussion per MP, alongside an increase in the total number of words spoken per year. We can also observe that parties in opposition tend to talk more than parties in power. In the 47th and 48th parliaments, the National Party has a greater total word count. In the 49th and 50th after the National Party takes power, the Labour Party becomes the most vocal (see Fig. 4) Topic models also produce a natural bipartite network that can be decomposed into its projections and analyzed using standard network techniques. Without the use of sophisticated or computationally expensive methodologies we have shown that the networks resulting from a topic model can display useful information such as community structure and interpretable degree distributions. Much of the popular political analysis since the current National government took power in 2008 (the 49th parliament) has noted the factionalism with the major opposition party, Labour. The community structure we have inferred (Fig. 8) supports this more traditional analysis, with Labour MPs speaking more, on disparate topics while National MPs largely kept to a smaller number of topics. Extracting and examining the top three communities for each parliament we can see the National Party coming to dominate discussion within the core communities in the 49th and 50th parliament. We also see a gradual decline in the participation of smaller parties in these largest communities identified, in particular, having no influence in the largest community in the 50th parliament. Fig. 9 shows the gradual increase in the number of topics discussed by MPs over time (see Fig. 9(b)), indicating decreasing topic specialization by individual MPs in their parliamentary speeches. This has resulted in MPs becoming more highly connected over time (see Fig. 9(c)). At the same time, the average degree of National MPs has increased significantly (10(a)). When considered in light of the communities shown in 8 this analysis suggests a widening of political discourse in New Zealand with opposition parties talking about a greater number of topics, and the development of
16/28
tight knit communities mostly consisting of government MPs talking about a smaller range of topics.
Acknowledgments The authors would like to thank Te P¯ unaha Matatini for funding this project.
Author contributions The four authors designed the study, contributed to the analysis of the results and to the writing of the manuscript.
References 1. Titov I, McDonald R. Modeling online reviews with multi-grain topic models. InProceedings of the 17th international conference on World Wide Web 2008 Apr 21 (pp. 111-120). ACM. 2. Ramage D, Dumais ST, Liebling DJ. Characterizing microblogs with topic models. ICWSM. 2010 May 23;10:1-. 3. Zhao WX, Jiang J, Weng J, He J, Lim EP, Yan H, Li X. Comparing twitter and traditional media using topic models. InEuropean Conference on Information Retrieval 2011 Apr 18 (pp. 338-349). Springer, Berlin, Heidelberg. 4. Grimmer J. A bayesian hierarchical topic model for political texts: Measuring expressed agendas in senate press releases. Political Analysis. 2009 Dec 3;18(1):1-35. 5. Blei DM. Probabilistic topic models. Communications of the ACM. 2012 Apr 1;55(4):77-84. 6. Laver M, Benoit K, Garry J. Extracting policy positions from political texts using words as data. American Political Science Review. 2003 May;97(2):311-31.
17/28
7. Slapin JB, Proksch SO. A scaling model for estimating time series party positions from texts. American Journal of Political Science. 2008 Jul 1;52(3):705-22. 8. Laver M, Garry J. Estimating policy positions from political texts. American Journal of Political Science. 2000 Jul 1:619-34. 9. Guillaume JL, Latapy M. Bipartite graphs as models of complex networks. Physica A: Statistical Mechanics and its Applications. 2006 Nov 15;371(2):795-813. 10. Koskinen J, Edling C. Modelling the evolution of a bipartite network — Peer referral in interlocking directorates. Social Networks. 2012 Jul 31;34(3):309-22. 11. Wasserman S, Faust K. Social network analysis: Methods and applications. Cambridge university press; 1994 Nov 25. 12. Breiger RL. The duality of persons and groups. Social forces. 1974 Dec 1;53(2):181-90. 13. Newman ME. Mixing patterns in networks. Physical Review E. 2003 Feb 27;67(2):026126. 14. McPherson M, Smith-Lovin L, Cook JM. Birds of a feather: Homophily in social networks. Annual review of sociology. 2001 Aug;27(1):415-44. 15. Girvan M, Newman ME. Community structure in social and biological networks. Proceedings of the national academy of sciences. 2002 Jun 11;99(12):7821-6. 16. Newman ME. The structure and function of complex networks. SIAM review. 2003;45(2):167-256 17. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment. 2008 Oct 9;2008(10):P10008. 18. Fruchterman TM, Reingold EM. Graph drawing by force directed placement. Software: Practice and experience. 1991 Nov 1;21(11):1129-64
18/28
19. New Zealand Parliamentary Debates; 2002–2016. 20. Campbell JC, Hindle A, Stroulia E. Latent Dirichlet allocation: extracting topics from software engineering data. The art and science of analyzing software data. 2014 Jul 16;1. 21. Blei DM, Ng AY, Jordan MI. Latent dirichlet allocation. Journal of machine Learning research. 2003;3(Jan):993-1022. 22. McCallum AK. Mallet: A machine learning for language toolkit. 23. Griffiths T. Gibbs sampling in the generative model of latent dirichlet allocation. 24. Zhou T, Ren J, Medo M, Zhang YC. Bipartite network projection and personal recommendation. Physical Review E. 2007 Oct 25;76(4):046115.
19/28
A
Supporting Information
Codes and names of Members of the Parliament Table A.1. Members of the Parliament and their respective codes in the networks. Codes that are missing in this table is due to the fact that some speakers don’t show in the network. This is either because they were invited speakers (not a MP) or because the MP had spoken too few words (below ten thousand words) during the entire term. Code
MP
Code
MP
1
Claudette Hauiti
118
Maurice Williamson
2
Larry Baldock
119
Doug Woolerton
3
Lesley Soper
120
Jian Yang
5
Keith Locke
121
Tariana Turia
6
John Hayes
122
Mahara Okeroa
7
Holly Walker
123
Gerry Brownlee
8
Pita Sharples
124
Craig McNair
9
Ian Mckelvie
125
Georgina Beyer
10
Ruth Dyson
126
David Shearer
11
Rod Donald
127
Marc Alexander
12
Paul Swain
128
Ron Mark
13
Jonathan Young
129
Jan Logie
14
Scott Simpson
130
Rodney Hide
15
Steve Chadwick
131
Nikki Kaye
16
David Clendon
132
Jacqui Dean
17
Pansy Wong
133
Cam Calder
18
Maggie Barry
134
Chester Borrows
19
Steven Joyce
135
Brendan Horan
20
Ashraf Choudhary
136
William Sio
21
Sue Moroney
137
Roger Sowry
22
Marian Hobbs
138
Nanaia Mahuta
23
Phil Heatley
139
Pita Paraone
24
Edwin Perry
140
Ken Shirley
25
Maryan Street
141
Chris Hipkins
20/28
26
Charles Chauvel
142
Roger Douglas
27
Rahui Katene
143
Allan Peachey
28
Michael Cullen
144
Alfred Ngaro
29
Georgina te Heuheu
145
Phil Twyford
30
Mike Sabin
146
John Tamihere
31
Richard Worth
147
Mike Ward
32
Sandra Goudie
148
Kate Wilkinson
33
Moana Mackey
149
Donna Awatere Huata
34
Andrew Little
150
Louise Upston
35
Julie-Anne Genter
151
Hone Harawira
36
Kevin Hague
152
Katherine Rich
37
Paul Foster-Bell
153
Paul Hutchison
38
Chris Tremain
154
Dave Hereora
39
Parekura Horomia
155
Judith Tizard
40
Nick Smith
156
Metiria Turei
41
Todd McClay
157
Jonathan Coleman
42
Sue Bradford
158
Simon Power
43
Damien O’Connor
159
Ann Hartley
44
Eugenie Sage
160
Stuart Nash
45
David Benson-Pope
161
Brent Catchpole
46
Asenati Lole-Taylor
162
Nathan Guy
47
Lindsay Tisch
163
Harry Duynhoven
48
Eric Roy
164
Denise Roche
49
Bernie Ogilvy
165
Kennedy Graham
50
Tau Henare
166
Lianne Dalziel
51
Richard Prosser
167
Helen Duncan
52
John Carter
168
Jim Peters
53
Jill Pettis
169
Andrew Williams
54
Lynda Scott
170
Carmel Sepuloni
55
Steve Maharey
171
Mark Gosche
56
Rajen Prasad
172
Catherine Delahunty
21/28
57
Nicky Wagner
173
Anne Tolley
58
Michael Woodhouse
174
Hekia Parata
59
Tim Macindoe
175
Jo Goodhew
60
Sam Lotu-Iiga
176
Te Ururoa Flavell
61
Richard Prebble
177
Matt Robson
62
Helen Clark
178
Simon Bridges
63
Tim Groser
179
Katrina Shanks
64
Amy Adams
180
Sue Kedgley
65
Rino Tirikatene
181
Taito Phillip Field
66
Darien Fenton
182
Bill Gudgeon
67
John Boscawen
183
Kelvin Davis
68
Bob Clarkson
184
Gareth Hughes
70
Gerrard Eckhoff
185
Trevor Mallard
71
Jonathan Hunt
186
Iain Lees-Galloway
72
Aaron Gilmore
187
Don Brash
73
Melissa Lee
188
Murray McCully
74
Tony Ryall
189
Pete Hodgson
75
Dianne Yates
190
Ross Robertson
76
Russel Norman
191
Jim Sutton
77
Poto Williams
192
Hilary Calvert
78
David Garrett
193
Jackie Blue
79
Phil Goff
194
Tracey Martin
80
Paul Quinn
195
Lynne Pillay
81
Heather Roy
196
Simon O’Connor
82
Martin Gallagher
197
Peter Dunne
83
David Clark
198
Lockwood Smith
84
Margaret Wilson
199
Darren Hughes
85
John Key
200
Chris Auchinvole
86
Grant Robertson
201
Rick Barker
87
Mark Burton
202
Winnie Laban
88
Clayton Cosgrove
203
George Hawkins
22/28
89
Jacinda Ardern
204
Brian Connell
90
Tim Barnett
205
Murray Smith
91
Clare Curran
206
Shane Ardern
92
Jim Anderton
207
Janet Mackey
93
Chris Carter
208
Mojo Mathers
94
Brendon Burns
209
Annette King
95
N´ andor T´anczos
210
David Cunliffe
96
Deborah Coddington
211
David Parker
97
Kris Faafoi
212
Megan Woods
98
Mita Ririnui
213
Paula Bennett
99
Russell Fairbrother
214
Jami-Lee Ross
100
Kenneth Wang
215
Mark Blumsky
101
Wayne Mapp
216
Barbara Stewart
102
Carol Beaumont
217
John Banks
103
Muriel Newman
218
Mark Peck
104
Dail Jones
219
Kanwaljit Singh Bakshi
105
David Carter
220
Joanne Hayes
106
Gordon Copeland
221
Judy Turner
107
Christopher Finlayson
222
Steffan Browning
108
Brian Donnelly
223
Paul Goldsmith
109
Winston Peters
224
Craig Foss
110
Raymond Huo
225
Judith Collins
111
Denis O’Rourke
226
Colin King
112
Meka Whaitiri
227
Shane Jones
113
Dover Samuels
228
Stephen Franks
114
Mark Mitchell
229
Jeanette Fitzsimons
115
David Bennett
230
Peter Brown
116
Louisa Wall
231
Graham Kelly
117
Paul Adams
23/28
Topics identified and keywords Table A.2. Topics identified and their respective key words from Mallet * Topics that were removed from the networks. ** Sometimes just part of M¯ aori words appear due to accent marks. Those are not readable in the format used in Mallet (e.g. ’wh¯anau’ would appear as ’wh’ only). Speeches in M¯ aori are translated to English, i.e. the corpus in this work contains these speeches in their translated versions. Therefore, this topic needed to be removed from the network. Topic
Key words
Budget
government tax budget percent money zealanders labour national billion zealand cuts income year rate cut million superannuation spending rates english
Canterbury
volume date sitting text incorporated bound christchurch house page important good support piece time back work call part earthquake canterbury
Crime
police crime victims law prison justice person criminal people order offenders offence corrections offences community system officers prisoners bill parole
Development
percent year years million government increase cost number increased costs rate time period significant level numbers increasing impact report total
Drugs
bill people alcohol issue young age harm problem drugs legislation law make drug evidence tobacco support dog volume control smoking
Economy
government zealand economy economic labour jobs zealanders growth country development business plan world investment businesses sector national future years research
Education
education school schools students government national zealand funding training early tertiary science system teachers minister childhood student research standards television
Employment
work workers people labour employment bill day employers working compensation time pay minimum government wage acc employees hours business relations
24/28
Environment
environment management government energy emissions change water environmental climate resource conservation green national electricity zealand scheme trading oil marine carbon
Finance
zealand financial companies company bank assets finance business investment owned capital commerce market state banks sector money reserve power fund
Foreign affairs
zealand united international country people world countries defence foreign human rights immigration states force government australia security nations pacific convention
Government
information public government commission report service security office commissioner chief minister inquiry agencies general department executive review ministry authority independent
Healthcare
health services care board king district people minister zealand public annette national ryall tony dr mental hospital government medical years
Housing
government housing building house people home homes state zealand property houses land problem auckland labour years national social affordable minister
Law and Order
law court rights legal justice case general act parliament made decision courts electoral evidence high cases judge matter person system
Legislation
bill act ensure provisions committee provide legislation regulatory review amendment regulations important current process time authority protection required regime move
Local ment
govern-
local council government auckland city community people board councils bill communities canterbury area members trust district regional environment central authorities
M¯ aori**
te ori ng ki iwi settlement ti treaty nei crown kia wh koutou land tou waitangi tau ora ko nau
25/28
Miscellaneous*
people things members back good thing put lot bit talk side sort time talking country fact thought heard absolutely happened
Political party*
party national labour members government john election vote parliament mr key leader house opposition policy political zealanders speech zealand don
Primary
indus-
tries
zealand industry trade agreement country free farmers food primary dairy export world industries products biosecurity production china important market agreements
Procedural
bill committee legislation select house process members time
terms*
reading support important parliament submissions issue made stage issues debate forward bills
Procedural
bill part act clause amendment legislation order paper supple-
terms*
mentary section minister amendments committee title provisions purpose definition page provision call
Procedural
mr member speaker order hon point house raise members mal-
terms*
lard trevor chairperson assistant peters robertson interruption question debate brownlee winston
Procedural
minister hon question prime dr answer mr david questions mem-
terms*
ber asked smith government table statement document made house leave rt
Rhetoric*
people great time day house years parliament world today life country proud acknowledge history nation work place rugby team hard
Rhetoric*
issues green issue party point view deal debate society future problem fact make things simply terms approach sense process gambling
Social welfare
children people families work family social child young support benefit parents youth women care working poverty welfare vulnerable government leave
26/28
Tax
bill tax system scheme pay revenue money people department make income loan interest student levy support fair costs small time
Transport
transport government road public national auckland roads minister zealand land hon vehicle rail car people money infrastructure williamson charges bridges
27/28
Time series of topic proportions All parties
Labour
Greens
0.07
0.09
0.07
0.05
0.05
0.06
0.05
0.02
0.02
0.03
0.02
0.00
Topic proportions
National
0.07
2005 2010 2015
0.00
0.00
2005 2010 2015
2005 2010 2015
0.00
0.07
0.07
0.07
0.07
0.05
0.05
0.05
0.05
0.02
0.02
0.02
0.02
0.00
2005 2010 2015
0.00
0.00
2005 2010 2015
2005 2010 2015
0.00
0.07
0.07
0.07
0.07
0.05
0.05
0.05
0.05
0.02
0.02
0.02
0.02
0.00
0.00
0.00
2005 2010 2015
2005 2010 2015
2005 2010 2015
0.00
0.07
0.07
0.07
0.07
0.05
0.05
0.05
0.05
0.02
0.02
0.02
0.02
0.00
2005 2010 2015
0.00
0.00
2005 2010 2015
2005 2010 2015
0.00
Legislation Political party Primary Industries Healthcare 2005 2010 2015
Budget Education Drugs Finance 2005 2010 2015
Social welfare Local government Employment 2005 2010 2015
Tax Government Crime 2005 2010 2015
Year
Fig A.1. Time evolution of topic proportions, as discussed at Parliament and its decomposition by party.
Histogram of number of words spoken by MP 20 15
47th
48th
49th
50th
Frequency of MPs
10 5 0 20 15 10 5 0
0
50
100
150
200
250
0
50
100
150
200
250
Words spoken (thousands) Fig A.2. Histogram of number of words spoken by MP during the four parliaments.
28/28