
An Automated Method of Topic-Coding Legislative Speech Over Time with Application to the 105th-108th U.S. Senate∗ Kevin M. Quinn†

Burt L. Monroe‡

Michael Colaresi§

Michael H. Crespin¶

Dragomir R. Radev‖

April 20, 2006

DRAFT – Please do not cite without permission.

Abstract

In this paper, we describe a method for statistical learning from speech documents that we apply to the Congressional Record in order to gain new insight into the dynamics of the political agenda. Prior efforts to evaluate the attention of elected representatives across topic areas have been expensive manual coding exercises and are generally circumscribed along one or more features of detail: limited time periods, high levels of temporal aggregation, coarse topical categories, and so on. Conversely, the Congressional Record has scarcely been used for such analyses, largely because it contains too much information to absorb. We describe here a method for inferring, through the patterns of word choice in each speech and the dynamics of word choice patterns across time, (a) what the topics of speeches are, and (b) the probability that attention will be paid to any given topic or set of topics over time. We use the model to examine the agenda in the United States Senate from 1997-2004, a database of over 70 thousand documents containing over 70 million words. We estimate the model for 42 topics and provide evidence that we can reveal speech topics that are both distinctive and inter-related in substantively meaningful ways. We demonstrate further that the dynamics our model estimates give us leverage on important questions about the political agenda.

∗ This research was supported by the U.S. National Science Foundation through grant BCS 05-27513.
† Department of Government and Institute for Quantitative Social Science, Harvard University, kevin [email protected]
‡ Department of Political Science, Michigan State University, and Center for Political Studies, University of Michigan, [email protected]
§ Department of Political Science, Michigan State University, [email protected]
¶ APSA Congressional Fellow and Department of Political Science, University of Georgia, [email protected]
‖ School of Information, University of Michigan, [email protected]


1

Introduction

The verbatim records of democratic legislatures represent a source of untapped information of singular importance for the study of democratic societies. In legislative records, we observe – as we can nowhere else – the deliberations of political elites about the issues of the day, covering time scales ranging from minutes to centuries. No other source – not election returns, not opinion polling, not newspapers, not legislation, not party manifestos – provides such a potentially rich and consistent source of information about the dynamics of democracy.

These sources have remained largely untapped because of their sheer volume. A typical year of any legislative record includes tens of thousands of speeches, and tens of millions of spoken words. It is essentially impossible for a single individual to read, much less absorb or analyze in any way, the entire record of a given legislature as quickly as it is produced. Attempts to analyze The Congressional Record, for example, are generally limited to small segments of time or very small samples, requiring extensive and expensive manual coding efforts.[1]

[1] For example, Hill and Hurley (2002) use two manual coders to place a sample of 2,000 Senate speeches from 1990 into three categories. Box-Steffensmeier et al. (1997), in a study that examines timing of position-taking announcements (including floor speech) on NAFTA during the 103rd Congress, note that “Data on timing, to be adequate to the task, are collected at substantial cost” (p. 330). Maltzman and Sigelman (1996) use an electronic indexing system to estimate simply the number of speeches of one particular type during the 103rd Congress.

Increasingly, however, such records are available in electronic form. This, with advances in statistical learning and computational linguistics (as well as computational hardware), opens up possibilities for putting such records to scientific use. In this paper, we seek to leverage these empirical opportunities to gain new insight into the dynamics of the political agenda. To what topics do elected representatives pay attention? How are policy areas interrelated (Katznelson and Lapinski, 2006)? How do leaders distribute their attention (Jones and Baumgartner, 2004)? What are the dynamics of this process? Is policy making incremental, or explosive, or both, characterized by punctuations (Wlezien, 1995; Baumgartner and Jones, 1993; Jones and Baumgartner, 2005)? How do shocks to the agenda reverberate across different topic areas? Under what circumstances do leaders push or follow public attention (Jacobs and Shapiro, 2000; Stimson et al., 1995)?

To answer such questions, we must have a measure of how the attention of legislators is distributed across policy topics, political issues, and other matters that compete for their scarce time. In the study of American politics, the closest analog to our project here can be found in the measures developed by the Policy Agendas Project (Jones et al., N.D.), directed by Frank Baumgartner and Bryan Jones, and discussed in a long-running series of publications (most prominently Baumgartner and Jones (1993), Baumgartner and Jones (2002), and Jones and Baumgartner (2005)).[2] They measure the policy agenda through a massive manual coding effort on Congressional bill introductions and hearings. (They measure the public agenda through the Gallup organization’s “Most Important Problem”, the systemic agenda through coding a sample of editorials from The New York Times, and policy itself through legislation and budgets.) The results are annually aggregated indicators of total and relative attention given in each of these arenas across 19 (and occasionally more) major topic areas.[3]

Legislative speech increases our ability to examine the public agenda at finer grain. Legislative speech can be evaluated at more rapid time scales (to the day or lower for some purposes), can be evaluated for groupings of legislators below the entire chamber (parties, state delegations, committee members, individuals), and – we argue – can capture fine and unexpected subtleties in the topology of the policy agenda itself. Moreover, legislative speech captures attention by elites to political issues that both do and do not ultimately translate into legislation. Further, our approach is relatively inexpensive and can be easily applied to study political speech in other legislatures and in other languages.

We describe here a dynamic model of speech distribution across topics.
As in a typical clustering problem, we assume that each speech delivered on a given day has some probability of being “about” one of K topics. We further assume that the distribution of attention is a dynamic process, related from one day to the next. We seek to determine, through the patterns of word choice in each document and the dynamics of word choice patterns across time, (a) what the topics of speeches are, and (b) the probability that attention will be paid to any given topic or set of topics over time. This bears closest resemblance to the latent Dirichlet allocation model of Blei et al. (2003), used for a variety of unsupervised classification problems. (One similar computational linguistics application is the identification of topics across abstracts in a particular scientific journal.) Ours varies from their model on several dimensions, but most notably in the use of dynamics.

[2] In comparative politics, the closest analog comes from the Political Manifestos Project (Budge et al., 2001). They use their extensive manual codings to place party programs on dimensional scales, and it is worth noting that this is also operationalized through proportion of statements on particular topics. Within our larger project (Monroe et al., 2005) the authors are, with others, developing comparable databases for other national parliaments and international assemblies. Such data also have application to scaling (ideal-point estimation) problems (Monroe and Maeda, 2004).

[3] For some measures, these are further subdivided into over 200 minor topics.

This paper is organized as follows. In Section 2 we develop a statistical model for legislative speech data. Of particular interest is the dynamic hierarchical prior distribution for the documents’ latent topic indicators. This portion of the model specification allows us to investigate the dynamics of legislative speech using well-developed time series tools (West and Harrison, 1997). The remainder of the paper deals with the application of our model to the spoken record of the 105th-108th U.S. Senate. In Section 3 we discuss our data sources and methods for turning raw textual data into a more easily analyzed format. We apply the model to these data in Section 4. We make the case that the topic-codings produced by the model agree with generally recognized issue areas and that the relationships between the issue areas are also congruent with a common-sense understanding of American politics. We go on to show how the model can yield new insights about the dynamics of issue attention. The final section concludes with our plans for the future.

2

A Model for Dynamic Multi-Topic Speech

The data generating process that motivates our model is the following. On each day that Congress is in session a legislator can make speeches. Each speech will be on one of a finite number $K$ of topics. The probability that a randomly chosen speech from a particular day will be on a particular topic is assumed to vary smoothly over time. At a very coarse level, a speech can be thought of as a vector containing the frequencies of words in some vocabulary. These vectors of word frequencies can be stacked together in a matrix whose number of rows is equal to the number of words in the vocabulary and whose number of columns is equal to the number of speeches. This matrix is our outcome variable. Our goal is to use the information in this matrix to make inferences about the topic probabilities and how they change over time as well as the topic membership of individual speeches.

We begin by laying out the necessary notation. Let $t = 1, \dots, T$ index time; $d = 1, \dots, D$ index speech documents; $k = 1, \dots, K$ index possible topics that a document can be on; and $w = 1, \dots, W$ index words in the vocabulary. For reasons that will be clearer later, we also introduce the function $s : \{1, \dots, D\} \to \{1, \dots, T\}$, where $s(d)$ tells us the time period in which document $d$ was put into the Congressional Record. In addition, let $\Delta^N$ denote the $N$-dimensional simplex.

2.1

The Sampling Density

The $d$th document $\mathbf{y}_d$ is a $W$-vector of non-negative integers. The $w$th element of $\mathbf{y}_d$, denoted $y_{dw}$, gives the number of times word $w$ was used in document $d$. We consider two equivalent probability models for $\mathbf{y}_d$. First, if $\mathbf{y}_d$ is generated from topic $k$ we could assume that

\[ y_{dw} \sim \mathrm{Poisson}\left(\exp(\alpha_d + \beta_{kw})\right). \]

Here $\alpha_d$ is related to the overall length, in total number of words, of document $d$, while $\beta_{kw}$ captures the log odds of word $w$ relative to some baseline. Equivalently, we can condition on the total number $n_d$ of words in each document. This generates a multinomial distribution for $\mathbf{y}_d$:

\[ \mathbf{y}_d \sim \mathrm{Multinomial}(n_d, \boldsymbol{\theta}_k). \]

Here $\boldsymbol{\theta}_k \in \Delta^{W-1}$ is the vector of multinomial probabilities with typical element $\theta_{kw}$. In what follows, we condition on $n_d$ for all $d$ and use the multinomial sampling density. This allows us to fit the resulting model more quickly. For purposes of interpretation, we will at some points below make use of the transformation

\[ \boldsymbol{\beta}_k = \left( \log\frac{\theta_{k1}}{\theta_{k1}} - c,\ \log\frac{\theta_{k2}}{\theta_{k1}} - c,\ \dots,\ \log\frac{\theta_{kW}}{\theta_{k1}} - c \right)' \]

where $c = W^{-1} \sum_{w=1}^{W} \log(\theta_{kw} / \theta_{k1})$. If we let $\pi_{tk}$ denote the marginal probability that a randomly chosen document is generated from topic $k$ in time period $t$, we can write the sampling density for all of the observed documents as:

\[ p(\mathbf{Y} \mid \boldsymbol{\pi}, \boldsymbol{\theta}) \propto \prod_{d=1}^{D} \sum_{k=1}^{K} \pi_{s(d)k} \prod_{w=1}^{W} \theta_{kw}^{y_{dw}}. \]
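As an illustration of the sampling density above, the following minimal sketch evaluates its logarithm (up to the omitted multinomial constant) for toy inputs. The function names, the 0-based period indices in `s`, and the list-of-lists layout (one count vector per document rather than a words-by-documents matrix) are assumptions made for this example, not part of the model specification.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(Y, s, pi, theta):
    """Log of the mixture sampling density, up to an additive constant.

    Y     : list of D count vectors, each of length W
    s     : list of D period indices (0-based), s[d] = period of document d
    pi    : T x K topic probabilities (assumed strictly positive)
    theta : K x W word probabilities (assumed positive wherever counts are)
    """
    total = 0.0
    for y_d, t in zip(Y, s):
        per_topic = []
        for k in range(len(theta)):
            lp = math.log(pi[t][k])
            lp += sum(c * math.log(theta[k][w]) for w, c in enumerate(y_d) if c > 0)
            per_topic.append(lp)
        # sum over topics inside the log, product over documents outside
        total += log_sum_exp(per_topic)
    return total

# Toy example: two documents, two periods, two topics, three words.
Y = [[2, 1, 0], [0, 1, 3]]
s = [0, 1]
pi = [[0.7, 0.3], [0.4, 0.6]]
theta = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
print(log_likelihood(Y, s, pi, theta))
```

Working on the log scale with a log-sum-exp avoids underflow, which matters because the product over words is tiny for documents of realistic length.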

As will become apparent later, it will be useful to write this sampling density in terms of latent data $\mathbf{z}_1, \dots, \mathbf{z}_D$. Here $\mathbf{z}_d$ is a $K$-vector with element $z_{dk}$ equal to 1 if document $d$ was generated from topic $k$ and 0 otherwise. If we could observe $\mathbf{z}_1, \dots, \mathbf{z}_D$ we could write the sampling density above as:

\[ p(\mathbf{Y}, \mathbf{Z} \mid \boldsymbol{\pi}, \boldsymbol{\theta}) \propto \prod_{d=1}^{D} \prod_{k=1}^{K} \left( \pi_{s(d)k} \prod_{w=1}^{W} \theta_{kw}^{y_{dw}} \right)^{z_{dk}}. \]

2.2

The Prior Specification

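To complete a Bayesian specification of this model we need to determine prior distributions for $\boldsymbol{\theta}$ and $\boldsymbol{\pi}$. We assume a conjugate Dirichlet prior for $\boldsymbol{\theta}$. More specifically, we assume

\[ \boldsymbol{\theta}_k \sim \mathrm{Dirichlet}(\boldsymbol{\lambda}_k), \qquad k = 1, \dots, K. \]

For the data analysis below we assume that $\lambda_{kw} = 1.01$ for all $k$ and $w$.

The prior for $\boldsymbol{\pi}$ is more complicated. Let $\boldsymbol{\pi}_t \in \Delta^{K-1}$ denote the vector of topic probabilities at time $t$. The model assumes that a priori

\[ \mathbf{z}_d \sim \mathrm{Multinomial}(1, \boldsymbol{\pi}_{s(d)}). \]

We reparameterize to work with the unconstrained

\[ \boldsymbol{\omega}_t = \left( \log\frac{\pi_{t1}}{\pi_{tK}},\ \dots,\ \log\frac{\pi_{t(K-1)}}{\pi_{tK}} \right)'. \]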
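The reparameterization can be sketched in a few lines: topic probabilities are mapped to log-ratios against the last topic, and mapped back with a softmax in which the baseline topic has log-ratio zero. The function names `to_omega` and `to_pi` are hypothetical, chosen only for this illustration.

```python
import math

def to_omega(pi_t):
    """Map K probabilities to K-1 unconstrained log-ratios against topic K."""
    base = pi_t[-1]
    return [math.log(p / base) for p in pi_t[:-1]]

def to_pi(omega_t):
    """Inverse map: K-1 log-ratios back to K probabilities that sum to one."""
    m = max(omega_t + [0.0])               # subtract the max for stability
    expd = [math.exp(o - m) for o in omega_t] + [math.exp(-m)]
    total = sum(expd)
    return [e / total for e in expd]

pi_t = [0.2, 0.3, 0.5]
omega_t = to_omega(pi_t)
print(omega_t)          # two unconstrained real numbers
print(to_pi(omega_t))   # recovers pi_t up to floating-point error
```

Because the log-ratios are unconstrained, Gaussian time-series machinery can be applied to them directly, which is what the next step of the prior exploits.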

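In order to capture dynamics in $\boldsymbol{\pi}_t$ and to borrow strength from neighboring time periods, we assume that $\boldsymbol{\omega}_t$ follows a Dynamic Linear Model (DLM) (West and Harrison, 1997; Cargnoni et al., 1997; Martin and Quinn, 2002). Specifically,

\begin{align}
\boldsymbol{\omega}_t &= \mathbf{F}_t' \boldsymbol{\eta}_t + \boldsymbol{\epsilon}_t, & \boldsymbol{\epsilon}_t &\sim N(\mathbf{0}, \mathbf{V}_t), & t &= 1, \dots, T \tag{1} \\
\boldsymbol{\eta}_t &= \mathbf{G}_t \boldsymbol{\eta}_{t-1} + \boldsymbol{\delta}_t, & \boldsymbol{\delta}_t &\sim N(\mathbf{0}, \mathbf{W}_t), & t &= 1, \dots, T \tag{2}
\end{align}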

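Here Equation 1 acts as the observation equation and Equation 2 acts as the evolution equation. We finish this prior off by assuming prior distributions for $\mathbf{V}_t$, $\mathbf{W}_t$, and $\boldsymbol{\eta}_0$. Specifically, we assume $\mathbf{W}_t = \mathbf{W}$ for all $t$ and $\mathbf{V}_t = \mathbf{V}$ for all $t$ in which Congress was in session, with $\mathbf{V}$ and $\mathbf{W}$ both diagonal and

\[ V_{ii} \sim \mathrm{InvGamma}(a_0/2,\ b_0/2)\ \ \forall i \qquad \text{and} \qquad W_{ii} \sim \mathrm{InvGamma}(c_0/2,\ d_0/2)\ \ \forall i. \]

We assume

\[ \boldsymbol{\eta}_0 \sim N(\mathbf{m}_0, \mathbf{C}_0). \]

For days in which Congress was not in session we assume that $\mathbf{V}_t = 10\,\mathbf{I}$. We have found that this helps prevent oversmoothing. Note that what we have done is to specify a hierarchical prior that factorizes as:

\[ p(\mathbf{z}, \boldsymbol{\omega}, \boldsymbol{\eta}, \mathbf{V}_t, \mathbf{W}_t) = p(\mathbf{z} \mid \boldsymbol{\omega})\, p(\boldsymbol{\omega} \mid \boldsymbol{\eta}, \mathbf{V}_t)\, p(\boldsymbol{\eta} \mid \mathbf{W}_t, \mathbf{m}_0, \mathbf{C}_0)\, p(\mathbf{V}_t)\, p(\mathbf{W}_t). \]

In what follows we specify $\mathbf{F}_t$ and $\mathbf{G}_t$ as:

\[ \mathbf{F}_t = \begin{pmatrix} \mathbf{I}_{K-1} \\ \mathbf{0}_{K-1} \end{pmatrix}, \qquad \mathbf{G}_t = \begin{pmatrix} \mathbf{I}_{K-1} & \mathbf{I}_{K-1} \\ \mathbf{0}_{K-1} & \mathbf{I}_{K-1} \end{pmatrix}, \qquad t = 1, \dots, T. \]

This produces a local linear trend model for $\boldsymbol{\omega}_t$.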
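To make the local linear trend concrete, here is a minimal simulation sketch of the prior on the topic probabilities. With the block forms of $\mathbf{F}_t$ and $\mathbf{G}_t$ and diagonal variances, each of the $K-1$ log-ratios has its own level and slope. The variance values, the starting state of zero, and the function names are illustrative assumptions; they are not the values or priors used in the paper.

```python
import math
import random

def softmax_with_baseline(omega_t):
    """Map K-1 log-ratios (baseline topic fixed at 0) to K probabilities."""
    m = max(omega_t + [0.0])
    expd = [math.exp(o - m) for o in omega_t] + [math.exp(-m)]
    total = sum(expd)
    return [e / total for e in expd]

def simulate_local_linear_trend(T, K, v=0.05, w_level=0.01, w_slope=0.001, seed=0):
    """Draw pi_1..pi_T from the DLM prior with eta_0 = 0 (an assumption).

    v, w_level and w_slope are variances for the observation noise and for
    the level and slope parts of the evolution noise.
    """
    rng = random.Random(seed)
    level = [0.0] * (K - 1)
    slope = [0.0] * (K - 1)
    path = []
    for _ in range(T):
        # Evolution: eta_t = G eta_{t-1} + delta_t with G = [[I, I], [0, I]],
        # so the new level adds the previous slope before the slope moves.
        level = [l + sl + rng.gauss(0.0, math.sqrt(w_level))
                 for l, sl in zip(level, slope)]
        slope = [sl + rng.gauss(0.0, math.sqrt(w_slope)) for sl in slope]
        # Observation: omega_t = F' eta_t + eps_t picks out the level block.
        omega = [l + rng.gauss(0.0, math.sqrt(v)) for l in level]
        path.append(softmax_with_baseline(omega))
    return path

for pi_t in simulate_local_linear_trend(T=5, K=3, seed=1):
    print([round(p, 3) for p in pi_t])
```

Raising `v` for selected periods mimics the treatment of out-of-session days, where the observation equation is made deliberately uninformative.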

While we adopt a fairly simple model for the dynamics in the Senate data, the DLM framework that we make use of is extremely general. By properly choosing $\mathbf{F}_t$, $\mathbf{G}_t$ and forms for $\mathbf{V}_t$ and $\mathbf{W}_t$, a very wide range of dynamics can be captured.

It is worth noting that, viewed as a clustering / classification procedure, the model above is designed for unsupervised clustering. At no point does the user pre-tag documents as belonging to certain topics. As we will demonstrate below in the context of Senate speech data, our model, despite not using user-supplied information about the nature of the topics, produces topic labelings that adhere closely to generally recognized issue areas. While perhaps the greatest strength of our method is the fact that it can be used without any manual coding of documents, it can also be used in semi-supervised fashion by constraining some elements of $\mathbf{Z}$ to be 0 and 1 in order to force particular documents into particular topic categories.

Appendix A presents details of the Expectation Conditional Maximization (ECM) algorithm used to fit this model.
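The semi-supervised option can be illustrated with the conditional probability that a document belongs to each topic given current parameter values. The sketch below is the standard mixture-model calculation implied by the complete-data density; it is not taken from Appendix A, and the function name and the `forced_topic` argument are hypothetical.

```python
import math

def topic_posterior(y_d, pi_t, theta, forced_topic=None):
    """Probability that document d is on each topic, given pi_t and theta.

    y_d          : count vector of length W
    pi_t         : K topic probabilities for the document's time period
    theta        : K x W word probabilities
    forced_topic : optional topic index; if given, the document's indicator
                   is fixed at 0/1 (the semi-supervised constraint)
    """
    K = len(theta)
    if forced_topic is not None:
        return [1.0 if k == forced_topic else 0.0 for k in range(K)]
    log_post = [
        math.log(pi_t[k])
        + sum(c * math.log(theta[k][w]) for w, c in enumerate(y_d) if c > 0)
        for k in range(K)
    ]
    m = max(log_post)                      # stabilise before exponentiating
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [wt / total for wt in weights]

theta = [[0.9, 0.1], [0.1, 0.9]]
print(topic_posterior([3, 0], [0.5, 0.5], theta))                  # data-driven
print(topic_posterior([3, 0], [0.5, 0.5], theta, forced_topic=1))  # pre-tagged
```

Documents with a fixed indicator contribute to the word-probability estimates of their assigned topic exactly as if their membership had been observed.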

3

Data and Measurement

The data used here are drawn from the United States Congressional Speech corpus (Monroe et al., 2006) developed under the Dynamics of Political Rhetoric and Political Representation Project (Monroe et al., 2005). The input data used here consist of frequency counts of word stems in Senate speech from the 105th-108th Congresses (1997-2004). These are generated through several steps.

The raw source for the data is the electronic version of the (public domain) United States Congressional Record served by the Library of Congress on its THOMAS system (The Library of Congress, N.D.) and generated by the Government Printing Office (The United States Government Printing Office, N.D.). The Congressional corpus covers both chambers for the time period from the 101st Congress (1989-) to the present; the current paper uses the subset of these data from the 105th-108th Senates only.

The basic units of the electronic version of the Congressional Record are “html documents”. The Record is subsetted into three major components, one for the Senate and two for the House of Representatives (one of which covers “extensions”, or speeches inserted into the record without being given on the floor). These are subsetted by day. Days are subsetted into the html documents, corresponding roughly to titled subsections in the printed Record. This subsetting process is not regularized and html documents can contain zero, one, or several speakers, discussing one or more items or topics. There are roughly 9,000 of these documents per year in the Senate; there are 71,181 documents from 1997-2004 used here.

We then parse these html documents to create a tagged XML version. Each paragraph is tagged as speech or nonspeech, with speeches further tagged by a speaker id. An excerpt is shown in Figure 1. Unlike some parliamentary records (such as the Australian Hansard), the Congressional Record is not already tagged in this way, so it is a fairly elaborate and time-consuming process to identify patterns that indicate the start and stop of speech and that identify the speaker. Changes in formatting over time, misspellings, typographic errors, and similar inconsistencies are surprisingly common.

For the present paper, we define our unit of analysis as a speech document. A single speech observation consists of all text tagged as speech with the same speaker within a single html document. This operationalization will adjoin in one speech any parts separated by the intervention of another speaker (as with the question in Figure 1) and will not recognize as separate any distinct speeches by the same person within the same html document. It would also separate any speech that continued across two html documents, although we are unaware of any examples of this beyond procedural speakers and leaders in very long debates which are often split into parts.
There are roughly 15,000 such speeches within a given year of the Senate; our data here are based on 118,065 such speech documents from 1997-2004.

Each speech is then stripped of capitalization and punctuation following a tuned set of rules (details in Monroe et al. (2006)) that allows some non-alphanumeric information to remain (e.g., don’t, 9/11, $17). The resulting strings of words are then stemmed using Porter’s Snowball II stemmer (for English) (Porter, 1980, N.D). This attempts to reduce word forms into common stems. Porter’s stemmers are aggressive and, although the algorithm allows for user-tuned exceptions, we have made no attempt to identify any. As a result, the reader may note a few oddities in the tables below. For example, IRS has been lower-cased to irs and then “stemmed” to the “singular” ir. For our purposes, the instances where this actually merges two words that should not be merged, e.g., his and hi, appear to be vastly outweighed by the increased leverage from merging words by stem.

We then count the stems in each speech. These stem counts per speech are the inputs to the inferential model. The vast majority of unique stems are infrequently used. Such stems provide little information to the model while increasing the computational demands. For our purposes here, we filter out any stems that occur in fewer than 0.05% of speeches. The entire 16-year Senate corpus contains roughly 240,000 unique stems; our eight-year period from 1997-2004 contains 154,351 unique stems. After filtering, our estimations here are based on 3,807 unique stems with a total of 73,990,638 stem observations.
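The counting and filtering steps can be sketched as follows. The example assumes the speeches have already been lower-cased, tokenized and stemmed by some upstream tool, so no stemmer is implemented here; the function name `build_counts` and the parameter `min_doc_share` are hypothetical, and the output stores one row per speech, which is the transpose of the words-by-documents matrix described in Section 2.

```python
from collections import Counter

def build_counts(stemmed_speeches, min_doc_share):
    """Build per-speech stem counts over a filtered vocabulary.

    stemmed_speeches : list of speeches, each a list of stem strings
    min_doc_share    : keep a stem only if it occurs in at least this
                       share of speeches (a proportion, not a percentage)
    """
    n_speeches = len(stemmed_speeches)
    doc_freq = Counter()
    for stems in stemmed_speeches:
        doc_freq.update(set(stems))        # count each stem once per speech
    vocab = sorted(s for s, n in doc_freq.items() if n / n_speeches >= min_doc_share)
    index = {stem: i for i, stem in enumerate(vocab)}
    counts = []
    for stems in stemmed_speeches:
        row = [0] * len(vocab)
        for stem in stems:
            if stem in index:              # rare stems are dropped
                row[index[stem]] += 1
        counts.append(row)
    return vocab, counts

speeches = [["tax", "cut", "tax"], ["tax", "school"], ["school", "fund"]]
vocab, counts = build_counts(speeches, min_doc_share=0.5)
print(vocab)    # ['school', 'tax']
print(counts)   # [[0, 2], [1, 1], [1, 0]]
```

Filtering on the share of speeches containing a stem, rather than on total occurrences, keeps a stem that is used once in many speeches and drops one that is repeated heavily in a single speech.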

4

Results from 105th-108th U.S. Senate

We fit numerous specifications of the model outlined in Section 2 to the 105th-108th Senate data. In particular, we allowed the number of topics K to vary from 3 to 60. For each specification of K we fit several models using different starting values. Mixture models, such as that used here, typically exhibit a likelihood surface that is multimodal. Since the ECM algorithm used to fit the model is only guaranteed to converge to a local mode, it is typically a good idea to use several starting values in order to increase one’s chances of finding the global optimum. As K moved up to about 20 the rough substantive outlines of the clusters began to take shape, with clusters that corresponded to topics such as judicial politics, education, the military, health care, etc. However, until we moved K up to about 40 there appeared to be some lumping together of categories that made interpretation difficult. The results we present below are based on the 42 topic fit with the highest maximized log posterior.
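The multiple-starting-values strategy can be illustrated on a deliberately simplified model. The sketch below fits a static mixture of multinomials by EM with a Dirichlet prior on the word probabilities, runs it from several random starts, and keeps the fit with the highest log posterior. It omits the dynamic prior on the topic probabilities and is not the ECM algorithm of Appendix A; all function names, the iteration count, and the small guard `eps` are assumptions made for the example.

```python
import math
import random

def e_step(Y, pi, theta):
    """Return per-document topic responsibilities and the log likelihood."""
    resp, loglik = [], 0.0
    for y in Y:
        lp = [math.log(pi[k])
              + sum(c * math.log(theta[k][w]) for w, c in enumerate(y) if c)
              for k in range(len(pi))]
        m = max(lp)
        wts = [math.exp(v - m) for v in lp]
        tot = sum(wts)
        loglik += m + math.log(tot)
        resp.append([v / tot for v in wts])
    return resp, loglik

def fit(Y, K, seed, n_iter=50, lam=1.01, eps=1e-6):
    """EM for a static multinomial mixture from one random starting point."""
    rng = random.Random(seed)
    D, W = len(Y), len(Y[0])
    theta = []
    for _k in range(K):
        raw = [rng.random() + 0.1 for _w in range(W)]
        theta.append([r / sum(raw) for r in raw])
    pi = [1.0 / K] * K
    for _iteration in range(n_iter):
        resp, _unused = e_step(Y, pi, theta)
        # eps is a small numerical guard that keeps pi strictly positive
        pi = [(sum(r[k] for r in resp) + eps) / (D + K * eps) for k in range(K)]
        for k in range(K):
            raw = [lam - 1.0 + sum(resp[d][k] * Y[d][w] for d in range(D))
                   for w in range(W)]
            theta[k] = [r / sum(raw) for r in raw]
    _resp, loglik = e_step(Y, pi, theta)
    log_prior = (lam - 1.0) * sum(math.log(p) for row in theta for p in row)
    return loglik + log_prior, pi, theta   # log posterior up to a constant

def best_of_restarts(Y, K, seeds):
    """Fit from each seed and keep the run with the highest log posterior."""
    return max((fit(Y, K, seed) for seed in seeds), key=lambda run: run[0])

Y = [[5, 0, 0], [4, 1, 0], [0, 0, 5], [0, 1, 4]]
log_post, pi, theta = best_of_restarts(Y, K=2, seeds=range(5))
print(round(log_post, 3), [round(p, 3) for p in pi])
```

Comparing the objective across starts is only meaningful for a fixed K, which is why the selection here is done within one specification rather than across values of K.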


4.1

Interpreting the Topics

We decided on substantive labels for the clusters after examining $\boldsymbol{\beta}_k$ and also reading a modest number of documents that were assigned a high probability of being on topic $k$, for $k = 1, \dots, K$. We discuss each of these procedures in turn.

In order to get a sense of what words tended to distinguish documents on a given topic $k$ from documents on other topics we examined both the magnitude of $\beta_{kw}$ for each word $w$ as well as the weighted distance of $\beta_{kw}$ from the center of $\beta_{-kw}$. The former provides a measure of how often word $w$ was used in topic $k$ documents relative to other words in topic $k$ documents. A large positive value of $\beta_{kw}$ means that word $w$ appeared quite often in topic $k$ documents. The weighted distance of $\beta_{kw}$ from the center of the $\beta_{-kw}$, which we operationalize as:

\[ r_{kw} = \frac{\beta_{kw} - \operatorname{median}_{j \neq k}(\beta_{jw})}{\operatorname{MAD}_{j \neq k}(\beta_{jw})}, \]

provides a measure of how distinctive the usage of word $w$ is on topic $k$ documents compared to other documents. To take an example, the word “the” always has a very high $\beta$ value as it is very frequently used. However, it is used roughly similarly across all of the topics so its value of $r$ is quite close to 0. We combine these measures by ranking the elements of $\boldsymbol{\beta}_k$ and $\mathbf{r}_k$ and adding the ranks. This combined index gives us one measure of how distinctive word $w$ is for identifying documents on topic $k$.

Appendix B presents 42 tables (one for each topic) of key words on each topic sorted by the index described above. Inspection of these tables produced rough descriptive labels for all of the clusters. After arriving at these rough labels we went on to read a number of speech documents that were assigned to each cluster. In general we found that the information in the tables in Appendix B did a very good job describing the documents assigned to each topic. However, by reading the documents we were able to discover some nuances that may not have been apparent in the tables of $\beta$ values. In general, we found that documents within a topic were very homogeneous with respect to the issue(s) being discussed.
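The key-word index can be sketched directly from its definition: transform each topic's word probabilities to centered log-ratios, compute the median/MAD distance for each word, rank both quantities, and add the ranks. The function names are hypothetical, the MAD is left unscaled (a constant scale factor would not change the ranks), ties are broken by word order, and a zero MAD would need separate handling that this sketch omits.

```python
import math
from statistics import median

def beta_from_theta(theta_k):
    """Centered log-ratio transform of one topic's word probabilities."""
    logs = [math.log(p / theta_k[0]) for p in theta_k]
    c = sum(logs) / len(logs)
    return [v - c for v in logs]

def distinctiveness(beta, k, w):
    """r_kw: distance of beta[k][w] from the other topics' median, in MADs."""
    others = [beta[j][w] for j in range(len(beta)) if j != k]
    center = median(others)
    mad = median(abs(v - center) for v in others)
    return (beta[k][w] - center) / mad

def ranks(values):
    """Rank 1 for the smallest value, len(values) for the largest."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

def keyword_index(beta, k):
    """Sum of the rank of beta_kw and the rank of r_kw for every word w."""
    r_k = [distinctiveness(beta, k, w) for w in range(len(beta[k]))]
    return [a + b for a, b in zip(ranks(beta[k]), ranks(r_k))]

theta = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.25, 0.45, 0.3], [0.3, 0.3, 0.4]]
beta = [beta_from_theta(row) for row in theta]
print(keyword_index(beta, 0))   # word 0 scores highest for topic 0
```

Sorting the words of a topic by this index in decreasing order gives a key-word table of the kind described for Appendix B.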

Figures 2 and 3 present excerpts from speech documents that had high (over 0.9999) probabilities of being on what we label as ‘Law and Crime 1 [Violence]’ and ‘Education’ topics. Reading these excerpts we see that these speech documents do appear to be on the topics of crime and education respectively. Senator Murray’s speech is particularly interesting because it uses words such as “terrorism”, and “medical” and “psychological” that might seem out of place in an education speech. Yet, because our model takes into account the overall pattern of word usage in a document we still correctly classify this document as an education speech. It is worth noting that this technique will extract information about the mode or centroid of the cluster’s meaning and lexical use. There will be speeches that do not fall comfortably into any category, but which are rare enough not to demand their own cluster. Reading some of the raw documents also revealed some additional meaning behind the clusters. For instance, two of the clusters originally labelled as ‘Procedure’ clusters turn out to be composed almost exclusively of pro forma “hobby horse” statements by Senator Jesse Helms about the current level of national debt, and by Senator Gordon Smith about the need for hate crime legislation. The descriptive names of the clusters along with the percentage of documents and words in each cluster appear in the terminal nodes of the dendrogram in Figure 4. As we see here a large fraction of speech documents are procedural statements. It is worth noting that many of these procedural speech documents are composed of only a few sentences. A high document frequency for some topic does not necessarily imply that a large number of words were spoken on that topic. We discuss the hierarchical structure among the topics in the next subsection.

4.2

The Substantive Structure of the Rhetorical Agenda

The $\boldsymbol{\beta}_k$ vector not only helps identify the internal meaning of each topic (at its centroid), but provides a means of identifying how the topics relate to one another. The face validity of these relationships provides further support for the claim that we are identifying topics with coherent substantive meaning. Moreover, this informs efforts to identify subtopics of larger topics.[4] Perhaps more important, this can inform literatures about the theoretical relationships among policy areas and the proper empirical approaches to policy and issue coding.

[4] This is especially useful given the relatively arbitrary modelling choice of K.

In order to get some sense of how the 42 topics related to each other we performed agglomerative clustering of $\boldsymbol{\beta}_1, \dots, \boldsymbol{\beta}_{42}$. Agglomerative clustering begins by assigning each of the 42 vectors to its own unique cluster. The two vectors that are closest to each other are then merged to form a new cluster. This process is repeated until all vectors are merged into a single cluster. Reading Figure 4 from the bottom up provides information about which clusters were merged first (those merged at the lowest height). We calculated the Euclidean distance between each of these vectors and then joined clusters to minimize the maximum distance between the vectors in the clusters (that is, complete linkage). As noted above, the terminal nodes in this figure give our descriptive labels for the 42 topics along with the percentage of speech documents and words that are assigned to each topic.[5]

Reading the figure from the bottom up we see that topics that share a penultimate node share a substantive or stylistic link. Some of these are obvious topical connections, such as between the two health economics clusters or between energy and environmental regulation. Some are more subtle topical connections. For example, the “Environment 1” category, which is dominated by issues related to management and conservation of public lands and water, and the “Commerce / Transportation / Telecom” category are related through the common reference to distributive public works spending. Note common key words “project” and “area.” The “Banking and Finance” category and the “Labor 1” category discuss different aspects of economic regulation and intervention, the former with corporations and consumers, the latter with labor markets. Others are related by linguistic style as well.
The symbolic categories, for example, all have “great”, “proud”, and “his” as keywords.

[5] It is worth noting that the topic labels were assigned before any hierarchical clustering of the $\beta$ vectors was done. We are thus confident that the structure represented in Figure 4 is real structure in the data.

We can also read Figure 4 from the top down to get a sense of whether there are substantively recognizable metaclusters of topics. Reading from the top down we see that the first major split divides procedural speech from non-procedural speech. Non-procedural speech then divides into substantive speech about bills and proposals on the one hand and symbolic and hobby horse speech on the other. The more substantive branch then divides into topics that deal with conceptual Constitutional rights issues or more explicit policy areas that require Congress to appropriate funds, enact regulations, and so on. Within policy areas, we see further clear breakdowns into domestic and international policy, with domestic itself further clumped into fairly clear groupings around social policy, macroeconomic policy, economic regulation, all-state distributive / pork barrel issues, and West-East/Rural-Urban issues.

It is also informative to note how some issues are separated in perhaps unexpected ways. Health policy, for example, is split into four categories. Speeches that emphasize the economic aspects of health care (costs of Medicare, prescription drugs, insurance, provision, etc.) are placed near other economic categories. Speeches that emphasize the medical aspects of health care (prevention, research, regulation, suffering) are placed within the other social policy categories. Abortion is placed with the (highly partisan) constitutional grouping. Similarly, military topics can be found in the form of discussion of war under the defense / international grouping, but also in the distributive grouping (e.g., base closures, homeland defense, VA hospitals) and in the symbolic grouping (e.g., praising soldiers killed in service, funding the WWII Memorial).

There is, in fact, an ongoing debate about the underlying structure of public policy, which has implications both for how we code policy topics in empirical analysis (Baumgartner and Jones, 2002) and how we understand policy more generally (Katznelson and Lapinski, 2006). Clausen (1973) is one of the earliest to propose a coding scheme – his had a five-tiered hierarchical structure – and others simply reflect the organization inherent in the U.S. Code (Heitschusen and Young, 2006) or the Congressional Committee system.
The current standard, which comes close to reflecting committee structure, is that of Baumgartner and Jones. Their system has evolved a bit over the last decade, and currently identifies 19 major topic areas with 225 hierarchically nested subtopics. In at least one place (Jones and Baumgartner, 2005, 240), they identify six “supercategories” above these, consisting of “economics”, “human services and law”, “infrastructure and environment”, “agriculture and public lands”, “defense, international affairs, and foreign trade”, and “government operations”.

Most of their major categories appear as one or more of our 42 clusters. One (Housing and Community Development) does not appear at all.[6] A few are too small to appear in our 42 cluster model (e.g., Science and Technology appears largely as telecommunications and is lumped with other commerce infrastructure). More interesting are the three categories – Civil Rights / Liberties, Law / Crime / Family, and Labor / Employment / Immigration – that in our model are grouped differently. Neither way of grouping these is wrong – these issues are, of course, related to one another in both ways. Relationships in the language used to discuss them, however, certainly suggest relationships in the politics surrounding them. Our implicit meta-clustering differs in more substantial ways from theirs. Defense has identifiable international and domestic components. Health has identifiable economic and social components. Foreign trade is a domestic topic, more closely related in its political language to agriculture than to foreign affairs.

Katznelson and Lapinski (2006) provide a very recent challenge to the Baumgartner and Jones typology. They provide a three-tiered typology with four Tier 1 categories,[7] 14 Tier 2 categories, and 69 Tier 3 categories, intended to be comprehensive and mutually exclusive. Since they aim to encompass all of American history, we obviously do not find examples covering all of their Tier 3 or even Tier 2 categories. Most of our categories, however, do correspond fairly closely to a single Tier 3 category and their hierarchical structure is similar to ours. There are again, however, discrepancies in things like defense (we find domestic and international), trade (domestic rather than international), health (social and economic), and crime.[8]

It is worth noting that the model allows for an automated “cross-walk” between any other coding scheme and ours.
Any external document delivered at a covered point in time (a speech, a bill, an article, a survey response) can be, post hoc, placed into its most likely category based on its word stem frequencies and the model parameters for that date. We can then take any set of documents coded by another scheme (manually or otherwise), code them by ours, and calculate exactly how categories from the two schemes line up. We have not pursued this here.

[6] This is probably a function of using the Senate rather than the House, as well as the time period.
[7] Actually seven, with three being minor unique categories (District of Columbia, housekeeping, and “quasiprivate”).
[8] We note in passing that this can also be compared with comparative approaches, such as the coding scheme in the Comparative Manifestos Project (Budge et al., 2001).

In general, the relationships between the clusters we identify seem to square well with how we think about policy debate in American politics. We emphasize that none of this structure was built into our statistical model. We do not assume a priori that certain words are more likely to be in certain clusters and we do not force a hierarchical structure onto $\theta_1, \ldots, \theta_K$ (or equivalently $\beta_1, \ldots, \beta_K$). The structure that appears in Figure 4 is based entirely on patterns of word usage by Senators from 1997 to 2004.
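The cross-walk described above can be illustrated with a minimal sketch. It assumes a multinomial mixture in which each topic has its own distribution over word stems and a date-specific prior probability, and it assigns a document to the topic with the highest log posterior score. The function names, the toy stems, the external labels, and all parameter values are hypothetical; they are not taken from the estimated model.

```python
import math
from collections import Counter


def most_likely_topic(stem_counts, log_theta, log_pi_t):
    """Return argmax_k of log pi_t[k] + sum_w count[w] * log theta_k[w].

    stem_counts: dict mapping stem -> count for one document
    log_theta:   list of dicts (one per topic) mapping stem -> log probability;
                 all topics are assumed to share the same vocabulary
    log_pi_t:    list of log prior topic probabilities for the document's date
    """
    best_k, best_score = None, -math.inf
    for k, (log_theta_k, log_prior) in enumerate(zip(log_theta, log_pi_t)):
        # Stems outside the model vocabulary are ignored (an assumption).
        score = log_prior + sum(
            n * log_theta_k[w] for w, n in stem_counts.items() if w in log_theta_k
        )
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def crosswalk(docs, external_labels, log_theta, log_pi_t):
    """Count how often each external label co-occurs with each model topic."""
    table = Counter()
    for stem_counts, label in zip(docs, external_labels):
        table[(label, most_likely_topic(stem_counts, log_theta, log_pi_t))] += 1
    return table


# Toy parameters: two topics over a four-stem vocabulary (hypothetical values).
theta = [
    {"judg": 0.45, "nomin": 0.45, "tax": 0.05, "cut": 0.05},
    {"judg": 0.05, "nomin": 0.05, "tax": 0.45, "cut": 0.45},
]
log_theta = [{w: math.log(p) for w, p in t.items()} for t in theta]
log_pi_t = [math.log(0.5), math.log(0.5)]

docs = [{"judg": 3, "nomin": 2}, {"tax": 4, "cut": 1, "judg": 1}]
labels = ["Law", "Macroeconomics"]  # labels from some other coding scheme
table = crosswalk(docs, labels, log_theta, log_pi_t)
```

For simplicity the sketch applies one prior to every document; in practice each document would be scored with the prior for its own date. The resulting table of (external label, model topic) counts is the raw material for comparing the two schemes.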

4.3 Policy and Issue Dynamics

This approach provides unique leverage on a number of literatures concerned with the dynamics of issue debate and policy agendas. The data are extremely rich and cannot be fully explored in this space, so we provide a few examples of what is possible.

4.3.1 Descriptive Summaries of Agenda Dynamics

First and foremost, our model estimates a number of quantities of interest directly that provide useful descriptive summaries of the otherwise overwhelmingly large Congressional Record. Several interesting examples can be found in our “Constitutional” metacategory. Consider the ‘Judicial Nominations’ category. We show in Figure 5 the estimated probability that, on any given day, a document will be in this category. The uptick in this probability over time presents some evidence that the nomination process has been increasingly discussed from 1997 to 2004. In general, the fraction of speeches on ‘Judicial Nominations’ appears to increase just before the 2002 election when it looks likely that the Republicans will retake the Senate and thus allow President Bush more freedom to appoint conservative jurists. It continues to grow and experience outbursts of speech (and partisan rancor) as Senate Democrats exercise their one remaining piece


of power, filibustering judicial nominations. In fact, the record-holding debate in our data – the most words on one issue in a single “day” of the Congressional Record – was on this topic and actually spread over two days in the real world. This was the November 12, 2003 “Justice for Judges” marathon, in which Senator Rick Santorum organized a 30-hour debate (that went for 39 hours) on four federal judge nominations, attempting unsuccessfully to overcome a Democratic filibuster and get “up-or-down votes.” Our data show 87 speeches and over 230,000 words spoken in the Judicial Nominations category on November 12.

The other major judicial topic, ‘Supreme Court / Constitutional’, exhibits very different dynamics. These can be seen in Figure 6. This series contains the second largest topical outburst in our data: the conclusion of the Clinton impeachment trial (February 12, 1999, 78 speeches, 171,000 words). This series also includes a number of other constitutional issues related to the Supreme Court, the Department of Justice and Attorney General, and possible constitutional amendments. For example, the next largest spikes appear in January 2001 (Ashcroft nomination) and July 2004 (Federal Marriage Amendment).

Figure 7 displays the time plot of $\pi_{\text{Abortion}}$ – the marginal probability of seeing a document in the abortion category. Here we see that attention to abortion in the Senate has waned from the late 1990s to 2004. In the 105th Senate, there were instances where approximately 6% of the documents were on the ‘abortion’ topic. Aside from a single moderate spike in March 2003, the amount of abortion rhetoric during the Bush presidency is generally quite low and stable at around 1%. We suspect that Senate Republicans introduced the abortion issue more frequently during the Clinton presidency as a largely symbolic device. The pattern certainly suggests a bit of “stickiness” to the abortion issue – a tendency to stay on or off the agenda for long stretches of time, an issue we explore in the next section.
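As a rough illustration of these descriptive summaries, the sketch below takes documents that have already been assigned to topics and computes the raw daily share of documents on a topic, along with the single largest day-topic “outburst” by word count. The raw share is only a crude analogue of the smoothed, model-based probability plotted in the figures. The record layout (date, topic, word count) and the sample rows are hypothetical.

```python
from collections import defaultdict


def daily_topic_share(docs, topic):
    """Fraction of each day's documents that were assigned to `topic`."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for day, doc_topic, _n_words in docs:
        total[day] += 1
        if doc_topic == topic:
            hits[day] += 1
    return {day: hits[day] / total[day] for day in sorted(total)}


def largest_outburst(docs):
    """Return ((day, topic), words) for the day-topic pair with the most words."""
    words = defaultdict(int)
    for day, doc_topic, n_words in docs:
        words[(day, doc_topic)] += n_words
    return max(words.items(), key=lambda item: item[1])


# Hypothetical classified documents: (date, assigned topic, word count).
docs = [
    ("2003-11-12", "Judicial Nominations", 3000),
    ("2003-11-12", "Judicial Nominations", 2500),
    ("2003-11-12", "Abortion", 400),
    ("2003-11-13", "Abortion", 700),
]
share = daily_topic_share(docs, "Judicial Nominations")
peak = largest_outburst(docs)
```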


4.3.2 Evaluating Subannual Agenda Dynamics

Dynamics are a central concern for scholars of the policy process. Baumgartner and Jones (1993, 2002) focus on policy dynamics as the unifying thread tying together the policy literature. The key to understanding the policy process is understanding the variety of mechanisms by which policy agendas come to experience trends, stochastic shocks, and positive and negative feedback. This motivated their massive multidecade Policy Agendas data project, with which they and many others have examined dynamics occurring at an annual level over the postwar era. While informing long-term concerns, annual data mask important political dynamics underlying policy debate and policy change at smaller time scales, including electoral cycles, budgetary cycles, seasonality, response to events, temporal causality among institutions, different policy arenas (e.g., the president, Congress, media, public opinion), and policy itself. We cannot examine all of these here, but as a first step it is simple to demonstrate that we can illuminate subannual dynamics in the rhetorical agenda that (a) are of interest, and (b) are beyond the reach of other approaches.

A first concern is to examine and compare the dynamics of the relative distribution of attention to each topic area. Roughly speaking, we evaluate our posterior estimate of the log-odds of words in documents we have classified in a particular category as a univariate time series using conventional methods. For the daily data, we find that an AR-7 (autoregressive lags up to 7 days) with monthly dummy variables captures most of the relevant daily dynamics. Once estimated, each series can be interpreted for its persistence and volatility.[9]

[9] It is worth noting that this is “not quite right” in a variety of ways, at least five of which could be important. The first is that the data here are the output of our other model and do not have their noisiness properly incorporated here. This is less problematic for the clearest topic categories and groupings of categories, where few documents are misclassified in any systematic way. The second is that we impose dynamic structure in the original measurement model, and then reestimate dynamics with the AR model. The original structure is so weak that we do not think it has negative impact, but this requires further examination. We could address this by attempting to estimate the more elaborate dynamic structure within the topic-detection model itself. The third is that the data are inherently compositional. Failure to incorporate all regressors in all models – most notably the lags of all categories on all categories – biases our results. This is probably slight for 42 categories, but a more proper model would be the compositional VAR described by Brandt et al. (1999). Fourth, even with the logratio transformation, the errors are not Gaussian. There is clearly zero-inflation, an increased probability that a topic will not be addressed at all on any given day. Finally, the Box-Jenkins methods themselves are a matter of art and we have not fully explored all specifications. None of these should have dramatic impact on the broad qualitative observations we make here. More to the point, our main purpose here is to demonstrate what is possible. Scholars who would like to try a different approach to specification now have the data and tools to do so.

Consider first the issue of persistence. In our data, this reflects the “stickiness” of surprises, or memory, in particular issue areas. That is, once something causes a topic to be discussed more (or less) than expected on a particular day – to be disturbed from its equilibrium – how long does it take to return to equilibrium? This type of notion is fundamental to Baumgartner and Jones’ notions of punctuated equilibrium in the policy agenda, as well as the theoretical alternatives (e.g., incrementalism, issue cycles, garbage can processes). Figure 8 shows this phenomenon, as derived from the AR-7 model on words, for a selection of topics. Generally, shocks in the rhetorical agenda dissipate from the system within two to three weeks (something that would, of course, be impossible to discern from annual data).

Some issues have essentially no memory whatsoever. The most extreme example in our data is the “Symbolic [Sports]” category. These speeches are made to congratulate home-state sports champions. No one stands up the next day to rebut or get equal time. As shown in Figure 8, shocks in this series are completely ephemeral, disappearing completely by the next day. There are more substantive ephemera as well. The topic dealing with violent crime is a good example. Shocks in attention here are largely speeches about events, comments on the news. They do not lead to sustained discussion, however, losing 95% of attention in a day and disappearing a week later.

Other agenda items have much more persistence. The extreme examples in our data are the two one-person hobby horse categories: Jesse Helms’ almost daily speech on the debt and deficit until his retirement and Gordon Smith’s almost daily speech on hate crime starting in 2001. The next most persistent topic is abortion. This likely reflects at least three factors: polarization (speeches on one side draw response on the other), intrapartisan unity (party leaders can move blocs of speakers to talk (or not talk) about the issue at once), and electoral cycles (it receives sustained attention in election run-ups). Many categories are persistent because circumstances demand the Senate’s sustained attention in some institutional role (e.g., judicial nominations, impeachment,


war), or because there is a sustained debate about a major policy proposal (e.g., tax reform, intelligence reform, bankruptcy reform, education reform, prescription drug coverage, campaign finance reform).

The second major dimension to dynamics is the structural volatility of each topic. Simply put, how variable is the topic – how predictable is discussion of the topic from day to day? We can summarize this as the typical error in the one-day-ahead forecast from our AR-7 model. Figure 9 shows the persistence and volatility of all 42 topic areas. In the upper left, we see ephemeral commentary on unpredictable topics (sports, crime, health care topics of the day). In the lower right, we see topics that appear and disappear predictably, and stay on the agenda when they do (highly partisan and electoral issues). In the upper right, we see the issues that can surprise, and stick on the agenda when they do. This includes major policy proposals (No Child Left Behind in education), major economic shocks (employment), and major world events (9/11 followed by intelligence reform and war). In the lower left, we see the issues that have predictable and brief bouts of attention. Here we find technocratic policy areas that require Congressional attention, either at a steady rate or on an annual cycle, but have low (partisan) political salience during the period: e.g., wildlife conservation, internet regulation, human rights.

4.3.3 Intervention Analysis: 9/11

Similarly, we can examine the short- and long-term impact of particular events. The most obvious example in our data is 9/11. This is an event that occurred on a single day, and which we know had both immediate and long-term effects on both rhetoric and policy in American politics. We can incorporate such “interventions” easily into models like the one used in the previous section and get direct measures of the rhetorical impacts. This is a sharp test of face validity for the model – if 9/11 effects, of all things, are indiscernible or implausible, we have little reason to expect we will find anything more subtle. Note also that such effects would be impossible to detect without detailed data at the daily level.
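To make the idea concrete, the following is a minimal sketch of an autoregressive model with monthly dummies, extended with a one-day pulse and a permanent step at the event date. It is estimated here by ordinary least squares with NumPy, which is an assumption on our part; the text says only that conventional methods were used. The function name, the simulated series, and the crude 30-day “months” are hypothetical and stand in for a real daily log-odds series.

```python
import numpy as np


def fit_ar_intervention(y, months, event_index, p=7):
    """OLS fit of y_t on p own lags, month dummies, a pulse, and a step.

    y:           daily series (e.g., a topic's log-odds of attention)
    months:      month number (1..12) for each observation
    event_index: position of the intervention day in y (must be >= p)
    """
    y = np.asarray(y, dtype=float)
    months = np.asarray(months)
    t = np.arange(p, len(y))                      # rows with a full set of lags
    lags = np.column_stack([y[t - j] for j in range(1, p + 1)])
    # January is the omitted baseline, absorbed by the intercept.
    month_d = np.column_stack([(months[t] == m).astype(float) for m in range(2, 13)])
    pulse = (t == event_index).astype(float)[:, None]   # one-day surprise
    step = (t >= event_index).astype(float)[:, None]    # permanent level change
    X = np.hstack([np.ones((len(t), 1)), lags, month_d, pulse, step])
    beta, *_ = np.linalg.lstsq(X, y[t], rcond=None)
    resid = y[t] - X @ beta
    phi = beta[1:p + 1]
    return {
        "phi": phi,                                # autoregressive coefficients
        "pulse": beta[-2],
        "step": beta[-1],
        # Implied change in the equilibrium level of the series.
        "long_run_shift": beta[-1] / (1.0 - phi.sum()),
        # Typical one-day-ahead in-sample forecast error.
        "forecast_rmse": float(np.sqrt(np.mean(resid ** 2))),
    }


# Simulated example: AR(1) with coefficient 0.5 and a unit step at the event.
rng = np.random.default_rng(0)
T, event = 2000, 1000
months = (np.arange(T) // 30) % 12 + 1            # crude 30-day "months", 1..12
y = np.zeros(T)
for i in range(1, T):
    y[i] = 0.5 * y[i - 1] + (1.0 if i >= event else 0.0) + rng.normal(0.0, 0.1)
fit = fit_ar_intervention(y, months, event, p=2)
```

The pulse coefficient captures the immediate surprise on the event day, while the step coefficient divided by one minus the sum of the autoregressive coefficients gives the implied shift in the long-run equilibrium level.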


The left two columns of Table 1 summarize the results of this analysis. The Senate did not convene on 9/11. When it convened on September 12, the “surprises” in rhetoric – those that were difficult to predict on September 10 – are placed in the “international affairs / arms control” and “symbolic – military” categories. This partially reflects the uniqueness of the speeches in the immediate aftermath. Not being plentiful enough across the entire dataset, they do not form a category of their own but are placed with the closest topical mode. Over the full eight years, the symbolic military category is dominated by speeches honoring fallen soldiers. On September 12, the speeches honoring policemen, firemen, and other rescue workers are in this category. The arms control category is dominated, over the full eight years, by discussion of weapons of mass destruction, nuclear nonproliferation, international military organizations (e.g., NATO), and similar topics. Speeches from September 12 that declared the 9/11 attacks an act of war (but from an as yet unknown assailant) – use of force / attacks by someone else – are placed by the model in this category. Also increasing that day, but not reaching statistical significance, were speeches categorized in the other international affairs category, in defense (use of force), in armed forces infrastructure, and in intelligence, as well as in other symbolic categories. There is also more procedural speech reflecting the difficulty in arranging sufficient time for everyone who wanted to speak. The long-term effects (Table 1) include increases in the expected war / military / terrorism categories: use of force, intelligence, and the three military categories (manpower, infrastructure, and symbolic). There are also spurious effects, typical in such models. For example, Gordon Smith started his hate crime speeches in April of 2001 and continued through the end of the period. 
So, while the equilibrium level of such speeches for 9/12/2001-2004 is indeed higher than the equilibrium level for 1997-9/11/2001, it doesn’t have anything to do with the attacks. Keeping in mind this caveat (that all apparent effects could be related to other structural changes with a similar time pattern, especially Clinton-Bush and economic softening), the long-term post-9/11 equilibrium seems to involve significantly less attention to a few social issues (e.g., child protection, education) as well as

a few of the Republican wedge issues used in opposition to Clinton (e.g., abortion, constitutional).

To the extent something might reflect a Clinton-Bush change, rather than a post-9/11 change, we can separate anything that was discernible in the nine months starting the Bush term. This is difficult, however, since this period also includes the party switch of Senator Jeffords. This altered majority control of the Senate and also presumably altered the pattern of rhetoric. Nonetheless, if we include a Bush regime variable as well, we can separate out some pre-9/11 agenda effects due to the presidency change from those that followed 9/11. The second half of Table 1 summarizes these results. Here we see two substantive categories considered to have significantly increased attention in the first year of the Bush presidency, but prior to 9/11: energy and military personnel. Moreover, there is significantly decreased attention to the two wedge issues (abortion and constitution), to use of force, as well as to child protection and agriculture. This brings the true post-9/11 effects – especially increased attention to use of force, intelligence, and symbolic military speech – into sharper relief.

Some of these estimated effects are large enough to be discerned through the descriptive plots. Figure 10 shows the probabilities of randomly chosen documents being in one of three foreign policy categories: ‘Use of Force’, ‘Arms Control’, and ‘Diplomacy / Human Rights’. Prior to October 2002 (when Congress authorized the use of force in Iraq) Senate speech was roughly evenly split between these three topics. After this point, the overwhelming majority of Senate speech on these three topics falls into the ‘Use of Force’ category. In fact, the debate authorizing the use of force in Iraq is the third largest topic outburst in our data (October 10, 2002, 55 speeches, 122,000 words).
While it shouldn’t be surprising that this increase occurred, it is a bit more interesting to note that the attention paid to the other two categories diminishes somewhat during this same time. While the Iraq war does seem to be related to a change in issue attention by Senators, the war in Afghanistan does not seem to have any clear relationship with the frequency of foreign policy speech on the Senate floor. This may have to do with the relative lack of controversy surrounding the Afghan campaign or may simply reflect the differing amounts of military and economic resources

committed to each campaign.

It is also interesting to see how $\pi_{\text{Symbolic-Military}}$ tracks with $\pi_{\text{Defense-Use of Force}}$. We plot both series from August 2001 to May 2003 in Figure 11. This plot seems generally consistent with what is known of the lead-up to the Iraq war. As noted above, we see a large increase in symbolic speech immediately after 9/11. At this point there is very little talk of foreign intervention. Talk of intervention begins to increase a bit in spring 2002 and then there is a large upswing in October 2002 when Congress authorized President Bush to use force against Iraq. Things are relatively quiet until February when, as invasion appears imminent, the amount of Senate speech on foreign intervention and Iraq increases. During combat operations in March 2003, talk of military intervention falls off and related speech becomes dominated by symbolic tributes to soldiers and the military.

4.3.4 Macro-Representation in Rhetoric

Students of representation are interested, in part, in the extent to which elected officials follow or lead their citizens (Stimson et al., 1995; Jacobs and Shapiro, 2000). The annual data of Baumgartner and Jones, for example, are unlikely to pick up temporal change at time-scales appropriate to this question (or, more accurately, cannot be used to assess what time scale is appropriate for this question) and are therefore limited in application largely to measures of simultaneous “congruence” in policy attention (Jones and Baumgartner, 2005). Stimson et al. (1995), on the other hand, have finer time scales, but only highly aggregated measures of legislative and citizen preference that cannot be disaggregated into, say, attention across policy topics. Our data are fine enough, across time and policy, to evaluate such things and provide important complements to these prior approaches. As a first exploration, we examine leadership/followership of legislative rhetoric with issue salience, as measured by the Gallup “most important problem” question. The Policy Agendas Project has collected these data and coded answers according to their 19+ topic coding schema.[10]

[10] We are grateful to Frank Baumgartner for providing these data and Amber Boydstun for help in adapting them to our purposes.


These are available almost monthly for our time period and are smooth enough to allow for reasonable interpolation across missing months. In turn we aggregate our measures up to the monthly level to allow for direct comparison.

To gain insight into the question, we examine the “Granger causality” between pairs of MIP poll series and attention in relevant rhetorical categories. This technique examines two univariate time series to determine the extent to which the first can be used to reliably forecast the second, the second to forecast the first, both, or neither. If one series can forecast a second, and not vice versa, it means that movements in the first reliably precede those of the second. This is probably a necessary condition for the first to be causally prior to the second. But it is certainly not sufficient. Both may be caused by a third phenomenon, with the first simply responding more quickly than the second. Furthermore, the first may simply anticipate the second, preceding it without causing it.

Consider first foreign policy. Figure 13 illustrates the series used with a few Defense/International categories of MIP and rhetoric.[11] Table 2 provides the results of the tests. We find that, over this time period, legislative speech in all categories related to the military and terrorism does in fact lead / anticipate the public salience of foreign policy. There is minimal evidence of causality in the nonmilitary aspects of foreign policy, including trade (consistent with our earlier finding that rhetoric about trade is based more on domestic economic implications than on foreign relations) and minimal evidence of the public leading the Senate in any of these categories. In short, it seems very likely that the public calls for attention to defense / international issues after elites are already paying attention to them. This could be because elite attention raises public attention in this area. It could also be that presidential priorities or the external realities of war, bombing, and the like receive swifter attention from the Senate than the public. It could even be that the Senate pays attention quickly to such issues because senators know the public will soon demand they do so. In any case, it seems very unlikely that the Senate pays attention to defense and international affairs only after it becomes salient in public opinion.

[11] The MIP variable is based on all responses coded by Jones and Baumgartner as “International”, “Defense”, or “Trade”.

At a lower level of attention, we find almost exactly the same pattern in crime. The more event-driven categories of rhetoric (violent crime, child protection) provide significant Granger causality to crime being named as the nation’s most important problem. Given our previous insight into the ephemeral nature of crime rhetoric, it seems less plausible that public opinion is moved by legislative rhetoric than that legislators act immediately to register their reactions, while it may take several weeks before it registers in a Gallup poll. This does not completely square with the data, however. The first spike up in attention to crime and child protection occurs in 1999 after a wave of school shootings, including Columbine. The second peak occurs on October 11, 2000 with final debate on the Victims of Trafficking and Violence Protection Act, after some buildup. This suggests some substantive opinion leadership on the part of Congress. In any case, it also suggests that if politicians are pandering in these areas, they are doing so proactively in anticipation of public opinion.

Conversely, for our time period, public opinion and elite rhetoric on economics appear to be completely unrelated. One likely explanation is that majorities and minorities move opposite one another in their desire to discuss the economy. In a good economy, the majority will want to discuss it; in a bad economy, the minority will want to discuss it. The economy is most salient for the public when conditions are poor – certainly this is when it is most likely to be cited as the “most important problem” (Wlezien, 1995) – but responses by the parties are likely to wash out when aggregated together as they are here. We suspect that when we examine such series separately for different parties, the results will be different.
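The pairwise Granger tests used in this section can be sketched as a comparison of two regressions: one forecasting a series from its own lags only, and one that also includes lags of the other series. The sketch below computes the usual F statistic for that comparison with NumPy. The function names, the lag length, and the simulated series are hypothetical, and the specification actually used for Table 2 may differ.

```python
import numpy as np


def _rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_f(x, y, lags=2):
    """F statistic for H0: lags of x do not help forecast y given y's own lags."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.arange(lags, len(y))
    # Restricted model: intercept plus y's own lags.
    own = np.column_stack([np.ones(len(t))] + [y[t - j] for j in range(1, lags + 1)])
    # Unrestricted model: additionally includes lags of x.
    full = np.column_stack([own] + [x[t - j] for j in range(1, lags + 1)])
    rss_r, rss_u = _rss(own, y[t]), _rss(full, y[t])
    dof = len(t) - full.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Simulated monthly-style series in which x leads y by one period.
rng = np.random.default_rng(1)
T = 300
x = rng.normal(size=T)
y = np.zeros(T)
y[1:] = 0.8 * x[:-1] + 0.5 * rng.normal(size=T - 1)
f_xy = granger_f(x, y, lags=2)   # x leads y by construction
f_yx = granger_f(y, x, lags=2)   # y does not lead x by construction
```

Each statistic would be compared against an F distribution with `lags` and `dof` degrees of freedom. A large value in one direction and a small value in the other corresponds to the “leads / anticipates” pattern described above, with the same caveats about third causes and anticipation.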

5 Discussion

We hope we have demonstrated not only that legislative records hold rich information about the substance of democratic politics, but that we can extract that information in meaningful ways. Our data and method offer leverage on the political agenda that supplements and extends prior understandings in ways that would have been prohibitively expensive or impossible using conventional methods.

Our next steps are to look in a more systematic way at the micro- and macro-dynamics of the processes at play here. First, there are strong reasons to believe that the dynamics of different policy and issue areas are inherently different. Some policy areas are more volatile than others, some are more responsive to external shocks, some have longer term memories when shocked, and some have underlying seasonality driven by fiscal years and elections. We seek to understand why this is the case and how it affects our understanding of political dynamics. Second, we want to know how these policies interact dynamically. Attention is a finite commodity and its distribution is constrained. More attention to one thing must mean less for something else. We want to examine how the attention dynamics in one topic area affect those in another. Third, we want to know what drives these dynamics. At the level of the macro-agenda, do we see leadership (the Congressional agenda leading the public agenda) or responsiveness (the Congressional agenda following the public agenda) or both? How are these affected by other agendas, like that of the President or as reflected in the media? Fourth, how does the rhetorical agenda relate to the legislative agenda and actual policy outputs? Policy change must be preceded by talk about change, but of course not all talk translates into action. Can we discern from rhetoric what changes are going to occur before they happen?

These dynamics become even more interesting when we start to disaggregate by party or even individual. We should, for example, be able to examine the conditions under which majority parties can successfully constrain the legislative agenda (that we see in roll call votes) to be only the most advantageous bits of the larger political agenda (that we appear to see in speech).
We should also be able to see whether minority parties engage in heresthetical maneuvers – attempts to shift the public agenda to issues on which they might become winners – as posited by Riker (1986). And what drives individual rhetoric? Is there a discernible electoral connection (Mayhew, 1975) in rhetoric? Is there a reason to expect rhetorical “shirking” among retiring legislators, as has been

investigated in voting patterns (Rothenberg and Sanders, 2000)? We know that parties structure voting patterns to greater and lesser extents over time and under different institutional conditions; what are the circumstances in which legislators also talk the party line? The methods we present here open up great possibilities for such analyses in ways that prior methods did not. Perhaps most exciting, they also travel beyond English and beyond the Congressional setting, where conventional methods might be prohibitively expensive or impossible to apply. We hope this might provide an important new window into the nature of democratic politics.


References

Adler, E. Scott, and John S. Lapinski, editors. 2006. The Macropolitics of Congress. Princeton University Press.
Baumgartner, Frank R., and Bryan D. Jones. 1993. Agendas and Instability in American Politics. The University of Chicago Press.
Baumgartner, Frank R., and Bryan D. Jones, editors. 2002. Policy Dynamics. The University of Chicago Press.
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3.
Box-Steffensmeier, Janet M., Laura W. Arnold, and Christopher J. W. Zorn. 1997. “The Strategic Timing of Position Taking in Congress: A Study of the North American Free Trade Agreement.” American Political Science Review 91(2):324–338.
Brandt, Patrick, Burt L. Monroe, and John T. Williams. 1999. “Compositional Time Series.” Paper presented to the Political Methodology Society. College Station, July.
Budge, Ian, Hans-Dieter Klingemann, Andrea Volkens, Judith Bara, and Eric Tannenbaum. 2001. Mapping Policy Preferences: Parties Electors and Governments, 1945-1998. Oxford: Oxford University Press.
Cargnoni, Claudia, Peter Müller, and Mike West. 1997. “Bayesian Forecasting of Multinomial Time Series Through Conditional Gaussian Dynamic Models.” Journal of the American Statistical Association 92(438):640–647.
Clausen, Aage R. 1973. How Congressmen Decide: A Policy Focus. St. Martin’s Press.
Heitshusen, Valerie, and Garry Young. 2006. Macropolitics and Changes in the U.S. Code: Testing Competing Theories of Policy Production, 1874–1946, pp. 129–50. In Adler and Lapinski (2006).
Hill, Kim Quaile, and Patricia A. Hurley. 2002. “Symbolic Speeches in the U.S. Senate and Their Representational Implications.” Journal of Politics 64(1):219–231.
Jacobs, Lawrence R., and Robert Y. Shapiro. 2000. Politicians Don’t Pander: Political Manipulation and the Loss of Democratic Responsiveness. Chicago: University of Chicago Press.
Jones, Bryan D., and Frank R. Baumgartner. 2004. “Representation and Agenda Setting.” Policy Studies Journal 32:1–24.
Jones, Bryan D., and Frank R. Baumgartner. 2005. The Politics of Attention: How Government Prioritizes Problems. The University of Chicago Press.
Jones, Bryan D., John Wilkerson, and Frank R. Baumgartner. N.D. “The Policy Agendas Project.” http://www.policyagendas.org.
Katznelson, Ira, and John S. Lapinski. 2006. The Substance of Representation: Studying Policy Content and Legislative Behavior, pp. 96–126. In Adler and Lapinski (2006).
Maltzman, Forrest, and Lee Sigelman. 1996. “The Politics of Talk: Unconstrained Floor Time in the U.S. House of Representatives.” Journal of Politics 58(3):819–30.
Martin, Andrew D., and Kevin M. Quinn. 2002. “Dynamic Ideal Point Estimation via Markov Chain Monte Carlo for the U.S. Supreme Court, 1953-1999.” Political Analysis 10(Spring):134–153.
Mayhew, David R. 1975. Congress: The Electoral Connection. New Haven: Yale University Press.
McLachlan, Geoffrey, and David Peel. 2000. Finite Mixture Models. New York: John Wiley & Sons.
Monroe, Burt L., Steven P. Abney, Michael P. Colaresi, Kevin M. Quinn, and Dragomir Radev. 2005. “The Dynamics of Political Rhetoric and Political Representation.” Center for Political Studies, University of Michigan.
Monroe, Burt L., and Ko Maeda. 2004. “Rhetorical Ideal Point Estimation: Mapping Legislative Speech.” Society for Political Methodology, Stanford University, Palo Alto.
Monroe, Burt L., Cheryl L. Monroe, Kevin M. Quinn, Dragomir Radev, Michael H. Crespin, Michael P. Colaresi, Jacob Balazar, and Steven P. Abney. 2006. “United States Congressional Speech Corpus.” Center for Political Studies, University of Michigan.
Porter, Martin F. 1980. “An Algorithm for Suffix Stripping.” Program 14(3):130–137.
Porter, Martin F. N.D. http://snowball.tartarus.org/algorithms/english/stemmer.html.
Riker, William H. 1986. The Art of Political Manipulation. Yale University Press.
Rothenberg, Lawrence S., and Mitchell S. Sanders. 2000. “Severing the Electoral Connection: Shirking in the Contemporary Congress.” American Journal of Political Science 44:316–25.
Stimson, James A., Michael B. MacKuen, and Robert S. Erikson. 1995. “Dynamic Representation.” American Political Science Review 89(3):543–65.
The Library of Congress. N.D. “THOMAS.” http://thomas.loc.gov.
The United States Government Printing Office. N.D. “The Congressional Record, GPO Access.” http://www.gpoaccess.gov/crecord.
West, Mike, and Jeff Harrison. 1997. Bayesian Forecasting and Dynamic Models. New York: Springer.
Wlezien, Christopher. 1995. “The Public as Thermostat: Dynamics of Preferences for Spending.” American Journal of Political Science 39:981–1000.


Senate OMNIBUS APPROPRIATIONS 20040122
NULL OMNIBUS APPROPRIATIONS PROCESS S128 14617 Mr. President, I come this morning to again review the lay of the land. As I said a couple of days ago, many of my colleagues, most of our caucus, expressed deep concern alarm, really at the hijacking of the process that went on during the deliberations on the Omnibus appropriations bill. I said at the time, and I believe it ought to be repeated, that I believe the process in the Senate was fair. I have immense respect for the distinguished chairman of the Appropriations Committee. He worked with Members on both sides to accommodate consensus and to reach agreement and the process worked. That process was destroyed at the eleventh hour by some in the administration and by leadership on the Republican side in the House. Changes were demanded. Ultimatums were set. The House and Senate were actually forced to take positions in conference diametrically in opposition to the very positions we took on the Senate floor after a very deliberative debate; positions that I think have great merit. S128 14617 On an overwhelming vote, the Senate supported the notion that we ought to have country-of-origin labeling. They did it because they believed it is an opportunity for us to enhance our ability to add confidence to consumers' choice, knowing if they buy 100 percent U.S. beef they are not going to buy meat with downer cattle from foreign countries. . . . S128 15054 Do I understand that the Senate and the House, on both overtime and mad cow, or country of origin, voted by large majorities to have there be a continuation of overtime and to have country-of-origin labeling on all beef that comes into the United States? Did both bodies, by an overwhelming vote, sustain country of origin and elimination of the President's effort to wipe out overtime? S128 14617 The assistant Democratic leader is correct. That is a succinct summary of what we did. 
We voted to ensure there be country-of-origin labeling, like 43 other countries have in the world today, knowing we will not be able to export our product to Japan unless it is labeled. We did that. . . .

Figure 1: Example of XML markup of the Congressional Record.

RESTORING JUVENILE JUSTICE FUNDING -- (Senate - May 06, 2004) [Page: S4943] ---

Mr. KOHL. Madam President, I rise today to discuss juvenile crime and juvenile crime prevention programs. We must remember that a strategy to combat juvenile crime consists of a large dose of prevention programs as well as strong enforcement. Juvenile justice programs have proven time and time again that they help prevent crime, strengthen communities, and give children a second chance to succeed and lead healthy lives. It is no secret that robust funding for these programs in the 1990s contributed to a 68 percent drop in juvenile crime from 1994 to 2000. Most importantly, investment in our at-risk children will help prevent a life marred by crime and wasted in prison. For these programs to succeed, however, they must be priorities for this Congress and for this administration. We fear that we are failing to live up to our responsibility on this essential issue. A little more than 3 months ago, President Bush released his fiscal year 2005 budget proposal. In it, juvenile justice and delinquency programs will receive only about one-third of the funding they received 3 years ago. This is at a time when recent statistics indicate an uptick in juvenile crime and an increase in school murder rates. We understand that other priorities compete with juvenile justice funding and local crime prevention programs. Yet the amounts we are discussing are so small in the grand scheme of the budget, and the results from the programs so immense, that they mandate our attention. When the Senate considered the budget resolution, we began to address the shortfalls in juvenile justice funding. I was pleased to work with Senators HATCH and BIDEN on an amendment to restore cuts made to juvenile justice programs and local law enforcement funding. Our amendment represents a step in the right direction by restoring juvenile justice funding to last year’s levels, and reversing the trend of ever-diminishing appropriations for these programs. 
It is essential that the Kohl-Hatch-Biden amendment that restores juvenile justice funding remain in the final Budget Resolution. These programs are a wise investment. For every dollar spent on prevention, we save $3 to $4 in costs associated with juvenile crime. Furthermore, law enforcement officials strongly support prevention efforts. A recent poll shows that 71 percent of police chiefs, sheriffs and prosecutors believe that crime prevention efforts would have the greatest impact in reducing youth violence and crime. So for those who may fear that a crime prevention strategy is not ‘‘tough’’ enough on juveniles, we suggest that these programs make sound economic sense and are overwhelmingly endorsed by law enforcement. We must do a better job of funding them. Let me tell you about two essential programs. In 1992, we established the Title V Local Delinquency Prevention Program. Title V was and remains unique in that it is the only source of federal funding solely dedicated to juvenile crime prevention efforts. More importantly, Title V has proven to be a very successful program that encourages investment, collaboration, and long-range prevention planning by local communities. Title V programs include preschool and parent training programs, youth mentoring, after-school activities, tutoring, truancy reduction, substance abuse prevention and gang prevention outreach. Through these initiatives, large cities like Milwaukee to small communities like Ladysmith, WI are creating environments that strengthen families and help children avoid crime and develop into productive adults.

Figure 2: An Excerpt from an Example Document in the ‘Law and Crime 1 [esp. Violence / Drugs]’ Category by Senator Kohl.

SUBMITTED RESOLUTIONS -- (Senate - January 27, 2004) [Page: S288] ---

(TEXT OF SENATE RESOLUTION 292 OMITTED.) Mrs. MURRAY. Mr. President, today I am pleased to submit a resolution designating the week of February 2, 2004, as ‘‘National School Counseling Week,’’ on behalf of my colleagues Senator BIDEN, Senator DORGAN, Senator JOHNSON and Senator DODD. This resolution would honor and celebrate the important work of school counselors, which the Senate has recognized since 1965 through the inclusion of school counseling in the Elementary and Secondary Education Act. Across the country, there are approximately 95,000 school counselors, including 2,100 in Washington State. School counselors are critical components of a successful school and contribute significantly to the growth and success of students. In fact, school counselors were instrumental in helping students, teachers, and parents deal with the trauma of terrorism on September 11, 2001, and its aftermath. However, despite their important service, counselors are expected to serve, on average 485 students each, and are overwhelmed. The American School Counseling Association, the American Medical Association, and the American Psychological Association recommend the ratio of students to school counselors be 250 students to 1 school counselor. I want to share just a few examples of how school counselors throughout America are helping students. In a middle school in southern California, school counselors realized that 257 students were in danger of not passing onto the next grade. They discovered that only 15 percent of the students understood the promotion and retention requirements. The school counselors presented a series of individual and small group lessons on promotion and retention criteria. After the lessons, 100 percent of the students understood the requirements. As a result, 72 of the 257 students, about 28 percent, avoided retention that year. In a high school in Racine, WI, a math teacher realized that 100 of his students failed algebra in the first quarter of the year. 
He asked a school counselor for help. Together, they discovered some of the reasons why students were failing. They initiated several programs, such as peer tutoring and homework assistance. As a result, 93 of the 100 students passed algebra by the end of the year and were able to move on to the next level of math. A school district in Kentucky realized that the retention rate among ninth grade students was unacceptably high. School counselors, teachers and administrators worked together to develop and implement strategies targeted at helping ninth graders move to 10th grade. As a result, retention rates improved in 16 of the 17 high schools in the county in just one year. One school saw the retention rate improve more than 25 percent. This resolution is merely the beginning of what we need to be doing to support school counselors. We need to reduce the ratio of students to counselors to, at the most, 250 to 1. We need to help schools maintain their funding so that school counselors are not cut from school budgets. And we need to support our school counselors so that they can continue to be integral in the fabric of our schools and help our students achieve success in high school and beyond.

Figure 3: An Excerpt from an Example Document in the ‘Education’ Category by Senator Murray.

Judicial Nominations, 1.0 / 2.4% Supreme Court / Constitutional, 1.1 / 3.0% Campaign Finance, 0.9 / 2.4% Abortion, 0.5 / 1.1% Law & Crime 1 [esp. Violence / Drugs], 1.3 / 1.8% Child Protection, 0.9 / 2.6% Health 1 [Medical], 1.5 / 2.4% Social Welfare, 2.0 / 2.8% Education, 1.8 / 4.6% Armed Forces 1 [Manpower], 1.0 / 1.5% Armed Forces 2 [Infrastructure], 2.3 / 3.0% Intelligence, 1.4 / 3.9% Law & Crime 2 [Federal, incl. White Collar / Immigration], 1.8 / 2.7% Environment 1 [Public Lands], 2.2 / 2.5% Commerce / Transportation / Telecom, 2.0 / 2.9% Banking & Finance / Tort Reform, 1.1 / 3.1% Labor 1 [Workers, esp. Retirement], 1.0 / 1.5% Debt / Deficit / Social Security, 1.7 / 4.6% Labor 2 [Employment, esp. Jobs / Wages], 1.4 / 4.5% Individual Taxation, 1.1 / 2.7% Energy, 1.4 / 3.3% Environment 2 [Regulation], 1.1 / 2.8% Agriculture, 1.2 / 2.5% Foreign Trade, 1.1 / 2.4% Procedural 3 [Legislation 1], 2.0 / 2.8% Procedural 4 [Legislation 2], 3.0 / 3.5% Health 2 [Economics − Seniors], 1.0 / 2.6% Health 3 [Economics − General], 0.8 / 2.3% Defense [Use of Force], 1.4 / 3.7% International Affairs [Diplomacy / Human Rights], 1.9 / 3.0% International Affairs [Arms Control], 0.9 / 2.3% Symbolic [Tribute − Living], 1.9 / 1.3% Symbolic [Tribute − Constituent], 3.2 / 1.9% Symbolic [Remembrance − Military], 2.3 / 1.9% Symbolic [Remembrance − Nonmilitary], 2.4 / 2.3% Symbolic [Congratulations − Sports], 0.6 / 0.4% Debt / Deficit [~Daily Jesse Helms Speeches], 0.5 / 0.1% Hate Crime [~Daily Gordon Smith Speeches], 0.4 / 0.1% Procedure 1 [Housekeeping 1], 20.4 / 1.5% Procedure 5 [Housekeeping 3], 15.5 / 1.0% Procedure 6 [Housekeeping 4], 6.5 / 1.6% Procedure 2 [Housekeeping 2], 2.4 / 0.8%

[Figure 4 graphic: dendrogram titled “Agglomerative Clustering of 42 Topic Model”. Axis: Height (ticks from 50 to 250). Cluster annotations: Procedural [Housekeeping]; Bills / Proposals; Social Distributive Infrastructure; DHS; Symbolic; Constitutional / Conceptual Domestic Policy (Spending / Regulation); Regulation; International; Core Economy Macro Western.]

Figure 4: Hierarchical Agglomerative Clustering of β 1 , . . . , β K . Clustering based on minimizing the maximum Euclidean distance between cluster members. Each cluster is labelled with a topic name, followed by the percentage of documents and words, respectively, in that cluster.
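The linkage rule named in the caption (merge the pair of clusters whose most distant members are closest) is commonly called complete linkage. A minimal sketch of that rule follows, assuming the rows of `points` play the role of the β vectors; the function name, the toy coordinates, and the naive search over cluster pairs are illustrative assumptions, not the code used to produce Figure 4.

```python
import numpy as np

def complete_linkage(points):
    """Naive complete-linkage clustering.

    Returns the merge history as (members_a, members_b, height) tuples,
    where height is the largest Euclidean distance between any member
    of one cluster and any member of the other.
    """
    X = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all rows.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between clusters is the maximum
                # distance over all cross-cluster pairs of members.
                h = dist[np.ix_(clusters[i], clusters[j])].max()
                if best is None or h < best[0]:
                    best = (h, i, j)
        h, i, j = best
        merges.append((clusters[i], clusters[j], float(h)))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical toy data: two tight pairs that are far from each other.
toy_points = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.5]]
toy_merges = complete_linkage(toy_points)
```

The merge heights returned here correspond to the "Height" axis of a dendrogram. The search is quadratic in the number of clusters at every step, which is acceptable for a few dozen topics.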

[Figure 5 graphic: time-series panel “Judicial 1”. Vertical axis: Probability (0.000 to 0.035). Horizontal axis: date, Jan 01, 1997 to Dec 22, 2004.]

Figure 5: The Probability of a Randomly Chosen Document being on ‘Judicial Nominations’ (π_judicial1). Values of π that are farther than 7 days from a day in which legislative speech occurred have been whited out.

[Figure 6 graphic: time-series panel “Judicial 2”. Vertical axis: Probability (0.00 to 0.20).]

Figure 6: The Probability of a Randomly Chosen Document being on ‘Supreme Court / Constitutional’ (π_judicial2). Values of π that are farther than 7 days from a day in which legislative speech occurred have been whited out.


[Figure 7 graphic: time-series panel “Abortion”. Vertical axis: Probability (0.00 to 0.07).]

Figure 7: The Probability of a Randomly Chosen Document being on ‘Abortion’ (π_abortion). Values of π that are farther than 7 days from a day in which legislative speech occurred have been whited out.


[Figure 8 graphic: line plot titled “Persistence in Topic Categories”. Vertical axis: Fraction of Shock Remaining (0.2 to 1.0). Horizontal axis: Day (Shock on Day 1), 2 to 14. Series: Jesse Helms re Debt; Abortion; Defense [Use of Force]; Environment 1 [Public Lands]; Law & Crime 1 [Violent]; Symbolic [Sports].]

Figure 8: Estimated persistence of shocks to selected topics, AR-7 model.
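One way to read the persistence curves in Figure 8 is as impulse responses of a fitted autoregression. The sketch below assumes that the "fraction of shock remaining" h days after a unit shock is the standard AR(p) impulse response; that reading, the function name, and the coefficient values are assumptions for illustration, not estimates or definitions taken from the paper.

```python
def impulse_response(phi, horizon):
    """Response of an AR(p) series to a unit shock on day 1.

    phi[0] is the coefficient on lag 1, phi[1] on lag 2, and so on.
    Element h of the result is the fraction of the shock still present
    h days after the shock (element 0 is the shock itself, equal to 1).
    """
    psi = [1.0]
    for h in range(1, horizon):
        # psi_h = sum_{j=1..min(h,p)} phi_j * psi_{h-j}
        psi.append(sum(phi[j] * psi[h - 1 - j] for j in range(min(h, len(phi)))))
    return psi

# Hypothetical coefficients, not estimates from the paper.
ar1_path = impulse_response([0.5], 4)
ar2_path = impulse_response([0.5, 0.2], 3)
```

For an AR-7 model, `phi` would hold seven coefficients, and a 14-day horizon would match the range plotted in the figure.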

[Figure 9 graphic: scatter plot titled “Daily Month−adjusted Dynamics in Attention (Words/Topic Logratios), 42 Topic Model”. Horizontal axis: Persistence (Fraction of shock sustained 14 days), 0.00 to 0.15. Vertical axis: Volatility (MAD, One−day−ahead forecast error), 2 to 8. Each point is labelled with a topic name. Region annotations: Commentary on Ephemera; Sustained Agenda Innovations, 97−04; Hobby Horses; Regular / Brief (e.g., Low Salience); Regular / Sustained (e.g., Electoral).]

Figure 9: Persistence and volatility in the daily dynamics of topic attention. Both derived from independent models of daily word/topic logratios (as given by topic detection model), AR-7 with monthly dummies.

[Figure 10 graphic: stacked time-series panel “Foreign Policy Categories”. Vertical axis: Probability (0.00 to 0.15).]

Figure 10: The Probability of a Randomly Chosen Document being on ‘Defense [Use of Force]’, ‘International [Arms]’, or ‘International [Diplomacy / Human Rights]’. ‘Use of Force’ is at the bottom in dark brown, ‘Arms’ is in the middle in light blue, and ‘Diplomacy / Human Rights’ is at the top in yellow. Values of π that are farther than 7 days from a day in which legislative speech occurred have been whited out.


[Figure 11 graphic: stacked time-series panel “Symbolic − Military and Foreign Intervention / Iraq”. Vertical axis: Probability (0.00 to 0.20). Horizontal axis: date, Aug 09, 2001 to May 01, 2003.]

Figure 11: The Probability of a Randomly Chosen Document being on ‘Symbolic [Military]’ or ‘Defense [Use of Force]’. Symbolic is at the bottom in brown and Force is above in light blue. Values of π that are farther than 7 days from a day in which legislative speech occurred have been whited out.

[Figure 12 graphic: line plot titled “Defense/International: Most Important Problem and Legislative Rhetoric”. Vertical axis: Relative Salience / Attention, with two scales (ticks from −4 to 2 and from −3.5 to 0.0). Horizontal axis: Year (1998 to 2004). Labelled series: Defense [Use of Force]; Trade; Symbolic [Military]; Intelligence.]

Figure 12: Monthly Most Important Problem = Defense / International (log-odds) and monthly logratios of words/topic in selected categories. These are typical inputs for the Granger causality test.

[Figure 13 graphic: line plot titled “Crime: Most Important Problem and Legislative Rhetoric”. Vertical axis: Relative Salience / Attention, with two scales (ticks from −1.0 to 2.0 and from 0.03 to 0.15). Horizontal axis: Year (1998 to 2004). Labelled series: Crime 2 [Federal]; Law & Crime 1 [Violent]; Child Protection.]

Figure 13: Monthly Most Important Problem = Crime (log-odds) and monthly logratios of words/topic in selected categories. These are typical inputs for the Granger causality test.

9/12/01 Judicial Nominations Constitutional Campaign Finance Abortion Crime 1 [Violent] Child Protection Health 1 [Medical] Social Welfare Education Armed Forces 1 [Manpower] Armed Forces 2 [Infrastructure] Intelligence Crime 2 [Federal] Environment 1 [Public Lands] Commercial Infrastructure Banking / Finance Labor 1 [Workers] Debt / Social Security Labor 2 [Employment] Taxes Energy Environment 2 [Regulation] Agriculture Trade Procedural 3 Procedural 4 Health 2 [Seniors] Health 3 [Economics] Defense [Use of Force] International Affairs [Nonmilitary] International Affairs [Arms] Symbolic [Living] Symbolic [Constituent] Symbolic [Military] Symbolic [Nonmilitary] Symbolic [Sports] Jesse Helms re Debt Gordon Smith re Hate Crime Procedural 1 Procedural 5 Procedural 6 Procedural 2

Post-9/11 5.28 −2.19 −3.19 −2.59

9/12/01

Post-9/11 3.25

Bush II −2.67 −2.25

−6.68

−3.01

−3.87 2.83 2.19 5.00

−3.40

−2.79

−2.39

2.23 −3.60 4.06

−2.84 3.40

2.65 1.98 3.58 2.00

−2.30 −2.83 −3.23

3.96 −2.17

3.25

6.61 2.17

2.02

−4.56

−4.62 3.05 3.88 6.38

−9.56 10.06 2.22 2.35

6.07

−2.56

2.15 2.62 2.02

3.41

−3.79

−4.72 3.59

7.40 2.28

Table 1: Each series (centered logratios of daily words) is modelled as an independent AR-7 with monthly dummies. The AR and monthly coefficients are not reported here. Only t-values with |t| > 1.96 are shown.
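The centered logratio transform mentioned in the caption can be sketched as follows. This is a minimal illustration of the standard transform applied to one day's per-topic word counts; the function name and the pseudo-count used to guard against zero counts are assumptions, since the text does not say how empty topic-days are handled.

```python
import math

def centered_logratio(counts, pseudo=0.5):
    """Centered logratio of one day's word counts across topics.

    Each count is logged (after adding `pseudo`, an assumed guard against
    zeros) and the mean of the logs is subtracted, so the result sums to 0.
    """
    logs = [math.log(c + pseudo) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical daily word totals for three topics.
clr_day = centered_logratio([1200, 300, 0])
clr_pair = centered_logratio([4, 1], pseudo=0.0)
```

Each topic's series of daily values would then be the input to its own autoregression.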


                                    p (Speech → MIP)   p (MIP → Speech)
Defense
  Defense [Use of Force]            0.048              0.577
  Int’l Affairs [Nonmilitary]       0.201              0.307
  Int’l Affairs [Arms]              0.498              0.018
  Armed Forces 1 [Manpower]         0.029              0.514
  Armed Forces 2 [Infrastructure]   0.047              0.984
  Intelligence                      0.039              0.137
  Trade                             0.929              0.075
Crime
  Law / Crime 1 [Violent]           0.002              0.835
  Child Protection                  0.022              0.895
  Crime 2 [Federal]                 0.092              0.181
Economics
  Debt / Social Security            0.957              0.602
  Labor 2 [Employment]              0.927              0.519
  Taxes                             0.166              0.994
  Banking / Finance                 0.635              0.356
  Labor 1 [Workers]                 0.658              0.107

Table 2: Bivariate Granger causality tests (F-tests) for ‘Most Important Problem’ Gallup responses and legislative speech in related categories.
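For readers unfamiliar with the test, a bivariate Granger F-statistic can be sketched as below. It compares a regression of the effect series on its own lags with one that also includes lags of the candidate cause. The lag length, the function names, and the simulated series are assumptions for illustration; the table does not report which lag length was used, and this is not the code behind the reported p-values.

```python
import numpy as np

def lagged(series, p):
    """Matrix whose columns are lags 1..p of `series`, aligned with series[p:]."""
    n = len(series)
    return np.column_stack([series[p - k : n - k] for k in range(1, p + 1)])

def rss(design, target):
    """Residual sum of squares from an ordinary least squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def granger_f(cause, effect, p):
    """F-statistic for 'lags of cause add nothing beyond lags of effect'."""
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    target = effect[p:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(effect, p)])
    unrestricted = np.hstack([restricted, lagged(cause, p)])
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    df_den = len(target) - unrestricted.shape[1]
    f_stat = ((rss_r - rss_u) / p) / (rss_u / df_den)
    return f_stat, p, df_den

# Simulated example (not data from the paper): y follows x with a one-step lag.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = np.zeros(300)
y[1:] = 0.8 * x[:-1] + rng.normal(scale=0.1, size=299)
f_xy, df_num, df_den = granger_f(x, y, p=2)
f_yx, _, _ = granger_f(y, x, p=2)
```

A p-value would follow from comparing the statistic with an F distribution on `(df_num, df_den)` degrees of freedom.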


A Estimation

We use the Expectation Conditional Maximization (ECM) algorithm to fit this model. Schematically, the algorithm proceeds as follows.

1. Set initial values $\theta^{(0)}$, $\omega^{(0)}$, $\eta^{(0)}$, $V_t^{(0)}$, and $W_t^{(0)}$.

2. $i = 0$

3. do until convergence

(a) Calculate $\hat{z}_{dk} \equiv E(Z_{dk} \mid Y, \theta^{(i)}, \omega^{(i)}, \eta^{(i)}, V_t^{(i)}, W_t^{(i)})$ for $d = 1, \ldots, D$ and $k = 1, \ldots, K$

(b) Set $\omega^{(i+1)} = \operatorname{argmax}_{\omega} \; p(\omega \mid \hat{Z}, Y, \theta^{(i)}, \eta^{(i)}, V_t^{(i)}, W_t^{(i)})$

(c) Set $\eta^{(i+1)} = \operatorname{argmax}_{\eta} \; p(\eta \mid \hat{Z}, Y, \theta^{(i)}, \omega^{(i+1)}, V_t^{(i)}, W_t^{(i)})$

(d) Set $V_t^{(i+1)} = \operatorname{argmax}_{V_t} \; p(V_t \mid \hat{Z}, Y, \theta^{(i)}, \omega^{(i+1)}, \eta^{(i+1)}, V_{-t}^{(i)}, W_t^{(i)})$ for $t = 1, \ldots, T$

(e) Set $W_t^{(i+1)} = \operatorname{argmax}_{W_t} \; p(W_t \mid \hat{Z}, Y, \theta^{(i)}, \omega^{(i+1)}, \eta^{(i+1)}, V_t^{(i+1)}, W_{-t}^{(i)})$ for $t = 1, \ldots, T$

(f) Set $\theta^{(i+1)} = \operatorname{argmax}_{\theta} \; p(\theta \mid \hat{Z}, Y, \omega^{(i+1)}, \eta^{(i+1)}, V_t^{(i+1)}, W_t^{(i+1)})$

(g) $i = i + 1$

More specifically, each of the steps above is accomplished as follows.

A.1 Step 3a (Conditional Expectation of Z)

Standard calculations (McLachlan and Peel, 2000) reveal that

$$\hat{z}_{dk} = \frac{\pi_{s(d)k} \prod_{w=1}^{W} \theta_{kw}^{y_{dw}}}{\sum_{j=1}^{K} \pi_{s(d)j} \prod_{w=1}^{W} \theta_{jw}^{y_{dw}}}.$$
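The expression for the conditional expectation of Z can be evaluated stably by working with logarithms, since the products over words underflow quickly for documents of realistic length. The sketch below is illustrative only: the array layout, the function name, and the toy values are assumptions, and it presumes every entry of θ is strictly positive.

```python
import numpy as np

def e_step(Y, theta, pi, day):
    """Posterior topic probabilities z_hat[d, k] for each document.

    Y:     (D, W) word counts y_dw
    theta: (K, W) word probabilities per topic (assumed strictly positive)
    pi:    (T, K) topic probabilities per day
    day:   (D,) integer index s(d) of the day each document belongs to
    """
    # log numerator: log pi_{s(d)k} + sum_w y_dw * log theta_kw
    log_post = np.log(pi[day]) + Y @ np.log(theta).T
    # Subtract the row maximum before exponentiating to avoid underflow.
    log_post -= log_post.max(axis=1, keepdims=True)
    z = np.exp(log_post)
    return z / z.sum(axis=1, keepdims=True)

# Hypothetical toy problem: 2 documents, 2 words, 2 topics, 1 day.
Y_toy = np.array([[2, 0], [0, 3]])
theta_toy = np.array([[0.9, 0.1], [0.2, 0.8]])
pi_toy = np.array([[0.5, 0.5]])
z_toy = e_step(Y_toy, theta_toy, pi_toy, np.array([0, 0]))
```

Subtracting the row maximum leaves the normalized result unchanged because the same constant cancels in the numerator and denominator.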

A.2 Step 3b (Conditional Maximization of ω)

A simple analytic solution for this step is not available. Instead, we use the BFGS algorithm to numerically maximize

$$p(\omega_t \mid \hat{Z}, Y, \theta^{(i)}, \eta^{(i)}, V_t^{(i)}, W_t^{(i)}) \propto f_N(\omega_t \mid F_t' \eta_t, V_t) \prod_{d: s(d) = t} f_M(\hat{z}_d \mid 1, \pi_t)$$

where $f_N(\cdot \mid \mu, \Sigma)$ is the normal density with mean $\mu$ and variance-covariance matrix $\Sigma$; $f_M(\cdot \mid n, \rho)$ is the multinomial mass function with sample size $n$ and probability vector $\rho$; and

$$\pi_t = \left( \frac{\exp(\omega_{t1})}{1 + \sum_{j=1}^{K-1} \exp(\omega_{tj})}, \; \ldots, \; \frac{\exp(\omega_{t(K-1)})}{1 + \sum_{j=1}^{K-1} \exp(\omega_{tj})}, \; \frac{1}{1 + \sum_{j=1}^{K-1} \exp(\omega_{tj})} \right).$$

Handling each of the $\omega_t$ vectors separately is appropriate since they are conditionally independent.
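The objective for a single day can be written down directly, together with its gradient, which is what a quasi-Newton routine such as BFGS needs. The sketch below assumes a diagonal $V_t$ (consistent with the element-wise update in A.4) and drops additive constants; the function names and toy values are illustrative assumptions. In practice the negated objective and gradient could be passed to a general-purpose optimizer, for example `scipy.optimize.minimize` with `method="BFGS"`.

```python
import numpy as np

def softmax_last_ref(omega):
    """Map omega (length K-1) to pi (length K), with category K as reference."""
    expo = np.exp(omega)
    return np.append(expo, 1.0) / (1.0 + expo.sum())

def log_objective(omega, prior_mean, v_diag, z_hat_day):
    """Log of the conditional posterior of omega_t, up to a constant.

    prior_mean: F_t' eta_t, length K-1
    v_diag:     diagonal of V_t (assumed diagonal here), length K-1
    z_hat_day:  (n_docs, K) rows z_hat_d for documents with s(d) = t
    """
    pi = softmax_last_ref(omega)
    resid = omega - prior_mean
    log_normal = -0.5 * np.sum(resid ** 2 / v_diag)
    log_multinom = np.sum(z_hat_day.sum(axis=0) * np.log(pi))
    return log_normal + log_multinom

def gradient(omega, prior_mean, v_diag, z_hat_day):
    """Analytic gradient of log_objective with respect to omega."""
    pi = softmax_last_ref(omega)
    counts = z_hat_day.sum(axis=0)
    return -(omega - prior_mean) / v_diag + counts[:-1] - counts.sum() * pi[:-1]

# Hypothetical values for one day with K = 3 topics and 2 documents.
omega0 = np.array([0.3, -0.2])
mean0 = np.array([0.1, 0.0])
v0 = np.array([0.5, 2.0])
z0 = np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
```

Supplying the analytic gradient avoids the finite-difference approximation an optimizer would otherwise fall back on.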


A.3 Step 3c (Conditional Maximization of η)

Because of the conditionally Gaussian structure of this portion of the model we can use standard results for Gaussian dynamic linear models (West and Harrison, 1997) to calculate the value of η that maximizes the posterior conditional on the other model parameters. Specifically:

1. for $t = 1, \ldots, T$ {

(a) $a_t = G_t m_{t-1}$

(b) $R_t = G_t C_{t-1} G_t' + W_t$

(c) $f_t = F_t' a_t$

(d) $Q_t = F_t' R_t F_t + V_t$

(e) $A_t = R_t F_t Q_t^{-1}$

(f) $e_t = \omega_t - f_t$

(g) $m_t = a_t + A_t e_t$

(h) $C_t = R_t - A_t Q_t A_t'$

}

2. $\eta_T = m_T$

3. for $t = (T - 1), \ldots, 1$ {

(a) $B_t = C_t G_{t+1}' R_{t+1}^{-1}$

(b) $\eta_t = m_t + B_t (\eta_{t+1} - a_{t+1})$

}

Note that $f_t$ is the one-step-ahead forecast of $\omega_t$. If we are interested in forecasting, this quantity can also be easily stored.
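The forward and backward recursions above translate almost line for line into code. The sketch below assumes time-invariant F, G, V, and W and a user-supplied prior (m0, C0) for brevity; those simplifications, the function name, and the toy series are assumptions made for illustration.

```python
import numpy as np

def dlm_mode(omega, F, G, V, W, m0, C0):
    """Forward filter and backward recursion for a Gaussian DLM.

    omega: (T, r) observations; F: (n, r); G: (n, n); V: (r, r); W: (n, n).
    Returns the smoothed states eta (T, n) and one-step forecasts f (T, r).
    """
    omega = np.asarray(omega, dtype=float)
    T, n = omega.shape[0], m0.shape[0]
    a = np.zeros((T, n)); R = np.zeros((T, n, n))
    m = np.zeros((T, n)); C = np.zeros((T, n, n))
    f = np.zeros_like(omega)
    m_prev, C_prev = m0, C0
    for t in range(T):
        a[t] = G @ m_prev                      # (a) prior mean
        R[t] = G @ C_prev @ G.T + W            # (b) prior variance
        f[t] = F.T @ a[t]                      # (c) one-step-ahead forecast
        Q = F.T @ R[t] @ F + V                 # (d) forecast variance
        A = R[t] @ F @ np.linalg.inv(Q)        # (e) adaptive coefficient
        e = omega[t] - f[t]                    # (f) forecast error
        m[t] = a[t] + A @ e                    # (g) filtered mean
        C[t] = R[t] - A @ Q @ A.T              # (h) filtered variance
        m_prev, C_prev = m[t], C[t]
    eta = np.zeros((T, n))
    eta[-1] = m[-1]
    for t in range(T - 2, -1, -1):
        B = C[t] @ G.T @ np.linalg.inv(R[t + 1])
        eta[t] = m[t] + B @ (eta[t + 1] - a[t + 1])
    return eta, f

# Hypothetical local-level example: nearly noise-free observations.
omega_toy = np.array([[1.0], [2.0], [0.5], [3.0]])
eta_toy, f_toy = dlm_mode(omega_toy, F=np.eye(1), G=np.eye(1),
                          V=np.array([[1e-8]]), W=np.eye(1),
                          m0=np.zeros(1), C0=np.eye(1) * 1e3)
```

With an observation variance this small, the smoothed states track the observations almost exactly, which makes the example easy to check by eye.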

A.4 Step 3d (Conditional Maximization of V)

Conditional on the other model parameters, each diagonal element $V_{ii}$ of V follows an inverse gamma distribution with mode at

$$\frac{\left( b_0 + \sum_{t \in \mathcal{T}} \left( \omega_{ti} - [F_t' \eta_t]_i \right)^2 \right) / 2}{\left( a_0 + |\mathcal{T}| \right) / 2 + 1}$$

where $\mathcal{T}$ denotes the set of days in which Congress was in session and $[F_t' \eta_t]_i$ is the ith element of $F_t' \eta_t$.
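The mode above is a simple function of the squared residuals for one coordinate. A minimal sketch follows; the function name and the numerical values are hypothetical, with `a0` and `b0` standing for the prior constants in the formula.

```python
def inverse_gamma_mode(residuals, a0, b0):
    """Mode of the inverse gamma conditional posterior for one variance.

    residuals: omega_ti - [F_t' eta_t]_i for each in-session day t
    a0, b0:    prior constants as they appear in the formula
    """
    n = len(residuals)
    sum_sq = sum(r * r for r in residuals)
    # scale / (shape + 1) for an inverse gamma distribution
    return ((b0 + sum_sq) / 2.0) / ((a0 + n) / 2.0 + 1.0)

# Hypothetical residuals for one coordinate over three in-session days.
v_mode = inverse_gamma_mode([1.0, -1.0, 2.0], a0=2.0, b0=2.0)
```

The same function applies to the diagonal elements of W in A.5 if the residuals are replaced by the state innovations and the constants by their counterparts there.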

A.5 Step 3e (Conditional Maximization of W)

Conditional on the other model parameters, each diagonal element $W_{ii}$ of W follows an inverse gamma distribution with mode at

$$\frac{\left( d_0 + \sum_{t=2}^{T} \left( \eta_{ti} - [G_t \eta_{t-1}]_i \right)^2 \right) / 2}{\left( c_0 + T - 1 \right) / 2 + 1}$$

where $[G_t \eta_{t-1}]_i$ is the ith element of $G_t \eta_{t-1}$.

A.6 Step 3f (Conditional Maximization of θ)

Simple calculations show that the value of θ that maximizes the conditional posterior has typical element

$$\theta_{kw} = \frac{\sum_{d=1}^{D} \hat{z}_{dk} y_{dw} + \lambda_{kw}}{\sum_{j=1}^{W} \left\{ \sum_{d=1}^{D} \hat{z}_{dk} y_{dj} + \lambda_{kj} \right\}}$$
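In matrix form this update is a weighted word count followed by a row normalization. The sketch below is illustrative: the array layout, function name, and toy values are assumptions.

```python
import numpy as np

def update_theta(z_hat, Y, lam):
    """Conditional maximizer of theta given the expected assignments.

    z_hat: (D, K) expected topic assignments
    Y:     (D, W) word counts
    lam:   (K, W) prior terms lambda_kw as they appear in the formula
    """
    # numerator[k, w] = sum_d z_hat[d, k] * y[d, w] + lambda[k, w]
    numerator = z_hat.T @ Y + lam
    # Each row is divided by its own sum over words.
    return numerator / numerator.sum(axis=1, keepdims=True)

# Hypothetical toy values: 2 documents, 2 topics, 2 words.
theta_new = update_theta(np.array([[1.0, 0.0], [0.0, 1.0]]),
                         np.array([[2, 0], [1, 3]]),
                         np.ones((2, 2)))
```

Each row of the result is a probability vector over the vocabulary for one topic.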


B Key Words by Topic

stem             β       median   difference   MAD     ratio
nomine           5.61    −0.80    6.42         1.26    5.10
confirm          5.18     0.42    4.76         0.89    5.35
nomin            5.40     0.35    5.04         1.12    4.50
circuit          5.13    −0.61    5.74         1.49    3.85
hear             4.68     2.41    2.27         0.55    4.15
court            5.52     1.65    3.86         1.14    3.38
judg             5.68     0.79    4.89         1.48    3.30
judici           4.93    −0.56    5.49         1.60    3.43
case             4.45     2.55    1.90         0.50    3.81
vacanc           4.30    −2.15    6.44         1.52    4.23
judiciari        4.52     0.03    4.49         1.24    3.63
major            4.16     2.67    1.49         0.36    4.17
clinton          4.13     1.36    2.77         0.64    4.32
appeal           4.03     0.27    3.76         0.83    4.56
record           3.94     2.61    1.32         0.30    4.44
been             5.02     4.22    0.79         0.28    2.80
bush             4.06     1.31    2.75         0.78    3.51
last             4.18     3.28    0.90         0.29    3.14
white            3.58     1.25    2.33         0.49    4.75
ha               5.51     4.75    0.75         0.29    2.56
ninth            3.54    −1.77    5.31         1.16    4.56
appoint          3.64     0.32    3.32         0.93    3.57
hold             3.36     1.78    1.58         0.33    4.86
suprem           3.89     0.08    3.81         1.18    3.22
bench            3.36    −2.79    6.15         1.32    4.66
republican       4.83     1.89    2.94         1.20    2.46
consent          3.47     2.46    1.01         0.26    3.86
were             4.76     3.78    0.98         0.40    2.44
than             4.43     3.71    0.72         0.28    2.57
month            3.84     2.50    1.34         0.44    3.03
consid           3.47     2.39    1.08         0.30    3.57
committe         4.92     3.46    1.46         0.62    2.36
of               7.11     6.74    0.37         0.17    2.20
filibust         3.91    −1.03    4.94         1.74    2.83
justic           4.34     1.29    3.05         1.22    2.50
dure             3.70     2.61    1.09         0.39    2.81
never            3.54     2.10    1.44         0.49    2.96
posit            3.31     2.10    1.22         0.35    3.44
democrat         4.56     2.01    2.54         1.14    2.23
view             3.33     1.82    1.51         0.49    3.11

Table 3: Key Words on ‘Judicial Nominations’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).
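The columns described in the caption can be computed from the full K × W matrix of β values as sketched below. Two details are assumptions, since the caption does not specify them: the median absolute deviation is taken without a consistency scaling constant, and rank 1 is given to the largest value. The function name and toy matrix are also illustrative.

```python
import numpy as np

def keyword_columns(beta, k):
    """Key-word statistics for topic k from a (K, W) matrix of beta values.

    Returns (median, difference, mad, ratio, order), where `order` lists
    word indices sorted on rank(beta) + rank(ratio).
    """
    others = np.delete(beta, k, axis=0)           # all topics except k
    median = np.median(others, axis=0)            # column 3
    difference = beta[k] - median                 # column 4
    # Column 5: unscaled median absolute deviation (assumed definition).
    mad = np.median(np.abs(others - median), axis=0)
    ratio = difference / mad                      # column 6
    # Assumed ranking convention: rank 1 is the largest value.
    rank_beta = (-beta[k]).argsort().argsort() + 1
    rank_ratio = (-ratio).argsort().argsort() + 1
    order = np.argsort(rank_beta + rank_ratio, kind="stable")
    return median, difference, mad, ratio, order

# Hypothetical beta matrix: 3 topics (rows) by 3 word stems (columns).
beta_toy = np.array([[3.0, 1.0, 0.5],
                     [0.0, 1.5, 0.1],
                     [1.0, 0.5, -0.3]])
med_toy, diff_toy, mad_toy, ratio_toy, order_toy = keyword_columns(beta_toy, 0)
```

Sorting on the sum of the two ranks favors stems that are both frequent within the topic and distinctive relative to the other topics.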

stem             β       median   difference   MAD     ratio
case             4.39     2.55    1.84         0.50    3.70
court            4.62     1.65    2.97         1.14    2.60
attornei         3.89     0.50    3.39         1.20    2.83
suprem           3.54     0.08    3.46         1.18    2.93
justic           4.22     1.29    2.94         1.22    2.40
nomin            3.47     0.35    3.12         1.12    2.79
judg             4.21     0.79    3.43         1.48    2.31
m                3.26     0.32    2.94         0.93    3.15
decis            3.43     2.21    1.22         0.46    2.66
constitut        4.50     1.71    2.79         1.27    2.20
counsel          3.44     0.22    3.23         1.23    2.63
nomine           3.19    −0.80    3.99         1.26    3.17
white            3.06     1.25    1.80         0.49    3.68
ashcroft         3.23    −0.80    4.03         1.46    2.76
articl           3.07     1.18    1.89         0.62    3.02
testimoni        2.96     0.27    2.69         0.65    4.16
investig         3.24     0.80    2.44         0.97    2.51
gener            3.90     2.90    1.00         0.47    2.10
view             3.09     1.82    1.27         0.49    2.61
law              4.53     3.15    1.38         0.73    1.89
marriag          3.13    −0.75    3.88         1.55    2.51
record           3.32     2.61    0.70         0.30    2.36
john             2.81     0.71    2.09         0.61    3.43
wit              2.94     0.54    2.40         0.86    2.79
hi               4.83     3.55    1.28         0.72    1.78
right            4.41     3.45    0.96         0.54    1.80
evid             3.05     0.88    2.17         0.88    2.48
grand            2.72    −0.83    3.55         0.94    3.78
judiciari        3.04     0.03    3.01         1.24    2.44
victim           3.44     0.77    2.67         1.24    2.15
trial            3.00    −0.30    3.30         1.36    2.43
confirm          2.77     0.42    2.35         0.89    2.64
crime            3.52     0.70    2.83         1.48    1.91
independ         3.22     1.39    1.83         0.85    2.16
lawyer           2.75    −0.28    3.04         1.15    2.63
flag             2.80    −0.66    3.46         1.39    2.49
appoint          2.73     0.32    2.42         0.93    2.60
testifi          2.51     0.15    2.36         0.64    3.71
alleg            2.55    −0.44    2.98         0.95    3.16
clinton          2.86     1.36    1.49         0.64    2.33

Table 4: Key Words on ‘Supreme Court / Constitutional’ (category includes impeachment, Department of Justice, marriage). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem             β       median   difference   MAD     ratio
campaign         5.19     1.04    4.15         0.91    4.55
candid           4.67    −0.29    4.97         0.68    7.27
elect            5.00     1.53    3.47         1.03    3.39
monei            5.28     2.54    2.73         0.88    3.09
contribut        4.44     1.57    2.86         0.55    5.24
polit            4.75     2.08    2.68         0.80    3.35
soft             4.21    −1.12    5.34         0.72    7.40
ad               4.13     1.51    2.63         0.44    6.02
parti            4.32     1.50    2.82         0.97    2.91
limit            4.37     2.18    2.18         0.79    2.77
rais             3.80     2.17    1.64         0.37    4.45
voter            3.61    −0.70    4.30         0.82    5.26
speech           3.81     0.71    3.11         0.93    3.33
group            3.80     2.22    1.57         0.50    3.13
financ           4.46     1.21    3.26         1.45    2.25
issu             4.44     3.58    0.86         0.40    2.17
or               5.08     4.53    0.55         0.28    1.95
interest         3.77     2.77    1.00         0.37    2.74
mccain-feingold  3.58    −3.88    7.46         2.46    3.03
reform           4.58     1.91    2.67         1.31    2.05
run              3.34     1.65    1.69         0.53    3.20
union            3.63     1.28    2.36         0.93    2.53
court            4.09     1.65    2.44         1.14    2.13
influenc         3.13    −0.06    3.19         0.83    3.83
corrupt          3.20    −1.18    4.38         1.32    3.31
suprem           3.34     0.08    3.26         1.18    2.76
free             3.27     1.49    1.78         0.62    2.89
expenditur       3.18     0.05    3.13         1.06    2.96
televis          3.03     0.06    2.97         0.95    3.13
advertis         2.96    −0.50    3.46         1.03    3.37
advocaci         3.00    −1.12    4.12         1.24    3.32
system           4.35     2.90    1.45         0.83    1.75
corpor           3.31     1.02    2.29         0.95    2.41
spent            3.11     1.56    1.54         0.53    2.92
spend            3.94     2.14    1.79         0.99    1.82
disclosur        3.11    −0.88    4.00         1.52    2.62
public           4.10     2.86    1.24         0.72    1.72
vote             4.70     3.29    1.41         0.89    1.58
democraci        3.05     0.16    2.89         1.27    2.28
first            4.01     3.28    0.73         0.45    1.63

Table 5: Key Words on ‘Campaign Finance’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem             β       median   difference   MAD     ratio
procedur         5.26     1.12    4.14         0.75    5.56
abort            5.87    −0.70    6.57         2.00    3.28
babi             4.61    −0.69    5.30         1.37    3.88
thi              6.55     5.80    0.75         0.25    2.94
life             4.85     2.02    2.83         0.82    3.45
doctor           4.41     0.02    4.39         1.12    3.93
human            4.45     1.47    2.97         0.79    3.77
ban              4.50     0.19    4.31         1.22    3.53
decis            4.21     2.21    2.00         0.46    4.36
or               5.33     4.53    0.80         0.28    2.81
except           3.87     1.14    2.73         0.53    5.15
case             4.24     2.55    1.69         0.50    3.40
issu             4.74     3.58    1.15         0.40    2.91
pregnanc         4.19    −2.46    6.66         2.00    3.32
kill             3.67     0.83    2.84         0.62    4.61
right            4.88     3.45    1.43         0.54    2.68
physician        3.73    −0.98    4.71         1.21    3.89
medic            4.78     1.23    3.56         1.31    2.71
mother           4.54     0.51    4.03         1.44    2.81
v                3.74    −0.36    4.11         1.20    3.41
necessari        3.65     1.95    1.70         0.49    3.48
which            4.74     4.19    0.55         0.22    2.54
cell             3.44    −0.93    4.37         1.21    3.60
an               5.22     4.65    0.57         0.24    2.33
health           5.04     2.54    2.50         1.07    2.34
woman            4.67     0.52    4.16         1.73    2.41
birth            3.48    −0.64    4.12         1.25    3.31
never            3.57     2.10    1.47         0.49    3.02
partial          3.08    −0.78    3.85         0.71    5.44
my               5.15     4.47    0.68         0.31    2.20
of               7.10     6.74    0.36         0.17    2.14
suprem           3.54     0.08    3.47         1.18    2.93
perform          3.90     1.32    2.58         0.98    2.64
women            4.90     2.25    2.65         1.20    2.22
choos            3.23     0.79    2.44         0.70    3.50
dr               3.68     0.45    3.23         1.19    2.71
posit            3.30     2.10    1.20         0.35    3.39
circumst         3.15     0.98    2.17         0.62    3.53
letter           3.05     1.66    1.39         0.34    4.04
deliv            3.08     0.57    2.52         0.68    3.70

Table 6: Key Words on ‘Abortion’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem        β     median  difference  MAD   ratio
enforc      4.37   1.66   2.71        0.75  3.61
act         4.47   3.32   1.15        0.40  2.84
crime       4.75   0.70   4.05        1.48  2.74
gun         4.06   0.14   3.92        1.07  3.65
law         4.86   3.15   1.71        0.73  2.34
victim      3.95   0.77   3.19        1.24  2.56
violenc     4.33   0.42   3.90        1.66  2.35
abus        3.89   0.83   3.06        1.13  2.72
prevent     3.76   1.88   1.87        0.69  2.72
juvenil     3.75  −1.01   4.76        1.83  2.60
crimin      3.80   0.24   3.56        1.48  2.41
polic       3.56   0.52   3.04        1.09  2.79
drug        4.43   1.78   2.65        1.36  1.95
school      4.16   2.14   2.02        0.98  2.06
firearm     3.26  −1.72   4.98        1.42  3.51
domest      3.26   0.77   2.49        0.84  2.98
local       3.83   1.94   1.89        0.90  2.11
children    4.49   2.50   1.99        1.16  1.72
violent     3.17  −1.44   4.62        1.11  4.16
youth       3.22  −0.61   3.82        1.23  3.11
justic      3.79   1.29   2.50        1.22  2.04
offend      3.07  −0.96   4.03        0.71  5.65
hate        3.03  −0.47   3.51        0.84  4.20
offic       4.02   2.80   1.21        0.73  1.67
assault     2.88  −0.70   3.58        0.73  4.87
grant       3.48   1.79   1.69        0.87  1.95
prosecut    2.97  −1.03   4.00        1.43  2.79
sexual      3.03  −1.30   4.33        1.72  2.51
safe        2.96   1.11   1.84        0.77  2.40
case        3.41   2.55   0.86        0.50  1.72
traffick    2.71  −1.93   4.63        1.53  3.03
substanc    2.63  −0.09   2.72        0.63  4.30
treatment   3.06   0.88   2.18        1.08  2.02
student     3.23   1.13   2.09        1.20  1.75
illeg       2.79  −0.03   2.82        1.18  2.38
safeti      3.44   1.78   1.66        1.06  1.57
commun      3.92   2.88   1.04        0.73  1.43
effort      3.47   2.89   0.58        0.39  1.52
prison      2.83  −0.05   2.89        1.35  2.14
kill        2.58   0.83   1.75        0.62  2.84

Table 7: Key Words on ‘Law and Crime 1’ (category includes violence, drugs, police, prisons). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).
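As a worked check of how the columns relate, the first three rows of Table 7 (enforc, act, crime) can be tested against the caption's definitions. The sketch below is illustrative only: the helper `check_row` and its tolerance logic are assumptions introduced here. Because every printed value is rounded to two decimals, the check allows for rounding rather than demanding exact equality.

```python
def check_row(beta, median, diff, mad, ratio, h=0.005):
    """Return True if a printed row is consistent with the caption.

    h is the half-width of the rounding error on a two-decimal value.
    Assumes mad > h so the interval bounds are well defined.
    """
    # Column 4 should equal column 2 minus column 3, up to rounding of all three.
    diff_ok = abs((beta - median) - diff) <= 3 * h + 1e-9
    # Column 6 should equal column 4 / column 5; bound it using rounded inputs.
    lo = (diff - h) / (mad + h) - h
    hi = (diff + h) / (mad - h) + h
    return diff_ok and lo <= ratio <= hi

# (stem, beta, median, difference, MAD, ratio) as printed in Table 7.
table7_rows = [
    ("enforc", 4.37, 1.66, 2.71, 0.75, 3.61),
    ("act",    4.47, 3.32, 1.15, 0.40, 2.84),
    ("crime",  4.75, 0.70, 4.05, 1.48, 2.74),
]
table7_ok = all(check_row(*row[1:]) for row in table7_rows)
```

For ‘act’, 1.15 / 0.40 = 2.875 rather than the printed 2.84; the gap is consistent with the ratio having been computed from unrounded values, which is why the check uses an interval.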

stem        β     median  difference  MAD   ratio
gun         4.76   0.14   4.62        1.07  4.31
tobacco     4.03   0.04   3.99        1.58  2.52
smoke       3.60  −0.53   4.13        1.13  3.66
kid         3.50   0.49   3.00        1.24  2.43
show        3.49   2.10   1.39        0.61  2.28
firearm     3.11  −1.72   4.83        1.42  3.40
crime       3.49   0.70   2.79        1.48  1.89
kill        3.01   0.83   2.18        0.62  3.52
law         4.27   3.15   1.11        0.73  1.53
school      3.75   2.14   1.61        0.98  1.64
juvenil     3.16  −1.01   4.17        1.83  2.28
children    4.19   2.50   1.69        1.16  1.46
enforc      3.12   1.66   1.46        0.75  1.94
young       3.25   1.34   1.91        1.05  1.81
violenc     3.28   0.42   2.86        1.66  1.72
crimin      3.02   0.24   2.78        1.48  1.88
dealer      2.66  −1.62   4.28        1.13  3.81
drug        3.64   1.78   1.86        1.36  1.36
youth       2.66  −0.61   3.26        1.23  2.66
cigarett    2.71  −2.09   4.80        2.15  2.23
advertis    2.57  −0.50   3.07        1.03  2.99
check       2.77   0.84   1.93        0.97  1.99
addict      2.56  −1.38   3.94        1.42  2.77
stop        2.79   1.68   1.11        0.59  1.87
violent     2.49  −1.44   3.93        1.11  3.55
prison      2.67  −0.05   2.73        1.35  2.03
polic       2.64   0.52   2.11        1.09  1.94
background  2.42  −0.16   2.58        0.84  3.07
compani     3.34   1.83   1.51        1.17  1.30
alcohol     2.42  −0.77   3.19        1.19  2.69
prevent     2.92   1.88   1.03        0.69  1.50
boi         2.37  −0.11   2.48        0.91  2.74
them        4.15   3.71   0.43        0.41  1.05
murder      2.22  −1.25   3.47        1.08  3.20
product     3.24   2.16   1.08        0.92  1.17
teenag      2.38  −1.15   3.52        1.77  2.00
drive       2.29   0.58   1.70        0.74  2.31
illeg       2.36  −0.03   2.39        1.18  2.02
industri    3.49   2.02   1.46        1.37  1.07
caus        2.46   1.84   0.62        0.36  1.73

Table 8: Key Words on ‘Child Protection’ (category includes tobacco, alcohol, school violence, sexual abuse). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem        β     median  difference  MAD   ratio
diseas      4.36   0.47   3.89        1.11  3.52
cancer      4.06  −0.14   4.20        1.22  3.45
research    4.36   1.78   2.59        1.10  2.35
health      4.89   2.54   2.35        1.07  2.20
prevent     3.85   1.88   1.96        0.69  2.85
patient     3.56   0.16   3.39        1.21  2.81
treatment   3.67   0.88   2.79        1.08  2.58
devic       3.21  −0.63   3.84        1.05  3.66
food        3.34   0.81   2.53        0.96  2.63
medic       3.89   1.23   2.66        1.31  2.03
fda         3.69  −1.28   4.97        2.38  2.09
aid         3.17   1.06   2.11        0.77  2.75
lead        2.97   1.69   1.29        0.38  3.38
breast      3.07  −1.37   4.43        1.66  2.67
caus        2.97   1.84   1.12        0.36  3.14
drug        4.04   1.78   2.26        1.36  1.66
effect      3.48   2.68   0.80        0.42  1.92
human       3.19   1.47   1.72        0.79  2.18
ill         2.86  −0.02   2.88        1.00  2.88
screen      2.72  −0.70   3.41        0.87  3.93
test        3.00   1.49   1.51        0.69  2.19
infect      2.69  −1.59   4.28        1.33  3.23
medicin     2.64  −0.42   3.07        0.85  3.59
tobacco     3.14   0.04   3.09        1.58  1.95
product     3.66   2.16   1.50        0.92  1.63
smoke       2.68  −0.53   3.21        1.13  2.85
treat       2.61   0.98   1.63        0.55  2.93
care        3.96   2.59   1.38        0.97  1.43
institut    2.91   1.75   1.16        0.55  2.12
cell        2.57  −0.93   3.50        1.21  2.89
act         3.89   3.32   0.56        0.40  1.40
heart       2.59   0.78   1.81        0.71  2.55
public      3.86   2.86   0.99        0.72  1.38
children    4.02   2.50   1.51        1.16  1.31
safeti      3.37   1.78   1.59        1.06  1.50
organ       3.25   2.21   1.04        0.68  1.53
nih         2.55  −2.02   4.57        1.91  2.39
advanc      2.49   1.30   1.19        0.51  2.35
women       3.77   2.25   1.52        1.20  1.27
studi       3.11   1.99   1.12        0.75  1.49

Table 9: Key Words on ‘Health 1 [Medical]’ (category emphasizes the medical / scientific aspects of health: particular diseases, prevention, research, regulation). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem        β     median  difference  MAD   ratio
care        5.29   2.59   2.71        0.97  2.81
health      5.17   2.54   2.63        1.07  2.46
act         4.42   3.32   1.10        0.40  2.73
home        4.13   2.59   1.54        0.58  2.67
hospit      3.87   0.51   3.37        1.11  3.03
support     4.36   3.74   0.62        0.26  2.36
children    4.86   2.50   2.36        1.16  2.04
educ        4.47   2.18   2.29        1.13  2.02
student     3.90   1.13   2.76        1.20  2.31
nurs        3.57  −0.77   4.33        1.73  2.51
medicar     4.15   0.21   3.94        1.97  2.00
train       3.56   1.71   1.85        0.84  2.19
disabl      3.61   0.98   2.62        1.24  2.12
coverag     3.21   0.04   3.16        1.10  2.89
access      3.75   1.85   1.90        0.96  1.98
patient     3.32   0.16   3.15        1.21  2.61
welfar      3.14   0.22   2.93        1.01  2.90
rural       3.54   0.97   2.56        1.28  2.01
school      3.97   2.14   1.83        0.98  1.86
qualiti     3.40   1.25   2.15        0.99  2.18
provid      4.95   3.62   1.33        0.81  1.65
afford      3.15   1.11   2.05        0.86  2.38
adopt       2.97   1.75   1.21        0.43  2.84
grant       3.47   1.79   1.68        0.87  1.94
medicaid    3.44  −0.64   4.08        2.11  1.93
colleg      3.43   1.03   2.40        1.25  1.91
program     5.10   3.41   1.69        1.11  1.53
beneficiari 2.89  −0.69   3.58        1.23  2.92
payment     3.55   0.58   2.98        1.64  1.82
worker      3.13   1.46   1.68        0.83  2.01
improv      3.71   2.28   1.43        0.85  1.67
senior      3.19   1.47   1.71        0.87  1.97
incom       3.18   0.59   2.59        1.33  1.94
skill       2.91   0.25   2.66        1.11  2.39
commun      4.03   2.88   1.15        0.73  1.58
need        4.61   3.90   0.71        0.48  1.48
institut    2.94   1.75   1.18        0.55  2.17
area        3.49   2.44   1.05        0.62  1.70
medic       3.45   1.23   2.22        1.31  1.69
introduc    3.55   2.08   1.47        0.89  1.64

Table 10: Key Words on ‘Social Welfare’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem        β     median  difference  MAD   ratio
school      5.90   2.14   3.76        0.98  3.84
teacher     4.94   0.00   4.93        1.37  3.61
educ        5.65   2.18   3.47        1.13  3.07
student     4.98   1.13   3.85        1.20  3.22
children    5.01   2.50   2.51        1.16  2.17
test        3.86   1.49   2.37        0.69  3.45
local       4.17   1.94   2.23        0.90  2.48
learn       3.65   1.50   2.14        0.47  4.61
district    4.29   1.03   3.27        1.47  2.23
class       3.67   0.94   2.73        0.83  3.29
account     3.77   2.12   1.65        0.63  2.63
classroom   3.58  −1.06   4.64        1.45  3.20
achiev      3.44   1.73   1.71        0.45  3.78
teach       3.40  −0.30   3.70        1.07  3.45
better      3.45   2.43   1.02        0.33  3.05
idea        3.39   1.61   1.78        0.55  3.27
size        3.26   0.50   2.76        0.61  4.55
kid         3.54   0.49   3.05        1.24  2.46
parent      4.14   1.18   2.96        1.62  1.82
grade       3.19  −0.70   3.89        0.76  5.10
public      4.15   2.86   1.29        0.72  1.79
monei       4.10   2.54   1.56        0.88  1.76
titl        3.40   0.85   2.55        1.09  2.34
read        3.53   2.06   1.47        0.68  2.16
best        3.24   2.30   0.94        0.33  2.86
special     3.37   1.76   1.61        0.65  2.47
everi       3.82   2.97   0.86        0.47  1.82
behind      3.07   1.20   1.88        0.42  4.43
need        4.64   3.90   0.75        0.48  1.55
improv      3.78   2.28   1.51        0.85  1.76
opportun    3.63   2.72   0.91        0.47  1.91
meet        3.27   2.42   0.85        0.37  2.30
level       3.49   2.27   1.22        0.65  1.89
differ      3.35   2.49   0.86        0.41  2.12
fail        2.98   1.71   1.27        0.41  3.07
math        2.84  −1.62   4.46        0.94  4.73
high        3.40   2.26   1.14        0.61  1.85
qualiti     3.24   1.25   1.99        0.99  2.02
resourc     3.52   1.90   1.62        0.94  1.72
elementari  3.03  −1.03   4.06        1.76  2.31

Table 11: Key Words on ‘Education’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem        β     median  difference  MAD   ratio
veteran     5.36   0.52   4.84        1.37  3.52
va          4.42  −0.82   5.24        1.45  3.62
forc        4.13   2.28   1.85        0.57  3.24
militari    4.99   1.52   3.47        1.60  2.17
care        4.77   2.59   2.19        0.97  2.27
reserv      3.87   1.64   2.23        0.83  2.71
serv        3.91   2.29   1.62        0.64  2.55
men         3.73   1.41   2.32        0.77  3.02
guard       3.79   0.28   3.51        1.32  2.67
member      4.18   3.15   1.03        0.51  2.00
activ       3.59   2.21   1.38        0.60  2.31
benefit     4.24   2.45   1.79        0.93  1.93
pai         4.01   2.59   1.43        0.71  2.00
personnel   3.66   0.48   3.18        1.42  2.24
duti        3.47   0.87   2.60        1.00  2.59
health      4.52   2.54   1.98        1.07  1.85
servic      4.81   3.16   1.65        0.97  1.70
our         5.41   4.88   0.54        0.33  1.62
defens      3.99   1.79   2.20        1.16  1.89
retire      3.26  −1.37   4.63        1.70  2.73
compens     3.24   0.38   2.86        1.02  2.81
arm         3.63   0.91   2.73        1.40  1.94
soldier     3.14  −0.48   3.62        1.40  2.59
disabl      3.49   0.98   2.51        1.24  2.02
medic       3.55   1.23   2.32        1.31  1.77
base        2.90   2.10   0.80        0.28  2.88
retir       3.37   0.83   2.54        1.35  1.87
uniform     2.75   0.33   2.42        0.64  3.77
recruit     2.93  −0.63   3.56        1.53  2.33
affair      3.05   0.54   2.51        1.20  2.09
armi        3.26   0.01   3.25        1.80  1.80
support     4.11   3.74   0.38        0.26  1.42
women       3.95   2.25   1.71        1.20  1.43
war         3.84   1.99   1.85        1.27  1.45
troop       2.89  −0.62   3.52        1.66  2.12
sacrific    2.61  −0.33   2.94        0.83  3.53
honor       2.78   0.87   1.91        0.90  2.12
readi       2.70   0.99   1.72        0.77  2.22
civilian    2.62  −0.56   3.17        1.38  2.29
combat      2.61   0.01   2.60        1.14  2.29

Table 12: Key Words on ‘Armed Forces 1 [Manpower]’ (includes veterans’ issues). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem        β     median  difference  MAD   ratio
appropri    4.22   2.67   1.55        0.69  2.24
defens      4.14   1.79   2.35        1.16  2.03
forc        3.63   2.28   1.35        0.57  2.36
report      3.92   3.05   0.87        0.44  1.99
request     3.53   1.65   1.87        0.79  2.38
confer      3.54   2.16   1.37        0.64  2.13
guard       3.19   0.28   2.91        1.32  2.21
depart      4.08   2.68   1.41        0.90  1.56
fund        4.78   3.03   1.75        1.18  1.49
project     3.55   1.67   1.88        1.12  1.69
homeland    3.04   0.02   3.03        1.16  2.61
base        3.00   2.10   0.90        0.28  3.26
emerg       3.17   1.72   1.45        0.82  1.77
critic      3.01   2.22   0.79        0.38  2.09
million     4.43   3.13   1.30        0.92  1.41
level       3.31   2.27   1.04        0.65  1.61
threat      2.82   0.88   1.94        0.79  2.47
coast       2.78  −0.17   2.95        1.34  2.20
air         3.30   1.21   2.09        1.36  1.54
respond     2.75   1.66   1.10        0.51  2.17
subcommitte 3.02   1.15   1.86        1.07  1.74
includ      3.97   3.27   0.70        0.51  1.37
fire        2.60   0.67   1.93        0.82  2.35
oper        3.46   2.18   1.28        0.91  1.42
contain     2.60   1.59   1.01        0.48  2.12
budget      4.10   2.21   1.89        1.46  1.30
equip       2.54   0.75   1.80        0.75  2.39
within      2.67   2.15   0.52        0.28  1.87
alloc       2.55   0.27   2.28        1.06  2.15
technologi  3.10   1.32   1.78        1.20  1.49
militari    3.63   1.52   2.11        1.60  1.32
aircraft    2.64  −0.66   3.30        1.86  1.77
attack      2.77   1.19   1.58        0.97  1.64
train       2.99   1.71   1.27        0.84  1.52
capabl      2.44   0.42   2.02        0.73  2.77
port        2.47  −0.65   3.12        1.48  2.10
author      3.95   2.70   1.25        1.00  1.26
fiscal      3.64   1.49   2.15        1.67  1.28
airport     2.44  −0.54   2.99        1.39  2.15
committe    4.21   3.46   0.75        0.62  1.22

Table 13: Key Words on ‘Armed Forces 2 [Infrastructure]’ (category includes military bases and civil defense aspects of homeland security). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem        β     median  difference  MAD   ratio
intellig    4.42   0.13   4.29        1.47  2.91
homeland    3.45   0.02   3.44        1.16  2.97
commiss     3.36   1.82   1.54        0.67  2.30
depart      4.06   2.68   1.38        0.90  1.53
agenc       3.71   1.72   2.00        1.25  1.60
director    3.12   0.96   2.16        0.74  2.91
secur       4.46   2.79   1.67        1.34  1.25
base        3.07   2.10   0.97        0.28  3.49
defens      3.55   1.79   1.75        1.16  1.51
respons     3.22   2.70   0.51        0.33  1.56
truck       2.79  −0.30   3.09        1.07  2.90
terrorist   3.13   0.63   2.50        1.59  1.57
border      2.80   0.48   2.32        1.02  2.27
inform      3.34   2.41   0.93        0.71  1.30
9/11        2.70  −1.39   4.09        1.34  3.06
airport     2.75  −0.54   3.30        1.39  2.38
recommend   2.81   1.30   1.51        0.73  2.06
attack      2.89   1.19   1.70        0.97  1.76
report      3.56   3.05   0.51        0.44  1.16
threat      2.71   0.88   1.83        0.79  2.32
11          2.67   1.27   1.40        0.56  2.51
airlin      2.75  −0.82   3.57        1.68  2.12
committe    4.11   3.46   0.66        0.62  1.06
septemb     2.79   1.44   1.35        0.78  1.73
close       2.78   2.28   0.50        0.31  1.63
respond     2.57   1.66   0.91        0.51  1.80
secretari   3.29   2.10   1.19        1.08  1.10
air         2.91   1.21   1.70        1.36  1.25
forc        2.97   2.28   0.69        0.57  1.20
terror      2.85   0.84   2.01        1.60  1.26
fbi         2.33  −0.87   3.20        1.58  2.02
within      2.55   2.15   0.40        0.28  1.43
investig    2.31   0.80   1.51        0.97  1.55
flight      2.13  −0.39   2.52        1.30  1.94
oversight   2.10   0.07   2.03        0.91  2.24
white       2.13   1.25   0.88        0.49  1.79
safeti      2.85   1.78   1.07        1.06  1.01
competit    2.50   0.96   1.54        1.33  1.16
cia         2.06  −2.21   4.27        2.02  2.11
passeng     2.12  −1.05   3.17        1.77  1.79

Table 14: Key Words on ‘Intelligence’ (category includes terrorism and homeland security). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem        β     median  difference  MAD   ratio
act         4.51   3.32   1.19        0.40  2.95
inform      4.16   2.41   1.75        0.71  2.45
enforc      3.55   1.66   1.89        0.75  2.52
record      3.40   2.61   0.79        0.30  2.65
law         4.52   3.15   1.36        0.73  1.87
court       3.84   1.65   2.18        1.14  1.91
section     3.40   1.65   1.75        0.85  2.07
crimin      3.34   0.24   3.11        1.48  2.10
internet    3.00   0.03   2.98        1.32  2.25
investig    2.98   0.80   2.18        0.97  2.24
electron    2.75  −0.70   3.45        1.14  3.03
protect     4.11   3.00   1.11        0.69  1.62
privaci     2.90  −1.18   4.08        1.98  2.06
fbi         2.75  −0.87   3.62        1.58  2.29
such        3.66   3.19   0.47        0.29  1.61
report      3.74   3.05   0.69        0.44  1.58
comput      2.70   0.30   2.40        1.10  2.18
attornei    2.80   0.50   2.30        1.20  1.92
case        3.36   2.55   0.81        0.50  1.63
ident       2.43  −0.15   2.58        0.68  3.79
applic      2.59   0.82   1.77        0.76  2.33
voter       2.43  −0.70   3.13        0.82  3.82
properti    2.73   0.67   2.06        1.08  1.91
govern      3.98   3.22   0.76        0.53  1.44
judiciari   2.62   0.03   2.59        1.24  2.09
immigr      3.09   0.60   2.49        1.53  1.63
appli       2.54   1.51   1.04        0.52  1.99
fraud       2.48  −0.70   3.18        1.55  2.05
bank        2.72   0.79   1.93        1.11  1.74
financi     2.85   1.63   1.23        0.74  1.66
or          4.89   4.53   0.36        0.28  1.27
victim      2.83   0.77   2.06        1.24  1.66
document    2.30   0.73   1.57        0.61  2.59
requir      3.97   3.05   0.92        0.71  1.30
agenc       3.44   1.72   1.72        1.25  1.38
congress    3.91   3.46   0.46        0.36  1.29
crime       2.97   0.70   2.28        1.48  1.54
border      2.43   0.48   1.95        1.02  1.90
justic      3.07   1.29   1.78        1.22  1.46
custom      2.27   0.09   2.18        1.02  2.13

Table 15: Key Words on ‘Law and Crime 2 [Federal]’ (category includes the FBI, Border Patrol, immigration, white collar crime). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem        β     median  difference  MAD   ratio
land        4.69   1.32   3.37        0.92  3.65
water       4.44   1.02   3.41        0.95  3.61
park        4.09  −0.12   4.21        1.11  3.79
act         4.39   3.32   1.07        0.40  2.65
river       3.60  −0.66   4.26        1.21  3.52
natur       3.45   1.37   2.08        0.58  3.61
wildlif     3.37  −1.57   4.94        1.30  3.81
area        4.01   2.44   1.57        0.62  2.54
conserv     3.78   0.65   3.14        1.16  2.71
forest      3.91  −0.49   4.40        1.80  2.44
fish        3.56  −0.70   4.26        1.57  2.71
tribe       3.47  −0.78   4.24        1.55  2.74
project     4.09   1.67   2.42        1.12  2.17
speci       3.28  −1.49   4.77        1.52  3.14
resourc     3.99   1.90   2.10        0.94  2.23
histor      3.08   1.02   2.06        0.57  3.61
restor      3.19   0.97   2.22        0.78  2.85
preserv     3.16   0.94   2.22        0.77  2.87
recreat     2.95  −1.30   4.25        1.09  3.88
lake        3.20  −0.78   3.98        1.52  2.61
establish   3.19   2.10   1.09        0.43  2.51
fisheri     2.99  −2.59   5.59        1.88  2.98
interior    3.13  −0.69   3.83        1.49  2.56
coastal     2.85  −2.42   5.28        1.42  3.73
environment 3.62   0.66   2.96        1.50  1.97
acr         2.99  −1.36   4.34        1.55  2.79
local       3.62   1.94   1.68        0.90  1.88
within      2.93   2.15   0.78        0.28  2.79
site        3.08   0.51   2.57        1.11  2.31
wilder      2.76  −2.40   5.16        1.44  3.59
habitat     2.81  −2.56   5.38        1.86  2.89
indian      3.58   0.48   3.11        1.74  1.79
design      3.02   1.82   1.20        0.53  2.26
trust       2.80   1.23   1.58        0.53  2.96
counti      3.10   1.18   1.92        0.93  2.07
develop     4.04   2.54   1.50        0.94  1.59
introduc    3.59   2.08   1.51        0.89  1.69
environ     2.77   0.98   1.79        0.68  2.63
corp        2.81  −0.46   3.27        1.34  2.45
clean       2.61   0.39   2.23        0.72  3.08

Table 16: Key Words on ‘Environment 1 [Public Lands]’ (category includes public lands and water management, resource conservation, and Native American affairs). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem        β     median  difference  MAD   ratio
small       4.34   2.09   2.25        0.68  3.31
busi        4.68   2.62   2.06        0.87  2.37
act         4.16   3.32   0.84        0.40  2.08
highwai     3.53   0.05   3.48        1.42  2.44
transport   3.75   1.13   2.62        1.44  1.82
internet    3.20   0.03   3.17        1.32  2.40
loan        3.24   0.42   2.82        1.36  2.08
credit      3.38   1.47   1.90        1.04  1.82
local       3.50   1.94   1.56        0.90  1.74
capit       2.74   0.96   1.78        0.65  2.71
market      3.42   1.45   1.97        1.23  1.61
bank        2.95   0.79   2.16        1.11  1.95
competit    3.24   0.96   2.28        1.33  1.72
section     3.14   1.65   1.49        0.85  1.76
compani     3.58   1.83   1.75        1.17  1.50
rural       3.10   0.97   2.13        1.28  1.66
carrier     2.54  −1.11   3.65        1.31  2.79
rail        2.54  −1.24   3.78        1.35  2.79
technologi  3.20   1.32   1.88        1.20  1.57
contract    2.91   0.92   1.99        1.15  1.74
transit     2.70   0.48   2.23        1.17  1.90
invest      3.27   1.46   1.81        1.25  1.45
project     3.28   1.67   1.61        1.12  1.45
road        2.38   0.81   1.57        0.45  3.45
area        3.31   2.44   0.87        0.62  1.41
revenu      2.65   0.77   1.88        1.06  1.78
commerci    2.53   0.20   2.33        1.18  1.98
sec         2.30  −1.53   3.82        1.31  2.92
electr      2.31  −0.31   2.62        1.02  2.57
economi     3.05   1.56   1.49        1.06  1.41
financi     2.75   1.63   1.13        0.74  1.52
satellit    2.33  −1.11   3.45        1.58  2.19
access      3.14   1.85   1.29        0.96  1.34
introduc    3.24   2.08   1.16        0.89  1.30
establish   2.74   2.10   0.64        0.43  1.47
commerc     2.63   0.87   1.76        1.11  1.59
fuel        2.20  −0.25   2.45        0.75  3.27
effici      2.46   0.81   1.64        0.93  1.77
regulatori  2.32  −0.26   2.58        1.30  1.99
growth      2.57   0.82   1.75        1.08  1.61

Table 17: Key Words on ‘Commerce / Transportation / Telecom’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem        β     median  difference  MAD   ratio
bankruptci  4.56  −0.20   4.76        1.51  3.15
bank        3.89   0.79   3.10        1.11  2.80
credit      4.00   1.47   2.53        1.04  2.42
case        3.86   2.55   1.31        0.50  2.64
ir          3.82  −0.68   4.50        1.87  2.40
compani     4.09   1.83   2.27        1.17  1.94
file        3.70   0.96   2.74        1.25  2.19
card        3.33  −0.07   3.40        0.95  3.58
financi     3.47   1.63   1.84        0.74  2.49
lawyer      3.35  −0.28   3.63        1.15  3.14
court       3.85   1.65   2.19        1.14  1.92
class       3.27   0.94   2.33        0.83  2.82
problem     3.75   3.10   0.64        0.36  1.78
debt        3.42   0.83   2.58        1.22  2.12
action      3.50   2.35   1.15        0.65  1.76
attornei    3.17   0.50   2.67        1.20  2.23
taxpay      3.48   1.40   2.07        1.17  1.77
account     3.32   2.12   1.20        0.63  1.91
chapter     2.89  −0.48   3.36        0.97  3.46
abus        3.12   0.83   2.29        1.13  2.03
small       3.32   2.09   1.23        0.68  1.81
pai         3.67   2.59   1.09        0.71  1.52
busi        3.88   2.62   1.26        0.87  1.45
fee         3.19   0.48   2.71        1.50  1.81
corpor      2.95   1.02   1.93        0.95  2.04
law         4.12   3.15   0.97        0.73  1.33
board       2.84   1.40   1.44        0.66  2.18
plaintiff   2.68  −1.93   4.60        1.83  2.52
lawsuit     2.77  −0.51   3.28        1.50  2.19
union       2.92   1.28   1.65        0.93  1.77
loan        2.89   0.42   2.47        1.36  1.82
litig       2.91  −0.18   3.08        1.78  1.73
damag       2.65   0.88   1.77        0.79  2.24
code        2.58   0.41   2.17        0.93  2.34
consum      3.49   1.33   2.16        1.65  1.31
claim       2.92   1.56   1.36        0.87  1.56
settlement  2.48  −0.41   2.89        1.16  2.50
interest    3.26   2.77   0.49        0.37  1.35
incom       2.72   0.59   2.13        1.33  1.60
audit       2.27  −1.11   3.38        1.04  3.24

Table 18: Key Words on ‘Banking and Finance’ (category includes corporate regulation, small business, tort reform, and bankruptcy). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem        β     median  difference  MAD   ratio
worker      4.78   1.46   3.33        0.83  3.99
social      4.11   1.35   2.76        0.81  3.41
retir       4.37   0.83   3.54        1.35  2.62
benefit     4.67   2.45   2.22        0.93  2.39
plan        4.43   2.47   1.96        0.81  2.42
act         4.28   3.32   0.96        0.40  2.37
employ      4.16   0.99   3.17        1.35  2.34
pension     4.12  −0.41   4.53        1.92  2.35
small       3.89   2.09   1.80        0.68  2.65
employe     4.43   1.30   3.14        1.42  2.20
incom       3.93   0.59   3.34        1.33  2.51
busi        4.40   2.62   1.78        0.87  2.05
job         4.09   2.55   1.55        0.69  2.24
earn        3.53   0.16   3.38        1.03  3.27
stock       3.40  −0.23   3.62        1.04  3.47
wage        3.37   0.28   3.09        0.92  3.35
pai         4.02   2.59   1.44        0.71  2.02
compani     4.09   1.83   2.27        1.17  1.94
contribut   3.31   1.57   1.73        0.55  3.17
tax         5.22   2.49   2.73        1.60  1.70
account     3.56   2.12   1.44        0.63  2.29
unemploy    3.15  −0.52   3.68        1.01  3.64
current     3.81   2.85   0.96        0.49  1.96
option      3.35   1.02   2.34        0.97  2.40
corpor      3.31   1.02   2.29        0.95  2.41
labor       3.28   1.34   1.94        0.79  2.45
save        3.68   2.03   1.65        0.88  1.87
insur       3.74   1.18   2.56        1.45  1.77
minimum     2.96   0.59   2.37        0.96  2.46
capit       2.85   0.96   1.89        0.65  2.89
overtim     2.82  −1.30   4.12        1.28  3.22
coverag     2.89   0.04   2.84        1.10  2.60
deduct      3.19  −0.70   3.89        1.95  1.99
economi     3.45   1.56   1.90        1.06  1.80
retire      2.85  −1.37   4.22        1.70  2.48
invest      3.56   1.46   2.10        1.25  1.69
code        2.78   0.41   2.38        0.93  2.56
introduc    3.54   2.08   1.46        0.89  1.64
senior      3.10   1.47   1.63        0.87  1.88
work        4.69   4.02   0.67        0.47  1.41

Table 19: Key Words on ‘Labor 1 [Workers]’ (emphasizes benefits and conditions of employment, including retirement / pensions). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem        β     median  difference  MAD   ratio
social      5.02   1.35   3.67        0.81  4.54
year        5.36   4.46   0.91        0.33  2.72
cut         4.84   1.66   3.18        0.97  3.29
budget      5.62   2.21   3.41        1.46  2.34
debt        4.57   0.83   3.74        1.22  3.06
spend       4.82   2.14   2.68        0.99  2.71
balanc      4.36   1.83   2.54        0.74  3.44
deficit     4.39   0.04   4.35        1.41  3.09
over        4.23   3.60   0.63        0.20  3.13
trust       3.93   1.23   2.71        0.53  5.07
surplu      4.20  −0.67   4.87        1.81  2.69
pai         4.26   2.59   1.68        0.71  2.35
monei       4.37   2.54   1.83        0.88  2.07
tax         5.32   2.49   2.83        1.60  1.77
trillion    3.78  −0.96   4.74        1.76  2.69
resolut     3.72   1.85   1.87        0.66  2.82
next        3.68   2.42   1.26        0.44  2.87
secur       5.13   2.79   2.34        1.34  1.75
medicar     4.13   0.21   3.91        1.97  1.99
economi     3.75   1.56   2.19        1.06  2.07
govern      4.18   3.22   0.95        0.53  1.81
futur       3.56   2.43   1.13        0.48  2.35
surplus     3.25  −2.07   5.33        1.99  2.68
interest    3.53   2.77   0.77        0.37  2.10
revenu      3.17   0.77   2.40        1.06  2.27
last        3.78   3.28   0.50        0.29  1.74
congress    4.04   3.46   0.59        0.36  1.64
billion     4.82   2.83   1.99        1.42  1.41
propos      3.92   2.84   1.07        0.67  1.60
rais        2.97   2.17   0.80        0.37  2.18
growth      2.97   0.82   2.16        1.08  1.99
plan        3.74   2.47   1.27        0.81  1.56
retir       3.23   0.83   2.39        1.35  1.77
fund        4.58   3.03   1.56        1.18  1.32
than        4.11   3.71   0.40        0.28  1.41
incom       3.03   0.59   2.44        1.33  1.83
put         3.74   3.00   0.74        0.50  1.48
bush        2.87   1.31   1.55        0.78  1.99
fiscal      3.87   1.49   2.38        1.67  1.42
dollar      3.18   1.92   1.27        0.79  1.60

Table 20: Key Words on ‘Debt / Deficit / Social Security’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem        β     median  difference  MAD   ratio
job         4.49   2.55   1.95        0.69  2.82
worker      3.87   1.46   2.42        0.83  2.90
pai         3.77   2.59   1.18        0.71  1.66
wage        3.36   0.28   3.08        0.92  3.34
economi     3.50   1.56   1.94        1.06  1.84
hour        3.10   1.87   1.22        0.44  2.81
compani     3.48   1.83   1.65        1.17  1.41
minimum     3.01   0.59   2.42        0.96  2.51
overtim     2.96  −1.30   4.26        1.28  3.33
unemploy    2.90  −0.52   3.42        1.01  3.38
work        4.53   4.02   0.51        0.47  1.07
peopl       4.77   4.08   0.69        0.65  1.05
cut         3.10   1.66   1.44        0.97  1.49
get         4.35   3.35   1.00        0.96  1.05
trade       3.14   1.45   1.70        1.25  1.36
monei       3.56   2.54   1.01        0.88  1.15
lost        2.59   1.46   1.14        0.56  2.03
labor       2.65   1.34   1.31        0.79  1.65
thei        5.40   4.86   0.53        0.61  0.88
tax         4.01   2.49   1.52        1.60  0.95
lose        2.43   1.36   1.06        0.63  1.69
week        3.08   2.57   0.51        0.46  1.11
countri     4.08   3.64   0.44        0.49  0.90
bush        2.44   1.31   1.13        0.78  1.44
employ      2.69   0.99   1.70        1.35  1.25
deficit     2.34   0.04   2.30        1.41  1.63
corpor      2.40   1.02   1.38        0.95  1.46
incom       2.44   0.59   1.85        1.33  1.39
rais        2.62   2.17   0.46        0.37  1.24
manufactur  2.53   0.71   1.82        1.47  1.24
them        4.06   3.71   0.35        0.41  0.84
welfar      2.15   0.22   1.94        1.01  1.92
paid        2.23   0.99   1.24        0.80  1.55
off         2.86   2.12   0.74        0.71  1.05
price       2.34   1.08   1.26        0.97  1.29
growth      2.29   0.82   1.47        1.08  1.36
don’t       3.80   2.98   0.83        1.00  0.83
kid         2.24   0.49   1.75        1.24  1.41
econom      3.26   2.01   1.25        1.38  0.91
busi        3.38   2.62   0.76        0.87  0.88

Table 21: Key Words on ‘Labor 2 [Employment]’ (category includes discussion of jobs, wages, and general state of the economy). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem        β     median  difference  MAD   ratio
tax         6.48   2.49   3.99        1.60  2.49
cut         4.74   1.66   3.08        0.97  3.19
incom       4.59   0.59   4.00        1.33  3.00
pai         4.60   2.59   2.02        0.71  2.83
estat       4.13  −0.70   4.82        1.28  3.77
over        4.22   3.60   0.62        0.20  3.09
relief      4.45   1.28   3.17        1.18  2.69
marriag     4.15  −0.75   4.89        1.55  3.17
than        4.40   3.71   0.69        0.28  2.46
penalti     4.08   0.58   3.50        1.17  2.98
credit      4.07   1.47   2.60        1.04  2.49
coupl       3.70   1.45   2.25        0.66  3.43
year        5.10   4.46   0.65        0.33  1.94
code        3.65   0.41   3.24        0.93  3.50
marri       3.60  −0.64   4.24        1.19  3.57
economi     3.98   1.56   2.42        1.06  2.29
capit       3.42   0.96   2.46        0.65  3.76
busi        4.37   2.62   1.75        0.87  2.02
american    4.78   3.76   1.01        0.54  1.86
repeal      3.40   0.07   3.33        1.02  3.27
more        4.82   4.11   0.71        0.39  1.82
small       3.67   2.09   1.59        0.68  2.33
earn        3.39   0.16   3.24        1.03  3.13
rate        4.10   1.66   2.44        1.21  2.02
taxpay      3.88   1.40   2.48        1.17  2.12
monei       4.23   2.54   1.68        0.88  1.90
gain        3.24   0.82   2.42        0.65  3.75
social      3.40   1.35   2.05        0.81  2.54
elimin      3.32   1.52   1.79        0.62  2.88
bracket     3.11  −2.92   6.03        1.34  4.51
make        4.62   3.96   0.66        0.38  1.71
rais        3.23   2.17   1.07        0.37  2.90
govern      4.18   3.22   0.95        0.53  1.81
revenu      3.30   0.77   2.53        1.06  2.40
percent     4.79   3.05   1.75        1.08  1.62
give        3.82   2.85   0.96        0.52  1.87
less        3.25   2.08   1.17        0.49  2.38
propos      4.05   2.84   1.20        0.67  1.79
burden      3.01   0.69   2.32        0.64  3.61
spend       3.94   2.14   1.80        0.99  1.82

Table 22: Key Words on ‘Taxes’ (emphasizes individual taxation, including income and estate taxes). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem        β     median  difference  MAD   ratio
energi      5.44   1.24   4.20        1.27  3.31
fuel        4.29  −0.25   4.54        0.75  6.05
ga          4.29   0.11   4.18        1.19  3.51
oil         5.00   0.25   4.76        1.83  2.60
price       4.29   1.08   3.21        0.97  3.29
produc      3.86   1.49   2.37        0.62  3.81
electr      3.76  −0.31   4.07        1.02  3.99
renew       3.76   0.27   3.49        0.92  3.79
natur       3.54   1.37   2.17        0.58  3.77
suppli      3.63   0.68   2.95        0.89  3.30
depend      3.45   1.06   2.39        0.67  3.58
gasolin     3.48  −1.39   4.86        1.55  3.14
power       3.60   1.96   1.65        0.62  2.67
sourc       3.33   0.89   2.44        0.67  3.62
product     4.01   2.16   1.85        0.92  2.01
barrel      3.24  −1.99   5.23        1.46  3.58
clean       3.16   0.39   2.77        0.72  3.83
polici      3.84   2.67   1.17        0.60  1.93
drill       3.15  −1.56   4.71        1.49  3.15
standard    3.53   2.01   1.51        0.72  2.10
effici      3.25   0.81   2.44        0.93  2.63
vehicl      3.15   0.36   2.79        0.94  2.95
area        3.61   2.44   1.17        0.62  1.90
market      3.70   1.45   2.25        1.23  1.83
coal        2.96  −1.38   4.34        1.22  3.56
gallon      2.96  −2.48   5.44        1.77  3.07
climat      2.82  −1.16   3.98        0.81  4.90
economi     3.50   1.56   1.94        1.06  1.83
technologi  3.48   1.32   2.17        1.20  1.81
wind        2.73  −0.55   3.28        0.73  4.48
emiss       3.08  −2.84   5.92        2.52  2.35
car         2.85  −0.11   2.96        1.00  2.97
environ     2.87   0.98   1.89        0.68  2.78
domest      2.91   0.77   2.14        0.84  2.57
demand      2.75   1.26   1.48        0.43  3.47
import      4.01   3.58   0.44        0.31  1.39
util        2.82   0.52   2.30        0.89  2.58
alaska      3.09   0.37   2.72        1.39  1.96
our         5.28   4.88   0.41        0.33  1.22
environment 3.28   0.66   2.61        1.50  1.74

Table 23: Key Words on ‘Energy’ (category includes energy supply and prices, environmental effects). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem               β  median  difference    MAD   ratio
wast            4.21    0.59        3.62   0.78    4.65
land            3.85    1.32        2.53   0.92    2.74
water           3.75    1.02        2.73   0.95    2.88
site            3.68    0.51        3.18   1.11    2.86
forest          3.83   −0.49        4.32   1.80    2.39
nuclear         4.11    0.33        3.78   2.06    1.83
fire            3.26    0.67        2.59   0.82    3.16
mine            3.17    0.40        2.77   0.52    5.31
environment     3.67    0.66        3.01   1.50    2.00
road            3.06    0.81        2.25   0.45    4.94
fuel            3.04   −0.25        3.29   0.75    4.39
river           3.05   −0.66        3.71   1.21    3.07
area            3.56    2.44        1.12   0.62    1.82
mountain        2.91   −0.71        3.62   0.75    4.80
nevada          3.18    0.76        2.42   1.11    2.17
energi          3.50    1.24        2.26   1.27    1.78
clean           2.92    0.39        2.53   0.72    3.51
park            2.93   −0.12        3.05   1.11    2.75
epa             2.94   −1.51        4.45   1.79    2.48
move            3.26    2.65        0.61   0.33    1.83
environ         2.85    0.98        1.87   0.68    2.74
highwai         3.01    0.05        2.96   1.42    2.08
storag          2.80   −1.84        4.63   1.63    2.84
spent           2.76    1.56        1.19   0.53    2.26
interior        2.71   −0.69        3.40   1.49    2.28
acr             2.61   −1.36        3.97   1.55    2.55
governor        2.80    1.11        1.70   0.87    1.95
standard        3.08    2.01        1.07   0.72    1.48
transport       3.19    1.13        2.06   1.44    1.43
project         3.21    1.67        1.54   1.12    1.38
pollut          2.40   −1.18        3.58   1.03    3.49
wildlif         2.40   −1.57        3.97   1.30    3.06
power           2.89    1.96        0.93   0.62    1.52
natur           2.54    1.37        1.16   0.58    2.02
safe            2.57    1.11        1.46   0.77    1.90
fish            2.48   −0.70        3.17   1.57    2.02
public          3.68    2.86        0.81   0.72    1.13
speci           2.31   −1.49        3.80   1.52    2.50
tree            2.35   −0.86        3.21   1.42    2.26
alaska          2.62    0.37        2.24   1.39    1.62

Table 24: Key Words on ‘Environment 2’ (category includes pollution, wildlife protection). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem               β  median  difference    MAD   ratio
farmer          5.42    0.74        4.68   1.43    3.28
price           4.69    1.08        3.60   0.97    3.70
produc          4.45    1.49        2.96   0.62    4.76
farm            5.23    0.84        4.39   1.71    2.56
crop            4.20   −0.90        5.10   1.44    3.54
agricultur      4.73    0.70        4.03   1.78    2.26
disast          3.89    0.13        3.76   1.22    3.07
compact         3.62   −1.63        5.24   1.20    4.35
food            3.76    0.81        2.94   0.96    3.06
market          4.22    1.45        2.78   1.23    2.26
product         3.97    2.16        1.82   0.92    1.97
payment         3.79    0.58        3.21   1.64    1.96
commod          3.15   −1.04        4.19   1.10    3.81
loan            3.38    0.42        2.96   1.36    2.18
wheat           3.05   −2.16        5.21   1.13    4.61
incom           3.38    0.59        2.79   1.33    2.09
drought         3.13   −1.91        5.04   1.75    2.87
rural           3.43    0.97        2.46   1.28    1.92
livestock       3.03   −1.88        4.91   1.75    2.81
loss            3.08    1.26        1.83   0.76    2.41
rancher         2.87   −2.01        4.88   1.50    3.24
usda            3.00   −1.80        4.80   1.91    2.52
part            3.31    2.89        0.42   0.23    1.84
north           3.22    1.75        1.47   0.83    1.76
confer          3.27    2.16        1.11   0.64    1.72
emerg           3.18    1.72        1.46   0.82    1.79
grain           2.66   −2.26        4.92   1.08    4.54
dakota          3.21    1.36        1.85   1.06    1.75
tobacco         3.05    0.04        3.01   1.58    1.90
polici          3.55    2.67        0.87   0.60    1.44
land            3.03    1.32        1.71   0.92    1.85
conserv         2.93    0.65        2.28   1.16    1.98
trade           3.33    1.45        1.89   1.25    1.51
year            4.85    4.46        0.40   0.33    1.20
assist          3.83    2.61        1.23   0.93    1.32
minnesota       2.85    0.91        1.94   1.02    1.90
northeast       2.61   −1.47        4.08   1.54    2.64
flood           2.63   −0.15        2.78   1.13    2.46
countri         4.23    3.64        0.59   0.49    1.21
net             2.71   −0.06        2.77   1.31    2.11

Table 25: Key Words on ‘Agriculture’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem               β  median  difference    MAD   ratio
trade           5.76    1.45        4.32   1.25    3.46
agreement       4.81    2.10        2.71   0.66    4.10
china           4.71   −0.10        4.81   1.52    3.16
negoti          4.12    1.23        2.89   0.90    3.20
import          4.35    3.58        0.77   0.31    2.45
countri         4.70    3.64        1.06   0.49    2.17
worker          3.86    1.46        2.40   0.83    2.88
unit            4.46    3.17        1.29   0.60    2.17
world           4.13    2.35        1.78   0.79    2.25
free            3.52    1.49        2.03   0.62    3.30
market          4.04    1.45        2.59   1.23    2.11
u               4.91    4.13        0.78   0.44    1.79
job             3.99    2.55        1.45   0.69    2.10
labor           3.55    1.34        2.21   0.79    2.79
steel           3.45   −0.82        4.27   1.48    2.88
export          3.82   −0.06        3.88   1.95    1.99
product         3.91    2.16        1.75   0.92    1.90
tariff          3.36   −2.25        5.61   2.00    2.81
produc          3.33    1.49        1.84   0.62    2.96
wto             3.43   −2.51        5.94   2.47    2.40
foreign         3.60    1.70        1.90   0.94    2.03
open            3.14    1.82        1.32   0.41    3.18
chines          3.27   −1.37        4.64   1.78    2.61
economi         3.62    1.56        2.06   1.06    1.95
relat           3.07    1.82        1.25   0.41    3.04
deficit         3.27    0.04        3.23   1.41    2.29
intern          3.51    1.88        1.62   0.88    1.85
polici          3.69    2.67        1.02   0.60    1.68
sanction        3.03   −0.24        3.27   1.30    2.51
track           2.84    0.59        2.24   0.55    4.11
fast            2.80   −0.26        3.06   0.58    5.25
our             5.29    4.88        0.42   0.33    1.25
japan           2.78   −0.52        3.30   1.26    2.61
africa          2.68   −0.87        3.55   1.16    3.06
global          2.84    0.71        2.13   0.96    2.22
manufactur      3.16    0.71        2.45   1.47    1.67
farmer          3.14    0.74        2.40   1.43    1.68
interest        3.34    2.77        0.58   0.37    1.58
econom          3.85    2.01        1.84   1.38    1.33
relationship    2.61    0.74        1.87   0.68    2.76

Table 26: Key Words on ‘Foreign Trade’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem               β  median  difference    MAD   ratio
mr              5.51    4.26        1.24   0.28    4.39
consent         4.36    2.46        1.89   0.26    7.28
unanim          4.32    2.53        1.80   0.23    7.83
order           4.14    2.67        1.47   0.33    4.52
move            4.12    2.65        1.47   0.33    4.42
senat           6.10    4.95        1.15   0.56    2.04
ask             4.78    3.48        1.29   0.60    2.16
amend           5.88    4.19        1.70   0.88    1.93
presid          5.60    4.67        0.94   0.48    1.94
quorum          3.45    1.33        2.12   0.41    5.13
tabl            3.47    1.19        2.29   0.76    2.99
thank           3.89    2.85        1.04   0.47    2.22
object          3.53    1.89        1.64   0.60    2.74
agre            3.61    2.40        1.21   0.51    2.36
from            5.46    4.91        0.55   0.33    1.64
offer           3.77    2.73        1.04   0.50    2.07
reconsid        3.12   −0.11        3.24   0.69    4.68
motion          3.40    0.91        2.49   1.01    2.47
minut           4.05    2.19        1.87   1.07    1.74
request         3.41    1.65        1.75   0.79    2.23
lai             3.01    0.57        2.44   0.44    5.59
vote            4.69    3.29        1.40   0.89    1.57
chair           3.06    1.82        1.24   0.45    2.74
desk            3.06    0.89        2.18   0.82    2.66
understand      3.52    2.74        0.78   0.42    1.85
appropri        3.82    2.67        1.14   0.69    1.66
further         2.97    2.06        0.91   0.28    3.25
consider        2.95    1.78        1.17   0.37    3.17
colleagu        3.99    3.50        0.49   0.31    1.58
absenc          2.83    0.96        1.87   0.35    5.38
behalf          2.84    1.09        1.75   0.47    3.72
madam           3.02    1.82        1.20   0.51    2.35
send            3.03    1.86        1.17   0.54    2.18
correct         2.90    1.66        1.24   0.49    2.53
adopt           2.88    1.75        1.13   0.43    2.64
accept          2.79    1.41        1.38   0.48    2.88
yea             2.78   −0.22        3.00   1.01    2.98
which           4.47    4.19        0.29   0.22    1.33
urg             2.88    2.11        0.77   0.34    2.29
nai             2.75   −0.24        3.00   1.05    2.86

Table 27: Key Words on ‘Procedure 3 [Legislation 1]’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem               β  median  difference    MAD   ratio
leader          4.44    2.30        2.14   0.60    3.58
major           4.35    2.67        1.68   0.36    4.72
am              4.51    3.80        0.71   0.28    2.59
senat           6.10    4.95        1.15   0.56    2.06
move            3.98    2.65        1.33   0.33    4.02
issu            4.49    3.58        0.91   0.40    2.29
hope            4.00    2.90        1.10   0.37    2.94
week            3.88    2.57        1.31   0.46    2.85
done            3.99    2.72        1.27   0.52    2.43
to              7.24    6.84        0.40   0.23    1.74
been            4.78    4.22        0.56   0.28    1.97
consent         3.53    2.46        1.07   0.26    4.11
forward         3.47    2.23        1.24   0.26    4.81
unanim          3.49    2.53        0.97   0.23    4.21
other           4.52    4.09        0.43   0.22    1.98
object          3.60    1.89        1.71   0.60    2.85
vote            4.93    3.29        1.64   0.89    1.83
offer           3.90    2.73        1.17   0.50    2.32
on              6.21    5.63        0.58   0.34    1.73
appropri        4.12    2.67        1.45   0.69    2.09
work            4.85    4.02        0.83   0.47    1.75
time            4.90    4.19        0.71   0.41    1.73
understand      3.70    2.74        0.96   0.42    2.27
confer          3.64    2.16        1.47   0.64    2.29
bring           3.29    2.37        0.92   0.27    3.40
thi             6.22    5.80        0.42   0.25    1.64
agreement       3.54    2.10        1.44   0.66    2.18
pass            4.11    3.09        1.02   0.57    1.80
can             4.83    4.13        0.70   0.43    1.62
complet         3.12    1.87        1.25   0.44    2.85
clotur          3.46    0.11        3.35   1.54    2.17
bill            5.62    4.62        0.99   0.64    1.55
import          4.12    3.58        0.54   0.31    1.73
hour            3.09    1.87        1.22   0.44    2.80
abl             3.46    2.67        0.79   0.37    2.13
togeth          3.01    2.15        0.86   0.33    2.60
agre            3.42    2.40        1.02   0.51    2.00
proce           2.88    0.90        1.98   0.68    2.91
get             4.80    3.35        1.45   0.96    1.51
colleagu        4.02    3.50        0.52   0.31    1.67

Table 28: Key Words on ‘Procedure 4 [Legislation 2]’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem               β  median  difference    MAD   ratio
senior          5.57    1.47        4.10   0.87    4.71
drug            5.99    1.78        4.21   1.36    3.09
prescript       5.44   −0.17        5.61   1.51    3.71
medicar         5.77    0.21        5.56   1.97    2.82
coverag         4.67    0.04        4.62   1.10    4.22
benefit         5.07    2.45        2.62   0.93    2.81
plan            4.90    2.47        2.43   0.81    3.00
price           4.32    1.08        3.24   0.97    3.33
beneficiari     4.13   −0.69        4.82   1.23    3.93
pai             4.49    2.59        1.91   0.71    2.67
health          4.95    2.54        2.40   1.07    2.25
care            4.82    2.59        2.24   0.97    2.32
cost            4.95    2.81        2.14   0.95    2.24
citizen         3.97    2.10        1.87   0.65    2.89
part            3.75    2.89        0.86   0.23    3.71
premium         3.78   −1.27        5.05   1.70    2.98
afford          3.73    1.11        2.63   0.86    3.05
choic           3.63    1.25        2.38   0.62    3.86
over            4.07    3.60        0.48   0.20    2.37
privat          4.04    1.97        2.08   0.86    2.41
than            4.32    3.71        0.61   0.28    2.16
same            3.76    2.82        0.95   0.37    2.58
cover           3.45    1.39        2.07   0.54    3.85
medicin         3.37   −0.42        3.79   0.85    4.44
pharmaceut      3.33   −1.12        4.45   1.10    4.06
which           4.59    4.19        0.41   0.22    1.88
hospit          3.53    0.51        3.03   1.11    2.72
insur           4.08    1.18        2.90   1.45    2.01
thi             6.23    5.80        0.42   0.25    1.67
doctor          3.31    0.02        3.29   1.12    2.95
to              7.21    6.84        0.37   0.23    1.62
under           4.13    3.40        0.73   0.40    1.84
compani         4.03    1.83        2.20   1.17    1.89
gener           3.84    2.90        0.94   0.47    1.98
better          3.30    2.43        0.87   0.33    2.61
incom           3.52    0.59        2.93   1.33    2.20
abl             3.48    2.67        0.81   0.37    2.19
differ          3.40    2.49        0.92   0.41    2.26
hmo             3.42   −2.18        5.60   2.52    2.22
patient         3.23    0.16        3.07   1.21    2.54

Table 29: Key Words on ‘Health 2 [Economics - Seniors]’ (category includes Medicare, prescription drug coverage). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem               β  median  difference    MAD   ratio
patient         5.26    0.16        5.10   1.21    4.22
care            5.60    2.59        3.01   0.97    3.12
doctor          4.59    0.02        4.57   1.12    4.09
health          5.55    2.54        3.01   1.07    2.82
insur           5.19    1.18        4.01   1.45    2.77
medic           4.90    1.23        3.67   1.31    2.80
plan            4.63    2.47        2.16   0.81    2.66
coverag         3.96    0.04        3.92   1.10    3.58
decis           3.84    2.21        1.63   0.46    3.55
right           4.71    3.45        1.26   0.54    2.35
hmo             4.34   −2.18        6.52   2.52    2.58
cover           3.68    1.39        2.29   0.54    4.27
or              5.11    4.53        0.58   0.28    2.03
physician       3.64   −0.98        4.62   1.21    3.82
premium         3.77   −1.27        5.04   1.70    2.97
protect         4.49    3.00        1.50   0.69    2.17
employ          4.12    0.99        3.13   1.35    2.31
access          4.02    1.85        2.17   0.96    2.26
hospit          3.62    0.51        3.11   1.11    2.80
trial           3.57   −0.30        3.87   1.36    2.85
liabil          3.64    0.20        3.44   1.33    2.59
emerg           3.68    1.72        1.96   0.82    2.40
cancer          3.42   −0.14        3.56   1.22    2.93
treatment       3.61    0.88        2.73   1.08    2.52
an              5.07    4.65        0.41   0.24    1.70
case            3.69    2.55        1.14   0.50    2.29
room            3.29    0.58        2.71   0.74    3.64
issu            4.34    3.58        0.76   0.40    1.91
specialist      3.21   −1.46        4.67   1.13    4.13
appeal          3.23    0.27        2.97   0.83    3.59
problem         3.84    3.10        0.73   0.36    2.03
compani         4.05    1.83        2.23   1.17    1.91
clinic          3.40   −0.65        4.05   1.56    2.60
deni            3.20    0.91        2.29   0.71    3.24
cost            4.43    2.81        1.62   0.95    1.70
damag           3.20    0.88        2.33   0.79    2.93
best            3.22    2.30        0.92   0.33    2.78
differ          3.39    2.49        0.90   0.41    2.22
lawsuit         3.23   −0.51        3.75   1.50    2.50
treat           2.94    0.98        1.95   0.55    3.52

Table 30: Key Words on ‘Health 3 [Economics - General]’ (category includes provision, access, costs). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem               β  median  difference    MAD   ratio
iraq            4.93    0.24        4.69   2.01    2.33
forc            4.15    2.28        1.86   0.57    3.27
resolut         3.82    1.85        1.97   0.66    2.98
unit            4.47    3.17        1.30   0.60    2.18
saddam          3.96   −1.86        5.82   2.43    2.40
troop           3.72   −0.62        4.34   1.66    2.62
war             4.55    1.99        2.56   1.27    2.01
world           4.03    2.35        1.69   0.79    2.13
threat          3.26    0.88        2.38   0.79    3.03
hussein         3.66   −1.86        5.52   2.52    2.19
iraqi           3.66   −1.52        5.18   2.40    2.16
attack          3.38    1.19        2.19   0.97    2.27
peac            3.05   −0.02        3.07   0.98    3.12
destruct        3.00   −0.28        3.28   1.02    3.21
alli            3.00   −0.37        3.36   1.17    2.88
mass            2.94   −0.11        3.05   0.96    3.19
un              3.08   −1.04        4.12   1.62    2.54
bush            3.09    1.31        1.77   0.78    2.27
weapon          3.82    0.73        3.09   1.84    1.68
intellig        3.15    0.13        3.02   1.47    2.05
nato            3.20   −2.71        5.91   2.99    1.98
militari        3.99    1.52        2.47   1.60    1.55
regim           2.57   −0.80        3.37   1.07    3.15
power           3.12    1.96        1.16   0.62    1.89
reconstruct     2.65   −1.08        3.72   1.42    2.63
our             5.33    4.88        0.45   0.33    1.36
terrorist       3.26    0.63        2.63   1.59    1.66
kosovo          3.04   −1.79        4.83   2.62    1.85
ground          2.48    0.89        1.60   0.55    2.90
bosnia          2.57   −2.28        4.85   2.12    2.29
administr       3.89    3.00        0.89   0.62    1.43
kill            2.48    0.83        1.65   0.62    2.67
council         2.53    0.58        1.95   0.84    2.33
men             2.85    1.41        1.44   0.77    1.87
against         3.67    2.69        0.97   0.68    1.44
situat          2.66    1.92        0.74   0.37    1.99
soldier         2.56   −0.48        3.04   1.40    2.17
intern          3.22    1.88        1.33   0.88    1.52
action          3.31    2.35        0.96   0.65    1.47
terror          3.21    0.84        2.37   1.60    1.48

Table 31: Key Words on ‘Defense [Use of Force]’ (includes wars and other military interventions in Iraq, Afghanistan, Bosnia, Kosovo). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem               β  median  difference    MAD   ratio
unit            4.59    3.17        1.42   0.60    2.38
human           3.75    1.47        2.27   0.79    2.88
peac            3.58   −0.02        3.60   0.98    3.66
nato            4.22   −2.71        6.93   2.99    2.32
china           3.69   −0.10        3.78   1.52    2.48
forc            3.70    2.28        1.41   0.57    2.48
intern          3.83    1.88        1.94   0.88    2.22
democraci       3.41    0.16        3.25   1.27    2.57
resolut         3.46    1.85        1.60   0.66    2.43
europ           3.17   −0.50        3.67   1.14    3.23
relat           2.96    1.82        1.14   0.41    2.77
russia          2.89   −0.78        3.68   1.23    2.99
world           3.85    2.35        1.50   0.79    1.91
freedom         3.21    1.11        2.10   0.93    2.26
allianc         2.82   −0.73        3.56   0.75    4.75
govern          4.17    3.22        0.94   0.53    1.79
chines          2.89   −1.37        4.25   1.78    2.39
foreign         3.45    1.70        1.75   0.94    1.87
european        2.67   −0.45        3.12   1.06    2.94
alli            2.69   −0.37        3.06   1.17    2.62
remain          2.79    2.11        0.69   0.30    2.29
israel          2.80   −1.40        4.20   1.96    2.14
threat          2.67    0.88        1.79   0.79    2.27
russian         2.52   −1.22        3.75   1.46    2.57
conflict        2.45    0.44        2.01   0.73    2.74
religi          2.79   −0.23        3.02   1.55    1.95
polit           3.36    2.08        1.29   0.80    1.61
free            2.72    1.49        1.23   0.62    2.00
un              2.55   −1.04        3.59   1.62    2.21
republ          2.55   −0.56        3.11   1.42    2.19
regim           2.35   −0.80        3.15   1.07    2.94
polici          3.56    2.67        0.88   0.60    1.46
engag           2.34    1.15        1.19   0.47    2.55
minist          2.40   −1.34        3.74   1.66    2.25
east            2.68    0.02        2.66   1.46    1.83
kosovo          2.76   −1.79        4.55   2.62    1.74
soviet          2.31   −1.17        3.48   1.30    2.67
must            3.74    3.07        0.67   0.49    1.36
traffick        2.28   −1.93        4.21   1.53    2.75
effort          3.42    2.89        0.54   0.39    1.39

Table 32: Key Words on ‘International Affairs [Diplomacy / Human Rights]’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem               β  median  difference    MAD   ratio
test            4.61    1.49        3.12   0.69    4.54
treati          5.22    0.16        5.06   1.72    2.93
threat          4.16    0.88        3.28   0.79    4.17
weapon          5.34    0.73        4.61   1.84    2.51
russia          3.98   −0.78        4.76   1.23    3.87
nuclear         5.35    0.33        5.02   2.06    2.44
defens          4.74    1.79        2.94   1.16    2.54
unit            4.71    3.17        1.54   0.60    2.58
missil          5.00   −0.86        5.86   2.39    2.45
chemic          4.24    0.11        4.12   1.65    2.49
capabl          3.56    0.42        3.14   0.73    4.30
china           3.96   −0.10        4.06   1.52    2.66
russian         3.61   −1.22        4.83   1.46    3.31
convent         3.57   −0.03        3.59   1.04    3.44
prolifer        3.49   −1.49        4.98   1.23    4.04
korea           3.52   −0.73        4.25   1.45    2.94
world           4.09    2.35        1.74   0.79    2.20
destruct        3.28   −0.28        3.55   1.02    3.48
ratifi          3.24   −1.22        4.46   1.17    3.83
north           3.73    1.75        1.98   0.83    2.37
control         3.79    2.28        1.51   0.70    2.16
forc            3.62    2.28        1.33   0.57    2.34
arm             3.91    0.91        3.00   1.40    2.14
agreement       3.56    2.10        1.46   0.66    2.21
relat           3.04    1.82        1.22   0.41    2.96
sanction        3.16   −0.24        3.39   1.30    2.60
ratif           3.02   −1.83        4.85   1.68    2.88
sign            2.95    1.66        1.29   0.36    3.56
mass            2.97   −0.11        3.08   0.96    3.22
possibl         2.89    1.97        0.91   0.20    4.58
iran            3.36   −1.74        5.10   2.24    2.28
administr       4.18    3.00        1.18   0.62    1.90
ban             3.19    0.19        3.00   1.22    2.45
technologi      3.72    1.32        2.41   1.20    2.01
strateg         3.11   −0.83        3.94   1.60    2.46
deploi          3.19   −1.15        4.33   1.91    2.26
stockpil        3.16   −2.43        5.59   2.49    2.25
ballist         3.25   −2.51        5.77   2.70    2.14
soviet          2.81   −1.17        3.98   1.30    3.05
india           2.75   −1.61        4.36   1.33    3.27

Table 33: Key Words on ‘International Affairs [Arms Control]’ (category includes discussion of treaties, nonproliferation, WMDs). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem               β  median  difference    MAD   ratio
serv            4.96    2.29        2.67   0.64    4.20
hi              6.01    3.55        2.46   0.72    3.42
career          3.82   −0.04        3.87   0.70    5.55
dedic           3.53    0.56        2.97   0.48    6.13
john            3.48    0.71        2.77   0.61    4.53
posit           3.46    2.10        1.37   0.35    3.86
honor           3.73    0.87        2.86   0.90    3.17
nomin           3.80    0.35        3.44   1.12    3.08
dure            3.78    2.61        1.17   0.39    3.02
miss            2.96    0.53        2.43   0.41    5.97
mr              4.95    4.26        0.69   0.28    2.42
staff           3.70    1.74        1.96   0.71    2.75
becam           2.95    0.57        2.38   0.47    5.07
wish            3.15    1.50        1.65   0.48    3.43
repres          3.51    2.49        1.03   0.36    2.85
judg            4.43    0.79        3.64   1.48    2.46
great           3.78    2.68        1.10   0.42    2.65
ha              5.43    4.75        0.68   0.29    2.31
graduat         3.16   −0.20        3.35   1.07    3.12
experi          3.09    1.34        1.75   0.56    3.15
tribut          2.89   −0.88        3.77   0.96    3.93
him             4.46    2.32        2.13   0.93    2.29
join            3.33    2.06        1.27   0.47    2.71
accomplish      2.78    1.19        1.60   0.36    4.45
director        3.13    0.96        2.16   0.74    2.92
began           2.72    0.85        1.88   0.31    6.11
attornei        3.49    0.50        2.99   1.20    2.50
confirm         3.05    0.42        2.63   0.89    2.95
known           2.75    1.23        1.52   0.37    4.08
bench           2.72   −2.79        5.51   1.32    4.17
proud           2.78    1.30        1.48   0.41    3.65
univers         3.70    1.33        2.37   1.06    2.24
best            3.14    2.30        0.84   0.33    2.56
bar             2.60   −0.04        2.64   0.60    4.38
man             3.22    0.98        2.25   0.96    2.34
appoint         2.84    0.32        2.53   0.93    2.72
wife            2.83   −0.06        2.89   1.05    2.76
degre           2.57    0.67        1.90   0.54    3.53
outstand        2.95    0.35        2.59   1.03    2.51
court           3.98    1.65        2.33   1.14    2.04

Table 34: Key Words on ‘Symbolic [Tribute - Living]’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem               β  median  difference    MAD   ratio
recogn          3.90    2.22        1.69   0.34    5.00
dedic           3.78    0.56        3.22   0.48    6.64
honor           4.11    0.87        3.24   0.90    3.59
serv            4.41    2.29        2.13   0.64    3.35
insert          3.87    0.20        3.67   1.05    3.48
contribut       3.66    1.57        2.08   0.55    3.81
celebr          3.38   −0.38        3.76   0.56    6.67
congratul       3.36    0.09        3.27   0.48    6.80
career          3.36   −0.04        3.40   0.70    4.88
achiev          3.40    1.73        1.67   0.45    3.69
accomplish      3.19    1.19        2.00   0.36    5.58
proud           3.12    1.30        1.82   0.41    4.49
hi              5.41    3.55        1.86   0.72    2.58
wish            3.21    1.50        1.71   0.48    3.57
dr              3.84    0.45        3.39   1.19    2.84
commun          4.78    2.88        1.90   0.73    2.61
tribut          3.02   −0.88        3.90   0.96    4.07
school          4.69    2.14        2.55   0.98    2.60
student         4.27    1.13        3.14   1.20    2.63
director        3.22    0.96        2.26   0.74    3.05
success         3.58    1.77        1.80   0.65    2.79
univers         4.06    1.33        2.73   1.06    2.58
excel           2.92    0.33        2.60   0.69    3.78
great           3.76    2.68        1.08   0.42    2.60
award           4.10    0.33        3.76   1.53    2.47
began           2.74    0.85        1.89   0.31    6.16
becam           2.74    0.57        2.17   0.47    4.62
john            2.85    0.71        2.13   0.61    3.49
learn           2.96    1.50        1.45   0.47    3.12
outstand        3.25    0.35        2.89   1.03    2.80
throughout      3.29    1.54        1.75   0.63    2.76
board           3.24    1.40        1.85   0.66    2.79
commend         3.00    1.34        1.66   0.55    2.99
mr              4.91    4.26        0.65   0.28    2.28
end             3.98    2.71        1.28   0.55    2.33
graduat         2.95   −0.20        3.15   1.07    2.93
join            3.28    2.06        1.22   0.47    2.61
m               2.99    0.32        2.67   0.93    2.86
church          2.96   −0.50        3.46   1.21    2.87
leadership      3.57    2.10        1.47   0.61    2.39

Table 35: Key Words on ‘Symbolic [Tribute - Constituency]’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem               β  median  difference    MAD   ratio
honor           4.11    0.87        3.24   0.90    3.59
men             4.01    1.41        2.60   0.77    3.37
sacrific        3.43   −0.33        3.76   0.83    4.52
memori          3.58    0.16        3.42   1.11    3.07
dedic           3.03    0.56        2.46   0.48    5.09
freedom         3.75    1.11        2.64   0.93    2.83
di              3.16   −0.17        3.33   0.90    3.72
kill            3.15    0.83        2.32   0.62    3.75
serv            3.99    2.29        1.71   0.64    2.68
soldier         3.56   −0.48        4.04   1.40    2.89
courag          3.09   −0.09        3.18   0.90    3.54
hero            2.94   −1.19        4.12   0.95    4.34
god             3.05   −0.23        3.28   0.98    3.34
love            3.34    0.68        2.66   0.93    2.87
proud           2.93    1.30        1.63   0.41    4.04
forc            3.75    2.28        1.47   0.57    2.58
hi              5.20    3.55        1.65   0.72    2.29
rememb          3.31    1.37        1.94   0.69    2.82
battl           2.78    0.34        2.43   0.46    5.27
great           3.72    2.68        1.04   0.42    2.50
celebr          2.70   −0.38        3.08   0.56    5.46
world           4.16    2.35        1.82   0.79    2.30
lost            3.07    1.46        1.61   0.56    2.87
tribut          2.76   −0.88        3.64   0.96    3.79
histori         3.45    1.78        1.67   0.68    2.45
brave           2.91   −1.15        4.06   1.36    2.99
peac            2.91   −0.02        2.93   0.98    2.98
son             2.77   −0.06        2.83   0.88    3.21
remind          2.58    0.90        1.68   0.41    4.05
spirit          2.52    0.25        2.26   0.52    4.38
john            2.67    0.71        1.95   0.61    3.19
fire            2.90    0.67        2.23   0.82    2.72
flag            2.96   −0.66        3.63   1.39    2.61
enemi           2.55   −0.95        3.49   1.00    3.48
join            3.18    2.06        1.11   0.47    2.38
bless           2.41   −0.84        3.25   0.66    4.90
dai             4.30    3.17        1.13   0.54    2.08
11              2.80    1.27        1.53   0.56    2.74
father          2.91    0.50        2.41   0.92    2.62
heart           2.77    0.78        1.99   0.71    2.80

Table 36: Key Words on ‘Symbolic [Remembrance - Military]’ (category includes tributes to other public servants and discussion of the WWII Memorial). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem               β  median  difference    MAD   ratio
great           4.24    2.68        1.57   0.42    3.76
hi              5.71    3.55        2.16   0.72    3.01
paul            3.61   −0.15        3.76   0.71    5.30
john            3.49    0.71        2.77   0.61    4.54
alwai           3.75    1.66        2.09   0.64    3.25
man             3.84    0.98        2.87   0.96    2.99
reagan          3.37   −0.12        3.49   0.98    3.57
him             4.72    2.32        2.40   0.93    2.57
serv            4.05    2.29        1.76   0.64    2.77
love            3.46    0.68        2.78   0.93    3.00
miss            2.87    0.53        2.33   0.41    5.73
never           3.47    2.10        1.37   0.49    2.81
rememb          3.32    1.37        1.95   0.69    2.84
life            3.95    2.02        1.93   0.82    2.36
thurmond        2.94   −1.34        4.28   1.33    3.21
proud           2.73    1.30        1.43   0.41    3.54
honor           3.23    0.87        2.36   0.90    2.61
remark          2.98    1.39        1.59   0.55    2.89
wa              5.78    4.64        1.14   0.58    1.98
father          3.00    0.50        2.51   0.92    2.72
career          2.61   −0.04        2.65   0.70    3.81
wife            2.92   −0.06        2.99   1.05    2.85
thank           3.84    2.85        0.99   0.47    2.12
he              6.09    4.13        1.96   1.05    1.87
my              5.05    4.47        0.58   0.31    1.88
togeth          2.98    2.15        0.83   0.33    2.50
friend          3.88    2.35        1.53   0.76    2.02
knew            2.89    0.47        2.41   0.92    2.61
tribut          2.47   −0.88        3.35   0.96    3.49
wish            2.79    1.50        1.29   0.48    2.68
old             2.49    1.27        1.22   0.37    3.28
chafe           2.55   −0.97        3.52   1.12    3.14
god             2.63   −0.23        2.86   0.98    2.91
had             4.67    3.77        0.91   0.49    1.84
dedic           2.36    0.56        1.80   0.48    3.71
ronald          2.59   −0.92        3.52   1.20    2.93
becam           2.33    0.57        1.76   0.47    3.75
bob             2.30   −0.33        2.63   0.63    4.15
long            3.10    2.55        0.54   0.25    2.17
best            3.02    2.30        0.73   0.33    2.20

Table 37: Key Words on ‘Symbolic [Remembrance - Nonmilitary]’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem               β  median  difference    MAD   ratio
team            5.96    0.24        5.72   0.78    7.29
game            5.44    0.39        5.05   0.74    6.82
plai            4.86    1.45        3.41   0.37    9.31
player          4.92   −1.18        6.10   0.75    8.09
win             4.79    0.55        4.24   0.57    7.41
fan             4.43   −2.17        6.61   0.80    8.25
basebal         4.77   −1.99        6.76   1.27    5.33
congratul       4.30    0.09        4.21   0.48    8.77
record          4.50    2.61        1.89   0.30    6.34
victori         4.25   −0.51        4.75   0.46   10.37
great           4.80    2.68        2.13   0.42    5.10
proud           4.22    1.30        2.92   0.41    7.21
footbal         4.56   −1.56        6.12   1.20    5.11
accomplish      4.09    1.19        2.90   0.36    8.09
won             4.24   −0.28        4.52   0.78    5.78
champion        3.98   −0.70        4.67   0.56    8.29
athlet          5.13   −1.81        6.94   1.60    4.35
sport           4.95   −1.37        6.31   1.43    4.42
best            4.15    2.30        1.85   0.33    5.61
career          3.99   −0.04        4.03   0.70    5.79
and             7.38    6.63        0.75   0.19    3.86
seri            3.75    0.44        3.31   0.27   12.12
season          4.82   −0.65        5.46   1.29    4.24
of              7.37    6.74        0.62   0.17    3.75
leagu           4.68   −0.98        5.65   1.34    4.22
achiev          4.03    1.73        2.30   0.45    5.09
dedic           3.70    0.56        3.14   0.48    6.49
four            3.62    1.60        2.02   0.29    6.98
recogn          3.87    2.22        1.66   0.34    4.92
final           3.94    2.43        1.51   0.32    4.71
in              7.13    6.37        0.76   0.22    3.48
led             3.59    0.73        2.86   0.44    6.49
most            4.40    3.28        1.12   0.28    3.98
year            5.63    4.46        1.17   0.33    3.52
three           3.71    2.24        1.47   0.31    4.82
field           3.93    1.10        2.83   0.65    4.33
hall            3.43   −0.70        4.12   0.69    5.95
over            4.36    3.60        0.76   0.20    3.78
celebr          3.39   −0.38        3.76   0.56    6.68
hi              5.95    3.55        2.39   0.72    3.33

Table 38: Key Words on ‘Symbolic [Congratulations - Sports]’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

stem               β  median  difference    MAD   ratio
hundr          12.73    1.17       11.56   0.32   36.22
at             11.90    4.55        7.35   0.25   29.04
four           11.22    1.60        9.62   0.29   33.23
three          10.86    2.24        8.62   0.31   28.20
ago            11.45    2.40        9.04   0.40   22.80
of             10.93    6.74        4.19   0.17   25.16
year           11.70    4.46        7.24   0.33   21.72
five           11.73    1.37       10.36   0.50   20.82
two            10.90    2.96        7.94   0.35   22.88
the            12.10    7.48        4.62   0.24   19.38
seven          10.55    0.41       10.15   0.39   25.82
and            10.87    6.63        4.24   0.19   21.77
past           10.19    2.18        8.01   0.28   28.83
which          10.19    4.19        6.01   0.22   27.76
close          10.19    2.28        7.91   0.31   25.80
reflect        10.19    1.33        8.86   0.34   26.25
nine           10.51   −0.14       10.65   0.54   19.91
mr             10.20    4.26        5.94   0.28   20.96
six            10.78    0.90        9.87   0.50   19.55
thousand       10.83    1.63        9.20   0.50   18.42
on             11.23    5.63        5.60   0.34   16.70
than            9.94    3.71        6.22   0.28   22.20
eight          10.43    0.20       10.24   0.53   19.20
june            9.75    0.85        8.90   0.37   23.80
dure           10.20    2.61        7.59   0.39   19.55
fifteen        10.00   −1.61       11.61   0.59   19.60
mai             9.73    3.14        6.59   0.30   21.72
march           9.66    1.00        8.66   0.44   19.77
ten            10.10    0.77        9.33   0.53   17.49
april           9.52    0.79        8.73   0.49   17.64
stood          11.70   −0.11       11.81   0.95   12.42
juli            9.76    0.97        8.79   0.55   15.94
25              9.83    1.12        8.71   0.56   15.58
1998            9.25    1.58        7.67   0.48   15.82
more            9.93    4.11        5.82   0.39   14.89
februari        9.07    0.42        8.65   0.54   16.12
15              9.56    1.45        8.10   0.54   15.10
8               8.41    1.07        7.34   0.37   19.62
thirti          8.52   −1.14        9.65   0.53   18.15
a              10.23    6.57        3.65   0.30   12.32

Table 39: Key Words on ‘Debt / Deficit [Jesse Helms]’ (almost daily deficit / debt ‘boxscore’ speeches by Helms). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

| stem | β | median | difference | MAD | ratio |
|---|---|---|---|---|---|
| of | 9.78 | 6.74 | 3.04 | 0.17 | 18.25 |
| and | 9.71 | 6.63 | 3.08 | 0.19 | 15.79 |
| in | 9.58 | 6.37 | 3.21 | 0.22 | 14.76 |
| chang | 8.82 | 2.94 | 5.87 | 0.35 | 16.56 |
| by | 8.65 | 4.86 | 3.78 | 0.22 | 17.12 |
| to | 10.04 | 6.84 | 3.20 | 0.23 | 13.95 |
| a | 10.24 | 6.57 | 3.67 | 0.30 | 12.36 |
| act | 8.97 | 3.32 | 5.64 | 0.40 | 14.00 |
| with | 8.61 | 5.09 | 3.52 | 0.25 | 14.14 |
| the | 10.25 | 7.48 | 2.78 | 0.24 | 11.65 |
| hate | 9.46 | −0.47 | 9.93 | 0.84 | 11.89 |
| well | 8.28 | 3.26 | 5.02 | 0.31 | 16.22 |
| becom | 8.28 | 2.22 | 6.06 | 0.27 | 22.14 |
| believ | 8.99 | 3.35 | 5.65 | 0.48 | 11.68 |
| current | 8.82 | 2.85 | 5.97 | 0.49 | 12.25 |
| categori | 8.27 | 0.33 | 7.95 | 0.43 | 18.68 |
| add | 8.27 | 1.52 | 6.76 | 0.42 | 16.03 |
| new | 8.35 | 3.87 | 4.48 | 0.36 | 12.28 |
| can | 8.97 | 4.13 | 4.84 | 0.43 | 11.20 |
| them | 8.38 | 3.71 | 4.67 | 0.41 | 11.36 |
| describ | 8.00 | 1.16 | 6.84 | 0.42 | 16.14 |
| send | 8.28 | 1.86 | 6.42 | 0.54 | 11.92 |
| would | 8.84 | 4.59 | 4.24 | 0.40 | 10.58 |
| first | 8.30 | 3.28 | 5.02 | 0.45 | 11.20 |
| mind | 8.27 | 1.34 | 6.94 | 0.52 | 13.40 |
| like | 8.02 | 3.37 | 4.65 | 0.32 | 14.38 |
| mr | 8.23 | 4.26 | 3.97 | 0.28 | 13.99 |
| occur | 8.04 | 1.78 | 6.26 | 0.45 | 14.04 |
| substanc | 8.27 | −0.09 | 8.37 | 0.63 | 13.21 |
| todai | 8.25 | 3.66 | 4.59 | 0.35 | 13.28 |
| unaccept | 8.27 | −0.19 | 8.47 | 0.68 | 12.37 |
| march | 7.57 | 1.00 | 6.57 | 0.44 | 15.01 |
| thi | 8.49 | 5.80 | 2.69 | 0.25 | 10.56 |
| enforc | 8.97 | 1.66 | 7.31 | 0.75 | 9.73 |
| last | 7.49 | 3.28 | 4.21 | 0.29 | 14.74 |
| mai | 7.51 | 3.14 | 4.37 | 0.30 | 14.41 |
| ani | 8.28 | 3.54 | 4.74 | 0.44 | 10.71 |
| signal | 8.27 | −0.15 | 8.42 | 0.73 | 11.58 |
| our | 8.30 | 4.88 | 3.42 | 0.33 | 10.25 |
| two | 7.57 | 2.96 | 4.60 | 0.35 | 13.27 |

Table 40: Key Words on ‘Hate Crime [Gordon Smith]’ (almost daily speeches on hate crimes by Smith). Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).
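The column definitions used in these key-word tables can be sketched in code. What follows is a minimal illustration under stated assumptions, not the authors' implementation: `beta` is a hypothetical word-by-topic matrix of log-odds values, the median absolute deviation is left unscaled (the caption does not say whether a consistency constant is applied), and ranks are assumed to run in descending order so that rank 1 is the largest value. The function name `key_word_table` and its arguments are illustrative only.

```python
import numpy as np


def key_word_table(beta, topic, stems, top_n=40):
    """Build the six table columns for one topic.

    beta  : (n_words, n_topics) array of log-odds values (hypothetical input)
    topic : column index of the topic of interest
    stems : word stems, one per row of beta
    """
    beta = np.asarray(beta, dtype=float)
    own = beta[:, topic]                      # column 2: beta on this topic
    others = np.delete(beta, topic, axis=1)   # beta on all other topics
    med = np.median(others, axis=1)           # column 3: median over other topics
    diff = own - med                          # column 4: column 2 minus column 3
    # Column 5: unscaled median absolute deviation over the other topics
    # (assumption: no consistency constant such as 1.4826 is applied).
    mad = np.median(np.abs(others - med[:, None]), axis=1)
    ratio = diff / mad                        # column 6 (undefined if mad == 0)

    # Assumption: descending ranks, so rank 1 is the largest value.
    rank_beta = np.argsort(np.argsort(-own)) + 1
    rank_ratio = np.argsort(np.argsort(-ratio)) + 1
    order = np.argsort(rank_beta + rank_ratio, kind="stable")[:top_n]

    return [
        (stems[i], float(own[i]), float(med[i]), float(diff[i]),
         float(mad[i]), float(ratio[i]))
        for i in order
    ]


# Toy example with three stems and four topics (made-up numbers).
toy_beta = [
    [5.0, 1.0, 2.0, 3.0],
    [2.0, 1.0, 2.0, 4.0],
    [9.0, 0.0, 1.0, 2.0],
]
rows = key_word_table(toy_beta, topic=0, stems=["a", "b", "c"])
for row in rows:
    print(row)
```

A word rises to the top of such a table only if it is both frequent on the topic (large β) and unusually frequent relative to its spread across the other topics (large ratio), which is why common function words and topic-specific stems can both appear.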

| stem | β | median | difference | MAD | ratio |
|---|---|---|---|---|---|
| order | 10.03 | 2.67 | 7.36 | 0.33 | 22.62 |
| without | 9.85 | 2.73 | 7.13 | 0.29 | 24.69 |
| the | 11.17 | 7.48 | 3.69 | 0.24 | 15.48 |
| from | 9.91 | 4.91 | 5.00 | 0.33 | 14.99 |
| object | 10.01 | 1.89 | 8.12 | 0.60 | 13.54 |
| recogn | 8.52 | 2.22 | 6.30 | 0.34 | 18.72 |
| so | 9.82 | 4.26 | 5.57 | 0.42 | 13.18 |
| second | 8.13 | 2.22 | 5.91 | 0.35 | 17.08 |
| call | 8.72 | 2.93 | 5.79 | 0.42 | 13.71 |
| clerk | 9.28 | −0.47 | 9.75 | 0.82 | 11.88 |
| roll | 8.68 | 0.35 | 8.33 | 0.64 | 13.03 |
| mr | 8.26 | 4.26 | 4.00 | 0.28 | 14.10 |
| will | 9.38 | 5.03 | 4.35 | 0.39 | 11.19 |
| report | 8.43 | 3.05 | 5.39 | 0.44 | 12.34 |
| previou | 7.68 | 0.86 | 6.82 | 0.44 | 15.33 |
| suffici | 7.75 | 0.68 | 7.07 | 0.48 | 14.69 |
| to | 9.32 | 6.84 | 2.47 | 0.23 | 10.79 |
| and | 8.74 | 6.63 | 2.12 | 0.19 | 10.86 |
| ha | 8.17 | 4.75 | 3.41 | 0.29 | 11.61 |
| of | 8.57 | 6.74 | 1.83 | 0.17 | 10.98 |
| agre | 8.20 | 2.40 | 5.80 | 0.51 | 11.31 |
| under | 7.92 | 3.40 | 4.51 | 0.40 | 11.42 |
| time | 8.51 | 4.19 | 4.32 | 0.41 | 10.53 |
| senat | 10.18 | 4.95 | 5.23 | 0.56 | 9.33 |
| third | 6.82 | 1.47 | 5.36 | 0.35 | 15.41 |
| absent | 6.80 | −0.70 | 7.50 | 0.44 | 17.07 |
| announc | 7.10 | 0.91 | 6.18 | 0.44 | 14.03 |
| chair | 7.35 | 1.82 | 5.53 | 0.45 | 12.19 |
| by | 7.46 | 4.86 | 2.60 | 0.22 | 11.74 |
| no | 8.39 | 3.94 | 4.44 | 0.45 | 9.81 |
| further | 6.61 | 2.06 | 4.55 | 0.28 | 16.31 |
| other | 6.82 | 4.09 | 2.73 | 0.22 | 12.52 |
| major | 6.86 | 2.67 | 4.19 | 0.36 | 11.73 |
| on | 8.65 | 5.63 | 3.02 | 0.34 | 9.00 |
| desir | 6.64 | 0.76 | 5.88 | 0.41 | 14.42 |
| i | 10.52 | 6.83 | 3.69 | 0.45 | 8.20 |
| remain | 6.63 | 2.11 | 4.53 | 0.30 | 15.13 |
| new | 7.58 | 3.87 | 3.72 | 0.36 | 10.19 |
| there | 8.84 | 4.51 | 4.33 | 0.50 | 8.64 |
| been | 7.15 | 4.22 | 2.93 | 0.28 | 10.33 |

Table 41: Key Words on ‘Procedure 1 [Housekeeping 1]’. Column 1 gives the word stem generated by the Porter2 stemmer, column 2 gives the β value for that word stem on this topic (on a log odds scale with a sum to 0 constraint), column 3 gives the median β value for this word across all topics other than the current topic, column 4 gives the difference between columns 2 and 3, column 5 gives the median absolute deviation of the β values for this word across all topics other than the current topic, and column 6 gives the ratio of column 4 to column 5. Rows have been sorted on rank(β) + rank(ratio).

| stem | β | median | difference | MAD | ratio |
|---|---|---|---|---|---|
| consent | 10.51 | 2.46 | 8.05 | 0.26 | 30.91 |
| unanim | 10.47 | 2.53 | 7.94 | 0.23 | 34.61 |
| the | 11.95 | 7.48 | 4.48 | 0.24 | 18.77 |
| of | 10.46 | 6.74 | 3.72 | 0.17 | 22.30 |
| mr | 10.44 | 4.26 | 6.18 | 0.28 | 21.79 |
| to | 10.95 | 6.84 | 4.10 | 0.23 | 17.92 |
| order | 9.57 | 2.67 | 6.90 | 0.33 | 21.22 |
| further | 9.13 | 2.06 | 7.07 | 0.28 | 25.35 |
| and | 10.23 | 6.63 | 3.60 | 0.19 | 18.47 |
| consider | 9.18 | 1.78 | 7.41 | 0.37 | 20.06 |
| be | 10.80 | 5.34 | 5.46 | 0.37 | 14.72 |
| quorum | 9.05 | 1.33 | 7.72 | 0.41 | 18.67 |
| in | 9.84 | 6.37 | 3.47 | 0.22 | 15.98 |
| upon | 8.76 | 1.97 | 6.79 | 0.31 | 21.98 |
| immedi | 8.84 | 1.50 | 7.33 | 0.40 | 18.54 |
| at | 8.97 | 4.55 | 4.42 | 0.25 | 17.48 |
| record | 8.76 | 2.61 | 6.14 | 0.30 | 20.62 |
| laid | 8.80 | 0.15 | 8.66 | 0.47 | 18.50 |
| with | 9.02 | 5.09 | 3.93 | 0.25 | 15.81 |
| adjourn | 8.58 | −0.75 | 9.33 | 0.53 | 17.56 |
| until | 8.78 | 2.00 | 6.78 | 0.44 | 15.40 |
| follow | 8.73 | 2.14 | 6.59 | 0.42 | 15.63 |
| presid | 10.56 | 4.67 | 5.90 | 0.48 | 12.21 |
| calendar | 8.47 | −0.82 | 9.29 | 0.53 | 17.38 |
| call | 8.87 | 2.93 | 5.94 | 0.42 | 14.07 |
| previou | 8.52 | 0.86 | 7.66 | 0.44 | 17.22 |
| relat | 8.56 | 1.82 | 6.75 | 0.41 | 16.41 |
| for | 10.08 | 5.82 | 4.25 | 0.34 | 12.38 |
| under | 8.88 | 3.40 | 5.47 | 0.40 | 13.85 |
| ask | 10.61 | 3.48 | 7.13 | 0.60 | 11.88 |
| hour | 8.59 | 1.87 | 6.72 | 0.44 | 15.44 |
| am | 8.43 | 3.80 | 4.63 | 0.28 | 16.80 |
| third | 8.01 | 1.47 | 6.54 | 0.35 | 18.82 |
| period | 8.39 | 1.85 | 6.53 | 0.40 | 16.15 |
| proce | 9.17 | 0.90 | 8.27 | 0.68 | 12.19 |
| reconsid | 8.87 | −0.11 | 8.98 | 0.69 | 12.98 |
| rescind | 8.65 | 0.41 | 8.24 | 0.59 | 13.90 |
| absenc | 7.77 | 0.96 | 6.81 | 0.35 | 19.62 |
| time | 9.11 | 4.19 | 4.92 | 0.41 | 11.99 |
| agre | 8.79 | 2.40 | 6.39 | 0.51 | 12.46 |


| stem | β | median | difference | MAD | ratio |
|---|---|---|---|---|---|
| mr | 7.43 | 4.26 | 3.17 | 0.28 | 11.18 |
| consent | 6.78 | 2.46 | 4.32 | 0.26 | 16.59 |
| unanim | 6.64 | 2.53 | 4.12 | 0.23 | 17.93 |
| of | 8.17 | 6.74 | 1.42 | 0.17 | 8.55 |
| to | 8.76 | 6.84 | 1.91 | 0.23 | 8.35 |
| at | 6.83 | 4.55 | 2.28 | 0.25 | 9.02 |
| order | 6.25 | 2.67 | 3.58 | 0.33 | 11.00 |
| the | 9.31 | 7.48 | 1.83 | 0.24 | 7.68 |
| consider | 5.91 | 1.78 | 4.14 | 0.37 | 11.21 |
| follow | 6.11 | 2.14 | 3.97 | 0.42 | 9.41 |
| will | 7.84 | 5.03 | 2.81 | 0.39 | 7.23 |
| am | 6.12 | 3.80 | 2.32 | 0.28 | 8.42 |
| be | 7.93 | 5.34 | 2.60 | 0.37 | 7.00 |
| on | 7.94 | 5.63 | 2.30 | 0.34 | 6.87 |
| and | 7.93 | 6.63 | 1.30 | 0.19 | 6.69 |
| hour | 5.72 | 1.87 | 3.84 | 0.44 | 8.83 |
| occur | 5.70 | 1.78 | 3.92 | 0.45 | 8.79 |
| quorum | 5.53 | 1.33 | 4.20 | 0.41 | 10.15 |
| begin | 5.53 | 2.18 | 3.34 | 0.34 | 9.82 |
| expect | 5.51 | 1.79 | 3.72 | 0.36 | 10.34 |
| session | 5.64 | 0.90 | 4.74 | 0.53 | 8.87 |
| with | 6.82 | 5.09 | 1.73 | 0.25 | 6.95 |
| further | 5.46 | 2.06 | 3.40 | 0.28 | 12.19 |
| move | 5.60 | 2.65 | 2.95 | 0.33 | 8.89 |
| pm | 5.51 | −1.10 | 6.61 | 0.70 | 9.46 |
| time | 6.92 | 4.19 | 2.74 | 0.41 | 6.66 |
| major | 5.65 | 2.67 | 2.98 | 0.36 | 8.34 |
| hope | 5.76 | 2.90 | 2.85 | 0.37 | 7.64 |
| complet | 5.55 | 1.87 | 3.68 | 0.44 | 8.38 |
| mai | 5.61 | 3.14 | 2.47 | 0.30 | 8.15 |
| tomorrow | 5.91 | 0.91 | 5.00 | 0.70 | 7.11 |
| other | 5.72 | 4.09 | 1.63 | 0.22 | 7.46 |
| announc | 5.30 | 0.91 | 4.39 | 0.44 | 9.96 |
| proce | 5.77 | 0.90 | 4.87 | 0.68 | 7.17 |
| todai | 6.02 | 3.66 | 2.36 | 0.35 | 6.82 |
| resum | 5.29 | −1.27 | 6.56 | 0.70 | 9.43 |
| senat | 8.22 | 4.95 | 3.27 | 0.56 | 5.84 |
| leader | 6.21 | 2.30 | 3.91 | 0.60 | 6.53 |
| from | 6.94 | 4.91 | 2.03 | 0.33 | 6.08 |
| until | 5.49 | 2.00 | 3.49 | 0.44 | 7.92 |


| stem | β | median | difference | MAD | ratio |
|---|---|---|---|---|---|
| of | 9.40 | 6.74 | 2.66 | 0.17 | 15.96 |
| mr | 8.22 | 4.26 | 3.95 | 0.28 | 13.95 |
| consent | 7.90 | 2.46 | 5.44 | 0.26 | 20.89 |
| unanim | 7.90 | 2.53 | 5.37 | 0.23 | 23.39 |
| and | 8.91 | 6.63 | 2.29 | 0.19 | 11.73 |
| at | 8.12 | 4.55 | 3.58 | 0.25 | 14.13 |
| meet | 7.96 | 2.42 | 5.54 | 0.37 | 15.00 |
| on | 9.24 | 5.63 | 3.61 | 0.34 | 10.78 |
| the | 10.00 | 7.48 | 2.52 | 0.24 | 10.58 |
| am | 7.53 | 3.80 | 3.73 | 0.28 | 13.52 |
| to | 9.25 | 6.84 | 2.40 | 0.23 | 10.49 |
| session | 7.51 | 0.90 | 6.62 | 0.53 | 12.37 |
| dure | 7.35 | 2.61 | 4.74 | 0.39 | 12.20 |
| pm | 6.94 | −1.10 | 8.04 | 0.70 | 11.50 |
| hear | 8.05 | 2.41 | 5.64 | 0.55 | 10.33 |
| record | 6.72 | 2.61 | 4.11 | 0.30 | 13.79 |
| thursdai | 6.89 | −0.72 | 7.61 | 0.71 | 10.65 |
| hold | 6.48 | 1.78 | 4.70 | 0.33 | 14.41 |
| begin | 6.58 | 2.18 | 4.40 | 0.34 | 12.93 |
| which | 6.62 | 4.19 | 2.44 | 0.22 | 11.27 |
| announc | 6.39 | 0.91 | 5.48 | 0.44 | 12.44 |
| mai | 6.52 | 3.14 | 3.38 | 0.30 | 11.14 |
| june | 6.18 | 0.85 | 5.33 | 0.37 | 14.25 |
| by | 6.94 | 4.86 | 2.08 | 0.22 | 9.40 |
| march | 6.29 | 1.00 | 5.29 | 0.44 | 12.08 |
| be | 8.39 | 5.34 | 3.05 | 0.37 | 8.24 |
| in | 8.21 | 6.37 | 1.84 | 0.22 | 8.45 |
| testimoni | 6.70 | 0.27 | 6.44 | 0.65 | 9.96 |
| committe | 8.36 | 3.46 | 4.90 | 0.62 | 7.89 |
| appear | 6.32 | 1.11 | 5.21 | 0.50 | 10.50 |
| tuesdai | 6.60 | −0.60 | 7.20 | 0.75 | 9.60 |
| wednesdai | 6.89 | −1.00 | 7.89 | 0.89 | 8.88 |
| act | 6.89 | 3.32 | 3.56 | 0.40 | 8.84 |
| dc | 6.44 | 0.40 | 6.04 | 0.61 | 9.97 |
| purpos | 6.85 | 1.80 | 5.05 | 0.59 | 8.63 |
| further | 5.93 | 2.06 | 3.87 | 0.28 | 13.88 |
| natur | 6.60 | 1.37 | 5.22 | 0.58 | 9.08 |
| presid | 8.19 | 4.67 | 3.52 | 0.48 | 7.29 |
| room | 6.77 | 0.58 | 6.19 | 0.74 | 8.32 |
| april | 6.04 | 0.79 | 5.24 | 0.49 | 10.60 |

