Computer Supported Cooperative Work (2012) 21:449–473 DOI 10.1007/s10606-012-9156-4
© Springer 2012
Enabling Large-Scale Deliberation Using Attention-Mediation Metrics

Mark Klein
MIT Center for Collective Intelligence, Cambridge, MA, USA (E-mail: [email protected])

Abstract. Humanity now finds itself faced with a range of highly complex and controversial challenges—such as climate change, the spread of disease, international security, scientific collaborations, product development, and so on—that call upon us to bring together large numbers of experts and stakeholders to deliberate collectively on a global scale. Collocated meetings can, however, be impractically expensive, severely limit the concurrency and thus breadth of interaction, and are prone to serious dysfunctions such as polarization and hidden profiles. Social media such as email, blogs, wikis, chat rooms, and web forums provide unprecedented opportunities for interacting on a massive scale, but have yet to realize their potential for helping people deliberate effectively, typically generating poorly-organized, unsystematic and highly redundant contributions of widely varying quality. Large-scale argumentation systems represent a promising approach for addressing these challenges, by virtue of providing a simple systematic structure that radically reduces redundancy and encourages clarity. They do, however, raise an important challenge: how can we ensure that the attention of the deliberation participants is drawn, especially in large complex argument maps, to where it can best serve the goals of the deliberation? How can users, for example, find the issues they can best contribute to, assess whether some intervention is needed, or identify the results that are mature and ready to "harvest"? Can we enable, for large-scale distributed discussions, the ready understanding that participants typically have about the progress and needs of small-scale, collocated discussions? This paper will address these important questions, discussing (1) the strengths and limitations of current deliberation technologies, (2) how argumentation technology can help address these limitations, and (3) how we can use attention-mediation metrics to enhance the effectiveness of large-scale argumentation-based deliberations.

Key words: Deliberation, Metrics, Argumentation
1. Review of existing deliberation technologies

Let us define deliberation as a process where communities (1) identify possible solutions for a problem, and (2) select the solution(s) from this space that best meet their diverse needs (Walton and Krabbe, 1995) (Eemeren and Grootendorst, 2003). How well do existing technologies meet this challenge? A wide range of social computing technologies have emerged in the past few decades, including email, chat, web forums, wikis like Wikipedia, media-sharing
sites like YouTube and Flickr, open source software development efforts such as Linux, solution competitions such as Innocentive.com, idea-sharing systems such as Ideastorm.com, peer-filtering sites such as Slashdot, group decision support (GDSS) systems (Benbasat and Lim, 2000) (Reagan-Cirincione, 1994) (Smallman, 2008) (Shrager et al., 2010) (Convertino et al., 2008) (Boland et al., 1992) (Poole et al., 1988) (Farnham et al., 2000) (Kramer and King, 1988) (Pervan and Atkinson, 1995) (Luppicini, 2007) (Gopal and Prasad, 2000) (Macaulay and Alabdulkarim, 2005) (Powell et al., 2004), and scientific collaboratories (Finholt, 2002). Experience with such systems has shown that they can foster, by virtue of reducing the cost of participation, voluntary contributions at a vast scale, which in turn can lead to remarkably powerful emergent phenomena (Tapscott and Williams, 2006) (Sunstein, 2006) (Surowiecki, 2005) (Gladwell, 2002) (Dennis and Valacich, 1993) that include:
- Idea synergy: the ability for users to share their creations in a common forum can enable a synergistic explosion of creativity, since people often develop new ideas by forming novel combinations and extensions of ideas that have been put out by others.
- The long tail: social computing systems enable access to a much greater diversity of ideas than would otherwise be practical: "small voices" (the tail of the frequency distribution) that would otherwise not be heard can now have significant impact.
- Many eyes: social computing efforts can produce remarkably high-quality results by virtue of the fact that there are multiple independent verifications - many eyes continuously checking the shared content for errors and correcting them.
- Wisdom of the crowds: large groups of (appropriately independent, motivated and informed) contributors can collectively make better judgments than those produced by the individuals that make them up, often exceeding the performance of experts, because their collective judgment cancels out the biases and gaps of the individual members.
On the other hand, group decision-making is also prone to dysfunctional emergent behaviors that can deeply undercut the quality of the deliberation outcomes (Tversky and Kahneman, 1974) (Sunstein, 2006) (Schulz-Hardt et al., 2000) (Cook and Smallman, 2007). To more fully understand the strengths and limitations of deliberation technologies, it is helpful to divide them up based on how they structure content. One category is time-centric tools, i.e. tools like email, chat rooms, and web forums where content is organized based on when a post was contributed. Such systems enable large communities to weigh in on topics of interest, but they face serious shortcomings from the perspective of enabling collective deliberation:
- Scattered content: The content in time-centric tools is typically widely scattered, so it's hard to find all the contributions on a topic of interest. This also fosters unsystematic coverage, since users are often unable to quickly identify which areas are well-covered, and which need more attention.
- Low signal-to-noise ratio: The content captured by time-centric tools is notorious for being voluminous and highly repetitive. This is a self-reinforcing phenomenon: since it can be difficult to find out whether a point has already been made in a large existing corpus, it's more likely that minor variants will be posted again and again by different people. Some authors may do so simply hoping to win arguments by sheer repetition. This low signal-to-noise ratio makes it difficult to uncover the novel contributions that inspire people to generate creative new ideas of their own.
- Balkanization: Users of time-centric systems often tend to self-assemble into groups that share the same opinions—there is remarkably little cross-referencing, for example, between liberal and conservative blogs and forums—so they tend to see only a subset of the issues, ideas, and arguments potentially relevant to a problem. This tends to lead people to take on more extreme, but not more broadly informed, versions of the opinions they already had.
- Dysfunctional argumentation: Time-centric systems do not inherently encourage or enforce any standards concerning what constitutes valid argumentation, so postings are often bias-based rather than evidence- or logic-based.
Enormous effort is typically required to "harvest" the corpora created by time-centric tools to identify the most important issues, ideas, and arguments. Intel, to give a typical example, ran a web forum on organizational health that elicited a total of 1000 posts from 300 participants. A post-discussion analysis team invested over 160 person-hours to create a useful summary of these contributions (at 10 minutes a post, probably longer than it took to write many of the posts in the first place). The team found that there was lots of redundancy, little genuine debate, and few actionable ideas, so that in the end many of the ideas they reported came from the analysis team members themselves, rather than the forum.1

It could be argued that many of these concerns are less prominent in topic-centric tools such as wikis and idea-sharing systems. In wikis, for example, all the content on a given topic is captured in a single article. But wikis are deeply challenged by deliberations on controversial topics (Kittur et al., 2007) (Viegas et al., 2004). They capture, by their nature, the "least-common-denominator" consensus between many authors (any non-consensus element presumably being edited out by those that do not agree with it), and the controversial core of deliberations is typically moved to massive talk pages for the article, which are essentially time-centric venues prone to all the limitations we noted above. Idea-sharing tools—such as Dell's Ideastorm.com, the Obama administration's Open for Questions web site, and Google's project10tothe100.com—are organized around questions: one or more questions are posted and the community is asked to contribute, rate, and comment on proposed solutions. Such sites can elicit huge levels of activity—the Obama site for example elicited 70,000 ideas and 4 million votes in
3 weeks—but they are prone to several serious shortcomings. One is redundancy: in all of these sites, many of the ideas represent minor variations of each other. When there are thousands of posts submitted, manually pruning this list to consolidate equivalent posts is a massive undertaking. In Google's case, for example, the company had to recruit 3,000 employees to filter and consolidate the 150,000 ideas they received, in a process that put them 9 months behind their original schedule. Another issue is non-collaborativeness. Idea-sharing sites tend to elicit many fairly simple ideas. The ideas generated by the Google project, for example (e.g. make government more transparent, help social entrepreneurs, support public transport, create user-generated news services), were in large part not novel and light on detail. Surely that massive amount of effort could have been used to compose a smaller number of more deeply-considered ideas, but idea-sharing sites provide little support (or incentive) for this, because people cannot collaboratively refine submitted ideas.

2. Large-scale argumentation

Large-scale argumentation represents a promising approach to addressing the weaknesses of current deliberation technologies. We describe this approach below. Argumentation tools (Kirschner et al., 2003) (Moor and Aakhus, 2006) (Walton, 2005) take an argument-centric approach based on allowing groups to systematically capture their deliberations as tree structures made up of issues (questions to be answered), ideas (possible answers for a question), and arguments (statements that support or detract from an idea or argument) that define a space of possible solutions to a given problem (figure 1).

Such tools have many advantages. Every unique point appears just once, radically increasing the signal-to-noise ratio, and all posts must appear under the posts they logically refer to, so all content on a given question is co-located in the tree, making it easy to find what has and has not been said on any topic, fostering more systematic and complete coverage, and counteracting balkanization by putting all competing ideas and arguments right next to each other. Careful critical thinking is encouraged, because users are implicitly encouraged to express the evidence and logic in favor of the options they prefer (Carr, 2003), and the community can rate each element of their arguments piece-by-piece. Users, finally, can collaboratively refine proposed solutions. One user can, for example, propose an idea, a second raise an issue concerning how some aspect of that idea can be implemented, and a third propose possible resolutions for that issue. The value of an argument map can extend far beyond the deliberation it was initially generated for, because it represents an entire design space of possible solutions that can be readily harvested, refined and re-combined by other communities facing similar problems.

Most argumentation systems have been used by individuals or in small-scale settings, relying in the latter case on a facilitator to capture the free-form interactions of a collocated group as a commonly-viewable argument map (Shum et al., 2006).
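To make this structure concrete, the following minimal Python sketch models an IBIS-style map as a tree of typed posts. The class and field names, and the example content, are our own illustration, not the Deliberatorium's actual data model.

POST_TYPES = {"issue", "idea", "pro", "con"}   # the four IBIS post types

class Post:
    def __init__(self, kind, text, parent=None):
        assert kind in POST_TYPES
        self.kind = kind          # "issue", "idea", "pro", or "con"
        self.text = text
        self.parent = parent
        self.children = []

    def attach(self, kind, text):
        # Every post is attached under the post it logically refers to: ideas answer
        # issues, pros/cons address ideas (or other arguments), and new issues can be
        # raised about how an idea would be implemented.
        child = Post(kind, text, parent=self)
        self.children.append(child)
        return child

# A tiny fragment of a hypothetical map:
root = Post("issue", "How should a city reduce traffic congestion?")
idea = root.attach("idea", "Introduce a congestion charge")
idea.attach("pro", "Reduced congestion in cities that adopted it")
idea.attach("con", "Disproportionately burdens low-income commuters")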
Figure 1. A screenshot of the Deliberatorium, a large-scale argumentation system.
Argumentation systems have also been used, to a much lesser extent, to enable distributed deliberations over the Internet (Jonassen and Jr, 2005) (Chklovski et al., 2005) (Lowrance et al., 2001) (Heng and de Moor, 2003) (Karacapilidis et al., 2004) (Rahwan, 2008). These maps tend to be poorly structured, however, because many users are not skilled argument mappers, and the scale of participation has been small,2 typically involving only a handful of authors on any given task.

The author and his colleagues have investigated, over the past several years, how an argument-centric approach can be extended to operate effectively at the same large scales as other social computing systems. Our approach is simple. Users are asked to create, concurrently, a network of posts organized into an argument map. We use the IBIS argumentation formalism (Conklin, 2005) because it is simple and has been applied successfully in hundreds of collective decision-making contexts. A set of community conventions (similar to those that underlie other social computing systems like Wikipedia and Slashdot) help ensure that the argument map is well-organized. Each post should represent a single issue, idea, pro, or con, should not replicate a point that has been made elsewhere in the argument map, and should be attached to the post it logically refers to.
A central tenet is the "live and let live" rule: if one disagrees with an idea or argument, the user should not change that post to undermine it, but should rather create new posts that present their alternative ideas or counter-arguments. Every individual can thus present their own point of view, using the strongest arguments they can muster, without fear of sabotage by anyone else. This process is supported by capabilities that have proven invaluable in other social computing systems, including rating (to help the community encourage and identify important issues, ideas and arguments), watchlists (which automatically notify users of changes to posts they have registered interest in), version histories (to allow users to roll back a post to a previous version if it has been "damaged" by an edit), and home pages (which allow users to develop an online presence). The system also provides multiple forms of social translucence (Erickson et al., 2002) (i.e. visual cues concerning who is doing what in the system), thereby fostering a sense of belonging as well as enabling self-organized attention mediation by the community. See (Klein, 2007) for further discussion of the issues underlying the design of a large-scale argumentation capability. The system itself is accessible at http://franc2.mit.edu/ci/.

Because good argument-mapping skills are not universal, moderators help ensure that new posts are correctly structured. Their job is part education, and part quality control. Posts, when initially created, are given a "pending" status and can only be viewed by other authors. If a post doesn't adequately follow the argument map conventions, moderators will either fix it or leave comments explaining what needs to be done. Once a moderator has verified that a post follows the conventions, the post is "certified" and becomes available to be viewed, edited, commented on, or rated by the general user population. The certification process helps ensure well-structured maps, and provides incentives for users to learn the argument formalism. Moderators serve a relatively modest role in all this: their role is not to evaluate the merits of a post, but simply to work with authors to ensure that the content is structured in a way that maximizes its utility to the community at large. We estimate, based on our experience to date, that there needs to be about 1 moderator for every 20 active authors, to ensure that posts are checked and certified in a timely fashion without undue burden on each moderator. This figure is well within the bounds of the percentage of "power users" that are a common feature of social computing systems. Experienced authors, with a track record of successful argument map creation, can be selected to join the moderator pool. There is already, in addition, a substantial world-wide community of people with argument mapping skills. One organization alone (cognexus.org) has trained and certified hundreds of people on argument mapping techniques. Argument mapping is, in addition, a natural skill for lawyers, philosophers, mathematicians, library scientists, debaters, and others who frequently create taxonomies, arguments or proofs. Such individuals may be inspired by the opportunity to contribute their skills to deliberations about complex critical challenges, even if they do not have substantial content expertise in that area.
We have implemented an initial version of these ideas, in the form of a web-based tool called the Deliberatorium (Klein and Iandoli, 2008) (Iandoli et al., 2009), and evaluated it to date with over 700 users deliberating on a wide range of topics. The largest evaluation was performed at the University of Naples with 220 master's students in the information engineering program, who were asked to use the system to deliberate, over a period of 3 weeks, about the use of bio-fuels in Italy (Klein and Iandoli, 2008). We observed a very high level of user participation. All told, the students posted over 3,000 issues, ideas and arguments, in addition to 1,900 comments. This is, to our knowledge, both the largest single argument map ever created, as well as (by far) the largest number of authors for a single argument map. Roughly 1,800 posts were eventually certified, and about 70% of all posts could be certified without changes, demonstrating that, even after a relatively short usage period, most authors were able to create properly-structured posts. The certification ratio, in addition, increased over the duration of the experiment. The breadth and depth of coverage was, in the judgment of content experts, quite good: this community of non-experts was able to create a remarkably comprehensive map of the current debate on bio-fuels, complete with references, exploring everything from technology and policy issues to environmental, economic and socio-political impacts. We were hard-pressed to imagine any other approach that would allow over 200 authors to write what was in effect a substantial book on a complex subject, in a couple of weeks, with no one in charge.

Other evaluations (including deliberations with 120 students at the University of Zurich, with 73 users at Intel, and with 40 users at the US Federal Bureau of Land Management) have explored the efficacy of our large-scale argumentation tool for a range of topics and incentive structures. These evaluations support the idea that large-scale argumentation can be applied effectively to complex deliberations. Substantial user communities with no initial familiarity with argumentation formalisms have been able, in a range of contexts, to rapidly create substantive, useful, compact, and well-organized maps on complex topics, while requiring levels of moderator effort much lower than those needed to harvest, post-hoc, discussions hosted by such conventional social computing tools as web forums.

We have since developed a better understanding of why we were able to achieve such substantial levels of use in a large-scale argumentation system. Active participation in social media systems tends to occur when the benefits, to the individual contributors, substantially exceed the costs (Benkler, 2006). With small numbers of participants, an informal approach, based for example on phone or email or web forums, clearly minimizes participation costs and can produce good deliberation outcomes. But this picture changes, we suggest, as the scale of the discussion grows. It has been found (Hars and Ou, 2002) (Lakhani and Wolf, 2005) (Roberts et al., 2006) (Bonaccorsi and Rossi, 2004) that users are motivated by two key benefits when contributing to social computing systems: (1) finding their tribe
(i.e. getting connected with people who share their interests) and (2) becoming a hero (having a substantive positive impact on a community they care about). How does this play out in a deliberation context? Let us make the reasonable assumption that points (individual issues, ideas, and arguments) have something like a Gaussian distribution in the user population (i.e. some points are known to most people, some to only a few). We can thus expect widely known points to be submitted frequently from multiple sources, and more "out-of-the-box" (but potentially valuable) points to arise less often. It seems clear that the number of unique points contributed to the deliberation will grow much more slowly than the number of participants. Our simulations suggest that this growth is roughly a logarithmic function of the community size, implying a roughly linear growth in redundancy (figure 2). The larger the user community, therefore, the more potential redundancy there is, and thus the more value argument mapping offers in terms of improving the signal to noise ratio. There is widespread disaffection with the low signal-to-noise ratio of current social media tools. We can thus expect that, as the scale of the discussion grows, users will increasingly recognize the opportunity to "become a hero" by contributing something (i.e. creating a value-rich deliberation map) that is highly valued by the community. Argument mapping also increases users' chances of "finding their tribe".
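The simulation itself is not reproduced here, but its logic can be sketched as follows. This is a rough illustration under our own assumptions about the popularity distribution and posting rate, not the original simulation code.

import random
from math import floor

random.seed(0)
NUM_POINTS = 500          # assumed size of the space of possible points
POSTS_PER_PERSON = 3      # assumed number of points each contributor submits

def sample_point():
    # Skewed popularity: low-numbered points are widely known, high-numbered
    # "out-of-the-box" points are known to only a few people.
    return min(NUM_POINTS - 1, floor(random.paretovariate(1.2)) - 1)

def redundancy(community_size):
    posts, unique = 0, set()
    for _ in range(community_size):
        for _ in range(POSTS_PER_PERSON):
            unique.add(sample_point())
            posts += 1
    return posts, len(unique)

for n in (10, 100, 1000):
    total, uniq = redundancy(n)
    # Unique points grow much more slowly than total posts, so redundancy
    # (total - unique) grows roughly linearly with community size.
    print(n, total, uniq, total - uniq)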
Figure 2. Simulation results for number of redundant posts in a deliberation (series: # posts, # unique posts, redundancy; x-axis: # contributors).
While contributing to unstructured discussions is easier, the high volume and redundancy of such discussions means that most posts will probably be overlooked by most readers. In an argument map, by contrast, if you have a unique point to make, it has a much greater chance of being seen. We can thus expect that the benefits of argument mapping to a contributor will increase rapidly with the size of the user community. We can also expect, moreover, that the costs of participation for a contributor will grow only slowly as the community scales. Contributing to a deliberation map incurs two main costs:

1) Unbundling the contribution into its constituent issues, ideas, and arguments
2) Locating the proper place in the map for these elements
The cost of unbundling a contribution is independent, of course, of the size of the map. The cost of locating a contribution should increase with the size of the map, but only slowly. Remember that a deliberation map is structured like a tree. To find the right place to put a post in a tree, you just have to pick the right top-level branch, the right sub-branch under that, and so on, until you reach the place where it belongs. If the average branching factor (number of sub-branches per branch) of a tree is N, then the average number of steps needed to locate a post is no more than the logarithm, base N, of the tree size. The overall picture is thus the following (figure 3):
Figure 3. Cost vs benefits for argument mapping as a function of community size (series: mapping cost, compression benefit; x-axis: community size).
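The shape of these curves can be sketched with a simple back-of-the-envelope model. The functional forms and constants below are illustrative assumptions, not the parameters used to produce figure 3.

from math import log

BRANCHING_FACTOR = 5      # assumed average branching factor N of the map

def mapping_cost(map_size):
    # One unit of unbundling effort plus roughly log_N(map size) steps
    # to locate the right branch for the new post.
    return 1.0 + log(max(map_size, 1.0), BRANCHING_FACTOR)

def compression_benefit(total_posts, unique_posts):
    # Benefit to a contributor, taken as proportional to how much redundancy
    # the map removes relative to an unstructured discussion.
    return total_posts / max(unique_posts, 1.0)

for community_size in (10, 100, 1000, 10000):
    total = 3.0 * community_size              # assumed posts per contributor
    unique = 30.0 * log(community_size + 1)   # assumed slow growth of unique points
    print(community_size,
          round(mapping_cost(unique), 1),               # grows slowly (logarithmically)
          round(compression_benefit(total, unique), 1))  # grows much faster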
The benefits of adding to an argument map grow rapidly as the community scales, in proportion to how much the signal-to-noise ratio is improved, roughly exponentially. The costs of adding to an argument map grow only logarithmically with scale. At some point, we can expect, the benefits to individual authors will greatly exceed the costs, thereby providing compelling incentives for participation.

Technological and process refinements can further improve this picture by reducing authoring and moderation costs. One option, for example, is "wisdom of the crowds" moderation, wherein the full user community, performing simplified moderation "micro-tasks", substitutes for a relatively small cadre of expert moderators. There is good reason to believe that such an approach can work. It has been shown many times that large numbers of people with even modest skills can, in the aggregate, perform judgment tasks better than experts (Surowiecki, 2005). It has also been shown, by such systems as Amazon's Mechanical Turk, that large numbers of people are willing to perform such micro-tasks cheaply, or even for free if they believe in the project.

While these results are promising, our work has led us to conclude that, to fully realize argumentation technology's potential for supporting large-scale deliberations, we need to address the critical challenge of attention allocation. For the kinds of topics that most require large-scale deliberation, even a user community of moderate size can quickly generate large and rapidly growing argument maps. How can we help users identify the portions of the map that can best benefit from their contributions, in maps with hundreds or thousands of posts? How can the stakeholders assess whether the deliberations are progressing well, whether some intervention is needed to help the deliberations work more effectively, and when the results are mature and ready to "harvest"? Can we foster, for large-scale deliberations, the understanding that participants in small-scale discussions typically have about where the discussion has gone, what remains to be addressed, and where they can best contribute? Without this kind of big picture, we run the risk of severely under-utilizing the collective intelligence potentially provided by large-scale social media.

3. Attention mediation metrics

We can meet this challenge, we believe, by developing algorithms that provide users with personalized suggestions concerning which parts of the argument map they should contribute to, and why (figure 4). Each user is free to accept or ignore suggestions as they like, but they know that the suggestions are based on an overview of the deliberation as a whole and are intended to help them make maximal use of their unique skills and perspectives. If the suggestions are reasonably well-done, the emergent effect should be that the collective intelligence of the user community is maximized, because each user contributes where they can do the most good.
Figure 4. Using metrics to enable attention mediation in large-scale deliberations.
How can such suggestions be generated? The first step in our approach is to perform an exception analysis (Klein, 2003) to systematically identify the kinds of situations where the deliberation is proceeding sub-optimally and the users should therefore be notified. We do so by defining a normative model that specifies (1) the goals a deliberation process should ideally achieve, (2) the process steps used to achieve these goals, and (3) how the process steps can fail to achieve their goals (i.e. the exceptions). Next, we identify metrics that can assess whether and where, in the argument map, each type of exception is taking place. When a deliberation is actually taking place, the metrics are evaluated to identify which exceptions are currently occurring. Finally, participants are notified (based on a model of their roles and interests) about the exceptions that they are probably interested in, and able, to help resolve. These notifications thus represent personalized suggestions concerning how the participants can best contribute to the current deliberation. We illustrate these steps in the paragraphs below.

3.1. An Exception Analysis of Large-Scale Deliberation

Our deliberation model formalizes a straightforward view of what makes up a good collaborative decision-making process. The main steps and goals in the process are as follows (figure 5):
Figure 5. Normative process model for collaborative decision-making.
The complete analysis of this model (available upon request) is too large to review in its entirety in this paper. We will provide instead, in the following paragraphs, illustrative examples of the exceptions and metrics derived from our analysis.

Metrics for "Identify Possible Solutions". One of the key goals for any deliberation process is to achieve excellent coverage of the space of possible solution ideas, but this goal may not always be achieved, leading to such exceptions as "dominated by narrow range of ideas" (figure 6a). We have identified several possible metrics for detecting this exception. One is to look for issues where the ideas come from "non-disjoint" author sets, i.e. where all the authors tend to agree about the pros and cons for different ideas and therefore probably share an intellectual frame. This is straightforward to assess in an argument map: we can, for example, perform vector orthogonalization (Householder, 1958) on participants' rating vectors, followed by a simple vector distance calculation, to assess how much the opinions of different users diverge. If we have only a moderate degree of opinion divergence for the authors of the ideas for a given issue, this suggests that the ideas captured so far may be relatively less diverse. Another way of detecting this exception involves measuring the use of shared vocabulary in the ideas for a given issue. If there is heavy use of shared terminology, this again suggests that the ideas are only moderately diverse. A third approach is simply to count the number of authors that contributed ideas for an issue: if a small number of authors contributed the bulk of the ideas, this suggests the diversity may be relatively low.

Another key goal for solution generation is to identify high-quality (i.e. promising) solutions (figure 6b). One straightforward metric for identifying low-quality solutions is, of course, to look for idea posts with low aggregate rating scores.
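As an illustration, the author-divergence metric could be realized along the following lines, using plain cosine distance between rating vectors as a simple stand-in for the orthogonalization step described above. All names and thresholds here are our own assumptions.

import numpy as np

def opinion_divergence(rating_vectors):
    # Average pairwise cosine distance between authors' rating vectors, where each
    # vector holds one author's ratings over the same set of posts under an issue.
    # Values near 0 mean the authors rate things alike; larger values mean divergence.
    dists = []
    for i in range(len(rating_vectors)):
        for j in range(i + 1, len(rating_vectors)):
            a, b = rating_vectors[i], rating_vectors[j]
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            if denom > 0:
                dists.append(1.0 - float(np.dot(a, b)) / denom)
    return float(np.mean(dists)) if dists else 0.0

def narrow_idea_range(author_rating_vectors, threshold=0.2):
    # Flag the "dominated by narrow range of ideas" exception when the authors of
    # an issue's ideas mostly agree with each other (illustrative threshold).
    return opinion_divergence(author_rating_vectors) < threshold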
Figure 6. (Partial) exception analysis for “identify possible solutions”.
Note that ratings for posts in an argument map will generally be more meaningful than those in other forms of social media (e.g. wikis, web forums) because each rating is attached to a single point (issue, idea, or argument) rather than a collection of multiple points that may vary widely in quality.

A second metric for assessing solution quality is based on process, rather than outcome. As we noted above, in our system argument map posts can be edited openly, like wiki pages, allowing the community to collaboratively improve each post. In controversial topic areas, however, open authoring can lead to destructive "edit wars", where people who disagree about a post repeatedly replace competitors' content with their own perspective. This "idea sabotage" can be detected by looking for alternating edits by users that appear to have divergent opinions (based on their rating behavior: see above) about the issue they are proposing solutions for.

Even these simple examples make it clear that it can be difficult to define metrics that guarantee that a given exception is taking place. Doing so would often require sophisticated natural language understanding technology well beyond the current state of the art. It does seem feasible, however, to use well-known data mining techniques (e.g. word frequency statistics, network analysis, vector orthogonalization and so on), taking advantage of the additional semantics provided by argument maps, to define metrics that assess whether or not an exception might be taking place. The deliberation participants can then decide for themselves whether the exceptions are indeed in play and need to be addressed.
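A sketch of the "idea sabotage" check described above, assuming a time-ordered list of edit events per post and some divergence test between authors (for example the one sketched earlier). The event fields and threshold are our own illustration.

from dataclasses import dataclass

@dataclass
class Edit:
    author: str
    timestamp: float

def possible_edit_war(edits, authors_divergent, min_alternations=3):
    # Count consecutive edits in which one author overwrites another author with
    # whom they appear to have strongly divergent opinions (based on ratings).
    alternations = 0
    for prev, curr in zip(edits, edits[1:]):
        if prev.author != curr.author and authors_divergent(prev.author, curr.author):
            alternations += 1
    return alternations >= min_alternations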
Metrics for "Evaluate Solutions". When evaluating potential solutions for a problem, one key goal is to ensure that the evaluation of the ideas is complete: i.e. all key arguments for and against an idea have been identified (figure 7). One straightforward metric for the exception "incomplete argumentation" is to simply identify ideas which have few or no arguments attached to them. A second, more subtle, approach is to assess whether a user has given ratings for ideas that are inconsistent with the ratings they gave the underlying arguments. We can use such techniques as Bayesian inference (Bolstad, 2010) to propagate a user's ratings for arguments up the argument map to predict how the user should have rated the ideas these arguments address. If there is a large divergence between a user's predicted and actual ratings for an idea, that suggests that the user has not yet entered arguments that are compelling to him or her.

Metrics for "Select Solutions". This aspect of the deliberation process (when the community is asked to converge upon one or more agreed-upon solutions for the key issues) has a range of challenging exceptions. We will focus, here, on the exceptions associated with the sub-goal "stakeholders make judgments rationally" (figure 8). One key exception involves participants converging on one or a few highly rated solution ideas for an issue before a full consideration of the relevant ideas and arguments, i.e. before the argument map for that issue is "mature". Deliberation maturity can be assessed in several ways. Deliberations, especially in argument map contexts, tend to evolve from defining issues to proposing ideas to identifying trees of pro and con arguments. Early activity also tends to be characterized by creating new posts, and later activity by refining them, followed eventually by relative quiescence.
Figure 7. (Partial) exception analysis for “evaluate solutions”.
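The rating-propagation check described under "Evaluate Solutions" might be sketched as follows, using a simple recursive weighted adjustment in place of full Bayesian inference. The propagation rule, the neutral baseline, and the tolerance are all illustrative assumptions; the kind/children fields follow the earlier Post sketch.

def propagated_rating(post, user_ratings, baseline=3.0):
    # Predict a user's rating for a post (on a 1-5 scale) from the ratings they
    # gave its pro and con arguments: pros pull the prediction up, cons pull it down.
    pros = [propagated_rating(c, user_ratings) for c in post.children if c.kind == "pro"]
    cons = [propagated_rating(c, user_ratings) for c in post.children if c.kind == "con"]
    if not pros and not cons:
        return user_ratings.get(post, baseline)      # leaf: use the user's own rating
    support = sum(r - baseline for r in pros) - sum(r - baseline for r in cons)
    return min(5.0, max(1.0, baseline + support / (len(pros) + len(cons))))

def inconsistent_rating(idea, user_ratings, tolerance=2.0):
    # A large gap between the predicted and actual rating suggests the user's real
    # reasons for liking or disliking the idea are not yet captured in the map.
    actual = user_ratings.get(idea)
    if actual is None:
        return False
    return abs(actual - propagated_rating(idea, user_ratings)) > tolerance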
Figure 8. (Partial) exception analysis for “select the best solution”.
An actively growing map with a few relatively shallow argument branches, as well as few edits per post, is thus probably relatively immature. We can also assess opinion churn for each issue (i.e. whether the highest-rated ideas for individuals, as well as the community as a whole, are still changing rapidly or not).

A second key exception is "groupthink", which can be defined as a group dedicating the bulk of its attention to refining a single solution idea, often the first one endorsed by an influential figure, rather than comparing several alternatives in depth. This is straightforward to assess in an argument map because we can measure when one idea under an issue is receiving the bulk of the community's attention (views, rates, edits and additions) while competing ideas and their underlying arguments remain largely untouched.

A third important exception is "balkanization", wherein a community divides itself into sub-groups where members of each group agree with one another but ignore contributions of groups with competing ideas. This exception can be assessed using a metric that looks for clustering in the attention allocation of the participants in the discussion for a given issue.

The final exception we will consider is "non-independent ratings". It has been shown that when people are asked to rate competing ideas, if they can see the ratings made to date (e.g. they see the ideas in popularity-sorted order), then the first ideas that happen to get a rating advantage tend to become the eventual winners—they "lock in" to the winning position—even if they are worse than ideas that appeared later or started with lower ratings (Salganik et al., 2006). The underlying problem is that people attend to, and rate, ideas based on their popularity rather than their inherent merits. This exception can be detected using a metric that checks whether the popularity order for a set of competing ideas has remained relatively unchanged as the deliberation has progressed in maturity,
especially if the idea popularity ranking diverges from what would be predicted by looking at the underlying argument ratings.
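One possible way to operationalize this check (our own sketch, not the system's implementation) is to compare popularity rankings over time, and against the ranking predicted from argument ratings, using rank correlation:

from scipy.stats import kendalltau

def lock_in_suspected(early_order, late_order, argument_based_order, threshold=0.8):
    # All three arguments are lists of the same idea identifiers, ordered from most
    # to least popular (or, for the last one, as predicted from argument ratings).
    def ranks(order):
        return [order.index(idea) for idea in early_order]

    stability, _ = kendalltau(ranks(early_order), ranks(late_order))
    agreement, _ = kendalltau(ranks(late_order), ranks(argument_based_order))
    # Suspicious when the popularity order has barely moved since early in the
    # deliberation but disagrees with what the underlying arguments would predict.
    return stability > threshold and agreement < threshold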
3.2. Generating Attention Mediation Suggestions

When a deliberation is actually taking place, in our approach, analysis algorithms generate personalized suggestions for users concerning which posts they might want to look at in order to contribute most effectively to the deliberation at hand. These suggestions are identified by (1) maintaining continuously updated values for the deliberation metrics (such as those we identified above), (2) identifying the exceptions that seem to be active based on these metric values, and (3) notifying the user about the exceptions they can help resolve.

Calculating metrics. Metrics are calculated by running queries over the database created by users as they deliberate using the Deliberatorium. This database includes entities representing every user and post (issue, idea, argument, and comment), relations representing the connections between posts, and time-stamped events capturing all changes (creation, editing, moving) to these entities and relations. Since the database has a network structure, we use a graph pattern-matching language to express our queries. The current system uses PQL (Klein and Bernstein, 2004), but it could use other graph query languages such as SPARQL (Group, 2008) for the same purpose. One metric, for example, involves looking for ideas or arguments where the propagated and actual ratings differ widely. This could be captured using the following PQL query:

  PQL query code                                Explanation
  (entity ?e isa (or idea pro con))             Binds idea, pro and con entities to variable ?e
  (attribute average-rating of ?e is ?av)       Binds the actual community rating to ?av
  (attribute propagated-rating of ?e is ?pv)    Binds the propagated rating to ?pv
  (let ?diff be (abs (- ?av ?pv)))              Finds the difference between the propagated and actual ratings
  (when (> ?diff 2))                            Ensures the difference is more than 2 (on a scale of 5); smaller differences do not trigger this metric
  (let ?strength be ?diff)                      Sets the metric's "strength" to be a function of the size of the rating disconnect
Each metric can be viewed, in effect, as a kind of "daemon" that continually scans the deliberation database for instances that trigger it. Every daemon has a "strength" value, representing how strongly it was triggered.
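A rough Python rendering of such a daemon, assuming an in-memory collection of post objects with kind, average_rating, and propagated_rating fields (our own naming, not the system's API):

def rating_disconnect_daemon(posts, threshold=2.0):
    # Scan all idea, pro and con posts and yield (post, strength) pairs for those
    # whose community rating diverges widely from the rating propagated up from
    # their underlying arguments.
    for post in posts:
        if post.kind not in ("idea", "pro", "con"):
            continue
        diff = abs(post.average_rating - post.propagated_rating)
        if diff > threshold:          # differences of 2 or less (on a 1-5 scale) do not fire
            yield post, diff          # the strength grows with the size of the disconnect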
In the PQL query above, for example, the daemon's strength is a function of the difference between the actual and propagated ratings.

Identifying exceptions. Exceptions are presumed to be occurring when their associated daemon(s) are firing strongly for a given part of the deliberation map. The more strongly the exception's daemons fire, the more likely the exception is considered to be. So, for example (see Figure 7), if both the "inconsistent ratings" and "few arguments" daemons fire strongly for a given post in the map, the "incomplete argumentation" exception is considered to be active there. It is difficult, as we noted above, to define metrics that guarantee a given exception is taking place, so there are likely to be "false positives" (i.e. cases where the daemons incorrectly flag an exception somewhere when it is not in fact taking place). Given that, users have to assess, for themselves, whether an exception is in fact "real" and requires attention. If such false positives are too numerous, however, it undercuts the whole point of the approach, which is after all to greatly reduce the user effort required to find the areas in the deliberation that can benefit from their particular knowledge and skills. This represents an open challenge for us at this point, but we believe that one promising strategy is to implement "crowd-sourced" exception validation. The idea is that, when a user views a suggested post, he or she can check off whether or not the proposed exceptions are actually taking place. As more and more users do so, the system can begin to highlight only those posts whose exceptions have been widely validated, reducing the impact of false positives. We can even imagine, eventually, applying machine learning techniques to this user validation data to help develop better exception detection metrics.

It is of course probably over-ambitious to imagine that software algorithms will be able to fully identify all departures from "ideal" deliberation without explicit human guidance. People, especially the "customers" for a deliberation, should be able to augment these suggestions. This is straightforward to do within our current structure: we need only create additional daemons that fire, more or less strongly, for parts of the deliberation map specified by human users.

Notifying users. Once the active exceptions are identified, users are provided with personalized suggestions corresponding to the exceptions they seem well-suited to help resolve. This matching process is conducted based on the user's roles, interests and skills.

Roles. The Deliberatorium includes several user roles, including author, moderator (responsible for assuring the map is well-structured), topic manager (responsible for assuring the deliberation is progressing effectively) and customer (the individuals seeking solutions to the problems being discussed). Different roles will be interested in different exceptions. Topic managers, for example, would
probably be interested in the "idea sabotage" exception mentioned above because they have the power and responsibility to do something about it (e.g. by managing who is allowed to edit a given post).

Interests. Our model of user interests is based on an analysis of their deliberation activity, assuming that a user is interested in a post if they devoted a substantial amount of effort to viewing, rating, commenting on, or editing it (or its neighbors) in the past, and one or more of the following holds (a simple scoring sketch follows the list):
- The post is currently experiencing high/growing activity
- The post is receiving community scores that differ strongly from the rating the user gave the post
- The user expressed a strong opinion about the post (positive or negative) and it has a high/growing controversy score (i.e. the post itself has a wide rating variance and/or there are many highly rated pros and cons underneath it)
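These criteria could be combined into a simple interest score along the following lines. The weights, fields, and helper methods here are illustrative assumptions, not the system's actual model.

def interest_score(user, post):
    # Only posts the user has already spent meaningful effort on (viewing, rating,
    # commenting, editing, or working on neighboring posts) are candidates.
    effort = user.past_effort_on(post)       # hypothetical helper returning a weight >= 0
    if effort == 0:
        return 0.0
    score = 0.0
    if post.recent_activity_growth > 0:                     # high or growing activity
        score += 1.0
    user_rating = post.ratings.get(user.id)
    if user_rating is not None:
        if abs(post.average_rating - user_rating) >= 2:     # community disagrees with the user
            score += 1.0
        if abs(user_rating - 3) >= 2 and post.controversy_score > 0.5:   # strong opinion, contested post
            score += 1.0
    return effort * score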
We also cluster users according to their map activity, so that we can identify whether a post has been interesting to someone who has shared interests and even opinions with a given user in the past, as with e-commerce recommender systems (Adomavicius and Tuzhilin, 2005).

Skills. A user's skills can be inferred by identifying the parts of the map in which they contributed to highly-rated posts.

Once the personalized suggestions are generated, they are presented to the users in the form of a deliberation map subset wherein the suggested posts are presented along with the reasons for their selection (i.e. listing the exceptions considered active for those posts, as well as why the user might be interested in them) (figure 9). More highly recommended posts are made visually salient by appearing in a larger font. Users are then free to contribute to resolving the exceptions they feel most interested in and capable of addressing.

4. Contributions

The key contribution of this work is to explore how software algorithms can help users allocate their efforts, in a large-scale argumentation context, to where they can do the most good. This approach, if executed well, can synergistically harness the creativity and judgment of human communities along with computer systems' ability to rapidly analyze large data sets. More specifically, we have shown how exception analysis can identify useful deliberation metrics, as well as how metrics can be used to generate personalized attention mediation suggestions. We also presented a substantial model of exceptions and metrics for large-scale deliberations, and demonstrated how semi-formal argument maps make it practical to define much more powerful metrics than for conventional (unstructured) social media such as email, web forums, and so on.
Figure 9. The personalized suggestions display.
While there has been substantial effort devoted to manually-coded, post-hoc metrics on the efficacy of on-line deliberations (Steenbergen et al., 2003) (Stromer-Galley, 2007) (Trénel, 2004) (Cappella et al., 2002) (Spatariu et al., 2004) (Nisbet, 2004), existing deliberation technologies have made only rudimentary use of automated real-time metrics to foster better emergent outcomes during the deliberations themselves. The core reason for this lack is that, in existing deliberation tools, the content takes the form of unstructured natural language text, limiting the possible deliberation metrics to the analysis of word frequency statistics, which is a poor proxy for the kind of semantic understanding that would be necessary to adequately assess deliberation quality. One of the important advantages of using argument maps to mediate deliberation is that they allow us, by virtue of their additional semantics, to automatically derive metrics that would require impractically resource-intensive manual coding for more conventional social media. We are aware of one other effort to develop real-time deliberation metrics for large-scale argument mapping, but this work (Shum et al., 2012) is based on measuring how well the deliberations adhere (e.g. in terms of revisability, simultaneity, visibility and so on) to a model of how small-scale, physically collocated conversations should take place (Clark and Brennan, 1991). Our work is unique, we believe, in how it attempts to assess (and improve) how effectively large distributed groups are deliberating (i.e. exploring and converging on problem solutions) rather than just how well a few collocated individuals are conversing.
5. Limitations and future work

While our aspiration is for our approach to be viable for all kinds of deliberations, an argument mapping approach does impose a greater learning curve, as well as more overhead, than free-form social media such as web forums. Given that, our sense is that it is best suited for high-impact problems where current approaches (including physical meetings as well as conventional social media) are inadequate, i.e. where many participants are involved, these participants are geographically distributed, and the problems inherently involve many potential solutions and evaluation perspectives, so a disciplined approach for capturing the deliberations becomes critical. These kinds of problems often include, for example, "strategic" (early life cycle) decision-making in such realms as policy, product, and process design. Even in these contexts, of course, our approach may add little value if there are debilitating cultural and organizational barriers to effective deliberation, such as management that is unwilling to incorporate broader input in its decision making, a culture where participants are unwilling to engage in (or even tolerate) constructive confrontation, or intellectual property concerns that prevent the free exchange of insights and ideas.

The simple argument map scheme we use represents a tradeoff between expressiveness and ease-of-use. It does not formally capture, for example, such concepts as:
- Interdependencies: often the solution to a complex problem consists of a package of component solutions that have to fit together well in order to work (Johnson and Halpin, 1974) (Klein et al., 2003). This could be handled, for example, by having ideas that consist of packages of other ideas, that could be evaluated based, among other things, on how well these ideas fit together.
- Relative valuations: often people wish to discuss the relative, rather than the absolute, merits of ideas (e.g. "carbon tax is better than cap-and-trade because …"). This could be handled by having "better than" links between ideas that can in turn be linked to pro and con arguments.
- Warrants vs antecedents: the current formalism doesn't allow one to deliberate separately about the antecedents and warrant for an argument. If someone claims, for example, that "global warming is bunk because the weather was cold today", you can't argue for or against the warrant (i.e. "a cold day debunks global warming") separately from the antecedent ("today was a cold day"). This could be handled by allowing one to attach arguments to the links between other arguments, or (equivalently) by being able to specify when the children of an argument have a conjunctive semantics as opposed to the default disjunctive semantics.
We have experimented with extending our formalism to cover these cases, but our experience to date has been that every additional bit of formal complexity
seems to substantially increase the barrier to widespread participation. We are thus taking a very incremental approach to adding new formal structures, and are studying how to do so in a way that works both for novice and more expert users.

While this has not happened in our evaluations to date, it is of course possible that moderators may undercut the deliberations by imposing their own biases when deciding which posts to certify. There are many ways to address this concern. For example, moderators can double-check each other's decisions, especially when authors complain that a post of theirs was inappropriately left uncertified. We can also provide a moderator rating system so that the user community can identify consistently poor moderators, plus a voting process for deciding when to remove moderator privileges from such individuals. Such a "meta-moderation" model has worked successfully with such systems as Wikipedia and Slashdot.

Another potential problem is that the community's understanding of a topic may shift, so that the initial formulation of the deliberation (as captured by the set of key issues being discussed) may need to change after considerable effort has been invested deliberating about the old set of issues. At worst, all affected posts might have to be de-certified and go through a new certification process to make sure they fit properly into the new structure. This has also not happened in our evaluations to date (the core questions have been clear-cut from the start), but it does suggest that the participants in a deliberation should frame the "skeleton" of initial issues carefully so this problem can be avoided as much as possible.

While there are many avenues, as we can see, for future work, our immediate efforts will focus on the following key questions:
- Evaluating this attention-mediation approach. This will include assessing the validity of the deliberation metrics (i.e. do they do a good job of detecting deliberation exceptions?) as well as the effectiveness of the attention mediation algorithms (i.e. do they result in more efficient and effective deliberations?). This will require, among other things, developing a better understanding of how frequently different exceptions occur in different deliberation contexts.
- Developing better metrics. This will include identifying a more complete set of metrics—we would like to investigate, for example, whether cognitive biases (Tversky and Kahneman, 1974) can be detected using deliberation metrics—as well as implementing better metrics, building on technological advances from such areas as data mining, network analysis and natural language processing.
- Developing better attention mediation algorithms. Attention mediation can be viewed as a kind of diagnosis problem, where we use the metrics values ("symptoms") to determine which exceptions ("diseases") may be occurring and what attention mediation actions ("cures") should be taken. Since it is unrealistic to expect software algorithms to fully understand the deliberation
customer's needs and how they can be met, we will need to develop "mixed-initiative" attention mediation algorithms as well as better abstractions and visualizations for user interaction, so that human users can both guide the algorithms to focus on key elements of the argument map, and be guided in turn.
Acknowledgements The author would like to gratefully acknowledge the many useful conversations he has had on the topic of deliberation metrics with Prof Ali Gurkan (Ecole Centrale Paris), Prof. Luca Iandoli (University of Naples), and Prof. Haji Reijers (Eindhoven University of Technology). The work has been supported by the National Science Foundation.
Note 1. Based on personal communication with Catherine Spence, Information Technology Enterprise Architect, Computing Director/Manager at Intel. 2. The one exception we are aware of (the Open Meeting Project’s mediation of the 1994 National Policy Review (Hurwitz 1996)) was effectively a comment collection system rather than a deliberation system, since the participants predominantly offered reactions to a large set of preexisting policy documents, rather than interacting with each other to create new policy options.
References Adomavicius, G., & Tuzhilin, A. (2005). Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734–749. Benbasat, I., & Lim, J. (2000). Information technology support for debiasing group judgments: an empirical evaluation. Organizational Behavior and Human Decision Processes, 83(1), 167–183. Benkler, Y. (2006). The Wealth of Networks: How Social Production Transforms Markets and Freedom. Yale University Press. Boland, R. J., Maheshwari, A. K., Te’eni, D., Schwartz, D., & Tenkasi, R. V. (1992). Sharing Perspectives in Distributed Decision Making. Computer-Supported Cooperative Work. Bolstad, W. M. (2010). Understanding Computational Bayesian Statistics. John Wiley. Bonaccorsi, A., & Rossi, C. (2004). Altruistic individuals, selfish firms? The structure of motivation in Open Source software. Cappella, J. N., Price, V., & Nir, L. (2002). Argument Repertoire as a Reliable and Valid Measure of Opinion Quality: Electronic Dialogue During Campaign 2000. Political Communication, 19 (1), 73–93. Carr, C. S. (2003). Using computer supported argument visualization to teach legal argumentation. In P. A. Kirschner, S. J. B. Shum, & C. S. Carr (Eds.), Visualizing argumentation: software tools for collaborative and educational sense-making (pp. 75–96). Springer-Verlag. Chklovski, T., Ratnakar, V., & Gil, Y. (2005). User interfaces with semi-formal representations: a study of designing argumentation structures. Proceedings of the 10th international conference on Intelligent user interfaces, 130–136.
Clark, H. H., & Brennan, S. E. (1991). Grounding in communication. In L. B. Resnick, J. M. Levine, & S. D. Teasley (Eds.), Perspectives on socially shared cognition (pp. 127–149). Washington, DC: American Psychological Association.
Conklin, J. (2005). Dialogue Mapping: Building Shared Understanding of Wicked Problems. John Wiley and Sons, Ltd.
Convertino, G., Billman, D., Shrager, J., Pirolli, P., & Massar, J. P. (2008). The CACHE Study: Group Effects in Computer-Supported Collaborative Analysis. Journal of Computer Supported Cooperative Work, 17(4), 353–393.
Cook, M. B., & Smallman, H. S. (2007). Visual Evidence Landscapes: Reducing Bias in Collaborative Intelligence Analysis. Human Factors and Ergonomics Society Annual Meeting Proceedings.
Dennis, A. R., & Valacich, J. S. (1993). Computer brainstorms: More heads are better than one. Journal of Applied Psychology, 78(4), 531–537.
Eemeren, F. H. v., & Grootendorst, R. (2003). A Systematic Theory of Argumentation: The Pragma-dialectical Approach. Cambridge University Press.
Erickson, T., Halverson, C., Kellogg, W. A., Laff, M., & Wolf, T. (2002). Social Translucence: Designing Social Infrastructures that Make Collective Activity Visible. Communications of the ACM, 45(4), 40–44.
Farnham, S., Chesley, H. R., McGhee, D. E., Kawal, R., & Landau, J. (2000). Structured online interactions: improving the decision-making of small discussion groups. Computer Supported Cooperative Work.
Finholt, T. A. (2002). Collaboratories. Annual Review of Information Science and Technology, 36(1), 73–107.
Gladwell, M. (2002). The Tipping Point: How Little Things Can Make a Big Difference. Back Bay Books.
Gopal, A., & Prasad, P. (2000). Understanding GDSS in Symbolic Context: Shifting the Focus from Technology to Interaction. MIS Quarterly, 24(3), 509–546.
Hars, A., & Ou, S. (2002). Working for Free? Motivations for Participating in Open-Source Projects. International Journal of Electronic Commerce, 6(3), 25–39.
Heng, M. S. H., & de Moor, A. (2003). From Habermas's communicative theory to practice on the internet. Information Systems Journal, 13(4), 331–352.
Householder, A. S. (1958). Unitary Triangularization of a Nonsymmetric Matrix. Journal of the ACM, 5(4), 339–342.
Iandoli, L., Klein, M., & Zollo, G. (2009). Enabling on-line deliberation and collective decision-making through large-scale argumentation: a new approach to the design of an Internet-based mass collaboration platform. International Journal of Decision Support System Technology, 1(1), 69–91.
Johnson, E. M., & Halpin, S. M. (1974). Multistage inference models for intelligence analysis.
Jonassen, D., & Remidez, H., Jr. (2005). Mapping alternative discourse structures onto computer conferences. International Journal of Knowledge and Learning, 1(1/2), 113–129.
Karacapilidis, N., Loukis, E., & Dimopoulos, S. (2004). A Web-Based System for Supporting Structured Collaboration in the Public Sector. Lecture Notes in Computer Science, 218–225.
Kirschner, P. A., Shum, S. J. B., & Carr, C. S. (2003). Visualizing Argumentation: Software tools for collaborative and educational sense-making. Springer.
Kittur, A., Suh, B., Pendleton, B. A., & Chi, E. H. (2007). He says, she says: conflict and coordination in Wikipedia. SIGCHI Conference on Human Factors in Computing Systems.
Klein, M. (2003). A Knowledge-Based Methodology for Designing Reliable Multi-Agent Systems. In P. Giorgini, J. P. Mueller, & J. Odell (Eds.), Agent-Oriented Software Engineering IV (Vol. 2935, pp. 85–95). Springer-Verlag.
RDF Data Access Working Group (2008). SPARQL Query Language for RDF. W3C Recommendation. http://www.w3.org/TR/rdf-sparql-query/
Klein, M. (2007). The MIT Collaboratorium: Enabling Effective Large-Scale Deliberation for Complex Problems.
Klein, M., & Bernstein, A. (2004). Towards High-Precision Service Retrieval. IEEE Internet Computing Journal, 8(1), 30–36.
Klein, M., & Iandoli, L. (2008). Supporting Collaborative Deliberation Using a Large-Scale Argumentation System: The MIT Collaboratorium. Directions and Implications of Advanced Computing; Conference on Online Deliberation (DIAC-2008/OD2008).
Klein, M., Sayama, H., Faratin, P., & Bar-Yam, Y. (2003). The Dynamics of Collaborative Design: Insights From Complex Systems and Negotiation Research. Concurrent Engineering Research & Applications, 11(3), 201–210.
Kramer, K., & King, J. (1988). Computer-Based Systems for Cooperative Work and Group Decision Making. Computing Surveys, 20(June), 115–146.
Lakhani, K. R., & Wolf, R. G. (2005). Why Hackers Do What They Do: Understanding Motivation and Effort in Free/Open Source Software Projects. In J. Feller, B. Fitzgerald, S. Hissam, & K. R. Lakhani (Eds.), Perspectives on Free and Open Source Software. MIT Press.
Lowrance, J. D., Harrison, I. W., & Rodriguez, A. C. (2001). Capturing Analytic Thought. First International Conference on Knowledge Capture, 84–91.
Luppicini, R. (2007). Review of computer mediated communication research for education. Instructional Science, 35(2), 141–185.
Macaulay, L. A., & Alabdulkarim, A. (2005). Facilitation of e-Meetings: State-of-the-Art Review. e-Technology, e-Commerce and e-Service (EEE'05).
Moor, A. d., & Aakhus, M. (2006). Argumentation Support: From Technologies to Tools. Communications of the ACM, 49(3), 93.
Nisbet, D. (2004). Measuring the Quantity and Quality of Online Discussion Group Interaction. Journal of eLiteracy, 1, 122–139.
Pervan, G. P., & Atkinson, D. J. (1995). GDSS research: An overview and historical analysis. Group Decision and Negotiation, 4(6), 475–483.
Poole, M. S., Holmes, M., & DeSanctis, G. (1988). Conflict Management and Group Decision Support Systems. Proceedings of Computer Supported Cooperative Work.
Powell, A., Piccoli, G., & Ives, B. (2004). Virtual teams: a review of current literature and directions for future research. ACM SIGMIS Database, 35(1), 6–36.
Rahwan, I. (2008). Mass argumentation and the semantic web. Journal of Web Semantics, 6(1), 29–37.
Reagan-Cirincione, P. (1994). Improving the accuracy of group judgment: a process intervention combining group facilitation, social judgment analysis, and information technology. Organizational Behavior and Human Decision Processes, 58(2), 246–270.
Roberts, J., Hann, I.-H., & Slaughter, S. (2006). Understanding the Motivations, Participation and Performance of Open Source Software Developers: A Longitudinal Study of the Apache Projects. Management Science, 52(7), 984–999.
Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market. Science, 311(5762), 854–856.
Schulz-Hardt, S., Frey, D., Lüthgens, C., & Moscovici, S. (2000). Biased information search in group decision making. Journal of Personality and Social Psychology, 78(4), 655–669.
Shrager, J., Billman, D., Convertino, G., Massar, J. P., & Pirolli, P. (2010). Soccer Science and the Bayes Community: Exploring the Cognitive Implications of Modern Scientific Communication. Topics in Cognitive Science, 2(1), 53–72.
Shum, S. B., Liddo, A. D., Iandoli, L., & Quinto, I. (2012). A Debate Dashboard to Support the Adoption of Online Knowledge Mapping Tools. VINE: The Journal of Information and Knowledge Management Systems.
Shum, S. J. B., Selvin, A. M., Sierhuis, M., Conklin, J., & Haley, C. B. (2006). Hypermedia Support for Argumentation-Based Rationale: 15 Years on from gIBIS and QOC. In A. H. Dutoit, R. McCall, I. Mistrik, & B. Paech (Eds.), Rationale Management in Software Engineering. Springer-Verlag.
Smallman, H. S. (2008). JIGSAW – Joint Intelligence Graphical Situation Awareness Web for collaborative intelligence analysis. In M. P. Letsky, N. Warner, S. Fiore, & C. A. P. Smith (Eds.), Macrocognition in Teams: Theories and Methodologies (pp. 321–337). Aldershot, UK: Ashgate.
Spatariu, A., Hartley, K., & Bendixen, L. D. (2004). Defining and Measuring Quality in Online Discussions. The Journal of Interactive Online Learning, 2(4).
Steenbergen, M. R., Bächtiger, A., Spörndli, M., & Steiner, J. (2003). Measuring political deliberation: a discourse quality index. Comparative European Politics, 1(1), 21–48.
Stromer-Galley, J. (2007). Measuring deliberation's content: A coding scheme. Journal of Public Deliberation, 3(1), 12.
Sunstein, C. R. (2006). Infotopia: How Many Minds Produce Knowledge. Oxford University Press.
Surowiecki, J. (2005). The Wisdom of Crowds. Anchor.
Tapscott, D., & Williams, A. D. (2006). Wikinomics: How Mass Collaboration Changes Everything. Portfolio Hardcover.
Trénel, M. (2004). Measuring the quality of online deliberation. Coding scheme 2.4. Social Science Research Center Berlin, Germany. Available at: www.wz-berlin.de/online-mediation/files/publications/quod_2_4.pdf
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131.
Viegas, F. B., Wattenberg, M., & Dave, K. (2004). Studying cooperation and conflict between authors with history flow visualizations. SIGCHI Conference on Human Factors in Computing Systems.
Walton, D. N. (2005). Fundamentals of Critical Argumentation (Critical Reasoning and Argumentation). Cambridge University Press.
Walton, D. N., & Krabbe, E. C. W. (1995). Commitment in dialogue: Basic concepts of interpersonal reasoning. Albany, NY: State University of New York Press.