Collaborative Learning in Networks And Web-Based Experiments
Duncan Watts Yahoo! Research
Outline of Talk • Introduce substantive problem of collaborative learning in networks • Describe a recent web-based experiment that examines a simple version of the general problem • Discuss potential for web-based experiments in general, and potential for “experimental macro-sociology”
Complex Problems • Many problems in science, business, and engineering are “complex” in the sense that they exhibit: – Multiplicity of potential solutions • In any given competitive situation, many designs/strategies/procedures are conceivably optimal
– Interdependence among different parts of a solution • Changing the value of one parameter can have different effects depending on values of other parameters
• Complexity is often represented abstractly as a “fitness landscape”: a mapping between the parameters of a potential solution and its corresponding “fitness” (Kauffman, 93; Levinthal, 97) – “Simple” problems have smooth fitness landscapes, with a single peak – “Complex” problems have “rugged” landscapes with many peaks, separated by valleys
Exploration vs. Exploitation • Successfully navigating a rugged fitness landscape requires some balance between exploitation of known solutions, and exploration for potentially better solutions (March 1991) – Too much exploitation leads to suboptimal long-run performance – Too much exploration is costly and forgoes short-run advantages of exploitation
• Exploration-exploitation tradeoff is standard element of all complex optimization algorithms in CS, Stat Mech – MCMC algorithms, Markov decision processes, genetic algorithms, etc.
• Here we are less interested in how to solve complex problems optimally (in an algorithmic sense) than how people/organizations actually solve them – Unclear what “real” fitness landscapes look like – Unclear what the problem solvers know about the landscapes they are navigating
Exploration-Exploitation Tradeoff Arises in Many Forms • Rational Search (e.g. Radner) – N projects with unknown payoff distributions – Must choose between learning current distribution vs. exploring new ones
• Boundedly Rational Search – Like above, but with cognitive biases • Satisficing (Simon) • Prospect Theory (Kahneman and Tversky)
• Organizational Learning (March 1991) – Refinement vs. Invention – Basic Science vs. Development
• Evolutionary Models (in particular, of organizations) – Variation vs. Selection (Hannan and Freeman)
Collaborative Learning In Communication Networks • In many contexts, the exploration-exploitation tradeoff is complicated by the presence of other problem solvers – Potentially helpful because individuals can learn from the experience of others, thus improving collective learning – Potentially harmful because learning from others may also lead the collective to converge on a suboptimal solution
• Information flow within an organization therefore likely to impact its problem solving abilities (Leavitt, 1951; Lazer and Friedman, 2007; Mason et al., 2008)
Individual vs. Collective Learning • In a network context, not all individuals are equal – Individuals with “central” or “bridging” positions stand to gain from exposure to novel information, complementary ideas, or brokerage opportunities (Granovetter, Burt)
• Again, unclear whether differences in network positions across individuals are good or bad for collective performance – Conceivably central or bridging actors can produce efficiencies that are shared by all – But opportunity to gain relative advantage may also lead to conflict between individual and collective interests
The Current Project • Substantive Questions – How do individuals collaboratively solve (certain kinds of) “complex” problems? – How does the structure of the communication network between them contribute to their collective performance? – How does individual position in the network relate to • Individual strategy and performance? • Collective performance?
• Our approach mostly experimental – Seek to exploit recent advances in web-based experimentation, esp. Amazon’s Mechanical Turk (AMT)
• But have also verified experiments with simulations – Not discussed today
Screenshot of Experiment
Generating The Fitness Landscape • Background generated with 4-octave 2D Perlin noise – procedure for generating pseudo-random noise, used to create realistic looking landscapes
• Added to a unimodal 2D Gaussian function with mean chosen uniformly at random and SD = 3 • Normalized so maximum points = 100
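The landscape recipe above can be sketched roughly as follows. This is a minimal illustration, not the code used in the experiment: the Perlin-style noise routine, the base frequency (`1/32`), the `noise_weight` mixing factor, and the min-max rescaling to the range 0 to 100 are all assumptions; the slide only specifies 4 octaves, a Gaussian with SD = 3 and a uniformly random mean, and a maximum of 100 points.

```python
import math
import random

def make_perlin(seed):
    """Return a 2D Perlin-style gradient-noise function noise(x, y)."""
    rng = random.Random(seed)
    perm = list(range(256))
    rng.shuffle(perm)
    perm += perm  # duplicated so perm[a] + b never runs off the end
    grads = [(math.cos(a), math.sin(a))
             for a in (rng.uniform(0, 2 * math.pi) for _ in range(256))]

    def fade(t):  # smooth interpolation weight
        return t * t * t * (t * (t * 6 - 15) + 10)

    def dot_grad(ix, iy, x, y):
        g = grads[perm[perm[ix & 255] + (iy & 255)]]
        return g[0] * (x - ix) + g[1] * (y - iy)

    def noise(x, y):
        x0, y0 = math.floor(x), math.floor(y)
        u, v = fade(x - x0), fade(y - y0)
        top = dot_grad(x0, y0, x, y)
        top += u * (dot_grad(x0 + 1, y0, x, y) - top)
        bottom = dot_grad(x0, y0 + 1, x, y)
        bottom += u * (dot_grad(x0 + 1, y0 + 1, x, y) - bottom)
        return top + v * (bottom - top)

    return noise

def make_landscape(size=100, octaves=4, sd=3.0, noise_weight=0.5, seed=0):
    """Perlin background plus one Gaussian peak, rescaled so the maximum is 100."""
    rng = random.Random(seed)
    noise = make_perlin(seed)
    cx, cy = rng.uniform(0, size - 1), rng.uniform(0, size - 1)  # peak location
    grid = []
    for y in range(size):
        row = []
        for x in range(size):
            # Each octave doubles the frequency and halves the amplitude.
            n = sum(noise(x * 2 ** o / 32.0, y * 2 ** o / 32.0) / 2 ** o
                    for o in range(octaves))
            g = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sd ** 2))
            row.append(noise_weight * n + g)
        grid.append(row)
    lo, hi = min(map(min, grid)), max(map(max, grid))
    # Assumed normalization: min-max rescale so the best cell is worth 100 points.
    return [[100 * (v - lo) / (hi - lo) for v in row] for row in grid]
```

With these assumed weights the Gaussian peak is usually, but not always, the global maximum; the relative scale of peak and background is the knob the experimenters describe tuning.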
Generating the Networks • Goal: 16-node fixed-degree graphs with extreme statistics • Start with fixed-degree random graphs – All players have same amount of information – Only position in graph can affect success
• Rewire to increase or decrease some graph feature – Maximum, Average, Variance – Betweenness, Closeness, Clustering, Network Constraint – Ensuring connected graph
• Stop when no rewiring improves feature • Repeat 100 times, keep maximal graph
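The rewiring procedure above might look something like the sketch below. It is an approximation under stated assumptions: degree-preserving edge swaps stand in for "rewire", a `patience` counter of consecutive failed swaps stands in for "stop when no rewiring improves feature", and average clustering is used as the example feature. All function names are hypothetical.

```python
import random
from itertools import combinations

def to_adj(n, edges):
    adj = {v: set() for v in range(n)}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    return adj

def is_connected(adj):
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)

def random_regular_graph(n, k, rng):
    """Pair up stubs at random; retry until the graph is simple and connected."""
    while True:
        stubs = [v for v in range(n) for _ in range(k)]
        rng.shuffle(stubs)
        edges = {frozenset(stubs[i:i + 2]) for i in range(0, n * k, 2)}
        if len(edges) == n * k // 2 and all(len(e) == 2 for e in edges):
            if is_connected(to_adj(n, edges)):
                return edges

def avg_clustering(adj):
    total = 0.0
    for nbrs in adj.values():
        pairs = list(combinations(nbrs, 2))
        if pairs:
            total += sum(b in adj[a] for a, b in pairs) / len(pairs)
    return total / len(adj)

def hill_climb(n=16, k=3, feature=avg_clustering, patience=500, rng=None):
    """Accept degree-preserving swaps that raise `feature` and keep the graph connected."""
    rng = rng or random.Random()
    edges = random_regular_graph(n, k, rng)
    best, fails = feature(to_adj(n, edges)), 0
    while fails < patience:  # assumed stopping rule
        fails += 1
        e1, e2 = rng.sample(sorted(edges, key=sorted), 2)
        (a, b), (c, d) = sorted(e1), sorted(e2)
        if rng.random() < 0.5:
            c, d = d, c
        new1, new2 = frozenset((a, c)), frozenset((b, d))
        if len({a, b, c, d}) < 4 or new1 in edges or new2 in edges:
            continue  # swap would create a self-loop or a duplicate edge
        cand = (edges - {e1, e2}) | {new1, new2}
        adj = to_adj(n, cand)
        if is_connected(adj) and feature(adj) > best:
            edges, best, fails = cand, feature(adj), 0
    return edges, best

def best_of_restarts(restarts=100, seed=0, **kw):
    """Repeat the climb from fresh random graphs and keep the most extreme result."""
    rng = random.Random(seed)
    return max((hill_climb(rng=rng, **kw) for _ in range(restarts)),
               key=lambda r: r[1])
```

To minimize a feature instead (the "Smallest ..." networks), one could pass `feature=lambda adj: -avg_clustering(adj)`.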
Network Features of Interest • Clustering: – Average probability that two neighbors are themselves connected (local density)
• Betweenness – Number of shortest paths that pass through node
• Closeness – Average shortest path to all other nodes
• Network Constraint – Redundancy with neighbors
$$nc(i) = \frac{1}{k^2} \sum_{j \in N(i)} \Bigl(1 + \sum_{q \in N(i),\, q \neq j} w_{qj}\Bigr)^2$$
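As a rough illustration of the node-level measures defined above, the following sketch computes closeness, betweenness, and network constraint on an adjacency-set graph. Two choices here are assumptions rather than something the slides state: betweenness is computed in the usual fractional form (share of shortest paths between each pair that pass through the node), and the weight w_qj is taken to be 1/degree(q) when q and j are connected and 0 otherwise, which matches Burt's proportional tie strengths for unweighted graphs.

```python
from collections import deque

def bfs(adj, s):
    """Distances from s, and the number of shortest paths from s to each node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
    return dist, sigma

def closeness(adj, i):
    """Average shortest-path length from i to every other node (connected graph)."""
    dist, _ = bfs(adj, i)
    return sum(dist.values()) / (len(adj) - 1)

def betweenness(adj, i):
    """Sum over node pairs (s, t) of the share of shortest s-t paths through i."""
    dist_i, sigma_i = bfs(adj, i)
    others = [v for v in adj if v != i]
    total = 0.0
    for idx, s in enumerate(others):
        dist_s, sigma_s = bfs(adj, s)
        for t in others[idx + 1:]:
            if dist_s[i] + dist_i[t] == dist_s[t]:
                total += sigma_s[i] * sigma_i[t] / sigma_s[t]
    return total

def constraint(adj, i):
    """nc(i) = (1/k^2) * sum_j (1 + sum_{q != j} w_qj)^2 over neighbours j, q of i."""
    k = len(adj[i])
    total = 0.0
    for j in adj[i]:
        # Assumed weight: w_qj = 1/deg(q) if q and j are connected, else 0.
        indirect = sum(1.0 / len(adj[q]) for q in adj[i] if q != j and j in adj[q])
        total += (1 + indirect) ** 2
    return total / k ** 2
```

For example, on a triangle every node has constraint 1.125, while the centre of a three-leaf star has constraint 1/3: redundant ties among neighbours raise the value.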
Communication Networks
Greatest Average Betweenness
Smallest Average Betweenness
Smallest Maximum Closeness
Greatest Maximum Closeness
Greatest Average Clustering
Smallest Average Clustering
Greatest Maximum Betweenness
Greatest Variance in Constraint
All Individuals in all networks have 3 neighbors, All Individuals have the same view of the world
Experiments • For each session, 16 subjects are recruited from Amazon’s Mechanical Turk – Standing panel alerted previous day – Accept work & read instructions – Sit in “waiting room” until enough players have joined
• Each session comprises 8 games – One for each network topology
• Each game runs for 15 rounds – 100 x 100 grid – Relative dimensions of peak and landscape adjusted such that peak is found sometimes, but not always
• 171 out of total of 232 games (25 sessions) used – 61 games removed because player dropped out
Preview of Results 1. Network structure affects individual search strategy 2. Individual search strategy affects group success 3. Network structure also affects group success directly, via information diffusion 4. Individual and group performance are in tension
Network structure affects individual search strategy • Networks differ in amount of exploration • Related to clustering
Network structure affects individual search strategy • Higher clustering → Higher probability of neighbors guessing in identical location • More neighbors guessing in identical location → Higher probability of copying
Individual search strategy affects group success • More players copying each other (i.e., fewer exploring) in current round → Lower probability of finding peak on next round
Individual search strategy affects group success • No significant differences in % of games in which peak was found • But pattern similar to differences in exploration
Network structure affects group success
Diffusion of Best Solution
Individual and group performance are in tension • More likely to find peak with more players exploring (= fewer players copying) • When peak is found, large difference in points (nearly 2x income)
Individual and group performance are in tension BUT: • Per player, higher points on average with “copying” strategy • → free-riding problem / social dilemma
Individual and group performance are in tension
• Most successful node in structured networks ~ performance of median node in unstructured networks
Individual Performance Is Combination of Individual Position and Collective Performance
Greatest Maximum Betweenness
Greatest Average Betweenness
Greatest Maximum Closeness
Summary of Results • Network structure affects individual search strategy – Clustering increases copying (exploitation)
• Individual search strategy affects group success – More copying means lower probability of finding maximum
• Network structure affects group success directly – Networks with lower average path length spread information more quickly
• Individual and group performance are in tension – Individuals can improve own relative performance by free-riding on others’ exploration – Best position in structured networks as good as average position in unstructured networks
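The link between path length and diffusion speed in the summary can be illustrated with a toy model. The sketch below assumes an idealized rule that is not the experimental data: once one node holds the best solution, every neighbour adopts it on the next round. Under that rule the number of rounds to full adoption equals the longest shortest path from the discoverer. The two example graphs (a 16-node ring, and the same ring with a chord to the opposite node) are illustrative choices and differ in degree, unlike the experimental networks.

```python
from collections import deque

def distances(adj, s):
    """Breadth-first shortest-path distances from s."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def avg_path_length(adj):
    n = len(adj)
    return sum(sum(distances(adj, s).values()) for s in adj) / (n * (n - 1))

def rounds_to_full_adoption(adj, source):
    """Each round, every neighbour of an informed node adopts the best solution.

    Assumes a connected graph; otherwise the loop would not terminate.
    """
    informed, rounds = {source}, 0
    while len(informed) < len(adj):
        informed |= {w for v in informed for w in adj[v]}
        rounds += 1
    return rounds

def ring(n, chord=None):
    """Ring of n nodes; optionally link each node to the one `chord` steps away."""
    adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
    if chord:
        for v in range(n):
            adj[v].add((v + chord) % n)
            adj[(v + chord) % n].add(v)
    return adj

plain, shortcut = ring(16), ring(16, chord=8)
print(avg_path_length(plain), rounds_to_full_adoption(plain, 0))
print(avg_path_length(shortcut), rounds_to_full_adoption(shortcut, 0))
```

In this toy setting the ring with shortcuts has the lower average path length and reaches full adoption in half as many rounds.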
What About “Real” Problem Solving? • Our setup is artificial in several respects: – Real-world problems unlikely to comprise just two dimensions • NK model may be better here (Kauffman, Levinthal, Lazer/Friedman)
– Unclear how to interpret ruggedness of fitness landscape • NK model also has this problem
– Networks do not resemble organizational networks • No hierarchy, division of labor etc.
– Incentives are also unrealistic • Little strategic play, no competition, etc.
• One should therefore be cautious inferring much about managerial or strategic questions from our results
• Nevertheless, similar findings have emerged in other studies – March (1991) identified similar “dilemma” for individuals – Lazer and Friedman found that short path length → rapid convergence
• Also platform should generalize to more realistic scenarios
Web-Based Experimentation • Our project adds to a small but growing body of web-based experiments – Salganik, Dodds, and Watts (2006) – Mason and Watts (2009) – Paolacci et al. (2010), Horton et al. (2010) – Suri and Watts (2010)
• Major recent innovation has been use of standing panel to run synchronous experiments • Allows for three major advances over physical labs – Possible to scale up to much larger networks – Speedup of hypothesis-testing loop – Selection of individuals based on past play
Experimental Macro-Sociology? • “Can you put an army in a lab?” (Zelditch, 1969) – At the time, the answer was “No” – Led to emphasis on small-group research
• The web is removing this constraint – Synchronous play and sampling are also getting resolved – Also growing evidence that people “play” similarly on the web as they do in physical labs (Suri and Watts ‘10, Paolacci ‘10)
• In the near future, will increasingly see large-scale, networked, lab-style experiments in which micro- and macro-variables can be manipulated and observed
• Still unclear how many experiments of potential interest could be conducted on the web – Most things that a real army does are still not online – Same true for many problems of interest to economic and organizational sociology – But this should be viewed as a challenge
Requisite Book Plug… • Why are social phenomena so complex and unpredictable? • Why do we still feel we can predict and control them? • What could we do better? – In business, government, science www.everythingisobvious.com
Thank you! Questions?
Backup Slides Agent Based Simulation and Comparison with Experiments
Agent Based Model, Based on Real Agents • Extract individual playing strategies • Build agent-based simulation where agents play like “real” players • Explore problem space to discover new hypotheses – More complex landscapes – Different composition of individual strategies – Larger networks
• Return to experiments to test hypotheses
Exploration vs. Exploitation Probability of exactly copying / guessing within 5 units from neighbor given maximum has not yet been found
Simulation Details • Fit linear model to users’ probability of copying by round • Obtain distribution of slopes & intercepts • On each round: – If agent or neighbors have score = 100, copy – If agent or neighbors have 60 < score < 100, guess within 3 units of score – Else, copy highest score with probability based on intercept, slope & round or explore uniformly at random
• 100 simulated sessions (800 simulated games)
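The per-round agent rule above could be sketched as follows. Several details are assumptions: "guess within 3 units of score" is read as guessing within 3 grid units of the best-scoring visible location, the copy probability is taken to be `intercept + slope * round` clipped to [0, 1], and the linear model is fitted by ordinary least squares. The function and parameter names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def next_guess(round_no, visible, intercept, slope, size=100, rng=random):
    """Choose the agent's next grid location.

    visible: list of ((x, y), score) pairs for the agent's own last guess
    and its neighbours' last guesses.
    """
    (bx, by), best = max(visible, key=lambda g: g[1])
    if best >= 100:
        return (bx, by)  # the maximum has been found: copy it
    if best > 60:
        # Assumed reading: search within 3 grid units of the best location.
        return (clamp(bx + rng.randint(-3, 3), 0, size - 1),
                clamp(by + rng.randint(-3, 3), 0, size - 1))
    # Otherwise copy with a round-dependent probability, else explore.
    p_copy = clamp(intercept + slope * round_no, 0.0, 1.0)
    if rng.random() < p_copy:
        return (bx, by)
    return (rng.randrange(size), rng.randrange(size))
```

In a full simulation, each agent would draw its own `(intercept, slope)` pair from the fitted distribution and call `next_guess` once per round for 15 rounds.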
Finding the maximum • 100 simulated sessions (800 simulated games) • Maximum is found by at least one agent in 59% of simulated games [human players: 63%] • Maximum is found by all agents in 49% of simulated games [human players: 56%]
Frequency of Finding Maximum
Human Experiments
Simulations
Networks Affect Convergence Time
Simulation Human Players
• Replicates findings from experimental work • Suggests model of player behavior is reasonable
Individual Performance Is Combination of Individual Position and Collective Performance
Greatest Maximum Betweenness
Greatest Average Betweenness
Greatest Maximum Closeness
• Individuals in centralized networks perform well, relative to their peers • All individuals in centralized networks perform poorly relative to individuals in decentralized networks • Corroborates experimental results
Next Steps • Explore problem space to discover new hypotheses – More complex payoff functions – Larger networks – Different composition of individual strategies
• Realistic model, but may be over-fit – Point threshold & imitation radius learned from known features of payoff functions – Copying / round depends on N rounds
• Return to experiments to test hypotheses