An Information Modeling Approach to Improve Quality of User-generated Content

Roman Lukyanenko
Faculty of Business Administration
Memorial University of Newfoundland
August 8, 2014

FACULTY OF BUSINESS ADMINISTRATION

WWW.BUSINESS.MUN.CA

Outline

• Background and motivation
• Research problem
  - Information quality (IQ) in user-generated content (UGC)
  - Limitations of existing approaches
• Proposed approach
  - Contributor-centric, use-agnostic IQ
  - Conceptual modeling as a factor of crowd IQ
• Theoretical propositions
• Empirical evidence
  - Impact of conceptual modeling on accuracy and information loss
• Principles for modeling UGC
  - Demonstration of the proposed principles
  - Impact of conceptual modeling on dataset completeness
• Contributions and future research

Quality of UGC

Background and Motivation

• Traditionally, information systems (IS) are used in well-controlled information production settings
• Increasingly, organizations turn to data produced outside organizational boundaries
  - Social media and crowdsourcing facilitate user-generated content (UGC): various forms of digital information produced by members of the general public, often casual content contributors (the crowd), rather than by employees or others closely associated with an organization

Harnessing UGC

• UGC supports decision making and operations
  - Businesses: better understand customers, develop products
  - Health care: telemedicine, doctor reviews
  - Government: public services, disaster management
  - Scientific research: citizen science

Example: Citizen Science

eBird.org

Research Problem: IQ in UGC

• A major challenge in making effective use of UGC is crowd IQ
  - E.g., the accuracy of a citizen science observation on eBird.org

Source: Gura (2013), Nature

Limitations of existing approaches

• The traditional approach to IQ is ‘fitness for use’
• Popular approaches to crowd IQ
  - Educate and train online users
  - Provide data collection instructions
  - “Clean” data post hoc
• These approaches focus on data consumers and may
  - Dissuade contributors from providing data
  - Prevent contributors from communicating important situational knowledge

Proposed: Contributor-centric, use-agnostic IQ

• IQ from the contributors’ perspective
  - Crowd IQ: “the extent to which stored information represents the phenomena of interest to data consumers, as perceived by information contributors”
    • Use-agnostic, contributor-centric
    • Cognizant of data consumers
• How to design IS sensitive to contributors?
  - Rethink approaches to conceptual modeling

Proposed: Conceptual modeling as a factor of crowd IQ

• Conceptual modeling
  - “describing some aspects of the physical and social world around us for the purposes of understanding and communication” (Mylopoulos 1992)

Research questions

• Research question 1
  - How does conceptual modeling affect IQ in UGC settings?
• Research question 2
  - What conceptual modeling principles can be developed to improve crowd IQ?

Connection between modeling and IQ

• Traditionally, modeling facilitates intended uses via predefined abstractions (e.g., classes)
• Ontology, cognition
  - The world is made of unique individuals
    • Class-based models capture common rather than unique attributes of individuals
  - Classes are observer-dependent and use-driven
    • Crowd contributors and data consumers may not share classes in a domain

Illustration of theoretical propositions

[Figure: an example instance with candidate class labels: Great Egret, Snowy Egret, White Ibis, Bird, Tree, Fish]

P1: Accuracy and completeness are undermined when classes are unfamiliar to contributors
  - Incorrect guess → ↓ accuracy
  - Avoid participating → ↓ dataset completeness

P2: Information loss increases when classes are familiar to contributors
  - Any choice (incl. correct) → ↓ instance completeness (attribute loss)

Impact on Accuracy and Information Loss

• Three laboratory experiments
  - Potential data contributors, biology non-experts
• How to determine “familiar” classes for anonymous crowds?
  - Psychology: basic-level categories
• Two class-based models:
  - “Useful” (biological species)
  - “Familiar” (basic-level categories)

Experiment 1: Free form

• N=247 non-experts (141 female, 106 male)
  - 24 full-color images of plants and animals
  - Condition 1: Classify it: What is it?
  - Condition 2: Describe it using attributes / features
• Free-form responses

Experiment 1: Hypotheses

[Figure: model relating "using classes useful to data consumers" and "using classes familiar to data contributors" to Accuracy and Instance Completeness]

• H-1.1: Accuracy. Non-experts will classify instances with fewer errors at the basic level than at the species-genus level
• H-1.2: Instance Completeness. Non-experts will describe instances in terms of attributes subordinate to the basic level
  - E.g., grey beak, yellow belly (subordinate) vs. can fly, has feathers (basic level)

Results: H-1.1: Accuracy

Class level                          Total   Correct   % correct
Useful classes (e.g., great egret)     141        27       19.15
Familiar classes (e.g., bird)        1,550     1,523       98.26

[Figure: accuracy when using classes useful to data consumers vs. classes familiar to crowd contributors]

avg. p < 0.001*
* Based on Fisher’s exact test; significant with Bonferroni correction
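As a rough illustration of the kind of test named above, the following sketch recomputes the two accuracy percentages from the reported counts and runs a two-sided Fisher's exact test on the pooled 2x2 table. Pooling all responses into one table is an assumption made here for illustration; the slide reports an average p-value over several Bonferroni-corrected comparisons, so this is not a reproduction of the original analysis. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Counts reported on the slide (correct, total) for each class level.
useful_correct, useful_total = 27, 141
familiar_correct, familiar_total = 1523, 1550

useful_pct = round(100 * useful_correct / useful_total, 2)
familiar_pct = round(100 * familiar_correct / familiar_total, 2)

# Pooled table: rows = class level, columns = correct / incorrect (assumption).
p_value = fisher_exact_two_sided(
    useful_correct, useful_total - useful_correct,
    familiar_correct, familiar_total - familiar_correct,
)
print(useful_pct, familiar_pct, p_value < 0.001)
```

The exact-integer binomials from `math.comb` keep the sketch free of external dependencies; a statistics library would normally be used for real analyses.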

Results: H-1.2: Instance Completeness

• Analysis of attributes:
  - 6,429 attributes are below the basic level
    • E.g., gray beak, deformed fin, looks sick
  - 685 attributes are at the basic level
    • E.g., can fly, has feathers

[Figure: instance completeness when using classes useful to data consumers vs. classes familiar to crowd contributors]

avg. p < 0.001*
* Based on χ2 test; significant with Bonferroni correction

Experiment 2: Fixed-choice

• Direct test of data entry with predefined classes
• N=77 non-experts
• Task: select class from predefined list

Experiment 2: Materials

“Useful” Condition
What is it? Select one:
o Arctic Tern
o Bonaparte's Gull
o Caspian Tern
o Common Tern
o Herring Gull
o Iceland Gull
o Killdeer
o Parasitic Jaeger
o Pomarine Jaeger
o I don’t know
o Other ___

“Familiar” Condition (options informed by cognitive psychology)
What is it? Select one:
o Animal
o Common Tern
o Iceland Gull
o Killdeer
o Seagull
o Shorebird
o Tern
o Waterfowl
o Bird
o I don’t know
o Other ___

On the original slide, correct items are shown in bold.

Results: H-2.1: Accuracy

Class level                          Total   Correct   % correct
Useful classes (e.g., great egret)     271        73       24.84
Familiar classes (e.g., bird)          375       277       73.88

[Figure: accuracy when using classes useful to data consumers vs. classes familiar to crowd contributors]

avg. p < 0.01*
* Based on χ2 test; significant with Bonferroni correction

Experiment 3

• Impact of imposing structure on accuracy
  - Challenge to select predefined classes for crowds
• N=66 business students (non-experts)

“Useful” Condition
What is it? Select one:
o Arctic Tern
o Bonaparte's Gull
o Caspian Tern
o …
o Common Tern
o I don’t know
o Other ___

“Familiar” Condition
What is it? Select one:
o Animal
o Tern
o …
o Waterfowl
o Bird
o I don’t know
o Other ___

Free-form Condition
What is it? Write one:

On the original slide, correct items are shown in bold.

Results: H-3.2: Accuracy

Measure          “Useful”   “Familiar”   Free-form
% Accuracy           35.5         66.7        77.3
% Basic-level         0.4         33.3        52.2

[Figure: accuracy under a class-based model with familiar options vs. free-form data collection]

avg. p < 0.05*

• Accuracy does not necessarily increase when “familiar” options are included in a predefined schema

* Based on χ2 test

Findings from Laboratory Studies

• Conceptual modeling is an important factor for crowd IQ
• Dilemma of modeling in UGC:
  - Accuracy declines when classes are driven by data consumer needs
  - Accuracy increases when classes are familiar to contributors
  - But using such classes undermines instance completeness (i.e., results in significant attribute loss)
  - Potential for lower accuracy when using “familiar” classes

Principles of modeling UGC

• Modeling UGC should be based on user- and use-invariant representations
• Instances should be the primary construct in UGC
• Attributes can be attached to an instance
• Classes can be attached to an instance

[Diagram: Instances, Classes, and Attributes linked by associations with multiplicities 0..* and 1..*]
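One way to picture these principles is a small in-memory data model in which the instance is the only mandatory construct, and attributes and classes are optional labels attached to it. The sketch below is an illustrative assumption, not the schema used in the study or in NLNature; the names `Instance`, `ObservationStore`, `record`, `with_class`, and `with_attribute` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Instance:
    """An observed individual: the primary construct."""
    instance_id: int
    attributes: set[str] = field(default_factory=set)  # zero or more
    classes: set[str] = field(default_factory=set)     # zero or more


@dataclass
class ObservationStore:
    """Hypothetical store that never requires a class to accept an instance."""
    instances: list[Instance] = field(default_factory=list)

    def record(self, attributes=(), classes=()) -> Instance:
        inst = Instance(instance_id=len(self.instances) + 1)
        # Normalise free-form text so equal labels compare equal.
        inst.attributes.update(a.strip().lower() for a in attributes)
        inst.classes.update(c.strip().lower() for c in classes)
        self.instances.append(inst)
        return inst

    def with_class(self, name: str) -> list[Instance]:
        return [i for i in self.instances if name.lower() in i.classes]

    def with_attribute(self, name: str) -> list[Instance]:
        return [i for i in self.instances if name.lower() in i.attributes]


store = ObservationStore()
# A contributor who knows a familiar class and some attributes.
store.record(attributes=["gray beak", "white feathers"], classes=["Bird"])
# A contributor who cannot classify the sighting but can still describe it.
store.record(attributes=["gray beak", "long yellow legs"])

print(len(store.with_class("bird")), len(store.with_attribute("gray beak")))
```

Because no class is required at entry time, the second sighting is stored rather than lost, and data consumers can later group instances by whatever attributes or classes suit their use.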

Demonstration of the principles

• NLNature – a real citizen science IS
  - www.nlnature.com

[Figure: contributors observe wildlife; scientists then use the data. The next two slides show NLNature screenshots.]

Impact on dataset completeness

• Field experiment using NLNature
  - Class-based condition (species-only)
  - Instance-based condition
• Hypotheses:
  - H-4.1: More instances observed in the instance-based condition
  - H-4.2: More instances of novel (i.e., not present in the existing schema) species in the instance-based condition

Instance-based condition

Species-only condition

Results: H-4.1

• Period: June to Dec (6 months)

Condition        No. of users in condition   No. of observations
Class-based                             42                    87
Instance-based                          39                   390

[Figure: dataset completeness (instances stored), class-based model vs. instance-based model]

avg. p < 0.01*
* Based on permutation test
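For readers unfamiliar with permutation tests, the sketch below shows the general idea on per-user observation counts. The slides report only totals, so the per-user lists here are made-up toy data, and the choice of test statistic (absolute difference in mean observations per user) is an assumption; this does not reproduce the study's analysis. The function name `permutation_test` is hypothetical.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Monte Carlo two-sided permutation test on the difference in means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_diff(pooled)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign users to conditions
        if mean_diff(pooled) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Means per user implied by the totals on the slide.
class_based_mean = 87 / 42
instance_based_mean = 390 / 39

# Hypothetical per-user counts, for illustration only (not study data).
toy_class_based = [1, 2, 1, 3, 2, 2]
toy_instance_based = [8, 12, 9, 11, 10, 10]
p_value = permutation_test(toy_class_based, toy_instance_based)
print(round(class_based_mean, 2), instance_based_mean, p_value < 0.05)
```

A permutation test suits this setting because observation counts per user are typically skewed, and the test makes no distributional assumption beyond exchangeability of users across conditions.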

Results: H-4.2

• Period: June to Dec (6 months)

Condition        No. of users in condition   No. of new species
Class-based                             42                    7
Instance-based                          39                  119

[Figure: dataset completeness (novel species stored), class-based model vs. instance-based model]

avg. p < 0.01*
* Based on permutation test

Findings from Field Experiment

• Conceptual modeling affects dataset completeness
  - Prevailing class-based approaches may result in lower dataset completeness
  - Existing IS may preclude discovery of new classes of instances
• Potential value of the proposed instance-based approach for modeling UGC

Contributions

• Impact of conceptual modeling on information quality
  - Prevailing class-based modeling may have a detrimental impact on IQ (Lukyanenko et al. 2014)
  - Antecedents of IQ: “a significant gap in the IS research” (Petter et al. 2013, p. 30)
• Contributor-oriented IQ
• Instance-based conceptual modeling
  - More effective ways to harness UGC
  - Exemplar of an “[e]xciting ..work” exploring “new technological environments” (Goes MISQ Editorial 2014, p. vi)

Future work

• Deeper understanding of the impact of modeling on IQ:
  - Information loss
  - Interaction between classification and familiarity
• Contributor-oriented IQ management
  - Impact on decision making (in progress: study with data consumers)
• Beyond citizen science
  - Corporate settings
  - Health IS, telemedicine

Future work (cont’d)

• Extending the instance-based approach to conceptual modeling
  - How to combine it with traditional modeling?
  - Do we need “instance-based” grammars?
    • Lukyanenko and Parsons 2013a
  - How to better manage attribute-based data collection
  - Implications for user interfaces
    • Lukyanenko and Parsons 2013b

References

Goes, P. B. (2014). Editor's comments: design science research in top information systems journals. MIS Quarterly, 38(1), iii-viii.

Gura, T. (2013). Citizen science: amateur experts. Nature, 496(7444), 259-261.

Lukyanenko, R. and Parsons, J. (2013a). Is traditional conceptual modeling becoming obsolete? In W. Ng, V.C. Storey, and J. Trujillo (Eds.), International Conference on Conceptual Modeling (ER 2013), Lecture Notes in Computer Science Vol. 8217, Springer, Heidelberg, pp. 61-73.

Lukyanenko, R. and Parsons, J. (2013b). Reconciling theories with design choices in design science research. In J. vom Brocke et al. (Eds.), International Conference on Design Science Research in Information Systems and Technologies (DESRIST 2013), Lecture Notes in Computer Science Vol. 7939, Springer, Berlin / Heidelberg, pp. 165-180.

Lukyanenko, R., Parsons, J., and Wiersma, Y. F. (2014). The IQ of the crowd: understanding and improving information quality in structured user-generated content. Information Systems Research, 25(4), 669-689.

Petter, S., DeLone, W., and McLean, E. (2013). Information systems success: the quest for the independent variables. Journal of Management Information Systems, 29(4), 7-62.

THANK YOU!