big fat fragmented

Introduction

The Data
Svodín - Southwest Slovakia
Lengyel Culture settlement
• 915 features (122 graves)
• 420 stratigraphic relations (189 ambiguous)
• 34583 recorded finds (11161 diagnostic)
Superimposed by Eneolithic, Bronze Age, Iron Age and Medieval settlements.

Big Data with Big Issues
• Ambiguity
• Heterogeneity
• Fragmentarity
• Multimodality
• Description

Ambiguity

Unclear relations

Ambiguity of relations between find complexes
• We know there is a superposition, but we do not know the sequence.
• This information should not be discarded.

Chronological phasing of 3 features based on stratigraphy
A disturbs B ⇒ A < B
B overlaps C ⇒ B ≠ C

All possible solutions (phase assigned to each feature):

Solution no.   A   B   C
1              3   2   1
2              3   2   3
3              2   1   2
4              2   1   3
5              3   1   2
6              3   1   3

In every solution A falls in a later phase than B, and B and C never share a phase.

Probability of features dating to phases

No. of possible solutions with the feature dating to the phase:

Phase   A   B   C
1       0   4   1
2       2   2   2
3       4   0   3

⇒ Probability = solutions with feature dating to phase / all possible solutions

Feature   Phase 1   Phase 2   Phase 3
A         0%        33%       67%
B         67%       33%       0%
C         17%       33%       50%
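The three-feature example can be checked by brute force. The sketch below enumerates every assignment of features A, B and C to phases 1 to 3 and keeps those that satisfy both relations. It assumes that "A disturbs B" places A in a later phase than B, which is what the solutions table shows, and that "B overlaps C" only forbids B and C from sharing a phase. The function names are illustrative and not taken from the presentation.

```python
from fractions import Fraction
from itertools import product

FEATURES = ("A", "B", "C")
PHASES = (1, 2, 3)


def is_consistent(phase_of):
    """Check one phase assignment against the two stratigraphic relations."""
    # Assumption: "A disturbs B" puts A in a later phase than B,
    # as in the table of possible solutions.
    a_after_b = phase_of["A"] > phase_of["B"]
    # "B overlaps C": the sequence is unknown, but the phases must differ.
    b_differs_from_c = phase_of["B"] != phase_of["C"]
    return a_after_b and b_differs_from_c


def enumerate_solutions():
    """Return every feature-to-phase assignment that breaks no relation."""
    solutions = []
    for phases in product(PHASES, repeat=len(FEATURES)):
        phase_of = dict(zip(FEATURES, phases))
        if is_consistent(phase_of):
            solutions.append(phase_of)
    return solutions


def phase_probabilities(solutions):
    """Share of solutions in which each feature dates to each phase."""
    total = len(solutions)
    return {
        feature: {
            phase: Fraction(sum(1 for s in solutions if s[feature] == phase), total)
            for phase in PHASES
        }
        for feature in FEATURES
    }


solutions = enumerate_solutions()
probabilities = phase_probabilities(solutions)
for feature in FEATURES:
    row = ", ".join(
        f"phase {p}: {float(probabilities[feature][p]):.0%}" for p in PHASES
    )
    print(f"{feature}: {row}")
```

Exhaustive enumeration is only practical for a handful of features, since the number of candidate assignments grows exponentially. For a site with hundreds of features a sampling or constraint-solving approach would be needed instead.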

Fragmentarity

Missing connections
• Fragmentarity results from depositional and postdepositional transformations.
• Absence of evidence ≠ evidence of absence.
• Use external / prior knowledge to fill in the gaps.

Determining superpositions based on prior knowledge
Post structure 1 < Feature A, B, C
Post structure 1 ? Feature D, E, F, G
[Figure: Post structure 1 and Features A to G]

Determining superpositions based on prior knowledge
Post structure 1 ⇒ House 1
Feature D, E ⇒ Pit 1
⇒
Feature D = Feature E
House 1 ≠ Feature F, G, Pit 1
Post structure 1 ≠ Feature D, E, F, G
[Figure: House 1 (Post structure 1), Pit 1 (Features D, E) and Features A, B, C, F, G]
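The inference on this slide can be sketched as two rules: features interpreted as parts of the same structure are treated as contemporary (=), and a "not contemporary" relation (≠) between two structures is passed down to their member features. The sketch assumes that the ≠ relations between House 1 and Feature F, Feature G and Pit 1 are supplied as prior knowledge. The data layout and the function name `derive_relations` are hypothetical.

```python
from itertools import combinations

# Interpretation step: which recorded feature belongs to which structure.
member_of = {
    "Post structure 1": "House 1",
    "Feature D": "Pit 1",
    "Feature E": "Pit 1",
}

# Prior knowledge (assumed input): entities that cannot be contemporary.
not_contemporary = [
    ("House 1", "Feature F"),
    ("House 1", "Feature G"),
    ("House 1", "Pit 1"),
]


def derive_relations(member_of, not_contemporary):
    """Pass structure-level knowledge down to the recorded features."""
    members = {}
    for feature, structure in member_of.items():
        members.setdefault(structure, []).append(feature)

    # Rule 1: features of the same structure are contemporary (=).
    same = set()
    for features in members.values():
        for a, b in combinations(sorted(features), 2):
            same.add((a, b))

    # Rule 2: a relation between structures holds for all their members (≠).
    # An entity without recorded members stands for itself.
    different = set()
    for x, y in not_contemporary:
        for a in members.get(x, [x]):
            for b in members.get(y, [y]):
                different.add(tuple(sorted((a, b))))
    return same, different


same, different = derive_relations(member_of, not_contemporary)
for a, b in sorted(same):
    print(f"{a} = {b}")
for a, b in sorted(different):
    print(f"{a} ≠ {b}")
```

Keeping the interpretation (`member_of`) separate from the prior knowledge (`not_contemporary`) makes it explicit which derived relations depend on which assumption.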

Heterogeneity

Diverse entities

Heterogeneity of data is caused by the different sizes and purposes of settlement features (e.g. pits, graves, ditches). Our model must reflect the complexity of the processes that resulted in the observed evidence. What we consider random depends on the context.

Statistical hypothesis testing
• Parametric tests – assume knowledge of the underlying distribution, e.g. Student's t-test
• Nonparametric tests – based on a simulated random distribution of the data, e.g. permutation and bootstrap tests
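As an illustration of the nonparametric approach, the following is a minimal sketch of a two-sample permutation test on the difference of means. It is a generic textbook form of the test, not necessarily the procedure used for Svodín. The sample values are hypothetical and only stand in for a quantity such as find counts per feature type.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Group labels are shuffled repeatedly, which simulates the distribution
    of the statistic under the hypothesis that the labels do not matter.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical values, for illustration only.
pits = [12, 30, 7, 45, 22, 18, 9, 27]
graves = [3, 5, 2, 8, 4, 6, 1, 7]

p_value = permutation_test(pits, graves)
print(f"estimated p-value: {p_value:.4f}")
```

The test makes no assumption about the shape of the underlying distribution. What is shuffled, however, encodes what is considered random, so the permutation scheme has to be chosen to fit the context.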

Multimodality

Ups and downs
• Development of styles can have multiple peaks in time.
• Models used by ordination methods must reflect this fact.

Unimodal distributions of ceramic attributes in time
[Figure: frequency ratio over time; panels labelled Svodín and Troy, showing Ware 662 and Type G1]

Multimodal distributions of ceramic attributes in time
[Figure: frequency ratio over time; panels labelled Svodín and Troy, showing Type Tasse-1 and Ware 616]

Unimodal distributions of frequencies in time
Mode A < Mode B ⇒ Type A < Type B
[Figure: frequency curves of Type A and Type B with Mode A and Mode B]

Multimodal distributions of frequencies in time
Using methods assuming unimodal distributions (Seriation, Correspondence Analysis, Principal Component Analysis, etc.):
Mode A < Mode B ⇒ Type A < Type B (ERROR)
[Figure: frequency curves of Type A and Type B with Mode A and Mode B]

Multimodal distributions of frequencies in time
Using methods assuming multimodal distributions:
Mode A2 > Mode B1 ⇒ Type A ≮ Type B
[Figure: frequency curves of Type A (Modes A1, A2) and Type B (Modes B1, B2)]

Multimodal distributions of frequencies in time
Using methods assuming multimodal distributions:
Mode A2 < Mode B1 ⇒ Type A < Type B
[Figure: frequency curves of Type A (Modes A1, A2) and Type B (Modes B1, B2)]
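The difference between the two assumptions can be sketched with a simple peak comparison. The sketch treats every local maximum of a frequency curve as a mode and accepts Type A < Type B only when the last mode of A lies before the first mode of B, following the Mode A2 < Mode B1 condition. Comparing only the highest peaks is used as a stand-in for the unimodal assumption; it does not reproduce what seriation or correspondence analysis compute. All frequency values are hypothetical.

```python
def find_modes(frequencies):
    """Return the indices of local maxima in a frequency curve."""
    modes = []
    for i, value in enumerate(frequencies):
        left = frequencies[i - 1] if i > 0 else float("-inf")
        right = frequencies[i + 1] if i < len(frequencies) - 1 else float("-inf")
        if value > left and value > right:
            modes.append(i)
    return modes


def precedes_multimodal(freq_a, freq_b):
    """Type A < Type B only if the last mode of A precedes the first mode of B."""
    modes_a, modes_b = find_modes(freq_a), find_modes(freq_b)
    if not modes_a or not modes_b:
        raise ValueError("both curves need at least one mode")
    return max(modes_a) < min(modes_b)


def precedes_unimodal(freq_a, freq_b):
    """Stand-in for the unimodal assumption: compare the highest peaks only."""
    return freq_a.index(max(freq_a)) < freq_b.index(max(freq_b))


# Hypothetical frequency ratios over ten time steps.
type_a = [0.0, 0.1, 0.3, 0.1, 0.0, 0.2, 0.05, 0.0, 0.0, 0.0]        # modes at 2 and 5
type_b_late = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.3, 0.1, 0.2]    # modes at 7 and 9
type_b_early = [0.0, 0.0, 0.0, 0.2, 0.05, 0.0, 0.0, 0.3, 0.1, 0.0]  # modes at 3 and 7

print(precedes_multimodal(type_a, type_b_late))   # A2 < B1
print(precedes_multimodal(type_a, type_b_early))  # A2 > B1
print(precedes_unimodal(type_a, type_b_early))    # highest peaks only
```

For `type_b_early` the two checks disagree: the highest-peak comparison puts Type A before Type B, although the second mode of A comes after the first mode of B.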

Description system

Who is who
• Graphs can be used as a method of storage.
• Relations between entities hold the most important information about them.
• Graph databases (sets of Entity – Relation – Entity triplets) are a more natural way to describe archaeological data.

Graph Database
Features, Finds and Interpreted Structures = Vertices
Contextual and stratigraphic relations = Edges
e.g.:
Feature ⇒ Contains ⇒ Find
Feature ⇒ Disturbs ⇒ Feature
(Vertex ⇒ Edge ⇒ Vertex)
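A set of Entity – Relation – Entity triplets can be sketched in a few lines of plain Python. The class below is a minimal in-memory illustration, not the storage system used for the Svodín data. The class name `TripleStore`, its methods and all identifiers such as "Feature 12" are hypothetical.

```python
class TripleStore:
    """Minimal in-memory store of (vertex, edge, vertex) triplets."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, relation, obj):
        """Record one directed relation between two entities."""
        self._triples.add((subject, relation, obj))

    def objects(self, subject, relation):
        """Entities that `subject` points to through `relation`."""
        return sorted(o for s, r, o in self._triples if s == subject and r == relation)

    def subjects(self, relation, obj):
        """Entities that point to `obj` through `relation`."""
        return sorted(s for s, r, o in self._triples if r == relation and o == obj)


store = TripleStore()
# Hypothetical identifiers, for illustration only.
store.add("Feature 12", "Contains", "Find 101")
store.add("Feature 12", "Contains", "Find 102")
store.add("Feature 12", "Disturbs", "Feature 7")
store.add("Feature 30", "Disturbs", "Feature 7")

print(store.objects("Feature 12", "Contains"))  # finds from one feature
print(store.subjects("Disturbs", "Feature 7"))  # features disturbing Feature 7
```

Contextual and stratigraphic relations share one structure here, so a new relation type is added as a new edge label without changing the data model.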

Conclusions
• Accept and work with uncertainty.
• Clearly define and apply all prior knowledge to fill in the gaps in evidence.
• Provide the simplest explanation of the evidence, without ignoring the complexity of the underlying structures.
• Use appropriate models for analysis.

[email protected] http://uniba.academia.edu/demjan