Spatio-Temporal Data Warehouses: Current Status and Research Issues

Esteban ZIMÁNYI
Department of Computer & Decision Engineering (CoDE)
Université Libre de Bruxelles
[email protected]

Invited Talk, 19th International Symposium on Temporal Representation and Reasoning
Leicester, UK, 12-14 September 2012

Contents

_ Background (current section)
  • Data Warehouses and OLAP
  • Nested Relational Calculus
  • Temporal Aggregation
_ Spatio-Temporal Data Warehouses
_ From Spatio-Temporal Data to Trajectory Data
_ Conclusions & Future Work

Data Warehouses

_ Operational databases (OLTP) are not suitable for data analysis
  • Contain detailed data
  • Do not include historical data
  • Perform poorly for complex queries due to normalization
_ Data Warehouses (DW) address requirements of decision-making users ⇒ Business Intelligence
_ A data warehouse is a collection of subject-oriented, integrated, nonvolatile, and time-varying data to support management decisions
_ Online analytical processing (OLAP) allows decision-making users to perform interactive analysis of data
_ Data Warehouses typically store huge amounts of data

Typical Data Warehouse Architecture

[Figure: data warehouse architecture, organized in tiers]
_ Data sources: internal sources, operational databases, external sources
_ Back-end tier: data staging, ETL process
_ Data warehouse tier: enterprise data warehouse, data marts, metadata
_ OLAP tier: OLAP server
_ Front-end tier: OLAP tools, reporting tools, statistical tools, data mining tools

Conventional Data Warehouses: Example

[Figure: multidimensional schema]
_ Fact: Water Pollution (measure: load)
_ Levels
  • Time (date, season, ...), Month (month, ...), Quarter (quarter, ...), Year (year, ...)
  • District (name, population, area, ...), Province (name, majorActivity, capital, ...)
  • River (name, flow, ...)
  • Station (name, ...)
  • Pollutant (name, type, loadLimit, ...), Category (name, description, ...)
_ Hierarchies: Calendar, Geo location, Categories

Nested Relational Calculus

Schema:
  District(id, name, population, area, ..., province)
  Province(id, name, majorActivity, capital, ...)

_ Simple query: Name and population of districts of the Antwerp province
  {d.name, d.population | District(d) ∧
    ∃p (Province(p) ∧ d.province = p.id ∧ p.name = ‘Antwerp’)}
_ Aggregation query with group filtering: Total population by province provided that it is greater than 100,000
  {p.name, totalPop | Province(p) ∧
    totalPop = sum_2({d.id, d.population | District(d) ∧ d.province = p.id}) ∧
    totalPop > 100,000}
  (the subscript of an aggregate function gives the position of the attribute being aggregated)
_ Corresponds to an SQL query with the GROUP BY and HAVING clauses
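The aggregation query with group filtering can be sketched in plain Python. This is only an illustration of the GROUP BY / HAVING reading of the calculus expression: the sample rows, the function name `total_population_by_province`, and the dictionary layout are assumptions made for the example, not part of the talk.

```python
from collections import defaultdict

# Hypothetical sample data; attribute names follow the District/Province schema.
provinces = [
    {"id": 10, "name": "Antwerp"},
    {"id": 20, "name": "Namur"},
]
districts = [
    {"id": 1, "name": "D1", "population": 80_000, "province": 10},
    {"id": 2, "name": "D2", "population": 50_000, "province": 10},
    {"id": 3, "name": "D3", "population": 60_000, "province": 20},
]


def total_population_by_province(provinces, districts, threshold):
    """Sum district populations per province (GROUP BY) and keep only
    provinces whose total exceeds the threshold (HAVING)."""
    totals = defaultdict(int)
    for d in districts:
        totals[d["province"]] += d["population"]
    return {
        p["name"]: totals[p["id"]]
        for p in provinces
        if totals[p["id"]] > threshold
    }


result = total_population_by_province(provinces, districts, 100_000)
```

With this sample data only Antwerp passes the filter, since Namur totals 60,000.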

Temporal Aggregation [Gamper et al. 2009]

_ Instantaneous Temporal Aggregation
  • To each time instant t is associated an aggregation group valid at t
  • Aggregation function applied to each group ⇒ a single aggregate value at t
  • Span Temporal Aggregation: Similar, at a coarser granularity
_ Moving Window Temporal Aggregation
  • To each time instant t is associated a window w = [t−w, t+w′] and an aggregation group valid at w
  • Aggregation function applied to each group ⇒ a single aggregate value at t
  • Multi-Dimensional Temporal Aggregation: Similar, at a coarser granularity
_ Equivalent to temporal group composition and temporal partition composition in [Vega López et al. 2005]
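The difference between the two basic forms can be illustrated with a small Python sketch. It assumes, purely for the example, that time instants are integers and that each tuple carries a closed validity interval; the function names and the sample readings are hypothetical.

```python
def instantaneous_aggregate(tuples, t, f):
    """Aggregate the values of all tuples valid at instant t.

    tuples: list of (value, start, end) with a closed interval [start, end].
    Returns None when no tuple is valid at t.
    """
    group = [v for v, start, end in tuples if start <= t <= end]
    return f(group) if group else None


def moving_window_aggregate(tuples, t, w, w_prime, f):
    """Aggregate the values of all tuples whose validity interval
    overlaps the window [t - w, t + w_prime]."""
    lo, hi = t - w, t + w_prime
    group = [v for v, start, end in tuples if start <= hi and end >= lo]
    return f(group) if group else None


# Hypothetical temporal table: (value, start, end)
readings = [(10, 1, 3), (20, 2, 5), (30, 6, 8)]

at_2 = instantaneous_aggregate(readings, 2, sum)          # tuples 10 and 20 are valid
window_4 = moving_window_aggregate(readings, 4, 1, 2, sum)  # window [3, 6] touches all three
```

Span and multi-dimensional aggregation would apply the same idea to coarser, fixed partitions of the time line instead of to each instant.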

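Temporal Aggregation

[Figure: four panels, each showing an aggregation function f applied to the values v of a temporal table along the time axis]
_ Instantaneous Temporal Aggregation: at instant t
_ Span Temporal Aggregation: over the span t1 ... tn
_ Moving Window Temporal Aggregation: over the window t−w ... t+w′
_ Multi-Dimensional Temporal Aggregation: over t1 ... tn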

OLAP Queries: Instantaneous Temporal Aggregation

[Figure: schema fragment. Fact: Water Pollution (measure: load). Levels: Pollutant (name, type, loadLimit, ...), Time (date, season, ...), Station (name, ...).]

_ By pollutant and day, count the number of stations that exceeded the load limit
  {p.name, t.date, countExc | Pollutant(p) ∧ Time(t) ∧
    countExc = count_1({w.station | WaterPollution(w) ∧ w.pollutant = p.id ∧
      w.time = t.id ∧ w.load > p.loadLimit})}

OLAP Queries: Moving Window Temporal Aggregation

[Figure: schema fragment. Fact: Water Pollution (measure: load). Levels: Pollutant (name, type, loadLimit, ...), Time (date, season, ...), Station (name, ...).]

_ By station, pollutant, and day, give the 3-day moving average of load
  {s.name, p.name, t.date, 3dMovAvg | Station(s) ∧ Pollutant(p) ∧ Time(t) ∧
    3dMovAvg = avg_2({w.id, w.load | WaterPollution(w) ∧
      w.station = s.id ∧ w.pollutant = p.id ∧
      ∃t1 (Time(t1) ∧ w.time = t1.id ∧
        0 ≤ t.date − t1.date ∧ t.date − t1.date ≤ 3)})}
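The moving-window query can be mimicked over a flat list of measurements. The sketch below follows the window condition exactly as written in the calculus expression (0 ≤ t.date − t1.date ≤ 3, that is, day t and the three days before it); the record layout, the function name `moving_average`, and the sample values are assumptions for illustration.

```python
from datetime import date, timedelta

# Hypothetical fact rows: one load measurement per station, pollutant, and day.
measurements = [
    {"station": "S1", "pollutant": "Lead", "date": date(2008, 10, 1), "load": 4.0},
    {"station": "S1", "pollutant": "Lead", "date": date(2008, 10, 2), "load": 6.0},
    {"station": "S1", "pollutant": "Lead", "date": date(2008, 10, 4), "load": 8.0},
    {"station": "S1", "pollutant": "Lead", "date": date(2008, 10, 10), "load": 100.0},
]


def moving_average(measurements, station, pollutant, day, days_back=3):
    """Average load over the window [day - days_back, day] for one
    station and pollutant; None if the window holds no measurement."""
    loads = [
        m["load"]
        for m in measurements
        if m["station"] == station
        and m["pollutant"] == pollutant
        and timedelta(0) <= day - m["date"] <= timedelta(days=days_back)
    ]
    return sum(loads) / len(loads) if loads else None


avg_oct_4 = moving_average(measurements, "S1", "Lead", date(2008, 10, 4))
avg_oct_10 = moving_average(measurements, "S1", "Lead", date(2008, 10, 10))
```

Evaluating the function for every (station, pollutant, day) combination yields the full query result; a real OLAP engine would instead use a window operator.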

Contents

_ Background
_ Spatio-Temporal Data Warehouses (current section)
  • Spatial Data Warehouses
  • Temporal Data Warehouses
  • Spatio-Temporal Data Warehouses
  • Support of Continuous Fields
_ From Spatio-Temporal Data to Trajectory Data
_ Conclusions & Future Work

Spatio-Temporal Data Warehouses

_ Several proposals aim at extending DW and OLAP with spatial/temporal features
_ No commonly agreed definition of what a spatio-temporal DW is and what functionality it should support
_ Proposed solutions vary considerably in the kind of data that can be represented and the kind of queries that can be expressed
_ [Vaisman & Zimányi 2009] defined
  • Conceptual framework for spatio-temporal DWs using an extensible type system
  • Taxonomy of several classes of queries of increasing expressive power extending the tuple relational calculus with aggregate functions [Klug 1982]
_ This provides the underlying basis for implementing spatio-temporal DWs

A Taxonomy for Spatio-Temporal Data Warehouses

[Figure: taxonomy diagram relating the building blocks GIS, OLAP, Temporal Dimensions, Moving Data Types, and Spatio-Temporal Data to the query classes SOLAP, TOLAP, Spatial TOLAP, Spatio-Temporal OLAP, and Spatio-Temporal TOLAP]

Representing Spatial Data

_ Two complementary ways to represent spatial data
_ Object-based (vector) model: Objects of interest are stored with their spatial extent
_ Space-based (raster) model: Space is represented as a continuum; to each point in space is associated a value of a phenomenon of interest

Spatial Data Warehouses: Example

[Figure: multidimensional schema with spatial levels]
_ Fact: Water Pollution (measures: commonArea, load)
_ Levels
  • Time (date, season, ...), Month (month, ...), Quarter (quarter, ...), Year (year, ...)
  • District (name, population, area, ...), Province (name, majorActivity, capital, ...)
  • River (name, flow, ...)
  • Station (name, ...)
  • Pollutant (name, type, loadLimit, ...), Category (name, description, ...)
_ Hierarchies: Calendar, Geo location, Categories

Spatial Data Types [Güting et al. 2000]

_ Spatial types: point, points, line, region
_ These types have an associated set of operations

Class                    Operations
Predicates               isempty, =, ≠, intersects, inside, <, ≤, ≥, >, before, touches, attached, overlaps, on_border, in_interior
Set Operations           intersection, union, minus, crossings, touch_points, common_border
Aggregation              min, max, avg, center, single
Numeric                  no_components, size, duration, length, area, perimeter
Distance and Direction   distance, direction

Spatial OLAP (SOLAP) Queries

[Figure: schema fragment. Fact: Water Pollution (measures: commonArea, load). Levels: Time (date, ...), Pollutant (name, type, loadLimit, ...), River (name, ...), Station (name, ...).]

_ For stations located on the Schelde river, give the average content of lead in the last quarter of 2008
  {s.name, avgLead | Station(s) ∧ ∃r, ∃p (River(r) ∧ r.name = ‘Schelde’ ∧
    intersects(r.geometry, s.geometry) ∧ Pollutant(p) ∧ p.name = ‘Lead’ ∧
    avgLead = avg_2({w.id, w.load | WaterPollution(w) ∧
      w.station = s.id ∧ w.pollutant = p.id ∧
      ∃t (Time(t) ∧ w.time = t.id ∧
        t.date ≥ 1/10/2008 ∧ t.date ≤ 31/12/2008)}))}

Spatial Aggregation: Space Based

[Figure: four panels, each showing an aggregation function f applied to values v over the x-y plane: Local Spatial Aggregation, Span Spatial Aggregation, Moving Window Spatial Aggregation, Multi-Dimensional Spatial Aggregation]

Spatial Aggregation: Object Based

[Figure: four panels over a map of European countries, each showing an aggregation function f applied to values v: Local Spatial Aggregation, Span Spatial Aggregation, Moving Window Spatial Aggregation, Multi-Dimensional Spatial Aggregation]
_ Example captions shown on the map
  • Data from Poland
  • Data from German-speaking countries
  • Data from Serbia and its neighbors
  • Data from French-speaking countries and their neighbors

Spatial Aggregation Queries

[Figure: schema fragment. Fact: Water Pollution (measures: commonArea, load). Levels: Time (date, ...), Pollutant (name, type, loadLimit, ...), District (name, ...), Station (name, ...).]

_ Union of the geometries of the districts in which the average content of lead in the last quarter of 2008 was greater than the load limit
  union({d.geometry | District(d) ∧ ∃p (Pollutant(p) ∧ p.name = ‘Lead’ ∧
    avg_2({w.id, w.load | WaterPollution(w) ∧
      w.district = d.id ∧ w.pollutant = p.id ∧
      ∃t (Time(t) ∧ w.time = t.id ∧
        t.date ≥ 1/10/2008 ∧ t.date ≤ 31/12/2008)}) > p.loadLimit)})

Temporal Data Warehouses

[Figure: schema fragment. Fact: Water Pollution (measures: commonArea, load). Levels: Pollutant (name, type, ..., loadLimit annotated VT), Category (name, description, ...). Hierarchy: Categories. LS annotations mark the elements that keep a lifespan.]

_ Arise when evolution of dimension instances is supported
  • Also referred to as slowly changing dimensions [Kimball 96]
_ Temporality types: Valid time (VT), Transaction Time (TT), Bitemporal Time (BT), Lifespan (LS), DW loading time (DWLT)
_ Temporality represented using moving types moving(α) where α is a base type
  • Lifespan of Pollutant is of type moving(bool)
  • Temporal attribute loadLimit is of type moving(real)
  • Temporal relationship between Pollutant and Category is of type moving(id)

Moving Types [Güting et al. 2000]

_ Capture the evolution over time of base types and spatial types
_ Obtained by applying a constructor moving(α)
  • A value of type moving(point) is a continuous function f : instant → point
_ Operations on moving types

Class                           Operations
Projection to Domain/Range      deftime, rangevalues, locations, trajectory, routes, traversed, inst, val
Interaction with Domain/Range   atinstant, atperiods, initial, final, present, at, atmin, atmax, passes
Rate of change                  derivative, speed, turn, velocity
Lifting                         (all new operations inferred)

_ Lifting: Operations of moving types generalize those of the nontemporal types
  • A distance function with signature moving(point) × moving(point) → moving(real) calculates the distance between two moving points
_ Semantics: result is computed at each time instant using the non-lifted operation
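Lifting can be sketched by modeling a moving value as a Python function from a time instant to a value. This is an illustrative simplification, not the representation used in [Güting et al. 2000]: it ignores definition time (moving values are treated as total functions), and the names `lift`, `distance`, and the two sample moving points are assumptions.

```python
import math


def lift(op):
    """Turn an operation on static values into one on moving values.

    A moving value is modeled as a function instant -> value; the lifted
    result applies `op` at each instant, as in the slide's semantics.
    """
    def lifted(*moving_args):
        return lambda t: op(*(m(t) for m in moving_args))
    return lifted


def distance(p, q):
    """Euclidean distance between two static points (x, y)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# moving(point) x moving(point) -> moving(real)
moving_distance = lift(distance)

# Two hypothetical moving points leaving the origin along the axes.
a = lambda t: (3.0 * t, 0.0)
b = lambda t: (0.0, 4.0 * t)

d = moving_distance(a, b)  # d is itself a moving real: d(t) = 5 * t
```

The same `lift` helper works for any non-temporal operation, which mirrors the "(all new operations inferred)" entry in the table.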

Temporal OLAP (TOLAP) Queries

[Figure: schema fragment. Fact: Water Pollution (measure: load). Levels: Time (date, ...), Month (month, ...), Pollutant (..., loadLimit annotated VT), Category (name, ...), District (name, ...). Hierarchies: Calendar, Categories. LS annotations mark the elements that keep a lifespan.]

_ By district, pollutant category, and month, give the average load
  {d.name, c.name, m.month, avgLoad | District(d) ∧ Category(c) ∧ Month(m) ∧
    avgLoad = avg_2({w.id, w.load | WaterPollution(w) ∧
      ∃t, ∃p (Time(t) ∧ Pollutant(p) ∧ w.district = d.id ∧
        w.time = t.id ∧ t.month = m.id ∧ w.pollutant = p.id ∧
        val(initial(atperiods(p.category, t))) = c.id)})}
_ Here we consider the category of a pollutant valid at the day of the measure
_ Alternative: consider the current category of a pollutant
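The two interpretations (category valid at the day of the measure versus current category) can be contrasted with a small sketch. It assumes the temporal relationship is stored as a list of closed validity periods; the helper names `value_at` and `current_value` and the sample history are hypothetical, and `value_at` only plays the role that val(initial(atperiods(...))) plays in the query for a single day.

```python
from datetime import date

# Hypothetical history of a pollutant's category: (start, end, category id),
# with closed, non-overlapping validity periods.
category_history = [
    (date(2000, 1, 1), date(2008, 6, 30), "C1"),
    (date(2008, 7, 1), date(9999, 12, 31), "C2"),
]


def value_at(history, day):
    """Value of a temporal attribute or relationship valid on `day`,
    or None if nothing is valid then."""
    for start, end, value in history:
        if start <= day <= end:
            return value
    return None


def current_value(history):
    """Value of the most recent period, i.e. the 'current' interpretation."""
    return max(history, key=lambda period: period[0])[2]


at_measure = value_at(category_history, date(2008, 3, 15))  # category when measured
now = current_value(category_history)                        # category today
```

A measurement taken in March 2008 is therefore grouped under C1 with the first interpretation and under C2 with the alternative one.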

Spatio-Temporal OLAP (ST-OLAP)

[Figure: schema. Fact: Air Pollution (measures: commonArea of moving type m( ), load). Levels: Time (date, season, ...), Pollutant (name, type, loadLimit, ...), Cloud (number, ..., geometry of moving type m( )), District (name, population, area, ...). Hierarchies and relationships: Calendar, Categories, Geo location, Pollutant type.]

_ Arise when spatial objects evolve over time
_ Evolution captured by moving types moving(α) where α is a spatial type

Spatio-Temporal OLAP (ST-OLAP) Queries

[Figure: schema fragment. Fact: Air Pollution (measures: commonArea of moving type m( ), load). Levels: Time (date, ...), Month (month, ...), Pollutant (name, ...), Cloud (number, ..., geometry of moving type m( )), District (name, population, ...). Hierarchies and relationships: Calendar, Pollutant type.]

_ By district and month, give the total number of persons affected by polluting clouds
  {d.name, m.month, totalNo | District(d) ∧ Month(m) ∧
    totalNo = area(union({traversed(p.commonArea) | AirPollution(p) ∧
      p.district = d.id ∧ ∃t (Time(t) ∧ t.month = m.id)})) /
      area(d.geometry) × d.population}
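The arithmetic behind totalNo (covered share of the district area times the district population) can be made concrete with a rough sketch. Purely as an assumption for the example, regions are approximated by sets of unit grid cells, so that union becomes set union and area becomes a cell count; a real system would use vector geometries. The function name and all numbers are hypothetical.

```python
def affected_persons(district_cells, population, traversed_regions):
    """Estimate the affected population of a district.

    district_cells: set of (x, y) grid cells approximating the district.
    traversed_regions: iterable of cell sets, one per cloud, standing in
        for traversed(commonArea).
    Assumes population is spread uniformly over the district.
    """
    covered = set().union(*traversed_regions) & district_cells
    return population * len(covered) / len(district_cells)


# Hypothetical 10 x 10 district with 50,000 inhabitants.
district = {(x, y) for x in range(10) for y in range(10)}

# Two overlapping clouds of 20 cells each; their union covers 30 cells.
cloud_1 = {(x, y) for x in range(0, 4) for y in range(0, 5)}
cloud_2 = {(x, y) for x in range(2, 6) for y in range(0, 5)}

total = affected_persons(district, 50_000, [cloud_1, cloud_2])
```

Taking the union before measuring the area is what prevents the overlap between the two clouds from being counted twice.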

Spatial TOLAP (S-TOLAP)

[Figure: schema fragment. Fact: Water Pollution (measure: load). Levels: Time (date, season, ...), Month (month, ...), District (name, ...), Pollutant (..., loadLimit annotated VT, lifespan LS), Station (name, ...). Hierarchy: Calendar.]

_ Arise when there are spatial objects/attributes and temporal dimensions
_ By pollutant and month, give the average load in stations of the Namur district, if it is larger than the load limit during the reported month
  {p.name, m.month, avgLoad | Pollutant(p) ∧ Month(m) ∧
    ∃d, ∃s (District(d) ∧ d.name = ‘Namur’ ∧ Station(s) ∧
      inside(s.geometry, d.geometry) ∧
      avgLoad = avg_2({w.id, w.load | WaterPollution(w) ∧
        ∃t (Time(t) ∧ w.station = s.id ∧ w.time = t.id ∧
          t.month = m.id ∧ w.pollutant = p.id)}) ∧
      avgLoad > val(initial(atperiods(p.loadLimit, m.month))))}

Spatio-Temporal TOLAP (ST-TOLAP)

[Figure: schema fragment. Fact: Air Pollution (measures: commonArea of moving type m( ), load). Levels: Time (date, ...), Pollutant (name, ..., loadLimit annotated VT, lifespan LS), Cloud (number, ..., geometry of moving type m( ), lifespan LS), District (name, population, ...). Relationship: Pollutant type.]

_ Most general case: there are moving geometries and temporal dimensions
_ Number of days when the Gent district was under at least one cloud of carbon monoxide (CO) with a load larger than the load limit valid when the cloud appeared
  duration(union({t.date | Time(t) ∧ ∃a, ∃d, ∃c, ∃p (AirPollution(a) ∧
    District(d) ∧ Cloud(c) ∧ Pollutant(p) ∧ a.time = t.id ∧
    a.district = d.id ∧ d.name = ‘Gent’ ∧ a.cloud = c.id ∧
    c.pollutant = p.id ∧ p.name = ‘CO’ ∧
    a.load > val(atinstant(p.loadLimit, inst(initial(at(c.lifespan, true))))))}))

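Spatio-Temporal Aggregation: Space Based (1)

[Figure: Local Spatial Aggregation and Moving Window Spatial Aggregation, each combined with Instant Temporal Aggregation. An aggregation function f is applied to values v over x-y planes at the instants t1, t2, t3 of the time axis t.]

Spatio-Temporal Aggregation: Space Based (2)

[Figure: Local Spatial Aggregation and Moving Window Spatial Aggregation, each combined with Moving Window Temporal Aggregation. An aggregation function f is applied to values v over x-y planes within the window t−w, t, t+w′ of the time axis t.]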

Continuous Fields

_ Non-temporal fields describe phenomena that change continuously in space
  • land elevation, soil type, ...
_ Represented by field types field(α) where α is a base type
_ Temporal fields describe phenomena that change continuously in space and time
  • temperature, precipitation, ...
_ Represented by field types moving(field(α)) where α is a base type
  • moving(field(real)) defines a continuous function f : instant → (point → real)
_ Operators for moving fields as before
_ Field levels have a geometry attribute of type field(α) or moving(field(α))
_ Field dimensions are not connected to a fact relationship

Spatial Data Warehouses with Continuous Fields: Example

[Figure: multidimensional schema with field levels]
_ Fact: Water Pollution (measures: commonArea, load)
_ Levels
  • Time (date, season, ...), Month (month, ...), Quarter (quarter, ...), Year (year, ...)
  • District (name, population, area, ...), Province (name, majorActivity, capital, ...)
  • River (name, flow, ...)
  • Station (name, ...)
  • Pollutant (name, type, loadLimit, ...), Category (name, description, ...)
_ Hierarchies: Calendar, Geo location, Categories
_ Field levels
  • SoilType (field f(); classifSystem, date, characteristics, ...)
  • Elevation (field f(); units, minValue, maxValue, ...)
  • Temp (temporal field f(, ); units, startDate, endDate, ...)

Field Types [Vaisman & Zimányi 2009]

_ Capture the variation in space of base types
_ Obtained by applying a constructor field(α)
  • A value of type field(real) is a continuous function f : point → real
_ Operations on field types

Class                           Operations
Projection to Domain/Range      defspace, rangevalues, point, val
Interaction with Domain/Range   atpoint, atpoints, atline, atregion, at, atmin, atmax, defined, takes, concave, convex, flex
Lifting                         (all new operations inferred)
Rate of change                  partialder_x, partialder_y
Aggregation operators           integral, area, surface, favg, fvariance, fstdev

_ Lifting applies to fields
  • The + operator with signature α × α → α is generalized by allowing any argument to be a field, as in field(α) × field(α) → field(α)
_ Semantics: result is computed at each point in space using the non-lifted operation
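Lifting over fields can be sketched by modeling a field as a Python function from a point to a value. This is only an illustration of the pointwise semantics: the helper `lift_field`, the two sample fields, and their formulas are assumptions, and the definition space of a field is ignored.

```python
def lift_field(op):
    """Lift an operation on base values to fields.

    A field is modeled as a function point -> value. Any argument may be
    a field or a plain constant; the result is a field evaluated pointwise.
    """
    def lifted(*args):
        def result(point):
            values = (a(point) if callable(a) else a for a in args)
            return op(*values)
        return result
    return lifted


# Hypothetical fields over points (x, y).
elevation = lambda p: 100.0 + 2.0 * p[0]
snow_depth = lambda p: 0.5 * p[1]

add = lift_field(lambda a, b: a + b)

surface = add(elevation, snow_depth)   # field(real) x field(real) -> field(real)
raised = add(elevation, 5.0)           # field(real) x real -> field(real)

surface_value = surface((10.0, 4.0))   # 120.0 + 2.0
raised_value = raised((10.0, 4.0))     # 120.0 + 5.0
```

In a concrete model such as a grid or TIN, the same operation would be applied cell by cell or after interpolation rather than to closed-form functions.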

Spatio-Temporal OLAP with Continuous Fields

[Figure: schema fragment. Fact: Water Pollution (measures: commonArea, load). Levels: District (name, ...), Time (date, ...), Pollutant (name, ...). Field level: SoilType (field f(); classifSystem, date, characteristics, ...).]

_ For districts having at least 30% of their surface of clay soil, give the average load of lead on February 1st, 2009
  {d.name, avgLead | District(d) ∧ ∃p, ∃s (Pollutant(p) ∧ p.name = ‘Lead’ ∧
    SoilType(s) ∧
    area(defspace(atregion(at(s.geometry, ‘Clay’), d.geometry))) /
      area(d.geometry) ≥ 0.3 ∧
    avgLead = avg_2({w.id, w.load | WaterPollution(w) ∧
      ∃t (Time(t) ∧ w.district = d.id ∧ w.time = t.id ∧
        t.date = 1/2/2009 ∧ w.pollutant = p.id)}))}

Spatio-Temporal OLAP with Temporal Continuous Fields

[Figure: schema fragment. Fact: Water Pollution (measures: commonArea, load). Levels: Time (date, ...), Month (month, ...), River (name, ...). Hierarchy: Calendar. Field level: Temp (temporal field f(, ); units, startDate, endDate, ...).]

_ For each river and month, give a field computing the average temperature of the month at each point in the river
  {r.name, m.month, temp | River(r) ∧ Month(m) ∧
    first = min({t.date | Time(t) ∧ t.month = m.id}) ∧
    last = max({t.date | Time(t) ∧ t.month = m.id}) ∧
    temp = avg({atperiods(atregion(f.geometry, r.geometry),
      range(first, last)) | Temp(f)})}

Spatio-Temporal Data Warehouses: Conclusions

_ Spatio-temporal DWs result from combining GIS, OLAP, and temporal data types
  • Temporal data types model geometries that evolve over time (moving objects) and evolving (slowly changing) dimensions
  • Field data types model continuous fields that change in space
  • Temporal fields obtained by composing field and temporal data types
_ Our taxonomy for spatio-temporal OLAP queries
  • characterizes features required by spatio-temporal DWs
  • allows classifying different work in the literature
_ Our framework is at a conceptual level: implementation issues were omitted
  • From abstract to concrete models: e.g., grid and TIN models for continuous fields
  • Optimization issues: index structures, pre-aggregation, query optimization, ...

Contents

_ Background
_ Spatio-Temporal Data Warehouses
_ From Spatio-Temporal Data to Trajectory Data (current section)
  • Trajectory Construction
  • Visual Analytics
  • Trajectory Mining
_ Conclusions & Future Work

Typical Data Warehouse Architecture (Reminder)

[Figure: data warehouse architecture, organized in tiers: data sources (internal sources, operational databases, external sources); back-end tier (data staging, ETL process); data warehouse tier (enterprise data warehouse, data marts, metadata); OLAP tier (OLAP server); front-end tier (OLAP tools, reporting tools, statistical tools, data mining tools)]

Movement Data

_ Typically, a temporal sequence of position records (id, x, y, t)
_ Collected nowadays in rapidly growing amounts due to the development of tracking technologies
  • GPS, RFID, WiFi, mobile phones, banking transactions, sensor networks, ...
_ Complexities
  • Huge amounts: number of moving entities, number of records
  • Geographic space with its structure and complexity
  • Time domain at multiple granularities, linear and also multiple nested and overlapping cycles
  • Data properties: imprecision (errors in location, time, attributes), irregular and/or sparse sampling, missing values, ...
  • Large diversity of types of movement: constrained vs. unconstrained, vehicles/persons/animals, 2D vs. 3D, ...
  • Real-world: ill-defined, application-dependent problems
Slide kindly lent by Gennady Andrienko

Examples of Movement Data: Migration of White Storks

Tracks of 35 storks during 8 years, about 2,000 positions ⇒ Animals
Slide kindly lent by Gennady Andrienko

Examples of Movement Data: Cars in Milan

2,075,216 position records of 17,241 cars during 1 week ⇒ Vehicles, network-constrained
Slide kindly lent by Gennady Andrienko

Examples of Movement Data: Children in Amsterdam

GPS tracks of 303 school children playing an educational game in Amsterdam, about 57,000 points ⇒ Pedestrians
Slide kindly lent by Gennady Andrienko

Examples of Movement Data: Air Traffic

427,652 position records of 17,851 planes during 1 day, resolution: 1-5 minutes ⇒ 3D movement
Slide kindly lent by Gennady Andrienko

From Movement to Trajectories [Parent et al. 2013]

_ Movement is continuous and never-ending
_ A trajectory is a finite meaningful movement segment
_ Trajectory data is semantically richer than spatio-temporal data
Slide kindly lent by Stefano Spaccapietra

Trajectory Construction

Slide kindly lent by Stefano Spaccapietra

Cleaning Raw Data

_ Input: Raw data
_ Output: Cleaned raw data
_ Methods: filtering, smoothing, outlier removal, missing point interpolation, map matching, data compression, etc.
Slide kindly lent by Stefano Spaccapietra
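One of the listed methods, outlier removal, can be sketched with a simple speed filter. This is a common heuristic chosen here for illustration, not necessarily the method used by the slide's authors; the function name, the (x, y, t) record layout, and the units (metres and seconds) are assumptions.

```python
import math


def remove_speed_outliers(records, max_speed):
    """Drop position records that imply an implausible speed.

    records: list of (x, y, t) tuples sorted by time t.
    A record is discarded when reaching it from the last kept record
    would require a speed above max_speed, or when its timestamp does
    not advance.
    """
    cleaned = []
    for x, y, t in records:
        if cleaned:
            px, py, pt = cleaned[-1]
            dt = t - pt
            if dt <= 0 or math.hypot(x - px, y - py) / dt > max_speed:
                continue  # treat as a positioning error
        cleaned.append((x, y, t))
    return cleaned


# Hypothetical raw track with one GPS jump at t = 2.
raw = [(0, 0, 0), (10, 0, 1), (500, 0, 2), (30, 0, 3)]
cleaned = remove_speed_outliers(raw, max_speed=20)
```

The filter trusts the first record; smoothing or map matching would normally follow as separate cleaning steps.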

Segmentation into Trajectories

_ Input: Cleaned raw data
_ Output: Trajectories
_ Methods: various segmentation algorithms, based on spatial gaps, temporal gaps, time intervals, time series, etc.
Slide kindly lent by Stefano Spaccapietra
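Segmentation based on temporal gaps can be sketched as follows. The function name, the (x, y, t) record layout, and the gap threshold are assumptions for the example; spatial-gap segmentation would follow the same pattern with a distance test instead of a time test.

```python
def segment_by_time_gap(records, max_gap):
    """Split a cleaned, time-ordered record sequence into trajectories.

    records: list of (x, y, t) tuples sorted by time t.
    A new trajectory starts whenever the time elapsed since the previous
    record exceeds max_gap.
    """
    trajectories = []
    for rec in records:
        if trajectories and rec[2] - trajectories[-1][-1][2] <= max_gap:
            trajectories[-1].append(rec)
        else:
            trajectories.append([rec])
    return trajectories


# Hypothetical records with a long silence between t = 20 and t = 500.
records = [(0, 0, 0), (1, 0, 10), (2, 0, 20), (5, 5, 500), (6, 5, 510)]
trips = segment_by_time_gap(records, max_gap=60)
```

With a 60-unit threshold the five records split into one trajectory of three points and one of two.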

Trajectory Structuring: Stop & Move Episodes

_ Input: Trajectories
_ Output: Structured trajectories (Stop, Move)
_ Methods: various stop identification algorithms, based on velocity, density, etc.
Slide kindly lent by Stefano Spaccapietra

Velocity-Based Stop Identification

Slide kindly lent by Stefano Spaccapietra
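A minimal sketch of velocity-based stop identification labels each segment between consecutive points by its speed and merges adjacent segments with the same label. It is one simple variant under stated assumptions (strictly increasing timestamps, a single speed threshold, no minimum stop duration); the function name and the sample trajectory are hypothetical.

```python
import math


def stop_move_episodes(trajectory, speed_threshold):
    """Structure a trajectory into stop and move episodes.

    trajectory: list of (x, y, t) tuples with strictly increasing t.
    Returns a list of (kind, start_time, end_time) with kind in
    {"stop", "move"}; a segment is a stop when its speed is below
    speed_threshold.
    """
    episodes = []
    for (x0, y0, t0), (x1, y1, t1) in zip(trajectory, trajectory[1:]):
        speed = math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        kind = "stop" if speed < speed_threshold else "move"
        if episodes and episodes[-1][0] == kind:
            # Extend the current episode instead of opening a new one.
            episodes[-1] = (kind, episodes[-1][1], t1)
        else:
            episodes.append((kind, t0, t1))
    return episodes


# Hypothetical trajectory: moving, nearly stationary for 20 time units, moving again.
trajectory = [(0, 0, 0), (10, 0, 10), (20, 0, 20), (20.5, 0, 30), (21, 0, 40), (31, 0, 50)]
episodes = stop_move_episodes(trajectory, speed_threshold=0.5)
```

In practice a minimum duration is usually imposed so that brief slowdowns, for example at traffic lights, are not reported as stops.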

Semantic Enrichment

_ Input: Structured trajectories
_ Output: Semantic trajectories
_ Methods: relate structured components (begin, end, stops, moves, ...) to application knowledge (i.e., meaningful objects)
Slide kindly lent by Stefano Spaccapietra

Visual Analytics [Andrienko & Andrienko 2012]

_ Aims at helping people in
  • distilling relevant nuggets of information from large amounts of data
  • understanding the connections among relevant information
  • gaining insight from data
_ Focuses on the division of labour between humans and machines
_ Goal: computational power amplifies human perceptual and cognitive capabilities
_ Visual representations: the most effective means to convey information to the human mind and to prompt human cognition and reasoning
_ Combines interactive visualizations with automated analysis techniques such as
  • database processing
  • data mining algorithms
  • statistics
  • geographical analysis methods
Slide kindly lent by Gennady Andrienko

Interactive Transformation of Time

_ Space-time cube on the right uses time transformation with respect to the daily cycle
_ This technique enables interpretation of repeated trajectories
Slide kindly lent by Gennady Andrienko

Trajectory Summarization

_ Left: major flows of tourists in Germany, according to panoramio.com photos
_ Right: major traffic flows in Milano, based on trajectories of about 20,000 cars
Slide kindly lent by Gennady Andrienko

Scalable Trajectory Clustering

_ Major clusters of trajectories extracted from the same data set (a week of 20K cars in Milano) presented by the trajectory summarization method
Slide kindly lent by Gennady Andrienko

Similarity of Situations

_ Hourly traffic situations in Milano (spatial distributions of counts of cars) are clustered; colors are assigned to clusters according to the similarity of situations
_ The colors are projected onto a 2D time plot (bottom right) showing similarities of situations over 7 days x 24 hours
Slide kindly lent by Gennady Andrienko

Visualization of Trajectory Attributes

_ For a cluster of trajectories, attribute values (speeds) are displayed on the trajectory wall display
_ This enables investigation of traffic jams
Slide kindly lent by Gennady Andrienko

Trajectory Data Mining [Giannotti et al. 2011]

_ The process of analyzing large amounts of trajectory data to identify unsuspected or unknown patterns that might be of value to an application
_ A particular step in the knowledge discovery process
_ Some key research questions
  • Which spatio-temporal patterns are useful abstractions of mobility data?
  • How to classify trajectories according to specific behavior?
  • How to interpret in a meaningful way the discovered patterns?
  • How to make such analysis privacy preserving in a measurable way?
Slide kindly lent by Chiara Renso

What are Trajectory Patterns (1)?

_ Frequent sets/sequences of places visited
Slide kindly lent by Chiara Renso

What are Trajectory Patterns (2)?

_ Groups of objects moving together
Slide kindly lent by Chiara Renso

Trajectory Clustering

_ Cluster analysis: Find groups where objects in a group are similar (or near) to one another and different from (or distant from) the objects in other groups
_ Some research questions:
  • Which distance between trajectories?
  • Which kind of clustering?
  • What does a cluster mean?
    ∗ A representative trajectory?
Slide kindly lent by Chiara Renso

The M-Atlas tool

_ A knowledge discovery support environment for trajectory data
_ Data, patterns, and background knowledge need to be progressively combined
_ T-patterns, T-clusters, etc. are the basic primitives within a Data Mining Query Language (DMQL), supporting the entire knowledge discovery process
_ T-patterns, T-clusters, etc., once mined from a trajectory dataset, can be stored and later used for query, mining, and interpreting in a progressive way
Slide kindly lent by Chiara Renso

Looking for Movement Patterns: From Where to Where

Select trajectories exiting the city from the centre towards North-West
Slide kindly lent by Chiara Renso

Discovering Typical Routes

T-clustering divides trips based on route similarity
Slide kindly lent by Chiara Renso

Focus on the Three Larger Clusters

One group (red) goes straight to NW, others follow alternative routes
Slide kindly lent by Chiara Renso

Temporal Analysis on Each Group

A small group of commuters in the morning and a larger one in the afternoon
Slide kindly lent by Chiara Renso

Temporal Analysis on Each Group

A group of commuters in the afternoon follows an alternative route: are they smart?
Slide kindly lent by Chiara Renso

Conclusions

_ A large number of applications in a variety of domains are interested in analyzing movement of some type of objects or phenomena
_ Nowadays, a huge amount of movement data is being continuously captured
_ Current approaches to process these massive data sets are inappropriate
_ A complex and pipelined process is needed for transforming raw samples to insightful knowledge
_ This process is by definition iterative, semi-automatic, application-dependent, and multi-disciplinary
_ This raises many theoretical and implementation issues
  • A forthcoming book [Renso et al. 2013] surveys some of them
_ Solutions proposed so far are just the first steps in the direction of mobility data understanding
_ Exciting research domain, with huge potential implications for our lives and our planet

References (1)

_ N. Andrienko, G. Andrienko. Visual Analytics of Movement: an Overview of Methods, Tools, and Procedures. Information Visualization, 2012. Forthcoming.
_ A. Klug. Equivalence of Relational Algebra and Relational Calculus Query Languages Having Aggregate Functions. J. ACM 29(3): 699-717, 1982.
_ J. Gamper, M.H. Böhlen, C.S. Jensen. Temporal Aggregation. Encyclopedia of Database Systems: 2924-2929, Springer, 2009.
_ F. Giannotti et al. Unveiling the complexity of human mobility by querying and mining massive trajectory data. The VLDB Journal, 20: 695-719, 2011.
_ R.H. Güting et al. A foundation for representing and querying moving objects. ACM Trans. Database Syst. 25(1): 1-42, 2000.
_ R.H. Güting, M. Schneider. Moving Objects Databases. Morgan Kaufmann, 2005.

References (2)

_ E. Malinowski, E. Zimányi. Advanced Data Warehouse Design: From Conventional to Spatial and Temporal Applications. Springer, 2008.
_ C. Parent et al. Semantic Trajectories: Modeling and Analysis. ACM Computing Surveys, 45(4), 2013. Forthcoming.
_ C. Renso, S. Spaccapietra, E. Zimányi (Eds.). Mobility Data: Modeling, Management, and Understanding. Cambridge University Press, 2013. Forthcoming.
_ A. Vaisman, E. Zimányi. What Is Spatio-Temporal Data Warehousing? DaWaK: 9-23, 2009.
_ A. Vaisman, E. Zimányi. A Multidimensional Model Representing Continuous Fields in Spatial Data Warehouses. ACM GIS: 168-177, 2009.
_ I.F. Vega López, R.T. Snodgrass, B. Moon. Spatiotemporal Aggregate Computation: A Survey. IEEE Trans. Knowl. Data Eng., 17(2): 271-286, 2005.

Questions?