10: Databases and GIS
CL1 2002/3-10

What is a Database?
• Structure
  – A.k.a. the schema (the word means "proposed arrangement")
  – Within the structure we can define classes of items, their properties and the relationships between them
• Data
  – Data is then entered into the database to conform with the schema

Example - Structure
• Schema: Media catalogue
• Instance: My record collection
• Classes: Artists, works
• Relationships: Artists record works, etc.
• Attributes:
  – Artists have names, category, nationality …
  – Works have titles, dates, media (CD/MD) …

Example - Data
• Artists
  – Sheryl Crow
  – R.E.M.
• Works
  – Tuesday Night Music Club
  – Automatic for the People
• Relationships
  – Sheryl Crow recorded Tuesday Night Music Club
• Attributes
  – Tuesday Night Music Club was recorded in 1993 on CD
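The schema/instance split can be made concrete in a few lines. This is a minimal sketch, assuming the catalogue is modelled with Python dataclasses: the class definitions stand in for the schema (classes, attributes, the "recorded" relationship) and the objects built from them stand in for the instance. All class and field names are hypothetical; the nationality values are added for illustration, and Automatic for the People has no year or medium because the slide gives none.

```python
from dataclasses import dataclass, field
from typing import Optional

# --- Schema: classes, their attributes, and the "recorded" relationship ---

@dataclass(frozen=True)
class Artist:
    name: str
    category: str                 # e.g. "Band", "Solo Female"
    nationality: str

@dataclass(frozen=True)
class Work:
    title: str
    year: Optional[int] = None    # None where the catalogue has no value yet
    medium: Optional[str] = None  # e.g. "CD" or "MD"

@dataclass
class Catalogue:
    artists: list = field(default_factory=list)
    works: list = field(default_factory=list)
    recorded: list = field(default_factory=list)  # (Artist, Work) pairs

    def works_by(self, artist):
        """Follow the 'recorded' relationship from an artist to their works."""
        return [w for a, w in self.recorded if a == artist]

# --- Instance: "my record collection", entered to conform with the schema ---
crow = Artist("Sheryl Crow", "Solo Female", "American")
rem = Artist("R.E.M.", "Band", "American")
tnmc = Work("Tuesday Night Music Club", 1993, "CD")
aftp = Work("Automatic for the People")

collection = Catalogue(
    artists=[crow, rem],
    works=[tnmc, aftp],
    recorded=[(crow, tnmc), (rem, aftp)],
)

print([w.title for w in collection.works_by(crow)])  # ['Tuesday Night Music Club']
```

Changing a class definition changes the schema for every record; adding an object only adds data, which must fit the existing classes.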

Database Integrity
• Consider:
  – Automatic for the People (R.E.M.)
  – Up (REM)
  – Up! (REM)
• Variations produce spurious duplication ("Up" and "Up!" are one album entered twice) and lack of association (a search on "R.E.M." will miss both "Up" and "Up!")
• This database is not an accurate reflection of the world it represents: it lacks integrity
• As the database is updated, it will diverge more and more from reality
• Therefore, adding a new item must not invalidate a valid database
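The lack of association is easy to reproduce. This minimal sketch, assuming a naive exact-match search over (title, artist) pairs, shows a query on "R.E.M." missing the albums entered under "REM". The ID-based variant at the end is one common mitigation and is an assumption here, not something the slide prescribes.

```python
# Records as entered, with the naming variations from the slide.
albums = [
    ("Automatic for the People", "R.E.M."),
    ("Up", "REM"),
    ("Up!", "REM"),
]

def search_by_artist(rows, artist):
    """Exact-match search on the artist column, as a naive query would do."""
    return [title for title, name in rows if name == artist]

print(search_by_artist(albums, "R.E.M."))  # ['Automatic for the People']

# Mitigation (assumption): store each artist once and refer to it by ID,
# so every album points at the same artist record and "Up" is entered once.
artists = {1: "R.E.M."}
albums_by_id = [("Automatic for the People", 1), ("Up", 1)]

def search_by_artist_id(rows, lookup, artist):
    """Resolve each album's artist ID before comparing names."""
    return [title for title, artist_id in rows if lookup[artist_id] == artist]

print(search_by_artist_id(albums_by_id, artists, "R.E.M."))
```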

DBMS
• DataBase Management System
• Enforce integrity
  – Data input and validation
• Enforce authorisation
• Allow multiple simultaneous users
• Form and execute queries
• Generate reports
• Note: MS Access is a DBMS, not a database
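Validation on input is the part of a DBMS that is simplest to see in action. The sketch below uses Python's built-in `sqlite3` module as a stand-in DBMS (not MS Access); the allowed list of artist types in the `CHECK` constraint is a hypothetical rule chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL, UNIQUE and CHECK constraints make the DBMS validate every insert.
conn.execute("""
    CREATE TABLE Artists (
        ArtistID   INTEGER PRIMARY KEY,
        ArtistName TEXT NOT NULL UNIQUE,
        ArtistType TEXT NOT NULL CHECK (ArtistType IN ('Band', 'Solo Female'))
    )
""")

def try_insert(name, artist_type):
    """Attempt an insert; report whether the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO Artists (ArtistName, ArtistType) VALUES (?, ?)",
                (name, artist_type),
            )
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    try_insert("R.E.M.", "Band"),
    try_insert("Sheryl Crow", "Solo Female"),
    try_insert("R.E.M.", "Band"),   # duplicate name: UNIQUE fails
    try_insert("REM", "Group"),     # type outside the allowed list: CHECK fails
    try_insert("REM", "Band"),      # a spelling variant still gets in
]
for r in results:
    print(r)
```

The last insert is accepted: the DBMS enforces only the rules it is given, so it cannot know that "REM" and "R.E.M." are the same band.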

Relational database model
• Developed by E.F. Codd, 1970
• Only one structure in the DB: the relation
• The relation is a table with these properties:
  – Every relation must have a unique name
  – Every column therein must have a unique name
  – Duplicate rows are not allowed
    • Enforced by requiring a unique primary key
  – Each cell must contain a single value

Sample relations (tables)

Albums
AlbumID | AlbumTitle               | ArtistID
1       | Up                       | 1
2       | Automatic for the People | 1
3       | Tuesday Night Music Club | 2

Artists
ArtistID | ArtistName  | ArtistType
1        | R.E.M.      | Band
2        | Sheryl Crow | Solo Female
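The two sample relations translate directly into table definitions. This is a minimal sketch using Python's built-in `sqlite3` module, assuming SQLite as the DBMS: `AlbumID` and `ArtistID` are primary keys, and `Albums.ArtistID` is declared as a foreign key so that every album must point at an existing artist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default

conn.executescript("""
CREATE TABLE Artists (
    ArtistID   INTEGER PRIMARY KEY,
    ArtistName TEXT NOT NULL,
    ArtistType TEXT
);
CREATE TABLE Albums (
    AlbumID    INTEGER PRIMARY KEY,
    AlbumTitle TEXT NOT NULL,
    ArtistID   INTEGER NOT NULL REFERENCES Artists (ArtistID)
);
INSERT INTO Artists VALUES (1, 'R.E.M.', 'Band'), (2, 'Sheryl Crow', 'Solo Female');
INSERT INTO Albums VALUES
    (1, 'Up', 1),
    (2, 'Automatic for the People', 1),
    (3, 'Tuesday Night Music Club', 2);
""")

def rejected(sql, params):
    """True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

# A second row with AlbumID 1 would duplicate the primary key.
dup_key = rejected("INSERT INTO Albums VALUES (?, ?, ?)", (1, "Up!", 1))
# The foreign key must match an existing Artists.ArtistID; 9 does not.
bad_fk = rejected("INSERT INTO Albums VALUES (?, ?, ?)", (4, "Unknown", 9))
print(dup_key, bad_fk)  # True True
```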

Relational Database
• Database comprises a set of tables, indexed by unique primary key and interlinked by foreign keys
  – Foreign keys are the primary keys of other relations (e.g. ArtistID in the example)
• Data manipulation (input, removal, amending and retrieval) is based on eight Relational Algebra operators
• Origins based in set theory

Selected Relational operators
• Restrict ("filter" in Excel)
  – Returns rows according to criteria
• Project
  – Returns columns according to criteria
• Join (several types)
  – Returns one relation from two
• Union
  – Merges two compatible relations
• Intersection
  – Returns rows common to two relations

Join Albums with Artists
(Natural) join of Albums and Artists:
AlbumID | Title                    | ArtistID | ArtistName  | ArtistType
1       | Up                       | 1        | R.E.M.      | Band
2       | Automatic for the People | 1        | R.E.M.      | Band
3       | Tuesday Night Music Club | 2        | Sheryl Crow | Solo Female

Union
MyAlbums
AlbumID | Title                    | ArtistID
1       | Up                       | 1
2       | Automatic for the People | 1
3       | Tuesday Night Music Club | 2

HerAlbums
AlbumID | Title                    | ArtistID
4       | Dead Letter Office       | 1
2       | Automatic for the People | 1
5       | Sheryl Crow              | 2

HerAlbums union MyAlbums
AlbumID | Title                    | ArtistID
1       | Up                       | 1
2       | Automatic for the People | 1
3       | Tuesday Night Music Club | 2
4       | Dead Letter Office       | 1
5       | Sheryl Crow              | 2

(The intersection would be AlbumID 2.)
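Because the relational model is rooted in set theory, the selected operators can be sketched directly on Python sets. This is a minimal illustration, assuming a relation is a heading (column names) plus a frozenset of row tuples; `Relation`, `restrict`, `project`, `natural_join`, `union` and `intersection` are hypothetical names, not any library's API. The data are the Albums, Artists, MyAlbums and HerAlbums tables from the slides, with the album column called `Title` as in the join and union examples.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """A relation: a heading (column names) plus a set of rows (tuples)."""
    heading: tuple
    rows: frozenset


def rel(heading, *rows):
    return Relation(tuple(heading), frozenset(rows))


def _require_compatible(a, b):
    if a.heading != b.heading:
        raise ValueError("relations must have the same heading")


def restrict(r, predicate):
    """Restrict: keep the rows for which predicate(row_as_dict) is true."""
    return Relation(r.heading, frozenset(
        row for row in r.rows if predicate(dict(zip(r.heading, row)))))


def project(r, columns):
    """Project: keep only the named columns (duplicate rows collapse, as in a set)."""
    idx = [r.heading.index(c) for c in columns]
    return Relation(tuple(columns), frozenset(tuple(row[i] for i in idx) for row in r.rows))


def natural_join(a, b):
    """Natural join: pair rows that agree on every column name they share."""
    shared = [c for c in a.heading if c in b.heading]
    extra = [c for c in b.heading if c not in shared]
    rows = set()
    for ra in a.rows:
        da = dict(zip(a.heading, ra))
        for rb in b.rows:
            db = dict(zip(b.heading, rb))
            if all(da[c] == db[c] for c in shared):
                rows.add(ra + tuple(db[c] for c in extra))
    return Relation(a.heading + tuple(extra), frozenset(rows))


def union(a, b):
    _require_compatible(a, b)
    return Relation(a.heading, a.rows | b.rows)


def intersection(a, b):
    _require_compatible(a, b)
    return Relation(a.heading, a.rows & b.rows)


HEAD = ("AlbumID", "Title", "ArtistID")
my_albums = rel(HEAD, (1, "Up", 1), (2, "Automatic for the People", 1),
                (3, "Tuesday Night Music Club", 2))
her_albums = rel(HEAD, (4, "Dead Letter Office", 1), (2, "Automatic for the People", 1),
                 (5, "Sheryl Crow", 2))
artists = rel(("ArtistID", "ArtistName", "ArtistType"),
              (1, "R.E.M.", "Band"), (2, "Sheryl Crow", "Solo Female"))

joined = natural_join(my_albums, artists)
both = union(her_albums, my_albums)
common = intersection(her_albums, my_albums)
rem_titles = project(restrict(joined, lambda r: r["ArtistName"] == "R.E.M."), ["Title"])

print(sorted(row[0] for row in both.rows))            # [1, 2, 3, 4, 5]
print(sorted(row[0] for row in common.rows))          # [2]
print(sorted(title for (title,) in rem_titles.rows))  # ['Automatic for the People', 'Up']
```

The union keeps Automatic for the People only once because both tables hold an identical row for it, which is exactly why it is also the whole of the intersection.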

Remaining operators (reference)
• Difference
  – Returns all rows of relation "A" that are not in relation "B"
• (Product)
  – Returns all possible combinations of input rows
• (As mentioned, 3 types of join)
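Difference and product are just as direct on sets. This minimal sketch, assuming each relation is a plain Python set of row tuples with the headings from the sample tables, also shows that an equi-join can be expressed as a product followed by a restrict on matching keys.

```python
from itertools import product

# Rows are (AlbumID, Title, ArtistID); both album relations share that heading.
my_albums = {
    (1, "Up", 1),
    (2, "Automatic for the People", 1),
    (3, "Tuesday Night Music Club", 2),
}
her_albums = {
    (4, "Dead Letter Office", 1),
    (2, "Automatic for the People", 1),
    (5, "Sheryl Crow", 2),
}
# Rows are (ArtistID, ArtistName, ArtistType).
artists = {(1, "R.E.M.", "Band"), (2, "Sheryl Crow", "Solo Female")}

# Difference: rows of MyAlbums that are not in HerAlbums.
mine_only = my_albums - her_albums

# Product: every album row concatenated with every artist row (3 x 2 = 6 rows).
albums_x_artists = {a + b for a, b in product(my_albums, artists)}

# Equi-join = product, then restrict to rows whose two ArtistID columns match.
joined = {row for row in albums_x_artists if row[2] == row[3]}

print(sorted(r[0] for r in mine_only))     # [1, 3]
print(len(albums_x_artists), len(joined))  # 6 3
```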

DBMS Operations
• Use a query language based on the relational operators, e.g. SQL (which also handles schema definition)
• Input via direct entry into a table or via a form
• Query as a text enquiry or via a GUI
• Output as the response to a query
  – Formatted into a report
• Schema design via GUI and wizards
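In SQL each relational operator has a direct counterpart: the `WHERE` clause restricts, the `SELECT` list projects, and `JOIN`, `UNION`, `INTERSECT` and `EXCEPT` (difference) combine relations. The sketch below runs these as text queries against SQLite through Python's built-in `sqlite3` module, assuming the MyAlbums, HerAlbums and Artists tables from the slides; other DBMSs may spell some operators differently (for example `MINUS` instead of `EXCEPT`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists   (ArtistID INTEGER PRIMARY KEY, ArtistName TEXT, ArtistType TEXT);
CREATE TABLE MyAlbums  (AlbumID INTEGER PRIMARY KEY, Title TEXT, ArtistID INTEGER);
CREATE TABLE HerAlbums (AlbumID INTEGER PRIMARY KEY, Title TEXT, ArtistID INTEGER);
INSERT INTO Artists VALUES (1, 'R.E.M.', 'Band'), (2, 'Sheryl Crow', 'Solo Female');
INSERT INTO MyAlbums VALUES (1, 'Up', 1), (2, 'Automatic for the People', 1),
                            (3, 'Tuesday Night Music Club', 2);
INSERT INTO HerAlbums VALUES (4, 'Dead Letter Office', 1), (2, 'Automatic for the People', 1),
                             (5, 'Sheryl Crow', 2);
""")

queries = {
    # Restrict (WHERE) and project (SELECT list) in one statement
    "restrict_project": "SELECT Title FROM MyAlbums WHERE ArtistID = 1 ORDER BY Title",
    # Natural join on the shared ArtistID column
    "join": "SELECT Title, ArtistName FROM MyAlbums NATURAL JOIN Artists ORDER BY AlbumID",
    "union": "SELECT * FROM MyAlbums UNION SELECT * FROM HerAlbums ORDER BY AlbumID",
    "intersect": "SELECT AlbumID FROM MyAlbums INTERSECT SELECT AlbumID FROM HerAlbums",
    "difference": "SELECT AlbumID FROM MyAlbums EXCEPT SELECT AlbumID FROM HerAlbums ORDER BY AlbumID",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```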

Data Mining
• Discovery of hidden/unexpected patterns of data
• A.k.a. Knowledge Discovery
• Involves large volumes of data
• Useful in making organisational decisions
• Origins in Geology and Meteorology
• Output may be a model

Data mining techniques
• Predictive modelling
  – 'Train' model on sample data set
  – Test model against real data
• Segmentation
  – Look for clustering in data
  – Buying patterns -> market segmentation
• Link Analysis
  – E.g. retail analysis: disposable nappies & beer
• Deviation detection
  – Defect data in production statistics -> reasons
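Two of these techniques reduce to short calculations. The sketch below, under stated assumptions, scores link analysis with the usual association-rule measures (support and confidence) and does deviation detection with a simple z-score threshold. The baskets, the daily defect counts and the 2.5 threshold are all invented for illustration; real data mining works on far larger volumes and usually with more robust methods.

```python
from itertools import combinations
from statistics import mean, pstdev

# --- Link analysis (hypothetical shopping baskets) ---
baskets = [
    {"nappies", "beer", "crisps"},
    {"nappies", "beer"},
    {"nappies", "milk"},
    {"beer", "crisps"},
    {"nappies", "beer", "milk"},
]

def count(itemset, baskets):
    """Number of baskets containing every item in the itemset."""
    return sum(itemset <= b for b in baskets)

def support(itemset, baskets):
    """Fraction of all baskets containing the itemset."""
    return count(itemset, baskets) / len(baskets)

def confidence(lhs, rhs, baskets):
    """Of the baskets containing lhs, the fraction that also contain rhs."""
    return count(lhs | rhs, baskets) / count(lhs, baskets)

items = sorted(set().union(*baskets))
pairs = {(a, b): support({a, b}, baskets)
         for a, b in combinations(items, 2) if count({a, b}, baskets) > 0}
strongest = max(pairs, key=pairs.get)
print(strongest)                                   # ('beer', 'nappies')
print(confidence({"nappies"}, {"beer"}, baskets))  # 0.75

# --- Deviation detection (hypothetical daily defect counts) ---
daily_defects = [4, 5, 3, 4, 6, 5, 4, 19, 5, 4]
mu, sigma = mean(daily_defects), pstdev(daily_defects)
outliers = [(day, n) for day, n in enumerate(daily_defects)
            if abs(n - mu) / sigma > 2.5]  # flag points far from the mean
print(outliers)                            # [(7, 19)]
```

Flagging day 7 only says that something unusual happened; finding the reasons is the follow-up step the slide points to.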

Geographical Information Systems (GIS)
• Database, query language
• Capture mechanism, esp. remote sensing
• Layered information
• Spatially aware
  – Geometry, connectivity
  – Adjacency, borders
• Behaviour

Layered
• Roads and Streets
• Land ownership and administrative areas
• Land Use
• Land topology (contours, features)
• Built features
• Lakes and Waterways
• Soil type, geology
• Pipes, conduits and cables
• Satellite images
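One way to picture layered information is as separate thematic tables that share a location key and are stacked only when a query needs them. This is a simplified sketch, assuming each layer is reduced to one attribute value per hypothetical administrative-area ID; a real GIS stores geometry for each layer and overlays shapes rather than dictionary keys. The layer names, area IDs and values are invented.

```python
# Hypothetical thematic layers, each keyed by the same administrative-area ID.
layers = {
    "land_use":    {"A1": "residential", "A2": "industrial", "A3": "residential", "A4": "farmland"},
    "flood_plain": {"A1": True, "A2": True, "A3": False, "A4": True},
    "soil":        {"A1": "clay", "A2": "clay", "A3": "chalk", "A4": "loam"},
}

def overlay(layers, *names):
    """Stack only the chosen layers: one record per area with a value from each."""
    areas = set.intersection(*(set(layers[n]) for n in names))
    return {a: {n: layers[n][a] for n in names} for a in sorted(areas)}

def select(stack, predicate):
    """Keep the areas whose combined record satisfies the predicate."""
    return sorted(a for a, rec in stack.items() if predicate(rec))

# Residential areas that lie in a flood plain; the soil layer is not needed.
stack = overlay(layers, "land_use", "flood_plain")
at_risk = select(stack, lambda r: r["land_use"] == "residential" and r["flood_plain"])
print(at_risk)  # ['A1']
```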

GIS
• Displays data geographically
• Can include proximity
  – Properties next to proposed incinerator/airport
  – Value of land in flood plain
• Combinations of data
  – Areas of multiple deprivation
• Continuous data, e.g. contours
• Discrete data, e.g. location of phone masts

Feature representation
• Features are represented differently depending on scale, as we go from small scale (zoomed out) to large scale (zoomed in):
  – Missing entirely …
  – Simple representation …
  – Complex representation
  – Some large scale features disappear (e.g. topography)
  – Colours and line thicknesses change

Sample queries
• Display land use round proposed incinerator
• Show area of city within flood plains
• Results displayed as map rather than table
• Electricity company laying line of pylons
  – Existing cables
  – Land ownership
  – Land use, areas of natural beauty
  – Soil type
• Plots optimal route
  – Now change route and try again

Selective use of layers
• Show: administrative areas where house value < £50000 AND house ownership < 25% AND car ownership < 50% AND families-per-address > 6
• Sum value of all parcels of land within 25 metres of the route of a proposed road
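The second query is a proximity (buffer) query. This minimal sketch assumes planar coordinates in metres, a road route stored as a polyline, and each land parcel reduced to a single reference point (such as its centroid) with a value in pounds. A real GIS would intersect parcel polygons with a buffer polygon instead. The route and parcel data are hypothetical.

```python
from math import hypot

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a == b
        return hypot(px - ax, py - ay)
    # Project p onto the segment, clamping to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_route(p, route):
    """Distance from p to a polyline route given as a list of vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(route, route[1:]))

def value_within(parcels, route, buffer_m=25.0):
    """Sum the values of parcels whose reference point lies within buffer_m of the route."""
    return sum(value for point, value in parcels
               if distance_to_route(point, route) <= buffer_m)

# Hypothetical route and parcels: ((x, y) in metres, value in £).
proposed_road = [(0, 0), (100, 0), (100, 100)]
parcels = [
    ((50, 10), 120_000),   # 10 m from the first leg: inside
    ((50, 40), 90_000),    # 40 m from the nearest leg: outside
    ((120, 60), 200_000),  # 20 m from the second leg: inside
    ((130, -20), 75_000),  # about 36 m from the corner (100, 0): outside
]
print(value_within(parcels, proposed_road))  # 320000
```

The first query is the attribute side of the same idea: once the census layers share an area key, it is a restrict with four conditions joined by AND.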

Key Points - databases
• Vocabulary: schema, relations, DBMS …
• Relational model and how it is applied
• Database integrity
• Selected operators and what they do
• Data mining

Key points – GIS
• Database which knows about spatial relationships
• Layered data
• Visualisation
• Combining spatial + statistical/census info
  – E.g. value of land in flood plain
• Feature representation when zooming