10: Databases and GIS

What is a Database?
• Structure
  – A.k.a. the schema
  – (word means: “proposed arrangement”)
  – Within the structure we can define classes of items, their properties and the relationships between them.
• Data
  – Data is then entered into the database to conform with the schema
CL1 2002/3-10

Example - Structure
• Schema: Media catalogue
• Instance: My record collection
• Classes: Artists, works
• Relationships: Artists record works etc.
• Attributes:
  – Artists have names, category, nationality …
  – Works have titles, dates, media (CD/MD) …

Example - Data
• Artists
  – Sheryl Crow
  – R.E.M.
• Works
  – Tuesday Night Music Club
  – Automatic for the People
• Relationships
  – Sheryl Crow recorded Tuesday Night Music Club
• Attributes
  – Tuesday Night Music Club was recorded in 1993 on CD
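
As a rough illustration of the split between schema and instance, the media catalogue might be sketched in Python as follows. The class and field names (`Artist`, `Work`, `recorded_by`, `medium`) are assumptions made for this sketch; the slides name only the classes, relationships and attributes.

```python
from dataclasses import dataclass

# Schema: the classes, their attributes and the relationship between them.
@dataclass(frozen=True)
class Artist:
    name: str

@dataclass(frozen=True)
class Work:
    title: str
    year: int
    medium: str          # e.g. "CD" or "MD"
    recorded_by: Artist  # relationship: artists record works

# Instance: data entered so that it conforms to the schema.
sheryl_crow = Artist(name="Sheryl Crow")
tnmc = Work(title="Tuesday Night Music Club", year=1993,
            medium="CD", recorded_by=sheryl_crow)

print(f"{tnmc.recorded_by.name} recorded {tnmc.title} ({tnmc.year}, {tnmc.medium})")
```

The same schema could hold any record collection; only the instance data would change.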

Database Integrity
• Consider:
  – Automatic for the People   R.E.M.
  – Up                         REM
  – Up!                        REM
• Variations produce spurious duplication and lack of association (a search on R.E.M. will miss “Up!”)
• This database is not an accurate reflection of the world it represents: it lacks integrity
• As the database is updated it will diverge more and more from reality
• Therefore, adding a new item must not invalidate a valid database.
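
The effect of inconsistent spellings can be sketched in a few lines of Python. The helper names and the normalisation rule (ignore full stops and case) are assumptions chosen for illustration, not a rule given in the lecture.

```python
# (title, artist) pairs as entered, with inconsistent artist spellings.
records = [
    ("Automatic for the People", "R.E.M."),
    ("Up", "REM"),
    ("Up!", "REM"),
]

def search_by_artist(rows, artist):
    """Exact-match search, as a naive catalogue would perform it."""
    return [title for title, name in rows if name == artist]

def normalise(name):
    """Hypothetical validation rule: ignore full stops and letter case."""
    return name.replace(".", "").upper()

def search_normalised(rows, artist):
    """Search that compares normalised artist names."""
    return [title for title, name in rows if normalise(name) == normalise(artist)]

found_exact = search_by_artist(records, "R.E.M.")
found_normalised = search_normalised(records, "R.E.M.")
print(found_exact)
print(found_normalised)
```

Normalising the artist name restores the missing association, but "Up" and "Up!" still appear as two separate works, so the duplication has to be prevented when the data is entered.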

DBMS
• DataBase Management System
• Enforce integrity
  – Data input and validation
• Enforce authorisation
• Allow multiple simultaneous users
• Form and execute queries
• Generate reports
• Note: MS Access is a DBMS, not a database

Relational database model
• Developed by E.F. Codd, 1970
• Only one structure in DB: the relation
• The relation is a table with properties:
  – Every relation must have a unique name
  – Every column therein must have a unique name
  – Duplicate rows are not allowed
    • Enforced by requiring a unique primary key
  – Each cell must contain a single value

Sample relations (tables)

Albums
AlbumID  AlbumTitle                ArtistID
1        Up                        1
2        Automatic for the People  1
3        Tuesday Night Music Club  2

Artists
ArtistID  ArtistName   ArtistType
1         R.E.M.       Band
2         Sheryl Crow  Solo Female
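
A minimal sketch of how a DBMS enforces these table properties, using Python's built-in `sqlite3` module as the example DBMS (the lecture itself mentions MS Access; SQLite is swapped in here only because it needs no setup). The helper `try_insert` is a name invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

conn.execute("""
    CREATE TABLE Artists (
        ArtistID   INTEGER PRIMARY KEY,
        ArtistName TEXT NOT NULL,
        ArtistType TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE Albums (
        AlbumID    INTEGER PRIMARY KEY,
        AlbumTitle TEXT NOT NULL,
        ArtistID   INTEGER NOT NULL REFERENCES Artists(ArtistID)
    )""")

conn.executemany("INSERT INTO Artists VALUES (?, ?, ?)",
                 [(1, "R.E.M.", "Band"), (2, "Sheryl Crow", "Solo Female")])
conn.executemany("INSERT INTO Albums VALUES (?, ?, ?)",
                 [(1, "Up", 1), (2, "Automatic for the People", 1),
                  (3, "Tuesday Night Music Club", 2)])

def try_insert(sql, params):
    """Return True if the DBMS accepts the row, False if it rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# A second row with AlbumID 1 violates the unique primary key.
duplicate_key_ok = try_insert("INSERT INTO Albums VALUES (?, ?, ?)", (1, "Up!", 1))
# ArtistID 99 does not exist in Artists, so the foreign key check fails.
unknown_artist_ok = try_insert("INSERT INTO Albums VALUES (?, ?, ?)",
                               (4, "Dead Letter Office", 99))
print(duplicate_key_ok, unknown_artist_ok)
```

Both inserts are rejected, so adding a new item cannot invalidate the existing, valid tables.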

Relational Database
• Database comprises a set of tables, indexed by unique primary key and interlinked by foreign keys
  – primary keys of other relations (e.g. ArtistID in the example)
• Data manipulation (input, removal, amending and retrieval) based on eight Relational Algebra operators
• Origins based in set theory

Selected Relational operators
• Restrict (“filter” in Excel)
  – Returns rows according to criteria
• Project
  – Returns columns according to criteria
• Join (several types)
  – Returns one relation from two
• Union
  – Merge two compatible relations
• Intersection
  – Returns rows common to two relations

Join Albums with Artists
(Natural) join of Albums and Artists

AlbumID  Title                     ArtistID  ArtistName   ArtistType
1        Up                        1         R.E.M.       Band
2        Automatic for the People  1         R.E.M.       Band
3        Tuesday Night Music Club  2         Sheryl Crow  Solo Female

Union

HerAlbums
AlbumID  Title                     ArtistID
4        Dead Letter Office        1
2        Automatic for the People  1
5        Sheryl Crow               2

MyAlbums
AlbumID  Title                     ArtistID
1        Up                        1
2        Automatic for the People  1
3        Tuesday Night Music Club  2

HerAlbums union MyAlbums
AlbumID  Title                     ArtistID
1        Up                        1
2        Automatic for the People  1
3        Tuesday Night Music Club  2
4        Dead Letter Office        1
5        Sheryl Crow               2

(intersection would be AlbumID 2)

Remaining operators (reference)
• Difference
  – Returns all rows of relation “A” that are not in “B”
• (Product)
  – Returns all possible combinations of input rows
• (as mentioned, 3 types of join)
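
The operators can be sketched in plain Python, treating a relation as a list of dictionaries. The data comes from the HerAlbums, MyAlbums and Artists tables; the function names and the list-of-dicts representation are assumptions of this sketch, not part of any particular DBMS.

```python
my_albums = [
    {"AlbumID": 1, "Title": "Up", "ArtistID": 1},
    {"AlbumID": 2, "Title": "Automatic for the People", "ArtistID": 1},
    {"AlbumID": 3, "Title": "Tuesday Night Music Club", "ArtistID": 2},
]
her_albums = [
    {"AlbumID": 4, "Title": "Dead Letter Office", "ArtistID": 1},
    {"AlbumID": 2, "Title": "Automatic for the People", "ArtistID": 1},
    {"AlbumID": 5, "Title": "Sheryl Crow", "ArtistID": 2},
]
artists = [
    {"ArtistID": 1, "ArtistName": "R.E.M.", "ArtistType": "Band"},
    {"ArtistID": 2, "ArtistName": "Sheryl Crow", "ArtistType": "Solo Female"},
]

def dedupe(rows):
    """Drop duplicate rows, keeping first occurrences (relations are sets)."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def restrict(rows, predicate):
    """Rows that satisfy the criterion."""
    return [r for r in rows if predicate(r)]

def project(rows, columns):
    """Only the named columns, with duplicate rows removed."""
    return dedupe([{c: r[c] for c in columns} for r in rows])

def natural_join(left, right):
    """Combine rows that agree on every column the two relations share."""
    out = []
    for l in left:
        for r in right:
            if all(l[c] == r[c] for c in l.keys() & r.keys()):
                out.append({**l, **r})
    return out

def union(a, b):
    return dedupe(a + b)

def intersection(a, b):
    return [r for r in a if r in b]

def difference(a, b):
    return [r for r in a if r not in b]

joined = natural_join(my_albums, artists)
all_ids = sorted(r["AlbumID"] for r in union(her_albums, my_albums))
common_ids = [r["AlbumID"] for r in intersection(her_albums, my_albums)]
only_mine = [r["AlbumID"] for r in difference(my_albums, her_albums)]
print(all_ids, common_ids, only_mine)
```

Union, intersection and difference only make sense here because both album relations have the same columns, which is what "compatible" means in the Union bullet.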

DBMS Operations
• Use a query language, e.g. SQL (also handles schema definition) based on relational operators.
• Input via direct entry into a table or via a form
• Query as text enquiry or via GUI
• Output as response to a query
  – Formatted into a report
• Schema design via GUI and wizards
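
A short sketch of these operations as SQL text, run through Python's built-in `sqlite3` module (chosen only as a convenient DBMS for illustration). The report layout is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema definition and data input, both expressed in SQL.
conn.executescript("""
    CREATE TABLE Artists (ArtistID INTEGER PRIMARY KEY, ArtistName TEXT, ArtistType TEXT);
    CREATE TABLE Albums  (AlbumID INTEGER PRIMARY KEY, Title TEXT,
                          ArtistID INTEGER REFERENCES Artists(ArtistID));
    INSERT INTO Artists VALUES (1, 'R.E.M.', 'Band'), (2, 'Sheryl Crow', 'Solo Female');
    INSERT INTO Albums  VALUES (1, 'Up', 1),
                               (2, 'Automatic for the People', 1),
                               (3, 'Tuesday Night Music Club', 2);
""")

# Query: join the two tables, restrict to one artist, project two columns.
rows = conn.execute("""
    SELECT Albums.Title, Artists.ArtistName
    FROM Albums JOIN Artists ON Albums.ArtistID = Artists.ArtistID
    WHERE Artists.ArtistName = ?
    ORDER BY Albums.AlbumID
""", ("R.E.M.",)).fetchall()

# Output: format the query result into a simple report.
report = "\n".join(f"{title:<28}{artist}" for title, artist in rows)
print(report)
```

The single SELECT statement combines three relational operators: JOIN, WHERE (restrict) and the column list (project).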

Data Mining
• Discovery of hidden/unexpected patterns of data
• A.k.a. Knowledge Discovery
• Involves large volumes of data
• Useful in making organisational decisions
• Origins in Geology and Meteorology
• Output may be a model

Data mining techniques
• Predictive modelling
  – ‘Train’ model on sample data set
  – Test model against real data
• Segmentation
  – Look for clustering in data
  – Buying patterns -> market segmentation
• Link Analysis
  – E.g. retail analysis: disposable nappies & beer
• Deviation detection
  – Defect data in production statistics -> reasons
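
Link analysis in the nappies-and-beer sense can be sketched by counting how often pairs of items turn up in the same basket. The baskets below are invented purely for illustration, and pair counting with a support figure is one simple approach among several, not necessarily the technique a commercial tool would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for this sketch.
baskets = [
    {"nappies", "beer", "crisps"},
    {"nappies", "beer"},
    {"bread", "milk"},
    {"nappies", "beer", "milk"},
    {"bread", "beer"},
]

# Count every unordered pair of items that appears together in a basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support: the fraction of baskets that contain both items of a pair.
support = {pair: n / len(baskets) for pair, n in pair_counts.items()}
top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count, support[top_pair])
```

With real data the volume is far larger, which is why the pairs worth reporting are usually filtered by a minimum support threshold.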

Geographical Information Systems (GIS)
• Database, query language
• Capture mechanism, esp. remote sensing
• Layered information
• Spatially aware
  – Geometry, connectivity
  – Adjacency, borders
• Behaviour

Layered
• Roads and Streets
• Land ownership and administrative areas
• Land Use
• Land topology (contours, features)
• Built features
• Lakes and Waterways
• Soil type, geology
• Pipes, conduits and cables
• Satellite images

GIS
• Displays data geographically
• Can include proximity
  – Properties next to proposed incinerator/airport
  – Value of land in flood plain
• Combinations of data
  – Areas of multiple deprivation
• Continuous data e.g. contours
• Discrete data e.g. location of phone masts

Sample queries
• Display land use round proposed incinerator
• Show area of city within flood plains
• Results displayed as map rather than table
• Show: administrative areas where house value < £50000 AND house ownership < 25% AND car ownership < 50% AND families-per-address > 6
• Sum value of all parcels of land within 25 metres of the route of a proposed road.

Selective use of layers
• Electricity company laying line of pylons
  – Existing cables
  – Land ownership
  – Land use – areas of natural beauty
  – Soil type
• Plots optimal route
  – Now change route and try again

Feature representation
• Features are represented differently depending on scale as we go from small scale (zoomed out) to large scale (zoomed in)
  – Missing entirely …
  – Simple representation …
  – Complex representation
  – Some large scale features disappear (e.g. topography)
  – Colours and line thicknesses change

Key Points - databases
• Vocabulary: schema, relations, DBMS …
• Relational model and how it is applied
• Database integrity
• Selected operators and what they do
• Data mining

Key points – GIS
• Database which knows about spatial relationships
• Layered data
• Visualisation
• Combining spatial + statistical/census info
  – E.g. Value of land in flood plain
• Feature representation when zooming
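
Returning to the sample queries: the attribute query over administrative areas and the "within 25 metres of a proposed road" query can both be sketched in plain Python. This is a heavily simplified illustration under stated assumptions: coordinates are planar and in metres, each land parcel is reduced to a single point, and all data values and field names are invented. A real GIS would work with parcel polygons and spatial indexes.

```python
import math

def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b (metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Position of the closest point along the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_route(p, route):
    """Shortest distance from point p to a polyline given as a list of points."""
    return min(distance_to_segment(p, a, b) for a, b in zip(route, route[1:]))

# Hypothetical road route and land parcels (point location plus value).
route = [(0, 0), (100, 0), (100, 100)]
parcels = [
    {"id": "A", "location": (50, 10), "value": 40000},
    {"id": "B", "location": (50, 30), "value": 55000},
    {"id": "C", "location": (120, 50), "value": 30000},
    {"id": "D", "location": (130, 120), "value": 80000},
]

# Spatial query: sum the value of all parcels within 25 metres of the route.
near = [p for p in parcels if distance_to_route(p["location"], route) <= 25]
total_value = sum(p["value"] for p in near)

# Attribute query: a restrict with several AND-ed criteria.
areas = [
    {"name": "North", "house_value": 45000, "house_ownership": 0.20,
     "car_ownership": 0.40, "families_per_address": 7},
    {"name": "South", "house_value": 90000, "house_ownership": 0.60,
     "car_ownership": 0.80, "families_per_address": 2},
]
selected = [a["name"] for a in areas
            if a["house_value"] < 50000 and a["house_ownership"] < 0.25
            and a["car_ownership"] < 0.50 and a["families_per_address"] > 6]

print([p["id"] for p in near], total_value, selected)
```

The attribute query is an ordinary relational restrict; only the distance test needs the spatial awareness that distinguishes a GIS from a plain database.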