Lecture 9: File Organization and Indexing

CSCI-GA.2433-001 Database Systems

Mohamed Zahran (aka Z) http://www.mzahran.com

DBMS Layers

• Query Optimization and Execution
• Relational Operators
• Files and Access Methods: organizes data carefully to support fast access to desired subsets of records.
• Buffer Management
• Disk Space Management
• DB

File organization is a method of arranging the records in a file when the file is stored on disk. A relation is typically stored as a file of records.

Figure: the DBMS layer stack (Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB), labeled with the units of storage: tuples/records, relations, files, pages, blocks, sectors.

• Stores records in a file in a collection of disk pages.
• Keeps track of pages allocated to each file.
• Tracks available space within pages allocated to the file.

Do you think it is a good idea to sort the records stored in a file? Why?

Example: Suppose we have a relation with fields name, age, and salary. How do we sort it?
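One way to see why the question is not trivial: a file can be physically ordered on only one key (or one field order) at a time, so each choice favors different queries. The sketch below uses made-up records and plain Python sorting; the names and values are illustrative assumptions, not data from the lecture.

```python
from operator import itemgetter

# Hypothetical records for a relation (name, age, salary); values are made up.
records = [
    ("Smith", 44, 3000),
    ("Jones", 25, 6000),
    ("Ashby", 50, 3000),
    ("Basu", 33, 4000),
]

# Three different physical orders for the same records.
by_name = sorted(records, key=itemgetter(0))
by_age = sorted(records, key=itemgetter(1))
by_salary_then_name = sorted(records, key=itemgetter(2, 0))

print([r[0] for r in by_name])
print([r[0] for r in by_age])
print([r[0] for r in by_salary_then_name])
```

A file sorted by name helps lookups by name but does nothing for a query on age, which is one motivation for building indexes on other fields.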

Index

What is an index?

• It is:
– a data structure
– a pointer (called a data entry in the textbook) to a data record
– organized based on a search key

• Three alternatives for how index entries and data records interact:
– Put the data record in the index itself
– Store a record ID in the index to point to the data record
– Store a list of the record IDs of the data records with the same search key value
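The three alternatives can be sketched with ordinary Python containers. This is only an illustration under stated assumptions: the records are made up, the search key is taken to be age, and a record ID is modeled as a hypothetical (page, slot) pair.

```python
# Hypothetical data file: record ID (page, slot) -> record (name, age, salary).
data_file = {
    (0, 0): ("Ashby", 25, 3000),
    (0, 1): ("Basu", 33, 4000),
    (1, 0): ("Jones", 25, 6000),
}
AGE = 1  # position of the search key field in a record

# Alternative 1: the index holds the data records themselves, by key.
alt1 = {}
for rec in data_file.values():
    alt1.setdefault(rec[AGE], []).append(rec)

# Alternative 2: one (key, record ID) entry per data record.
alt2 = sorted((rec[AGE], rid) for rid, rec in data_file.items())

# Alternative 3: one (key, list of record IDs) entry per distinct key value.
alt3 = {}
for rid, rec in data_file.items():
    alt3.setdefault(rec[AGE], []).append(rid)

print(alt2)
print(alt3)
```

With alternatives 2 and 3 the index stays small, but answering a query takes an extra step to fetch each record through its record ID.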

Special Case: Clustered Indexes
Definition: The ordering of data records is the same as, or close to, the ordering of some index.
Why is it important? It reduces the cost of using an index to answer range search queries.
But: it is expensive to maintain when the data is updated.
If data records cannot be kept sorted, how can we speed up the search?
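To make the range-query argument concrete, the sketch below counts how many distinct pages hold the matching records when the file order follows the key order (clustered) and when it does not (unclustered). It is a toy model under assumptions of my own: 100 integer keys, 10 records per page, and a fixed random shuffle standing in for an unclustered layout.

```python
import random

def pages_touched(layout, lo, hi, page_capacity):
    """Count distinct pages holding records whose key lies in [lo, hi].

    `layout` lists the keys in physical file order; the record at position
    `pos` lives on page `pos // page_capacity`.
    """
    pages = {pos // page_capacity
             for pos, key in enumerate(layout)
             if lo <= key <= hi}
    return len(pages)

keys = list(range(100))

clustered = sorted(keys)               # file order matches index order
unclustered = keys[:]
random.Random(0).shuffle(unclustered)  # file order unrelated to index order

c = pages_touched(clustered, 20, 39, page_capacity=10)
u = pages_touched(unclustered, 20, 39, page_capacity=10)
print("clustered pages:", c)
print("unclustered pages:", u)
```

In the clustered layout the 20 matching records sit on 2 adjacent pages. In the shuffled layout they are scattered, and the count of distinct pages is only a lower bound on the reads an unclustered index would trigger, since fetching records one by one in key order may revisit a page.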

Very Un-special Case: Heap File!
• The simplest file structure
• It is an unordered file
• Records are stored in a random order across the pages of the file
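A heap file can be sketched in a few lines. The class below is a toy model, not any real DBMS's layout: the names `HeapFile`, `insert`, and `scan_equal`, and the tiny page capacity, are assumptions made for illustration.

```python
class HeapFile:
    """Toy unordered file: a list of fixed-capacity pages (lists of records)."""

    def __init__(self, page_capacity=2):
        self.page_capacity = page_capacity
        self.pages = []

    def insert(self, record):
        # Put the record wherever there is room: the last page if it has
        # free space, otherwise a newly allocated page.
        if not self.pages or len(self.pages[-1]) >= self.page_capacity:
            self.pages.append([])
        self.pages[-1].append(record)
        # Return a record ID of the form (page number, slot number).
        return (len(self.pages) - 1, len(self.pages[-1]) - 1)

    def scan_equal(self, field, value):
        # There is no ordering to exploit, so every page must be read.
        matches, pages_read = [], 0
        for page in self.pages:
            pages_read += 1
            matches.extend(r for r in page if r[field] == value)
        return matches, pages_read


hf = HeapFile(page_capacity=2)
rids = [hf.insert(r) for r in [
    ("Smith", 44, 3000), ("Jones", 25, 6000), ("Ashby", 50, 3000),
    ("Basu", 33, 4000), ("Cass", 33, 5000),
]]
matches, pages_read = hf.scan_equal(1, 33)
print(rids[-1], matches, pages_read)
```

Inserts are cheap because a record can go on any page with space, but even an equality search has to scan the whole file.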

Hash-Based Indexing
• Use a hash function h(r), where r is a field value
• The output of h(r) points to a bucket
• Bucket = primary page plus zero or more overflow pages
• The buckets contain data entries: (key, record ID) or (key, list of record IDs) pairs
• Hash-based indexes are best for equality selections; they cannot support range searches
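The bucket structure described above can be sketched as follows. This is a minimal illustration under assumptions of my own: integer search keys, h(key) = key mod the number of buckets, a page capacity of two entries, and hypothetical (page, slot) record IDs.

```python
class HashIndex:
    """Toy static hash index storing (key, record ID) data entries."""

    def __init__(self, num_buckets=4, page_capacity=2):
        self.page_capacity = page_capacity
        # Each bucket is a list of pages; page 0 is the primary page and
        # any further pages are overflow pages.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Assumed hash function for integer keys: h(key) = key mod #buckets.
        return self.buckets[key % len(self.buckets)]

    def insert(self, key, rid):
        bucket = self._bucket(key)
        if len(bucket[-1]) >= self.page_capacity:
            bucket.append([])  # chain a new overflow page
        bucket[-1].append((key, rid))

    def lookup(self, key):
        # Equality search: only the pages of one bucket are examined.
        return [rid for page in self._bucket(key)
                for k, rid in page if k == key]


idx = HashIndex(num_buckets=4, page_capacity=2)
for key, rid in [(1, (0, 0)), (5, (0, 1)), (9, (1, 0)), (2, (1, 1))]:
    idx.insert(key, rid)

print(idx.lookup(5))
print(len(idx.buckets[1]))  # primary page plus one overflow page
```

Keys 1, 5, and 9 all hash to the same bucket, which forces an overflow page. Neighboring key values land in unrelated buckets, so a range search gets no help from the structure and would have to examine every bucket.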

Figure (hash-based indexes): the index points to data records.

Tree-Based Indexing

Example: Find all data entries with 24 … 10 GROUP BY E.dno





If many tuples have E.age > 10, using E.age index and sorting the retrieved tuples may be costly. Clustered E.dno index may be better!

Examples
• To retrieve Emp records with age=30 AND sal=4000, an index on (age, sal) would be better than an index on age or an index on sal.
• If condition is: 20