CSCI-GA.2433-001 Database Systems
Lecture 9: File Organization and Indexing Mohamed Zahran (aka Z)
http://www.mzahran.com
Goal: organize data carefully to support fast access to desired subsets of records.

DBMS Layers
• Query Optimization and Execution
• Relational Operators
• Files and Access Methods
• Buffer Management
• Disk Space Management
• DB
File organization: a method of arranging the records in a file when the file is stored on disk. A relation is typically stored as a file of records.
Data units at successive layers: Relations (tuples/records) → Files → Pages → Blocks → Sectors
The file layer:
• Stores the records of a file in a collection of disk pages
• Keeps track of the pages allocated to each file
• Tracks available space within the pages allocated to the file
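A minimal sketch of these three duties, assuming fixed 4096-byte pages and records given as byte strings. `PagedFile`, `Page`, and `PAGE_SIZE` are hypothetical names, and first-fit placement is only one possible policy; the returned (page id, slot) pair plays the role of a record ID.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # hypothetical page size in bytes


@dataclass
class Page:
    page_id: int
    records: list = field(default_factory=list)
    free_bytes: int = PAGE_SIZE  # available space tracked per page


@dataclass
class PagedFile:
    """A file stored as a collection of pages (the list itself tracks allocated pages)."""
    pages: list = field(default_factory=list)

    def insert(self, record: bytes) -> tuple:
        """Place the record on the first page with enough free space; return (page_id, slot)."""
        size = len(record)
        if size > PAGE_SIZE:
            raise ValueError("record larger than a page")
        for page in self.pages:
            if page.free_bytes >= size:
                break
        else:
            # No allocated page has room: allocate a new page to this file.
            page = Page(page_id=len(self.pages))
            self.pages.append(page)
        page.records.append(record)
        page.free_bytes -= size
        return (page.page_id, len(page.records) - 1)
```

For example, inserting records of 3000, 2000, and 1000 bytes places the second on a new page and fits the third into the space left on the first page.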
Do you think it is a good idea to sort the records stored in a file? Why?
Example: Suppose we have a relation with fields name, age, and salary. How should we sort it?
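One way to think about the question: a file can be sorted on only one ordering at a time. The sketch below assumes made-up (name, age, salary) tuples and sorts them on age; `find_by_age` and `find_by_salary` are hypothetical helpers. Searching on age can use binary search, but searching on salary still has to examine every record.

```python
from bisect import bisect_left, bisect_right
from operator import itemgetter

# Hypothetical sample relation (name, age, salary); values are made up for illustration.
records = [
    ("Ann", 34, 5200),
    ("Bob", 27, 4100),
    ("Cem", 45, 6100),
    ("Dee", 27, 3900),
    ("Eli", 30, 4000),
]

# Sorting on one field (here: age) orders the file for that field only.
by_age = sorted(records, key=itemgetter(1))
ages = [r[1] for r in by_age]


def find_by_age(age):
    """Binary search works because the file is sorted on age."""
    lo, hi = bisect_left(ages, age), bisect_right(ages, age)
    return by_age[lo:hi]


def find_by_salary(salary):
    """The file is not sorted on salary, so every record must be examined."""
    return [r for r in by_age if r[2] == salary]
```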
Index
What is an index?
• An index is:
– a data structure
– made of pointers (called data entries in the textbook) to data records
– organized based on a search key
• Three alternatives for how an index and the data records interact:
– Store the data record itself in the index
– Store a record ID in the index that points to the data record
– Store a list of the record IDs of all data records with the same search-key value
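The three alternatives can be sketched as three shapes of index entry. This assumes a tiny hypothetical set of data records keyed by record ID, where a record ID is modeled as a (page id, slot) tuple, and uses age as the search key; plain sorted lists stand in for the index structure.

```python
from collections import defaultdict

# Hypothetical data records: rid -> (name, age, salary).
# A rid is modeled as a (page id, slot) tuple for illustration.
data = {
    (0, 0): ("Ann", 34, 5200),
    (0, 1): ("Bob", 27, 4100),
    (1, 0): ("Dee", 27, 3900),
}

AGE = 1  # search key: the age field

# Alternative 1: the index entry is the data record itself.
alt1 = sorted(data.values(), key=lambda r: r[AGE])

# Alternative 2: one <key, rid> entry per data record.
alt2 = sorted((rec[AGE], rid) for rid, rec in data.items())

# Alternative 3: one <key, rid-list> entry per distinct key value.
groups = defaultdict(list)
for rid, rec in sorted(data.items()):
    groups[rec[AGE]].append(rid)
alt3 = sorted(groups.items())
```

Alternative 3 stores each key value once, so it is more compact than alternative 2 when many records share a key.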
Special Case: Clustered Indexes
• Definition: the ordering of the data records is the same as, or close to, the ordering of the data entries in some index.
• Why is it important? It reduces the cost of using the index to answer range search queries.
• But: it is too expensive to maintain when the data is updated.
• If data records cannot be kept sorted, how can we speed up the search?
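To make the range-query benefit concrete, the sketch below counts page reads when following index entries in key order. It assumes 16 records, 4 records per page, and a buffer that keeps only the most recently read page; all names and numbers are hypothetical. In the clustered layout, neighboring keys share pages; in the unclustered layout the same records are interleaved across pages.

```python
RECORDS_PER_PAGE = 4    # hypothetical page capacity
keys = list(range(16))  # search-key values 0..15


def page_fetches(rids_in_key_order):
    """Count page reads when following index entries in key order,
    assuming only the most recently read page stays in memory."""
    fetches, current = 0, None
    for page_id, _slot in rids_in_key_order:
        if page_id != current:
            fetches += 1
            current = page_id
    return fetches


# Clustered: records stored in key order, so key k lives on page k // 4.
clustered = {k: (k // RECORDS_PER_PAGE, k % RECORDS_PER_PAGE) for k in keys}

# Unclustered: the same records stored interleaved, so key k lives on page k % 4.
unclustered = {k: (k % RECORDS_PER_PAGE, k // RECORDS_PER_PAGE) for k in keys}


def range_query_cost(placement, lo, hi):
    """Page reads for lo <= key <= hi, following the index in key order."""
    return page_fetches(placement[k] for k in range(lo, hi + 1))
```

For keys 4 through 11, the clustered layout needs 2 page reads, while the unclustered layout needs 8, one per record.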
Very Un-special Case: Heap File!
• The simplest file structure
• It is an unordered file
• Records are stored in no particular order across the pages of the file
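A minimal heap-file sketch, assuming 3 records per page and a simplified insertion policy that always appends to the last page (a real heap file would also reuse space freed by deletions). `HeapFile` and `RECORDS_PER_PAGE` are hypothetical names. Because the file has no order, a search must read every page.

```python
RECORDS_PER_PAGE = 3  # hypothetical page capacity


class HeapFile:
    """Unordered file: new records go on the last page, or on a new page if it is full."""

    def __init__(self):
        self.pages = []

    def insert(self, record):
        if not self.pages or len(self.pages[-1]) == RECORDS_PER_PAGE:
            self.pages.append([])
        self.pages[-1].append(record)

    def search(self, predicate):
        """No order to exploit, so every page is read; returns (matches, pages_read)."""
        matches, pages_read = [], 0
        for page in self.pages:
            pages_read += 1
            matches.extend(r for r in page if predicate(r))
        return matches, pages_read
```

Inserting 7 (name, age) records fills 3 pages, and even an equality search on age reads all 3.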
Hash-Based Indexing
• Use a hash function h(r), where r is a field value
• The output of h(r) identifies a bucket
• Bucket = primary page plus zero or more overflow pages
• The buckets contain ⟨key, rid⟩ or ⟨key, rid-list⟩ pairs
• Hash-based indexes are best for equality selections; they cannot support range searches
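A minimal static-hashing sketch, assuming a fixed number of buckets (4), pages that hold 2 ⟨key, rid⟩ entries each, and integer keys hashed with Python's built-in `hash` modulo the bucket count. The class name and constants are hypothetical. An equality lookup reads only one bucket's pages.

```python
ENTRIES_PER_PAGE = 2  # hypothetical page capacity (data entries per page)
NUM_BUCKETS = 4       # hypothetical fixed number of buckets


class StaticHashIndex:
    """Each bucket is a primary page plus a chain of overflow pages of <key, rid> entries."""

    def __init__(self):
        # Each bucket is a list of pages; it starts with one empty primary page.
        self.buckets = [[[]] for _ in range(NUM_BUCKETS)]

    def h(self, key):
        # Hypothetical hash function mapping a key to a bucket number 0..NUM_BUCKETS-1.
        return hash(key) % NUM_BUCKETS

    def insert(self, key, rid):
        pages = self.buckets[self.h(key)]
        if len(pages[-1]) == ENTRIES_PER_PAGE:
            pages.append([])  # last page of the bucket is full: chain an overflow page
        pages[-1].append((key, rid))

    def lookup(self, key):
        """Equality search: read only the pages of one bucket."""
        return [rid for page in self.buckets[self.h(key)] for k, rid in page if k == key]
```

The same structure shows why range searches are not supported: with this hash function, consecutive keys such as 30, 31, 32, and 33 land in four different buckets, so a range query would have to read every bucket.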
(Figure: a hash-based index whose entries point to the data records.)
Tree-Based Indexing
Example: Find all data entries with 24 …
Example query: SELECT E.dno FROM Emp E WHERE E.age > 10 GROUP BY E.dno
If many tuples have E.age > 10, using an E.age index and then sorting the retrieved tuples on dno may be costly. A clustered E.dno index may be better!
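The trade-off can be sketched for a query that groups the Emp tuples with E.age > 10 by E.dno. This assumes made-up (name, age, dno) tuples, and an in-memory `sorted` call stands in for reading a file clustered on dno; both function names are hypothetical. The age-index plan needs an extra sort on dno before grouping, while the clustered-dno plan groups in a single pass.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical Emp tuples: (name, age, dno).
emp = [("Ann", 34, 1), ("Bob", 27, 2), ("Cem", 45, 1), ("Dee", 8, 2), ("Eli", 30, 3)]


def group_dnos_via_age_index(tuples):
    """Plan 1: an age index yields qualifying tuples in age order, not dno order,
    so they must be sorted on dno before adjacent groups can be formed."""
    qualifying = sorted((t for t in tuples if t[1] > 10), key=itemgetter(1))  # age order
    qualifying.sort(key=itemgetter(2))  # extra sort step; costly if many tuples qualify
    return [dno for dno, _ in groupby(qualifying, key=itemgetter(2))]


def group_dnos_via_clustered_dno(tuples):
    """Plan 2: a clustered dno index returns tuples already in dno order,
    so one pass that filters on age and groups adjacent dno values suffices."""
    in_dno_order = sorted(tuples, key=itemgetter(2))  # stands in for scanning the clustered file
    return [dno for dno, _ in groupby((t for t in in_dno_order if t[1] > 10), key=itemgetter(2))]
```

Both plans produce the same groups; the difference is the extra sort in plan 1, whose cost grows with the number of tuples satisfying E.age > 10.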
Examples
• To retrieve Emp records with age=30 AND sal=4000, an index on ⟨age, sal⟩ would be better than an index on age or an index on sal.
• If condition is: 20
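A sketch of why the composite key helps, assuming a sorted list of hypothetical ((age, sal), rid) data entries stands in for the index. Python compares tuples lexicographically (age first, then sal), so all entries with age=30 AND sal=4000 form one contiguous run that binary search can locate directly. An index on age alone must return every age=30 entry, which then has to be filtered on sal.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data entries of a composite index on <age, sal>: ((age, sal), rid).
# Sorting tuples compares age first, then sal (lexicographic order).
entries = sorted([
    ((30, 4000), (0, 0)),
    ((30, 5200), (0, 1)),
    ((25, 4000), (1, 0)),
    ((30, 4000), (1, 1)),
    ((41, 3900), (2, 0)),
])
keys = [k for k, _ in entries]
ages = [k[0] for k in keys]


def equality_lookup(age, sal):
    """age = ? AND sal = ?: both values together form one contiguous key range."""
    lo, hi = bisect_left(keys, (age, sal)), bisect_right(keys, (age, sal))
    return [rid for _, rid in entries[lo:hi]]


def age_index_candidates(age):
    """With an index on age alone, every age match is returned and must be filtered on sal."""
    return [rid for _, rid in entries[bisect_left(ages, age):bisect_right(ages, age)]]
```

Here the composite lookup returns exactly the 2 matching entries, while the age-only lookup returns 3 candidates that still need checking against sal.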