Simple Hash Table

CS 2604 Minor Project 4

Spring 2005

For this project you will implement a simple hash table using closed addressing and a hybrid probe strategy. The hash table will be used here to store structured records, and it should be implemented as a template for reusability (by you on a future project).

The hash table must support all the usual functionality, as discussed in class. For this project we will focus primarily on building the initial hash table from a list of records in a file, searching by key value for records in the table, inserting new records into the table, and deleting old records. In order to determine if the hash table organization is correct, you will instrument the insert and search functions to display the probe sequence (indices visited in performing a search).

You must use the elfhash function discussed in class. Your hash table must support both the linear and quadratic probe strategies. You should design your implementation so that changes to the hash function and probe strategy are painless.

Be careful about infinite loops when probing. Also be careful of the possibility that quadratic probing may fail to find an empty slot. If quadratic probing has not found an empty slot after probing N/2 locations, where N is the number of slots in the hash table, abandon it and apply linear probing, starting at the home slot where the collision occurred.

Your program will first read a file of data records and hash each into the table. After building the initial table, your program will read a command file and process the commands as specified below.

Data Structures: The primary data structure element of this project is a hash table. Your implementation is under the following specific requirements:

• You must encapsulate the hash table as a C++ template.
• The hash table may use relational operators for record comparisons, or take a user-supplied comparator class as a template parameter.
• The hash table should not supply a hash function. That should either be implemented as part of the record type, or as a template parameter.
• On insertion, tombstones should be "recycled". That is, if the insertion search passed any tombstones before finding an empty slot, the new value should be inserted in the first slot that contained a tombstone. The table should always preserve at least one empty slot, so a table with 100 slots will store at most 99 data elements. This will simplify an issue in your search logic.
• The underlying table structure must be a dynamically allocated array. There is no requirement that the hash table provide a resizing mechanism.
• For testing, your hash table should have the ability to display itself to a specified output stream.

The records that are stored in the hash table must be implemented using a C++ data class.
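The probing and tombstone rules above can be sketched in code. What follows is only an illustrative sketch under several assumptions, not the required design: `elfHash` is the commonly published ELF string hash (the version discussed in class may differ), the quadratic step is taken to be home + i*i, the linear fallback is taken to restart at the home slot itself, and `Table`, `Slot`, `State`, and `probeSlot` are hypothetical names. For brevity it stores bare string keys in a `std::vector` rather than using a template over a dynamically allocated array.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Commonly published ELF string hash, reduced modulo the table size.
// Assumption: the class version of elfhash may differ in detail.
std::size_t elfHash(const std::string& key, std::size_t numSlots) {
    assert(numSlots > 0);
    unsigned long h = 0;
    for (unsigned char c : key) {
        h = (h << 4) + c;
        unsigned long g = h & 0xF0000000UL;
        if (g != 0) h ^= g >> 24;
        h &= ~g;
    }
    return static_cast<std::size_t>(h % numSlots);
}

// Slot visited on probe number i (i = 0 is the home slot).
// Quadratic mode uses home + i*i for the first n/2 probes, then falls
// back to linear probing that starts over at the home slot.
std::size_t probeSlot(std::size_t home, std::size_t i, std::size_t n, bool quadratic) {
    if (!quadratic) return (home + i) % n;
    const std::size_t half = n / 2;
    if (i < half) return (home + i * i) % n;
    return (home + (i - half)) % n;
}

enum class State { Empty, Full, Tombstone };

struct Slot {
    State state = State::Empty;
    std::string key;
};

// Hypothetical, non-template stand-in for the hash table.
struct Table {
    std::vector<Slot> slots;   // simplification: the spec asks for a dynamic array
    std::size_t empties;       // number of slots still in the Empty state
    bool quadratic;

    Table(std::size_t n, bool q) : slots(n), empties(n), quadratic(q) {}

    // Records every slot index probed in `visited`.
    // Returns false on a duplicate key or when the table is at capacity.
    bool insert(const std::string& key, std::vector<std::size_t>& visited) {
        const std::size_t n = slots.size();
        const std::size_t home = elfHash(key, n);
        const std::size_t limit = quadratic ? n / 2 + n : n;  // bounds the loop
        std::size_t target = n;                               // n means "none yet"
        for (std::size_t i = 0; i < limit; ++i) {
            const std::size_t s = probeSlot(home, i, n, quadratic);
            visited.push_back(s);
            if (slots[s].state == State::Full) {
                if (slots[s].key == key) return false;        // duplicate
            } else if (slots[s].state == State::Tombstone) {
                if (target == n) target = s;                  // first tombstone seen
            } else {
                if (target == n) target = s;                  // no tombstone was passed
                break;                                        // empty slot ends the search
            }
        }
        if (target == n) return false;
        if (slots[target].state == State::Empty) {
            if (empties <= 1) return false;                   // keep one slot empty
            --empties;
        }
        slots[target].state = State::Full;
        slots[target].key = key;
        return true;
    }
};
```

With this arrangement, the `visited` vector is what an instrumented insert would write to the log, and swapping the hash or probe function means replacing a single free function. A real solution would pass those in as template parameters or make them part of the record type, as the requirements above describe.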

Data members of classes should be private, with the sole exception of a slot class used by the hash table, if appropriate.

If an error occurs during the parsing of the initial data file, there's an error in your code. However, your program should still attempt to recover by "flushing" the current input and skipping ahead to the beginning of the next record in the file. The same applies for the command file.

Initial Data File Description: Your program must read the initial hash table data from a file named HashData.txt; use of another input file name will almost certainly result in a score of 0.

The first line of the initial data file should be ignored. The second line will contain a text label, ending with a colon, followed by whitespace and the number of slots the hash table should have. The third line will contain a text label, ending with a colon, followed by whitespace and a string specifying the primary probe strategy to be used ("linear" or "quadratic"). The fourth line should also be ignored.

Last modified: 4/22/2005 10:04 AM


The remainder of the file will consist of an arbitrary number of records, each of the following format:

line 1:   a string specifying a name
line 2:   a string specifying a street address
line 3:   a string specifying a city, state, etc.

Each record will be followed by a line of delimiters, as shown in the sample file below. The strings will be of arbitrary length and a newline character will immediately follow each string. This file will not contain any comment lines. No two records will contain the same name (so name is a primary key). A sample initial data file follows:

; Hash table initial data file.
; Hash table size: 23
; Probe strategy: linear
-------------------------------
Robinson, Spider
Bag End
Boulder, CO 83001
-------------------------------
Robinson, Kelly
1 Barbarian Way
Burbank, CA 90001
-------------------------------
Howard, Robert E
123 Amber Avenue
Colombo, Sri Lanka
-------------------------------
Willis, Connie
314 Perimeter Place
Lankhmar, Nehwon
-------------------------------
de Camp, L Sprague
123 Amber Avenue
Xylar City, Xylar
-------------------------------
Pournelle, Jerry
8to1 Stochastic Ave
Brooklyn, NY 09041
-------------------------------
Tolkein, J R R
Gold Street
Burbank, CA 90001
-------------------------------
LeGuin, Ursula K
900 Vine Blvd
Gravity, NM 70887
-------------------------------
Card, Orson Scott
123 Amber Avenue
Hobbiton, M E
-------------------------------
Dick, Philip K
1 Martian Blvd
Xylar City, Xylar
-------------------------------
Niven, Larry
1 Barbarian Way
Lankhmar, Nehwon
-------------------------------
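As an illustration of the file layout described above, here is one possible way to read the four header lines and then the three-line records. It is a sketch under assumptions, not a prescribed design: `Header`, `AddressRecord`, `valueAfterColon`, `isDelimiter`, `readHeader`, and `readRecords` are hypothetical names, a delimiter is taken to be any non-empty line made up only of '-' characters, and recovery is reduced to skipping ahead to the next delimiter line.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Header {
    std::size_t slots = 0;     // number of slots requested on line 2
    bool quadratic = false;    // true if line 3 says "quadratic"
};

struct AddressRecord {
    std::string name, street, city;
};

// Returns the text that follows the last ':' on the line, trimmed.
std::string valueAfterColon(const std::string& line) {
    const std::size_t colon = line.rfind(':');
    if (colon == std::string::npos) return "";
    const std::size_t first = line.find_first_not_of(" \t\r", colon + 1);
    if (first == std::string::npos) return "";
    const std::size_t last = line.find_last_not_of(" \t\r");
    assert(last >= first);     // holds because both searches skip the same characters
    return line.substr(first, last - first + 1);
}

// Assumption: a delimiter is a non-empty line containing only '-'.
bool isDelimiter(const std::string& line) {
    return !line.empty() && line.find_first_not_of('-') == std::string::npos;
}

// Consumes the four header lines; lines 1 and 4 are ignored.
bool readHeader(std::istream& in, Header& h) {
    std::string line;
    if (!std::getline(in, line)) return false;          // line 1: ignored
    if (!std::getline(in, line)) return false;          // line 2: table size
    std::istringstream num(valueAfterColon(line));
    if (!(num >> h.slots)) return false;
    if (!std::getline(in, line)) return false;          // line 3: probe strategy
    h.quadratic = (valueAfterColon(line) == "quadratic");
    return static_cast<bool>(std::getline(in, line));   // line 4: ignored
}

// Reads name / street / city triples until the input runs out.
std::vector<AddressRecord> readRecords(std::istream& in) {
    std::vector<AddressRecord> out;
    AddressRecord r;
    std::string line;
    while (std::getline(in, r.name) && std::getline(in, r.street) &&
           std::getline(in, r.city)) {
        out.push_back(r);
        // Flush forward to the delimiter that closes this record, so that
        // stray extra lines do not shift the fields of the next record.
        while (std::getline(in, line) && !isDelimiter(line)) {
        }
    }
    return out;
}
```

A caller would open HashData.txt as a `std::ifstream`, pass it to `readHeader` to size the table and pick the probe strategy, and then hash each record returned by `readRecords` into the table. Taking a `std::istream&` rather than a file name keeps the parsing testable with a `std::istringstream`.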


Command File Description: Your program must read commands from a file named HashCommands.txt; use of another input file name will almost certainly result in a score of 0.

Lines beginning with a semicolon (';') character are comments; your program will ignore comments. An arbitrary number of comment lines may occur in the command file. Each line of the command file will specify one of the commands described below. Each line consists of a sequence of "tokens" which will be separated by single tab characters. A newline character will immediately follow the final "token" on each line.

insert
    This will cause a record containing the given data to be inserted into the hash table, if possible. For each insert command you must log the indices of the slots that are accessed during the search, followed by "inserted". If the insertion fails, log "duplicate" or simply "not inserted", as appropriate. Note: if quadratic probing is used, it is possible that an insertion will fail even if there is an empty slot in the table.

delete
    This will cause the record containing the given name to be deleted from the hash table, if a matching record occurs there. For each delete command you must log the indices of the slots that are accessed during the search, followed by "deleted". If the deletion fails, log "not found".

find
    This causes the hash table to be searched for a record containing the given name. You must log the indices of the slots that are accessed during the search, followed by "found", or if the search fails, by "not found".

show [full | empty | tombstones]
    This causes the hash table to log the indices of all the slots that are in the specified state. If no slots are in that state, log "none".

clear
    This causes the hash table to restore itself to its initial, empty state, switch from its current probe strategy to the other one, and log "table reset".

dump
    This causes the hash table to display itself to the log. This is only for your testing purposes; the command files actually used by the Curator will NOT specify this command, but it may be included in sample data files.

exit
    This causes the hash table application to terminate. The input file is guaranteed to end with an exit command.

Legend: in the commands above, each of the three data parameters is a string of arbitrary length.

You may assume that the command file will conform to the given syntax, so syntactic error checking is not required. A sample command file follows:


; Hash table test data:
;
; Check initial table construction:
show	full
show	empty
show	tombstones
dump
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Check a simple lookup:
;
find	Robinson, Spider
; Check some lookups with collisions:
;
find	Pournelle, Jerry
find	Dick, Philip K
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Check a simple deletion:
;
delete	Robinson, Kelly
find	Robinson, Kelly
; Check table configuration again:
;
show	full
show	empty
show	tombstones
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Reinsert:
;
insert	Robinson, Kelly	1 Barbarian Way	Burbank, CA 90001
; Check insertion:
;
find	Robinson, Kelly
show	tombstones
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Check deletion with collisions:
;
delete	Pournelle, Jerry
delete	Dick, Philip K
show	tombstones
find	Pournelle, Jerry
find	Dick, Philip K
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Try deleting again:
;
delete	Pournelle, Jerry
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Check table one more time:
;
show	full
show	empty
show	tombstones
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Quit:
;
exit
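Command lines like those in the sample above can be split on single tab characters before being dispatched. The following is a minimal sketch of that step only; `Command`, `splitTabs`, and `parseCommand` are hypothetical names, and treating an empty line the same as a comment is an assumption, since the specification does not mention blank lines.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Command {
    std::string verb;                 // e.g. "insert", "find", "show"
    std::vector<std::string> args;    // remaining tab-separated tokens
};

// Splits one line on single tab characters; always yields at least one token.
std::vector<std::string> splitTabs(const std::string& line) {
    std::vector<std::string> tokens;
    std::size_t start = 0;
    while (true) {
        const std::size_t tab = line.find('\t', start);
        if (tab == std::string::npos) {
            tokens.push_back(line.substr(start));
            break;
        }
        tokens.push_back(line.substr(start, tab - start));
        start = tab + 1;
    }
    return tokens;
}

// Returns false for comment lines (leading ';') and for empty lines;
// otherwise fills in cmd and returns true.
bool parseCommand(const std::string& line, Command& cmd) {
    if (line.empty() || line[0] == ';') return false;
    const std::vector<std::string> tokens = splitTabs(line);
    assert(!tokens.empty());
    cmd.verb = tokens.front();
    cmd.args.assign(tokens.begin() + 1, tokens.end());
    return true;
}
```

Because names and addresses contain spaces and commas, splitting on tabs alone (rather than on general whitespace) keeps a field such as "Robinson, Kelly" intact as a single token. A driver loop would read HashCommands.txt line by line with `std::getline`, call `parseCommand`, and branch on `cmd.verb`.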


Log File Description: Your program must write its output to a file named HashOut.txt; use of another output file name will undoubtedly result in a score of 0.

Since your output for this assignment will be graded automatically, you must adhere to the specified format exactly. The first two lines should contain your name, course and project title, followed by a separator line, as shown in the sample below. The next two lines should list the size of the hash table and the number of slots filled after it is built from the initial data file, followed again by a separator line as shown below. Remember that your labels must match the sample output exactly.

The remainder of the log file should show the specified output for find, insert and delete commands. A separator line should follow the last of these lines. Here is a sample log corresponding to the input files given earlier:

Bill McQuain
CS 2604 Hash Table
--------------------------------------------------------------------------------
#slots: 23
#entries: 11
--------------------------------------------------------------------------------
Command: show full
Full: 0 1 4 7 8 16 17 18 20 21 22
--------------------------------------------------------------------------------
Command: show empty
Empty: 2 3 5 6 9 10 11 12 13 14 15 19
--------------------------------------------------------------------------------
Command: show tombstones
Tombstones: none
--------------------------------------------------------------------------------
Command: dump
Hash table contains 11 records, and 23 slots
 0:  Name   : Dick, Philip K
     Address: 1 Martian Blvd
     Phone  : Xylar City, Xylar
 1:  Name   : Niven, Larry
     Address: 1 Barbarian Way
     Phone  : Lankhmar, Nehwon
 4:  Name   : LeGuin, Ursula K
     Address: 900 Vine Blvd
     Phone  : Gravity, NM 70887
 7:  Name   : Tolkein, J R R
     Address: Gold Street
     Phone  : Burbank, CA 90001
 8:  Name   : Robinson, Kelly
     Address: 1 Barbarian Way
     Phone  : Burbank, CA 90001
16:  Name   : Howard, Robert E
     Address: 123 Amber Avenue
     Phone  : Colombo, Sri Lanka
17:  Name   : Pournelle, Jerry


     Address: 8to1 Stochastic Ave
     Phone  : Brooklyn, NY 09041
18:  Name   : Card, Orson Scott
     Address: 123 Amber Avenue
     Phone  : Hobbiton, M E
20:  Name   : de Camp, L Sprague
     Address: 123 Amber Avenue
     Phone  : Xylar City, Xylar
21:  Name   : Willis, Connie
     Address: 314 Perimeter Place
     Phone  : Lankhmar, Nehwon
22:  Name   : Robinson, Spider
     Address: Bag End
     Phone  : Boulder, CO 83001
--------------------------------------------------------------------------------
Command: find Robinson, Spider
Searching: 22 found
--------------------------------------------------------------------------------
Command: find Pournelle, Jerry
Searching: 16 17 found
--------------------------------------------------------------------------------
Command: find Dick, Philip K
Searching: 21 22 0 found
--------------------------------------------------------------------------------
Command: delete Robinson, Kelly
Searching: 8 deleted
--------------------------------------------------------------------------------
Command: find Robinson, Kelly
Searching: 8 9 not found
--------------------------------------------------------------------------------
Command: show full
Full: 0 1 4 7 16 17 18 20 21 22
--------------------------------------------------------------------------------
Command: show empty
Empty: 2 3 5 6 9 10 11 12 13 14 15 19
--------------------------------------------------------------------------------
Command: show tombstones
Tombstones: 8
--------------------------------------------------------------------------------
Command: insert Robinson, Kelly 1 Barbarian Way Burbank, CA 90001
Searching: 8 9 inserted
--------------------------------------------------------------------------------
Command: find Robinson, Kelly
Searching: 8 found
--------------------------------------------------------------------------------
Command: show tombstones
Tombstones: none
--------------------------------------------------------------------------------
Command: delete Pournelle, Jerry
Searching: 16 17 deleted
--------------------------------------------------------------------------------
Command: delete Dick, Philip K
Searching: 21 22 0 deleted
--------------------------------------------------------------------------------
Command: show tombstones
Tombstones: 0 17
--------------------------------------------------------------------------------


Command: find Pournelle, Jerry
Searching: 16 17 18 19 not found
--------------------------------------------------------------------------------
Command: find Dick, Philip K
Searching: 21 22 0 1 2 not found
--------------------------------------------------------------------------------
Command: delete Pournelle, Jerry
Searching: 16 17 18 19 not found
--------------------------------------------------------------------------------
Command: show full
Full: 1 4 7 8 16 18 20 21 22
--------------------------------------------------------------------------------
Command: show empty
Empty: 2 3 5 6 9 10 11 12 13 14 15 19
--------------------------------------------------------------------------------
Command: show tombstones
Tombstones: 0 17
--------------------------------------------------------------------------------
Command: exit
--------------------------------------------------------------------------------

Submitting Your Program: You will submit this assignment to the Curator System (read the Student Guide), and it will be graded automatically. Instructions for submitting, and a description of how the grading is done, are contained in the Student Guide.

Note that the automated grading system requires that your program be submitted as a single source file. For this project you should develop your implementation using separate compilation, and then manually concatenate the code into a single .cpp file, being careful to get the order right and remove obsolete #include directives.

Submit a single gzip'd tar file containing the C++ source and header files for your solution to the Curator System. Submit only the C++ source and header files. Submit nothing else. Be sure that your header files only contain include directives for Standard C++ header files and your own headers; any other include directives will certainly result in compilation errors.

You will be allowed up to five submissions for this assignment. Use them wisely. Test your program thoroughly before submitting it. If you do not get a perfect score, analyze the problem carefully and test your fix before submitting again. The highest score you achieve will be counted. The Student Guide and link to the submission client can be found at:

http://www.cs.vt.edu/curator/

Evaluation: Your submitted program will be assigned a score based upon the runtime testing performed by the Curator System. After that, your program may be given a brief evaluation by one of the TAs, who will consider:

• whether your hash table implementation meets the design requirements in this specification, and
• whether your overall design shows a good object-oriented decomposition of the given problem, and
• whether the internal documentation of your code is acceptable.

See the Programming Standards page on the course website for specific requirements that should be observed in this course. This evaluation will produce a deduction (possibly zero, of course) that will be applied to your runtime testing score to produce your final score for the project.

Pledge: Each of your program submissions must be pledged to conform to the Honor Code requirements for this course. Specifically, you must include the pledge statement provided with the earlier project specifications in the header comment for your source code file.
