Int. J. Business Information Systems, Vol. 28, No. 3, 2018
Predictive auto-completion for query in search engine Vinay Singh and Dheeraj Kumar Purohit ABV-Indian Institute of Information Technology and Management, Gwalior, M.P.-474010, India Email:
[email protected]
Vimal Kumar*, Pratima Verma and Ankita Malviya Department of Industrial and Management Engineering, Indian Institute of Technology, Kanpur; U.P. 208016, India Fax: +91-512-2597553 Email:
[email protected] Email:
[email protected] Email:
[email protected] Email:
[email protected] *Corresponding author
Abstract: The main goal of this research is to model an approach that gives top-k predictive search results in a search engine by combining algorithmic and probabilistic approaches, and to compare their processing times. A modified edit distance algorithm is used for spell auto-correction and a prefix tree is used for auto-completion. An intersecting union list algorithm is also used to predict results for multi-word queries. Wikipedia dictionary words are used as the single word query dataset, and the Internet Movie Database (IMDB) movie list is crawled by a Python crawler built for this research. The IMDB rating of each movie and the frequency of each word are used to rank the results.
Keywords: auto-complete; prefix tree; hashing; auto-correction; internet movie database; IMDB; query auto-completion; QAC; prefix; search engine.
Reference to this paper should be made as follows: Singh, V., Purohit, D.K., Kumar, V., Verma, P. and Malviya, A. (2018) ‘Predictive auto-completion for query in search engine’, Int. J. Business Information Systems, Vol. 28, No. 3, pp.299–314.
Biographical notes: Vinay Singh is an Assistant Professor in the Information Systems and Technology Management area at ABV-Indian Institute of Information Technology and Management Gwalior, India. He has more than ten years of work experience in the application of IT in business. He has published more than 12 research articles in national/international journals and conferences. His research interests lie in technology and innovation management, IT-enabled services to the rural market and knowledge management.
Dheeraj Kumar Purohit is an MTech student of ABV-Indian Institute of Information Technology and Management Gwalior, India.
Copyright © 2018 Inderscience Enterprises Ltd.
V. Singh et al. Vimal Kumar is a doctoral candidate in the Department of Industrial and Management Engineering, IIT Kanpur, India. He received his Master’s in Operations Research from the Department of Industrial and Management Engineering, IIT Kanpur in 2012. He completed his graduation (BTech) in Manufacturing Technology in 2010 from JSS Academy of Technical Education, Noida. Currently, he is pursuing research in the domain of TQM and organisational strategy. He has published seven articles in reputable international journals and presented eight papers at international conferences. He has chaired a session for quality control and management at the International Conference on Industrial Engineering and Operations Management at Kuala Lumpur, Malaysia and member of IABE and IEOM, USA. He is a contributing author in journals including IJPMB, IJPQM, The TQM Journal, Benchmarking: An International Journal, etc. Pratima Verma is Doctoral candidate in Industrial and Management Engineering at IIT Kanpur, India. She received her MBA in Finance and Human Resource Management from the Uttar Pradesh Technical University, Lucknow, India in 2011. She has one year of experience in teaching. Currently, she is working in the field of horizontal strategy. She has also been awarded JRF/SRF in the area of human resource management. She has published/presented nine papers in international journals. Ankita Malviya received her MTech from the Department of Industrial and Management Engineering, IIT Kanpur, India in 2016. She completed her graduation (BE) in Electronics and Telecommunication Engineering in 2013 from Jabalpur Engineering College (JEC), Jabalpur. Her research interest includes operations research and operations management etc. Currently, she is working as Post-Graduate Engineer Trainee in Tata Motors, Lucknow.
1 Introduction and literature survey
Query auto-completion (QAC) is a prominent feature of many modern search engines (Cai et al., 2016), and predicting the completion of a search engine query has been widely studied (Kong et al., 2015; Cai et al., 2016; Cai and de Rijke, 2016; Smith et al., 2016; Vargas et al., 2016). Auto-completion is used in many software products and in search engines such as Google, Yahoo and Bing. So far, Google provides the best search results and searches efficiently through the auto-completion technique, but it also considers the user’s previous history, the browser’s cache, etc. These search engines are widely adopted technologies (Czamecki and Spiliopoulou, 2012; Joseph et al., 2015; Singh et al., 2016). In particular, as the user enters one letter at a time, the search engine lists all the queries related to the text entered so far in ranked order, so that the query the user wants appears at the top of the list. It also corrects spelling mistakes, which helps the user to search efficiently. Auto-complete is also widely used in database applications, where it helps to reduce search errors and the loss of data integrity and accuracy (Chaudhuri and Kaushik, 2009). There are various applications which can use an auto-completion technique based
on its usage. For example, many search engines use a tree-based data structure, while some applications, such as Gmail and Facebook, use the user’s previous history to give search results. This work focuses on a tree-based data structure because there is no per-user login that could store previous data, so the system can be used globally and is open to all. For spell auto-correction of each word, the work focuses on intersecting union lists, a data-mining technique used to find the intersection of all the words in a query across all the sets of the database. The main objective of this research is to provide efficient auto-completion search results in the search engine and to compare the processing time of the probabilistic and algorithmic models for a single word query. Auto-completion is implemented by combining the algorithmic and probabilistic approaches. A modified edit distance algorithm is introduced for spell auto-correction. For multi-word search queries, the IMDB movie dataset is used, and the single word query algorithm is combined with the intersecting union list algorithm to predict results; the priority of each result depends on the IMDB rating of the movie and the frequency of each word. When analysing any search engine (Singh et al., 2016), the following features should be considered:
• auto-completion feature
• lesser searching time
• efficient search results
• automated spell correction
• user-friendly interface.
1.1 Auto complete using graph mining: a different approach
Peng and Park (2004) used preprocessing, hashing and a graph-mining technique to explain the auto-completion feature. In this method, suggestions are provided on the basis of the content to be searched rather than a history or a predefined list. To implement this feature, Peng and Park (2004) used a graph-mining methodology, which limits the number of sequential searches made in the document to reach the desired content. Preprocessing is one of the important steps in data mining. It improves data quality and makes the raw data suitable for analysis (Tan et al., 2006). In the preprocessing stage, keywords are found by removing stop words from the text content, and a graph is created from these keywords at the same time. For example, in the sentence ‘compare the quality of clustering words’, ‘the’ and ‘of’ are treated as stop words and all the remaining words are keywords. Graph mining is used because a graph can capture the relations between data objects. With a huge amount of data, a sequential search would not be efficient; with a graph, pruning can easily be done to remove irrelevant results. As a result, graph mining avoids costly and repeated database scans. Hashing is also used to reduce lookup time. This technique provides suggestions only when the user has entered at least two characters in the search field. A hash function may map two or more keys to the same hash value, which creates a conflict situation or a collision. Collisions can be handled efficiently by using various collision resolution techniques (Cormen et al., 2001).
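The preprocessing step described above can be illustrated with a minimal Python sketch. It assumes a stop-word list containing only the two words from the example sentence, and it assumes that the graph links each keyword to the keyword that follows it; the source does not specify the edge rule, and the function names are hypothetical. Python dictionaries and sets are hash-based, so lookups also illustrate the hashing idea.

```python
# Stop words taken from the example sentence only; a real list would be longer.
STOP_WORDS = {"the", "of"}


def extract_keywords(text, stop_words=STOP_WORDS):
    """Return the lower-cased words of `text` with stop words removed."""
    return [w for w in text.lower().split() if w not in stop_words]


def build_keyword_graph(keywords):
    """Link each keyword to the keyword that follows it (assumed edge rule)."""
    graph = {k: set() for k in keywords}  # dict and set lookups are hash-based
    for left, right in zip(keywords, keywords[1:]):
        graph[left].add(right)
    return graph


keywords = extract_keywords("compare the quality of clustering words")
graph = build_keyword_graph(keywords)
print(keywords)
print(graph["compare"])
```

Only the keywords survive the stop-word filter, and each graph lookup is a single hash probe rather than a sequential scan of the text.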
1.2 Distributed frequent itemset mining using trie data structure
A distributed trie-based algorithm, DTFIM, has been proposed to find frequent itemsets and make the search more efficient in distributed computing. In the first pass of the Apriori algorithm, the frequency of each item is checked and items whose support is below the threshold are eliminated before the next pass. In pass k, the whole database is scanned to determine the support count of each candidate itemset. The use of a trie data structure provides a very efficient result and improves the performance of the Apriori algorithm (Bayardo et al., 2004). Ansari et al. (2008) created a trie-based data structure on each local machine and then used the Apriori algorithm proposed by Agrawal and Srikant (1994) to find frequent itemsets in the database. In that paper, Bordon’s idea is used to design the algorithm for a distributed computing environment, and it is concluded that the trie data structure provides better results than the sequential algorithm. Figure 1 shows a simple trie.
Figure 1 A simple trie (see online version for colours)
Source: Ansari et al. (2008)
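The first Apriori pass described in this subsection can be sketched as follows. This is only an illustration of support counting and pruning on a single machine, not the distributed DTFIM algorithm; the function name, the toy transactions and the threshold are assumptions.

```python
from collections import Counter


def frequent_items(transactions, min_support):
    """First Apriori pass: count each item once per transaction, then prune."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_support}


# Toy transactions, invented for illustration.
transactions = [["a", "b"], ["a", "c"], ["a", "b", "d"]]
print(frequent_items(transactions, min_support=2))
```

Items that fail the threshold here (`c` and `d`) would not be used to build candidate itemsets in later passes.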
1.3 Misspellings in drug information system queries
Senger et al. (2010) worked on providing an efficient search algorithm for finding drugs in an electronic drug information system. They analysed two characteristics, spelling correction and errors in drug searches, and how auto-completion can be used as a prevention strategy. They collected the correctly spelled and misspelled drug names that were entered into the drug information system (DIS) of the University Hospital of Heidelberg in 2006. The algorithm provides options to look up generic drug names and brand names of a drug, which differ substantially due to regulation. These words are composed of a prefix, suffixes and word stems. Their search engine does take special characters like ‘$’, ‘-’, ‘%’, ‘,’ and there is also a constraint of a minimum query length of three letters. If a query matches neither a generic drug name nor a brand name, the phonetic search algorithm Aspell (Atkinson, 2006) is applied automatically. Figure 2 shows the schematic diagram of a section of the German standard keyboard.
Figure 2 Schematic diagram of a section of the German standard keyboard
Source: Senger et al. (2010)
Here the letters ‘Q’ and ‘F’ (bold) and their close neighbours (highlighted in grey), as defined for a ‘key proximity table’, are shown. The authors use the keyboard distance between a letter of the wrong word and the corresponding letter of the correct word to rank candidates, so that the ranking can be used in spell checking.
1.4 Space-efficient data structures for top-k completion
Hsu and Ottaviano (2013) studied the case in which the string set is so large that it cannot fit into memory without compression. They worked on providing a statistical ranking of completions for a given string prefix. Auto-completion is a very popular method in search engines because it lets user mistakes be tolerated and returns the result most suitable to the user’s needs. Figure 3 shows three usage scenarios of top-k completion. They used three different trie-based data structures:
• Completion trie: a compact data structure based on a compressed compacted trie. The children of every node are ordered by the highest score among their descendants. It enumerates the completions of a string prefix in score order by storing the maximum score at each node. To reduce the space occupied by the data, it uses a standard compression technique called variable-length encoding.
• Range minimum query (RMQ) trie: a generic scheme which can be applied to any data structure that bijectively maps a set of strings to consecutive integers in lexicographic order, by using an RMQ data structure (Fischer and Heun, 2011).
• Score-decomposed trie: a compressed data structure derived from the path-decomposed trie of Grossi and Ottaviano (2015). It uses a path decomposition based on the maximum descendant score, which provides efficient top-k completion queries.
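The idea behind the completion trie, storing the maximum descendant score at each node so that completions can be enumerated in score order, can be sketched in Python as follows. This is an uncompressed illustration under simplifying assumptions (a plain dictionary-based trie, no variable-length encoding), not the data structure of Hsu and Ottaviano (2013); the class and function names are hypothetical.

```python
import heapq


class Node:
    def __init__(self):
        self.children = {}
        self.score = None                # score of the string ending here, if any
        self.max_score = float("-inf")   # best score in this subtree


def insert(root, word, score):
    """Insert `word` and update the maximum score along its path."""
    node = root
    node.max_score = max(node.max_score, score)
    for ch in word:
        node = node.children.setdefault(ch, Node())
        node.max_score = max(node.max_score, score)
    node.score = score


def top_k(root, prefix, k):
    """Return up to k (word, score) pairs with the given prefix, best first."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    # Heap entries: (-bound, text, is_complete_word, node).
    heap = [(-node.max_score, prefix, False, node)]
    results = []
    while heap and len(results) < k:
        neg, text, is_word, node = heapq.heappop(heap)
        if is_word:
            results.append((text, -neg))
            continue
        if node.score is not None:
            heapq.heappush(heap, (-node.score, text, True, node))
        for ch, child in node.children.items():
            heapq.heappush(heap, (-child.max_score, text + ch, False, child))
    return results


root = Node()
for word, score in [("free", 5), ("fred", 9), ("fret", 7), ("freak", 3)]:
    insert(root, word, score)  # toy scores, invented for illustration
print(top_k(root, "fre", 2))
```

Because every heap entry carries an upper bound on the scores below it, the search stops after k words have been emitted and never visits low-scoring subtrees unnecessarily.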
Efficient interactive fuzzy keyword search: Ji et al. (2009) worked on providing search results based on the intersecting union list algorithm. An edit distance algorithm is used to score words for auto-correction, with a threshold of t = 2 in the auto-correct algorithm. Predicting the auto-completed search query in a search engine is a very challenging topic. It helps the user to search efficiently and to extract the results the user requires. Because spelling errors are highly likely, the search engine should handle them efficiently. Reducing search time is also an important part of the research. The combination of the single and multi-word query is
not provided on the basis of the intersecting union list algorithm (Ji et al., 2009), the prefix tree data structure and the modified edit distance algorithm. Earlier research fetched results only on the basis of word frequency. The results should not depend on a single criterion; a combination of criteria should be used.
Figure 3 Usage scenarios of top-k completion, (a) search engine (b) browser (c) soft keyboard (see online version for colours)
Source: Hsu and Ottaviano (2013)
1.5 Brief explanation of the terminologies used
• Auto-complete: the original purpose of word prediction software is to help people with physical disabilities increase their typing speed, and to decrease the number of keystrokes needed to complete a word or a sentence. Here auto-complete is used so that the user sees different results while typing letters or words. It can also be called a suggestion list, which helps the user to search in any search engine.
• Data structure: in computer science, a data structure is a particular way of organising data in a computer so that it can be used efficiently. Different kinds of data structures are suited to different kinds of applications, and some are highly specialised for specific tasks. This work focuses on tree data structures, for example the trie, ternary search tree and suffix tree, as they are well suited to searching for a sub-string in a large pool of strings.
• Priority queue: a priority queue is an abstract data type which is like a regular queue or stack data structure, but where each element additionally has a ‘priority’ associated with it. In a priority queue, an element with high priority is served before an element with low priority. If two elements have the same priority, they are served according to their order in the queue.
• Data mining: data mining, an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems. Here data mining is used to provide auto-spell correction, which can reduce irrelevant results.
This paper is organised as follows: after this introduction and brief review of the literature, Section 2 is devoted to the methodology. Section 3 presents the results and discussion, and Section 4 outlines the conclusions.
2 Methodology
2.1 Data collection methodology
For the single word query, the dataset was extracted from Wikipedia words. The size of the dataset is 1,128 kB and it contains 109,583 unique words. For the multi-word query dataset, a web crawler was written in Python and used to extract data from IMDB. The data takes the form of movie names belonging to four rating types: PG-13, G, PG and R. PG-13 indicates that there is material in the film that may not be suitable for children under the age of 13. According to the MPAA, a ‘G-rated’ motion picture contains nothing in theme, language, nudity, sex, violence or other matters that, in the view of the rating board, would offend parents whose younger children view the motion picture. A PG-rated film may not be suitable for children; the MPAA says a PG-rated film should be checked by parents before younger children are allowed to see it. An R rating requires a parent or adult guardian to be present in order to view the film. An R-rated film may include adult themes, adult activity, hard language, intense or persistent violence, sexually-oriented nudity, drug abuse or other elements, so parents are counselled to take this rating very seriously. These types of movie names were selected because the other types contain movies in other languages, which were not suitable for the dataset. Also, there were many movies for which no rating was provided by IMDB, and the rating is one of the criteria for prioritising the words in the final result. The API used for this work is http://getimdb.herokuapp.com/get/?id=tt followed by the unique ID. Here the unique ID is an integer value given to each name, which ranges from 1 to 1,000,000 and beyond. In total, 500,000 IDs were crawled and 10,000 of the fetched results matched the selected types. A summary of the results fetched for ID 2310332 is shown in JSON in the Appendix. The flowchart of the research model is shown in Figure 4.
Figure 4 Flow chart of research model
The model is divided into two major types of queries: ‘single word query’ and ‘multi-word query’. For a single word query, two different algorithms are applied, the ‘auto-completion algorithm’ and the ‘spell auto-correction algorithm’, and the results of both are combined to fetch the top 10 results for the query. For a multi-word query, a combination of both of these algorithms and the union intersection list algorithm is used to obtain the results. The auto-completion algorithm completes the word given in the query and returns a list of all the words that have the query as a prefix. A prefix tree is used for this purpose. Although its preprocessing time is higher, the time taken to fetch results once a query is given is lower.
2.2 Algorithm 1: Preprocessing auto completion algorithm
1 Insert all the words in prefix tree
2 Maintain a map having
3   Key -> word
4   Value -> frequency.
2.3 Algorithm 2: Query processing auto-completion algorithm
1 Scan the word ‘W’ entered by the user
2 Find all the words in data set having prefix ‘W’
3 Maintain a priority queue
4   Data -> word
5   Priority key -> frequency
6 if (frequency of two words are equal)
7   lowest lexicographical word is given higher priority.
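Algorithms 1 and 2 can be sketched in Python as follows. This is a minimal illustration, assuming the word frequencies are already available as a dictionary; the class and method names are hypothetical and the frequencies in the usage line are invented.

```python
import heapq


class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False


class AutoCompleter:
    def __init__(self, word_freq):
        # Algorithm 1: keep a word -> frequency map and insert every word.
        self.freq = dict(word_freq)
        self.root = TrieNode()
        for word in self.freq:
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.is_word = True

    def complete(self, prefix, k=10):
        # Algorithm 2: walk to the prefix node, collect every word below it,
        # then rank by frequency (ties broken by lexicographic order).
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        words, stack = [], [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                words.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return heapq.nsmallest(k, words, key=lambda w: (-self.freq[w], w))


ac = AutoCompleter({"freak": 4, "free": 9, "fred": 9, "fret": 2, "apple": 7})
print(ac.complete("fre", 3))
```

Building the trie is the costly preprocessing step; at query time only the subtree below the prefix is visited, which is consistent with the lower query processing time described above.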
In the spell auto-correction algorithm, Bayes’ theorem is used to find the correct word for a misspelled word by comparing the conditional probabilities of a collection of candidate words. First, the frequency count of each word is obtained from the dataset; any word not present in the dataset is assigned a count of 1 for smoothing. Next, the edit distance is calculated, which is the number of edits needed to convert one word into another. An edit can be a deletion, a transposition, an alteration or an insertion. Candidates at edit distance 2 are then generated, which gives a set of words within distance 2 of the misspelled word. The algorithm finally returns the top 10 words. Its preprocessing time is lower than that of auto-completion. When a misspelled word is given in the query, the algorithm considers words with an edit distance less than or equal to 2 and displays ten words that are possible correct forms of the misspelled word.
2.4 Algorithm 3: Preprocessing spell auto-correct algorithm
1 Initialise a map
2   Key -> word
3   Value -> frequency.
2.5 Algorithm 4: Query processing spell auto-correction algorithm
1 Scan the word ‘W’ entered by user
2 Initialise a vector V1, V2 of string
3 Initialise a priority queue of size 10
4 If W is present in map
5   V1.pushback(W)
6 else
7   V1.pushbackall(EditDistance(W))
8 V2 ← V1
9 For each word in V1
10   If M[word] is present
11     push into priority queue
12 Print priority queue.
For example, the list of words produced when the word ‘pq’ is passed to the edit distance algorithm is as follows: {q, p, qp, aq, bq, cq, dq, eq, fq, gq, hq, iq, jq, kq, lq, mq, nq, oq, pq, qq, rq, sq, tq, uq, vq, wq, xq, yq, zq, pa, pb, pc, pd, pe, pf, pg, ph, pi, pj, pk, pl, pm, pn, po, pp, pq, pr, ps, pt, pu, pv, pw, px, py, pz, apq, bpq, cpq, dpq, epq, fpq, gpq, hpq, ipq, jpq, kpq, lpq, mpq, npq, opq, ppq, qpq, rpq, spq, tpq, upq, vpq, wpq, xpq, ypq, zpq, paq, pbq, pcq, pdq, peq, pfq, pgq, phq, piq, pjq, pkq, plq, pmq, pnq, poq, ppq, pqq, prq, psq, ptq, puq, pvq, pwq, pxq, pyq, pzq, pqa, pqb, pqc, pqd, pqe, pqf, pqg, pqh, pqi, pqj, pqk, pql, pqm, pqn, pqo, pqp, pqq, pqr, pqs, pqt, pqu, pqv, pqw, pqx, pqy, pqz}
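Algorithms 3 and 4 can be sketched in Python as follows. This is a minimal illustration, assuming a lowercase 26-letter alphabet and assuming that candidates are ranked by word frequency alone (that is, every candidate within two edits is treated as equally likely to be the intended word); the paper’s modified edit distance and exact probability model are not specified here, and the function names and frequencies are hypothetical.

```python
import heapq
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """All strings one delete, transpose, replace or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right
                for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def within_two_edits(word):
    """Candidates at edit distance 1 or 2 (may include non-words)."""
    first = edits1(word)
    return first | {e2 for e1 in first for e2 in edits1(e1)}


def correct(word, freq, k=10):
    """Algorithm 4: return the word itself if known, else the top-k candidates."""
    if word in freq:
        return [word]
    known = [w for w in within_two_edits(word) if w in freq]
    return heapq.nsmallest(k, known, key=lambda w: (-freq[w], w))


# Toy frequency map (Algorithm 3), invented for illustration.
freq = {"free": 50, "fare": 20, "fret": 5, "yellow": 8, "zebra": 3}
print(correct("fre", freq))
print(correct("yellow", freq))
```

A word that is already in the map is returned unchanged, which matches the single result reported for the query ‘yellow’ in Table 2.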
2.6 Multi-word query implementation
2.6.1 Case 2: Multi-word query
Given a query $Q = \{p_1, p_2, \ldots, p_\ell\}$, suppose $\{k_{i1}, k_{i2}, \ldots\}$ is the set of keywords that share the prefix $p_i$. Let $L_{ij}$ denote the inverted list of $k_{ij}$, and let $U_i = \bigcup_j L_{ij}$ be the union of the lists for $p_i$. We study how to compute the answer to the query, i.e., $\bigcap_i U_i$.
2.7 Algorithm 5: Major steps of edit distance algorithm
1 Delete: remove one letter.
2 Transpose: swap adjacent letters.
3 Replace: change one letter to another.
4 Insert: insert a letter.
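As a rough check on the number of candidates these four operations generate, assume a 26-letter alphabet and a word of length $n$, and count strings before duplicates are removed:

```latex
\underbrace{n}_{\text{delete}}
+ \underbrace{(n-1)}_{\text{transpose}}
+ \underbrace{26n}_{\text{replace}}
+ \underbrace{26(n+1)}_{\text{insert}}
= 54n + 25
```

For the word ‘pq’ ($n = 2$) this gives $2 + 1 + 52 + 78 = 133$ strings, which matches the length of the list in the ‘pq’ example when repeated entries such as pq, ppq and pqq are counted each time they occur.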
One method is the following. For each prefix $p_i$, we compute the corresponding union list $U_i$ on the fly and intersect the union lists of the different keywords. The time complexity of computing the unions could be $O(\sum_{i,j} |L_{ij}|)$. The shorter the keyword prefix is, the slower the query could be, as the inverted lists of more predicted keywords need to be traversed to generate the union list. This approach only requires the inverted lists of the trie leaf nodes, and the space complexity of the inverted lists is $O(n \times L)$, where $n$ is the number of records and $L$ is the average number of distinct keywords in each record.
Alternatively, we can pre-compute and store the union list of each prefix, and intersect the union lists of the query keywords when a query arrives. The main issue with this approach is that the pre-computed union lists require a large amount of space, especially since each record occurrence on an inverted list needs to be stored many times. The space complexity of all the union lists is $O(n \times L \times w)$, where $w$ is the average keyword length. Compression techniques can be used to reduce the space requirement. The ten results displayed are prioritised on the basis of word frequency and the movie rating given by IMDB.
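The on-the-fly method can be sketched in Python as follows. This is a minimal illustration, assuming records are movie titles keyed by an integer ID and that results are ranked by rating only; it scans all keywords linearly instead of walking a trie, and the function names, record IDs and rating values are placeholders rather than IMDB data.

```python
from collections import defaultdict


def build_inverted_lists(records):
    """Map each keyword to the set of record IDs whose title contains it."""
    inverted = defaultdict(set)
    for record_id, title in records.items():
        for keyword in title.lower().split():
            inverted[keyword].add(record_id)
    return inverted


def answer_query(prefixes, inverted, ratings, k=10):
    """Union the lists of keywords sharing each prefix, then intersect."""
    union_lists = []
    for prefix in prefixes:
        union = set()
        for keyword, ids in inverted.items():
            if keyword.startswith(prefix):
                union |= ids
        union_lists.append(union)
    if not union_lists:
        return []
    answer = set.intersection(*union_lists)
    # Rank by rating, highest first; ties are broken by record ID.
    return sorted(answer, key=lambda rid: (-ratings[rid], rid))[:k]


records = {1: "a beautiful mind", 2: "life is beautiful", 3: "mind the gap"}
ratings = {1: 3.0, 2: 2.0, 3: 1.0}  # placeholder ratings, not IMDB values
inverted = build_inverted_lists(records)
print(answer_query(["beau", "mi"], inverted, ratings))
```

Pre-computing the union list for every prefix would remove the inner loop over keywords at query time, at the cost of the extra space discussed above.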
3 Results and discussion
3.1 Single word query results
The results consist of the preprocessing time taken by both algorithms for learning or storing, and the processing time of the algorithms when a query is given by the user. Words of different lengths are given as queries to check the processing time.
3.2 Preprocessing time
Auto-completion takes more preprocessing time than auto spell correction because it builds a prefix tree, whereas auto spell correction uses an array list. The dataset is the same for both algorithms: 1,128 kB containing 109,583 different words. Table 1 shows the preprocessing time for single word queries.
Table 1 Preprocessing time of single word query
| Algorithm | Preprocessing time |
|---|---|
| Auto completion | 2,216 ms |
| Auto spell correction | 7 ms |
3.3 Processing time
The results returned by the auto-completion and auto spell correction algorithms are shown in Table 2.
Table 2 Results obtained from single word query
| Query | Auto-completion results | Auto spell correction results |
|---|---|---|
| fre | freak, freaked, freakier, freakiest, freakily, freaking, freakish, freakishly, freakishness, freakout | fare, fie, fire, re, fret, are, fred, free, ire, fore |
| appl | applaud, applaudable, applaudably, applauded, applauder, applauders, applauding, applauds, applause, applauses | appal, app, apple, apply, pp, paps, amps, appals, appall, pall |
| narra | narrate, narrated, narrater, narraters, narrates, narrating, narration, narrations, narrative, narratives | sabra, terra, sacra, harry, durra, marry, nary, aura, laura, narrate |
| yellow | yellow, yellowbellies, yellowbelly, yellowed, yellower, yellowest, yellowing, yellowish, yellowknife, yellowly | yellow |
| superio | superior, superiorities, superiority, superiorly, superiors | superior, super, superiors, superego, supering, superb, supered, supers |
3.4 Multi-word query results
The results consist of the preprocessing time taken by the inverted list algorithm for learning or storing, which is a combination of both algorithms used for single word queries, and the processing time of the algorithms when a query is given by the user. Different movie names are given as queries to check the processing time. The processing time of single word queries for both algorithms is shown in Table 3.
Table 3 Processing time comparison of single word queries
| Query | Auto-completion time (ms) | Auto spell correct time (ms) |
|---|---|---|
| fre | 7.0 | 60.0 |
| appl | 1.0 | 44.0 |
| narra | 0.5 | 35.0 |
| yellow | 1.0 | 0.5 |
| superio | 0.65 | 54.0 |
3.5 Preprocessing time
The total preprocessing time of both algorithms is 1,548 ms without removing special characters and 1,027 ms after removing them. The dataset is 412 kB in size and contains 10,000 different movie names with 8,556 unique words. Figures 5 and 6 show the processing times graphically.
Figure 5 Processing time of single word queries (see online version for colours)
Figure 6 Processing time of multi-word queries (see online version for colours)
3.6 Processing time
The results returned for various multi-word queries are shown in Table 4.
Table 4 Processing time and results of multi-word queries
| Query | Results |
|---|---|
| godfather | the godfather squad; the godfather of green bay |
| loos cannon | loose cannons; loose tooth; goose on the loose; los locos; five loose women; screw loose; run lola run; the boys of Baraka; los lunes al sol |
| beautiful mind | a beautiful mind; life is beautiful; life is beautiful; eternal sunshine of the spotless mind; beautiful people; hearts and minds; beautiful thing; the beautiful country; beautiful girls; mind the gap |
| pirsuits happiness | the pursuit of happiness; double happiness; little shots of happiness; shortcut to happiness; shortcut to happiness; pirates of the caribbean: the curse of the black pearl; pirates of the caribbean: dead man’s chest; pirates of the caribbean: at world’s end; pirates of the plain; piranha 3d |
| forest grump | forrest gump; grumpy old men; once upon a forest; a light in the forest; haunted forest; the forest; forest warrior; forest of the damned; crumb |
3.7 Effect of removing special characters
Removing special characters such as ‘+’, ‘-’, ‘.’ and ‘,’ reduces both the preprocessing time and the processing time. The effect is shown in Table 5.
Table 5 Processing time comparison of multi-word queries
| Query | With special characters (ms) | Without special characters (ms) |
|---|---|---|
| godfather | 248.0 | 244.0 |
| loos cannon | 95.0 | 94.0 |
| beautiful mind | 292.0 | 283.0 |
| pirsuits happiness | 421.0 | 415.5 |
4 Conclusions
This work helps in predicting the top-k results in the auto-completion suggestion box. The modified edit distance algorithm gives highly accurate results when the user enters a misspelled word. For completion of the whole word, a prefix tree is used to lower the processing time of the search. The intersecting union list algorithm for multi-word queries also gives efficient results, as the frequency of each word and the IMDB movie rating are used as priorities in predicting results.
References
Agrawal, R. and Srikant, R. (1994) ‘Fast algorithms for mining association rules’, Proc. 20th Int. Conf. Very Large Data Bases, VLDB, September, Vol. 1215, pp.487–499.
Ansari, E., Dastghaibifard, G., Keshtkaran, M. and Kaabi, H. (2008) ‘Distributed frequent item set mining using trie data structure’, International Journal of Computer Science, Vol. 35, No. 3, pp.377–381.
Atkinson, K. (2006) GNU Aspell 0.60, Vol. 4.
Bayardo, R., Goethals, B. and Zaki, M.J. (2004) Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2004).
Cai, F. and de Rijke, M. (2016) ‘A survey of query auto completion in information retrieval’, Foundations and Trends® in Information Retrieval, Vol. 10, No. 4, pp.273–363.
Cai, F., Reinanda, R. and Rijke, M.D. (2016) ‘Diversifying query auto-completion’, ACM Transactions on Information Systems (TOIS), Vol. 34, No. 4, p.25.
Chaudhuri, S. and Kaushik, R. (2009) ‘Extending auto completion to tolerate errors’, Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, June, pp.707–718, ACM.
Cormen, T.H., Leiserson, C.E., Rivest, R.L. and Stein, C. (2001) Introduction to Algorithms, 2nd ed., Chapter 7, The MIT Press, Cambridge, Massachusetts, London, England, McGraw-Hill Book Company.
Czamecki, C. and Spiliopoulou, M. (2012) ‘A holistic framework for the implementation of a next generation network’, International Journal of Business Information Systems, Vol. 9, No. 4, pp.385–401.
Fischer, J. and Heun, V. (2011) ‘Space-efficient preprocessing schemes for range minimum queries on static arrays’, SIAM Journal on Computing, Vol. 40, No. 2, pp.465–492.
Grossi, R. and Ottaviano, G. (2015) ‘Fast compressed tries through path decompositions’, Journal of Experimental Algorithmics (JEA), Vol. 19, No. 1, pp.3–4.
Hsu, B.J.P. and Ottaviano, G. (2013) ‘Space-efficient data structures for top-k completion’, Proceedings of the 22nd International Conference on World Wide Web, May, pp.583–594, International World Wide Web Conferences Steering Committee.
Ji, S., Li, G., Li, C. and Feng, J. (2009) ‘Efficient interactive fuzzy keyword search’, Proceedings of the 18th International Conference on World Wide Web, April, pp.371–380, ACM.
Joseph, N.P.S., Mahmood, A.K., Yin, C.P., Wan, W.S., Yuen, P.K. and Heng, L.E. (2015) ‘Barebone cloud IaaS: revitalisation disruptive technology’, Int. J. Business Information Systems, Vol. 18, No. 1, pp.107–126.
Kong, W., Li, R., Luo, J., Zhang, A., Chang, Y. and Allan, J. (2015) ‘Predicting search intent based on pre-search context’, Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, August, pp.503–512, ACM.
Peng, W. and Park, D.H. (2004) ‘Generate adjective sentiment dictionary for social media sentiment analysis using constrained nonnegative matrix factorization’, Urbana, Vol. 51, p.61801.
Senger, C., Kaltschmidt, J., Schmitt, S.P., Pruszydlo, M.G. and Haefeli, W.E. (2010) ‘Misspellings in drug information system queries: characteristics of drug name spelling errors and strategies for their prevention’, International Journal of Medical Informatics, Vol. 79, No. 12, pp.832–839.
Singh, V., Singh, A., Jain, D., Kumar, V. and Verma, P. (2016) ‘Patterns affecting structural properties of social networking site ‘Twitter’’, Int. J. Business Information Systems, accepted.
Smith, C.L., Gwizdka, J. and Feild, H. (2016) ‘Exploring the use of query auto completion: search behavior and query entry profiles’, Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval, March, pp.101–110, ACM.
Tan, P.N., Steinbach, M. and Kumar, V. (2006) Introduction to Data Mining, Vol. 1, Pearson Addison Wesley, Boston.
Vargas, S., Blanco, R. and Mika, P. (2016) ‘Term-by-term query auto-completion for mobile search’, Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, February, pp.143–152, ACM.
Appendix (see online version for colours)