Segment Tree for Solving Range Minimum Query Problems
Iskandar Setiadi 13511073
Program Studi Teknik Informatika, Sekolah Teknik Elektro dan Informatika
Institut Teknologi Bandung, Jl. Ganesha 10 Bandung 40132, Indonesia
[email protected]
Many data structures have been designed to solve problems encountered in practice, and complexity is a primary consideration when implementing them. In this paper, we discuss a classical problem known as the Range Minimum Query (RMQ). In this problem, we want to find the minimum value between two specified indices of an array whose elements are drawn from a well-ordered set. Many techniques can be used to solve this problem, and they differ in complexity. One of the most common ways of implementing a data structure is to use a tree. A tree-based structure called the segment tree can be used as an effective approach to RMQ. Furthermore, we compare the complexity of different approaches to solving the Range Minimum Query problem.
Index Terms: complexity, data structure, range minimum query, segment tree
I. INTRODUCTION
The Range Minimum Query problem, denoted RMQ, is defined as follows. Let A[0..n-1] be a linear array containing n elements of a well-ordered set. Let i and j be non-negative integers such that 0 ≤ i ≤ j ≤ (n-1). For each segment A[i..j], we want to find an integer x (i ≤ x ≤ j) such that x is the position of the minimum element in the subarray A[i..j], that is, A[x] ≤ A[k] for every k with i ≤ k ≤ j. A query is denoted as a pair (i,j).
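The definition above can be turned directly into a linear scan. The following is a minimal sketch rather than the method developed in this paper; the function name rmq_naive and the rule that ties go to the leftmost index are assumptions made for illustration, since the definition does not say which position to report when the minimum occurs more than once.

```c
#include <assert.h>
#include <stddef.h>

/* Returns an index x with i <= x <= j such that a[x] is the minimum of
 * a[i..j]. Assumption of this sketch: on ties the leftmost index wins. */
size_t rmq_naive(const int *a, size_t i, size_t j)
{
    assert(i <= j);              /* a query (i, j) must be a valid range */
    size_t x = i;                /* best position found so far */
    for (size_t k = i + 1; k <= j; k++) {
        if (a[k] < a[x]) {       /* strict comparison keeps the leftmost minimum */
            x = k;
        }
    }
    return x;
}
```

Each call inspects every element of A[i..j] once, so a single query costs time proportional to the length of the range.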
The problem stated above can also be posed with the maximum in place of the minimum; however, the range minimum query has more applications in practice. This fundamental problem has several applications in database searching, string processing, text compression, text indexing, and computational biology [6]. In computational biology, the Range Minimum Query is used in identifying patterns of RNA. By detecting such patterns, biologists can easily identify similarities between two local regions of RNA.
Imagine that you are given the task of designing an application for storing national exam (Ujian Nasional in Indonesia) results. Your boss wants to perform thousands of operations that each search for the minimum score within a given interval. This kind of problem is an instance of the Range Minimum Query problem.
Multiple approaches to solving this problem are discussed in Section III. One of the most efficient approaches uses a segment tree. This method requires exactly two steps. First, we build a tree data structure that stores each segment. Second, we answer each query by searching that tree. There are also several other approaches, such as the brute force (trivial) method, brute force with DP (Dynamic Programming), and the sparse table data structure. These approaches differ in performance.
The basic knowledge needed in this paper is elaborated in Section II. Every algorithm in this paper is written in the C language (as used in IF2030).
Picture 1.1 Range Minimum Query Representation [7]
Picture 1.1 is a representation of an array A containing 10 elements. We want to find the position of the minimum element between i = 2 and j = 7. Applying the query (2,7) to array A gives RMQ_A(2,7) = x = 3. It can easily be seen that A[3] holds the minimum element of A[2..7], that is, A[3] = 1.
IF2091 Struktur Diskrit (Discrete Structures) paper, Semester I, 2012/2013
II. RELATED THEORIES
A. Tree
A tree is defined as a connected undirected graph that contains no cycles [3]. It is a special kind of graph and is widely used to represent hierarchical data. The tree data structure is also well known for its recursive properties, as shown in Picture 2.1 below. Element A is the parent of three elements, namely B, C, and D, which are its children.
Picture 2.2 shows an example of the structure of a segment tree. Each node in a segment tree stores a value for an interval. Hence, a segment tree can be regarded as a tree over intervals.
Picture 2.1 Tree Data Structure
Let G = (V,E) be an undirected graph with n vertices. If G is a tree, it has the following properties:
- Any two vertices in G are connected by exactly one simple path
- G is connected and has (n - 1) edges
- If any edge is removed, the graph separates into two components
- G has no cycles, and any two vertices can be connected by a simple path
Several basic tree terminologies that will be used in this discussion are listed below:
- Root node: the topmost node in the tree (A)
- Leaf node: a bottommost node in the tree, that is, one with no children (E, F, G, H)
- Height: the length of the longest path from the root node to a leaf node
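The properties listed above suggest a simple test: an undirected graph on n vertices is a tree exactly when it has (n - 1) edges and none of them closes a cycle. The sketch below checks this with a union-find (disjoint set) structure. The function names is_tree and find_root, the numbering of vertices from 0 to n-1, and the representation of edges as two parallel arrays are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Finds the representative of the component containing v (path halving). */
static int find_root(int *parent, int v)
{
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];
        v = parent[v];
    }
    return v;
}

/* Returns true if the undirected graph with vertices 0..n-1 and the m edges
 * (u[k], v[k]) is a tree: exactly n - 1 edges and no cycle. A graph with
 * n - 1 edges and no cycle is necessarily connected. */
bool is_tree(int n, int m, const int *u, const int *v)
{
    assert(n > 0);
    if (m != n - 1) {
        return false;
    }
    int *parent = malloc((size_t)n * sizeof *parent);
    if (parent == NULL) {
        return false;            /* allocation failed: cannot confirm */
    }
    for (int k = 0; k < n; k++) {
        parent[k] = k;           /* every vertex starts in its own component */
    }
    bool ok = true;
    for (int k = 0; k < m && ok; k++) {
        int ru = find_root(parent, u[k]);
        int rv = find_root(parent, v[k]);
        if (ru == rv) {
            ok = false;          /* both ends already connected: a cycle */
        } else {
            parent[ru] = rv;     /* merge the two components */
        }
    }
    free(parent);
    return ok;
}
```

For example, a hypothetical tree shaped like the one described for Picture 2.1, with root A = 0, children B, C, D = 1, 2, 3 and leaves E, F, G, H = 4 to 7, has 8 vertices and 7 edges and passes this test, whereas a triangle on three of four vertices has the right number of edges but fails because of its cycle.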
B. Segment Tree
One application of the tree data structure is the segment tree. It allows querying which of the stored intervals contain a given value. The definition of a segment tree is given below: Let p_1, p_2, p_3, … be the list of distinct segment endpoints, with 1 ≤ p_i ≤ n for i = 1,2,3,… . A segment tree partitions the range into several regions, each of which is an interval defined by two consecutive endpoints p_i and p_(i+1).
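For the RMQ setting, a segment tree is commonly laid out so that each node covers a contiguous range of array indices and stores the position of the minimum in that range. The following is a minimal sketch under stated assumptions and is not necessarily the implementation used later in this paper: the names build, query, tree and MAXN are hypothetical, nodes are numbered from 1 in an implicit array (children of node k are 2k and 2k+1), the array length is at most MAXN, and ties go to the leftmost index.

```c
#include <assert.h>

#define MAXN 1024                 /* hypothetical capacity of this sketch */

static int tree[4 * MAXN];        /* tree[node] = index of the minimum in its range */

/* Builds the subtree rooted at `node`, which covers a[lo..hi]. */
void build(const int *a, int node, int lo, int hi)
{
    assert(lo <= hi);
    if (lo == hi) {               /* leaf: a single element */
        tree[node] = lo;
        return;
    }
    int mid = lo + (hi - lo) / 2;
    build(a, 2 * node, lo, mid);
    build(a, 2 * node + 1, mid + 1, hi);
    int l = tree[2 * node];
    int r = tree[2 * node + 1];
    tree[node] = (a[l] <= a[r]) ? l : r;   /* keep the better child result */
}

/* Returns the index of the minimum of a[i..j] within the subtree rooted at
 * `node` (covering a[lo..hi]), or -1 if the two ranges do not overlap. */
int query(const int *a, int node, int lo, int hi, int i, int j)
{
    if (j < lo || hi < i) {
        return -1;                /* no overlap with the query */
    }
    if (i <= lo && hi <= j) {
        return tree[node];        /* node range lies fully inside the query */
    }
    int mid = lo + (hi - lo) / 2;
    int l = query(a, 2 * node, lo, mid, i, j);
    int r = query(a, 2 * node + 1, mid + 1, hi, i, j);
    if (l == -1) return r;
    if (r == -1) return l;
    return (a[l] <= a[r]) ? l : r;
}
```

With an array a of n elements, the tree is built once with build(a, 1, 0, n - 1), and a query (i,j) is then answered with query(a, 1, 0, n - 1, i, j). Storing positions rather than values matches the RMQ definition, which asks for the index x and not for A[x] itself.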
C. Complexity
Complexity is used as a measure of efficiency in computing. Big-O notation is commonly used to describe the growth of a function in simplified terms. Let f and T be positive non-decreasing functions defined on the positive integers. We say that T(N) = O(f(N)) if there exist constants C and n0 such that the following holds:
T(N) ≤ C · f(N) for all N ≥ n0
As stated above, we usually simplify this behaviour into simpler terms. Table 2.1 below classifies the common orders of growth:
Complexity (Notation)    Name
O(1)                     Constant
O(log n)                 Logarithmic
O(n)                     Linear
O(n log n)               Log-linear
O(n^2)                   Quadratic
O(n^3)                   Cubic
O(2^n)                   Exponential
O(n!)                    Factorial
Table 2.1 Order of Time Complexity in Big-O Notation
Example II.C – 1
You're given the following algorithm, written in the C language. Determine its classification (using big-O notation).
for(i=0;i