From Random to Hierarchical Data through an Irregular Pyramidal Structure

Rimon Elias, Mohab Al Ashraf, and Omar Aly

Faculty of Digital Media Engineering and Technology, German University in Cairo, New Cairo City, Egypt
rimon.elias@guc.edu.eg

Abstract. This paper proposes to transform data scanned randomly in a well-defined space (e.g., Euclidean) along a hierarchical irregular pyramidal structure in an attempt to reduce the search time consumed when querying these random data. Such a structure is built as a series of graphs with different resolutions. Levels are constructed and surviving cells are chosen following irregular pyramidal rules and according to a proximity criterion among the space points under consideration. Experimental results show that using such a structure to query data can save considerable search time.

Keywords: Irregular pyramids, hierarchical structure, point clustering, hierarchical visualization, multiresolution visualization.

1 Introduction

Large sets of data often need to be searched with respect to a specific query point. Many data items in these sets could be excluded from the search because they are far from the query point. However, if the data items are not clustered, there will be no way but to check each item, a process that can be time consuming. If the data items are clustered or categorized into a hierarchy, search time can be reduced considerably.

However, if we structure data in a hierarchy, visualizing such a hierarchy may be a challenge. Different techniques have been developed over recent years to help humans grasp the structure of a hierarchy in a visual form (e.g., treemaps [19], information slices [1] and sunburst [20]). Those techniques can be grouped into different categories depending on the nature of the data visualized and the way the data are visualized.

This paper presents a technique based on irregular pyramidal rules to cluster data points with the aim of reducing the time consumed in the search process. The paper is organized as follows. Sec. 2 presents the concepts of pyramidal architecture and multiresolution structures. Sec. 3 surveys different visualization techniques that have been developed under different categories. Sec. 4 presents our algorithm, which depends on a hierarchical structure to cluster the data. Finally, Sec. 5 presents some experimental results while Sec. 6 derives some conclusions.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 324–333, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Pyramidal Architecture

Hierarchical or multiresolution processing through pyramidal structures is a well-known topic in image analysis. The main aim of such a concept is to reduce the amount of information to be manipulated in order to speed up the whole process. Over recent decades, many hierarchical or pyramidal structures have been developed to solve various image processing problems (e.g., segmenting an image according to its different gray levels). Such pyramidal structures can be categorized into two main subsets: regular and irregular pyramids. The classification depends on whether a parent in the hierarchy has a constant number of children (a regular structure) or a varying number of children (an irregular structure).

Regular pyramids include, among others, the bin pyramid [9] in which a parent has exactly two children; the quad pyramid [9] where a parent has four children (Fig. 1); the hexagonal pyramid [7] that uses a triangular tessellation and in which a parent has four children; and the dual pyramid [15] whose levels are alternately rotated by 45°.

Fig. 1. An example of a quad pyramid

In the category of irregular pyramids, the number of children per parent varies according to the information processed and the operation under consideration. Hence, the number of surviving nodes, cells or pixels may change from one situation to another according to the data processed. In order to accommodate this:
• A level should be represented as a graph data structure; and
• Some rules must be utilized in order to govern the process.

In the adaptive pyramid [8] and the disparity pyramid [6], the decimation process (i.e., the process by which the surviving cells are chosen) can be controlled by two rules:
1. Two neighbors cannot survive together to the higher level; and
2. For each non-surviving cell, there is at least one surviving cell in its neighborhood.

It is worth mentioning that all the above pyramids work on images. However, we may apply pyramidal rules to space points in order to cluster them according to their proximity to each other. Hence, flat and random data with no apparent hierarchical nature can be categorized into a hierarchy. Sec. 4 specifies the steps of the algorithm we propose in order to cluster the data points and visualize them as a hierarchy using a query-dependent pixel-oriented technique.
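The two decimation rules of the adaptive and disparity pyramids can be checked mechanically on a single level. The following is only a minimal sketch under stated assumptions: the level is stored as an adjacency dictionary, and the names `Graph` and `satisfies_decimation_rules` are hypothetical and do not come from the cited pyramids.

```python
from typing import Dict, Hashable, Set

# Assumption: one pyramid level is an undirected graph stored as
# {cell: set of neighboring cells}.
Graph = Dict[Hashable, Set[Hashable]]


def satisfies_decimation_rules(graph: Graph, survivors: Set[Hashable]) -> bool:
    """Return True if `survivors` obeys both decimation rules on `graph`."""
    for cell, neighbors in graph.items():
        surviving_neighbors = neighbors & survivors
        if cell in survivors:
            # Rule 1: two neighbors cannot survive together.
            if surviving_neighbors:
                return False
        else:
            # Rule 2: a non-surviving cell needs a surviving neighbor.
            if not surviving_neighbors:
                return False
    return True


# A small path graph a - b - c - d used as an illustrative level.
level = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
valid = satisfies_decimation_rules(level, {"a", "c"})       # both rules hold
adjacent = satisfies_decimation_rules(level, {"a", "b"})    # violates rule 1
uncovered = satisfies_decimation_rules(level, {"a"})        # violates rule 2
```

In graph terms, a survivor set that passes both checks is a maximal independent set of the level graph.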

3 Visualization Techniques

In addition to the irregular pyramid concept mentioned above, we need to investigate some visualization concepts. These are query-dependent versus query-independent techniques in addition to different techniques to visualize hierarchies.

3.1 Query-Dependency

Visualization techniques can be categorized into query-dependent and query-independent subsets. The query-dependent techniques refer to visualizing the arranged data according to some attribute. The user may input a query point to compare with the other data items. The differences can be calculated, arranged in order and visualized as colored pixels. Spiral and axes techniques and their variations [11,10,12] are examples that can be used in this case. The query-independent techniques do not require the user to input a query point to visualize data with respect to that point; instead, the data are visualized with no apparent order if data items are not sorted originally.

3.2 Visualizing Hierarchies

If data are arranged in a hierarchical order, the visualization problem can be re-formulated so as to visualize the hierarchical structure (i.e., a tree in general). This becomes more difficult as such trees grow in width or depth. A further challenge arises if interaction is to be added so that the user can browse or focus on a subtree. Many algorithms have been developed in this area, such as SpaceTree [16], Cheops [3], cone trees [18], InfoTV [5] and InfoCube [17]. Treemaps [19] can also be used to visualize hierarchies. The idea of a treemap is to split the space into regions according to the number of branches as well as the size of the hierarchy. Versions of treemaps include clustered treemaps [21], Voronoi treemaps [2] and 3D Treemaps [4]. Circular visualization techniques can also be used to view hierarchies, as in information slices [1], Sunburst [20] and InterRing [22].

Note that visualizing a hierarchy as a set of levels, where each level is represented as a graph consisting of a number of nodes and edges, is another challenge. Examples of techniques tackling this problem can be found in [14,13], where the graph nodes are visualized as colored spheres while the edges are shown as thin cylinders, each connecting two spheres. Although the hierarchical structure that we are suggesting in this paper is built as an irregular pyramid with levels represented as graphs comprising nodes and edges, visualizing this structure is not our target. Instead, we aim to convert the flat data into hierarchical data in order to speed up the process of querying the data.
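The query-dependent arrangement described in Sec. 3.1 (compute the differences to a query point, order them, then place them as pixels) can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names `rank_by_distance` and `spiral_positions` are hypothetical, the Manhattan distance is just one possible difference measure, and the square spiral is a simple stand-in that may differ from the layout actually used by the spiral technique of [11].

```python
from typing import List, Sequence, Tuple


def rank_by_distance(items: Sequence[Sequence[float]],
                     query: Sequence[float]) -> List[int]:
    """Return item indices ordered from nearest to farthest from `query`."""
    def manhattan(p: Sequence[float]) -> float:
        return sum(abs(a - b) for a, b in zip(p, query))
    return sorted(range(len(items)), key=lambda idx: manhattan(items[idx]))


def spiral_positions(n: int) -> List[Tuple[int, int]]:
    """Return the first n cells of a square spiral starting at (0, 0)."""
    x = y = 0
    dx, dy = 1, 0              # current walking direction
    out = [(0, 0)]
    step = 1                   # length of the current spiral arm
    while len(out) < n:
        for _ in range(2):     # two arms share each length
            for _ in range(step):
                x, y = x + dx, y + dy
                out.append((x, y))
                if len(out) == n:
                    return out
            dx, dy = -dy, dx   # turn 90 degrees
        step += 1
    return out[:n]


items = [(5, 5), (1, 1), (3, 2)]
order = rank_by_distance(items, query=(0, 0))
# The nearest item is drawn at the spiral center, farther items spiral outward.
pixels = dict(zip(order, spiral_positions(len(order))))
```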

4 Algorithm

Our algorithm can be split into two main phases:
1. Build the hierarchy through data clustering using an irregular pyramidal technique.
2. Visualize the established hierarchical data with respect to a query point.

As in other irregular pyramids, each level of the structure is represented as a graph. At the lowest level (i.e., the base), the graph consists of a number of cluster cells (or nodes) where each node is linked to every other node and where every node contains only one space point. At the upper levels, a cluster node may contain more points while the number of clusters is reduced at that level compared to its predecessor. As mentioned in Sec. 2, some rules must exist in order to control the decimation process of choosing the surviving cells and how cells at different levels are linked together. The rules used in this structure are:
1. Two neighbors may both survive at the next level if and only if some binary variable is set to zero during the decimation process. Such a rule is different from the case of the adaptive and disparity pyramids [8,6]; and
2. For each non-surviving node, there exists at least one surviving node in its neighborhood. Such a rule also holds in the case of the adaptive and disparity pyramids.

Suppose that the set of clusters at a given level i is L(i) = {C(i,1), C(i,2), ..., C(i,n)} where n is the number of clusters at this level; and C(i,j) is a cluster consisting of a number of space points (where j ∈ {1, ..., n}). Also, we can define a cluster as C(i,j) = {p(i,j,1), p(i,j,2), ..., p(i,j,m)} where j ∈ {1, ..., n} is the cluster number; m is the number of points in the cluster; and p is a vector whose length depends on the dimension of space.

A binary variable q is reset to 0 for every two clusters, C(i,j) and C(i,k), at level L(i). The following Euclidean distance is calculated among the points contained in these clusters:

$$
d\left(p_{(i,j,a)}, p_{(i,k,b)}\right) = \left\| p_{(i,j,a)} - p_{(i,k,b)} \right\| = \sqrt{\sum_{d=1}^{D} \left( p_{(i,j,a,d)} - p_{(i,k,b,d)} \right)^{2}} \tag{1}
$$

where i is the level number; j and k are the cluster numbers; a and b are the point numbers; D is the dimension of space; the fourth subscript d selects the d-th component of a point; and ||.|| represents the norm of the difference between the two vectors. The Manhattan metric may be used instead for faster results:

$$
d\left(p_{(i,j,a)}, p_{(i,k,b)}\right) = \sum_{d=1}^{D} \left| p_{(i,j,a,d)} - p_{(i,k,b,d)} \right| \tag{2}
$$
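The two metrics of Eqs. (1) and (2) translate directly into code. This is a small sketch only; the function names `euclidean` and `manhattan` and the representation of a point as a plain sequence of numbers are assumptions made for illustration.

```python
import math
from typing import Sequence


def euclidean(p: Sequence[float], r: Sequence[float]) -> float:
    """Eq. (1): square root of the summed squared component differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, r)))


def manhattan(p: Sequence[float], r: Sequence[float]) -> float:
    """Eq. (2): sum of the absolute component differences (no square root)."""
    return sum(abs(a - b) for a, b in zip(p, r))


# Two illustrative 5D points (D = 5).
p = (1, 2, 3, 4, 5)
r = (2, 4, 6, 8, 10)
d_euclidean = euclidean(p, r)   # sqrt(1 + 4 + 9 + 16 + 25) = sqrt(55)
d_manhattan = manhattan(p, r)   # 1 + 2 + 3 + 4 + 5 = 15
```

The Manhattan form avoids the multiplications and the square root, which is consistent with the remark that it gives faster results.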

The value of the distance d(p(i,j,a), p(i,k,b)) is compared against a threshold t supplied as a parameter to the algorithm. If the test d(p(i,j,a), p(i,k,b)) < t results in a true condition, the search is broken immediately for the current clusters and the variable q is set to 1; otherwise, q remains 0. Thus, different situations arise with respect to the value of q and whether or not the parents C(i+1,j) and C(i+1,k) of clusters C(i,j) and C(i,k) exist. These are summarized in Table 1.

Table 1. Different creation and linking possibilities

| q | C(i+1,j) exists | C(i+1,k) exists | Action |
|---|---|---|---|
| 1 | Yes | No | Link C(i,k) to C(i+1,j) |
| 1 | No | No | Create a new C(i+1,j) and link C(i,k) & C(i,j) to C(i+1,j) |
| 1 | Yes | Yes | Delete C(i+1,j) and link C(i,j) to C(i+1,k) |
| 0 | Yes | No | Create a new C(i+1,k) and link C(i,k) to C(i+1,k) |
| 0 | No | No | Create a new C(i+1,k) and link C(i,k) to C(i+1,k); create a new C(i+1,j) and link C(i,j) to C(i+1,j) |
| 0 | Yes | Yes | Take no action |

The procedure explained above is repeated until all clusters are separated from each other by distances greater than the threshold t (similar to [8,6]). Note that statistics like the mean and the size of the clusters are updated at each level.

After storing the flat random data along a hierarchy, viewing parts of the data relevant to a query point becomes easier. Spiral and axes techniques [11] are applied to the hierarchical data. Clusters constituting each level are represented as pixels where each pixel has a color indicating the mean of all points contained in the cluster. Interactivity is added, as clicking on a pixel displays the children underneath. A way of magnifying the results is also included in our implementation.
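The threshold test and the case analysis of Table 1 can be sketched as follows. This is a hedged illustration rather than our implementation: the names `proximity_flag`, `TABLE_1` and `action_for` are hypothetical, the Manhattan metric is used as the default distance, and the table is encoded only as a lookup of the listed actions without performing any creation, deletion or linking.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Point = Sequence[float]


def manhattan(p: Point, r: Point) -> float:
    """Manhattan distance as in Eq. (2)."""
    return sum(abs(a - b) for a, b in zip(p, r))


def proximity_flag(cluster_j: Sequence[Point], cluster_k: Sequence[Point],
                   t: float,
                   dist: Callable[[Point, Point], float] = manhattan) -> int:
    """Return q: 1 as soon as one cross-cluster pair is closer than t, else 0."""
    for p, r in product(cluster_j, cluster_k):
        if dist(p, r) < t:
            return 1          # break the search immediately
    return 0


# Keys are (q, parent C(i+1,j) exists, parent C(i+1,k) exists), as in Table 1.
TABLE_1: Dict[Tuple[int, bool, bool], str] = {
    (1, True, False): "Link C(i,k) to C(i+1,j)",
    (1, False, False): "Create a new C(i+1,j) and link C(i,k) & C(i,j) to C(i+1,j)",
    (1, True, True): "Delete C(i+1,j) and link C(i,j) to C(i+1,k)",
    (0, True, False): "Create a new C(i+1,k) and link C(i,k) to C(i+1,k)",
    (0, False, False): ("Create a new C(i+1,k) and link C(i,k) to C(i+1,k); "
                        "create a new C(i+1,j) and link C(i,j) to C(i+1,j)"),
    (0, True, True): "Take no action",
}


def action_for(q: int, parent_j_exists: bool, parent_k_exists: bool) -> str:
    """Look up the Table 1 action; unlisted combinations raise KeyError."""
    return TABLE_1[(q, parent_j_exists, parent_k_exists)]


q_close = proximity_flag([(0, 0)], [(3, 4)], t=8)   # distance 7 < 8
q_far = proximity_flag([(0, 0)], [(3, 4)], t=7)     # distance 7 is not < 7
```

Because the search stops at the first pair below the threshold, two clusters are compared point by point only in the worst case, when no pair of points is closer than t.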

5 Experimental Results

We considered different factors while building the pyramid, namely the number of data points to be clustered and the threshold used, and examined their impact on the number of levels, on the number of clusters at the top level and, consequently, on the reduction factor of clusters. Ten files with sets ranging from 100 to 1000 5D points are used with a fixed threshold t of 800 applied to the Manhattan metric. As expected, the number of levels increases as the number of points increases for the same threshold. This is shown in Fig. 2(a). In our hierarchical structure, a cluster contains one data point at the lowest level, which makes the number of clusters at this level equal to the number of points. As we go up the hierarchy, the number of clusters gets smaller while the number of points per cluster gets larger. For the same ten files with the same threshold t of 800, the greatest reduction in the number of clusters with respect to the number of points happens at the second level, as shown in Fig. 2(b).

Fig. 2. (a) The number of levels of the hierarchical structure increases as the number of points increases. (b) The number of clusters is reduced significantly at the second level of the hierarchy.

Fig. 3. (a) Number of clusters at the top level of the hierarchy for different point sets and reduction factor values associated with these sets. For all cases, a threshold value of 800 is used. (b) The percentage of the number of clusters at the top levels to the total number of points decreases as the number of points increases.

Fig. 4. The number of levels peaks at 4 before it decreases again to 3

Fig. 5. (a) Number of clusters at the top level of the hierarchy for different threshold levels and reduction factor values associated with these threshold levels. (b) The reduction factor increases as the threshold value increases.

For each data set where t = 800, the number of clusters at the top level was measured as a percentage of the total number of points. As expected from Fig. 2(b), the percentage decreases as the number of points increases. Consequently, the reduction factor increases as the number of points increases. This is shown in Fig. 3. In order to test the impact of the threshold value, one file containing 1000 points is used with threshold values ranging from 300 to 1300. In these cases, the number of levels ranges from 2 to 4 according to the threshold value, as shown in Fig. 4. It is logical that by increasing the threshold value, more points can be clustered together and fewer clusters are formed at the top level of the hierarchical structure. As a consequence, the reduction factor should increase as the threshold value increases. These results are shown in Fig. 5.

Fig. 6. Axes technique results for the same file after clustering. (a) Level 4 L(4) is displayed with 243 points. (b) Level 3 L(3) showing the contents of one of the points in the lower right quadrant in (a).

Fig. 7. (a) Time consumed to perform both versions of the axes technique for different sets of points. (b) Time consumed to perform both versions of the axes technique for different sets of points.

In order to visualize the points along the hierarchy built as four levels for a set of 1000 5D points with t = 800, we use both spiral and axes visualization techniques. As shown in Fig. 6(a), we start by plotting the top level (L(4)) of the clustered hierarchy, which contains only 243 points (as opposed to 1000 points in the original list). A cluster at the top level is represented as a point with a color indicating the mean of the points (or sub-clusters) contained in that cluster. The user can select a particular cluster and view its inner cluster points, where each point can itself represent a cluster that can be viewed hierarchically, and so on. Fig. 6(b) shows the contents of L(3) after selecting a point in the lower right quadrant in L(4).

In order to show the effect of our approach, we measured the time consumed when using the axes technique in both cases of random and hierarchical data for different sets of points. This is shown in Fig. 7. Notice that the difference between both versions gets larger with a larger number of points. This makes sense as the reduction factor gets larger with a larger number of points, as mentioned previously (refer to Fig. 3(a)). We also measured the time consumed to display random and hierarchical data for the same test set using the spiral technique; these were 76 msec and 16 msec respectively on a computer running at 2.0 GHz.
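As a quick check on the figures quoted above (a worked calculation from the reported numbers only, not an additional measurement), the share of points remaining at the top level and the ratio of the two spiral display times are:

$$
\frac{243}{1000} = 24.3\%, \qquad \frac{76\ \text{msec}}{16\ \text{msec}} = 4.75
$$

That is, the top level holds about a quarter of the original points, and displaying the hierarchical data with the spiral technique took roughly one fifth of the time needed for the random data in this test.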

6 Conclusions

An irregular pyramidal scheme is suggested to transform random data hierarchically in an attempt to reduce time consumed searching the whole data for a particular query. Tests show reductions in the amount of data processed and consequently in time consumed.

References

1. Andrews, K., Heidegger, H.: Information slices: Visualising and exploring large hierarchies using cascading, semi-circular discs. In: IEEE InfoVis 1998, pp. 9–12 (1998)
2. Balzer, M., Deussen, O., Lewerentz, C.: Voronoi treemaps for the visualization of software metrics. In: Proc. ACM SoftVis 2005, New York, USA, pp. 165–172 (2005)
3. Beaudoin, L., Parent, M.-A., Vroomen, L.C.: Cheops: a compact explorer for complex hierarchies. In: Proc. 7th conf. on Visualization (VIS 1996), Los Alamitos, CA, USA, p. 87 (1996)
4. Bladh, T., Carr, D., Scholl, J.: Extending tree-maps to three dimensions: a comparative study. In: Masoodian, M., Jones, S., Rogers, B. (eds.) APCHI 2004. LNCS, vol. 3101, pp. 50–59. Springer, Heidelberg (2004)
5. Chignell, M.H., Poblete, F., Zuberec, S.: An exploration in the design space of three dimensional hierarchies. In: Human Factors and Ergonomics Society Annual Meeting Proc., pp. 333–337 (1993)
6. Elias, R., Laganiere, R.: The disparity pyramid: An irregular pyramid approach for stereoscopic image analysis. In: VI 1999, Trois-Rivières, Canada, May 1999, pp. 352–359 (1999)
7. Hartman, N.P., Tanimoto, S.: A hexagonal pyramid data structure for image processing. IEEE Trans. on Systems, Man and Cybernetics 14, 247–256 (1984)
8. Jolion, J.M., Montavert, A.: The adaptive pyramid: A framework for 2d image analysis. CVGIP: Image Understanding 55(3), 339–348 (1991)
9. Jolion, J.M., Rosenfeld, A.: A Pyramid Framework for Early Vision. Kluwer Academic Publishers, Dordrecht (1994)
10. Keim, D.A., Ankerst, M., Kriegel, H.-P.: Recursive pattern: A technique for visualizing very large amounts of data. In: Proc. 6th VIS 1995, Washington, DC, USA, pp. 279–286, 463 (1995)
11. Keim, D.A., Kriegel, H.: VisDB: Database exploration using multidimensional visualization. In: Computer Graphics and Applications (1994)
12. Keim, D.A., Kriegel, H.-P.: Visualization techniques for mining large databases: A comparison. IEEE Trans. on Knowl. and Data Eng. 8(6), 923–938 (1996)
13. Kerren, A.: Explorative analysis of graph pyramids using interactive visualization techniques. In: Proc. 5th IASTED VIIP 2005, Benidorm, Spain, pp. 685–690 (2005)
14. Kerren, A., Breier, F., Kügler, P.: Dgcvis: An exploratory 3d visualization of graph pyramids. In: Proc. 2nd CMV 2004, London, UK, pp. 73–83 (2004)
15. Kropatsch, W.G.: A pyramid that grows by powers of 2. Pattern Recognition Letters 3, 315–322 (1985)
16. Plaisant, C., Grosjean, J., Bederson, B.B.: Spacetree: Supporting exploration in large node link tree, design evolution and empirical evaluation. In: Proc. IEEE InfoVis 2002, Washington, DC, USA, p. 57 (2002)
17. Rekimoto, J., Green, M.: The information cube: Using transparency in 3d information visualization. In: Proc. 3rd WITS 1993, pp. 125–132 (1993)
18. Robertson, G.G., Mackinlay, J.D., Card, S.K.: Cone trees: animated 3d visualizations of hierarchical information. In: Proc. CHI 1991, New York, USA, pp. 189–194 (1991)
19. Shneiderman, B.: Tree visualization with tree-maps: 2-d space-filling approach. ACM Trans. Graph. 11(1), 92–99 (1992)
20. Stasko, J.T., Zhang, E.: Focus+context display and navigation techniques for enhancing radial, space-filling hierarchy visualizations. In: INFOVIS, p. 57 (2000)
21. Wattenberg, M.: Visualizing the stock market. In: CHI 1999 extended abstracts on Human factors in computing systems, New York, USA, pp. 188–189 (1999)
22. Yang, J., Ward, M.O., Rundensteiner, E.A.: Interring: An interactive tool for visually navigating and manipulating hierarchical structures. In: Proc. IEEE InfoVis 2002, Washington, DC, USA, p. 77 (2002)
