Beyond visualization of big data: a multi-stage data exploration approach using visualization, sonification, and storification

Jeffrey Rimland*a, Mark Ballorab, Wade Shumakera
aCollege of Information Sciences and Technology, Penn State University, University Park, PA 16802-6822; bCollege of Arts and Architecture, Penn State University, University Park, PA 16802-6822

ABSTRACT
As the sheer volume of data grows exponentially, it becomes increasingly difficult for existing visualization techniques to keep pace. The sonification field attempts to address this issue by enlisting our auditory senses to detect anomalies or complex events that are difficult to detect via visualization alone. Storification attempts to improve analyst understanding by converting data streams into organized narratives that describe the data at a higher level of abstraction than the input streams from which they are derived. While these techniques hold a great deal of promise, each also faces a unique set of challenges that must be overcome. Sonification techniques must represent a broad variety of distributed heterogeneous data and present it to the analyst/listener in a manner that does not require extended listening, since a visual "snapshot" can be taken in at a glance whereas sound exists only over time. Storification still faces many human-computer interface (HCI) challenges as well as technical hurdles related to automatically generating a logical narrative from lower-level data streams. This paper proposes a novel approach that utilizes a service-oriented architecture (SOA)-based hybrid visualization/sonification/storification framework to enable distributed human-in-the-loop processing of data in a manner that makes optimal use of both visual and auditory processing pathways while also leveraging the value of narrative explication of data streams. It addresses the benefits and shortcomings of each processing modality and discusses the information infrastructure and data representation concerns that arise when they are used in a distributed environment. We present a generalizable approach with a broad range of applications including cyber security, medical informatics, facilitation of energy savings in "smart" buildings, and detection of natural and man-made disasters.
Keywords: Sonification, visualization, big data, analytics
1. INTRODUCTION
The visualization field has a long and well-documented history of improving human understanding of data. Although William Playfair (1759-1823) is commonly considered to be the "father of visualization" for his invention of the line graph, bar chart, pie chart, and circle graph, many precursors of visualization date even earlier, to mapmaking, navigation, and astronomy [1]. In the last decade, openly available online data and enhanced browser-based graphical capabilities have helped interactive visualization become a ubiquitous method for exploring everything from governmental budgets [2] to the body motions of Olympic swimmers [3]. However, the recent explosion of "big data" has resulted in a degree of data "volume, variety, and velocity" [4] that challenges current approaches to visualizing data and using it to improve understanding of a situation. Much as Anton van Leeuwenhoek's invention of the microscope in the 1670s revolutionized science by allowing scientists to observe microbes in droplets of water and human blood [5], big data exploration is poised to improve our understanding of nearly every aspect of human endeavor. However, new methods are necessary for realizing these improvements.
*[email protected]; phone 1 814 867-3103; fax 1 814 863-5890; http://nc2if.psu.edu
Due to its high sensitivity to dynamic changes, the auditory system is well suited to augmenting the understanding of data. Sonification is gaining acceptance as an analytical tool [6], and debates continue regarding the benefits of representing information with sound, the most effective methods for utilizing sonification, and the drawbacks of these approaches. We have previously presented potential applications at prior SPIE conferences ([7], [8], [9]). This paper introduces a new category of application that is sonically simpler than prior efforts, but more tightly integrated into a cohesive framework that combines visualization, sonification, and other techniques within a distributed service-oriented architecture (SOA) for a new generation of applications. Three aspects of this approach are: 1) sonification of the inner workings (e.g. sub-threshold event detection and undefined patterns) of Complex Event Processing; 2) use of sonification, visualization, and storification as a parallelized multi-stage approach; and 3) sonification of the difference between expected and observed values.
2. SONIFICATION AND COMPLEX EVENT PROCESSING (CEP)
2.1 Introduction to Complex Event Processing
The Complex Event Processing (CEP) paradigm attempts to address the difficulty of automated or semi-automated processing of situations that are hard to detect due to the sheer size of the dataset (volume), highly heterogeneous data arriving from many distributed input streams (variety), and tight correlation with dynamic time windows (velocity) [10] (see Figure 1).
Figure 1: Complex Event Processing relies on application of low-level rules to accomplish detection of higher-level event hierarchies.
In the figure above, the red and blue rectangles represent potentially correlated events occurring within a common time window but originating from different data streams. In this example, the red block represents a weather satellite detecting a possible tornado, and the blue block might be a Twitter report mentioning very high winds and the observation of a funnel cloud in approximately the same area at approximately the same time. In this case, the green block would represent the aggregation of the satellite detection and the Twitter report into a higher-level event of "probable tornado." That event could then be placed into a different data stream for additional aggregation with other weather events into weather trends, for example. This mechanism can be very effective for rapid detection of patterns and relationships that are well known at the time of system setup and configuration. However, many current domains are characterized by rapidly changing patterns and evolving relationships among events and entities.
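As a concrete illustration of this aggregation step, consider the sketch below, which correlates a satellite TVS detection with a trusted Twitter report falling inside a shared time window and emits a higher-level "probable tornado" event. The sketch is written in SuperCollider's sclang for consistency with the sonification examples later in this paper; the field names, keywords, and ten-minute window are hypothetical, and a deployed system would express the same logic in a CEP engine such as StreamBase.

// Hypothetical sclang sketch of the aggregation step illustrated in Figure 1.
// Field names, keywords, and the ten-minute window are illustrative only.
(
var satelliteEvents = [ (kind: \tvs, time: 600, lat: 40.79, lon: -77.86) ];
var twitterReports  = [ (kind: \tweet, time: 840, keywords: [\funnel, \wind], trusted: true) ];
var windowSecs = 600;             // events must fall within a common ten-minute window
var higherLevelStream = List.new;

satelliteEvents.do { |sat|
    twitterReports.do { |rep|
        var withinWindow = (rep[\time] - sat[\time]).abs <= windowSecs;
        var credible = rep[\trusted] and: { rep[\keywords].includes(\funnel) };
        if (credible and: { withinWindow }) {
            // the aggregated event re-enters a stream for further fusion (e.g. into weather trends)
            higherLevelStream.add((kind: \probableTornado, time: sat[\time], lat: sat[\lat], lon: sat[\lon]));
        };
    };
};
higherLevelStream.postln;
)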
2.2 The Shortcoming of Sub-Threshold Event Detection
Figure 2 shows an example StreamBase CEP layout in which textual input in the form of weather reports and sensor data representing tornadic vortex signatures (TVSs) [11] are temporally and geospatially fused. In this relatively simple case, textual data is pre-filtered by the presence of certain keywords and by whether the weather reporter is known to be trusted. The "hard" TVS data is filtered based on the strength of the TVS. Even this simple example raises a difficult question: how are the thresholds at each step (e.g. which TVSs are classified as weak) determined? They could be learned via machine learning algorithms (such as neural networks) if adequate training data exists, but in practice these constants are often determined by what amounts to an educated guess (or worse). This can result in a condition where significant but sub-threshold data is filtered out of the system and never brought to the attention of a human analyst.
Figure 2: StreamBase Complex Event Processing layout for fusion of hard and soft weather data
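To make the issue concrete, the hypothetical sketch below applies a single hand-tuned strength cutoff to a batch of TVS detections; everything below the cutoff is silently discarded, which is exactly the condition described above. The 0.5 constant and the sample strengths are invented for illustration.

// Hypothetical sclang sketch of a hand-tuned threshold silently discarding data.
(
var tvsStrengths = [0.12, 0.48, 0.51, 0.73, 0.44, 0.92];   // illustrative detection strengths
var alertThreshold = 0.5;                                   // hand-tuned constant, i.e. an educated guess
var alerts = tvsStrengths.select { |s| s >= alertThreshold };
var subThreshold = tvsStrengths.reject { |s| s >= alertThreshold };
"alerts: %  sub-threshold: %  sub-threshold fraction: %"
    .format(alerts.size, subThreshold.size, subThreshold.size / tvsStrengths.size).postln;
)

In this toy batch, half of the detections never reach an analyst, and nothing in the visual workflow reveals that they existed.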
One approach to this problem is to create a visualization display showing system status, allowing a user to monitor the ratio of threshold to sub-threshold data and the percentage of data that exceeds its thresholds. However, analysts are often already overloaded with visual displays that must be monitored. To continue the above example, a severe weather analyst may be tasked with monitoring a geospatial display showing the location of TVS events as well as a textual display showing reports from weather observers. Adding a third display showing graphs or more columns of numeric data reflecting the inner workings of the alerting system will only further distract the user. Situations in which the user is already required to actively monitor visual displays can often benefit from the use of auditory display (AD) technology to present additional information.

In facilities equipped to produce three-dimensional sound in addition to multiple three-dimensional visualizations, such as Penn State University's Extreme Events Lab (EEL) (see Figure 3), the auditory channels can be used to represent the status of system meta-processes while the visual channels are used for the data analysis itself. Sound can thus be used to calibrate a visual analysis display. With 3D audio and multiple displays, it can be useful to have one or more dimensions of sound localization correspond to the physical location of the related visual displays within the room. For example, if the displays on the left side of the EEL are being used to observe textual weather reports and the displays on the right side of the EEL are displaying geospatial views of satellite and radar TVS events, it might be helpful to pan the audio between the left and right channels to reflect sub-threshold events or other selected attributes of the data. Additionally, panning from the front to the rear audio channels could correspond to low-to-high events within the CEP hierarchy. Since humans are unable to discern fine changes in front-back panning, an additional auditory cue, such as a pitch change, should accompany the panning. The user/analyst could then treat contradictions between the audio and visual channels as an indicator that event thresholds (or some other aspect of the system) should be retuned.
Figure 3: The Extreme Events Lab (EEL) at Penn State University is equipped with 8-channel audio in addition to a stereoscopic 3D projector and LCD displays.
In this example, a synthesized piano-like tone might be played for each sub-threshold event within the streams of weather data. The distribution of these tones from front to rear and left to right should be correlated with the number of events that reach the CEP thresholds and are shown on the visual displays. If the audio consistently surges to the front right of the room, yet the geospatial display is not indicating any TVS events that exceed the alert threshold, then it is likely that the threshold for this event type should be lowered.
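A minimal sketch of this kind of per-event tone, written for a four-corner speaker layout rather than the EEL's full eight-channel system, is shown below. The synth name, pitch range, and pan mappings are assumptions made for illustration: x pans toward the side of the room whose displays carry the related data, y pans between rear and front according to the event's level in the CEP hierarchy, and, because front-back panning alone is difficult to hear, the same y value also shifts the pitch.

// Hypothetical sclang sketch: one short tone per sub-threshold event.
(
SynthDef(\subThresholdPing, { |x = 0, y = 0, amp = 0.2|
    var freq = y.linexp(-1, 1, 220, 880);        // redundant pitch cue for front-rear position
    var env  = EnvGen.kr(Env.perc(0.005, 0.8), doneAction: 2);
    var sig  = SinOsc.ar(freq) * env * amp;      // a percussive sine stands in for the piano-like timbre
    Out.ar(0, Pan4.ar(sig, x, y));               // quad panning: x = left/right, y = rear/front
}).add;
)

// e.g. a low-level sub-threshold event related to displays on the right side of the room:
Synth(\subThresholdPing, [\x, 0.8, \y, 0.9]);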
3. MAPPING DATA TO SOUND
3.1 Pitch, Loudness, Timbre, and Localization
In the previous section, the example assumed a well-equipped facility with multiple visual displays and multichannel sound. In more modestly equipped listening environments, other attributes of sound can be used to represent data. Four key sound attributes that are useful for sonification are pitch (frequency), loudness (amplitude), timbre, and localization [12]. The previous sections focused on localization, so the other three will be addressed here. Pitch is one of the most effective starting points for sonification because humans are typically capable of discerning approximately 400 steps of pitch; the minimum difference that a human can detect is referred to as the "just noticeable difference" (JND) [13]. Loudness is another effective mechanism for mapping data to sound. Although there are only approximately 50-100 just-noticeable steps of loudness, humans are adept at intuitively recognizing lower volumes as "background" and higher volumes as "foreground", so volume is an effective way to reflect significance, relevance, or urgency. Timbre is somewhat of a catchall term that McAdams and Bregman [14] define as "anything that cannot be labeled pitch or loudness." Timbre modulations (e.g. coloration, attack, brightness, tremolo) can add many layers of distinction, enabling high-dimensional data representations.
3.2 Expected vs. Observed Values
Although there are many instances where the goal of sonification is exploratory data analysis (EDA), there are also many applications (such as monitoring industrial processes, patient health, computer network utilization, or building energy usage) where the goal of sonification is to compare an expected or ideal value with an actual measured value. This idea dates back several millennia: in [15], Worrall infers from Boyd [16] that sound may have been part of an accounting system used by the Mesopotamians around 3500 BCE, wherein commodities remaining in or moving to or from royal warehouses were represented as sound so that quantities could be quickly checked for consistency and accuracy. There are a variety of ways to perform comparisons using auditory tools. For example, two oscillators that are very close in frequency can support situational awareness tasks that involve the comparison of two signals. If the two signals are the same, the sound will be perceived as though it were a single oscillator. If the signals diverge, then beating will occur at their difference frequency (see Figure 4 for a code sample). Piano tuners have long used this behavior of beating frequencies, and the human ability to recognize them is well documented [17]. If the signals diverge even more, and the frequencies reflect this greater divergence, then the sound will take on a rough quality as a result of the increasingly fast beating, and at a certain point the two sounds will segregate and be perceived as two simultaneous pitches. This quality could be an effective indicator of the degree of similarity or difference between two states or signals.
Figure 4: SuperCollider code demonstrating the use of two sine wave oscillators with similar frequencies as an indication of similarity between process values.
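Since Figure 4 is reproduced as an image, a comparable minimal sketch is given here (assuming a running audio server); the 440 Hz reference and the 1.5 Hz offset are illustrative values rather than those used in the figure.

// Minimal sclang sketch of the comparison idea behind Figure 4 (values are illustrative).
// Identical expected and observed values fuse into a single steady tone; a divergence
// of f Hz between the mapped frequencies is heard as beating at f Hz.
(
{
    var expected = 440;      // expected or ideal value mapped to frequency
    var observed = 441.5;    // observed value, 1.5 Hz away, so 1.5 beats per second
    (SinOsc.ar(expected, 0, 0.15) + SinOsc.ar(observed, 0, 0.15)).dup
}.play;
)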
In the case of a physician performing rounds, this method could help satisfy the requirement to review the health status of many patients in a short amount of time. The attending physician could very rapidly grasp differences between past and current vital signs presented in this manner, even while their visual attention is devoted to observing and interacting with the patient. If the physician detected an anomaly in the audible presentation, they could further explore details of the patient's condition using visualizations or the patient's charts.

Monitoring energy usage is another application of increasing concern, particularly in businesses that occupy large building spaces. Energy rates are often updated every twenty minutes, and can represent large expenditures. Creative efforts are being made to make employees aware of their energy use and nudge them to conserve when possible by turning off lights or appliances [18]. The multi-oscillator approach described above, comparing ideal energy utilization with the actual current value, could be used to generate a tone to be played either intermittently or when energy utilization increases (such as when a light switch is turned on or the air conditioning load rises). When current levels are running high, this could serve as a subtle cue encouraging people to turn off unneeded electronics, or allow managers or energy producers to quickly ascertain aggregated usage information.
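Under the assumption that expected and metered loads can be mapped into a common pitch range, the sketch below gives one possible realization of this energy cue; the building loads and the mapping range are invented for illustration.

// Hypothetical sclang sketch: expected vs. metered building load as a pair of tones.
// When actual usage tracks the expected profile the tones fuse; growing divergence
// is heard as increasingly rapid beating.
(
var expectedKw = 120, actualKw = 131;                   // illustrative building loads in kW
var fExpected  = expectedKw.linexp(50, 300, 300, 600);  // map the plausible load range to pitch
var fActual    = actualKw.linexp(50, 300, 300, 600);
{ (SinOsc.ar(fExpected, 0, 0.1) + SinOsc.ar(fActual, 0, 0.1)).dup }.play;
)

Sounded only intermittently, for example for a second or two when a light switch is thrown or the air conditioning load rises, such a tone could convey "on track" versus "running high" without requiring anyone to consult a meter.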
4. STORIFICATION
The ability of Complex Event Processing (CEP) techniques to filter and aggregate multiple streams of data into a more coherent and semantically informed hierarchy can be used to organize large and heterogeneous data sets into a narrative structure that is relevant and useful to the analyst-in-the-loop or other user of the system. Creating a data-driven narrative is known as storification [19], and the best-known example of this paradigm is the Facebook "News Feed" [20]. Although there are many difficulties inherent in automatically converting streams of data into a logical "story", there are lessons to be learned from this pursuit that can be applied in any domain where the goal is improving situational awareness and rapidly understanding large amounts of data.

The Virtual Reality (VR) and Virtual Environment (VE) [21] communities have long explored the concept of emergent narrative, in which certain rules exist ahead of time but the actual chain of events that occurs is largely a product of the users' interaction. An example of this type of interaction is a sporting event such as a tennis match: the rules (e.g. the allowed equipment, play surface, and scoring method) are defined ahead of time, but the actual story of the match emerges in real time as the opposing players (each with their own strengths, weaknesses, and emotions) and the environment (e.g. strong winds blowing a ball out of bounds, or dust blowing into a player's eye) contribute to the emerging narrative. Such emergence is an apt analogy for a machine-augmented sense-making environment in which an analyst begins with certain a priori knowledge, guidelines, and tools, and then proceeds along a path that is driven by the decisions of the analyst, evolving actions that occur within the environment (e.g. a tornado shifting direction toward a populated area), and the intervention of other human and machine resources. As this process continues, the number of possible and probable options becomes smaller, which allows the analyst to focus on a smaller search space of hypotheses.
Figure 5: Illustration of Laurel’s “Flying Wedge” model for constraining probabilities of future events and interactions as the narrative develops (from [22]).
Brenda Laurel has described this phenomenon using the "Flying Wedge" shown in Figure 5. This model illustrates how increasing narrative development results in a decreasing number of options to evaluate and possible outcomes. In the past, converting large heterogeneous data streams into a coherent and useful narrative has been largely infeasible due to the difficulty of assembling semantic hierarchies from low-level data streams. However, recent advances, including the integration of Complex Event Processing (CEP) and Multi-Agent Systems (MAS) described above, have increased the number of situations where storification could be a useful option.
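While full narrative generation remains an open research problem, the final rendering step can be sketched very simply: the toy example below turns a higher-level CEP event of the kind produced in Section 2 into a one-sentence "blurb". The template and field names are invented for illustration, and a real system would require far richer discourse planning.

// Toy sclang sketch of rendering higher-level CEP events as narrative blurbs.
(
var events = [
    (kind: \probableTornado, place: "the reported area", minutesAgo: 12, confidence: 0.8)
];
events.do { |ev|
    "About % minutes ago, a % event (confidence %) was inferred near %."
        .format(ev[\minutesAgo], ev[\kind], ev[\confidence], ev[\place]).postln;
};
)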
5. A MULTI-STAGE, MULTI-MODALITY APPROACH
5.1 An Architectural Approach
None of these techniques alone is adequate for addressing the "three Vs" of big data (volume, variety, and velocity) referenced above. Each modality has its own advantages and limitations, and there is no one-size-fits-all solution. However, we present an architectural suggestion in Figure 6 and a set of general guidelines in Section 5.2 below. The goal of the infrastructure shown in Figure 6 is to accept heterogeneous input data from a variety of streams and allow a human user (or users) of the system to make queries on that data and receive results (or alerts) in a manner that is most conceptually, temporally, and geospatially relevant to the user, presented in a way that maximizes the analyst's understanding while minimizing their workload and expenditure of Human Attention Units (HAUs) [23]. Incoming data is processed for feature extraction by various tools and techniques (depending on the data and desired attributes) and then translated into a common representation schema for data co-registration and fusion [24]. Those feature vectors are then used in conjunction with raw data by an integrated system of Complex Event Processing (CEP) tools, semantic data storage, and multi-agent systems for hypothesis generation and dynamic display generation. The multi-agent system evaluates the capabilities of the analyst, the attributes of the data itself, and the nature of pending queries or alerts when determining what information to display and the modality (visual, auditory, or narrative) that will be used to display it. As discussed in Section 2, the information display process is dynamic and interactive, with the user's decisions and the evolving information in the system modulating the presentation of the data. A more detailed explanation of the information architecture shown in Figure 6 is available in [25].
Figure 6: An architectural approach to hybrid-sensing, hybrid-cognition of large complex datasets using sonification for human-in-the-loop interaction.
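The adjudication of modalities in Figure 6 is performed by CEP tools and multi-agent systems as described in [25]; the fragment below is only a schematic stand-in that shows the kind of decision those agents make, with all predicates and thresholds invented for illustration.

// Schematic stand-in (sclang) for the modality decision made by the multi-agent system.
(
var chooseModality = { |visualLoad, isAlert, dataIsAggregate|
    if (isAlert and: { visualLoad > 0.8 }) {
        \auditory                                         // the analyst's eyes are busy: route the alert to sound
    } {
        if (dataIsAggregate) { \narrative } { \visual }   // summarize fused events as a story, else display
    }
};
chooseModality.(0.9, true, false).postln;                 // -> auditory
)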
5.2 General Guidelines for Visualization, Sonification, and Storification
Visualization has the longest history and the largest body of research, and is the most immediately recognizable to the majority of users. It is highly effective for viewing large quantities of data in multiple dimensions. When scaled and presented properly, high quality visual displays can represent sufficient precision and accuracy for almost any data set. For all of its benefits, however, visual display has weaknesses. When the user is responsible for monitoring multiple visual displays (or attending to "real world" visual tasks such as flying an airplane, performing surgery, or monitoring severe weather), adding further visual displays can be distracting. Additionally, the display of multi-dimensional data can become visually overwhelming and lead to confusion rather than improved situational awareness. Furthermore, visual displays are of no use to users who are blind or visually impaired due to either permanent or temporary conditions (such as dense smoke).

Although sonification by itself is less versatile and typically less powerful than its older sibling visualization, the combination of sonification and visualization has been shown [26] to be more effective in many circumstances than either modality alone. Offloading certain aspects of highly dimensional or multivariate visual displays onto well-designed auditory displays can reduce perceived workload and improve understanding of rapidly evolving situations [26]. Careful design of integrated sonification/visualization systems can be informed by the manner in which humans interact with sounds and sights in nature. Looking toward an alarming sound is a natural response that can be accommodated by localizing auditory alerts to direct the user toward relevant visual displays. In nature, the volume level of a sound often corresponds to threat. Volume levels of specific sounds corresponding to each visual display can help users develop an intuitive understanding of which visual displays should be attended to at a given time.

Storification is arguably the newest and least tested contribution in this arena. However, of the techniques presented here, the automated creation of a brief narrative describing large amounts of complex data affords the highest benefit in terms of information compression. Although this remains a "holy grail" of information analysis, certain aspects of the process are becoming more feasible as advances in soft data and complex event processing, human-in-the-loop interaction, and processing speed yield benefits in this area. It can be argued that a narrative provides a more nuanced representation of certain data than visualization alone (imagine the difficulty of viewing your Facebook News Feed as a giant graph instead of brief textual "blurbs"). At this point, the contributions of storification are most promising when either 1) well-defined a priori patterns can be detected and "compressed" into a higher-level narrative, or 2) the primary goal is description of autonomous or semi-autonomous participants in a virtual environment, as opposed to analysis. Additionally, storification can be a useful option for viewing a relatively small quantity of nuanced data that has been selected from a much larger data set via the combined visualization and sonification techniques described above.
6. CONCLUSION
This paper has presented new approaches and recommendations for multi-modality data exploration using visualization, sonification, and storification. Given the current trend of data increasing exponentially in volume, becoming more heterogeneous, and requiring more rapid analysis and understanding, it is critical to avoid a one-size-fits-all approach. Instead, one must consider the nature of the data, the goals of the analysis, and the capabilities of both software systems and humans-in-the-loop when designing a comprehensive system for using data to increase understanding of a situation.
7. ACKNOWLEDGEMENT
We gratefully acknowledge that this research activity has been supported in part by a Multidisciplinary University Research Initiative (MURI) grant (Number W911NF-09-1-0392) for "Unified Research on Network-based Hard/Soft Information Fusion", issued by the US Army Research Office (ARO) under the program management of Dr. John Lavery.
REFERENCES
[1] M. Friendly, "A brief history of data visualization", [Handbook of Data Visualization], p. 15-56 (2008).
[2] L. Ding, D. DiFranzo, S. Magidson, D.L. McGuinness and J. Hendler, "The data-gov wiki: a semantic web portal for linked government data", ISWC (2009).
[3] C. Kirmizibayrak, J. Honorio, X. Jiang, R. Mark and J.K. Hahn, "Digital Analysis and Visualization of Swimming Motion", International Journal of Virtual Reality, vol. 10, p. 9 (2011).
[4] P. Zikopoulos and C. Eaton, [Understanding big data: Analytics for enterprise class Hadoop and streaming data], McGraw-Hill Osborne Media (2011).
[5] E. Brynjolfsson and A. McAfee, "The Big Data Boom is the Innovation Story of Our Time", The Atlantic (2011).
[6] T. Hermann, A. Hunt and J.G. Neuhoff, [The Sonification Handbook], Logos Verlag (2011).
[7] M. Ballora and D.L. Hall, "Do you see what I hear: experiments in multi-channel sound and 3D visualization for network monitoring?", SPIE Defense, Security, and Sensing, p. 77090J (2010).
[8] M. Ballora, N.A. Giacobe and D.L. Hall, "Songs of cyberspace: an update on sonifications of network traffic to support situational awareness", SPIE Defense, Security, and Sensing, p. 80640P (2011).
[9] M. Ballora, R.J. Cole, H. Kruesi, H. Greene, G. Monahan and D.L. Hall, "Use of sonification in the detection of anomalous events", SPIE Defense, Security, and Sensing, p. 84070S (2012).
[10] N. Museux, J. Mattioli, C. Laudy and H. Soubaras, "Complex event processing approach for strategic intelligence", 9th International Conference on Information Fusion, p. 1-8 (2006).
[11] J.G. Wieler, "Real-time automated detection of mesocyclones and tornadic vortex signatures", Journal of Atmospheric and Oceanic Technology, vol. 3, p. 98-113 (1986).
[12] S. Wilson, D. Cottle and N. Collins, [The SuperCollider Book], The MIT Press (2011).
[13] R.G. Klumpp and H.R. Eady, "Some measurements of interaural time difference thresholds", The Journal of the Acoustical Society of America, vol. 28, p. 859 (1956).
[14] S. McAdams and A. Bregman, "Hearing musical streams", Computer Music Journal, vol. 3, p. 26-60 (1979).
[15] D. Worrall, "Sonification and Information: Concepts, instruments and techniques", University of Canberra (2009).
[16] J.S. Mackay, E. Boyd, J.R. Fogo, A. Sloan and J. Patrick, [A history of accounting and accountants], TC & EC Jack (1905).
[17] G. Oster, "Auditory beats in the brain", Scientific American, vol. 229, p. 94-102 (1973).
[18] B. Orland, "Energy Chickens: part of the Energy Efficient Building Hub", http://seriousgames.psu.edu/energychickens/demo/, accessed March 2013 (2013).
[19] R. Aylett, "Emergent narrative, social immersion and "storification"", Proceedings of the 1st International Workshop on Narrative and Interactive Learning Environments, p. 35-44 (2000).
[20] M. Zuckerberg, R. Sanghvi, A. Bosworth, C. Cox, A. Sittig, C. Hughes, K. Geminder and D. Corson, "Dynamically providing a news feed about a user of a social network", U.S. Patent No. 7,669,123 (2010).
[21] R. Aylett, "Narrative in virtual environments - towards emergent narrative", Proceedings of the AAAI Fall Symposium on Narrative Intelligence, p. 83-86 (1999).
[22] S. Louchart, I. Swartjes, M. Kriegel and R. Aylett, "Purposeful authoring for emergent narrative", [Interactive Storytelling], p. 273-284 (2008).
[23] D.L. Hall and J.M. Jordan, [Human-centered Information Fusion], Artech House, Boston (2010).
[24] D.L. Hall and S.A.H. McMullen, [Mathematical Techniques in Multisensor Data Fusion], Artech House, Boston (2004).
[25] J. Rimland, "Hybrid Human-Computer Distributed Sense-Making: Extending the SOA Paradigm for Dynamic Adjudication and Optimization of Human and Computer Roles", Dissertation in Information Sciences and Technology, The Pennsylvania State University (2013).
[26] M. Watson and P. Sanderson, "Sonification supports eyes-free respiratory monitoring and task time-sharing", Human Factors: The Journal of the Human Factors and Ergonomics Society, vol. 46, p. 497-517 (2004).