Unravelling Unstructured Data: A Wealth of Information in Big Data
Mona Tanwar, Reena Duggal, Sunil Kumar Khatri
Amity Institute of Information Technology, Amity University Uttar Pradesh, Noida, India
Abstract— Big data is data of high volume and high variety, generated at high velocity, which cannot be stored, managed, processed or analyzed using existing traditional software tools, techniques and architectures. Big data brings many challenges, such as scale, heterogeneity, speed and privacy, but it brings opportunities as well. Valuable information is locked in big data which, if properly leveraged, can make a huge difference to business. With the help of big data analytics, meaningful insights can be extracted from big data, which is heterogeneous in nature and comprises structured, unstructured and semi-structured content. One prime challenge in big data analytics is that nearly 95% of data is unstructured. This paper describes what big data and big data analytics are and reviews different techniques and approaches for analyzing unstructured data. It emphasizes the importance of analyzing unstructured data along with structured data in business to extract holistic insights, and highlights the need for appropriate and efficient analytical methods for knowledge discovery from huge volumes of heterogeneous data in unstructured formats.
Keywords— Big Data, Unstructured data, Text Analytics, Audio Analytics, Video Analytics, Social Media Analytics
I. INTRODUCTION
Big data has caught the attention of professionals, academicians and researchers since it came to light. This paper explores various aspects of big data and big data analytics. Various definitions emerged as professionals throughout the world endeavored to understand and state what big data is. The first thing that comes to mind is its size, but it is much more than that. In 2011, IDC defined big data as follows: “Big data technologies describe a new generation of technologies and architecture designed to economically extract value from very large volumes of a wide variety of data, enabling high velocity capture, discovery and/or analysis” [1]. The TechAmerica Foundation defines big data as follows: “Big data is a term that describes large volumes of high velocity, complex and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management, and analysis of the information.” [2]. Traditional data management and analysis systems such as Relational Database Management Systems (RDBMS) are not
978-1-4673-7231-2/15/$31.00 ©2015 IEEE
suitable or adequate for processing big data, first because they are based on structured data, which is a very small fraction of big data, and second because they do not scale to the extremely high rate at which big data is generated. The description of big data is incomplete without mentioning the three V’s of big data: volume, variety and velocity. These are the fundamental characteristics of big data, as shown in Fig. 1 [3].
1. Volume – It refers to the high magnitude of big data, which is in the order of terabytes to petabytes and more. For instance, some earlier estimates suggested that 20 petabytes of storage space was used to store 260 billion Facebook photos. In 2010, it was reported that up to one million photographs were processed by Facebook per second [4]. Twitter generates 12 terabytes of data daily [5]. In 2012, Facebook stated that 2.7 billion “likes” and “comments” were registered daily by its users [6].
2. Variety – Massive volumes of heterogeneous data are generated by different sources. Big data consists of structured, semi-structured and unstructured data. Structured data has a fixed format whereas unstructured data has no fixed schema or format.
3. Velocity – Big data is being generated continuously at an exponential rate. 90% of current data was generated in the last two years [7, 8]. Social media is one major contributor which is generating data explosively. Sensors, smart phones and the internet are leading to huge data feeds.
The explosive rate of growth of big data presents tremendous opportunity and will yield big economic gains if correctly exploited. The resources needed for exploiting big data are falling short because of the high rate of growth of data. Eric Schmidt, the CEO of Google, stated at the Lake Tahoe Technology Conference held in 2010 [9] that “Between the dawn of civilization through 2003, just 5 exabytes of information was created. That much information is now created every two days and the pace is increasing.
People aren’t ready for the technology revolution that’s going to happen to them.” According to a recent study, that amount of data is now being generated every 10 minutes [10]. It has also been estimated that more than 85% of Fortune 500 organizations will fail to exploit big data for competitive advantage [11]. They will lag behind the 15% of organizations that will leverage big data. Organizations that exploit big data
will have a competitive advantage over other organizations [12, 13]. Nearly 95% of big data is unstructured [2]. To improve business performance, insights need to be gained from unstructured data and integrated with other data to get the holistic picture so that better business decisions can be made. Organizations that leverage unstructured data along with structured data will have a competitive edge over others. This has been stated time and again.
Fig. 1 Characteristics of Big Data
II. WHAT IS “BIG DATA ANALYTICS”?
Big data analytics is the process of collecting, storing and analyzing huge volumes of high-velocity, diverse data to extract hidden patterns and meaningful insights. The results can be used for decision making: they provide insights which help businesses grow and help in discovering patterns in fields such as weather forecasting, social networks, online browsing, customer purchases, telecom and location data. Big data analytics can support discovery in areas such as space science and medical research, analysis of stock market trends to make predictions, prediction of customer demand, and forecasting of sales and market directions. During the last elections held in India, the BJP used big data analytics to prepare its campaign strategies [14]. Organizations use the information obtained by analytics to gain deeper knowledge so that evidence-based decisions can be made faster, giving them a competitive advantage over other organizations. For instance, clickstream data enables online retailers to understand customer behavior and browsing patterns. It provides information on how much time customers spent on which pages, and on the sequence of pages as well. This helps in making better decisions about what kind of products, services and deals to offer them. Such data could also help in improving website design. Data generated through various mobile apps contains a great deal of information which could help in
generating personalized offers for the customers. Big data analytics enables organizations to create real-time intelligence from big data. Devices such as smart phones and sensors have accelerated the rate of growth of data generation and magnified the need for real-time analytics. The scope and benefits are ubiquitous and never ending. Therefore, to process and analyze big data, there is always a need for cost-effective, efficient and appropriate tools and techniques.
The process of extracting information from big data can be divided into five phases [15]. The first phase is Acquisition and Recording of data, the second is Extraction, Cleaning and Annotation, the third is Integration, Aggregation and Representation, the fourth is Modelling and Analysis and the fifth is Interpretation. The first three phases are Data Management tasks and the latter two are Analytics tasks. The data management process involves acquisition and storage of data, and retrieving and preparing it for analysis. The analytics process refers to analyzing and acquiring knowledge from big data. Analytics can further be categorized into Descriptive Analytics, Predictive Analytics and Prescriptive Analytics. Descriptive analytics is connected with business intelligence and is based on historical data. Predictive analytics uses statistical data and is meant for making predictions about the future based on the data. Prescriptive analytics is used to find the optimal solution to a given problem subject to a set of constraints [16].
Handling heterogeneity in big data analytics is tedious and difficult. Apart from heterogeneity, complexities arise in big data analytics because of other factors as well, such as noise accumulation, spurious correlations and scale. Moreover, the conventional techniques are statistical and do not scale up to the needs of big data. A small subset of the data, known as the sample, is selected and relationships are derived from the sample.
The conclusion is then generalized to the entire population, but in the case of big data the sample would be massive, so it doesn’t hold significance in this case. The computational efficiency of these techniques also does not scale to big data.
III. ROLE OF UNSTRUCTURED DATA
Big data consists of structured, semi-structured and unstructured data, as shown in Fig. 2 [17]. This is often referred to as structural heterogeneity in big data. Only 5% of existing data is structured [18]. Structured data is tabular data which exists in the form of relational databases and spreadsheets. Data having no particular format, schema or structure is said to be unstructured. It can be in any form, such as text, audio, images and video: for example, word files, PDF files, the content of blogs and forums, tweets, emails, web pages, audio files, video files and images. The internet is flooded with unstructured content. Nearly 95% of existing data is unstructured, a vast portion of which is in the form of videos [2]. Major contributors to unstructured data are social networks and sensors. Between structured and unstructured data lies semi-structured data, which has no strict standard; examples include Extensible Markup Language (XML) files, web logs,
sensor logs etc. XML enables exchange of data on the web and is machine readable since it consists of user-defined data tags. Unstructured data adds to the complexity of the analytics process, since machines mostly require a structural organization of data for processing and analysis. But in order to have a competitive advantage over other organizations, the potential information locked in unstructured data needs to be extracted. The real value of unstructured data can be leveraged by deconstructing it, which results in data enriched with metadata. This process converts it into semi-structured content which can be used to gain insights. It has also been stated that decision makers cannot rely completely on structured data, as they would miss the vast amount of information available in open-source unstructured web content [18]. To keep pace with the fast-evolving information age, organizations will need to exploit unstructured content to extract patterns across complex data types for evidence-based decision making. For predictive analytics, both structured and unstructured data are required. Without their integration and analysis, a complete picture cannot be obtained. For instance, by correlating its sales data with variables such as quarter of the year or customer demographics to track the sales record of some product or service, an organization may understand what is happening, but the reason can be understood only after analyzing social media data and call center logs, i.e. unstructured data [19]. A wealth of information is stored in unstructured data which, if leveraged properly, can make a big difference to business. One major challenge in unstructured data analytics is that unstructured data is noisy [19]. It needs to be cleaned first, then analyzed, and then integrated with structured data. Integration of unstructured data with structured data is a challenging task.
Many real-time challenges exist with unstructured data. There is a need for efficient, cost-effective techniques and tools to exploit unstructured data in real time.
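As a minimal sketch of how semi-structured content with user-defined tags might be flattened into structured records that can sit alongside tabular data, the following Python example parses a small XML fragment with the standard library. The tag names (`call`, `customer_id`, `duration_sec`, `note`) and the sample values are hypothetical and chosen only for illustration; they do not come from any dataset discussed in this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical call-center log in XML; tags and values are illustrative only.
SAMPLE_XML = """
<calls>
  <call id="1">
    <customer_id>C001</customer_id>
    <duration_sec>310</duration_sec>
    <note>Customer reported a late delivery.</note>
  </call>
  <call id="2">
    <customer_id>C002</customer_id>
    <duration_sec>95</duration_sec>
    <note>Asked about a refund.</note>
  </call>
</calls>
"""


def xml_to_records(xml_text):
    """Flatten each <call> element into a dict (one row per call)."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for call in root.findall("call"):
        records.append({
            "call_id": int(call.get("id")),
            "customer_id": call.findtext("customer_id"),
            "duration_sec": int(call.findtext("duration_sec")),
            # The free-text note stays unstructured; it would still need
            # text analytics before it could be analyzed further.
            "note": call.findtext("note"),
        })
    return records


records = xml_to_records(SAMPLE_XML)
```

Each resulting record has fixed fields that could be joined with structured data on `customer_id`, while the `note` field remains free text.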
Information Extraction (IE) techniques convert unstructured text into structured content. For instance, structured data such as drug dosage, composition and other information can be obtained from medical prescriptions. There are two sub-tasks in the information extraction process: Entity Recognition and Relation Extraction [21]. Entity Recognition (ER) finds names in text data and classifies them into categories such as person, firm and year. Relation Extraction (RE) finds and retrieves the semantic relationships between entities such as persons and companies from the text data.
Another technique used in text analytics is Text Summarization [22]. Summarization techniques produce a summary from a single or multiple text sources and can be applied to various text data sources such as news, articles, emails, blogs and forums. There are two approaches to summarization: Extractive and Abstractive. The extractive approach creates a summary by extracting original text units or sentences from the source document, so the summary is a subset of it. This approach requires no understanding of the document. The abstractive approach, on the other hand, extracts semantic information from the document, and the summaries it generates do not necessarily contain the original text units. It uses advanced Natural Language Processing (NLP) techniques to parse the text and produce the summary. For big data, extractive systems are more suitable as they are easier to adopt.
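The extractive approach can be illustrated with a minimal sketch in Python. It assumes a simple word-frequency scoring heuristic, which is only one of many possible ways to rank sentences and is not a method prescribed by the cited work; the stop-word list, the function names and the naive sentence splitter are all illustrative assumptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not a standard resource).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in",
              "it", "that", "this", "for", "on", "with", "as", "by"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Lower-case word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]


def extractive_summary(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average corpus frequency of the sentence's words.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])
    return [sentences[i] for i in chosen]
```

Because the summary is built only from sentences copied out of the input, it is a subset of the source document, which is the defining property of the extractive approach; no parsing or semantic understanding is involved.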
IV. VARIOUS TECHNIQUES OF UNRAVELLING UNSTRUCTURED DATA
In this section, the analytical approaches and techniques used for extracting information from unstructured data (text, audio and video) are discussed. Social media analytics is also discussed; although social media consists of both structured and unstructured data, the unstructured fraction is far larger.
A. Text Analytics
Text analytics refers to the techniques used for extracting or retrieving information or insights from text-based data such as emails, documents, advertisements, forums, blogs, news content, social network content, website content, call center logs, customer comments and reviews, and tweets. It involves Statistical Analysis, Computational Linguistics and Machine Learning. The insights gained support decision making in businesses. For example, stock market movements can be predicted by retrieving information from financial news [20].
Fig. 2 Types of Big Data
There are also Question Answering (QA) techniques, which provide answers to questions framed in natural language. Such systems have applications in various fields such as finance, academics, healthcare and marketing. They are based on complex natural language processing techniques. There are three approaches to question answering: the Information Retrieval approach, the Knowledge-based approach and the Hybrid approach. In the information retrieval (IR) approach, question processing is first done to create an appropriate query from the question, then document
processing is done to extract relevant pre-written content from existing documents, after which answer processing is done to retrieve the candidate answers, which are ranked, and the top-ranked answer is returned as the output. The knowledge-based approach extracts semantic information from the question, which is then used for querying. This approach is suitable for restricted domains such as tourism and medicine, as there are no large volumes of pre-written content in these domains. In the hybrid approach, the questions are semantically analyzed using the knowledge-based approach whereas the candidate answers are generated using the information retrieval approach.
Sentiment Analysis or Opinion Mining techniques analyze opinionated text, which contains the opinions or views of people towards entities such as individuals, products, brands, firms or events [23]. It is applied in areas such as marketing, political and social science, and finance. Sentiment analysis is done at document level, sentence level and aspect level. The document-level techniques infer whether the whole document expresses a positive or negative sentiment about a single entity. The sentence-level techniques determine the single sentiment expressed about the entity in each sentence. This is more complicated than document-level analysis. The aspect-based techniques determine all the sentiments in the document about the different aspects of the entity. This is useful where customer opinions towards different features of a product are reviewed. Text-based content is increasing at an accelerated rate, and extracting useful insights from text data remains challenging for business decision makers.
B. Audio Analytics
Audio analytics is the process of analyzing and extracting or retrieving information from unstructured audio content or data. It is known as Speech Analytics when applied to human spoken language [24, 25].
It is implemented in areas such as customer call centers and healthcare. In call centers, it is used to analyze recorded customer call content to evaluate the performance of the agents, to understand customer behavior, to identify issues with products and services, to improve the customer experience and to increase the sales rate. Live calls can also be analyzed and feedback can be provided to the agents in real time. Product and service recommendations can be formulated based on previous and current interactions with the customers. In healthcare, audio analytics is useful in diagnosing and treating diseases which affect the communication patterns of patients, such as depression, cancer and schizophrenia [24]. Infants’ cries can also be analyzed to gain information about the infant’s health and emotional status [25]. There are two approaches to speech analytics: the Transcript-based approach and the Phonetic-based approach. The transcript-based approach, which is also known as the Large Vocabulary Continuous Speech Recognition (LVCSR) approach, is further divided into two phases: Indexing and Searching. In the indexing phase, Automatic Speech Recognition (ASR) algorithms are used to match sounds with words which are identified with the help of a predefined
dictionary. If the system fails to do so, the most similar word is returned. The output of this system is a file which contains the sequence of the spoken words in the speech. In the searching phase, standard text-based methods are used to find the search term. The phonetic-based approach works with sounds, or phonemes, which help in distinguishing one word from another. This approach can also be divided into two phases: phonetic indexing and searching. In the indexing phase, the input speech is translated into a sequence of phonemes. In the searching phase, the phonetic representation of the search terms is searched for in the output of the previous phase.
C. Video Analytics
Video analytics is the process of monitoring, analyzing and gaining meaningful insights from video streams. It is also referred to as Video Content Analysis (VCA) and is still in its initial stage [26]. There are techniques to process both real-time and pre-recorded videos. The major contributors to video data are Closed-Circuit Television (CCTV) cameras and video-sharing websites such as YouTube. One prime challenge in video analytics is the huge size of video data. Just one second of High Definition (HD) video is equivalent in size to over two thousand pages of text [27]. One hundred hours of video content are uploaded to YouTube every minute [28]. The high volume of video content poses a challenge, but with the help of big data technologies it is an opportunity as well. Intelligence can be drawn from thousands of hours of video content. Video analytics has been applied in areas such as automated security, monitoring and surveillance systems for detecting breaches, identifying thefts, detecting littering of areas, and keeping a check on suspicious activities or objects left unattended. Upon detection of any such activity, the security officers may be notified in real time or an automatic action may be taken, such as turning on sound alarms, locking doors or turning on lights.
Labor-based surveillance systems are costly and less effective than automatic systems. It has been observed that security personnel cannot maintain their focus on video content for more than twenty minutes [29]. The content of CCTV cameras placed in retail outlets can be analyzed to extract information that helps in better marketing and operations management. Retailers can extract information such as the number of customers, their demographic information like age and gender, the length of time they were in the store, their movement patterns and the length of time they spent in different sections. Queues can also be monitored in real time. Meaningful insights can be extracted by correlating such information with customer demographics to make better decisions on product price and placement, deals and offers, combos, staffing, product promotion strategies and layout optimization. The buying behavior of a customer group, the group’s size and demographics, and the buying behavior of individual members of the group can also be analyzed. Another application of video analytics is Automatic Video Indexing and Retrieval for easy search and retrieval of videos. For video searching and retrieval, there is a metadata-based approach in which relational database management
systems are used. In the soundtracks and transcripts approach, video indexing can be done by applying audio analytics and text analytics [30]. There are two system architecture approaches in video analytics: Server-based architecture and Edge-based architecture [31]. In server-based architecture, a centralized server performs video analytics on the videos captured by each camera. The limitation of this approach is that bandwidth is limited, so the videos are compressed by reducing the image resolution and/or the frame rate. The accuracy of the analysis is therefore affected by the loss of information. But maintenance is easier in this approach and it also provides economies of scale. In edge-based architecture, video analytics is applied locally to the video content captured by the camera. There is no loss of information in this approach and the content analysis is more effective. However, the maintenance of such systems is costly and their processing power is lower than that of server-based systems.
D. Social Media Analytics
Social media analytics is the analysis of the structured and unstructured content of social media channels such as social networks, social news, blogs, micro blogs, media sharing, wikis, social bookmarking, question-and-answer sites and review sites [32]. Many mobile apps also facilitate social interactions and are therefore social media channels. Research on social media is done in various disciplines such as psychology, sociology, computer science, mathematics, economics, physics and anthropology. In recent years, the results of social media analytics have been helpful for marketing because of the widespread adoption of social media by users throughout the world [33]. In social media, the two sources of information are the content generated by users (images, audio, customer feedback, product reviews, videos, bookmarks, sentiments etc.) and the relationships between the entities of the network (people, organizations, products etc.).
Social media analytics can be categorized into two parts: Content-based analytics and Structure-based analytics [34]. In content-based analytics, analytics is performed on the content posted by users on social media platforms. Such content is high in volume, unstructured, noisy and dynamic. To extract insights from such data, the text, audio and video analytics techniques can be applied. The data processing challenges are addressed by big data technologies. In structure-based analytics, the focus is on the structural attributes of the social network. Insights are extracted from the relationships between the entities. A social network is represented by a graph consisting of a set of nodes representing the participants and a set of edges representing the relationships between the participants. Network graphs can further be categorized into two kinds: Social graphs and Activity graphs [35]. In social graphs, the edges represent the existence of links, such as friendship, between the respective entities. Communities or hubs can be determined from such data. In activity graphs, the edges represent actual interactions between entities. There is an exchange of information, such as likes and comments, between the linked entities.
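The distinction between the two kinds of graph can be sketched with plain Python data structures. This is only an illustrative representation under stated assumptions: the participant names, the friendship list and the interaction events are hypothetical, and real systems would typically use a dedicated graph store or library.

```python
from collections import defaultdict

# Hypothetical participants and events, for illustration only.
friendships = [("ann", "bob"), ("bob", "cara"), ("ann", "cara"), ("cara", "dev")]
interactions = [("ann", "bob", "like"), ("ann", "bob", "comment"),
                ("cara", "dev", "like")]


def build_social_graph(edges):
    """Undirected graph: an edge only records that a link (e.g. friendship) exists."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def build_activity_graph(events):
    """Directed weighted graph: the edge weight counts actual interactions."""
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst, _kind in events:
        graph[src][dst] += 1
    return graph


social = build_social_graph(friendships)
activity = build_activity_graph(interactions)
```

In this toy data, `bob` and `cara` are linked in the social graph but have no edge in the activity graph, which shows how an activity graph separates active relationships from links that merely exist.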
Activity graphs are preferable to social graphs from an analytics point of view, as active relationships are more informative. Community detection or discovery is a technique which extracts the implicit communities of a network. Communities are sub-networks whose entities interact more with each other than with the rest of the network. Behavioral patterns and the properties of the network can be extracted from such data. In this respect, community detection is similar to clustering [36]. It has various application areas such as marketing and the World Wide Web (WWW) [37]. It is helpful in developing better product recommendation systems. Social Influence Analysis analyzes the influence of entities and connections in a social network. It is based on the assumption that the behavior of an entity or participant is influenced by others. Such data gives an insight into the actors’ influence, the strength of connections and the patterns of influence in the network. This technique is helpful in marketing to enhance awareness and adoption of brands and products. Link Prediction is a technique which predicts possible future links between the existing entities in a social network [34]. A social network keeps growing over time as new edges and nodes are added. The goal is to understand and predict the possible interactions, collaboration or influence among the participants of a social network in a given time period. It has been observed that these techniques outperform pure chance by factors of 40-50, indicating that the present structure of the social network holds information about future links [38]. Link prediction is used to develop recommendation systems such as Facebook’s “People You May Know”, YouTube’s “Recommended for You” and Netflix’s and Amazon’s recommender engines. Besides the techniques discussed, other analytical techniques are evolving with time.
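As a minimal sketch of link prediction, the Python example below ranks currently unlinked pairs of participants by the Jaccard coefficient of their neighbour sets, a common neighbourhood-based baseline. It is an assumption chosen for illustration, not necessarily the method evaluated in [34] or [38], and the small friendship graph and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical undirected friendship graph stored as adjacency sets.
graph = {
    "ann": {"bob", "cara"},
    "bob": {"ann", "cara", "dev"},
    "cara": {"ann", "bob", "dev"},
    "dev": {"bob", "cara", "eli"},
    "eli": {"dev"},
}


def jaccard(graph, u, v):
    """Jaccard coefficient of the neighbour sets of u and v."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def predict_links(graph, top_k=3):
    """Rank currently unlinked pairs by Jaccard score, highest first."""
    candidates = [
        (jaccard(graph, u, v), u, v)
        for u, v in combinations(sorted(graph), 2)
        if v not in graph[u]
    ]
    # Sort by descending score, then alphabetically for a stable order.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return candidates[:top_k]


predictions = predict_links(graph)
```

Here the pair (`ann`, `dev`) is ranked first because the two share the neighbours `bob` and `cara`, which reflects the idea that the present structure of the network carries information about likely future links.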
This area is currently the focus of many researchers and professionals. The future will bring cost-effective, efficient tools and techniques for analytics.
V. CONCLUSIONS
Big data holds invaluable information which, if extracted completely, can make a big difference to business gains. In this paper, various aspects of big data analytics along with the role of unstructured data have been discussed. Various application areas have been described. The analytical techniques and approaches for unstructured data such as text, audio and video have been reviewed. These techniques include information extraction, summarization, question answering and sentiment analysis for text analytics; the transcript-based and phonetic-based approaches for audio analytics; automatic video indexing and retrieval for video analytics; and content-based and structure-based approaches, community detection, social influence analysis and link prediction for social media analytics. It has also been emphasized that businesses need knowledge discovery from unstructured data along with structured data to gain a competitive edge over others. This has been supported by various real-life examples. Unstructured data has a wealth of information and
makes up a far larger fraction of data than structured data. In predictive analytics, both structured and unstructured data hold significance for extracting knowledge and making predictions in business. Cost-effective and efficient tools and techniques are needed to analyze structured and unstructured data in real time. With data increasing over time, opportunities are arising in every field, whether it is business, medicine, research or weather forecasting. Data is a natural resource these days and, if utilized effectively, will in future yield cost-effective and critical insights into all of mankind’s problems. We intend to work in this area in future and come up with better approaches to big data analytics using unstructured data.
ACKNOWLEDGMENT
The authors express their deep sense of gratitude to the Founder President of Amity University, Dr. Ashok Chauhan, for his keen interest in promoting research at Amity University and for always being an inspiration for achieving great heights.
REFERENCES
[1] J. Gantz and D. Reinsel, “Extracting value from chaos,” in Proc. IDC iView, pp. 1-12, 2011.
[2] Amir Gandomi and Murtaza Haider, “Beyond the hype: Big data concepts, methods, and analytics,” International Journal of Information Management, vol. 35, Issue 2, pp. 137-144, April 2015.
[3] (2015) The Data Science Central website. [Online]. Available: www.datasciencecentral.com
[4] D. Beaver, S. Kumar, H. C. Li, J. Sobel and P. Vajgel, “Finding a needle in haystack: Facebook’s photo storage,” in Proc. The ninth USENIX Conference on Operating Systems Design and Implementation, Berkeley, CA, USA: USENIX Association, pp. 1–8, 2010.
[5] K. Rupanagunta, D. Zakkam, and H. Rao, “How to Mine Unstructured Data,” Article in Information Management, June 29, 2012.
[6] S. Marche, “Is Facebook making us lonely,” Atlantic, vol. 309, no. 4, pp. 60-69, 2012.
[7] “IBM What Is Big Data: Bring Big Data to the Enterprise,” IBM, 2012. [Online]. Available: http://www01.ibm.com/software/data/bigdata/.
[8] Reena Duggal, Balvinder Shukla and Sunil Kumar Khatri, “Big Data Analytics in Indian Healthcare System – Opportunities and Challenges,” research paper accepted at National Conference on Computing, Communication and Information Processing (NCCCIP-2015), May 2015, pp. 92-104.
[9] Steve Pederson, “Exploiting Big Data from the Deep Web,” BrightPlanet Corporation, Tech. Report, Dec 17, 2012.
[10] Randy Rieland, “Big Data or Too Much Information,” Smithsonian Magazine, May 7, 2012.
[11] Stephen Prentice, “From Data to Decision: Delivering Value from ‘Big Data,’” Gartner Inc., March 28, 2012.
[12] M. A. Beyer and D. W. Cearley, “‘Big Data’ and Content Will Challenge IT across the Board,” Gartner Inc., February 15, 2012.
[13] M. Blechar, M. Adrian, T. Friedman, W. R. Schulte and D. Laney, “Predicts 2012: Information Infrastructure and Big Data,” Gartner Inc., November 29, 2011.
[14] Neerja Pawha Jetley, “How big data has changed India elections.” CNBC News. [Online]. Available: http://www.cnbc.com/2014/04/10/how-big-data-have-changedindia-elections.html
[15] A. Labrinidis and H. V. Jagadish, “Challenges and opportunities with big data,” in Proc. VLDB Endowment, vol. 5(12), pp. 2032–2033, 2012.
[16] (2013) G. Blackett. Analytics Network- O. R. Analytics. [Online]. Available: http://www.theorsociety.com/Pages/SpecialInterest/AnalyticsNetwork_anal%ytics.aspx
[17] Akash, Bryan, Nishi, Prashant, Subhadeep and Vaibhav, “Emerging Marketing Analytics- Big Data.” [Online]. Available: http://www.slideshare.net/AkashTyagi8/big-data-marketinganalytics
[18] K. Cukier, “Data, data everywhere: A special report on managing information,” The Economist, February 25, 2010.
[19] Steve Andriole, “Unstructured Data: The Other Side of Analytics.” Forbes. [Online]. Available: http://www.forbes.com/sites/steveandriole/2015/03/05/the-otherside-of-analytics/
[20] W. Chung, “BizPro: Extracting and categorizing business intelligence factors from textual news articles,” International Journal of Information Management, vol. 34(2), pp. 272–284, 2014.
[21] J. Jiang, “Information extraction from text,” in C. C. Aggarwal and C. Zhai (Eds.), Mining text data, Springer, pp. 11–41, 2012.
[22] U. Hahn and I. Mani, “The challenges of automatic summarization,” Computer, vol. 33(11), pp. 29–36, 2000.
[23] B. Liu, “Sentiment analysis and opinion mining,” Synthesis Lectures on Human Language Technologies, vol. 5(1), pp. 1–167, 2012.
[24] J. Hirschberg, A. Hjalmarsson and N. Elhadad, “You’re as sick as you sound: Using computational approaches for modeling speaker state to gauge illness and recovery,” in A. Neustein (Ed.), Advances in speech recognition, Springer, pp. 305–322, 2010.
[25] H. A. Patil, “Cry baby: Using spectrographic analysis to assess neonatal health status from an infant’s cry,” in A. Neustein (Ed.), Advances in speech recognition, Springer, pp. 323–348, 2010.
[26] B. K. Panigrahi, A. Abraham and S. Das, “Computational intelligence in power engineering,” Studies in Computational Intelligence, Springer, vol. 302, 2010.
[27] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, et al., “Big data: The next frontier for innovation, competition, and productivity,” McKinsey Global Institute Article, May 2011.
[28] YouTube Statistics (n.d.). [Online]. Available: http://www.youtube.com/yt/press/statistics.html
[29] A.
Hakeem, H. Gupta, A. Kanaujia, T. E. Choe, K. Gunda, A. Scanlon, et al., “Video analytics for business intelligence,” Video analytics for business intelligence , Springer, pp. 309–354, 2012. W. Hu, N. Xie, L. Li, X. Zeng and S. Maybank, “A survey on visual content-based video indexing and retrieval,” Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transaction, vol. 41(6), pp. 797–819, 2011. Agent Comparative Analysis. [Online]. Available: http://www.agentvi.com/images/Video_Analytics_Architectures_ Comparative_Analysis.pdf G. Barbier, and H. Liu, “Data mining in social media,” C. C. Aggarwal (Ed.), Social network data analytics, Springer, pp. 327–352, 2011. W. He, S. Zha, and L. Li, “Social media competitive analysis and text mining: A case study in the pizza industry,” International Journal of Information Management, vol. 33(3), pp. 464–472, 2013. Charu C. Aggarwal, “An Introduction to Social Network Data Analysis,” Social Network Data Analytics, Springer, 2011. J. Heidemann, M. Klier, and F. Probst, “Online social networks: A survey of a global phenomenon,” Computer Networks, vol. 56(18), pp. 3866–3878, 2012. C. C. Aggarwal, “An introduction to social network data analytics,” C. C. Aggarwal (Ed.), Social network data analytics, Springer, pp. 1–15, 2011. S. Parthasarathy, Y. Ruan, and V. Satuluri, “Community discovery in social networks: Applications, methods and emerging trends,” C. C. Aggarwal (Ed.), Social network data analytics, Springer, pp. 79–113, 2011. D. Liben-Nowell, and J. Kleinberg, “The link prediction problem for social networks,” in Proc. The twelfth International Conference on Information and Knowledge Management, ACM, pp. 556–559, 2003.