Visual Sentiment Prediction with Transfer Learning and Big Data Analytics for Smart Cities

Kaoutar Ben Ahmed, Mohammed Bouhorma, Mohamed Ben Ahmed
Computer Science Department, FSTT, Abdelmalek Essaadi University, Tangier, Morocco
Atanas Radenski
Schmid College of Science, Chapman University, Orange, CA 92866, USA
Abstract—The Internet of Things and social media platforms have changed the way people communicate and express themselves. People now share their experiences and views in blogs, micro-blogs, comments, photos, videos, and other postings on social sites. It has been recognized that timely reactions to public opinions and sentiments, and their proper use by city governments, are of paramount importance to their missions. Despite the widely recognized potential of sentiment analysis, relatively little is known about how to best harness its benefits for smart cities. The objective of this article is to help fill that void by reviewing the state of the art and the opportunities in data sources and applications of sentiment analysis for smart cities. Additionally, the article explores deep features of photos shared by users on Twitter via transfer learning, revealing interesting research opportunities and applications.

Keywords—sentiment analysis; opinion mining; smart cities; big data; text mining
I. INTRODUCTION
Smart (or smarter) cities "are urban areas that exploit operational data, such as that arising from traffic congestion, power consumption statistics, and public safety events, to optimize the operation of city services" [1]. A smart city's main objective is to increase the quality of life of its citizens and to make the city more attractive, livelier, and greener. To achieve this goal, smart city technologies (SCTs) are fused with the traditional city infrastructure. SCTs refer to all the information and communications technologies (ICTs) that enable cities to gather and analyze big data in order to become connected and sustainable. The smart city concept emerged during the last decade as a fusion of ideas about how ICTs might improve the functioning of cities, enhancing their efficiency, improving their competitiveness, and providing new ways to address poverty, social deprivation, and poor environment. Urban communities worldwide are planning, developing, and adopting digital systems and technologies to improve efficiency and the quality of life of their citizens. According to a 2014 forecast, by 2025 there will be at least 26 global smart cities, about 50 percent of which will be located in North America and Europe [2].
Sentiment analysis is the examination of people's opinions, sentiments, evaluations, appraisals, attitudes, emotions, and personal preferences towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes [3]. Traditionally, sentiment analysis mines information from various text sources such as reviews, news, and blogs, and then classifies texts on the basis of their polarity as positive, negative, or neutral. An important preliminary task of sentiment analysis is to evaluate the subjective or objective nature of source texts: subjectivity indicates that the text bears opinion content, whereas objectivity indicates that it does not. More recently, sentiment analysis has aimed to exploit audio, video, location, and other non-traditional data sources. In this paper, we focus on sentiment analysis as a promising smart city technology: we review the state of the art and the opportunities in the areas of data sources (Section 2) and applications (Section 3), and we propose a novel computational system for sentiment prediction of Twitter photos using pre-trained deep neural networks (Section 4). Unless otherwise required, we limit our considerations to the smart city application domain and do not aim to discuss sentiment analysis in its full generality.
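As a concrete, if simplified, illustration of polarity classification, the following minimal Python sketch scores a text against small positive and negative word lists. The word lists here are hypothetical placeholders; production systems rely on curated lexicons or trained classifiers.

```python
# Minimal lexicon-based polarity classifier (illustrative word lists only).
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful"}

def polarity(text: str) -> str:
    """Classify a text as positive, negative, or neutral by word counts."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(polarity("I love the new tram line, great service"))  # -> positive
```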
Open data initiatives around the world make public data available online to wider audiences. In recent years such initiatives have supported innovative projects aiming at the design of SCTs that enhance cities' smartness. A number of technological developments illustrate this trend. For example, real-time big data analyses boost public safety by integrating smart solutions into disaster and emergency management and into law enforcement systems. Smart mobility solutions based on road sensors and intelligent transportation systems are employed in the transport and environment fields, aiming to reduce urban congestion and CO2 emissions. Websites and cloud services make a number of city government services accessible to anyone over the Internet; they can also become smarter thanks to tracking and analytics technologies that discover usage and access patterns and respond to citizens' individual needs and preferences. Recommendation systems, mobile location-based applications, and other advanced ICTs offer city visitors novel and personalized experiences: smartphone apps act like electronic tourist guides, updated with real-time information about accommodation, dining options, weather, currency rates, and nearby points of interest based on the visitor's personal preferences or geographical location, while touch-sensitive maps in stations help tourists easily find their way. Social media triggered the rise of sentiment analysis, which brings new possibilities to city governance in general, and to decision making in particular.
II. DATA SOURCES

Twitter data streams have been recognized as a beneficial data source for smart cities; for example, a number of projects in different city areas such as politics [4], tourism ([5]; [6]), and law enforcement [7] employ the Twitter APIs to extract data sets that are then subjected to sentiment analysis. Apart from Twitter, other social networks also generate data for sentiment analysis that are currently used in various applications. For instance, geo-tagged content from Flickr and Foursquare is used to enhance decision making by local government [8]. Facebook content is used by adaptive smart e-learning systems to detect a student's emotional state in order to support personalized learning [9]. Web search queries and the Yahoo! Answers website are employed as data sources to help find interesting connections between community sentiments and factors such as gender, age, education level, the topic at hand, or even the time of day ([10]; [11]). Web forums and YouTube comments are crawled to generate datasets for radicalization studies that apply sentiment analysis in a public safety context ([12]; [13]). A representative set of sentiment analysis applications is discussed in Section 3.

New data sources are being investigated for use in sentiment analysis. Newspaper articles [14] can be a beneficial textual data source. Content shared by users on social media sites often consists not only of text but of images as well, and with recent advances in visual sentiment analysis, the now popular selfie can form a rich data source for sentiment analysis on social media. A few recent works have attempted to predict visual sentiment using features from images ([15]; [16]; [17]; [18]; [19]). There are also studies that analyze speech-based emotion recognition ([20]; [21]; [22]; [23]; [24]). Additionally, a growing number of studies ([25]; [26]; [27]; [28]; [29]; [30]) are concerned with multimodal sentiment analysis, integrating visual, acoustic, and linguistic modalities. Some scholars ([31]; [32]) have claimed that the integration of visual, audio, and textual features can improve the analysis precision significantly over the individual use of one modality at a time, leading to error rate reductions of up to 10.5%. Last, but not least, smartwatches are expected [33] to become capable of providing data on emotions: mood valence (positive vs. negative) and arousal (high vs. low). The location of smartwatches permits easy recording of heart rate variability and galvanic skin response (GSR), which can be used to identify physiological arousal. With the advent of smartwatches and effective multi-sensor data collection, new algorithms for sensor data fusion might be developed that can identify valence without the need to process a facial image.
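As an illustration of the Twitter-based collection pipelines cited above, the sketch below pulls recent tweets through the Twitter API. It assumes the tweepy v4 client and a valid bearer token, and the query string is a made-up example, not one used by any cited project.

```python
# Hedged sketch: collecting recent tweets for sentiment analysis.
# Assumes tweepy v4 and a valid API bearer token (placeholder below).
import tweepy

client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")

# Example query using Twitter API v2 search operators (hypothetical topic).
response = client.search_recent_tweets(
    query="(tram OR bus) lang:en -is:retweet",
    tweet_fields=["created_at", "geo"],
    max_results=50,
)
for tweet in response.data or []:
    print(tweet.created_at, tweet.text[:80])
```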
III. APPLICATIONS

Sentiment analysis covers a wide range of applications. The ability of government agencies to continuously keep a tab on the pulse of their citizens by means of ongoing sentiment analysis will pave the way to better governance. Sentiment analysis can address questions of primary importance to government agencies [34]: for example, how do citizens feel about an agency's new programs and policies? What are the most talked-about programs? Are they perceived as good or bad? Sentiment analysis of citizens' tweets can help identify critical events, such as road accidents, fires, and street violence, thus allowing government agencies to respond quickly [35]. Moreover, sentiment analysis can assist decision-making processes and urban planning by providing a form of implicit feedback from the citizens ([36]; [8]). Argument mapping software is another application of sentiment analysis in the smart governance domain; it helps organize policy statements by explicating the logical links between them, and is thus useful for ensuring that policy debates are logical and evidence-based. These tools are helpful not only for policy-makers, but also for citizens, who can more easily understand the key points of a discussion and participate in the policy-making process. Sentiment analysis is instrumental in identifying problems by listening to the public, rather than by actively asking for input by means of surveys, and in doing so it can ensure a more accurate reflection of reality [37]. City service ratings (Grade.DC.Gov) inferred from citizens' sentiments help city leaders target their improvement efforts where needed. In support of the tourism industry, an increasing number of projects use smart mobile applications combined with open Wi-Fi infrastructure to deliver real-time, personalized services to potential customers. Notable services include hotel reviews [38] and location recommendations based on sentiment analysis of user tweets [5] and Foursquare check-ins and tips [6]. Sentiment analysis applications are found in politics as well. For instance, voting advice applications help voters understand which political party (or which other voters) have positions closest to their own; to do so, the applications approximate voter preferences by deducing them from voters' sentiments [37]. Facebook mines user data in order to gauge sentiment about political issues and candidates. Such data are shared with ABC News and BuzzFeed to support U.S. Election Day coverage; this was done in 2014 and is expected to happen again in 2016. Voter sentiment data are routinely used to predict election results ([4]; [39]). Sentiment analysis is also used to predict the movements of economic indicators ([40]; [41]; [42]).
A new range of opportunities is emerging in the smart city domain. Smart learning systems [43], for example, can personalize students' learning experiences and improve education effectiveness by analyzing a student's emotional state and then recommending suitable activities to undertake at the moment; conversely, students' sentiments towards a course can serve as feedback for teachers, especially in the case of online learning. Minors' safety in virtual environments can be improved by using sentiment analysis to detect abuse by pedophiles in chats [44]. Terrorist events can be predicted [45]. Data on emotions generated by smartwatches [33], in combination with coordinates, will lead to new applications that evaluate and report, in real time, the sentiment of citizens as they move about the city: on streets, in city offices, in parks, on public transport, and so on. Enhanced recommender systems for city services and events can therefore be designed. Sentiment mapping adds location to sentiment analysis: by knowing the places that tweets or other sources of sentiment are uttered from, or the names of the places they refer to, it is possible to build a map of the areas where anger is running high or where travelers are happier, thus improving transportation services [46]. Sensing citizens' sentiments about a scientific or cultural event can provide, in real time, the event's organizers with meaningful insights into the degree of success of their initiatives [47]. Crowd pulse detection [48] and gross national happiness calculation [49] are examples of emerging sentiment analysis applications that will provide insight into the dominant mood of the population in real time.
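To make the sentiment mapping idea concrete, here is a minimal sketch that aggregates geo-tagged polarity scores into a coarse latitude/longitude grid. The coordinates and scores are toy placeholders, not data from any cited system.

```python
# Hedged sketch: sentiment mapping over a coarse lat/lon grid (toy data).
from collections import defaultdict

tweets = [  # (latitude, longitude, polarity score in [-1, 1]) -- placeholders
    (35.78, -5.81, 0.8), (35.77, -5.80, -0.4), (35.76, -5.83, 0.5),
]

grid = defaultdict(list)
for lat, lon, score in tweets:
    cell = (round(lat, 1), round(lon, 1))   # roughly 11 km grid cells
    grid[cell].append(score)

for cell, scores in grid.items():
    print(cell, "mean sentiment:", sum(scores) / len(scores))
```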
IV. PROPOSED COMPUTATIONAL SYSTEM: VISUAL SENTIMENT PREDICTION VIA TRANSFER LEARNING

While extensive research has been conducted on sentiment analysis of text, research on visual sentiment analysis lags far behind. Social media users increasingly use images to express their opinions, so sentiment analysis of such large-scale visual content can help better understand user sentiment toward events or topics. In this section, we focus on the problem of sentiment prediction based purely on the visual information within image tweets. Recently, deep learning research has shown state-of-the-art classification performance in the machine learning field [50], [51]. On visual recognition tasks in particular, convolutional neural networks (CNNs) have shown encouraging results, outperforming hand-engineered image descriptors such as color histograms, HOG [52], and SIFT [53].
A. Dataset

To evaluate the proposed method we use a real-world dataset from Twitter. The benchmark [54] includes 603 tweets with photos. Ground-truth sentiment values were obtained by Amazon Mechanical Turk annotation, resulting in 470 positive and 133 negative labels. The dataset consists of training and test sets; it was divided into 5 partitions in order to perform 5-fold cross-validation.
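A minimal sketch of the 5-fold protocol, assuming scikit-learn; the label array below simply mirrors the 470/133 class counts and stands in for the actual AMT annotations.

```python
# Hedged sketch: stratified 5-fold split over the 603-image benchmark.
import numpy as np
from sklearn.model_selection import StratifiedKFold

labels = np.array([1] * 470 + [0] * 133)   # 1 = positive, 0 = negative
indices = np.arange(len(labels))            # stand-ins for image IDs

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(indices, labels)):
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test")
```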
Fig. 1. Examples from the Twitter dataset
For fine-tuning purposes, we used 476 images of this dataset for training and 121 images for testing.

B. Method

We propose a novel framework that efficiently transfers a convolutional neural network (CNN) learned on a large-scale dataset to the task of visual sentiment prediction. Training an entire convolutional network from scratch with random initialization is difficult because it requires a labeled dataset of sufficient size. Instead, it is common to use transfer learning, in which a CNN is pre-trained on a very large dataset and the knowledge learned is then transferred to a new task to improve learning. Recent works by Torresani [55] and Li [56] show the effectiveness of transfer learning in computer vision. In deep learning, transfer across tasks has been shown to be feasible in an unsupervised way on relatively small datasets such as CIFAR and MNIST by using deep feature representations [57], [58]. We apply the concept of transfer learning with a deep CNN from large-scale image classification to the problem of sentiment prediction.

In our case, the learned parameters are transferred to the task of sentiment prediction, where the images come from a different domain and the labeled data is limited. The pre-trained CNN used in this study is the CNN-F architecture presented in Chatfield's work [59]. CNN-F has 5 convolutional layers followed by 3 fully-connected layers, for a total of 8 learnable layers. The details of each convolutional layer are given in Table I: the first value in each description gives the number of convolution filters and their receptive field size as "num x size x size"; the second indicates the convolution stride ("st.") and spatial padding ("pad"); the third indicates whether Local Response Normalization (LRN) [50] is applied, and the max-pooling downsampling factor. Layers fc6 and fc7 are regularized using dropout [50], while the last layer acts as a multi-way soft-max classifier. The activation function for all weight layers (except fc8) is the Rectified Linear Unit (ReLU) [50].

The CNN-F architecture was trained on ILSVRC-2012 with the ImageNet dataset [60], which contains 1.2 million images in 1000 categories, using gradient descent with momentum. To use the pre-trained CNN as a fixed feature extractor, we remove its last fully-connected layer (the layer leading to the output units) and treat the activations of the last hidden layer as a deep feature representation of each 224x224 input image. As demonstrated by Donahue et al. [61], the output of the 7th layer (fc7) of a pre-trained CNN generalizes well to object recognition and detection. In this experiment, we explore its potential for higher-level concept understanding, namely sentiment.
TABLE I. THE CNN-F ARCHITECTURE: FIVE CONVOLUTIONAL LAYERS (CONV1–5) AND THREE FULLY-CONNECTED LAYERS (FC6–8)

conv1  64×11×11, st. 4, pad 0, LRN, ×2 pool
conv2  256×5×5, st. 1, pad 2, LRN, ×2 pool
conv3  256×3×3, st. 1, pad 1
conv4  256×3×3, st. 1, pad 1
conv5  256×3×3, st. 1, pad 1, ×2 pool
fc6    4096, dropout
fc7    4096, dropout
fc8    1000, soft-max
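The sketch below illustrates the fixed-feature-extractor idea in PyTorch. Since the paper's CNN-F/MatConvNet model is not assumed here, torchvision's AlexNet (another 5-conv/3-fc ImageNet network) stands in for CNN-F, and the image path is a placeholder.

```python
# Hedged sketch: a pre-trained CNN as a fixed fc7 feature extractor.
# AlexNet approximates CNN-F here (an assumption, not the paper's model).
import torch
from torchvision import models, transforms
from PIL import Image

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.eval()
# Drop the final 1000-way layer; keep everything up to the fc7 activations.
fc7 = torch.nn.Sequential(model.features, model.avgpool, torch.nn.Flatten(),
                          *list(model.classifier.children())[:-1])

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("tweet_photo.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    feat = fc7(img)    # 4096-dimensional deep feature vector
print(feat.shape)       # torch.Size([1, 4096])
```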
The overall architecture of our proposed system is shown in Fig. 2. The deep features are computed by forward propagation through the 5 convolutional layers and the first two fully-connected layers of the pre-trained CNN; the output is a 4096-dimensional deep feature vector. The pre-trained model was provided by the MATLAB toolbox MatConvNet [62]. We chose symmetrical uncertainty [63] for feature selection. After feature selection, we trained J48 decision tree, Random Forest, and linear logistic regression classifiers from Weka [64]. The prediction accuracy is averaged over the 5 runs of cross-validation.
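As a rough Python analogue of this Weka pipeline, the sketch below ranks features by symmetrical uncertainty, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and trains a logistic regression on the top-ranked features. The data is synthetic, the bin count and the number of retained features are arbitrary choices, and Weka's own discretization may differ.

```python
# Hedged sketch: symmetrical-uncertainty feature ranking + logistic regression.
import numpy as np
from scipy.stats import entropy
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mutual_info_score

def symmetrical_uncertainty(x, y, bins=10):
    """SU(X, Y) = 2*I(X;Y) / (H(X)+H(Y)); X discretized into equal-width bins."""
    x_binned = np.digitize(x, np.histogram_bin_edges(x, bins=bins))
    mi = mutual_info_score(x_binned, y)
    h_x, h_y = entropy(np.bincount(x_binned)), entropy(np.bincount(y))
    denom = h_x + h_y
    return 2.0 * mi / denom if denom > 0 else 0.0

# Toy data standing in for the fc7 features (real vectors are 4096-dim).
rng = np.random.default_rng(0)
X = rng.normal(size=(603, 512))
y = rng.integers(0, 2, size=603)

su = np.array([symmetrical_uncertainty(X[:, j], y) for j in range(X.shape[1])])
top = np.argsort(su)[::-1][:100]     # keep the 100 highest-SU features
clf = LogisticRegression(max_iter=1000).fit(X[:, top], y)
print("train accuracy:", clf.score(X[:, top], y))
```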
C. Discussion

Table II shows that the linear logistic regression model achieves better accuracy than J48 and Random Forest; we therefore use the logistic regression classifier, after tuning, in what follows. We compare our method with two baselines. The first is the method of Borth et al. [54], which uses a set of low-level visual features to characterize sentiment cues such as scenes, textures, and faces, as well as other abstract concepts. The second baseline is the SentiBank approach [54], which proposes a concept representation defined as an adjective-noun pair, e.g., "happy guy" or "healthy food"; the underlying ontology is constructed from psychology studies and web mining. Both baselines use a logistic regression model and the Twitter dataset presented in Section IV-A.
Fig. 2. Overall architecture of the proposed system
TABLE II. PREDICTION ACCURACY OF THE TRAINED CLASSIFIERS

Random Forest               74.65%
J48                         74.99%
Linear Logistic Regression  76.16%
It can be seen in Fig. 3 that our proposed method of using a pre-trained network as a feature extractor (fc7) outperforms both baseline approaches.

Fig. 3. Prediction accuracy of the proposed method (fc7: 76.16%) in comparison to the Low-level Features and SentiBank baselines [54] (baseline accuracies: 57% and 70%).
To improve the prediction accuracy further, we conducted supervised training to fine-tune the features of the pre-trained network and trained another classifier on those features for the domain-specific task, sentiment prediction. Girshick et al. [51] showed that supervised pre-training followed by supervised fine-tuning of the network is a very effective approach for learning new tasks. Fine-tuning is done by initializing a network with weights optimized for ILSVRC-2012; the weights are then updated using the target task's training set. The learning rate used for fine-tuning is typically set lower than the initial learning rate used for the full ImageNet dataset, which ensures that the features learned from the larger dataset are not forgotten.
We replaced the last 1000-way logistic regression layer (the previous fc8) with a 3-way layer to suit our classification needs; the three classes are Positive, Negative, and Neutral. The pre-trained network contains several million parameters, and fine-tuning on a CPU is time consuming, so the forward and backward passes ran on a GeForce GTX 960 GPU. The network was trained by stochastic gradient descent with a learning rate of 10−4 and a momentum of 0.9, lowering the learning rate by an order of magnitude every 10 iterations; this generally results in slightly better performance in the first iterations.
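A sketch of this fine-tuning recipe in PyTorch (the paper itself uses MatConvNet); AlexNet again stands in for CNN-F, and the training data is a dummy placeholder for the labeled tweet images.

```python
# Hedged sketch: fine-tuning a pre-trained CNN for 3-way sentiment prediction.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# AlexNet pre-trained on ImageNet stands in for CNN-F (an assumption).
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier[-1] = nn.Linear(4096, 3)  # fc8: 1000-way -> 3-way output
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Dummy stand-in batches; replace with the labeled 224x224 tweet images.
images = torch.randn(32, 3, 224, 224)
labels = torch.randint(0, 3, (32,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=8)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
# Lower the learning rate by an order of magnitude every 10 iterations.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

model.train()
for epoch in range(5):
    for batch_images, batch_labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_images.to(device)), batch_labels.to(device))
        loss.backward()
        optimizer.step()
        scheduler.step()  # per-iteration schedule, as described in the text
```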
The experimental results show that domain-specific fine-tuning is effective for improving the performance of the neural network (Table III). This set of results demonstrates that the proposed system for visual sentiment prediction is capable of efficiently utilizing the power of the pre-trained convolutional neural network, both by transferring domain knowledge from the object recognition domain to the sentiment prediction domain and by domain-specific fine-tuning.
TABLE III. IMPROVING PREDICTION ACCURACY USING DOMAIN-SPECIFIC FINE-TUNING

Method       Prediction accuracy
Fc7          76.16%
Fine-Tuning  82.35%
V. CONCLUSION

The development of smart city technologies requires joint efforts by academia, industry, and government to provide evolving systems and services. In contrast to the past decade, nowadays "the leading urban centers are not placing their technological futures in the hands of a company or a single university research group"; "instead, they are relying on a combination of academics, civic leaders, businesses, and individual citizens working together to create urban information systems that could benefit all these groups" [65]. We believe that sentiment analysis is a key factor in the development of all smart city domains. In this paper, a novel system for visual sentiment prediction using a convolutional neural network is introduced. While the majority of previous work on sentiment analysis focuses on text, an image is worth a thousand words. We therefore focus on images and propose to use a fine-tuned convolutional neural network, initially trained on a large natural image recognition dataset (ImageNet ILSVRC-2012), to learn feature representations for visual sentiment prediction.
REFERENCES
[1] C. Harrison, B. Eckman, R. Hamilton, P. Hartswick, J. Kalagnanam, J. Paraszczak, and P. Williams, "Foundations for Smarter Cities," IBM J. Res. Dev., vol. 54, no. 4, pp. 1–16, Jul. 2010.
[2] Frost & Sullivan, "Frost & Sullivan: Global Smart Cities market to reach US$1.56 trillion by 2020," 2014. [Online]. Available: http://www.prnewswire.com/news-releases/frost--sullivan-global-smartcities-market-to-reach-us156-trillion-by-2020-300001531.html. [Accessed: 06-Feb-2015].
[3] B. Liu, "Sentiment Analysis and Opinion Mining," Synth. Lect. Hum. Lang. Technol., vol. 5, no. 1, pp. 1–167, May 2012.
[4] A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M. Welpe, "Predicting elections with Twitter: What 140 characters reveal about political sentiment," 2010.
[5] M. Balduini, I. Celino, D. Dell'Aglio, E. Della Valle, Y. Huang, T. Lee, S.-H. Kim, and V. Tresp, "BOTTARI: An augmented reality mobile application to deliver personalized and location-based recommendations by continuous analysis of social media streams," Web Semant. Sci. Serv. Agents World Wide Web, vol. 16, pp. 33–41, Nov. 2012.
[6] D. Yang, D. Zhang, Z. Yu, and Z. Wang, "A Sentiment-enhanced Personalized Location Recommendation System," in Proceedings of the 24th ACM Conference on Hypertext and Social Media, New York, NY, USA, 2013, pp. 119–128.
[7] M. Cheong and V. C. S. Lee, "A microblogging-based approach to terrorism informatics: Exploration and chronicling civilian sentiment and response to terrorism events via Twitter," Inf. Syst. Front., vol. 13, no. 1, pp. 45–59, Sep. 2010.
[8] A. Vakali, L. Anthopoulos, and S. Krco, "Smart Cities Data Streams Integration: Experimenting with Internet of Things and Social Data Flows," in Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14), New York, NY, USA, 2014, pp. 60:1–60:5.
[9] A. Ortigosa, J. M. Martín, and R. M. Carro, "Sentiment analysis in Facebook and its application to e-learning," Comput. Hum. Behav., vol. 31, pp. 527–541, Feb. 2014.
[10] O. Kucuktunc, B. B. Cambazoglu, I. Weber, and H. Ferhatosmanoglu, "A Large-scale Sentiment Analysis for Yahoo! Answers," in Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, New York, NY, USA, 2012, pp. 633–642.
[11] S. Chelaru, I. S. Altingovde, S. Siersdorfer, and W. Nejdl, "Analyzing, Detecting, and Exploiting Sentiment in Web Queries," ACM Trans. Web, vol. 8, no. 1, pp. 6:1–6:28, Dec. 2013.
[12] A. Bermingham, M. Conway, L. McInerney, N. O'Hare, and A. F. Smeaton, "Combining Social Network Analysis and Sentiment Analysis to Explore the Potential for Online Radicalisation," in Proceedings of the 2009 International Conference on Advances in Social Network Analysis and Mining, Washington, DC, USA, 2009, pp. 231–236.
[13] A. Abbasi and H. Chen, "Affect Intensity Analysis of Dark Web Forums," in 2007 IEEE Intelligence and Security Informatics, 2007, pp. 282–288.
[14] P. Hofmarcher, S. Theußl, and K. Hornik, "Do media sentiments reflect economic indices?," Chin. Bus. Rev., vol. 10, no. 7, pp. 487–492, 2011.
[15] S. Siersdorfer, E. Minack, F. Deng, and J. Hare, "Analyzing and Predicting Sentiment of Images on the Social Web," in Proceedings of the International Conference on Multimedia, New York, NY, USA, 2010, pp. 715–718.
[16] D. Borth, R. Ji, T. Chen, T. Breuel, and S.-F. Chang, "Large-scale Visual Sentiment Ontology and Detectors Using Adjective Noun Pairs," in Proceedings of the 21st ACM International Conference on Multimedia, New York, NY, USA, 2013, pp. 223–232.
[17] D. Borth, T. Chen, R. Ji, and S.-F. Chang, "SentiBank: Large-scale Ontology and Classifiers for Detecting Sentiment and Emotions in Visual Content," in Proceedings of the 21st ACM International Conference on Multimedia, New York, NY, USA, 2013, pp. 459–460.
[18] J. Yuan, S. Mcdonough, Q. You, and J. Luo, "Sentribute: Image Sentiment Analysis from a Mid-level Perspective," in Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining, New York, NY, USA, 2013, pp. 10:1–10:8.
[19] Q. You, J. Luo, H. Jin, and J. Yang, "Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks," in The Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI), 2015.
[20] D. Ververidis and C. Kotropoulos, "Emotional speech recognition: Resources, features, and methods," Speech Commun., vol. 48, no. 9, pp. 1162–1181, Sep. 2006.
[21] D. Bitouk, R. Verma, and A. Nenkova, "Class-level spectral features for emotion recognition," Speech Commun., vol. 52, no. 7–8, pp. 613–625, Jul. 2010.
[22] F. Dellaert, T. Polzin, and A. Waibel, "Recognizing emotion in speech," in Fourth International Conference on Spoken Language Processing (ICSLP 96), 1996, vol. 3, pp. 1970–1973.
[23] R. Tato, R. Santos, R. Kompe, and J. M. Pardo, "Emotional space improves emotion recognition," in INTERSPEECH, 2002.
[24] M. El Ayadi, M. S. Kamel, and F. Karray, "Survey on speech emotion recognition: Features, classification schemes, and databases," Pattern Recognit., vol. 44, no. 3, pp. 572–587, Mar. 2011.
[25] L. C. De Silva, T. Miyasato, and R. Nakatsu, "Facial emotion recognition using multi-modal information," in Proceedings of the 1997 International Conference on Information, Communications and Signal Processing (ICICS), 1997, vol. 1, pp. 397–401.
[26] L. S. Chen, T. S. Huang, T. Miyasato, and R. Nakatsu, "Multimodal human emotion/expression recognition," in Third IEEE International Conference on Automatic Face and Gesture Recognition, 1998, pp. 366–371.
[27] N. Sebe, I. Cohen, T. Gevers, and T. S. Huang, "Emotion recognition based on joint visual and audio cues," in 18th International Conference on Pattern Recognition (ICPR 2006), 2006, vol. 1, pp. 1136–1139.
[28] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, "A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 1, pp. 39–58, Jan. 2009.
[29] M. Wollmer, B. Schuller, F. Eyben, and G. Rigoll, "Combining Long Short-Term Memory and Dynamic Bayesian Networks for Incremental Emotion-Sensitive Artificial Listening," IEEE J. Sel. Top. Signal Process., vol. 4, no. 5, pp. 867–881, Oct. 2010.
[30] I. D. Addo, S. I. Ahamed, and W. C. Chu, "Toward Collective Intelligence for Fighting Obesity," in 2013 IEEE 37th Annual Computer Software and Applications Conference (COMPSAC), 2013, pp. 690–695.
[31] L.-P. Morency, R. Mihalcea, and P. Doshi, "Towards Multimodal Sentiment Analysis: Harvesting Opinions from the Web," in Proceedings of the 13th International Conference on Multimodal Interfaces, New York, NY, USA, 2011, pp. 169–176.
[32] V. Pérez-Rosas, R. Mihalcea, and L.-P. Morency, "Utterance-Level Multimodal Sentiment Analysis," in ACL (1), 2013, pp. 973–982.
[33] R. Rawassizadeh, B. A. Price, and M. Petre, "Wearables: Has the Age of Smartwatches Finally Arrived?," Commun. ACM, vol. 58, no. 1, pp. 45–47, Dec. 2014.
[34] R. Arunachalam and S. Sarkar, "The New Eye of Government: Citizen Sentiment Analysis in Social Media," in Sixth International Joint Conference on Natural Language Processing, p. 23.
[35] J. Villena-Román, "TweetAlert: Semantic Analytics in Social Networks for Citizen Opinion Mining in the City of the Future."
[36] B. Guthier, R. Alharthi, R. Abaalkhail, and A. El Saddik, "Detection and Visualization of Emotions in an Affect-Aware City," in Proceedings of the 1st International Workshop on Emerging Multimedia Applications and Services for Smart Cities, New York, NY, USA, 2014, pp. 23–28.
[37] D. Osimo and F. Mureddu, "Research challenge on opinion mining and sentiment analysis," Univ. Paris-Sud Lab. LIMSI-CNRS Bâtim., vol. 508, 2012.
[38] A. Kongthon, C. Haruechaiyasak, C. Sangkeettrakarn, P. Palingoon, and W. Wunnasri, "HotelOpinion: An opinion mining system on hotel reviews in Thailand," in 2011 Proceedings of PICMET '11: Technology Management in the Energy Smart World (PICMET), 2011, pp. 1–6.
[39] H. Wang, D. Can, A. Kazemzadeh, F. Bar, and S. Narayanan, "A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle," in Proceedings of the ACL 2012 System Demonstrations, Stroudsburg, PA, USA, 2012, pp. 115–120.
[40] J. Bollen, H. Mao, and X. Zeng, "Twitter mood predicts the stock market," J. Comput. Sci., vol. 2, no. 1, pp. 1–8, 2011.
[41] X. Zhang, H. Fuehres, and P. A. Gloor, "Predicting Stock Market Indicators Through Twitter 'I hope it is not as bad as I fear,'" Procedia Soc. Behav. Sci., vol. 26, pp. 55–62, 2011.
[42] P. Hofmarcher, S. Theußl, and K. Hornik, "Do Media Sentiments Reflect Economic Indices?," Chin. Bus. Rev., vol. 10, no. 7, 2011.
[43] A. Ortigosa, J. M. Martín, and R. M. Carro, "Sentiment analysis in Facebook and its application to e-learning," Comput. Hum. Behav., vol. 31, pp. 527–541, Feb. 2014.
[44] D. Bogdanova, P. Rosso, and T. Solorio, "Exploring high-level features for detecting cyberpedophilia," Comput. Speech Lang., vol. 28, no. 1, pp. 108–120, Jan. 2014.
[45] M. Cheong and V. C. S. Lee, "A microblogging-based approach to terrorism informatics: Exploration and chronicling civilian sentiment and response to terrorism events via Twitter," Inf. Syst. Front., vol. 13, no. 1, pp. 45–59, Sep. 2010.
[46] S. B. Davis and M. Saunders, "How social media can help improve and redesign transport systems," The Guardian, 2014. [Online]. Available: http://www.theguardian.com/sustainable-business/social-mediaredesign-transport-systems-cities. [Accessed: 24-Jan-2015].
[47] F. Antonelli, M. Azzi, M. Balduini, P. Ciuccarelli, E. D. Valle, and R. Larcher, "City Sensing: Visualising Mobile and Social Data About a City Scale Event," in Proceedings of the 2014 International Working Conference on Advanced Visual Interfaces, New York, NY, USA, 2014, pp. 337–338.
[48] A. Vakali, D. Chatzakou, V. Koutsonikola, and G. Andreadis, "Social Data Sentiment Analysis in Smart Environments - Extending Dual Polarities for Crowd Pulse Capturing," presented at the 2nd International Conference on Data Management Technologies and Applications, 2013, pp. 175–182.
[49] A. D. I. Kramer, "An Unobtrusive Behavioral Model of 'Gross National Happiness,'" in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New York, NY, USA, 2010, pp. 287–290.
[50] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[51] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
[52] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005, vol. 1, pp. 886–893.
[53] D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, Nov. 2004.
[54] D. Borth, R. Ji, T. Chen, T. Breuel, and S.-F. Chang, "Large-scale visual sentiment ontology and detectors using adjective noun pairs," in Proceedings of the 21st ACM International Conference on Multimedia, 2013, pp. 223–232.
[55] L. Torresani, M. Szummer, and A. Fitzgibbon, "Efficient object category recognition using classemes," in Computer Vision–ECCV 2010, Springer, 2010, pp. 776–789.
[56] L.-J. Li, H. Su, L. Fei-Fei, and E. P. Xing, "Object bank: A high-level image representation for scene classification & semantic feature sparsification," in Advances in Neural Information Processing Systems, 2010, pp. 1378–1386.
[57] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, "Self-taught learning: Transfer learning from unlabeled data," in Proceedings of the 24th International Conference on Machine Learning, 2007, pp. 759–766.
[58] G. Mesnil, Y. Dauphin, X. Glorot, S. Rifai, Y. Bengio, I. J. Goodfellow, E. Lavoie, X. Muller, G. Desjardins, D. Warde-Farley, et al., "Unsupervised and Transfer Learning Challenge: A Deep Learning Approach," ICML Unsupervised and Transfer Learning, vol. 27, pp. 97–110, 2012.
[59] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, "Return of the devil in the details: Delving deep into convolutional nets," arXiv preprint arXiv:1405.3531, 2014.
[60] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009, pp. 248–255.
[61] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, "DeCAF: A deep convolutional activation feature for generic visual recognition," arXiv preprint arXiv:1310.1531, 2013.
[62] A. Vedaldi and K. Lenc, "MatConvNet: Convolutional neural networks for MATLAB," in Proceedings of the 23rd Annual ACM Conference on Multimedia, 2015, pp. 689–692.
[63] L. Yu and H. Liu, "Feature selection for high-dimensional data: A fast correlation-based filter solution."
[64] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA Data Mining Software: An Update," SIGKDD Explor. Newsl., vol. 11, no. 1, pp. 10–18, Nov. 2009.
[65] G. Mone, "The new smart cities," Commun. ACM, vol. 58, no. 7, pp. 20–21, Jun. 2015.