Using Web Data to Enhance Traffic Situation Awareness - IEEE Xplore

2 downloads 200 Views 555KB Size Report
paper, we use natural language processing and data mining technologies to extract ... consult the Web about their usual commute, they are more likely to do so if ...
2014 IEEE 17th International Conference on Intelligent Transportation Systems (ITSC) October 8-11, 2014. Qingdao, China

Using Web Data to Enhance Traffic Situation Awareness Xiao Wang, Ke Zeng, Xue-Liang Zhao and Fei-Yue Wang, Fellow, IEEE 

Abstract—With the ubiquity of mobile communication devices, people experiencing traffic jams share real-time information and interact with each other on social media sites, which provide new channels to monitor, estimate and manage traffic flows. In this paper, we use natural language processing and data mining technologies to extract traffic jam related information from Tianya.cn, analyze the content of people’s talk to discover the “talking point” of people when facing traffic jams, and to provide data support for relevant authorities to make successful and effective decisions for real-time traffic jam response and management.

I. INTRODUCTION Every year, traffic jams bring about considerable economic loss, severe casualties, aggravate pollution and increased travel time [1]. Governments spend a great amount of money trying to monitor, control and guide traffic jams, but it seems really difficult to monitor the traffic flow on every street timely. Besides, as traffic jams are dynamic, interrelated and unpredictable, they can propagate from one street to other streets. Although we have an impressive amount of hardware and software tools for traffic management and traveler’s decision making, the complex role of human behavior and information propagation in the transportation system demands considerations that might not be captured with sensors [2].

Situation awareness is defined as “the perception of elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future” [6]. For the last few years, the development of Human Flesh Search (HFS) [7], LBS and Crowdsoursing mechanisms [8] enable the traffic situation awareness possible in a given environment with crowds in the situation. The growing use of social media during traffic jams offer new information source from which the relevant authorities can enhance traffic situation awareness. People in the traffic jam can report on the ground information about what they are going through. The crowdsourcing information and the collective intelligence help relevant authorities to better understand “the big picture” during traffic jam situations, and thus provide assistance to guide, manage and control traffic flow. Here we focus on the content analysis about traffic jams based on large scale social media data, try to figure out the concerns of people when they are getting stuck in a jam. The rest of this paper is organized as follows: Section 2 presents some latest researches about web-based transportation analysis and the concept of social transportation; Section 3 focuses on web content extraction and introduces our data source platform; Section 4 analyzes the content of traffic jam related discussions and particularly cluster the discussions into themes in according to the co-occurrence network and discusses the statistical results as well as explains how these results can be used to help traffic management agencies to monitor, guide and control traffic flows timely; and a final conclusion is given on section 5. We’d hope that such results can help traffic management agencies to assess the severity of the jams, and make appropriate decisions to guide the traffic timely.

For the last ten years, with the development of internet technologies and mobile communication devices, location based services (LBS) [3] and geospatial information systems (GIS) [4] are playing more and more common roles in people’s life. The Internet becomes an open source for all kinds of information, and can be important to explain and help to predict many traffic-related phenomena. Existed research [5] pointed out that while people don’t often tweet, post or consult the Web about their usual commute, they are more likely to do so if something abnormal happens, which turn social media into a crowdsourcing virtual sensor network and allowing maps of messages to be plotted. This generates numerous user updates from which we can find useful *Reseach supported by the National Nature Science Foundation of China under grant 71232006. Xiao Wang is with Qingdao Academy of Intelligent Industries, Qingdao, Shandong, 266109, China; she is also with the State Key Lab of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190 China (email: [email protected] ). Ke Zeng is with the school of electronic and information engineering, Xi’an Jiaotong University, Xi’an, 710000 China(e-mail: [email protected]). Xue-Liang Zhao is with the State Key Lab of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190 China (email: [email protected] ). Fei-Yue Wang is with Qingdao Academy of Intelligent Industries, Qingdao, Shandong, 266109, China; he is also with the State Key Lab of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190 China (e-mail: [email protected] ). 978-1-4799-6078-1/14/$31.00 ©2014 IEEE

information related to real-world events-including emergencies such as natural disasters, medical and traffic accidents. Compared with sensor-based systems, social media-based monitors and data collect systems are much faster and lower cost.

II. SOCIAL TRANSPORTATION Recently, traffic analysis and forecasting using data from mobile phones and social media are getting more and more attention. With the development of LBS services, HFS and crowdsourcing mechanisms, social media has turn to a virtual sensor network full of human-agents with high mobility, fast response rate and great flexibility. Paper [9] indicates that social media contextual information can be used in intelligent transportation systems (ITS) to help to understand how traffic will evolve. Researches [10, 11] state that traffic jams on roads in physical space also cause “opinion exploration” in cyberspace, as users on social media platforms can also be 195

travelers and passengers who got stuck in traffics, and they can form negative sentiment tendency and map to our physical world. Besides, Staurt .et al [5] stresses on the real-time feedback function of social media, and suggests using social media data to do post-event impact assessments. Feiyue Wang [12, 13] puts forward the concept of “social transportation”, which stresses on the importance of real-time computing and embedded applications in transportation systems with online and interactive big data. The picture of social transportation is as figure 1 shows.

III. INFORMATION EXTRACTION Generally two resources are considered when we’re looking for information on the web: specific websites/ social media platforms, or generic Web search Engines such as Google, Baidu or Bing that can retrieve any document in the Web [9]. Here we focus on a well-known social media platform—Tianya.cn, which is one of the largest forums in China, possesses more than 91 million registered users, has 200 million visits per month and about 1 million average concurrent users (ACU), and provides abundant data for research. As user-generated-contents become the norm, forum users not only work as content generators, but also spreaders and consumers, which enable tianya.cn the characteristics of rapid information updating speed, fast information propagation, real-time feedback to social events, etc. The huge user groups as well as the vast amount of information make online forums very sensitive to current events, especially emergencies. When users publishing messages on Tianya.cn, they either publish a blog or publish a post. The content of a blog consists of a title, an entry and several posts, while the entry is detailed explanation of the title. To begin, we define some terminology. We refer to each message (include title, entry and post) as an item, and define chains of messages in the same blog as discussion thread, while themes are human defined logical assignments to messages and may be consisted of several threads that share the common “talking points”. IV. GLOBAL ANALYSIS FOR WEB TRAFFIC JAM CONTENT

Looking at the picture of social transportation, it is obvious that social transportation systems and technologies must be context aware. The Internet will be a central data resource pool about traffic events, transportation network as well as driver experience. From our perspective, the opportunities and open scientific challenges are immense and generally still at an early stage.

We analyze activities on Tianya.cn for the period January 1 to December 31, 2012. The dataset follows behavior of millions of users and tracks a total of billions of messages. Here we try to extract content about traffic jams from Tianya.cn, thus we set up several keywords as inputs for an available API, and manually filter the extracted information.

Indeed, with the ubiquity of location based services, mobile applications and geographic information systems, social media has become a crowdsourcing virtual sensor network, while nodes in this network could be people all over the world. People publish and share information about their mood, photos, the food they eat, the place they are in, the events happened nearby, all the time, and make it possible for the first time to study human behavior and social phenomenon in a global view. Thus, we have new and convenient accesses to real time data, and are capable of analyzing and forecasting traffics with real time data.

A. Temporal Variation Afterwards, we identify 193 discussion threads about traffic jams. Figure.2 shows the temporal distribution of items and discussion threads from January to December in 2012. From Fig.2 we can see, more than half of all discussion threads and more than 85% of all items about traffic jams emerge during September and October, which forms real-time mapping to physical world traffic congestion happened before, during and right after the National holiday in China. As the Chinese government implemented a policy which allowed vehicles with seven or less seats enjoy the travel toll free service on highways during the Golden Week (from Oct. 1 to Oct. 7), domestic travels were over-stimulated and thousands of traffic congestions happened. In according to existed research [11], Chinese citizens took 189 million trips during this holiday; formed 68,422 traffic accidents which caused 794 people lost their lives.

As many researchers emphasize the importance of social media content to advanced traffic analysis, few of them pay attention to the “talking point” of users. What do people think and care when they get stuck in traffic jams? How do people feel about and deal with the congestion situation? Are they angry about the situation or even the public services in China? If we can figure out what people care and concern in traffic jams, it will help to monitor the public opinions and could be possible to provide effective decision support to comfort public mood and avoid negative sentiment spread.

Interestingly, we thought there would be a lot of congestion related topics during February, as the “spring rush” in China is the largest annual movement of people in the world. However, people didn’t comment about the “spring rush” a lot, which we 196

thought people might have taken it for common phenomenon, and no longer surprised to it.

to do the word segmentation. Afterward, we present the segmented words into a word cloud to stress the focus of users’ talk, while the sizes of words illustrate their term frequency (TF). As there are too many words and lots of them just show up once, here to emphasize words with high term frequency, we delete a word from the cloud if TF(word)  1 , and get a word cloud as figure 3 shows. Surprisingly, most of the talks are not about personal emotion expression but the traffic jam itself, such as the severity of congestion, the urban traffic problem and how to solve the problem. Therefore, it can be inferred that most people are quite reasonable when facing traffic jams.

We further study the feature of vocabularies used in the discussions, and construct an undirected graph in accordance with the co-occurrence of words in one item. Specifically, each item represents a data point in N-dimensional space, where N is the size of the entire word vocabulary, such as item a  (word1 , word 2 ,..., word n ) . As the whole dataset is too large, and there may be plenty of noise data in users’ posts, here we choose to present the co-occurrence of words appeared in the first item (the title of a blog) in a discussion thread, and construct a co-occurrence network as figure 4 shows. In fig.4, each node represents a word, while an edge between each pair of nodes means those two words appear in the same item. Thus, words in one item form a complete graph[17].

B. Content Analysis

Many studies stressed the importance of online opinion analysis when coming to emergencies. Some recently researchers also pointed out that social media content can be important to explain and even help analyze sentiment tendency and predict many transport-related phenomena [14,15], as a sudden traffic congestion in an area may be due to special events, street fairs, road works or harsh weather. Here we focus on the content of discussion threads and employ the Chinese Lexical Analysis System 3 launched by the Chinese Academy of Sciences, Beijing, China, in 2011 [16] 197

C. Clustering for Theme Discovery It is obvious that the structure of the co-occurrence network of words in fig.4 shows clustering trend. Here we are interested in figuring out what do people think and care when facing traffic jams, in other words, we want to discover the “talking points” of users, so we cluster the discussion threads into themes. As we’ve segmented the items into words, now we construct a weighted adjacent matrix[18] ( A for short) for words in fig. 4, and each row vector of the matrix represent the co-occurrence situation of one word. Assume there are N words in all, A i,j represents the co-occurrence times of words

charge in rush hours, which also caused great controversy, as lots of users hold the viewpoint that charge policy is not capable of alleviating traffic congestion. Besides, many users focus on the performance of Chinese government when facing great traffic jams, and even proceed further discussions about public traffic policies, the reflection of our society and the application of high technology to avoid traffic jams.

i and j , which is also equals to the number of edges between nodes i and j in fig. 4. Then we can cluster the words into groups in accordance with the similarity [19] of their adjacent vectors.

Using information retrieval (IR) and Information Extraction (IE) technologies, with the help of crowdsourcing mechanisms, we can capture and extract contextual information from the Internet for advanced traffic management as well as for novel ITS application. By analyzing the co-occurrence situation of words, it becomes much easier to analyze the relation of all discussions, and provides new perspective to detect the talking points of people.

Here we use the k-means [20] algorithm to cluster the words, and get 11 themes as listed in fig.6 shows. Each node represents a word, and the color indicates the theme that the word belongs to. Edges state the co-occurrence of pairs of words and the strength of edges show their frequency of co-occurrence.

V. CONCLUSION In this paper, we use social media data to study and analyze the traffic situation awareness of online users through word segmentation and themes clustering, which provide on-the-ground information from the public, enhance the traffic situation awareness across geographical boundaries and offer abundant data support for relevant authorities. In the future, we’d like to conduct more systematic analysis and experiments on large scale datasets to do traffic analytics and prediction. REFERENCES [1]

[2]

[3]

[4]

[5]

It is clear from figure 6 that, most discussions focus on the manage and control for pollution reduction and environment protection caused by traffic jams, parking issues, urban traffic and traffic jams on high way during the National Holidays. Interestingly, the first two issues are also important and annoying problems along with the whole Golden Week, which stress the importance of traffic management and control during the National holidays. Besides, in order to use economic measures to guide and control urban transport travel demand, the government claimed to impose traffic congestion

[6]

[7] [8]

198

Fenghua Zhu, Ding Wen, and Songhang Chen, “Computational Traffic Experiments Based on Artificial Transportation Systems: An Application of ACP Approach”, IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 14, NO. 1, pp. 189-197, March 2012. Hardjono, B., Fac. of Comput. Sci., Univ. Indonesia, Depok, Indonesia, Nurhadiyatna, A., Mursanto, P., Jatmiko W, “Development of traffic sensor system with virtual detection zone”, 2012 International Conference on Advanced Computer Science and Information Systems, Depok, 2012, pp. 1-2. M. Vanderschuren and D. de Vries, “Advanced Public Transportation Information Provision: What are the effects on improved customer satisfaction?”, Proceedings of the 16th International IEEE Annual Conference on Intelligent Transportation Systems (ITSC 2013), Hague, The Netherlands, 2013, pp. 6-9. Wei-Hsun Lee, Shian-Shyong Tseng, Jin-Lih Shieh, and Hsiao-Han Chen, “Discovering Traffic Bottlenecks in an Urban Network by Spatiotemporal Data Mining on Location-Based Services”, IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 12, NO. 4, pp. 1047-1056, December 2011. Stuart E. Middleton, Lee Middleton, and Stefano Modafferi, “Real-Time Crisis Mapping of Natural Disasters Using Social Media”, IEEE Intelligent Systems, VOL.14, NO.2, pp. 9-17, March 2014. Jie Yin, Andrew Lampert, Mark Cameron, Bella Robinson, and Robert Power, “Using Social Media to Enhance Emergency Situation Awareness”, IEEE INTELLIGENT SYSTEMS, VOL.12, NO.4, pp. 52-59, November 2012. Qingpeng Zhang, Fei-Yue Wang, Daniel Zeng, Tao Wang, “Understanding Crowd-Powered Search Groups: A Social Network Perspective”, PLoS ONE, VOL.7, NO.6, pp. 1-16, November 2012. Di Wu, Yuan Zhang, Lichun Bao, and Amelia C. Regan, “Location-Based Crowdsourcing for Vehicular Communication in Hybrid Networks”, IEEE TRANSACTIONS ON INTELLIGENT

[9] [10]

[11] [12]

[13]

[14]

[15]

[16] [17] [18] [19]

[20]

TRANSPORTATION SYSTEMS, VOL. 14, NO. 2, pp. 837-846, June 2013. Francisco C. Pereira, Ana L. C. Bazzan, Moshe Ben-Akiva, “The Role of Context in Transport Prediction”, Intelligent Transportation Systems, VOL. 14, NO.1, pp. 76-80, January 2014. Jianping Cao, Ke Zeng, Hui Wang, Jiajun Cheng, Fengcai Qiao, Ding Wen, and Yanqing Gao, “Web-Based Traffic Sentiment Analysis: Methods and Applications”, IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 15, NO. 2, pp. 844-853, April 2014. Ke Zeng, Wenli Liu, Xiao Wang, Songhang Chen, “Traffic Congestion and Social Media in China”, IEEE Intelligent Systems, No. 1, Vol. 28, pp. 2-7, January 2013. Fei-yue Wang, “Scanning the Issue and Beyond: ITS With Complete Traffic Control”, IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 15, NO. 2, pp. 457-462, April 2014. Fei-yue Wang, “Scanning the Issue and Beyond: Real-Time Social Transportation with Online Social Signals”, IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 15, NO. 3, pp. 909-914, June 2014. YIN Xi-hui, JIA Yuan-hua, NIU Zhong-hai, “Research on Traffic Jam Charging Policy for Vehicles on Highway in Chinese Megalopolis”, 2012 International Conference on Management Science & Engineering (19th), September 20-22, 2012 Dallas, USA, pp. 1911-1916. Zuchao Wang, Min Lu, Xiaoru Yuan, Junping Zhang, Huub van de Wetering, “Visual Traffic Jam Analysis Based on Trajectory Data”, ANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 19, NO. 12 pp. 2159-2168, DECEMBER 2013. “ICTCLAS,” 2011. [Online]. Available: http://ictclas.nlpir.org/ Al-Naymat, G., “GCG: Mining maximal complete graph patterns from large spatial data”, 2013 ACS International Conference on Computer Systems and Applications(AICCSA) ,Ifrane, 2013, pp. 1-8. Tetsuya Yoshida, “Regularized Modularity Eigenmap for Community Discovery with Side Information”, 2011 IEEE International Conference on Granular Computing, 2011, Kaohsiung, pp. 799-784. GuanHong Zhang, Odbal, “Sentence Alignment For Web Page Text Based On Vector Space Model”, 2012 International Conference on Computer Science and Information Processing (CSIP), 2012, Xi’an, pp. 167-170. Yarmohammadi H.A., Fard A.A., Khosravi H., “Clustering low quality Farsi sub-words for word recognition”, 2014 Iranian Conference on Intelligent Systems(ICIS), 2014, Bam, pp. 1-5.

199

Suggest Documents