XV International Scientific Conference on Industrial Systems (IS'11)

Novi Sad, Serbia, September 14–16, 2011
University of Novi Sad, Faculty of Technical Sciences, Department of Industrial Engineering and Management
Available online at http://www.iim.ftn.uns.ac.rs/conferences/is11/

A Framework for Obtaining Publicly Available Geo-referenced Video Meta-data

Milan Mirkovic, Faculty of Technical Sciences, Trg Dositeja Obradovica 6, Novi Sad, Serbia, [email protected]

Dubravko Culibrk, Faculty of Technical Sciences, Trg Dositeja Obradovica 6, Novi Sad, Serbia, [email protected]

Andras Anderla, Faculty of Technical Sciences, Trg Dositeja Obradovica 6, Novi Sad, Serbia, [email protected]

Darko Stefanovic, Faculty of Technical Sciences, Trg Dositeja Obradovica 6, Novi Sad, Serbia, [email protected]

Stevan Milisavljevic, Faculty of Technical Sciences, Trg Dositeja Obradovica 6, Novi Sad, Serbia, [email protected]

Abstract

Meta-data was initially meant to be used as a way to index, or additionally describe, the primary data it was attached to (in other words, the content itself). However, as content became more versatile and complex, and meta-data creation became more automated, meta-data emerged as a valuable source of information and knowledge in its own right, provided enough of it is collected and put under scrutiny. This paper presents a framework for obtaining meta-data that describes publicly available geo-referenced videos accessible through the YouTube service, as well as a custom-developed software tool for automatic retrieval of that meta-data.

Key words: Framework, Meta-data, Public, Video

1. INTRODUCTION

Meta-data is used to describe the content it is attached to in more detail, so that advanced search and indexing can be performed efficiently. It is created in either an automated or a semi-automated fashion; purely manual creation of meta-data is certainly possible, but for modern applications it is usually too time-consuming and hence automated to some degree. In the former case, meta-data is attached to content at creation time, effectively embedding itself in the data, if the format allows it. In the latter case, the user adds missing (or potentially useful) data manually at publish time. A good example of automatic meta-data creation is taking a photo: when the user presses the shoot button, not only is the content stored in terms of the pixels that make up the photo, but also some additional data that describes it, such as the time and date of creation, the make and model of the device used, the image resolution and color depth, and possibly even the geographical location (latitude and longitude coordinates). Sometimes devices do not automatically store an additional description of the content they produce, or that description is too scarce for specific purposes and requires expansion. That is when users have to enter the missing information themselves, either manually or by using appropriate tools for auto-creation.
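As a concrete illustration of such automatically created meta-data, the following minimal sketch reads the embedded EXIF tags of a photo. It assumes the Pillow imaging library, and the file name is a placeholder.

```python
# A minimal sketch of reading automatically embedded photo meta-data (EXIF).
# Assumes the Pillow library; "photo.jpg" is a hypothetical file.
from PIL import Image
from PIL.ExifTags import TAGS

image = Image.open("photo.jpg")
exif = image._getexif() or {}          # raw EXIF dictionary, keyed by numeric tag IDs

for tag_id, value in exif.items():
    name = TAGS.get(tag_id, tag_id)    # map numeric tag IDs to readable names
    if name in ("DateTime", "Make", "Model", "GPSInfo"):
        print(f"{name}: {value}")      # creation time, device, embedded GPS block
```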

Uploading a video clip to YouTube is a nice example of the latter case (semi-automatic meta-data creation): once the video has been uploaded, users are required to enter a name for it and pick the category that best describes the content, and they can enter a short description of it. They can even pick the place where the video was taken from an accompanying map, or enter precise geographic coordinates in the appropriate fields if they wish. Whatever the method of meta-data creation, once the meta-data is there, it can be treated just like regular data: it can be indexed, searched and analyzed. While the benefits of searching for content using meta-data are obvious (faster and more meaningful results, extended search capabilities depending on the amount of meta-data available), some issues exist, mainly due to the human factor and ambiguity, that might cancel out all the upsides of this approach: incomplete, missing, ambiguous or false meta-data can lead to unexpected or unwanted results.

The rest of the paper is organized as follows: Section 2 provides brief information about the YouTube service and the type of meta-data it stores along with video content, Section 3 describes some limitations and problems that arise during meta-data collection, Section 4 presents a framework for overcoming those limitations in order to collect as much meta-data for publicly available geo-referenced videos as possible, Section 5 presents an automated tool based on the proposed framework, and Section 6 concludes the paper with some remarks about possibilities for improvement and future research.

2. YOUTUBE, VIDEOS AND META-DATA

YouTube is a video-sharing web service that enables anyone to upload their video content for the whole world to see. Or, as the company itself states: "YouTube provides a forum for people to connect, inform, and inspire others across the globe and acts as a distribution platform for original content creators and advertisers large and small." [1] It was founded in early 2005 by three ex-PayPal employees: Chad Hurley, Steve Chen, and Jawed Karim. Nowadays, YouTube has hundreds of millions of users from around the world, 24 hours of content uploaded each minute and, as of May 2010, more than two billion views per day. All of this makes it the largest and most heavily used service of its kind in the world. Such a variety of users and publicly available content provides fertile ground for researchers to conduct various analyses.

2.1 Videos

For users to be able to upload their video content to YouTube, the first thing they need to do is open an account, also known as a profile. Aside from the mandatory information one needs to provide in order to create an account (a unique e-mail address and user name, location, date of birth and gender), there are several other categories that allow users to enter additional information about themselves. Once they have created an account, users can log in and start uploading video content. The process is straightforward: upon clicking the Upload link, a window is brought up offering users the choice to upload a file already existing on their device (PDA, mobile phone, computer) or to capture a video stream from their web camera. The first option is by far the more often used. YouTube supports a wide variety of formats it can accept and convert to be displayed via Adobe Flash enabled devices (MPEG-4, MPEG, WMV and even 3GP), as well as container formats such as .avi, .mkv, .mov, .mp4, .flv, etc. Only a few restrictions apply when it comes to the content users can upload (purely technically speaking, copyright issues aside), such as that the file size cannot exceed 2 GB and that its duration cannot be more than 15 minutes; but even these restrictions are prone to change, due to technological advancements (most notably those affecting storage and bandwidth costs).

2.2 Meta-data

Once the content has been uploaded, users have to choose a category that best describes it (there are 15 categories currently available, ranging from Autos & Vehicles, Comedy, Education and Entertainment, over Gaming, Music and Travel, to Sports and Science), provide a title for the video, assign at least one tag to it (some tags are automatically generated, and users may or may not accept them) and choose a privacy setting: Public, so that everyone can search for and see it; Unlisted, so that only people with a direct link can view the video; or Private, so that only certain YouTube users can access the content. Additional information can, but does not have to, be provided. Among that additional information, users can choose to enter the date when and the location where the video was recorded; they can add annotations, captions and subtitles to the video, swap the soundtrack (or add one if none exists) and enable or disable various options that affect privacy settings and community interaction (such as the ability to comment on the video, rate it, respond to it with another video, etc.). Once everything has been set up, depending on the privacy settings and the level of detail provided, the video will be returned as a search result when a query is initiated.

3. YOUTUBE SEARCH, API AND LIMITATIONS

YouTube provides a convenient web interface for performing basic and advanced searches for videos that users have uploaded. In fact, the search process is so simplified that all that is required of users is to type a word or a phrase they wish to search for into the provided field, and a result set is returned. That set can then be refined by imposing certain restrictions on it, in the form of showing only chosen categories, or only videos uploaded by specific users or at a specific time, or by altering the search criteria, adding new words to the search phrase or sorting the results obtained by a different index. Even though the service will inform the user about the approximate number of results returned for given search criteria, no more than 1,000 results can be browsed at a time; i.e. if a search returned approximately 31,000 results, only the first 1,000 will be visible to the user. This is a limitation imposed by the service itself [2], and while the explanation for it is scarce, it is probably in place as a measure to save resources that would otherwise be wasted by returning huge result sets that no one would browse extensively anyway.

For more fine-grained searches to be performed (ones that allow multiple criteria to be imposed by the user at run-time, some of which might not be available through the web interface), and in order to automate meta-data retrieval, one has to resort to the YouTube Data API (Application Programming Interface). There are many free libraries that abstract the Data API into language-specific object models, and most of them are open-source and free for modification under one of the public licences. This allows programmers to develop software that can be highly customized to suit their particular needs. One particular upside of using the Data API to perform geo-referenced meta-data searches is that it enables programmers to specify a circular area to perform the search in, using location (latitude/longitude coordinates) and radius parameters, therefore targeting only the area that is the focus of their interest. This was the key feature for building a tool (basically a YouTube crawler, inspired by [3]) for obtaining publicly available geo-referenced meta-data that is described later in this paper, all the more so when another key limitation is borne in mind: YouTube does not allow pure spatial queries (i.e. passing only the location and radius parameters and retrieving all videos tagged as recorded in that area, ignoring search keywords).
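To make the Data API discussion concrete, here is a minimal sketch of a single geo-restricted query, written against the now-retired GData API v2. The parameter names reflect its documentation of the time; the keyword and coordinates are hypothetical.

```python
# A minimal sketch of a geo-restricted search against the (now retired) YouTube
# Data API v2. The location / location-radius parameters, the 50-results-per-page
# limit and the 1,000-result browsing cap reflect the API documentation of the
# time; the keyword and coordinates here are hypothetical.
import json
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "q": "concert",                  # a keyword is mandatory: no pure spatial queries
    "location": "45.2671,19.8335!",  # lat,lng; trailing "!" keeps only geo-tagged videos
    "location-radius": "10km",       # circular search area around the point
    "max-results": 50,               # page size limit
    "start-index": 1,                # paging; indices past 1,000 are rejected
    "alt": "json",
    "v": 2,
})
url = "http://gdata.youtube.com/feeds/api/videos?" + params
with urllib.request.urlopen(url) as resp:
    feed = json.load(resp)["feed"]
for entry in feed.get("entry", []):
    print(entry["title"]["$t"])      # GData JSON wraps text values in "$t"
```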

4. FRAMEWORK FOR OBTAINING META-DATA

Given the limitations described in Section 3, the framework proposed in this section presents some novel ways of overcoming them, in order to obtain a large set of videos as a search result. Since YouTube videos can only be accessed through search results (in the context of this framework, where only geo-referenced videos are considered, videos that appear as items in various automatically generated lists – most viewed, most recent, etc. – are not relevant), an effective mechanism for their retrieval is proposed.

When a search for YouTube videos is performed, the results returned are correlated with the phrases (or words) used as the search criteria and the sorting order applied (by time created, relevance, view count, etc.). This means that if an area is searched using English words as criteria, only videos containing those words in the title, keywords or elsewhere in the meta-data will be included in the result set. This poses a serious problem when performing searches in areas with a dominant language the user is not familiar with, since there is a good probability that there are (many) videos titled or tagged in that local language, which sometimes relies on a completely different alphabet (e.g. Chinese or Japanese as opposed to English or German). This paper proposes the use of word frequency lists as search criteria in order to obtain many results in the returned set while disregarding knowledge of the dominant language of the area in focus.

Word frequency lists are simply two-column tables, where the first column contains a set of words, and the other their rank according to their frequency in a language. That is to say, they contain words sorted from the most commonly used to the least commonly used. This descending order is vital when performing searches, since a video is more likely to contain some of the top-ranked words in its meta-data than some of the less commonly used ones (e.g. "a", "the", "is" and "of" are more likely to appear than "volcano", "doorstep" and "windmill"). Some of these lists are readily available, and are commonly the result of research conducted by linguists. The problem, though, is that such a list is not always available for the language one wishes to perform the search in. For example, English frequency lists can be found online, and they are constructed by counting words from thousands (and millions) of sources, including dictionaries, novels, poetry books, etc. Many other languages might not have that many digital sources available (i.e. books written in those languages have not been digitized), or similar research has simply not been conducted, which makes construction of their frequency lists a tedious task. To overcome this problem to a large degree, we propose the use of RSS (Really Simple Syndication) feeds as a source of words that can be used to construct a word frequency list for any language (the method is presented in Section 5, and a sketch of the list-building step is given below). Depending on the number of sources and their quality, these lists will be more or less accurate and hence yield cruder or finer results when used as search criteria. The logic behind this approach is that if an area has some mass-media or other (e.g. blog, forum) RSS feeds available, it is very likely that it will also have some videos containing meta-data in that language available through the YouTube service. In the long run, if enough textual material is analyzed, a frequency list is constructed that correlates well with the real proportions of word usage in the language (e.g. prepositions and nouns used in everyday communication are bound to be positioned close to the top of the list, and there is a good probability that videos will contain some of them in their description or tags). Once the lists are constructed (or obtained otherwise), an area can be repeatedly queried for geo-referenced videos using words from those lists as search criteria. In this approach, since English is nowadays considered to be the "global" language, an English frequency list is always used in the search, to maximize the number of results obtained. The approach is illustrated in Figure 1.

Figure 1 – Obtaining meta-data for a large number of YouTube videos
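As a concrete illustration of the list-building step, here is a minimal sketch assuming the third-party feedparser library; the feed URLs are placeholders.

```python
# A sketch of the proposed RSS-based construction of a word frequency list,
# assuming the feedparser library; the feed URLs below are hypothetical.
import re
from collections import Counter

import feedparser

FEEDS = ["http://example.com/news/rss", "http://example.org/blog/feed"]

counts = Counter()
for url in FEEDS:
    feed = feedparser.parse(url)
    for entry in feed.entries:
        text = f"{entry.get('title', '')} {entry.get('summary', '')}"
        # split into individual words; \w+ also keeps non-ASCII letters,
        # so local alphabets (e.g. Cyrillic) are counted as well
        counts.update(w.lower() for w in re.findall(r"\w+", text, re.UNICODE))

# the list sorted from most to least common, as used for search criteria
frequency_list = counts.most_common()
print(frequency_list[:10])
```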

5. MYTUBE – AN AUTOMATED TOOL EXAMPLE

A tool that relies on the framework described in the previous section, dubbed MyTube, has been developed with the goal of automating YouTube searches and obtaining as much video meta-data as possible, while providing flexibility in terms of the search criteria used and the geographic area covered.


It uses the Zend framework [4] for performing YouTube searches, and relies on a MySQL database for storing word frequency lists and retrieved meta-data. It also provides a web-based user interface for entering search criteria, and contains an interactive Google Map for easier tracking of the searches performed (Figure 2).

While the tool is still a prototype, it enables users to construct word frequency lists for different languages by scanning designated RSS feeds for entries, which are then split into individual words that are in turn stored in a table. If a word already exists in the table, its count (i.e. frequency) is increased. The available frequency lists (those already in the database) can then be used to query YouTube for videos in an area.

Figure 2 – Searches performed using MyTube, shown on an interactive Google Map

The search for videos in an area is performed in the following way:

1. The user either manually enters latitude and longitude coordinates in the provided form fields, or uses a draggable marker on the map to populate them automatically.

2. They then select the desired frequency list from a drop-down list, and enter the number of top words to include as search criteria. The number of results per search criterion can also be entered (if left out, it defaults to 100 results per criterion).

3. Lastly, they enter the desired search radius, and select a sorting order through a provided drop-down list.

When all of the above parameters are defined and submitted, the search for meta-data begins. Data retrieval and storage work as follows (a sketch of the crawl loop is given after this list):

- Only videos containing geo-information (i.e. latitude and longitude coordinates) are considered.

- The search is restricted to a circular area around a specified point, defined by latitude, longitude and radius parameters (in kilometres).

- Using the designated word frequency list and the given number-of-top-words and results-per-word parameters, a query using each word as a search criterion is performed, returning a designated number of results.

- For each video in the returned result set, meta-data is stored in the database; if a video already exists in the database (each video has a unique ID that can be checked against), it is skipped.
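The crawl loop itself might look roughly as follows. The same assumptions apply as in the Section 3 sketch (the retired GData API v2 and its documented location parameters); the word list and coordinates are hypothetical, and a plain dictionary stands in for the MySQL table keyed by video ID.

```python
# A sketch of the crawl loop described above: one geo-restricted query per
# frequency-list word, de-duplicated by the unique video ID.
import json
import urllib.parse
import urllib.request

def search(word, lat, lng, radius_km, max_results=50):
    """Issue one geo-restricted GData v2 query for a single search word."""
    params = urllib.parse.urlencode({
        "q": word,
        "location": f"{lat},{lng}!",           # "!" restricts to geo-tagged videos
        "location-radius": f"{radius_km}km",
        "max-results": max_results,
        "alt": "json",
        "v": 2,
    })
    url = "http://gdata.youtube.com/feeds/api/videos?" + params
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["feed"].get("entry", [])

top_words = ["a", "the", "is", "of", "in"]      # stand-in for the top-N list entries
seen = {}                                       # video ID -> meta-data (stand-in for the DB)
for word in top_words:
    for entry in search(word, 45.2671, 19.8335, 10):
        video_id = entry["media$group"]["yt$videoid"]["$t"]
        if video_id in seen:                    # unique-ID check: skip known videos
            continue
        seen[video_id] = {"title": entry["title"]["$t"], "matched_word": word}
print(len(seen), "unique geo-referenced videos collected")
```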

After the search has been performed, some basic information and statistics about it (the location and radius parameters, the number of videos that were retrieved, the number that had already appeared in the database, etc.) are stored in a separate table, and can later be visualized to keep track of the areas that have been crawled for meta-data. As for the meta-data itself, the following is extracted for every video:

- Video ID (a unique identifier assigned by YouTube)

- Video title

- Category (only one can be selected by users at publish time)

- View count (the number of times the video had been viewed at the time of the crawl)

- Duration (video duration, in seconds)

- Publish date and time (when the video was made available online)

- Tags (a list of keywords describing the content)

- Coordinates (latitude and longitude)

- Author (username of the uploader)

- Thumbnail information (every video has 4 thumbnails extracted from it, accessible via separate URLs; the exact time in the video each thumbnail was extracted from is also stored)

- Obtained date and time (when the meta-data was retrieved)

- Language retrieved (the word frequency list that was used to perform the search that the video was first returned as a result of)

- Coordinates as an object (in addition to the textual representation of the latitude and longitude coordinates, a separate field storing them as objects in MySQL's internal geo-spatial format is populated as well, to allow for spatial database queries; see the sketch after this list)
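The dual storage of coordinates could be sketched as follows. The table and column names are hypothetical, and the mysql-connector-python package is assumed; the era-appropriate GeomFromText/MBRContains function names are used (newer MySQL servers use the ST_-prefixed variants).

```python
# A sketch of storing coordinates both as plain numbers and as MySQL geometry
# objects, so that spatial queries can be run directly in the database.
import mysql.connector

db = mysql.connector.connect(user="mytube", password="...", database="mytube")
cur = db.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS videos (
        video_id  VARCHAR(16) PRIMARY KEY,
        latitude  DOUBLE,
        longitude DOUBLE,
        location  POINT            -- MySQL's internal geo-spatial representation
    )
""")

# store the textual coordinates and the geometry object side by side;
# INSERT IGNORE implements the "skip already-seen IDs" rule from above
cur.execute(
    "INSERT IGNORE INTO videos (video_id, latitude, longitude, location) "
    "VALUES (%s, %s, %s, GeomFromText(%s))",
    ("abc123", 45.2671, 19.8335, "POINT(45.2671 19.8335)"),
)
db.commit()

# spatial query: all videos whose point falls inside a bounding rectangle
cur.execute(
    "SELECT video_id FROM videos WHERE MBRContains("
    "GeomFromText('POLYGON((45.1 19.7, 45.4 19.7, 45.4 20.0, 45.1 20.0, 45.1 19.7))'),"
    " location)"
)
print(cur.fetchall())
```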

5.1 Simple data visualization

In addition to being capable of searching for and storing video meta-data, the tool described in this section also provides some simple visualization functionality. Since all of the stored data is geo-referenced, one of the first questions that imposes itself concerns its spatial distribution. Heat maps are particularly suited to this task, since they display continuous data using gradients: lower (density) values are presented in one color, and higher values are displayed using another, with many shades in between (e.g. red for low density, yellow for medium density and white for high density).

In order to construct a heat map for a region, MyTube enables users to draw a polygon on an interactive Google Map (Figure 3), the boundaries of which are used as criteria for retrieving stored data from the database (only records within the selected area are returned). The result set contains the coordinates of each record, which are then used to construct a heat map, as sketched below.

Figure 3 – Selecting an area for simple visualization
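The heat-map construction itself can be sketched as follows, assuming numpy and matplotlib; the coordinates are hypothetical stand-ins for the records returned for the selected polygon, and the colormap choice is purely illustrative.

```python
# A minimal sketch of the heat-map step: the coordinates are binned into a
# 2-D histogram and rendered to a transparent PNG. Cells with no videos are
# masked so they stay transparent, which makes the image usable as an overlay.
import numpy as np
import matplotlib.pyplot as plt

# hypothetical (lat, lng) pairs retrieved from the database for the selected area
points = np.array([(45.26, 19.83), (45.27, 19.84), (45.25, 19.83), (45.26, 19.83)])

heat, lat_edges, lng_edges = np.histogram2d(points[:, 0], points[:, 1], bins=100)
heat = np.ma.masked_equal(heat, 0)             # empty cells -> fully transparent

fig, ax = plt.subplots()
ax.imshow(heat, origin="lower", cmap="hot",    # dark red = low, white = high density
          extent=[lng_edges[0], lng_edges[-1], lat_edges[0], lat_edges[-1]])
ax.set_axis_off()
fig.savefig("heatmap.png", transparent=True, bbox_inches="tight", pad_inches=0)
```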

Depending on the zoom level used when defining the boundary polygon, fine- or coarse-grained heat maps can be constructed (e.g. one can decide to construct a heat map for a whole continent, or just for a city block). Once constructed, heat maps (basically transparent .PNG images) can be used as overlays in other software tools to get more information about the area they span, and perhaps to extract some useful information based on the distribution of the data (e.g. to identify attractive locations [5],[6]). Google Earth is one of the most popular geo-browsers nowadays, and it is a good choice for visual exploration tasks, since it allows easy creation of custom layers and their merging with existing ones. Figure 4 shows a heat map for the area selected previously (see Figure 3) that has been imported into Google Earth as a new layer; an example of such an export is sketched below. Green and blue colors represent areas with few videos recorded in them, while red and white areas represent a high concentration of videos tagged as recorded in those locations. Zooming in on areas with a high concentration of videos, with the labels layer turned on in Google Earth, can give a good clue as to what it is the users were recording. Also, to further investigate the phenomena underlying the observed patterns and to detect additional (e.g. temporal) patterns, the data could be split into different time intervals, each of which would be imported as a separate layer and then compared to the others in an effort to detect differences. Another interesting analysis would be to look at the data distribution with regard to the categories the videos were assigned to; it would then be possible to identify places where events of a certain sort took place (e.g. to see where concerts were held, if the Music category was used to construct the heat map).

Figure 4 – Sample heat map as an overlay in Google Earth
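Exporting such an image as a Google Earth layer amounts to wrapping it in a small KML GroundOverlay file. A sketch with hypothetical bounds:

```python
# A sketch of exporting the PNG as a Google Earth layer: a minimal KML
# GroundOverlay that stretches heatmap.png over the bounding box of the data.
# The KML elements are standard; the bounds below are hypothetical.
KML = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <GroundOverlay>
    <name>MyTube heat map</name>
    <Icon><href>heatmap.png</href></Icon>
    <LatLonBox>  <!-- bounding box of the selected polygon -->
      <north>45.30</north><south>45.20</south>
      <east>19.90</east><west>19.80</west>
    </LatLonBox>
  </GroundOverlay>
</kml>
"""
with open("heatmap_overlay.kml", "w", encoding="utf-8") as f:
    f.write(KML)
```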

6. CONCLUSION

Technological advances, especially in the fields of telecommunications, data storage, networking and integrated mobile devices, have led to tremendous amounts of content of various types being created every day. This content includes textual documents, audio and video recordings, and images. It is estimated that the amount of data in the world's databases doubles every 20 months [7]. While accessibility and scarcity of data were a problem before, the ubiquity of computers has raised a completely different issue: it is becoming increasingly difficult to make sense of, and discover patterns in, the vast abundance of data that just keeps increasing in volume. One way to simplify navigation through all that data and make search results more meaningful is through the use of meta-data. Meta-data is simply another piece of information added to the content itself that somehow extends it (additionally describes it) and/or indexes it in one way or another, improving the search process in terms of speed and the relevance of the results returned. However, the meta-data itself can be subjected to various analyses if it is collected for a large enough number of entries. This is especially true where data-mining algorithms and visualization techniques are concerned, both of which require a rather large data set to perform well and yield useful results. Therefore, in order to apply these (and many other) techniques of data analysis, it is important to have a large data set at hand. Obtaining such a set can be a tedious and time-consuming process that is best automated to some degree (or completely, if possible). The level of automation and the flexibility (the ability to adapt to new requirements) of the process are in constant conflict, and it is usually rewarding (in terms of time spent and results achieved) to find a good balance between the two.


This paper presented a framework for obtaining publicly available geo-referenced video meta-data that is used to additionally describe videos accessible via the YouTube service. Additionally, a tool for automatic retrieval and storage of that data was presented, and some problems that arose during its development, along with ways to overcome them, were discussed. The tool enables users to systematically search desired geographic areas for video meta-data regardless of the language it might be written in (hence increasing the number of items returned). It also provides means for some basic visualization of the obtained data through the creation of heat maps and their export to the Google Earth geo-browser. Its main potential, however, lies in the fact that it was developed using publicly available technologies that make it easily extendable and adaptable to new requirements and/or types of meta-data. Meta-data obtained using the tool described in Section 5, which relies on the proposed framework, can be used for a variety of analyses, ranging from pure visual exploration to data-mining. The results of those analyses can uncover new patterns (e.g. movement patterns [8], popular locations [9],[10], seasonal fluctuations [11]) and knowledge [12] (e.g. people tend to record short videos using their mobile phones, the majority of clips were uploaded in the afternoon, etc.) that can be leveraged by different interested parties (statisticians, urban planners, marketing specialists and so on) to achieve various goals [13].

7. REFERENCES

[1] YouTube, "About YouTube," http://www.YouTube.com/t/aboutyoutube, accessed December 2010.
[2] Google, "YouTube APIs and Tools: FAQ," http://code.google.com/apis/YouTube/faq.html#over_1000, accessed 11 April 2011.
[3] C. Shah, "TubeKit: A Query-based YouTube Crawling Toolkit," 2009.
[4] R. Allen, N. Lo, and S. Brown, Zend Framework in Action, Manning, 2009.
[5] M. Mirković, D. Ćulibrk, S. Milisavljević, and V. Crnojević, "Detecting attractive locations using publicly available user-generated video content: central Serbia case study," TELFOR, Belgrade, 2010.
[6] F. Girardin, F. Calabrese, F. Dal Fiore, C. Ratti, and J. Blat, "Digital footprinting: Uncovering tourists with user-generated content," IEEE Pervasive Computing, vol. 7, 2008, pp. 36–43.
[7] I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2005.
[8] F. Girardin, F. Dal Fiore, C. Ratti, and J. Blat, "Leveraging explicitly disclosed location information to understand tourist dynamics: a case study," Journal of Location Based Services, vol. 2, 2008, pp. 41–56.
[9] S. Kisilevich, F. Mansmann, and D. Keim, "P-DBSCAN: A density based clustering algorithm for exploration and analysis of attractive areas using collections of geo-tagged photos," Proceedings of the 1st International Conference and Exhibition on Computing for Geospatial Research & Application, 2010, p. 38.
[10] S. Kisilevich, F. Mansmann, P. Bak, D. Keim, and A. Tchaikin, "Where Would You Go on Your Next Vacation? A Framework for Visual Exploration of Attractive Places," 2010 Second International Conference on Advanced Geographic Information Systems, Applications, and Services, 2010, pp. 21–26.
[11] F. Girardin, F. Dal Fiore, J. Blat, and C. Ratti, "Understanding of tourist dynamics from explicitly disclosed location information," 4th International Symposium on LBS and Telecartography, Hong Kong, China, 2007.
[12] U.M. Fayyad, "Data mining and knowledge discovery: Making sense out of data," IEEE Expert, vol. 11, 1996, pp. 20–25.
[13] D. Fisher, "Hotmap: Looking at geographic attention," IEEE Transactions on Visualization and Computer Graphics, vol. 13, 2007, pp. 1184–1191.