Web Security
User Tracking & Identification During a Web Surfing Session
Ebere Ogechi [M00516112]
MSc Computer Networks, School of Engineering & Information Sciences
Middlesex University, Mauritius Branch Campus

Abstract: The Internet is an amenity that one can hardly do without in this present day and age. A tool that proffers so much convenience can in turn become the bane of society. Many internet users do not know that they can be identified and tracked while they surf the internet. This paper investigates several mechanisms that can be used to track and identify an individual during a web browsing session. These mechanisms include HTTP cookies, canvas fingerprinting, Flash Player local shared objects (LSOs), scripting and others. Past and current work related to this subject area is also discussed, culminating in a proposed set of methods that can be used to accomplish user tracking. Finally, a scenario is discussed that details and evaluates how the proposed method functions as a whole system.

what you have bought in the past and what you have on your wish list. With all this data, they can suggest products that you are most likely to be interested in, without the need for an exhaustive search. In turn, they incessantly spam one's e-mail inbox with hundreds of such products. On the other hand, forensic experts within law enforcement agencies have legitimate grounds to monitor and track suspicious user interactions and activity in an effort to preempt possible threats. This paper introduces mechanisms by which a browsing session can be uniquely identified and a user can be identified as the rightful owner of that session. First, some areas of interest that can be used in session monitoring and user identification are highlighted and discussed. The second section looks at similar projects in which researchers were able to track and identify users. The methodology is also briefly discussed.

Keywords—web surfing; fingerprinting; browsing; security; network; cookies; surfing; tracking; scripting.

I. INTRODUCTION
The Internet in this day and age can rightly be classified as a social amenity that we cannot do without in our day-to-day lives. It is difficult to go a single day without Internet connectivity, whether to check one's social timeline, updates and notifications or to carry out serious research work such as this. Most of us are online all day, every day: from the Internet connection at home to the one at work or school, to the cafeteria or restaurant (local hotspots), and then back to work or school and home again, we remain constantly connected to the Internet.

Afterwards, a method is proposed to address this research topic, with an explanation of how it tackles the issue. The method is then evaluated and discussed.

II. BACKGROUND & RELATED WORK
Advances have been made in this subject area, with tools that can be used to track and identify a user while browsing. These tools include cookies, spoofing, TCP mapping and some authentication protocols.

A large number of individuals with massive online footprints are still oblivious to the fact that their online activities can be tracked or monitored by third-party individuals or applications. While it may not be possible to track every activity a user engages in (like chewing gum or having a burger and some Kool-Aid), it is possible to track one's web surfing behavior and mannerisms to identify them as the person occupying the current browsing session.

There have also been countermeasures that are presumed to be effective against such unauthorized tracking. These include anti-tracking browser add-ons, private browsing (also known as incognito mode) and anonymous browsing applications such as Tor [1]. A session-based form of tracking, combined with browser activity and URL encoding, was used to monitor users who had several active sessions across tabs in their browsers by analyzing their click streams. The researchers demonstrated that if breaks in the form of a series of timeouts were applied to these users' browsing sessions, this would influence their metrics

Such web surfing behaviors can be tied to one's profile on popular e-commerce sites such as Amazon or eBay, which monitor and track a customer's activity in every session. They know what you have in your shopping cart,


exponentially. Their methodology involved monitoring the click streams of every member of a residential community across a stretch of time. This amounted to about 29.8 million page requests from 967 unique users accessing about 630,000 web servers, with about 110,000 referral hosts [2]. Users and user roles have also been tracked by mining web-based applications that function in tandem with databases. From the database's perspective, all requests appear to come from the same database user, over the same connection, via the web database application. The researchers developed a method that makes it possible to track web users in web databases [3].
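The timeout-based splitting of a click stream into sessions mentioned above can be sketched as follows. This is a minimal Python illustration that assumes a single user's click timestamps in seconds and a simple inactivity-timeout rule; the timestamps and timeout values are made up, and the study in [2] may define sessions differently. It shows how the number of sessions, and therefore any per-session metric, depends on the chosen timeout.

```python
from typing import List


def split_sessions(timestamps: List[float], timeout: float) -> List[List[float]]:
    """Group one user's click timestamps (in seconds) into sessions.

    A new session starts whenever the gap since the previous click
    exceeds `timeout` seconds.
    """
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)  # gap is short enough: same session
        else:
            sessions.append([t])    # inactivity gap too long: new session
    return sessions


# Hypothetical click times (seconds since the first request).
clicks = [0, 30, 200, 1000, 1030, 5000]
for timeout in (60, 600, 3600):
    print(timeout, len(split_sessions(clicks, timeout)))
# Prints "60 4", "600 3" and "3600 2": the same clicks yield a
# different number of sessions depending on the timeout.
```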

Figure 2: Code snippet showing tracking analytics [4]

Details such as a user's IP address, location (city and country, derived from the IP address), date and time of visit, and information about the browser and operating system can all be derived by a script. Such scripts can be triggered, much like logic bombs, when a user accesses a certain website, and can record every visitor to that site in a log file on the server.
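As an illustration of such server-side logging, the sketch below is a minimal Python WSGI application that appends one CSV row per request. The log columns and file path are assumptions; mapping the IP address to a city or country would require a separate GeoIP database and is not shown. Behind a proxy, REMOTE_ADDR holds the proxy's address rather than the visitor's.

```python
import csv
from datetime import datetime, timezone

# Hypothetical column layout for the visitor log file.
LOG_FIELDS = ["timestamp", "ip", "user_agent", "accept_language", "referer", "path"]


def make_logging_app(log_path: str):
    """Return a WSGI app that appends one CSV row per request to `log_path`."""

    def app(environ, start_response):
        record = {
            # Date and time of the visit (UTC).
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            # Client IP address as seen by the server.
            "ip": environ.get("REMOTE_ADDR", ""),
            # Browser and operating system details reported by the client.
            "user_agent": environ.get("HTTP_USER_AGENT", ""),
            "accept_language": environ.get("HTTP_ACCEPT_LANGUAGE", ""),
            "referer": environ.get("HTTP_REFERER", ""),
            "path": environ.get("PATH_INFO", "/"),
        }
        with open(log_path, "a", newline="", encoding="utf-8") as log_file:
            csv.DictWriter(log_file, fieldnames=LOG_FIELDS).writerow(record)
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"ok"]

    return app
```

For local experimentation, the returned app could be served with the standard library's wsgiref.simple_server.make_server; every page request would then add a row to the log.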

Figure 1: HTTP Cookie exchange between client & server [3]

This method can be applied to existing web applications without the need to overhaul current infrastructure. With tracked databases, logical user sessions can be identified and then used to mine true user roles.

The analytics snippet shown above is a small piece of HTML/JavaScript code that can be pasted into web pages. This code activates Google Analytics by inserting "ga.js" into the page. The "UA-XXXXX-X" on line 5 should be replaced with the tracker's web property ID. The snippet should be placed before the closing </head> tag [4].

There are many ways in which an actual user can be identified during a web browsing session. Some of these methods include the use of server logs, address spoofing, scripting languages, cookies, IP security, TCP mapping, redirected URLs and eavesdropping, amongst many others.

In certain cases, it might be necessary to disable tracking without removing the code snippet responsible for it. To do so, the following window property can be set: window['ga-disable-UA-XXXXXX-Y'] = true; where UA-XXXXXX-Y is the web property ID. This property must be set before the tracking code is called.

The aim at this point is to combine the areas mentioned above into a structure that addresses the overall challenge of the subject topic.

B. Cookies
HTTP cookies were introduced because HTTP on its own is a stateless protocol: it cannot retain values across multiple requests. As a result, remote web servers face the daunting task of keeping track of millions of client computers by uniquely identifying each of them and their requests. Cookies allow highly interactive websites and applications to remember actions and events from a user's previous interactions.

A. Scripting
Whether they know it or not, users who surf the web can indeed be tracked, for example by code written in scripting languages and frameworks such as JavaScript, PHP and ASP.NET.


canvas element found in HTML5. For instance, if an unsuspecting user visits a webpage where canvas fingerprinting is performed, the user's web browser draws hidden lines of text, and the result is converted into a digital image. Differences in the user's GPU and graphics driver, and in how they are installed and used, cause slight variations in the rendered image. This unique image can be saved and forwarded to promotional partners in order to identify users who visit associated websites. The user's browsing activity can thus be recorded, allowing those who want to showcase their goods and services to direct adverts at the user [10]. AddThis is one company that deployed this fingerprinting mechanism as an alternative to cookies [11].
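The server-side half of this process can be sketched in Python as below. The sketch assumes the tracking script has already drawn the hidden text and read the canvas back as a base64 PNG data URL (in a browser this is typically done with the canvas element's toDataURL() method) and sent that string to the server; the byte strings used here are placeholders, not real PNG images. The point is that the identifier is a hash of the rendered pixels, so identical rendering stacks map to the same ID while a one-byte rendering difference maps to a different one.

```python
import base64
import hashlib


def canvas_fingerprint(data_url: str) -> str:
    """Hash the image bytes carried in a data URL such as
    'data:image/png;base64,...' into a short identifier."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    image_bytes = base64.b64decode(payload)
    return hashlib.sha256(image_bytes).hexdigest()[:16]


# Two placeholder "renderings" that differ in a single byte, standing in for
# the small differences produced by different GPUs and graphics drivers.
render_a = bytes([137, 80, 78, 71, 1, 2, 3, 4])
render_b = bytes([137, 80, 78, 71, 1, 2, 3, 5])
url_a = "data:image/png;base64," + base64.b64encode(render_a).decode("ascii")
url_b = "data:image/png;base64," + base64.b64encode(render_b).decode("ascii")

print(canvas_fingerprint(url_a) == canvas_fingerprint(url_a))  # True: same rendering, same ID
print(canvas_fingerprint(url_a) == canvas_fingerprint(url_b))  # False: one byte differs
```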

Figure 3: HTTP Cookie exchange between client & server [5]

An HTTP (web) cookie is a small piece of plain-text data containing unique values, which a web browser saves on the user's computing device. The remote web server being accessed tells the user's browser to save this data (usually with an accompanying cookie policy). The cookie is set by the server in an HTTP response (through the Set-Cookie header) to a prior HTTP request, and the browser returns it on later requests to the same server (through the Cookie header). Cookies are not saved permanently, as each has a lifetime. Each cookie is tied to the web server (domain name) that set it, so different web servers see different cookies [6].
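The cookie exchange can be illustrated with a minimal Python sketch using the standard http.cookies module. The cookie name "uid" and the lifetimes are assumptions for illustration; the sketch shows the server building a Set-Cookie value carrying a random unique ID with a limited lifetime, and later recovering that ID from the Cookie header the browser sends back.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

COOKIE_NAME = "uid"  # hypothetical cookie name


def issue_cookie(max_age_seconds: int = 3600) -> str:
    """Server side: build a Set-Cookie header value carrying a random unique ID."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = secrets.token_hex(8)        # 16 hex characters
    cookie[COOKIE_NAME]["max-age"] = max_age_seconds  # lifetime in seconds
    cookie[COOKIE_NAME]["path"] = "/"                 # sent for every path on the domain
    cookie[COOKIE_NAME]["httponly"] = True            # not readable by page scripts
    return cookie[COOKIE_NAME].OutputString()


def read_cookie(cookie_header: str) -> Optional[str]:
    """Server side: recover the ID from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None


set_cookie_value = issue_cookie(max_age_seconds=600)
# uid=<random hex> followed by the HttpOnly, Max-Age=600 and Path=/ attributes
print("Set-Cookie:", set_cookie_value)

# The browser stores the cookie and later echoes only the name=value pair.
cookie_header = set_cookie_value.split(";", 1)[0]
print(read_cookie(cookie_header) == cookie_header.split("=", 1)[1])  # True
```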

III. PROPOSED METHOD
The methods proposed in this paper for tracking and identifying users during web browsing sessions form a system with three major categories. These categories include the following:

While HTTP cookies store website preferences that can be managed through the privacy settings of web browsers, Flash cookies are kept in an undisclosed directory that a user will not readily have access to or even be aware of [7].

C. Canvas Fingerprinting
This is a browser fingerprinting technique that makes use of the canvas application programming interface (API) found in modern web browsers such as Firefox and Chrome. A tracker takes advantage of the way text and WebGL scenes are rendered to retrieve consistent fingerprint data within a fraction of a second, without the user noticing [8].



• Explicitly assigned client-side identifiers
  o Flash plugin LSOs
  o Persistent cookies (re-spawning HTTP cookies)
  o Browser fingerprinting
• User-dependent behaviors and preferences
  o Data key-logging
  o Specialized website for user identification

[Figure 5 diagram labels: SYSTEM; User Identification; PC Identification; Tracking; User Dependent Behaviors & Preferences; Explicitly Assigned Client-Side IDs; Persistent Cookies; Data Key-logging; Specialized Website for user identification; Canvas Fingerprinting; Flash Plugin LSOs; Browser Fingerprinting]

Figure 4: Canvas fingerprinting flow of operation [9]

Some websites recognize and track users through the canvas element of HTML5 instead of the familiar web cookies. This form of user tracking functions by taking advantage of the

Figure 5: Proposed System


A. Flash Plugin LSOs

C. Browser Fingerprinting

These are also known as Flash cookies and have a clear advantage over the regular cookies discussed earlier. They are found on web browsers that use Adobe's Flash Player plugin. In practice, this covers the everyday web browser, because the plugin is vital to the stability and functionality of certain websites and web applications such as YouTube, Spotify and even Facebook.

Unlike locally stored cookies (an active technique) that uniquely identify clients in order to retain their sessions, browser fingerprinting is accomplished by polling certain specific parameters that are accessible through the web browser. There are two major techniques involved in fingerprinting, namely active and passive collection of data [11]. The active method involves browser-history-stealing attacks and fingerprinting algorithms. In a history-stealing attack, the user is subtly redirected to an adversary's website, where JavaScript code is invoked and unique history records are stolen from the browser. This can also be accomplished by exploiting unpatched vulnerabilities or misused API functions. Browser fingerprinting identifies users using the UAS (User Agent String), which contains important details about the host system and the browser being used, together with other parameters such as HTTP request headers, screen resolution, installed fonts and browser plug-ins. Some of these parameters are shown in Figure 8.
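How such parameters can be combined into an identifier is sketched below in Python. The sketch assumes the attributes have already been collected (from HTTP headers and a script in the page); the attribute names and values are hypothetical, and real fingerprinting scripts use many more parameters. It also computes the surprisal, log2(N/k) bits, of a fingerprint shared by k out of N browsers, a common way to quantify how identifying a fingerprint is.

```python
import hashlib
import json
import math


def browser_fingerprint(attributes: dict) -> str:
    """Serialize the attributes deterministically and hash them to a short ID."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def surprisal_bits(matching: int, population: int) -> float:
    """Identifying information (in bits) of a fingerprint shared by
    `matching` out of `population` browsers: log2(population / matching)."""
    return math.log2(population / matching)


# Hypothetical attribute values of the kind listed in Figure 8.
visitor = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:36.0) Gecko/20100101 Firefox/36.0",
    "screen": "1920x1080x24",
    "timezone_offset_min": 240,
    "fonts": ["DejaVu Sans", "Liberation Serif", "Ubuntu"],
    "plugins": ["Shockwave Flash 16.0"],
}
same_visitor_later = dict(visitor)  # nothing changed between visits
new_font = dict(visitor, fonts=visitor["fonts"] + ["Comic Sans MS"])

print(browser_fingerprint(visitor) == browser_fingerprint(same_visitor_later))  # True
print(browser_fingerprint(visitor) == browser_fingerprint(new_font))            # False
print(surprisal_bits(1, 1024))  # 10.0: a fingerprint unique among 1024 browsers
```

No cookie is stored on the client here: the same browser produces the same ID on every visit simply because its attributes are unchanged.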

Figure 6: LSO Overview [10]

As seen in Figure 6, these cookies are effectively brought back to life some time after being deleted (for example, after a user clears the browser's cookies). This is accomplished by using a variety of storage vectors that are transparent to the user, which makes the cookies difficult to remove.

Figure 8: Details from browser fingerprinting [11]

Examples of fingerprinting applications include NetworkMiner, a passive (silent) TCP/IP and DHCP stack fingerprinting tool, and Satori, a passive TCP/IP, CDP, DHCP and other stack fingerprinting tool.
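The principle behind passive stack fingerprinting can be illustrated with one of the simplest signals, the IP time-to-live (TTL) field. Operating systems commonly start packets at different default TTLs (typically 64 for Linux and macOS and 128 for Windows), and each router hop decrements the value, so rounding an observed TTL up to the nearest common default gives a rough guess of the sender's OS family. The Python sketch below is a simplified heuristic for illustration only: tools such as NetworkMiner and Satori combine many more signals (for example TCP window size and options, or DHCP parameters), and the mapping used here is an assumption that fails whenever a system's defaults have been changed.

```python
# Assumed typical initial TTL values; real systems can be configured otherwise.
TYPICAL_OS_BY_INITIAL_TTL = {
    64: "Linux, macOS or another Unix-like system",
    128: "Windows",
    255: "network device or other system",
}


def guess_initial_ttl(observed_ttl: int) -> int:
    """Round an observed TTL up to the nearest common initial value."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("an observed IPv4 TTL must be between 1 and 255")
    return next(t for t in sorted(TYPICAL_OS_BY_INITIAL_TTL) if observed_ttl <= t)


def guess_os_family(observed_ttl: int) -> str:
    """Map an observed TTL to a rough operating-system family guess."""
    return TYPICAL_OS_BY_INITIAL_TTL[guess_initial_ttl(observed_ttl)]


print(guess_os_family(57))   # 64 minus 7 hops: Linux, macOS or another Unix-like system
print(guess_os_family(116))  # 128 minus 12 hops: Windows
```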

D. Keystroke Logging (Key-Logging)
Key-logging is the act of capturing key presses from a keyboard as a user types. The data is then logged to a file without the user's knowledge. Whether it is applied ethically or unethically, key-logging is especially revealing where web forms are being completed, because the captured data clearly shows what the user entered into the form. With this, a user's full name, address, e-mail address, passwords, phone number and other personal information can be collected.

Figure 7: LSO about to be saved on PC from a myspace.com webpage [12]

Storage of these objects on a user's PC can be prompted when Flash settings are applied for the first time on a website that requires the Flash plugin to function, i.e., a Flash website.

B. Persistent Cookies (Respawning HTTP cookies)

Key loggers can be hardware or software based, or keystrokes can be intercepted wirelessly. All of these platforms aim to capture user information as the user types. For the purposes of this paper, a typical lightweight software key logger can be delivered to the user's PC via malware.

For a unique ID to persist across the multiple storage locations where cookies are saved, these cookies need a way of "coming back to life" after being wiped from storage. Flash, re-spawning or persistent cookies have this edge over regular HTTP cookies because of this unique feature.
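The respawning behavior can be illustrated with a toy model. In the Python sketch below, each storage vector (an HTTP cookie jar, a Flash LSO store and, as an assumed third vector, HTML5 local storage) is represented by a plain dictionary rather than a real browser store, and the names are hypothetical. On every page load, the tracking code looks for a surviving copy of the ID in any vector and writes it back into all of them, which is why clearing only the HTTP cookies does not remove the identifier.

```python
import secrets
from typing import Dict, Optional

Store = Dict[str, str]


def sync_tracking_id(vectors: Dict[str, Store], key: str = "uid") -> str:
    """Toy respawning logic run on each page load.

    Each entry in `vectors` stands in for one storage location.
    """
    # Reuse any surviving copy of the ID; mint a new one only if none survived.
    surviving: Optional[str] = next(
        (store[key] for store in vectors.values() if key in store), None
    )
    uid = surviving if surviving is not None else secrets.token_hex(8)
    # "Respawn": write the ID back into every vector, including cleared ones.
    for store in vectors.values():
        store[key] = uid
    return uid


# Hypothetical storage vectors, modelled as dictionaries.
vectors = {"http_cookie": {}, "flash_lso": {}, "html5_local_storage": {}}
first_id = sync_tracking_id(vectors)

vectors["http_cookie"].clear()                     # user clears ordinary cookies only
print(sync_tracking_id(vectors) == first_id)       # True: ID recovered from another vector
print(vectors["http_cookie"]["uid"] == first_id)   # True: the HTTP cookie is respawned

for store in vectors.values():                     # clearing every vector breaks the link
    store.clear()
print(sync_tracking_id(vectors) == first_id)       # False: a fresh random ID is minted
```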


users are subtly redirected to his server, where malware is immediately downloaded onto the user's computer. This malware includes persistent cookies and key-logging scripts. Browser and canvas fingerprinting also take place. Certain details are then polled to his remote server, where he diligently mines them for useful information. The key-logging script is triggered once the user is on a webpage that has form objects such as text fields and list items. Captured data is uploaded to his remote server every 24 hours, provided that the user has an internet connection. The Flash cookies help him identify which computer is sending this data to his remote server.

Figure 9: Sample Log File From Key-logging [13]

The captured data is then pushed to a remote server, where it is analyzed for unique identifiers in order to extract valuable personal information (user identification) that the adversary can later act upon.

E. Specialized Website Tracking
Some websites display the details of any individual whose name is entered (the user's own or someone else's), such as name, location, profession, workplace, e-mail address, and social and professional networks, provided that the subject has a sizeable internet footprint.

[Figure 11 diagram labels: Adversary; Malware deployed; Web User; Key-log Capture; Browser Fingerprinting; Deep web search: Pipl.com, Facebook, LinkedIn]
Figure 11: Application Scenario

The previously mentioned website pipl.com is open to the public, as are many others such as Facebook and LinkedIn. Since access to these sites is not strictly regulated, Cain can identify and track the names pulled from the key-log capture file by running deep web searches on them. His searches can yield fine-grained results, as he can narrow them down by location, age, gender, occupation and other personal attributes.

Figure 10: Name search on pipl.com

These types of websites are normally relevant when some user details have already been acquired and further searching is needed to obtain other details. Names captured through key-logging, for instance, can be searched on these websites. An example of a company and website that delivers this service is pipl.com.

Cain can now identify and track a user during a web browsing session by combining all of these techniques, as set out in the method proposed in this paper.

V. EVALUATION & DISCUSSIONS

As seen in Figure 10, a search was made for "James Maddison". The search returns a wide variety of results, which can then be sorted by age, location, gender, etc., to narrow down the search and help identify a particular individual based on the information at hand.

The proposed model of identifying and tracking actual users during a web session can only function effectively if certain conditions are met. These conditions are, in effect, the limitations of the model, and are discussed below:

IV. APPLICATION OF PROPOSED SYSTEM
To evaluate this system, a scenario is employed. An adversary, Cain, is in the business of stealing personal information about users of a particular online service. He uses several redirection attacks in which




The model requires the user to have internet connectivity before it can begin to function.



The proposed model uses link redirection to deploy the malware to the target computer. The link redirection occurs on previously visited illegitimate web pages, so when the user clicks on a link, he/she is redirected to the adversary's remote server, where the malware can then be deployed. A careful user who takes adequate precautions online will not be susceptible to the link-redirection attack.

The model also requires the malware to be present on the user's system. If the user runs antivirus software with up-to-date virus definitions, the malware can be detected and removed.

VI. CONCLUSION & FUTURE WORK
This paper discussed a system for identifying computers and other computing devices, together with methods for identifying and tracking users during a web session. It is striking how oblivious individuals are to the fact that their online activities can indeed be identified and monitored. Listed below are some ways users can try to protect themselves from such intrusions of privacy:
• Web browser history: After every browsing session, users are encouraged to wipe their browsing history and flush all web and browser cookies.
• JavaScript-based search engines: Users should strive to avoid JavaScript-based search engines, because these have the potential to extract information without the user's knowledge. Better alternatives are the start pages of the Firefox and Google Chrome browsers.
• Suspicious activity: Users are strongly encouraged not to respond to e-mails from unknown senders, or to click on links in such e-mails. These e-mails often carry phishing attempts that can capture a user's personal information.
• The Onion Router (Tor): This application helps users maintain their anonymity and privacy online. Tor prevents attackers from learning an individual's location and browsing habits, and helps defend against traffic analysis. It is free, open-source and cross-platform. Tor Browser also notifies users of attempted canvas read events and offers the option of returning blank image data, which prevents canvas fingerprinting [14] [15].
As for future work, the tracking and monitoring aspects of the proposed system and model will be integrated into a single application that can be executed by an adversary.

VII. REFERENCES
[1] R. Lightner, "Five smart ways to keep your browsing private," CNET, 2012. [Online]. Available: http://www.cnet.com/how-to/five-smart-ways-to-keep-your-browsing-private.
[2] M. Meiss, J. Duncan, B. Goncalves, J. Ramasco and F. Menczer, "What's in a Session: Tracking Individual Behavior on the Web," School of Informatics, Indiana University, Bloomington, IN, USA, 2009.
[3] Y. Gonen, "Users Tracking and Roles Mining in Web-Based Applications," Ben-Gurion University of the Negev, Be'er Sheva, Israel, 2011. [Online].
[4] "Tracking Basics (Asynchronous Syntax)," Google Developers, 2015. [Online]. Available: https://developers.google.com/analytics/devguides/collection/gajs/.
[5] C. Shiflett, "The Truth about Sessions," Shiflett.org, 2011. [Online]. Available: http://shiflett.org/articles/the-truth-about-sessions.
[6] B. Pope, "HTTP Cookies: Providing State in Web Applications," University of Melbourne, 2008.
[7] T. Vega, "Code That Tracks Users," The New York Times, 2015.
[8] Wikipedia, "Canvas fingerprinting," 2015. [Online]. Available: http://en.wikipedia.org/wiki/Canvas_fingerprinting#cite_note-WebNeverForgets-5. [Accessed 08 03 2015].
[9] G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan and C. Diaz, "The Web Never Forgets: Persistent Tracking Mechanisms in the Wild," KU Leuven, ESAT/COSIC and iMinds, Leuven, Belgium, 2013.
[10] "Delete Flash Cookies to Stop Web Sites from Secretly Tracking You," Howtogeek.com, 2015. [Online]. Available: http://www.howtogeek.com/howto/7302/delete-flash-cookies-to-improve-your-privacy-online/. [Accessed 10 03 2015].
[11] K. Boda, Á. Földes, G. Gulyás and S. Imre, "User Tracking on the Web via Cross-Browser Fingerprinting," Department of Telecommunications, Budapest University of Technology and Economics, Magyar tudósok krt. 2., H-1117 Budapest, Hungary, 2012.
[12] "The 'Fingerprinting' Tracking Tool That's Virtually Impossible to Block," Mashable, 2014. [Online]. Available: http://mashable.com/2014/07/21/impossible-block-tracking-tool/.
[13] T. Olzak, "Keystroke Logging," Erudio Security, LLC, 208. [Online].
[14] "Tor Project: Anonymity Online," The Tor Project: Torproject.org, 2015. [Online]. Available: https://www.torproject.org/. [Accessed 21 03 2015].
[15] "Tor (anonymity network)," Wikipedia: Wikipedia.org, 2015. [Online]. Available: http://en.wikipedia.org/wiki/Tor_(anonymity_network). [Accessed 21 03 2015].
