Paper No. T.4-2.2, pp. 1-3
The 6th PSU-UNS International Conference on Engineering and Technology (ICET-2013), Novi Sad, Serbia, May 15-17, 2013 University of Novi Sad, Faculty of Technical Sciences
BYPASSING INTERNET ACCESS CONTROL WITH PROXY WEBSITE

Igor Zecevic*, Petar Bjeljac
University of Novi Sad, Faculty of Technical Sciences, Novi Sad, Serbia
* [email protected]
Abstract: Internet filters are a type of software designed and optimized to allow control of access to internet content. A large number of countries, including those with a long history of freedom of speech, have actively been involved in controlling access to internet content. However, the use of filtering software may lead to statewide internet censorship. The aim of this paper is to present an approach for bypassing internet filters. The approach is based on the use of well-known structural design patterns employed in the implementation of object-based software solutions.

Key Words: Bypassing internet control / Internet filters / Content control software / Design patterns / Proxy pattern
1. INTRODUCTION

Internet access control is most often implemented either in the network infrastructure (using a proxy server) or with different types of software on PCs or servers. Internet filters (content control software, content filtering software, web filtering software) [1] are a type of software designed and optimized to allow control of access to internet content. This type of software allows restrictions to be imposed at different levels: country, region, institution (library, school, university, internet cafe, etc.) or single PC. A large number of countries, including those with a long history of freedom of speech, have actively been involved in controlling access to internet content, often with wide public support [2]. Child pornography, hate speech and content encouraging copyright violation are just some of the reasons which make internet access control legitimate. However, the use of filtering software may lead to internet censorship, which is most often established at the national level [3]. Restricting access to online news, preventing discussion between people or blocking comments about events such as elections, protests or riots are some examples of censorship [4], [5]. Some internet filters even block content which should, according to their producers' specifications, be acceptable. That is how denial
of access to content about medicine, religion [6], sexual minorities [7] or even higher education [8] may occur. The aim of this paper is to present an approach which allows internet filters to be bypassed, gaining full access to all internet content. The approach is based on the use of well-known structural design patterns [9] employed in the implementation of object-based software solutions. The described approach was implemented in the form of a public website (proxy website) with integrated mechanisms for user request analysis. The paper also presents usage results for the implemented website over a one-year period.

2. HOW DO INTERNET FILTERS WORK?

Despite the fact that the use of internet filters is widespread and accepted among parents, schools, libraries and work organisations, few studies have attempted to empirically test the performance of this type of software. Hunter [10] analysed and compared the best-known internet filter software packages in the context of the (in)validity of their decisions to block access to internet content. Greenfield et al. [11] classified the filtering techniques used in the 24 software packages included in their study into source-based and content-based internet filters. Source-based filters rely on predefined "white" and "black" lists of allowed and prohibited website URLs. The main disadvantage of source-based filters is the staleness of these lists, caused by the high frequency of changes on the internet, which leads to insufficient content filtering (under-block errors). Content-based filters, on the other hand, analyse each requested page by comparing its content to a predefined list of prohibited key words, phrases and profiles. This filtering mechanism may cause content-based filters to block content which should not be blocked in the specific context (over-block errors).
Commercial content filtering software packages which received the highest grades in comparative analyses [12] use a combination of these techniques in order to gain flexibility (a lower degree of over-block errors) and ensure better performance (a lower degree of under-block errors).
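The distinction between the two filtering families can be sketched in a few lines of JavaScript; the blacklist, keyword list and function names below are illustrative assumptions, not taken from any of the packages studied:

```javascript
// Illustrative sketch of source-based vs. content-based filtering.
// The blacklist entries and keywords are hypothetical examples.
const blacklist = new Set(["blocked.example.com", "banned.example.org"]);
const badKeywords = ["forbidden-term", "banned-phrase"];

// Source-based check: is the requested host on a predefined "black" list?
function sourceBasedBlock(url) {
  return blacklist.has(new URL(url).hostname);
}

// Content-based check: does the fetched page contain a prohibited keyword?
function contentBasedBlock(pageText) {
  const text = pageText.toLowerCase();
  return badKeywords.some((kw) => text.includes(kw));
}

// Combined filter, as in the commercial packages: block if either check fires.
function isBlocked(url, pageText) {
  return sourceBasedBlock(url) || contentBasedBlock(pageText);
}
```

A combined filter of this kind trades the two error types against each other: the blacklist catches known sources without inspecting content, while the keyword check catches new pages the list has not yet recorded.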
Regardless of the type of filtering software (browser-based, client-oriented, network-based or browser-integrated filters), at most four algorithms are used to decide whether content is to be blocked:
a) content URL checking – a list of prohibited web pages (a blacklist) is checked for each requested page. If the requested address is found in the list, the display of the page is prevented. This filtering mechanism depends on permanent updating of the list of blocked URLs, which is a very demanding task. Internet filters either contain an internal blacklist mechanism or use commercial services which perform the list updates [13];
b) requested term checking – all terms sent from the web browser in a search request are analysed and compared to a list of inappropriate terms. If such terms are found, the display of the content is blocked;
c) content category and key-word checking – every website carries a categorisation which determines its status in web browsers. If an inappropriate category is found, the display of the content is blocked;
d) content checking – before the content is displayed, each page is parsed and analysed. In the case of inappropriate content, the display of the page is blocked.
Category, key-word and content checking of requested pages also requires a list of terms considered inappropriate. Even though this list changes slowly, the need to provide translations into a large number of languages presents a significant problem. This is why, for instance, many internet filters are incapable of blocking websites with Cyrillic text, regardless of their content.

3. PROXY WEBSITE

3.1. Proxy design pattern

The proxy design pattern [14], [15] is a well-known mediator pattern between a client request and the desired resource. The pattern is used in object-oriented software construction to enable control of access to certain objects. The need for access control may arise for many different reasons: memory management during the instantiation and initialization of demanding objects, different object access rights, the preservation of sophisticated object access and referencing methods, etc. The UML class diagram [16] presenting a conceptual view of the proxy design pattern is displayed in Fig. 1.

Fig. 1. UML class diagram for the Proxy design pattern

In the environment implemented according to the presented pattern, the Client object does not communicate directly with objects of the RealSubject class; it communicates only with the Proxy object. Client-Proxy communication is carried out in the same way as communication between the Client and the RealSubject. This is possible because both classes (Proxy and RealSubject) implement a set of publicly available services declared in the Subject interface. The Proxy object controls access to the RealSubject and may be responsible for the lifecycle of its instance (initialization, maintenance and deletion), but it may also perform pre- and post-processing of the results of operations delegated to objects of this class. This characteristic of the proxy pattern is exactly what allows its use for bypassing internet filters.

3.2. Proxy design pattern in the internet filter context

An illustration of the use of the proxy design pattern for bypassing internet filters is presented in Fig. 2. In order to access content which might be inaccessible due to internet filters, the user does not access the desired resource directly. Instead, the user first communicates with the proxy website. Because this website contains no inappropriate terms in its content, categorisation or key words, and its URL is initially not found in any blacklist, internet filters do not flag its content as inappropriate. Using the proxy website, loaded in the web browser on the computer where the internet filter runs, the user inputs the desired terms or the direct address of the desired content. Before the HTTP request, which may now contain prohibited terms or addresses, is sent, browser-side methods encode [17] the addresses and desired terms. The web browser then sends the request, containing the encoded parameter values, to the proxy website. As the analysed HTTP parameter values cannot be interpreted as inappropriate, internet filters do not block the request. The proxy website decodes the received request, thus recovering the original request. The desired content is then loaded without obstruction at the proxy website in the form of source code.

Fig. 2. Proxy design pattern in the internet filter context

Before the response is sent to the client, the loaded content is encoded. The encoded content sent from the proxy website to the client cannot be interpreted as inappropriate: at that moment the content is merely an array of characters with no semantics whatsoever, for the user as well as for internet filters. Shortly after this content is loaded into the web browser, the integrated methods decode it and display the web page in its original form. Because the original content of the page is not displayed until after the page has
been loaded, internet filters are not able to detect its original content and therefore do not block the page. The described mechanism remains successful up to the point when the proxy website URL itself is added to the blacklist used by the internet filter.

The described proxy pattern was implemented using standard web technologies. The proxy website, which performs request decoding and content encoding, was implemented in PHP [18]. The web browser script, which performs request encoding and content decoding, was implemented in JavaScript [19]. Base64 [20] was used as the encoding/decoding algorithm.

4. PROXY WEBSITE USAGE STATISTICS

Internet Control Byppaser is a website implemented by the authors of this paper using the described methodology. The analysis of the website's user structure was conducted using the Google Analytics service [21], which provides insight into the number, frequency, uniqueness, geographical location and technical characteristics of visitors. The type of accessed content was tracked using internal analysis of the requested pages. Internet Control Byppaser was publicly available between mid-February 2012 and mid-February 2013. During that period, 22,002 visits by 14,057 unique users from 141 different countries were registered. These visitors used the proxy website to open a total of 678,543 third-party web pages; in addition, 43,869 internet content searches were made. Table 1 displays the distribution of visitors by country of origin. Table 2 shows the distribution of accessed content by category.

Table 1. Visitors by country (%)
United States           18.02
Pakistan                15.93
India                   11.79
Indonesia                5.98
Thailand                 5.66
United Arab Emirates     4.37
Saudi Arabia             4.25
Malaysia                 3.01
United Kingdom           3.01
Iran                     2.73
Other                   25.25
Table 2. Content by category (%)
Adult content       86.15
Social networks      9.12
Other                4.73
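The round trip described in Sections 3.1 and 3.2 can be sketched as follows. The class and function names mirror Fig. 1 and the described mechanism, but the sketch itself is an illustrative assumption: it uses Node's Buffer for Base64 (a browser script would use btoa/atob, and the PHP side base64_encode/base64_decode), and fetchPage is a hypothetical stand-in for the actual page-loading code.

```javascript
// Browser side: encode a string to Base64 so that filters inspecting
// HTTP parameters see only opaque text with no semantics.
function toBase64(s) {
  return Buffer.from(s, "utf8").toString("base64");
}

// Inverse transformation, used by the proxy website on the request
// and by the browser script on the response.
function fromBase64(s) {
  return Buffer.from(s, "base64").toString("utf8");
}

// The proxy website stands in for the real resource (the RealSubject of
// Fig. 1): the client only ever talks to it, never to the target site.
class ProxyWebsite {
  constructor(fetchPage) {
    this.fetchPage = fetchPage; // hypothetical function: url -> page source
  }
  handle(encodedUrl) {
    const url = fromBase64(encodedUrl); // recover the original, possibly blacklisted URL
    const page = this.fetchPage(url);   // load the desired content unobstructed
    return toBase64(page);              // ship it back as an opaque character array
  }
}
```

In this sketch the browser would call toBase64 on the user's request, send the result to ProxyWebsite.handle, and finally call fromBase64 on the response before injecting the markup into the page, at which point the filter has already let the traffic through.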
5. REFERENCES

[1] C. Chou, A. P. Sinha, H. Zhao: "Commercial Internet filters: perils and opportunities", Decision Support Systems, Vol. 48, No. 4, pp. 521–530, 2010.
[2] J. Zittrain and J. Palfrey: "Access Denied: The Practice and Policy of Global Internet Filtering", MIT Press (Cambridge), 2008.
[3] J. Zittrain: "Internet filtering in China", IEEE Internet Computing, Vol. 7, No. 2, pp. 70-77, 2003.
[4] H. Noman and J. C. York: "West Censoring East: The Use of Western Technologies by Middle East Censors, 2010–2011", OpenNet Initiative, 2011.
[5] Reporters Without Borders: "List of the 13 Internet enemies", 2006.
[6] "Notice!", http://trifold.tripod.com/NOTICE.html, Retrieved 2013-01-25.
[7] "The Mind of a Censor", http://www.spectacle.org/cs/burt.html, Retrieved 2013-01-25.
[8] "Web Censors Prompt College To Consider Name Change", http://slashdot.org/story/00/03/01/2230240, Retrieved 2013-01-25.
[9] R. O. Duda, P. E. Hart, D. G. Stork: "Pattern Classification", Wiley-Interscience, 2000.
[10] C. D. Hunter: "Social impacts: Internet filter effectiveness – testing over- and under-inclusive blocking decisions of four popular web filters", Social Science Computer Review, Vol. 18, No. 2, pp. 214–222, 2000.
[11] P. Greenfield, P. Rickwood, H. C. Tran: "Effectiveness of Internet filtering software products", CSIRO Mathematical and Information Sciences, 2001.
[12] Small Business Content Filter Review 2013, Compare Best Small Business Content Filtering Software, http://small-business-content-filterreview.toptenreviews.com, Retrieved 2013-04-04.
[13] URLBlacklist.com, http://urlblacklist.com/, Retrieved 2013-04-04.
[14] E. Gamma, R. Helm, R. Johnson, J. Vlissides: "Design Patterns: Elements of Reusable Object-Oriented Software", Addison-Wesley Professional, 1995.
[15] C. S. Horstmann: "Object-Oriented Design and Patterns", John Wiley & Sons Inc, 2005.
[16] Object Management Group: Unified Modeling Language: Superstructure (convenience document), Version 2.1.2, OMG document, 2007.
[17] G. Kuenning, E. Miller: "Anonymization Techniques for URLs and Filenames", Technical Report UCSC-CRL-03-05, University of California, Santa Cruz, Sep. 2003.
[18] PHP: Hypertext Preprocessor, http://php.net/, Retrieved 2013-01-29.
[19] JavaScript Tutorial, http://www.w3schools.com/js/, Retrieved 2013-01-29.
[20] Base64 Decode and Encode, http://www.base64decode.org/, Retrieved 2013-01-29.
[21] Google Analytics, http://www.google.com/analytics/, Retrieved 2013-04-04.