Search engines as a security threat - Computer - IEEE Xplore

COMPUTING PRACTICES

Search Engines as a Security Threat

Because they implement vulnerable security policies, search engines are excellent tools for helping hackers attack machines anonymously, search for easy targets, or gather confidential data.

Julio César Hernández, José María Sierra, Arturo Ribagorda, and Benjamín Ramos, Carlos III University, Madrid

Search engines index a huge number of Web pages and other resources that sometimes inadvertently expose security weaknesses and confidential data. Hackers can use search engines to make anonymous attacks, find easy victims, and gain relevant knowledge that, in some cases, can be more than enough to mount a powerful attack against a network.

Furthermore, search engines can help hackers avoid identification, which is one of a hacker’s main objectives. Anonymity is important to hackers because it helps them avoid the legal consequences of their actions and also helps them avoid having their ISPs cancel their accounts. One reason that so few hacking attempts get reported is that there are so many of them, as you realize if you’ve taken a look at your firewall security logs recently. When you traceroute a hacker’s IP address to its source, your traceroute will often end at a hop completely unrelated to the hacker’s actual ISP or local network, which makes reporting the hacker to the upstream provider a difficult task. Search engines make the task of reporting a hacker, let alone prosecuting one, all the more difficult. Several anonymizing techniques are available on search engines. For example, some search engine services can act as anonymous proxies.

Search engines are not entirely to blame for hacker onslaughts. These indexing programs are dangerous largely because users are careless. Most people are not aware of the security implications of connecting a weak or improperly configured machine to the Internet. A hacker who finds these weak machines with a search engine can use them to compromise the security of other machines. Sometimes these improperly configured machines do not store any important data but are trusted by third-party networks. A hacker can gain access to the vulnerable machine and use it as a base for hacking a trusted network. The attacker could also retain anonymity by simply wiping any traces of activity, such as log file entries, from the weak machine.

Furthermore, in the age of DSL and broadband cable accounts, home users often have their machines turned on and connected to the Internet for days at a time. The majority of home users don’t run hardware or software firewalls. Most, in fact, would be shocked to find that potential hackers target their machines (even those with dynamically assigned IP addresses) as often as several times a minute. The machines of home users who have static IP addresses are even more vulnerable. The ultimate goal of most hack attempts on home machines these days seems to be to make them zombies in a distributed denial-of-service attack. Search engines make discovering these machines, and other weakly configured machines, almost effortless.

Search engines and system operators can implement several countermeasures and use basic security techniques that make finding and abusing vulnerable machines much more difficult.

0018-9162/01/$10.00 © 2001 IEEE

ANONYMOUS PROXIES

Several search engines, including AltaVista and HotBot, offer automatic Web translation to their users. Operating this service is simple: The user requests a URL from the search engine translation machine, which downloads the resources locally, translates them, and sends the results to the user. This procedure allows any user to employ the translation machine as a proxy. Users can access other sites through the search engine’s translation function and even piggyback other anonymous connections through a frame-in-frame technique that essentially doubles the hacker’s anonymity at every stage.

This technique is effective in a hacker’s hands only because neither AltaVista nor HotBot places any restrictions on acting as an anonymous proxy. Their users can navigate sites anonymously, hiding their actual IP addresses behind the translation machine IPs. Chaining translation machines from different search engines is relatively easy to do if you’re familiar with HTML frames. A hacker can use this technique to chain an anonymous proxy through several other anonymous proxies, making it even more difficult to trace the hacker’s identity back to the source.

Dedicated services on the Internet, such as anonymizer.com, also provide anonymous browsing, but the free services typically do not protect hackers from any eavesdropping intermediary between the hacker’s machine and the anonymizing servers. Some anonymizing services even offer URL encryption, which can protect hackers from the logging and tracking methods that intermediaries (ISPs, proxies, firewalls, and so forth) often employ. But issuing HTTP commands through a translation search engine is a particularly surreptitious method for gaining access to or damaging a weak server, because it’s not done through a standard anonymizing source like anonymizer.com. So many HTTP requests are directed to and from search engines that finding a hacker isn’t easy, especially if the hacker is using several piggybacked anonymous proxies on top of a standard AltaVista or HotBot connection.

October 2001

Patching Windows Servers

A particularly important aspect of operating a secure system is staying up to date on security patches. It’s critical to know which patches have been applied to your system and, more importantly, which haven’t. Microsoft’s HFNetChk will significantly aid system administrators in this task. HFNetChk, a command-line tool that system administrators can download from http://www.microsoft.com/technet/, checks the patch status of all the machines in a network from a central location. The tool does this by referring to an XML database that Microsoft constantly updates. HFNetChk can scan local or remote systems for patches available for the Windows NT 4.0 and 2000 operating systems, as well as hot fixes for IIS 4.0, IIS 5.0, SQL Server 7.0, and SQL Server 2000.

The XML file that HFNetChk uses records which hot fixes are available for which products. It contains information such as security bulletin name and title; detailed data about product-specific security hot fixes, including files in each hot-fix package and their file versions and checksums; registry keys that the hot-fix installation package applied; information about which patches supersede other patches; and related Microsoft Knowledge Base article numbers. Tools similar to HFNetChk are available for other operating systems.

Users who install Web servers for the first time often aren’t aware that relatively easy help is available for updating a server to protect it against the most common security threats. Hackers can always find a way into a system, no matter how secure. But users don’t need to give them the keys to the front door by setting up a default server, connecting it to the Internet, and not updating it with the appropriate security patches.
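The patch comparison that the “Patching Windows Servers” sidebar describes can be illustrated with a short Python sketch. The XML layout below is invented for illustration and is not the format of Microsoft’s actual database; the patch IDs, bulletin names, and the function name missing_patches are likewise hypothetical. The sketch only shows the general idea of comparing installed hot fixes against a list that records which patches supersede others.

```python
import xml.etree.ElementTree as ET

# Hypothetical patch database. This is NOT the real HFNetChk schema;
# element names, attributes, IDs, and bulletins are made up.
SAMPLE_DB = """
<patches>
  <patch id="Q100001" product="IIS 4.0" bulletin="MS-EXAMPLE-01"/>
  <patch id="Q100002" product="IIS 4.0" bulletin="MS-EXAMPLE-02"
         supersedes="Q100001"/>
  <patch id="Q100003" product="SQL Server 7.0" bulletin="MS-EXAMPLE-03"/>
</patches>
"""


def missing_patches(db_xml, product, installed):
    """Return the sorted IDs of current patches for `product` that are
    not in `installed`. A patch superseded by another patch in the
    database is not considered current."""
    root = ET.fromstring(db_xml)
    patches = [p for p in root.iter("patch") if p.get("product") == product]
    superseded = {p.get("supersedes") for p in patches if p.get("supersedes")}
    current = {p.get("id") for p in patches} - superseded
    return sorted(current - set(installed))


if __name__ == "__main__":
    # A machine that only has the older, superseded IIS patch installed.
    print(missing_patches(SAMPLE_DB, "IIS 4.0", {"Q100001"}))
```

A real tool would also verify file versions, checksums, and registry keys, as the sidebar notes; this sketch compares identifiers only.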

Anonymous surfing does not appear to be a greater security risk than using a weakly configured machine connected to the Internet. However, hackers can easily use the anonymity that search engines provide to avoid the legal consequences of issuing malicious HTTP requests from a known machine. A simple HTTP request can actually return the contents of the boot.ini file on some improperly configured Web servers running Microsoft’s Internet Information Server (IIS) 3.0 and 4.0. Hackers can use anonymous translation services to find such servers and exploit them by issuing system commands through long HTTP requests. Hackers can easily and indirectly accomplish all of this behind the several layers of protection that multiple search engine proxies provide.
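Operators who want to look for this kind of request in their own logs could start from something like the following Python sketch. It assumes access logs in Common Log Format; the substring list and the length threshold are illustrative assumptions rather than a vetted signature set, and the function name flag_requests is hypothetical. Note that a request relayed by a translation service arrives from the service’s address, so the host reported here would be the proxy, not the attacker.

```python
import re

# Illustrative assumptions only: a real deployment would need a
# maintained signature list and a tuned length threshold.
SUSPICIOUS_SUBSTRINGS = ("boot.ini", "cmd.exe", "../")
MAX_URL_LENGTH = 512

# Common Log Format:
# host ident user [time] "METHOD path PROTOCOL" status size
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) \S+'
)


def flag_requests(lines):
    """Return (host, path) pairs for requests whose path is unusually
    long or contains one of the suspicious substrings."""
    flagged = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        host, _method, path, _status = match.groups()
        lowered = path.lower()
        too_long = len(path) > MAX_URL_LENGTH
        suspicious = any(s in lowered for s in SUSPICIOUS_SUBSTRINGS)
        if too_long or suspicious:
            flagged.append((host, path))
    return flagged


SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2001:13:55:36 +0000] '
    '"GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.77 - - [10/Oct/2001:13:56:01 +0000] '
    '"GET /scripts/../../boot.ini HTTP/1.0" 404 512',
]

if __name__ == "__main__":
    for host, path in flag_requests(SAMPLE_LOG):
        print(host, path)
```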

FINDING WEAK SYSTEMS

The volume of pages that the largest search engines index makes them excellent tools for identifying vulnerable servers; in February 2001, Google announced that it had indexed 1,346,966,000 pages. Standard hacking procedures include finding vulnerable machines (the discovery stage) and gathering information about them (the footprinting stage). One technique hackers can use in the discovery and footprinting stages is to search specifically for recently created Web servers, knowing they are often good targets. These out-of-the-box Web servers usually do not have very good security measures running. When installing a Web server for the first time, most people typically don’t immediately apply the proper updates, hot patches, and other security fixes.

How do hackers use search engines to find these machines? Installing server software on some versions of Linux or Windows places certain administrative help files or HTML-based configuration files in the standard directory for Web serving. Because search engines constantly spider IP address ranges and domains, your server will pop up as a site listing this default content when a search engine indexes these files. Having found your default installation and knowing that you probably have a weakly protected server, a hacker homes in on this information. For example, some people who set up an NT box to do print serving may not realize that they also have installed a default version of IIS and they’re now running Web services. The server is advertising that it’s a default installation, making it a weak penetration point on the network, especially if it isn’t behind a firewall.

Microsoft recently released a security bulletin (MS99-025) that discusses vulnerabilities in the default installation of IIS 4.0. The company, of course, recommends that server operators take the necessary actions to secure the servers against one of the standard default vulnerabilities. The issue has to do with Microsoft Data Access Components (MDAC), more specifically MDAC’s Remote Data Services (RDS) component. The RDS vulnerability allows an unauthorized user to perform actions on a server running IIS 3.0 or 4.0. Such actions include execution of shell commands as a privileged user and unauthorized access to secure, unpublished files on the IIS server. In other words, default installations of IIS are wide open to attackers who know the commands or know how to get the automated hacking tools that exploit these vulnerabilities.

The “Patching Windows Servers” sidebar provides additional information about HFNetChk, a tool that tells server operators what security patches they’re missing on Windows servers. The “Configuring Default Apache” sidebar explains how to do a fresh installation of the Apache Web server to ensure its efficiency and security.

IIS is just one server among many that hackers can easily attack if they suspect that a particular machine is weakly protected or left wide open with a default installation. And search engines are making the job of finding these weak servers even easier.

To find a vulnerable Web server (one that hasn’t been properly configured with the latest updates) a hacker simply searches for some unique text strings, images, or other content that characterizes the default content on Web server installations. For example, to find vulnerable Windows NT machines with default installations of IIS, a hacker can search on Google for “Try the hyperlinks above to see some examples of the content you can publish with Microsoft Internet Information Server… .” There is a high probability that the results will contain links to several out-of-the-box IIS installations. Analogously, Figure 1 shows the string “This page is used to test the proper operation of the Apache Web server after it has been installed…,” which characterizes a default Apache installation. As Table 1 shows, this method for finding unattended or recent installations can give the hacker a large list of potential easy victims.

Hackers can also look for default pages that correspond to old (and buggy) Web server versions simply by including the copyright year in the search string, as in “©1997 Microsoft Corporation. All rights reserved.” They can also look for Web servers with a poor security reputation, like Microsoft’s Personal Web Server, which is notorious for being extremely insecure. Hackers can also refine the search results by expanding or restricting the search to return only recently indexed Web pages.

Several search engines have a feature that allows any user to search for multimedia content, like banner graphics or logos. This type of searching can locate default installations on its own or in combination with the text string searching technique, making search engines a great tool for discovery and initial footprinting.

Configuring Default Apache

To maximize security when doing a fresh installation of the Apache Web server, you should adopt a strict “need to know” policy for both the document root (which stores HTML documents) and the server root (which keeps log and configuration files). It’s most important to get permissions right in the server root because that’s where CGI scripts and the sensitive contents of the log and configuration files are kept with default installations. You need to protect the server from the prying eyes of both local and remote users.

The simplest strategy is to create a www user for the Web administrator or webmaster and a www group for all the users on your system who need to author HTML documents. On Unix systems, edit the /etc/passwd file to make the server root the home directory for the www user. Edit /etc/group to add all authors to the www group.

You should set up the server root so that only the www user can write to the configuration and log directories and to their contents. You can decide whether you want these directories to be readable by the www group. They should not be world readable. The cgi-bin directory and its contents should be world executable and readable, but not writable; if you trust your local Web authors, you could give them write permission for this directory.

For more information about configuring a default Apache installation, see apache.org or John E. Grotevant’s configuration paper, “Basic Apache Security Considerations,” at http://www.sans.org/infosecFAQ/web/apache_sec.htm.

Several hacking scenarios will make updating your permissions moot because hackers can essentially enter a system from several different directions. But because it’s easiest to exploit the familiar, hackers attempt to find servers with default installations because they know how to exploit default permissions.
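The permission advice in the “Configuring Default Apache” sidebar (configuration and log directories should not be world readable or world writable) can be checked mechanically. The following Python sketch walks a directory tree on a Unix-like system and reports entries whose mode grants access to “other” users. The function name world_accessible and the example path in the usage note are assumptions for illustration, not part of Apache.

```python
import os
import stat


def world_accessible(root, mask=stat.S_IROTH | stat.S_IWOTH):
    """Return the sorted paths under `root` whose permission bits grant
    'other' users any of the bits in `mask` (read or write by default).
    Symbolic links are skipped because their own mode is not meaningful."""
    flagged = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue
            if mode & mask:
                flagged.append(path)
    return sorted(flagged)
```

An administrator might call world_accessible("/usr/local/apache/conf") (a hypothetical server root location) and expect an empty list. For cgi-bin, where the sidebar allows world read and execute but not write, passing mask=stat.S_IWOTH checks only for world-writable entries.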

Figure 1. Apache default installations have unique text strings that allow easy identification using a search engine.

Table 1. Number of hits obtained from a search-engine query of default IIS and Apache server installations.

Search engine    Default Apache installations indexed    Default IIS installations indexed
google.com       9,360                                   2,970
lycos.com        2,310                                   105
altavista.com    6,205                                   3,824
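The same text strings that make default installations easy to find (Figure 1, Table 1) also let operators audit their own servers, and could let a search engine filter such pages out of its index. The Python sketch below checks a page body against the two default-page strings quoted in this article. The function name detect_default_install is hypothetical, and the matching is deliberately simple: it normalizes whitespace and case but would miss text that is broken up by HTML tags.

```python
# Default-page strings quoted in the article for IIS and Apache.
DEFAULT_PAGE_MARKERS = {
    "IIS": (
        "Try the hyperlinks above to see some examples of the content "
        "you can publish with Microsoft Internet Information Server"
    ),
    "Apache": (
        "This page is used to test the proper operation of the Apache "
        "Web server after it has been installed"
    ),
}


def detect_default_install(html_text):
    """Return the sorted names of servers whose default-page marker
    appears in `html_text`, ignoring case and whitespace differences."""
    normalized = " ".join(html_text.split()).lower()
    return sorted(
        server
        for server, marker in DEFAULT_PAGE_MARKERS.items()
        if marker.lower() in normalized
    )
```

An operator would fetch the home page of each machine on the network and pass the body to this function; any non-empty result suggests that the server is still showing its out-of-the-box content.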

FTP VULNERABILITY

FTP search engines are potentially even more dangerous than standard search engines. FTP file search engines such as Lycos FTP Search have thousands of links that point to sensitive information on weak machines. These links often lead to confidential or critical data. No administrator wants to see a machine’s password files linked from a search engine that is available to anyone on the Internet. Unfortunately, FTP search engines facilitate this kind of data access, making the hacker’s life much easier.

Poorly configured servers and servers with weak protection are the primary culprits in exposing this kind of sensitive information. Even with proper protection, hackers can use several methods to crack entrance passwords. These passwords are constantly published to warez sites so that a hacker can use an FTP site to traffic warez anonymously, at least until the administrator or system operator notices the increase in bandwidth or loss of disk space on the FTP server. But having your FTP site become the temporary home of freely traded warez is primarily a nuisance. The more serious consequence of an improperly configured FTP server is that it provides access to sensitive information like password files.

FINDING CONFIDENTIAL FILES

Using the Lycos FTP search engine to find confidential files such as /etc/passwd and /etc/shadow returns hundreds of useful hits. A hacker can query the Lycos FTP search engine to obtain an /etc/passwd file and, even if the system has shadowed passwords, use it to launch a social engineering attack. In this type of attack, the hacker usually masquerades as a company sysadmin to get a user to volunteer access codes, usernames, or other potentially sensitive information.

In addition to common files like /etc/passwd and /etc/shadow, hackers can find .htaccess and .htpasswd files, which are used to control the access to content within a Web server and to store the passwords that authenticate this access. Because system administrators frequently reuse passwords, a hacker who finds a poorly encrypted password file from an FTP program (which should not be published on an FTP server anyway) can guess the root password and gain complete shell access to the vulnerable machine. Examples of important files that a careless FTP setup could easily expose include CuteFTP’s smdata.dat file and Netscape Enterprise Server’s admpw file, which both use weak encryption to store passwords. A search for files such as smdata.dat or admpw quickly provides a hacker with crucial information.

Another type of confidential data that an FTP or HTTP server can expose is the output of standard security audits. Standard security auditing tools often create HTML or data output to be retrieved later for analysis, and sometimes this highly confidential data is published on the Internet by security consultants who aren’t really aware that they are exposing the company’s intranet to hackers. A recent search in a popular search engine found an audit file listing more than 4,900 different machines, all in the .edu domain, with very bad security. This is like a birthday present for an attacker because it not only provides access to weak machines but also includes an analysis of those machines’ weaknesses.
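Administrators can look for the file names mentioned above in their own published FTP or Web trees before a search engine does. The following Python sketch walks a directory and reports files whose names match that list. The function name find_sensitive_files is hypothetical, and the name list covers only the examples given in this article, so it should be treated as a starting point rather than a complete inventory.

```python
import os

# File names taken from the examples in the text; extend as needed.
SENSITIVE_NAMES = {
    "passwd", "shadow", ".htaccess", ".htpasswd", "smdata.dat", "admpw",
}


def find_sensitive_files(public_root):
    """Return the sorted paths (relative to `public_root`) of files
    whose name matches one of SENSITIVE_NAMES, ignoring case."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(public_root):
        for name in filenames:
            if name.lower() in SENSITIVE_NAMES:
                full_path = os.path.join(dirpath, name)
                hits.append(os.path.relpath(full_path, public_root))
    return sorted(hits)
```

Running this against the directory that the FTP server exports should return an empty list; any hit is a file that an FTP search engine could index.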

HACKING COUNTERMEASURES

A good way to make sure your FTP server is protected against releasing private information and allowing unwanted access is to be aware of its security weaknesses. The “User Security Resources” sidebar provides sources of information about protecting systems against hacker attacks.

It seems obvious that companies that operate search engines should take action to avoid indexing confidential information. In fact, under certain laws, providing easy public access to certain kinds of private information is a crime. But search engines do this 24 hours a day. It’s not difficult to imagine a government organization cracking down on a search engine company that exclusively indexes information about default server installations. But that’s not too far from what search engines are now doing.

User Security Resources

Implementing countermeasures and using basic security techniques can help protect systems against attack. These resources provide additional information about protecting vulnerable machines.

Books
• Anonymous, Maximum Security: A Hacker’s Guide to Protecting Your Internet Site and Network, Sams Technical Publishing, Indianapolis, Ind., 1998.
• J. Chirillo, Hack Attack Denied: A Complete Guide to Network Lockdown, Wiley Computer Publishing, New York, 2001.
• D. Curry, Unix System Security: A Guide for Users and System Administrators, Addison-Wesley, Reading, Mass., 1992.
• S. Garfinkel and G. Spafford, Practical Unix & Internet Security, 2nd ed., O’Reilly & Assoc., Sebastopol, Calif., 1996.
• J. Scambray, S. McClure, and G. Kurtz, Hacking Exposed, 2nd ed., Osborne/McGraw-Hill, Berkeley, Calif., 2001.
• E.D. Zwicky, S. Cooper, and D.B. Chapman, Building Internet Firewalls, 2nd ed., O’Reilly & Assoc., Sebastopol, Calif., 2000.

FTP hacking
• Packet Storm Communications, http://packetstormsecurity.org
• Exploit World, http://www.insecure.org/sploits.html
• RootShell, http://www.rootshell.com
• Slashdot Newsletter, http://slashdot.org
• Securiteam, http://www.securiteam.com
• Security Focus, http://securityfocus.com
• Wu-ftpd development group, http://www.wu-ftpd.org/

Additional online resources
• CERT, http://www.cert.org
• Common Vulnerabilities and Exposures, http://cve.mitre.org
• CSG Security Resources, http://web.mit.edu/net-security/csg
• Computer Security, http://www.alw.nih.gov/Security/security.html
• Electronic Privacy Information Center, “Cryptography Policy,” http://www.epic.org/crypto
• Fyodor’s Playhouse, http://www.insecure.org/index.html
• Genocide 2600, http://www.genocide2600.com/
• Hackers.com, http://www.hackers.com/index2.zxZhtm

Translation services

Because the hacker community is actively using search engines as tools of their trade, we urge the search engine community to adopt some methods to control this kind of abuse. For example, the anonymity problem inherent in most translation engines could be solved if the translation machines always tell the target server the real origin of the request. Translation engines can accomplish this by using REMOTE_ADDR, HTTP_X_FORWARDED_FOR, or HTTP_VIA headers. If search engine companies implement this technique, potential intruders would likely stop using the engines as anonymous proxies. Of course, there will always be other sources for cloaking an attack with an anonymous proxy, but search engines shouldn’t contribute to the problem.

The case of Go.com (formerly Infoseek) is a good example of a search engine company taking action with this technique. Go.com provides translation services through SystranSoft, much like AltaVista, but Go.com’s services cannot be abused so easily because they use the REMOTE_ADDR header to inform the target Web server what machine made the original HTTP request. Go.com also prevents chained translation proxies by rejecting autotranslation requests.
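A rough Python sketch of the disclosure technique follows. The names REMOTE_ADDR, HTTP_X_FORWARDED_FOR, and HTTP_VIA used in the text are CGI-style variable names; on the wire, the corresponding HTTP request headers are X-Forwarded-For and Via. The function names, the proxy host name, and the chain check are assumptions for illustration and do not describe how Go.com or any other service actually implements this. Header names are looked up with exact case for simplicity.

```python
def forwarding_headers(client_ip, proxy_name, incoming=None):
    """Build the headers a translation proxy could add to its outgoing
    request so that the target server learns the real origin.
    `incoming` holds the headers received from the client, if any."""
    incoming = incoming or {}
    headers = {}

    # Append the client address to any existing forwarding chain.
    prior_for = incoming.get("X-Forwarded-For")
    headers["X-Forwarded-For"] = (
        f"{prior_for}, {client_ip}" if prior_for else client_ip
    )

    # Record this proxy in the Via chain ("<protocol-version> <host>").
    via_entry = f"1.0 {proxy_name}"
    prior_via = incoming.get("Via")
    headers["Via"] = f"{prior_via}, {via_entry}" if prior_via else via_entry
    return headers


def is_chained(incoming, proxy_name):
    """Return True if the request already passed through this proxy,
    which a service could use to reject chained translation requests."""
    return proxy_name in incoming.get("Via", "")
```

With these headers in place, the target server’s logs record both the translation machine and the address that asked for the translation, which removes most of the incentive to use the service as a cloak.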

Avoiding indexing

There is little reason for search engines to index pages from default Web server installations. Indexing these pages can only be useful to hackers in finding relatively easy victims: system operators who haven’t done much beyond swapping the installation disks. Search engine operators can program search engine bots to avoid indexing or returning default Web pages. They also can purge the search engines themselves of this information, which serves no real purpose. According to most estimates, the two main Web servers are Apache and IIS, so it should be possible to purge most default Web page files easily from the search engine databases because the unique text from a default installation is so easy to identify.

There are several kinds of files that bots simply should not retrieve from FTP servers. Ideally, search engine bots would follow a simple security policy: Only retrieve what belongs to the public directories of FTP servers. Retrieving and publishing the contents of the /etc directory could be enough to motivate legal prosecution under some laws. A robots.txt file, typically used in a Web server’s root directory to help bots know what to index and what to avoid, could also be useful in the case of FTP servers. The FTP robots.txt file could conceivably tell FTP bots which files or directories to index and which ones to avoid.
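The two-part policy suggested above (stay inside public directories, and honor a robots.txt file) could look roughly like the following Python sketch. It uses the standard library’s urllib.robotparser to interpret the rules. The /pub/ prefix, the agent name, and the function names are assumptions for illustration; no FTP search engine is known to work this way.

```python
import posixpath
from urllib.robotparser import RobotFileParser

# Assumption: the server's public area lives under /pub/.
PUBLIC_PREFIXES = ("/pub/",)


def make_policy(robots_lines):
    """Parse the lines of a robots.txt file into a reusable parser."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


def may_index(parser, path, agent="ftp-indexer"):
    """Return True only if `path` is inside a public directory AND the
    robots.txt rules allow `agent` to fetch it."""
    # Normalize first so "/pub/../etc/passwd" cannot slip through.
    clean = posixpath.normpath("/" + path.lstrip("/"))
    in_public = any(clean.startswith(prefix) for prefix in PUBLIC_PREFIXES)
    return in_public and parser.can_fetch(agent, clean)


EXAMPLE_ROBOTS = [
    "User-agent: *",
    "Disallow: /pub/incoming/",
]

if __name__ == "__main__":
    policy = make_policy(EXAMPLE_ROBOTS)
    for candidate in ("/pub/tools/readme.txt", "/pub/incoming/upload.zip",
                      "/etc/passwd", "/pub/../etc/passwd"):
        print(candidate, may_index(policy, candidate))
```

Checking the public-directory rule before consulting robots.txt means that a missing or permissive robots.txt file still cannot expose /etc, which matches the default-deny spirit of the policy.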

Securing all channels against hackers trying to penetrate a vulnerable system isn’t possible. But there is no reason for search engines to be wide-open channels that continue to help hackers find and penetrate weak systems. Because it is so easy to use a search engine to cloak an attack, search-engine-based hacker abuse has become a real threat that poses serious risks. However, not all blame should fall on those who operate the search engines. Search engines aren’t responsible for the huge numbers of poorly configured and insecure machines all over the Internet, even if the search engines do aid in identifying them. But the search engines must take some blame if they continue to provide easy ways to locate weak and penetrable machines.

Julio César Hernández is a security specialist and consultant in the Computer Security Group at Carlos III University of Leganés, Madrid. His research interests include cryptology, cryptanalysis, and network security.

José María Sierra is a lecturer at Carlos III University and a member of the Computer Security Group. His research interests include IPv6, IPSEC, network security, and the formal validation of security protocols.

Arturo Ribagorda leads the Carlos III University Computer Security Group and collaborates actively on improving the security of various international companies and institutions.

Benjamín Ramos is a lecturer at Carlos III University and a member of the Computer Security Group. His research interests include cryptography, authentication, audio steganography, watermarking, and network security.

Contact the authors at {jcesar,sierra,arturo,benja1}@inf.uc3m.es.