
Proceedings of the 2007 IEEE Workshop on Information Assurance United States Military Academy, West Point, NY 20-22 June 2007

TimeKeeper: A Metadata Archiving Method for Honeypot Forensics

Kevin D. Fairbanks, Christopher P. Lee, Ying H. Xia, and Henry L. Owen III, Member, IEEE

Abstract- Internet attacks are becoming more advanced as the economy for cybercrime grows and the tools for evading detection become ubiquitous. To counter this threat, new detection and forensics tools are needed to capture these new techniques. In this paper, we propose a method to extract and analyze a richer set of forensic information from the file system journal of honeypots in spite of anti-forensic tool use. We show initial results of our journal monitoring prototype, TimeKeeper, of file system activities and argue that by detecting these events, we are able to capture previously unavailable forensic information. This forensic information can then be used for system recovery, research on attack techniques, insight into attacker motives, and for criminal investigations.

Index Terms- Ext3 Journal, File Systems, Forensics, Metadata, Honeypot

I. INTRODUCTION

When performing computer forensic analysis, a key step is the development of a timeline to discern when certain events took place. Obtaining the modification, access, and change times of a file can be valuable in this process [1]. When examined with a trained eye, this data source can not only reveal intrinsic information about a file, but also give details about intrusions. However, this data source is temporary and there exist anti-forensic techniques to corrupt this information [2]. As a response to these techniques, this paper proposes a method to track file metadata times in honeypots to produce a reliable source of information for forensic analysis.

Existing forensics toolkits are used to determine the transactions that have taken place on a production system. They offer many utilities that aid in accessing the multiple layers of abstraction to retrieve data. They can also greatly reduce the labor required when examining many logs by gathering and presenting data in a way that is useful to investigators. These toolkits, however, are made to analyze production systems. Production systems are characterized by having very limited resources to dedicate purely to security, and therefore the persistence of journals and log files varies depending on hardware and software implementations and limitations. Honeypots differ in that they offer the advantage of not having typical production system constraints. These differences can be leveraged to create a richer set of data for honeypot forensics than is possible for production systems.

We developed and deployed a method that successfully tracks modification, access, and change times in a way that makes malicious modification of the file system difficult to hide. This makes forensics using metadata times more reliable and useful in honeynets. The metadata time information can be harvested from the Ext3 journal. This approach is desirable because Ext3 is the default file system for most Linux distributions. This technique allows the identification of suspicious file modifications that attackers may attempt to conceal through the altering of metadata. It can also serve as a starting point for a full forensic investigation, or yield more insight into an ongoing one.

The major purpose of this paper is to present the concept of our honeypot-targeted journal monitoring tool, named TimeKeeper. The remainder of the paper is organized as follows. Section II provides background information about file system architecture and discusses the differences between honeypots and production systems. Section III covers some related work. In Section IV the TimeKeeper system is explained, including preliminary experimental results. We conclude in Section V.

Manuscript received March 21, 2007. {kevin.fairbanks, chrislee, yxia, owen}@gatech.edu, School of Electrical and Computer Engineering

Georgia Institute of Technology, Atlanta, Georgia 30332

1-4244-1304-4/07/$25.00 ©2007 IEEE

II. BACKGROUND

A. File System Overview

Ext3 is an extension of Ext2 that gives the Ext2 file system journaling abilities. Each Ext2 partition is separated into units called block groups. Each block group is further separated into blocks. All groups are assigned an equal number of data blocks and inodes. Every file must have a unique inode, therefore the number of files on a system is limited to the total number of inodes. An inode is an object that stores information relating to a file such as permissions, timestamps, and the data blocks associated with that file.

Data blocks, group descriptors, bitmaps, inode tables, and the file system super block are the basic types of blocks in a group. The super block in Ext2 contains important information relating to the entire file system. Examples of this data are the total number of inodes, the total number of blocks, the number of blocks per group, the number of free blocks, and the first available inode. Group descriptors contain data pertaining to a particular group such as the blocks associated with a particular group, the range of inodes in a group, and the offset of certain blocks into a group. Many sources of information state that the super block and group descriptors are replicated in every block group, with the copies in block group 0 being the ones primarily used. Although this is the default option, it is common for Ext2 partitions to use the sparse super block option [3] to only replicate these blocks in a fraction of the total block groups, thus saving space on larger partitions which have many groups and therefore a larger set of group descriptors.

In each group, there is an inode bitmap and a data block bitmap. Each bitmap is limited to the size of one block, which limits the number of inodes and blocks that a particular group can contain. Since each bitmap bit tracks one block and the Ext2 file system supports block sizes of 1024, 2048, and 4096 bytes, the maximum number of blocks that a group can contain is 32K (4096 bytes x 8 bits). A bit set to 1 denotes either a used or unavailable block or inode, while a bit set to 0 denotes an unallocated block or inode. Note that if the number of blocks a group contains is less than the number of blocks that a group can theoretically support, then the superfluous bits in the data block bitmap are set to 1.

Each Ext2 inode is 128 bytes. These are stored sequentially in the inode tables in each group. The size of the inode table is dictated by the number of inodes per group, which is limited by the size of a particular block. As an example, with a 4096-byte block size the maximum number of inodes per group is 32K inodes. Each block in this example can hold 32 inodes, therefore the maximum size of the inode table would be 1024 blocks.

B. Journaling

The main purpose of journaling is to reduce the time needed to recover a file system after a system crash. The journal acts as a circular log: after a crash, each logged transaction is either replayed in full or discarded, which brings the system to a consistent state faster than a file system check program such as e2fsck. A very important feature of any journaling file system is atomicity, meaning that either a transaction took place completely or it did not take place at all.
In Ext3 this is handled by the Journal Block Device layer. In [4] there are three types of journaling modes described: Journal, Ordered, and Writeback, with Ordered being selected by default. When in Journal mode, both data and metadata are written to the journal before being updated on disk. While this is very safe, it is the slowest form of journaling since everything is written twice. In Ordered mode, only metadata is written to the journal. When it is time to flush the changes to disk, the actual data is written to the main disk before the metadata. This method reduces the chances of corruption. Writeback, like Ordered, logs only metadata to the journal but does not take the precaution of writing data to disk first. Each of the modes mentioned above has the obvious tradeoff between speed and reliability, but an additional side effect of choosing one method over another is the amount of forensic data that can be gathered from the journal. A more detailed explanation of the journal can be found in [20].
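To make the journal structure more concrete, the following is a minimal Python sketch of how the Journal Block Device headers in a raw journal dump could be located. It is not taken from the TimeKeeper prototype, whose source is not shown in this paper. The magic number and block type codes follow the Linux kernel's JBD definitions; the function names `parse_jbd_header` and `scan_journal`, and the synthetic `demo_journal`, are assumptions made purely for illustration.

```python
import struct

# JBD constants as defined in the Linux kernel's journal headers.
JBD_MAGIC = 0xC03B3998
BLOCK_TYPES = {
    1: "descriptor",
    2: "commit",
    3: "superblock_v1",
    4: "superblock_v2",
    5: "revoke",
}


def parse_jbd_header(block: bytes):
    """Return (type_name, sequence) for a JBD header block, else None.

    Every JBD administrative block starts with three big-endian 32-bit
    fields: magic number, block type, and transaction sequence number.
    """
    if len(block) < 12:
        return None
    magic, blocktype, sequence = struct.unpack(">III", block[:12])
    if magic != JBD_MAGIC:
        return None  # a logged copy of file system data, not a header
    return BLOCK_TYPES.get(blocktype, "unknown"), sequence


def scan_journal(journal: bytes, block_size: int):
    """Yield (block_index, type_name, sequence) for each header block."""
    for offset in range(0, len(journal) - block_size + 1, block_size):
        parsed = parse_jbd_header(journal[offset:offset + block_size])
        if parsed is not None:
            yield offset // block_size, parsed[0], parsed[1]


def _fake_block(blocktype: int, sequence: int, block_size: int = 1024) -> bytes:
    """Build a synthetic header block for the demo (not real journal data)."""
    header = struct.pack(">III", JBD_MAGIC, blocktype, sequence)
    return header.ljust(block_size, b"\0")


# Synthetic journal: superblock, descriptor, one logged block, commit.
demo_journal = (
    _fake_block(4, 0)
    + _fake_block(1, 21)
    + b"\x11" * 1024
    + _fake_block(2, 21)
)
demo_result = list(scan_journal(demo_journal, 1024))
```

In Ordered and Writeback modes the blocks between a descriptor and its commit hold metadata only, which is why the journaling mode bounds the forensic data that a scan of this kind can recover.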

C. Major Differences Between Honeypots and Production Systems

One major difference between honeypots and production systems is that honeypots serve as reconnaissance mechanisms while the function of a production system is user defined. This division in usage leads to the realization that resource consumption for the two types of systems can also differ, giving a honeypot the leeway to use a greater percentage of its resources toward security monitoring than would be practical for production systems.

While a production system may be utilized by one or more parties and usually stores some set of data that may be important, a honeypot has no actual users nor does it contain important production data. After an attack, data recovery may be attempted if a production system is compromised, but on a honeypot the hard disk could be taken out for analysis, possibly stored, and then eventually formatted and reused. Data existing before an attack is inconsequential.

In a honeynet environment, all network activity produced by a honeypot is monitored and stored, as any traffic is suspicious. Traditional forms of information such as packet captures and flow logs can be stored for a relatively long period of time depending on the amount of attack traffic. Production systems differ in that regular user activity can possibly generate far more network traffic over the same amount of time than a honeypot. This massive volume of data limits the amount of state that a production environment can keep.

In general, resource usage on production systems is geared toward increasing the performance of that system for some application. This need not be the case for honeypots. Although honeypots need to mimic production systems to entice attackers, more resources can be geared toward the security of the honeypot than that of the production system. With that in mind, more processor cycles and memory can be used to log incremental events such as the tracking of changes in file metadata to aid in the forensic analysis of attacks.

III. RELATED WORK

A. Sebek

Since honeypots are "systems designed to be compromised by an attacker" [5], they are constantly monitored for any signs of an intrusion. This vigilance has prompted hackers to hide their actions in a variety of ways, one of the most effective being the use of encryption. Sebek was designed as an answer to this problem and is the "primary tool to capture attacker activity on high-interaction honeypots" [6].

Honeypots often contain trojaned forms of binary programs in order to gain information about an attacker's activities [7]. For example, a trojaned shell that logs keystrokes could reveal to the honeynet administrator every command entered by an attacker. To counter this, attackers began downloading their own binaries to perform tasks. Furthermore, in an effort to obscure their actions, hackers often install their own form of session encryption software. Without obtaining their session keys, all of the network flow data is useless in terms of content analysis.

The Sebek client is a kernel module based on a rootkit that runs hidden in privileged space on a Linux honeypot, giving it the ability to intercept encrypted data at the kernel level after it is decrypted for execution. It has the ability to log keystrokes and capture the password necessary to execute encrypted programs such as burneye binaries. This is accomplished through the substitution of the read() system call with a special Sebek read() system call that logs data before transferring control back to the original read() system call. Moreover, Sebek exports data from the honeypot to a Sebek server on a different host using a specialized IP stack implementation so that Sebek packets are not detectable to other honeypots with the module installed [7].

Although Sebek is a great tool for gathering data from a honeypot in a honeynet, there exist methods for its detection. Dornseif et al. have devised several methods for detecting, disabling, and circumventing the kernel module and developed the Kebes toolkit to test their ideas [8]. Some of the ideas presented take advantage of the limitations of the read() system call while others seek to make forensic analysis very difficult by drastically increasing the amount of data logged by Sebek. Detecting Sebek is not the only anti-forensics work that has been explored; there also exists work on the identification of honeypots and similar environments, as detailed in [9-11].

B. Inotify

Inotify is a Linux kernel facility that allows the user to be notified when certain events happen to a set of inodes. It is the replacement of dnotify and has been merged into the Linux kernel as of release 2.6.13. Because it notifies the user of an inode event, Inotify could be used as an inode monitoring mechanism for archiving. The difference between Inotify and TimeKeeper is that Inotify works by monitoring system calls while the latter uses the Ext3 journal as a source of information in an effort to offer a more scalable solution [12].

C. Forensic Tools

Building a timeline of events through the use of data available from the file system is a technique that has been examined with many open source and commercial tools developed for this purpose. Among the most well known are The Coroner's Toolkit (TCT) [13], The Sleuth Kit (TSK) [14], EnCase by Guidance Software, and Forensic Toolkit (FTK) by AccessData. TCT, developed by Farmer and Venema, uses the mactime tool, which in turn uses the lstat() system call to access the metadata times stored in inodes. TSK was written by Carrier and is based on TCT [14]. It uses the mac-robber tool to retrieve the same information from files. Details about similar features in the commercial products EnCase and FTK can be found at [15] and [16] respectively. In [1], some UNIX tools are discussed that allow access to file metadata information, such as ls -i, debugfs, and the stat command.

Each of the previously mentioned toolkits has limitations, and certain anti-forensics toolkits have been developed to thwart evidence recovery. In an effort to cover their tracks an attacker may use tools such as Timestomp [17] or the Defiler's Toolkit [18]. It has also been observed that the touch command is used in an effort to cause confusion [1] [2].

In [19], a comparison is given between what the author considers to be traditional and advanced UNIX file systems. One of the main takeaways of this work is that the forensic model for the two classifications of systems, although related, is quite different when implemented due to the structure of the file system. In [1], the journal is mentioned as a source of time information and examples are shown of the information that can be extracted. We are unaware of any work that procures this information regularly for forensic analysis after a security incident while the system in question is active.

IV. TIMEKEEPER

A. System Architecture

The system proposed in this paper takes advantage of the honeypot environment by extracting metadata information from the journal and storing it to be used for forensic analysis. This is done by gathering the necessary data about the file system from the super block and group descriptors and then continuously reading and parsing through the journal data. Currently a prototype has been developed based on the Ext3 file system that is able to successfully identify and extract inode entries from the journal data. From these entries, the metadata times are retrieved and exported, giving the honeypot administrator a history of times that can be used to do the following:
* Build a timeline
* Determine any irregularities
* Compare the times lifted from the system by one of the previously mentioned tools to those in the database to detect any nefarious changes that did not [...]

Figure 1: TimeKeeper Architecture

B. Experimental Setup

To test the TimeKeeper prototype, a loopback device was created and mounted as an Ext3 partition [...]. TimeKeeper was executed on this partition [...] journal dumps, while various types of file modifications were performed and tracked. It is very important that the database be located outside of the file system that is being monitored to avoid a feedback loop.
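The System Architecture subsection describes retrieving metadata times from inode entries found in the journal. As a rough illustration of that step, the sketch below decodes the four timestamps from a 128-byte Ext2/Ext3 on-disk inode using the standard field layout. It is not the prototype's code: the helper names `parse_inode_times`, `inodes_in_block`, and `as_utc`, the `first_inode_number` parameter, and the synthetic `demo_raw` inode are assumptions introduced here.

```python
import struct
from datetime import datetime, timezone

EXT2_INODE_SIZE = 128  # bytes per on-disk inode, as stated in Section II


def parse_inode_times(raw: bytes) -> dict:
    """Extract the timestamps from one on-disk Ext2/Ext3 inode.

    On-disk inode fields are little-endian. The first 24 bytes are
    i_mode (u16), i_uid (u16), i_size (u32), then i_atime, i_ctime,
    i_mtime, and i_dtime (u32 each, seconds since the Unix epoch).
    """
    if len(raw) < EXT2_INODE_SIZE:
        raise ValueError("an on-disk inode is 128 bytes")
    _mode, _uid, _size, atime, ctime, mtime, dtime = struct.unpack_from(
        "<HHIIIII", raw, 0
    )
    return {"atime": atime, "ctime": ctime, "mtime": mtime, "dtime": dtime}


def inodes_in_block(block: bytes, first_inode_number: int):
    """Yield (inode_number, times) for every inode slot in a table block.

    first_inode_number is a hypothetical parameter: the caller would
    derive it from the group descriptor and the block's table offset.
    """
    for index in range(len(block) // EXT2_INODE_SIZE):
        start = index * EXT2_INODE_SIZE
        raw = block[start:start + EXT2_INODE_SIZE]
        yield first_inode_number + index, parse_inode_times(raw)


def as_utc(timestamp: int) -> str:
    """Render an inode timestamp as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


# Synthetic inode for demonstration only (values are made up).
demo_raw = struct.pack(
    "<HHIIIII", 0o100644, 0, 6, 1000, 1200, 1100, 0
).ljust(EXT2_INODE_SIZE, b"\0")
demo_times = parse_inode_times(demo_raw)
```

A deletion time of zero indicates that the inode has not been deleted, so a nonzero `dtime` in a journaled copy is one of the events worth archiving.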

As a first test to show that historical modification, access, and change times could be obtained, we created a file with the touch command. This created an empty file with the modification, access, and change times set equal to each other. This was verified by the stat command and may be seen in sequence number 21 in Table 1. This file was then opened and edited by the VIM text editor. The text editor created a copy of the original file that was then edited and saved as a different inode. When a file is saved in VIM, the original file is deleted and the new inode is assigned the filename. Table 3 shows the sequence numbers that map to these events. The stat command, after the editing of the file by VIM, showed that the filename is associated with a new inode that had a new set of metadata times. When the TimeKeeper database was queried based on the original inode number, it clearly showed the life span of the first inode including its deletion time. A similar query based on the new inode number showed a series of events that ends with all of the metadata times set to the same value, making it appear to be a new file. When these results are interspersed, a pattern between the two inodes emerges and it becomes apparent that the deletion time of the first inode corresponds with the change time of the second inode, yielding a more accurate picture of the file history.

Table 1: Condensed TimeKeeper Output (columns: Inode, Seq, Mtime, Atime, Ctime, Dtime; the rows cover inodes 12 and 14 at sequence numbers 21 through 26 with times t1 through t3; individual cell values are not recoverable here)

The file was then accessed again using the touch command, which modified all of the metadata times by setting them to the same value. TimeKeeper also recorded this modification. Next, the word "Hello" was appended to the end of the file using the echo command. This caused the inode modification and change times to be updated while the access time remained the same. This happened because the file was never actually opened; instead the bytes were added to the end of the file as denoted by the information stored inside of the inode. When the file was renamed using the mv command, the inode number remained unaltered as well as the access and modification times, but the change time was altered. Finally the rm command was used to delete the file, thereby causing an update to the deletion time as well as the modification and change times. All of these incremental file changes were detected and captured by TimeKeeper and are summarized in Table 2.

Table 2: Continued Condensed TimeKeeper Output

Inode  Seq  Mtime  Atime  Ctime  Dtime
14     27   t4     t4     t4     -
14     28   t5     t4     t5     -
14     30   t5     t4     t6     -
14     31   t7     t4     t7     t7

A sequence of the test events along with the commands that were issued and the sequence number under which the events manifested themselves in the journal are provided below.

Table 3: Sequence of Events

Event  Command     Sequence
1      touch       21
2      edit (VIM)  23-26
3      touch       27
4      echo        28
5      mv          30
6      rm          31
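The queries described above (the history of one inode, and the match between one inode's deletion time and another inode's change time) can be sketched with Python's standard sqlite3 module. The paper does not describe the TimeKeeper database schema, so the `history` table, its columns, and the helper names below are hypothetical. The demo rows use small symbolic integers in place of real timestamps and are invented for illustration; they are not the values from Table 1.

```python
import sqlite3


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a metadata history store with an assumed schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS history (
               seq INTEGER, inode INTEGER,
               mtime INTEGER, atime INTEGER, ctime INTEGER, dtime INTEGER)"""
    )
    return conn


def record(conn, seq, inode, mtime, atime, ctime, dtime=0):
    """Archive one journaled inode snapshot."""
    conn.execute(
        "INSERT INTO history VALUES (?, ?, ?, ?, ?, ?)",
        (seq, inode, mtime, atime, ctime, dtime),
    )


def inode_history(conn, inode):
    """Return every archived snapshot of one inode in journal order."""
    return conn.execute(
        "SELECT seq, mtime, atime, ctime, dtime FROM history "
        "WHERE inode = ? ORDER BY seq",
        (inode,),
    ).fetchall()


def replacement_candidates(conn):
    """Find (deleted_inode, other_inode, time) where a deletion time of
    one inode equals a change time of a different inode."""
    return conn.execute(
        """SELECT DISTINCT d.inode, n.inode, d.dtime
           FROM history AS d
           JOIN history AS n
             ON d.dtime = n.ctime AND d.inode != n.inode
           WHERE d.dtime != 0
           ORDER BY d.dtime, d.inode, n.inode"""
    ).fetchall()


# Invented demo rows: a file is created, then replaced by an editor.
conn = open_history()
record(conn, 21, 12, 1, 1, 1)           # original inode created
record(conn, 25, 12, 3, 1, 3, dtime=3)  # original inode deleted
record(conn, 26, 14, 3, 3, 3)           # new inode takes over the name
demo_pairs = replacement_candidates(conn)
demo_history = inode_history(conn, 12)
```

In a deployment, the path passed to `open_history` would point outside the monitored file system, in line with the feedback-loop caution in the Experimental Setup.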

D. Limitations

The current prototype is written in the Python scripting language. Although this provides a means for rapid development of a proof of concept, it does have certain drawbacks. For this technique to gather information, the program must constantly run in the background. If an attacker is skilled enough and truly wants to hide any actions taken, the program can be detected and stopped. A kernel patch or module might yield better results because it would operate closer to the file system.

Since this technique depends on the presence of the journal, it is also limited by the information stored therein. If an attacker wants to avoid any journal probing device, then they could circumvent the file system by writing directly to the raw device. This technique, as mentioned in [1], may cause file system corruption if used incorrectly, which may itself be a tell-tale sign of suspicious activity.

Finally, current work is limited to the Ext3 file system. This issue is not paramount, as the theory behind the technique can be extended to any journaling file system and Ext3 is pervasive in the Linux community. Note that the information that can be obtained is limited by the mode of journaling supported.

E. Future Work

Currently the proof of concept prototype is set to poll the journal at a predetermined interval. The rate at which the journal overwrites itself is dependent upon the type of journaling supported by the file system, the type of operations being performed, and the rate at which file system operations are performed. Polling is not optimal as this rate increases. A better solution would be to monitor a certain block of the journal such that every time that the block is accessed for writing, the journal is automatically audited by the monitoring program. This approach would prevent missing journal data due to excessive file operations between sampling times. Also, it would decrease the overhead required in finding the synchronization point between the current and previous journal samples.

Since the data being gathered can become untrusted if it is stored on the same medium that the attacker has access to, methods are being devised for an efficient way to protect this history. Some of the proposed solutions include storing the data in hidden partitions and writing to write-once media such as CD-Rs and DVD-Rs.

Furthermore, although this method of gathering more insight leverages the honeypot characteristic of being free of production constraints, the exact performance and storage penalties have not been calculated. Although the idea proposed could be implemented on production systems, the benefits gained would not justify the cost incurred. By obtaining performance measurements in a honeypot, optimizations might be made so that similar systems could be developed to aid in real-world situations where the environment cannot be controlled.

This method for building a metadata time history for a richer source of forensic information is not a standalone endeavor. It is a step toward building a forensic framework that specifically suits the honeypot environment and that can be further extended as these environments continue to evolve. In [23] a method is proposed for the monitoring of system calls to determine whether an encrypted application can be trusted. Combining the techniques discussed there with TimeKeeper could yield a system that produces greater forensic evidence.

V. CONCLUSION

Current forensics toolkits aid in the gathering of data from production systems by offering utilities that help investigators access multiple layers of abstraction. These tools, through the automated gathering and presentation of data, can reduce the energy expended by performing detailed manual analysis of many logs. However, current toolkits were developed for production systems, which are characterized by having very limited resources to dedicate purely to security. The persistence of journals and log files on these systems varies depending on hardware and software implementations and limitations. Honeypots differ in that they offer the advantage of not having typical production system constraints. In our opinion, there cannot be a universally applicable tool for forensic analysis, thus creating the need for a forensic toolkit that takes advantage of the unique honeypot environment.

In this paper, we discussed a method for building an archive of file metadata and described TimeKeeper. Archiving the file modification, access, and change times allows researchers to reconstruct more accurate timelines and perform intrusion playbacks for examination of honeypots. This may lead to greater understanding of current attack techniques and better methods of securing production systems. We showed that TimeKeeper can detect and log file system changes by monitoring the journal and storing file metadata. Through the examination of the differences between honeypots and production computers, we argued that the overhead of a system such as TimeKeeper is justified.

TimeKeeper is not meant to be an independent system, but one that is incorporated in a forensic framework that is specifically targeted towards honeynets. Through the combination of existing techniques and TimeKeeper, a richer source of forensic evidence will be produced. We will continue to refine, improve, and gather additional conclusions from our research based on the experiences gained.

REFERENCES

[1] D. Farmer, W. Venema. Forensic Discovery. Addison-Wesley. Upper Saddle River, NJ. 2004.
[2] S. Garfinkel. "Anti-Forensics: Techniques, Detection and Countermeasures." The 2nd International Conference on i-Warfare and Security (ICIW), Naval Postgraduate School, Monterey, CA. 2007.
[3] B. Carrier. "An Investigator's Guide to File System Internals." FIRST Conference on Computer Security, Incident Handling & Response. June 2002. http://www.first.org/events/progconf/2002/dl-02-carrier-slides.pdf
[4] D. P. Bovet, M. Cesati. Understanding the Linux Kernel, Third Edition. O'Reilly. Sebastopol, CA. 2006.
[5] The Honeynet Project. Know Your Enemy: Revealing the Security Tools, Tactics and Motives of the Blackhat Community. Addison-Wesley. 2002.
[6] The Honeynet Project. "Tools for Honeynets." http://www.honeynet.org/tools
[7] The Honeynet Project. "Know Your Enemy: Sebek, A kernel based data capture tool." http://honeynet.org/papers/sebek.pdf
[8] M. Dornseif, T. Holz, C. N. Klein. "NoSeBreak - Attacking Honeynets." Workshop in Information Assurance and Security, 2004.
[9] T. Holz, F. Raynal. Series of articles starting with "Defeating Honeypots: System Issues, Part 1." SecurityFocus. 3-23-2005. http://www.securityfocus.com/infocus/1826
[10] L. Oudot, T. Holz. Series of articles starting with "Defeating Honeypots: Network Issues, Part 1." SecurityFocus. 9-04-2004. http://www.securityfocus.com/infocus/1803
[11] T. Holz, F. Raynal. "Detecting Honeypots and other suspicious environments." Workshop on Information Assurance and Security, 2005.
[12] R. Love, J. McCutchan. "Inotify." http://www.edoceo.com/creo/inotify/
[13] D. Farmer, W. Venema. "The Coroner's Tool Kit." http://www.porcupine.org/forensics/tct.html
[14] B. Carrier. "The Sleuth Kit." http://www.sleuthkit.org/
[15] Guidance Software. "EnCase." http://www.guidancesoftware.com/products/ef_index.asp
[16] AccessData. "Forensic Toolkit." http://www.accessdata.com/
[17] http://www.metasploit.com/projects/antiforensics/
[18] The Grugq. "Defeating Forensic Analysis on Unix." Phrack Magazine vl-6. http://www.phrack.org/archives/59/p59-0x06.txt
[19] K. Eckstein. "Forensics for Advanced UNIX Filesystems." IEEE/USMA Information Assurance Workshop, 2004.
[20] B. Carrier. File System Forensic Analysis. Addison-Wesley. Upper Saddle River, NJ. 2005.
[21] K. Mandia, C. Prosise. Incident Response: Investigating Computer Crime. Osborne/McGraw Hill. Berkeley, CA. 2001.
[22] S. Tweedie. "Journaling the Linux ext2fs Filesystem." The Fourth Annual Linux Expo. 1998.
[23] Y. Xia, K. Fairbanks, H. Owen. "Establishing Trust in Black-Box Programs." IEEE SouthEast Conference, 2007.