Linux Filesystems Ext2, Ext3

Nafisa Kazi


What is a Filesystem
• A filesystem:
  – Stores files and the data in them
  – Organizes data for easy access
  – Stores information about files, such as size, file permissions, owner and creation time
  – May use a storage device such as a hard disk or CD-ROM
    • This involves maintaining the physical location of the files
  – Could be virtual and exist only as an access method for virtual data or for data over a network (e.g. NFS)

Linux File System History
• Minix: the first file system for Linux
  – Restrictive and lacked performance
  – Filenames longer than 14 characters not allowed
  – Maximum filesystem (and therefore file) size was 64 MB
• EXT (Extended File System): the first file system designed specifically for Linux
  – Introduced in April 1992
  – Still lacked performance
• In 1993, the Second Extended File system (Ext2) was added
• In 1999, the Third Extended File system (Ext3) was developed by Stephen Tweedie

Linux File System History (cont’d.)
• VFS (Virtual File System): developed when the EXT filesystem was added
  – VFS allows Linux to support many different file systems
  – Each file system presents a common software interface to the VFS
  – All the details of the various file systems are translated by software
• All file systems appear identical to the rest of the Linux kernel

VFS

• For example: cp /floppy/TEST /tmp/test
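The example works because cp never talks to a particular filesystem: it issues generic open/read/write calls, and the VFS forwards each one to whichever filesystem holds the path (for instance, /floppy might be an MS-DOS diskette and /tmp an Ext2 directory). A minimal Python sketch of such a copy loop, using only the standard os module; the function name vfs_copy is hypothetical:

```python
import os


def vfs_copy(src: str, dst: str, chunk: int = 4096) -> int:
    """Copy src to dst using only generic file system calls.

    Nothing here depends on which filesystem holds src or dst: the
    kernel's VFS dispatches open/read/write to the right driver.
    Returns the number of bytes copied.
    """
    inf = os.open(src, os.O_RDONLY)
    try:
        outf = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            total = 0
            while True:
                buf = os.read(inf, chunk)
                if not buf:  # read() returns b"" at end of file
                    break
                # os.write may write fewer bytes than asked; loop until done
                view = memoryview(buf)
                while view:
                    n = os.write(outf, view)
                    view = view[n:]
                total += len(buf)
            return total
        finally:
            os.close(outf)
    finally:
        os.close(inf)
```

Nothing in the loop names a filesystem type; that is exactly the abstraction the VFS provides.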

VFS: Superblocks and I-nodes
• VFS describes the system’s files in terms of superblocks and inodes
• The VFS i-nodes:
  – Describe files and directories within the system
• The VFS superblocks:
  – As each file system is initialized at boot time, it registers itself with the VFS
  – Each file system type’s superblock read routine maps the filesystem’s topology onto a VFS superblock
  – VFS keeps a list of the mounted file systems and their VFS superblocks
  – Each VFS superblock contains a pointer to the first VFS inode on the file system
  – As the system’s processes access directories and files, system routines are called that traverse the VFS inodes
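The registration and mount steps above can be modelled as a registry of filesystem types plus a list of mounted superblocks. A sketch in Python; all class and function names are hypothetical analogues (the kernel has a C function called register_filesystem, but this is not its API), and the toy "ext2" driver only fabricates a root inode:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VFSInode:
    ino: int
    is_dir: bool


@dataclass
class VFSSuperblock:
    fs_type: str
    device: str
    root: VFSInode  # pointer to the first VFS inode on this filesystem


@dataclass
class FileSystemType:
    name: str
    # Filesystem-specific routine: read the on-disk superblock of a device
    # and map its topology onto a generic VFSSuperblock.
    read_super: Callable[[str], VFSSuperblock]


_registered: Dict[str, FileSystemType] = {}
mounted: List[VFSSuperblock] = []  # the VFS list of mounted filesystems


def register_filesystem(fst: FileSystemType) -> None:
    _registered[fst.name] = fst


def mount(fs_name: str, device: str) -> VFSSuperblock:
    if fs_name not in _registered:
        raise ValueError(f"unknown filesystem type {fs_name!r}")
    sb = _registered[fs_name].read_super(device)
    mounted.append(sb)
    return sb


# Toy driver: Ext2's root directory is inode 2.
register_filesystem(FileSystemType(
    "ext2", lambda dev: VFSSuperblock("ext2", dev, VFSInode(ino=2, is_dir=True))))
```

The VFS code (mount) never inspects on-disk formats; it only calls the routine each filesystem type registered.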

Logical Diagram of VFS


Caching in VFS
• I-node cache:
  – Repeatedly accessed inodes are kept in the inode cache for quicker access
• Directory cache:
  – VFS also keeps a cache of directory lookups so that the inodes for frequently used directories can be found quickly
  – Stores directory name ⇒ i-node mappings
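A directory cache is essentially a bounded map from names to inode numbers. A minimal sketch with least-recently-used eviction, assuming entries are keyed by (parent directory inode, name) so that equal names in different directories don't collide; the class name and the lookup_on_disk callback are hypothetical:

```python
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class DirectoryCache:
    """Toy dcache: (parent inode, name) -> inode number, LRU-evicted."""

    def __init__(self, capacity: int,
                 lookup_on_disk: Callable[[int, str], Optional[int]]):
        self.capacity = capacity
        self.lookup_on_disk = lookup_on_disk  # slow path: read the directory
        self.entries: "OrderedDict[Tuple[int, str], int]" = OrderedDict()
        self.misses = 0

    def lookup(self, parent: int, name: str) -> Optional[int]:
        key = (parent, name)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        ino = self.lookup_on_disk(parent, name)
        if ino is not None:
            self.entries[key] = ino
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return ino
```

A second lookup of the same name is answered from the cache without touching the directory on disk.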

Caching in VFS (cont’d.)
• Buffer cache:
  – Caches data buffers from the devices to help speed up access
  – Makes the Linux file systems independent of the underlying media and of the device drivers that support them
  – Is integrated with the block device interface
  – Read requests from a filesystem result in block device drivers reading physical blocks from the devices they control
  – These blocks are saved in the global buffer cache and are shared by all filesystems
  – Buffers are identified by their block number and a unique identifier for the device that read them
  – Filesystems don’t have to go to the device if a block is already in the cache
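The buffer-cache rules above (one global cache, buffers identified by device and block number, no device access on a hit) fit in a few lines. A sketch in which the read_block callback stands in for a block device driver; the class name and the bread method name are hypothetical Python analogues, not a kernel API:

```python
from typing import Callable, Dict, Tuple


class BufferCache:
    """Toy global buffer cache shared by all filesystems.

    Buffers are identified by (device id, block number).
    """

    def __init__(self, read_block: Callable[[int, int], bytes]):
        self.read_block = read_block  # the block device driver's read routine
        self.buffers: Dict[Tuple[int, int], bytes] = {}
        self.device_reads = 0

    def bread(self, dev: int, block: int) -> bytes:
        key = (dev, block)
        if key not in self.buffers:  # miss: ask the driver for the block
            self.device_reads += 1
            self.buffers[key] = self.read_block(dev, block)
        return self.buffers[key]  # hit: no device access needed
```

Because the device id is part of the key, the same block number on two different disks never aliases.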

Ext2 Disk Data Structures
• The first block in each Ext2 partition is reserved for the partition boot sector
• The rest of the space is split into block groups, each with the following layout: superblock, group descriptors, data block bitmap, inode bitmap, inode table, data blocks
• All the block groups have the same size and are stored sequentially
  – The kernel can derive the location of a block group on disk simply from its integer index
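Because every group has the same number of blocks, locating group g is a single multiplication. A sketch, assuming the usual Ext2 conventions: the first data block is 1 when blocks are 1 KiB (the superblock then occupies block 1) and 0 for larger blocks, and a group spans 8 × block size blocks so that its block bitmap fits in one block. The function names are hypothetical:

```python
def group_start_block(group: int, blocks_per_group: int,
                      first_data_block: int) -> int:
    """Absolute number of the first block of block group `group`."""
    return first_data_block + group * blocks_per_group


def group_count(blocks_count: int, blocks_per_group: int,
                first_data_block: int) -> int:
    """Number of block groups needed to cover the filesystem.

    Rounds up, because in practice the final group may be cut short
    by the end of the partition.
    """
    return -(-(blocks_count - first_data_block) // blocks_per_group)
```

For example, with 1 KiB blocks (8192 blocks per group, first data block 1), group 3 starts at block 1 + 3 × 8192 = 24577.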

Ext2 Superblock
• Contains a description of the file system
• Duplicated in each block group
• The superblock and the group descriptors in block group 0 are used when the filesystem is mounted
• Some important information held in the superblock:
  – Magic Number
    • Identifies the filesystem type (0xEF53 for Ext2)
  – Block Group Number
    • The number of the block group that holds this copy of the superblock
  – Block Size
    • The size of the block for this file system, in bytes
  – Blocks per Group
    • The number of blocks in a group; fixed when the file system is created
  – Free Blocks
    • The number of free blocks in the file system
  – Free Inodes
    • The number of free inodes in the file system
  – First Inode
    • The inode number of the first inode in the file system
    • The first inode in an Ext2 root file system would be the directory entry for the ‘/’ directory
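Most of the fields listed above can be read straight out of a disk image. A minimal Python sketch, assuming a little-endian Ext2 image whose primary superblock starts at byte offset 1024 and using the field offsets of the kernel's struct ext2_super_block; the Ext2Superblock class and parse_superblock function are hypothetical helpers:

```python
import struct
from dataclasses import dataclass

EXT2_SUPER_MAGIC = 0xEF53
SUPERBLOCK_OFFSET = 1024  # primary superblock starts 1024 bytes in


@dataclass
class Ext2Superblock:
    inodes_count: int
    blocks_count: int
    free_blocks_count: int
    free_inodes_count: int
    first_data_block: int
    block_size: int
    blocks_per_group: int
    inodes_per_group: int
    magic: int
    block_group_nr: int


def parse_superblock(image: bytes) -> Ext2Superblock:
    raw = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    # Offsets 0..27: seven little-endian 32-bit counters.
    (inodes, blocks, _reserved_blocks, free_blocks, free_inodes,
     first_data, log_bsize) = struct.unpack_from("<7I", raw, 0)
    blocks_per_group, = struct.unpack_from("<I", raw, 32)
    inodes_per_group, = struct.unpack_from("<I", raw, 40)
    magic, = struct.unpack_from("<H", raw, 56)
    group_nr, = struct.unpack_from("<H", raw, 90)  # which copy this is
    if magic != EXT2_SUPER_MAGIC:
        raise ValueError(f"bad magic 0x{magic:04X}: not an Ext2 filesystem")
    # The block size is stored as a shift: 1024 << s_log_block_size.
    return Ext2Superblock(inodes, blocks, free_blocks, free_inodes,
                          first_data, 1024 << log_bsize, blocks_per_group,
                          inodes_per_group, magic, group_nr)
```

Checking the magic number first is what lets mount code reject a partition that does not hold an Ext2 filesystem.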

Ext2 Group Descriptor and Bitmap
• The group descriptors for all of the block groups are duplicated in each block group. Each descriptor contains the locations of its group’s:
  – Block bitmap
  – Inode bitmap
  – Inode table
• The bitmaps are sequences of bits
  – Value 0 specifies that the corresponding inode or data block is free
  – Value 1 specifies that the corresponding inode or data block is used
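Allocating an inode or block then means finding a 0 bit and setting it to 1. A sketch, assuming Ext2’s bit order (bit i lives in byte i // 8 at position i % 8, least significant bit first); the function names are hypothetical, and a real allocator also tries to place a file’s blocks near each other rather than simply taking the first free bit:

```python
def find_and_set_free(bitmap: bytearray, nbits: int) -> int:
    """Allocate the first free entry tracked by `bitmap`.

    0 = free, 1 = in use. Returns the bit index, or -1 if all are used.
    """
    for i in range(nbits):
        byte, bit = divmod(i, 8)
        if not bitmap[byte] & (1 << bit):
            bitmap[byte] |= 1 << bit  # mark as used
            return i
    return -1


def release(bitmap: bytearray, i: int) -> None:
    """Mark entry i as free again."""
    byte, bit = divmod(i, 8)
    bitmap[byte] &= ~(1 << bit) & 0xFF
```

The same two helpers serve both the block bitmap and the inode bitmap; only the meaning of the index differs.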

Inodes
• Every file and directory in the file system is described by one inode
• The inodes for each block group are kept in that group’s inode table, and the inode bitmap records which of them are in use. The inode contains the following fields:
  – Mode
    • What the inode describes and the permissions that users have to it
  – Owner information
    • The user and group identifiers of the owner
  – Size
    • The size of the file in bytes
  – Timestamps
    • The time that the inode was created and the last time that it was modified
  – Data blocks
    • Pointers to the blocks that contain the data that this inode describes
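Since each block group has an inode table of fixed size, an inode number alone determines where the inode is stored. A sketch, assuming inode numbers start at 1 (as in Ext2) and that the per-group inode-table start blocks have already been read from the group descriptors; locate_inode and InodeLocation are hypothetical names:

```python
from typing import List, NamedTuple


class InodeLocation(NamedTuple):
    group: int   # block group whose inode table holds the inode
    block: int   # absolute block number inside that inode table
    offset: int  # byte offset of the inode within that block


def locate_inode(ino: int, inodes_per_group: int, inode_size: int,
                 block_size: int,
                 inode_table_start: List[int]) -> InodeLocation:
    if ino < 1:
        raise ValueError("Ext2 inode numbers start at 1")
    group, index = divmod(ino - 1, inodes_per_group)
    byte_off = index * inode_size
    return InodeLocation(group,
                         inode_table_start[group] + byte_off // block_size,
                         byte_off % block_size)
```

For instance, with 2048 inodes per group, 128-byte inodes and 1 KiB blocks, inode 2 is the second slot of group 0’s inode table, 128 bytes into its first block.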

Inode structure

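In Ext2 the data-block pointers in an inode form a small tree: 12 direct pointers, then one single-, one double- and one triple-indirect pointer, where each indirect block holds block_size / 4 four-byte block numbers. A sketch of mapping a file’s logical block number to its path through that tree; the function name block_path is hypothetical:

```python
from typing import List


def block_path(n: int, block_size: int) -> List[int]:
    """Return the inode pointer slot, then indices inside indirect blocks."""
    p = block_size // 4  # 32-bit block numbers per indirect block
    if n < 12:
        return [n]  # direct pointer
    n -= 12
    if n < p:
        return [12, n]  # via the single indirect block
    n -= p
    if n < p * p:
        return [13, n // p, n % p]  # via the double indirect block
    n -= p * p
    if n < p ** 3:
        return [14, n // (p * p), (n // p) % p, n % p]  # triple indirect
    raise ValueError("logical block beyond the Ext2 addressing limit")
```

With 1 KiB blocks (256 pointers per block), logical block 12 is the first one reached through the single indirect block: block_path(12, 1024) returns [12, 0].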

Consistency Check Problem with Ext2
• Updates to filesystem blocks are kept in dynamic memory before being flushed to disk
• A power failure might leave the filesystem in an inconsistent state
• To overcome this problem, each filesystem is checked (and fixed) before it is mounted
  – The utility is called fsck
  – It runs upon reboot after a system crash
• Does not scale well
  – With today’s large disks and filesystems, fsck can take many hours to perform a consistency check
  – Unacceptable in a production environment

Ext3 Filesystem
• Ext3 is a journaling filesystem
  – Goal of a journaling filesystem:
    • To avoid time-consuming consistency checks during system start-up after an ungraceful termination
  – Main idea:
    • First write blocks to a special area of the disk called the journal
    • Then write the blocks from the journal to the filesystem
  – Examples of journaling file systems:
    • SGI’s XFS and IBM’s JFS
• Ext3 is as compatible as possible with the Ext2 filesystem
  – Fairly easy to migrate between Ext2 and Ext3

Journaling Filesystem (JFS)
• Two-step procedure for performing a high-level change to the filesystem:
  – Step 1: Committing to the journal
    • Keeps track of the information to be written to the hard drive in a journal
    • A copy of the blocks to be written is stored in the journal
  – Step 2: Committing to the filesystem
    • When the I/O transfer to the journal is complete, the blocks are written to the filesystem
    • When the I/O transfer to the filesystem is complete, the copies of the blocks in the journal are discarded
• The journal allows quick recovery of the filesystem after a crash
  – No need to scan the entire disk; only the journal area is scanned
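The two-step procedure can be modelled with a dictionary for the disk and a list for the journal area. A toy sketch, assuming that step 1 ends by appending a commit record once all block copies are in the journal (the record format and the class name JournaledDisk are hypothetical):

```python
from typing import Dict, List, Tuple


class JournaledDisk:
    """Toy model of the two-step journaling procedure."""

    def __init__(self) -> None:
        self.fs: Dict[int, bytes] = {}  # block number -> contents on disk
        self.journal: List[Tuple] = []  # ("block", n, data) or ("commit",)

    def commit_to_journal(self, blocks: Dict[int, bytes]) -> None:
        # Step 1: copy every block of the high-level change into the
        # journal, then append a commit record once all copies are there.
        for n, data in blocks.items():
            self.journal.append(("block", n, data))
        self.journal.append(("commit",))

    def commit_to_filesystem(self) -> None:
        # Step 2: write the journaled copies to their real locations...
        for rec in self.journal:
            if rec[0] == "block":
                self.fs[rec[1]] = rec[2]
        # ...and only when that I/O is done, discard the journal copies.
        self.journal.clear()

    def change(self, blocks: Dict[int, bytes]) -> None:
        self.commit_to_journal(blocks)
        self.commit_to_filesystem()
```

If the system stops between the two steps, the journal still holds a complete copy of the change, which is what makes recovery possible without scanning the whole disk.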

System Recovery with JFS
• Two cases for system recovery:
  – Case 1: the system failure occurred before a commit to the journal
    • Either the copies of the blocks belonging to the change are missing from the journal or they are incomplete
    • In both cases, fsck ignores them
    • Result: the high-level change to the filesystem is lost, but the filesystem state is still consistent
  – Case 2: the system failure occurred after a commit to the journal
    • The copies of the blocks are valid, and fsck writes them into the filesystem
    • Result: fsck applies the whole change, thus fixing every inconsistency due to unfinished I/O data transfers into the filesystem
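Both recovery cases fall out of one rule: replay a group of block copies only if a commit record follows it. A sketch, assuming a journal of ("block", n, data) records each group closed by ("commit",); the record format and the function name recover are hypothetical:

```python
from typing import Dict, List, Tuple


def recover(fs: Dict[int, bytes], journal: List[Tuple]) -> int:
    """Replay the journal after a crash; return the changes replayed."""
    pending: List[Tuple[int, bytes]] = []
    replayed = 0
    for rec in journal:
        if rec[0] == "block":
            pending.append((rec[1], rec[2]))
        elif rec[0] == "commit":
            # Case 2: the change was committed, so apply all of it.
            for n, data in pending:
                fs[n] = data
            pending = []
            replayed += 1
    # Case 1: copies after the last commit are incomplete; ignore them.
    journal.clear()
    return replayed
```

Only the journal is read, never the whole filesystem, which is why recovery time depends on the journal size rather than the disk size.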

Journaling Modes
• Logging every block to the journal leads to a significant performance penalty
• Therefore, Ext3 lets the system administrator decide which kinds of blocks are logged
• This gives rise to three journaling modes:
  – Journal
  – Ordered
  – Writeback
• The journaling mode is specified as an option to the mount command
  – Example: mount -t ext3 -o data=writeback /dev/wd0a /jdisk

Journal Journaling Mode
• All filesystem data and metadata are logged into the journal
  – Metadata includes superblocks, group descriptors, inodes, and block and inode bitmaps
• Minimizes the loss of updates made to each file
• Requires additional disk accesses
  – Example: when a new file is created, all its data blocks are duplicated as log records
• Safest but slowest mode

Ordered Journaling Mode
• Only changes to filesystem metadata are logged to the journal
• Metadata and related data blocks are grouped
  – Data blocks are written to disk before the metadata is written to disk
• Two cases of changes to a file:
  – Case 1: appending to a file
    • If the system crashes after the data blocks are written to disk but before the metadata is committed, the metadata will not reflect the change
    • Hence the file is consistent, though the changes to it are lost
  – Case 2: overwriting part of a file
    • No guarantee that blocks are written to disk in order
      – Thus, one cannot assume that because overwritten block ‘x’ was updated, overwritten block ‘x-1’ was updated as well
    • No changes to metadata (block allocation bitmap)
    • Hence no way of knowing whether the file is consistent or not
• Default journaling mode for the Ext3 filesystem
  – Works out fine in practice, as appending to a file is much more common than overwriting the middle of a file

Writeback Journaling Mode
• Only changes to filesystem metadata are logged
• Does not wait for the associated changes to file data to be written
• Example: after a crash, a file’s data may not match its metadata
  – The block allocation bitmap marks the data blocks as occupied, but the updated data had not been written when the system went down
  – This isn’t fatal, but can be disappointing to users
• Fastest mode
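One compact way to compare the three modes is the order in which a single change’s new data and new metadata reach the disk. The sketch below is a deliberately simplified model of the three slides above (it ignores batching of many changes into one commit); the step labels and the function name write_order are invented for illustration:

```python
from typing import List


def write_order(mode: str) -> List[str]:
    """Simplified write ordering for one change under each Ext3 mode."""
    if mode == "journal":
        # Data and metadata both pass through the journal first.
        return ["data->journal", "metadata->journal", "commit",
                "data->fs", "metadata->fs"]
    if mode == "ordered":
        # Data goes straight to its final place before the metadata
        # commit; only metadata is journaled.
        return ["data->fs", "metadata->journal", "commit", "metadata->fs"]
    if mode == "writeback":
        # Only metadata is journaled; data is written with no ordering
        # guarantee, so it may still be missing when the commit happens.
        return ["metadata->journal", "commit", "metadata->fs",
                "data->fs (unordered)"]
    raise ValueError(f"unknown mode {mode!r}")
```

Counting the journal writes in each list mirrors the safety/speed trade-off: journal mode writes data twice, while ordered and writeback write it once.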

Journaling Block Device Layer
• The Ext3 journal is usually stored in a hidden file named .journal in the root directory of the filesystem
• The journal is handled by a kernel layer called the Journaling Block Device (JBD)
• The Ext3 filesystem invokes JBD routines to ensure that its disk data structures don’t get corrupted in case of system failure

Interaction Between Ext3 and JBD
• JBD uses the same disk to log the changes performed by the Ext3 filesystem
• Thus JBD must protect itself from system failures that could corrupt the journal
• Hence, the interaction between Ext3 and JBD is based on three fundamental units:
  – Log records
  – Atomic operation handles
  – Transactions
• Log record
  – Describes a single update of a disk block
  – Describes a low-level operation issued by the filesystem
  – Represented inside the journal as blocks of data or metadata

Atomic Operation Handles
• Log records of a set of low-level operations that correspond to a single high-level change of the filesystem
• Example: appending a block of data to a file involves many low-level operations
  – If a system failure occurs in the middle, the filesystem is left inconsistent
• Hence, when recovering from a system failure, either the whole high-level operation is applied or none of it is

Transactions
• All log records belonging to several atomic operation handles are grouped into a single transaction
• All log records of a transaction are stored in consecutive blocks of the journal
• JBD handles each transaction as a whole
• JBD reclaims the blocks used by a transaction only after all data in its log records has been committed to the filesystem
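The relationship between handles and transactions can be modelled in a few lines. A sketch under simplifying assumptions: blocks are dictionary entries, "writing to the journal" just keeps the committed transaction in a list, and the class names are hypothetical (they are not the JBD C API). In this sketch a block touched by several handles is logged once, with its latest contents:

```python
from typing import Dict, List


class Transaction:
    """Groups the log records of several atomic operation handles."""

    def __init__(self) -> None:
        self.handles: List[Dict[int, bytes]] = []
        self.committed = False

    def add_handle(self, updates: Dict[int, bytes]) -> None:
        # One handle = every block update of one high-level change.
        if self.committed:
            raise RuntimeError("cannot join a committed transaction")
        self.handles.append(dict(updates))

    def log_records(self) -> Dict[int, bytes]:
        merged: Dict[int, bytes] = {}
        for h in self.handles:  # later handles overwrite earlier ones
            merged.update(h)
        return merged


class Journal:
    def __init__(self) -> None:
        self.fs: Dict[int, bytes] = {}
        self.active: List[Transaction] = []  # committed, not yet reclaimed

    def commit(self, t: Transaction) -> None:
        t.committed = True
        self.active.append(t)  # its log records now occupy journal blocks

    def checkpoint(self) -> None:
        # Write all committed records to the filesystem; only then
        # reclaim the journal space the transactions used.
        for t in self.active:
            self.fs.update(t.log_records())
        self.active.clear()
```

Because a transaction is committed as a unit, every handle inside it is either fully replayed after a crash or not at all.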
