Resource Volume Management for Shared File System in SAN Environment

Seung-Ho Lim, Joo Young Hwang, Kyung Ho Kim, Jupyung Lee and Kyu Ho Park
Computer Engineering Research Lab., EECS, Korea Advanced Institute of Science and Technology, Taejon, Korea
{shlim, jyhwang, kyhkim, jplee}@core.kaist.ac.kr, [email protected]

Abstract

In this paper, a volume management scheme that supports multi-host environments is designed and implemented. The proposed scheme provides various features for enterprise internet services, such as online resizing, online migration, and an enhanced configuration scheme that considers workload access patterns and disk geometry. SANfs is a shared file system in which multiple hosts can share multiple disks via a storage area network (SAN). SANfs is designed to provide high storage I/O performance and reliability to support internet web servers, but several of its features are limited by traditional storage management technology; therefore, a volume management scheme for SANfs is required. The volume manager designed in this paper, called SANfs-VM, can maximize the performance of SANfs and provide it with high storage volume scalability and availability. It can give SANfs maximum system uptime and virtually unlimited storage capacity through online resizing management. SANfs-VM also makes it possible to exploit the disk characteristics and the access patterns of system workloads through an enhanced configuration scheme.

Keywords: LVM, SAN, shared file system, online resizing, web server

1. Introduction

Recently, with the explosive growth of e-business and multimedia services, the amount of data that internet servers must maintain has become enormous. To help cope with this problem, advanced switching technologies such as Fibre Channel have been suggested. Fibre Channel is a serial interface capable of gigabit transfer rates at distances of up to 10 km [3]. Hundreds of Fibre Channel host computers and disks can share bandwidth across Fibre Channel loops or switches. A storage area network (SAN) is the combination of network-attached Fibre Channel storage devices and computers with Fibre Channel adapters on a loop or fabric. Fibre Channel combines the high channel bandwidth of disk interfaces with the high connectivity of network interfaces.

(This work is supported by the NRL project, Ministry of Science and Technology, Korea.)

Each host computer in a SAN can access the Fibre Channel disks effectively, just as it accesses local disks. The SAN itself, however, provides no file consistency mechanism for sharing the storage system. The shared storage made available on a SAN raises the possibility of a shared file system [7]. Two approaches exist for coordinating operations on the shared file system: symmetric and asymmetric. SANfs [14] is an asymmetric shared file system in which operations are synchronized through a file manager. SANfs is designed to provide high storage I/O performance and reliability to support internet cluster web servers, but its ability to provide users of enterprise servers with maximum uptime, optimal performance, and high storage capacity and file size is limited by traditional storage management technology.

In a shared storage environment, a logical volume is a grouping of several storage devices into what appears to be a single large storage device. The logical volume is represented as a block device node and can be used like a real device [17]. The logical volume manager (LVM) manages the relation between the logical volume and the physical volumes, and every I/O request to the logical volume must be mapped to a physical volume before it reaches the low-level device drivers. The LVM stores data in the underlying devices from beginning to end according to its configuration level. Fig. 1 presents a comparison between a traditional system and an LVM-based system.

In this paper, we propose a new volume management scheme that supports multiple hosts in SAN environments. This scheme provides online resizing, online migration, and adaptive configuration that considers workload access patterns and disk geometry for multimedia cluster web servers. The designed volume manager, called SANfs-VM, can maximize the performance of the SANfs file system and provide SANfs with high scalability and availability in terms of storage volume. It can give SANfs maximum system uptime and virtually unlimited storage capacity through online resizing management, and the load of SANfs can be balanced using an online migration scheme. SANfs-VM also makes it possible to exploit the disk characteristics and the access patterns of system workloads through an enhanced configuration scheme.

Fig. 1. Traditional system vs. LVM-based system

The remainder of this paper is organized as follows. In Section 2, we review related research. In Section 3, we give an overview of the SANfs shared file system. Section 4 describes SANfs-VM and its features. Section 5 presents the experimental results. Section 6 summarizes our research and suggests directions for future work.

2. Related Works

Logical volume managers have long been key components of storage systems. Veritas Software developed a prototype volume manager that became a reference implementation for the OS-specific LVMs of IBM, HP, Digital, Sun, and others. LinuxLVM [17] is very similar to those OS-specific LVMs and provides the advanced configuration and management tools many users are accustomed to. Constructing flexible virtual views of storage with LinuxLVM is possible because of the abstraction layers formalized in the standard. The user-space tools used to configure each virtual level follow a similar set of operations, making online allocation or deallocation of storage to virtual groups possible. Most of these tools, however, are targeted at the single-system environment.

Other volume managers for the SAN environment are GFS's [10] Pool Driver [18] and SANtopia's SVM (SANtopia Volume Manager) [19]. The Pool Driver can be used to share SAN storage devices under the GFS file system. It was created for Linux and joins a collection of individual disk partitions into logically contiguous block devices. It is a mid-level block driver built on top of the SCSI and FC drivers. Like LinuxLVM, the Pool Driver's user tools are used to build a logical block device. To support multi-host systems, the Pool Driver services SCSI commands for the lock object, called Dlock, which is the locking mechanism used for GFS file sharing. The Pool Driver has some drawbacks in volume management: it supports only RAID levels 0 and 1, and it cannot support online resizing or reconfiguration, which are very important features for enterprise users.

SANtopia's SVM manages the storage of the SANtopia shared file system. The SANtopia volume manager supports various software RAID levels such as RAID 0, 1, and 5. The software RAID technique provides users with flexible configurations built from many small physical disk drives. In addition, users of the SANtopia volume manager can resize logical volumes online. When a volume is resized, the newly added part can either be unified with the old configuration or be given its own configuration; the unified method causes blocks within the original configuration to be moved into the new partitions. More research is still needed on the SANtopia volume manager, however: taking advantage of special disk characteristics and of load balancing is necessary to maximize workload efficiency.

3. Overview of SANfs

SANfs [14] is based on an asymmetric architecture. Multiple hosts and disks are connected to a Fibre Channel SAN, and all the hosts access the disks attached to the SAN directly. Data consistency and file system metadata are managed by a centralized file manager, called a metaserver. To prevent the metaserver from being a single point of failure, it is backed up by a shadow metaserver, which takes over the role of the metaserver when the metaserver fails. The data consistency protocol should be efficient to avoid file manager bottlenecks. In SANfs, an efficient file-level lease is deployed, and free block management is distributed to each host, reducing the communication between the host computers and the file manager.

The software architecture of SANfs is shown in Fig. 2. In this architecture, the client side of SANfs consists of lock management, directory management, and regular file I/O modules. The file lock/metadata messaging module exchanges messages with the metaserver to keep data consistent and to obtain metadata. SANfs-VM is composed of the Logical Volume Driver and the Driver Manager. The Logical Volume Driver is a virtual block device driver, created by the Driver Manager, that provides the SANfs file system module with a logical address space. The Driver Manager operates the Logical Volume Driver according to the user's tools. This centralized management scheme simplifies the design and facilitates implementation.

3.1 Data consistency and metadata management

Data consistency mechanisms are an essential part of SAN environments in which storage devices are shared directly. Since disk data is accessed directly by the hosts, data transfer is highly scalable. In contrast, the metadata is managed by a centralized metaserver. To achieve high scalability, the consistency and metadata management should be designed efficiently to prevent the metaserver from becoming a bottleneck.

Fig. 2. SANfs Architecture

Data consistency models have been studied at length in previous distributed file systems. We utilize a file-level lease method [6] for SANfs to provide a consistent view of a file's data to many host computers while imposing only a small overhead on the metaserver's locking system. The metaserver does not block indefinitely waiting for the file locks of failed hosts to be returned: when a host fails, the metaserver preempts the file locks owned by that host after the expiration time of the locks. A decentralized free block management scheme is also used. A global zone map is managed by the centralized metaserver, and every host is assigned a chunk of free blocks at a time. Local free block requests are served from this free block chunk, which reduces the amount of communication between the metaserver and each host computer.

4. SANfs-VM: SANfs Volume Manager

SANfs-VM is a block device driver that presents logical addresses to the SANfs file system and maps them onto physical addresses. All reads and writes to block devices occur in chunks defined by the file system block size. The block size of the SANfs file system is 4 KB, but this value may be changed. Each of these block I/O requests passes through the ll_rw_block() routine in the Linux kernel. The block mapping method of SANfs-VM, which converts the logical address of a block into a physical address on a real disk, is the same as LinuxLVM's. SANfs-VM uses an index entry consisting of a real device number and a real sector: the real device number identifies the disk that contains the block, and the real sector identifies where the data is located on that disk. The mapping unit produced by this indexing is called an extent. One extent may be composed of one or several file system blocks, and using these extents, SANfs-VM can create the mapping table according to its configuration level.
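As a rough illustration of this extent-based mapping, the sketch below shows how a logical sector could be translated through such a table. The structure layout, extent size, and function names are our own assumptions for illustration only, not the actual SANfs-VM code.

    /*
     * Minimal sketch of extent-based mapping: each extent of the logical
     * volume records the real device it lives on and its starting sector
     * on that device.  Extent size and names are illustrative assumptions.
     */
    #include <stdint.h>

    #define SECTORS_PER_EXTENT 8192   /* e.g. 4 MB extents with 512-byte sectors */

    struct extent_map {
        uint32_t real_dev;       /* index of the physical disk holding this extent */
        uint64_t real_start;     /* first sector of the extent on that disk        */
    };

    /* Translate a logical sector into a (device, sector) pair via the table. */
    static void map_logical_sector(const struct extent_map *table,
                                   uint64_t logical_sector,
                                   uint32_t *dev, uint64_t *sector)
    {
        uint64_t extent = logical_sector / SECTORS_PER_EXTENT;   /* table index    */
        uint64_t offset = logical_sector % SECTORS_PER_EXTENT;   /* within extent  */

        *dev    = table[extent].real_dev;
        *sector = table[extent].real_start + offset;
    }

The translation amounts to an index lookup plus a division and a modulo, which matches the small CPU cost discussed in Section 5.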

The main features of SANfs-VM are an enhanced configuration method and online management functions such as resizing and migration. The enhanced configuration method can produce near-optimal disk utilization, and online management can yield maximum system uptime and practically unlimited storage capacity.

4.1 Enhanced configuration method

For disk configuration, there are various schemes such as linear and RAID-level configurations. SANfs-VM supports concatenation (linear), stripe (RAID-0), and stripe-with-parity (RAID-5) configurations. Fig. 3 shows examples of these configuration methods. In linear mode, the collected physical disks store data sequentially, one after another; in stripe mode, the data are interleaved across the physical disks. In a RAID-level configuration, the stripe size denotes the maximum amount of logically contiguous data stored on a single disk. Striping data across the disks of a disk array introduces significant performance benefits, mainly because requests can be served by several disks in parallel, and the degree of parallelism and the per-disk transfer behavior are determined by the stripe size. When users configure a RAID level on a disk array, the stripe size of the volume must therefore be chosen carefully to utilize the array efficiently [21][22][23][24].

To fully utilize a disk array, SANfs-VM can configure multiple stripe sizes within one RAID-level volume, based on two considerations. The first is disk geometry. A widely adopted technique in modern disks is zone-bit recording (ZBR). ZBR disks have more sectors and a higher data transfer rate on outer tracks than on inner tracks. This characteristic divides the cylinders into several zones, giving the outer zones higher storage capacity and transfer rates than the inner zones; consequently, a ZBR disk has variable bandwidth depending on the zone in which the disk head is currently positioned. The other consideration is the access pattern of the workloads. In the case of multimedia servers, several classes of service data exist, such as video, audio, picture, and text. Video and audio are periodic services consisting of large, contiguous data sets that require high bandwidth. Text and picture data are aperiodic services with small data sets in comparison with video data; these service types require a fast response from the server rather than high bandwidth.
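For reference, the sketch below shows how a striped (RAID-0) volume with a given stripe size maps a logical block to a disk and an offset, in the spirit of the stripe-size definition above. The structure and function names, and the round-robin layout, are assumptions made for illustration, not SANfs-VM's actual mapping code.

    /*
     * Illustrative RAID-0 (stripe) address mapping for a volume of `ndisks`
     * disks with a stripe size of `stripe_blocks` file system blocks.
     */
    #include <stdint.h>

    struct stripe_loc {
        uint32_t disk;     /* which physical disk holds the block */
        uint64_t block;    /* block offset on that disk           */
    };

    static struct stripe_loc stripe_map(uint64_t logical_block,
                                        uint32_t ndisks,
                                        uint32_t stripe_blocks)
    {
        uint64_t stripe_no = logical_block / stripe_blocks;  /* which stripe unit */
        uint64_t in_stripe = logical_block % stripe_blocks;  /* offset inside it  */

        struct stripe_loc loc;
        loc.disk  = (uint32_t)(stripe_no % ndisks);                    /* round-robin disk   */
        loc.block = (stripe_no / ndisks) * stripe_blocks + in_stripe;  /* row * size + offset */
        return loc;
    }

With a small stripe size, consecutive logical blocks spread across more disks (higher parallelism, smaller per-disk transfers); with a large stripe size, a sequential request stays on one disk longer, which favors large transfers.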

Fig. 3. Example of various configuration methods

Fig. 4. The sequence of online resizing

Fig. 5. Layout of the SANfs file system

To get optimal performance from the disk configuration, these data sets must be separated according to their service type. SANfs-VM binds up matching zones across the ZBR disk array: outer zone to outer zone and inner zone to inner zone. For each disk zone, SANfs-VM can then apply a different stripe size. Inner zones can be configured with a small stripe size; the smaller the stripe size, the higher the disk parallelism, because the data unit to be stored is small and scattered across the disks. In this case a large number of I/O operations per second can be served, but the data transfer rate of each stripe unit is not favorable, because a set of sequential data is divided into several stripe units stored on separate disks, and the bandwidth of the inner zone is in any case lower than that of the outer zone. Text data requires a fast response rather than a high data transfer rate, so small I/O requests such as text are well suited to inner zones with small stripe sizes. On the other hand, if the stripe size is large, disk parallelism is reduced; however, a large stripe size preserves the continuity of the stored data better than a small one, so its data transfer rate is higher. Therefore, outer zones can be configured with a large stripe size, and services requiring high data transfer rates, such as video and audio, can be stored in the outer zones. Using this enhanced configuration method, disk utilization can be improved in comparison with the conventional RAID configuration method.
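The zone-aware policy above can be summarized in a few lines. The sketch below is only an illustration under assumed names and values: the two-zone split and the 32 KB / 512 KB stripe sizes mirror the settings used in Section 5, and the enum and structure are not taken from SANfs-VM.

    /*
     * Illustrative zone-aware placement policy: small aperiodic requests
     * (text/picture) go to inner zones with a small stripe size, while
     * periodic bandwidth-hungry streams (video/audio) go to outer zones
     * with a large stripe size.  Names and values are assumptions.
     */
    enum service_class { SERVICE_TEXT, SERVICE_PICTURE, SERVICE_AUDIO, SERVICE_VIDEO };

    struct placement {
        int      outer_zone;          /* 1 = outer zone of the ZBR disks, 0 = inner */
        unsigned stripe_size_kb;      /* stripe size to use for this class          */
    };

    static struct placement choose_placement(enum service_class cls)
    {
        struct placement p;
        if (cls == SERVICE_VIDEO || cls == SERVICE_AUDIO) {
            p.outer_zone     = 1;     /* high sustained bandwidth, sequential access */
            p.stripe_size_kb = 512;
        } else {
            p.outer_zone     = 0;     /* small, scattered, latency-bound requests    */
            p.stripe_size_kb = 32;
        }
        return p;
    }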

When the RAID-5 configuration scheme is used in the SAN environment, parity blocks are computed by multiple computers independently; without specialized stripe locking support, the parity can become inconsistent. If the entire stripe is locked every time it is used, the overhead of stripe locking becomes very large. In our system, a special stripe locking scheme is used to maintain parity consistency efficiently in the SAN environment [16]. The parity consistency problem occurs only for a write-shared stripe, in which blocks are written by different hosts concurrently. Locking is not required for stripes that are not write-shared, and for this reason we classify stripes into trivial and non-trivial stripes. A stripe is trivial if all the blocks in the stripe are allocated to the same file or are not allocated at all; a stripe is non-trivial if its blocks are allocated to more than one file. In practical workloads the same file is rarely written concurrently by different hosts, so in many cases a stripe is trivial, and a large number of stripe lock operations for parity consistency can be avoided using this classification.
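A minimal sketch of this trivial/non-trivial test is shown below. The owner_of() helper, which stands for a lookup of the file (inode) owning a block with 0 meaning "free", and the stripe width are assumptions made for illustration, not part of the SANfs code.

    /*
     * Sketch of the trivial-stripe test: a stripe is trivial when all of
     * its blocks belong to one file or are free.
     */
    #include <stdint.h>

    #define BLOCKS_PER_STRIPE 8          /* illustrative value */

    extern uint64_t owner_of(uint64_t block);   /* 0 = free, else owning inode number */

    static int stripe_is_trivial(const uint64_t blocks[BLOCKS_PER_STRIPE])
    {
        uint64_t owner = 0;              /* first owner seen; 0 while only free blocks */

        for (int i = 0; i < BLOCKS_PER_STRIPE; i++) {
            uint64_t o = owner_of(blocks[i]);
            if (o == 0)
                continue;                /* free blocks never make a stripe non-trivial */
            if (owner == 0)
                owner = o;               /* remember the first owning file */
            else if (o != owner)
                return 0;                /* two different files share the stripe */
        }
        return 1;
    }

Only non-trivial, potentially write-shared stripes would then need to take the stripe lock before a parity update.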

4.2 Online resizing

There are many reasons why systems fail, and one of them is a lack of storage space. During the operation of a system, the user will need to increase or decrease the size of a volume. This feature is especially important for the administrator of an enterprise server. We refer to this operation as online resizing, and SANfs-VM supports it. Fig. 4 shows the sequence of SANfs-VM's online resizing. When the administrator requests the resizing of a volume, the SANfs system first resizes it at the block device level. In this step, the driver manager in the metaserver creates a new mapping table according to its configuration, sends this table to each host computer, and announces that the volume has been resized. The newly added region cannot be used by the SANfs file system directly, because the file system layer has no information about it yet, so no inconsistency problem arises from resizing at the block device level. There are two techniques for resizing the volume: one is the creation of a new logical volume, which involves creating a new mapping table for the newly added region, and the other is the unification of the newly added volume with the old volume.

The second step of online resizing is resizing the SANfs file system itself. The layout of the SANfs file system is illustrated in Fig. 5 and is similar to that of the ext2 file system. To support online resizing, the SANfs file system binds the inode and data blocks together and represents them as a group. The group descriptor contains the group's critical parameters, such as the number of inode blocks and data blocks, the size of the group, the number of inodes in the group, the allocation bitmaps of inodes and data, and the first data block. The superblock contains the information of all group descriptors and other metadata such as the total number of inodes, data blocks, and free blocks. This layout is controlled and managed by the file manager in the metaserver, and all the host computers must acquire the metadata from the file manager to use the resources of the SANfs file system. This means that online resizing of the SANfs file system is possible by modifying the superblock and adding new file system groups in the file manager of the metaserver. Once resizing at the block device level has been completed by the driver manager, the file manager creates a new group descriptor containing the information about the newly added region and updates the superblock with it. The new group is appended to the end of the existing layout of the SANfs file system. Because each host computer must know the total size of the file system, the superblock information in each host computer also has to be updated. Since the goal of online resizing is to allow the system to continue operating while the resizing is in progress, appropriate locking of the file system layout is needed. In the Linux kernel there is a single lock for the superblock, so we can hold this superblock lock to ensure that online resizing is safe.
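The sketch below illustrates the group-based layout and the append-a-group resize step just described. The field names and structure layouts are assumptions (the text only says the layout is similar to ext2), and a user-space mutex stands in for the kernel's superblock lock; this is not the actual SANfs on-disk format.

    /*
     * Illustrative group-based layout and file-system resize step.
     */
    #include <stdint.h>
    #include <pthread.h>

    struct group_desc {
        uint32_t inode_blocks;     /* number of inode blocks in the group */
        uint32_t data_blocks;      /* number of data blocks in the group  */
        uint32_t inode_count;      /* number of inodes in the group       */
        uint64_t first_data_block; /* first data block of the group       */
        /* allocation bitmaps for inodes and data would follow on disk */
    };

    struct superblock {
        uint64_t total_inodes;
        uint64_t total_blocks;
        uint64_t free_blocks;
        uint32_t group_count;
        struct group_desc *groups;     /* in-memory table of group descriptors      */
        pthread_mutex_t lock;          /* stand-in for the kernel's superblock lock */
    };

    /* Append a new group to the end of the layout while holding the superblock lock. */
    static void fs_append_group(struct superblock *sb, const struct group_desc *gd)
    {
        pthread_mutex_lock(&sb->lock);
        sb->groups[sb->group_count] = *gd;     /* assume the table has spare room */
        sb->group_count  += 1;
        sb->total_inodes += gd->inode_count;
        sb->total_blocks += gd->inode_blocks + gd->data_blocks;
        sb->free_blocks  += gd->data_blocks;
        pthread_mutex_unlock(&sb->lock);
    }

In the real system the updated superblock would then be propagated by the file manager to every host, as described above.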

Fig. 6. The resizing of the SANfs file system layout

5. Experimental results

In this section, we present an experimental performance evaluation of SANfs-VM and of the overall SANfs system. The experiments are divided into two categories. The first is an evaluation of normal block I/O operations on SANfs-VM; by comparing the system without SANfs-VM to the system with SANfs-VM, we can evaluate not only the overhead of SANfs-VM but also the performance of the overall SANfs system, such as its scalability and its tendency to develop a bottleneck. The second category evaluates the enhanced configuration method, considering disk geometry and the mixed workloads generated by a multimedia system.

5.1 Performance of SANfs and SANfs-VM

The performance of the SANfs system is measured with a benchmark program called Bonnie [25]. This benchmark estimates the performance of the file system for I/O requests such as character read/write, block read/write, and random seeks. The experiment is interesting for two reasons.

Fig. 7. The result of the benchmark

The first is how much overhead is introduced by the additional computation of the SANfs-VM layer; if this overhead is noticeable, it causes a serious problem for the overall SANfs system in spite of its various features. The second is the performance of the SANfs file system itself, which gives us information about its scalability and about whether the file manager avoids becoming a bottleneck.

To minimize the effect of caching and obtain the real I/O performance, the file size used in the Bonnie benchmark is 1024 MB. Because all of the machines have only 128 MB of main memory available for caching, the caching effect should be negligible. The block size of the SANfs file system is set to 4 KB. The experiments compare the bandwidth of SANfs file system I/O requests with and without SANfs-VM, according to the characteristics of the I/O requests. The testbed consists of four personal computers: two with 900 MHz Pentium CPUs and 128 MB of main memory, and two with 450 MHz Pentium CPUs and 128 MB of main memory. One of the computers is configured as the metaserver, and all four computers are configured as SANfs clients. All computers are connected to the network with NICs, and each is equipped with a Qlogic 1 Gb/s Fibre Channel adapter connected to a Fibre Channel hub; one Seagate Fibre Channel disk drive (ST39173FC) is connected to the FC hub in a simple loop.

Fig. 7 shows the results of the experiments: the first result is the bandwidth of character I/O operations, and the second is the bandwidth of block I/O operations on each host. Character I/O consumes much CPU time, so its bandwidth is much lower than that of block I/O. First, we compare the system without SANfs-VM to the system with SANfs-VM. From these results we can identify some overhead from the SANfs-VM driver, but the overhead is not significant and does not cause a serious system problem. SANfs-VM is worth employing for the flexible management of storage volume in spite of its cost.

In summary, the overhead of SANfs-VM consists of memory overhead and CPU overhead. The memory overhead is the mapping table held by every host, which is at most several KB. The CPU overhead is the computation time for converting a logical address into a physical address, which amounts to an index lookup and a few modulo operations. The next observation is that all host computers show nearly equal performance. The measured performance is nearly equal to that of a local file system, meaning that the overhead of the file manager in the metaserver is not visible in these experiments, thanks to the efficient file locking and metadata management scheme of the SANfs file system and to the fact that the data is accessed directly by the hosts themselves. If the number of host computers becomes very large, however, the file manager in the metaserver may become a bottleneck.

5.2 The enhanced configuration method

The second experiment compares the performance of the enhanced configuration method provided by SANfs-VM with that of the conventional method. To evaluate the system, we used synthetic workloads that simulate integrated multimedia data. We assume that multimedia data consists of two service classes, text and video, and we experimented with synthetic workloads of these two classes. For this experiment, we first examined the real characteristics of ZBR disks. Fig. 9 shows the data transfer rate as a function of the sector number; usually, sector numbering starts at the outermost track and ends at the innermost track. As shown in Fig. 9, the data transfer rate degrades as the sector number increases, confirming the existence of several zones and that the outer zones have higher bandwidth than the inner zones.

We generated two types of synthetic workloads: a text workload and a video workload. The arrivals of the text workload follow a Poisson distribution with an arrival rate of 0.1, the size of a text request is a random value between 1 KB and 10 KB, and the access address of each text request is chosen at random over the disk addresses. The video workload behaves like an MPEG-2 stream: it has a deadline requirement of 800 KB/s, and the size of a single video request is 512 KB. Each video process accesses the disks sequentially to obtain contiguous data. The experimental environment is shown in Fig. 8. The hardware specification is the same as in the first experiment, and there are four disks. We measured the average response time of text requests, while the video requests had to meet their deadlines, as the number of video processes increased. To investigate only the enhanced configuration method of SANfs-VM, the synthetic workloads pass through the ll_rw_block() routine directly.
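The sketch below shows one way such synthetic workloads could be generated with the stated parameters. Only the parameters (Poisson rate 0.1, 1-10 KB random text requests at random addresses, 512 KB sequential video requests) come from the text; the request structure, the random-number handling, and the time unit of the arrival rate are our own assumptions.

    /*
     * Illustrative generators for the two synthetic workloads.
     */
    #include <stdlib.h>
    #include <math.h>
    #include <stdint.h>

    #define KB 1024ULL

    struct request {
        uint64_t offset;    /* byte offset on the logical volume */
        uint64_t length;    /* request size in bytes             */
    };

    /* Exponential inter-arrival time for Poisson arrivals with rate 0.1. */
    static double next_text_interarrival(void)
    {
        double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);   /* u in (0,1) */
        return -log(u) / 0.1;
    }

    /* Text request: 1-10 KB at a random disk address. */
    static struct request next_text_request(uint64_t volume_size)
    {
        struct request r;
        r.length = (1 + rand() % 10) * KB;
        r.offset = ((uint64_t)rand() % (volume_size / KB)) * KB;
        return r;
    }

    /* Video request: 512 KB, sequential per video process. */
    static struct request next_video_request(uint64_t *stream_pos)
    {
        struct request r = { *stream_pos, 512 * KB };
        *stream_pos += 512 * KB;
        return r;
    }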

Fig. 8. Experimental environment

First, we evaluated the multi-zone configuration, which divides each disk into two zones, an inner zone and an outer zone, according to the sector number. SANfs-VM binds the inner zones together and the outer zones together across the disk array. The basic configuration of the disk array is stripe mode, and the stripe size of each zone is set to 512 KB. We allocated the text and video workloads in three ways: first, the text class in the inner zone and the video class in the outer zone; second, the video class in the inner zone and the text class in the outer zone; and third, the two classes not separated at all. The left of Fig. 10 shows the result of this experiment. As the number of video processes increases, the average response time of text requests is lower than in the other cases when the text workloads are allocated in the inner zone and the video workloads are stored in the outer zone. This improvement in response time makes sense because the outer zones have higher bandwidth than the inner zones: the video workloads require high bandwidth in order to avoid violating their deadline periods, so it is appropriate to allocate them to the outer zones. In a disk array, this effect appears even more strongly. On the other hand, if the video workloads are allocated in the inner zone, deadline violations of the video processes can occur, and as a result the average response time of the text requests increases. To the right of Fig. 10 are the experiments concerned with the configuration of multiple stripe sizes.

Fig. 9. Characteristics of Zone-Bit-Recording Disk

ÓÔÕÖl×6ԄØÙÚÛØ ÔIÓ§Ü ÝAÔIÞ6ßà„á¬â Ö â ã ÚÛ Ô

 



 



ÿ

ü ýþ

åää íä ìä ëä êä éä èä çä æä åä

;  ?#; *  'A@B ; " % C#  ? < % ;

6ž Ÿ  ¡ž ¢£¤¥ ¢ žb§¦ ¨_žb©6ª¬«ª„  ­ ¦ £ žIª§¦ ® ž

 

µ¯ ´¯ Ï ÎÊ É ³¯ ÌÍ Î ²¯ È ÉÊ Ë ±¯ Å ÆÇ °¯ Ð

ä

ï ò ó ö ÷ ø ò ô

ò ó ö ÷ø ò ô  ÷ ò ø

å

é åä î ï ð6ñ ò ó ô õ ö ÷ ø ò ô§ù ó ô ú ò û û

åé

= ; > ?#; *  '@B ; " % C#  ? < % ;



¶¯

´°±ÑÒ´°± Ñ ³ÑÒ´°± Ñ ²± ÑÒ´°± Ñ



:9 7 86

 

23

 

45

01

./

¯ °

,-

´ °¯ · ¸ ¹lº » ¼ ½ ¾ ¿ À Á » ½l ¼ ½ à » Ä Ä

°´

    

Fig. 10. Text response time over # of video process

Fig. 11. SANfs-VM configuration method vs. conventional configuration

In this experiment, SANfs-VM applies a different stripe size to each zone to optimize disk utilization. The experiment is based on the allocation found best above: text workloads in the inner zone and video workloads in the outer zone. With the stripe size of the outer zone fixed at 512 KB, we changed the stripe size of the inner zone to 4 KB, 32 KB, and 512 KB. We fixed the stripe size of the outer zone because only a large stripe size is adequate for video workloads, which require sequential access to large sets of data. As shown to the right of Fig. 10, the 4KB-512KB case (stripe size of the inner zone-stripe size of the outer zone) shows the worst performance. The 32KB-512KB and 512KB-512KB cases show similar results, with 32KB-512KB performing slightly better than 512KB-512KB. Text workloads are small, frequently read/modify/write, and scattered throughout the disks, and they favor fast response time and a large number of I/O operations per second. Therefore, it is necessary to set an appropriate stripe size according to the service type. In other words, stripe size is the key factor in achieving optimal system performance and disk utilization; it is affected by the number of processes, the service class, the average request size, and the distribution of requests.

From the two experiments, we can evaluate the performance of SANfs-VM's enhanced configuration method. When very different types of requests exist, the volume must be configured appropriately to achieve optimal disk utilization and the best performance, and SANfs-VM helps the system achieve higher performance than other configuration methods. Fig. 11 compares the enhanced configuration produced by SANfs-VM's configuration method with a conventional stripe configuration. As the number of video processes increases, the average response time of text requests improves. When the number of video processes is 15, the average response time with our method is 61.85 ms, whereas the average response time with the conventional method is 72.54 ms. The average response time is reduced by a maximum of about 15 percent compared with the conventional configuration scheme.
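As a quick check of the quoted figure, the reduction at 15 video processes is (72.54 - 61.85) / 72.54 = 0.147, i.e., about 15 percent.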

6. Conclusion

In this paper, we have designed and implemented a volume manager for a shared file system in the SAN environment. This system provides various features for enterprise internet services, such as online resizing, online migration, and an enhanced configuration scheme that considers workload access patterns and disk geometry. We have focused on two points in designing SANfs-VM. The first is online management, such as online resizing. SANfs-VM is designed to support enterprise multimedia servers well: because enterprise servers must offer maximum system uptime and unlimited storage capacity so as not to lose their customers, an online management scheme is essential for them. For these features, we designed the layout of the SANfs file system and the data structures of the volume manager to support online management such as resizing and migration. The other focal point is system support for mixed workloads. Multimedia systems have various types of workloads, such as video, audio, picture, and text. To fully utilize the disks under these mixed workloads, SANfs-VM can provide an optimal volume configuration and data allocation method. With these features, SANfs-VM can maximize the performance of SANfs, providing it with high scalability and availability in terms of storage volume. It can give SANfs maximum system uptime and unlimited storage capacity through online resizing management, and it also makes it possible to exploit the disk characteristics and the access patterns of system workloads through the enhanced configuration scheme.

SANfs-VM is implemented and integrated with the SANfs file system in the Linux kernel. The experimental results show that SANfs-VM can provide the system with 24-hours-a-day uptime and unlimited storage capacity. In the experiment on the enhanced configuration scheme, which considers disk geometry and the access patterns of workloads, the average latency is reduced by a maximum of 15 percent in comparison with other configuration schemes.

Further research is needed on automatic configuration and automatic migration to achieve optimal performance according to circumstances.

Another area where further research is needed is data backup and recovery in the face of server failure. The volume manager presents the upper layers with a logical address space; therefore, a backup system based on a snapshot feature is needed for when the system crashes abruptly. The design of an integrated file system that supports various types of workloads is also needed to obtain high performance; disk scheduling algorithms and variation of the block size are candidates for this work.

References

[1] Alan F. Benner, Fibre Channel: Gigabit Communications and I/O for Computer Networks, McGraw-Hill, 1996.
[2] Randy H. Katz, "High-Performance Network and Channel-Based Storage", Proceedings of the IEEE, Vol. 80, No. 8, pp. 1238-1261, 1992.
[3] Fibre Channel Solutions: management, Fibre Channel Association, http://www.fibrechannel.com/
[4] G. A. Gibson, D. F. Nagle, K. Amiri, and F. W. Chang, "File Server Scaling with Network-Attached Secure Disks", In Proceedings of the 1997 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 272-284, Seattle, WA, June 1997.
[5] Friedhelm Schmidt, The SCSI Bus and IDE Interface, Addison-Wesley, Second Edition, 1998.
[6] C. Gray and D. Cheriton, "Leases: An efficient fault-tolerant mechanism for distributed file cache consistency", In Proceedings of the Twelfth ACM Symposium on Operating Systems Principles, pp. 202-210, 1989.
[7] Matthew T. O'Keefe, "Shared File Systems and Fibre Channel", In the Sixth NASA Goddard Space Flight Center Conference on Mass Storage Systems and Technologies in cooperation with the Fifteenth IEEE Symposium on Mass Storage Systems, pp. 1-16, March 23-26, 1998.
[8] J. H. Howard, "An overview of the Andrew file system", In Proceedings of the USENIX Winter Technical Conference, pp. 23-26, Berkeley, CA, 1988, USENIX Association.
[9] James Griffioen and Randy Appleton, "Reducing file system latency using a predictive approach", In Summer USENIX Conference, 1994.
[10] A. Barry and K. Preslan, "A 64-bit, Shared Disk File System for Linux", In Proceedings of the Sixteenth IEEE Mass Storage Systems Symposium, pp. 22-41, 1999.
[11] Steven R. Soltis, "The Design and Implementation of a Distributed File System Based on Shared Network Storage", PhD thesis, University of Minnesota, Department of Electrical and Computer Engineering, Minneapolis, Minnesota, August 1997.
[12] R. C. Burns, R. M. Rees, and D. D. E. Long, "Semi-preemptible locks for a distributed file system", In Proceedings of the 2000 International Performance, Computing and Communications Conference, pp. 397-404, Feb. 2000.
[13] Yong Kyu Lee, Shin Woo Kim, et al., "Metadata management of the SANtopia file system", In the 8th International Conference on Parallel and Distributed Systems (ICPADS), pp. 492-499, 2001.
[14] Joo Young Hwang, "A SAN-based High Performance Shared Disk File System", PhD thesis, Department of Electrical Engineering & Computer Science, Division of Electrical Engineering, 2003, http://core.kaist.ac.kr
[15] Jakob Ostergaard, The Software-RAID HOWTO, http://ostenfeld.dk/jakob/Software-RAID.HOWTO/Software-RAID.HOWTO.html
[16] Joo Young Hwang, Chul Woo Ahn, Se Jeong Park, and Kyu Ho Park, "A Scalable Multi-Host RAID-5 with Parity Consistency", IEICE Transactions on Information and Systems, Vol. E85-D, No. 7, July 2002.
[17] David C. Teigland and Heinz Mauelshagen, "Volume Managers for Linux", http://www.sistina.com

[18] David C. Teigland, "The Pool Driver: A Volume Driver for SANs", Master of Science thesis, October 31, 1999.
[19] Chang-Soo Kim, Gyoung-Bae Kim, and Bum-Joo Shin, "Volume Management for SAN Environment", In Proceedings of the Eighth International Conference on Parallel and Distributed Systems (ICPADS 2001), 2001.
[20] Prashant Shenoy, Pawan Goyal, and Harrick M. Vin, "Architectural Considerations for Next Generation File Systems", ACM Multimedia 1999.
[21] P. Chen and D. Patterson, "Maximizing Performance in a Striped Disk Array", In Proceedings of the ACM SIGARCH Conference on Computer Architecture, Seattle, WA, pp. 322-331, May 1990.
[22] P. M. Chen and E. K. Lee, "Striping in a RAID Level 5 Disk Array", In Proceedings of the 1995 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1995.
[23] E. K. Lee and R. H. Katz, "An Analytic Performance Model for Disk Arrays", In Proceedings of the 1993 ACM SIGMETRICS, pp. 98-109, May 1993.
[24] Prashant J. Shenoy and Harrick M. Vin, "Efficient Striping Techniques for Multimedia File Systems", Performance Evaluation Journal, Vol. 38, pp. 175-199, 1999.
[25] Tim Bray, Bonnie benchmark program, http://www.textuality.com/bonnie
[26] Peter Scheuermann, Gerhard Weikum, and Peter Zabback, "Data Partitioning and Load Balancing in Parallel Disk Systems", The VLDB Journal, pp. 48-66, 1998.
[27] A. Dilger, "Online ext2 and ext3 Filesystem Resizing", Ottawa Linux Symposium, 2002.
[28] T. Ts'o, "Planned Extensions to the Linux Ext2/Ext3 Filesystem", USENIX Annual Technical Conference, 2002.
