Project Report - Adding PXE Boot into Palacios

Chen Jin
EECS Department, Northwestern University
chen.jin@eecs.northwestern.edu

Bharath Pattabiraman
EECS Department, Northwestern University
bharath@u.northwestern.edu

Patrick Foley
EECS Department, Northwestern University
patrickfoley2011@u.northwestern.edu

ABSTRACT

PXE is a standard for booting an OS from the network. Most machines' BIOSes support it, but the BIOS used by Palacios guests did not. In our project, we tried various ways in which PXE network boot capability could be added to Palacios. We used a PXE-capable Etherboot ROM image from ROM-o-matic.net that has support for our emulated network card. We then used this small ISO image to build the guest and let it serve as a replacement PXE-boot ROM for the emulated network card. With passthrough I/O, the requests are handed over directly to the host, which sends them to the DHCP and Boot servers to initiate the network boot process. PXE capability is of vital importance for diskless nodes, which depend completely on the network for booting.

Figure 1: PXE system configuration

1. INTRODUCTION

PXE (Preboot eXecution Environment) allows us to boot Kitten/Palacios (and a test guest) remotely from a network server. Booting Palacios/Kitten over a network is already possible. In this research effort we have enabled Palacios to remote-boot a guest OS using PXE.

2. SYSTEM

PXE is defined on a foundation of Internet protocols, namely TCP/IP, DHCP, and TFTP. In brief, the PXE protocol operates as follows. The client initiates the protocol by broadcasting a DHCPDISCOVER containing an extension that identifies the request as coming from a client that implements the PXE protocol. Assuming that a DHCP server implementing this extended protocol is available, after several intermediate steps the server sends the client a temporary IP configuration and a list of appropriate Boot Servers. The client then discovers a Boot Server of the selected type and receives the name of a bootloader file on that server. The client uses TFTP to download the bootloader from the Boot Server and finally initiates execution of the bootloader, which boots the kernel. In our project the first request is sent after Kitten and Palacios have booted up, so that Palacios fetches the guest's bootloader using the PXE protocol and then boots the guest.

So, as shown in Figure 1, in order to use PXE we need to set up a PXE server which allows client systems to:

• Request an IP address (via DHCP)
• Download a kernel (via TFTP)

2.1 Server Configuration

On the server end of the client-server interaction there must be services available that are responsible for redirecting the client to an appropriate Boot Server. For our purposes we used Knoppix (an operating system based on Debian) running on QEMU as both the DHCP server and the Boot (TFTP) server. Knoppix has a DHCP server and a TFTP server pre-installed, so we only had to configure them as follows.

2.1.1 DHCP Server Configuration

subnet 172.21.0.0 netmask 255.255.0.0 {
    range 172.21.0.3 172.21.0.100;
    option broadcast-address 172.21.0.255;
}
group {
    next-server 172.21.0.2;
    host test {
        fixed-address 172.21.0.50;
        hardware ethernet 52:54:00:12:34:57;
        filename "pxelinux.0";
    }
}
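To avoid transcription errors, the stanza above can be written out with a here-document and sanity-checked before restarting dhcpd. This is just a convenience sketch: /tmp/dhcpd.conf is an illustrative path, and the real file lives wherever the distribution's dhcpd expects it (often /etc/dhcpd.conf).

```shell
# Write the DHCP stanza to a scratch file and check the key directives.
# /tmp/dhcpd.conf is illustrative only.
cat > /tmp/dhcpd.conf <<'EOF'
subnet 172.21.0.0 netmask 255.255.0.0 {
    range 172.21.0.3 172.21.0.100;
    option broadcast-address 172.21.0.255;
}
group {
    next-server 172.21.0.2;
    host test {
        fixed-address 172.21.0.50;
        hardware ethernet 52:54:00:12:34:57;
        filename "pxelinux.0";
    }
}
EOF
grep -c 'next-server' /tmp/dhcpd.conf   # prints 1
```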

2.1.2 TFTP Server Setup

According to [3], when a client boots up it will check whether there is a file corresponding to its own MAC address in the /var/lib/tftpboot/pxelinux.cfg/ directory. After trying several options it will fall back to requesting a default file, so we simply changed the default file to contain the configuration we want:

DEFAULT kitten
LABEL kitten
    kernel bzImage
    append serial.baud=115200 console=serial initrd=inittask
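The default file can be generated the same way. A minimal sketch, assuming the tftpboot root used on our Knoppix server (here redirected to a scratch directory so it can run anywhere):

```shell
# Create pxelinux.cfg/default under a scratch tftp root.
# TFTP_ROOT is illustrative; on our server it was /var/lib/tftpboot.
TFTP_ROOT=/tmp/tftpboot
mkdir -p "$TFTP_ROOT/pxelinux.cfg"
cat > "$TFTP_ROOT/pxelinux.cfg/default" <<'EOF'
DEFAULT kitten
LABEL kitten
    kernel bzImage
    append serial.baud=115200 console=serial initrd=inittask
EOF
```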

2.2 Client-Server Connection

To make the server and client QEMU instances talk to each other, we connected them using VLANs. The way this works is that one QEMU process connects to a socket in another QEMU process. When a frame appears on a VLAN in the first process, it is forwarded to the corresponding VLAN in the other process, and vice versa. Thus any frame transmitted by the client is received by the server, and vice versa.

3. IMPLEMENTATION

In this section we walk through our implementation step by step, from setting up a test bench for PXE booting to how we implemented it inside the Palacios VMM. We spent a fair amount of time understanding what PXE booting is and creating the test environment before we could work on the main part of the project. In the remainder of the section, we describe the test bench setup, how we developed our idea progressively, and how we finally made it work.

3.1 Testbench Setup

Since PXE is a standard for network booting, it was critical for us to set up the server-client system that allowed us to conduct the experiments. We started from the server side, which provides the TFTP and DHCP services and the initial boot loader files.

3.1.1 Server

To avoid reconfiguring the system every time we use QEMU, we decided to create a hard drive image that saves all the configuration changes we make. We initially tried to install Ubuntu, but the installation took too long. After trying several different Linux distributions, we picked Knoppix, which installs in a reasonable time and provides both TFTP and DHCP services. The following command creates a qcow2 hard drive image of size 4 GB:

qemu-img create -f qcow2 knoppix.img 4G

Now we launch QEMU to start a virtual machine with 128 MB of memory, using the created hard drive image and booting from a CD-ROM image:

qemu -hda knoppix.img -cdrom knoppix.iso -boot d -m 128

Once the emulated machine has launched, we can start the regular installation process by first partitioning and formatting the hard drive image and then installing Knoppix on it:

su
fdisk /dev/hda

From the fdisk partition menu, create a primary Linux partition on partition 1 of 3600 MB and a Linux swap partition on partition 2 of 400 MB (the remainder of the disk). Make partition 1 the active partition; given how QEMU boots a system we are not certain this makes any difference, but it is how a normal hard disk would be set up, so we did it that way here. Write the changes to the disk and then run mkfs /dev/hda1. Once that is complete, halt the Knoppix image and restart it with the same command line. When the Knoppix boot prompt appears again, type:

knoppix screen=1152x864 dma tohd=/dev/hda1

The disk will spin for a while; what this actually does is copy the directory structure on the CD-ROM onto the hard disk. When it is done, Knoppix will boot normally. From then on, when you boot from the CD-ROM you can add fromhd=/dev/hda1, which makes it boot off the hard disk image. Given all the effort spent in this phase, creating such a hard drive image is probably worthwhile. The QEMU command line does not change from the previous run, but how we invoke Knoppix does. At the boot prompt, type:

knoppix screen=1152x864 dma fromhd=/dev/hda1

After a hard-disk install, Knoppix boots considerably faster than from the ISO image; however, it still takes 10-15 minutes to enumerate all the devices.
To make QEMU boot the system much faster, we used the savevm/loadvm commands from the monitor. Once the system has booted and all the applications are set up the way we want them, type Ctrl-Alt-2 to enter the QEMU monitor. Inside the monitor, type "savevm knoppix-save.vm" and then "quit". The snapshot knoppix-save.vm should now be saved. Note that not all image formats support this: raw does not, but qcow2 does. To restart from the saved snapshot, we add another parameter to the QEMU startup line:

qemu -cdrom /vol/dev/rdsk/c1t0d0/knoppix -m 512 -k en-us -boot d -hda knoppix.img -loadvm knoppix-save.vm

Figure 2: Tap Device Diagram

Figure 3: VLAN Diagram

In about 10 seconds, the Knoppix instance is back where it was when we ran the savevm command from the QEMU monitor. After restoring an image like this, we found that you should press Ctrl-Alt-1 to switch back to the virtual machine's display, which explains the weirdness we initially saw with the keyboard on a restored image. We wish we had found this sooner: waiting 10-15 minutes for the system to boot and enumerate all the devices is pretty hideous. Once the hard drive image was set up, the next problem we faced was transferring the boot loader files to the server. The default QEMU user network does not work for our installed hard drive image, which means there is no network connection between the emulated server and the host machine. One way to resolve this is to set up a tap device shared by the host and guest machines.

3.1.2 Create a Tap Device

A tap device, in the context of QEMU, acts as a virtual Ethernet card that forwards packets between the host and guest OS at the data link layer (in terms of the ISO/OSI model). The setup consists of two parts: one on the host side and one on the guest side. Figure 2 shows the relationship between guest and host through the tap device connection. Set up the tap0 device on the host side:

mkdir /dev/net
mknod /dev/net/tun c 10 200
sudo tunctl
sudo ifconfig tap0 192.168.100.1 up

Then launch the guest with the tap device attached and set up the tap0 device on the guest side with an IP address (we used 192.168.100.2):

qemu -hda knoppix.img -k en-us -net tap,ifname=tap0
sudo ifconfig tap0 192.168.100.2 up

Once the tap device is configured on both sides, the boot loader files can be transferred using scp and saved in the directory discussed in the previous section.
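The host-side steps can be collected into one dry-run helper that just prints the commands in order (replace the echoes with direct execution when running as root with tunctl installed); this is a convenience sketch, not part of our original setup:

```shell
# Print (rather than execute) the host-side tap setup commands in order.
tap_setup_cmds() {
    echo "mkdir /dev/net"
    echo "mknod /dev/net/tun c 10 200"
    echo "tunctl"
    echo "ifconfig tap0 192.168.100.1 up"
}
tap_setup_cmds
```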

3.1.3 Server-Client Connection

Figure 4: Socket Connection Diagram

Now that the server is mostly ready, the only thing left is to connect the client to it. QEMU provides several useful network configurations. One of them is the VLAN, which basically acts as a virtual switch that allows an emulated machine to connect to different interfaces. Figure 3 shows the VLAN diagram from the guest's perspective. The corresponding QEMU parameter looks as follows:

-net nic,vlan=0,...

Figure 4 shows the socket diagram which connects two different VLANs in two guest OSs. The following options are used to launch the two QEMU instances for the server and client respectively.

server:
-net nic,vlan=0,macaddr=52:54:00:12:34:56,model=e1000 -net socket,vlan=0,listen=localhost:9000

client:
-net nic,vlan=2,macaddr=52:54:00:12:34:57,model=ne2k_pci -net socket,vlan=2,connect=localhost:9000
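These options can be folded into two small launcher helpers. Everything here restates the flags above; the memory size and the image names (knoppix.img, gpxe.iso) are assumptions carried over from our setup, and the functions print the command lines rather than running them:

```shell
# Build the full QEMU command lines for the server and client VMs.
# Image names and -m sizes are illustrative.
server_cmd() {
    echo "qemu -hda knoppix.img -m 128" \
         "-net nic,vlan=0,macaddr=52:54:00:12:34:56,model=e1000" \
         "-net socket,vlan=0,listen=localhost:9000"
}
client_cmd() {
    echo "qemu -cdrom gpxe.iso -boot d -m 128" \
         "-net nic,vlan=2,macaddr=52:54:00:12:34:57,model=ne2k_pci" \
         "-net socket,vlan=2,connect=localhost:9000"
}
server_cmd
client_cmd
```

Start the server first so its listening socket exists before the client connects.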

3.1.4 Booting

To set up the booting process correctly, we looked into how the booting process actually takes place. An article [1] describes this process in great detail; here we just summarize the key points. Usually when a computer is powered on, the processor executes

code at a well-known location. This location is in the basic input/output system (BIOS) stored in flash memory on the motherboard. The BIOS must determine which devices are candidates for booting; in our case, the BIOS can treat the PXE ISO image as a bootable device. The boot loader usually works in two stages: the first stage loads the primary boot loader, usually at most the size of a hard drive sector, into RAM. Once the first-stage boot loader is loaded into RAM and executed, it goes to the network and downloads the secondary boot loader and the init task from the TFTP server. Once the downloaded boot loader containing the kernel image is in RAM, the system boots the kernel with the initial RAM disk. After all the steps above, we were eager to try PXE booting. As mentioned before, the website [2] provides different gPXE ISO images for different Ethernet cards. We downloaded the e1000 version, since the e1000 is one of the Ethernet cards that both QEMU and Palacios support. To make sure the gPXE image actually worked, we first launched the gPXE ISO image simply as the client. The booting process went smoothly and loaded all the way into the client system. At this point we were close to the final goal: implementing PXE boot inside Kitten/Palacios. From the booting process described at the beginning of this subsection, we understood that PXE booting can be implemented as part of the BIOS or combined with it, so we created a Kitten ISO image with gPXE as a guest. However, with the e1000 Ethernet card we soon found a catch: a DMA-based network card would not work for us. The reason is that DMA operations write data directly to guest physical addresses, while Palacios maps the guest's physical addresses by applying an offset. Palacios has no way to supply the offset unless a hypercall is injected into the gPXE source code. This is one way to reach our final goal.
There are alternative ways to work around the problem. QEMU also supports the ne2k network card, which operates only at the port I/O level; to catch the ne2k I/O requests, we would need to add an IRQ handler and an interrupt-raise function inside the Palacios code. The easiest and most convenient method, however, is to leverage Palacios's PCI passthrough and find a gPXE image whose PCI Ethernet driver uses port I/O rather than DMA. Near the end of the project we downloaded the gPXE ISO image built with all possible drivers, which luckily contains the ne2k_pci device driver. Since both QEMU and Palacios support the ne2k Ethernet card, we eventually met our goals.
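For reference, the client-side launch in the working configuration then looks roughly like the sketch below. This is a simplified dry-run helper, not our exact command: "gpxe-all.iso" is an assumed name for the all-drivers image from rom-o-matic, and in the full setup the ISO is the Kitten/Palacios image with gPXE serving as the guest's boot ROM.

```shell
# Print the final client invocation: all-drivers gPXE ISO with the
# ne2k_pci NIC, connected to the server's VLAN socket.
final_client_cmd() {
    echo "qemu -cdrom gpxe-all.iso -boot d -m 128" \
         "-net nic,vlan=2,macaddr=52:54:00:12:34:57,model=ne2k_pci" \
         "-net socket,vlan=2,connect=localhost:9000"
}
final_client_cmd
```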

4. CONCLUSIONS

Through this project, we came to understand more about a system's booting process: how many phases it involves, which file serves what function during each phase, and why the booting process needs to be broken down into phases executed in a specific sequence. We also became familiar with the PXE network protocols and with setting up and configuring a test system in the QEMU emulated environment.

5. ACKNOWLEDGMENTS

In this project, we received tremendous help from Professor Dinda and his student Lei Xia. Without their help, we would not have made the booting work.

6. REFERENCES

[1] www.ibm.com/developerworks/library/l-linuxboot/index.html
[2] rom-o-matic.net
[3] Y. Tang and L. Xia. Booting Palacios/Kitten over the network using PXE.