Incorporating Virtualization Awareness in Service Monitoring Systems

12th IFIP/IEEE International Symposium on Integrated Network Management (IM 2011)

Marcio Barbosa de Carvalho, Lisandro Zambenedetti Granville
Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
{mbcarvalho, granville}@inf.ufrgs.br

Abstract - Traditional service monitoring systems (e.g., Nagios and Cacti) have been conceived to monitor services hosted in physical computers. With the recent popularization of server virtualization platforms (e.g., Xen and VMware), monitored services can migrate from one physical computer to another, invalidating the original monitoring logic. In this paper, we investigate which strategies should be used to modify traditional service monitoring systems so that they can still provide accurate status information even for monitored services that are constantly moving on top of a set of servers with virtualization support.

I. INTRODUCTION

Modern server virtualization solutions like Xen [1] and VMware [2], despite enabling network administrators to create more flexible and robust information technology (IT) infrastructures, break the service monitoring processes of popular monitoring systems such as Nagios [3] and Cacti [4]. Because such systems have a list of fixed servers to contact while monitoring the services of a networked environment, any virtual machine (VM) migration (e.g., to save processing power) or replacement (e.g., when a healthy server takes over a failing one) immediately invalidates the traditional monitoring logic. In this paper, we investigate the question of which modifications and adjustments are required in current service monitoring systems when server virtualization takes place in the managed IT infrastructures.

We use an experimental machine where awareness of virtualization is incorporated into Nagios in order to monitor a set of services in a real production network environment. In such a real environment, service monitoring is critical to identify resource failures, especially in moments when servers are highly demanded. If the monitoring system is unaware of server virtualization, as of today, every time a service starts running in a different physical machine the monitoring system will be unbound to the service being observed.

The remainder of this paper is organized as follows. In Section II we review related work on service monitoring, especially considering the support for server virtualization. In Section III we present the design of our solution to incorporate virtualization awareness into Nagios. In Section IV the experimental architecture is introduced, the set of experiments carried out in order to evaluate our proposed solution is described, and the results from our experiments are presented and discussed. Finally, in Section V, we close this paper with conclusions and future work.

II. RELATED WORK

Current monitoring solutions for server virtualization are usually devoted to a specific aspect of virtualization. Security is probably one of the main aspects, for example, when a compromised virtual machine (VM) affects the performance of other VMs hosted in a shared physical computer. Back in 2003, Garfinkel and Rosenblum [5] presented a technique called VM introspection for preventing intrusions through the employment of a VM monitor running inside the physical host. In complement, Payne et al. [6] proposed a set of requirements to guide the development of VM monitors, in addition to proving the concept by introducing the XenAccess monitoring library. Fraser et al. [7] advanced the previous work by proposing a more sophisticated monitoring agent that, in order to dispense with human interventions, was able to automatically fix just the compromised VMs.

It is important to observe that the current solutions, especially those based on VM introspection, assume the monitoring agent to be placed inside the physical machines. There are, however, other research works that physically separate the monitoring agent and the monitored system. Machida et al. [9], for example, presented a VM monitoring solution where an external monitoring agent (in fact referred to as a monitoring server) was able to adapt its monitoring traffic according to the changes in the virtualized resources.

Like security, the performance of VMs is also an aspect that requires monitoring solutions. Shao et al. [8] presented, for example, PMonitor, which is a lightweight performance monitor for observing the performance of VMs, in that case instantiated in the Xen virtualization platform.

Independent from virtualization management, in another front, service monitoring is usually realized through monitors that contact, in a very simple way, remote networked machines to retrieve the operational status of the services of interest running at those machines.


Because they are based on such a simple architecture, open source monitoring systems like Nagios [3] and Cacti [4] have gained popularity among IT administrators. In addition, growing communities of independent developers and enthusiasts provide plugins to these systems so that very sophisticated monitoring scenarios can be settled.

Resource utilization monitoring becomes more complex in virtualized environments. In these environments, the administrator must monitor both the real and the virtual resources. Tools like Nagios monitor resource utilization by consulting agents installed on servers. In virtual machines, these agents only provide information about the virtual resources. VM monitoring tools, like XenCenter, provide both kinds of information, but are unable to monitor the services that the virtual machine runs.

From a historical point of view, service monitoring systems have been conceived before the current popularization of server virtualization platforms. Because of that, traditional service monitoring systems are incapable of tracking monitored services when they migrate from one physical host to another. On the other side, the systems specifically designed for VM monitoring check the status of the VMs themselves, but they ignore the services running on those VMs. In addition, such VM monitoring systems are not currently as widely used as traditional service monitoring systems. In this scenario, two paths exist for integrating service monitoring and server virtualization support: (i) modifying VM monitoring systems to become aware of the running services, or (ii) modifying traditional service monitoring systems to become aware of the underlying virtualization infrastructure that is hosting the critical services. Since we believe that the popularity of service monitoring systems among IT administrators will not decrease in the near future, we understand that the option of adapting the current service monitoring systems is more feasible and realistic.

III. PROPOSED SOLUTION AND IMPLEMENTATION

In this section we first review the Nagios and Citrix XenServer key aspects for our proposal, to then present strategies to incorporate virtualization awareness into Nagios.

A. Nagios overview

Nagios is an open source monitoring tool widely used by IT administrators. Nagios' configuration is defined through a hierarchically organized set of configuration objects that map the actual structure of the managed network and the services of interest running on top of it. From top to bottom, Nagios defines, in its configuration hierarchy, hostgroup objects that contain host objects, which, in turn, contain service objects. Hostgroups use host templates to minimize the registration effort required when new hosts need to be included into the system. Host templates describe common monitoring configuration values for the hosts within the same hostgroup. That includes, for example, the time interval between status checks, the hosts' icon to be used in Nagios visual maps, and the deadline for issuing notifications of internal changes of a host. Service templates are also provided to ease the registration of network services to be monitored and that share common monitoring values, such as the method used to check service status, the time interval between checks, the IT administrator responsible for the group, and the window period in which the service must be observed.

Nagios uses two types of checks: active and passive. Active checks are initiated at the Nagios server, and triggered and controlled by the main Nagios process. Passive checks, in turn, are initiated and controlled by external processes that notify Nagios when the status of monitored services changes.

In order to perform an active check, Nagios issues the execution of a command line script defined in a service object of the Nagios configuration hierarchy. The result of that execution is collected by the Nagios server from the standard output and stored in internal data structures. These command lines are in fact plugins of the system that can be coded by third-party developers to enrich Nagios' plugin library. In order to remotely execute Nagios plugins, the system complementarily offers to the IT administrator the Nagios Remote Plugin Executor (NRPE), which is itself a plugin installed in the Nagios server and a remote daemon running at the monitored hosts.
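As an illustration of the plugin contract just described, the sketch below shows a minimal Nagios-style plugin: it prints a single status line to the standard output and signals the service state through its exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The checked metric and the thresholds are hypothetical and are not part of our solution.

```python
#!/usr/bin/env python
# Illustrative sketch of a Nagios-style plugin (not taken from the paper).
# Nagios reads the first line of standard output and maps the exit code to
# a service state: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
import sys

WARN_THRESHOLD = 5.0   # hypothetical load thresholds
CRIT_THRESHOLD = 10.0

def read_load_average():
    # On Linux, /proc/loadavg holds the 1-, 5- and 15-minute load averages.
    with open("/proc/loadavg") as f:
        return float(f.read().split()[0])

def main():
    try:
        load = read_load_average()
    except OSError as exc:
        print("UNKNOWN - could not read load average: %s" % exc)
        return 3
    if load >= CRIT_THRESHOLD:
        print("CRITICAL - load average: %.2f" % load)
        return 2
    if load >= WARN_THRESHOLD:
        print("WARNING - load average: %.2f" % load)
        return 1
    print("OK - load average: %.2f" % load)
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

When such a script is invoked through NRPE, the NRPE daemon on the monitored host executes it and relays its output and exit code back to the Nagios server.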

Passive checks are executed by external processes, running either locally at the Nagios server or remotely on the monitored hosts, that inform the main Nagios process about changes in the status of monitored services. That happens when monitoring processes request Nagios to execute a so-called external command (since it comes from an external process). That is done by allowing external processes to write into the Nagios external command file, which is periodically read by the Nagios main process to check whether there are new external commands to be executed. External processes running in the Nagios server are able to directly write into the external command file. External processes running on remote hosts, however, indirectly write into the external command file through the intermediation of the Nagios Service Check Acceptor (NSCA), which is a daemon running in the Nagios server that receives remotely issued command calls.
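The following sketch illustrates how a local process could submit a passive service check result by appending a PROCESS_SERVICE_CHECK_RESULT external command to the command file. The file path and the host and service names are placeholders, while the one-line command format follows the standard Nagios external command syntax.

```python
#!/usr/bin/env python
# Illustrative sketch: submitting a passive service check result by writing
# a PROCESS_SERVICE_CHECK_RESULT external command into the Nagios command
# file. The path and the host/service names are hypothetical.
import time

COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"  # typical default, may differ

def submit_passive_result(host, service, return_code, output):
    # Format: [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
    line = "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        int(time.time()), host, service, return_code, output)
    with open(COMMAND_FILE, "a") as cmd_file:
        cmd_file.write(line)

if __name__ == "__main__":
    submit_passive_result("rsxen02", "virtual_machines", 0, "OK - vm01,vm02,vm03")
```

Remote hosts do not write into this file directly; they hand the same fields to send_nsca, and the NSCA daemon on the Nagios server performs the write on their behalf.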

Events in Nagios are handled in the system's server by Nagios Event Broker (NEB) modules. When a NEB module is developed to deal with a particular event, that module must subscribe its interest into a Nagios pipe. When the event of interest happens, Nagios checks which NEB modules have been subscribed and, through callback functions, passes the control of the server to the registered modules. Since Nagios employs an infinite single loop, the called NEB modules must quickly process and return the control to the Nagios core in order not to affect the performance of the whole system.

B. Citrix XenServer

Citrix XenServer is based on the open source Xen hypervisor. The free version of the solution offers virtual machine (VM) live migration, while load balancing and high availability support is only available on paid versions of the system. To support VM live migration, XenServer uses a centralized storage that is shared among all physical hosts of a cluster of servers. All physical hosts of the cluster can run any VM saved in the shared storage. Locally stored VMs can also be used, but in that case with no support for VM live migration, load balancing, and high availability. Citrix XenServer uses a master-slave strategy to orchestrate its cluster. Information about the whole cluster and all associations between VMs and physical hosts (i.e., which VMs are being hosted by each physical host) can be entirely accessed through and from any physical host. In our investigation, we consider a Citrix XenServer cluster where the monitored services can move from one physical host to the other when the VMs that are running the monitored services migrate (e.g., to save power energy or optimize processing). Since in our case study critical services are monitored by Nagios and such services can move over a XenServer cluster, Nagios needs to be adapted to be aware of the underlying virtualization support that is taking place.
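Because the whole pool state can be read from any member, a monitoring script can obtain the complete VM-to-host mapping by querying a single physical host. The sketch below, which is not part of the paper's implementation, uses the XenAPI Python bindings shipped with XenServer; the pool URL and the credentials are placeholders.

```python
#!/usr/bin/env python
# Illustrative sketch: listing which VMs run on which physical host of a
# Citrix XenServer pool through the XenAPI Python bindings. The pool URL
# and the credentials are placeholders.
import XenAPI

def vm_to_host_map(url="https://xenserver-master.example.org",
                   user="root", password="secret"):
    session = XenAPI.Session(url)
    session.xenapi.login_with_password(user, password)
    try:
        mapping = {}
        for vm_ref, vm in session.xenapi.VM.get_all_records().items():
            # Skip templates and the control domain; keep only running guests.
            if vm["is_a_template"] or vm["is_control_domain"]:
                continue
            if vm["power_state"] != "Running":
                continue
            host_name = session.xenapi.host.get_name_label(vm["resident_on"])
            mapping.setdefault(host_name, []).append(vm["name_label"])
        return mapping
    finally:
        session.xenapi.session.logout()

if __name__ == "__main__":
    for host, vms in vm_to_host_map().items():
        print("%s: %s" % (host, ",".join(sorted(vms))))
```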

C. Active checks strategy and architecture

In order to incorporate virtualization awareness using Nagios' active checks, the first strategy consists in obtaining the list of VMs that a physical host belonging to a Citrix XenServer cluster is hosting. In this strategy, a new Nagios plugin, called check_xen_virtual_machines, is remotely executed in each physical host through the intermediation of NRPE. At the Nagios server side, a new service object is registered in the configuration hierarchy, corresponding to each physical host that hosts VMs that need to be monitored. Such a service, called virtual_machines, lists the VMs currently running on each physical host. The checking command line associated with this service uses the NRPE local plugin to contact the NRPE daemon running at the remote physical hosts of interest. The architecture to support the active checks strategy is presented in Fig. 1.

Figure 1. Active check architecture
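The source of check_xen_virtual_machines is not listed in the paper, but its externally visible behavior (reporting the names of the VMs currently running on the local physical host in the "OK - vm01,vm02,vm03" form expected by the rest of the solution) can be sketched as follows. Querying the local xe command line interface is our assumption here.

```python
#!/usr/bin/env python
# Illustrative sketch of a check_xen_virtual_machines-style plugin executed
# on a XenServer physical host (e.g., via NRPE). Using the local "xe" CLI is
# an assumption; the paper only specifies the "OK - vm01,vm02" output form.
import subprocess
import sys

def running_vm_names():
    # "xe vm-list" with --minimal prints a comma-separated list of matches.
    out = subprocess.check_output(
        ["xe", "vm-list", "power-state=running",
         "is-control-domain=false", "params=name-label", "--minimal"])
    names = out.decode().strip()
    return [n for n in names.split(",") if n]

def main():
    try:
        vms = running_vm_names()
    except (OSError, subprocess.CalledProcessError) as exc:
        print("UNKNOWN - unable to query the hypervisor: %s" % exc)
        return 3
    print("OK - %s" % ",".join(sorted(vms)))
    return 0

if __name__ == "__main__":
    sys.exit(main())
```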

D. Passive checks strategy and architecture

The passive checks strategy consists of physical hosts notifying the Nagios server about the VMs that are currently running. In this strategy, a different implementation of the check_xen_virtual_machines plugin is used. Physical hosts have their system cron configured to periodically execute check_xen_virtual_machines, which in its turn detects when the set of running VMs changes and notifies Nagios using the send_nsca command, which contacts the NSCA daemon at the Nagios server. The passive checks strategy is illustrated in Fig. 2.

Figure 2. Passive check architecture

It is important to highlight that notifications to the Nagios server are sent only when the list of VMs of a physical host changes, thus decreasing the network traffic generated to check the list of running VMs, when compared to the active checks strategy. Since check_xen_virtual_machines is executed in time intervals that depend on the system cron configuration, an internal change on a physical host may be perceived much later at the Nagios side if the checking interval is too large.

A variation of the passive checks strategy, called aggressive passive checks, can take advantage of the fact that each server in the Citrix XenServer cluster is aware of the status of the whole cluster. The aggressive passive checks strategy consists in modifying the check_xen_virtual_machines plugin to notify the Nagios server about the entire status of the cluster at once. In this case, the first physical host of the cluster that detects a change notifies that change to the Nagios server. All other physical hosts would eventually detect that change too, and then inform Nagios again. This strategy increases the network usage because the same change is notified several times, but it decreases the delay to detect a change. The impacts of the aggressive passive checks over the communication network, as well as the impacts of the conventional passive and active checks, are analyzed in Section IV.
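For the conventional passive checks described above, the cron-driven notification step can be sketched as below: the script compares the current VM list with the one cached from its previous execution and, only when they differ, hands the result to send_nsca, which reads tab-separated host, service, return code, and output fields on its standard input. Paths, host names, and the way the VM list is obtained are assumptions.

```python
#!/usr/bin/env python
# Illustrative sketch of the cron-driven passive check: report the list of
# running VMs to the Nagios server through send_nsca only when it changed
# since the previous execution. Paths, host names and the way the VM list
# is obtained are assumptions.
import os
import subprocess

STATE_FILE = "/var/tmp/last_vm_list"        # hypothetical cache location
NAGIOS_SERVER = "nagios.example.org"        # hypothetical Nagios host
LOCAL_HOST = "rsxen02"                      # this physical host's name in Nagios

def current_vm_list():
    out = subprocess.check_output(
        ["xe", "vm-list", "power-state=running",
         "is-control-domain=false", "params=name-label", "--minimal"])
    return ",".join(sorted(n for n in out.decode().strip().split(",") if n))

def notify_nagios(vm_list):
    # send_nsca reads "host<TAB>service<TAB>return_code<TAB>output" on stdin.
    payload = "%s\tvirtual_machines\t0\tOK - %s\n" % (LOCAL_HOST, vm_list)
    subprocess.run(["send_nsca", "-H", NAGIOS_SERVER, "-c", "/etc/send_nsca.cfg"],
                   input=payload.encode(), check=True)

if __name__ == "__main__":
    vms = current_vm_list()
    previous = open(STATE_FILE).read() if os.path.exists(STATE_FILE) else None
    if vms != previous:
        notify_nagios(vms)
        with open(STATE_FILE, "w") as f:
            f.write(vms)
```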

E. VM and physical host association and visualization

The fact that virtualization awareness is incorporated into a service monitoring system, in our case into Nagios, does not mean that the user of the service monitoring system will also be aware, through the system, of the underlying virtualization support; the final user may or may not be aware of it. We believe, however, that IT administrators should be conscious not only about the status of the services they need to monitor, but also about the actual location of those services running over a set of physical servers.

In our proposed solution, not only is Nagios aware of the virtualization support present in the hosting servers, but it also exposes such awareness via its graphical user interface (GUI) to the IT administrator. In essence, the IT administrator is able to visually observe the status of critical services as well as where they are currently running. In order to achieve that with Nagios, our approach consists in having, for every VM being monitored, a Nagios logical service called physical_load that presents the identification and operation status of the associated physical host. Figure 3 shows a system snapshot where, for example, a VM is running on top of the rsxen02 physical server, which, in its turn, is operational and adequate.

The implementation of check_physical, the new plugin that makes this possible, first reads from a Nagios local folder a set of files that inform, for each VM, the identification of the physical host that is currently hosting it. Such files are updated by a NEB module called detect_host_change that listens for updates in the VM/physical host association detected by either the active or the passive strategies. Afterwards, check_physical contacts, using active checks, the remote physical host to update its status information stored at the Nagios server, which is finally presented to the IT administrator, as exemplified by Fig. 3.
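A check_physical-style plugin backing the physical_load service could follow the lines sketched below: it looks up the physical host currently associated with the VM (in the files maintained by detect_host_change) and then delegates an active check to that host. The folder layout and the delegated check_nrpe call are assumptions, not the paper's exact implementation.

```python
#!/usr/bin/env python
# Illustrative sketch of a check_physical-style plugin behind the per-VM
# "physical_load" service: find which physical host currently hosts the VM
# (from files maintained by the detect_host_change NEB module) and then
# actively check that host. File layout and delegated check are assumptions.
import subprocess
import sys

ASSOC_DIR = "/usr/local/nagios/var/vm_hosts"   # hypothetical association folder

def hosting_server(vm_name):
    with open("%s/%s" % (ASSOC_DIR, vm_name)) as f:
        return f.read().strip()

def main(vm_name):
    try:
        host = hosting_server(vm_name)
    except OSError:
        print("UNKNOWN - no association information for %s" % vm_name)
        return 3
    # Delegate the actual status/load check to an existing plugin on that host.
    result = subprocess.run(
        ["/usr/local/nagios/libexec/check_nrpe", "-H", host, "-c", "check_load"],
        capture_output=True, text=True)
    print("On %s - %s" % (host, result.stdout.strip()))
    return result.returncode

if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```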

Figure 3. Real and virtual resource information visualization

Another important view is the one where the primary focus is on the physical host, to then present the internal VMs. Figure 4 shows a second snapshot where our modified setup of Nagios presents a hierarchy of physical machines and VMs of a monitored Citrix XenServer cluster.

Figure 4. Physical machines and VMs hierarchy visualization

The detect_host_change NEB module updates the Nagios data structures when needed, but the Nagios Web interface does not read the monitoring information directly from those data structures. The Web interface, in fact, reads a file called object_cache_file to retrieve the information to be shown. In order to ensure visual consistency, detect_host_change must additionally patch such a file so that the Nagios interface can show the current associations of the Citrix XenServer cluster. Parsing the object_cache_file, whose size can be larger than 500 Kbytes, can be very demanding for the Nagios main process. Consequently, Nagios may become temporarily unavailable. To avoid that, another thread had to be created by the NEB module, which uses internal data structures to decide when the file needs to be patched. A patch in the object_cache_file is carried out only if the associations have changed. This is illustrated in Fig. 5.

Figure 5. Updating Nagios view of VMs and physical hosts
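The real detect_host_change module is a C shared object loaded by Nagios; the Python fragment below only illustrates the bookkeeping described above, namely rewriting the large object_cache_file exclusively when an association actually changed, and doing so outside the Nagios event loop. All names are hypothetical.

```python
# Sketch of the "patch only on change" bookkeeping: rewrite the (large)
# object_cache_file only when a VM/host association actually changed, and
# perform the rewrite in a separate thread so the Nagios event loop is not
# blocked. The real detect_host_change module is a C NEB module.
import threading

OBJECT_CACHE_FILE = "/usr/local/nagios/var/objects.cache"   # typical location
associations = {}                 # vm name -> physical host
lock = threading.Lock()

def patch_object_cache(snapshot):
    # Placeholder for the actual parse-and-rewrite of the cache file.
    print("would patch %s with %d associations" % (OBJECT_CACHE_FILE, len(snapshot)))

def on_association_update(vm, physical_host):
    with lock:
        if associations.get(vm) == physical_host:
            return                # unchanged: skip the expensive rewrite
        associations[vm] = physical_host
        snapshot = dict(associations)
    threading.Thread(target=patch_object_cache, args=(snapshot,)).start()
```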

IV. EVALUATION

This section presents a set of evaluation experiments whose goal is to analyze some aspects of our proposed solution, showing its impact and cost. The aspects that are concerned are scalability, average response time, and network traffic.

A. Scalability on emulated larger systems

The goal of this evaluation experiment is to check the behavior of the Nagios plugins and modules in large systems. We have emulated large systems due to restrictions on acquiring a large number of real physical hosts, but at the same time we used an actual installation of Nagios to treat the migration of VMs over four emulated physical infrastructure setups. Our Nagios installation has been settled to support the following four different scenarios:

- 125 VMs running on top of 25 physical hosts
- 250 VMs running on top of 50 physical hosts
- 375 VMs running on top of 75 physical hosts
- 500 VMs running on top of 100 physical hosts

In all four scenarios, each physical host hosts five VMs. We have coded a script that informs Nagios about all information on associations between VMs and physical hosts of the emulated scenarios. Such a script uses passive checks to send to Nagios the list of VMs that each physical host is running.
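A script equivalent to the one used in this experiment could look like the sketch below: for each emulated physical host it injects a passive check result announcing the five VMs that the host is supposed to run. Writing directly into the external command file, as well as the naming scheme, are assumptions.

```python
#!/usr/bin/env python
# Illustrative sketch of the emulation script: announce to Nagios, via
# passive check results, which five VMs each emulated physical host runs.
# Host/VM naming and the command file path are assumptions.
import time

COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"   # typical default

def announce(physical_hosts):
    now = int(time.time())
    with open(COMMAND_FILE, "a") as cmd:
        for p in range(1, physical_hosts + 1):
            host = "host%03d" % p
            vms = ["vm%03d" % (5 * (p - 1) + i) for i in range(1, 6)]
            cmd.write("[%d] PROCESS_SERVICE_CHECK_RESULT;%s;virtual_machines;0;OK - %s\n"
                      % (now, host, ",".join(vms)))

if __name__ == "__main__":
    announce(25)    # 25 hosts x 5 VMs = the smallest emulated scenario
```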

Nagios initially shows all monitored hosts side by side, since no association information is present. Afterwards, the script that sends to Nagios the lists of all VMs of each physical host is executed. The experiment ends when the Nagios interface shows all VMs associated with their respective physical hosts. This is simple to check visually because, when finished, no VM will be displayed without a physical host associated to it. The response time that is measured is the difference between the moment when our script is started (emulating the notification of a set of migrating VMs) and the moment when the Nagios interface shows all VMs associated with their physical hosts.

This first experiment has been repeated 30 times and the results depicted in Fig. 6 show a confidence interval of 95%. As can be seen, even when the size of a scenario is increased four times, the average response time increases slightly more than 2 times. This shows that Nagios scales with the increased size of the monitored environment. However, in the scenario with 750 VMs and 150 physical hosts, the Nagios Web interface showed an erratic behavior, sometimes even crashing. This reveals that the Nagios GUI is unstable when monitoring large systems.

Figure 6. Average response time for large emulated scenarios

B. Estimated response time

We have conducted two comparisons: first, we made a comparison between estimated and measured times in each strategy. We are interested in the maximum time and the mean time that the solution takes to show association information between VMs and physical hosts after a change. Second, we made a comparison between strategies on the measured mean time.

A theoretical response time estimate can be calculated for the active and passive checks strategies. These theoretical values become important to verify that the implementation does not present serious construction problems. If the measured values are close to the estimated times, we can infer the quality of the solution.

The maximum response time that the Nagios interface takes to show the correct association information using active checks is:

I + Is + Δ + P + MΔ + PI    (1)

In (1), I is the check interval, Is is the time to send a check to a monitored physical host, Δ is the delay to process a check at the physical host, P is the time to receive the output of a check, MΔ is the NEB module processing time, and finally PI is the time to refresh the Nagios Web interface. In an oversized estimation, we assume that Is, Δ, P, and MΔ consume 1 second each. We assume that PI takes 90 seconds and that I assumes 1 and 10 minutes, creating two different situations. In total, with I of 1 minute the maximum time is 154 seconds; for an I of 10 minutes the total time is 694 seconds.

For the passive checks strategies, we can make the same analysis using the equation below:

I + Δ + P + E + MΔ + PI    (2)

In (2), I is the check interval, Δ is the time of processing a check, P is the time to receive the output of a check, E is the time for Nagios to read the external command file, MΔ is the NEB module processing time, and PI is the time to refresh the Nagios Web interface. Again, we assume that Δ, P, and MΔ consume 1 second each; PI takes 90 seconds; and E takes 15 seconds. Assuming I takes 1 and 10 minutes, we have again two scenarios: 168 seconds with I taking 1 minute and 708 seconds with I taking 10 minutes.

Another interesting estimate is the expected average time. To calculate this estimate, we assume the duration of intervals such as the check interval, the interface refresh, and the reading of the external command file as half of the maximum times presented above. This is the expected value for these intervals, since each can range from 1 second up to the maximum time allowed; thus, the expected time is half of the maximum allowed. Equations (3) and (4) show the estimates for the active and passive checks strategies, respectively.

I/2 + Is + Δ + P + MΔ + PI/2    (3)

I/2 + Δ + P + E/2 + MΔ + PI/2    (4)

The aggressive passive checks strategy has a slight modification that minimizes the mean time, because the first physical host that detects an association change informs the Nagios process faster. We consider the mean waiting time as the check interval divided by two times the number of physical hosts in the cluster. Equation (5) shows this estimate.

I/(2*N) + Δ + P + E/2 + MΔ + PI/2    (5)

In (5), N is the number of physical hosts in the Citrix XenServer cluster.

For the active checks strategy, the mean time estimate with I of 1 minute is 79 seconds and with I of 10 minutes is 349 seconds. For the passive checks strategy, the mean time estimate with I of 1 minute is 85.5 seconds and with I of 10 minutes is 355.5 seconds. For the aggressive passive checks strategy, the mean time estimate with N of 5 and I of 1 minute is 61.5 seconds, and with I of 10 minutes is 115.5 seconds.
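The figures above follow directly from equations (1) to (5); the short computation below reproduces them under the same assumed parameter values.

```python
# Reproducing the response time estimates of equations (1)-(5) with the
# parameter values assumed in the text (all times in seconds).
Is, D, P, M = 1, 1, 1, 1      # send, processing, output, NEB module times
E, PI = 15, 90                # external command file read, interface refresh
N = 5                         # physical hosts (aggressive passive checks)

for I in (60, 600):           # check interval of 1 and 10 minutes
    active_max      = I + Is + D + P + M + PI             # eq. (1): 154 / 694
    passive_max     = I + D + P + E + M + PI              # eq. (2): 168 / 708
    active_mean     = I/2 + Is + D + P + M + PI/2         # eq. (3): 79 / 349
    passive_mean    = I/2 + D + P + E/2 + M + PI/2        # eq. (4): 85.5 / 355.5
    aggressive_mean = I/(2*N) + D + P + E/2 + M + PI/2    # eq. (5): 61.5 / 115.5
    print(I, active_max, passive_max, active_mean, passive_mean, aggressive_mean)
```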

C. Response time in an actual cluster setup

After the theoretical analysis that indicates the expected results of the experiments, we present the technical details of our second experimental evaluation environment. In this case, instead of emulating larger infrastructures, we employ a real Citrix XenServer cluster composed of five physical hosts that use Dual QuadCore Intel Xeon E5430 CPUs with 12 MB of L2 cache and 16 GB of main memory. The cluster's storage has 1 TB to store virtual machines. The Nagios server uses an Intel Core2 Duo E8400 CPU with 6 MB of L2 cache and 2 GB of main memory, and all network connections are based on gigabit Ethernet. Figure 7 shows the network topology of our second experimental environment.

Figure 7. Second experimental infrastructure

The experiment consists of migrating a VM and measuring the delay that the Nagios interface takes to show such a change. This experiment has been carried out considering different times for the check interval: one experimental scenario used a check interval of 1 minute, which is the minimum time accepted by Nagios and the system cron, while another experimental scenario uses 10 minutes, which is the default check interval of Nagios. This experiment has been performed in a real environment, like the one described in Fig. 7, and observed the three check strategies, i.e., active checks, passive checks, and aggressive passive checks.

Table I shows the estimated and measured times for 30 executions of each scenario with each check interval for the active checks strategy.

TABLE I. MEASURED TIMES FOR ACTIVE CHECKS

Active Checks            | I = 1 minute | I = 10 minutes
Maximum time estimated   | 154 s        | 694 s
Mean time estimated      | 79 s         | 349 s
Maximum time measured    | 130 s        | 659 s
Mean time measured       | 66.67 s      | 349.97 s

Table II shows the estimated and measured times for 30 executions of each scenario with each check interval for the passive checks strategy.

TABLE II. MEASURED TIMES FOR PASSIVE CHECKS

Passive Checks           | I = 1 minute | I = 10 minutes
Maximum time estimated   | 168 s        | 708 s
Mean time estimated      | 85.5 s       | 355.5 s
Maximum time measured    | 149 s        | 574 s
Mean time measured       | 77.5 s       | 377.54 s

Table III shows the estimated and measured times for 30 executions of each scenario with each check interval for the aggressive passive checks strategy.

TABLE III. MEASURED TIMES FOR AGGRESSIVE PASSIVE CHECKS

Aggressive Passive Checks | I = 1 minute | I = 10 minutes
Maximum time estimated    | 168 s        | 708 s
Mean time estimated       | 61.5 s       | 115.5 s
Maximum time measured     | 147 s        | 557 s
Mean time measured        | 78.37 s      | 401.27 s

A statistical analysis must be made to ensure that the mean times measured in the experiment have a correlation with the mean time expected for the solution. Figure 8 shows the confidence intervals for the scenarios with I of 1 minute and 10 minutes for all the strategies presented. The confidence interval is calculated considering a confidence level of 95%.

Figure 8. Response time considering the real cluster deployment

Both check intervals of 1 and 10 minutes experience faster response times when the active checks strategy is employed. The passive checks strategy depends on an additional delay that Nagios imposes to read the external command file. In the scenario with I of 1 minute, this difference is clearly observed because it is in the order of 15 seconds, which is the interval at which Nagios reads the external command file. If the administrator wants to achieve the best response time, we recommend the use of the active checks strategy.

The aggressive passive checks strategy does not reach its objective. This strategy intended to take advantage of the non-determinism of the time of execution of the notifying script. The script is called by the system cron. Since the local clock of each physical host in the cluster is synchronized among all the other hosts, the script ends up being called almost at the same time. This makes the time of execution of the script deterministic, ending up providing no advantage over the conventional passive checks strategy. This can be observed by comparing the response times of passive checks and aggressive passive checks in Fig. 8. This situation could be improved by forcing a random sleep time when the aggressive passive checks are employed. However, this extra delay must be in the same order as the check interval, to distribute the checks along the whole interval.

D. Network traffic

This comparison consists in capturing all the traffic generated by each strategy. In order to compare our strategies with different check interval values, we observed the traffic generated in one hour of system operation. Both passive checks strategies can minimize the network usage if they detect that no change occurred in the VM and physical host association information since the last execution of the checking script. Otherwise, at every 10 minutes this script forces a check to Nagios. This is needed in the case of a Nagios reboot; without this enforced check, the physical host only sends its internal information if a VM migrates. With this information we can compute the maximum and minimum transmissions of each strategy for each physical host. In the aggressive passive checks strategy, we consider five physical hosts in the cluster. In this strategy, each physical host sends the association information of all physical hosts in the cluster, which increases the number of transmissions. Table IV shows the traffic results.

TABLE IV. MESSAGES IN ONE HOUR

Strategy                                 | Maximum transmissions | Minimum transmissions
Active Check, I = 1 minute               | 60                    | 60
Active Check, I = 10 minutes             | 6                     | 6
Passive Check, I = 1 minute              | 60                    | 6
Passive Check, I = 10 minutes            | 6                     | 6
Aggressive Passive Check, I = 1 minute   | 300                   | 30
Aggressive Passive Check, I = 10 minutes | 30                    | 30

In our experiment, we captured the TCP transmissions of each strategy and computed the average size of these transmissions for the active checks and passive checks strategies, which are 1960 bytes and 1594 bytes, respectively. Table V shows the traffic accumulated in one hour.

TABLE V. NETWORK TRAFFIC IN ONE HOUR

Strategy                                 | Maximum traffic | Minimum traffic
Active Check, I = 1 minute               | 117,600 bytes   | 117,600 bytes
Active Check, I = 10 minutes             | 11,760 bytes    | 11,760 bytes
Passive Check, I = 1 minute              | 95,640 bytes    | 9,564 bytes
Passive Check, I = 10 minutes            | 9,564 bytes     | 9,564 bytes
Aggressive Passive Check, I = 1 minute   | 478,200 bytes   | 47,820 bytes
Aggressive Passive Check, I = 10 minutes | 47,820 bytes    | 47,820 bytes
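The byte counts of Table V are the transmission counts of Table IV multiplied by the measured average transmission size of each strategy; the aggressive passive checks rows use the passive transmission size, which is what the reported values imply. The short check below reproduces the table.

```python
# Sanity check of Table V: bytes per hour = transmissions per hour (Table IV)
# multiplied by the measured average transmission size per strategy.
AVG_SIZE = {"active": 1960, "passive": 1594, "aggressive": 1594}  # bytes

table_iv = {  # (strategy, check interval in minutes) -> (max, min) transmissions
    ("active", 1): (60, 60),      ("active", 10): (6, 6),
    ("passive", 1): (60, 6),      ("passive", 10): (6, 6),
    ("aggressive", 1): (300, 30), ("aggressive", 10): (30, 30),
}

for (strategy, interval), (tx_max, tx_min) in sorted(table_iv.items()):
    size = AVG_SIZE[strategy]
    print("%s, I = %2d min: max %7d bytes, min %7d bytes"
          % (strategy, interval, tx_max * size, tx_min * size))
```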

The passive checks strategy is more efficient in terms of network usage. This strategy takes advantage of the size of the individual transmissions and of the knowledge that no migration has occurred since the last execution. The active checks strategy performs better than the aggressive passive checks strategy, except in the scenario with a 1-minute check interval and in environments that rarely have a change in the VM and physical host association information. If the administrator has restrictions on network traffic (i.e., the network is overloaded), we recommend the use of passive checks with I = 1 minute. This strategy has a slightly greater response time, but uses less traffic.

V. CONCLUSIONS AND FUTURE WORK

With the modifications made in Nagios to incorporate virtualization awareness, an IT administrator can put together the service status information of each virtual machine (VM) with the resource metrics collected from the physical machine that hosts it. This is dynamically executed in order to allow Nagios to detect migrations of VMs, thus requiring no human intervention into the conventional Nagios configurations. The Web interface of Nagios graphically shows the relationships between VMs and physical hosts in the system's map. In this map, the physical hosts are presented as parents of each VM of the monitored environment.

Although our work has employed the Citrix XenServer virtualized environment to prove the concept, one can easily adapt our proposed solution to other virtualization environments. The only element of the solution that is platform specific is the check_xen_virtual_machines plugin. The NEB module, which carries out the most complex task in the solution, expects a list of virtual machines for each physical host in the form of "OK - vm01,vm02,vm03". Anyone can develop a plugin that informs such a list to Nagios for another virtualized environment.


Additional improvements for monitoring Citrix XenServer physical hosts include the development of further Nagios plugins that collect resource status information supplied by the XenServer hypervisor. The plugins of the Nagios community work only with the open source version of Xen and must then be adapted to work with Citrix XenServer. Alternatively, the IT administrator can use the check_snmp plugin to collect metrics, for example, using SNMP.

REFERENCES

[1] Citrix Systems, Citrix XenServer. [http://www.citrix.com/]
[2] VMware. [http://www.vmware.com/]
[3] W. Barth, Nagios: System and Network Monitoring, 2nd ed. San Francisco: No Starch Press, 2008. [http://www.nagios.org/]
[4] Cacti. [http://www.cacti.net/]
[5] T. Garfinkel and M. Rosenblum, "A Virtual Machine Introspection Based Architecture for Intrusion Detection," Network and Distributed System Security Symposium, pp. 191-206, 2003.
[6] B. D. Payne, M. D. P. de Carbone, and W. Lee, "Secure and Flexible Monitoring of Virtual Machines," 23rd Annual Computer Security Applications Conference (ACSAC), pp. 385-397, 2007.
[7] T. Fraser, M. R. Evenson, and W. A. Arbaugh, "VICI Virtual Machine Introspection for Cognitive Immunity," 24th Annual Computer Security Applications Conference (ACSAC), pp. 87-96, 2008.
[8] Z. Shao, H. Jin, and X. Lu, "PMonitor: a Lightweight Performance Monitor for Virtual Machines," 1st International Workshop on Education Technology and Computer Science, pp. 689-693, 2009.
[9] F. Machida, M. Kawato, and Y. Maeno, "Adaptive Monitoring for Virtual Machine Based Reconfigurable Enterprise Systems," 3rd International Conference on Autonomic and Autonomous Systems (ICAS), pp. 8-8, 2007.
