Job-oriented VM State Synchronization in CloudStack

Kelven Yang, Citrix

Agenda
–  The Sync problem
–  Legacy solution and its pain points
–  High level principles of the new solution and change details
–  Future work

The Sync Problem
–  VM lifecycle in CloudStack
   •  Starting, Running, Stopping, Stopped, Migrating, Expunging
–  VM lifecycle in Hypervisor
   •  Powered-off, Powered-on, Suspended
–  Resource implications with VM in CloudStack
   •  Hypervisor VM resource
   •  Network environment
   •  Storage environment
   •  Guest OS environment
–  Bring things in sync
   •  In-band VM operations
   •  Out-of-band VM operations

Legacy VM State-sync implementation
–  Designed for in-band VM operations
–  Hypervisor resource agent participates in CloudStack VM lifecycle management
–  Full-sync/Delta-sync
   •  Set up the initial system state with a full-sync process
   •  Perform delta synchronization periodically

Legacy VM state-sync pain-points
–  Having the resource agent participate in VM lifecycle management increased the complexity of writing a hypervisor agent
   •  Maintain an in-memory cache
   •  Monitor in-band operations issued from CloudStack
   •  Generate the delta report
–  Full-sync chain of actions
   •  In-place sync triggers a chain of actions if a large number of VMs are out of sync
   •  In-place sync processing logic needs to cover all possible scenarios
–  Out-of-band changes are hard to incorporate into the process
–  The result is a very tightly coupled design

High level principles of new VM state-sync
–  Decouple the hypervisor resource agent from VM lifecycle management
   •  Report raw power state only
   •  Carry out hypervisor-specific low-level operations only
–  Serialize VM operations
   •  Jobs targeting the same VM are serialized and executed in order
   •  State transition is handled within the context of the job
–  Loosely couple interactions through a messaging bus
   •  Glue VM state report, VM state management, and VM HA management together through the in-memory bus facility
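The "serialize VM operations" principle can be illustrated with a small sketch. This is not CloudStack's job framework; the class `PerVmJobSerializer` and its methods are hypothetical names chosen for illustration. It assumes that jobs for the same VM id must run in submission order while jobs for different VMs may run in parallel.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration, not CloudStack code.
class PerVmJobSerializer {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    // Tail of the job chain for each VM id (never pruned in this sketch).
    private final Map<Long, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

    // Chains the job behind the previous job submitted for the same VM.
    CompletableFuture<Void> submit(long vmId, Runnable job) {
        return tails.compute(vmId, (id, tail) -> {
            CompletableFuture<Void> prev =
                (tail == null) ? CompletableFuture.completedFuture(null) : tail;
            // handle() swallows a failure of the previous job so the chain keeps going.
            return prev.handle((result, error) -> null).thenRunAsync(job, pool);
        });
    }

    void shutdown() {
        pool.shutdown();
    }

    // Submits three jobs for VM 1 and returns the order in which they ran.
    static List<String> demo() {
        PerVmJobSerializer serializer = new PerVmJobSerializer();
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        serializer.submit(1L, () -> log.add("stop"));
        serializer.submit(1L, () -> log.add("start"));
        serializer.submit(1L, () -> log.add("migrate")).join();
        serializer.shutdown();
        return log;
    }
}
```

Because each job is chained onto the tail of its VM's queue, state transitions for one VM never interleave, which is what lets the transition logic live entirely inside the job.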

VM State-Sync Interactions
[Diagram. Components shown: sync event source (resource agent report, raw reports); PowerState SyncManager (publishes sync-change notifications); message bus with driving thread; VirtualMachine Manager (out-of-band change processing); orchestration jobs (in-band change processing, woken by sync-change notifications).]

VM State-Sync interactions
–  VM power state sync manager
   •  Maintains VM power state
   •  Generates change events and publishes them to the message bus
–  In-band state transition handling
   •  A change notification only wakes up the job that is waiting for that change event
   •  Processing happens within the job context to complete the state sync
–  Out-of-band state transition handling
   •  Out-of-band changes are detected easily by checking whether any pending job is working on the VM

Related Schema changes for new State-Sync
–  VM power state management
   •  New fields in vm_instance table: power_state, power_state_update_time, power_state_update_count, power_host
–  Job management
   •  New vm_work_job table
   •  New async_job_join_map table
   •  New async_job_journal table

VM Power state-sync server part
–  Power state change detection
   •  The initial base point is set when the host connects
   •  Changes are detected from the periodic reports sent by the resource agent
–  Missing VM detection
   •  A VM present in the previous known-good report may be missing from the next report, for example after an out-of-band VM deletion
–  Performance consideration
   •  A VM stays in a stationary state most of the time, so the same power state may be reported for a particular VM for a long time
   •  Once the number of consecutive identical updates exceeds a threshold, the update is skipped
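The three server-side concerns above (change detection, missing-VM detection, and skipping repeated updates) can be sketched together. Everything here is an illustrative assumption: the class `PowerStateTracker`, the enum `ReportedPowerState`, the threshold value of 3, and the use of an in-memory map in place of the vm_instance power-state columns. The sketch also treats the first report as the baseline and counts every VM in it as changed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names; the real states live in CloudStack's VirtualMachine.PowerState.
enum ReportedPowerState { PowerOn, PowerOff, PowerReportMissing }

class PowerStateTracker {
    static final int MAX_SAME_UPDATES = 3; // assumed threshold

    private static final class Tracked {
        ReportedPowerState state;
        int updateCount;
    }

    private final Map<Long, Tracked> tracked = new HashMap<>();
    int dbWrites = 0; // stands in for updates to the power-state columns

    // Processes one full report from a host and returns the VM ids whose state changed.
    List<Long> processReport(Map<Long, ReportedPowerState> report) {
        // VMs seen before but absent from this report are treated as missing.
        Map<Long, ReportedPowerState> effective = new HashMap<>(report);
        for (Long vmId : tracked.keySet()) {
            effective.putIfAbsent(vmId, ReportedPowerState.PowerReportMissing);
        }

        List<Long> changed = new ArrayList<>();
        for (Map.Entry<Long, ReportedPowerState> e : effective.entrySet()) {
            Tracked t = tracked.computeIfAbsent(e.getKey(), k -> new Tracked());
            if (t.state != e.getValue()) {
                // State changed (or first sighting): record it and restart the counter.
                t.state = e.getValue();
                t.updateCount = 1;
                dbWrites++;
                changed.add(e.getKey());
            } else if (t.updateCount < MAX_SAME_UPDATES) {
                // Same state again, still below the threshold: record the repeat.
                t.updateCount++;
                dbWrites++;
            }
            // Otherwise the VM is stationary and the write is skipped.
        }
        return changed;
    }
}
```

With this shape, a host full of idle VMs stops generating writes after a few identical reports, while any real transition resets the counter and is reported immediately.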

VM Power state-sync – resource part
–  Retirement of the resource agent VM state cache
   The new management server sync logic no longer needs the resource agent to maintain such a cache; tracking of state transitions and delta change detection are no longer needed either.
–  Example of a resource agent composing the VM power-state report (pseudo code)

    private HashMap getHostVmStateReport() {
        foreach (vm on the host) {
            gather VM power state
            put it in the report
        }
        return the report
    }
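A runnable version of the pseudo code above might look like the following sketch. The `HypervisorClient` interface, the native state strings, and the `RawPowerState` enum are hypothetical stand-ins for a real hypervisor SDK and for CloudStack's own types; mapping a suspended VM to PowerOff is an assumption made here for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class HostVmStateReporter {
    enum RawPowerState { PowerOn, PowerOff, PowerUnknown }

    // Hypothetical stand-in for a hypervisor SDK: VM name -> native state string.
    interface HypervisorClient {
        Map<String, String> listVmNativeStates();
    }

    private final HypervisorClient client;

    HostVmStateReporter(HypervisorClient client) {
        this.client = client;
    }

    // Builds a fresh report on every call; no cache and no delta tracking.
    Map<String, RawPowerState> getHostVmStateReport() {
        Map<String, RawPowerState> report = new HashMap<>();
        for (Map.Entry<String, String> vm : client.listVmNativeStates().entrySet()) {
            report.put(vm.getKey(), toPowerState(vm.getValue()));
        }
        return report;
    }

    // Assumed mapping from native hypervisor states to raw power states.
    static RawPowerState toPowerState(String nativeState) {
        switch (nativeState) {
            case "poweredOn":
                return RawPowerState.PowerOn;
            case "poweredOff":
            case "suspended":
                return RawPowerState.PowerOff;
            default:
                return RawPowerState.PowerUnknown;
        }
    }
}
```

The agent keeps no state between calls, which is the point of the new model: all interpretation of the report happens on the management server.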

In-band change processing
–  The job performing an in-band change is responsible for orchestrating the process
–  The target state transition is monitored through the message bus; completion is checked through the Predicate interface
–  Example orchestration flow (pseudo code)

    // submit a worker job to carry out the VM operation, then wait for completion
    _jobMgr.waitAndCheck(
        new String[] { TopicConstants.VM_POWER_STATE, TopicConstants.JOB_STATE },
        3000L, 60000L,
        new Predicate() {
            @Override
            public boolean checkCondition() {
                // done once the VM is reported as powered off
                VMInstanceVO instance = _vmDao.findById(vm.getId());
                return instance.getPowerState() == VirtualMachine.PowerState.PowerOff;
            }
        });

–  Predicate interface

    public interface Predicate {
        boolean checkCondition();
    }
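The wait-and-check pattern used above can be sketched in isolation. This is not the CloudStack implementation of `waitAndCheck`; the class `WaitAndCheckSketch` and its `onNotification` hook are hypothetical, and topic subscription is left out. The sketch assumes the waiter wakes either on a bus notification or after the check interval, re-evaluates the predicate, and gives up at the timeout.

```java
class WaitAndCheckSketch {
    interface Predicate {
        boolean checkCondition();
    }

    private final Object signal = new Object();

    // Would be called by a message bus subscriber when a watched topic fires.
    void onNotification() {
        synchronized (signal) {
            signal.notifyAll();
        }
    }

    // Returns true once the predicate holds, false on timeout or interruption.
    boolean waitAndCheck(long checkIntervalMs, long timeoutMs, Predicate predicate) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (signal) {
            while (!predicate.checkCondition()) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                try {
                    // Sleep until notified or until the next periodic check is due.
                    signal.wait(Math.max(1L, Math.min(checkIntervalMs, remaining)));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }
}
```

The periodic re-check is a safety net: even if a notification is never delivered, the job still observes the new state within one check interval.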

Out-of-band change processing

    // The VM_POWER_STATE topic on the message bus triggers this handler
    @MessageHandler(topic = Topics.VM_POWER_STATE)
    private void HandlePowerStateReport(String subject, String senderAddress, Object args) {
        ….
        // Determination of out-of-band changes
        if (pendingWorkJobs.size() == 0 && !_haMgr.hasPendingHaWork(vmId)) {
            // there is no pending operation job
            VMInstanceVO vm = _vmDao.findById(vmId);
            if (vm != null) {
                // Dispatch of handling for out-of-band changes
                switch (vm.getPowerState()) {
                    case PowerOn:
                        handlePowerOnReportWithNoPendingJobsOnVM(vm);
                        break;
                    case PowerOff:
                    case PowerReportMissing:
                        handlePowerOffReportWithNoPendingJobsOnVM(vm);
                        break;
                    default:
                        assert (false);
                        break;
                }
            }
        } else {
            // Consider this an in-band change and let it be handled within the job context.
            // Reset tracking so that we won't lose triggering events.
            _vmDao.resetVmPowerStateTracking(vmId);
        }
    }

Supporting facilities
–  Message bus
   •  In memory
   •  Hierarchical subscriber management
   •  Annotation-based dispatching (@MessageHandler)
–  Job facility
   All system activities are now tied to associated jobs; the logging system is also updated to help diagnose problems by tracking each top-level job
   •  API job: gives a running context to an asynchronous API request
   •  Work job: carries the real orchestration process; its run is serialized on the target VM
   •  Pseudo job: gives a background thread a job context
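The message bus properties listed above (in memory, hierarchical subscribers, annotation-based dispatch) can be illustrated with a toy bus. This is a sketch and not CloudStack's message bus: the annotation `SketchMessageHandler`, the classes `MiniBus` and `PowerStateListener`, and the dotted topic names are all hypothetical. "Hierarchical" is interpreted here as an assumption that a subscriber to a topic also receives messages published to its sub-topics.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical annotation playing the role of @MessageHandler.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SketchMessageHandler {
    String topic();
}

class MiniBus {
    private static final class Subscription {
        final String topic;
        final Object target;
        final Method method;

        Subscription(String topic, Object target, Method method) {
            this.topic = topic;
            this.target = target;
            this.method = method;
        }
    }

    private final List<Subscription> subscriptions = new ArrayList<>();

    // Scans the subscriber for annotated methods of shape (String subject, Object args).
    void register(Object subscriber) {
        for (Method m : subscriber.getClass().getDeclaredMethods()) {
            SketchMessageHandler h = m.getAnnotation(SketchMessageHandler.class);
            if (h != null) {
                m.setAccessible(true);
                subscriptions.add(new Subscription(h.topic(), subscriber, m));
            }
        }
    }

    // Delivers synchronously to handlers of the subject itself or of any parent topic.
    void publish(String subject, Object args) {
        for (Subscription s : subscriptions) {
            if (subject.equals(s.topic) || subject.startsWith(s.topic + ".")) {
                try {
                    s.method.invoke(s.target, subject, args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    }
}

class PowerStateListener {
    final List<String> seen = new ArrayList<>();

    @SketchMessageHandler(topic = "vm.powerstate")
    private void onPowerState(String subject, Object args) {
        seen.add(subject + "=" + args);
    }
}
```

Publishing to `vm.powerstate.42` reaches the listener registered on `vm.powerstate`, while a message on an unrelated topic such as `job.state` does not.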

Journey of getting there
–  Dual state reports
   •  The model is simple, but it is a model shift at a very low level
   •  Changes touch all hypervisor resources, all major orchestration flows, HA, etc., which makes unit testing very hard
   •  Dual state reports allow the old model and the new model to overlap
–  Compromises
   •  Synchronous in-band process for state transitions
   •  Thread starvation
   •  Result/exception propagation across boundaries

Next Step
–  In-memory power-state table
   •  Power state is stored for change detection and as a temporary store that in-band jobs can query, so storage persistency is not mandatory
   •  Using a memory-based DB table would help scalability and performance
–  General pluggable model to sync other types of information
   •  An immediate need could be sync support for storage DRS
