Job-oriented VM State Synchronization in CloudStack
Kelven Yang Citrix
Agenda
– The Sync problem
– Legacy solution and its pain points
– High-level principles of the new solution and change details
– Future work
The Sync Problem
– VM lifecycle in CloudStack
  • Starting, Running, Stopping, Stopped, Migrating, Expunging
– VM lifecycle in Hypervisor
  • Powered-off, Powered-on, Suspended
– Resource implications with VM in CloudStack
  • Hypervisor VM resource
  • Network environment
  • Storage environment
  • Guest OS environment
– Bring things in sync
  • In-band VM operations
  • Out-of-band VM operations
Legacy VM State-sync implementation
– Designed for in-band VM operations
– Hypervisor resource agent participates in CloudStack VM lifecycle management
– Full-sync/Delta-sync
  • Set up the initial system state with a full-sync process
  • Perform delta synchronization periodically
Legacy VM state-sync pain-points
– Having the resource agent participate in VM lifecycle management made hypervisor agents more complex to write
  • Maintain an in-memory cache
  • Monitor in-band operations issued from CloudStack
  • Generate delta reports
– Full-sync chain of actions
  • In-place sync triggers a chain of actions if a large number of VMs are out of sync
  • In-place sync processing logic needs to cover all possible scenarios
– Out-of-band changes are hard to incorporate into the process
– The result is a very tightly coupled design
High-level principles of new VM state-sync
– Decouple the hypervisor resource agent from VM lifecycle management
  • Report raw power state only
  • Carry out hypervisor-specific low-level operations only
– Serialize VM operations
  • Jobs targeting the same VM are serialized and executed in order
  • State transition is handled within the context of the job
– Loosely couple interactions through a message bus
  • Glue VM state reporting, VM state management, and VM HA management together through the in-memory bus facility
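The "serialize VM operations" principle can be sketched as follows. This is a minimal illustration, not CloudStack's job manager: the class `PerVmJobSerializer` and its methods are hypothetical names. Jobs submitted for the same VM id are chained so that they run one at a time in submission order, while jobs for different VMs remain free to run concurrently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration; not CloudStack's real job manager.
class PerVmJobSerializer {
    private final ExecutorService pool;
    // Tail of the job chain for each VM id (never pruned in this sketch).
    private final Map<Long, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

    PerVmJobSerializer(ExecutorService pool) {
        this.pool = pool;
    }

    // Appends the job behind whatever is already queued for this VM.
    CompletableFuture<Void> submit(long vmId, Runnable job) {
        return tails.compute(vmId, (id, tail) -> {
            CompletableFuture<Void> prev =
                (tail == null) ? CompletableFuture.completedFuture(null) : tail;
            // handle() swallows an earlier failure so later jobs still run.
            return prev.handle((result, error) -> null).thenRunAsync(job, pool);
        });
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Two jobs on the same VM: the slow first job still finishes before the second starts.
    static List<String> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        PerVmJobSerializer serializer = new PerVmJobSerializer(pool);
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CompletableFuture<Void> first = serializer.submit(1L, () -> {
            pause(50);
            log.add("vm1:start");
        });
        CompletableFuture<Void> second = serializer.submit(1L, () -> log.add("vm1:stop"));
        CompletableFuture.allOf(first, second).join();
        pool.shutdown();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Chaining futures per VM id is only one way to get this ordering; a per-VM queue persisted in the database would serve the same purpose across management servers.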
VM State-Sync Interactions
[Diagram. Labels: Sync event source (resource agent report, raw reports); PowerState SyncManager (publishes sync-change notifications); Message Bus (driving thread, sync-change notification); VirtualMachine Manager (out-of-band change processing); Orchestration Jobs (in-band change processing)]
VM State-Sync interactions
– VM power state sync manager
  • Responsible for maintaining VM power state
  • Responsible for generating change events and publishing them to the message bus
– In-band state transition handling
  • A change notification only wakes up the job that is waiting for these change events
  • Processing happens within the job context to complete the state sync
– Out-of-band state transition handling
  • Out-of-band changes are easy to detect by checking whether a pending job is working on the VM
Related schema changes for new State-Sync
– VM power state management
  • New fields in vm_instance table: power_state, power_state_update_time, power_state_update_count, power_host
– Job management
  • New vm_work_job table
  • New async_job_join_map table
  • New async_job_journal table
VM Power state-sync – server part
– Power state change detection
  • The initial base point is set when the host is connected
  • Changes are detected from the periodic reports sent by the resource agent
– Missing VM detection
  • A VM present in the previous known-good report may be missing from the next report; this can happen when a VM is deleted out-of-band
– Performance consideration
  • VMs stay in stationary states most of the time, so the same power state may be reported for a particular VM for a long time
  • When the number of consecutive identical updates exceeds a threshold, the update is skipped
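The server-side rules above (change detection, missing-VM detection, and skipping repeated identical updates) can be sketched as follows. This is a simplified illustration under stated assumptions: `PowerStateTracker`, the `writes` counter standing in for database updates, and the threshold value of 3 are all hypothetical, and the sketch tracks a single host only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-host illustration; not CloudStack's sync manager.
class PowerStateTracker {
    enum PowerState { PowerOn, PowerOff, PowerReportMissing }

    // Assumed threshold; the slides do not give a value.
    static final int MAX_SAME_UPDATES = 3;

    private static final class Entry {
        PowerState state;
        int sameCount;
    }

    private final Map<Long, Entry> known = new HashMap<>();
    int writes = 0; // stands in for the number of database updates

    // Returns true if the stored record was updated.
    boolean update(long vmId, PowerState reported) {
        Entry entry = known.get(vmId);
        if (entry == null || entry.state != reported) {
            // First report or a real change: restart the counter.
            entry = new Entry();
            entry.state = reported;
            entry.sameCount = 1;
            known.put(vmId, entry);
            writes++;
            return true;
        }
        if (entry.sameCount < MAX_SAME_UPDATES) {
            entry.sameCount++;
            writes++;
            return true;
        }
        return false; // stationary state: skip the update
    }

    // Processes one periodic report from the resource agent.
    void processReport(Map<Long, PowerState> report) {
        for (Map.Entry<Long, PowerState> item : report.entrySet()) {
            update(item.getKey(), item.getValue());
        }
        // VMs seen in earlier rounds but absent now are marked as missing.
        for (Long vmId : new ArrayList<>(known.keySet())) {
            if (!report.containsKey(vmId)) {
                update(vmId, PowerState.PowerReportMissing);
            }
        }
    }

    PowerState stateOf(long vmId) {
        Entry entry = known.get(vmId);
        return entry == null ? null : entry.state;
    }
}
```

With this sketch, five identical PowerOn reports for one VM produce three updates, and a following report that omits the VM records it as PowerReportMissing.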
VM Power state-sync – resource part
– Retirement of the resource agent VM state cache
  The new management server sync logic no longer needs the resource to maintain such a cache; tracking of state transitions and delta change detection are no longer needed either.
– Example of a resource agent composing the VM power-state report (pseudo code)
    private HashMap getHostVmStateReport() {
        foreach (vm on the host) {
            gather VM power state
            put it in the report
        }
        return the report
    }
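A runnable version of the pseudo code above might look like this. It is a sketch only: `HypervisorVm`, the native state strings, and the mapping in `toPowerState` are hypothetical stand-ins for whatever a real hypervisor API returns. The point it illustrates is that the agent is stateless and simply takes a snapshot on every call.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; names are not taken from CloudStack.
class HostVmStateReportSketch {
    enum PowerState { PowerOn, PowerOff, PowerUnknown }

    // Hypothetical view of a VM as returned by a hypervisor API.
    static final class HypervisorVm {
        final String name;
        final String nativeState;

        HypervisorVm(String name, String nativeState) {
            this.name = name;
            this.nativeState = nativeState;
        }
    }

    // Translates the hypervisor-native state into a raw power state.
    // Mapping everything else (for example "suspended") to PowerUnknown is an assumption.
    static PowerState toPowerState(String nativeState) {
        switch (nativeState) {
            case "poweredOn":
                return PowerState.PowerOn;
            case "poweredOff":
                return PowerState.PowerOff;
            default:
                return PowerState.PowerUnknown;
        }
    }

    // Stateless: no cache and no delta computation, just a snapshot of every VM on the host.
    static Map<String, PowerState> getHostVmStateReport(List<HypervisorVm> vmsOnHost) {
        Map<String, PowerState> report = new HashMap<>();
        for (HypervisorVm vm : vmsOnHost) {
            report.put(vm.name, toPowerState(vm.nativeState));
        }
        return report;
    }

    public static void main(String[] args) {
        List<HypervisorVm> vms = Arrays.asList(
            new HypervisorVm("vm-a", "poweredOn"),
            new HypervisorVm("vm-b", "poweredOff"));
        System.out.println(getHostVmStateReport(vms));
    }
}
```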
In-band change processing
– The job performing an in-band change is responsible for orchestrating the process
– The target state transition is monitored through the message bus, and completion is determined through the Predicate interface
– Example orchestration flow (pseudo code)

    // submit a worker job to carry out the VM operation
    _jobMgr.waitAndCheck(
        new String[] { TopicConstants.VM_POWER_STATE, TopicConstants.JOB_STATE },
        3000L, 60000L,
        new Predicate() {
            @Override
            public boolean checkCondition() {
                VMInstanceVO instance = _vmDao.findById(vm.getId());
                if (instance.getPowerState() == VirtualMachine.PowerState.PowerOff)
                    return true;
                return false;
            }
        });

– Predicate interface

    public interface Predicate {
        boolean checkCondition();
    }
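To make the wait-and-check idea concrete, here is a minimal sketch of how such a loop could be built. It is not the CloudStack implementation: `WaitAndCheckSketch`, `wakeup()`, and the parameter names `checkIntervalMs` and `timeoutMs` are assumptions. The waiting job re-evaluates the predicate whenever a notification wakes it up, or when the check interval elapses, and gives up at the timeout.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration of a wait-and-check loop driven by wakeup signals.
class WaitAndCheckSketch {
    interface Predicate {
        boolean checkCondition();
    }

    private final Object signal = new Object();

    // Would be called by a message-bus subscriber when a watched topic fires.
    void wakeup() {
        synchronized (signal) {
            signal.notifyAll();
        }
    }

    // Returns true once the predicate holds, false on timeout or interruption.
    // checkIntervalMs must be positive; it also bounds the delay of a missed wakeup.
    boolean waitAndCheck(long checkIntervalMs, long timeoutMs, Predicate predicate) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (predicate.checkCondition()) {
                return true;
            }
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false;
            }
            synchronized (signal) {
                try {
                    signal.wait(Math.min(checkIntervalMs, remainingMs));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
    }

    public static void main(String[] args) {
        WaitAndCheckSketch waiter = new WaitAndCheckSketch();
        AtomicBoolean poweredOff = new AtomicBoolean(false);
        // Simulates a power-state report arriving a little later.
        new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            poweredOff.set(true);
            waiter.wakeup();
        }).start();
        System.out.println(waiter.waitAndCheck(3000L, 60000L, poweredOff::get));
    }
}
```

Because the predicate is always re-checked after waking, a spurious or unrelated wakeup is harmless; the job simply goes back to waiting.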
Out-of-band change processing

    // The VM_POWER_STATE topic on the message bus triggers this handler
    @MessageHandler(topic = Topics.VM_POWER_STATE)
    private void HandlePowerStateReport(String subject, String senderAddress, Object args) {
        ….
        // Determination of out-of-band changes
        if (pendingWorkJobs.size() == 0 && !_haMgr.hasPendingHaWork(vmId)) {
            // there is no pending operation job
            VMInstanceVO vm = _vmDao.findById(vmId);
            if (vm != null) {
                // Dispatch of handling for out-of-band changes
                switch (vm.getPowerState()) {
                case PowerOn:
                    handlePowerOnReportWithNoPendingJobsOnVM(vm);
                    break;
                case PowerOff:
                case PowerReportMissing:
                    handlePowerOffReportWithNoPendingJobsOnVM(vm);
                    break;
                default:
                    assert (false);
                    break;
                }
            }
        } else {
            // Consider this an in-band change and let it be handled within the job context.
            // Reset tracking so that we won't lose triggering events.
            _vmDao.resetVmPowerStateTracking(vmId);
        }
    }
Supporting facilities
– Message bus
  • In-memory
  • Hierarchical subscriber management
  • Annotation-based dispatching (@MessageHandler)
– Job facility
  All system activities are now tied to associated jobs; the logging system has also been updated to help diagnose problems by tracking each top-level job
  • API job: gives a running context to an asynchronous API request
  • Work job: carries the real orchestration process; its run is serialized on the target VM
  • Pseudo job: gives a background thread a job context
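The message bus features listed above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class `InMemoryBusSketch`, its nested `@MessageHandler` annotation, and the reading of "hierarchical" as dot-separated subjects where a subscriber to `vm` also receives `vm.powerstate` messages. The handler signature mirrors the one shown in the out-of-band slide.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical illustration; not CloudStack's message bus.
class InMemoryBusSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface MessageHandler {
        String topic();
    }

    interface Subscriber {
        void onMessage(String subject, String senderAddress, Object args);
    }

    private final Map<String, List<Subscriber>> subscribers = new ConcurrentHashMap<>();

    void subscribe(String topic, Subscriber subscriber) {
        subscribers.computeIfAbsent(topic, key -> new CopyOnWriteArrayList<>()).add(subscriber);
    }

    // Registers every method of the target that carries @MessageHandler.
    void register(Object target) {
        for (Method method : target.getClass().getDeclaredMethods()) {
            MessageHandler handler = method.getAnnotation(MessageHandler.class);
            if (handler == null) {
                continue;
            }
            method.setAccessible(true);
            subscribe(handler.topic(), (subject, sender, args) -> {
                try {
                    method.invoke(target, subject, sender, args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            });
        }
    }

    // Delivers to the exact subject and to each ancestor: "vm.powerstate", then "vm".
    void publish(String senderAddress, String subject, Object args) {
        String topic = subject;
        while (true) {
            List<Subscriber> list = subscribers.getOrDefault(topic, Collections.emptyList());
            for (Subscriber subscriber : list) {
                subscriber.onMessage(subject, senderAddress, args);
            }
            int dot = topic.lastIndexOf('.');
            if (dot < 0) {
                break;
            }
            topic = topic.substring(0, dot);
        }
    }

    // Example component whose handler is discovered through the annotation.
    static class VmWatcher {
        final List<String> seen = new ArrayList<>();

        @MessageHandler(topic = "vm.powerstate")
        private void handlePowerStateReport(String subject, String senderAddress, Object args) {
            seen.add(subject + ":" + args);
        }
    }
}
```

Delivery in this sketch is synchronous on the publisher's thread, which keeps the example short; a bus with its own driving thread would queue messages instead.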
Journey of getting there
– Dual state reports
  • The model is simple, but it is a model shift at a very low level
  • The changes touch all hypervisor resources, all major orchestration flows, HA, etc., which makes them very hard to unit test
  • Dual state reports allow the old model and the new model to overlap
– Compromises
  • Synchronous in-band process for state transitions
  • Thread starvation
  • Result/exception propagation across boundaries
Next Step
– In-memory power-state table
  • Power state is stored for change detection and as a temporary store that in-band jobs can query; persistent storage is not mandatory
  • Using a memory-based DB table would help scalability and performance
– General pluggable model to sync other types of information
  • An immediate need could be sync support for storage DRS