CloudStack Tuning - Linux Foundation Events

CloudStack Tuning

whoami
•  Name: Sudhansu Sahu
•  Current role: SDE at Citrix R&D India
•  5 years of experience in the cloud space, since the cloud.com days
•  At Citrix I was a developer on the CPBM product, then worked as a solution developer in worldwide cloud services
•  Associated with Apache CloudStack for 6 months

Goal
•  To understand the various configurations (OS/MySQL/Tomcat/Java/Management Server) that have a direct impact on CloudStack performance.
•  What are the right values for these configurations?
•  What is not configurable?

Overview
•  OS configurations
•  Management Server DB Configuration
•  Management Server Direct Agent Configurations
•  Management Server Indirect Agent Configurations
•  Secondary Storage Scalability Configuration and tuning
•  Management Server Jobs and their frequency

OS Configurations

“When I try to create 50 VMs for 50 accounts using cloudmonkey async requests my installation after some operations end up with stuck management-server - seems like it's working (logs are filling with new rows), but at the same time it's doing nothing - no VM creations, UI acting weird, API returns internal server error, routers stuck in ‘starting’ state, etc. Also I can't restart it in ‘normal’ way only with killing the java process. When I add delay for a minute between VMs deployment CS is doing much better and all routers + VMs are created successfully.”

"java.lang.OutOfMemoryError: unable to create new native thread"

To fix this, add the following lines to /etc/security/limits.conf:

    cloud hard nofile 4096
    cloud soft nofile 4096

and the following lines to /etc/security/limits.d/90-nproc.conf:

    cloud soft nproc 8192
    cloud hard nproc 8192
    root soft nproc -1
    root hard nproc -1
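Whether the new limits are actually in effect can be checked from a process started as the affected user. The following is a minimal sketch using Python's standard `resource` module on Linux; the helper names `meets` and `check_limits` are illustrative and not part of CloudStack, and the thresholds are simply the values from the slide.

```python
import resource

# Thresholds taken from the limits.conf entries above.
WANT_NOFILE = 4096
WANT_NPROC = 8192

def meets(limit, wanted):
    """True if a soft limit is unlimited or at least the wanted value."""
    return limit == resource.RLIM_INFINITY or limit >= wanted

def check_limits():
    """Return {name: (soft_limit, ok)} for the current process."""
    nofile_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    nproc_soft, _ = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "nofile": (nofile_soft, meets(nofile_soft, WANT_NOFILE)),
        "nproc": (nproc_soft, meets(nproc_soft, WANT_NPROC)),
    }

if __name__ == "__main__":
    for name, (soft, ok) in check_limits().items():
        print(name, soft, "OK" if ok else "too low")
```

The result reflects the limits of whichever user runs the script, so it is only meaningful when run as the same user as the management server (assumed here to be `cloud`, as in the entries above).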

Management Server DB Configuration

“A cloudstack Simulator based environment with 4K hosts, 4K accounts, 12K VMs, 8K router VMs, 2 management server nodes, 8G heap size started showing hosts in alert and disconnected state after 20 minutes of operation.”

“Caused by: org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object”

The root cause was found to be too few active database connections. The setting that determines the maximum number of active connections is 'db.cloud.maxActive' in the db.properties file. Its default value is 250. To address this issue we set it to 1000.

MySQL default configuration ‘max_connections’ is 214. The total of the ‘maxActive’ settings across all management servers should not exceed MySQL's ‘max_connections’ value. If you have 2 management server nodes with ‘maxActive’ set to 250, then ‘max_connections’ should be at least 500.

If the maximum number of open files allowed is too small (the default is 1024), the my.cnf change (max_connections=500) will be ignored.

Fix:

/etc/security/limits.conf

    mysql hard nofile 4096
    mysql soft nofile 4096

my.cnf

    [mysqld]
    open_files_limit = 4096
    max_connections = 500
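The sizing rule above is plain arithmetic: add up ‘maxActive’ over all management server nodes and compare the sum with ‘max_connections’. A minimal sketch follows; the function names are illustrative and not part of CloudStack or MySQL.

```python
def required_max_connections(max_active_per_node):
    """Minimum MySQL max_connections for the given per-node maxActive values."""
    return sum(max_active_per_node)

def pool_fits(max_active_per_node, mysql_max_connections):
    """True if the combined connection pools fit within max_connections."""
    return required_max_connections(max_active_per_node) <= mysql_max_connections

if __name__ == "__main__":
    nodes = [250, 250]  # two management servers with the default maxActive
    print(required_max_connections(nodes))  # 500
    print(pool_fits(nodes, 214))            # False
    print(pool_fits(nodes, 500))            # True
```

With two nodes raised to a ‘maxActive’ of 1000 each, the same calculation gives a minimum of 2000. The sketch counts only management server connections, so any other MySQL clients would need headroom on top.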

Management Server Direct Agent Configurations

Direct Agent Configurations
•  direct.agent.load.size
•  direct.agent.scan.interval
•  direct.agent.pool.size
•  direct.agent.thread.cap

“Host reconnect taking too long. Takes 30 min to reconnect 300 hosts”

direct.agent.load.size
Default: 16
Purpose: Used for handling connect/disconnect for direct agents. It applies when a new host (managed by direct agents) is added or removed, and also when the MS is restarted.

direct.agent.scan.interval
Default: 90 (seconds)
Purpose: Interval between scans to load agents.

Every 90 sec, the agent scan task looks for 16 unmanaged hosts and tries to reconnect them. To make this faster we have 2 options:
•  Decrease the scan interval
•  Increase the batch size
90 sec is a decent scan interval, so it is better to increase the batch size.
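The scan behaviour described above can be turned into a rough estimate of total reconnect time. This is a simplified model, not CloudStack code: it assumes every scan picks up a full batch and that the time to connect each host is negligible compared with the scan interval. The function name is illustrative.

```python
import math

def reconnect_seconds(hosts, load_size=16, scan_interval=90):
    """Estimate seconds needed to reconnect all hosts, one batch per scan."""
    scans = math.ceil(hosts / load_size)
    return scans * scan_interval

if __name__ == "__main__":
    # Defaults: 300 hosts need 19 scans of 16 hosts, 90 seconds apart.
    print(reconnect_seconds(300) / 60)                # 28.5 minutes
    # A larger batch of 64 hosts needs only 5 scans.
    print(reconnect_seconds(300, load_size=64) / 60)  # 7.5 minutes
```

Under these assumptions the defaults give about 28.5 minutes for 300 hosts, which is consistent with the reported 30 minutes, and raising the batch size shortens the total without changing the scan interval.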

‘direct.agent.load.size’ should be increased to enable faster reconnection of hosts on restarts. Set this to a higher value as the number of hosts increases.

direct.agent.load.size    Number of Hosts
16                        Default
