MySQL Performance monitoring using Statsd and Graphite - Percona

MySQL Performance monitoring using Statsd and Graphite
Art van Scheppingen, Head of Database Engineering

Overview
1. Who are we?
2. What monitoring tools do we use?
3. What are StatsD, Collectd and Graphite?
4. How MySQL logs to StatsD
5. Graphing examples
6. Challenges
7. Questions?

Who are we?
Who is Spil Games?

Facts
• Company founded in 2001
• 350+ employees worldwide
• 180M+ unique visitors per month
• Over 60M registered users
• 45 portals in 19 languages
• Casual games
• Social games
• Real-time multiplayer games
• Mobile games
• 35+ MySQL clusters
• 60k queries per second (3.5 billion qpd)

Geographic Reach
180 Million Monthly Active Users(*)

Source: (*) Google Analytics, August 2012

Brands
Girls, Teens and Family

spielen.com, juegos.com, gamesgames.com, games.co.uk

Monitoring
We use(d) many, many, many monitoring tools so far!

Existing monitoring systems we use(d)
• Opsview/Nagios (mainly availability)
• Cacti (using Baron Schwartz/Percona templates)
• MONyog
• Good ol' RRD

Challenges
• Problems with existing systems
  • Stats gathering through polling
  • Data gets averaged out
  • (Host) checks are run serially
  • A slowdown in a run means no/less data
  • Setting up an SSH connection is slow
  • Low granularity (1 to 5 minutes)
  • Hardly scalable
  • Difficult to correlate metrics

Difficult to add a new metric

host065
bash-3.2# netstat -s | grep "listen queue"
    26 times the listen queue of a socket overflowed

host066
bash-3.2# netstat -s | grep "listen queue"
    33 times the listen queue of a socket overflowed

Statsd + Collectd + Graphite
What are they?

What is Collectd?
• Unix daemon that gathers system statistics
• Over 90 (input/output) plugins
• Plugin to send metrics to Graphite/Carbon
• Very useful for system metrics

[Diagram: Collectd. Data-gathering plugins (CPU, disk, load, ...) feed Collectd, which sends to Carbon over TCP at a 30 second interval.]

What is StatsD?
• Front-end proxy for Graphite/Carbon (by Etsy)
• NodeJS daemon (also available in other languages)
• Receives UDP (on localhost)
• Buffers metrics locally
• Periodically flushes data to Graphite/Carbon (TCP)
• Client libraries available in just about any language
• Send any metric you like!
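The buffer-then-flush behaviour described above can be sketched in a few lines of Python. This is a simplified illustration, not how StatsD itself is implemented: the class name `StatsBuffer` is made up, only counters and gauges are handled, sample rates are ignored, and a real StatsD also adds its own prefixes and rate calculations. The flush output uses Carbon's plaintext format (`path value timestamp`).

```python
from collections import defaultdict


class StatsBuffer:
    """Hypothetical in-memory buffer: sum counters, keep the last gauge value."""

    def __init__(self):
        self.counters = defaultdict(float)
        self.gauges = {}

    def receive(self, line):
        """Accept one StatsD line such as 'some.metric:1|c'."""
        name, rest = line.split(":", 1)
        value, kind = rest.split("|")[:2]
        if kind == "c":
            self.counters[name] += float(value)
        elif kind == "g":
            self.gauges[name] = float(value)

    def flush(self, timestamp):
        """Return Carbon plaintext lines and reset the counters."""
        lines = [f"{n} {v} {timestamp}" for n, v in sorted(self.counters.items())]
        lines += [f"{n} {v} {timestamp}" for n, v in sorted(self.gauges.items())]
        self.counters.clear()
        return lines


buffer = StatsBuffer()
for incoming in ("some.metric:1|c", "some.metric:1|c", "some.gauge:42|g"):
    buffer.receive(incoming)
print(buffer.flush(1366550446))
```

Because the application only fires UDP packets at the local buffer, it never waits on Graphite; only the periodic flush talks to Carbon.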

StatsD functions
• update_stats
• increment/decrement
• set
• gauge
• timers
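On the wire, each of these functions boils down to a one-line text message of the form `name:value|type`. A minimal Python sketch, assuming the type codes of Etsy's StatsD (`c` counter, `g` gauge, `ms` timer, `s` set); the helper `statsd_line` and the metric names are invented for illustration:

```python
# Type codes of the StatsD line format.
TYPE_CODES = {"counter": "c", "gauge": "g", "timer": "ms", "set": "s"}


def statsd_line(name, value, kind, sample_rate=None):
    """Format one metric as a StatsD line, e.g. 'some.metric:1|c'."""
    line = f"{name}:{value}|{TYPE_CODES[kind]}"
    if sample_rate is not None:
        # Sampled metrics carry the rate so the server can scale them back up.
        line += f"|@{sample_rate}"
    return line


print(statsd_line("some.metric", 1, "counter"))        # increment
print(statsd_line("some.metric", -1, "counter"))       # decrement
print(statsd_line("hostname.mysql.threads_running", 12, "gauge"))
print(statsd_line("app.request.time", 320, "timer"))   # milliseconds
print(statsd_line("app.unique_users", "user42", "set"))
```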

StatsD Bash examples

echo "some.metric:1|c" | nc -w 1 -u graphite.host 8125

echo "some.metric:1|c" > /dev/udp/localhost/8125

bash-3.2# netstat -s | grep "listen"
    26 times the listen queue of a socket overflowed

netstat -s | grep "listen" | awk '{print "hostname.listen.queue.overflowed:"$1"|c"}' > /dev/udp/localhost/8125

hostname.listen.queue.overflowed:26|c

echo "show global status" | mysql -u root | awk '{print "hostname.mysql.status."$1":"$2"|c"}'
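The same fire-and-forget send can be done from application code. A minimal Python sketch using only the standard library; the function name `send_metric` is hypothetical, and the default host and port are the ones used in the Bash examples:

```python
import socket


def send_metric(line, host="localhost", port=8125):
    """Send one StatsD line over UDP; no reply is expected or waited for."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Equivalent of: echo "some.metric:1|c" > /dev/udp/localhost/8125
    send_metric("some.metric:1|c")
```

Since UDP is connectionless, the call returns immediately even when no StatsD daemon is listening, which keeps the instrumented code from blocking.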

[Diagram: StatsD. Application-level metrics (# of logins, cache hit/miss) and MySQL_Statsd (STATUS, INNODB STATUS) send UDP to StatsD on localhost:8125; StatsD sends to Carbon over TCP at a 2 second interval.]

What is Graphite?
• Highly scalable real-time graphing system
• Collects numeric time-series
• Backend daemon Carbon
  • Carbon-cache: receives data
  • Carbon-aggregator: aggregates data
  • Carbon-relay: replication and sharding
• RRD or Whisper database

Graphite's capabilities
• Each metric is in its own bucket
• Periods make folders
  • prod.syseng.mmm..admin_offline
• Metric types
  • Counters
  • Gauge
• Retention can be set using a regex
  [mysql]
  pattern = ^prod\.syseng\.mysql\..*$
  retentions = 2s:1d,1m:3d,5m:7d,1h:5y
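To get a feel for what such a retention string costs, the number of datapoints per archive can be worked out directly. A rough Python sketch; it assumes Whisper's convention that a year is 365 days and that each stored datapoint takes 12 bytes, and it ignores the small file header. The function names are made up for this example.

```python
# Seconds per unit (a year counts as 365 days, as assumed above).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

POINT_SIZE = 12  # assumed bytes per datapoint: 4-byte timestamp + 8-byte value


def to_seconds(spec):
    """Convert a spec such as '2s' or '5y' into seconds."""
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]


def archive_points(retentions):
    """Return (precision, points) for every 'precision:duration' archive."""
    archives = []
    for archive in retentions.split(","):
        precision, duration = archive.strip().split(":")
        archives.append((precision, to_seconds(duration) // to_seconds(precision)))
    return archives


archives = archive_points("2s:1d,1m:3d,5m:7d,1h:5y")
for precision, points in archives:
    print(f"{precision:>3}: {points} points")
total = sum(points for _, points in archives)
print(f"total: {total} points, about {total * POINT_SIZE / 1024:.0f} KiB per metric")
```

With the retentions above this comes to 93,336 datapoints per metric, most of them in the 2 second archive and the 5 year archive, so both high granularity and long history have a visible storage price once multiplied by the number of metrics.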

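[Diagram: Our Graphite environment. Clients requesting graphs reach the Graphite rendering cluster through a load balancer (port 443). Server-1, Server-2 ... Server-n send metrics through a load balancer (port 2003) to the Carbon relay, which feeds the DEV, SYSENG, SERVICES1 and SERVICES2 storage clusters and Skyline. Other labels in the diagram: 3 nodes, 2 nodes, 1 node, 8 nodes, 24h retention.]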

[Diagram: Our Graphite cluster(s). Same layout as the previous diagram (clients, load balancers on ports 443 and 2003, Graphite rendering cluster, Carbon relay, Graphite storage clusters), annotated with throughput: 12 graphs/s, 700 get/s, 3M m(etrics)/s(econd), 250K m/s, 1M m/s (SYSENG), 1.5M m/s (SERVICES1), 500K m/s (SERVICES2).]

MySQL + StatsD
How do we use them?

Why use StatsD over Collectd?
• MySQL plugin for Collectd
  • Sends SHOW STATUS
  • No INNODB STATUS
  • Plugin not flexible
• DBI plugin for Collectd
  • Metrics based on columns
• Different granularity needed
• Separate daemon (with persistent connection)
• StatsD is easy as ABC

MySQL StatsD daemon
• Written in Python
• Rewritten and open sourced during a hackday
• Gathers data every 0.5 seconds
• Sends to StatsD (localhost) after every run
• Easy configuration
• Persistent connection
• Baron Schwartz' InnoDB status parser (Cacti poller)
• Other interesting metrics and counters
  • Information Schema
  • Performance Schema
  • MariaDB specific
  • Galera specific
• If you can query it, you can use it as a metric!

[Diagram: MySQL StatsD overview. Inside the MySQL StatsD daemon, a MySQL thread runs SHOW GLOBAL VARIABLES, SHOW GLOBAL STATUS and SHOW ENGINE INNODB STATUS against MySQL, and a StatsD thread sends the results to StatsD.]

Example configuration

[daemon]
logfile = /var/log/mysql_statsd/daemon.log
pidfile = /var/run/mysql_statsd.pid

[statsd]
host = localhost
port = 8125
prefix = prd.mysql
include_hostname = true

[mysql]
host = localhost
username = mysqlstatsd
password = ub3rs3cr3tp@ss!
stats_types = status,variables,innodb,commit
query_variables = SHOW GLOBAL VARIABLES
interval_variables = 10000
query_status = SHOW GLOBAL STATUS
interval_status = 500
query_innodb = SHOW ENGINE INNODB STATUS
interval_innodb = 10000
query_commit = COMMIT
interval_commit = 5000
sleep_interval = 500

[metrics]
variables.max_connections = g
status.max_used_connections = g
status.connections = c
innodb.spin_waits = c
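To illustrate how a `[metrics]` section like the one above can drive what gets sent, here is a small Python sketch. It is not the daemon's actual code: the function `build_lines` is hypothetical, the sample rows are invented, and sending counters as the difference between two polls is an assumption of this sketch.

```python
import configparser

# Trimmed copy of the example configuration.
CONFIG = """
[statsd]
host = localhost
port = 8125
prefix = prd.mysql

[metrics]
variables.max_connections = g
status.max_used_connections = g
status.connections = c
innodb.spin_waits = c
"""


def build_lines(config, stats_type, rows, previous=None):
    """Turn (name, value) rows into StatsD lines for the configured metrics.

    Gauges are sent as-is. Counters are sent as the difference with the
    previous sample (an assumption of this sketch), so the first sample of
    a counter produces no line.
    """
    previous = previous or {}
    prefix = config["statsd"]["prefix"]
    lines = []
    for name, value in rows:
        key = f"{stats_type}.{name.lower()}"
        metric_type = config["metrics"].get(key)
        if metric_type is None:
            continue  # not listed under [metrics], so not sent
        if metric_type == "c":
            if key not in previous:
                continue
            value = int(value) - int(previous[key])
        lines.append(f"{prefix}.{key}:{value}|{metric_type}")
    return lines


config = configparser.ConfigParser()
config.read_string(CONFIG)

# Invented SHOW GLOBAL STATUS rows from two consecutive polls.
first = [("Connections", "1000"), ("Max_used_connections", "45"), ("Uptime", "99")]
second = [("Connections", "1012"), ("Max_used_connections", "47"), ("Uptime", "100")]

seen = {f"status.{name.lower()}": value for name, value in first}
print(build_lines(config, "status", second, previous=seen))
```

Anything not listed under `[metrics]` (here `Uptime`) is silently skipped, which keeps the set of graphed metrics deliberate rather than "everything the server reports".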

MySQL Multi Master patch
• Perl (Net::Statsd)
• Sends any status change to StatsD (localhost)
• Non-blocking (thanks to UDP)
• Draw as infinite in Graphite

Other metrics
• Deployments
• User initiated actions
  • Logins
  • High scores
  • Comments / ratings
  • Images uploaded
  • Payments
• Application metrics
  • Error counts
  • Cache statistics (cache hit/miss)
  • Request timers
  • Image sizes

Start graphing!
Now it starts to get interesting!

What is important for you?
• Identify your KPIs
• Don't graph everything
  • More graphs == less overview
• Combine metrics
• Stack clusters

Correlate!
• Include other metrics into your graphs
  • Deployments
  • Failover(s)
• Combine application metrics with your database
• Other influences
  • Launch of a new game
  • Apple keynotes

Graphing
• Graphite Graphing Engine
• DIY
• Giraffe
• Readily available dashboards/tools
  • Graph Explorer (vimeo)
  • Team Dashboard
  • Skyline (Etsy)
  • Dashing (Shopify)

[Screenshots, one slide each: DIY, Giraffe, Graph Explorer, Team Dashboard, Skyline, Dashing]

Graphite Graphing Engine
• URI based rendering API
• Support for wildcards
  • stats.prod.syseng.mysql.*.status.com_select
  • sumSeries(stats.prod.syseng.mysql.*.status.com_select)
  • aliasByNode(stats.prod.syseng.mysql.*.status.com_select, 4)
• Many functions
  • Nth percentile
  • Holt-Winters Forecast
  • Timeshift
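Because the rendering API is URI based, a graph is just a URL with one `target` parameter per expression. A small Python sketch that builds such a URL with the standard library; the helper `render_url` is hypothetical, the host name is the placeholder used in this deck's example URL, and the targets are the wildcard examples above:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder host name; replace it with your own Graphite web host.
BASE_URL = "https://graphitehost/render/"


def render_url(targets, **params):
    """Build a render API URL with one 'target' parameter per expression."""
    query = [("target", target) for target in targets]
    query += sorted(params.items())
    return BASE_URL + "?" + urlencode(query)


url = render_url(
    [
        "sumSeries(stats.prod.syseng.mysql.*.status.com_select)",
        "aliasByNode(stats.prod.syseng.mysql.*.status.com_select, 4)",
    ],
    width=722,
    height=357,
)
print(url)

# Decoding the query string again shows the original expressions.
print(parse_qs(urlsplit(url).query)["target"])
```

Letting `urlencode` do the escaping avoids hand-writing sequences such as `%28` and `%2C`, which is where long render URLs tend to go wrong.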

[Screenshot: Graphite web interface]

Graphite Example URL
https://graphitehost/render/?width=722&height=357&_salt=1366550446.553&rightDashed=1&target=alias%28sumSeries%28stats.prod.services.profilar.request.total.count.*%29%2C%22Number%20of%20profile%20requests%22%29&target=alias%28secondYAxis%28sumSeries%28stats_counts.prod.syseng.mysql..status.questions%2C%20stats_counts.prod.syseng.mysql.