Eugene Ciurana
[email protected] - pr3d4t0r ##java, irc.freenode.net
High-Availability, Fault Tolerance, and Resource Oriented Computing This presentation is available from: http://ciurana.eu/GeeCON-2010 Letʼs move the Java world!
About Eugene...
• 15+ years building mission-critical, high-availability systems
• 14+ years of Java work
• Open source evangelist
  • Official adoption of open source/Linux at Walmart worldwide
• State of the art main line of business at the largest companies in the world - not a web guy!
What You’ll Learn...
• Decoupled, event-driven, resource-oriented systems are more flexible
• Avoid tight, point-to-point integration
• Enhance JVM-based apps with better domain-specific languages
• How to move away from monolithic app servers and architectures
• How to implement event-driven systems by leveraging existing infrastructure and SOA investment
• Treat computational resources as addressable entities
• Balance open source vs. commercial products
Very Important!
Please Ask Questions! (don’t be shy)
What is Scalability?
• Scalability is the property of a system to:
  • handle bigger amounts of work; or
  • be easily expanded in response to increased demand
  • network, processing, database, file resources
• Types of scalability
  • Horizontal (out): add more nodes with identical functionality as existing ones and redistribute the load
  • Vertical (up): expand by adding more cores, main memory, storage, or network interfaces
Horizontal Scalability
[Diagram: a load balancer in front of three nodes scales out to a load balancer in front of four nodes.]
Clustering!
Vertical Scalability
[Diagram: a dual-core, single-processor host with 16 MB RAM running virtual nodes 0-2 scales up to a dual-core, dual-processor host with 32 MB RAM running virtual nodes 0-3.]
What is Availability?
• How well a system provides useful resources over a set period of time
• High availability guarantees an absolute degree of functional continuity within a time window
• Expressed as a relationship between uptime and unplanned downtime
• A = 100 - (100*D/U); D, U expressed in minutes
• Beware: uptime != available
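The formula above is easy to check in code. The following is a minimal sketch, assuming U is a one-year measurement window (525,600 minutes) and D is the unplanned downtime within it; the class and method names are hypothetical.

```java
/** Sketch of the slide's formula A = 100 - (100 * D / U), with D and U in minutes. */
final class Availability {
    // Assumption: the measurement window U is one non-leap year.
    static final double MINUTES_PER_YEAR = 365 * 24 * 60;

    /** Availability percentage for a given downtime D within a window U. */
    static double percent(double downtimeMinutes, double windowMinutes) {
        return 100.0 - (100.0 * downtimeMinutes / windowMinutes);
    }

    /** Inverse of the formula: yearly downtime budget for a target availability. */
    static double downtimeBudget(double availabilityPercent) {
        return MINUTES_PER_YEAR * (100.0 - availabilityPercent) / 100.0;
    }

    public static void main(String[] args) {
        // Print the downtime budget for one nine through six nines.
        for (double a : new double[] {90, 99, 99.9, 99.99, 99.999, 99.9999}) {
            System.out.printf("%.4f%% -> %.2f minutes/year%n", a, downtimeBudget(a));
        }
    }
}
```

Running `main` gives the yearly downtime budget for each availability target.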
The Nines Game

Availability %   Downtime (minutes)   Downtime/year   Vendor jargon
90               52560.00             36.5 days       one nine
99               5256.00              3.7 days        two nines
99.9             525.60               8.8 hours       three nines
99.99            52.56                53 minutes      four nines
99.999           5.26                 5.3 minutes     five nines
99.9999          0.53                 32 seconds      six nines
Service Level Agreements
• SLAs are negotiated terms that outline the obligations of the two parties delivering and using a system
• System type - not all systems require the same SLA
• Levels of availability
  • Minimum
  • Target
• Uptime
  • Network
  • Power
  • Maintenance windows
• Serviceability
• Performance and metrics
• Billing
SLAs help determine if you scale up or out
Load Balancers
• They work by spreading requests among two or more resources
• Implemented in hardware or in software
  • Multiple machines
  • Multiple processes
  • Multiple threads
• Resources appear as a single device to consumers
• Can be stateless (web services), or stateful (applications that require session management)
• Algorithms determine the distribution
  • 1/n == all systems equally likely to service
  • Special requests (e.g. music store) some servers get hit more than others
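The 1/n distribution can be sketched as a round-robin picker. This is an illustration only, not how any particular hardware or software balancer is implemented; the class name `RoundRobinBalancer` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Stateless round-robin: every node is equally likely (1/n) to service a request. */
final class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodes = new ArrayList<>(nodes); // defensive copy
    }

    /** Returns the node that should handle the next request. */
    String pick() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(
                Arrays.asList("192.168.202.55", "192.168.202.66", "192.168.202.67"));
        for (int n = 1; n <= 4; n++) {
            System.out.println("R" + n + " -> " + lb.pick());
        }
    }
}
```

Because the picker keeps no per-consumer state, it only suits stateless services; stateful applications need the persistent variant described on the following slides.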
Load Balancers
[Diagram: a consumer sends requests Rn (R = request, n = sequence number) to the load balancer at 74.0.125.28, which spreads R1, R2, and R3 across nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69.]
Persistent Load Balancers
[Diagram: three consumers reach the sticky load balancer at 74.0.125.28, which fronts nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69.]
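One way to make a balancer sticky is an affinity table keyed by session id, so that a consumer keeps landing on the node that holds its state. The sketch below assumes that approach; production balancers often use cookies or source-address hashing instead, and the class name `StickyBalancer` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Sticky (persistent) balancing: a session is pinned to the first node it was given. */
final class StickyBalancer {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();
    private final Map<String, String> affinity = new ConcurrentHashMap<>();

    StickyBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodes = new ArrayList<>(nodes);
    }

    /** New sessions are assigned round-robin; known sessions return to their node. */
    String route(String sessionId) {
        return affinity.computeIfAbsent(sessionId,
                id -> nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size())));
    }

    public static void main(String[] args) {
        StickyBalancer lb = new StickyBalancer(
                Arrays.asList("192.168.202.55", "192.168.202.66"));
        System.out.println(lb.route("alice")); // assigned a node
        System.out.println(lb.route("bob"));   // assigned the next node
        System.out.println(lb.route("alice")); // same node as alice's first request
    }
}
```

The affinity table is itself state that must survive a balancer restart, which is one reason stateful setups cost more to operate.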
Load Balancing and Databases
[Diagram: a consumer reaches the load balancer at 74.0.125.28, which fronts nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69; all nodes share one session data store.]
Caching Strategies
• Stateful load balancing requires data sharing
• Caching distributes popular, shared read-only data
• Think of a cache as a giant hash map
• If the data isn’t in the cache, fetch it from the database
• Write policies:
  • write-through: write to the cache AND database
  • write-behind: cache is marked “dirty” and updated only if a dirty datum is requested
  • no-write allocation: only read requests are cached; assumes data never changes
Caching Usage Pattern
• Application caching
  • Little or no programmer participation (e.g. Terracotta)
  • Explicit API calls (memcached, Coherence, etc.)
• Web caching - stores full documents, or fragments (‘particles’), on the server or client, and is invisible to the client
  • Web accelerators - distribute the load (e.g. CDN like S3, Akamai, etc.)
  • Proxy caches - distribute requests to same resources and may provide filtering/query (e.g. Squid, Apache, ISA servers)
Caching Usage Pattern
[Flowchart: Begin. Query or update? On a query, fetch the datum from the cache; if the datum is None, query it from the database and add it to the cache; then use the datum in the app. On an update, update the datum in the database, then either invalidate the cache or add or update the datum in the cache. End.]
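The query and update paths of this usage pattern can be sketched with two maps. This is a minimal illustration, assuming the "invalidate cache" branch on update; the in-memory `database` map stands in for a real data store, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Explicit application caching: check the cache first, fall back to the database. */
final class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> database; // stand-in for a real database
    int databaseReads = 0;                      // exposed only to show cache hits

    CacheAside(Map<String, String> database) {
        this.database = database;
    }

    /** Query path: fetch from cache; on a miss, query the database and cache the datum. */
    String query(String key) {
        String datum = cache.get(key);
        if (datum == null) {
            datum = database.get(key);
            databaseReads++;
            if (datum != null) {
                cache.put(key, datum);
            }
        }
        return datum;
    }

    /** Update path: write to the database, then invalidate the cached copy. */
    void update(String key, String value) {
        database.put(key, value);
        cache.remove(key);
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("sku-1", "blue");
        CacheAside store = new CacheAside(db);
        store.query("sku-1");          // miss: reads the database
        store.query("sku-1");          // hit: served from the cache
        store.update("sku-1", "red");  // invalidates the cached copy
        System.out.println(store.query("sku-1") + ", reads=" + store.databaseReads);
    }
}
```

Invalidating on update keeps the sketch short; the alternative branch would `put` the new value into the cache instead of removing the key.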
Distributed Caching
[Diagram: a consumer reaches the load balancer at 74.0.125.28, which fronts nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69. The nodes reach caches 0-3 through a load balanced configuration or datagram; the caches sit in front of the database.]
Clustering
• Cluster - two or more systems that appear to users as a single system
• A cluster (horizontally scalable) system is more cost-effective than a monolithic single system (vertically scalable) with the same performance characteristics
• Systems are connected in the cluster over high-speed LANs like Gb Ethernet, FDDI, Infiniband, Myrinet, etc.
A/A Clustering
• A/A == Active/Active
• Distribute the load evenly among multiple nodes
• All nodes offer the same capabilities
• All nodes are active at the same time
[Diagram: a consumer reaches the load balancer at 74.0.125.28, which fronts nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69.]
High-Availability A/P Cluster
• A/P == Active/Passive
• Provides uninterrupted service through redundant nodes
• Eliminates single-point-of-failure
• Two nodes minimum, and “heartbeat” detection
• Automatic traffic switch for fail-over
[Diagram: a consumer reaches the router at 74.0.125.28, which sends traffic to the active node 192.168.202.55; a heartbeat links it to the failover node 192.168.202.69. Both nodes share a state data cache. The database is paired with a failover database through replication or a clustered database.]
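Heartbeat detection and the automatic traffic switch can be sketched as a router that prefers the active node until its heartbeat goes stale. This is a simplified illustration: timestamps are passed in so the logic is deterministic, the timeout value is an assumption, and the class name `ActivePassiveRouter` is hypothetical.

```java
/** Active/passive routing driven by a heartbeat timeout. */
final class ActivePassiveRouter {
    private final String activeNode;
    private final String failoverNode;
    private final long timeoutMillis;
    private volatile long lastHeartbeatMillis;

    ActivePassiveRouter(String activeNode, String failoverNode,
                        long timeoutMillis, long nowMillis) {
        this.activeNode = activeNode;
        this.failoverNode = failoverNode;
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = nowMillis;
    }

    /** Called whenever the active node reports that it is alive. */
    void heartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    /** Traffic goes to the active node unless its heartbeat is older than the timeout. */
    String target(long nowMillis) {
        boolean activeAlive = nowMillis - lastHeartbeatMillis <= timeoutMillis;
        return activeAlive ? activeNode : failoverNode;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        ActivePassiveRouter router = new ActivePassiveRouter(
                "192.168.202.55", "192.168.202.69", 3000, start);
        System.out.println(router.target(start + 1000)); // heartbeat still fresh
        System.out.println(router.target(start + 5000)); // heartbeat stale: fail over
    }
}
```

In this sketch traffic returns to the active node as soon as a fresh heartbeat arrives; a real deployment may require an administrator to approve the fail-back.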
Grid
• Process loads as independent jobs
• Nodes don’t require data sharing
• Storage, network may be shared by all nodes
• Intermediate results have no bearing on other jobs' progress
• Each node is independent
• Map/Reduce (Hadoop)
[Diagram: a consumer submits work to a master, which dispatches jobs through two load balancers, each in front of four nodes.]
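The independent-jobs idea can be illustrated on a single JVM, with a thread pool standing in for grid nodes. This is only an analogy, not Hadoop's API: each job computes its result without seeing any other job's data, and a final step combines the results. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Map/reduce in miniature: independent jobs, then one combining step. */
final class GridSketch {
    static long sumOfSquares(List<Integer> inputs, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // "Map": one independent job per input; jobs share no data.
            List<Callable<Long>> jobs = new ArrayList<>();
            for (final int n : inputs) {
                jobs.add(() -> (long) n * n);
            }
            // "Reduce": combine the results once every job has finished.
            long total = 0;
            for (Future<Long> result : pool.invokeAll(jobs)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("job failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3, 4), 2));
    }
}
```

Because no job depends on another job's intermediate result, the pool size can change without touching the job code, which is the property that lets a grid scale out.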
Computational Cluster
• Used for operations that require raw computational power
• Not good for transactional operations (web, database)
• Tightly coupled nodes, homogeneous, close proximity
• Meant to replace supercomputers
[Diagram: a consumer submits work to a master, which coordinates eight nodes.]
Redundancy and Fault Tolerance
• Redundancy - the expectation that any system component failure is independent of failure in other components
• Fault tolerance - the system continues to operate in the event of component failure
  • May have decreased throughput
• Fault tolerance results from SLAs
Fault Tolerance SLA Requirements
• No single point of failure - redundant components ensure continuous operation
• Allow repairs without disruption of service
• Fault isolation - problem detection must pinpoint the specific faulty component
• Fault propagation containment - problems in one component must not cascade to others
• Reversion mode - the system can be set back to a known state on command
A/A Cluster Fault Tolerance
• Uninterruptible, scalable service (stateless, web services)
• Failure transparency - though maybe degraded service
• Ideal for event-based web services (SOAP, REST, JMS, etc.)
• No dependencies between nodes
[Diagram: a consumer reaches the load balancer at 74.0.125.28, which fronts nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69, plus replacement node 192.168.202.53.]
A/P Cluster Fault Tolerance
• High availability through redundancy and failure detection
• Higher cost - used for stateful systems
• May require active sys- or netadmin participation
• More moving parts - more things to coordinate
[Diagram: a consumer reaches the router at 74.0.125.28, which sends traffic to node 192.168.202.55; a heartbeat links it to the failover node 192.168.202.69. Both nodes share a state data cache, backed by a database and a failover database.]
Putting It All Together
ROC Architecture
• ROC = Resource-Oriented Computing
• Everything is a resource (computational, data, other)
[Diagram: Mule ESB sits at the center, with transformers on its connections. Consumers: web browser, web app, and GUI app, arriving over the Internet or a dedicated API (JMS, SOAP, etc.). Resources: service provider (UPS, FedEx), service object with business logic, Remedy, CRM, product catalogue, product support pages, and single sign-on backed by Mainframe / RACF, Active Directory, and legacy auth. Protocols shown: SOAP, JDBC, HTTP, XML, LDAP, and TCP pass-through.]
SOA and Computational Network
Real-Life Example - LeapFrog
[Diagram: the end-user system (Mac, Windows) runs LeapFrog Connect, linked over USB to connected products, and a web browser. Across the Internet it reaches the S3 content repository, a third-party partner site, www.leapfrog.com, and LearningPath. Behind the firewall:
• Mule ESB backbone - HTTP, SOAP (CXF), REST, etc. routing, filtering, and dispatching; ActiveMQ JMS broker; dedicated LeapFrog services
• Mule ESB tailbone - connected products SOAP, REST web services
• Mule ESB funnybone - device log upload, processing, servlet container
Other components: servlets (app logic), content management system (REST, JCR), Crowd SSO, and content authoring. Data stores: customer data, game play data, device logs, and user credentials.]
Real-Life Example - LeapFrog
[Diagram: deployment tiers, from the Internet inward. Mule ESB instances are labeled version 1.6.2.
• Load Balancer: two Tomcat 6 application servers and a services proxy
• Load Balancer - Backbone: Mule ESB for message filtering, routing, dispatching, queuing, events
• Load Balancer - Tailbone: two Mule ESB instances (SOAP, REST), with a database
• Load Balancer - Funnybone: two Mule ESB instances (servlet, MTOM), with an NFS share
• Load Balancer - Message Broker: two ActiveMQ instances, with an NFS share]
Mule SOA Applied Clustering
* Two or more Mule instances can provide services, for scalability if there is high demand
* Load balanced configuration has built-in fail-over
* External apps see a single point of entry: the service endpoint name
* Load balancer or proxy sends the request to any available Mule server
* Increased demand - add another Mule server without interrupting the existing ones
* Decreased demand - remove Mule servers without interrupting other servers
* This is an active/active configuration - any server can handle a request at any time
* Assumes that the service application components are stateless
[Diagram: external applications call http://server.mycompany.com/service_call; the load balancer forwards to http://mule_server_1/service_call or http://mule_server_2/service_call. Mule ESB application containers 1 and 2 each host Service 1, Service 2, and Service 3.]
Mule SOA - ESB App Failover
* A/A configuration uses the load balancer to dispatch service calls
* The load balancer takes a failing service out of rotation automatically
* Failure reason no. 1: network connectivity
* Failure reason no. 2: Mule container
* Failure reason no. 3: Service application bug
[Diagram: external applications call http://server.mycompany.com/service_call; the load balancer forwards to http://mule_server_1/service_call or http://mule_server_2/service_call. Mule ESB application containers 1 and 2 each host Service 1, Service 2, and Service 3.]
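Taking a failing server out of rotation can be sketched as a round-robin picker that skips nodes marked down. This is a generic illustration, not Mule's or any load balancer's actual mechanism; how failures get detected (health checks, timeouts) is left out, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Active/active dispatch that skips nodes currently out of rotation. */
final class FailoverBalancer {
    private final List<String> nodes;
    private final Set<String> down = ConcurrentHashMap.newKeySet();
    private final AtomicInteger next = new AtomicInteger();

    FailoverBalancer(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    void markDown(String node) { down.add(node); }   // e.g. after a failed health check
    void markUp(String node)   { down.remove(node); } // node rejoins the rotation

    /** Round-robin over the nodes, skipping any that are marked down. */
    String pick() {
        for (int attempt = 0; attempt < nodes.size(); attempt++) {
            String candidate =
                    nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no healthy nodes available");
    }

    public static void main(String[] args) {
        FailoverBalancer lb = new FailoverBalancer(
                Arrays.asList("mule_server_1", "mule_server_2"));
        lb.markDown("mule_server_1");
        System.out.println(lb.pick()); // only mule_server_2 is in rotation
    }
}
```

Callers keep using the single entry point throughout; only the set of nodes behind it changes.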
Uninterrupted Application Updates
* Allow stopping and deploying new application functionality without stopping services
* Allow upgrades to a country's configuration without affecting other countries or stopping services
[Diagram: four snapshots over time of two Mule ESB application instances behind one load balancer, starting with both on version 1.4, passing through mixed 1.4/2.0 stages, and ending with both on version 2.0.]
Database Replication
[Diagram: a primary cluster with Node 0 and Node 1, each running the ESB as app services provider. Partition 0 holds DB 0 and DB 0b; Partition 1 holds DB 1 and DB 1b.]
Application Deployment
[Diagram: Mule 1 through Mule 5 behind two load balancers; two instances are marked JMS Queuing Active and Mule 5 is marked Failover.]
Application Deployment
This architecture has a lower cost of operation, lowers power consumption, and simplifies administration.
[Diagram: three stacks on multi-core Intel or AMD processors, each a virtual machine running Linux and Java 6. The stacks host JBoss (Application 1, Application 2), the Mule ESB container (Web Service 1, Web Service 2), and MQ.]
Simplify the architecture by having a common platform for all systems. This platform can be replicated across multiple data centers.
* Virtual Machine: VMware or Xen hosted on Windows; consider Amazon EC2 as a viable, low-cost alternative
* Linux: Ubuntu Server
* PowerBuilder applications (end-user) migrate to JBoss + Wicket or a similar configuration
* All web services are hosted by Mule ESB
* The Mule ESB and JBoss servers are separate from one another
* MQ clusters have a similar architecture; JBoss Messaging and WebSphere MQ
* Java 6 as a minimum
Application Deployment
App and service requests may come from the open Internet.
Each data center will have a cluster of two or more physical systems. Each system will virtually host two or more applications/environments deployed as described in the previous diagram.
The system is designed for horizontal scalability (more traffic, more virtual or physical servers). The system has inherent fail-over built in.
Use physical load balancers; can be Linux systems or dedicated F5 balancers - separate from the cluster.
[Diagram: the Internet feeds an app balancer and a services balancer. Two virtual hosts (Intel, AMD) each run an active application, active web services, and a distributed cache; the MQ master and MQ slave are split across them. Both hosts attach to SAN disks.]
Application Deployment
[Diagram: data centers in Japan, Europe, and the US, each with app clusters, connected over the Internet. Also shown: Expert, Claims Mgmt, Informix, and three legacy systems.]
Each data center has an application cluster.
The app clusters have identical configurations; only the app itself may vary by locale.
A designated data center also functions as the global services processing hub; all applications talk to this cluster (e.g. Claims Management) regardless of where the app calling them is from.
The global services clusters are separate physically and logically from the application clusters, which may include locale-specific web services and data stores.
Application Deployment
[Diagram: a primary cluster and a secondary cluster connected by a queue. Each cluster has Node 0 and Node 1 running the ESB as app services provider, with Partition 0 (DB 0, DB 0b) and Partition 1 (DB 1, DB 1b). An Enterprise Service Bus (routing, queuing, transformation, transactions, dispatching) underlies both clusters.]
Eugene Ciurana
[email protected] - pr3d4t0r ##java, irc.freenode.net
http://ciurana.eu/scalablesystems
Q&A
Comments? Anything else?
This presentation is available from: http://ciurana.eu/GeeCON-2010
Twitter: ciurana