High-Availability, Fault Tolerance, and Resource Oriented ... - GeeCON

Eugene Ciurana

[email protected] - pr3d4t0r ##java, irc.freenode.net

High-Availability, Fault Tolerance, and Resource Oriented Computing

This presentation is available from: http://ciurana.eu/GeeCON-2010

Letʼs move the Java world!

About Eugene...

• 15+ years building mission-critical, high-availability systems
• 14+ years of Java work
• Open source evangelist
  • Official adoption of open source/Linux at Walmart worldwide
  • State of the art main line of business at the largest companies in the world - not a web guy!

What You’ll Learn...

• Decoupled, event-driven, resource-oriented systems are more flexible
• Avoid tight, point-to-point integration
• Enhance JVM-based apps with better domain-specific languages
• How to move away from monolithic app servers and architectures
• How to implement event-driven systems by leveraging existing infrastructure and SOA investment
• Treat computational resources as addressable entities
• Balance open source vs. commercial products

Very Important!

Please Ask Questions! (don’t be shy)


What is Scalability?

• Scalability is the property of a system to:
  • handle bigger amounts of work; or
  • be easily expanded in response to increased demand (network, processing, database, file resources)
• Types of scalability
  • Horizontal (out): add more nodes with the same functionality as the existing ones and redistribute the load
  • Vertical (up): expand by adding more cores, main memory, storage, or network interfaces

Horizontal Scalability

[Diagram: a load balancer in front of three nodes scales out to a load balancer in front of four nodes. Clustering!]

Vertical Scalability

[Diagram: a host running Virtual Nodes 0-2 (dual core, single processor, 16 MB RAM) scales up to a host running Virtual Nodes 0-3 (dual core, dual processor, 32 MB RAM).]

What is Availability?

• How well a system provides useful resources over a set period of time
• High availability guarantees an absolute degree of functional continuity within a time window
• Expressed as a relationship between uptime and unplanned downtime
  • A = 100 - (100*D/U); D, U expressed in minutes
• Beware: uptime != available
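The formula above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names are illustrative, and it assumes U is the total number of minutes in the measurement window (525,600 for a 365-day year).

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def availability(downtime_min, window_min=MINUTES_PER_YEAR):
    """A = 100 - (100 * D / U), with D and U in minutes."""
    if window_min <= 0:
        raise ValueError("window must be positive")
    return 100.0 - (100.0 * downtime_min / window_min)


def allowed_downtime(availability_pct, window_min=MINUTES_PER_YEAR):
    """Invert the formula: minutes of downtime permitted in the window."""
    return (100.0 - availability_pct) * window_min / 100.0


if __name__ == "__main__":
    print(round(availability(52.56), 4))     # about 53 minutes down per year
    print(round(allowed_downtime(99.9), 2))  # downtime budget for three nines
```

Inverting the formula is handy when an SLA states the target percentage and the team needs the downtime budget in minutes.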


The Nines Game

Availability %   Downtime (minutes)   Downtime/year   Vendor jargon
90               52560.00             36.5 days       one nine
99               5256.00              3.7 days        two nines
99.9             525.60               8.8 hours       three nines
99.99            52.56                53 minutes      four nines
99.999           5.26                 5.3 minutes     five nines
99.9999          0.53                 32 seconds      six nines

Service Level Agreements

• SLAs are negotiated terms that outline the obligations of the two parties delivering and using a system
• System type - not all systems require the same SLA
• Levels of availability
  • Minimum
  • Target
• Uptime
  • Network
  • Power
  • Maintenance windows
• Serviceability
• Performance and metrics
• Billing

SLAs help determine if you scale up or out

Load Balancers

• They work by spreading requests among two or more resources
• Implemented in hardware or in software
  • Multiple machines
  • Multiple processes
  • Multiple threads
• Resources appear as a single device to consumers
• Can be stateless (web services), or stateful (applications that require session management)
• Algorithms determine the distribution
  • 1/n == all systems equally likely to service
  • Special requests (e.g. music store): some servers get hit more than others
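The 1/n distribution mentioned above can be sketched as a round-robin rotation. This is an illustration only: the class name and node labels are made up for the example, and round-robin is one of several algorithms that give every node an equal share.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hands each request to the next node in a fixed rotation."""

    def __init__(self, nodes):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one node is required")
        self._rotation = cycle(nodes)

    def route(self, request):
        # The request content is ignored, so every node gets 1/n of the load.
        return next(self._rotation)


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["node-a", "node-b", "node-c", "node-d"])
    hits = Counter(balancer.route(f"R{n}") for n in range(1, 9))
    print(hits)  # each of the four nodes serves two of the eight requests
```

The "special requests" case would need a routing rule that inspects the request, which this sketch deliberately leaves out.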


Load Balancers

[Diagram: a consumer sends requests Rn (R = request, n = sequence number) to the load balancer at 74.0.125.28, which spreads R1, R2, and R3 across nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69.]

Persistent Load Balancers

[Diagram: three consumers connect through the sticky load balancer at 74.0.125.28 to nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69.]
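One way a sticky balancer can keep a consumer on the same node is to hash a session identifier. The slides do not say how stickiness is implemented, so treat this as an assumption: `sticky_node` and the session IDs are hypothetical, and real balancers often use cookies or source-IP affinity instead.

```python
import hashlib


def sticky_node(session_id, nodes):
    """Map a session to the same node on every request."""
    if not nodes:
        raise ValueError("at least one node is required")
    # hashlib gives a digest that is stable across processes, unlike hash().
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


if __name__ == "__main__":
    nodes = ["192.168.202.55", "192.168.202.66",
             "192.168.202.67", "192.168.202.69"]
    for session in ("alice", "bob", "alice"):
        print(session, "->", sticky_node(session, nodes))
```

Note that simple modulo hashing remaps most sessions when the node list changes, which is one reason stateful systems also share session data.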


Load Balancing and Databases

[Diagram: a consumer reaches nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69 through the load balancer at 74.0.125.28; all nodes share a Session Data store.]

Caching Strategies

• Stateful load balancing requires data sharing
• Caching distributes popular, shared read-only data
• Think of a cache as a giant hash map
• If the data isn’t in the cache, fetch it from the database
• Write policies:
  • write-through: write to the cache AND the database
  • write-behind: cache is marked “dirty” and updated only if a dirty datum is requested
  • no-write allocation: only read requests are cached; assumes data never changes
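The write-through policy above is the simplest to sketch. The example below is a minimal illustration under stated assumptions: plain dictionaries stand in for both the cache and the database, and the class name is invented for the example.

```python
class WriteThroughCache:
    """Cache in front of a backing store; both are plain dicts here."""

    def __init__(self, database):
        self._db = database
        self._cache = {}

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        value = self._db.get(key)  # cache miss: fetch from the database
        if value is not None:
            self._cache[key] = value
        return value

    def put(self, key, value):
        # Write-through: update the database AND the cache in the same call.
        self._db[key] = value
        self._cache[key] = value


if __name__ == "__main__":
    db = {"sku-1": "widget"}
    cache = WriteThroughCache(db)
    cache.put("sku-2", "gadget")
    print(db["sku-2"], cache.get("sku-1"))
```

Because every write reaches the database immediately, the cache and the store never disagree, at the cost of a slower write path.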


Caching Usage Pattern

• Application caching
  • Little or no programmer participation (e.g. Terracotta)
  • Explicit API calls (memcached, Coherence, etc.)
• Web caching - stores full documents or fragments (‘particles’) on the server or client, and is invisible to the client
  • Web accelerators - distribute the load (e.g. CDN like S3, Akamai, etc.)
  • Proxy caches - distribute requests to the same resources and may provide filtering/query (e.g. Squid, Apache, ISA servers)

Caching Usage Pattern

[Flowchart: Begin, then decide "Query?".
 query branch: fetch datum from cache; if datum is None (yes), query datum from database and add datum to cache; then use datum in app.
 update branch: update datum in database, invalidate cache, add or update datum to cache.
 End.]
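The two branches of the flowchart can be sketched as two small functions. This is an interpretation of the diagram rather than the presenter's code: dictionaries stand in for the cache and the database, and the update branch stops at invalidation and lets the next query repopulate the cache.

```python
def query(key, cache, database):
    """Query branch: try the cache first, fall back to the database."""
    datum = cache.get(key)
    if datum is None:
        datum = database.get(key)
        if datum is not None:
            cache[key] = datum  # add datum to cache
    return datum                # use datum in app


def update(key, value, cache, database):
    """Update branch: write to the database, then invalidate the cache."""
    database[key] = value
    cache.pop(key, None)        # invalidate; the next query repopulates it


if __name__ == "__main__":
    cache, database = {}, {"k": "v1"}
    print(query("k", cache, database))
    update("k", "v2", cache, database)
    print(query("k", cache, database))
```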


Distributed Caching

[Diagram: a consumer reaches nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69 through the load balancer at 74.0.125.28. The nodes talk to Cache 0 through Cache 3 (load balanced configuration or datagram), and the caches sit in front of the database.]

Clustering

• Cluster - two or more systems that appear to users as a single system
• A cluster (horizontally scalable) system is more cost-effective than a monolithic single system (vertically scalable) with the same performance characteristics
• Systems are connected in the cluster over high-speed LANs like Gb Ethernet, FDDI, Infiniband, Myrinet, etc.

A/A Clustering

• A/A == Active/Active
• Distribute the load evenly among multiple nodes
• All nodes offer the same capabilities
• All nodes are active at the same time

[Diagram: a consumer reaches nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69 through the load balancer at 74.0.125.28.]

High-Availability A/P Cluster

• A/P == Active/Passive
• Provides uninterrupted service through redundant nodes
• Eliminates single-point-of-failure
• Two nodes minimum, and “heartbeat” detection
• Automatic traffic switch for fail-over

[Diagram: a consumer reaches the active node (192.168.202.55) through the router at 74.0.125.28; a heartbeat links the active node to the failover node (192.168.202.69). The nodes share a state data cache, and the database is paired with a failover database through replication or a clustered database.]
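Heartbeat detection and the automatic traffic switch can be sketched as follows. This is a simplified model under stated assumptions: the class name, the three-second timeout, and the explicit `now` timestamps are choices made for the example (a real router would read a monotonic clock and probe over the network).

```python
class FailoverRouter:
    """Routes to the active node until its heartbeat goes stale."""

    def __init__(self, active, standby, timeout_s=3.0, now=0.0):
        self.active = active
        self.standby = standby
        self.timeout_s = timeout_s
        self._last_beat = now

    def heartbeat(self, now):
        """Record a heartbeat from the active node at time `now` (seconds)."""
        self._last_beat = now

    def target(self, now):
        """Return the node that should receive traffic at time `now`."""
        missed = now - self._last_beat > self.timeout_s
        if missed and self.standby is not None:
            # Automatic switch: promote the failover node.
            self.active, self.standby = self.standby, None
            self._last_beat = now  # the new active node gets a fresh window
        return self.active


if __name__ == "__main__":
    router = FailoverRouter("192.168.202.55", "192.168.202.69")
    router.heartbeat(1.0)
    print(router.target(2.0))   # heartbeat is recent: stay on the active node
    print(router.target(10.0))  # heartbeat is stale: traffic moves over
```

After a switch the cluster has no standby left, which matches the slide's point that two nodes are the minimum.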

Grid

[Diagram: a consumer submits work to a master, which feeds two load balancers, each in front of four nodes.]

• Process loads as independent jobs
• Nodes don’t require data sharing
  • Storage, network may be shared by all nodes
• Intermediate results have no bearing on the progress of other jobs
  • Each node is independent
• Map/Reduce (Hadoop)
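The map/reduce idea behind a grid can be shown on a single machine. This is a toy sketch, not Hadoop: worker threads stand in for grid nodes, the word-count job and the function names are invented for the example, and no data is shared between jobs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(chunk):
    """Map step: one independent job with no shared state."""
    return Counter(chunk.split())


def run_grid(chunks, workers=4):
    """Fan the jobs out to workers, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))
    # Reduce step: combine the per-job counts into one result.
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    print(run_grid(["a b a", "b c", "a"]))
```

Because each job only sees its own chunk, a failed job can be rerun on another node without affecting the rest.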


Computational Cluster

• Used for operations that require raw computational power
• Not good for transactional operations (web, database)
• Tightly coupled nodes, homogeneous, close proximity
• Meant to replace supercomputers

[Diagram: a consumer submits work to a master that coordinates eight nodes.]

Redundancy and Fault Tolerance

• Redundancy - the expectation that any system component failure is independent of failure in other components
• Fault tolerance - the system continues to operate in the event of component failure
  • May have decreased throughput
  • Fault tolerance results from SLAs

Fault Tolerance SLA Requirements

• No single point of failure - redundant components ensure continuous operation
  • Allow repairs without disruption of service
• Fault isolation - problem detection must pinpoint the specific faulty component
• Fault propagation containment - problems in one component must not cascade to others
• Reversion mode - the system can be set back to a known state on command

A/A Cluster Fault Tolerance

[Diagram: a consumer reaches nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69 through the load balancer at 74.0.125.28, alongside a replacement node (192.168.202.53).]

• Uninterruptible, scalable service (stateless, web services)
• Failure transparency - though maybe degraded service
• Ideal for event-based web services (SOAP, REST, JMS, etc.)
  • No dependencies between nodes

A/P Cluster Fault Tolerance

[Diagram: a consumer reaches node 192.168.202.55 through the router at 74.0.125.28; a heartbeat links it to the failover node (192.168.202.69). The nodes share a state data cache, and the database is paired with a failover database.]

• High availability through redundancy and failure detection
• Higher cost - used for stateful systems
• May require active sys- or netadmin participation
• More moving parts - more things to coordinate

Putting It All Together


ROC Architecture

• ROC = Resource-Oriented Computing
• Everything is a resource (computational, data, other)

[Diagram: Mule ESB sits at the center with transformers on its connections. Consumers: web browser, web app (business logic), and GUI app, arriving over the Internet or a dedicated API (JMS, SOAP, etc.). Resources: service providers (UPS, FedEx) through a service object, Remedy, CRM (SOAP), Product Catalogue (JDBC), Product Support Pages (HTTP, XML), and Single Sign-On (LDAP, SOAP) reached by TCP pass-through and backed by Mainframe / RACF, Active Directory, and Legacy Auth.]

SOA and Computational Network


Real-Life Example - LeapFrog

[Diagram: end-user systems (Mac, Windows) run LeapFrog Connect, attached to connected products over USB, and a web browser. Across the Internet they reach the S3 content repository, third-party partner sites, www.leapfrog.com, and LearningPath. Behind the firewall, the Mule ESB backbone (HTTP, SOAP (CXF), REST, etc.) handles routing, filtering, and dispatching, alongside an ActiveMQ JMS broker and dedicated LeapFrog services. The Mule ESB tailbone hosts the connected products SOAP and REST web services; the Mule ESB funnybone handles device log upload and processing in a servlet container. Other components: servlets (app logic), content management system (REST, JCR), content authoring, and Crowd SSO. Data stores: customer data, game play data, device logs, and user credentials.]

Real-Life Example - LeapFrog

[Diagram: traffic from the Internet passes a load balancer to two Tomcat 6 application servers, then a services proxy and the backbone load balancer. The backbone (message filtering, routing, dispatching, queuing, events) runs on Mule ESB. Behind it, a tailbone load balancer fronts two Mule ESB instances (SOAP, REST) backed by a database; a funnybone load balancer fronts two Mule ESB instances (servlet, MTOM) backed by an NFS share; and a message broker load balancer fronts two ActiveMQ instances backed by an NFS share. The Mule ESB instances are labeled version 1.6.2.]

Mule SOA Applied Clustering

* Two or more Mule instances can provide services, for scalability if there is high demand
* Load balanced configuration has built-in fail-over
* External apps see a single point of entry: the service endpoint name
* Load balancer or proxy sends the request to any available Mule server
* Increased demand - add another Mule server without interrupting the existing ones
* Decreased demand - remove Mule servers without interrupting other servers
* This is an active/active configuration - any server can handle a request at any time
* Assumes that the service application components are stateless

[Diagram: external applications call http://server.mycompany.com/service_call. The load balancer forwards to http://mule_server_1/service_call or http://mule_server_2/service_call. Each Mule ESB application container (1 and 2) hosts Service 1, Service 2, and Service 3.]

Mule SOA - ESB App Failover

* A/A configuration uses the load balancer to dispatch service calls
* The load balancer takes a failing service out of rotation automatically
* Failure reason no. 1: network connectivity
* Failure reason no. 2: Mule container
* Failure reason no. 3: Service application bug

[Diagram: external applications call http://server.mycompany.com/service_call. The load balancer forwards to http://mule_server_1/service_call or http://mule_server_2/service_call. Each Mule ESB application container (1 and 2) hosts Service 1, Service 2, and Service 3.]
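Taking a failing server out of rotation can be sketched with a health probe in front of a round-robin loop. This is an illustration, not Mule or load balancer configuration: `is_healthy` is a hypothetical callback that would wrap whatever check the balancer performs (a TCP connect, an HTTP ping, and so on).

```python
class HealthCheckedBalancer:
    """Round-robin over servers, skipping any that fail a health probe."""

    def __init__(self, servers, is_healthy):
        self._servers = list(servers)
        self._is_healthy = is_healthy  # hypothetical probe: server -> bool
        self._next = 0

    def route(self):
        # Try each server at most once per call, starting after the last pick.
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if self._is_healthy(server):
                return server
        raise RuntimeError("no healthy server available")


if __name__ == "__main__":
    down = set()
    balancer = HealthCheckedBalancer(
        ["mule_server_1", "mule_server_2"], lambda s: s not in down)
    print(balancer.route(), balancer.route())
    down.add("mule_server_1")  # simulate a container failure
    print(balancer.route(), balancer.route())
```

A server that recovers rejoins the rotation on the next call, because health is checked at routing time rather than cached.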

Uninterrupted Application Updates

* Allow stopping and deploying new application functionality without stopping services
* Allow upgrades to a country's configuration without affecting other countries or stopping services

[Diagram: over time, the two Mule ESB application instances behind the load balancer move from version 1.4 to version 2.0 one at a time: both on 1.4, then one on 1.4 and one on 2.0, then both on 2.0.]
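The version-by-version upgrade shown in the diagram is a rolling update. The sketch below models it under stated assumptions: server names and the `rolling_update` helper are hypothetical, and "deploying" is reduced to changing a version string in a dictionary.

```python
def rolling_update(versions, new_version):
    """Upgrade servers one at a time; yield the in-rotation pool at each step.

    `versions` maps server name -> deployed version and is updated in place.
    """
    if len(versions) < 2:
        raise ValueError("need at least two servers to stay available")
    for server in list(versions):
        in_rotation = [s for s in versions if s != server]  # drain this server
        versions[server] = new_version                      # deploy while drained
        yield server, in_rotation


if __name__ == "__main__":
    fleet = {"mule_1": "1.4", "mule_2": "1.4"}
    for upgraded, serving in rolling_update(fleet, "2.0"):
        print(f"upgraded {upgraded}; still serving: {serving}")
    print(fleet)
```

At every step at least one server stays behind the load balancer, so the service endpoint never goes dark.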


Database Replication

[Diagram: a primary cluster with Node 0 and Node 1 (ESB as app services provider). Partition 0 maps to DB 0 with copy DB 0b; Partition 1 maps to DB 1 with copy DB 1b.]

Application Deployment

[Diagram: two load balancers in front of Mule 1 through Mule 4, with two active JMS queuing instances and Mule 5 as failover.]

Application Deployment

This architecture has a lower cost of operation, and it simplifies power consumption and administration.

[Diagram: three virtual machines on multi-core Intel or AMD processors, each running Linux and Java 6. One hosts JBoss with Application 1 and Application 2, one hosts the Mule ESB container with Web Service 1 and Web Service 2, and one hosts MQ.]

Simplify the architecture by having a common platform for all systems. This platform can be replicated across multiple data centers.

* Virtual Machine: VMware or Xen hosted on Windows; consider Amazon EC2 as a viable, low-cost alternative
* Linux: Ubuntu Server
* PowerBuilder applications (end-user) migrate to JBoss + Wicket or a similar configuration
* All web services are hosted by Mule ESB
* The Mule ESB and JBoss servers are separate from one another
* MQ clusters have a similar architecture; JBoss messaging and WebSphere MQ
* Java 6 as a minimum

Application Deployment

App and service requests may come from the open Internet.

Each data center will have a cluster of two or more physical systems. Each system will virtually host two or more applications/environments deployed as described in the previous diagram.

The system is designed for horizontal scalability (more traffic, more virtual or physical servers). The system has inherent fail-over built in.

Use physical load balancers; they can be Linux systems or dedicated F5 balancers, separate from the cluster.

[Diagram: the Internet feeds an app balancer and a services balancer. Two virtual hosts (Intel, AMD) each run an active application, active web services, and a distributed cache, with an MQ master and an MQ slave between them. Disks sit on a SAN.]

Application Deployment

[Diagram: data centers in Japan, Europe, and the US, each with app clusters reachable over the Internet. Labeled components include Expert, Claims Mgmt, Informix, and legacy systems.]

Each data center has an application cluster.

The app clusters have identical configurations; only the app itself may vary by locale.

A designated data center also functions as the global services processing hub; all applications talk to this cluster (e.g. Claims Management) regardless of where the calling app is located.

The global services clusters are physically and logically separate from the application clusters, which may include locale-specific web services and data stores.

Application Deployment

[Diagram: a primary cluster and a secondary cluster connected by a queue. Each cluster has Node 0 and Node 1 (ESB as app services provider), Partition 0 and Partition 1, and databases DB 0 and DB 1 with copies DB 0b and DB 1b. An Enterprise Service Bus (routing, queuing, transformation, transactions, dispatching) spans both clusters.]

Eugene Ciurana

[email protected] - pr3d4t0r ##java, irc.freenode.net

http://ciurana.eu/scalablesystems

Q&A

Comments? Anything else?

This presentation is available from: http://ciurana.eu/GeeCON-2010

Twitter: ciurana
