Qunying Huang, Phil Yang, Hannes Wu, Kai Liu, Jing Li Center for Intelligent Spatial Computing George Mason University & NASA http://cisc.gmu.edu/
Agenda
Cloud computing for Earth Science
Presentation (45 mins)
Question & Discussion (15 mins)
Demos
Deploying GEOSS clearinghouse onto Amazon cloud computing (15 mins, demo only because of cost)
Deploying GEOSS clearinghouse onto GMU CISC cloud computing (15 mins + 30 user interaction)
Qunying Huang, Chaowei Yang, Huayi Wu, Kai Liu, Jing Li
Joint Center of Intelligent Computing, George Mason University
ESIP Summer Meeting, July 20th, 2010
Contact: http://cisc.gmu.edu
[email protected]
Outline
Introduction
Cloud computing definition
Cloud computing examples
Cloud computing platform test
Spatial cloud computing
Summary (benefit & future research directions)
Introduction
Many scientific problems are data- and computation-intensive
High performance computing support
Distributed computing
Grid computing
Cloud Computing
The growth of cloud computing From http://www.zdnet.com/blog/hichecliffe
Cloud Computing
Definition
A computing Cloud is a set of network enabled services, providing scalable, QoS guaranteed, inexpensive computing platforms on demand, which could be accessed in a simple and pervasive way. (Liu and Orban, 2008)
[Figure labels: Mobile device, Client Computer, Database/Storage, Application Servers/Cluster]
Cloud Computing
Software as a Service (SaaS)
• Almost any IT services
• Users: End-user
Platform as a Service (PaaS)
• Platform for developing and delivering applications, abstracted from infrastructures
• Users: Developer
Infrastructure as a Service (IaaS)
• On-demand sharing of physical infrastructures
• Users: System Administrator
Cloud computing
Defining characteristics
• On-demand self-service
• Multi-tenancy
• Measured services
• Device and location independent resource pooling
• Rapid elasticity
Enabling technologies
• Virtualization
• Web 2.0
• Web service & SOA
• World-wide distributed storage & file system
• Parallel & distributed programming model
Virtualization
Foundation of cloud
• Isolated runtime environment
• Disaster recovery
• Hide heterogeneity of the infrastructure
• Allow partitioning and isolating of physical resources
Types
• Full virtualization
• Para-virtualization
• Hardware virtualization/hardware assisted virtualization
Virtualization (cont.)
Xen
• Para-virtualization
• Amazon EC2, GoGrid, 21vianet CloudEx, RackSpace Mosso
• Hardware virtualization
WAH
VMware
• Para-virtualization: Workstation product
• Full virtualization: VMware ESX Server
• AT&T Synaptic, Verizon CaaS
Qemu/VirtualBox
KVM (Kernel-based Virtual Machine)
Microsoft Azure
Joyent Accelerator
[Figure labels: VM, Hypervisor, Hardware]
Virtual Infrastructure Middleware (VIM)
Provides a uniform view of the resource pool
Place and replace VM dynamically on a pool of physical infrastructures
Life-cycle management and monitoring of VM
Scheduling & monitoring
Networking
[Figure labels, top to bottom: Virtual Machine; VIM (OpenNebula, Eucalyptus, Nimbus, Hadoop); Hypervisor; Physical Infrastructure]
Amazon Cloud Services
Elastic Compute Cloud – EC2 (IaaS)
Simple Storage Service – S3 (IaaS)
Elastic Block Storage – EBS (IaaS)
SimpleDB (SDB) (PaaS)
Simple Queue Service – SQS (PaaS)
CloudFront (S3 based Content Delivery Network – PaaS)
Consistent AWS Web Services API
Amazon EC2
A “Web service that provides resizable compute capacity in the cloud”
EC2 saves a bootable VM root image as an “Amazon Machine Image” (AMI).
[Figure labels: Instances, Elastic Block Storage (EBS), Xen Virtualization, Physical Server, Simple Storage Service (S3), Hosting of virtual machine images (AMI)]
How to Deploy Applications on Amazon EC2
Prepare an AMI
From scratch
Based on a public AMI and customize
Launch the AMI as an Amazon EC2 instance
Access the instance through SSH
Configure/Run applications
Register as a new AMI
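The steps above can be sketched in Python as a dry run. This is only an illustration under stated assumptions: the call names and response shapes follow the boto3 EC2 client (a library that appeared after this talk), DryRunEC2 is a hypothetical stand-in so the sketch runs without AWS credentials, and the AMI id, key name and login user are placeholders rather than values from the slides.

```python
"""Dry-run sketch of the AMI workflow: launch, access, configure, register."""


class DryRunEC2:
    """Hypothetical stand-in mimicking the few boto3 EC2 client calls used below."""

    def __init__(self):
        self.calls = []

    def run_instances(self, **kwargs):
        self.calls.append("run_instances")
        return {"Instances": [{"InstanceId": "i-dryrun"}]}

    def describe_instances(self, **kwargs):
        self.calls.append("describe_instances")
        instance = {"InstanceId": "i-dryrun", "PublicDnsName": "dryrun.example.org"}
        return {"Reservations": [{"Instances": [instance]}]}

    def create_image(self, **kwargs):
        self.calls.append("create_image")
        return {"ImageId": "ami-dryrun"}


def deploy_on_ec2(ec2, base_ami, instance_type="c1.medium", key_name="my-key",
                  new_ami_name="customized-ami"):
    """Follow the slide's steps and return what each step produced."""
    # Launch the prepared (or public) AMI as an EC2 instance.
    launched = ec2.run_instances(ImageId=base_ami, InstanceType=instance_type,
                                 KeyName=key_name, MinCount=1, MaxCount=1)
    instance_id = launched["Instances"][0]["InstanceId"]

    # Look up the host name needed to access the instance through SSH.
    described = ec2.describe_instances(InstanceIds=[instance_id])
    host = described["Reservations"][0]["Instances"][0]["PublicDnsName"]
    ssh_command = f"ssh -i {key_name}.pem root@{host}"  # login user depends on the AMI

    # Configuring and running applications happens over SSH; it is not automated here.

    # Register the customized instance as a new AMI.
    new_ami = ec2.create_image(InstanceId=instance_id, Name=new_ami_name)["ImageId"]
    return {"instance_id": instance_id, "ssh_command": ssh_command, "new_ami": new_ami}


client = DryRunEC2()
result = deploy_on_ec2(client, base_ami="ami-00000000")
print(result["ssh_command"])
```

Against a real account the stand-in would be replaced by an actual EC2 client, and the code would also need to wait for the instance to reach the running state before connecting.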
The GEOSS Clearinghouse
Metadata catalogues search facility
EO data, services, and related resources can be discovered and accessed.
Deployment of GEOSS Clearinghouse on Amazon EC2
Launch a CentOS AMI
Authorize network access
SSH into the Amazon EC2 instance
Transfer the GEOSS Clearinghouse code onto the virtual server
Install Postgres/PostGIS
Restore the GEOSS Clearinghouse database
Install Tomcat, Jetty or other servlet container
Configure servlet container
Start the servlet container
Amazon EC2 Standard Linux Instance Types
Amazon EC2 High-Memory Linux Instance Types
Type | CPU | Memory | Storage | Platform | I/O | AWS Name | Cost
High-Memory Extra Large | 6.5 ECU (2 virtual cores with 3.25 EC2 Compute Units each) | 17.1 GB | 420 GB | 64-bit | High | m2.xlarge | $0.50 per hour
High-Memory Double Extra Large | 13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each) | 34.2 GB | 850 GB | 64-bit | High | m2.2xlarge | $1.20 per hour
High-Memory Quadruple Extra Large | 26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each) | 68.4 GB | 1690 GB | 64-bit | High | m2.4xlarge | $2.40 per hour
Amazon EC2 High-CPU Linux Instance Types
Type | CPU | Memory | Storage | Platform | I/O | AWS Name | Cost
High-CPU Medium | 5 ECU (2 virtual cores with 2.5 EC2 Compute Units each) | 1.7 GB | 370 GB | 32-bit | Medium | c1.medium | $0.17 per hour
High-CPU Extra Large | 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each) | 7.5 GB | 1810 GB | 64-bit | High | c1.xlarge | $0.68 per hour
Amazon EC2 Instance Performance Test
[Figure: GetCapabilities average response time versus concurrent request number (1 to 160) for Large, Extra Large, High-Memory Extra Large, and High-CPU Extra Large instances]
Only one core of the VM is utilized
CPU speed is the primary factor
High-CPU Medium instance should be used
Only $0.17 per hour
[Figure: Performance improvement (average response improvement, 0 to 0.4) versus concurrent request number (20 to 160) for Extra Large, High-Memory Extra Large, and High-CPU Extra Large instances]
Extra Large Instance: 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
High-Memory Extra Large Instance: 6.5 EC2 Compute Units (2 virtual cores with 3.25 EC2 Compute Units each)
High-CPU Extra Large Instance: 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
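The recommendation of the High-CPU Medium instance can be checked with a little arithmetic. The sketch below takes core counts, per-core compute units and hourly prices from the High-Memory and High-CPU tables in this deck and assumes, as the test found, that the workload uses one core only. The metric name cost_per_usable_ecu is a hypothetical helper, not something from the slides.

```python
# (AWS name, virtual cores, ECU per core, USD per hour), copied from the tables in this deck.
INSTANCES = [
    ("m2.xlarge", 2, 3.25, 0.50),
    ("m2.2xlarge", 4, 3.25, 1.20),
    ("m2.4xlarge", 8, 3.25, 2.40),
    ("c1.medium", 2, 2.5, 0.17),
    ("c1.xlarge", 8, 2.5, 0.68),
]


def cost_per_usable_ecu(cores, ecu_per_core, usd_per_hour, cores_used=1):
    """Hourly price divided by the compute units the workload can actually use."""
    usable = min(cores, cores_used) * ecu_per_core
    return usd_per_hour / usable


# Rank instance types from cheapest to most expensive per usable ECU-hour.
ranked = sorted(INSTANCES, key=lambda row: cost_per_usable_ecu(*row[1:]))
for name, cores, ecu, usd in ranked:
    print(f"{name:<11} {cost_per_usable_ecu(cores, ecu, usd):.3f} USD per usable ECU-hour")
```

Under this single-core assumption c1.medium ranks first. The High-Memory types have faster cores (3.25 versus 2.5 compute units), so they may still respond faster at a higher price.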
CISC Cloud Computing Platform
Spatial Cloud Computing Architecture
Cloud user, local user and administrator
Spatial Cloud Portal
Geospatial Middleware
Virtual Infrastructure Middleware
Public cloud (EC2, Elastic Hosts…), local infrastructure
Geospatial Middleware
Integrate spatial constraints and principles
Scheduling/resource allocation
Parallelization: Parallelization Methods/Parallelization Degree
Geospatial capabilities
Kernel GIS functions as services
Standardize the interfaces: OGC WPS (Web Processing Service)
Community tools & API
Spatial principles (Yang et al., 2010)
Physical phenomena are continuous and digital representations are discrete for both space and time
Closer things are more related
Multiple scale
Physical phenomena are heterogeneous in space and time
Higher resolution will include more information
Phenomena are evolving at different speeds
The longer or bigger a dynamic process, the more exchanges are needed among neighbors
Application Example 1: Server site selection
Application Example 2: Parallelization
[Figure: Execution time (mins, 0 to 350) on 24 processors for each decomposition method (longitude*latitude): 3*8, 4*6, 2*12, 6*4, 8*3, 12*2, 1*24, 24*1]
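The eight decomposition methods in the chart are the ways of writing 24 as longitude parts times latitude parts. A minimal sketch enumerates them and attaches a rough proxy for neighbor exchange: the number of grid cells lying along the internal cut lines. The 360 by 180 grid and the cut_cells proxy are assumptions for illustration. They are not the model or the measurements behind the chart, and the proxy ignores wrap-around in longitude.

```python
def decompositions(processors):
    """Return every (longitude_parts, latitude_parts) pair whose product is `processors`."""
    return [(lon, processors // lon)
            for lon in range(1, processors + 1)
            if processors % lon == 0]


def cut_cells(grid_lon, grid_lat, lon_parts, lat_parts):
    """Count cells along internal cut lines, a rough proxy for neighbor exchange."""
    # Each of the (lon_parts - 1) cuts across longitude runs the full latitude extent,
    # and each of the (lat_parts - 1) cuts across latitude runs the full longitude extent.
    return (lon_parts - 1) * grid_lat + (lat_parts - 1) * grid_lon


GRID_LON, GRID_LAT = 360, 180  # hypothetical 1-degree global grid

table = {pair: cut_cells(GRID_LON, GRID_LAT, *pair) for pair in decompositions(24)}
for (lon_parts, lat_parts), cells in table.items():
    print(f"{lon_parts}*{lat_parts}: {cells} cells along subdomain boundaries")
```

The proxy only illustrates why the choice of decomposition matters. Measured execution time also depends on load balance and on the model's own communication pattern.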
Application Example 3: Multilevel visualization
Cloud computing examples: Deployment of GeoNetwork instance through CISC Cloud
Benefit
Integrates a set of open-source components into a seamless, self-service platform.
Provides high-capacity computing, storage and network connectivity.
Uses a virtualized, scalable approach to achieve cost and energy efficiencies.
Creates new opportunities for national, international, state, and local partners to leverage research easily.
Challenges
Network bottlenecks
Data transfer
Performance unpredictability
Privacy
Scalable storage
Amazon EBS
Bugs in large distributed systems
Future
IaaS will become increasingly standardized and commoditized
Across-Cloud implementations (e.g. AWS and vCloud-based)
Across-Cloud tools and middleware will be available to enable interoperability and portability across different clouds
IaaS providers will increasingly add new utilities and PaaS capabilities
PaaS will become the battleground for determining the future of Cloud Computing
PaaS will integrate with applications utilizing mobile devices and sensors
Conclusion
Cloud computing is not just a trend
We are at a prescient time
Technologies
• Cloud Architecture
• Open data standards
• Platform independent languages
Spatial Cloud Computing
• Parallelizing and Scheduling
• Geospatial Middleware
Reference
Amazon Elastic Compute Cloud (Amazon EC2): http://www.amazon.com/ec2 (accessed June 7, 2010)
Armbrust, M., Fox, A., Griffith, R. et al., 2009. Above the Clouds: A Berkeley View of Cloud Computing. University of California, Berkeley, CA. http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html (accessed March 12, 2010)
Cisco, 2009. http://www.cisco.com/en/US/solutions/collateral/ns340/ns858/Virtualization_Blueprint.pdf
Google App Engine, http://appengine.google.com
Nimbus, The Nimbus Cloud. http://www.nimbusproject.org/
Microsoft Azure, http://www.microsoft.com/azure/
Xie, J., C. Yang, B. Zhou, and Q. Huang. 2009. High performance computing for the simulation of dust storms. Computers, Environment, and Urban Systems. (In press).
OpenNebula, 2010. http://Opennebula.org
Wang, S., and Liu, Y. 2009. TeraGrid GIScience Gateway: Bridging Cyberinfrastructure and GIScience. International Journal of Geographical Information Science, 23 (5): 631–656.
Wiki, 2009. Wiki Cloud Computing. http://en.wikipedia.org/wiki/Cloud_computing (accessed April 29, 2009)
Yang, C., H. Wu, Q. Huang, Z. Li and J. Li. 2010. Spatial Computing for Supporting Physical Sciences. Proceedings of the National Academy of Sciences. (In press).
Thank You! Chaowei Yang
[email protected]
Pointers
Portal: http://aws.amazon.com
Blog: http://aws.typepad.com
CISC: http://cisc.gmu.edu
EC2: http://aws.amazon.com/ec2
S3: http://aws.amazon.com/s3
Resource Center: http://aws.amazon.com/resources
Forums: http://aws.amazon.com/forums