Immersive Teaching and Research in Data Sciences via Cloud ...

RosettaHUB & AWS Educate

The pathway to AWS mass adoption in higher education and research. The pathway to pervasive cloud, data science, machine learning, big data and HPC education.

RosettaHUB & AWS Educate figures

• 65 higher education institutions, including 4 among the top 10 universities in the world
• 14,000+ students, educators and researchers
• $1.6M+ of managed AWS credits, renewable every year
• 16 countries, including the UK, Ireland, France and Germany
• 100% automation of onboarding, of resource and consumption monitoring, and of user management
• 0 financial risk taken by institutions and 0 operational effort required from them

RosettaHUB, state of the art federation platform for AWS

The building blocks of AWS democratization

RosettaHUB provides every student and every educator with an account on a social collaboration portal. Each portal account is linked to a private AWS account created, managed and monitored by RosettaHUB. The portal makes advanced AWS capabilities easy for students and educators to understand and operate. It also makes all cloud artifacts easy to share. RosettaHUB fully automates the onboarding process. It collects and aggregates the grants provided by AWS Educate ($100 per student per year and $200 per educator per year) and gives institutions flexibility in how the credits are allocated.
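The arithmetic behind the consolidated grants can be sketched as follows. This is only an illustration: the grant amounts come from the text, while the function names, the head counts and the per-course split are hypothetical and do not describe RosettaHUB's actual allocation logic.

```python
# Grant amounts quoted in the text (USD per user per year).
STUDENT_GRANT = 100
EDUCATOR_GRANT = 200


def consolidated_pool(students: int, educators: int) -> int:
    """Total yearly credits collected from the individual AWS Educate grants."""
    return students * STUDENT_GRANT + educators * EDUCATOR_GRANT


def allocation_fits(pool: int, budgets: dict[str, int]) -> bool:
    """Check that a proposed budget split does not exceed the pool."""
    return sum(budgets.values()) <= pool


# Hypothetical institution: 120 students and 5 educators.
pool = consolidated_pool(students=120, educators=5)  # 120*100 + 5*200 = 13000

# Hypothetical split: a GPU-heavy course receives more than $100 per student.
budgets = {"deep-learning-course": 9000, "intro-course": 3000, "educators": 1000}
print(pool, allocation_fits(pool, budgets))  # 13000 True
```

The point of the sketch is that, once grants are pooled, a budget no longer has to equal the grant of the person who spends it.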

End-to-end monitoring, management and audit

The institution’s Central Point Of Contact (CPOC) and educators can monitor in real time the students’ interaction with AWS and the portal. The CPOC can manage students: adjust their budgets, their rights on AWS, their resource allowances, etc. The CPOC can create sub-organizations and assign roles to colleagues for multi-tenant management of students. System administrators can generate reports on user activity and cloud usage, and can measure and assess how effectively cloud resources are used. Repositories of pedagogic cloud artifacts can be prepared and shared with students.

Students and educators dashboards

The RosettaHUB student and educator dashboards display an access button to the AWS console as well as access keys for programmatic access to AWS. They provide detailed, aggregated, real-time information about the resources being used on AWS, the budget left and the estimated overall hourly cost. Students and educators can request:

1. Limit increases to access higher-capacity instance types (e.g. p2.*, p3.*, g3.* GPU instances)
2. Access to optional AWS services
3. Budget increases and budget transfers to other users
4. Support

37 AWS services are accessible by default. Access to IAM is proxied to preserve account sandboxing. IAM users and IAM roles can be created and managed easily and safely from the dashboard. Limit and budget requests are processed automatically by the RosettaHUB pipelines within a predefined scope. RosettaHUB creates and tracks tickets with AWS support.

Cost optimization and safeguards

• Accounts are automatically disabled and all on-demand EC2 instances are stopped if the user goes above 100% of his/her budget or if the estimated hourly price exceeds the maximum hourly price. Spot instances are snapshotted, then terminated. No data is deleted when a user is disabled.
• Auto-stop on idle EC2 instances: the user can set the maximum idle time or disable this feature. By default it is set to 6 hours.
• Notification emails are sent at 50%, 70%, 90% and 100% of budget consumption.
• Use of Spot instances is promoted in the RosettaHUB launch panels: Spot instances are the first choice when launching instances or clusters.

[Figure: users monitoring panel in the CPOC’s management console]
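The safeguard rules above can be expressed as a few small decision functions. This is a minimal sketch, not RosettaHUB's implementation: the thresholds and the 6-hour default come from the text, while the `Account` fields, the function names and the action labels are assumptions made for illustration. It also assumes a strictly positive budget.

```python
from dataclasses import dataclass
from typing import Optional

NOTIFY_THRESHOLDS = (50, 70, 90, 100)  # percent of budget, from the text
DEFAULT_MAX_IDLE_HOURS = 6             # default auto-stop delay, from the text


@dataclass
class Account:                 # hypothetical structure, for illustration only
    budget: float              # total budget in USD (assumed > 0)
    spent: float               # consumption so far in USD
    hourly_cost: float         # estimated overall hourly cost in USD
    max_hourly_price: float    # ceiling on the estimated hourly cost


def thresholds_reached(account: Account) -> list[int]:
    """Notification thresholds (in percent) the account has already reached."""
    used = 100 * account.spent / account.budget
    return [t for t in NOTIFY_THRESHOLDS if used >= t]


def must_disable(account: Account) -> bool:
    """True when the budget is exceeded or the hourly cost passes its ceiling."""
    return (account.spent > account.budget
            or account.hourly_cost > account.max_hourly_price)


def disable_actions(instances: dict[str, str]) -> dict[str, str]:
    """Map instance id -> action; `instances` maps id -> 'on-demand' or 'spot'."""
    return {
        iid: "stop" if kind == "on-demand" else "snapshot-then-terminate"
        for iid, kind in instances.items()
    }


def should_auto_stop(idle_hours: float,
                     max_idle_hours: Optional[float] = DEFAULT_MAX_IDLE_HOURS) -> bool:
    """Idle auto-stop; passing max_idle_hours=None disables the feature."""
    return max_idle_hours is not None and idle_hours >= max_idle_hours


acct = Account(budget=100, spent=72, hourly_cost=0.9, max_hourly_price=2.0)
print(thresholds_reached(acct))  # [50, 70]
print(must_disable(acct))        # False
```

On-demand instances are only stopped and Spot instances are snapshotted before termination, which is how the text's promise that no data is deleted on disablement is reflected here.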

Full technical and compliance integration

Institutions, educators and students take no financial risk, as all AWS accounts are guaranteed by RosettaHUB. RosettaHUB acts as a procurement adapter: beyond AWS Educate credits, it allows higher education institutions and research laboratories to top up their RosettaHUB institutional account with cloud credits in compliance with their regulatory frameworks and administrative constraints.

A dedicated RosettaHUB infrastructure can be fully integrated with the institution’s information system.

Dedicated RosettaHUB

• Users can authenticate through institutional SAML or Active Directory infrastructures.
• Registration lifecycle management actions can be triggered programmatically by the institution's student management system.
• Notification emails can be customized for the institution, and custom email servers can be used.
• Cloud resource lifecycle management and sharing actions can be scheduled with cron and rate tasks.
• A dedicated marketplace can be used as an institutional sharing platform for pedagogic and research artifacts (files and data, virtual labs, machine and container images, etc.).
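The text does not give the syntax of RosettaHUB's "cron and rate tasks". Assuming they follow the AWS schedule-expression style, in which a rate task is written `rate(value unit)`, a rate expression could be turned into a time interval as sketched below. The function name `parse_rate` is hypothetical.

```python
import re
from datetime import timedelta

# AWS-style rate expression: rate(<positive integer> <unit>), where the unit is
# singular for a value of 1 and plural otherwise. Assumed syntax, see lead-in.
_RATE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")


def parse_rate(expression: str) -> timedelta:
    """Convert an expression such as 'rate(5 minutes)' into a timedelta."""
    match = _RATE.match(expression.strip())
    if match is None:
        raise ValueError(f"not a rate expression: {expression!r}")
    value, unit = int(match.group(1)), match.group(2)
    if value < 1:
        raise ValueError("the rate value must be a positive integer")
    # Reject 'rate(1 hours)' and 'rate(5 hour)': number and unit must agree.
    if (value == 1) == unit.endswith("s"):
        raise ValueError(f"unit does not agree with value: {expression!r}")
    plural = unit if unit.endswith("s") else unit + "s"
    return timedelta(**{plural: value})


print(parse_rate("rate(5 minutes)"))  # 0:05:00
print(parse_rate("rate(1 day)"))      # 1 day, 0:00:00
```

A rate task repeats at a fixed interval, whereas a cron task fires at calendar-defined times. Only the former is sketched here.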

AWS Educate + RosettaHUB vs. AWS Educate vs. AWS Educate Starter Account

Credits consolidation

• AWS Educate + RosettaHUB: The AWS Educate credits of each student and educator are collected and consolidated in a single pot. Institutions can get up to $250,000 of AWS Educate credits, renewable yearly, and can manage them at will. Allocation of budgets is fully flexible.
• AWS Educate: Each student gets $100 and each educator gets $200 of AWS Educate credits, which they use independently. There is no consolidation of credits and no flexibility in budget allocation.
• AWS Educate Starter Account: Each student gets $75 of credits on a restricted account. Educators can’t use starter accounts. There is no flexibility in budget allocation.

Financial risk

• AWS Educate + RosettaHUB: All AWS accounts are consolidated under a single payer AWS account. RosettaHUB guarantees the master account and takes the financial risk if the institution uses more than its AWS Educate credits. Students' data is preserved when they go above their budget.
• AWS Educate: Each student or educator opens his/her own account using a credit card and takes the financial risk if the credits are exhausted or if he/she uses services not covered by AWS Educate.
• AWS Educate Starter Account: Students do not provide a credit card and take no financial risk, but they lose all their data and work on AWS if they go above their credits.

Access to AWS Services

• AWS Educate + RosettaHUB: Students can access all services covered by AWS Educate, which include big data, IoT and ML services. Students can be granted access to any instance type.
• AWS Educate: Students can access all services covered by AWS Educate.
• AWS Educate Starter Account: Students can access only a subset of the services covered by AWS Educate. They don’t have access to essential services such as IAM, to big data services (EMR) or to high-capacity machine instances.

Management of the institution’s accounts

• AWS Educate + RosettaHUB: The institution's managers can monitor all user accounts, increase or decrease their budgets, change their perimeter of action, and block or unblock their access to AWS.
• AWS Educate: The institution has no visibility on the accounts of the students and educators.
• AWS Educate Starter Account: The institution has no visibility on the accounts of the students and educators.

Billing monitoring

• AWS Educate + RosettaHUB: Billing information on each user's account is updated in real time. Notification emails are sent when the user reaches 50%, 70%, 90% and 100% of his/her budget.
• AWS Educate: AWS accounts with AWS credits show billing information only at the end of the month. The user has no information about AWS credit consumption during the month.
• AWS Educate Starter Account: Consumption is reflected on the dashboard with a substantial delay, increasing the risk of overconsumption and of the account being shut down.

• AWS Educate + RosettaHUB: RosettaHUB creates alarms that stop EC2 instances if they remain idle for a number of hours, to avoid credit depletion when the user forgets running instances.
• AWS Educate: The user has to implement his/her own safety measures. No default safety measures come with an AWS account.
• AWS Educate Starter Account: The user has to implement his/her own safety measures. No default safety measures come with an AWS account.

Enrollment process

• AWS Educate + RosettaHUB: Enrollment in AWS, AWS Educate and RosettaHUB is simple and goes through a single form on a website dedicated to the institution. Batch enrollment through Excel files or via an API is also available.
• AWS Educate: Enrollment requires a credit card and an email address linked to the institution, and involves multiple forms and steps.
• AWS Educate Starter Account: Enrollment requires an email address linked to the institution.

AWS as social cloud

• AWS Educate + RosettaHUB: Sharing cloud artifacts is easy. Users can share IAM users, S3 buckets and AMIs by simply specifying the logins or emails of their collaborators, or by adding the users to groups or organizations.
• AWS Educate: Sharing artifacts requires knowledge of IAM policies and S3 bucket policies, obtaining other users' AWS account IDs, and keeping this information up to date when sharing with a group.
• AWS Educate Starter Account: Sharing artifacts requires knowledge of IAM policies and S3 bucket policies, obtaining other users' AWS account IDs, and keeping this information up to date when sharing with a group.
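To make concrete what the comparison means by sharing that "requires knowledge about IAM policies, S3 bucket policies" and other users' AWS account IDs, here is a sketch of the read-only bucket policy a user would have to write by hand on a plain AWS account. The helper name, the bucket name and the account IDs are placeholders, and this is not a description of what RosettaHUB generates internally.

```python
import json


def read_only_bucket_policy(bucket: str, account_ids: list[str]) -> dict:
    """Build an S3 bucket policy granting read access to other AWS accounts."""
    principals = [f"arn:aws:iam::{account_id}:root" for account_id in account_ids]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Listing applies to the bucket itself.
                "Sid": "ListBucketForCollaborators",
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {   # Reading applies to the objects inside the bucket.
                "Sid": "ReadObjectsForCollaborators",
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Placeholder bucket name and 12-digit account IDs.
policy = read_only_bucket_policy("course-datasets", ["111122223333", "444455556666"])
print(json.dumps(policy, indent=2))
```

Every time a collaborator joins or leaves the group, the list of account IDs has to be edited and the policy re-applied to the bucket. That is the bookkeeping the comparison says is replaced by sharing with logins, emails or groups.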

RosettaHUB, state of the art data science platform

Democratic and pervasive data science

The RosettaHUB platform closes the technology gap between clouds, containers, data science software, real-time collaboration frameworks, social portals and people. The RosettaHUB data science platform makes it easy for educators to compose container-based virtual e-learning environments and for researchers to compose virtual e-science environments. Jupyter, RStudio, Spark, Zeppelin, Shiny Apps, virtual desktops, HPC clusters, etc. can be added to the virtual environments and made accessible in a secure and highly scalable manner to thousands of students or collaborating researchers.

Defining the meta-cloud: RosettaHUB Web Services & managed images

RosettaHUB delivers:

• A Docker-based meta-cloud
• A universal data science workbench
• A meta-kernel for data science
• A man-cloud and man-data interaction design
• A sharing model for cloud artifacts
• A SOAP/RESTful API with ~1000 functions
• SDKs and add-ins
• A cloud and data products marketplace

At all layers of interaction between students, educators and researchers and their software tools, infrastructures and peers, RosettaHUB fosters:

• Usability
• Reproducibility
• Shareability
• Auditability


One-click access to AWS-powered data science

The RosettaHUB dashboard displays the cloud and data science related artifacts as customizable icons structured in categories.

• RosettaHUB meta-formations: they enable one-click provisioning of, and access to, fully managed complex infrastructures for e-learning and e-research.
• RosettaHUB meta-keys: they map AWS access keys and a default VPC, they allow rapid access to AWS services, and they can be shared.
• RosettaHUB meta-images: managed images come with agents that orchestrate all service components and expose a composable virtual workbench to the end user; semi-managed images map any EC2 AMI.
• RosettaHUB meta-storages: they map S3 buckets, EFS or EBS volumes. They can be used as the working or reference volumes for managed instances and clusters.

User-friendly Spark and Hadoop clusters for research and education

Launching an EMR cluster can be done in one click by choosing an available formation or by creating a custom formation with custom settings.

• Seamless creation of Hadoop and Spark clusters based on AWS EMR, the RosettaHUB smart proxies and the RosettaHUB workbench.
• Support for both on-demand and Spot.
• Seamless access to clusters with shells and notebooks, including RosettaHUB notebooks, Zeppelin, Jupyter, Spark-Notebook, etc.
• Real-time collaborative access, cluster sharing, security and access control for Hadoop and Spark.
• Seamless data management, seamless mounting of S3 and EFS volumes on master and slave nodes.
• Very rapid prototyping of big data applications using the RosettaHUB reactive programming frameworks, web application designers and spreadsheet engines.

[Figure: accessing the cluster’s master in the browser from the RosettaHUB collaborative workbench]

User-friendly managed HPC for research and education

• Seamless creation of NVIDIA-docker based virtual environments for deep learning on GPU.
• Seamless creation of, and access to, HPC clusters based on Alces Flight (or cfnCluster), the RosettaHUB smart proxies and the RosettaHUB workbench.
• Real-time bird's-eye view of resources, billing and hourly cost for HPC clusters.
• Seamless data management, seamless mounting of S3 and EFS volumes on master and slave nodes.
• Extended support for Spot and autoscaling.
• Out-of-the-box cluster security and access control.
• Notebooks, cluster sharing and real-time collaboration for Alces Flight and cfnCluster.
• Seamless scheduling using cron and rate tasks.
• Interactive scientific web UIs and reactive programming frameworks for HPC clusters.

Launching an HPC cluster can be done in one click by choosing an available formation or by creating a formation with custom settings.

[Figure: RosettaHUB meta-formations. Examples shown: "Deep learning assignments" and "Big data workshop". Formation types shown: Spot Machine, Spot Machine Pool, Spot EMR Cluster, Spot HPC Cluster. Settings shown: Cloud Keys (AWS Keys), SSL certificate, Machine Image (Tensorflow GPU Image), Instance type (p2.xlarge), Maximum Bid Price, Proxy Instance Type, Proxy Image (Standard CPU Image), Master Instance type (m4.large), Slave Instance type (m4.large), Reference and Working Volumes.]

Students and educators persistent workspaces

• RosettaHUB creates for each student and educator a default S3 storage and a default EFS storage, which map an S3 bucket and an EFS volume.
• Formations are configured with working volumes and reference volumes, which can be mappings of EFS, EBS, S3 or FTP. These are automatically mounted on the EC2 instances, including the nodes of HPC and EMR clusters.
• Any public formation that the user launches automatically uses the user’s default EFS as its working volume: data generated by students and educators is persistent and survives the termination of machine instances.
• The reference volume can be synced to the working volume at start-up.

EFS, EBS and S3 volumes can be automatically mounted on the Docker container of the RosettaHUB managed instances.

Universal collaborative workbench

The RosettaHUB meta-formations and images can be used to create RosettaHUB sessions. Sessions provide access to the universal workbench and can be shared with a user or a group of users. Users have the same view of the workbench and can collaboratively create and adjust widgets and interact with tools and data. Composable widgets include:

• Real-time collaborative consoles, notebooks and code editors for the most commonly used data analysis tools: R, Python, Scala, RStudio, etc.
• Application access (Jupyter, Zeppelin, etc.)
• Real-time collaborative RStudio
• Real-time collaborative remote desktop access in the browser
• Data visualization and interaction components such as charts, sliders and buttons

Meta compute kernels & seamless data management

The universal workbench allows remote interactive control of the RosettaHUB meta-kernels, which are created and managed by the RosettaHUB Docker agents. The RosettaHUB meta-kernels are processes merging the virtual machines of Java, R and Python. Meta-kernels allow intercommunication and in-memory transfer of variables from one language to another. Meta-kernel data access is fully managed by RosettaHUB. Meta-kernels can be shared, as can their working volumes and reference volumes.

Semi-managed images

• Semi-managed images allow users to easily launch a machine from the RosettaHUB web console using their RosettaHUB keys.
• Launching semi-managed images can be done in one click from the RosettaHUB dashboard.
• Access to the instances is managed by RosettaHUB, i.e. RosettaHUB generates and saves the private keys associated with the instance, as well as the password for Windows instances.
• Users can retrieve their private keys and passwords at any time.
• Instructions on how to connect to Linux and Windows instances are provided to the user.

Contact: [email protected] RosettaHUB Website: To register a new institution:

For members of registered institutions: