The QuantCell Big Data Spreadsheet - Meetup

The QuantCell Big Data Spreadsheet

Agust Egilsson, PhD

Big Data Science Saturday, March 23, 2013

* Image cropped from article about QuantCell Research in Java Magazine JULY/AUGUST 2012

We will talk about …

• Java based programming for big data scientists & end-users
• How the experts benefit directly from open source Java APIs
• Transferring JVM performance to the expert or programmer
• Deployment of solutions into production
• Big Data analytics created and consumed by end-users
• Long running operations, multithreading and garbage collection
• Simplifying coding for the data scientists
• Questions

Java based programming for big data scientists & end-users

Why spreadsheets for data-scientists and domain experts?

• shorter turnaround times (e.g. financial products)
• dynamic execution, debugging and testing
• integrated runtime and development environments
• experiment driven programming
• expression-oriented programming
• minimum or no GUI design
• by far the most widely used programming system

Java based programming for big data scientists & end-users

Why Java?

• large ecosystem of analytical tools and resources
• explosive growth in publicly available APIs
• concurrency support
• big data analytics & technologies are mostly Java based
• HPC and cloud ready
• performance
• optimization

Java based programming for big data scientists & end-users

The QuantCell big data spreadsheet supports

• high performance and access to Hadoop clusters
• intuitive access to local and remote data-sources
• access to a variety of algorithms and methods
• simplified programming already familiar to the expert
• effortless deployment of solutions to Hadoop and into production

Java based programming for big data scientists & end-users

Common use cases include

• big data analytics
• data mining using Mahout, Weka, etc.
• risk analysis, pricing and trading strategies

Live demo: Simple Java spreadsheet expressions: Data Market, Bio Data and simple analysis.

How the experts benefit directly from open source Java APIs

Explosive growth in publicly available Java analytical and visualization libraries

How the experts benefit directly from open source Java APIs

Explosive growth in publicly available Java analytical and visualization libraries

For example:

• OpenGamma (695,000 lines)
• Weka (507,000 lines)
• RapidMiner/YALE (535,000 lines)
• BioJava (270,000 lines)
• Chemistry Development Kit (861,000 lines)
• NASA WorldWind (420,000 lines)
• and so on …

How the experts benefit directly from open source Java APIs

Same is true for analytical Java based frameworks

• Apache Hadoop (2,200,000 lines – Java and XML)
• Apache Pig (320,000 lines, analyzing large datasets)
• Apache Hive (420,000 lines, data warehousing)
• …

Taking advantage of these libraries from the spreadsheet is simple and in many cases possible by non-developers.

Live demo: OpenGamma Financial API example.

Transferring JVM performance to the expert or programmer

Analytical projects require top performance

• expressions are compiled to byte code
• expressions are optimized by Java
• dynamically loaded into the JVM for execution
• just-in-time compilation
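
The compile-and-load pipeline listed above can be illustrated with the standard javax.tools API. This is only a minimal sketch, not QuantCell's implementation: the class names ExpressionCompiler and CellExpression, the method evaluate, and the convention that an expression is a Java expression over a single double variable x are all assumptions made for the example. It needs a JDK (not just a JRE) at run time.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class ExpressionCompiler {

    /**
     * Wraps a Java expression over the variable x in a generated class
     * (hypothetical name CellExpression), compiles it to byte code,
     * loads it into the running JVM and evaluates it.
     */
    public static double evaluate(String expression, double x) {
        String className = "CellExpression";
        String source =
            "public class " + className + " {\n"
          + "    public static double eval(double x) {\n"
          + "        return " + expression + ";\n"
          + "    }\n"
          + "}\n";

        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler found; a JDK is required");
        }
        try {
            // Step 1: write the generated source to a temporary directory
            // (left on disk here; a real system would clean up or compile in memory).
            Path dir = Files.createTempDirectory("cells");
            Path file = dir.resolve(className + ".java");
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));

            // Step 2: compile to byte code; a non-zero status means failure.
            int status = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
            if (status != 0) {
                throw new IllegalArgumentException("Could not compile: " + expression);
            }

            // Step 3: load the class dynamically and invoke it via reflection.
            // From here on the JVM treats it like any other code, including JIT compilation.
            try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
                Class<?> cls = loader.loadClass(className);
                Method eval = cls.getMethod("eval", double.class);
                return (Double) eval.invoke(null, x);
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("Math.sqrt(x) + 1", 16.0)); // prints 5.0
    }
}
```

Because the generated class is ordinary byte code, a hot expression is eligible for the same just-in-time optimizations as hand-written Java.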

Live demo: Let’s look at a few Java optimization tricks and confirm that these are used dynamically in the spreadsheet to optimize user expressions/functions
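
Outside the spreadsheet, the effect of just-in-time compilation can be observed with a plain timing loop. The sketch below is an assumption-laden illustration, not the code from the demo: the class JitTiming and the method accumulate are hypothetical names, and the iteration count is arbitrary. The result of each call is printed so that the loop cannot simply be discarded as dead code.

```java
public class JitTiming {

    /** Adds i * 0.5 to the seed for each i in [0, iterations) and returns the sum. */
    public static double accumulate(double seed, int iterations) {
        double sum = seed;
        for (int i = 0; i < iterations; i++) {
            sum += i * 0.5;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Several rounds: early rounds may include interpretation and compilation time.
        for (int round = 0; round < 5; round++) {
            long start = System.nanoTime();
            double result = accumulate(1.0, 50_000_000);
            long elapsed = System.nanoTime() - start;
            System.out.printf("round %d: %.1f ms (result %.1f)%n",
                    round, elapsed / 1e6, result);
        }
    }
}
```

Running it once as `java JitTiming` and once as `java -Xint JitTiming` (the HotSpot flag that forces interpreted-only execution) gives two sets of timings to compare; the actual numbers depend on the machine and JVM version.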

Transferring JVM performance to the expert or programmer

Let’s run code turning interpreted mode on and off (-Xint) and play with the expression to eliminate optimization

double (c2) = {
    long start = System.nanoTime();
    double add = c2;
    for (int i = 0; i