EMR Operability. Barnum does not persist clusters. 1. Start job: create new cluster. 2. Sync data from S3. 3. Run mapred
lectures for a modern theoretical course in computer science. ..... the
mathematical foundations rather than dwell on particular applications which are
...... easy to see that P is the image under an invertible linear transformation of a
rectangul
Etsy guaranteed active buyers that their credit card information would never be seen by the sellers they do business wit
Find and save ideas about Marketing pdf Making Money Online In ... Selling Success Third New Product Ideas Software Web
Oct 20, 2004 - Primary flight control software of the Airbus A340/A380 fly-by-wire system. ⺠C program, automatically generated from a propri- etary high-level ...
She followed in the path of Anne Ford, scion of the elite automobile for- tune; Ford's experience ... As they noted in a recent listserv email intended to mobi- lize parents and .... ble disability and the attention it mobilized is a reminder of how
Dec 16, 1987 - and Jim Cox and Gerson Levin of Brooklyn College. Discussions ...... Jonathan Rees, Norman I. Adams IV, and James R. Meehan. The t man-.
dependencies in production functions and industry structure, firm profits, and industry average ... When interdependencies are great, even small errors.
two with the greatest power to effect change â government, pursuing new laws and regulations ..... search engine, say for its free email services. Here ... avoid wholesale aggregation and profiling, advertising and marketing purposes being the.
Vishal Gaur. Department of Information, Operations, and Management Science, Leonard N. Stern School of Business,. New York University, 44 West 4th Street, ...
Duke University, Curham N.C., in 1968 and a ... facing hardware devices involve some form of pro- ing about ..... M.E. (Mutual Exclusion) is an interlock element.
Michael J. Lenox, Scott F. Rockart, Arie Y. Lewin. Fuqua School of Business, Duke University, ..... (Lippman and Rumelt 1982). In particular, we assume that this ...
(We will call top performers those firms that had above ..... SEM. Mean. SEM. H1. CommerceAlliance. 3.28. 0.22. 3.01. 0.28. 3.62. 0.35. 2.96. 0.33. 3.80. 0.55.
dren, often taking their insights beyond the home, and contributing more broadly to new ..... Consider the example of one of our research subjects, Dana Buchman, ..... the social worker predicted, the second meeting went smoothly and appro-.
(Etsy SEO, Ebay, Making ... Etsy businesses, Etsy Beginner Ideas) pdf download Etsy: Etsy Business: 50 Beginner Success
Dec 28, 2010 - Email: [email protected]. Science, Technology .... vism: they welcome the resources and sometimes the opportunities for the ethical ''modest ... many months to complete by hand quickly went to semiautomated and then.
Feb 15, 1999 - n=1. (yn â pxn)2. We rewrite the expression in terms of column N-vectors as: min .... points to the fitted function, as measured along a particular axis direction. ... ular to a unit vector Ëu, and the optimization problem may thus
The ability to execute the same application code on hardware with different ... Path B. 50% performance loss ... Not tru
When the baby cried, the parents rushed into the room. â ACE Relation: ... Mary left the room. This upset her .... Def
Aug 2, 2010 - Courant Institute of Mathematical Sciences ... It is hoped that this book will prepare readers to be able
The NYU Doctoral Program brings together the faculty and resources of the NYU Graduate School, the NYU School of Medicine, and Mount Sinai. Learn More ...
best play on all sides. What is the ... Several readers reported sites that give minimal. 17-clue ... Grossman and Ken Z
rules of behavior for people, hospitals, on-site responders and ambulances, with- out depending .... located respectively in Union Square, Times Square and Central Park. The ex- ... In this research work we have shown that a deep analysis of.
We formally identify the post operator of a boolean heap program as an .... appropriate component function when a symbol is interpreted in a state. ... a logic given with respect to the signature of program variables Var and ... s = s[x â¦â [[e]]
Etsy's data science infrastructure. The Precision / Recall tradeoff. Building products on the web using data science. Da
Practical Data Science @ Etsy Dr. Jason Davis
About me Ph.D. Machine learning & data mining Entrepreneur. Founder of ad startup Adtuitive Engineering Director @ Etsy. Search & Data
Topics Etsy's data science infrastructure The Precision / Recall tradeoff Building products on the web using data science Data science statistics Analyzing products with data
Etsy's data science infrastructure
Mapreduce and Elasticity Scalability
Operability
Elastic MapReduce (EMR) @ Etsy
EMR Scalability 0 to 1,000 machines to 0. Instantly. Simplified job scheduling
Easy to "backfill" and run expensive jobs.
EMR Operability Barnum does not persist clusters
1. 2. 3. 4.
Start job: create new cluster Sync data from S3 Run mapreduce analysis Shutdown cluster and sync results to S3
No operability support required
Precision vs. Recall
Precision vs. Recall
Recall = |Target AND Algorithm | / | Target | Precision = |Target AND Algorithm | / | Algorithm |
Academic Diversion: Distance Metric Learning via LogDet Divergence
Goal: learn distance measure to respect constraints
Academic diversions (cont.)
Recall vs. Precision
Adtuitive: High precision retail matching
Precision correlates with database size Scale the db & simple measures work well
Etsy.com search: relevancy vs recency
Search on Etsy: relevancy vs recency Buyer: "I want to see bananas" Seller "I want to see my bananas" Buyers: prefer high precision Sellers: prefer high recall
Building products with data
SELECT * FROM listings WHERE title LIKE '%silver%' ORDER BY creation_tsz;
Recall domination
I was looking for a pink bonnet with blue polka dots. I searched for "polka dot bonnet" but all I got were vintage polka records. Then I wanted a classic rock album and searched for "rock and roll" and got nothing but circular stones. http://www.etsy.com/blog/news/2007/search-what-you-wanted-and-what-you-got/
The switch to search relevancy
The switch to relevancy Better buyer experience ● Higher precision via better search algorithms Better advertising for sellers ● Opportunity to pay for recall via Search ads Better search insight for sellers ● Visibility into search placement via Shop Stats
Step 1: Understand your data
Dig through raw data Garbage in, garbage out Visit logs, Raw query logs
80% of data science is data cleaning!
Descriptive statistics Goal: reject hypotheses as quickly as possible search depth query conversion rates
Step 2: Invest in tools
Lunr
Good tools provide visibility into the intersection of algorithms & data
Side by side testing
A/B Analyzer
Shop Stats
Step 3: Understand Tradeoffs
(most tradeoffs don't look like this)
Queries a, b, & c are better. Queries x & y are not.
Thinking short-term vs. long-term How will the change affect behavior? Affect expectations? User incentives? Improved data?
Conversion rate: # of purchases / # of visits Registration rate: # of registrations / # of visits Funnel completion rates
Scenario: ● ● ●
OMG, I just deployed unicorns to the site... The registration rate increased from 2.53% to 2.58% Where's the champagne?
The binomial distribution
Completion rates are binomial ● Visitor completes task (success) ● Visitor does not complete task (failure) One parameter: p = probability of success Other fun facts ● The binomial distribution is not the normal distribution ● 0 less data ● Not symmetric Example ● success = 6, trials = 100 ● interval: [0.022, 0.126]
Example: Unicorn Search Experiment setup: 50/50 unicorns / non-unicorns Hypothesis: unicorns increase conversion rate by 2% Assumptions ● 500k visits with search per day ● Baseline conversion is 2.0% ● Hypothesis conversion is 2.04%
Example: Unicorn Search After 7 days ● 3.5M search visits, 1.75M per bucket ● Non-unicorns: 35,000 conversions ○ [0.0198, 0.0202] ● Unicorns: 35,700 conversions ○ [0.0202, 0.0206] Conclusion: We need an entire week to measure this improvement (!)
Analyzing products with data
Web analytics 101 Page view Visit Visitor Bounce Referer
Funnels
Example: search funnel
Search => View listing => Add to cart
The product lifecycle
Controlled Experiments
Controlled Experiments
Experimental Steps 1. Think of something awesome; implement it 2. Release as controlled experiment to small percentage of users 3. Wait a week 4. Analyze on-site behavior comparing control vs test group 5. Apply statistics and determine if the change is a win or a loss