Flywheel: Google's Data Compression Proxy for the Mobile Web
Victor Agababov, Michael Buettner, Victor Chudnovsky, Mark Cogan, Ben Greenstein, Shane McDaniel, Michael Piatek, Colin Scott*, Matt Welsh, Bolian Yin
[email protected]
[email protected]
*Currently a graduate student at UC Berkeley
Dominant Access Tech: Mobile
• Mobile devices are increasingly dominant
• Growth is greatest in emerging markets
Mobile Data is Expensive
Emerging markets have highest costs!
Source: blog.jana.com
Flywheel
• Flywheel: proxy service that optimizes HTTP response size
• Three years of deployment experience; part of Chrome for Android, iOS, and desktop
• Currently serving millions of users and billions of requests per day
What Flywheel Does
• Transcode to WebP
• Pick quality level
• Minify CSS, JS
• GZip text objects
Total bytes: 9764
Total bytes: 5565
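As a rough illustration of the text-side optimizations listed on this slide, the sketch below strips comments and whitespace from a small CSS string and then gzips it, reporting the byte savings at each step. The `naive_minify_css` helper and the sample stylesheet are hypothetical; production minifiers and Flywheel's own pipeline are far more careful, and image transcoding to WebP is not shown.

```python
import gzip
import re


def naive_minify_css(css: str) -> str:
    """Hypothetical, deliberately naive minifier: drop comments, collapse whitespace."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # remove /* ... */ comments
    css = re.sub(r"\s+", " ", css)                         # collapse runs of whitespace
    css = re.sub(r"\s*([{};:,])\s*", r"\1", css)           # trim space around punctuation
    return css.strip()


def savings_percent(original: bytes, optimized: bytes) -> float:
    """Percentage of bytes saved: (1 - optimized / original) * 100."""
    return (1 - len(optimized) / len(original)) * 100


# Illustrative input only; repeated so gzip has redundancy to exploit.
SAMPLE_CSS = """
/* Illustrative stylesheet, not taken from any real page */
body {
    margin: 0;
    padding: 0;
}
.header, .footer {
    color: #333333;
    background-color: #ffffff;
}
""" * 20

original = SAMPLE_CSS.encode("utf-8")
minified = naive_minify_css(SAMPLE_CSS).encode("utf-8")
compressed = gzip.compress(minified)

print(f"original: {len(original)} bytes")
print(f"minified: {len(minified)} bytes ({savings_percent(original, minified):.0f}% saved)")
print(f"gzipped:  {len(compressed)} bytes ({savings_percent(original, compressed):.0f}% saved)")
```

If the two totals on the slide are read as before and after optimization, the same formula gives (1 - 5565 / 9764) * 100, or roughly 43% saved for that example page.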
None of our optimizations are novel
What lessons did we learn from building and operating Flywheel?
Key lessons:
• Highly challenging to maintain good performance
• Tussles are pervasive, ongoing, & time-consuming
Outline
• Is a proxy really needed?
• What’s hard about engineering Flywheel?
• Does Flywheel meet our goals?
• What can be learned?
The web isn’t well optimized for mobile
• Case in point: 42% of HTML response bytes are not compressed
Hard to keep up with best practices
• New optimizations (WebP, SDCH, HTTP/2, ...) are rolled out as often as every 6 weeks
• Heterogeneity of mobile devices is increasing
Need: an optimizing service for the mobile web
Need: an optimizing compiler for the mobile web
Design Constraints
• Opt-in deployment model
• Transparent to users
• HTTP only (No HTTPS)
Challenge: trading off latency vs compression
Indirection through Flywheel often increases RTT
[Figure: round-trip time to the HTTP origin via Flywheel (RTT') compared with the direct path (RTT); RTT' > RTT]
Network latency is the dominant performance factor (page size is not a dominant factor!)
Flywheel Design
[Figure: Flywheel architecture in a Google datacenter. Components shown: proxy, cache, fetch router, fetch bots, optimization services, and the HTTP origin. A GET / request flows in and a 200 response flows back.]
• Fetch router maintains connection affinity
• Optimizations: image transcoding, GZip, minification, …
• Separate optimization services for isolation and provisioning
Selective Proxying
Indirection through Flywheel often increases RTT
• Fetch objects on the critical path from the origin (e.g., GET /)
• Proxy objects that yield high data reduction (e.g., GET /i.jpg)
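The routing rule on this slide can be sketched as a small decision function. This is a minimal illustration, not Flywheel's actual policy: the `route` function, the use of the URL's file extension to guess the content type, the choice of images as the "high data reduction" class, and the default of fetching everything else from the origin are all assumptions.

```python
import mimetypes


def route(url: str, on_critical_path: bool) -> str:
    """Return "origin" to fetch directly, or "proxy" to fetch through the proxy.

    Hypothetical policy: critical-path objects avoid the extra round trip,
    while images (assumed here to compress well) go through the proxy.
    """
    if on_critical_path:
        return "origin"
    content_type, _ = mimetypes.guess_type(url)  # guess from the file extension
    if content_type is not None and content_type.startswith("image/"):
        return "proxy"
    return "origin"  # assumed default for everything else


print(route("http://example.com/", on_critical_path=True))
print(route("http://example.com/i.jpg", on_critical_path=False))
```

Keeping the decision in one pure function makes it easy to change the policy, for example by adding a size threshold, without touching the fetch path.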
Challenge: Accommodating Tussles
Need to be policy-neutral
Mechanism: HTTP fallback (blockable canary requests)
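One way to read "blockable canary requests" is that the client probes a well-known URL and changes behavior when a network operator blocks it. The sketch below illustrates only that general pattern. The canary URL, the `choose_transport` function, and the two mode names are hypothetical, and the slides do not specify what the client does in each mode.

```python
from typing import Callable

# Hypothetical canary URL; the real endpoint is not given in the slides.
CANARY_URL = "http://canary.example/check"


def choose_transport(probe: Callable[[str], bool]) -> str:
    """Pick a mode based on whether the canary request succeeds.

    `probe` is any callable that returns True when the canary URL is
    reachable and its response is as expected. Injecting it keeps this
    sketch free of real network calls.
    """
    try:
        reachable = probe(CANARY_URL)
    except OSError:
        reachable = False  # treat network errors like a blocked canary
    return "default" if reachable else "http-fallback"


# A network that blocks the canary pushes the client into fallback mode.
print(choose_transport(lambda url: False))
```

The design point is that the operator needs no cooperation from the service: blocking a single URL is enough to express the local policy.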
Evaluation
• This talk:
  • Primary goal: reduce web page size
  • Secondary goal: maintain good performance
• Paper:
  • Fault tolerance
Workload: Geographic Adoption

Country      Adoption
Worldwide    10.5%
Brazil       17%
Russia       16.5%
Indonesia    16.3%
Mexico       15.5%
USA          9.5%

Adoption is highest in developing markets
Workload: Page Footprints
• Majority of pages are small
• 97% of bytes come from the largest 5% of pages
Data Reduction
Savings = (1 - outgoing bytes / incoming bytes) * 100

Type         % of Bytes   Savings   Share of Benefit
Total        100%         58%       -
Images       74.12%       66.40%    85%
HTML         9.64%        38.43%    6%
JavaScript   9.10%        41.09%    6%
CSS          1.81%        52.10%    2%
Other        5.33%        9.23%     1%
Images are the bulk of bytes and savings
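The "Share of Benefit" column appears to be each type's saved bytes as a fraction of all saved bytes. That reading is an assumption, but the short calculation below shows that it reproduces both the 58% total and the per-type shares from the table above.

```python
# (share of incoming bytes, savings) per content type, as fractions,
# copied from the Data Reduction table.
TABLE = {
    "Images": (0.7412, 0.6640),
    "HTML": (0.0964, 0.3843),
    "JavaScript": (0.0910, 0.4109),
    "CSS": (0.0181, 0.5210),
    "Other": (0.0533, 0.0923),
}

# Bytes saved per type, expressed as a fraction of all incoming bytes.
saved = {name: share * savings for name, (share, savings) in TABLE.items()}

total_savings = sum(saved.values())
share_of_benefit = {name: s / total_savings for name, s in saved.items()}

print(f"Total savings: {total_savings * 100:.0f}%")
for name, benefit in share_of_benefit.items():
    print(f"{name:<11} share of benefit: {benefit * 100:.0f}%")
```

Because images account for about three quarters of the bytes and also compress best, they dominate the total even though CSS has a higher savings rate than HTML or JavaScript.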
Reduction Across Users
• Median data reduction: 50%
• Overall data reduction: 27%
Performance: Methodology
• Goal: don’t degrade performance
Compare:
• Random sampling of Flywheel users
• Holdback experimental group
Simple Model of Page Load Time
Load time of subresource s_i = propagation delay + transmission delay + computation time
Critical path = longest chain of dependent subresources
Page load time = Σ s_i on critical path
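The model can be made concrete by treating a page as a dependency graph and finding the longest chain of load times. The sketch below does this for a made-up page; the resource names, load times, and dependencies are hypothetical and serve only to show how the critical path and page load time fall out of the definitions.

```python
from functools import lru_cache
from typing import List

# Hypothetical page: per-resource load time s_i in seconds
# (propagation + transmission + computation) and its dependencies.
LOAD_TIME = {
    "index.html": 0.30,
    "app.js": 0.20,
    "style.css": 0.10,
    "data.json": 0.25,
    "logo.png": 0.15,
}
DEPENDS_ON = {
    "index.html": [],
    "app.js": ["index.html"],
    "style.css": ["index.html"],
    "data.json": ["app.js"],
    "logo.png": ["style.css"],
}


@lru_cache(maxsize=None)
def finish_time(resource: str) -> float:
    """Earliest time a resource finishes: it starts when its last dependency ends."""
    start = max((finish_time(dep) for dep in DEPENDS_ON[resource]), default=0.0)
    return start + LOAD_TIME[resource]


def page_load_time() -> float:
    """Sum of s_i along the critical path, i.e. the latest finish time."""
    return max(finish_time(r) for r in LOAD_TIME)


def critical_path() -> List[str]:
    """Walk back from the last-finishing resource through its slowest dependency."""
    node = max(LOAD_TIME, key=finish_time)
    path = [node]
    while DEPENDS_ON[node]:
        node = max(DEPENDS_ON[node], key=finish_time)
        path.append(node)
    return path[::-1]


print(critical_path(), page_load_time())
```

In this example, shrinking `logo.png` would not change the page load time at all, because it is not on the critical path.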
Page Load Time
Flywheel improves performance only in the tail

Quantile (page loads)   Holdback (s)   Flywheel (s)
Median                  2.08           2.21
70th                    3.68           3.78
80th                    5.38           5.37
90th                    9.22           8.95
95th                    14.61          13.89
99th                    39.38          36.13
Why is this?
• Recall our workload:
  • Most pages are small: propagation delay is the dominant factor
  • Long tail of large pages: transmission delay is the dominant factor
• Good way to understand propagation delay: time to first byte (TTFB)
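A back-of-the-envelope calculation shows why the same proxy can slow small pages and speed up large ones. Every number in the sketch below (RTTs, bandwidth, number of round trips, reduction ratio, page sizes) is hypothetical and chosen only to illustrate the trade-off; it ignores computation time and any real protocol behavior.

```python
def load_time(page_bytes: int, rtt_s: float, bandwidth_bps: float,
              round_trips: int = 4) -> float:
    """Propagation delay (round trips * RTT) plus transmission delay (bits / bandwidth)."""
    propagation = round_trips * rtt_s
    transmission = page_bytes * 8 / bandwidth_bps
    return propagation + transmission


# Hypothetical parameters, not measurements.
DIRECT_RTT = 0.10   # seconds, direct path to the origin
PROXY_RTT = 0.13    # seconds, indirect path through the proxy (RTT' > RTT)
BANDWIDTH = 1e6     # bits per second on the mobile link
REDUCTION = 0.5     # fraction of bytes removed by the proxy

for name, size in [("small page", 20_000), ("large page", 2_000_000)]:
    direct = load_time(size, DIRECT_RTT, BANDWIDTH)
    proxied = load_time(int(size * (1 - REDUCTION)), PROXY_RTT, BANDWIDTH)
    print(f"{name}: direct {direct:.2f}s, proxied {proxied:.2f}s")
```

With these inputs the small page is slightly slower through the proxy, because the added round-trip time outweighs the bytes saved, while the large page is much faster, because transmission delay dominates.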
Time to First Byte
Most users’ direct path to the origin is shorter than the indirect path

Quantile (requests)   Holdback (s)   Flywheel (s)
Median                0.19           0.30
70th                  0.36           0.49
80th                  0.55           0.69
90th                  1.00           1.16
95th                  1.69           1.90
99th                  5.06           5.81
Lessons Learned
Many performance optimizations we expected to have impact did not at Web scale
Disappointing Optimizations
Preconnect: open TCP connection with origin early
• Increases reused connections from 73% to 80%
• Yields less than 2% decrease in median PLT
Already a strong tendency for connection affinity
Disappointing Optimizations
Prefetch: request cacheable subresources early
• Increases cache hit ratio from 22% to 32%
• Yields less than 2% decrease in median PLT
Cacheable items often aren’t on the critical path
Lessons Learned
Improving PLT is highly challenging
• Compression doesn’t help (much)
• Difficult to target critical path
Lessons Learned
If you want widespread adoption, you must accommodate policy issues!
Lessons Learned
Many more measurement findings and lessons in the paper!
Conclusion
• Flywheel shows it’s possible to provide 58% average HTTP data reduction at web scale
• Data reduction is the easy part
• Maintaining performance and accommodating tussles are the hard part
lmgtfy.com/?q=Chrome+Data+Saver
[email protected]
[email protected]