Building statistical models by visualization - Microsoft Research

Tom Minka, CMU Statistics Dept

Outline
• Scatterplots – independence, causality
• QQ plots – distribution checking
• Residual plots – linearity, outliers
• Projections for regression – additivity
• Projections for classification – linearity

Football statistics
Is there a better representation?

Team   W   L  T   PF   PA
CIN    2  14  0  279  456
OAK   11   5  0  450  304
GB    12   4  0  398  328
SEA    7   9  0  355  369
JAC    6  10  0  328  315
NO     9   7  0  432  388
KC     8   8  0  467  399
TB    12   4  0  346  196
MIN    6  10  0  390  442
TEN   11   5  0  367  324
CAR    7   9  0  258  302
NYJ    9   7  0  359  336
NE     9   7  0  381  346
STL    7   9  0  316  369
…

Visual independence test
• “Permutation test”
• Randomly pair x values with y values
• If the distribution looks different from the original, the variables are dependent
• No distributional assumptions required
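The shuffling step can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names `pearson` and `permuted_panels` are hypothetical, the data are the W and PF columns of the football table, and the correlation printed at the end is an added numeric summary. The test described on the slide is visual: each panel would be drawn as a scatterplot and compared by eye with the original.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def permuted_panels(xs, ys, n_panels=8, seed=0):
    """Build n_panels datasets in which x keeps its order and y is randomly shuffled."""
    rng = random.Random(seed)
    panels = []
    for _ in range(n_panels):
        shuffled = list(ys)
        rng.shuffle(shuffled)  # breaks any dependence between x and y
        panels.append(list(zip(xs, shuffled)))
    return panels

# Wins (W) and points scored (PF) for the 14 teams listed in the football table.
wins = [2, 11, 12, 7, 6, 9, 8, 12, 6, 11, 7, 9, 9, 7]
points_for = [279, 450, 398, 355, 328, 432, 467, 346, 390, 367, 258, 359, 381, 316]

panels = permuted_panels(wins, points_for)
original_r = pearson(wins, points_for)
shuffled_r = [pearson(*zip(*panel)) for panel in panels]

print(f"original r = {original_r:.2f}")
print("shuffled r =", [f"{r:.2f}" for r in shuffled_r])
```

If the original scatterplot cannot be told apart from the shuffled panels, the data give no visible evidence of dependence.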

Comparing distributions
• “Quantile-quantile plot”
• A pseudo-scatterplot for unpaired data
• Quantile of x = fraction of points < x
• Plot quantile q in one set against quantile q in the other set, for all q
• Tells you how to transform one variable to have the distribution of the other
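A rough sketch of how the points of a quantile-quantile plot can be computed for two samples of different sizes. The helper names `quantile` and `qq_points`, the evenly spaced grid of q values, and the linear interpolation between order statistics are all assumptions made for this illustration; other quantile conventions would work equally well.

```python
def quantile(sorted_vals, q):
    """Value at quantile q (0 <= q <= 1), linearly interpolated between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def qq_points(a, b, n_q=9):
    """Pair quantile q of sample a with quantile q of sample b on an even grid of q."""
    sa, sb = sorted(a), sorted(b)
    qs = [(i + 1) / (n_q + 1) for i in range(n_q)]
    return [(quantile(sa, q), quantile(sb, q)) for q in qs]

# Two unpaired samples of different sizes (made-up numbers for illustration).
a = [3.1, 0.4, 2.2, 5.0, 1.7, 4.3, 2.9]
b = [7.9, 2.1, 5.5, 11.2, 4.0]

for xa, xb in qq_points(a, b):
    print(f"{xa:6.2f}  {xb:6.2f}")
```

If the plotted points fall on a straight line, the two distributions differ only by a shift and a scale; the shape of the curve is the transformation the last bullet refers to.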

Regression models
How to do regression visually:
• Transform to make the picture simpler
• Fit a simple model
• Use residuals to suggest more complex models, outliers to remove
• Iterate
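One pass through this loop can be sketched as follows, assuming ordinary least squares on a single predictor and a made-up curved dataset. The names `fit_line` and `residuals` are hypothetical; in practice the residuals would be plotted against x rather than printed.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

def residuals(xs, ys, intercept, slope):
    """Observed y minus the fitted line at each x."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Made-up data with curvature: y = x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [x ** 2 for x in xs]

# Step 1: fit a simple model (a straight line) and inspect the residuals.
b0, b1 = fit_line(xs, ys)
res_linear = residuals(xs, ys, b0, b1)
print("linear fit residuals:", res_linear)

# Step 2: the U-shaped residuals suggest transforming the predictor, then refitting.
xs_sq = [x ** 2 for x in xs]
c0, c1 = fit_line(xs_sq, ys)
res_transformed = residuals(xs_sq, ys, c0, c1)
print("after transforming x:", res_transformed)
```

For this dataset the straight-line residuals are 5, 0, -3, -4, -3, 0, 5: positive at both ends and negative in the middle, which is the pattern that suggests a missing quadratic term. After the transformation the residuals vanish.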

High-dimensional data
• Two basic approaches to visualization
• Many points, few dimensions:
  – Projection
  – Slicing
• Few points, many dimensions:
  – Parallel-coordinates
  – Iconic displays

Projections
• PCA
  – Maximize the spread of the projected data
• Regression projection
  – Project only the predictors (inputs)
  – Maximize the spread of the response (output)
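The PCA case can be illustrated with a small sketch that finds the single direction of maximum spread by power iteration on the sample covariance matrix. This covers only PCA, not the regression projection. The function names and the toy data are assumptions for illustration, and power iteration is just one convenient way to get the leading eigenvector; it can fail if the starting vector happens to be orthogonal to that eigenvector.

```python
def covariance_matrix(rows):
    """Column means and sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return means, cov

def first_principal_direction(rows, n_iter=200):
    """Unit vector along which the projected data have maximum variance."""
    means, cov = covariance_matrix(rows)
    d = len(cov)
    w = [1.0] * d  # arbitrary starting vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    return means, w

def project(rows, means, w):
    """One-dimensional coordinates of the centered rows along direction w."""
    return [sum((r[j] - means[j]) * w[j] for j in range(len(w))) for r in rows]

# Toy data lying close to the line y = 2x.
data = [(0.0, 0.1), (1.0, 1.9), (2.0, 4.2), (3.0, 5.8), (4.0, 8.1)]
means, w = first_principal_direction(data)
print("direction:", w)
print("projected:", project(data, means, w))
```

For these points the recovered direction is close to (1, 2) normalized to unit length, as expected for data scattered around y = 2x.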

Boston Housing data
• Predict the median house value in Boston census tracts, based on crime, poverty, industry, pollution, etc.
• A regression problem with many predictors
• Is an additive model appropriate?

Discriminative projections
• M-projection (Fisher, LDA)
  – Tries to separate means of classes
• V-projection
  – Tries to separate variances of classes
• MV-projection
  – Maximize KL divergence between Gaussians
  – Separates means and variances
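As an illustration of the mean-separating case only, here is a sketch of the standard two-class Fisher discriminant direction in two dimensions, w ∝ S_w⁻¹(m_b − m_a), where S_w is the pooled within-class scatter. The slides do not give the exact formulation of the M-projection, so this textbook form is an assumption; the V- and MV-projections are not covered, and the function names and toy data are hypothetical.

```python
def mean_vec(rows):
    """Mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

def scatter(rows, m):
    """2x2 scatter matrix: sum of outer products of the centered points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Unit vector proportional to inverse(S_w) applied to (mean_b - mean_a)."""
    ma, mb = mean_vec(class_a), mean_vec(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]  # assumed non-zero
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mb[0] - ma[0], mb[1] - ma[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    return [w[0] / norm, w[1] / norm]

# Toy classes that differ only in their first coordinate.
class_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
class_b = [(4.0, 0.0), (4.0, 1.0), (5.0, 0.0), (5.0, 1.0)]

w = fisher_direction(class_a, class_b)
proj_a = [p[0] * w[0] + p[1] * w[1] for p in class_a]
proj_b = [p[0] * w[0] + p[1] * w[1] for p in class_b]
print("direction:", w)
print("class a:", proj_a)
print("class b:", proj_b)
```

For this toy data the direction is the first coordinate axis, and the two classes do not overlap once projected onto it.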

Sonar problem
• Sonar echo is represented by energy in 60 frequency bands
• Mines vs. rocks
• Dataset is linearly separable, but 1-NN consistently beats linear classifiers

Vowels dataset
• Binary problem: “hid” vs. rest
• k-NN and quadratic kernel beat linear

Online digit recognition
• Classify “8” vs. rest
• k-NN beats quadratic kernel, which beats linear

Some good books
• “The Elements of Graphing Data”, William Cleveland, 2nd Ed.
• “Visualizing Data”, William Cleveland
• “The Visual Display of Quantitative Information”, Edward Tufte
• “Exploratory Data Analysis”, John Tukey

Summary
• Visualization is a simple and fast way to check model assumptions and learn about a domain
• Many opportunities still exist to design better graphs, esp. for high dimensions
• Visualization is not “art”, but a well structured field, worthy of research attention