Computational Statistics Handbook with MATLAB® Second Edition

Computer Science and Data Analysis Series

Wendy L. Martinez The Office of Naval Research Arlington, Virginia, U.S.A.

Angel R. Martinez Naval Surface Warfare Center Dahlgren, Virginia, U.S.A.

Chapman & Hall/CRC, Taylor & Francis Group
Boca Raton, London, New York

Chapman & Hall/CRC is an imprint of the Taylor & Francis Group, an informa business

Table of Contents

Preface to the Second Edition
Preface to the First Edition

Chapter 1 Introduction
1.1 What Is Computational Statistics?
1.2 An Overview of the Book
    Philosophy
    What Is Covered
    A Word About Notation
1.3 MATLAB® Code
    Computational Statistics Toolbox
    Internet Resources
1.4 Further Reading

Chapter 2 Probability Concepts
2.1 Introduction
2.2 Probability
    Background
    Probability
    Axioms of Probability
2.3 Conditional Probability and Independence
    Conditional Probability
    Independence
    Bayes' Theorem
2.4 Expectation
    Mean and Variance
    Skewness
    Kurtosis
2.5 Common Distributions
    Binomial
    Poisson
    Uniform
    Normal
    Exponential
    Gamma
    Chi-Square
    Weibull
    Beta
    Student's t Distribution
    Multivariate Normal
    Multivariate t Distribution
2.6 MATLAB® Code
2.7 Further Reading
Exercises

Chapter 3 Sampling Concepts
3.1 Introduction
3.2 Sampling Terminology and Concepts
    Sample Mean and Sample Variance
    Sample Moments
    Covariance
3.3 Sampling Distributions
3.4 Parameter Estimation
    Bias
    Mean Squared Error
    Relative Efficiency
    Standard Error
    Maximum Likelihood Estimation
    Method of Moments
3.5 Empirical Distribution Function
    Quantiles
3.6 MATLAB® Code
3.7 Further Reading
Exercises

Chapter 4 Generating Random Variables
4.1 Introduction
4.2 General Techniques for Generating Random Variables
    Uniform Random Numbers
    Inverse Transform Method
    Acceptance-Rejection Method
4.3 Generating Continuous Random Variables
    Normal Distribution
    Exponential Distribution
    Gamma
    Chi-Square
    Beta
    Multivariate Normal
    Multivariate Student's t Distribution
    Generating Variates on a Sphere
4.4 Generating Discrete Random Variables
    Binomial
    Poisson
    Discrete Uniform
4.5 MATLAB® Code
4.6 Further Reading
Exercises

Chapter 5 Exploratory Data Analysis
5.1 Introduction
5.2 Exploring Univariate Data
    Histograms
    Stem-and-Leaf
    Quantile-Based Plots - Continuous Distributions
    Quantile Plots - Discrete Distributions
    Box Plots
5.3 Exploring Bivariate and Trivariate Data
    Scatterplots
    Surface Plots
    Contour Plots
    Bivariate Histogram
    3-D Scatterplot
5.4 Exploring Multi-Dimensional Data
    Scatterplot Matrix
    Slices and Isosurfaces
    Glyphs
    Andrews Curves
    Parallel Coordinates
5.5 MATLAB® Code
5.6 Further Reading
Exercises

Chapter 6 Finding Structure
6.1 Introduction
6.2 Projecting Data
6.3 Principal Component Analysis
6.4 Projection Pursuit EDA
    Projection Pursuit Index
    Finding the Structure
    Structure Removal
6.5 Independent Component Analysis
6.6 Grand Tour
6.7 Nonlinear Dimensionality Reduction
    Multidimensional Scaling
    Isometric Feature Mapping - ISOMAP
6.8 MATLAB® Code
6.9 Further Reading
Exercises

Chapter 7 Monte Carlo Methods for Inferential Statistics
7.1 Introduction
7.2 Classical Inferential Statistics
    Hypothesis Testing
    Confidence Intervals
7.3 Monte Carlo Methods for Inferential Statistics
    Basic Monte Carlo Procedure
    Monte Carlo Hypothesis Testing
    Monte Carlo Assessment of Hypothesis Testing
7.4 Bootstrap Methods
    General Bootstrap Methodology
    Bootstrap Estimate of Standard Error
    Bootstrap Estimate of Bias
    Bootstrap Confidence Intervals
7.5 MATLAB® Code
7.6 Further Reading
Exercises

Chapter 8 Data Partitioning
8.1 Introduction
8.2 Cross-Validation
8.3 Jackknife
8.4 Better Bootstrap Confidence Intervals
8.5 Jackknife-After-Bootstrap
8.6 MATLAB® Code
8.7 Further Reading
Exercises

Chapter 9 Probability Density Estimation
9.1 Introduction
9.2 Histograms
    1-D Histograms
    Multivariate Histograms
    Frequency Polygons
    Averaged Shifted Histograms
9.3 Kernel Density Estimation
    Univariate Kernel Estimators
    Multivariate Kernel Estimators
9.4 Finite Mixtures
    Univariate Finite Mixtures
    Visualizing Finite Mixtures
    Multivariate Finite Mixtures
    EM Algorithm for Estimating the Parameters
    Adaptive Mixtures
9.5 Generating Random Variables
9.6 MATLAB® Code
9.7 Further Reading
Exercises

Chapter 10 Supervised Learning
10.1 Introduction
10.2 Bayes Decision Theory
    Estimating Class-Conditional Probabilities: Parametric Method
    Estimating Class-Conditional Probabilities: Nonparametric
    Bayes Decision Rule
    Likelihood Ratio Approach
10.3 Evaluating the Classifier
    Independent Test Sample
    Cross-Validation
    Receiver Operating Characteristic (ROC) Curve
10.4 Classification Trees
    Growing the Tree
    Pruning the Tree
    Choosing the Best Tree
    Other Tree Methods
10.5 Combining Classifiers
    Bagging
    Boosting
    Arcing Classifiers
    Random Forests
10.6 MATLAB® Code
10.7 Further Reading
Exercises

Chapter 11 Unsupervised Learning
11.1 Introduction
11.2 Measures of Distance
11.3 Hierarchical Clustering
11.4 K-Means Clustering
11.5 Model-Based Clustering
    Finite Mixture Models and the EM Algorithm
    Model-Based Agglomerative Clustering
    Bayesian Information Criterion
    Model-Based Clustering Procedure
11.6 Assessing Cluster Results
    Mojena - Upper Tail Rule
    Silhouette Statistic
    Other Methods for Evaluating Clusters
11.7 MATLAB® Code
11.8 Further Reading
Exercises

Chapter 12 Parametric Models
12.1 Introduction
12.2 Spline Regression Models
12.3 Logistic Regression
    Creating the Model
    Interpreting the Model Parameters
12.4 Generalized Linear Models
    Exponential Family Form
    Generalized Linear Model
    Model Checking
12.5 MATLAB® Code
12.6 Further Reading
Exercises

Chapter 13 Nonparametric Models
13.1 Introduction
13.2 Some Smoothing Methods
    Bin Smoothing
    Running Mean
    Running Line
    Local Polynomial Regression - Loess
    Robust Loess
13.3 Kernel Methods
    Nadaraya-Watson Estimator
    Local Linear Kernel Estimator
13.4 Smoothing Splines
    Natural Cubic Splines
    Reinsch Method for Finding Smoothing Splines
    Values for a Cubic Smoothing Spline
    Weighted Smoothing Spline
13.5 Nonparametric Regression - Other Details
    Choosing the Smoothing Parameter
    Estimation of the Residual Variance
    Variability of Smooths
13.6 Regression Trees
    Growing a Regression Tree
    Pruning a Regression Tree
    Selecting a Tree
13.7 Additive Models
13.8 MATLAB® Code
13.9 Further Reading
Exercises

Chapter 14 Markov Chain Monte Carlo Methods
14.1 Introduction
14.2 Background
    Bayesian Inference
    Monte Carlo Integration
    Markov Chains
    Analyzing the Output
14.3 Metropolis-Hastings Algorithms
    Metropolis-Hastings Sampler
    Metropolis Sampler
    Independence Sampler
    Autoregressive Generating Density
14.4 The Gibbs Sampler
14.5 Convergence Monitoring
    Gelman and Rubin Method
    Raftery and Lewis Method
14.6 MATLAB® Code
14.7 Further Reading
Exercises

Chapter 15 Spatial Statistics
15.1 Introduction
    What Is Spatial Statistics?
    Types of Spatial Data
    Spatial Point Patterns
    Complete Spatial Randomness
15.2 Visualizing Spatial Point Processes
15.3 Exploring First-order and Second-order Properties
    Estimating the Intensity
    Estimating the Spatial Dependence
15.4 Modeling Spatial Point Processes
    Nearest Neighbor Distances
    K-Function
15.5 Simulating Spatial Point Processes
    Homogeneous Poisson Process
    Binomial Process
    Poisson Cluster Process
    Inhibition Process
    Strauss Process
15.6 MATLAB® Code
15.7 Further Reading
Exercises

Appendix A Introduction to MATLAB®
A.1 What Is MATLAB®?
A.2 Getting Help in MATLAB®
A.3 File and Workspace Management
A.4 Punctuation in MATLAB®
A.5 Arithmetic Operators
A.6 Data Constructs in MATLAB®
    Basic Data Constructs
    Building Arrays
    Cell Arrays
A.7 Script Files and Functions
A.8 Control Flow
    For Loop
    While Loop
    If-Else Statements
    Switch Statement
A.9 Simple Plotting
A.10 Contact Information

Appendix B Projection Pursuit Indexes
B.1 Indexes
    Friedman-Tukey Index
    Entropy Index
    Moment Index
    L2 Distances
B.2 MATLAB® Source Code

Appendix C MATLAB® Statistics Toolbox
    File I/O
    Dataset Arrays
    Grouped Data
    Descriptive Statistics
    Statistical Visualization
    Probability Density Functions
    Cumulative Distribution Functions
    Inverse Cumulative Distribution Functions
    Distribution Statistics Functions
    Distribution Fitting Functions
    Negative Log-Likelihood Functions
    Random Number Generators
    Hypothesis Tests
    Analysis of Variance
    Regression Analysis
    Multivariate Methods
    Cluster Analysis
    Classification
    Markov Models
    Design of Experiments
    Statistical Process Control
    Graphical User Interfaces

Appendix D Computational Statistics Toolbox
    Probability Distributions
    Statistics
    Random Number Generation
    Exploratory Data Analysis
    Bootstrap and Jackknife
    Probability Density Estimation
    Supervised Learning
    Unsupervised Learning
    Parametric and Nonparametric Models
    Markov Chain Monte Carlo
    Spatial Statistics

Appendix E Exploratory Data Analysis Toolboxes
E.1 Introduction
E.2 Exploratory Data Analysis Toolbox
E.3 EDA GUI Toolbox

Appendix F Data Sets
    Introduction

Appendix G Notation
    Overview
    Observed Data
    Greek Letters
    Functions and Distributions
    Matrix Notation
    Statistics

References
Author Index
Subject Index
