preface xiii
         1 introduction 1
         data analysis 1
         what’s in this book 2
         what’s with the workshops? 3
         what’s with the math? 4
         what you’ll need 5
         what’s missing 6
         part i graphics: looking at data
         2 a single variable: shape and distribution 11
         dot and jitter plots 12
         histograms and kernel density estimates 14
         the cumulative distribution function 23
         rank-order plots and lift charts 30
         only when appropriate: summary statistics and box plots 33
         workshop: numpy 38
         further reading 45
         3 two variables: establishing relationships 47
         scatter plots 47
         conquering noise: smoothing 48
         logarithmic plots 57
         banking 61
         linear regression and all that 62
         showing what’s important 66
         graphical analysis and presentation graphics 68
         workshop: matplotlib 69
         further reading 78
         4 time as a variable: time-series analysis 79
         examples 79
         the task 83
         smoothing 84
         don’t overlook the obvious! 90
         the correlation function 91
         optional: filters and convolutions 95
         workshop: scipy.signal 96
         further reading 98
         5 more than two variables: graphical multivariate analysis 99
         false-color plots 100
         a lot at a glance: multiplots 105
         composition problems 110
         novel plot types 116
         interactive explorations 120
         workshop: tools for multivariate graphics 123
         further reading 125
         6 intermezzo: a data analysis session 127
         a data analysis session 127
         workshop: gnuplot 136
         further reading 138
         part ii analytics: modeling data
         7 guesstimation and the back of the envelope 141
         principles of guesstimation 142
         how good are those numbers? 151
         optional: a closer look at perturbation theory and error propagation 155
         workshop: the gnu scientific library (gsl) 158
         further reading 161
         8 models from scaling arguments 163
         models 163
         arguments from scale 165
         mean-field approximations 175
         common time-evolution scenarios 178
         case study: how many servers are best? 182
         why modeling? 184
         workshop: sage 184
         further reading 188
         9 arguments from probability models 191
         the binomial distribution and bernoulli trials 191
         the gaussian distribution and the central limit theorem 195
         power-law distributions and non-normal statistics 201
         other distributions 206
         optional: case study—unique visitors over time 211
         workshop: power-law distributions 215
         further reading 218
         10 what you really need to know about classical statistics 221
         genesis 221
         statistics defined 223
         statistics explained 226
         controlled experiments versus observational studies 230
         optional: bayesian statistics—the other point of view 235
         workshop: r 243
         further reading 249
         11 intermezzo: mythbusting—bigfoot, least squares, and all that 253
         how to average averages 253
         the standard deviation 256
         least squares 260
         further reading 264
         part iii computation: mining data
         12 simulations 267
         a warm-up question 267
         monte carlo simulations 270
         resampling methods 276
         workshop: discrete event simulations with simpy 280
         further reading 291
         13 finding clusters 293
         what constitutes a cluster? 293
         distance and similarity measures 298
         clustering methods 304
         pre- and postprocessing 311
         other thoughts 314
         a special case: market basket analysis 316
         a word of warning 319
         workshop: pycluster and the c clustering library 320
         further reading 324
         14 seeing the forest for the trees: finding important attributes 327
         principal component analysis 328
         visual techniques 337
         kohonen maps 339
         workshop: pca with r 342
         further reading 348
         15 intermezzo: when more is different 351
         a horror story 353
         some suggestions 354
         what about map/reduce? 356
         workshop: generating permutations 357
         further reading 358
         part iv applications: using data
         16 reporting, business intelligence, and dashboards 361
         business intelligence 362
         corporate metrics and dashboards 369
         data quality issues 373
         workshop: berkeley db and sqlite 376
         further reading 381
         17 financial calculations and modeling 383
         the time value of money 384
         uncertainty in planning and opportunity costs 391
         cost concepts and depreciation 394
         should you care? 398
         is this all that matters? 399
         workshop: the newsvendor problem 400
         further reading 403
         18 predictive analytics 405
         introduction 405
         some classification terminology 407
         algorithms for classification 408
         the process 419
         the secret sauce 423
         the nature of statistical learning 424
         workshop: two do-it-yourself classifiers 426
         further reading 431
         19 epilogue: facts are not reality 433
         a programming environments for scientific computation and data analysis 435
         software tools 435
         a catalog of scientific software 437
         writing your own 443
         further reading 444
         b results from calculus 447
         common functions 448
         calculus 460
         useful tricks 468
         notation and basic math 472
         where to go from here 479
         further reading 481
         c working with data 485
         sources for data 485
         cleaning and conditioning 487
         sampling 489
         data file formats 490
         the care and feeding of your data zoo 492
         skills 493
         terminology 495
         further reading 497
         index 499