Data Analysis with Open Source Tools

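Publisher: Southeast University Press
Author: Philipp K. Janert
Pages: 509
Publication date: May 2011
Price: CNY 82.00
ISBN: 9787564126742
Tags:
  • Data analysis
  • Data mining
  • Machine learning
  • Data Analysis with Open Source Tools
  • Algorithms
  • Computers
  • Programming
  • Recommender systems
  • Open source tools
  • Data visualization
  • Python
  • Statistical analysis
  • Data cleaning
  • Business intelligence
  • Big data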

Description

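Collecting data is relatively easy, but turning raw information into something useful requires knowing how to extract precisely what you need. Through the in-depth coverage in Data Analysis with Open Source Tools (reprint edition), intermediate and experienced programmers who are interested in data analysis will learn techniques for working with data in a business environment. You will learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and how to feed your understanding back into your organization through business plans, precise metrics reports, and other means.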

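You will try out each concept step by step in the hands-on workshops at the end of every chapter. Above all, you will learn how to think about the data you want to obtain, rather than relying on tools to do the thinking for you.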

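  • Use graphics to describe data with one, two, or a dozen or more variables
  • Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments
  • Mine data with computationally intensive methods such as simulation and clustering
  • Make your conclusions easier to understand through reports, dashboards, and other metrics programs
  • Understand financial calculations, including the time value of money
  • Use dimensionality reduction techniques or predictive analytics to overcome the challenges that arise in data analysis
  • Become familiar with different open source programming environments for data analysis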

About the Author

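Philipp K. Janert currently provides consulting services for data analysis and mathematical modeling. He has worked as a physicist and as a software engineer. He is the author of "Gnuplot in Action: Understanding Data with Graphs" (Manning) and has written for the O’Reilly Network, IBM developerWorks, and IEEE Software. He holds a Ph.D. in theoretical physics from the University of Washington.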

Table of Contents

preface
1 introduction
data analysis
what’s in this book
what’s with the workshops?
what’s with the math?
what you’ll need
what’s missing
part i graphics: looking at data
2 a single variable: shape and distribution
dot and jitter plots
histograms and kernel density estimates
the cumulative distribution function
rank-order plots and lift charts
only when appropriate: summary statistics and box plots
workshop: NumPy
further reading
3 two variables: establishing relationships
scatter plots
conquering noise: smoothing
logarithmic plots
banking
linear regression and all that
showing what’s important
graphical analysis and presentation graphics
workshop: matplotlib
further reading
4 time as a variable: time-series analysis
examples
the task
smoothing
don’t overlook the obvious!
the correlation function
optional: filters and convolutions
workshop: scipy.signal
further reading
5 more than two variables: graphical multivariate analysis
false-color plots
a lot at a glance: multiplots
composition problems
novel plot types
interactive explorations
workshop: tools for multivariate graphics
further reading
6 intermezzo: a data analysis session
a data analysis session
workshop: gnuplot
further reading
part ii analytics: modeling data
7 guesstimation and the back of the envelope
principles of guesstimation
how good are those numbers?
optional: a closer look at perturbation theory and error propagation
workshop: the GNU Scientific Library (GSL)
further reading
8 models from scaling arguments
models
arguments from scale
mean-field approximations
common time-evolution scenarios
case study: how many servers are best?
why modeling?
workshop: Sage
further reading
9 arguments from probability models
the binomial distribution and Bernoulli trials
the Gaussian distribution and the central limit theorem
power-law distributions and non-normal statistics
other distributions
optional: case study - unique visitors over time
workshop: power-law distributions
further reading
10 what you really need to know about classical statistics
genesis
statistics defined
statistics explained
controlled experiments versus observational studies
optional: Bayesian statistics - the other point of view
workshop: R
further reading
11 intermezzo: mythbusting - Bigfoot, least squares, and all that
how to average averages
the standard deviation
least squares
further reading
part iii computation: mining data
12 simulations
a warm-up question
Monte Carlo simulations
resampling methods
workshop: discrete event simulations with SimPy
further reading
13 finding clusters
what constitutes a cluster?
distance and similarity measures
clustering methods
pre- and postprocessing
other thoughts
a special case: market basket analysis
a word of warning
workshop: Pycluster and the C Clustering Library
further reading
14 seeing the forest for the trees: finding important attributes
principal component analysis
visual techniques
Kohonen maps
workshop: PCA with R
further reading
15 intermezzo: when more is different
a horror story
some suggestions
what about Map/Reduce?
workshop: generating permutations
further reading
part iv applications: using data
16 reporting, business intelligence, and dashboards
business intelligence
corporate metrics and dashboards
data quality issues
workshop: Berkeley DB and SQLite
further reading
17 financial calculations and modeling
the time value of money
uncertainty in planning and opportunity costs
cost concepts and depreciation
should you care?
is this all that matters?
workshop: the newsvendor problem
further reading
18 predictive analytics
introduction
some classification terminology
algorithms for classification
the process
the secret sauce
the nature of statistical learning
workshop: two do-it-yourself classifiers
further reading
19 epilogue: facts are not reality
appendix A programming environments for scientific computation and data analysis
software tools
a catalog of scientific software
writing your own
further reading
appendix B results from calculus
common functions
calculus
useful tricks
notation and basic math
where to go from here
further reading
appendix C working with data
sources for data
cleaning and conditioning
sampling
data file formats
the care and feeding of your data zoo
skills
terminology
further reading
index

Reader Reviews

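My grounding in statistics is not solid, so I still cannot quite follow the theory in this book, and the English material I found by searching online was even harder to understand. On top of that, the examples do not give their data sources or explain how the figures were produced, so the content is rather at odds with the title. Still, the author offers a systematic way of approaching data analysis, which gives some direction for learning the finer details...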

Don’t let “data” get in the way of ethical decisions. The most important things in life can’t be measured. It is a fallacy to believe that, just because something can’t be measured, it doesn’t matter or doesn’t even exist. And a pretty tragic fallacy...  

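The book is quite theory-heavy, at least for someone like me who did not study statistics or mathematics, and many of the analyses and figures come without the actual steps used to produce them. I would not really recommend it. The author is clearly an expert and the coverage is systematic, but this does not feel like an entry-level book.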

User Comments

Yet again I am reading a book I will never finish.
