Data Mining with Rattle and R

Publisher: Springer
Author: Graham Williams
Pages: 396
Publication date: 2011-8-4
Price: GBP 49.99
Binding: Paperback
ISBN: 9781441998897
Tags:
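  • R
  • Data mining
  • Rattle
  • Programming
  • Mining
  • Computer science
  • Computer technology
  • Methodology
  • R language
  • Machine learning
  • Statistical learning
  • Data analysis
  • Business intelligence
  • Data science
  • Visualization
  • Predictive modeling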

Description

Data mining is the art and science of intelligent data analysis. By building knowledge from information, data mining adds considerable value to the ever-increasing stores of electronic data that abound today. In performing data mining, many decisions need to be made regarding the choice of methodology, the choice of data, the choice of tools, and the choice of algorithms.

Throughout this book the reader is introduced to the basic concepts and some of the more popular algorithms of data mining. With a focus on the hands-on, end-to-end process for data mining, Williams guides the reader through various capabilities of the easy-to-use, free, and open-source Rattle Data Mining Software built on the sophisticated R Statistical Software. The focus on doing data mining rather than just reading about data mining is refreshing.

The book covers data understanding, data preparation, data refinement, model building, model evaluation, and practical deployment. The reader will learn to rapidly deliver a data mining project using software easily installed for free from the Internet. Coupling Rattle with R delivers a very sophisticated data mining environment with all the power, and more, of the many commercial offerings.

About the Author

Dr Graham Williams is Senior Director of Analytics with the Australian Taxation Office, and previously Principal Computer Scientist for Data Mining with CSIRO. He is also Visiting Professor and Senior International Scientist with the Shenzhen Institutes of Advanced Analytics of the Chinese Academy of Sciences, Adjunct Professor, Data Mining, Fraud Prevention, Security, University of Canberra, and Adjunct Professor, Australian National University. Graham regularly teaches data mining courses and is author of the freely available, open source data mining system, Rattle. He has been involved in many data mining projects for clients from government and industry over his long career. His research developments included ensemble learning (1980s) and hot spots discovery (1990s). He is actively involved in the international artificial intelligence and data mining research communities, particularly as chair of the Pacific Asia Knowledge Discovery and Data Mining conference series and founder and co-chair of the Australasian Data Mining conference series. Graham has edited a number of books and authored many academic and industry papers and reports. His current focus is on making data mining technology readily accessible, ensuring research, innovation and discovery are repeatable and available, and encouraging the free and open sharing of knowledge.

Table of Contents

Preface
Part I: Explorations
1 Introduction
1.1 Data Mining Beginnings
1.2 The Data Mining Team
1.3 Agile Data Mining
1.4 The Data Mining Process
1.5 A Typical Journey
1.6 Insights for Data Mining
1.7 Documenting Data Mining
1.8 Tools for Data Mining: R
1.9 Tools for Data Mining: Rattle
1.10 Why R and Rattle?
1.11 Privacy
1.12 Resources
2 Getting Started
2.1 Starting R
2.2 Quitting Rattle and R
2.3 First Contact
2.4 Loading a Dataset
2.5 Building a Model
2.6 Understanding Our Data
2.7 Evaluating the Model: Confusion Matrix
2.8 Interacting with Rattle
2.9 Interacting with R
2.10 Summary
2.11 Command Summary
3 Working with Data
3.1 Data Nomenclature
3.2 Sourcing Data for Mining
3.3 Data Quality
3.4 Data Matching
3.5 Data Warehousing
3.6 Interacting with Data Using R
3.7 Documenting the Data
3.8 Summary
3.9 Command Summary
4 Loading Data
4.1 CSV Data
4.2 ARFF Data
4.3 ODBC Sourced Data
4.4 R Dataset - Other Data Sources
4.5 R Data
4.6 Library
4.7 Data Options
4.8 Command Summary
5 Exploring Data
5.1 Summarising Data
5.1.1 Basic Summaries
5.1.2 Detailed Numeric Summaries
5.1.3 Distribution
5.1.4 Skewness
5.1.5 Kurtosis
5.1.6 Missing Values
5.2 Visualising Distributions
5.2.1 Box Plot
5.2.2 Histogram
5.2.3 Cumulative Distribution Plot
5.2.4 Benford's Law
5.2.5 Bar Plot
5.2.6 Dot Plot
5.2.7 Mosaic Plot
5.2.8 Pairs and Scatter Plots
5.2.9 Plots with Groups
5.3 Correlation Analysis
5.3.1 Correlation Plot
5.3.2 Missing Value Correlations
5.3.3 Hierarchical Correlation
5.4 Command Summary
6 Interactive Graphics
6.1 Latticist
6.2 GGobi
6.3 Command Summary
7 Transforming Data
7.1 Data Issues
7.2 Transforming Data
7.3 Rescaling Data
7.4 Imputation
7.5 Recoding
7.6 Cleanup
7.7 Command Summary
Part II: Building Models
8 Descriptive and Predictive Analytics
8.1 Model Nomenclature
8.2 A Framework for Modelling
8.3 Descriptive Analytics
8.4 Predictive Analytics
8.5 Model Builders
9 Cluster Analysis
9.1 Knowledge Representation
9.2 Search Heuristic
9.3 Measures
9.4 Tutorial Example
9.5 Discussion
9.6 Command Summary
10 Association Analysis
10.1 Knowledge Representation
10.2 Search Heuristic
10.3 Measures
10.4 Tutorial Example
10.5 Command Summary
11 Decision Trees
11.1 Knowledge Representation
11.2 Algorithm
11.3 Measures
11.4 Tutorial Example
11.5 Tuning Parameters
11.6 Discussion
11.7 Summary
11.8 Command Summary
12 Random Forests
12.1 Overview
12.2 Knowledge Representation
12.3 Algorithm
12.4 Tutorial Example
12.5 Tuning Parameters
12.6 Discussion
12.7 Summary
12.8 Command Summary
13 Boosting
13.1 Knowledge Representation
13.2 Algorithm
13.3 Tutorial Example
13.4 Tuning Parameters
13.5 Discussion
13.6 Summary
13.7 Command Summary
14 Support Vector Machines
14.1 Knowledge Representation
14.2 Algorithm
14.3 Tutorial Example
14.4 Tuning Parameters
14.5 Command Summary
Part III: Delivering Performance
15 Model Performance Evaluation
15.1 The Evaluate Tab: Evaluation Datasets
15.2 Measure of Performance
15.3 Confusion Matrix
15.4 Risk Charts
15.5 ROC Charts
15.6 Other Charts
15.7 Scoring
16 Deployment
16.1 Deploying an R Model
16.2 Converting to PMML
16.3 Command Summary
Part IV: Appendices
A Installing Rattle
B Sample Datasets
B.1 Weather
B.1.1 Obtaining Data
B.1.2 Data Preprocessing
B.1.3 Data Cleaning
B.1.4 Missing Values
B.1.5 Data Transforms
B.1.6 Using the Data
B.2 Audit
B.2.1 The Adult Survey Dataset
B.2.2 From Survey to Audit
B.2.3 Generating Targets
B.2.4 Finalising the Data
B.2.5 Using the Data
B.3 Command Summary
References
Index

User Reviews

neat as a toolkit

Walks through each stage of the data mining process using the visual Rattle tool; it can serve as an introductory tutorial for learning R!

