Book tags: Data Analysis, Statistics, R, Machine Learning, Data Mining, Mathematics
Published on 2024-12-24
Practical Guide to Cluster Analysis in R: Unsupervised Machine Learning (Multivariate Analysis) (Vol
Although there are several good books on unsupervised machine learning, we felt that many of them are too theoretical. This book provides a practical guide to cluster analysis, elegant visualization and interpretation. It contains five parts. Part I provides a quick introduction to R and presents the required R packages, as well as the data formats and dissimilarity measures used for cluster analysis and visualization. Part II covers partitioning clustering methods, which subdivide a data set into k groups, where k is the number of groups pre-specified by the analyst. Partitioning clustering approaches include the K-means, K-Medoids (PAM) and CLARA algorithms. Part III considers hierarchical clustering methods, an alternative approach to partitioning clustering. The result of hierarchical clustering is a tree-based representation of the objects called a dendrogram. This part describes how to compute, visualize, interpret and compare dendrograms. Part IV describes clustering validation and evaluation strategies, which consist of measuring the goodness of clustering results. Its chapters cover assessing clustering tendency, determining the optimal number of clusters, cluster validation statistics, choosing the best clustering algorithm and computing p-values for hierarchical clustering. Part V presents advanced clustering methods, including hierarchical k-means clustering, fuzzy clustering, model-based clustering and density-based clustering.
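To make the distinction between Parts II and III concrete, here is a minimal sketch in base R. It is not taken from the book: the built-in USArrests data set, k = 4, the seed and Ward linkage are all assumptions chosen purely for illustration. It shows a partitioning method (K-means, where k is fixed up front) next to hierarchical clustering (where the dendrogram is built first and cut into groups afterwards).

```r
# Illustrative sketch only; data set, k and linkage are assumptions, not the book's choices.
data("USArrests")
df <- scale(USArrests)              # standardize so every variable has equal weight

# Partitioning clustering: the analyst specifies k before fitting.
set.seed(123)                       # K-means starts from random centers
km <- kmeans(df, centers = 4, nstart = 25)
print(table(km$cluster))            # number of observations per cluster

# Hierarchical clustering: compute dissimilarities, build the tree, then cut it.
d  <- dist(df, method = "euclidean")
hc <- hclust(d, method = "ward.D2")
plot(hc, cex = 0.6)                 # the dendrogram
groups <- cutree(hc, k = 4)         # cut the tree into 4 groups

# Cross-tabulate the two partitions to see how far they agree.
print(table(kmeans = km$cluster, hclust = groups))
```

Cluster labels are arbitrary, so agreement between the two methods shows up as one dominant cell per row of the cross-tabulation rather than as a matching diagonal.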
About the Author
Alboukadel Kassambara holds a PhD in Bioinformatics and Cancer Biology. He has worked for many years on genomic data analysis and visualization. He created a bioinformatics tool named GenomicScape (www.genomicscape.com), an easy-to-use web tool for gene expression data analysis and visualization. He also developed a website called STHDA (Statistical Tools for High-throughput Data Analysis, www.sthda.com/english), which contains many tutorials on data analysis and visualization using R software and packages. He is the author of the R packages survminer (for analyzing and drawing survival curves), ggcorrplot (for drawing correlation matrices using ggplot2) and factoextra (for easily extracting and visualizing the results of multivariate analyses such as PCA, CA, MCA and clustering). You can learn more about these packages at: http://www.sthda.com/english/wiki/r-packages. Recently, he published two books on data visualization: i) Guide to Create Beautiful Graphics in R (at: https://goo.gl/vJ0OYb); ii) Complete Guide to 3D Plots in R (at: https://goo.gl/v5gwl0).
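This book is really excellent. It gives a concise tour of the commonly used clustering methods, along with how to evaluate them, their strengths and weaknesses, and the situations each one suits. It also introduces some interesting packages. Praise once again for ggplot2, and for packages like factoextra that directly return ggplot2 objects; when I saw +geom_violin() I could not help marveling at how great the R community is!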
Practical. Clear. The explanations are not very detailed, but they are enough to get started.
https://github.com/kassambara/factoextra