Hadoop: The Definitive Guide


Publisher: O'Reilly Media
Author: Tom White
Pages: 756
Publication date: 2015-4-11
Price: USD 49.99
Binding: Paperback
ISBN: 9781491901632
Tags:
  • Hadoop
  • Big data
  • BigData
  • Computer science
  • Distributed
  • hadoop
  • Machine learning
  • O'Reilly
  • Distributed systems
  • Cloud computing
  • Programming
  • Open source
  • Data processing
  • Clusters
  • Architecture
  • Guide

Description

Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters.

Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You’ll learn about recent changes to Hadoop, and explore new case studies on Hadoop’s role in healthcare systems and genomics data processing.

Learn fundamental components such as MapReduce, HDFS, and YARN

Explore MapReduce in depth, including steps for developing applications with it

Set up and maintain a Hadoop cluster running HDFS and MapReduce on YARN

Learn two data formats: Avro for data serialization and Parquet for nested data

Use data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer)

Understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with Hadoop

Learn the HBase distributed database and the ZooKeeper distributed configuration service

About the Author

Tom White has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. He works for Cloudera, a company set up to offer Hadoop support and training. Previously he was an independent Hadoop consultant, working with companies to set up, use, and extend Hadoop. He has written numerous articles for O'Reilly, java.net, and IBM's developerWorks, and has spoken at several conferences, including ApacheCon 2008, where he spoke on Hadoop. Tom has a Bachelor's degree in Mathematics from the University of Cambridge and a Master's in Philosophy of Science from the University of Leeds, UK.

Table of Contents

Hadoop Fundamentals
Chapter 1. Meet Hadoop
Data!
Data Storage and Analysis
Querying All Your Data
Beyond Batch
Comparison with Other Systems
A Brief History of Apache Hadoop
What’s in This Book?
Chapter 2. MapReduce
A Weather Dataset
Analyzing the Data with Unix Tools
Analyzing the Data with Hadoop
Scaling Out
Hadoop Streaming
Chapter 3. The Hadoop Distributed Filesystem
The Design of HDFS
HDFS Concepts
The Command-Line Interface
Hadoop Filesystems
The Java Interface
Data Flow
Parallel Copying with distcp
Chapter 4. YARN
Anatomy of a YARN Application Run
YARN Compared to MapReduce 1
Scheduling in YARN
Further Reading
Chapter 5. Hadoop I/O
Data Integrity
Compression
Serialization
File-Based Data Structures
MapReduce
Chapter 1. Developing a MapReduce Application
The Configuration API
Setting Up the Development Environment
Writing a Unit Test with MRUnit
Running Locally on Test Data
Running on a Cluster
Tuning a Job
MapReduce Workflows
Chapter 2. How MapReduce Works
Anatomy of a MapReduce Job Run
Failures
Shuffle and Sort
Task Execution
Chapter 3. MapReduce Types and Formats
MapReduce Types
Input Formats
Output Formats
Chapter 4. MapReduce Features
Counters
Sorting
Joins
Side Data Distribution
MapReduce Library Classes
Hadoop Operations
Chapter 1. Setting Up a Hadoop Cluster
Cluster Specification
Cluster Setup and Installation
Hadoop Configuration
Security
Benchmarking a Hadoop Cluster
Chapter 2. Administering Hadoop
HDFS
Monitoring
Maintenance
Related Projects
Chapter 1. Avro
Avro Data Types and Schemas
In-Memory Serialization and Deserialization
Avro Datafiles
Interoperability
Schema Resolution
Sort Order
Avro MapReduce
Sorting Using Avro MapReduce
Avro in Other Languages
Chapter 2. Parquet
Data Model
Parquet File Format
Parquet Configuration
Writing and Reading Parquet Files
Parquet MapReduce
Chapter 3. Flume
Installing Flume
An Example
Transactions and Reliability
The HDFS Sink
Fan Out
Distribution: Agent Tiers
Sink Groups
Integrating Flume with Applications
Component Catalog
Further Reading
Chapter 4. Sqoop
Getting Sqoop
Sqoop Connectors
A Sample Import
Generated Code
Imports: A Deeper Look
Working with Imported Data
Importing Large Objects
Performing an Export
Exports: A Deeper Look
Further Reading
Chapter 5. Pig
Installing and Running Pig
An Example
Comparison with Databases
Pig Latin
User-Defined Functions
Data Processing Operators
Pig in Practice
Further Reading
Chapter 6. Hive
Installing Hive
An Example
Running Hive
Comparison with Traditional Databases
HiveQL
Tables
Querying Data
User-Defined Functions
Further Reading
Chapter 7. Crunch
An Example
The Core Crunch API
Pipeline Execution
Crunch Libraries
Further Reading
Chapter 8. Spark
Installing Spark
An Example
Resilient Distributed Datasets
Shared Variables
Anatomy of a Spark Job Run
Executors and Cluster Managers
Further Reading
Chapter 9. HBase
HBasics
Concepts
Installation
Clients
Building an Online Query Application
HBase Versus RDBMS
Praxis
Further Reading
Chapter 10. ZooKeeper
Installing and Running ZooKeeper
An Example
The ZooKeeper Service
Building Applications with ZooKeeper
ZooKeeper in Production
Further Reading
Case Studies
Chapter 1. Composable Data at Cerner
From CPUs to Semantic Integration
Enter Apache Crunch
Building a Complete Picture
Integrating Healthcare Data
Composability over Frameworks
Moving Forward
Chapter 2. Biological Data Science: Saving Lives with Software
The Structure of DNA
The Genetic Code: Turning DNA Letters into Proteins
Thinking of DNA as Source Code
The Human Genome Project and Reference Genomes
Sequencing and Aligning DNA
ADAM, A Scalable Genome Analysis Platform
From Personalized Ads to Personalized Medicine
Join In
Chapter 3. Cascading
Fields, Tuples, and Pipes
Operations
Taps, Schemes, and Flows
Cascading in Practice
Flexibility
Hadoop and Cascading at ShareThis
Summary
Appendix Installing Apache Hadoop
Prerequisites
Installation
Configuration
Appendix Cloudera’s Distribution Including Apache Hadoop
Appendix Preparing the NCDC Weather Data
Appendix The Old and New Java MapReduce APIs

Reviews


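I logged in just to leave this comment: the translation is terrible. I really, strongly recommend that anyone with good English reading skills read the original edition instead of wasting money on this one. Besides the errors in the text, even the figures contain mistakes. For example, in the figure on page 260 the last two years should be 1901, but here they are printed as 1900. I am truly speechless. For a classic to be translated like this, the author must be furious. zsbd zsbd zsbd...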


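The translation is poor in many places, and you have to check it against the English original to understand it... Still, it remains a decent choice for learning quickly. I suggest the translator weigh the importance of each part: a rough translation of the unimportant parts is tolerable, but the important parts deserve real effort, so the priorities should not be reversed. For example, the Data Flow section in Chapter 3, such a classic passage, has been translated into a complete mess. I don't know whether the translator will...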


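A very good Hadoop tutorial, much more detailed than the online guides from Apache and Yahoo!. Many Hadoop implementation details that I could not figure out can be found in this book.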


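Your résumé now lists you as the translator of "Hadoop: The Definitive Guide", but you do not deserve it. This is the most careless translation I have ever seen, and the prose is clumsy throughout. Please don't force yourself: even the MapReduce shuffle mechanism is not translated well, although the original author's writing is admittedly only average. Chapters 1, 2, 5, 6, and 7 are translated terribly. Please don't fob readers off with Google Translate; it misleads learners ...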

User Comments


T^T I bought the very thick reprint edition.


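2016 No. 4. Makes deep material accessible, and the principles are explained very thoroughly. The core is the Hadoop Fundamentals and MapReduce parts, but the later Related Projects part is also concise and highlights the key points. For example, the Flume chapter mentions some points that even the tutorial on the official Flume site does not cover.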


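Very comprehensive. The main value is in the first two parts, especially the MapReduce part. The later material on clusters and the various related projects can just be skimmed; it is not covered in much detail, and when you need it the Apache documentation is enough.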


A classic.


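I read Parts 1 and 2 and now have a basic understanding of Hadoop. Next I need to consolidate it with real projects. Other related technologies such as Hive, HBase, and Spark also need to be studied.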

