The Art of Spark Kernel Design (Spark內核設計的藝術)


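Publisher: China Machine Press (機械工業出版社)
Author: Geng Jia'an (耿嘉安)
Pages: 690
Publication date: 2018-1-1
Price: 139.00
Binding: Paperback
ISBN: 9787111584391
Series: Big Data Technology Series (大數據技術叢書)
Tags:
  • Spark
  • Big data
  • Computers
  • Distributed systems
  • Scala
  • Software engineering
  • Distributed computing
  • Kernel design
  • Programming
  • Architecture
  • High performance
  • Scalability
  • Concurrency
  • Cloud computing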

Description

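Recommended by multiple experts and written by a big data expert at 360, this book dissects the essentials of Spark's architecture and implementation, based on Spark 2.1.0. It goes down to the method level and distills the material into multiple flowcharts, presenting seven core areas of the design: architecture, environment, scheduling, storage, computation, deployment, and API. The book has 10 chapters, organized into the following parts.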

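Preparation (Chapters 1-2): Briefly introduces setting up a Spark environment and Spark's basic principles. Through detailed descriptions, this part lowers the barrier to entering the Spark world and gives readers a high-level view of Spark's background and overall design.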

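Foundations (Chapters 3-5): Covers Spark's infrastructure (including configuration, RPC, and metrics), the initialization of SparkContext, and the environment Spark needs in order to execute. After this part, readers should have a deep understanding of the design of the RPC framework and the functions of the execution environment, which is a prerequisite for understanding the core content.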

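Core (Chapters 6-9): The most central part of Spark, including the storage system, the scheduling system, the compute engine, and the deployment modes. After this part, readers will understand the details of Spark's data processing system and be able to extend Spark's core functionality, tune performance, and precisely troubleshoot production issues.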

API (Chapter 10): Compares Spark's old and new APIs and briefly introduces the new API.

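About the Author

Geng Jia'an (耿嘉安)

More than 10 years of experience in the IT industry. He has worked at Alibaba, eLong, and 360, focusing on open source and big data. Through extensive hands-on work he has studied J2EE, JVM, Tomcat, Spring, Hadoop, Spark, MySQL, and Redis in depth, and he especially enjoys dissecting the source code of open source projects. Early in his career he developed J2EE enterprise applications, and he has distinctive insights into Java technologies. He is also the author of 《深入理解Spark:核心思想與源碼分析》 (Understanding Spark in Depth: Core Ideas and Source Code Analysis).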

Table of Contents

Praise for This Book
Preface
Chapter 1 Environment Preparation
1.1 Preparing the Runtime Environment
1.1.1 Installing the JDK
1.1.2 Installing Scala
1.1.3 Installing Spark
1.2 First Experience with Spark
1.2.1 Running spark-shell
1.2.2 Running word count
1.2.3 Dissecting spark-shell
1.3 Preparing the Reading Environment
1.3.1 Installing SBT
1.3.2 Installing Git
1.3.3 Installing the Eclipse Scala IDE Plugin
1.4 Compiling and Debugging the Spark Source Code
1.5 Summary
Chapter 2 Design Philosophy and Basic Architecture
2.1 Getting to Know Spark
2.1.1 Limitations of Hadoop MRv1
2.1.2 Characteristics of Spark
2.1.3 Spark Use Cases
2.2 Spark Fundamentals
2.3 Basic Design Ideas of Spark
2.3.1 Spark Module Design
2.3.2 Spark Model Design
2.4 Basic Architecture of Spark
2.5 Summary
Chapter 3 Spark Infrastructure
3.1 Spark Configuration
3.1.1 Configuration in System Properties
3.1.2 Using the SparkConf Configuration API
3.1.3 Cloning SparkConf Configuration
3.2 Spark's Built-in RPC Framework
3.2.1 RPC Configuration: TransportConf
3.2.2 RPC Client Factory: TransportClientFactory
3.2.3 RPC Server: TransportServer
3.2.4 Pipeline Initialization
3.2.5 TransportChannelHandler in Detail
3.2.6 Server-Side RpcHandler in Detail
3.2.7 Server Bootstrap: TransportServerBootstrap
3.2.8 Client TransportClient in Detail
3.3 Event Bus
3.3.1 The ListenerBus Inheritance Hierarchy
3.3.2 SparkListenerBus in Detail
3.3.3 LiveListenerBus in Detail
3.4 Metrics System
3.4.1 The Source Inheritance Hierarchy
3.4.2 The Sink Inheritance Hierarchy
3.5 Summary
Chapter 4 SparkContext Initialization
4.1 SparkContext Overview
4.2 Creating the Spark Environment
4.3 Implementation of SparkUI
4.3.1 SparkUI Overview
4.3.2 The WebUI Framework
4.3.3 Creating SparkUI
4.4 Creating the Heartbeat Receiver
4.5 Creating and Starting the Scheduling System
4.6 Initializing the Block Manager: BlockManager
4.7 Starting the Metrics System
4.8 Creating the Event Log Listener
4.9 Creating and Starting ExecutorAllocationManager
4.10 Creating and Starting ContextCleaner
4.10.1 Creating ContextCleaner
4.10.2 Starting ContextCleaner
4.11 Additional SparkListeners and Starting the Event Bus
4.12 Updating the Spark Environment
4.13 Finishing SparkContext Initialization
4.14 Common Methods Provided by SparkContext
4.15 The SparkContext Companion Object
4.16 Summary
Chapter 5 Spark Execution Environment
5.1 SparkEnv Overview
5.2 Security Manager: SecurityManager
5.3 RPC Environment
5.3.1 RPC Endpoint: RpcEndpoint
5.3.2 RPC Endpoint Reference: RpcEndpointRef
5.3.3 Creating the Transport Context: TransportConf
5.3.4 Message Dispatcher: Dispatcher
5.3.5 Creating the Transport Context: TransportContext
5.3.6 Creating the Transport Client Factory: TransportClientFactory
5.3.7 Creating TransportServer
5.3.8 Sending Client Requests
5.3.9 Common Methods in NettyRpcEnv
5.4 Serializer Manager: SerializerManager
5.5 Broadcast Manager: BroadcastManager
5.6 Map Task Output Tracker
5.6.1 Implementation of MapOutputTracker
5.6.2 How MapOutputTrackerMaster Is Implemented
5.7 Building the Storage System
5.8 Creating the Metrics System
5.8.1 MetricsConfig in Detail
5.8.2 Common Methods in MetricsSystem
5.8.3 Starting MetricsSystem
5.9 Output Commit Coordinator
5.9.1 Implementation of OutputCommitCoordinatorEndpoint
5.9.2 Implementation of OutputCommitCoordinator
5.9.3 How OutputCommitCoordinator Works
5.10 Creating SparkEnv
5.11 Summary
Chapter 6 Storage System
6.1 Storage System Overview
6.1.1 Storage System Architecture
6.1.2 Basic Concepts
6.2 Block Information Manager
6.2.1 Basic Concepts of Block Locks
6.2.2 Implementation of Block Locks
6.3 Disk Block Manager
6.3.1 Local Directory Structure
6.3.2 Methods Provided by DiskBlockManager
6.4 Disk Storage: DiskStore
6.5 Memory Manager
6.5.1 The Memory Pool Model
6.5.2 StorageMemoryPool in Detail
6.5.3 The MemoryManager Model
6.5.4 UnifiedMemoryManager in Detail
6.6 Memory Storage: MemoryStore
6.6.1 The MemoryStore Memory Model
6.6.2 Methods Provided by MemoryStore
6.7 Block Manager: BlockManager
6.7.1 BlockManager Initialization
6.7.2 Methods Provided by BlockManager
6.8 How BlockManagerMaster Manages BlockManager
6.8.1 Responsibilities of BlockManagerMaster
6.8.2 BlockManagerMasterEndpoint in Detail
6.8.3 BlockManagerSlaveEndpoint in Detail
6.9 Block Transfer Service
6.9.1 Initializing NettyBlockTransferService
6.9.2 NettyBlockRpcServer in Detail
6.9.3 Shuffle Client
6.10 DiskBlockObjectWriter in Detail
6.11 Summary
Chapter 7 Scheduling System
7.1 Scheduling System Overview
7.2 RDD in Detail
7.2.1 Why RDDs Are Needed
7.2.2 A First Analysis of the RDD Implementation
7.2.3 RDD Dependencies
7.2.4 Partition Calculator: Partitioner
7.2.5 RDDInfo
7.3 Stage in Detail
7.3.1 Implementation of ResultStage
7.3.2 Implementation of ShuffleMapStage
7.3.3 StageInfo
7.4 The DAG-Oriented Scheduler: DAGScheduler
7.4.1 JobListener and JobWaiter
7.4.2 ActiveJob in Detail
7.4.3 A Brief Introduction to DAGSchedulerEventProcessLoop
7.4.4 Components of DAGScheduler
7.4.5 Common Methods Provided by DAGScheduler
7.4.6 DAGScheduler and Job Submission
7.4.7 Building Stages
7.4.8 Submitting a ResultStage
7.4.9 Submitting Tasks Not Yet Computed
7.4.10 The DAGScheduler Scheduling Flow
7.4.11 Handling Task Execution Results
7.5 Scheduling Pool: Pool
7.5.1 Scheduling Algorithms
7.5.2 Implementation of Pool
7.5.3 Scheduling Pool Builders
7.6 Task Set Manager: TaskSetManager
7.6.1 Task Sets
7.6.2 Member Properties of TaskSetManager
7.6.3 Scheduling Pools and Speculative Execution
7.6.4 Task Locality
7.6.5 Common Methods of TaskSetManager
7.7 Launcher Backend Interface: LauncherBackend
7.7.1 Implementation of BackendConnection
7.7.2 Implementation of LauncherBackend
7.8 Scheduler Backend Interface: SchedulerBackend
7.8.1 Definition of SchedulerBackend
7.8.2 Analysis of the LocalSchedulerBackend Implementation
7.9 Task Result Getter: TaskResultGetter
7.9.1 Handling Successful Tasks
7.9.2 Handling Failed Tasks
7.10 Task Scheduler: TaskScheduler
7.10.1 Properties of TaskSchedulerImpl
7.10.2 Initializing TaskSchedulerImpl
7.10.3 Starting TaskSchedulerImpl
7.10.4 TaskSchedulerImpl and Task Submission
7.10.5 TaskSchedulerImpl and Resource Allocation
7.10.6 The TaskSchedulerImpl Scheduling Flow
7.10.7 How TaskSchedulerImpl Handles Execution Results
7.10.8 Common Methods of TaskSchedulerImpl
7.11 Summary
Chapter 8 Compute Engine
8.1 Compute Engine Overview
8.2 The Memory Manager and Execution Memory
8.2.1 ExecutionMemoryPool in Detail
8.2.2 The MemoryManager Model and Execution Memory
8.2.3 UnifiedMemoryManager and Execution Memory
8.3 The Memory Manager and Tungsten
8.3.1 MemoryBlock in Detail
8.3.2 The MemoryManager Model and Tungsten
8.3.3 Tungsten's Memory Allocators
8.4 Task Memory Manager
8.4.1 TaskMemoryManager in Detail
8.4.2 Memory Consumers
8.4.3 Overall Architecture of Execution Memory
8.5 Task in Detail
8.5.1 Task Context: TaskContext
8.5.2 Definition of Task
8.5.3 Implementation of ShuffleMapTask
8.5.4 Implementation of ResultTask
8.6 IndexShuffleBlockResolver in Detail
8.7 Sampling and Estimation
8.7.1 Analysis of the SizeTracker Implementation
8.7.2 How SizeTracker Works
8.8 The Trait WritablePartitionedPairCollection
8.9 Analysis of the AppendOnlyMap Implementation
8.9.1 Capacity Growth of AppendOnlyMap
8.9.2 Data Updates in AppendOnlyMap
8.9.3 The Cached Aggregation Algorithm of AppendOnlyMap
8.9.4 Built-in Sorting in AppendOnlyMap
8.9.5 Extensions of AppendOnlyMap
8.10 Analysis of the PartitionedPairBuffer Implementation
8.10.1 Capacity Growth of PartitionedPairBuffer
8.10.2 Insertion into PartitionedPairBuffer
8.10.3 The PartitionedPairBuffer Iterator
8.11 External Sorters
8.11.1 ExternalSorter in Detail
8.11.2 ShuffleExternalSorter in Detail
8.12 Shuffle Manager
8.12.1 ShuffleWriter in Detail
8.12.2 ShuffleBlockFetcherIterator in Detail
8.12.3 BlockStoreShuffleReader in Detail
8.12.4 SortShuffleManager in Detail
8.13 Shuffle Combinations on the Map Side and the Reduce Side
8.14 Summary
Chapter 9 Deployment Modes
9.1 Heartbeat Receiver: HeartbeatReceiver
9.2 Analysis of the Executor Implementation
9.2.1 Executor Heartbeat Reporting
9.2.2 Running Tasks
9.3 The local Deployment Mode
9.4 Persistence Engine: PersistenceEngine
9.4.1 File-System-Based Persistence Engine
9.4.2 ZooKeeper-Based Persistence Engine
9.5 Leader Election Agent
9.6 Master in Detail
9.6.1 Starting the Master
9.6.2 Checking for Worker Timeouts
9.6.3 Handling Election as Leader
9.6.4 First-Level Resource Scheduling
9.6.5 Registering Workers
9.6.6 Updating a Worker's Latest State
9.6.7 Handling Worker Heartbeats
9.6.8 Registering Applications
9.6.9 Handling Executor Requests
9.6.10 Handling Executor State Changes
9.6.11 Common Methods of Master
9.7 Worker in Detail
9.7.1 Starting the Worker
9.7.2 Registering the Worker with the Master
9.7.3 Sending Heartbeats to the Master
9.7.4 Worker and Leader Election
9.7.5 Running the Driver
9.7.6 Running Executors
9.7.7 Handling Executor State Changes
9.8 Implementation of StandaloneAppClient
9.8.1 Analysis of the ClientEndpoint Implementation
9.8.2 Analysis of the StandaloneAppClient Implementation
9.9 Analysis of the StandaloneSchedulerBackend Implementation
9.9.1 Properties of StandaloneSchedulerBackend
9.9.2 Analysis of the DriverEndpoint Implementation
9.9.3 Starting StandaloneSchedulerBackend
9.9.4 Stopping StandaloneSchedulerBackend
9.9.5 StandaloneSchedulerBackend and Resource Allocation
9.10 CoarseGrainedExecutorBackend in Detail
9.10.1 The CoarseGrainedExecutorBackend Process
9.10.2 Functional Analysis of CoarseGrainedExecutorBackend
9.11 The local-cluster Deployment Mode
9.11.1 Starting a Local Cluster
9.11.2 Startup Process of the local-cluster Deployment Mode
9.11.3 Executor Allocation in the local-cluster Deployment Mode
9.11.4 Task Submission and Execution in the local-cluster Deployment Mode
9.12 The Standalone Deployment Mode
9.12.1 Startup Process of the Standalone Deployment Mode
9.12.2 Executor Allocation in the Standalone Deployment Mode
9.12.3 Resource Reclamation in the Standalone Deployment Mode
9.12.4 Fault Tolerance in the Standalone Deployment Mode
9.13 Other Deployment Options
9.13.1 YARN
9.13.2 Mesos
9.14 Summary
Chapter 10 Spark API
10.1 Basic Concepts
10.2 Data Source: DataSource
10.2.1 DataSourceRegister in Detail
10.2.2 DataSource in Detail
10.3 Implementation of Checkpoints
10.3.1 Implementation of CheckpointRDD
10.3.2 Implementation of RDDCheckpointData
10.3.3 Implementation of ReliableRDDCheckpointData
10.4 RDD Revisited
10.4.1 Transformation APIs
10.4.2 Action APIs
10.4.3 Analysis of the Checkpoint API Implementation
10.4.4 Iterative Computation
10.5 Data Collection: Dataset
10.6 DataFrameReader in Detail
10.7 SparkSession in Detail
10.7.1 The SparkSession Builder
10.7.2 The SparkSession API
10.8 word count Example
10.8.1 Job Preparation Phase
10.8.2 Job Submission and Scheduling
10.9 Summary
Appendix


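User Reviews

It explains code for the sake of explaining code, with very little on the underlying principles. Not worth the price.

Fragmented and disorganized; it's all code snippets...

Too tedious. The author is technically strong, but his ability to organize the writing falls short.

Suitable for people who want to understand the details thoroughly; reading along with the book, you can learn a lot of details.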
