Book tags: bigdata, data mining, big data, computer science, data, manning, programming, big
Published on 2024-10-21
Big Data pdf epub mobi txt e-book download 2024
Services like social networks, web analytics, and intelligent e-commerce often need to manage data at a scale too big for a traditional database. Complexity increases with scale and demand, and handling big data is not as simple as just doubling down on your RDBMS or rolling out some trendy new technology. Fortunately, scalability and simplicity are not mutually exclusive—you just need to take a different approach. Big data systems use many machines working in parallel to store and process data, which introduces fundamental challenges unfamiliar to most developers.
Big Data teaches you to build these systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they're built.
Big Data shows you how to build the back-end for a real-time service called SuperWebAnalytics.com—our version of Google Analytics. As you read, you'll discover that many standard RDBMS practices become unwieldy with large-scale data. To handle the complexities of Big Data and distributed systems, you must drastically simplify your approach. This book introduces a general framework for thinking about big data, and then shows how to apply technologies like Hadoop, Thrift, and various NoSQL databases to build simple, robust, and efficient systems to handle it.
Nathan Marz is an engineer at Twitter. He was previously Lead Engineer at BackType, a marketing intelligence company that was acquired by Twitter in July 2011. He is the author of two major open source projects: Storm, a distributed realtime computation system, and Cascalog, a tool for processing data on Hadoop. He is a frequent speaker and writes a blog at nathanmarz.com.
Sam Ritchie is an engineer at Twitter who uses Cascalog and ElephantDB to process and analyze many terabytes of data in near real-time. He is also the lead developer on FORMA, an open-source deforestation monitoring system in use by a number of top research institutions. He is a committer on Cascalog, ElephantDB, Pallet and a number of other open source Clojure projects.
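Really not that good. The lambda concept has long been outdated, and it is also hard to put into practice.
A front-line introduction to real-time + batch from the creator of Storm. After the first two chapters you can more or less infer the whole Lambda architecture. The author slips in a bit too much of his own agenda.
It makes me sleepy as soon as I start reading. For outsiders who do not work in the field, just reading the theory chapters is already quite rewarding.
I skimmed through it. My thinking is a little clearer, but the insight still does not go deep enough; I need to look into each of the things it mentions…
A very hands-on book introducing distributed system development.
1. A book by the author of the famous Lambda architecture; 2. I like this kind of clear, point-by-point reasoning; 3. Human-fault tolerance is not optional; 4. The examples are somewhat superfluous, and the information is fairly redundant; 5. The Lambda architecture's serving layer handles normalization/denormalization really well; 6. If I could have read this book when I first encountered big data, ...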
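I had heard of the famous Lambda Architecture long ago, but never understood what it actually meant. Even after reading Wikipedia ( https://en.wikipedia.org/wiki/Lambda_architecture ), I grasped only the surface and not the substance. Fortunately, this book, "Big Data - Principles and Best Practices of Scalable Realtime Data Systems", gives...
This book was written by big data experts. I know this because I have done data destruction-related work for ten years. Now that I have read the book, I find that all of my problems are addressed in it. In fact, every issue it discusses has come up in my own pipelines, as if the authors had been working alongside me on my projects. Another feature that is very useful to me is that it is the first book I could find...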