Part 1: Getting Started
         Chapter 1: Theory
         Agile Big Data
         Big Words Defined
         Agile Big Data Teams
         Recognizing the Opportunity and the Problem
         The Agile Big Data Process
         Code Review and Pair Programming
         Agile Environments: Development Productivity
         Collaboration Space
         Private Space
         Personal Space
         Expressing Ideas Clearly with Large-Format Prints
         Chapter 2: Data
         Email
         Working with Raw Data
         Raw Email
         Structured and Semi-Structured Data
         SQL
         NoSQL
         Serialization
         Extracting and Exposing Features in Evolving Schemas
         Data Pipelines
         Data Perspectives
         Social Networks
         Time Series
         Natural Language
         Probability
         Summary
         Chapter 3: Agile Development Tools
         Scalability = Simplicity
         Agile Big Data Processing
         Setting Up a Virtual Environment for Python
         Serializing Events with Avro
         Using Avro in Python
         Collecting Data
         Processing Data with Pig
         Installing Pig
         Publishing Data with MongoDB
         Installing MongoDB
         Installing MongoDB's Java Driver
         Installing mongo-hadoop
         Pushing Data to MongoDB with Pig
         Searching Data with ElasticSearch
         Installation
         Integrating ElasticSearch and Pig with Wonderdog
         Reflecting on the Workflow
         Lightweight Web Applications
         Python and Flask
         Presenting Data
         Installing Bootstrap
         Enabling Bootstrap
         Visualizing Data with d3.js and nvd3.js
         Summary
         Chapter 4: In the Cloud
         Introduction
         GitHub
         dotCloud
         dotCloud Echo Service
         Python Worker Service
         Amazon Web Services
         Simple Storage Service
         Elastic MapReduce
         MongoDB as a Service
         Instrumentation
         Google Analytics
         Mortar Data
         Part 2: Climbing the Pyramid
         Chapter 5: Collecting and Displaying Data
         Integrating the Software Stack
         Collecting and Serializing the Inbox
         Processing and Publishing Email Data
         Displaying Emails in the Browser
         Serving Email Data with Flask and pymongo
         Rendering HTML5 Pages with Jinja2
         Agile Checkpoint
         Generating a List of Emails
         Displaying Emails with MongoDB
         Anatomy of a Data Presentation
         Searching Emails
         Building an Index with Pig, ElasticSearch, and Wonderdog
         Searching Email Data on a Web Page
         Conclusion
         Chapter 6: Visualizing Data with Charts
         Good Charts
         Extracting Entities: Email Addresses
         Extracting Emails
         Visualizing Time
         Conclusion
         Chapter 7: Exploring Data with Reports
         Adding Links to the Data
         Extracting Keywords from Emails with TF-IDF
         Summary
         Chapter 8: Making Predictions
         Predicting Email Reply Rates
         Personalization
         Summary
         Chapter 9: Driving Actions
         Properties of Good Emails
         Better Predictions with Naive Bayes
         P(Reply | From ∩ To)
         P(Reply | Token)
         Real-Time Prediction
         Logging Events
         Summary
         Index