A Practical Guide to Real-Time Speech Processing (实时语音处理实践指南)

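Publisher: Publishing House of Electronics Industry (电子工业出版社)
Author: Ge Shichao (葛世超)
Producer: Broadview (博文视点)
Pages: 352
Publication date: 2020-04
Price: CNY 99
Binding: Paperback
ISBN: 9787121387593
Series: Broadview AI Series (博文视点AI系列)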
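Tags:
  • Computer science
  • Signal processing
  • webrtc
  • Speech processing
  • Real-time communication
  • Audio processing
  • Embedded systems
  • Audio/video technology
  • Engineering practice
  • Technical guide
  • Speech recognition
  • Speech synthesis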

Description

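This book covers the interactive real-time speech processing pipeline used in Internet-based scenarios such as smart voice assistants, smart speakers, and audio/video conferencing. It is organized into four parts: real-time speech signal processing, digital audio effects, network transmission and encoding/decoding, and wake-word detection and speech recognition. Each part starts from basic concepts and principles, combines theory with practice, and analyzes commercially valuable examples in detail to show how the algorithms are implemented in engineering practice. In addition, the authors have open-sourced the code for the book's algorithms so that interested readers can quickly verify them, improve them, and apply them to real projects.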

About the Authors

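Ge Shichao (葛世超) holds a master's degree from the national defense key laboratory for radar at Xidian University. Ge has worked at Alibaba and Rokid on speech algorithms.

Lü Qiang (吕强) holds a bachelor's degree in communication engineering from Jilin University and was formerly the system-software audio expert at Whaley TV (微鲸电视).

Qian Sichong (钱思冲) holds a PhD from Wuhan University of Technology. From 2016 to 2018, Qian worked at Rokid on microphone-array signal processing and now focuses on blind source separation of speech signals.

Zhang Bolun (张博伦) holds a master's degree from the Key Laboratory of Submarine Geosciences and Prospecting Techniques (Ministry of Education) at Ocean University of China. Since graduating, Zhang has worked on underwater acoustics and audio signal processing.

Zhang Shuo (张硕) graduated from Xidian University and Supélec (École Supérieure d'Électricité, France), and has worked at Nokia and Rokid on speech algorithms.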

Table of Contents

Introduction
Chapter 1: Signal Processing
1.1 Digital and Analog Frequency
1.2 Discrete Fourier Transform
1.2.1 Real DFT
1.2.2 Complex DFT
1.2.3 Negative-Frequency Components
1.2.4 Properties of the DFT
1.3 FFT
1.3.1 An FFT Example
1.3.2 FFT of Real Signals
1.3.3 Short-Time Fourier Transform
1.3.4 Window Selection for Speech STFT
1.4 Overlap-Add and Overlap-Save Methods
1.4.1 OLA
1.4.2 OLS
1.5 Weighted Overlap-Add
1.5.1 WOLA Computation
1.5.2 WOLA Window Selection
1.6 Filter Banks
1.7 Speech Pre-emphasis
1.8 Gaussian Distribution
1.8.1 Univariate Gaussian Distribution
1.8.2 Multivariate Gaussian Distribution
1.9 Hidden Markov Models (HMM)
1.10 Kalman Filtering
Chapter Summary
References
Chapter 2: Speech Production and Audio Devices
2.1 Speech Production and Reception
2.1.1 Mechanism of Speech Production
2.1.2 Speech Production Model
2.1.3 Phonetic Units
2.1.4 Classification of Speech Sounds
2.1.5 Sound Reception
2.1.6 Sound Propagation
2.2 Loudspeakers
2.2.1 Electrical Performance
2.2.2 Acoustic Performance
2.2.3 Noise Floor
2.2.4 Frequency Response
2.2.5 THD+N POUT
2.2.6 Voltage (Power) and Distortion
2.3 Microphones
2.3.1 Microphone Performance Metrics
2.3.2 Choosing a Microphone
2.4 Structural Design
2.4.1 Loudspeaker Acoustic Cavity Design
2.4.2 Microphones and Loudspeakers
2.5 Audio Devices
2.5.1 Listening Devices
2.5.2 Sound-Field Performance
2.5.3 Sound-Producing Devices
2.5.4 Anechoic Chamber Testing
2.6 Acoustic Testing
2.6.1 Acoustic Loudness
2.6.2 Distortion (THD)
2.6.3 Frequency-Response Aliasing
2.6.4 Microphone Array Consistency
2.6.5 AEC Reference Path
2.6.6 Loudspeaker Image Frequencies
2.6.7 THD at Maximum Loudspeaker Amplitude
Chapter Summary
References
Chapter 3: Speech Endpoint Detection
3.1 Feature Selection
3.2 Decision Rules
3.2.1 Thresholds
3.2.2 Statistical Model Methods
3.2.3 Machine Learning Methods
3.3 A VAD Example
3.3.1 Gaussian Distribution
3.3.2 Algorithm Flow
3.3.3 Computation Flow
3.4 Initial Parameters for Speech/Non-Speech Frames
3.4.1 Model Parameter Computation
3.4.2 Gaussian Mixture Models
3.4.3 The EM Algorithm
Chapter Summary
References
Chapter 4: Single-Channel Noise Reduction
4.1 Spectral Subtraction
4.1.1 Principle of Spectral Subtraction
4.1.2 Implementing Spectral Subtraction
4.1.3 Controlling Musical Noise
4.1.4 Filtering Approach
4.2 Wiener Filtering
4.3 Subspace Noise Reduction
4.4 WebRTC Single-Channel Noise Reduction Implementation
4.4.1 Algorithm Principle
4.4.2 Algorithm Initialization
4.4.3 SNR Computation: ComputeSnr
4.4.4 Speech/Noise Probability Computation
4.4.5 Feature Selection
4.4.6 Flatness Computation
4.4.7 Noise Estimate Update Function: UpdateNoiseEstimate
4.4.8 Noise Removal
4.4.9 Signal Synthesis
4.4.10 Simulation Results
4.5 Deep-Learning Noise Reduction
Chapter Summary
References
Chapter 5: Acoustic Echo Cancellation
5.1 Principle of Echo Cancellation
5.2 Adaptive Filters
5.2.1 Wiener Filter
5.2.2 The LMS Algorithm
5.2.3 The NLMS Algorithm
5.2.4 The PBFDAF Algorithm
5.3 WebRTC Echo Cancellation Algorithm
5.3.1 Delay Estimation
5.3.2 Adaptive Filtering
5.3.3 Nonlinear Processing (NLP)
5.3.4 Walkthrough of the MATLAB Code
5.3.5 Simulation Experiments
5.4 Speex Echo Cancellation Algorithm
5.4.1 Variable Step-Size Computation
5.4.2 Bilinear Filter and Preprocessing
5.4.3 Walkthrough of the MATLAB Code
5.4.4 Algorithm Flow Diagrams
5.4.5 Simulation Experiments
Chapter Summary
References
Chapter 6: Sound Source Localization
6.1 The GCC Algorithm
6.2 The SRP-PHAT Algorithm
6.3 The MUSIC Algorithm
6.4 The TOPS Algorithm
6.5 The FRIDA Algorithm
6.6 Post-Processing for Noise Robustness
6.6.1 Statistical Methods
6.6.2 Kalman Methods
6.6.3 Modeling Sound Source Localization
6.6.4 Particle Filtering
Chapter Summary
References
Chapter 7: Beamforming
7.1 Microphone Arrays
7.1.1 Number and Spacing of Microphones
7.1.2 Spatial Aliasing
7.1.3 Beamforming Metrics
7.1.4 Noise Fields
7.1.5 Acoustic Radiation
7.2 Common Beamforming Methods
7.2.1 Delay-and-Sum Beamforming
7.2.2 Filter-and-Sum Beamforming
7.2.3 Constant-Beamwidth Beamforming
7.2.4 Super-Resolution Beamforming
7.2.5 Generalized Sidelobe Canceller (GSC) Beamforming
7.2.6 Minimum Variance Distortionless Response (MVDR) Beamforming
7.3 WebRTC Beamforming Example
7.3.1 Compiling the Test File
7.3.2 Test File Processing Flow
7.3.3 Test Commands
7.3.4 Basic Idea of the Algorithm
7.3.5 Test Source Code
7.3.6 Algorithm Processing Flow
7.3.7 Weight Computation Function
7.3.8 Applying the Weights
7.4 Post-filtering
7.4.1 MMSE Post-filter
7.4.2 Zelinski Post-filter
7.4.3 McCowan Post-filter
7.4.4 STSA Post-filter
Chapter Summary
References
Chapter 8: Blind Source Separation
8.1 Basic Concepts and Mathematical Preliminaries
8.1.1 Basic Concepts of ICA
8.1.2 Gradients and Optimization Methods
8.2 Preprocessing for Blind Speech Separation: PCA
8.3 Frequency-Domain Independent Component Analysis: FDICA
8.3.1 Frequency-Domain ICA
8.3.2 Decorrelation-Based Estimation
8.3.3 Ambiguity Problems
8.4 Post-filtering
8.4.1 Noise Estimation
8.4.2 Attenuation Factor Computation
8.5 Joint GSC and ICA Estimation
8.5.1 Kurtosis
8.5.2 Classic GSC
8.5.3 Dynamic Weight Vector Estimation
Chapter Summary
References
Chapter 9: Audio Effects Processing
9.1 Channel Configurations
9.1.1 Mono
9.1.2 Two-Channel
9.1.3 Stereo
9.1.4 Multichannel
9.1.5 Immersive Audio
9.2 Back-End Audio Effects Processing
Chapter Summary
References
Chapter 10: Speech Encoding/Decoding
10.1 LPC Coding
10.2 SILK Encoding/Decoding
10.2.1 Encoding Parameters
10.2.2 Encoder
10.2.3 Decoder
10.3 Overview of Opus Encoding/Decoding
10.3.1 Opus Decoding
10.3.2 Opus Encoding
10.3.3 Opus Speech/Music Detection
10.4 Speech Quality Assessment
10.4.1 Subjective Testing
10.4.2 Objective Testing
10.4.3 No-Reference Quality Assessment
Chapter Summary
References
Chapter 11: Network Transmission of Speech
11.1 Congestion Control
11.1.1 GoogleCC Congestion Control
11.1.2 PCC-Based Congestion Control
11.1.3 BBR-Based Congestion Control
11.2 NetEQ
11.2.1 NetEQ Principles
11.2.2 Jitter and Packet Reception
11.2.3 NetEQ Code Framework
11.2.4 Delay Computation
11.2.5 DSP Processing
11.2.6 Time Stretching Without Pitch Change
Chapter Summary
References
Chapter 12: Voice Wake-Up
12.1 Introduction to Voice Wake-Up Technology
12.2 Feature Extraction
12.2.1 FBank
12.2.2 MFCC
12.2.3 PCEN
12.3 Model Architectures
12.3.1 DNN
12.3.2 CNN
12.3.3 CRNN
12.3.4 DSCNN
12.3.5 Sub-band CNN
12.3.6 Attention
12.4 Computational Acceleration
12.4.1 Hardware Resource Assessment
12.4.2 Acceleration Approaches
Chapter Summary
References
Chapter 13: Speech Recognition
13.1 Speech Feature Extraction
13.1.1 MFCC Features
13.1.2 PLP Features
13.1.3 Normalization
13.2 Acoustic Models
13.2.1 Gaussian Mixture Models
13.2.2 Parameter Estimation
13.2.3 Hidden Markov Models
13.2.4 The Baum-Welch Algorithm
13.2.5 HMM Recognizer
13.3 Language Models
13.3.1 N-gram Language Models
13.3.2 Weighted Finite-State Transducers
13.4 A YES/NO Recognition Example
13.4.1 Data Preparation
13.4.2 Data Preprocessing
13.4.3 Vocabulary and Pronunciation Lexicon
13.4.4 Language Model
13.4.5 Feature Extraction
13.4.6 Acoustic Model Training
13.4.7 Decoding and Testing
13.5 Chinese Speech Recognition with Kaldi
13.5.1 Dataset Preparation
13.5.2 Acoustic Model Training
13.5.3 Installing PortAudio
13.5.4 Online Recognition
13.6 DeepSpeech Speech Recognition
13.6.1 Recognition Modeling
13.6.2 Network Architecture
13.6.3 Model Training and Deployment
Chapter Summary
References
Appendix A: Glossary of Terms Used in This Book

Reader Reviews

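The concepts are only skimmed, so it is hard to grasp the key points. Every chapter is a struggle to read: the logic is weak, it feels like a pile of assembled material, and causes and effects are never explained, which leaves the reader confused. The authors try both to explain the underlying principles and to be hands-on, but end up doing neither well, which makes their expertise questionable. I don't think the book needs to cover everything; it would be better to explain a few key techniques thoroughly, such as echo...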

User Comments

Written too broadly and vaguely; mediocre.

The concepts are explained obscurely and the logic is very poor. Every topic is only skimmed; this is not a good book.