A Practical Guide to Real-Time Speech Processing (實時語音處理實踐指南)

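Publisher: Publishing House of Electronics Industry (電子工業出版社)
Author: Ge Shichao (葛世超)
Producer: Broadview (博文視點)
Pages: 352
Publication date: April 2020
Price: 99
Binding: Paperback
ISBN: 9787121387593
Series: Broadview AI Series (博文視點AI系列)
Tags:
  • Computers
  • Signal processing
  • WebRTC
  • Speech processing
  • Real-time communication
  • Audio processing
  • Embedded systems
  • Audio/video technology
  • Engineering practice
  • Technical guide
  • Speech recognition
  • Speech synthesis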

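Description

This book introduces the interactive real-time speech processing pipeline for Internet scenarios, covering applications such as intelligent voice assistants, smart speakers, and audio/video conferencing. It is organized into four parts: real-time speech signal processing, digital audio effects, network transmission and encoding/decoding, and voice wake-up and recognition. Each part starts from basic concepts and principles, combines theory with practice, and analyzes commercially valuable examples in detail to help readers understand how the algorithms are implemented in engineering practice. To help interested readers quickly validate the algorithms, improve them, and apply them to real projects, the authors have also open-sourced the source code for the algorithms in the book.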

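About the Authors

Ge Shichao (葛世超) holds a master's degree from the Radar National Defense Key Laboratory at Xidian University, and has worked at Alibaba and Rokid on speech algorithms.

Lü Qiang (呂強) holds a bachelor's degree in communication engineering from Jilin University and was formerly an audio expert in system software at Whaley TV (微鯨電視).

Qian Sichong (錢思衝) holds a PhD from Wuhan University of Technology, worked on microphone array signal research at Rokid from 2016 to 2018, and currently focuses on blind source separation of speech signals.

Zhang Bolun (張博倫) holds a master's degree from the Key Laboratory of Submarine Geosciences and Prospecting Techniques (Ministry of Education) at Ocean University of China, and has since worked on underwater acoustics and audio signal processing.

Zhang Shuo (張碩) graduated from Xidian University and Supélec (法國高等電力學院) in France, and has worked at Nokia and Rokid on speech algorithms.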

Table of Contents

Introduction
Chapter 1 Signal Processing
1.1 Digital and Analog Frequency
1.2 Discrete Fourier Transform
1.2.1 Real DFT
1.2.2 Complex DFT
1.2.3 Negative-Frequency Components
1.2.4 Properties of the DFT
1.3 FFT
1.3.1 FFT Result Examples
1.3.2 FFT of Real Signals
1.3.3 Short-Time Fourier Transform
1.3.4 Choosing a Window Function for Speech STFT
1.4 Overlap-Add and Overlap-Save Methods
1.4.1 OLA
1.4.2 OLS
1.5 Weighted Overlap-Add
1.5.1 WOLA Computation
1.5.2 Choosing a WOLA Window Function
1.6 Filter Banks
1.7 Speech Pre-emphasis
1.8 Gaussian Distribution
1.8.1 Univariate Gaussian Distribution
1.8.2 Multivariate Gaussian Distribution
1.9 HMM
1.10 Kalman Filtering
Chapter Summary
References
Chapter 2 Sound Production Mechanisms and Devices
2.1 Speech Production and Reception
2.1.1 Speech Production Mechanism
2.1.2 Sound Production Model
2.1.3 Units of Pronunciation
2.1.4 Classification of Speech Sounds
2.1.5 Sound Reception
2.1.6 Sound Propagation
2.2 Loudspeakers
2.2.1 Electrical Performance
2.2.2 Acoustic Performance
2.2.3 Noise Floor
2.2.4 Frequency Response
2.2.5 THD+N POUT
2.2.6 Voltage (Power) and Distortion
2.3 Microphones
2.3.1 Microphone Performance Metrics
2.3.2 Choosing a Microphone
2.4 Structural Design
2.4.1 Loudspeaker Acoustic Cavity Design
2.4.2 Microphones and Loudspeakers
2.5 Audio Equipment
2.5.1 Listening Equipment
2.5.2 Sound-Field Expressiveness
2.5.3 Sound-Producing Equipment
2.5.4 Anechoic Chamber Testing
2.6 Acoustic Testing
2.6.1 Acoustic Volume
2.6.2 Total Harmonic Distortion (THD)
2.6.3 Frequency Response Aliasing
2.6.4 Microphone Array Consistency
2.6.5 AEC Reference Path
2.6.6 Loudspeaker Image Frequency
2.6.7 THD at Maximum Loudspeaker Amplitude
Chapter Summary
References
Chapter 3 Speech Endpoint Detection
3.1 Feature Selection
3.2 Decision Criteria
3.2.1 Thresholds
3.2.2 Statistical Model Methods
3.2.3 Machine Learning Methods
3.3 VAD Example
3.3.1 Gaussian Distribution
3.3.2 Algorithm Flow
3.3.3 Computation Flow
3.4 Initial Parameters for Speech/Non-Speech Frames
3.4.1 Model Parameter Computation
3.4.2 Gaussian Mixture Model
3.4.3 EM Algorithm
Chapter Summary
References
Chapter 4 Single-Channel Noise Reduction
4.1 Spectral Subtraction
4.1.1 Principle of Spectral Subtraction
4.1.2 Implementing Spectral Subtraction
4.1.3 Controlling Musical Noise
4.1.4 Filtering Method
4.2 Wiener Filtering
4.3 Subspace Noise Reduction
4.4 WebRTC Single-Channel Noise Reduction Implementation
4.4.1 Algorithm Principle
4.4.2 Algorithm Initialization
4.4.3 SNR Computation: ComputeSnr
4.4.4 Speech/Noise Probability Computation
4.4.5 Feature Selection
4.4.6 Flatness Computation
4.4.7 Noise Estimate Update Function: UpdateNoiseEstimate
4.4.8 Noise Removal
4.4.9 Signal Synthesis
4.4.10 Simulation Results
4.5 Deep Learning Noise Reduction
Chapter Summary
References
Chapter 5 Acoustic Echo Cancellation
5.1 Principle of Echo Cancellation
5.2 Adaptive Filters
5.2.1 Wiener Filter
5.2.2 LMS Algorithm
5.2.3 NLMS Algorithm
5.2.4 PBFDAF Algorithm
5.3 WebRTC Echo Cancellation Algorithm
5.3.1 Delay Estimation
5.3.2 Adaptive Filtering
5.3.3 Nonlinear Processing (NLP)
5.3.4 MATLAB Code Walkthrough
5.3.5 Simulation Experiments
5.4 Speex Echo Cancellation Algorithm
5.4.1 Variable Step-Size Computation
5.4.2 Bilinear Filter and Preprocessing
5.4.3 MATLAB Code Walkthrough
5.4.4 Algorithm Flow Diagram
5.4.5 Simulation Experiments
Chapter Summary
References
Chapter 6 Sound Source Localization
6.1 GCC Algorithm
6.2 SRP-PHAT Algorithm
6.3 MUSIC Algorithm
6.4 TOPS Algorithm
6.5 FRIDA Algorithm
6.6 Post-Processing for Noise Robustness
6.6.1 Statistical Methods
6.6.2 Kalman Method
6.6.3 Sound Source Localization Modeling
6.6.4 Particle Filter Method
Chapter Summary
References
Chapter 7 Beamforming
7.1 Microphone Arrays
7.1.1 Number and Spacing of Microphones
7.1.2 Spatial Aliasing
7.1.3 Beamforming Metrics
7.1.4 Noise Fields
7.1.5 Sound Radiation
7.2 Common Beamforming Methods
7.2.1 Delay-and-Sum Beamforming
7.2.2 Filter-and-Sum Beamforming
7.2.3 Constant-Beamwidth Beamforming
7.2.4 Super-Resolution Beamforming
7.2.5 Generalized Sidelobe Canceller Beamforming
7.2.6 Minimum Variance Distortionless Response Beamforming
7.3 WebRTC Beamforming Example
7.3.1 Compiling the Test Files
7.3.2 Test File Processing Flow
7.3.3 Test Commands
7.3.4 Basic Idea of the Algorithm
7.3.5 Test Source Code
7.3.6 Algorithm Processing Flow
7.3.7 Weight Computation Function
7.3.8 Weight Multiplication
7.4 Post-Filtering
7.4.1 MMSE Post-Filter
7.4.2 Zelinski Post-Filter
7.4.3 McCowan Post-Filter
7.4.4 STSA Post-Filter
Chapter Summary
References
Chapter 8 Blind Source Separation
8.1 Basic Concepts and Mathematical Preliminaries
8.1.1 Basic Concepts of ICA
8.1.2 Gradients and Optimization Methods
8.2 Preprocessing for Blind Speech Separation: PCA
8.3 Frequency-Domain Independent Component Analysis: FDICA
8.3.1 Frequency-Domain ICA
8.3.2 Decorrelation Estimation Methods
8.3.3 Indeterminacy Problems
8.4 Post-Filtering
8.4.1 Noise Estimation
8.4.2 Attenuation Factor Computation
8.5 Joint Estimation with GSC and ICA
8.5.1 Kurtosis
8.5.2 Classic GSC
8.5.3 Dynamic Weight Vector Estimation
Chapter Summary
References
Chapter 9 Audio Effects Processing
9.1 Channel Types
9.1.1 Mono
9.1.2 Two-Channel
9.1.3 Stereo
9.1.4 Multichannel
9.1.5 Panoramic Sound
9.2 Back-End Audio Effects Processing
Chapter Summary
References
Chapter 10 Speech Encoding/Decoding
10.1 LPC Coding
10.2 SILK Encoding/Decoding
10.2.1 Encoding Parameters
10.2.2 Encoder
10.2.3 Decoder
10.3 Overview of Opus Encoding/Decoding
10.3.1 Opus Decoding
10.3.2 Opus Encoding
10.3.3 Opus Speech/Music Detection
10.4 Speech Quality Assessment
10.4.1 Subjective Testing
10.4.2 Objective Testing
10.4.3 No-Reference Quality Assessment
Chapter Summary
References
Chapter 11 Speech Network Transmission
11.1 Congestion Control
11.1.1 GoogleCC Congestion Control
11.1.2 PCC-Based Congestion Control
11.1.3 BBR-Based Congestion Control
11.2 NetEQ
11.2.1 NetEQ Principles
11.2.2 Jitter and Packet Reception
11.2.3 NetEQ Code Framework
11.2.4 Delay Computation
11.2.5 DSP Processing
11.2.6 Changing Speed Without Changing Pitch
Chapter Summary
References
Chapter 12 Voice Wake-Up
12.1 Introduction to Voice Wake-Up Technology
12.2 Feature Extraction
12.2.1 FBank
12.2.2 MFCC
12.2.3 PCEN
12.3 Model Architectures
12.3.1 DNN
12.3.2 CNN
12.3.3 CRNN
12.3.4 DSCNN
12.3.5 Subband CNN
12.3.6 Attention
12.4 Computation Acceleration
12.4.1 Hardware Resource Assessment
12.4.2 Acceleration Directions
Chapter Summary
References
Chapter 13 Speech Recognition
13.1 Speech Feature Extraction
13.1.1 MFCC Features
13.1.2 PLP Features
13.1.3 Normalization
13.2 Acoustic Models
13.2.1 Gaussian Mixture Model
13.2.2 Parameter Estimation
13.2.3 Hidden Markov Model
13.2.4 Baum-Welch Method
13.2.5 HMM Recognizer
13.3 Language Models
13.3.1 N-gram Language Model
13.3.2 Weighted Finite-State Transducers
13.4 YES and NO Recognition Example
13.4.1 Data Preparation
13.4.2 Data Preprocessing
13.4.3 Vocabulary and Pronunciation Dictionary
13.4.4 Linguistic Model
13.4.5 Feature Extraction
13.4.6 Acoustic Model Training
13.4.7 Decoding and Testing
13.5 Kaldi Chinese Speech Recognition
13.5.1 Dataset Preparation
13.5.2 Acoustic Model Training
13.5.3 Installing portaudio
13.5.4 Online Recognition
13.6 DeepSpeech Speech Recognition
13.6.1 Recognition Modeling
13.6.2 Network Composition
13.6.3 Model Training and Deployment
Chapter Summary
References
Appendix A Technical Terms Used in This Book

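Reviews

The concepts are treated only superficially, so the key points never come across. Every chapter is hard to get through: the logic is weak, the text reads like a pile of stacked material, causes and consequences are not explained, and the reader is left in a fog. The authors try both to explain the underlying principles and to be practical, and end up doing neither clearly; their level of expertise is an open question. I don't think the book needs to cover everything; it would be enough to explain a few key techniques thoroughly, such as echo...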

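User Comments

The concepts are described in an obscure, hard-to-follow way, and the logic is very poor. Everything is covered only superficially; this does not count as a good book.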

Written too broadly and vaguely; quite mediocre.
