Automatic Speech Recognition


Publisher: Springer
Authors: Dong Yu, Li Deng
Pages: 321
Publication date: 2014-11-11
Price: USD 99.00
Binding: Hardcover
ISBN: 9781447157786
Tags:
  • Machine Learning
  • Speech
  • Artificial Intelligence
  • Computer Science
  • Automatic Speech Recognition
  • Survey
  • CS
  • ASR
  • Speech Processing
  • Audio Signal Processing
  • Sound Recognition
  • Language Modeling
  • Speech Technology
  • Natural Language Processing

Description

This book provides a comprehensive overview of recent advances in the field of automatic speech recognition, with a focus on deep learning models including deep neural networks and many of their variants. It is the first automatic speech recognition book dedicated to the deep learning approach. In addition to a rigorous mathematical treatment of the subject, the book presents insights into, and the theoretical foundations of, a series of highly successful deep learning models.

About the Authors

Dong Yu joined Microsoft in 1998 and is currently a Principal Researcher at Microsoft Research, an adjunct professor at Zhejiang University, and a guest professor at the University of Science and Technology of China. A senior expert in speech recognition and deep learning, he has published two monographs and more than 150 papers, is the inventor of nearly 60 patents, and is one of the initiators and main authors of the widely influential open-source deep learning toolkit CNTK. His work on deep-learning-based speech recognition brought about a shift in the direction of speech recognition research, greatly advanced the field, and earned the 2013 IEEE Signal Processing Society Best Paper Award. Dr. Yu currently serves on the IEEE Speech and Language Processing Technical Committee and has served on the editorial boards of the IEEE/ACM Transactions on Audio, Speech, and Language Processing and the IEEE Signal Processing Magazine.

Li Deng is a world-renowned expert in artificial intelligence, machine learning, and speech and language signal processing, currently Chief Artificial Intelligence Scientist at Microsoft and Research Manager of the Deep Learning Technology Center. He received his master's and doctoral degrees from the University of Wisconsin, then taught at the University of Waterloo in Canada, where he became a tenured full professor; during that period he also held a research position at MIT. He joined Microsoft Research in 1999, holding a series of positions, and in early 2014 founded the Deep Learning Technology Center, leading technical innovation in artificial intelligence and deep learning across Microsoft and Microsoft Research. Dr. Deng's research interests include automatic speech and speaker recognition, spoken language recognition and understanding, speech-to-speech translation, machine translation, language modeling, statistical methods and machine learning, auditory and other biological information processing, deep structured learning, brain-inspired machine intelligence, multimodal deep learning over images and language, and deep analytics of commercial big data. He has made major contributions in these areas and is a Fellow of the Acoustical Society of America (ASA), a Fellow and Board of Governors member of the IEEE, and a Fellow of the International Speech Communication Association (ISCA); for his outstanding contributions to deep learning and automatic speech recognition he received the 2015 IEEE Signal Processing Society Technical Achievement Award. He has also published more than 300 academic papers in top journals and conferences in these areas, authored five books, and invented or co-invented more than 70 patents. Dr. Deng has served as editor-in-chief of the IEEE Signal Processing Magazine and the IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Table of Contents

1 Introduction
1.1 Automatic Speech Recognition: A Bridge for Better Communication
1.1.1 Human-Human Communication
1.1.2 Human-Machine Communication
1.2 Basic Architecture of ASR Systems
1.3 Book Organization
1.3.1 Part I: Conventional Acoustic Models
1.3.2 Part II: Deep Neural Networks
1.3.3 Part III: DNN-HMM Hybrid Systems for ASR
1.3.4 Part IV: Representation Learning in Deep Neural Networks
1.3.5 Part V: Advanced Deep Models
References
Part I Conventional Acoustic Models
2 Gaussian Mixture Models
2.1 Random Variables
2.2 Gaussian and Gaussian-Mixture Random Variables
2.3 Parameter Estimation
2.4 Mixture of Gaussians as a Model for the Distribution of Speech Features
References
3 Hidden Markov Models and the Variants
3.1 Introduction
3.2 Markov Chains
3.3 Hidden Markov Sequences and Models
3.3.1 Characterization of a Hidden Markov Model
3.3.2 Simulation of a Hidden Markov Model
3.3.3 Likelihood Evaluation of a Hidden Markov Model
3.3.4 An Algorithm for Efficient Likelihood Evaluation
3.3.5 Proofs of the Forward and Backward Recursions
3.4 EM Algorithm and Its Application to Learning HMM Parameters
3.4.1 Introduction to EM Algorithm
3.4.2 Applying EM to Learning the HMM—Baum-Welch Algorithm
3.5 Viterbi Algorithm for Decoding HMM State Sequences
3.5.1 Dynamic Programming and Viterbi Algorithm
3.5.2 Dynamic Programming for Decoding HMM States
3.6 The HMM and Variants for Generative Speech Modeling and Recognition
3.6.1 GMM-HMMs for Speech Modeling and Recognition
3.6.2 Trajectory and Hidden Dynamic Models for Speech Modeling and Recognition
3.6.3 The Speech Recognition Problem Using Generative Models of HMM and Its Variants
References
Part II Deep Neural Networks
4 Deep Neural Networks
4.1 The Deep Neural Network Architecture
4.2 Parameter Estimation with Error Backpropagation
4.2.1 Training Criteria
4.2.2 Training Algorithms
4.3 Practical Considerations
4.3.1 Data Preprocessing
4.3.2 Model Initialization
4.3.3 Weight Decay
4.3.4 Dropout
4.3.5 Batch Size Selection
4.3.6 Sample Randomization
4.3.7 Momentum
4.3.8 Learning Rate and Stopping Criterion
4.3.9 Network Architecture
4.3.10 Reproducibility and Restartability
References
5 Advanced Model Initialization Techniques
5.1 Restricted Boltzmann Machines
5.1.1 Properties of RBMs
5.1.2 RBM Parameter Learning
5.2 Deep Belief Network Pretraining
5.3 Pretraining with Denoising Autoencoder
5.4 Discriminative Pretraining
5.5 Hybrid Pretraining
5.6 Dropout Pretraining
References
Part III Deep Neural Network-Hidden Markov Model Hybrid Systems for Automatic Speech Recognition
6 Deep Neural Network-Hidden Markov Model Hybrid Systems
6.1 DNN-HMM Hybrid Systems
6.1.1 Architecture
6.1.2 Decoding with CD-DNN-HMM
6.1.3 Training Procedure for CD-DNN-HMMs
6.1.4 Effects of Contextual Window
6.2 Key Components in the CD-DNN-HMM and Their Analysis
6.2.1 Datasets and Baselines for Comparisons and Analysis
6.2.2 Modeling Monophone States or Senones
6.2.3 Deeper Is Better
6.2.4 Exploit Neighboring Frames
6.2.5 Pretraining
6.2.6 Better Alignment Helps
6.2.7 Tuning Transition Probability
6.3 Kullback-Leibler Divergence-Based HMM
References
7 Training and Decoding Speedup
7.1 Training Speedup
7.1.1 Pipelined Backpropagation Using Multiple GPUs
7.1.2 Asynchronous SGD
7.1.3 Augmented Lagrangian Methods and Alternating Directions Method of Multipliers
7.1.4 Reduce Model Size
7.1.5 Other Approaches
7.2 Decoding Speedup
7.2.1 Parallel Computation
7.2.2 Sparse Network
7.2.3 Low-Rank Approximation
7.2.4 Teach Small DNN with Large DNN
7.2.5 Multiframe DNN
References
8 Deep Neural Network Sequence-Discriminative Training
8.1 Sequence-Discriminative Training Criteria
8.1.1 Maximum Mutual Information
8.1.2 Boosted MMI
8.1.3 MPE/sMBR
8.1.4 A Unified Formulation
8.2 Practical Considerations
8.2.1 Lattice Generation
8.2.2 Lattice Compensation
8.2.3 Frame Smoothing
8.2.4 Learning Rate Adjustment
8.2.5 Training Criterion Selection
8.2.6 Other Considerations
8.3 Noise Contrastive Estimation
8.3.1 Casting Probability Density Estimation Problem as a Classifier Design Problem
8.3.2 Extension to Unnormalized Models
8.3.3 Apply NCE in DNN Training
References
Part IV Representation Learning in Deep Neural Networks
9 Feature Representation Learning in Deep Neural Networks
9.1 Joint Learning of Feature Representation and Classifier
9.2 Feature Hierarchy
9.3 Flexibility in Using Arbitrary Input Features
9.4 Robustness of Features
9.4.1 Robust to Speaker Variations
9.4.2 Robust to Environment Variations
9.5 Robustness Across All Conditions
9.5.1 Robustness Across Noise Levels
9.5.2 Robustness Across Speaking Rates
9.6 Lack of Generalization Over Large Distortions
References
10 Fuse Deep Neural Network and Gaussian Mixture Model Systems
10.1 Use DNN-Derived Features in GMM-HMM Systems
10.1.1 GMM-HMM with Tandem and Bottleneck Features
10.1.2 DNN-HMM Hybrid System Versus GMM-HMM System with DNN-Derived Features
10.2 Fuse Recognition Results
10.2.1 ROVER
10.2.2 SCARF
10.2.3 MBR Lattice Combination
10.3 Fuse Frame-Level Acoustic Scores
10.4 Multistream Speech Recognition
References
11 Adaptation of Deep Neural Networks
11.1 The Adaptation Problem for Deep Neural Networks
11.2 Linear Transformations
11.2.1 Linear Input Networks
11.2.2 Linear Output Networks
11.3 Linear Hidden Networks
11.4 Conservative Training
11.4.1 L2 Regularization
11.4.2 KL-Divergence Regularization
11.4.3 Reducing Per-Speaker Footprint
11.5 Subspace Methods
11.5.1 Subspace Construction Through Principal Component Analysis
11.5.2 Noise-Aware, Speaker-Aware, and Device-Aware Training
11.5.3 Tensor
11.6 Effectiveness of DNN Speaker Adaptation
11.6.1 KL-Divergence Regularization Approach
11.6.2 Speaker-Aware Training
References
Part V Advanced Deep Models
12 Representation Sharing and Transfer in Deep Neural Networks
12.1 Multitask and Transfer Learning
12.1.1 Multitask Learning
12.1.2 Transfer Learning
12.2 Multilingual and Crosslingual Speech Recognition
12.2.1 Tandem/Bottleneck-Based Crosslingual Speech Recognition
12.2.2 Shared-Hidden-Layer Multilingual DNN
12.2.3 Crosslingual Model Transfer
12.3 Multiobjective Training of Deep Neural Networks for Speech Recognition
12.3.1 Robust Speech Recognition with Multitask Learning
12.3.2 Improved Phone Recognition with Multitask Learning
12.3.3 Recognizing both Phonemes and Graphemes
12.4 Robust Speech Recognition Exploiting Audio-Visual Information
References
13 Recurrent Neural Networks and Related Models
13.1 Introduction
13.2 State-Space Formulation of the Basic Recurrent Neural Network
13.3 The Backpropagation-Through-Time Learning Algorithm
13.3.1 Objective Function for Minimization
13.3.2 Recursive Computation of Error Terms
13.3.3 Update of RNN Weights
13.4 A Primal-Dual Technique for Learning Recurrent Neural Networks
13.4.1 Difficulties in Learning RNNs
13.4.2 Echo-State Property and Its Sufficient Condition
13.4.3 Learning RNNs as a Constrained Optimization Problem
13.4.4 A Primal-Dual Method for Learning RNNs
13.5 Recurrent Neural Networks Incorporating LSTM Cells
13.5.1 Motivations and Applications
13.5.2 The Architecture of LSTM Cells
13.5.3 Training the LSTM-RNN
13.6 Analyzing Recurrent Neural Networks—A Contrastive Approach
13.6.1 Direction of Information Flow: Top-Down versus Bottom-Up
13.6.2 The Nature of Representations: Localist or Distributed
13.6.3 Interpretability: Inferring Latent Layers versus End-to-End Learning
13.6.4 Parameterization: Parsimonious Conditionals versus Massive Weight Matrices
13.6.5 Methods of Model Learning: Variational Inference versus Gradient Descent
13.6.6 Recognition Accuracy Comparisons
13.7 Discussions
References
14 Computational Network
14.1 Computational Network
14.2 Forward Computation
14.3 Model Training
14.4 Typical Computation Nodes
14.4.1 Computation Node Types with No Operand
14.4.2 Computation Node Types with One Operand
14.4.3 Computation Node Types with Two Operands
14.4.4 Computation Node Types for Computing Statistics
14.5 Convolutional Neural Network
14.6 Recurrent Connections
14.6.1 Sample by Sample Processing Only Within Loops
14.6.2 Processing Multiple Utterances Simultaneously
14.6.3 Building Arbitrary Recurrent Neural Networks
References
15 Summary and Future Directions
15.1 Road Map
15.1.1 Debut of DNNs for ASR
15.1.2 Speedup of DNN Training and Decoding
15.1.3 Sequence Discriminative Training
15.1.4 Feature Processing
15.1.5 Adaptation
15.1.6 Multitask and Transfer Learning
15.1.7 Convolutional Neural Networks
15.1.8 Recurrent Neural Networks and LSTM
15.1.9 Other Deep Models
15.2 State of the Art and Future Directions
15.2.1 State of the Art—A Brief Analysis
15.2.2 Future Directions
References
Index

Reviews

Rating

I'm also a beginner. I recently bought this book while working on a paper, but after reading a bit I got stuck on the conventional models and wondered whether I was simply too slow (I only have a deep learning background). The answer: no, this book just isn't suitable as an introduction. So how do you get started? 1. Don't stake everything on a single book. There are plenty of more approachable materials online waiting to be discovered; for example, I found this one: [GM...


User Comments

Rating

I finished this book over the National Day holiday. It is a survey of the speech recognition field, focused on acoustic model training; decoders and language models are not covered. Reading it requires some ASR background, so it is not an introductory text. Many topics are only touched on briefly, and you have to consult the cited literature to go deeper. Overall the framework is very clear, and it is a good survey-style book. Since it was published in 2014, newer techniques from 2015 and 2016 (such as CTC) are not covered.

