
Author: Alex Thomas
Pages: 347
Publisher: Southeast University Press
Publication date: 2021
ISBN: 9787564195113
E-book formats: pdf/epub/txt
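Description
If you want to build an enterprise-grade application that uses natural language text but aren't sure where to begin or which tools to use, this practical guide will help you get started. Alex Thomas, principal data scientist at Wisecube, shows software engineers and data scientists how to build scalable natural language processing (NLP) applications using deep learning and the Apache Spark NLP library. Through concrete examples, practical and theoretical explanations, and hands-on exercises for using NLP on the Spark processing framework, this book teaches you everything from basic linguistics and writing systems to sentiment analysis and search engines. You'll also explore special concerns in developing text-based applications, such as performance.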
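About the Author
Alex Thomas is a principal data scientist at Wisecube. He has applied natural language processing and machine learning to clinical data, identity data, employer and job-seeker data, and now biochemical data. Alex has used Apache Spark since version 0.9 and has worked with a variety of NLP libraries and frameworks, including UIMA and OpenNLP.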
Table of Contents
Preface
Part I. Basics
1. Getting Started
Introduction
Other Tools
Setting Up Your Environment
Prerequisites
Starting Apache Spark
Checking Out the Code
Getting Familiar with Apache Spark
Starting Apache Spark with Spark NLP
Loading and Viewing Data in Apache Spark
Hello World with Spark NLP
2. Natural Language Basics
What Is Natural Language?
Origins of Language
Spoken Language Versus Written Language
Linguistics
Phonetics and Phonology
Morphology
Syntax
Semantics
Sociolinguistics: Dialects, Registers, and Other Varieties
Formality
Context
Pragmatics
Roman Jakobson
How To Use Pragmatics
Writing Systems
Origins
Alphabets
Abjads
Abugidas
Syllabaries
Logographs
Encodings
ASCII
Unicode
UTF-8
Exercises: Tokenizing
Tokenize English
Tokenize Greek
Tokenize Ge’ez (Amharic)
Resources
3. NLP on Apache Spark
Parallelism, Concurrency, Distributing Computation
Parallelization Before Apache Hadoop
MapReduce and Apache Hadoop
Apache Spark
Architecture of Apache Spark
Physical Architecture
Logical Architecture
Spark SQL and Spark MLlib
Transformers
Estimators and Models
Evaluators
NLP Libraries
Functionality Libraries
Annotation Libraries
NLP in Other Libraries
Spark NLP
Annotation Library
Stages
Pretrained Pipelines
Finisher
Exercises: Build a Topic Model
Resources
4. Deep Learning Basics
Gradient Descent
Backpropagation
Convolutional Neural Networks
Filters
Pooling
Recurrent Neural Networks
Backpropagation Through Time
Elman Nets
LSTMs
Exercise 1
Exercise 2
Resources
Part II. Building Blocks
5. Processing Words
6. Information Retrieval
7. Classification and Regression
8. Sequence Modeling with Keras
9. Information Extraction
10. Topic Modeling
11. Word Embeddings
Part III. Applications
12. Sentiment Analysis and Emotion Detection
13. Building Knowledge Bases
14. Search Engine
15. Chatbot
16. Object Character Recognition
Part IV. Building NLP Systems
17. Supporting Multiple Languages
18. Human Labeling
19. Productionizing NLP Applications
Glossary
Index