
Author: Gerard Maas, Francoi
Pages: 424
Publisher: Southeast University Press (东南大学出版社)
Publication date: 2020
ISBN: 9787564188238
E-book formats: pdf/epub/txt
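Description
Before you can build analytics tools to gain quick insights, you first need to know how to process data in real time. With this practical guide, developers familiar with Apache Spark will learn how to put this in-memory framework to use for streaming data. You will discover how Spark lets you write streaming jobs in almost the same way you write batch jobs. In two parts, 《Apache Spark流处理(影印版)》 (Stream Processing with Apache Spark, reprint edition) compares the two streaming APIs that Spark now supports: the original Spark Streaming library (DStreams) and the newer Structured Streaming API.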
Table of Contents
Foreword
Preface
Part Ⅰ. Fundamentals of Stream Processing with Apache Spark
1. Introducing Stream Processing
What Is Stream Processing?
Batch Versus Stream Processing
The Notion of Time in Stream Processing
The Factor of Uncertainty
Some Examples of Stream Processing
Scaling Up Data Processing
MapReduce
The Lesson Learned: Scalability and Fault Tolerance
Distributed Stream Processing
Stateful Stream Processing in a Distributed System
Introducing Apache Spark
The First Wave: Functional APIs
The Second Wave: SQL
A Unified Engine
Spark Components
Spark Streaming
Structured Streaming
Where Next?
2. Stream-Processing Model
Sources and Sinks
Immutable Streams Defined from One Another
Transformations and Aggregations
Window Aggregations
Tumbling Windows
Sliding Windows
Stateless and Stateful Processing
Stateful Streams
An Example: Local Stateful Computation in Scala
A Stateless Definition of the Fibonacci Sequence as a Stream
Transformation
Stateless or Stateful Streaming
The Effect of Time
Computing on Timestamped Events
Timestamps as the Provider of the Notion of Time
Event Time Versus Processing Time
Computing with a Watermark
Summary
3. Streaming Architectures
Components of a Data Platform
Architectural Models
The Use of a Batch-Processing Component in a Streaming Application
Referential Streaming Architectures
The Lambda Architecture
The Kappa Architecture
Streaming Versus Batch Algorithms
Streaming Algorithms Are Sometimes Completely Different in Nature
Streaming Algorithms Can’t Be Guaranteed to Measure Well Against Batch Algorithms
Summary
4. Apache Spark as a Stream-Processing Engine
The Tale of Two APIs
Spark’s Memory Usage
Failure Recovery
Lazy Evaluation
Cache Hints
Understanding Latency
Throughput-Oriented Processing
Spark’s Polyglot API
……
Part Ⅱ. Structured Streaming
Part Ⅲ. Spark Streaming
Part Ⅳ. Advanced Spark Streaming Techniques
Part Ⅴ. Beyond Apache Spark
E. References for Part Ⅴ
Index