Spark Structured Streaming: Customizing Kafka Stream Processing

11 Citations · 2020
Yuriy Drohobytskiy, Vitaly Brevus, Yuriy Skorenkyy
journal unavailable

An improved solution for large-scale multi-party data exchange and stream processing that uses Apache Kafka streams and HDFS file granulation, exemplified in a real project of data ingestion into the Hadoop ecosystem.

Abstract

The aim of the present paper is to develop an improved solution for large-scale multi-party data exchange and stream processing. The method of choice uses Apache Kafka streams together with HDFS file granulation, and is exemplified in a real project of data ingestion into the Hadoop ecosystem. Procedures for stream management and conditional stream control are proposed. Various ways to manage Kafka offsets during stream processing are considered.
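
As a rough illustration of one way offsets can be managed, the sketch below builds the JSON string that Spark's Kafka source accepts in its `startingOffsets` option, where `-2` means earliest and `-1` means latest for a given partition. It is not taken from the paper: the helper name `starting_offsets_json`, the topic name `events`, the offsets, and the broker address are all assumptions made for illustration.

```python
import json

EARLIEST = -2  # Spark's Kafka source: start from the earliest available offset
LATEST = -1    # Spark's Kafka source: start from the latest offset


def starting_offsets_json(offsets):
    """Serialize {topic: {partition: offset}} into the JSON string format
    accepted by the Kafka source's `startingOffsets` option.

    Hypothetical helper; partition keys are converted to strings because
    JSON object keys must be strings.
    """
    payload = {
        topic: {str(partition): int(offset) for partition, offset in parts.items()}
        for topic, parts in offsets.items()
    }
    return json.dumps(payload, sort_keys=True)


# Hypothetical topic and offsets: resume partition 0 at offset 1200,
# and read partition 1 from the earliest available record.
resume_from = starting_offsets_json({"events": {0: 1200, 1: EARLIEST}})
print(resume_from)

# Assuming a SparkSession named `spark` and the spark-sql-kafka connector
# on the classpath, the string could be passed to the stream reader:
#
# stream = (spark.readStream.format("kafka")
#           .option("kafka.bootstrap.servers", "localhost:9092")  # assumed address
#           .option("subscribe", "events")
#           .option("startingOffsets", resume_from)
#           .load())
```

Note that `startingOffsets` only takes effect when a query starts for the first time; a query restarted from an existing checkpoint resumes from the offsets recorded there, so explicit offsets like these are mainly useful for fresh or deliberately reset streams.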
