Spark Structured Streaming: Customizing Kafka Stream Processing

11 Citations · 2020
Yuriy Drohobytskiy, Vitaly Brevus, Yuriy Skorenkyy
journal unavailable

An improved solution for large-scale multi-party data exchange and stream processing that uses Apache Kafka streams together with HDFS file granulation, exemplified in a real project of data ingestion into the Hadoop ecosystem.

Abstract

The aim of the present paper is to develop an improved solution for large-scale multi-party data exchange and stream processing. The chosen method uses Apache Kafka streams together with HDFS file granulation and is exemplified in a real project of data ingestion into the Hadoop ecosystem. Procedures for stream management and conditional stream control are proposed. Various ways of managing Kafka offsets during stream processing are considered.
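
As an illustration of one way to manage Kafka offsets, the sketch below builds the JSON string that Spark's Kafka source accepts in its `startingOffsets` option from externally saved per-partition offsets. This is not the authors' implementation: the helper name `starting_offsets_json`, the `saved` dictionary layout, and the topic name `events` are assumptions made for this example.

```python
import json

# In the Kafka source's offset JSON, -2 means "earliest" and -1 means "latest".
EARLIEST = -2
LATEST = -1


def starting_offsets_json(saved, topic, partitions, default=EARLIEST):
    """Build a startingOffsets JSON string for one topic.

    saved      -- hypothetical store: {(topic, partition): next_offset_to_read}
    partitions -- partition ids the stream should cover
    default    -- offset marker for partitions with no saved position
    """
    per_partition = {
        str(p): saved.get((topic, p), default)  # JSON keys must be strings
        for p in partitions
    }
    return json.dumps({topic: per_partition}, sort_keys=True)


# Example: partitions 0 and 1 have saved positions, partition 2 does not.
saved = {("events", 0): 120, ("events", 1): 98}
offsets_json = starting_offsets_json(saved, "events", [0, 1, 2])
print(offsets_json)

# The string would then be passed to the Kafka source (requires a Spark
# session and the spark-sql-kafka package, so it is shown as a comment):
#
#   spark.readStream.format("kafka") \
#       .option("kafka.bootstrap.servers", "host:9092") \
#       .option("subscribe", "events") \
#       .option("startingOffsets", offsets_json) \
#       .load()
```

Note that `startingOffsets` only takes effect when a query starts for the first time; a query restarted from an existing checkpoint resumes from the offsets recorded in that checkpoint.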