Spark Structured Streaming: Customizing Kafka Stream Processing

DOI: 10.1109/dsmp47368.2020.9204304

11 Citations • 2020

Yuriy Drohobytskiy, Vitaly Brevus, Yuriy Skorenkyy

The paper proposes an improved solution for large-scale multi-party data exchange and stream processing that uses Apache Kafka streams together with HDFS file granulation, exemplified in a real project of data ingestion into the Hadoop ecosystem.

Abstract

The aim of the present paper is to improve a solution for large-scale multi-party data exchange and stream processing. The method of choice uses Apache Kafka streams together with HDFS file granulation and is exemplified in a real project of data ingestion into the Hadoop ecosystem. Procedures for managing and conditionally controlling the stream are proposed, and various ways to manage Kafka offsets during stream processing are considered.
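
One common way to manage Kafka offsets in Spark Structured Streaming is to keep them in an external store and hand them back to the Kafka source through its `startingOffsets` option when a query is started. The sketch below only builds the JSON string that option accepts; it is an illustration, not the procedure described in the paper. The helper name `starting_offsets_json`, the topic name `events`, and the shape of the `committed` mapping are assumptions made for this example.

```python
import json

# Special values understood by the Kafka source in a startingOffsets JSON spec.
EARLIEST = -2
LATEST = -1


def starting_offsets_json(committed, partitions, default=EARLIEST):
    """Build a per-partition startingOffsets JSON string.

    committed:  {(topic, partition): next_offset_to_read}, assumed to be
                loaded from some external store (hypothetical here).
    partitions: {topic: [partition ids]} that the query will read.
    default:    value used for partitions with no stored offset.
    """
    spec = {}
    for topic, parts in partitions.items():
        spec[topic] = {
            str(p): committed.get((topic, p), default) for p in parts
        }
    # sort_keys keeps the output stable, which helps when logging or diffing.
    return json.dumps(spec, sort_keys=True)


# Partition 0 resumes from a stored offset; partition 1 has none, so it
# falls back to the earliest available record.
offsets = starting_offsets_json(
    committed={("events", 0): 23},
    partitions={"events": [0, 1]},
)
print(offsets)
```

The resulting string can be passed as `.option("startingOffsets", offsets)` on a `spark.readStream.format("kafka")` reader. Note that this option applies when a query starts fresh; a query resumed from a checkpoint continues from the offsets recorded there.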