
What is Apache Flink?

Jack Vaughan

Published: Oct 06, 2021

Apache Flink is a distributed data processing platform for big data applications, often used to analyze data stored in Hadoop clusters. It supports a combination of in-memory and disk-based processing and handles both batch and stream processing jobs. Streaming is the default execution model, and batch jobs run as special cases of streaming applications that process bounded (finite) data.
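
To make the "batch as a special case of streaming" idea concrete, here is a minimal sketch in plain Python. It is not Flink code, and the function names (`tokenize`, `running_word_count`, `pipeline`, `endless_lines`) are hypothetical. A single word-count pipeline consumes records one at a time, and the only difference between a "batch" run and a "streaming" run is whether the source ever ends.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, Tuple


def tokenize(lines: Iterable[str]) -> Iterator[str]:
    """Split each incoming line into lowercase words, one record at a time."""
    for line in lines:
        for word in line.lower().split():
            yield word


def running_word_count(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit an updated (word, count) pair for every incoming word.

    The counts are kept as operator state, so the same code works whether
    the input is bounded (a list or file) or unbounded (an endless feed).
    """
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


def pipeline(source: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Stages are chained lazy generators: each record passes through every
    # stage before the next record is read from the source.
    return running_word_count(tokenize(source))


def endless_lines() -> Iterator[str]:
    # Hypothetical unbounded source that never runs out of records.
    while True:
        yield "tick tock"


# "Batch": a bounded source, so the job ends when the input is exhausted.
batch_result = list(pipeline(["to be or", "not to be"]))
final_counts = dict(batch_result)  # last update per word wins

# "Streaming": an unbounded source, so we only inspect the first 5 updates.
stream_head = list(islice(pipeline(endless_lines()), 5))
```

Here `final_counts` holds the batch answer once the bounded input runs out, whereas the streaming run never completes and simply keeps emitting updated counts.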

Flink was designed as an alternative to MapReduce, the batch-only processing engine that was paired with the Hadoop Distributed File System (HDFS) in Hadoop's initial incarnation. Flink is open source software released under the Apache License 2.0. Much of its development has been driven by data Artisans GmbH, a Berlin-based startup founded by Flink's creators that was renamed Ververica in 2019.

How does Apache Flink work?

Flink streaming applications are programmed with the DataStream API in Java or Scala. These languages, as well as Python, can also be used with the complementary DataSet API for processing static data. Flink can be deployed in standalone mode, including on a single Java virtual machine (JVM), on YARN-based Hadoop clusters, or on cloud systems.
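
The DataStream API is built around chaining transformations on a stream; in Java these include `map`, `filter`, `keyBy` and `reduce`. The sketch below is a toy, single-process stand-in written in plain Python so that it runs without a Flink installation. The `ToyStream` and `ToyKeyedStream` classes are hypothetical and only borrow the general shape of that chaining style; they are not PyFlink and do not reproduce Flink's semantics for parallelism, fault tolerance or time.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List


class ToyStream:
    """Hypothetical single-process stand-in for a DataStream-style API.

    Each method wraps the upstream iterator lazily, so records are pulled
    through the whole chain one at a time.
    """

    def __init__(self, records: Iterable[Any]) -> None:
        self._records = records

    def map(self, fn: Callable[[Any], Any]) -> "ToyStream":
        return ToyStream(fn(r) for r in self._records)

    def filter(self, pred: Callable[[Any], bool]) -> "ToyStream":
        return ToyStream(r for r in self._records if pred(r))

    def key_by(self, key_fn: Callable[[Any], Any]) -> "ToyKeyedStream":
        return ToyKeyedStream(self._records, key_fn)

    def collect(self) -> List[Any]:
        return list(self._records)


class ToyKeyedStream:
    """Records grouped by key; reduce keeps one running value per key."""

    def __init__(self, records: Iterable[Any], key_fn: Callable[[Any], Any]) -> None:
        self._records = records
        self._key_fn = key_fn

    def reduce(self, fn: Callable[[Any, Any], Any]) -> ToyStream:
        def rolling() -> Iterator[Any]:
            state: Dict[Any, Any] = {}  # per-key running aggregate
            for r in self._records:
                k = self._key_fn(r)
                state[k] = fn(state[k], r) if k in state else r
                yield state[k]  # emit the updated aggregate for this key
        return ToyStream(rolling())


# Sum sensor readings per sensor id, dropping negative (invalid) values.
readings = [("s1", 3), ("s2", 5), ("s1", -1), ("s1", 4), ("s2", 1)]
sums = (
    ToyStream(readings)
    .filter(lambda r: r[1] >= 0)
    .map(lambda r: (r[0], r[1] * 10))  # rescale each reading, e.g. unit change
    .key_by(lambda r: r[0])
    .reduce(lambda a, b: (a[0], a[1] + b[1]))
    .collect()
)
```

Because every stage is lazy, nothing runs until `collect()` pulls records through. This loosely mirrors how a Flink program only builds a job graph until the job is submitted with `execute()` on its execution environment.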

The core Flink runtime is built on a pipelined streaming architecture, and it also offers built-in support for iterative data processing, which machine learning and other analytics applications depend on. Dedicated APIs and libraries are provided for developing machine learning programs, for graph processing and for other uses. Another API is focused on integrating existing Hadoop applications.
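
Iterative processing repeats a step function over a data set until it converges. As an illustration, the plain-Python sketch below computes connected components of a graph by label propagation in the style of a delta (incremental) iteration. A "solution set" holds each vertex's current component label, and a "workset" holds only the vertices whose labels changed in the previous superstep, so later rounds touch less and less data. This is a sketch of the technique under simplifying assumptions (integer vertex ids, a single process), not Flink's iteration API; the function name `connected_components` is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def connected_components(
    vertices: Iterable[int], edges: Iterable[Tuple[int, int]]
) -> Tuple[Dict[int, int], int]:
    """Label each vertex with the smallest vertex id in its component.

    Returns the final labels (the "solution set") and the number of
    supersteps that ran before the workset became empty.
    """
    # Undirected adjacency list; edge endpoints are added if missing.
    neighbors: Dict[int, Set[int]] = {v: set() for v in vertices}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    labels = {v: v for v in neighbors}  # solution set: start with own id
    workset: Set[int] = set(neighbors)  # every vertex counts as changed at first
    supersteps = 0

    while workset:  # converged once a superstep changes nothing
        supersteps += 1
        updates: Dict[int, int] = {}
        for v in workset:
            for n in neighbors[v]:
                # Read labels from the previous superstep; keep the smallest
                # candidate label that improves on the neighbor's current one.
                if labels[v] < updates.get(n, labels[n]):
                    updates[n] = labels[v]
        labels.update(updates)  # apply all changes at the end of the step
        workset = set(updates)  # only changed vertices propagate next time

    return labels, supersteps


labels, steps = connected_components(range(1, 7), [(1, 2), (2, 3), (4, 5)])
```

Applying updates only at the end of each superstep keeps rounds deterministic, and shrinking the workset avoids reprocessing vertices whose labels have already settled. Flink's DataSet API exposed this pattern natively as bulk and delta iterations (`iterate` and `iterateDelta`).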

How has Apache Flink evolved?

Flink grew out of Stratosphere, a research project begun in 2009 at three German institutions: TU Berlin, Humboldt University of Berlin and the Hasso Plattner Institute. Flink became an Apache Incubator project in April 2014 and a top-level project late that year; after nine earlier releases, Apache Flink 1.0.0 was released in March 2016. With that, Flink formally joined other Hadoop ecosystem frameworks, such as Spark, Storm and Samza, in the competition to provide big data streaming capabilities.
