

Comfortable ETL with Apache Spark

Boost your Spark ETL jobs by using Flowman.

Flowman Overview

The approach and idea of Flowman

Flowman is an open source project developed by dimajix that helps your company develop ETL jobs based on Apache Spark. The core idea of Flowman is to specify the data flow purely declaratively and then have it executed by a flexible Spark application (Flowman itself).

With this approach, you cleanly separate the business logic from all the technical details required for production operation. You can focus on the business logic, while Flowman, as a mature Spark application, takes care of the technical details to ensure stable execution. This includes exporting relevant metrics for monitoring, consistent logging, support for clean reruns, and more.

The data flows themselves are stored in YAML files and, in contrast to classic Scala/Java code, can also be followed by a business expert after only a brief introduction. In this way, you can involve existing expert knowledge more closely in development and detect errors in the business logic at an early stage.
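To illustrate the declarative style, the sketch below shows what such a YAML data flow might look like: a source relation is read, passed through a mapping, and written to a target table, with a job tying the targets together. This is an illustrative example only; the exact keywords, relation kinds, and file layout are assumptions that may differ between Flowman versions, so consult the Flowman documentation for the authoritative specification.

```yaml
# Illustrative Flowman module (keywords and kinds may vary by version)
relations:
  transactions_raw:          # hypothetical source: CSV files on S3
    kind: file
    format: csv
    location: "s3a://my-bucket/transactions/"
  transactions_out:          # hypothetical sink: a Hive table
    kind: hiveTable
    database: "analytics"
    table: "transactions"

mappings:
  transactions:              # read the source relation as a mapping
    kind: relation
    relation: transactions_raw

targets:
  transactions_hive:         # write the mapping's output to the sink
    kind: relation
    mapping: transactions
    relation: transactions_out

jobs:
  main:                      # the job bundles the targets to build
    targets:
      - transactions_hive
```

A specification like this would then be executed by Flowman's command-line tooling rather than by custom Spark code, which is what keeps the business logic separate from the technical execution details.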

Product Features

Flowman provides the following features:

100% open source (Apache License)
Based on Apache Spark
Flexible specification of data flows
Automatic schema management (creation and migration of tables)
Versatile command-line tool for execution
Integrated metrics for monitoring
Supports Hadoop and Kubernetes
Supports AWS and Azure (S3 and Azure Blob Storage)


Using Flowman brings the following advantages:

Open Source.

There are no license costs, and at the same time you benefit from ongoing development. The liberal Apache license allows you to make internal changes without any obligation to publish them.


A plugin interface allows you to develop missing functionality yourself without having to disclose it.

Relief for Developers.

Since only the business logic needs to be specified, your developers can focus on the essentials while Flowman takes care of the technical details.

Uniform Solution.

Instead of a loose collection of different Spark applications, you get a unified solution that covers all essential requirements. There is no parallel development of multiple solutions to similar problems.


