Flowman is an open-source project developed by dimajix that helps your company build ETL jobs based on Apache Spark. The core idea of Flowman is to specify the data flow purely declaratively and then have it executed by a flexible Spark application, which is Flowman itself.
With this approach, you cleanly separate the business logic from the technical details required for running in production. This allows you to focus on the business logic, while Flowman, as a mature Spark application, takes care of the technical details to ensure stable execution. This includes exporting relevant metrics for monitoring, consistent logging, support for clean reruns, and more.
The data flows themselves are stored in YAML files and, in contrast to classic Scala/Java code, can also be followed by a business expert after only a short introduction. In this way, you can involve existing domain expertise more closely in development and detect errors in the business logic at an early stage.
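
To make the declarative idea described above more concrete, here is a minimal sketch in plain Python. It is not Flowman's YAML schema or API: the `FLOW` structure, the step kinds `filter_not_null` and `select`, and the `run_flow` function are all hypothetical names chosen for illustration. The point is only the separation: the flow is plain data, of the kind a YAML file could hold, and a generic executor interprets it.

```python
# Hypothetical declarative flow: plain data, no executable logic.
# Structure and key names are illustrative only, not Flowman's schema.
FLOW = {
    "source": [
        {"station": "A", "temperature": 21.5},
        {"station": "B", "temperature": None},
        {"station": "A", "temperature": 19.0},
    ],
    "steps": [
        {"kind": "filter_not_null", "column": "temperature"},
        {"kind": "select", "columns": ["station"]},
    ],
}


def filter_not_null(rows, column):
    """Keep only rows whose value in `column` is not None."""
    return [row for row in rows if row[column] is not None]


def select(rows, columns):
    """Project each row onto the given columns."""
    return [{c: row[c] for c in columns} for row in rows]


# The generic executor knows the step kinds, not any particular flow.
STEP_KINDS = {"filter_not_null": filter_not_null, "select": select}


def run_flow(flow):
    """Apply each declared step in order to the source rows."""
    rows = flow["source"]
    for step in flow["steps"]:
        params = {k: v for k, v in step.items() if k != "kind"}
        rows = STEP_KINDS[step["kind"]](rows, **params)
    return rows


result = run_flow(FLOW)
print(result)
```

In this toy version, changing the business logic means editing only the `FLOW` data, while concerns such as logging or metrics could be added once inside `run_flow` and would then apply to every flow.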