    Awesome Open-Source Data Engineering

    A comprehensive collection of open-source projects for data engineering, covering workflow orchestration, data processing, streaming, storage, and quality tools used in modern data platforms.

    Information

    Website: github.com
    Published: Mar 25, 2026

    Categories

    Data Engineering

    Tags

    #open-source #data-pipelines #etl

    Overview

    This curated list provides an overview of open-source projects related to data engineering, helping data engineers discover tools for building robust data platforms and pipelines.

    Features

    Workflow Orchestration

    • Apache Airflow - Platform to programmatically author, schedule and monitor workflows
    • Dagster - Data orchestrator for machine learning, analytics, and ETL
    • Prefect - Modern workflow orchestration framework
    • Temporal - Durable execution framework
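
The orchestrators above share one core idea: tasks form a dependency graph, and each task runs only after its upstream tasks finish. The following is a minimal sketch of that idea using only the Python standard library. It is not the API of Airflow, Dagster, Prefect, or Temporal; the function `run_pipeline` and the task names are hypothetical and chosen for illustration.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task once, after all of its upstream tasks have finished."""
    # dependencies maps a task name to the set of tasks it depends on.
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        # Hand each task the results of its direct upstream tasks.
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return order, results


# Hypothetical three-step pipeline: extract -> transform -> load.
tasks = {
    "extract": lambda upstream: [3, 1, 2],
    "transform": lambda upstream: sorted(upstream["extract"]),
    "load": lambda upstream: len(upstream["transform"]),
}
dependencies = {"transform": {"extract"}, "load": {"transform"}}

order, results = run_pipeline(tasks, dependencies)
print(order)
print(results["load"])
```

Real orchestrators add scheduling, retries, and monitoring on top of this ordering step, which is where the listed projects differ most.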

    Data Processing

    • Apache Spark - Unified analytics engine for large-scale data processing
    • Apache Flink - Stateful computations over data streams
    • dbt - Transform data in your warehouse using SQL
    • Polars - Lightning-fast DataFrame library

    Stream Processing

    • Apache Kafka - Distributed event streaming platform
    • Apache Pulsar - Cloud-native distributed messaging and streaming
    • Redpanda - Kafka API compatible streaming platform
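
A common task on top of an event stream is windowed aggregation. The sketch below counts events per key in fixed, non-overlapping (tumbling) windows, using plain Python over an in-memory list. It does not use Kafka, Pulsar, or Redpanda client APIs; the function `tumbling_window_counts` and the sample events are assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Align each event to the start of the window that contains it.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical (timestamp_in_seconds, event_type) pairs.
events = [(0, "click"), (4, "click"), (9, "view"), (10, "click"), (19, "view")]
counts = tumbling_window_counts(events, window_seconds=10)
print(counts)
```

A production stream processor would also have to handle late and out-of-order events, which this sketch ignores.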

    Data Quality

    • Great Expectations - Data validation framework
    • Soda Core - Data quality testing framework
    • Monte Carlo - Data observability platform (a commercial product rather than an open-source project)
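
Data validation tools generally let you declare checks and then report which records fail them. The sketch below shows that pattern in plain Python. It is not the Great Expectations or Soda Core API; `validate_rows`, the check names, and the sample rows are hypothetical.

```python
def validate_rows(rows, checks):
    """Return a (row_index, check_name) pair for every failed check."""
    failures = []
    for index, row in enumerate(rows):
        for name, predicate in checks.items():
            if not predicate(row):
                failures.append((index, name))
    return failures


# Hypothetical checks: each maps a name to a predicate over one row.
checks = {
    "id_not_null": lambda row: row.get("id") is not None,
    "amount_non_negative": lambda row: (
        isinstance(row.get("amount"), (int, float)) and row["amount"] >= 0
    ),
}

rows = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},
    {"id": 3, "amount": -2.5},
]

failures = validate_rows(rows, checks)
print(failures)
```

Keeping checks as named predicates makes it easy to report exactly which rule a record violated.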

    Storage & Lakehouse

    • Apache Iceberg - Open table format for huge analytic datasets
    • Delta Lake - Storage framework for building lakehouses
    • Apache Hudi - Transactional data lake platform

    Query Engines

    • Apache Drill - Schema-free SQL query engine
    • Presto / Trino - Distributed SQL query engines (Trino began as a fork of Presto)
    • ClickHouse - Fast column-oriented database

    Use Cases

    • Building modern data platforms and lakehouses
    • Creating reliable ETL/ELT pipelines
    • Implementing real-time streaming analytics
    • Ensuring data quality and observability
    • Orchestrating complex data workflows

    Pricing

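    Most projects listed are free and open-source, available under licenses such as Apache 2.0 and MIT. Licensing varies by project and a few entries, such as Monte Carlo, are commercial products, so check each project's license before adopting it.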