    Awesome Data Engineering

    An Awesome collection of data engineering resources, tools, and best practices for building and maintaining data infrastructure.

    Information

    Website: github.com
    Published: Dec 25, 2025

    Categories

    1 Item
    Themed Directories

    Tags

    3 Items
    #awesome-lists #big-data #data-engineering

    Similar Products

    6 result(s)

    Awesome Hadoop

    A curated Awesome list of Hadoop ecosystem resources, libraries, and tools for distributed storage and processing of very large datasets.

    Awesome HBase

    An Awesome collection of resources, tools, and libraries for Apache HBase, a distributed, scalable big data store.

    Awesome HPC

    Curated Awesome list of tools, libraries, and documentation related to High Performance Computing (HPC).

    Awesome InfluxDB

    A curated Awesome list of resources, libraries, tools, and integrations for the InfluxDB time-series database.

    Awesome Data – Image Processing Datasets

    A curated awesome-style collection of image processing and computer vision datasets, hosted under the Awesome Data (apd-core) project. The listed datasets (e.g., ImageNet, KITTI, Danbooru, DukeMTMC) are part of this meta awesome directory of specialized data resources.

    Featured

    Awesome Public Datasets – Social Networks

    A curated subset of the Awesome Public Datasets project that catalogs high-quality, publicly available social network datasets (e.g., Twitter scrapes, Enron email, Facebook graphs). This collection functions as an "awesome-style" directory specifically focused on social network data, providing structured metadata files for each dataset to make discovery and reuse easier across the wider awesome ecosystem.

    Featured

    Awesome Data Engineering

    URL: https://github.com/igorbarinov/awesome-data-engineering#readme
    Type: Curated resource list (Awesome list)
    Category: Themed directories
    Tags: awesome-lists, big-data, data-engineering

    Overview

    A curated, community-maintained list of tools, libraries, and resources related to data engineering, aimed at helping software developers discover technologies for building and maintaining data infrastructure.

    Features

    • Databases
      Collection of database technologies and related tools commonly used in data engineering pipelines.

    • Data Comparison
      Tools and libraries for comparing datasets, validating data consistency, and detecting differences between data sources.

    • Data Ingestion
      Resources for ingesting data from various sources into data platforms, including ETL/ELT tools and connectors.

    • File System
      Tools and frameworks dealing with file systems and storage layers relevant to data pipelines.

    • Serialization Format
      Formats and tooling for efficient data serialization and interchange (e.g., structured/binary formats used in data engineering).

    • Stream Processing
      Frameworks and services for real-time data processing and streaming analytics.

    • Batch Processing
      Technologies for large-scale, scheduled, or offline data processing workflows.

    • Charts and Dashboards
      Visualization and dashboarding tools to present and explore processed data.

    • Workflow
      Orchestration and workflow management tools for coordinating data pipelines and jobs.

    • Data Lake Management
      Tools for organizing, governing, and operating data lakes.

    • ELK (Elasticsearch, Logstash, Kibana)
      Resources related to the ELK stack for log management, search, and analytics.

    • Docker
      Containerization-related resources useful for packaging and deploying data engineering components.

    • Datasets
      • Realtime – Datasets suited for real-time and streaming scenarios.
      • Data Dumps – Static or bulk datasets for experimentation, benchmarking, or training.

    • Monitoring
      Tools and practices for observing data systems and pipelines.
      • Prometheus – Specific focus on Prometheus-related monitoring resources.

    • Profiling
      Techniques and tools for understanding data characteristics and performance.
      • Data Profiler – Resources for profiling dataset quality, distribution, and schema.

    • Testing
      Tools and approaches for testing data pipelines, transformations, and data quality.

    • Community
      • Forums – Community discussion and Q&A platforms for data engineers.
      • Conferences – Events and conferences focused on data engineering topics.
      • Podcasts – Audio resources discussing data engineering tools, practices, and industry trends.
      • Books – Recommended reading lists related to data engineering.

    • Topics & Resources
      Additional categorized links covering broader data engineering topics and learning materials.

    • Contributing
      Guidelines for community contributions to keep the list up to date.

    • License
      Distributed as an open resource; the license file is included in the repository.
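
As a rough illustration of what the Data Comparison category above covers, the sketch below diffs two small in-memory datasets by a shared key. It is not taken from any tool in the list; the function name `diff_by_key`, the `id` key column, and the sample rows are all hypothetical, and real comparison tools typically work against databases or files rather than Python lists.

```python
from typing import Any

Row = dict[str, Any]


def diff_by_key(source: list[Row], target: list[Row], key: str) -> dict[str, list]:
    """Compare two row sets on a shared key column (hypothetical helper)."""
    # Index each dataset by its key so rows can be matched directly.
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}

    # Keys present on one side only.
    missing = sorted(src.keys() - tgt.keys())
    extra = sorted(tgt.keys() - src.keys())

    # Keys present on both sides whose row contents differ.
    changed = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])

    return {"missing_in_target": missing, "extra_in_target": extra, "changed": changed}


# Made-up sample data for illustration only.
source_rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
target_rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 4, "amount": 40}]

report = diff_by_key(source_rows, target_rows, "id")
print(report)
```

Indexing both sides by key keeps the comparison linear in the number of rows. This sketch assumes the key is unique within each dataset; duplicate keys would silently overwrite each other.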

    Pricing

    • Not applicable. This is an open, public GitHub repository of curated resources.