

An Awesome collection of data engineering resources, tools, and best practices for building and maintaining data infrastructure.
URL: https://github.com/igorbarinov/awesome-data-engineering#readme
Type: Curated resource list (Awesome list)
Category: Themed directories
Tags: awesome-lists, big-data, data-engineering
A curated, community-maintained list of tools, libraries, and resources related to data engineering, aimed at helping software developers discover technologies for building and maintaining data infrastructure.
Databases
Collection of database technologies and related tools commonly used in data engineering pipelines.
Data Comparison
Tools and libraries for comparing datasets, validating data consistency, and detecting differences between data sources.
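To illustrate the kind of check such tools perform, here is a minimal sketch of a key-based dataset comparison. It is not taken from any tool in the list; the function name `diff_records`, the `id` key, and the sample rows are all assumptions made for illustration.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def diff_records(source: List[Record], target: List[Record], key: str = "id") -> Dict[str, list]:
    """Compare two datasets row by row, matching rows on a shared key column."""
    # Index both sides by key so lookups are O(1).
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}

    # Keys present on only one side.
    missing_in_target = sorted(src.keys() - tgt.keys())
    extra_in_target = sorted(tgt.keys() - src.keys())

    # Keys present on both sides whose row contents differ.
    changed = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])

    return {
        "missing_in_target": missing_in_target,
        "extra_in_target": extra_in_target,
        "changed": changed,
    }


if __name__ == "__main__":
    # Hypothetical sample data: a source table and a replicated copy.
    source_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
    target_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}, {"id": 4, "name": "d"}]
    print(diff_records(source_rows, target_rows))
```

Dedicated comparison tools typically add what this sketch omits, such as tolerance for floating-point drift, schema checks, and chunked comparison for tables that do not fit in memory.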
Data Ingestion
Resources for ingesting data from various sources into data platforms, including ETL/ELT tools and connectors.
File System
Tools and frameworks dealing with file systems and storage layers relevant to data pipelines.
Serialization Format
Formats and tooling for efficient data serialization and interchange (e.g., structured/binary formats used in data engineering).
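As a rough illustration of the text-versus-binary trade-off behind this category, the sketch below encodes the same record as JSON and as a fixed binary layout using only the Python standard library. The record shape (an integer `id` and a float `value`) and all function names are assumptions for the example; the formats in the list are schema-driven and far more capable.

```python
import json
import struct
from typing import Tuple

# Assumed fixed layout: little-endian 32-bit int followed by a 64-bit float.
RECORD_LAYOUT = struct.Struct("<id")


def to_json_bytes(record_id: int, value: float) -> bytes:
    """Encode the record as self-describing UTF-8 JSON text."""
    return json.dumps({"id": record_id, "value": value}).encode("utf-8")


def to_binary(record_id: int, value: float) -> bytes:
    """Encode the record as a compact binary payload with no field names."""
    return RECORD_LAYOUT.pack(record_id, value)


def from_binary(payload: bytes) -> Tuple[int, float]:
    """Decode a payload produced by to_binary; the reader must know the layout."""
    record_id, value = RECORD_LAYOUT.unpack(payload)
    return record_id, value


if __name__ == "__main__":
    as_json = to_json_bytes(42, 3.5)
    as_binary = to_binary(42, 3.5)
    print(len(as_json), len(as_binary))
    print(from_binary(as_binary))
```

The binary form is smaller because field names and delimiters are not stored, at the cost of requiring both sides to agree on the layout ahead of time.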
Stream Processing
Frameworks and services for real-time data processing and streaming analytics.
Batch Processing
Technologies for large-scale, scheduled, or offline data processing workflows.
Charts and Dashboards
Visualization and dashboarding tools to present and explore processed data.
Workflow
Orchestration and workflow management tools for coordinating data pipelines and jobs.
Data Lake Management
Tools for organizing, governing, and operating data lakes.
ELK (Elasticsearch, Logstash, Kibana)
Resources related to the ELK stack for log management, search, and analytics.
Docker
Containerization-related resources useful for packaging and deploying data engineering components.
Datasets
Monitoring
Tools and practices for observing data systems and pipelines.
Profiling
Techniques and tools for understanding data characteristics and performance.
Testing
Tools and approaches for testing data pipelines, transformations, and data quality.
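A data quality test can be as small as a function that returns a list of rule violations. The following is a minimal sketch under assumed rules (required fields must be non-null, one key must be unique); the name `check_rows` and the sample rows are hypothetical and not drawn from any tool in the list.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def check_rows(rows: List[Row], required: List[str], unique_key: str) -> List[str]:
    """Return human-readable problems found in the rows; an empty list means all checks passed."""
    problems: List[str] = []
    seen = set()

    for index, row in enumerate(rows):
        # Rule 1: every required field must be present and non-null.
        for field in required:
            if row.get(field) is None:
                problems.append(f"row {index}: missing {field}")

        # Rule 2: the unique key must not repeat across rows.
        key_value = row.get(unique_key)
        if key_value is not None:
            if key_value in seen:
                problems.append(f"row {index}: duplicate {unique_key}={key_value!r}")
            seen.add(key_value)

    return problems


if __name__ == "__main__":
    # Hypothetical pipeline output with one null, one duplicate, one missing field.
    sample = [
        {"id": 1, "amount": 10},
        {"id": 1, "amount": None},
        {"id": 2},
    ]
    for problem in check_rows(sample, required=["id", "amount"], unique_key="id"):
        print(problem)
```

In a pipeline, a check like this would usually run after a transformation step and fail the job when the returned list is non-empty.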
Community
Topics & Resources
Additional categorized links covering broader data engineering topics and learning materials.
Contributing
Guidelines for community contributions to keep the list up to date.
License
Licensing terms for the list; a license file is included in the repository.