    All product names, logos, and brands are the property of their respective owners. All company, product, and service names used in this repository, related repositories, and associated websites are for identification purposes only. The use of these names, logos, and brands does not imply endorsement, affiliation, or sponsorship. This directory may include content generated by artificial intelligence.
    Copyright © 2025 Ever. All rights reserved.
    Awesome Data Analysis

    500+ curated resources for data analysis and data science including Python, SQL, Statistics, ML, AI, visualization tools, cheatsheets, roadmaps, and interview preparation for beginners and experts.

    Information

    Website: github.com
    Published: Mar 11, 2026

    Categories

    Data Engineering

    Tags

    #data-analysis #data-science #python

    Overview

    Awesome Data Analysis is a comprehensive collection of 500+ curated resources covering all aspects of data analysis and data science, from fundamentals to advanced topics, suitable for both beginners and experienced practitioners.

    Programming Languages

    Python for Data Analysis

    • pandas: Data manipulation and analysis
    • NumPy: Numerical computing
    • SciPy: Scientific computing
    • Jupyter: Interactive computing environment
    • IPython: Enhanced REPL experience

    SQL and Databases

    • SQL fundamentals and advanced queries
    • PostgreSQL for analytics
    • MySQL and database design
    • NoSQL databases (MongoDB, Redis)
    • Query optimization techniques
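As a small illustration of the SQL fundamentals listed above, here is a minimal sketch of a GROUP BY aggregation. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL or MySQL; the `orders` table, its columns, the sample rows, and the `revenue_by_region` helper are all hypothetical names made up for this example.

```python
import sqlite3


def revenue_by_region(rows):
    """Return (region, order_count, total_amount) tuples, largest total first.

    `rows` is an iterable of (region, amount) pairs. The table and column
    names are illustrative assumptions, not part of any real schema.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
        # Parameterized inserts avoid building SQL strings by hand.
        conn.executemany(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", rows
        )
        cursor = conn.execute(
            """
            SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount
            FROM orders
            GROUP BY region
            ORDER BY total_amount DESC
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Made-up sample data for demonstration only.
sample_orders = [("north", 120.0), ("south", 80.0), ("north", 30.0), ("east", 50.0)]
result = revenue_by_region(sample_orders)
```

The same SELECT statement should run largely unchanged on most relational databases; only the connection code is specific to SQLite.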

    R for Statistics

    • tidyverse ecosystem
    • ggplot2 for visualization
    • Statistical modeling packages
    • R Markdown for reproducible research

    Data Visualization

    Python Libraries

    • matplotlib: Comprehensive plotting
    • seaborn: Statistical visualization
    • plotly: Interactive graphs
    • bokeh: Interactive web-based visualization
    • altair: Declarative statistical visualization

    Business Intelligence Tools

    • Tableau for interactive dashboards
    • Power BI for Microsoft ecosystem
    • Looker for data exploration
    • Metabase for open-source BI

    Statistics and Mathematics

    Fundamentals

    • Descriptive statistics
    • Inferential statistics
    • Probability theory
    • Hypothesis testing
    • A/B testing methodology
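To make hypothesis testing and A/B testing concrete, the following sketch runs a two-sided, two-proportion z-test on conversion counts using only the standard library. The function name, its parameters, and the visitor and conversion numbers are assumptions chosen for illustration; the normal approximation it relies on is only reasonable for fairly large samples.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value). Raises ZeroDivisionError if the pooled rate is
    exactly 0 or 1, because the standard error is then zero.
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: variant A converts 100 of 1000 visitors,
# variant B converts 130 of 1000.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
```

A p-value below a threshold fixed before the experiment (0.05 is a common choice) would be read as evidence that the two variants convert at different rates.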

    Advanced Topics

    • Bayesian statistics
    • Time series analysis
    • Multivariate analysis
    • Experimental design
    • Causal inference

    Machine Learning

    Core Libraries

    • scikit-learn: Classical ML algorithms
    • XGBoost: Gradient boosting
    • LightGBM: Efficient gradient boosting
    • CatBoost: Gradient boosting with built-in categorical feature handling

    Deep Learning

    • TensorFlow and Keras
    • PyTorch framework
    • Neural network architectures
    • Transfer learning

    Data Collection

    Web Scraping

    • BeautifulSoup for HTML parsing
    • Scrapy framework
    • Selenium for dynamic content
    • APIs and REST integration

    Data Sources

    • Public dataset repositories
    • APIs for real-time data
    • Database connections
    • File format handling (CSV, JSON, Parquet)
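For the file format handling item above, here is a minimal sketch that parses CSV text and re-serializes it as JSON with the standard library. The helper names and the sample data are made up for illustration. Parquet is left out because it needs a third-party library such as pyarrow.

```python
import csv
import io
import json


def csv_text_to_records(csv_text):
    """Parse CSV text with a header row into a list of dicts.

    csv.DictReader returns every value as a string, so numeric columns
    must be converted explicitly by the caller.
    """
    return list(csv.DictReader(io.StringIO(csv_text)))


def records_to_json(records):
    """Serialize a list of dicts as a pretty-printed JSON array."""
    return json.dumps(records, indent=2)


# Made-up sample data for demonstration only.
csv_text = "name,score\nada,91\nlin,78\n"
records = csv_text_to_records(csv_text)
json_text = records_to_json(records)

# Explicit type conversion, since CSV carries no type information.
scores = [int(row["score"]) for row in records]
```

With real files, `open(path, newline="")` would replace the `io.StringIO` wrapper, as the csv module documentation recommends.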

    Data Cleaning and Preprocessing

    Techniques

    • Handling missing values
    • Outlier detection and treatment
    • Data normalization and standardization
    • Feature engineering
    • Data validation
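Three of the techniques above (handling missing values, outlier detection, and standardization) can be sketched with the standard library's statistics module. This is one simple choice per technique, namely median imputation, the 1.5 × IQR rule, and z-score scaling, not the only reasonable one. The function names and the sample values are hypothetical.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def standardize(values):
    """Rescale values to mean 0 and sample standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    return [(v - mean) / stdev for v in values]


# Made-up measurements with one missing value and one suspicious reading.
raw = [10, 12, None, 11, 13, 12, 95]
filled = impute_median(raw)
outliers = iqr_outliers(filled)
scaled = standardize(filled)
```

In practice the outlier would usually be investigated or treated before standardizing, since a single extreme value inflates the standard deviation and compresses every other z-score.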

    Tools

    • pandas for data wrangling
    • OpenRefine for data cleaning
    • Great Expectations for validation
    • pandera for schema validation

    Exploratory Data Analysis (EDA)

    • Univariate analysis techniques
    • Bivariate and multivariate analysis
    • Correlation analysis
    • Distribution visualization
    • Automated EDA tools (pandas-profiling, since renamed ydata-profiling; sweetviz)
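As an illustration of the correlation analysis item above, the following sketch computes the Pearson correlation coefficient from its definition using only the standard library. The function name and the sample data are assumptions for this example. Libraries such as pandas or SciPy provide equivalent, more complete routines.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up paired observations for demonstration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson_r(hours, scores)
```

A value of r near 1 or -1 indicates a strong linear relationship. It does not capture nonlinear relationships, and it says nothing about causation.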

    Big Data Technologies

    • Apache Spark for distributed computing
    • Hadoop ecosystem
    • Dask for parallel computing in Python
    • PySpark, the Python API for Apache Spark

    Cloud Platforms

    Cloud Services

    • AWS (S3, Athena, SageMaker)
    • Google Cloud Platform (BigQuery, Vertex AI)
    • Azure (Synapse, ML Studio)
    • Databricks for unified analytics

    Tools and IDEs

    Development Environments

    • Jupyter Notebook and JupyterLab
    • VS Code with extensions
    • PyCharm for Python
    • RStudio for R
    • Google Colab for cloud notebooks

    Version Control

    • Git fundamentals
    • GitHub for collaboration
    • DVC for data versioning
    • MLflow for experiment tracking

    Roadmaps and Learning Paths

    Career Paths

    • Data Analyst roadmap
    • Data Scientist pathway
    • ML Engineer progression
    • Analytics Engineer path

    Skill Development

    • Beginner to advanced progression
    • Specialization tracks
    • Certification paths
    • Portfolio building

    Interview Preparation

    Technical Topics

    • SQL interview questions
    • Python coding challenges
    • Statistics and probability questions
    • Machine learning concepts
    • System design for data systems

    Soft Skills

    • Communication of insights
    • Stakeholder management
    • Business understanding
    • Presentation skills

    Cheat Sheets

    • pandas operations quick reference
    • SQL query syntax
    • matplotlib plotting commands
    • Git commands
    • Regular expressions
    • Statistical tests selection

    Best Practices

    • Reproducible research practices
    • Code documentation
    • Project organization
    • Peer review processes
    • Ethical data analysis

    Resources

    Books

    • "Python for Data Analysis" by Wes McKinney
    • "The Data Warehouse Toolkit" by Ralph Kimball and Margy Ross
    • "Storytelling with Data" by Cole Nussbaumer Knaflic

    Online Courses

    • Coursera specializations
    • DataCamp tracks
    • Udacity nanodegrees
    • fast.ai courses

    Communities

    • Kaggle competitions and discussions
    • Stack Overflow for Q&A
    • Reddit data science communities
    • LinkedIn groups and discussions