Awesome Data - Yelp Dataset Challenge
An entry in the Awesome Data Project’s meta-collection that catalogs the Yelp Dataset Challenge, a public subset of Yelp’s business, review, and user data frequently used in data science and machine learning research. The entry is a curated pointer to the dataset, not a copy of the data itself.
About this tool
Curated entry in the Awesome Data collection pointing to the Yelp Open Dataset, a large, real-world dataset commonly used for data science and machine learning projects involving local businesses, reviews, and user-generated content.
- Source: Yelp Open Dataset
- Brand: Yelp
- Category: Themed directories
- Tags: datasets, machine-learning, business
Features
Dataset scope
- Educational / research focus: Intended for coursework and research in areas such as data science, machine learning, information retrieval, and recommendation systems.
- Real-world business data: Captures real Yelp business and review activity.
Contents & scale
- Reviews: 6,990,280 reviews
- Businesses: 150,346 businesses
- Geographical coverage: 11 metropolitan areas
- Photos: 200,100 photos
- Attributes & metadata (via JSON files), including for example:
  - Business hours
  - Parking availability
  - Ambience and similar business attributes
  - Check-ins
File structure & formats
- Main JSON bundle (business/review data):
  - 1 compressed TAR archive (≈ 4.35 GB)
  - Uncompressed contents (≈ 8.65 GB):
    - 1 PDF (documentation)
    - 5 JSON files (core dataset files)
  - Documentation included
- Photos bundle:
  - 1 compressed TAR archive (≈ 7.45 GB)
  - Uncompressed contents (≈ 7.11 GB):
    - 1 JSON file
    - 1 text file
    - 1 PDF (documentation)
    - 1 folder containing ≈ 200,000 photos
  - Documentation included
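A minimal Python sketch of how the bundles described above might be inspected once downloaded. It relies on assumptions not stated in this entry: the function names are illustrative, the paths in the usage comments are hypothetical, and the JSON files are assumed to be newline-delimited (one JSON object per line). Check the bundled PDF documentation before relying on that layout.

```python
import json
import tarfile
from pathlib import Path
from typing import Iterator, List


def list_archive_members(archive_path: str) -> List[str]:
    """Return the names of the regular files inside a TAR archive."""
    # Mode "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none).
    with tarfile.open(archive_path, mode="r:*") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]


def iter_json_records(json_path: str) -> Iterator[dict]:
    """Yield one record per non-empty line.

    Assumption: the file is newline-delimited JSON. Streaming line by line
    avoids loading a multi-gigabyte file into memory at once.
    """
    with Path(json_path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Hypothetical usage (paths are placeholders, not the real file names):
#   print(list_archive_members("yelp_dataset.tar"))
#   for record in iter_json_records("extracted/some_file.json"):
#       print(sorted(record.keys()))
#       break
```

Streaming the records with a generator keeps memory use flat, which matters given the ≈ 8.65 GB uncompressed size listed above.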
Access & downloads
- Data download (JSON): Download JSON
- Photos download: Download photos
Use Cases
- Academic coursework and projects (data mining, statistics, ML)
- Research on recommendations, NLP, sentiment analysis, and ranking
- Analysis of local business ecosystems and user behavior
Pricing
- Presented as an open, educational dataset; no pricing information is provided in the source content.
Similar Products
Large-scale web crawl dataset containing 3.5 billion web pages from CommonCrawl (2012), suitable for web mining, search, and network analysis research. Listed as part of an awesome-style collection of computer networks datasets.
An Awesome-style collection of short, easy-to-understand JavaScript code snippets you can grasp in 30 seconds.
A GitHub repository by Brad Traversy containing 50+ small, focused web development mini projects built with HTML, CSS, and JavaScript, useful as a curated collection of example projects for learning or referencing in awesome-style directories.