

A curated Awesome-style sub-collection within the APD (Awesome Public Datasets) core repository that indexes multiple high‑quality natural language datasets and lexical resources via individual YAML meta files (e.g., SQuAD, Universal Dependencies, WordNet). It serves as a meta directory of links to external NLP datasets, aligning with the broader Awesome ecosystem as a directory-of-resources pattern.
Category: Meta-directories
Tags: datasets, nlp, directory-of-directories
Source: GitHub – awesomedata/apd-core (NaturalLanguage)
The NaturalLanguage section of the APD (Awesome Public Datasets) core repository is a curated, Awesome-style sub-collection focused on natural language processing (NLP) datasets and lexical resources. Instead of hosting datasets directly, it acts as a meta directory that indexes multiple high‑quality external NLP resources through individual YAML metadata files.
Examples of referenced resources include SQuAD, Universal Dependencies, and WordNet.
This section follows the broader Awesome ecosystem pattern of providing a structured directory-of-resources to help users discover and navigate NLP datasets.
Curated NLP Dataset Index
Focused list of public natural language datasets and lexical resources, filtered to highlight commonly used, higher-quality sources.
YAML-based Metadata Files
Each dataset/resource is represented by an individual YAML meta file containing structured information (e.g., name, description, links, possibly licenses and modalities), enabling machine-readable indexing and easier tooling integration.
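To illustrate what machine-readable indexing could look like, here is a minimal sketch of reading one such meta file. The field names (`title`, `description`, `homepage`) and the sample entry are hypothetical placeholders, not the repository's confirmed schema. The parser only handles flat top-level `key: value` lines so that it stays dependency-free; a full YAML parser such as PyYAML's `yaml.safe_load` would be the more robust choice for real files.

```python
def parse_flat_meta(text: str) -> dict:
    """Parse top-level `key: value` lines from a simple YAML-like meta file.

    Deliberately limited: skips comments, blank lines, the `---` document
    marker, list items, and anything indented (nested structures).
    """
    meta = {}
    for raw in text.splitlines():
        if raw[:1].isspace():
            continue  # indented line: nested content, out of scope here
        line = raw.strip()
        if not line or line.startswith("#") or line == "---" or line.startswith("-"):
            continue
        # Split on the first colon only, so URL values keep their "://".
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line
        meta[key.strip()] = value.strip().strip("\"'")
    return meta


# Hypothetical meta file content; the keys are assumptions for illustration.
SAMPLE_META = """\
---
title: Example Corpus
description: A placeholder entry used only for illustration.
homepage: https://example.org/corpus
"""

meta = parse_flat_meta(SAMPLE_META)
print(meta["title"], "->", meta["homepage"])
```

Because each entry reduces to a plain mapping, tooling can filter, sort, or render entries without touching the external datasets themselves.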
Meta Directory (Directory-of-Directories Pattern)
Functions as a directory of external resources, not as a data host.
Coverage of Multiple NLP Resource Types
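Includes various categories, such as question-answering datasets (e.g., SQuAD), treebanks (e.g., Universal Dependencies), and lexical databases (e.g., WordNet).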
Integration with APD Core Structure
Lives under core/NaturalLanguage in the apd-core repo, benefiting from:
Open, Git-based Contribution Model
As a GitHub-hosted collection, it can be extended via pull requests.
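A contributor might sanity-check a new entry before opening a pull request. The sketch below assumes the meta file has already been parsed into a mapping. The list of required fields is hypothetical and is not taken from the repository's contribution rules.

```python
# Hypothetical required keys; the real repository may expect different ones.
REQUIRED_FIELDS = ("title", "description", "homepage")


def missing_fields(meta: dict, required=REQUIRED_FIELDS) -> list:
    """Return the required keys that are absent or empty in a parsed entry."""
    return [key for key in required if not str(meta.get(key) or "").strip()]


# Illustrative draft entry with an empty description.
draft = {
    "title": "Example Corpus",
    "description": "",
    "homepage": "https://example.org/corpus",
}

problems = missing_fields(draft)
print(problems or "ready for review")
```

A check like this could run locally or in continuous integration, so incomplete entries are caught before review.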