apd-core - NaturalLanguage section
A curated Awesome-style sub-collection within the APD (Awesome Public Datasets) core repository that indexes multiple high‑quality natural language datasets and lexical resources via individual YAML meta files (e.g., SQuAD, Universal Dependencies, WordNet). It serves as a meta directory of links to external NLP datasets, aligning with the broader Awesome ecosystem as a directory-of-resources pattern.
About this tool
Category: Meta-directories
Tags: datasets, nlp, directory-of-directories
Source: GitHub – awesomedata/apd-core (NaturalLanguage)
Overview
The NaturalLanguage section of the APD (Awesome Public Datasets) core repository is a curated, Awesome-style sub-collection focused on natural language processing (NLP) datasets and lexical resources. Instead of hosting datasets directly, it acts as a meta directory that indexes multiple high‑quality external NLP resources through individual YAML metadata files.
Examples of referenced resources include:
- Question answering datasets (e.g., SQuAD)
- Syntactic and morphosyntactic corpora (e.g., Universal Dependencies)
- Lexical databases (e.g., WordNet)
This section follows the broader Awesome ecosystem pattern of providing a structured directory-of-resources to help users discover and navigate NLP datasets.
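As a rough illustration of the directory-of-resources idea, the sketch below models index entries in memory and groups them by resource type. The `ResourceEntry` class, its field names, the `kind` labels, and the `example.org` URLs are all hypothetical placeholders, not the actual apd-core schema or the real dataset links.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceEntry:
    """One index entry: a pointer to an external resource, not the data itself."""
    name: str
    kind: str  # hypothetical category label
    url: str   # placeholder link; the real entries point to canonical homepages


def group_by_kind(entries):
    """Group entry names by resource type, preserving input order."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.kind].append(entry.name)
    return dict(grouped)


# Names come from the examples above; URLs are placeholders only.
entries = [
    ResourceEntry("SQuAD", "question answering", "https://example.org/squad"),
    ResourceEntry("Universal Dependencies", "treebank", "https://example.org/ud"),
    ResourceEntry("WordNet", "lexical database", "https://example.org/wordnet"),
]

for kind, names in group_by_kind(entries).items():
    print(f"{kind}: {', '.join(names)}")
```

The index stores only names and links, which is what distinguishes a meta directory from a data host.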
Features
- Curated NLP Dataset Index
  Focused list of public natural language datasets and lexical resources, filtered to highlight commonly used, higher-quality sources.
- YAML-based Metadata Files
  Each dataset/resource is represented by an individual YAML meta file containing structured information (e.g., name, description, links, possibly licenses and modalities), enabling machine-readable indexing and easier tooling integration.
- Meta Directory (Directory-of-Directories Pattern)
  Functions as a directory of external resources, not as a data host:
  - Links out to canonical dataset homepages or repositories.
  - Aligns with Awesome-style lists and the broader Awesome Public Datasets (APD) ecosystem.
- Coverage of Multiple NLP Resource Types
  Includes various categories such as:
  - Question answering datasets (e.g., SQuAD)
  - Parsed corpora / treebanks (e.g., Universal Dependencies)
  - Lexical/semantic resources (e.g., WordNet)
  - Other written/spoken language datasets relevant to NLP research and applications.
- Integration with APD Core Structure
  Lives under `core/NaturalLanguage` in the apd-core repo, benefiting from:
  - Shared conventions with other APD sub-collections.
  - Consistent metadata format across domains.
- Open, Git-based Contribution Model
  As a GitHub-hosted collection, it can be extended via pull requests:
  - New YAML entries can be added for additional datasets.
  - Existing metadata can be updated or corrected collaboratively.
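Because each entry is a small YAML file, a contributor could run a quick local check before opening a pull request. The following is a minimal sketch under stated assumptions: the required keys (`title`, `homepage`, `description`) and the `.yml` extension are guesses for illustration, not the confirmed apd-core schema, and the parser handles only flat top-level `key: value` lines. A full YAML library would be needed for nested or multi-line values.

```python
from pathlib import Path

# Hypothetical required keys; the real apd-core schema may differ.
REQUIRED_KEYS = ("title", "homepage", "description")


def parse_flat_yaml(text: str) -> dict[str, str]:
    """Parse top-level `key: value` lines only (no nesting, lists, or multi-line values)."""
    entry: dict[str, str] = {}
    for raw in text.splitlines():
        stripped = raw.strip()
        # Skip blanks, comments, document markers, list items, and indented (nested) lines.
        if not stripped or raw[0].isspace() or stripped[0] in "#-":
            continue
        key, sep, value = stripped.partition(":")
        if sep:
            entry[key.strip()] = value.strip().strip("'\"")
    return entry


def missing_keys(entry: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not entry.get(key)]


def check_directory(root: Path) -> dict[str, list[str]]:
    """Map each .yml file name directly under `root` to the required keys it lacks."""
    report: dict[str, list[str]] = {}
    for path in sorted(root.glob("*.yml")):
        missing = missing_keys(parse_flat_yaml(path.read_text(encoding="utf-8")))
        if missing:
            report[path.name] = missing
    return report
```

Calling `check_directory(Path("core/NaturalLanguage"))` from a local clone would list files that lack any of the assumed keys; an empty result means every file passed this simplified check.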
Pricing
- Free
- Public GitHub repository.
- Free to browse, clone, and use the metadata and links to external datasets (subject to each dataset’s own license and access terms).
Similar Products
6 result(s)
APD Core is the central metadata repository that powers Awesome Public Datasets and related awesome-style data directories. It organizes links to open data sources (such as government portals for Vienna, Austria; Vietnam’s General Statistics Office; and U.S. Congressional Research Service reports) into structured YAML files for use in awesome collections and meta indexes.
An Awesome list of datasets, models, and research focused on automated question answering in natural language.
An Awesome directory of resources, datasets, and tools for natural language processing specifically in Spanish.
GitHub’s topic index page listing all public repositories tagged with the `awesome` topic, effectively serving as a central directory of Awesome lists across domains.
GitHub repository that serves as a curated meta-list collecting multiple "awesome" lists of other awesome lists, effectively a directory of meta awesome directories.
"awesome-cn" is a Chinese-language meta collection of curated "awesome" lists spanning programming languages, frameworks, and learning resources. Maintained in Python, it aggregates links to numerous topic-specific awesome lists (e.g., awesome-go, awesome-python, awesome-vue, awesome-javascript), providing a centralized entry point for Chinese developers looking for high-quality curated resources.