apd-core – NaturalLanguage Section

Category: Meta-directories
Tags: datasets, nlp, directory-of-directories
Source: GitHub – awesomedata/apd-core (NaturalLanguage)

awesomedata

Overview

The NaturalLanguage section of the APD (Awesome Public Datasets) core repository is a curated, Awesome-style sub-collection of natural language processing (NLP) datasets and lexical resources. It does not host datasets itself; it acts as a meta-directory that indexes external NLP resources, each described by its own YAML metadata file.

Examples of referenced resources include:

Question answering datasets (e.g., SQuAD)
Syntactic and morphosyntactic corpora (e.g., Universal Dependencies)
Lexical databases (e.g., WordNet)

This section follows the broader Awesome ecosystem pattern of providing a structured directory-of-resources to help users discover and navigate NLP datasets.

Features

Curated NLP Dataset Index
A focused list of public natural language datasets and lexical resources, curated to highlight commonly used, higher-quality sources.
YAML-based Metadata Files
Each dataset or resource is represented by its own YAML metadata file with structured fields (e.g., name, description, links, and possibly license and modality), which keeps the index machine-readable and easier to integrate with tooling.
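
To make the idea of a machine-readable entry concrete, here is a minimal sketch of reading one metadata file. The field names (title, homepage, category, description) and the sample values are hypothetical and are not taken from the repository. The helper parse_flat_meta is also hypothetical and only handles flat "key: value" lines; for real YAML, a full parser such as PyYAML's yaml.safe_load would be the more robust choice.

```python
def parse_flat_meta(text):
    """Parse top-level `key: value` lines into a dict.

    This is NOT a full YAML parser: it ignores nesting, lists, and
    multi-line values. It only illustrates the shape of an entry.
    """
    entry = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and the YAML document marker.
        if not line or line.startswith("#") or line.startswith("---"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line
        entry[key.strip()] = value.strip().strip("\"'")
    return entry


# Hypothetical entry; the real files may use different keys.
SAMPLE = """\
---
title: Example QA Dataset
homepage: https://example.org/qa
category: NaturalLanguage
description: A hypothetical question answering dataset.
"""

entry = parse_flat_meta(SAMPLE)
print(entry["title"], "->", entry["homepage"])
```

Because each entry reduces to a small dictionary, a tool can filter or render the index without scraping any dataset homepage.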
Meta Directory (Directory-of-Directories Pattern)
Functions as a directory of external resources, not as a data host:
- Links out to canonical dataset homepages or repositories.
- Aligns with Awesome-style lists and the broader Awesome Public Datasets (APD) ecosystem.
Coverage of Multiple NLP Resource Types
Includes various categories such as:
- Question answering datasets (e.g., SQuAD)
- Parsed corpora / treebanks (e.g., Universal Dependencies)
- Lexical/semantic resources (e.g., WordNet)
- Other written/spoken language datasets relevant to NLP research and applications.
Integration with APD Core Structure
Lives under core/NaturalLanguage in the apd-core repo, benefiting from:
- Shared conventions with other APD sub-collections.
- Consistent metadata format across domains.
Open, Git-based Contribution Model
As a GitHub-hosted collection, it can be extended via pull requests:
- New YAML entries can be added for additional datasets.
- Existing metadata can be updated or corrected collaboratively.
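
As a rough illustration of the core/NaturalLanguage layout and the one-file-per-entry convention described above, the following sketch lists the metadata files in a local clone. The function name list_meta_files and the accepted file extensions (.yml, .yaml) are assumptions; the repository may use only one of them.

```python
from pathlib import Path


def list_meta_files(repo_root, section="NaturalLanguage"):
    """Return the sorted names of YAML files under core/<section>.

    Assumes a local clone at repo_root and that entries end in
    .yml or .yaml (an assumption, not confirmed by the source).
    """
    section_dir = Path(repo_root) / "core" / section
    return sorted(
        p.name
        for p in section_dir.iterdir()
        if p.is_file() and p.suffix in {".yml", ".yaml"}
    )
```

A contributor could call list_meta_files("path/to/apd-core") before opening a pull request to confirm that a newly added entry sits alongside the existing ones.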

Pricing

Free
- Public GitHub repository.
- Free to browse, clone, and use the metadata and links to external datasets (subject to each dataset’s own license and access terms).
