ICWSM Data Challenge

Overview

The ICWSM Data Challenge is part of the ICWSM Data Sharing Initiative associated with the AAAI ICWSM conference. It hosts large-scale, openly available social media and web datasets from ICWSM-published research, which support annual research competitions and broader community work.

Key Details

Type: Research data challenge / dataset sharing service
Organizer: AAAI International Conference on Web and Social Media (ICWSM)
Focus: Social media and web data (e.g., Twitter, blogs, social platforms)
Access: Requires a registration process to obtain datasets
Licensing: Datasets are released as community resources (openly available, subject to registration and any dataset-specific terms)

Features

Dataset Hosting Service
- Central hosting of datasets introduced by papers published in ICWSM proceedings.
- Aims to make research datasets reusable by the wider community.
Open Community Resources
- All hosted datasets are released as community resources, intended for open research use.
- Access is governed by a registration process (details are provided on the site under “obtaining”).
Continuously Growing Collection
- The initiative has been active for multiple years (ICWSM-16 is noted as its fifth year).
- Datasets from multiple conference editions (e.g., 2012, 2015, 2016 and beyond) are or will be made available.
Social Media–Focused Datasets
- Emphasis on large-scale collections of tweets and other social/web content.
- Examples of 2015-related datasets (described at a high level by associated papers):
  - Twitter Streaming API vs. Firehose comparison.
  - Dynamics of emergent hashtags (#Bigbirds).
  - Detecting comments on news articles in microblogs.
  - Citation cascades in the blogosphere.
  - User-generated comments for social media object annotation.
  - Multi-indicator tweet geolocation.
  - Meme competition and success (Quickmeme.com).
  - Artist popularity across web and social music services.
  - Diurnal activity patterns from social media.
  - Political orientation inference from Twitter.
  - Political leaning quantification from tweets and retweets.
Structured Dataset Descriptions (Example: ICWSM 2012)
- Some years provide tabular metadata per dataset, including:
  - Number of files.
  - Number of observations (tweets, accounts, entries, etc.).
  - Number of Twitter users.
  - Network properties where applicable (nodes, edges).
- Example 2012 datasets include (titles from associated papers):
  - Opinion retrieval in Twitter (tweets labeled as relevant/irrelevant for 50 queries).
  - Target-dependent sentiment expressions for movies and persons.
  - Home location inference from geo-tagged tweets in 100 top cities.
  - Conversation practices and network structure around a TV show (#XFactor Italia).
  - Impact and influence of bots on a social network (anobii.com social data).
  - Managing bad news in social media (Domino’s Pizza crisis tweet collection).
  - Tracking sentiment and topic dynamics from social media (Mozilla add-on reviews; description truncated in source but indicates review data).
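The tabular metadata fields listed above (files, observations, Twitter users, nodes, edges) could be modeled roughly as follows. This is only a sketch: the class name, field names, and sample values are hypothetical placeholders and are not taken from the ICWSM site, which may structure its tables differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetRecord:
    """One row of per-dataset metadata (all field names are hypothetical)."""
    title: str
    year: int
    num_files: int
    num_observations: int                    # tweets, accounts, entries, etc.
    num_twitter_users: Optional[int] = None  # None when not reported
    num_nodes: Optional[int] = None          # network datasets only
    num_edges: Optional[int] = None          # network datasets only

    def has_network(self) -> bool:
        # A dataset counts as a network dataset only if both properties exist.
        return self.num_nodes is not None and self.num_edges is not None


def network_datasets(records: List[DatasetRecord]) -> List[DatasetRecord]:
    """Return only the records that report network properties."""
    return [r for r in records if r.has_network()]


# Placeholder rows with made-up values, purely for illustration.
records = [
    DatasetRecord("placeholder-tweet-collection", 2012,
                  num_files=2, num_observations=1000, num_twitter_users=300),
    DatasetRecord("placeholder-network", 2012,
                  num_files=1, num_observations=500, num_nodes=50, num_edges=120),
]

network_titles = [r.title for r in network_datasets(records)]
```

Keeping the network fields optional reflects the note that network properties are listed only "where applicable".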
Conference Integration
- Datasets are directly tied to specific ICWSM papers and years, facilitating reproducibility and follow-up work.
- Supports annual research competitions and challenges run in conjunction with the conference.

Access and Registration

Access requires completing the registration process described on the ICWSM dataset page (section “obtaining”).
Once registered, users can download the available datasets for specific ICWSM years (e.g., 2012, 2015, 2016).

Use Cases

Reproducibility of published ICWSM research.
Benchmarking new algorithms on established social media datasets.
Comparative studies across different social media phenomena (hashtags, political communication, sentiment, geolocation, network analysis, etc.).

Pricing

Datasets are released as openly available community resources.
No pricing or paid plans are mentioned; access appears to be free, subject to registration and any dataset-specific conditions.

