ICWSM Data Challenge
The ICWSM Data Challenge, running since 2009, provides large-scale social media and web datasets for annual research competitions associated with the AAAI ICWSM conference.
Overview
The ICWSM Data Challenge is part of the ICWSM Data Sharing Initiative associated with the AAAI ICWSM conference. It hosts large-scale, openly available social media and web datasets from research published at ICWSM, for use in annual research competitions and by the wider research community.
Key Details
- Type: Research data challenge / dataset sharing service
- Organizer: AAAI International Conference on Web and Social Media (ICWSM)
- Focus: Social media and web data (e.g., Twitter, blogs, social platforms)
- Access: Requires a registration process to obtain datasets
- Licensing: Datasets are released as community resources (openly available, subject to registration and any dataset-specific terms)
Features
Dataset Hosting Service
- Central hosting of datasets introduced by papers published in ICWSM proceedings.
- Aims to make research datasets reusable by the wider community.
Open Community Resources
- All hosted datasets are released as community resources, intended for open research use.
- Access is governed by a registration process (details are provided on the site under “obtaining”).
Continuously Growing Collection
- The initiative has run for several years; ICWSM-16 was noted as its fifth year.
- Datasets from several conference editions (e.g., 2012, 2015, 2016) are available, with more planned for later years.
Social Media–Focused Datasets
- Emphasis on large-scale collections of tweets and other social/web content.
- Example datasets associated with ICWSM 2015 papers (described at a high level):
  - Twitter Streaming API vs. Firehose comparison.
  - Dynamics of emergent hashtags (#Bigbirds).
  - Detecting comments on news articles in microblogs.
  - Citation cascades in the blogosphere.
  - User-generated comments for social media object annotation.
  - Multi-indicator tweet geolocation.
  - Meme competition and success (Quickmeme.com).
  - Artist popularity across web and social music services.
  - Diurnal activity patterns from social media.
  - Political orientation inference from Twitter.
  - Political leaning quantification from tweets and retweets.
Structured Dataset Descriptions (Example: ICWSM 2012)
- Some years provide tabular metadata per dataset, including:
  - Number of files.
  - Number of observations (tweets, accounts, entries, etc.).
  - Number of Twitter users.
  - Network properties where applicable (nodes, edges).
- Example 2012 datasets include (titles from associated papers):
  - Opinion retrieval in Twitter (tweets labeled as relevant/irrelevant for 50 queries).
  - Target-dependent sentiment expressions for movies and persons.
  - Home location inference from geo-tagged tweets in 100 top cities.
  - Conversation practices and network structure around a TV show (#XFactor Italia).
  - Impact and influence of bots on a social network (anobii.com social data).
  - Managing bad news in social media (Domino’s Pizza crisis tweet collection).
  - Tracking sentiment and topic dynamics from social media (Mozilla add-on review data).
Conference Integration
- Datasets are directly tied to specific ICWSM papers and years, facilitating reproducibility and follow-up work.
- Supports annual research competitions and challenges run in conjunction with the conference.
Access and Registration
- Access requires following a registration process described on the ICWSM dataset page (section “obtaining”).
- Once registered, users can download the datasets available for specific ICWSM years (e.g., 2012, 2015, 2016).
Use Cases
- Reproducibility of published ICWSM research.
- Benchmarking new algorithms on established social media datasets.
- Comparative studies across different social media phenomena (hashtags, political communication, sentiment, geolocation, network analysis, etc.).
Pricing
- Datasets are released as openly available community resources.
- No pricing or paid plans are mentioned; access appears to be free, subject to registration and any dataset-specific conditions.
Similar Products
- Large-scale web crawl dataset containing 3.5 billion web pages from CommonCrawl (2012), suitable for web mining, search, and network analysis research. Listed as part of an awesome-style collection of computer networks datasets.
- An awesome-style collection of short, easy-to-understand JavaScript code snippets you can grasp in 30 seconds.
- A GitHub repository by Brad Traversy containing 50+ small, focused web development mini projects built with HTML, CSS, and JavaScript, useful as a curated collection of example projects for learning or referencing in awesome-style directories.