Datasets: A Community Library for Natural Language Processing

DOI: 10.18653/v1/2021.emnlp-demo.21

318 Citations • 2021

Quentin Lhoest, A. Villanova del Moral, Yacine Jernite

After a year of development, the library now includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects and shared tasks.

Abstract

Quentin Lhoest, Albert Villanova del Moral, Yacine Jernite, Abhishek Thakur, Patrick von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, Joe Davison, Mario Šaško, Gunjan Chhablani, Bhavitvya Malik, Simon Brandeis, Teven Le Scao, Victor Sanh, Canwen Xu, Nicolas Patry, Angelina McMillan-Major, Philipp Schmid, Sylvain Gugger, Clément Delangue, Théo Matussière, Lysandre Debut, Stas Bekman, Pierric Cistac, Thibault Goehringer, Victor Mustar, François Lagunas, Alexander Rush, Thomas Wolf. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2021.