webis-de/small-text
Active Learning for Text Classification in Python
Small-Text provides state-of-the-art Active Learning for Text Classification.
Several pre-implemented query strategies, initialization strategies, and stopping criteria are provided,
which can be easily mixed and matched to build active learning experiments or applications.
What is Active Learning?
Active learning allows you to efficiently label training data for supervised learning when you have little to no labeled data.
For example, Active Learning has previously been used for:
- Bootstrapping a biomedical corpus of digenic variant combinations
- Detecting disclosures of individuals' employment status on social media
- Accelerating systematic literature reviews
- Topic categorization of citizen contributions
See the showcase section specifically for previous active learning applications where small-text was used.
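To make the idea concrete, here is a minimal, library-independent sketch of a pool-based active learning loop. It deliberately does not use small-text's API: the toy 1-D nearest-centroid classifier, the `oracle` function standing in for a human annotator, and every function name below are hypothetical and exist only for illustration.

```python
import random


def train_centroids(xs, ys):
    """Fit a toy 1-D nearest-centroid classifier (assumes both classes are present)."""
    means = {}
    for label in (0, 1):
        members = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(members) / len(members)
    return means


def predict(model, x):
    """Predict the class whose centroid is closest to x."""
    return 0 if abs(x - model[0]) <= abs(x - model[1]) else 1


def margin(model, x):
    """Gap between the distances to both centroids; a small gap means low confidence."""
    return abs(abs(x - model[0]) - abs(x - model[1]))


def query_least_certain(model, pool, unlabeled, k):
    """Uncertainty-based query strategy: the k unlabeled points with the smallest margin."""
    return sorted(unlabeled, key=lambda i: (margin(model, pool[i]), i))[:k]


def query_random(model, pool, unlabeled, k):
    """Baseline query strategy: ignore the model and sample uniformly at random."""
    return random.sample(sorted(unlabeled), min(k, len(unlabeled)))


def fit(pool, labeled):
    """Train on everything that has been labeled so far."""
    indices = list(labeled)
    return train_centroids([pool[i] for i in indices], [labeled[i] for i in indices])


def active_learning_loop(pool, oracle, initial, query, rounds=4, k=2):
    """Alternate between training, querying, and labeling for a fixed number of rounds."""
    labeled = {i: oracle(pool[i]) for i in initial}
    for _ in range(rounds):
        model = fit(pool, labeled)
        unlabeled = set(range(len(pool))) - set(labeled)
        if not unlabeled:
            break
        for i in query(model, pool, unlabeled, k):
            labeled[i] = oracle(pool[i])  # the "annotator" provides the label
    return fit(pool, labeled), labeled


# Toy pool: 21 points in [0, 1]; the true class is 1 for x >= 0.6.
pool = [i / 20 for i in range(21)]


def oracle(x):
    return 1 if x >= 0.6 else 0


# Start with one labeled example per class, then let the strategy pick the rest.
model, labeled = active_learning_loop(pool, oracle, initial=[0, 20], query=query_least_certain)
print(f"labeled {len(labeled)} of {len(pool)} points")
```

Swapping `query_least_certain` for `query_random` changes only which points get labeled, which is the kind of mix-and-match comparison an active learning experiment typically makes.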
Features
- Provides unified interfaces for Active Learning, allowing you to easily mix and match query strategies with classifiers provided by sklearn, PyTorch, or transformers.
- Supports GPU-based PyTorch models and integrates transformers so that you can use state-of-the-art Text Classification models for Active Learning.
- GPU is supported but not required. CPU-only use cases require only a lightweight installation with minimal dependencies.
- Multiple scientifically evaluated components are pre-implemented and ready to use (Query Strategies, Initialization Strategies, and Stopping Criteria).
News
Community Survey - March 8th, 2026
- How is active learning used in NLP today? Our EACL 2026 paper, "Reassessing Active Learning Adoption in Contemporary NLP: A Community Survey",
investigates this question and presents new insights into its contemporary use.
Version 2.0.0 dev3 (v2.0.0.dev3) - August 17th, 2025
- This is a development release with the most changes so far. You can consider it an alpha release, which does not guarantee you stable interfaces yet, but is otherwise ready to use.
- Version 2.0.0 offers refined interfaces, new query strategies, improved classifiers, and new functionality such as vector indices. See the changelog for a full list of changes.
Version 1.4.1 (v1.4.1) - August 18th, 2024
- Bugfix release.
Paper published at EACL 2023 🎉
- The paper introducing small-text has been accepted at EACL 2023. Meet us at the conference in May!
- Update: the paper was awarded EACL Best System Demonstration. Thank you for your support!
For a complete list of changes, see the changelog.
Installation
Small-Text can be easily installed via pip:

    pip install small-text

The command results in a slim installation with only the necessary dependencies.
For a full installation via pip, you just need to include the transformers extra requirement:

    pip install small-text[transformers]

The library requires Python 3.9 or newer. For using the GPU, CUDA 10.1 or newer is required.
More information regarding the installation can be found in the
documentation.
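As a quick sanity check after installing, the following sketch reports whether the interpreter meets the Python 3.9 requirement stated above and whether the relevant packages can be found. The function name `check_environment` is hypothetical, and the import names `small_text`, `torch`, and `transformers` are assumptions about how the packages are exposed; the script does not verify the CUDA version.

```python
import sys
from importlib.util import find_spec

MIN_PYTHON = (3, 9)  # minimum version taken from the installation notes

# Assumed import names; torch and transformers are only needed for the full installation.
PACKAGES = ("small_text", "torch", "transformers")


def check_environment():
    """Return a dict of boolean checks for the interpreter and the packages."""
    report = {"python_ok": sys.version_info >= MIN_PYTHON}
    for name in PACKAGES:
        # find_spec returns None when a top-level package cannot be located.
        report[name] = find_spec(name) is not None
    return report


report = check_environment()
for name, ok in report.items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing `torch` or `transformers` entry is expected for the slim installation and only matters if you plan to use the corresponding classifiers.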
Quick Start
For a quick start, see the provided examples for binary classification,
PyTorch multi-class classification, and
transformer-based multi-class classification,
or check out the notebooks.
Notebooks
Showcase
A full list of showcases can be found in the docs.
🎀 Would you like to share your use case? Whether it is a paper, an experiment, a practical application, a thesis, a dataset, or something else, let us know and we will add you to the showcase section or even here.
Documentation
Read the latest documentation here.
Scope of Features
| Name | Active Learning: Query Strategies | Active Learning: Stopping Criteria |
|---|---|---|
| small-text v1.3.0 | 14 | 5 |
| small-text v2.0.0 | 19 | 5 |
We use the numbers only to show the tremendous progress that small-text has made over time.
There are many features and improvements that are not reflected in these numbers.
Alternatives
modAL, ALiPy, libact, ALToolbox
Contribution
Contributions are welcome. Details can be found in CONTRIBUTING.md.
Acknowledgments
This software was created by Christopher Schröder (@chschroeder) at Leipzig University's NLP group,
which is part of the Webis research network.
The encompassing project was funded by the Development Bank of Saxony (SAB) under project number 100335729.
Citation
Small-Text was introduced in detail in the EACL 2023 System Demonstration paper "Small-Text: Active Learning for Text Classification in Python", which can be cited as follows:
@inproceedings{schroeder2023small-text,
title = "Small-Text: Active Learning for Text Classification in Python",
author = {Schr{\"o}der, Christopher and M{\"u}ller, Lydia and Niekler, Andreas and Potthast, Martin},
booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
month = may,
year = "2023",
address = "Dubrovnik, Croatia",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.eacl-demo.11",
pages = "84--95"
}

