401 results for “topic:corpus-linguistics”
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
My book list
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
A list of Indonesian NLP resources.
A curated list of NLP resources for Hungarian
A web-based engine for creating and annotating textual corpora
Data resources for Indonesian NLP
Crawler for linguistic corpora
The pipeline for the OSCAR corpus
Kanji usage frequency data collected from various sources
Data for the quantitative study of (Vedic) Sanskrit
Quran, Hadith, Translations, Tafaseer, Corpus Linguistics. Everything for NLP
An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
An advanced, extensible web front-end for the Manatee-open corpus search engine
Large silver-standard Russian corpus with NER, morphology and syntax markup
A large, high-quality corpus of Chinese synonyms
A textual corpus database for the digital humanities.
SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.github.io/spect/
CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates
My solutions to selected exercises from "Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit" by Steven Bird, Ewan Klein, and Edward Loper.
A set of workflows for corpus building through OCR, post-correction and normalisation
Amharic-English Machine Translation Corpus prepared through website crawling and custom preprocessing.
Rezonator: Dynamics of human engagement
CoNLL-U to pandas DataFrame
Praaline is an open-source system to manage, annotate, visualise and analyse spoken language corpora
MFTE Python is the Python version of Le Foll's Perl-based MFTE (Multi-Feature Tagger of English). It is extended with semantic tags from Biber (2006) and Biber et al. (1999), along with other specific tags.
Yet another search platform for linguistic corpora.
Corpus linguistics has never been so easy...
Thai Law Dataset (Act of Parliament)