"topic:text-processing" — Search

Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora: English Wikipedia and Twitter (330 million English tweets).

Python · 675 · 95 · Updated 1 day ago

nlp, nlp-library, semeval, spell-corrector, spelling-correction, text-processing, text-segmentation, tokenization, tokenizer, word-normalization, word-segmentation

open-korean-text/open-korean-text

Open Korean Text Processor - An Open-source Korean Text Processor

Scala · 657 · 97 · Updated 1 week ago

korean, korean-text-processing, korean-tokenizer, natural-language-processing, text-processing, tokenizer

kreuzberg-dev/html-to-markdown

High performance and CommonMark compliant HTML to Markdown converter. Maintained by the Kreuzberg team. Kreuzberg is a fast, polyglot document intelligence engine with a Rust core. It extracts structured data from 56+ document formats using streaming parsers and built-in OCR.

HTML · 560 · 50 · Updated 5 hours ago

hocr, html, html-converter, markdown, markdown-converter, rag, text-extraction, text-processing

lukaszliniewicz/Pandrator

Turn PDFs and EPUBs into audiobooks, subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice-cloning (instant, RVC-enhanced, XTTS fine-tuning) and LLM processing. It aspires to be a user-friendly app with a GUI, an installer and all-in-one packages.

Python · 540 · 39 · Updated 3 days ago

audiobook, audiobook-creator, audiobook-maker, audiobooks, customtkinterprojects, dubbing, llm, pdf-to-audio, rvc, silero, subtitle-to-speech, subtitle-to-voice, text-processing, text-to-speech, tkinter-gui, voice-clone, voice-cloning, voicecraft, xtts, xttsv2

Puchaczov/Musoq

SQL Runtime without any database

C# · 501 · 22 · Updated 1 hour ago

ai-assisted-queries, cross-platform, csharp, csv, data-analysis-sql, data-exploration, data-processing, dotnet, dotnet-core, dotnetcore, file-system, plugin-architecture, query-language, sql, sql-like, text-processing

linuxscout/pyarabic

pyarabic

Python · 479 · 87 · Updated 1 week ago

arabic-language, nlp-library, text-processing

proycon/pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

Python · 477 · 67 · Updated 2 months ago

computational-linguistics, evaluation-metrics, folia, language-modelling, library, linguistics, machine-learning, natural-language-processing, nlp, nlp-library, python, search-algorithms, text-processing

haven-jeon/PyKoSpacing

Automatic Korean word spacing with Python

Python · 425 · 114 · Updated 14 hours ago

korean-nlp, nlp, spacing, text-processing

andrewbihl/bsed

Simple SQL-like syntax on top of Perl text processing.

Python · 413 · 13 · Updated 3 months ago

awk, csv, domain-specific-language, grep, perl, python, sed, text-processing

airbnb/artificial-adversary

🗣️ Tool to generate adversarial text examples and test machine learning models against them