Top Repositories
Create a teiCorpus-file from a collection of TEI documents
Split texts (.txt, .xml) into paragraphs
Fix errors in XML documents that make them invalid according to TEI P5
The dice game "Pig" for the command line
Find near-duplicates at the paragraph level
Multilingual Sentence & Image Embeddings with BERT
Repositories
Multilingual Sentence & Image Embeddings with BERT
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Split texts (.txt, .xml) into paragraphs
Create a teiCorpus-file from a collection of TEI documents
Fix errors in XML documents that make them invalid according to TEI P5
Text anonymization in many languages using Faker
Simple script for downloading YouTube comments without using the YouTube API
The dice game "Pig" for the command line
This repository contains supplementary code for a BSc thesis at the University of Potsdam
Find near-duplicates at the paragraph level
The lxml XML toolkit for Python
A (sort of) animated unicode pig for the command line
A simple tool to pull the complete edit history of a Wikipedia page
Compute similarity between trees, e.g. dependency trees
Material for a term paper in the Stance Detection course, University of Potsdam, summer semester 2021