68 results for “topic:text-data”
Large-scale pretraining for dialogue
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Large-scale pretrained models for goal-directed dialog
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation
Cleans Reddit Text Data 📜 🧹
Tools to uniformly read in text data including semi-structured transcripts
Tools for reshaping text data
A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.
Question classification for the CogComp QC dataset (http://cogcomp.org/Data/QA/QC/).
Visualize large text collections with WebGL
Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).
Old book pages (with ground truth), formerly used for OCR studies. The set comes in several versions (differing in resolution and binarization). Noised and denoised sets (produced by several methods) will eventually be uploaded.
Scrape EDGAR filings from https://www.sec.gov/
How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies
A dataset containing 30k+ so-called "self-help" tweets from 100+ authors.
This repository hosts a diverse NLP dataset comprising 1,000 stories spanning 100 genres for comprehensive language understanding tasks.
Code for collecting subtitle data from GOM TV (곰TV)
A Python package implementing the Directed LDA model for targeted extraction of specific topics from text data
Python script that performs sentiment analysis on Turkish text data using multiple pre-trained transformer models, plus a list of Turkish sentiment analysis datasets from 2012 to 2022.
The aim of this work is to predict the number of Instagram likes. Text vectorization is done using a TF-IDF vectorizer.
Dataset of League of Legends Voice Lines
For reading from and writing to parallel data files in Python
A comprehensive repository of classical Persian poetry, curated from Ganjoor.net, designed for Natural Language Processing (NLP), machine learning applications, and literary research.
Directional Co-clustering with a Conscience (DCC)
A machine learning model that predicts tags for a given question and body.
The objective of the project is to predict whether a particular tweet, for which the text (and occasionally the keyword and location) is provided, indicates a real disaster or not. We use various NLP techniques and classification models for this purpose and compare them objectively using an appropriate evaluation metric.
A tutorial on using regular expressions in R
Classifying employee reviews on glassdoor.com