"topic:text-data" — Search

68 results for “topic:text-data”

Large-scale pretraining for dialogue

data-processing, dialogpt, dialogue, gpt-2, machine-learning, pytorch, text-data, text-generation, transformer

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Python · 2.4k stars · 368 forks · Updated 4 years ago

bert, casl-project, data-processing, deep-learning, dialog-systems, gpt-2, machine-learning, machine-translation, natural-language-processing, python, tensorflow, texar, text-data, text-generation, xlnet

microsoft/GODEL

Large-scale pretrained models for goal-directed dialog

Python · 887 stars · 114 forks · Updated 2 years ago

conversational-ai, data-processing, dialogpt, dialogue, dialogue-systems, grounded-generation, language-grounding, language-model, machine-learning, pretrained-model, pytorch, text-data, text-generation, transformer, transformers

asyml/texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Python · 746 stars · 113 forks · Updated 3 years ago

bert, casl-project, data-processing, deep-learning, dialog-systems, gpt-2, machine-learning, machine-translation, natural-language-processing, python, pytorch, roberta, texar, texar-pytorch, text-data, text-generation, xlnet

asyml/forte

Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/

Python · 250 stars · 59 forks · Updated 2 years ago

data-processing, deep-learning, information-retrieval, machine-learning, natural-language, natural-language-processing, pipeline, python, text-data

thu-coai/cotk

Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation

Python · 129 stars · 25 forks · Updated 5 years ago

cotk, data-processing, deep-learning, machine-learning, metrics, natural-language-generation, natural-language-processing, python, text-data

LoLei/redditcleaner

Cleans Reddit Text Data 📜 🧹

Python · 83 stars · 2 forks · Updated 5 years ago

data-cleaning, hacktoberfest, nlp, praw, psaw, pushshift, python, reddit, text-data

trinker/textreadr

Tools to uniformly read in text data including semi-structured transcripts

R · 77 stars · 6 forks · Updated 3 years ago

doc, docx, pdf-reading, r, read-transcripts, text-data, text-mining

trinker/textshape

Tools for reshaping text data

R · 53 stars · 2 forks · Updated 1 year ago

data-reshaping, manipulation, r, sentence-boundary-detection, text-data, text-formating, tidy

BALaka-18/rake_new2

A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.

Python · 29 stars · 20 forks · Updated 1 year ago

keyword-extraction, keyword-search, keywords, nlp, python-library, text, text-data

PratikBarhate/question-classification

Question classification on the CogComp QC Dataset (http://cogcomp.org/Data/QA/QC/).

Python · 29 stars · 12 forks · Updated 5 years ago

experimental, machine-learning, neural-network, nlp, python3, pytorch, question-classification, spacy, text-data

YaleDHLab/wordmap (Archived)

Visualize large text collections with WebGL

JavaScript · 28 stars · 5 forks · Updated 1 year ago

data-visualization, nlp, text-data, webgl, word2vec

carted/processing-text-data

Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).

Python · 20 stars · 6 forks · Updated 4 years ago

apache-beam, bert, dataflow, tensorflow, text-data, tfhub, use-bert

PedroBarcha/old-books-dataset

Old book pages (with ground truth), formerly used for OCR studies. Several versions of the set are available, differing in resolution and binarization. Noisy and denoised sets (produced by several methods) will eventually be uploaded.

HTML · 15 stars · 2 forks · Updated 8 years ago

binarization, binarized-dataset, books-dataset, dataset, ground-truth, groundtruth, ocr-database, ocr-dataset, old-books, old-documents, text, text-data, text-database

tylerjthomas9/ScrapeSEC.jl

Scrape EDGAR filings from https://www.sec.gov/

Julia · 14 stars · 0 forks · Updated 1 year ago

edgar, finance, financial-data, julia, scraper, sec, text-data

tayebiarasteh/retweet

How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies

Python · 11 stars · 5 forks · Updated 4 years ago

bidirectional-lstm, deep-learning, deep-neural-networks, lstm, manual-annotations, natural-language, natural-language-processing, nlp, pytorch, sentiment-analysis, sentiment-polarity, text-classification, text-data, tweepy, tweet, tweet-analysis, tweet-data, tweet-replies, tweeter, unsupervised-learning

Hsankesara/The-Tweets-of-Wisdom

A dataset containing 30k+ so-called "self-help" tweets from 100+ authors.

Jupyter Notebook · 9 stars · 2 forks · Updated 6 years ago

nlp, text-data, text-datasets, tweepy, tweets

FareedKhan-dev/NLP-1K-Stories-Dataset-Genres-100

This repository hosts a diverse NLP dataset comprising 1,000 stories spanning 100 genres for comprehensive language understanding tasks.

7 stars · 0 forks · Updated 2 years ago

dataset, deep-learning, llm, machine-learning, nlp, python, text-data

mrchypark/gomSubtitleData

Code for collecting subtitle data from 곰tv (GOM TV)

R · 6 stars · 6 forks · Updated 9 years ago

data, drama, korean, movies, r, subtitles, text, text-data

XMU-Kuangnan-Fang-Team/SpecificLDA

A Python package implementing the Directed LDA model for targeted extraction of specific topics from text data

Python · 4 stars · 3 forks · Updated 1 year ago

lda, python, specific-lda, text-data

sevvalckc/Turkish-SAD

Python script to perform sentiment analysis on Turkish text data using multiple pre-trained transformer models, plus a list of Turkish sentiment analysis datasets from 2012 to 2022.

Python · 3 stars · 1 fork · Updated 10 months ago

sentiment-analysis, text-data, turkish-dataset

PriyankaSett/predicting_instagram_likes

The aim of this work is to predict the number of Instagram likes. Text is vectorized using a TF-IDF vectorizer.

Jupyter Notebook · 3 stars · 0 forks · Updated 2 years ago

decision-tree-regression, knn-regression, lasso-regression, linear-regression, nltk, pandas, python, random-forest-regression, regression-analysis, seaborn, text-data, tf-idf, wordninja

Allan-Cao/lol-voice-lines

Dataset of League of Legends Voice Lines

Jupyter Notebook · 3 stars · 0 forks · Updated 2 years ago

dataset, league-of-legends, text-data

SignalN/parallelio

For reading from and writing to parallel data files in Python

Python · 3 stars · 0 forks · Updated 8 years ago

machine-learning, natural-language-processing, pre-processing, preprocessing, text, text-data

Mohampouraz/Persian-poetry

A comprehensive repository of classical Persian poetry, curated from Ganjoor.net, designed for Natural Language Processing (NLP), machine learning applications, and literary research.

Python · 3 stars · 0 forks · Updated 6 months ago

farsi, farsi-datasets, literature, machine-learning, nlp, nlp-machine-learning, persian, persian-poetry, text-classification, text-data

saghiles/dcc

Directional Co-clustering with a Conscience (DCC)

R · 3 stars · 0 forks · Updated 6 years ago

clustering, co-clustering, directional-statistics, mixture-model, text-clustering, text-data, topic-modeling, von-mises-fisher

Ankit152/StackOverflow-Tag-Prediction

A machine learning model that predicts tags for a given question's title and body.

Jupyter Notebook · 3 stars · 0 forks · Updated 4 years ago

count-vectorizer, hamming-loss, machine-learning, micro-f1score, nlp, onevsrestclassifier, sgd-classifier, stackoverflow, stemming, tag-prediction, tags, text-data, text-mining, tfidf-vectorizer

sugatagh/Natural-Language-Processing-with-Disaster-Tweets

The objective of the project is to predict whether a given tweet indicates a real disaster, based on its text (and occasionally its keyword and location). We apply various NLP techniques and classification models and compare the models using an appropriate evaluation metric.

Jupyter Notebook · 2 stars · 1 fork · Updated 2 years ago

classification-model, machine-learning, natural-disasters, natural-language-processing, text-data

jfjelstul/regular-expressions-tutorial

A tutorial on using regular expressions in R

2 stars · 0 forks · Updated 3 years ago

r, regular-expressions, stringr, text-analysis, text-as-data, text-data, tidyverse, tutorial

ccubc/GlassdoorReviews

Classifying employee reviews on glassdoor.com

Jupyter Notebook · 2 stars · 1 fork · Updated 5 years ago

big-data, lda, nlp, text-data
