"topic:machine-learning-dataset" — Search

44 results for “topic:machine-learning-dataset”

Chinese and English NER datasets, English-Chinese machine translation datasets, and a Chinese word segmentation dataset.

A malware classifier dataset built from the header field values of Portable Executable files

machine-learning-dataset, machine-learning-for-malware, malware, malware-analysis, malware-dataset, malware-detection, malware-research, pefile, peheader, python

tosiron/jazznet

jazznet dataset of piano patterns for music audio machine learning research

Python · 85 · 4 · Updated 2 years ago

audio-synthesis, data-generation, dataset, deep-learning, machine-learning, machine-learning-dataset, music-dataset, music-information-retrieval

JohannesBuchner/spoken-command-recognition

A large, free audio sample database (10M words pronounced), a test bed for voice activity detection algorithms and for single-syllable word recognition

Python · 70 · 31 · Updated 8 years ago

audio, audio-classification, dataset, machine-learning, machine-learning-dataset, spoken-english

elkorchi/2DGeometricShapesGenerator

2D Geometric shapes generator

Python · 44 · 12 · Updated 1 year ago

geometric-shapes, image-generator, machine-learning-dataset

reddyprasade/Machine-Learning-Problems-DataSets

We currently maintain 488 data sets as a service to the machine learning community. You may view all data sets through our searchable interface. For a general overview of the Repository, please visit our About page. For information about citing data sets in publications, please read our citation policy. If you wish to donate a data set, please consult our donation policy. For any other questions, feel free to contact the Repository librarians.

Python · 41 · 24 · Updated 5 years ago

machine-learning-dataset, machine-learning-datasets, uci, uci-machine-learning

FrankFeng-23/SPREAD

SPREAD is a large-scale synthetic dataset for image- and point-cloud-based tasks in forestry.

Python · 33 · 2 · Updated 7 months ago

3d-point-clouds, machine-learning, machine-learning-dataset, python, synthetic-data, unreal-engine-5

cvjena/cifair

A duplicate-free variant of the CIFAR test set.

Python · 14 · 0 · Updated 5 years ago

computer-vision, computer-vision-datasets, dataset, datasets, machine-learning, machine-learning-dataset, machine-learning-datasets

ichisadashioko/etlcdb

Extract Japanese characters database.

Jupyter Notebook · 13 · 2 · Updated 5 years ago

handwriting, handwriting-dataset, hiragana, japanese-character-database, kanji, machine-learning-dataset

lqwk/ucla-dining-dataset

UCLA Dining Hall Menus Dataset

11 · 0 · Updated 8 years ago

machine-learning, machine-learning-dataset, ucla-dining, ucla-dining-dataset

EngineeringSoftware/math-comp-corpus

Corpus of Coq code related to MathComp including several machine-readable representations

Common Lisp · 10 · 1 · Updated 3 years ago

coq, machine-learning-dataset, mathcomp, serapi

ml4py/dataset-iiit-pet

Classification dataset for comparing cats and dogs images

8 · 2 · Updated 5 years ago

machine-learning-dataset

chadsr/marktplaats-scraper

Marktplaats.nl (Dutch Classifieds) Listing Scraper

Python · 5 · 2 · Updated 2 weeks ago

chromedriver, dataset-creation, dataset-generation, dutch-language, machine-learning, machine-learning-dataset, marktplaats, marktplaats-nl, scraper, selenium, web-scraper, web-scraping

deepinstinct-algo/DeepURLBench

This repo is the dataset for the paper "A New Dataset and Methodology for Malicious URL Classification"

4 · 0 · Updated 1 year ago

machine-learning-dataset, nlp-dataset, phishing-detection, url-classification, url-dataset

latentcollection/macOS-fontface-scraper

OpenFrameworks program that generates training data from font-faces installed on your Mac.

Makefile · 4 · 0 · Updated 1 year ago

image-processing, machine-learning-dataset, macos, openframeworks, typography

krzjoa/Komentarze

Corpus of manually classified comments for machine learning (filtering offensive comments)

Python · 3 · 0 · Updated 3 years ago

corpus, corpus-data, dataset, json-data, machine-learning-dataset

screddy1313/amazon-product-images-downloader

Given a product name, the Python program downloads all the images. It also handles pagination.

Jupyter Notebook · 3 · 5 · Updated 3 years ago

amazon-automation, image-downloader-python, machine-learning-dataset, selenium-python

Camponotus-vagus/iNaturalist-Image-Downloader

Batch download images from iNaturalist observations. GUI app for creating ML datasets, biodiversity research, and citizen science projects. No coding required - standalone executables for Windows, macOS & Linux.

Python · 2 · 1 · Updated 1 month ago

batch-download, biodiversity, citizen-science, dataset, ecology, gui, image-downloader, inaturalist, machine-learning-dataset, nature, photography, python, tkinter, wildlife

sferez/BybitMarketData

This repository serves as a collection point for market data from Bybit. Aimed at facilitating machine learning model creation and finetuning.

2 · 0 · Updated 1 year ago

ai, ai-data, ai-data-collection, bybit, bybit-websocket, crypto, crypto-data, cryptocurrency, cryptocurrency-datasets, machine-learning-dataset, market, market-data, trading, trading-bot, trading-strategies

mkaspulanwar/rupiah-banknotes-dataset

Rupiah Banknotes Dataset is a collection of Indonesian currency images (Rp1,000, Rp2,000, Rp5,000, Rp10,000, Rp20,000, Rp50,000, and Rp100,000) designed for Machine Learning (ML) and Computer Vision (CV) tasks.

2 · 0 · Updated 6 months ago

computer-vision-dataset, currency-banknotes-dataset, currency-classification, deep-learning-dataset, indonesia-money-dataset, indonesian-currency-dataset, machine-learning-dataset, rupiah-banknotes, rupiah-dataset

jay-johnson/network-pipeline-datasets

CSV datasets for ML/AI models from captured network traffic during ZAP scanning with web applications like Django, Flask, React, Vue and Spring - Anti-Nex training datasets

2 · 0 · Updated 8 years ago

ai, csv-data, csv-datasets, django, flask, flask-restful, free-datasets, machine-learning, machine-learning-dataset, machine-learning-defense, network-analysis, network-security, owasp, python3, react, react-redux, spring, spring-boot, vue, vue2

vtalpaert/pytorch-geometric-visual-task

Simple task for mixed image-graph data

Python · 2 · 0 · Updated 6 years ago

dataset, deep-learning-datasets, geometric-deep-learning, graph-neural-networks, machine-learning-dataset, pytorch, pytorch-geometric

ironsheep/P2-Knowledge-Base

📚 Authoritative P2 microcontroller documentation: architecture, PASM2/Spin2 languages, smart pins, and examples. Optimized for AI training, developer education, and technical reference

Python · 2 · 0 · Updated 1 week ago

ai-friendly, ai-training, ai-training-permitted, code-examples, documentation, embedded-systems, learning-resources, machine-learning-dataset, microcontroller, multicore, p2, parallax, pasm2, propeller2, reference-documentation, smart-pins, spin2, technical-documentation

hernancapucci/agent-manifest-dataset

Public dataset of Agent Manifest declarations registered through the Agent Manifest registry.

1 · 0 · Updated 1 week ago

agent-dataset, agent-manifest, ai-agents, ai-dataset, ai-governance, ai-research, machine-learning-dataset, open-data

aalekhpatel07/captcha-generator

Generate captchas for ML tasks in parallel.

Rust · 1 · 0 · Updated 4 years ago

captcha, captcha-generator, machine-learning-dataset, rust

DavidWalz/dlipr

tools for a deep learning in physics research course

Python · 1 · 4 · Updated 7 years ago

deep-learning, machine-learning-dataset, physics

mookiezi/dataset-cleaning-toolkit

A dataset toolbox for preparing and analyzing conversational datasets, including CSV splitting, CSV → Parquet conversion, dataset statistics, Parquet cleaning and sorting, HuggingFace–style metadata generation, and batched chain insertion into PostgreSQL — with Rich progress, multiprocessing, and 32 GB-RAM-friendly batching.

Python · 1 · 0 · Updated 5 months ago

chatml, cleaner, csv, dataset, machine-learning, machine-learning-dataset, ml, natural-language-processing, nlp, toolbox, toolkit

mookiezi/dataset-pipeline

A full Discord dataset pipeline with an end-to-end flow from raw Discord data to a final Parquet dataset with full statistics. Every stage is independent, idempotent, and CLI-driven for ease of automation.

1 · 0 · Updated 6 months ago

dataset, machine-learning, machine-learning-dataset, ml, natural-language-processing, nlp, pipeline

Elijas/sentence-polarity-dataset-v1.0

sentence polarity dataset v1.0 (includes sentence polarity dataset README v1.0): 5331 positive and 5331 negative processed sentences / snippets. Introduced in Pang/Lee ACL 2005. Released July 2005.

1 · 1 · Updated 6 years ago

dataset, machine-learning-dataset, polarity-dataset

mookiezi/dataset-toolbox

A dataset toolbox for preparing and analyzing conversational datasets, including CSV splitting, CSV → Parquet conversion, dataset statistics, dialogue-turn filtering, turn-based filtering, token and turn analysis, Parquet cleaning and sorting, HuggingFace–style metadata generation, and batched chain insertion into PostgreSQL.

Python · 0 · 0 · Updated 6 months ago

csv, dataset, machine-learning, machine-learning-dataset, ml, natural-language-processing, nlp, parquet, stats, toolbox, toolkit
