"topic:dataset" — Search

CSGHub is a brand-new open-source platform for managing LLMs, developed by the OpenCSG team. It offers both open-source and on-premise/SaaS solutions, with features comparable to Hugging Face. Gain full control over the lifecycle of LLMs, datasets, and agents, with Python SDK compatibility with Hugging Face. Join us! ⭐️

Vue · 5.5k stars · 682 forks · Updated 14 hours ago

ai, asset-management, dataset, deepseek, deploy, finetune, git, huggingface, inference, llm, management-system, model, platform, prompt, ray, space

SPLWare/esProc

esProc SPL is a JVM-based programming language designed for structured data computation, serving as both a data analysis tool and an embedded computing engine.

Java · 4.7k stars · 360 forks · Updated 2 days ago

cluster-computing, database, dataset, esproc, java, sql

tensorflow/datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

Python · 4.6k stars · 1.6k forks · Updated 10 hours ago

data, dataset, datasets, jax, machine-learning, numpy, tensorflow

hyunwoongko/transformer

Transformer: PyTorch Implementation of "Attention Is All You Need"

Python · 4.5k stars · 629 forks · Updated 11 hours ago

attention, dataset, pytorch, transformer

rom1504/img2dataset

Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.

Python · 4.4k stars · 372 forks · Updated 4 hours ago

big-data, dataset, deep-learning, download-images, image, image-dataset, multimodal

whoiskatrin/sql-translator

SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.

TypeScript · 4.3k stars · 377 forks · Updated 3 days ago

data-analysis, data-engineering, dataquery, datascience, dataset, openai, postgresql, query, sql

mlabonne/llm-datasets

Curated list of datasets and tools for post-training.

4.3k stars · 352 forks · Updated 2 hours ago

data, dataset, llm

wainshine/Chinese-Names-Corpus

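Chinese personal-name corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person-name entity recognition.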

4.3k stars · 1.0k forks · Updated 1 day ago

corpus, dataset, dict, names, ner

CLUEbenchmark/CLUE

Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard

Python · 4.2k stars · 546 forks · Updated 2 days ago

albert, benchmark, bert, chinese, chineseglue, corpus, dataset, glue, language-model, nlu, pretrained-models, pytorch, roberta, tensorflow, transformers

Charmve/Surface-Defect-Detection

📈 Currently the largest industrial defect detection database and paper collection. A continually updated summary of important open-source datasets and key papers in surface defect research.

Python · 4.0k stars · 596 forks · Updated 1 day ago

charmve, dataset, deep-learning, defects, image-segmentation, paper, pcb-surface-defect, surface, surface-defect-detection, surface-defects, surface-detection

Belval/TextRecognitionDataGenerator

A synthetic data generator for text recognition

Python · 3.6k stars · 1.0k forks · Updated 5 days ago

data, dataset, fake, ocr, synthetic, text, text-recognition, training-set-generator

pytorch/text (Archived)

Models, data loaders and abstractions for language processing, powered by PyTorch

Python · 3.6k stars · 813 forks · Updated 1 week ago

data-loader, dataset, deep-learning, models, nlp, pytorch

jdorfman/awesome-json-datasets (Archived)

A curated list of awesome JSON datasets that don't require authentication.

JavaScript · 3.6k stars · 388 forks · Updated 3 days ago

awesome, awesome-list, data, dataset, datasets, json, json-dataset, list

linhandev/dataset

An Index for Medical Imaging Datasets

3.5k stars · 426 forks · Updated 15 hours ago

4d-lung, ct, dataset, grand-challenge, medical-imaging, mri, msd, qin-lung-ct, qin-prostate-repeatability, tcia

ashvardanian/StringZilla

Up to 100x faster strings for C, C++, CUDA, Python, Rust, Swift, JS, & Go, leveraging NEON, AVX2, AVX-512, SVE, GPGPU, & SWAR to accelerate search, hashing, sorting, edit distances, sketches, and memory ops 🦖

C · 3.4k stars · 120 forks · Updated 2 days ago

dataset, edit-distance, gpu, hash, hashing, information-retrieval, levenshtein-distance, parser, search, simd, sorting-algorithms, string, string-manipulation, string-matching, string-parsing, string-search, substring, unicode

Zjh-819/LLMDataHub

A quick guide (especially) for trending instruction finetuning datasets

3.4k stars · 232 forks · Updated 11 hours ago

chatbot, chatgpt, dataset, llm
