13,620 results for “topic:dataset”
A collective list of free APIs
Label Studio is a multi-type data labeling and annotation tool with a standardized output format
Faker is a Python package that generates fake data for you.
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
A powerful tool for creating datasets for LLM fine-tuning, RAG, and Eval
An MNIST-like fashion product database. Benchmark 👇
Open source annotation tool for machine learning practitioners.
Techniques for deep learning with satellite & aerial imagery
Large Scale Chinese Corpus for NLP
Curated list of Machine Learning, NLP, Vision, Recommender Systems Project Ideas
Documentation on how to access and use the Quick, Draw! Dataset.
Browser compatibility data for Web technologies as displayed on MDN
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
CSGHub is a brand-new open-source platform for managing LLMs, developed by the OpenCSG team. It offers both open-source and on-premise/SaaS solutions, with features comparable to Hugging Face. Gain full control over the lifecycle of LLMs, datasets, and agents, with Python SDK compatibility with Hugging Face. Join us! ⭐️
esProc SPL is a JVM-based programming language designed for structured data computation, serving as both a data analysis tool and an embedded computing engine.
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
Transformer: PyTorch Implementation of "Attention Is All You Need"
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.
Curated list of datasets and tools for post-training.
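Chinese personal name corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person-name entity recognition.
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
📈 The largest collection to date of industrial defect detection datasets and papers. A continually updated summary of important open-source datasets and key papers in the field of surface defect research.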
A synthetic data generator for text recognition
Models, data loaders and abstractions for language processing, powered by PyTorch
A curated list of awesome JSON datasets that don't require authentication.
An Index for Medical Imaging Datasets
Up to 100x faster strings for C, C++, CUDA, Python, Rust, Swift, JS, & Go, leveraging NEON, AVX2, AVX-512, SVE, GPGPU, & SWAR to accelerate search, hashing, sorting, edit distances, sketches, and memory ops 🦖
A quick guide to trending datasets, especially those for instruction fine-tuning