13 results for “topic:synthetic-dataset”
SynthDet - An end-to-end object detection pipeline using synthetic data
The MERIT Dataset is a fully synthetic, labeled dataset created for training and benchmarking LLMs on Visually Rich Document Understanding tasks. It is also designed to help detect biases and improve interpretability in LLMs, an area we are actively working on. This repository is actively maintained, and new features are continuously being added.
Synthetic Dataset Generation - GANs
Synthetic and real datasets of water-flow sounds for the repo 'Neural-Texture-Sound-Synthesis-with-physically-driven-continuous-controls'.
PhishNet is an experimental research project implementing a human-aligned variant of Reinforced Self-Training (ReST) that uses crafted instructions and fine-tuned models to build a high-quality synthetic dataset of phishing emails.
Efficient, multi-language generation from context-free grammars (CFGs) and beyond
This repository contains a synthetic dataset and a step-by-step exploratory data analysis (EDA) workflow for a classification problem simulating customer churn prediction. The dataset is fully generated to mimic real-world scenarios, with numerical and categorical features and a binary target variable.
Power Distribution Modelling for cea and cel algorithms
SynGen is a tool that creates high-quality synthetic datasets using the Gemini API. It analyzes Markdown documents to generate realistic and diverse examples for machine learning, software testing, and data analysis.
🟥🟩 Comprises 10,000 two-dimensional points organized into 100 distinct circles. Designed for evaluating clustering algorithms like k-means, it presents a well-defined clustering challenge. Each point is labeled with its corresponding circle, making it suitable for both classification and clustering tasks.
Detects7 is a lightweight object-detection demo built using YOLOv8, featuring a FastAPI backend and a React + Vite frontend. Trained on the Falcon synthetic dataset, it identifies seven safety-related object classes and includes tools for training, evaluation, visualization, and local deployment.
🟩 Synthetic sales dataset simulating customer purchases. Includes demographics, purchase details, loyalty info, & transaction outcomes. Useful for behavior analysis & forecasting.
Synthetic underwater acoustic waveforms for self-supervised learning. 12,000 5-second clips at 16 kHz covering 4 vessel classes + no-vessel ambient. Non-overlapping shaft rates, blade-gated cavitation bursts, Knudsen-model sea noise. License: TBD by Altair Infrasec Pvt. Ltd. Contact styagi@oravontsystems.com for details
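For the clustering dataset listed above (10,000 two-dimensional points in 100 labeled circles), a minimal sketch of how comparable data might be generated is shown here. This is not that repository's method: the function name `make_circles_dataset`, the grid layout of the circle centers, the radius, and the choice to fill each circle uniformly are all assumptions made for illustration.

```python
import numpy as np


def make_circles_dataset(n_circles=100, points_per_circle=100, radius=0.4, seed=0):
    """Hypothetical generator: labeled 2D points grouped into non-overlapping circles."""
    rng = np.random.default_rng(seed)

    # Assumption: centers sit on a unit-spaced square grid, so any radius
    # below 0.5 keeps neighboring circles from overlapping.
    side = int(np.ceil(np.sqrt(n_circles)))
    centers = np.array(
        [(i % side, i // side) for i in range(n_circles)], dtype=float
    )

    # One integer label per point, identifying the circle it belongs to.
    labels = np.repeat(np.arange(n_circles), points_per_circle)
    n_points = labels.size

    # Taking the square root of a uniform draw spreads points evenly over
    # the disc area instead of crowding them near the center.
    r = radius * np.sqrt(rng.random(n_points))
    theta = rng.uniform(0.0, 2.0 * np.pi, n_points)
    offsets = np.column_stack((r * np.cos(theta), r * np.sin(theta)))

    points = centers[labels] + offsets
    return points, labels


points, labels = make_circles_dataset()
print(points.shape, labels.shape)
```

Because the circles in this sketch never overlap, a clustering algorithm such as k-means with 100 clusters can be scored directly against `labels`; shrinking the grid spacing or enlarging `radius` would make the task harder.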