143 results for “topic:data-normalization”
FxNorm-Automix - Implementation of automatic music mixing systems. We show how we can use wet music data and repurpose it to train a fully automatic mixing system
Computational Suite for Bioinformaticians and Biologists (CSBB) is an RShiny application developed to empower researchers from wet and dry labs to perform downstream bioinformatics analysis
An exploratory data analysis on a global terrorism dataset
Cross-platform desktop app to search, compare, normalize and analyze Excel files. Data regions, column-level type correction, in-place export. Built with .NET 8 + Avalonia UI.
The PyDI framework provides methods for end-to-end data integration. The framework covers all steps of the integration process, including schema matching, data translation, entity matching, and data fusion. The framework offers traditional string-based methods as well as modern LLM- and embedding-based techniques for these tasks.
Machine learning algorithms to predict overall movie gross using the IMDB dataset from Kaggle
preon (PREcision Oncology Normalization) is a fuzzy search tool for medical entities.
Zero-dependency Rust implementation of ASAP (Automatic Smoothing for Attention Prioritization) for Time Series
Term project repository for the System Analysis and Design course in ITM, Seoultech.
DeltaFi is a flexible, code-light data transformation and normalization platform.
A utility for defining metadata for data types and formats.
🔷 Data Cleaning and Insight Generation from Survey Data 🔷 Cleaned and preprocessed Kaggle’s Data Science Survey data, handling missing values, duplicates, and categorical responses. Applied label encoding and normalization to prepare the dataset for analysis. Built 12+ visualizations (pie, scatter, box, line, heatmap, etc.)
Contact data normalization adapted from the Empreinte Sociométrique's normalizers
Various data normalization operations implemented as Python scripts. The target data is in CSV format.
This repository provides a practical introduction to data acquisition and analysis using Pandas. It covers loading datasets, exploring data, manipulating data, and gaining insights through statistical summaries. Ideal for beginners, it offers code examples and explanations to enhance your data manipulation skills using Pandas for Python.
Web app to fetch artists' event data via a public API. Manages global state with redux-toolkit. Responsive design with material-ui. Cool animations and transitions
Feature Engineering with Python
Transformer model for Biomarkers prediction: Evaluating the impact of ECDF normalization on model robustness in clinical data
A production-ready serverless pattern for intelligent data normalization using Claude Haiku via AWS Bedrock
Machine Learning Nanodegree project: helping a charity organization identify the people most likely to donate to their cause
A large pile of interesting and/or useful information
Developed a Python-based web scraper leveraging generative AI with LangChain and GPT-4o-mini to extract and classify FDA drug approval data. Processed over 1,770 records, dynamically categorizing medications and treatment areas using LLMs to simplify complex medical information into actionable insights.
The data preprocessing pipeline for the VISION project (for mouse data)
This project predicts used car prices using a feedforward neural network regression model implemented in PyTorch. Features include car age, mileage, and other attributes. The pipeline supports feature normalization, train/validation/test splitting, and visualization of training and validation loss curves.
Highlighting expertise in data migration, data normalization and standardization, this project demonstrates successful data transfer from Snowflake to Databricks. It emphasizes optimized data flow and enhanced accessibility through standardization, showcasing a commitment to ethical data practices.
The aim of this project is to develop, design, and build a comprehensive and scalable database system for Olist Store to handle potential increases in data volume and allow for more efficient data collection, retrieval, and organization.
Removing unwanted variation in mass spectrometry data with missing values
Clinical Decision Support System (CDSS) for Emergency Triage. Python implementation of regional healthcare protocols featuring complex logic, input normalization, and automated clinical pathways
A collection of bioinformatics and data mining scripts
🌟 Fraud Detection in Application 🌟 Through Isolation Forest and K-Means Clustering, the project detects suspicious patterns like inconsistent income, duplicate entries, and unrealistic employment data. This end-to-end workflow transforms raw data into actionable fraud insights, enhancing trust and accuracy.