41 results for “topic:data-standardization”
A modular ecosystem under this. namespace.
🩺 Machine learning diabetes prediction model using a Support Vector Machine (SVM) classifier. Analyzes 8 medical features (glucose, BMI, age, etc.) from the Pima Indian dataset to predict diabetes risk with 75-80% accuracy. Built with Python, scikit-learn, and pandas. Includes data preprocessing, model training, and a prediction system for diabetes.
Example code accompanying the Sternberg concept cell data release for Kyzar et al. (2024)
A digital transformation of cyber assessment and authorization data with a relational schema
Feature Engineering with Python
Prepare and check data to comply with Darwin Core Standard in R
Unifying Biotic Interactions Data: Terminology, Data Analysis, Standardization, and Proposal of a Data Schema for Plant-Pollinator Interactions
Highlighting expertise in data migration, data normalization and standardization, this project demonstrates successful data transfer from Snowflake to Databricks. It emphasizes optimized data flow and enhanced accessibility through standardization, showcasing a commitment to ethical data practices.
A Python-based data cleaning project to streamline QuickBooks invoice data for analysis, paving the way for improved insights into sales, pricing, and inventory management.
Building a modern data warehouse with SQL Server, including ETL Processes, Data Modeling and Analytics
A new package processes textual descriptions of drone designs to extract structured summaries of their operational capabilities. It focuses on identifying and categorizing key features such as locomot
This project is about cleaning and preparing a global layoffs dataset for analysis, focusing on handling null values, correcting data types, and ensuring data integrity for more accurate insights.
This Data Analytics project focused on understanding the career preferences and motivations of Generation Z. Through survey data and analysis, this project aims to identify key trends and factors influencing their career choices, providing insights for employers, educators, and recruiters looking to engage with this new generation of talent.
vuln-structure is a package that extracts vulnerability details from raw text and outputs standardized, structured data for security teams.
csv-managed is a Rust command-line utility for high‑performance exploration and transformation of CSV data at scale, emphasizing streaming, typed operations, and reproducible workflows via schema and index files.
Hi folks! During my internship at KultureHire, I completed an end-to-end Data Analytics project. I created an executive and functional dashboard using pivot tables, conducted a thorough analysis, and provided actionable recommendations. I'm excited to share my work and the insights I discovered.
🌟 Data Cleaning and Processing 🌟 Handled missing values, removed duplicates, standardized salary formats, and treated outliers for consistency. Revealed trends in company performance, job roles, and salary distributions after refining the dataset. This project highlights the power of data preprocessing as the backbone of reliable analytics.
This repository contains a SQL-based data cleaning project where raw layoffs data was transformed into a clean and structured dataset. The project showcases practical SQL techniques such as duplicate removal, data standardization, null handling, and schema optimization, following real-world data preparation best practices.
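A Python-based ETL pipeline for standardizing heterogeneous IoT configuration data across 12 manufacturing sites. Features automatic schema mapping, multi-source merging, and daily change-log generation for configuration lifecycle management. It automates the aggregation of 500,000+ IoT assets and produces a daily audit trail, keeping platform logic consistent with edge-side implementation.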
CDISC data standardization with SAS and R
☺️ Hi folks! During my internship at KultureHire, I completed a real-world Data Analyst project. I created an interactive dashboard using pivot tables, conducted a thorough analysis, and provided actionable recommendations. I'm excited to share my work and the insights I discovered.
This project uses SQL to transform messy transactional sales data into a clean, validated dataset for accurate KPI and profitability analysis before BI reporting. I also built a Tableau Public dashboard from this final dataset; it can be viewed via the link below.
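🧹 Excel data standardization and cleaning tool | 100+ smart rules · two-stage safe processing · formulas left untouched · item-by-item review · change log export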
The call center provided a messy dataset of customers. The objective was to clean, standardize, and remove duplicates to create an accurate, organized contact list. I used Pandas to load, explore, clean, and export the data, delivering a refined list ready for effective customer outreach.
A practical SQL data cleaning project that standardizes and prepares the Global Layoffs dataset for analysis using SQL techniques like window functions, staging tables, and data quality checks.
Standardized Stata templates for NaNDA data curation, quality control, and publication workflows
Tutorial code for performing PCA (with mathematical explanation) on breast cancer features computed from digitized images of fine needle aspirate (FNA) of a breast mass. Center the data, calculate correlation matrix, compute principal components, visualize and interpret results.
This repository contains SQL scripts and documentation for cleaning and standardizing data in the NashvilleHousing table within the sqlproject2 database. The project aims to prepare the dataset for analysis by addressing inconsistencies, filling missing values, standardizing formats, and removing duplicates.
This repository contains the projects completed as part of the KultureHire internship program. The projects focus on real-world business and data analysis problems, covering data collection, cleaning, analysis, visualization, and insight generation using tools such as Excel, SQL, and Power BI.
This project includes a detailed data cleaning process for court records to generate relevant statistics, using Excel and Power BI. It also includes an interactive visualization of the processed data.