33 results for “topic:data-testing”
Data Contracts engine for the modern data stack. https://www.soda.io
re_data - fix data issues before your users & CEO would discover them 😊
Code review for data in dbt
Data validation toolkit for assessing and monitoring data quality.
Various files useful for manual testing and test automation etc.
Great Expectations Airflow operator
A simple and easy to use Data Validation library for Python.
DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution by data profiling, new dataset hygiene review, AI generation of data quality validation tests, ongoing testing of data refreshes, & continuous anomaly monitoring
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
⚡ Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline.
Develop a data science project using historical sales data to build a regression model that accurately predicts future sales. Preprocess the dataset, conduct exploratory analysis, select relevant features, and employ regression algorithms for model development. Evaluate model performance, optimize hyperparameters, and provide actionable insights.
This library is inspired by the Great Expectations library. It makes the various expectations found in Great Expectations available through Python's built-in unittest assertions.
Spark Data Test - A PySpark-based automation testing utility to compare Spark DataFrames
DataBridge Quality Control
data and pipeline testing with and for SQL
Data generation and validation tool for any data source
Software Testing in Open Source and Data Science: A talk delivered at the Data Umbrella speaker series
Example API implementation for Data Caterer
Library for data quality monitoring based on duckdb.
Documentation for Data Caterer
This project implements a simple Linear Regression model from scratch and compares it to the implementation of scikit-learn, using the California Housing Dataset.
🎯 DBT Masterclass with CI/CD | Hands-on Data Transformation Guide. A comprehensive tutorial repository for mastering Data Build Tool (DBT) from scratch to deployment! Includes Bronze-Silver-Gold architecture, testing, snapshots, macros, and production CI/CD workflows. Follow along with the 5-hour video tutorial! 🚀
A sample repository showcasing the implementation of testing for an ETL pipeline developed with Apache Spark
Translating between two sets of notation for Kalman filters
Credit Risk Classification
Data Migration Testing | Data Reconciliation
Dynamic data testing engine based on PySpark
National Grid (Python, SQL Server, SSIS, SSRS, Tableau, Power BI, SQL Server Import Export Wizard, Data Validations, Data Integrations, Data Conversions)
A data testing framework that executes queries on configurable data providers and validates the results with customizable YAML-defined assertions. Ensure data integrity, consistency, and reliability effortlessly.