111 results for “topic:dataquality”
Always know what to expect from your data.
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Compare tables within or across databases
Data Contracts engine for the modern data stack. https://www.soda.io
re_data - fix data issues before your users & CEO discover them 😊
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
ML powered analytics engine for outlier detection and root cause analysis.
Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool
The premier open source Data Quality solution
Library for Semi-Automated Data Science
Possibly the fastest DataFrame-agnostic quality check library in town.
Open Source Data Quality Monitoring.
Frontend for the osmcha-django REST API
Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer.
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.
DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution by data profiling, new dataset hygiene review, AI generation of data quality validation tests, ongoing testing of data refreshes, & continuous anomaly monitoring
Simplifies storing test results and visualising them in a BI dashboard
A data quality control system with embedded AI
Datailot-cli is the command line interface for accessing the AI teammate for engineers to ensure best practices in their SQL and dbt projects.
Enhance your data testing seamlessly with this Dataform package, featuring robust common assertions to ensure the accuracy and integrity of your warehouse data.
BirdiDQ leverages the power of the Python Great Expectations open-source library and combines it with the simplicity of natural language queries to effortlessly identify and report data quality issues, all at the tip of your fingers.
:zap: Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline.
Code & Datasets
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
Run greatexpectations.io on ANY SQL engine through a REST API. A data quality tool supported by FastAPI, Pydantic, and SQLAlchemy.
🦆 Blazing fast and highly customizable GitHub Action to set up a DuckDB runtime
Code for the blog on data quality with greatexpectations
Huemul BigDataGovernance is a framework that runs on Spark, Hive, and HDFS. It supports a corporate single-source-of-data strategy based on data governance best practices. It lets you implement tables with Primary Key and Foreign Key control when inserting and updating data through the library, along with validation of nulls, text lengths, maximum/minimum values for numbers and dates, unique values, and default values. It also lets you classify fields by the applicability of ARCO rights to ease the implementation of GDPR-style data protection laws, and to identify security levels and whether any kind of encryption is applied. Additionally, it lets you add more complex validation rules on the same table.