35 results for “topic:data-matching”
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
A powerful and modular toolkit for record linkage and duplicate detection in Python
A list of free data matching and record linkage software.
Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4
🔎 Finds fuzzy matches between CSV files
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
Resources for tackling record linkage / deduplication / data matching problems
Link Wikidata items to large catalogs
An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
A browser user interface for manual labeling of record pairs.
A maximum-strength name parser for record linkage.
Welcome to Snowman App – a Data Matching Benchmark Platform.
Fuzzy string matching in R. Inspired by Python's thefuzz (but without the Python).
https://medium.com/@carlosraphael/specification-design-pattern-in-java-8-bac6f5f943bc
Compound AI toolchain for fast and accurate entity matching, powered by LLMs.
🔎 Finds fuzzy matches between datasets
DuckDB community extension for locality-sensitive hashing (LSH)
A collection of awesome resources regarding Record Linkage.
Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).
An extension for ASReview Lab to preprocess the dataset before importing it into ASReview
Undergraduate Final Project (README needs to be updated!!) - Scientific paper soon to be included
This project aims to provide users with lists containing only great movies, based on just a few filters and search parameters.
Weka Comparator to match rules to test data with filtering abilities
A standardized email and phone number normalization and hashing utility that follows UID2 specifications for email address and phone number processing. This tool ensures consistent normalization and hash generation for identity resolution and data matching purposes.
Service for automatically matching two datasets without mapping
Awesome-matchem-datasets is a curated collection of high-quality datasets for machine learning and data analysis in the field of chemistry. This repository includes various datasets, ranging from molecular structures to experimental results, suitable for both research and educational purposes.
Crawl, match, and explore data about jobs in Viet Nam.
ProxCluster is a framework for Incremental Entity Resolution that leverages concepts similar to K-Means for clustering duplicates. This work was developed as the final paper for my Bachelor's degree in Computer Science.
Python Tkinter app to extract VATs from PDFs, match with Excel data, and email corresponding pages automatically.