Olivier Binette
OlivierBinette
Research Scientist @ Upstart // Duke Statistical Science PhD
Top Repositories
List of entity resolution software and resources.
An End-to-End Evaluation Framework for Entity Resolution Systems
Survey components for Streamlit apps
Efficient String Comparison Functions and Fuzzy String Matching
Lightweight validation tool for checking function arguments and data analysis scripts.
Fingerprint matching tools based on NIST's mindtct and bozorth3 algorithms.
Repositories (120)
Fingerprint matching tools based on NIST's mindtct and bozorth3 algorithms.
GroupSHAP variant of the TreeSHAP algorithm.
List of entity resolution software and resources.
A lightweight feature store for Pandas, DuckDB, or your choice of backend.
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
An End-to-End Evaluation Framework for Entity Resolution Systems
Efficient String Comparison Functions and Fuzzy String Matching
Lightweight validation tool for checking function arguments and data analysis scripts.
Easily cache and retrieve computation results in R
Survey components for Streamlit apps
Card flipping app for "Welcome to the Moon"
Deduplicate data using fuzzy and deterministic matching rules.
Code and analyses for the paper titled “On the Reliability of Multiple Systems Estimation for the Quantification of Modern Slavery” (Binette and Steorts, 2021).
USPTO XML resources and data examples for patent text.
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
3D data visualization with WebGL/three.js
ER-Evaluation Demo for JSM 2023
Efficient typo-tolerant search in 76 lines of code, with no dependencies.
Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey.
TruthfulQA: Measuring How Models Imitate Human Falsehoods
Unofficial archive of https://dbs.uni-leipzig.de/research/projects/benchmark-datasets-for-entity-resolution
Tools for the use of Tesseract OCR in R