133 results for “topic:minhash”
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
Quickly search, compare, and analyze genomic and metagenomic data sets.
JS implementation of probabilistic data structures: Bloom filter (and its derivatives), HyperLogLog, Count-Min Sketch, Top-K and MinHash
Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datasets
Chinese text similarity calculator
C++ implementations of sketch data structures with SIMD parallelism, including Python bindings
Sketching algorithms for Clojure (Bloom filter, MinHash, HyperLogLog, Count-Min Sketch)
Dynatrace hash library for Java
Weighted MinHash implementation on CUDA (multi-gpu).
Detect and visualize text reuse
Locality Sensitive Hashing
A Clojure library for querying large data-sets on similarity
A resistome profiler for Graphing Resistance Out Of meTagenomes
Elasticsearch plugin for the b-bit MinHash algorithm
Quickly estimate the similarity between many sets
SetSketch: Filling the Gap between MinHash and HyperLogLog
Classify sequencing reads using MinHash.
ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity
Genomic neighbor typing of bacterial pathogens using MinHash
Locality Sensitive Hashing In R
Probabilistic data structures for OCaml
Tools for the b-bit MinHash algorithm.
A simple audio fingerprinting system
A method to mine beyond-pairwise relationships using Min-Hashing for large-scale pattern discovery
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
MinHash LSH in Golang
BagMinHash - Minwise Hashing Algorithm for Weighted Sets
Plagiarism detector
Python 2.7 code and learning notes for Spark 2.1.1
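Many of the projects above build on the same core idea: the MinHash signatures of two sets agree at each position with probability equal to the sets' Jaccard similarity, so the fraction of matching positions estimates that similarity. The following is a minimal, illustrative Python sketch of that idea. It is not code from any of the listed repositories. The function names (`minhash_signature`, `estimate_jaccard`, `exact_jaccard`) are hypothetical, and the use of salted BLAKE2b as the family of hash functions is an assumption made for illustration.

```python
import hashlib


def _hash(token: str, seed: int) -> int:
    # Assumption: a salted BLAKE2b digest stands in for one of many
    # independent random hash functions; each seed gives a different function.
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(16, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens: set[str], num_perm: int = 128) -> list[int]:
    # For each hash function, keep the minimum hash value over the set.
    if not tokens:
        raise ValueError("MinHash of an empty set is undefined")
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_perm)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    # The fraction of positions where the minima agree estimates |A ∩ B| / |A ∪ B|.
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    # Exact Jaccard similarity, for comparison with the estimate.
    return len(a & b) / len(a | b)


a = set("the quick brown fox jumps over the lazy dog".split())
b = set("the quick brown fox leaps over the lazy cat".split())
print("exact:", exact_jaccard(a, b))
print("estimate:", estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

Larger values of `num_perm` reduce the estimate's variance (its standard error shrinks roughly as 1/sqrt(`num_perm`)) at the cost of longer signatures. LSH schemes such as those listed above then band these signatures so that likely-similar pairs can be found without comparing every pair.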