68 results for “topic:simhash”
Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang
Chinese text similarity calculator
A simple implementation of the SimHash algorithm in Java.
Removes the most frequent words (stop words) from text content. Based on a curated list of language statistics.
Dynatrace hash library for Java
Locality Sensitive Hashing
Simhash implementation in JavaScript
semantic-sh is a SimHash implementation that detects and groups similar texts by harnessing the power of word vectors and transformer-based language models (BERT).
A fast Python implementation of the SimHash algorithm.
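A visual system for assignment duplicate checking, plagiarism detection, and text similarity analysis, built on Spring Boot and Google's open-source simhash algorithm. Integrates the jplag, MOSS, and singleCloud tool suites for multi-faceted duplicate checking. Ref: https://github.com/ALuShu/checksystem
A text similarity tool based on simhash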
Open Source Implementation of Simhash in Python
China Merchants Bank FinTech competition, second round: financial news analysis
Elixir SimHash NIFs written in Rust
Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
SuperMinHash: A New Minwise Hashing Algorithm for Jaccard Similarity Estimation, Simhash and SimhashIndex
A simhasher for Chinese documents implemented in Go, a straightforward port of yanyiwu/gosimhash
A library for cosine similarity & simhash calculation
Find duplicate text files.
A rewrite of Bookmate's simhash gem, which is an implementation of Moses Charikar's simhashes in Ruby.
A Java-based multithreaded web crawler framework
This system evaluates a collection of mementos (archived web pages) to determine which are off topic. The collection can be part of an Archive-It collection, a single TimeMap, or stored in a WARC file.
A simHash-based web system for checking assignments for duplication
Code plagiarism system based on Simhash and Nicad.
⌨️ User Verification based on Keystroke Dynamics / Two-factor Authentication technology based on Key-Stroke
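A SimHash- and BERT-based duplicate-checking system for university student assignments. It combines the SimHash algorithm and the BERT-Base-Chinese model with Vue3, Spring Boot3, EasyExcel, and HanLP to perform intelligent duplicate checking. Supports batch file processing and comparison against past assignments, and automatically generates detailed Excel duplicate-check reports. Integrates Jaccard, Hamming distance, hash, cosine, image, and weighted similarity algorithms to assess file similarity accurately.
Text de-duplication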
A lightweight Go package implementing Charikar's Simhash algorithm for generating hash fingerprints and calculating similarity, ideal for deduplication and content fingerprinting
Super-Bit locality-sensitive hashing
College project (Analysis of massive data sets) - C# implementation of big data algorithms (2017/2018)