moonbitap
Bitap (Shift-Or) algorithm implementation for MoonBit with full Unicode support.
Features
- Fuzzy string matching with configurable edit distance (insertions, deletions, substitutions)
- Full Unicode support - works with ASCII, CJK characters, emoji, and mixed text
- Pattern precompilation for efficient repeated searches
- Zero dependencies (only uses MoonBit core library)
Installation
moon add fuwasegu/moonbitapUsage
Basic Fuzzy Search
// One-off fuzzy containment check
let found = @moonbitap.fuzzy_contains("hello", "hallo world", max_distance=1)
// true (1 substitution: e -> a)Precompiled Pattern (Recommended for repeated searches)
// Compile pattern once
let pattern = @moonbitap.compile("東京").unwrap()
// Search multiple texts
@moonbitap.contains(pattern, "東京都", max_distance=0) // true (exact)
@moonbitap.contains(pattern, "東宮", max_distance=1) // true (1 substitution)
@moonbitap.contains(pattern, "大阪", max_distance=1) // falseFind Match Position
let pattern = @moonbitap.compile("world").unwrap()
let pos = @moonbitap.find(pattern, "hello world")
// Some(10) - ending position of matchFind All Matches with Distances
let pattern = @moonbitap.compile("test").unwrap()
let matches = @moonbitap.find_all(pattern, "testing tost text", max_distance=1)
// Returns array of FuzzyMatch { position, distance }Get Best Match
let pattern = @moonbitap.compile("color").unwrap()
let best = @moonbitap.best_match(pattern, "the colour of colors", max_distance=2)
// Returns the match with minimum edit distanceAPI Reference
| Function | Description |
|---|---|
compile(pattern: String) -> BitapPattern? |
Compile pattern (max 64 chars) |
find(pattern, text) -> Int? |
Find exact match position |
contains(pattern, text, ~max_distance) -> Bool |
Check fuzzy containment |
find_all(pattern, text, ~max_distance) -> Array[FuzzyMatch] |
Find all matches |
best_match(pattern, text, ~max_distance) -> FuzzyMatch? |
Get closest match |
fuzzy_contains(pattern, text, ~max_distance) -> Bool |
One-off fuzzy check |
Limitations
- Pattern length must be ≤ 64 characters (due to 64-bit bitmask implementation)
- For longer patterns, consider splitting or using alternative algorithms
Algorithm
This library implements the Bitap algorithm (also known as Shift-Or or Shift-And), which uses bitwise operations for efficient approximate string matching. The fuzzy matching extension is based on Wu-Manber's technique.
License
MIT
On this page
Contributors
MIT License
Created February 6, 2026
Updated February 6, 2026