GitHunt
FU

fuwasegu/moonbitap

A Bitap algorithm library written in MoonBit

moonbitap

Bitap (Shift-Or) algorithm implementation for MoonBit with full Unicode support.

Features

  • Fuzzy string matching with configurable edit distance (insertions, deletions, substitutions)
  • Full Unicode support - works with ASCII, CJK characters, emoji, and mixed text
  • Pattern precompilation for efficient repeated searches
  • Zero dependencies (only uses MoonBit core library)

Installation

moon add fuwasegu/moonbitap

Usage

// One-off fuzzy containment check
let found = @moonbitap.fuzzy_contains("hello", "hallo world", max_distance=1)
// true (1 substitution: e -> a)
// Compile pattern once
let pattern = @moonbitap.compile("東京").unwrap()

// Search multiple texts
@moonbitap.contains(pattern, "東京都", max_distance=0)     // true (exact)
@moonbitap.contains(pattern, "東宮", max_distance=1)       // true (1 substitution)
@moonbitap.contains(pattern, "大阪", max_distance=1)       // false

Find Match Position

let pattern = @moonbitap.compile("world").unwrap()
let pos = @moonbitap.find(pattern, "hello world")
// Some(10) - ending position of match

Find All Matches with Distances

let pattern = @moonbitap.compile("test").unwrap()
let matches = @moonbitap.find_all(pattern, "testing tost text", max_distance=1)
// Returns array of FuzzyMatch { position, distance }

Get Best Match

let pattern = @moonbitap.compile("color").unwrap()
let best = @moonbitap.best_match(pattern, "the colour of colors", max_distance=2)
// Returns the match with minimum edit distance

API Reference

Function Description
compile(pattern: String) -> BitapPattern? Compile pattern (max 64 chars)
find(pattern, text) -> Int? Find exact match position
contains(pattern, text, ~max_distance) -> Bool Check fuzzy containment
find_all(pattern, text, ~max_distance) -> Array[FuzzyMatch] Find all matches
best_match(pattern, text, ~max_distance) -> FuzzyMatch? Get closest match
fuzzy_contains(pattern, text, ~max_distance) -> Bool One-off fuzzy check

Limitations

  • Pattern length must be ≤ 64 characters (due to 64-bit bitmask implementation)
  • For longer patterns, consider splitting or using alternative algorithms

Algorithm

This library implements the Bitap algorithm (also known as Shift-Or or Shift-And), which uses bitwise operations for efficient approximate string matching. The fuzzy matching extension is based on Wu-Manber's technique.

License

MIT