160 results for “topic:bpe”
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Unsupervised text tokenizer focused on computational efficiency
The fastest JavaScript BPE Tokenizer Encoder Decoder for OpenAI's GPT models (gpt-5, gpt-o*, gpt-4o, etc.). Port of OpenAI's tiktoken with additional features.
Ready-made tokenizer library for working with GPT and tiktoken
Fast and customizable text tokenization library with BPE and SentencePiece support
Train a language model to chat like you using your personal conversations from WhatsApp, Telegram, Signal, or other platforms.
Explains NLP building blocks in a simple manner.
Byte Pair Encoding for Python!
nfelib - Python bindings for reading and generating XML for NF-e, national NFS-e, CT-e, MDF-e, BP-e
Fast bare-bones BPE for modern tokenizer training
Build LLM from scratch
Syllable-aware BPE tokenizer for the Amharic language (አማርኛ) – fast, accurate, trainable.
Go BPE tokenizer (Encoder+Decoder) for GPT2 and GPT3
Machine Learning for Phishing Website Detection
Subword Encoding in Lattice LSTM for Chinese Word Segmentation
Simple-to-use scoring function for arbitrarily tokenized texts.
Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.
Kotlin multiplatform BPE tokenizer library for OpenAI models
Low-level implementation of BBPE
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
GPT3 encoder & decoder tool written in Swift
Geometric Byte Pair Encoding of Protein Structure (ICLR 2026)
High performance unsupervised text tokenization for Ruby
✂️ OpenAI's tiktoken tokenizer written in Go
Sentiment-based classification for stock article titles using PhoBERT
Learning BPE embeddings by first learning a segmentation model and then training word2vec
Byte-Pair Encoding (BPE) (subword-based tokenization) algorithm implementations from scratch in Python
Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization [arXiv 2025]
Byte-level byte pair encoding (BPE) in Haskell
Code for the paper "BPE stays on SCRIPT"