Ruan Chaves
ruanchaves
Senior AI Engineer with 5+ years of experience delivering real-world solutions using Generative AI, LLMs, and NLP.
Repos
76
Stars
175
Forks
10
Top Language
Python
Top Repositories
Accurate word segmentation for hashtags and text, powered by Transformers and Beam Search. A scalable alternative to heuristic splitters and massive LLMs.
The Natural Portuguese Language Benchmark (Napolab). Stay up to date with the latest advancements in Portuguese language models and their performance across carefully curated Portuguese language tasks.
Supporting code for the paper "Portuguese Language Models and Word Embeddings: Evaluating on Semantic Similarity Tasks".
Telegram bot that recommends songs as YouTube playlists through gensim's word2vec
image downloader plugin for reddit-html-archiver
Zero-shot Entity Linking with blitz start in 3 minutes. Hard negative mining and encoder for all entities are also included in this implementation.
Repositories
76
No description provided.
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
Accurate word segmentation for hashtags and text, powered by Transformers and Beam Search. A scalable alternative to heuristic splitters and massive LLMs.
No description provided.
The Natural Portuguese Language Benchmark (Napolab). Stay up to date with the latest advancements in Portuguese language models and their performance across carefully curated Portuguese language tasks.
💫 Industrial-strength Natural Language Processing (NLP) in Python
No description provided.
No description provided.
Telegram bot that recommends songs as YouTube playlists through gensim's word2vec
A medical question-answering system that can effectively answer user queries related to medical diseases.
image downloader plugin for reddit-html-archiver
No description provided.
Utility for analyzing Transformer based representations of language.
✨ GitHub repository for my website
No description provided.
Use the state-of-the-art m2m100 to translate large data on CPU/GPU/TPU. Super Easy!
No description provided.
No description provided.
Visualization of the evolution of the Lacoste brand on Twitter in 2018
No description provided.
HateBR is the first large-scale, expert-annotated corpus of Brazilian Instagram comments for hate speech and offensive language detection on the web and social media.
Supporting code for the paper "Portuguese Language Models and Word Embeddings: Evaluating on Semantic Similarity Tasks".
PorSimplesSent - A Portuguese corpus of aligned sentence pairs to investigate sentence readability assessment
Zero-shot Entity Linking with blitz start in 3 minutes. Hard negative mining and encoder for all entities are also included in this implementation.
✨ Python framework for data-centric NLP
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks
Code to extract Reddit comments and submissions from Pushshift dumps based on keywords.
No description provided.
Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings