34 results for “topic:code-mixing”
A curated list of research papers and resources on code-switching
This tool helps automatically generate grammatically valid synthetic code-mixed data by utilizing linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.
This code provides a word-level language identification tool that identifies the language of individual words in code-mixed text, e.g. text that mixes words from two languages, such as Hindi written in Roman script and English.
A pipeline for transliteration, spell correction, POS tagging, and word sense disambiguation of Hinglish code-mixed data to Hindi Devanagari script.
💬 MaskLID: Code-Switching Language Identification through Iterative Masking -- ACL 2024
A curated list of resources dedicated to Code-mixed Natural Language Processing (NLP).
Code for the paper "Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots" (NAACL-HLT 2021)
Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.
Repository containing code for abusive tweet detection, location detection, and gender detection
Jopara (Guarani-dominant mixed with Spanish) sentiment analysis corpus
This repo contains the source code of HIT: A Hierarchically Fused Deep Attention Network for Robust Code-mixed Language Representation (accepted at ACL 2021)
A word-level Language Identification (LID) tool for Tagalog-English (Taglish) text
A discussion of the evolution of code-mixing algorithms
Indonesian-English code-mixed Twitter dataset
Psycholinguistic Analysis of Code Mixing - Speech and Natural Language Processing Term Project: CS60057. Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur
A language detection model for code-switched texts in es/en/zh
This is a machine learning project focused on analysing and classifying sentiments in code-switched and code-mixed text, specifically targeting the unique linguistic characteristics found in Malaysian conversations.
CodeMixQA is a benchmark with high-quality human annotations, comprising 16 diverse parallel code-switched language-pair variants that span multiple geographic regions and code-switching patterns, and include both original scripts and their transliterated forms.
Handling out-of-vocabulary (OOV) words in Bahasa Rojak (Malaysian code-mixed language) and performing sentiment analysis with XLM-R fine-tuned for the downstream task
This repository implements a Multilingual BERT (mBERT) model for performing Part-of-Speech (POS) tagging on Assamese-English code-mixed texts.
300-Person-Mandarin-Chinese-and-English-Bilingual-Spontaneous-Monologue-smartphone
A Centralized Frenglish Benchmark from Naturally Occurring Code-Switching and Code-Mixing
This repository implements a Long Short-Term Memory (LSTM) network for performing Part-of-Speech (POS) tagging on Assamese-English code-mixed texts.
This repository implements a Conditional Random Field (CRF) model for performing Part-of-Speech (POS) tagging on Assamese-English code-mixed texts.
This repository implements a Bidirectional Long Short-Term Memory (BiLSTM) network for performing Part-of-Speech (POS) tagging on Assamese-English code-mixed texts.
The official code for the "True Bilingual NMT" paper
Hindi-English code-mixed text classification using TF-IDF + Logistic Regression and BERT fine-tuning