67 results for “topic:code-switching”
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
A curated list of research papers and resources on code-switching
This tool helps automatically generate grammatically valid synthetic code-mixed data using linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.
This code provides a word-level language identification tool that identifies the language of individual words in code-mixed text, e.g. text that mixes Hindi written in Roman script with English.
Implementation of meta-transfer-learning for ASR and LM (ACL 2020)
CodeSwitch is an NLP tool for language identification, POS tagging, named entity recognition, and sentiment analysis of code-mixed data.
Natural Language Processing
Multilingual Meta-Embeddings for Named Entity Recognition (RepL4NLP & EMNLP 2019)
PyTorch implementation of CS-Tacotron, an end-to-end generative TTS model for code-switching speech synthesis.
💬 MaskLID: Code-Switching Language Identification through Iterative Masking -- ACL 2024
A curated list of resources dedicated to Code-mixed Natural Language Processing (NLP).
Hierarchical Korean-English Code-Switching Speech Recognition Benchmark (EACL Findings 2026, to appear)
Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning (CALCS 2018, ACL)
Code-switching patterns can be an effective route to improve performance of downstream NLP applications: A case study of humour, sarcasm and hate speech detection
Unsupervised Sentiment Analysis for Code-mixed Data
Implementation of a deep learning model (BiLSTM) to detect code-switching
[EMNLP 2023] Official repository for the paper "Detecting Propaganda Techniques in Code-Switched Social Media Text"
Code repository for the ACL 2020 paper "Multi-label and Multilingual News Framing Analysis"
Repository containing code for abusive tweet detection, location detection, and gender detection
POSIT aims to segment and tag mixed text that contains English and C-like code, so that the user knows both what each token is and what role (such as an AST tag or PoS tag) it serves within the language it is used in.
Jopara (Guarani-dominant mixed with Spanish) sentiment analysis corpus
A sequence tagging model with active learning
CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning
A package for determining the matrix language in bilingual sentences
Point of Interest Error Rate (PIER) Metric for Code-Switching ASR: A specialized evaluation metric designed to focus on critical points in multilingual speech recognition, providing a more accurate analysis of code-switched utterances.
Code for "CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units" (Accepted at ACL-SRW 2024) 🇹🇭
A socket script to obtain a Chinese phone sequence for any English word
Official repository for the paper titled "From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text" accepted at ACL 2021
Chrome extension for translating highlighted English text into Chinglish (a Chinese + English hybrid)
This repository contains crowdsourced universal part-of-speech tags for the Miami Bangor corpus.