174 results for “topic:tf-idf-vectorizer”
Tunable full-text search engine in JavaScript that: (1) works natively with web apps like Express.js; (2) is easy to customize (via BM25) for specific types of documents (e.g. tweets, scientific journals); (3) is deployable on either the client side or the server side.
NLP-based classification model that predicts a person's personality type as one of the 16 Myers-Briggs personality types. An extremely challenging project dealing with the correlation between human psychology and casual writing styles, and with heavily imbalanced classes. Check the app here: https://mb-predictor-motetuzs5q-uc.a.run.app/
Turkish News Category Classification Tutorial
This repository contains the code for a basic e-commerce recommendation engine, built using TF-IDF and cosine similarity.
Skincare recommendation Android application that uses a dataset from Kaggle and data scraped from cosmetics websites to drive a TF-IDF vectorizer for content-based filtering, and KNN and decision trees for collaborative filtering. The notebook also contains other proof-of-concept approaches, including SVD. The backend APIs are based on Flask and the Android application is written in Java with Android Studio, while Firebase acts as the database and as the middleware that relays login information and serves data to the application.
A simple Django-based resume ranker website where recruiters post jobs and candidates apply for their desired vacancies. The system computes the document similarity between the job description and the candidate resumes, generates similarity scores using a KNN model, and ranks or shortlists the candidate resumes.
Twitter Sentiment Analysis Using InSet (Indonesia Sentiment Lexicon) and Random Forest Classifier
The document classification solution should significantly reduce the manual human effort in the HRM. It should achieve a higher level of accuracy and automation with minimal human intervention.
Extractive summarization of medical transcriptions
Recipe Genie is a recipe recommendation system that recommends recipes to users based on the ingredients they have at home.
NLP tutorials and guidelines to learn efficiently
Text Classification for Sentiment Analysis using Female Daily's Reviews Dataset
Extract textual information from Amazon product reviews and draw correlations through regression and fluctuation analysis.
Large-scale benchmarking of state-of-the-art text vectorizers
Retrieve information from text documents with a TF-IDF model and dimensionality reduction with Latent Semantic Indexing (LSI).
Content-based movie recommendation system using TF-IDF and cosine similarity. The data was extracted, transformed, and analyzed to train the model, which is made available together with the cleaned data for future queries through a deployment with FastAPI and Render.
Twitter Sentiment Analysis Using Vader Lexicon and Random Forest Classifier
AI Text Detection Web App identifies whether text is AI-generated or human-written. It offers unigram and bigram models, combining Logistic Regression, Naive Bayes, Random Forest, and LightGBM to provide accurate predictions based on text structure and context.
Course Project of Information Retrieval.
Scraped tweets using the Twitter API (for the keyword ‘Netflix’) on an AWS EC2 instance and ingested the data into S3 via Kinesis Firehose. Used Spark ML on Databricks to build a pipeline for a sentiment classification model, and Athena and QuickSight to build a dashboard.
A web application that detects aggression and misogyny in text using BERT augmentation, sentiment analysis, XGBoost, TF-IDF vectorization, and LIME explainability. [Paper accepted at ICON 2021]
A recommendation system for books, built with two filtering methods: collaborative filtering and content-based filtering. The algorithms used are KNN, Pearson correlation, and TF-IDF. Every dataset used can be found in the data folder of the repository.
Compilation of Information Retrieval codes.
Implementation of a search engine using a vector space model.
Application of Machine Learning Techniques for Text Classification and Topic Modelling on CrisisLexT26 dataset.
This repo illustrates the use of NLP techniques in legal analytics. The contributors used the facts of Supreme Court of the United States cases and their corresponding issue areas to predict the outcome of a case. After training an LSTM neural network, they deployed the model in a Streamlit app.
Spam classifier built as my end-of-semester project for an Intro to AI class. We were a group of four people. I worked on all the Naive Bayes models.
This repository contains a product ID mapping solution using a TF-IDF vectorizer for weighted text vectors, Facebook AI Similarity Search (FAISS) for coarse filtering with cosine similarity, and Levenshtein distance for refined matching against the Blinkit catalog. Achieved an 11.45% match for Zepto and 11.48% for Instamart.
The objective of this capstone project is to use Natural Language Processing (NLP) to create a machine-learning model that predicts the quality of questions posted on Stack Overflow, a popular question-and-answer platform for software developers.
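Several of the projects listed above describe content-based recommendation with TF-IDF and cosine similarity. The following is a minimal, dependency-free sketch of that general idea, not the code of any listed repository. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `recommend`), the whitespace tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration; real projects typically rely on a library vectorizer instead.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase/whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vec = {}
        for term, count in counts.items():
            tf = count / total
            # Smoothed inverse document frequency; one common variant.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(docs, query_index, top_k=2):
    """Return indices of the top_k documents most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scores.sort(reverse=True)
    return [i for _, i in scores[:top_k]]
```

For example, with the hypothetical catalog `["red cotton shirt", "blue cotton shirt", "stainless steel kettle"]`, calling `recommend(docs, 0, top_k=1)` ranks the second item first, because it shares the terms "cotton" and "shirt" with the query item while the kettle shares none.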