Kavita Ganesan
kavgan
Author of The Business Case For AI. Chief AI Strategist & Architect. Ph.D. in CS. www.kavita-ganesan.com
Top Repositories
Starter code for solving real-world text data problems. Includes Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings, and more.
ROUGE automatic summarization evaluation toolkit. Support for ROUGE-[N, L, S, SU], stemming and stopwords in different languages, Unicode text evaluation, and CSV output.
Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary size. Can be used in languages other than English.
This repo contains the code and dataset for the Opinosis Summarization Framework.
Python word cloud library for use within Jupyter notebook and Python apps.
OpinRank Dataset. Contains user reviews of cars and hotels: full reviews from Tripadvisor (~259,000 reviews) and Edmunds (~42,230 reviews).
Repositories (63)
Starter code for solving real-world text data problems. Includes Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings, and more.
ROUGE automatic summarization evaluation toolkit. Support for ROUGE-[N, L, S, SU], stemming and stopwords in different languages, Unicode text evaluation, and CSV output.
This repo contains the code and dataset for the Opinosis Summarization Framework.
Examples of code in Spark
Practice practice practice. Bubble sort, factorial, powerset, subarray, mergesort, remove duplicates, etc.
Discovering Related Clinical Concepts using Large Amounts of Clinical Notes. An unsupervised graphical approach to mine related concepts by leveraging the volume within large amounts of clinical notes.
Python word cloud library for use within Jupyter notebook and Python apps.
Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary size. Can be used in languages other than English.
Dynamic, browser-based visualization library
OpinRank Dataset. Contains user reviews of cars and hotels: full reviews from Tripadvisor (~259,000 reviews) and Edmunds (~42,230 reviews).
Stop word lists
Test hashtags
The most popular HTML, CSS, and JavaScript framework for developing responsive, mobile-first projects on the web.
Working examples in Python
Curated List of Blog Posts From Opinosis Analytics
Dataset for Micropinion Generation. The dataset is based on user reviews from CNET covering products in various categories, such as TVs, cell phones, and GPS devices.
A curated list of data science blogs
A Cluster Computing System for Processing Large-Scale Spatial Data
CLI HTTP client, user-friendly curl replacement with intuitive UI, JSON support, syntax highlighting, wget-like downloads, extensions, etc.
Experiments using neural networks in Java
Test Electron apps using ChromeDriver
APIs for clustering sentences, extracting topics, counting words & n-grams, extracting text from HTML or a URL, computing similarity between texts, and more.
Importing/exporting functionality for the Redshift data warehouse
Minimal example of sentence embedding using the Smooth Inverse Frequency (SIF) weighting scheme
Super simple NLP tools. Cluster sentences, get multiple text similarity measures including cosine, Jaccard, and Dice, generate topics, extract text from HTML, and more.
Test repo
Explore the Electron APIs
website images
Utility tools to prepare and evaluate ROUGE scores. Includes a Perl script that converts the output of the Perl ROUGE implementation to CSV.
Linguistic Understanding of Complaints and Praises in User Reviews. A paper that goes beyond positive and negative sentiment categories: complaints and praises have properties that differ from those of positives and negatives.