36 results for “topic:cnn-rnn”
A minimum unofficial implementation of the "A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement" (CRN) using PyTorch
This repositary contain all my exercises and projects of Udacity Computer Vision Nanodegree Program
For sequence-to-sequence beginners. PyTorch-implemented 1DCNN, LSTM, Attention, and Transformers.
Contains additional materials for two keras.io blog posts.
16 projects in the framework of Computer Vision algorithms: 16 projects in the framework of Computer Vision algorithms: CNN, RNN, LSTM, Facial KeyPoints, Image Captioning, SLAM, Edge Detectors, Day Night Classifier, etc.
This project aims to assist visually impaired individuals by providing a solution to convert images into spoken language. Leveraging deep learning and natural language processing, the system processes images, generates descriptive captions, and converts these captions into audio output.
Caption generation through a CNN-RNN model further to be converted to speech using a text to speech library for a visually challenged person for understanding the content of an image in the form of speech.
A hybrid CNN-RNN model trained on the DEAM dataset for music emotion recognition using the valence-arousal model. Useful for tagging emotional content of music in media or recommendation systems
Tuning, training, and transfer learning CRNN models for handwritten text words.
This repository contains my end-to-end hands-on Deep Learning learning journey, where I implemented core deep learning concepts from scratch and using Keras/TensorFlow. Each notebook focuses on a single concept, building intuition through experiments, visualizations, and practical implementations.
A WebApp that Generates Caption for Images using CNN-RNN Architecture
This project builds a video classification model using CNNs for spatial feature extraction and RNNs for temporal sequence modeling. Utilizing the UCF101 dataset, it covers data preprocessing, feature extraction, model training, and evaluation, providing a comprehensive approach to action recognition in videos.
Transform TV control with Gesture Recognition! Enable intuitive interaction with smart TVs using gestures built using Conv3D, CNN & RNN
Image Captioning using CNN-RNN architecture made with :heart: in Pytorch. Do :star2: he repo if you find it useful :rocket:
To develop gesture recognition feature for smart TV which help user control the TV without using remote.
Built a CNN-RNN neural network architecture to automatically generate captions from images describing that image.
A collection of Sentiment Analysis models developed using PyTorch
A Deep Learning-based approach to classify human gestures for smart appliances.
scene text detection
Explore CNN & RNN based model for lane-keeping assistance
(50.040) Natural Language Processing
The project aimed to push image captioning technology forward by combining recent advances in image recognition and language modeling to generate novel, descriptive captions that go beyond just naming objects and actions
Develop a cool feature in the smart-TV that can recognise five different gestures performed by the user which will help users control the TV without using a remote.
A neural network architecture that automatically generate captions from images.
AGENT is a mobile app I developed as a requirement for completing my special problem in UPLB
This GitHub repository contains the implementation of a deep learning model capable of generating captions for images in the form of speech.
This project showcase the usage of Neural Network algorithms to develop a feature for smart TVs. The implementation is able to detect 5 different gestures performed by the user, allowing them to control the TV without a remote.
Deep learning model to recognize gestures
Image to text to speech generation
We want to classify songs to their music genre by using spectograms, i.e. plots showing frequencies in a sequential way. To deal with that, we started by using CNN, thus treating spectograms as standard images. We managed to include also the sequential information of frequencies by stacking RNN and CNN layers.