Ramses Alexander Coraspe Valdez
Wittline
Learning and working on data engineering projects
Repositories
Dockerizing and Consuming an Apache Livy environment
The goal of this project is to track the expenses of Uber Rides and Uber Eats through data engineering processes using technologies such as Apache Airflow, AWS Redshift, and Power BI.
Streaming event pipeline around Apache Kafka and its ecosystem. Using public data from the Chicago Transit Authority, we construct an event pipeline around Kafka that lets us simulate and display the status of train lines in real time.
Dockerizing an Apache Spark Standalone Cluster
A tool to automatically infer columns data types in .csv files
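The type-inference idea behind such a tool can be sketched as a cascade of progressively stricter parses per column (a minimal illustration under assumed behavior, not the repository's actual code; all function names here are hypothetical):

```python
import csv
import io

def infer_type(values):
    """Return 'int', 'float', or 'str' for a list of string values."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False
    # Try the strictest type first, then fall back.
    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "str"

def infer_csv_types(text):
    """Map each header in a CSV string to its inferred column type."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = list(zip(*data))  # transpose rows into columns
    return {h: infer_type(col) for h, col in zip(header, columns)}
```

A real tool would also need to handle empty cells, dates, and sampling for very large files.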
Build a Content-Based Movie Recommender System (TF-IDF, BM25, BERT)
Building a Responsive Dashboard with D3.js and ASP.NET MVC from scratch (SQL Server - SSIS - REST API)
Term Frequency-Inverse Document Frequency from Scratch
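The from-scratch computation this repository describes boils down to a term-frequency count scaled by an inverse-document-frequency penalty; a minimal sketch (using the plain `log(N/df)` variant, one of several common IDF formulations):

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF weights for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({
            # TF = relative frequency in this doc; IDF = log(N / df).
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights
```

Note that a term appearing in every document gets weight 0, which is exactly the point: it carries no discriminating information.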
Text analysis of the speeches, conferences and interviews of the current president of Mexico
Challenge Data Engineer
A parallel implementation of the bzip2 data compressor in Python. This data compression pipeline uses algorithms such as the Burrows–Wheeler transform (BWT) and Move-to-Front (MTF) to improve Huffman compression. For now, the tool focuses on compressing .csv files and other tabular-format files.
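The two preprocessing stages named above can be sketched briefly (a naive O(n² log n) BWT for illustration; production implementations use suffix arrays, and the sentinel character is assumed absent from the input):

```python
def bwt(s):
    """Burrows-Wheeler transform: last column of sorted rotations."""
    s = s + "\0"  # sentinel terminator, assumed not to occur in s
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def mtf(s, alphabet):
    """Move-to-front: recently seen symbols encode to small indices,
    so BWT's runs of repeated characters become runs of zeros."""
    table = list(alphabet)
    out = []
    for ch in s:
        i = table.index(ch)
        out.append(i)
        table.insert(0, table.pop(i))  # move symbol to the front
    return out
```

BWT groups similar contexts together and MTF turns that locality into a skewed symbol distribution, which is what lets the final Huffman stage compress better.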
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly; just focus on writing PySpark code.
Take a look at my repository
Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
ATD = Api To Datalake
Dockerizing a Python script for web scraping and consuming the scraped data using FastAPI (www.metroscubicos.com)
Distance metrics are one of the most important parts of many machine learning algorithms, both supervised and unsupervised; they help us calculate and measure similarities between numerical values expressed as data points.
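Two of the most common such metrics can be written in a few lines each (a minimal sketch of standard definitions, not this repository's code):

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cosine_similarity(p, q):
    """Cosine of the angle between two vectors: 1.0 means the same
    direction, 0.0 means orthogonal (no similarity)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm
```

Euclidean distance cares about magnitude, cosine similarity only about direction; which one fits depends on whether the scale of the data points is meaningful.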
The goal of this project is to identify students at risk of dropping out of school
A New Approach for Efficient Sequential Decoding of Static Huffman Codes
This repository contains the most important contextual data transformation algorithms, which help improve the compression rate achieved by statistical encoders.
tuboleta.mx
A tool to automatically Shuffle lines in .csv files
No description provided.
No description provided.
How to Compute Moving Average with Spark
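In Spark this kind of trailing average is usually expressed with a window function (e.g. `avg` over a `Window.rowsBetween` frame); the underlying sliding-window idea, sketched here in plain Python rather than PySpark:

```python
from collections import deque

def moving_average(values, window):
    """Yield the trailing moving average over a fixed-size window.
    Early positions average over however many values have been seen."""
    buf = deque(maxlen=window)  # deque drops the oldest value automatically
    for v in values:
        buf.append(v)
        yield sum(buf) / len(buf)
```

The `deque(maxlen=...)` buffer plays the role of the Spark window frame: each output row sees only the current value and the previous `window - 1` values.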
Prescreening Tasks for Data Engineer
Efficiently Bulk Import a Large JSON File into SQLite
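The core trick for fast bulk loading into SQLite is batching all inserts into a single transaction with `executemany` instead of committing row by row; a minimal sketch (the table schema and the in-memory `json.loads` are illustrative assumptions; a truly large file would call for a streaming JSON parser):

```python
import json
import sqlite3

def bulk_import(db_path, json_text):
    """Load a JSON array of objects into SQLite in one transaction."""
    records = json.loads(json_text)  # assumes the array fits in memory
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER, name TEXT)")
    with conn:  # one transaction: far faster than a commit per row
        conn.executemany(
            "INSERT INTO items (id, name) VALUES (:id, :name)", records
        )
    return conn
```

The `with conn:` block commits once at the end (or rolls back on error), which avoids the per-row fsync cost that makes naive imports slow.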
Data Quality
Building fast, scalable and secure RESTful services with Node, Express and MongoDB
JWT authentication with FastAPI
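In a FastAPI app, JWTs are typically issued at login and checked in a dependency on protected routes (real projects usually use a library such as PyJWT or python-jose); the HS256 signing mechanism itself can be sketched with the standard library alone:

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> bytes:
    """Base64url without padding, as the JWT spec requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign_jwt(payload: dict, secret: str) -> str:
    """Create an HS256 JWT: header.payload.signature."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    signing_input = header + b"." + body
    sig = _b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return (signing_input + b"." + sig).decode()

def verify_jwt(token: str, secret: str) -> dict:
    """Check the signature and return the decoded payload."""
    signing_input, _, sig = token.rpartition(".")
    expected = _b64url(
        hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    ).decode()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        raise ValueError("bad signature")
    body = signing_input.split(".")[1]
    padded = body + "=" * (-len(body) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))
```

A production setup would also validate registered claims such as `exp` and wire `verify_jwt` into a FastAPI `Depends` on each protected endpoint.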