GitHunt

Ramses Alexander Coraspe Valdez

Wittline

Learning and working on data engineering projects

Languages

Python 50%, HTML 19%, Jupyter Notebook 19%, VBA 4%, C# 4%, JavaScript 4%

Top Repositories

Repositories: 62
Wittline/docker-livy

Dockerizing and Consuming an Apache Livy environment

HTML | 139 | Updated 2 months ago
apache-livy, apache-spark, big-data, docker, docker-compose, dockerized, dockerizing, postgresql
Wittline/uber-expenses-tracking

The goal of this project is to track Uber Rides and Uber Eats expenses through data engineering processes, using technologies such as Apache Airflow, AWS Redshift and Power BI.

Jupyter Notebook | 12336 | Updated 2 months ago
airflow-docker, apache-airflow, aws, aws-redshift, data-engineering, data-modeling, etl-pipeline, expenses-dashboard, expenses-tracker, power-bi, python, uber, uber-data, uber-eats
Wittline/optimizing-public-transportation

A streaming event pipeline built around Apache Kafka and its ecosystem. Using public data from the Chicago Transit Authority, we construct an event pipeline around Kafka that simulates and displays the status of train lines in real time.

Python | 21 | Updated 2 months ago
kafka, stream-processing, streaming, udacity-nanodegree
Wittline/apache-spark-docker

Dockerizing an Apache Spark Standalone Cluster

VBA | 4227 | Updated 2 months ago
apache-spark, dataengineer, dataengineering, docker, docker-compose, hadoop-cluster, hadoop-docker, hdfs, hive, hive-metastore, hue, pyspark
Wittline/csv-schema-inference

A tool to automatically infer column data types in .csv files

Jupyter Notebook | 374 | Updated 3 months ago
big-data, csv, csv-files, inference, large-csv, large-files, parallel-programming, schema-inference
Wittline/recommendation-system

Build a Content-Based Movie Recommender System (TF-IDF, BM25, BERT)

Python | 132 | Updated 5 months ago
bert, bm25, nlp, python, recommender-system, recsys, text-analysis, tf-idf, word2vec
Wittline/D3JS-Dashboard

Building a responsive dashboard with D3.js and ASP.NET MVC from scratch (SQL Server, SSIS, REST API)

C# | 144 | Updated 5 months ago
api-rest, asp-net-mvc, d3-visualization, d3js, datavisualization, javascript, mvc-pattern, responsive-design, sqlserver, ssis
Wittline/tf-idf

Term Frequency-Inverse Document Frequency from Scratch

Python | 146 | Updated 7 months ago
feature-engineering, python, text-analytics, tfidf
Wittline/text-analysis-speeches-amlo

Text analysis of the speeches, conferences and interviews of the current president of Mexico

Jupyter Notebook | 93 | Updated 7 months ago
mexican, mexico, mexico-datos, mexico-estados, nlp, political-science, politics, president, presidential-candidates, python, speech, speech-synthesis, text-analysis, word-embeddings
Wittline/data-engineer-challenge

Challenge Data Engineer

Python | 258 | Updated 8 months ago
data-engineering, data-pipeline, dataengineering, docker, docker-compose, fastapi, postgresql
Wittline/wbz

A parallel implementation of the bzip2 data compressor in Python. This data compression pipeline uses algorithms such as the Burrows–Wheeler transform (BWT) and move-to-front (MTF) to improve the Huffman compression. For now, the tool focuses only on compressing .csv files and other files in tabular format.

Python | 133 | Updated 8 months ago
big-data, bigdata, burrows-wheeler-transform, csv, csv-files, data-compression, data-engineering, huffman-coding, huffman-compression-algorithm, move-to-front, tabular-data
Wittline/pyspark-on-aws-emr

The goal of this project is to offer an AWS EMR template, using Spot Fleet and On-Demand Instances, that you can put to use quickly so you can focus on writing PySpark code.

Python | 2813 | Updated 8 months ago
aws, aws-emr, big-data, big-data-analytics, dataengineering, ec2-spot, ec2-spot-instances, emr-cluster, pyspark, python, spark, wordcloud-generator
Wittline/Wittline

Take a look at my repository

32 | Updated 10 months ago
Wittline/pyDag

Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag

Python | 233 | Updated 11 months ago
big-data, bigquery, cloud, dag, data-engineering, data-pipeline, dataengineering, dataproc, dataproc-cluster, directed-acyclic-graph, google-cloud, google-cloud-platform, parallel-processing, task-scheduler, task-scheduling, workflow-engine
Wittline/ATD

ATD = API to Datalake

Python | 00 | Updated 11 months ago
Wittline/data-engineering-challenge-th

Dockerizing a Python script for web scraping and consuming the scraped data using FastAPI (www.metroscubicos.com)

Python | 152 | Updated 11 months ago
data-engineering, docker, fastapi, python, scraping, sqlite
Wittline/distance-metrics

Distance metrics are among the most important parts of some machine learning algorithms, both supervised and unsupervised. They help us calculate and measure similarities between numerical values expressed as data points.

Jupyter Notebook | 52 | Updated 1 year ago
distance-metrics, machine-learning, python, similarity-measures
Wittline/Dropout-Students-Prediction

The goal of this project is to identify students at risk of dropping out of school

HTML | 2220 | Updated 1 year ago
genetic-algorithm, k-means, k-means-clustering, neural-networks, r
Wittline/Huffman-decoding

A New Approach for Efficient Sequential Decoding of Static Huffman Codes

HTML | 61 | Updated 1 year ago
burrows-wheeler-transform, compression, data-compression, huffman-coding, huffman-decoder, move-to-front
Wittline/Contextual-Data-Transforms

This repository contains the most important contextual data transformation algorithms, which help improve the compression rate achieved by statistical encoders. Ramses Alexander Coraspe Valdez

HTML | 31 | Updated 1 year ago
burrows-wheeler-transform, delta-encoding, move-to-front, run-length-encoding, xor-encoder
Wittline/tuboleta

tuboleta.mx

JavaScript | 00 | Updated 1 year ago
Wittline/csv-shuffler

A tool to automatically shuffle lines in .csv files

Python | 40 | Updated 1 year ago
big-data, csv, csv-files, data-engineering, large-files, shuffle
Wittline/csv-estimate-rows

No description provided.

Python | 40 | Updated 2 years ago
Wittline/data-eng-frubana

No description provided.

Python | 00 | Updated 2 years ago
Wittline/Moving-Average-Spark

How to Compute Moving Average with Spark

51 | Updated 2 years ago
databricks, hadoop, moving-average, spark
Wittline/dataengineering-assignment

Prescreening Tasks for Data Engineer

Jupyter Notebook | 61 | Updated 2 years ago
dataengineering, docker, jupyter-notebook, postgresql
Wittline/bulk_json_sqlite

Efficiently Bulk Import a Large JSON File into SQLite

00 | Updated 2 years ago
Wittline/Data-Quality (Fork)

Data Quality

00 | Updated 2 years ago
Wittline/RESTful-APIs-Nodejs

Building fast, scalable and secure RESTful services with Node.js, Express and MongoDB

HTML | 33 | Updated 2 years ago
coverage-report, express-js, mocha-chai, mongodb, mongoose, nodejs, rest-api, unittest
Wittline/fastapi-jwt

JWT with FastAPI

Python | 00 | Updated 2 years ago
