GitHunt

Fei Han

hanfei1986

Senior AI Engineer @ Microsoft | Ex-Amazon | Ex-MIT | Machine Learning and Generative AI

Seattle

Languages: Jupyter Notebook 90%, Python 10%

Repos: 46
Stars: 14
Forks: 1
Top Language: Jupyter Notebook


Top Repositories

hanfei1986/Impute-missing-data-with-XGBoost

When a significant amount of data in highly important features is missing, what can we do? Impute the missing values with the mean or median? In this Jupyter notebook, I demonstrate embedding an XGBoost model in the data transformer to perform the imputation.

Jupyter Notebook | 4 stars | 0 forks | Updated 2 years ago
Topics: data-imputation, feature-engineering, machine-learning, xgboost
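The fit/transform pattern behind model-based imputation can be sketched without third-party libraries. This is a hypothetical stand-in, not the notebook's code: a least-squares line on one predictor column replaces the XGBoost model, and the class name `RegressionImputer` is illustrative. The model is trained only on rows where both columns are observed, then predicts the missing target values.

```python
import math


class RegressionImputer:
    """Hypothetical model-based imputer for one column.

    A least-squares line on a single predictor column stands in for the
    XGBoost model used in the notebook. Missing values are float('nan').
    """

    def __init__(self, target_col, predictor_col):
        self.target_col = target_col
        self.predictor_col = predictor_col
        self.slope = 0.0
        self.intercept = 0.0

    def fit(self, rows):
        # Train only on rows where both the predictor and the target are observed.
        pairs = [
            (r[self.predictor_col], r[self.target_col])
            for r in rows
            if not math.isnan(r[self.predictor_col]) and not math.isnan(r[self.target_col])
        ]
        mean_x = sum(x for x, _ in pairs) / len(pairs)
        mean_y = sum(y for _, y in pairs) / len(pairs)
        sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = mean_y - self.slope * mean_x
        return self

    def transform(self, rows):
        # Copy the rows and fill missing targets with the model's predictions.
        out = []
        for r in rows:
            r = list(r)
            if math.isnan(r[self.target_col]) and not math.isnan(r[self.predictor_col]):
                r[self.target_col] = self.intercept + self.slope * r[self.predictor_col]
            out.append(r)
        return out


nan = float("nan")
data = [[1.0, 2.0], [2.0, 4.0], [3.0, nan], [4.0, 8.0]]
filled = RegressionImputer(target_col=1, predictor_col=0).fit(data).transform(data)
print(filled)  # the missing value in the third row is predicted from column 0
```

Swapping the line for a gradient-boosted model only changes `fit` and the prediction inside `transform`; the surrounding transformer structure stays the same.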
hanfei1986/Batch-reading-of-neural-network-training-and-visualization-of-loss

When the training data is larger than memory, we can feed it to neural network training in multiple batches. This notebook demonstrates how to do this and visualizes the training and test losses.

Jupyter Notebook | 1 star | 0 forks | Updated 2 years ago
Topics: batch-reading, deep-learning, loss-functions, machine-learning, neural-network
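A minimal sketch of the batching pattern, assuming the data sits in a CSV file with columns `x,y` (a hypothetical layout). A generator yields fixed-size batches lazily, so only one batch is in memory at a time, and a one-weight linear model stands in for the neural network. It is trained with mini-batch gradient descent while the per-epoch loss is recorded for plotting. An in-memory string simulates the file here.

```python
import csv
import io
import random


def iter_batches(text_stream, batch_size):
    """Yield lists of (x, y) pairs read lazily from a CSV stream with columns x,y."""
    batch = []
    for row in csv.DictReader(text_stream):
        batch.append((float(row["x"]), float(row["y"])))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # the last batch may be smaller
        yield batch


def train(open_stream, epochs=20, batch_size=32, lr=0.1):
    """Fit y ~ w * x + b by mini-batch gradient descent; return w, b and per-epoch MSE."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        total_loss, count = 0.0, 0
        for batch in iter_batches(open_stream(), batch_size):  # re-open the data each epoch
            n = len(batch)
            residuals = [w * x + b - y for x, y in batch]
            total_loss += sum(r * r for r in residuals)
            count += n
            # Gradients of the batch mean squared error, using the pre-update parameters.
            w -= lr * sum(2 * r * x for r, (x, _) in zip(residuals, batch)) / n
            b -= lr * sum(2 * r for r in residuals) / n
        history.append(total_loss / count)
    return w, b, history


# Synthetic in-memory CSV standing in for a file on disk (y = 3x + 1 + noise).
random.seed(0)
rows = ["x,y"]
for _ in range(1000):
    x = random.random()
    rows.append(f"{x},{3 * x + 1 + random.gauss(0, 0.1)}")
csv_text = "\n".join(rows)

w, b, history = train(lambda: io.StringIO(csv_text))
print(round(w, 2), round(b, 2), history[0], history[-1])
```

With a real file, `open_stream` would be something like `lambda: open(path, newline="")`; plotting `history` gives the loss curve.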
hanfei1986/Random-forest-and-RFECV

This Jupyter notebook demonstrates Recursive Feature Elimination with Cross-Validation (RFECV) for feature selection with a random forest model.

Jupyter Notebook | 1 star | 0 forks | Updated 2 years ago
Topics: machine-learning, random-forest, rfecv
hanfei1986/Image-processing-and-optical-character-recognition-with-tesseract

This Python program preprocesses images and recognizes the characters in them (OCR) with pytesseract, handling the images in batches.

Python | 1 star | 0 forks | Updated 1 year ago
Topics: image-processing, ocr
hanfei1986/Oversampling-of-imbalanced-data-with-RandomOverSampler-SMOTE-and-ADASYN

Imbalanced data are common in the real world, especially in anomaly-detection tasks. Handling the imbalance is important for these tasks; otherwise the predictions are biased toward the majority class. RandomOverSampler, SMOTE, and ADASYN are useful oversampling tools that balance the dataset by adding minority-class samples: RandomOverSampler duplicates existing minority samples, while SMOTE and ADASYN synthesize new ones.

Jupyter Notebook | 1 star | 0 forks | Updated 2 years ago
Topics: imbalanced-data, machine-learning, over-sampling
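To make the difference between duplicating and synthesizing concrete, here is a dependency-free sketch of the first two ideas. Random oversampling copies minority rows. SMOTE-style oversampling places a new point at a random position on the segment between a minority sample and one of its `k` nearest minority neighbors. ADASYN (not shown) additionally decides how many points to generate per sample. The function names and data are hypothetical. The notebook's `RandomOverSampler`, `SMOTE`, and `ADASYN` classes are presumably the imbalanced-learn implementations, which handle many more details.

```python
import math
import random


def random_oversample(minority, n_new, rng):
    """RandomOverSampler idea: duplicate randomly chosen minority samples."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote(minority, n_new, k, rng):
    """SMOTE idea: new point on the segment between a sample and a near neighbor."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of `base`, excluding the sample itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # random position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
minority = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(10)]
n_new = len(majority) - len(minority)  # samples needed for a 1:1 balance

duplicates = random_oversample(minority, n_new, rng)
synthetic = smote(minority, n_new, k=3, rng=rng)
print(len(minority) + len(duplicates), len(minority) + len(synthetic))  # 100 100
```

Because every SMOTE point is a convex combination of two minority samples, it always lies inside the region spanned by the minority class.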
hanfei1986/Build-a-chatbot-powered-by-GPT-3.5-using-Streamlit

https://chatbot-v2.streamlit.app/

Python | 1 star | 0 forks | Updated 1 year ago
Topics: chatbot
hanfei1986/To-Python-beginners

No description provided.

Jupyter Notebook | 0 stars | 0 forks | Updated 1 year ago
hanfei1986/Estimate-the-area-of-a-region-using-a-Monte-Carlo-simulation

Monte Carlo simulation is a computational technique that uses random sampling and statistical methods to estimate the behavior of complex systems or solve problems. It is particularly useful when dealing with problems that involve a high degree of randomness or complexity.

Jupyter Notebook | 0 stars | 0 forks | Updated 2 years ago
Topics: monte-carlo-simulation
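A minimal sketch of the area estimate: sample points uniformly in a bounding box, count the fraction that falls inside the region, and scale by the box area. The unit disk is used as a hypothetical test region because its exact area, pi, is known. The function names are illustrative.

```python
import math
import random


def estimate_area(inside, x_range, y_range, n_samples, seed=0):
    """Monte Carlo area: (fraction of random points inside the region) * box area."""
    rng = random.Random(seed)
    (x0, x1), (y0, y1) = x_range, y_range
    hits = sum(
        inside(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in range(n_samples)
    )
    return hits / n_samples * (x1 - x0) * (y1 - y0)


def unit_disk(x, y):
    return x * x + y * y <= 1.0


area = estimate_area(unit_disk, (-1, 1), (-1, 1), n_samples=200_000)
print(area, math.pi)  # the error shrinks roughly like 1 / sqrt(n_samples)
```

Any region works as long as `inside` can test membership and the box fully contains it.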
hanfei1986/Solve-travelling-salesman-problem-with-genetic-algorithm

The travelling salesman problem (TSP) asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?" In this notebook, I demonstrate solving this problem with a genetic algorithm.

Jupyter Notebook | 1 star | 0 forks | Updated 2 years ago
Topics: genetic-algorithm, travelling-salesman-problem
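A compact sketch of a genetic algorithm for the TSP. The operators are assumptions, not necessarily the notebook's choices: tours are permutations of city indices, parents come from tournament selection, and children are built with order crossover (OX) and swap mutation. The best tours are also copied unchanged into each new generation (elitism), so the best distance never gets worse. The city layout in the usage lines is made up so that the optimum is known.

```python
import math
import random


def tour_length(tour, cities):
    """Length of the closed tour (the last city connects back to the first)."""
    return sum(math.dist(cities[tour[i - 1]], cities[tour[i]]) for i in range(len(tour)))


def order_crossover(p1, p2, rng):
    """OX: copy a random slice from p1, fill the other positions in p2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    taken = set(p1[a:b + 1])
    remaining = iter([c for c in p2 if c not in taken])
    for i in range(n):
        if child[i] is None:
            child[i] = next(remaining)
    return child


def swap_mutation(tour, rate, rng):
    tour = tour[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour


def tournament(population, cities, rng, size=3):
    """Return the shortest of `size` randomly drawn tours."""
    return min(rng.sample(population, size), key=lambda t: tour_length(t, cities))


def genetic_tsp(cities, pop_size=50, generations=200, elite=2, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    history = []  # best tour length per generation
    for _ in range(generations):
        population.sort(key=lambda t: tour_length(t, cities))
        history.append(tour_length(population[0], cities))
        children = population[:elite]  # elitism: the best tours survive unchanged
        while len(children) < pop_size:
            child = order_crossover(
                tournament(population, cities, rng), tournament(population, cities, rng), rng
            )
            children.append(swap_mutation(child, mutation_rate, rng))
        population = children
    best = min(population, key=lambda t: tour_length(t, cities))
    return best, history


# Hypothetical instance: 8 cities on a unit circle, so the optimal tour is the octagon.
cities = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
best, history = genetic_tsp(cities)
print(best, round(tour_length(best, cities), 3), round(16 * math.sin(math.pi / 8), 3))
```

OX is used because it always produces a valid permutation, which simple one-point crossover would not.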
hanfei1986/Fine-tune-BERT-for-sentiment-analysis

BERT is an NLP model developed by Google Research in 2018. Since its release, it has achieved state-of-the-art accuracy on several NLP tasks. This notebook demonstrates fine-tuning BERT for sentiment analysis.

Jupyter Notebook | 1 star | 0 forks | Updated 2 years ago
Topics: bert-fine-tuning, sentiment-analysis
hanfei1986/CNN-for-digits-recognition

This is a beginner's CNN tutorial on a digit-recognition model trained on the MNIST dataset. I built two models, one with TensorFlow/Keras and one with PyTorch/Skorch.

Jupyter Notebook | 0 stars | 0 forks | Updated 2 years ago
Topics: cnn, deep-learning, machine-learning, mnist
hanfei1986/Solve-travelling-salesman-problem-with-simulated-annealing-algorithm

The travelling salesman problem (TSP) asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?" In this notebook, I demonstrate solving this problem with the simulated annealing algorithm.

Jupyter Notebook | 0 stars | 0 forks | Updated 2 years ago
Topics: simulated-annealing, travelling-salesman-problem
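A matching sketch for simulated annealing, again with assumed settings rather than the notebook's. A neighbor tour is produced by reversing a random segment (a 2-opt move). Worse tours are accepted with probability `exp(-delta / t)`, and the temperature `t` decays geometrically. The schedule parameters and the shuffled-circle instance are illustrative only.

```python
import math
import random


def tour_length(tour, cities):
    """Length of the closed tour (the last city connects back to the first)."""
    return sum(math.dist(cities[tour[i - 1]], cities[tour[i]]) for i in range(len(tour)))


def anneal_tsp(cities, t_start=1.0, t_end=1e-3, cooling=0.99, steps_per_t=50, seed=0):
    rng = random.Random(seed)
    n = len(cities)
    current = list(range(n))
    current_len = tour_length(current, cities)
    best, best_len = current[:], current_len
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            # 2-opt neighbor: reverse the segment between positions i and j.
            i, j = sorted(rng.sample(range(n), 2))
            candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
            cand_len = tour_length(candidate, cities)
            delta = cand_len - current_len
            # Always accept improvements; accept worse tours with probability exp(-delta / t).
            if delta < 0 or rng.random() < math.exp(-delta / t):
                current, current_len = candidate, cand_len
                if current_len < best_len:
                    best, best_len = current[:], current_len
        t *= cooling  # geometric cooling schedule
    return best, best_len


# Hypothetical instance: 10 points on a unit circle listed in shuffled order.
order = list(range(10))
random.Random(1).shuffle(order)
cities = [(math.cos(2 * math.pi * k / 10), math.sin(2 * math.pi * k / 10)) for k in order]
best, best_len = anneal_tsp(cities)
print(round(tour_length(list(range(10)), cities), 3), round(best_len, 3))
```

Accepting some worse moves at high temperature lets the search escape local minima; as `t` falls, it behaves more and more like plain local search.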
hanfei1986/Calculate-semiconductor-chip-yield-against-defect-density-with-Monte-Carlo-simulation

Calculating semiconductor chip yield against defect density using a Monte Carlo simulation is a common approach to assess the impact of defects on chip manufacturing. In this simulation, we'll randomly generate defect locations and evaluate chip yield based on specified criteria.

Jupyter Notebook | 0 stars | 0 forks | Updated 2 years ago
Topics: monte-carlo-simulation, semiconductor-manufacturing
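A minimal version of such a simulation, under simplifying assumptions that may differ from the notebook:

- The wafer is a square map of identical dies.
- Defects land uniformly at random.
- A die fails if it contains at least one defect.

Under these assumptions the simulated yield should track the classic Poisson yield model `Y = exp(-A * D)`, with die area `A` and defect density `D`. The parameter values are hypothetical.

```python
import math
import random


def simulate_yield(defect_density, die_area, dies_per_side=100, seed=0):
    """Fraction of dies with zero defects when defects land uniformly on the wafer."""
    rng = random.Random(seed)
    n_dies = dies_per_side ** 2
    n_defects = round(defect_density * die_area * n_dies)  # expected count on the wafer
    defective = set()
    for _ in range(n_defects):
        # Dies tile the wafer with equal area, so a uniformly random (x, y) location
        # falls in a uniformly random die; we draw the die index directly.
        defective.add(rng.randrange(n_dies))
    return 1 - len(defective) / n_dies


die_area = 1.0  # cm^2 (assumed)
for d in (0.1, 0.5, 1.0, 2.0):  # defects per cm^2 (assumed)
    print(d, round(simulate_yield(d, die_area), 3), round(math.exp(-die_area * d), 3))
```

Changing the failure criterion (for example, tolerating one defect per die through redundancy) only requires counting defects per die instead of using a set.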
hanfei1986/EDA-plots-for-regression

This notebook demonstrates the charts I usually plot for exploratory data analysis (EDA) in regression tasks.

Jupyter Notebook | 0 stars | 0 forks | Updated 2 years ago
Topics: eda, machine-learning
hanfei1986/Interpret-feature-importance-using-SHAP

SHAP is a powerful tool for interpreting feature importance in machine learning models. This Jupyter notebook gives a demonstration.

Jupyter Notebook | 1 star | 0 forks | Updated 2 years ago
Topics: feature-importance, machine-learning, shap
hanfei1986/Histogram-of-an-image-and-its-heatmap

A histogram of an image provides valuable insights into the distribution of pixel intensities within that image. This notebook briefly shows how to plot the histogram. We can also replot the picture as a heatmap based on its pixel intensities.

Jupyter Notebook | 1 star | 0 forks | Updated 2 years ago
Topics: image-processing
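A library-free sketch of both steps. It assumes an 8-bit grayscale image stored as a list of rows, a hypothetical representation; the notebook presumably loads real images with an imaging library. The histogram counts how many pixels fall into each intensity bin. The heatmap step maps each intensity to a color on an assumed blue-to-red ramp.

```python
def intensity_histogram(image, bins=256, max_value=255):
    """Count pixels per intensity bin for a grayscale image given as a list of rows."""
    counts = [0] * bins
    for row in image:
        for value in row:
            # Map intensities 0..max_value evenly onto bin indices 0..bins-1.
            counts[value * bins // (max_value + 1)] += 1
    return counts


def to_heat_rgb(value, max_value=255):
    """Simple blue-to-red color ramp: dark pixels map to blue, bright pixels to red."""
    t = value / max_value
    return (round(255 * t), 0, round(255 * (1 - t)))


image = [[0, 0, 255], [128, 128, 64]]  # tiny made-up 2x3 image
hist = intensity_histogram(image, bins=4)
heatmap = [[to_heat_rgb(v) for v in row] for row in image]
print(hist)  # [2, 1, 2, 1]
```

Plotting `hist` as bars gives the histogram, and rendering `heatmap` pixel by pixel gives the recolored picture.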
hanfei1986/Impute-missing-data-with-KNNImputer-and-IterativeImputer

When a significant amount of data is missing, what can we do? Impute the missing values with the mean or median? Scikit-Learn actually provides two more powerful imputers, KNNImputer and IterativeImputer, which can do this work effectively.

Jupyter Notebook | 1 star | 0 forks | Updated 2 years ago
Topics: data-imputation, feature-engineering, iterativeimputer, knnimputer, machine-learning
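The idea behind KNN imputation can be sketched in plain Python. For each missing entry, find the `k` most similar rows that do observe that feature, and fill the gap with their average. The distance follows the idea of scikit-learn's NaN-aware Euclidean metric: squared differences over the coordinates both rows observe, scaled up by total coordinates divided by observed coordinates. This is a simplified illustration with hypothetical function names, not the library's code. `IterativeImputer` (not shown) instead models each feature as a function of the others.

```python
import math

NAN = float("nan")


def nan_distance(a, b):
    """Euclidean distance over coordinates observed in both rows, rescaled by
    (total coordinates / observed coordinates)."""
    pairs = [(x, y) for x, y in zip(a, b) if not (math.isnan(x) or math.isnan(y))]
    if not pairs:
        return math.inf
    return math.sqrt(len(a) / len(pairs) * sum((x - y) ** 2 for x, y in pairs))


def knn_impute(rows, k=2):
    """Return a copy of `rows` with each NaN replaced by the mean of its k nearest donors."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue
            # Donors: other rows that observe feature j.
            donors = [r for m, r in enumerate(rows) if m != i and not math.isnan(r[j])]
            donors.sort(key=lambda r: nan_distance(row, r))
            nearest = donors[:k]
            if nearest:
                filled[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return filled


rows = [[1.0, 2.0, 3.0], [1.1, 2.1, NAN], [5.0, 6.0, 7.0], [0.9, 1.9, 2.9]]
filled = knn_impute(rows, k=2)
print(filled[1])  # the gap is filled from the two closest rows (3.0 and 2.9)
```

Unlike mean imputation, the filled value depends on which rows resemble the incomplete one, so the far-away third row does not pull the estimate.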
hanfei1986/Monte-Carlo-integration

Monte Carlo integration is particularly useful when dealing with high-dimensional integrals or integrals over complex, irregularly shaped domains where traditional methods may be impractical. It's widely used in various fields, including physics, finance, and engineering, for solving problems involving numerical integration.

Jupyter Notebook | 0 stars | 1 fork | Updated 2 years ago
Topics: monte-carlo-integration
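A minimal sketch of plain Monte Carlo integration over the unit hypercube: average the integrand at uniformly random points and multiply by the domain volume, which is 1 here. The integrand, the sum of squared coordinates in 5 dimensions, is a made-up example whose exact integral is 5/3, so the estimate can be checked. The function name is illustrative.

```python
import random


def mc_integrate(f, dim, n_samples, seed=0):
    """Estimate the integral of f over the unit hypercube [0, 1]^dim.

    Returns (estimate, standard_error); the volume of the domain is 1.
    """
    rng = random.Random(seed)
    values = [f([rng.random() for _ in range(dim)]) for _ in range(n_samples)]
    mean = sum(values) / n_samples
    var = sum((v - mean) ** 2 for v in values) / (n_samples - 1)
    # The error falls like 1/sqrt(n_samples), a rate that does not depend on dim.
    return mean, (var / n_samples) ** 0.5


estimate, std_err = mc_integrate(lambda x: sum(xi * xi for xi in x), dim=5, n_samples=100_000)
print(estimate, std_err, 5 / 3)
```

The dimension-independent convergence rate is what makes the method attractive for high-dimensional integrals, where grid-based quadrature needs exponentially many points.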
hanfei1986/Word2Vec-embedding-and-sentiment-analysis

Word2Vec is a popular word embedding technique that converts words into vectors in a high-dimensional space, capturing semantic relationships between words. This notebook demonstrates embedding text data with Word2Vec for sentiment analysis.

Jupyter Notebook | 0 stars | 0 forks | Updated 2 years ago
Topics: sentiment-analysis, word2vec
hanfei1986/PCA-truncated-SVD-and-visualization-of-explained-variance

PCA and truncated SVD reduce the dimensionality of data by transforming it into a lower-dimensional space. In this notebook, a chart visualizes how much of the original data's variance is captured by the new components. The data transformation process is also explained.

Jupyter Notebook | 0 stars | 0 forks | Updated 2 years ago
Topics: pca, svd
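For two features, the explained-variance ratios shown in such a chart can be computed by hand. They are the eigenvalues of the 2x2 covariance matrix divided by their sum. The sketch below uses that closed form. It only illustrates 2-D data and is not how scikit-learn's `PCA` or `TruncatedSVD` compute it; both work through a singular value decomposition. Note also that truncated SVD does not center the data, so its ratios can differ from PCA's.

```python
import math


def explained_variance_ratio_2d(points):
    """Eigenvalues of the 2x2 covariance matrix, largest first, normalized to sum to 1."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)        # var(x)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)        # var(y)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)  # cov(x, y)
    # Eigenvalues of [[a, b], [b, c]]: (a + c) / 2 +/- sqrt(((a - c) / 2)^2 + b^2)
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b * b)
    eigenvalues = [half_trace + radius, half_trace - radius]
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


print(explained_variance_ratio_2d([(t, 2 * t) for t in range(10)]))  # first ratio ~1.0
print(explained_variance_ratio_2d([(1, 0), (-1, 0), (0, 1), (0, -1)]))  # [0.5, 0.5]
```

Perfectly collinear data puts all variance in the first component, while an isotropic cloud splits it evenly; real data falls in between, which is what the cumulative chart shows.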
hanfei1986/Matrix-factorization-with-SVD-NMF-and-gradient-descent

Matrix factorization is a class of collaborative filtering algorithms used in recommender systems. Matrix factorization algorithms work by decomposing the user-item interaction matrix into the product of two lower-dimensional rectangular matrices.

Jupyter Notebook | 0 stars | 0 forks | Updated 2 years ago
Topics: collaborative-filtering, matrix-factorization, recommender-system
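Of the three approaches in the title, the gradient-descent variant is the easiest to sketch from scratch. It learns a `k`-dimensional vector for every user and every item so that their dot product reproduces the observed ratings, and it updates both with stochastic gradient descent plus L2 regularization. Only observed entries enter the loss. The ratings, hyperparameters, and function name below are hypothetical.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.01, epochs=500, seed=0):
    """ratings: list of (user, item, value) for the observed entries only.

    Returns user factors P, item factors Q, and the training RMSE after each epoch.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user factors
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]  # item factors

    def rmse():
        se = sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings)
        return (se / len(ratings)) ** 0.5

    history = [rmse()]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on (err^2 + reg * (|p|^2 + |q|^2)), constants folded into lr.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
        history.append(rmse())
    return P, Q, history


# Made-up 5-user x 4-item rating matrix; unobserved cells are simply absent.
ratings = [(0, 0, 5), (0, 1, 3), (0, 3, 1), (1, 0, 4), (1, 3, 1), (2, 0, 1), (2, 1, 1),
           (2, 3, 5), (3, 0, 1), (3, 3, 4), (4, 1, 1), (4, 2, 5), (4, 3, 4)]
P, Q, history = factorize(ratings, n_users=5, n_items=4)
print(round(history[0], 3), round(history[-1], 3))
```

Predicted scores for unobserved cells, for example `sum(p * q for p, q in zip(P[1], Q[1]))`, are what a recommender would rank.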
hanfei1986/TFIDF-embedding-and-sentiment-analysis

TFIDF (term frequency-inverse document frequency) is a statistical measure that evaluates how relevant a word is to a document in a collection of documents. This notebook demonstrates how to embed text data with TFIDF and do sentiment analysis based on it.

Jupyter Notebook | 0 stars | 0 forks | Updated 2 years ago
Topics: sentiment-analysis, tfidf
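A minimal from-scratch sketch of TF-IDF using the textbook weighting `tf(t, d) * log(N / df(t))`. Here tf is the term's relative frequency in the document, N is the number of documents, and df is the number of documents containing the term. Libraries differ in the details: scikit-learn's `TfidfVectorizer`, for instance, smooths the IDF and L2-normalizes each row by default, so its numbers will not match these exactly. The documents are made up.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document (whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the movie was great", "the movie was awful", "the plot was great fun"]
w = tfidf(docs)
print(w[1])  # "the" and "was" get 0.0; "awful" gets the highest weight
```

Words that appear everywhere carry no signal and get weight zero, while rarer, opinion-bearing words like "awful" dominate the vectors a sentiment classifier would be trained on.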
hanfei1986/Inference-of-a-picture-with-ResNet-models

ResNet models are pretrained computer vision models. This notebook demonstrates how to classify the object in a picture with ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152.

Jupyter Notebook | 0 stars | 0 forks | Updated 2 years ago
Topics: computer-vision, resnet
hanfei1986/Increase-the-density-of-data-by-interpolation

Increase the density of data by interpolation.

Python | 0 stars | 0 forks | Updated 2 years ago
Topics: interpolation
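A minimal sketch of densifying a 1-D series: insert `factor - 1` evenly spaced points between every pair of neighboring samples. Linear interpolation is an assumption here; the program may use another scheme, such as a spline. The function name is illustrative.

```python
def densify(xs, ys, factor):
    """Return new (xs, ys) with `factor` sub-intervals per original interval (linear)."""
    new_x, new_y = [], []
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        for step in range(factor):
            t = step / factor  # fraction of the way from (x0, y0) to (x1, y1)
            new_x.append(x0 + t * (x1 - x0))
            new_y.append(y0 + t * (y1 - y0))
    new_x.append(xs[-1])  # keep the final original sample
    new_y.append(ys[-1])
    return new_x, new_y


x2, y2 = densify([0.0, 1.0, 2.0], [0.0, 10.0, 0.0], factor=4)
print(x2)  # [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
print(y2)  # [0.0, 2.5, 5.0, 7.5, 10.0, 7.5, 5.0, 2.5, 0.0]
```

The original samples are preserved exactly; only new points are added between them.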
hanfei1986/Linear-regression-and-its-regularizations

Linear regression models are widely used in industry for regression tasks because they are straightforward and easy to interpret. To capture non-linear patterns in data, polynomial features need to be added. However, high-degree polynomial features lead to overfitting. To solve this problem, regularization terms can be added to the loss function.

Jupyter Notebook | 0 stars | 0 forks | Updated 2 years ago
Topics: linear-regression, machine-learning, regularization
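To make the overfitting-versus-regularization trade-off concrete, here is a dependency-free sketch of ridge (L2-regularized) polynomial regression. It solves the closed form `w = (X^T X + alpha * I)^(-1) X^T y` on centered polynomial features, so the intercept is not penalized. As `alpha` grows, the coefficient vector shrinks toward zero, which tames the wild high-degree fits. The helper names and data are hypothetical; in practice a library such as scikit-learn (`Ridge`, `Lasso`) would be used.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_poly_fit(xs, ys, degree, alpha):
    """Return (intercept, coefficients) for y ~ b + sum_j w_j * x**j, j = 1..degree."""
    feats = [[x ** j for j in range(1, degree + 1)] for x in xs]
    n = len(xs)
    means = [sum(col) / n for col in zip(*feats)]
    y_mean = sum(ys) / n
    X = [[v - mu for v, mu in zip(row, means)] for row in feats]  # centered features
    y = [v - y_mean for v in ys]
    # Normal equations with the L2 penalty alpha added to the diagonal.
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) + (alpha if i == j else 0.0)
            for j in range(degree)] for i in range(degree)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(degree)]
    w = solve(xtx, xty)
    intercept = y_mean - sum(wj * mu for wj, mu in zip(w, means))
    return intercept, w


# Made-up noisy samples of a sine wave.
xs = [i / 19 for i in range(20)]
ys = [math.sin(2 * math.pi * x) + 0.1 * (-1) ** i for i, x in enumerate(xs)]
for alpha in (1e-4, 1e-2, 1.0):
    intercept, w = ridge_poly_fit(xs, ys, degree=6, alpha=alpha)
    print(alpha, round(math.hypot(*w), 2))  # the coefficient norm shrinks as alpha grows
```

Lasso (L1) has no closed form and is usually solved by coordinate descent, which is why it is left out of this sketch.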
hanfei1986/Tree-based-and-neural-network-regressors

Tree-based and neural network regressors often work better than linear regression models for regression tasks because they can capture complex or subtle non-linear patterns in data.

Jupyter Notebook | 0 stars | 0 forks | Updated 2 years ago
Topics: deep-learning, machine-learning, neural-network, tree-based-models
hanfei1986/Neural-network-models-without-using-wrappers

Keras and Skorch provide wrappers that simplify building neural network models. However, the wrappers sacrifice flexibility. In some scenarios, such as early stopping and batch reading, building neural network models without wrappers is still very useful.

Jupyter Notebook | 0 stars | 0 forks | Updated 2 years ago
Topics: deep-learning, machine-learning, neural-network
hanfei1986/Make-PowerPoint-slides-with-Python

With the python-pptx library, we can automate the updating of PowerPoint slides.

Jupyter Notebook | 0 stars | 0 forks | Updated 2 years ago
Topics: powerpoint, pptx, python
hanfei1986/Two-sample-t-test-and-visuallization

A two-sample t-test is a statistical hypothesis test used to determine whether there is a significant difference between the means of two independent groups. If the p-value is less than the chosen significance level (for example, 0.05), you reject the null hypothesis and conclude that there is a significant difference between the groups.

Jupyter Notebook | 0 stars | 0 forks | Updated 2 years ago
Topics: t-test
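A dependency-free sketch of the test, assuming Welch's unequal-variance form; the notebook may instead use the pooled-variance Student version. It computes the t statistic and the Welch-Satterthwaite degrees of freedom. The two-sided p-value then comes from numerically integrating the t-distribution density with Simpson's rule. The helper names and sample data are hypothetical. In practice `scipy.stats.ttest_ind(a, b, equal_var=False)` performs Welch's test in one call.

```python
import math
import statistics


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * (integral of the pdf from 0 to |t|), via Simpson's rule."""
    h = abs(t) / steps
    area = t_pdf(0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * area * h / 3)


def welch_t_test(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom, two-sided p-value)."""
    va, vb = statistics.variance(a) / len(a), statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_sided_p(t, df)


group_a = [5.1, 4.9, 5.3, 5.0, 5.2]  # made-up measurements
group_b = [5.8, 6.1, 5.9, 6.0, 6.2]
t, df, p = welch_t_test(group_a, group_b)
print(round(t, 3), round(df, 3), p)  # p < 0.05, so we reject the null hypothesis
```

For a visualization, plotting `t_pdf` over a range of t values and shading the tails beyond `|t|` shows the p-value as an area.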
hanfei1986/Bar-chart-race

A bar chart race is an elegant animation that depicts the progress of multiple categories over time. We can create one in Python.

Jupyter Notebook | 0 stars | 0 forks | Updated 2 years ago
Topics: bar-chart-race
