DyfanJones/sagemaker-r-sdk
A library for training and deploying machine learning models on Amazon SageMaker using R through paws sdk
sagemaker
The idea is to rebuild the AWS Sagemaker Python SDK using R6 classes,
with paws behind the scenes.
Installation
You can install the development version of sagemaker from
GitHub with:
# install.packages("remotes")
remotes::install_github("DyfanJones/sagemaker-r-sdk")

Warning!!!
This repo is constantly under development and is not currently
stable. sagemaker is currently aligning its API with sagemaker v2;
apologies for any breaking changes this causes.
API overview:
This package aims to mimic the API of Python's AWS Sagemaker SDK, but
using R6 and paws.
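As a rough illustration of how a Python-style class can be expressed with R6, here is a minimal sketch. `ToyEstimator`, its fields, and `set_hyperparameters` are hypothetical names invented for this example; they are not part of sagemaker or paws, and the sketch makes no AWS calls. It only assumes the R6 package is installed.

```r
library(R6)

# Hypothetical class for illustration only; NOT part of sagemaker-r-sdk.
ToyEstimator <- R6Class("ToyEstimator",
  public = list(
    instance_type = NULL,
    hyperparameters = NULL,

    # Plays the role of Python's __init__
    initialize = function(instance_type = "ml.m5.large") {
      self$instance_type <- instance_type
      self$hyperparameters <- list()
    },

    # Merge named arguments into the stored hyperparameters.
    # Returning invisible(self) allows method chaining.
    set_hyperparameters = function(...) {
      self$hyperparameters <- modifyList(self$hyperparameters, list(...))
      invisible(self)
    }
  )
)

est <- ToyEstimator$new(instance_type = "ml.m5.xlarge")
est$set_hyperparameters(max_depth = 5, eta = 0.2)
```

Objects are created with `$new()` and methods are called with `$`, which keeps R code close in shape to the equivalent Python `Estimator(...)` and `estimator.method(...)` calls.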
Architecture Design:
sagemaker is a metadata package that contains all methods to interact
with Amazon Sagemaker.
- sagemaker.core:
Contains core components of the SDK, for example the Session R6 class
- sagemaker.common:
Contains common components used throughout the sagemaker SDK
- sagemaker.mlcore:
Contains core components for machine learning (ML) and Amazon-developed
ML.
- sagemaker.mlframework:
Contains ML frameworks developed for Amazon Sagemaker, e.g. SKLearn
- sagemaker.workflow:
Contains sagemaker pipelines and workflows
- sagemaker.debugger:
Contains debugging methods
(https://github.com/awslabs/sagemaker-debugger-rulesconfig)
Learn from examples:
Amazon Algorithms:
sagemaker is designed to mimic Python's sagemaker SDK. Therefore, all
examples for Python's sagemaker should carry over to this package.
Examples:
- Targeted Direct Marketing
predicts potential customers that are most likely to convert based
on customer and aggregate level metrics, using Amazon SageMaker’s
implementation of XGBoost.
- XGBoost Tuning
shows how to use SageMaker hyperparameter tuning to improve your
model fits for the Targeted Direct Marketing task.
- BlazingText Word2Vec
generates Word2Vec embeddings from a cleaned text dump of Wikipedia
articles using SageMaker’s fast and scalable BlazingText
implementation.
R Model Examples:
- R Multivariate Adaptive Regression Splines
example over the iris data.frame
Note: If a feature hasn’t yet been implemented please feel free to
raise a pull request or a ticket
For developers
To keep the package within the CRAN size limit of 5MB, sagemaker
currently uses a separate repository
(sagemaker-r-test-data)
to store R variants of the test data stored in
sagemaker-python-sdk.
sagemaker-r-test-data will only consist of data that can’t be read into
R natively, i.e. Python pickle files. For other test data, sagemaker
will read it directly from sagemaker-python-sdk.
