movie-recs
Movie poster object detection with Ray and Anyscale
This workload was run on an ml.g4dn.12xlarge Sagemaker instance.
Requirements
# This wheel is built from a local commit with the required changes, which will be merged soon
# If you're not using Python 3.7, get the matching wheel link from https://buildkite.com/ray-project/ray-builders-pr/builds/41075#01825f93-238f-4376-8df2-0f60d5df468b
pip install https://ray-ci-artifact-pr-public.s3.amazonaws.com/c58874ae8545eef0b5c7632418eba3da0b5015c9/tmp/artifacts/.whl/ray-3.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl
pip install "ray[tune]"
pip install torch
pip install torchvision
pip install tqdm # to visualize progress bar
PyTorch Serial
First, download the images from S3:
cd /home/ec2-user/SageMaker/movie-recs/movie_posters
mkdir images
aws s3 cp --recursive s3://waleed-movies ./images/
Then run python pytorch_serial.py
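Before running the serial script, it can help to confirm that the download actually populated ./images. The following is a minimal sketch and not part of the repository: the helper name count_images and the list of file extensions are assumptions, so adjust them to whatever the bucket really contains.

```python
from pathlib import Path

# Assumed set of poster file extensions; adjust to match the bucket contents.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(directory):
    """Count image files under `directory`, searching subfolders too."""
    root = Path(directory)
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    print(f"Found {count_images('images')} images in ./images")
```

A count of zero usually means the aws s3 cp step did not run in the movie_posters directory or the credentials lack access to the bucket.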
Running Ray locally on Sagemaker
python movie_poster_rec.py
This will
- Read the Dataset from S3
- Perform batch prediction with Ray AIR
- Save visualizations of the first 100 prediction results in the ./object_detections folder.
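The last step, keeping only the first 100 results, can be sketched as follows. This is an illustration under stated assumptions rather than the code in movie_poster_rec.py: the function name save_first_predictions and the record layout (dicts with boxes, labels, and scores) are hypothetical, and the sketch writes JSON files where the real script presumably renders images.

```python
import itertools
import json
from pathlib import Path


def save_first_predictions(predictions, out_dir="object_detections", limit=100):
    """Write the first `limit` prediction records to `out_dir` as JSON files.

    `predictions` may be any iterable; only the first `limit` items are
    consumed, so a large result set is never fully materialized here.
    Returns the list of paths that were written.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, record in enumerate(itertools.islice(predictions, limit)):
        target = out_path / f"detection_{index:03d}.json"
        target.write_text(json.dumps(record))
        written.append(target)
    return written
```

Using itertools.islice means the helper works equally well on a list or on a lazy iterator over batch prediction output.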
Running Ray on Anyscale via Workspaces
- Set up Anyscale git access so you can clone the repo.
  - Either create a personal access token on GitHub,
  - or generate an SSH key on the Sagemaker instance and add it to your GitHub account.
  - Configure the personal access token or SSH key with Anyscale SSO.
- Clone the product repo
git clone git@github.com:anyscale/product.git
- Install the Anyscale CLI
cd product/frontend/cli
pip install -e .
- Create a workspace on Anyscale staging. Use the sagemaker-cluster-env:1 cluster env and 4 g4dn.12xlarge nodes.
- Add Anyscale credentials to your Sagemaker instance
Go to https://console.anyscale-staging.com/o/anyscale-internal/credentials and follow the instructions for setting environment variables. It seems that anyscale auth does not work for staging.
For example:
export ANYSCALE_HOST=https://console.anyscale-staging.com
export ANYSCALE_CLI_TOKEN=<INSERT_YOUR_CLI_TOKEN>
- Clone the workspace in your Sagemaker notebook. For example, if your workspace is called sagemaker-demo, run
anyscale workspace clone -n sagemaker-demo
- Copy files to your workspace
cp *.py workspace-project-sagemaker-demo
- Run batch prediction
cd workspace-project-sagemaker-demo
anyscale workspace run "python movie_poster_rec.py"
- If you access your workspace from Anyscale, you should see the first 100 object detections saved on the head node.
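The credentials step above relies on two environment variables, and a missing one is an easy mistake to make in a fresh notebook terminal. A quick check before running any anyscale workspace command might look like this. It is a minimal sketch: the helper name missing_anyscale_vars is hypothetical, and it only checks that the variables are set, not that the token is valid.

```python
import os

# The two variables exported in the credentials step.
REQUIRED_VARS = ("ANYSCALE_HOST", "ANYSCALE_CLI_TOKEN")


def missing_anyscale_vars(env=None):
    """Return the required variable names that are unset or empty in `env`.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to try out without touching real credentials.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_anyscale_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Anyscale environment variables are set.")
```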
Running Ray on Anyscale via Anyscale Connect
Follow instructions 1-5 from the previous section.
Now do the following
- Add the appropriate Ray address to your script. For example, add the following line to movie_poster_rec.py:
ray.init("anyscale://workspace-project-sagemaker-demo/workspace-cluster-sagemaker-demo", runtime_env={"working_dir": "."})
- Run
python movie_poster_rec.py
- You should see the 100 object detections saved locally on the Sagemaker notebook.
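To avoid hard-coding the address for each workspace, the string can be built from the workspace name. The sketch below assumes the naming pattern seen in the sagemaker-demo example (workspace-project-<name> and workspace-cluster-<name>) holds for other workspaces too, which is an inference from that single example and should be checked in the Anyscale console. The helper name anyscale_address is hypothetical.

```python
def anyscale_address(workspace_name):
    """Build an anyscale:// address following the sagemaker-demo example.

    Assumption: the project is named "workspace-project-<name>" and the
    cluster "workspace-cluster-<name>". Verify both names in the Anyscale
    console before relying on this.
    """
    if not workspace_name:
        raise ValueError("workspace_name must be non-empty")
    return (
        f"anyscale://workspace-project-{workspace_name}"
        f"/workspace-cluster-{workspace_name}"
    )


if __name__ == "__main__":
    address = anyscale_address("sagemaker-demo")
    print(address)
    # In movie_poster_rec.py the result would then be passed to Ray, e.g.:
    # ray.init(address, runtime_env={"working_dir": "."})
```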