achimgaedke/py-aws-vault-auth
Help with aws-vault MFA authentication in notebooks
py-aws-vault-auth
This is a wrapper for the aws-vault command.
This is not an interface to the AWS (glacier) vault or AWS secrets manager.
Introduction
(Re)-Authenticate for AWS services using aws-vault within a python session
(e.g. Jupyter notebooks):
import py_aws_vault_auth
import boto3
boto_auth = py_aws_vault_auth.authenticate("DataScience", return_as="boto")
c = boto3.client("s3", **boto_auth)
c.list_objects_v2(Bucket="your-bucket")
In a JupyterLab notebook or a VSCode notebook, the MFA prompt appears inside the
notebook by virtue of a context-adjusted version of the builtin input function.
These adjusted versions are auto-magically provided by JupyterLab and VSCode.
Credentials for S3 access
That's probably the most prominent data-science use case.
With boto3 (this works for all other services as well):
boto_auth = py_aws_vault_auth.authenticate("DataScience", return_as="boto")
import boto3
c = boto3.client("s3", **boto_auth)
c.list_objects_v2(Bucket="your-bucket")
With s3fs
s3fs_auth = py_aws_vault_auth.authenticate("DataScience", return_as="s3fs")
import s3fs
fs = s3fs.S3FileSystem(**s3fs_auth)
fs.ls("s3://my-bucket/")
With pandas (implicitly via fsspec and s3fs)
s3fs_auth = py_aws_vault_auth.authenticate("DataScience", return_as="s3fs")
import pandas
pandas.read_csv(
    "s3://my-bucket/my_file",
    storage_options=s3fs_auth,
)
Credentials as environment variables
Just add the credentials to the environment of a subprocess:
environ_auth = py_aws_vault_auth.authenticate("DataScience", return_as="environ")
import os, subprocess
subprocess.call(
    ["aws", "s3", "ls", "my-bucket"],
    env=os.environ | environ_auth,  # the | operator on os.environ needs Python 3.9+
)
or simply update the running process environment with the (fresh) credentials:
environ_auth = py_aws_vault_auth.authenticate("DataScience", return_as="environ")
import os
os.environ.update(environ_auth)
Credentials Handling
Without return_as, the function authenticate returns all environment variables
starting with AWS_ that aws-vault sets. These include the credentials, their
expiration time and the region of the profile.
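To make the shape of that return value more concrete, here is a minimal sketch of how a dictionary of AWS_ variables could be mapped onto keyword arguments for boto3 and s3fs. It is not the package's own code: the helper names sketch_boto_auth and sketch_s3fs_auth and the sample values are hypothetical, and the sketch assumes the standard variable names AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN and AWS_REGION (or AWS_DEFAULT_REGION).

```python
# Hypothetical helpers for illustration; not the package's to_boto_auth / to_s3fs_auth.

def sketch_boto_auth(aws_env):
    """Map AWS_* environment variables to boto3 client keyword arguments."""
    auth = {
        "aws_access_key_id": aws_env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": aws_env["AWS_SECRET_ACCESS_KEY"],
    }
    # Temporary (MFA-backed) credentials carry a session token as well.
    if "AWS_SESSION_TOKEN" in aws_env:
        auth["aws_session_token"] = aws_env["AWS_SESSION_TOKEN"]
    # Assumption: the region is exposed as AWS_REGION or AWS_DEFAULT_REGION.
    region = aws_env.get("AWS_REGION") or aws_env.get("AWS_DEFAULT_REGION")
    if region:
        auth["region_name"] = region
    return auth


def sketch_s3fs_auth(aws_env):
    """Map the same variables to s3fs.S3FileSystem keyword arguments."""
    auth = {
        "key": aws_env["AWS_ACCESS_KEY_ID"],
        "secret": aws_env["AWS_SECRET_ACCESS_KEY"],
    }
    if "AWS_SESSION_TOKEN" in aws_env:
        auth["token"] = aws_env["AWS_SESSION_TOKEN"]
    return auth


# Made-up placeholder values, for illustration only.
example_env = {
    "AWS_ACCESS_KEY_ID": "AKIAEXAMPLE",
    "AWS_SECRET_ACCESS_KEY": "secret-example",
    "AWS_SESSION_TOKEN": "token-example",
    "AWS_REGION": "eu-west-1",
}
boto_kwargs = sketch_boto_auth(example_env)
s3fs_kwargs = sketch_s3fs_auth(example_env)
```

The two result dictionaries differ only in their key names, which is why a single set of credentials can serve both libraries.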
The expiration time can be converted to a datetime object using
py_aws_vault_auth.expiration_time (this requires dateutil for python<3.11).
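As a rough illustration of what such a conversion involves, the sketch below parses a timestamp with the standard library only and computes the remaining lifetime. It is not the package's expiration_time helper (which relies on dateutil on older Python versions); the function names parse_expiration and seconds_until are hypothetical, and the sketch assumes the expiration is an RFC 3339 timestamp in whole seconds with a trailing Z.

```python
from datetime import datetime, timezone


def parse_expiration(timestamp):
    """Parse a timestamp such as '2030-01-01T12:00:00Z' into an aware datetime."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    if timestamp.endswith("Z"):
        timestamp = timestamp[:-1] + "+00:00"
    return datetime.fromisoformat(timestamp)


def seconds_until(expiration, now=None):
    """Seconds left before `expiration`; negative once it has passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (expiration - now).total_seconds()


# Made-up timestamp and reference time, for illustration only.
expires = parse_expiration("2030-01-01T12:00:00Z")
remaining = seconds_until(
    expires, now=datetime(2030, 1, 1, 11, 0, tzinfo=timezone.utc)
)
```

A notebook could compare such a remaining lifetime against a threshold and call authenticate again once the credentials are about to expire.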
The functions to_boto_auth, to_environ_auth and to_s3fs_auth create the
relevant authentication parameters. These can be imported from py_aws_vault_auth
in order to use the same credentials for boto and s3fs, e.g.
ds_credentials = py_aws_vault_auth.authenticate("DataScience")
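import boto3, py_aws_vault_auth
from py_aws_vault_auth import to_boto_auth
ds_credentials = py_aws_vault_auth.authenticate("DataScience")
athena_client = boto3.client("athena", **to_boto_auth(ds_credentials))
Installation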
No dependencies, just python3... and of course aws-vault
pip install -U git+https://github.com/achimgaedke/py-aws-vault-auth.git
Project Scope
Make AWS authentication with the command line tool aws-vault easy in
interactive contexts other than a terminal, e.g. a Jupyter notebook.
This project does:
- help with AWS authentication with aws-vault mid-session, i.e.
  - make it easy for data scientists to avoid copying credentials into a notebook
  - avoid starting jupyter with aws-vault exec XXX -- jupyter lab (or VSCode...)
- return the AWS credentials directly usable with popular data-science tools
- request the MFA token via python's input context, i.e. the input built-in function
- aim to work on Linux/MacOS (and hopefully MS Windows) without extra dependencies,
  supporting a variety of python3 versions
If you prefer another window popping up somewhere, you can use prompt="osascript"
(on MacOS) or similar. This won't use python's input function.
This project does not:
- use all features of aws-vault
- capture the input dialogues for various key-chain/password managers
To avoid too many password manager input dialogues, have a look at the
aws-vault documentation.
Project Maturity
Please star this repository
if you like it or use the issue-tracker
to share some feedback (bug reports or use cases).
This project was born out of the need for a smoother integration of devops
tools/requirements with data-science tools. At the moment, it simply factors out
some code I use privately.
The project is developed on MacOS, python-3.11 and tested on Linux, python-3.9.
Yes, the thread-based polling of the terminal communication is kind of awkward. Once
upon a time this was the most portable way of waiting on output - or it was 6 years ago.
I might revisit this part another time, as OS and backwards-compatibility got
better. (I'm aware of select, async, or setting streams to non-blocking mode)
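For readers unfamiliar with the pattern, here is a minimal sketch of thread-based polling of a child process's output. It is not the code used in this project: the names _pump and read_lines_with_timeout are hypothetical, the child process is a stand-in Python one-liner, and reading whole lines is a simplification (a real prompt may not end with a newline).

```python
import queue
import subprocess
import sys
import threading


def _pump(stream, sink):
    """Read lines from a blocking stream and hand them to a queue."""
    for line in iter(stream.readline, ""):
        sink.put(line)
    sink.put(None)  # sentinel: the stream reached end-of-file


def read_lines_with_timeout(command, timeout=5.0):
    """Collect a command's stdout lines by polling a queue instead of the pipe."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
    lines = queue.Queue()
    reader = threading.Thread(target=_pump, args=(proc.stdout, lines), daemon=True)
    reader.start()

    collected = []
    while True:
        try:
            # The main thread never blocks on the pipe itself, only on the queue.
            line = lines.get(timeout=timeout)
        except queue.Empty:
            proc.kill()  # no output within the timeout: give up on the child
            break
        if line is None:
            break
        collected.append(line.rstrip("\n"))
    proc.wait()
    return collected


# Stand-in child process; a real wrapper would start the command line tool instead.
output = read_lines_with_timeout(
    [sys.executable, "-c", "print('Enter MFA code:'); print('done')"]
)
```

The blocking read happens only in the helper thread, so the approach needs no platform-specific calls. That portability is the trade-off against select or non-blocking streams mentioned above.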
Ah, and tests are missing...

