Minehut/archie
a scalable file archiver for MinIO events
archie
summary
A file archiver for Kubernetes built around MinIO and our scalable archie worker. It archives files
from MinIO source buckets to any S3-compatible or Google Cloud Storage destination bucket by way of
MinIO's bucket event notifications, NATS JetStream's durable streams, and KEDA auto-scaling.
app features:
- replicate bucket data from minio sources
  - copy and remove
- replicate bucket data to multiple destinations
  - minio or any aws s3 compatible
  - google-storage
- async healthcheck server
- prometheus metrics server
- nats jetstream provisioning
- efficient queue pull consumer
- graceful shutdown wait timer
- ignore lifecycle expirations
- exclude paths with pcre regex
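
As an illustration of the path-exclusion feature, here is a minimal sketch of filtering object keys against a list of exclude patterns. It uses Python's built-in `re` module, which is not PCRE, and the pattern list and the `is_excluded` helper are hypothetical; archie's real setting names are documented in CONFIGURE.md.

```python
import re

# Hypothetical exclude patterns, chosen only for this example.
EXCLUDE_PATTERNS = [r"^tmp/", r"\.part$"]
_COMPILED = [re.compile(p) for p in EXCLUDE_PATTERNS]


def is_excluded(object_key: str) -> bool:
    """Return True if any exclude pattern matches somewhere in the key."""
    return any(rx.search(object_key) for rx in _COMPILED)


keys = ["tmp/upload-1", "worlds/a.zip", "worlds/a.zip.part"]
# Only keys that match no exclude pattern would be archived.
kept = [k for k in keys if not is_excluded(k)]
```

Anchoring a pattern with `^` or `$` keeps it from matching in the middle of a key, so `data/tmp/x` would not be excluded by `^tmp/`.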
detailed
A MinIO bucket can be configured to send bucket event notifications for put and delete activity to a NATS JetStream cluster stream.
The NATS JetStream stream queues the notification messages on persistent storage and guarantees
exactly-once delivery to the archie worker pool using a server-side max timeout retry. The archie workers share the same
NATS JetStream durable consumer, each one pulling a single event notification message at a time. In the event of a failure, archie
informs the NATS server of the failure and the client side requests a retry using a more rapid exponential backoff.
The process will only give up and terminate retries after maxRetries in these situations:
if the source file in MinIO doesn't exist on a copy, or if the destination file doesn't exist on a delete. Any other errors
or timeouts result in retrying forever, or until the message expires from the NATS JetStream stream.
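
The retry rules described above can be sketched roughly as follows. This is not archie's implementation: the error-kind strings, the `decide` and `backoff_seconds` helpers, and the base and cap values are all assumptions made for illustration. Only the behavior (give up after maxRetries for the two missing-file cases, retry everything else) comes from the text.

```python
from enum import Enum


class Outcome(Enum):
    ACK = "ack"         # work succeeded, message can leave the stream
    RETRY = "retry"     # report failure and ask for redelivery after a delay
    TERMINATE = "term"  # stop retrying this message


# Hypothetical labels for the two cases that may give up after max_retries.
GIVE_UP_KINDS = {"copy_source_missing", "delete_destination_missing"}


def decide(error_kind, attempt, max_retries):
    """Pick an outcome for one delivery attempt (attempt is 1-based)."""
    if error_kind is None:
        return Outcome.ACK
    if error_kind in GIVE_UP_KINDS and attempt >= max_retries:
        return Outcome.TERMINATE
    # Any other error or timeout keeps retrying until the message expires.
    return Outcome.RETRY


def backoff_seconds(attempt, base=1.0, cap=60.0):
    """Exponential backoff: base, 2*base, 4*base, ... limited to cap."""
    return min(cap, base * 2 ** (attempt - 1))
```

With these assumed defaults, the first three retries would wait 1, 2, and 4 seconds, and the delay stops growing once it reaches the 60 second cap.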
notes
The MinIO server does not need to be in Kubernetes, but it does need to be able to communicate with the NATS cluster to deliver the event
notifications to the stream. The NATS cluster could also exist outside of Kubernetes, but I have not tested it.
deploy
Check out the helm chart INSTALL.md
chart features:
- archie worker deployment
- keda ScaledObject deployment scaler
- prometheus ServiceMonitor metrics
- prometheus PrometheusRules alerts
usage
For CLI and config.yaml settings visit CONFIGURE.md
queue
Use NATS JetStream to queue bucket event notifications from MinIO.
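
For orientation, here is a minimal sketch of how a consumer might turn one queued notification into a copy or remove action. It assumes the S3-style `Records` layout that MinIO emits and that object keys arrive URL-encoded; the `parse_event` helper and the sample payload are hypothetical and not taken from archie's code.

```python
import json
from urllib.parse import unquote_plus


def parse_event(payload: bytes):
    """Return a list of (action, bucket, key) tuples for one notification."""
    event = json.loads(payload)
    actions = []
    for record in event.get("Records", []):
        name = record["eventName"]
        bucket = record["s3"]["bucket"]["name"]
        # Assumption: keys are URL-encoded in the notification body.
        key = unquote_plus(record["s3"]["object"]["key"])
        if "ObjectCreated:" in name:
            actions.append(("copy", bucket, key))
        elif "ObjectRemoved:" in name:
            actions.append(("remove", bucket, key))
    return actions


# Made-up sample payload for demonstration only.
sample = json.dumps({
    "Records": [{
        "eventName": "s3:ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "worlds"},
            "object": {"key": "saves/my+world%281%29.zip"},
        },
    }]
}).encode()

actions = parse_event(sample)
```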
autoscaling
Use KEDA's NATS JetStream Scaler to scale the workers.
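
As a rough mental model of the scaling arithmetic, the sketch below estimates a worker count from the number of pending messages. It assumes the scaler's `lagThreshold` is treated as an average-value target per replica, and it ignores HPA tolerance and stabilization windows; the `desired_workers` function and its default replica bounds are hypothetical.

```python
import math


def desired_workers(pending, lag_threshold, min_replicas=1, max_replicas=10):
    """Approximate replica count: ceil(pending / lag_threshold), clamped."""
    if lag_threshold <= 0:
        raise ValueError("lag_threshold must be positive")
    wanted = math.ceil(pending / lag_threshold)
    return max(min_replicas, min(max_replicas, wanted))
```

For example, under these assumptions 25 pending messages with a threshold of 10 would call for 3 workers, while a very large backlog would be limited by `max_replicas`.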
development
Check out DEVELOPER.md
known issues
- NATS JetStream stream's first sequence metric is unstable - TODO: Create PR (monitoring issue only)
- PCRE Regex module somewhat limits our build OS and ARCH - INFO
- KEDA needed a patch to fix the scaler for using jetstream in a cluster - PR #3564 (merged)
- NATS-Exporter needed to pass the first_seq stream info - PR #190 (merged)
- MinIO doesn't reconnect to NATS server if it is down for a while - PR #16050 (merged)
