
mayukhdas/LoadStar-VLDB

Artifacts of the LoadStar framework for VLDB

Analyzing and Optimizing NoSQL Workloads for Cosmos DB using Distressed Resource Volume Metric

Gunika Verma¹, Aashutosh A V¹, Pooja Srinivas¹, Yogesh Simmhan², Ayush Choure¹, Harshit Shah³, Mayukh Das¹, Prashant Sasatte³, Chetan Bansal¹, Abhijit Pai³, Suraj Dixit³ and Achint Agrawal³

¹M365 Research, Microsoft, Bangalore; ²Indian Institute of Science, Bangalore; ³Azure, Microsoft, Bangalore

To Appear in PVLDB 2026

Large-scale managed cloud databases rely on sophisticated load-packing algorithms to achieve the efficiency and economies of scale needed to run these services. We study load packing in the context of Cosmos DB, Microsoft's flagship cloud-hosted NoSQL database, through a novel problem definition that balances packing efficiency against user-centric reliability metrics. We first present open-source NoSQL workloads drawn from real Cosmos DB clusters, and analyze these workloads to derive a novel reliability metric, Distressed Resource Volume (DRV).

We then develop an open-source policy simulation framework, LOADSTAR, powered by a nonparametric statistical model for estimating the QoS of real traffic patterns. We define a resource optimization problem for placing Cosmos DB replicas onto VM nodes, and develop LUNA, a forecasting model for future load distributions, and ORBIT, a placement algorithm that uses these forecasts to trigger the rebalancing of stressed replicas and reduce tail errors. Our experiments demonstrate ORBIT's benefits over the existing Cosmos DB policy and a worst-fit optimized baseline: it delivers higher load at lower error rates and reduces resource usage by up to 35%.
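The abstract compares ORBIT against a worst-fit optimized baseline. For readers unfamiliar with the term, a plain worst-fit heuristic places each replica on the node with the most remaining capacity, which spreads load thinly across nodes. The Python sketch below illustrates only that generic heuristic under simplifying assumptions (one scalar load per replica and a fixed capacity per node). The `Node` class and `worst_fit_place` function are hypothetical and are not part of the LoadStar code base or of the baseline's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical VM node with a single scalar capacity (an assumption)."""
    name: str
    capacity: float
    replicas: list = field(default_factory=list)  # (replica_id, load) pairs

    @property
    def used(self) -> float:
        # Total load of the replicas currently placed on this node.
        return sum(load for _, load in self.replicas)

    @property
    def free(self) -> float:
        # Remaining capacity on this node.
        return self.capacity - self.used


def worst_fit_place(replicas, nodes):
    """Generic worst-fit: put each (replica_id, load) on the node with the most free capacity.

    Raises ValueError if even the emptiest node cannot host a replica.
    """
    placement = {}
    for rid, load in replicas:
        # Worst fit picks the node with the MOST free space (ties go to the first such node).
        target = max(nodes, key=lambda n: n.free)
        if target.free < load:
            raise ValueError(f"no node can host replica {rid} (load {load})")
        target.replicas.append((rid, load))
        placement[rid] = target.name
    return placement
```

For example, with nodes `A` (capacity 100) and `B` (capacity 80), replicas of load 50, 40 and 30 land on `A`, `B` and `A` respectively, leaving 20 units free on `A` and 40 on `B`. A forecast-driven policy such as ORBIT differs in that it also decides when to move already-placed replicas, which this static sketch does not model.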


About this Repository

This repository contains the artifacts for the above VLDB 2026 article. Specifically, it includes the policy simulator (LOADSTAR) and the forecasting model and placement algorithm (LUNA+ORBIT) used to evaluate the proposed models, policies and algorithms. Due to their large size, the Cosmos DB trace files for the different testbeds used in the article are hosted on Zenodo and accessed by our tools. We provide instructions for installing, building and running these components on a single machine.

These instructions help install the environment and run the experiments. The goal is to ensure that the artifacts can be evaluated as Functional, i.e., the artifacts associated with the research are found to be documented, consistent, complete, exercisable, and include appropriate evidence of verification and validation.

We provide scripts to (1) set up the environment, (2) build the code, (3) process the Cosmos DB trace data, (4) run the LUNA+ORBIT forecasting and placement pipeline, (5) run the LOADSTAR policy simulation, and (6) analyze the performance using DRV profiles.

Documentation

REPRODUCIBILITY.md: Quick-start guide with a one-touch pipeline (for the default configuration and for each experiment), expected artifacts, troubleshooting and manual steps.
experiments/README.md: Configuration details required to run each experiment in the article.
DATASET.md: Description of the Cosmos DB replica trace files for the testbeds, hosted on Zenodo.
INPUT_FORMAT.md: Instructions for configuring and executing only the LoadStar policy simulator.
INSTRUCTIONS.md: Full setup, data flow, execution order, and analysis (DRV, error-rate preprocessing).

Acknowledgements

We thank students and staff from the DREAM:Lab, IISc, including Pranjal Naman, Mayank Arya, Kautuk Astu and Nikhil Reddy, for help with the experiments, plots and reproducibility.

Contact

For more information on this repo, please contact the authors at Microsoft M365 Research (mayukhdas@microsoft.com).