diego-capacity-monitor
diego-capacity-monitor is a Cloud Foundry deployable web application that subscribes to the CF Firehose to gather memory metrics about Diego cells. It then reports a health state based on a number of key metrics, described below.
Operation
If there are no errors, this application returns a JSON response similar to:
{
  "healthy": true,
  "message": "Everything is awesome!",
  "details": [
    {
      "index": 1,
      "memory": 7000,
      "low_memory": false
    },
    {
      "index": 2,
      "memory": 7000,
      "low_memory": false
    }
  ],
  "cellCount": 2,
  "cellMemory": 10000,
  "watermark": 1,
  "requested_watermark": "1",
  "totalFreeMemory": 14000,
  "WatermarkMemoryPercent": 40
}
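For consumers of this endpoint, the sample response above could be decoded along these lines. This is a minimal sketch: the Go type and field names (`Report`, `CellDetail`, `parseReport`) are hypothetical, only the JSON keys are taken from the sample, and the numeric types are assumptions (memory values are read as `float64` in case the service emits fractions).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CellDetail mirrors one entry of the "details" array in the sample response.
type CellDetail struct {
	Index     int     `json:"index"`
	Memory    float64 `json:"memory"`
	LowMemory bool    `json:"low_memory"`
}

// Report mirrors the top-level keys of the sample response.
// The Go names are hypothetical; only the JSON keys come from the sample.
type Report struct {
	Healthy                bool         `json:"healthy"`
	Message                string       `json:"message"`
	Details                []CellDetail `json:"details"`
	CellCount              int          `json:"cellCount"`
	CellMemory             float64      `json:"cellMemory"`
	Watermark              int          `json:"watermark"`
	RequestedWatermark     string       `json:"requested_watermark"`
	TotalFreeMemory        float64      `json:"totalFreeMemory"`
	WatermarkMemoryPercent float64      `json:"WatermarkMemoryPercent"`
}

// parseReport decodes a response body into a Report.
func parseReport(body []byte) (Report, error) {
	var r Report
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	// A shortened copy of the sample response.
	body := []byte(`{
		"healthy": true,
		"message": "Everything is awesome!",
		"details": [{"index": 1, "memory": 7000, "low_memory": false}],
		"cellCount": 2,
		"cellMemory": 10000,
		"watermark": 1,
		"requested_watermark": "1",
		"totalFreeMemory": 14000,
		"WatermarkMemoryPercent": 40
	}`)

	r, err := parseReport(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("healthy=%t cells=%d watermark=%d\n", r.Healthy, r.CellCount, r.Watermark)
}
```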
The following error messages and statuses can also be returned:
- It has been under a minute since the system was started
  - report.Message = "I'm still initialising, please be patient!"
  - report.Healthy = false
  - status = http.StatusExpectationFailed
- An invalid Watermark value was supplied
  - report.Message = "Error occurred while calculating cell count"
  - report.Healthy = false
  - status = http.StatusInternalServerError
- No metrics were found
  - report.Message = "I'm sorry Dave I can't show you any data"
  - report.Healthy = false
  - status = http.StatusGone
- The cellCount does not exceed the watermark count
  - report.Message = "The number of cells needs to exceed the watermark amount!"
  - report.Healthy = false
  - status = http.StatusExpectationFailed
- A third or more of the cells have less than 2G of free memory
  - report.Message = "At least a third of the cells are low on memory!"
  - report.Healthy = false
  - status = http.StatusExpectationFailed
- There is less free memory than the watermark amount
  - report.Message = "FATAL - There is not enough space to do an upgrade, add cells or reduce watermark!"
  - report.Healthy = false
  - status = http.StatusExpectationFailed
- During an upgrade there would be less than 20% of memory free
  - report.Message = "The percentage of free memory will be too low during a migration!"
  - report.Healthy = false
  - status = http.StatusExpectationFailed
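A monitoring client might translate the status codes listed above into coarse categories before inspecting the message. The sketch below is one way to do that; `describeStatus` is a hypothetical helper, and treating `http.StatusOK` as the healthy case is an assumption, since the status of a healthy response is not stated here.

```go
package main

import (
	"fmt"
	"net/http"
)

// describeStatus maps the HTTP status codes listed above to a short label.
// Assumption: a healthy report is served with 200 OK.
func describeStatus(status int) string {
	switch status {
	case http.StatusOK: // 200
		return "healthy"
	case http.StatusExpectationFailed: // 417
		return "unhealthy: still initialising or a capacity check failed"
	case http.StatusInternalServerError: // 500
		return "error: invalid watermark value"
	case http.StatusGone: // 410
		return "no data: no metrics were found"
	default:
		return "unexpected status"
	}
}

func main() {
	for _, code := range []int{
		http.StatusOK,
		http.StatusExpectationFailed,
		http.StatusInternalServerError,
		http.StatusGone,
	} {
		fmt.Printf("%d: %s\n", code, describeStatus(code))
	}
}
```

Because several distinct conditions share `http.StatusExpectationFailed`, the `message` field of the response body is still needed to tell them apart.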
Deployment
Watermark value
The watermark value is the number of Diego cells that will be excluded from the remaining capacity calculation. The intention is for this value to match the number of cells you would upgrade in parallel when performing a bosh deploy. On that basis, WatermarkMemoryPercent shows the percentage of spare capacity during an upgrade event, which helps ensure that app migrations between draining cells can happen in a timely manner.
This value can be supplied either as the number of cells to upgrade in parallel, or as a percentage. It has a default value of 1.
Example:
With 50 Diego cells:
WATERMARK: 10% - Watermark count = 5 (10% of 50 cells)
WATERMARK: 10 - Watermark count = 10 (absolute number of cells)
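The two forms of the WATERMARK setting can be resolved to a cell count roughly as follows. This is an illustrative sketch, not the application's own code: `resolveWatermark` is a hypothetical name, and rounding fractional percentages down and rejecting negative values are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveWatermark converts a WATERMARK setting into a number of cells.
// "10%" is read as a percentage of cellCount, "10" as an absolute count.
// Assumptions: an empty value falls back to the documented default of 1,
// and fractional percentage results are rounded down.
func resolveWatermark(value string, cellCount int) (int, error) {
	value = strings.TrimSpace(value)
	if value == "" {
		return 1, nil
	}

	if strings.HasSuffix(value, "%") {
		raw := strings.TrimSpace(strings.TrimSuffix(value, "%"))
		pct, err := strconv.Atoi(raw)
		if err != nil {
			return 0, fmt.Errorf("invalid watermark percentage %q: %w", value, err)
		}
		if pct < 0 {
			return 0, fmt.Errorf("watermark percentage %q must not be negative", value)
		}
		return cellCount * pct / 100, nil // integer division rounds down
	}

	n, err := strconv.Atoi(value)
	if err != nil {
		return 0, fmt.Errorf("invalid watermark %q: %w", value, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("watermark %q must not be negative", value)
	}
	return n, nil
}

func main() {
	for _, setting := range []string{"10%", "10", ""} {
		count, err := resolveWatermark(setting, 50)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("WATERMARK=%q with 50 cells -> watermark count %d\n", setting, count)
	}
}
```

With 50 cells this yields 5 for `10%` and 10 for `10`, matching the example above.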
Manual deployment
cf target -o <my_org> -s <my_space>
cf push --no-start
cf set-env diego-capacity-monitor CF_API_ENDPOINT <https://api.system.domain.cf>
cf set-env diego-capacity-monitor CF_USERNAME <CF_USERNAME_FOR_FIREHOSE_CONNECTION>
cf set-env diego-capacity-monitor CF_PASSWORD <CF_PASSWORD_FOR_FIREHOSE_CONNECTION>
cf set-env diego-capacity-monitor WATERMARK <optional, value will default to 1>
cf start diego-capacity-monitor
Automated zero-downtime deployment
CF_SYS_DOMAIN=system.example.cf.com \
CF_DEPLOY_USERNAME=cf_admin \
CF_DEPLOY_PASSWORD=123456789abcdef \
ORG_NAME=my_org \
SPACE_NAME=my_space \
CF_API_ENDPOINT=https://api.system.domain.cf \
CF_USERNAME=cf_firehose_username \
CF_PASSWORD=cf_firehose_password \
APP_NAME=my_diego_capacity_monitoring_app \
./deploy.sh
Development
Currently, this repo should be manually cloned into $GOPATH/src/github.com/FidelityInternational/diego-capacity-monitor, as the Godeps.json file has the FidelityInternational github.com import path set (which will be used when we open source this project).
Testing
Prereqs
brew install redis
go get github.com/EverythingMe/disposable-redis
go get github.com/onsi/ginkgo/ginkgo
Test
ginkgo -r -cover
Smoke Tests
APP_URL=<diego-capacity-monitor.apps.example.com> \
./smoke_test.sh
