# O&M Kubernetes Project
## Table of Contents
- Introduction
- Architecture
- Project Structure
- Prerequisites
- Installation
- Usage
- Components in Detail
- Configurations
- Troubleshooting
- Contribution
- License
## Introduction
O&M Kubernetes is a complete solution for implementing a modern observability and monitoring stack in Kubernetes environments. The project automates the deployment and management of an integrated suite of tools for monitoring, logging, tracing, and alerting, providing comprehensive visibility into infrastructure and applications.
The solution is designed around the three pillars of observability:
- Metrics: Collection and visualization of metrics with Prometheus and Grafana
- Logs: Aggregation and analysis of logs with Loki and Promtail
- Traces: Distributed tracing with Tempo
Additionally, the stack includes monitoring of external endpoints via Blackbox Exporter and advanced alert management through Alertmanager, with direct integration to webhooks (such as Discord).
## Architecture

### Components

The observability and monitoring stack consists of the following main components:
- OpenTelemetry Collector: Collects, processes, and exports telemetry data
- Prometheus: Time-series monitoring and alerting system
- Alertmanager: Alert and notification management
- Loki: Log aggregation system inspired by Prometheus
- Grafana: Visualization and analytics platform
- Promtail: Agent that sends logs to Loki
- Tempo: Distributed tracing system
- Blackbox Exporter: Monitoring of external endpoints via HTTP, HTTPS, DNS, TCP, and ICMP
### Data Flow

```
                ┌─────────────┐
                │ Applications│
                └──────┬──────┘
                       │
                       ▼
           ┌───────────────────────┐
           │     OpenTelemetry     │
           │       Collector       │
           └───┬───────┬───────┬───┘
               │       │       │
      ┌────────┘       │       └──────────┐
      │                │                  │
      ▼                ▼                  ▼
┌────────────┐  ┌─────────────┐     ┌─────────────┐
│ Prometheus │  │    Loki     │     │    Tempo    │
│ (Metrics)  │  │   (Logs)    │     │  (Traces)   │
└──────┬─────┘  └──────┬──────┘     └──────┬──────┘
       │               │                   │
       └────────┬──────┴─────────┬─────────┘
                │                │
                ▼                ▼
         ┌─────────────┐  ┌─────────────┐
         │   Grafana   │  │ Alertmanager│
         │  (Visual)   │  │  (Alerts)   │
         └─────────────┘  └─────────────┘
```
## Project Structure

```
observability-monitoring-kubernetes/
├── k8s/
│   ├── configmaps.yaml    # Configurations for all components
│   ├── deployments.yaml   # Kubernetes deployments for each service
│   ├── namespace.yaml     # Dedicated namespace definition
│   └── services.yaml      # Kubernetes service definitions
├── script.sh              # Stack management script
├── LICENSE                # License file (GNU GPL v3)
└── README.md              # This documentation
```
## Prerequisites

- **Kubernetes Cluster**: A functional Kubernetes cluster (Minikube, Kind, EKS, GKE, AKS, etc.)
- **kubectl**: The Kubernetes command-line tool (v1.20+)
  - Installation: https://kubernetes.io/docs/tasks/tools/
  - A correctly configured `kubeconfig` pointing to the desired cluster
- **Permissions**: Access to create and modify resources in the cluster (namespaces, deployments, services, configmaps)
- **Recommended Resources**:
  - At least 4 GB of available RAM
  - At least 2 vCPUs
  - At least 10 GB of disk space
## Installation

1. Clone the repository:

   ```shell
   gh repo clone gabrielldn/observability-monitoring-kubernetes
   cd observability-monitoring-kubernetes
   ```

2. Verify that kubectl is correctly configured:

   ```shell
   kubectl cluster-info
   ```

3. Grant execution permission to the script:

   ```shell
   chmod +x script.sh
   ```
## Usage

The `script.sh` script is the central point for managing the entire observability and monitoring stack.

### Deploy the Stack

To deploy the entire observability and monitoring stack:

```shell
./script.sh deploy
```

This command will:

1. Create the `observability` namespace
2. Apply all ConfigMaps with configurations
3. Deploy all components (Deployments)
4. Configure the Services for communication between components
### Check Status

To check the status of all components:

```shell
./script.sh status
```

This command will show:

- Status of all pods in the namespace
- Status of all services in the namespace
### View Logs

To view logs, there are several options:

```shell
# View logs of all pods
./script.sh logs

# View logs of a specific component
./script.sh logs grafana

# View logs in real-time (follow)
./script.sh logs loki -f

# View logs of a component in real-time
./script.sh logs prometheus -f
```

### Update Stack

To update the stack after configuration changes:

```shell
./script.sh update
```

### Remove Stack

To completely remove the stack from the cluster:

```shell
./script.sh destroy
```

This command removes all resources in the following order:

1. Services
2. Deployments
3. ConfigMaps
4. Namespace
## Components in Detail
### OpenTelemetry Collector

**Function**: Collects, processes, and exports telemetry data (metrics, logs, and traces).

**Features**:

- Supports gRPC (port 4317) and HTTP (port 4318) protocols
- Configured to send:
  - Metrics to Prometheus
  - Traces to Tempo
  - Logs to Loki
- Processors configured for data enrichment

**Access**: Internally via `otelcollector:4317` or `otelcollector:4318`

**Configuration**: See `configmaps.yaml`, section `otel-collector-config`
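To make the pipeline wiring concrete, a minimal Collector configuration along these lines could look as follows. This is an illustrative sketch, not the deployed file: the exporter names and endpoints (the `prometheus` metrics port `8889`, the contrib `loki` exporter) are assumptions, and the actual configuration lives in `configmaps.yaml`.

```yaml
# Illustrative OpenTelemetry Collector configuration (sketch, not the deployed file).
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}                # batches telemetry before export

exporters:
  prometheus:              # exposes collected metrics for Prometheus to scrape
    endpoint: 0.0.0.0:8889
  otlp/tempo:              # pushes traces to Tempo over OTLP gRPC
    endpoint: tempo:4317
    tls:
      insecure: true
  loki:                    # pushes logs to Loki (contrib exporter)
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [loki]
```

Each pipeline reuses the same OTLP receiver, so instrumented applications only need a single endpoint for all three signals.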
### Prometheus

**Function**: Time-series monitoring and alerting system.

**Features**:

- Scrape intervals configured to 15 seconds
- Collects metrics from all stack components
- Integrated with Blackbox Exporter for external monitoring
- Alert rules configured

**Access**: Internally via `prometheus:9090`

**Configuration**: See `configmaps.yaml`, section `prometheus-config`
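A 15-second scrape setup of this kind typically looks like the fragment below. It is a sketch only: the job names, rule-file path, and Collector metrics port are assumptions; the deployed file is the `prometheus-config` section of `configmaps.yaml`.

```yaml
# Illustrative prometheus.yml fragment (job names and paths are assumptions).
global:
  scrape_interval: 15s        # matches the interval described above
  evaluation_interval: 15s

rule_files:
  - /etc/prometheus/alert-rules.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

scrape_configs:
  - job_name: prometheus      # self-monitoring
    static_configs:
      - targets: ['localhost:9090']
  - job_name: otel-collector  # metrics exposed by the Collector's prometheus exporter
    static_configs:
      - targets: ['otelcollector:8889']
```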
### Alertmanager

**Function**: Manages alerts generated by Prometheus, including silencing, inhibition, and grouping.

**Features**:

- Configured to send alerts to a Discord webhook
- Alert grouping by `alert` and `job`
- Sends alert resolution notifications
- Repeat interval configured to 30 minutes

**Access**: Internally via `alertmanager:9093`

**Configuration**: See `configmaps.yaml`, section `alertmanager-config`
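A 30-minute repeat interval with grouping and a webhook receiver is usually expressed like this. It is a sketch: the webhook URL is a placeholder (the real one is set in `alertmanager-config`), and the grouping labels here use `alertname`, the standard Prometheus label name.

```yaml
# Illustrative alertmanager.yml fragment (webhook URL is a placeholder).
route:
  receiver: discord-webhook
  group_by: ['alertname', 'job']
  repeat_interval: 30m

receivers:
  - name: discord-webhook
    webhook_configs:
      - url: https://discord.com/api/webhooks/<id>/<token>
        send_resolved: true    # also notify when alerts resolve
```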
### Loki

**Function**: Log aggregation and query system.

**Features**:

- Simplified local storage
- Configurable log retention
- Integrated with Grafana for visualization
- Receives logs from Promtail and the OpenTelemetry Collector

**Access**: Internally via `loki:3100`

**Configuration**: See `configmaps.yaml`, section `loki-config`
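Configurable retention, for example, is usually tuned with a fragment along these lines. The values are illustrative and the exact keys depend on the Loki version; the deployed settings are in `loki-config`.

```yaml
# Illustrative Loki retention fragment (values are examples only).
limits_config:
  retention_period: 168h     # e.g. keep logs for 7 days

compactor:
  working_directory: /loki/compactor
  retention_enabled: true    # the compactor enforces the retention period
```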
### Grafana

**Function**: Visualization and analytics platform for metrics, logs, and traces.

**Features**:

- Pre-configured with datasources for Prometheus, Loki, and Tempo
- Default credentials: `admin`/`admin`
- Default theme set to "light"
- Correlation between metrics, logs, and traces

**Access**: Internally via `grafana:3000`

**Configuration**: See `configmaps.yaml`, section `grafana-datasource`
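The pre-configured datasources are typically provisioned with a file like the one below, using the service names from this stack. This is a sketch; the actual file is the `grafana-datasource` section of `configmaps.yaml`.

```yaml
# Illustrative Grafana datasource provisioning (sketch, not the deployed file).
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
```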
### Promtail

**Function**: Agent that collects logs and sends them to Loki.

**Features**:

- Automatic discovery of Docker containers with the label `logging=promtail`
- Support for multi-line and JSON formats
- Addition of labels based on container metadata

**Access**: Internally via `promtail:9080`

**Configuration**: See `configmaps.yaml`, section `promtail-config`
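Discovery of containers labeled `logging=promtail` is commonly done with Promtail's Docker service discovery, roughly as below. This is a sketch: the socket path and relabeling rules are assumptions, and the deployed file is in `promtail-config`.

```yaml
# Illustrative promtail.yml fragment (socket path and labels are assumptions).
server:
  http_listen_port: 9080

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        filters:
          - name: label
            values: ["logging=promtail"]   # only containers with this label
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: container            # label logs with the container name
```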
### Tempo

**Function**: Backend for storing and querying distributed tracing data.

**Features**:

- Supports OTLP, Jaeger, and other tracing formats
- Integration with Prometheus for derived metrics
- Integration with Grafana for visualization
- Integration with Loki to correlate traces with logs

**Access**: Internally via `tempo:3200`

**Configuration**: See `configmaps.yaml`, section `tempo-config`
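Accepting OTLP and Jaeger traffic is configured in Tempo's distributor receivers, roughly like this. It is a sketch: the storage path and protocol choices are assumptions, and the deployed file is in `tempo-config`.

```yaml
# Illustrative tempo.yml fragment (paths and protocols are assumptions).
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
        http:
    jaeger:
      protocols:
        thrift_http:

storage:
  trace:
    backend: local           # simple local storage, as in the rest of the stack
    local:
      path: /var/tempo/traces
```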
### Blackbox Exporter

**Function**: Monitoring of external endpoints via HTTP, HTTPS, DNS, TCP, and ICMP.

**Features**:

- Support for HTTP, TCP, and ICMP probes
- Monitoring of external site status
- Used by Prometheus for availability checks

**Access**: Internally via `blackbox-exporter:9115`

**Configuration**: See `configmaps.yaml`, section `blackbox-config`
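The probe modules are defined in a `blackbox.yml`-style file, typically along these lines. The module names here are conventional examples, not the confirmed ones; the deployed modules are in `blackbox-config`.

```yaml
# Illustrative Blackbox Exporter modules (names are conventional, not confirmed).
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
  tcp_connect:
    prober: tcp
  icmp:
    prober: icmp
```

Prometheus then scrapes the exporter's `/probe` endpoint at `blackbox-exporter:9115` with a `module` parameter, relabeling each external URL into the `target` query parameter.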
## Configurations

### Customization

To customize the stack:

1. **Adjust ConfigMaps**: Modify the configuration files in `k8s/configmaps.yaml`
2. **Adjust Resources**: Change resource limits in `k8s/deployments.yaml`
3. **Modify Endpoints**: Adjust the endpoints monitored by Blackbox in `k8s/configmaps.yaml`
4. **After changes**: Run `./script.sh update` to apply the modifications

### Alerts

The alert system is configured with:

- **Alert Rules**: Defined in `alert-rules.yml` within the Prometheus ConfigMap
- **Notifications**: Configured for Discord in `alertmanager.yml`
- **Customization**:
  - Modify alert rules in `configmaps.yaml`, section `prometheus-config`
  - Adjust webhooks in `configmaps.yaml`, section `alertmanager-config`
### Integrations
The stack comes pre-configured for integration with:
- Discord: For alert notifications
- Instrumented Applications: Via OpenTelemetry Collector
- Kubernetes: Monitoring of cluster resources
To add new integrations:
- Add new receivers in the OpenTelemetry Collector
- Configure new alertmanagers in Prometheus
- Add new datasources in Grafana
## Troubleshooting

Common issues and solutions:

1. Pods in CrashLoopBackOff state:

   ```shell
   # Check the logs of the problematic pod
   kubectl logs -n observability <pod-name>

   # Check pod events
   kubectl describe pod -n observability <pod-name>
   ```

2. Configuration issues:

   ```shell
   # Check if ConfigMaps were created correctly
   kubectl get configmaps -n observability

   # Inspect a specific ConfigMap
   kubectl get configmap -n observability <configmap-name> -o yaml
   ```

3. Inaccessible services:

   ```shell
   # Check if endpoints are correct
   kubectl get endpoints -n observability
   ```

4. Check connections between components:

   ```shell
   # Use kubectl exec to test connections between pods
   kubectl exec -it -n observability <pod-name> -- wget -O- <service>:<port>
   ```
## Contribution

Contributions are welcome! To contribute:

1. Fork the repository
2. Create a branch for your feature (`git checkout -b feature/new-feature`)
3. Commit your changes (`git commit -m 'Add new feature'`)
4. Push to the branch (`git push origin feature/new-feature`)
5. Open a Pull Request
## License
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Developed with ❤️ to simplify the implementation of observability and monitoring in Kubernetes environments.