OpenTelemetry Logo OpenTelemetry standardizes observability across environments - Image by OpenTelemetry

🚀 Introduction

In today’s distributed software landscape, observability is not a luxury—it’s a necessity. As systems grow in complexity, the ability to monitor, trace, and debug across services becomes critical for reliability and rapid iteration. OpenTelemetry (OTel) has emerged as the open standard for collecting traces, metrics, and logs, but setting up a full observability stack can be daunting, especially when you want parity between local development and production.

That’s where otel-stack comes in. This open-source repository provides a ready-to-use, containerized observability stack that you can spin up locally or deploy in production. It bridges the gap between experimentation and real-world monitoring, making it easy for teams to adopt best practices without the usual friction.

🔍 Why Observability Matters

  • Faster Debugging: Trace requests end-to-end across microservices, pinpointing bottlenecks and failures quickly.
  • 📊 Performance Insights: Collect metrics to understand system health, resource usage, and latency trends.
  • 🛎️ Proactive Monitoring: Set up alerts and dashboards to catch issues before they impact users.
  • 🔄 Consistency: Use the same stack locally and in production, reducing surprises and configuration drift.

🧰 What is otel-stack?

The otel-stack repository bundles:

  • 🟣 OpenTelemetry Collector (otelcol): Ingests, processes, and exports telemetry data (traces, metrics, logs) from your applications.
  • 🟡 Prometheus: Scrapes and stores metrics, enabling powerful monitoring and alerting.
  • 🟢 Grafana: Visualizes metrics, traces, and logs in beautiful dashboards.
  • 🟠 Tempo: A high-scale, cost-effective distributed tracing backend from Grafana Labs, storing and querying traces.
  • 🔵 Loki: A horizontally-scalable, highly-available log aggregation system inspired by Prometheus, for storing and querying logs.

All components are orchestrated via Docker Compose, with sensible defaults and extensible configuration. Whether you’re running on your laptop or deploying to the cloud, otel-stack provides a unified observability platform.

🧩 How the Stack Works

  • otelcol receives telemetry data from your instrumented applications (using OpenTelemetry SDKs), and routes traces to Tempo, metrics to Prometheus, and logs to Loki.
  • Prometheus scrapes metrics from otelcol and your services, storing them for analysis and alerting.
  • Tempo stores distributed traces, allowing you to follow requests across services.
  • Loki collects and indexes logs, making it easy to correlate logs with traces and metrics.
  • Grafana provides a single pane of glass to visualize and explore all your observability data.

💻 Local Development: Observability Without the Hassle

Developers can run the stack locally with a single command. Instrument your code with OpenTelemetry SDKs, and instantly see traces in Tempo, metrics in Prometheus, and logs in Loki—all visualized in Grafana. This tight feedback loop accelerates debugging and helps you build with confidence.

Example:

git clone https://github.com/meesum-Ali/otel-stack.git
cd otel-stack
docker compose up -d

Instrument your app, point it to the local collector, and you’re ready to go! 🎉

🌐 Production-Ready: Scale with Confidence

Otel-stack isn’t just for demos. With minor tweaks, you can deploy the same stack in production. Integrate with managed backends, scale out collectors, and customize dashboards to fit your needs. The repository includes documentation for cloud deployments and best practices for secure, reliable observability.

🌟 Real-World Use Cases

  • 🕵️‍♂️ Microservices Debugging: Trace requests across services to find latency sources and errors using Tempo.
  • 📈 Performance Monitoring: Visualize metrics like request rates, error counts, and resource usage in Prometheus and Grafana.
  • 🚨 Incident Response: Correlate logs (Loki), traces (Tempo), and metrics (Prometheus) in Grafana to diagnose and resolve outages faster.
  • 🔬 Continuous Improvement: Analyze trends over time to guide architectural and process improvements.

🏁 Getting Started

  1. 🧑‍💻 Clone the otel-stack repo.
  2. 📄 Follow the README to spin up the stack locally or in your environment.
  3. 🛠️ Instrument your applications using OpenTelemetry SDKs for your language.
  4. 📊 Explore traces in Tempo, metrics in Prometheus, and logs in Loki—all via Grafana.

📝 Example docker-compose.yml

Here’s a simplified version of the stack’s docker-compose.yml:

version: "3.8"
services:
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.97.0
    command: ["--config=/etc/otelcol/config.yaml"]
    volumes:
      - ./otelcol/otelcol.yaml:/etc/otelcol/config.yaml:ro
    ports:
      - "4317:4317"        # OTLP/gRPC IN (traces, metrics, logs)
      - "8888:8888"        # Collector internal metrics endpoint
    depends_on:
      - tempo
      - loki
      - prometheus

  grafana:
    image: grafana/grafana:11.0.0
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_USER: admin
      GF_SECURITY_ADMIN_PASSWORD: admin
    depends_on:
      - otelcol
      - tempo
      - loki
      - prometheus

  tempo:
    image: grafana/tempo:2.4.1
    command: ["-config.file=/etc/tempo/tempo.yaml"]
    volumes:
      - ./tempo/tempo.yaml:/etc/tempo/tempo.yaml:ro
    ports:
      - "3200:3200"   # Tempo HTTP query API

  loki:
    image: grafana/loki:3.0.0
    command: -config.file=/etc/loki/loki-local.yaml
    volumes:
      - ./loki/loki-local.yaml:/etc/loki/loki-local.yaml:ro
    ports:
      - "3100:3100"   # Loki push & query API

  prometheus:
    image: prom/prometheus:v2.52.0
    command:
      - '--config.file=/etc/prometheus/prometheus.yaml'
      - '--web.enable-remote-write-receiver'
    volumes:
      - ./prometheus/prometheus.yaml:/etc/prometheus/prometheus.yaml:ro
    ports:
      - "9090:9090"   # Prometheus UI & scrape endpoint

🎯 Conclusion

Observability shouldn’t be an afterthought. With otel-stack, you can bring world-class monitoring, tracing, and logging to every stage of development and operations. Try it out, contribute, and join the movement towards more reliable, transparent software systems. 🌍

Image Credit: OpenTelemetry Logo