BagelHole

loki-logging

Configure Grafana Loki for log aggregation and analysis. Set up Promtail for log collection, write LogQL queries, and integrate with Grafana for visualization. Use when implementing lightweight log aggregation, especially in Kubernetes environments.

BagelHole 28 4 Updated 4mo ago
GitHub

Install

npx skillscat add bagelhole/devops-security-agent-skills/loki-logging

Install via the SkillsCat registry.

SKILL.md

Grafana Loki

Aggregate and query logs with Grafana Loki, the Prometheus-inspired logging system.

When to Use This Skill

Use this skill when:

  • Implementing cost-effective log aggregation
  • Building logging for Kubernetes environments
  • Integrating logs with Grafana dashboards
  • Querying logs with label-based filtering
  • Preferring lighter-weight alternative to ELK

Prerequisites

  • Docker or Kubernetes
  • Grafana for visualization
  • Promtail or other log shipper

Architecture Overview

┌─────────────┐     ┌──────────┐     ┌──────────┐
│ Application │────▶│ Promtail │────▶│   Loki   │
└─────────────┘     └──────────┘     └──────────┘
                                          │
                                          ▼
                                     ┌──────────┐
                                     │ Grafana  │
                                     └──────────┘

Docker Deployment

# docker-compose.yml
version: '3.8'

services:
  loki:
    image: grafana/loki:2.9.0
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml

  promtail:
    image: grafana/promtail:2.9.0
    volumes:
      - ./promtail-config.yaml:/etc/promtail/config.yaml
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    command: -config.file=/etc/promtail/config.yaml

  grafana:
    image: grafana/grafana:10.2.0
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin

volumes:
  loki-data:
  grafana-data:

Loki Configuration

# loki-config.yaml
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/cache
    shared_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_query_series: 5000
  max_query_parallelism: 2

chunk_store_config:
  max_look_back_period: 168h

table_manager:
  retention_deletes_enabled: true
  retention_period: 168h

Promtail Configuration

# promtail-config.yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # System logs
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*.log

  # Docker container logs
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'stream'

  # Application logs with parsing
  - job_name: application
    static_configs:
      - targets:
          - localhost
        labels:
          job: application
          __path__: /var/log/app/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            message: message
            timestamp: timestamp
      - labels:
          level:
      - timestamp:
          source: timestamp
          format: RFC3339

Kubernetes Deployment

# Using Helm
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.enabled=true \
  --set promtail.enabled=true

Promtail DaemonSet

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: promtail
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: promtail
  template:
    metadata:
      labels:
        app: promtail
    spec:
      containers:
        - name: promtail
          image: grafana/promtail:2.9.0
          args:
            - -config.file=/etc/promtail/promtail.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/promtail
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: promtail-config
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers

LogQL Queries

Basic Queries

# All logs from a job
{job="application"}

# Filter by label
{job="application", level="error"}

# Multiple labels
{namespace="production", container="api"}

# Regex match
{job=~"app.*"}

Log Pipeline

# Filter by content
{job="application"} |= "error"

# Exclude content
{job="application"} != "debug"

# Regex filter
{job="application"} |~ "user_id=[0-9]+"

# JSON parsing
{job="application"} | json | level="error"

# Line format
{job="application"} | json | line_format "{{.level}}: {{.message}}"

Metric Queries

# Count logs per second
count_over_time({job="application"}[5m])

# Rate of errors
rate({job="application", level="error"}[5m])

# Sum by label
sum by (level) (count_over_time({job="application"}[5m]))

# Top services by error count
topk(5, sum by (service) (count_over_time({level="error"}[1h])))

Aggregations

# Average log line length
avg_over_time({job="application"} | unwrap line_length [5m])

# Percentile of numeric field
quantile_over_time(0.95, {job="application"} | json | unwrap response_time [5m])

# Error percentage
sum(rate({job="application", level="error"}[5m])) 
/ 
sum(rate({job="application"}[5m])) * 100

Pipeline Stages

# promtail-config.yaml
pipeline_stages:
  # Parse JSON logs
  - json:
      expressions:
        level: level
        message: msg
        trace_id: trace_id

  # Extract with regex
  - regex:
      expression: 'user_id=(?P<user_id>\d+)'

  # Add labels from parsed fields
  - labels:
      level:
      user_id:

  # Modify timestamp
  - timestamp:
      source: timestamp
      format: '2006-01-02T15:04:05.000Z'

  # Filter logs
  - match:
      selector: '{level="debug"}'
      action: drop

  # Add static labels
  - static_labels:
      environment: production

  # Modify log line
  - template:
      source: message
      template: '{{ ToUpper .Value }}'

Grafana Integration

Data Source Configuration

# grafana/provisioning/datasources/loki.yaml
apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: false
    jsonData:
      maxLines: 1000

Dashboard Panel

{
  "title": "Application Logs",
  "type": "logs",
  "datasource": "Loki",
  "targets": [
    {
      "expr": "{job=\"application\"} | json",
      "refId": "A"
    }
  ],
  "options": {
    "showTime": true,
    "showLabels": true,
    "wrapLogMessage": true
  }
}

Recording Rules

# loki-rules.yaml
groups:
  - name: error_rates
    interval: 1m
    rules:
      - record: job:log_errors:rate5m
        expr: |
          sum by (job) (rate({level="error"}[5m]))

Alerting

# loki-alerts.yaml
groups:
  - name: log_alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({level="error"}[5m])) > 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate in logs"
          description: "Error rate is {{ $value }} errors/second"

Common Issues

Issue: High Memory Usage

Problem: Loki consuming too much memory
Solution: Reduce max_query_series, limit query time range

Issue: Logs Not Appearing

Problem: Promtail not shipping logs
Solution: Check positions file, verify file paths, check label configuration

Issue: Query Timeout

Problem: LogQL queries timing out
Solution: Add more specific label filters, reduce time range

Issue: Ingestion Rate Limit

Problem: Logs being dropped
Solution: Increase per_stream_rate_limit in limits_config

Best Practices

  • Use meaningful labels (avoid high cardinality)
  • Filter by labels before log content
  • Parse logs at collection time with Promtail
  • Set appropriate retention periods
  • Use recording rules for common queries
  • Implement proper multitenancy for large deployments
  • Monitor Loki's own metrics
  • Use chunk caching for better performance

Related Skills