Blackbox Exporter
Overview
Blackbox Exporter is a Prometheus exporter that allows blackbox probing of endpoints over HTTP, HTTPS, DNS, TCP, and ICMP protocols. It's used for monitoring external service availability, SSL certificate expiry, network latency, and overall connectivity.
Key Features
- Multi-Protocol Probing: HTTP, HTTPS, DNS, TCP, and ICMP
- SSL/TLS Monitoring: Certificate expiry tracking and TLS version validation
- Response Time Metrics: Latency and performance monitoring
- Flexible Configuration: Customizable probe modules for different use cases
- Comprehensive Alerts: Automatic alerting for service downtime, certificate expiry, and degraded performance
Use Cases in This Cluster
- Internal Service Monitoring: HTTP/HTTPS probes for ArgoCD and Grafana
- External Connectivity: Monitoring internet connectivity via Google and Cloudflare
- DNS Health: Query monitoring for gateway, Cloudflare, and Google DNS servers
- Certificate Expiry: SSL certificate monitoring for all external HTTPS endpoints
- Infrastructure Availability: ICMP ping monitoring for NAS, gateway, and critical infrastructure
Architecture
┌─────────────────┐
│ Prometheus │
│ │
│ (Scrapes │
│ Blackbox │
│ Exporter) │
└────────┬────────┘
│
│ HTTP GET /probe?target=X&module=Y
│
▼
┌─────────────────────────────┐
│ Blackbox Exporter │
│ │
│ - HTTP/HTTPS Prober │
│ - DNS Prober │
│ - ICMP Prober │
│ - TCP Prober │
└──────────┬──────────────────┘
│
│ Probe Target
│
▼
┌──────────┐
│ Target │
│ Endpoint │
└──────────┘
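The flow above boils down to URL construction: Prometheus scrapes the exporter's /probe endpoint, passing the real target and module as query parameters. A small illustrative sketch (the exporter address and module name come from this cluster's configuration; the helper function itself is hypothetical):

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_probe_url(exporter: str, target: str, module: str) -> str:
    """Build the /probe URL Prometheus uses to ask the exporter to probe a target."""
    query = urlencode({"target": target, "module": module})
    return f"http://{exporter}/probe?{query}"

url = build_probe_url("blackbox-exporter:9115",
                      "https://grafana.k8s.n37.ca",
                      "https_cert_expiry")
print(url)
```

The key point is that the exporter, not Prometheus, performs the actual probe; Prometheus only fetches the resulting metrics.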
Deployment Details
Container Image
- Image: prom/blackbox-exporter:v0.28.0
- Port: 9115 (HTTP metrics and probes)
- Probes: Liveness and readiness checks on /health
Upgraded from v0.25.0 to v0.28.0 to address 2 CRITICAL and 7 HIGH vulnerabilities; the current image scan reports 0 known vulnerabilities.
Resource Allocation
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 200m
memory: 128Mi
Security Context
- RunAsNonRoot: true
- RunAsUser: 65534 (nobody)
- ReadOnlyRootFilesystem: true
- Capabilities: NET_RAW (required for ICMP probes)
Configuration
Probe Modules
The Blackbox Exporter is configured with multiple probe modules for different monitoring scenarios:
HTTP/HTTPS Probes
http_2xx - Basic HTTP probe
prober: http
timeout: 5s
http:
valid_status_codes: [] # 2xx
method: GET
follow_redirects: true
https_cert_expiry - HTTPS with certificate validation
prober: http
timeout: 5s
http:
method: GET
fail_if_not_ssl: true
tls_config:
insecure_skip_verify: false
DNS Probe
dns_query - DNS resolution monitoring
prober: dns
timeout: 5s
dns:
query_name: "kubernetes.default.svc.cluster.local"
query_type: "A"
ICMP Probe
icmp_ping - Network connectivity via ping
prober: icmp
timeout: 5s
icmp:
preferred_ip_protocol: "ip4"
TCP Probe
tcp_connect - TCP port connectivity
prober: tcp
timeout: 5s
Monitored Targets
Internal Services (HTTP)
- http://argocd-server.argocd:80 - ArgoCD server
External Services (HTTPS)
- https://argocd.k8s.n37.ca - ArgoCD external access
- https://grafana.k8s.n37.ca - Grafana dashboards
- https://google.com - External connectivity test
- https://cloudflare.com - External connectivity test
DNS Servers
- 10.0.1.1:53 - Gateway DNS
- 1.1.1.1:53 - Cloudflare DNS
- 8.8.8.8:53 - Google DNS
ICMP Targets
- 10.0.1.204 - Synology NAS
- 10.0.1.1 - Network gateway
- 8.8.8.8 - Google DNS (connectivity test)
Prometheus Integration
Scrape Configuration
The Blackbox Exporter is configured in Prometheus via additionalScrapeConfigs with four separate jobs:
- blackbox-http: HTTP endpoint monitoring (30s interval)
- blackbox-https: HTTPS with cert expiry (60s interval)
- blackbox-dns: DNS query monitoring (30s interval)
- blackbox-icmp: ICMP ping monitoring (30s interval)
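A representative job sketch for one of these entries, assuming typical Blackbox Exporter scrape-config conventions (the exact job bodies live in values.yaml; the targets shown here mirror the list above). Note that `metrics_path` and `params` are what select the /probe endpoint and the probe module:

```yaml
- job_name: 'blackbox-https'
  metrics_path: /probe
  scrape_interval: 60s
  params:
    module: [https_cert_expiry]
  static_configs:
    - targets:
        - https://argocd.k8s.n37.ca
        - https://grafana.k8s.n37.ca
```

On its own this job would scrape the targets directly; the relabeling covered in the next subsection is what redirects the scrape to the exporter.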
Relabeling Configuration
Each scrape job uses relabeling to properly set target labels:
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: blackbox-exporter:9115
This configuration:
- Moves the target address to a query parameter
- Sets the instance label to the target
- Directs Prometheus to scrape the Blackbox Exporter
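The three rules above can be modeled with a minimal Python sketch. This implements only the default `replace` action (no regex handling), which is all these rules use; the helper function is illustrative, not part of Prometheus:

```python
def apply_relabel(labels: dict, configs: list) -> dict:
    """Apply a minimal subset of relabel_configs semantics: the default
    'replace' action, with source label values joined by ';'."""
    out = dict(labels)
    for cfg in configs:
        # Without an explicit replacement, the joined source value is used.
        source = ";".join(out.get(name, "") for name in cfg.get("source_labels", []))
        out[cfg["target_label"]] = cfg.get("replacement", source)
    return out

relabel_configs = [
    {"source_labels": ["__address__"], "target_label": "__param_target"},
    {"source_labels": ["__param_target"], "target_label": "instance"},
    {"target_label": "__address__", "replacement": "blackbox-exporter:9115"},
]

result = apply_relabel({"__address__": "https://grafana.k8s.n37.ca"}, relabel_configs)
```

After relabeling, `__address__` points at the exporter while the original target survives as both the `target` query parameter and the `instance` label.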
Alerting
Alert Rules
Comprehensive alerting is configured via PrometheusRule:
Service Availability
| Alert Name | Condition | Duration | Severity |
|---|---|---|---|
| EndpointDown | probe_success == 0 | 5 minutes | Critical |
| EndpointDegraded | probe_success == 0 | 1 minute | Warning |
SSL Certificates
| Alert Name | Condition | Duration | Severity |
|---|---|---|---|
| SSLCertificateExpiresIn30Days | Expires in < 30 days | 1 hour | Warning |
| SSLCertificateExpiresIn7Days | Expires in < 7 days | 1 hour | Critical |
| SSLCertificateExpired | Expired certificate | 5 minutes | Critical |
| TLSVersionTooOld | TLS 1.0 or 1.1 | 1 hour | Warning |
Performance
| Alert Name | Condition | Duration | Severity |
|---|---|---|---|
| HighHTTPResponseTime | Response time > 5s | 5 minutes | Warning |
| VeryHighHTTPResponseTime | Response time > 10s | 2 minutes | Critical |
| HighDNSResponseTime | DNS lookup > 1s | 5 minutes | Warning |
| HighICMPLatency | Ping latency > 100ms | 5 minutes | Warning |
DNS & Network
| Alert Name | Condition | Duration | Severity |
|---|---|---|---|
| DNSQueryFailed | DNS probe fails | 5 minutes | Critical |
| HostUnreachable | ICMP probe fails | 5 minutes | Critical |
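As a sketch of how one of these rules could be expressed in the PrometheusRule resource (the group name and annotation text here are illustrative assumptions; the alert name, expression, duration, and severity match the table above):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: blackbox-exporter-alerts
spec:
  groups:
    - name: blackbox.availability
      rules:
        - alert: EndpointDown
          expr: probe_success == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Endpoint {{ $labels.instance }} is down"
```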
Key Metrics
Probe Success
- probe_success: 1 if the probe succeeded, 0 if it failed
- probe_duration_seconds: Total probe duration
HTTP Metrics
- probe_http_status_code: HTTP status code returned
- probe_http_duration_seconds: HTTP request duration by phase
- probe_http_redirects: Number of redirects followed
- probe_http_ssl: 1 if SSL was used
SSL/TLS Metrics
- probe_ssl_earliest_cert_expiry: Unix timestamp of certificate expiry
- probe_tls_version_info: TLS version used (1.0, 1.1, 1.2, 1.3)
DNS Metrics
- probe_dns_lookup_time_seconds: DNS query duration
- probe_dns_answer_rrs: Number of DNS answer records
ICMP Metrics
- probe_icmp_duration_seconds: ICMP round-trip time
- probe_icmp_reply_hop_limit: TTL of ICMP reply
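The /probe endpoint returns these metrics in the flat Prometheus text exposition format. A minimal, illustrative parser (it skips HELP/TYPE comments and labeled series; the sample values below are made up):

```python
def parse_probe_metrics(text: str) -> dict:
    """Parse flat Prometheus exposition text into {metric_name: value}."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        name, _, value = line.partition(" ")
        if "{" not in name:  # ignore labeled series for this sketch
            metrics[name] = float(value)
    return metrics

sample = """\
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
probe_duration_seconds 0.123
probe_http_status_code 200
"""
m = parse_probe_metrics(sample)
```

This is handy for eyeballing a manual probe (see the debug commands below) without waiting for a Prometheus scrape.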
Useful Queries
Service Availability
# Current status of all probes
probe_success
# Failed probes
probe_success == 0
# Uptime percentage (last 24h)
avg_over_time(probe_success[24h]) * 100
SSL Certificate Expiry
# Days until certificate expires
(probe_ssl_earliest_cert_expiry - time()) / 86400
# Certificates expiring in < 30 days
(probe_ssl_earliest_cert_expiry - time()) / 86400 < 30
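The same arithmetic in Python, to make the units explicit: probe_ssl_earliest_cert_expiry is a Unix timestamp, so dividing the difference by 86400 seconds yields days. The helper names and severity thresholds below follow the alert table earlier in this document; the functions themselves are illustrative:

```python
import time

def days_until_expiry(expiry_unix: float, now: float = None) -> float:
    """Mirror the PromQL (probe_ssl_earliest_cert_expiry - time()) / 86400."""
    now = time.time() if now is None else now
    return (expiry_unix - now) / 86400

def severity(days: float) -> str:
    """Classify per the SSL alert thresholds: 30-day warning, 7-day critical."""
    if days < 0:
        return "expired"
    if days < 7:
        return "critical"
    if days < 30:
        return "warning"
    return "ok"

now = 1_700_000_000  # fixed "now" so the example is deterministic
days = days_until_expiry(now + 45 * 86400, now=now)
```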
Response Times
# HTTP response times
probe_http_duration_seconds
# DNS query times
probe_dns_lookup_time_seconds
# ICMP ping latency
probe_icmp_duration_seconds
# 95th percentile of total probe duration (last 1h).
# probe_http_duration_seconds is a per-phase gauge, not a histogram,
# so quantile_over_time is used rather than histogram_quantile.
quantile_over_time(0.95, probe_duration_seconds{job="blackbox-https"}[1h])
Connectivity Health
# ICMP packet loss rate
1 - avg_over_time(probe_success{job="blackbox-icmp"}[5m])
# DNS failure rate
1 - avg_over_time(probe_success{job="blackbox-dns"}[5m])
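Since probe_success is always 0 or 1, these failure-rate queries are just "one minus the mean" over the window. A deterministic Python equivalent over raw samples (the function name is illustrative):

```python
def loss_rate(samples: list) -> float:
    """Equivalent of 1 - avg_over_time(probe_success[window]) over raw 0/1 samples."""
    return 1 - sum(samples) / len(samples)

# One failed probe out of five in the window -> 20% loss
rate = loss_rate([1, 1, 1, 0, 1])
```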
Grafana Dashboards
Recommended Dashboards
- Prometheus Blackbox Exporter (ID: 7587)
  - Overview of all probes
  - Success rates and response times
  - SSL certificate status
- Blackbox Exporter SSL/TLS (ID: 13659)
  - Certificate expiry tracking
  - TLS version distribution
  - Certificate chain details
Custom Dashboard Panels
Service Availability Timeline
probe_success{job=~"blackbox.*"}
Certificate Expiry (Days)
(probe_ssl_earliest_cert_expiry - time()) / 86400
Response Time Heatmap
probe_http_duration_seconds
Troubleshooting
Common Issues
Probes Failing
Problem: probe_success == 0 for a target
Solutions:
1. Check that the target is reachable from the cluster:
   kubectl exec -it deployment/blackbox-exporter -- wget -O- http://target
2. Verify DNS resolution:
   kubectl exec -it deployment/blackbox-exporter -- nslookup target
3. Check the probe configuration and logs:
   kubectl logs deployment/blackbox-exporter
ICMP Probes Not Working
Problem: ICMP probes fail with permission errors or are blocked by network policy
Solutions:
- Verify the deployment has the NET_RAW capability:
  kubectl get deployment blackbox-exporter -o yaml | grep -A5 capabilities
- Verify the Calico NetworkPolicy allows ICMP egress:
kubectl get networkpolicy.p.projectcalico.org -n default allow-egress-icmp-calico -o yaml
Kubernetes NetworkPolicies only support TCP, UDP, and SCTP protocols — not ICMP. A Calico NetworkPolicy (allow-egress-icmp-calico) is required to permit ICMP egress from the blackbox-exporter pod. The policy uses Calico selector syntax (app == 'blackbox-exporter') and restricts ICMP to the exact probe targets: 10.0.1.1/32 (gateway), 10.0.1.204/32 (NAS), 8.8.8.8/32 (Google DNS).
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
name: allow-egress-icmp-calico
namespace: default
spec:
selector: app == 'blackbox-exporter'
types:
- Egress
egress:
- action: Allow
destination:
nets:
- 10.0.1.1/32
- 10.0.1.204/32
- 8.8.8.8/32
protocol: ICMP
SSL Certificate Warnings
Problem: SSL certificate metrics not appearing
Solution: Ensure the https_cert_expiry module is used for HTTPS targets, not http_2xx
High Memory Usage
Problem: Blackbox exporter consuming excessive memory
Solutions:
- Reduce probe frequency in additionalScrapeConfigs
- Limit the number of targets
- Increase memory limits if justified
Debug Commands
# Check deployment status
kubectl get deployment blackbox-exporter
# View logs
kubectl logs deployment/blackbox-exporter
# Test a probe manually
kubectl exec -it deployment/blackbox-exporter -- wget -O- \
'http://localhost:9115/probe?target=https://google.com&module=https_cert_expiry'
# View current configuration
kubectl get configmap blackbox-exporter-config -o yaml
# Check Prometheus targets
kubectl port-forward -n default svc/kube-prometheus-stack-prometheus 9090:9090
# Navigate to: http://localhost:9090/targets
Hairpin NAT Issues (Internal HTTPS Probes)
Problem: Probes to internal services via external DNS names (e.g., https://grafana.k8s.n37.ca) fail with "context deadline exceeded"
Symptoms:
Get "https://10.0.10.10": context deadline exceeded
Why it happens: Pods inside the cluster cannot reach external IPs (MetalLB LoadBalancer IPs) that route back to the same cluster. This is called "hairpin NAT" and causes connection timeouts due to NAT asymmetry.
Solution: Use hostAliases in the deployment to resolve external hostnames directly to the ingress ClusterIP:
# In blackbox-exporter-deployment.yaml
spec:
template:
spec:
hostAliases:
- ip: "10.98.168.24" # ingress-nginx ClusterIP
hostnames:
- "argocd.k8s.n37.ca"
- "grafana.k8s.n37.ca"
- "workflows.k8s.n37.ca"
Get the ingress ClusterIP:
kubectl get svc -n ingress-nginx ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'
The ClusterIP is stable unless the ingress-nginx service is deleted and recreated. If probes start failing after ingress-nginx changes, check if the ClusterIP has changed.
This fix was implemented to resolve certificate health check failures for internal HTTPS endpoints.
Maintenance
Adding New Targets
1. Edit manifests/base/kube-prometheus-stack/values.yaml
2. Add the target to the appropriate additionalScrapeConfigs job:
   - job_name: 'blackbox-https'
     static_configs:
       - targets:
           - https://new-service.k8s.n37.ca
3. Commit and let ArgoCD sync
Updating Probe Modules
1. Edit manifests/base/kube-prometheus-stack/blackbox-exporter-configmap.yaml
2. Modify or add probe modules as needed
3. Restart the deployment:
   kubectl rollout restart deployment/blackbox-exporter
Certificate Monitoring Best Practices
- Monitor certificates at least 30 days before expiry
- Set up critical alerts for 7-day threshold
- Verify cert-manager is renewing certificates automatically
- Test alert notifications regularly
References
- Official Documentation: Blackbox Exporter GitHub
- Probe Configuration: Configuration Guide
- Example Configurations: Examples
- Grafana Dashboards: Dashboard Gallery
Related Documentation
- Kube Prometheus Stack - Core monitoring stack
- SNMP Exporter - Synology NAS monitoring
- Cert Manager - Automatic certificate management
- Monitoring Overview - Complete monitoring architecture