Trivy Vulnerability Remediation Guide
Production Status
Last Updated: 2026-01-12
Trivy Operator Status: ✅ OPERATIONAL (since 2026-01-05 deployment)
Current Alerts:
- ✅ CriticalVulnerabilitiesDetected: Firing (4 images with CRITICAL CVEs)
- ✅ HighVulnerabilityCount: Firing for 3 images
- ✅ ExposedSecretsDetected: Not firing (no secrets exposed)
- ✅ Alert routing to email confirmed working
Current Vulnerability Summary (2026-01-12):
| Severity | Count | Change from Initial (2026-01-05) |
|---|---|---|
| CRITICAL | 10 | ⬇️ -43 (-81%) |
| HIGH | 332 | ⬇️ -422 (-56%) |
| MEDIUM | 1,074 | ⬇️ -425 (-28%) |
| Reports | 95 | +18 images scanned |
Remaining CRITICAL Vulnerabilities:
| Component | CRITICAL | Blocker |
|---|---|---|
| Synology CSI (controller) | 3 | Awaiting upstream v1.2.2 |
| Synology CSI (snapshotter) | 3 | Awaiting upstream v1.2.2 |
| Synology CSI (node) | 3 | Awaiting upstream v1.2.2 |
| Trivy Server | 1 | Awaiting Alpine base image fix |
Overview
This guide provides procedures for responding to and remediating vulnerabilities detected by Trivy Operator in the Raspberry Pi 5 Kubernetes homelab cluster.
Monitoring and Alerting
Grafana Dashboard
Access the Trivy Security Dashboard at: https://grafana.k8s.n37.ca
Dashboard Panels:
- Total Vulnerabilities: Critical, High, and Medium severity counts
- Images Scanned: Total number of container images monitored
- Vulnerabilities by Image: Sortable table with severity breakdown
- Severity Distribution: Pie chart showing vulnerability composition
- Namespace Breakdown: Critical+High vulnerabilities by namespace
Active Alerts
PrometheusRule alerts configured in manifests/base/trivy-operator/trivy-alerts.yaml:
| Alert Name | Severity | Threshold | Purpose |
|---|---|---|---|
| CriticalVulnerabilitiesDetected | Critical | Any image with CRITICAL CVEs | Immediate notification of critical security issues |
| HighVulnerabilityCount | Warning | >20 HIGH vulnerabilities in single image | Warn when vulnerability count is excessive |
| ClusterCriticalVulnerabilityThresholdExceeded | Warning | >100 CRITICAL across cluster | Cluster-wide security posture degradation |
| ExposedSecretsDetected | Critical | Any exposed secrets in images | IMMEDIATE ACTION REQUIRED |
| HighRiskRBACPermissions | Warning | Critical RBAC issues | Overly permissive cluster roles |
| CISKubernetesBenchmarkFailures | Info | CIS compliance failures | Compliance monitoring |
| NSAKubernetesHardeningFailures | Info | NSA hardening failures | Security hardening gaps |
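For reference, a minimal sketch of what the first rule could look like. The metric name trivy_image_vulnerabilities and its severity label are exposed by trivy-operator's Prometheus metrics endpoint, but the exact expression, for: duration, and annotations below are illustrative, not copied from trivy-alerts.yaml:

```shell
# Sketch of a PrometheusRule for CriticalVulnerabilitiesDetected.
# Assumptions: trivy-operator metrics are scraped, and the metric is
# trivy_image_vulnerabilities with a severity="Critical" label.
cat > /tmp/trivy-critical-alert.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: trivy-alerts
  namespace: trivy-system
spec:
  groups:
    - name: trivy
      rules:
        - alert: CriticalVulnerabilitiesDetected
          expr: |
            max by (namespace, image_repository)
              (trivy_image_vulnerabilities{severity="Critical"}) > 0
          for: 15m
          labels:
            severity: critical
          annotations:
            summary: "{{ $labels.image_repository }} has CRITICAL CVEs"
EOF
# Review the generated manifest, then apply it via your GitOps flow.
```

Verify the metric name against your deployed trivy-operator version before relying on this expression.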
Vulnerability Response Workflow
1. Alert Triage (within 1 hour)
When receiving a vulnerability alert:
# View all vulnerability reports
kubectl get vulnerabilityreports -A
# Get detailed report for specific workload
kubectl get vulnerabilityreport -n <namespace> <report-name> -o yaml
# Filter for CRITICAL vulnerabilities only
kubectl get vulnerabilityreports -A -o json | \
jq -r '.items[] | select(.report.summary.criticalCount > 0) |
"\(.metadata.namespace)/\(.metadata.labels."trivy-operator.resource.name"): \(.report.summary.criticalCount) CRITICAL"'
2. Vulnerability Assessment
For each CRITICAL vulnerability:
- Identify the CVE:
kubectl get vulnerabilityreport -n <namespace> <report> -o json | \
jq -r '.report.vulnerabilities[] | select(.severity == "CRITICAL") |
"\(.vulnerabilityID): \(.title)"'
- Check exploitability:
  - Review CVE details at nvd.nist.gov
  - Check if vulnerability is remotely exploitable
  - Determine if affected component is exposed to network
- Assess impact:
  - Is the vulnerable package actually used by the application?
  - What's the blast radius if exploited?
  - Are there compensating controls (network policies, RBAC)?
3. Remediation Strategies
Strategy A: Update Container Image (Preferred)
Best for: Vendor-maintained images (ArgoCD, Grafana, Prometheus)
# Check current image version
kubectl get deployment -n <namespace> <name> -o jsonpath='{.spec.template.spec.containers[0].image}'
# Check for newer image versions
# For Helm charts:
helm search repo <chart-name> --versions | head -10
# Update Helm chart version in ArgoCD Application
vim manifests/applications/<app>.yaml
# Update targetRevision to latest stable version
# Create PR and deploy
git add manifests/applications/<app>.yaml
git commit -m "fix: Update <app> to address CVE-YYYY-XXXXX"
git push
gh pr create
Strategy B: Rebuild Custom Images
Best for: Custom applications and images you control
# Update base image in Dockerfile
FROM debian:12-slim # Update to latest stable base image
# Rebuild and push
docker build -t <registry>/<image>:<new-tag> .
docker push <registry>/<image>:<new-tag>
# Update Kubernetes manifest
kubectl set image deployment/<name> <container>=<registry>/<image>:<new-tag> -n <namespace>
Strategy C: Accept Risk (Temporary)
Only when:
- No patch available from vendor
- Vulnerability not exploitable in your environment
- Critical business application cannot be updated immediately
Document in issue tracker:
## Accepted Risk: CVE-YYYY-XXXXX in <component>
**Severity:** CRITICAL
**Affected:** <namespace>/<workload>
**Reason:** [No patch available / Business critical / Isolated environment]
**Compensating Controls:**
- Network policy restricts ingress to pod
- RBAC limits pod permissions
- WAF/ingress filtering applied
**Remediation Plan:** Upgrade to <version> when available (ETA: <date>)
**Review Date:** <date>
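The compensating controls listed in the template map to concrete Kubernetes objects. As a sketch, a NetworkPolicy restricting ingress to the affected workload might look like this (the app label, the <namespace>/<workload> placeholders, and the monitoring allow-rule are illustrative assumptions to adapt):

```shell
# Generate a restrictive ingress policy for the vulnerable workload.
# Only pods in the "monitoring" namespace may reach it (placeholder rule).
cat > /tmp/restrict-vulnerable-pod.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-vulnerable-pod
  namespace: <namespace>
spec:
  podSelector:
    matchLabels:
      app: <workload>
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
EOF
# Fill in the placeholders, then: kubectl apply -f /tmp/restrict-vulnerable-pod.yaml
```

Note that NetworkPolicy only takes effect if the cluster CNI enforces it.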
4. Post-Remediation Verification
After applying fixes:
# Force Trivy rescan (Trivy rescans every 24h by default)
kubectl delete vulnerabilityreport -n <namespace> <report>
# Trivy will automatically regenerate the report
# Wait 2-3 minutes for scan to complete, then verify
kubectl get vulnerabilityreport -n <namespace> -o json | \
jq '.items[] | {name: .metadata.name, critical: .report.summary.criticalCount, high: .report.summary.highCount}'
# Check Grafana dashboard for updated metrics
Common Vulnerability Scenarios
Scenario 1: Base OS Package Vulnerabilities (Debian/Alpine)
Example: CVE-2024-37371 (Kerberos vulnerability in Debian base image)
Root Cause: Outdated base image layer
Remediation:
# Update base image to latest patch version
FROM debian:12.8-slim # Instead of debian:11-slim
# Or switch to distroless for minimal attack surface
FROM gcr.io/distroless/base-debian12
Scenario 2: Go/Python Library Vulnerabilities
Example: CVE-2024-45337 (golang.org/x/crypto SSH vulnerability)
Root Cause: Outdated Go module dependencies
Remediation:
# Update Go dependencies
go get -u golang.org/x/crypto@latest
go mod tidy
# Rebuild application
docker build -t <image>:<new-tag> .
Scenario 3: Exposed Secrets in Container Images
CRITICAL - IMMEDIATE ACTION REQUIRED
Example: AWS credentials, API keys, passwords in image layers
Remediation:
- Immediately rotate exposed credentials
- Remove secret from image:
# Use Kubernetes Secrets instead
kubectl create secret generic <name> --from-literal=api-key=<value>
# Mount as environment variable
env:
- name: API_KEY
  valueFrom:
    secretKeyRef:
      name: <name>
      key: api-key
- Rebuild image without secrets
- Review git history - ensure secrets never committed to source
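For the git-history review, git log -S (the "pickaxe") lists every commit that added or removed a matching string. A self-contained sketch using a throwaway repo in /tmp and a fake AWS-style key (the repo path and key value are demo assumptions):

```shell
# Build a demo repo containing a leaked credential in one commit.
rm -rf /tmp/secret-demo && git init -q /tmp/secret-demo
cd /tmp/secret-demo
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m "init"
echo 'AWS_KEY=AKIAIOSFODNN7EXAMPLE' > config.env   # fake key (AWS docs example value)
git add config.env
git -c user.email=demo@example.com -c user.name=demo commit -q -m "add config"

# -S prints only commits where the match count for the string changed:
git log -S 'AKIA' --oneline
# → one line: the "add config" commit
```

Run the same git log -S scan against your real application repos with patterns for each credential type (AKIA, BEGIN RSA PRIVATE KEY, etc.).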
Scenario 4: Third-Party Helm Chart Vulnerabilities
Example: Synology CSI driver with 5 CRITICAL vulnerabilities
Remediation:
# Check for chart updates
helm search repo synology-csi --versions
# If no update available, check upstream GitHub
# File issue: https://github.com/SynologyOpenSource/synology-csi/issues
# Temporary mitigation:
# - Apply network policies to restrict CSI pod access
# - Monitor for suspicious activity in CSI pods
Vulnerability Remediation Priorities
Priority 1: CRITICAL - Act within 24 hours
- Exposed secrets in images
- Remotely exploitable RCE vulnerabilities
- Privilege escalation in cluster-facing components (API server, kubelet)
Priority 2: HIGH - Act within 1 week
- High severity vulnerabilities in internet-facing services
- Container escape vulnerabilities
- Authentication bypass in exposed services
Priority 3: MEDIUM - Act within 1 month
- Medium severity vulnerabilities with no known exploits
- Vulnerabilities in internal-only services
- Denial of Service vulnerabilities
Priority 4: LOW - Best effort
- Low severity vulnerabilities
- Vulnerabilities in unused code paths
- Informational findings
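The tiers above can be applied mechanically to Trivy's report summaries. A sketch, where the sample JSON mimics the shape of kubectl get vulnerabilityreports -A -o json output, and the cutoffs (any CRITICAL => P1, >20 HIGH => P2) mirror the alert thresholds rather than any built-in Trivy feature:

```shell
# Sample with the same shape as `kubectl get vulnerabilityreports -A -o json`
cat > /tmp/reports.json <<'EOF'
{"items":[
  {"metadata":{"namespace":"monitoring","labels":{"trivy-operator.resource.name":"grafana"}},
   "report":{"summary":{"criticalCount":0,"highCount":25,"mediumCount":40}}},
  {"metadata":{"namespace":"synology-csi","labels":{"trivy-operator.resource.name":"csi-node"}},
   "report":{"summary":{"criticalCount":3,"highCount":9,"mediumCount":12}}}
]}
EOF

# Bucket each workload: any CRITICAL => P1, >20 HIGH => P2, else P3
jq -r '.items[] |
  (if .report.summary.criticalCount > 0 then "P1"
   elif .report.summary.highCount > 20 then "P2"
   else "P3" end)
  + "\t" + .metadata.namespace + "/"
  + .metadata.labels."trivy-operator.resource.name"' \
  /tmp/reports.json | sort
# → csi-node lands in P1, grafana in P2
```

Pipe the real kubectl output through the same filter to get a sorted worklist for the response windows above.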
Compliance and Reporting
Weekly Vulnerability Review
# Generate weekly vulnerability summary
kubectl get vulnerabilityreports -A -o json | \
jq -r '["NAMESPACE","RESOURCE","CRITICAL","HIGH","MEDIUM"],
(.items[] | [
.metadata.namespace,
.metadata.labels."trivy-operator.resource.name",
(.report.summary.criticalCount // 0),
(.report.summary.highCount // 0),
(.report.summary.mediumCount // 0)
]) | @tsv' | column -t
# Track remediation progress
# Compare against previous week's counts
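One low-tech way to track progress is to redirect the TSV summary above into a dated snapshot file each week and diff the counts. A sketch with hypothetical snapshot files (columns: workload, CRITICAL, HIGH; the file names and sample numbers are made up):

```shell
# Hypothetical weekly snapshots; in practice, save the jq summary output
# to dated files. join(1) requires both inputs sorted on the join field.
printf 'monitoring/grafana\t0\t25\nsynology-csi/csi-node\t3\t9\n' > /tmp/last-week.tsv
printf 'monitoring/grafana\t0\t4\nsynology-csi/csi-node\t3\t9\n'  > /tmp/this-week.tsv

# Join on workload name, then print the CRITICAL and HIGH deltas
join -t "$(printf '\t')" /tmp/last-week.tsv /tmp/this-week.tsv | \
  awk -F'\t' '{ printf "%s\tCRIT %+d\tHIGH %+d\n", $1, $4-$2, $5-$3 }'
# → monitoring/grafana  CRIT +0  HIGH -21
# → synology-csi/csi-node  CRIT +0  HIGH +0
```

Negative deltas show remediation progress; a positive delta flags a regression or newly scanned image worth triaging.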
Monthly Compliance Reports
# CIS Kubernetes Benchmark status
kubectl get clustercompliancereport k8s-cis-1.23 -o json | \
jq '{
title: .spec.title,
passCount: (.status.summary.passCount // 0),
failCount: (.status.summary.failCount // 0)
}'
# NSA Kubernetes Hardening Guidance
kubectl get clustercompliancereport k8s-nsa-1.0 -o json | \
jq '{
title: .spec.title,
passCount: (.status.summary.passCount // 0),
failCount: (.status.summary.failCount // 0)
}'
Preventive Measures
1. Image Selection Best Practices
- Prefer official images: Use official vendor images (e.g., prom/prometheus, grafana/grafana)
- Use minimal base images: Prefer alpine or distroless over full Debian/Ubuntu
- Pin specific versions: Avoid the :latest tag, use semantic versioning
- Verify image signatures: Use Cosign for image signature verification
2. Continuous Scanning
Trivy automatically scans:
- On deployment: New images scanned within minutes
- Daily rescans: All images rescanned every 24 hours
- Compliance checks: Daily CIS/NSA compliance assessment
3. Automated Updates
# Configure Renovate Bot for automated dependency updates
# .github/renovate.json
{
"extends": ["config:base"],
"kubernetes": {
"fileMatch": ["manifests/.+\\.yaml$"]
},
"helm-values": {
"fileMatch": ["manifests/base/.+/values\\.yaml$"]
}
}
Troubleshooting
Trivy Scan Failures
# Check Trivy Operator logs
kubectl logs -n trivy-system deployment/trivy-operator --tail=100
# Check Trivy server logs
kubectl logs -n trivy-system statefulset/trivy-server --tail=100
# Manually trigger scan
kubectl delete vulnerabilityreport -n <namespace> <report>
False Positives
If Trivy reports a vulnerability that doesn't apply:
- Verify the finding:
kubectl get vulnerabilityreport <name> -o json | \
jq '.report.vulnerabilities[] | select(.vulnerabilityID == "CVE-YYYY-XXXXX")'
- Check if package is actually used:
# Exec into container and verify
kubectl exec -n <namespace> <pod> -- dpkg -l | grep <package>
- Create exception if confirmed false positive:
# Add to trivy-operator values.yaml
trivyOperator:
  ignoreUnfixed: true
  ignoreVulnerabilities:
    - CVE-YYYY-XXXXX # Document reason
Resources
- Trivy Documentation: aquasecurity.github.io/trivy/
- CVE Database: nvd.nist.gov/
- Kubernetes Security Best Practices: kubernetes.io/docs/concepts/security/
- CIS Kubernetes Benchmark: www.cisecurity.org/benchmark/kubernetes
- NSA/CISA Hardening Guide: media.defense.gov/2022/Aug/29/2003066362/-1/-1/0/CTR_KUBERNETES_HARDENING_GUIDANCE_1.2_20220829.PDF
Current Cluster Status
Production Data (2026-01-12, post-Major Remediation Day):
| Severity | Count | Change from Initial (2026-01-05) | Remediation Impact |
|---|---|---|---|
| CRITICAL | 10 | ⬇️ -43 (-81%) | Multiple components remediated |
| HIGH | 332 | ⬇️ -422 (-56%) | Major sidecar and app updates |
| MEDIUM | 1,074 | ⬇️ -425 (-28%) | Cluster-wide improvements |
| TOTAL | ~1,416 | ⬇️ -890 (-39%) | Substantial security improvement ✅ |
Note: Major vulnerability remediation completed across multiple sessions. Only 10 CRITICAL vulnerabilities remain, all blocked on upstream vendor releases.
Vulnerability Trend: ⬇️ EXCELLENT (81% CRITICAL reduction achieved)
2026-01-11: Major Remediation Day Results
| Component | CRITICAL Before | CRITICAL After | HIGH Before | HIGH After |
|---|---|---|---|---|
| ArgoCD Redis | 3 | 0 ✅ | 34 | 0 ✅ |
| MetalLB FRR | 8 | 0 ✅ | 84 | 10 |
| Blackbox Exporter | 2 | 0 ✅ | 7 | 0 ✅ |
| SNMP Exporter | 2 | 0 ✅ | 6 | 0 ✅ |
| External-DNS | 1 | 0 ✅ | 7 | 1 |
| Snapshot Controller | 1 | 0 ✅ | 8 | 4 |
| CSI Snapshotter | 1 | 0 ✅ | 6 | 2 |
PRs Merged: #203, #205, #206, #207, #208, #209, #211, #212
2026-01-12: Synology CSI v1.2.1 Fix
- Issue: v1.2.1 node plugin had iscsiadm mount regression
- Solution: Added --chroot-dir=/host and --iscsiadm-path=/usr/sbin/iscsiadm flags
- PR #216: Successfully deployed, all PVC mounts working
- Impact: Node plugin now on v1.2.1 (was rolled back to v1.2.0)
Note: CRITICAL count unchanged, as v1.2.0 and v1.2.1 share the same base image vulnerabilities
Recent Remediation Actions:
- 2026-01-07 (evening): Promtail upgraded to 6.17.1 (app version 3.0.0 → 3.5.1)
- CRITICAL: 7 → 0 (100% elimination) ✅
- HIGH: 34 → 4 (88% reduction) ✅
- Deployment: Rolling update, 3 minutes, zero downtime
- Status: All 5 pods running successfully
- Cluster impact: 43 → 38 CRITICAL
- 2026-01-07 (late evening): Synology CSI sidecars upgraded
- csi-attacher: v4.0.0 → v4.10.0
- csi-node-driver-registrar: v2.3.0 → v2.15.0
- csi-snapshotter: v4.2.1 → v7.0.2
- synology-csi (node): v1.2.0 → v1.2.1
- Component CRITICAL: 13 → 11 (15% reduction, remaining in vendor base image)
- Component HIGH: 163 → 49 (70% reduction) ✅
- Deployment: Rolling updates across 3 StatefulSets/DaemonSets, all nodes updated successfully
- Verification: All PVCs remain Bound, test volume provisioning successful
- Cluster impact: 38 → 28 CRITICAL, 600 → 428 HIGH
The dramatic reduction in vulnerabilities demonstrates the effectiveness of targeted remediation of high-priority components.
Key CVEs to address:
- CVE-2024-37371: Kerberos GSS (affects multiple base images) - High Priority
- CVE-2024-41110: Docker/Moby authorization bypass - High Priority
- CVE-2024-45337: Golang SSH vulnerability - Medium Priority
- CVE-2024-24790: Golang net/netip issue - Medium Priority
Remediation Progress Tracking:
| Component | CRITICAL (Before → After) | HIGH (Before → After) | Remediation Status | Completion Date |
|---|---|---|---|---|
| Promtail | 7 → 0 ✅ | 34 → 4 ✅ | 🟢 Completed | 2026-01-07 |
| Synology CSI Sidecars | 8 → 0 ✅ | 114 → 20 ✅ | 🟢 Completed | 2026-01-07 |
| ArgoCD Redis | 3 → 0 ✅ | 34 → 0 ✅ | 🟢 Completed | 2026-01-11 |
| MetalLB FRR | 8 → 0 ✅ | 84 → 10 ✅ | 🟢 Completed | 2026-01-11 |
| Blackbox Exporter | 2 → 0 ✅ | 7 → 0 ✅ | 🟢 Completed | 2026-01-11 |
| SNMP Exporter | 2 → 0 ✅ | 6 → 0 ✅ | 🟢 Completed | 2026-01-11 |
| External-DNS | 1 → 0 ✅ | 7 → 1 ✅ | 🟢 Completed | 2026-01-11 |
| Snapshot Controller | 1 → 0 ✅ | 8 → 4 ✅ | 🟢 Completed | 2026-01-11 |
| CSI Snapshotter | 1 → 0 ✅ | 6 → 2 ✅ | 🟢 Completed | 2026-01-11 |
| Synology CSI (base image) | 9 | 27 | 🔴 Blocked | Awaiting v1.2.2 |
| Trivy Server | 1 | 7 | 🔴 Blocked | Awaiting Alpine fix |
Remediation Results:
Promtail (2026-01-07 evening):
- Version Upgrade: Helm chart 6.16.6 → 6.17.1 (app version 3.0.0 → 3.5.1)
- CRITICAL Reduction: 100% (7 → 0) - All CRITICAL CVEs eliminated ✅
- HIGH Reduction: 88% (34 → 4) - Reduced from 34 to just 4 HIGH CVEs ✅
- Deployment: Rolling update completed successfully in 3 minutes
- Verification: All 5 pods running, logs flowing, metrics available
- Resource Usage: Memory 26-35Mi (well under 128Mi limit)
- Cluster Impact: Cluster-wide CRITICAL count reduced from 43 → 38
Synology CSI Sidecars (2026-01-07 late evening):
- Component Upgrades (Final State):
  - ✅ csi-attacher: v4.0.0 → v4.10.0
  - ✅ csi-node-driver-registrar: v2.3.0 → v2.15.0
  - ✅ csi-snapshotter: v4.2.1 → v7.0.2
  - ⚠️ synology-csi (node): v1.2.0 → v1.2.1 → v1.2.0 (ROLLED BACK)
  - ✅ synology-csi (controller/snapshotter): v1.2.1 (unchanged)
- Issue Encountered:
  - After upgrading the synology-csi node plugin to v1.2.1, new iSCSI volume mounts failed with: env: can't execute 'iscsiadm': No such file or directory (exit status 127)
  - Grafana pod unable to start (stuck mounting PVC)
  - Existing PVCs mounted before upgrade remained functional
  - Root cause: v1.2.1 container regression - cannot find iscsiadm on host for new mounts
  - Resolution: Hotfix PR #201 rolled back node plugin to v1.2.0, Grafana restored successfully
- CRITICAL Reduction: 15% (13 → 11) - Upgraded sidecars now 0 CRITICAL, remaining 11 in vendor base image
- HIGH Reduction: 70% (163 → 49) - Major reduction in sidecar vulnerabilities ✅
- Deployment: Partial upgrade successful:
  - ✅ Controller StatefulSet: 3 sidecar containers updated (csi-attacher, csi-provisioner, csi-resizer)
  - ✅ Node DaemonSet: csi-node-driver-registrar v2.15.0, synology-csi v1.2.0 (rolled back)
  - ✅ Snapshotter StatefulSet: csi-snapshotter v7.0.2
- Verification:
  - All 9 CSI pods Running successfully with rollback
  - All 4 PVCs Bound and accessible (Prometheus 50Gi, Grafana 5Gi, Loki 20Gi, Trivy 5Gi)
  - Grafana pod fully operational after rollback
  - CSI driver registration confirmed
- Cluster Impact: Cluster-wide CRITICAL reduced 38 → 28, HIGH reduced 600 → 428
- Resolved Issue (2026-01-12): synology-csi v1.2.1 node plugin iscsiadm regression fixed by adding --chroot-dir=/host and --iscsiadm-path=/usr/sbin/iscsiadm flags (PR #216). All nodes now running v1.2.1 with working iSCSI mounts.
Next Steps:
- Trivy Operator deployed and operational
- Monitoring and alerting confirmed working
- Review and update Promtail to latest version (Priority 1) ✅ COMPLETED 2026-01-07
- Check for Synology CSI driver updates (Priority 1) ✅ COMPLETED 2026-01-12
- Sidecars successfully upgraded (csi-attacher, csi-node-driver-registrar, csi-snapshotter)
- Node plugin upgraded to v1.2.1 with iscsiadm-path fix (PR #216)
- Remaining 9 CRITICAL in base image - awaiting upstream v1.2.2 release
- Review and update ArgoCD Redis (Priority 2) ✅ COMPLETED 2026-01-11
- Chart upgraded 9.0.5 → 9.2.4, Redis 8.2.2-alpine
- All 3 CRITICAL and 34 HIGH vulnerabilities eliminated
- Major Remediation Day (2026-01-11) ✅ COMPLETED
- MetalLB, Blackbox Exporter, SNMP Exporter, External-DNS upgraded
- Snapshot Controller and CSI Snapshotter upgraded to v8.x
- 18 CRITICAL vulnerabilities eliminated in single day
- Monitor upstream releases:
- Synology CSI v1.2.2 (fixes 9 CRITICAL) - check GitHub releases
- Trivy Server Alpine base image update (fixes 1 CRITICAL)
- Implement automated update pipeline (Renovate Bot)