Enterprise Homelab: K3s, Authelia & Longhorn on Proxmox with Terraform

Most Kubernetes homelab guides stop at “kubectl get pods” and call it a day. This one doesn’t.

This article documents a full production-grade homelab stack: three K3s nodes provisioned via Terraform on Proxmox, GitOps-managed with ArgoCD, persistent storage via Longhorn, and Authelia as a proper SSO gateway in front of every service. The kind of setup you’d actually trust to run real workloads.

It also documents every painful mistake along the way — because that’s the part nobody writes about.

The Stack

Proxmox (Bare Metal)
└── Terraform (proxmox provider)
    ├── vm-srv-k3s-11 (Master,  10.0.20.11, VLAN 20)
    ├── vm-srv-k3s-12 (Worker,  10.0.20.12, VLAN 20)
    └── vm-srv-k3s-13 (Worker,  10.0.20.13, VLAN 20)

K3s Cluster (spanning all three nodes)
├── ArgoCD         (GitOps controller)
├── Traefik        (Ingress + TLS termination)
├── cert-manager   (Wildcard cert via Let's Encrypt)
├── MetalLB        (Bare-metal LoadBalancer)
├── Longhorn       (Distributed block storage)
├── Authelia       (SSO + 2FA gateway)
└── Vaultwarden    (Self-hosted Bitwarden)

Everything is managed as code. The VMs are Terraform resources. The cluster applications are ArgoCD Applications pointing at a Git repository. No manual helm install, no imperative kubectl apply in production.

Provisioning the Nodes with Terraform

Each K3s node is a full clone of a cloud-init template (VM ID 9000) on Proxmox, created with the proxmox_virtual_environment_vm resource from the bpg/proxmox Terraform provider:

resource "proxmox_virtual_environment_vm" "vm_srv_k3s_11_master" {
  vm_id     = 211
  name      = "vm-srv-k3s-11"
  node_name = local.target_node
  tags      = ["k3s", "master", "kubernetes"]

  clone {
    vm_id = 9000
    full  = true
  }

  cpu {
    cores = 4
    type  = "host"
  }

  memory {
    dedicated = 8192
  }

  disk {
    datastore_id = local.storage
    interface    = "scsi0"
    size         = 40
    file_format  = "raw"
  }

  network_device {
    bridge  = "vmbr0"
    vlan_id = 20  # Dedicated server VLAN
  }

  initialization {
    ip_config {
      ipv4 {
        address = "10.0.20.11/24"
        gateway = "10.0.20.1"
      }
    }
    dns {
      servers = ["10.0.20.5"]
    }
    user_account {
      username = "dw"
      keys     = ["ssh-ed25519 ..."]
    }
  }
}

cpu.type = "host" passes through the host CPU flags directly — important for Longhorn’s checksumming and for any workload that benefits from AVX instructions. Don’t use the default kvm64 if you’re running real workloads.

The two worker definitions follow the same pattern with IPs .12 and .13.
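Rather than maintaining near-identical resource blocks, the workers can come from a single resource with for_each. A sketch, assuming the same locals as the master definition (the VM IDs and the shape of the map are illustrative):

```hcl
# Sketch: both workers from one resource via for_each.
# local.target_node and local.storage are assumed to exist as above.
locals {
  workers = {
    "vm-srv-k3s-12" = { vm_id = 212, ip = "10.0.20.12/24" }
    "vm-srv-k3s-13" = { vm_id = 213, ip = "10.0.20.13/24" }
  }
}

resource "proxmox_virtual_environment_vm" "vm_srv_k3s_worker" {
  for_each  = local.workers
  vm_id     = each.value.vm_id
  name      = each.key
  node_name = local.target_node
  tags      = ["k3s", "worker", "kubernetes"]

  clone {
    vm_id = 9000
    full  = true
  }

  # ... cpu, memory, disk, and network_device identical to the master ...

  initialization {
    ip_config {
      ipv4 {
        address = each.value.ip
        gateway = "10.0.20.1"
      }
    }
  }
}
```

Adding a fourth worker then becomes a one-line change to the map instead of a new resource block.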

Mistake 1: Docker Hub Rate Limits

The first thing that breaks on a fresh K3s cluster: Docker Hub rate limits.

Pods start appearing with ErrImagePull or ImagePullBackOff. Not because the images don’t exist — because Docker Hub has silently throttled anonymous pulls. In a homelab where you’re constantly tearing down and rebuilding, you hit the limit fast.

The fix: switch image sources entirely for the affected images.

  • Bitnami images (Postgres, Redis) → public.ecr.aws/bitnami/... (Amazon’s public registry, no rate limits)
  • Authelia → ghcr.io/authelia/authelia:latest (GitHub Container Registry, generous limits)

This should be in every K3s getting-started guide. It isn’t.
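An alternative that leaves existing docker.io image references untouched: K3s reads a registries.yaml on each node and can transparently redirect Docker Hub pulls to a mirror. A minimal sketch, assuming you point it at a mirror you trust (mirror.gcr.io is Google's public Docker Hub mirror):

```yaml
# /etc/rancher/k3s/registries.yaml — create on every node,
# then restart the k3s / k3s-agent service for it to take effect.
mirrors:
  docker.io:
    endpoint:
      - "https://mirror.gcr.io"
```

The same file also accepts per-registry credentials under a configs key if you'd rather authenticate to Docker Hub directly.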

Mistake 2: Longhorn & iSCSI on WSL

Longhorn uses iSCSI to attach its virtual block devices to the nodes, so every host needs the open-iscsi initiator. On a standard WSL Ubuntu installation, the iSCSI daemon is missing.

Symptom: pods stuck in ContainerCreating forever. Longhorn volumes stay Detached or report volume is not ready for workloads.

Fix — run this on every node (including WSL host if applicable):

sudo apt-get install -y open-iscsi
sudo systemctl enable iscsid
sudo systemctl start iscsid

Without this, Longhorn physically cannot attach its virtual disks to the nodes. The error messages are cryptic enough that most people spend hours debugging the wrong thing.
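It's worth confirming the daemon is actually up before installing Longhorn. The 1.6.x releases also ship a preflight script in the Longhorn repo that checks iSCSI, the NFS client, and required kernel modules in one pass (URL shown for the 1.6.1 tag used below; verify the path against the release you deploy):

```shell
# Confirm the iSCSI daemon is running on this node
sudo systemctl is-active iscsid

# Longhorn's environment check script validates iSCSI, NFS client,
# kernel modules, and other prerequisites across the node
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.6.1/scripts/environment_check.sh | bash
```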

The ArgoCD Application for Longhorn itself is straightforward once iSCSI is working:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: longhorn
  namespace: argocd
spec:
  source:
    repoURL: https://charts.longhorn.io
    targetRevision: 1.6.1
    chart: longhorn
    helm:
      values: |
        preUpgradeChecker:
          jobEnabled: false
  destination:
    server: https://kubernetes.default.svc
    namespace: longhorn-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

preUpgradeChecker.jobEnabled: false disables the pre-upgrade check job that fires on every ArgoCD sync and clutters your logs.

Traefik: TLS Termination at the Edge

Traefik runs in kube-system managed by ArgoCD, with HTTP-to-HTTPS redirect enforced at the ingress level and a wildcard certificate from cert-manager as the default TLS store:

source:
  repoURL: https://helm.traefik.io/traefik
  targetRevision: 27.0.2
  chart: traefik
  helm:
    values: |
      ports:
        web:
          redirectTo:
            port: websecure
        websecure:
          tls:
            enabled: true
      ingressRoute:
        dashboard:
          enabled: false
      tlsStore:
        default:
          defaultCertificate:
            secretName: wildcard-woitzik-dev-tls

The Traefik dashboard is disabled — it exposes too much information to be left on in a production-adjacent setup. Access it via kubectl port-forward if you need it.

Mistake 3: Authelia’s Five Failure Modes

Authelia is the most opinionated component in this stack. It fails hard and fast on configuration errors, which is actually good — but the error messages aren’t always obvious.

Here’s the full working Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: authelia
  namespace: apps
spec:
  replicas: 1
  selector:
    matchLabels:
      app: authelia
  template:
    metadata:
      labels:
        app: authelia
    spec:
      enableServiceLinks: false  # Critical — see Failure Mode 1
      containers:
      - name: authelia
        image: ghcr.io/authelia/authelia:latest
        ports:
        - containerPort: 9091
        volumeMounts:
        - name: config
          mountPath: /config
        - name: secrets
          mountPath: /config/secrets
          readOnly: true
        env:
        - name: AUTHELIA_STORAGE_POSTGRES_PASSWORD_FILE
          value: /config/secrets/storage-password
        - name: AUTHELIA_SESSION_REDIS_PASSWORD_FILE
          value: /config/secrets/redis-password
      volumes:
      - name: config
        configMap:
          name: authelia-config
      - name: secrets
        secret:
          secretName: authelia-secrets

Failure Mode 1: enableServiceLinks: false

Kubernetes automatically injects environment variables for every Service in the namespace — including AUTHELIA_PORT, AUTHELIA_PORT_9091_TCP, and others. These collide directly with Authelia’s own configuration keys and cause a fatal startup error. The fix: enableServiceLinks: false disables this injection entirely.
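You can watch the collision happen: before enableServiceLinks: false is set, any pod in a namespace containing a Service named authelia gets these variables injected (deployment and namespace names here follow this article's layout):

```shell
# Dump the Service-link variables Kubernetes injects into the pod.
# Every one of these names sits inside Authelia's AUTHELIA_* env
# namespace, which is why startup fails with a config parse error.
kubectl exec -n apps deploy/authelia -- env | grep '^AUTHELIA_PORT'
```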

Failure Mode 2: The Read-Only Filesystem

The notifier in Authelia’s configuration needs a writable path to write notification files (used for password reset emails in filesystem mode). The /config directory is mounted from a ConfigMap — which is read-only by design in Kubernetes.

Wrong:

notifier:
  filesystem:
    filename: '/config/notification.txt'  # ConfigMap = read-only = crash

Correct:

notifier:
  filesystem:
    filename: '/tmp/notification.txt'  # /tmp is always writable in containers

Failure Mode 3: Backend DNS Names

Authelia connects to Postgres and Redis using Kubernetes internal DNS. Across namespaces, the fully qualified form <service>.<namespace>.svc.cluster.local is required:

session:
  redis:
    host: 'redis-authelia.database.svc.cluster.local'
    port: 6379

storage:
  postgres:
    address: 'tcp://postgres-authelia.database.svc.cluster.local:5432'
    database: 'authelia'
    username: 'authelia'

Short names like redis-authelia only work within the same namespace. Since Authelia lives in apps and the databases in database, the fully qualified name is required.
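When in doubt, resolve the name from inside the cluster before blaming Authelia's config. A throwaway debug pod works (pod name and image are arbitrary):

```shell
# One-off pod that resolves the FQDN and is removed afterwards (--rm)
kubectl run dns-debug -n apps --rm -it --restart=Never \
  --image=busybox:1.36 -- nslookup redis-authelia.database.svc.cluster.local
```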

Failure Mode 4: YAML Corruption via Terminal Paste

Large YAML blocks pasted via cat <<EOF into a terminal buffer get silently truncated or corrupted. Authelia then crashes with a fatal parse error mid-configuration. The symptom looks like a config bug but is actually a paste artifact.

Fix: write the file in a proper editor and load it with kubectl create configmap --from-file=..., or better, commit it to Git and let ArgoCD sync it. Never trust terminal paste for multi-hundred-line configs.
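The --from-file route also makes updates idempotent when combined with a client-side dry run (the file and ConfigMap names follow this article's setup):

```shell
# Render the ConfigMap from the file on disk and apply it;
# re-running after edits updates the existing ConfigMap in place.
kubectl create configmap authelia-config -n apps \
  --from-file=configuration.yml \
  --dry-run=client -o yaml | kubectl apply -f -
```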

Failure Mode 5: The server.address Key

In current Authelia versions, the server bind address is configured as:

server:
  address: 'tcp://0.0.0.0:9091/'

Older guides use server.host and server.port separately. These keys are deprecated and cause a fatal error on startup in recent versions. If you’re copying config from a guide older than 6 months, double-check the key names against the current Authelia documentation.

The Full Authelia Configuration

server:
  address: 'tcp://0.0.0.0:9091/'

log:
  level: 'debug'

identity_validation:
  reset_password:
    jwt_secret: '/config/secrets/jwt-secret'

default_redirection_url: 'https://auth.yourdomain.com'

authentication_backend:
  file:
    path: '/config/users_database.yml'

session:
  name: 'authelia_session'
  domain: 'yourdomain.com'
  secret: '/config/secrets/session-secret'
  same_site: 'lax'
  expiration: '1h'
  inactivity: '5m'
  remember_me: '1M'
  redis:
    host: 'redis-authelia.database.svc.cluster.local'
    port: 6379
    database_index: 0

storage:
  encryption_key: '/config/secrets/storage-key'
  postgres:
    address: 'tcp://postgres-authelia.database.svc.cluster.local:5432'
    database: 'authelia'
    username: 'authelia'

notifier:
  filesystem:
    filename: '/tmp/notification.txt'

access_control:
  default_policy: 'one_factor'
  rules:
    - domain: 'auth.yourdomain.com'
      policy: 'bypass'

The Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: authelia-ingress
  namespace: apps
spec:
  ingressClassName: traefik
  rules:
  - host: auth.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: authelia
            port:
              number: 9091
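This ingress only exposes the portal itself. To put any other service behind Authelia, Traefik needs a forwardAuth middleware pointing at Authelia's forward-auth endpoint. A sketch, assuming the in-cluster service name authelia in the apps namespace (the endpoint path is Authelia's current forward-auth API; older versions used /api/verify):

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: authelia
  namespace: apps
spec:
  forwardAuth:
    # Authelia's forward-auth endpoint, reached via in-cluster DNS
    address: 'http://authelia.apps.svc.cluster.local:9091/api/authz/forward-auth'
    trustForwardHeader: true
    authResponseHeaders:
      - 'Remote-User'
      - 'Remote-Groups'
      - 'Remote-Email'
      - 'Remote-Name'
```

Any Ingress then opts in with the annotation traefik.ingress.kubernetes.io/router.middlewares: apps-authelia@kubernetescrd.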

The Result

After navigating Docker Hub rate limits, iSCSI daemons, Kubernetes service link injection, read-only ConfigMap filesystems, and deprecated configuration keys — the stack runs cleanly:

  • Every service behind Authelia SSO with Redis-backed sessions
  • Persistent storage via Longhorn distributed across three nodes
  • GitOps-managed via ArgoCD — every change is a Git commit
  • Wildcard TLS via cert-manager and Traefik
  • Zero manual kubectl apply in steady state

The entire infrastructure — from bare metal to running pods — is reproducible from a terraform apply and a Git repository.

Wrapping Up

If this level of network isolation and identity management sounds familiar from your corporate Azure environment, the same principles apply there — just with different primitives. Check out the Enterprise Terraform Blueprints if you’re building for regulated environments.