Homelab

A custom server environment based on Linux and Proxmox, used for hosting various services.

About project

The project was created to learn how to design and maintain infrastructure independently. The setup is based on the Proxmox ecosystem (Virtual Environment and Backup Server) and serves as the backend for personal projects, self-hosted services, and systems administration practice. The environment allows for experimentation with containers, virtual machines, and various utility applications, all while maintaining full control over security and service access. It is primarily used by me, with occasional access for a few other users.

Hardware

Rear view

  • TP-Link ER605 v2 router acting as the network gateway.
  • TP-Link 5p LS1005G unmanaged switch.
  • Main Server: Lenovo ThinkCentre M720q Tiny, equipped with an i5-8500T CPU, 32 GB RAM, 500 GB NVMe, and 1 TB SATA SSD. I chose it for its low power consumption (~12.5W idle) paired with relatively high performance, allowing for 24/7 operation without significant energy costs.
  • Backup Server: Lenovo ThinkCentre M910q Tiny, equipped with an i3-7100T CPU, 16 GB RAM, 256 GB NVMe, and 500 GB SATA SSD.
  • Custom 3D-printed rack.

Environment Architecture

The virtualization platform is Proxmox Virtual Environment. Services are hosted in Linux or Docker containers, or VMs. LXC containers are used for their low resource overhead, while VMs are deployed where full isolation is required.

Network

The network is logically divided into VLANs that separate user traffic from server traffic. Inter-VLAN traffic is not filtered at the router level due to software limitations. Access control is implemented at the host level (PVE firewall), where specific machines have restrictive communication rules.

Example PVE firewall configuration for a public service: Additionally Input and Output Policy is set on Drop.

I experimented with deploying Omada Controller as a VM in Proxmox to unlock full traffic filtering features. However, I encountered an issue: the router adoption process forces a factory reset, which, given my homelab's IP addressing (192.168.2.0/24), causes a loss of connectivity with the controller.

GitOps for Docker services

In search of an easy way to update all Docker services, I implemented a simplified version of GitOps in my homelab. The stack update workflow is as follows:

  1. A new version of a Docker image is released.
  2. Renovate opens a pull request.
  3. I review and merge the pull request.
  4. Portainer polls the repo every 2 minutes:
    • If changes are detected, it pulls the latest docker-compose.yml from the repository.
    • A redeployment of the entire stack is then performed.

Portainer operates in polling mode because it is more secure than using a webhook, as it doesn't require exposing a public endpoint.

In this configuration, the entire process of detecting new versions and later deployment is fully automated.
My only task during an update is to check and merge the pull request.

If a service fails to start on a new version for any reason, I can easily roll back to the previous image by reverting the pull request.

Initially, all docker-compose.yml files were in a monorepo, but this caused an issue where Portainer would redeploy all existing stacks every time the repository was updated. Therefore, each Docker-hosted service now has its own GitHub repository.

Selected repositories are open source and available on GitHub in my organization.

I only store .env.example files in the repositories. Secrets and environment variables are configured directly within Portainer during the initial stack deployment. This ensures that sensitive data does not leak into the GitHub commit history.

With automated updates, stacks are updated much more frequently, leading to more images being downloaded. This created a new problem of local storage being filled with unused images.
To solve this, I implemented a cron job (docker image prune -af) that removes unused images daily.

Service Access

Only selected services are accessible from the internet; in the case of websites, they are exposed via Cloudflare Tunnel. When a tunnel isn't a viable option and a port must be opened, I do not expose my router's public IP address. Instead, I use an external reverse-proxy provider to minimize the impact of DoS/DDoS (Denial of Service/Distributed Denial of Service) attacks.

Access to private services that I do not want to expose directly to the internet is handled via the Tailscale VPN. This allows me to access them from anywhere in the world without compromising their security. For even greater security, I have restricted VPN access only to the VLAN segment dedicated to the homelab.

SSL Certificates

Every service in my homelab is assigned its own subdomain with a Let's Encrypt SSL certificate.
SSL certificates are generated using a DNS challenge. This allows services to remain inaccessible from the internet while still having a trusted SSL certificate.
Certificates are managed by a reverse proxy or directly by the specific service.

Monitoring and Notifications

Monitoring system currently consists of two parts:

  1. External monitoring: BetterStack monitors the availability of my public services and sends notifications about incidents.
  2. Local monitoring: For monitoring hosts, containers, and services within the network, I implemented Pulse. I chose this solution due to its simplicity and low resource consumption compared to Prometheus/Grafana.

Notifications about critical system incidents are sent directly to my email address.

In future with the increase in the scale of my homelab, I will consider implementing Grafana.

Email Infrastructure

To ensure reliable delivery of system notifications and professional handling of multiple email addresses in my domain, I implemented a hybrid routing model.

This solution combines two services:

  1. Cloudflare Email Routing: Handles inbound traffic. It allows for the creation of an unlimited number of aliases, which are automatically forwarded to my private inbox.

  2. Mailgun: Acts as a full SMTP server for sending notifications from services running within the homelab.

Why this configuration?
Combining these services solves two key challenges of maintaining a reliable email infrastructure:

  • Zero operational costs: I utilize the free tiers of both services (including up to 3,000 monthly messages in Mailgun), allowing for professional email handling at no cost.
  • Simplicity and Reliability: I don't have to configure or maintain a full mail stack, so I don't need to worry about deliverability, and system notifications don't end up in spam.

Backups

The backup system in my homelab is powered by Proxmox Backup Server.
I implemented this solution to solve the issue of backing up a large LXC container running Docker (~330 GB).
Thanks to deduplication and incremental backups, 8 days of backups take up only 335 GB instead of the expected ~2.5 TB. The current incremental setup allows me to store 7 daily and 2 weekly backups for this container.

To optimize energy consumption, the PBS server does not run 24/7.
It is started using Wake-on-LAN via a cron job before backup tasks begin. Once the backup jobs are finished, a hook script checks if there are any active tasks left on the PBS server; if not, it shuts the server down.

...output omitted...
INFO: Finished Backup of VM 103 (00:21:22)
INFO: Backup finished at 2026-03-19 03:34:03
INFO: No active tasks, shutting down PBS...
...output omitted...
Hook Script Implementation
#!/bin/bash
 
SSH_KEY=
IP=
 
if [ "$1" = "job-end" ]; then
    while true; do
        TASKS=$(ssh -i "$SSH_KEY" root@"$IP" "proxmox-backup-manager task list --output-format json")
 
        if [ "$TASKS" = "[]" ]; then
            echo "No active tasks, shutting down PBS..."
            ssh -i "$SSH_KEY" root@"$IP" "poweroff < /dev/null &"
            break
        fi
 
        echo "Tasks still running, waiting 60s..."
        sleep 60
    done
fi
 
exit 0

I also use manual snapshots before making major changes.

Troubleshooting

The Daylight Saving Time Trap

In the example below, during the transition from CET to CEST, the Wake-on-LAN and backup tasks (originally scheduled for 02:22 and 03:00) were triggered simultaneously at 03:00. This resulted in a race condition where the PBS server hadn't finished booting before the backup job attempted to connect.

INFO: skipping disabled matcher 'default-matcher'
INFO: notified via target `notification`
TASK ERROR: could not activate storage 'pbs': pbs: error fetching datastores - 500 Can't connect to pbs.kubamichalowski.pl:8007 (No route to host)

To resolve this and ensure reliability year-round, I shifted the schedule forward by one hour:

  • WOL: 02:2203:22
  • Backup: 03:0004:00

This ensures that regardless of the current time zone offset, all tasks are executed in the correct sequence.

Network Interface Hang

I encountered an issue where services would intermittently stop responding and the server would drop off the network, despite remaining physically powered on. Kernel log analysis revealed a specific driver issue:

Dec 11 21:21:54 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <51>
  TDT                  <64>
  next_to_use          <64>
  next_to_clean        <50>
buffer_info[next_to_clean]:
  time_stamp           <1052d01c3>
  next_to_watch        <51>
  jiffies              <1052da640>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <8ff>
PHY Extended Status    <3000>
PCI Status             <10>

Instead of manually adjusting ethtool parameters, I utilized a script from community-scripts.org. This script automatically created a systemd service that disables problematic offloading features upon every boot.

...output omitted...
  ℹ️   Searching for Intel e1000e and e1000 interfaces...
  ✔️   Found 1 Intel e1000e/e1000 interfaces
  ✔️   Selected interface: eno1 (e1000e)
  ℹ️   Creating systemd service for interface: eno1 (e1000e)...
  ✔️   Service for eno1 (e1000e) created and enabled!
  ℹ️     Service: disable-nic-offload-eno1.service...
  ℹ️     Status: Active...
  ℹ️     Start on boot: Enabled...
  ✔️   Intel e1000e/e1000 optimization complete for 1 interface(s)!
 
  ℹ️   Verification commands:...
  ethtool -k eno1 # Check offloading status
  systemctl status disable-nic-offload-eno1.service # Check service status

Post-optimization, I verified the offloading status to ensure the fix was persistent:

root@pve:~# ethtool -k eno1
Features for eno1:
rx-checksumming: off
tx-checksumming: off
tcp-segmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: off
...output omitted...

What I've learned

  • Automation saves time - thanks to GitOps, keeping about 20 containers up to date, now takes me seconds (PR merge) instead of dozens of minutes of manual work.
  • Cost optimization - effective hardware selection for hosting various services while maintaining low power consumption.
  • Effective backup - PBS has demonstrated that data deduplication and incremental backups are essential when disk space is limited.

Roadmap

The following list is a backlog of planned improvements, implemented based on priority and available time:

  • Refining the monitoring system
  • Implementing key-based SSH access
  • Adding an uninterruptible power supply

Summary

This homelab is the result of two years of learning through practice. Despite some compromises, I have managed to create a stable, secure, and highly automated environment that is, most importantly, fully understood by me as the administrator. It supports my daily work on personal projects, and every problem solved brings this environment closer to a professional-grade infrastructure.

Related Links