Automating KVM Backups with Ansible

Automating KVM Homelab Backups with Ansible #

When you’re running a dozen virtual machines in your homelab, manual backups quickly become a nightmare.

In this post, I’ll walk you through my Ansible-based backup strategy for my KVM homelab. It automatically backs up all VMs by shutting them down gracefully, copying their disk images and configurations to a NAS, and bringing them back online.

Backup Strategy #

My backup strategy uses Ansible to orchestrate the entire process:

  1. Mount the backup destination - A CIFS share on my NAS
  2. Discover all VMs - List every VM defined on the hypervisor
  3. Process each VM sequentially - Shut down, back up, restart
  4. Store both disk images and configuration - Complete recovery capability

The Ansible Playbook Structure #

The backup system consists of two files, a playbook and a task file that it includes once per VM:

  • backup.yml - Main orchestration playbook
  • backup_process.yml - Individual VM backup process

Here’s what the backup workflow looks like:

flowchart TD
    A[Start Backup Process] --> B

    subgraph backup["backup.yml"]
        B[Mount CIFS Share]
        C[Validate Mount Point]
        D[Discover All VMs]
        E[Filter VM List]
        F{More VMs to Process?}
        B --> C
        C --> D
        D --> E
        E --> F
        F -->|Yes| G[Call backup_process.yml]
        F -->|No| M[Backup Complete]
    end

    subgraph process["backup_process.yml"]
        G --> H[Shutdown VM]
        H --> I[Wait for Shutdown]
        I --> J[Create Backup Directory]
        J --> K[Copy qcow2 Image]
        K --> L[Copy XML Configuration]
        L --> N[Start VM]
        N --> O[Return to Main Loop]
    end

    O --> F

    style A fill:#2596be

Main Backup Playbook #

Here’s the core structure of the main playbook:

---
- name: Backup VMs
  hosts: kvm.home.arpa
  become: true
  vars_files:
    - "../group_vars/all/vault.yml"
  vars:
    nas: "nas.home.arpa"

The playbook targets my KVM hypervisor and loads sensitive variables from an encrypted vault file. This keeps API tokens and credentials secure.

Setting Up the Backup Mount Point #

The first critical step is ensuring we can write to the backup destination:

- name: Ensure backup mount point exists in fstab
  ansible.posix.mount:
    path: /mnt/backups
    src: "//10.10.10.10/NAS/KVM"
    fstype: cifs
    state: present
    opts: "credentials=/root/.smbcredentials"

- name: Check if backup mount point is accessible
  ansible.builtin.stat:
    path: /mnt/backups
  register: mount_check

- name: Fail if backup mount is not accessible
  ansible.builtin.fail:
    msg: "Backup mount point /mnt/backups is not accessible"
  when: not mount_check.stat.exists

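These tasks register the CIFS share in /etc/fstab and confirm that /mnt/backups exists, stopping the playbook if it doesn’t. Note that state: present only writes the fstab entry and does not mount anything, and stat only checks that the path exists. The share therefore has to be mounted already (or the task changed to state: mounted) for this check to catch a broken backup destination.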
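If you would rather have the play mount the share and verify it, a minimal sketch could look like the following. It reuses the same share and credentials file as above and assumes the mountpoint utility (part of util-linux) is installed on the hypervisor. With state: mounted, the module writes the fstab entry and actively mounts the share.

```yaml
- name: Ensure backup share is mounted and present in fstab
  ansible.posix.mount:
    path: /mnt/backups
    src: "//10.10.10.10/NAS/KVM"
    fstype: cifs
    state: mounted
    opts: "credentials=/root/.smbcredentials"

# Assumption: mountpoint (util-linux) is available on the hypervisor.
# It exits non-zero when the path is not an active mount point,
# which makes this task fail and stops the play.
- name: Verify /mnt/backups is an active mount point
  ansible.builtin.command:
    cmd: mountpoint -q /mnt/backups
  changed_when: false
```

Checking the mount itself rather than the directory avoids silently writing backups to the hypervisor's local disk when the NAS is unreachable.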

VM Discovery and Filtering #

Next, we discover all VMs and filter out the ones we don’t want to back up:

- name: List all VMs
  community.libvirt.virt:
    command: list_vms
  register: vm_list
  changed_when: false

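The list_vms command of the community.libvirt.virt module returns the names of all VMs defined on the host. The complete playbook at the end of this post then uses set_fact to drop any VM whose name contains template or test, and stores the result in filtered_vm_list.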
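The filter task itself only appears in the complete playbook further down. As a sketch, an equivalent version using the search test (which matches anywhere in the name, so no leading or trailing .* is needed) might look like this:

```yaml
- name: Filter VMs to back up (exclude templates and test VMs)
  ansible.builtin.set_fact:
    # search matches a substring anywhere in the VM name
    filtered_vm_list: >-
      {{ vm_list.list_vms
      | reject('search', 'template')
      | reject('search', 'test')
      | list }}
```

Be aware that a plain substring filter like this, just like the original pattern, also excludes names that merely contain the word, for example a VM called latest.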

Processing Each VM #

The magic happens in the loop that processes each VM:

- name: Process each VM one by one
  ansible.builtin.include_tasks: backup_process.yml
  loop: "{{ filtered_vm_list }}"
  loop_control:
    loop_var: item

This includes the backup_process.yml task file once per VM, so the VMs are processed strictly one after another. I do this on purpose: copying several disk images at once would saturate my backup network, which is far from ideal.
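Since item is already the default loop variable, naming it explicitly only pays off if you pick a different name, which avoids clashes should the task file ever gain loops of its own. A sketch under that assumption follows; vm_name is a hypothetical name, and the task file would have to reference vm_name instead of item for this to work.

```yaml
- name: Process each VM one by one
  ansible.builtin.include_tasks: backup_process.yml
  loop: "{{ filtered_vm_list }}"
  loop_control:
    loop_var: vm_name   # hypothetical name; backup_process.yml must use it too
    pause: 10           # optional: wait 10 seconds between VMs
```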

The Individual VM Backup Process #

The backup_process.yml task file handles the actual backup of each VM:

Graceful Shutdown #

- name: Display VM being processed
  ansible.builtin.debug:
    msg: "Starting backup process for VM: {{ item }}"

- name: Shut down VM
  community.libvirt.virt:
    name: "{{ item }}"
    state: shutdown
  changed_when: false

- name: Wait for VM to shut down
  community.libvirt.virt:
    command: list_vms
    state: running
  register: vm_list_output
  until: item not in vm_list_output.list_vms
  retries: 30
  delay: 10
  changed_when: false

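This requests a graceful shutdown and then polls the list of running VMs every 10 seconds, up to 30 times, so it waits about 5 minutes for the VM to stop. The until loop ensures we don’t proceed with the backup until the VM is fully shut down. If the VM is still running after the last retry, the task fails and the play stops before any files are copied.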
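If a guest ignores the shutdown request, one possible variation is to wrap the wait in a block and force the VM off in a rescue section. This is only a sketch and assumes a hard power-off is acceptable for the guest in question; state: destroyed stops the VM immediately without removing its definition.

```yaml
- name: Wait for shutdown, forcing power-off on timeout
  block:
    - name: Wait for VM to shut down
      community.libvirt.virt:
        command: list_vms
        state: running
      register: vm_list_output
      until: item not in vm_list_output.list_vms
      retries: 30
      delay: 10
      changed_when: false
  rescue:
    # Assumption: a hard power-off is acceptable for this guest.
    - name: Force the VM off after the graceful shutdown timed out
      community.libvirt.virt:
        name: "{{ item }}"
        state: destroyed
```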

Backup Directory Structure #

- name: Ensure VM backup directory exists
  ansible.builtin.file:
    path: /mnt/backups/{{ item }}
    state: directory
    mode: '0755'

Each VM gets its own backup directory, keeping things organized and making restores easier.
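To illustrate what a restore from this layout could look like once the disk image and XML file are stored in the directory, here is a rough sketch. It is not part of the playbooks in this post; restore_vm is a hypothetical variable holding the name of the VM to restore, and the paths assume the same locations used for the backup.

```yaml
- name: Copy the disk image back to the libvirt image directory
  ansible.builtin.command:
    cmd: >
      rsync -avh /mnt/backups/{{ restore_vm }}/{{ restore_vm }}.qcow2
      /var/lib/libvirt/images/{{ restore_vm }}.qcow2
  changed_when: true

# slurp reads the file on the hypervisor and returns it base64-encoded
- name: Read the saved VM definition from the backup
  ansible.builtin.slurp:
    src: /mnt/backups/{{ restore_vm }}/{{ restore_vm }}.xml
  register: saved_xml

- name: Define the VM from the saved XML
  community.libvirt.virt:
    command: define
    xml: "{{ saved_xml.content | b64decode }}"
```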

Copying VM Assets #

- name: Copy qcow2 image to backup location
  ansible.builtin.command:
    cmd: >
      rsync -avh /var/lib/libvirt/images/{{ item }}.qcow2
      /mnt/backups/{{ item }}/{{ item }}.qcow2
  changed_when: true

- name: Copy VM configuration file to backup location
  ansible.builtin.command:
    cmd: >
      rsync -avh /etc/libvirt/qemu/{{ item }}.xml
      /mnt/backups/{{ item }}/{{ item }}.xml
  changed_when: true

I use rsync instead of the Ansible copy module for better performance. The -avh flags provide:

  • -a - Archive mode (preserves permissions, timestamps, etc.)
  • -v - Verbose output
  • -h - Human-readable file sizes

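Both the disk image (.qcow2) and the VM configuration (.xml) are backed up. Note that these tasks assume each VM has a single disk image named after the VM in /var/lib/libvirt/images.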
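As an alternative to copying the XML file from /etc/libvirt/qemu, the definition can also be requested from libvirt itself. The following is a sketch, assuming the get_xml command of community.libvirt.virt returns the definition under the get_xml key of the registered result:

```yaml
- name: Fetch the VM definition from libvirt
  community.libvirt.virt:
    command: get_xml
    name: "{{ item }}"
  register: vm_xml
  changed_when: false

# Assumption: the XML is returned under the get_xml key
- name: Write the VM definition to the backup location
  ansible.builtin.copy:
    content: "{{ vm_xml.get_xml }}"
    dest: "/mnt/backups/{{ item }}/{{ item }}.xml"
    mode: '0644'
```

This does not depend on where libvirt stores its configuration files on disk.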

Backup Retention #

As you might have noticed, I don’t keep historical backups. I run my backup automation weekly, and each run overwrites the previous backup. That’s fine for my situation.

If you need historical backups, you could modify the rsync commands to include timestamps in the backup directory names.
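A minimal sketch of that idea is shown below. It assumes fact gathering is enabled (the default), so that ansible_date_time is available, and vm_backup_dir is a hypothetical variable name. Old directories are not pruned here, so the NAS will fill up unless you clean them up separately.

```yaml
- name: Build a dated backup directory path for this VM
  ansible.builtin.set_fact:
    # ansible_date_time.date is formatted as YYYY-MM-DD
    vm_backup_dir: "/mnt/backups/{{ item }}/{{ ansible_date_time.date }}"

- name: Ensure the dated backup directory exists
  ansible.builtin.file:
    path: "{{ vm_backup_dir }}"
    state: directory
    mode: '0755'

- name: Copy qcow2 image to the dated directory
  ansible.builtin.command:
    cmd: >
      rsync -avh /var/lib/libvirt/images/{{ item }}.qcow2
      {{ vm_backup_dir }}/{{ item }}.qcow2
  changed_when: true
```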

Bringing the VM Back Online #

- name: Start VM
  community.libvirt.virt:
    name: "{{ item }}"
    state: running
  changed_when: false

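Finally, we start the VM back up. The entire process typically takes 2-5 minutes per VM depending on disk size. Keep in mind that list_vms returns every defined VM, so a VM that was powered off before the backup will also be started at this step.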
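One thing to consider: if a copy task fails, the play stops and the VM stays powered off. A possible safeguard, sketched here as an assumption rather than as part of the original task file, is to put the copy tasks in a block and start the VM in an always section, which runs whether or not the block succeeded.

```yaml
- name: Back up VM files and always restart the VM
  block:
    - name: Copy qcow2 image to backup location
      ansible.builtin.command:
        cmd: >
          rsync -avh /var/lib/libvirt/images/{{ item }}.qcow2
          /mnt/backups/{{ item }}/{{ item }}.qcow2
      changed_when: true

    - name: Copy VM configuration file to backup location
      ansible.builtin.command:
        cmd: >
          rsync -avh /etc/libvirt/qemu/{{ item }}.xml
          /mnt/backups/{{ item }}/{{ item }}.xml
      changed_when: true
  always:
    # Runs even when a copy task above has failed
    - name: Start VM
      community.libvirt.virt:
        name: "{{ item }}"
        state: running
```

The failure is still reported after the always section has run, so a broken backup does not go unnoticed.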

Complete Playbooks #

For reference, here are the complete playbooks you can use in your own environment:

backup.yml - Main orchestration playbook

---
- name: Backup VMs
  hosts: kvm.home.arpa
  become: true
  vars_files:
    - "../group_vars/all/vault.yml"
  vars:
    nas: "nas.home.arpa"
  tasks:
    - name: Ensure backup mount point exists in fstab
      ansible.posix.mount:
        path: /mnt/backups
        src: "//10.10.10.10/NAS/KVM"
        fstype: cifs
        state: present
        opts: "credentials=/root/.smbcredentials"

    - name: Check if backup mount point is accessible
      ansible.builtin.stat:
        path: /mnt/backups
      register: mount_check

    - name: Fail if backup mount is not accessible
      ansible.builtin.fail:
        msg: "Backup mount point /mnt/backups is not accessible"
      when: not mount_check.stat.exists

    - name: List all VMs
      community.libvirt.virt:
        command: list_vms
      register: vm_list
      changed_when: false

    - name: Filter VMs to backup (exclude templates and test VMs)
      ansible.builtin.set_fact:
        filtered_vm_list: "{{ vm_list.list_vms | reject('match', '.*template.*') | reject('match', '.*test.*') | list }}"

    - name: Display VMs to be backed up
      ansible.builtin.debug:
        msg: "VMs to backup: {{ filtered_vm_list }}"

    - name: Process each VM one by one
      ansible.builtin.include_tasks: backup_process.yml
      loop: "{{ filtered_vm_list }}"
      loop_control:
        loop_var: item

    - name: Backup process completed
      ansible.builtin.debug:
        msg: "All VM backups completed successfully"

backup_process.yml - Individual VM backup process

---
- name: Display VM being processed
  ansible.builtin.debug:
    msg: "Starting backup process for VM: {{ item }}"

- name: Shut down VM
  community.libvirt.virt:
    name: "{{ item }}"
    state: shutdown
  changed_when: false

- name: Wait for VM to shut down
  community.libvirt.virt:
    command: list_vms
    state: running
  register: vm_list_output
  until: item not in vm_list_output.list_vms
  retries: 30
  delay: 10
  changed_when: false

- name: Ensure VM backup directory exists
  ansible.builtin.file:
    path: /mnt/backups/{{ item }}
    state: directory
    mode: '0755'

- name: Copy qcow2 image to backup location
  ansible.builtin.command:
    cmd: >
      rsync -avh /var/lib/libvirt/images/{{ item }}.qcow2
      /mnt/backups/{{ item }}/{{ item }}.qcow2
  changed_when: true

- name: Copy VM configuration file to backup location
  ansible.builtin.command:
    cmd: >
      rsync -avh /etc/libvirt/qemu/{{ item }}.xml
      /mnt/backups/{{ item }}/{{ item }}.xml
  changed_when: true

- name: Start VM
  community.libvirt.virt:
    name: "{{ item }}"
    state: running
  changed_when: false

- name: VM backup completed
  ansible.builtin.debug:
    msg: "Backup completed for VM: {{ item }}"