Hyperconverged Supermicro a2sdi-4c-hln4f

From TeriaHowto

This howto describes the design choices and the build of a compact homelab in a hyperconverged chassis based on a Supermicro A2SDi-4C-HLN4F motherboard. The hypervisor OS is Slackware64-15.0 (with Qemu/KVM for virtualization), storage is provided by a Truenas core VM (thanks to PCI passthrough) and the network is orchestrated by an OPNSense VM.


Motivations

Why such a motherboard, with a modest 4-core Intel Atom C3558 ? Let's look at the advantages :

  • obviously, it's very compact (mini-itx form factor)
  • up to 256 GB ECC RDIMM RAM supported
  • the CPU has a very low TDP (~ 17W), so no need for a fancy and potentially noisy cooling solution
  • 4 * 1 Gb/s Ethernet ports (cool for a network appliance such as OPNSense and there's no need for 10 Gb/s for this project)
  • dedicated IPMI Ethernet port
  • the SATA ports are provided by two distinct SATA controllers, each one a separate PCI device (see below ; this is very important for PCI passthrough and removes the need for an additional HBA card)
A2SDi system block diagram

Well, the system has drawbacks too :

  • the CPU power will not be extraordinary
  • it's not possible to put a fan directly on top of the CPU heatsink (more on that later)
  • no external USB 3.0 ports (only one "internal" port, on the motherboard itself) ; an external one would be desirable to plug in an external HDD for local backups (USB 2.0 is too slow)

Hardware

2.5 or 3.5 inch drives ?

What is required for this project :

  • mini tower format chassis
  • Hypervisor OS on soft RAID 1 with two drives
  • at least 4 TB of encrypted data with a minimum of resilience (mirror vdevs or raidz1 / raidz2 ...)
  • it is recommended not to exceed 80% of the capacity of a ZFS filesystem, so, for 4 TB of usable data, 5 TB of raw capacity is required
  • all disks easily accessible from the front of the chassis

Choosing between 2.5 inch (SFF) and 3.5 inch (LFF) drives is not so easy. Obviously, for raw capacity over a gigabit network, LFF HDDs are unbeatable in terms of price. In France (as of March 2021), a 2 TB NAS LFF HDD costs ~ 80 euros ... the same price as a 1 TB NAS SFF HDD. Moreover, SFF HDDs over 2 TB simply do not exist for consumer NAS systems (enterprise SFF HDDs exist but are way too expensive). So are LFF HDDs the clear winner ? What about chassis size, cooling, noise and IOPS ?

At this point, there are two choices :

  • 2 * 500 GB SFF SSD (RAID 1 for hypervisor and VMs OS) plus 2 * 6 TB LFF HDD (mirror vdev storage)
  • 2 * 500 GB SFF SSD (RAID 1 for hypervisor and VMs OS) plus 4 * 2 TB SFF SSD (raidz1 storage)

The most reasonable choice would be the LFF HDDs, but silent operation in a tiny chassis with decent IOPS is very important here, so let's stick with the "unreasonable" all-SSD configuration !
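As a rough sanity check of both layouts against the 80% guideline, the arithmetic can be sketched as below. The function names are made up for this example, sizes are in decimal GB, and ZFS metadata and padding overhead are ignored, so treat the figures as estimates only.

```shell
#!/bin/sh
# Usable capacity (GB) of a pool built from mirrored pairs.
# $1 = number of drives, $2 = size of one drive in GB
mirror_usable_gb() {
    echo $(( $1 / 2 * $2 ))
}

# Usable capacity (GB) of a single raidz1 vdev:
# the equivalent of one drive is consumed by parity.
raidz1_usable_gb() {
    echo $(( ($1 - 1) * $2 ))
}

# Amount of data (GB) that fits while staying under 80% of the pool.
fill_limit_gb() {
    echo $(( $1 * 80 / 100 ))
}

echo "2 x 6 TB mirror : $(fill_limit_gb "$(mirror_usable_gb 2 6000)") GB under the 80% mark"
echo "4 x 2 TB raidz1 : $(fill_limit_gb "$(raidz1_usable_gb 4 2000)") GB under the 80% mark"
```

Both layouts give about 6 TB of pool capacity, so roughly 4.8 TB can be filled before crossing the 80% mark, which covers the 4 TB target.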

Cooling the CPU

As seen earlier, a standard CPU fan cannot be fitted on the heat sink. But we are in the 21st century and 3D printing is available, so why not a custom CPU fan shroud ? ^^

CPU fan shroud

Is it absolutely necessary ? I don't think so. Does it help ? Sure ! The temperatures measured over the last months on a similar system located in an attic (so hot in summer) are shown below. No more than 47 °C for the CPU ... Without the fan shroud, I think you should expect about 5 °C more on average.

Temperatures

Final system

Hypervisor installation

BMC setup

First things first : taking control of the BMC. These days, Supermicro BMCs are configured by default for DHCP on the dedicated Ethernet port. The old default "ADMIN" password is gone, replaced by a random one printed on a sticker directly on the motherboard (along with the MAC address of the BMC). Be sure to write this information down before plugging the dedicated port into a switch on your LAN.

With the MAC address in mind, it should be easy to retrieve the IP address assigned by the DHCP server, or (better), you could assign a dedicated IP address to your BMC within the DHCP server.
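If the DHCP server's lease list is not handy, one possible way to find the address is to search the neighbour table of a Linux machine on the same LAN. This is a minimal sketch: the function name and the MAC address in the usage line are placeholders, and the table only contains hosts this machine has recently exchanged traffic with, so a ping sweep of the subnet may be needed first.

```shell
#!/bin/sh
# Reads `ip neigh show` output on stdin and prints the IP address
# associated with the MAC address given as $1 (case-insensitive).
ip_for_mac() {
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    awk -v mac="$mac" '{
        # the MAC address is the field that follows the "lladdr" keyword
        for (i = 1; i < NF; i++)
            if ($i == "lladdr" && tolower($(i + 1)) == mac) print $1
    }'
}

# Usage (placeholder MAC, replace with the one printed on the sticker):
#   ip neigh show | ip_for_mac "3c:ec:ef:00:00:01"
```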

Then I suggest configuring the BMC with a static IP address rather than one assigned by a DHCP server. If your LAN goes berserk and you have to connect to your BMC, it will be easier. There are two ways to do it : through the WebUI (example below) or through the UEFI BIOS.

Webui BMC network configuration

With the BMC configured, you have full control of your system through your favorite Web browser. HTML5 web Console Redirection is a fantastic feature not often available for free with other vendors (such as Dell, HPE or Lenovo for example).

PXE boot

It's of course possible to install your favorite OS by booting from a USB stick. If you already have an operational homelab, PXE boot is even easier and quicker. With the Supermicro a2sdi-4c-hln4f, there is just one caveat : no legacy boot available, only UEFI (well, the manual and the BIOS options suggest legacy should be possible but in reality it's not working at all).

Either check the boot order in the UEFI BIOS options, or press F12 while the system boots and choose UEFI Network.

Operating system (slackware64-15.0)

Preparations

Let's assume a Slackware Busybox prompt is available (resulting either from booting with a USB key or through network). The soft RAID 1 has to be configured at this stage, of course on the right disks ! (remember, the system is composed of 6 drives, 2 SSDs for OS and 4 SSDs for data).

The lsblk command is a good tool to retrieve and identify the drives connected to a system. On the Supermicro a2sdi-4c-hln4f, the 4 drives connected to the mini-SAS port are identified first (sda, sdb, sdc, sdd) and the 2 drives connected to the traditional SATA ports come next (sde, sdf). It is on those two drives (sde, sdf) that the soft RAID 1 is going to be configured.
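Since the OS drives are smaller than the data drives in this build, a size filter on the lsblk output is one way to double-check which devices to use. This is a sketch only: the helper name is invented, the 1000 GB threshold is an assumption that fits this particular build, and other devices (USB stick, loop devices) may also show up, so read the result with a critical eye.

```shell
#!/bin/sh
# Reads "NAME SIZE_IN_BYTES" lines on stdin, as produced by
# `lsblk -dn -b -o NAME,SIZE`, and prints the devices that are
# smaller than $1 GB (1 GB = 10^9 bytes).
drives_smaller_than_gb() {
    awk -v limit="$1" '$2 < limit * 1000000000 { print "/dev/" $1 }'
}

# Usage:
#   lsblk -d -o NAME,SIZE,MODEL,SERIAL,TRAN      # human-readable overview
#   lsblk -dn -b -o NAME,SIZE | drives_smaller_than_gb 1000
```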

Reference documentation for this kind of setup :

I recommend two identical partitions per disk (sde and sdf) such as :

# fdisk -l /dev/sde
Disk /dev/sde: 465.76 GiB, 500107862016 bytes, 976773168 sectors
Disk model: WDC  WDS500G1R0A
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 96BE3679-F270-4EA4-88FF-2935AE4DCB46

Device      Start       End   Sectors   Size Type
/dev/sde1    2048    534527    532480   260M EFI System
/dev/sde2  534528 976773134 976238607 465.5G Linux RAID
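One way to obtain this layout non-interactively is with sgdisk. The sketch below only prints the commands instead of running them, because the first one wipes the existing partition table. The function name is invented for this example ; the type codes ef00 (EFI System) and fd00 (Linux RAID) are the usual sgdisk codes.

```shell
#!/bin/sh
# Prints (does NOT run) the sgdisk commands that create a 260 MiB EFI
# partition followed by a Linux RAID partition on the disk given as $1.
print_layout_cmds() {
    disk=$1
    echo "sgdisk --zap-all $disk"                     # destroys existing partition data
    echo "sgdisk -n 1:2048:+260M -t 1:ef00 $disk"     # partition 1: EFI System
    echo "sgdisk -n 2:0:0 -t 2:fd00 $disk"            # partition 2: rest of the disk, Linux RAID
}

print_layout_cmds /dev/sde
```

After reviewing the printed commands, they can be piped to sh to actually apply them. 260 MiB with 512-byte sectors is 532480 sectors, which matches the fdisk listing above.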

When you're done with one disk (e.g. sde), you can clone the partitioning to another disk (e.g. sdf) with these commands :

sgdisk -R /dev/sdf /dev/sde # copy the partition table of sde onto sdf
sgdisk -G /dev/sdf # give sdf new random disk and partition GUIDs

Let's build the soft RAID 1 :

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sde2 /dev/sdf2

Checking disk synchronization :

watch -n2 cat /proc/mdstat
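For unattended use, the same check can be scripted. The helper below is a sketch (its name is made up) that looks for a resync or recovery line in mdstat-style text ; mdadm also offers `mdadm --wait /dev/md0`, which blocks until the resync is finished.

```shell
#!/bin/sh
# Reads /proc/mdstat-style text on stdin.
# Exit status 0 while a resync or recovery is reported, 1 otherwise.
md_sync_running() {
    grep -Eq 'resync *=|recovery *='
}

# Usage:
#   while md_sync_running < /proc/mdstat; do sleep 30; done
#   echo "array is in sync"
```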

Saving the RAID configuration for system boot :

mdadm --examine --scan > /etc/mdadm.conf

Using LVM afterwards is a good option :

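pvcreate /dev/md0                   # turn the RAID array into an LVM physical volume
vgcreate <MY_VOLUME> /dev/md0       # create the volume group on top of it
lvcreate -L 1G -n swap <MY_VOLUME>  # 1 GiB logical volume for swap
lvcreate -L 4G -n root <MY_VOLUME>  # 4 GiB logical volume for the root filesystem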
...

Then you can launch the famous Slackware setup command and proceed through the OS installation itself. Don't forget to create an initrd (thanks to mkinitrd_command_generator.sh which should choose the right options) and be sure to use elilo (or grub) as a bootloader.

Here is my elilo.conf as an example :

chooser=simple
delay=1
timeout=1
#
image=vmlinuz
        label=vmlinuz
        initrd=initrd.gz
        read-only
        append="root=/dev/system/root vga=normal ro consoleblank=0 transparent_hugepage=never hugepagesz=1G default_hugepagesz=1G hugepages=60 intel_iommu=on iommu=pt libata.allow_tpm=1 preempt=none"

A few explanations :

  • transparent_hugepage=never hugepagesz=1G default_hugepagesz=1G hugepages=60 : reserves 60 huge pages of 1 GB of RAM for virtualization and disables transparent huge pages
  • intel_iommu=on iommu=pt : recommended options for PCI device passthrough
  • libata.allow_tpm=1 : mandatory option for SATA SED disks (more on that later)
  • preempt=none : disables kernel preemption, which is only really useful on desktop systems
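After the first reboot, it is worth checking that these options were actually taken into account. The two helpers below are a sketch with invented names ; they only parse text, so they can be fed from /proc/cmdline and /proc/meminfo on the hypervisor.

```shell
#!/bin/sh
# Reads a kernel command line on stdin.
# Exit status 0 if the exact parameter $1 is present.
cmdline_has() {
    tr ' ' '\n' | grep -qxF -- "$1"
}

# Reads /proc/meminfo-style text on stdin and prints the amount of RAM
# reserved as huge pages, in GiB (HugePages_Total x Hugepagesize).
hugepages_reserved_gib() {
    awk '/^HugePages_Total:/ { n = $2 }
         /^Hugepagesize:/    { kb = $2 }
         END { print n * kb / 1048576 }'
}

# Usage:
#   cmdline_has intel_iommu=on < /proc/cmdline && echo "IOMMU requested"
#   hugepages_reserved_gib < /proc/meminfo
```

If the second command reports fewer GiB than requested, the kernel could not find enough contiguous memory for the 1 GB pages.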

Booting

This is the tricky part ... Supermicro does not allow a tool such as efibootmgr to insert or modify boot options in the UEFI subsystem (as mentioned in a Supermicro FAQ), at least when using an unsigned Linux kernel without Secure Boot. Using Secure Boot with a signed kernel is technically feasible with Slackware (as explained here) but I never tested it.

I scratched my head for a while on this matter and found a solution online.

Virtualization layer

I'm currently using Qemu version 6.2.0 along with Libvirt version 8.1.0. Libvirt is optional (especially if you hate XML ^^) but could be useful with a tool such as virt-manager.

There are no official Slackware packages for Qemu or Libvirt but, thanks to slackbuilds.org, you do not have to start from scratch.

I recommend compiling Qemu with io_uring support for the best I/O performance. The slackbuild will take care of that if the dependency (liburing) is already installed.

Truenas core

I'm a big fan of this network storage appliance system. It's robust (based on FreeBSD and ZFS), well maintained, with a solid community. It's also heavy on resources (especially RAM, plan for at least 16 GB).

There is an endless debate on whether or not to use ECC RAM ... Facts are :

  • it's definitely working without ECC
  • you can run into cases such as this incident where ECC RAM prevents your filesystem from being corrupted

Conclusion : use ECC RAM and make backups ^^

Data encryption

Two alternatives identified :

  • native ZFS encryption within Truenas
  • SED encryption (Crucial MX500 disks are TCG Opal 2.0 compatible)

SED encryption comes with a caveat in the case of PCI passthrough. The disks are powered up when the hypervisor starts, not when the Truenas VM starts, and as a result SED operations (locking / unlocking) did not work as intended from within Truenas. I decided to use sedutil-cli in the hypervisor, before the Truenas VM starts up (disks are automatically locked when powered off). Unlocking the drives is a manual operation, but the password isn't stored inside Truenas.

Command to define a password on a new SED drive :

sedutil-cli --initialsetup <password> <drive>

Script example to unlock the drives :

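#!/bin/bash
# Unlock every TCG Opal 2.0 drive reported by sedutil-cli.

SEDUTIL="/usr/sbin/sedutil-cli"

if [ ! -x "$SEDUTIL" ]; then
    echo "no sedutil-cli found" >&2
    exit 1
fi

# Keep the devices whose second column in the scan output is "2"
_drives=$("$SEDUTIL" --scan | awk '$2 == "2" {print $1}')

if [ -z "$_drives" ]; then
    echo "no SED drive found" >&2
    exit 1
fi

# Ask for the password without echoing it to the terminal
read -r -s -p "SED password: " password
echo

for _drive in $_drives; do
    # Open locking range 0 for reads and writes, then flag the shadow MBR as done
    "$SEDUTIL" --setLockingRange 0 RW "$password" "$_drive"
    "$SEDUTIL" --setMBRDone on "$password" "$_drive"
done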

But why use SED encryption in the first place ? Well, as mentioned earlier, the C3558 is a modest CPU and even if AES-NI instructions are available, I wanted to spare extra cycles. Moreover, I'm not paranoid about data encryption and I think SED is a sweet spot between performance and security. I ran a little bench with fio to illustrate this :

                          RAIDZ (no crypto)   RAIDZ (zfs crypto)   RAIDZ (SED crypto)
vCPUs                       2        3          2        3           2        3
sequential write (MB/s)    178      264        153      222             245
sequential read (MB/s)         442                 288                  436
More CPU power is needed for the SSDs to realize their full potential. But this particular setup consumes only ... 38 W at full load (it's powered on 24/7) and I just need to saturate a 1 Gb/s network.

Libvirt XML configuration

Special points of attention.

  • Using huge pages for memory :
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  • Using io_uring for Truenas boot disk :
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='io_uring'/>
      <source dev='/dev/<VOLUME>/<DISK>'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  • SATA controller PCI passthrough (mini-SAS) :
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x13' function='0x0'/>
      </source>
    </hostdev>
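The address in the hostdev element is the PCI address of the controller on the hypervisor, as printed for instance by `lspci -D`. As a convenience, the conversion to the libvirt notation can be sketched as follows (the function name is made up for this example) :

```shell
#!/bin/sh
# Converts a full PCI address such as 0000:00:13.0 (domain:bus:slot.function)
# into the <address/> element expected inside <hostdev><source>.
pci_to_libvirt_address() {
    addr=$1
    domain=${addr%%:*}      # 0000
    rest=${addr#*:}         # 00:13.0
    bus=${rest%%:*}         # 00
    devfn=${rest#*:}        # 13.0
    slot=${devfn%%.*}       # 13
    func=${devfn#*.}        # 0
    printf "<address domain='0x%s' bus='0x%s' slot='0x%s' function='0x%s'/>\n" \
        "$domain" "$bus" "$slot" "$func"
}

# Usage:
#   lspci -D | grep -i sata          # find the controller address
#   pci_to_libvirt_address 0000:00:13.0
```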

OPNSense

OPNSense (fork of PFSense) is another top notch appliance based on FreeBSD but network and firewall oriented this time. Thanks to its four 1 Gb/s Ethernet interfaces, the Supermicro A2SDi-4C-HLN4F motherboard is a perfect candidate to host a router / firewall.

Target setup :

Network diagram

Ethernet PCI passthrough

While it's not mandatory, passing the Ethernet devices through to the OPNSense VM (PCI passthrough) is recommended, especially if VLANs are used in conjunction with NAT to provide Internet access. In this case, the MTU stays at 1500 on all networks and there is no need to tweak the client OSes to lower it. Virtual interfaces, on the other hand, require lowering the MTU when VLANs are involved.
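Before passing a NIC through, it is worth checking which IOMMU group each device belongs to, since devices sharing a group generally have to be assigned together. The sketch below lists the groups from sysfs ; the function name is invented, and the optional argument exists only so the helper can be pointed at another directory for testing.

```shell
#!/bin/sh
# Prints "group pci-address" for every device found in the IOMMU groups tree.
# $1 = root of the tree (defaults to the kernel's sysfs location).
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue       # nothing found: IOMMU disabled or unsupported
        group=${dev#"$root"/}           # strip the root prefix ...
        group=${group%%/*}              # ... and keep the group number
        echo "$group ${dev##*/}"
    done
}

list_iommu_groups | sort -n
```

An empty output on the hypervisor usually means the IOMMU is not enabled (check the intel_iommu=on kernel parameter).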

If virtual interfaces are used (to bridge OPNSense with some VMs on the hypervisor, such as the Truenas one for example !), the global setting that disables all hardware offloads can be applied. It can then be overridden for specific hardware interfaces, where all offloads are welcome and will work (checksum, LRO, TSO or VLAN filtering).