For a long time I’ve been running a home server based on VMware ESXi. I wanted a machine capable of hosting VMs so that I could run multiple long-term workloads such as my mail and web servers, my OpenVPN server, my private cloud, my SSH gateway and a TVheadend PVR, and provide storage and streaming services to my entertainment media devices. In addition, I wanted to be able to spin up and test different flavours of OS, iSCSI-based clusters and VPN scenarios, which might involve setting up multiple VMs (my record is 23 on this rig) and running them for a few days before tearing them down and starting again.

Over time the system has evolved to its current configuration: VMware ESXi v5.1 running on an AMD FX-8320 3.5GHz 8-core processor with 32GB memory, a 2 x 128GB RAID 1 system drive and a 3 x 4TB ZFS (RAID-Z1) datastore (WD ‘Green’ drives). All of this runs in a single enclosure tucked into the corner of my utility/laundry room. I specifically don’t want multiple physical machines burning electricity 24x7, and I don’t want to take over too much space in that room as it already holds the server, a brewing area and various tool cupboards.

To get everything running in one enclosure, the first VM to start up is a Napp-it NAS appliance whose OS disk is a .vmdk container on the same RAID 1 mirror pair that ESXi boots from. The ESXi host presents the 3 x 4TB drives through RDM (Raw Device Mapping) to the Napp-it guest VM, which combines the disks into a RAID-Z1 ZFS volume and presents part of it back to the host as an NFS share. A secondary ESXi datastore is created on that NFS share and the remaining VMs have their .vmdk volumes created there. This means that until the first (Napp-it) VM starts, the other VMs are seen as ‘unknown’ on a missing datastore.

Although I have been very pleased with this ‘all-in-one’ configuration, I’ve become aware of some limitations and niggles which I’ve finally decided to do something about. The first limitation has been there from the beginning but has now become something of a PITA: the freeware ESXi host can only be managed via a Windows-based vSphere client, via the CLI on the box itself, or via SSH. It’s become an issue because these days I very rarely run Windows - I usually use Ubuntu. The second is that the versions of ESXi after v5.1 have dropped support for the Realtek ethernet adapter on my machine’s motherboard. Yes, I could try to load the older driver on the newer version of ESXi or install an alternative ethernet adapter, but it all adds to the inconvenience. So, it’s time to move on from ESXi.

My home network makes use of 802.1Q vLANs to separate the various workloads and to keep the management interfaces away from the rest of the family IT users. Yes, I trust them not to break anything deliberately, but I’ve worked in IT for 35 years and have developed some hard-learned habits. ESXi makes this sort of configuration easy.

The starting point for the redesign is to benchmark a guest VM on the current rig so that I can compare before-and-after performance: create a single-vCPU VM with 1GB RAM and a 16GB thick-provisioned .vmdk hard disk on the NFS datastore, install Ubuntu 14.04 server with OpenSSH server, MySQL server and sysbench, bring everything up to date and run the benchmarks.

First a CPU test

$ sysbench --test=cpu --cpu-max-prime=20000 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          15.9727s
    total number of events:              10000
    total time taken by event execution: 15.9707
    per-request statistics:
         min:                                  1.58ms
         avg:                                  1.60ms
         max:                                  2.47ms
         approx.  95 percentile:               1.61ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   15.9707/0.00

Then a FileIO test

$ sysbench --test=fileio --file-total-size=10G prepare
sysbench 0.4.12:  multi-threaded system evaluation benchmark

128 files, 81920Kb each, 10240Mb total
Creating files for the test...
$ sysbench --test=fileio --file-total-size=10G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Initializing random number generator from timer.


Extra file open flags: 0
128 files, 80Mb each
10Gb total file size
Block size 16Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!
Time limit exceeded, exiting...
Done.

Operations performed:  5825 Read, 3883 Write, 12416 Other = 22124 Total
Read 91.016Mb  Written 60.672Mb  Total transferred 151.69Mb  (517.75Kb/sec)
   32.36 Requests/sec executed

Test execution summary:
    total time:                          300.0054s
    total number of events:              9708
    total time taken by event execution: 83.5880
    per-request statistics:
         min:                                  0.01ms
         avg:                                  8.61ms
         max:                               5071.86ms
         approx.  95 percentile:              20.67ms

Threads fairness:
    events (avg/stddev):           9708.0000/0.00
    execution time (avg/stddev):   83.5880/0.00

and finally a MySQL OLTP test

$ sysbench --test=oltp --oltp-table-size=1000000 --mysql-db=test --mysql-user=root --mysql-password=SQL --max-time=60 --oltp-read-only=on --max-requests=0 --num-threads=8 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

No DB drivers specified, using mysql
Running the test with following options:
Number of threads: 8

Doing OLTP test.
Running mixed OLTP test
Doing read-only test
Using Special distribution (12 iterations,  1 pct of values are returned in 75 pct cases)
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Threads started!
Time limit exceeded, exiting...
(last message repeated 7 times)
Done.

OLTP test statistics:
    queries performed:
        read:                            612948
        write:                           0
        other:                           87564
        total:                           700512
    transactions:                        43782  (729.59 per sec.)
    deadlocks:                           0      (0.00 per sec.)
    read/write requests:                 612948 (10214.33 per sec.)
    other operations:                    87564  (1459.19 per sec.)

Test execution summary:
    total time:                          60.0087s
    total number of events:              43782
    total time taken by event execution: 479.8748
    per-request statistics:
         min:                                  1.46ms
         avg:                                 10.96ms
         max:                                 73.21ms
         approx.  95 percentile:              12.45ms

Threads fairness:
    events (avg/stddev):           5472.7500/118.14
    execution time (avg/stddev):   59.9844/0.00

Wow… I knew my storage was slow but I hadn’t realised it was that slow! 517.75Kb/sec.

The concept, then, is to replace the ESXi installation with Linux and the KVM (Kernel-based Virtual Machine) hypervisor, managed locally with virsh and remotely with virt-manager. Rather than run the NAS in a separate VM, I will install ZFS, an NFS server and possibly Samba on the host, combining the functions of the NAS and the hypervisor. The host OS management IP address will only be accessible through the ‘Management LAN’ vLAN, along with the management interfaces of the network switch and wireless access point. To replace the COMSTAR iSCSI functionality I will either install an iSCSI target package on the host or run a VM to provide the targets - I prefer the latter approach so as to keep the management LAN ‘clean’, but it might hit performance. I don’t currently use Mediatomb (which is also bundled with the Napp-it package) but if I need it I know I can run it in a VM on my Entertainment LAN. I hope to import my ZFS pool intact into the new OS and then either run my VMs with the original .vmdk disk volumes or convert them using Convirt.
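As a rough sketch of how the host takes over the NAS role, the NFS export can come straight from a ZFS dataset. The pool, dataset and subnet names below are placeholders, not my actual configuration:

# install and start the NFS server
dnf install nfs-utils
systemctl enable nfs-server
systemctl start nfs-server

# create a dataset for shared data on the existing pool and export it over NFS
zfs create tank/shared
zfs set sharenfs="rw=@192.168.10.0/24" tank/shared

# confirm the export is visible
exportfs -v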

I initially considered using Ubuntu server but finally decided to go with Fedora Server 22 with the ‘Virtualization’ group selected at install time. I installed the OS with the 3 x 4TB drives disconnected to avoid any chance of overwriting my ZFS volume.
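For completeness, if the ‘Virtualization’ group isn’t selected in the installer it can be added afterwards; this is the post-install equivalent (group name as it appears in the Fedora repos):

# install the virtualization group and start the libvirt daemon
dnf groupinstall "Virtualization"
systemctl enable libvirtd
systemctl start libvirtd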

Install the ZFS package:

dnf install --nogpgcheck http://archive.zfsonlinux.org/fedora/zfs-release$(rpm -E %dist).noarch.rpm
dnf install kernel-devel zfs

Shut down the box, reconnect the 3 x 4TB drives and restart. Then import the ZFS pool.

zpool import -f *poolname*

I imagined that there would be some sort of problem at this point because the disk device names are totally different between OmniOS and Fedora. However, it imported without a moment’s hesitation. Blink and you’d miss it.
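If you’re following along, a quick sanity check after the import confirms the pool is healthy and shows how Linux has renamed the member disks (pool name is a placeholder):

# confirm the pool is ONLINE and list its datasets
zpool status poolname
zfs list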

Create vLAN interfaces (interfacename.vlanid) on the host for the Management, Entertainments, Internal, External, Experimental etc. LANs - don’t define any IP addressing on them. Create a bridge* on the Management vLAN interface and assign the host management IP address to that bridge. Finally, remove the DHCP-assigned IP address from the untagged ethernet interface.
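A minimal, non-persistent sketch of that layout with iproute2 follows; the interface name, VLAN IDs, bridge name and address are examples only (on Fedora the persistent equivalent lives in ifcfg files or is configured with nmcli):

# tagged sub-interfaces on the physical NIC, no IP addressing on them
ip link add link enp3s0 name enp3s0.10 type vlan id 10   # Management vLAN
ip link add link enp3s0 name enp3s0.20 type vlan id 20   # Entertainments vLAN

# bridge carrying only the Management vLAN; the host address lives here
ip link add br-mgmt type bridge
ip link set enp3s0.10 master br-mgmt
ip addr add 192.168.10.2/24 dev br-mgmt

# bring everything up
ip link set enp3s0 up
ip link set enp3s0.10 up
ip link set enp3s0.20 up
ip link set br-mgmt up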

I used a remote VMM (virt-manager) to connect to the new host, (re)defined the guest virtual machines using the original .vmdk files, booted them up and re-ran the benchmarks on the test VM.
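The same thing can be scripted with virt-install; this is a sketch only (the VM name, disk path, bridge and OS variant are assumptions), with --import telling libvirt to reuse the existing disk image rather than run an installer:

# hypothetical example of redefining a guest around an existing .vmdk
virt-install \
  --name benchvm \
  --ram 1024 --vcpus 1 \
  --disk path=/tank/vmimages/benchvm/benchvm.vmdk,format=vmdk \
  --network bridge=br-mgmt \
  --import \
  --os-variant ubuntu14.04 \
  --noautoconsole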

First a CPU test

$ sysbench --test=cpu --cpu-max-prime=20000 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          15.2400s
    total number of events:              10000
    total time taken by event execution: 15.2372
    per-request statistics:
         min:                                  1.49ms
         avg:                                  1.52ms
         max:                                  4.30ms
         approx.  95 percentile:               1.62ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   15.2372/0.00

Just about the same as before - maybe a tad faster.

Then a FileIO test

$ sysbench --test=fileio --file-total-size=10G prepare
sysbench 0.4.12:  multi-threaded system evaluation benchmark

128 files, 81920Kb each, 10240Mb total
Creating files for the test...
$ sysbench --test=fileio --file-total-size=10G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Initializing random number generator from timer.


Extra file open flags: 0
128 files, 80Mb each
10Gb total file size
Block size 16Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!
Time limit exceeded, exiting...
Done.

Operations performed:  25920 Read, 17280 Write, 55276 Other = 98476 Total
Read 405Mb  Written 270Mb  Total transferred 675Mb  (2.25Mb/sec)
  144.00 Requests/sec executed

Test execution summary:
    total time:                          300.0016s
    total number of events:              43200
    total time taken by event execution: 15.7112
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.36ms
         max:                                 17.15ms
         approx.  95 percentile:               0.27ms

Threads fairness:
    events (avg/stddev):           43200.0000/0.00
    execution time (avg/stddev):   15.7112/0.00

A lot more data transferred within the time limit.

and finally a MySQL OLTP test

$ sysbench --test=oltp --oltp-table-size=1000000 --mysql-db=test --mysql-user=root --mysql-password=SQL --max-time=60 --oltp-read-only=on --max-requests=0 --num-threads=8 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

No DB drivers specified, using mysql
Running the test with following options:
Number of threads: 8

Doing OLTP test.
Running mixed OLTP test
Doing read-only test
Using Special distribution (12 iterations,  1 pct of values are returned in 75 pct cases)
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Threads started!
Time limit exceeded, exiting...
(last message repeated 7 times)
Done.

OLTP test statistics:
    queries performed:
        read:                            617904
        write:                           0
        other:                           88272
        total:                           706176
    transactions:                        44136  (735.55 per sec.)
    deadlocks:                           0      (0.00 per sec.)
    read/write requests:                 617904 (10297.64 per sec.)
    other operations:                    88272  (1471.09 per sec.)

Test execution summary:
    total time:                          60.0045s
    total number of events:              44136
    total time taken by event execution: 479.8680
    per-request statistics:
         min:                                  0.76ms
         avg:                                 10.87ms
         max:                               1182.25ms
         approx.  95 percentile:              11.39ms

Threads fairness:
    events (avg/stddev):           5517.0000/50.60
    execution time (avg/stddev):   59.9835/0.00

Summary:

Platform   CPU            FileIO           OLTP
ESXi       15.9707 sec    517.75 Kb/sec    729.59 transactions/sec
KVM        15.2372 sec    2.25 Mb/sec      735.55 transactions/sec

Wow again. The CPU and OLTP results are pretty similar between the ESXi and KVM rigs, but FileIO is over 4 times faster. What a difference removing a virtualisation layer makes! FileIO on the test VM is now comparable to my (6-year-old) laptop with a hybrid SSD/HDD (2.9375Mb/sec). It may be possible to improve this further by converting the .vmdk disk image files to another format - perhaps qcow2 or qed.
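If I do try that, the conversion itself is straightforward with qemu-img (file names below are placeholders); the guest definition would then need its disk path and driver type updated to match:

# convert a VMware disk image to qcow2; the source file is left untouched
qemu-img convert -p -f vmdk -O qcow2 benchvm.vmdk benchvm.qcow2

# check the result
qemu-img info benchvm.qcow2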


* I connected the Management vLAN interface of the host to a bridge because the collection of vLANs is routed by a router/firewall running in a VM on this host. To be specific: the host must exchange ethernet frames with one of its own VMs. To achieve this, the Management vLAN interface of the router VM is also connected to the bridge.

If instead we try to connect the Management vLAN interface of the router VM directly to the host’s vLAN interface, then ethernet frame delivery between host and guest must go via the externally connected, vLAN-capable switch (or be turned around inside the ethernet adapter - which most adapters can’t do). As both the router VM’s and the host’s MAC addresses appear on the same switch port, the switch does not forward a frame back out of the port it arrived on, and so the host and the router never ‘see’ each other’s frames.


Update 5-Apr-2017: After over 3 years of almost continuous running (in the host’s original VMware incarnation and currently under Fedora 22), one of my 4TB (WD Green) drives finally quit. Disaster? No, of course not. The only reason I noticed was the monitoring I have built into my setup. The RAID-Z1 (ZFS RAID5 equivalent) array did its job and the services are all running fine. I’ve ordered a new 4TB drive and will fit it as soon as possible. There will be a brief interruption to the services but I don’t suppose anyone outside the family will notice. Of course, my data will disappear for ever if another disk fails before the new disk is installed and up to date - but I have an off-line backup if needed. Maybe I’ll buy another disk and change to RAID-Z2 - but it’ll take a long time to back up and restore the entire volume and I’m loath to buy GBP 100 of equipment to do “nothing”…
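For reference, swapping in the new drive is a single ZFS operation; the pool and device names here are placeholders for my actual ones:

# identify the failed device first
zpool status tank

# tell ZFS to rebuild onto the new disk (resilvering starts immediately)
zpool replace tank /dev/disk/by-id/ata-WDC_OLD_DISK /dev/disk/by-id/ata-WDC_NEW_DISK

# watch the resilver progress
zpool status tank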

Not bad for consumer-grade hardware, I think. I’ll take the downtime opportunity to rebuild the operating system on Ubuntu Xenial (16.04 LTS) and configure a read cache (L2ARC) and ZIL (write ‘cache’) on a pair of Kingston MLC SSDs. Yes, I know that the ZIL should really be on more reliable SLC storage, but with mirroring (and monitoring) I should get decent enough reliability. Actually, the OS is already built and configured - I just need to import and repair the ZFS pool and import the VMs to complete the job.
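The SSD arrangement I have in mind looks roughly like this (pool, device and partition names are placeholders): the ZIL is mirrored across a partition on each SSD, while the L2ARC uses the remaining space and is simply striped, since ZFS does not mirror cache devices:

# mirrored ZFS intent log (ZIL / SLOG) on one partition of each SSD
zpool add tank log mirror /dev/disk/by-id/ata-KINGSTON_SSD1-part1 /dev/disk/by-id/ata-KINGSTON_SSD2-part1

# read cache (L2ARC) on the remaining partitions - cache devices are never mirrored
zpool add tank cache /dev/disk/by-id/ata-KINGSTON_SSD1-part2 /dev/disk/by-id/ata-KINGSTON_SSD2-part2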

I’ll re-run the above benchmarks once the ZFS ‘resilvering’ (RAID rebuild) is complete and will add another update to this post with the results.