Stop Satellite 6 from messing up host NIC definitions

Have you ever provisioned a host with a specific interface configuration in Satellite, only to try to reprovision that host later and have it fail because some application has reconfigured the NIC’s post-deployment and Satellite is now polluted with virtual and/or invalid NIC information?

This happened to me whilst messing with RHV (ovirt) – during the configuration of RHV many additional interfaces were created on my hosts, and when I tried to re-kickstart them from Satellite it just complained about the 20+ virtual interfaces that RHV had created, and my primary interface had been changed to the ovirtmgmt bridge – kickstart on a bridge doesn’t work so well…  Turns out if you create an interface in RHV called ‘traffic’ then you will have that as an interface in Satellite.
If it’s VLAN tagged then you’ll have the tagged interfaces such as traffic.11  traffic.12  etc as well.  Gets messy real quick – thanks facter 🙂

Since I used a common base kickstart to build my hosts, puppet was installed even though I am not using it to deploy any configurations. Puppet itself has facter as a dependency, and so when puppet calls home to Satellite it reports the facts gathered as well, which then update the host details in Satellite.

There are a couple of ways to get around this problem… The first is to simply not install puppet on the hosts – no puppet, no report. But in many cases puppet may actually be wanted/needed to deploy classes required by an enterprise for security.

Thankfully we can address this particular annoyance on the Satellite side. In Satellite, under Settings -> Provisioning is an option to ‘Ignore Puppet facts for provisioning‘. The default is NO. Setting this to YES causes Satellite to ignore any interface facts sent by Puppet, so your provisioning configuration doesn’t get screwed up.

This also applies to the installed OS – ever had it where you kickstatred with 7.3 but after the system upgrades to 7.4 and you try to reprovision, you have some weird OS version mismatch in Satellite because part of the host record is 7.4 but the provisioning record is still 7.3?  Yep – this one had been annoying me for some time with various Satellite installations, time to stop it for good.  Again, under Settings -> Provisioning is the option to ‘Ignore facts for operating system‘. Once more the default in NO, but changing this will stop version updates on the host from changing the Satellite records, so re-provisioning will ‘just work’ without version conflict errors, provided the original OS version is still available for use. (You should, of course, update the host records in Satellite to use the latest versions)

Alternatively, if you know in advance what the naming conventions for the virtual NICs is going to be (you are deploying into a development/QA environment first, right?) you can add them to the ‘ignore interfaces with matching identifier‘ parameter, which already has interface prefixes for Openstack (qvo/qbr/tap etc). All you will need to do is to add the interface prefixes that you will be using in RHV to this list.

Note that these parameter changes are GLOBAL so be careful – however I can’t see too many instances where you want your known provisioning state in Satellite to be modified without your knowledge….

Home Lab – Wrap up

Earlier this year I documented the rebuild of my home lab environment using Gluster and RHV. This post is a final wrap-up of that rebuild, as one or two things changed significantly since the last part was written, and I have been intending to write this follow-up article for some time…

Issues with the original setup

If you recall from my series on the rebuild I was using three nodes, each with an internal RAID set acting as a gluster brick, giving three bricks in total per volume.

Well, that setup worked really well – UNTIL one Sunday when both main nodes (baremetal1 and baremetal2) decided to run an mdadm scan on the soft-raid volume, at the same time (thanks cron).

What happened was that the disk IO times went south big time, and the cluster pretty much ground to a halt. This resulted in RHV taking all the volumes offline, and the manager itself stopping. The two hosted-engine hypervisors then went into spasms trying to relaunch the engine, and I was spammed by a couple of hundred emails from the cluster over the space of several hours.

I was able to stabilise things once the mdadm scans had finished, but this was far from a usable solution for me. With the cluster stable, I stood up a temporary filestore on my NAS via iSCSI and relocated all VM images over to that with the exception of the ovirt-engine.

Then I trashed the cluster and rebuilt it a little differently.