Home Lab – Wrap up

Earlier this year I documented the rebuild of my home lab environment using Gluster and RHV. This post is a final wrap-up of that rebuild, as one or two things changed significantly since the last part was written, and I have been intending to write this follow-up article for some time…

Issues with the original setup

If you recall from my series on the rebuild, I was using three nodes, each with an internal RAID set acting as a gluster brick, giving three bricks in total per volume.

Well, that setup worked really well – UNTIL one Sunday when both main nodes (baremetal1 and baremetal2) decided to run an mdadm scan on their soft-RAID volumes at the same time (thanks, cron).

Disk IO times went south big time, and the cluster pretty much ground to a halt. This resulted in RHV taking all the volumes offline, and the manager itself stopping. The two hosted-engine hypervisors then went into spasms trying to relaunch the engine, and I was spammed with a couple of hundred emails from the cluster over the space of several hours.
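
For reference, on RHEL-family hosts that scan is normally driven by the weekly raid-check cron job shipped with mdadm, which fires at the same time on every machine by default. Staggering that schedule would have been one obvious mitigation; a sketch along these lines (the times below are just an example):

# /etc/cron.d/raid-check on baremetal1: the packaged default, 01:00 every Sunday
0 1 * * Sun root /usr/sbin/raid-check

# /etc/cron.d/raid-check on baremetal2: offset by a few hours so the two scans never overlap
0 6 * * Sun root /usr/sbin/raid-check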

I was able to stabilise things once the mdadm scans had finished, but this was far from a usable solution for me. With the cluster stable, I stood up a temporary filestore on my NAS via iSCSI and relocated all VM images over to that with the exception of the ovirt-engine.

Then I trashed the cluster and rebuilt it a little differently.

The final rebuild

The only thing I changed in this rebuild was the layout of the gluster bricks (covered in Part 2 of the original series) – everything else was rebuilt the same way as in my original posts, so I won't rewrite all of that here. In the BIOS I removed the RAID setup so that the individual disks were back to being standalone devices. Below is the updated configuration that I used. In this rebuild I have also elected to keep the engine volume on my iSCSI NAS rather than on Gluster, so all I am creating below is the vmstore volume.

Preparing the filesystem

The first task is to create the three bricks on the block devices to be shared (/dev/sd[b-d] on each of the two main nodes), with one LVM volume per disk. The third node still doesn't have enough storage to hold full replicas, so it provides the arbiter bricks as three smaller LVM volumes carved out of a single disk. The resultant setup looks like this, with NINE bricks now configured across the three servers:

[root@baremetal1 ~]# lsblk
NAME             MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                8:0    0 111.8G  0 disk  
├─sda1             8:1    0     1G  0 part  /boot
└─sda2             8:2    0 110.8G  0 part  
  ├─vg_sys-root  253:0    0    10G  0 lvm   /
  ├─vg_sys-swap  253:1    0     4G  0 lvm   [SWAP]
  ├─vg_sys-audit 253:2    0     1G  0 lvm   /var/log/audit
  ├─vg_sys-log   253:3    0     4G  0 lvm   /var/log
  ├─vg_sys-var   253:4    0    20G  0 lvm   /var
  ├─vg_sys-tmp   253:5    0     6G  0 lvm   /tmp
  └─vg_sys-home  253:6    0     1G  0 lvm   /home
sdb 8:16 0 2.7T 0 disk 
└─WDC_WD30EFRX-68EUZN0_WD-WCC4N0CK9PC9 253:3 0 2.7T 0 mpath 
  └─vg_gluster1-brick1 253:6 0 2.7T 0 lvm /bricks/brick1
sdc 8:32 0 2.7T 0 disk 
└─WDC_WD30EFRX-68EUZN0_WD-WCC4N2VNCA71 253:4 0 2.7T 0 mpath 
  └─vg_gluster2-brick2 253:5 0 2.7T 0 lvm /bricks/brick2
sdd 8:48 0 2.7T 0 disk 
└─WDC_WD30EFRX-68EUZN0_WD-WCC4N0LDHNCT 253:2 0 2.7T 0 mpath 
  └─vg_gluster3-brick3 253:7 0 2.7T 0 lvm /bricks/brick3

[root@baremetal2 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 111.8G 0 disk 
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 110.8G 0 part 
  ├─vg_sys-root 253:0 0 10G 0 lvm /
  ├─vg_sys-swap 253:1 0 4G 0 lvm [SWAP]
  ├─vg_sys-audit 253:8 0 1G 0 lvm /var/log/audit
  ├─vg_sys-log 253:9 0 4G 0 lvm /var/log
  ├─vg_sys-var 253:10 0 20G 0 lvm /var
  ├─vg_sys-tmp 253:11 0 6G 0 lvm /tmp
  └─vg_sys-home 253:12 0 1G 0 lvm /home
sdb 8:16 0 2.7T 0 disk 
└─WDC_WD30EFRX-68EUZN0_WD-WCC4N6VTC4CU 253:2 0 2.7T 0 mpath 
  └─vg_gluster1-brick4 253:7 0 2.7T 0 lvm /bricks/brick4
sdc 8:32 0 2.7T 0 disk 
└─WDC_WD30EFRX-68EUZN0_WD-WCC4N6VTCKKZ 253:3 0 2.7T 0 mpath 
  └─vg_gluster2-brick5 253:6 0 2.7T 0 lvm /bricks/brick5
sdd 8:48 0 2.7T 0 disk 
└─WDC_WD30EFRX-68EUZN0_WD-WCC4N3FFH6NX 253:4 0 2.7T 0 mpath 
  └─vg_gluster3-brick6 253:5 0 2.7T 0 lvm /bricks/brick6


[root@baremetal3 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 111.8G 0 disk 
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 110.8G 0 part 
  ├─vg_sys-root 253:0 0 10G 0 lvm /
  ├─vg_sys-swap 253:1 0 4G 0 lvm [SWAP]
  ├─vg_sys-audit 253:7 0 1G 0 lvm /var/log/audit
  ├─vg_sys-log 253:8 0 4G 0 lvm /var/log
  ├─vg_sys-var 253:9 0 20G 0 lvm /var
  ├─vg_sys-tmp 253:10 0 6G 0 lvm /tmp
  └─vg_sys-home 253:11 0 1G 0 lvm /home
sdb 8:16 0 1.8T 0 disk 
└─WDC_WD20EZRX-00DC0B0_WD-WMC1T2632906 253:3 0 1.8T 0 mpath 
  ├─vg_gluster1-brick7 253:4 0 200G 0 lvm /bricks/brick7
  ├─vg_gluster1-brick8 253:5 0 200G 0 lvm /bricks/brick8
  └─vg_gluster1-brick9 253:6 0 200G 0 lvm /bricks/brick9
sdc 8:32 0 1.8T 0 disk 
└─WDC_WD20EFRX-68AX9N0_WD-WMC301763701 253:2 0 1.8T 0 mpath 
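
For completeness, each vg_gluster volume group above sits on a single multipath device: one volume group and one logical volume per disk on the two main nodes, and three 200G logical volumes in one group on baremetal3. A rough sketch of the sort of commands that produce this layout on baremetal1, using the multipath names from the lsblk output above (repeat with vg_gluster2/brick2 and vg_gluster3/brick3 for the other two disks):

[root@baremetal1 ~]# pvcreate /dev/mapper/WDC_WD30EFRX-68EUZN0_WD-WCC4N0CK9PC9
[root@baremetal1 ~]# vgcreate vg_gluster1 /dev/mapper/WDC_WD30EFRX-68EUZN0_WD-WCC4N0CK9PC9
[root@baremetal1 ~]# lvcreate -l 100%FREE -n brick1 vg_gluster1

On baremetal3 the three arbiter bricks are simply fixed-size volumes in the one group, e.g. lvcreate -L 200G -n brick7 vg_gluster1.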

Next, we need to format the new brick volumes on each node, create mount points for them, and configure them to be mounted at boot time. When formatting, the recommendation for Gluster is an inode size of 512 bytes – this is the XFS default here, but it doesn't hurt to specify it explicitly.

The bricks can be mounted anywhere, but I have chosen to mount them under the /bricks directory.

[root@baremetal1 ~]# mkfs.xfs -i size=512 /dev/mapper/vg_gluster1-brick1
[root@baremetal1 ~]# mkfs.xfs -i size=512 /dev/mapper/vg_gluster2-brick2
[root@baremetal1 ~]# mkfs.xfs -i size=512 /dev/mapper/vg_gluster3-brick3
[root@baremetal1 ~]# mkdir -p /bricks/brick{1..3}
[root@baremetal1 ~]# echo '/dev/mapper/vg_gluster1-brick1 /bricks/brick1 xfs noatime 0 0' >> /etc/fstab
[root@baremetal1 ~]# echo '/dev/mapper/vg_gluster2-brick2 /bricks/brick2 xfs noatime 0 0' >> /etc/fstab
[root@baremetal1 ~]# echo '/dev/mapper/vg_gluster3-brick3 /bricks/brick3 xfs noatime 0 0' >> /etc/fstab
[root@baremetal1 ~]# mount -a

[root@baremetal2 ~]# mkfs.xfs -i size=512 /dev/mapper/vg_gluster1-brick4
[root@baremetal2 ~]# mkfs.xfs -i size=512 /dev/mapper/vg_gluster2-brick5
[root@baremetal2 ~]# mkfs.xfs -i size=512 /dev/mapper/vg_gluster3-brick6
[root@baremetal2 ~]# mkdir -p /bricks/brick{4..6}
[root@baremetal2 ~]# echo '/dev/mapper/vg_gluster1-brick4 /bricks/brick4 xfs noatime 0 0' >> /etc/fstab
[root@baremetal2 ~]# echo '/dev/mapper/vg_gluster2-brick5 /bricks/brick5 xfs noatime 0 0' >> /etc/fstab
[root@baremetal2 ~]# echo '/dev/mapper/vg_gluster3-brick6 /bricks/brick6 xfs noatime 0 0' >> /etc/fstab
[root@baremetal2 ~]# mount -a

[root@baremetal3 ~]# mkfs.xfs -i size=512 /dev/mapper/vg_gluster1-brick7
[root@baremetal3 ~]# mkfs.xfs -i size=512 /dev/mapper/vg_gluster1-brick8
[root@baremetal3 ~]# mkfs.xfs -i size=512 /dev/mapper/vg_gluster1-brick9
[root@baremetal3 ~]# mkdir -p /bricks/brick{7..9}
[root@baremetal3 ~]# echo '/dev/mapper/vg_gluster1-brick7 /bricks/brick7 xfs noatime 0 0' >> /etc/fstab
[root@baremetal3 ~]# echo '/dev/mapper/vg_gluster1-brick8 /bricks/brick8 xfs noatime 0 0' >> /etc/fstab
[root@baremetal3 ~]# echo '/dev/mapper/vg_gluster1-brick9 /bricks/brick9 xfs noatime 0 0' >> /etc/fstab
[root@baremetal3 ~]# mount -a

Having created and mounted the brick filesystems, the last step is to create a 'brick' directory inside each mountpoint. This subdirectory is the actual gluster brick that will be replicated across the nodes; using a subdirectory rather than the mountpoint itself means that if a brick filesystem ever fails to mount, Gluster refuses to start that brick rather than quietly writing into the root filesystem.

[root@baremetal1 ~]# mkdir /bricks/brick1/brick
[root@baremetal1 ~]# mkdir /bricks/brick2/brick
[root@baremetal1 ~]# mkdir /bricks/brick3/brick

[root@baremetal2 ~]# mkdir /bricks/brick4/brick 
[root@baremetal2 ~]# mkdir /bricks/brick5/brick 
[root@baremetal2 ~]# mkdir /bricks/brick6/brick

[root@baremetal3 ~]# mkdir /bricks/brick7/brick 
[root@baremetal3 ~]# mkdir /bricks/brick8/brick 
[root@baremetal3 ~]# mkdir /bricks/brick9/brick

[root@baremetal1 ~]# tree /bricks
/bricks
├── brick1
│   └── brick
├── brick2
│   └── brick
└── brick3
    └── brick

[root@baremetal2 ~]# tree /bricks
/bricks
├── brick4
│   └── brick
├── brick5
│   └── brick
└── brick6
    └── brick

[root@baremetal3 ~]# tree /bricks
/bricks
├── brick7
│   └── brick
├── brick8
│   └── brick
└── brick9
    └── brick

Now we are ready to configure Gluster.

Configuring Gluster

To set up the Gluster cluster, we pick one node and run the peer probes from there (there is no real 'master' in Gluster; any node will do). We probe the 'storage' addresses so that the brick replication traffic uses the dedicated storage network.
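
The storageN names used below simply resolve to each node's address on that storage network, along the lines of the following /etc/hosts entries on every node (the 10.0.10.x addresses are placeholders for this example):

# /etc/hosts: dedicated storage network used for gluster traffic
10.0.10.1   storage1
10.0.10.2   storage2
10.0.10.3   storage3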

[root@baremetal1 ~]# gluster peer probe storage2
peer probe: success.
[root@baremetal1 ~]# gluster peer probe storage3
peer probe: success.

We can check that all three nodes see each other as part of the same cluster with a couple of status commands: gluster pool list and gluster peer status.

[root@baremetal1 ~]# gluster pool list
UUID					Hostname 	State
6b347e19-e595-49e2-a55f-87a6a4c815e5	storage3 	Connected 
6c28fbc7-fff8-4341-b57e-8cfc59e8d64b	storage2 	Connected 
cfc44f73-cba5-406b-a5e4-bf5806d88f92	localhost	Connected 

[root@baremetal2 ~]# gluster pool list
UUID					Hostname 	State
6b347e19-e595-49e2-a55f-87a6a4c815e5	storage3 	Connected 
cfc44f73-cba5-406b-a5e4-bf5806d88f92	storage1 	Connected 
6c28fbc7-fff8-4341-b57e-8cfc59e8d64b	localhost	Connected 

[root@baremetal3 ~]# gluster pool list
UUID					Hostname 	State
cfc44f73-cba5-406b-a5e4-bf5806d88f92	storage1 	Connected 
6c28fbc7-fff8-4341-b57e-8cfc59e8d64b	storage2 	Connected 
6b347e19-e595-49e2-a55f-87a6a4c815e5	localhost	Connected 


[root@baremetal1 ~]# gluster peer status
Number of Peers: 2

Hostname: storage3
Uuid: 6b347e19-e595-49e2-a55f-87a6a4c815e5
State: Peer in Cluster (Connected)

Hostname: storage2
Uuid: 6c28fbc7-fff8-4341-b57e-8cfc59e8d64b
State: Peer in Cluster (Connected)

Next, we can create the vmstore volume; this command can be run from any node. The volume type is a distributed replica across the three nodes, but remember that baremetal3 doesn't have enough storage to hold a full replica. Because of this we define a replica count of 3 with the third brick in each set acting as the arbiter (which stores only file names and metadata, not the data itself). In this configuration we have three replica sets spread across our nine bricks, with every third brick sitting on the arbiter node.

[root@baremetal1 ~]# gluster volume create vmstore replica 3 arbiter 1 \
                       storage1:/bricks/brick1/brick \
                       storage2:/bricks/brick4/brick \
                       storage3:/bricks/brick7/brick \
                       storage1:/bricks/brick2/brick \
                       storage2:/bricks/brick5/brick \
                       storage3:/bricks/brick8/brick \
                       storage1:/bricks/brick3/brick \
                       storage2:/bricks/brick6/brick \
                       storage3:/bricks/brick9/brick
volume create: vmstore: success: please start the volume to access data

Now we can go ahead and start the volume.

[root@baremetal1 ~]# gluster volume start vmstore
volume start: vmstore: success

With the volume created and started, we can check its status to confirm all is well so far:

[root@baremetal1 ~]# gluster volume status
Status of volume: vmstore
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick storage1:/bricks/brick1/brick         49152     0          Y       3038
Brick storage2:/bricks/brick4/brick         49152     0          Y       3300
Brick storage3:/bricks/brick7/brick         49152     0          Y       2597
Brick storage1:/bricks/brick2/brick         49153     0          Y       3039
Brick storage2:/bricks/brick5/brick         49153     0          Y       3328
Brick storage3:/bricks/brick8/brick         49153     0          Y       2598
Brick storage1:/bricks/brick3/brick         49154     0          Y       3037
Brick storage2:/bricks/brick6/brick         49154     0          Y       3336
Brick storage3:/bricks/brick9/brick         49154     0          Y       2599
Self-heal Daemon on localhost               N/A       N/A        Y       5605
Self-heal Daemon on storage2                N/A       N/A        Y       14866
Self-heal Daemon on storage3                N/A       N/A        Y       6561

Task Status of Volume vmstore
------------------------------------------------------------------------------
There are no active volume tasks

[root@baremetal1 ~]# gluster volume info

Volume Name: vmstore
Type: Distributed-Replicate
Volume ID: 63b5d204-e933-4a82-86ed-2230e8e81a92
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: storage1:/bricks/brick1/brick
Brick2: storage2:/bricks/brick4/brick
Brick3: storage3:/bricks/brick7/brick (arbiter)
Brick4: storage1:/bricks/brick2/brick
Brick5: storage2:/bricks/brick5/brick
Brick6: storage3:/bricks/brick8/brick (arbiter)
Brick7: storage1:/bricks/brick3/brick
Brick8: storage2:/bricks/brick6/brick
Brick9: storage3:/bricks/brick9/brick (arbiter)
Options Reconfigured:
transport.address-family: inet
cluster.quorum-type: auto
network.ping-timeout: 10
auth.allow: *
storage.owner-uid: 36
storage.owner-gid: 36
server.allow-insecure: on
nfs.disable: on
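
These options are applied per volume with gluster volume set, for example:

[root@baremetal1 ~]# gluster volume set vmstore storage.owner-uid 36
[root@baremetal1 ~]# gluster volume set vmstore storage.owner-gid 36
[root@baremetal1 ~]# gluster volume set vmstore network.ping-timeout 10
[root@baremetal1 ~]# gluster volume set vmstore cluster.quorum-type auto

The 36:36 ownership matches the vdsm user and kvm group that RHV expects to own its storage domains.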

From here, the configuration is pretty much as already documented in Part 3 of the original post.
