Rebuilding my home lab: Part 2

In part 1 of this series we looked at the hardware configuration for the new lab, and installed the base OS via kickstart from Satellite. In this part we will look at how to present a shared storage domain from the local disks in each individual server.

If you remember back to part 1, our new setup looks like this:

Each server has a local RAID5 array consisting of 3x 3TB spinning disks (SSD would be nice, but not an option here yet), giving a total of 5.2 TB of storage per server. The RAID array is constructed by the Supermicro main board, although it is not ‘real’ hardware RAID – the onboard controller builds a FakeRAID array using mdadm. This means we can see the status of the array using the standard mdadm tools and /proc/mdstat.

[root@baremetal1 ~]# lsblk
NAME             MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                8:0    0 111.8G  0 disk  
├─sda1             8:1    0     1G  0 part  /boot
└─sda2             8:2    0 110.8G  0 part  
  ├─vg_sys-root  253:0    0    10G  0 lvm   /
  ├─vg_sys-swap  253:1    0     4G  0 lvm   [SWAP]
  ├─vg_sys-audit 253:2    0     1G  0 lvm   /var/log/audit
  ├─vg_sys-log   253:3    0     4G  0 lvm   /var/log
  ├─vg_sys-var   253:4    0    20G  0 lvm   /var
  ├─vg_sys-tmp   253:5    0     6G  0 lvm   /tmp
  └─vg_sys-home  253:6    0     1G  0 lvm   /home
sdb                8:16   0   2.7T  0 disk  
└─md126            9:126  0   5.2T  0 raid5 
sdc                8:32   0   2.7T  0 disk  
└─md126            9:126  0   5.2T  0 raid5 
sdd                8:48   0   2.7T  0 disk  
└─md126            9:126  0   5.2T  0 raid5 
 
[root@baremetal1 ~]# mdadm --detail /dev/md126
/dev/md126:
         Container : /dev/md/imsm0, member 0
        Raid Level : raid5
        Array Size : 5567516672 (5309.60 GiB 5701.14 GB)
     Used Dev Size : 2783758336 (2654.80 GiB 2850.57 GB)
      Raid Devices : 3
     Total Devices : 3

             State : clean 
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-asymmetric
        Chunk Size : 128K

Consistency Policy : resync

              UUID : 925d82a3:933fecde:4a88ff3c:18600c81
    Number   Major   Minor   RaidDevice State
       2       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       0       8       48        2      active sync   /dev/sdd


[root@baremetal1 ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] 
md126 : active (auto-read-only) raid5 sdb[2] sdc[1] sdd[0]
      5567516672 blocks super external:/md127/0 level 5, 128k chunk, algorithm 0 [3/3] [UUU]
      
md127 : inactive sdb[2](S) sdc[1](S) sdd[0](S)
      9459 blocks super external:imsm
       
unused devices: <none>

Shared storage options

Given the intent to install and use Red Hat Virtualisation (RHV) on these hosts, I needed a way to share the available storage between them. Yes, RHV can use local storage, however setting it up that way means that VMs cannot migrate between the hosts, so there goes High Availability. The dilemma, then, was how to set up the system so that the local RAID volumes are available to both servers.

The first option I looked at was DRBD, or Distributed Replicated Block Device. This is a software layer from LINBIT HA Solutions that combines the storage on two servers into a network-based RAID-1 array. The DRBD kernel modules are not shipped with RHEL or CentOS and need to be installed as kmod packages, either from the community ELRepo repository or as a supported version directly from LINBIT (although this requires a subscription).

Given my relationship with Red Hat, however, I didn’t do any testing with DRBD, and instead looked into using Gluster, keeping DRBD as my fall-back if Gluster didn’t work out. I have always wanted to check out this technology, so here was a good opportunity to do so. After all, my lab will ultimately be RHEL with RHV on top – adding Gluster would make my setup look pretty much like the Red Hat Hyperconverged Infrastructure (RHHI) implementation. Using Gluster also allows for future scaling, in both scale-up and scale-out modes.

Red Hat Gluster Storage for On-premises Architecture

Only one problem – Gluster should have a minimum of THREE nodes… I only have two, and didn’t want to spin up another host just to be a storage node (otherwise I would just build an iSCSI NAS and be done with it). With only two nodes, the loss of either one means the loss of quorum, which puts the storage into a read-only state, and there is no third vote to resolve the resulting split-brain condition, as we are unable to determine which node holds the correct data.

….Enter the Gluster Arbiter node…..

The Arbiter node allows the Gluster cluster to maintain quorum if one host goes down, so the storage volume remains in a read/write state. The beauty of the arbiter is that it doesn’t actually store content – it just maintains a copy of the metadata, so it does not need the same quantity of storage. In fact, with the formula given for arbiter sizing being ‘4KB * number of files’, a 16GB USB thumb drive would allow for 16GB / 4KB = 4 million files. That should do for my VM storage scenario 🙂
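
If you want to sanity-check that sizing against a different drive or file count, the back-of-envelope arithmetic is easy enough in the shell (a rough check only – it ignores filesystem overhead on the arbiter brick):

echo $(( 16 * 1024 * 1024 * 1024 / (4 * 1024) ))    # 16GB arbiter / 4KB per file = 4194304 files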

I did have a spare Raspberry Pi v3 in the cupboard – can I use that as an arbiter node? In short, YES. HOWEVER, it comes at the cost of not being able to install RHHI as per the install manuals – in fact I suspect that none of the Gluster management capabilities within RHV will work correctly, as they are set up expecting all Gluster nodes to be running RHEL and the same version of Gluster. The RPi is an ARM processor running Debian, whilst the hypervisors are x86_64 running RHEL.

So let’s see how this works.

For the arbiter node, I have installed Raspbian Stretch on the RPi v3 and inserted a 16GB thumb drive, which has come up as /dev/sda. This node is called ‘glusterpi’ and has an IP address on my Storage VLAN.

root@glusterpi:~# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    1 14.6G  0 disk 
mmcblk0     179:0    0 14.9G  0 disk 
├─mmcblk0p1 179:1    0 41.8M  0 part /boot
└─mmcblk0p2 179:2    0 14.8G  0 part /

root@glusterpi:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 inet 127.0.0.1/8 scope host lo
 valid_lft forever preferred_lft forever
 inet6 ::1/128 scope host 
 valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
 link/ether b8:27:eb:47:2a:77 brd ff:ff:ff:ff:ff:ff
 inet 10.1.12.23/24 brd 10.1.12.255 scope global eth0
 valid_lft forever preferred_lft forever
 inet6 fe80::4c29:6291:8f44:e1a6/64 scope link 
 valid_lft forever preferred_lft forever

Installing Gluster

First, we need to set up the three hosts so that they can resolve and access each other on the storage network.

On each node we update the hosts file, and verify connectivity:

[root@baremetal1 ~]# echo '10.1.12.21   storage1' >> /etc/hosts
[root@baremetal1 ~]# echo '10.1.12.22   storage2' >> /etc/hosts
[root@baremetal1 ~]# echo '10.1.12.23   storage3' >> /etc/hosts

[root@baremetal2 ~]# echo '10.1.12.21   storage1' >> /etc/hosts
[root@baremetal2 ~]# echo '10.1.12.22   storage2' >> /etc/hosts
[root@baremetal2 ~]# echo '10.1.12.23   storage3' >> /etc/hosts

root@glusterpi:~# echo '10.1.12.21   storage1' >> /etc/hosts
root@glusterpi:~# echo '10.1.12.22   storage2' >> /etc/hosts
root@glusterpi:~# echo '10.1.12.23   storage3' >> /etc/hosts

[root@baremetal1 ~]# for i in 1 2 3; do ping -c1 storage${i}|grep packet; done
1 packets transmitted, 1 received, 0% packet loss, time 0ms
1 packets transmitted, 1 received, 0% packet loss, time 0ms
1 packets transmitted, 1 received, 0% packet loss, time 0ms

[root@baremetal2 ~]# for i in 1 2 3; do ping -c1 storage${i}|grep packet; done
1 packets transmitted, 1 received, 0% packet loss, time 0ms
1 packets transmitted, 1 received, 0% packet loss, time 0ms
1 packets transmitted, 1 received, 0% packet loss, time 0ms

root@glusterpi:~# for i in 1 2 3; do ping -c1 storage${i}|grep packet; done
1 packets transmitted, 1 received, 0% packet loss, time 0ms
1 packets transmitted, 1 received, 0% packet loss, time 0ms
1 packets transmitted, 1 received, 0% packet loss, time 0ms

Great – we can see each other. Next, we need to install the Gluster packages on each host. This is of course different for Red Hat and Debian-based systems. My RHEL hosts are subscribed to my Satellite, and I have the Gluster repository available for their use.

I did read ahead a little during my testing, and found that RHV wants a specific global glusterd parameter (rpc-auth-allow-insecure) to be set, so I am also doing this here before starting the daemon.

Edit – 2018-04-16 – I found during testing that Gluster was not reliably detecting the failure of a node, so the cluster would hang if one node was ungracefully shut down. Setting ‘ping-timeout’ to 10 seconds (instead of the default of 0) resulted in gluster peer status reliably tracking the Disconnected state of the failed node. I am adding that global parameter in here as well.

[root@baremetal1 ~]# subscription-manager repos --enable rh-gluster-3-for-rhel-7-server-rpms
[root@baremetal1 ~]# yum -y install glusterfs-server
[root@baremetal1 ~]# systemctl enable glusterd 
[root@baremetal1 ~]# sed -i '/option event-threads 1/a\ \ \ \ option rpc-auth-allow-insecure on' /etc/glusterfs/glusterd.vol
[root@baremetal1 ~]# sed -i 's/option ping-timeout.*/option ping-timeout 10/' /etc/glusterfs/glusterd.vol
[root@baremetal1 ~]# systemctl start glusterd

[root@baremetal2 ~]# subscription-manager repos --enable rh-gluster-3-for-rhel-7-server-rpms
[root@baremetal2 ~]# yum -y install glusterfs-server 
[root@baremetal2 ~]# systemctl enable glusterd 
[root@baremetal2 ~]# sed -i '/option event-threads 1/a\ \ \ \ option rpc-auth-allow-insecure on' /etc/glusterfs/glusterd.vol
[root@baremetal2 ~]# sed -i 's/option ping-timeout.*/option ping-timeout 10/' /etc/glusterfs/glusterd.vol 
[root@baremetal2 ~]# systemctl start glusterd
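
It is worth a quick check afterwards that both sed edits actually landed in glusterd.vol on each RHEL node:

[root@baremetal1 ~]# grep -E 'rpc-auth-allow-insecure|ping-timeout' /etc/glusterfs/glusterd.vol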

On our RHEL nodes we can also configure firewalld to allow Gluster traffic. I should also configure the firewall on the RPi, but for now that one is open (a possible approach is sketched below).

[root@baremetal1 ~]# firewall-cmd --permanent --add-service glusterfs
[root@baremetal1 ~]# firewall-cmd --reload

[root@baremetal2 ~]# firewall-cmd --permanent --add-service glusterfs
[root@baremetal2 ~]# firewall-cmd --reload
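
If I do get around to locking down the RPi, allow rules along these lines (ahead of whatever default-deny policy gets added) should cover Gluster – a sketch only, not something I have applied here; 24007-24008 is glusterd management, brick ports start at 49152, and the brick port range and storage subnet used below are assumptions:

root@glusterpi:~# apt-get install -y iptables-persistent
root@glusterpi:~# iptables -A INPUT -s 10.1.12.0/24 -p tcp --dport 24007:24008 -j ACCEPT
root@glusterpi:~# iptables -A INPUT -s 10.1.12.0/24 -p tcp --dport 49152:49160 -j ACCEPT
root@glusterpi:~# netfilter-persistent save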

For the Raspberry Pi, we need to enable the ‘deb-src’ repository first. We also need to install the xfsprogs package so that we can use the XFS filesystem. Again, we are adding the global parameter required by RHV before starting the daemon.

root@glusterpi:~# sed -i 's/#deb-src\(.*\)/deb-src\1/' /etc/apt/sources.list
root@glusterpi:~# apt-get update
root@glusterpi:~# apt-get install glusterfs-server xfsprogs
root@glusterpi:~# sed -i '/option event-threads 1/a\ \ \ \ option rpc-auth-allow-insecure on' /etc/glusterfs/glusterd.vol
root@glusterpi:~# service glusterfs-server restart

Now it is important to note that the Gluster versions on the RHEL and RPi nodes differ. The major version is the same (3.8), so we should be fine – but remember this difference!

[root@baremetal1 ~]# glusterfs --version
glusterfs 3.8.4 built on Mar 23 2018 08:11:31
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

root@glusterpi:~# glusterfs --version
glusterfs 3.8.8 built on Jan 11 2017 14:07:11
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

And here is what our configuration looks like at a high level now.

Lab Configuration with Gluster

Preparing the filesystem

This is actually pretty straightforward. First, we need to configure our filesystems to hold the ‘bricks’ that make up the Gluster volumes. Each brick is replicated between nodes, and a set of bricks essentially represents a storage pool. Again, reading ahead to what I want to achieve with RHV, I require TWO storage pools – one to hold the RHV self-hosted engine (RHV manager, or RHV-M) and one to hold the VM images. So, I am creating two Gluster bricks on each node for this purpose. So that they can be replicated, I need to create the same filesystems on each node – the only difference being the brick numbering.

The first task is to create the two brick partitions on the block device to be shared (/dev/md126 on the RHEL nodes, /dev/sda on the RPi). On the RHEL nodes I created a 100GB partition for the RHV-M, with the rest of the drive for VM images. On the RPi, since we only need to store the metadata, I created a 2GB partition for the RHV-M and gave the rest of the device to the VM store. The resulting setup looks like this:
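
For reference, the partition layout shown below can be created with parted along these lines (a sketch of one way to do it, using the sizes described above – on the RPi the device is /dev/sda and the first partition is 2GB):

[root@baremetal1 ~]# parted -s /dev/md126 mklabel gpt
[root@baremetal1 ~]# parted -s /dev/md126 mkpart primary 1MiB 100GiB
[root@baremetal1 ~]# parted -s /dev/md126 mkpart primary 100GiB 100%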

[root@baremetal1 ~]# lsblk
NAME             MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                8:0    0 111.8G  0 disk  
├─sda1             8:1    0     1G  0 part  /boot
└─sda2             8:2    0 110.8G  0 part  
  ├─vg_sys-root  253:0    0    10G  0 lvm   /
  ├─vg_sys-swap  253:1    0     4G  0 lvm   [SWAP]
  ├─vg_sys-audit 253:2    0     1G  0 lvm   /var/log/audit
  ├─vg_sys-log   253:3    0     4G  0 lvm   /var/log
  ├─vg_sys-var   253:4    0    20G  0 lvm   /var
  ├─vg_sys-tmp   253:5    0     6G  0 lvm   /tmp
  └─vg_sys-home  253:6    0     1G  0 lvm   /home
sdb                8:16   0   2.7T  0 disk  
└─md126            9:126  0   5.2T  0 raid5 
  ├─md126p1      259:0    0   100G  0 md    
  └─md126p2      259:1    0   5.1T  0 md    
sdc                8:32   0   2.7T  0 disk  
└─md126            9:126  0   5.2T  0 raid5 
  ├─md126p1      259:0    0   100G  0 md    
  └─md126p2      259:1    0   5.1T  0 md    
sdd                8:48   0   2.7T  0 disk  
└─md126            9:126  0   5.2T  0 raid5 
  ├─md126p1      259:0    0   100G  0 md    
  └─md126p2      259:1    0   5.1T  0 md 


root@glusterpi:~# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    1 14.6G  0 disk 
├─sda1        8:1    1    2G  0 part 
└─sda2        8:2    1 12.6G  0 part 
mmcblk0     179:0    0 14.9G  0 disk 
├─mmcblk0p1 179:1    0 41.8M  0 part /boot
└─mmcblk0p2 179:2    0 14.8G  0 part /

Next, we need to create some mount points for our new bricks on each node, format the partitions, and configure them to be mounted at boot time. When formatting the partitions, the recommendation is to use an inode size of 512 bytes – this is the XFS default, but it doesn’t hurt to specify it explicitly.

The bricks can be mounted anywhere, but I have chosen to mount them under the /bricks directory. I have numbered my bricks in ‘stripes’ across the three nodes, where bricks 1-3 are the RHV-M partition and bricks 4-6 are the VM store partition. Note also the nouuid mount option, which tells XFS not to reject the mount because of a duplicate filesystem UUID.

[root@baremetal1 ~]# mkfs.xfs -i size=512 /dev/md126p1
[root@baremetal1 ~]# mkfs.xfs -i size=512 /dev/md126p2
[root@baremetal1 ~]# mkdir -p /bricks/{brick1,brick4}
[root@baremetal1 ~]# echo '/dev/md126p1   /bricks/brick1    xfs     rw,noatime,nouuid    1 2' >> /etc/fstab
[root@baremetal1 ~]# echo '/dev/md126p2   /bricks/brick4    xfs     rw,noatime,nouuid    1 2' >> /etc/fstab
[root@baremetal1 ~]# mount -a

[root@baremetal2 ~]# mkfs.xfs -i size=512 /dev/md126p1
[root@baremetal2 ~]# mkfs.xfs -i size=512 /dev/md126p2
[root@baremetal2 ~]# mkdir -p /bricks/{brick2,brick5}
[root@baremetal2 ~]# echo '/dev/md126p1   /bricks/brick2    xfs     rw,noatime,nouuid    1 2' >> /etc/fstab
[root@baremetal2 ~]# echo '/dev/md126p2   /bricks/brick5    xfs     rw,noatime,nouuid    1 2' >> /etc/fstab
[root@baremetal2 ~]# mount -a

root@glusterpi:~# mkfs.xfs -i size=512 /dev/sda1
root@glusterpi:~# mkfs.xfs -i size=512 /dev/sda2
root@glusterpi:~# mkdir -p /bricks/{brick3,brick6}
root@glusterpi:~# echo '/dev/sda1           /bricks/brick3    xfs    rw,noatime,nouuid    1 2' >> /etc/fstab
root@glusterpi:~# echo '/dev/sda2           /bricks/brick6    xfs    rw,noatime,nouuid    1 2' >> /etc/fstab
root@glusterpi:~# mount -a

Having created and mounted the bricks, the last step in setting up the filesystem is to create a ‘brick’ directory inside each brick mountpoint. This is the actual gluster brick that will be replicated across the nodes.

[root@baremetal1 ~]# mkdir /bricks/brick1/brick
[root@baremetal1 ~]# mkdir /bricks/brick4/brick

[root@baremetal2 ~]# mkdir /bricks/brick2/brick
[root@baremetal2 ~]# mkdir /bricks/brick5/brick

root@glusterpi:~# mkdir /bricks/brick3/brick
root@glusterpi:~# mkdir /bricks/brick6/brick

[root@baremetal1 ~]# tree /bricks
/bricks
├── brick1
│   └── brick
└── brick4
    └── brick

[root@baremetal2 ~]# tree /bricks
/bricks
├── brick2
│   └── brick
└── brick5
    └── brick

root@glusterpi:~# tree /bricks
/bricks
├── brick3
│   └── brick
└── brick6
    └── brick

Now we are ready to configure Gluster.

Configuring Gluster

To set up the Gluster cluster, we need to choose one node to be a ‘master’ and probe each of the other nodes from there. Remember earlier I mentioned to note the versions of Gluster installed? One thing I did find doing this was that the node with the HIGHER version of the gluster package needs to be the master – in my case, the RPi (v3.8.8). Here’s what happens if you try to use one of the RHEL nodes (v3.8.4) as the master and probe each node – probing the higher-versioned node fails:

[root@baremetal1 ~]# gluster peer probe storage2
peer probe: success.
[root@baremetal1 ~]# gluster peer probe storage3
peer probe: failed: Peer storage3 does not support required op-version
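
If you hit this, you can compare the operating version each glusterd is running by checking its info file (standard Gluster location – commands only, output omitted here):

[root@baremetal1 ~]# grep operating-version /var/lib/glusterd/glusterd.info
root@glusterpi:~# grep operating-version /var/lib/glusterd/glusterd.info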

So, we will probe from the RPi. Note that we don’t need to probe the local node – it is automatically considered part of the cluster!

root@glusterpi:~# gluster peer probe storage1
peer probe: success. 
root@glusterpi:~# gluster peer probe storage2
peer probe: success.

We can check that all three nodes see each other as part of the same cluster with a couple of status commands – gluster pool list  and  gluster peer status

[root@baremetal1 ~]# gluster pool list
UUID					Hostname 	State
6b347e19-e595-49e2-a55f-87a6a4c815e5	storage3 	Connected 
6c28fbc7-fff8-4341-b57e-8cfc59e8d64b	storage2 	Connected 
cfc44f73-cba5-406b-a5e4-bf5806d88f92	localhost	Connected 

[root@baremetal2 ~]# gluster pool list
UUID					Hostname 	State
6b347e19-e595-49e2-a55f-87a6a4c815e5	storage3 	Connected 
cfc44f73-cba5-406b-a5e4-bf5806d88f92	storage1 	Connected 
6c28fbc7-fff8-4341-b57e-8cfc59e8d64b	localhost	Connected 

root@glusterpi:~# gluster pool list
UUID					Hostname 	State
cfc44f73-cba5-406b-a5e4-bf5806d88f92	storage1 	Connected 
6c28fbc7-fff8-4341-b57e-8cfc59e8d64b	storage2 	Connected 
6b347e19-e595-49e2-a55f-87a6a4c815e5	localhost	Connected 

[root@baremetal1 ~]# gluster peer status
Number of Peers: 2

Hostname: storage3
Uuid: 6b347e19-e595-49e2-a55f-87a6a4c815e5
State: Peer in Cluster (Connected)

Hostname: storage2
Uuid: 6c28fbc7-fff8-4341-b57e-8cfc59e8d64b
State: Peer in Cluster (Connected)


[root@baremetal2 ~]# gluster peer status
Number of Peers: 2

Hostname: storage3
Uuid: 6b347e19-e595-49e2-a55f-87a6a4c815e5
State: Peer in Cluster (Connected)

Hostname: storage1
Uuid: cfc44f73-cba5-406b-a5e4-bf5806d88f92
State: Peer in Cluster (Connected)


root@glusterpi:~# gluster peer status
Number of Peers: 2

Hostname: storage1
Uuid: cfc44f73-cba5-406b-a5e4-bf5806d88f92
State: Peer in Cluster (Connected)

Hostname: storage2
Uuid: 6c28fbc7-fff8-4341-b57e-8cfc59e8d64b
State: Peer in Cluster (Connected)

Next, we can create the two volumes we need: one volume for the RHV-M ‘Storage Domain’ (rhvm-dom) on bricks 1-3, and one volume for VM storage, the ‘Data Domain’ (data-dom), on bricks 4-6. This command can be run from any node. The volume type is a replica across three nodes, with one of them acting as an arbiter node (metadata only). The last brick defined becomes the arbiter, so make sure that the RPi (storage3) is last!

[root@baremetal1 ~]# gluster volume create rhvm-dom replica 3 arbiter 1 \
                       storage1:/bricks/brick1/brick \
                       storage2:/bricks/brick2/brick \
                       storage3:/bricks/brick3/brick
volume create: rhvm-dom: success: please start the volume to access data

[root@baremetal1 ~]# gluster volume create data-dom replica 3 arbiter 1 \
                       storage1:/bricks/brick4/brick \
                       storage2:/bricks/brick5/brick \
                       storage3:/bricks/brick6/brick
volume create: data-dom: success: please start the volume to access data

Now we can go ahead and start the volumes.

[root@baremetal1 ~]# gluster volume start rhvm-dom
volume start: rhvm-dom: success

[root@baremetal1 ~]# gluster volume start data-dom
volume start: data-dom: success

With the two volumes created and started, we can check the status to confirm all is well so far:

[root@baremetal1 ~]# gluster volume status
Status of volume: data-dom
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick storage1:/bricks/brick4/brick         49153     0          Y       22935
Brick storage2:/bricks/brick5/brick         49153     0          Y       22902
Brick storage3:/bricks/brick6/brick         49153     0          Y       9034 
Self-heal Daemon on localhost               N/A       N/A        Y       22955
Self-heal Daemon on storage2                N/A       N/A        Y       22923
Self-heal Daemon on storage3                N/A       N/A        Y       9054 
 
Task Status of Volume data-dom
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: rhvm-dom
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick storage1:/bricks/brick1/brick         49152     0          Y       22881
Brick storage2:/bricks/brick2/brick         49152     0          Y       22855
Brick storage3:/bricks/brick3/brick         49152     0          Y       9003 
Self-heal Daemon on localhost               N/A       N/A        Y       22955
Self-heal Daemon on storage2                N/A       N/A        Y       22923
Self-heal Daemon on storage3                N/A       N/A        Y       9054 
 
Task Status of Volume rhvm-dom
------------------------------------------------------------------------------
There are no active volume tasks


[root@baremetal1 ~]# gluster volume info
 
Volume Name: data-dom
Type: Replicate
Volume ID: c3072a8d-1608-4439-b2a4-a858e4e8616c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: storage1:/bricks/brick4/brick
Brick2: storage2:/bricks/brick5/brick
Brick3: storage3:/bricks/brick6/brick (arbiter)
Options Reconfigured:
server.allow-insecure: on
auth.allow: *
network.ping-timeout: 10
cluster.quorum-type: auto
transport.address-family: inet
nfs.disable: on
 
Volume Name: rhvm-dom
Type: Replicate
Volume ID: ca34c280-b166-4406-9d66-1cb436b25be0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: storage1:/bricks/brick1/brick
Brick2: storage2:/bricks/brick2/brick
Brick3: storage3:/bricks/brick3/brick (arbiter)
Options Reconfigured:
server.allow-insecure: on
storage.owner-gid: 36
storage.owner-uid: 36
auth.allow: *
network.ping-timeout: 10
cluster.quorum-type: auto
transport.address-family: inet
nfs.disable: on

In order to now USE our Gluster volumes, we need to mount them on both RHEL nodes. Since the RPi will never run an application that writes to the volumes, we don’t need to worry about that node here. I am manually mounting the Gluster volumes on each host using FUSE (the glusterfs filesystem type) for testing. In an RHV implementation we don’t need them mounted – RHV handles this for us in the storage domain definition. If we needed to, we could define these mounts in fstab to make them persistent.
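
For reference, persistent mounts would look something like this in /etc/fstab (a sketch only – I am not adding these on my hosts; the _netdev option defers the mount until the network is up):

storage1:/rhvm-dom   /export/rhvm   glusterfs   defaults,_netdev   0 0
storage1:/data-dom   /export/data   glusterfs   defaults,_netdev   0 0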

We could also mount the volumes using NFS or Samba – these methods require additional components to be installed, and since my storage consumers are on the same hosts as the storage, I’m not bothering with them.

[root@baremetal1 ~]# mkdir -p /export/{rhvm,data}
[root@baremetal1 ~]# mount -t glusterfs storage1:/rhvm-dom /export/rhvm
[root@baremetal1 ~]# mount -t glusterfs storage1:/data-dom /export/data

[root@baremetal2 ~]# mkdir -p /export/{rhvm,data}
[root@baremetal2 ~]# mount -t glusterfs storage1:/rhvm-dom /export/rhvm 
[root@baremetal2 ~]# mount -t glusterfs storage1:/data-dom /export/data

A final check that the mounts were successful:

[root@baremetal1 ~]# mount | grep storage
storage1:/rhvm-dom on /export/rhvm type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
storage1:/data-dom on /export/data type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root@baremetal2 ~]# mount | grep storage
storage2:/rhvm-dom on /export/rhvm type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
storage2:/data-dom on /export/data type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

Testing

So that’s pretty much it for the storage configuration – we can test that it is working by dropping files into /export/rhvm and /export/data on one node – they should appear in real time on the other!

[root@baremetal1 ~]# ls -l /export/data/
total 0

[root@baremetal2 ~]# ls -l /export/data/
total 0

[root@baremetal1 ~]# echo test > /export/data/test1

[root@baremetal1 ~]# ls -l /export/data/
total 1
-rw-r--r--. 1 root root 5 Apr 13 18:14 test1

[root@baremetal2 ~]# ls -l /export/data/
total 1
-rw-r--r--. 1 root root 5 Apr 13 18:14 test1


[root@baremetal2 ~]# echo test2 > /export/data/test2

[root@baremetal1 ~]# ls -l /export/data/
total 1
-rw-r--r--. 1 root root 5 Apr 13 18:14 test1
-rw-r--r--. 1 root root 6 Apr 13 18:15 test2

[root@baremetal2 ~]# ls -l /export/data/
total 1
-rw-r--r--. 1 root root 5 Apr 13 18:14 test1
-rw-r--r--. 1 root root 6 Apr 13 18:15 test2

You will notice that the files dropped into the volumes can be seen in the underlying /bricks directories on all three hosts, with the file size on the arbiter (storage3) being 0 bytes. Although you can read the files this way, DO NOT WRITE to the bricks directly. The data won’t replicate, and you will end up corrupting the bricks.
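
If you want to see the arbiter behaviour for yourself, stat the test file on the RPi’s data-dom brick – the size should come back as 0 bytes (command only, output omitted):

root@glusterpi:~# stat -c '%n %s' /bricks/brick6/brick/test1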

A problem?

I did notice on one or two occasions that, after rebooting one of the nodes, gluster volume status was missing a node, and gluster peer status showed storage1 and storage3 rejecting each other.

[root@baremetal1 ~]# gluster volume status
Status of volume: data-dom
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick storage1:/bricks/brick4/brick         49152     0          Y       5043 
Brick storage2:/bricks/brick5/brick         49153     0          Y       22902
Self-heal Daemon on localhost               N/A       N/A        Y       4163 
Self-heal Daemon on storage2                N/A       N/A        Y       22923
 
Task Status of Volume data-dom
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: rhvm-dom
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick storage1:/bricks/brick1/brick         49153     0          Y       5051 
Brick storage2:/bricks/brick2/brick         49152     0          Y       22855
Self-heal Daemon on localhost               N/A       N/A        Y       4163 
Self-heal Daemon on storage2                N/A       N/A        Y       22923
 
Task Status of Volume rhvm-dom
------------------------------------------------------------------------------
There are no active volume tasks


[root@baremetal1 ~]# gluster peer status
Number of Peers: 2

Hostname: storage3
Uuid: 6b347e19-e595-49e2-a55f-87a6a4c815e5
State: Peer Rejected (Connected)

Hostname: storage2
Uuid: 6c28fbc7-fff8-4341-b57e-8cfc59e8d64b
State: Peer in Cluster (Connected)

The cure for this is to re-probe the offending node from the master (storage3, our RPi in this case) – however, there are a couple of steps involved.

root@glusterpi:~# service glusterfs-server stop
root@glusterpi:~# cd /var/lib/glusterd
root@glusterpi:/var/lib/glusterd# rm -rf vols/ peers/
root@glusterpi:/var/lib/glusterd# service glusterfs-server start
root@glusterpi:/var/lib/glusterd# gluster peer probe storage1
peer probe: success. 
root@glusterpi:/var/lib/glusterd# service glusterfs-server restart

root@glusterpi:/var/lib/glusterd# gluster peer status
Number of Peers: 2

Hostname: storage1
Uuid: cfc44f73-cba5-406b-a5e4-bf5806d88f92
State: Peer in Cluster (Connected)

Hostname: storage2
Uuid: 6c28fbc7-fff8-4341-b57e-8cfc59e8d64b
State: Peer in Cluster (Connected)

 
[root@baremetal1 ~]# gluster peer status
Number of Peers: 2

Hostname: storage3
Uuid: 6b347e19-e595-49e2-a55f-87a6a4c815e5
State: Peer in Cluster (Connected)

Hostname: storage2
Uuid: 6c28fbc7-fff8-4341-b57e-8cfc59e8d64b
State: Peer in Cluster (Connected)

[root@baremetal1 ~]# gluster volume status
Status of volume: data-dom
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick storage1:/bricks/brick4/brick         49152     0          Y       5043 
Brick storage2:/bricks/brick5/brick         49153     0          Y       22902
Brick storage3:/bricks/brick6/brick         49157     0          Y       9624 
Self-heal Daemon on localhost               N/A       N/A        Y       4163 
Self-heal Daemon on storage2                N/A       N/A        Y       22923
Self-heal Daemon on storage3                N/A       N/A        Y       9646 
 
Task Status of Volume data-dom
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: rhvm-dom
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick storage1:/bricks/brick1/brick         49153     0          Y       5051 
Brick storage2:/bricks/brick2/brick         49152     0          Y       22855
Brick storage3:/bricks/brick3/brick         49157     0          Y       9631 
Self-heal Daemon on localhost               N/A       N/A        Y       4163 
Self-heal Daemon on storage2                N/A       N/A        Y       22923
Self-heal Daemon on storage3                N/A       N/A        Y       9646 
 
Task Status of Volume rhvm-dom
------------------------------------------------------------------------------
There are no active volume tasks

Gluster will self-heal any files that require correcting following a replication issue. We can check the number of files that are in the queue to heal at any time for each volume:

[root@baremetal1 ~]# gluster volume heal data-dom info
Brick storage1:/bricks/brick4/brick
Status: Connected
Number of entries: 0

Brick storage2:/bricks/brick5/brick
Status: Connected
Number of entries: 0

Brick storage3:/bricks/brick6/brick
Status: Connected
Number of entries: 0

I suspect that this ‘reject’ behaviour could be a result of the minor version mismatch between the RHEL and RPi nodes, as the RHEL nodes immediately trust each other on reboot. For now I can live with this – it does not occur every time – but I will look at how to get around the quirk later. I will need to remember to check this after reboots, because if TWO nodes go down the volumes will go stale and require manual cleanup. As long as two nodes stay up and connected, the volumes will remain usable.
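
Until I work that out, a quick post-reboot check along these lines tells me whether both peers are healthy on a given node (a simple sketch – the expected count is 2):

[root@baremetal1 ~]# gluster peer status | grep -c 'Peer in Cluster (Connected)'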

In the next part of this series we will look at installing RHV and putting these volumes to good use.
