Cluster Communications, GAB, LLT, HAD

GAB, LLT and HAD form the basic building blocks of VCS functionality. The I/O fencing driver sits on top of them and provides the required data integrity. Before diving deep into the topic, let's look at the basic components that make up the VCS communication stack.

– LLT stands for Low Latency Transport. Its main purpose is to transmit heartbeats between nodes.
– GAB determines the state of a node from the heartbeats sent over the LLT links.
– LLT also distributes inter-system communication traffic evenly across all the interconnects.
– We can configure up to 8 LLT links, counting both low and high priority links.

1. High Priority links
– Heartbeat is sent every 0.5 seconds
– Cluster status information is passed to other nodes
– Configured using a private/dedicated network.
2. Low priority links
– Heartbeat is sent every 1 second
– No cluster status is sent over these links
– Automatically become high priority links when no high priority links are left
– Usually configured over a public interface (not dedicated)

To view the LLT link status in verbose form, use the following command:

bash-3.2# /opt/VRTS/bin/lltstat -nvv | more
LLT node information:
    Node            State    Link       Status   Address
  * 0 karri1        OPEN
                             e1000g0    UP       08:00:27:2F:0B:29
                             e1000g1    UP       08:00:27:4F:B7:F9
                             e1000g2    UP       08:00:27:19:DA:48
    1 karri2        OPEN
                             e1000g0    UP       08:00:27:E1:A3:3A
                             e1000g1    UP       08:00:27:56:F2:EA
                             e1000g2    UP       08:00:27:31:BD:4D
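
With several links per node, the verbose output gets long on bigger clusters. Below is a minimal sketch for picking out links that are not up. It assumes the status column reads DOWN for a failed link (only UP appears in the sample above) and that the output keeps the layout shown. The function name llt_down_links is hypothetical; it reads lltstat -nvv output on stdin.

```shell
# Print "<node> <link>" for every link whose status column is DOWN.
llt_down_links() {
    awk '
        # Local node line, e.g. "* 0 karri1 OPEN"
        $1 == "*" { node = $3; next }
        # Peer node line, e.g. "1 karri2 OPEN"; fall back to the ID if no name is shown
        $1 ~ /^[0-9]+$/ { node = (NF >= 3 ? $2 : $1); next }
        # Link line, e.g. "e1000g0 UP 08:00:27:2F:0B:29"
        $2 == "DOWN" { print node, $1 }
    '
}
```

Typical use would be `lltstat -nvv | llt_down_links`; no output means no link was reported as DOWN.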

Other lltstat commands

# lltstat -> outputs link statistics
# lltstat -c -> displays LLT configuration directives
# lltstat -l -> lists information about each configured LLT link

Commands to start/stop LLT

# lltconfig -c -> start LLT
# lltconfig -U -> stop LLT (GAB needs to be stopped first)

LLT configuration files
LLT uses /etc/llttab to set the configuration of the LLT interconnects.

# cat /etc/llttab
set-node node01
set-cluster 02
link nxge1 /dev/nxge1 - ether - -
link nxge2 /dev/nxge2 - ether - -
link-lowpri nxge0 /dev/nxge0 - ether - -

set-cluster -> unique number assigned to the entire cluster; it can range from 0 to 65535 (64K - 1). It should be unique across the organization.
set-node -> identifies the local node. Here the name node01 maps to a unique node number in the file /etc/llthosts. Node numbers can range from 0 to 31.
– Another configuration file used by LLT is /etc/llthosts.
– It maps each cluster-wide unique node number to a node name, as follows:

# cat /etc/llthosts
0 node01
1 node02
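
Since every node ID in /etc/llthosts has to be unique and fall in the 0 to 31 range, a quick check before distributing the file can catch typos. The following is only a sketch: check_llthosts is a made-up helper name, and it reads the file content on stdin.

```shell
# Report malformed or duplicate entries; exit status is 0 only if none are found.
check_llthosts() {
    awk '
        NF == 0 { next }    # skip blank lines
        $1 !~ /^[0-9]+$/ || $1 + 0 > 31 {
            printf "line %d: node ID %s is not in 0-31\n", NR, $1; bad = 1
        }
        $1 in ids   { printf "line %d: duplicate node ID %s\n", NR, $1; bad = 1 }
        $2 in names { printf "line %d: duplicate node name %s\n", NR, $2; bad = 1 }
        { ids[$1] = 1; names[$2] = 1 }
        END { exit (bad ? 1 : 0) }
    '
}
```

It could be run as `check_llthosts < /etc/llthosts` on each node after editing the file.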

– Another optional configuration file is /etc/VRTSvcs/conf/sysname.
– It contains the short name by which VCS refers to the node, which removes VCS's dependency on OS hostnames.
– GAB stands for Group Membership Services and Atomic Broadcast.
– Group membership services: GAB maintains the overall cluster membership by tracking the heartbeats sent over the LLT interconnects. If a node fails to send heartbeats over LLT, the GAB module passes that information to the I/O fencing module, which takes further action to avoid a split-brain condition if required. GAB also talks to HAD, which manages the agents and service groups.
– Atomic broadcast: atomic broadcast of cluster membership ensures that every node in the cluster has the same information about every resource and service group in the cluster.

GAB configuration files
The file /etc/gabtab contains the command that starts GAB.

# cat /etc/gabtab
/sbin/gabconfig -c -n 4

Here -n 4 -> the number of nodes that must be communicating before VCS can start.
Note: This is not always the total number of nodes in the cluster. It is the minimum number of nodes that must be communicating with each other before VCS can start.
Seeding during startup
– The option -n 4 in the GAB configuration file shown above ensures that a minimum number of nodes are communicating before VCS can start. This is called seeding.
– If we do not have enough nodes to start VCS (for example, because of a maintenance activity) but have to start it anyway, we have to seed the cluster manually by running the command below on each of the nodes.
# gabconfig -c -x
Note: Make sure that no other node is already seeded, because manual seeding can then create a split-brain scenario in clusters that do not use I/O fencing.
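
To see which seeding threshold a node will use, the -n value can be read back from /etc/gabtab and compared with the number of nodes listed in /etc/llthosts. This is a rough sketch, assuming gabtab holds a gabconfig line in one of the two forms used in this article (-n 4 or -n4). The function names gab_seed_count and seed_report are hypothetical.

```shell
# Print the N from "-n N" or "-nN" on a gabconfig line read from stdin.
gab_seed_count() {
    sed -n 's/^.*gabconfig.* -n *\([0-9][0-9]*\).*$/\1/p'
}

# Compare the seed threshold in a gabtab file ($1) with the number
# of non-blank lines (nodes) in an llthosts file ($2).
seed_report() {
    seed=$(gab_seed_count < "$1")
    nodes=$(grep -c '[^[:space:]]' "$2")
    if [ -z "$seed" ]; then
        echo "no -n value found in $1"
    elif [ "$seed" -gt "$nodes" ]; then
        echo "seed threshold $seed exceeds the $nodes nodes in $2"
    else
        echo "GAB seeds once $seed of $nodes nodes are communicating"
    fi
}
```

For example, `seed_report /etc/gabtab /etc/llthosts`. A threshold higher than the node count would presumably mean the cluster can never seed on its own and always needs manual seeding.
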
Start/Stop GAB

# gabconfig -c (start GAB)
# gabconfig -U (stop GAB)
To check the status of GAB
# gabconfig -a
GAB Port Memberships 
Port a gen a36e001 membership 01
Port b gen a36e004 membership 01
Port h gen a36e002 membership 01
Common GAB ports
a --> GAB driver
b --> I/O fencing (to ensure data integrity)
d --> ODM (Oracle Disk Manager)
f --> CFS (Cluster File System)
h --> VCS (VERITAS Cluster Server: high availability daemon, HAD)
o --> VCSMM driver (kernel module needed for Oracle and VCS interface)
q --> QuickLog daemon
v --> CVM (Cluster Volume Manager)
w --> vxconfigd (module for CVM)
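
The port letters are easy to forget, so a small filter that annotates gabconfig -a output with the names from the list above can help. This is a sketch, not a Veritas tool: gab_ports is a hypothetical name, and the port table is simply copied from this article.

```shell
# Annotate each "Port X gen ... membership ..." line with the port name.
gab_ports() {
    awk '
        BEGIN {
            name["a"] = "GAB driver"; name["b"] = "I/O fencing"
            name["d"] = "ODM";        name["f"] = "CFS"
            name["h"] = "VCS/HAD";    name["o"] = "VCSMM"
            name["q"] = "QuickLog";   name["v"] = "CVM"
            name["w"] = "vxconfigd"
        }
        $1 == "Port" && $5 == "membership" {
            label = ($2 in name) ? name[$2] : "unknown"
            printf "%s (%s): nodes %s\n", $2, label, $6
        }
    '
}
```

Run as `gabconfig -a | gab_ports`. For the status output shown earlier, the first line would read `a (GAB driver): nodes 01`.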

– HAD, the high availability daemon, is the main VCS engine. It manages the agents and service groups.
– It is in turn monitored by the hashadow daemon.
– HAD maintains the resource configuration and state information.
Start/Stop HAD
– The hastart command needs to be run on every node in the cluster where you want to start HAD.
– hastop, by contrast, can be run from any one node to stop the entire cluster.
– hastop provides several options that control what happens to the service groups when the node is stopped.
# hastart
# hastop -local
# hastop -local -evacuate
# hastop -local -force
# hastop -all -force
# hastop -all
The hastop options mean the following:
-local -> stops the service groups and the VCS engine (HAD) on the node where it is run
-local -evacuate -> migrates the service groups off the node where it is run, then stops HAD on that node only
-local -force -> stops HAD on the node where it is run, leaving the services running
-all -force -> stops HAD on all nodes of the cluster, leaving the services running
-all -> stops HAD on all nodes in the cluster and takes the service groups offline

Configuring LLT and GAB

Create the LLT and GAB configuration files on the new node and update the files on the existing nodes.

To configure LLT

Create the file /etc/llthosts on the new node. You must also update it on each of the current nodes in the cluster.

For example, suppose you are adding east to a cluster consisting of north and south:

  • If the file on one of the existing nodes resembles:

    0 north
    1 south

  • Update the file on all nodes, including the new one, so that it resembles:

    0 north
    1 south
    2 east

  1. Create the file /etc/llttab on the new node, making sure that the line beginning with “set-node” specifies the new node.

    The file /etc/llttab on an existing node can serve as a guide.

    The following example describes a system where node east is the new node on cluster number 2:

    set-node east
    set-cluster 2
    link e1000g0 e1000g:0 - ether - -
    link e1000g1 e1000g:1 - ether - -

  2. On the new system, run the command:

    # /sbin/lltconfig -c
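
When adding a node, its ID in /etc/llthosts must not clash with an existing one. Here is a minimal sketch for finding the lowest unused ID, assuming the 0 to 31 range given earlier. next_llt_id is a hypothetical helper that reads the current llthosts content on stdin.

```shell
# Print the lowest node ID in 0-31 that is not yet used; exit 1 if none is free.
next_llt_id() {
    awk '
        $1 ~ /^[0-9]+$/ { used[$1 + 0] = 1 }
        END {
            for (id = 0; id <= 31; id++)
                if (!(id in used)) { print id; exit 0 }
            exit 1    # all 32 IDs are taken
        }
    '
}
```

With the north/south file from the example, `next_llt_id < /etc/llthosts` prints 2, which is the ID given to east.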

To configure GAB

  1. Create the file /etc/gabtab on the new system.
    • If the /etc/gabtab file on the existing nodes resembles:

      /sbin/gabconfig -c

      then the file on the new node should be the same, although it is recommended to use the -c -nN option, where N is the number of cluster nodes.

    • If the /etc/gabtab file on the existing nodes resembles:

      /sbin/gabconfig -c -n2

      then, the file on all nodes, including the new node, should change to reflect the change in the number of cluster nodes. For example, the new file on each node should resemble:

      /sbin/gabconfig -c -n3

      The -n flag indicates to VCS the number of nodes required to be ready to form a cluster before VCS starts.

  2. On the new node, run the following command to configure GAB:

    # /sbin/gabconfig -c
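
Changing the -n value by hand on every node is error-prone. One way to rewrite the line is sketched below with awk. set_gab_seed is a hypothetical name; it reads the gabtab content on stdin, writes the updated content to stdout, and leaves a gabconfig line without -n untouched.

```shell
# Usage: set_gab_seed N < gabtab
# Replace the value after -n on the gabconfig line with N.
set_gab_seed() {
    case "$1" in
        ''|*[!0-9]*) echo "usage: set_gab_seed N" >&2; return 1 ;;
    esac
    awk -v n="$1" '
        /gabconfig/ { sub(/-n *[0-9]+/, "-n" n) }
        { print }
    '
}
```

For example, `set_gab_seed 3 < /etc/gabtab > /tmp/gabtab.new`, then review the result before copying it into place on each node.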

To verify GAB

  1. On the new node, run the command:

    # /sbin/gabconfig -a

    The output should indicate that Port a membership shows all nodes including the new node. The output should resemble:

    GAB Port Memberships
    Port a gen a3640003 membership 012

  2. Run the same command on the other nodes (north and south) to verify that the Port a membership includes the new node:

    # /sbin/gabconfig -a

    GAB Port Memberships
    Port a gen a3640003 membership 012
    Port h gen fd570002 membership 01
    Port h gen fd570002 visible ; 2
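
The membership check can also be scripted. The sketch below tests whether a given port lists every expected node ID as a member. It assumes single-digit node IDs, as in the sample output, where each character of the membership string stands for one node. gab_port_has is a hypothetical name; it reads gabconfig -a output on stdin.

```shell
# Usage: gabconfig -a | gab_port_has PORT ID...
# Exit 0 only if every ID appears in the membership string of PORT.
gab_port_has() {
    port=$1; shift
    # Only "membership" lines count; "visible" lines are ignored.
    members=$(awk -v p="$port" \
        '$1 == "Port" && $2 == p && $5 == "membership" { print $6 }')
    [ -n "$members" ] || return 1
    for id in "$@"; do
        case "$members" in
            *"$id"*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```

Against the sample output above, `gab_port_has a 0 1 2` succeeds, while `gab_port_has h 0 1 2` fails, because node 2 is only listed as visible on port h and not as a member.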