Saturday, November 29, 2008

Installing the Zabbix Agent on a Windows Cluster

For the uninitiated, Zabbix is a fabulous enterprise-class, cross-platform, open-source distributed monitoring solution. One sets up a collection server with a MySQL database, then installs agents on each client machine to enable feature-rich monitoring of various metrics. On Windows clients, this means basic metrics like free disk space, CPU usage, etc.; any counter available in Performance Monitor; and custom scripts.

Installing the Zabbix agent on a Windows cluster is unfortunately quite involved. Because Windows initializes Performance Monitor metrics for all members of a cluster, but only the active node actually reports the metrics for a specific Cluster Group, you need a Zabbix agent that is specific to each Cluster Group. And since the Zabbix agent installs as a service with a specific name, you are required to manually create services for each Cluster Group.

To install the Zabbix monitoring agent on a multi-instance (formerly known as active/active) Windows Server 2003 cluster, you must install the Zabbix agent service several times.
Each physical node should have an agent running to monitor physical components (e.g. CPU, memory, network, local disk). Agents on the physical nodes are to be installed using the normal, systematic installation process.

Each cluster group should have an agent running as a clustered Generic Service resource, monitoring that cluster group's resources (e.g. clustered disk resources, SQL Server, other clustered services). Agents for each cluster group are to be installed manually by using the Service Control command-line program on each physical node and creating Generic Service resources.

In a clustered environment, between zero and many cluster groups may be owned by one physical node at any given time. Since the Zabbix agent listens on a TCP port, care must be taken to eliminate the possibility of more than one agent listening on the same port on the same IP address on the same physical node at the same time. Therefore, each individual agent must bind to a TCP port (default is 10050) on a unique IP address. This is accomplished via the ListenIP parameter in the agent configuration file.

Note: for this example, the following names will be used:
  • First physical node: PhysicalNode1.MyDomain.com (192.168.0.1)
  • Second physical node: PhysicalNode2.MyDomain.com (192.168.0.2)
  • First cluster group: ClusterGroup1.MyDomain.com (192.168.0.3)
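Before touching any configuration files, it can help to write down which IP address and port each agent will claim and check that no two agents collide. The following is a minimal Python sketch of that check, not part of Zabbix itself. The AGENTS list and the find_conflicts helper are hypothetical names made up for this illustration, and the addresses are the example ones from the note above.

```python
from collections import Counter

# Hypothetical inventory for this example: (agent name, ListenIP, ListenPort).
AGENTS = [
    ("PhysicalNode1", "192.168.0.1", 10050),
    ("PhysicalNode2", "192.168.0.2", 10050),
    ("ClusterGroup1", "192.168.0.3", 10050),
]


def find_conflicts(agents):
    """Return the (ip, port) pairs that more than one agent would bind to."""
    counts = Counter((ip, port) for _name, ip, port in agents)
    return sorted(pair for pair, n in counts.items() if n > 1)


if __name__ == "__main__":
    conflicts = find_conflicts(AGENTS)
    print("conflicts:", conflicts if conflicts else "none")
```

An empty result means every agent has its own IP and port pair, so any mix of cluster groups can land on one physical node without two agents fighting over a listener.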

Physical node installation

First, copy zabbix_agentd.exe and zabbix_agentd.conf to c:\Zabbix\PhysicalNode1. Then edit c:\Zabbix\PhysicalNode1\zabbix_agentd.conf, modifying the following parameters:
  • Hostname: PhysicalNode1.MyDomain.com
  • ListenIP: 192.168.0.1
  • ListenPort: 10050
  • LogFile: c:\Zabbix\PhysicalNode1\zabbix_agentd.log
From a command line, install the Zabbix agent as a service by executing this command:

c:\Zabbix\PhysicalNode1\zabbix_agentd.exe -i -c c:\Zabbix\PhysicalNode1\zabbix_agentd.conf

Repeat these steps on PhysicalNode2, using the respective hostname, IP Address, and logging directory.

Cluster Group installation

First, copy zabbix_agentd.exe and zabbix_agentd.conf to c:\Zabbix\ClusterGroup1 on the first physical node (PhysicalNode1). Then edit c:\Zabbix\ClusterGroup1\zabbix_agentd.conf, modifying the following parameters:
  • Hostname: ClusterGroup1.MyDomain.com
  • ListenIP: 192.168.0.3
  • ListenPort: 10050
  • LogFile: c:\Zabbix\ClusterGroup1\zabbix_agentd.log
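In the file itself, each of these parameters is written as a Key=Value line. If you have several cluster groups to set up, a small script can produce those lines for you. Here is a rough Python sketch, assuming you only want to generate the four settings changed in this post. The render_agent_conf function is a hypothetical helper, and a real zabbix_agentd.conf contains other settings that you would leave as they are.

```python
def render_agent_conf(hostname, listen_ip, log_dir, listen_port=10050):
    """Build the four settings changed in this post, one Key=Value per line."""
    settings = [
        ("Hostname", hostname),
        ("ListenIP", listen_ip),
        ("ListenPort", str(listen_port)),
        # The log file lives next to the agent binary, as in the steps above.
        ("LogFile", log_dir.rstrip("\\") + "\\zabbix_agentd.log"),
    ]
    return "\n".join(f"{key}={value}" for key, value in settings) + "\n"


if __name__ == "__main__":
    print(
        render_agent_conf(
            "ClusterGroup1.MyDomain.com",
            "192.168.0.3",
            r"c:\Zabbix\ClusterGroup1",
        ),
        end="",
    )
```

The same helper works for the physical nodes by passing their hostname, IP address, and directory instead.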
Using zabbix_agentd.exe to install the second service does not work because the service tries to install itself with the same service name (ZABBIX Agent). Therefore, you must manually create the service with the Windows command-line utility sc.exe. To create the service for ClusterGroup1, execute the following command from any directory:

sc \\PhysicalNode1 create "ZABBIX Agent (ClusterGroup1)" binpath= "C:\Zabbix\ClusterGroup1\zabbix_agentd.exe --config c:\zabbix\ClusterGroup1\zabbix_agentd.conf" DisplayName= "ZABBIX Agent (ClusterGroup1)"

Note the service name and display name - giving this service its own name is what allows it to be created alongside the default ZABBIX Agent service. After executing this command, you can open up the Services applet in MMC and see that the new service has been created.

Now, execute all of these steps again on all of the physical nodes of the cluster that can host this particular Cluster Group. Once this is done, you can create the Generic Service Cluster Resource that will allow the Zabbix agent to monitor ClusterGroup1.
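Since the same sc.exe call has to be repeated for every node that can own the group, it is easy to mistype one of them. As a sketch only, the Python below prints the command for each node so you can paste it into a command prompt. The sc_create_command helper and the NODES list are hypothetical, and the paths assume the c:\Zabbix\ClusterGroup1 layout used in this post.

```python
def sc_create_command(node, group, base_dir=r"C:\Zabbix"):
    """Return the sc.exe command that creates the per-group agent service."""
    service = f"ZABBIX Agent ({group})"
    exe = f"{base_dir}\\{group}\\zabbix_agentd.exe"
    conf = f"{base_dir}\\{group}\\zabbix_agentd.conf"
    return (
        f'sc \\\\{node} create "{service}" '
        f'binpath= "{exe} --config {conf}" '
        f'DisplayName= "{service}"'
    )


# Hypothetical list of nodes that can own ClusterGroup1.
NODES = ["PhysicalNode1", "PhysicalNode2"]

if __name__ == "__main__":
    for node in NODES:
        print(sc_create_command(node, "ClusterGroup1"))
```

The script only builds the text of the commands. The agent files still have to be copied to the same directory on each node before the service will start there.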

Creating the Cluster Resource

Start Cluster Administrator on any of the physical nodes. Right click on ClusterGroup1, and click New > Resource. On the New Resource screen, enter the name - Zabbix Agent (ClusterGroup1) - and select the resource type Generic Service. On the next screen, add as possible owners all physical nodes that are possible owners of the other resources in the group. On the dependencies screen, add only the Cluster Group's IP address. On the Generic Service Parameters screen, enter the service name ZABBIX Agent (ClusterGroup1). Note that this is the same name you used in the call to sc.exe. Do not enter any registry keys to be replicated. Bring the new clustered resource online, and test that it successfully fails over to another physical node by right-clicking on ClusterGroup1 and clicking Move Group. Watch the c:\Zabbix\ClusterGroup1\zabbix_agentd.log on the current physical node to make sure there are no errors, and you're done!
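Besides watching the log, you can confirm from another machine that something is listening on the cluster group's IP address after each move. Below is a minimal Python sketch of such a check. The agent_port_open function is a hypothetical helper, and a successful TCP connection only shows that the port is open, not that the agent is returning valid data to the Zabbix server.

```python
import socket


def agent_port_open(host, port=10050, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call, using the cluster group address from this post:
#     agent_port_open("192.168.0.3")
```

Run it once before Move Group and once after. Expect a short gap while the resource goes offline on one node and comes online on the other.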

Configuring Zabbix

Now create separate hosts in Zabbix for each physical node (to monitor the health of the physical machines) and for each cluster group (to monitor the clustered resources), and start monitoring!

5 comments:

  1. do we need add any template for cluster node too..??

  2. My amendments to this, on SQL 2008: Creating the clustered service is very easy in 2008, you just browse to the service. I made the dependency on the IP and the SQL data drive, since I decided to put the zabbix files there. I then modified the policy of the zabbix cluster service so that it wouldn't fail sql to another node if zabbix crashed - uncheck "If restart is unsuccessful, fail over all resources in this service or application". Thanks again.

  3. Looking back on this, I'm not sure why the service needs to be clustered. You can just have one running on each machine and use the SQL virtual name as the host name.

  4. Sam Greene, but without the clustered service can get problems with the monitoring of cluster disks.
