Heinrich Hartmann

Monitoring With Ganglia

Written on 2014-01-01

In this note we are going to install the Ganglia monitoring system on a Virtual Cluster.

Ganglia was initially developed at UC Berkeley. It is free software and scales to multiple nodes and multiple clusters. O'Reilly has published a book on it.


Ganglia consists of three different services: gmond, gmetad and gweb.

Architecture Sketch

     gmond                gmetad      gweb
     =====                ======      ====
  * <-----> * <---[poll]---> * <-------> *
  | Cluster |                |
  * <-----> *                |
  * <-----> * <---[poll]-----+
  | Cluster |
  * <-----> *



Install software

Ganglia Monitor

Installation via apt-get is a piece of cake:

ssh VLB1 sudo apt-get install ganglia-monitor

Now start the monitor daemon:

ssh VLB1 sudo service ganglia-monitor start

and test it is collecting metrics by typing in:

nc VLB1 8649

You should see an XML dump of the metrics in your terminal window.
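
The dump can also be read programmatically. The following is a minimal sketch in Python, assuming that gmond answers on TCP port 8649 with a GANGLIA_XML document containing HOST and METRIC elements with NAME, VAL and UNITS attributes. The helper names fetch_xml and list_metrics are made up for this note.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port=8649, timeout=5.0):
    """Read the complete XML dump that gmond writes to a TCP connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # the peer closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def list_metrics(xml_bytes):
    """Return (host, metric name, value, units) tuples from a Ganglia XML dump."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            rows.append((host.get("NAME"), metric.get("NAME"),
                         metric.get("VAL"), metric.get("UNITS")))
    return rows


if __name__ == "__main__":
    # VLB1 is the virtual node used throughout this note
    for row in list_metrics(fetch_xml("VLB1")):
        print("%s  %s = %s %s" % row)
```

Pointing fetch_xml at localhost and port 8651 should work the same way for the gmetad output described in the next section.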

Ganglia Meta Daemon

We install gmetad and the web frontend on the host machine:

 sudo apt-get install gmetad

Now start the gmetad daemon by running

sudo service gmetad start

Test its functionality by running:

nc localhost 8651

It should respond with an XML document representing the state of all connected nodes (i.e. none so far).

To get more detailed information about the meta daemon, run it from the command line with debug output enabled:

sudo -u nobody gmetad --debug=10

IP Multicast Setup

Ganglia uses multicast channels to connect different gmond daemons with each other.

It is surprisingly difficult to set up and test multicast networking. First we need to check that multicast is supported by the kernel (it should be). Following a Stack Exchange answer, one can use:

ip maddr show
cat /proc/net/igmp
netstat -ng

to display information about the multicast configuration. Another very helpful source is http://sorcersoft.org/resources/notes/multicast.html
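
A more direct check is to join the multicast group yourself and watch for datagrams. The sketch below uses only Python's standard socket module. The group 239.2.11.71 and port 8649 are, as far as I know, Ganglia's defaults, but treat them as assumptions and compare them with the udp_send_channel section of your gmond.conf. The function names are made up for this note.

```python
import socket

GROUP = "239.2.11.71"  # assumed: default multicast group of gmond
PORT = 8649            # assumed: default gmond port


def membership_request(group, iface="0.0.0.0"):
    """Build the ip_mreq structure: group address followed by local interface."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def open_listener(group=GROUP, port=PORT):
    """Open a UDP socket that has joined the given multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # allow binding even if gmond already listens on the same port
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


if __name__ == "__main__":
    sock = open_listener()
    print("waiting for datagrams on %s:%d ..." % (GROUP, PORT))
    while True:
        data, (addr, _port) = sock.recvfrom(65535)
        print("%d bytes from %s" % (len(data), addr))
```

If this is run on VLB2 while gmond is running on VLB1, lines with the address of VLB1 should appear once multicast works between the two machines.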

We make sure the multicast packets are sent over the right Ethernet interface by adding the following route:

ssh VLB1 sudo route add -net netmask dev eth0
ssh VLB2 sudo route add -net netmask dev eth0

Ganglia Web Frontend

Ganglia provides a nice PHP web site that visualizes the data aggregated by gmetad. Installing and starting the application is rather easy:

sudo apt-get install ganglia-webfrontend
sudo cp /etc/ganglia/apache.conf /etc/apache2/sites-enabled/ganglia
sudo service apache2 reload

Remark: The apache.conf file is a single line:

Alias /ganglia /usr/share/ganglia-webfrontend

You should now be able to open the web frontend at the URL http://localhost/ganglia on your host machine.


Ganglia Monitor

We have two virtual nodes, VLB1 and VLB2, which run the gmond daemon and share their metrics on a multicast channel over the virtual network. To make gmetad aware of those nodes, edit /etc/ganglia/gmetad.conf to contain the following line (the number after the cluster name is the polling interval in seconds):

 data_source "Virtual Cluster" 1 VLB1 VLB2

Now restart the gmetad daemon, e.g. using

 sudo service gmetad restart

and you should be able to see two virtual machines in the web frontend.


Odds are that something went wrong along the way. To get a better understanding of the problem, start the daemons in the foreground from the command line:

 # on the host machine
 sudo -u nobody gmetad -d 10

 # on the VMs
 sudo gmond -d 10


There are three different ways to extend Ganglia with custom metrics:

  1. Using gmetric tool
  2. Including modules in C/C++
  3. Including modules in Python (via mod_python module)

The gmetric tool allows you to set a metric to a specific value:

gmetric --name="my_metric" --value="18" --type=int32

However, gmond cannot schedule the repeated execution of such a command. It has to be triggered by an external process like cron.


We can add the following line with crontab -e to report the size of the www folder every minute:

# m h dom mon dow command
* * * * * gmetric --name="size_www" --type=int32 --value=`du -s /var/www | cut -f1`

To see whether this command is executed, use

tail -f /var/log/syslog | grep CRON

You should see messages like

Dec 27 12:51:01 VLB CRON[4136]: (user) CMD (gmetric --name="size_www" --type=int32 --value=`du -s /var/www | cut -f1`)

appear every minute. If another line

Dec 27 12:57:01 VLB CRON[4297]: (CRON) info (No MTA installed, discarding output)

appears next to it, then something went wrong: the command produced output (usually an error message) that cron tried to send by mail.


Current Setup

My crontab has a single entry that runs a script

# m h dom mon dow command
* * * * * ~/ganglia-metrics.sh >> ~/crontab.log 2>&1

Note that the script is called by its full path and that its output is redirected to a log file. The ganglia-metrics.sh script looks as follows:


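#!/bin/bash

# Path to the gmetric binary (assumed location; adjust if it differs on your system)
GMETRIC=/usr/bin/gmetric

echo `date` "- executing ganglia-metrics.sh"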

$GMETRIC --name="size_www" --type=int32 --value=`du -s /var/www | cut -f1`

# some more dummy metrics ...
$GMETRIC --name="date" --type=int32 --value=`date +%s`
$GMETRIC --name="rand" --type=int32 --value=$RANDOM

Note that the script starts with a shebang ‘#!’ so that it is executed by the bash shell.

More examples can be found on GitHub. See https://github.com/vvuksan/ganglia-misc/tree/master/gmetric-python for a Python implementation of gmetric.
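
The dummy metrics from the shell script can also be submitted from Python by calling the gmetric binary. This is only a sketch: it assumes that gmetric is on the PATH and uses just the --name, --value and --type flags shown above. The function names gmetric_command and send_metric are made up for this note.

```python
import random
import subprocess
import time


def gmetric_command(name, value, type_="int32", gmetric="gmetric"):
    """Build the argument list for one gmetric call (flags as used above)."""
    return [gmetric,
            "--name=%s" % name,
            "--value=%s" % value,
            "--type=%s" % type_]


def send_metric(name, value, type_="int32"):
    """Run gmetric; raises CalledProcessError if it exits with an error."""
    subprocess.check_call(gmetric_command(name, value, type_))


if __name__ == "__main__":
    send_metric("date", int(time.time()))
    send_metric("rand", random.randint(0, 32767))  # same range as bash $RANDOM
```

Passing the arguments as a list avoids shell quoting problems. Like the shell script, this still needs cron or another external scheduler.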

Python modules

Ganglia can be extended by Python modules. In contrast to the gmetric method explained before, these Python modules are executed by gmond and do not have to be scheduled by a cron job.

To enable Python modules, one has to load the Python module wrapper as a module. You can see all installed native modules using:

ls -l /usr/lib/ganglia

Unfortunately the preinstalled gmond.conf version does not include a configuration template, even though the modpython.so file is provided. We have to add the following lines into gmond.conf (cf. https://bugs.launchpad.net/ubuntu/+source/ganglia/+bug/694208):

modules {
    module {
       name = "python_module"
       path = "/usr/lib/ganglia/modpython.so"
       params = "/usr/lib/ganglia/python_modules"
    }
}

include ('/etc/ganglia/conf.d/*.pyconf')

Now run

sudo mkdir -p /usr/lib/ganglia/python_modules /etc/ganglia/conf.d

to create the directories if necessary. Use

sudo gmond -m -d 10

to verify that the module is loaded correctly. (You should see loaded module: python_module at the beginning, followed by no error messages.)

Install example python metric

Before we write our own Python metric, we install the ‘disk_free’ metric by Vladimir Vuksan from GitHub:

curl https://raw.github.com/ganglia/gmond_python_modules/master/diskfree/python_modules/diskfree.py \
     | sudo tee /usr/lib/ganglia/python_modules/diskfree.py
curl https://raw.github.com/ganglia/gmond_python_modules/master/diskfree/conf.d/diskfree.pyconf \
     | sudo tee /etc/ganglia/conf.d/diskfree.pyconf

Check that everything is working by running, e.g.

sudo gmond -m -d 10 | grep disk_free

Start gmond again and you should see disk_free metrics in the web interface.

Write our own module

Now that we know the Python module infrastructure works as expected, let's write our own:

cat << 'EOF' | sudo tee /usr/lib/ganglia/python_modules/example.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-

def get_value(name):
    """Return a value for the requested metric"""
    return 17

def metric_init(lparams):
    """Initialize metric descriptors"""

    # create descriptors
    descriptors = []

    descriptors.append({
        'name': "example",
        'call_back': get_value,
        'time_max': 60,
        'value_type': 'float',
        'units': '%',
        'slope': 'both',
        'format': '%f',
        'description': "example metric",
        'groups': 'example'
    })

    return descriptors

def metric_cleanup():
    """Clean up when gmond shuts down (nothing to do here)"""
    pass

# the following code is for debugging and testing
if __name__ == '__main__':
    descriptors = metric_init({})
    for d in descriptors:
        print(('%s = ' + d['format']) % (d['name'], d['call_back'](d['name'])))
EOF

The command above saves the script in your Python modules directory. Test its functionality using:

 python /usr/lib/ganglia/python_modules/example.py

Now add the Python module to the gmond configuration using e.g.

cat << EOF | sudo tee /etc/ganglia/conf.d/example.pyconf
modules {
    module {
        name = "example"
        language = "python"
    }
}

collection_group {
    collect_every = 10
    time_threshold = 180
    metric {
       name_match = "example"
    }
}
EOF
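
As a slightly more useful variant, the size_www metric from the cron section can be written as a Python module, so that gmond schedules it itself. This is a sketch under assumptions: the file name (say size_www.py), the 'path' parameter and the helper dir_size_kb are invented for this note, and I assume that gmond accepts 'uint' as value_type with '%u' as format. Unlike du -s, it sums file sizes rather than disk blocks, so the numbers may differ somewhat.

```python
import os

_path = "/var/www"  # directory to measure; overridden by the 'path' parameter


def dir_size_kb(name):
    """Callback: total size of all regular files below _path, in KB."""
    total = 0
    for root, _dirs, files in os.walk(_path):
        for fname in files:
            full = os.path.join(root, fname)
            if os.path.isfile(full):  # skip broken symlinks
                total += os.path.getsize(full)
    return total // 1024


def metric_init(params):
    """Called once at start-up; 'params' holds the module parameters."""
    global _path
    _path = params.get("path", _path)
    return [{
        'name': "size_www",
        'call_back': dir_size_kb,
        'time_max': 60,
        'value_type': 'uint',
        'units': 'KB',
        'slope': 'both',
        'format': '%u',
        'description': "size of the www folder",
        'groups': 'example',
    }]


def metric_cleanup():
    """Nothing to clean up."""
    pass


# the following code is for debugging and testing
if __name__ == '__main__':
    for d in metric_init({}):
        print(('%s = ' + d['format']) % (d['name'], d['call_back'](d['name'])))
```

It needs its own .pyconf file, analogous to the one above, with the module name matching the file name and name_match set to "size_www".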

For more information see the official docs.