How to monitor your monitoring platform – part 1


How did the configuration part go for you?
Is everything OK?
If so, perfect!
Otherwise, follow these steps and come back later!
Today, we’ll wring the neck of the old adage that says: “The shoemaker’s children always go barefoot”. Monitoring your IT system helps guarantee its reliability, but what happens if the monitoring system itself is no longer operational? How can we be sure of the reliability of the infrastructure if we can’t even guarantee the reliability of the monitoring system? How can I verify SLA measurements if I don’t know the SLA of my reference system(s)?
To sum it up: why monitor the IT system if we pay no attention to the monitoring system itself?

Theory

So, we have to monitor our monitoring solution. But what do we have to check, and how do we monitor it? Easy answer: it depends on the monitoring platform’s architecture.
First, let’s classify the main monitoring architectures in order to define all monitoring needs.
• Simple architecture: one monitoring server with a MySQL database server, local or remote;
• Distributed architecture: one central server, a MySQL database server (local or remote) and a set of remote pollers.

Simple architecture

A simple architecture is made of one single server that polls/monitors the IT system and hosts the Centreon web interface and its processes.

The server hosts the following components:
• A monitoring engine (Centreon Engine, Nagios, …);
• A stream multiplexer (Centreon Broker, NDOUtils, …);
• A web presentation server (Apache, …);
• An interface for presentation and poller configuration (Centreon Web);
• A MySQL database server (local or remote);
• The databases « centreon » and « centreon_storage »;
• RRD databases used to generate performance graphs;
• Centreon processes (workflow);
• Event logs;
• Notification processes.

Now, let’s define the indicators to watch for each component and how to check them.

Monitoring engine

The monitoring engine, Centreon Engine in our case, is a process named « centengine », started by the script « /etc/init.d/centengine » and run by the user « centreon-engine ». It writes its log to the file « /var/log/centreon-engine/centengine.log ».

To validate its operation, we have to:
• Check that the process is running and which user started it;
• Check for errors in the process log.
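The first check above can be sketched in Python. This is only a minimal sketch, not part of Centreon: it assumes a Linux host with the procps « ps » command, and the helper names (« owners_of », « running_processes », « check_process ») are made up for the illustration.

```python
import subprocess


def owners_of(ps_output, process_name):
    """Return the set of users running `process_name`.

    `ps_output` holds one "USER COMMAND" pair per line, as printed by
    `ps -eo user:32,comm --no-headers`.
    """
    owners = set()
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[1].strip() == process_name:
            owners.add(fields[0])
    return owners


def running_processes():
    """Ask ps for the owner and command name of every process (Linux procps)."""
    result = subprocess.run(
        # user:32 widens the column so long user names are not shortened.
        ["ps", "-eo", "user:32,comm", "--no-headers"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def check_process(ps_output, process_name, expected_user):
    """Return (ok, message) for one process/user pair."""
    owners = owners_of(ps_output, process_name)
    if not owners:
        return False, f"{process_name} is not running"
    if owners != {expected_user}:
        return False, f"{process_name} runs as {sorted(owners)}, expected {expected_user}"
    return True, f"{process_name} is running as {expected_user}"
```

On the monitoring server, a call such as check_process(running_processes(), "centengine", "centreon-engine") would cover the process and user named above. The parsing is kept separate from the call to « ps » so that it can be tried on sample text.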

Stream Multiplexer

The stream multiplexer is the link between the monitoring engine and the Centreon interface. It inserts all monitoring data into the MySQL database and feeds the RRD databases.
Depending on your configuration, at least one « cbd » process (for Centreon Broker daemon), launched via the initialization script « /etc/init.d/cbd », must be running. These processes, run by the user « centreon-broker », write their logs to the files « /var/log/centreon-broker/central-*.log ».

To validate its operation:
• Check that at least one process is running and which user started it;
• Check for errors in the process log(s).
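The log check can be sketched the same way for any of the logs mentioned in this post. This is a minimal sketch under one loud assumption: the post does not describe the log format, so the code simply treats any line containing the word "error" as an error. The function names are illustrative only.

```python
import glob
import re

# Assumption: an error line contains the word "error" (any case).
# Adapt the pattern to the real format of your logs.
ERROR_PATTERN = re.compile(r"\berror\b", re.IGNORECASE)


def error_lines(lines, pattern=ERROR_PATTERN):
    """Return the lines that match the error pattern, without newlines."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def scan_logs(path_glob, pattern=ERROR_PATTERN):
    """Map each log file matching `path_glob` to its error lines.

    Files without any error line are left out of the result.
    """
    report = {}
    for path in sorted(glob.glob(path_glob)):
        with open(path, encoding="utf-8", errors="replace") as handle:
            found = error_lines(handle, pattern)
        if found:
            report[path] = found
    return report
```

For the broker, the call would be scan_logs("/var/log/centreon-broker/central-*.log"). Note that this sketch rereads the whole file each time; a real check would probably only look at lines written since the previous run.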

Web presentation server

The web presentation server, usually Apache, gives users access to the Centreon web interface.

To validate its operation:
• Check that at least one process (apache2, httpd, …) is running and which user started it (apache, www-data, …);
• Check for errors in the process log.

Centreon web interface

The Centreon web interface is the core of the monitoring. Through the interface, the user can configure or modify the resources to monitor, and view the results of the collection and any alerts sent.
To validate its operation, try to access the Centreon URI (http://@ip_serveur_supervision/centreon).
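An access test of this kind can be sketched with Python’s standard library. This is a minimal sketch: the function name « check_url » and the 10-second timeout are assumptions, and the test only tells you that the web server answers, not that a user can log in.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10):
    """Return (ok, message) after one HTTP GET on `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx or 5xx status.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # No usable answer: DNS failure, refused connection, timeout, ...
        return False, f"unreachable: {exc}"
```

Call it with the Centreon URI of your own server, replacing the placeholder address given above.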

MySQL database server

The MySQL database server stores the monitoring configuration in the « centreon » database and the polling results in the « centreon_storage » database. Started by the script « /etc/init.d/mysql », the processes « mysqld » and « mysqld_safe », run by the user « root », store the databases in the directory « /var/lib/mysql ». The server log is available in « /var/log/mysqld.log ».

To validate its operation:
• Check the processes « mysqld » and « mysqld_safe »;
• Check for errors in the database server log;
• Test connections to the « centreon » and « centreon_storage » databases with the MySQL user « centreon », whose connection settings are in the file « /etc/centreon/centreon.conf.php »;
• Check the free space on the partition that holds « /var/lib/mysql ».
Note: When the partition is full, the database server becomes unavailable. As a result, users can’t connect to the Centreon interface and monitoring data can no longer be written to the database.
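The free-space check can be sketched as follows. This is a minimal sketch: the 20 % and 10 % thresholds are illustrative assumptions, not recommendations from Centreon, and the function names are made up. The return values follow the usual Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
import shutil

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios-style plugin states


def free_percent(path):
    """Percentage of free space on the partition that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def space_state(percent_free, warning=20.0, critical=10.0):
    """Map a free-space percentage to OK, WARNING or CRITICAL.

    The default thresholds are assumptions chosen for the example.
    """
    if percent_free < critical:
        return CRITICAL
    if percent_free < warning:
        return WARNING
    return OK
```

On the database server, space_state(free_percent("/var/lib/mysql")) would give the state of the partition before it fills up completely.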

RRD databases

The RRD databases are fed by a dedicated module of the stream multiplexer called « cbd-rrd ». They are stored in the folders « /var/lib/centreon/metrics » and « /var/lib/centreon/status ».
These databases are used to draw the trend graphs shown in the Centreon web interface.

To validate their operation:
• Check the free space on the partition that hosts these two folders;
• Check the process « cbd-rrd »;
• Check for errors in the log « /var/log/centreon-broker/central-*-rrd.log ».
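The first point assumes that both folders live on the same partition. If you are not sure, a short sketch like the one below groups folders by the filesystem that holds them, so that each partition gets its own free-space check. The function name is illustrative, and the device ID comparison is a simplification that may not hold on every storage layout.

```python
import os
from collections import defaultdict


def group_by_partition(paths):
    """Group paths by the device ID of the filesystem that holds them."""
    groups = defaultdict(list)
    for path in paths:
        groups[os.stat(path).st_dev].append(path)
    return dict(groups)
```

Called with the two RRD folders, it returns one group if they share a partition and two groups otherwise.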

Centreon processes and workflow

To operate correctly, Centreon runs many processes in the background. They are invisible to the user but essential to its operation.
These processes are run by « cron » tasks defined in the files « /etc/cron.d/centreon » and « /etc/cron.d/centstorage ». Each execution updates the associated log files, which are available in the folder « /var/log/centreon ».
In addition to these processes, the daemon « centcore », started by the script « /etc/init.d/centcore », must run continuously in order to perform the actions requested by the user and the monitoring engine.

To validate their operation:
• Check for errors in the logs « /var/log/centreon/*.log »;
• Check that the « centcore » process is running.

That’s all for today!
Next week: how to implement these checkpoints in order to ensure the proper functioning of the monitoring platform in a simple architecture.
Remember these basics: you will need them in the coming weeks, when we monitor the monitoring platform in a distributed architecture.

Stay tuned!
