WatchCUB is designed to operate in a multi-datacenter environment (like ours), where each data-center is independent of the others. It can also be sized to operate in a single data-center, with an option for remote monitoring.
In order to minimize the traffic and increase the reliability of the monitoring, we recommend the following deployment principles:
- Install WatchCUB on one machine inside each data-center (preferably one that runs no other services).
- Monitor all local services, virtual hosts, etc. in full detail.
- Monitor all remote services (those in other data-centers) at the ICMP/port level.
- Collect all the data (using MySQL replication) in one data-center (where the NOC is) for visualization purposes.
- Define ticket escalation rules based on alarms from more than one data-center (to avoid false alarms).
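The last principle can be illustrated with a minimal Python sketch. It is not WatchCUB's own rule syntax: the function name `should_escalate`, the `(datacenter, service)` alarm tuples, and the example host names are all assumptions made for illustration. A service is escalated only when at least `min_datacenters` distinct data-centers report it as failing.

```python
from collections import defaultdict


def should_escalate(alarms, min_datacenters=2):
    """Return the services reported as failing by enough distinct data-centers.

    `alarms` is an iterable of (datacenter, service) tuples. This format is
    hypothetical and is not taken from WatchCUB.
    """
    observers = defaultdict(set)
    for datacenter, service in alarms:
        observers[service].add(datacenter)
    return {
        service
        for service, datacenters in observers.items()
        if len(datacenters) >= min_datacenters
    }


# Hypothetical alarms: the web service is seen down from two data-centers,
# the mail service from only one (possibly a local sensor problem).
alarms = [
    ("dc-east", "www.example.com:80"),
    ("dc-west", "www.example.com:80"),
    ("dc-east", "mail.example.com:25"),
]

print(sorted(should_escalate(alarms)))  # prints ['www.example.com:80']
```

Repeated alarms from the same data-center count only once, so a single faulty sensor cannot trigger an escalation on its own.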
This way you can achieve a higher number of tests, lower test timeouts, and good localization of problems. All local tests can be performed with very tight timeouts, giving a quick alarm when a machine or service is dead. The combination of local and remote monitoring gives a more precise diagnosis of problems and helps localize them quickly. While the remote tests are mainly designed to check bandwidth and port availability, the local tests can run real performance benchmarks on any complex n-tier architecture: routers, load balancers, web, application, and DB servers, etc. With a wise configuration, WatchCUB will detect evolving problems and their root causes before they impact the end users. You will also be notified within a minute of potential bottlenecks or traffic jams, even before they become a real problem. The flexible thresholds and the collection of service availability data from several data-centers allow you to filter out false alarms caused by sensor failures or temporary anomalies.
In a big data-center environment where you have to perform more than 10K tests per minute, WatchCUB can easily be deployed on several computers sharing a common DB. You just have to split the config file and make sure each WatchCUB machine is monitored by one of the others.
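As a rough sketch of such a split, the Python below distributes a list of tests round-robin over several monitoring machines and assigns each machine a peer that watches it. The helper names `split_tests` and `watcher_ring`, the machine names, and the ring arrangement are assumptions for illustration. WatchCUB does not prescribe this scheme; any arrangement in which every machine is watched by another would do.

```python
def split_tests(tests, machines):
    """Distribute tests round-robin so each machine gets a similar load."""
    shards = {machine: [] for machine in machines}
    for index, test in enumerate(tests):
        shards[machines[index % len(machines)]].append(test)
    return shards


def watcher_ring(machines):
    """Map each machine to the peer it monitors (the last watches the first)."""
    if len(machines) < 2:
        raise ValueError("mutual monitoring needs at least two machines")
    return {
        machine: machines[(index + 1) % len(machines)]
        for index, machine in enumerate(machines)
    }


# Hypothetical machine and test names.
machines = ["wc1", "wc2", "wc3"]
tests = [f"test-{i}" for i in range(7)]

for machine, shard in split_tests(tests, machines).items():
    print(machine, shard)
print(watcher_ring(machines))
```

Each shard would then become the test list in that machine's part of the split config file, with one extra test aimed at the peer returned by `watcher_ring`.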
The WatchCUB configuration (usually in /etc/watchcub) is split into the following files, allowing easy distribution across the servers and version control (VSS).
- wc.conf - the main config file (it just includes the plugins, tests, public, and datacenter files).
- plugins.conf - the list of default plugins and their default parameters. Usually no changes are required in this file, unless the plugin settings change or you need to quarantine a plugin due to malfunction. This file should be the same on all servers.
- tests.conf - the list of template tests, with the most appropriate settings. This file should be the same on all servers.
- public.conf - includes all ICMP/port tests for the public access points of all datacenters. This file can be the same in all datacenters.
- <datacenter>.conf - defines the list of machines inside the <datacenter> and the service tests that have to be performed on each of them.
- <project>.conf - if a project has a complex (e.g. chain-reaction) test, this file can contain its configuration. Once included inside different <machine> tags, the test will be executed against a different host name (for the current datacenter). This provides an easy way to keep one and the same test on the staging and official servers.
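Since plugins.conf and tests.conf should be identical on all servers, a small check can catch copies that have drifted apart. The Python sketch below is not part of WatchCUB. It assumes that each server's config directory has already been copied (or mounted) to a local path, and the names `SHARED_FILES`, `file_digest`, and `find_drift` are made up for this example.

```python
import hashlib
from pathlib import Path

# Files the documentation says should be the same on all servers.
SHARED_FILES = ("plugins.conf", "tests.conf")


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_drift(config_dirs, shared_files=SHARED_FILES):
    """Return the shared file names whose contents differ between directories.

    `config_dirs` is a list of local directories, each holding one server's
    copy of the configuration (an assumption of this sketch).
    """
    drifted = []
    for name in shared_files:
        digests = {file_digest(Path(directory) / name) for directory in config_dirs}
        if len(digests) > 1:
            drifted.append(name)
    return drifted
```

For example, `find_drift(["/tmp/wc1-conf", "/tmp/wc2-conf"])` (hypothetical paths) returns an empty list when both copies match and the names of the differing files otherwise.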