I’ve been a long-time user of FLOSS monitoring software, mainly Nagios, Zenoss and Munin. About one year ago I started looking into more scalable monitoring solutions. I really liked Pandora FMS, especially when used together with the outstanding Babel Enterprise IT Security Balanced Scorecard, but when automated security audits are not your concern, it’s just simple monitoring software. And there’s a better, more mature and outstanding piece of software for network monitoring out there.
Meet Zenoss Core. Soon after I finished the lab evaluation of the product, I included it in our portfolio. About one year later, two of my clients in Venezuela and two in Ecuador are using Zenoss Core as their main and only monitoring software, building very cost-effective monitoring solutions for new platforms or completely ditching existing proprietary solutions. At the moment, over 2K nodes are being monitored, from an ISP to a State Government, and this year I’ll be deploying it on over 1K new nodes. These range from microwave CPEs to end-user laptops.
What is it about Zenoss Core? What I really like is the concept of unlegacy IT management. When your IT platform involves cloud computing instances, virtual machines (using the plethora of virtualization techniques out there), hybrid scenarios with several operating systems, network and telecom equipment, then your legacy way of monitoring stuff is definitely not enough. Zenoss scales, allows for multiple collectors, all of them with a single level of reporting upwards, and the MySQL/Zope/ZEO services allow for cluster setups. Zenoss handles monitoring techniques sanely, is serious and stable.
Zenoss provides stack installers and source code for several operating systems, including Debian, the universal operating system. Zenoss is a Web application written in Python and deployed in Zope. It uses Zope’s ZEO as a backend for operational information (zProperties, node information, system configuration) and MySQL as a backend for event information, since ZEO isn’t good at writes. The rest of the stack is just standard components: Python libraries, Zope templates et al.
I like several premises that Zenoss follows, such as start-end event correlations and de-duping, but the most important two are:
- Zenoss distinguishes between the industry’s top two monitoring techniques: availability monitoring (that is, knowing whether a device and/or a service has been available or not, which usually yields a percentage KPI) and performance monitoring, which involves all the indicators we need and our devices are able to provide, and allows for complex scenarios such as drill-down multigraphs and so on. The sooner users get the difference between both ways of monitoring, the sooner they’ll appreciate the solution. Unfortunately, most proprietary solutions haven’t taught us to differentiate. I have had clients arguing over SNMP performance collection when they only have PCs with Windows and no time to configure SNMP services on them.
- Zenoss allows and expects a device to be organized in several ways: a device class, a location, a group of devices and a service, the device class being the only one that’s really mandated. Several parts of the solution apply to a subtree, such as zProperties, Performance Templates and Commands, and reports allow for further filtering via organizers.
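To make the "applies to a subtree" idea concrete, here is a minimal Python sketch of how a property set on an organizer could be inherited by everything below it unless overridden. This is not Zenoss code: the `Organizer` class, its methods and the values are hypothetical, and the property name is used only as an illustration.

```python
from typing import Any, Dict, Optional


class Organizer:
    """A node in a hypothetical organizer tree, e.g. /Devices/Server/Linux."""

    def __init__(self, name: str, parent: Optional["Organizer"] = None) -> None:
        self.name = name
        self.parent = parent
        self.local_props: Dict[str, Any] = {}  # values set at this level only

    def path(self) -> str:
        """Build the full path by walking up to the root."""
        if self.parent is None:
            return "/" + self.name
        return self.parent.path() + "/" + self.name

    def set_prop(self, key: str, value: Any) -> None:
        self.local_props[key] = value

    def get_prop(self, key: str) -> Any:
        """Return the nearest value, looking at this node first, then its ancestors."""
        node: Optional[Organizer] = self
        while node is not None:
            if key in node.local_props:
                return node.local_props[key]
            node = node.parent
        raise KeyError(key)


devices = Organizer("Devices")
server = Organizer("Server", devices)
linux = Organizer("Linux", server)

# Set a default at the root, then override it for the Server subtree.
devices.set_prop("zSnmpCommunity", "public")
server.set_prop("zSnmpCommunity", "s3cret")  # hypothetical override value

print(linux.path(), linux.get_prop("zSnmpCommunity"))
```

Here `/Devices/Server/Linux` sets nothing itself, so it picks up the value from `/Devices/Server`, while anything outside that subtree still sees the root default.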
Zenoss provides a Web interface, which is fully asynchronous in terms of notifying the operator of new events, completed jobs and so on, as well as a powerful CLI, the Zope/ZEO stack, REST interfaces… all in a fairly well documented API. Therefore, it’s quite easy to integrate Zenoss with other applications, or just develop new apps such as the Zenoss Tray Applet in Python for Microsoft Windows and GNU/Linux.
I have integrated Zenoss with PHP Network Weathermap, like some other Zenoss users. What I do is query outstanding events for locations that the user has explicitly stated in a map design inside Network Weathermap, and then go all the way down exploring subdevices and their outstanding events, thus building a drill-down series of maps which allows operators to go from an alerted location to the device with issues without even entering Zenoss Core. I’ll be publishing the code on CPAN (yes, it’s written in Perl) in a few days.
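The drill-down logic can be sketched in a few lines. This is not the Perl code mentioned above and it doesn’t talk to Zenoss at all: it is a Python illustration that assumes the outstanding events have already been fetched into a hypothetical `Location` tree, and that severities are plain integers where a higher number is worse. All location and device names are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Location:
    """Hypothetical location node: devices with the severities of their outstanding events."""
    name: str
    device_events: Dict[str, List[int]] = field(default_factory=dict)
    children: List["Location"] = field(default_factory=list)


def worst_severity(loc: Location) -> int:
    """Worst severity found in this location or anywhere below it (0 means all clear)."""
    own = [sev for sevs in loc.device_events.values() for sev in sevs]
    nested = [worst_severity(child) for child in loc.children]
    return max(own + nested, default=0)


def drill_down(loc: Location) -> List[str]:
    """Follow the worst severity from a location down to a device that causes it."""
    target = worst_severity(loc)
    path = [loc.name]
    if target == 0:
        return path  # nothing outstanding, stay at this level
    for device, sevs in sorted(loc.device_events.items()):
        if sevs and max(sevs) == target:
            return path + [device]
    for child in loc.children:
        if worst_severity(child) == target:
            return path + drill_down(child)
    return path


datacenter = Location("Datacenter", device_events={"sw1": [], "ups1": [5, 3]})
caracas = Location("Caracas", device_events={"core-router": [3]}, children=[datacenter])
valencia = Location("Valencia", device_events={"cpe-7": [3]})
root = Location("Venezuela", children=[caracas, valencia])

print(worst_severity(root))
print(" > ".join(drill_down(root)))
```

Each map would colour a location by `worst_severity` and link it to the next level down, which is what lets an operator walk from an alerted location to the failing device.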
MIB management, ZenPacks and Portlets for the Dashboard are nice additions to Zenoss, and so is the ability to use Nagios plugins. I’ve set up MIBs and a ZenPack for MIB browsing, as well as Nagios plugins. ZenPacks in particular seem to offer great potential for automating discovery, classification and monitoring of specialized equipment, from UPSes to laser printers.
Zenoss documentation is above the standard FLOSS project quality. The slides for basic/advanced training which I use as a community partner of Zenoss are good, but I’d recommend the Getting Started Guide.
This document will guide you through a simplified, yet very common, workflow of system monitoring. First you’ll install Zenoss using the appropriate stack installer, then you’ll access the Web interface and create a user (and set up passwords) in order to autodiscover devices on a subnet or subnets of your choice.
Zenoss will sweep the subnets, which can take a long time depending on your topology and network conditions, but rest assured it’ll happen, since it runs asynchronously as a background job. For each device answering the ping, Zenoss will model it. That is, it’ll retrieve SNMP information using the SNMP communities you defined (public and private by default), as well as try to get WMI information if you provide credentials for Microsoft Windows and/or information from a GNU/Linux system if you provide SSH/Telnet credentials. Status pages will be prepopulated with uptime information, interface usage et al.
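As a rough mental model of the sweep (and not how Zenoss actually implements it), the Python sketch below expands each subnet into host addresses and probes them concurrently. The `sweep` function and the `is_alive` callback are hypothetical names; the probe is a stand-in so the example runs without touching the network, and in practice it would be replaced by a real ping.

```python
import ipaddress
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def sweep(subnets: Iterable[str], is_alive: Callable[[str], bool],
          workers: int = 32) -> List[str]:
    """Return the addresses in the given subnets for which is_alive() is true."""
    hosts = [str(ip) for net in subnets for ip in ipaddress.ip_network(net).hosts()]
    # Probe hosts in parallel; map() returns results in the same order as hosts.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(is_alive, hosts))
    return [host for host, up in zip(hosts, results) if up]


# Stand-in for a real ping: pretend only these two addresses answer.
KNOWN_UP = {"192.0.2.1", "192.0.2.5"}


def fake_ping(address: str) -> bool:
    return address in KNOWN_UP


alive = sweep(["192.0.2.0/29"], fake_ping)
print(alive)
```

Every address in `alive` would then go on to the modeling step, where SNMP, WMI or SSH/Telnet is tried depending on the credentials available.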
Devices will end up in the /Devices/Discovered class. From there, you can classify them in further device classes if you wish to do so (Zenoss will automatically know what to do in some predefined device classes, e.g., /Devices/Server/Scan will scan TCP/UDP ports) and set their Location. Reports, graphs, alerts, event management, commands, performance templates are all available and configured at this (early) moment of your Zenoss experience! That’s unlegacy network monitoring as easy as it should be.