
...

There are numerous open-source monitoring solutions available, with varying approaches and architectures. In this study, we compare only the 'agent' component of each monitoring solution and do not consider the server-side component(s), because the 'server' can be implemented in many ways (for example, with collectd it could be a simple collectd-web or a time-series database such as InfluxDB), and considering all the options would be extremely difficult. Typically, the server-side components include some or all of the following: (a) Metric collection infrastructure: raw-metric receiver, message queues, etc. (b) Metric modifier: add contexts, perform aggregation, filter, etc. (c) Storage solution (d) Alarm/alerting system (e) Visualization/graphing (f) Publishing
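To make the 'metric modifier' stage (b) more concrete, the following is a minimal Python sketch of its three operations: adding context, filtering, and aggregating. The sample layout (`host`, `metric`, `value`), the function names, and the data are all assumptions made for illustration; they do not come from any of the compared tools.

```python
from collections import defaultdict
from statistics import mean


def add_context(sample, context):
    """Return a copy of the sample with extra key/value context attached."""
    return {**sample, **context}


def keep(sample, allowed_metrics):
    """Filter step: keep only samples whose metric name is allowed."""
    return sample["metric"] in allowed_metrics


def aggregate(samples):
    """Aggregation step: average the values per (host, metric) pair."""
    groups = defaultdict(list)
    for s in samples:
        groups[(s["host"], s["metric"])].append(s["value"])
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical raw samples as an agent might send them.
raw = [
    {"host": "node1", "metric": "cpu.idle", "value": 90.0},
    {"host": "node1", "metric": "cpu.idle", "value": 80.0},
    {"host": "node1", "metric": "debug.noise", "value": 1.0},
]

enriched = [add_context(s, {"region": "lab"}) for s in raw]
filtered = [s for s in enriched if keep(s, {"cpu.idle"})]
summary = aggregate(filtered)
print(summary)
```

In a real deployment these steps would typically run on the server side, which is why they are outside the scope of the agent comparison.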

 

Terminology Definition

...

Parameters\Tools | Collectd | Ceilometer Polling agent | Monasca | SNAP | node-exporter and other exporters | sensu | munin | telegraf | nagios | diamond | centreon | icinga | OpenNMS | NSClient++ | Elastic Beats | Riemann

Note:
1. For some parameters the answer could be just YES/NO,
2. Whereas, for some we may have to provide a description/details
3. For some we may have to choose from the list [], whereas for some we may append a value to the list.
4. For some parameters, please provide the number of 'actual metrics' provided under that category. For example, collectd would provide 12 metrics for Processes-category

Use NA - If Not applicable.
Use NK - If it is Not Known

Lowest Sampling Interval -

(for transmitting over network)

can go down to nanosecond resolution

(1-sec)

               
CPU metrics | idle, system, wait, stolen, user (% & time), util, vcpus | idle, system, wait, stolen, user (% & time), util, vcpus | idle, system, wait, stolen, user (% & time) | | idle, system, wait, stolen, user (% & time), util, vcpus | | | Same as ceilometer or monasca | | idle, system, wait, user, nice
Disk IO metrics | Read and write (bytes, rate, time, sectors) | read and write (bytes, rate, req) | read and write (bytes, rate, req) | | read and write (bytes, rate, req) | | | Same as ceilometer or monasca | | read and write (bytes, rate, req)
Memory metrics | usage, bandwidth | free, swap, total, used | | | free, swap, total, used | | | Same as ceilometer or monasca | | free, total, swap, active, dirty, inactive, buffers
Process metrics | I/O, Sched, Stats | | | | | | | | | btime, ctxt, processes, blocked, running
Network Interface Metrics | Interface plugin: Standard 4 fields of rx/tx (octets, packets, errors, dropped). Netlink plugin: uses netlink sockets and covers others | Standard 4 fields of rx/tx (octets, packets, errors, dropped). | Standard 4 fields of rx/tx (octets, packets, errors, dropped). | | Standard 4 fields of rx/tx (octets, packets, errors, dropped). | | | Same as ceilometer or monasca | | Rx and Tx. MBs
Libvirt Metrics | YES - | YES | YES | | YES | | | NO | | YES
Container resource usage Monitoring | YES | NO | NO | | Docker | | | Docker | | Docker
Databases Support, Writing to and Monitoring: [InfluxDB, MongoDB, MySQL, PostgreSQL, Carbon (Graphite), Prometheus, RRDCache, Redis, TSDB] | YES for all | MySQL, PostgreSQL, MongoDB - monitoring | InfluxDB, Vertica, MySQL, PostgreSQL, Cassandra - monitoring | | Monitoring only | | | Writing - InfluxDB. Monitoring - All. | | Monitoring - All
Encryption Support | YES | NO | NO | | NO | | | | | YES
Extensibility - multilanguage support [Python, Java, Golang, C/C++, Lua] | YES for all | Java | Java | | Java, Python
Interoperability [with other monitoring solutions] | Sensu, statsd, telegraf? | Nagios, zabbix | ceilometer | | Collectd | | | | | Nagios
Write to Message Queues and protocols (AMQP, Kafka, MQTT, NSQ) | YES for ALL | AMQP | Kafka | | NO | | | kafka, MQTT, NSQ
Metrics Pub/sub Mode Support | YES | YES | YES
Metrics Req/Resp Mode Support | NO | NO | NO
Support for Events (polling, Pushing) | Yes | NO (1) | NO (1)
Notification Support | YES | NO (1) | NO (1) | | NO (1)
Logging Support | YES | YES | YES | | YES
Hypervisor metrics | YES | | | | YES
Log-File Analysis | YES | NO | NO
Other Writing Support: [CSV, HTTP, RRD, UnixSocket] | ALL that are listed.
Transport Protocol | Depends on the endpoint it is communicating with | | | | | | | TCP, UDP
Data-Format [XML, JSON, etc] | JSON, Custom, XML | JSON XML | JSON | | | | | Custom
Data-model | Custom | KVP | KVP | | | | | Custom
Hardware: IPMI, Battery, Sensors | YES for all | IPMI | IPMI
Metric Types: Gauge, Derive, Counter, Absolute | YES for all | Gauge, cumulative, delta
Language (written) | C | Python | Python | | | | | Go
Last-Updated | 2017 | 2017 | 2017
Commercial Versions? | | NO? | | | | | | No
Resource consumption by the agent | Binary: 617 KB
License | MIT/GPL v2 or later | Apache License, Version 2.0 | Apache License, Version 2.0
Webserver monitoring [Nginx, Apache] | YES for all | Apache | Apache | | Nginx, Apache, Passenger, varnish
Platforms - OS? | Supports Windows, Linux, FreeBSD... | Linux | Linux
Configuration Tool support [Puppet, Chef, Ansible, Salt] | YES for all | Puppet | Chef
Server-mode support? | YES
Other Services Support | | | | | | | | statsd, webhooks
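
The 'Metric Types' row lists Gauge, Derive, Counter and Absolute. As a rough illustration of how these types differ, the sketch below converts two consecutive raw readings into a reported value, following the RRDtool-style semantics that collectd's data-source types are modelled on. The function name `to_rate` and the single 32-bit wrap-around assumption are illustrative simplifications, not an agent's actual implementation.

```python
def to_rate(ds_type, prev, curr, interval, counter_bits=32):
    """Convert raw readings into a reported value for one metric type.

    ds_type  -- "gauge", "derive", "counter" or "absolute"
    prev     -- raw value from the previous sample
    curr     -- raw value from the current sample
    interval -- seconds between the two samples (must be positive)
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if ds_type == "gauge":
        # Gauges are stored as-is (e.g. temperature, memory in use).
        return float(curr)
    if ds_type == "derive":
        # Rate of change; may be negative if the value decreases.
        return (curr - prev) / interval
    if ds_type == "counter":
        # Like derive, but a decrease is treated as a counter wrap-around.
        # Assumption: the counter wrapped exactly once at 2**counter_bits.
        delta = curr - prev
        if delta < 0:
            delta += 2 ** counter_bits
        return delta / interval
    if ds_type == "absolute":
        # The source resets to zero on every read, so curr is the delta.
        return curr / interval
    raise ValueError(f"unknown metric type: {ds_type}")


print(to_rate("gauge", 10, 42, 10))             # reading reported unchanged
print(to_rate("derive", 100, 150, 10))          # per-second rate of change
print(to_rate("counter", 2**32 - 10, 10, 10))   # wrap-around handled
print(to_rate("absolute", 0, 50, 10))           # reset-on-read counter
```

Ceilometer's gauge, cumulative and delta types follow a similar split between point-in-time values, ever-increasing totals, and per-interval changes.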

...

(1) This aspect is realized either as a server-side component or by a 'customized' agent.

(2) Custom solutions exist, and may not be part of the main distribution.

(3

Inference Questions

The Questions | The Answer
Lowest Interval: Which agent supports the lowest sampling interval, and what is the value?
Interoperability: Which agent is the 'most interoperable' (works with the maximum number of 'servers', i.e. collection nodes)?
Large-scale deployment: Which agent is ideal for large-scale monitoring? (Provide a description in a separate page, if needed.)
Low-footprint: Which agent has the lowest footprint (memory and CPU)?
Metrics: Which agent supports the maximum number of metrics?
Gaps: Are there any metrics that are relevant to NFV but not supported by any of the agents?
Which agent is ideal for real-time analytics? [Support for the maximum number of scalable datastores, visualization tools and analytics engines?]
Have any of the agents been used in large-scale real-world deployments? If so, please provide details on the performance.
Which agent has the least/maximum dependencies (libraries, OS/kernel versions, etc.)?
Which agent provides maximum 'freedom' w.r.t. licenses (core agent + plugins)?
Which agent is best for the following datastores: InfluxDB, Graphite, Elasticsearch?
Which agent supports dynamic configuration?
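
Several of these questions can be answered mechanically once the table cells are encoded in a consistent way. The sketch below is one possible approach, assuming the cell conventions from the notes (YES/NO, NA, NK, or a list of options). It uses only the first three cells of the 'Write to Message Queues and protocols' row; the helper name `parse_cell` and the decision to count NK (not known) as 'no support' are assumptions made for illustration.

```python
import re

ALL_QUEUES = {"AMQP", "Kafka", "MQTT", "NSQ"}


def parse_cell(cell, options):
    """Turn a table cell into the set of supported options.

    "YES ..." means every option; "NO ...", "NA", "NK" and blank cells
    mean none (assumption: unknown is counted as unsupported).
    Anything else is read as a comma-separated list of option names.
    """
    text = cell.strip()
    upper = text.upper()
    if upper in {"", "NA", "NK"} or re.match(r"NO\b", upper):
        return set()
    if re.match(r"YES\b", upper):
        return set(options)
    wanted = {part.strip().lower() for part in text.split(",")}
    return {opt for opt in options if opt.lower() in wanted}


# First three cells of the message-queue row, as given in the table.
row = {"Collectd": "YES for ALL", "Ceilometer": "AMQP", "Monasca": "Kafka"}

support = {agent: parse_cell(cell, ALL_QUEUES) for agent, cell in row.items()}
best = max(support, key=lambda agent: len(support[agent]))
print(best, sorted(support[best]))
```

The same encoding could be applied to the database, web server and configuration-tool rows to compare agents by the number of supported options.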