Anuket Project

The tables and the list of questions were created by Sridhar Rao <Sridhar.Rao@spirent.com>.

 

There are numerous open-source monitoring solutions available, with varying approaches and architectures. In this study, we compare only the 'agent' component of each monitoring solution and do not consider the server-side component(s). The server side can be implemented in many ways, and considering all of the options would be extremely difficult. With collectd, for example, it could be a simple collectd-web instance, a time-series database such as InfluxDB, or a telemetry system based on Apache Kafka. Typically, the server-side components include some or all of the following: (a) metric collection infrastructure (raw-metric receivers, message queues, etc.); (b) metric modifiers (adding context, performing aggregation, filtering, etc.); (c) a storage solution; (d) an alarm/alerting system; (e) visualization/graphing (dashboards); and (f) publishing.
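To make the server-side stages listed above more concrete, here is a minimal sketch of a toy server node. It shows (a) collection of a batch from one agent, (b) modifiers that filter, add a host context, and aggregate, (c) a plain list standing in for storage, and (d) a threshold alarm. The names `Metric`, `Server`, `add_context`, `aggregate` and `alarms` are hypothetical and not taken from any of the compared tools. Stages (e) and (f) are left out.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Metric:
    """A hypothetical key-value metric record, as an agent might send it."""
    name: str
    value: float
    timestamp: float
    tags: dict = field(default_factory=dict)


def add_context(metric: Metric, host: str) -> Metric:
    """(b) Modifier: return a copy of the metric with a 'host' tag added."""
    return Metric(metric.name, metric.value, metric.timestamp, {**metric.tags, "host": host})


def aggregate(metrics: list) -> dict:
    """(b) Modifier: average the values of each metric name in a batch."""
    values_by_name = {}
    for m in metrics:
        values_by_name.setdefault(m.name, []).append(m.value)
    return {name: mean(values) for name, values in values_by_name.items()}


def alarms(aggregates: dict, thresholds: dict) -> list:
    """(d) Alerting: names whose aggregated value exceeds its threshold."""
    return sorted(name for name, value in aggregates.items()
                  if name in thresholds and value > thresholds[name])


class Server:
    """Toy server node; the list `store` stands in for (c) a storage solution."""

    def __init__(self, wanted: set, thresholds: dict):
        self.wanted = wanted          # (b) filter: metric names to keep
        self.thresholds = thresholds  # (d) alarm thresholds per metric name
        self.store = []

    def receive(self, batch: list, host: str) -> list:
        """(a) Collection: accept one batch from an agent and return raised alarms."""
        kept = [add_context(m, host) for m in batch if m.name in self.wanted]
        self.store.extend(kept)
        return alarms(aggregate(kept), self.thresholds)


server = Server(wanted={"cpu.user", "memory.used"}, thresholds={"cpu.user": 80.0})
batch = [Metric("cpu.user", 75.0, 0.0), Metric("cpu.user", 95.0, 10.0), Metric("disk.free", 1e9, 10.0)]
print(server.receive(batch, "node-1"))  # mean cpu.user is 85.0 > 80.0
```

Real deployments spread these stages across separate components (for example a message queue, a time-series database, and an alerting service). That spread is why this study leaves the server side out of scope.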

 

Terminology Definitions

Term: What we mean by it
Metric: A measurement of a particular characteristic.
Example: percentage of CPU used, amount of bandwidth used, etc. The complete definition can be found here.
Event: A record of something that has happened; a simple, immutable fact.
Example: a link has gone down, a packet from a flow is dropped, etc. The complete definition can be found here.
Agent: Software that runs on a node/system that needs to be monitored.
Client Node: A node that is monitored (the node on which the agent runs).
Server Node: A node that collects metrics and events from the client nodes.
Sampling Interval: How frequently the metrics are sent.
Push Mode: Fetching of events by subscribing.
Poll Mode: Fetching of events by polling.
Writing of Metrics/Events: Sending/outputting metrics or events.
Reading of Metrics/Events: Receiving/reading measurements.
Logging of Metrics/Events: Logging of monitored/received metrics or events.
Metric Types (data source types):

Gauge: Value stored as-is.
Derive: Derivative; the rate of change of the value. It can be negative.
Counter: Similar to Derive, but the counter never decreases. A smaller reading is treated as a wrap-around, so the resulting rate is never negative.
Absolute: For counters that are reset every time they are read; the reading is divided by the time since the previous reading.
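The four metric types differ only in how two consecutive raw readings become the value reported for an interval. The sketch below illustrates this with collectd-style semantics. The function name `to_rate` and the `counter_bits` parameter are hypothetical. Wrap-around handling is also simplified: the sketch assumes a single wrap of a fixed-width counter, whereas real agents may guess the counter width themselves.

```python
def to_rate(ds_type, prev, curr, interval, counter_bits=32):
    """Convert two raw readings taken `interval` seconds apart into one reported value.

    ds_type: "gauge", "derive", "counter" or "absolute".
    prev is ignored for "gauge" and "absolute".
    counter_bits: assumed counter width, used only to undo a single wrap-around.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if ds_type == "gauge":
        # Value stored as-is.
        return float(curr)
    if ds_type == "derive":
        # Signed rate of change; may be negative.
        return (curr - prev) / interval
    if ds_type == "counter":
        # Counters never decrease, so a smaller reading means the counter wrapped.
        diff = curr - prev
        if diff < 0:
            diff += 2 ** counter_bits
        return diff / interval
    if ds_type == "absolute":
        # The source resets on every read, so the reading itself is the increment.
        return curr / interval
    raise ValueError(f"unknown data source type: {ds_type}")


print(to_rate("derive", 100, 50, 10))          # falling value gives a negative rate
print(to_rate("counter", 2**32 - 10, 10, 10))  # wrapped 32-bit counter gives a positive rate
```

The Derive and Counter branches differ only in how they treat a decrease. This is why Counter is described as "never negative".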

 

 

Parameter Table

 

Parameters \ Tools: Collectd | Ceilometer (polling agent) | Monasca | SNAP | node-exporter and other exporters | Sensu client (metric collection plugins) | Munin | Telegraf | NRPE + plugins | Diamond | Centreon | Icinga | OpenNMS | NSClient++ | Elastic Beats | Riemann

Note:
1. For some parameters, the answer can be just YES/NO.
2. For others, a description/details may have to be provided.
3. For some, the answer is chosen from the list in brackets [ ]; for others, a value may be appended to that list.
4. For some parameters, please provide the number of actual metrics provided under that category. For example, collectd would provide 12 metrics for the Processes category.

Use NA if not applicable.
Use NK if not known.
CPU metrics
Collectd: idle, system, wait, stolen, user (% & time), util, vcpus | Ceilometer: idle, system, wait, stolen, user (% & time), util, vcpus | Monasca: idle, system, wait, stolen, user (% & time) | SNAP: idle, system, wait, stolen, user, guest, irq, nice (% & jiffies) | node-exporter: idle, system, wait, stolen, user (% & time), util, vcpus | Sensu: idle, system, wait, stolen, user (% & time), util, vcpus | Munin: frequency; usage (idle, system, wait, user, util and vcpus) | Telegraf: same as Ceilometer or Monasca | Diamond: idle, system, wait, user, nice

Disk IO metrics
Collectd: read and write (bytes, rate, time, sectors); disk-free | Ceilometer: read and write (bytes, rate, req) | Monasca: read and write (bytes, rate, req) | SNAP: read and write (ops, octets, merged, time); disk-free | node-exporter: read and write (bytes, rate, req) | Sensu: read and write (bytes, rate, time, sectors) | Munin: read and write (bytes, rate, req) | Telegraf: same as Ceilometer or Monasca | Diamond: read and write (bytes, rate, req)

Memory metrics
Collectd: free, swap, total, used (bytes and percentages) | Ceilometer: usage, bandwidth | Monasca: free, swap, total, used | SNAP: free, available, total, used | node-exporter: free, swap, total, used | Sensu: free, swap, total, used (MB and percentages) | Munin: free, swap, total, used, slab | Telegraf: same as Ceilometer or Monasca | Diamond: free, total, swap, active, dirty, inactive, buffers

Process metrics
Collectd: I/O, memory, CPU usage, read-write (bytes and count) | Ceilometer: NO | Monasca: NO | SNAP: I/O, memory, CPU usage (bytes and count); same as collectd | node-exporter: status, thread count, uptime; I/O, memory, CPU usage; connections | Sensu: CPU and memory, read-write (bytes, count), and various other fields | Telegraf: CPU and memory, read-write (bytes, count) | Diamond: btime, ctxt, processes, blocked, running

Network interface metrics
Collectd: interface plugin: the standard 4 rx/tx fields (octets, packets, errors, dropped); netlink plugin: uses netlink sockets and covers the others | Ceilometer: standard 4 rx/tx fields (octets, packets, errors, dropped) | Monasca: standard 4 rx/tx fields (octets, packets, errors, dropped) | SNAP: sent and recv: bytes, compressed, drops, errors, fifo, frame, multicast, packets | node-exporter: standard 4 rx/tx fields (octets, packets, errors, dropped) | Sensu: standard 4 rx/tx fields (octets, packets, errors, dropped), plus fifo, compressed, and frame stats | Munin: rx/tx (octets, packets, errors, dropped) | Telegraf: same as Ceilometer or Monasca | Diamond: Rx and Tx (MBs)

Libvirt MetricsYES - YESYESYESYESNONONO YES      
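Container resource usage monitoring
Collectd: YES | Ceilometer: NO | Monasca: NO | SNAP: Docker | node-exporter: Docker | Sensu: Docker | Munin: NO | Telegraf: Docker | Diamond: Docker
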
Databases monitoring [InfluxDB, MongoDB, MySQL, PostgreSQL, Carbon (Graphite), Prometheus, RRDCache, Redis, TSDB]
Collectd: YES for all | Ceilometer: MySQL, PostgreSQL, MongoDB | Monasca: InfluxDB, Vertica, MySQL, PostgreSQL, Cassandra | SNAP: InfluxDB, MySQL, MongoDB, Cassandra | node-exporter: ALL (4) | Sensu: All | Munin: NO | Telegraf: All | Diamond: MongoDB, MySQL, PostgreSQL, and Redis

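Publish metrics to databases (InfluxDB, MySQL, TSDB, PostgreSQL, MongoDB, Carbon, Elasticsearch)
Collectd: YES for all | Ceilometer: NO | Monasca: NO | SNAP: YES for all | node-exporter: NO | Sensu: NO (1) | Munin: NO | Telegraf: YES for all | Diamond: YES for all
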
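Encryption support
Collectd: YES | Ceilometer: NO | Monasca: NO | SNAP: YES | node-exporter: NO | Sensu: NO | Munin: NO | Telegraf: NO | Diamond: YES
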
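Language (written in)
Collectd: C | Ceilometer: Python | Monasca: Python | SNAP: Go | node-exporter: Go | Sensu: Ruby | Munin: Perl | Telegraf: Go | Diamond: Python
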
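Extensibility: multi-language support [Python, Java, Golang, C/C++, Lua]
Collectd: YES for all | Ceilometer: Java | Monasca: Java | SNAP: Python, C++ | node-exporter: Java, Python, Ruby | Sensu: Go, Python | Munin: Python, Ruby | Telegraf: None | Diamond: None
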
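Interoperability [with other monitoring solutions]
Collectd: Sensu, statsd, Telegraf? | Ceilometer: Nagios, Zabbix | Monasca: Ceilometer | SNAP: Ceilometer, Facter, Riemann, Prometheus | node-exporter: collectd | Sensu: Nagios, Zabbix | Munin: NO | Telegraf: Riemann | Diamond: Nagios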
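Write to message queues and protocols (AMQP, Kafka, MQTT, NSQ)
Collectd: YES for all | Ceilometer: AMQP | Monasca: Kafka | SNAP: AMQP, Kafka | node-exporter: NO | Sensu: AMQP | Munin: NO | Telegraf: Kafka, MQTT, NSQ | Diamond: YES for all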

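Metrics pub/sub mode support (metrics push/pull mode support?)
Collectd: YES | Ceilometer: YES | Monasca: YES | SNAP: YES | node-exporter: YES | Sensu: YES | Munin: NO | Telegraf: YES | Diamond: YES
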
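Metrics req/resp mode support
Collectd: NO | Ceilometer: NO | Monasca: NO | SNAP: YES | node-exporter: NO | Sensu: YES | Munin: YES | Telegraf: NO | Diamond: NO
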
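Support for events (polling, pushing)
Collectd: YES | Ceilometer: NO (1) | Monasca: NO (1) | SNAP: YES | node-exporter: NO | Sensu: YES | Munin: NO | Telegraf: YES | Diamond: NO
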
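Notification support
Collectd: YES | Ceilometer: NO (1) | Monasca: NO (1) | SNAP: YES | node-exporter: NO (1) | Sensu: YES | Munin: NO | Telegraf: NO | Diamond: NO
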
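Logging support
Collectd: YES | Ceilometer: YES | Monasca: YES | SNAP: YES | node-exporter: YES | Sensu: YES | Munin: YES | Telegraf: YES | Diamond: YES
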
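Hypervisor metrics
Collectd: YES | Ceilometer: NO | Monasca: NO | SNAP: YES (KVM) | node-exporter: YES | Sensu: YES (XenTop) | Munin: NO | Telegraf: NO | Diamond: Xen, KVM
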
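Log-file analysis
Collectd: YES | Ceilometer: NO | Monasca: NO | SNAP: YES | node-exporter: YES (mtail) | Sensu: NO | Munin: NO | Telegraf: YES | Diamond: NO
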
Other writing (output) support [CSV, HTTP, RRD, UnixSocket, Multicast]
Collectd: all that are listed | Ceilometer: NO | Monasca: NO | SNAP: NO | node-exporter: HTTP | Sensu: NO | Munin: RRD | Telegraf: Socket, HTTP

Transport protocol
Collectd: depends on the endpoint it is communicating with | Ceilometer: TCP* | Monasca: TCP* | SNAP: TCP | node-exporter: TCP, UDP (5) | Sensu: TCP | Munin: TCP | Telegraf: TCP, UDP

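Data format [XML, JSON, etc.]
Collectd: JSON, custom, XML | Ceilometer: JSON, XML | Monasca: JSON | SNAP: JSON | node-exporter: JSON? | Sensu: JSON | Munin: custom | Telegraf: custom | Diamond: JSON
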
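Data model (KVP = key-value pairs)
Collectd: custom | Ceilometer: KVP | Monasca: KVP | SNAP: KVP | node-exporter: KVP | Sensu: KVP | Munin: custom | Telegraf: custom | Diamond: KVP
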
Hardware: IPMI, battery, sensors
Collectd: YES for all | Ceilometer: IPMI | Monasca: IPMI | SNAP: IPMI | node-exporter: YES for all | Sensu: YES (IPMI) | Munin: YES (3) | Telegraf: IPMI sensors

Metric types: Gauge, Derive, Counter, Absolute
Collectd: YES for all | Ceilometer: gauge, cumulative, delta | Monasca: gauge, rate, counter | SNAP: gauge, derive, counter | node-exporter: gauge, counter, histogram, summary | Sensu: gauge, counter, derive | Munin: gauge, counter, derive | Telegraf: gauge, counter | Diamond: gauge, derivative, delta

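Last updated
Collectd: 2017 | Ceilometer: 2017 | Monasca: 2017 | SNAP: varies (5) | node-exporter: varies (5) | Sensu: varies (5) | Munin: varies (5) | Telegraf: 2017 | Diamond: varies (5)
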
Commercial Versions?NONO?NONOYESNONo YES?      
Resource consumption by the agent

Binary: 617Kb

 

               
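License
Collectd: MIT/GPL v2 or later | Ceilometer: Apache License, Version 2.0 | Monasca: Apache License, Version 2.0 | SNAP: Apache License, Version 2.0 | node-exporter: multiple (5) | Sensu: MIT | Munin: GPL v2 | Telegraf: MIT | Diamond: MIT
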
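Webserver monitoring [Nginx, Apache]
Collectd: YES for all | Ceilometer: Apache | Monasca: Apache | SNAP: YES for all | node-exporter: Nginx, Apache, Passenger, Varnish | Sensu: Apache, Nginx, Unicorn | Munin: NO | Telegraf: YES for all | Diamond: NO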

Platforms - OS?

Linux (unix'es), Windows.

Supports windows, linux, freebsd, etc.LinuxLinux

Linux, MAC,

Windows (soon)

Linux

Windows(3)

Linux, Windows,Linux, WindowsLinux Linux      
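Configuration tool support [Puppet, Chef, Ansible, Salt]
Collectd: YES for all | Ceilometer: Puppet, Chef | Monasca: Puppet, Chef, Ansible | SNAP: YES for all | node-exporter: YES for all | Sensu: YES for all | Munin: NO | Telegraf: YES for all | Diamond: Puppet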
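Deployments: servers, VMs, containers
Collectd: ALL | Ceilometer: ALL | Monasca: ALL | SNAP: ALL | node-exporter: ALL | Sensu: ALL | Munin: ALL | Telegraf: ALL | Diamond: ALL
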
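OpenStack modules
Collectd: NO | Ceilometer: NO | Monasca: ALL | SNAP: Ceph, Cinder, Glance, Keystone, Neutron, Nova | node-exporter: NO | Sensu: NO | Munin: NO | Telegraf: NO | Diamond: NO
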
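Intel PCM and SSD SMART metrics
Collectd: NO | Ceilometer: NO | Monasca: NO | SNAP: YES | node-exporter: NO | Sensu: NO | Munin: NO | Telegraf: NO | Diamond: NO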

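Cluster management (Kubernetes, Mesos, Swarm)
Collectd: NO | Ceilometer: NO | Monasca: NO | SNAP: Kubernetes and Mesos | node-exporter: Kubernetes and Mesos | Sensu: Kubernetes and Mesos | Munin: NO | Telegraf: Kubernetes and Mesos | Diamond: NO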

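Modifiers (filtering, threshold, tags, contexts)
Collectd: filtering and threshold: YES; tags: YES; contexts: NO (1) | Ceilometer: NO | Monasca: YES | SNAP: YES for all | node-exporter: tags, filtering and threshold | Sensu: NO (1) | Munin: NO | Telegraf: tagging | Diamond: tags
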
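Dynamic loading of plugins
Collectd: NO | Ceilometer: NO | Monasca: NO | SNAP: YES | node-exporter: YES | Sensu: YES | Munin: YES? | Telegraf: NO | Diamond: NO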

Lowest sampling interval (not including transmission over the network)
Collectd: can go down to nanosecond resolution
Interval for transmitting over the network
Collectd: cannot be specified; depends on the size of the buffer and the reading interval

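Other services monitoring (DHCP, DNS, FTP, NTP, HAProxy, Consul)
Collectd: HAProxy, DNS, NTP | Ceilometer: NO | Monasca: HAProxy, NTP | SNAP: HAProxy | node-exporter: DHCP, HAProxy, NTP, Consul | Sensu: YES for all | Munin: NO | Telegraf: HAProxy, NTP, Consul, DNS | Diamond: NO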
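The collectd entry for the network transmission interval says the interval cannot be specified directly and depends on the buffer size and the reading interval. The sketch below illustrates that relationship with a simplified model. The function name `send_times` is hypothetical, and the byte sizes in the example are illustrative assumptions, not the defaults of any agent. The model also ignores any time-based flushing an agent might perform.

```python
def send_times(read_interval, metric_size, buffer_size, duration):
    """Return the times (in seconds) at which a buffering agent sends a packet.

    Simplified model: one metric of `metric_size` bytes is read every
    `read_interval` seconds and appended to a send buffer of `buffer_size`
    bytes. The buffer is sent only when the next metric would not fit.
    """
    if metric_size > buffer_size:
        raise ValueError("a single metric must fit in the buffer")
    sends = []
    used = 0
    for k in range(1, int(duration // read_interval) + 1):
        now = k * read_interval
        if used + metric_size > buffer_size:
            sends.append(now)  # buffer is full: send it, then start a new one
            used = 0
        used += metric_size
    return sends


# 10 metrics fit in the buffer, so a packet goes out every 10 readings (100 s)
# even though metrics are read every 10 s.
print(send_times(read_interval=10, metric_size=100, buffer_size=1000, duration=300))
```

In this model the effective transmission interval is roughly (buffer_size // metric_size) * read_interval. Changing either the buffer size or the reading interval therefore changes how often data leaves the node.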

Legend

(1) This aspect is realized either as a server-side component or by a 'customized' agent.

(2) Custom solutions exist, but may not be part of the main distribution.

(3) Supported, but with a strong dependency on an additional tool/library.

(4) Supports more options than the ones listed in the first column.

(5) A single value cannot be entered because logically independent modules are developed by different community groups.

Inference Questions

Question | Answer
Lowest interval: Which agent supports the lowest sampling interval, and what is the value?
Interoperability: Which agent is the 'most interoperable', i.e., works with the largest number of 'servers' (collection nodes)?
Large-scale deployment: Which agent is ideal for large-scale monitoring? (Provide a description on a separate page, if needed.)
Low footprint: Which agent has the lowest footprint (memory and CPU)?
Metrics: Which agent supports the largest number of metrics?
Gaps: Are there any metrics relevant to NFV that none of the agents support?
Which agent is ideal for real-time analytics (support for the largest number of scalable datastores, visualization tools, and analytics engines)?
Have any of the agents been used in large-scale real-world deployments? If so, please provide details on their performance.
Which agent has the fewest/most dependencies (libraries, OS/kernel versions, etc.)?
Which agent provides the most 'freedom' with respect to licenses (core agent + plugins)?
Which agent is best for the following datastores: InfluxDB, Graphite, Elasticsearch?
Which agents support dynamic configuration?
  
  
  
  
  
  
  