This wiki holds the minutes of discussion topics from the Barometer Weekly Call.

DMA Project Proposal

Service Assurance Project <April 11, 2018>

This presentation outlines the Service Assurance Project that is currently in development. The SA project will showcase the Barometer container and make the container(s) available.

Providing Sufficient Measurement Context with Results <Jan 30, 2018>

The Barometer Project considers ETSI GS NFV-TST 008 V2.4.1 (2018-01) to define some of the key metrics, and their required Measurement Context.

Measurement Context includes measurement time stamps, measurement scope, and variable parameters (or input factors) required to understand the measurements.

During development of the complementary ETSI Performance Management Spec (IFA027), many gaps were identified, but agreement was reached to add text to specify the Measurement Context.

The slide below illustrates the Measurement Context communicated along with measurements, and the difference between Measurement Timestamps, Collection Timestamps, and Reporting Timestamps.
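As a rough illustration of the three kinds of timestamps and the context fields described above, a result record could be modelled along the following lines. This is only a sketch: the class name, field names, and the assumed ordering of the timestamps are hypothetical and are not taken from ETSI GS NFV-TST 008 or from collectd.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict


@dataclass
class MeasurementRecord:
    """Hypothetical record pairing one metric value with its Measurement Context."""

    metric: str                   # metric name (illustrative)
    value: float
    scope: str                    # what the measurement covers, e.g. "host1/eth0"
    measurement_start: datetime   # start of the measurement interval
    measurement_end: datetime     # end of the measurement interval
    collection_time: datetime     # when the agent read the value
    reporting_time: datetime      # when the value was sent upstream
    parameters: Dict[str, str] = field(default_factory=dict)  # variable input factors

    def measurement_interval(self) -> timedelta:
        """Length of the interval the value was measured over."""
        return self.measurement_end - self.measurement_start

    def reporting_delay(self) -> timedelta:
        """Time between the end of the measurement and the report."""
        return self.reporting_time - self.measurement_end

    def is_consistent(self) -> bool:
        """Assumed ordering: measured, then collected, then reported."""
        return (self.measurement_start <= self.measurement_end
                <= self.collection_time <= self.reporting_time)


t0 = datetime(2018, 1, 30, 12, 0, 0, tzinfo=timezone.utc)
record = MeasurementRecord(
    metric="if_octets",
    value=1234.0,
    scope="host1/eth0",
    measurement_start=t0,
    measurement_end=t0 + timedelta(seconds=10),
    collection_time=t0 + timedelta(seconds=11),
    reporting_time=t0 + timedelta(seconds=15),
    parameters={"interval_s": "10"},
)
print(record.measurement_interval(), record.reporting_delay())
```

Keeping the three timestamps as separate fields lets a consumer tell a stale report apart from a stale measurement.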

It now remains to conduct a gap analysis on the relevant Barometer metrics, to ensure that the Measurement Context is available with the collectd results.

There are three related JIRA Tickets for the Gaps:

DMA Project

Distribute some monitoring and analysis capabilities to the edge
Allow faster polling rates locally without creating a bottleneck for the transfer of large amounts of data to a central site
Allow fast remediation of node-local events
Project is looking for an upstream community
Would Barometer be a good fit?

Discussion topics for the “ideal” monitoring agent

Polling vs Event capture for the monitoring agent
Platform-independent monitoring agent
Network Interfaces
Kernel events
VM / Container monitoring
Common bus for Events / Telemetry / Config
Common Object model
Agent configuration
Performance
<<50ms and other timing requirements

Decisions

Polling vs Event capture for the monitoring agent <Feb 07 2017>

The scope of polling being discussed is that of the monitoring agent itself (on the node that’s being observed). Collectd is configured to run at a particular interval, by default every 10 seconds. The question is: should the read plugins poll for stats and events every time the read interval fires?

A. Both polling and event-driven updates should be supported; which one to use depends on the subsystem being monitored. The default would be to leverage event-based systems where they exist, but polling should be supported as a configuration option that the end user can select.
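The decision above can be sketched roughly as follows: event-driven updates are the default where a subsystem offers them, and polling remains available as an explicit configuration choice. All names here (`Subsystem`, `choose_mode`, `Agent`) are hypothetical and do not correspond to the collectd plugin API.

```python
from typing import Callable, Dict, List, Optional


class Subsystem:
    """Hypothetical monitored subsystem that may or may not emit events."""

    def __init__(self, name: str, supports_events: bool) -> None:
        self.name = name
        self.supports_events = supports_events
        self.state = 0
        self._listeners: List[Callable[[str, int], None]] = []

    def subscribe(self, callback: Callable[[str, int], None]) -> None:
        if not self.supports_events:
            raise RuntimeError(f"{self.name} has no event interface")
        self._listeners.append(callback)

    def change(self, new_state: int) -> None:
        """Update the state and notify any event subscribers."""
        self.state = new_state
        for callback in self._listeners:
            callback(self.name, new_state)


def choose_mode(subsystem: Subsystem, configured: Optional[str] = None) -> str:
    """Prefer events where they exist; honour an explicit user choice."""
    if configured not in (None, "poll", "event"):
        raise ValueError(f"unknown mode: {configured}")
    if configured == "poll":
        return "poll"
    if configured == "event" and not subsystem.supports_events:
        raise ValueError(f"{subsystem.name} cannot be event driven")
    return "event" if subsystem.supports_events else "poll"


class Agent:
    """Hypothetical agent that mixes event-driven and polled sources."""

    def __init__(self) -> None:
        self.latest: Dict[str, int] = {}
        self._polled: List[Subsystem] = []

    def attach(self, subsystem: Subsystem, configured: Optional[str] = None) -> str:
        mode = choose_mode(subsystem, configured)
        if mode == "event":
            subsystem.subscribe(self._record)
        else:
            self._polled.append(subsystem)
        return mode

    def _record(self, name: str, value: int) -> None:
        self.latest[name] = value

    def on_interval(self) -> None:
        """Called once per read interval; reads only the polled subsystems."""
        for subsystem in self._polled:
            self._record(subsystem.name, subsystem.state)
```

In this sketch an event-capable subsystem updates the agent as soon as it changes, while a polled one is only seen the next time `on_interval()` runs.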

If we instead consider the scope from the VIM to the monitoring agent: within this context, should we support polling or event-driven updates?

Fault events should always use a push model, and the mechanism over which events are sent needs to be reliable.

Telemetry can be polled or pushed (it could be polled to spread the load on the collection side).

Network (over)load should be taken into consideration when choosing which model to use (push vs pull), since you don't want to destabilize the network. Push is more scalable overall and is preferred for fault management.
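One way to picture the split between pushed fault events and polled telemetry is the sketch below. It assumes a hypothetical transport callable that returns `True` once the receiver has acknowledged an event. The retry count and all names are illustrative only, and real reliability would need more than a simple retry loop.

```python
from collections import deque
from typing import Callable, Deque, List


class Reporter:
    """Hypothetical reporter: faults are pushed, telemetry waits to be pulled."""

    def __init__(self, send: Callable[[str], bool], max_attempts: int = 3) -> None:
        self._send = send                  # assumed transport; True means acknowledged
        self._max_attempts = max_attempts
        self._telemetry: Deque[float] = deque()
        self.undelivered: List[str] = []   # faults that were never acknowledged

    def push_fault(self, event: str) -> bool:
        """Push a fault immediately and retry until acknowledged or out of attempts."""
        for _ in range(self._max_attempts):
            if self._send(event):
                return True
        self.undelivered.append(event)
        return False

    def record_telemetry(self, sample: float) -> None:
        """Hold telemetry locally until the collector asks for it."""
        self._telemetry.append(sample)

    def pull_telemetry(self) -> List[float]:
        """Return and clear the pending samples (collector-driven polling)."""
        batch = list(self._telemetry)
        self._telemetry.clear()
        return batch
```

Letting the collector pull telemetry allows it to pace its own load, while faults are sent without waiting for a poll.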

Agent configuration <Feb 14 2017>

Should be able to dynamically:

* Enable, disable, or restart resource monitoring

* Get values/notifications

* Get capabilities

* Get the list of metrics being collected

* Flush the list of metrics

* Set thresholds for resources

* Blacklist resources

* Support some sort of buffering mechanism, which should be configurable

* Get the timing information for the agent and do a timing sync if required
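The list above could translate into a control interface along these lines. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, "flush" is interpreted as clearing the list of collected metrics (the minutes do not define it), and the timing check only computes an offset against a reference clock.

```python
import time
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple


class AgentControl:
    """Hypothetical runtime control surface for a monitoring agent."""

    def __init__(self, capabilities: List[str], buffer_size: int = 100) -> None:
        self._capabilities: Set[str] = set(capabilities)
        self._enabled: Set[str] = set()
        self._blacklisted: Set[str] = set()
        self._thresholds: Dict[str, Tuple[float, float]] = {}
        self._buffer: Deque[Tuple[str, float]] = deque(maxlen=buffer_size)

    def get_capabilities(self) -> List[str]:
        return sorted(self._capabilities)

    def enable(self, resource: str) -> None:
        if resource not in self._capabilities:
            raise KeyError(f"unknown resource: {resource}")
        if resource in self._blacklisted:
            raise ValueError(f"resource is blacklisted: {resource}")
        self._enabled.add(resource)

    def disable(self, resource: str) -> None:
        self._enabled.discard(resource)

    def restart(self, resource: str) -> None:
        self.disable(resource)
        self.enable(resource)

    def list_metrics(self) -> List[str]:
        return sorted(self._enabled)

    def flush_metrics(self) -> None:
        # Assumed meaning of "flush": stop collecting everything.
        self._enabled.clear()

    def set_threshold(self, resource: str, low: float, high: float) -> None:
        self._thresholds[resource] = (low, high)

    def blacklist(self, resource: str) -> None:
        self._blacklisted.add(resource)
        self._enabled.discard(resource)

    def configure_buffer(self, size: int) -> None:
        # Resize the buffer, keeping the most recent samples that fit.
        self._buffer = deque(self._buffer, maxlen=size)

    def submit(self, resource: str, value: float) -> Optional[str]:
        """Buffer a sample; return a notification if a threshold is crossed."""
        if resource not in self._enabled:
            return None
        self._buffer.append((resource, value))
        low, high = self._thresholds.get(resource, (float("-inf"), float("inf")))
        if value < low or value > high:
            return f"{resource} out of range: {value}"
        return None

    def get_values(self) -> List[Tuple[str, float]]:
        return list(self._buffer)

    def clock_offset(self, reference_time: float,
                     local_time: Optional[float] = None) -> float:
        """Offset to add to the agent clock to match the reference clock."""
        if local_time is None:
            local_time = time.time()
        return reference_time - local_time
```

A controller might call `get_capabilities()` first, enable a subset of resources, and then adjust thresholds or the buffer size at runtime without restarting the agent.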