...

This wiki relies heavily on the ETSI NFV draft specification “Network Functions Virtualisation (NFV); Testing; NFVI Compute and Network Metrics Specification”, which can be found here: TST008. Please consult the latest version, and leave a comment if the link is broken, as ETSI seems to move it frequently.

...

Distinction between metrics and events

For the purposes of Platform Service Assurance, it's important to distinguish between metrics and events as well as how they are measured (from a timing perspective).

A Metric is a (standard) definition of a quantity describing the performance and/or reliability of a monitored function, which has an intended utility and is carefully specified to convey the exact meaning of the measured value. A measured value of a metric is produced in an assessment of a monitored function according to a method of measurement. For example, the number of dropped packets for a networking interface is a metric.

 

An Event is defined as an important state change in a monitored function. The monitoring system is notified that an event has occurred using a message with a standard format. The Event notification describes the significant aspects of the event, such as the name and ID of the monitored function, the type of event, and the time the event occurred. For example, an event notification would be sent if the link status of a networking device on a compute node hosting VNFs in an NFV deployment suddenly changed from up to down.
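The distinction can be made concrete with two small record types. This is a minimal sketch only: the class names, field names, and the `link_event` helper are hypothetical illustrations and are not taken from TST008 or from any collector's API.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class MetricSample:
    """One measured value of a metric (hypothetical shape)."""
    monitored_function: str   # e.g. a network interface name
    metric_name: str          # e.g. "dropped_packets"
    value: float
    collected_at: float       # seconds since the epoch


class EventType(Enum):
    LINK_DOWN = "link_down"
    LINK_UP = "link_up"


@dataclass(frozen=True)
class EventNotification:
    """Notification of a state change in a monitored function (hypothetical shape)."""
    function_name: str
    function_id: str
    event_type: EventType
    occurred_at: float        # time the event occurred


def link_event(name, function_id, was_up, is_up, now):
    """Return an EventNotification if the link state changed, otherwise None."""
    if was_up == is_up:
        return None  # no state change, so there is nothing to notify
    kind = EventType.LINK_UP if is_up else EventType.LINK_DOWN
    return EventNotification(name, function_id, kind, now)
```

A metric sample is produced on every measurement, whereas `link_event` only produces a notification when the state actually changes.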

 

Collector requirements:

Polling vs Event capture for the monitoring agent

...

  • Fault events should always use a push model, and the mechanism over which events are sent needs to be reliable.
  • Telemetry can be polled or pushed (polling can be used to spread the load on the collection side).
  • Network (over)load should be taken into consideration when choosing between push and pull, so that monitoring traffic does not destabilize the network. Push is more scalable overall and is preferred for fault management.
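The two models can be sketched side by side. The following is an illustrative sketch under stated assumptions, not a description of any existing agent: `EventPusher`, `TelemetryPoller`, and the `send` callback are hypothetical names, and "reliable" is approximated here by keeping each event queued until the transport acknowledges it.

```python
from collections import deque


class EventPusher:
    """Push model: send events as they occur and keep them queued until acknowledged."""

    def __init__(self, send):
        self._send = send          # hypothetical callable(event) -> bool, True when acknowledged
        self._pending = deque()

    def push(self, event):
        self._pending.append(event)
        self.flush()

    def flush(self):
        """Send queued events in order; stop at the first failure and keep the rest."""
        while self._pending:
            if not self._send(self._pending[0]):
                break
            self._pending.popleft()
        return len(self._pending)  # number of events still waiting for delivery


class TelemetryPoller:
    """Pull model: the collection side decides when each source is read."""

    def __init__(self, readers):
        self._readers = readers    # dict: metric name -> zero-argument callable

    def poll(self):
        return {name: read() for name, read in self._readers.items()}
```

Because the poller is driven from the collection side, polls of different nodes can be staggered to spread load, while the pusher delivers fault events without waiting for the next poll.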

 

...

Collector configuration

It should be possible to do the following dynamically:

  • Enable, disable, or restart resource monitoring
  • Get values/notifications
  • Get capabilities
  • Get the list of metrics being collected
  • Flush the list of metrics
  • Set thresholds for resources
  • Blacklist resources
  • Support some form of buffering mechanism, which should itself be configurable
  • Get the timing information for the agent and perform a timing sync if required
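One way to read this list is as a small control interface on the collector. The sketch below is hypothetical throughout: the class, its method names, and the in-memory state are illustrative only. It assumes that "flush" means clearing the list of collected metrics, and it leaves out buffering and timing sync for brevity.

```python
class CollectorControl:
    """In-memory sketch of a dynamic configuration interface (hypothetical)."""

    def __init__(self, capabilities):
        self._capabilities = set(capabilities)  # resources the agent could monitor
        self._enabled = set()                   # resources currently being collected
        self._blacklist = set()
        self._thresholds = {}

    def get_capabilities(self):
        return sorted(self._capabilities)

    def enable(self, resource):
        if resource not in self._capabilities:
            raise ValueError(f"unsupported resource: {resource}")
        if resource in self._blacklist:
            return False                        # blacklisted resources stay disabled
        self._enabled.add(resource)
        return True

    def disable(self, resource):
        self._enabled.discard(resource)

    def restart(self, resource):
        self.disable(resource)
        return self.enable(resource)

    def list_metrics(self):
        return sorted(self._enabled)

    def flush_metrics(self):
        self._enabled.clear()                   # assumption: "flush" empties the list

    def set_threshold(self, resource, low, high):
        if low > high:
            raise ValueError("low threshold must not exceed high threshold")
        self._thresholds[resource] = (low, high)

    def get_threshold(self, resource):
        return self._thresholds.get(resource)

    def blacklist(self, resource):
        self._blacklist.add(resource)
        self._enabled.discard(resource)
```

In a real agent these operations would be exposed over whatever management channel the collector provides; the point here is only that each one can be invoked at runtime, without redeploying the agent.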

 

Collector Time stamping support

The time sent with a sample should be the time stamp at which the value was collected.

Currently there are two scenarios for time stamps sent with samples:

 

1. Where the subsystem we are reading from CAN provide us with the “incident” time (time at which an event occurred) and the collector can provide us with the collection time (time at which a sample was collected): In this case we have the “incident” time for the sample/event and the time when a collector retrieves the sample...

 

2. Where the subsystem we are reading from CANNOT provide us with the “incident” time, only the collection time: In this case we only have the time at which the collector retrieves the sample.

 

Where possible, the recommendation is for collectors to collect both the incident time and the collection time and to send both with a sample.

 

collectd has only one time stamp field. The recommendation is to send the collection time in the collectd time stamp field for both values and notifications and, where a detection time is available, to send it in the metadata.
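The recommendation can be illustrated with a sample that has a single time stamp field plus free-form metadata. This is a sketch, not the collectd API: the `Sample` class, the `make_sample` helper, and the `"incident_time"` metadata key are hypothetical names chosen for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Sample:
    """A sample with one time stamp field and free-form metadata (hypothetical)."""
    name: str
    value: float
    timestamp: float                       # always the collection time
    meta: dict = field(default_factory=dict)


def make_sample(name, value, incident_time=None, now=None):
    """Build a sample; attach the incident time only when the subsystem provides it."""
    collected = time.time() if now is None else now
    sample = Sample(name, value, timestamp=collected)
    if incident_time is not None:
        # Scenario 1: incident time is available, so carry it in the metadata.
        sample.meta["incident_time"] = incident_time
    # Scenario 2: only the collection time is known, so the metadata stays empty.
    return sample
```

Keeping the collection time in the main field means every sample is time stamped the same way, and consumers that care about the incident time can look for it in the metadata when it is present.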

Events Requirements

Timing: 

...

In addition to the measurement result, items marked "+" should either be available for collection, or reported with the measurement result.

Information to be collected in conjunction with NFVI Metrics/Events

...