...
This wiki heavily references the ETSI NFV draft specification titled “Network Functions Virtualisation (NFV); Testing; NFVI Compute and Network Metrics Specification”, which can be found here: TST008 (https://docbox.etsi.org/ISG/NFV/Open/Drafts/TST008). Please consult the latest version, and leave a comment if this link is broken, since ETSI seems to move it frequently.
Metrics/Events Format
It's important to define a common format that can be used for the list of identified metrics and events that should be monitored/collected in the NFVI.
- + Name
- + Where the Metric/Event is collected (e.g., the measurement point, such as Host/Guest/Both)
- + Parameters (input factors or variables)
- + Scope of measurement coverage
- + Unit(s) of measure or associated severities
- Definition
- Method of Measurement
- Sources of Error
- Comments
In addition to the measurement result, items marked "+" should either be available for collection, or reported with the measurement result.
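As an illustration only, the format above could be captured in a small record type. The class name, field names, and the `report_context` helper are hypothetical identifiers chosen for this sketch; only the list of items and the "+" marking come from the format itself.

```python
from dataclasses import dataclass

# Items marked "+" in the format list; the Python field names are assumptions.
REPORTED_FIELDS = ("name", "collection_location", "parameters", "scope", "units")


@dataclass
class MetricDefinition:
    """One row of a metrics/events table, following the common format."""
    name: str
    collection_location: str      # e.g. "Host", "Guest" or "Host/Guest"
    parameters: str = ""          # input factors or variables
    scope: str = ""               # scope of measurement coverage
    units: str = ""               # unit(s) of measure or associated severities
    definition: str = ""
    method_of_measurement: str = ""
    sources_of_error: str = ""
    comments: str = ""

    def report_context(self) -> dict:
        """Return the "+" items that should accompany a measurement result."""
        return {field: getattr(self, field) for field in REPORTED_FIELDS}


cpu_idle = MetricDefinition(
    name="cpu_idle",
    collection_location="Host",
    scope="The host CPUs, individually or total usage summed across all CPUs",
    units="nanoseconds or percentage",
    definition="Time the host CPU spends idle.",
)
```

Calling `cpu_idle.report_context()` yields only the "+" items, which is the part that would be available for collection or reported alongside each result.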
Distinction between metrics and events
...
Each monitoring process in a deployment should support the following events:
Name | Collection location | Parameters | Scope of coverage | Unit(s) of measure | Definition | Method of Measurement | Sources of Error | Comments |
---|---|---|---|---|---|---|---|---|
Heartbeat/ping | Host/Guest (where the monitoring process is running) | ping frequency and size of packet | liveliness check | N/A | Heartbeat/ping to check liveliness of monitoring process | external ping | false alarm for host due to network interruption |
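The heartbeat row lists network interruption as a source of false alarms. One possible way to soften that, sketched here under assumptions, is to tolerate a few missed heartbeats before declaring the monitoring process dead. The `HeartbeatMonitor` class and its `missed_threshold` parameter are hypothetical and are not part of the table above.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from one monitoring process."""
    interval_s: float            # expected heartbeat period (the "ping frequency")
    missed_threshold: int = 3    # assumption: missed intervals tolerated before alarm
    last_seen_s: float = 0.0     # timestamp of the most recent heartbeat

    def record(self, now_s: float) -> None:
        """Note that a heartbeat arrived at time now_s."""
        self.last_seen_s = now_s

    def is_alive(self, now_s: float) -> bool:
        """True while the silence is within the tolerated number of intervals."""
        silence_s = now_s - self.last_seen_s
        return silence_s <= self.interval_s * self.missed_threshold
```

A larger `missed_threshold` trades slower failure detection for fewer false alarms during brief network interruptions.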
Each monitoring process in a deployment should support the following metrics:
Name | Collection location | Parameters | Scope of coverage | Unit(s) of measure | Definition | Method of Measurement | Sources of Error | Comments |
---|---|---|---|---|---|---|---|---|
| | Host/Guest (where the monitoring process is running) | measurement frequency | The monitoring application being used | | The number of metrics currently in the write queue. | | | |
| | Host/Guest (where the monitoring process is running) | measurement frequency | The monitoring application being used | | The number of metrics dropped due to a queue length limitation. | | | |
| | Host/Guest (where the monitoring process is running) | measurement frequency | The monitoring application being used | | The number of elements in the metric cache | | | |
CPU utilization | Host/Guest (where the monitoring process is running) | measurement frequency, interrupt frequency, set of execution contexts, time of measurement | The CPUs that are being used by the monitoring application | Nanoseconds or percentage of total CPU utilization | The CPU utilization of the monitoring process | kernel interrupt to read current execution context | short-lived contexts may come and go between interrupts | see section 6 of TST008 |
Memory utilization | Host/Guest (where the monitoring process is running) | Time of measurement, total memory available, swap space configured | The memory that is being used by the monitoring application | Kibibytes | The amount of physical RAM, in kibibytes, used by the monitoring application | memory management reports current values at time of measurement | | see section 8 of TST008 |
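A minimal sketch of how a monitoring process might measure its own CPU and memory utilization is shown below. It assumes a Linux host, uses Python's `time.process_time_ns()` for consumed CPU time, and reads the `VmRSS` line of `/proc/self/status` for resident memory. The function names are hypothetical, and this samples counters rather than using the kernel-interrupt method named in the table.

```python
import time


def sample_cpu() -> tuple:
    """Return (cpu_ns, wall_ns): CPU time used by this process and a monotonic clock."""
    return time.process_time_ns(), time.monotonic_ns()


def cpu_utilization_percent(cpu_ns_start: int, cpu_ns_end: int,
                            wall_ns_start: int, wall_ns_end: int) -> float:
    """CPU time consumed over the interval, as a percentage of one CPU."""
    wall_ns = wall_ns_end - wall_ns_start
    if wall_ns <= 0:
        raise ValueError("measurement interval must be positive")
    return 100.0 * (cpu_ns_end - cpu_ns_start) / wall_ns


def rss_kib(status_text: str) -> int:
    """Extract resident set size from /proc/<pid>/status text.

    The kernel labels the value "kB", but it counts units of 1024 bytes.
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("VmRSS not found")


def read_own_rss_kib() -> int:
    """Read this process's resident memory in kibibytes (Linux only)."""
    with open("/proc/self/status") as status_file:
        return rss_kib(status_file.read())
```

Taking two `sample_cpu()` readings one measurement period apart and passing them to `cpu_utilization_percent` gives the percentage form; the raw difference of the first elements gives the nanosecond form.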
Timing Information
NFVI Other/Additional Information
...
- Machine check exceptions (System, Processor, Memory...) [TODO: Break this down further]
- DIMM corrected and uncorrected Errors
Name | Collection location | Parameters | Scope of coverage | Unit(s) of measure | Definition | Method of Measurement | Sources of Error | Comments |
---|---|---|---|---|---|---|---|---|
MCEs | Host | Memory, CPU, IO | Machine Check Exception | using mcelog | ||||
PCIe Errors | Host |
Networking
At a minimum the following events should be monitored for a Networking interface:
...
vSwitch liveliness
Name | Collection location | Parameters | Scope of coverage | Unit(s) of measure | Definition | Method of Measurement | Sources of Error | Comments |
---|---|---|---|---|---|---|---|---|
Link Status | ||||||||
vSwitch Status (liveliness) | ||||||||
Packet Processing Core Status |
Storage
Name | Collection location | Parameters | Scope of coverage | Unit(s) of measure | Definition | Method of Measurement | Sources of Error | Comments |
---|---|---|---|---|---|---|---|---|
NFVI Metrics
Compute
At a minimum the following metrics should be collected:
- CPU utilization [TODO: Break this down further]
- vCPU utilization [TODO: Break this down further]
- Memory utilization [TODO: Break this down further]
- vMemory utilization [TODO: Break this down further]
- Cache utilization
- Hits
- Misses
- Instructions per clock (IPC)
- Last level cache utilization
- Memory Bandwidth utilization
- Platform Metrics (thermals, fan-speed) [TODO: Break this down further]
Name | Collection location | Parameters | Scope of coverage | Unit(s) of measure | Definition | Method of Measurement | Sources of Error | Comments |
---|---|---|---|---|---|---|---|---|
cpu_idle | Host | | The host CPUs, individually or total usage summed across all CPUs | nanoseconds or percentage | Time the host CPU spends idle. | | | |
cpu_nice | Host | | The host CPUs, individually or total usage summed across all CPUs | nanoseconds or percentage | Time the host CPU spent running user space processes that have been niced. The priority level of a user space process can be tweaked by adjusting its niceness. | | | |
cpu_interrupt | Host | | The host CPUs, individually or total usage summed across all CPUs | nanoseconds or percentage | | | | |
cpu_softirq | Host | | The host CPUs, individually or total usage summed across all CPUs | nanoseconds or percentage | | | | |
cpu_steal | Host | | The host CPUs, individually or total usage summed across all CPUs | nanoseconds or percentage | | | | |
cpu_system | Host | | The host CPUs, individually or total usage summed across all CPUs | nanoseconds or percentage | | | | |
cpu_user | Host | | The host CPUs, individually or total usage summed across all CPUs | nanoseconds or percentage | | | | |
cpu_wait | Host | | The host CPUs, individually or total usage summed across all CPUs | nanoseconds or percentage | | | | |
total_vcpu_utilization | Host | | The host CPUs used by a guest, total usage summed across all CPUs | nanoseconds or percentage | | | | |
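On a Linux host, the cpu_* metrics in the table could be derived from the `cpu` lines of `/proc/stat`, which is an assumption about the collection method rather than something this page prescribes. The sketch below maps the first eight counters (user, nice, system, idle, iowait, irq, softirq, steal) onto the metric names above and converts the difference between two samples into percentages. The counters are in clock ticks, not nanoseconds, so only the percentage form is computed. The function names are hypothetical.

```python
# Order of the first eight counters on a "cpu" line of /proc/stat,
# mapped to the metric names used in the table.
FIELDS = (
    "cpu_user", "cpu_nice", "cpu_system", "cpu_idle",
    "cpu_wait", "cpu_interrupt", "cpu_softirq", "cpu_steal",
)


def parse_cpu_line(line: str) -> tuple:
    """Parse one 'cpu' or 'cpuN' line into (label, {metric: ticks})."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return parts[0], dict(zip(FIELDS, values))


def percentages(before: dict, after: dict) -> dict:
    """Share of each CPU state between two samples, in percent."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_total_cpu() -> dict:
    """Read the aggregate 'cpu' line (summed across all CPUs). Linux only."""
    with open("/proc/stat") as stat_file:
        return parse_cpu_line(stat_file.readline())[1]
```

The aggregate `cpu` line corresponds to "total usage summed across all CPUs"; the per-CPU `cpuN` lines can be parsed with the same function for the individual case.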
...
- Average bitrate
- Average latency
Name | Collection location | Parameters | Scope of coverage | Unit(s) of measure | Definition | Method of Measurement | Sources of Error | Comments |
---|---|---|---|---|---|---|---|---|
Total Packets received | ||||||||
Total Packets transmitted | ||||||||
Total Octets received | ||||||||
Total Octets transmitted | ||||||||
Total Error frames received | ||||||||
Total Error frames transmitted | ||||||||
Broadcast Packets | ||||||||
Multicast Packets | ||||||||
Average bitrate | ||||||||
Average latency |
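Average bitrate is not defined further in the table, so the following is only one plausible reading: the difference between two octet counter readings, converted to bits and divided by the elapsed time. The function name, the `counter_bits` parameter, and the wrap-around handling are assumptions made for this sketch.

```python
def average_bitrate_bps(octets_start: int, octets_end: int,
                        t_start_s: float, t_end_s: float,
                        counter_bits: int = 64) -> float:
    """Average bits per second between two readings of an octet counter.

    The modulo handles a single wrap of a fixed-width counter between
    the two readings; more than one wrap cannot be detected.
    """
    elapsed_s = t_end_s - t_start_s
    if elapsed_s <= 0:
        raise ValueError("end time must be after start time")
    delta_octets = (octets_end - octets_start) % (1 << counter_bits)
    return delta_octets * 8 / elapsed_s
```

For example, readings of 1000 and 126000 octets taken 10 seconds apart give an average of 100000 bits per second. The same function applies to either the received or the transmitted octet counter.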
Storage
Disk Utilization
Name | Collection location | Parameters | Scope of coverage | Unit(s) of measure | Definition | Method of Measurement | Sources of Error | Comments |
---|---|---|---|---|---|---|---|---|
The host CPUs, individually or total usage summed across all CPUs