...

This wiki heavily references the ETSI NFV draft specification titled “Network Functions Virtualisation (NFV); Testing; NFVI Compute and Network Metrics Specification” (TST008), which can be found at https://docbox.etsi.org/ISG/NFV/Open/Drafts/TST008. Please consult the latest version, and leave a comment if this link is broken; ETSI seems to move it frequently.

Metrics/Events Format

It's important to define a common format that can be used for the list of identified metrics and events that should be monitored/collected in the NFVI.

  • + Name
  • + Where the Metric/Event is collected (e.g., the measurement point, such as Host/Guest/Both)
  • + Parameters (input factors or variables)
  • + Scope of measurement coverage
  • + Unit(s) of measure or associated severities
  • Definition
  • Method of Measurement
  • Sources of Error
  • Comments

In addition to the measurement result, items marked "+" should either be available for collection, or reported with the measurement result.
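
As an illustration, the common format above could be captured as a small record type. This is only a sketch: the class name `MetricDescriptor`, the field names, and the helper `missing_required` are hypothetical and do not come from TST008. It also assumes that an empty "+" item counts as missing, which is stricter than the wording above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Items marked "+" in the list above (assumed here to be mandatory).
REQUIRED_FIELDS = ("name", "collection_location", "parameters", "scope", "units")


@dataclass
class MetricDescriptor:
    """Hypothetical record mirroring the common metric/event format."""
    name: str = ""
    collection_location: str = ""   # e.g. "Host", "Guest" or "Both"
    parameters: dict = field(default_factory=dict)
    scope: str = ""
    units: str = ""                 # unit(s) of measure or severities
    definition: Optional[str] = None
    method_of_measurement: Optional[str] = None
    sources_of_error: Optional[str] = None
    comments: Optional[str] = None


def missing_required(descriptor: MetricDescriptor) -> list:
    """Return the names of "+" items that are empty on this descriptor."""
    return [f for f in REQUIRED_FIELDS if not getattr(descriptor, f)]
```

A collector could call `missing_required` before publishing a result and refuse to report a metric whose "+" items are not filled in.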

Distinction between metrics and events

...

Each monitoring process in a deployment should support the following events:

| Name | Collection location | Parameters | Scope of coverage | Unit(s) of measure | Definition | Method of Measurement | Sources of Error | Comments |
|---|---|---|---|---|---|---|---|---|
| Heartbeat/ping | Host/Guest (where the monitoring process is running) | ping frequency and size of packet | liveliness check | N/A | Heartbeat/ping to check liveliness of monitoring process | external ping | false alarm for host due to network interruption | |
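
The false-alarm source of error listed for the heartbeat event can be reduced by tolerating a few missed heartbeats before declaring the monitoring process dead. The sketch below shows one way to do that; the function name `is_alive` and the default of three missed intervals are assumptions for illustration, not values from this wiki or TST008.

```python
def is_alive(last_heartbeat: float, now: float, interval: float,
             missed_allowed: int = 3) -> bool:
    """Report liveliness from the time of the last heartbeat seen.

    The process is treated as alive until more than `missed_allowed`
    heartbeat intervals have passed without a reply, so a single lost
    ping (e.g. a brief network interruption) does not raise an alarm.
    """
    return (now - last_heartbeat) <= interval * missed_allowed
```

All times are in seconds; `interval` corresponds to the ping frequency parameter of the event.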

 

Each monitoring process in a deployment should support the following Metrics:

| Name | Collection location | Parameters | Scope of coverage | Unit(s) of measure | Definition | Method of Measurement | Sources of Error | Comments |
|---|---|---|---|---|---|---|---|---|
| write_queue/queue_length | Host/Guest (where the monitoring process is running) | measurement frequency | The monitoring application being used | | The number of metrics currently in the write queue. | | | |
| write_dropped | Host/Guest (where the monitoring process is running) | measurement frequency | The monitoring application being used | | The number of metrics dropped due to a queue length limitation. | | | |
| cache_size | Host/Guest (where the monitoring process is running) | measurement frequency | The monitoring application being used | | The number of elements in the metric cache | | | |
| CPU utilization | Host/Guest (where the monitoring process is running) | measurement frequency, interrupt frequency, set of execution contexts, time of measurement | The CPUs that are being used by the monitoring application | Nanoseconds or percentage of total CPU utilization | The CPU utilization of the monitoring process | kernel interrupt to read current execution context | short-lived contexts may come and go between interrupts | see section 6 of TST008 |
| Memory Utilization | Host/Guest (where the monitoring process is running) | Time of measurement, total memory available, swap space configured | The memory that is being used by the monitoring application | Kibibytes | The amount of physical RAM, in kibibytes, used by the monitoring application | memory management reports current values at time of measurement | | see section 8 of TST008 |
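
To make the relationship between write_queue/queue_length and write_dropped concrete, here is a minimal sketch of a bounded write queue that counts drops. The class `WriteQueue` and its methods are hypothetical. It also assumes that the newest metric is the one dropped when the limit is reached; a real monitoring application may drop the oldest entry instead.

```python
from collections import deque


class WriteQueue:
    """Hypothetical bounded write queue exposing the two self-metrics."""

    def __init__(self, limit: int):
        self.limit = limit
        self._queue = deque()
        self.write_dropped = 0      # metrics dropped because the queue was full

    @property
    def queue_length(self) -> int:
        """Number of metrics currently waiting in the write queue."""
        return len(self._queue)

    def enqueue(self, metric) -> bool:
        """Queue a metric; count it as dropped if the limit is reached."""
        if len(self._queue) >= self.limit:
            self.write_dropped += 1
            return False
        self._queue.append(metric)
        return True

    def dequeue(self):
        """Remove and return the oldest queued metric, or None if empty."""
        return self._queue.popleft() if self._queue else None
```

Sampling `queue_length` and `write_dropped` at the measurement frequency gives the two values described in the table.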

 

Timing Information

NFVI Other/Additional Information

...

  • Machine check exceptions (System, Processor, Memory...) [TODO: Break this down further]
    • DIMM corrected and uncorrected Errors

| Name | Collection location | Parameters | Scope of coverage | Unit(s) of measure | Definition | Method of Measurement | Sources of Error | Comments |
|---|---|---|---|---|---|---|---|---|
| MCEs | Host | | Memory, CPU, IO | | Machine Check Exception | using mcelog | | |
| PCIe Errors | Host | | | | | | | |


Networking

At a minimum the following events should be monitored for a Networking interface:

...

vSwitch liveliness

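| Name | Collection location | Parameters | Scope of coverage | Unit(s) of measure | Definition | Method of Measurement | Sources of Error | Comments |
|---|---|---|---|---|---|---|---|---|
| Link Status | | | | | | | | |
| vSwitch Status (liveliness) | | | | | | | | |
| Packet Processing Core Status | | | | | | | | |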

 

Storage

| Name | Collection location | Parameters | Scope of coverage | Unit(s) of measure | Definition | Method of Measurement | Sources of Error | Comments |
|---|---|---|---|---|---|---|---|---|
| | | | | | | | | |

NFVI Metrics

Compute

At a minimum the following metrics should be collected:

  • CPU utilization [TODO: Break this down further]
  • vCPU utilization [TODO: Break this down further]
  • Memory utilization [TODO: Break this down further]
  • vMemory utilization [TODO: Break this down further]
  • Cache utilization
    • Hits
    • Misses
    • Instructions per clock (IPC)
    • Last level cache utilization
    • Memory Bandwidth utilization
  • Platform Metrics (thermals, fan-speed) [TODO: Break this down further]

| Name | Collection location | Parameters | Scope of coverage | Unit(s) of measure | Definition | Method of Measurement | Sources of Error | Comments |
|---|---|---|---|---|---|---|---|---|
| cpu_idle | Host | | The host CPUs, individually or total usage summed across all CPUs | nanoseconds or percentage | Time the host CPU spends idle | | | |
| cpu_nice | Host | | The host CPUs, individually or total usage summed across all CPUs | nanoseconds or percentage | Time the host CPU spent running user space processes that have been niced. The priority level of a user space process can be tweaked by adjusting its niceness. | | | |
| cpu_interrupt | Host | | The host CPUs, individually or total usage summed across all CPUs | nanoseconds or percentage | | | | |
| cpu_softirq | Host | | The host CPUs, individually or total usage summed across all CPUs | nanoseconds or percentage | | | | |
| cpu_steal | Host | | The host CPUs, individually or total usage summed across all CPUs | nanoseconds or percentage | | | | |
| cpu_system | Host | | The host CPUs, individually or total usage summed across all CPUs | nanoseconds or percentage | | | | |
| cpu_user | Host | | The host CPUs, individually or total usage summed across all CPUs | nanoseconds or percentage | | | | |
| cpu_wait | Host | | The host CPUs, individually or total usage summed across all CPUs | nanoseconds or percentage | | | | |
| total_vcpu_utilization | Host | | The host CPUs used by a guest, total usage summed across all CPUs | nanoseconds or percentage | | | | |
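
On a Linux host, the per-state CPU metrics above could be derived from the "cpu" lines of /proc/stat. The sketch below assumes that source, which this wiki does not mandate. It assumes the line carries at least eight counters in the order user, nice, system, idle, iowait, irq, softirq, steal, and the mapping of those counters to the metric names in the table is also an assumption. Note that /proc/stat reports clock ticks rather than nanoseconds, so the sketch only computes the percentage form. The function names are hypothetical.

```python
# Order of the first eight counters on a "cpu" line in Linux /proc/stat,
# mapped (by assumption) to the metric names used in the table above.
PROC_STAT_FIELDS = ("cpu_user", "cpu_nice", "cpu_system", "cpu_idle",
                    "cpu_wait", "cpu_interrupt", "cpu_softirq", "cpu_steal")


def parse_cpu_line(line: str) -> dict:
    """Map one /proc/stat "cpu" line to a dict of tick counters."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(PROC_STAT_FIELDS)]]
    return dict(zip(PROC_STAT_FIELDS, values))


def cpu_percentages(before: dict, after: dict) -> dict:
    """Percentage of elapsed CPU time spent in each state between samples."""
    deltas = {k: after[k] - before[k] for k in PROC_STAT_FIELDS}
    total = sum(deltas.values())
    if total == 0:
        # No ticks elapsed between the two samples.
        return {k: 0.0 for k in PROC_STAT_FIELDS}
    return {k: 100.0 * d / total for k, d in deltas.items()}
```

Passing the aggregate "cpu" line gives usage summed across all CPUs, while passing a "cpu0", "cpu1", ... line gives the individual-CPU scope from the table.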

...

  • Average bitrate
  • Average latency

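| Name | Collection location | Parameters | Scope of coverage | Unit(s) of measure | Definition | Method of Measurement | Sources of Error | Comments |
|---|---|---|---|---|---|---|---|---|
| Total Packets received | | | | | | | | |
| Total Packets transmitted | | | | | | | | |
| Total Octets received | | | | | | | | |
| Total Octets transmitted | | | | | | | | |
| Total Error frames received | | | | | | | | |
| Total Error frames transmitted | | | | | | | | |
| Broadcast Packets | | | | | | | | |
| Multicast Packets | | | | | | | | |
| Average bitrate | | | | | | | | |
| Average latency | | | | | | | | |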
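
The table leaves the definition of average bitrate open. One possible reading, sketched below, derives it from two readings of a total-octets counter taken a known time apart. The function name `average_bitrate`, the 64-bit default counter width, and the handling of a single counter wrap are assumptions for illustration.

```python
def average_bitrate(octets_start: int, octets_end: int, seconds: float,
                    counter_bits: int = 64) -> float:
    """Average bits per second between two octet-counter readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = octets_end - octets_start
    if delta < 0:
        # The counter wrapped around once between the two readings.
        delta += 2 ** counter_bits
    return delta * 8 / seconds
```

The same calculation applies to either the received or the transmitted octet counter; the measurement interval then becomes one of the parameters to report with the result.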

Storage

Disk Utilization

| Name | Collection location | Parameters | Scope of coverage | Unit(s) of measure | Definition | Method of Measurement | Sources of Error | Comments |
|---|---|---|---|---|---|---|---|---|
| | | | | | | | | |

 
