...
This list should be developed in conjunction with the Doctor (Faults) and VES projects in OPNFV.
...
Monitoring Process information:
- A Unique Process identifier.
- Heartbeat/ping to check liveness.
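As a rough illustration of the two items above, a heartbeat check can be sketched as a table keyed by the unique process identifier. The class name `HeartbeatMonitor`, its methods, and the 5-second timeout are hypothetical choices made for this sketch, not part of any OPNFV project API.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat seen for each unique process identifier.

    Hypothetical helper for illustration only; names and timeout are assumptions.
    """

    def __init__(self, timeout_s=5.0):
        self.timeout_s = timeout_s
        self.last_seen = {}  # process identifier -> timestamp of last heartbeat

    def beat(self, process_id, now=None):
        """Record a heartbeat from the process with the given identifier."""
        self.last_seen[process_id] = time.monotonic() if now is None else now

    def unresponsive(self, now=None):
        """Return identifiers whose last heartbeat is older than the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(
            pid for pid, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

Passing `now` explicitly keeps the check deterministic in tests; in a running monitor the monotonic clock would be used so that wall-clock adjustments do not trigger false alarms.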
NFVI Events
What about entire node and switch failures? In terms of service-affecting priority, host and switch failures rank highest, as they can affect the most VMs, containers, and VNFs...
...
- Machine check exceptions (System, Processor, Memory...) [TODO: Break this down further]
- DIMM corrected and uncorrected errors
Networking
At a minimum, the following events should be monitored for a network interface:
- Link Status
- Dropped Receive Packets: an increasing count could indicate the failure or service interruption of an upstream process.
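A minimal sketch of how these two events might be sampled on a Linux host, assuming the standard sysfs files `operstate` and `statistics/rx_dropped` under `/sys/class/net/<iface>`. The function names and the `sysfs_root` parameter are hypothetical, introduced only for illustration and to make the sketch testable.

```python
from pathlib import Path


def read_interface_events(iface, sysfs_root="/sys/class/net"):
    """Read link status and the cumulative dropped-receive counter from sysfs."""
    base = Path(sysfs_root) / iface
    link_status = (base / "operstate").read_text().strip()
    rx_dropped = int((base / "statistics" / "rx_dropped").read_text())
    return link_status, rx_dropped


def new_rx_drops(previous, current):
    """Return the drops seen since the previous sample.

    The counter is cumulative, so only growth between samples is of interest.
    A decrease is treated as a counter reset (an assumption of this sketch).
    """
    return current - previous if current >= previous else current
```

A collector would call `read_interface_events` periodically and raise an event when the link status changes or when `new_rx_drops` stays positive over several samples.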
Storage
NFVI Other Information
Compute
BIOS information
...
- vSwitch liveness
Storage
NFVI Metrics
Compute
At a minimum the following metrics should be collected:
- CPU utilization TODO:
...
- Break this down further]
- vCPU utilization [TODO: Break this down further]
- Memory utilization [TODO: Break this down further]
- vMemory utilization [TODO: Break this down further]
- Cache utilization
- Hits
- Misses
- Instructions per clock (IPC)
- Last level cache utilization
- Memory Bandwidth utilization
- CPU Utilization
- Memory Utilization
- vMemory Utilization [TODO]
- Cache Utilization
- Platform Metrics (thermals, fan-speed) [TODO: Break this down further]
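Several of the compute metrics listed above are ratios derived from raw counters rather than values read directly. The sketch below shows the arithmetic for cache hit ratio and instructions per clock; the function names are hypothetical, and how the raw counters are obtained (for example from hardware performance counters) is left out as an assumption.

```python
def cache_hit_ratio(hits, misses):
    """Fraction of cache accesses that were hits, in the range 0.0 to 1.0."""
    total = hits + misses
    return hits / total if total else 0.0


def instructions_per_clock(instructions, cycles):
    """IPC over a sampling interval: retired instructions divided by cycles."""
    return instructions / cycles if cycles else 0.0


def utilization_percent(busy, total):
    """Utilization as a percentage, given busy and total time (or capacity)."""
    return 100.0 * busy / total if total else 0.0
```

All three functions expect counter deltas taken over the same sampling interval, and return 0.0 for an empty interval instead of dividing by zero.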
Networking
[TODO] Add a note on the vSwitch and add vSwitch-specific metrics
...
- Average bitrate
- Average latency
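As an illustration of the two averages above, the bitrate can be derived from two samples of a cumulative byte counter, and the latency from a list of individual measurements. The function names and units (bits per second, milliseconds) are assumptions made for this sketch.

```python
def average_bitrate_bps(bytes_start, bytes_end, t_start_s, t_end_s):
    """Average bitrate in bits per second between two byte-counter samples."""
    elapsed = t_end_s - t_start_s
    if elapsed <= 0:
        raise ValueError("end time must be later than start time")
    return (bytes_end - bytes_start) * 8 / elapsed


def average_latency_ms(samples_ms):
    """Arithmetic mean of latency samples in milliseconds."""
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    return sum(samples_ms) / len(samples_ms)
```

For example, 1000 bytes transferred over 2 seconds gives `average_bitrate_bps(1000, 2000, 0.0, 2.0)`, which evaluates to 4000.0 bits per second.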
Storage
- Disk utilization
NFVI Other/Additional Information
Compute
BIOS information
Networking
Storage