Data

Failure Type	Failure parameter	Failure Event	Infrastructure Metrics	Comments
Links	Link down; link removed
VM	Deployment/start failures: failed to start*, failed to boot*. Post-deployment/start failures: shutdown, crash, hang, panic	nova-compute.log, nova-api.log, nova-scheduler.log, libvirt.log, qemu/$vm.log, neutron-server.log; glance/cinder - flavor; node and core mapping	CPU: per-core utilization; memory; interface statistics (sent, recv, drops); disk read/write	If possible, collect infrastructure metrics and syslogs from within the VM. Deployment/start failures can be addressed as the first step.
Container	Deployment/start failures: failed to start*, failed to boot*. Post-deployment/start failures: shutdown, crash, hang, panic		CPU: per-core utilization; memory; interface statistics (sent, recv, drops); disk read/write
Node	A node failure (hardware failure, OS crash, etc.). C) Fabric component failure: N/A, assuming a redundant/highly available configuration (ZK, DB, RPC). D) Failure of other OpenStack services: N/A, assuming a redundant/highly available configuration (Glance, Keystone)	A) Node network connectivity failure (management network, VM communication network, storage network). B) nova service failure (e.g., process crashed), detected and restarted by a local watchdog process (compute, volume, network, scheduler, api)
Application	Crash/connectivity/non-functional	Application log, e.g. the Apache logs if the application is Apache	Packet drops, latency, throughput, saturation, resource usage	Deploy collectd within the application and collect both application logs and infrastructure metrics
Middleware Services
