Data

Failure Type

Failure parameter

Failure Event

Infrastructure Metrics

Comments

Links

Link down

Link removed

Virtual Switch link failure

Reason: Hardware Failure

Interface Down

`dhcp-agent.log`	`neutron-dhcp-agent`
`l3-agent.log`	`neutron-l3-agent`
`linuxbridge-agent.log`	`neutron-linuxbridge-agent`
`openvswitch-agent.log`	`neutron-openvswitch-agent`

(Ref: https://docs.openstack.org/ocata/config-reference/networking/logs.html)

Network interface status

High packet drop

Low throughput

Excessive latency or jitter

crc-statistics, fabric-link-failure, link-flap, transceiver-power-low
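One way to turn raw interface counters into the "high packet drop" and "low throughput" signals above is to sample them twice and compute rates. The Python below is a minimal sketch that assumes a Linux host or guest exposing the standard `/proc/net/dev` layout. The names `IfCounters`, `parse_proc_net_dev` and `link_health` are illustrative, and the thresholds for "high" and "low" are left open.

```python
from dataclasses import dataclass


@dataclass
class IfCounters:
    """Cumulative counters for one interface, as reported in /proc/net/dev."""
    rx_bytes: int
    rx_packets: int
    rx_drop: int
    tx_bytes: int
    tx_packets: int
    tx_drop: int


def parse_proc_net_dev(text: str) -> dict[str, IfCounters]:
    """Parse the contents of /proc/net/dev into {interface: IfCounters}."""
    counters: dict[str, IfCounters] = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        f = [int(x) for x in data.split()]
        # Receive:  bytes packets errs drop fifo frame compressed multicast (f[0]..f[7])
        # Transmit: bytes packets errs drop fifo colls carrier compressed   (f[8]..f[15])
        counters[name.strip()] = IfCounters(
            rx_bytes=f[0], rx_packets=f[1], rx_drop=f[3],
            tx_bytes=f[8], tx_packets=f[9], tx_drop=f[11],
        )
    return counters


def link_health(before: IfCounters, after: IfCounters, interval_s: float) -> dict[str, float]:
    """Per-second throughput (bits/s), packet rate and drop rate between two snapshots.

    Assumes the counters did not wrap or reset between the two snapshots.
    """
    def rate(a: int, b: int) -> float:
        return (b - a) / interval_s

    return {
        "rx_bps": 8 * rate(before.rx_bytes, after.rx_bytes),
        "tx_bps": 8 * rate(before.tx_bytes, after.tx_bytes),
        "rx_pps": rate(before.rx_packets, after.rx_packets),
        "tx_pps": rate(before.tx_packets, after.tx_packets),
        "rx_drops_per_s": rate(before.rx_drop, after.rx_drop),
        "tx_drops_per_s": rate(before.tx_drop, after.tx_drop),
    }
```

In practice you would read `/proc/net/dev` twice, a few seconds apart, and pass the two `IfCounters` for the same interface to `link_health`.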

VM

Deployment/Start Failures:

Failed to start*
Failed to boot*

Post-Deployment/Start failures:

Shutdown
Crash
Hang*
Panic

`nova-compute.log`

`nova-api.log`

`nova-scheduler.log`

`libvirt.log`

`qemu/$vm.log`

`neutron-server.log`

glance/cinder -

flavor

Node and core mapping

CPU: per-core utilization

Memory

Interface statistics: sent, received, dropped

Disk read/write

If possible, infrastructure metrics and syslogs should also be collected from within the VM.

Deployment/start failures can be taken up as the first step.
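The per-core CPU utilization metric can be derived from two snapshots of `/proc/stat` taken inside a Linux VM. The sketch below makes these assumptions:

* Busy time is everything except `idle` and `iowait`.
* The `guest` columns are left out of the total, because the kernel already counts them in `user` and `nice`.

The helper names are hypothetical.

```python
def parse_proc_stat(text: str) -> dict[str, tuple[int, int]]:
    """Return {"cpuN": (busy_jiffies, total_jiffies)} for each per-core line of /proc/stat."""
    cores: dict[str, tuple[int, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only per-core lines ("cpu0", "cpu1", ...); skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user nice system idle iowait irq softirq steal; guest/guest_nice are
        # already included in user/nice, so they are left out of the total.
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + values[4]  # idle + iowait
        total = sum(values)
        cores[fields[0]] = (total - idle, total)
    return cores


def per_core_utilization(
    before: dict[str, tuple[int, int]], after: dict[str, tuple[int, int]]
) -> dict[str, float]:
    """Percentage of non-idle time per core between two /proc/stat snapshots."""
    util: dict[str, float] = {}
    for core, (busy0, total0) in before.items():
        if core not in after:
            continue  # core went offline between snapshots
        busy1, total1 = after[core]
        elapsed = total1 - total0
        util[core] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return util
```

Sampling at a fixed interval gives a per-core time series that can be fed into the failure-data set alongside the VM logs.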

Container

Deployment/Start Failures:

Failed to start*
Failed to boot*

Post-Deployment/Start failures:

Shutdown
Crash
Hang
Panic

OS layer: syslog, boot.log, kern.log, etc.
Kubernetes layer: container logs (/var/log/containers)
OpenStack layer: OpenStack service logs

CPU: per-core utilization

Memory

Interface statistics: sent, received, dropped

Disk read/write
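Disk read/write activity for a container host or VM can be derived from `/proc/diskstats` in the same two-snapshot way. This sketch assumes the standard Linux layout and reads four columns:

* reads completed (column 4);
* sectors read (column 6);
* writes completed (column 8);
* sectors written (column 10).

Sectors in this file are 512-byte units. The function names are illustrative.

```python
SECTOR_BYTES = 512  # /proc/diskstats reports sectors in 512-byte units


def parse_diskstats(text: str) -> dict[str, tuple[int, int, int, int]]:
    """Return {device: (reads_completed, bytes_read, writes_completed, bytes_written)}."""
    stats: dict[str, tuple[int, int, int, int]] = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 11:
            continue  # not a device line
        stats[f[2]] = (
            int(f[3]),                 # reads completed
            int(f[5]) * SECTOR_BYTES,  # sectors read -> bytes
            int(f[7]),                 # writes completed
            int(f[9]) * SECTOR_BYTES,  # sectors written -> bytes
        )
    return stats


def disk_rates(
    before: tuple[int, int, int, int], after: tuple[int, int, int, int], interval_s: float
) -> dict[str, float]:
    """Read/write IOPS and bytes per second between two snapshots of one device."""
    r0, rb0, w0, wb0 = before
    r1, rb1, w1, wb1 = after
    return {
        "read_iops": (r1 - r0) / interval_s,
        "read_bytes_per_s": (rb1 - rb0) / interval_s,
        "write_iops": (w1 - w0) / interval_s,
        "write_bytes_per_s": (wb1 - wb0) / interval_s,
    }
```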

Node

A node failure (hardware failure, OS crash, etc.)

A) Node network connectivity failure

B) Nova service failure

C) Failure of other OpenStack services

/var/log/nova/nova-compute.log
(Check that nova-compute has successfully connected to the AMQP server.
Ref: https://docs.openstack.org/operations-guide/ops-maintenance-compute.html)
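The referenced Operations Guide performs the AMQP check with `grep AMQP` over `nova-compute.log`. The Python below is a minimal equivalent that works on any iterable of log lines and treats the most recent AMQP-related line as the current state. It assumes a successful connection is logged with the phrase "Connected to AMQP server", as in the guide's example output. Message wording can differ between OpenStack releases, so the match string is an assumption.

```python
from typing import Iterable


def amqp_lines(lines: Iterable[str]) -> list[str]:
    """Python equivalent of `grep AMQP nova-compute.log`."""
    return [line.rstrip("\n") for line in lines if "AMQP" in line]


def amqp_connected(lines: Iterable[str]) -> bool:
    """True if the most recent AMQP-related line reports a successful connection.

    Assumes the success message contains "Connected to AMQP server"; the exact
    wording may differ between OpenStack releases.
    """
    matches = amqp_lines(lines)
    return bool(matches) and "connected to amqp server" in matches[-1].lower()
```

Usage: `with open("/var/log/nova/nova-compute.log") as log: print(amqp_connected(log))`.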

Node type	Service	Log location
Cloud controller	`nova-*`	`/var/log/nova`
Cloud controller	`glance-*`	`/var/log/glance`
Cloud controller	`cinder-*`	`/var/log/cinder`
Cloud controller	`keystone-*`	`/var/log/keystone`
Cloud controller	`neutron-*`	`/var/log/neutron`
Cloud controller	horizon	`/var/log/apache2/`
All nodes	misc (swift, dnsmasq)	`/var/log/syslog`
Compute nodes	libvirt	`/var/log/libvirt/libvirtd.log`
Compute nodes	Console (boot-up messages) for VM instances	`/var/lib/nova/instances/instance-<instance id>/console.log`
Block Storage nodes	cinder-volume	`/var/log/cinder/cinder-volume.log`

(Ref: https://docs.openstack.org/operations-guide/ops-logging.html)
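The table above can be turned into a per-node collection list. A node's log set is the union of the entries for every role it plays, plus the "All nodes" row. In the sketch below, the dictionary transcribes the table, `logs_for_roles` is a hypothetical helper, and `<instance id>` remains a placeholder to substitute.

```python
# Log locations per node type, transcribed from the table above.
LOG_LOCATIONS: dict[str, dict[str, str]] = {
    "Cloud controller": {
        "nova-*": "/var/log/nova",
        "glance-*": "/var/log/glance",
        "cinder-*": "/var/log/cinder",
        "keystone-*": "/var/log/keystone",
        "neutron-*": "/var/log/neutron",
        "horizon": "/var/log/apache2/",
    },
    "All nodes": {
        "misc (swift, dnsmasq)": "/var/log/syslog",
    },
    "Compute nodes": {
        "libvirt": "/var/log/libvirt/libvirtd.log",
        "console": "/var/lib/nova/instances/instance-<instance id>/console.log",
    },
    "Block Storage nodes": {
        "cinder-volume": "/var/log/cinder/cinder-volume.log",
    },
}


def logs_for_roles(roles: list[str]) -> list[str]:
    """Return the sorted, de-duplicated log paths to collect for a node with the given roles."""
    paths = set(LOG_LOCATIONS["All nodes"].values())  # every node contributes syslog
    for role in roles:
        paths.update(LOG_LOCATIONS[role].values())  # raises KeyError for an unknown role
    return sorted(paths)
```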

A) Node network connectivity failure:

Management network
VM communication network
Storage network

B) Nova service failure (e.g., a crashed process), detected and restarted by a local watchdog process:

compute
volume
network
scheduler
api

C) Failure of other OpenStack services: N/A, assuming a redundant/highly available configuration

Glance
Keystone
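For item A (node network connectivity failure), one simple signal is whether the node can open a TCP connection to a known endpoint on each of its networks. The sketch below makes several assumptions:

* The probe targets in `PROBES` are entirely hypothetical. They use RFC 5737 documentation addresses and example ports, and must be replaced with real endpoints.
* A failed probe only says the endpoint was unreachable, not which layer failed.
* The function names are illustrative.

```python
import socket

# Hypothetical probe targets (RFC 5737 documentation addresses): one endpoint
# per network that is expected to accept TCP connections. Replace with real ones.
PROBES: dict[str, tuple[str, int]] = {
    "management": ("192.0.2.10", 5672),  # e.g. the AMQP broker
    "vm": ("198.51.100.10", 22),         # e.g. a peer on the VM network
    "storage": ("203.0.113.10", 3260),   # e.g. an iSCSI target
}


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False


def check_networks(probes: dict[str, tuple[str, int]], timeout: float = 2.0) -> dict[str, bool]:
    """Probe each network's endpoint; a False entry suggests a connectivity failure."""
    return {net: tcp_reachable(host, port, timeout) for net, (host, port) in probes.items()}
```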

Interface statistics: sent, received, dropped

Hypervisor Metrics, Nova Server Metrics, Tenant Metrics, Message Queue Metrics

Keystone and Glance Metrics

Application

Crash/Connectivity/Non-Functional

Application logs (e.g., for an Apache-based application, the Apache logs)

Packet Drops, Latency, Throughput, Saturation, Resource Usage

Deploy collectd within the application environment to collect both application logs and infrastructure metrics.
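The latency metric listed for applications, like the "excessive latency or jitter" symptom listed for links, needs a summary of raw latency samples. This is a minimal sketch, assuming latency samples in milliseconds are already being collected. It reports nearest-rank p50/p99, the mean, and jitter. Jitter here is the mean absolute difference between consecutive samples, which is one common simplification and not the smoothed estimator of RFC 3550. The function names are illustrative.

```python
import math
import statistics


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (p in 0..100) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def jitter(samples: list[float]) -> float:
    """Mean absolute difference between consecutive latency samples."""
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    return statistics.fmean(diffs) if diffs else 0.0


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Summarize a window of latency samples (milliseconds)."""
    return {
        "p50_ms": percentile(samples_ms, 50),
        "p99_ms": percentile(samples_ms, 99),
        "mean_ms": statistics.fmean(samples_ms),
        "jitter_ms": jitter(samples_ms),
    }
```

Tracking these per window makes a sudden rise in p99 or jitter visible even when the mean barely moves.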

Middleware Services

Models

We have taken three types of models. Within these models we have considered the failure prediction problem; the remaining types are as follows:

...

