Statistics
Statistics in collectd consist of a value list. A value list includes:
Value list | Description | Example | Comment
---|---|---|---
Values | | 99.8999 | percentage
Value length | The number of values in the data set. | |
Time | Timestamp at which the value was collected. | 1475837857 | epoch
Interval | Interval at which to expect a new value. | 10 | interval
Host | Used to identify the host. | localhost | Can be a UUID for a VM or host, or a name given to the host.
Plugin | Used to identify the plugin. | cpu |
Plugin instance (optional) | Used to group a set of values together, e.g. values belonging to a DPDK interface. | 0 |
Type | Unit used to measure a value. In other words, used to refer to a data set. | percent |
Type instance (optional) | Used to distinguish between values that have an identical type. | user |
Meta data | An opaque data structure that enables the passing of additional information about a value list. "Meta data in the global cache can be used to store arbitrary information about an identifier." | |
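To make the fields above concrete, here is a minimal sketch of a value list and the identifier that collectd conventionally derives from its naming fields (`host/plugin[-plugin_instance]/type[-type_instance]`). The `ValueList` class and its `identifier()` method are hypothetical names used for illustration only; they are not part of collectd's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ValueList:
    """Hypothetical stand-in for a collectd value list; not collectd's own API."""
    values: List[float]
    time: float          # epoch timestamp at which the values were collected
    interval: float      # interval at which to expect a new value
    host: str
    plugin: str
    type: str
    plugin_instance: Optional[str] = None
    type_instance: Optional[str] = None
    meta: Dict[str, object] = field(default_factory=dict)

    def identifier(self) -> str:
        """Join the naming fields as host/plugin[-instance]/type[-instance]."""
        plugin = self.plugin
        if self.plugin_instance:
            plugin += "-" + self.plugin_instance
        type_ = self.type
        if self.type_instance:
            type_ += "-" + self.type_instance
        return f"{self.host}/{plugin}/{type_}"


# Values taken from the Example column of the table above.
example = ValueList(
    values=[99.8999],
    time=1475837857,
    interval=10,
    host="localhost",
    plugin="cpu",
    plugin_instance="0",
    type="percent",
    type_instance="user",
)
print(example.identifier())
```

The optional instance fields are simply omitted from the identifier when they are not set, which is why both are modelled as `Optional` here.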
Notifications
Notifications in collectd are generic messages containing:
* An associated severity, which can be one of OKAY, WARNING, or FAILURE.
* A time.
* A message.
* A host.
* A plugin.
* A plugin instance (optional).
* A type.
* A type instance (optional).
* Meta-data.
Example notification:
Severity:FAILURE
Time:1472552207.385
Host:pod3-node1
Plugin:dpdkevents
PluginInstance:dpdk0
Type:gauge
TypeInstance:link_status
DataSource:value
CurrentValue:1.000000e+00
WarningMin:nan
WarningMax:nan
FailureMin:2.000000e+00
FailureMax:nan
Host pod3-node1, plugin dpdkevents (instance dpdk0) type gauge (instance link_status): Data source "value" is currently 1.000000. That is below the failure threshold of 2.000000.
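The example notification can be read mechanically. The sketch below parses the `Key:Value` header lines and re-derives the severity from the threshold fields, assuming that a `nan` threshold means "not configured" and that failure bounds take precedence over warning bounds. The `classify` helper is a hypothetical illustration, not collectd's threshold code.

```python
import math


def classify(value, warning_min=math.nan, warning_max=math.nan,
             failure_min=math.nan, failure_max=math.nan):
    """Return a severity for value; nan bounds are treated as unset.

    Any comparison against nan is False, so unset bounds never trigger.
    """
    if value < failure_min or value > failure_max:
        return "FAILURE"
    if value < warning_min or value > warning_max:
        return "WARNING"
    return "OKAY"


# Header lines copied from the example notification above.
notification_text = """\
Severity:FAILURE
Time:1472552207.385
Host:pod3-node1
Plugin:dpdkevents
PluginInstance:dpdk0
Type:gauge
TypeInstance:link_status
DataSource:value
CurrentValue:1.000000e+00
WarningMin:nan
WarningMax:nan
FailureMin:2.000000e+00
FailureMax:nan
"""

# Split each line on the first colon only.
fields = dict(line.split(":", 1) for line in notification_text.splitlines())

severity = classify(
    float(fields["CurrentValue"]),
    warning_min=float(fields["WarningMin"]),
    warning_max=float(fields["WarningMax"]),
    failure_min=float(fields["FailureMin"]),
    failure_max=float(fields["FailureMax"]),
)
print(severity, "reported:", fields["Severity"])
```

Here the current value 1.0 is below `FailureMin` (2.0), which agrees with the FAILURE severity and the message text in the example.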
Supported Metrics and Events
Dynamic Metrics
Reference starting point: https://github.com/collectd/collectd/blob/master/src/types.db
The table below maps the "base" plugins that would run on the host or the guest to their types and type instances.
Where collectd is running | Plugin | Type | Type Instance | Description | comment |
---|---|---|---|---|---|
Host/guest | CPU | percent/nanoseconds | idle | Time CPU spends idle. | Can be per cpu/aggregate across all the cpus. For more info, please see: http://man7.org/linux/man-pages/man1/top.1.html http://blog.scoutapp.com/articles/2015/02/24/understanding-linuxs-cpu-stats |
percent/nanoseconds | nice | Time the CPU spent running user space processes that have been niced. The priority level of a user space process can be tweaked by adjusting its niceness. | |||
percent/nanoseconds | interrupt | Time the CPU has spent servicing interrupts. | |||
percent/nanoseconds | softirq | Time spent handling interrupts that are synthesized in software, and almost as important as hardware interrupts (above). "In current kernels there are ten softirq vectors defined; two for tasklet processing, two for networking, two for the block layer, two for timers, and one each for the scheduler and read-copy-update processing. The kernel maintains a per-CPU bitmask indicating which softirqs need processing at any given time." [Ref] | |||
percent/nanoseconds | steal | CPU steal is a measure of the fraction of time that a machine is in a state of “involuntary wait.” It is time for which the kernel cannot otherwise account in one of the traditional classifications like user, system, or idle. It is time that went missing, from the perspective of the kernel. http://www.stackdriver.com/understanding-cpu-steal-experiment/ | |||
percent/nanoseconds | system | Time that the CPU spent running the kernel. | |||
percent/nanoseconds | user | Time CPU spends running un-niced user space processes. | |||
percent/nanoseconds | wait | The time the CPU spends idle while waiting for an I/O operation to complete | |||
Interface | if_dropped | in | The total number of received dropped packets. | http://www.onlamp.com/pub/a/linux/2000/11/16/LinuxAdmin.html | |
if_errors | in | The total number of received error packets. | |||
if_octets | in | The total number of received bytes. | |||
if_packets | in | The total number of received packets. | |||
if_dropped | out | The total number of transmit packets dropped | |||
if_errors | out | The total number of transmit error packets. (This is the total of error conditions encountered when attempting to transmit a packet. The code here explains the possibilities, but this code is no longer present in /net/core/dev.c master at present - it appears to have moved to /net/core/net-procfs.c.) | |||
if_octets | out | The total number of bytes transmitted | |||
if_packets | out | The total number of transmitted packets | |||
Memory | memory | buffered | The amount, in kibibytes, of temporary storage for raw disk blocks. | https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/s2-proc-meminfo.html | |
memory | cached | The amount of physical RAM, in kibibytes, used as cache memory. | |||
memory | free | The amount of physical RAM, in kibibytes, left unused by the system. | |||
memory | slab_recl | The part of Slab that can be reclaimed, such as caches. | Slab: the total amount of memory, in kibibytes, used by the kernel to cache data structures for its own use. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/s2-proc-meminfo.html | ||
memory | slab_unrecl | The part of Slab that cannot be reclaimed even when lacking memory | |||
memory | used | mem_used = mem_total - (mem_free + mem_buffered + mem_cached + mem_slab_total); | https://github.com/collectd/collectd/blob/master/src/memory.c#L349 | ||
disk | disk_io_time | io_time | time spent doing I/Os (ms). You can treat this metric as a device load percentage (Value of 1 sec time spent matches 100% of load). | https://collectd.org/wiki/index.php/Plugin:Disk http://lxr.free-electrons.com/source/include/uapi/linux/if_link.h#L43 | |
disk_io_time | weighted_io_time | measure of both I/O completion time and the backlog that may be accumulating. | |||
disk_merged | read | the number of read operations that could be merged into other, already queued operations, i.e. one physical disk access served two or more logical operations. Of course, the higher that number, the better. | |||
disk_merged | write | the number of write operations that could be merged into other, already queued operations, i.e. one physical disk access served two or more logical operations. Of course, the higher that number, the better. | |||
disk_octets | read | the number of octets read from a disk or partition | |||
disk_octets | write | the number of octets written to a disk or partition | |||
disk_ops | read | the number of read operations issued to the disk | |||
disk_ops | write | the number of write operations issued to the disk | |||
disk_time | read | the average time a read I/O operation took to complete. Note from collectd: since this is a little messy to calculate, take the actual values with a grain of salt. | |||
disk_time | write | the average time a write I/O operation took to complete. Note from collectd: since this is a little messy to calculate, take the actual values with a grain of salt. | |||
pending_operations | shows queue size of pending I/O operations. | ||||
Ping | ping | Network latency is measured as a round-trip time in milliseconds. An ICMP “echo request” is sent to a host and the time needed for its echo-reply to arrive is measured. | Latency | ||
ping_droprate | droprate = ((double) (pkg_sent - pkg_recv)) / ((double) pkg_sent); | https://github.com/collectd/collectd/blob/master/src/ping.c#L703 | |||
ping_stddev | if pkg_recv > 1 latency_stddev = sqrt (((((double) pkg_recv) * latency_squared) - (latency_total * latency_total)) / ((double) (pkg_recv * (pkg_recv - 1)))); | https://github.com/collectd/collectd/blob/master/src/ping.c#L698 pkg_recv = # of echo-reply messages received latency_squared = latency * latency (for a received echo-reply message) latency_total = the total latency for received echo-reply messages | |||
load | load | shortterm | Load average: the number of jobs in the run queue (state R) or waiting for disk I/O (state D), averaged over 1 minute. Reflects CPU and I/O utilization over the last 1 minute, read from /proc/loadavg. | http://man7.org/linux/man-pages/man5/proc.5.html https://github.com/collectd/collectd/blob/master/src/load.c | |
load | midterm | Load average: the number of jobs in the run queue (state R) or waiting for disk I/O (state D), averaged over 5 minutes. Reflects CPU and I/O utilization over the last 5 minutes, read from /proc/loadavg. | |||
load | longterm | Load average: the number of jobs in the run queue (state R) or waiting for disk I/O (state D), averaged over 15 minutes. Reflects CPU and I/O utilization over the last 15 minutes, read from /proc/loadavg. | |||
OVS events | gauge | link_status | Link status of the OvS interface: UP or DOWN | ||
OVS Stats | collisions | Number of collisions. | per interface | ||
rx_bytes | Number of received bytes. | http://openvswitch.org/ovs-vswitchd.conf.db.5.pdf | |||
rx_crc_err | Number of CRC errors. | ||||
rx_dropped | Number of packets dropped by RX. | ||||
rx_errors | Total number of receive errors, greater than or equal to the sum of the RX errors above. | ||||
rx_frame_err | Number of frame alignment errors. | ||||
rx_over_err | Number of packets with RX overrun. | ||||
rx_packets | Number of received packets | ||||
tx_bytes | Number of transmitted bytes | ||||
tx_dropped | Number of packets dropped by TX | ||||
tx_errors | Total number of transmit errors, greater than or equal to the sum of the TX errors above. | ||||
tx_packets | Number of transmitted packets | ||||
Hugepages | bytes | used | Size of used hugepages in bytes | total/pernode/both | |
bytes | free | Size of free hugepages in bytes | |||
vmpage_number | used | Number of used hugepages (page count) | |||
vmpage_number | free | Number of free hugepages (page count) | |||
percent | used | Percentage of hugepages that are used | |||
percent | free | Percentage of hugepages that are free | |||
processes | fork_rate | the number of threads created since the last reboot | The information comes mainly from /proc/PID/status, /proc/PID/psinfo and /proc/PID/usage. https://collectd.org/wiki/index.php/Plugin:Processes http://man7.org/linux/man-pages/man5/proc.5.html | ||
ps_state | blocked | the number of processes in a blocked state | |||
ps_state | paging | the number of processes in a paging state | |||
ps_state | running | the number of processes in a running state | |||
ps_state | sleeping | the number of processes in a sleeping state | |||
ps_state | stopped | the number of processes in a stopped state | |||
ps_state | zombies | the number of processes in a Zombie state | |||
Host only | Libvirt | disk_octets | read | number of read bytes as unsigned long long. | |
disk_octets | write | number of written bytes as unsigned long long | |||
disk_ops | read | number of read requests | |||
disk_ops | write | number of write requests | |||
if_dropped | in | receive packets dropped as unsigned long long | https://libvirt.org/html/libvirt-libvirt-domain.html | ||
if_dropped | out | transmit packets dropped as unsigned long long | |||
if_errors | in | receive errors as unsigned long long | |||
if_errors | out | transmission errors as unsigned long long. | |||
if_octets | in | bytes received as unsigned long long | |||
if_octets | out | bytes transmitted as unsigned long long | |||
if_packets | in | packets received as unsigned long long | |||
if_packets | out | packets transmitted as unsigned long long | |||
memory | actual_balloon | Current balloon value (in kB). | https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMemoryStatStruct | ||
memory | rss | Resident Set Size of the process running the domain. This value is in kB. | |||
memory | swap_in | The total amount of data read from swap space (in kB). | |||
memory | total | the memory in KBytes used by the domain | |||
virt_cpu_total | the CPU time used in nanoseconds | Value is in nanoseconds. | |||
virt_vcpu | vcpu_nr | the CPU time used in nanoseconds per cpu | Value is in nanoseconds. | ||
cpu_affinity | vcpu_NR-cpu_NR | pinning of domain VCPUs to host physical CPUs (Value stored is a boolean) | |||
domain_state | Domain state and reason | ||||
file_system | File system information (mountpoint, device name, filesystem type, number of aliases, disk aliases) Dispatched as notification. Requires guest agent to be installed and configured. | ||||
job_stats | Information about progress of a background/completed job on a domain. Check API documentation for more information. (https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainGetJobStats) | ||||
disk_error | DISK_NAME | Disk error code (Metric isn’t dispatched for disk with no errors) | |||
perf | perf_cmt | usage of l3 cache in bytes by applications running on the platform | |||
perf | perf_mbmt | total system bandwidth from one level of cache | |||
perf | perf_mbml | bandwidth of memory traffic for a memory controller | |||
perf | perf_cpu_cycles | the count of cpu cycles (total/elapsed) | |||
perf | perf_instructions | the count of instructions by applications running on the platform | |||
perf | perf_cache_references | the count of cache hits by applications running on the platform | |||
perf | perf_cache_misses | the count of cache misses by applications running on the platform | |||
RDT | ipc | Number of instructions per clock per core group | per core group | ||
memory_bandwidth | local | Local Memory Bandwidth utilization | |||
memory_bandwidth | remote | Remote Memory Bandwidth utilization | |||
bytes | llc | Last Level Cache occupancy | |||
dpdkstats compatible with DPDK 16.04 (based on ixgbe, vhost support will be enabled in DPDK 16.11, patch support being upgraded to DPDK 16.07 in progress) | derive | rx_l3_l4_xsum_error | Number of receive IPv4, TCP, UDP or SCTP XSUM errors. | ||
errors | flow_director_filter_add_errors | Number of failed added filters | |||
flow_director_filter_remove_errors | Number of failed removed filters | ||||
mac_local_errors | Number of faults in the local MAC. | ||||
mac_remote_errors | Number of faults in the remote MAC. | ||||
if_rx_dropped | rx_fcoe_dropped | Number of Rx packets dropped due to lack of descriptors. | |||
rx_mac_short_packet_dropped | Number of MAC short packet discard packets received. | ||||
rx_management_dropped | Number of management packets dropped. This register counts the total number of packets received that pass the management filters and then are dropped because the management receive FIFO is full. Management packets include any packet directed to the manageability console (such as RMCP and ARP packets). | ||||
rx_priorityX_dropped | Number of dropped packets received per UP | where X is 0 to 7 | |||
if_rx_errors | rx_crc_errors | Counts the number of receive packets with CRC errors. In order for a packet to be counted in this register, it must be 64 bytes or greater (from <Destination Address> through <CRC>, inclusively) in length. | |||
rx_errors | Number of errors received | ||||
rx_fcoe_crc_errors | FC CRC Count. | ||||
rx_fcoe_mbuf_allocation_errors | Number of fcoe Rx packets dropped due to lack of descriptors. | ||||
rx_fcoe_no_direct_data_placement | |||||
rx_fcoe_no_direct_data_placement_ext_buff | |||||
rx_fragment_errors | Number of receive fragment errors (frame shorter than 64 bytes from <Destination Address> through <CRC>, inclusively) that have bad CRC (this is slightly different from the Receive Undersize Count register). | ||||
rx_illegal_byte_errors | Counts the number of receive packets with illegal bytes errors (such as there is an illegal symbol in the packet). | ||||
rx_jabber_errors | Number of receive jabber errors. This register counts the number of received packets that are greater than maximum size and have bad CRC (this is slightly different from the Receive Oversize Count register). The packets length is counted from <Destination Address> through <CRC>, inclusively. | ||||
rx_length_errors | Number of packets with receive length errors. A length error occurs if an incoming packet length field in the MAC header doesn't match the packet length. | ||||
rx_mbuf_allocation_errors | Number of Rx packets dropped due to lack of descriptors. | ||||
rx_oversize_errors | Receive Oversize Error. This register counts the number of received frames that are longer than maximum size as defined by MAXFRS.MFS (from <Destination Address> through <CRC>, inclusively) and have valid CRC. | ||||
rx_priorityX_mbuf_allocation_errors | Number of received packets per UP dropped due to lack of descriptors. | where X is 0 to 7 | |||
rx_q0_errors | Number of errors received for the queue. | if more queues are allocated then you get the errors per Queue | |||
rx_undersize_errors | Receive Undersize Error. This register counts the number of received frames that are shorter than minimum size (64 bytes from <Destination Address> through <CRC>, inclusively), and had a valid CRC. | ||||
if_rx_octets | rx_error_bytes | Counts the number of receive packets with error bytes (such as there is an error symbol in the packet). This register counts all packets received, regardless of L2 filtering and receive enablement. | bug - will move this to errors | ||
rx_fcoe_bytes | number of received fcoe bytes | ||||
rx_good_bytes | Good octets/bytes received count. This register includes bytes received in a packet from the <Destination Address> field through the <CRC> field, inclusively. | ||||
rx_q0_bytes | Number of bytes received for the queue. | per queue | |||
rx_total_bytes | Total received octets. This register includes bytes received in a packet from the <Destination Address> field through the <CRC> field, inclusively. | ||||
if_rx_packets | rx_broadcast_packets | Number of good (non-erred) broadcast packets received. | |||
rx_fcoe_packets | Number of FCoE packets posted to the host. In normal operation (no save bad frames) it equals to the number of good packets. | ||||
rx_flow_control_xoff_packets | Number of XOFF packets received. This register counts any XOFF packet whether it is a legacy XOFF or a priority XOFF. Each XOFF packet is counted once even if it is designated to a few priorities. | ||||
rx_flow_control_xon_packets | Number of XON packets received. This register counts any XON packet whether it is a legacy XON or a priority XON. Each XON packet is counted once even if it is designated to a few priorities. | ||||
rx_good_packets | Number of good (non-erred) Rx packets (from the network). | ||||
rx_management_packets | Number of management packets received. This register counts the total number of packets received that pass the management filters. Management packets include RMCP and ARP packets. Any packets with errors are not counted, except for the packets that are dropped because the management receive FIFO is full are counted. | ||||
rx_multicast_packets | Number of good (non-erred) multicast packets received (excluding broadcast packets). This register does not count received flow control packets. | ||||
rx_priorityX_xoff_packets | Number of XOFF packets received per UP | where X is 0 to 7 | |||
rx_priorityX_xon_packets | Number of XON packets received per UP | where X is 0 to 7 | |||
rx_q0_packets | Number of packets received for the queue. | per queue | |||
rx_size_1024_to_max_packets | Number of packets received that are 1024-max bytes in length (from <Destination Address> through <CRC>, inclusively). This register does not include received flow control packets. The maximum is dependent on the current receiver configuration and the type of packet being received. If a packet is counted in receive oversized count, it is not counted in this register. Due to changes in the standard for maximum frame size for VLAN tagged frames in 802.3, packets can have a maximum length of 1522 bytes. | ||||
rx_size_128_to_255_packets | Number of packets received that are 128-255 bytes in length (from <Destination Address> through <CRC>, inclusively). | ||||
rx_size_256_to_511_packets | Number of packets received that are 256-511 bytes in length (from <Destination Address> through <CRC>, inclusively). | ||||
rx_size_512_to_1023_packets | Number of packets received that are 512-1023 bytes in length (from <Destination Address> through <CRC>, inclusively). | ||||
rx_size_64_packets | Number of good packets received that are 64 bytes in length (from <Destination Address> through <CRC>, inclusively). | ||||
rx_size_65_to_127_packets | Number of packets received that are 65-127 bytes in length (from <Destination Address> through <CRC>, inclusively) | ||||
rx_total_missed_packets | The total number of Rx missed packets, that is, packets that were correctly received by the NIC but had to be dropped by the NIC itself because it was out of descriptors and internal memory. | ||||
rx_total_packets | Number of all packets received. This register counts the total number of all packets received. All packets received are counted in this register, regardless of their length, whether they are erred, but excluding flow control packets. | ||||
rx_xoff_packets | Number of XOFF packets received. Sticks to 0xFFFF. XOFF packets can use the global address or the station address. This register counts any XOFF packet whether it is a legacy XOFF or a priority XOFF. Each XOFF packet is counted once even if it is designated to a few priorities. If a priority FC packet contains both XOFF and XON, only this counter is incremented. | ||||
rx_xon_packets | Number of XON packets received. XON packets can use the global address, or the station address. This register counts any XON packet whether it is a legacy XON or a priority XON. Each XON packet is counted once even if it is designated to a few priorities. If a priority FC packet contains both XOFF and XON, only the LXOFFRXCNT counter is incremented. | ||||
if_tx_errors | tx_errors | Total number of TX error packets | |||
if_tx_octets | tx_fcoe_bytes | Number of fcoe bytes transmitted | |||
tx_good_bytes | counter of successfully transmitted octets. This register includes transmitted bytes in a packet from the <Destination Address> field through the <CRC> field, inclusively. | ||||
tx_q0_bytes | Number of bytes transmitted by the queue. | per queue | |||
if_tx_packets | tx_broadcast_packets | Number of broadcast packets transmitted count. This register counts all packets, including standard packets, secure packets, FC packets and manageability packets | |||
tx_fcoe_packets | Number of fcoe packets transmitted | ||||
tx_flow_control_xoff_packets | Link XOFF Transmitted Count | ||||
tx_flow_control_xon_packets | Link XON Transmitted Count | ||||
tx_good_packets | Number of good packets transmitted | ||||
tx_management_packets | Number of management packets transmitted. | ||||
tx_multicast_packets | Number of multicast packets transmitted. This register counts the number of multicast packets transmitted. This register counts all packets, including standard packets, secure packets, FC packets and manageability packets. | ||||
tx_priorityX_xoff_packets | Number of XOFF packets transmitted per UP | where X is 0 to 7 | |||
tx_priorityX_xon_packets | Number of XON packets transmitted per UP | where X is 0 to 7 | |||
tx_q0_packets | Number of packets transmitted for the queue. A packet is considered as transmitted if it was forwarded to the MAC unit for transmission to the network and/or is accepted by the internal Tx to Rx switch enablement logic. Packets dropped due to anti-spoofing filtering or VLAN tag validation (as described in Section 7.10.3.9.2) are not counted. | per queue | |||
tx_size_1024_to_max_packets | Number of packets transmitted that are 1024 or more bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts all packets, including standard packets, secure packets, and manageability packets. | ||||
tx_size_128_to_255_packets | Number of packets transmitted that are 128-255 bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts all packets, including standard packets, secure packets, and manageability packets | ||||
tx_size_256_to_511_packets | Number of packets transmitted that are 256-511 bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts all packets, including standard packets, secure packets, and manageability packets. | ||||
tx_size_512_to_1023_packets | Number of packets transmitted that are 512-1023 bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts all packets, including standard packets, secure packets, and manageability packets. | ||||
tx_size_64_packets | Number of packets transmitted that are 64 bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts all packets, including standard packets, secure packets, FC packets, and manageability packets. | ||||
tx_size_65_to_127_packets | Number of packets transmitted that are 65-127 bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts all packets, including standard packets, secure packets, and manageability packets. | ||||
tx_total_packets | Number of all packets transmitted. This register counts the total number of all packets transmitted. This register counts all packets, including standard packets, secure packets, FC packets, and manageability packets. | ||||
tx_xoff_packets | Number of XOFF packets transmitted | ||||
tx_xon_packets | Number of XON packets transmitted | ||||
operations | flow_director_added_filters | This field counts the number of added filters to the flow director filters logic. | |||
flow_director_matched_filters | This field counts the number of matched filters to the flow director filters logic. | ||||
flow_director_missed_filters | This field counts the number of missed filters to the flow director filters logic. | ||||
flow_director_removed_filters | This field counts the number of removed filters from the flow director filters logic. | ||||
pcie | correctable | non_fatal | Notification (Warning) in case of PCIe correctable error occurrence. Message contains short error description. | ||
uncorrectable | fatal | Notification (Failure) in case of PCIe uncorrectable fatal error occurrence. Message contains short error description. | |||
non_fatal | Notification (Warning) in case of PCIe uncorrectable non-fatal error occurrence. Message contains short error description. | ||||
mcelog | errors | corrected_memory_errors | The total number of hardware errors that were corrected by the hardware (e.g. a single-bit data corruption that was correctable using ECC). These errors do not require immediate software actions, but are still reported for accounting and predictive failure analysis. | Memory (RAM) errors are among the most common errors in typical server systems. They also scale with the amount of memory: the more memory the more errors. In addition large clusters of computers with tens or hundreds (or sometimes thousands) of active machines increase the total error rate of the system. http://www.mcelog.org/memory.html | |
uncorrected_memory_errors | The total number of uncorrected hardware errors detected by the hardware. Data corruption has occurred. These errors require software reaction. | ||||
corrected_memory_errors_in_%s | The total number of hardware errors that were corrected by the hardware in a certain period of time | where %s is a timed period like 24 hours http://www.mcelog.org/memory.html | |||
uncorrected_memory_errors_in_%s | the total number of uncorrected hardware errors detected by the hardware in a certain period of time | where %s is a timed period like 24 hours http://www.mcelog.org/memory.html |
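The memory `used`, `ping_droprate` and `ping_stddev` rows in the table above quote expressions from the collectd sources. As a rough sketch, the same arithmetic can be written in Python as follows. The function names are hypothetical, and `latency_squared` is assumed here to be the running sum of squared latencies over all received echo-replies, which is what makes the expression a sample standard deviation.

```python
import math


def mem_used(mem_total, mem_free, mem_buffered, mem_cached, mem_slab_total):
    """Memory 'used' as quoted in the table: total minus everything else."""
    return mem_total - (mem_free + mem_buffered + mem_cached + mem_slab_total)


def ping_droprate(pkg_sent, pkg_recv):
    """Fraction of echo requests that received no echo-reply."""
    return (pkg_sent - pkg_recv) / pkg_sent


def ping_stddev(latencies):
    """Sample standard deviation of the latencies of received echo-replies.

    Returns None when fewer than two replies were received, mirroring the
    'if pkg_recv > 1' guard quoted in the table.
    """
    pkg_recv = len(latencies)
    if pkg_recv <= 1:
        return None
    latency_total = sum(latencies)
    # Assumption: accumulated sum of squares, not a single squared latency.
    latency_squared = sum(latency * latency for latency in latencies)
    return math.sqrt(
        (pkg_recv * latency_squared - latency_total * latency_total)
        / (pkg_recv * (pkg_recv - 1))
    )


# Illustrative numbers only, not measurements.
print(mem_used(1000, 200, 50, 300, 100))
print(ping_droprate(pkg_sent=10, pkg_recv=8))
print(ping_stddev([1.0, 2.0, 3.0]))
```

Note that `ping_droprate` divides by `pkg_sent`, so a caller would need to skip the calculation when no packets have been sent yet.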
Events
Where collectd is running | Plugin | Type | Type Instance | Severity | Description | comment |
---|---|---|---|---|---|---|
host/guest | ovs_events | gauge | link_status | Warning on Link Status Down | Link status of the OvS interface: UP or DOWN | |
host | mcelog | errors | Failure on failure to connect to the mcelog socket or if the connection is lost; OK on connection to the mcelog socket; Warning for Corrected Memory Errors; Failure for Uncorrected Memory Errors | Reports Corrected and Uncorrected DIMM Failures | ||
host/guest | dpdk_events | link_status | ||||
keep_alive |