Where collectd is running | Plugin | Type | Type Instance | Description | Comment
Host/guest | CPU | percent/nanoseconds | idle | Time the CPU spends idle.

Can be reported per CPU or aggregated across all CPUs.

For more info, please see:

http://man7.org/linux/man-pages/man1/top.1.html

 

http://blog.scoutapp.com/articles/2015/02/24/understanding-linuxs-cpu-stats

Note that jiffies operate on a variable time base, HZ. The default value of HZ (1000) should be used, yielding a jiffy value of 0.001 seconds [time(7)]. Also, the actual number of jiffies in each second is subject to system factors, such as the use of virtualization. Thus, a percent calculation based on jiffies will only nominally sum to 100%, plus or minus some error.
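To illustrate the note on jiffies, here is a minimal Python sketch that expresses jiffy-counter deltas as a percentage of the nominal HZ × interval. The function name `percent_of_nominal`, the sample counters, and the choice to divide by the nominal jiffy count are assumptions made for illustration; collectd itself may normalize its values differently.

```python
HZ = 1000  # assumed jiffies per second, as stated in the note above


def percent_of_nominal(prev, curr, interval_s, hz=HZ):
    """Express per-state jiffy deltas as a percentage of hz * interval_s.

    prev and curr map a CPU state name to a cumulative jiffy counter.
    """
    nominal = hz * interval_s  # jiffies expected in the interval
    return {state: 100.0 * (curr[state] - prev[state]) / nominal
            for state in curr}


# Hypothetical counters sampled one second apart on a single CPU.
prev = {"idle": 5000, "user": 300, "system": 200, "wait": 50}
curr = {"idle": 5900, "user": 360, "system": 230, "wait": 58}

percentages = percent_of_nominal(prev, curr, interval_s=1)
total = sum(percentages.values())
# Only 998 jiffies were counted in this sample, so the total falls
# slightly short of 100.
```

In this made-up sample the states account for 998 of the nominal 1000 jiffies, which is the kind of small deviation from 100% the note describes.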




percent/nanoseconds | nice | Time the CPU spent running user space processes that have been niced. The priority level of a user space process can be tweaked by adjusting its niceness.
percent/nanoseconds | interrupt | Time the CPU has spent servicing interrupts.
percent/nanoseconds | softirq | Time spent handling software interrupts, which rank just below the hardware interrupts (above) in importance. "In current kernels there are ten softirq vectors defined; two for tasklet processing, two for networking, two for the block layer, two for timers, and one each for the scheduler and read-copy-update processing. The kernel maintains a per-CPU bitmask indicating which softirqs need processing at any given time." [Ref]
percent/nanoseconds | steal | CPU steal is a measure of the fraction of time that a machine is in a state of “involuntary wait.” It is time for which the kernel cannot otherwise account in one of the traditional classifications like user, system, or idle. It is time that went missing, from the perspective of the kernel.

http://www.stackdriver.com/understanding-cpu-steal-experiment/

percent/nanoseconds | system | Time that the CPU spent running the kernel.
percent/nanoseconds | user | Time the CPU spends running un-niced user space processes.
percent/nanoseconds | wait | Time the CPU spends idle while waiting for an I/O operation to complete.
Interface | if_dropped | in | The number of received packets that were dropped.
if_errors | in | The number of received error packets.
if_octets | in | The number of received bytes.
if_packets | in | The number of received packets.
if_dropped | out | The number of transmit packets that were dropped.
if_errors | out | The number of transmit error packets.
if_octets | out | The number of transmitted bytes.
if_packets | out | The number of transmitted packets.
Memory | memory | buffered | The amount, in kibibytes, of temporary storage for raw disk blocks. | https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/s2-proc-meminfo.html
memory | cached | The amount of physical RAM, in kibibytes, used as cache memory.
memory | free | The amount of physical RAM, in kibibytes, left unused by the system.
memory | slab_recl | The part of Slab that can be reclaimed, such as caches. | Slab: the total amount of memory, in kibibytes, used by the kernel to cache data structures for its own use.
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/s2-proc-meminfo.html
memory | slab_unrecl | The part of Slab that cannot be reclaimed even when lacking memory.
memory | used | mem_used = mem_total - (mem_free + mem_buffered + mem_cached + mem_slab_total); | https://github.com/collectd/collectd/blob/master/src/memory.c#L349
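The mem_used formula can be checked by hand with a small Python sketch. The function name `memory_used`, the sample readings, and the assumption that the slab total is the sum of slab_recl and slab_unrecl are illustrative only and not taken from the collectd source.

```python
def memory_used(total, free, buffered, cached, slab_recl, slab_unrecl):
    """Return used memory following the mem_used formula quoted above.

    All arguments must share one unit (kibibytes here).
    """
    # Assumption: the slab total is the sum of its two parts.
    slab_total = slab_recl + slab_unrecl
    return total - (free + buffered + cached + slab_total)


# Hypothetical readings in kibibytes.
used = memory_used(total=16_000_000, free=4_000_000, buffered=500_000,
                   cached=6_000_000, slab_recl=200_000, slab_unrecl=100_000)
```

With these made-up readings, free, buffered, cached and slab memory add up to 10,800,000 KiB, leaving 5,200,000 KiB counted as used.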
disk | disk_io_time | io_time | Time spent doing I/Os (ms). This metric can be treated as a device load percentage: 1 second of I/O time per second corresponds to 100% load.

https://collectd.org/wiki/index.php/Plugin:Disk
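As a worked example of reading io_time as a load percentage, the following Python sketch converts a rate in milliseconds of I/O per second into a percentage. The function name `io_time_to_load_percent` and the assumption that the input is already a per-second rate are hypothetical.

```python
def io_time_to_load_percent(io_time_ms_per_s):
    """Convert an io_time rate (ms spent on I/O per second) to a percentage.

    1000 ms of I/O time per second corresponds to 100% load.
    """
    return io_time_ms_per_s / 1000.0 * 100.0


# A device busy for 250 ms out of every second is at a quarter of full load.
load = io_time_to_load_percent(250)
```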

disk_io_time | weighted_io_time | Measure of both I/O completion time and the backlog that may be accumulating.
disk_merged | read | The number of operations that could be merged into other, already queued operations, i.e. one physical disk access served two or more logical operations. The higher that number, the better.
disk_merged | write | The number of operations that could be merged into other, already queued operations, i.e. one physical disk access served two or more logical operations. The higher that number, the better.
disk_octets | read | The number of octets read from a disk or partition.
disk_octets | write | The number of octets written to a disk or partition.
disk_ops | read | The number of read operations issued to the disk.
disk_ops | write | The number of write operations issued to the disk.
disk_time | read | The average time an I/O operation took to complete. Note from collectd: since this is a little messy to calculate, take the actual values with a grain of salt.
disk_time | write | The average time an I/O operation took to complete. Note from collectd: since this is a little messy to calculate, take the actual values with a grain of salt.
pending_operations | | Shows the queue size of pending I/O operations.
Ping | ping | | Network latency is measured as a round-trip time in milliseconds. An ICMP “echo request” is sent to a host and the time needed for its echo-reply to arrive is measured. | Latency
ping_droprate | | droprate = ((double) (pkg_sent - pkg_recv)) / ((double) pkg_sent); | https://github.com/collectd/collectd/blob/master/src/ping.c#L703
ping_stddev | | if pkg_recv > 1:
latency_stddev = sqrt (((((double) pkg_recv) * latency_squared) - (latency_total * latency_total)) / ((double) (pkg_recv * (pkg_recv - 1))));

https://github.com/collectd/collectd/blob/master/src/ping.c#L698

pkg_sent = # of echo-request messages sent

pkg_recv = # of echo-reply messages received

latency_squared = the sum of latency * latency over the received echo-reply messages

latency_total = the total latency for received echo-reply messages
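The two ping formulas can be reproduced in a short Python sketch. The function name `ping_stats`, the sample latencies, and returning `None` when fewer than two replies arrive are assumptions of this sketch and not necessarily what collectd does in that case.

```python
import math


def ping_stats(pkg_sent, latencies_ms):
    """Return (droprate, latency_stddev) using the formulas quoted above.

    latencies_ms holds one round-trip time per received echo-reply.
    """
    if pkg_sent <= 0:
        raise ValueError("pkg_sent must be positive")
    pkg_recv = len(latencies_ms)
    latency_total = sum(latencies_ms)
    latency_squared = sum(lat * lat for lat in latencies_ms)

    droprate = (pkg_sent - pkg_recv) / pkg_sent
    if pkg_recv > 1:
        latency_stddev = math.sqrt(
            (pkg_recv * latency_squared - latency_total * latency_total)
            / (pkg_recv * (pkg_recv - 1))
        )
    else:
        latency_stddev = None  # assumption: undefined with fewer than 2 replies
    return droprate, latency_stddev


# Hypothetical run: 5 requests sent, 4 replies received.
droprate, stddev = ping_stats(5, [10.0, 12.0, 14.0, 16.0])
```

For this sample the drop rate is 1/5 and the result matches the usual sample standard deviation of the four latencies, sqrt(20/3).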


load | load | shortterm | Load average figure giving the number of jobs in the run queue (state R) or waiting for disk I/O (state D), averaged over 1 minute.

Measured CPU and I/O utilization over 1 minute, using /proc/loadavg
http://man7.org/linux/man-pages/man5/proc.5.html

https://github.com/collectd/collectd/blob/master/src/load.c
load | midterm | Load average figure giving the number of jobs in the run queue (state R) or waiting for disk I/O (state D), averaged over 5 minutes.

Measured CPU and I/O utilization over 5 minutes, using /proc/loadavg
load | longterm | Load average figure giving the number of jobs in the run queue (state R) or waiting for disk I/O (state D), averaged over 15 minutes.

Measured CPU and I/O utilization over 15 minutes, using /proc/loadavg
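To show where the three load figures come from, here is a minimal Python sketch that parses the first three fields of a /proc/loadavg line. The function name `parse_loadavg` and the mapping of the fields to the shortterm, midterm and longterm names are assumptions for illustration.

```python
def parse_loadavg(text):
    """Map the first three fields of a /proc/loadavg line to load figures.

    The fields are the 1, 5 and 15 minute load averages, in that order.
    """
    fields = text.split()
    if len(fields) < 3:
        raise ValueError("expected at least three fields")
    shortterm, midterm, longterm = (float(value) for value in fields[:3])
    return {"shortterm": shortterm, "midterm": midterm, "longterm": longterm}


# Sample line in the /proc/loadavg layout; the values are made up.
sample = "0.20 0.18 0.12 1/80 11206"
load = parse_loadavg(sample)
```

On a Linux host the same function can be fed the contents of /proc/loadavg instead of the sample string.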
OVS events | gauge | link_status
OVS Stats | | collisions | | per interface
  rx_bytes   
  rx_crc_err   
  rx_dropped   
  rx_errors   
  rx_frame_err   
  rx_over_err   
  rx_packets   
  tx_bytes   
  tx_dropped   
  tx_errors   
  tx_packets   
Hugepages | bytes | used | total/pernode/both
bytes | free
vmpage_number | used
vmpage_number | free
percent | used
percent | free
processes | fork_rate
ps_state | blocked
ps_state | paging
ps_state | running
ps_state | sleeping
ps_state | stopped
ps_state | zombies
Host only | Libvirt | disk_octets | read
disk_octets | write
disk_ops | read
disk_ops | write
if_dropped | in
if_dropped | out
if_errors | in
if_errors | out
if_octets | in
if_octets | out
if_packets | in
if_packets | out
memory | actual
memory | balloon
memory | rss
memory | swap_in
memory | total
virt_cpu_total  This is in jiffies!
virt_vcpu  This is in jiffies!
RDT | ipc | | per core group
memory_bandwidth | local
memory_bandwidth | remote
bytes | llc

dpdkstats

compatible with DPDK 16.04 (based on ixgbe, vhost support will be enabled in DPDK 16.11, patch support being upgraded to DPDK 16.07 in progress)

derive | rx_l3_l4_xsum_error
errors | flow_director_filter_add_errors
flow_director_filter_remove_errors
mac_local_errors
mac_remote_errors
if_rx_dropped | rx_fcoe_dropped
rx_mac_short_packet_dropped
rx_management_dropped
rx_priorityX_dropped | where X is 0 to 7
if_rx_errors | rx_crc_errors
rx_errors  
rx_fcoe_crc_errors  
rx_fcoe_mbuf_allocation_errors  
rx_fcoe_no_direct_data_placement  
rx_fcoe_no_direct_data_placement_ext_buff  
rx_fragment_errors  
rx_illegal_byte_errors  
rx_jabber_errors  
rx_length_errors  
rx_mbuf_allocation_errors  
rx_oversize_errors  
rx_priorityX_mbuf_allocation_errors where X is 0 to 7
rx_q0_errors | if more queues are allocated, errors are reported per queue
rx_undersize_errors
if_rx_octets | rx_error_bytes | bug - will move this to errors
rx_fcoe_bytes
rx_good_bytes
rx_q0_bytes | per queue
rx_total_bytes
if_rx_packets | rx_broadcast_packets
rx_fcoe_packets  
rx_flow_control_xoff_packets  
rx_flow_control_xon_packets  
rx_good_packets  
rx_management_packets  
rx_multicast_packets  
rx_priorityX_xoff_packets where X is 0 to 7
rx_priorityX_xon_packets where X is 0 to 7
rx_q0_packets per queue
rx_size_1024_to_max_packets  
rx_size_128_to_255_packets  
rx_size_256_to_511_packets  
rx_size_512_to_1023_packets  
rx_size_64_packets  
rx_size_65_to_127_packets  
rx_total_missed_packets  
rx_total_packets  
rx_xoff_packets  
rx_xon_packets  
if_tx_errors | tx_errors
if_tx_octets | tx_fcoe_bytes
tx_good_bytes
tx_q0_bytes | per queue
if_tx_packets | tx_broadcast_packets
tx_fcoe_packets  
tx_flow_control_xoff_packets  
tx_flow_control_xon_packets  
tx_good_packets  
tx_management_packets  
tx_multicast_packets  
tx_priorityX_xoff_packets where X is 0 to 7
tx_priorityX_xon_packets where X is 0 to 7
tx_q0_packets per queue
tx_size_1024_to_max_packets  
tx_size_128_to_255_packets  
tx_size_256_to_511_packets  
tx_size_512_to_1023_packets  
tx_size_64_packets  
tx_size_65_to_127_packets  
tx_total_packets  
tx_xoff_packets  
tx_xon_packets  
operations | flow_director_added_filters
flow_director_matched_filters  
flow_director_missed_filters  
flow_director_removed_filters