
Statistics in collectd are represented as value lists. A value list includes the following fields:

 

Field | Description | Example | Comment
Values | the actual values that were measured | 99.8999 | e.g. a percentage
Value length | the number of values in the data set | |
Time | the timestamp at which the value was collected | 1475837857 | epoch time
Interval | the interval at which to expect a new value | 10 | in seconds
Host | used to identify the host | localhost | can be a UUID for a VM or host, or the host can be given a name
Plugin | used to identify the plugin | cpu |
Plugin instance (optional) | used to group a set of values together, e.g. values belonging to a DPDK interface | 0 |
Type | the unit used to measure a value; in other words, it refers to a data set | percent |
Type instance (optional) | used to distinguish between values that have an identical type | user |
Meta data | an opaque data structure that enables passing additional information about a value list; "Meta data in the global cache can be used to store arbitrary information about an identifier" | |
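For orientation, this is how a C plugin fills in and dispatches such a value list through collectd's plugin API. This is a minimal sketch only; the plugin name "example" and the field values are illustrative, mirroring the example column above.

#include "collectd.h"
#include "common.h"
#include "plugin.h"

/* A read callback that dispatches a single gauge value. */
static int example_read(void) {
  value_t values[1];
  value_list_t vl = VALUE_LIST_INIT;

  values[0].gauge = 99.8999;                      /* Values */
  vl.values = values;
  vl.values_len = 1;                              /* Value length */
  /* vl.time and vl.interval may be left unset; the daemon fills them in. */
  sstrncpy(vl.host, hostname_g, sizeof(vl.host)); /* Host */
  sstrncpy(vl.plugin, "cpu", sizeof(vl.plugin));  /* Plugin */
  sstrncpy(vl.plugin_instance, "0", sizeof(vl.plugin_instance));   /* Plugin instance */
  sstrncpy(vl.type, "percent", sizeof(vl.type));  /* Type: a data set from types.db */
  sstrncpy(vl.type_instance, "user", sizeof(vl.type_instance));    /* Type instance */

  return plugin_dispatch_values(&vl);             /* hand the value list to the daemon */
}

void module_register(void) {
  plugin_register_read("example", example_read);
}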

 

Notifications:


Notifications in collectd are generic messages containing:

An associated severity, which can be one of OKAY, WARNING, or FAILURE.
A time.
A message.
A host.
A plugin.
A plugin instance (optional).
A type.
A type instance (optional).
Meta data.


Example notification:

 

Severity: FAILURE
Time: 1472552207.385
Host: pod3-node1
Plugin: dpdkevents
PluginInstance: dpdk0
Type: gauge
TypeInstance: link_status
DataSource: value
CurrentValue: 1.000000e+00
WarningMin: nan
WarningMax: nan
FailureMin: 2.000000e+00
FailureMax: nan
Message: Host pod3-node1, plugin dpdkevents (instance dpdk0) type gauge (instance link_status): Data source "value" is currently 1.000000. That is below the failure threshold of 2.000000.
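A plugin raises such a notification through the same C API. A minimal sketch mirroring the example above (the message text is illustrative, not the exact string the dpdkevents plugin produces):

#include "collectd.h"
#include "common.h"
#include "plugin.h"

/* Raise a FAILURE notification similar to the dpdkevents example above. */
static void raise_link_down_notification(void) {
  notification_t n = {0};

  n.severity = NOTIF_FAILURE;   /* one of NOTIF_OKAY, NOTIF_WARNING, NOTIF_FAILURE */
  n.time = cdtime();            /* time of the event */
  sstrncpy(n.host, hostname_g, sizeof(n.host));
  sstrncpy(n.plugin, "dpdkevents", sizeof(n.plugin));
  sstrncpy(n.plugin_instance, "dpdk0", sizeof(n.plugin_instance));
  sstrncpy(n.type, "gauge", sizeof(n.type));
  sstrncpy(n.type_instance, "link_status", sizeof(n.type_instance));
  ssnprintf(n.message, sizeof(n.message), "Link state of dpdk0 changed to DOWN");

  plugin_dispatch_notification(&n);  /* deliver to all registered notification sinks */
}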

 

Supported Metrics and Events

Dynamic Metrics

Reference starting point: https://github.com/collectd/collectd/blob/master/src/types.db  
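Each line in types.db defines one data set, in the format <type> <ds-name>:<GAUGE|DERIVE|COUNTER|ABSOLUTE>:<min>:<max>[, ...]. A few representative entries from that file:

gauge      value:GAUGE:U:U
percent    value:GAUGE:0:100.1
if_octets  rx:DERIVE:0:U, tx:DERIVE:0:U
load       shortterm:GAUGE:0:5000, midterm:GAUGE:0:5000, longterm:GAUGE:0:5000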

Below is a mapping of the "base" plugins that run on the host and/or the guest.

Where collectd is running | Plugin | Type | Type instance | Description / comment
Host/guest | CPU | percent/nanoseconds | idle | Time the CPU spends idle. (See the CPU note below the table.)
| | percent/nanoseconds | nice | Time the CPU spent running user-space processes that have been niced. The priority of a user-space process can be tweaked by adjusting its niceness.
| | percent/nanoseconds | interrupt | Time the CPU has spent servicing interrupts.
| | percent/nanoseconds | softirq | Time spent handling interrupts that are almost as important as hardware interrupts (above). "In current kernels there are ten softirq vectors defined; two for tasklet processing, two for networking, two for the block layer, two for timers, and one each for the scheduler and read-copy-update processing. The kernel maintains a per-CPU bitmask indicating which softirqs need processing at any given time." [Ref]
| | percent/nanoseconds | steal | A measure of the fraction of time that the machine is in a state of “involuntary wait”: time for which the kernel cannot otherwise account in one of the traditional classifications like user, system, or idle; time that went missing, from the perspective of the kernel. See http://www.stackdriver.com/understanding-cpu-steal-experiment/
| | percent/nanoseconds | system | Time the CPU spent running the kernel.
| | percent/nanoseconds | user | Time the CPU spends running un-niced user-space processes.
| | percent/nanoseconds | wait | Time the CPU spends idle while waiting for an I/O operation to complete.
| Interface | if_dropped | in | The number of received packets that were dropped.
| | if_errors | in | The number of received error packets.
| | if_octets | in | The number of bytes received.
| | if_packets | in | The number of packets received.
| | if_dropped | out | The number of transmitted packets that were dropped.
| | if_errors | out | The number of transmit error packets.
| | if_octets | out | The number of bytes transmitted.
| | if_packets | out | The number of packets transmitted.
| Memory | memory | buffered |
| | memory | cached |
| | memory | free |
| | memory | slab_recl |
| | memory | slab_unrecl |
| | memory | used |
| Disk | disk_io_time | io_time |
| | disk_io_time | weighted_io_time |
| | disk_merged | read |
| | disk_merged | write |
| | disk_octets | read |
| | disk_octets | write |
| | disk_ops | read |
| | disk_ops | write |
| | disk_time | read |
| | disk_time | write |
| | pending_operations | |
| Ping | ping | | Latency.
| | ping_droprate | | Drop rate.
| | ping_stddev | | Standard deviation.
| Load | load | shortterm |
| | load | midterm |
| | load | longterm |
| OVS events | gauge | link_status |
| OVS stats | | collisions | Per interface.
| | | rx_bytes |
| | | rx_crc_err |
| | | rx_dropped |
| | | rx_errors |
| | | rx_frame_err |
| | | rx_over_err |
| | | rx_packets |
| | | tx_bytes |
| | | tx_dropped |
| | | tx_errors |
| | | tx_packets |
| Hugepages | bytes | used | Reported in total, per node, or both.
| | bytes | free |
| | vmpage_number | used |
| | vmpage_number | free |
| | percent | used |
| | percent | free |
| Processes | fork_rate | |
| | ps_state | blocked |
| | ps_state | paging |
| | ps_state | running |
| | ps_state | sleeping |
| | ps_state | stopped |
| | ps_state | zombies |
Host only | Libvirt | disk_octets | read |
| | disk_octets | write |
| | disk_ops | read |
| | disk_ops | write |
| | if_dropped | in |
| | if_dropped | out |
| | if_errors | in |
| | if_errors | out |
| | if_octets | in |
| | if_octets | out |
| | if_packets | in |
| | if_packets | out |
| | memory | actual |
| | memory | balloon |
| | memory | rss |
| | memory | swap_in |
| | memory | total |
| | virt_cpu_total | | This is in jiffies!
| | virt_vcpu | | This is in jiffies!
| RDT | ipc | | Per core group.
| | memory_bandwidth | local |
| | memory_bandwidth | remote |
| | bytes | llc |
| dpdkstats (see note below) | derive | rx_l3_l4_xsum_error |
| | errors | flow_director_filter_add_errors |
| | errors | flow_director_filter_remove_errors |
| | errors | mac_local_errors |
| | errors | mac_remote_errors |
| | if_rx_dropped | rx_fcoe_dropped |
| | if_rx_dropped | rx_mac_short_packet_dropped |
| | if_rx_dropped | rx_management_dropped |
| | if_rx_dropped | rx_priorityX_dropped | Where X is 0 to 7.
| | if_rx_errors | rx_crc_errors |
| | if_rx_errors | rx_errors |
| | if_rx_errors | rx_fcoe_crc_errors |
| | if_rx_errors | rx_fcoe_mbuf_allocation_errors |
| | if_rx_errors | rx_fcoe_no_direct_data_placement |
| | if_rx_errors | rx_fcoe_no_direct_data_placement_ext_buff |
| | if_rx_errors | rx_fragment_errors |
| | if_rx_errors | rx_illegal_byte_errors |
| | if_rx_errors | rx_jabber_errors |
| | if_rx_errors | rx_length_errors |
| | if_rx_errors | rx_mbuf_allocation_errors |
| | if_rx_errors | rx_oversize_errors |
| | if_rx_errors | rx_priorityX_mbuf_allocation_errors | Where X is 0 to 7.
| | if_rx_errors | rx_q0_errors | If more queues are allocated, the errors are reported per queue.
| | if_rx_errors | rx_undersize_errors |
| | if_rx_octets | rx_error_bytes | Bug: will be moved to the errors type.
| | if_rx_octets | rx_fcoe_bytes |
| | if_rx_octets | rx_good_bytes |
| | if_rx_octets | rx_q0_bytes | Per queue.
| | if_rx_octets | rx_total_bytes |
| | if_rx_packets | rx_broadcast_packets |
| | if_rx_packets | rx_fcoe_packets |
| | if_rx_packets | rx_flow_control_xoff_packets |
| | if_rx_packets | rx_flow_control_xon_packets |
| | if_rx_packets | rx_good_packets |
| | if_rx_packets | rx_management_packets |
| | if_rx_packets | rx_multicast_packets |
| | if_rx_packets | rx_priorityX_xoff_packets | Where X is 0 to 7.
| | if_rx_packets | rx_priorityX_xon_packets | Where X is 0 to 7.
| | if_rx_packets | rx_q0_packets | Per queue.
| | if_rx_packets | rx_size_1024_to_max_packets |
| | if_rx_packets | rx_size_128_to_255_packets |
| | if_rx_packets | rx_size_256_to_511_packets |
| | if_rx_packets | rx_size_512_to_1023_packets |
| | if_rx_packets | rx_size_64_packets |
| | if_rx_packets | rx_size_65_to_127_packets |
| | if_rx_packets | rx_total_missed_packets |
| | if_rx_packets | rx_total_packets |
| | if_rx_packets | rx_xoff_packets |
| | if_rx_packets | rx_xon_packets |
| | if_tx_errors | tx_errors |
| | if_tx_octets | tx_fcoe_bytes |
| | if_tx_octets | tx_good_bytes |
| | if_tx_octets | tx_q0_bytes | Per queue.
| | if_tx_packets | tx_broadcast_packets |
| | if_tx_packets | tx_fcoe_packets |
| | if_tx_packets | tx_flow_control_xoff_packets |
| | if_tx_packets | tx_flow_control_xon_packets |
| | if_tx_packets | tx_good_packets |
| | if_tx_packets | tx_management_packets |
| | if_tx_packets | tx_multicast_packets |
| | if_tx_packets | tx_priorityX_xoff_packets | Where X is 0 to 7.
| | if_tx_packets | tx_priorityX_xon_packets | Where X is 0 to 7.
| | if_tx_packets | tx_q0_packets | Per queue.
| | if_tx_packets | tx_size_1024_to_max_packets |
| | if_tx_packets | tx_size_128_to_255_packets |
| | if_tx_packets | tx_size_256_to_511_packets |
| | if_tx_packets | tx_size_512_to_1023_packets |
| | if_tx_packets | tx_size_64_packets |
| | if_tx_packets | tx_size_65_to_127_packets |
| | if_tx_packets | tx_total_packets |
| | if_tx_packets | tx_xoff_packets |
| | if_tx_packets | tx_xon_packets |
| | operations | flow_director_added_filters |
| | operations | flow_director_matched_filters |
| | operations | flow_director_missed_filters |
| | operations | flow_director_removed_filters |

CPU note: CPU statistics can be collected per CPU or aggregated across all CPUs. For more information, see:

http://man7.org/linux/man-pages/man1/top.1.html

http://blog.scoutapp.com/articles/2015/02/24/understanding-linuxs-cpu-stats

Note that jiffies operate on a variable time base, HZ. The default value of HZ should be used (1000), yielding a jiffy value of 0.001 seconds [time(7)]. Also, the actual number of jiffies in each second is subject to system factors, such as the use of virtualization, so a percent calculation based on jiffies will nominally sum to 100%, plus or minus some error.

dpdkstats note: compatible with DPDK 16.04 (based on ixgbe; vhost support will be enabled in DPDK 16.11; a patch upgrading support to DPDK 16.07 is in progress).