...

1.0  Use the Linux perf interface to collect data about performance events on a per-core basis
2.0  Use the jevents library (PMU tools)
3.0  Report hardware cache events, kernel PMU events, software events, and hardware-specific events
4.0  Should have a configurable interval
5.0  Should have a configurable list of hardware-specific events
6.0  Provide SNMP support for any collectd values through a PMU MIB
7.0

 

 

...

Provide support for multi PMU uncore events



Overview

Performance counters are CPU hardware registers that count hardware events such as instructions executed, cache misses suffered, or branches mispredicted. They form a basis for profiling applications to trace dynamic control flow and identify hotspots. The Linux perf interface provides rich, generalized abstractions over hardware-specific capabilities.

...

  • Resolving symbolic event names using downloaded event files
  • Reading performance counters from ring 3 in C programs
  • Handling the perf ring buffer (for example to read memory addresses)

 


For more information on jevents see https://github.com/andikleen/pmu-tools/tree/master/jevents.

...

The intel_pmu plugin collects information provided by the Linux perf interface. Using this interface, the intel_pmu plugin should collect the following metrics:

...


Name                       Type     Type Instance              Description

Kernel PMU events
cpu-cycles                 counter  cpu-cycles
instructions               counter  instructions
cache-references           counter  cache-references
cache-misses               counter  cache-misses
branches                   counter  branches
branch-misses              counter  branch-misses
bus-cycles                 counter  bus-cycles

Hardware cache events
L1-dcache-loads            counter  L1-dcache-loads
L1-dcache-load-misses      counter  L1-dcache-load-misses
L1-dcache-stores           counter  L1-dcache-stores
L1-dcache-store-misses     counter  L1-dcache-store-misses
L1-dcache-prefetches       counter  L1-dcache-prefetches
L1-dcache-prefetch-misses  counter  L1-dcache-prefetch-misses
L1-icache-loads            counter  L1-icache-loads
L1-icache-load-misses      counter  L1-icache-load-misses
L1-icache-prefetches       counter  L1-icache-prefetches
L1-icache-prefetch-misses  counter  L1-icache-prefetch-misses
LLC-loads                  counter  LLC-loads
LLC-load-misses            counter  LLC-load-misses
LLC-stores                 counter  LLC-stores
LLC-store-misses           counter  LLC-store-misses
LLC-prefetches             counter  LLC-prefetches
LLC-prefetch-misses        counter  LLC-prefetch-misses
dTLB-loads                 counter  dTLB-loads
dTLB-load-misses           counter  dTLB-load-misses
dTLB-stores                counter  dTLB-stores
dTLB-store-misses          counter  dTLB-store-misses
dTLB-prefetches            counter  dTLB-prefetches
dTLB-prefetch-misses       counter  dTLB-prefetch-misses
iTLB-loads                 counter  iTLB-loads
iTLB-load-misses           counter  iTLB-load-misses
branch-loads               counter  branch-loads
branch-load-misses         counter  branch-load-misses

Software events
cpu-clock                  counter  cpu-clock
task-clock                 counter  task-clock
context-switches           counter  context-switches
cpu-migrations             counter  cpu-migrations
page-faults                counter  page-faults
minor-faults               counter  minor-faults
major-faults               counter  major-faults
alignment-faults           counter  alignment-faults
emulation-faults           counter  emulation-faults

 

 

...




Plugin configuration

The following configuration options should be supported by the intel_pmu collectd plugin:

Name

Description

Comment

Interval

The interval, in seconds, at which statistics on monitored events are retrieved

The Interval option is supported by collectd itself and is defined in the <LoadPlugin> block. No additional functionality needs to be developed in the intel_pmu plugin to support this option.

ReportHardwareCacheEvents

Enable/disable monitoring of hardware cache events

 


ReportKernelPMUEvents

Enable/disable monitoring of kernel PMU events 


ReportSoftwareEvents

Enable/disable monitoring of software events

 


EventList

Path to the hardware events list file for the current CPU. The file can be downloaded with the event_download.py script, which is part of the pmu-tools package.

HardwareEvents

String containing a comma-separated list of hardware-specific events to monitor

 


Cores

Core groups definition. Monitored metrics are reported only for the configured cores. If this option is omitted, all available cores are monitored.

If a group is enclosed in square brackets, each core is added individually to a separate group (that is, statistics are not aggregated).

Allowed formats:
"0,1,2,3"
"0-3"
"[0-3]"

...

DispatchMultiPmu

Enable/disable dispatching of cloned multi PMU for uncore events. If disabled, only the total sum is dispatched as a single event. If enabled, a separate metric is dispatched for every counter, and information about the event type is added to type_instance.
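
The three formats allowed for the Cores option can be illustrated with a small parser. This is a minimal sketch in Python, not the plugin's actual code: the function name parse_core_group is hypothetical, and it assumes each quoted string describes one group. How an empty string should be treated is not specified above, so the sketch simply returns no groups and leaves that decision to the caller.

```python
from typing import List


def parse_core_group(spec: str) -> List[List[int]]:
    """Turn one Cores string into a list of core groups (hypothetical helper).

    "0,1,2,3" and "0-3" yield a single aggregated group.
    "[0-3]" yields one group per core, so statistics are not aggregated.
    """
    text = spec.strip()
    split_individually = text.startswith("[") and text.endswith("]")
    if split_individually:
        text = text[1:-1]

    cores: List[int] = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as "0-3" is inclusive on both ends.
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"invalid core range: {part!r}")
            cores.extend(range(first, last + 1))
        else:
            cores.append(int(part))

    if split_individually:
        return [[core] for core in cores]
    return [cores] if cores else []
```

Under these assumptions, "0-3" and "0,1,2,3" both produce one group holding cores 0 to 3, while "[0-3]" produces four single-core groups.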


Here is an example of the plugin configuration section of the collectd.conf file:

...

    HardwareEvents "L2_RQSTS.CODE_RD_HIT,L2_RQSTS.CODE_RD_MISS" "L2_RQSTS.ALL_CODE_RD"
    Cores ""
    DispatchMultiPmu false
  </Plugin>
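
The effect of the DispatchMultiPmu option on what gets reported can be sketched as follows. This is an illustration only, written in Python rather than the plugin's own code. The function name values_to_dispatch, the event name, the PMU labels, and the "event:pmu" type_instance format are all hypothetical; the description above only states that information about the event type is added to type_instance, not how.

```python
from typing import Dict, List, Tuple


def values_to_dispatch(event: str,
                       per_pmu_counts: Dict[str, int],
                       dispatch_multi_pmu: bool) -> List[Tuple[str, int]]:
    """Return (type_instance, value) pairs for one uncore event.

    per_pmu_counts maps a hypothetical PMU label to the counter value
    read from that PMU.
    """
    if dispatch_multi_pmu:
        # Enabled: one metric per counter, labelled with its source.
        # The "event:pmu" format is an assumption made for this sketch.
        return [(f"{event}:{pmu}", count)
                for pmu, count in per_pmu_counts.items()]
    # Disabled: only the total sum is dispatched, as a single event.
    return [(event, sum(per_pmu_counts.values()))]
```

With two counters holding 10 and 32, the disabled mode yields a single value of 42, and the enabled mode yields two separate values.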

Implementation details

The intel_pmu plugin does not introduce its own layer of functionality. It only reads the configuration provided by the user and prepares the parameters and data structures needed by the jevents API. The table below shows the correspondence between the plugin's API and the jevents API that is used to configure Linux perf monitoring.

...


plugin API    jevents API          Description
pmu_config                         Parse the event groups to monitor, as provided by the user in collectd.conf
pmu_init      resolve_event_extra  Resolve hardware-specific event names to perf events (perf_event_attr)
              jevent_next_pmu      Expand an event into multiple PMUs if necessary (used for uncore events)
              setup_event          Set up perf events for monitoring
pmu_read      read_all_events      Read the values of all monitored events
pmu_shutdown
 
 
 



For more details on the plugin API, see the collectd plugin implementation guide: https://collectd.org/wiki/index.php/Plugin_architecture.

...

The following table outlines possible impact(s) the deployment of this deliverable may have on the current system. 


Ref

System Impact Description

Recommendation / Comments

1

The plugin can easily exceed the default limit on the number of open file descriptors.

  1. Reduce the number of monitored events and/or cores.
  2. Increase the limit on the number of open file descriptors allowed.
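
To gauge the impact described above, note that perf_event_open returns one file descriptor per opened event, so per-core monitoring needs roughly one descriptor for each event on each core. The sketch below, in Python, estimates that number and raises the soft limit of the current process if it is too low. The function names and the headroom value are hypothetical, and the snippet changes the limit only for the process that runs it; for the collectd daemon the limit would normally be set in its service environment instead (for example with ulimit -n).

```python
import resource


def estimated_descriptors(num_events: int, num_cores: int) -> int:
    """Rough estimate: one perf file descriptor per event per core."""
    return num_events * num_cores


def ensure_fd_limit(needed: int, headroom: int = 64) -> int:
    """Raise the soft RLIMIT_NOFILE of this process if it is too low.

    headroom is an assumed allowance for descriptors used elsewhere.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    wanted = needed + headroom
    if soft != resource.RLIM_INFINITY and soft < wanted:
        # The soft limit can be raised up to, but not above, the hard limit.
        if hard == resource.RLIM_INFINITY:
            new_soft = wanted
        else:
            new_soft = min(wanted, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        soft = new_soft
    return soft
```

For example, 50 events monitored on 88 cores would need about 4400 descriptors, well above a soft limit of 1024.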

...

The following assumptions apply to the scope specified in this document. 


Ref

Assumption

Status

1 

 



Key Exclusions

The following exclusions apply to the scope discussed in this document. 


Ref

Exclusion

Status

1

 

 



Key Dependencies

The following table outlines the key dependencies associated with this deliverable. 


Ref

Dependency

Status

1

libjevents 


2

Net-SNMP 


3

 

 



4