Anuket Project

Metrics List & Descriptions:

| Technology/Category | Metric/Feature/Input | Name | Data Type | Format Example | Collectd Release | Collectd Plugin | Description | Dependencies | Limitations | Comments |
|---|---|---|---|---|---|---|---|---|---|---|
| PCIe AER | PCIe AER Plugin | - | - | - | - | | Plugin to provide PCIe AER metrics, errors, notifications, and device information | Depends on sysfs and proc file systems | To be used on little-endian systems | |
| PCIe AER | Feature | Device Domain | Hex | 10 | Master | pcie_errors | The PCI address domain consisting of three distinct address spaces: configuration, memory, and I/O space. | None | | |
| PCIe AER | Feature | Device Bus | Hex | 10 | Master | pcie_errors | PCIe bus number | None | | |
| PCIe AER | Feature | Device ID | Hex | 3597 | Master | pcie_errors | PCIe Device ID of the device | None | | |
| PCIe AER | Feature | Device Function | Hex | 10 | Master | pcie_errors | Function component of the Bus:Device.Function notation used to succinctly describe PCI and PCIe devices | None | | |
| PCIe AER | Feature | Instance Type | Text | correctable/uncorrectable | Master | pcie_errors | PCIe instance type | None | | |
| PCIe AER | Feature | Severity | Text | Fatal/Non-fatal | Master | pcie_errors | Severity flag indicating whether an uncorrectable error is fatal or non-fatal | None | | |
| PCIe AER | Feature | Persistent Notification | Text | True/False | Master | pcie_errors | If an uncorrectable error has already been reported once, the persistent flag is set in the plugin and the error is not reported again | None | | |
| PCIe AER | Metric | Uncorrectable Error | Text | uncorrectable | Master | pcie_errors | Errors in which data or information is lost. Non-fatal uncorrectable errors do not affect the integrity of the PCI Express fabric: they are corrupted transactions that PCIe hardware cannot correct, but the fabric continues to function correctly and only the particular transaction is affected. Whether recovery from a non-fatal error is possible depends on the device-specific software associated with the requester that initiated the transaction. | None | | |
| PCIe AER | Metric | Correctable Error | Text | correctable | Master | pcie_errors | Errors that may affect performance (such as latency or bandwidth) but in which no data or information is lost and the PCIe fabric remains reliable. Such errors are corrected by hardware and no software intervention is required. | None | | |
| PCIe AER | Metric | Severity Non-Fatal Error | Text | non_fatal | Master | pcie_errors | Error severity indicating that no reboot is necessary | None | | |
| PCIe AER | Metric | Severity Fatal Error | Text | fatal | Master | pcie_errors | Error severity indicating that a reboot is necessary | None | | |
| PCIe AER | Metric | Unsupported Request | Text | unsupported | Master | pcie_errors | This error occurs when an endpoint or a root port receives any of a set of transactions defined by the PCIe specification [1]. In all cases the TLP is deleted in the Hard IP block and not presented to the Application Layer. If the TLP is a non-posted request, the Hard IP block generates a completion with Unsupported Request status. | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Data Link Protocol Uncorrected Error | Text | Data Link Protocol | Master | pcie_errors | This error occurs when a sequence number specified by the Ack/Nak block in the Data Link Layer (AckNak_Seq_Num) does not correspond to an unacknowledged TLP. | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Surprise Down Uncorrected Error | Text | Surprise Down | Master | pcie_errors | This error occurs when the PCIe device goes down without notice. | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Poisoned TLP Uncorrected Error | Text | Poisoned TLP | Master | pcie_errors | Any time a poisoned TLP is destined for a PCIe device, the IIO module drops the poisoned data packet, contains the error in the domain in which it was detected, brings down the link, and signals a fatal error to SW/FW. | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Flow Control Protocol Uncorrected Error | Text | Flow Control Protocol | Master | pcie_errors | An uncorrected error in the flow control protocol, found in the Transaction Layer, that prevents flow control credit updates from being sent. This error occurs when a component does not receive updated flow control credits within the 200 µs limit. | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Completion Timeout Uncorrected Error | Text | Completion Timeout | Master | pcie_errors | This error occurs when a request originating from the Application Layer does not generate a corresponding completion TLP within the established time. It is the responsibility of the Application Layer logic to provide the completion timeout mechanism. The completion timeout should be reported from the Transaction Layer using the cpl_err[0] signal. | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Completer Abort Uncorrected Error | Text | Completer Abort | Master | pcie_errors | The Application Layer reports this error using the cpl_err[2] signal when it aborts receipt of a TLP. | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Unexpected Completion Uncorrected Error | Text | Unexpected Completion | Master | pcie_errors | This error is caused by an unexpected completion transaction as listed in [1]. The TLP is not presented to the Application Layer; the Hard IP block deletes it. | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Receiver Overflow Uncorrected Error | Text | Receiver Overflow | Master | pcie_errors | This error occurs when a component receives a TLP that violates the FC credits allocated for this type of TLP. In all cases the Hard IP block deletes the TLP and it is not presented to the Application Layer. | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Malformed TLP Uncorrected Error | Text | Malformed TLP | Master | pcie_errors | This error occurs when a received TLP violates the packet formation rules listed in [1]. The TLP is not presented to the Application Layer; the Hard IP block deletes it. | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | ECRC Uncorrected Error Status | Text | ECRC | Master | pcie_errors | ECRC ensures end-to-end data integrity for systems that require high reliability. When the ECRC generation option is turned on, errors are detected when receiving TLPs with a bad ECRC. More details in [2] | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Unsupported Request Uncorrected Error | Text | Unsupported | Master | pcie_errors | This error occurs when an endpoint or a root port receives an unsupported transaction as listed in [1]. The TLP is not presented to the Application Layer; the Hard IP block deletes it. | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | ACS Violation Uncorrected Error | Text | ACS Violation | Master | pcie_errors | Violation in Access Control Services. More details in [3] | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Internal Uncorrected Error | Text | Internal Uncorrected | Master | pcie_errors | An error associated with a PCI Express interface that occurs within a component and which may not be attributable to a packet or event on the PCI Express interface itself or on behalf of transactions initiated on PCI Express. More details in [4] | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | MC Blocked TLP Uncorrected Error | Text | MC Blocked TLP | Master | pcie_errors | An error with Multicast TLP processing. More details in [5] | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Atomic Egress Blocked Uncorrected Error | Text | Atomic Egress Blocked | Master | pcie_errors | Error with setting the AtomicOp Egress Blocking bit. More details in [6] | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | TLP Prefix Blocked Uncorrected Error | Text | TLP Prefix Blocked | Master | pcie_errors | The TLP Prefix mechanism extends the header size by adding DWORDs to the front of headers to carry additional information. This uncorrected error reflects a failure in that process. More details in [7] | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Receiver Error Status Corrected Error | Text | Receiver Error Status | Master | pcie_errors | Receiver error at the PCIe physical layer | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Bad TLP Status Corrected Error | Text | Bad TLP Status | Master | pcie_errors | This error occurs when an LCRC verification fails or when a sequence number error occurs. | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Bad DLLP Status Corrected Error | Text | Bad DLLP Status | Master | pcie_errors | This error occurs when a CRC verification fails. | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Replay NUM Rollover Corrected Error | Text | Replay NUM Rollover | Master | pcie_errors | This error occurs when the replay number rolls over. | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Replay Timer Timeout Corrected Error | Text | Replay Timer Timeout | Master | pcie_errors | This error occurs when the replay timer times out. | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Advisory Non-Fatal Corrected Error | Text | Advisory Non-Fatal | Master | pcie_errors | These errors are reported and signaled as ERR_COR, ERR_NONFATAL, or ERR_FATAL, or not signaled at all, depending on the role of the agent that detects the error and whether the agent implements AER in an advisory capacity to the application. More details in [8] | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Corrected Internal Corrected Error | Text | Corrected Internal | Master | pcie_errors | An error associated with a PCI Express interface that occurs within a component and which may not be attributable to a packet or event on the PCI Express interface itself or on behalf of transactions initiated on PCI Express. More details in [4] | Depends on what's exposed in sysfs and proc file systems | | |
| PCIe AER | Metric | Header Log Overflow Corrected Error | Text | Header Log Overflow | Master | pcie_errors | When a header is logged, the header is that of the first TLP that was lost or corrupted by the Uncorrectable Internal Error. More details in [9] | Depends on what's exposed in sysfs and proc file systems | | |
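
The individual error metrics above correspond to bits in the AER Uncorrectable and Correctable Error Status registers. As an illustration only, the following Python sketch shows how a raw 32-bit status value could be decoded into the names used in the Format Example column. The bit positions follow the PCIe AER register layout (as mirrored in the Linux header `pci_regs.h`). The function names `decode_status` and `decode_le_register` are hypothetical and are not taken from the pcie_errors plugin. The little-endian helper reflects the limitation noted for the plugin, on the assumption that register bytes are read raw from configuration space.

```python
# Bit position -> error name, using the names from the Format Example column.
# Positions follow the PCIe AER Uncorrectable Error Status register layout.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol",
    5: "Surprise Down",
    12: "Poisoned TLP",
    13: "Flow Control Protocol",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC",
    20: "Unsupported",
    21: "ACS Violation",
    22: "Internal Uncorrected",
    23: "MC Blocked TLP",
    24: "Atomic Egress Blocked",
    25: "TLP Prefix Blocked",
}

# Positions follow the PCIe AER Correctable Error Status register layout.
CORRECTABLE_BITS = {
    0: "Receiver Error Status",
    6: "Bad TLP Status",
    7: "Bad DLLP Status",
    8: "Replay NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal",
    14: "Corrected Internal",
    15: "Header Log Overflow",
}


def decode_le_register(raw):
    """Interpret 4 raw configuration-space bytes as a little-endian integer."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    return int.from_bytes(raw, "little")


def decode_status(value, bit_names):
    """Return the names of all error bits set in a status register value,
    ordered by ascending bit position."""
    return [name for bit, name in sorted(bit_names.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # Hypothetical register contents: bits 4 and 14 set.
    status = decode_le_register(bytes([0x10, 0x40, 0x00, 0x00]))
    for name in decode_status(status, UNCORRECTABLE_BITS):
        print(name)
```

Keeping the bit-to-name mapping in plain dictionaries makes it easy to compare against the table row by row.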

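The device features above (domain, bus, function) are hexadecimal components of a PCI address in domain:bus:device.function form. The sketch below, written under that assumption, splits such an address string into its numeric parts. `PciAddress` and `parse_bdf` are hypothetical names chosen for illustration, and the example addresses are made up.

```python
import re
from typing import NamedTuple


class PciAddress(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int


# Optional domain (4 to 8 hex digits), then bus:device.function.
_BDF_RE = re.compile(
    r"^(?:([0-9a-fA-F]{4,8}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_bdf(text):
    """Parse '[domain:]bus:device.function' (all hexadecimal) into integers.

    The domain defaults to 0 when it is omitted.
    """
    match = _BDF_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a PCI address: {text!r}")
    domain, bus, device, function = match.groups()
    addr = PciAddress(
        int(domain or "0", 16), int(bus, 16), int(device, 16), int(function, 16)
    )
    # A PCI bus has at most 32 device slots (0x00 to 0x1f).
    if addr.device > 0x1F:
        raise ValueError(f"device number out of range: {text!r}")
    return addr


if __name__ == "__main__":
    print(parse_bdf("0000:3b:00.1"))
```

Returning a named tuple keeps the four fields separate, so each can be reported as its own hexadecimal value as in the table.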

Sub-sections:

PCIe Errors High Level Design

PCIe RAS Executed Tests 
