Anuket Project

 

NOTE: Unfortunately the patches were rejected by the OVS community.



Implementation of VLAN tunnel (QinQ) sFlow-related counters

Status: The patch was rejected, as the counter type it uses was not supported in sFlow.

Requirement

 Exposure of the "Extended VLAN tunnel information" from the sFlow RFC (http://www.sflow.org/sflow_version_5.txt) in Open vSwitch (OVS).

Overview

 Open vSwitch (OVS) supports traffic monitoring using sFlow protocol (see: http://docs.openvswitch.org/en/latest/howto/sflow/).

The sFlow monitoring system consists of an sFlow Agent (embedded in OVS) and a central sFlow Collector. The sFlow Agent uses two forms of sampling: statistical packet-based sampling of switched or routed packet flows, and time-based sampling of counters.

Recently, the QinQ (802.1ad) feature was added to Open vSwitch ("Add support for 802.1ad (QinQ tunneling)", commit f0fb825a).

The patch aims to implement QinQ-related sFlow counters.

Feature description

The patch implements reporting of QinQ-related counters in accordance with the sFlow specification.

The result is not visible to the Open vSwitch user; it is visible to the sFlow collector user.

Open vSwitch sends the stack of stripped VLAN tags in the sFlow datagram. The exact scenario in which the information is exposed to the sFlow collector is specified in the sFlow documentation.

This is the relevant excerpt:

/* Extended VLAN tunnel information
  Record outer VLAN encapsulations that have
  been stripped. extended_vlantunnel information
  should only be reported if all the following conditions are satisfied:
     1. The packet has nested vlan tags, AND
     2. The reporting device is VLAN aware, AND
     3. One or more VLAN tags have been stripped, either
        because they represent proprietary encapsulations, or
        because switch hardware automatically strips the outer VLAN
        encapsulation.
   Reporting extended_vlantunnel information is not a substitute for
   reporting extended_switch information. extended_switch data must
   always be reported to describe the ingress/egress VLAN information
   for the packet. The extended_vlantunnel information only applies to
   nested VLAN tags, and then only when one or more tags has been
   stripped. */

The sFlow collector receives the information in the extended_vlantunnel structure in the scenario described above.
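Per the sFlow v5 specification, the extended_vlantunnel record (format 1012) carries the stripped tags as a variable-length array of 32-bit integers, one TPID/TCI pair per word, listed from outermost to innermost. The following Python sketch is illustrative only (the helper name and bit layout shown are our assumptions, not OVS code); it shows how the stack for a single stripped 802.1ad tag with VLAN id 150 would be XDR-encoded:

```python
import struct

# sFlow v5 extended_vlantunnel: a variable-length array of 32-bit words,
# one per stripped tag, outermost first.  Assumed word layout: TPID in the
# upper 16 bits, TCI in the lower 16 bits (TCI = PCP:3 | DEI:1 | VID:12).
def encode_vlantunnel_stack(tags):
    """tags: list of (tpid, pcp, dei, vid) tuples, outermost first."""
    words = [(tpid << 16) | (pcp << 13) | (dei << 12) | vid
             for tpid, pcp, dei, vid in tags]
    # XDR: array length as a big-endian 32-bit int, then the entries.
    return struct.pack(">I%dI" % len(words), len(words), *words)

# The manual test below strips one outer 802.1ad tag: TPID 0x88a8, VID 150.
blob = encode_vlantunnel_stack([(0x88A8, 0, 0, 150)])
```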

Test preparation

Open vSwitch configuration

Open vSwitch should be configured according to: http://docs.openvswitch.org/en/latest/intro/install/general/

Create one bridge with two ports. The two ports should be reachable from the Ixia traffic generator.

Add an OpenFlow rule to the bridge. The rule should strip the VLAN tag from packets received on port 1 and forward them to port 2.
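The setup steps above can be sketched as follows. The bridge and interface names are illustrative assumptions, and the interfaces are assumed to come up as OpenFlow ports 1 and 2:

```shell
# Hypothetical interfaces facing the Ixia ports; assumed to become
# OpenFlow ports 1 and 2 respectively.
ovs-vsctl add-br br0
ovs-vsctl add-port br0 eth1
ovs-vsctl add-port br0 eth2

# Strip the (outer) VLAN tag from packets received on port 1
# and forward them to port 2.
ovs-ofctl add-flow br0 "in_port=1,actions=strip_vlan,output:2"
```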

Enabling Open vSwitch to report to sFlow collector requires additional configuration.

$ ovs-vsctl -- --id=@sflow create sflow agent=${AGENT_IP} \
    target="${COLLECTOR_IP}:${COLLECTOR_PORT}" header=${HEADER_BYTES} \
    sampling=${SAMPLING_N} polling=${POLLING_SECS} \
    -- set bridge br0 sflow=@sflow

These are example values for the variables in the command:

COLLECTOR_IP=10.0.0.1
COLLECTOR_PORT=6343
AGENT_IP=eth1
HEADER_BYTES=128
SAMPLING_N=64
POLLING_SECS=10

More information about configuring Open vSwitch for sFlow purposes can be found here: http://docs.openvswitch.org/en/latest/howto/sflow/

sFlow collector configuration

sflowtool was used for manual tests. The tool can be downloaded from its GitHub repository: https://github.com/sflow/sflowtool

Please follow the instructions in the tool’s README file to configure sFlow collector.

Manual test

The manual test was performed using the Ixia traffic generator.

The traffic generator was configured to transmit packets with two VLAN tags: the inner tag with ethertype 0x8100 and VLAN id 2000, and the outer tag with ethertype 0x88a8 and VLAN id 150.

Choose ‘Edit Streams’ on the transmitting port in Ixia. Modify the packets with the chosen source and destination addresses for layer 2.

Check the VLAN(s) option in the ‘Protocols’ tab. Edit the VLANs and select ‘Stack VLAN (Q in Q)’.

For the outer VLAN, change the VLAN id to 150 and the ethertype to 0x88a8.

For the inner VLAN, change the VLAN id to 2000 and the ethertype to 0x8100.

Generate the traffic.

sflowtool should report extended_vlantunnel with one stripped VLAN with id 150.

The packets received on the other Ixia port should have one VLAN tag with id 2000.

Automatic tests

Automatic tests have been added with the patch.

The tests consist of two scenarios in which packets are received on port 1, a VLAN tag is stripped, and the packet is forwarded to port 2.

The difference between the scenarios is in the packets received on port 1:

  1. The packet carries two VLAN tags. The ‘Strip VLAN’ action strips the outer tag and the packet is forwarded.
  2. The packet has only one VLAN tag. The tag is stripped and the packet is forwarded.

According to the sFlow specification, only in the first scenario should the stripped VLAN tag be reported in the extended_vlantunnel structure.

The tests are available in the Open vSwitch unit test suite under the title: ofproto-dpif - sFlow packet sampling - Extended VLAN tunnel

Full patch

Patchwork: https://patchwork.ozlabs.org/patch/751796/

MPLS sFlow counters implementation in OVS

Status: Can’t be implemented in OVS.

Introduction

The sFlow RFC (http://www.sflow.org/sflow_version_5.txt) defines the following data structures:

/* Extended MPLS Tunnel */
/* opaque = flow_data; enterprise = 0; format = 1008 */
struct extended_mpls_tunnel {
   string tunnel_lsp_name<>;   /* Tunnel name */
   unsigned int tunnel_id;     /* Tunnel ID */
   unsigned int tunnel_cos;    /* Tunnel COS value */
}

/* Extended MPLS VC */
/* opaque = flow_data; enterprise = 0; format = 1009 */
struct extended_mpls_vc {
   string vc_instance_name<>;  /* VC instance name */
   unsigned int vll_vc_id;     /* VLL/VC instance ID */
   unsigned int vc_label_cos;  /* VC Label COS value */
}

/* Extended MPLS FEC
    - Definitions from MPLS-FTN-STD-MIB mplsFTNTable */
/* opaque = flow_data; enterprise = 0; format = 1010 */
struct extended_mpls_FTN {
   string mplsFTNDescr<>;
   unsigned int mplsFTNMask;
}

/* Extended MPLS LVP FEC
    - Definition from MPLS-LDP-STD-MIB mplsFecTable
    Note: mplsFecAddrType, mplsFecAddr information available
          from packet header */
/* opaque = flow_data; enterprise = 0; format = 1011 */
struct extended_mpls_LDP_FEC {
   unsigned int mplsFecAddrPrefixLength;
}

Most of these counters are taken one-to-one from MIB definitions, for example: https://tools.ietf.org/id/draft-ietf-mpls-ftn-mib-01.txt

The mplsFTNDescr counter is defined in the MplsFTNEntry sequence:
MplsFTNEntry  ::=  SEQUENCE {
      mplsFTNIndex               MplsFTNIndex,
      mplsFTNRowStatus           RowStatus,
      mplsFTNDescr               DisplayString,
      mplsFTNApplied             TruthValue,
      mplsFTNMask                BITS,
      mplsFTNAddrType            InetAddressType,
      mplsFTNSourceIpv4AddrMin   InetAddressIPv4,
      mplsFTNSourceIpv6AddrMin   InetAddressIPv6,
      mplsFTNSourceIpv4AddrMax   InetAddressIPv4,
      mplsFTNSourceIpv6AddrMax   InetAddressIPv6,
      mplsFTNDestIpv4AddrMin     InetAddressIPv4,
      mplsFTNDestIpv6AddrMin     InetAddressIPv6,
      mplsFTNDestIpv4AddrMax     InetAddressIPv4,
      mplsFTNDestIpv6AddrMax     InetAddressIPv6,
      mplsFTNSourcePortMin       MplsPortAddr,
      mplsFTNSourcePortMax       MplsPortAddr,
      mplsFTNDestPortMin         MplsPortAddr,
      mplsFTNDestPortMax         MplsPortAddr,
      mplsFTNProtocol            INTEGER,
      mplsFTNActionType          INTEGER,
      mplsFTNActionPointer       RowPointer,
      mplsFTNStorageType         StorageType
}

All of these counters are part of MPLS-related logic known from commercial solutions, e.g. Cisco routers. OVS, however, is part of the data plane, while the logic behind the MIB counters is defined in the control plane.

 

MPLS vs OVS

OVS is capable of executing simple MPLS actions (such as push/pop label); however, it does not have access to the control-plane data necessary to fill in or calculate the sFlow MPLS-related counters. The paper below gives an example in which NOX is used to configure OVS to support more complex MPLS scenarios:

http://klamath.stanford.edu/~nickm/papers/mpls-sigcomm11.pdf

Conclusion

OVS can be configured to support similar behavior, but the logic above would have to be part of the upper controller implementation. OVS therefore will not have the required information during sampling, as it is not aware of MPLS implementation details.

 

NAT extended counters implementation in sFlow

 

Status: After a meeting with InMon members, it was decided that even though the proposed solution was in line with the sFlow specification, the implementation carries too much risk (the feature seems too complex).

Requirement

Send the NAT-translated addresses in the extended_nat structure, while the sampled packet header (opaque_header) is taken before the translation has happened.

Problem

The current OVS implementation triggers just one sampling action, which can happen either before or after the translation. However, in some NAT configurations it is not possible to determine what the translated address will be.

OVS Current NAT implementation

  • OVS vanilla (kernel space): the NAT mechanism works correctly; no modifications are needed.
  • OVS-DPDK (user space): NAT is currently not supported in userspace. However, there is ongoing work on a patch that will provide this functionality: https://patchwork.ozlabs.org/patch/728533/ This patch is not yet merged into OVS.

OVS NAT configuration


  1. Source NAT many-to-one

In OVS, NAT can be configured using the OpenFlow action `nat`, which is part of the connection tracking mechanism (the `ct` action). Using the `ovs-ofctl` tool, NAT can be configured like this:

ovs-ofctl add-flow br0 idle_timeout=0,in_port=2,ip,action="ct(commit,zone=1, nat(src=12.34.56.78)),1"

In this example only a single IP address is specified for translation, so all ‘private’ source addresses will be translated to this particular address.

  2. Source NAT many-to-many
    This is another example of a NAT configuration. This time, ‘private’ IP addresses are translated to an IP address selected from the specified range between 12.34.56.64 and 12.34.56.79:

    ovs-ofctl add-flow br0 idle_timeout=0,in_port=2,ip,action="ct(commit,zone=1,nat(src=12.34.56.64-12.34.56.79)),1"

    The final IP address used for translation is selected by the internal algorithm of the NAT mechanism. This selected address is then used for translating the source IP address of the packets. There is no way to predict which IP address will be selected from the provided range.

sFlow extended NAT data


The sFlow RFC (http://www.sflow.org/sflow_version_5.txt) defines a structure containing NAT data, which should be exposed from OVS via the sFlow protocol:

 

/* Extended NAT Data
   Packet header records report addresses as seen at the sFlowDataSource.
   The extended_nat structure reports on translated source and/or destination
   addresses for this packet. If an address was not translated it should
   be equal to that reported for the header. */
/* opaque = flow_data; enterprise = 0; format = 1007 */

struct extended_nat {
     address src_address;            /* Source address */
     address dst_address;            /* Destination address */
}

 

sFlow data in many-to-one scenario

This sFlow NAT data structure should include the IP addresses after NAT execution, i.e. the translated ones.

In the many-to-one scenario, the IP address used for translation can be taken from the `nat` action definition and then exposed in the sFlow data like this:

 

struct extended_nat {
   address src_address=12.34.56.78;            /* Source address */
   address dst_address=90.90.90.90;            /* Destination address */
}

 

src_address contains the IP address used by NAT for translation.

In this case everything is fine: exposing sFlow NAT counters can be implemented without any issues.
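For illustration, the XDR encoding of such a record is straightforward: in sFlow v5 an address is encoded as a 32-bit address-type discriminator (1 for IPv4) followed by the raw address bytes. A minimal Python sketch for the many-to-one example above (the helper name is hypothetical, not part of OVS):

```python
import socket
import struct

# sFlow v5 address encoding: 32-bit big-endian address type (1 = IPv4)
# followed by the raw address bytes.  Illustrative sketch, not OVS code.
def encode_extended_nat(src_ip, dst_ip):
    """Encode the extended_nat record body for two IPv4 addresses."""
    IP_V4 = 1
    return b"".join(
        struct.pack(">I", IP_V4) + socket.inet_aton(ip)
        for ip in (src_ip, dst_ip)
    )

# Many-to-one example from the text: src translated to 12.34.56.78.
record = encode_extended_nat("12.34.56.78", "90.90.90.90")
```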

 

sFlow data in many-to-many scenario

In this scenario, the `nat` OpenFlow action specifies a range of IP addresses from which the final address for translation is selected.

 

struct extended_nat {
   address src_address=????;                   /* Source address */
   address dst_address=90.90.90.90;            /* Destination address */
}

In this case src_address cannot be determined: the address is selected by the NAT-internal algorithm from the specified range of IP addresses.


 

Analysis of a possible solution for the OVS userspace (OVS with DPDK) NAT implementation in the many-to-many scenario

OVS sFlow monitoring is composed of the `sample` and `userspace` actions. With sFlow and NAT enabled, the following actions are involved:

# ovs-appctl dpctl/dump-flows

[...] actions:sample(sample=1.0%,actions(userspace(pid=0,sFlow(vid=0,pcp=0,output=2147483649),actions))),ct(commit,zone=1,nat(src=10.0.0.1-10.0.0.255)),2

[…]

In the current OVS implementation the `sample` action is always executed before `ct`, so the NAT translation is not visible in the sFlow upcall. If the order were changed so that `ct` executes before `sample`, the sFlow upcall would receive the packet after the NAT translation, and the original source IP address would be lost.

 

 

POSSIBLE SOLUTION:

Creating the sFlow agent would trigger the creation of two actions:

  • the normal sFlow sampling action (currently implemented),
  • a post-sFlow sampling action.

 

General flow: 

 

SUMMARY:

Advantages:

  • In theory such a mechanism could also benefit other sampling protocols (or new counters in already existing protocols) in the future; currently, however, only the sFlow NAT counters would benefit from this approach.

Disadvantages:

  • Possible push-back from the community, as the implementation:
    • is complex,
    • will affect performance in some way,
    • complicates the OVS configuration, as an additional action (rule) will be created,
    • while the gain is just one counter in the sFlow protocol.
  • It is hard to say whether the pre and post actions can be easily synchronized (the normal sampling action is called with some probability); this is still to be verified. If not, the post action would have to be called for every packet.
  • This solution would work only for the userspace implementation (OVS-DPDK); it would not work for kernel-space NAT (OVS vanilla).