

# **DPDK Summit**


### VPP overview

Shwetha Bhandari Developer@Cisco



### Scalar Packet Processing

- A fancy name for processing one packet at a time
- Traditional, straightforward implementation scheme
- Interrupt, a calls b calls c ... return return return
- Issues:
  - Thrashing the I-cache (when code path length exceeds the primary I-cache size)
  - Dependent read latency (packet headers, forwarding tables, stack, other data structures)
  - Each packet incurs an identical set of I-cache and D-Cache misses
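The I-cache issue above can be illustrated with a toy model. This is a minimal sketch, not a measurement: the `ICache` class, the LRU eviction policy, and the three stages `a`, `b`, `c` are all assumptions made for illustration. It shows that when the code path is larger than the cache, every packet pays the same set of misses again.

```python
from collections import OrderedDict


class ICache:
    """Toy LRU instruction cache holding a fixed number of code blocks (assumed model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.misses = 0

    def fetch(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)  # hit: mark as most recently used
            return
        self.misses += 1
        self.blocks[block] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block


def scalar_misses(num_packets, stages, cache_capacity):
    """Scalar processing: run each packet through every stage before the next packet."""
    cache = ICache(cache_capacity)
    for _ in range(num_packets):
        for stage in stages:
            cache.fetch(stage)
    return cache.misses


stages = ["a", "b", "c"]  # a calls b calls c
print(scalar_misses(256, stages, cache_capacity=2))  # code path exceeds the cache
print(scalar_misses(256, stages, cache_capacity=3))  # code path fits in the cache
```

With a cache one block too small, every fetch in this model misses, so the miss count grows linearly with the number of packets.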





### **Packet Processing Budget**

### 14 Mpps on 3.5 GHz CPU = 250 cycles per packet
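The budget above is a single division. The sketch below reproduces it and, as an assumption not stated on the slide, relates the 14 Mpps figure to 10 GbE line rate with minimum-size 64-byte frames (20 bytes of preamble, start-of-frame delimiter and inter-frame gap per frame). The function names are illustrative.

```python
def line_rate_pps(link_bps, frame_bytes, overhead_bytes=20):
    """Packets per second at line rate; overhead covers preamble, SFD and inter-frame gap."""
    return link_bps / ((frame_bytes + overhead_bytes) * 8)


def cycles_per_packet(cpu_hz, pps):
    """Cycle budget available for each packet on one core."""
    return cpu_hz / pps


print(cycles_per_packet(3.5e9, 14e6))  # the slide's figure: 250.0
pps_10g = line_rate_pps(10e9, 64)      # assumed origin of the ~14 Mpps number
print(round(pps_10g), round(cycles_per_packet(3.5e9, pps_10g), 1))
```

At the exact 10 GbE line rate for 64-byte frames, the budget in this calculation is slightly tighter than the rounded 250 cycles.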



### Memory Read/Write latency



BUT memory is ~70+ ns away (i.e. 140+ cycles at 2.0 GHz)

Source: Intel® 64 and IA-32 Architectures: Optimization Reference Manual
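To compare this latency with the per-packet budget, the conversion can be written out. The 2.0 GHz case is the slide's own figure; applying the same 70 ns to the 3.5 GHz clock used for the budget is an extrapolation, and the function names are illustrative.

```python
def ns_to_cycles(latency_ns, cpu_ghz):
    """Convert a latency in nanoseconds to CPU cycles (1 GHz = 1 cycle per ns)."""
    return latency_ns * cpu_ghz


def dependent_misses_in_budget(budget_cycles, miss_cycles):
    """How many back-to-back (dependent) memory reads fit into the cycle budget."""
    return int(budget_cycles // miss_cycles)


print(ns_to_cycles(70, 2.0))  # slide's figure: 140.0 cycles
print(ns_to_cycles(70, 3.5))  # extrapolated to the 3.5 GHz budget clock: 245.0 cycles
print(dependent_misses_in_budget(250, ns_to_cycles(70, 3.5)))
```

Under these assumptions a single unhidden memory read consumes almost the whole 250-cycle budget, which is why hiding that latency matters.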






### Introducing VPP: the vector packet processor




# Introducing VPP

#### Accelerating the dataplane since 2002

#### Fast, Scalable and consistent

- 14+ Mpps per core
- Tested to 1TB
- Scalable FIB: supporting millions of entries
- 0 packet drops, ~15µs latency

### Optimized

- **DPDK** for fast I/O
- ISA: SSE, AVX, AVX2, NEON ...
- **IPC:** Batching, no mode switching, no context switches, non-blocking
- Multi-core: Cache and memory efficient





# Introducing VPP

#### Extensible and Flexible modular design

- Implemented as a directed graph of nodes
- Extensible with plugins; plugins are equal citizens.
- Configurable via CP and CLI

#### **Developer friendly**

- Deep introspection with counters and tracing facilities.
- Runtime counters with IPC and errors information.
- Pipeline tracing facilities, life-of-a-packet.
- Developed using standard toolchains.





# Introducing VPP

#### **Fully featured**

- L2: VLAN, Q-in-Q, Bridge Domains, LLDP ...
- L3: IPv4, GRE, VXLAN, DHCP, IPsec ...
- L3: IPv6, Discovery, Segment Routing ...
- **CP:** CLI, IKEv2 ...

#### Integrated

- Language bindings
- OpenStack/ODL (NETCONF/YANG)
- Kubernetes/Flannel (Python API)
- OSV Packaging



### VPP in the Overall Stack







### VPP: Dipping into internals...



### **VPP Graph Scheduler**



- Always process as many packets as possible
- As vector size increases, processing cost per packet decreases
- Amortize I-cache misses
- Native support for interrupt and polling modes
- Node types:
  - Internal
  - Process
  - Input
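The claim that per-packet cost falls as the vector grows can be written as a simple cost model. This is a sketch under assumed numbers: the split into a fixed per-vector cost (for example I-cache warm-up) and a per-packet cost, and the values 1000 and 100 cycles, are hypothetical and not taken from the slides.

```python
def cost_per_packet(vector_size, fixed_cost, per_packet_cost):
    """Amortized cycles per packet: the fixed cost is shared by the whole vector."""
    return fixed_cost / vector_size + per_packet_cost


FIXED = 1000      # hypothetical cycles paid once per vector
PER_PACKET = 100  # hypothetical cycles paid for every packet

for size in (1, 4, 64, 256):
    print(size, cost_per_packet(size, FIXED, PER_PACKET))
```

In this model the per-packet cost approaches the per-packet term as the vector grows, which is the amortization the scheduler relies on.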







### How does it work?





Packet processing is decomposed into a directed graph of nodes ...

\* approx. 173 nodes in default deployment

... graph nodes are optimized to fit inside the instruction cache ...

[Figure: microprocessor with instruction cache and data cache]

... packets are pre-fetched into the data cache ...

... any remaining packets are processed one by one ...

### How does it work?





    dispatch fn()
      while packets in vector
        Get pointer to vector
        while 4 or more packets
          PREFETCH #1 and #2
          PROCESS #1 and #2
          ASSUME next_node same as last packet
          Update counters, advance buffers
          Enqueue the packet to next_node
        while any packets
          <as above but single packet>

... prefetch packets #1 and #2 ...





    while packets in vector
      Get pointer to vector
      while 4 or more packets
        PREFETCH #3 and #4
        PROCESS #1 and #2
        ASSUME next_node same as last packet
        Update counters, advance buffers
        Enqueue the packet to next_node
      while any packets
        <as above but single packet>

... process packets #3 and #4 ...

... update counters, enqueue packets to the next node ...
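The dispatch pseudocode above can be sketched in runnable form. This is a simplified illustration, not VPP's implementation: Python cannot issue cache prefetches, so `prefetch` is a no-op stand-in; `classify`, the node names `ipv4` and `drop`, and the `stats` fields are hypothetical; and the outer "while packets in vector" loop is flattened because the toy frames never fill up.

```python
def prefetch(packet):
    """Stand-in for a cache prefetch hint; Python cannot issue one, so this is a no-op."""


def classify(packet):
    """Hypothetical per-packet work: choose the next node from the EtherType."""
    return "ipv4" if packet["ethertype"] == 0x0800 else "drop"


def dispatch(vector):
    """Process a vector two packets at a time, then finish the remainder one by one."""
    frames = {}  # next node name -> packets enqueued to it
    stats = {"packets": 0, "dual_iterations": 0, "single_iterations": 0,
             "next_node_switches": 0}
    speculated = None  # next node assumed for the upcoming packet
    frame = None       # frame belonging to the speculated next node

    def process_and_enqueue(packet):
        nonlocal speculated, frame
        next_node = classify(packet)
        if next_node != speculated:
            # Speculation failed: switch to the frame of the actual next node.
            speculated = next_node
            frame = frames.setdefault(next_node, [])
            stats["next_node_switches"] += 1
        frame.append(packet)
        stats["packets"] += 1

    i, n = 0, len(vector)
    while n - i >= 4:  # dual loop: needs two packets to process and two to prefetch
        prefetch(vector[i + 2])
        prefetch(vector[i + 3])
        process_and_enqueue(vector[i])
        process_and_enqueue(vector[i + 1])
        stats["dual_iterations"] += 1
        i += 2
    while i < n:  # single loop for the remaining packets
        process_and_enqueue(vector[i])
        stats["single_iterations"] += 1
        i += 1
    return frames, stats


vector = [{"id": k, "ethertype": 0x0800} for k in range(7)]
vector[5]["ethertype"] = 0x86DD  # one packet that takes a different next node
frames, stats = dispatch(vector)
print(stats)
```

The speculation pays off when consecutive packets share a next node, since the common case then appends to an already selected frame without a lookup.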



### Modularity Enabling Flexible Plugins

### Plugins can:

- Introduce new graph nodes
- Rearrange packet processing graph
- Be built independently of the VPP source tree
- Be added at runtime (drop into plugin directory)
- All in user space
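How a plugin might introduce a node and rearrange the graph can be sketched with a toy registry. Everything here is hypothetical: the `Graph` class, `register_plugin`, and the node names are invented for illustration and are not VPP's plugin API. The graph is also simplified to a linear chain, whereas a real node can have several next nodes.

```python
class Graph:
    """Toy packet-processing graph, simplified to a chain of named nodes."""

    def __init__(self):
        self.nodes = {}  # node name -> handler taking and returning a vector
        self.next = {}   # node name -> name of the next node (None ends the chain)

    def add_node(self, name, handler, next_node=None):
        self.nodes[name] = handler
        self.next[name] = next_node

    def insert_after(self, existing, name, handler):
        """Rearrange the graph: splice a new node between `existing` and its successor."""
        self.add_node(name, handler, self.next[existing])
        self.next[existing] = name

    def run(self, start, vector):
        node = start
        while node is not None:
            vector = self.nodes[node](vector)
            node = self.next[node]
        return vector


def register_plugin(graph):
    """Hypothetical plugin entry point: adds a node that filters out TTL-expired packets."""
    graph.insert_after("input", "ttl-filter",
                       lambda vector: [p for p in vector if p["ttl"] > 0])


graph = Graph()
graph.add_node("input", lambda vector: vector, next_node="output")
graph.add_node("output", lambda vector: vector)

packets = [{"ttl": 64}, {"ttl": 0}, {"ttl": 1}]
print(len(graph.run("input", packets)))  # before the plugin is loaded
register_plugin(graph)
print(len(graph.run("input", packets)))  # after the plugin spliced in its node
```

The point of the sketch is that the core graph needs no changes: the plugin adds its node and rewires one arc at load time.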

### Enabling:

- Ability to take advantage of diverse hardware when present
- Support for multiple processor architectures (x86, ARM, PPC)
- Few dependencies on the OS (clib), allowing easier ports to other OSes/environments



### VPP: performance








### **VPP**: *integrations*






## Summary



- VPP is a fast, scalable and low-latency network stack in user space.
- VPP is a traceable, debuggable and fully featured layer 2, 3 and 4 implementation.
- VPP is easy to integrate with your data-centre environment for both NFV and Cloud use cases.
- VPP is always growing, innovating and getting faster.
- VPP is a fast-growing community of fellow travellers.

ML: vpp-dev@lists.fd.io

Wiki: wiki.fd.io/view/VPP

Join us in FD.io & VPP - fellow travellers are *always* welcome. Please reuse and contribute!

### Contributors...





# **THANK YOU**