An Evolution of Network Telemetry - Open Compute Project

An Evolution of Network Telemetry

Tal Mizrahi

Marvell

Telemetry BoF

Network Telemetry

Self-driving Networks

Easy-to-troubleshoot

High reliability

Software-defined Networks

High speed

Easy-to-manage

Network Telemetry

Are we reinventing the wheel?

3/23/2018

Ping / Traceroute

Reinventing the wheel?

Counters

Per port

Queue State

Per flow

Latency

Per queue





Old-School Passive Monitoring

Carrier Network OAM

Higher Layers IP OAM

IETF ICMPv4

IETF ICMPv6

IETF IPPM

Layer 3 MPLS / PWE3 OAM

ITU-T Y.1711 MPLS OAM

ITU-T G.8113.1 MPLS-TP OAM

IETF MPLS-TP OAM

IETF LSP-Ping MPLS OAM

ITU-T Y.1731

Layer 2 Ethernet OAM

IETF PWE3 VCCV

IETF BFD

IEEE 802.1ag IEEE 802.3ah

Layer 1

OAM Protocols

Active measurement / monitoring:

Control Message

Control Message

Piggybacked Measurement

Telemetry is piggybacked onto data packets

IOAM / INT

AM-PM

Piggybacked Metadata – IOAM / INT

Analytics Server Telemetry Info

IOAM / INT Domain

Switches push local metadata into header: delay, queue state, …

IOAM (IETF) INT (P4) +SAI TAM
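
As a rough illustration of the piggybacking idea on this slide, the Python sketch below models each switch in the domain appending its local metadata to the packet, and the domain edge handing the collected list to the analytics server. The `HopMetadata` fields, the `Packet` class, and the function names are hypothetical and chosen for illustration; they do not reflect the actual IOAM or INT header formats.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HopMetadata:
    # Hypothetical per-hop fields; real IOAM/INT headers define their own layouts.
    switch_id: int
    delay_us: int
    queue_depth: int


@dataclass
class Packet:
    payload: bytes
    telemetry: List[HopMetadata] = field(default_factory=list)


def transit(packet: Packet, switch_id: int, delay_us: int, queue_depth: int) -> Packet:
    """Model a switch inside the domain pushing its local metadata onto the packet."""
    packet.telemetry.append(HopMetadata(switch_id, delay_us, queue_depth))
    return packet


def egress(packet: Packet) -> List[HopMetadata]:
    """Model the domain edge: strip the telemetry and return it for export."""
    collected, packet.telemetry = packet.telemetry, []
    return collected
```

Because every hop adds a record, the per-packet overhead grows with path length, which is the cost the later slides on selective exporting try to limit.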

AM-PM: Alternate Marking – Performance Measurement

(RFC 8321)

Counts number of packets sent

Counts number of packets received

Data Center Network Servers

Servers

Packets Sent: 10,000

Packets Received: 9,500

Packets Lost: 500

Analytics Server
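
The loss figure on this slide is a simple counter difference. The sketch below assumes that each measurement point keeps one packet counter per marking color and that the counters for a completed block are compared afterwards; the dictionary layout and the `block_loss` name are illustrative assumptions, not part of RFC 8321.

```python
from typing import Dict


def block_loss(tx_count: Dict[int, int], rx_count: Dict[int, int], color: int) -> int:
    """Packets lost in the completed block of the given color."""
    return tx_count[color] - rx_count[color]


# Per-color counters at the two measurement points.
# The color-0 values are the figures from the slide; color 1 is left at zero.
tx_count = {0: 10_000, 1: 0}
rx_count = {0: 9_500, 1: 0}

lost = block_loss(tx_count, rx_count, color=0)
print(lost, lost / tx_count[0])  # 500 0.05
```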

Network Telemetry: An Evolution

Reinventing the wheel?

NO !

Ping Traceroute

Passive Monitoring

Carrier OAM

IOAM / INT AM-PM

Marvell’s Network Telemetry Solutions

Active Telemetry

Selective Probing

Timestamping

Metadata Grab-’n-Go

Passive Telemetry

Counters

Flow-based Probes

IEEE 1588

Per-hop Telemetry

Metadata: Grab-’n-Go

Packet Metadata

Incoming Packet

Fixed Programmable Logic Programmable

Modified Packet

Packet Processing

Marvell Prestera

Timestamping Everything

Packet Metadata

Incoming Packet

Packet Processing

Outgoing Packet

IEEE 1588 Synchronized Clock

Marvell Prestera

✔ Periodic probing ✔ AM-PM ✔ TimeFlip

AM-PM using TimeFlip

Marvell Prestera Switch

Periodic range

... timestamp value

TCAM header / timestamp metadata

*…*

field2 field3 field4 …

Time.Sec

1

action

1 second

*…*

Time.Frac
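
One way to read this slide is that a TCAM rule matching on a single bit of the packet timestamp can produce a marking color that flips once per second. The sketch below illustrates that reading with a value/mask (ternary) match on the least significant bit of Time.Sec. The 32-bit width of Time.Frac, the concatenated key layout, and all function names are assumptions made for the example, not details taken from the Prestera implementation or the TimeFlip paper.

```python
FRAC_BITS = 32  # assumed width of the fractional-seconds field


def ternary_match(key: int, value: int, mask: int) -> bool:
    """TCAM-style match: compare only the bits selected by the mask."""
    return (key & mask) == (value & mask)


def make_timestamp(sec: int, frac: int = 0) -> int:
    """Concatenate Time.Sec and Time.Frac into one integer key."""
    return (sec << FRAC_BITS) | frac


# One rule: match when the least significant bit of Time.Sec is 1.
# All other timestamp bits are "don't care".
COLOR_MASK = 1 << FRAC_BITS
COLOR_VALUE = 1 << FRAC_BITS


def marking_color(timestamp: int) -> int:
    """Return 1 during odd seconds and 0 during even seconds."""
    return 1 if ternary_match(timestamp, COLOR_VALUE, COLOR_MASK) else 0
```

Since the rule depends only on a synchronized clock, two switches with IEEE 1588 time would switch colors at the same instant without exchanging any control messages.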

Per-Flow Congestion Detection using AM-PM

loss and delay → congestion is detected
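
A minimal sketch of the per-flow check, assuming that AM-PM yields sent and received counts plus a delay value for each flow and marking block. It treats "loss and delay" as both conditions holding at once, which is one reading of the slide. The `FlowBlockStats` type and the threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FlowBlockStats:
    # Hypothetical per-flow measurements for one marking block.
    sent: int
    received: int
    delay_us: float


def congested_flows(
    stats: Dict[str, FlowBlockStats],
    max_loss: int = 0,            # assumed loss threshold, in packets
    max_delay_us: float = 100.0,  # assumed delay threshold, in microseconds
) -> List[str]:
    """Return the flows that show both loss and delay above the thresholds."""
    return [
        flow
        for flow, s in stats.items()
        if (s.sent - s.received) > max_loss and s.delay_us > max_delay_us
    ]
```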

Selective Exporting

Analytics Server

Telemetry Info

✔ Drop ✔ Congestion ✔ 1 of N ✔ Periodic

Marvell Prestera
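
The four export triggers listed on this slide can be sketched as a single per-packet decision. The class below is an illustrative assumption, not the switch's actual logic: the `SelectiveExporter` name, the order in which triggers are checked, and the default values of `n` and `period_s` are all hypothetical.

```python
from typing import Optional


class SelectiveExporter:
    """Decide, per packet, whether telemetry should be exported and why."""

    def __init__(self, n: int = 1000, period_s: float = 1.0) -> None:
        self.n = n                # export 1 of every n packets
        self.period_s = period_s  # minimum spacing of periodic exports
        self.seen = 0
        self.next_periodic = 0.0

    def reason(self, now: float, dropped: bool = False,
               congested: bool = False) -> Optional[str]:
        """Return the trigger that fired for this packet, or None."""
        self.seen += 1
        if dropped:
            return "drop"
        if congested:
            return "congestion"
        if self.seen % self.n == 0:
            return "1 of N"
        if now >= self.next_periodic:
            self.next_periodic = now + self.period_s
            return "periodic"
        return None
```

Most packets return `None`, so only a small fraction of traffic results in a report to the analytics server.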

IOAM/INT: Periodic Exporting

Analytics Server

Export one packet per second

Telemetry Info

Per flow / per port

Export last ms of every second

IOAM / INT Domain
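
"Export last ms of every second" amounts to a range check on the packet timestamp. The sketch below assumes an integer nanosecond timestamp; the function name and the default window are illustrative.

```python
NS_PER_S = 1_000_000_000
NS_PER_MS = 1_000_000


def in_export_window(timestamp_ns: int, window_ns: int = NS_PER_MS) -> bool:
    """True if the timestamp falls within the last `window_ns` of its second."""
    return timestamp_ns % NS_PER_S >= NS_PER_S - window_ns
```

Integer arithmetic is used here so that the window boundary is exact and does not depend on floating-point rounding.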

IOAM/INT: Adaptive Exporting

Analytics Server

Event-driven: ✔ On drop ✔ On congestion ✔ On high rate

IOAM / INT Domain

Combining IOAM/INT with AM-PM

Analytics Server

Challenge: export telemetry info without the expensive overhead of IOAM/INT.

AM-PM as a trigger for exporting telemetry info.

Per-hop Telemetry

IOAM / INT Domain

The Network Telemetry Toolset

Selective Probing

Counters

Timestamping

Metadata Grab-’n-Go

Flow-based Probes

IEEE 1588

Marvell Prestera

References

[1] Mizrahi, T., Vovnoboy, V., Nisim, M., Navon, G., and A. Soffer, “Network Telemetry Solutions for Data Center and Enterprise Networks”, Marvell white paper, 2018.

[2] Brockners, F., Bhandari, S., Pignataro, C., Gredler, H., Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov, P., Chang, R., and D. Bernier, “Data Fields for In-situ OAM”, draft-ietf-ippm-ioam-data-00, work in progress, 2017.

[3] C. Kim et al., “In-band network telemetry (INT)”, P4 consortium, 2015.

[4] Fioccola, G., Capello, A., Cociglio, M., Castaldelli, L., Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi, “Alternate Marking method for passive and hybrid performance monitoring”, RFC 8321, 2018.

[5] Mizrahi, T., Rottenstreich, O., and Y. Moses, “TimeFlip: Scheduling Network Updates with Timestamp-based TCAM Ranges”, IEEE INFOCOM, 2015.
