A framework for historical analysis and real-time monitoring of BGP data
Chiara Orsini, Alistair King, Danilo Giordano, Vasileios Giotsas, Alberto Dainotti
alistair@caida.org
CAIDA, UC San Diego
BGPSTREAM: BGP data analysis for the masses
• Open source libraries, APIs and tools for live and historical BGP data analysis
• Simple API
• Versatile
• Facilitates reproducibility and repeatability
• Real-time monitoring
• Stable: https://bgpstream.caida.org
MOTIVATION: Why BGPStream?
• BGP research and monitoring is important
• Lots of existing BGP measurement data
  • Route Views and RIPE RIS have >15 years of data (16 TB)
• BUT, there is a distinct lack of good tooling for processing and analyzing BGP data
• State of the art?
    wget http://archive.org/xyz/abc/file.mrt
    bgpdump -m file.mrt | my_parser.py
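The `my_parser.py` stage of the pipeline above is left to every user to write. As an illustration of what that stage involves, here is a minimal sketch, assuming the pipe-separated, one-route-per-line output of `bgpdump -m` (record type, timestamp, entry type `B`/`A`/`W`, peer IP, peer AS, prefix, then the AS path and further attributes). The helper name `parse_bgpdump_line` and the dict keys it returns are hypothetical.

```python
from typing import Optional


def parse_bgpdump_line(line: str) -> Optional[dict]:
    """Parse one line of `bgpdump -m` output (hypothetical helper).

    Assumed layout: TYPE|TIMESTAMP|ENTRY|PEER_IP|PEER_AS|PREFIX|AS_PATH|...
    where ENTRY is B (RIB entry), A (announcement) or W (withdrawal).
    """
    fields = line.rstrip("\n").split("|")
    # Skip short lines and non-route entries such as state changes
    if len(fields) < 6 or fields[2] not in ("B", "A", "W"):
        return None
    entry = fields[2]
    route = {
        "type": entry,
        "time": int(fields[1]),
        "peer_ip": fields[3],
        "peer_asn": int(fields[4]),
        "prefix": fields[5],
        "as_path": [],
    }
    # Withdrawals carry no AS path; RIB entries and announcements do
    if entry in ("B", "A") and len(fields) > 6:
        route["as_path"] = fields[6].split()
    return route
```

Looping `for line in sys.stdin` over this function gives the `my_parser.py` stage; every project ends up re-writing something like it, which is the gap BGPStream targets.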
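THE BGPSTREAM FRAMEWORK: An overview
[Figure: architecture diagram. User code (through the Python API or user libraries) runs on libBGPStream, which sends metadata queries to the Broker and retrieves MRT data via HTTP from the public HTTP data archives; a metadata crawler supplies the Broker's metadata.]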
THE BGPSTREAM FRAMEWORK: Stacked view
[Figure: stacked view of the framework components, layers numbered 1 to 3]
BGPSTREAM USER LIBRARY: libBGPStream
• Issues queries to the metadata broker
• Retrieves data directly from Data Providers
• Currently supports MRT (RFC 6396)
• Merges data from many sources into a single stream
• Provides time-ordered sorting
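The last two points amount to a k-way merge: each dump file is already ordered by time, so a single time-ordered stream can be produced by repeatedly emitting the earliest pending record across all sources. A minimal sketch of that idea using Python's `heapq.merge` on hypothetical `(timestamp, collector, payload)` tuples; it illustrates the technique and is not libBGPStream's actual implementation.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# Hypothetical record shape: (timestamp, collector, payload)
Record = Tuple[int, str, str]


def merged_stream(*sources: Iterable[Record]) -> Iterator[Record]:
    """Merge individually time-sorted sources into one time-sorted stream.

    heapq.merge holds only one pending record per source in memory, so the
    sources can be lazy readers over large dump files.
    """
    return heapq.merge(*sources, key=lambda rec: rec[0])


rrc00 = [(100, "rrc00", "A 192.0.2.0/24"), (160, "rrc00", "W 192.0.2.0/24")]
route_views2 = [(90, "route-views2", "A 198.51.100.0/24"),
                (120, "route-views2", "A 203.0.113.0/24")]

for ts, collector, payload in merged_stream(rrc00, route_views2):
    print(ts, collector, payload)
```

Because only the head of each source is buffered, memory grows with the number of sources rather than with the volume of data.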
RECORDS & ELEMS: Extracting information from MRT
• BGPStream Record:
  • Encapsulates an MRT record
  • Adds metadata (e.g., collector)
• MRT records (may) contain info for multiple peers/prefixes
  • E.g., routes to a single prefix from multiple peers in a RIB dump
• Records are decomposed into BGPStream Elems:
  • E.g., a prefix announcement from a single peer

BGPStream Record
Field        | Type   | Function
project      | string | project name (e.g., Route Views)
collector    | string | collector name (e.g., rrc00)
type         | enum   | RIB or Updates
dump time    | long   | time the containing dump began
position     | enum   | first, middle, or last record of a dump
time         | long   | timestamp of the MRT record
status       | enum   | record validity flag
MRT record   | struct | de-serialized MRT record

BGPStream Elem
Table 1: BGPStream elem fields.
Field        | Type   | Function
type         | enum   | route from a RIB dump, announcement, withdrawal, or state message
time         | long   | timestamp of the MRT record
peer address | struct | IP address of the VP (vantage point)
peer ASN     | long   | AS number of the VP
prefix*      | struct | IP prefix
next hop*    | struct | IP address of the next hop
AS path*     | struct | AS path
community*   | struct | community attribute
old state*   | enum   | FSM state (before the change)
new state*   | enum   | FSM state (after the change)
* denotes a field conditionally populated based on type
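To make the record/elem relationship concrete, here is a minimal sketch using Python dataclasses. The class and field names loosely mirror the tables above but are hypothetical, not the libBGPStream or PyBGPStream types: a RIB record describing one prefix seen by several peers is expanded into one elem per peer.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Elem:
    """One route from one peer (hypothetical subset of the elem fields)."""
    type: str                            # "rib", "announcement", "withdrawal" or "state"
    time: int                            # timestamp of the MRT record
    peer_address: str                    # IP address of the vantage point
    peer_asn: int                        # AS number of the vantage point
    prefix: Optional[str] = None         # only set for route-carrying types
    as_path: Optional[List[int]] = None  # only set for route-carrying types


@dataclass
class Record:
    """One MRT RIB record plus BGPStream-style metadata (hypothetical subset)."""
    project: str
    collector: str
    type: str                            # "rib" or "updates"
    time: int
    prefix: str                          # the prefix this RIB entry describes
    # One (peer_address, peer_asn, as_path) tuple per peer that has a route
    entries: List[Tuple[str, int, List[int]]] = field(default_factory=list)

    def elems(self) -> List[Elem]:
        """Decompose the record into one elem per peer."""
        return [
            Elem("rib", self.time, addr, asn, self.prefix, path)
            for addr, asn, path in self.entries
        ]


rec = Record(
    project="routeviews", collector="route-views2", type="rib", time=1438415400,
    prefix="192.0.2.0/24",
    entries=[("198.51.100.1", 64496, [64496, 64511]),
             ("203.0.113.7", 64500, [64500, 64501, 64511])],
)
for e in rec.elems():
    print(e.peer_asn, e.prefix, e.as_path)
```

The conditionally populated fields are modelled as `Optional` with a `None` default, so a state-message elem could be built without a prefix or AS path.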
C API: Consuming the stream
PYTHON BINDINGS - CASE STUDY: Studying AS path inflation using PyBGPStream
How many AS paths are longer than the shortest path between two ASes?
30 lines of Python code:

    from _pybgpstream import BGPStream, BGPRecord, BGPElem
    from collections import defaultdict
    from itertools import groupby
    import networkx as nx

    # Create a BGPStream instance and a reusable record instance
    stream = BGPStream()
    rec = BGPRecord()
    # Undirected AS-level graph built from every observed AS path
    as_graph = nx.Graph()
    # bgp_lens[monitor][origin] = shortest AS path length observed via BGP
    bgp_lens = defaultdict(lambda: defaultdict(lambda: None))

    # RIB dumps only, in a 20-minute window (2015-08-01 07:50-08:10 UTC)
    stream.add_filter('record-type', 'ribs')
    stream.add_interval_filter(1438415400, 1438416600)
    stream.start()

    while stream.get_next_record(rec):
        elem = rec.get_next_elem()
        while elem:
            monitor = str(elem.peer_asn)
            # Collapse consecutive duplicates to remove AS path prepending
            hops = [k for k, g in groupby(elem.fields['as-path'].split(" "))]
            # Keep only paths whose first hop is the monitor (peer) AS
            if len(hops) > 1 and hops[0] == monitor:
                origin = hops[-1]
                # Add each AS adjacency on the path to the graph
                for i in range(0, len(hops) - 1):
                    as_graph.add_edge(hops[i], hops[i + 1])
                # Keep the shortest BGP path length per (monitor, origin)
                bgp_lens[monitor][origin] = \
                    min(filter(bool, [bgp_lens[monitor][origin], len(hops)]))
            elem = rec.get_next_elem()

    # Compare each BGP path length with the shortest path in the AS graph
    for monitor in bgp_lens:
        for origin in bgp_lens[monitor]:
            nxlen = len(nx.shortest_path(as_graph, monitor, origin))
            print(monitor, origin, bgp_lens[monitor][origin], nxlen)

[Figure: AS path length discrepancy PMF; x-axis: AS path length difference d (0 to 11); y-axis: PMF, shown on both a linear scale (0.1 to 0.8) and a log scale (10^-7 to 10^-2)]
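The script above prints, for each monitor/origin pair, the shortest observed BGP path length and the shortest path length in the inferred AS graph; the plotted distribution is presumably built from the difference d between those two numbers. A minimal sketch of that final step, assuming the printed lines have been collected as `(monitor, origin, bgp_len, graph_len)` tuples; the helper name `length_difference_pmf` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def length_difference_pmf(rows: Iterable[Tuple[str, str, int, int]]) -> Dict[int, float]:
    """Return P(d), where d = BGP path length - shortest graph path length.

    Both lengths count ASes (nodes), as in the script above, so d >= 0:
    the graph contains every observed path, hence its shortest path can
    never be longer than the BGP path.
    """
    diffs = Counter(bgp_len - graph_len for _, _, bgp_len, graph_len in rows)
    total = sum(diffs.values())
    return {d: n / total for d, n in sorted(diffs.items())}


rows = [("64496", "64511", 3, 3), ("64496", "64500", 4, 3),
        ("64497", "64500", 3, 3), ("64497", "64511", 5, 3)]
print(length_difference_pmf(rows))
```

Keeping the aggregation separate from the stream-processing loop means the expensive pass over the RIBs runs once, and the distribution can be recomputed or re-plotted from the saved tuples.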
PYBGPSTREAM: Python bindings
• A single script includes both the data specification and the analysis logic:
  • Enhances reproducibility/repeatability
• All of the power of the C API, available in Python
PYTHON BINDINGS - CASE STUDY: Timely reactive measurements
• We monitor community-based black-holing
  • The victim of a DoS attack announces its prefix with a special community attribute to request that neighbors drop traffic destined to it
• We trigger traceroutes to characterize the black-holing event (using 50-100 probes per event)
  • Probed 253 victims (90-95% of black-holing events) while the black-holing was in effect
• Combined passive control-plane and active data-plane measurements to capture and investigate transient routing policies
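The trigger step can be thought of as a filter over the update stream. A minimal sketch, assuming elems are available as plain dicts and that black-holing is signalled with either the RFC 7999 well-known BLACKHOLE community 65535:666 or a provider-specific `<ASN>:666` value; the slide does not say which communities were matched, and `is_blackhole_request` and `select_targets` are hypothetical names. Actually issuing the traceroutes is left out.

```python
from typing import Dict, Iterable, Set

# Assumption: both the RFC 7999 well-known community 65535:666 and the common
# provider-specific <ASN>:666 convention use 666 as the community value.
BLACKHOLE_VALUE = 666


def is_blackhole_request(elem: Dict) -> bool:
    """True if an announcement carries a black-holing community."""
    if elem.get("type") != "announcement":
        return False
    return any(value == BLACKHOLE_VALUE for _asn, value in elem.get("communities", []))


def select_targets(elems: Iterable[Dict]) -> Set[str]:
    """Black-holed prefixes that would trigger traceroutes toward the victim."""
    return {e["prefix"] for e in elems if is_blackhole_request(e)}


updates = [
    {"type": "announcement", "prefix": "192.0.2.1/32", "communities": [(64496, 666)]},
    {"type": "announcement", "prefix": "198.51.100.0/24", "communities": [(64496, 100)]},
    {"type": "withdrawal", "prefix": "203.0.113.5/32", "communities": []},
]
for prefix in sorted(select_targets(updates)):
    print("would schedule traceroutes toward", prefix)
```

Matching only on the value 666 is deliberately loose; a real deployment would more likely keep a per-provider list of black-holing communities.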
BGP CORSARO: Continuous real-time monitoring
• Plugin-based tool for processing live BGP data
• Continuously extracts derived data from BGPStream in regular time bins
• Includes a “prefix-monitor” sample plugin:
  • Monitor your own address space
  • How many prefixes/origin ASes?
[Figure: Hijacking of AS137 (GARR) - Jan 2015*]
*Originally described by Dyn Research: http://research.dyn.com/2015/01/vast-world-of-fraudulent-routing/
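The core of a prefix-monitor style plugin can be sketched as time binning plus an address-space check. The sketch below assumes elems as dicts with `time`, `prefix` and `as_path` keys, a 300-second bin size, and a hypothetical `monitor_bins` function; it is not BGPCorsaro's plugin API. For each bin it counts the distinct prefixes overlapping the monitored space and the distinct origin ASes announcing them. An unexpected extra origin AS is one possible sign of a hijack.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

BIN_SIZE = 300  # seconds; assumed bin length


def monitor_bins(elems: Iterable[Dict], monitored: List[str],
                 bin_size: int = BIN_SIZE) -> Dict[int, Tuple[int, int]]:
    """Map bin start time -> (#distinct prefixes, #distinct origin ASes)."""
    nets = [ipaddress.ip_network(p) for p in monitored]
    prefixes = defaultdict(set)
    origins = defaultdict(set)
    for e in elems:
        if not e.get("as_path"):
            continue  # withdrawals and state messages carry no origin AS
        net = ipaddress.ip_network(e["prefix"])
        # overlaps() matches exact prefixes, more-specifics and less-specifics
        if not any(net.version == m.version and net.overlaps(m) for m in nets):
            continue
        bin_start = e["time"] - e["time"] % bin_size
        prefixes[bin_start].add(net)
        origins[bin_start].add(e["as_path"][-1])
    return {b: (len(prefixes[b]), len(origins[b])) for b in sorted(prefixes)}
```

Counting more-specifics as well as exact matches matters here: a hijacker announcing a more-specific of the monitored space would otherwise go unnoticed.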
BIG DATA: BGP data analysis for the 1%
• “Students can write scripts to analyze BGP data, but I need to do REAL analysis…”
• We conducted a proof-of-concept study using PyBGPStream with Apache Spark:
  • Analyzed 15 years of data:
    • one RIB per month
    • all Route Views and RIPE RIS collectors
    • > 3000 RIBs, ~44 billion BGPStream Elems
  • See the paper for more details about lessons learned
• PyBGPStream/Spark template script: https://github.com/CAIDA/bgpstream
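Selecting one RIB per month maps naturally onto independent work units, one per month, that a framework such as Spark can process in parallel. A minimal sketch of generating those units, assuming a hypothetical short window starting at 00:00 UTC on the first day of each month that is expected to contain one RIB dump per collector; the template script may select RIBs differently, and `process_interval` in the comment is a hypothetical per-interval BGPStream job.

```python
from datetime import datetime, timezone
from typing import List, Tuple

WINDOW = 120  # seconds; assumed window length around the 00:00 UTC dumps


def monthly_intervals(first_year: int, last_year: int) -> List[Tuple[int, int]]:
    """One (start, end) Unix-time interval per month, at 00:00 UTC on the 1st."""
    intervals = []
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            start = int(datetime(year, month, 1, tzinfo=timezone.utc).timestamp())
            intervals.append((start, start + WINDOW))
    return intervals


# Each interval could then become one Spark task, e.g. with PySpark:
#   sc.parallelize(monthly_intervals(2001, 2015)).map(process_interval)
# where process_interval (hypothetical) runs a BGPStream query for that interval.
print(len(monthly_intervals(2001, 2015)))
```

Partitioning by time interval keeps the tasks independent: each worker opens its own stream over its own interval, and only the derived results need to be combined.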
BIG DATA - CASE STUDIES: Routing table size over time
[Figure: number of IPv4 prefixes (0 to 500k) vs. year (2002 to 2016)]
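The routing-table-size curve reduces to counting distinct IPv4 prefixes per monthly RIB. A minimal local sketch of that aggregation, assuming elems arrive as `(month, prefix)` pairs and that "table size" means distinct IPv4 prefixes seen across all collectors in that month's RIBs; in the Spark study this would presumably be a distributed map/reduce rather than a local loop, and `ipv4_table_size` is a hypothetical name.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def ipv4_table_size(elems: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map month label -> number of distinct IPv4 prefixes in that month's RIBs."""
    per_month = defaultdict(set)
    for month, prefix in elems:
        net = ipaddress.ip_network(prefix)
        if net.version == 4:  # skip IPv6 prefixes
            per_month[month].add(net)
    return {month: len(prefixes) for month, prefixes in sorted(per_month.items())}


rows = [("2015-07", "192.0.2.0/24"), ("2015-07", "192.0.2.0/24"),
        ("2015-07", "2001:db8::/32"), ("2015-08", "192.0.2.0/24"),
        ("2015-08", "198.51.100.0/24")]
print(ipv4_table_size(rows))
```

Deduplicating with a set per month is what turns many per-peer elems for the same prefix into a single table entry.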
