  1. Parallelizing Network Analysis Robin Sommer Lawrence Berkeley National Laboratory & International Computer Science Institute robin@icir.org http://www.icir.org

  2. Motivation • NIDSs have reached their limits on commodity hardware • Keep needing to do more analysis on more data at higher speeds • Analysis gets richer over time, as attacks get more sophisticated • However, single-CPU performance is no longer growing the way it used to • A single NIDS instance (Snort, Bro) cannot cope with links of 1 Gbps and above • The key to overcoming current limits is parallel analysis • Volume is high but composed of many independent tasks • Need to exploit parallelism to cope with load

  3. Orthogonal Approaches • The NIDS Cluster • Many PCs instead of one • Communication and a central user interface create the impression of one system • Vision: Parallel operation within a single NIDS instance • In software: multi-threaded analysis on multi-CPU/multi-core systems • In hardware: compile analysis into a parallel execution model (e.g., on FPGAs)

  4. The NIDS Cluster

  5. The NIDS Cluster • Load-balancing approach: use many boxes instead of one • Most NIDSs provide support for multi-system setups • However, in operational setups they work independently • A central manager collects alerts of the independent NIDS instances • Aggregates results instead of correlating the analysis • The NIDS Cluster works transparently like a single NIDS • Gives the same results a single NIDS would if it could analyze all the traffic • No loss in detection accuracy • Scales to a large number of nodes • Single system for the user interface (log aggregation, configuration changes)

  6. Architecture [figure: cluster architecture: traffic between the Internet and the internal network is tapped and fed to the frontend nodes, which distribute it across the backend nodes; proxies and a manager interconnect the backends]

  7. Prototype Setups • Lawrence Berkeley National Laboratory • Monitors the 10 Gbps upstream link • 1 front-end, 10 back-ends • University of California, Berkeley • Monitors 2 x 1 Gbps upstream links • 2 front-ends, 6 back-ends • IEEE Supercomputing 2006 • Conference's 1 Gbps backbone network • 100 Gbps High Speed Bandwidth Challenge network (partially) • Goal: Replace the current operational security monitoring

  8. Front-Ends • Distribute traffic to the back-ends by rewriting MAC addresses • In software via Click • In hardware via Force10's P10 (prototype in collaboration with Force10) • Fault tolerance • Easy to retarget traffic if a back-end node fails • Per-connection hashing • Either 4-tuple (addrs, ports) or 2-tuple (addrs) • MD5 mod n, ADD mod n
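For illustration, here is a minimal C++ sketch of the per-connection hashing idea; it is not the actual Click or P10 front-end code, and all names are made up. It folds the connection tuple symmetrically so that both directions of a connection map to the same back-end and then takes the sum modulo n ("ADD mod n"); the MD5 variant would instead hash the tuple and take that digest modulo n.

```cpp
// Sketch only, not the actual Click/P10 front-end code. Shows how a
// per-connection hash (here the simple additive scheme; the MD5 variant
// hashes the tuple before taking mod n) maps a flow onto one of n back-ends.
#include <algorithm>
#include <cstdint>
#include <iostream>

struct FlowTuple {
    uint32_t src_ip, dst_ip;     // addresses
    uint16_t src_port, dst_port; // ports (ignored by the 2-tuple scheme)
};

// Fold the tuple symmetrically so both directions of a connection hash the same.
static uint64_t fold(const FlowTuple& t, bool use_ports)
{
    uint64_t sum = uint64_t(std::min(t.src_ip, t.dst_ip)) +
                   uint64_t(std::max(t.src_ip, t.dst_ip));  // 2-tuple: addresses only
    if ( use_ports )
        sum += uint64_t(t.src_port) + uint64_t(t.dst_port); // 4-tuple: add the ports
    return sum;
}

// "ADD mod n": index of the back-end that gets this connection.
int pick_backend(const FlowTuple& t, int n_backends, bool use_ports = true)
{
    return int(fold(t, use_ports) % uint64_t(n_backends));
}

int main()
{
    FlowTuple t{0x0A000001, 0xC0A80002, 51515, 80}; // 10.0.0.1:51515 -> 192.168.0.2:80
    std::cout << "back-end " << pick_backend(t, 10) << "\n";
}
```

Because both schemes hash per connection, every packet of a flow lands on the same back-end, which is what allows each back-end to run its stateful analysis without seeing the other nodes' traffic.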

  9. Simulation of Hashing Schemes [figure: plot of the differences of the hashing schemes compared with an even distribution (in %) across the back-ends, over several days of traffic]

  10. Back-ends • Run Bro as their analysis engine • Bro provides extensive communication facilities • Independent state framework • Sharing of low-level state • Script-layer variables can be synchronized • Basic approach: pick the state to be synchronized • A few subtleties had to be resolved • Central manager • Collects the output of all instances • Raises alerts • Provides dynamic reconfiguration facilities

  11. Evaluation & Outlook • The prototypes are running nicely • They enable analysis that was not possible before • E.g., full HTTP analysis & Dynamic Protocol Detection/Analysis • Now in the process of making it production quality • Evaluation • Verified accuracy by comparing against a single Bro instance • Evaluated performance with respect to load-balancing quality, scalability, and overhead

  12. CPU Load per Node [figure: probability density of CPU utilization (0.0-0.5) for each of node0 through node9]

  13. Scaling of CPU [figure: probability density of CPU utilization (0.0-0.5) for cluster setups with 3, 5, and 10 nodes]

  14. Load on Berkeley Campus [figure: CPU load (%) from Tue 12:00 through Thu 6:00 for Backends 0-5, Proxies 0-1, and the Manager]

  15. Parallelizing Analysis

  16. Potential • Observation • Much of the processing of a typical NIDS instance can be done in parallel • However, existing systems do not exploit the potential • Example: Bro NIDS • Assume a Gbps network with 10,000 concurrent connections [figure: Bro's processing pipeline: a stream demux splits the 1-10 Gbps packet stream; TCP stream reassembly, protocol analyzers, and per-flow analysis turn assembled streams into event and filtered event streams (on the order of 10^4-10^5 parallel instances); aggregate and global analysis work on aggregated event streams (roughly 10^3 down to 10-100 instances)]

  17. Commodity Hardware • Multi-threaded/multi-core CPUs provide the necessary power • Inexpensive commodity hardware • Aggregate throughput does in fact still follow Moore's law • Need to structure applications in a highly parallel fashion • The performance gain does not come out of the box • Need to structure processing into separate low-level threads • Need to address • Intrusion prevention functionality • Exchange of state between threads for global analysis • Yet minimize inter-thread communication • Factor in memory locality (within one core / across several cores) • Provide performance debugging tools

  18. Proposed Architecture [figure: each CPU core (Core 1, Core 2, ...) runs several threads with its own L1 D-cache and cached queues; the L2 cache and main memory hold per-core packet queues (Pkt-Q), event queues (Event-Q), and message/event queues (MSG-Event-Q), an external MSG-Event-Q, the active connection and host tables, and pending packets; an Active Network Interface with a packet dispatcher feeds the cores]
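To make the queue structure concrete, here is a minimal C++ sketch of the per-core packet queues; this is my own illustration under the stated assumptions, not the proposed implementation, and all names are hypothetical. A dispatcher pushes each packet into the queue of the core that owns its flow, and each worker drains only its own queue.

```cpp
// Minimal sketch of the per-core packet queues (illustration only; the real
// design would keep the queues cache-resident and mostly lock-free rather
// than using a mutex). One worker thread per core drains only its own queue.
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Packet { uint64_t flow_hash; /* headers, payload, ... */ };

class PacketQueue {
public:
    void push(Packet p) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(p); }
        cv_.notify_one();
    }
    Packet pop() { // blocks until a packet arrives
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this]{ return !q_.empty(); });
        Packet p = q_.front(); q_.pop();
        return p;
    }
private:
    std::queue<Packet> q_;
    std::mutex m_;
    std::condition_variable cv_;
};

int main()
{
    const int n_cores = 4;
    std::vector<PacketQueue> queues(n_cores);

    // Workers: each core processes only the flows routed to it, which keeps
    // the working set local and avoids inter-thread traffic on the fast path.
    std::vector<std::thread> workers;
    for ( int c = 0; c < n_cores; ++c )
        workers.emplace_back([&queues, c]{
            for ( ;; ) {
                Packet p = queues[c].pop();
                if ( p.flow_hash == UINT64_MAX ) // shutdown marker
                    break;
                // ... reassembly / protocol analysis for this flow ...
            }
        });

    // Dispatcher: route each packet to the core that owns its flow.
    for ( uint64_t i = 0; i < 1000; ++i ) {
        Packet p{ i % 37 }; // fake flow hash
        queues[p.flow_hash % n_cores].push(p);
    }

    for ( int c = 0; c < n_cores; ++c )
        queues[c].push(Packet{UINT64_MAX}); // tell every worker to stop

    for ( auto& w : workers )
        w.join();
}
```

In the actual proposal the queues would sit in the core's cache (the "cached queues" in the figure), and analogous per-core queues carry events and inter-thread messages; the sketch only captures the ownership rule of one consumer per queue.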

  19. Active Network Interface • The only non-commodity component currently • Prototype to be based on the NetFPGA platform ($2000) • Commodity hardware might actually become suitable later (e.g., Sun's Niagara 2 has 8 CPU cores plus 2 directly attached 10GE controllers!) • Thread-aware Routing • ANI copies packets directly into the thread's memory (cache) • ANI keeps a per-flow table of routing decisions • A dispatcher thread takes the initial routing decision per flow • Selective packet forwarding • ANI holds packets until it gets clearance (clearance might be cached, e.g., per flow/IP) • Normalization
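The thread-aware routing can likewise be sketched. The following hypothetical C++ fragment (not the NetFPGA design; all names are made up) shows the idea of the per-flow routing table: the dispatcher takes the initial decision for the first packet of a flow, here simply round-robin, and the table answers all later lookups.

```cpp
// Sketch of thread-aware routing (illustration only): the dispatcher decides
// where the first packet of a flow goes and records the decision; the per-flow
// table then routes every subsequent packet of that flow to the same core.
#include <cstdint>
#include <iostream>
#include <unordered_map>

struct FlowKey {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    bool operator==(const FlowKey& o) const {
        return src_ip == o.src_ip && dst_ip == o.dst_ip &&
               src_port == o.src_port && dst_port == o.dst_port;
    }
};

struct FlowKeyHash {
    size_t operator()(const FlowKey& k) const {
        return size_t((uint64_t(k.src_ip) * 2654435761u) ^ (uint64_t(k.dst_ip) << 1)
                      ^ (uint64_t(k.src_port) << 16) ^ k.dst_port);
    }
};

class FlowTable {
public:
    explicit FlowTable(int n_cores) : n_cores_(n_cores) {}

    // Returns the core that owns this flow, taking the initial decision
    // (here: round-robin) the first time the flow is seen.
    int route(const FlowKey& k) {
        auto it = table_.find(k);
        if ( it != table_.end() )
            return it->second;              // fast path: decision already cached
        int core = next_core_++ % n_cores_; // dispatcher's initial decision
        table_.emplace(k, core);
        return core;
    }

private:
    int n_cores_;
    int next_core_ = 0;
    std::unordered_map<FlowKey, int, FlowKeyHash> table_;
};

int main()
{
    FlowTable ft(4);
    FlowKey web{0x0A000001, 0xC0A80002, 51515, 80};
    std::cout << "core " << ft.route(web) << " then core " << ft.route(web) << "\n";
}
```

Round-robin is only a placeholder for the initial decision; a real dispatcher could balance on per-core load, and selective forwarding would add a per-flow clearance flag that the ANI checks before releasing held packets.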

  20. Parallelized Network Analysis • Architecturally-aware Threading • Need to identify the right granularity for threads • Protocol analysis consists of fixed blocks of functionality • Event processing needs to preserve temporal order → multiple independent event queues (e.g., one per core) • Scalable Inter-thread Communication • Can use shared memory • Need to consider non-uniformities in the system's cache hierarchy • Potentially restructure detection algorithms to minimize communication (e.g., losing some semantics via probabilistic algorithms) • Prevention Functionality • Only forward a packet once all its events are processed • Evaluation, profiling & debugging • Race conditions & memory access patterns • Trace-based reproducibility
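As a concrete illustration of preserving temporal order across per-core event queues, here is a small C++ sketch (my illustration, not the proposed design; names are hypothetical): each queue is already ordered by the timestamps assigned at event generation, so a k-way merge with a min-heap recovers a globally ordered stream whenever a global analysis step needs one.

```cpp
// Sketch: merging per-core event queues into one temporally ordered stream
// (illustration only). Each per-core queue is already sorted by timestamp,
// so a k-way min-heap merge restores a global order when it is needed.
#include <cstdint>
#include <functional>
#include <iostream>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Event {
    double ts;        // timestamp assigned when the event was generated
    int core;         // originating core
    std::string name;
};

std::vector<Event> merge_by_time(const std::vector<std::vector<Event>>& per_core)
{
    // (timestamp, (core, index within that core's queue))
    using Cursor = std::pair<double, std::pair<size_t, size_t>>;
    std::priority_queue<Cursor, std::vector<Cursor>, std::greater<Cursor>> heap;

    for ( size_t c = 0; c < per_core.size(); ++c )
        if ( ! per_core[c].empty() )
            heap.push({per_core[c][0].ts, {c, 0}});

    std::vector<Event> out;
    while ( ! heap.empty() ) {
        Cursor cur = heap.top(); heap.pop();
        size_t c = cur.second.first;
        size_t i = cur.second.second;
        out.push_back(per_core[c][i]);
        if ( i + 1 < per_core[c].size() )
            heap.push({per_core[c][i + 1].ts, {c, i + 1}});
    }
    return out;
}

int main()
{
    std::vector<std::vector<Event>> per_core = {
        { {1.0, 0, "http_request"}, {3.5, 0, "http_reply"} },
        { {0.5, 1, "connection_established"}, {2.0, 1, "dns_query"} },
    };
    for ( const Event& e : merge_by_time(per_core) )
        std::cout << e.ts << " core" << e.core << " " << e.name << "\n";
}
```

Within each core the order is preserved for free because a single thread fills the queue; the merge is only needed at the few points where analysis is truly global, which keeps inter-thread communication low.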

  21. Going Further: Custom Hardware • Goal: a custom platform for highly parallel, stateful network analysis • Custom hardware (e.g., FPGAs) is ideal for parallel tasks • Expose the parallelism and map it to hardware • We can identify three types of functionality in Bro • Fixed function blocks → handcraft (e.g., robust reassembly) • Protocol analyzers → use BinPAC with a new backend • Policy scripts → compile into a parallel computation model • Envision using MIT's Transactor model • Many small self-contained units communicating via message queues • Ambitious but highly promising • Generic network analysis beyond network intrusion detection

  22. Thanks for your attention. Robin Sommer Lawrence Berkeley National Laboratory & International Computer Science Institute robin@icir.org http://www.icir.org This work is supported by the Office of Science and Technology at the Department of Homeland Security. Points of view in this document are those of the author(s) and do not necessarily represent the official position of the U.S. Department of Homeland Security or the Office of Science and Technology.
