Decentralized Evaluation of Regular Expressions for Capability Discovery in Peer-to-Peer Networks
Maximilian Szengel (TUM)
Advisors: C. Grothoff, R. Holz, H. Niedermayer, B. Polot
Master's thesis
Chair for Network Architectures and Services, Technische Universität München
21 Nov 2012

Motivation: Searching in DHT-based Peer-to-Peer Networks
- Distributed key/value storage, typically with hashes as keys
- Range queries (PastryStrings, PHT)
- Pattern matching (Cubit, DPMS)
- Similarity queries (Karnstedt et al.)
- Our approach: regular expressions

Motivation: Capability Discovery in Peer-to-Peer Networks
[Figure: peers arranged around a Distributed Hash Table; three peers publish offers and one peer searches]
- Offering Exit Node for: 192.0.4.0/24 TCP
- Offering Exit Node for: 2001:0db8::0370:FCA0:7334/64 UDP
- Offering Exit Node for: 192.0.3.0/24 TCP and UDP
- Searching Exit Node for: 192.0.3.123 TCP

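The offers and the search above can be turned into regular expressions and search strings. The following is a minimal sketch of one possible encoding, not the thesis implementation: the constant `GNUNET-VPN00010000-V4TCP` is copied from the later path-compression slides and treated as an opaque prefix, the function names `offer_regex` and `search_string` are hypothetical, and the match is checked locally with Python's `re` module rather than through a DHT.

```python
import ipaddress
import re

# Prefix string copied from the path-compression slides. Its internal
# structure is not explained there, so it is used as an opaque constant.
SERVICE_PREFIX = "GNUNET-VPN00010000-V4TCP"


def offer_regex(cidr: str) -> str:
    """Regex for an offered IPv4 range: fixed network bits, then any host bits."""
    net = ipaddress.ip_network(cidr)
    bits = format(int(net.network_address), "032b")[: net.prefixlen]
    return SERVICE_PREFIX + bits + "(0|1)*"


def search_string(address: str) -> str:
    """Search string for one IPv4 address: all 32 bits, no wildcard."""
    return SERVICE_PREFIX + format(int(ipaddress.ip_address(address)), "032b")


# Two of the IPv4 offers from the slide and the search from the slide.
offers = {cidr: offer_regex(cidr) for cidr in ("192.0.4.0/24", "192.0.3.0/24")}
query = search_string("192.0.3.123")

# Local check only: which offered ranges cover the searched address?
matches = [cidr for cidr, rx in offers.items() if re.fullmatch(rx, query)]
print(matches)
```

Under this encoding a search string matches an offer exactly when the address lies in the offered range, which is why a plain string suffices on the searching side.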
Approach: Idea
1. Offerer creates a regular expression describing the service
2. Regular expression is converted to a DFA
3. DFA is stored in the DHT
4. Patron matches using a string
[Figure: the Offerer PUTs its DFA into the DHT, which holds an NFA; the Patron issues GETs with its search string]

Problem: Mapping of States to Keys
Regular expression (ab|cd)e*f and corresponding DFA
[Figure: DFA with states q0, a, c, (ab|cd)e* and (ab|cd)e*f; q0 reaches (ab|cd)e* via a then b, or via c then d; (ab|cd)e* loops on e and reaches (ab|cd)e*f on f]
- A regular expression is assigned to each state as its identifier.
- The hash of the identifier is used as the key for DHT PUT.
[Figure: the states are stored in the DHT under the keys h("a"), h("c"), h("(ab|cd)e*") and h("(ab|cd)e*f")]

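The mapping of states to keys can be sketched as follows. This is an illustration under stated assumptions: a Python dict stands in for the DHT, SHA-256 stands in for the hash function h (the slides do not name one), and `put_dfa` and `lookup` are hypothetical helper names.

```python
import hashlib


def h(identifier: str) -> str:
    """Key of a state: hash of its regular-expression identifier.
    SHA-256 is an assumption; the slides only call the function h."""
    return hashlib.sha256(identifier.encode()).hexdigest()


# DFA for (ab|cd)e*f. Every state is named by its identifier.
DFA = {
    "q0": {"a": "a", "c": "c"},
    "a": {"b": "(ab|cd)e*"},
    "c": {"d": "(ab|cd)e*"},
    "(ab|cd)e*": {"e": "(ab|cd)e*", "f": "(ab|cd)e*f"},
    "(ab|cd)e*f": {},
}
ACCEPTING = {"(ab|cd)e*f"}

dht = {}  # stand-in for the distributed hash table


def put_dfa(dfa, accepting):
    """Offerer side: one PUT per state, edges point at the keys of their targets."""
    for state, edges in dfa.items():
        dht[h(state)] = {
            "accepting": state in accepting,
            "edges": {label: h(target) for label, target in edges.items()},
        }


def lookup(string, start="q0"):
    """Patron side: one GET per visited state, following the search string."""
    key = h(start)
    for ch in string:
        block = dht.get(key)
        if block is None or ch not in block["edges"]:
            return False
        key = block["edges"][ch]
    block = dht.get(key)
    return block is not None and block["accepting"]


put_dfa(DFA, ACCEPTING)
print(lookup("abeef"), lookup("cdf"), lookup("abe"))
```

In this sketch the patron never sees the whole automaton. It only needs the key of the next state, so the number of GETs grows with the length of the search string.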
Problem: Merging of DFAs
Regular expressions (ab|cd)e*f and (ab|cd)e*fg* with corresponding DFAs
[Figure: two DFAs that share the states q0, a, c and (ab|cd)e*; the first ends in state (ab|cd)e*f, the second ends in state (ab|cd)e*fg*, which loops on g]
Merged NFA for regular expressions (ab|cd)e*fg* and (ab|cd)e*f
[Figure: merged NFA in which state (ab|cd)e* has two transitions on f, one to (ab|cd)e*f and one to (ab|cd)e*fg*]

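Because states with the same identifier end up under the same key, storing two DFAs effectively unions their transitions. The sketch below shows how this produces non-determinism for the two example expressions. The dict-of-sets representation and the names `merge` and `accepts` are assumptions made for illustration.

```python
from collections import defaultdict

# DFA for (ab|cd)e*f and DFA for (ab|cd)e*fg*, states named by identifier.
DFA_F = {
    "q0": {"a": "a", "c": "c"},
    "a": {"b": "(ab|cd)e*"},
    "c": {"d": "(ab|cd)e*"},
    "(ab|cd)e*": {"e": "(ab|cd)e*", "f": "(ab|cd)e*f"},
    "(ab|cd)e*f": {},
}
DFA_FG = {
    "q0": {"a": "a", "c": "c"},
    "a": {"b": "(ab|cd)e*"},
    "c": {"d": "(ab|cd)e*"},
    "(ab|cd)e*": {"e": "(ab|cd)e*", "f": "(ab|cd)e*fg*"},
    "(ab|cd)e*fg*": {"g": "(ab|cd)e*fg*"},
}
ACCEPTING = {"(ab|cd)e*f", "(ab|cd)e*fg*"}


def merge(*dfas):
    """Union the transitions of several DFAs, state by state."""
    nfa = defaultdict(lambda: defaultdict(set))
    for dfa in dfas:
        for state, edges in dfa.items():
            nfa[state]  # states without outgoing edges must still appear
            for label, target in edges.items():
                nfa[state][label].add(target)
    return nfa


def accepts(nfa, accepting, string, start="q0"):
    """Follow all possible transitions in parallel (set of current states)."""
    current = {start}
    for ch in string:
        current = {t for s in current for t in nfa.get(s, {}).get(ch, ())}
    return bool(current & accepting)


merged = merge(DFA_F, DFA_FG)
print(sorted(merged["(ab|cd)e*"]["f"]))
print(accepts(merged, ACCEPTING, "abfgg"), accepts(merged, ACCEPTING, "abg"))
```

The state (ab|cd)e* now has two targets for the label f, so a searcher has to follow both branches.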
Problem: Decentralizing the Start State
Regular expression: abc*defg*h and k = 4.
[Figure: DFA for abc*defg*h with the states q0, ab, abc*, abc*defg* and abc*defg*h, plus the additional states abcc, abcd and abde, each labeled with a possible first k = 4 characters of a matching string]

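The three four-character labels in the figure are exactly the strings of length k = 4 that the automaton can read from its start state. The sketch below enumerates them. It assumes single-character edge labels and a DFA in which every state can still reach an accepting state; the state names s0 to s6 and the function name `k_prefixes` are hypothetical.

```python
# DFA for abc*defg*h with single-character edges (state names are made up).
DFA = {
    "s0": {"a": "s1"},
    "s1": {"b": "s2"},
    "s2": {"c": "s2", "d": "s3"},
    "s3": {"e": "s4"},
    "s4": {"f": "s5"},
    "s5": {"g": "s5", "h": "s6"},
    "s6": {},
}


def k_prefixes(dfa, start, k):
    """Map every length-k string readable from `start` to the state it reaches."""
    frontier = {"": start}
    for _ in range(k):
        frontier = {
            prefix + label: target
            for prefix, state in frontier.items()
            for label, target in dfa[state].items()
        }
    return frontier


print(k_prefixes(DFA, "s0", 4))
```

One plausible reading of the slide is that a search then begins at the key derived from the first k characters of the search string instead of at a single shared start state, which would spread the initial load over several keys.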
Problem: Optimization (Path compression)
Compressing linear paths in the DFA. Example for abc(d*|e)fgh.
[Figure: uncompressed DFA with states 0 to 8 and single-character edges a, b, c, d, e, f, g, h]
[Figure: compressed DFA with states 0, 4, 5 and 7; edges abcd from 0 to 4, abcef and abcf from 0 to 5, d from 4 to 4, f from 4 to 5, and gh from 5 to 7]

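One rule that reproduces the compressed automaton in the figure is: remove every state that has exactly one incoming edge and is neither the start state nor an accepting state, and prepend the label of that incoming edge to each of its outgoing edges. The sketch below applies this rule until nothing changes. The rule and the name `compress` are assumptions derived from the example, not a statement of the thesis algorithm.

```python
# Uncompressed DFA for abc(d*|e)fgh as (source, label, target) edges.
EDGES = {
    (0, "a", 1), (1, "b", 2), (2, "c", 3),
    (3, "d", 4), (4, "d", 4), (4, "f", 5),
    (3, "e", 8), (8, "f", 5), (3, "f", 5),
    (5, "g", 6), (6, "h", 7),
}


def compress(edges, start, accepting):
    """Repeatedly remove states with exactly one incoming edge, prepending
    that edge's label to each outgoing edge of the removed state."""
    edges = set(edges)
    changed = True
    while changed:
        changed = False
        states = {s for s, _, _ in edges} | {t for _, _, t in edges}
        for state in sorted(states):
            if state == start or state in accepting:
                continue
            incoming = [e for e in edges if e[2] == state]
            outgoing = [e for e in edges if e[0] == state]
            # Keep states with several incoming edges (this includes states
            # with a self-loop) and dead ends without outgoing edges.
            if len(incoming) != 1 or incoming[0][0] == state or not outgoing:
                continue
            src, label, _ = incoming[0]
            edges -= {incoming[0], *outgoing}
            edges |= {(src, label + out, dst) for _, out, dst in outgoing}
            changed = True
            break  # recompute the state set after every removal
    return edges


compressed = compress(EDGES, start=0, accepting={7})
for edge in sorted(compressed):
    print(edge)
```

States 4 and 5 survive because each has more than one incoming edge. State 4 has a self-loop and state 5 is reached from three different places.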
Problem: Path compression length
Merging of DFAs with path compression
- GNUNET-VPN00010000-V4TCP110000000000000000000010(0|1)*
- GNUNET-VPN00010000-V4TCP110000000000000000000011(0|1)*
- GNUNET-VPN00010000-V4TCP110000010000000000000011(0|1)*
[Figure: uncompressed DFA, a chain of single-character edges G, N, U, ... through states 0, 1, 2, 3, ..., 27, 28, 29, ending in a self-loop on 0 and 1]
Merging of DFAs with maximal path compression
[Figure: after the shared edge GNUNET-VPN00010000-V4TCP, three separate 24-character edges (110000000000000000000010, 110000000000000000000011, 110000010000000000000011) lead to states 1, 2 and 3, each with a self-loop on 0 and 1]
Merging of DFAs with limited path compression length
[Figure: after the shared edge GNUNET-VPN00010000-V4TCP, the address bits are split into 8-character edges such as 11000000, 11000001, 00000000, 00000010 and 00000011, leading through states 1 to 7; the final states have a self-loop on 0 and 1]

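The effect of the compression length on merging can be illustrated by splitting the three 24-bit address parts into chunks and counting how many distinct edges remain once common paths are shared. This is a simplified model, assuming that edges reached by the same chunk sequence coincide (a trie); the helper names `chunks` and `merged_edges` are hypothetical.

```python
# The address bits of the three regular expressions on the slide.
ADDRESS_BITS = [
    "110000000000000000000010",
    "110000000000000000000011",
    "110000010000000000000011",
]


def chunks(label, max_len):
    """Split an edge label into pieces of at most max_len characters."""
    return [label[i:i + max_len] for i in range(0, len(label), max_len)]


def merged_edges(labels, max_len):
    """Insert each chunked label into a shared trie and return its edge set.
    A trie node is identified by the tuple of chunks that leads to it."""
    edges = set()
    for label in labels:
        path = ()
        for chunk in chunks(label, max_len):
            edges.add((path, chunk))
            path += (chunk,)
    return edges


for max_len in (24, 8):
    edges = merged_edges(ADDRESS_BITS, max_len)
    stored = sum(len(chunk) for _, chunk in edges)
    print(max_len, len(edges), stored)
```

With a limit of 24 nothing is shared, so the model yields three long edges. With a limit of 8 the first two expressions share the path 11000000, 00000000, giving seven shorter edges with fewer label characters stored overall, at the cost of more hops per search.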
Evaluation
- Implementation in GNUnet
- Profiling of Internet-scale routing using regular expressions to describe AS address ranges
- CAIDA AS data set: real AS data
[Figure: example ASs from the data set (AS 12816, AS 10001, AS 7212, AS 12812, AS 10002, AS 8265, AS 56357, AS 32310, AS 50038, AS 931, AS 825), each listed with its address ranges, such as 129.187.0.0/16 and 192.68.212.0/22, around the Distributed Hash Table]

Evaluation: Results of Simulation (1)
Number of transitions and states in the merged NFA
[Plot: number of transitions / states (400,000 to 2,000,000) against maximum path compression length (no compr., 2, 4, 6, 8, 16); series: transitions, states]
Dataset: all 40,696 ASs

Evaluation: Results of Simulation (2)
Degree of non-determinism at states in the merged NFA
[Plot: number of states (log scale, 1 to 1e+07) against degree of non-determinism (1, 2, 3); series: max path length 1, 2, 4, 6, 8, 16]
Dataset: all 40,696 ASs

Evaluation: Results of Simulation (3)
[Plot: % of states (log scale, 1e-06 to 1) against ">= k out degree" (log scale, 1 to 100000); series: max. path compression length 6, 8, 16]
Dataset: all 40,696 ASs

Evaluation: Results of Emulation (1)
Search duration for five runs with 500 randomly connected peers, 500 regular expressions and 500 search strings.
[Plot: number of strings matched (0 to 500) against search duration in s (0 to 100)]

Summary and Future Work
Achievements
- Capability discovery in DHT-based P2P networks using regular expressions
- Linear latency in the length of the search string
- Suitable for applications that can tolerate moderate latency
Future Work
- Use regular expression search in new applications
- Open problem: searching using a regular expression
- Ultra-large scale profiling (SuperMUC)

Thank you!

Appendix