
Anomaly Based Intrusion Detection in Distributed Applications



  1. Anomaly Based Intrusion Detection in Distributed Applications without global clock
  Eric Totel, Mouna Hkimi, Michel Hurfin, Mourad Leslous, Yvan Labiche
  SEC2-2016, 5 July 2016

  2. Outline of the Presentation
  • Position of the problem
  • Building a distributed application behavior model
    • Partial event ordering
    • Automaton recognizing sequences
    • Temporal properties
  • Applying detection on an example
  • Results on a distributed file system application
  CentraleSupelec

  3. Intrusion Detection in Distributed Systems
  • Several nodes running processes
  • Intrusion detection systems are deployed:
    • On the network (NIDS)
    • On each node (HIDS)
  • Local detection of compromise
    • No relationship between the states of the different nodes
    • Emitted alerts take into account the state of a single node
  • Current solutions
    • Alert correlation: requires a total ordering of all alerts
    • DIDS: requires a total ordering of all analyzed events
    • In cloud environments, virtual machines are often desynchronized (clock drift)

  4. The Case of a Distributed Application
  • How can detection be improved?
  • Statement
    • The states of the different nodes are not independent
    • As such, the behaviors of the different nodes of the application are not independent
    • The actions performed by the nodes are causally dependent on each other:
      • Local actions
      • Messages exchanged
  • Solution
    • Build a reference model that takes into account the causal dependencies between the nodes
    • Without relying on a total ordering of the events (no global clock)

  5. Logs and Partial Ordering
  • On each node, a process produces a totally ordered log
  • Events on different nodes are only partially ordered (Lamport's happened-before relation). For all $e, f \in E^\alpha$, $e \rightarrow_\alpha f$ iff:
    • $e$ occurred before $f$ in the same log $E^\alpha_i$, or
    • $e$ is a message send and $f$ is its receipt, or
    • there exists $g$ such that $e \rightarrow_\alpha g$ and $g \rightarrow_\alpha f$
  • How can the right sequences of actions performed by the distributed processes be learned?
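
As an illustration, the three rules above can be applied mechanically to a set of logs: add the per-log order, add the send/receive pairs, then close the relation transitively. This is a minimal Python sketch, not the authors' implementation; the input format (a dict of per-process logs plus a list of (send, receive) pairs) and the name `happened_before` are assumptions, and event names are assumed unique within an execution.

```python
def happened_before(logs, messages):
    """Return the set of pairs (e, f) such that e ->_alpha f.

    logs: dict mapping a process name to its totally ordered list of events
    messages: list of (send_event, receive_event) pairs
    (hypothetical input format; event names are assumed unique)
    """
    hb = set()
    # Rule 1: e occurred before f in the same log.
    for log in logs.values():
        for i, e in enumerate(log):
            for f in log[i + 1:]:
                hb.add((e, f))
    # Rule 2: e is a message send and f is its receipt.
    hb.update(messages)
    # Rule 3: transitivity, via Warshall's algorithm (g is the intermediate event).
    events = [e for log in logs.values() for e in log]
    for g in events:
        for e in events:
            if (e, g) in hb:
                for f in events:
                    if (g, f) in hb:
                        hb.add((e, f))
    return hb


# Execution alpha from the example of logs.
logs = {"p1": ["a", "b", "c1!m"], "p2": ["d", "c1?m", "e"]}
hb = happened_before(logs, [("c1!m", "c1?m")])
print(("a", "e") in hb)                      # True: a -> b -> c1!m -> c1?m -> e
print(("b", "d") in hb or ("d", "b") in hb)  # False: b and d are concurrent
```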

  6. Example of Logs
  • A trace: the logs of two processes, p1 and p2, for execution α (c1!m sends message m on channel c1; c1?m is its receipt)

    Position   $E^\alpha_1$ (p1)   $E^\alpha_2$ (p2)
    1          a                   d
    2          b                   c1?m
    3          c1!m                e

  • On this execution, $a \rightarrow_\alpha b$
  • There is no order relation between b and d
  • {a, b, d, c1!m, c1?m, e} is a valid sequence… but not the only one!
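
To see why {a, b, d, c1!m, c1?m, e} is not the only valid sequence, all orderings of execution α that respect happened-before can simply be enumerated. A minimal Python sketch; `EVENTS`, `EDGES` and `valid_sequences` are illustrative names, and only the direct orderings (program order in each log plus the message edge) are listed, since respecting them implies respecting their transitive closure.

```python
# Execution alpha: direct happened-before edges (program order + message edge).
EVENTS = ["a", "b", "c1!m", "d", "c1?m", "e"]
EDGES = [("a", "b"), ("b", "c1!m"), ("d", "c1?m"), ("c1?m", "e"),
         ("c1!m", "c1?m")]


def valid_sequences(events, edges):
    """Yield every total order of `events` that respects `edges`."""
    preds = {e: {x for x, y in edges if y == e} for e in events}

    def extend(prefix, remaining):
        if not remaining:
            yield list(prefix)
            return
        for e in sorted(remaining):        # sorted only for deterministic output
            if preds[e] <= set(prefix):    # all predecessors already placed
                prefix.append(e)
                yield from extend(prefix, remaining - {e})
                prefix.pop()

    yield from extend([], frozenset(events))


seqs = list(valid_sequences(EVENTS, EDGES))
print(len(seqs))  # 4: d can be placed anywhere before c1?m
```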

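  7. Notion of a Valid Sequence
  • An observed, correct (normal) sequence
  • Compliant with the partial order relation
  • A sequence of events $s_1 s_2 \ldots s_n$ of execution α is valid iff it contains every event of $E^\alpha$ exactly once and $\forall i, j,\ s_i \rightarrow_\alpha s_j \Rightarrow i < j$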
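
The validity condition translates directly into a check on positions. A minimal Python sketch under the same assumptions as before (unique event names, direct happened-before pairs of execution α hard-coded); `is_valid` is an illustrative name.

```python
# Direct happened-before pairs of execution alpha. Because positions in a
# sequence are totally ordered, respecting these pairs also respects their
# transitive closure.
EVENTS = ["a", "b", "c1!m", "d", "c1?m", "e"]
DIRECT_HB = [("a", "b"), ("b", "c1!m"), ("d", "c1?m"), ("c1?m", "e"),
             ("c1!m", "c1?m")]


def is_valid(sequence, events=EVENTS, hb=DIRECT_HB):
    """Valid iff every event occurs exactly once and e precedes f whenever e -> f."""
    if sorted(sequence) != sorted(events):
        return False
    position = {ev: i for i, ev in enumerate(sequence)}
    return all(position[e] < position[f] for e, f in hb)


print(is_valid(["a", "b", "d", "c1!m", "c1?m", "e"]))  # True
print(is_valid(["a", "b", "d", "c1?m", "c1!m", "e"]))  # False: receipt before send
```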

  8. Generation of Valid Sequences (1)
  • Generating the lattice of consistent cuts of execution α
  • A valid sequence is a sequence of events consumed by a path in the lattice of consistent cuts
  [Figure: the logs of execution α (p1: a, b, c1!m; p2: d, c1?m, e) and the corresponding lattice of consistent cuts]
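
For two processes, a cut can be written as a pair (i, j): the number of events already consumed from each log. A cut is consistent when every receive it contains also has its send in the cut. The Python sketch below (a minimal illustration; `LOGS`, `SEND_OF` and the function names are assumptions) builds the lattice for execution α and reads the valid sequences off its paths.

```python
from itertools import product

# Execution alpha; c1?m is the receipt of the message sent by c1!m.
LOGS = [["a", "b", "c1!m"], ["d", "c1?m", "e"]]
SEND_OF = {"c1?m": "c1!m"}  # receive event -> matching send event


def is_consistent(cut):
    """Consistent if every receive in the cut also has its send in the cut."""
    included = {ev for log, n in zip(LOGS, cut) for ev in log[:n]}
    return all(SEND_OF[ev] in included for ev in included if ev in SEND_OF)


CUTS = {cut for cut in product(*(range(len(log) + 1) for log in LOGS))
        if is_consistent(cut)}


def successors(cut):
    """Lattice edges: consume the next event of one process, staying consistent."""
    for p, log in enumerate(LOGS):
        if cut[p] < len(log):
            nxt = cut[:p] + (cut[p] + 1,) + cut[p + 1:]
            if nxt in CUTS:
                yield log[cut[p]], nxt


def paths(cut):
    """Event sequences along every path from `cut` to the final cut."""
    if cut == tuple(len(log) for log in LOGS):
        return [[]]
    return [[ev] + rest for ev, nxt in successors(cut) for rest in paths(nxt)]


valid = paths((0, 0))
print(len(CUTS), len(valid))  # 10 4
```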

  9. Generation of Valid Sequences (2)
  • Generation of an automaton containing all the paths in the lattice of consistent cuts
  [Figure: the automaton generated from the lattice of consistent cuts of execution α, with transitions labelled a, b, c1!m, d, c1?m and e]
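
The lattice itself can serve as the automaton: each consistent cut becomes a state and each lattice edge a transition labelled with the consumed event. A minimal Python sketch, assuming a breadth-first construction and a dictionary `(state, event) -> state` for transitions; `build_automaton` and `accepts` are hypothetical names, not the authors' API.

```python
from collections import deque

LOGS = [["a", "b", "c1!m"], ["d", "c1?m", "e"]]
SEND_OF = {"c1?m": "c1!m"}  # receive event -> matching send event


def build_automaton(logs, send_of):
    """BFS over consistent cuts: one state per cut, one transition per lattice edge."""
    start = tuple(0 for _ in logs)
    final = tuple(len(log) for log in logs)
    ids = {start: 0}
    delta = {}  # (state, event) -> state
    queue = deque([start])
    while queue:
        cut = queue.popleft()
        done = {ev for log, n in zip(logs, cut) for ev in log[:n]}
        for p, log in enumerate(logs):
            if cut[p] == len(log):
                continue
            ev = log[cut[p]]
            if ev in send_of and send_of[ev] not in done:
                continue  # consuming a receive before its send: inconsistent cut
            nxt = cut[:p] + (cut[p] + 1,) + cut[p + 1:]
            if nxt not in ids:
                ids[nxt] = len(ids)
                queue.append(nxt)
            delta[(ids[cut], ev)] = ids[nxt]
    return 0, {ids[final]}, delta


def accepts(automaton, sequence):
    start, finals, delta = automaton
    state = start
    for ev in sequence:
        if (state, ev) not in delta:
            return False
        state = delta[(state, ev)]
    return state in finals


A = build_automaton(LOGS, SEND_OF)
print(accepts(A, ["a", "b", "d", "c1!m", "c1?m", "e"]))  # True
print(accepts(A, ["a", "d", "c1?m", "b", "c1!m", "e"]))  # False
```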

  10. Automaton from Several Executions
  [Figure: the logs of executions α and β (execution β contains the events a, f, c1!m, c1?m and g) and the automaton generated for each execution]
  • Merge the start states of all the automata
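
Merging the start states yields an automaton that accepts the union of the learned sequences, in general a nondeterministic one. A minimal Python sketch: for brevity each execution's automaton is reduced here to a single path (`chain`), the second sequence is made up for illustration, and all function names are hypothetical.

```python
def chain(sequence):
    """Automaton accepting exactly one sequence: states 0..n in a line."""
    delta = {(i, ev): {i + 1} for i, ev in enumerate(sequence)}
    return 0, {len(sequence)}, delta


def merge_start_states(automata):
    """Union of several automata obtained by merging their start states.
    Other states are renamed (index, state) so that they stay distinct."""
    def rename(i, state, start):
        return "start" if state == start else (i, state)

    finals, delta = set(), {}
    for i, (start, fin, d) in enumerate(automata):
        finals |= {rename(i, s, start) for s in fin}
        for (s, ev), targets in d.items():
            key = (rename(i, s, start), ev)
            delta.setdefault(key, set()).update(rename(i, t, start) for t in targets)
    return "start", finals, delta


def accepts(automaton, sequence):
    """Nondeterministic acceptance: follow every possible state at once."""
    start, finals, delta = automaton
    current = {start}
    for ev in sequence:
        current = set().union(*(delta.get((s, ev), set()) for s in current))
    return bool(current & finals)


alpha = ["a", "b", "c1!m", "d", "c1?m", "e"]  # a valid sequence of execution alpha
beta = ["a", "f", "c1!m", "c1?m", "g"]        # made-up sequence for a second execution
model = merge_start_states([chain(alpha), chain(beta)])
print(accepts(model, alpha), accepts(model, beta))  # True True
```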

  11. Analysis of the Automaton
  • Contains only the observed valid sequences
  • In practice:
    • In a large distributed application, it is very difficult to observe all the behaviors of the application because of concurrency
    • It is thus very difficult to learn a complete behavior model
  • Solution: generalization of the automaton
    • Allows new, unlearned behaviors to be introduced
    • Ensures that all the original valid sequences are included in the generalized automaton

  12. Generalization (k-tail Algorithm)
  [Figure: the merged automaton before and after k-tail generalization with k = 1]
  • k = 1 (a lower k yields stronger generalization)
  • Advantage: can introduce new, valid, unlearned sequences of events
  • Disadvantage: can introduce incorrect sequences of events at the same time
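
The idea behind k-tail generalization is to merge states whose futures look alike over the next k events. The Python sketch below implements one simple variant of k-tails (a state's k-tail is the set of event sequences of length k leaving it, or shorter ones ending at a state with no outgoing transition); the exact variant used in the work may differ, and the toy automaton and function names are hypothetical. Because merging only adds paths, every learned sequence is still accepted.

```python
def k_tails(delta, state, k):
    """Event sequences of length k leaving `state`, plus shorter ones that stop
    at a state with no outgoing transition (the end of a trace)."""
    outgoing = [(label, t) for (s, label), targets in delta.items()
                if s == state for t in targets]
    if k == 0 or not outgoing:
        return {()}
    return {(label,) + rest for label, t in outgoing
            for rest in k_tails(delta, t, k - 1)}


def k_tail_generalize(automaton, k):
    """Merge all states that have the same set of k-tails."""
    start, finals, delta = automaton
    states = ({start} | set(finals) | {s for s, _ in delta}
              | {t for targets in delta.values() for t in targets})
    cls = {s: frozenset(k_tails(delta, s, k)) for s in states}
    merged = {}
    for (s, label), targets in delta.items():
        merged.setdefault((cls[s], label), set()).update(cls[t] for t in targets)
    return cls[start], {cls[f] for f in finals}, merged


def accepts(automaton, sequence):
    start, finals, delta = automaton
    current = {start}
    for ev in sequence:
        current = set().union(*(delta.get((s, ev), set()) for s in current))
    return bool(current & finals)


# Toy automaton after merging the start states of two learned traces,
# a b c and x b d (hypothetical events, for illustration only).
learned = ("s", {"A3", "B3"}, {
    ("s", "a"): {"A1"}, ("A1", "b"): {"A2"}, ("A2", "c"): {"A3"},
    ("s", "x"): {"B1"}, ("B1", "b"): {"B2"}, ("B2", "d"): {"B3"},
})
generalized = k_tail_generalize(learned, k=1)
print(accepts(generalized, ["a", "b", "c"]))  # True: learned sequences are kept
print(accepts(generalized, ["a", "b", "d"]))  # True: new, unlearned sequence
```

Here A1 and B1 both have the 1-tail {b}, so they merge, and the automaton starts accepting a b d; whether such a new sequence is correct is exactly the question the next slides address.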

  13. How to Deal with Incorrect Sequences?
  • Duality of models
    • Automaton: exhaustive list of sequences
    • Temporal properties: properties on the types of events
  • Temporal invariants
    • Borrowed from the testing domain
    • Three invariants considered (a and b are event types):
      • a is always followed by b
      • a is never followed by b
      • a always precedes b
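
The three invariant types can be mined by keeping every (kind, a, b) that holds on all training sequences. A minimal Python sketch; the exact semantics are assumptions (an invariant whose event type never occurs holds vacuously, pairs with a = b are skipped), and the training set is the four valid sequences of execution α rather than the example's full data.

```python
from itertools import product


def always_followed_by(seq, a, b):
    """Every occurrence of a is eventually followed by some b."""
    return all(b in seq[i + 1:] for i, x in enumerate(seq) if x == a)


def never_followed_by(seq, a, b):
    """No occurrence of a is followed, later in the sequence, by b."""
    return not any(b in seq[i + 1:] for i, x in enumerate(seq) if x == a)


def always_precedes(seq, a, b):
    """Every occurrence of b is preceded, earlier in the sequence, by some a."""
    return all(a in seq[:i] for i, x in enumerate(seq) if x == b)


CHECKS = {"always_followed_by": always_followed_by,
          "never_followed_by": never_followed_by,
          "always_precedes": always_precedes}


def mine_invariants(sequences):
    """Keep every (kind, a, b) with a != b that holds on all training sequences."""
    types = sorted({ev for seq in sequences for ev in seq})
    return {(kind, a, b)
            for kind, check in CHECKS.items()
            for a, b in product(types, repeat=2)
            if a != b and all(check(seq, a, b) for seq in sequences)}


def violations(seq, invariants):
    """Invariants that `seq` breaks."""
    return {inv for inv in invariants if not CHECKS[inv[0]](seq, inv[1], inv[2])}


# The four valid sequences of execution alpha used as training data.
TRAINING = [["a", "b", "c1!m", "d", "c1?m", "e"],
            ["a", "b", "d", "c1!m", "c1?m", "e"],
            ["a", "d", "b", "c1!m", "c1?m", "e"],
            ["d", "a", "b", "c1!m", "c1?m", "e"]]
invariants = mine_invariants(TRAINING)
print(("always_followed_by", "d", "e") in invariants)  # True
print(violations(["a", "b", "c1!m", "d", "c1?m", "g"], invariants))
```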

  14. Invariants on Our Example
  [Figure: the automaton learned from the example, its generalization, and the temporal invariants (59 in total) checked against the generalized automaton by model checking]

  15. Duality of Models
  [Figure: the generalized automaton and the invariants that it can violate (10 in total)]
  • Non-acceptable sequence: {a, b, c1!m, d, c1?m, g}

  16. Valid, Accepted and Acceptable Sequences
  • Invariants are computed on the original lattice of consistent cuts
    • They are invariants on the valid sequences of events
  • Invariants are less restrictive than the automaton
  • A sequence is considered acceptable if it is accepted by the generalized automaton and complies with the invariants
  [Figure: $\Sigma$ (valid sequences) $\subseteq$ $\Sigma''$ (acceptable sequences) $\subseteq$ $\Sigma'$ (sequences accepted by the generalized automaton)]
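
Acceptability is the conjunction of the two models. A minimal Python sketch: `GENERALIZED` is a hand-written toy automaton (the result of 1-tail merging of the hypothetical traces a b c and x b d), and the two listed invariants are ones that hold on both of those training traces; none of these names come from the original work.

```python
def nfa_accepts(automaton, sequence):
    start, finals, delta = automaton
    current = {start}
    for ev in sequence:
        current = set().union(*(delta.get((s, ev), set()) for s in current))
    return bool(current & finals)


def always_followed_by(seq, a, b):
    return all(b in seq[i + 1:] for i, x in enumerate(seq) if x == a)


# Hypothetical generalized automaton: the states after a and after x were merged
# into Q, so the automaton also accepts a b d and x b c.
GENERALIZED = ("S", {"F"}, {
    ("S", "a"): {"Q"}, ("S", "x"): {"Q"},
    ("Q", "b"): {"Qc", "Qd"},
    ("Qc", "c"): {"F"}, ("Qd", "d"): {"F"},
})
# Two invariants that hold on both training traces.
INVARIANTS = [(always_followed_by, "a", "c"), (always_followed_by, "x", "d")]


def acceptable(sequence, automaton=GENERALIZED, invariants=INVARIANTS):
    """Accepted by the generalized automaton AND compliant with every invariant."""
    return nfa_accepts(automaton, sequence) and all(
        check(sequence, a, b) for check, a, b in invariants)


print(nfa_accepts(GENERALIZED, ["a", "b", "d"]))  # True: introduced by generalization
print(acceptable(["a", "b", "d"]))                # False: violates "a always followed by c"
print(acceptable(["a", "b", "c"]))                # True: a learned sequence
```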

  17. Detection Algorithm
  • Given a trace, is it compliant:
    • With the generalized automaton?
    • With the temporal invariants?
  • Two strategies
    • All total orderings of the events of the trace are compliant with the model
    • At least one ordering of the events of the trace is compliant with the model
  • In practice
    • The "all" strategy is more time-consuming
    • Both approaches give similar false positive rates
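
Both strategies enumerate the total orderings of the trace that respect happened-before and differ only in the quantifier. A minimal Python sketch; the strategy names "all" and "exists" and the stand-in model (a plain set of learned sequences instead of the generalized automaton plus invariants) are assumptions for illustration.

```python
# Execution to analyse: one log per process; c1?m receives what c1!m sent.
LOGS = [["a", "b", "c1!m"], ["d", "c1?m", "e"]]
SEND_OF = {"c1?m": "c1!m"}


def orderings(logs, send_of):
    """All total orders of the trace compatible with happened-before:
    interleave the logs, never consuming a receive before its send."""
    def walk(cut, done, prefix):
        if all(n == len(log) for n, log in zip(cut, logs)):
            yield prefix
            return
        for p, log in enumerate(logs):
            if cut[p] == len(log):
                continue
            ev = log[cut[p]]
            if ev in send_of and send_of[ev] not in done:
                continue
            nxt = cut[:p] + (cut[p] + 1,) + cut[p + 1:]
            yield from walk(nxt, done | {ev}, prefix + [ev])

    yield from walk(tuple(0 for _ in logs), frozenset(), [])


def is_intrusive(logs, send_of, acceptable, strategy):
    """Alert unless all ("all") or at least one ("exists") ordering is acceptable."""
    verdicts = (acceptable(order) for order in orderings(logs, send_of))
    ok = all(verdicts) if strategy == "all" else any(verdicts)
    return not ok


# Stand-in model: only two of the four valid orderings were learned.
LEARNED = {("a", "b", "c1!m", "d", "c1?m", "e"),
           ("d", "a", "b", "c1!m", "c1?m", "e")}


def acceptable(seq):
    return tuple(seq) in LEARNED


print(is_intrusive(LOGS, SEND_OF, acceptable, "exists"))  # False
print(is_intrusive(LOGS, SEND_OF, acceptable, "all"))     # True: a false positive
```

`any` can stop at the first acceptable ordering, whereas `all` must examine every ordering of a normal trace, which is why "all" costs more. With this ungeneralized toy model, "all" also raises a false positive; the slides report similar false positive rates for both strategies with the generalized automaton and invariants.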

  18. Simple Example: E-commerce
  • 3 processes buying an article, with 70 possible different behaviors
    • Process p2: P2-P1!SEARCH, P1-P2?AVAILABLE, P2-P1!BUY, P1-P2?SOLD
    • Process p3: P3-P1!SEARCH, P1-P3?AVAILABLE, P3-P1!BUY, P1-P3?SOLD
    • Server p1: for p2, P2-P1?SEARCH, P1-P2!AVAILABLE, P2-P1?BUY, P1-P2!SOLD; for p3, P3-P1?SEARCH, P1-P3!AVAILABLE, P3-P1?BUY, P1-P3!SOLD
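
The figure of 70 is consistent with one reading of the example, which is an assumption rather than something the slide states: each client's exchange with the server is fixed (search, availability, buy, sold), so the only freedom is how the server interleaves the two 4-event exchanges in its own log, giving $\binom{8}{4} = 70$ orderings. A minimal Python sketch enumerating those interleavings:

```python
from math import comb

# Server-side events for each client, in protocol order.
P2 = ["P2-P1?SEARCH", "P1-P2!AVAILABLE", "P2-P1?BUY", "P1-P2!SOLD"]
P3 = ["P3-P1?SEARCH", "P1-P3!AVAILABLE", "P3-P1?BUY", "P1-P3!SOLD"]


def interleavings(xs, ys):
    """All merges of xs and ys that keep each list's internal order."""
    if not xs or not ys:
        yield xs + ys
        return
    for rest in interleavings(xs[1:], ys):
        yield [xs[0]] + rest
    for rest in interleavings(xs, ys[1:]):
        yield [ys[0]] + rest


server_logs = list(interleavings(P2, P3))
print(len(server_logs), comb(8, 4))  # 70 70
```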

  19. Detection Accuracy
  • Simulated intrusions:
    • Removing an event
    • Modifying the order of events
    • Adding new events
    • Violating the integrity of the distributed logs
  • All are detected by the approach

  20. Generalization and False Positive Rate
  • Learning phase with 10, 20, 30, 40, 50 and 60 traces
  • Generalization parameter k = 1, 2, 3, 4, 5
  • Result: generalization decreases the false positive rate, even when only a few traces are learned
  [Figure: false positive rate versus k, one curve per number of learned traces (10 to 60)]

  21. Real-World Evaluation: XtreemFS
  • High-availability distributed replicated file system
  • Intrusion detection approach applied to a simple configuration of the nodes

  22. Experimental Setup
  • Writing a set of files
    • 500 files used to learn the model
    • 1640 files written to measure the false positive rate
  • Traces obtained on each node by instrumenting the code of the file servers
    • One trace per complete file write

  23. Model Size
  • As the number of traces used to learn the model grows:
    • The number of invariants decreases
    • The size (number of states) of the automaton grows (k-tail applied with k = 1)
  [Figure: number of invariants (left axis, 6200 to 7800) and number of automaton states (right axis, 0 to 800) versus number of learning traces (10 to 500)]
