Grammatical Inference and Machine Learning Approaches to Post-Hoc LangSec
Sheridan Curley and Dr. Richard Harang (ARL)
Outline
Theory approach
– Grammatical inference
– LangSec
Paper's work
– Machine learning to bypass hardness
– Our experimental setup
– Results
Moving Forward
Conclusion
Grammatical Inference
Grammars are tuples:
– 𝑯 = ⟨𝑾, 𝚻, 𝑺, 𝑻⟩
– Set of nonterminal characters, 𝑾
– Set of terminal characters, 𝚻, where 𝚻 ∩ 𝑾 = ∅
  • AKA the alphabet
– Production rules, 𝑺: 𝑾 → (𝑾 ∪ 𝚻)*
– Set of starting characters, 𝑻 ⊂ 𝑾
Grammars generate languages:
– ℒ(𝑯) = { 𝒙 ∈ 𝚻* : 𝑻 ⇒* 𝒙 }, where ⇒* denotes the reflexive, transitive closure of the derivation relation
(A toy sketch of this definition follows.)
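A toy Python sketch of the tuple definition above. The grammar here (nonterminals A and B, terminals a and b) is hypothetical, chosen only to make the definition concrete; it is not from the talk.

```python
W = {"A", "B"}              # nonterminals
T = {"a", "b"}              # terminals (the alphabet), disjoint from W
S = {                       # production rules S: W -> (W ∪ T)*
    "A": ["aB", "a"],
    "B": ["bA", "b"],
}
start = "A"                 # starting character, an element of W

def language(limit=5):
    """Enumerate the first few strings of L(H) by leftmost expansion."""
    done, queue = [], [start]
    while queue and len(done) < limit:
        s = queue.pop(0)
        nt = next((i for i, c in enumerate(s) if c in W), None)
        if nt is None:                # fully terminal: a member of L(H)
            done.append(s)
        else:                         # expand the leftmost nonterminal
            queue.extend(s[:nt] + rhs + s[nt + 1:] for rhs in S[s[nt]])
    return done

print(language())  # ['a', 'ab', 'aba', 'abab', 'ababa']
```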
Chomsky's Hierarchy
– Defines complexity classes of formal languages
– Four "levels"
– Lowest-level languages:
  • "Regular"
  • "Context-Free" (deterministic or nondeterministic)
Image: "Chomsky Hierarchy." Wikipedia. 30 April 2016. <https://en.wikipedia.org/>.
Key Questions
The biggest questions are:
– Given a grammar, what language does it produce?
– Are two grammars/languages equivalent?
– Can grammars be learned from language samples?
Inference Results
Most theoretical results are negative:
– Languages above "Regular" cannot be learned in the general case
– Even probabilistic identification is hard
  • cf. Valiant's Probably Approximately Correct (PAC) framework
Some language classes have learnable properties:
– Angluin's "pattern languages"
– Clark's "non-terminally separated" (NTS) languages
Pattern Language Example
Given: 𝚻 = {𝟏, 𝟐}, 𝒒 = 𝟐𝒚₂𝟏𝟐𝒚₃𝒚₄
Then: 𝒙 = {𝟐𝟐𝟏𝟐𝟐𝟐, 𝟐𝟏𝟏𝟐𝟐𝟐, 𝟐𝟏𝟏𝟐𝟏𝟐} ⊆ ℒ(𝒒)
– A restricted language class
– Membership testing is still NP-complete
Example from Angluin's "Finding Patterns Common to a Set of Strings"
(A membership-checking sketch for this pattern follows.)
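Because each variable in 𝒒 occurs exactly once, this particular pattern language happens to be regular, so membership can be checked with an ordinary regular expression (repeated variables are what push membership to NP-complete in general). A small illustrative sketch:

```python
import re

# q = 2·y2·1·2·y3·y4 over {1,2}: y2 is any nonempty string, and y3·y4
# together contribute at least two characters.
q = re.compile(r"^2[12]+12[12]{2,}$")

for s in ["221222", "211222", "211212", "21212"]:
    print(s, bool(q.match(s)))   # the first three match; "21212" is too short
```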
NTS Languages
Clark's Omphalos algorithm:
– Gives exact results
– Very slow
– May not converge in reasonable time
Example from Clark's "Learning Deterministic Context Free Grammars: The Omphalos Competition"
Language Theoretic Security
Learning grammars is hard:
– Cannot determine whether a parser's grammar is equivalent to another
– Cannot enumerate all "safe" or "bad" strings for a parser
– Cannot generically learn all parsers with one method
To be secure...
– Parsers must be restricted to low levels of the Chomsky hierarchy
– This can be difficult given existing practices
Learning vs. Recognition
Computers are discrete and computational:
– There must be some underlying structure
– It should be possible to recognize valid structure
Rather than exact learning (hard), try close recognition:
– Relax the assumptions
Apply machine learning:
– Build and train on feature vectors from language examples (encoding sketch below)
Key differences:
– Exact learning: building "sentences" from parts using rules
– Machine learning: recognizing a language with only the "letters" known
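One plausible way (an assumption, not the authors' stated preprocessing) to build such feature vectors for a character-level model: map each character to an integer index and pad to a fixed length. An embedding layer then turns each index into a dense vector, which is equivalent to multiplying a one-hot vector by a weight matrix.

```python
import numpy as np

MAX_LEN = 256  # assumed maximum URI length

def encode(s, max_len=MAX_LEN):
    """Map a string to a fixed-length array of character indices (0 = padding)."""
    idx = [min(ord(c), 127) for c in s[:max_len]]   # clamp to 7-bit ASCII
    return np.array(idx + [0] * (max_len - len(idx)), dtype=np.int64)

batch = np.stack([encode(u) for u in ["/index.html", "/cgi-bin/../../etc/passwd"]])
print(batch.shape)  # (2, 256)
```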
Our Network
Multi-layered LSTM* network:
– One-hot feature vector input
– Embedding layer
– 3 layers of LSTM
– Softmax output
(A model sketch follows.)
*See Hochreiter & Schmidhuber's "Long Short-Term Memory"
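A minimal PyTorch sketch of the stack described above. The talk does not give layer sizes, so the dimensions here (embedding 32, hidden 128, 2 output classes) are illustrative assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn

class URIClassifier(nn.Module):
    """Character indices -> embedding -> 3 stacked LSTM layers -> softmax."""
    def __init__(self, vocab_size=128, embed_dim=32, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=3, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                      # x: (batch, seq_len) char indices
        h, _ = self.lstm(self.embed(x))        # h: (batch, seq_len, hidden_dim)
        logits = self.out(h[:, -1, :])         # read off the final timestep
        return torch.softmax(logits, dim=-1)   # distribution over labels

model = URIClassifier()
probs = model(torch.randint(1, 128, (4, 64)))  # 4 dummy sequences of length 64
print(probs.shape)                             # torch.Size([4, 2])
```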
Long Short-Term Memory
A subtype of recurrent neural network:
– Feeds forward to the next layer
– Feeds back into the same layer simultaneously
– Persistent "memory" that is edit-limited (see the gate equations below)
Shown to be able to learn over "long distances"
Image: Olah, Christopher. "Understanding LSTM Networks." Colah's Blog. 27 Aug. 2015. <http://colah.github.io/>.
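The "edit-limited memory" can be made concrete with the standard LSTM cell equations (Hochreiter & Schmidhuber's formulation plus the now-standard forget gate); this is the textbook form, not notation from the talk itself:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f)        && \text{forget gate}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i)        && \text{input gate}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o)        && \text{output gate}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate memory}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t  && \text{edit-limited cell state}\\
h_t &= o_t \odot \tanh(c_t)                       && \text{hidden output}
\end{aligned}
```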
Training Data
Labeled URI data from Apache server logs (parsing sketch below):
– URI + response code only
– A URI may carry multiple labels
The URI set is initially an unknown language:
– The network is given no prior structural information
– It knows nothing about the RFC or other rules for URIs
– URIs are, in theory, a context-free language
Goal is validation:
– Recognizing valid URIs only
– Rejecting improper/invalid URIs
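A hypothetical sketch of pulling (URI, label) pairs out of Apache logs. The labeling rule here (2xx/3xx responses = valid, 4xx = invalid) is an assumption for illustration; the talk says only that URIs and response codes were used.

```python
import re

# Matches the request and status fields of Apache common/combined log lines.
LOG_RE = re.compile(r'"(?:GET|POST|HEAD) (?P<uri>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

def labeled_uris(lines):
    """Yield (uri, label) pairs; label 1 = valid, 0 = invalid (assumed rule)."""
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            yield m.group("uri"), 1 if int(m.group("status")) < 400 else 0

log = ['127.0.0.1 - - [10/Oct/2016:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326']
print(list(labeled_uris(log)))  # [('/index.html', 1)]
```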
Results of LSTM Application
Improving Results
Practical learning is possible:
– Recognition rate for grouped URIs is >99%
– However, the false positive rate is high
The network can be trained to recognize URIs:
– With no prior knowledge
– However, training is time-consuming
– Practical use requires faster identification
Future Work
Possible: develop entropy-based rules (sketch below)
– Construct a quicker decision machine
Possible: test for vulnerability to malicious training
– Robustness of the result determines efficacy
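One speculative reading of the entropy-based direction: use the Shannon entropy of a URI's character distribution as a cheap first-pass filter before the slower network. Both the statistic and any threshold would be assumptions, not results from the talk.

```python
import math
from collections import Counter

def char_entropy(s):
    """Shannon entropy (bits/character) of the character distribution of s."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

print(round(char_entropy("/index.html"), 2))      # 3.46: short, mostly distinct chars
print(round(char_entropy("/aaaa/aaaa.html"), 2))  # lower: repeated characters
```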
Conclusion
Theory is often hard (very hard):
– Complicated languages have complicated structure
– No clear exact-learning results
Experimental results are promising:
– Despite the theory, a network can "learn" valid URIs
– Not perfect, but may be good enough
Learning differences:
– "Exact" inference builds rules and start/end symbols from given samples
– Machine learning builds a recognizer from the alphabet and given samples
– Machine learning can recognize unlearnable languages
Questions?