Formal Semantics for Composable Workflows for Scraper
Understanding Flows
Albert Schimpf
wiki.scraper.server1.link
Technische Universität Kaiserslautern (TUK), Kyoto University
Schimpf (TUK, Kyoto University) Scraper Master’s Thesis 18/19 1 / 26
Content
Motivation & Observation
Flows
Motivation for an Operational Semantics
Scraper Language
Insights & Future Work
Problem Boundary
The task is ...
◮ resource-intensive (proxies, I/O-bound)
◮ resumable and long-running
◮ a flexible stream of fresh data
◮ easily modifiable in structure
and not ...
◮ CPU-intensive
◮ user-interactive
Informal Use Case Specification
Possible Approaches
One program for each task (Java)
◮ too much effort
◮ fragile
◮ code duplication ...
Reuse functionality, abstract and share code
◮ modifications of subroutines affected other programs
◮ mixed control flow and data flow is hard to reason about
◮ a language focused on control flow is less suited to data-flow problems
Encapsulate functions into nodes and connect them
→ nodes don't crash; only the threads that access a node crash
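The last approach, where a failure ends only the accessing thread while the node stays usable, can be sketched in Java. This is a minimal sketch under stated assumptions. The names `FunctionalNode` and `IsolatedRunner` are hypothetical. So are the pool size of 4 and the policy of handing each thread its own copy of the flow map. None of this is Scraper's actual API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical interface: a node is a function from an input flow map to an output flow map.
interface FunctionalNode {
    Map<String, Object> apply(Map<String, Object> flow);
}

// Hypothetical runner: each flow is processed on a pool thread, so an exception thrown
// while the node handles one flow ends only that flow, not the node.
final class IsolatedRunner {
    static List<Optional<Map<String, Object>>> runAll(FunctionalNode node,
                                                      List<Map<String, Object>> flows) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // assumed pool size
        try {
            List<Future<Map<String, Object>>> futures = new ArrayList<>();
            for (Map<String, Object> flow : flows) {
                // Each task gets its own copy, so flows never share a mutable map.
                Map<String, Object> copy = new HashMap<>(flow);
                futures.add(pool.submit(() -> node.apply(copy)));
            }
            List<Optional<Map<String, Object>>> results = new ArrayList<>();
            for (Future<Map<String, Object>> f : futures) {
                try {
                    results.add(Optional.of(f.get()));
                } catch (ExecutionException e) {
                    // The accessing flow failed; the node itself remains usable.
                    results.add(Optional.empty());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    results.add(Optional.empty());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

A failed flow shows up as an empty `Optional` at its position. The results for the other flows are unaffected.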
Functional Nodes
Functional nodes, but what about ...
◮ ... connecting them (graph structure)?
◮ ... how data is passed around (API)?
◮ ... concurrent access?
◮ ... configuration?
◮ ... complex control flow and data parallelism?
Use specifications (a DSL) instead of programming in Java
Summary: Requirements
Reusable & adaptable nodes
Separation of business logic and program logic
Quasi-static, graph-like specification
Reliability
◮ Guarantee concurrent access to, and processing of, data at any time without crashing
Robustness
◮ Errors only happen during initialization of the specification
◮ After initialization, errors are guaranteed to be of a business-logic nature
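The robustness requirement moves structural errors to initialization. The following Java sketch illustrates that under an assumed, hypothetical spec shape: a map from each node label to the label of its forward target, with `null` meaning the flow ends there. `Specification` and `initialize` are illustrative names, not Scraper's API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical specification: each node label maps to its forward target's label,
// or to null if that node ends the flow with its result data.
final class Specification {
    private final Map<String, String> forwardTargets;

    private Specification(Map<String, String> forwardTargets) {
        this.forwardTargets = forwardTargets;
    }

    // All structural checks happen here, once, before any flow is processed.
    static Specification initialize(Map<String, String> forwardTargets) {
        if (forwardTargets.isEmpty()) {
            throw new IllegalArgumentException("specification has no nodes");
        }
        for (Map.Entry<String, String> e : forwardTargets.entrySet()) {
            String target = e.getValue();
            if (target != null && !forwardTargets.containsKey(target)) {
                throw new IllegalArgumentException(
                    "node '" + e.getKey() + "' forwards to unknown label '" + target + "'");
            }
        }
        return new Specification(Collections.unmodifiableMap(new LinkedHashMap<>(forwardTargets)));
    }

    // After initialization, following targets from a known label only yields
    // known labels or null (end of flow).
    String next(String label) {
        return forwardTargets.get(label);
    }
}
```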
Flows - Arrows & Nodes
Simple Graph
Nodes ...
◮ ... implement a single unit of work
◮ ... forward data to another node
No forward target denotes the end, with some result data
Where is the process, and how is work done?
Flows (implicit)
The initial (empty) flow map F_i is accepted by the first node
The result data for input F_i is F_1
Dependent & Dispatched Flows
A dispatch node creates a new flow
The new flow is independent: F_2 does not depend on F_1
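A dispatched flow can be made independent by starting it from its own copy of the current map. The following Java sketch assumes a hypothetical `Dispatch.dispatch` helper that models the new flow's remaining nodes as a single `UnaryOperator` and runs it asynchronously. The copy is shallow, so nested mutable JSON values would still be shared; a deep copy would be needed for full independence.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.UnaryOperator;

// Hypothetical dispatch node: the current flow F_1 continues unchanged, while a new
// flow F_2 starts from a copy of F_1's map and runs on its own thread.
final class Dispatch {
    static CompletableFuture<Map<String, Object>> dispatch(
            Map<String, Object> current, UnaryOperator<Map<String, Object>> newFlow) {
        // Shallow copy: puts and removes in F_2 are invisible to F_1.
        Map<String, Object> snapshot = new HashMap<>(current);
        return CompletableFuture.supplyAsync(() -> newFlow.apply(snapshot));
    }
}
```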
Design Decisions
Data in flows is represented as JSON maps
Nodes are functional map consumers
Nodes are configurable; configuration is typed
Nodes are addressable by label
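The last two design decisions are typed configuration and addressing by label. The following Java sketch illustrates both, assuming a hypothetical schema format that pairs each required configuration key with a Java class. `ConfiguredNode` and `NodeRegistry` are illustrative names only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node with a typed configuration: a mismatched or missing key is rejected
// when the node is constructed, not when a flow reaches it.
final class ConfiguredNode {
    final String label;
    private final Map<String, Object> config;

    ConfiguredNode(String label, Map<String, Class<?>> schema, Map<String, Object> config) {
        for (Map.Entry<String, Class<?>> field : schema.entrySet()) {
            Object value = config.get(field.getKey());
            // isInstance(null) is false, so missing keys are rejected as well.
            if (!field.getValue().isInstance(value)) {
                throw new IllegalArgumentException("node '" + label + "': key '" + field.getKey()
                    + "' must be of type " + field.getValue().getSimpleName());
            }
        }
        this.label = label;
        this.config = new HashMap<>(config);
    }

    <T> T get(String key, Class<T> type) {
        return type.cast(config.get(key));
    }
}

// Hypothetical registry: nodes are addressed by their unique label.
final class NodeRegistry {
    private final Map<String, ConfiguredNode> byLabel = new HashMap<>();

    void register(ConfiguredNode node) {
        if (byLabel.putIfAbsent(node.label, node) != null) {
            throw new IllegalArgumentException("duplicate label '" + node.label + "'");
        }
    }

    ConfiguredNode lookup(String label) {
        return byLabel.get(label);
    }
}
```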
Formalization - Considerations
Coordination languages
CCS / π-calculus and extensions
Concurrent object-oriented calculi
Petri nets
Operational semantics
◮ Goal: type safety
Scraper Syntax
Control flow is defined explicitly
Functional nodes are handled with mod
Process lookups enable complex configurations
Processes are concatenations of nodes
Terms are used for the configuration of nodes
Terms are JSON values and templates τ
Templates are used to look up values in the map of the currently accessing flow
The looked-up element should match the expected type
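Resolving a simple template against the accessing flow's map can be sketched in Java as a lookup followed by a type check. In the formal semantics the typing is meant to rule out such mismatches statically. Here a runtime check stands in for it, as an illustrative assumption. `TemplateLookup.resolve` is a hypothetical name.

```java
import java.util.Map;

// Hypothetical simple-template lookup: resolve a key in the map of the accessing flow
// and check that the value has the type the node's configuration expects.
final class TemplateLookup {
    static <T> T resolve(Map<String, Object> flow, String key, Class<T> expected) {
        if (!flow.containsKey(key)) {
            throw new IllegalStateException("key '" + key + "' not present in flow");
        }
        Object value = flow.get(key);
        if (!expected.isInstance(value)) {
            throw new IllegalStateException("key '" + key + "' has type "
                + (value == null ? "null" : value.getClass().getSimpleName())
                + ", expected " + expected.getSimpleName());
        }
        return expected.cast(value);
    }
}
```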
The map (called the FM store) binds terms to keys
Typing: JSON objects
Scraper Evaluation
Scraper Evaluation - I
Active, forked, and concurrent processes
Scraper Evaluation - II
Nested evaluation
Pull concurrent configurations out of forks
Scraper Evaluation - III
Function evaluation is encapsulated
Process lookup inserts new nodes
DISP and FORK introduce concurrency
JOIN merges forked configurations
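FORK and JOIN can be sketched in Java. FORK starts concurrent configurations, each on its own copy of the store. JOIN waits for all of them and merges them back. The merge strategy below is an assumption, not taken from the semantics: the caller names the keys to join, and each key ends up bound to the list of branch values in branch order. `ForkJoin.forkJoin` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.UnaryOperator;

// Hypothetical FORK/JOIN: branches run concurrently on copies of the store; JOIN waits for
// all of them and binds each requested key to the list of branch values (branch order).
final class ForkJoin {
    static Map<String, Object> forkJoin(Map<String, Object> store,
                                        List<UnaryOperator<Map<String, Object>>> branches,
                                        List<String> joinKeys) {
        List<CompletableFuture<Map<String, Object>>> running = new ArrayList<>();
        for (UnaryOperator<Map<String, Object>> branch : branches) {
            Map<String, Object> copy = new HashMap<>(store); // FORK: independent configuration
            running.add(CompletableFuture.supplyAsync(() -> branch.apply(copy)));
        }
        // JOIN: wait for every forked configuration before merging.
        CompletableFuture.allOf(running.toArray(new CompletableFuture<?>[0])).join();
        Map<String, Object> joined = new HashMap<>(store); // keep the pre-fork bindings
        for (String key : joinKeys) {
            List<Object> values = new ArrayList<>();
            for (CompletableFuture<Map<String, Object>> f : running) {
                values.add(f.join().get(key)); // null if a branch did not bind the key
            }
            joined.put(key, values);
        }
        return joined;
    }
}
```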
Scraper Functions
Templates are evaluated inside functional nodes
No templates inside the map
Scraper Typing - Excerpt
Concurrent configurations are typed with the same environment
Process and store are typed separately
Join stores the old typing and joins it with a list of keys
Scope: Language vs. Implementation
Stateful nodes
◮ Time
Exceptions
Data-parallelism
◮ Map
◮ Map join
Complex templates
◮ Template expressions
◮ Key lookup @|τ : String|
  ⋆ @|a|: simple template
  ⋆ @@|a|: look inside maps (UnpackMapNode)
◮ Array lookup |τ : List<T>|[τ : Integer]
◮ String concatenation τ : String + τ : String
◮ Simple value
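The template expressions listed above can be sketched in Java as a small expression tree evaluated against the accessing flow's map. This is a hypothetical illustration. The class names and the runtime type check in `Template.as` are assumptions, and the check stands in for static typing. The map-unpacking form `@@|a|` is omitted because its exact semantics are not spelled out here. A simple template `@|a|` corresponds to a key lookup on the constant `"a"`.

```java
import java.util.List;
import java.util.Map;

// Hypothetical template AST; evaluation is against the map of the accessing flow.
interface Template {
    Object eval(Map<String, Object> flow);

    // Runtime check standing in for the static typing of template expressions.
    static <T> T as(Object v, Class<T> type) {
        if (!type.isInstance(v)) {
            throw new IllegalStateException("expected " + type.getSimpleName() + " but got " + v);
        }
        return type.cast(v);
    }
}

// Simple value: evaluates to itself.
final class Const implements Template {
    private final Object value;
    Const(Object value) { this.value = value; }
    public Object eval(Map<String, Object> flow) { return value; }
}

// Key lookup @|τ|: τ must evaluate to a String, which is then looked up in the flow.
final class KeyLookup implements Template {
    private final Template key;
    KeyLookup(Template key) { this.key = key; }
    public Object eval(Map<String, Object> flow) {
        return flow.get(Template.as(key.eval(flow), String.class));
    }
}

// Array lookup τ1[τ2]: τ1 must evaluate to a List, τ2 to an Integer index.
final class ArrayLookup implements Template {
    private final Template list;
    private final Template index;
    ArrayLookup(Template list, Template index) { this.list = list; this.index = index; }
    public Object eval(Map<String, Object> flow) {
        List<?> l = Template.as(list.eval(flow), List.class);
        return l.get(Template.as(index.eval(flow), Integer.class));
    }
}

// String concatenation τ1 + τ2: both sides must evaluate to Strings.
final class Concat implements Template {
    private final Template left;
    private final Template right;
    Concat(Template left, Template right) { this.left = left; this.right = right; }
    public Object eval(Map<String, Object> flow) {
        return Template.as(left.eval(flow), String.class) + Template.as(right.eval(flow), String.class);
    }
}
```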