
Smart Decisions: An Architecture Design Game

Smart Decisions: An Architecture Design Game. Humberto Cervantes, Serge Haziyev, Olha Hrytsay, Rick Kazman. April 2015. Agenda: Game Introductions, Game Rules, Discussion.


  1. Smart Decisions: An Architecture Design Game. Humberto Cervantes, Serge Haziyev, Olha Hrytsay, Rick Kazman. April 2015

  2. Agenda Game Introductions Game Rules Discussion

  3. Agenda Game Introductions Game Rules Discussion

  4. Agenda Game Introductions Game Rules Discussion

  5. Instructions This game intends to illustrate the essentials of architecture design using an iterative method such as ADD. You will be competing against other software architects (or other teams) from rival companies, so you need to make smart design decisions or else your competitors will leave you behind!

  6. Introduction ADD Step 1: Review Inputs Let’s start by reviewing the inputs to the design process…

  7. Functional drivers
     - Data source: Web Servers (hundreds of servers; massive logs from multiple sources)
     - UC-1, UC-2 Real-time monitoring: full-text search and a real-time dashboard, for 24/7 Operations, Support Engineers, and Developers
     - UC-3 Historical static reports: static reports available through the corporate BI tool, for Management
     - UC-4 Ad-hoc analysis: ad-hoc reports over raw and aggregated historical data with human-time queries, for Data Scientists/Analysts

  8. Quality attributes

  9. Constraints

  10. Agenda Game Introductions Game Rules Discussion

  11. Game Rules
      ADD Step 2: Review iteration goal and select inputs
      ADD Step 3: Choose one or more elements of the system to decompose
      The game is played in rounds, which represent the iterations. The goal for each iteration is provided:
      - Drivers to be considered
      - Element to decompose

  12. Instructions

  13. Iteration 1 goal: Logically structure the system
      Drivers for the iteration:
      - Ad-Hoc Analysis
      - Real-time Analysis
      - Unstructured data processing
      - Scalability
      - Cost Economy
      Element to decompose: Big Data System

  14. Game Rules
      ADD Step 4: Choose one or more design concepts that satisfy the inputs considered in the iteration
      Make the design decision of selecting design concepts:
      - Reference architectures
      - Patterns (including technology families)
      - Tactics
      - Externally developed components

  15. Game Rules: Design Concepts Cards
      Each card shows the name and type of the design concept and its influence on drivers.
      Card types:
      - Patterns: Reference Architectures, Families
      - Technologies

  16. Time to make your first smart decision!
      Drivers for the iteration:
      - Ad-Hoc Analysis
      - Real-time Analysis
      - Unstructured data processing
      - Scalability
      - Cost Economy
      Element to decompose: Big Data System
      Select 1 Reference Architecture card.
      Possible alternatives:
      - Extended Relational
      - Pure Non-Relational
      - Data Refinery
      - Lambda Architecture
      Disqualified alternatives:
      - Traditional Relational

  17. Fill the scorecard
      Fill (b) by adding the points for the drivers considered for the iteration, in this case:
      - Ad-Hoc Analysis
      - Real-time Analysis
      - Unstructured data processing
      - Scalability
      - Cost Economy
      Legend: [symbol] = 1 Point

  18. Introduction ADD Step 5: Instantiate elements, allocate responsibilities and define interfaces. ADD Step 6: Sketch views and record design decisions Record the design decision and throw two dice to simulate how well you instantiate your design concept

  19. Fill the scorecard
      Record design decisions in (a). Roll the dice, add or subtract points according to the following table, and fill (c).

  20. Introduction ADD Steps Review design decisions and score iteration. We will review the first iteration together, but the rest will be reviewed at the end.

  21. Iteration 1: Scoring
      Score Ad-Hoc Analysis, Real-time Analysis, Unstructured data processing, Scalability, Cost Economy.
      Design decision, driver points, bonus points, and comments:
      - Extended Relational: driver points 3+2+2+2+1 = 10; bonus points -4. This reference architecture is less appropriate for this solution, mostly because of cost and real-time analysis limitations.
      - Pure Non-Relational: driver points 2+2.5+3+3+3 = 13.5; no bonus points. This reference architecture is closer to the goal than the others except Lambda Architecture.
      - Lambda Architecture (Hybrid): driver points 2.5+3+3+3+3 = 14.5; bonus points +2. This is the most appropriate reference architecture for this solution! Of the provided reference architectures, Lambda Architecture promises the largest number of benefits, such as access to real-time and historical data at the same time.
      - Data Refinery (Hybrid): driver points 3+1+3+2+1 = 10; bonus points -4. This reference architecture is less appropriate for this solution, mostly because of cost and real-time analysis limitations.
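      The driver sums on the scoring slide can be checked with a short script. This is only a sketch: the per-driver values are copied from the slide in the order Ad-Hoc Analysis, Real-time Analysis, Unstructured data processing, Scalability, Cost Economy, while the names `DRIVER_POINTS`, `BONUS`, and `score` are illustrative assumptions and not part of the game materials. The dice roll is left out.

```python
# Per-driver points from the Iteration 1 scoring slide, in the order:
# Ad-Hoc Analysis, Real-time Analysis, Unstructured data processing,
# Scalability, Cost Economy.
DRIVER_POINTS = {
    "Extended Relational": [3, 2, 2, 2, 1],
    "Pure Non-Relational": [2, 2.5, 3, 3, 3],
    "Lambda Architecture": [2.5, 3, 3, 3, 3],
    "Data Refinery": [3, 1, 3, 2, 1],
}

# Bonus points from the same slide; a missing entry means no bonus.
BONUS = {
    "Extended Relational": -4,
    "Lambda Architecture": 2,
    "Data Refinery": -4,
}


def score(name):
    """Return (driver points, driver points + bonus), ignoring the dice roll."""
    drivers = sum(DRIVER_POINTS[name])
    return drivers, drivers + BONUS.get(name, 0)


# Print the alternatives from best to worst by driver points plus bonus.
for name in sorted(DRIVER_POINTS, key=lambda n: score(n)[1], reverse=True):
    drivers, with_bonus = score(name)
    print(f"{name}: drivers={drivers}, with bonus={with_bonus}")
```

      Under these assumptions, Lambda Architecture comes out on top and the two alternatives with a -4 bonus come out last, which matches the comments on the slide.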

  22. Fill the scorecard
      Add bonus points, if any, and fill (d). Sum the points and calculate the total for the iteration in (e).
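      The scorecard arithmetic can be sketched as follows, assuming that the total (e) is the sum of the driver points (b), the dice adjustment (c), and the bonus points (d). The class `IterationScore` and its field names are hypothetical. The dice table is not reproduced in this transcript, so the dice adjustment is passed in as a plain number instead of being derived from a roll.

```python
from dataclasses import dataclass


@dataclass
class IterationScore:
    decision: str            # (a) recorded design decision
    driver_points: float     # (b) points for the drivers of the iteration
    dice_adjustment: float   # (c) points added or subtracted after the roll
    bonus_points: float = 0  # (d) bonus points, if any

    @property
    def total(self) -> float:
        """(e): assumed to be the sum of (b), (c) and (d)."""
        return self.driver_points + self.dice_adjustment + self.bonus_points


# Driver and bonus points are taken from the Iteration 1 scoring slide.
# The dice adjustment of 0 is a placeholder, not a value from the game.
first = IterationScore("Lambda Architecture", 14.5, dice_adjustment=0, bonus_points=2)
print(first.total)
```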

  23. Lambda Architecture Logical Structure
      Diagram elements: Data Stream; Batch Layer (Master Dataset, Pre-Computing); Serving Layer (Batch Views); Speed Layer (Real-time Views); Query & Reporting.
      Source: http://lambda-architecture.net/

  24. Big Data Analytics Reference Architectures Trade-off
      Chart comparing Traditional Relational, Extended Relational, Data Refinery, Pure Non-relational, and Lambda Architecture.
      Legend:
      - Scalability
      - Unstructured data processing capabilities (the larger the better)
      - Real-time analysis capabilities (the more saturated the better)
      - Ad-hoc analysis

  25. Instructions

  26. Iteration 2: Design Data Stream Element
      Element to decompose: Data Stream
      Drivers for the iteration:
      - Performance (for Family and Technology)
      - Compatibility (for Family)
      - Reliability (for Technology)
      Select 1 Family card and 1 Technology card.
      Disqualified alternatives:
      - ETL Engine (lack of real-time data stream support and no need for complex data transformations)
      Tip: Look for an option that can be deployed on-premise and in the cloud.

  27. Iteration 3: Design Batch Layer
      Element to decompose: Batch Layer
      Drivers for the iteration:
      - Scalability
      - Availability
      Select 1 Family card.
      Disqualified alternatives:
      - NoSQL Database/Key-Value
      - NoSQL Database/Graph-Oriented
      - Analytic RDBMS
      - Distributed Search Engine
      Tip: Look for an option with better extensibility (easy storing of new data formats).

  28. Iteration 4: Design Serving Layer
      Element to decompose: Serving Layer
      Drivers for the iteration:
      - Ad-hoc Analysis (for Family)
      - Performance (for Family and Technology)
      Select 1 Family card and 1 Technology card.
      Disqualified alternatives:
      - NoSQL Database/Key-Value
      - NoSQL Database/Graph-Oriented
      - Analytic RDBMS
      - Distributed Search Engine
      Tip: Look for an option that provides ad-hoc analysis and still good performance for static reports.

  29. Iteration 5: Design Speed Layer
      Element to decompose: Speed Layer
      Drivers for the iteration:
      - Ad-hoc Analysis (for the family)
      - Real-time Analysis (for the technology)
      Select 1 Family card and 1 Technology card.
      Disqualified alternatives:
      - NoSQL Database/Key-Value
      - NoSQL Database/Graph-Oriented
      - Analytic RDBMS
      Tip: Look for an option that provides full-text search capabilities and extensibility (new data formats and dashboard views).

  30. Iteration 2: Design decisions analysis and scoring
      Family card: score Performance and Compatibility
      - Data Collector: driver points 2+3 = 5; bonus points +2. Additional bonus is added for extensibility.
      - Distributed Message Broker: driver points 3+1 = 4; no bonus points.
      Technology card: score Performance and Reliability
      - Apache Flume: driver points 2+2 = 4
      - Logstash: driver points 2+2 = 4
      - Fluentd: driver points 2+3 = 5
      - RabbitMQ: driver points 2+2 = 4
      - Apache Kafka: driver points 3+2 = 5; bonus points +2. Additional bonus for easier deployment and configuration compared with other alternatives.
      - Amazon SQS: 0. Disqualified due to the deployment constraint (support on-premise and cloud).
      - Apache ActiveMQ: driver points 2+2 = 4

  31. Iteration 3: Design decisions analysis and scoring
      Family card: score Scalability and Availability
      - NoSQL Database/Column-Family: driver points 3+3 = 6; bonus points -1. Column families must be defined up front and require modification when the log format is changed (an extensibility disadvantage).
      - NoSQL Database/Document-Oriented: driver points 3+3 = 6; no bonus points.
      - Distributed File System: driver points 3+3 = 6; bonus points +2. Bonus for extensibility (log format changes do not require any changes in the DFS cluster) and easier deployability/maintainability compared with NoSQL databases.
      Note: If you selected Fluentd during the previous iteration and DFS at this iteration, you receive a -1 performance bonus (Fluentd uses WebHDFS, which pays a small performance cost due to HTTP).
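      The note about combining Fluentd with DFS makes the Iteration 3 score depend on the Iteration 2 choice. A minimal sketch of that rule, assuming the -1 is simply added to the other bonus points; the names `FAMILY_SCORES` and `iteration3_points` are hypothetical, and the dice roll is not modeled.

```python
# Family card scores from the Iteration 3 scoring slide: (driver points, bonus).
FAMILY_SCORES = {
    "NoSQL Database/Column-Family": (6, -1),
    "NoSQL Database/Document-Oriented": (6, 0),
    "Distributed File System": (6, 2),
}


def iteration3_points(family, iteration2_technology):
    """Driver points plus bonus for Iteration 3, before the dice roll."""
    drivers, bonus = FAMILY_SCORES[family]
    # Slide note: Fluentd followed by DFS costs 1 point (WebHDFS over HTTP).
    if (family == "Distributed File System"
            and iteration2_technology.lower() == "fluentd"):
        bonus -= 1
    return drivers + bonus


print(iteration3_points("Distributed File System", "Fluentd"))       # 7
print(iteration3_points("Distributed File System", "Apache Kafka"))  # 8
```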
