2/7/2018
Day 8: Workflow Cloud Resource Provisioning

Today's Agenda
• Introduction
• What is workflow?
• What are major QoS requirements?
• How is workflow represented?
• Our work in workflow scheduling
• Research problems
• Questions and conclusions

Introduction
• The cloud provides the resources needed to execute workflow applications, billed as follows:
– Resources are charged by type and duration of use
– Pay as you go (e.g., an hourly pricing scheme)
– Other evolving cloud fee structures (e.g., spot and reserved pricing)
[Figure: a cloud consumer submits service requests (jobs J1 to Jn) to a cloud provider]
• CPU time usage
• Data transfer cost (billed by GB): the cost to copy data to and from the cloud over the network
• Storage cost (billed by GB): the cost to store VM images
What is workflow?
• Simply stated, a workflow application is a divisible application: a large job split into tasks that have precedence constraints.
• Workflow applications are used in a range of domains, such as
– astrophysics,
– bioinformatics,
– disaster modeling and prediction, and
– neurosurgical imaging.

Directed Acyclic Graph
• A workflow application can be modelled as a Directed Acyclic Graph (DAG).
– The nodes represent the set of workflow tasks.
– The arcs represent the control flow or data dependencies between the tasks.
[Figure: example DAG with tasks T1 to T8]
– Dependent tasks require a specific execution order because of the relationships between them.
– Tasks have very different I/O and computational behavior.
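To make the DAG model concrete, here is a minimal Python sketch that stores a workflow as a successor list and derives one execution order that respects the precedence constraints (Kahn's algorithm). The arcs in `successors` are hypothetical; they are not the arcs of the figure on the slide, which are not recoverable here.

```python
from collections import deque

# Hypothetical arcs (task -> tasks that depend on it), for illustration only.
successors = {
    "T1": ["T3", "T4"],
    "T2": ["T3", "T5"],
    "T3": ["T6"],
    "T4": ["T6"],
    "T5": ["T7"],
    "T6": ["T8"],
    "T7": ["T8"],
    "T8": [],
}


def topological_order(successors):
    """Return one task order in which every task follows all its predecessors."""
    # Count incoming arcs (unfinished predecessors) for every task.
    indegree = {task: 0 for task in successors}
    for children in successors.values():
        for child in children:
            indegree[child] += 1

    # Tasks with no predecessors can start immediately.
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in successors[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(successors):
        raise ValueError("cycle detected: the graph is not a DAG")
    return order


order = topological_order(successors)
print(order)
```

Any order returned this way is a valid sequential execution; a scheduler is free to run tasks that are ready at the same time in parallel.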
Microarray gene expression
• Microarray gene expression data analysis workflow:
– Task 1: gene expression data is obtained from microarray experiments.
– Task 2: cluster analysis algorithms are used to identify genes that share similar patterns of gene expression profiles and are therefore predicted to be co-regulated as part of an interactive biochemical pathway.
– …
– Task 8: the consensus sequence is fed to the BLAST utility to determine whether the gene is a new candidate or the iteration should continue.

Real workflow applications
• The CyberShake workflow is used by the Southern California Earthquake Center to characterize the earthquakes threatening a region.
• The Epigenomics workflow, created by the USC Epigenome Center and the Pegasus framework, is used to automate the different operations in genome sequence processing.
• The Montage application, created by NASA/IPAC, stitches together multiple input images to form custom mosaics of the sky.
• LIGO's Inspiral Analysis workflow is used to create and analyze gravitational waveforms from data gathered during the coalescing of compact binary systems.

Epigenomics Workflow
• Workflows orchestrate complex, multi-stage scientific computations.
• A workflow can be parallelized automatically on distributed resources.
[Figure: Epigenomics workflow stages: Split, Filter & Convert, Map, Merge, Analyze. From Gideon Juve]
Large-Scale, Data-Intensive Workflows
• Montage Galactic Plane Workflow (John Good, Caltech)
– 18 million input images (~2.5 TB)
– 900 output images (2.5 GB each, 2.4 TB total)
– 10.5 million tasks (34,000 CPU hours)
• Scientific workflow management systems are designed to automatically distribute data and computations for these large applications.
From Gideon Juve

Workflow Application Patterns
• Montage (astronomy)
– I/O: High (95% of time waiting on I/O)
– Memory: Low
– CPU: Low
• Epigenome (bioinformatics)
– I/O: Low
– Memory: Medium
– CPU: High (99% of time in CPU)
• Broadband (earthquake science)
– I/O: Medium
– Memory: High (75% of time tasks use > 1 GB)
– CPU: Medium
From Gideon Juve
Workflow Management System: Workflow Scheduling
• Workflow scheduling is a process that maps inter-dependent tasks onto distributed resources and manages their execution.
• It allocates suitable resources to workflow tasks so that the execution completes while satisfying the objective functions imposed by users.
• Question: How viable is a general-purpose scheduler for scheduling workflows?

Scheduling Structure
• Create a schedule that meets the objectives of the problem.
• For example, a schedule that
– minimizes the total execution cost of a workflow
– while satisfying a user-defined deadline.
• Workflow scheduling is well known to be an NP-complete problem.
• Scheduling a workflow application is a multi-objective optimization problem (also known as Pareto optimization).
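The two ideas on the scheduling slide, cost minimization under a deadline and Pareto optimization, can be illustrated with a short Python sketch. It assumes that candidate schedules have already been generated and are summarized as `(makespan, cost)` pairs; the numbers, the `deadline` value, and the function names are hypothetical and chosen only for illustration.

```python
def pareto_front(schedules):
    """Keep the schedules that no other schedule dominates.

    A schedule q dominates p when q is no worse in both makespan and cost
    and differs from p (so it is strictly better in at least one of them).
    """
    front = []
    for p in schedules:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1] for q in schedules
        )
        if not dominated:
            front.append(p)
    return front


def cheapest_within_deadline(schedules, deadline):
    """Return the lowest-cost schedule whose makespan meets the deadline."""
    feasible = [s for s in schedules if s[0] <= deadline]
    if not feasible:
        return None
    return min(feasible, key=lambda s: s[1])


# Hypothetical candidate schedules as (makespan, cost) pairs.
candidates = [(10, 5), (8, 7), (12, 4), (9, 9), (8, 6)]
deadline = 10  # hypothetical user-defined deadline

front = pareto_front(candidates)
choice = cheapest_within_deadline(candidates, deadline)
print(front, choice)
```

Enumerating candidates like this only works for tiny examples. Because the problem is NP-complete, practical schedulers rely on heuristics to produce the candidates in the first place.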
Workflow Scheduling Classification
• There are two types of workflow scheduling:
– best-effort workflow scheduling
– quality of service (QoS) constrained workflow scheduling.

Best-effort workflow scheduling
• Focuses on reducing the execution time of the whole workflow, regardless of other factors.
• Examples of best-effort workflow scheduling algorithms:
– min-min algorithm: executes the small tasks first and delays the larger tasks for longer.
– max-min algorithm: executes the large tasks first and delays the small tasks for longer.
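The difference between min-min and max-min can be seen in a small Python sketch. It is a simplified version under stated assumptions: the tasks are independent (all already ready to run, so precedence constraints are ignored), every task has a known run time on every resource, and the run times in `exec_time` are hypothetical.

```python
def schedule(exec_time, pick_largest=False):
    """Greedy min-min (default) or max-min (pick_largest=True) scheduling.

    exec_time[task][resource] is the run time of the task on that resource.
    Returns (assignment, makespan), where assignment[task] = (resource, finish).
    """
    resources = sorted({r for times in exec_time.values() for r in times})
    ready_at = {r: 0.0 for r in resources}  # time each resource becomes free
    unscheduled = set(exec_time)
    assignment = {}

    while unscheduled:
        # For each task, find the resource giving the earliest completion time.
        best = {}
        for t in unscheduled:
            r = min(exec_time[t], key=lambda res: ready_at[res] + exec_time[t][res])
            best[t] = (ready_at[r] + exec_time[t][r], r)

        # min-min takes the task with the smallest such time, max-min the largest.
        chooser = max if pick_largest else min
        task = chooser(sorted(unscheduled), key=lambda t: best[t][0])

        finish, res = best[task]
        assignment[task] = (res, finish)
        ready_at[res] = finish
        unscheduled.remove(task)

    return assignment, max(ready_at.values())


# Hypothetical run times: three small tasks and one large task on two resources.
exec_time = {
    "t1": {"r1": 2, "r2": 3},
    "t2": {"r1": 2, "r2": 3},
    "t3": {"r1": 2, "r2": 3},
    "t4": {"r1": 6, "r2": 9},
}

_, min_min_makespan = schedule(exec_time)
_, max_min_makespan = schedule(exec_time, pick_largest=True)
print(min_min_makespan, max_min_makespan)
```

In this example max-min finishes earlier, because the large task starts first and the small tasks fill in around it. With other task mixes min-min can be the better choice.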
QoS Requirements
• Quality of Service (QoS) measures the level of satisfaction with a service.
• A variety of QoS metrics are discussed in the literature: makespan, cost, reliability, and energy.
• For example:
– The deadline of a workflow is the maximum allowed finish time of its last task to be executed.
– The budget is the maximum amount that a user is willing to pay for executing a workflow application on computing resources.

QoS Requirements
• Makespan
– An important metric in workflow scheduling.
– It is defined as the maximum completion time of the workflow:

$$\text{makespan} = \max_{t_i \in \mathcal{T}} \big(\mathrm{complet}(t_i)\big)$$

Task t_i            T1   T2   T3   T4   T5   T6   T7   T8
Completion time      3    8   12    4    6   10   12    8

For the completion times in this table, the makespan is 12 (tasks T3 and T7).

QoS Requirements
• This scheduling problem can be formulated as follows:

$$\min \ \text{makespan}$$

$$\text{s.t.} \quad \sum_{j} x_{t_i, r_j} = 1, \qquad t_i \in \mathcal{T},\ r_j \in \mathcal{R}$$

$$\sum T_{t_i, r_j}\, x_{t_i, r_j} \le \ldots$$

$$x_{t_i, r_j} \in \{0, 1\}$$

Here $x_{t_i, r_j}$ indicates whether task $t_i$ is assigned to resource $r_j$, so the first constraint places each task on exactly one resource.
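As a quick check of the makespan definition, the following Python sketch computes the makespan from the completion times given in the table and verifies the assignment constraints on a binary matrix x. The helper name `is_valid_assignment`, the resource names, and the example matrix `x` are hypothetical.

```python
# Completion times taken from the table on the slide.
completion_time = {
    "T1": 3, "T2": 8, "T3": 12, "T4": 4,
    "T5": 6, "T6": 10, "T7": 12, "T8": 8,
}

# Makespan: the maximum completion time over all tasks.
makespan = max(completion_time.values())
last_tasks = [t for t, c in completion_time.items() if c == makespan]


def is_valid_assignment(x):
    """Check that x[task][resource] is binary with exactly one 1 per task."""
    return all(
        all(v in (0, 1) for v in row.values()) and sum(row.values()) == 1
        for row in x.values()
    )


# Hypothetical assignment of two tasks to two resources.
x = {
    "T1": {"r1": 1, "r2": 0},
    "T2": {"r1": 0, "r2": 1},
}

print(makespan, last_tasks, is_valid_assignment(x))
```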
QoS Requirements
• Cost is the monetary cost incurred for a workflow to complete all of its tasks. The aim is to minimize the total execution cost of the workflow.
• The scheduling objective can be formulated as follows:

$$\text{minimize} \quad T(t_i, r_j) + C(t_i, r_j)$$

$$\text{s.t.} \quad T(t_i, r_j) = \frac{ET(t_i, r_j) - t_{\min}}{t_{\max} - t_{\min}}, \qquad C(t_i, r_j) = \frac{c(t_i, r_j) - c_{\min}}{c_{\max} - c_{\min}}$$

– $T(t_i, r_j)$: the (normalized) execution time of task $t_i$ on $r_j$
– $C(t_i, r_j)$: the (normalized) monetary cost of executing task $t_i$ on $r_j$
– $t_{\max}$: the maximum execution time
– $t_{\min}$: the minimum execution time
– $c_{\max}$: the maximum monetary cost
– $c_{\min}$: the minimum monetary cost

Multi-objective Criteria
• The workflow scheduling problem becomes more challenging when multiple QoS parameters are considered.
• Meta-heuristic methods and search-based strategies are commonly used to find good solutions.
• Research problem: meta-heuristics and search-based strategies usually incur high planning costs, in terms of the time needed to produce good results, which makes them less useful on real platforms that must make mapping decisions on the fly.
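The normalized time-plus-cost objective can be sketched in Python as follows. This is a minimal illustration that assumes min-max normalization, with the minimum subtracted in the numerator, and uses hypothetical run times (seconds) and prices (cents) for one task on three VM types.

```python
def normalize(value, lowest, highest):
    """Min-max normalize value into [0, 1]; 0.0 if all values are equal."""
    if highest == lowest:
        return 0.0
    return (value - lowest) / (highest - lowest)


def objective(exec_time, cost, t_min, t_max, c_min, c_max):
    """Normalized execution time plus normalized monetary cost (lower is better)."""
    return normalize(exec_time, t_min, t_max) + normalize(cost, c_min, c_max)


# Hypothetical options for one task: VM type -> (execution time in s, cost in cents).
options = {
    "small": (100, 10),
    "medium": (60, 16),
    "large": (40, 30),
}

times = [t for t, _ in options.values()]
costs = [c for _, c in options.values()]
t_min, t_max = min(times), max(times)
c_min, c_max = min(costs), max(costs)

scores = {
    vm: objective(t, c, t_min, t_max, c_min, c_max)
    for vm, (t, c) in options.items()
}
best_vm = min(scores, key=scores.get)
print(scores, best_vm)
```

Normalizing both terms to the same range keeps either time or cost from dominating the sum just because of its units.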
Workflow representation
• The tasks of a workflow application depend on each other:
– the output of some tasks is the input to others;
– the order of execution must be considered when assigning tasks to VMs.
[Figure: directed acyclic graph with tasks t1 to t5 between a dummy entry node (t_entry) and a dummy exit node (t_exit); each arc is a precedence constraint]

Workflow representation
• How do we represent a workflow for processing?
[Figure: DAG with arcs t1 to t2 (4), t1 to t3 (7), t2 to t4 (9), t3 to t4 (6)]

Task interdependence
Tasks   1   2   3   4
1       0   1   1   0
2       0   0   0   1
3       0   0   0   1
4       0   0   0   0

Link cost
Tasks   1   2   3   4
1       0   4   7   0
2       0   0   0   9
3       0   0   0   6
4       0   0   0   0

• The numbers on the links are the estimated transfer time or transfer cost of sending the required data along the link.
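The two matrices translate directly into code. The Python sketch below stores the link-cost matrix from the slide, derives the task interdependence matrix from it, and looks up predecessors and successors. The helper names are hypothetical, and `longest_transfer_path` assumes tasks are numbered so that every arc goes from a lower to a higher task number, which holds for this example.

```python
# Link-cost matrix from the slide: link_cost[i][j] is the transfer time/cost
# on the arc from task i+1 to task j+1 (0 means there is no arc).
link_cost = [
    [0, 4, 7, 0],
    [0, 0, 0, 9],
    [0, 0, 0, 6],
    [0, 0, 0, 0],
]
n = len(link_cost)

# Task interdependence matrix: 1 where an arc exists, 0 otherwise.
depends = [[1 if link_cost[i][j] > 0 else 0 for j in range(n)] for i in range(n)]


def predecessors(task):
    """Tasks (1-based) whose output the given task needs."""
    return [i + 1 for i in range(n) if depends[i][task - 1]]


def successors(task):
    """Tasks (1-based) that consume the given task's output."""
    return [j + 1 for j in range(n) if depends[task - 1][j]]


def longest_transfer_path():
    """Largest total link cost along any path, assuming arcs go low -> high."""
    best = [0] * n  # best[j]: largest accumulated link cost on a path ending at task j+1
    for j in range(n):
        for i in range(j):
            if depends[i][j]:
                best[j] = max(best[j], best[i] + link_cost[i][j])
    return max(best)


print(depends)
print(predecessors(4), successors(1), longest_transfer_path())
```

A matrix is convenient for small workflows like this one. For workflows with millions of tasks, a sparse structure such as an adjacency list uses far less memory.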