ECS & Docker: Secure Async Execution (Brennan Saeta)


  1. ECS & Docker: Secure Async Execution @ Brennan Saeta

  2. The Beginnings (2012): 1 million learners worldwide, 4 partners, 10 courses

  3. Education at Scale: 18 million learners worldwide, 140 partners, 1,800 courses

  4. Outline
     • Evolution of Coursera’s nearline execution systems
     • Next-generation execution framework: Iguazú
     • Iguazú application deep dive: GrID, evaluating programming assignments

  5. Key Takeaways
     • What nearline execution is, and why it is useful
     • Best practices for running containers in production in the cloud
     • Hardening techniques for securely operating container infrastructure at scale

  6. A history of nearline execution

  7. Coursera Architecture (2012): PHP Monolith

  8. Early days: Requirements
     • Video re-encoding for distribution
     • Grade computation for 100,000+ learners
     • Pedagogical data exports for courses

  10. Cascade Architecture [diagram: Cascade; PHP Monolith ×2]

  11. Cascade Architecture [diagram: Cascade; Queue; PHP Monolith ×2]

  12. Upgrading to Scala: Re-architecting delayed execution for our 2nd-generation learning platform.

  13. Upgrading to the JVM
      • Leverage mature Scala & JVM ecosystems for code sharing
      • JVM much more reliable (no memory leaks)
      • New job model: scheduled recurring jobs
      • Named: Saturn

  14. Saturn Architecture [diagram: Online Serving (Scala/micro-service architecture): Service A, Service B, Service C; C* ×2]

  15. Saturn Architecture [diagram: Online Serving (Scala/micro-service architecture): Service A, Service B, Service C; Saturn; C* ×2]

  16. Saturn Architecture [diagram: Service A, Service B, Service C; Saturn; ZK Ensemble; C* ×2]

  17. Saturn Architecture [diagram: Service A, Service B, Service C; Saturn Leader; ZK Ensemble; C* ×2]

  18. Problems with Saturn
      • Single master meant the naïve implementation ran all jobs in the same JVM
      • Huge CPU contention at the top of the hour
      • OOM exceptions & GC issues

  19. Enter: Docker. Containers allow for resource isolation! (Image: CC-by-2.0, https://www.flickr.com/photos/photohome_uk/1494590209)

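  20. Supported Features

      | Platform                 | Saturn | Docker | Amazon ECS | Iguazú |
      |--------------------------|--------|--------|------------|--------|
      | Run code                 | ✅     | ✅     | ✅         | ✅     |
      | Resource isolation       | ❌     | ✅     | ✅         | ✅     |
      | Clusters / HA            | ✅     | ❌     | ✅         | ✅     |
      | Great developer workflow | ✅     | ❌     | ❌         | ✅     |
      | Scheduled jobs           | ✅     | ❌     | ❌         | ✅     |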

  24. Supported Features (same comparison as slide 20, with the final platform column labeled "???")

  25. Solution: Iguazú Marissa Strniste (https://www.flickr.com/photos/mstrniste/5999464924) CC-BY-2.0

  26. Solution: Iguazú
      • Framework & service for asynchronous execution
      • Optimized Scala developer experience for Coursera
      • Unified scheduler supports:
        • Immediate execution (nearline)
        • Scheduled recurring execution (cron-like)
        • Deferred execution (run once at time X)
      Marissa Strniste (https://www.flickr.com/photos/mstrniste/5999464924) CC-BY-2.0
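
The three execution modes listed on this slide could be modeled as one small data type with a single "when does this run next" function. The following is a minimal sketch, not Iguazú's actual API: `Schedule`, `Immediate`, `Recurring`, `Deferred`, and `nextRun` are hypothetical names, and cron-like scheduling is simplified to a fixed interval.

```scala
import java.time.{Duration, Instant}

object SchedulerSketch {
  // Hypothetical model of the three execution modes named on the slide.
  sealed trait Schedule
  case object Immediate extends Schedule                // nearline: run as soon as possible
  final case class Recurring(start: Instant, every: Duration) extends Schedule // cron-like, simplified to a fixed interval
  final case class Deferred(at: Instant) extends Schedule // run once at time X

  /** Next time a job should run at or after `now`; None when nothing is left to run. */
  def nextRun(schedule: Schedule, now: Instant, alreadyRan: Boolean): Option[Instant] =
    schedule match {
      case Immediate    => if (alreadyRan) None else Some(now)
      case Deferred(at) => if (alreadyRan) None else Some(if (at.isBefore(now)) now else at)
      case Recurring(start, every) =>
        val period = every.toMillis
        require(period > 0, "recurring interval must be positive")
        if (!now.isAfter(start)) Some(start)
        else {
          val elapsed = Duration.between(start, now).toMillis
          val periods = (elapsed + period - 1) / period // ceiling division
          Some(start.plusMillis(periods * period))
        }
    }
}
```

Treating all three modes as one type is what lets a single scheduler loop handle them; only the rule for computing the next run differs.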

  27. Iguazú Architecture [diagram: ECS API; SQS; Iguazú Admin; Iguazú Scheduler; Iguazú Frontend; Iguazú Backend; Iguazú Workers; Services ×2; Cassandra; Devs; Users]

  28. Iguazú Architecture [diagram: ECS API; SQS Queue; Iguazú Admin; Iguazú Scheduler; Iguazú Frontend; Iguazú Backend; Iguazú Workers; Services ×2; Cassandra; Devs; Users]

  30. Iguazú Architecture [diagram: ECS API; ZK Ensemble; SQS Queue; Iguazú Admin; Iguazú Scheduler; Iguazú Frontend; Iguazú Backend; Iguazú Workers; Services ×2; Cassandra; Devs; Users]

  32. Autoscale, autoscale, autoscale!

  33. Autoscaling ⇄ Iguazú ⇆ ECS [diagram labels: Autoscaling; Iguazú; ECS API; Shutdown Lifecycle Notification; Poll Worker Job Status; All finished; Proceed; Terminate; EC2 Worker ×3]
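
The labels on this slide suggest a drain-before-terminate handshake: on a shutdown notification, worker job status is polled, and termination proceeds only once all jobs have finished. The sketch below illustrates that reading with pure functions and no AWS calls. `WorkerStatus`, `Decision`, `decide`, and `drain` are hypothetical names, and a real poller would wait between polls.

```scala
object DrainSketch {
  // Hypothetical snapshot of one EC2 worker, as a poller might see it.
  final case class WorkerStatus(instanceId: String, runningJobs: Int)

  sealed trait Decision
  case object KeepWaiting extends Decision // jobs still running: poll again later
  case object Proceed extends Decision     // all finished: termination may go ahead

  /** Decide what to answer for an instance that received a shutdown notification. */
  def decide(instanceId: String, workers: Seq[WorkerStatus]): Decision =
    workers.find(_.instanceId == instanceId) match {
      case Some(w) if w.runningJobs > 0 => KeepWaiting
      case _                            => Proceed // idle, or no longer listed
    }

  /** Poll up to `maxPolls` times; returns the decision from the last poll made. */
  def drain(instanceId: String, poll: () => Seq[WorkerStatus], maxPolls: Int): Decision = {
    var remaining = maxPolls
    var decision: Decision = decide(instanceId, poll())
    while (decision == KeepWaiting && remaining > 1) {
      remaining -= 1
      decision = decide(instanceId, poll())
    }
    decision
  }
}
```

Bounding the number of polls keeps a stuck job from blocking scale-in indefinitely; what to do when the bound is hit is a policy choice the slide does not specify.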

  34. Failure in Nearline Systems
      • Most jobs are non-idempotent
      • Iguazú: at-most-once execution
        • Time-bounded delay
      • Future: at-least-once execution
        • With caveats

  35. Iguazú adoption by the numbers: >1000 runs per day, >100 different job schedules, ~100 jobs in production
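
One common way to express the difference between the two guarantees is the order of "acknowledge" and "run". The sketch below uses an in-memory queue as a stand-in for a real message queue. All names (`JobQueue`, `runAtMostOnce`, `runAtLeastOnce`) are hypothetical, and this is not a description of how Iguazú implements either mode.

```scala
import scala.collection.mutable
import scala.util.Try

object DeliverySketch {
  /** In-memory stand-in for a job queue; a message stays queued until acknowledged. */
  final class JobQueue(initial: Seq[String]) {
    private val pending = mutable.Queue(initial: _*)
    def peek: Option[String] = pending.headOption
    def ack(): Unit = {
      if (pending.nonEmpty) pending.dequeue()
      ()
    }
    def size: Int = pending.size
  }

  /** At-most-once: acknowledge first, then run. A failure loses the job but never repeats it. */
  def runAtMostOnce(queue: JobQueue, job: String => Unit): Unit =
    queue.peek.foreach { msg =>
      queue.ack()
      Try(job(msg)) // a failure here is swallowed and not retried
    }

  /** At-least-once: run first, acknowledge only on success. A failure leaves the job queued for a retry. */
  def runAtLeastOnce(queue: JobQueue, job: String => Unit): Unit =
    queue.peek.foreach { msg =>
      if (Try(job(msg)).isSuccess) queue.ack()
    }
}
```

With non-idempotent jobs, the at-least-once variant can repeat side effects when a job fails partway through, which is one reason at-most-once is the safer default for such jobs.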

  36. Iguazú Applications
      Nearline Jobs
      • Pedagogical Instructor Data Exports
      • System Integrations
      • Course Migrations
      Scheduled Recurring Jobs
      • Course Reminders
      • System Integrations
      • Payment reconciliation
      • Course translations
      • Housekeeping
      • Build artifact archival
      • A/B Experiments

  37. While containers may help you on your journey, they are not themselves a destination. CC-by-2.0 https://www.flickr.com/photos/usoceangov/5369581593

  38. Writing an Iguazu Job

      class AbReminderJob @Inject() (abClient: AbClient, email: EmailAPI) extends AbstractJob {
        override val reservedCpu = 1024    // 1 CPU core
        override val reservedMemory = 1024 // 1 GB RAM

        def run(parameters: JsValue) = {
          val experiments = abClient.findForgotten()
          logger.info(s"Found ${experiments.size} forgotten experiments.")
          experiments.foreach { experiment =>
            sendReminder(experiment.owners, experiment.description)
          }
        }
      }

  43. Testing an Iguazu job

  44. The Hollywood Principle applies to distributed systems. CC-by-2.0 https://www.flickr.com/photos/raindog808/354080327

  45. Deploying a new Iguazu Job
      • Developer
        • Merge into master… done
      • Jenkins Build Steps
        • Compile & package job JAR
        • Prepare Docker image
        • Push image into registry
        • Register updated job with Amazon ECS API

  46. Invoking an Iguazú Job

      // invoking a job with one function call
      // from another service via REST framework RPC
      val invocationId = iguazuJobInvocationClient
        .create(IguazuJobInvocationRequest(
          jobName = "exportQuizGrades",
          parameters = quizParams))

  47. A clean environment increases reliability. CC-by-2.0 https://www.flickr.com/photos/raindog808/354080327

  48. Evaluating Programming Assignments: An application of Iguazú

  49. Design Goals: Elastic Infrastructure, No Maintenance, Near Real-time, Secure Infrastructure
