
Delivering Intelligence from space Crop Forecasting Pipeline - PowerPoint PPT Presentation



  1. Delivering Intelligence from space

  2. Applications: crop forecasting, pipeline monitoring, market intelligence gathering, plane & ship tracking

  3. Distributed Data Engineering - Lessons

  4. Distributed Data Engineering - Lessons
     1. Metrics

  5. Distributed Data Engineering - Lessons
     1. Metrics
     2. Logging

  6. Distributed Data Engineering - Lessons
     1. Metrics
     2. Logging
     3. Frameworks

  7. Distributed Data Engineering - Lessons
     1. Metrics
     2. Logging
     3. Frameworks
     4. Serverless ETL

  8. Lesson 1: Metrics were lacking

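  9. Before: per-user counters
     user_id | num_downloads | num_uploads
     4 | 10 | 1
     7 | 6 | 3

  10. Before: a download on the server increments the counter in place (user 4: 10 → 11, +1)
     user_id | num_downloads | num_uploads
     4 | 11 | 1
     7 | 6 | 3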

  11. downloads table
     date | user_id | image_size
     2017-10-01 14:40:32 | 4 | 1365
     2017-10-02 11:01:11 | 4 | 650

  12. downloads table, after a new download is appended as a new row
     date | user_id | image_size
     2017-10-01 14:40:32 | 4 | 1365
     2017-10-02 11:01:11 | 4 | 650
     2017-10-02 11:06:00 | 5 | 9001
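The contrast between slides 9-10 and 11-12: a counter row only holds the latest total, while the downloads table keeps one row per event, so totals can be recomputed and new questions asked later. A minimal Python sketch using the three rows from slide 12; the function names are hypothetical, and the unit of image_size is not stated on the slides.

```python
from collections import Counter
from datetime import datetime

# Rows of the append-only "downloads" table from slide 12: (date, user_id, image_size).
downloads = [
    ("2017-10-01 14:40:32", 4, 1365),
    ("2017-10-02 11:01:11", 4, 650),
    ("2017-10-02 11:06:00", 5, 9001),
]


def downloads_per_user(rows, since=None):
    """Rebuild the old num_downloads counter from events, optionally from a start time."""
    counts = Counter()
    for date, user_id, _size in rows:
        when = datetime.strptime(date, "%Y-%m-%d %H:%M:%S")
        if since is None or when >= since:
            counts[user_id] += 1
    return dict(counts)


def image_size_per_user(rows):
    """A question the counter table cannot answer: total image_size per user."""
    totals = Counter()
    for _date, user_id, size in rows:
        totals[user_id] += size
    return dict(totals)


print(downloads_per_user(downloads))                               # {4: 2, 5: 1}
print(downloads_per_user(downloads, since=datetime(2017, 10, 2)))  # {4: 1, 5: 1}
print(image_size_per_user(downloads))                              # {4: 2015, 5: 9001}
```

The counter in slide 10 can always be derived from the events, but not the other way round.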

  13. Source of Truth: Database

  14. Source of Truth: Database
     ● Migration headaches
     ● Manage connections
     ● Performance

  15. Source of Truth: Logging (replacing the database)
     { "message": "Downloaded img", "userId": "1234", "imgId": "1d3x5", "service": "download-server", "time": "1509385330" }

  16. Two Kinds of Logs
     Server logs: [Wed Oct 11 14:32:12 2000] [info] [client 127.0.0.1] image 1d3x5 downloaded by userId 1234
     ● Debugging
     ● Support

  17. Two Kinds of Logs
     Server logs: [Wed Oct 11 14:32:12 2000] [info] [client 127.0.0.1] image 1d3x5 downloaded by userId 1234
     ● Debugging
     ● Support
     Metric logs: { "message": "Downloaded img", "userId": "1234", "imgId": "1d3x5", "service": "download-server", "time": "1509385330" }
     ● Dashboards
     ● Analytics

  18. Metric logs: { "message": "Downloaded img", "userId": "1234", "imgId": "1d3x5", "service": "download-server", "time": "1509385330" }
     ● Dashboards
     ● Analytics
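A small sketch of the two kinds of logs side by side, using Python's standard logging and json modules. The metric_log helper is hypothetical; its field names and the string epoch timestamp simply mirror the JSON on slides 15-18.

```python
import json
import logging
import sys
import time

logger = logging.getLogger("download-server")
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.INFO)


def metric_log(message, service, **fields):
    """Serialize one metric record as a single JSON line (hypothetical helper)."""
    record = {"message": message, "service": service, "time": str(int(time.time()))}
    record.update(fields)
    return json.dumps(record)


# Server log: free text for humans, used for debugging and support.
logger.info("image 1d3x5 downloaded by userId 1234")

# Metric log: machine-readable, so dashboards and analytics can parse it directly.
logger.info(metric_log("Downloaded img", "download-server", userId="1234", imgId="1d3x5"))
```

Both lines go through the same logger; only the metric line is meant to be parsed downstream.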

  19. Centralize

  20. Metric Collector
        import observatory
        obs = observatory.Tracker()
        obs.track('search_made', {
            'query': event.query,
            'n_results': len(resp['data']),
            'user_id': user_item.id
        })

  21. Metric Collector (tracker code as on slide 20): events are sent to a REST API, where a Lambda enriches / conforms them

  22. Metric Collector (tracker code as on slide 20): REST API → Lambda, with events stored in Redshift and S3

  23. After
     ● Centralized metrics
     ● Log enrichment
     ● Persistent store (REST API → Lambda → Redshift, S3)
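A minimal sketch of what the enrich / conform step behind the REST API could look like as a Python Lambda handler with API Gateway proxy integration, where the request body arrives as a JSON string. The payload envelope ({"event": ..., "properties": ...}), the snake_case conforming rule, and the added received_at / source_ip fields are assumptions for illustration; the deck does not show the observatory wire format or the collector's code.

```python
import json
import re
import time

REQUIRED = ("event", "properties")  # hypothetical envelope sent by the tracker


def to_snake(key):
    """Conform camelCase keys such as userId to snake_case (user_id)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()


def enrich(payload, source_ip=None, now=None):
    """Validate one tracked event, conform its keys and add server-side context."""
    missing = [name for name in REQUIRED if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    record = {"event": payload["event"]}
    record.update({to_snake(k): v for k, v in payload["properties"].items()})
    record["received_at"] = int(time.time() if now is None else now)
    record["source_ip"] = source_ip
    return record


def handler(event, context):
    """Lambda entry point behind API Gateway (proxy integration)."""
    payload = json.loads(event["body"])
    source_ip = event.get("requestContext", {}).get("identity", {}).get("sourceIp")
    try:
        record = enrich(payload, source_ip=source_ip)
    except ValueError as err:
        return {"statusCode": 400, "body": json.dumps({"error": str(err)})}
    # The persistent store (S3 / Redshift on the slides) would be written here.
    return {"statusCode": 200, "body": json.dumps(record)}
```

Doing the conforming in one central place means clients can stay thin and every stored event shares one schema.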

  24. Lesson 2: Debugging is painful

  25. Before: logs are scattered. EC2 instances write to their local filesystems; Lambda writes to CloudWatch.

  26. Centralize

  27. Diagram: EC2 instances and Lambda ship logs into a stream

  28. Diagram: EC2, Lambda → CloudWatch

  29. Diagram: EC2, Lambda → CloudWatch → Consumer

  30. Diagram: EC2, Lambda → CloudWatch → Consumer → ES (Elasticsearch), SaaS, S3
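One way the consumer in this diagram could work, assuming logs reach it through a CloudWatch Logs subscription, whose events arrive as base64-encoded, gzip-compressed JSON under event["awslogs"]["data"]. The SINKS list is a hypothetical hook standing in for the Elasticsearch, SaaS and S3 writers; lines that parse as JSON objects are treated as metric logs, anything else as plain server logs.

```python
import base64
import gzip
import json

# Hypothetical sinks: callables that receive a list of documents
# (stand-ins for the Elasticsearch, SaaS and S3 writers in the diagram).
SINKS = []


def decode_subscription(event):
    """CloudWatch Logs subscription payload: base64 -> gzip -> JSON."""
    raw = gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
    return json.loads(raw)


def to_documents(batch):
    """One document per log line, tagged with its log group and timestamp."""
    docs = []
    for log_event in batch.get("logEvents", []):
        doc = {"log_group": batch.get("logGroup"), "timestamp": log_event["timestamp"]}
        try:
            parsed = json.loads(log_event["message"])
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict):
            doc.update(parsed)  # structured metric log
        else:
            doc["message"] = log_event["message"]  # plain server log line
        docs.append(doc)
    return docs


def handler(event, context):
    """Consumer: decode once, then fan the same documents out to every sink."""
    docs = to_documents(decode_subscription(event))
    for sink in SINKS:
        sink(docs)
    return len(docs)
```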

  31. After
     ● Elasticsearch
     ● Search by UUID

  32. But! What does the full flow of a request look like?

  33. Correlation ID (CID): a UUID

  34. Correlation ID
     ● Create one for any external call
     Diagram: external request → Service A, tagged with a CID

  35. Correlation ID
     ● CID passed everywhere
     Diagram: external request → Service A → Services B and C, each call carrying the CID

  36. Correlation ID
     In ES → filter by CID
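A minimal sketch of correlation ID propagation in Python: create a UUID at the edge (or reuse the one the caller sent), carry it in a header on every downstream call, stamp it on every log record, then query Elasticsearch for one CID. The X-Correlation-ID header name, the cid and time field names, and the assumption that cid is mapped as a keyword field are all hypothetical; the deck does not specify them.

```python
import contextvars
import logging
import uuid

CID_HEADER = "X-Correlation-ID"  # hypothetical header name
_cid = contextvars.ContextVar("cid", default=None)


def begin_request(headers):
    """At the edge: reuse the caller's CID, or create one for this external call."""
    cid = headers.get(CID_HEADER) or str(uuid.uuid4())
    _cid.set(cid)
    return cid


def downstream_headers():
    """Headers for every call to Service B, C, ... so the CID is passed everywhere."""
    return {CID_HEADER: _cid.get()}


class CidFilter(logging.Filter):
    """Attach the current CID to each log record so it can be indexed and filtered."""

    def filter(self, record):
        record.cid = _cid.get()
        return True


def es_query_for(cid):
    """Elasticsearch query body: every record for one CID, oldest first."""
    return {"query": {"term": {"cid": cid}}, "sort": [{"time": {"order": "asc"}}]}


# Example wiring: every line this handler emits carries the CID.
stream = logging.StreamHandler()
stream.addFilter(CidFilter())
stream.setFormatter(logging.Formatter("%(cid)s %(levelname)s %(message)s"))
log = logging.getLogger("service-a")
log.addHandler(stream)

begin_request({})
log.warning("calling service B")
```

Putting the filter on the handler, rather than the logger, means records propagated from child loggers are stamped too.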

  37. Lesson 3: Building services is slow

  38. Before
     ● Online console
     ● Zip file deployment
     ● Doesn't scale

  39. Infrastructure as Code

  40. Infrastructure as Code
     ● Serverless Framework
     ● Template → service
     ● Rapid deployment

  41. serverless.yml
        # serverless.yml
        service: users
        provider:
          name: aws
          runtime: nodejs6.10
          stage: dev
        functions:
          usersCreate:
            handler: users.create
            events:
              - http: post users/create
        resources:
          Resources:
            usersTable:
              Type: AWS::DynamoDB::Table
              Properties:
                TableName: usersTable
                AttributeDefinitions:
                  - AttributeName: email
                    AttributeType: S

  42. serverless.yml (as on slide 41), drawn as one microservice: Internet → gateway → Lambda handler → DynamoDB, plus S3

  43. serverless.yml from slide 41, with the stage exposed to the function as an environment variable:
        provider:
          name: aws
          runtime: nodejs6.10
          stage: dev
          environment:
            STAGE: ${opt:stage, self:provider.stage}

  44. Same serverless.yml as slide 43
     ● Service info as ENV vars

  45. Same serverless.yml as slide 43, next to the tracker code from slide 20
     ● Service info as ENV vars
     ● Inject in logs

  46. After
     ● Rapid dev
     ● Source controlled
     ● Log enrichment
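A sketch of the "service info as ENV vars, inject in logs" idea: read deployment details from the environment and merge them into every tracked event. STAGE comes from the serverless.yml on slide 43; SERVICE_NAME is a hypothetical second variable that could be declared the same way (for example SERVICE_NAME: ${self:service}). The helper names are illustrative and not part of the observatory library.

```python
import os


def service_context(environ=None):
    """Deployment info from env vars set by serverless.yml (SERVICE_NAME is hypothetical)."""
    env = os.environ if environ is None else environ
    return {
        "stage": env.get("STAGE", "dev"),
        "service": env.get("SERVICE_NAME", "unknown"),
    }


def with_service_info(properties, environ=None):
    """Return a copy of an event's properties with the service info injected."""
    return {**properties, **service_context(environ)}


# With the observatory tracker from slide 20 this would become, for example:
#   obs.track('search_made', with_service_info({'query': event.query, ...}))
props = with_service_info({"query": "corn fields"}, {"STAGE": "prod", "SERVICE_NAME": "users"})
print(props)  # {'query': 'corn fields', 'stage': 'prod', 'service': 'users'}
```

Because the injected keys are merged last, they win over any same-named property, so a caller cannot mislabel which stage an event came from.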

  47. Lesson 4: Server time == $$$

  48. Before
     ● Bursty
     ● ETL servers idle

  49. Transient Resources
     ● Pipeline:
        ○ Spin up EC2
        ○ Terminate
     Diagram: ETL → EC2 instances → DB

  50. FaaS Resources
     ● Pipeline:
        ○ Discretize work
        ○ Lambda fleet
        ○ Inherently transient
     Diagram: Listener → Workers → DB
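A minimal sketch of the FaaS pipeline on slide 50: the listener discretizes a job into small work items and fans them out to a fleet of workers. Here a ThreadPoolExecutor stands in for the Lambda fleet so the sketch runs locally; on AWS the invoke callable could wrap boto3's Lambda invoke with InvocationType="Event". The work-item shape, the scene keys and the worker's transform are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def discretize(keys, chunk_size):
    """Split one large job into small, independent work items."""
    return [keys[i:i + chunk_size] for i in range(0, len(keys), chunk_size)]


def worker(item):
    """Stand-in for one worker Lambda: transform its chunk into rows for the DB."""
    return [{"key": key, "name_length": len(key)} for key in item["keys"]]


def listener(keys, chunk_size=2, invoke=worker, max_workers=8):
    """Fan work items out to a fleet of workers and gather the rows they produce.

    On AWS, `invoke` might wrap boto3.client("lambda").invoke(FunctionName=...,
    InvocationType="Event", Payload=...); with asynchronous invocation each worker
    would then write to the DB itself instead of returning rows to the listener.
    """
    items = [{"keys": chunk} for chunk in discretize(keys, chunk_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = list(pool.map(invoke, items))
    return [row for batch in batches for row in batch]


rows = listener(["scene-001", "scene-002", "scene-003"], chunk_size=2)
print(len(rows))  # 3
```

Since each work item is independent, throughput grows with the number of concurrent workers rather than with the size of a provisioned ETL server.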

  51. After
     ● Faster
     ● Cheaper
     ● Highly scalable ETL

  52. Distributed Data Engineering - Lessons
     1. Metrics
     2. Logging
     3. Frameworks
     4. Serverless ETL

  53. Skywatch
