Design Implications for Enterprise Storage Systems via Multi-Dimensional Trace Analysis
Piotr Dobrowolski, MIMUW/Distributed Systems, October 24, 2012
Outline: Introduction, Methodology, Analysis process, Results, Conclusions


  5. Background & Needs: Analysis at different layers
     - A storage system is divided into:
       - clients, which create and view files via applications,
       - servers, which store the content.
     - Clients operate at the user and application layers.
     - Client and server behavior defines the layers, for example: user sessions, application instances, files on the server, directories, ...
     - (def) Access unit = layer.
     - An access pattern is a set of access-unit behaviors.

  11. Background & Needs: At different dimensions
     - Each access unit has certain inherent characteristics; call them the features of that access unit.
     - For example, for an application, the read size in bytes is a feature; the number of unique files accessed is another.
     - Each feature represents an independent mathematical dimension that describes the access unit.
     - Dimension = feature = characteristic.

  19. Background & Needs: Example of multi-dimensional insight
     - Files with > 70% sequential reads or sequential writes have no repeated reads or overwrites.
     - This insight spans 4 dimensions:
       - read sequentiality,
       - write sequentiality,
       - repeated reads,
       - overwrites.
     - Measuring one dimension at a time is easier, but an insight like this requires capturing the other dimensions at the same time.
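
As a minimal sketch of what checking such a four-dimensional insight could look like, the snippet below filters files by the two sequentiality dimensions and then inspects the other two. The `FileFeatures` record, its field names, and the sample values are hypothetical illustrations, not data or code from the traces.

```python
from dataclasses import dataclass


@dataclass
class FileFeatures:
    """Hypothetical per-file feature record (one value per dimension)."""
    name: str
    read_sequentiality: float   # fraction of reads that are sequential, 0..1
    write_sequentiality: float  # fraction of writes that are sequential, 0..1
    repeated_reads: int         # number of repeated reads
    overwrites: int             # number of overwrites


def highly_sequential(f, threshold=0.7):
    """True if the file exceeds the threshold in either sequentiality dimension."""
    return f.read_sequentiality > threshold or f.write_sequentiality > threshold


def insight_holds(files):
    """Check that every highly sequential file has no repeated reads or overwrites."""
    return all(
        f.repeated_reads == 0 and f.overwrites == 0
        for f in files
        if highly_sequential(f)
    )


# Made-up sample values, for illustration only.
files = [
    FileFeatures("log.bin", 0.95, 0.10, 0, 0),
    FileFeatures("db.mdb", 0.20, 0.30, 12, 7),
    FileFeatures("video.avi", 0.05, 0.90, 0, 0),
]
counterexample = files + [FileFeatures("cache.tmp", 0.80, 0.00, 3, 0)]
```

Looking at read sequentiality alone could not express this rule; the check needs all four dimensions of the same file at once.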

  20. Background & Needs: In short
     - The need for multi-layered and multi-dimensional insights motivates the authors' methodology.

  25. Traces: Scale #1
     - The analysis was performed on Common Internet File System (CIFS) traces.
     - CIFS is better known as Server Message Block (SMB),
     - or through its open-source implementation, Samba.
     - Microsoft uses it in Windows, where it is also known as “Microsoft Windows Network”.
     - Tracing CIFS makes it possible to identify the layers.

  29. Traces: Scale #2
     - Data was collected for 3 months in 2007.
     - 2 different datasets:
       - corporate,
       - engineering.

  39. Traces: Scale #3
     Corporate trace:
     - 1000 employees in marketing, finance, etc.
     - 3 TB active storage, Windows applications
     - 509,076 user sessions, 138,723 application instances
     - 1,155,099 files, 117,640 directories
     Engineering trace:
     - 500 employees in various engineering roles
     - 19 TB active storage, Windows and Linux applications
     - 232,033 user sessions, 741,319 application instances
     - 1,809,571 files, 161,858 directories

  43. Algorithm, centroids: k-means clustering
     - Partition n observations into k clusters.
     - Each observation belongs to the cluster with the nearest mean.
     - The problem is NP-hard,
     - but good heuristics exist (they find a local optimum).

  46. Algorithm, centroids: k-means
     What we want is
     $$\min_{S} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - c_i \rVert^2$$
     where $c_i$ is the mean of the points in cluster $S_i$.
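
To make the objective concrete, here is a small sketch that evaluates it for a given partition. The function names and the four sample points are assumptions chosen for illustration; the code only computes the sum of squared distances to each cluster mean and does not search for the minimizing partition.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_cost(clusters):
    """Sum, over all clusters, of squared Euclidean distances to the cluster mean."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        for x in points:
            total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return total


# Two ways to split the same four made-up points into k = 2 clusters.
good = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

print(kmeans_cost(good))  # 4.0
print(kmeans_cost(bad))   # 100.0
```

The partition that groups nearby points has the lower cost, which is what the minimization over $S$ selects.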

  47. Algorithm, centroids: Observations in 2 dimensions (figure)

  48. Algorithm, centroids: Choose k = 2 “means” randomly (figure)

  49. Algorithm, centroids: Divide the space by nearest mean (figure)

  50. Algorithm, centroids: Connect the points to their centers (figure)

  51. Algorithm, centroids: Compute the new means (figure)

  52. Algorithm, centroids: Once again (figure)

  53. Algorithm, centroids: Once again (figure)

  54. Algorithm, centroids: Repeat until the means stabilize (figure)
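
The figure sequence above corresponds to the standard heuristic often called Lloyd's algorithm. The sketch below is one plain-Python rendering of it under a few assumptions: initial means are sampled from the observations, distance is squared Euclidean, and an empty cluster keeps its previous mean. The function names and sample points are illustrative only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def lloyd(points, k, seed=0, max_iter=100):
    """Heuristic k-means: returns (centers, cluster index for each point)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # choose k initial "means" randomly
    assignment = None
    for _ in range(max_iter):
        # Divide the space: assign every point to its nearest mean.
        new_assignment = [
            min(range(k), key=lambda i: sq_dist(p, centers[i])) for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, the means have stabilized
        assignment = new_assignment
        # Compute the new mean of every cluster.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # assumption: an empty cluster keeps its old mean
                centers[i] = mean(members)
    return centers, assignment


# Made-up observations in 2 dimensions forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = lloyd(points, k=2)
```

Because the result is only a local optimum, a different seed can give a different clustering on harder data; running several seeds and keeping the lowest-cost result is a common mitigation.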

  60. Overview: Analysis step by step
     - Collect network storage system traces.
     - Define the access units (this needs domain knowledge about the storage system).
     - Extract multiple instances of each access unit, with their feature values.
     - Input those values into k-means.
     - Interpret the k-means output and derive access patterns.
     - Translate the access patterns into design insights.

  61. Overview: The flow looks like this (figure)

  66. Identify access patterns via k-means: How it works
     - For each access unit, extract its instances from the trace, e.g. session instances or application instances.
     - For every instance, compute all numerical feature values (e.g. the number of opened files).
     - This gives a data array with one row per instance and one column per feature.
     - The k-means algorithm produces clusters, which are interpreted as data access patterns.
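
A minimal sketch of the extraction step, assuming a heavily simplified trace format: each record is a hypothetical `(session_id, operation, file, bytes)` tuple, and the three features (unique files opened, bytes read, bytes written) are picked only for illustration. Real CIFS traces carry far more detail.

```python
from collections import defaultdict

# Hypothetical, simplified trace records: (session_id, operation, file, bytes).
trace = [
    ("s1", "open", "a.doc", 0),
    ("s1", "read", "a.doc", 4096),
    ("s1", "read", "a.doc", 4096),
    ("s2", "open", "b.xls", 0),
    ("s2", "write", "b.xls", 1024),
    ("s2", "open", "c.txt", 0),
    ("s2", "read", "c.txt", 512),
]


def session_features(records):
    """Build one row per session: [unique files opened, bytes read, bytes written]."""
    opened = defaultdict(set)
    read_bytes = defaultdict(int)
    write_bytes = defaultdict(int)
    for session, op, path, nbytes in records:
        if op == "open":
            opened[session].add(path)
        elif op == "read":
            read_bytes[session] += nbytes
        elif op == "write":
            write_bytes[session] += nbytes
    sessions = sorted(set(opened) | set(read_bytes) | set(write_bytes))
    rows = [[len(opened[s]), read_bytes[s], write_bytes[s]] for s in sessions]
    return sessions, rows


sessions, rows = session_features(trace)
print(sessions)  # ['s1', 's2']
print(rows)      # [[1, 8192, 0], [2, 512, 1024]]
```

Each row is one instance of the "session" access unit, and the resulting array is what would be passed to k-means. In practice the columns would likely need rescaling first, since byte counts and file counts differ by orders of magnitude.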

  71. Identify access patterns via k-means: How to determine the number of clusters
     - In the heuristic k-means algorithm, k is fixed.
     - Recall that k corresponds to the number of access patterns.
     - Optimization: search for the best k by running k-means for more than one value of k,
     - using metrics that describe cluster “quality”.
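
The slide does not say which quality metric is used, so the following is only one possible sketch. It assumes the metric is the k-means cost (within-cluster sum of squares) obtained for each candidate k, and it stops at the first k where adding another cluster improves that cost by less than a chosen fraction. The function name `pick_k`, the `min_gain` threshold, and the cost values are all made up for illustration.

```python
def pick_k(cost_by_k, min_gain=0.10):
    """Return the smallest k whose successor lowers the cost by less than min_gain.

    cost_by_k maps each candidate k to the clustering cost obtained for that k.
    """
    ks = sorted(cost_by_k)
    for k, next_k in zip(ks, ks[1:]):
        current = cost_by_k[k]
        if current == 0:
            return k  # already a perfect fit, more clusters cannot help
        gain = (current - cost_by_k[next_k]) / current
        if gain < min_gain:
            return k
    return ks[-1]  # every step still helped; take the largest k tried


# Made-up costs: large improvements up to k = 3, then almost none.
costs = {1: 1000.0, 2: 400.0, 3: 120.0, 4: 115.0, 5: 112.0}
print(pick_k(costs))  # 3
```

Other quality measures, such as silhouette scores, could be substituted for the cost without changing the overall search over k.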
