
Opening your eyes to how your Mainframe Tape environment is really performing



  1. Session 17664: Opening your eyes to how your Mainframe Tape environment is really performing. Burt Loper, John Ticic. www.IntelliMagic.com

  2. Agenda: Is Tape processing dead? What data is available? What can we observe in this data? Look at the z/OS and hardware view. What’s important in our tape environment? Show examples of important aspects of tape processing, highlighting performance and problem investigation. Summary/Conclusions.

  3. Who is IntelliMagic? A leader in Availability Intelligence: new visibility of threats to continuous availability by automatic interpretation of RMF/SMF/Config data using built-in expert knowledge. Over 20 years developing storage performance solutions. Privately held, financially independent. Customer centric and highly responsive. Products used daily at some of the largest sites in the world.

  4. Presenter: Burt Loper, Senior Technical Consultant. 35 years at IBM, latest experience architecting, installing and configuring TS7700 systems for customers. TS7700 Performance: authored the TS7700 Health Assessment. With IntelliMagic since January 2014.

  5. Is Tape Processing Dead?

  6. Is Tape Processing Dead? Remains lowest cost per Terabyte. Part of the Storage Hierarchy. Legacy uses: Backup (possibly diminishing); Disaster Recovery (last line of insurance). Growing uses: Compliance (government or regulatory); Archive (older data being retained); rapid growth in data (longer retentions).

  7. What data is available?

  8. Tape Data Sources: z/OS SMF and RMF general data; IBM TS7700 BVIR history data; Oracle VSM SMF data; Tape catalog data.

  9. z/OS Tape Data Sources: SMF and TMS data. Required: SMF Type 14 (DSN Read), SMF Type 15 (DSN Write), SMF Type 21 (Tape Demounts), SMF Type 30 (Jobs/Programs), RMF Type 74.1 (Device Data). Optional: TMS. SMF data from each LPAR, includes VSM events also. RMF data about tape devices. Collect data on a per sysplex basis. Covers Real and/or Virtual Tape.
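As a rough illustration of how the z/OS record types on this slide divide the work, the sketch below tags a record with the role the slide gives it. The `TapeRecord` tuple and the `describe` helper are hypothetical names used only for illustration; real SMF and RMF records are binary and need a proper parser before anything like this applies.

```python
from typing import NamedTuple, Optional

# Roles as listed on the slide. The table layout itself is an assumption
# made for this sketch, not a real SMF interface.
TAPE_RECORD_ROLES = {
    (14, None): "data set read",
    (15, None): "data set write",
    (21, None): "tape demount",
    (30, None): "job/program activity",
    (74, 1): "RMF device data",
}


class TapeRecord(NamedTuple):
    """Hypothetical, already-decoded record header."""
    record_type: int
    subtype: Optional[int] = None


def describe(record: TapeRecord) -> str:
    """Return the slide's role for a record, or a fallback label."""
    exact = (record.record_type, record.subtype)
    if exact in TAPE_RECORD_ROLES:
        return TAPE_RECORD_ROLES[exact]
    # Types listed without a subtype match any subtype of that type.
    return TAPE_RECORD_ROLES.get(
        (record.record_type, None), "not covered in this sketch"
    )
```

Counting `describe(r)` over a day's records gives a quick check that every source the slide lists is actually being collected.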

  10. Hardware Data Sources: TS7700 BVIR Data and VSM Records. TS7700 (with optional back-end tape): BVIR collects data on a per Grid basis, consolidated by Grid/Library Cluster for reporting. VSM Virtual Tape: Oracle HSC writes special SMF records for VSM events (see events appendix for details).

  11. Tape Information is Everywhere. Sources: z/OS SMF, z/OS RMF, Tape Catalog, Oracle HSC SMF, IBM TS7700 BVIR. Steps: Collect, Consolidate, Analyze.

  12. What’s important in your Tape environment?

  13. IBM TS7700 Performance

  14. TS7700 Virtual Subsystem: FICON Channels, Processor, Cache (Disk Arrays), Ethernet (replication)

  15. The TS7700 Dashboards Summarize the Analysis. Each of these dashboards checks a particular aspect of the TS7700 performance and capacity.

  16. How Hard is my Hardware Running?

  17. Utilization Dashboard: each Bubble Summarizes a Chart

  18. TS7700 Processor & Disk Utilizations

  19. TS7700 Processor Utilizations

  20. TS7700 Disk Utilization

  21. Channel Throughput (MB/s) for all TS7700 Grids

  22. TS7700 Cache Flows

  23. How Long is it taking for Data to be Replicated to my DR Site?

  24. Replication – Receiving Cluster

  25. Replication – Sending Cluster

  26. Is there enough Cache to adequately support your Tape Workloads?

  27. Cache Dashboard – All Grids

  28. Cache Dashboard – Single Grid

  29. Cache Overview – Multi-chart

  30. Avg. Cache Age – 18-hour interval

  31. Avg. Cache Age – over 2 weeks

  32. Oracle STK VSM

  33. Presenter: John Ticic, Senior Technical Consultant. Started in Systems Programming in 1984. Joined IntelliMagic in 2008 as a Senior Consultant. Specialties include: Disk/Tape performance; z/OS Performance; z/OS, zSeries implementation; Presenting (I/O classes, SHARE, GSE, ...).

  34. VSM Technology: VSM 5, VSM 6. Different generations of hardware. Different methods of replicating tapes. Lots of information in the STK user SMF records.

  35. Why are some of my Jobs running slowly?

  36. Why are some of my Jobs running slowly? Yesterday, some batch Jobs took much longer! Why? Well, there are lots of possible reasons: application changes; processing more data; CPU (or storage) resource shortages; had to wait for devices; had to wait for volumes; I/O contention. Let’s investigate tape mounts.

  37. VSM SMF Mounts: We can see our mount distribution (6 x VSM 6). What are our Mount times like?

  38. VSM SMF Average Mount Times: These are average times. We can look at the maximums, but let’s zoom into one VSM.

  39. VSM SMF Average Mount Times: These are average times per Mount type. Mounts for scratch tapes are barely visible, but mounts for existing tapes need to be staged from real drives.
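The per-type averages on this slide can be reproduced from individual mount records. This is a minimal sketch, assuming each mount has already been reduced to a (mount type, seconds) pair. The function name, the type labels and the sample values are invented for illustration and do not come from the VSM SMF record layout.

```python
from collections import defaultdict
from statistics import mean


def mount_time_summary(mounts):
    """Group (mount_type, seconds) pairs and return
    {mount_type: (count, average_seconds, max_seconds)}."""
    by_type = defaultdict(list)
    for mount_type, seconds in mounts:
        by_type[mount_type].append(seconds)
    return {
        mount_type: (len(times), mean(times), max(times))
        for mount_type, times in by_type.items()
    }


# Invented sample: scratch mounts are quick, mounts of existing volumes
# may wait for a recall from a real drive.
sample = [
    ("scratch", 1.0),
    ("scratch", 2.0),
    ("existing", 40.0),
    ("existing", 830.0),
]
summary = mount_time_summary(sample)
```

Keeping the maximum next to the average matters here because a single very slow mount can disappear inside an acceptable-looking average.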

  40. VSM SMF VTV Mount Times: We can look at specific volumes. For example, VTV 0EPZWE (DFHSM) is taking 830 seconds to mount. Let’s look at some more details for this volume.

  41. z/OS SMF and VSM SMF Job Details: Detailed information about the tape activity from both z/OS (SMF 14/15/21/30) and VSM. Note: No replication information since no data was written.

  42. VSM SMF Recall Details. Chart callouts: Real Tape; Real Tape Drive #; Amount recalled; Amount needed; 3:41 Mins. Is this too long?

  43. VSM SMF Average Recall Time: On average, the times look OK. What about the peaks?

  44. VSM SMF Maximum Recall Time: Yes, some peaks. We can look at the detailed records, but let’s look at the mount distributions.

  45. VSM SMF RTD Mount Time: We see that this VSM is doing Recalls and Migrates. Let’s look at RTD (Real Tape Device) #8 in detail.

  46. VSM SMF Specific RTD Activity: RTD ID 8 is mainly busy with Recalls. There are occasional Migrates.

  47. VSM SMF Specific RTD Activity: No apparent thrashing!

  48. Summary: For very long mount times, there may be: contention inside the VSM (large queues); contention for RTDs (thrashing between Migrate, Recall, Reclaim); robotic delays mounting the tape; delays positioning to the VTV on the MVC; media errors. Use of the SMF records highlights the possible cause.

  49. How long is replication for my tapes taking?

  50. Replication Challenges: Minimize disruption to production Tape usage (e.g. should batch Jobs wait until the Tape is fully replicated?). Maximize Recovery Point Objective (how much data loss can we accept?). Minimize Recovery Time Objective (how long until we are back up and running?). These decisions need to be made BEFORE a technology is selected and implemented. Now the big question: “How is my Tape replication running?”

  51. How long is replication for my tapes taking? So, you’re replicating data synchronously. How long is it taking? Is it consistent during the day? Are all volumes being replicated synchronously? Interesting questions. Let’s have a look.

  52. VSM SMF Average Replication Time: 6 x VSM 6 systems, replicating synchronously. Average time is around 70 seconds, a little more during the batch window. This is the time per volume (VTV).

  53. VSM SMF Average Replication Time (Normalized): An average of 20 seconds per GiB. It’s taking longer during the batch window. There are a few peaks.
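Normalizing replication time by volume size is simple arithmetic. The sketch below shows one way to do it, under stated assumptions. The slides do not say whether the chart averages per-VTV ratios or divides total time by total data, so this sketch uses total seconds over total GiB for an interval. The function names and the 3.5 GiB example size are illustrative assumptions, chosen only so that 70 seconds works out to 20 seconds per GiB.

```python
GIB = 2**30  # bytes in one GiB


def seconds_per_gib(replication_seconds, replicated_bytes):
    """Replication time normalized to seconds per GiB."""
    if replicated_bytes <= 0:
        raise ValueError("replicated size must be positive")
    return replication_seconds / (replicated_bytes / GIB)


def interval_seconds_per_gib(replications):
    """Normalized time for one interval, from (seconds, bytes) pairs.

    Uses total seconds divided by total GiB, so large volumes weigh more
    than small ones. This weighting is an assumption of the sketch.
    """
    total_seconds = 0.0
    total_bytes = 0
    for seconds, size in replications:
        total_seconds += seconds
        total_bytes += size
    return seconds_per_gib(total_seconds, total_bytes)


# A hypothetical 3.5 GiB VTV replicated in 70 seconds.
example = seconds_per_gib(70, int(3.5 * GIB))
```

Normalizing this way separates "the volumes were bigger" from "the replication was slower", which the raw per-VTV average cannot do.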

  54. Concentrate on a Specific VSM: Let’s concentrate on one VSM (the batch peaks). VSM PZRW48E is taking longer to replicate at times. How is this VSM doing?

  55. VSM SMF Average Replication Time – PZRW48E: It certainly looks different later in the day.

  56. VSM SMF Replication during the Day: Replication seems to take longer when there is less data!

  57. VSM SMF VTVs Replicated: Many more VTVs replicated during the morning. What else is this VSM up to?

  58. VSM SMF Other Activity: Front-end Mounts; Migration; Recalls; RTD activity/Utilization. We can also investigate the times for VTV Mounts, Migrates, Recalls and RTD mounts.

  59. VSM SMF VTV Compression Factor: Some VTVs can compress more favorably. (Also available from z/OS SMF records.)

  60. What’s happening? There doesn’t seem to be a clear explanation for the problem. Late in the day: fewer GiB to replicate; fewer VTVs to replicate; more time per GiB; higher compression factor.

  61. VSM SMF Maximum Replication Delay: We’d been looking at the average replication time. The maximum delay time certainly stands out.

  62. Replication Delay Time: Replication delay time is the time between closing the Volume (Tape rewind indicator received from z/OS) and the start of the replication process. This should normally be very low (less than 1 second). We have peaks approaching 2.5 seconds. This VSM is probably quite busy internally at this time, which probably results in a large queue for the replication tasks.
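The delay defined on this slide is a difference between two timestamps, so the check is easy to sketch. The code below assumes each replication event has already been decoded into a (VTV name, volume-closed time, replication-start time) tuple. The function names, the VTV names and the timestamps are hypothetical. The 1-second default threshold is the slide's "normally less than 1 second" guideline.

```python
from datetime import datetime, timedelta


def replication_delay(volume_closed, replication_started):
    """Time between closing the volume and the start of replication."""
    return replication_started - volume_closed


def slow_starts(events, threshold=timedelta(seconds=1)):
    """Return (vtv, delay_seconds) for events whose delay exceeds the
    threshold, worst first. events: (vtv, closed, started) tuples."""
    limit = threshold.total_seconds()
    delays = [
        (vtv, replication_delay(closed, started).total_seconds())
        for vtv, closed, started in events
    ]
    flagged = [item for item in delays if item[1] > limit]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Invented events: one normal start, one delayed start.
closed = datetime(2015, 1, 1, 22, 0, 0)
events = [
    ("VTV001", closed, closed + timedelta(seconds=0.2)),
    ("VTV002", closed, closed + timedelta(seconds=2.4)),
]
flagged = slow_starts(events)
```

Sorting worst first puts the peaks at the top, which is the same reason the slides move from the average to the maximum delay.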

  63. VSM Summary: These were two examples of what is important for VSM Tape operations, and how they can be investigated. Performance data is available, but needs to be properly mined and presented. Connecting the z/OS view to the virtual hardware view is critical to understand and manage a VSM environment.
