survey of the azure
play

Survey of the Azure Data Landscape Ike Ellis Wintellect Core - PowerPoint PPT Presentation

Survey of the Azure Data Landscape Ike Ellis Wintellect Core Services Consulting Custom software application development and architecture Instructor Led Training Microsofts #1 training vendor for over 14 years having trained more than


  1. Survey of the Azure Data Landscape Ike Ellis

  2. Wintellect Core Services Consulting Custom software application development and architecture Instructor Led Training Microsoft’s #1 training vendor for over 14 years having trained more than 50,000 Microsoft developers On-Demand Training World class, subscription based online training

  3. Industry Influencers We wrote the book (over 30 of them)

  4. Some Microsoft Related Highlights Gold Azure Partner  2016 IAMCP Gold Partner of the Year for the U.S. announced at WPC  CEO is the Microsoft Regional Director (RD) for Atlanta  DevOps competency partner   Multiple ALM Rangers Software Development competency partner  Xamarin Premier Consulting Partner   Multiple Xamarin Certified Engineers  Chosen to teach the 2-day Xamarin University pre-con at Evolve 2016 Other: Visual Studio Integration Partner, Azure Circle Partner, ALM Inner  Circle Partner, MVP of the Year, and more…

  5. Agenda • Azure Blob Storage • Azure Table Storage • Azure CosmosDB • Azure SQL Database • Azure SQL in a VM • Azure SQL Data Warehouse • Azure Data Lake • Lots of other things supported: • Postgres, MySQL, MongoDB, Redis

  6. Topic Agenda • What is it? • How is it used? • What are the competitors? • DEMO!

  7. Azure Blob Storage

  8. Azure Blob Storage • Blobs are files (PDFs, JPGs, DOCs, etc) • Highly durable, massively scalable • More than 40 trillion stored objects • 3.5+ Million requests/second • Exposed via REST APIs • Use them in .NET, C++, Java, Node.JS, Android… • AzCopy, PowerShell

  9. Blob Storage Fault Tolerance & Scalability

  10. What kind of blobs can I have? • Share files with clients • off-load static content from web servers (invoices, contracts, resumes) • Azure Websites – Platform as a Service – no files on a webserver • SQL BAK Files • VM Hard Drives

  11. Competitors? • On premise SANS and arrays • Amazon S3 Blob Storage

  12. Azure Blob Storage  Demo

  13. Azure Table Storage • Much of it similar to Azure Blob Storage • Same scalability & redundancy • Affordable price • Very, very fast • NoSQL key value pair solution • Quick data retrieval, little configuration

  14. Competitors • Amazon DynamoDB Table Storage

  15. Azure Table Storage  Demo

  16. JSON Document  Standard for passing data between a server and a web application  Replacement for XML  Hierarchical  Terse  Simple data types

  17. Modeling in CosmosDB {  "id": "1",  Reading is one operation "firstName": "Thomas",  Writing is one operation "lastName": "Andersen",  No assembly de-assembly "addresses": [  {  "line1": "100 Some Street",  "line2": "Unit 1",  "city": "Seattle",  "state": "WA",  "zip": 98012  }  ],  "contactDetails": [  {"email: "thomas@andersen.com"},  

  18. Query Playground http://www.documentdb.com/sql/demo 

  19. Competitors • MongoDB • Amazon DynamoDB

  20. Azure SQL Database • Platform as a Service • All data is backed up for you • Point in time restore • Can be geo-redundant • Scalable both in performance and in data size • Up to 1TB • Not feature complete with SQL Server in a VM

  21. Database Replicas

  22. Azure SQL Database Unsupported Features  https://azure.microsoft.com/en-us/documentation/articles/sql-database-transact- sql-information/

  23. You can also make it scale up!

  24. Competitors  Amazon RDS

  25. Azure SQL Database  Demo

  26. Azure SQL Server in a VM • You manage backups • You create fault tolerant options • You manage disk space • You manage patching • You don’t manage hardware failure • You don’t manage purchasing hardware • You don’t manage networking infrastructure

  27. Performance Considerations • Use Premium Storage. • Use a VM size of DS3 or higher for SQL Enterprise edition and DS2 or higher for SQL Standard edition. • Use a minimum of 2 P30 disks (1 for log files; 1 for data files and TempDB ). • Keep the storage accountand SQL Server VM in the same region. • Disable Azure geo-redundant storage (geo-replication) on the storage account. • Avoid using operating system or temporary disks for database storage or logging.

  28. Backups & Fault Tolerance • Back up to Azure Blob Storage • Use Always on Availability Groups and Windows Failover Clustering Services (WFCS) for fault tolerance • Can use mirroring or log shipping too • Can also mix in on-premise

  29. Competitors  Amazon EC2 – VMs in the cloud

  30. Azure SQL Data Warehouse • Elastic Massively Parallel Processing System • Use T-SQL to query across relational and non-relational data • Up to petabyte volumes of data • Scale compute separately from data • When paused, you only pay for storage • Deploys in seconds

  31. Azure SQL Data Warehouse • Supports 32 concurrent queries • Used for fanning out queries over multiple machines for processing/aggregation/analytics • Performance becomes far more predictable than with just straight SQL Server • Not used in OLTP environments

  32. What is a DTU (Data Warehouse Unit)? • A unit of scale that determines how much hardware will give great performance • Done in increments of 100 (mostly) • How many DTUs? • Start Small • Monitor • Change as needed. It’s instant. ALTER DATABASE MySQLDW MODIFY (SERVICE_OBJECTIVE = 'DW1000') ;

  33. Partitioning Data  Two choices: • Distribute data based on hashing values from a single column • Good if clusters of tables will be joined and are related • Distribute data evenly but randomly • Fail-safe method

  34. Non-supported data types • geometry, use a varbinary type • geography, use a varbinary type • hierarchyid, CLR type not native • image, text, ntext when text based use varchar/nvarchar (smaller the better) • nvarchar(max), use varchar(4000) or smaller for better performance • numeric, use decimal • sql_variant, split column into several strongly typed columns • sysname, use nvarchar(128) • table, convert to temporary tables • timestamp, re-work code to use datetime2 and CURRENT_TIMESTAMP function. • varchar(max), use varchar(8000) or smaller for better performance • uniqueidentifier, use varbinary(8) • user defined types, convert back to their native types where possible • xml, use a varchar(8000) or smaller for better performance - split across columns if needed

  35. Unsupported Features • primary keys • user-defined types • foreign keys • indexed views • check constraints • identities • unique constraints • sequences • unique indexes • triggers • computed columns • synonyms • sparse columns

  36. Competitors  Amazon RedShift

  37. Azure SQL Data Warehouse  Demo

  38. Azure Data Lake  HDFS for the cloud  Can use tools like Spark, Storm, Flume, Sqoop, Kafka, etc.  No fixed limits on account size or file size

  39. What is a generic data lake? • An enterprise wide repository of every type of data collected in a single place • Prior to any formal definition of requirements or schema. Allows every type of data to be kept without discrimination Organizations can then use Hadoop or advanced analytics to find patterns of the data. • Serve as a repository for lower cost data preparation prior to moving curated data into a data warehouse.

  40. Products • Azure Data Lake Store – Built on HDFS • Azure Data Lake Analytics – Built on Yarn. Introduces U-SQL/C# 45

  41. Competitors  A lot of Hadoop implementations, but nothing really quite like it 46

  42. More data options…. • MongoDB • PostGres • Redis • MySQL • Oracle 47

  43. Ike Ellis  Ike Ellis, MVP blog.ikeellis.com  Book: Developing Azure Solutions  Podcast Guest: Talk Python to Me – Dec 2015, June 2016  .NET Rocks – Sept 2015, Sept 2016  SDTIG – www.sdtig.com

Recommend


More recommend