Big Data Analytics Reference Architectures and Case Studies
Relational vs. Non-Relational Architecture
Relational: • Rational • Predictable • Traditional
Non-Relational: • Agile • Flexible • Modern
Agenda
• Big Data Challenges
• Big Data Reference Architectures
• Case Studies
• Tips for Designing Big Data Solutions
Big Data Challenges
Diagram: data source types arranged from unstructured to structured and rated High / Medium / Low on Volume, Variety, Velocity, and Complexity.
• Media: images, video, audio, etc.
• Archives: scanned documents, statements, medical records, e-mails, etc.
• Data Storages: RDBMS, NoSQL, Hadoop, file systems, etc.
• Docs: XLS, PDF, CSV, HTML, JSON, etc.
• Social Networks: Twitter, Facebook, Google+, LinkedIn, etc.
• Machine Log Data: application logs, event logs, server data, CDRs, clickstream data, etc.
• Business Apps: CRM, ERP systems, HR, project management, etc.
• Public Web: Wikipedia, news, weather, public finance, etc.
• Sensor Data: smart electric meters, medical devices, car sensors, road cameras, etc.
Big Data Analytics
Traditional Analytics (BI) vs Big Data Analytics
Focus on
• Traditional Analytics (BI): descriptive analytics, diagnostic analytics
• Big Data Analytics: predictive analytics, data science
Data Sets
• Traditional Analytics (BI): limited data sets, cleansed data, simple models
• Big Data Analytics: large scale data sets, more types of data, raw data, complex data models
Supports
• Traditional Analytics (BI): causation (what happened, and why?)
• Big Data Analytics: correlation (new insight, more accurate answers)
Big Data Analytics Use Cases
• Real Time Intelligence (consumers: Intelligent Agents); drivers: Low Latency, Reliability
• Data Discovery (consumers: Data Scientists/Analysts); drivers: Volume, Performance
• Business Reporting (consumers: Business Users); drivers: Data Quality, Self Service
Big Data Analytics Reference Architectures
Architecture Drivers:
▪ Volume
▪ Sources
▪ Throughput
▪ Latency
▪ Extensibility
▪ Data Quality
▪ Reliability
▪ Security
▪ Self-Service
▪ Cost
Reference Architectures:
▪ Extended Relational
▪ Non-Relational
▪ Hybrid
Relational Reference Architecture
• Data Sources: Structured, Semi-Structured, Unstructured
• Integration: ETL, Messaging, API/ODBC, Replication
• Data Storages: Data Warehouses, Data Marts, Operational Data Stores
• Analytics: Query & Reporting, OLAP Cubes, Advanced Analytics
• Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Extended Relational Reference Architecture
• Data Sources: Structured, Semi-Structured, Unstructured
• Integration: ETL, Messaging, API/ODBC, Replication
• Data Storages: Data Warehouses, Data Marts, Operational Data Stores
• Analytics: Query & Reporting, OLAP Cubes, Advanced Analytics
• Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Diagram legend: key components affected by Big Data challenges
Non-Relational Reference Architecture
• Data Sources: Structured, Semi-Structured, Unstructured
• Integration: ETL, Messaging, API
• Data Storages: NoSQL Databases, Distributed File Systems
• Analytics: Query & Reporting, Map Reduce, Search Engines, Advanced Analytics
• Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Diagram legend: key components introduced with the non-relational movement
Extended Relational vs. Non-Relational Architecture
Architecture drivers compared for Extended Relational and Non-Relational:
• Large data volume
• Self-service (ad-hoc reporting)
• Unstructured data processing
• High data model extensibility
• High data quality and consistency
• Extensive security
• Reliability and fault-tolerance
• Low latency (near-real time)
• Low cost
• Skills availability
Big Data Analytics Use Cases
• Real Time Intelligence (consumers: Intelligent Agents)
• Data Discovery (consumers: Data Scientists); drivers: Performance, Volume
• Business Reporting (consumers: Business Users)
Data Discovery: Non-Relational Architecture
• Data Sources: Structured, Semi-Structured, Unstructured
• Integration: ETL, Messaging, API
• Data Storages: NoSQL Databases, Distributed File Systems
• Analytics: Query & Reporting, Map Reduce, Search Engines, Advanced Analytics
• Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Big Data Analytics Use Cases
• Real Time Intelligence (consumers: Intelligent Agents)
• Data Discovery (consumers: Data Scientists)
• Business Reporting (consumers: Business Users); drivers: Data Quality, Self Service
Business Reporting: Hybrid Architecture
• Data Sources: Structured, Semi-Structured, Unstructured
• Integration: ETL, Messaging, API
• Data Storages: Relational DWH/DM, Distributed File Systems
• Analytics: SQL Query & Reporting, Map Reduce, Search Engines, Advanced Analytics
• Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Diagram legend: Extended Relational components; Non-relational components
Big Data Analytics Use Cases
• Real Time Intelligence (consumers: Intelligent Agents); drivers: Low Latency, Reliability
• Data Discovery (consumers: Data Scientists)
• Business Reporting (consumers: Business Users)
Lambda Architecture
Source:
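The slide names the Lambda Architecture without further detail. As a rough illustration, assuming the common three-layer reading (a batch layer that recomputes views from an append-only master dataset, a speed layer that covers events the batch view has not yet absorbed, and a serving layer that merges both), a toy page-view counter might look like the sketch below. The class and method names are hypothetical and not taken from the deck.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda-style page-view counter (illustrative names only)."""

    def __init__(self):
        self.master = []             # append-only master dataset
        self.batch_view = Counter()  # recomputed from the full master dataset
        self.speed_view = Counter()  # increments since the last batch run

    def ingest(self, page):
        # Every event is stored in the master dataset and also
        # applied to the low-latency speed view.
        self.master.append(page)
        self.speed_view[page] += 1

    def run_batch(self):
        # Batch layer: recompute the view from scratch, then drop the
        # real-time increments that the new batch view now covers.
        self.batch_view = Counter(self.master)
        self.speed_view = Counter()

    def query(self, page):
        # Serving layer: merge the batch view with the real-time view.
        return self.batch_view[page] + self.speed_view[page]
```

Queries stay correct between batch runs because the speed view holds exactly the events ingested since the last recomputation.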
Case Study #1: Usage & Billing Analysis
Business Area:
Cloud-based platform for building, deploying, hosting, and managing mobile applications
Business Goals:
• Provide a visual environment for building custom mobile applications
• Charge customers based on the platform they are using, the number of consumers' applications, etc.
Architectural Decisions
Architecture Drivers:
▪ Volume (> 10 TB)
▪ Sources (Semi-structured - JSON)
▪ Throughput (> 10K/sec)
▪ Latency (2 min)
▪ Extensibility (Custom metrics)
▪ Data Quality (Consistency)
▪ Reliability (24/7)
▪ Security (Multitenancy)
▪ Self-Service (Ad-Hoc reports)
▪ Cost (The less the better)
▪ Constraints (Public Cloud)
Trade-off:
                 Extended Relational    Non-Relational
Extensibility             -                   +
Data Quality              +                   -
Self-Service              +                   -
Decision: Extended Relational Architecture, with extensibility via the Pre-allocated Fields pattern
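The slide names the Pre-allocated Fields pattern without showing a schema. One common reading of the pattern is a fact table with a fixed number of generic columns plus a per-tenant mapping from custom metric names to column slots. The sketch below follows that reading using Python's built-in sqlite3 module; the table names, column names, and the slot count are assumptions for illustration, not the case study's actual schema.

```python
import sqlite3

N_SLOTS = 3  # assumed number of pre-allocated custom columns


def create_schema(conn):
    # Fact table with generic columns custom_1..custom_N, plus a mapping
    # table that records which metric each tenant stores in which slot.
    slots = ", ".join(f"custom_{i} REAL" for i in range(1, N_SLOTS + 1))
    conn.execute(f"CREATE TABLE usage_events (tenant_id TEXT, ts TEXT, {slots})")
    conn.execute(
        "CREATE TABLE metric_slots ("
        "tenant_id TEXT, metric_name TEXT, slot INTEGER, "
        "PRIMARY KEY (tenant_id, metric_name), UNIQUE (tenant_id, slot))"
    )


def lookup_slot(conn, tenant_id, metric_name):
    row = conn.execute(
        "SELECT slot FROM metric_slots WHERE tenant_id = ? AND metric_name = ?",
        (tenant_id, metric_name),
    ).fetchone()
    return row[0] if row else None


def register_metric(conn, tenant_id, metric_name):
    """Return the metric's slot, assigning the next free one if needed."""
    slot = lookup_slot(conn, tenant_id, metric_name)
    if slot is not None:
        return slot
    used = conn.execute(
        "SELECT COUNT(*) FROM metric_slots WHERE tenant_id = ?", (tenant_id,)
    ).fetchone()[0]
    if used >= N_SLOTS:
        raise ValueError("no free pre-allocated fields for this tenant")
    conn.execute(
        "INSERT INTO metric_slots VALUES (?, ?, ?)", (tenant_id, metric_name, used + 1)
    )
    return used + 1


def record_event(conn, tenant_id, ts, metrics):
    # Translate metric names to their generic columns, then insert one row.
    cols, vals = ["tenant_id", "ts"], [tenant_id, ts]
    for name, value in metrics.items():
        cols.append(f"custom_{register_metric(conn, tenant_id, name)}")
        vals.append(value)
    marks = ", ".join("?" for _ in cols)
    conn.execute(f"INSERT INTO usage_events ({', '.join(cols)}) VALUES ({marks})", vals)


def metric_total(conn, tenant_id, metric_name):
    slot = lookup_slot(conn, tenant_id, metric_name)
    if slot is None:
        return 0.0
    # The column name comes from an integer slot we assigned ourselves.
    return conn.execute(
        f"SELECT COALESCE(SUM(custom_{slot}), 0) FROM usage_events WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchone()[0]
```

The schema never changes when a tenant adds a metric, which keeps ad-hoc SQL reporting available; the cost is a hard cap on custom metrics per tenant and columns whose meaning depends on the mapping table.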
Solution Architecture
Technologies:
• Amazon Redshift
• Amazon SQS
• Amazon S3
• Elastic Beanstalk
• Jaspersoft BI Professional
• Python
Case Study #2: Clickstream for Retail Website
Business Area:
Retail. A platform for e-commerce and collecting feedback from customers
Business Goals:
• Build an in-house analytics platform for ROI measurement and performance analysis of every product and feature delivered by the e-commerce platform
• Provide the ability to understand how end-users are interacting with service content, products, and features on sites
• Do clickstream analysis
• Perform A/B testing
Architectural Decisions
Architecture Drivers:
▪ Volume (45 TB)
▪ Sources (Semi-structured - JSON)
▪ Throughput (> 20K/sec)
▪ Latency (1 hour)
▪ Extensibility (Custom tags)
▪ Data Quality (Not critical)
▪ Reliability (24/7)
▪ Security (Multitenancy)
▪ Self-Service (Canned reports, Data science)
▪ Cost (The less the better)
▪ Constraints (Public Cloud)
Trade-off:
                     Extended Relational    Non-Relational
Volume/Scalability          +/-                   +
Throughput                   +                    +
Self-Service                 +                   +/-
Extensibility                -                    +
Decision: Non-Relational Architecture, with reporting via the Materialized View pattern
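The Materialized View pattern is named on the slide but not described. A minimal sketch of the general idea, assuming raw clickstream events arrive as JSON lines: a batch job scans the raw data once and precomputes read-optimized report rows, so canned reports become key lookups instead of scans. The event fields (`ts`, `page`, `user`) and function names are hypothetical.

```python
import json
from collections import defaultdict


def build_view(raw_lines):
    """Batch job: scan raw JSON events once and precompute report rows."""
    acc = defaultdict(lambda: {"views": 0, "users": set()})
    for line in raw_lines:
        event = json.loads(line)
        key = (event["ts"][:10], event["page"])  # (day, page)
        acc[key]["views"] += 1
        acc[key]["users"].add(event["user"])
    # Freeze into plain rows that a report can read directly.
    return {
        key: {"views": row["views"], "unique_users": len(row["users"])}
        for key, row in acc.items()
    }


def report(view, day, page):
    """Canned report: a key lookup on the precomputed view, no raw scan."""
    return view.get((day, page), {"views": 0, "unique_users": 0})
```

The view is only as fresh as the last batch run, which fits a latency driver measured in minutes or hours rather than seconds.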
Solution Architecture
Technologies:
• Amazon S3
• Flume
• Hadoop/HDFS, MapReduce
• HBase
• Oozie
• Hive
Cluster diagram: Node 1, Node 2, Node N
Tips for Designing Big Data Solutions
• Understand data users and sources
• Discover architecture drivers
• Select proper reference architecture
• Do trade-off analysis, address cons
• Map reference architecture to technology stack
• Prototype, re-evaluate architecture
• Estimate implementation efforts
• Set up DevOps practices from the very beginning
• Advance in solution development through "small wins"
• Be ready for changes; big data technologies are evolving rapidly
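The trade-off step can be made concrete with a small tally. The sketch below encodes the +, +/-, and - marks from the two case-study trade-off tables and sums them per architecture. The numeric scores and the unweighted sum are assumptions made for illustration; the deck does not state how the marks were combined, and a real analysis would likely weight drivers by business priority.

```python
# Assumed scoring of the slide marks; the deck does not define one.
SCORE = {"+": 1.0, "+/-": 0.5, "-": 0.0}

ER, NR = "Extended Relational", "Non-Relational"

# Marks copied from the case-study trade-off tables.
CASE_1 = {
    "Extensibility": {ER: "-", NR: "+"},
    "Data Quality": {ER: "+", NR: "-"},
    "Self-Service": {ER: "+", NR: "-"},
}
CASE_2 = {
    "Volume/Scalability": {ER: "+/-", NR: "+"},
    "Throughput": {ER: "+", NR: "+"},
    "Self-Service": {ER: "+", NR: "+/-"},
    "Extensibility": {ER: "-", NR: "+"},
}


def pick_architecture(trade_offs):
    """Sum the marks per architecture and return (best, totals)."""
    totals = {}
    for marks in trade_offs.values():
        for arch, mark in marks.items():
            totals[arch] = totals.get(arch, 0.0) + SCORE[mark]
    best = max(totals, key=totals.get)
    return best, totals


print(pick_architecture(CASE_1))  # Extended Relational: 2.0 vs 1.0
print(pick_architecture(CASE_2))  # Non-Relational: 3.5 vs 2.5
```

Under this simple scoring the tally agrees with the decisions recorded in both case studies; the losing marks (for example, extensibility in Case Study #1) are the cons that the chosen design pattern then has to address.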
▪ Leading global Product and Application Development partner founded in 1993
▪ 3,300+ employees across North America, Ukraine and Western Europe
▪ Thousands of successful outsourcing projects!
SaaS/Cloud Solutions . Mobility Solutions . UX/UI . BI/Analytics/Big Data . Software Architecture . Security
Clients include: