Understanding Big Data Workloads on Modern Processors using BigDataBench
Jianfeng Zhan
Professor, ICT, Chinese Academy of Sciences and University of Chinese Academy of Sciences
http://prof.ict.ac.cn/BigDataBench
HPBDC 2015, Ohio, USA
Outline
• BigDataBench Overview
• Workload characterization
• Multi-tenancy version
• Processors evaluation
BigDataBench HPBDC 2015
What is BigDataBench?
• An open source big data benchmarking project
  • http://prof.ict.ac.cn/BigDataBench
  • Search Google using "BigDataBench"
BigDataBench Details
Methodology
• Five application domains
• Benchmark specifications for each domain
Implementation
• 14 real-world data sets & 3 kinds of big data generators
• 33 big data workloads with diverse implementations
Specific-purpose Version
• BigDataBench subset version
Five Application Domains
• Internet services: Search Engine, Social Network, Electronic Commerce (taking up 80% of internet services according to page views and daily visitors of the top 20 websites)
• Multimedia
• Bioinformatics
[Figure: breakdown of the top 20 websites by domain (search engine, social network, e-commerce, media streaming, others), by page views and daily visitors]
[Figure: multimedia data growth per minute (videos on YouTube, music streaming on PANDORA, photos on FLICKR, video feeds from surveillance cameras, voice calls on Skype; images, videos, documents, …)]
[Figure: DDBJ/EMBL/GenBank database growth, entries (million) and nucleotides (billion)]
Sources: http://www.oldcolony.us/wp-content/uploads/2014/11/whatisbigdata-DKB-v2.pdf, http://www.alexa.com/topsites/global;0, http://www.ddbj.nig.ac.jp/breakdown_stats/dbgrowth-e.html#dbgrowth-graph
Benchmark Specification
• Guidelines for BigDataBench implementation
  • Data model: describe the data model
  • Workloads: model typical application scenarios and extract important workloads
BigDataBench Summary
• 14 real-world data sets: Wikipedia Entries, Amazon Movie Reviews, Google Web Graph, Facebook Social Network, E-commerce Transaction, ProfSearch Resumes, ImageNet, English broadcasting audio, DVD Input Streams, Genome sequence data, Image scene, Assembly of the human genome, MNIST, SoGou Data
• BDGS (Big Data Generator Suite) for scalable data
• Five application domains: Search Engine, Social Network, E-commerce, Multimedia, Bioinformatics
• Software stacks: NoSQL, Impala, Shark, Hadoop RDMA, MPI, DataMPI
• 33 workloads
Big Data Generator Tool
• 3 kinds of big data generators
• Preserving original characteristics of real data
• Text/Graph/Table generator
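The slide does not say how BDGS preserves the characteristics of real data. As a minimal sketch under a simplifying assumption (a unigram word-frequency model, not necessarily what the BDGS text generator uses), a text generator can learn word frequencies from a small real seed corpus and then sample an arbitrarily large synthetic corpus from them. The function names `fit_unigram` and `generate_text` are hypothetical.

```python
import random
from collections import Counter


def fit_unigram(seed_text):
    """Learn word frequencies from a (small) real seed corpus.

    Assumption: a unigram model stands in for the real generator's model.
    """
    counts = Counter(seed_text.split())
    words = list(counts)
    weights = [counts[w] for w in words]
    return words, weights


def generate_text(words, weights, n_words, seed=0):
    """Sample n_words words; in expectation each word keeps its seed-corpus frequency."""
    rng = random.Random(seed)
    return " ".join(rng.choices(words, weights=weights, k=n_words))


if __name__ == "__main__":
    words, weights = fit_unigram("big data needs big benchmarks and real data")
    print(generate_text(words, weights, n_words=12))
```

Scaling the output is just a larger `n_words`. A unigram model keeps word frequencies but not word order or topics, so a production generator would likely need a richer model.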
BigDataBench Subset
• Motivation
  • Expensive to run all the benchmarks for system and architecture research
    • multiplied by different implementations
    • BigDataBench 3.0 provides about 77 workloads
• Identify workload characteristics from a specific perspective → Eliminate correlation in the data with dimension reduction (PCA) → Clustering (K-Means) → Subset
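The subsetting pipeline on this slide (PCA to remove correlation among the characteristic metrics, K-Means clustering, then one workload per cluster) can be sketched in pure Python. This is an illustrative sketch, not the BigDataBench implementation: the metric values, the rule of picking the workload closest to each centroid as the cluster's representative, and all function names are assumptions; a real study would use a numerical library.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def standardize(rows):
    """Scale each metric (column) to zero mean and unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # "or 1.0" keeps constant metrics from dividing by zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def principal_components(rows, n_components, iters=200, seed=0):
    """Top covariance eigenvectors via power iteration + deflation (rows must be centered)."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next-largest component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-Means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def select_subset(workloads, n_components, k):
    """Hypothetical helper: one representative workload name per cluster."""
    names = list(workloads)
    rows = standardize([workloads[nm] for nm in names])
    comps = principal_components(rows, n_components)
    projected = [[sum(r[j] * c[j] for j in range(len(r))) for c in comps] for r in rows]
    labels, centroids = kmeans(projected, k)
    subset = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            best = min(members, key=lambda i: dist2(projected[i], centroids[c]))
            subset.append(names[best])
    return sorted(subset)


if __name__ == "__main__":
    # Hypothetical metrics per workload, e.g. [IPC, L2 MPKI, branch miss %].
    metrics = {
        "A1": [1.0, 1.1, 0.9], "A2": [1.1, 1.0, 1.0], "A3": [0.9, 1.0, 1.1],
        "B1": [5.0, 5.2, 4.9], "B2": [5.1, 4.9, 5.0], "B3": [4.9, 5.0, 5.1],
    }
    print(select_subset(metrics, n_components=2, k=2))  # one "A*" and one "B*" workload
```

Standardizing first matters because PCA is scale-sensitive: without it, a metric with large raw values would dominate the principal components.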
Why BigDataBench?
Benchmark      | Specification | Application domains | Workload types | Workloads | Scalable data sets (from real data) | Multiple implementations | Multi-tenancy | Subsets | Simulator version
BigDataBench   | Y | Five | Four [1] | 33 | 8   | Y | Y | Y | Y
BigBench       | Y | One  | Three    | 10 | 3   | N | N | N | N
CloudSuite     | N | N/A  | Two      | 8  | 3   | N | N | N | Y
HiBench        | N | N/A  | Two      | 10 | 3   | N | N | N | N
CALDA          | Y | N/A  | One      | 5  | N/A | Y | N | N | N
YCSB           | Y | N/A  | One      | 6  | N/A | Y | N | N | N
LinkBench      | Y | N/A  | One      | 10 | N/A | Y | N | N | N
AMP Benchmarks | Y | N/A  | One      | 4  | N/A | Y | N | N | N
[1] The four workload types are Offline Analytics, Cloud OLTP, Interactive Analytics and Online Service.
BigDataBench Users
• http://prof.ict.ac.cn/BigDataBench/users/
• Industry users
  • Accenture, BROADCOM, SAMSUNG, Huawei, IBM
• China's first industry-standard big data benchmark suite
  • http://prof.ict.ac.cn/BigDataBench/industry-standard-benchmarks/
• About 20 academic groups have published papers using BigDataBench
BigDataBench Publications
• BigDataBench: a Big Data Benchmark Suite from Internet Services. 20th IEEE International Symposium on High Performance Computer Architecture (HPCA-2014).
• Characterizing data analysis workloads in data centers. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013) (Best paper award).
• BigOP: generating comprehensive big data workloads as a benchmarking framework. 19th International Conference on Database Systems for Advanced Applications (DASFAA 2014).
• BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. The Fourth Workshop on Big Data Benchmarking (WBDB 2014).
• Identifying Dwarfs Workloads in Big Data Analytics. arXiv preprint arXiv:1505.06872.
• BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads. arXiv preprint arXiv:1504.02205.
System Behaviors
• Diversified system level behaviors
[Figure: per-workload CPU utilization and I/O wait ratio (percentage), and average weighted disk I/O time ratio (log scale), for H-Grep(7), S-Kmeans(1), S-PageRank(1), H-WordCount(1), H-Bayes(1), M-Bayes, M-Kmeans, M-PageRank, H-Read(10), H-Difference(9), I-SelectQuery(9), S-WordCount(8), S-Project(4), S-OrderBy(3), S-Grep(1), M-Grep, H-TPC-DS-…, I-OrderBy(7), S-TPC-DS-…, S-TPC-DS-…, S-Sort(1), M-WordCount, M-Sort and AVG_S_BigData]
System Behaviors
• Diversified system level behaviors:
  • High CPU utilization & less I/O time
[Figure: per-workload CPU utilization, I/O wait ratio and average weighted disk I/O time ratio, as on the previous slide]
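The slides report CPU utilization and I/O wait ratio per workload but do not say how they were collected. One common way on Linux, assumed here rather than taken from the BigDataBench tooling, is to difference the aggregate `cpu` counters in /proc/stat between two snapshots. The helper names `parse_cpu_line` and `cpu_and_iowait_ratio` are hypothetical.

```python
def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters (in clock ticks) from /proc/stat content."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user nice system idle iowait irq softirq steal; the guest fields
            # are already counted in user/nice, so they are left out.
            return [int(x) for x in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_and_iowait_ratio(before, after):
    """(CPU utilization, I/O wait ratio) between two /proc/stat snapshots."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    delta = [x - y for x, y in zip(a, b)]
    total = sum(delta)
    if total == 0:
        return 0.0, 0.0  # no ticks elapsed between snapshots
    idle, iowait = delta[3], delta[4]
    # Busy time is everything that is neither idle nor waiting on I/O.
    return (total - idle - iowait) / total, iowait / total
```

To use it on Linux, read /proc/stat twice (for example one second apart while the workload runs) and pass both strings. The weighted disk I/O time ratio is not covered, since its definition is not given on the slide.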