Bench'19
Benchmarking Database Ingestion Ability with Real-Time Big Astronomical Data
Qing Tang, Chen Yang, Xiaofeng Meng, Zhihui Du
RUC, 15/11/2019
Outline
• Background
• Benchmark Methodology
• Experiments and Results Analysis
• Conclusion
1 Background
[Figure: a catalog stream feeding AstroServer, with microlensing, supernovae, and gamma-ray bursts as example targets]
Accelerating scientific discovery:
• Real-time discovery of transients: gamma-ray bursts, supernovae, microlensing
• Mining long-term regular patterns: evolution of Sun-like stars
1.1 Big Astronomy Data
• GWAC (Ground-based Wide-Angle Camera array)
• Covers a large field of view with a high sampling frequency

Sky survey field     5000 square degrees
Sampling interval    15 s
Observed stars       6.8 million
Generated data       2.5 TB/day
Service life         10 years
Total data           3-6 PB
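A quick calculation makes the ingestion pressure behind these figures concrete. This is a back-of-envelope sketch under stated assumptions (not from the slides): one catalog row per observed star per 15 s frame, the 2.5 TB/day volume spread evenly over 24 hours, and decimal units (1 TB = 10^12 bytes).

```python
# Back-of-envelope ingestion rates implied by the GWAC figures.
# Assumptions (not stated on the slide): one row per observed star per frame,
# and the daily volume arriving uniformly over 24 hours. Decimal units.

STARS_PER_FRAME = 6_800_000      # "Observed stars: 6.8 million"
FRAME_INTERVAL_S = 15            # "Sampling interval: 15 s"
BYTES_PER_DAY = 2.5e12           # "Generated data: 2.5 TB/day"
SECONDS_PER_DAY = 24 * 60 * 60


def rows_per_second(stars: int, interval_s: float) -> float:
    """Rows that must be written per second to keep pace with the camera."""
    return stars / interval_s


def average_bandwidth_mb_s(bytes_per_day: float) -> float:
    """Average write bandwidth in MB/s if the daily volume arrives uniformly."""
    return bytes_per_day / SECONDS_PER_DAY / 1e6


if __name__ == "__main__":
    print(f"{rows_per_second(STARS_PER_FRAME, FRAME_INTERVAL_S):,.0f} rows/s")  # prints: 453,333 rows/s
    print(f"{average_bandwidth_mb_s(BYTES_PER_DAY):.1f} MB/s")                  # prints: 28.9 MB/s
```

Under these assumptions, each frame's 6.8 million rows must be absorbed before the next frame arrives 15 s later, which is the same 15 s bound used when comparing databases in section 3.3.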
1.2 Application characteristics …
a) Quick response
b) Massive data storage
c) Timely data analysis
d) High cost-effectiveness
2 Benchmark Methodology
The methodology proceeds in three steps:
(1) Analyze in depth the workloads that follow from the characteristics of the data sets, and extract the frequently used basic operation units;
(2) Define the benchmark test specifications;
(3) Provide workloads implemented on various software stacks.
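Steps (2) and (3) can be pictured as a small harness: a test specification that fixes the per-batch deadline, plus one adapter per software stack behind a common write interface. This is a minimal sketch under assumptions; `IngestTarget`, `write_batch`, `BenchmarkSpec`, and `run_ingestion_benchmark` are hypothetical names rather than the benchmark's actual API, and the in-memory target only stands in for a real Redis, HBase, MySQL, or Kafka client.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Sequence

Row = Dict[str, object]


class IngestTarget(Protocol):
    """Hypothetical adapter interface: one implementation per software stack."""
    name: str

    def write_batch(self, rows: Sequence[Row]) -> None: ...


@dataclass
class InMemoryTarget:
    """Stand-in target so the sketch runs without any database installed."""
    name: str = "in-memory"
    store: List[Row] = field(default_factory=list)

    def write_batch(self, rows: Sequence[Row]) -> None:
        self.store.extend(rows)


@dataclass
class BenchmarkSpec:
    """Test specification; here only the deadline each batch should meet."""
    deadline_s: float = 15.0  # one GWAC sampling interval


def run_ingestion_benchmark(target: IngestTarget,
                            batches: Sequence[Sequence[Row]],
                            spec: BenchmarkSpec) -> Dict[str, object]:
    """Time every batch write and compare the mean time with the deadline."""
    if not batches:
        raise ValueError("at least one batch is required")
    timings = []
    for batch in batches:
        start = time.perf_counter()
        target.write_batch(batch)
        timings.append(time.perf_counter() - start)
    mean_s = sum(timings) / len(timings)
    return {"target": target.name, "mean_s": mean_s,
            "meets_deadline": mean_s < spec.deadline_s}


if __name__ == "__main__":
    batches = [[{"starId": i, "mag": 12.5} for i in range(1000)] for _ in range(3)]
    print(run_ingestion_benchmark(InMemoryTarget(), batches, BenchmarkSpec()))
```

Keeping the timing loop separate from the adapters means the same specification can be replayed unchanged against each software stack.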
3.1 Experimental environment
Performance test environment configuration: a master node and several slave nodes
Master node
  Hardware: CPU E5-2603 v3 @ 1.60GHz, 96GB memory, 3.5TB hard disk
  Software: Ubuntu 14.04.5, Redis 3.2.5, HBase 1.2.4, MySQL 5.6.33, Kafka
Slave nodes
  Hardware: CPU E5-2603 v3 @ 1.60GHz, 96GB memory, 30TB hard disk
  Software: Ubuntu 14.04.5, Redis 3.2.5, HBase 1.2.4, MySQL 5.6.33, Kafka
3.2 Test data set
Data generator output: 1920 files, 2.8TB
• Per batch: 1920 files
• Per file: 170,000 rows
• Per row: 39 columns

Attribute     Type    Attribute         Type
redis_key     string  magcalibe         double
jd_str        string  sigma_base        double
ccdNum        string  sigma_ext         double
zone          string  tag_valid         int
starId        long    magdiff           double
alpha         float   lastCMtempname    string
delta         float   starBelong        string
pixx          double  abSignal          string
pixy          double  abVal             double
mag           double  abQuality         double
mage          double  abRank            double
thetaimage    long    sigma_ext_median  double
flags         float   mag_interval_num  int
ellipticity   float   sigmedthreshold   double
classstar     float   data11            double
background    float   data12            double
fwhm          float   data13            double
vignet        float   data14            double
magnorm       double  data15            double
magcalib      double
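The original data generator is not shown, so the following is only a hypothetical stand-in: it encodes the 39-column schema from the table and fills each column with random placeholder values of the listed type. The names `SCHEMA`, `random_value`, and `generate_rows`, and the value ranges, are assumptions; the values are not astronomically realistic.

```python
import random
from typing import Dict, List, Tuple

# Catalog schema transcribed from the table: (column name, type).
SCHEMA: List[Tuple[str, str]] = [
    ("redis_key", "string"), ("jd_str", "string"), ("ccdNum", "string"),
    ("zone", "string"), ("starId", "long"), ("alpha", "float"),
    ("delta", "float"), ("pixx", "double"), ("pixy", "double"),
    ("mag", "double"), ("mage", "double"), ("thetaimage", "long"),
    ("flags", "float"), ("ellipticity", "float"), ("classstar", "float"),
    ("background", "float"), ("fwhm", "float"), ("vignet", "float"),
    ("magnorm", "double"), ("magcalib", "double"), ("magcalibe", "double"),
    ("sigma_base", "double"), ("sigma_ext", "double"), ("tag_valid", "int"),
    ("magdiff", "double"), ("lastCMtempname", "string"), ("starBelong", "string"),
    ("abSignal", "string"), ("abVal", "double"), ("abQuality", "double"),
    ("abRank", "double"), ("sigma_ext_median", "double"),
    ("mag_interval_num", "int"), ("sigmedthreshold", "double"),
    ("data11", "double"), ("data12", "double"), ("data13", "double"),
    ("data14", "double"), ("data15", "double"),
]


def random_value(col_type: str, rng: random.Random) -> object:
    """Draw a placeholder value of the given column type (not realistic data)."""
    if col_type in ("int", "long"):
        return rng.randint(0, 10**6)
    if col_type in ("float", "double"):
        return rng.uniform(-100.0, 100.0)
    return f"s{rng.randint(0, 10**6)}"


def generate_rows(n: int, seed: int = 0) -> List[Dict[str, object]]:
    """Generate n synthetic rows following SCHEMA; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    return [{name: random_value(t, rng) for name, t in SCHEMA} for _ in range(n)]


if __name__ == "__main__":
    rows = generate_rows(3)
    print(len(rows), len(rows[0]))  # prints: 3 39
```

Seeding the generator lets every database under test ingest byte-for-byte identical input; a full file in the test set would correspond to `generate_rows(170_000)`.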
3.3 Results Analysis
Average storage time compared with the 15 s sampling interval:

Database        Avg. storage time  vs. 15s   Selected
HBase           340s               > 15s     No
MySQL-cluster   1700s              > 15s     No
Oracle          50.7s              > 15s     No
Redis-cluster   6.4s               < 15s     Yes
Kafka           20.5s              > 15s     -

Database      Persistence time  Compression rate  Input anomaly rate
Redis+HBase   4.8h              40%               2.50%
Redis/HBase   6h                40%               4.60%
Redis+MySQL   201h              100%              1.00%
Redis/MySQL   202h              100%              1.00%
Kafka+HBase   10.9              100%              2.50%
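The "Selected" column in the first table follows a simple rule: a database qualifies only if its average storage time is below the 15 s sampling interval. The sketch below applies that rule to the timings copied from the table; `select_candidates` is a hypothetical helper name.

```python
from typing import Dict, List

# Average storage times copied from the first results table (seconds).
AVG_STORAGE_TIME_S: Dict[str, float] = {
    "HBase": 340.0,
    "MySQL-cluster": 1700.0,
    "Oracle": 50.7,
    "Redis-cluster": 6.4,
    "Kafka": 20.5,
}
DEADLINE_S = 15.0  # one GWAC sampling interval


def select_candidates(times: Dict[str, float], deadline: float) -> List[str]:
    """Return, sorted by name, the databases whose average time beats the deadline."""
    return sorted(db for db, t in times.items() if t < deadline)


if __name__ == "__main__":
    print(select_candidates(AVG_STORAGE_TIME_S, DEADLINE_S))  # prints: ['Redis-cluster']
```

Applied on its own, the rule would also exclude Kafka, whereas the table marks Kafka with "-" rather than "No".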
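4 Conclusion
[Figure: AstroServer architecture. A catalog stream (microlensing, supernovae, gamma-ray bursts) feeds AstroServer, whose components are the data generator, cross matcher, cache manager, Redis cluster, data persister, HBase, and query engine.]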
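One way to read the cache manager, Redis cluster, data persister, and HBase components together is as a two-tier write path: a fast cache absorbs each batch within the sampling interval, and a background persister drains it to durable storage. The sketch below illustrates that pattern with Python's standard `queue` and `threading` modules; the flow between components is an assumption drawn from the diagram's labels, `CacheThenPersist` is a hypothetical name, and plain Python lists stand in for Redis and HBase.

```python
import queue
import threading
from typing import Dict, List

Row = Dict[str, object]


class CacheThenPersist:
    """Two-tier write path: the queue plays the cache role (Redis in the
    diagram) and `durable` plays the persistent-store role (HBase)."""

    def __init__(self) -> None:
        self.cache: "queue.Queue[List[Row]]" = queue.Queue()
        self.durable: List[Row] = []
        # Background persister thread; daemon so it never blocks interpreter exit.
        self._worker = threading.Thread(target=self._persist_loop, daemon=True)
        self._worker.start()

    def ingest(self, batch: List[Row]) -> None:
        """Fast path: hand the batch to the cache and return immediately."""
        self.cache.put(batch)

    def _persist_loop(self) -> None:
        """Slow path: move cached batches into durable storage one at a time."""
        while True:
            batch = self.cache.get()
            self.durable.extend(batch)
            self.cache.task_done()

    def flush(self) -> None:
        """Block until every cached batch has been persisted."""
        self.cache.join()


if __name__ == "__main__":
    server = CacheThenPersist()
    for frame in range(3):
        server.ingest([{"starId": i, "frame": frame} for i in range(100)])
    server.flush()
    print(len(server.durable))  # prints: 300
```

The design choice this illustrates is decoupling: ingestion latency depends only on the cache, so the slower persistence times in the second results table do not have to fit inside the 15 s window.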
Thank You!
http://idke.ruc.edu.cn
email: tangqing@ruc.edu.cn