
Bug bites Elephant? Test-driven Quality Assurance in Big Data



  1. Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development. Dr. Dominik Benz, Inovex GmbH, 2013/06/03, Berlin Buzzwords

  2. Who speaks the Elephant language? ‣ "class A extends Mapper…", "TDD!" ‣ "ROI, $$" ‣ "apt-get", "install…" ‣ "write/execute tests", "specify acceptance criteria, …"

  3. The road to Big Data QA ‣ our Big Data QA problem ‣ the FitNesse approach ‣ test data definition / selection ‣ job & workflow control ‣ result inspection

  4. QA problem: Web Intelligence @ 1&1 ‣ BI reporting, web analytics, … ‣ ~1 billion log events / day, DWH ~1 TB ‣ (thrift) log files ‣ Chains of MR jobs running on a Hadoop cluster (CDH): 20 nodes / 8 cores / 96 GB RAM

  5. QA problem: an exemplary workflow ‣ Log files (thrift) are processed by a chain of MR jobs (MR job 1, MR job 2, …) into further log files (thrift), intermediate results (avro) and finally the DWH (RDBMS) ‣ Open questions: How to create (sample) input data? How to control the workflow? How to inspect (binary) data formats?

  6. QA problem: existing approaches
      method           | tests what?                     | issues for our use case
      JUnit            | isolated functions              | no integration, Java syntax
      MRUnit           | 1 mapper + 1 reducer            | "little" integration, Java syntax
      iTest            | Hadoop jobs/workflows           | Java / Groovy syntax
      Scripts (manual) | scripting / inspection          | "script chaos", CLI syntax
      → FitNesse as a suitable addition / solution!

  7. The road to Big Data QA ‣ Big Data QA is different! ‣ the FitNesse approach ‣ test data definition / selection ‣ job & workflow control ‣ result inspection

  8. FitNesse: in a nutshell ‣ "Fully integrated standalone wiki and acceptance testing framework" ‣ "Executable" wiki pages (returning test results) ‣ (Almost) natural-language test specification ‣ Connection to the SUT via (Java) "fixtures"

  9. FitNesse: architecture overview ‣ Browser → FitNesse server → fixtures → system under test ‣ Wiki rows such as "| script |" followed by "| check | num results | 3 |" call fixture methods such as "public int numResults() { ... }" ‣ "Calling Java methods from the wiki", comparing return values ‣ Integrates with REST, Jenkins, …
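What "calling Java methods from the wiki" means for a check row can be sketched in a few lines of plain Java. This is a simplified illustration, not FitNesse's SLiM implementation: the wiki phrase "num results" is camel-cased into the method name numResults, the method is invoked via reflection, and its return value is compared with the expected cell as a string. The class names CheckRow and ResultCounter are hypothetical.

```java
import java.lang.reflect.Method;

// Simplified sketch (not SLiM itself) of a row like | check | num results | 3 |.
class CheckRow {
    // "num results" -> "numResults"
    static String toMethodName(String phrase) {
        String[] words = phrase.trim().split("\\s+");
        StringBuilder sb = new StringBuilder(words[0].toLowerCase());
        for (int i = 1; i < words.length; i++) {
            sb.append(Character.toUpperCase(words[i].charAt(0))).append(words[i].substring(1));
        }
        return sb.toString();
    }

    // Calls the no-arg fixture method and compares its result with the expected cell text.
    static boolean check(Object fixture, String phrase, String expected) throws Exception {
        Method m = fixture.getClass().getMethod(toMethodName(phrase));
        Object actual = m.invoke(fixture);
        return expected.equals(String.valueOf(actual));
    }
}

// Hypothetical fixture standing in for the one on the slide.
class ResultCounter {
    public int numResults() {
        return 3;
    }
}
```

In FitNesse itself the outcome colors the expected cell green or red; the sketch simply returns it as a boolean.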

  10. FitNesse: an exemplary test

  11. FitNesse: exemplary test source
      !path /home/inovex/lib/*.jar
      | script | Hadoop |
      | upload | viewLog.csv | to hdfs | /testdata/ |
      | hadoop job from jar | viewLog.jar | [...] |
      | show | job output |
      | check | number of output files | 3 |

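  12. FitNesse: Hadoop fixture Java code
      // Fixture behind "| script | Hadoop |"; each wiki row calls one method (bodies elided on the slide).
      public class Hadoop {
          public boolean uploadToHdfs(String localFile, String remoteFile) {...}                // | upload | ... | to hdfs | ... |
          public boolean hadoopJobFromJar(String jar, String input, String output) {...}       // | hadoop job from jar | ... |
          public String jobOutput() {...}                                                      // | show | job output |
          public String numberOfOutputFiles() {...}                                            // | check | number of output files | 3 |
      }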
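The mapping from the script table on slide 11 to the fixture methods follows SLiM's interleaving convention: keyword cells and argument cells alternate, so | upload | viewLog.csv | to hdfs | /testdata/ | becomes uploadToHdfs("viewLog.csv", "/testdata/"). The sketch below illustrates that rule with reflection. It is a simplification (String parameters only, no symbols or type conversion), and ScriptRow and RecordingHadoop are hypothetical names.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Simplified sketch, not SLiM's code: cells 0, 2, 4, ... are keywords forming the
// method name, cells 1, 3, 5, ... are its arguments (assumed to be Strings).
class ScriptRow {
    // ["upload", "viewLog.csv", "to hdfs", "/testdata/"] -> "uploadToHdfs"
    static String methodName(List<String> cells) {
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < cells.size(); i += 2) {
            for (String word : cells.get(i).trim().split("\\s+")) {
                if (word.isEmpty()) continue;
                char first = name.length() == 0
                        ? Character.toLowerCase(word.charAt(0))
                        : Character.toUpperCase(word.charAt(0));
                name.append(first).append(word.substring(1));
            }
        }
        return name.toString();
    }

    // ["upload", "viewLog.csv", "to hdfs", "/testdata/"] -> ["viewLog.csv", "/testdata/"]
    static List<String> arguments(List<String> cells) {
        List<String> args = new ArrayList<>();
        for (int i = 1; i < cells.size(); i += 2) args.add(cells.get(i));
        return args;
    }

    // Finds a public method with matching name and arity and invokes it.
    static Object invoke(Object fixture, List<String> cells) throws Exception {
        String name = methodName(cells);
        List<String> args = arguments(cells);
        for (Method m : fixture.getClass().getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == args.size()) {
                return m.invoke(fixture, args.toArray());
            }
        }
        throw new NoSuchMethodException(name + " taking " + args.size() + " argument(s)");
    }
}

// Hypothetical stand-in for the Hadoop fixture: records the call instead of touching HDFS.
class RecordingHadoop {
    String lastCall = "";

    public boolean uploadToHdfs(String localFile, String remoteFile) {
        lastCall = "upload " + localFile + " -> " + remoteFile;
        return true;
    }
}
```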

  13. The road to Big Data QA ‣ Big Data QA is different! ‣ FitNesse wiki test execution! ‣ test data definition / selection ‣ job & workflow control ‣ result inspection

  14. Test data: CSV

  15. Test data: Thrift ‣ Big Data: efficient data transfer among heterogeneous sources ‣ Define the interface via an IDL; compiler for many languages

  16. Test data: real-world data ‣ Dev/Test Hadoop cluster: identical hardware to Prod, but fewer nodes ‣ (Random/biased) sampling, e.g. on a daily basis ‣ Feedback loop: identify "special cases" in real data, include them in the (manual) data definition, gradually increase test coverage / artefact quality
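For the random case of daily sampling, one simple option (not necessarily the one used in the talk) is reservoir sampling: it draws a uniform sample of k lines from a log of unknown length in a single pass. The sketch assumes the log is available as plain text lines; LogSampler is a hypothetical name, and biased sampling would need a different selection rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;

// Reservoir sampling (Algorithm R): a uniform random sample of k lines from a
// stream of unknown length, in one pass and O(k) memory.
class LogSampler {
    static List<String> sample(Iterable<String> lines, int k, SplittableRandom rnd) {
        List<String> reservoir = new ArrayList<>(k);
        long seen = 0;
        for (String line : lines) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(line);                      // the first k lines fill the reservoir
            } else {
                long j = rnd.nextLong(seen);              // uniform in [0, seen)
                if (j < k) reservoir.set((int) j, line);  // keep this line with probability k/seen
            }
        }
        return reservoir;
    }
}
```

A daily file could be fed in via java.nio, e.g. `try (Stream<String> s = Files.lines(path)) { LogSampler.sample(s::iterator, 1000, new SplittableRandom()); }`; thrift-encoded logs would first have to be decoded into lines.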

  17. The road to Big Data QA ‣ Big Data QA is different! ‣ FitNesse wiki test execution! ‣ Define CSV / thrift / real-world test data! ‣ job & workflow control ‣ result inspection

  18. Job control: Swiss Army knife: Shell ‣ Execute arbitrary (shell) commands ‣ Mainly a wrapper around org.apache.commons.exec.CommandLine (Apache Commons Exec)
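The slide's Shell fixture wraps Apache Commons Exec. As a dependency-free sketch, the version below swaps in the JDK's own ProcessBuilder instead; the class name Shell and the methods execute, output and exitCode are hypothetical, and running commands through a POSIX `sh -c` is an assumption about the test machines.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Sketch of a "Swiss army knife" shell fixture using ProcessBuilder (not Commons Exec).
class Shell {
    private String lastOutput = "";
    private int lastExitCode = -1;

    // Wiki row: | execute | echo hello |  then  | check | output | hello |
    public boolean execute(String command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("sh", "-c", command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        try (InputStream in = p.getInputStream()) {
            // read all output before waiting, so a full pipe cannot block the process
            lastOutput = new String(readAll(in), StandardCharsets.UTF_8).trim();
        }
        lastExitCode = p.waitFor();
        return lastExitCode == 0;
    }

    public String output() {
        return lastOutput;
    }

    public int exitCode() {
        return lastExitCode;
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
        return buf.toByteArray();
    }
}
```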

  19. Job control: Hadoop fixture ‣ Hide complexity from test authors ‣ "Define" an appropriate test language via (Java) method names ‣ Re-use other fixtures (Shell, …) internally
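The idea of a Hadoop fixture that hides CLI details and re-uses a shell internally can be sketched as follows. CommandRunner and HadoopFixture are hypothetical names; the generated commands use the standard `hadoop fs -put` and `hadoop jar` forms, and the jar call assumes the jar's manifest names its main class. The runner is injected only so the sketch can be tested without a cluster.

```java
// Hypothetical interface for "something that runs a shell command",
// e.g. a Shell fixture; returns true on exit code 0.
interface CommandRunner {
    boolean run(String command);
}

// Sketch: test authors write | upload | ... | to hdfs | ... |, the fixture turns it
// into hadoop CLI calls and delegates to a shell. A real FitNesse fixture would create
// its shell in a no-arg constructor, since | script | Hadoop | passes no arguments.
// Arguments are not shell-escaped here.
class HadoopFixture {
    private final CommandRunner shell;

    HadoopFixture(CommandRunner shell) {
        this.shell = shell;
    }

    // | upload | viewLog.csv | to hdfs | /testdata/ |
    public boolean uploadToHdfs(String localFile, String remoteDir) {
        return shell.run("hadoop fs -put " + localFile + " " + remoteDir);
    }

    // Assumes the jar's manifest declares its main class.
    public boolean hadoopJobFromJar(String jar, String input, String output) {
        return shell.run("hadoop jar " + jar + " " + input + " " + output);
    }
}
```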

  20. Job control: workflows & suites ‣ FitNesse allows grouping tests into suites ‣ Can be used to simulate MR processing chains (MR job 1 → MR job 2) ‣ SuiteSetUp / SuiteTearDown pages for creating / destroying test conditions ‣ Tests can still be executed individually

  21. The road to Big Data QA ‣ Big Data QA is different! ‣ FitNesse wiki test execution! ‣ Define CSV / thrift / real-world data! ‣ Use suites & fixtures for jobs/workflows! ‣ result inspection

  22. Results: Data Warehouse / Hive ‣ Validate RDBMS contents (via JDBC), e.g. for checking the final result ‣ Or use Hive + Hive Server to query raw data
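Validating DWH or Hive contents via JDBC could look roughly like the fixture below, which runs a query and returns the first column of the first row as a string for a check row to compare. SqlQuery and query are hypothetical names; the JDBC URL is an assumption, and the matching driver jar (RDBMS or Hive) would have to be on the FitNesse classpath, e.g. via !path.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of a result-checking fixture for a wiki row such as
// | check | query | select count(*) from <table> | <expected> |
class SqlQuery {
    private final String jdbcUrl;

    SqlQuery(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl; // e.g. a Hive server or RDBMS URL (assumption)
    }

    // Returns the first column of the first row, or null if the result is empty.
    public String query(String sql) throws SQLException {
        try (Connection c = DriverManager.getConnection(jdbcUrl);
             Statement s = c.createStatement();
             ResultSet rs = s.executeQuery(sql)) {
            return rs.next() ? rs.getString(1) : null;
        }
    }
}
```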

  23. Results: Pig ‣ Execute arbitrary Pig commands from a wiki page ‣ Inspect e.g. binary intermediate results (Avro, …)

  24. Results: Pig fixture extends PigServer
      public class PigConsole extends PigServer {
          // Registers: <alias> = LOAD '<filename>' USING <AVRO_STORAGE_LOADER>;
          public void loadAvroFileUsingAlias(String filename, String alias) throws IOException {
              this.registerQuery(alias + " = LOAD '" + filename + "' USING " + AVRO_STORAGE_LOADER + ";");
          }
      }

  25. Results: server infrastructure ‣ FitNesse master holding test environments and test configurations per project (ProjA, ProjB) and stage (dev, qs, live) ‣ Import / edit configs and tests remotely ‣ FitNesse slaves for the Dev, QS and Live stages (e.g. ProjA)

  26. Thank you! dominik.benz@inovex.de ‣ Big Data QA is different! ‣ FitNesse wiki test execution! ‣ Define CSV / thrift / real-world data! ‣ Use suites & fixtures for jobs/workflows! ‣ Inspect results via Pig/Hive

  27. Want more? Inovex trains you! ‣ Android Developer Training (3 days, Karlsruhe/München) ‣ Certified Scrum Developer Training (5 days, Köln) ‣ Hadoop Developer Training (3 days, Karlsruhe/Köln) ‣ Liferay Portal-Developer Training (4 days, Karlsruhe) ‣ Liferay Portal-Admin Training (3 days, Karlsruhe) ‣ Pentaho Data Integration Training (4 days, München/Köln) ‣ Information and registration at www.inovex.de/offene-trainings

  28. Inovex @bbuzz: Stefan, Bernhard, Kathrin, Jörg, Andrew, Christian, Christian

  29. Backup

  30. FitNesse server infrastructure (same diagram as slide 25)

  31. Results: demo ‣ Download & install the FitNesse server ‣ Create a CSV log file ‣ Run a Hadoop job which counts viewed items ‣ Inspect the results with Hive
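The demo job "counts viewed items". Its core logic can be sketched in plain Java as a stand-in for the actual viewLog.jar, which the slides do not show: each CSV line in the demo format date,user,product,browser,os (comma separator assumed) contributes a 1 for its product, and the counts are summed per product. That is what a mapper emitting (product, 1) and a summing reducer would compute. ViewCount is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the demo's view-count job (not the real viewLog.jar).
class ViewCount {
    static Map<String, Integer> countViews(List<String> csvLines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by product for stable output
        for (String line : csvLines) {
            String[] f = line.split(",");
            if (f.length < 5 || f[0].equals("date")) continue; // skip header and malformed lines
            counts.merge(f[2], 1, Integer::sum);               // "map" to (product, 1), "reduce" by summing
        }
        return counts;
    }
}
```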


  33. FitNesse: exemplary test source
      !path /home/inovex/lib/*.jar
      | Table:Log File | /home/inovex/viewLog.csv |
      | date | user | product | browser | os |
      | 2013-03-12 | john | 1 | ff | win |

      | script | Hadoop |
      | upload | viewLog.csv | to hdfs | /testdata/ |
      | hadoop job from jar | viewLog.jar | [...] |
      | show | job output |
      | check | number of output files | 3 |
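The Table:Log File table on this slide needs a fixture that turns the wiki rows into the CSV test file. A possible sketch, assuming SLiM's Table-table convention: the class LogFile receives the file name from the first row as a constructor argument and all remaining rows (header included) via doTable, which here writes them comma-separated to that file. The method body is an assumption, as is returning empty result strings to leave the cells unmarked.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical fixture behind | Table:Log File | /home/inovex/viewLog.csv |
class LogFile {
    private final String fileName;

    public LogFile(String fileName) {
        this.fileName = fileName;
    }

    // Writes every row as one CSV line and returns a same-shaped result table.
    public List<List<String>> doTable(List<List<String>> table) {
        List<String> lines = new ArrayList<>();
        List<List<String>> results = new ArrayList<>();
        for (List<String> row : table) {
            lines.add(String.join(",", row));
            List<String> rowResult = new ArrayList<>();
            for (int i = 0; i < row.size(); i++) rowResult.add(""); // assumed: "" = cell left unmarked
            results.add(rowResult);
        }
        try {
            Files.write(Paths.get(fileName), lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }
}
```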

  34. FitNesse: an exemplary test
