Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development
Dr. Dominik Benz, Inovex GmbH
2013/06/03, Berlin Buzzwords
Who speaks the Elephant language?
‣ "Class A extends Mapper…"
‣ "TDD!"
‣ "ROI, $$, …"
‣ "apt-get install…"
‣ Write/execute tests, specify acceptance criteria, …
The road to Big Data QA
‣ the FitNesse approach
‣ our Big Data QA problem
‣ test data definition / selection
‣ result inspection
‣ job & workflow control
QA problem: Web Intelligence @ 1&1
‣ BI reporting, web analytics, …
‣ ~ 1 billion log events / day, DWH ~ 1 TB
‣ (thrift) logfiles
‣ chains of MR jobs, running on Hadoop
‣ Cluster: 20 nodes / 8 cores / 96 GB RAM (CDH)
QA problem: An exemplary workflow
[Diagram: log files (thrift) → MR job 1 → intermediate result (avro) → MR job 2 → … → DWH (RDBMS). Open questions: how to create (sample) input data, how to control the workflow, how to inspect (binary) formats?]
QA problem: Existing Approaches
| method | tests what? | issues for our use case |
| JUnit | isolated functions | no integration, Java syntax |
| MRUnit | 1 mapper + 1 reducer | "little" integration, Java syntax |
| iTest | hadoop jobs/workflows | Java / Groovy syntax |
| Scripts/CLI | (manual) scripting/inspect. | "script chaos", syntax |
FitNesse as suitable addition / solution!
The road to Big Data QA
‣ the FitNesse approach
‣ Big Data QA is different!
‣ test data definition / selection
‣ result inspection
‣ job & workflow control
FitNesse: In a nutshell
‣ "fully integrated standalone wiki and acceptance testing framework"
‣ "executable" wiki pages (returning test results)
‣ (almost) natural language test specification
‣ connection to SUT via (Java) "Fixtures"
FitNesse: Architecture Overview
[Diagram: Browser ↔ FitNesse Server ↔ Fixtures ↔ System under Test]
‣ wiki side: | script | ... | check | num results | 3 |
‣ fixture side: public int numResults() { ... }
‣ "calling java methods from wiki", compare return values
‣ Integrates with REST, Jenkins…
FitNesse: An Exemplary Test
FitNesse: Exemplary Test Source
!path /home/inovex/lib/*.jar
| script | Hadoop |
| upload | viewLog.csv | to hdfs | /testdata/ |
| hadoop job from jar | viewLog.jar | [...] |
| show | job output |
| check | number of output files | 3 |
FitNesse: Hadoop Fixture Java Code
public class Hadoop {
    public boolean uploadToHdfs(String localFile, String remoteFile) {...}
    public boolean hadoopJobFromJar(String jar, String input, String output) {...}
    public String jobOutput() {...}
    public String numberOfOutputFiles() {...}
}
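The wiki rows of the exemplary test and the method names of the Hadoop fixture line up through a naming convention: in a script table row, cells alternate between fragments of the method name and arguments, so "| upload | viewLog.csv | to hdfs | /testdata/ |" ends up as uploadToHdfs("viewLog.csv", "/testdata/"). The following is a minimal sketch of that mapping, not FitNesse's actual implementation. ScriptRowMapper and ToyHadoopFixture are hypothetical names, and the toy fixture only records calls instead of talking to HDFS.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: maps the cells of one script table row to a Java call.
class ScriptRowMapper {
    // Cells at positions 0, 2, 4, ... are joined into a camelCase method name.
    static String methodName(List<String> cells) {
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < cells.size(); i += 2) {
            for (String word : cells.get(i).trim().split("\\s+")) {
                if (word.isEmpty()) continue;
                char first = word.charAt(0);
                name.append(name.length() == 0
                        ? Character.toLowerCase(first)
                        : Character.toUpperCase(first));
                name.append(word.substring(1));
            }
        }
        return name.toString();
    }

    // Cells at positions 1, 3, 5, ... are the (String) arguments.
    static List<String> arguments(List<String> cells) {
        List<String> args = new ArrayList<>();
        for (int i = 1; i < cells.size(); i += 2) {
            args.add(cells.get(i).trim());
        }
        return args;
    }

    // Looks up the public method by name and String parameter count, then calls it.
    static Object invoke(Object fixture, List<String> cells) throws Exception {
        List<String> args = arguments(cells);
        Class<?>[] types = new Class<?>[args.size()];
        Arrays.fill(types, String.class);
        Method method = fixture.getClass().getMethod(methodName(cells), types);
        return method.invoke(fixture, args.toArray());
    }

    public static void main(String[] unused) throws Exception {
        List<String> row = Arrays.asList("upload", "viewLog.csv", "to hdfs", "/testdata/");
        ToyHadoopFixture fixture = new ToyHadoopFixture();
        System.out.println(methodName(row) + arguments(row) + " -> " + invoke(fixture, row));
    }
}

// Toy stand-in for the Hadoop fixture (assumption: it records calls only).
class ToyHadoopFixture {
    final List<String> calls = new ArrayList<>();

    public boolean uploadToHdfs(String localFile, String remoteDir) {
        calls.add("upload " + localFile + " -> " + remoteDir);
        return true;
    }
}
```

Because the method names double as the test vocabulary, renaming a fixture method changes the wording test authors have to use in the wiki.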
The road to Big Data QA
‣ FitNesse Wiki test execution!
‣ Big Data QA is different!
‣ test data definition / selection
‣ result inspection
‣ job & workflow control
Test Data: CSV
Test Data: Thrift
‣ Big Data: efficient data transfer among heterogeneous sources
‣ Define the interface via an IDL; compilers exist for many languages
Test Data: Real World Data
‣ Dev/Test Hadoop cluster: same hardware as prod, but fewer nodes
‣ (random/biased) sampling, e.g. on a daily basis
‣ Feedback loop:
  ‣ identify "special cases" from real data
  ‣ include them in the (manual) data definition
‣ Gradually increase test coverage / artefact quality
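One way to draw a uniform random sample from a day's log events without holding them all in memory is reservoir sampling. The sketch below is an illustration only; the slides do not say which sampling method was used. LogSampler is a hypothetical name, and biased sampling would need a different scheme.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical helper: uniform sample of k events from a stream of unknown length.
class LogSampler {
    static <T> List<T> sample(Iterator<T> events, int k, Random rng) {
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (events.hasNext()) {
            T event = events.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill phase: the first k events are always kept.
                reservoir.add(event);
            } else {
                // Replace phase: event number `seen` survives with probability k / seen.
                long slot = (long) (rng.nextDouble() * seen); // uniform in [0, seen)
                if (slot < k) {
                    reservoir.set((int) slot, event);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("event-a", "event-b", "event-c", "event-d", "event-e");
        // A fixed seed keeps the sample reproducible between test runs.
        System.out.println(sample(lines.iterator(), 2, new Random(42)));
    }
}
```

A fixed seed is a deliberate choice here: a test data set that changes on every run makes failing tests hard to reproduce.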
The road to Big Data QA
‣ FitNesse Wiki test execution!
‣ Big Data QA is different!
‣ Define CSV / thrift / real-world test data!
‣ result inspection
‣ job & workflow control
Job Control: Swiss Army Knife: Shell
‣ Execute arbitrary (shell) commands
‣ Mainly a wrapper around org.apache.commons.exec.CommandLine
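A shell fixture of this kind can be quite small. The sketch below swaps in the JDK's ProcessBuilder for Apache Commons Exec so that it runs without extra libraries; it is not the fixture from the talk. The class and method names are hypothetical, and it assumes a POSIX "sh" on the PATH.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical shell fixture: runs a command and keeps its output and exit code.
class Shell {
    private String lastOutput = "";
    private int lastExitCode = -1;

    // Returns true if the command exited with status 0.
    public boolean execute(String command) {
        lastOutput = "";
        lastExitCode = -1;
        try {
            Process process = new ProcessBuilder("sh", "-c", command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            byte[] raw = process.getInputStream().readAllBytes();
            lastOutput = new String(raw, StandardCharsets.UTF_8).trim();
            lastExitCode = process.waitFor();
        } catch (IOException e) {
            lastOutput = e.getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            lastOutput = "interrupted";
        }
        return lastExitCode == 0;
    }

    public String output() {
        return lastOutput;
    }

    public int exitCode() {
        return lastExitCode;
    }

    public static void main(String[] args) {
        Shell shell = new Shell();
        boolean ok = shell.execute("echo hello");
        System.out.println(ok + " / " + shell.output() + " / " + shell.exitCode());
    }
}
```

From a wiki page, such a fixture could be driven with rows like "| execute | ls /testdata |" followed by "| check | exit code | 0 |", under the same naming convention as the Hadoop fixture.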
Job Control: Hadoop Fixture
‣ Hide complexity from test authors
‣ "define" an appropriate test language via (Java) method names
‣ re-use other fixtures (Shell, …) internally
Job Control: Workflows & Suites
‣ FitNesse allows grouping tests into suites
‣ Can be used to simulate MR processing chains (MR job 1 → MR job 2)
‣ SetupSuite / TearDownSuite (FitNesse's SuiteSetUp / SuiteTearDown pages) for creating / destroying test conditions
‣ Tests can still be executed individually
The road to Big Data QA
‣ FitNesse Wiki test execution!
‣ Big Data QA is different!
‣ Define CSV / thrift / real-world data!
‣ Use suites & fixtures for jobs/workflows!
‣ result inspection
Results: Data Warehouse / Hive
‣ Validate RDBMS contents (via JDBC)
‣ E.g. for checking the final result
‣ Or use Hive + Hive-Server to query raw data
Results: Pig
‣ Execute arbitrary pig commands from a Wiki page
‣ Inspect e.g. binary intermediate results (avro, …)
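Results: Pig Fixture extends PigServer
import java.io.IOException;
import org.apache.pig.PigServer;

public class PigConsole extends PigServer {
    // The constructor and the AVRO_STORAGE_LOADER constant are not shown on the slide.

    // Registers "<alias> = LOAD '<filename>' USING <loader>;" with Pig.
    public void loadAvroFileUsingAlias(String filename, String alias) throws IOException {
        this.registerQuery(
            alias + " = LOAD '" + filename + "' USING " + AVRO_STORAGE_LOADER + ";");
    }
}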
Results: Server Infrastructure
[Diagram: a FitNesse master holds TestEnvironments and TestConfigurations per project (ProjA, ProjB) and stage (dev, qs, live). Config and tests are imported / edited remotely on FitNesse slaves, one per project and stage (e.g. ProjA Dev, ProjA QS, ProjA Live).]
Thank you!
dominik.benz@inovex.de
‣ FitNesse Wiki test execution!
‣ Big Data QA is different!
‣ Define CSV / thrift / real-world data!
‣ Use suites & fixtures for jobs/workflows!
‣ Inspect results via Pig/Hive!
Inovex @bbuzz: Stefan, Bernhard, Kathrin, Jörg, Andrew, Christian, Christian
BACKUP
Results: Demo
‣ Download & install FitNesse server
‣ Create csv log file
‣ Run hadoop job which counts viewed items
‣ Inspect results with Hive
FitNesse: Exemplary Test Source
!path /home/inovex/lib/*.jar
| Table:Log File |
| /home/inovex/viewLog.csv |
| date | user | product | browser | os |
| 2013-03-12 | john | 1 | ff | win |
| script | Hadoop |
| upload | viewLog.csv | to hdfs | /testdata/ |
| hadoop job from jar | viewLog.jar | [...] |
| show | job output |
| check | number of output files | 3 |
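The "Log File" table above defines CSV test data directly in the wiki. The core of a fixture behind such a table is turning the table rows into CSV lines and writing them to the target path. The sketch below shows only that step, under stated assumptions: LogFileTable is a hypothetical name, the wiring into FitNesse (for instance a table fixture method that receives the rows) is left out, and cells are assumed to contain no commas or quotes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns wiki table rows (header + data) into a CSV file.
class LogFileTable {
    // One CSV line per table row; cells are trimmed and joined with commas.
    static List<String> toCsvLines(List<List<String>> rows) {
        List<String> lines = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> cells = new ArrayList<>();
            for (String cell : row) {
                cells.add(cell.trim());
            }
            lines.add(String.join(",", cells));
        }
        return lines;
    }

    // Writes the rows to the given file, replacing any previous content.
    static void write(Path target, List<List<String>> rows) throws IOException {
        Files.write(target, toCsvLines(rows), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("date", "user", "product", "browser", "os"),
                Arrays.asList("2013-03-12", "john", "1", "ff", "win"));
        Path target = Files.createTempFile("viewLog", ".csv");
        write(target, rows);
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
    }
}
```

Keeping the data in the wiki page puts the input next to the expectations, so a reader sees what goes in and what should come out on one page.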