Beyond assertion: setup and teardown
UNIT TESTING FOR DATA SCIENCE IN PYTHON

Making the MagicMock() bug-free

Raw data:

    1,801   201,411
    1,767565,112
    2,002   333,209
    1990    782,911
    1,285   389129

    def row_to_list_bug_free(row):
        # Known-correct results for every row of the raw data file
        return_values = {
            "1,801\t201,411\n": ["1,801", "201,411"],
            "1,767565,112\n": None,
            "2,002\t333,209\n": ["2,002", "333,209"],
            "1990\t782,911\n": ["1990", "782,911"],
            "1,285\t389129\n": ["1,285", "389129"],
        }
        return return_values[row]

    def test_on_raw_data(raw_and_clean_data_file, mocker):
        raw_path, clean_path = raw_and_clean_data_file
        row_to_list_mock = mocker.patch(
            "data.preprocessing_helpers.row_to_list"
        )
        row_to_list_mock.side_effect = row_to_list_bug_free

Side effect

Raw data:

    1,801   201,411
    1,767565,112
    2,002   333,209
    1990    782,911
    1,285   389129

    def row_to_list_bug_free(row):
        # Known-correct results for every row of the raw data file
        return_values = {
            "1,801\t201,411\n": ["1,801", "201,411"],
            "1,767565,112\n": None,
            "2,002\t333,209\n": ["2,002", "333,209"],
            "1990\t782,911\n": ["1990", "782,911"],
            "1,285\t389129\n": ["1,285", "389129"],
        }
        return return_values[row]

    def test_on_raw_data(raw_and_clean_data_file, mocker):
        raw_path, clean_path = raw_and_clean_data_file
        # side_effect can also be passed directly to mocker.patch()
        row_to_list_mock = mocker.patch(
            "data.preprocessing_helpers.row_to_list",
            side_effect=row_to_list_bug_free
        )

Bug-free replacement of dependency

Data flow inside preprocess(): raw -> row_to_list(), replaced by row_to_list_mock (bug-free) -> convert_to_int() -> clean

    def test_on_raw_data(raw_and_clean_data_file, mocker):
        raw_path, clean_path = raw_and_clean_data_file
        row_to_list_mock = mocker.patch(
            "data.preprocessing_helpers.row_to_list",
            side_effect=row_to_list_bug_free
        )
        preprocess(raw_path, clean_path)
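The effect of side_effect can be seen outside pytest with a plain MagicMock from the standard library. This is a minimal sketch, not the course's test: it reuses three of the rows from the slides and calls the mock by hand instead of going through preprocess().

```python
from unittest.mock import MagicMock


def row_to_list_bug_free(row):
    # Hard-coded, known-correct results for a few raw rows from the slides
    return_values = {
        "1,801\t201,411\n": ["1,801", "201,411"],
        "1,767565,112\n": None,
        "2,002\t333,209\n": ["2,002", "333,209"],
    }
    return return_values[row]


# Without side_effect, calling a MagicMock just returns another MagicMock
row_to_list_mock = MagicMock()
default_result = row_to_list_mock("1,801\t201,411\n")

# With side_effect set, every call is delegated to the bug-free function
row_to_list_mock.side_effect = row_to_list_bug_free
good_row = row_to_list_mock("1,801\t201,411\n")
bad_row = row_to_list_mock("1,767565,112\n")
```

Here good_row is the parsed list and bad_row is None, which is what the real row_to_list() should return for those rows when it has no bugs.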

Checking the arguments

The call_args_list attribute returns a list of the arguments that the mock was called with.

    row_to_list_mock.call_args_list

    [call("1,801\t201,411\n"),
     call("1,767565,112\n"),
     call("2,002\t333,209\n"),
     call("1990\t782,911\n"),
     call("1,285\t389129\n")
     ]

    from unittest.mock import call

    def test_on_raw_data(raw_and_clean_data_file, mocker):
        raw_path, clean_path = raw_and_clean_data_file
        row_to_list_mock = mocker.patch(
            "data.preprocessing_helpers.row_to_list",
            side_effect=row_to_list_bug_free
        )
        preprocess(raw_path, clean_path)
        # The mock must have been called once per raw row, in file order
        assert row_to_list_mock.call_args_list == [
            call("1,801\t201,411\n"),
            call("1,767565,112\n"),
            call("2,002\t333,209\n"),
            call("1990\t782,911\n"),
            call("1,285\t389129\n")
        ]
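The comparison against call(...) objects can be tried with the standard library alone. In this sketch, preprocess_lines() is a hypothetical stand-in for preprocess() that calls the mocked dependency once per line; it is not the course's function.

```python
from unittest.mock import MagicMock, call

row_to_list_mock = MagicMock(return_value=None)


def preprocess_lines(lines):
    # Hypothetical stand-in for preprocess(): one dependency call per line
    for line in lines:
        row_to_list_mock(line)


preprocess_lines(["1,801\t201,411\n", "1,767565,112\n"])

# call_args_list records every call, in the order the calls happened
recorded_calls = row_to_list_mock.call_args_list
matches = recorded_calls == [
    call("1,801\t201,411\n"),
    call("1,767565,112\n"),
]
```

Because the list is ordered, the same assertion would fail if the rows were processed in a different order or if a row were skipped.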

Dependency buggy, function bug-free, test still passes!

    pytest -k "TestRowToList"

    =========================== test session starts ============================
    collected 21 items / 14 deselected / 7 selected

    data/test_preprocessing_helpers.py .....FF                           [100%]

    ================================= FAILURES =================================
    _________________ TestRowToList.test_on_normal_argument_1 __________________
    ...
    _________________ TestRowToList.test_on_normal_argument_2 __________________
    ...
    ============ 2 failed, 5 passed, 14 deselected in 0.70 seconds =============

Dependency buggy, function bug-free, test still passes!

    pytest -k "TestPreprocess"

    =========================== test session starts ============================
    collected 21 items / 20 deselected / 1 selected

    data/test_preprocessing_helpers.py .                                 [100%]

    ================= 1 passed, 20 deselected in 0.63 seconds ==================

Let's practice mocking!

Testing models
Dibya Chakravorty, Test Automation Engineer

Functions we have tested so far

    preprocess()
    get_data_as_numpy_array()
    split_into_training_and_testing_sets()

Raw data to clean data

    from data.preprocessing_helpers import preprocess
    from features.as_numpy import get_data_as_numpy_array
    from models.train import (
        split_into_training_and_testing_sets
    )

    preprocess("data/raw/housing_data.txt",
               "data/clean/clean_housing_data.txt"
               )

Directory layout after the call (clean_housing_data.txt is created by preprocess()):

    data
    |-- raw
    |   |-- housing_data.txt
    |-- clean
    |   |-- clean_housing_data.txt
    src
    tests

data/raw/housing_data.txt

    2,081   314,942
    1,059   186,606
            293,410   <-- row with missing area
    1,148   206,186
    ...

data/clean/clean_housing_data.txt

    2081    314942
    1059    186606
    1148    206186
    ...

Clean data to NumPy array

    from data.preprocessing_helpers import preprocess
    from features.as_numpy import get_data_as_numpy_array
    from models.train import (
        split_into_training_and_testing_sets
    )

    preprocess("data/raw/housing_data.txt",
               "data/clean/clean_housing_data.txt"
               )
    data = get_data_as_numpy_array(
        "data/clean/clean_housing_data.txt", 2
    )

Result:

    get_data_as_numpy_array(
        "data/clean/clean_housing_data.txt", 2
    )

    array([[  2081., 314942.],
           [  1059., 186606.],
           [  1148., 206186.]
           ...
           ]
          )

Splitting into training and testing sets

    from data.preprocessing_helpers import preprocess
    from features.as_numpy import get_data_as_numpy_array
    from models.train import (
        split_into_training_and_testing_sets
    )

    preprocess("data/raw/housing_data.txt",
               "data/clean/clean_housing_data.txt"
               )
    data = get_data_as_numpy_array(
        "data/clean/clean_housing_data.txt", 2
    )
    training_set, testing_set = (
        split_into_training_and_testing_sets(data)
    )

Result:

    split_into_training_and_testing_sets(data)

    (array([[1148, 206186],   # Training set (3/4)
            [2081, 314942],
            ...
            ]
           ),
     array([[1059, 186606]    # Testing set (1/4)
            ...
            ]
           )
     )

Functions are well tested - thanks to you!

The linear regression model

    from scipy.stats import linregress

    def train_model(training_set):
        # Column 0 holds the housing areas, column 1 the prices
        slope, intercept, _, _, _ = linregress(training_set[:, 0], training_set[:, 1])
        return slope, intercept

Return values difficult to compute manually

Cannot test train_model() without knowing expected return values.

True for all data science models

Trick 1: Use dataset where return value is known

    import pytest
    import numpy as np
    from models.train import train_model

    def test_on_linear_data():
        # Points lie exactly on the line y = 2x + 1
        test_argument = np.array([[1.0, 3.0],
                                  [2.0, 5.0],
                                  [3.0, 7.0]
                                  ]
                                 )
        expected_slope = 2.0
        expected_intercept = 1.0
        slope, intercept = train_model(test_argument)
        assert slope == pytest.approx(expected_slope)
        assert intercept == pytest.approx(
            expected_intercept
        )
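Why the expected values are 2.0 and 1.0 can be checked by hand with the closed-form least-squares formulas. The sketch below uses a hypothetical fit_line() helper written in plain Python as a stand-in for train_model(); the course's function uses scipy's linregress instead.

```python
import math


def fit_line(points):
    """Hypothetical stand-in for train_model(): closed-form least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    # The best fit line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The same perfectly linear data as in the test: y = 2x + 1
slope, intercept = fit_line([(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)])
```

For this data mean_x is 2 and mean_y is 5, so sxy is 4 and sxx is 2, giving a slope of 2 and an intercept of 5 - 2 * 2 = 1.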

Trick 2: Use inequalities

    import numpy as np
    from models.train import train_model

    def test_on_positively_correlated_data():
        test_argument = np.array([[1.0, 4.0], [2.0, 4.0],
                                  [3.0, 9.0], [4.0, 10.0],
                                  [5.0, 7.0], [6.0, 13.0],
                                  ]
                                 )
        slope, intercept = train_model(test_argument)
        # Exact slope is unknown, but it must be positive for this data
        assert slope > 0

Recommendations

Do not leave models untested just because they are complex.
Perform as many sanity checks as possible.

Using the model

    from data.preprocessing_helpers import preprocess
    from features.as_numpy import get_data_as_numpy_array
    from models.train import (
        split_into_training_and_testing_sets, train_model
    )

    preprocess("data/raw/housing_data.txt",
               "data/clean/clean_housing_data.txt"
               )
    data = get_data_as_numpy_array(
        "data/clean/clean_housing_data.txt", 2
    )
    training_set, testing_set = (
        split_into_training_and_testing_sets(data)
    )
    slope, intercept = train_model(training_set)

Result:

    train_model(training_set)

    151.78430060614986 17140.77537937442

Testing model performance

    def model_test(testing_set, slope, intercept):
        """Return r^2 of fit"""

Returns a quantity r^2.
Indicates how well the model performs on unseen data.
Usually, 0 ≤ r^2 ≤ 1.
r^2 = 1 indicates perfect fit.
r^2 = 0 indicates no fit.
Complicated to compute r^2 manually.
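The slides do not show how model_test() computes r^2. A minimal sketch, assuming the usual coefficient of determination 1 - SS_res / SS_tot, shows where the two reference values 1 and 0 come from; r_squared() is a hypothetical helper, not the course's implementation.

```python
import math


def r_squared(points, slope, intercept):
    """Hypothetical helper: r^2 = 1 - SS_res / SS_tot for a fitted line."""
    mean_y = sum(y for _, y in points) / len(points)
    # Residual sum of squares: distance of the data from the fitted line
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    # Total sum of squares: distance of the data from its own mean
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot


linear_data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# The true line y = 2x + 1 leaves no residuals
perfect_fit = r_squared(linear_data, 2.0, 1.0)

# A flat line through the mean of y explains none of the variation
no_fit = r_squared(linear_data, 0.0, 5.0)
```

Under this definition a line that fits worse than the flat mean line gives a negative value, which is consistent with the slide saying r^2 is only "usually" between 0 and 1. These two cases are natural sanity tests for model_test().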

Let's practice writing sanity tests!

Testing plots

Pizza without cheese!

This lesson: testing matplotlib visualizations

The plotting function

    data/
    src/
    |-- data/
    |-- features/
    |-- models/
    |-- visualization
    |   |-- __init__.py
    |   |-- plots.py
    tests/

plots.py

    def get_plot_for_best_fit_line(slope,
                                   intercept,
                                   x_array,
                                   y_array,
                                   title
                                   ):
        """
        slope: slope of best fit line
        intercept: intercept of best fit line
        x_array: array containing housing areas
        y_array: array containing housing prices
        title: title of the plot

        Returns: matplotlib.figure.Figure()
        """

Training plot

    ...
    from visualization import get_plot_for_best_fit_line

    preprocess(...)
    data = get_data_as_numpy_array(...)
    training_set, testing_set = (
        split_into_training_and_testing_sets(data)
    )
    slope, intercept = train_model(training_set)
    get_plot_for_best_fit_line(slope, intercept,
        training_set[:, 0], training_set[:, 1],
        "Training"
    )

Testing plot

    # Same fitted line, drawn against the testing set
    get_plot_for_best_fit_line(slope, intercept,
        testing_set[:, 0], testing_set[:, 1], "Testing"
    )

Don't test properties individually

matplotlib.figure.Figure()
    Axes: configuration, style
    Data: style
    Annotations: style
    ...

Testing strategy for plots

One-time baseline generation
    1. Decide on test arguments.
    2. Call plotting function on test arguments.
    3. Convert Figure() to PNG image.
    4. Image looks OK?
       No: fix plotting function and repeat from step 2.
       Yes: store image as baseline image.

Testing
    1. Call plotting function on test arguments.
    2. Convert Figure() to PNG image.
    3. Compare with the baseline image.

pytest-mpl

Knows how to ignore OS related differences.
Makes it easy to generate baseline images.

    pip install pytest-mpl

An example test

    import pytest
    import numpy as np
    from visualization import get_plot_for_best_fit_line

    @pytest.mark.mpl_image_compare   # Under the hood baseline generation and comparison
    def test_plot_for_linear_data():
        slope = 2.0
        intercept = 1.0
        x_array = np.array([1.0, 2.0, 3.0])   # Linear data set
        y_array = np.array([3.0, 5.0, 7.0])
        title = "Test plot for linear data"
        # The test returns the figure instead of asserting on it
        return get_plot_for_best_fit_line(slope, intercept, x_array, y_array, title)

Run the test

    !pytest -k "test_plot_for_linear_data" --mpl

    ======================= test session starts =======================
    ...
    collected 24 items / 23 deselected / 1 selected

    visualization/test_plots.py .                               [100%]

    ============= 1 passed, 23 deselected in 0.68 seconds =============

Reading failure reports

    !pytest -k "test_plot_for_linear_data" --mpl

    ============================ FAILURES =============================
    _______ TestGetPlotForBestFitLine.test_plot_for_linear_data _______
    Error: Image files did not match.
      RMS Value: 11.191347848524174
      Expected:
        /tmp/tmplcbtsb10/baseline-test_plot_for_linear_data.png
      Actual:
        /tmp/tmplcbtsb10/test_plot_for_linear_data.png
      Difference:
        /tmp/tmplcbtsb10/test_plot_for_linear_data-failed-diff.png
      Tolerance:
        2
    ============= 1 failed, 36 deselected in 1.13 seconds =============
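The report compares an RMS value against a tolerance. A rough sketch of the idea, assuming the value is a root-mean-square of pixel-by-pixel differences between the baseline and the actual image, follows. The rms_difference() helper and the tiny four-pixel "images" are hypothetical, and the exact computation inside pytest-mpl may differ.

```python
import math


def rms_difference(expected_pixels, actual_pixels):
    """Hypothetical helper: root-mean-square of pixel differences."""
    squared = [(e - a) ** 2 for e, a in zip(expected_pixels, actual_pixels)]
    return math.sqrt(sum(squared) / len(squared))


# Tiny made-up grayscale "images", flattened to lists of pixel values
baseline = [0, 0, 255, 255]

identical = rms_difference(baseline, baseline)
shifted = rms_difference(baseline, [0, 10, 255, 245])

# Tolerance value taken from the failure report above
tolerance = 2
test_would_fail = shifted > tolerance
```

An identical image gives an RMS of 0, while the altered one gives the square root of 50, roughly 7.07, which exceeds the tolerance and would be reported as a mismatch.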

Yummy!

Let's test plots!

Congratulations

You've written so many tests

You learned a lot

Testing saves time and effort.

pytest
    Testing return values and exceptions.
    Running tests and reading the test result report.

Best practices
    Well tested function using normal, special and bad arguments.
    TDD, where tests get written before implementation.
    Test organization and management.

Advanced skills
    Setup and teardown with fixtures, mocking.
    Sanity tests for data science models.
    Plot testing.
