Project templates
Creating Robust Python Workflows
Martin Skarzynski, Co-Chair, Foundation for Advanced Education in the Sciences (FAES)

Why use templates?
- Avoid repetitive tasks
- Standardize project structure
- Include configuration files:
  - Pytest (pytest.ini)
  - Sphinx (conf.py)
- Include a Makefile to automate further steps:
  - Build Sphinx documentation
  - Create virtual environments
  - Initialize git repositories
  - Deploy packages to PyPI

Not just for Python
- Flexible
- Edit files in template directory
- Local
- Remote

Cookiecutter prompts
cookiecutter() arguments:
- template: git repository URL or path

    from cookiecutter import main

    main.cookiecutter(TEMPLATE_REPO)

    project [PROJECT_NAME]: "My Project"
    Select license:
    1 - MIT
    2 - BSD
    3 - GPL3
    Choose from 1, 2, 3 (1, 2, 3) [1]:

Cookiecutter defaults
cookiecutter() arguments:
- template: git repository URL or path
- no_input: suppress prompts
  - Use cookiecutter.json defaults
  - Key-value pairs

    from cookiecutter import main

    main.cookiecutter(
        TEMPLATE_REPO_URL,
        no_input=True
    )

Override defaults
cookiecutter() arguments:
- template: git repository URL or path
- no_input: suppress prompts
  - Use cookiecutter.json defaults
  - Key-value pairs
- extra_context: override defaults

    from cookiecutter import main

    main.cookiecutter(
        TEMPLATE_REPO,
        no_input=True,
        extra_context={'KEY': 'VALUE'}
    )

    {
        "project": "Your project's name",
        "license": ["MIT", "BSD", "GPL3"]
    }

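The interplay between no_input, the cookiecutter.json defaults, and extra_context can be sketched in plain Python. This is a simplified illustration, not cookiecutter's own implementation: resolve_context is a hypothetical helper, and it assumes only two behaviors described in cookiecutter's documentation, namely that the first item of a list acts as the default choice and that extra_context values win over defaults.

```python
def resolve_context(defaults, extra_context=None):
    """Simplified model of choosing values when prompts are suppressed."""
    resolved = {}
    for key, value in defaults.items():
        # A list is a choice variable; its first item acts as the default.
        resolved[key] = value[0] if isinstance(value, list) else value
    # Values supplied by the caller take precedence over the defaults.
    resolved.update(extra_context or {})
    return resolved


# The same key-value pairs as the cookiecutter.json shown on the slide.
defaults = {"project": "Your project's name", "license": ["MIT", "BSD", "GPL3"]}

context = resolve_context(defaults, {"project": "My Project"})
print(context)
```

Here the project name comes from extra_context while the license falls back to the first listed choice.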
Access JSON files

    from json import load
    from pathlib import Path
    from requests import get

    # Remote JSON file to dictionary, then list its values
    get(JSON_URL).json().values()
    # List remote cookiecutter.json keys
    [*get(JSON_URL).json()]

    # Local JSON file to dictionary, then list its values
    load(Path(JSON_PATH).open()).values()
    # List local cookiecutter.json keys
    [*load(Path(JSON_PATH).open())]

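A self-contained variant of the local case may help when trying this out without a real template. The sketch below writes a stand-in cookiecutter.json to a temporary directory (the file contents are made up for illustration) and reads it back inside a with-statement so the file handle is closed afterwards.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for a template's cookiecutter.json file.
json_path = Path(tempfile.mkdtemp()) / "cookiecutter.json"
json_path.write_text(json.dumps({"project": "Project Name", "author": "Your name"}))

# The with-statement closes the file handle once the JSON is loaded.
with json_path.open() as json_file:
    defaults = json.load(json_file)

keys = [*defaults]              # unpacking a dictionary yields its keys
values = [*defaults.values()]   # the matching default values
print(keys)
print(values)
```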
Jinja2 template

    {
        "project": "Project Name",
        "author": "Your name (or your organization/company/team)",
        "repo": "{{ cookiecutter.project.lower().replace(' ', '_') }}"
    }

    project = "My Project"
    project.lower().replace(' ', '_')

    my_project

Cookiecutter example

    from cookiecutter.main import cookiecutter

    cookiecutter('https://github.com/marskar/cookiecutter',
                 no_input=True,
                 extra_context={'project': 'PROJECT_NAME',
                                'author': 'AUTHOR_NAME'})

    $ cookiecutter https://github.com/marskar/cookiecutter --no-input \
        project="PROJECT_NAME" author="AUTHOR_NAME" \
        user=USER_NAME description="DESCRIPTION"

Cookiecutter example

    from cookiecutter.main import cookiecutter

    cookiecutter('gh:marskar/cookiecutter',
                 no_input=True,
                 extra_context={'project': 'PROJECT_NAME',
                                'author': 'AUTHOR_NAME',
                                'user': 'USER_NAME',
                                'description': 'DESCRIPTION'})

    $ cookiecutter gh:marskar/cookiecutter --no-input \
        project="PROJECT_NAME" author="AUTHOR_NAME" \
        user=USER_NAME description="DESCRIPTION"

Project structure

    ├── docs
    │   ├── Makefile
    │   ├── _static
    │   ├── authors.rst
    │   ├── changelog.rst
    │   ├── conf.py
    │   ├── index.rst
    │   └── license.rst
    ├── requirements.txt
    ├── setup.cfg
    ├── setup.py
    ...
    ├── src
    │   └── template
    │       ├── __init__.py
    │       └── template.py
    └── tests
        ├── conftest.py
        └── test_template.py

    $ make html

Let's practice using project templates!

Executable projects

Run a module

    prj
    └── src
        └── pkg
            ├── __init__.py
            └── main.py

    def print_name_and_file():
        print('Name is', __name__,
              'and file is', __file__)

    if __name__ == "__main__":
        print_name_and_file()

    $ python -m prj.src.pkg.main
    Name is __main__ and file is /Users/USER/prj/src/pkg/main.py

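The difference between running and importing a module can be checked with a throwaway package. The sketch below assumes a flatter, hypothetical layout (pkg/main.py in a temporary directory) than the slide's prj/src/pkg, and uses subprocess to run the module with -m and then to import it.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Build a throwaway package in a temporary directory (hypothetical layout).
root = Path(tempfile.mkdtemp())
pkg = root / "pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "main.py").write_text(
    "def print_name_and_file():\n"
    "    print('Name is', __name__, 'and file is', __file__)\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    print_name_and_file()\n"
)

# Running with -m executes the module as a script, so __name__ is '__main__'.
run_result = subprocess.run(
    [sys.executable, "-m", "pkg.main"],
    cwd=root, capture_output=True, text=True, check=True,
)

# Importing the module gives it its dotted name; the guarded call is skipped.
import_result = subprocess.run(
    [sys.executable, "-c", "import pkg.main; print(pkg.main.__name__)"],
    cwd=root, capture_output=True, text=True, check=True,
)

print(run_result.stdout.strip())
print(import_result.stdout.strip())
```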
Top-level imports

    prj
    ├── __main__.py
    └── src
        └── pkg
            ├── __init__.py
            └── main.py

    # Import module into __main__.py
    from prj.src.pkg.main import print_name_and_file

    if __name__ == "__main__":
        print_name_and_file()

    $ python -m prj
    Name is prj.src.pkg.main and file is /Users/USER/prj/src/pkg/main.py

Import error

    prj
    ├── __main__.py
    └── src
        └── pkg
            ├── __init__.py
            └── main.py

    # Import module into __main__.py
    from src.pkg.main import print_name_and_file

    if __name__ == "__main__":
        print_name_and_file()

    $ python -m prj
    ...
    ModuleNotFoundError: No module named 'src'

Run zipped project

    prj
    ├── __main__.py
    └── src
        └── pkg
            ├── __init__.py
            └── main.py

    import zipapp

    zipapp.create_archive('prj')

    $ python -m zipapp prj
    $ python prj.pyz
    Name is src.pkg.main and file is prj.pyz/src/pkg/main.py

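The zip-and-run cycle can be reproduced end to end in a temporary directory. This is a minimal sketch with a one-line, made-up __main__.py rather than the slide's prj/src/pkg layout; it builds the archive with zipapp.create_archive and runs it with the current interpreter.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

# Throwaway project directory containing only a __main__.py entry point.
root = Path(tempfile.mkdtemp())
prj = root / "prj"
prj.mkdir()
(prj / "__main__.py").write_text("print('Hello from', __name__)\n")

# Zip the directory into an archive that Python can execute directly.
archive = root / "prj.pyz"
zipapp.create_archive(str(prj), target=str(archive))

# Running the archive executes its __main__.py.
result = subprocess.run(
    [sys.executable, str(archive)],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())
```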
Pass arguments to projects
1. Include a command-line interface (CLI) in __main__.py
2. Use zipapp to create zipped project
3. Pass shell arguments to project

    import sys

    if __name__ == "__main__":
        print(sys.argv)

    $ python -m zipapp prj
    $ python prj.pyz hello
    ['prj.pyz', 'hello']

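The three steps above can be sketched with a slightly richer CLI than printing sys.argv. The example below is an illustration only: the argparse interface (a positional name and a --shout flag) is invented for the demonstration, and the project lives in a temporary directory.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

# Step 1: a __main__.py with a small argparse CLI (hypothetical interface).
main_source = '''
import argparse

def main(argv=None):
    parser = argparse.ArgumentParser(prog="prj")
    parser.add_argument("name")
    parser.add_argument("--shout", action="store_true")
    args = parser.parse_args(argv)
    greeting = f"hello {args.name}"
    print(greeting.upper() if args.shout else greeting)

if __name__ == "__main__":
    main()
'''

root = Path(tempfile.mkdtemp())
prj = root / "prj"
prj.mkdir()
(prj / "__main__.py").write_text(main_source)

# Step 2: zip the project.
archive = root / "prj.pyz"
zipapp.create_archive(str(prj), target=str(archive))

# Step 3: pass shell arguments to the zipped project.
result = subprocess.run(
    [sys.executable, str(archive), "world", "--shout"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())
```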
Zipapp main argument

    import os
    import zipapp

    os.remove('prj/__main__.py')
    zipapp.create_archive('prj', main='src.pkg.main:print_name_and_file')

    $ rm prj/__main__.py
    $ python -m zipapp prj --main src.pkg.main:print_name_and_file

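What the main argument does can be inspected by opening the resulting archive. In the sketch below the package and function names (pkg.main and greet) are hypothetical; the project directory deliberately has no __main__.py, since zipapp generates one from the main argument.

```python
import subprocess
import sys
import tempfile
import zipapp
import zipfile
from pathlib import Path

# Throwaway project without a __main__.py (hypothetical names).
root = Path(tempfile.mkdtemp())
pkg = root / "prj" / "pkg"
pkg.mkdir(parents=True)
(pkg / "__init__.py").write_text("")
(pkg / "main.py").write_text(
    "def greet():\n"
    "    print('greet called from', __name__)\n"
)

# The main argument names the callable as 'package.module:function'.
archive = root / "prj.pyz"
zipapp.create_archive(str(root / "prj"), target=str(archive), main="pkg.main:greet")

# zipapp writes a small __main__.py into the archive that calls the function.
with zipfile.ZipFile(archive) as zipped:
    generated_main = zipped.read("__main__.py").decode()

result = subprocess.run(
    [sys.executable, str(archive)],
    capture_output=True, text=True, check=True,
)
print(generated_main)
print(result.stdout.strip())
```

Because the function is reached through an import, its module reports its dotted name rather than __main__.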
Zipapp set interpreter

    import zipapp

    zipapp.create_archive('prj', interpreter="/usr/bin/env python")

    $ python -m zipapp prj --python "/usr/bin/env python"
    $ ./prj.pyz
    Name is src.pkg.main and file is ./prj.pyz/src/pkg/main.py

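The interpreter setting ends up as a shebang line at the start of the archive, which can be read back with zipapp.get_interpreter. A minimal sketch, assuming a throwaway project in a temporary directory:

```python
import tempfile
import zipapp
from pathlib import Path

# Throwaway project with a trivial entry point.
root = Path(tempfile.mkdtemp())
prj = root / "prj"
prj.mkdir()
(prj / "__main__.py").write_text("print('hello')\n")

# Archive without an interpreter: no shebang line is written.
plain = root / "plain.pyz"
zipapp.create_archive(str(prj), target=str(plain))

# Archive with an interpreter: a shebang line is prepended to the zip data.
with_shebang = root / "prj.pyz"
zipapp.create_archive(str(prj), target=str(with_shebang),
                      interpreter="/usr/bin/env python")

plain_interpreter = zipapp.get_interpreter(str(plain))
shebang_interpreter = zipapp.get_interpreter(str(with_shebang))
print(plain_interpreter)
print(shebang_interpreter)
```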
Self-contained zipped projects

    import zipapp

    zipapp.create_archive('prj', interpreter="/usr/bin/env python")

    $ python -m pip install --requirement prj/requirements.txt --target prj
    $ python -m zipapp prj --python "/usr/bin/env python"
    $ ./prj.pyz
    Name is src.pkg.main and file is ./prj.pyz/src/pkg/main.py

Let's make an executable project!

Notebook pipelines

Jupyter nbconvert
- Can be used as a Python library, e.g. our nbconv() function
- Can execute notebooks

    $ jupyter nbconvert --execute --to notebook input.ipynb --output output.ipynb

- Cannot pass arguments to code in notebooks

Injected parameters

    $ papermill input.ipynb output.ipynb --parameters PARAMETER VALUE

Default parameters

    $ papermill input.ipynb output.ipynb --parameters alpha 0.2

Classic notebook interface

JupyterLab interface
Edit metadata (JupyterLab)

    {
        "tags": [
            "parameters"
        ]
    }

Jupyter nbformat
- nbformat.read(): read in a notebook
- Edit the first cell
  - Add a parameters tag to metadata
  - Add a default parameter to source
- nbformat.write(): overwrite the original

    import nbformat

    nb = nbformat.read('NOTEBOOK.ipynb', as_version=4)
    nb.cells[0].metadata = {'tags': ['parameters']}
    nb.cells[0].source = "alpha = 0.4"
    nbformat.write(nb, 'NOTEBOOK.ipynb')

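Since a notebook file is JSON, the same edit can be sketched with the standard library alone, which shows what changes on disk. This is not a replacement for nbformat (it skips validation); the notebook below is a minimal made-up example written to a temporary directory.

```python
import json
import tempfile
from pathlib import Path

# Minimal stand-in notebook with a single code cell (made up for illustration).
notebook_path = Path(tempfile.mkdtemp()) / "NOTEBOOK.ipynb"
notebook_path.write_text(json.dumps({
    "cells": [{
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": "alpha = 0.1",
    }],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}))

# Read in the notebook and edit the first cell.
nb = json.loads(notebook_path.read_text())
first_cell = nb["cells"][0]

# Add the parameters tag without discarding any other cell metadata.
tags = first_cell["metadata"].setdefault("tags", [])
if "parameters" not in tags:
    tags.append("parameters")

# Set the default parameter and overwrite the original file.
first_cell["source"] = "alpha = 0.4"
notebook_path.write_text(json.dumps(nb, indent=1))

saved = json.loads(notebook_path.read_text())
print(saved["cells"][0]["metadata"], saved["cells"][0]["source"])
```

Using setdefault keeps any metadata the cell already carries, whereas assigning a new dictionary to the cell metadata replaces it wholesale.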
Execute notebook

    $ papermill                          pm.execute_notebook()
    NOTEBOOK_PATH                        input_path: str
    OUTPUT_PATH                          output_path: str
    --cwd                                cwd: Any = None
    -p, --parameters                     parameters: Any = None
    -k, --kernel                         kernel_name: Any = None
    --report-mode / --not-report-mode    report_mode: Any = False

Parametrize
- Save parameter names and values as lists
- Create a dictionary of custom parameters
- Pass the dictionary to the execute_notebook() function as its parameters argument

    import papermill as pm

    names = ['alpha', 'ratio']
    values = [0.6, 0.4]
    param_dict = dict(zip(names, values))

    pm.execute_notebook(
        'IN.ipynb',
        'OUT.ipynb',
        kernel_name='python3',
        parameters=param_dict
    )

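The dict(zip(...)) idiom extends naturally to several runs. The sketch below builds one parameter dictionary per combination of candidate values; the value grid and the output file naming are assumptions for illustration, and the papermill call is left as a comment so the block runs without papermill installed.

```python
from itertools import product

names = ['alpha', 'ratio']
# Hypothetical candidate values for each parameter, in the same order as names.
value_grid = [[0.2, 0.6], [0.4]]

runs = []
for combo in product(*value_grid):
    # Pair each name with its value for this combination.
    params = dict(zip(names, combo))
    # Derive an output file name from the parameters (naming is an assumption).
    out_path = "OUT_" + "_".join(f"{k}{v}" for k, v in params.items()) + ".ipynb"
    runs.append((out_path, params))
    # With papermill installed, each run would be executed like this:
    # pm.execute_notebook('IN.ipynb', out_path, parameters=params)

for out_path, params in runs:
    print(out_path, params)
```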
Overwrite defaults

Notebook parameters

    # Parameters
    dataset_name = "diabetes"
    model_type = "ensemble"
    model_name = "RandomForestRegressor"
    hyperparameters = {"max_depth": 3,
                       "n_estimators": 100,
                       "random_state": 0}

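Why injected values overwrite the defaults can be modeled in a few lines: papermill adds a new cell, tagged injected-parameters, right after the cell tagged parameters, so the later assignments win when the notebook runs top to bottom. The sketch below is a toy model of that behavior, not papermill's code; cells are plain dictionaries and inject and run are hypothetical helpers.

```python
# Toy notebook: a defaults cell tagged 'parameters' and a cell that uses them.
cells = [
    {"tags": ["parameters"],
     "source": 'dataset_name = "diabetes"\nmodel_name = "RandomForestRegressor"'},
    {"tags": [],
     "source": 'summary = f"{model_name} on {dataset_name}"'},
]


def inject(cells, parameters):
    """Return a copy of cells with an injected cell after the parameters cell."""
    injected = {
        "tags": ["injected-parameters"],
        "source": "\n".join(f"{key} = {value!r}" for key, value in parameters.items()),
    }
    index = next(i for i, cell in enumerate(cells) if "parameters" in cell["tags"])
    return cells[: index + 1] + [injected] + cells[index + 1:]


def run(cells):
    """Execute the cell sources in order in one shared namespace."""
    namespace = {}
    for cell in cells:
        exec(cell["source"], namespace)
    return namespace


default_summary = run(cells)["summary"]
custom_summary = run(inject(cells, {"model_name": "Lasso"}))["summary"]
print(default_summary)
print(custom_summary)
```

Parameters that are not injected, such as dataset_name here, keep their default values.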