Commit b3b041f0 authored by Nando Farchmin

Add README and gitignore

README.md 0 → 100644
Open Source Development
=======================
The eScience Center provides a good <a href="https://guide.esciencecenter.nl" target="_blank">guide</a> on several aspects required for scientific open-source projects.
Using this guide as a starting point, we discuss some of the issues necessary to provide an easy to use and maintainable software project at the PTB.
For a real-world example of the points discussed below, see the following links:
- **PyThia:** <a href="https://gitlab1.ptb.de/pythia/pythia" target="_blank">https://gitlab1.ptb.de/pythia/pythia</a>
- **PyThia-Doc:** <a href="https://readthedocs.org/projects/pythia-uq/" target="_blank">https://readthedocs.org/projects/pythia-uq/</a>
Table of Contents
-----------------
1. [Project Setup](#project_setup)
   1. [Version Control](#version_control)
   2. [README and other Information](#readme)
   3. [Code Release](#code_release)
2. [Coding (Python)](#coding_python)
   1. [Coding Environment](#coding_env)
   2. [Code Formatting](#code_formatting)
   3. [Get your Code to run](#run_code)
   4. [Code structure](#structure_code)
3. [Testing (Python)](#testing_python)
   1. [PyTest](#pytest)
   2. [CI / CD](#cicd)
4. [Documentation (Python)](#doc_python)
   1. [Doc-Strings](#docstrings)
   2. [Auto-Doc with Sphinx](#sphinx)
   3. [Host Documentation online](#readthedocs)
Project Setup <a name="project_setup"></a>
-------------
### Version Control <a name="version_control"></a>
Different Websites to host a `git` repository:
- **PTB-GitLab:** <a href="https://gitlab1.ptb.de/" target="_blank">https://gitlab1.ptb.de/</a>
- GitLab: <a href="https://gitlab.com" target="_blank">https://gitlab.com</a>
- GitHub (Microsoft): <a href="https://github.com" target="_blank">https://github.com</a>
- Bitbucket (Atlassian): <a href="https://bitbucket.org" target="_blank">https://bitbucket.org</a>
Especially if a project is developed or maintained by more than one person, it is useful to agree on a development strategy (branching model) to keep the history clean and readable.
If the project is not tied to a single person, or if multiple people are responsible to maintain the project, it might be useful to create a <a href="https://docs.gitlab.com/ee/user/group/" target="_blank">GitLab Group</a>.
This way the project itself is not tied to a specific user account and additional projects may be linked under the same group as well.
(See the <a href="https://gitlab1.ptb.de/pythia/" target="_blank">PyThia Group</a> as an example.)
To keep the Git history clean and understandable, it is best practice to follow a consistent commit style guide, such as the one discussed in <a href="http://who-t.blogspot.com/2009/12/on-commit-messages.html" target="_blank">this blog post</a>.
As an overview on the common git commit message style, <a href="https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html" target="_blank">this cheat sheet</a> might help a lot.
**General rules:**
- Each commit message consists of a **header**, a **body** and a **footer**.
- The **header** is mandatory and should not exceed 50 characters.
- The **body** and **footer** are optional. Referencing an issue in the footer with a closing keyword (e.g., `Closes #12`) closes the issue automatically once the commit reaches the default branch.
- No line of the commit message should be longer than 72 characters!
This allows the message to be easier to read on GitLab as well as in various git tools.
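The length rules above can be checked automatically. The following is a minimal sketch, not an official Git or GitLab tool; the function name `check_commit_message` is made up for this illustration and only the two length limits from the list above are tested.

```python
from typing import List


def check_commit_message(message: str) -> List[str]:
    """Return a list of rule violations for a commit message (empty if none)."""
    lines = message.splitlines()
    # the header (first line) is mandatory
    if not lines or not lines[0].strip():
        return ["header is missing"]
    problems = []
    # the header should not exceed 50 characters
    if len(lines[0]) > 50:
        problems.append(f"header has {len(lines[0])} characters (limit 50)")
    # no other line should exceed 72 characters
    for number, line in enumerate(lines[1:], start=2):
        if len(line) > 72:
            problems.append(f"line {number} has {len(line)} characters (limit 72)")
    return problems
```

Calling `check_commit_message("Add README and gitignore")` returns an empty list, as the header is present and short enough.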
Another important thing when developing code with multiple developers, maintainers and/or users is to choose (and specify) a branching model.
This way everyone working with the code, either through usage or contribution, knows how to obtain a stable version of the software, check out the latest development stages or add features in a reproducible way.
There exists a multitude of branching models, none of them being strictly better than the others.
For examples, see either the <a href="https://docs.github.com/en/get-started/quickstart/github-flow" target="_blank">GitHub flow</a> model or the <a href="https://gitlab1.ptb.de/pythia/pythia/-/blob/development/DEVELOPERS.md#git-workflow" target="_blank">branching model of PyThia</a>.
To keep only the relevant information in your repository, you should specify which files should not be tracked by `Git`.
To do so, simply add a `.gitignore` file to your repository root.
A `.gitignore` can look something like this:
```text
# general
__pycache__/
*.pyc
.ipynb_checkpoints/
# data and image storage directories
data/
img/
# configuration file
app/config/config.ini
```
**Important:**
Avoid `jupyter notebooks` for code that is kept under version control, as `Git` can only compute meaningful diffs for plain text files.
Similarly, don't add data files (`.npz`, `.hdf5`, etc.), documents (`.pdf`, `.docx`, etc.) or images (`.png`, `.jpeg`, `.svg`, etc.) to your Git repository.
### README and other Information <a name="readme"></a>
Each repository should include a `README.md` file, which is used automatically as the start page on GitLab and other hosting platforms.
The `README.md` should include information on the purpose of the repository, installation guides and other generally useful things.
GitLab in particular supports some special syntax to display source code, links and images (see <a href="https://docs.gitlab.com/ee/user/markdown.html" target="blank">GitLab Flavored Markdown</a>).
If you make your repository publicly available, you definitely should include a `LICENSE.txt` to specify under which conditions your code can and should be used by others.
GitLab, GitHub and Bitbucket even give you suggestions for different license files.
If many people are contributing to your repository, you should also think about including markdown files that explain how to raise issues and which conventions apply to the code style and to committing to the repository.
A code of conduct might also be a good idea if your project is larger and publicly visible.
Finally, if other people's work is based on your code, you should include a `CHANGELOG.md` file to keep every user up to date with the latest changes.
### Code Release <a name="code_release"></a>
Choose a way to determine different versions of your code, such as <a href="https://semver.org/" target="_blank">semantic versioning</a>.
This way users of your code know which version they are working with, which helps with the reproducibility of complex code projects later on.
Best practice is to use, e.g., <a href="https://docs.gitlab.com/ee/topics/git/tags.html" target="_blank">GitLab Tags</a> to distinguish different versions of the code in the Git history.
With this, it is very easy to check out a certain commit to install a specific version.
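To illustrate why a fixed `major.minor.patch` scheme is useful, here is a small sketch that parses version tags and compares them numerically. The helper names are made up for this example, and the sketch assumes plain tags such as `v2.0.1` without pre-release or build suffixes.

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse a tag such as 'v2.0.1' or '2.0.1' into (major, minor, patch)."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def is_breaking_upgrade(old: str, new: str) -> bool:
    """Return True if moving from `old` to `new` increases the major version."""
    return parse_version(new)[0] > parse_version(old)[0]


# example tags (made up for illustration)
tags = ["v1.2.17", "v2.0.1", "v1.10.0"]
# numeric ordering, in contrast to plain string sorting
ordered = sorted(tags, key=parse_version)
latest = max(tags, key=parse_version)
```

Sorting by the parsed tuple places `v1.10.0` after `v1.2.17`, which a plain string sort would get wrong.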
Coding (Python) <a name="coding_python"></a>
---------------
### Coding Environment <a name="coding_env"></a>
Choose a coding environment you are comfortable with.
Optimally, your environment supports you while coding, i.e., has features like
- code completion
- advanced file navigation (easy to jump to functions in other files)
- live code suggestions (e.g., to see function names and input types)
- linting (format code automatically, show you if inputs have wrong type)
This can look something like this (in neovim):
![auto_completion](./img/auto_completion.png)
Here are some suggestions for coding environments:
- IDEs:
  - <a href="https://code.visualstudio.com/" target="_blank">VS Code</a>
  - <a href="https://www.jetbrains.com/pycharm/" target="_blank">PyCharm</a>
- text editors:
  - <a href="https://www.vim.org/" target="_blank">vim</a>
  - <a href="https://neovim.io/" target="_blank">neovim</a>
  - <a href="https://www.gnu.org/software/emacs/" target="_blank">emacs</a>
**Tip:**
Try to learn at least the basic commands of vi/vim, as this editor is terminal based (no GUI required) and is preinstalled on most Linux distributions.
This lets you edit files, e.g., on a server via ssh.
Git also opens vi/vim as its default editor on many systems (e.g., for commit messages) unless another editor is configured.
### Code Formatting <a name="code_formatting"></a>
Use a consistent coding style.
Best practice is to adhere to industry standards such as the <a href="https://pep8.org/" target="_blank">PEP-8</a> code conventions.
The <a href="https://google.github.io/styleguide/pyguide.html" target="_blank">Google Python Style Guide</a> builds on the PEP-8 standard and gives a concise explanation of good practices as well.
Optimally, use some kind of auto-formatting tool to enforce PEP-8 format.
### Get your Code to run <a name="run_code"></a>
There are several ways to make your code available to other persons and devices.
Depending on the scope and goal of your project, some options may be more reasonable than others.
Here are some scenarios with suggestions:
###### 1. Writing scientific code:
If you simply write some scripts that rely on common packages or on packages developed by other groups, you should ensure that each required package (with its respective version) is available to users.
This allows others (or yourself) to easily set up all requirements to run your code.
The easiest way to do this is to use either a `requirements.txt` file for `pip` or, if you use Anaconda, an `environment.yml` file for `conda`.
These files contain the package (version) information and the environment can simply be installed using one command line, e.g.,
```sh
conda env create --file environment.yml
```
An example `environment.yml` file looks like this:
```yaml
name: my_env
channels:
  - conda-forge
  - defaults
dependencies:
  - ipython=8.4.*
  - matplotlib=3.5.*
  - pip=22.1.*
  - python=3.9.*
  - scipy=1.7.*
  - pip:
    - numpy==1.22.*
    - pylint==2.14.*
```
**Tip:**
Even though you can "freeze" your current environment to create a snapshot of every package currently installed, it is better to add the packages to the `environment.yml` file manually.
This way you can e.g., leave some specifics out (`numpy==1.22.*`) to get the latest bugfixes.
More importantly, you can specify the packages that are necessary to run your code and don't force others to install every package on your machine on theirs as well.
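When writing the environment file by hand, it helps to look up which versions of the required packages are currently installed. The following sketch uses the standard library module `importlib.metadata` (Python 3.8 or newer); the function name `installed_versions` and the package list are only examples.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version (None if not installed)."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# list only the packages your code actually needs (example names)
for name, version in installed_versions(["numpy", "scipy", "matplotlib"]).items():
    print(f"{name}: {version or 'not installed'}")
```

The printed versions can then be copied into the `environment.yml` file, loosened to e.g. `1.22.*` where appropriate.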
**Tip:**
If you use conda, switch to <a href="https://mamba.readthedocs.io/en/latest/user_guide/mamba.html" target="_blank">Mamba</a> as this uses the same syntax but is a lot faster.
You can install mamba by running `conda install mamba -n base -c conda-forge` in your base environment.
###### 2. Writing scientific code with own package:
Assume you have a repository with some utility functions and another one in which you implement some application.
You can (locally) import the utility package, but communicating this to others is difficult.
Moreover, if you need to change something in the utility repository while working on an application, you need to manage multiple git repositories.
For this case, you can use <a href="https://git-scm.com/book/en/v2/Git-Tools-Submodules" target="_blank">Git Submodules</a> to integrate a snapshot of the utility repository into your application repository.
Simply navigate to the submodule (sub-directory) and start editing/committing your files there.
It is basically one Git repository inside another Git repository.
**Tip:**
You should still use an environment file to track package versions of other packages.
And you should specify with which commit of the utility repository your application repository expects to work.
###### 3. Writing a public package:
If you are writing a library/package for others to use (like <a href="https://gitlab1.ptb.de/pythia/pythia" target="blank">PyThia</a>), you can use the `setuptools` package and a `setup.py` script to enable installation of your code via `pip`.
You can specify a version, description, author, copyright and required package versions this way, and installation becomes as easy as calling
```sh
pip install .
```
from the directory the `setup.py` script is located in.
This installs your package into the currently active environment, so that you can import the package from any location on your device.
An example setup script can look like this:
```python
import setuptools

# use the README as the long description of the package
with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="pythia-uq",
    version="2.0.0",
    author="Nando Farchmin",
    author_email="nando.farchmin@ptb.de",
    description=("Package for solving inverse problems and quantifying their "
                 + "uncertainties via general polynomial chaos."),
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://gitlab1.ptb.de/pythia/pythia",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "Operating System :: OS Independent",
    ],
    # packages that pip installs together with this package
    install_requires=[
        "numpy>=1.20.0",
        "scipy>=1.5.0",
        "psutil>=5.0",
        "sphinx-autodoc-typehints>=1.18.1",
    ],
)
```
**Tip:**
While developing the package, don't rely on a regular pip installation, as you would need to reinstall the package with pip every time you change anything.
For development, simply add the path to the repository to your local `PYTHONPATH`, or use an editable install (`pip install -e .`).
**Tip:**
If you want to increase accessibility even further, you can upload the package to <a href="https://pypi.org/" target="blank">PyPI.org</a>.
This way everyone can install your package over the internet using `pip install <package-name>`, without the need to clone the repository.
### Code structure <a name="structure_code"></a>
You should split the code doing the actual work from the main files/applications you need to run.
Here is an example for a reasonable code package structure:
![code_structure](./img/code_structure.png)
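As a minimal sketch of this separation (independent of the structure shown in the image), the function doing the actual work would live in the package, while a thin script only parses inputs and calls it. The file names `my_package/stats.py` and `run_mean.py` are hypothetical; both parts are shown in one block here for brevity.

```python
import argparse
from typing import List, Optional, Sequence


# --- library code, would live in e.g. a hypothetical my_package/stats.py ---
def mean(values: Sequence[float]) -> float:
    """Return the arithmetic mean of a non-empty sequence."""
    if not values:
        raise ValueError("mean of an empty sequence is undefined")
    return sum(values) / len(values)


# --- application code, would live in e.g. a hypothetical run_mean.py ---
def main(argv: Optional[List[str]] = None) -> float:
    """Parse numbers from the command line and print their mean."""
    parser = argparse.ArgumentParser(description="Print the mean of some numbers.")
    parser.add_argument("values", type=float, nargs="+")
    args = parser.parse_args(argv)
    result = mean(args.values)
    print(result)
    return result

# In a real script, main() would be called under an
# `if __name__ == "__main__":` guard at the end of the file.
```

Because `mean` does not know anything about the command line, it can be imported and tested on its own, e.g. `main(["1", "2", "3"])` simply forwards the parsed numbers to it.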
Testing (Python) <a name="testing_python"></a>
----------------
It is very important to ensure that your code is doing exactly what it should do, especially if you write scientific code that is very complex.
### PyTest <a name="pytest"></a>
A common way to verify the core functionality of your functions, classes and methods is to use `pytest` (or any other Python testing module).
The <a href="https://docs.pytest.org/en/7.1.x/" target="blank">pytest</a> website explains the basic workings very well.
**Tip:**
Using unit tests to check the functionality of your code can be used for development as well.
In `test-driven development`, you first specify in a test the things your functions or class should be able to do and then start to write the actual thing.
This way you can ensure that the code you write does exactly what you wanted from it in the beginning.
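A minimal sketch of what such tests can look like is given below. The function `relative_error` is a made-up example; in a real project the tests would live in a separate file such as `test_errors.py` (hypothetical name), which `pytest` discovers by the `test_` prefix. The last test uses a plain `try`/`except` to keep the block free of imports from `pytest`; with `pytest` available, `pytest.raises(ValueError)` is the more common way to express this.

```python
import math


def relative_error(approximation: float, reference: float) -> float:
    """Return |approximation - reference| / |reference|."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(approximation - reference) / abs(reference)


def test_relative_error_of_exact_value_is_zero():
    assert relative_error(2.0, 2.0) == 0.0


def test_relative_error_is_scale_independent():
    # scaling both arguments by the same factor must not change the result
    assert math.isclose(relative_error(1.1, 1.0), relative_error(110.0, 100.0))


def test_relative_error_rejects_zero_reference():
    try:
        relative_error(1.0, 0.0)
    except ValueError:
        return
    raise AssertionError("expected a ValueError")
```

In a test-driven workflow, the three test functions would be written first and `relative_error` would be implemented afterwards until all tests pass.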
### CI / CD <a name="cicd"></a>
Another step would be to include **CI/CD** (continuous integration and continuous delivery/deployment) in your repository.
Essentially, this specifies tasks that are run every time you push your changes to the repository server.
A typical application is running your unit tests to ensure that you did not change core functionality of your code after editing it, but you can do other things such as creating auto-doc or updating webpages as well.
The setup is very easy and basically built into GitLab, GitHub and Bitbucket.
You simply need to include a `.gitlab-ci.yml` file (for Gitlab) in your repository root directory which looks something like this
```yaml
stages:
  - build
  - test
  - deploy
before_script:
  - export HTTPS_PROXY="webproxy.bs.ptb.de:8080"
build-job:  # check if installation of pythia works
  stage: build
  image: python:3.8
  script:
    - pip install .
unit-test-job:  # run unit tests
  stage: test
  image: python:3.8
  script:
    - pip install pytest pytest-cov
    - pip install .
    - python -m pytest --cov-report=html --cov=pythia .
  artifacts:
    paths:
      - coverage
    expire_in: 30 days
deploy-job:
  stage: deploy
  script:
    - echo "Not implemented yet."
```
and you're done.
If multiple people work on the same repository (e.g., students or other group members), you can require that a merge request is only merged if the pipeline succeeds.
This way changes that break the tested functionality cannot reach the main branches unnoticed.
Documentation (Python) <a name="doc_python"></a>
----------------------
### Doc-Strings <a name="docstrings"></a>
Always use doc-strings to document functions, classes and modules.
To be able to generate a documentation of your code automatically, follow a doc-string standard such as the <a href="https://numpydoc.readthedocs.io/en/latest/format.html" target="_blank">Numpy Doc Style Guide</a>.
The Numpy style guide in particular is very suitable to document scientific code.
A practical example of the Numpy Doc style can be found <a href="https://numpydoc.readthedocs.io/en/latest/example.html#example" target="_blank">here</a>.
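For a quick impression, a short function documented in this style might look as follows. The function `weighted_mean` is a made-up example and not part of any package mentioned here.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Compute the weighted arithmetic mean.

    Parameters
    ----------
    values : sequence of float
        Data points to average.
    weights : sequence of float
        Weights, one per data point.

    Returns
    -------
    float
        Sum of ``values[i] * weights[i]`` divided by the sum of the weights.

    Raises
    ------
    ValueError
        If the inputs differ in length or the weights sum to zero.

    Examples
    --------
    >>> weighted_mean([1.0, 3.0], [1.0, 1.0])
    2.0
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight
```

The section headers (`Parameters`, `Returns`, `Raises`, `Examples`) are what documentation tools and many editors parse to display the information in a structured way.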
**Tip:**
Using docstrings also can help you directly through your IDE.
Here is an example:
![power of docstrings](./img/auto_completion.png)
### Auto-Doc with Sphinx <a name="sphinx"></a>
If you plan to write a software package that should be used by others or if you need to provide a documentation for your code as a deliverable for a project, you should think about creating the documentation automatically.
This way the documentation stays up to date with your code without additional manual work.
Setting up an auto-doc with, for example, <a href="https://www.sphinx-doc.org" target="blank">Sphinx</a> takes some initial configuration, but the basic workflow is simple.
If you want to get started, simply read the <a href="https://www.sphinx-doc.org/en/master/usage/quickstart.html" target="blank">Sphinx quickstart guide</a>.
This is a very nice feature, as you can not only produce a documentation based on your docstrings, but also include additional pages with tutorials or a description of the setup of your project.
For a demo of an auto-doc, you can checkout the <a href="https://pythia-uq.readthedocs.io/en/latest/" target="blank">PyThia documentation</a>.
Of course you can also look into the <a href="https://gitlab1.ptb.de/pythia/pythia/-/tree/development/docs/source" target="blank">doc-source files of PyThia</a>, as they are tracked in the repository as well.
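Sphinx is configured through a Python file called `conf.py`. The following is a minimal sketch of such a file, not the configuration used by PyThia; the project name, author and version are placeholders. It enables the built-in extensions `sphinx.ext.autodoc` (reads doc-strings) and `sphinx.ext.napoleon` (understands NumPy and Google style doc-strings).

```python
# docs/source/conf.py -- minimal sketch, all values below are placeholders
project = "my-package"   # hypothetical project name
author = "Jane Doe"      # hypothetical author
release = "0.1.0"        # hypothetical version string

extensions = [
    "sphinx.ext.autodoc",   # pull documentation from doc-strings
    "sphinx.ext.napoleon",  # parse NumPy and Google style doc-strings
]

html_theme = "alabaster"  # Sphinx's default HTML theme
```

Since `autodoc` imports your modules to read their doc-strings, the package has to be importable in the environment in which the documentation is built.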
### Host Documentation online <a name="readthedocs"></a>
Being able to create the documentation locally, either as html or pdf, is nice, but directly hosting the documentation online for everyone to access is cooler by far.
As long as you're writing open-source non-profit code (which you should in science!), <a href="https://readthedocs.org/" target="blank">readthedocs.org</a> can host your documentation online for free.
You can create an account there and simply link to your repository.
Read the Docs can even build several versions of the documentation from your Git repository: a "stable" and a "latest" version, for which you can choose the branches, as well as versions for individual tags (v1.2.17, v2.0.1) of your project.
This way, even if somebody uses an older version of your code, they still can access the correct documentation.
All you need to do from the git side is include a `.readthedocs.yml` file in your repository root that looks something like this:
```yaml
# Required
version: 2
# Set the version of Python and other tools you might need
build:
  os: ubuntu-20.04
  tools:
    python: "3.8"
# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/source/conf.py
# If using Sphinx, optionally build your docs in additional formats such as PDF
formats:
  - pdf
  - epub
# Optionally declare the Python requirements required to build your docs
python:
  install:
    - method: setuptools
      path: .
```
**Tip:**
Using Read the Docs with the PTB GitLab (`gitlab1.ptb.de`) is, as far as I know, not possible due to the firewall proxy settings.
A workaround is mirroring the repository on, e.g., GitLab.com.
Setting up the mirror unidirectionally allows GitLab.com to pull updates from the original repository on the PTB GitLab automatically, approximately every 30 minutes.
Then you can use this mirror repository to create the documentation on Read the Docs.
_It might be clever to disable CI/CD in the mirror repo, as CI/CD times are limited for free users._