Python Introductory Course

Welcome to day 4:

  • Testing (briefly)
  • File-IO
  • package management (virtual environments, pip, anaconda)
  • complex data types III (Queues)
  • Numpy I

Presenter Notes

Testing Primer


Presenter Notes

Testing - Well ... Why?

  • points out errors and defects
  • increases reliability, reduces chance of failure
  • ensures quality over the lifecycle of the software
  • reduces maintenance costs (money & time)
  • eases implementation of new features
  • may increase performance

Presenter Notes

DocTests - Easiest way to test in Python

  • Combines documentation with test cases
  • At the same time acts as examples for usage
  • We may test stand-alone functions and class methods
  • Only basic testing, no preparation (setup) of test cases

Presenter Notes

Doc Blocks

def fun():
    """Just an example for a single line doc string."""

  • single- or multi-line docstrings
  • describe the function / method
  • must be kept in sync with code
  • should give a quick understanding of what the unit does (functionality, interface and return values)
  • help(fun) shows the docstring in the Python console
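As a small sketch of the points above: the function `rectangle_area` below is a hypothetical example (not part of the course material) with a multi-line docstring. Python stores the docstring in the `__doc__` attribute of the function, which is the text that `help` displays.

```python
import inspect


def rectangle_area(width, height):
    """Return the area of a rectangle.

    width and height are expected to be non-negative numbers.
    """
    return width * height


# the raw docstring is stored on the function object itself
raw_doc = rectangle_area.__doc__

# inspect.getdoc() returns the same text with the indentation cleaned up
clean_doc = inspect.getdoc(rectangle_area)
print(clean_doc)

# help(rectangle_area) shows this text together with the signature
```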

Presenter Notes

DocTests - Example

def check_input_type(value):
    """
    Checks the type that is passed and returns type name as string.

    # Example usages:

    >>> check_input_type("Hello World")
    'str'
    >>> check_input_type(1)
    'int'
    >>> check_input_type(0.0)
    'float'
    >>> check_input_type([])
    'something else'
    """
    # the parameter is called value, because input would hide the built-in input()
    if isinstance(value, int):
        return 'int'
    elif isinstance(value, float):
        return 'float'
    elif isinstance(value, str):
        return 'str'
    else:
        return 'something else'

Presenter Notes

DocTests - How to Run?

• In PyCharm: right-click the function with the doctest and click on "Run 'Doctest <function name>'"

• Or: add the following lines and run the whole module with all doctests

import doctest

# ...
# ... put your functions with doctests here
# ...

# this invokes the doctests of the current module
def test_it():
    doctest.testmod(verbose=True)

# this is a "main" function: it only runs when the module is executed directly
if __name__ == '__main__':
    test_it()

• Other ways to run → advanced course

Presenter Notes

DocTests

  • easy and fast way to test functions and methods
  • compares the actual output with the expected output
  • if a statement shows no output by itself, you have to print the value
  • caveats: element ordering (e.g. in sets), sometimes whitespace problems
  • you may skip an example by appending # doctest: +SKIP to its line
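The caveats above can be sketched in one docstring. The function `unique_letters` is a hypothetical example, not part of the course material. It sorts a set before comparing, prints a value after an assignment, and skips one example with `# doctest: +SKIP`.

```python
import doctest


def unique_letters(word):
    """Return the set of letters in word.

    Sets have no guaranteed display order, so sort before comparing:

    >>> sorted(unique_letters("hello"))
    ['e', 'h', 'l', 'o']

    An assignment shows nothing, so print the value that matters:

    >>> letters = unique_letters("aa")
    >>> print(len(letters))
    1

    This example is skipped and never executed:

    >>> unique_letters(42)  # doctest: +SKIP
    {'4', '2'}
    """
    return set(word)


# run all doctests of the current module and keep the summary
results = doctest.testmod()
print("failed examples:", results.failed)
```

Without the `+SKIP` directive the last example would fail, because an integer cannot be turned into a set of letters.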

Presenter Notes

Exercises : Day 4 - 1

Write a function that checks whether an expression of opening and closing parentheses, "(" and ")", is correct. A parenthesis can only be closed if one has been opened before, and every opened parenthesis must be closed. For example, "()()" and "((()())())" are correct, while "())(" is not. Test your function using a doctest.

Presenter Notes

File IO

  • probably more important than User IO
  • Python gives you many possibilities
  • simple line based
  • csv formatted
  • excel (xlsx) through pandas
  • hdf5 through h5py
  • binary e.g. through pickle
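As a minimal sketch of the binary option, the following round trip stores a Python object with `pickle` and reads it back. The dictionary `measurement` and the file name are made up for illustration; the file lives in a temporary directory that is removed afterwards.

```python
import os
import pickle
import tempfile

measurement = {"name": "run-1", "values": [1.5, 2.5, 4.0]}

# hypothetical file name inside a throw-away temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "measurement.pkl")

    # pickle works on bytes, so the binary modes 'wb' / 'rb' are required
    with open(path, "wb") as file_p:
        pickle.dump(measurement, file_p)

    with open(path, "rb") as file_p:
        restored = pickle.load(file_p)

# the restored object is an equal, but independent copy
print(restored == measurement)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.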

Presenter Notes

File IO - the basics, line based

  • open file: file_p = open(<FILENAME>, <MODE>), MODE: a (append), r (read), w (write)
  • close: file_p.close()
  • read a line: file_p.readline()
  • read as a whole: file_p.read()
  • write: file_p.write(some_data)
  • printing using print: print(exp, file=file_p) Note: uses the str representation of exp
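The calls listed above can be tried out in a short sketch. The file name `demo.txt` and its contents are assumptions for illustration; the file is created in a temporary directory and removed at the end.

```python
import os
import tempfile

# hypothetical file in a temporary directory, so nothing is left behind
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")

# 'w' creates the file (or empties an existing one)
file_p = open(path, "w")
file_p.write("first line\n")      # write() adds no newline by itself
print([1, 2, 3], file=file_p)     # writes str([1, 2, 3]) plus a newline
file_p.close()

# 'a' appends to the end of the existing file
file_p = open(path, "a")
file_p.write("last line\n")
file_p.close()

# 'r' reads; readline() returns one line including its '\n'
file_p = open(path, "r")
first = file_p.readline()
rest = file_p.read()              # everything after the first line
file_p.close()

# clean up the demonstration file
os.remove(path)
os.rmdir(tmp_dir)
```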

Presenter Notes

File IO - read and write pattern

# looping over all lines delimited by newline characters ('\n')
input_file = open('inFile.txt', 'r')
for line_str in input_file:
    # process line_str, the string of characters up to and including
    # the next '\n' in the file attached to input_file
    print(line_str.rstrip('\n'))
input_file.close()

# processing and writing to output file
output_file = open('outFile.txt', 'w')
for next_output in ['first result\n', 'second result\n']:  # placeholder data
    # write() does not append a newline by itself
    output_file.write(next_output)
output_file.close()
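A common variant of this pattern uses the `with` statement, which closes the files automatically when the block is left, even if an error occurs. The sketch below assumes small demonstration files in a temporary directory and simply converts every line to upper case.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    in_path = os.path.join(tmp_dir, "inFile.txt")
    out_path = os.path.join(tmp_dir, "outFile.txt")

    # prepare a small input file for the demonstration
    with open(in_path, "w") as input_file:
        input_file.write("alpha\nbeta\ngamma\n")

    # both files are closed when the block is left, even on errors
    with open(in_path, "r") as input_file, open(out_path, "w") as output_file:
        for line_str in input_file:
            output_file.write(line_str.upper())

    # read the result back to check it
    with open(out_path, "r") as output_file:
        result = output_file.read()

print(result)
```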

Presenter Notes

File IO - CSV

The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. CSV format was used for many years prior to attempts to describe the format in a standardized way in RFC 4180. The lack of a well-defined standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that it is possible to write a single module which can efficiently manipulate such data, hiding the details of reading and writing the data from the programmer.

-- https://docs.python.org/3/library/csv.html

Presenter Notes

File IO - CSV

  • Python csv module:
  • csv = "comma separated values"
  • csv allows other delimiters, too
  • strings that contain the delimiter are enclosed in a quote character
  • a wrong choice of delimiter or quote character leads to wrongly parsed data
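To illustrate the last two points, the sketch below parses the same text twice, once with matching and once with wrong settings. The sample data is made up, and `io.StringIO` stands in for a real file.

```python
import csv
import io

# in-memory text standing in for a file; ';' separates the columns
raw = 'name;comment\nAlice;"likes a;b"\n'

# matching settings: ';' as delimiter, '"' as quote character
good_rows = list(csv.reader(io.StringIO(raw), delimiter=';', quotechar='"'))

# wrong delimiter: no ',' is found, so every line ends up in one column
bad_rows = list(csv.reader(io.StringIO(raw), delimiter=',', quotechar='"'))

print(good_rows)
print(bad_rows)
```

With the matching settings the quoted field keeps its inner `;` and the quote characters are removed.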

Presenter Notes

CSV - Example

import csv

# newline='' is recommended by the csv documentation when opening csv files
with open('sample.csv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',', quotechar='"')
    for row in csv_reader:
        # rows are now lists with column elements (all of them strings)
        print(row)
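Writing works the same way with `csv.writer`. The following round trip is a sketch with made-up rows and a hypothetical file name in a temporary directory. It also shows that every value comes back as a string.

```python
import csv
import os
import tempfile

rows = [["name", "score"], ["Alice", 1.5], ["Bob, Jr.", 2]]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scores.csv")  # hypothetical file name

    # newline='' lets the csv module control the line endings itself
    with open(path, "w", newline="") as csv_file:
        csv.writer(csv_file).writerows(rows)

    with open(path, newline="") as csv_file:
        read_back = list(csv.reader(csv_file))

# the field containing a comma was quoted on writing and survives intact
print(read_back)
```

Numbers have to be converted explicitly after reading, for example with `float(row[1])`.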

Presenter Notes

Exercises : Day 4 - 2

  • Copy the example.csv file from the day-3/assets/ directory to your current directory
  • Read in the file using the csv module and calculate the per-column sum and mean values

Presenter Notes

Package Management

  • Python has a rich ecosystem with many packages
  • Python Package Index: https://pypi.org (almost half a million packages available)
  • packages are bundled modules, functions and classes and usually have dependencies
  • packages have version numbers (semantic versioning Major.Minor.Patch)
  • it is wise to use a separate, isolated environment for each project
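Why the three parts of a semantic version are compared as numbers can be shown in a few lines. The helper `parse_version` is a hypothetical illustration that only handles plain Major.Minor.Patch strings; real version strings may carry suffixes that it does not cover.

```python
def parse_version(text):
    """Split 'Major.Minor.Patch' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


old = "1.9.3"
new = "1.10.0"

# comparing the raw strings goes character by character and is misleading
string_says_newer = new > old

# comparing integer tuples follows the Major.Minor.Patch order
tuple_says_newer = parse_version(new) > parse_version(old)

print(string_says_newer, tuple_says_newer)
```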

Presenter Notes

Package Management - Environments

  • 1) virtual environment with pip:

    • command: pip or pip3
    • packages from pypi.org, local, git, urls, tarballs etc.
    • may compile packages from source during installation
  • 2) with Anaconda:

    • command: conda
    • also non-python dependencies
    • installs binaries
  • in both cases: Remember the PTB Web-Proxy (webproxy.berlin.ptb.de:8080)!

Presenter Notes

pip Example

# create a virtual environment
python -m venv my_venv

# activate the environment (Windows)
. my_venv/Scripts/Activate
# activate the environment (Linux)
. my_venv/bin/activate

# shell now shows the environment folder in prompt

# install a python package NOTE: proxy is required
pip install --proxy=webproxy.berlin.ptb.de:8080 numpy

# show what is installed
pip freeze
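Whether the interpreter runs inside such a virtual environment can also be checked from Python itself. This is a small sketch that applies to environments created with `venv`; it compares `sys.prefix` with `sys.base_prefix`.

```python
import sys

# inside a virtual environment sys.prefix points to the environment
# folder, while sys.base_prefix still points to the base installation
in_venv = sys.prefix != sys.base_prefix

print("interpreter:", sys.executable)
print("running inside a virtual environment:", in_venv)
```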

Presenter Notes

Anaconda Example:

# create the environment
conda create -n my_conda_env

# possibly list your environments and check
conda env list

# activate the environment
conda activate my_conda_env

# install some packages
conda install biopython

# Again: set the proxy (it is needed before packages can be downloaded)
conda config --set proxy_servers.http <PROXY>
conda config --set proxy_servers.https <PROXY>

Presenter Notes

Usage in PyCharm

  • You can use your environment in PyCharm (or create new ones)
  • [File] → [Settings] → [Project:] → [Python Interpreter]
  • click gear wheel to the right
  • choose an existing pip or conda environment
  • ... or create a new one
  • In each case you will see installed python packages of the environment
  • Proxy: [Settings] → [Appearance & Behavior] → [System Settings] → [HTTP Proxy]

Presenter Notes

Exercises : Day 4 - 3

  • Install numpy in your environment
  • Check the version of the package that has been installed
  • Have other packages been installed as dependencies?

Presenter Notes

More Data Types : Queues


Presenter Notes

Queues - Properties

  • head and tail (only points of interaction)
  • ... i.e. only there we can add / remove elements
  • can be double ended (deque)
  • LIFO (last in first out) aka stack
  • FIFO (first in first out) [the standard at the supermarket checkout]
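The difference between the two orders can be sketched with a plain Python list. The names are only sample data. Note that `pop(0)` on a list has to shift all remaining elements, so this is fine for a demonstration but slow for long queues.

```python
arrivals = ["Alice", "Bob", "Charly"]

# LIFO (stack): the element added last leaves first
stack = list(arrivals)
lifo_order = [stack.pop() for _ in range(len(stack))]

# FIFO (queue): the element added first leaves first
line = list(arrivals)
fifo_order = [line.pop(0) for _ in range(len(line))]

print(lifo_order)
print(fifo_order)
```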

Presenter Notes

Queues : Latest Re-Implementation


Presenter Notes

Python - collections.deque

  • from collections import deque
  • double ended queue
  • adding: append and appendleft
  • removing: pop and popleft
  • since Python 3.5 also methods for searching and inserting in the queue body (index, insert)
  • helpers: clear, copy, count, reverse, rotate
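A short sketch of the methods listed above, using made-up integer data:

```python
from collections import deque

d = deque([1, 2, 3])

d.append(4)          # add at the right end: 1, 2, 3, 4
d.appendleft(0)      # add at the left end:  0, 1, 2, 3, 4

right = d.pop()      # removes and returns the rightmost element
left = d.popleft()   # removes and returns the leftmost element

d.rotate(1)          # shift all elements one step to the right: 3, 1, 2
position = d.index(1)    # search in the queue body
d.insert(1, 99)          # insert in the queue body: 3, 99, 1, 2

print(d)
```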

Presenter Notes

Python - queue module

  • import queue
  • designed with multi-threading in mind
  • multi-producer, multi-consumer
  • single ended queue : queue.Queue
  • stack : queue.LifoQueue
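The two classes share the same `put` / `get` interface and differ only in the order in which items come back. A single-threaded sketch with sample items:

```python
import queue

fifo = queue.Queue()
lifo = queue.LifoQueue()

# fill both containers with the same items in the same order
for item in ("a", "b", "c"):
    fifo.put(item)
    lifo.put(item)

fifo_order = [fifo.get() for _ in range(3)]
lifo_order = [lifo.get() for _ in range(3)]

# get() would block on an empty queue; get_nowait() raises queue.Empty instead
try:
    fifo.get_nowait()
    was_empty = False
except queue.Empty:
    was_empty = True

print(fifo_order, lifo_order, was_empty)
```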

Presenter Notes

Python - queue example

import threading
import queue

q = queue.Queue()

def worker():
    while True:
        it = q.get()
        print(f'Working on {it}')  # real work here
        q.task_done()

# Turn on the worker thread.
threading.Thread(target=worker, daemon=True).start()

# Send thirty task requests to the worker.
for item in range(30):
    q.put(item)

# Block until all tasks are done.
q.join()
print('All work completed')

Presenter Notes

Exercises : Day 4 - 4

Simulate the following:

  • At the supermarket checkout a long queue has built up, consisting of the following people: "Alice", "Bob", "Charly", "Daniela", "Emile"
  • The cashier manages to serve "Alice" and "Bob", but in the meantime "Fabienne" and "Gustavo" have joined the queue
  • Now "Daniela" has lost patience and leaves the queue from the middle
  • Due to an outrageous pandemic, space between customers is needed. Do the same as above and make sure to insert a "shopping trolley" between every two persons. Think of how to implement this elegantly in a PandemicQueue class.

Presenter Notes

Numpy - Motivation

  • Python lists are flexible but SLOW
  • especially when dealing with lots of numbers
  • no direct math operations on lists
  • multi-dimensional access is tedious, e.g. my_3d_list[x][y][z] (no slicing across dimensions)
  • Numpy: np.ndarray (arbitrary dimension), usually created with np.array; np.matrix (strictly 2d) exists but is no longer recommended
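The points about math operations and slicing can be sketched as follows. The variable names and numbers are made up for illustration.

```python
import numpy as np

values = [1.0, 2.0, 3.0]

# '*' on a list repeats the list instead of doing arithmetic
repeated = values * 2

# with plain lists, arithmetic needs an explicit loop or comprehension
doubled_list = [2 * v for v in values]

# a numpy array multiplies element by element
doubled_arr = np.array(values) * 2

# a 3d array can be sliced across all dimensions at once
grid = np.arange(24).reshape(2, 3, 4)
plane = grid[:, 1, :]   # second row of every block

print(repeated, doubled_list, doubled_arr, plane.shape)
```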

Presenter Notes

Numpy Introduction

Check out the numpy quickstart

  • C-library of n-dimensional arrays of fixed size
  • arrays of same datatype aligned in memory → fast
  • features element-wise operations
  • minimal data copies ↔ works with views of same data
  • fast vectorized reduction operations (np.sum, np.mean)
  • math operations: np.exp, np.sin
  • zero-copy array slicing
  • basis for a lot of other Python packages (e.g. pandas, sklearn etc.)
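A brief sketch of three of the features named above: element-wise operations, reductions, and views that share memory with the original array. The array contents are sample data.

```python
import numpy as np

a = np.arange(6, dtype=np.float64)   # 0., 1., 2., 3., 4., 5.

# element-wise operation and vectorized reductions
squared = a * a
total = np.sum(a)
average = np.mean(a)

# slicing creates a view on the same memory, no data is copied
tail = a[3:]
shares = np.shares_memory(a, tail)
tail[0] = -1.0                       # also changes a[3]

# an explicit copy is independent of the original
independent = a[3:].copy()
independent[0] = 100.0               # a[3] stays unchanged

print(squared, total, average, shares, a)
```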

Presenter Notes

Numpy 1d Array Construction

# always needed
import numpy as np

# creation of a linear array from input list (or tuple)
arr = np.array([i for i in range(200)])

# creation of an array with 200 zeros (floats)
float_zeros = np.zeros(200)

# creation of array with 200 zeros (ints)
int_zeros = np.zeros(200, dtype=np.int32)

# creation of an array initialized to 1
int_ones = np.ones(200, dtype=np.int32)

# similar to python range: integers ascending from 0 to N-1
int_range = np.arange(10)

Presenter Notes

Numpy Data Types

  • can be set in constructor of arrays with dtype=
  • many data types available, short list: np.int32, np.int64, np.float32, np.float64
  • aliases for above data types 'i4', 'i8', 'f4', 'f8'
  • can convert the type with astype(new_type) method
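The relation between the type names, their string aliases and the memory they use can be checked directly. A small sketch with arbitrary example arrays:

```python
import numpy as np

# the string alias and the numpy type describe the same data type
same_type = np.dtype('i4') == np.int32

# 'f8' means a float with 8 bytes (64 bits) per element
x = np.zeros(3, dtype='f8')
x_bytes_per_element = x.itemsize

# astype() returns a new array with the converted type
small = x.astype(np.float32)
small_total_bytes = small.nbytes     # 3 elements * 4 bytes

print(same_type, x.dtype, x_bytes_per_element, small.dtype, small_total_bytes)
```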

Presenter Notes

Numpy Examples

# array of ones with dtype float32 ('f4')
a = np.ones(10, dtype='f4')

# conversion to int32
a_int = a.astype('i4')

# creation of a float64 array
b = np.array([1.234, 3.876, 3.54], dtype='f8')  # array([1.234, 3.876, 3.54 ])

# conversion to i4 - values are truncated (NOT rounded!)
b_int = b.astype('i4')  # array([1, 3, 3])

# rounding to nearest integer
b_int_rounded = b.round().astype('i4')  # array([1, 4, 4])

Presenter Notes

Numpy 1d Math

# addition (subtraction works the same way)
a = np.array([1., 2.])
b = np.array([0.5, 0.5])
a + b  # array([1.5, 2.5])

# multiplication (element-wise)
a * b  # array([0.5, 1. ])

# division (element-wise)
b / a  # array([0.5 , 0.25])

# multiplication with scalar
3.0 * a  # array([3., 6.])

Presenter Notes

Numpy : Array Access

# indexing as with lists
a = np.arange(10)
a[0]  # 0
a[1]  # 1

# slicing as with lists; the result is a view on the same data, not a copy
sub_a = a[2:4]
sub_a  # array([2, 3])
sub_a[0] = 5
a  # array([0, 1, 5, 3, 4, 5, 6, 7, 8, 9])

# slicing in strides of 2, syntax: [start:end:stride]
a[0::2]  # array([0, 5, 4, 6, 8])

Presenter Notes

Numpy - Math Functions

• Lots of math functions available

• Trigonometric, sums, exponential functions, etc.

• Check https://numpy.org/doc/stable/reference/routines.math.html

• All act on complete array:

np.exp(np.array([0.0, 1.0, 2.0]))  # array([1.        , 2.71828183, 7.3890561 ])
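A few more functions from that page, applied to whole arrays at once. The input values are chosen only for illustration.

```python
import numpy as np

# trigonometric function on every element
angles = np.array([0.0, np.pi / 2, np.pi])
sines = np.sin(angles)               # approximately 0, 1, 0

# square root on every element
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

# cumulative sum along the array
running = np.cumsum(np.array([1, 2, 3, 4]))

print(sines, roots, running)
```

Because of floating point rounding, `np.sin(np.pi)` is a tiny number close to zero rather than exactly zero.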

Presenter Notes

Numpy - Further Information

Presenter Notes