Winter Semester 2022/2023

Summary

This class is a first introduction to Python programming for Data Science.

Software that needs to be installed on your computer

  • Anaconda Python Distribution
  • git version control. It is recommended that you install git via Anaconda. As soon as you have installed Anaconda, open the Anaconda console and type conda install -c anaconda git. (If you use the default installer for git on Windows, you’ll end up with two different command consoles that use different shells and have different path settings. This would make the workflow described below less coherent.)
  • Alternatively, if you run Linux, you may use your distribution packages for all software. A preliminary list of the relevant software is git, spyder, pytest, sympy, and scipy. These packages should pull in all necessary dependencies. Check your distribution for the actual package names.

Homework submission process

  1. The URL to the assignment will be published to the class.

  2. If you follow the assignment URL, GitHub will ask you to accept the assignment. It will also ask you to associate your user name on GitHub with your real name. Once you accept the assignment invitation, GitHub creates a repository for you and gives you the URL to your own repository.

  3. You can follow your repository URL to view the assignment. However, to do actual work, you will need to clone the repository to your own computer.

  4. Open an Anaconda console (or, with an alternative installation, any console that has git in its path settings). Navigate, if necessary, to an appropriate folder using cd foldername, and type, inserting the URL given by GitHub:

    git clone https://github.com/KU-MIDS-MO/abcd

  5. Open Spyder, navigate to the new folder created in the previous step, and do your work.

  6. To run the tests locally on your machine, go back to the console, change into the repository folder, and type

    pytest

  7. When all tests pass locally (or you run out of time), type

    git commit -am "Assignment is done"
    git push

  8. GitHub will then rerun all tests and assign 5 points for each unit test that passes without error. (As of now, there is no way to assign partial credit, so requests for partial credit will not be considered.)
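
For orientation, a unit test of the kind pytest collects might look like the sketch below. The function is_leap_year and both test names are hypothetical and are not taken from any actual assignment. pytest picks up functions whose names start with test_ in files named test_*.py and reports every failing assert.

```python
# Hypothetical example; not part of any actual assignment.

def is_leap_year(year):
    """Return True if `year` is a leap year in the Gregorian calendar."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# pytest collects functions whose names start with "test_".
def test_ordinary_years():
    assert is_leap_year(2024)
    assert not is_leap_year(2023)


def test_century_years():
    assert is_leap_year(2000)
    assert not is_leap_year(1900)
```

A test passes when the function returns without raising an exception, so each test function corresponds to one gradable unit.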

Main Textbooks

  • M.T. Goodrich, R. Tamassia, and M.H. Goldwasser, Data Structures and Algorithms in Python, Wiley, 2013 (“GTG”)
  • R. Johansson, Numerical Python: Scientific Computing and Data Science Applications with Numpy, SciPy and Matplotlib, Apress, 2019 (“Joh”)

Supplementary Reading

  • J.V. Guttag, Introduction to Computation and Programming Using Python, third edition, The MIT Press, 2021

  • C. Hill, Learning scientific programming with Python, second edition, Cambridge University Press, 2020

Grading

  • The grade is determined from your solutions to the weekly assignments (“portfolio grade”). Submissions must be made through GitHub Classroom as described above.

Topics

Oct 17, 2022

Introduction

Oct 18/19, 2022

Variables, statements, conditionals, loops, functions; integers and floats; equality vs. identity; selected operators, in particular division, floored integer division, and modulo (selected topics from GTG Sections 1.2-1.5)
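
As a brief illustration of the operators and of equality vs. identity named above, the following sketch uses only built-in Python behavior; the variable names are arbitrary.

```python
# True division always returns a float.
print(7 / 2)      # 3.5

# Floored integer division rounds toward negative infinity.
print(7 // 2)     # 3
print(-7 // 2)    # -4

# Modulo takes the sign of the divisor, so a == (a // b) * b + a % b.
print(7 % 2)      # 1
print(-7 % 2)     # 1

# Equality (==) compares values; identity (is) compares objects.
a = [1, 2, 3]
b = [1, 2, 3]     # a different list object with the same contents
c = a             # a second name for the same list object
print(a == b)     # True
print(a is b)     # False
print(a is c)     # True
```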

Oct 25, 2022

Euclidean algorithm (see notes); floating point and machine epsilon
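
A minimal sketch of both topics, which may differ in detail from the version in the class notes. The function names gcd and machine_epsilon are chosen here for illustration; the comparison value comes from sys.float_info in the standard library.

```python
import sys


def gcd(a, b):
    """Greatest common divisor of two non-negative integers (Euclidean algorithm)."""
    while b != 0:
        # gcd(a, b) equals gcd(b, a mod b); repeat until the remainder is zero.
        a, b = b, a % b
    return a


def machine_epsilon():
    """Smallest power of two eps with 1.0 + eps != 1.0 in float arithmetic."""
    eps = 1.0
    while 1.0 + eps / 2 != 1.0:
        eps /= 2
    return eps


print(gcd(1071, 462))                               # 21
print(machine_epsilon() == sys.float_info.epsilon)  # True
```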

Oct 26/27, 2022

Discussion of Assignment 1, Hints for Assignment 2

Nov 2/3, 2022

Lists, tuples, strings, dictionaries (GTG pp. 9-11, 14-16)

Nov 8, 2022

Discussion of Assignment 2; Iterators and Generators (GTG 1.8), scope, functions as first-class objects (GTG 1.10), simple Python exceptions (parts of GTG 1.7)
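
The three language features listed above can be sketched in a few lines. All function names are chosen for illustration only; islice comes from the standard library module itertools.

```python
from itertools import islice


def fibonacci():
    """Generator yielding the Fibonacci numbers 0, 1, 1, 2, 3, ... without end."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def apply_twice(f, x):
    """Functions are first-class objects: f is passed in like any other value."""
    return f(f(x))


def safe_divide(a, b):
    """Return a / b, or None if b is zero (simple exception handling)."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


# A generator is consumed lazily; islice takes only the first eight values.
print(list(islice(fibonacci(), 8)))       # [0, 1, 1, 2, 3, 5, 8, 13]
print(apply_twice(lambda x: x * x, 3))    # 81
print(safe_divide(1, 0))                  # None
```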

Nov 9/10, 2022

Hints for Assignment 4

Nov 15, 2022

Discussion of Assignment 3; Goals of object-oriented programming (GTG 2.1, also see 2.2 for background reading)

Nov 16/17, 2022

Class definitions (GTG 2.3)

Nov 22, 2022

Inheritance (GTG 2.4)
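
A small sketch of a class definition and of inheritance. The classes Shape, Rectangle, and Square are hypothetical and are not taken from GTG; they only show a base class, method overriding, and the use of super().

```python
class Shape:
    """Base class: stores a name and defines the common interface."""

    def __init__(self, name):
        self.name = name

    def area(self):
        # Subclasses are expected to override this method.
        raise NotImplementedError

    def describe(self):
        # Calls whichever area() the actual object provides.
        return f"{self.name} with area {self.area():.2f}"


class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")   # run the base class initializer
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    """A square is a rectangle with equal sides; it inherits area()."""

    def __init__(self, side):
        super().__init__(side, side)
        self.name = "square"


print(Rectangle(2, 3).describe())   # rectangle with area 6.00
print(Square(2).describe())         # square with area 4.00
```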

Nov 23/24, 2022

Discussion of Assignment 4, hints for Assignments 5+6

Nov 29, 2022

Discussion of Assignment 5, file handling, pickling, also see this chapter
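
A minimal sketch of writing and reading a pickle file with the standard library. The file name data.pickle and the dictionary contents are made up for illustration; a temporary folder is used so that nothing is left behind.

```python
import os
import pickle
import tempfile

data = {"name": "sample", "values": [1, 2.5, 3], "nested": (True, None)}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.pickle")

    # Pickle files are binary, hence the modes "wb" and "rb".
    with open(path, "wb") as f:
        pickle.dump(data, f)

    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == data)   # True: same contents
print(restored is data)   # False: a new object was created on loading
```

Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.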

Nov 30/Dec 1, 2022

No class

Dec 6, 2022

Discussion of Assignment 6, Hints for Assignment 7

Dec 7/8, 2022

Introduction to numpy: array indexing, fancy indexing (see notes on Numpy indexing by Andrei Caragea; for a broader overview, see this Introduction to the Scipy Stack)
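
The following sketch contrasts basic indexing with fancy (integer array and boolean) indexing on a small example array. It is an illustration only and does not reproduce the linked notes.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
# a is
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Basic indexing and slicing.
print(a[1, 2])            # 6
print(a[:, 1])            # [1 5 9]

# Fancy indexing with integer lists: select rows 0 and 2.
print(a[[0, 2]])

# Two index lists are paired up: elements (0, 1) and (2, 3).
print(a[[0, 2], [1, 3]])  # [ 1 11]

# Boolean mask: all even entries, returned as a flat array.
print(a[a % 2 == 0])      # [ 0  2  4  6  8 10]
```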

Dec 13, 2022

Introduction to plotting using matplotlib

Dec 14/15, 2022

Least squares fitting and simple root solving (see notes by Andrei Caragea)
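
A sketch of both topics under stated assumptions: the data points are synthetic, the straight-line model is chosen for illustration, and bisection stands in for "simple root solving" (the class notes may use a different method). The function bisect is defined here and is not a library routine.

```python
import numpy as np

# Synthetic data for illustration: points near the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Least squares: minimize ||A @ c - y|| for the model y = c[0] * x + c[1].
A = np.column_stack([x, np.ones_like(x)])
coeffs, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coeffs


def bisect(f, a, b, tol=1e-12):
    """Find a root of f in [a, b] by bisection; f(a), f(b) must differ in sign."""
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fa * fm <= 0:
            b = m            # the sign change lies in [a, m]
        else:
            a, fa = m, fm    # the sign change lies in [m, b]
    return (a + b) / 2


root = bisect(lambda t: t * t - 2, 0.0, 2.0)
print(slope, intercept)
print(root)   # close to the square root of 2
```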

Dec 20, 2022

Random numbers

Dec 21/22, 2022

Applications: Random walk and simple Monte Carlo (see notes by Andrei Caragea)
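
A minimal sketch of the two applications, assuming a symmetric one-dimensional walk and the classic estimate of pi as the Monte Carlo example; the linked notes may use other examples. The sample sizes and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded so that runs are reproducible

# Simple random walk: each step is -1 or +1 with equal probability.
n_walks, n_steps = 10_000, 100
steps = rng.choice([-1, 1], size=(n_walks, n_steps))
positions = steps.cumsum(axis=1)               # position after each step
mean_square = np.mean(positions[:, -1] ** 2)   # theory: n_steps on average

# Monte Carlo estimate of pi: fraction of uniform points in the unit square
# that fall inside the quarter disk of radius 1, multiplied by 4.
n_points = 100_000
points = rng.random((n_points, 2))
inside = np.sum(np.sum(points ** 2, axis=1) <= 1.0)
pi_estimate = 4 * inside / n_points

print(mean_square)   # close to n_steps
print(pi_estimate)   # close to pi
```

Both results fluctuate from seed to seed; the statistical error shrinks like one over the square root of the number of samples.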

Jan 10, 2023

Matrix arithmetic and an application to graph adjacency matrices
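
One standard application, sketched here on a made-up example graph: entry (i, j) of the k-th power of an adjacency matrix counts the walks of length k from node i to node j. The triangle graph below is chosen for illustration only.

```python
import numpy as np

# Adjacency matrix of an undirected triangle on the nodes 0, 1, 2.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])

# Note: @ is the matrix product, whereas * multiplies entry by entry.
A2 = A @ A
A3 = np.linalg.matrix_power(A, 3)

print(A2)   # A2[i, j] = number of walks of length 2 from i to j
print(A3)   # A3[i, j] = number of walks of length 3 from i to j

# Each triangle yields 6 closed walks of length 3
# (3 starting nodes times 2 directions).
n_triangles = np.trace(A3) // 6
print(n_triangles)   # 1
```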

Jan 11/12, 2023

Solving linear systems (see notes by Andrei Caragea)

Jan 17, 2023

Hints for Assignment 10/11; Matrix norm and condition number (see Joh, Chapter 5, Section “Square Systems”; the Wikipedia entries on the matrix norm and the condition number are very comprehensive, but go far beyond what is required here)

Jan 18/19, 2023

Hints for Assignment 10/11; Ill-conditioned linear systems (Example from Joh, Chapter 5, Section “Square Systems”)
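
The sketch below illustrates ill-conditioning on a made-up 2×2 system; it is not the example from Joh. A tiny change in the right-hand side changes the solution completely, and the condition number reported by np.linalg.cond signals this.

```python
import numpy as np

# A nearly singular matrix: the two rows are almost identical.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])

b = np.array([2.0, 2.0001])
b_perturbed = np.array([2.0, 2.0002])   # changed by 1e-4 in one entry

x = np.linalg.solve(A, b)                      # approximately [1, 1]
x_perturbed = np.linalg.solve(A, b_perturbed)  # approximately [0, 2]

print(x)
print(x_perturbed)

# The condition number bounds how much relative errors in b can be
# amplified in the solution x.
print(np.linalg.cond(A))   # roughly 4e4
```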

Jan 24, 2023

Inner and outer products (for outer products using broadcasting, see Example 4 from Array Broadcasting in Numpy); Interpolation and the Vandermonde matrix, construction of the Vandermonde matrix in Numpy as a generalized outer product; Discussion of solutions to Assignment 10/11
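
A sketch of the outer product via broadcasting and of the Vandermonde matrix as a generalized outer product, where the multiplication is replaced by exponentiation. The sample points and values are made up; np.outer and np.vander are used only to check the results.

```python
import numpy as np

# Outer product via broadcasting: a column times a row.
u = np.array([1.0, 2.0, 3.0])
v = np.array([10.0, 20.0])
outer = u[:, np.newaxis] * v[np.newaxis, :]     # shape (3, 2)
print(np.array_equal(outer, np.outer(u, v)))    # True

# Vandermonde matrix: same broadcasting pattern with ** instead of *.
x = np.array([0.0, 1.0, 2.0])       # interpolation nodes
y = np.array([1.0, 3.0, 7.0])       # values to interpolate
powers = np.arange(len(x))
V = x[:, np.newaxis] ** powers[np.newaxis, :]   # V[i, j] = x[i] ** j
print(np.array_equal(V, np.vander(x, increasing=True)))   # True

# Interpolation: solve V @ c = y for the polynomial coefficients,
# lowest degree first. Here the result is 1 + x + x**2.
c = np.linalg.solve(V, y)
print(c)
```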

Jan 25/26, 2023

Eigenvalues and eigenvectors (informal introduction), eigenvalues in Numpy, computation of the dominant eigenvalue/eigenvector pair using power iteration, connection to matrix norm and condition number
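
A minimal sketch of power iteration, assuming that the matrix has a single eigenvalue of largest magnitude and that the start vector is not orthogonal to its eigenvector. The function name power_iteration, the example matrix, and the fixed iteration count are choices made here for illustration.

```python
import numpy as np


def power_iteration(A, num_iterations=100):
    """Approximate the dominant eigenvalue/eigenvector pair of a square matrix."""
    x = np.ones(A.shape[0])
    for _ in range(num_iterations):
        x = A @ x
        x = x / np.linalg.norm(x)   # renormalize to avoid overflow
    eigenvalue = x @ A @ x          # Rayleigh quotient (x has unit length)
    return eigenvalue, x


A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

lam, vec = power_iteration(A)
print(lam)                            # dominant eigenvalue
print(np.linalg.eigvalsh(A).max())    # reference value (A is symmetric)

# For a symmetric matrix, the 2-norm equals the largest eigenvalue magnitude.
print(np.linalg.norm(A, 2))
```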

Jan 31, 2023

Introduction to sympy (Joh, Chapter 3; there are many good tutorials, this one is very well written, some of the more advanced examples can be ignored for now; there are others)
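
A first taste of symbolic computation with sympy, as a sketch that is independent of the tutorials mentioned above. The expressions are arbitrary examples.

```python
import sympy as sp

x = sp.symbols("x")

# Symbolic differentiation (product rule is applied automatically).
derivative = sp.diff(x * sp.sin(x), x)
print(derivative)

# Definite integral, returned as an exact rational number.
integral = sp.integrate(x**2, (x, 0, 1))
print(integral)        # 1/3

# Exact solutions of x**2 - 2 = 0.
roots = sp.solve(x**2 - 2, x)
print(roots)

# Algebraic expansion.
expanded = sp.expand((x + 1) ** 3)
print(expanded)
```

In contrast to numpy, all results stay exact: the roots are returned as square roots, not as floating point approximations.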

Feb 1, 2023

No class; please attend the session on Feb 2

Feb 2, 2023

Hints for Assignment 13, discussion and presentation of past assignments (start)

Feb 7–9, 2023

Discussion and presentation of past assignments (ctd.)