Last updated May 2025

by Jason Josephson

# Fundamentals of Python

Before doing much with Python, we'll cover the basics of how to write simple code. Python code is designed to be intuitive to read and write, so this isn't too hard.

## II. Modules, conditionals, lists, and loops



Right now, you can't really do much with what we've covered. These next tools will allow you to work with data and do things you couldn't (or at least wouldn't want to!) with either a calculator or by hand.

### i. Importing modules

**Modules** are one of the most crucial parts of Python. Modules are files (.py extension) that contain objects such as functions. You can upload and download modules, usually very easily and quickly. This facilitates the existence of a Python community in which functions and projects can be easily shared and modified. **Libraries** are collections of modules.

The result of all this is a large number of software packages and projects which you can quickly download and use for free. This is one of the major advantages of the Python language.

There are **built-in** modules that will be included when you download Python (or use Google Colab, etc.). One of these is the `random` module, which has, among others, the `randint(a,b)` function that returns a random int from a to b (inclusive for both, i.e. $[a,b]$).

Unlike built-in functions like `print()`, functions contained in modules, even built-in modules, need to be imported into your current environment/session before you can access them. You can do this in a few ways.

In [None]:
import random

random.randint(1, 10)

If you use `import module` like above, you'll then have to call a function with `module.function()`; just using `function()` will raise a `NameError`, since that name hasn't been defined in your session.

In [None]:
randint(1,3)

If you instead use `from module import function`, you'll be able to call the function without the `module.` part, like so:

In [None]:
from random import randint

randint(4,6)

Let's try another function. In the `random` module, the `choice(list)` function randomly chooses an entry in the given list, while the `random()` function returns a random real number from 0 (inclusive) to 1 (exclusive). To import multiple functions from the same module on one line, separate them with commas like so:

In [None]:
from random import choice, random

print(random())
print(choice(['a','b','c']))

You can also use the syntax `from module import *` to import all the functions in a module at once, as if you had written `from module import function1, function2, ...`, etc., for all functions. You'll then call the functions without the `module.` part.

This is often discouraged and should be avoided if you can. Suppose you're importing 2 modules and, unbeknownst to you, they both have a function with the same name, call it `func()`. Now when you run

`from module1 import *`

`from module2 import *`

The `func()` from `module1` will be overwritten by the `func()` from `module2`, and you won't be able to access `module1`'s `func()`--or worse, you might accidentally use the wrong function without noticing. If you use

`import module1`

`import module2`

you'll avoid this, since the expressions `module1.func()` and `module2.func()` are distinct. Also, if you don't need, say, `module2.func()`, you can use

`from module1 import func`

`from module2 import whatever_other_functions_you_wanted`

and there will be no overwriting of `module1`'s `func()`.

And this goes for whatever other objects are in a module. Generally, you want to keep your **namespace** clean, that is, the functions, variables, etc. defined in your environment should be kept under control, not cluttered up with objects you don't know about. `from module import *` goes against this.
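
As a concrete sketch of the name clash described above: the built-in `math` and `cmath` modules both define a function called `sqrt()`. The example imports the name explicitly rather than with `*`, so the overwrite is easy to see; the variable names are only for illustration.

```python
import math
import cmath

# the second import silently replaces the first sqrt
from math import sqrt
from cmath import sqrt

shadowed = sqrt(-1)            # cmath's version runs, returning a complex number
real_root = math.sqrt(16)      # module.function() is never ambiguous
complex_root = cmath.sqrt(-1)

print(shadowed, real_root, complex_root)
```

Had `math`'s `sqrt()` been the one in the namespace, `sqrt(-1)` would have raised an error instead of returning a complex number.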

#### **Exercises**

---
1. The built-in `math` module has a number of useful functions. Use its `ceil()` function to evaluate $\lceil 3.1 \rceil$. Use its `factorial()` function to evaluate $10!$ . A list of all its functions can be found [here in the official Python documentation](https://docs.python.org/3/library/math.html).



### ii. Conditional statements

**If statements** do something only *if* a certain statement is true. Start by writing `if`, followed by the condition, followed by a colon `:`. The next line will be executed only if the condition is true.

In [None]:
if 1 + 1 == 2:
  print('The if statement evaluated the expression as True.')

In [None]:
if 1 + 1 == 3:
  print('The if statement evaluated the expression as True.')

Notice that the line after the colon is indented. We've already seen that in the definition of functions, an indent must be used before every line that is to be a part of the function. With if statements, the indent is also extremely important: all the indented lines underneath the if statement will be executed if the condition is true and ignored if false. These indented lines can be referred to as a "block" of code. The first line which is NOT indented marks the end of the if statement's code block and has nothing to do with the truth value of the condition.

In [None]:
if 1 + 1 == 2:
  print('abc')
  print('def')

In [None]:
if 1 + 1 == 3:
  print('abc')
  print('def')

In [None]:
if 1 + 1 == 3:
  print('abc')
  print('def')
print('ghi')

`print('ghi')`, not being indented, had nothing to do with the if statement, and so it was run regardless of the falsity of the condition.

In some other programming languages, indentation is more flexible than in Python because braces (`{}`) or keywords are used to surround code blocks in, e.g., if statements or function definitions. For better or worse, Python uses indentation itself to mark blocks, so you can't indent lines as you please.

Now, what if you want to run one block of code if a condition is true, but if it's false, you want to run a different block of code? Use `else`.

In [None]:
if 1 + 1 == 2:
  print('abc')
else:
  print('def')

In [None]:
if 1 + 1 == 3:
  print('abc')
else:
  print('def')

Notice that the `else:` goes on the same indent level as the `if`, and the `else` block is indented under the `else` keyword, just like the `if` block is indented under the `if` keyword.

What if you want more than one condition? If condition A is true, run code block A; else if condition B is true, run block B; else if condition C is true, run block C, etc. For this, use `elif`.

In [None]:
if 1 + 1 == 3:
  print('abc')
elif 1 + 1 == 4:
  print('def')
elif 1 + 1 == 2:
  print('ghi')

In [None]:
if 1 + 1 == 2:
  print('abc')
elif 1 + 1 == 3:
  print('def')

But be careful: the whole if/elif block stops as soon as one condition is true:

In [None]:
if 1 + 1 == 2:
  print('abc')
elif 2 + 2 == 4:
  print('def')
elif 3 + 3 == 6:
  print('ghi')

`2 + 2 == 4` and `3 + 3 == 6` are true, but those conditions were never even evaluated, because the first condition was true. The whole group of `if`/`elif`/... statements is evaluated one by one from top to bottom, and as soon as one of the conditions is true and the corresponding block has run, the rest are skipped.

If you want to check multiple conditions, use multiple separate `if` statements:

In [None]:
if 1 + 1 == 2:
  print('abc')
if 2 + 2 == 4:
  print('def')
if 3 + 3 == 6:
  print('ghi')

And you can also use `else` at the end of a group of `if`/`elif`/`elif`/... statements. The `else` block runs if none of the `if` or `elif` conditions are true:

In [None]:
if 1 + 1 == 3:
  print('abc')
elif 1 + 1 == 4:
  print('def')
elif 1 + 1 == 123:
  print('ghi')
else:
  print('All failed')
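
Because only the first true branch runs, the order of the conditions matters whenever they overlap. Here is a small sketch, using a hypothetical `size_label()` function with made-up thresholds:

```python
def size_label(n):
  # the conditions overlap (any n > 100 is also > 10),
  # so the strictest condition goes first
  if n > 100:
    return 'large'
  elif n > 10:
    return 'medium'
  else:
    return 'small'

print(size_label(500), size_label(50), size_label(5))
```

If the `n > 10` check came first, `size_label(500)` would return `'medium'`, since the `n > 100` branch would never be reached.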

Further, you can **nest** `if` statements inside each other:

In [None]:
def identifier(formula, isAlcohol, isPrimary):
  if formula == 'C3H8O':

    if isAlcohol:              # note the indentation

      if isPrimary:            # note the second indentation
        print('1-propanol')
      else:
        print('isopropanol')

    else:                   # note the de-indentation
      print('methyl ethyl ether')

  else:                     # note the second de-indentation
    print('NA')

In [None]:
identifier('C4H10O', True, True)
identifier('C3H8O', True, False)
identifier('C3H8O', False, False)
identifier('C3H8O', True, True)

In this case, the `if isAlcohol:`/`else:` group is only run if the `formula == 'C3H8O'` condition is met. And the `if isPrimary:`/`else:` group is run only if `isAlcohol` is `True`. We could represent this with the **decision tree**:

![Binary decision tree displaying use of conditionals.](https://jasondjosephson.com/conditional.png)

Many layers of nesting, however, might indicate that your code could be designed in a better way. It may not be the easiest to read, either. And on a similar note, you can imagine that having many `elif` statements can look very cumbersome. There may be ways to get around having to use them, depending on what you need to do.
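
As one possible illustration of that point, the nested `identifier()` function above could be flattened into a single `if`/`elif`/`else` chain. This is only a sketch: the name `identifier_flat` is made up here, and it returns the string instead of printing it.

```python
def identifier_flat(formula, isAlcohol, isPrimary):
  # same decisions as the nested version, one condition per branch
  if formula != 'C3H8O':
    return 'NA'
  elif not isAlcohol:
    return 'methyl ethyl ether'
  elif isPrimary:
    return '1-propanol'
  else:
    return 'isopropanol'

print(identifier_flat('C4H10O', True, True))
print(identifier_flat('C3H8O', True, False))
print(identifier_flat('C3H8O', False, False))
print(identifier_flat('C3H8O', True, True))
```

Each branch handles one outcome, so there is only one level of indentation to keep track of.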

Lastly, you can use "switch" statements in place of `if`/`elif`/`else` statements. These, however, were only added in Python 3.10. To use a switch statement, you use `match` and `case` as follows:

In [None]:
country = 'Vietnam'
# first we'll show an if/elif/else:

if country == 'China':
  print('CH')
elif country == 'Indonesia':
  print('IN')
elif country == 'Thailand':
  print('TH')
elif country == 'Vietnam':
  print('VI')
else:
  print('Other')

In [None]:
# now an equivalent switch statement:

match country:
  case 'China':
    print('CH')
  case 'Indonesia':
    print('IN')
  case 'Thailand':
    print('TH')
  case 'Vietnam':
    print('VI')
  case _:
    print('Other')

The switch statement is cleaner looking, since we didn't have to write `country ==` for each `case` like we did with `elif`. But just remember that it won't work for someone using a version of Python <3.10.

Notice that the equivalent of `else:` is `case _:`, which is run if none of the previous cases match. You don't have to include it, just like you don't have to include `else`:

In [None]:
match 1 + 1:
  case 2:
    print(':)')
  case 3:
    print(':(')

#### **Exercises**

---
1. Suppose you are dealing with a new user's username. The username must have no capital letters, and it must be alphanumeric (i.e., only contains letters and numbers). Write a function that takes the string argument `username` and does the following:

  (1) If `username` is not alphanumeric, print "ERROR: USERNAME MUST BE ALPHANUMERIC", and return `False`;
  
  (2) otherwise, if the letters in `username` are all lowercase, print "USERNAME ACCEPTED" and return the username unchanged;
  
  (3) otherwise, return the same string but with all capital letters replaced by lowercase ones, and warn the user by printing "WARNING: ALL UPPERCASE LETTERS CHANGED TO LOWERCASE".

### iii. Lists and similar data structures

Say we have a bunch of names, or a bunch of C-C bond lengths, etc. Naturally, we often want to collect and store all of this data under one variable. We could certainly try to do this with strings or some such, but it would be much more convenient to have a data structure which is specifically designed to hold our data. These are **lists**.

Lists are indicated with square brackets. Entries in the list are separated by commas, and these entries can be of whatever data type.

In [None]:
a = [1, 2, 'dendrite', True, 0.002]
type(a)

Lists can contain lists; they can even contain functions:

In [None]:
sub_list = ['a', print]
super_list = [2, sub_list, 'Xunzi']

print(super_list)

To access an entry in the list, use **indexing**. Index an item in the list by writing the entry's position (an int) in square brackets next to the list, e.g., if `a` is a list, `a[3]` is the entry in position 3 of the list.

In [None]:
books = ['Apology', 'Phaedo', 'Meno', 'Symposium']
print(books[3])

Indexing in Python (usually) starts counting at 0, not 1. So the entry labelled `3` was the 4th entry, and the entry labelled `0` is the 1st:

In [None]:
print(books[0])

You can also index multiple entries at once: these are **slices**. The syntax is `list[n:m]`, where *n* is the starting index and *m* is the ending index. But the *m*th position will *not* be included: that is, positions *n* through *m*-1 will be included:

In [None]:
print(books[1:3])

If you just write `list[:m]`, the result is the same as `list[0:m]`, that is, the slice is from position 0 to position *m*-1. If you write `list[n:]`, the slice will be from position *n* to the end of the list, including the final position.

In [None]:
print(books[:3])
print(books[2:])

The syntax `list[n:m:p]` will return a slice from position *n* to position *m*-1, *but* will only include every *p*th position:

In [None]:
numbers = [0,1,2,3,4,5,6,7,8,9]
print(numbers[0:10:2])
print(numbers[0:10:3])

If you want to slice every *p* positions but for the whole list, then just use `list[::p]`. The missing `n` implies 0 and the missing `m` implies going to the end of the list.

In [None]:
print(numbers[::4])

Negative integers can be used to index starting from the final position. -1 is the final position, -2 the second last, -3 the third last, etc.

In [None]:
books = ['Apology', 'Phaedo', 'Meno', 'Symposium']

print(books[-1])
print(books[:-2])
print(books[-4:])
print(books[-3:-1])
print(books[:-1:2])

Negative integers can also be used to slice every *p* positions, but going from the last element to the first:

In [None]:
print(books[::-1])
print(books[::-2])

Recall the `in` operator. We used it for substring searching before, but we can also use it to check if an object is an entry in a list.

In [None]:
print('Apology' in books)
print('Nicomachean Ethics' in books)

if 'Apology' in books:
  print('Yes')

To add an entry to the end of the list, you can use the `append()` method:

In [None]:
numbers = [234, 21, -34, 2]

print(numbers)

numbers.append(233)

print(numbers)

To remove an entry, use the `remove()` method:

In [None]:
numbers = [234, 21, -34, 2]

print(numbers)

numbers.remove(21)

print(numbers)

Only the first matching entry will be removed:

In [None]:
letters = ['a', 'b', 'a']

print(letters)

letters.remove('a')

print(letters)

What if you want to remove an entry by position, without knowing the value at that position? One solution is the `pop()` method:

In [None]:
numbers = [234, 21, -34, 2]

print(numbers)

numbers.pop(2)

print(numbers)

Actually, the `pop()` method not only deletes the entry, but returns it, so you can, e.g., assign it to a variable:

In [None]:
numbers = [234, 21, -34, 2]

deletedNum = numbers.pop(2)

print(numbers)
print(deletedNum)
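
A related detail, shown as a short sketch: if you call `pop()` with no argument, it removes and returns the last entry. The variable name `last` is just for illustration.

```python
numbers = [234, 21, -34, 2]

last = numbers.pop()   # no index given: removes and returns the final entry

print(numbers)
print(last)
```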

You can also use `del` to delete entries by index or slice:

In [None]:
numbers = [234, 21, -34, 2]

print(numbers)

del numbers[0]

print(numbers)

del numbers[0:2]

print(numbers)

`del` is, in fact, a way to delete objects in general. This can be useful if you need to free up memory.

In [None]:
string = 'If the book we\'re reading doesn\'t wake us up with a blow to the head, what are we reading for?'
del string
print(string)

Since we `del`eted the variable, it was destroyed, and `print` couldn't print it.

There are other data structures that are similar to lists. **Tuples** are indicated with parentheses `()` instead of square brackets. Lists can be modified, but tuples cannot be.

In [None]:
a = [1,2,3]
b = (1,2,3)
print(type(a), type(b))

In [None]:
a = [1,2,3]
a.append(4)
print(a)

In [None]:
# but this doesn't work on tuples

a = (1,2,3)

a.append(4) # this will raise an error

Tuples generally use less memory than lists, so consider using one when you know the contents won't need to change.
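
One rough way to see the memory difference is `sys.getsizeof()` from the built-in `sys` module, which reports an object's size in bytes. This is only a sketch: the exact numbers depend on your Python version and implementation (the comparison below assumes the standard CPython interpreter), and the function measures the container itself, not the objects stored in it.

```python
import sys

as_list = [1, 2, 3]
as_tuple = (1, 2, 3)

# getsizeof() reports the size of the container itself, in bytes
list_size = sys.getsizeof(as_list)
tuple_size = sys.getsizeof(as_tuple)

print(list_size, tuple_size)
```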

Lists and tuples can be interconverted with the `list()` and `tuple()` functions.

In [None]:
a = [1,2,3]
print(type(a))

b = tuple(a)
print(type(b))

c = list(b)
print(type(c))

**Sets** are indicated with curly braces `{}`. Their entries have no order. Importantly, they cannot contain any duplicate entries.

In [None]:
a = {3, 3, 2, 1, 3, 3}

print(a) # when we print, we'll see that the duplicate 3's are gone
         # there is no order, so print() will display the entries
         # differently from the order we listed them in

In [None]:
a[0] # since there's no order, we can't index, so we get an error
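
One catch with the curly-brace syntax, shown as a short sketch: an empty pair of braces creates an empty dict (a data structure covered further below), not an empty set. To start with an empty set, use `set()`; entries can then be added with the `add()` method.

```python
empty_dict = {}        # bare braces make an empty dict, not a set
empty_set = set()      # an empty set needs the set() function

print(type(empty_dict), type(empty_set))

empty_set.add(3)       # add() puts one entry into a set
empty_set.add(3)       # adding a duplicate changes nothing
print(empty_set)
```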

Sets can also interconvert with lists and tuples using the `set()` function.

In [None]:
a = [1,3,3]
b = set(a)
print(a,b)

The `len()` function can also be used on lists, tuples, and sets, returning the number of entries in them.

In [None]:
a = [1,2,3]
b = (1,2,3)
c = {1,2,3}

print(len(a), len(b), len(c))

Suppose you have a list which contains duplicate elements. If you want to find the number of **unique** entries in the list, you can't just use `len()` on the list, or you'll get the number of all entries, including duplicates. Here's a quick solution:

In [None]:
a = [1, 2, 3, 3, 3]
print(len(a))

b = set(a)
print(len(b))

If you wanted a list with duplicate entries removed, you could then convert the set back to a list:

In [None]:
a = [1, 2, 2, 3, 3, 3]
b = set(a)
c = list(b)
print(c)

In [None]:
# we can also do this on one line

a = list(set([1, 2, 2, 3, 3, 3]))
print(a)

# which looks better, one line or more, is up to your predilections
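
Since sets have no order, `list(set(...))` makes no promise about the order of the resulting list. If a predictable order matters, one option is the built-in `sorted()` function (not covered above), which accepts a set and returns a sorted list:

```python
a = [3, 1, 2, 3, 1]

unique_sorted = sorted(set(a))   # duplicates dropped, then sorted ascending

print(unique_sorted)
```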

**Dictionaries** (dicts) are also indicated with curly braces. Dicts contain pairs of objects. Each pair is made of a **key** and a **value**. Dicts remember the order in which their pairs were added (as of Python 3.7), but you don't index them by position like lists and tuples. Rather, indexing a dict with a key returns the corresponding value.

In [None]:
# dict syntax is {key:value, key:value, key:value}

address_book = {'Mary':'123 Fake St.', 'John':'43 Forgery Way', 'Jin':'1 False Lane'}

# now we can look up a name and return the corresponding address:

print(address_book['John'])

In [None]:
# we can't go in reverse, looking up an address to get a name:

print(address_book['123 Fake St.'])

Imagine we have a dict like the following:

`age_register = {'Mary':23, 'John':34, 'Jin':24, 'Mary':31}`

Now suppose we try `age_register['Mary']`. What will Python do? There are *2* "Mary"s, so how does Python know which to look up?

In [None]:
age_register = {'Mary':23, 'John':34, 'Jin':24, 'Mary':31}
age_register['Mary']

It doesn't: the later `'Mary':31` pair overwrote the earlier `'Mary':23` pair, so `31` is returned. You *can't* have multiple keys which are the same. But you can have multiple *values* which are the same.
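
If you really do need to store more than one value under the same key, one common workaround is to make each value a list. A minimal sketch with the same made-up data:

```python
age_register = {'Mary': [23, 31], 'John': [34], 'Jin': [24]}

print(age_register['Mary'])      # both ages are kept
print(age_register['Mary'][1])   # index the list to pick one of them
print(len(age_register))         # still one key per name
```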

In [None]:
a = {'A':2, 'B':2}

print(a['A'], a['B'])

You might think of dictionaries like mathematical functions, $f(key) = value$, since each key (each $x$) is mapped to just 1 value (one $y$).

We can get the keys or values of a dict using the `keys()` and `values()` methods.

In [None]:
print(a.keys(), a.values())

Note also this way of creating dicts with the built-in `dict()` function:

In [None]:
# two ways to make a dict
dict1 = dict(A = 1, B = 2)
dict2 = {'A':1, 'B':2}

print(dict1 == dict2)

This has been quite a lot. Let's summarize:

*   Lists `[]` can be altered by `.append()`, `.remove()`, etc.
*   Tuples `()` are like lists but immutable
*   Sets `{}` have no duplicates and no order to their elements
*   Dictionaries `{key:value}` match keys to values; you access a value with the corresponding key

When you go to use one of these data structures, always ask: Is this the most appropriate structure for the job? Your data structure should always be suited for your data. You might make things pointlessly complicated if you use lists where you might've used a dict.
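
To make that last point concrete, here is a sketch of the same lookup done with two parallel lists and with one dict, reusing the made-up address data from above. It uses the list method `index()`, which returns the position of the first matching entry.

```python
# two parallel lists: the lookup needs an intermediate position
names = ['Mary', 'John', 'Jin']
addresses = ['123 Fake St.', '43 Forgery Way', '1 False Lane']
position = names.index('John')
from_lists = addresses[position]

# one dict: the lookup is a single step
address_book = {'Mary':'123 Fake St.', 'John':'43 Forgery Way', 'Jin':'1 False Lane'}
from_dict = address_book['John']

print(from_lists == from_dict)
```

Both give the same answer, but the two lists must be kept in the same order by hand, while the dict keeps each name and address together.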

#### **Exercises**

---
1. Suppose you've done a survey of people's universities and received the `responses` in the list below.

  a. `count()` is a list method that returns the number of times a given value occurs in a list. For instance, `['a','b','c','b'].count('b')` returns `2`. Use it to find the number of responses saying "University of Ottawa".

  b. Find the number of different universities, i.e., without duplicate answers.

2. Using the list `sciences` below, create the list `reverseSciences` which is the same list in reverse order. Don't write any strings out manually.

In [None]:
responses = ["University of Ottawa", "Carleton University", "University of Toronto",
             "University of Toronto", "University of Ottawa", "Carleton University",
             "Queen's University", "University of Ottawa", "Dalhousie University",
             "University of Toronto", "McGill University", "Carleton University"]



In [None]:
sciences = ['chemistry', 'biology', 'physics', 'geology', 'astronomy']
reverseSciences =

### iv. `for` loops

When you think of using a computer program to do a task, you may think of getting it to do something repetitive. If you have, say, a million equations to evaluate, a human would take forever to do this, while a computer could do so in no time. **Loops** are used in Python (and other languages) to execute a block of code multiple times.

We'll look at **for loops** first. Suppose we have a list. We want to print out a sentence followed by an item in the list. And we want to repeat this task for every item in the list. We could do so manually:

In [None]:
myList = ['tetrahedron', 'cube', 'octahedron', 'icosahedron', 'dodecahedron']

print('The ' + myList[0] + ' is a Platonic solid.')
print('The ' + myList[1] + ' is a Platonic solid.')
print('The ' + myList[2] + ' is a Platonic solid.')
print('The ' + myList[3] + ' is a Platonic solid.')
print('The ' + myList[4] + ' is a Platonic solid.')

Copying and pasting the same code five times is boring, cumbersome, and a waste of space. We might as well have not even bothered with the list and just written the strings separately.

A for loop can save us:

In [None]:
polyhedra = ['tetrahedron', 'cube', 'octahedron', 'icosahedron', 'dodecahedron']

for poly in polyhedra:
  print('The ' + poly + ' is a Platonic solid.')

We say that the for loop **iterates** over the list. That is to say, it takes the first element of the list, runs a code block with it, then takes the second element of the list, runs the same block with it, then takes the third element..., and so on.

In this case, it assigns the variable `poly` to the first element of the list, executes the code in the indented block below it, then starts over, assigning `poly` to the second element, executing the block, starting over again by assigning the third element to `poly`, etc., until there are no more items in the list to use.

The syntax is straightforward: `for` is followed by the variable to be assigned to the list item used for the current iteration; this is followed by `in` and then the list, and the line ends with a colon. Like the `if`/`else` syntax, the indentations matter: all the indented lines below the initial line are part of the block that will be executed as part of the for loop. The whole block will be executed for every iteration:

In [None]:
for n in [1, 2, 3, 4]:
  print('Hi there!')        # indented, part of the for loop
  print(n)                  # indented, part of the for loop
print('Bye')                # not indented, so not part of the for loop, will only execute once

In fact, you don't actually need to include the variable in your execution block. The loop will still iterate over all the items in the list:

In [None]:
for n in [1, 2, 3, 4]:
  print('Hi there!')          # the variable 'n' is not in here, but the
                              # block will still run 4 times

If you do this, the effect is to run the execution block *x* times, where *x* is the length of the list. (Of course, this is also true if you do include the iterated variable in the block. But you don't *need* to include it if you don't want to.)

Here's another useful built-in function: `range()`. It will return a range of integers to be iterated over. If you pass only one argument, call it *n*, it will return integers from 0 to *n*-1:

In [None]:
for i in range(5):
  print(i)

If you pass two arguments, say, *n* and *m*, it will return a range from *n* to *m*-1:

In [None]:
for i in range(63,68):
  print(i)

If you pass three arguments, say, *n*, *m*, and *p*, it will return a range from *n* to *m*-1, but only every *p*th number:

In [None]:
for i in range(1, 11, 2):
  print(i)

In [None]:
for i in range(1, 11, 3):
  print(i)

With `range()`, then, you can easily tell a for loop to iterate any number of times. You can repeat something literally a million times:

In [None]:
counter = 0                 # our variable starts at 0

for n in range(1000000):
  counter = counter + 1     # we'll add 1 to the current number, overwriting with the new number

print(counter)              # the result of our for loop: adding 1 a million times

Notice the line `counter = counter + 1`. We can use lines like this to keep count of how many times a for loop has iterated. There's an easier way to write this, however:

In [None]:
x = 0
y = 0

x = x + 1   # longer way to add 1 to the current variable
y += 1      # shorter way

print(x, y)

The `+=` operator took the current value of the variable and added the number on the right side of it. There are other operators that do corresponding operations:

In [None]:
y = 10

y -= 2

print(y)

In [None]:
y = 4

y *= 3

print(y)

In [None]:
y = 10

y /= 2

print(y)
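
Two details are worth a quick sketch. First, `/=` always produces a float, even when the division is exact. Second, the same shorthand exists for the other arithmetic operators, assuming you have met `//` (floor division), `**` (exponent), and `%` (remainder):

```python
y = 10
y /= 2       # true division always gives a float
print(y)

z = 10
z //= 3      # floor division: 10 // 3
print(z)

z **= 2      # exponentiation: 3 ** 2
print(z)

z %= 4       # remainder: 9 % 4
print(z)
```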

Now, let's say we're iterating over a list, but we want to keep track of what position the current item is in the list. Perhaps we want to execute some code `if` the item is in a certain position. One way to do this would be:

In [None]:
citiesJapan = ['Tokyo', 'Sendai', 'Osaka', 'Sapporo', 'Minamisanriku', 'Tome']

counter = 1      # we'll use this variable to keep track

for city in citiesJapan:

  # print every second city, that is, print if counter is divisible by 2
  if counter % 2 == 0:
    print(city, "is city number", counter, "on our list.")

  # increase counter by one for the next iteration
  counter += 1

Another way to keep track of the iteration number is to use `enumerate()`. Simply use `enumerate(yourList)` instead of `yourList` in the `for` line. You'll also have to specify two variables after `for`, the first to assign to the iteration number, the second to assign to the list item.

In [None]:
citiesJapan = ['Tokyo', 'Sendai', 'Osaka', 'Sapporo', 'Minamisanriku', 'Tome']

for counter, city in enumerate(citiesJapan):

  if counter % 2 == 0:
    print(city, "is city number", counter, "on our list.")

Just remember that the numbers from `enumerate()` will start at 0, not 1! If we want to get around this, though, it's easy:

In [None]:
citiesJapan = ['Tokyo', 'Sendai', 'Osaka', 'Sapporo', 'Minamisanriku', 'Tome']

for counter, city in enumerate(citiesJapan):

  if (counter + 1) % 2 == 0:
    print(city, "is city number", counter + 1, "on our list.")
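
Alternatively, `enumerate()` accepts an optional `start` argument that sets the first number, so you don't need to write `counter + 1` yourself. A short sketch with the same list:

```python
citiesJapan = ['Tokyo', 'Sendai', 'Osaka', 'Sapporo', 'Minamisanriku', 'Tome']

for counter, city in enumerate(citiesJapan, start=1):

  if counter % 2 == 0:
    print(city, "is city number", counter, "on our list.")
```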

Again, when you iterate over `enumerate()`, you use 2 variables in the `for` line: `for var1, var2 in enumerate(list)`. Let's generalize this. Suppose you have a list where each element itself is a list with 2 elements:

In [None]:
# you can use a line break after each entry when making a list
# if the list is long, this can make things look much nicer and more readable
statesCaps = [['Canada', 'Ottawa'],
              ['China', 'Beijing'],
              ['Libya', 'Tripoli'],
              ['Philippines', 'Manila']]

In [None]:
# we could do this:

for element in statesCaps:
  print(element[0], element[1])

In [None]:
# but often it's neater and more readable to use
# multiple variables in the for loop:

for state, capital in statesCaps:
  print(state, capital)

This is especially true when the block of code in the loop is long and convoluted. Reading `element[0]` is less informative than a variable named `state`, and when there are lots of variables and other objects to keep track of, informative variable names are important, as we said earlier.

In [None]:
# you can do this for as many variables as you want

longlist = [['a','b','c','d','e'],
            ['A','B','C','D','E']]

for one,two,three,four,five in longlist:
  print(one, two, three, four, five)

In [None]:
# BUT your iterable cannot be "jagged", with different levels/dimensions having
# different numbers of elements

# for instance, this list is NOT jagged; it has 3 elements, each of which
# has 2 elements

nonJagged = [['a','b'],
             ['c','d'],
             ['e','f']]

# this list IS jagged, since it has 3 elements, but these elements do not
# all have the same number of elements themselves (i.e., the first has 1,
# the second has 3, the third has 2)

jagged = [['a'],
          ['b','c','d'],
          ['e','f']]

In [None]:
for x,y in nonJagged:
  print(x,y)

In [None]:
for x,y in jagged:
  print(x,y)

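The loop over `jagged` raises a `ValueError`, because its first element, `['a']`, has only one entry to unpack into the two variables `x` and `y`. You can deal with the jagged list like so: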

In [None]:
for x in jagged:
  print(*x)

The little asterisk `*` "unpacks" the list `x`, regardless of what size `x` is, so `print(*x)` will work for lists with variable numbers of elements. Unpacking allows you to use list elements as arguments to a function without explicitly writing them. That is, `function(*x)` is equivalent to `function(x[0], x[1], x[2], ... x[n-1])` for a list of length n. We could call it "length-agnostic", useful for when we don't know the length of a list beforehand.
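
Unpacking works with your own functions too, not just `print()`. Here is a minimal sketch with a hypothetical `box_volume()` function and made-up dimensions:

```python
def box_volume(length, width, height):
  return length * width * height

dimensions = [2, 3, 4]

# these two calls are equivalent
v1 = box_volume(dimensions[0], dimensions[1], dimensions[2])
v2 = box_volume(*dimensions)

print(v1, v2)
```

Unlike `print()`, this function takes exactly 3 arguments, so the unpacked list must have exactly 3 entries.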

Another very useful technique is the `zip()` function:

In [None]:
# suppose you have two lists

molecule = ['methanol', 'formaldehyde', 'water', 'acetic acid']
SMILES_code = ['CO', 'C=O', 'O', 'CC(=O)O']

# suppose you want to print out item 1 from list 1 and item 1 from list 2,
# then item 2 from list 1, item 2 from list 2, and so on
# you could do this:

for idx, mol in enumerate(molecule):
  print(mol, SMILES_code[idx])

In [None]:
# but it's neater to use zip()

for mol, smiles in zip(molecule, SMILES_code):
  print(mol, smiles)

`zip()` creates an iterable object that yields tuples in the form

`((list1_item1, list2_item1), (list1_item2, list2_item2), ...)`.

And `zip()` works with more than 2 lists.
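
Here is a sketch with three lists. The `n_atoms` list (total atom counts for each molecule) is added here only for illustration. The second part wraps `zip()` in `list()` to show the tuples it produces, and shows that `zip()` stops at the end of the shortest input.

```python
molecule = ['methanol', 'formaldehyde', 'water']
SMILES_code = ['CO', 'C=O', 'O']
n_atoms = [6, 4, 3]

for mol, smiles, n in zip(molecule, SMILES_code, n_atoms):
  print(mol, smiles, n)

# the first list has 3 entries, the second only 2,
# so only 2 pairs are produced
pairs = list(zip([1, 2, 3], ['a', 'b']))
print(pairs)
```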

Now, can you put for loops inside of for loops? Of course:

In [None]:
for i in range(3):
  for j in range(3):
    print(i, j)

With these **nested** for loops, the inner (indented) for loop goes through its list entirely, then the outer for loop iterates once, then the inner loop goes through its whole list again, and so on. In fact, it's like we were dealing with a matrix with rows and columns. The `i` variable was the row number, while the `j` variable was the column number. Look:

In [None]:
for i in range(3):
  for j in range(3):
    print("row " + str(i) + ", column " + str(j))
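
To push the matrix picture one step further, here is a sketch that stores a 3-by-3 matrix as a list of lists (the values are made up). Indexing twice, `matrix[i][j]`, first picks row `i` and then the entry in column `j` of that row.

```python
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

total = 0

for i in range(3):
  for j in range(3):
    total += matrix[i][j]   # row i, column j

print(total)
```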

But be careful: using for loops to modify lists can be tricky. If you run the following block of code, you might think that it would result in a list of `[1, 2, 3, 2, 4, 6]`. However, since the list keeps getting longer, the loop never reaches the end of it, and the code will keep running (until you stop it or your computer runs out of memory). In Google Colab or Jupyter Notebook, hit the stop (square) button to stop the code; in many Python terminals/shells, you can hit Ctrl+C to stop running code.

In [None]:
myList = [1, 2, 3]

for n in myList:
  myList.append(n*2)
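
One way to get the result you might have expected is to iterate over a copy of the list while appending to the original. In this sketch, the slice `myList[:]` makes a new list with the same entries, so the loop runs exactly 3 times:

```python
myList = [1, 2, 3]

for n in myList[:]:     # myList[:] is a new list holding the same 3 entries
  myList.append(n*2)

print(myList)
```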

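Removing items, especially by position, can also cause problems. Trying to remove every second entry with the following will not work, because each `pop()` shifts the remaining entries one position to the left while the loop keeps counting upward. The result is `['b', 'c']` rather than the `['b', 'd']` you might expect: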

In [None]:
myList = ['a', 'b', 'c', 'd']

for idx, letter in enumerate(myList):

  if idx % 2 == 0:
    myList.pop(idx)

print(myList)

Often, it's better to just make a new list and add the appropriate items without altering the original list:

In [None]:
myList = ['a', 'b', 'c', 'd']
myList2 = [] # new empty list

for idx, letter in enumerate(myList):

  if idx % 2 != 0:
    myList2.append(letter)

print(myList2)

You can always `del` the old list if you want to clean up.

So far, we've only iterated over lists. But you can iterate over many objects:

In [None]:
# strings

for x in 'Bauhaus':
  print(x)

In [None]:
# sets
# (note that the order specified when building the set is not obeyed when iterating)

for x in {'aaa', 1, 2, 'xyz'}:
  print(x)

In [None]:
# dicts
# when you iterate over a dict, you automatically iterate over the keys

myDict = {'a':1, 'b':2, 'c':3}

for x in myDict:
  print(x)

# i.e., the above gives the same as below:

for x in myDict.keys():
  print(x)

In [None]:
# you could also iterate over dict values:

for x in myDict.values():
  print(x)

In [None]:
# to iterate over keys and access values within the loop, use the keys to index the dict:

for key in myDict:
  value = myDict[key]
  print("Value of the key", key, "is", str(value) + "!")
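Dicts also have an `.items()` method, which yields each key together with its value, so the lookup inside the loop can be skipped. A minimal sketch using the same dict (the `pairs` list is only there to record what the loop saw):

```python
myDict = {'a': 1, 'b': 2, 'c': 3}
pairs = []

# each item is a (key, value) pair, unpacked into two loop variables
for key, value in myDict.items():
  print("Value of the key", key, "is", str(value) + "!")
  pairs.append((key, value))
```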

In general, an object which can be iterated over is an **iterable**. There are objects which only exist to be iterated over. The output of `range()` isn't actually a list; it's a particular object made for iterating over.

In [None]:
a = [1, 2, 3, 4, 5]
b = range(1,6)

print(type(a), type(b))
print(a, b) # note the differences in print() output!
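Although a `range` is not a list, it still supports some list-like operations, and it can be converted with `list()` when an actual list is needed. A small sketch (the `asList` name is just for illustration):

```python
b = range(1, 6)

print(len(b))    # number of values the range will produce
print(b[0])      # indexing works without building a list
print(4 in b)    # membership tests work too

asList = list(b) # build a real list from the range
print(asList)
```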

We're almost done with for loops. But here's something neat. Suppose you want to construct a list with a for loop:

In [None]:
squares = []

for n in range(5):
  squares.append(n**2)

print(squares)

There's a faster and more elegant way to do so: **list comprehensions**.

In [None]:
squares = [n**2 for n in range(5)]

print(squares)

The syntax is hopefully apparent: `[x for item in iterable]` returns a list created by iterating over `iterable` with the variable `item` and evaluating the expression `x` each time; each result becomes an element of the new list.
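As a further sketch of the pattern, here `item` is a string and the expression is a function call on it. The `words` list is made up for illustration, and the for-loop version underneath builds the same list the long way.

```python
words = ['loop', 'list', 'iterable']

# list comprehension: one length per word
lengths = [len(word) for word in words]

# equivalent for loop
lengths2 = []
for word in words:
  lengths2.append(len(word))

print(lengths, lengths2)
```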

You can also use conditionals:

In [None]:
# if statement goes at the end
odds = [n for n in range(10) if n % 2 != 0]

print(odds)

In [None]:
# but if you use "else" as well, the whole "if ... else" expression goes at the start
even = [True if n % 2 == 0 else False for n in range(10)]

print(even)
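The two forms do different jobs, and they can be combined. In this sketch (the labels and the excluded number are arbitrary), the `if ... else` at the start decides what value goes into the list, while the `if` at the end decides which items are included at all.

```python
# label each n as 'even' or 'odd', but leave out n == 3 entirely
labels = ['even' if n % 2 == 0 else 'odd' for n in range(6) if n != 3]

print(labels)
```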

One final thing about for loops: Suppose you want to iterate over an object, but once a certain condition has been met, you want to immediately terminate the loop and move on. We could try, say:

In [None]:
# task: find the position of the first occurrence of 4
# in a list of 100 random ints
# (without using .index())

# list of 100 random integers
hundredInts = [randint(1,10) for x in range(100)]

found = False
for idx, x in enumerate(hundredInts):
  if x == 4 and not found:
    print(idx)
    # once we find the first 4, we set found to True to avoid printing more
    found = True

We wasted time iterating over the whole list, when we could have just stopped after we got to the first occurrence. We also had to use that awkward `found` variable. The problem is that we don't know *a priori* when to stop printing. We need some way to stop the loop (to break out of it) once a condition is met. This is what `break` does.

In [None]:
# task: find the position of the first occurrence of 4
# in a list of 100 random ints
# (without using .index())

# list of 100 random integers
hundredInts = [randint(1,10) for x in range(100)]

for idx, x in enumerate(hundredInts):
  if x == 4:
    print(idx)
    break

I must concede that this example is contrived, since we can just use `.index()`, but such situations occur much more naturally in practice. A similar keyword is `continue`. It terminates the current iteration and immediately begins the next one:

In [None]:
for x in range(5):

  if x == 2:
    continue

  # when x == 2, this print statement won't be reached
  # but the loop will continue on to the next iteration (where x == 3)
  print(x)
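The two keywords can appear in the same loop. In the sketch below, which uses a made-up `readings` list, negative values are skipped with `continue`, and the loop ends with `break` at the first `None`, treated here as a hypothetical end-of-data marker.

```python
# hypothetical measurements; None marks the end of the usable data
readings = [3.2, -1.0, 4.8, 0.5, -2.2, None, 7.1]
valid = []

for r in readings:

  if r is None:
    break       # stop the loop entirely; 7.1 is never reached

  if r < 0:
    continue    # skip this reading and move on to the next one

  valid.append(r)

print(valid)
```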

We've covered a lot. This is an indication of how common iterating with `for` loops is.

#### **Exercises**

1. Use a for loop to print all multiples of 3 between 33 and 103, inclusive.

2. Using `zip()` and the tuples of chemical elements and symbols given (`elements` and `symbols`), print the atomic number and symbol of an element if the element's name contains the letter 'e'.

3. Create a new dict such that the keys of the new dict are the values of `myDict` and vice versa. To add a new key-value pair to an existing dict, use `dict_variable[new_key] = new_value`.

4. a. Use list comprehension to create a list of multiples of 3 less than 60 which are NOT multiples of 5.

   b. Using that list, make a copy without even numbers. Do so two ways: with and without list comprehension.

5. a. Create a function `factor(n)` which `return`s a list of all the factors of a positive integer `n`. To do so, recognize that all factors will be in the range [1,`n`]. So you simply need to test whether each of these is a factor of `n`.

   b. Iterate over the list of numbers given (`nums`). Each time a number is a perfect number, the variable `a` should be doubled. The loop should stop as soon as a number divisible by 9 is reached. A perfect number is a number which is equal to the sum of its factors (excluding itself), e.g., 6 = 1 + 2 + 3. Note that `sum(list)` is a built-in function which will return the sum of a list of ints/floats.

6. The list of compounds given has mixed capitalization in the various strings.

   a. Use list comprehension to create a new list `lowerCompounds` with the same strings, only with all lowercase letters.

   b. Using `lowerCompounds`, create a second new list `lowerCompoundsNoDup` without duplicates.

   c. Iterate over all members of `lowerCompoundsNoDup`, printing the number of occurrences of each in `lowerCompounds`.

In [None]:
elements = ('hydrogen', 'helium', 'lithium', 'beryllium', 'boron', 'carbon',
            'nitrogen', 'oxygen', 'fluorine', 'neon', 'sodium', 'magnesium')
symbols = ('H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 'Na', 'Mg')



In [None]:
myDict = dict(A = 1, B = 2, C = 3)

newDict = dict() # "initialize" new dict before filling it

# for loop goes here

In [None]:
nums = [123, 11, 233, 674, 233333, 6, 12, 34, 8128, 93, 345, 496, 134, 2, 113,
        394, 5967, 594, 2444, 28, 294, 91022, 4910, 5, 2105]



In [None]:
compounds = ['Triethylamine', 'benzyl bromide', 'ammonium chloride', 'triethylamine',
             'Tert-butyl alcohol', 'Benzyl Bromide', 'D-alanine', 'tert-butyl alcohol',
             'd-Alanine', 'd-alanine', 'Ammonium chloride', 'Scandium triflate']



### v. `while` loops

When we used a `for` loop with a conditional `break` statement, we were saying: "Iterate over this iterable, running the code block underneath for each iteration, until some condition is `True`, then stop the loop and move on." Suppose we want instead to say: "Run this code block again and again until some condition is `False`, then stop the loop and move on." This is similar, but we don't have an iterable to iterate over; we want the loop to continue indefinitely until the condition is false.

For instance, suppose we want to keep track of how many tries it takes to output a certain number from a random number generator. We don't know beforehand how many tries it will take, so `range()` isn't well suited for this. Really, `for` loops in general aren't well suited, since we could theoretically be looping forever, and we don't really need to iterate *over* anything anyway. We'll use a `while` loop:

In [None]:
var = 0

while var != 5:
  var = randint(1,10)
  print(var)

print('The loop is finished.')
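To actually keep count of the tries, one option is a counter variable that goes up by one in each iteration. This is a minimal sketch; the `tries` name is made up, and the count will differ from run to run because the numbers are random.

```python
from random import randint

var = 0
tries = 0  # how many random numbers we have drawn so far

while var != 5:
  var = randint(1, 10)
  tries += 1

print('It took', tries, 'tries to get a 5.')
```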

The syntax is simple: `while condition:` is followed by an indented code block to execute. The loop will (1) evaluate the condition; if the condition is `False`, the loop stops; otherwise, (2) the indented code block is run, and (3) we go back to (1) and start over.

So in the above, we did:

![Binary decision tree corresponding to while loop.](https://www.jasondjosephson.com/whileLoop.png)
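The same three steps can be traced by hand on a loop with no randomness. This sketch, with made-up variable names, counts how many times 100 can be halved before the result drops below 1.

```python
value = 100
halvings = 0

while value >= 1:      # (1) check the condition
  value = value / 2    # (2) run the indented block
  halvings += 1
                       # (3) go back to (1)

print(halvings, value)
```

The condition is checked before every pass, including the first, so if `value` started below 1 the block would never run at all.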

If you want to create a program that runs indefinitely, `while` loops are your go-to. Simply make the condition always evaluate to `True`; the simplest way to do this is:

In [None]:
# WARNING: IF YOU RUN THIS, THE CODE WILL NEVER STOP
# PRESS THE STOP BUTTON OR CTRL + M + I IF IN COLAB OR JUPYTER NOTEBOOK
# OTHERWISE PRESS CTRL+C TO STOP IT

while True:
  print('Press the stop button / ctrl+c to stop this code')

You might want to run a while loop forever, but only have the code run, say, every 5 seconds. We can use the built-in `time` module to pause code execution for n seconds with the `sleep(n)` function.

In [None]:
# AGAIN, YOU HAVE TO MANUALLY STOP THIS CELL
# OR IT WILL NEVER END

from time import sleep

loop_num = 0

while True:
  sleep(5)           # stop running for 5 sec, then continue
  loop_num += 1
  print(loop_num)
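While experimenting, it can be convenient to use a version that stops on its own. This sketch swaps `while True` for a condition on the counter and uses a much shorter pause; both the limit of 3 and the 0.1-second delay are arbitrary choices for illustration.

```python
from time import sleep

loop_num = 0

while loop_num < 3:    # stops by itself after 3 iterations
  sleep(0.1)           # pause for 0.1 sec, then continue
  loop_num += 1
  print(loop_num)
```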

#### **Exercises**

1. Use the `sleep()` function of the `time` module to create a while loop which every 20 minutes prints a reminder to drink water.

2. As mentioned in iii., the `.remove(a)` method of a list only removes the first value of `a` in the list. Write a function `removeAll(l, a)` which uses a `while` loop and the `.remove()` method to remove all occurrences of `a` from the list `l`.

3. a. Initialize a variable `n` by setting it to 0. Create a while loop such that in each iteration (1) a random integer, call it `step`, is chosen as -1 or 0 or 1, and (2) `step` is added to `n`. The while loop continues to run until `n` equals 5.

   b. It would be interesting to see how many iterations it takes to get to 5. Before the while loop, create a `counter` variable, initially 0. At the start of each iteration, increase the value of `counter` by 1. Then, after the loop is complete, print `counter`. Let's also include a precaution, since this could theoretically run forever: in addition to the existing condition, if `counter` exceeds 100,000, we'll stop the loop.

   c. We should find the median number of iterations it takes. Perform the following steps: Nest the while loop (and the initializations of `n` and `counter`) within a for loop that runs 10,000 times. Outside of the for loop, initialize an empty list, `counterHistory`. Within the for loop, after the while loop finishes, `append` the result of `counter` to `counterHistory`, thereby recording the value of `counter` for that run. Lastly, import the `statistics` module and use its `median()` function, which takes a list as an argument and returns the median, to get the median value of `counterHistory`.

   d. How does the median value of `counterHistory` change if we stop the while loop when `n` equals 4 instead? How about if it must equal 10?