Python Lectures

Introducing Python

It allows for more flexibility than a GUI based GIS program.
Other GIS applications (e.g. QGIS, ArcGIS, postGIS) often have a Python interface.
generally a widely used language.

Environment

Before we get started we need to set up a Python environment in which to work. We will be using Anaconda for this. It should already be installed on the lab computers. If you are using your own computer

Download the environment.yml
If you are working on a lab computer In the start menu search for Anaconda. Open an Anaconda Powershell. (on linux or mac with Anconda or miniconda or Mamba installed just open a terminal)(If you are installing on your own computer I recommend Mamba)

cd ~
mkdir geog441
cd geog441
cp ~/Downloads/environment.yml .
conda env create -f environment.yml

What is this environment.yml thing?

name: geo
channels:
  - defaults
dependencies: 
  - geopandas
  - jupyterlab
  - rioxarray
  - tabulate
  - tqdm

Using conda env create with the environment yaml creates a new conda environment named geo with the dependencies installed. Next we will activate the geo environment.

conda activate geo

Notice that the conda environment shown after the prompt has changed from (base) to (geo).

If later you find that you need another package you can add it to the environment (while in the environment) with conda install <whatever_package>.

More info on managing conda environments

Starting a Jupyter Lab

jupyter lab

This will open a new jupyter lab in your browser.

Using VScode

Running Jupyter in VScode offers some advantages (in my opinion). To do this you need to install the jupyter extension and the Python Extension. Then when you open a file with a .ipynb extension it will be treated as a Jupyter notebook.

Basic Python data types

Type	Example(s)
String	`'Dude!'`
Float	`1.2`
Int	`3`
Tuple	`('x', 'y')` `(1, 2)` `('x', 3.2)`
List	`['x', 'y']` `[1, 2]` Possible but bad –> `['x', 3.2]`
Dict	`{'dogs': 26, 'cats', 100}`
etc…	there are others

Basic Numbers

Floats and Ints don’t do anything all that surprising

a = 2 + 2
b = 2.0 + 2.0
c = a + b


print(f'a is an {type(a)}')
print(f'b is a {type(b)}')
print(f'c is a {type(c)}')
print(f'a / b is {a / b}')
print(f'5 / 4 is {5 / 4}')
print(f'but 5 // 4 is {5 // 4}')
print(f'and 5 % 4 is {5 % 4}')

a is an <class 'int'>
b is a <class 'float'>
c is a <class 'float'>
a / b is 1.0
5 / 4 is 1.25
but 5 // 4 is 1
and 5 % 4 is 1

Exercise - Basic Numbers

Try the following, print the results, see what they do.

a = 5 // 4
b = 5 % 4
c = a + b

Sequences

Lists, tuples, and strings ae all sequences

# a list
a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# a tuple
a_tup = (1, 2, 3, 4, 5, 6, 7, 8, 9)

# a string
a_str = 'Wil je graag een neushorn?'

# access by index
a = a_list[0]
b = a_list[-1]
c = a_list[4]
print(f'by index:\n 0 --> {a},\n-1 --> {b},\n 4 --> {c}')

# works fro strings to
print('string item at 4 --> ', a_str[4])

# you can slice a list or tuple(remember 0 indexed)
print('\nslices:')
print(a_list[2:5])
print(a_list[8:])
print(a_tup[2:5])
print(a_str[-9:])

by index:
 0 --> 1,
-1 --> 9,
 4 --> 5
string item at 4 -->  j

slices:
[3, 4, 5]
[9]
(3, 4, 5)
neushorn?

Slicing
Sequences tutorial
Sequence tutorial as video

Exercise - Lists and Tuples

The key between lists and tuples is that tuples are immutable. Try assigning a value to both of the following.

# a list
a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# a tuple
a_tup = (1, 2, 3, 4, 5, 6, 7, 8, 9)

Dictionaries

pets = {'honden': 26, 'katten': 100}

for key, val in pets.items():
  print(f'{val} {key}')

print(f'{pets["honden"] + pets["katten"]} huisdieren')

26 honden
100 katten
126 huisdieren

Iterating

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# use a loop to append squared values to new lists
b = []
for n in a:
  b.append(n**2)

# or, better, use a comprehension
c = [n**2 for n in a]

print(b)
print('is the same as')
print(c)

[1, 4, 9, 16, 25, 36, 49, 64, 81]
is the same as
[1, 4, 9, 16, 25, 36, 49, 64, 81]

Exercise - Iterating

You can nest loops.

numbers = []
for i in range(3):
  for j in range(3):
    numbers.append((i, j))

numbers

[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]

Recreate the above with a list comprehension.
replace the square brackets in this list comprehension, = [n**2 for n in a], with parenthesis.
Can you create a dictionary with a comprehension.

Functions

Defined with def
Can take arguments
can return something
have there own scope

import numpy as np

def root_mean_square(list_of_numbers):
  rms = np.sqrt(
    np.array([n**2 for n in list_of_numbers]).sum() /
    len(list_of_numbers)
    )

  return rms

l = [1.2, 3.3, 1.4, 1.4, 4.2]
rms = root_mean_square(l)

print(rms)

2.6034592372457075

They do not have to have arguments or return stuff.

def complain():
  print('This is boring!\nwhen will we get to the GIS stuff.\n\U0001F620\U0001F620\U0001F620\n\nSoon! Bear with me.')

complain()

This is boring!
when will we get to the GIS stuff.
😠😠😠

Soon! Bear with me.

Python is an object oriented programming language. Everything that you can assign to a variable (which is almost everything) can be though of as an object, for example, built in functions are objects.

zzz = print
zzz('toenails')

toenails

Many objects have built in functions as an attribute, these are called methods. For instance, strings have a method called split (they have many other methods as well).

s = 'hyphens-are-everywhere-they-haunt-me-in-my-dreams'
s.split('-')

['hyphens', 'are', 'everywhere', 'they', 'haunt', 'me', 'in', 'my', 'dreams']

We will encounter heaps of methods later.

Exercise - Functions

Use a function and a list comprehension to reproduce the results of the for loop below:

l = [(4, 3, 2), (2, 2, 2), (1, 2, 3)]

results = []

for t in l:
  results.append(
    t[0] + t[1] - t[2] +
    (t[0] * t[1] + t[2])**t[1] /
    (t[1]**2 * 5**t[0])
  )

results

[5.487822222222222, 2.36, 1.25]

Take some time to contemplate, what is this?

import numpy as np

nested_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

np.array(nested_list)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

BTW, you can import a library as anything you want

import numpy as frogz_on_crack

frogz_on_crack.array([1, 2, 3])

array([1, 2, 3])

Why would one avoid doing this kind of thing?

You can also dump all of the objects from a module into the main namespace

# from numpy import everything
from numpy import *

array([1, 2, 3 ])

array([1, 2, 3])

Why is this a bad practice? What Is a namespace

x = 'I love \U0001F30D'

def print_wrapper():
    print(x)

def print_poo():
    x = 'This is the value of x within the function'
    print(x)

print_wrapper()
print_poo()

print(x)

I love 🌍
This is the value of x within the function
I love 🌍

Pandas DataFrame

DataFrames hold tabular data. Here we will read a csv file as a DataFrame and mess with it.

This cell just downloads the file from a url. Take a few minutes and see if you can change it to be generally useful by allowing the url and save location to be specified as arguments. Hint, you need to change very little, and you don’t really need to know how the function works in any detail.

import requests
from pathlib import Path

def get_NZ_file():
  '''
  A purposefully bad function to download one specific file.
  '''
  # url to Dec 2024 NZ GH emissions by industry (long url!)
  url = 'https://www.stats.govt.nz/assets/Uploads/Greenhouse-gas-emissions-industry-and-household/Greenhouse-gas-emissions-industry-and-household-December-2024-quarter/Download-data/greenhouse-gas-emissions-industry-and-household-december-2024-quarter.csv'
  # make a dir for data, if does not exist
  save_dir = Path.cwd() / 'data'
  save_dir.mkdir(exist_ok=True)
  # get file name
  basename = url.split('/')[-1]
  # and a directory to save to 
  save_path = save_dir / basename

  try:
      response = requests.get(url, stream=True)
      # raise HTTPError for bad responses (4xx or 5xx)
      response.raise_for_status() 
      with open(save_path, 'wb') as file:
        for chunk in response.iter_content(chunk_size=8192):
          file.write(chunk)
      print(f"File downloaded successfully to {save_path}")

  except requests.exceptions.RequestException as e:
    print(f'An error occurred: {e}')
  except Exception as e:
    print(f'An unexpected error occurred: {e}')

  return save_path

file_path = get_NZ_file()

File downloaded successfully to /home/michael/CP/geog441/data/greenhouse-gas-emissions-industry-and-household-december-2024-quarter.csv

Read the csv, and look at the head (first few rows). Notice that there are column names, and on the left side there is a numerical index.

import pandas as pd

df = pd.read_csv(file_path)
df.head()

	Anzsic	Anzsic_descriptor	Gas	Period	Data_value	Variable	Units	Magnitude
0	AAZ	Agriculture, forestry, fishing	Carbon dioxide equivalents	2010.03	10875	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents
1	AAZ	Agriculture, forestry, fishing	Carbon dioxide equivalents	2010.06	11003	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents
2	AAZ	Agriculture, forestry, fishing	Carbon dioxide equivalents	2010.09	10993	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents
3	AAZ	Agriculture, forestry, fishing	Carbon dioxide equivalents	2010.12	10914	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents
4	AAZ	Agriculture, forestry, fishing	Carbon dioxide equivalents	2011.03	11014	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents

We can also see the last few rows if we want.

df.tail()

	Anzsic	Anzsic_descriptor	Gas	Period	Data_value	Variable	Units	Magnitude
4795	ZZZ	Total	Nitrous oxide	2023.12	1708	Actual	Kilotonnes	Carbon dioxide equivalents
4796	ZZZ	Total	Nitrous oxide	2024.03	1615	Actual	Kilotonnes	Carbon dioxide equivalents
4797	ZZZ	Total	Nitrous oxide	2024.06	1482	Actual	Kilotonnes	Carbon dioxide equivalents
4798	ZZZ	Total	Nitrous oxide	2024.09	1652	Actual	Kilotonnes	Carbon dioxide equivalents
4799	ZZZ	Total	Nitrous oxide	2024.12	1662	Actual	Kilotonnes	Carbon dioxide equivalents

Or we can slice from the middle using index slicing, just like with a list.

df[100:105]

	Anzsic	Anzsic_descriptor	Gas	Period	Data_value	Variable	Units	Magnitude
100	BB1	Mining	Carbon dioxide equivalents	2020.03	307	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents
101	BB1	Mining	Carbon dioxide equivalents	2020.06	256	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents
102	BB1	Mining	Carbon dioxide equivalents	2020.09	280	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents
103	BB1	Mining	Carbon dioxide equivalents	2020.12	271	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents
104	BB1	Mining	Carbon dioxide equivalents	2021.03	276	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents

There are many other ways to select from Pandas DataFrames.

The Period column gives dates, but they are in a format that will be interpreted as floats (e.g. 20007.08). Here we change them to datetimes and use them to set the index, giving us a DateTime index.

Earlier we mentioned methods, and we saw that strings have method split() which splits the string. Columns in Pandas that are of the type str, that is to say string, have all of the same methods under an attribute called str.

This cell uses many methods strung together to change the Period column to DateTime format and set the index.

df.index = pd.to_datetime(
  df.Period.astype('str').str.split('.').str.join('-')
)

df.head()

	Anzsic	Anzsic_descriptor	Gas	Period	Data_value	Variable	Units	Magnitude
Period
2010-03-01	AAZ	Agriculture, forestry, fishing	Carbon dioxide equivalents	2010.03	10875	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents
2010-06-01	AAZ	Agriculture, forestry, fishing	Carbon dioxide equivalents	2010.06	11003	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents
2010-09-01	AAZ	Agriculture, forestry, fishing	Carbon dioxide equivalents	2010.09	10993	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents
2010-12-01	AAZ	Agriculture, forestry, fishing	Carbon dioxide equivalents	2010.12	10914	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents
2011-03-01	AAZ	Agriculture, forestry, fishing	Carbon dioxide equivalents	2011.03	11014	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents

See what gasses are measured by looking at unique values in the Gas column.

df.Gas.unique()

array(['Carbon dioxide equivalents', 'Methane', 'Carbon dioxide',
       'Fluorinated gases', 'Nitrous oxide'], dtype=object)

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1)

df[(df.index.month == 6) &
  (df.Gas == 'Nitrous oxide') &
  (df.Anzsic == 'ZZZ')
].plot.scatter('Period', 'Data_value', ax=ax)
ax.set_title('NZ June Nitrous Oxide Emissions')
ax.set_ylabel('Kilotonnes')
ax.set_xlabel('Year')

Text(0.5, 0, 'Year')

Pandas DataFrame Cont.

There is a lot more to know about Pandas DataFrames. This is enough for us to move on to GeoPandas. I encourage you to learn more about Pandas as an exercise.

Exploring Vector Data with Geopandas

Geopandas

Vector Data

Most often we will interact with vector data using GeoPandas
- GeoPandas is an extension of Pandas.
- It uses Fiona, which in-turn relies on GDAL and OGR, to read and write.
- Vector data is handled by the Shapely module, which relies on GEOS.
- GEOS implements the OGC Simple Features geometry
Projections are managed by pyproj which is built upon PROJ

Shapely

Shapely Geometric Objects consist of coordinate tuples:
- Point - (x, y) or three dimensional (x, y, z), e.g. Point(5.2, 52.1)
- LineString - List if coordinates of vertices, e.g. LineString([(0, 0), (1, 2)])
- Polygon - Basically a closed linestring, e.g. Polygon(((0., 0.), (0., 1.), (1., 1.), (1., 0.), (0., 0.)))
- Notice that the first and last coord of the Polygon are the same.
- More on Shapely geometries

Polygon

from shapely import Polygon

coords = ((0., 0.), (0., 2.), (1., 1.), (1., 0.), (0., 0.))
p = Polygon(coords)

c = p.exterior.coords

print(c[0] == c[-1])
p

True

Reading Files

Here we will read a geojson using the read_file method.

from pathlib import Path
import geopandas as gpd

# path to data
data_dir = Path('data')
geojson = data_dir / 'upper_santa_rita_creek.geojson'

# open the geojson
gdf = gpd.read_file(geojson)

gdf

	id	geometry
0	globalwatershedpoint	POINT (-120.80331 35.52537)
1	globalwatershed	POLYGON ((-120.81096 35.50944, -120.81161 35.5...

import matplotlib.pyplot as plt
import contextily as cx

fig, ax = plt.subplots(figsize=(10, 10))
gdf.to_crs(epsg=3857).plot(
  edgecolor='r',
  ax=ax,
  facecolor='none')
cx.add_basemap(
  ax=ax,
  source=cx.providers.USGS.USImageryTopo
)

import geodatasets
chicago = gpd.read_file(geodatasets.get_path('geoda.chicago_commpop'))
chicago.head()

	community	NID	POP2010	POP2000	POPCH	POPPERCH	popneg	geometry
0	DOUGLAS	35	18238	26470	-8232	-31.099358	1	MULTIPOLYGON (((-87.60914 41.84469, -87.60915 ...
1	OAKLAND	36	5918	6110	-192	-3.142390	1	MULTIPOLYGON (((-87.59215 41.81693, -87.59231 ...
2	FULLER PARK	37	2876	3420	-544	-15.906433	1	MULTIPOLYGON (((-87.6288 41.80189, -87.62879 4...
3	GRAND BOULEVARD	38	21929	28006	-6077	-21.698922	1	MULTIPOLYGON (((-87.60671 41.81681, -87.6067 4...
4	KENWOOD	39	17841	18363	-522	-2.842673	1	MULTIPOLYGON (((-87.59215 41.81693, -87.59215 ...

Plot Cloropleth

chicago.plot(column="POP2010")

The Chicago example is taken from GeoPandas Mapping and plotting tools tutorial, which you should take a look at.

Working with Rasters in Python

Look at the temperature variable tos.