Python Lectures

Introducing Python

  • It allows for more flexibility than a GUI based GIS program.
  • Other GIS applications (e.g. QGIS, ArcGIS, postGIS) often have a Python interface.
  • generally a widely used language.

Environment

Before we get started we need to set up a Python environment in which to work. We will be using Anaconda for this. It should already be installed on the lab computers. If you are using your own computer

  • Download the environment.yml
  • If you are working on a lab computer In the start menu search for Anaconda. Open an Anaconda Powershell. (on linux or mac with Anconda or miniconda or Mamba installed just open a terminal)(If you are installing on your own computer I recommend Mamba)
cd ~
mkdir geog441
cd geog441
cp ~/Downloads/environment.yml .
conda env create -f environment.yml

What is this environment.yml thing?

name: geo
channels:
  - defaults
dependencies: 
  - geopandas
  - jupyterlab
  - rioxarray
  - tabulate
  - tqdm

Using conda env create with the environment yaml creates a new conda environment named geo with the dependencies installed. Next we will activate the geo environment.

conda activate geo

Notice that the conda environment shown after the prompt has changed from (base) to (geo).

If later you find that you need another package you can add it to the environment (while in the environment) with conda install <whatever_package>.

More info on managing conda environments

Starting a Jupyter Lab

jupyter lab

This will open a new jupyter lab in your browser.

Using VScode

Running Jupyter in VScode offers some advantages (in my opinion). To do this you need to install the jupyter extension and the Python Extension. Then when you open a file with a .ipynb extension it will be treated as a Jupyter notebook.

Basic Python data types

Type Example(s)
String 'Dude!'
Float 1.2
Int 3
Tuple ('x', 'y')
(1, 2)
('x', 3.2)
List ['x', 'y']
[1, 2]
Possible but bad โ€“> ['x', 3.2]
Dict {'dogs': 26, 'cats', 100}
etcโ€ฆ there are others

Basic Numbers

Floats and Ints donโ€™t do anything all that surprising

a = 2 + 2
b = 2.0 + 2.0
c = a + b


print(f'a is an {type(a)}')
print(f'b is a {type(b)}')
print(f'c is a {type(c)}')
print(f'a / b is {a / b}')
print(f'5 / 4 is {5 / 4}')
print(f'but 5 // 4 is {5 // 4}')
print(f'and 5 % 4 is {5 % 4}')
a is an <class 'int'>
b is a <class 'float'>
c is a <class 'float'>
a / b is 1.0
5 / 4 is 1.25
but 5 // 4 is 1
and 5 % 4 is 1

Exercise - Basic Numbers

Try the following, print the results, see what they do.

a = 5 // 4
b = 5 % 4
c = a + b

Sequences

Lists, tuples, and strings ae all sequences

# a list
a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# a tuple
a_tup = (1, 2, 3, 4, 5, 6, 7, 8, 9)

# a string
a_str = 'Wil je graag een neushorn?'

# access by index
a = a_list[0]
b = a_list[-1]
c = a_list[4]
print(f'by index:\n 0 --> {a},\n-1 --> {b},\n 4 --> {c}')

# works fro strings to
print('string item at 4 --> ', a_str[4])

# you can slice a list or tuple(remember 0 indexed)
print('\nslices:')
print(a_list[2:5])
print(a_list[8:])
print(a_tup[2:5])
print(a_str[-9:])
by index:
 0 --> 1,
-1 --> 9,
 4 --> 5
string item at 4 -->  j

slices:
[3, 4, 5]
[9]
(3, 4, 5)
neushorn?

Slicing
Sequences tutorial
Sequence tutorial as video

Exercise - Lists and Tuples

The key between lists and tuples is that tuples are immutable. Try assigning a value to both of the following.

# a list
a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# a tuple
a_tup = (1, 2, 3, 4, 5, 6, 7, 8, 9)

Dictionaries

pets = {'honden': 26, 'katten': 100}

for key, val in pets.items():
  print(f'{val} {key}')

print(f'{pets["honden"] + pets["katten"]} huisdieren')
26 honden
100 katten
126 huisdieren

More on dictionaries

Iterating

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# use a loop to append squared values to new lists
b = []
for n in a:
  b.append(n**2)

# or, better, use a comprehension
c = [n**2 for n in a]

print(b)
print('is the same as')
print(c)
[1, 4, 9, 16, 25, 36, 49, 64, 81]
is the same as
[1, 4, 9, 16, 25, 36, 49, 64, 81]

Exercise - Iterating

You can nest loops.

numbers = []
for i in range(3):
  for j in range(3):
    numbers.append((i, j))

numbers
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
  • Recreate the above with a list comprehension.
  • replace the square brackets in this list comprehension, = [n**2 for n in a], with parenthesis.
  • Can you create a dictionary with a comprehension.

Functions

  • Defined with def
  • Can take arguments
  • can return something
  • have there own scope
import numpy as np

def root_mean_square(list_of_numbers):
  rms = np.sqrt(
    np.array([n**2 for n in list_of_numbers]).sum() /
    len(list_of_numbers)
    )

  return rms

l = [1.2, 3.3, 1.4, 1.4, 4.2]
rms = root_mean_square(l)

print(rms)
2.6034592372457075

They do not have to have arguments or return stuff.

def complain():
  print('This is boring!\nwhen will we get to the GIS stuff.\n\U0001F620\U0001F620\U0001F620\n\nSoon! Bear with me.')

complain()
This is boring!
when will we get to the GIS stuff.
๐Ÿ˜ ๐Ÿ˜ ๐Ÿ˜ 

Soon! Bear with me.

Python is an object oriented programming language. Everything that you can assign to a variable (which is almost everything) can be though of as an object, for example, built in functions are objects.

zzz = print
zzz('toenails')
toenails

Many objects have built in functions as an attribute, these are called methods. For instance, strings have a method called split (they have many other methods as well).

s = 'hyphens-are-everywhere-they-haunt-me-in-my-dreams'
s.split('-')
['hyphens', 'are', 'everywhere', 'they', 'haunt', 'me', 'in', 'my', 'dreams']

We will encounter heaps of methods later.

Exercise - Functions

Use a function and a list comprehension to reproduce the results of the for loop below:

l = [(4, 3, 2), (2, 2, 2), (1, 2, 3)]

results = []

for t in l:
  results.append(
    t[0] + t[1] - t[2] +
    (t[0] * t[1] + t[2])**t[1] /
    (t[1]**2 * 5**t[0])
  )

results
[5.487822222222222, 2.36, 1.25]

Take some time to contemplate, what is this?

import numpy as np

nested_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

np.array(nested_list)
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

BTW, you can import a library as anything you want

import numpy as frogz_on_crack

frogz_on_crack.array([1, 2, 3]) 
array([1, 2, 3])

Why would one avoid doing this kind of thing?

You can also dump all of the objects from a module into the main namespace

# from numpy import everything
from numpy import *

array([1, 2, 3 ])
array([1, 2, 3])

Why is this a bad practice? What Is a namespace

x = 'I love \U0001F30D'

def print_wrapper():
    print(x)

def print_poo():
    x = 'This is the value of x within the function'
    print(x)

print_wrapper()
print_poo()

print(x)
I love ๐ŸŒ
This is the value of x within the function
I love ๐ŸŒ

Pandas DataFrame

DataFrames hold tabular data. Here we will read a csv file as a DataFrame and mess with it.

  1. This cell just downloads the file from a url. Take a few minutes and see if you can change it to be generally useful by allowing the url and save location to be specified as arguments. Hint, you need to change very little, and you donโ€™t really need to know how the function works in any detail.
import requests
from pathlib import Path

def get_NZ_file():
  '''
  A purposefully bad function to download one specific file.
  '''
  # url to Dec 2024 NZ GH emissions by industry (long url!)
  url = 'https://www.stats.govt.nz/assets/Uploads/Greenhouse-gas-emissions-industry-and-household/Greenhouse-gas-emissions-industry-and-household-December-2024-quarter/Download-data/greenhouse-gas-emissions-industry-and-household-december-2024-quarter.csv'
  # make a dir for data, if does not exist
  save_dir = Path.cwd() / 'data'
  save_dir.mkdir(exist_ok=True)
  # get file name
  basename = url.split('/')[-1]
  # and a directory to save to 
  save_path = save_dir / basename

  try:
      response = requests.get(url, stream=True)
      # raise HTTPError for bad responses (4xx or 5xx)
      response.raise_for_status() 
      with open(save_path, 'wb') as file:
        for chunk in response.iter_content(chunk_size=8192):
          file.write(chunk)
      print(f"File downloaded successfully to {save_path}")

  except requests.exceptions.RequestException as e:
    print(f'An error occurred: {e}')
  except Exception as e:
    print(f'An unexpected error occurred: {e}')

  return save_path

file_path = get_NZ_file()
File downloaded successfully to /home/michael/Work/geog441/data/greenhouse-gas-emissions-industry-and-household-december-2024-quarter.csv
  1. Read the csv, and look at the head (first few rows). Notice that there are column names, and on the left side there is a numerical index.
import pandas as pd

df = pd.read_csv(file_path)
df.head()
Anzsic Anzsic_descriptor Gas Period Data_value Variable Units Magnitude
0 AAZ Agriculture, forestry, fishing Carbon dioxide equivalents 2010.03 10875 Seasonally adjusted Kilotonnes Carbon dioxide equivalents
1 AAZ Agriculture, forestry, fishing Carbon dioxide equivalents 2010.06 11003 Seasonally adjusted Kilotonnes Carbon dioxide equivalents
2 AAZ Agriculture, forestry, fishing Carbon dioxide equivalents 2010.09 10993 Seasonally adjusted Kilotonnes Carbon dioxide equivalents
3 AAZ Agriculture, forestry, fishing Carbon dioxide equivalents 2010.12 10914 Seasonally adjusted Kilotonnes Carbon dioxide equivalents
4 AAZ Agriculture, forestry, fishing Carbon dioxide equivalents 2011.03 11014 Seasonally adjusted Kilotonnes Carbon dioxide equivalents

We can also see the last few rows if we want.

df.tail()
Anzsic Anzsic_descriptor Gas Period Data_value Variable Units Magnitude
4795 ZZZ Total Nitrous oxide 2023.12 1708 Actual Kilotonnes Carbon dioxide equivalents
4796 ZZZ Total Nitrous oxide 2024.03 1615 Actual Kilotonnes Carbon dioxide equivalents
4797 ZZZ Total Nitrous oxide 2024.06 1482 Actual Kilotonnes Carbon dioxide equivalents
4798 ZZZ Total Nitrous oxide 2024.09 1652 Actual Kilotonnes Carbon dioxide equivalents
4799 ZZZ Total Nitrous oxide 2024.12 1662 Actual Kilotonnes Carbon dioxide equivalents

Or we can slice from the middle using index slicing, just like with a list.

df[100:105]
Anzsic Anzsic_descriptor Gas Period Data_value Variable Units Magnitude
100 BB1 Mining Carbon dioxide equivalents 2020.03 307 Seasonally adjusted Kilotonnes Carbon dioxide equivalents
101 BB1 Mining Carbon dioxide equivalents 2020.06 256 Seasonally adjusted Kilotonnes Carbon dioxide equivalents
102 BB1 Mining Carbon dioxide equivalents 2020.09 280 Seasonally adjusted Kilotonnes Carbon dioxide equivalents
103 BB1 Mining Carbon dioxide equivalents 2020.12 271 Seasonally adjusted Kilotonnes Carbon dioxide equivalents
104 BB1 Mining Carbon dioxide equivalents 2021.03 276 Seasonally adjusted Kilotonnes Carbon dioxide equivalents

There are many other ways to select from Pandas DataFrames.

The Period column gives dates, but they are in a format that will be interpreted as floats (e.g. 20007.08). Here we change them to datetimes and use them to set the index, giving us a DateTime index.

Earlier we mentioned methods, and we saw that strings have method split() which splits the string. Columns in Pandas that are of the type str, that is to say string, have all of the same methods under an attribute called str.

This cell uses many methods strung together to change the Period column to DateTime format and set the index.

df.index = pd.to_datetime(
  df.Period.astype('str').str.split('.').str.join('-')
)

df.head()
Anzsic Anzsic_descriptor Gas Period Data_value Variable Units Magnitude
Period
2010-03-01 AAZ Agriculture, forestry, fishing Carbon dioxide equivalents 2010.03 10875 Seasonally adjusted Kilotonnes Carbon dioxide equivalents
2010-06-01 AAZ Agriculture, forestry, fishing Carbon dioxide equivalents 2010.06 11003 Seasonally adjusted Kilotonnes Carbon dioxide equivalents
2010-09-01 AAZ Agriculture, forestry, fishing Carbon dioxide equivalents 2010.09 10993 Seasonally adjusted Kilotonnes Carbon dioxide equivalents
2010-12-01 AAZ Agriculture, forestry, fishing Carbon dioxide equivalents 2010.12 10914 Seasonally adjusted Kilotonnes Carbon dioxide equivalents
2011-03-01 AAZ Agriculture, forestry, fishing Carbon dioxide equivalents 2011.03 11014 Seasonally adjusted Kilotonnes Carbon dioxide equivalents

See what gasses are measured by looking at unique values in the Gas column.

df.Gas.unique()
array(['Carbon dioxide equivalents', 'Methane', 'Carbon dioxide',
       'Fluorinated gases', 'Nitrous oxide'], dtype=object)
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1)

df[(df.index.month == 6) &
  (df.Gas == 'Nitrous oxide') &
  (df.Anzsic == 'ZZZ')
].plot.scatter('Period', 'Data_value', ax=ax)
ax.set_title('NZ June Nitrous Oxide Emissions')
ax.set_ylabel('Kilotonnes')
ax.set_xlabel('Year')
Text(0.5, 0, 'Year')

Pandas DataFrame Cont.

There is a lot more to know about Pandas DataFrames. This is enough for us to move on to GeoPandas. I encourage you to learn more about Pandas as an exercise.

Exploring Vector Data with Geopandas

Geopandas

Vector Data

Shapely

  • Shapely Geometric Objects consist of coordinate tuples:
    • Point - (x, y) or three dimensional (x, y, z), e.g. Point(5.2, 52.1)
    • LineString - List if coordinates of vertices, e.g. LineString([(0, 0), (1, 2)])
    • Polygon - Basically a closed linestring, e.g. Polygon(((0., 0.), (0., 1.), (1., 1.), (1., 0.), (0., 0.)))
    • Notice that the first and last coord of the Polygon are the same.
    • More on Shapely geometries

Polygon

from shapely import Polygon

coords = ((0., 0.), (0., 2.), (1., 1.), (1., 0.), (0., 0.))
p = Polygon(coords)

c = p.exterior.coords

print(c[0] == c[-1])
p
True

Reading Files

Here we will read a geojson using the read_file method.

from pathlib import Path
import geopandas as gpd

# path to data
data_dir = Path('data')
geojson = data_dir / 'upper_santa_rita_creek.geojson'

# open the geojson
gdf = gpd.read_file(geojson)

gdf
id geometry
0 globalwatershedpoint POINT (-120.80331 35.52537)
1 globalwatershed POLYGON ((-120.81096 35.50944, -120.81161 35.5...
import matplotlib.pyplot as plt
import contextily as cx

fig, ax = plt.subplots(figsize=(10, 10))
gdf.to_crs(epsg=3857).plot(
  edgecolor='r',
  ax=ax,
  facecolor='none')
cx.add_basemap(
  ax=ax,
  source=cx.providers.USGS.USImageryTopo
)

import geodatasets
chicago = gpd.read_file(geodatasets.get_path('geoda.chicago_commpop'))
chicago.head()
community NID POP2010 POP2000 POPCH POPPERCH popplus popneg geometry
0 DOUGLAS 35 18238 26470 -8232 -31.099358 0 1 MULTIPOLYGON (((-87.60914 41.84469, -87.60915 ...
1 OAKLAND 36 5918 6110 -192 -3.142390 0 1 MULTIPOLYGON (((-87.59215 41.81693, -87.59231 ...
2 FULLER PARK 37 2876 3420 -544 -15.906433 0 1 MULTIPOLYGON (((-87.6288 41.80189, -87.62879 4...
3 GRAND BOULEVARD 38 21929 28006 -6077 -21.698922 0 1 MULTIPOLYGON (((-87.60671 41.81681, -87.6067 4...
4 KENWOOD 39 17841 18363 -522 -2.842673 0 1 MULTIPOLYGON (((-87.59215 41.81693, -87.59215 ...

Plot Cloropleth

chicago.plot(column="POP2010")

The Chicago example is taken from GeoPandas Mapping and plotting tools tutorial, which you should take a look at.

Working with Rasters in Python

Look at the temperature variable tos.