Static Type Checking in Python

Lecture 1 — CS 12 (Computer Programming II)
Second Semester, AY 2024-2025
Department of Computer Science
University of the Philippines Diliman

Lecture 1 outline

Administrivia
Statically typed Python
Reading assignment: Review of Pythonic features

Administrivia

  • To be up by tomorrow:
    • UVLe section
    • Lecture 0 slide deck
    • Lecture 1 slide deck
    • Student information form
    • Pyright installation guide

Statically typed Python

Motivating example

Problem: Find the bug in the code below:
from collections import Counter
 
def mode(xs):
    if not xs:
        return None
 
    ctr = Counter(xs)
    largest_count = max(ctr.values())
    mode_candidate_count = list(ctr.values()).count(largest_count)
 
    return ctr.most_common()[0][0] if mode_candidate_count == 1 else None
 
answer = mode([1, 2, 3, 1])
square_of_mode = answer**2

Is there an easier way to identify this?

Type hints [1/2]

Type hint/annotation: Explicit declaration of types of variables and return values
  • Code without type hints:
  • students = 93
    nums = []
    names = []
  • Syntax for annotating variable declarations:
  • <variable-name>: <type> = <value>
    students: int = 93
    nums: list[int] = []
    names: list[str] = []
    • Must specify type of elements of list inside square brackets
    • Type of variable must not change

Type hints [2/2]

  • Code without type hints:
  • def is_prime(n):
        ...
     
    def print_larger_len(a, b):
        ...
  • Syntax for annotating function parameters and return type:
  • def <function-name>(<arg>: <type>[, <arg>: <type>]) -> <return-type>:
        <block>
    def is_prime(n: int) -> bool:
        ...
     
    def print_larger_len(a: str, b: str) -> None:
        ...

Why bother adding type hints? [1/2]

  • Suppose this is in a codebase you are to modify:
  • def update(spaceship):
        ...
  • What is the expected type of spaceship?
  • What type does update return?

Need to go through function body to find out

Why bother adding type hints? [2/2]

  • Same code, but with type hints:
  • def update(spaceship: dict[str, int]) -> None:   # Note: dict[str, int]
        ...                                          # means the keys are strs
                                                     # while the values are ints
  • What is the expected type of spaceship?
  • What type does update return?

Above can be answered without going through function body; saves time and effort
  • Toy example; imagine larger codebase
  • Type-mentioning comments can also address this
  • What do type hints provide that comments do not?

Type checking

Type checking: Process of checking whether the type rules of a programming language (PL) are violated by code
  • Type rule violation is called a type error

  • Python type rule example: str and int cannot be added together
    • When is this rule checked?

Kind                  | When             | Python
Static type checking  | Before execution | Not done by default
Dynamic type checking | During execution | Done by default

Dynamic type checking in Python

  • Type rule: str and int cannot be added together

  • Code below violates the above type rule:
  • def f(a, b):      # Intention is for both params to be ints
        return a + b
     
    f(1, 2)           # Passes type checking
     
    f(12, "hello")    # Does not pass type checking;
                      # raises TypeError during execution (crash!)
In general, we do not want our programs to crash during runtime
  • Python allows running of programs with type errors
  • Better if we can flag the above error before runtime; how?
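
A minimal sketch of dynamic type checking in action, using the untyped f from this slide. The try/except wrapper and the `outcome` variable are additions for illustration only; they let the script keep running so the error can be inspected.

```python
def f(a, b):  # Intention is for both params to be ints
    return a + b


print(f(1, 2))  # 3

outcome = "no error"
try:
    f(12, "hello")  # int + str violates the type rule
except TypeError as error:
    # The violation is detected only when this call actually executes
    outcome = f"TypeError: {error}"

print(outcome)
```

Nothing is reported until the offending call runs, which is why a bug on a rarely executed path can go unnoticed for a long time.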

Static type checking and type hints

  • Same code, but with type hints:
  • def f(a: int, b: int) -> int:
        return a + b
     
    f(1, 2)         # Should pass type checking
     
    f(12, "hello")  # Should not pass type checking
  • b is annotated to be an int
  • f(12, "hello") passes a str to b
    • Should be a type error (inconsistency between hint and usage)
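
Type hints by themselves do not change what the interpreter does. The sketch below (the `result` variable is an illustrative addition) shows that a call violating the hints still runs; only a static type checker would flag it.

```python
def f(a: int, b: int) -> int:
    return a + b


# A static type checker would flag this call, but the interpreter
# ignores the hints and simply concatenates the two strings.
result = f("12", "hello")
print(result)  # 12hello

# The hints are stored as metadata on the function object
print(f.__annotations__)
```

This is why hints need a separate tool that reads them before execution.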

Pyright

  • Pyright: Static type checker by Microsoft (GitHub repo)
    • Powers Pylance, Microsoft's Python language server extension for VS Code
    • Installation guide to be uploaded

  • Pyright is able to automatically flag the type error (sandbox):
  • def f(a: int, b: int) -> int:
        return a + b
     
    f(1, 2)
     
    # Argument of type "Literal['hello']" cannot be assigned to
    # parameter "b" of type "int" in function "f";
    # -> "Literal['hello']" is incompatible with "int"
    f(12, "hello")
    • Ideally, Pyright should report no errors before we run our code
Please practice reading through type error messages

Pyright strict mode rules

  • CS 12 24.2 requirement: Pyright 1.1.392 strict mode

  • All function parameters must be annotated (optional for return type)
  • def f(a: int, b: int):  # Return type is optional
        return a + b

  • Variables initialized with empty collections must be annotated
  • # Flagged by Pyright; must be `nums: list[int] = []`
    nums = []
     
    for num in range(1, 11):
        nums.append(num)
Ensure your code passes Pyright strict mode before submission!
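
A small sketch of code written with the two rules above in mind. The `# pyright: strict` comment is Pyright's documented way of enabling strict mode for a single file; whether the course setup already enables strict mode some other way is not stated here. The names `total` and `squares` are made up for illustration.

```python
# pyright: strict


def total(nums: list[int]) -> int:
    result = 0  # Inferred as int; no annotation needed
    for num in nums:
        result += num
    return result


squares: list[int] = []  # Empty collection: annotation required
for n in range(1, 4):
    squares.append(n * n)

print(total(squares))  # 14
```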

Type inference [1/2]

  • What is the return type of the function below?
  • def f(n: int):
        if n > 0:
            return "POSITIVE"
     
        elif n < 0:
            return "NEGATIVE"

Type inference [2/2]

Note: Python functions that terminate without encountering a return statement automatically return None
  • f can return a str ("POSITIVE" or "NEGATIVE") or None:
  • def f(n: int):
        if n > 0:
            return "POSITIVE"
     
        elif n < 0:
            return "NEGATIVE"   # None will be returned when n == 0
  • Pyright is able to infer this from the given code and have your IDE report it (very useful!):
  • # f(n: int) -> (Literal['POSITIVE', 'NEGATIVE'] | None)
    • A | B is the union type that covers values of type A or of type B
    • f returning None might be a bug (likely unintentional); Pyright helps detect these

Limitations of type inference

  • f below returns "POSITIVE", "NEGATIVE", or "ZERO":
  • def f(n: int):
        if n > 0:
            return "POSITIVE"
        elif n < 0:
            return "NEGATIVE"
        elif n == 0:
            return "ZERO"
  • Pyright incorrectly infers the return type to include None
    • Cannot infer that the three cases cover all possible ints
    • Smart enough to make inferences based on else branch:
    • def f(n: int):             # Pyright infers the return type
          if n > 0:              # of f correctly due to `else`
              return "POSITIVE"
          elif n < 0:
              return "NEGATIVE"
          else:
              return "ZERO"

Type narrowing

  • Pyright can rule out possibilities of union types using conditionals:
  • def f(nums: list[int] | None) -> int | None:
        if nums is None:      # How does Pyright behave when
            return None       # condition is removed?
     
        nums.append(100)      # What more specific type does
                              # Pyright infer here?
        return sum(nums)
Very useful in ensuring types are as expected

Static type rules of collections

Type  | Dynamic typing                     | Static typing (homogeneity)                          | Static type annotation
list  | Elements can be of any type        | Elements must have same type                         | list[int], list[str], list[list[int]]
tuple |                                    |                                                      | tuple[int,int,int], tuple[str,bool]
set   | Elements can be of any type        | Elements must have same type                         | set[int], set[str]
dict  | Keys and values can be of any type | Keys must have same type; values must have same type | dict[str,int], dict[int,dict[int,str]]
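
The annotations in the last column can be used as follows. The variable names and values are made up for illustration; note that a tuple annotation lists one type per position.

```python
scores: list[int] = [90, 85, 77]
grid: list[list[int]] = [[1, 2], [3, 4]]
point: tuple[int, int, int] = (1, 2, 3)    # Exactly three ints
status: tuple[str, bool] = ("done", True)  # A str, then a bool
tags: set[str] = {"a", "b"}
ages: dict[str, int] = {"ana": 19}
nested: dict[int, dict[int, str]] = {1: {2: "x"}}

print(scores, grid, point, status, tags, ages, nested)
```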

Back to motivating example

Problem: Find the bug in the code below:
from collections import Counter
 
def mode(xs: list[int]):
    if not xs:
        return None
 
    ctr = Counter(xs)
    largest_count = max(ctr.values())
    mode_candidate_count = list(ctr.values()).count(largest_count)
 
    return ctr.most_common()[0][0] if mode_candidate_count == 1 else None
 
answer = mode([1, 2, 3, 1])
square_of_mode = answer**2

Pyright can help with this

Type alias

Type alias: Alternative name for existing type
  • Equivalent to existing type
  • Simplifies writing of complex types and enables domain-specific type names:
  • Matrix = list[list[int]]
     
    def mul_matrices(a: Matrix, b: Matrix) -> Matrix:
        nrow = len(a)
        ncol = len(b[0])
        ret = [[0 for _ in range(ncol)] for _ in range(nrow)]
     
        for i in range(nrow):
            for j in range(ncol):
                for k in range(len(b)):  # Inner dimension: rows of b
                    ret[i][j] += a[i][k] * b[k][j]
     
        return ret
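
Because an alias is equivalent to the type it names, a value annotated with the full type can be passed where the alias is expected. The sketch below shows this with a hypothetical helper `transpose` that is not part of the lecture.

```python
Matrix = list[list[int]]


def transpose(m: Matrix) -> Matrix:
    # zip(*m) groups the i-th elements of every row together
    return [list(row) for row in zip(*m)]


grid: list[list[int]] = [[1, 2, 3], [4, 5, 6]]  # Annotated without the alias
print(transpose(grid))  # [[1, 4], [2, 5], [3, 6]]
```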

Literal type

Literal type: Type having specific constant values
  • Literal[1, 2] is same as Literal[1] | Literal[2]
  • Valid types for literal: int, str, bytes, bool, None, Enum value (not float; PEP 586)
    • Cannot use variables; must be constant values

  • Useful for expressing that only specific values are allowed:
  • from typing import Literal, get_args  # Must import from typing module
     
    MyLiteral = Literal['a', 'b']
     
    def f(x: MyLiteral):        # Pyright can infer that the return type
        if x == 'a':            # of f is Literal[1, 2]
            return 1
        else:
            return 2
     
    print(get_args(MyLiteral))  # ('a', 'b')
     
    f('c')                      # Invalid; Literal['c'] is not in Literal['a', 'b']
     
    if f('a') == 0:             # Invalid; Literal[0] is not in Literal[1, 2]
        ...
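
Literal types are checked statically, so values arriving at runtime (user input, for example) still need an explicit check. A minimal sketch using get_args, with a hypothetical `Direction` type and `is_direction` helper:

```python
from typing import Literal, get_args

Direction = Literal['N', 'E', 'S', 'W']


def is_direction(text: str) -> bool:
    # get_args returns the allowed values as a tuple
    return text in get_args(Direction)


print(get_args(Direction))  # ('N', 'E', 'S', 'W')
print(is_direction('N'))    # True
print(is_direction('X'))    # False
```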

Exhaustiveness checking [1/2]

  • Suppose you have forgotten to handle 'CS' and 'IE':
  • EnggProgram = Literal['ChE', 'CE', 'CoE', 'CS', 'EcE', 'EE', 'GE',
                          'IE', 'MatE', 'ME', 'MetE', 'EM']
    EnggUnit = Literal['DChE', 'DCS', 'DGE', 'DIEOR', 'EEE', 'ICE', 'DME', 'DMMME']
     
    def get_offering_unit(program: EnggProgram) -> EnggUnit:
        if program == 'ChE':
            return 'DChE'
        elif program == 'CE':
            return 'ICE'
        elif program == 'CoE' or program == 'EcE' or program == 'EE':
            return 'EEE'
        elif program == 'GE':
            return 'DGE'
        elif program == 'MatE' or program == 'MetE' or program == 'EM':
            return 'DMMME'
        elif program == 'ME':
            return 'DME'
     
        # Recall that functions return None if allowed to reach the end
  • Pyright complains with: "None" cannot be assigned to type "EnggUnit"
    • Adding a reveal_type(program) (debug stub used by Pyright) after the if chain...
    • ...says: Type of "program" is "Literal['CS', 'IE']"
Pyright can tell you exactly which cases you missed!

Exhaustiveness checking [2/2]

  • Also works with pattern matching:
  • EnggProgram = Literal['ChE', 'CE', 'CoE', 'CS', 'EcE', 'EE', 'GE',
                          'IE', 'MatE', 'ME', 'MetE', 'EM']
    EnggUnit = Literal['DChE', 'DCS', 'DGE', 'DIEOR', 'EEE', 'ICE', 'DME', 'DMMME']
    def get_offering_unit(program: EnggProgram) -> EnggUnit:
        match program:
            case 'ChE':
                return 'DChE'
            case 'CE':
                return 'ICE'
            case 'CoE' | 'EcE' | 'EE':  # Value must be one of them;
                return 'EEE'            # more concise than if version
            case 'CS':
                return 'DCS'
            case 'GE':
                return 'DGE'
            case 'IE':
                return 'DIEOR'
            case 'MatE' | 'MetE' | 'EM':
                return 'DMMME'
            case 'ME':
                return 'DME'
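
A related idiom, not covered in the slides, makes the exhaustiveness requirement explicit: end the chain with a call to a helper whose parameter is typed NoReturn. A type checker is then expected to report the call if any case is left unhandled. The names `Light`, `action`, and `unreachable` below are hypothetical.

```python
from typing import Literal, NoReturn

Light = Literal['red', 'yellow', 'green']


def unreachable(value: NoReturn) -> NoReturn:
    # Only runs if a value slips past the type checker at runtime
    raise AssertionError(f"Unhandled value: {value!r}")


def action(light: Light) -> str:
    if light == 'red':
        return 'stop'
    elif light == 'yellow':
        return 'slow'
    elif light == 'green':
        return 'go'
    else:
        unreachable(light)  # All cases handled, so nothing is left here


print(action('red'))    # stop
print(action('green'))  # go
```

Removing one of the branches should make the type checker complain at the unreachable(light) line, since `light` would then still have a possible value.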

To be covered in later lectures

  • Subtyping
  • Generics
  • Type variance

Reading assignment: Review of Pythonic features

Truthy and falsy collections

  • Instead of using len to check emptiness:
  • if len(elems) > 0:
        ...

  • Use collection itself as truthy/falsy value:
  • if elems:      # Easier to read
        ...        # and write
    • Nonempty collections are truthy
    • Empty collections are falsy
      • Empty string is falsy

Python collections can be used as boolean values
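
A short sketch of the rule above; the function name `describe` is made up. Note that truthiness depends on whether the collection is empty, not on what it contains.

```python
def describe(elems: list[int]) -> str:
    if elems:
        return "nonempty"
    return "empty"


print(describe([]))     # empty
print(describe([0]))    # nonempty (contains one element, even if it is 0)

print(bool(""))         # False
print(bool(" "))        # True (a space is still a character)
print(bool({}))         # False
print(bool((0,)))       # True
```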

Negative indexing

  • Negative indices start from the tail end:
  • elems = [ 'a', 'b', 'c', 'd' ]
    #          0    1    2    3
    #         -4   -3   -2   -1
     
    # elems[-1] == 'd'
    # elems[-len(elems)] == 'a'
    • Index -1 is last element (if nonempty)
    • Be wary of empty lists (check emptiness before indexing)

  • elems[-i] is shorthand for elems[len(elems)-i]:
  • elems = [ 'a', 'b', 'c', 'd' ]  # len(elems) == 4
    #          0    1    2    3     # index
    #        4-4  4-3  4-2  4-1     # len(elems)-i for i = 4, 3, 2, 1
    • Shorthand is easier to read and write

Aside: Boolean short-circuiting

  • and short-circuiting:
    • a and b
    • If a is falsy, evaluation of b is skipped
      • False and ??? is always False

  • or short-circuiting:
    • a or b
    • If a is truthy, evaluation of b is skipped
      • True or ??? is always True
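
A sketch that makes the skipped evaluation visible. The helper `check` records its name each time it is called; it and the other names here are illustrative only.

```python
calls: list[str] = []


def check(name: str, result: bool) -> bool:
    calls.append(name)  # Record that this operand was evaluated
    return result


and_result = check("a", False) and check("b", True)
and_calls = list(calls)  # ['a']: "b" was never evaluated

calls.clear()
or_result = check("a", True) or check("b", False)
or_calls = list(calls)   # ['a']: "b" was never evaluated


def ends_with_s(word: str) -> bool:
    # word[-1] is only evaluated when word is nonempty
    return bool(word) and word[-1] == 's'


print(and_result, and_calls)  # False ['a']
print(or_result, or_calls)    # True ['a']
print(ends_with_s(""))        # False (no IndexError)
```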

String interpolation via f-strings

  • Instead of manual concatenation:
  • if word and word[-1] == 's':     # Nonempty string is truthy
        suffix = "'"
    else:
        suffix = "'s"
     
    possessive_form = word + suffix  # "student" -> "student's"
                                     # "Cyrus" -> "Cyrus'"
  • Use an f-string instead:
  • possessive_form = f"{word}{suffix}"  # Explicitly a string
  • f-string: All {}s inside string are evaluated and made into strings
    • Do not forget to prefix f to the opening ' or "; without it, the {}s are kept as literal text
    • Use {{ and }} inside an f-string to add regular braces
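
A small sketch of the points above, assuming num = 12. The variable names are illustrative.

```python
num = 12

plain = "{num}"                     # No f prefix: braces are kept as-is
interpolated = f"{num}"             # f prefix: {num} is evaluated
expression = f"{num + 1} students"  # Any expression works inside {}
escaped = f"{{num}} is {num}"       # {{ and }} give literal braces

print(plain)         # {num}
print(interpolated)  # 12
print(expression)    # 13 students
print(escaped)       # {num} is 12
```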

Conditional expressions

  • Previous example using an if statement:
  • if word and word[-1] == 's':
        suffix = "'"
    else:
        suffix = "'s"
  • Indentation-free version using an if expression:
  • suffix = "'" if word and word[-1] == 's' else "'s"
    • Indentation may make code harder to read

Slicing

  • Slicing syntax (works on sequences such as lists, tuples, and strings):
  • # Indexes [start,stop) of sequence in increments of step
    # similar to range
    <sequence>[<start>:<stop>:<step>]
    lst = [10, 11, 12, 13, 14]
    print(lst[1:3])   # [11, 12]
     
    tup = (10, 11, 12, 13, 14)
    print(tup[1:3])   # (11, 12)
     
    s = "abcde"
    print(s[1:3])     # "bc"
    lst = [10, 11, 12, 13, 14]
    print(lst[1:4])   # [11, 12, 13]
    print(lst[1:])    # [11, 12, 13, 14]
    print(lst[:4])    # [10, 11, 12, 13]
    print(lst[:])     # [10, 11, 12, 13, 14] (effectively copies list)
    print(lst[::-1])  # [14, 13, 12, 11, 10] (same as list(reversed(lst)))
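
A few more slices, sketched here to show the step argument, negative bounds, and the fact that a slice is a new list.

```python
lst = [10, 11, 12, 13, 14]

print(lst[::2])    # [10, 12, 14] (every second element)
print(lst[1:4:2])  # [11, 13]
print(lst[-2:])    # [13, 14] (last two elements)
print(lst[:-1])    # [10, 11, 12, 13] (everything but the last)
print(lst[3:100])  # [13, 14] (out-of-range stop is clamped; no IndexError)

copy = lst[:]
copy.append(15)
print(lst)         # [10, 11, 12, 13, 14] (original is unchanged)
```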

Iterable unpacking [1/2]

  • Instead of assigning individual elements via indexing:
  • coords = (10, 20)
    x = coords[0]
    y = coords[1]
  • Use tuple unpacking instead:
  • coords = (10, 20)
    x, y = coords
  • Temporary variable swap vs. one-line swap:
  • # Using a temporary variable
    temp = x
    x = y
    y = temp
     
    # One-line swap via tuple unpacking
    x, y = y, x

Iterable unpacking [2/2]

  • Instead of multiple list concatenation via +:
  • a = [20, 30]
    b = [50, 60]
    c = [10] + a + [40] + b  # [10, 20, 30, 40, 50, 60]
  • Use the unpacking star operator instead:
  • a = [20, 30]
    b = [50, 60]
    c = [10, *a, 40, *b]  # [10, 20, 30, 40, 50, 60]
  • Use ** for unpacking dicts (dictionary unpacking operator):
  • x = {'a': 1, 'b': 2}
    y = {'c': 3}
    z = {**x, **y, 'd': 4}  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
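
When the same key appears more than once during dict unpacking, the value that comes last wins. A sketch with made-up dict names:

```python
defaults = {'a': 1, 'b': 2}
overrides = {'b': 20, 'c': 30}

merged = {**defaults, **overrides}    # overrides come last, so 'b' is 20
reversed_merge = {**overrides, **defaults}  # defaults come last, so 'b' is 2

print(merged)          # {'a': 1, 'b': 20, 'c': 30}
print(reversed_merge)  # {'b': 2, 'c': 30, 'a': 1}
print(defaults)        # {'a': 1, 'b': 2} (inputs are not modified)
```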

Index-free, read-only iteration

  • Task: Print all elements of a given list
  • elems = ['Poring', 'Fabre', 'Lunatic', 'Chonchon']
  • Naive approach: Iteration using list indexing:
  • for i in range(len(elems)):  # [0, 1, 2, 3]
        print(elems[i])
  • Index-free list iteration via for:
  • for elem in elems:
        print(elem)
Use range(len(elems)) only when index is necessary

Numbered iteration

  • Task: Print each element of a given list with its 1-indexed position
  • elems = ['Tressa', 'Cyrus', 'Olberic', 'Primrose']
  • Naive approach: Using incremented index:
  • for i in range(len(elems)):                # "#1: Tressa";
        print(f'#{i+1}: {elems[i]}')           # must index
  • Better approach: Use enumerate instead (yields (index, element) tuples):
  • for i, elem in enumerate(elems):           # i starts at 0;
        print(f'#{i+1}: {elem}')               # no need to index
    for n, elem in enumerate(elems, start=1):  # n starts at 1; no need
        print(f'#{n}: {elem}')                 # to index and increment
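
To see what enumerate produces, its result can be materialized with list. The names `pairs` and `labels` are illustrative.

```python
elems = ['Tressa', 'Cyrus', 'Olberic', 'Primrose']

# enumerate is lazy; list(...) collects the (position, element) pairs
pairs = list(enumerate(elems, start=1))
print(pairs)  # [(1, 'Tressa'), (2, 'Cyrus'), (3, 'Olberic'), (4, 'Primrose')]

labels = [f'#{n}: {elem}' for n, elem in enumerate(elems, start=1)]
print(labels[0])  # #1: Tressa
```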

Key-value dict enumeration

  • Instead of iterating by keys only:
  • d = {'a': 1, 'b': 2, 'c': 3}
     
    for key in d:  # No need to do d.keys()
        print(f'Key: {key} has value {d[key]}')

  • Use .items() instead:
  • d = {'a': 1, 'b': 2, 'c': 3}
     
    for key, value in d.items():
        print(f'Key: {key} has value {value}')

Reverse iteration [1/2]

  • Task: Print all elements of a given list in reverse
  • elems = ['Poring', 'Fabre', 'Lunatic', 'Chonchon']
  • Naive approach #1: Decreasing range sequence:
  • for i in range(len(elems) - 1, -1, -1):  # i: [3, 2, 1, 0]
        print(elems[i])
  • Naive approach #2: Reverse index via subtraction:
  • for i in range(len(elems)):          #     i: [0, 1, 2, 3]
        print(elems[len(elems)-1-i])     # 3 - i: [3, 2, 1, 0]
Prone to off-by-one errors; can we do better?

Reverse iteration [2/2]

  • Approach #1: Use built-in reversed function:
  • for elem in reversed(elems):
        print(elem)

  • Approach #2: Use reverse slice
  • for elem in elems[::-1]:
        print(elem)
    • Not as readable as above
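
A sketch of how the two approaches differ: reversed gives an iterator over the existing list, while the slice builds a new list. reversed also works on range, which helps when the index is needed. Variable names are illustrative.

```python
elems = ['Poring', 'Fabre', 'Lunatic', 'Chonchon']

rev_copy = elems[::-1]            # New list holding the reversed elements
rev_list = list(reversed(elems))  # Iterator, collected into a list here

print(rev_copy == rev_list)  # True
print(elems[0])              # Poring (original order is untouched)

# Index-based reverse iteration without manual arithmetic
indices = list(reversed(range(len(elems))))
print(indices)               # [3, 2, 1, 0]
```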

Populating a list via for

  • Instead of using a for to build a list from an existing iterable:
  • nums: list[int] = []
    for n in range(1, 11):
        nums.append(n**2)

  • Use a list comprehension instead:
  • # [<expr> for <variable> in <iterable>] builds a new list
    nums = [n**2 for n in range(1, 11)]

  • Can also be nested to form list of lists:
  • matrix = [[((r * 3) + c) for c in range(3)] for r in range(5)]
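
The nested comprehension above is read from the outside in: the outer part produces one row per r, and the inner part fills each row. A sketch comparing it with the equivalent nested loops (the name `by_loops` is illustrative):

```python
matrix = [[((r * 3) + c) for c in range(3)] for r in range(5)]

by_loops: list[list[int]] = []
for r in range(5):           # Outer comprehension: one row per r
    row: list[int] = []
    for c in range(3):       # Inner comprehension: one entry per c
        row.append((r * 3) + c)
    by_loops.append(row)

print(matrix)
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14]]
print(matrix == by_loops)  # True
```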

Conditionally populating a list

  • Instead of using a for to build a subset of an existing iterable:
  • primes: list[int] = []
    for n in range(2, 101):
        if is_prime(n):
            primes.append(n)

  • Use a list comprehension instead:
  • # [<expr> for <variable> in <iterable> if <condition>];
    # <expr> is only appended if <condition> is True
    primes = [n for n in range(2, 101) if is_prime(n)]
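
The slides leave the body of is_prime unspecified. To run the comprehension, a simple trial-division version can be assumed, as in the sketch below; it is one possible implementation, not necessarily the one intended in the lecture.

```python
def is_prime(n: int) -> bool:
    # Assumed implementation: trial division up to the square root of n
    if n < 2:
        return False
    for d in range(2, int(n**0.5) + 1):
        if n % d == 0:
            return False
    return True


small_primes = [n for n in range(2, 31) if is_prime(n)]
print(small_primes)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

primes = [n for n in range(2, 101) if is_prime(n)]
print(len(primes))   # 25
```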

Set and dict comprehensions

  • Set comprehension:
  • strs = ['a', 'aa', 'b', 'abc', 'c', 'bbb']
     
    lengths: set[int] = set()
    for s in strs:
        lengths.add(len(s))
     
    # Equivalent set comprehension
    lengths = {len(s) for s in strs}  # {1, 2, 3}
  • Dict comprehension:
  • strs = ['a', 'aa', 'b', 'abc', 'c', 'bbb']
     
    lengths: dict[str, int] = {}
    for s in strs:             # lengths['aa'] == 2
        lengths[s] = len(s)    # lengths['abc'] == 3
     
    # Equivalent dict comprehension
    lengths = {s: len(s) for s in strs}