In defense of Python assertions
I've gotten into arguments about whether Python asserts should be used. I think that they are great if used correctly.
I use Python’s assert
statements a lot: for debugging, testing, and documenting assumptions. But before I make my case, I'll try to steelman the arguments against them.
The case against assertions
Assertions have two downsides, which are valid. Later, I will explain how my usage of assertions mitigates these downsides:
- they can be disabled if you run python in optimized mode (
python -O
). I've never seen anyone use this flag, but point granted that your code should never rely on them. AssertionError
s are not very informative. This is true, which is why we should only use it in specific circumstances (more below).
The right way to use assertions
Design by Contract (DbC) on data processing code
DbC is about defining specs for your code using:
- Preconditions: conditions that must be true before a function is called.
- Postconditions: conditions that must be true after a function is called.
- Invariants conditions that must be true throughout the execution of a function.
I use assertions to document these expectations. It’s like executable documentation. For example:
def fancy_processing(input_df: pd.DataFrame) -> pd.DataFrame:
# Pre-conditions
assert list(input_df.columns) == sorted(input_df.columns), f"Columns should be sorted: {input_df.columns}"
# ...processing...
# Post-condidtions
assert output_df.columns == input_df.columns, f"Columns should not change: {output_df.columns} != {input_df.columns}"
return output_df
This isn't something you want to go overboard with, and I wouldn't do this in a typical software system since it's too ugly and constrains you. However, I find it useful to really clamp down on admissible inputs and outputs of data processing functions that are stable and need to ensure data integrity over time.
For more robust DbC in Python, check out:
I am aware of the icontract, but I find that it makes code less readable than using @beartype
and assertions.
Debugging
When I am in the thick of coding, assertions are my sanity checks. Unsure of my tensor shapes? Assert it. Unexpected dtypes? Assert it.
Once the code’s working, I’ll prune the less useful asserts. Liberal asserts are useful to constrain the search space of bugs, but you don’t need all of them once the code is more production-ready.
When NOT to use assertions
- Handling production errors. Assertions should never be used to handle errors which are expected to happen in production. For example, never
try: ... except AssertionError: ...
. Use meaningful exceptions here. - On code that must run.For example, don’t use assertions for sanitizing user input.
- sanity checking GPU tensors. Using operations like
print
,.item()
, andassert
implicitly require a copy to the CPU which is extremely slow.