Case-insensitive string comparison in Python using casefold, not lower

July 15, 2020

Categories: Python

Here is a discipline I am trying to adopt in my Python programs: use "My string".casefold() instead of "My string".lower() when comparing strings irrespective of case.

When checking two strings for equality in a case where I don’t care about uppercase vs. lowercase, it is tempting to do something like this:

if "StrinG".lower() == "string".lower():
    print("Case-insensitive equality!")

Of course, it works.

But some human languages have case rules that function with a bit more nuance. Let’s say we have three strings with slightly different ways of writing Kubernetes (writing Kubernetes in Greek makes you sound doubly smart).

k8s = "ΚυβερνΉτης"
k8S = "ΚυβερνΉτηΣ"
k8s_odd = "ΚυβερνΉτησ"  # Apologies to the scribes of Athens

These three are all mixed-case strings. The first one correctly ends with a lowercase final sigma (ς), the second one ends with a capital sigma (Σ), and the last one, oddly, ends with a non-final sigma (σ).

Let’s imagine we have a use case in which we want to consider all of these as equal. Would str.lower() work?

>>> k8s.lower()
'κυβερνήτης'
>>> k8S.lower()
'κυβερνήτης'
>>> k8s_odd.lower()
'κυβερνήτησ'

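Apparently not. The first two come out identical, but the third keeps its non-final sigma (σ), so it does not compare equal to the others.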
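To make the mismatch concrete, here is a small check one could run. It repeats the three strings from above so that it stands on its own; the name lowered is mine, used only for illustration.

```python
# The three spellings from above, repeated so the snippet runs on its own.
k8s = "ΚυβερνΉτης"
k8S = "ΚυβερνΉτηΣ"
k8s_odd = "ΚυβερνΉτησ"

# Compare the lowercased forms pairwise.
print(k8s.lower() == k8S.lower())      # final sigma vs. capital sigma
print(k8s.lower() == k8s_odd.lower())  # final sigma vs. non-final sigma

# A set of the lowercased forms shows how many distinct spellings
# are left after str.lower().
lowered = {s.lower() for s in (k8s, k8S, k8s_odd)}
print(len(lowered))
```

The first comparison holds and the second does not, so two distinct spellings remain after lowercasing.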

Using str.casefold() instead:

>>> k8s.casefold()
'κυβερνήτησ'
>>> k8S.casefold()
'κυβερνήτησ'
>>> k8s_odd.casefold()
'κυβερνήτησ'

All are equal! Exactly what we want for case-insensitive string comparison.
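The comparison itself can be wrapped in a small helper. This is only a minimal sketch: the function name equal_ignoring_case is hypothetical, not something from the standard library, and it does nothing beyond calling str.casefold() on both sides.

```python
def equal_ignoring_case(left: str, right: str) -> bool:
    """Return True if both strings are equal after case folding."""
    return left.casefold() == right.casefold()


# The three spellings from above, repeated so the snippet runs on its own.
k8s = "ΚυβερνΉτης"
k8S = "ΚυβερνΉτηΣ"
k8s_odd = "ΚυβερνΉτησ"

print(equal_ignoring_case(k8s, k8S))
print(equal_ignoring_case(k8s, k8s_odd))
print(equal_ignoring_case("StrinG", "string"))
```

All three calls should report equality, including the plain ASCII pair from the first example.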

Do not use str.casefold() if you are aiming for clean spellings, though. The folded form is meant for comparison, not for display. str.upper().lower() might yield a more printable result:

>>> k8s_odd.upper().lower()
'κυβερνήτης'

But for case-insensitive comparison that respects a wide range of human languages, str.casefold() is our friend.
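Greek is not the only language where this matters. The Python docs on str.casefold use the German letter ß as their example: it is already lowercase, so lower() leaves it alone, while casefold() turns it into "ss". The sketch below applies that to a pair of spellings I picked for illustration (the variable names are mine as well).

```python
# Two spellings of the German word for "street".
street_mixed = "Straße"
street_upper = "STRASSE"

# str.lower() leaves the ß untouched, so the two forms differ.
print(street_mixed.lower() == street_upper.lower())

# str.casefold() folds ß to "ss", so the two forms match.
print(street_mixed.casefold() == street_upper.casefold())
```

Only the casefold() comparison treats the two spellings as equal.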

References

Python docs on str.casefold
The Unicode Standard, Section “3.13 Default Case Algorithms” on page 150 of chapter 3