Case-insensitive string comparison in Python using casefold, not lower
Categories: Python
Here is a discipline I am trying to adopt in my Python programs: use "My string".casefold() instead of "My string".lower() when comparing strings irrespective of case.
When checking strings for equality without caring about uppercase vs. lowercase, it is tempting to do something like this:
if "StrinG".lower() == "string".lower():
    print("Case-insensitive equality!")
Of course, it works.
But some human languages have case rules that function with a bit more nuance. Let’s say we have three strings with slightly different ways of writing Kubernetes (writing Kubernetes in Greek makes you sound doubly smart).
k8s = "ΚυβερνΉτης"
k8S = "ΚυβερνΉτηΣ"
k8s_odd = "ΚυβερνΉτησ" # Apologies to the scribes of Athens
These three are all mixed-case strings. The first one correctly ends with a final lowercase sigma, the second one has a capital sigma, and that last one, oddly, has a non-final sigma.
Let’s imagine we have a use case in which we want to consider all of these as equal. Would str.lower() work?
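>>> k8s.lower()
'κυβερνήτης'
>>> k8S.lower()
'κυβερνήτης'
>>> k8s_odd.lower()
'κυβερνήτησ'
Apparently not: the third result still ends in a non-final sigma (σ), so it does not equal the other two.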
Using str.casefold() instead:
>>> k8s.casefold()
'κυβερνήτησ'
>>> k8S.casefold()
'κυβερνήτησ'
>>> k8s_odd.casefold()
'κυβερνήτησ'
All are equal! Exactly what we want for case-insensitive string comparison.
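To make the habit easier to keep, the comparison can be wrapped in a small helper. This is only a sketch: the name equal_ignore_case is my own invention, not something from the standard library.

```python
def equal_ignore_case(a: str, b: str) -> bool:
    """Return True if a and b match once case distinctions are folded away."""
    return a.casefold() == b.casefold()


k8s = "ΚυβερνΉτης"
k8S = "ΚυβερνΉτηΣ"
k8s_odd = "ΚυβερνΉτησ"  # non-final sigma at the end

print(equal_ignore_case(k8s, k8S))      # True
print(equal_ignore_case(k8s, k8s_odd))  # True
print(k8s.lower() == k8s_odd.lower())   # False: lower() keeps ς and σ distinct
```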
Do not use str.casefold() if you are aiming for clean spellings: the folded form is meant for matching, not for display. str.upper().lower() might yield a more printable result:
>>> k8s_odd.upper().lower()
'κυβερνήτης'
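Another way to keep readable spellings is to use the casefolded form only as a lookup key and hold on to the original string for display. A minimal sketch, assuming we want the first spelling seen to win; the function name unique_ignoring_case is hypothetical:

```python
def unique_ignoring_case(words):
    """Keep the first spelling seen for each case-insensitive match."""
    seen = {}
    for word in words:
        # The casefolded form is the key; the original spelling is the value.
        seen.setdefault(word.casefold(), word)
    return list(seen.values())


names = ["ΚυβερνΉτης", "ΚυβερνΉτηΣ", "ΚυβερνΉτησ", "Python"]
print(unique_ignoring_case(names))  # ['ΚυβερνΉτης', 'Python']
```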
But for case-insensitive comparison that respects a wide range of human languages, str.casefold() is our friend.
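Greek is not the only language where this matters. The Python docs for str.casefold use the German "ß" as their example, which casefolds to "ss". A short illustration, with variable names of my own choosing:

```python
street = "Straße"
shouted = "STRASSE"

print(street.lower() == shouted.lower())        # False: 'straße' vs 'strasse'
print(street.casefold() == shouted.casefold())  # True
print(street.casefold())                        # strasse
```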
References
- Python docs on str.casefold
- The Unicode Standard, Section “3.13 Default Case Algorithms” on page 150 of chapter 3