Are Regexes Fast?
=================

Regexes are certainly convenient. But are they faster than the same Python
code?

>>> import sys
>>> print(sys.version)
3.7.3 (default, Apr 3 2019, 05:39:12)
[GCC 8.3.0]

Consider the following regex, which strips every character that is not a
lowercase ASCII letter:

>>> import re
>>> NOTLOWER = re.compile('[^a-z]')
>>> re_split_lower = lambda s: re.sub(NOTLOWER, '', s)

It is a lot nicer to write this as (nearly) equivalent Python code:

>>> split_lower = lambda s: ''.join(c for c in s if c.islower())

(Nearly equivalent: str.islower() also accepts non-ASCII lowercase letters,
while the regex only matches a-z. For the ASCII strings below the two agree.)

I benchmarked this for the kind of strings I am working with:

>>> from timeit import timeit
>>> timeit('s("AbcCdeFfEqFefqEFE")', globals={'s': re_split_lower})
1.7502129878848791
>>> timeit('s("AbcCdeFfEqFefqEFE")', globals={'s': split_lower})
1.0115699651651084

In this case the regex is significantly slower. What if we make the regex
slightly more complicated? Let's try to split a camel-case string into its
individual words, i.e.:

    helloWorldCruel ===> hello, World, Cruel
    helloWorld      ===> hello, World
    hello           ===> hello

>>> CAMELCASE = re.compile('^[a-z]+|[A-Z][a-z]*')
>>> re_decamelcase = lambda s: re.findall(CAMELCASE, s)

We can write this as a not-quite-equivalent generator (not quite, because it
yields a leading empty string when the input starts with an uppercase letter,
and it splits on anything str.isupper() accepts, not just A-Z):

>>> def decamelcase(s):
...     prev_idx = 0
...     for i in range(len(s)):
...         if s[i].isupper():
...             yield s[prev_idx:i]
...             prev_idx = i
...     yield s[prev_idx:]

For small strings, the regex is still slower:

>>> timeit('t("helloWorldHowAre")', globals={'t': re_decamelcase})
1.349484241567552
>>> timeit('list(t("helloWorldHowAre"))', globals={'t': decamelcase})
1.2007813868112862

But for larger strings, the regex is faster:

>>> timeit('t("hello" + "World"*200)', globals={'t': re_decamelcase})
24.34574169665575
>>> timeit('list(t("hello" + "World"*200))', globals={'t': decamelcase})
63.48653336800635

It would be interesting to see whether this also holds on other VMs such as
PyPy, or whether decamelcase can be rewritten to beat the regex regardless
of input size.
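
One candidate rewrite, as a sketch only: push the per-character loop into C
by using str.translate() to prefix every uppercase ASCII letter with a
delimiter and then splitting on it. The name decamelcase_translate and the
choice of '\x00' as the delimiter are mine, and I have not benchmarked this
against the timings above:

>>> import string
>>> MARK_UPPER = {ord(c): '\x00' + c for c in string.ascii_uppercase}
>>> def decamelcase_translate(s):
...     # translate() and split() both run in C; the one assumption is
...     # that the input never contains the '\x00' delimiter itself
...     return s.translate(MARK_UPPER).split('\x00')
>>> decamelcase_translate('helloWorldCruel')
['hello', 'World', 'Cruel']

Like decamelcase, this yields a leading empty string for inputs that start
with an uppercase letter; unlike decamelcase, it only splits on A-Z.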
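
The same idea might help the first example too. A hypothetical variant,
untimed here, that swaps the generator expression for filter() so the
per-character test runs without a Python-level loop body:

>>> split_lower_filter = lambda s: ''.join(filter(str.islower, s))
>>> split_lower_filter('AbcCdeFfEqFefqEFE')
'bcdefqefq'

Whether either sketch actually beats the regex would need measuring.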