why is nonzero(arr != 0) faster than nonzero(arr) in python


It doesn’t make sense to me. nonzero(arr != 0) has to create an intermediate boolean array, allocating extra memory, so it shouldn’t be faster; otherwise, why wouldn’t NumPy do that conversion itself? But here is my benchmark:

import numpy as np
from timeit import timeit

arr = np.random.randint(0, 2, 10_000_000)
a = timeit(lambda: np.nonzero(arr != 0), number=10)
b = timeit(lambda: np.nonzero(arr), number=10)
print(f"nonzero(arr != 0): {a}")
print(f"nonzero(arr): {b}")

Results:

nonzero(arr != 0): 0.20066774962469935
nonzero(arr): 0.5988789172843099

It seems that nonzero is just much faster if you convert the input into a boolean array first. I tested it with smaller integers, and the results are consistent:

arr = np.random.randint(0, 2, 10_000_000, dtype=np.uint8)
a = timeit(lambda: np.nonzero(arr != 0), number=10)
b = timeit(lambda: np.nonzero(arr), number=10)
c = timeit(lambda: np.nonzero(arr.view(bool)), number=10)
print(f"nonzero(arr != 0): {a}")
print(f"nonzero(arr): {b}")
print(f"nonzero(view): {c}")

Results:

nonzero(arr != 0): 0.15293325018137693
nonzero(arr): 0.5660374169237912
nonzero(view): 0.13332204101607203

So an intermediate array does add some overhead, but the overhead is much smaller than the nonzero() overhead on non-boolean arrays.
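The difference between the arr != 0 and arr.view(bool) timings is exactly that allocation: the comparison materializes a fresh boolean array, while view(bool) reinterprets the existing uint8 buffer with no copy (valid because both dtypes are one byte). A quick check with np.shares_memory makes this concrete:

```python
import numpy as np

arr = np.random.randint(0, 2, 1000, dtype=np.uint8)

# arr != 0 allocates a brand-new boolean array...
mask = arr != 0
print(np.shares_memory(arr, mask))   # False: a fresh allocation

# ...while .view(bool) reinterprets the same buffer in place.
view = arr.view(bool)
print(np.shares_memory(arr, view))   # True: no copy made
```

Note the view trick only works when the underlying bytes are already 0 or 1; for arbitrary integers, a byte like 2 viewed as bool is still truthy, but arr != 0 is the general-purpose way to get the boolean array.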

Is there a reason for this? My guess is that nonzero() is optimized (maybe using SIMD?) for boolean arrays, but somehow the optimization doesn’t kick in for other types.
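Whatever the internal reason turns out to be, the benchmarks above suggest a simple workaround: route non-boolean input through a comparison before calling nonzero. A minimal sketch (fast_nonzero is a hypothetical helper name, not a NumPy function):

```python
import numpy as np

def fast_nonzero(arr):
    """Return np.nonzero(arr), converting non-boolean input to a
    boolean mask first, which the benchmarks above show is faster."""
    if arr.dtype != np.bool_:
        arr = arr != 0   # allocates a temporary boolean array
    return np.nonzero(arr)
```

The result is identical to np.nonzero(arr) for any dtype; only the code path (and, per the timings above, the speed) differs.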
