It doesn’t make sense to me. It looks like nonzero(arr != 0) just creates an intermediate array, allocating more memory. There is no way it should be faster; otherwise, why doesn’t NumPy do that optimization itself? But here is my benchmark:
```python
import numpy as np
from timeit import timeit

arr = np.random.randint(0, 2, 10_000_000)
a = timeit(lambda: np.nonzero(arr != 0), number=10)
b = timeit(lambda: np.nonzero(arr), number=10)
print(f"nonzero(arr != 0): {a}")
print(f"nonzero(arr): {b}")
```

Results:
```
nonzero(arr != 0): 0.20066774962469935
nonzero(arr): 0.5988789172843099
```

It seems that nonzero() is just much faster if you convert the input into a boolean array first. I tested it with smaller integers, and the results are consistent:
```python
arr = np.random.randint(0, 2, 10_000_000, dtype=np.uint8)
a = timeit(lambda: np.nonzero(arr != 0), number=10)
b = timeit(lambda: np.nonzero(arr), number=10)
c = timeit(lambda: np.nonzero(arr.view(bool)), number=10)
print(f"nonzero(arr != 0): {a}")
print(f"nonzero(arr): {b}")
print(f"nonzero(view): {c}")
```

Results:
```
nonzero(arr != 0): 0.15293325018137693
nonzero(arr): 0.5660374169237912
nonzero(view): 0.13332204101607203
```

So the intermediate array does add some overhead, but that overhead is much smaller than the overhead nonzero() incurs on non-boolean arrays.
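For context on why the view variant is the fastest: arr != 0 allocates a fresh boolean array, while .view(bool) on a uint8 array just reinterprets the same one-byte-per-element buffer without copying. This can be verified with np.shares_memory:

```python
import numpy as np

arr = np.random.randint(0, 2, 10, dtype=np.uint8)

# arr != 0 allocates a new boolean array in a separate buffer
mask = arr != 0
print(np.shares_memory(arr, mask))   # False: a copy was made

# .view(bool) reinterprets the same buffer (zero-copy)
view = arr.view(bool)
print(np.shares_memory(arr, view))   # True: same memory

# Both produce the same nonzero indices
print(np.array_equal(np.nonzero(mask)[0], np.nonzero(view)[0]))  # True
```

Note that the zero-copy view only works here because uint8 and bool have the same itemsize, and because the values are already 0 or 1; for other dtypes or arbitrary values, the comparison arr != 0 is the safe route.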
Is there a reason for this? My guess is that nonzero() is optimized (maybe using SIMD?) for boolean arrays, but somehow the optimization doesn’t kick in for other dtypes.
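One way to probe this guess, under the assumption that nonzero() makes an internal counting pass to size its output array: np.count_nonzero can be timed separately on the two dtypes. If the boolean path is the one with the fast (possibly SIMD) kernel, a similar dtype gap should show up here too; the exact ratio will depend on the NumPy version and CPU.

```python
import numpy as np
from timeit import timeit

arr = np.random.randint(0, 2, 10_000_000, dtype=np.uint8)

# Time the counting step alone on uint8 vs. a zero-copy bool view
t_u8 = timeit(lambda: np.count_nonzero(arr), number=10)
t_bool = timeit(lambda: np.count_nonzero(arr.view(bool)), number=10)
print(f"count_nonzero(uint8): {t_u8}")
print(f"count_nonzero(bool):  {t_bool}")

# Sanity check: both dtypes count the same elements
print(np.count_nonzero(arr) == np.count_nonzero(arr.view(bool)))  # True
```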
