This is related to this question.
Basically, I need to add many fractions together. Seems simple enough, right? Except it isn't.
I can't use the fractions.Fraction class: it stupidly reduces the fraction at every single step, and it carries all the usual Python object overhead, so it is extremely inefficient.
Now, how do we properly add fractions? Using the formula a/b + c/d = (a * d + b * c)/(b * d), we can store each fraction as a pair of integers (numerator, denominator), never actually perform the division, and combine pairs with the formula. Done. Done? No, it is never that simple.
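As a minimal sketch (the helper name `add_unreduced` is mine), the pair representation looks like this:

```python
def add_unreduced(f1: tuple[int, int], f2: tuple[int, int]) -> tuple[int, int]:
    """Add two fractions stored as (numerator, denominator) pairs using
    a/b + c/d = (a*d + b*c)/(b*d), without ever reducing the result."""
    (a, b), (c, d) = f1, f2
    return a * d + b * c, b * d

print(add_unreduced((1, 2), (3, 5)))  # (11, 10), i.e. 1/2 + 3/5 = 11/10
```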
```
In [3]: from fractions import Fraction

In [4]: Fraction(500, 1000)
Out[4]: Fraction(1, 2)

In [5]: %timeit Fraction(1, 2) + Fraction(3, 5)
2.24 μs ± 128 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [6]: a, b, c, d = 1, 2, 3, 5

In [7]: %timeit a * d + b * c, b * d
98.3 ns ± 0.401 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```

As you can see, fractions.Fraction is about 22 times slower than the simple cross multiplication.
But the cross-multiplication method has problems too: it has to do three multiplications and one addition on every single call, which means the numerator and denominator only ever get bigger. And Python's int is arbitrary precision (not infinite precision; the maximum number of digits is bounded by ssize_t). That is why I use fractions over float in the first place: an IEEE-754 double has a measly 53 bits of mantissa, and 53 * log10(2) = 15.954589770191003, so only about 16 accurate decimal digits.
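To make the growth concrete, here is a quick experiment of my own (not from the question): summing 1/2 + 1/3 + ... + 1/101 with bare cross multiplication leaves the raw product 2 * 3 * ... * 101 in the denominator, even though the reduced denominator would be far smaller:

```python
from math import gcd

# Sum 1/k for k = 2..101 with bare cross multiplication, no reduction.
num, den = 0, 1
for k in range(2, 102):
    num, den = num * k + den, den * k

# The unreduced denominator is 101!, hundreds of bits wide after only
# a hundred additions; the reduced denominator is much smaller.
g = gcd(num, den)
print(den.bit_length(), (den // g).bit_length())
```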
And as the numbers get larger, addition and multiplication get more expensive; both scale with the number of digits. So the more fractions I add together, the larger the numbers become, and the more expensive each step gets.
So we need to reduce the fractions to keep them from growing too large. How do we reduce a fraction? Simple: compute the greatest common divisor (GCD) of the numerator and denominator and floor-divide both by it.
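In code, the reduction step is just this (a small sketch; `reduce_fraction` is my name for it):

```python
from math import gcd

def reduce_fraction(num: int, den: int) -> tuple[int, int]:
    """Divide numerator and denominator by their greatest common divisor."""
    common = gcd(num, den)
    return num // common, den // common

print(reduce_fraction(500, 1000))  # (1, 2)
```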
Is the problem solved? No. GCD is slow, and it also scales with the number of digits. And instead of just three multiplications and one addition per iteration, we now have three multiplications, one addition, one GCD, and two floor divisions. So this method only makes the code slower.
As my test proves:
```python
import dill
from concurrent.futures import ThreadPoolExecutor
from itertools import batched
from math import gcd

from pathos.multiprocessing import ProcessPool


def arctan_derivative(x: int, y: int) -> tuple[int, int]:
    return y * y, x * x + y * y


def primitive_rationals_simple(denominator: int) -> list[tuple[int, int]]:
    assert isinstance(denominator, int) and denominator > 0
    mid = (den_plus := denominator + 1) >> 1
    result = [(1, 2)] * den_plus
    result[0] = (0, 1)
    result[-1] = (1, 1)
    for i in range(1, mid):
        result[i] = (
            (n := i // (common := gcd(i, denominator))),
            (d := denominator // common),
        )
        result[denominator - i] = (d - n, d)
    return result


def make_test_case(n: int) -> list[tuple[int, int]]:
    return [arctan_derivative(*frac) for frac in primitive_rationals_simple(n)]


def test1(fracs: list[tuple[int, int]]) -> tuple[int, int]:
    num, den = 0, 1
    for tnum, tden in fracs:
        num, den = num * tden + tnum * den, den * tden
    return num, den


def test2(fracs: list[tuple[int, int]]) -> tuple[int, int]:
    num, den = 0, 1
    for tnum, tden in fracs:
        num, den = num * tden + tnum * den, den * tden
        num //= (common := gcd(num, den))
        den //= common
    return num, den
```

```
In [38]: data = make_test_case(360)

In [39]: %timeit test1(data)
268 μs ± 7.24 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [40]: %timeit test2(data)
9.72 ms ± 131 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

As you can see, reducing the fractions every single step makes the code 36 times slower.
But what if we could reduce the fractions without slowing execution down too much? What if we reduce and add simultaneously, in parallel? Then we could have the best of both worlds. As you can probably guess from my imports, I want to add the fractions in parallel.
Now consider a fraction-addition function: it takes two fractions and returns one fraction. Simple enough, right? Given two pairs of fractions, we need two calls, and the output of each call is obviously independent of the other. The same holds for four pairs: every call is independent. Thus they can be done in parallel.
Okay, given four pairs of fractions, label them (a, b), (c, d), (e, f), (g, h), how do we add all of them together?
It is simple: we add each pair, which leaves four fractions: (a+b, c+d, e+f, g+h). Those form two pairs, so we add the pairs again: (a+b+c+d, e+f+g+h). Now we have one pair left, so we add it, and we are done.
Thus, for n fractions we have n//2 pairs, plus one unpaired fraction when n is odd, and all of these pairs can be added in parallel, which means the GCDs can also run in parallel. That leaves n//2 fractions plus the conditional carry, and we repeatedly feed the output of one round in as the input of the next, until only one fraction remains.
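Serially, that pairwise scheme looks like the sketch below (names are mine); each pass of the loop is what would be handed to a worker pool, since the pair additions within a pass are independent:

```python
from math import gcd

def add_reduced(f1: tuple[int, int], f2: tuple[int, int]) -> tuple[int, int]:
    """Cross-multiply, then reduce by the GCD."""
    (a, b), (c, d) = f1, f2
    num, den = a * d + b * c, b * d
    common = gcd(num, den)
    return num // common, den // common

def tree_sum(fracs: list[tuple[int, int]]) -> tuple[int, int]:
    """Pairwise (tree) reduction: each pass halves the list, carrying one
    unpaired fraction over when the length is odd."""
    fracs = list(fracs)
    while len(fracs) > 1:
        carry = [fracs.pop()] if len(fracs) & 1 else []
        fracs = [add_reduced(fracs[i], fracs[i + 1])
                 for i in range(0, len(fracs), 2)] + carry
    return fracs[0]

print(tree_sum([(1, 2), (1, 3), (1, 6)]))  # (1, 1)
```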
Here is the algorithm:
```python
def add_fractions(fracs: tuple[tuple[int, int], tuple[int, int]]) -> tuple[int, int]:
    (num_1, den_1), (num_2, den_2) = fracs
    num, den = num_1 * den_2 + num_2 * den_1, den_1 * den_2
    num //= (common := gcd(num, den))
    den //= common
    return num, den


def test3(fracs: list[tuple[int, int]]) -> tuple[int, int]:
    with ProcessPool() as pool:
        while (length := len(fracs)) > 1:
            carry = fracs.pop(-1) if length & 1 else None
            fracs = pool.map(add_fractions, batched(fracs, 2))
            if carry:
                fracs.append(carry)
    return fracs[0]
```

I don't know how fast it is, because it doesn't work.
```
In [41]: %timeit test3(data)
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Program Files\Python312\Lib\site-packages\multiprocess\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\multiprocess\pool.py", line 48, in mapstar
    return list(map(*args))
           ^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\pathos\helpers\mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
                        ^^^^^^^^
  File "<ipython-input-37-f2f753c191f8>", line 53, in add_fractions
NameError: name 'gcd' is not defined
"""
```

And I know there are many ways to execute code in parallel: concurrent.futures.ThreadPoolExecutor, concurrent.futures.ProcessPoolExecutor, threading.Thread, asyncio, subprocess... I have no idea which to use.
How can I properly parallelize fraction addition and make it as fast as possible?
