I’m working on a high-load async web application using FastAPI and I have a question about implementing login/registration logic with password hashing.
Problem
As we know, password hashing (e.g. with bcrypt or argon2) is a CPU-bound and relatively slow operation. In my case I’m using bcrypt.
Hashing takes noticeable time per request
It blocks execution (CPU-bound)
Under high load (many concurrent login requests), it becomes a bottleneck
I also understand that:
The Python bcrypt package is implemented in native code and releases the GIL while hashing, so it can run in parallel threads
In async apps, blocking operations should be offloaded to a thread pool
What I tried
I tried to offload hashing using:
```python
await asyncio.to_thread(bcrypt.hashpw, password, salt)
```

and also:

```python
loop.run_in_executor(...)
```

However, under load I still hit a limit:
throughput does not scale as expected
increasing thread pool size does not significantly improve performance
CPU seems to become the bottleneck
Observations
Increasing number of workers (processes) gives noticeable performance improvement
Increasing number of threads (via executor) does not give the same effect
It feels like I’m limited by CPU and/or GIL behavior
Question
How is this problem typically solved in high-load Python web applications?
Specifically:
Is using ThreadPoolExecutor the correct approach for bcrypt/argon2 in async apps?
Is it expected that scaling is mostly achieved via multiple workers (processes) rather than threads?
Are there common architectural patterns, such as:
dedicated auth service
background workers
rate limiting login endpoints
Are there better alternatives to bcrypt for high-load scenarios (e.g. argon2 tuning)?
Goal
I’m trying to understand what is considered a best practice for handling password hashing in high-load async Python applications.
Any real-world approaches or architecture patterns would be highly appreciated.
