FastAPI high-load login: how to handle bcrypt/argon2 hashing without blocking, and how to deal with scaling limits?


I’m working on a high-load async web application using FastAPI, and I have a question about implementing login/registration logic with password hashing.

Problem

As we know, password hashing (e.g. with bcrypt or argon2) is a CPU-bound and relatively slow operation. In my case I’m using bcrypt.

Hashing takes noticeable time per request

It blocks execution (CPU-bound)

Under high load (many concurrent login requests), it becomes a bottleneck

I also understand that:

The Python bcrypt library is implemented in native code and releases the GIL during hashing, so it can run in parallel threads

In async apps, blocking operations should be offloaded to a thread pool

What I tried

I tried to offload hashing using:

await asyncio.to_thread(bcrypt.hashpw, password, salt)

and also:

loop.run_in_executor(...)

However, under load I still hit a limit:

throughput does not scale as expected

increasing thread pool size does not significantly improve performance

CPU seems to become the bottleneck

Observations

Increasing number of workers (processes) gives noticeable performance improvement

Increasing number of threads (via executor) does not give the same effect

It feels like I’m limited by CPU and/or GIL behavior

Question

How is this problem typically solved in high-load Python web applications?

Specifically:

Is using ThreadPoolExecutor the correct approach for bcrypt/argon2 in async apps?

Is it expected that scaling is mostly achieved via multiple workers (processes) rather than threads?

Are there common architectural patterns, such as:

dedicated auth service

background workers

rate limiting login endpoints

Are there better alternatives to bcrypt for high-load scenarios (e.g. argon2 tuning)?
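By argon2 tuning I mean something along these lines, using argon2-cffi (the parameter values below are illustrative and would need to be validated against current guidance such as RFC 9106, since they trade security margin for latency):

```python
# Tune argon2 work parameters via argon2-cffi's PasswordHasher.
from argon2 import PasswordHasher

ph = PasswordHasher(
    time_cost=2,        # number of iterations
    memory_cost=65536,  # memory per hash, in KiB
    parallelism=2,      # internal lanes
)

hashed = ph.hash("s3cret")
print(ph.verify(hashed, "s3cret"))  # True on success; raises on mismatch
```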

Goal

I’m trying to understand what is considered a best practice for handling password hashing in high-load async Python applications.

Any real-world approaches or architecture patterns would be highly appreciated.
