Designing a real-time anti-scam detection system for voice and text — what features and architecture should I consider?


I am building a real-time scam detection prototype for both voice calls and text chats.

Voice input will be transcribed using streaming ASR, then classified with NLP.

Text input will be classified directly.

The system should output a risk score and highlight suspicious phrases (e.g., urgency, requests for OTP/passwords, impersonation).
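To make the intended output concrete, here is a minimal sketch of a rule-based scorer that returns a risk score plus the character spans to highlight. The category names, regular expressions, and weights are placeholders I made up for illustration, not a tested rule set, and the noisy-OR combination is just one possible way to merge category weights into a single score.

```python
import re
from dataclasses import dataclass

# Hypothetical categories, patterns, and weights, chosen only for illustration.
PATTERNS = {
    "urgency": (
        re.compile(r"\b(immediately|right now|urgent(ly)?)\b", re.I), 0.3),
    "credential_request": (
        re.compile(r"\b(otp|one[- ]time (password|code)|password|pin)\b", re.I), 0.5),
    "impersonation": (
        re.compile(r"\b(this is|calling from) (your|the) (bank|police|tax office)\b", re.I), 0.4),
}


@dataclass
class Hit:
    category: str  # which rule fired
    start: int     # character offset where the phrase starts
    end: int       # character offset just past the phrase
    text: str      # the matched phrase, for highlighting


def score_message(text):
    """Return (risk in [0, 1], list of Hit) for one message or transcript chunk."""
    hits = []
    fired = {}
    for category, (pattern, weight) in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append(Hit(category, m.start(), m.end(), m.group(0)))
            fired[category] = weight  # each category counts once, however often it matches
    # Noisy-OR: risk = 1 - product of (1 - weight) over the categories that fired.
    no_risk = 1.0
    for weight in fired.values():
        no_risk *= 1.0 - weight
    return 1.0 - no_risk, hits
```

Calling `score_message("This is your bank. Share the OTP immediately.")` would fire all three placeholder categories, and each `Hit` carries the offsets needed to highlight the phrase in the UI.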

It needs to run with low latency on mobile/edge, so model size and inference cost matter.

I am currently deciding between three approaches: rule-based keywords, a lightweight transformer classifier, or a hybrid of the two.

Questions:

What features beyond keywords are effective for scam detection in conversations?

What lightweight models are suitable for real-time on-device text classification?

For streaming voice transcripts, should classification be done on sliding windows or full utterances?

What pipeline design best balances latency and accuracy (ASR → NLP → risk scoring)?
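To make the sliding-window option concrete, this is roughly what I have in mind: keep the last N finalized ASR words, re-classify every few words, and smooth the result so the score does not jump around. `SlidingWindowScorer`, its parameters, and the `classify` callback are hypothetical names for this sketch; `classify` stands in for whichever model or rule set ends up being used.

```python
from collections import deque


class SlidingWindowScorer:
    """Re-scores a fixed-size window of recent words every `stride` new words."""

    def __init__(self, classify, window_size=30, stride=10, alpha=0.5):
        self.classify = classify                  # callable: str -> risk in [0, 1]
        self.window = deque(maxlen=window_size)   # oldest words fall out automatically
        self.stride = stride                      # new words between re-classifications
        self.alpha = alpha                        # smoothing factor for the running risk
        self.since_last = 0
        self.risk = 0.0

    def push(self, words):
        """Feed newly finalized ASR words; return the current smoothed risk."""
        for word in words:
            self.window.append(word)
            self.since_last += 1
            if self.since_last >= self.stride:
                self.since_last = 0
                score = self.classify(" ".join(self.window))
                # Exponential moving average over successive window scores.
                self.risk = self.alpha * score + (1 - self.alpha) * self.risk
        return self.risk
```

The trade-off I am unsure about is visible here: a smaller `stride` reacts faster but runs the classifier more often, while a larger window gives more context but dilutes a short suspicious phrase.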

I am looking for implementation patterns and system design advice rather than product recommendations.
