The word problem

Traditional sentiment analysis works by classifying words and phrases as positive, negative or neutral. It was designed for product reviews and social media — contexts where the words someone chooses are a reasonable proxy for how they feel.
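To make that concrete, here is a minimal sketch of the lexicon-based scoring this describes. The word lists and scoring rule are illustrative, not any particular vendor's implementation, but the failure mode is the same in larger systems:

```python
import re

# Illustrative word lists; production systems use far larger
# lexicons or trained classifiers.
POSITIVE = {"thanks", "great", "understand", "appreciate", "happy"}
NEGATIVE = {"angry", "terrible", "refuse", "complaint", "unfair"}

def word_sentiment(utterance: str) -> str:
    """Classify an utterance by counting positive vs negative words."""
    words = re.findall(r"[a-z']+", utterance.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Polite words from a customer in genuine distress still score well.
print(word_sentiment("I understand, thank you for explaining"))  # -> positive
```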

In financial services, this assumption breaks down. A customer in genuine financial distress may speak calmly and politely throughout an entire collections call. A customer making a fraudulent insurance claim may use appropriately emotional language. The words are not the signal. The person is the signal.

What sentiment analysis misses

Consider a debt recovery call where the customer says "I understand" in a quiet, steady voice. Sentiment analysis scores this as neutral or mildly positive. Voice prosody analysis tells a different story: pitch has dropped 40% from the caller's baseline, speech rate has halved, and micro-pauses have tripled in frequency. The customer is not calm. They are shutting down, a classic vulnerability indicator that keyword analysis will never detect.
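A sketch of how those baseline-relative deltas could be flagged. The ProsodyWindow fields, threshold values and function name are assumptions made for illustration, not EchoDepth's actual pipeline:

```python
from dataclasses import dataclass

@dataclass
class ProsodyWindow:
    """Aggregate prosodic features over a short analysis window."""
    mean_pitch_hz: float         # mean fundamental frequency (F0)
    speech_rate_sps: float       # syllables per second
    micro_pauses_per_min: float  # brief within-utterance pauses

def shutdown_indicators(baseline: ProsodyWindow, current: ProsodyWindow) -> dict:
    """Compare a window against the caller's own baseline.

    Thresholds are illustrative: pitch down >= 30%, speech rate
    halved, micro-pause frequency at least doubled.
    """
    pitch_drop = 1.0 - current.mean_pitch_hz / baseline.mean_pitch_hz
    rate_ratio = current.speech_rate_sps / baseline.speech_rate_sps
    pause_ratio = current.micro_pauses_per_min / max(baseline.micro_pauses_per_min, 1e-6)
    return {
        "pitch_dropped": pitch_drop >= 0.30,
        "rate_halved": rate_ratio <= 0.50,
        "pauses_spiked": pause_ratio >= 2.0,
    }

# The call from the example: pitch down 40%, rate halved, pauses tripled.
baseline = ProsodyWindow(mean_pitch_hz=210.0, speech_rate_sps=4.0, micro_pauses_per_min=6.0)
current = ProsodyWindow(mean_pitch_hz=126.0, speech_rate_sps=2.0, micro_pauses_per_min=18.0)

flags = shutdown_indicators(baseline, current)
if all(flags.values()):
    print("Possible shutdown / vulnerability indicator:", flags)
```

Comparing against the caller's own baseline, rather than a population norm, is what lets a quiet voice register as a change in state rather than a personality trait.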

FACS-based facial analysis adds another dimension. Involuntary Action Units such as brow tension (AU1+4), lip press (AU24) and chin raiser (AU17) signal stress, suppression and cognitive load regardless of what the person is saying. These physiological responses sit largely outside conscious control.
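As a sketch, per-frame Action Unit intensities (the 0-5 intensity scale produced by common AU estimators is assumed here) can be mapped to those markers with a simple rule. The threshold and the combination logic are illustrative, not a published specification:

```python
def involuntary_stress_markers(au: dict[str, float], threshold: float = 1.0) -> list[str]:
    """Return the involuntary stress markers active in this frame."""
    markers = []
    # AU1 (inner brow raiser) together with AU4 (brow lowerer)
    # forms the classic distress-brow configuration.
    if au.get("AU01", 0) >= threshold and au.get("AU04", 0) >= threshold:
        markers.append("distress brow (AU1+4)")
    if au.get("AU24", 0) >= threshold:
        markers.append("lip press (AU24)")
    if au.get("AU17", 0) >= threshold:
        markers.append("chin raiser (AU17)")
    return markers

frame = {"AU01": 1.8, "AU04": 2.1, "AU17": 0.3, "AU24": 1.4}
print(involuntary_stress_markers(frame))
# -> ['distress brow (AU1+4)', 'lip press (AU24)']
```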

The regulatory consequence

When the FCA reviews a complaint escalation, they ask: did the firm identify vulnerability? If the firm's compliance tools classified the interaction as "neutral" because the words were polite, the firm has a documentation gap that no amount of process can fill.

EchoDepth closes this gap by analysing the physiological signal beneath the words, across voice, video, image and text, and producing structured valence-arousal-dominance (VAD) scores that document emotional state at every point in the interaction.
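A hypothetical shape for that record, shown only to make "structured VAD scores" concrete. The field names and scales below are assumptions, not EchoDepth's published schema:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class VADScore:
    """One valence-arousal-dominance reading, timestamped per modality."""
    t_seconds: float   # offset into the interaction
    modality: str      # "voice" | "video" | "image" | "text"
    valence: float     # -1 (negative) .. +1 (positive)
    arousal: float     #  0 (calm)     .. +1 (activated)
    dominance: float   #  0 (shut down) .. +1 (in control)

# The "I understand" moment from the collections call above: the
# words score near neutral, but the voice shows low arousal and
# collapsing dominance.
timeline = [
    VADScore(t_seconds=312.0, modality="text",  valence=0.1,  arousal=0.2, dominance=0.50),
    VADScore(t_seconds=312.0, modality="voice", valence=-0.4, arousal=0.1, dominance=0.15),
]

print(json.dumps([asdict(s) for s in timeline], indent=2))
```

A per-modality timeline like this is what turns "the agent felt something was wrong" into an auditable record a reviewer can point to.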

Sentiment analysis reads the script. EchoDepth reads the performance. See how multimodal analysis works →