Answering Agent: Building an AI-powered call quality score

About Answering Agent

Answering Agent is an AI-powered calling platform that automates customer outreach, follow-ups, and support conversations. Because it is an AI-first product, the quality of each call directly impacts customer satisfaction, product trust, and conversion outcomes. But without a clear way to evaluate call performance, the team struggled to understand how well the AI was performing and what needed improvement.


Challenge 1: No Visibility Into AI Call Quality

The core issue Answering Agent faced was the inability to evaluate whether an AI-generated call was “good” or “bad,” and more importantly, why.

This created multiple problems:

  • The team couldn’t pinpoint weaknesses in the AI’s calling logic
  • Product decisions were based on assumptions rather than data
  • There was no standardized framework for evaluating calls
  • Improving the AI became guesswork instead of a structured process

Without call-level insights, the product's evolution was limited.


Solution

To give the team clarity and actionable insights, we built an end-to-end evaluation framework for AI call quality.

1. Deep dive into the product & industry benchmarks

We studied:

  • How the AI conducted calls
  • Typical call flows and expected outcomes
  • Industry standards for conversational AI performance
  • Real-world call scenarios and edge cases

This groundwork enabled us to define what “good” looked like.

2. Defined a complete set of evaluation metrics

We created a structured metric system that captured dimensions such as:

  • Call clarity and coherence
  • Response relevance
  • Latency and hesitation patterns
  • Completion of the intended task
  • User sentiment cues
  • Compliance with call scripts or guidelines

These metrics formed the foundation of a consistent scoring framework.
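To make the idea concrete, here is a minimal sketch of what such a metric system could look like in code. The dimension names mirror the list above, but the weights, descriptions, and structure are hypothetical illustrations, not Answering Agent's actual configuration.

```python
from dataclasses import dataclass

# Hypothetical metric definitions; names, weights, and descriptions are
# illustrative only, not the team's real scoring configuration.
@dataclass
class Metric:
    name: str          # dimension being scored
    weight: float      # contribution to the overall call score
    description: str   # what the evaluator looks for on this dimension

CALL_METRICS = [
    Metric("clarity", 0.20, "Speech is coherent and easy to follow"),
    Metric("relevance", 0.20, "Responses address the caller's intent"),
    Metric("latency", 0.15, "Pauses and hesitations stay within limits"),
    Metric("task_completion", 0.25, "The call achieves its intended goal"),
    Metric("sentiment", 0.10, "Caller sentiment stays neutral or positive"),
    Metric("compliance", 0.10, "The agent follows the approved script"),
]

# Weights should sum to 1 so the overall score stays on a 0-1 scale.
assert abs(sum(m.weight for m in CALL_METRICS) - 1.0) < 1e-9
```

Keeping the metrics in one declarative structure like this makes it easy to reweight dimensions or add new ones without touching the scoring logic.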

3. Built an AI call scoring model

On top of the defined metrics, we developed a model that:

  • Analyzed each AI call
  • Scored it across the defined dimensions
  • Highlighted specific issues when a call underperformed
  • Provided an overall call quality score

This transformed raw call data into actionable insights.
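As an illustration of the aggregation step, the sketch below shows one way per-dimension scores could be combined into an overall score while flagging weak dimensions. The per-dimension scoring itself (for example, by a classifier or an LLM judge analyzing the transcript) is assumed to happen upstream; the function name, threshold, and example values are hypothetical.

```python
# Minimal sketch: combine per-dimension scores (each on a 0-1 scale) into an
# overall weighted score and a list of flagged issues. Values are illustrative.
def score_call(dimension_scores: dict[str, float],
               weights: dict[str, float],
               issue_threshold: float = 0.6) -> dict:
    # Weighted average across the defined dimensions.
    overall = sum(dimension_scores[d] * weights[d] for d in weights)

    # Flag dimensions below the threshold so the team sees not just that a
    # call underperformed, but where.
    issues = [d for d, s in dimension_scores.items() if s < issue_threshold]

    return {"overall_score": round(overall, 2), "issues": issues}

# Example: a call that completed its task but had noticeable latency.
result = score_call(
    {"clarity": 0.9, "relevance": 0.85, "latency": 0.4,
     "task_completion": 0.95, "sentiment": 0.8, "compliance": 0.9},
    {"clarity": 0.20, "relevance": 0.20, "latency": 0.15,
     "task_completion": 0.25, "sentiment": 0.10, "compliance": 0.10},
)
# 'latency' is flagged as the weak dimension; the overall score lands around 0.82.
```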


Result

The scoring system is now actively used across the team:

  • The team can instantly see which calls performed poorly, and why
  • Product and engineering teams can prioritize improvements based on real data
  • The model enables continuous optimization of the AI calling logic
  • Over time, call quality has become significantly more predictable and measurable

The team now operates with full visibility into AI performance, making improvements faster, more targeted, and far more effective.