© 2026 Ubaidullah. Built in Pakistan.
Shipped
Mar 2026 to Present
AI Voice Agent Developer & Full-Stack Engineer

Nobuko Japan

Multi-provider AI voice agent and live supervision platform for a Tokyo-based used-vehicle exporter.

  • 2+ voice pipelines: Gemini Live + custom ElevenLabs + STT pipeline
  • 1 engineer: owning AI, automation, frontend & backend
  • 100% audit coverage: audio, transcript, prompt, supervisor actions
  • 4 export markets: UK, Ireland, Cyprus, Pakistan
The problem

Sales calls were a bottleneck. A small team couldn't reach enough prospects, the quality of each call depended on which agent picked up, and there was no systematic way to learn from what went well or badly. They needed AI that could make outbound calls at scale, with human agents supervising in real time and a complete audit trail of every interaction.

The approach

Built a pluggable voice agent architecture where any voice model can be swapped in. It currently runs two pipelines: Gemini Live, and a custom pipeline I built that combines ElevenLabs TTS with separate transcription models. Users pick the model and the voice and bring their own API keys, so when a better model launches we can A/B test it in minutes instead of rewriting the system. On top of that, human agents can listen to live calls, watch the streaming transcript, inject context from a sidebar to steer the AI mid-call, or take over entirely. Campaigns import contacts from Excel and launch outbound calls at scale through the company's SIP server.
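A minimal sketch of what such a swap point could look like, assuming a small registry keyed by provider name. Every name here (`VoicePipeline`, `PipelineConfig`, `registerPipeline`, `createPipeline`, and the provider keys) is hypothetical, and the stub factories stand in for the real Gemini Live and ElevenLabs integrations, which the case study does not show.

```typescript
// Hypothetical shapes: the case study does not describe the real interfaces.
interface PipelineConfig {
  model: string;  // model identifier picked by the user
  voice: string;  // voice identifier picked by the user
  apiKey: string; // user-supplied key (bring your own)
}

interface VoicePipeline {
  readonly provider: string;
  readonly config: PipelineConfig;
  // Takes the caller's latest utterance and resolves to the agent's reply text.
  respond(utterance: string): Promise<string>;
}

type PipelineFactory = (config: PipelineConfig) => VoicePipeline;

const registry = new Map<string, PipelineFactory>();

// Adding a new provider means registering one more factory.
function registerPipeline(provider: string, factory: PipelineFactory): void {
  registry.set(provider, factory);
}

function createPipeline(provider: string, config: PipelineConfig): VoicePipeline {
  const factory = registry.get(provider);
  if (!factory) {
    throw new Error(`Unknown voice provider: ${provider}`);
  }
  if (!config.apiKey) {
    throw new Error(`Missing API key for provider: ${provider}`);
  }
  return factory(config);
}

// Stub factory: echoes input instead of calling a real voice API.
function stubFactory(provider: string): PipelineFactory {
  return (config) => ({
    provider,
    config,
    respond: async (utterance) => `[${provider}/${config.model}] heard: ${utterance}`,
  });
}

registerPipeline("gemini-live", stubFactory("gemini-live"));
registerPipeline("elevenlabs-custom", stubFactory("elevenlabs-custom"));
```

With this shape, the rest of the system depends only on `VoicePipeline`, so an A/B test between two providers reduces to calling `createPipeline` with a different key.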

Architecture

How the system is wired.

The boundary that mattered: keeping the supervisor UI responsive while heavy AI work happens behind a WebSocket + microservice boundary.

Diagram legend: Client, Service, AI, Data, External.
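To make the WebSocket boundary concrete, here is a hedged sketch of how supervisor actions might be validated and applied on the service side. The message shapes (`inject_context`, `take_over`) and the `CallState` type are assumptions for illustration; the real wire format is not described here.

```typescript
// Hypothetical wire format for messages sent from the supervisor UI.
type SupervisorMessage =
  | { type: "inject_context"; callId: string; text: string }
  | { type: "take_over"; callId: string; agentId: string };

interface CallState {
  callId: string;
  controlledBy: "ai" | "human";
  injectedContext: string[]; // extra context passed to the AI mid-call
}

// Validates a raw WebSocket payload; returns null for anything malformed.
function parseMessage(raw: string): SupervisorMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { type, callId, text, agentId } = data as Record<string, unknown>;
  if (typeof callId !== "string") return null;
  if (type === "inject_context" && typeof text === "string") {
    return { type: "inject_context", callId, text };
  }
  if (type === "take_over" && typeof agentId === "string") {
    return { type: "take_over", callId, agentId };
  }
  return null;
}

// Pure state transition: returns a new state and never mutates the input.
function applyMessage(state: CallState, msg: SupervisorMessage): CallState {
  if (msg.callId !== state.callId) return state;
  if (msg.type === "inject_context") {
    return { ...state, injectedContext: [...state.injectedContext, msg.text] };
  }
  return { ...state, controlledBy: "human" };
}
```

Keeping the transition pure means the UI only sends small JSON messages and renders the resulting state, while the heavier AI work stays on the other side of the socket.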
The outcome

The result is a demo platform that handles outbound calling, live supervision, and full audit trails for every interaction. Every call's audio, transcript, exact prompt used, supervisor actions, and QA reviews are stored, so mistakes get flagged and fed back into prompt improvements. The provider-agnostic design means the platform stays competitive as the voice AI landscape shifts every few months.
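As a rough illustration of the audit trail described above, the record for one call could be shaped like the sketch below. The field names, the `missingAuditFields` check, and the `flaggedForPromptReview` helper are all hypothetical; the actual MongoDB schema is not shown in this write-up.

```typescript
// Hypothetical audit record for one call; field names are illustrative only.
interface SupervisorAction {
  at: string; // ISO timestamp
  agentId: string;
  kind: "inject_context" | "take_over";
  detail?: string;
}

interface QaReview {
  reviewerId: string;
  flagged: boolean; // true when the reviewer marks a mistake
  notes: string;
}

interface CallAuditRecord {
  callId: string;
  audioUri?: string;   // where the recording is stored
  transcript?: string;
  promptUsed?: string; // exact prompt sent to the model
  supervisorActions: SupervisorAction[];
  qaReview?: QaReview; // absent until a review happens
}

// Lists the required artifacts a record is still missing.
function missingAuditFields(record: CallAuditRecord): string[] {
  const missing: string[] = [];
  if (!record.audioUri) missing.push("audioUri");
  if (!record.transcript) missing.push("transcript");
  if (!record.promptUsed) missing.push("promptUsed");
  return missing;
}

// Call IDs whose QA review flagged a mistake, as candidates for prompt fixes.
function flaggedForPromptReview(records: CallAuditRecord[]): string[] {
  return records
    .filter((record) => record.qaReview?.flagged === true)
    .map((record) => record.callId);
}
```

A check like `missingAuditFields` is one way to back a coverage figure with data: a call counts as covered only when it returns an empty list.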

What I owned
  • 01 Designed a pluggable multi-provider voice agent: any voice model can be swapped in, and users pick model, voice, and bring their own API keys.
  • 02 Built a custom voice pipeline combining ElevenLabs TTS with separate transcription models, alongside the Gemini Live pipeline.
  • 03 Implemented live agent supervision: human agents hear AI calls in real time, see live transcripts, inject context to steer the AI mid-call, or take over.
  • 04 Built the campaign system: sales agents import contacts from Excel, build outbound campaigns, and launch via the company's SIP server.
  • 05 Designed the audit-trail data model so every call is reproducible: audio, transcript, exact prompt, supervisor actions, and QA review.
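For the campaign import step, a sketch of the kind of cleanup that could run on rows after they have been read from an Excel sheet. It assumes the rows are already parsed into objects (the spreadsheet parsing itself is out of scope), and `ContactRow`, `ImportResult`, and `prepareContacts` are hypothetical names.

```typescript
// Hypothetical row shape after the spreadsheet has been parsed elsewhere.
interface ContactRow {
  name: string;
  phone: string;
}

interface ImportResult {
  contacts: ContactRow[];
  skipped: number; // rows dropped for an empty phone or a duplicate number
}

// Normalizes phone numbers to digits (keeping a leading "+") and drops duplicates.
function prepareContacts(rows: ContactRow[]): ImportResult {
  const seen = new Set<string>();
  const contacts: ContactRow[] = [];
  let skipped = 0;

  for (const row of rows) {
    const hasPlus = row.phone.trim().startsWith("+");
    const digits = row.phone.replace(/\D/g, "");
    if (digits.length === 0) {
      skipped++;
      continue;
    }
    const phone = (hasPlus ? "+" : "") + digits;
    if (seen.has(phone)) {
      skipped++;
      continue;
    }
    seen.add(phone);
    contacts.push({ name: row.name.trim(), phone });
  }

  return { contacts, skipped };
}
```

Deduplicating before launch avoids dialing the same prospect twice when a sheet contains the same number in different formats.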
Stack
React · Node.js · Express · MongoDB · WebSockets · SIP · Gemini Live · ElevenLabs · STT / TTS · Prompt engineering