Item #8 — Replace Suki

Self-Hosted Ambient Documentation.

Replace Suki on infrastructure LifeMD already owns. Same provider experience. Same EHR integration. Audio never leaves our perimeter.

MVP shipped on Spark today
Audio → transcript via NVIDIA Parakeet TDT
Transcript → 23-section Suki-shape note via local LLM
Dima Vorobiov: Spark · ASR · LLMs · Suki schema · Voice loop · UI
Pedro Moreira: Pitch · Testing · Coordination
Status: MVP live on Spark · ready to demo
The problem

Suki works. But the math doesn't.

Three structural reasons to evaluate a self-hosted alternative — and the integration shipped to dev only two months ago, so the switching cost is at its lowest.


Per-provider subscription

Suki bills monthly per provider. As LifeMD scales the provider base, this cost scales linearly with no leverage on unit economics.

PHI leaves our perimeter

Every encounter's audio and transcript transit Suki's cloud. That is the largest third-party PHI surface we currently maintain.

Vendor lock-in

The 23-section LOINC schema and Elation pipeline are now coupled to one vendor's roadmap, prompts, and pricing decisions.

What we built

A drop-in replacement at the JSON boundary.

Audio in → schema-compatible Suki JSON out. Live on Spark today: Parakeet transcribes, and the local LLM generates notes in Suki's exact 23-section schema. The existing thecvlb/lifemd → Elation pipeline doesn't change at all.

Capture

Telehealth Visit

Stereo recording with provider on the left channel, patient on the right.

L = Provider · R = Patient
Inference · NVIDIA DGX Spark
Live

On-Prem AI Pipeline

128 GB unified memory · GB10 · aarch64
1. NVIDIA Parakeet TDT-0.6B: stereo split → per-channel ASR → dialogue · working
2. Local LLM via llama.cpp: OpenBioLLM-70B · Llama 3.3 70B · MedGemma 27B · working
→ {noteId, contents[]}
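
The "stereo split → per-channel ASR → dialogue" step can be sketched roughly as follows. This is a minimal illustration, not the code running on Spark: it assumes interleaved stereo samples and assumes the ASR returns `(start_seconds, text)` pairs per channel. The names `Segment`, `split_stereo`, `merge_dialogue`, and `format_dialogue` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # "Provider" (left channel) or "Patient" (right channel)
    start: float  # seconds from the start of the recording
    text: str


def split_stereo(interleaved):
    """Split interleaved stereo samples [L0, R0, L1, R1, ...] into (left, right)."""
    if len(interleaved) % 2:
        raise ValueError("interleaved stereo needs an even number of samples")
    return interleaved[0::2], interleaved[1::2]


def merge_dialogue(provider_segments, patient_segments):
    """Merge per-channel ASR output into one dialogue ordered by start time.

    Each argument is a list of (start_seconds, text) pairs, which is an
    assumed shape for the ASR result, not Parakeet's actual return type.
    """
    tagged = [Segment("Provider", start, text) for start, text in provider_segments]
    tagged += [Segment("Patient", start, text) for start, text in patient_segments]
    return sorted(tagged, key=lambda seg: seg.start)


def format_dialogue(segments):
    """Render the merged dialogue as 'Speaker: text' lines for the LLM prompt."""
    return "\n".join(f"{seg.speaker}: {seg.text}" for seg in segments)
```

Because each speaker has a dedicated channel, speaker labels come from the channel itself and no diarization model is needed at this stage.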
Unchanged

Existing Pipeline

thecvlb/lifemd backend: PR #15133 endpoint

Elation EHR: createNonVisitNote

GPU: GB10 Grace Blackwell
Memory: 128 GB unified
CUDA: 12.6+ · SM_121
Driver: 580.x · aarch64
Models loaded: 3 GGUF + 1 NeMo
Live demo · running on Spark

Watch it write a chart note.

Real output from this Spark, not screenshots. Pick a sample, pick a model, and click Run live to watch the note stream out token by token, or jump to the Mic tab and dictate a note yourself.

Panels: Sample audio · Transcript (Parakeet TDT-0.6B) · Generated note
Live consultation · voice loop

Have a visit with our doctor.

Click start and speak as the patient. Your audio streams to Parakeet for ASR, and server-side voice activity detection ends each turn. The doctor (MedGemma 27B + Piper TTS, all on Spark) replies out loud and asks follow-up questions. When you end the consultation, we generate the structured note from the whole conversation.
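
As an illustration of how voice activity detection can end a turn, here is a minimal energy-based sketch. It is an assumption for explanation only: the deck does not say which VAD the server uses, and the `threshold` and `hangover_frames` values are hypothetical defaults rather than tuned settings.

```python
import math


def frame_rms(frame):
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def find_turn_end(frames, threshold=0.02, hangover_frames=3):
    """Return the index of the frame that closes the turn, or None.

    The turn ends once speech has been heard and is followed by
    `hangover_frames` consecutive frames whose energy is below `threshold`.
    Leading silence before any speech never ends a turn.
    """
    heard_speech = False
    silent_run = 0
    for index, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            heard_speech = True
            silent_run = 0  # speech resumed, so restart the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= hangover_frames:
                return index
    return None
```

The hangover window is what keeps a short pause mid-sentence from cutting the patient off; a production VAD would typically use a trained model instead of a fixed energy threshold.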

Panels: Voice settings · Conversation (one turn at a time) · Structured note (starts filling in after the first exchange)
Quality match

Same encounter. Same shape. Same EHR.

Suki output from production (Justin's March 11 thread) vs. ours from OpenBioLLM-70B on Spark. 21 fixed sections + dynamic problem sections. LOINC codes hardcoded by our backend; the LLM never picks codes.
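
A minimal sketch of what "the LLM never picks codes" can look like in the backend. The four LOINC codes are the ones shown in the comparison in this section; everything else is an assumption for illustration, including the function name `build_note` and the field names `title`, `loincCode`, and `content` (the deck only shows the outer `{noteId, contents[]}` shape).

```python
# Backend-owned lookup table. Only the four sections shown in this
# document are listed; the remaining fixed sections would be added here.
SECTION_LOINC = {
    "Chief Complaint": "10154-3",
    "Problem List": "11450-4",
    "Patient Instructions": "69730-0",
    "Vitals": "8716-3",
}


def build_note(note_id, llm_sections):
    """Attach backend-owned LOINC codes to section text written by the LLM.

    `llm_sections` maps a section title to its text. The model contributes
    text only; codes come from SECTION_LOINC, and unknown titles are rejected
    instead of being passed through to the EHR.
    """
    contents = []
    for title, text in llm_sections.items():
        if title not in SECTION_LOINC:
            raise KeyError(f"unknown section title: {title!r}")
        contents.append(
            {
                "title": title,                     # hypothetical field name
                "loincCode": SECTION_LOINC[title],  # hypothetical field name
                "content": text.strip(),            # hypothetical field name
            }
        )
    return {"noteId": note_id, "contents": contents}
```

Keeping the codes in a lookup table means a model swap cannot change which LOINC code a section is filed under.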

Suki · production payload · noteId: 316e923a…

Chief Complaint 10154-3

The patient presents for a consultation regarding weight management.

Problem List 11450-4

- Obesity
- Hypertension

Patient Instructions 69730-0

- Lab tests before medication for weight loss
- Continue nutrition and physical activity
- Wegovy: side effects can include nausea, stomach discomfort, constipation
- Follow-up after lab results

Vitals 8716-3

- Height: 5 feet 8 inches
- Weight: 215 pounds

LifeMD on Spark · OpenBioLLM-70B · noteId: 8f4c91a2…

Chief Complaint 10154-3

The patient presents for a consultation regarding weight management.

Problem List 11450-4

- Obesity
- Hypertension

Patient Instructions 69730-0

- Lab tests are needed before starting weight-loss medication
- Maintain current nutrition and physical activity
- Wegovy: be aware of nausea, stomach discomfort, or constipation early
- Plan follow-up once labs return

Vitals 8716-3

- Height: 5'8"
- Weight: 215 lb
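
The "same shape" claim can be checked mechanically. The sketch below compares the set of (section title, LOINC code) pairs in two notes and ignores the wording of the text. The field names `title` and `loincCode` and the function names are hypothetical, since the deck only shows the outer `{noteId, contents[]}` shape.

```python
def section_codes(note):
    """Return the (title, LOINC code) pair for every section in a note."""
    return [(section["title"], section["loincCode"]) for section in note["contents"]]


def shape_diff(reference, candidate):
    """Compare two notes by section shape only, not by section wording.

    Returns the sections present in `reference` but missing from `candidate`,
    and the sections `candidate` adds that `reference` does not have.
    """
    ref = set(section_codes(reference))
    cand = set(section_codes(candidate))
    return {"missing": sorted(ref - cand), "extra": sorted(cand - ref)}


def same_shape(reference, candidate):
    """True when both notes carry exactly the same sections and codes."""
    diff = shape_diff(reference, candidate)
    return not diff["missing"] and not diff["extra"]
```

A check like this could run across a batch of paired encounters before cutover, with wording differences reviewed separately by clinicians.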

The numbers

Cheaper. Private. Ours.

And we own the prompt, the schema, the model — swap to whatever beats this on the day it ships.

~$0 · Marginal cost / visit
Spark hardware amortized; inference cost is electricity. Suki: per-provider monthly subscription that scales linearly with provider count.
vs Suki: $X.xx / provider / month · paste real number

0 · Third-party PHI surface
Audio + transcript never leave the LifeMD perimeter. BAA scope shrinks. HIPAA-relevant risk surface shrinks with it.
vs Suki cloud: full encounter audio + transcript

<90s · Note generation latency
Stereo audio in, ready-to-sign Elation note out. Comparable to Suki's webhook delivery time.
Measured on Spark · paste benchmark
Roadmap

Five to seven weeks to full replacement.

The proof shipped during the hackathon; phases 2–4 are the engineering investment we're asking leadership to greenlight.

PHASE 01

Proof

Today

Spark + Parakeet ASR + 3-model A/B → Suki-shape JSON → live Elation write for one test patient.

SHIPPED on dev
PHASE 02

Productionize

2–3 weeks

Wrap into svc-transcribe with Presidio redaction, Celery async, ECS deploy, monitoring + alerting, audit logs.

PHASE 03

Streaming

1–2 weeks

NVIDIA Riva for production gRPC streaming, Sortformer diarization, Triton multi-tenancy.

PHASE 04

Parity+

2 weeks

Real-time partial transcripts during the visit, voice commands, multilingual via Canary-1B.

The ask

Greenlight Phase 2.

Replace Suki for all providers in ~5–7 weeks.

What we need
Two engineers · 5–7 weeks
  • Phase 2: production wrapping into svc-transcribe
  • Phase 3: NVIDIA Riva for streaming
  • Phase 4: real-time partial transcripts + parity features
Spark hardware already owned · zero new infra cost
What you get
Suki replaced. PHI on-prem. Cost approaches zero.
  • Same 23-section LOINC schema · same Elation pipeline
  • Model-agnostic — swap to better models as they ship
  • Forward-compatible with Riva, streaming, multilingual
Proof shipped today · ready to scale