Job description
Lucida is teaching the world to speak.
Two billion people are trying to learn a language.
Almost all of them are stuck: not because they lack motivation, but because the only thing that actually works (talking to a human tutor) is too expensive, too inconvenient, or too embarrassing.
We're building the alternative: a voice-first AI tutor you can actually have a conversation with, anytime, in your pocket.
Real-time. Sub-second.
Feels-like-a-person.
Already serving a million learners.
We're well-funded, seed-stage, and we're hiring the engineer who'll build the backbone behind that product.
The role
You'll own a meaningful surface of our backend: the systems that turn audio, models, prompts, and user state into a working tutor at scale.
Day-to-day, you'll:
- Design and operate the real-time conversational pipeline: streaming services and WebSocket interfaces that keep latency budgets honest at the scale of a million users
- Build and harden the LLM orchestration layer: prompt design as code, structured outputs, streaming, retries, fallbacks, cost control across multiple providers
- Treat prompts as engineering artifacts: versioned, evaluated, regression-tested. Vibes are not a methodology.
- Take open-source models (LLM, ASR, TTS, avatar) from a paper or HF repo and put them on our GPUs: benchmark, optimize, serve, monitor
- Fine-tune and train our own models on top of open-source bases: curate datasets, run training jobs, evaluate against production criteria, and ship the result
- Design event-driven media flows: webhooks, post-session processing, recording and export pipelines
- Own third-party integrations end-to-end: contracts, retries, observability, the boring-important stuff
- Make architecture decisions with the founders, not after them

What we're looking for
- 5+ years writing production Python you're not embarrassed by: typed, tested, readable
- Deep fluency in asyncio and concurrent/streaming code
- Strong command of HTTP, WebSockets, and event-driven systems
- Hands-on experience integrating with LLM APIs in production: streaming, tool use, structured outputs, and the operational realities (rate limits, retries, cost control)
- A real sense of prompt engineering as engineering: you've shipped prompts that survived contact with users, iterated on them with data, and didn't just "feel good in the playground"
- A real fine-tuning / training track record: you've taken an open-source model, prepared the data, run the training, evaluated it honestly, and shipped the result to users. Not a notebook tutorial. A model that moved a metric.
- Experience deploying and serving your own models on GPUs: quantization, batching, KV-cache, latency/throughput tradeoffs
- A debugging instinct for distributed systems at scale: traces, profiling, backpressure, capacity planning
- Comfort with Postgres, Redis, and a queue/broker layer
- Pragmatism: you ship, you measure, you iterate. You don't over-engineer, and you don't under-test.
Nice to have
- Real-time media systems (WebRTC, SFU, streaming pipelines)
- Audio or speech model deployment and fine-tuning in production
- Distillation, synthetic data generation, or RLHF/DPO-style alignment work
- Multi-region or multi-cloud infrastructure
- Cost optimization at scale: token economics, GPU utilization, caching strategies
- Open-source contributions