Work Flexibility: Hybrid or Onsite
Principal Engineer – AI Quality & Evaluation Architecture
Role Overview
Vocera (Stryker) is seeking a Principal Engineer to own end-to-end AI quality across the full lifecycle: data, models, prompts, evaluation, deployment, and monitoring. The role defines and scales reliable, measurable, production-grade AI systems across speech, NLP, and generative AI in healthcare.
Key Responsibilities
AI Quality Ownership
Own AI quality across the full lifecycle
Define SLAs, KPIs, release gates, and production readiness decisions
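As a rough sketch of the kind of release gate this responsibility implies (the metric names and thresholds here are illustrative assumptions, not figures from the posting), a gate can be expressed as a small policy object that compares evaluation metrics against agreed limits:

```python
from dataclasses import dataclass


@dataclass
class ReleaseGate:
    """Production-readiness gate with illustrative (assumed) thresholds."""
    max_wer: float = 0.12                 # speech: word error rate ceiling
    max_hallucination_rate: float = 0.02  # LLM: fraction of ungrounded answers
    max_p95_latency_ms: float = 800.0     # serving: 95th-percentile latency

    def passes(self, metrics: dict[str, float]) -> bool:
        """True only if every tracked metric is within its limit."""
        return (metrics["wer"] <= self.max_wer
                and metrics["hallucination_rate"] <= self.max_hallucination_rate
                and metrics["p95_latency_ms"] <= self.max_p95_latency_ms)
```

In practice such a gate would run in CI against the latest evaluation report, blocking promotion when any metric regresses past its limit.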
Evaluation & Reliability
Build evaluation frameworks for ASR (WER, latency), NLP (intent/entity), and LLMs/RAG (hallucination, safety, groundedness)
Develop benchmarking, regression pipelines, and golden datasets
Drive adversarial testing, edge case handling, and failure analysis
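To make the ASR metric above concrete, word error rate (WER) is the word-level edit distance between a reference transcript and a hypothesis, divided by the reference length. A minimal self-contained sketch (not code from the posting):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

Regression pipelines typically run a function like this over a golden dataset of (audio, reference) pairs and alert on any increase beyond an agreed tolerance.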
AI Testing Platform
Architect scalable evaluation platforms (offline, regression, A/B, shadow testing)
Integrate with CI/CD and MLOps pipelines
Implement monitoring, observability, and drift detection
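One common drift-detection signal, offered here as an illustrative sketch rather than the team's actual method, is the Population Stability Index (PSI): bucket a baseline score distribution, bucket the live distribution the same way, and measure how far the proportions have moved.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline and a live distribution.

    Near 0 means no drift; larger values mean the live distribution has
    shifted away from the baseline. Common (rule-of-thumb) alert threshold: 0.2.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def proportions(xs: list[float]) -> list[float]:
        counts = [0] * bins
        for x in xs:
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        # Smooth empty buckets to avoid log(0).
        return [max(c / len(xs), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Run against a fixed baseline window, a monitor like this turns "drift detection" into a single number that can feed dashboards and paging rules.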
Data Governance
Define standards for data curation, annotation, and versioning
Ensure reproducibility and feedback loops from production
Maintain healthcare data compliance
MLOps & Continuous Quality
Establish AI MLOps standards for evaluation, retraining, and deployment
Enable continuous evaluation and performance monitoring at scale
Leadership
Act as AI quality authority across the organization
Mentor teams and align with product and business goals
Qualifications
12+ years in software/AI engineering; 5+ years in LLMs, NLP, RAG, or speech
Experience building scalable AI evaluation frameworks
Expertise in:
LLM evaluation (hallucination, safety, groundedness)
Golden datasets, regression testing, adversarial testing
Prompt validation, Python, data analysis, automation
CI/CD, MLOps, distributed systems
Nice to Have
RAG evaluation & retrieval benchmarking
Speech/ASR evaluation
Azure ML / OpenAI / AI Search
Responsible AI & compliance