We are seeking a Senior MLOps Engineer to design, build, and maintain the infrastructure and pipelines that operationalize AI and Machine Learning systems at scale.
This role bridges the gap between model development and production deployment—ensuring ML and GenAI workloads are reliable, observable, cost-efficient, and continuously improving across enterprise environments.
Key Responsibilities:
Design and implement end-to-end ML pipelines covering data ingestion, feature engineering, model training, evaluation, and deployment.
Build and manage CI/CD pipelines for ML models, including automated testing, validation, and rollback mechanisms.
Architect and maintain model serving infrastructure for real-time and batch inference workloads, including LLM and agentic AI deployments.
Implement model monitoring, drift detection, and alerting systems to ensure production model health and reliability.
Manage experiment tracking, model versioning, and artifact registries to enable reproducibility and governance.
Optimize compute costs and inference latency across GPU/CPU workloads on cloud platforms (AWS, Azure, or GCP).
Containerize and orchestrate ML workloads using Docker and Kubernetes.
Automate data pipeline workflows and feature store management for training and inference.
Collaborate with AI Engineers, Data Scientists, and Platform teams to streamline the path from prototype to production.
Establish and enforce MLOps best practices, standards, and documentation across the engineering organization.
Requirements:
Bachelor’s degree in Computer Science, Engineering, or a related field.
5+ years of experience in DevOps, Platform Engineering, or MLOps roles with 1–2+ years focused on ML/AI infrastructure.
Strong programming skills in Python; experience with Bash, Go, or Java is a plus.
Hands-on experience with ML pipeline orchestration tools such as Kubeflow, MLflow, Airflow, or Vertex AI Pipelines.
Proficiency with containerization (Docker) and orchestration (Kubernetes, Helm).
Experience with cloud-native ML services on AWS (SageMaker), Azure (Azure ML), or GCP (Vertex AI).
Familiarity with model serving frameworks such as TorchServe, Triton Inference Server, vLLM, or TGI.
Knowledge of Infrastructure as Code (Terraform, Pulumi, or CloudFormation).
Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or equivalent).
Strong understanding of software engineering fundamentals, version control (Git), and CI/CD practices.
Nice to Have:
Experience deploying and serving Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems in production.
Familiarity with vector databases (Pinecone, Weaviate, Qdrant, or pgvector).
Exposure to AI observability platforms (LangSmith, Weights & Biases, Arize, or WhyLabs).
Experience with feature stores (Feast, Tecton, or equivalent).
Familiarity with GPU cluster management and distributed training infrastructure.
Experience with enterprise SaaS platforms and multi-tenant ML infrastructure.