No experience required
Other Business Support Services

Job description

This role builds and runs the infrastructure our Generative AI products depend on: the pipelines that ship code, the platforms that run services and models, and the controls that keep all of it secure and reliable.
AI workloads bring their own demands.
GPUs, model serving, inference autoscaling, and token cost all shape the work, and you have run workloads like these before.
You should be comfortable owning infrastructure as code, CI/CD, observability, and security on AWS, and ready to set the operational standards a growing team will lean on.
WHAT YOU WILL DO
Write and maintain infrastructure as code so environments are reproducible, reviewable, and quick to recover.
Own CI/CD: the pipelines that build, test, scan, and deploy applications, agents, and model-serving services.
Run the container platform (EKS, ECS, or Fargate) and the deployment workflows on top of it, including GitOps where it fits.
Stand up the runtime for AI workloads: GPU capacity, model serving such as vLLM, Triton, or TGI, inference autoscaling, and the gateways and caching that sit in front of the models.
Manage API gateways, networking, load balancing, DNS, and certificates so services are exposed safely and predictably.
Own secrets, identity, and least-privilege access across every environment.
Run databases in production: clustering, replication, failover, backups, and recovery.
Build monitoring into everything, including token usage and GPU utilisation, with alerting and clear service objectives.
Lead reliability and security practice: incident response, policy as code, vulnerability and container scanning, and cost discipline, which matters once GPUs are in the mix.
Eight or more years in DevOps, SRE, or infrastructure engineering overall.
That includes hands-on experience supporting AI or ML workloads in production, which can be a more recent part of your background.
Strong infrastructure as code with Terraform or OpenTofu, including module design and remote state.
Experience with HCP Terraform (formerly Terraform Cloud) is a plus.
Configuration management with Ansible.
Solid AWS experience across compute, networking (VPC, subnets, security groups, load balancers, Route 53), IAM, and storage.
Strong CI/CD with GitHub Actions, including reusable workflows and careful handling of credentials.
Containers and orchestration: Docker with Kubernetes (EKS preferred), Helm, and a registry such as ECR.
API gateway experience with Kong or Amazon API Gateway, including auth, rate limiting, and routing.
Database operations including clustering and high availability, with RDS or Aurora, PostgreSQL, and a cache such as Redis or ElastiCache.
Secrets management with HashiCorp Vault, AWS Secrets Manager, or Parameter Store.
Observability with Prometheus, Grafana, CloudWatch, and OpenTelemetry, or close equivalents.
Comfort in Linux and scripting with Bash and Python.
This job post has been translated by AI and may contain minor differences or errors.

Preferred candidate

Years of experience
No experience required
Degree
Bachelor's degree / higher diploma