Operations Engineer

30+ days ago 2026/07/01
Other Business Support Services

Job description

Project Role : Operations Engineer
Project Role Description : Support operations and/or manage delivery for production systems and services based on operational requirements and service agreements.
Must-have skills : Site Reliability Engineering
Good-to-have skills : NA
A minimum of 5 years of experience is required.
Educational Qualification : 15 years of full-time education
Summary: A Site Reliability Engineer (SRE) ensures that systems are stable, scalable, and highly available, bridging the gap between business application development and IT operations. This role combines automation, observability, incident response, and performance engineering to maintain continuous service reliability while accelerating delivery velocity. The SRE designs and maintains production systems that meet defined Service Level Objectives (SLOs) and error budgets. Applying software engineering principles, an SRE prevents downtime, automates operations, and improves platform performance through observability, fault tolerance, and system resilience.

Roles & Responsibilities:
Reliability and Performance: Monitor and optimize system uptime, latency, and throughput to meet SLOs and Service Level Indicators (SLIs).
Incident Management: Lead incident response, manage escalations, perform root cause analysis (RCA), and drive postmortem reviews.
Automation and Tooling: Develop CI/CD pipelines, automate infrastructure management, and eliminate manual toil through scripting and orchestration.
Monitoring and Observability: Implement metrics, logging, and tracing frameworks (Prometheus, Grafana, ELK, Datadog) to gain real-time visibility into distributed systems.
Capacity Planning: Conduct resource forecasting, design scalable infrastructure, and maintain performance under surge conditions.
Change & Release Management: Partner with developers to ensure safe, reliable rollout of new features with automated testing and rollback mechanisms.
Disaster Recovery & Resilience Engineering: Implement multi-region resilience strategies, chaos tests, and failover automation for business continuity.
Process Improvement: Use post-incident analytics to refine operational practices and drive data-driven reliability improvements.
Collaborate with product, design, ML, and DevOps teams to build intelligent workflows and user experiences.
Implement Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, Azure DevOps, or Pulumi.
Bring expertise in cloud IaaS and PaaS services.
Integrate and support AI-driven tools and frameworks, including Generative AI and Agentic AI technologies, within cloud infrastructure and applications.

Professional & Technical Skills:
Expertise in Python, Go, Bash, or JavaScript for automation and tooling.
Hands-on experience with cloud environments (AWS, Azure, GCP) and orchestration tools such as Kubernetes and Terraform.
Deep understanding of Linux systems, networking, and distributed architectures.
Experience with observability solutions such as Prometheus, Grafana, Datadog, CloudWatch, or New Relic.
Familiarity with incident management and alerting platforms (PagerDuty, xMatters).
Proficiency in CI/CD frameworks such as Jenkins, GitHub Actions, or GitLab CI.
Working knowledge of security, compliance, and performance optimization for highly available systems.
Knowledge of agentic AI frameworks (CrewAI, LangGraph, AutoGen), Responsible AI concepts, and AI guardrails.

Additional Information:
This position is based at our Bengaluru office.
15 years of full-time education is required.
Certifications: AWS Certified Solutions Architect Professional; Microsoft Certified: Azure Solutions Architect Expert; Google Professional Cloud Architect; Certified Kubernetes Administrator (CKA); HashiCorp Certified: Terraform Associate; DevOps Engineer certifications (AWS, Azure, or Google).
Candidates need to be AI-ready.
This job post has been translated by AI and may contain minor differences or errors.