Lead Site Reliability Engineer

30+ days ago 2026/10/03
Other Business Support Services

Job description

Our Purpose

Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we’re helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.

Title and Summary

Lead Site Reliability Engineer

Role Overview
We are seeking a highly technical Lead Site Reliability Engineer (SRE) to architect, engineer, and operate reliable, scalable, and secure platforms across multi-cloud (AWS, Azure) and hybrid (on-prem + cloud) environments.
This is a deeply hands-on engineering role requiring expertise in distributed systems, Kubernetes, hybrid networking, automation, CI/CD, observability, and production incident leadership. The Lead SRE will serve as the technical authority for reliability across interconnected cloud and datacenter ecosystems.

Core Responsibilities

1. Reliability Engineering Across Hybrid & Multi-Cloud
• Define and implement SLIs, SLOs, and error budgets across cloud-native and on-prem workloads.
• Architect high-availability designs spanning:
o AWS and Azure regions
o On-prem datacenters
o Cross-cloud failover patterns
• Design DR strategies (RTO/RPO driven) across hybrid environments.
• Eliminate single points of failure across network, compute, storage, and DNS layers.
• Conduct resilience validation, chaos testing, and failure scenario modeling.

2. Multi-Cloud Architecture & Engineering
• Engineer and operate workloads across:
o Amazon Web Services
o Microsoft Azure
• Design cross-cloud networking (VPN, ExpressRoute, Direct Connect, Transit Gateway).
• Implement workload portability and cloud-agnostic deployment strategies.
• Optimize cost, performance, and reliability across providers.
• Design cloud-native autoscaling, load balancing, and traffic routing strategies.

3. Hybrid Infrastructure (On-Prem + Cloud Integration)
• Integrate on-prem infrastructure with cloud platforms using:
o Active Directory / IAM federation
o Hybrid DNS architecture
o Secure certificate lifecycle management
• Troubleshoot hybrid connectivity issues (BGP routing, firewall policies, NAT, MTU mismatches).
• Manage hybrid Kubernetes deployments and private registry integrations.
• Support legacy-to-cloud modernization initiatives.

4. Kubernetes & Container Platform Engineering
• Architect and operate:
o Amazon EKS
o Azure Kubernetes Service
o Self-managed Kubernetes clusters (on-prem)
• Optimize cluster autoscaling, resource allocation, and performance.
• Implement cluster security hardening and RBAC governance.
• Troubleshoot CNI, ingress controllers, service mesh, and pod networking issues.
• Implement GitOps-driven deployments.

5. Observability Engineering Across Distributed Systems
• Build unified observability across hybrid environments using:
o Splunk
o Dynatrace
o Prometheus
o Grafana
o OpenTelemetry
• Implement centralized logging across cloud and on-prem workloads.
• Design distributed tracing across multi-cloud microservices.
• Engineer proactive alerting to reduce MTTR and improve signal quality.

6. CI/CD & Infrastructure Automation
• Engineer resilient CI/CD pipelines (Jenkins, GitHub Actions, Azure DevOps).
• Implement cross-cloud infrastructure as code using:
o Terraform
o CloudFormation
• Automate:
o Certificate rotation
o Auto-scaling policies
o Patch orchestration
o Drift detection
• Improve deployment reliability via blue-green and canary strategies.

7. Advanced Production Troubleshooting
• Lead technical investigation of:
o DNS resolution failures (private/public zones, hybrid forwarding)
o TLS/PKI certificate failures
o Network latency across hybrid circuits
o Memory leaks & kernel-level issues
o Thread contention & CPU throttling
• Perform packet-level debugging (tcpdump, netstat, traceroute).
• Analyze distributed system failures spanning multiple platforms.

Technical Skills Required
• 7–10+ years in SRE / DevOps / Cloud Engineering roles.
• Deep hands-on experience in:
o AWS and Azure
o Hybrid networking
o Kubernetes (cloud & on-prem)
• Strong knowledge of:
o Linux internals
o TCP/IP, DNS, Load Balancing
o TLS/PKI and certificate lifecycle
o Distributed systems architecture
• Strong scripting/programming skills (Python preferred).
• Experience designing cross-cloud DR and failover models.
• Experience with infrastructure as code and GitOps.

Preferred Certifications
• AWS Solutions Architect (Associate/Professional)
• Azure Architect / DevOps Engineer
• Certified Kubernetes Administrator (CKA)

Work Schedule Requirement
This role supports globally distributed, business-critical systems operating 24x7.
The candidate must be willing to participate in rotational on-call shifts, including weekends and off-hours support, as part of a follow-the-sun enterprise support model.

Key Success Metrics
• Improved cross-cloud resiliency and DR posture.
• Reduced hybrid networking incidents.
• Improved SLO compliance across platforms.
• Measurable MTTR reduction.
• Increased automation coverage.
• Reduced change failure rate.

Corporate Security Responsibility

All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization. Every person working for, or on behalf of, Mastercard is therefore responsible for information security and must:

• Abide by Mastercard’s security policies and practices;
• Ensure the confidentiality and integrity of the information being accessed;
• Report any suspected information security violation or breach; and
• Complete all periodic mandatory security trainings in accordance with Mastercard’s guidelines.

This job post has been translated by AI and may contain minor differences or errors.