Manager of Site Reliability Engineering

Yesterday 2026/09/12
Other Business Support Services

Job description

Guide and shape the future of technology at a globally recognized firm, driven by pride in ownership.


As an SRE Manager at JPMorgan Chase within Consumer & Community Banking, you are the non-functional requirement owner and champion for the applications in your remit. You are a key influencer in your team’s strategic planning, driving continual improvement in customer experience, resiliency, security, scalability, monitoring, instrumentation, and automation of the software in your area. You act in a blameless, data-driven manner and navigate difficult situations with composure and tact.



Job Responsibilities:


  • Define and enforce quality gates across requirements, design, secure coding, testing, release, and post-production monitoring. Translate business objectives into clear, testable requirements that include reliability, availability, performance, security, and observability.
  • Establish and manage SLOs/SLIs and error budgets; ensure they are integrated into product roadmaps and delivery plans. Challenge Product Owners and teams to meet a rigorous, objective Definition of Done (DoD) before release.
  • Sample DoD checklist: SLOs defined and monitored; alerts tuned; runbooks and escalation paths in place; automated tests (unit, integration, security) passing; performance and capacity validated; resilience and failover tested; rollback verified; vulnerability findings remediated; compliance controls and audit artifacts complete; documentation and support readiness confirmed.
  • Lead operational readiness reviews and triage risks; ensure timely remediation and prevention of recurrence through root-cause analysis and auto-remediation.
  • Maintain logging, alerting, and monitoring platforms; ensure dashboards provide health and performance visibility. Govern CI/CD pipeline controls for security, reliability, and change management; promote automation to eliminate toil.
  • Lead and participate in critical incident response (including outside business hours when needed); drive post-incident reviews and resilience improvements. Monitor delivery health and operational KPIs; lead continuous improvement across teams and products.
  • Oversee capacity planning and resilience management for large-scale, distributed systems. Partner with engineering on public cloud best practices (AWS or equivalent) for compute, storage, networking, messaging, automation (CloudFormation, Terraform), and data services.
  • Build a culture of collaboration, reliability, and continuous improvement; coach teams to adopt DevOps and SRE principles. Partner with regional engineering leaders to drive operational best practices and consistent execution. Provide concise, outcome-focused updates to management and stakeholders; influence decisions across Product, Engineering, SRE, and Security.

Required Qualifications, Capabilities, and Skills


  • Formal training or certification, with 5+ years of experience supporting critical finance-focused applications in large-scale environments and managing and mentoring teams.
  • Solid understanding of AI-assisted solutions to accelerate root-cause analysis and reduce overall TTX, with appropriate validation and human judgment.
  • Experience with monitoring/logging tools (e.g., Splunk, AppDynamics) and dashboard technologies.
  • Strong grasp of SDLC, secure development, DevOps/CI/CD tooling; capable of implementing top-tier continuous improvement with root-cause analysis and auto-remediation.
  • Effective under pressure; accountable, with excellent stakeholder management and communication skills.
  • This position may require HSA system access. Enhanced screening (criminal and credit background checks, and/or other screening) is required prior to employment and annually thereafter.
  • Global team collaboration, with flexibility to engage during critical incidents outside standard business hours.
  • Experience implementing and managing SLOs/SLIs, error budgets, and operational readiness reviews for distributed systems, including leading post-incident analysis and resilience improvements.
  • Deep expertise in public cloud platforms (AWS or equivalent), infrastructure automation tools (CloudFormation, Terraform), and capacity planning for large-scale environments, with a track record of driving DevOps and SRE adoption across teams.

Preferred Qualifications


  • Splunk Administrator certification desired.

This job post has been translated by AI and may contain minor differences or errors.
