We are looking for a high-impact Observability Engineer with proven experience in fintech, banking, or other regulated environments to design and scale enterprise-grade observability systems.
This role is critical to ensuring high availability, low latency, and full-stack visibility across mission-critical financial platforms, while supporting compliance, auditability, and incident response readiness.
Key Responsibilities:
- Own and evolve the end-to-end observability architecture across applications, infrastructure, and cloud environments
- Centralize metrics, logs, traces, and events with high reliability and scalability
- Design and enforce SLOs, SLIs, and error budgets for critical financial systems
- Build advanced real-time dashboards and business-aligned KPIs for engineering and leadership
- Develop intelligent alerting frameworks that minimize noise and enable faster incident resolution
- Ensure observability pipelines are resilient, scalable, and cost-optimized
- Collaborate with DevOps and engineering teams to implement instrumentation, distributed tracing, and logging standards
- Integrate observability systems with incident management, on-call, and escalation workflows
- Support compliance, audit, and forensic analysis through structured logging and traceability
- Drive root cause analysis (RCA) and continuous improvement of system reliability
- Automate monitoring, alerting, and data enrichment workflows

Requirements:
- 6 to 10 years of experience in Observability, SRE, or Monitoring Engineering roles
- Mandatory experience in fintech, banking, or other highly regulated environments
- Strong hands-on expertise with:
  - Monitoring: Dynatrace, Prometheus, Grafana
  - Logging: Elastic Stack (ELK), Splunk, Fluent Bit, Logstash
  - Alerting and correlation: Dynatrace, ELK, Splunk, Alertmanager
- Proficiency in PromQL, SPL, and KQL for advanced log and metric analysis
- Experience developing high-performance, scalable dashboards in Grafana and Kibana that integrate application, infrastructure, and business KPIs for end-to-end observability
- Deep understanding of distributed systems observability and performance monitoring
- Experience with high-throughput, low-latency systems
- Experience with enterprise monitoring tools such as Riverbed and SolarWinds for network performance monitoring (NPM), application visibility, traffic analysis, and infrastructure health tracking across distributed systems
Core Expertise:
- Observability pillars: metrics, logs, traces, and events
- Golden signals: latency, traffic, errors, and saturation
- SLO/SLI-driven reliability engineering
- Alert design with a high signal-to-noise ratio
- Telemetry standardization and instrumentation strategies
- Mapping technical metrics to financial and business KPIs

Preferred Qualifications and FinTech Alignment:
- Proven experience supporting audit, compliance, and regulatory requirements within fintech, banking, or other regulated environments
- Strong familiarity with industry frameworks such as PCI DSS, ISO 27001, and SAMA/NCA
- Solid understanding of data sensitivity, traceability, and audit logging standards for financial systems
- Experience working on large-scale fintech or digital banking platforms
- Exposure to CI/CD-integrated observability and DevSecOps practices
- Proficiency in scripting and automation (Python, Bash)
- Hands-on experience with incident management and on-call frameworks (e.g., PagerDuty, Opsgenie)

What We’re Looking For:
- A proactive engineer with a strong reliability and performance mindset
- Ability to translate observability data into actionable insights
- Experience working cross-functionally with SRE, DevOps, and product teams
- An ownership-driven individual focused on continuous improvement of monitoring systems