Job description

Job Purpose
As a Senior DevOps Engineer at Tarjama&, you will own the design, reliability, and scalability of our cloud-based production systems.
You will lead the architecture, automation, and operation of our Azure/AWS infrastructure, drive security, performance, and cost efficiency across our platforms, and set DevOps standards and best practices for the team.
You will act as a technical authority on cloud operations, mentor junior engineers, and partner with engineering leadership to shape the platform roadmap.

Duties & Responsibilities

Cloud System Operations
Lead the design, deployment, and operation of Azure/AWS cloud-based production systems.
Own system performance, incident response, and root-cause analysis across production applications.
Define release engineering and pre-production validation standards to ensure system quality and functionality.
Architect and enforce backup, disaster recovery, and cost optimization (FinOps) strategies across cloud environments.
Lead container orchestration and workload management on AKS/Kubernetes clusters, including upgrades, scaling, and hardening.

Automation and Scripting
Design and maintain enterprise-grade automation frameworks for operational and platform processes.
Build reusable tooling and scripts (e.g., Python, Bash, PowerShell) for automation, observability, and incident response.
Lead GitOps adoption and continuous-delivery practices using ArgoCD or Flux.

Security and Compliance
Define and enforce cloud security best practices, IAM policies, and secrets management across environments.
Establish and maintain security protocols and compliance posture (e.g., ISO 27001, SOC 2 controls relevant to infrastructure).

Monitoring and Metrics
Architect and operate observability platforms (metrics, logging, tracing) across Azure/AWS, defining SLOs, SLIs, and alerting strategy.
Drive operational excellence by analyzing reliability metrics and leading post-incident reviews and improvement initiatives.

Research and Evaluation
Evaluate and recommend emerging technologies, tools, and architectural patterns for adoption.
Lead vendor and product evaluations, including proofs-of-concept and total-cost-of-ownership analysis.

Communication and Collaboration
Mentor junior and mid-level DevOps engineers through code reviews, pairing, and technical guidance.
Partner with engineering, security, and product stakeholders to define technical requirements and influence platform direction.
Communicate effectively with executive and technical audiences on cloud strategy, risk, and roadmap.

Education, Experience & Qualifications
Bachelor’s degree in Computer Science, Information Systems, or a related field.
6+ years of hands-on experience in DevOps, Cloud Engineering, or SRE roles, including 3+ years with a primary focus on Microsoft Azure (required).
Expert-level Kubernetes administration, including cluster lifecycle management, upgrades, networking, and security hardening.
Production experience operating Azure Kubernetes Service (AKS) at scale.
Strong experience designing and maintaining Infrastructure as Code with Terraform, including module design and state management.
Deep experience designing and operating CI/CD pipelines (e.g., GitHub Actions, Azure DevOps, GitLab CI).
Hands-on experience with observability stacks (Prometheus, Grafana, Azure Monitor, ELK, or similar), including dashboard and alert design.
Strong Linux system administration knowledge.
Experience working with GitOps tools such as ArgoCD or Flux.
Working knowledge of database administration in production (backups, performance tuning, HA/DR, and troubleshooting).
Strong scripting and automation skills in Python, Bash, and/or PowerShell.
Strong analytical and problem-solving abilities.
Ability to collaborate effectively within cross-functional teams.
Clear and precise documentation and communication skills.
Fluency in both English and Arabic (spoken and written).

Behavioral Competencies
Initiative
Problem Solving
Team Oriented
Adaptability
Ability to Work Under Pressure

Technical Competencies
Cloud Computing Fundamentals
Linux Operating Systems
Networking Protocols and Topologies
Scripting and Automation
Monitoring and Logging Tools
Security Best Practices
System Troubleshooting
Backup and Disaster Recovery Concepts
Container Orchestration (AKS / Kubernetes)
GitOps (ArgoCD, Flux)
Database Administration