Proudly voted a Great Place to Work®, we are a dynamic startup in the CPaaS (Communication Platform as a Service) space that is revolutionizing the way businesses communicate. Our team is made up of 500 energetic and passionate Unifones who are dedicated to delivering the best possible experience to 5000+ customer-centric companies.
We pride ourselves on our fun and collaborative work environment, where creativity and new ideas are constantly encouraged. As shareholders in the business, we’re so much more than a group of passionate communicators. We are Unifones. Join our team and be a part of something big!
Meet the team!
Our Engineering team is responsible for designing, developing, and maintaining the systems and technologies that drive Unifonic’s solutions. We work closely with other departments to ensure our products and services meet the needs of our customers. If you are passionate about technology and are excited about working on cutting-edge communication and engagement solutions, we want you on our team.
As a Senior Infrastructure Engineer in the Production Operations (Live) team, you will be responsible for enhancing system reliability, scalability, and resilience. As part of our elite SRE team, you'll drive continuous improvement across our cloud infrastructure and ensure consistently high performance of our distributed messaging platforms.
Help us shape the future of communication by:
Owning the reliability, uptime, and scalability of critical production services 24/7.
Participating in the 24/7 on-call rotation to respond to incidents, troubleshoot live production issues, and lead post-incident analysis.
Building robust operational playbooks and escalation paths, and improving Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR).
Ensuring operational excellence by proactively detecting and addressing reliability risks through SLO monitoring, chaos testing, and capacity planning.
Automating operational tasks to minimize human intervention.
Being available at night, outside the rest of the team's usual working hours, according to the on-call schedule. This is a MUST.
Architecting, implementing, and managing infrastructure across AWS, Oracle Cloud Infrastructure (OCI), and OpenStack environments.
Optimizing cloud resources to balance performance, security, and cost-efficiency.
Managing Kubernetes clusters (EKS, OKE, Rancher RKE2), ensuring scalability, availability, and robust performance.
Deploying advanced containerization strategies and troubleshooting containerized workloads.
Managing and optimizing high-performance messaging and caching systems including Kafka, RabbitMQ, and Redis.
Ensuring efficient, reliable message and data delivery critical to Unifonic's SMS and distributed systems.
Managing and optimizing production-grade MySQL and PostgreSQL databases.
Ensuring high availability and owning performance tuning, backups, and recovery processes for critical databases.
Leading the planning and execution of comprehensive disaster recovery strategies.
Developing and maintaining robust business continuity plans.
Implementing advanced observability solutions (Prometheus, Grafana, CloudWatch).
Defining, measuring, and enforcing Service Level Objectives (SLOs) and Service Level Indicators (SLIs) in alignment with SRE best practices.
Proactively identifying issues, minimizing downtime, and enhancing system transparency.
Driving automation initiatives using Terraform, Helm, Jenkins, Tekton or GitLab CI/CD.
Streamlining deployment pipelines and reducing manual intervention through innovative automation.
Integrating security best practices into infrastructure and application layers.
Performing regular audits to ensure compliance and a robust security posture.
Collaborating with cross-functional teams (engineering, product, QA) to foster SRE culture.
Mentoring junior engineers, enhancing team capabilities and promoting knowledge sharing.
What you'll bring:
Bachelor's or master's degree in Computer Science, Engineering, or a related technical field.
8+ years of hands-on production experience in SRE, DevOps, or cloud engineering roles.
Strong expertise in AWS, OCI, and OpenStack environments.
Deep understanding of Kubernetes ecosystems (EKS, OKE, Rancher RKE2).
Proven experience with Kafka, RabbitMQ, Redis, and distributed messaging and caching systems.
Solid experience managing MySQL and PostgreSQL in production environments.
Expert-level scripting and automation skills (Python, Bash, Go).
Advanced proficiency with Helm, Terraform, and modern CI/CD toolchains.
Demonstrable experience with Linux system administration and troubleshooting.
Availability at night, outside the rest of the team's usual working hours, according to the on-call schedule. This is a MUST.
Craft and Toolkit:
Distributed Systems & Architecture — Scalability, fault tolerance, consistency models, microservices.
Cloud Platforms — Hands-on with Amazon Web Services, Google Cloud Platform, or Microsoft Azure.
Infrastructure as Code — Terraform, AWS CloudFormation.
Containers & Orchestration — Docker, Kubernetes.
CI/CD & Automation — Jenkins, GitHub Actions, GitLab CI.
Observability — Prometheus, Grafana, ELK Stack.
Incident & Reliability — RCA, postmortems, SLIs/SLOs, MTTR reduction.
Character Traits:
Analytical thinking and problem-solving – approach problems clearly, use data, and find solutions.
Ownership and accountability – take responsibility and follow projects through to completion.
Communication – explain ideas clearly and listen to others across teams.
Collaboration – work well with others and support team goals.
Adaptability and learning – adjust to change and keep learning new skills or tools.
Mentorship and knowledge sharing – help others grow and share what you know.
Resilience – stay calm under pressure and handle setbacks constructively.
Quality and attention to detail – do work carefully and strive for improvement.
Advocacy and innovation – encourage best practices, efficiency, and new ideas.
AI mindset and utilization – leverage AI tools to enhance productivity and drive efficient, data-informed outcomes.
As a Unifone you’ll receive a range of benefits:
Competitive salary and bonus
Unifonic share scheme (we are all owners!)
30 days of holiday after your first anniversary
Your Birthday off!
Spend up to 25 days per year working from anywhere in the world!
Paid leave and assistance for new parents