We are looking for a Systems Engineer to join the CIPM Compute team and own the technical execution layer of our fleet lifecycle program. This is an embedded engineering role within a product management team; you will be the technical hands and eyes of the compute space, working alongside the Technical Program Manager (TPM) who coordinates and owns the overall program.
You will lead lab testing and validation, execute and support pilot deployments, create and maintain technical documentation, and supervise the health and compliance of the production compute fleet. You will work closely with partner engineering teams as a technical peer, consumer, and validator, ensuring that hardware platforms, deployment processes, and lifecycle procedures meet CIPM's standards before handover to delivery teams.
This role is ideal for an engineer who thrives at the intersection of hands-on technical work and structured program execution, and who wants to grow their impact in a team that directly shapes infrastructure strategy at Amazon scale.
Key job responsibilities
Fleet Lifecycle & Health Supervision
● Monitor and review the health of the production compute fleet (CPU, memory, storage, firmware compliance) and proactively identify risks including end-of-life hardware, unresolved vulnerabilities, and capacity gaps
● Coordinate firmware patching cycles and validate remediation outcomes across the fleet
● Provide Tier-3 support for technically complex post-deployment issues, maintaining a structured support queue and hosting regular open-office hours for stakeholders
● Join quarterly reviews with hardware vendors to track lifecycle status and emerging platform developments
Lab Testing & Pilot Execution
● Execute lab testing sessions for new hardware platforms, firmware releases, and deployment automation, documenting findings and providing structured feedback
● Validate deployment and migration runbooks in lab environments before production rollout
● Support and lead pilot deployments at corporate office sites: shadow initial pilots, lead reverse shadow pilots, and document issues and resolutions throughout
● Test new compute products and configurations against hosted service requirements
Technical Documentation
● Create, maintain, and continuously improve hardware deployment runbooks, standard operating procedures, and configuration guides (server provisioning, VM deployment, performance benchmarking, troubleshooting procedures)
● Validate deployment artifacts produced by partner engineering teams
● Maintain documentation currency through a structured feedback collection framework, incorporating learnings from pilots, deployments, and support cases
● Contribute to the consolidation of deployment documentation into a single source
Technical Design & Standards Ownership
● Define and validate technical designs for compute infrastructure deployments
● Validate Bill of Materials (BoM) specifications against site and service requirements
● Help define hardware configuration tiers to serve multiple customer profiles and budget constraints
● Support vendor evaluation and platform certification efforts, including technical validation of alternative compute platforms
Automation & Tooling
● Test and validate deployment automation scripts and tools developed by partner engineering teams, providing actionable bug reports and improvement feedback
● Maintain fleet core automation tasks such as password rotation, patching workflows, and firmware testing pipelines
● Build lightweight scripts or tooling as needed to address immediate operational gaps, with the ability to read, troubleshoot, and suggest fixes to existing code
Post-Deployment Supervision & Knowledge Transfer
● Validate and supervise compute deployments at warranty sites, logging defects, monitoring resource utilization, and confirming that deployed products meet expected performance baselines
● Define and deliver training sessions, shadowing programs, and Q&A sessions for delivery teams and stakeholders throughout onboarding and pilot phases
● Complete deployment checklists and formal handoff documentation to ensure smooth transitions to operational teams
About the team
Corporate Infrastructure Product Management (CIPM) is responsible for the hardware fleet lifecycle of Amazon's corporate compute infrastructure, encompassing vendor and platform strategy, firmware and patching compliance, fleet inventory and lifecycle management, and hardware capacity planning. CIPM serves as both product owner and strategic driver of the compute hardware roadmap, collaborating closely with partner engineering and deployment teams across the organization.
The team manages a global fleet of enterprise compute servers, mostly Cisco UCS, across corporate office sites worldwide, supporting a range of critical hosted services and infrastructure functions.
Qualifications
- 5+ years of Linux experience
- 5+ years of systems engineering experience
- Bachelor's degree in Systems Engineering, Computer Science, or a related field, or relevant work experience
- Experience with one or more of the following domains: systems administration (Linux/Windows), network administration (DNS, IPsec, BGP, VPN, Load Balancing), or programming (Node.js, Java, Ruby, C#, Python, or PHP), or experience implementing a cloud-based technology solution
- Experience with virtualization (Hypervisors, VMware, Xen), or experience that includes strong analytical skills, attention to detail, and effective communication abilities and experience with programming/scripting (Batch, VB, PowerShell, Java, C#, Chef, Perl, Ruby and/or PHP)
- Experience writing and publishing technical documents or equivalent
- Experience troubleshooting and debugging technical systems, or experience with automation and any version control tools and experience in managing firewalls
- Knowledge of TCP/IP and networking protocols such as HTTP and DNS
- Experience designing and developing scripts to automate operational burdens and reviewing scripting changes to ensure they meet the standards for maintainability, scalability and security
- Experience working in a 24/7 production environment
- Experience with service-oriented architecture and web services
- Experience working with Advanced Compute technologies including, but not limited to: Accelerated Compute, High Performance Compute, Visual/Spatial Compute, and/or IoT.
- Knowledge of AWS Infrastructure
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.