Responsibilities
- Improve the lifecycle of microservices from inception and design to deployment and operation.
- Write code to reduce operational workload, eliminate toil, and enable developers to deliver features faster.
- Define and manage Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to ensure system reliability.
- Facilitate blameless root cause analysis (RCA) meetings after incidents to drive continuous improvement.
- Participate in global Incident Response Coordination (IRC) for all products.
- Scale systems sustainably through automation, and evolve them to improve development velocity.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related technical discipline.
- 0-1 years of industry experience (freshers with strong skills are welcome).
- Deep understanding of AWS Networking, Compute, and Storage.
- Competency with container orchestration, infrastructure-as-code, configuration management, and CI/CD tooling such as Kubernetes, Terraform, Ansible, and Jenkins.
- Ability to author production-ready code in at least one language: Java, Scala, or Go.
- Experience with Linux systems and comfort working on the command line.
- Experience with streaming technologies such as Kafka or KSQL is a plus.