Site Reliability Engineer - Production Engineering (M/F)
emagine - Portugal
21.05.2025 | Reference: 2269190

Company: emagine - Portugal
Job Description
About the Job:
We're hiring a Site Reliability Engineer to join a team supporting the reliability and performance of next-generation software systems at OutSystems, a global leader in low-code application development platforms.
If you like automation, cloud-native infrastructure, and enabling development teams to build reliable and scalable systems, this is your chance to play a key role in a forward-thinking tech environment.
Responsibilities:
- Act as a key partner to development teams, driving the adoption of Site Reliability Engineering best practices.
- Define, implement, and manage Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
- Design and implement resilient, secure, and scalable infrastructure using cloud-native technologies.
- Lead efforts to improve observability, including monitoring, alerting, logging, and tracing.
- Drive the incident management lifecycle, from detection to resolution, and lead post-incident reviews.
- Automate operational processes to reduce toil and increase system reliability.
- Champion a culture of continuous improvement, knowledge sharing, and accountability.
- Participate in a 24/7 on-call rotation to support production systems.
Required Skills & Experience:
- 5+ years of experience in Software Engineering, DevOps, or Site Reliability Engineering.
- Proficiency in at least one programming language: Python, Go, Java, C#, or similar.
- Experience with Kubernetes (including Amazon EKS) and other container orchestration platforms.
- Familiarity with AWS services (e.g., EC2, RDS, Lambda, ELB, CloudFront).
- Hands-on experience with Infrastructure as Code (Terraform, CloudFormation, Puppet, etc.).
- Skilled in implementing monitoring and incident management using tools such as Prometheus, Grafana, the ELK stack, or equivalents.
- Strong troubleshooting and debugging skills, especially in distributed systems.
- Fluent in English with excellent communication skills.
Nice to Have:
- Certifications such as CKA, CKAD, or CKS.
- Experience with Spacelift, Chef, or similar automation tools.
- Familiarity with SLOs, SLIs, and metrics-driven reliability engineering.
Soft Skills:
- An excellent problem-solving mindset and a top-down analytical approach.
- A humble, collaborative attitude and the ability to take ownership.
- Skilled in negotiation, expectation management, and stakeholder communication.
- Process-oriented and eager to challenge inefficiencies.
Location & Collaboration:
- 100% remote, but candidates must be based in Portugal.
- Strong collaboration with international product teams.
- Participation in on-call support rotation required.

Notes
Not Specified (Portugal)