
Site Reliability Engineer - Production Engineering (M/F)

emagine - Portugal

21.05.2025 | Reference: 2269190


Company:

emagine - Portugal


Job Description

About the Job:

We're hiring a Site Reliability Engineer to join a team that supports the reliability and performance of next-generation software systems at OutSystems, a global leader in low-code application development platforms.


If you like automation, cloud-native infrastructure, and enabling development teams to build reliable and scalable systems, this is your chance to play a key role in a forward-thinking tech environment.


Responsibilities:

  • Act as a key partner to development teams, driving the adoption of Site Reliability Engineering best practices.
  • Define, implement, and manage Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
  • Design and implement resilient, secure, and scalable infrastructure using cloud-native technologies.
  • Lead efforts to improve observability, including monitoring, alerting, logging, and tracing.
  • Drive the incident management lifecycle, from detection to resolution, and lead post-incident reviews.
  • Automate operational processes to reduce toil and increase system reliability.
  • Champion a culture of continuous improvement, knowledge sharing, and accountability.
  • Participate in a 24/7 on-call rotation to support production systems.


Required Skills & Experience:

  • 5+ years of experience in Software Engineering, DevOps, or Site Reliability Engineering.
  • Proficiency in at least one programming language: Python, Go, Java, C#, or similar.
  • Experience with Kubernetes, EKS, and other container orchestration platforms.
  • Familiarity with AWS services (e.g., EC2, RDS, Lambda, ELB, CloudFront).
  • Hands-on experience with Infrastructure as Code (Terraform, CloudFormation, Puppet, etc.).
  • Skilled in implementing monitoring and incident management using tools like Prometheus, Grafana, ELK stack, or equivalent.
  • Strong troubleshooting and debugging skills, especially in distributed systems.
  • Fluent in English with excellent communication skills.


Nice to Have:

  • Certifications such as CKA, CKAD, or CKS.
  • Experience with Spacelift, Chef, or similar automation tools.
  • Familiarity with SLOs, SLIs, and metrics-driven reliability engineering.


Soft Skills:

  • An excellent problem-solving mindset and a top-down analytical approach.
  • A humble, collaborative attitude and the ability to take ownership.
  • Skilled in negotiation, expectation management, and stakeholder communication.
  • Process-oriented and eager to challenge inefficiencies.


Location & Collaboration:

  • 100% remote, but candidates must be based in Portugal.
  • Strong collaboration with international product teams.
  • Participation in on-call support rotation required.


Remarks

Not Specified (Portugal)




