Senior Site Reliability Engineer (Events)
RingCentral.com
Office
Valencia, Spain
Full Time
About RingCentral Events
RingCentral Events (formerly Hopin Events) is a robust, all-in-one platform for creating and managing professional, engaging events for any audience. It provides a complete solution that simplifies the entire event lifecycle - from planning, promotion, and live execution to post-event analytics.
The platform's broad set of features supports virtual, hybrid, and in-person events, ensuring a seamless experience whether the audience is online, in-person, or a mix of both.
Position Overview
As a Site Reliability Engineer for RingCentral Events, you're not just an infrastructure owner - you're a crucial part of our mission to deliver flawless, high-scale experiences for global audiences. Your role is central to our ability to deliver a reliable and performant platform. You will be a key contributor to our software delivery flow, ensuring that changes move from development to production with speed, safety, and consistency. Additionally, you will proactively eliminate observability gaps and build a self-healing infrastructure to ensure our system performs under pressure.
Responsibilities:
- Manage cloud infrastructure on AWS and EKS, leveraging IaC and GitOps to ensure scalability
- Participate in service capacity planning, software performance analysis, and system tuning
- Design, advise on, re-platform, and refactor the observability of our current cloud infrastructure
- Participate in release management, working closely with engineering teams to bring GitOps principles to our release process and manage CI/CD pipelines using GitLab CI
- Take part in 24/7 on-call responsibilities (~2 days/month based on rotation schedule) to ensure continuous availability and quick response to issues in production
- Conduct blameless post-mortems to learn from incidents and prevent future ones
- Develop and test disaster recovery plans and runbooks to ensure business continuity
- Implement security best practices and controls within the infrastructure to meet compliance standards and prepare for audits
Requirements:
- Familiarity with cloud-native services and architectures, and experience with cloud providers (our infrastructure is built on AWS)
- Experience in running mission-critical services at scale without disruption
- Hands-on experience with Kubernetes and infrastructure as code (IaC) using Terraform, focusing on scalability and efficient infrastructure management
- Proficiency in designing and maintaining CI/CD pipelines, with a preference for GitLab CI
- Experience with monitoring, APM, logging, and analytics tools
- Strong problem-solving skills with the ability to analyze and debug complex distributed systems, tracing requests and data flows from the kernel to the web to identify root causes
- Ability to spot, address, and optimize performance bottlenecks
- Proactive approach, favoring iterative action over waiting for things to happen or to be perfect
- Familiarity with incident, problem, and change management processes and best practices
Nice To Have:
- A reliability-oriented mindset with a focus on designing and building resilient architectures
- Previous SRE experience or knowledge, giving you a heightened awareness of what data to collect, how to display it, and how users can benefit from it
- Knowledge of scripting languages such as Python or Go
- Familiarity with GitOps principles and tools like ArgoCD
- Knowledge of caching mechanisms, such as Redis
- Experience with message queues like MSK (Kafka), SQS, or RabbitMQ
- Familiarity with database management systems like AWS Aurora and PostgreSQL
We Offer:
- Well-coordinated professional team
- Cutting-edge technologies, interesting and challenging tasks, a dynamic project, and great opportunities for self-realization and professional and career growth
- Additional health and life insurance package
- Employee Assistance Program
- 23 vacation days
