SRE Manager/Engineering Manager (£120k+)

Job description

Site Reliability Manager/Engineering Manager (£120k+)
VANRATH is pleased to be working with a global software company that continues to grow after a recent acquisition and is looking for a Site Reliability Manager/Engineering Manager (£120k+). The company is a market leader in its space (Gartner) and is expanding across multiple areas of the AI tech space. It has a remote-first approach, so the successful candidate can be based anywhere in the UK. The company has a great atmosphere and is building a really strong team.

The Role
They are looking to build a new team for their archive platform, the company's largest product, with over 7,000 customers. They plan to expand it further across Europe and need a new team to provide coverage in the European time zone. They have around 30 engineers in the US and are looking to hire around 35 people this year. They are building a 24/7 operation across the different time zones. They want someone who is passionate about people, processes, and building a culture, and who comes from an engineering background and could jump in if really needed. They have the largest SQL database in production outside Microsoft, which has featured in use cases.

Manage day-to-day operations of our SaaS on-prem platform using a data-driven approach to ensure health and performance remain within our SLAs and SLOs while scaling our data center, public, and private cloud infrastructure.
Manage our incident response and escalation processes, including stakeholder communication, demonstrating continuous improvement as we incorporate feedback into the processes. Demonstrate full accountability and ownership for platform disruptions and manage incidents through to complete resolution.
Adopt SRE/DevOps best practices and apply them to critical initiatives and transformation activities across teams; act as a strategic thought leader in this space for the broader engineering organization and coach the team to develop their skillsets through knowledge sharing, documenting, and acting as a role model for behaviors and attitude.
Lead a large, globally distributed, diverse team to provide around-the-clock coverage for the platform.
Present to both internal and external audiences on a myriad of topics including roadmap, platform health (key metrics), and all aspects of incidents.
Lead the forecasting, planning, and execution around creating greater scalability, availability, and reliability of the platform, taking into consideration security and observability; be willing to get hands-on at times and lead by doing.
Drive increased observability and alerting throughout the platform while guiding the engineering teams toward a more automated, autoscaling, and self-healing architecture.
Lead and manage a growing team of Site Reliability and DevOps engineers, fostering a strong team culture, continuously monitoring morale, and consistently delivering quality outcomes.
Champion the move towards Agile/Scrum for the SRE and DevOps organization, creating greater visibility into the day-to-day activities of the team.

The Person

- Experience in managing teams and building a culture
- Passionate about building and managing people and processes
- An engineering background (Java, C#, Python)
- Previous experience with SQL is a plus

Remuneration
Salary for the role would be around the 95k mark, with other great benefits including health, life, discounts and more.
For further information on this role or any other, please contact Matthew Evers in the strictest confidence, or send in your CV via the link below.