Observability Engineer (Network Operations)

Job Description

Our client, a global leader in electronic trading solutions, is seeking an Observability Engineer to join their Production Support team. This role is pivotal in ensuring the stability, performance, and reliability of high-availability trading platforms used by financial institutions worldwide.

You will work across a broad range of monitoring and observability tools, contributing to both day-to-day operations and strategic transformation projects - including a significant migration from on-premises to cloud infrastructure over the next two years.

The role combines hands-on technical monitoring, incident support, and process improvement with a strong focus on automation, helping to reduce repetitive tasks and increase operational efficiency.

Key Responsibilities

  • Oversee and support an established offshore monitoring team, ensuring processes, runbooks, and escalation paths are consistently followed.
  • Monitor trading platform health, proactively identifying and addressing potential issues before they impact service.
  • Collaborate with engineers, DevOps, and deployment teams to ensure observability is built into new systems from the outset.
  • Act as a key contributor during major incident calls, providing real-time insights and supporting rapid root cause identification.
  • Lead improvements in monitoring quality through post-incident reviews and continuous feedback loops.
  • Design and implement automation scripts and workflows for routine checks, smoke testing, and post-deployment validation.
  • Help drive the standardisation and consolidation of monitoring tools and practices across the wider group.
  • Support the onboarding of new applications and infrastructure into the observability stack, tailoring solutions where off-the-shelf tools fall short.
  • Occasionally participate in planned weekend work for system upgrades or testing.

What We're Looking For

We don't expect you to tick every box - the ideal candidate will be an avid learner who can adapt quickly and bring enthusiasm for improving systems, processes, and tools. If you have a solid foundation in production support or monitoring and the drive to expand your skills across automation, cloud, and trading systems, we'd like to hear from you.

Core Skills & Experience:

  • Strong background in system operations, network operations, or infrastructure support (ideally in financial services).
  • Practical experience with both on-premises and cloud-based environments (AWS knowledge highly desirable).
  • Hands-on scripting and automation experience (e.g., Python, PowerShell, Bash, Perl).
  • Understanding of Unix/Linux and Windows Server environments.
  • Ability to troubleshoot infrastructure and network issues (firewall, routing, connectivity).
  • Experience implementing and maintaining monitoring solutions in complex technical settings.
  • Strong communication skills and the ability to collaborate across teams and regions.

Nice-to-Have Skills:

  • Familiarity with industry-standard monitoring tools.
  • Understanding of DevOps principles and automation frameworks (e.g., Ansible, Puppet).
  • Exposure to database platforms (MSSQL, Oracle, Sybase) or the FIX protocol.
  • Experience working within ITIL-aligned environments.
  • AWS certification or equivalent cloud qualifications.

Why Join?

This is a high-visibility role where you'll be part of a global organisation modernising its technology stack and embracing automation at scale. You'll gain exposure to cutting-edge tools, complex technical environments, and transformation projects that have a direct impact on trading operations.

If you are proactive, curious, and passionate about making systems more reliable, efficient, and automated, this role will give you the platform to make a measurable difference.