Lead Site Reliability Engineer

What happens next?

Once we have received your completed application, it will be reviewed against the set criteria for the role.

We’ll always strive to let you know the outcome. However, if you don’t hear from us within two weeks, please assume that on this occasion your application was unsuccessful.

Don’t forget to keep an eye on our careers page for future roles – we’re growing all the time.

East Croydon
Job description:

Our company and product are growing fast, and we want to expand our development team to match. We work in small teams following an Agile methodology, and you'll be an active team member helping us troubleshoot, automate, and find and resolve problems of scale. We work in a great office and don't take ourselves too seriously. We love technology, and our platform is ours to look after for the long haul.

What does a Site Reliability Engineer do at dotmailer?

As our Lead Site Reliability Engineer, you will be responsible for the day-to-day running of our service. You will be an experienced C# developer who has worked on large distributed systems and has a passion for troubleshooting complex problems in a production system.

You’ll build tools to monitor our service levels, not just our infrastructure. You’ll be confident implementing application changes to improve efficiency or fix severe, service-affecting bugs. Using tools like WinDbg and DebugDiag, you’ll analyse memory dump files to solve those really hard-to-understand issues.

You and your team will build supporting systems to help our development team deploy dotmailer and get feedback on how code is performing. Managing a multi-discipline team that includes sys admins, you’ll automate repetitive tasks and help reduce the time spent managing our server farm. You’ll be responsible for ensuring we’ve got the basics covered, from backups to SSL certificate renewals.

You will be one step ahead of customers and ensure we’ve provisioned enough capacity to stay on top of demand and growth. You’ll also be on the lookout for savings during less busy periods.

During times of crisis, you and your team will manage incidents by ascertaining service impact and handling internal and external communications. Once the fire is out, you’ll look for ways to make sure it stays out. We offer a 24x7 service, so you’ll also be responsible for managing our on-call rota.

As a team lead you will be responsible for organising your team by assigning and tracking work. The role requires managerial skills as you will be mentoring and appraising team members.

What we’re looking for

  • C# development experience in a large distributed system, with extra points for experience in a SaaS environment
  • A working knowledge of SQL
  • An understanding of Windows monitoring
  • PowerShell
  • An analytical mind and the ability to measure, change and measure again
  • Experience with Azure
  • Ability to work in a mixed-discipline team alongside developers, sys admins and database admins
  • Good communication skills and the ability to distil complex situations into customer-friendly announcements
  • Desire to lead a team