Machine Learning DevOps Engineer

Newcastle upon Tyne

£45,000-£50,000

Apply before

Job Description

About the Role:
The ML DevOps Engineer is responsible for designing, deploying, automating, monitoring, and maintaining scalable machine learning platforms and production-grade AI systems.

This role is heavily focused on MLOps and combines expertise in machine learning engineering, cloud infrastructure, DevOps, automation, observability, and production operations to enable reliable end-to-end ML delivery.

The engineer will build and maintain robust ML pipelines, scalable deployment environments, CI/CD workflows, automated retraining systems, and monitoring frameworks to ensure machine learning solutions are reproducible, scalable, observable, and production-ready.

The role supports agile delivery teams by operationalising machine learning systems across cloud and hybrid infrastructure environments.


Key Responsibilities:

ML Engineering & MLOps

  • Design, develop, deploy, and maintain machine learning pipelines for training, validation, inference, and monitoring.

  • Automate model lifecycle processes including data ingestion, feature engineering, model retraining, versioning, and rollback.

  • Build scalable environments for model experimentation and production deployment.

  • Integrate ML models into APIs, applications, and enterprise systems.

  • Monitor model drift, performance degradation, and prediction reliability.

  • Ensure reproducibility of experiments using version control and artifact management.

  • Build automated ML workflows supporting continuous training, deployment, rollback, and model lifecycle management.

  • Optimise CPU and memory utilisation for ML workloads.


DevOps & Infrastructure Automation

  • Design and implement infrastructure automation using Infrastructure as Code (IaC) tools such as Terraform, Ansible, or equivalent.

  • Build, maintain, and optimise CI/CD pipelines for software and ML workloads.

  • Manage containerised environments using Docker and orchestration platforms such as Kubernetes or Docker Swarm.

  • Support hybrid environments spanning cloud platforms and on-premises infrastructure.

  • Administer Windows and Linux systems to ensure operational stability.

  • Manage VMware vSphere, storage, and SAN infrastructure where required.


Quality Assurance & Test Automation

  • Develop automated validation frameworks for data quality.

  • Integrate automated testing into ML CI/CD pipelines.

  • Implement validation gates for:

      • Model promotion

      • Deployment approval

      • Performance benchmarking

      • Bias/drift checks

  • Perform functional, integration, system, and performance testing for ML-enabled applications.


Monitoring, Reliability & Observability

  • Implement monitoring, alerting, logging, and diagnostics using tools such as Grafana, ELK, Prometheus, or similar.

  • Ensure availability, scalability, security, and resilience of production systems.

  • Perform root cause analysis for incidents and deliver preventive improvements.

  • Build dashboards for application performance and ML services.


Agile Delivery & Stakeholder Collaboration

  • Work within Agile / Scrum teams to deliver iterative business solutions.

  • Collaborate with developers, data scientists, testers, product owners, and clients.

  • Translate technical challenges into clear business-facing updates.

  • Provide effort estimates, timelines, technical recommendations, and delivery plans.

  • Support continuous improvement of engineering automation, delivery practices, and technical culture.


Essential Skills & Experience

  • Strong hands-on experience in DevOps, automation engineering, QA automation, or MLOps.

  • Proven experience designing enterprise-grade infrastructure and deployment pipelines, especially for machine learning systems.

  • Experience supporting production systems with high availability requirements.

  • Ability to troubleshoot across code, infrastructure, networking, and data layers.

  • Strong understanding of CI/CD for ML, model lifecycle management, data engineering workflows, infrastructure scalability, and observability.

  • Strong communication skills in client-facing environments.

  • Ownership mindset with the ability to work independently.

  • Comfortable working in technically complex, fast-paced delivery environments.

  • Exposure to the Microsoft Azure cloud platform.

  • Experience deploying ML systems into production environments.


Technical Skills Required

- Programming / Scripting

  • Python

  • Bash / Shell scripting

  • PowerShell

  • JavaScript / C# (desirable)


- MLOps & ML Tooling

  • MLflow

  • Kubeflow

  • Azure ML

  • Airflow

  • DVC

  • Weights & Biases

  • TensorFlow Serving / TorchServe (desirable)


- DevOps & CI/CD

  • Azure DevOps

  • GitHub Actions

  • GitLab CI/CD

  • TeamCity

  • Octopus Deploy

  • Artifactory


- Infrastructure & Containers

  • Docker

  • Kubernetes

  • Helm

  • VMware / vSphere

  • Linux

  • Windows Server


- Monitoring / Quality

  • Grafana

  • ELK Stack

  • SonarQube

  • Control-M


- Data & Databases

  • Microsoft SQL Server

  • PostgreSQL

  • Relational database fundamentals


Preferred Background

  • Master’s or PhD degree in Computer Science, Artificial Intelligence, Machine Learning, Data Science, Engineering, Mathematics, or a related STEM discipline.

  • Experience with:

      • Geospatial data

      • Remote sensing

      • Large-scale spatial analytics

      • Earth Observation workflows

  • Experience working in enterprise consulting, infrastructure-heavy, or regulated environments.


Job Types: Full-time, Permanent


Pay: £45,000.00-£50,000.00 per year


Benefits:

  • Casual dress

  • Company events

  • Company pension

  • Cycle to work scheme

  • Enhanced maternity leave

  • Financial planning services

  • Private medical insurance

  • Sick pay

  • Work from home


Ability to commute/relocate:
Newcastle upon Tyne NE1: reliably commute or plan to relocate before starting work (required)
Work Location: In person.