Machine Learning DevOps Engineer

Newcastle Upon Tyne

£45,000-£50,000

Apply before

30 Jun 2026

Job Description

About the Role:
The ML DevOps Engineer is responsible for designing, deploying, automating, monitoring, and maintaining scalable machine learning platforms and production-grade AI systems.

This role is heavily focused on MLOps and combines expertise in machine learning engineering, cloud infrastructure, DevOps, automation, observability, and production operations to enable reliable end-to-end ML delivery.

The engineer will build and maintain robust ML pipelines, scalable deployment environments, CI/CD workflows, automated retraining systems, and monitoring frameworks to ensure machine learning solutions are reproducible, scalable, observable, and production-ready.

The role supports agile delivery teams by operationalising machine learning systems across cloud and hybrid infrastructure environments.

Key Responsibilities:

ML Engineering & MLOps

Design, develop, deploy, and maintain machine learning pipelines for training, validation, inference, and monitoring.
Automate model lifecycle processes including data ingestion, feature engineering, model retraining, versioning, and rollback.
Build scalable environments for model experimentation and production deployment.
Integrate ML models into APIs, applications, and enterprise systems.
Monitor model drift, performance degradation, and prediction reliability.
Ensure reproducibility of experiments using version control and artifact management.
Build automated ML workflows supporting continuous training, deployment, rollback, and model lifecycle management.
Optimise RAM/CPU resource utilisation for ML workloads.

DevOps & Infrastructure Automation

Design and implement infrastructure automation using Infrastructure as Code (IaC) tools such as Terraform, Ansible, or equivalent.
Build, maintain, and optimise CI/CD pipelines for software and ML workloads.
Manage containerised environments using Docker and orchestration platforms such as Kubernetes or Swarm.
Support hybrid environments including cloud platforms and on-premise infrastructure.
Administer Windows and Linux systems to ensure operational stability.
Manage VMware / vSphere / storage / SAN infrastructure where required.

Quality Assurance & Test Automation

Develop automated validation frameworks for:
Data quality

Integrate automated testing into ML CI/CD pipelines.

Implement validation gates for:

Model promotion
Deployment approval
Performance benchmarking
Bias/drift checks

Perform functional, integration, system, and performance testing for ML-enabled applications.

Monitoring, Reliability & Observability

Implement monitoring, alerting, logging, and diagnostics using tools such as Grafana, ELK, Prometheus, or similar.
Ensure availability, scalability, security, and resilience of production systems.
Perform root cause analysis for incidents and deliver preventive improvements.
Build dashboards for application performance and ML services.

Agile Delivery & Stakeholder Collaboration

Work within Agile / Scrum teams to deliver iterative business solutions.
Collaborate with developers, data scientists, testers, product owners, and clients.
Translate technical challenges into clear business-facing updates.
Provide effort estimates, timelines, technical recommendations, and delivery plans.
Support continuous improvement of engineering automation, delivery practices, and technical culture.

Essential Skills & Experience

Strong hands-on experience in DevOps, automation engineering, QA automation, or MLOps.
Proven experience designing enterprise-grade infrastructure and deployment pipelines, especially for machine learning systems.
Experience supporting production systems with high availability requirements.
Ability to troubleshoot across code, infrastructure, networking, and data layers.
Strong understanding of CI/CD for ML, model lifecycle management, data engineering workflows, infrastructure scalability, and observability.
Strong communication skills in client-facing environments.
Ownership mindset with ability to work independently.
Comfortable working in technically complex, fast-paced delivery environments.
Exposure to a cloud platform (Microsoft Azure).
Experience deploying ML systems into production environments.

Technical Skills Required

- Programming / Scripting