Machine Learning DevOps Engineer
Newcastle upon Tyne
£45,000-£50,000
Apply before
Job Description
About the Role:
The ML DevOps Engineer is responsible for designing, deploying, automating, monitoring, and maintaining scalable machine learning platforms and production-grade AI systems.
This role is heavily focused on MLOps and combines expertise in machine learning engineering, cloud infrastructure, DevOps, automation, observability, and production operations to enable reliable end-to-end ML delivery.
The engineer will build and maintain robust ML pipelines, scalable deployment environments, CI/CD workflows, automated retraining systems, and monitoring frameworks to ensure machine learning solutions are reproducible, scalable, observable, and production-ready.
The role supports agile delivery teams by operationalising machine learning systems across cloud and hybrid infrastructure environments.
Key Responsibilities:
ML Engineering & MLOps
Design, develop, deploy, and maintain machine learning pipelines for training, validation, inference, and monitoring.
Automate model lifecycle processes including data ingestion, feature engineering, model retraining, versioning, and rollback.
Build scalable environments for model experimentation and production deployment.
Integrate ML models into APIs, applications, and enterprise systems.
Monitor model drift, performance degradation, and prediction reliability.
Ensure reproducibility of experiments using version control and artifact management.
Build automated ML workflows supporting continuous training, deployment, rollback, and model lifecycle management.
Optimise memory (RAM) and CPU utilisation for ML workloads.
DevOps & Infrastructure Automation
Design and implement infrastructure automation using Infrastructure as Code (IaC) tools such as Terraform, Ansible, or equivalent.
Build, maintain, and optimise CI/CD pipelines for software and ML workloads.
Manage containerised environments using Docker and orchestration platforms such as Kubernetes or Docker Swarm.
Support hybrid environments including cloud platforms and on-premise infrastructure.
Administer Windows and Linux systems to ensure operational stability.
Manage VMware vSphere, storage, and SAN infrastructure where required.
Quality Assurance & Test Automation
Develop automated validation frameworks for data quality.
Integrate automated testing into ML CI/CD pipelines.
Implement validation gates for:
Model promotion
Deployment approval
Performance benchmarking
Bias/drift checks
Perform functional, integration, system, and performance testing for ML-enabled applications.
Monitoring, Reliability & Observability
Implement monitoring, alerting, logging, and diagnostics using tools such as Grafana, ELK, Prometheus, or similar.
Ensure availability, scalability, security, and resilience of production systems.
Perform root cause analysis for incidents and deliver preventive improvements.
Build dashboards for application performance and ML services.
Agile Delivery & Stakeholder Collaboration
Work within Agile / Scrum teams to deliver iterative business solutions.
Collaborate with developers, data scientists, testers, product owners, and clients.
Translate technical challenges into clear business-facing updates.
Provide effort estimates, timelines, technical recommendations, and delivery plans.
Support continuous improvement of engineering automation, delivery practices, and technical culture.
Essential Skills & Experience
Strong hands-on experience in DevOps, automation engineering, QA automation, or MLOps.
Proven experience designing enterprise-grade infrastructure and deployment pipelines, especially for machine learning systems.
Experience supporting production systems with high availability requirements.
Ability to troubleshoot across code, infrastructure, networking, and data layers.
Strong understanding of CI/CD for ML, model lifecycle management, data engineering workflows, infrastructure scalability, and observability.
Strong communication skills in client-facing environments.
Ownership mindset with ability to work independently.
Comfortable working in technically complex, fast-paced delivery environments.
Exposure to the Microsoft Azure cloud platform.
Experience deploying ML systems into production environments.
Technical Skills Required
- Programming / Scripting
Python
Bash / Shell scripting
PowerShell
JavaScript / C# (desirable)
- MLOps & ML Tooling
MLflow
Kubeflow
Azure ML
Airflow
DVC
Weights & Biases
TensorFlow Serving / TorchServe (desirable)
- DevOps & CI/CD
Azure DevOps
GitHub Actions
GitLab CI/CD
TeamCity
Octopus Deploy
Artifactory
- Infrastructure & Containers
Docker
Kubernetes
Helm
VMware / vSphere
Linux
Windows Server
- Monitoring / Quality
Grafana
ELK Stack
SonarQube
Control-M
- Data & Databases
Microsoft SQL Server
PostgreSQL
Relational database fundamentals
Preferred Background
Master’s or PhD degree in:
Computer Science
Artificial Intelligence
Machine Learning
Data Science
Engineering
Mathematics
Related STEM discipline
Experience with:
Geospatial data
Remote sensing
Large-scale spatial analytics
Earth Observation workflows
Experience working in enterprise consulting, infrastructure-heavy, or regulated environments.
Job Types: Full-time, Permanent
Pay: £45,000.00-£50,000.00 per year
Benefits:
Casual dress
Company events
Company pension
Cycle to work scheme
Enhanced maternity leave
Financial planning services
Private medical insurance
Sick pay
Work from home
Ability to commute/relocate
Newcastle upon Tyne NE1: reliably commute or plan to relocate before starting work (required)
Work Location: In person.