Machine Learning DevOps Engineer

Newcastle upon Tyne

£45,000-£50,000

Job Description

About the Role:
The ML DevOps Engineer is responsible for designing, deploying, automating, testing, and maintaining scalable machine learning and software delivery environments.

This hybrid role combines Machine Learning Engineering, DevOps, QA Automation, and Infrastructure Engineering to ensure reliable end-to-end product delivery.

The role supports agile teams by building robust CI/CD pipelines, managing cloud/on-prem infrastructure, integrating automated testing, and enabling production-grade ML systems.


Key Responsibilities:

ML Engineering & MLOps
Design, develop, deploy, and maintain machine learning pipelines for training, validation, inference, and monitoring.
Automate model lifecycle processes including data ingestion, feature engineering, model retraining, versioning, and rollback.
Build scalable environments for model experimentation and production deployment.
Integrate ML models into APIs, applications, and enterprise systems.
Monitor model drift, performance degradation, and prediction reliability.
Ensure reproducibility of experiments using version control and artifact management.

DevOps & Infrastructure Automation
Design and implement infrastructure automation using Infrastructure as Code (IaC) tools such as Terraform, Ansible, or equivalent.
Build, maintain, and optimize CI/CD pipelines for software and ML workloads.
Manage containerized environments using Docker and orchestration platforms such as Kubernetes or Swarm.
Support hybrid environments spanning cloud platforms and on-premises infrastructure.
Administer Windows and Linux systems to ensure operational stability.
Manage VMware / vSphere and storage / SAN infrastructure where required.

Quality Assurance & Test Automation
Develop and maintain automated testing frameworks across UI, API, integration, regression, and acceptance layers.
Deliver scheduled automated test suites across browsers, devices, and environments.
Perform manual functional, exploratory, UAT, and systems integration testing where required.
Define test strategies, test plans, scripts, and release readiness criteria.
Integrate automated tests into CI/CD pipelines for continuous validation.
Record, triage, monitor, and resolve defects efficiently.

Monitoring, Reliability & Observability
Implement monitoring, alerting, logging, and diagnostics using tools such as Grafana, ELK, Prometheus, or similar.
Ensure availability, scalability, security, and resilience of production systems.
Perform root cause analysis for incidents and deliver preventive improvements.
Build dashboards for application performance and ML services.

Agile Delivery & Stakeholder Collaboration
Work within Agile / Scrum teams to deliver iterative business solutions.
Collaborate with developers, data scientists, testers, product owners, and clients.
Translate technical challenges into clear business-facing updates.
Provide effort estimates, timelines, technical recommendations, and delivery plans.
Support continuous improvement of engineering automation, delivery practices, and technical culture.

Essential Skills & Experience:
Strong hands-on experience in DevOps, automation engineering, QA automation, or MLOps.
Proven experience designing enterprise-grade infrastructure and deployment pipelines.
Experience supporting production systems with high availability requirements.
Ability to troubleshoot across code, infrastructure, networking, and data layers.
Strong communication skills in client-facing environments.
Ownership mindset with the ability to work independently.
Comfortable working in technically complex, fast-paced delivery environments.
Exposure to cloud platforms such as Microsoft Azure or Google Cloud.
Experience deploying ML systems into production environments.

Technical Skills Required:

  • Programming / Scripting
    Python
    PowerShell
    Java / JavaScript / C#
    Bash / Shell scripting

  • DevOps Tooling
    Azure DevOps
    TeamCity
    Octopus Deploy
    Artifactory
    Git / GitHub / GitLab

  • Infrastructure & Containers
    Docker
    Kubernetes / Swarm
    VMware / vSphere
    Windows Server / Linux

  • Testing Tools
    Selenium
    Cypress
    Playwright
    Karate
    WebdriverIO
    JMeter

  • Monitoring / Quality
    Grafana
    ELK Stack
    SonarQube
    Control-M

  • Databases
    Microsoft SQL Server
    Relational database administration basics

Preferred Background:
Experience in regulated industries or enterprise consulting.
Master's or PhD degree in Computer Science, Engineering, Mathematics, or another STEM discipline.
Familiarity with geospatial data.

Job Types: Full-time, Permanent

Pay: £45,000.00-£50,000.00 per year

Benefits:
Casual dress
Company events
Company pension
Cycle to work scheme
Enhanced maternity leave
Financial planning services
Private medical insurance
Sick pay
Work from home

Ability to commute/relocate:
Newcastle upon Tyne NE1: reliably commute or plan to relocate before starting work (required)

Work Location: In person.