DevOps Engineer Job Opening in Farringdon, Central London

Job Details

Test Triangle

Farringdon, Central London, United Kingdom

Description

DevOps Engineer — EMEIA Infrastructure
ROLE OVERVIEW
We are looking for a skilled and pragmatic DevOps Engineer to own and evolve our infrastructure across the EMEIA region. This is a dual-horizon role: you will keep our existing VM-based systems healthy while leading a greenfield effort to design and build the managed environment that those solutions will migrate onto.
A significant proportion of what we build is produced rapidly using AI-assisted, structured development. That means our solutions can move from idea to deployment faster than ever, and our infrastructure needs to keep pace. We need someone who thrives in a fast-moving, ambiguous environment, can absorb change quickly, and treats adaptability as a core part of the job rather than an occasional demand.
The new managed environment is most likely to be based on Kube — Apple’s internal Kubernetes (EKS) deployment — though the final architecture will be a team decision and AWS@Apple remains an option for workloads requiring greater control. You will help inform that decision and then own the build-out, regardless of which direction is chosen.
You will work closely with data engineers, developers, and analysts, acting as the infrastructure backbone for a team that moves quickly and expects you to move with it. The role also involves working directly with third-party vendors who support some of the tools being deployed, and collaborating with teams outside of EMEIA — including WorldWide — to align on standards, share solutions, and resolve cross-regional dependencies.
KEY RESPONSIBILITIES
Platform Migration & Environment Design

Lead the design and build-out of a new managed container environment to replace existing VM-based infrastructure — the most likely candidate is Kube (Apple’s internal Kubernetes/EKS cluster), but the final decision will be made collaboratively as a team
Contribute meaningfully to the environment selection decision: weigh trade-offs between managed solutions (Kube) and more directly controlled alternatives (AWS@Apple), considering maintenance overhead, operational control, and team capability
Own the migration of existing VM-based workloads onto the new platform, managing sequencing, risk, and continuity of service throughout
Establish and maintain the standard workflow for deploying solutions: build locally → containerise → publish to Kube → configure connectivity to the Apple internal systems each solution depends on

Apple Internal Networking & Connectivity

Configure and maintain networking between Kube and Apple’s internal systems, including Shield, Snowflake, Appleconnect, Floodgate, and any other platform dependencies the team relies on
Own namespace and compute provisioning on the shared Kube cluster, ensuring workloads are appropriately isolated and correctly configured
Manage credentials, service accounts, and access controls across the full connectivity chain — from container to downstream service
Act as the go-to expert on how things connect within Apple’s internal network topology

Infrastructure Management

Own and manage cloud infrastructure across EMEIA using internal cloud tooling (cloud.apple.com and connected systems including Shield)
Manage certificates, firewalls, resource pools, networking, and access controls
Ensure infrastructure is appropriately sized, resilient, and cost-efficient
Maintain accurate documentation of infrastructure topology and configuration

VM Provisioning & Automation (Existing Estate)

Maintain and operate existing virtual machines, primarily on RHEL, while migration to the new environment is in progress
Build and maintain standardised, repeatable provisioning processes (e.g. via Ansible, Terraform, or equivalent IaC tooling)
Manage package deployment, software repositories, databases, and web servers
Own the patching and update lifecycle for managed systems

Monitoring & Reliability

Implement and maintain monitoring, alerting, and observability across both the existing VM estate and the new container environment
Proactively identify risks, bottlenecks, and failure patterns before they impact users
Define and track appropriate SLIs/SLOs for critical services
Conduct post-incident reviews and drive lasting improvements

Supporting AI-Augmented Development

Support solutions built rapidly using structured AI-assisted development, which make up a large proportion of the workload: you must be comfortable working with codebases and configurations that evolve quickly, may lack deep documentation histories, and may have been substantially generated with AI tooling
Provide the infrastructure scaffold that allows AI-assisted solutions to move from local development to production reliably and safely
Be a pragmatic partner to developers: unblock deployment quickly, catch infrastructure-level risks early, and help establish patterns that make rapid iteration safe at scale
Actively use AI tools (e.g. Claude, Copilot, or similar) to accelerate your own work: writing scripts, diagnosing issues, generating runbooks, reviewing configurations

Diagnosis & Incident Response

Take ownership of vague or ambiguous production issues (e.g. “it’s running slow”, “the server keeps falling over”) and drive them through to resolution
Deliver short-term fixes rapidly to restore service, while tracking and delivering long-term root cause resolutions
Maintain a pragmatic balance between speed of recovery and quality of fix

SKILLS & EXPERIENCE
Essential

Proven experience in a DevOps, infrastructure, or platform engineering role
Hands-on experience with Kubernetes — deploying, configuring, and operating workloads in a shared or managed cluster environment
Experience containerising applications: writing Dockerfiles, managing images, publishing to a registry, and debugging container-level issues
Strong networking fundamentals: DNS, TLS/SSL certificates, firewall rules, load balancing, VPNs, and service-to-service connectivity
Comfort operating in environments where the architecture is still being defined — able to contribute to the decision, then execute once direction is set
Hands-on experience with RHEL (or equivalent enterprise Linux) — provisioning, hardening, package management (yum/dnf), systemd services
Experience managing cloud infrastructure, ideally in an enterprise private/hybrid cloud environment
Experience with infrastructure-as-code or configuration management tooling (e.g. Terraform, Ansible, Puppet, or similar)
Solid scripting ability in Bash and at least one higher-level language (Python preferred)
Experience with monitoring and observability tooling (e.g. Prometheus, Grafana, Datadog, or similar)
Strong incident diagnosis skills — able to work from vague symptoms to root cause using logs, metrics, and reasoning
Comfortable working with AI-generated or AI-assisted codebases: reading, extending, and debugging solutions without a full traditional authorship history
Clear written and verbal communication — able to translate infrastructure complexity for non-technical stakeholders

Desirable

Experience with AWS or AWS@Apple, particularly EKS
Familiarity with Apple’s internal platform tooling: Kube, Shield, Appleconnect, Floodgate, or similar
Experience integrating with Snowflake, including managing drivers, credentials, and network access
Experience with CI/CD pipelines (GitLab CI, Jenkins, GitHub Actions, or similar)
Exposure to security tooling, vulnerability scanning, or compliance frameworks (e.g. CIS Benchmarks)
Familiarity with secrets management tooling (Vault, CyberArk, or similar)
Experience working in a regulated or enterprise environment with change management processes

WAYS OF WORKING

You are comfortable with genuine ambiguity — including at the architectural level — and can make progress and contribute to decisions without waiting for everything to be resolved
You default to automation: if you do something twice, you script it; if you do it three times, you build a process
You adapt quickly: the tools, environments, and solutions you support can change fast, and you treat that as normal rather than exceptional
You are pragmatic under pressure: you know when to stop the bleeding first and fix it properly later
You are self-directed and comfortable owning problems end-to-end with minimal hand-holding
You are a willing partner to developers who move fast — you keep up, add guardrails where they matter, and don’t become a bottleneck

WHAT SUCCESS LOOKS LIKE

A new managed container environment is designed, built, and running — with existing VM-based workloads migrated onto it in a controlled, sequenced way
The standard deployment path (build → containerise → publish → connect) is well-established, documented, and easy for the team to use
Connectivity from the new environment to Apple internal systems (Snowflake, Appleconnect, Shield, Floodgate, etc.) is reliable, well-understood, and correctly secured
Teams are unblocked quickly when they need new integrations, access, or capabilities — even when the solutions they are deploying have been built at speed
Production issues are resolved rapidly, with lasting fixes following close behind
Monitoring catches issues before users do
The infrastructure estate — both old and new — is well-documented, well-understood, and in a known-good state

Job ID: f14adde5-5748314640
