Principal Architect, Site Reliability Engineering
Peloton
ABOUT THE ROLE
At Peloton, we provide a seamless experience for our members. To achieve that, our internal engines—Finance, HR, Supply Chain, and Legal—must run with the same precision as our world-class fitness content.
As the Principal Architect of SRE for Internal Systems, you will lead the team responsible for the "Order-to-Cash," "Procure-to-Pay," and "Record-to-Report" lifecycles. You aren't just managing infrastructure; you are the architect of business continuity. You will guide a team of high-performing SREs to ensure our global SaaS ecosystem (NetSuite, Coupa, Workday) and underlying network infrastructure are resilient, observable, and ready to scale.
YOUR DAILY IMPACT AT PELOTON
- Lead, mentor, and grow a team of SREs. Conduct 1:1s, define career growth paths, and foster a culture of high accountability and psychological safety
- Transition from reactive support to proactive engineering. Align the team’s quarterly goals with broader Finance and Supply Chain digital transformation initiatives
- Architect observability across complex business paths (e.g., ensuring a customer order flows from e-commerce through supply chain into the financial ledger)
- Partner with business owners to define and track Service Level Objectives (SLOs) and Error Budgets for critical SaaS integrations
- Own the Major Incident Response process for corporate systems. Ensure "War Rooms" are efficient and result in actionable improvements
- Lead the Root Cause Analysis (RCA) process, ensuring a culture of continuous learning and systematic "toil" reduction
- Oversee the reliability of API-driven connections and identity management (Okta/Azure AD) across our tech stack
- Champion "Infrastructure as Code" (IaC) to automate manual hand-offs between business systems using Python, Go, or Terraform
YOU BRING TO PELOTON
- 8+ years in SRE, DevOps, or Production Engineering, with 2+ years of direct people management experience
- Deep understanding of Order-to-Cash or Procure-to-Pay cycles. You can translate a "database lag" into its specific impact on warehouse shipping or financial reconciliation
- Experience managing enterprise ecosystems (NetSuite, SAP, Workday, Salesforce)
- Solid grasp of Networking (SD-WAN, VPNs), Identity (IAM), and Endpoint Management
- Proficiency with Datadog, Splunk, New Relic, or Prometheus
- Proven ability to communicate technical risk to non-technical stakeholders (CFO, General Counsel, Head of People)
WHAT SUCCESS LOOKS LIKE IN 6 MONTHS
- You have established a predictable sprint velocity and a clear 12-month reliability roadmap for your direct reports
- You’ve launched an Executive Health Dashboard that provides real-time visibility into the health of Peloton's financial and logistical "nervous system"
- Through your team's automation efforts, you have reduced manual toil and decreased the Mean Time to Recovery (MTTR) for Tier-1 internal outages by 20%
#LI-DD1
#LI-REMOTE
#LI-Hybrid