Senior Platform Engineer
STN Inc
Posted 12 days ago
Platform and software · shared across customers
Reports to: Director, Platform Engineering (or Chief Architect)
Location: Remote (US) or Pleasanton, CA (hybrid)
Department: Cloud Platform Engineering / GPU Platform Engineering
Position summary
The Senior Platform Engineer builds and operates the multi-tenant orchestration, scheduling, and customer-facing platform layer that turns raw GPU infrastructure into a usable cloud service. This role is the software backbone of GPU One (GPUaaS).
Key responsibilities
Design and build the orchestration layer (Kubernetes, Slurm, Run:ai, or comparable)
Manage multi-tenant isolation including namespaces, networking, storage, and quotas
Build customer-facing platform APIs, CLIs, web portals, and SDKs
Implement and operate image management, GPU operator, and node provisioning automation
Drive infrastructure-as-code and automation across the platform stack
Partner with SRE on platform reliability, SLO definition, and observability
Support TAM and Support engineers on customer-impacting platform issues
Maintain customer environment templates, configuration management, and rollout tooling
Participate in architecture reviews, design discussions, and technical roadmap planning
Drive continuous platform improvement and reduce operational toil
Required qualifications
6+ years in platform engineering, SRE, or cloud engineering at scale
Deep Kubernetes expertise including CRDs, operators, and multi-tenant patterns
Strong programming skills in Go, Python, or both
Experience operating GPU clusters or AI infrastructure at production scale
Bachelor's degree in computer science or equivalent experience
Preferred qualifications
Experience with NVIDIA GPU Operator, MIG, MPS, and NCCL operator patterns
Familiarity with Slurm operator, Run:ai, KubeRay, or comparable AI orchestration