We’re looking for an experienced platform engineer who can integrate infrastructure into a platform with a focus on reliability and scalability. Our teams and users will trust and rely on the platform you help build to do their best work and build their businesses.
What we're looking for
- Strong interest in development platforms, MLOps, CI/CD, infrastructure, or building products for technical teams
- Able to make effective trade-offs between engineering and product requirements while balancing short-term and long-term needs
- 8+ years of relevant industry experience in a fast-paced, high-growth tech environment, building and scaling internal platforms using JavaScript, TypeScript, or Go
- Experience with system, API, and infrastructure design using cloud concepts such as storage volumes, private networks, container scheduling, and Kubernetes
What you'll be doing
- Own a technical area by leading a team
- Work primarily in TypeScript/JavaScript, with some Go
- Implement MLOps platform features that integrate with Kubernetes and support multiple cloud providers
- Integrate cloud capabilities such as storage, private networks, and load balancers into the platform
- Add and maintain integrations with platform entities such as storage providers, datasets, models, and metrics
- Ship product features from planning to launch to maintenance with high autonomy
- Collaborate with other engineers to find elegant architectures and solutions
Technical problems the team has worked on
- Integrated storage quotas with a distributed file system and Kubernetes
- Wrote a Fluent Bit plugin in Rust to handle multi-tenancy authentication for logging
- Created a multi-node image cacher for Kubernetes that supports limits
- Created an architecture to store datasets with a unified interface to multiple S3 providers
- Created a user-facing YAML spec for a workflow engine to support CI/CD primitives
- Created a Terraform module for cluster provisioning and a CLI to facilitate installation