Senior Platform Engineer
Date Posted
22 November, 2025
Salary Offered
$75,000 — $100,000 yearly
Experience Required
3+ years
Remote Work
Not Allowed
Stock Options
No
Vacancies
1 available
About the Role
We’re looking for a Senior Platform / Infra Engineer to own the core infrastructure that powers Cosine’s products — from Kubernetes and deployment pipelines to networking and platform services.
You’ll design and run the “paved road” that our engineers, researchers, and customers build on: reliable Kubernetes clusters, fast and safe CI/CD, solid observability, and hardened environments for demanding enterprise and on-prem deployments. You’ll also wear a classic “DevOps/SRE” hat: thinking in SLOs, running incident response, and keeping us up even as we move quickly.
This is a high-ownership role at a fast-paced, venture-backed Silicon Valley startup. You’ll work directly with founding engineers and leadership, and your decisions will materially shape how we build and ship products.
What You’ll Do
- Own core infrastructure
  - Design, operate, and evolve our Kubernetes-based platform (EKS or similar), including cluster topology, node groups, autoscaling, and multi-environment isolation.
  - Manage supporting cloud resources: container registries, load balancers, queues, caches, and data infra needed to run our APIs and agents.
- Build the deployment & tooling layer
  - Design and maintain CI/CD pipelines for image builds and infra rollouts (e.g. Pulumi/Terraform + Helm/Docker).
  - Implement safe rollout strategies (blue/green, canary, staged rollouts) and fast rollback paths.
  - Build internal tools and abstractions that make it easy for product teams to self-serve infra safely.
- Own reliability & operations (SRE-ish)
  - Define and track SLOs/SLIs for key services (latency, error rates, availability).
  - Improve our observability stack (metrics, logs, traces, alerts) so issues are obvious, actionable, and debuggable.
  - Participate in the on-call rotation, lead incident response when needed, and drive blameless post-mortems and fixes.
- Shape networking & security
  - Design and maintain networking: VPCs, subnets, ingress/egress, service meshes / L7 routing, DNS, and TLS.
  - Implement least-privilege access via IAM, secure secret management, and hardened configurations for multi-tenant and isolated customer environments.
  - Help design patterns for secure enterprise and on-prem / regulated deployments.
- Partner with product & research
  - Work closely with application, ML, and research teams to understand their needs and translate them into reusable infra building blocks.
  - Provide guidance on “how to run this in production”: capacity planning, failure modes, and operational readiness reviews.
You Might Be a Great Fit If You
- Have strong experience
  - 5+ years building and operating production infrastructure on a major cloud (AWS, GCP, or Azure).
  - Significant hands-on experience running Kubernetes in production (EKS/GKE/AKS or self-managed):
    - Cluster upgrades, autoscaling, node group design, and multi-env setups.
    - Helm or similar for packaging services.
- Think in infrastructure-as-code
  - Deep experience with IaC tools (Pulumi, Terraform, CDK, or similar).
  - Comfortable managing infra changes via code review, CI, and automated rollouts.
- Care deeply about reliability
  - Have owned the uptime and performance of user-facing systems.
  - Comfortable participating in (and improving) on-call rotations and incident management.
  - Experience setting up / tuning observability (Prometheus, Grafana, CloudWatch, OpenTelemetry, etc.).
- Build great tooling & abstractions
  - You’ve built internal tools, libraries, or platforms on top of cloud providers so product teams can move faster with fewer foot-guns.
  - You think about developer experience and “golden paths,” not just raw infra.
- Are comfortable in code
  - Strong scripting and programming skills in at least one modern language (e.g. TypeScript, Go, Python).
  - Happy to dive into app code when needed to debug a production issue or improve an integration.
- Have the startup mindset
  - Enjoy working in a fast-moving environment with evolving priorities and incomplete specs.
  - Bias toward pragmatic solutions: ship something small, measure, iterate.
  - Communicate clearly, give/receive direct feedback, and collaborate across functions.
Nice to Have (Not Required)
- Experience with:
  - AWS primitives like EKS, ECS/Fargate, ECR, SQS, ElastiCache/Redis.
  - Argo CD or other GitOps tools for Kubernetes.
  - On-prem, air-gapped, or regulated industry deployments (e.g. finance, healthcare).
  - AI/ML infrastructure (GPU workloads, model hosting, feature stores).
- Prior experience as an early infra / platform hire at a startup.


Cosine



