Senior Backend & Infrastructure Engineer

This is a hands-on backend and infrastructure role. You will write production code, make architecture choices, and own systems from design through deployment, monitoring, and incident response.

Why Scispot

Scispot began after my brother Guru and I (Satya) watched someone we loved run out of time while slow, manual lab processes delayed a promising treatment. We are building Scispot so life-saving science can move at software speed.
Biotech and life science teams should not have to choose between moving fast and keeping their data clean, connected, traceable, and ready for AI.
We are building the digital backbone for scientific discovery. Scispot connects lab operations, instrument data, scientific workflows, and AI-driven insights in one platform. This becomes the memory layer for life science teams and their agents.
Your code will not optimize clicks for another consumer app. It will help scientists run experiments faster, trace samples accurately, automate repetitive work, and move treatments closer to patients.
This is a rare chance to build infrastructure at the intersection of software, AI, data, and biology.

Today, Scispot supports more than 100 labs, 250+ instrument types, over 1,000 experiments each month, and millions of samples. After raising an $8M Series A, we are expanding the engineering team to build the reliable platform beneath the next generation of lab automation and AI.

The role

We are looking for a senior backend and infrastructure engineer who treats production as a product.

You will own the systems beneath Scispot: backend services, messaging, databases, cloud infrastructure, CI/CD, observability, security, and reliability.

This is not an ops-only role.

A normal week may include:

Tracing a RabbitMQ bottleneck.
Building a FastAPI or Spring Boot service.
Tuning PostgreSQL or Elasticsearch.
Improving an EKS rollout.
Designing a safer AWS and Azure boundary.
Reducing cloud cost without weakening reliability.
Debugging a production issue across code, queues, caches, and infrastructure.
Optimizing workloads for AI pipelines.

You will work closely with the founders and product engineers.

You will get broad goals, real customer stakes, and room to decide how to solve the problem. We want someone who acts like an owner, not someone who waits for a perfect ticket.

What you’ll own

Design, build, and operate cloud infrastructure across AWS and Azure for scale, reliability, security, and cost efficiency.
Build and evolve backend services in Python and FastAPI, Java and Spring Boot, or closely related frameworks.
Own backend reliability and performance across services, dependencies, queues, caches, databases, and external integrations.
Build and improve CI/CD pipelines so the team can deploy quickly, safely, and with clear rollback paths.
Run production end to end. This includes deployments, monitoring, alerting, debugging, incident response, post-incident follow-up, and capacity planning.
Design event-driven and asynchronous workflows using RabbitMQ or similar messaging systems.
Use Redis and other caching patterns to improve latency, throughput, and resilience.
Operate relational data stores in RDS, graph workloads in Cosmos DB, and NoSQL or vector workloads in MongoDB Atlas.
Build useful observability with logs, metrics, traces, dashboards, and alerts using tools such as Datadog and ELK.
Improve network and application security. This includes VPC design, secrets management, access control, encryption, and auditability.
Turn repeated operational work into code, tools, runbooks, and guardrails that raise developer velocity.
Make clear trade-offs among speed, reliability, maintainability, compliance, and cloud cost.

Problems you may work on

How do we absorb bursts of instrument and workflow data without losing work, creating duplicates, or slowing customer-facing services?
How do we preserve sample lineage, permissions, and audit history as data moves across services and cloud systems?
How do we make graph and vector retrieval dependable enough to support AI features used in real lab workflows?
How do we let engineers ship many services quickly while keeping deployments observable, reversible, and safe?
How do we scale across AWS and Azure without building fragile one-off infrastructure or wasting cloud spend?
How do we find production risks before customers do, then remove the root cause instead of only treating the symptom?

What success looks like in your first 90 days

First 30 days

Map the architecture, critical customer flows, deployment path, data stores, and main production risks.
Ship at least one useful production improvement in a backend service, deployment workflow, observability path, or reliability issue.
Join incident response and learn the current operating model.
Establish a baseline for the system health metrics that matter most.
Understand the AI traces captured in Langfuse and propose a feedback pipeline for AI workloads.

By 60 days

Become the clear owner of at least one service or infrastructure domain.
Remove a meaningful bottleneck or source of operational risk in messaging, caching, databases, deployments, or cloud infrastructure.
Improve CI/CD, dashboards, alerts, runbooks, or automated recovery so the team can move faster with less production risk.

By 90 days

Lead and launch a major backend or infrastructure initiative from design through production rollout.
Show a measurable gain in reliability, developer speed, performance, cloud cost, or incident reduction.
Present a practical roadmap for the next two quarters, including the highest-leverage technical risks, trade-offs, and milestones.

What we’re looking for

Roughly 3+ years of experience building and operating backend, platform, DevOps, SRE, or infrastructure systems. Strong evidence matters more than a precise year count.
Strong production backend experience with Python and FastAPI, Java and Spring Boot, or a comparable stack.
A solid grasp of distributed systems, asynchronous processing, failure modes, retries, idempotency, and message-driven architecture.
Hands-on experience with AWS and Azure. You have made practical decisions about networking, compute, storage, identity, security, and cost.
Solid understanding of AI pipelines, prompt looping, Langfuse tracing, and context engineering.
Experience with Docker and Kubernetes. Production EKS experience is especially useful.
Strong database fundamentals across relational systems such as MySQL or PostgreSQL.
Exposure to graph, NoSQL, or vector data systems.
Experience designing and consuming REST APIs.
Experience operating services that other teams or customers depend on.
Proven ownership of systems from architecture and implementation through deployment, monitoring, debugging, and optimization.
Strong production debugging and performance-tuning skills.
The ability to move from a symptom to the root cause across application code and infrastructure.
Good security instincts around secrets, access control, network boundaries, data protection, and regulated environments.
Clear written and verbal communication. You explain trade-offs, surface risks early, and leave systems easier for others to operate.

Strong pluses

Infrastructure as code with Terraform or a similar tool.
Datadog, ELK, OpenTelemetry, or another observability stack.
Microservices and event-driven systems at meaningful production scale.
Vector search, AI infrastructure, or LLM-powered applications.
Security, compliance, audit trails, or regulated software environments.
Experience in an early-stage startup where you operated beyond a narrow job boundary.
Life-science, lab, healthcare, manufacturing, or other data-heavy workflow experience.
Building short-term or long-term memory layers on databases that work well with agents.

Our stack

Python, FastAPI, Java, Spring Boot, REST APIs, RabbitMQ, Redis, MySQL, PostgreSQL, AWS RDS, Cosmos DB, MongoDB Atlas, Docker, Kubernetes, EKS, AWS, Azure, Terraform, Datadog, and ELK.

You do not need every exact vendor on day one.

You do need strong backend and systems fundamentals, recent hands-on production work, and the ability to learn the missing pieces quickly.

What high agency means here

You do not wait for a detailed ticket when the system, customer impact, and goal are clear.
When production breaks, you help diagnose it, mitigate it, communicate clearly, and prevent a repeat.
You can choose a practical first version without creating a dead end.
You challenge weak assumptions with data and propose a better path.
You care about the outcome after launch, not only whether the code merged.
You make the engineers around you faster through tools, documentation, feedback, and sound technical judgment.

This role is probably not for you if

You want to work only in application code or only in cloud configuration. This role spans both.
You need requirements to be complete before you can start making progress.
You prefer handing production issues to another team once your code is deployed.
You avoid on-call work, incident response, or the operational side of software.
You optimize for elegant abstractions before proving that they solve the real problem.
You want the narrow boundaries and long approval chains of a large company.

Why join Scispot

Build systems that sit between physical lab work and AI-driven action. Reliability here has a direct effect on how fast scientists can work.
Own important production systems early. Your decisions will shape the platform rather than disappear into a large org chart.
Work directly with founders and a small team that values speed, precision, clear thinking, and honest feedback.
Solve hard problems across backend engineering, cloud infrastructure, distributed data, security, and developer experience.
Join at a stage with real customer usage and fresh capital, while the technical foundation is still open to meaningful change.

How to apply

Along with your resume or YC profile, tell us about one production system you personally owned.

In a few bullets, explain:

What the system did and what was at stake.
What was unclear or broken when you took ownership.
The key technical choices you made.
The trade-offs you considered.
A production failure or hard constraint you had to work through.
The result for users, reliability, speed, cost, or the engineering team.

Links to code, technical writing, architecture notes, open-source work, or a product you shipped are welcome.

We care more about clear evidence of ownership than employer prestige or polished interview language.

Join Scispot if you want to build like a founder, remain close to users, and apply software and AI to problems that matter beyond software. You will have high autonomy, direct feedback, hard technical problems, and a visible link between what you ship and how quickly scientists can do their work.

About Scispot

Company Size: 11-50 People

Year Founded: 2020

Country: United States
