Forward Deployed ML Engineer

Triomics, United States

Date Posted

13 April, 2026

Work Location

New York, United States

Salary Offered

$170,000 to $190,000 yearly

Job Type

Full Time

Experience Required

1+ years

Remote Work

Not Allowed

Stock Options

Vacancies

1 available

About Triomics

Triomics is building the agentic AI layer for oncology EHRs. Cancer hospitals spend billions on highly trained staff who manually read unstructured patient records (pathology reports, clinical notes, genomic panels) to power workflows such as trial matching, registry curation, visit prep, and quality reporting. We replace that manual work with task-driven AI agents that sit inside the EMR and process records at scale, in real time.

Our platform is trusted by 4 of the top 10 Best Hospitals for Cancer as ranked by U.S. News, as well as several of the largest community practices. We have grown 10x in the last year and process millions of oncology medical documents every month.

Our investors include Lightspeed, General Catalyst, Nexus Venture Partners, and Y Combinator.

Role

Build and deploy AI agent pipelines that extract structured oncology variables from unstructured patient documents for tailor-made use cases at pharmaceutical companies and cancer hospitals. You own the full cycle: understanding the customer's data dictionary, studying the source clinical documents, building extraction agents, evaluating accuracy, deploying to production, and iterating until it works. This role requires someone who can go deep into both the agentic layer and the clinical domain, coordinate across customer and internal teams, and deliver under deadline pressure.

Responsibilities

Design and build agentic extraction pipelines that process patient charts of 500+ pages (clinical notes, pathology reports, imaging reports, genomic panels) and output structured JSON according to each customer's data dictionary
Own accuracy end-to-end: define evaluation datasets, run precision/recall analysis per variable, identify failure modes, and improve through agent architecture changes, prompt engineering, fine-tuning, or rule-based post-processing
Go deep into the clinical source data: read the actual patient charts, understand how oncologists document, learn why certain data points are ambiguous, and use that understanding to improve extraction
Work with the clinical annotation team to build gold-standard datasets and resolve edge cases
Coordinate with customer data science and clinical teams to clarify dictionary definitions, review output quality, and close accuracy gaps
Coordinate with internal engineering and infrastructure teams to deploy, scale, and monitor pipelines in production
Deliver on customer timelines: expect intense sprint periods around customer deliveries, followed by iteration and improvement cycles

What Success Looks Like in the First 90 Days

Days 1-30: Learn the stack, the data, and the domain.

You should be reading real patient charts within your first week, not abstractions of them. Understand how oncologists document across clinical notes, pathology reports, imaging, and genomic panels. Learn why the same data point (e.g., disease stage, biomarker status, line of therapy) shows up differently across document types and why extraction is hard. Get hands-on with the existing extraction pipeline architecture: how agents are orchestrated, how documents are segmented and classified, how structured JSON is produced, and where the current system fails. Run the evaluation suite on an active customer dictionary and understand the per-variable accuracy breakdown: which variables are easy, which are hard, and why. By the end of month one, you should be able to explain the top 5 failure modes in the current extraction pipeline and have an opinion on which ones are fixable with prompt or agent changes and which require deeper architectural work.

Days 30-60: Own a customer delivery end-to-end.

Pick up an active customer workstream, such as a new dictionary, a new tumor type, or an accuracy improvement cycle on an existing delivery. Run it yourself: study the customer's data dictionary, map it to the source documents, build or modify the extraction agents, define the evaluation dataset with the annotation team, run per-variable precision/recall analysis, and iterate until accuracy targets are met. You should be coordinating directly with the customer's data science team on edge cases and definition ambiguities. At the same time, you should be identifying patterns across customer dictionaries.

Days 60-90: Ship improvements and have an opinion on every decision.

Deliver measurable accuracy improvements on your owned workstream: concrete numbers, not vibes. Document the pipeline architecture, evaluation methodology, and customer-specific decisions well enough that another engineer can pick up the work. You should have a point of view on how to standardize extraction pipelines across customers so that onboarding a new dictionary takes days, not weeks.

Requirements

2+ years building ML/AI systems in production
Built and deployed AI agents or multi-step LLM pipelines (not just single-call wrappers); you should have a clear point of view on agent architectures, tool use, orchestration frameworks, and where they break down
Strong Python: pipeline code, data processing, and infrastructure glue, not just model training scripts
Practical LLM experience: prompt engineering, fine-tuning, RAG, evaluation design
Built evaluation frameworks for LLM-based document extraction tasks (precision, recall, per-class analysis, error taxonomy)
Willingness to become a domain expert in oncology data; this role requires going deep into clinical documentation, not treating it as generic text
Comfortable owning customer-facing communication alongside technical delivery; you'll talk regularly with customer data science teams, clinical teams, and internal engineering
Can operate in high-intensity delivery sprints and manage your own time across multiple workstreams

Preferred

Stays current with the agentic ML landscape: frameworks, patterns, and failure modes in production agent systems
Clinical or biomedical NLP experience is a plus but not required; what matters is willingness to go deep into the domain