We would love to meet you if you:
- Philosophy: You are your own worst critic. You have a high bar for quality and don't rest until the job is done right: no settling for 90%. We want someone who ships fast, acts with high agency, and doesn't just voice problems but jumps in to fix them.
- Experience: You have deep expertise in Python and PyTorch, with a strong foundation in low-level operating systems concepts including multi-threading, memory management, networking, storage, performance, and scale. You're experienced with modern inference systems like TGI, vLLM, TensorRT-LLM, and Optimum, and comfortable creating custom tooling for testing and optimization.
- Approach: You combine technical expertise with practical problem-solving. You're methodical in debugging complex systems and can rapidly prototype and validate solutions.
The core work will include:
- Architecting and implementing robust, scalable inference systems for serving state-of-the-art AI models
- Optimizing model serving infrastructure for high throughput and low latency at scale
- Developing and integrating advanced inference optimization techniques
- Working closely with our research team to bring cutting-edge capabilities into production
- Building developer tools and infrastructure to support rapid experimentation and deployment
Bonus points if you:
- Have experience with low-level systems programming (CUDA, Triton) and compiler optimization
- Are passionate about open-source contributions and staying current with ML infrastructure developments
- Bring practical experience with high-performance computing and distributed systems
- Have worked in early-stage environments where you helped shape technical direction
- Are energized by solving complex technical challenges in a collaborative environment
This is an in-person role at our office in SF. We're an early-stage company, which means the role requires working hard and moving quickly. Please only apply if that excites you.