ML Infrastructure Engineer

Cerebrium


Date Posted

27 Apr, 2023

Salary Offered

$100 — $140 yearly

Job Type

Full Time

Experience Required

3+ years

Remote Work

Allowed

Stock Options

Yes

Vacancies

1 available


Cerebrium is a machine learning platform that makes it easy for companies to fine-tune, deploy and monitor ML models in production. We abstract away the complexity that comes with managing ML infrastructure and the need to stay up to date with the latest ML research and technologies.

As an ML Infrastructure Engineer, you'll work closely with our team to build and operate our infrastructure at scale. You'll help us optimize our model training and deployment pipeline, implement GPU sharding and time-slicing on top of Kubernetes, implement parallelism, and scale all of this across thousands of machines so that our models and fine-tuning jobs run smoothly.

The ideal candidate has experience building and scaling infrastructure at large scale, expertise in Kubernetes and serverless architectures, and a willingness to dive into the depths of CUDA. Additionally, you should be able to communicate complex topics clearly and work closely with customers. They are typically machine learning engineers and software engineers, so these should be your people.

How we work:

  • We focus on output. We don’t care what hours you work or where you work from. Just do what it takes to meet your weekly sprint. If you finish early or are just not having a good day, take the day.
  • We have a flat structure and want you to challenge us constantly on product and company decisions.
  • We ship multiple times a week: every time we can add value for the customer, we ship it. You also give weekly demos to team members on what you have been building.
  • We offer a remote, international work environment and only require 4 hours of overlap with the team (we work on EST).

Responsibilities:

  • Build and operate our infrastructure at scale
  • Optimize and deploy machine learning models and training jobs at scale
  • Improve GPU-sharing and time-slicing capability

Qualifications:

  • Experience building and scaling infrastructure at large scale, particularly with GPU-based systems.
  • Expertise in Kubernetes, Helm and serverless architectures.
  • Proficiency with Python or Go.
  • Experience with CUDA and NVIDIA libraries such as TensorRT.
  • Experience with infrastructure-as-code tools such as Terraform.
  • Good C++ knowledge is a plus.

Where will you be in a year:

A big part of our company is to focus on you, the team, so you can focus on the customer. So there are a few things we hope will happen in a year:

  • You will be 2x the engineer you were when you joined, having worked with new technologies and problems you haven’t encountered before.
  • You have been able to connect with the brightest people, either on our team or in industry, to solve some of the toughest challenges.
  • Your work isn’t getting any easier - you are challenged daily.
  • You are a leader in a growing company.
  • Most importantly, you are happy. Work is a large part of your life, so you must enjoy it.

Why should I join - sell me this pen:

  • We are growing so quickly (40% WoW) that we can’t build product fast enough for our users.
  • We are working with top companies from Seed to Series B.
  • We don’t micromanage.
  • We are helping companies build the next generation of software through machine learning.
  • We are becoming the preferred product for our users, and we are doing it with half the team size.

Benefits

  • Competitive salary and meaningful equity
  • Flexible work environment – work remotely from home or from a WeWork, which we sponsor
  • Health, dental, and vision benefits with 80% coverage for you
  • Unlimited PTO
  • Opportunities to speak and participate in events across the Cloud Native community
  • 2-3 company off-sites a year. We have previously done Tulum and Greece.
  • Learning budget, and much more

About Cerebrium

AWS SageMaker Alternative

Company Size: 1 - 5 People
Year Founded: 2021
Country: United States

© Copyright 2024 BEAMSTART. All Rights Reserved.