Backend Software Engineer (ML Infra) Job at Rockstar, San Francisco, CA

  • Rockstar
  • San Francisco, CA

Job Description

Rockstar is recruiting for a fast-growing startup that is building the AI backbone for the next generation of intelligent products. They help fast-growing AI startups design, fine-tune, evaluate, deploy, and maintain specialized models across text, vision, and embeddings. Think of them as “AWS for AI models”—not data or raw compute, but a full-stack backend for fine-tuning, reinforcement learning, inference, and long-term model maintenance. Their customers are Series A–C AI companies building enterprise-grade products. Their promise is simple: they make your AI system better.

They are hiring a Backend Software Engineer (ML Infrastructure) to help design, build, and scale the core systems that power large-scale model training and deployment.

The candidate will work on distributed training pipelines, cloud-native infrastructure, and internal developer platforms that support fine-tuning, reinforcement learning, and inference at scale. This role sits at the intersection of backend engineering and ML systems—the candidate will collaborate closely with ML engineers while owning production-grade infrastructure.

This is an ideal role for an early-career engineer who wants to work on real distributed systems, GPU workloads, and modern ML infrastructure—not dashboards or CRUD apps.

What You’ll Do

Build & Scale Core Infrastructure

- Design and implement backend systems that support large-scale ML workloads, including fine-tuning and reinforcement learning.

- Build distributed training and inference pipelines that are efficient, fault-tolerant, and observable.

- Develop internal developer tools and platforms that make it easier for ML engineers to train, evaluate, and deploy models.

Cloud & Systems Engineering

- Work on cloud-native systems using containers and orchestration (e.g., Kubernetes).

- Optimize systems for performance, reliability, and cost efficiency, especially for GPU-heavy workloads.

- Implement monitoring, logging, and observability for long-running training jobs and production services.

Collaborate with ML Engineers

- Partner closely with ML engineers to support evolving model architectures, training workflows, and evaluation needs.

- Translate ML requirements into scalable backend and infrastructure solutions.

Who You Are

Required

- 1–3 years of backend engineering experience, ideally working on production systems.

- Strong fundamentals in distributed systems, networking, and backend architecture.

- Experience building systems that scale under real load.

- Comfortable working in Python and/or Go (or similar backend languages).

- Excited to work on-site in San Francisco with a fast-moving early-stage team.

Strongly Preferred

- Experience with or exposure to ML infrastructure or ML platforms.

- Familiarity with GPU workloads, training pipelines, or inference systems.

- Experience with containerization and orchestration (Docker, Kubernetes).

- Contributions to or deep familiarity with ML infrastructure libraries such as Ray, vLLM, SGLang, or similar distributed ML systems.

Bonus

- Computer science background from a top-tier program or equivalent demonstrated excellence.

- Open-source contributions, research projects, or side projects in systems or ML infrastructure.

- A track record of high ownership and technical curiosity.
