Suraj Subramanian

Menlo Park, California, United States

Summary

Suraj Subramanian is a Menlo Park–based software engineer with around a decade of experience building ML systems across finance and healthcare. He blends hands-on MLOps and ML engineering—specializing in distributed PyTorch training, DDP, multi-GPU/multi-node workflows, and experiment tracking—with full‑stack contributions to open-source projects. His work spans practical RL tutorials (Mario), production-ready training tooling (snapshot/resume, SLURM, torchrun), and high-impact docs and integrations for Llama2 on Hugging Face in the popular Llama cookbook. Notably, he reorganized repo structures and quickstart notebooks to make complex model workflows easier to reproduce, reflecting a bias for developer ergonomics as well as performance. Colleagues rely on him to move models from research prototypes into reliable, scalable production pipelines.
10 years of coding experience
Languages: Tamil, Hindi, English

GitHub Skills (24)

transformers (10)
pytorch (10)
slurm (10)
python (10)
ddp (10)
llama (10)
data-parallel (10)
machine-learning (10)
multiple-gpu (10)
reinforcement-learning (10)
data-parallelism (10)
huggingface (10)
mlops (10)
multi-gpu (10)
wandb (10)

Programming languages (8)

TypeScript, Java, R, JavaScript, Go, HTML, Jupyter Notebook, Python

GitHub contributions (5)

meta-llama/llama-cookbook

Feb 2024 - Dec 2024

Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using the Llama model family on various provider services.
Role in this project: Full-stack Developer
Contributions: 10 reviews, 34 PRs, 31 pushes in 9 months
Contributions summary: Suraj appears to have been involved in restructuring the repository's file organization and updating the main README. He added new notebooks to the quickstart guide, specifically for running Llama2 on Hugging Face transformers, and consolidated images into a top-level folder. His commits also updated notebook code for running Llama models with the Hugging Face transformers library, suggesting involvement in both front-end documentation and backend model integration.
ai, finetuning, langchain, llama, llama2
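The quickstart notebooks described above run Llama2 through the Hugging Face transformers library; one detail such notebooks handle is Llama 2's chat prompt template. A minimal sketch of that template follows (the `[INST]`/`<<SYS>>` markers are the documented Llama 2 chat format; the helper name is hypothetical and not from the cookbook):

```python
from typing import Optional


def format_llama2_chat(user_msg: str, system_msg: Optional[str] = None) -> str:
    """Wrap a user message in the Llama 2 chat template.

    Llama 2 chat checkpoints expect prompts shaped like
    [INST] <<SYS>> system <</SYS>> user [/INST]; the tokenizer
    prepends the <s> BOS token itself.
    """
    if system_msg:
        return f"[INST] <<SYS>>\n{system_msg}\n<</SYS>>\n\n{user_msg} [/INST]"
    return f"[INST] {user_msg} [/INST]"
```

The formatted string would then be fed to generation, for example via `transformers.pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")` (model access and download are omitted here).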
pytorch/examples

Sep 2022 - Nov 2022

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
Role in this project: MLOps Engineer
Contributions: 31 reviews, 14 commits, 11 PRs in 2 months
Contributions summary: Suraj primarily contributes to setting up and configuring distributed training environments for PyTorch models, specifically using DDP (DistributedDataParallel). Their work focuses on creating scripts and configurations for multi-GPU and multi-node training using tools like `torchrun` and SLURM. They integrate features like snapshotting and resuming training, enhancing the training workflow. Furthermore, they introduce minGPT-based training, demonstrating expertise in distributed training and potentially automated model deployment.
pytorch, vision, deep-learning, reinforcement-learning, reinforcement
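The snapshot/resume feature mentioned above follows a common fault-tolerance pattern: periodically write the epoch and training state to disk, and on startup reload it if present so a preempted job (e.g. under SLURM) continues where it left off. A framework-agnostic sketch of that pattern, with plain `pickle` standing in for `torch.save`/`torch.load` and all names hypothetical:

```python
import os
import pickle

SNAPSHOT_PATH = "snapshot.pkl"  # hypothetical path; real jobs use shared storage


def save_snapshot(epoch: int, state: dict, path: str = SNAPSHOT_PATH) -> None:
    """Persist training progress so a restarted job can resume."""
    with open(path, "wb") as f:
        pickle.dump({"epoch": epoch, "state": state}, f)


def load_snapshot(path: str = SNAPSHOT_PATH):
    """Return (start_epoch, state); (0, None) if no snapshot exists."""
    if not os.path.exists(path):
        return 0, None
    with open(path, "rb") as f:
        snap = pickle.load(f)
    return snap["epoch"] + 1, snap["state"]


start_epoch, state = load_snapshot()
for epoch in range(start_epoch, 3):   # 3 epochs as a stand-in for a real run
    state = {"weights": epoch}        # placeholder for model/optimizer state
    save_snapshot(epoch, state)       # checkpoint after every epoch
```

In a real DDP setup, typically only rank 0 writes the snapshot and `torchrun` restarts every rank after a failure; the resume logic is the same.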