Simon Mo

CEO and Cofounder · UC Berkeley Electrical Engineering & Computer Sciences (EECS)

Berkeley, California, United States

Summary

🤩 Rockstar · 🎓 Top School
Simon Mo is a lead engineer and UC Berkeley PhD student with eight years of experience building high-throughput, memory-efficient inference and serving systems. He maintains the vLLM project and helped build Ray Serve from zero to one, contributing backend and distributed-data improvements across Ray, Modin, and Clipper. His production work spans GPU inference optimizations at Character.AI, Kubernetes multi-tenancy, containerized deployments with shared-memory tuning, Prometheus metrics, and developer tooling improvements such as migrating linters from pylint to ruff. Comfortable at the intersection of research and production, he combines systems-level performance tuning with developer-facing engineering to make large models practical at scale.
9 years of coding experience
7 years of employment as a software developer
Doctor of Philosophy (PhD), Computer Science at University of California, Berkeley

Github Skills (46)

continuous-deployment (10)
kubernetes (10)
dockerce (10)
docker (10)
performance-monitor (10)
python (10)
testing (10)
dataframes (10)
pandas (10)
machine-learning (10)
performanceanalysis (10)
dockers (10)
ml-deployment (10)
performance-measurement (10)
numpy (10)

Programming languages (16)

C++, CSS, Rust, TeX, Go, HTML, Jupyter Notebook, Cuda

Github contributions (5)

vllm-project/vllm

Oct 2023 - Apr 2025

A high-throughput and memory-efficient inference and serving engine for LLMs
Role in this project:
Backend & DevOps Engineer
Contributions: 1021 reviews, 1290 PRs, 1119 pushes in 1 year 5 months
Contributions summary: Simon fixed a bug related to sequence group duplication within the engine step, reflecting a focus on core engine functionality. He also documented the official Docker image deployment process, updating the documentation to cover shared-memory usage; migrated the linter from `pylint` to `ruff`, improving code quality and maintainability; and added production metrics in Prometheus format.
amd, cuda, gpt, hpu, inference
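The "production metrics in Prometheus format" mentioned above refers to the Prometheus text exposition format, which a server exposes on a `/metrics` endpoint. As an illustrative sketch only (vLLM's real implementation and metric names differ; `engine_requests_total` here is hypothetical), the format can be rendered by hand with just the standard library:

```python
# Minimal sketch of the Prometheus text exposition format.
# Illustrative only -- this is NOT vLLM's actual metrics code,
# and the metric name below is hypothetical.

def render_metric(name: str, mtype: str, help_text: str,
                  samples: dict[tuple[tuple[str, str], ...], float]) -> str:
    """Render one metric family as Prometheus exposition text.

    `samples` maps a tuple of (label, value) pairs to a sample value.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {mtype}"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        suffix = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"

# Example: a request counter labeled by model name.
body = render_metric(
    "engine_requests_total", "counter", "Total requests served.",
    {(("model", "demo-llm"),): 42.0},
)
print(body)
```

In practice this text is served over HTTP and scraped by Prometheus on an interval; production code typically uses the `prometheus_client` library rather than hand-formatting.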
ucbrise/clipper

Oct 2017 - Jul 2020

A low-latency prediction-serving system
Role in this project:
DevOps & Backend Engineer
Contributions: 1 release, 29 commits, 97 PRs in 2 years 9 months
Contributions summary: Simon primarily focused on improving Clipper's infrastructure and backend functionality. His contributions include fixing broken links, addressing Docker and exception-handling issues, and implementing a metrics monitoring system, including frontend exporters. He also made significant changes to the Docker container manager and added Kubernetes support for multi-tenancy.
python, serving, prediction, deep-learning, latency
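Namespace-per-tenant is the usual starting point for the kind of Kubernetes multi-tenancy described above: each tenant gets its own Namespace, and workloads are stamped into it. A minimal sketch under that assumption (hypothetical helper and names, not taken from the Clipper codebase) that builds the per-tenant manifests as plain dicts:

```python
# Sketch of namespace-based Kubernetes multi-tenancy: one Namespace per
# tenant, with the tenant's workload deployed into that namespace.
# Hypothetical helper for illustration, not Clipper's actual code.

def tenant_manifests(tenant: str, image: str) -> list[dict]:
    ns = f"clipper-{tenant}"  # one namespace per tenant
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": ns},
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{tenant}-frontend", "namespace": ns},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"tenant": tenant}},
            "template": {
                "metadata": {"labels": {"tenant": tenant}},
                "spec": {"containers": [{"name": "frontend", "image": image}]},
            },
        },
    }
    return [namespace, deployment]

manifests = tenant_manifests("acme", "clipper/frontend:latest")
```

Isolating tenants by namespace lets resource quotas, RBAC rules, and network policies be scoped per tenant without the workloads themselves being tenant-aware.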