Simon Mo

CEO and Cofounder · UC Berkeley Electrical Engineering & Computer Sciences (EECS)

Berkeley, California, United States

Summary

🤩 Rockstar · 🎓 Top School
Simon Mo is a lead engineer and UC Berkeley PhD student with eight years of experience building high-throughput, memory-efficient inference and serving systems. He maintains the vLLM project and helped build Ray Serve from zero to one, contributing backend and distributed-data improvements across Ray, Modin, and Clipper. His production work spans GPU inference optimizations at Character.AI, Kubernetes multi-tenancy, containerized deployments with shared-memory tuning, Prometheus metrics, and developer tooling improvements such as migrating linters from pylint to ruff. Comfortable at the intersection of research and production, he combines systems-level performance tuning with developer-facing engineering to make large models practical at scale.
9 years of coding experience
7 years of employment as a software developer
Doctor of Philosophy (PhD), Computer Science at University of California, Berkeley

Github Skills (46)

continuous-deployment (10)
kubernetes (10)
dockerce (10)
docker (10)
performance-monitor (10)
python (10)
testing (10)
dataframes (10)
pandas (10)
machine-learning (10)
performanceanalysis (10)
dockers (10)
ml-deployment (10)
performance-measurement (10)
numpy (10)

Programming languages (16)

C++, CSS, Rust, TeX, Go, HTML, Jupyter Notebook, Cuda

Github contributions (5)

vllm-project/vllm

Oct 2023 - Apr 2025

A high-throughput and memory-efficient inference and serving engine for LLMs
Role in this project:
Backend & DevOps Engineer
Contributions: 1021 reviews, 1290 PRs, 1119 pushes in 1 year 5 months
Contributions summary: Simon fixed a bug related to sequence group duplication within the engine step, reflecting a focus on core engine functionality. He also documented the official Docker image deployment process, updating the documentation to cover shared-memory usage; migrated the linter from `pylint` to `ruff`, improving code quality and maintainability; and added production metrics in Prometheus format.
amd, cuda, gpt, hpu, inference
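The "production metrics in Prometheus format" mentioned above refers to the Prometheus text exposition format, which a server exposes on a `/metrics` endpoint. As an illustrative sketch only (vLLM's real implementation and metric names differ; `engine_requests_total` here is hypothetical), the format can be rendered by hand with just the standard library:

```python
# Minimal sketch of the Prometheus text exposition format.
# Illustrative only -- this is NOT vLLM's actual metrics code,
# and the metric name below is hypothetical.

def render_metric(name: str, mtype: str, help_text: str,
                  samples: dict[tuple[tuple[str, str], ...], float]) -> str:
    """Render one metric family as Prometheus exposition text.

    `samples` maps a tuple of (label, value) pairs to a sample value.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {mtype}"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        suffix = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"

# Example: a request counter labeled by model name.
body = render_metric(
    "engine_requests_total", "counter", "Total requests served.",
    {(("model", "demo-llm"),): 42.0},
)
print(body)
```

In practice this text is served over HTTP and scraped by Prometheus on an interval; production code typically uses the `prometheus_client` library rather than hand-formatting.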
ucbrise/clipper

Oct 2017 - Jul 2020

A low-latency prediction-serving system
Role in this project:
DevOps & Backend Engineer
Contributions: 1 release, 29 commits, 97 PRs in 2 years 9 months
Contributions summary: Simon primarily focused on improving Clipper's infrastructure and backend functionality. His contributions include fixing broken links, addressing Docker and exception-handling issues, and implementing a metrics monitoring system, including frontend exporters. He also made significant changes to the Docker container manager and added Kubernetes support for multi-tenancy.
python, serving, prediction, deep-learning, latency
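Namespace-per-tenant is the usual starting point for the kind of Kubernetes multi-tenancy described above: each tenant gets its own Namespace, and workloads are stamped into it. A minimal sketch under that assumption (hypothetical helper and names, not taken from the Clipper codebase) that builds the per-tenant manifests as plain dicts:

```python
# Sketch of namespace-based Kubernetes multi-tenancy: one Namespace per
# tenant, with the tenant's workload deployed into that namespace.
# Hypothetical helper for illustration, not Clipper's actual code.

def tenant_manifests(tenant: str, image: str) -> list[dict]:
    ns = f"clipper-{tenant}"  # one namespace per tenant
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": ns},
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{tenant}-frontend", "namespace": ns},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"tenant": tenant}},
            "template": {
                "metadata": {"labels": {"tenant": tenant}},
                "spec": {"containers": [{"name": "frontend", "image": image}]},
            },
        },
    }
    return [namespace, deployment]

manifests = tenant_manifests("acme", "clipper/frontend:latest")
```

Isolating tenants by namespace lets resource quotas, RBAC rules, and network policies be scoped per tenant without the workloads themselves being tenant-aware.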