Kuntai Du

Chief Scientist at Tensormesh

Chicago, Illinois, United States

Summary

🤩 Rockstar · 🎓 Top School
Kuntai Du is Chief Scientist at Tensormesh and a graduating PhD from the University of Chicago, with 8 years of experience building high-throughput LLM inference systems and production performance tooling. As an early vLLM team member he led benchmarking and a public performance dashboard (perf.vllm.ai), and implemented disaggregated prefilling and CPU offloading, changes that required only about 200 lines yet drove multiple industry adoptions. His reproducible benchmark work reached 92k readers and catalyzed a collaboration with NVIDIA, and he also explores KV-cache innovations in the LMCache project. Earlier research compressed satellite imagery 3× across millions of square kilometers (ASPLOS'24), and he optimized video-model workflows at TuSimple, showing a rare blend of systems performance engineering and ML research. Based in Chicago, he connects academic rigor with production-grade MLOps to accelerate LLM inference at scale.
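To illustrate the CPU-offloading feature mentioned above, here is a minimal sketch of how it is exposed through vLLM's offline inference API. This is not Kuntai's implementation itself, only a usage example; it assumes a vLLM release that supports the cpu_offload_gb engine argument, and the model name is a placeholder.

```python
# Minimal sketch: offline vLLM inference with CPU offloading enabled.
# Assumes a vLLM version exposing the cpu_offload_gb parameter; the
# model name is a placeholder, not a recommendation.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    cpu_offload_gb=4,  # offload up to 4 GiB of weights to CPU memory
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain KV-cache offloading in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```

The appeal of this design, and a likely reason the change stayed small, is that offloading is a single engine argument rather than a separate code path: callers trade some inference speed for the ability to fit a larger model in limited GPU memory.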
9 years of coding experience
Doctor of Philosophy (PhD), Computer Science, University of Chicago

GitHub Skills (12)

pytorch (10)
benchmark (10)
benchmarking (10)
inference (10)
python (10)
llm (10)
mlops (9)
docker (8)
dockerce (8)
dockers (8)
kubernetes-pod (7)
kubernetes (7)

Programming languages (7)

HCL, Java, C++, C, JavaScript, HTML, Python

GitHub contributions (5)

vllm-project/vllm

Mar 2024 - Mar 2025

A high-throughput and memory-efficient inference and serving engine for LLMs
Role in this project:
MLOps Engineer & Performance Engineer
Contributions: 167 reviews, 48 PRs, 12 pushes in 1 year
Contributions summary: Kuntai's contributions center on performance optimization and benchmarking within the vLLM project. He implemented and documented performance benchmarks, including latency, throughput, and serving tests. He also focused on improving the readability of benchmark results and preparing them for a performance dashboard, fixed bugs in the serving benchmark, and integrated with tools like TGI, TensorRT-LLM, and LMDeploy for comprehensive performance evaluations (a simplified benchmark sketch follows below).
Topics: amd, cuda, deepseek, gpt, hpu
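To give a concrete sense of the latency and throughput serving tests described above, the sketch below measures per-request latency and aggregate token throughput against an OpenAI-compatible completions endpoint of the kind vLLM serves. It is a simplified stand-in for vLLM's own benchmark scripts, not a reproduction of them; the endpoint URL, model name, and prompt set are assumptions.

```python
# Minimal serving-benchmark sketch: per-request latency and aggregate
# token throughput against an OpenAI-compatible /v1/completions endpoint
# (e.g., a locally started vLLM server). URL and model are placeholders.
import time
import requests

URL = "http://localhost:8000/v1/completions"   # assumed local server
MODEL = "meta-llama/Llama-3.1-8B-Instruct"     # placeholder model name
PROMPTS = ["Summarize vLLM in one sentence."] * 8

latencies, total_tokens = [], 0
start = time.perf_counter()
for prompt in PROMPTS:
    t0 = time.perf_counter()
    resp = requests.post(URL, json={
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": 64,
    })
    resp.raise_for_status()
    latencies.append(time.perf_counter() - t0)
    total_tokens += resp.json()["usage"]["completion_tokens"]
elapsed = time.perf_counter() - start

print(f"mean latency: {sum(latencies) / len(latencies):.3f} s")
print(f"throughput:   {total_tokens / elapsed:.1f} tokens/s")
```

Real serving benchmarks like the ones described in this profile typically issue requests concurrently and sweep request rates; this sequential version only shows the core measurement loop.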
KuntaiDu/vllm

May 2024 - Apr 2025

A high-throughput and memory-efficient inference and serving engine for LLMs
Contributions: 2 reviews, 13 PRs, 789 pushes in 10 months