Kuntai Du

Chief Scientist at Tensormesh

Chicago, Illinois, United States

Summary

🤩 Rockstar · 🎓 Top School
Kuntai Du is Chief Scientist at Tensormesh and a graduating PhD from the University of Chicago, with 8 years of experience building high-throughput LLM inference systems and production performance tooling. As an early vLLM team member he led benchmarking and a public performance dashboard (perf.vllm.ai), and implemented disaggregated prefilling and CPU offloading, changes that required only about 200 lines yet drove multiple industry adoptions. His reproducible benchmark work reached 92k readers and catalyzed a collaboration with NVIDIA, and he also explores KV-cache innovations in the LMCache project. Earlier research compressed satellite imagery 3× across millions of square kilometers (ASPLOS'24), and he optimized video-model workflows at TuSimple, showing a rare blend of systems performance engineering and ML research. Based in Chicago, he connects academic rigor with production-grade MLOps to accelerate LLM inference at scale.
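To illustrate the CPU-offloading feature mentioned above, here is a minimal sketch of how it is exposed through vLLM's offline inference API. This is not Kuntai's implementation itself, only a usage example; it assumes a vLLM release that supports the cpu_offload_gb engine argument, and the model name is a placeholder.

```python
# Minimal sketch: offline vLLM inference with CPU offloading enabled.
# Assumes a vLLM version exposing the cpu_offload_gb parameter; the
# model name is a placeholder, not a recommendation.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    cpu_offload_gb=4,  # offload up to 4 GiB of weights to CPU memory
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain KV-cache offloading in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```

The appeal of this design, and a likely reason the change stayed small, is that offloading is a single engine argument rather than a separate code path: callers trade some inference speed for the ability to fit a larger model in limited GPU memory.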
9 years of coding experience
Doctor of Philosophy (PhD), Computer Science, University of Chicago

GitHub Skills (12)

pytorch (10)
benchmark (10)
benchmarking (10)
inference (10)
python (10)
llm (10)
mlops (9)
docker (8)
dockerce (8)
dockers (8)
kubernetes-pod (7)
kubernetes (7)

Programming languages (7)

HCL, Java, C++, C, JavaScript, HTML, Python

GitHub contributions (5)

vllm-project/vllm

Mar 2024 - Mar 2025

A high-throughput and memory-efficient inference and serving engine for LLMs
Role in this project:
MLOps Engineer & Performance Engineer
Contributions: 167 reviews, 48 PRs, 12 pushes in 1 year
Contributions summary: Kuntai's contributions center on performance optimization and benchmarking within the vLLM project. He implemented and documented performance benchmarks, including latency, throughput, and serving tests. He also focused on improving the readability of benchmark results and preparing them for a performance dashboard, fixed bugs in the serving benchmark, and integrated with tools like TGI, TensorRT-LLM, and LMDeploy for comprehensive performance evaluations (a simplified benchmark sketch follows below).
Topics: amd, cuda, deepseek, gpt, hpu
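To give a concrete sense of the latency and throughput serving tests described above, the sketch below measures per-request latency and aggregate token throughput against an OpenAI-compatible completions endpoint of the kind vLLM serves. It is a simplified stand-in for vLLM's own benchmark scripts, not a reproduction of them; the endpoint URL, model name, and prompt set are assumptions.

```python
# Minimal serving-benchmark sketch: per-request latency and aggregate
# token throughput against an OpenAI-compatible /v1/completions endpoint
# (e.g., a locally started vLLM server). URL and model are placeholders.
import time
import requests

URL = "http://localhost:8000/v1/completions"   # assumed local server
MODEL = "meta-llama/Llama-3.1-8B-Instruct"     # placeholder model name
PROMPTS = ["Summarize vLLM in one sentence."] * 8

latencies, total_tokens = [], 0
start = time.perf_counter()
for prompt in PROMPTS:
    t0 = time.perf_counter()
    resp = requests.post(URL, json={
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": 64,
    })
    resp.raise_for_status()
    latencies.append(time.perf_counter() - t0)
    total_tokens += resp.json()["usage"]["completion_tokens"]
elapsed = time.perf_counter() - start

print(f"mean latency: {sum(latencies) / len(latencies):.3f} s")
print(f"throughput:   {total_tokens / elapsed:.1f} tokens/s")
```

Real serving benchmarks like the ones described in this profile typically issue requests concurrently and sweep request rates; this sequential version only shows the core measurement loop.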
KuntaiDu/vllm

May 2024 - Apr 2025

A high-throughput and memory-efficient inference and serving engine for LLMs
Contributions: 2 reviews, 13 PRs, 789 pushes in 10 months