Xiao Wang

Software Engineer at NVIDIA

California, United States

Summary

🤩
Rockstar
🎓
Top School
Xiao Wang is a software engineer based in California with 7 years of experience specializing in PyTorch and CUDA-focused deep learning infrastructure. He is an active contributor to the flagship pytorch/pytorch repository, where he improved linear algebra performance by adding a driver kwarg to torch.linalg.svd and torch.linalg.svdvals and integrating cuSOLVER’s gesvdaStridedBatched driver. He also implemented support for relocatable device code linking in CUDA extensions, improving build robustness and portability. His work blends low-level GPU optimization with practical engineering, helping bridge research-quality kernels to production ML systems. Xiao excels at squeezing performance from GPU code while making complex build and runtime behaviors more predictable for teams shipping models at scale.
8 years of coding experience
5 years of employment as a software developer
Bachelor’s Degree, Physics at University of Science and Technology of China
Doctor of Philosophy (Ph.D.), Physics at Purdue University
Languages: English, Chinese

GitHub Skills (12)

cuda: 10
pytorch: 10
deep-learning: 10
gpu: 10
deep-learning-ai: 10
cusolver: 10
python: 9
machine-learning: 9
tensor: 9
lg: 8
neural-network: 8
autograd: 8

Programming languages (11)

TypeScript, C#, C++, C, JavaScript, Go, Objective-C, HTML

GitHub contributions (5)

pytorch/pytorch

Feb 2020 - Dec 2022

Tensors and Dynamic neural networks in Python with strong GPU acceleration
Role in this project: ML Engineer
Contributions: 197 reviews, 287 commits, 154 PRs in 2 years 10 months
Contributions summary: Xiao primarily contributed to the `pytorch/pytorch` repository by enhancing its linear algebra and CUDA functionality. Xiao added a `driver=` kwarg to `torch.linalg.svd` and `torch.linalg.svdvals`, integrating the cuSOLVER gesvdaStridedBatched driver for improved performance. Xiao also implemented support for relocatable device code linking in CUDA extensions, improving the build process. These changes show a focus on optimizing and expanding PyTorch's linear algebra and CUDA integration, both core components of its deep learning functionality.
python, gpu-acceleration, deep-learning, gpu, numpy
xwang233/code-snippet

Feb 2020 - Dec 2021

Contributions: 150 commits, 134 pushes, 1 branch in 1 year 9 months