Jongsoo Park is a Member of Technical Staff at OpenAI and a software/processor-architecture co-design specialist with 10 years of experience optimizing ML, HPC, and graph-analytics workloads. At Meta he led SW/HW/model co-design for the Meta Training and Inference Accelerator and was a core contributor to Llama 3's pretraining scalability and performance decisions. An active open-source performance engineer, he has made high-impact contributions including SIMD-optimized GEMM work in Facebook's FBGEMM, bfloat16 and flash-attention fixes in PyTorch, and low-level AVX-512 and Winograd tuning in libxsmm. His earlier research produced an LLVM back end for a low-power microprocessor and code that powered a top HPCG benchmark result and earned a Supercomputing best-paper award for low-communication FFTs. Based in Palo Alto with a Stanford PhD in Electrical Engineering, he blends deep systems research with production-focused engineering at scale.
Contributions: 8 reviews, 316 commits, 344 PRs in 4 years 2 months
Contributions summary: Jongsoo focused on implementing and optimizing core functionality in the FBGEMM library, specifically matrix-matrix multiplication (GEMM) operations. His work added new methods such as `equals` and `metaEquals` to the `PackBMatrix` class, significantly refactored the transpose code and optimized it with SIMD instructions, addressed rounding-consistency issues, and adapted the code to support group convolutions, reflecting a sustained focus on performance improvements and feature enhancements.
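The transpose and rounding items are easiest to see in miniature. Below is a minimal Python sketch, not FBGEMM's actual C++ API and with illustrative function names, of two of the ideas mentioned: a tiled transpose (the scalar analogue of a SIMD-tiled transpose) and requantization with round-half-to-even, the kind of deterministic rounding rule a rounding-consistency fix typically standardizes on.

```python
import numpy as np

def blocked_transpose(a: np.ndarray, block: int = 8) -> np.ndarray:
    """Transpose `a` one block x block tile at a time.

    Scalar stand-in for a SIMD-tiled transpose: each tile is small
    enough to shuffle in registers instead of striding through memory
    column by column.
    """
    m, n = a.shape
    out = np.empty((n, m), dtype=a.dtype)
    for i in range(0, m, block):
        for j in range(0, n, block):
            tile = a[i:i + block, j:j + block]
            out[j:j + block, i:i + block] = tile.T
    return out

def requantize(acc: np.ndarray, scale: float) -> np.ndarray:
    """Scale int32 GEMM accumulators back to int8.

    np.rint rounds halves to the nearest even integer, so the result
    does not depend on which code path produced the accumulator --
    illustrative of the consistency goal, not FBGEMM's exact
    requantization pipeline.
    """
    return np.clip(np.rint(acc * scale), -128, 127).astype(np.int8)

if __name__ == "__main__":
    a = np.arange(32 * 24, dtype=np.int32).reshape(32, 24)
    assert np.array_equal(blocked_transpose(a), a.T)
    print(requantize(np.array([2, 3, -5], dtype=np.int32), 0.5))
```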
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Role in this project: ML Engineer
Contributions: 11 reviews, 217 commits, 299 PRs in 4 years 8 months
Contributions summary: Jongsoo primarily focused on improving and optimizing the performance of machine learning models and related infrastructure in the PyTorch ecosystem. His contributions included fixing issues in the Inductor compiler, a component used to optimize model performance, and addressing problems in the transformer benchmark, particularly with scaled dot-product attention. He also added bfloat16 support to `erfinv` and made changes to the flash-attention implementation.
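Two of those items are visible from user-level PyTorch. The sketch below assumes a recent PyTorch build; it exercises `erfinv` on bfloat16 inputs and scaled dot-product attention on bfloat16 tensors, the user-facing surfaces behind the fixes described above. Whether the flash-attention kernel is actually used depends on having a supported GPU, so the device is chosen at runtime.

```python
import torch
import torch.nn.functional as F

# bfloat16 erfinv (support assumed present in recent PyTorch releases).
x = torch.linspace(-0.9, 0.9, steps=5).to(torch.bfloat16)
print(torch.erfinv(x))

# Scaled dot-product attention on bfloat16 inputs.
# On a supported CUDA GPU, PyTorch can dispatch this to the
# flash-attention kernel; on CPU it falls back to a math backend.
device = "cuda" if torch.cuda.is_available() else "cpu"
q, k, v = (torch.randn(2, 4, 128, 64, dtype=torch.bfloat16, device=device)
           for _ in range(3))
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 4, 128, 64])
```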
Tags: python, gpu-acceleration, deep-learning, gpu, numpy