Eugene Zhulenev

Software Engineer at Google

San Francisco, California, United States

Summary

🤩 Rockstar
Eugene Zhulenev is a Staff Software Engineer at Google Brain in San Francisco with 13 years of experience building high-performance infrastructure for data analysis and machine learning, focused on low-level runtime performance for CPUs and GPUs. He brings deep expertise in C++ performance tuning, multi-threaded systems, ML compilers and runtimes, cloud-native distributed architectures, and functional Scala for Big Data. A core OpenXLA contributor and former Eigen maintainer, he delivered tensor contraction and kernel optimizations that produced 2–3x speedups in TensorFlow and saved millions in hardware costs. He pairs production-grade engineering with hands-on compiler work, from small-shape copy optimizations in XLA/CPU to CUDA buffer integration in IREE and persistent compilation caching in JAX, bridging research and deployable runtime efficiency.
14 years of coding experience
Stack Overflow

Stats
9,734 reputation
467k reached
182 answers
14 questions
Badges
sbt (top 5%)
scala (top 1%)
rdd (top 5%)
intellij-idea (top 5%)
apache-spark (top 1%)
serialization (top 5%)

GitHub Skills (57)

apache-spark (10)
c-language (10)
compilation (10)
python (10)
algebra (10)
operation (10)
llvm (10)
tensorrt (10)
cpp-11 (10)
machine-learning (10)
c11 (10)
python-templates (10)
mlr (10)
scala (10)
c17 (10)

Programming languages (9)

Java, C++, Shell, Starlark, LLVM, Scala, MLIR, Jupyter Notebook

GitHub contributions (5)

tensorflow/runtime

Apr 2020 - Dec 2022

A performant and modular runtime for TensorFlow
Role in this project:
Back-end Developer
Contributions: 468 commits in 2 years 8 months
Contributions summary: Eugene primarily contributed to the core runtime and compilation aspects of the TensorFlow project, focusing on JitRt (the Just-In-Time Runtime) and its associated kernel implementations. The commits show work on memory management for tensor operations and on integrating custom-call functionality for improved performance. He also implemented several new native operations and improved existing kernels that are core to the runtime.
runtime, performant, modular, tensorflow
openxla/xla

Jan 2019 - Jan 2023

A machine learning compiler for GPUs, CPUs, and ML accelerators
Role in this project:
Back-end Developer
Contributions: 462 reviews, 378 commits, 3 PRs in 4 years 1 month
Contributions summary: Eugene contributed to the XLA compiler project, focusing on the runtime and core components. His work includes enabling support for custom contraction kernels in XLA's single-threaded matrix multiplication, implementing batch normalization through the cuDNN BatchNormEx API, and fixing ODR violations in Eigen contraction kernels. He also added support for executing XLA:GPU on top of JitRt and improved the XLA runtime library.
compiler, community-driven, machine-learning, modular