Boris Fomitchev - Sr. Software Engineer in Deep Learning Frameworks at NVIDIA

Gig Harbor, Washington, United States

Summary

🤩

Rockstar

🎓

Top School

Boris Fomitchev is a Sr. Software Engineer in Deep Learning Frameworks with 14 years of experience building high-performance C++ and GPU-accelerated systems. At NVIDIA he led framework-level engineering for Torch, Caffe and Theano, helped port Lua Torch to Android and produced a Real Time HD Video Style Transfer demo showcased at GTC 2017. He pairs deep CUDA/cuDNN and mixed-precision expertise with tooling and build-system work: his contributions to torch7, cuDNN bindings, cutorch, CMake FindCUDA, TensorRT and NeMo include deployment-focused fixes such as ONNX/TorchScript export and FP16 support. Earlier he developed core image-processing technology at Adobe (including the Oil Paint Photoshop filter) and, as a consultant, architected scalable C++ REST backends for eBay and PayPal. An active open-source maintainer, he has worked on STLport and submitted an ANSI/ISO proposal to add a short-float fundamental type to C++. Based in Gig Harbor, WA, he works across heterogeneous CPU/GPU/NPU platforms to ship stable, high-performance ML tooling.

15 years of coding experience

16 years of employment as a software developer

Master of Science (MS), Computer Science, A- at Moscow Institute of Physics and Technology (State University) (MIPT)

Russian, English

Github Skills (52)

torchscript: 10

transformers: 10

algorithm: 10

optimizations: 10

code-optimization: 10

algorithms: 10

pytorch: 10

c-language: 10

convolutional-neural-networks: 10

python: 10

operation: 10

large-language-model: 10

tensorrt: 10

gpu-programming: 10

machine-learning: 10

Programming languages (8)

C++, C, CMake, Lua, HTML, Jupyter Notebook, CUDA, Python

Github contributions (5)

soumith/cudnn.torch

Oct 2015 - Mar 2017

Torch-7 FFI bindings for NVIDIA CuDNN

Role in this project:

Back-end Developer

Contributions: 112 commits, 52 PRs, 15 pushes in 1 year 6 months

Contributions summary: Boris contributed significantly to the Torch-7 FFI bindings for NVIDIA CuDNN, focusing on adding and updating features. His primary contribution was implementing compatibility patches for `cudnn_v4`. These included adding support for `cudnn4 Batch Normaliztion` and converting the existing `addTensor_v2` function to the latest API. He also updated CuDNN API functions such as `cudnnConvolution` and `cudnnActivation`, work that required detailed knowledge of low-level numerical libraries.

cuda, nvidia, gpu, torch, cudnn

NVIDIA/NeMo

Nov 2019 - Jan 2023

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Role in this project:

Back-end Developer & ML Engineer

Contributions: 246 reviews, 81 commits, 160 PRs in 3 years 2 months

Contributions summary: Boris's commits focus on exporting NeMo models to ONNX and TorchScript for deployment. He modified existing code to enable ONNX export for several models, particularly ASR, TTS and NLP models, and adjusted code to resolve inconsistencies with tools such as TensorRT. He also made changes related to mixed-precision training, reflecting attention to performance and deployment concerns in model serving.

asr, speech-recognition, natural-language-processing, tts, speaker-diarization
