Sr. Software Engineer In Deep Learning Frameworks at NVIDIA
Gig Harbor, Washington, United States
Summary
🤩 Rockstar, 🎓 Top School
Boris Fomitchev is a Sr. Software Engineer in Deep Learning Frameworks with 14 years of experience building high-performance C++ and GPU-accelerated systems. At NVIDIA he led framework-level engineering for Torch, Caffe and Theano, helped port Lua Torch to Android, and produced the Real Time HD Video Style Transfer demo showcased at GTC 2017. He pairs deep CUDA/cuDNN and mixed-precision expertise with tooling and build-system work. His contributions to torch7, the cuDNN bindings, cutorch, CMake's FindCUDA module, TensorRT and NeMo include deployment-focused fixes such as ONNX/TorchScript export and FP16 support. Earlier he developed core image-processing technology at Adobe (including the Oil Paint filter in Photoshop) and, as a consultant, architected scalable C++ REST back ends for eBay and PayPal. An active open-source maintainer, he has worked on STLport and submitted an ANSI/ISO proposal to add a short-float fundamental type to C++, reflecting standards-level thinking alongside production pragmatism. Based in Gig Harbor, WA, he brings systems-level fluency across heterogeneous CPU/GPU/NPU platforms to shipping stable, high-performance ML tooling.
15 years of coding experience
16 years of employment as a software developer
Master of Science (MS), Computer Science, A- at Moscow Institute of Physics and Technology (State University) (MIPT)
Contributions: 112 commits, 52 PRs, 15 pushes in 1 year 6 months
Contributions summary: Boris contributed significantly to the Torch7 FFI bindings for NVIDIA cuDNN, focusing on adding and updating features. His primary contribution was implementing compatibility patches for `cudnn_v4`, including support for cuDNN v4 batch normalization and converting the existing `addTensor_v2` function to the latest API. He also updated the bindings for various cuDNN API functions such as `cudnnConvolution` and `cudnnActivation`, work that requires deep familiarity with low-level numerical libraries.
NeMo: a scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Role in this project:
Back-end Developer & ML Engineer
Contributions: 246 reviews, 81 commits, 160 PRs in 3 years 2 months
Contributions summary: Boris's commits focus on exporting NeMo models to ONNX and TorchScript and integrating them with deployment tooling. He modified existing code to enable ONNX export for several models, particularly ASR, TTS and NLP models, and adjusted code to resolve inconsistencies with tools such as TensorRT. He also made changes related to mixed-precision training, reflecting attention to performance optimization and deployment concerns in model serving.