Principal Engineer at The Apache Software Foundation
Western Cape, South Africa
Summary
👤
Senior
🎓
Top School
Nick Pentreath is a principal engineer and AI/ML leader with 15 years of experience building scalable, data-driven systems across social, ecommerce, advertising, and finance domains. At Tumblr, he led ML development for core feeds, ranking and personalization, and helped architect embedding-based recommendations from the ground up. He is an active open-source contributor, Apache Spark PMC member and committer, and author of Machine Learning with Spark, with notable work on streaming analytics and scalable ML primitives. Nick co-founded GraphFlow and previously drove open-source ML initiatives at IBM CODAIT, bridging research rigor with production-grade systems. He is currently a Principal Engineer at Rumi.ai, shaping the intelligence layer for enterprise meetings and communication data. Based in the Western Cape, South Africa, he blends commercial focus with AI to deliver practical, data-driven business value across diverse industries.
13 years of coding experience
14 years of employment as a software developer
BSc, Quantitative Finance at University of Cape Town
Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch
Role in this project:
Full-stack Developer
Contributions: 90 commits, 19 PRs, 69 pushes in 3 years 1 month
Contributions summary: Nick updated Jupyter Notebooks to align with specific versions of Apache Spark and Elasticsearch. This involved modifying code within the notebooks to reflect changes in library versions and ensure compatibility. The changes focused on the core components of a recommender system, including data loading, ALS model training, and writing model factors to Elasticsearch.
Apache Spark - A unified analytics engine for large-scale data processing
Role in this project:
Back-end Developer & Data Scientist
Contributions: 37 PRs, 1796 comments in 5 years 9 months
Contributions summary: Nick's contributions primarily involve modifying the `ALS.scala` file, a core component for Alternating Least Squares matrix factorization, a technique often used for recommendation systems. The changes include enhancements for implicit feedback models and schema validation. Nick also added or modified parameters for storage levels and cold start strategies in the ALS algorithm, which control how the system is configured for data processing and model training, and updated the ALS examples for Java and Python.
analytics, python, data-processing, sql, apache