Matei Zaharia is CTO and cofounder of Databricks and an Associate Professor of Computer Science at Berkeley, combining product leadership with active academic research. He launched Apache Spark during his PhD and has since driven widely adopted open-source projects including MLflow, Delta Lake, the Dolly open LLM, and ColBERT. His work combines systems and performance engineering, ranging from low-level runtime work in Weld and Spark (Kryo serializer and append-only map optimizations) to full-stack improvements in MLflow, in support of large-scale data and AI workloads. At Berkeley he researches NLP, databases, and security, with a practical focus on building highly reliable applications with LLMs, while continuing to steward Spark as an Apache VP and Hadoop committer. Based in Berkeley with about 15 years in the field, he bridges cutting-edge research and production-grade distributed systems.
Lightning-fast cluster computing in Java, Scala and Python.
Role in this project:
Back-end Developer
Contributions: 1591 commits in 3 years 8 months
Contributions summary: Matei contributed to the core functionality of the Spark framework, developing and implementing new features and improvements. His work includes adding new RDD operations, such as mapPartitions, and writing unit tests for them. The commits also cover serialization and deserialization within Spark to support features like caching and data handling, along with performance enhancements and added documentation.
High-performance runtime for data analytics applications
Role in this project:
Back-end Developer & Systems Architect
Contributions: 99 commits, 50 PRs, 30 pushes in 1 year 2 months
Contributions summary: Matei primarily focused on implementing and refining the `easy-ll` library, a core component for compiling and running LLVM IR within the project. This included adding support for different data types, checking the type consistency of expressions, and building a new function. His contributions were mainly low-level code improvements and involved significant changes to the codebase. He also added the new expression type `GetField` and wrote documentation.