Matei Zaharia

San Francisco, California, United States

Summary

🤩 Rockstar
Matei Zaharia is CTO and cofounder of Databricks and an Associate Professor of Computer Science at UC Berkeley, combining product leadership with active academic research. He launched Apache Spark during his PhD and has since driven widely adopted open-source projects including MLflow, Delta Lake, the Dolly open LLM, and ColBERT. His work blends deep systems and performance engineering, ranging from low-level runtime work in Weld and Spark (the Kryo serializer and append-only map optimizations) to full-stack improvements in MLflow, enabling large-scale data and AI workloads. At Berkeley he researches NLP, databases, and security, with a practical focus on building highly reliable applications with LLMs, while continuing to steward Spark as an Apache VP and Hadoop committer. Based in Berkeley, with about 15 years in the field, he bridges cutting-edge research and production-grade distributed systems.
15 years of coding experience

Github Skills (45)

unit-testing (10)
apache-spark (10)
javascript (10)
ui-d (10)
spark (10)
llvm (10)
big-data (10)
data-serialization (10)
mlflow (10)
spark-streaming (10)
ui-design (10)
java (10)
data-structure (10)
scala2 (10)
serialization (10)

Programming languages (8)

Java, C++, Rust, Scala, JavaScript, HTML, Jupyter Notebook, Python

Github contributions (5)

mesos/spark

Mar 2010 - Dec 2013

Lightning-fast cluster computing in Java, Scala and Python.
Role in this project:
Back-end Developer
Contributions: 1,591 commits in 3 years 8 months
Contributions summary: Matei contributed to the core functionality of the Spark framework, developing and implementing new features and improvements. His work includes adding new operations for RDDs, such as mapPartitions, and creating unit tests for these methods. The commits show work on serialization and deserialization within Spark to support features such as caching and data handling. He also made performance enhancements and contributed documentation.
python · cluster-computing · lightning · spark · scala
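The mapPartitions operation mentioned above applies a function once per partition's iterator rather than once per element. A minimal plain-Python sketch of that semantics (not the Spark API itself; the partition layout and function names here are illustrative assumptions):

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def map_partitions(
    partitions: List[List[T]],
    fn: Callable[[Iterator[T]], Iterable[U]],
) -> List[List[U]]:
    """Apply fn once per partition, mimicking RDD.mapPartitions.

    fn receives an iterator over one partition's elements and returns
    an iterable of results, so per-partition setup cost (e.g. opening
    a connection) is paid once per partition, not once per element.
    """
    return [list(fn(iter(part))) for part in partitions]

# Example: sum each partition in a single pass over its iterator.
data = [[1, 2, 3], [4, 5], [6]]  # three hypothetical partitions
sums = map_partitions(data, lambda it: [sum(it)])  # [[6], [9], [6]]
```

In Spark itself the equivalent call would be `rdd.mapPartitions(fn)`, with the scheduler deciding where each partition's iterator is consumed.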
weld-project/weld

Jun 2016 - Sep 2017

High-performance runtime for data analytics applications
Role in this project:
Back-end Developer & Systems Architect
Contributions: 99 commits, 50 PRs, 30 pushes in 1 year 2 months
Contributions summary: Matei primarily focused on implementing and refining the `easy-ll` library, a core component for compiling and running LLVM IR within the project. This included adding support for different data types, checking the type consistency of expressions, and building new functions. His contributions centered on low-level code improvements and involved significant changes to the codebase. He also added the new GetField expression type and documentation.
data-analytics · analytics · code-generation · data · stanford
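To illustrate the kind of type-consistency check described above, here is a toy Python sketch of type-checking a GetField-style expression over struct types. This is loosely inspired by the summary, not Weld's actual implementation; the type names and representation are invented for illustration:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass(frozen=True)
class Scalar:
    """A primitive type, e.g. Scalar("i64") or Scalar("f64")."""
    name: str

@dataclass
class Struct:
    """A struct type: field index -> field type."""
    fields: Dict[int, object] = field(default_factory=dict)

@dataclass
class GetField:
    """Hypothetical expression node: read field `index` of a value
    whose type is `target_type`."""
    target_type: object
    index: int

def type_of_getfield(expr: GetField):
    """Check that GetField is applied to a struct that actually has
    the requested field, returning the field's type; raise on any
    type inconsistency."""
    if not isinstance(expr.target_type, Struct):
        raise TypeError("GetField applied to a non-struct type")
    if expr.index not in expr.target_type.fields:
        raise TypeError(f"struct has no field {expr.index}")
    return expr.target_type.fields[expr.index]

# A struct of (i64, f64); reading field 1 yields f64.
pair = Struct({0: Scalar("i64"), 1: Scalar("f64")})
field_type = type_of_getfield(GetField(pair, 1))  # Scalar("f64")
```

A real compiler would run checks like this over the whole expression tree before lowering to LLVM IR, rejecting programs whose expression types are inconsistent.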