Matei Zaharia

San Francisco, California, United States

Summary

🤩 Rockstar
Matei Zaharia is CTO and cofounder of Databricks and an Associate Professor of Computer Science at UC Berkeley, combining product leadership with active academic research. He launched Apache Spark during his PhD and has since driven widely adopted open-source projects including MLflow, Delta Lake, the Dolly open LLM, and ColBERT. His work blends deep systems and performance engineering, ranging from low-level runtime work in Weld and Spark (the Kryo serializer and append-only map optimizations) to full-stack improvements in MLflow, enabling large-scale data and AI workloads. At Berkeley he researches NLP, databases, and security, with a practical focus on building highly reliable applications with LLMs, while continuing to steward Spark as an Apache VP and Hadoop committer. Based in Berkeley, with about 15 years in the field, he bridges cutting-edge research and production-grade distributed systems.
15 years of coding experience

Github Skills (45)

unit-testing (10)
apache-spark (10)
javascript (10)
ui-d (10)
spark (10)
llvm (10)
big-data (10)
data-serialization (10)
mlflow (10)
spark-streaming (10)
ui-design (10)
java (10)
data-structure (10)
scala2 (10)
serialization (10)

Programming languages (8)

Java, C++, Rust, Scala, JavaScript, HTML, Jupyter Notebook, Python

Github contributions (5)

mesos/spark

Mar 2010 - Dec 2013

Lightning-fast cluster computing in Java, Scala and Python.
Role in this project:
Back-end Developer
Contributions: 1,591 commits in 3 years 8 months
Contributions summary: Matei contributed to the core functionality of the Spark framework, developing and implementing new features and improvements. His work includes adding new operations for RDDs, such as mapPartitions, and creating unit tests for these methods. The commits show work on serialization and deserialization within Spark to support features such as caching and data handling. He also made performance enhancements and contributed documentation.
python · cluster-computing · lightning · spark · scala
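The mapPartitions operation mentioned above applies a function once per partition's iterator rather than once per element. A minimal plain-Python sketch of that semantics (not the Spark API itself; the partition layout and function names here are illustrative assumptions):

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def map_partitions(
    partitions: List[List[T]],
    fn: Callable[[Iterator[T]], Iterable[U]],
) -> List[List[U]]:
    """Apply fn once per partition, mimicking RDD.mapPartitions.

    fn receives an iterator over one partition's elements and returns
    an iterable of results, so per-partition setup cost (e.g. opening
    a connection) is paid once per partition, not once per element.
    """
    return [list(fn(iter(part))) for part in partitions]

# Example: sum each partition in a single pass over its iterator.
data = [[1, 2, 3], [4, 5], [6]]  # three hypothetical partitions
sums = map_partitions(data, lambda it: [sum(it)])  # [[6], [9], [6]]
```

In Spark itself the equivalent call would be `rdd.mapPartitions(fn)`, with the scheduler deciding where each partition's iterator is consumed.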
weld-project/weld

Jun 2016 - Sep 2017

High-performance runtime for data analytics applications
Role in this project:
Back-end Developer & Systems Architect
Contributions: 99 commits, 50 PRs, 30 pushes in 1 year 2 months
Contributions summary: Matei primarily focused on implementing and refining the `easy-ll` library, a core component for compiling and running LLVM IR within the project. This included adding support for different data types, checking the type consistency of expressions, and building new functions. His contributions centered on low-level code improvements and involved significant changes to the codebase. He also added the new GetField expression type and documentation.
data-analytics · analytics · code-generation · data · stanford
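To illustrate the kind of type-consistency check described above, here is a toy Python sketch of type-checking a GetField-style expression over struct types. This is loosely inspired by the summary, not Weld's actual implementation; the type names and representation are invented for illustration:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass(frozen=True)
class Scalar:
    """A primitive type, e.g. Scalar("i64") or Scalar("f64")."""
    name: str

@dataclass
class Struct:
    """A struct type: field index -> field type."""
    fields: Dict[int, object] = field(default_factory=dict)

@dataclass
class GetField:
    """Hypothetical expression node: read field `index` of a value
    whose type is `target_type`."""
    target_type: object
    index: int

def type_of_getfield(expr: GetField):
    """Check that GetField is applied to a struct that actually has
    the requested field, returning the field's type; raise on any
    type inconsistency."""
    if not isinstance(expr.target_type, Struct):
        raise TypeError("GetField applied to a non-struct type")
    if expr.index not in expr.target_type.fields:
        raise TypeError(f"struct has no field {expr.index}")
    return expr.target_type.fields[expr.index]

# A struct of (i64, f64); reading field 1 yields f64.
pair = Struct({0: Scalar("i64"), 1: Scalar("f64")})
field_type = type_of_getfield(GetField(pair, 1))  # Scalar("f64")
```

A real compiler would run checks like this over the whole expression tree before lowering to LLVM IR, rejecting programs whose expression types are inconsistent.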