Shirshanka Das

Co-Founder And CTO at DataHub

San Francisco Bay Area, United States

Summary

🤩 Rockstar · 🎓 Top School
Shirshanka Das is a seasoned software leader and Co-Founder & CTO in the San Francisco Bay Area with 13 years of experience building large-scale data and infrastructure systems. He co-founded DataHub and Acryl Data and previously served as Principal Staff Software Engineer at LinkedIn, where he architected the company's GDPR strategy and helped build core systems such as Databus and Espresso. He is an active open-source committer on Apache Gobblin and a key contributor to DataHub; his GitHub work spans Kafka integration, schema registry support, multi-MCE ingestion, structured properties, and column-level lineage for metadata pipelines. Trained at IIT Delhi and UCLA, he blends academic rigor with pragmatic engineering and product sense. Raised in a sleepy town in Bihar, he brings a mathematician's precision to solving practical, production-scale data problems.
13 years of coding experience
16 years of employment as a software developer
University of California, Los Angeles

GitHub Skills (17)

avro (10)
python (10)
back-end-development (10)
data-engineering (10)
databases (10)
kafka (10)
java (10)
registry (10)
javas (10)
schema (10)
data-integration (10)
database (10)
hadoop (9)
json (9)
metadata (9)

Programming languages (5)

TypeScript · Java · Mustache · HTML · Python

GitHub contributions (5)

datahub-project/datahub

Sep 2020 - Jan 2023

The Metadata Platform for your Data and AI Stack
Role in this project: Back-end Developer
Contributions: 32 releases, 2,497 reviews, 940 commits in 2 years 4 months
Contributions summary: Shirshanka focused primarily on back-end development, enhancing the metadata ingestion pipeline. He added support for processing multiple MCEs in a single file, fixed unit tests, implemented support for structured properties, improved database connection handling in the project's backend, and added column-level lineage. A minimal sketch of multi-MCE file handling follows this entry.
data-management · data-discovery · data-stack · modern-data-stack · data-catalog
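To give a concrete sense of what "processing multiple MCEs in a single file" involves, here is a minimal Python sketch. The file layout, field names, and the `ingest_mce_file` helper are illustrative assumptions made for this profile, not DataHub's actual MCE wire format or ingestion API.

```python
import json
from pathlib import Path

def ingest_mce_file(path: str) -> int:
    """Read a JSON file that may contain either a single metadata change
    event (a dict) or a list of them, and handle each event in turn.
    Illustrative only: field names are assumptions, not DataHub's format."""
    raw = json.loads(Path(path).read_text())
    events = raw if isinstance(raw, list) else [raw]  # accept one MCE or many
    for event in events:
        entity = event.get("entityUrn", "<unknown entity>")
        aspects = event.get("aspects", [])
        # A real pipeline would validate each aspect and emit it to the
        # metadata service; here we only report what was found.
        print(f"would ingest {len(aspects)} aspect(s) for {entity}")
    return len(events)

if __name__ == "__main__":
    sample = [
        {"entityUrn": "urn:li:dataset:(urn:li:dataPlatform:kafka,PageViewEvent,PROD)",
         "aspects": [{"datasetProperties": {"description": "page views"}}]},
        {"entityUrn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,/data/tracking,PROD)",
         "aspects": [{"ownership": {"owners": []}}]},
    ]
    Path("mces.json").write_text(json.dumps(sample))
    ingest_mce_file("mces.json")
```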
apache/gobblin

Feb 2015 - Feb 2021

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
Role in this project: Back-end Developer
Contributions: 22 reviews, 59 commits, 90 PRs in 6 years
Contributions summary: Shirshanka focused on enhancing the Gobblin data integration framework by implementing new features and improving existing ones. His work includes adding Kafka writer support, which involved creating and modifying Java files to integrate with Kafka schema registries such as the LiKafkaSchemaRegistry, as well as refactoring the Hadoop file system helper classes and adding a simple console writer. A sketch of the Kafka-writer pattern follows this entry.
data · dcos · data-stream · big-data-integration · batch-data
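As a rough illustration of the Kafka-writer-plus-schema-registry pattern described above, the Python sketch below registers a record's schema before producing the record and prefixes the payload with the schema ID so a consumer can resolve the schema later. The registry and producer classes are hypothetical stand-ins; Gobblin's real writers are Java classes built against Kafka and the LiKafkaSchemaRegistry.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical stand-ins for a schema registry client and a Kafka producer;
# they only illustrate the pattern, not Gobblin's actual classes.
@dataclass
class InMemorySchemaRegistry:
    schemas: Dict[int, str] = field(default_factory=dict)

    def register(self, schema_json: str) -> int:
        """Return an existing ID for this schema, or assign a new one."""
        for schema_id, known in self.schemas.items():
            if known == schema_json:
                return schema_id
        schema_id = len(self.schemas) + 1
        self.schemas[schema_id] = schema_json
        return schema_id

@dataclass
class FakeProducer:
    sent: List[Tuple[str, bytes]] = field(default_factory=list)

    def send(self, topic: str, value: bytes) -> None:
        self.sent.append((topic, value))

def write_record(registry, producer, topic: str, schema_json: str, record: dict) -> None:
    """Register the schema, then prefix the payload with the schema ID so a
    consumer can look the schema up before deserializing."""
    schema_id = registry.register(schema_json)
    payload = schema_id.to_bytes(4, "big") + json.dumps(record).encode()
    producer.send(topic, payload)

if __name__ == "__main__":
    schema = json.dumps({"type": "record", "name": "PageView",
                         "fields": [{"name": "url", "type": "string"}]})
    registry, producer = InMemorySchemaRegistry(), FakeProducer()
    write_record(registry, producer, "page-views", schema, {"url": "/home"})
    print(f"produced {len(producer.sent)} record(s); "
          f"registry holds {len(registry.schemas)} schema(s)")
```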