Shirshanka Das

Co-Founder And CTO at DataHub

San Francisco Bay Area, United States

Summary

🤩 Rockstar · 🎓 Top School
Shirshanka Das is a seasoned software leader and Co-Founder & CTO in the San Francisco Bay Area with 13 years of experience building large-scale data and infrastructure systems. He co-founded DataHub and Acryl Data and previously served as Principal Staff Software Engineer at LinkedIn, where he architected the company's GDPR strategy and helped build core systems such as Databus and Espresso. He is an active open-source committer on Apache Gobblin and a key contributor to DataHub; his GitHub work spans Kafka integration, schema registry support, multi-MCE ingestion, structured properties, and column-level lineage for metadata pipelines. Trained at IIT Delhi and UCLA, he blends academic rigor with pragmatic engineering and product sense. Raised in a sleepy town in Bihar, he brings a mathematician's precision to solving practical, production-scale data problems.
13 years of coding experience
16 years of employment as a software developer
University of California, Los Angeles

GitHub Skills (17)

avro (10)
python (10)
back-end-development (10)
data-engineering (10)
databases (10)
kafka (10)
java (10)
registry (10)
javas (10)
schema (10)
data-integration (10)
database (10)
hadoop (9)
json (9)
metadata (9)

Programming languages (5)

TypeScript · Java · Mustache · HTML · Python

GitHub contributions (5)

datahub-project/datahub

Sep 2020 - Jan 2023

The Metadata Platform for your Data and AI Stack
Role in this project: Back-end Developer
Contributions: 32 releases, 2,497 reviews, 940 commits in 2 years 4 months
Contributions summary: Shirshanka focused primarily on back-end development, enhancing the metadata ingestion pipeline. He added support for processing multiple MCEs in a single file, fixed unit tests, implemented support for structured properties, improved database connection handling in the project's backend, and added column-level lineage. A minimal sketch of multi-MCE file handling follows this entry.
data-management · data-discovery · data-stack · modern-data-stack · data-catalog
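To give a concrete sense of what "processing multiple MCEs in a single file" involves, here is a minimal Python sketch. The file layout, field names, and the `ingest_mce_file` helper are illustrative assumptions made for this profile, not DataHub's actual MCE wire format or ingestion API.

```python
import json
from pathlib import Path

def ingest_mce_file(path: str) -> int:
    """Read a JSON file that may contain either a single metadata change
    event (a dict) or a list of them, and handle each event in turn.
    Illustrative only: field names are assumptions, not DataHub's format."""
    raw = json.loads(Path(path).read_text())
    events = raw if isinstance(raw, list) else [raw]  # accept one MCE or many
    for event in events:
        entity = event.get("entityUrn", "<unknown entity>")
        aspects = event.get("aspects", [])
        # A real pipeline would validate each aspect and emit it to the
        # metadata service; here we only report what was found.
        print(f"would ingest {len(aspects)} aspect(s) for {entity}")
    return len(events)

if __name__ == "__main__":
    sample = [
        {"entityUrn": "urn:li:dataset:(urn:li:dataPlatform:kafka,PageViewEvent,PROD)",
         "aspects": [{"datasetProperties": {"description": "page views"}}]},
        {"entityUrn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,/data/tracking,PROD)",
         "aspects": [{"ownership": {"owners": []}}]},
    ]
    Path("mces.json").write_text(json.dumps(sample))
    ingest_mce_file("mces.json")
```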
apache/gobblin

Feb 2015 - Feb 2021

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
Role in this project: Back-end Developer
Contributions: 22 reviews, 59 commits, 90 PRs in 6 years
Contributions summary: Shirshanka focused on enhancing the Gobblin data integration framework by implementing new features and improving existing ones. His work includes adding Kafka writer support, which involved creating and modifying Java files to integrate with Kafka schema registries such as the LiKafkaSchemaRegistry, as well as refactoring the Hadoop file system helper classes and adding a simple console writer. A sketch of the Kafka-writer pattern follows this entry.
data · dcos · data-stream · big-data-integration · batch-data
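As a rough illustration of the Kafka-writer-plus-schema-registry pattern described above, the Python sketch below registers a record's schema before producing the record and prefixes the payload with the schema ID so a consumer can resolve the schema later. The registry and producer classes are hypothetical stand-ins; Gobblin's real writers are Java classes built against Kafka and the LiKafkaSchemaRegistry.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical stand-ins for a schema registry client and a Kafka producer;
# they only illustrate the pattern, not Gobblin's actual classes.
@dataclass
class InMemorySchemaRegistry:
    schemas: Dict[int, str] = field(default_factory=dict)

    def register(self, schema_json: str) -> int:
        """Return an existing ID for this schema, or assign a new one."""
        for schema_id, known in self.schemas.items():
            if known == schema_json:
                return schema_id
        schema_id = len(self.schemas) + 1
        self.schemas[schema_id] = schema_json
        return schema_id

@dataclass
class FakeProducer:
    sent: List[Tuple[str, bytes]] = field(default_factory=list)

    def send(self, topic: str, value: bytes) -> None:
        self.sent.append((topic, value))

def write_record(registry, producer, topic: str, schema_json: str, record: dict) -> None:
    """Register the schema, then prefix the payload with the schema ID so a
    consumer can look the schema up before deserializing."""
    schema_id = registry.register(schema_json)
    payload = schema_id.to_bytes(4, "big") + json.dumps(record).encode()
    producer.send(topic, payload)

if __name__ == "__main__":
    schema = json.dumps({"type": "record", "name": "PageView",
                         "fields": [{"name": "url", "type": "string"}]})
    registry, producer = InMemorySchemaRegistry(), FakeProducer()
    write_record(registry, producer, "page-views", schema, {"url": "/home"})
    print(f"produced {len(producer.sent)} record(s); "
          f"registry holds {len(registry.schemas)} schema(s)")
```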