
Building a searchable developer graph

Prog.AI is creating one of the most comprehensive datasets of software developers in the world, not by scraping résumés or collecting self-reported LinkedIn profiles, but by analyzing real open source contributions at scale.

The system currently maps over 60 million GitHub developer profiles to 50,000 skills, the companies they work at, their personal and professional contacts, and their other online handles. Our team has built a proprietary taxonomy of thousands of technology tags, creating a precise and searchable index of tools, languages, and frameworks.

From there, Prog.AI processes over one billion contribution events from GitHub, including commits, pull requests, and issues, to assess each person's role and level of involvement in a project, as well as the importance and quality of their contributions. Each event is classified by our in-house machine learning models to determine the contributor's role: backend, frontend, machine learning, documentation, devops, code review, and more.

This is not just a record of what people worked on. It is an analysis of the depth of their contribution.

The system examines commit history and metadata to infer expertise areas and contribution types. Profiles are then enriched, when available, with data from sources like Stack Overflow and Kaggle, but the primary signal remains technical contribution.
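To make the idea of role classification concrete, here is a minimal Python sketch. It is not Prog.AI's pipeline: the production system uses machine learning, while this toy stands in a simple file-extension heuristic. The `Event` type, the `ROLE_HINTS` table, and both function names are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical file-suffix hints per role. An assumption for illustration,
# not Prog.AI's taxonomy or model.
ROLE_HINTS = {
    "frontend": (".tsx", ".jsx", ".css", ".html"),
    "backend": (".go", ".java", ".rs"),
    "documentation": (".md", ".rst"),
    "devops": ("Dockerfile", ".tf"),
}


@dataclass(frozen=True)
class Event:
    developer: str
    kind: str      # e.g. "commit", "pull_request", "issue"
    paths: tuple   # file paths touched by the event


def classify(event):
    """Return the role matching the most touched paths, or 'other' if none match."""
    votes = Counter()
    for path in event.paths:
        for role, suffixes in ROLE_HINTS.items():
            if path.endswith(suffixes):
                votes[role] += 1
    return votes.most_common(1)[0][0] if votes else "other"


def role_profile(events):
    """Count classified events per developer and role."""
    profile = defaultdict(Counter)
    for event in events:
        profile[event.developer][classify(event)] += 1
    return profile


events = [
    Event("alice", "commit", ("src/App.tsx", "src/app.css")),
    Event("alice", "pull_request", ("README.md",)),
    Event("bob", "commit", ("main.go", "server.go", "README.md")),
]
profile = role_profile(events)
```

Aggregating per-event labels into a per-developer count is what turns a log of activity into a rough picture of where someone's contributions concentrate. A real system would replace the suffix table with a trained classifier and weight events by their significance.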

The result is a searchable graph of developer activity and skill, built directly from the source of truth: the codebase itself.

→ For hiring, it offers verified insight into real-world experience. 

→ For research, it reveals the internal dynamics of projects and surfaces core contributors.

→ For engineering teams, it becomes a way to find collaborators who have already solved the problems they are facing.

Prog.AI is mapping the global software workforce in a way that finally reflects what developers actually do.

author
Val Demar