Niklas Muennighoff is an AI researcher in Palo Alto with five years of experience building evaluation infrastructure, data pipelines, and efficient inference tooling for language and embedding models. He has held a research-engineer role at Hugging Face and currently works at AI2 and Contextual AI, while contributing to flagship open-source projects such as EleutherAI's lm-evaluation-harness, where he added ethics-focused evaluation tasks, and the Massive Text Embedding Benchmark (MTEB). His work spans dataset engineering, metric design, and bug fixes (e.g., main-score handling and BLEU integration), as well as runtime optimizations such as vLLM-based inference and test-time scaling that make model evaluation more reliable and multilingual. A Peking University graduate now starting a CS PhD at Stanford, he also brings the uncommon background of years as a Disney voice-over artist, which shapes his focus on human-centered prompts and clear evaluation.
Contributions: 14 reviews, 10 PRs, 44 pushes in 1 month
Contributions summary: Niklas's commits primarily modify data loading, preprocessing, and model-training scripts, specifically `data/collect_data.py` and `train/sft.py`. These changes update dataset paths and loading mechanisms for open-source math datasets, indicating a focus on preparing data for model training. Further changes to training configurations suggest involvement in model experimentation and refinement, and the vLLM integration in `eval/generate.py` points to a focus on efficient inference.
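A data-collection script like `collect_data.py` typically maps raw dataset rows into prompt/completion pairs for supervised fine-tuning. A minimal sketch of that preprocessing step, assuming a hypothetical schema with `problem` and `solution` fields (the actual field names in the repository are not shown in the source):

```python
def to_sft_pairs(examples):
    """Convert raw math-dataset rows into SFT prompt/completion pairs.

    Assumes each row has 'problem' and 'solution' fields (hypothetical
    schema, not the actual one); rows missing either field are skipped.
    """
    pairs = []
    for ex in examples:
        problem = ex.get("problem")
        solution = ex.get("solution")
        if not problem or not solution:
            continue  # skip incomplete rows rather than training on them
        pairs.append({
            "prompt": f"Solve the following problem:\n{problem}\n",
            "completion": solution,
        })
    return pairs
```

Filtering incomplete rows at collection time keeps malformed examples out of the training loop entirely, which is simpler than guarding against them inside `train/sft.py`.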
Contributions: 7 releases, 255 reviews, 153 commits in 6 months
Contributions summary: Niklas primarily focused on fixing main-score calculation issues in a multilingual text embedding benchmark. He addressed warnings and adjusted `AbsTaskClassification.py` to ensure correct handling of main scores, and updated task configurations across multiple files to set main scores correctly and fix task splits for accurate evaluation. He also refactored the summarization evaluator and adjusted the code to skip samples with no variance.
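Skipping zero-variance samples matters because correlation-based metrics such as Spearman are undefined when every score in a sample is identical. A minimal sketch of that guard, with a hypothetical function and field name rather than the actual MTEB code:

```python
import statistics

def filter_zero_variance(samples, key="human_scores"):
    """Drop samples whose scores have no variance.

    Rank correlations (e.g., Spearman) are undefined when all scores
    in a sample are identical, so such samples are skipped before the
    evaluator computes per-sample correlations.
    """
    kept = []
    for sample in samples:
        scores = sample[key]
        if len(scores) < 2 or statistics.variance(scores) == 0:
            continue  # constant scores: correlation is undefined, skip
        kept.append(sample)
    return kept
```

Filtering before aggregation means the benchmark's main score is averaged only over samples where the metric is well-defined, instead of propagating NaNs or warnings into the final result.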