Houjun Liu

Undergraduate Research Associate, NLP Group

San Francisco Bay Area, United States

Summary

🤩 Rockstar
🎓 Top School
Houjun Liu is an NLP-focused research engineer and founder in the San Francisco Bay Area with eight years of experience building production research software, open-source tools, and interactive products. As an undergraduate research associate in Stanford’s NLP group (working with Prof. Chris Manning) and an active contributor to the widely used stanza library, he implemented a tokenization postprocessor and refactored coreference code to support data augmentations. At CMU’s TalkBank he developed Batchalign, a corpus-alignment tool adopted by institutions worldwide for aphasia and Alzheimer’s detection research, and he leads product and engineering at Shabang and Condution, an open-source task manager that reached 7,000 users in five months. His portfolio spans embedded ML for autonomous drones and fire response, guidance systems for student space projects, and end-to-end MLOps to bring models into production. Interested in mechanistic interpretability, AGI, and Emacs, he blends academic rigor with startup execution to teach robots “how to robot” while optimizing human-facing experiences.
9 years of coding experience
5 years of employment as a software developer
The Nueva School
Participant, LaunchX Entrepreneurship Program
University of California, Santa Cruz
Bachelor of Science (BS), Computer Science (AI), Stanford University
Languages: English, Chinese, Spanish

Stack Overflow

Stats
286 reputation
35k reached
1 answer
3 questions

Github Skills (13)

tokenize: 10
pytorch: 10
tokenizer: 10
nlp: 10
python: 10
natural-language-processing: 10
machine-learning: 9
deep-learning: 8
artificial-intelligence: 8
deeplearning-ai: 8
webassembly: 6
wpf: 6
geometry: 6

Programming languages (15)

Java, C++, C, Vue, Go, HTML, Svelte, TypeScript

Github contributions (5)

stanfordnlp/stanza

Oct 2023 - Apr 2025

Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
Role in this project: Back-end Developer
Contributions: 90 reviews, 25 PRs, 241 pushes in 1 year 6 months
Contributions summary: Houjun implemented a tokenization postprocessor that extends the existing tokenization process in the Stanford NLP Python library. The work modified `stanza/models/tokenization/utils.py` to include the postprocessor, allowing manual token cleanup and customization, and added tests to verify the postprocessor and reassembly functions. Further contributions include refactoring the existing coreference codebase to handle data augmentations.
nlp, pytorch, human-languages, named-entity-recognition, python
Jemoka/.emacs.d

Aug 2021 - Dec 2022

Contributions: 101 commits, 132 pushes, 1 branch in 1 year 4 months