Markus Konrad is a Berlin-based software engineer with 13 years of experience combining data analysis and backend development in Python and R. He specializes in building tooling for messy, real-world data — notably contributing to pdftabextract by implementing rotated-page detection and robust text-box handling to extract tables from OCR-processed PDFs. Markus bridges analytics and engineering, turning data-mining challenges into dependable backend systems and reusable libraries. His work shows a particular talent for solving the edge cases in document parsing that commonly break automated pipelines.
pdftabextract: a set of tools for extracting tables from PDF files, intended to support data mining on (OCR-processed) scanned documents.
Role in this project:
Back-end Developer
Contributions: 169 commits, 2 PRs, 24 pushes in 6 years 1 month
Contributions summary: Markus appears to be focused on developing and refining functionality for extracting table data from PDF files, as indicated by the edits to the `ipynotes.py` and `pdftabextract.py` files. These changes add functions for handling text elements, particularly on rotated pages, and begin an implementation within `pdftabextract.py`. They cover core functionality such as page rotation angle detection and text box position updates.
Contributions: 9 commits, 2 pushes, 1 branch in 6 years