TransPerfect Translations International Inc is seeking a passionate Python Backend Developer to join its AI team in Madrid, Spain. The role focuses on developing document-processing solutions, particularly converting unstructured PDFs into editable formats using AI technologies.
Job Summary
Join an innovative Artificial Intelligence (AI) team shaping the future of AI in a global organization.
Lead the research and implementation of a document conversion pipeline to convert complex PDFs into editable .docx files.
This hybrid role requires you to be both a strategic decision-maker and a hands-on developer who combines engineering and AI skills.
Matching Summary
Match Score: 85
Skills & Requirements
Must-have
Python mastery with OpenCV, PyMuPDF, python-docx
Familiarity with Tesseract, PaddleOCR, LayoutLMv3, Donut, Nougat
Understanding of OOXML document formats
Experience with GPT or Claude models
Ability to choose between APIs and custom pipelines
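To make the OOXML requirement concrete: a .docx file is a ZIP package of XML parts. The sketch below is a minimal illustration that uses only the Python standard library (not python-docx) and assumes plain, unstyled paragraphs. It writes the three parts a bare WordprocessingML package needs ([Content_Types].xml, _rels/.rels and word/document.xml) and reads the paragraph text back. The helper names write_minimal_docx and read_docx_paragraphs are hypothetical. A real converter would normally let python-docx manage styles, numbering and relationships.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

# Package-level relationship pointing at the main document part.
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def write_minimal_docx(paragraphs, target):
    """Write each string as one w:p paragraph (single run) into a bare .docx package."""
    body = "".join(
        # escape() keeps &, < and > in the text from breaking the XML.
        f'<w:p><w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'
        for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", ROOT_RELS)
        zf.writestr("word/document.xml", document)


def read_docx_paragraphs(source):
    """Return the concatenated w:t text of every w:p in word/document.xml."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    ns = {"w": W_NS}
    return [
        "".join(t.text or "" for t in p.iterfind(".//w:t", ns))
        for p in root.iterfind(".//w:p", ns)
    ]


if __name__ == "__main__":
    buf = io.BytesIO()
    write_minimal_docx(["Invoice #42", "Total: 10 < 20 & rising"], buf)
    buf.seek(0)
    print(read_docx_paragraphs(buf))
```

Writing to an in-memory io.BytesIO and reading it back should return the same strings. Each paragraph is emitted as a single run, so any formatting recovered from the PDF (bold spans, fonts) would need additional w:rPr run properties that this sketch deliberately leaves out.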
Nice-to-have
Pandoc AST experience
Background in DTP (desktop publishing), typography, or graphic design
Contributions to open-source OCR projects
Key Requirements
Expert-level Python skills
Deep familiarity with OCR/Document AI models
Experience using LLMs for layout correction
Ability to decide when to use off-the-shelf APIs vs. custom pipelines
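Restoring reading order on multi-column pages is a recurring part of converting complex PDFs and a typical target for the layout correction mentioned above. The sketch below is a simple rule-based heuristic. It is not an OCR model or an LLM. It makes three assumptions: block tuples are shaped like PyMuPDF's page.get_text("blocks") output (x0, y0, x1, y1, text, ...), the page has two columns split at its horizontal midpoint, and the function name reading_order is hypothetical.

```python
from typing import List, Sequence


def reading_order(blocks: Sequence[tuple], page_width: float) -> List[str]:
    """Return block texts in an approximate two-column reading order.

    Assumes each block is (x0, y0, x1, y1, text, ...) and that the page
    splits into two columns at page_width / 2 (an illustrative heuristic).
    """
    mid = page_width / 2
    ordered: List[tuple] = []
    left: List[tuple] = []
    right: List[tuple] = []

    def flush() -> None:
        # Emit the pending column band: the whole left column, then the right.
        ordered.extend(sorted(left, key=lambda b: b[1]))
        ordered.extend(sorted(right, key=lambda b: b[1]))
        left.clear()
        right.clear()

    # Walk blocks top to bottom, breaking ties left to right.
    for block in sorted(blocks, key=lambda b: (b[1], b[0])):
        x0, x1 = block[0], block[2]
        if x1 <= mid:
            left.append(block)
        elif x0 >= mid:
            right.append(block)
        else:
            # Crosses the midline: treat it as full-width (title, figure,
            # footer) and close off the column band above it first.
            flush()
            ordered.append(block)
    flush()
    return [block[4] for block in ordered]


if __name__ == "__main__":
    page = [
        (50, 40, 550, 70, "Title"),
        (50, 100, 290, 300, "L1"),
        (310, 100, 550, 200, "R1"),
        (50, 310, 290, 500, "L2"),
        (310, 210, 550, 500, "R2"),
        (50, 520, 550, 540, "Footer"),
    ]
    print(reading_order(page, 600))
```

On this 600-point-wide sample page, the function returns the title, the whole left column, the whole right column, and then the footer. A plain sort by y0 would interleave the columns (L1, R1, R2, L2). A narrow block that straddles the midline is wrongly treated as full-width. Cases like that are where a layout model or an LLM correction pass would take over.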