Job Summary
The role involves building the semantic layer and lakehouse architecture to connect scientists with petabyte-scale data through natural language interfaces.
Candidates will design data pipelines that transform raw omics data into harmonized, AI-consumable layers optimized for model training and evaluation.
Lilly offers a comprehensive benefit program including medical, dental, vision, 401(k) matching, and flexible spending accounts alongside a competitive salary range.
Salary
Base: $166,500 - $266,200; Bonus/Equity: Company bonus based on performance; Benefits: Medical, dental, vision, 401(k), vacation, wellness programs
Skills & Requirements
Must-have
8 years of data engineering experience
Expertise in ETL/ELT workflows
Strong SQL skills, including work with complex schemas
Experience with Databricks or Snowflake
Proficiency in Python for data processing
Nice-to-have
PhD in data science or a related field
Experience with biomedical ontologies
Understanding of knowledge graph technologies
Familiarity with vector databases
Deep experience with Databricks Unity Catalog
Key Requirements
Bachelor's degree plus 8 years of experience, or Master's degree plus 5 years of experience
Experience in pharmaceutical or life sciences environments
Knowledge of data governance in regulated industries