ML Engineer - Evaluation Analysis, Metric and Data Strategy
Apple
United States of America
5+ years applied science experience
Statistical analysis and experimental design
Python or R proficiency for data analysis
Apple is seeking an experienced Machine Learning Engineer to join their Productivity and Machine Learning Evaluation team, focusing on ensuring the quality of AI features across productivity applications. The role emphasizes designing feature-level quality metrics, analyzing evaluation data, and collaborating with cross-functional teams to drive data-informed decisions.
Job Summary
This role serves as the analytical core of the team, responsible for defining how AI feature quality is measured across a suite of productivity applications.
The position involves designing feature-level quality metrics and multi-turn evaluation frameworks where the unit of analysis is a conversation rather than a single response.
Candidates will collaborate with partner teams to ensure evaluation data represents real-world usage and translate complex analytical findings into actionable decisions for leadership.
Matching Summary
Match Score: 85
Salary
Not specified
Skills & Requirements
Must-have
5+ years applied science experience
Statistical analysis and experimental design
Python or R proficiency for data analysis
Designing session-level evaluation frameworks
Analyzing production user data biases
Nice-to-have
Experience with agentic orchestration frameworks
Familiarity with productivity software applications
Background in inter-annotator agreement methods
Understanding of tool-use accuracy evaluation
Experience translating findings for non-technical leaders
Key Requirements
Bachelor's degree in Statistics, Data Science, or related field
5+ years experience in applied science or evaluation research
Proficiency in Python (pandas, scipy, scikit-learn) or R