Software Engineer, Infrastructure

FAL

Turkey
On-site
3+ years managing bare-metal server fleets
Strong Python software engineering skills
Deep Linux systems knowledge and tuning
FAL is seeking a hands-on Software Engineer for Infrastructure to manage and enhance software processes for a fleet of GPU servers. The role involves building automation tools, monitoring systems, and ensuring the health and performance of the server infrastructure.

Job Summary

  • You will build and maintain a Python fleet tracking system that manages the full lifecycle of thousands of servers.
  • The role involves creating metrics, dashboards, and alerting for hardware health across the entire GPU fleet.
  • Candidates must have strong software engineering skills in Python to write production tooling rather than scripts.

Matching Summary

Match Score: 85

Skills & Requirements

Must-have

  • 3+ years managing bare-metal server fleets
  • Strong Python software engineering skills
  • Deep Linux systems knowledge and tuning
  • Experience with Ansible, Terraform, and cloud-init
  • Solid understanding of storage technologies

Nice-to-have

  • Familiarity with network configuration and diagnostics
  • Experience with NVIDIA GPU infrastructure
  • Experience with AMD GPUs
  • Experience with bare-metal and VM provisioning
  • Experience with compliance frameworks such as SOC 2

Key Requirements

  • 3+ years experience managing server fleets at scale
  • Strong software engineering skills in Python
  • Deep Linux systems knowledge including kernel tuning
  • Experience with configuration management tools
  • Solid understanding of storage technologies

Work Rights

Not specified
