Our Customer: Our client is a technology-focused company building high-performance, real-time ML inference systems. The team develops ultra-low-latency engines that process billions of requests per day, integrating ML models with business-critical decision-making pipelines. They are looking for an experienced backend engineer to own and scale production-grade ML services with a strong focus on latency, reliability, and observability.
Your tasks:
* Lead the design and development of low-latency ML inference services handling massive request volumes.
* Build and scale real-time decision-making engines, integrating ML models with business logic under strict SLAs.
* Collaborate closely with data scientists to deploy ML models seamlessly and reliably in production.
* Design systems for model versioning, shadowing, and A/B testing at runtime.
* Ensure high availability, scalability, and observability of production systems.
* Continuously optimize latency, throughput, and cost-efficiency using modern tools and techniques.
* Work independently while collaborating with cross-functional teams including Algo, Infrastructure, Product, Engineering, and Business stakeholders.
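To give candidates a concrete feel for the "shadowing and A/B testing at runtime" task above, here is a minimal, stdlib-only Python sketch (all names and the traffic split are illustrative assumptions, not the client's actual stack): requests are deterministically bucketed into variants by hashing the request ID, and a shadow model receives a copy of traffic without ever affecting the live response.

```python
import hashlib


def assign_variant(request_id: str, split: float = 0.1) -> str:
    """Deterministically bucket a request into 'treatment' or 'control'.

    Hashing the request ID keeps the assignment stable across retries,
    which matters for clean A/B measurement.
    """
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "treatment" if bucket < split else "control"


def serve(request_id: str, models: dict, shadow=None) -> str:
    """Route the request to its assigned variant; optionally run a shadow.

    The shadow result would be logged and diffed offline; it is never
    returned, so shadow latency or errors cannot impact production.
    """
    variant = assign_variant(request_id)
    response = models[variant](request_id)
    if shadow is not None:
        try:
            shadow(request_id)  # fire-and-forget in a real system
        except Exception:
            pass  # shadow failures must not affect the live path
    return response
```

In a real service the shadow call would run asynchronously off the request path, and its outputs would feed a comparison pipeline for regression detection before promoting the new model.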
Required Experience and Skills:
* B.Sc. or M.Sc. in Computer Science, Software Engineering, or a related technical field.
* 5+ years of experience building high-performance backend or ML inference systems.
* Expertise in Python and experience with low-latency APIs and real-time serving frameworks (e.g., FastAPI, Triton Inference Server, TorchServe, BentoML).
* Experience with scalable service architectures, message queues (Kafka, Pub/Sub), and asynchronous processing.
* Strong understanding of model deployment, online/offline feature parity, and real-time monitoring.
* Experience with cloud environments (AWS, GCP, OCI) and container orchestration (Kubernetes).
* Familiarity with in-memory and NoSQL databases (Aerospike, Redis, Bigtable) for ultra-fast data access.
* Experience with observability stacks (Prometheus, Grafana, OpenTelemetry) and alerting/diagnostics best practices.
* Strong ownership mindset and the ability to deliver solutions end-to-end.
* Passion for performance, clean architecture, and impactful systems.
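The asynchronous-processing pattern named above (consuming a message stream without blocking the serving path) can be illustrated with a small asyncio sketch. The queue here is an in-process stand-in for Kafka or Pub/Sub, and all names are assumptions for illustration only.

```python
import asyncio


async def producer(queue: asyncio.Queue, n: int) -> None:
    """Simulate upstream events arriving on a message bus."""
    for i in range(n):
        await queue.put({"event_id": i})
    await queue.put(None)  # sentinel: no more events


async def consumer(queue: asyncio.Queue, results: list) -> None:
    """Process events as they arrive; awaiting the queue keeps the event
    loop free to interleave other work instead of blocking on I/O."""
    while True:
        event = await queue.get()
        if event is None:
            break
        # Placeholder for real work: feature lookup, model call, etc.
        results.append(event["event_id"] * 2)


async def main(n: int) -> list:
    # A bounded queue gives natural backpressure when consumers lag.
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)
    results: list = []
    await asyncio.gather(producer(queue, n), consumer(queue, results))
    return results
```

With a real broker the consumer loop would instead pull from a Kafka or Pub/Sub client, but the shape (awaited reads, bounded buffering, a clean shutdown signal) is the same.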
Would be a plus:
* Prior experience leading high-throughput, low-latency ML systems in production.
* Knowledge of real-time feature pipelines and streaming data platforms.
* Familiarity with advanced monitoring and profiling techniques for ML services.
Working Conditions:
* Remote work.
* 5-day working week, 8-hour working day, flexible schedule.