We’re looking for an Applied AI Engineer who combines strong ML fundamentals with the discipline of improving production AI systems through metrics, evaluation, and iteration.
This role is hands-on, product-focused, and collaborative with the AI platform lead.

The role in a nutshell

You’ll work on improving production AI systems through evaluation, experimentation, and system design.
A large part of the role involves:
* diagnosing failures in agent workflows
* designing evaluation metrics and KPIs
* improving system prompts and agent behavior
* running structured experiments and measuring impact
You won’t be working in isolation on research projects — you’ll be improving systems that real users depend on.
Rough responsibility breakdown:
* AI evaluation and KPI design: ~30%
* Prompt and agent system design: ~30%
* ML systems (recommendation, optimization, etc.): ~30%
* Engineering integration: ~10%

What you’ll work on:

AI evaluation and system quality
* Design evaluation strategies for LLM and agent workflows
* Create metrics and KPIs for AI system performance
* Build and maintain evaluation datasets
* Debug production AI failures systematically
* Compare system behavior against baselines
This is a core responsibility of the role.

Multi-agent AI systems
* Improve agent orchestration and workflows
* Diagnose failures across agent pipelines
* Refine system prompts and agent interactions
* Improve reliability, latency, and response quality

ML and AI systems
You’ll contribute to areas such as:
* Recommendation systems (ranking and personalization)
* Itinerary optimization and constraint-based planning
* LLM-based reasoning systems
* Optional: computer vision pipelines
Depth in one of these areas is more important than superficial experience in all of them.

Engineering collaboration
We use:
* Golang (primary production language)
* Python when necessary for ML workflows
* Postgres, Redis, and internal services
You don’t need to be a Go expert on day one, but you should be comfortable reading and modifying production code.
Backend engineers handle infrastructure-heavy service development; your focus is AI system behavior and correctness.

What we’re looking for

Must-haves

Strong AI/ML fundamentals
You understand the theory behind what you build and can choose appropriate methods for a problem.
Examples:
* evaluation metrics (precision/recall/F1/etc.)
* ranking and recommendation concepts
* embeddings and similarity
* experimentation methodology
Not required:
* academic publications
* advanced theoretical math
* large-scale model training experience

Evaluation-driven mindset
You:
* think in metrics and baselines
* design experiments instead of guessing
* measure system improvements quantitatively
* debug failures methodically
This is the most important signal for the role.

Experience with LLM systems
You’ve worked with:
* prompt design
* agent workflows
* evaluation of LLM outputs
* production LLM integrations

Ability to ship production systems
You can:
* turn ideas into working systems
* iterate based on results
* balance exploration with delivery

Programming ability
You’re comfortable writing production code in at least one language (Python, Go, or similar) and learning others when needed.

Strong signals (nice to have)
* Experience improving an AI system after deployment
* Recommendation systems or ranking experience
* Optimization or constraint-based systems
* Computer vision experience
* Experience building evaluation frameworks
* Golang experience
* Startup or small-team engineering experience

This role may not be a fit if:
* You are looking for a research-focused role without production deployment
* You rely heavily on frameworks without understanding fundamentals
* You’re uncomfortable working with partially defined problems
* You prefer narrow specialization over product ownership

When you apply for the role, please answer the following questions:
* How much commercial experience do you have with AI/ML?
* How much commercial experience do you have with LLM systems?
* Do you have commercial experience with multi-agent AI systems?
* Are you comfortable writing production code in at least one language (Python, Go, or similar)?
* What is your current level of proficiency in English?
* What are your monthly salary expectations (gross amount in USD)?
* Can you start as soon as possible?