CodeTiburon is looking for a Senior Python Developer to join our team remotely.

Project: Agent Engineering Platform (AI Systems Control Layer)

An Israel-based startup is building a platform that helps teams design, observe, evaluate, and optimize AI agents systematically, as a software engineering discipline rather than prompt guesswork. The goal is to create the infrastructure behind production-grade AI agents: evaluation execution, gating, orchestration, reliability, and optimization workflows, grounded in real usage data.

Product technology stack: Python, microservices, distributed pipelines, async/event-driven architecture, Ray / Optuna (or equivalents), React, TypeScript, Docker, Kubernetes, CI/CD, AWS/GCP/Azure, SDKs + CLI tools, observability tooling (metrics/logs/tracing).

Required Skills:
* 10+ years of experience building production-grade software systems
* Strong backend expertise in Python
* Experience building microservices and distributed systems
* Knowledge of async/event-driven architectures, retries, scheduling, idempotency
* Strong understanding of designing clean and stable APIs (service-to-service + SDKs/CLI)
* Experience ensuring scale and correctness:
  * queueing, backpressure
  * fault tolerance
  * multi-tenant isolation
  * SLAs / reliability requirements
* Strong testing mindset: unit/integration tests, contract tests, CI/CD gates
* Experience with observability: metrics, logs, tracing, profiling, performance/cost budgeting
* Cloud and platform fundamentals: Docker, Kubernetes, CI/CD pipelines
* Comfortable working with AWS / GCP / Azure
* Intermediate+ spoken and written English

Will be a plus:
* Strong SRE/observability experience (profiling, tracing, incident response patterns)
* Infrastructure-as-code (Terraform, Pulumi, etc.)
* Security hardening / production readiness practices
* Data/ML backend experience (retrieval systems, vector DBs, evaluation datasets)
* Familiarity with frameworks like Ray, Optuna, or other compute/optimization systems
* Ability to contribute to light full-stack development (React / TypeScript)

Your key accountabilities and responsibilities will include:
* Own and evolve Python microservices and distributed workflows end-to-end
* Build and harden distributed evaluation pipelines (execution, scheduling, retries, idempotency, fault tolerance)
* Design and maintain stable, well-documented APIs (including versioning, SDK-first ergonomics, backward compatibility)
* Engineer services for reliability and scale in multi-tenant production systems
* Raise the quality bar:
  * testing strategy
  * CI/CD gates
  * release discipline
* Drive platform observability:
  * metrics, logging, tracing, profiling
  * cost/latency budgeting
* Work with static code analysis tools and program transformation (ASTs, instrumentation, linters/rules engines)
* Contribute when needed to platform UI features (React/TypeScript)

What we offer:
* Remote-friendly work with overlap in Israel/Europe time zones
* Paid leave and holidays
* Small, high-autonomy founding team with real ownership
* Working product with SDK and specification language (TVL)
* Deep technical challenges: distributed systems, pipelines, orchestration, optimization

If this sounds like you and you have most of the skills and qualifications above, please send your CV.
We sincerely thank all applicants for applying; if we like what we see and feel you are a match for our position, we will be in touch.