OTAKOYI is looking for a smart and eager Senior AI Engineer (part-time) to join our team. We like challenges and self-development. If you like them too, don’t hesitate to join us!
What You’ll Do

AI Development & Implementation
* Design and implement multi-agent systems (supervisor-worker, collaborative topologies) with production-grade failure recovery
* Build advanced RAG pipelines (hybrid search, reranking, GraphRAG, Agentic RAG) with justified architectural trade-offs
* Develop scalable backend AI services using Python, FastAPI, Pydantic, and Celery
* Integrate vector databases (pgvector, Pinecone, Weaviate) into production AI systems
* Implement MCP servers and integrations (Streamable HTTP transport, OAuth 2.1)
Eval Design & Production Quality
* Build eval infrastructure as part of CI/CD: golden datasets, regression detection, online monitoring
* Design evaluation metrics per use case (RAGAS for RAG, custom metrics for agents) and maintain dataset freshness
* Implement production observability via LangSmith / Helicone / Langfuse: traces, spans, A/B model testing, anomaly detection
Cost & Architecture
* Design multi-tier model routing strategies and prompt caching to control LLM inference cost
* Contribute to architecture decisions with documented trade-offs (RAG vs fine-tuning, model selection, ADR)
* Apply Spec Driven Development: formalize AI tasks with eval criteria before implementation
Delivery
* Work within the MAE delivery model, using Claude Code as the primary agentic coding environment
* Deploy and manage containerized AI services via Docker and Kubernetes on AWS / Azure
Required Skills
* Hands-on experience building multi-agent systems with LangGraph, including knowledge of production failure modes (infinite loops, state desync, tool call storms)
* Advanced RAG implementation: chunking strategies, hybrid search, reranking, evaluation metrics (MRR, NDCG, recall@k)
* Production eval pipeline experience: able to walk through an eval you designed, covering the metric, the dataset, and what it caught
* LLM APIs: OpenAI, Azure OpenAI, Anthropic Claude, Gemini, with the ability to justify model selection with data
* Prompt engineering: system prompt design, few-shot calibration, prompt caching, injection defense
* LLM cost optimization: token economics, model routing, TCO analysis
* MCP integration: knowledge of the spec and Streamable HTTP, and the ability to build a custom MCP server
* Safety: OWASP LLM Top 10, prompt injection defense (direct + indirect), guardrails architecture
* Production Python: async, FastAPI with DI + error handling, Pydantic, Celery
* LLM observability: LangSmith / Helicone / Langfuse, covering traces, alerting, regression detection
* Docker + CI/CD with automated eval runs on PR
Nice to Have
* n8n for AI workflow automation
* GCP / Vertex AI experience
* DSPy for systematic prompt optimization
* MLflow or DVC for experiment tracking
* Experience with BMAD methodology for large-scope projects