DataRobot is seeking a Senior Database Engineer to join our Fleet Management team. This pivotal role requires creativity, deep technical knowledge, and enthusiasm for managing our stateful infrastructure.
This position is an exciting opportunity to own the full lifecycle (administration, automation, and troubleshooting) of our critical database systems operating within a large-scale, multi-tenant Kubernetes environment. You will be essential in driving our GitOps and Helm-centric deployment strategy, focusing on ensuring zero-downtime upgrades and maximizing performance and stability for our core platform services.
Key Responsibilities:
* Design, implement, and maintain database infrastructure using StatefulSets, Operators, and Helm charts to ensure databases are reliable, self-healing, and scalable.
* Own the deployment lifecycle for database clusters by managing version control for Helm charts and configuration templates.
* Support and administer production database systems by proactively instrumenting and monitoring performance, security, and availability within the containerized environment.
* Perform zero-downtime upgrades and migrations for major and minor releases, developing and maintaining Helm hooks and custom scripts to automate complex stateful operations.
* Manage and optimize performance for backend data stores, ensuring data consistency and integrity across pod life cycles.
* Develop and maintain automation tools and scripts (Bash, Python) specifically focused on simplifying Kubernetes management tasks, such as provisioning users/secrets and monitoring cluster state.
Knowledge, Skills & Abilities:
* 5+ years of experience managing large-scale, high-availability database systems (PostgreSQL and MongoDB) in a SaaS environment.
* Deep expertise in Kubernetes and Helm.
* Deep knowledge of advanced PostgreSQL HA concepts (e.g., streaming replication, Repmgr/Patroni) and/or MongoDB sharding and replication, specifically how they are implemented and configured via Helm values.
* Experience managing database infrastructure on major cloud platforms (AWS, GCP, or Azure).
* High proficiency in scripting (Bash/Python) and in using GitOps principles to manage infrastructure and deployment pipelines.
* Strong grasp of database performance tuning, scaling concepts, and optimizing SQL/Aggregation queries.
* Experience running production databases under container orchestration.