We are looking for a highly skilled and independent Senior Infrastructure Engineer to support the long-term stability, scalability, and security of a global cloud platform. The role involves driving infrastructure architecture, improving reliability through SRE best practices, and contributing to continuous operational excellence. You will work across engineering, product, and operations teams to ensure system resilience, efficient delivery pipelines, and high service availability. The position requires a hands-on approach, a strong ownership mindset, and the ability to operate in dynamic, high-growth environments.

Requirements
* 6+ years in Infrastructure, SRE, or DevOps roles within large-scale distributed environments.
* Strong expertise with the AWS ecosystem and Infrastructure-as-Code tools (Terraform, CloudFormation, CDK).
* Excellent knowledge of cloud-native architectures, Linux systems, microservices, networking, and security.
* Proficiency with CI/CD pipelines (CircleCI, GitHub Actions, Argo), GitOps workflows, API gateways, containers (Docker), and Kubernetes (EKS).
* Practical experience with monitoring, observability, and alerting platforms (Prometheus, Grafana, OpenTelemetry, New Relic, or similar).
* Solid understanding of performance tuning, capacity planning, and high-availability system design.
* Strong scripting or programming skills (Python, Go, Bash).
* Experience leading incident response, post-mortem analysis, and disaster-recovery processes.
* Ability to collaborate effectively across engineering and cross-functional teams.
* Strong communication skills and experience mentoring other engineers.
* Experience with message brokers (Kafka/MSK) and JVM-based stacks (Java/Spring) is a plus.
* Familiarity with configuration-management tools (Ansible, Puppet, Chef) is beneficial.

Duties
* Architect, implement, and maintain scalable, secure, and fault-tolerant infrastructure across distributed systems.
* Lead incident-management workflows, ensuring fast resolution and driving systemic improvements.
* Build automated monitoring, alerting, self-healing tools, and service-health mechanisms.
* Perform ongoing performance analysis, load optimization, and capacity planning.
* Improve reliability and efficiency of deployment pipelines, cloud operations, and release processes.
* Collaborate with software engineering, product teams, and operational stakeholders to enhance platform stability.
* Introduce and champion SRE best practices, automation initiatives, and continuous improvement programs.
* Provide mentorship and technical leadership within the infrastructure and engineering teams.
* Participate in strategic infrastructure planning, architectural decisions, and long-term platform evolution.

Benefits
* A competitive salary and flexible compensation package.
* Flexible working format: remote, office-based, or co-working space.
* Professional development tools (mentorship program, tech talks, and training sessions).
* Medical insurance.
* Free corporate English classes and speaking clubs with a native speaker.