We are seeking a highly skilled Senior Data Architect to design, build, and optimize our enterprise data platforms using the AWS ecosystem. The ideal candidate has deep experience with AWS Glue, distributed data processing, data orchestration using DAG-based workflows, and machine-learning pipelines leveraging PyTorch. This role combines hands-on architecture, strategic planning, and technical leadership.
Key Responsibilities:
Data Architecture & Design:
* Design end-to-end data architectures across batch, streaming, and real-time use cases.
* Develop scalable data models, lakehouse structures, and metadata strategies aligned with business requirements.
* Architect ETL/ELT solutions using AWS Glue Jobs, Glue Data Catalog, Glue Studio, and Glue Workflows.
* Design architectures using core AWS services, including S3, Glue, Lake Formation, Athena, Redshift, IAM, KMS, VPC, CloudWatch, and CloudTrail.
* Ensure designs follow AWS Well-Architected Framework principles (security, cost, performance, reliability).
Data Engineering & Pipelines:
* Build and optimize data pipelines in AWS using Glue, Lambda, Step Functions, EMR, Athena, and S3.
* Implement DAG-based orchestration using Apache Airflow, Amazon Managed Workflows for Apache Airflow (MWAA), or Glue Workflows.
* Ensure data quality, reliability, lineage, and observability across all pipelines.
Machine Learning Pipeline Enablement:
* Collaborate with ML teams to productionize models, including those built in PyTorch.
* Architect feature store solutions, data preparation workflows, and training/inference pipelines.
* Optimize storage and compute architectures for large-scale ML training and batch inference.
Security, Governance & Compliance:
* Define and enforce data governance policies, IAM roles, encryption standards, and access patterns.
* Implement best practices for data security, privacy, and regulatory compliance (GDPR, HIPAA, etc.).
* Establish metadata, lineage, and cataloging strategies using Glue Data Catalog and related tools.
* Use AWS KMS, Secrets Manager, and Parameter Store to enforce encryption and secret management.
Technical Leadership:
* Set architectural standards and guide engineering teams on best practices in AWS data ecosystems.
* Mentor data engineers, analysts, and ML engineers on modern data-platform design.
* Evaluate new technologies and drive adoption of scalable, cost-efficient cloud solutions.
* Lead cost-optimization and architecture reviews.
Required Qualifications:
* 7+ years of experience in data architecture, data engineering, or similar roles.
* Expert-level knowledge of AWS Glue, S3, Athena, Lambda, Step Functions, and IAM.
* Hands-on experience with ETL/ELT development, distributed processing, and job optimization.
* Strong experience with DAG-based orchestration: Airflow, MWAA, Step Functions, or Glue Workflows.
* Experience integrating or supporting ML pipelines using PyTorch.
* Proficiency in Python, SQL, and scalable data-processing frameworks.
* Strong understanding of data modeling (OLTP, OLAP, lakehouse) and architectural patterns.
* Experience with CI/CD pipelines and DevOps on AWS.
Preferred Qualifications:
* AWS certifications (e.g., AWS Certified Data Analytics – Specialty, AWS Certified Solutions Architect).
* Experience with EMR, Redshift, Kinesis, or Kafka.
* Knowledge of MLOps tools (SageMaker, MLflow, feature stores).
* Familiarity with infrastructure as code (IaC), such as Terraform or CloudFormation.
* Experience working in enterprise-scale, highly regulated environments.
Soft Skills:
* Strong communication and documentation skills for both technical and executive audiences.
* Ability to lead architectural discussions and influence technical direction.
* Strong problem-solving mindset and ability to work autonomously.
We’re grateful for your interest in joining us. Kindly note that only applicants whose experience and qualifications most closely align with the role will be contacted for the next steps. Thank you for your understanding.