The Role

Cloud billing data is inherently messy: multi-cloud, multi-structure, with billing models that don’t normalize cleanly and usage signals that don’t behave. You’ll own the data infrastructure that untangles this: the pipelines, models, and backend services that turn raw AWS, GCP, and Azure billing exports into reliable product capabilities. This is a production ownership role; architecture, code, monitoring, and stability all land on you.

About the Product

The platform ingests and processes large-scale cloud billing, usage, and operational data across AWS, Azure, and GCP, and turns it into cost visibility, recommendations, forecasting, and anomaly detection for enterprise customers. The core engineering challenge is scale and reliability: cloud billing structures are complex, volumes are high, and the data directly drives product decisions and customer spend outcomes. This is a product company, not a consulting engagement: the infrastructure you build runs in production and affects real customers.

Technology Stack:

The platform works with cloud billing and usage data from AWS, Azure, and GCP, processed through Python and SQL, orchestrated with Airflow, and running across modern data platforms including Spark, ClickHouse, BigQuery, Databricks, and Snowflake. The stack was chosen for scale and cost-awareness: the same discipline applied to customer cloud spend applies internally. AWS is the primary cloud environment.

What You’ll Be Doing

* Design and maintain production ETL/ELT pipelines that ingest, normalize, and model cloud billing and usage data at scale across multiple cloud providers.
* Own the performance, reliability, and cost-efficiency of the data platform: query optimization, storage architecture, processing costs, and production stability.
* Build backend data services in Python and SQL that power product capabilities: cost recommendations, usage forecasting, anomaly detection, and customer-facing insights.
* Work with cloud billing source data including AWS CUR, Azure Cost Management exports, and GCP billing exports, as well as complex structures like marketplace billing and partner models.
* Architect and improve orchestration flows using Airflow or equivalent, across platforms such as Spark, Databricks, Snowflake, BigQuery, or ClickHouse.
* Own data quality, monitoring, and observability: not just the pipeline, but what comes out of it.
* Review architecture and code, mentor other engineers, and drive engineering standards within the data domain.
* Use AI/LLM tools (Cursor, GitHub Copilot, Claude, ChatGPT, or equivalent) as a daily development accelerator for coding, debugging, testing, documentation, and technical research, while maintaining full engineering ownership of the output.

What We Expect

Must-have

* 7+ years in data engineering, data platform engineering, or backend engineering with heavy data focus.
* Production-grade Python and SQL: not notebooks, not scripts. Code that runs reliably in production at scale.
* Strong experience building and maintaining ETL/ELT pipelines with real ownership: design, deployment, monitoring, and incident response.
* Experience with large-scale data processing using Spark or equivalent frameworks.
* Workflow orchestration with Apache Airflow or similar.
* Cloud experience, primarily AWS. GCP and/or Azure are a strong advantage.
* Hands-on experience with at least one cloud data warehouse or query engine: Redshift, Athena, BigQuery, Snowflake, Databricks, ClickHouse, or equivalent.
* Strong understanding of data modeling, data quality, and production monitoring.
* Demonstrated experience optimizing query performance, storage usage, or infrastructure costs.
* Ability to lead technical discussions, own domains end to end, and mentor other engineers.
* Hands-on experience using AI/LLM tools as part of the software development workflow, and the engineering judgment to validate what they produce.

Nice to have

* Experience with cloud billing data specifically: AWS CUR, Azure Cost Management, GCP billing exports, marketplace billing, or partner billing models.
* Background in FinOps, cloud cost optimization, or usage-based billing data.
* Experience building data products: recommendations, forecasting flows, dashboards, or anomaly-detection pipelines.
* Experience leading a small team or acting as a technical owner for a data domain.
* AWS services experience: S3, Glue, Lambda, Athena, Redshift, EMR, ECS, EKS.
* Track record in a startup or product-company environment.

Why This Role Is Worth Your Time

* The domain has real technical depth. Cloud billing data is genuinely complex: multi-source, inconsistently structured, high-volume, and directly tied to business outcomes. This isn’t CRUD pipelines over clean data.
* You’ll own infrastructure that shapes the product. The data platform isn’t a support function; it’s the core layer that makes cost recommendations, forecasting, and anomaly detection possible. What you build determines what the product can do.
* AI tooling is a first-class expectation, not a novelty. The team uses AI tools as daily productivity multipliers. You won’t be explaining why you use Cursor or Claude; you’ll be expected to.
* End-to-end ownership. From design to production behavior, you’re accountable. The role suits engineers who want to see their decisions run in the real world, not hand off to someone else.