sqlmesh cube_generate build part 1

A first iteration

Dec 17, 2024

I’ve been hacking away at this today and got so far:

I have a working CLI command: sqlmesh cube_generate

To test whether it works well, I used Windsurf to build an example e-commerce data model. The above model is one of the more interesting used to generate Cube modeling code.

This is what sqlmesh cube_generate outputs for it:

"model": "gold.daily_sales_metrics",

"joins": [

{

"type": "INNER",

"left": "silver.orders",

"right": "silver.order_items",

"condition": "o.date = i.date"

}

],

"fields": [

{

"name": "silver.orders.date",

"type": "DATE",

"sql": "CAST(order_timestamp AS DATE)",

"is_agg": false

},

{

"name": "silver.orders.total_orders",

"type": "BIGINT",

"sql": "COUNT(DISTINCT order_id)",

"is_agg": true

},

{

"name": "silver.orders.unique_customers",

"type": "BIGINT",

"sql": "COUNT(DISTINCT user_id)",

"is_agg": true

},

{

"name": "silver.orders.total_revenue",

"type": "DECIMAL",

"sql": "SUM(total_amount)",

"is_agg": true

},

{

"name": "silver.orders.completed_orders",

"type": "DECIMAL",

"sql": "SUM(CASE WHEN is_completed THEN 1 ELSE 0 END)",

"is_agg": true

},

{

"name": "silver.order_items.date",

"type": "DATE",

"sql": "CAST(order_timestamp AS DATE)",

"is_agg": false

},

{

"name": "silver.order_items.unique_products_sold",

"type": "BIGINT",

"sql": "COUNT(DISTINCT product_id)",

"is_agg": true

},

{

"name": "silver.order_items.total_items_sold",

"type": "DECIMAL",

"sql": "SUM(quantity)",

"is_agg": true

},

{

"name": "silver.order_items.avg_unit_price",

"type": "DOUBLE",

"sql": "AVG(unit_price)",

"is_agg": true

},

{

"name": "avg_order_value",

"type": "DOUBLE",

"sql": "o.total_revenue / NULLIF(NULLIF(o.total_orders, 0), 0)",

"is_agg": false

},

{

"name": "order_completion_rate",

"type": "DOUBLE",

"sql": "CAST(o.completed_orders AS FLOAT) / NULLIF(NULLIF(o.total_orders, 0), 0)",

"is_agg": false

}

]

I can already see that what I want to do is possible. I’ve struggled for the last hour or so to replace things like o.completed_orders with silver.orders.total_revenue, where it is a derived field. However, even if I can’t get this to work, I can use the JSON output above to generate some decent Cube YAML, and then use an AI step to enhance it using the SQL, JSON, and intermediate YAML. I could actually have stopped sooner and just said this won’t support CTEs, but where’s the fun in that?

davidj.substack

Discussion about this post