I’ve been hacking away at this today and got so far:
I have a working CLI command: sqlmesh cube_generate
To test whether it works well, I used Windsurf to build an example e-commerce data model. The above model is one of the more interesting used to generate Cube modeling code.
This is what sqlmesh cube_generate
outputs for it:
"model": "gold.daily_sales_metrics",
"joins": [
{
"type": "INNER",
"left": "silver.orders",
"right": "silver.order_items",
"condition": "o.date = i.date"
}
],
"fields": [
{
"name": "silver.orders.date",
"type": "DATE",
"sql": "CAST(order_timestamp AS DATE)",
"is_agg": false
},
{
"name": "silver.orders.total_orders",
"type": "BIGINT",
"sql": "COUNT(DISTINCT order_id)",
"is_agg": true
},
{
"name": "silver.orders.unique_customers",
"type": "BIGINT",
"sql": "COUNT(DISTINCT user_id)",
"is_agg": true
},
{
"name": "silver.orders.total_revenue",
"type": "DECIMAL",
"sql": "SUM(total_amount)",
"is_agg": true
},
{
"name": "silver.orders.completed_orders",
"type": "DECIMAL",
"sql": "SUM(CASE WHEN is_completed THEN 1 ELSE 0 END)",
"is_agg": true
},
{
"name": "silver.order_items.date",
"type": "DATE",
"sql": "CAST(order_timestamp AS DATE)",
"is_agg": false
},
{
"name": "silver.order_items.unique_products_sold",
"type": "BIGINT",
"sql": "COUNT(DISTINCT product_id)",
"is_agg": true
},
{
"name": "silver.order_items.total_items_sold",
"type": "DECIMAL",
"sql": "SUM(quantity)",
"is_agg": true
},
{
"name": "silver.order_items.avg_unit_price",
"type": "DOUBLE",
"sql": "AVG(unit_price)",
"is_agg": true
},
{
"name": "avg_order_value",
"type": "DOUBLE",
"sql": "o.total_revenue / NULLIF(NULLIF(o.total_orders, 0), 0)",
"is_agg": false
},
{
"name": "order_completion_rate",
"type": "DOUBLE",
"sql": "CAST(o.completed_orders AS FLOAT) / NULLIF(NULLIF(o.total_orders, 0), 0)",
"is_agg": false
}
]
I can already see that what I want to do is possible. I’ve struggled for the last hour or so to replace things like o.completed_orders
with silver.orders.total_revenue
, where it is a derived field. However, even if I can’t get this to work, I can use the JSON output above to generate some decent Cube YAML, and then use an AI step to enhance it using the SQL, JSON, and intermediate YAML. I could actually have stopped sooner and just said this won’t support CTEs, but where’s the fun in that?