Export & Bundles
One endpoint merges two to five remote files into a single virtual dataset via configurable joins.
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /v1/datasets/bundle | Join 2–5 remote data sources into a single named dataset |
Python SDK Examples
Join an orders CSV with a customers Parquet file
from toolkitapi import Analytics

with Analytics(api_key="tk_...") as analytics:
    bundle = analytics.create_bundle({
        "sources": [
            {
                "alias": "orders",
                "data_url": "https://storage.example.com/orders.csv",
                "file_type": "csv",
            },
            {
                "alias": "customers",
                "data_url": "https://storage.example.com/customers.parquet",
                "file_type": "parquet",
            },
        ],
        "joins": [
            {
                "left_alias": "orders",
                "right_alias": "customers",
                "left_key": "customer_id",
                "right_key": "id",
                "join_type": "INNER",
            }
        ],
    })
    dataset_id = bundle["dataset_id"]
    print(dataset_id)
Analyze the bundled dataset immediately
from toolkitapi import Analytics

with Analytics(api_key="tk_...") as analytics:
    bundle = analytics.create_bundle({
        "sources": [
            {"alias": "sales", "data_url": "https://example.com/sales.csv", "file_type": "csv"},
            {"alias": "regions", "data_url": "https://example.com/regions.json", "file_type": "json"},
        ],
        "joins": [
            {
                "left_alias": "sales",
                "right_alias": "regions",
                "left_key": "region_code",
                "right_key": "code",
                "join_type": "LEFT",
            }
        ],
    })
    result = analytics.analyze({
        "dataset_id": bundle["dataset_id"],  # handle returned by create_bundle
        "data_url": "",  # not used when dataset_id is supplied via bundle
        "prompt": "What is total revenue by region name?",
        "file_type": "auto",
    })
Note
Pass the dataset_id returned from create_bundle directly to /v1/analyze, /v1/visualize, or /v1/validate-chart — there is no separate "attach" step.
Request Parameters
POST /v1/datasets/bundle
| Parameter | Type | Required | Description |
|---|---|---|---|
| sources | array | Yes | Between 2 and 5 source objects. Each alias must be unique |
| joins | array | Yes | One or more join definitions linking sources together |
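The constraints in this table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: `validate_bundle_request` is a hypothetical helper name, and it covers only the rules stated above (2 to 5 sources, unique aliases, at least one join). The server remains the authority on what is accepted.

```python
from typing import List


def validate_bundle_request(payload: dict) -> List[str]:
    """Return a list of problems found in a bundle payload (empty if none).

    Hypothetical helper: mirrors the documented constraints only.
    """
    problems = []
    sources = payload.get("sources", [])
    joins = payload.get("joins", [])

    # The endpoint accepts between 2 and 5 sources.
    if not 2 <= len(sources) <= 5:
        problems.append(f"expected 2 to 5 sources, got {len(sources)}")

    # Every alias must be present and unique.
    seen = set()
    for source in sources:
        alias = source.get("alias")
        if not alias:
            problems.append("a source is missing its alias")
        elif alias in seen:
            problems.append(f"duplicate alias: {alias}")
        seen.add(alias)

    # At least one join definition is required.
    if not joins:
        problems.append("at least one join is required")

    return problems


payload = {
    "sources": [
        {"alias": "orders", "data_url": "https://storage.example.com/orders.csv"},
        {"alias": "orders", "data_url": "https://storage.example.com/customers.parquet"},
    ],
    "joins": [],
}
print(validate_bundle_request(payload))
# ['duplicate alias: orders', 'at least one join is required']
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in one pass.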
sources item fields
| Field | Type | Required | Description |
|---|---|---|---|
| alias | string | Yes | Short unique name for this source, used as a column-name prefix in the merged schema (e.g. orders.revenue) |
| data_url | string | Yes | Publicly reachable URL to the data file. Pre-signed object-storage URLs are supported |
| file_type | string | No | One of csv, json, parquet, or tsv. Inferred from the URL extension when omitted |
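Because file_type is inferred from the URL extension when omitted, it can be useful to check locally whether a URL carries a recognizable extension, especially for pre-signed URLs that end in a query string. The sketch below is an assumption about how such a check might look on the client; `guess_file_type` is a hypothetical helper, and the service's own inference logic is not documented here and may differ.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# File types listed in the table above.
SUPPORTED_TYPES = {"csv", "json", "parquet", "tsv"}


def guess_file_type(data_url: str) -> Optional[str]:
    """Guess the file type from the URL path, ignoring any query string.

    Returns None when the extension is missing or not a supported type,
    in which case passing file_type explicitly is the safer choice.
    """
    path = urlparse(data_url).path
    extension = PurePosixPath(path).suffix.lower().lstrip(".")
    return extension if extension in SUPPORTED_TYPES else None


print(guess_file_type("https://storage.example.com/orders.csv"))
print(guess_file_type("https://storage.example.com/customers.parquet?signature=abc"))
print(guess_file_type("https://storage.example.com/download"))
```

When the helper returns None, setting file_type on the source object avoids relying on inference at all.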
joins item fields
| Field | Type | Required | Description |
|---|---|---|---|
| left_alias | string | Yes | Alias of the left-hand source |
| right_alias | string | Yes | Alias of the right-hand source |
| left_key | string | Yes | Column name in the left source to join on (without alias prefix) |
| right_key | string | Yes | Column name in the right source to join on (without alias prefix) |
| join_type | string | No | INNER (default), LEFT, LEFT ANTI, or CROSS |
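Join keys are given without the alias prefix, while the merged schema uses prefixed names, so it is easy to mix the two forms. As an illustration, the hypothetical helper below (`make_join` is not an SDK function) accepts two "alias.column" references, splits each on the first dot, and emits a joins entry in the documented shape. It assumes aliases contain no dots.

```python
# Join types listed in the table above.
ALLOWED_JOIN_TYPES = ("INNER", "LEFT", "LEFT ANTI", "CROSS")


def make_join(left: str, right: str, join_type: str = "INNER") -> dict:
    """Build one joins entry from two "alias.column" references."""
    # Split on the first dot only: alias before it, column name after it.
    left_alias, _, left_key = left.partition(".")
    right_alias, _, right_key = right.partition(".")
    if not (left_alias and left_key and right_alias and right_key):
        raise ValueError('expected references of the form "alias.column"')

    normalized = join_type.strip().upper()
    if normalized not in ALLOWED_JOIN_TYPES:
        raise ValueError(f"unsupported join_type: {join_type!r}")

    return {
        "left_alias": left_alias,
        "right_alias": right_alias,
        "left_key": left_key,    # no alias prefix, as the endpoint expects
        "right_key": right_key,  # no alias prefix, as the endpoint expects
        "join_type": normalized,
    }


print(make_join("orders.customer_id", "customers.id"))
print(make_join("sales.region_code", "regions.code", "left"))
```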
Response Fields
| Field | Type | Description |
|---|---|---|
| dataset_id | string | Unique handle for the merged dataset; pass directly to /v1/analyze, /v1/visualize, or /v1/validate-chart |
| sources | array of strings | Aliases of the sources included in the bundle |
| columns | array | Merged schema; each entry has name (with alias prefix), type, and nullable |
| schema_fingerprint | string | Fingerprint of the merged schema for cache and drift detection |
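One way to use schema_fingerprint for drift detection is to store the value from an earlier bundle and compare it with the value returned when the same sources are bundled again. The sketch below assumes the response is available as a plain dict with the fields listed above. `schema_changed` is a hypothetical helper, and the sample values (including the type names) are illustrative placeholders, not real API output.

```python
from typing import Optional


def schema_changed(previous_fingerprint: Optional[str], response: dict) -> bool:
    """Return True when a stored fingerprint differs from the new one.

    With no stored fingerprint (first run) there is nothing to compare,
    so the function reports no change.
    """
    if previous_fingerprint is None:
        return False
    return previous_fingerprint != response["schema_fingerprint"]


# Illustrative response shape only; every value is made up for this example.
sample_response = {
    "dataset_id": "example-dataset-id",
    "sources": ["orders", "customers"],
    "columns": [
        {"name": "orders.customer_id", "type": "integer", "nullable": False},
        {"name": "customers.region", "type": "string", "nullable": True},
    ],
    "schema_fingerprint": "example-fingerprint-b",
}

print(schema_changed(None, sample_response))
print(schema_changed("example-fingerprint-a", sample_response))
print(schema_changed("example-fingerprint-b", sample_response))
```

A changed fingerprint is a signal to review cached prompts or chart specs that reference specific column names.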
Tip
Column names in the merged schema are prefixed with their source alias (e.g. orders.revenue, customers.region) to prevent collisions. Use these prefixed names when building query prompts or chart specs against the bundle.
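To see which prefixed names are available per source, the columns array can be grouped by alias. This is a minimal sketch under two assumptions: each entry's name has the form "alias.column", and aliases contain no dots. `columns_by_alias` is a hypothetical helper and the sample columns are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def columns_by_alias(columns: List[dict]) -> Dict[str, List[str]]:
    """Group prefixed column names by their source alias."""
    grouped = defaultdict(list)
    for column in columns:
        # Split on the first dot: the alias, then the original column name.
        alias, _, _name = column["name"].partition(".")
        grouped[alias].append(column["name"])
    return dict(grouped)


# Illustrative schema entries; types are placeholders.
columns = [
    {"name": "orders.revenue", "type": "number", "nullable": False},
    {"name": "orders.customer_id", "type": "integer", "nullable": False},
    {"name": "customers.region", "type": "string", "nullable": True},
]
print(columns_by_alias(columns))
# {'orders': ['orders.revenue', 'orders.customer_id'], 'customers': ['customers.region']}
```

The grouped output keeps the full prefixed names, since those are the names to use in prompts and chart specs.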