Export & Bundles

A single endpoint for merging two to five remote files into one virtual dataset via configurable joins.

| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | /v1/datasets/bundle | Join 2–5 remote data sources into a single named dataset |

Python SDK Examples

Join an orders CSV with a customers Parquet file

from toolkitapi import Analytics

with Analytics(api_key="tk_...") as analytics:
    bundle = analytics.create_bundle({
        "sources": [
            {
                "alias": "orders",
                "data_url": "https://storage.example.com/orders.csv",
                "file_type": "csv",
            },
            {
                "alias": "customers",
                "data_url": "https://storage.example.com/customers.parquet",
                "file_type": "parquet",
            },
        ],
        "joins": [
            {
                "left_alias": "orders",
                "right_alias": "customers",
                "left_key": "customer_id",
                "right_key": "id",
                "join_type": "INNER",
            }
        ],
    })

dataset_id = bundle["dataset_id"]
print(dataset_id)
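The merge semantics can be illustrated locally. The sketch below is plain Python with no SDK or network calls: it mimics what the INNER join on customer_id = id would produce, including the alias-prefixed column names described under Response Fields. The sample rows are invented for illustration.

```python
# Illustrative only: a local mock of the INNER join the bundle performs.
orders = [
    {"order_id": 1, "customer_id": 10, "revenue": 120.0},
    {"order_id": 2, "customer_id": 11, "revenue": 75.5},
    {"order_id": 3, "customer_id": 99, "revenue": 30.0},  # no matching customer
]
customers = [
    {"id": 10, "region": "EU"},
    {"id": 11, "region": "US"},
]

def inner_join(left, right, left_key, right_key, left_alias, right_alias):
    """INNER join two row lists, prefixing columns with their source alias."""
    index = {row[right_key]: row for row in right}
    merged = []
    for row in left:
        match = index.get(row[left_key])
        if match is None:
            continue  # INNER join drops unmatched left rows
        out = {f"{left_alias}.{k}": v for k, v in row.items()}
        out.update({f"{right_alias}.{k}": v for k, v in match.items()})
        merged.append(out)
    return merged

rows = inner_join(orders, customers, "customer_id", "id", "orders", "customers")
```

The unmatched order (customer_id 99) is dropped, and each surviving row carries columns like orders.revenue and customers.region.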

Analyze the bundled dataset immediately

from toolkitapi import Analytics

with Analytics(api_key="tk_...") as analytics:
    bundle = analytics.create_bundle({
        "sources": [
            {"alias": "sales", "data_url": "https://example.com/sales.csv", "file_type": "csv"},
            {"alias": "regions", "data_url": "https://example.com/regions.json", "file_type": "json"},
        ],
        "joins": [
            {
                "left_alias": "sales",
                "right_alias": "regions",
                "left_key": "region_code",
                "right_key": "code",
                "join_type": "LEFT",
            }
        ],
    })

    result = analytics.analyze({
        "dataset_id": bundle["dataset_id"],  # returned by create_bundle
        "prompt": "What is total revenue by region name?",
    })

Note

Pass the dataset_id returned from create_bundle directly to /v1/analyze, /v1/visualize, or /v1/validate-chart — there is no separate "attach" step.

Request Parameters

POST /v1/datasets/bundle

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| sources | array | Yes | Between 2 and 5 source objects. Each alias must be unique |
| joins | array | Yes | One or more join definitions linking sources together |

sources item fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| alias | string | Yes | Short unique name for this source, used as a column-name prefix in the merged schema (e.g. orders.revenue) |
| data_url | string | Yes | Publicly reachable URL to the data file. Pre-signed object-storage URLs are supported |
| file_type | string | No | One of csv, json, parquet, or tsv. Inferred from the URL extension when omitted |
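The docs above say file_type is inferred from the URL extension when omitted; the exact server-side algorithm isn't specified, but a reasonable local approximation (illustrative, not the SDK's code) looks like this:

```python
from urllib.parse import urlparse

SUPPORTED = {"csv", "json", "parquet", "tsv"}

def infer_file_type(data_url: str):
    """Guess the file type from the URL path's extension, ignoring
    query strings (e.g. pre-signed URL signature parameters).
    Returns None when the extension is missing or unsupported."""
    path = urlparse(data_url).path
    ext = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    return ext if ext in SUPPORTED else None
```

Passing file_type explicitly is the safer choice when a URL has no extension or a misleading one.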

joins item fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| left_alias | string | Yes | Alias of the left-hand source |
| right_alias | string | Yes | Alias of the right-hand source |
| left_key | string | Yes | Column name in the left source to join on (without alias prefix) |
| right_key | string | Yes | Column name in the right source to join on (without alias prefix) |
| join_type | string | No | INNER (default), LEFT, LEFT ANTI, or CROSS |
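The server enforces these constraints, but a client-side pre-check can surface mistakes before a request is sent. The helper below is an illustrative sketch, not part of the SDK:

```python
VALID_JOIN_TYPES = {"INNER", "LEFT", "LEFT ANTI", "CROSS"}

def validate_bundle_payload(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload
    satisfies the documented constraints for POST /v1/datasets/bundle."""
    errors = []
    sources = payload.get("sources", [])
    if not 2 <= len(sources) <= 5:
        errors.append("sources must contain between 2 and 5 items")
    aliases = [s.get("alias") for s in sources]
    if len(set(aliases)) != len(aliases):
        errors.append("source aliases must be unique")
    joins = payload.get("joins", [])
    if not joins:
        errors.append("at least one join is required")
    for j in joins:
        for side in ("left_alias", "right_alias"):
            if j.get(side) not in aliases:
                errors.append(f"{side} {j.get(side)!r} is not a declared source alias")
        if j.get("join_type", "INNER") not in VALID_JOIN_TYPES:
            errors.append(f"unsupported join_type {j.get('join_type')!r}")
    return errors
```

Running it on a payload before calling create_bundle turns a round-trip API error into an immediate local one.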

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| dataset_id | string | Unique handle for the merged dataset. Pass directly to /v1/analyze, /v1/visualize, or /v1/validate-chart |
| sources | array of strings | Aliases of the sources included in the bundle |
| columns | array | Merged schema. Each entry has name (with alias prefix), type, and nullable |
| schema_fingerprint | string | Fingerprint of the merged schema for cache and drift detection |
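The fingerprint algorithm itself isn't documented here. As an illustration of how such a value could support cache and drift detection, one common scheme (assumed, not the service's actual implementation) hashes a canonical form of the column list, so any change to a name, type, or nullability flag yields a different fingerprint:

```python
import hashlib
import json

def schema_fingerprint(columns: list) -> str:
    """Hash a canonical, order-independent form of the column list
    (illustrative scheme; the service's real algorithm is unspecified)."""
    canonical = json.dumps(
        sorted((c["name"], c["type"], c["nullable"]) for c in columns)
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

cols = [
    {"name": "orders.revenue", "type": "double", "nullable": False},
    {"name": "customers.region", "type": "string", "nullable": True},
]
fp = schema_fingerprint(cols)
```

Comparing the stored fingerprint against the latest response is enough to detect that an upstream file's schema has drifted.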

Tip

Column names in the merged schema are prefixed with their source alias (e.g. orders.revenue, customers.region) to prevent collisions. Use these prefixed names when building query prompts or chart specs against the bundle.