Data Sources¶
Two endpoints for ingesting remote files and inspecting their inferred column schemas.
| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | /v1/analyze | Upload a data URL, ask a natural-language question, receive an AI summary and schema |
| GET | /v1/datasets/{dataset_id}/schema | Retrieve column metadata for a previously uploaded dataset |
Python SDK Examples¶
Analyze a CSV dataset¶
from toolkitapi import Analytics

with Analytics(api_key="tk_...") as analytics:
    result = analytics.analyze({
        "data_url": "https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv",
        "file_type": "csv",
        "prompt": "What were the top 3 months by passenger count?",
        "execution_mode": "sync",
    })

    print(result["summary"])
    print(result["dataset_id"])
    print(result["result_preview"])
Inspect a dataset schema¶
from toolkitapi import Analytics

with Analytics(api_key="tk_...") as analytics:
    # First analyze to get a dataset_id
    result = analytics.analyze({
        "data_url": "https://example.com/sales.parquet",
        "file_type": "parquet",
        "prompt": "Summarize the columns",
    })

    schema = analytics.get_schema(result["dataset_id"])
    for col in schema["columns"]:
        print(col["name"], col["dtype"], col["null_count"])
Request Parameters¶
POST /v1/analyze¶
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| data_url | string | Yes | Publicly accessible URL to a CSV, JSON, Parquet, or Excel file |
| prompt | string | Yes | Natural-language question or instruction describing the desired analysis |
| file_type | string | No | One of csv, json, parquet, excel, or auto (default) |
| execution_mode | string | No | sync (default) waits for the result; async returns a job_id immediately |
| include_debug | boolean | No | When true, includes the generated SQL and query metrics in the response |
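The optional parameters can be combined in a single call. The sketch below uses the same SDK pattern as the examples above to request asynchronous execution together with debug output; the file URL is hypothetical, and the job_id key read from the async response is an assumption based on the execution_mode description, not a documented field of this endpoint.

from toolkitapi import Analytics

with Analytics(api_key="tk_...") as analytics:
    job = analytics.analyze({
        "data_url": "https://example.com/large_events.parquet",  # hypothetical file
        "file_type": "parquet",
        "prompt": "Which event types grew fastest month over month?",
        "execution_mode": "async",   # return immediately instead of waiting
        "include_debug": True,       # include generated SQL and query metrics
    })

    # "job_id" is assumed here from the execution_mode description above.
    print(job["job_id"])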
GET /v1/datasets/{dataset_id}/schema¶
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| dataset_id | string (path) | Yes | The unique identifier returned by /v1/analyze |
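If you call the REST endpoint directly rather than through the SDK, it is a plain GET against the dataset's schema path. In this sketch the base URL, bearer-token header, and dataset_id are placeholders, not documented values.

import requests

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "tk_..."

dataset_id = "ds_123"  # hypothetical id returned by /v1/analyze
resp = requests.get(
    f"{BASE_URL}/v1/datasets/{dataset_id}/schema",
    headers={"Authorization": f"Bearer {API_KEY}"},  # placeholder auth scheme
)
resp.raise_for_status()
schema = resp.json()
print(schema["row_count"], len(schema["columns"]))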
Response Fields¶
Analyze¶
| Field | Type | Description |
| --- | --- | --- |
| dataset_id | string | Unique handle for the cached dataset; pass it to subsequent calls |
| summary | string | AI-generated natural-language summary of the analysis result |
| result_preview | array | Up to 100 rows of the query result as an array of objects |
| schema_ | array | Inferred column definitions; each entry has name, dtype, and nullable |
| meta | object | Observability metadata: request_id, runtime_ms, cache_hit, rows_scanned_estimate, schema_fingerprint |
| sql | string \| null | Generated SQL (only present when include_debug is true) |
| metrics | object \| null | Query execution metrics (only present when include_debug is true) |
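As a quick illustration of consuming these fields, the snippet below reads the observability metadata and the optional debug output from a result returned by analytics.analyze as in the examples above; it uses only the field names listed in the table.

# Assumes `result` is the dict returned by analytics.analyze(...) above.
meta = result["meta"]
print(f"request {meta['request_id']} took {meta['runtime_ms']} ms "
      f"(cache hit: {meta['cache_hit']}, ~{meta['rows_scanned_estimate']} rows scanned)")

# sql and metrics are only populated when include_debug was true in the request.
if result.get("sql"):
    print("Generated SQL:", result["sql"])
    print("Metrics:", result["metrics"])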
Schema¶
| Field | Type | Description |
| --- | --- | --- |
| dataset_id | string | The unique identifier of the dataset |
| row_count | integer | Total number of rows in the dataset |
| columns | array | List of column descriptor objects |
| columns[].name | string | Column header as it appears in the source data |
| columns[].dtype | string | Inferred pandas-compatible data type (e.g. float64, object, datetime64[ns]) |
| columns[].sample_values | array | Up to five representative values from the column |
| columns[].null_count | integer | Number of null or missing values in the column |
| columns[].unique_count | integer | Number of distinct non-null values in the column |
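These fields are enough for a rough column profile. The sketch below assumes schema is the dict returned by analytics.get_schema as in the example above; the three-value sample cutoff and the formatting are arbitrary choices, not API behavior.

# Assumes `schema` is the dict returned by analytics.get_schema(...) above.
row_count = schema["row_count"]
for col in schema["columns"]:
    null_pct = 100 * col["null_count"] / row_count if row_count else 0
    print(f"{col['name']:<20} {col['dtype']:<15} "
          f"{null_pct:5.1f}% null, {col['unique_count']} distinct, "
          f"sample: {col['sample_values'][:3]}")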
Tip
For large files, set execution_mode to async and poll GET /v1/jobs/{job_id} for the result. The returned dataset_id is reusable across /v1/visualize, /v1/save, and /v1/datasets/{dataset_id}/schema without re-uploading the file.
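A rough polling loop might look like the sketch below. The status and result keys on the job payload are assumptions made for illustration (consult the jobs reference for the actual response shape), and the base URL, auth header, and job_id are placeholders.

import time
import requests

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "tk_..."
headers = {"Authorization": f"Bearer {API_KEY}"}  # placeholder auth scheme

job_id = "job_123"  # hypothetical id from an async /v1/analyze call
while True:
    job = requests.get(f"{BASE_URL}/v1/jobs/{job_id}", headers=headers).json()
    # "status" and "result" are assumed field names, used for illustration only.
    if job.get("status") in ("succeeded", "failed"):
        break
    time.sleep(2)

if job.get("status") == "succeeded":
    print(job["result"]["summary"])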