Data Sources

Two endpoints cover ingesting remote files and inspecting their inferred column schemas.

| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | /v1/analyze | Submit a data URL and a natural-language question; receive an AI summary and inferred schema |
| GET | /v1/datasets/{dataset_id}/schema | Retrieve column metadata for a previously uploaded dataset |

Python SDK Examples

Analyze a CSV dataset

from toolkitapi import Analytics

with Analytics(api_key="tk_...") as analytics:
    # Run a synchronous analysis against a public CSV file
    result = analytics.analyze({
        "data_url": "https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv",
        "file_type": "csv",
        "prompt": "What were the top 3 months by passenger count?",
        "execution_mode": "sync",
    })

print(result["summary"])
print(result["dataset_id"])
print(result["result_preview"])

Inspect a dataset schema

from toolkitapi import Analytics

with Analytics(api_key="tk_...") as analytics:
    # First analyze to get a dataset_id
    result = analytics.analyze({
        "data_url": "https://example.com/sales.parquet",
        "file_type": "parquet",
        "prompt": "Summarize the columns",
    })

    # Then fetch column metadata for the cached dataset
    schema = analytics.get_schema(result["dataset_id"])

for col in schema["columns"]:
    print(col["name"], col["dtype"], col["null_count"])

Request Parameters

POST /v1/analyze

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| data_url | string | Yes | Publicly accessible URL to a CSV, JSON, Parquet, or Excel file |
| prompt | string | Yes | Natural-language question or instruction describing the desired analysis |
| file_type | string | No | One of csv, json, parquet, excel, or auto (default) |
| execution_mode | string | No | sync (default) waits for the result; async returns a job_id immediately |
| include_debug | boolean | No | When true, includes the generated SQL and query metrics in the response |
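
include_debug is handy when a result looks off. A minimal sketch, mirroring the SDK examples above (the sales.csv URL is a placeholder):

from toolkitapi import Analytics

with Analytics(api_key="tk_...") as analytics:
    # Ask for the generated SQL and execution metrics alongside the answer
    result = analytics.analyze({
        "data_url": "https://example.com/sales.csv",
        "prompt": "Average order value by region",
        "include_debug": True,
    })

print(result["sql"])      # generated SQL; null when include_debug is false
print(result["metrics"])  # query execution metrics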

GET /v1/datasets/{dataset_id}/schema

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| dataset_id | string (path) | Yes | The unique identifier returned by /v1/analyze |

Response Fields

Analyze

| Field | Type | Description |
| --- | --- | --- |
| dataset_id | string | Unique handle for the cached dataset; pass it to subsequent calls |
| summary | string | AI-generated natural-language summary of the analysis result |
| result_preview | array | Up to 100 rows of the query result as an array of objects |
| schema | array | Inferred column definitions; each entry has name, dtype, and nullable |
| meta | object | Observability metadata: request_id, runtime_ms, cache_hit, rows_scanned_estimate, schema_fingerprint |
| sql | string \| null | Generated SQL (only present when include_debug is true) |
| metrics | object \| null | Query execution metrics (only present when include_debug is true) |
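
The meta object carries the fields worth attaching to logs and support requests. A short sketch, assuming result is the return value of analytics.analyze as in the earlier examples:

meta = result["meta"]
# Documented observability fields: request_id, runtime_ms, cache_hit,
# rows_scanned_estimate, schema_fingerprint
print(f"request {meta['request_id']}: {meta['runtime_ms']} ms, "
      f"cache_hit={meta['cache_hit']}, "
      f"~{meta['rows_scanned_estimate']} rows scanned")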

Schema

| Field | Type | Description |
| --- | --- | --- |
| dataset_id | string | The unique identifier of the dataset |
| row_count | integer | Total number of rows in the dataset |
| columns | array | List of column descriptor objects |
| columns[].name | string | Column header as it appears in the source data |
| columns[].dtype | string | Inferred pandas-compatible data type (e.g. float64, object, datetime64[ns]) |
| columns[].sample_values | array | Up to five representative values from the column |
| columns[].null_count | integer | Number of null or missing values in the column |
| columns[].unique_count | integer | Number of distinct non-null values in the column |
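
Because the response pairs a dataset-level row_count with per-column null_count and unique_count, a completeness report can be computed client-side. A minimal sketch, reusing the schema dict from the example above:

rows = schema["row_count"]
for col in schema["columns"]:
    # Share of missing values per column; guard against an empty dataset
    null_pct = 100 * col["null_count"] / rows if rows else 0.0
    print(f"{col['name']:<24} {col['dtype']:<16} "
          f"{null_pct:5.1f}% null  {col['unique_count']} distinct")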

Tip

For large files, set execution_mode to async and poll GET /v1/jobs/{job_id} for the result. The returned dataset_id is reusable across /v1/visualize, /v1/save, and /v1/datasets/{dataset_id}/schema without re-uploading the file.
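
A polling sketch for the async flow follows. The /v1/jobs/{job_id} path comes from the tip above; the base URL, bearer-token header, and the job-status field names are assumptions here, so confirm them against your API reference:

import time
import requests

API_BASE = "https://api.toolkitapi.com"       # assumed base URL
HEADERS = {"Authorization": "Bearer tk_..."}  # assumed auth scheme

# Submit asynchronously; the response carries a job_id instead of a result
job = requests.post(f"{API_BASE}/v1/analyze", headers=HEADERS, json={
    "data_url": "https://example.com/large.parquet",
    "file_type": "parquet",
    "prompt": "Total revenue per quarter",
    "execution_mode": "async",
}).json()

# Poll the jobs endpoint until the job reaches a terminal state
while True:
    status = requests.get(f"{API_BASE}/v1/jobs/{job['job_id']}",
                          headers=HEADERS).json()
    if status.get("status") in ("completed", "failed"):  # assumed status values
        break
    time.sleep(2)

print(status)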