Data Sources¶
Two endpoints for ingesting remote files and inspecting their inferred column schemas.
| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | /v1/analyze | Upload a data URL, ask a natural-language question, receive an AI summary and schema |
| GET | /v1/datasets/{dataset_id}/schema | Retrieve column metadata for a previously uploaded dataset |
Python SDK Examples¶
Analyze a CSV dataset¶
from toolkitapi import Analytics

with Analytics(api_key="tk_...") as analytics:
    result = analytics.analyze({
        "data_url": "https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv",
        "file_type": "csv",
        "prompt": "What were the top 3 months by passenger count?",
        "execution_mode": "sync",
    })

    print(result["summary"])
    print(result["dataset_id"])
    print(result["result_preview"])
Inspect a dataset schema¶
from toolkitapi import Analytics

with Analytics(api_key="tk_...") as analytics:
    # First analyze to get a dataset_id
    result = analytics.analyze({
        "data_url": "https://example.com/sales.parquet",
        "file_type": "parquet",
        "prompt": "Summarize the columns",
    })

    schema = analytics.get_schema(result["dataset_id"])
    for col in schema["columns"]:
        print(col["name"], col["dtype"], col["null_count"])
Request Parameters¶
POST /v1/analyze¶
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| data_url | string | Yes | Publicly accessible URL to a CSV, JSON, Parquet, or Excel file |
| prompt | string | Yes | Natural-language question or instruction describing the desired analysis |
| file_type | string | No | One of csv, json, parquet, excel, or auto (default) |
| execution_mode | string | No | sync (default) waits for the result; async returns a job_id immediately |
| include_debug | boolean | No | When true, includes the generated SQL and query metrics in the response |
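The optional parameters can be combined in a single call. The sketch below uses the same SDK pattern as the examples above to request asynchronous execution together with debug output; the file URL is hypothetical, and the job_id key read from the async response is an assumption based on the execution_mode description, not a documented field of this endpoint.

from toolkitapi import Analytics

with Analytics(api_key="tk_...") as analytics:
    job = analytics.analyze({
        "data_url": "https://example.com/large_events.parquet",  # hypothetical file
        "file_type": "parquet",
        "prompt": "Which event types grew fastest month over month?",
        "execution_mode": "async",   # return immediately instead of waiting
        "include_debug": True,       # include generated SQL and query metrics
    })

    # "job_id" is assumed here from the execution_mode description above.
    print(job["job_id"])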
GET /v1/datasets/{dataset_id}/schema¶
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| dataset_id | string (path) | Yes | The unique identifier returned by /v1/analyze |
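If you call the REST endpoint directly rather than through the SDK, it is a plain GET against the dataset's schema path. In this sketch the base URL, bearer-token header, and dataset_id are placeholders, not documented values.

import requests

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "tk_..."

dataset_id = "ds_123"  # hypothetical id returned by /v1/analyze
resp = requests.get(
    f"{BASE_URL}/v1/datasets/{dataset_id}/schema",
    headers={"Authorization": f"Bearer {API_KEY}"},  # placeholder auth scheme
)
resp.raise_for_status()
schema = resp.json()
print(schema["row_count"], len(schema["columns"]))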
Response Fields¶
Analyze¶
| Field | Type | Description |
| --- | --- | --- |
| dataset_id | string | Unique handle for the cached dataset; pass it to subsequent calls |
| summary | string | AI-generated natural-language summary of the analysis result |
| result_preview | array | Up to 100 rows of the query result as an array of objects |
| schema_ | array | Inferred column definitions; each entry has name, dtype, and nullable |
| meta | object | Observability metadata: request_id, runtime_ms, cache_hit, rows_scanned_estimate, schema_fingerprint |
| sql | string \| null | Generated SQL (only present when include_debug is true) |
| metrics | object \| null | Query execution metrics (only present when include_debug is true) |
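As a quick illustration of consuming these fields, the snippet below reads the observability metadata and the optional debug output from a result returned by analytics.analyze as in the examples above; it uses only the field names listed in the table.

# Assumes `result` is the dict returned by analytics.analyze(...) above.
meta = result["meta"]
print(f"request {meta['request_id']} took {meta['runtime_ms']} ms "
      f"(cache hit: {meta['cache_hit']}, ~{meta['rows_scanned_estimate']} rows scanned)")

# sql and metrics are only populated when include_debug was true in the request.
if result.get("sql"):
    print("Generated SQL:", result["sql"])
    print("Metrics:", result["metrics"])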
Schema¶
| Field | Type | Description |
| --- | --- | --- |
| dataset_id | string | The unique identifier of the dataset |
| row_count | integer | Total number of rows in the dataset |
| columns | array | List of column descriptor objects |
| columns[].name | string | Column header as it appears in the source data |
| columns[].dtype | string | Inferred pandas-compatible data type (e.g. float64, object, datetime64[ns]) |
| columns[].sample_values | array | Up to five representative values from the column |
| columns[].null_count | integer | Number of null or missing values in the column |
| columns[].unique_count | integer | Number of distinct non-null values in the column |
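These fields are enough for a rough column profile. The sketch below assumes schema is the dict returned by analytics.get_schema as in the example above; the three-value sample cutoff and the formatting are arbitrary choices, not API behavior.

# Assumes `schema` is the dict returned by analytics.get_schema(...) above.
row_count = schema["row_count"]
for col in schema["columns"]:
    null_pct = 100 * col["null_count"] / row_count if row_count else 0
    print(f"{col['name']:<20} {col['dtype']:<15} "
          f"{null_pct:5.1f}% null, {col['unique_count']} distinct, "
          f"sample: {col['sample_values'][:3]}")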
Tip
For large files, set execution_mode to async and poll GET /v1/jobs/{job_id} for the result. The returned dataset_id is reusable across /v1/visualize, /v1/save, and /v1/datasets/{dataset_id}/schema without re-uploading the file.
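A rough polling loop might look like the sketch below. The status and result keys on the job payload are assumptions made for illustration (consult the jobs reference for the actual response shape), and the base URL, auth header, and job_id are placeholders.

import time
import requests

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "tk_..."
headers = {"Authorization": f"Bearer {API_KEY}"}  # placeholder auth scheme

job_id = "job_123"  # hypothetical id from an async /v1/analyze call
while True:
    job = requests.get(f"{BASE_URL}/v1/jobs/{job_id}", headers=headers).json()
    # "status" and "result" are assumed field names, used for illustration only.
    if job.get("status") in ("succeeded", "failed"):
        break
    time.sleep(2)

if job.get("status") == "succeeded":
    print(job["result"]["summary"])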