Data Sources

Two endpoints for ingesting remote data files and inspecting their inferred column schemas.

Method | Endpoint | Purpose
--- | --- | ---
POST | /v1/analyze | Submit a data URL and a natural-language question; receive an AI-generated summary and the inferred schema
GET | /v1/datasets/{dataset_id}/schema | Retrieve column metadata for a previously analyzed dataset

REST API Examples

Analyze a dataset

cURL

curl -X POST "https://analytics.toolkitapi.io/v1/analyze" \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"data_url": "https://example.com/data.csv", "file_type": "csv", "prompt": "What are the top 5 values?"}'

JavaScript

const resp = await fetch("https://analytics.toolkitapi.io/v1/analyze", {
  method: "POST",
  headers: { "X-API-Key": "YOUR_KEY", "Content-Type": "application/json" },
  body: JSON.stringify({
    data_url: "https://example.com/data.csv",
    file_type: "csv",
    prompt: "What are the top 5 values?",
  }),
});
// Treat non-2xx responses as errors instead of parsing an error body as a result
if (!resp.ok) throw new Error(`analyze failed: ${resp.status} ${await resp.text()}`);
const data = await resp.json();
console.log(data.summary, data.dataset_id);
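Where the Python SDK is not installed, the same POST /v1/analyze call can be made with the standard library. The following is a minimal sketch that assumes only the URL, headers, and JSON body shown in the cURL example; build_analyze_request and analyze are hypothetical helper names, not part of the toolkitapi SDK, and the 120-second timeout is an arbitrary choice.

```python
import json
import urllib.request

BASE_URL = "https://analytics.toolkitapi.io"


def build_analyze_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/analyze request with the documented headers."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/analyze",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def analyze(api_key: str, payload: dict, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON body.

    urlopen raises urllib.error.HTTPError for non-2xx responses, so errors surface
    as exceptions. The default timeout is an assumption, not a documented limit.
    """
    req = build_analyze_request(api_key, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling analyze("YOUR_KEY", {"data_url": "https://example.com/data.csv", "file_type": "csv", "prompt": "What are the top 5 values?"}) returns the decoded response dictionary, from which summary and dataset_id can be read. Keeping request construction separate from sending makes the request easy to inspect or log before any network traffic happens.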

Inspect a dataset schema

curl "https://analytics.toolkitapi.io/v1/datasets/YOUR_DATASET_ID/schema" \
  -H "X-API-Key: YOUR_KEY"
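A standard-library equivalent of the schema request is sketched below. It assumes only the path and X-API-Key header from the cURL example; build_schema_request and get_schema are hypothetical helper names. The dataset_id is percent-encoded as a single path segment so that an unexpected character such as "/" cannot alter the request path.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://analytics.toolkitapi.io"


def build_schema_request(api_key: str, dataset_id: str) -> urllib.request.Request:
    """Build a GET /v1/datasets/{dataset_id}/schema request without sending it."""
    # safe="" also encodes "/", keeping dataset_id confined to one path segment
    path_id = urllib.parse.quote(dataset_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/v1/datasets/{path_id}/schema",
        method="GET",
        headers={"X-API-Key": api_key},
    )


def get_schema(api_key: str, dataset_id: str, timeout: float = 30.0) -> dict:
    """Fetch and decode the schema; non-2xx responses raise urllib.error.HTTPError."""
    with urllib.request.urlopen(build_schema_request(api_key, dataset_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```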

Python SDK Examples

Analyze a CSV dataset

from toolkitapi import Analytics

with Analytics(api_key="tk_...") as analytics:
    result = analytics.analyze({
        "data_url": "https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv",
        "file_type": "csv",
        "prompt": "What were the top 3 months by passenger count?",
        "execution_mode": "sync",
    })

print(result["summary"])
print(result["dataset_id"])
print(result["result_preview"])

Inspect a dataset schema

from toolkitapi import Analytics

with Analytics(api_key="tk_...") as analytics:
    # First analyze to get a dataset_id
    result = analytics.analyze({
        "data_url": "https://example.com/sales.parquet",
        "file_type": "parquet",
        "prompt": "Summarize the columns",
    })

    schema = analytics.get_schema(result["dataset_id"])

for col in schema["columns"]:
    print(col["name"], col["dtype"], col["null_count"])

Request Parameters

POST /v1/analyze

Parameter | Type | Required | Description
--- | --- | --- | ---
data_url | string | Yes | Publicly accessible URL of a CSV, JSON, Parquet, or Excel file
prompt | string | Yes | Natural-language question or instruction describing the desired analysis
file_type | string | No | One of csv, json, parquet, excel, or auto (default)
execution_mode | string | No | sync (default) waits for the result; async returns a job_id immediately
include_debug | boolean | No | When true, the response includes the generated SQL and query metrics
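Catching malformed request bodies before they reach the API can be done with a small client-side check. This is a sketch that mirrors only the constraints in the table above (required fields, allowed file_type and execution_mode values, boolean include_debug); the non-empty-string rule and the http(s) scheme check are additional assumptions, and the server remains the authority on what it accepts. validate_analyze_payload is a hypothetical helper.

```python
ALLOWED_FILE_TYPES = ("csv", "json", "parquet", "excel", "auto")
ALLOWED_EXECUTION_MODES = ("sync", "async")


def validate_analyze_payload(payload: dict) -> list:
    """Return a list of problems with a /v1/analyze body; an empty list means it looks valid."""
    problems = []
    # Required string fields (non-empty is an assumption beyond the table)
    for field in ("data_url", "prompt"):
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field} is required and must be a non-empty string")
    # Optional fields fall back to their documented defaults when absent
    if payload.get("file_type", "auto") not in ALLOWED_FILE_TYPES:
        problems.append("file_type must be one of " + ", ".join(ALLOWED_FILE_TYPES))
    if payload.get("execution_mode", "sync") not in ALLOWED_EXECUTION_MODES:
        problems.append("execution_mode must be 'sync' or 'async'")
    if "include_debug" in payload and not isinstance(payload["include_debug"], bool):
        problems.append("include_debug must be a boolean")
    # Assumption: "publicly accessible" implies an http(s) URL
    data_url = payload.get("data_url")
    if isinstance(data_url, str) and not data_url.startswith(("http://", "https://")):
        problems.append("data_url should be an http(s) URL")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue with a request at once.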

GET /v1/datasets/{dataset_id}/schema

Parameter | Type | Required | Description
--- | --- | --- | ---
dataset_id | string (path parameter) | Yes | The unique identifier returned by POST /v1/analyze

Response Fields

Analyze

Field | Type | Description
--- | --- | ---
dataset_id | string | Unique handle for the cached dataset; pass it to subsequent calls
summary | string | AI-generated natural-language summary of the analysis result
result_preview | array | Up to 100 rows of the query result, as an array of objects
schema_ | array | Inferred column definitions; each entry has name, dtype, and nullable
meta | object | Observability metadata: request_id, runtime_ms, cache_hit, rows_scanned_estimate, schema_fingerprint
sql | string or null | Generated SQL (present only when include_debug is true)
metrics | object or null | Query execution metrics (present only when include_debug is true)
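Printing result_preview directly shows a raw list of dictionaries. A sketch of turning it into readable text, plus a one-line digest of meta and the optional sql field, follows. It assumes the keys of each row object are column names; format_preview and describe_response are hypothetical helpers, and the 10-row display cap is arbitrary.

```python
def format_preview(rows: list, max_rows: int = 10) -> str:
    """Render result_preview (a list of row objects) as pipe-separated plain text."""
    if not rows:
        return "(no rows)"
    shown = rows[:max_rows]
    # Collect column names in first-seen order; rows are JSON objects and may differ in keys
    columns = []
    for row in shown:
        for key in row:
            if key not in columns:
                columns.append(key)
    lines = [" | ".join(columns)]
    for row in shown:
        # Missing keys and JSON nulls both render as empty cells
        lines.append(" | ".join("" if row.get(c) is None else str(row[c]) for c in columns))
    return "\n".join(lines)


def describe_response(resp: dict) -> str:
    """Summarize the meta object, appending the SQL when include_debug produced one."""
    meta = resp.get("meta") or {}
    text = f"request {meta.get('request_id')}: {meta.get('runtime_ms')} ms, cache_hit={meta.get('cache_hit')}"
    if resp.get("sql"):  # sql is null or absent unless include_debug was true
        text += f"\nSQL: {resp['sql']}"
    return text
```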

Schema

Field | Type | Description
--- | --- | ---
dataset_id | string | The unique identifier of the dataset
row_count | integer | Total number of rows in the dataset
columns | array | List of column descriptor objects
columns[].name | string | Column header as it appears in the source data
columns[].dtype | string | Inferred pandas-compatible data type (e.g. float64, object, datetime64[ns])
columns[].sample_values | array | Up to five representative values from the column
columns[].null_count | integer | Number of null or missing values in the column
columns[].unique_count | integer | Number of distinct non-null values in the column
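The schema fields combine into simple data-quality checks. The sketch below uses only row_count and the documented columns[] fields to compute each column's null ratio and to flag columns that could serve as a unique key (no nulls, every value distinct). column_quality is a hypothetical helper, and the candidate-key rule is a heuristic rather than anything the API reports.

```python
def column_quality(schema: dict) -> list:
    """Per-column null ratio and candidate-key flag from a schema response."""
    row_count = schema["row_count"]
    report = []
    for col in schema["columns"]:
        report.append({
            "name": col["name"],
            "dtype": col["dtype"],
            # Fraction of rows that are null or missing (0.0 for an empty dataset)
            "null_ratio": col["null_count"] / row_count if row_count else 0.0,
            # No nulls and as many distinct values as rows: every row is uniquely identified
            "candidate_key": row_count > 0
            and col["null_count"] == 0
            and col["unique_count"] == row_count,
        })
    return report
```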

Tip

For large files, set execution_mode to async and poll GET /v1/jobs/{job_id} until the result is ready. The dataset_id returned by /v1/analyze can be reused with /v1/visualize, /v1/save, and /v1/datasets/{dataset_id}/schema without fetching the file again.
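Polling an async job can be sketched as a loop with exponential backoff. The response format of GET /v1/jobs/{job_id} is not documented in this section, so the "status" values ("completed", "failed") and the "error" field below are assumptions. fetch_job is a caller-supplied function (for example, a GET to /v1/jobs/{job_id} with the X-API-Key header), and wait_for_job, along with its default intervals, is hypothetical.

```python
import time
from typing import Callable


def wait_for_job(
    job_id: str,
    fetch_job: Callable[[str], dict],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job finishes, doubling the wait between polls up to max_delay_s.

    ASSUMPTION: job payloads carry a "status" field that ends as "completed" or
    "failed"; adapt these names to the real /v1/jobs/{job_id} response.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        job = fetch_job(job_id)
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            # "error" is a hypothetical field name
            raise RuntimeError(f"job {job_id} failed: {job.get('error')}")
        # Give up if the next wait would overshoot the overall deadline
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)
```

Injecting fetch_job and sleep keeps the loop independent of any HTTP client and lets it be exercised without waiting; backoff keeps the request rate low while long-running jobs finish.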