Text Analysis Toolkit
Run deterministic text analysis across 9 endpoints. Compute readability scores, produce extractive summaries, compare documents, create structured diffs, mask sensitive data, filter profanity, analyze term frequency, detect language, and transliterate Unicode text.
Base URL
https://textanalysis.toolkitapi.io/v1/
Endpoints
Readability and Summarization
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/text/readability | Compute readability metrics and audience interpretation |
| POST | /v1/text/summarize | Build an extractive summary from the top-ranked sentences |
Similarity and Diff
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/text/similarity | Compare two strings using Levenshtein, cosine, or Jaccard |
| POST | /v1/text/diff | Generate unified and structured line-level differences |
Filtering and Frequency
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/text/pii-mask | Detect and mask PII types such as email, phone, and SSN |
| POST | /v1/text/profanity | Detect and optionally mask profane terms |
| POST | /v1/text/word-frequency | Return top word frequencies with percentages |
Language Tools
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/text/language | Detect language and return ranked candidates |
| POST | /v1/text/transliterate | Convert Unicode text to an ASCII approximation |
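For callers who are not using the SDK, the following is a minimal sketch of how a POST request to one of the endpoints above might be assembled with the Python standard library. The `X-API-Key` header name and the `{"text": ...}` JSON body shape are assumptions for illustration (only the `text` parameter name is suggested by the SDK calls on this page); check the endpoint drilldowns for the actual authentication scheme and request fields. The request is only built here, not sent.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request

BASE_URL = "https://textanalysis.toolkitapi.io/v1/"


def build_request(endpoint: str, payload: dict, api_key: str) -> Request:
    """Build (but do not send) a JSON POST request for a toolkit endpoint."""
    # An absolute path such as "/v1/text/readability" replaces the path of BASE_URL.
    url = urljoin(BASE_URL, endpoint)
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-API-Key": api_key,  # assumed header name, not confirmed by the docs
    }
    return Request(url, data=body, headers=headers, method="POST")


req = build_request(
    "/v1/text/readability",
    {"text": "This is a short sample paragraph for scoring."},  # assumed body shape
    api_key="tk_...",
)
print(req.get_method(), req.full_url)
```

To actually send it, the request object could be passed to `urllib.request.urlopen(req)` and the response body decoded with `json.loads`.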
Quick SDK Example
```python
from toolkitapi import TextAnalysis

# The client is a context manager, so the connection is closed on exit.
with TextAnalysis(api_key="tk_...") as ta:
    readability = ta.readability(text="This is a short sample paragraph for scoring.")
    similarity = ta.similarity(
        a="The quick brown fox jumps over the lazy dog.",
        b="A quick brown fox jumped over a lazy dog.",
        method="cosine",
    )
    # Responses are indexed like dictionaries.
    print(readability["scores"]["flesch_reading_ease"])
    print(similarity["similarity"])
```
Python SDK
```shell
pip install toolkitapi
```
```python
from toolkitapi import TextAnalysis

with TextAnalysis(api_key="tk_...") as ta:
    # Ask for the three most likely language candidates.
    result = ta.language(text="Bonjour tout le monde", top_n=3)
    print(result["detected"], result["confidence"])
```
See each endpoint's drilldown page for its request and response fields.
Tip
Language detection requires at least 10 characters for reliable scoring. For very short inputs, aggregate adjacent text before calling the endpoint.
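One way to follow this tip is to merge adjacent fragments locally before calling the endpoint. The sketch below is an illustration only: `aggregate_short_fragments` is a hypothetical helper, not part of the SDK, and the 10-character threshold is taken from the tip above.

```python
def aggregate_short_fragments(fragments, min_chars=10):
    """Merge adjacent fragments until each chunk has at least min_chars characters."""
    chunks = []
    buffer = ""
    for fragment in fragments:
        fragment = fragment.strip()
        if not fragment:
            continue  # skip empty or whitespace-only fragments
        buffer = f"{buffer} {fragment}" if buffer else fragment
        if len(buffer) >= min_chars:
            chunks.append(buffer)
            buffer = ""
    if buffer:
        # A short trailing remainder is folded into the previous chunk when
        # one exists; otherwise it is returned as-is, still below the threshold.
        if chunks:
            chunks[-1] = f"{chunks[-1]} {buffer}"
        else:
            chunks.append(buffer)
    return chunks


chunks = aggregate_short_fragments(["Bonjour", "tout", "le monde", "!"])
print(chunks)  # ['Bonjour tout', 'le monde !']
```

Each resulting chunk could then be passed as the `text` argument of `ta.language(...)`. If the entire input is shorter than the threshold, the helper returns it unchanged, so the caller may still want to treat that result as low confidence.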