Similarity and Diff

Two endpoints for pairwise text comparison and line-level diffing.

Method	Endpoint	Purpose
`POST`	`/v1/text/similarity`	Compute similarity with Levenshtein, cosine, or Jaccard
`POST`	`/v1/text/diff`	Return unified and structured text differences

REST API Examples

Compare text similarity

curl -X POST "https://textanalysis.toolkitapi.io/v1/text/similarity" \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"a": "The cat sat on the mat", "b": "The cat sat on a rug", "method": "cosine"}'

// Request a cosine similarity score for two short strings.
const resp = await fetch("https://textanalysis.toolkitapi.io/v1/text/similarity", {
  method: "POST",
  headers: { "X-API-Key": "YOUR_KEY", "Content-Type": "application/json" },
  body: JSON.stringify({ a: "Hello world", b: "Hello earth", method: "cosine" }),
});
if (!resp.ok) throw new Error(`Request failed: ${resp.status}`);
const data = await resp.json();
console.log(`Similarity: ${data.similarity}`);

Python SDK Examples

Similarity by method

from toolkitapi import TextAnalysis

a = "The release includes search improvements and bug fixes."
b = "This release adds better search and fixes several bugs."

with TextAnalysis(api_key="tk_...") as ta:
    # Score the same pair with each supported algorithm for comparison.
    for method in ["levenshtein", "cosine", "jaccard"]:
        result = ta.similarity(a=a, b=b, method=method)
        print(method, result["similarity"])

Unified diff

from toolkitapi import TextAnalysis

original = "line one\nline two\nline three\n"
modified = "line one\nline 2\nline three\nline four\n"

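with TextAnalysis(api_key="tk_...") as ta:
    # Request two lines of context around each change.
    diff = ta.diff(a=original, b=modified, context_lines=2)

print(diff["is_identical"])   # False when the texts differ
print(diff["statistics"])     # additions, deletions, unchanged
print(diff["unified_diff"])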

Request Parameters

POST /v1/text/similarity

Parameter	Type	Description
`a`	string	First text, max 1048576 characters
`b`	string	Second text, max 1048576 characters
`method`	string	`levenshtein`, `cosine`, or `jaccard`

POST /v1/text/diff

Parameter	Type	Description
`a`	string	Original text, max 1048576 characters
`b`	string	Modified text, max 1048576 characters
`context_lines`	integer	Context lines in unified diff, 0 to 50
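
The limits in the tables above can be checked on the client before a request is sent. The following is a minimal sketch, assuming only the documented bounds; the helper name `build_diff_payload` and the constants are hypothetical and not part of the SDK, and the service may apply further validation of its own.

```python
import json

# Limits taken from the parameter tables; the names are illustrative only.
MAX_TEXT_CHARS = 1_048_576
MAX_CONTEXT_LINES = 50


def build_diff_payload(a: str, b: str, context_lines: int) -> str:
    """Check inputs against the documented limits and return a JSON body."""
    for name, text in (("a", a), ("b", b)):
        if len(text) > MAX_TEXT_CHARS:
            raise ValueError(f"{name} exceeds {MAX_TEXT_CHARS} characters")
    if not 0 <= context_lines <= MAX_CONTEXT_LINES:
        raise ValueError(f"context_lines must be between 0 and {MAX_CONTEXT_LINES}")
    return json.dumps({"a": a, "b": b, "context_lines": context_lines})
```

The returned string can be passed as the request body to `POST /v1/text/diff` with the same headers used in the REST examples.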

Response Fields

Similarity

Field	Type	Description
`method`	string	Algorithm used
`similarity`	number	Similarity score in range 0 to 1
`distance`	integer	Levenshtein distance (Levenshtein only)
`max_length`	integer	Max input length (Levenshtein only)
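
To illustrate how `distance`, `max_length`, and `similarity` could relate for the Levenshtein method, here is a local sketch. It assumes the score is normalized as `1 - distance / max_length`, which is a common convention but is not confirmed by this page; the function names are hypothetical and the service may compute the score differently.

```python
def levenshtein(s: str, t: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from s
                curr[j - 1] + 1,     # insert a character into s
                prev[j - 1] + cost,  # substitute, or keep when equal
            ))
        prev = curr
    return prev[-1]


def normalized_similarity(s: str, t: str) -> dict:
    """Assumed normalization: 1 - distance / length of the longer input."""
    distance = levenshtein(s, t)
    max_length = max(len(s), len(t))
    similarity = 1.0 if max_length == 0 else 1.0 - distance / max_length
    return {"distance": distance, "max_length": max_length, "similarity": similarity}
```

Under this assumption, two identical strings score 1 and two strings with no characters in common score 0, which matches the documented 0 to 1 range.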

Diff

Field	Type	Description
`unified_diff`	string	Unified diff text
`changes`	array	Structured operations (`insert`, `delete`, `replace`)
`statistics.additions`	integer	Added line count
`statistics.deletions`	integer	Deleted line count
`statistics.unchanged`	integer	Unchanged line count
`is_identical`	boolean	True when no changes detected
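
For a feel of what these fields contain, the response shape can be approximated locally with Python's standard `difflib` module. This is only a sketch: the function name `local_diff` is hypothetical, the counting rule (a replaced line counts as one deletion plus one addition) is an assumption, and the service's own output may differ in headers and statistics.

```python
import difflib


def local_diff(a: str, b: str, context_lines: int = 2) -> dict:
    """Approximate the diff response fields using difflib."""
    a_lines = a.splitlines(keepends=True)
    b_lines = b.splitlines(keepends=True)

    # n controls how many unchanged lines surround each change.
    unified = "".join(
        difflib.unified_diff(a_lines, b_lines, fromfile="a", tofile="b", n=context_lines)
    )

    additions = deletions = unchanged = 0
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            unchanged += i2 - i1
        else:
            # "replace" contributes to both counts; "insert" and "delete" to one.
            deletions += i2 - i1
            additions += j2 - j1

    return {
        "unified_diff": unified,
        "statistics": {"additions": additions, "deletions": deletions, "unchanged": unchanged},
        "is_identical": additions == 0 and deletions == 0,
    }
```

Calling `local_diff(original, modified)` with the strings from the SDK example above gives a quick offline baseline to compare against the API response.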

Tip

Use cosine similarity for semantic overlap in longer prose, and Levenshtein when you need edit-distance sensitivity for short strings such as titles or identifiers.
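
The difference between token-based and character-based scoring can be seen in a small local sketch. It assumes a simple lowercase word tokenizer and bag-of-words vectors; the tokenization and weighting the service actually uses are not specified on this page, so treat the functions below as hypothetical illustrations rather than the API's algorithm.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list:
    """Lowercase word tokens; an assumed tokenizer, not the service's."""
    return re.findall(r"\w+", text.lower())


def cosine(a: str, b: str) -> float:
    """Cosine of the angle between word-count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def jaccard(a: str, b: str) -> float:
    """Shared distinct words divided by all distinct words."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0
```

Because both measures ignore word order, `cosine("the cat sat", "sat the cat")` is effectively 1 even though the strings differ character by character. That order-insensitivity suits longer prose, while an edit-distance measure stays sensitive to small changes in short identifiers.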
