Similarity and Diff
Two endpoints for pairwise text comparison and line-level diffing.

| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | /v1/text/similarity | Compute similarity with Levenshtein, cosine, or Jaccard |
| POST | /v1/text/diff | Return unified and structured text differences |

Python SDK Examples
Similarity by method

    from toolkitapi import TextAnalysis

    a = "The release includes search improvements and bug fixes."
    b = "This release adds better search and fixes several bugs."

    with TextAnalysis(api_key="tk_...") as ta:
        # Score the same pair of texts with each supported algorithm.
        for method in ["levenshtein", "cosine", "jaccard"]:
            result = ta.similarity(a=a, b=b, method=method)
            print(method, result["similarity"])

Unified diff

    from toolkitapi import TextAnalysis

    original = "line one\nline two\nline three\n"
    modified = "line one\nline 2\nline three\nline four\n"

    with TextAnalysis(api_key="tk_...") as ta:
        # Ask for two lines of context around each change.
        diff = ta.diff(a=original, b=modified, context_lines=2)
        print(diff["is_identical"])
        print(diff["statistics"])
        print(diff["unified_diff"])

Request Parameters
POST /v1/text/similarity

| Parameter | Type | Description |
| --- | --- | --- |
| a | string | First text, max 1048576 characters |
| b | string | Second text, max 1048576 characters |
| method | string | levenshtein, cosine, or jaccard |

POST /v1/text/diff

| Parameter | Type | Description |
| --- | --- | --- |
| a | string | Original text, max 1048576 characters |
| b | string | Modified text, max 1048576 characters |
| context_lines | integer | Context lines in unified diff, 0 to 50 |

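
The limits in the parameter tables can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: the helper names `check_texts`, `build_similarity_payload`, and `build_diff_payload` are hypothetical, and it assumes the API counts characters the same way Python's `len()` does (Unicode code points).

```python
MAX_TEXT_CHARS = 1_048_576   # documented limit for `a` and `b`
MAX_CONTEXT_LINES = 50       # documented upper bound for `context_lines`
SIMILARITY_METHODS = {"levenshtein", "cosine", "jaccard"}


def check_texts(a: str, b: str) -> None:
    """Raise ValueError if either text is not a string or is too long."""
    for name, text in (("a", a), ("b", b)):
        if not isinstance(text, str):
            raise ValueError(f"{name} must be a string")
        # Assumption: the server counts characters as code points, like len().
        if len(text) > MAX_TEXT_CHARS:
            raise ValueError(f"{name} exceeds {MAX_TEXT_CHARS} characters")


def build_similarity_payload(a: str, b: str, method: str) -> dict:
    """Return a request body for POST /v1/text/similarity (hypothetical helper)."""
    check_texts(a, b)
    if method not in SIMILARITY_METHODS:
        raise ValueError(f"method must be one of {sorted(SIMILARITY_METHODS)}")
    return {"a": a, "b": b, "method": method}


def build_diff_payload(a: str, b: str, context_lines: int) -> dict:
    """Return a request body for POST /v1/text/diff (hypothetical helper)."""
    check_texts(a, b)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(context_lines, bool) or not isinstance(context_lines, int):
        raise ValueError("context_lines must be an integer")
    if not 0 <= context_lines <= MAX_CONTEXT_LINES:
        raise ValueError(f"context_lines must be between 0 and {MAX_CONTEXT_LINES}")
    return {"a": a, "b": b, "context_lines": context_lines}
```

Failing early this way avoids uploading up to a megabyte of text only to have the request rejected.
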
Response Fields
Similarity

| Field | Type | Description |
| --- | --- | --- |
| method | string | Algorithm used |
| similarity | number | Similarity score in range 0 to 1 |
| distance | integer | Levenshtein distance (Levenshtein only) |
| max_length | integer | Max input length (Levenshtein only) |

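
To make the Levenshtein-only fields concrete, here is a local sketch of how `distance`, `max_length`, and `similarity` could relate. It assumes the score is normalized as `1 - distance / max_length`, which is a common convention but is not confirmed by this page; the function names are illustrative.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_result(a: str, b: str) -> dict:
    """Build a dict shaped like the documented similarity response."""
    distance = levenshtein_distance(a, b)
    max_length = max(len(a), len(b))
    # Assumed normalization; two empty strings are treated as identical.
    similarity = 1.0 if max_length == 0 else 1.0 - distance / max_length
    return {
        "method": "levenshtein",
        "similarity": similarity,
        "distance": distance,
        "max_length": max_length,
    }


print(levenshtein_result("kitten", "sitting"))
```

For `"kitten"` and `"sitting"` the distance is 3 and the longer input has 7 characters, so this normalization gives a similarity of 4/7.
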
Diff

| Field | Type | Description |
| --- | --- | --- |
| unified_diff | string | Unified diff text |
| changes | array | Structured operations (insert, delete, replace) |
| statistics.additions | integer | Added line count |
| statistics.deletions | integer | Deleted line count |
| statistics.unchanged | integer | Unchanged line count |
| is_identical | boolean | True when no changes detected |

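
The diff fields can be approximated offline with Python's standard `difflib`, which is handy for tests or for previewing results without calling the API. This is a sketch of the response shape only: the layout of each entry in `changes` is hypothetical, and counting a `replace` as both deletions and additions is an assumption about how the service tallies its statistics.

```python
import difflib


def local_diff(a: str, b: str, context_lines: int = 3) -> dict:
    """Produce a dict shaped like the documented diff response using difflib."""
    a_lines = a.splitlines(keepends=True)
    b_lines = b.splitlines(keepends=True)

    unified = "".join(
        difflib.unified_diff(a_lines, b_lines, fromfile="a", tofile="b", n=context_lines)
    )

    changes = []
    additions = deletions = unchanged = 0
    matcher = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            unchanged += i2 - i1
            continue
        # tag is "insert", "delete", or "replace"
        deletions += i2 - i1
        additions += j2 - j1
        changes.append({
            "type": tag,
            "a_lines": a_lines[i1:i2],  # hypothetical field names
            "b_lines": b_lines[j1:j2],
        })

    return {
        "unified_diff": unified,
        "changes": changes,
        "statistics": {
            "additions": additions,
            "deletions": deletions,
            "unchanged": unchanged,
        },
        "is_identical": not changes,
    }


original = "line one\nline two\nline three\n"
modified = "line one\nline 2\nline three\nline four\n"
result = local_diff(original, modified, context_lines=2)
print(result["statistics"])
print(result["unified_diff"])
```

The service's own counts may differ if it pairs replaced lines differently, so treat the local numbers as a sanity check rather than a reference.
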
Tip
Use cosine similarity for semantic overlap in longer prose, and Levenshtein when you need edit-distance sensitivity for short strings such as titles or identifiers.
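
To see why the token-based methods suit longer prose, the sketch below computes cosine and Jaccard scores over lowercase word tokens. The tokenization and weighting here are assumptions for illustration; the service may tokenize or weight terms differently, so its scores need not match these.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list:
    """Split text into lowercase word tokens (assumed tokenization)."""
    return re.findall(r"\w+", text.lower())


def jaccard(a: str, b: str) -> float:
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(a: str, b: str) -> float:
    """Cosine of the angle between the two term-frequency vectors."""
    count_a, count_b = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(count_a[t] * count_b[t] for t in count_a.keys() & count_b.keys())
    norm_a = math.sqrt(sum(v * v for v in count_a.values()))
    norm_b = math.sqrt(sum(v * v for v in count_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


a = "The release includes search improvements and bug fixes."
b = "This release adds better search and fixes several bugs."
print("jaccard", jaccard(a, b))
print("cosine", cosine(a, b))
```

Both measures ignore word order, so reordered sentences still score highly, whereas a character-level edit distance penalizes every shifted character. That is why Levenshtein tends to fit short strings where exact spelling matters.
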