Similarity and Diff

Two endpoints for pairwise text comparison and line-level diffing.

Method  Endpoint             Purpose
POST    /v1/text/similarity  Compute similarity with Levenshtein, cosine, or Jaccard
POST    /v1/text/diff        Return unified and structured text differences

Python SDK Examples

Similarity by method

from toolkitapi import TextAnalysis

a = "The release includes search improvements and bug fixes."
b = "This release adds better search and fixes several bugs."

# Score the same pair of texts with each supported algorithm
with TextAnalysis(api_key="tk_...") as ta:
    for method in ["levenshtein", "cosine", "jaccard"]:
        result = ta.similarity(a=a, b=b, method=method)
        print(method, result["similarity"])  # score between 0 and 1

Unified diff

from toolkitapi import TextAnalysis

original = "line one\nline two\nline three\n"
modified = "line one\nline 2\nline three\nline four\n"

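with TextAnalysis(api_key="tk_...") as ta:
    # context_lines sets how many unchanged lines surround each change (0 to 50)
    diff = ta.diff(a=original, b=modified, context_lines=2)

print(diff["is_identical"])  # False: the two texts differ
print(diff["statistics"])    # added, deleted, and unchanged line counts
print(diff["unified_diff"])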
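For offline previews or tests, Python's standard `difflib` can produce the same kind of unified diff locally. This is a sketch, not the service: `local_unified_diff` is a hypothetical helper, `difflib`'s `n` argument stands in for `context_lines`, and the service's headers and hunk formatting may differ from what `difflib` emits.

```python
import difflib


def local_unified_diff(original: str, modified: str, context_lines: int = 3) -> str:
    """Build a unified diff with the standard library (a local stand-in, not the API)."""
    # keepends=True preserves each "\n" so the joined output stays line-oriented;
    # this assumes both texts end with a newline, as the example inputs do
    a_lines = original.splitlines(keepends=True)
    b_lines = modified.splitlines(keepends=True)
    # n plays the role of context_lines: unchanged lines kept around each change
    hunks = difflib.unified_diff(a_lines, b_lines, fromfile="a", tofile="b", n=context_lines)
    return "".join(hunks)


original = "line one\nline two\nline three\n"
modified = "line one\nline 2\nline three\nline four\n"
print(local_unified_diff(original, modified, context_lines=2))
```

Identical inputs produce an empty string, which mirrors `is_identical` being true.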

Request Parameters

POST /v1/text/similarity

Parameter  Type     Description
a          string   First text, up to 1,048,576 characters
b          string   Second text, up to 1,048,576 characters
method     string   Algorithm: levenshtein, cosine, or jaccard

POST /v1/text/diff

Parameter      Type     Description
a              string   Original text, up to 1,048,576 characters
b              string   Modified text, up to 1,048,576 characters
context_lines  integer  Unchanged lines shown around each change in unified_diff, 0 to 50
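The documented limits for both endpoints can be checked client-side so a bad request fails fast instead of costing a round trip. A minimal sketch: `MAX_CHARS`, `similarity_payload`, and `diff_payload` are hypothetical names, and the length check assumes the limit counts Unicode code points as Python's `len` does, which the service may measure differently.

```python
from typing import Any

MAX_CHARS = 1_048_576  # documented per-field limit for a and b
SIMILARITY_METHODS = {"levenshtein", "cosine", "jaccard"}


def _check_texts(a: str, b: str) -> None:
    """Reject non-string or over-long a/b values before sending anything."""
    for name, value in (("a", a), ("b", b)):
        if not isinstance(value, str):
            raise TypeError(f"{name} must be a string")
        if len(value) > MAX_CHARS:
            raise ValueError(f"{name} exceeds {MAX_CHARS} characters")


def similarity_payload(a: str, b: str, method: str) -> dict[str, Any]:
    """Build a request body for POST /v1/text/similarity."""
    _check_texts(a, b)
    if method not in SIMILARITY_METHODS:
        raise ValueError(f"method must be one of {sorted(SIMILARITY_METHODS)}")
    return {"a": a, "b": b, "method": method}


def diff_payload(a: str, b: str, context_lines: int) -> dict[str, Any]:
    """Build a request body for POST /v1/text/diff."""
    _check_texts(a, b)
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(context_lines, bool) or not isinstance(context_lines, int):
        raise TypeError("context_lines must be an integer")
    if not 0 <= context_lines <= 50:
        raise ValueError("context_lines must be between 0 and 50")
    return {"a": a, "b": b, "context_lines": context_lines}
```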

Response Fields

Similarity

Field       Type     Description
method      string   Algorithm used
similarity  number   Score from 0 to 1; higher means more similar
distance    integer  Levenshtein edit distance (Levenshtein only)
max_length  integer  Length of the longer input (Levenshtein only)
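The `distance` and `max_length` fields suggest the Levenshtein score is a normalized edit distance. The sketch below assumes `similarity = 1 - distance / max_length`; that formula is an inference from the field names, not something this page confirms, so compare it against real responses before relying on it. `levenshtein` and `levenshtein_similarity` are hypothetical helpers.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance with unit-cost insert, delete, substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> dict:
    """Return a response-shaped dict under the assumed normalization."""
    distance = levenshtein(a, b)
    max_length = max(len(a), len(b))
    # Assumed convention: two empty strings count as identical
    similarity = 1.0 if max_length == 0 else 1 - distance / max_length
    return {"method": "levenshtein", "similarity": similarity,
            "distance": distance, "max_length": max_length}


print(levenshtein_similarity("kitten", "sitting"))
```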

Diff

Field                 Type     Description
unified_diff          string   Unified diff text
changes               array    Structured operations (insert, delete, replace)
statistics.additions  integer  Added line count
statistics.deletions  integer  Deleted line count
statistics.unchanged  integer  Unchanged line count
is_identical          boolean  True when no changes are detected
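Python's `difflib.SequenceMatcher` reports the same three kinds of line operations locally via `get_opcodes()`. This sketch builds a response-shaped dict; the `op`/`old`/`new` keys are hypothetical because the shape of each `changes` entry is not documented here, and counting a replace as deletions of the old lines plus additions of the new ones is an assumption about how `statistics` is computed.

```python
import difflib


def structured_changes(original: str, modified: str) -> dict:
    """Approximate the diff response fields with SequenceMatcher opcodes."""
    a_lines = original.splitlines()
    b_lines = modified.splitlines()
    matcher = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    changes = []
    stats = {"additions": 0, "deletions": 0, "unchanged": 0}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            stats["unchanged"] += i2 - i1
            continue
        # tag is "insert", "delete", or "replace"
        changes.append({"op": tag, "old": a_lines[i1:i2], "new": b_lines[j1:j2]})
        # Assumption: removed old lines are deletions, new lines are additions
        stats["deletions"] += i2 - i1
        stats["additions"] += j2 - j1
    return {"changes": changes, "statistics": stats, "is_identical": not changes}


result = structured_changes("line one\nline two\nline three\n",
                            "line one\nline 2\nline three\nline four\n")
print(result["statistics"])
```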

Tip

Use cosine similarity to compare overall vocabulary overlap in longer prose, and Levenshtein when you need character-level edit-distance sensitivity for short strings such as titles or identifiers.
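To see why the tip pairs each method with a different kind of input, here is a local sketch of bag-of-words cosine and Jaccard similarity. It assumes a simple lowercase word tokenizer and raw term-frequency weighting; the service's actual tokenization and weighting are not documented on this page, so its scores may differ. `tokens`, `cosine_similarity`, and `jaccard_similarity` are hypothetical helpers.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    # Assumed tokenizer: lowercase runs of word characters
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between term-frequency vectors (word counts matter)."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    if not ca and not cb:
        return 1.0  # assumed convention: two empty texts are identical
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(n * n for n in ca.values()))
    norm_b = math.sqrt(sum(n * n for n in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(a: str, b: str) -> float:
    """Shared vocabulary divided by combined vocabulary (counts ignored)."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0  # same assumed convention for empty input
    return len(sa & sb) / len(sa | sb)


# Both measures ignore word order, so these two sentences score as identical
x, y = "the cat chased the dog", "the dog chased the cat"
print(cosine_similarity(x, y), jaccard_similarity(x, y))
```

The main difference between the two is frequency weighting: repeating a word increases its weight in cosine but not in Jaccard. Neither is sensitive to character order, which is why Levenshtein suits short identifiers.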