Similarity and Diff

Two endpoints for pairwise text comparison and line-level diffing.

| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | /v1/text/similarity | Compute similarity with Levenshtein, cosine, or Jaccard |
| POST | /v1/text/diff | Return unified and structured text differences |

REST API Examples

Compare text similarity

curl -X POST "https://textanalysis.toolkitapi.io/v1/text/similarity" \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"a": "The cat sat on the mat", "b": "The cat sat on a rug", "method": "cosine"}'

const resp = await fetch("https://textanalysis.toolkitapi.io/v1/text/similarity", {
  method: "POST",
  headers: { "X-API-Key": "YOUR_KEY", "Content-Type": "application/json" },
  body: JSON.stringify({ a: "Hello world", b: "Hello earth", method: "cosine" }),
});
const data = await resp.json();
console.log(`Similarity: ${data.similarity}`);

Python SDK Examples

Similarity by method

from toolkitapi import TextAnalysis

a = "The release includes search improvements and bug fixes."
b = "This release adds better search and fixes several bugs."

with TextAnalysis(api_key="tk_...") as ta:
    for method in ["levenshtein", "cosine", "jaccard"]:
        result = ta.similarity(a=a, b=b, method=method)
        print(method, result["similarity"])

Unified diff

from toolkitapi import TextAnalysis

original = "line one\nline two\nline three\n"
modified = "line one\nline 2\nline three\nline four\n"

with TextAnalysis(api_key="tk_...") as ta:
    diff = ta.diff(a=original, b=modified, context_lines=2)

print(diff["is_identical"])
print(diff["statistics"])
print(diff["unified_diff"])

Request Parameters

POST /v1/text/similarity

| Parameter | Type | Description |
| --- | --- | --- |
| a | string | First text, max 1048576 characters |
| b | string | Second text, max 1048576 characters |
| method | string | levenshtein, cosine, or jaccard |
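
The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: `build_similarity_payload` is a hypothetical helper, and it assumes the character limit counts Python string length, which may differ from how the service measures it.

```python
import json

MAX_CHARS = 1_048_576  # documented per-field limit
METHODS = {"levenshtein", "cosine", "jaccard"}  # documented method names


def build_similarity_payload(a: str, b: str, method: str) -> str:
    """Return a JSON request body, raising ValueError on out-of-range input.

    Hypothetical helper; the length check assumes the limit is in code points.
    """
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    for name, text in (("a", a), ("b", b)):
        if len(text) > MAX_CHARS:
            raise ValueError(f"{name} exceeds {MAX_CHARS} characters")
    return json.dumps({"a": a, "b": b, "method": method})


body = build_similarity_payload("Hello world", "Hello earth", "cosine")
print(body)
```

The resulting string can be passed as the body of the POST request. Rejecting bad input locally avoids a round trip for requests the service would presumably refuse.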

POST /v1/text/diff

| Parameter | Type | Description |
| --- | --- | --- |
| a | string | Original text, max 1048576 characters |
| b | string | Modified text, max 1048576 characters |
| context_lines | integer | Context lines in unified diff, 0 to 50 |
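
To see what `context_lines` controls, the effect can be reproduced locally with Python's standard `difflib` module. This is only an illustration of unified-diff context: it assumes the service's output resembles `difflib.unified_diff`, which this page does not confirm, and `local_unified_diff` is a hypothetical name.

```python
import difflib


def local_unified_diff(a: str, b: str, context_lines: int = 3) -> str:
    """Line-level unified diff of two texts using the standard library."""
    if not 0 <= context_lines <= 50:  # documented range for the API parameter
        raise ValueError("context_lines must be between 0 and 50")
    lines = difflib.unified_diff(
        a.splitlines(keepends=True),
        b.splitlines(keepends=True),
        fromfile="a",
        tofile="b",
        n=context_lines,  # unchanged lines shown around each change
    )
    return "".join(lines)


original = "line one\nline two\nline three\n"
modified = "line one\nline 2\nline three\nline four\n"

print(local_unified_diff(original, modified, context_lines=0))
print(local_unified_diff(original, modified, context_lines=2))
```

With no context, the two changes appear as two separate hunks. With two context lines, the surrounding unchanged lines overlap and the changes merge into a single hunk.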

Response Fields

Similarity

| Field | Type | Description |
| --- | --- | --- |
| method | string | Algorithm used |
| similarity | number | Similarity score in range 0 to 1 |
| distance | integer | Levenshtein distance (Levenshtein only) |
| max_length | integer | Max input length (Levenshtein only) |
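
The `distance` and `max_length` fields suggest that the Levenshtein score is the edit distance normalized by the longer input. The exact formula is not stated here, so the sketch below assumes the common convention `1 - distance / max_length`. Both function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / max(len(a), len(b))."""
    max_length = max(len(a), len(b))
    if max_length == 0:
        return 1.0  # two empty strings are treated as identical here
    return 1.0 - levenshtein(a, b) / max_length


print(levenshtein("kitten", "sitting"))            # 3
print(normalized_similarity("kitten", "sitting"))  # 1 - 3/7
```

Under this assumption, a response with `distance` 3 and `max_length` 7 would correspond to a `similarity` of roughly 0.57.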

Diff

| Field | Type | Description |
| --- | --- | --- |
| unified_diff | string | Unified diff text |
| changes | array | Structured operations (insert, delete, replace) |
| statistics.additions | integer | Added line count |
| statistics.deletions | integer | Deleted line count |
| statistics.unchanged | integer | Unchanged line count |
| is_identical | boolean | True when no changes detected |
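
The relationship between `changes`, `statistics`, and `is_identical` can be approximated locally with `difflib.SequenceMatcher`, whose opcodes use the same insert, delete, and replace vocabulary. This is a rough sketch built on several assumptions: the keys inside each change entry (`op`, `old`, `new`) are hypothetical, and counting a replace as both deletions and additions is a guess at how the service tallies lines.

```python
import difflib


def local_diff_summary(a: str, b: str) -> dict:
    """Approximate the documented diff fields for two texts, line by line."""
    a_lines, b_lines = a.splitlines(), b.splitlines()
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    changes = []
    stats = {"additions": 0, "deletions": 0, "unchanged": 0}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            stats["unchanged"] += i2 - i1
            continue
        # Assumption: a "replace" counts as deleted old lines plus added new lines.
        stats["deletions"] += i2 - i1
        stats["additions"] += j2 - j1
        # Hypothetical entry shape; the real "changes" items may differ.
        changes.append({"op": tag, "old": a_lines[i1:i2], "new": b_lines[j1:j2]})
    return {"changes": changes, "statistics": stats, "is_identical": not changes}


summary = local_diff_summary(
    "line one\nline two\nline three\n",
    "line one\nline 2\nline three\nline four\n",
)
print(summary["statistics"])
print([change["op"] for change in summary["changes"]])
```

For these inputs the sketch reports one replaced line and one inserted line, which it counts as two additions, one deletion, and two unchanged lines.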

Tip

Use cosine similarity for semantic overlap in longer prose, and Levenshtein when you need edit-distance sensitivity for short strings such as titles or identifiers.
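
To make the trade-off concrete, here is a small local sketch of token-based cosine and Jaccard similarity. It assumes lowercase word tokens and raw term counts. How the service tokenizes or weights terms is not described here, so scores from the API may differ.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase word tokens (an assumed tokenization)."""
    return re.findall(r"\w+", text.lower())


def cosine(a: str, b: str) -> float:
    """Cosine of the angle between term-count vectors."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0  # undefined for empty input; 0.0 chosen here


def jaccard(a: str, b: str) -> float:
    """Shared distinct tokens divided by all distinct tokens."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0  # same convention as above


a = "The cat sat on the mat"
b = "The cat sat on a rug"
print(round(cosine(a, b), 3), round(jaccard(a, b), 3))
```

Both measures ignore word order, so reordered sentences with the same vocabulary score as near-identical. An edit-distance measure would penalize the same reordering heavily, which is why it suits short strings where character position matters.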