Filtering and Frequency

Three endpoints for content sanitization and lexical frequency analysis.

Method	Endpoint	Purpose
`POST`	`/v1/text/mask`	Detect and mask PII patterns
`POST`	`/v1/text/profanity`	Detect and optionally mask profane words
`POST`	`/v1/text/word-frequency`	Return top word frequencies and percentages

REST API Examples

Mask PII

curl -X POST "https://textanalysis.toolkitapi.io/v1/text/mask" \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Contact user@example.com or call 555-0100"}'

Check profanity

curl -X POST "https://textanalysis.toolkitapi.io/v1/text/profanity" \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Some text to check"}'

const resp = await fetch("https://textanalysis.toolkitapi.io/v1/text/profanity", {
  method: "POST",
  headers: { "X-API-Key": "YOUR_KEY", "Content-Type": "application/json" },
  body: JSON.stringify({ text: "Some text to check" }),
});
const data = await resp.json();
console.log(`Contains profanity: ${data.is_profane}`);

Word frequency

curl -X POST "https://textanalysis.toolkitapi.io/v1/text/word-frequency" \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "The API returns structured data about DNS records and domain data"}'

Python SDK Examples

Mask PII

from toolkitapi import TextAnalysis

text = "Contact user@example.com, SSN 123-45-6789, card 4111 1111 1111 1111"

with TextAnalysis(api_key="tk_...") as ta:
    result = ta.pii_mask(
        text=text,
        mask_char="*",
        types=["email", "ssn", "credit_card"],
    )

print(result["masked_text"])
print(result["detection_count"])

Profanity detection only

from toolkitapi import TextAnalysis

with TextAnalysis(api_key="tk_...") as ta:
    check = ta.profanity(text="sample text", check_only=True)

print(check["is_profane"], check["profanity_count"])

Profanity cleaning

from toolkitapi import TextAnalysis

with TextAnalysis(api_key="tk_...") as ta:
    cleaned = ta.profanity(text="sample text", mask_char="#", check_only=False)

print(cleaned["cleaned_text"])

Word frequency

from toolkitapi import TextAnalysis

text = "logs logs traces alerts logs traces dashboards"

with TextAnalysis(api_key="tk_...") as ta:
    result = ta.word_frequency(
        text=text,
        top_n=10,
        min_length=3,
        exclude_stop_words=True,
    )

print(result["total_words"], result["unique_words"])
for item in result["frequencies"]:
    print(item["word"], item["count"], item["percentage"])

Request Parameters

POST /v1/text/mask

Parameter	Type	Description
`text`	string	Input text, max 1048576 characters
`mask_char`	string	Single mask character
`types`	array	Any of `email`, `phone`, `ssn`, `credit_card`, `ipv4`, `date_of_birth`
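
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: `build_mask_payload` is a hypothetical helper name, and treating `mask_char` and `types` as optional is an assumption based on the curl example, which sends only `text`.

```python
MAX_TEXT_CHARS = 1_048_576  # documented maximum input length
PII_TYPES = {"email", "phone", "ssn", "credit_card", "ipv4", "date_of_birth"}


def build_mask_payload(text, mask_char=None, types=None):
    """Hypothetical helper: validate inputs locally and return the JSON body."""
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")

    payload = {"text": text}

    if mask_char is not None:
        # The table describes mask_char as a single character.
        if len(mask_char) != 1:
            raise ValueError("mask_char must be exactly one character")
        payload["mask_char"] = mask_char

    if types is not None:
        unknown = set(types) - PII_TYPES
        if unknown:
            raise ValueError(f"unknown PII types: {sorted(unknown)}")
        payload["types"] = list(types)

    return payload


payload = build_mask_payload("Call 555-0100", mask_char="*", types=["phone"])
print(payload)
```

Failing fast on the client avoids spending a request on input the service would presumably reject anyway.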

POST /v1/text/profanity

Parameter	Type	Description
`text`	string	Input text, max 1048576 characters
`mask_char`	string	Single replacement character
`check_only`	boolean	If true, detect only and do not replace

POST /v1/text/word-frequency

Parameter	Type	Description
`text`	string	Input text, max 1048576 characters
`top_n`	integer	Number of terms to return, 1 to 500
`min_length`	integer	Minimum token length, 1 to 50
`exclude_stop_words`	boolean	Remove common English stop words

Response Fields

PII mask

Field	Type	Description
`masked_text`	string	Text after masking
`detections`	array	Match objects with type, original, masked, start, end
`detection_count`	integer	Number of detections
`types_checked`	array	PII types evaluated
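
One way to consume these fields is to tally detections by type, for example to feed a metrics counter. The sketch below assumes each detection object carries a `type` key as listed above. The `sample` dictionary is hand-written to match the documented field names and is not real API output. The `start`/`end` values in it assume zero-based offsets with an exclusive end, which the source does not confirm.

```python
from collections import Counter


def summarize_detections(response):
    """Count detections per PII type in a mask response dictionary."""
    counts = Counter(d["type"] for d in response.get("detections", []))
    return dict(counts)


# Hand-written illustration shaped like the documented fields (assumed values).
sample = {
    "masked_text": "Contact " + "*" * 16 + " or call " + "*" * 8,
    "detections": [
        {"type": "email", "original": "user@example.com",
         "masked": "*" * 16, "start": 8, "end": 24},
        {"type": "phone", "original": "555-0100",
         "masked": "*" * 8, "start": 33, "end": 41},
    ],
    "detection_count": 2,
    "types_checked": ["email", "phone"],
}

summary = summarize_detections(sample)
print(summary)
```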

Profanity

Field	Type	Description
`is_profane`	boolean	Whether profanity exists
`profanity_count`	integer	Number of matched words
`matches`	array	Matched word objects
`cleaned_text`	string	Cleaned output when check_only is false

Word frequency

Field	Type	Description
`total_words`	integer	Total tokens analyzed
`unique_words`	integer	Unique tokens
`top_n`	integer	Returned terms count
`frequencies`	array	Objects with `word`, `count`, and `percentage`
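
To make the relationship between these fields concrete, here is a local approximation that produces a dictionary of the same shape. It is a sketch under stated assumptions, not the service's algorithm. The regex tokenizer, the tiny stop-word list, computing `percentage` against the filtered token total, rounding to two decimals, and reading `top_n` as the number of terms actually returned are all guesses. `local_word_frequency` is a hypothetical name.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; the real service's list is unknown.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "about"}


def local_word_frequency(text, top_n=10, min_length=1, exclude_stop_words=False):
    """Approximate the documented response shape with local counting."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    tokens = [t for t in tokens if len(t) >= min_length]
    if exclude_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]

    counts = Counter(tokens)
    total = len(tokens)
    # most_common() yields nothing for empty input, so total is never 0 here.
    frequencies = [
        {"word": word, "count": count, "percentage": round(100 * count / total, 2)}
        for word, count in counts.most_common(top_n)
    ]
    return {
        "total_words": total,
        "unique_words": len(counts),
        "top_n": len(frequencies),
        "frequencies": frequencies,
    }


result = local_word_frequency(
    "logs logs traces alerts logs traces dashboards",
    top_n=10,
    min_length=3,
    exclude_stop_words=True,
)
print(result["total_words"], result["unique_words"])
print(result["frequencies"][0])
```

Under these assumptions, `logs` appears 3 times out of 7 tokens, so its `percentage` is 42.86.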

Tip

Run PII masking (`POST /v1/text/mask`, or `pii_mask` in the Python SDK) before storing logs or forwarding payloads to third-party systems. Keeping raw identifiers out of observability pipelines reduces compliance risk.
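
One possible way to apply this tip in Python is a `logging.Filter` that masks each record before any handler writes it. The sketch below is illustrative only. `MaskingFilter`, `ListHandler`, and `local_mask` are hypothetical names, and `local_mask` is a regex stand-in that handles email addresses only so the example runs offline. In practice it would be replaced by a call to the mask endpoint.

```python
import logging
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def local_mask(text):
    """Stand-in masker (assumption); swap in a call to the mask endpoint."""
    return EMAIL_RE.sub(lambda m: "*" * len(m.group()), text)


class MaskingFilter(logging.Filter):
    """Rewrite each record's formatted message through a masking callable."""

    def __init__(self, mask):
        super().__init__()
        self._mask = mask

    def filter(self, record):
        # Format first so values passed as args are masked too.
        record.msg = self._mask(record.getMessage())
        record.args = None
        return True


class ListHandler(logging.Handler):
    """Collects messages in memory; stands in for a real log sink."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(record.getMessage())


logger = logging.getLogger("masked-demo")
logger.setLevel(logging.INFO)
logger.propagate = False

handler = ListHandler()
handler.addFilter(MaskingFilter(local_mask))
logger.addHandler(handler)

logger.info("signup from %s", "user@example.com")
print(handler.lines)
```

Attaching the filter to the handler rather than to individual call sites means new log statements are covered without extra work.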
