Filtering and Frequency

Three endpoints for content sanitization and lexical frequency analysis.

Method  Endpoint                 Purpose
POST    /v1/text/pii-mask        Detect and mask PII patterns
POST    /v1/text/profanity       Detect and optionally mask profane words
POST    /v1/text/word-frequency  Return top word frequencies and percentages

Python SDK Examples

Mask PII

from toolkitapi import TextAnalysis

text = "Contact jane@example.com, SSN 123-45-6789, card 4111 1111 1111 1111"

with TextAnalysis(api_key="tk_...") as ta:
    result = ta.pii_mask(
        text=text,
        mask_char="*",
        types=["email", "ssn", "credit_card"],
    )

print(result["masked_text"])
print(result["detection_count"])

Profanity detection only

from toolkitapi import TextAnalysis

with TextAnalysis(api_key="tk_...") as ta:
    check = ta.profanity(text="sample text", check_only=True)

print(check["is_profane"], check["profanity_count"])

Profanity cleaning

from toolkitapi import TextAnalysis

with TextAnalysis(api_key="tk_...") as ta:
    cleaned = ta.profanity(text="sample text", mask_char="#", check_only=False)

print(cleaned["cleaned_text"])

Word frequency

from toolkitapi import TextAnalysis

text = "logs logs traces alerts logs traces dashboards"

with TextAnalysis(api_key="tk_...") as ta:
    result = ta.word_frequency(
        text=text,
        top_n=10,
        min_length=3,
        exclude_stop_words=True,
    )

print(result["total_words"], result["unique_words"])
for item in result["frequencies"]:
    print(item["word"], item["count"], item["percentage"])

Request Parameters

POST /v1/text/pii-mask

Parameter  Type    Description
text       string  Input text, max 1048576 characters
mask_char  string  Single mask character
types      array   Any of email, phone, ssn, credit_card, ipv4, date_of_birth
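
The limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: `build_pii_mask_request` is a hypothetical helper, and both the `"*"` default for `mask_char` and the idea that `types` may be omitted are assumptions.

```python
# Hypothetical client-side checks. The limits come from the parameter table;
# the helper name, the "*" default, and optional `types` are assumptions.
MAX_TEXT_CHARS = 1_048_576
ALLOWED_PII_TYPES = {"email", "phone", "ssn", "credit_card", "ipv4", "date_of_birth"}


def build_pii_mask_request(text, mask_char="*", types=None):
    """Return a JSON-ready body for POST /v1/text/pii-mask, or raise ValueError."""
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    if len(mask_char) != 1:
        raise ValueError("mask_char must be exactly one character")
    body = {"text": text, "mask_char": mask_char}
    if types is not None:
        unknown = sorted(set(types) - ALLOWED_PII_TYPES)
        if unknown:
            raise ValueError(f"unsupported PII types: {unknown}")
        body["types"] = list(types)
    return body


payload = build_pii_mask_request("SSN 123-45-6789", types=["ssn"])
print(payload)
```

Rejecting a bad `mask_char` or an unknown type locally avoids a round trip for requests that would presumably fail anyway.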

POST /v1/text/profanity

Parameter   Type     Description
text        string   Input text, max 1048576 characters
mask_char   string   Single replacement character
check_only  boolean  If true, detect only and do not replace

POST /v1/text/word-frequency

Parameter           Type     Description
text                string   Input text, max 1048576 characters
top_n               integer  Number of terms to return, 1 to 500
min_length          integer  Minimum token length, 1 to 50
exclude_stop_words  boolean  Remove common English stop words
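
To make the effect of `top_n`, `min_length`, and `exclude_stop_words` concrete, here is a rough local approximation of the computation. It is a sketch under assumptions, not the service's algorithm: the tokenizer regex, the tiny `STOP_WORDS` set, counting `total_words` after filtering, and `percentage` as count divided by total are all guesses made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; the service's real list is not documented here.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def local_word_frequency(text, top_n=10, min_length=3, exclude_stop_words=True):
    """Approximate the word-frequency response shape with local counting."""
    # Assumed tokenization: lowercase runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    tokens = [t for t in tokens if len(t) >= min_length]
    if exclude_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    counts = Counter(tokens)
    total = len(tokens)
    frequencies = [
        {"word": word, "count": count, "percentage": round(100 * count / total, 2)}
        for word, count in counts.most_common(top_n)
    ]
    return {
        "total_words": total,
        "unique_words": len(counts),
        "top_n": len(frequencies),
        "frequencies": frequencies,
    }


result = local_word_frequency("logs logs traces alerts logs traces dashboards")
for item in result["frequencies"]:
    print(item["word"], item["count"], item["percentage"])
```

A local version like this can help sanity-check the shape of a response, but the counts it produces may differ from the API's wherever the tokenization or stop-word list differs.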

Response Fields

PII mask

Field            Type     Description
masked_text      string   Text after masking
detections       array    Match objects with type, original, masked, start, end
detection_count  integer  Number of detections
types_checked    array    PII types evaluated
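
Since each detection carries the `original` value, a response may itself contain PII. The sketch below works on a hand-written dictionary shaped like the fields above (the values are illustrative, and treating `end` as an exclusive offset is an assumption). `summarize_detections` and `strip_originals` are hypothetical helpers, not SDK functions.

```python
from collections import Counter

# Hand-written sample shaped like the response fields above; values are illustrative.
sample_response = {
    "masked_text": "SSN " + "*" * 11 + ", mail " + "*" * 15,
    "detections": [
        {"type": "ssn", "original": "123-45-6789", "masked": "*" * 11, "start": 4, "end": 15},
        {"type": "email", "original": "ops@example.com", "masked": "*" * 15, "start": 22, "end": 37},
    ],
    "detection_count": 2,
    "types_checked": ["email", "ssn"],
}


def summarize_detections(response):
    """Count detections per PII type and cross-check detection_count."""
    by_type = Counter(d["type"] for d in response["detections"])
    if sum(by_type.values()) != response["detection_count"]:
        raise ValueError("detection_count does not match detections")
    return dict(by_type)


def strip_originals(response):
    """Return a copy without the raw matched values, which is safer to log."""
    safe = dict(response)
    safe["detections"] = [
        {k: v for k, v in d.items() if k != "original"} for d in response["detections"]
    ]
    return safe


print(summarize_detections(sample_response))
print(strip_originals(sample_response)["detections"])
```

Dropping `original` before a response is logged or forwarded keeps the masking step from reintroducing the values it was meant to hide.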

Profanity

Field            Type     Description
is_profane       boolean  Whether profanity exists
profanity_count  integer  Number of matched words
matches          array    Matched word objects
cleaned_text     string   Cleaned output when check_only is false
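
Because `cleaned_text` is only described for `check_only` set to false, calling code may want to handle its absence explicitly. A minimal sketch, assuming the field is missing or null in check-only responses (the table does not say which); `text_to_store` is a hypothetical helper and the sample dictionaries are hand-written.

```python
def text_to_store(original, response):
    """Pick the text to persist based on a profanity response dictionary."""
    if not response["is_profane"]:
        return original
    # Assumption: check-only responses omit cleaned_text or set it to None.
    cleaned = response.get("cleaned_text")
    if cleaned is None:
        raise ValueError("profanity found but no cleaned_text; was check_only true?")
    return cleaned


# Hand-written responses shaped like the fields above; values are illustrative.
clean = {"is_profane": False, "profanity_count": 0, "matches": []}
masked = {"is_profane": True, "profanity_count": 1, "matches": [], "cleaned_text": "some #### text"}

print(text_to_store("plain text", clean))
print(text_to_store("some word text", masked))
```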

Word frequency

Field         Type     Description
total_words   integer  Total tokens analyzed
unique_words  integer  Number of unique tokens
top_n         integer  Number of terms returned
frequencies   array    Objects with word, count, and percentage

Tip

Run pii-mask before storing logs or forwarding payloads to third-party systems. Keeping raw identifiers out of observability pipelines reduces compliance risk.
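
One way to apply this tip is a `logging.Filter` that masks each message before any handler stores it. The sketch below is an assumption-laden illustration: `PiiMaskFilter` and `ListHandler` are hypothetical names, and `stand_in_mask` is a local regex placeholder used so the example runs offline. In practice the callable passed to the filter would wrap a call to pii-mask and return its `masked_text`.

```python
import logging
import re

# Stand-in masker so the sketch runs offline; it only handles simple emails.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def stand_in_mask(text):
    return EMAIL_RE.sub("***", text)


class PiiMaskFilter(logging.Filter):
    """Rewrite each record's message through a masking callable."""

    def __init__(self, mask):
        super().__init__()
        self._mask = mask

    def filter(self, record):
        # Format first, then mask, so values passed as args are covered too.
        record.msg = self._mask(record.getMessage())
        record.args = None
        return True


class ListHandler(logging.Handler):
    """Collect formatted messages in memory for demonstration."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(record.getMessage())


logger = logging.getLogger("pii_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
handler.addFilter(PiiMaskFilter(stand_in_mask))
logger.addHandler(handler)

logger.info("login from %s", "ops@example.com")
print(handler.lines)
```

Attaching the filter to the handler rather than to individual call sites means every message routed through that handler is masked, including ones added later.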