Filtering and Frequency¶
Three endpoints for content sanitization and lexical frequency analysis.
| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | /v1/text/pii-mask | Detect and mask PII patterns |
| POST | /v1/text/profanity | Detect and optionally mask profane words |
| POST | /v1/text/word-frequency | Return top word frequencies and percentages |
Python SDK Examples¶
Mask PII¶
from toolkitapi import TextAnalysis
text = "Contact jane.doe@example.com, SSN 123-45-6789, card 4111 1111 1111 1111"
with TextAnalysis(api_key="tk_...") as ta:
    # Limit detection to the listed PII types and mask each match with "*"
    result = ta.pii_mask(
        text=text,
        mask_char="*",
        types=["email", "ssn", "credit_card"],
    )
    print(result["masked_text"])
    print(result["detection_count"])
Profanity detection only¶
from toolkitapi import TextAnalysis
with TextAnalysis(api_key="tk_...") as ta:
    # check_only=True reports matches without replacing them
    check = ta.profanity(text="sample text", check_only=True)
    print(check["is_profane"], check["profanity_count"])
Profanity cleaning¶
from toolkitapi import TextAnalysis
with TextAnalysis(api_key="tk_...") as ta:
    # check_only=False replaces matched words using mask_char
    cleaned = ta.profanity(text="sample text", mask_char="#", check_only=False)
    print(cleaned["cleaned_text"])
Word frequency¶
from toolkitapi import TextAnalysis
text = "logs logs traces alerts logs traces dashboards"
with TextAnalysis(api_key="tk_...") as ta:
    result = ta.word_frequency(
        text=text,
        top_n=10,
        min_length=3,
        exclude_stop_words=True,
    )
    print(result["total_words"], result["unique_words"])
    # Each entry holds the word, its count, and its percentage
    for item in result["frequencies"]:
        print(item["word"], item["count"], item["percentage"])
Request Parameters¶
POST /v1/text/pii-mask¶
| Parameter | Type | Description |
| --- | --- | --- |
| text | string | Input text, max 1048576 characters |
| mask_char | string | Single mask character |
| types | array | Any of email, phone, ssn, credit_card, ipv4, date_of_birth |
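The limits in this table can be checked client-side before a request is sent. The sketch below is a hypothetical helper (`build_pii_mask_payload` is not part of the SDK). It assumes the JSON body uses the parameter names listed above; the `*` default and the choice to omit `types` when none are given are illustrative only, since the service defaults are not stated here.

```python
# Hypothetical client-side check; not part of the toolkitapi SDK.
MAX_TEXT_CHARS = 1_048_576  # limit documented for the text parameter
PII_TYPES = {"email", "phone", "ssn", "credit_card", "ipv4", "date_of_birth"}


def build_pii_mask_payload(text, mask_char="*", types=None):
    """Validate arguments against the documented limits and return a request body."""
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    if len(mask_char) != 1:
        raise ValueError("mask_char must be a single character")
    payload = {"text": text, "mask_char": mask_char}
    if types is not None:
        unknown = sorted(set(types) - PII_TYPES)
        if unknown:
            raise ValueError(f"unsupported PII types: {unknown}")
        payload["types"] = list(types)
    return payload


payload = build_pii_mask_payload("SSN 123-45-6789", types=["ssn"])
print(payload)
```

Rejecting bad input locally avoids a round trip for requests that would likely fail anyway, though the service remains the authority on what it accepts.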
POST /v1/text/profanity¶
| Parameter | Type | Description |
| --- | --- | --- |
| text | string | Input text, max 1048576 characters |
| mask_char | string | Single replacement character |
| check_only | boolean | If true, detect only and do not replace |
POST /v1/text/word-frequency¶
| Parameter | Type | Description |
| --- | --- | --- |
| text | string | Input text, max 1048576 characters |
| top_n | integer | Number of terms to return, 1 to 500 |
| min_length | integer | Minimum token length, 1 to 50 |
| exclude_stop_words | boolean | Remove common English stop words |
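To make the effect of these parameters concrete, here is a rough local approximation. It is not the service's implementation: the tokenization rule, the tiny stop-word list, the defaults, and the way the percentage is computed (count divided by the number of tokens left after filtering) are all assumptions, so real responses may differ.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; the service's actual list is not documented here.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}


def local_word_frequency(text, top_n=10, min_length=1, exclude_stop_words=False):
    """Approximate how top_n, min_length, and exclude_stop_words shape the result."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())  # assumed tokenization
    tokens = [t for t in tokens if len(t) >= min_length]
    if exclude_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    counts = Counter(tokens)
    total = len(tokens)
    # most_common(top_n) keeps only the top_n highest counts
    return [
        {"word": word, "count": count, "percentage": round(100 * count / total, 2)}
        for word, count in counts.most_common(top_n)
    ]


sample_text = "logs logs traces alerts logs traces dashboards"
for row in local_word_frequency(sample_text, top_n=2, min_length=3):
    print(row)
```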
Response Fields¶
PII mask¶
| Field | Type | Description |
| --- | --- | --- |
| masked_text | string | Text after masking |
| detections | array | Match objects with type, original, masked, start, end |
| detection_count | integer | Number of detections |
| types_checked | array | PII types evaluated |
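One way to consume these fields is to reduce the detections to per-type counts. The sketch below uses a hand-written sample shaped like the table above; the sample values, including the mask shape and the offsets, are illustrative only, and `summarize_detections` is a hypothetical helper.

```python
from collections import Counter

# Hand-written sample shaped like the documented response; values are illustrative only.
sample = {
    "masked_text": "SSN ***********",
    "detections": [
        {"type": "ssn", "original": "123-45-6789", "masked": "***********",
         "start": 4, "end": 15},
    ],
    "detection_count": 1,
    "types_checked": ["email", "ssn"],
}


def summarize_detections(result):
    """Return per-type counts and the checked types that produced no match."""
    by_type = Counter(d["type"] for d in result["detections"])
    unmatched = [t for t in result["types_checked"] if t not in by_type]
    return dict(by_type), unmatched


counts, unmatched = summarize_detections(sample)
print(counts, unmatched)
```

Each detection carries the `original` value, so the summary keeps only counts and type names. That makes it safer to log than the raw response.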
Profanity¶
| Field | Type | Description |
| --- | --- | --- |
| is_profane | boolean | Whether profanity exists |
| profanity_count | integer | Number of matched words |
| matches | array | Matched word objects |
| cleaned_text | string | Cleaned output when check_only is false |
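Since `cleaned_text` is described as present only when `check_only` is false, calling code may want to handle both modes. The sketch below assumes the field is absent or null in check-only mode, which the table does not state explicitly; `resolve_text` is a hypothetical helper.

```python
def resolve_text(original, result):
    """Pick the text to keep based on a profanity response.

    Returns cleaned_text when the response includes it, the original text when
    nothing was flagged, and None when profanity was found but not cleaned.
    """
    cleaned = result.get("cleaned_text")  # assumed missing or None in check-only mode
    if cleaned is not None:
        return cleaned
    return None if result["is_profane"] else original


# Hand-written responses for illustration only.
check_only_hit = {"is_profane": True, "profanity_count": 1, "matches": []}
print(resolve_text("some input", check_only_hit))
```

Returning `None` for flagged but uncleaned text forces the caller to decide whether to reject the input or request a cleaned version.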
Word frequency¶
| Field | Type | Description |
| --- | --- | --- |
| total_words | integer | Total tokens analyzed |
| unique_words | integer | Unique tokens |
| top_n | integer | Number of terms returned |
| frequencies | array | Objects with word, count, and percentage |
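A small derived metric shows how these fields relate: the share of all analyzed tokens covered by the returned terms. The sample response below is hand-written for illustration, and the sketch assumes `total_words` counts the same tokens that the `count` values are drawn from. `top_terms_coverage` is a hypothetical helper.

```python
def top_terms_coverage(result):
    """Fraction of analyzed tokens accounted for by the returned terms."""
    if result["total_words"] == 0:
        return 0.0
    covered = sum(item["count"] for item in result["frequencies"])
    return covered / result["total_words"]


# Hand-written sample shaped like the documented response; values are illustrative only.
sample = {
    "total_words": 7,
    "unique_words": 4,
    "top_n": 2,
    "frequencies": [
        {"word": "logs", "count": 3, "percentage": 42.86},
        {"word": "traces", "count": 2, "percentage": 28.57},
    ],
}

print(f"{top_terms_coverage(sample):.0%} of tokens are covered by the top terms")
```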
Tip
Run pii-mask before storing logs or forwarding payloads to third-party systems. Keeping raw identifiers out of observability pipelines reduces compliance risk.
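One way to apply this tip in Python is a `logging.Filter` that rewrites each record before any handler sees it. This is a sketch under stated assumptions: `MaskingFilter` and `demo_mask` are hypothetical names, and `demo_mask` is a local regex stand-in so the example runs offline. In practice the callable would send the message to pii-mask and return `masked_text`.

```python
import logging
import re


class MaskingFilter(logging.Filter):
    """Rewrite each log record's message with a masking callable."""

    def __init__(self, mask):
        super().__init__()
        self._mask = mask

    def filter(self, record):
        # Format the message first so values passed as args are masked too
        record.msg = self._mask(record.getMessage())
        record.args = None  # message is already fully formatted
        return True  # keep the record, now masked


def demo_mask(text):
    # Local stand-in for the API call; only masks email-like strings.
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "***", text)


logger = logging.getLogger("masked-demo")
logger.addFilter(MaskingFilter(demo_mask))
logger.warning("login failed for %s", "jane@example.com")
```

A filter attached to a logger applies only to records logged directly on that logger; attaching it to a handler instead covers records propagated from child loggers. A network call per log record adds latency, so batching or masking at the ingestion boundary may suit high-volume pipelines better.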