Broken Links
Fetch a URL, extract up to 50 unique outbound links from its <a href> tags, HEAD-check each in parallel, and report which are broken.
"Broken" = the final response is ≥ 400, or the request timed out, or the host failed to resolve / the TLS handshake failed. Mailto, tel, javascript, and in-page (#fragment) links are skipped.
Use it for a one-click health report, a CI gate on your docs site, or as a scheduled sweep of your most-visited pages. For the fuller on-page SEO picture, combine with /v1/seo/audit.
Endpoint
GET /v1/seo/broken-links
Base URL: https://seo.toolkitapi.io
Query Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | Absolute URL whose links will be checked. `http://` or `https://`. |
The endpoint always checks at most 50 unique outbound links per call (the first 50 in document order after dedupe). mailto:, tel:, javascript:, and #fragment hrefs are skipped. Each link is HEAD-checked with a 10 s timeout, following redirects.
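To estimate locally which links a call would cover, the selection rule above can be approximated as follows. This is a minimal sketch, not the service's implementation: `select_links`, `SKIPPED_SCHEMES`, and the use of Python's standard `html.parser` are assumptions made for illustration, and the real extractor may differ in details such as `<base>` handling or URL normalization.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

MAX_LINKS = 50  # per-call cap stated in this section
SKIPPED_SCHEMES = {"mailto", "tel", "javascript"}  # hypothetical constant name


class _HrefCollector(HTMLParser):
    """Collects raw href values from <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = (dict(attrs).get("href") or "").strip()
            if href:
                self.hrefs.append(href)


def select_links(html, page_url, limit=MAX_LINKS):
    """Approximate the documented rule: skip, resolve, dedupe, keep the first `limit`."""
    collector = _HrefCollector()
    collector.feed(html)
    seen, selected = set(), []
    for href in collector.hrefs:
        if href.startswith("#"):
            continue  # in-page fragment link
        if urlsplit(href).scheme.lower() in SKIPPED_SCHEMES:
            continue  # mailto:, tel:, javascript:
        absolute = urljoin(page_url, href)  # resolve relative hrefs against the page URL
        if absolute not in seen:
            seen.add(absolute)
            selected.append(absolute)
        if len(selected) == limit:
            break
    return selected
```

Calling `select_links(page_html, "https://example.com/blog/post-123")` on a page's HTML gives a rough preview of which URLs would fall inside the 50-link window, which helps when deciding whether a long page needs to be split.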
Response Fields
| Field | Type | Description |
|---|---|---|
| `url` | string | The page URL you submitted. |
| `total_links` | integer | Total `<a href>` count on the page (before skipping mailto/tel/JS/fragment and dedupe). |
| `checked_links` | integer | How many links were actually HEAD-checked (≤ 50). |
| `broken_count` | integer | Number of entries in `links` with `is_broken: true`. |
| `links` | LinkResult[] | One entry per checked link. See below. |
LinkResult
| Field | Type | Description |
|---|---|---|
| `url` | string | The absolute URL that was checked (after resolving against the page's base URL). |
| `anchor_text` | string | Link anchor text, trimmed and truncated to 100 chars. |
| `status_code` | integer | HTTP status from the HEAD request. `0` = timeout. `-1` = network/connection error (DNS, TLS, reset). |
| `is_broken` | boolean | `true` when `status_code >= 400` or `status_code <= 0`. |
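The relationships between these fields can be checked client-side before a report is trusted. The sketch below restates the documented rules as checks. `is_broken` and `check_response` are hypothetical helper names, and `sample` is made-up data in the documented shape, not real API output.

```python
def is_broken(status_code):
    """Documented rule: HTTP errors (>= 400), timeouts (0), and connection errors (-1)."""
    return status_code >= 400 or status_code <= 0


def check_response(data):
    """Return a list of inconsistencies found in a broken-links response."""
    problems = []
    links = data["links"]
    if data["checked_links"] != len(links):
        problems.append("checked_links does not match len(links)")
    if data["checked_links"] > 50:
        problems.append("checked_links exceeds the 50-link cap")
    if data["broken_count"] != sum(1 for link in links if link["is_broken"]):
        problems.append("broken_count does not match the is_broken flags")
    for link in links:
        if link["is_broken"] != is_broken(link["status_code"]):
            problems.append(f"is_broken disagrees with status_code for {link['url']}")
    return problems


# Illustrative data only (assumed values, shaped like the tables above).
sample = {
    "url": "https://example.com/blog/post-123",
    "total_links": 4,
    "checked_links": 3,
    "broken_count": 2,
    "links": [
        {"url": "https://example.com/ok", "anchor_text": "OK",
         "status_code": 200, "is_broken": False},
        {"url": "https://example.com/gone", "anchor_text": "Gone",
         "status_code": 404, "is_broken": True},
        {"url": "https://slow.example/", "anchor_text": "Slow",
         "status_code": 0, "is_broken": True},
    ],
}

print(check_response(sample))  # prints []
```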
Examples
curl
```bash
curl -G "https://seo.toolkitapi.io/v1/seo/broken-links" \
  -H "x-api-key: $TOOLKIT_API_KEY" \
  --data-urlencode "url=https://example.com/blog/post-123"
```
Python
```python
import os

import requests

API_KEY = os.environ["TOOLKIT_API_KEY"]  # same variable as the curl example

resp = requests.get(
    "https://seo.toolkitapi.io/v1/seo/broken-links",
    params={"url": "https://example.com/blog/post-123"},
    headers={"x-api-key": API_KEY},
    timeout=60,  # links are checked concurrently (10 s each); budget generously
)
resp.raise_for_status()  # surface 400/401/422/502 instead of parsing an error body
data = resp.json()

print(f"{data['broken_count']}/{data['checked_links']} broken "
      f"(total links on page: {data['total_links']})")
for link in data["links"]:
    if link["is_broken"]:
        # Map the two sentinel codes to readable reasons; anything else is an HTTP status.
        reason = {0: "timeout", -1: "connection error"}.get(
            link["status_code"], f"HTTP {link['status_code']}"
        )
        print(f"  [{reason}] {link['url']} «{link['anchor_text']}»")
```
JavaScript
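```javascript
import * as core from "@actions/core"; // GitHub Actions toolkit, provides setFailed

// Page under test; replace with the URL your CI run should check.
const pageUrl = "https://example.com/blog/post-123";

const params = new URLSearchParams({ url: pageUrl });
const resp = await fetch(
  `https://seo.toolkitapi.io/v1/seo/broken-links?${params}`,
  { headers: { "x-api-key": process.env.TOOLKIT_API_KEY } },
);
if (!resp.ok) {
  throw new Error(`broken-links request failed with HTTP ${resp.status}`);
}
const { total_links, checked_links, broken_count, links } = await resp.json();

// Fail a preview-deploy CI run if this page introduces a broken link.
if (broken_count > 0) {
  const lines = links
    .filter((l) => l.is_broken)
    .map((l) => `  - ${l.status_code} ${l.url}`);
  core.setFailed(
    `${broken_count} broken link(s) on ${pageUrl}\n${lines.join("\n")}`,
  );
}
```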
Errors
| Status | Condition |
|---|---|
| 400 | `url` is malformed, uses an unsupported scheme, or resolves to a disallowed/internal host (SSRF-blocked). |
| 401 | Missing / invalid API key. |
| 422 | `url` query parameter is missing. |
| 502 | The page itself could not be fetched (the outbound link checks each report their own status). |
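A client can turn these statuses into actionable messages before it tries to parse a body. The following is a minimal sketch: `BrokenLinksError`, `ERROR_HINTS`, and `raise_for_api_error` are hypothetical names, the error body format is not documented here so only the status code is inspected, and treating 502 as worth retrying is an assumption (the page may only have been temporarily unreachable).

```python
# Hints paraphrased from the error table; names here are illustrative only.
ERROR_HINTS = {
    400: "url is malformed, uses an unsupported scheme, or points at a blocked/internal host",
    401: "API key is missing or invalid",
    422: "url query parameter is missing",
    502: "the page itself could not be fetched",
}
RETRYABLE = {502}  # assumption: a retry may help if the page was briefly down


class BrokenLinksError(Exception):
    """Raised when the endpoint answers with a non-2xx status."""

    def __init__(self, status, hint, retryable):
        super().__init__(f"HTTP {status}: {hint}")
        self.status = status
        self.retryable = retryable


def raise_for_api_error(status):
    """Raise BrokenLinksError for any non-2xx status; return None otherwise."""
    if 200 <= status < 300:
        return None
    hint = ERROR_HINTS.get(status, "unexpected status (not listed in the error table)")
    raise BrokenLinksError(status, hint, retryable=status in RETRYABLE)
```

With the `requests` example, this would be called as `raise_for_api_error(resp.status_code)` before `resp.json()`, letting a scheduler retry only when `err.retryable` is true.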
Notes & gotchas
- `total_links` vs `checked_links`. `total_links` counts every `<a href>` on the page (including mailto/tel/javascript/fragment). `checked_links` is capped at 50 and is what the `links` array corresponds to. A page with 400 anchors will only surface up to 50 HEAD-check results.
- Order matters. Links are kept in document order and deduplicated. If you need to cover more than 50 links, split your page or crawl section-by-section; there is no paging on this endpoint.
- `status_code: 0` is a timeout (10 s per link); `status_code: -1` is a connection / protocol error. Both count as broken. Genuinely slow-but-working links may show up as repeated `0`s, so run the check again before alerting.
- Some sites reject `HEAD` requests outright (older WordPress, a few CDNs) and return 405 or even 404 for `HEAD` while `GET` works fine. Treat persistent 405s as likely false positives and spot-check them manually.
- The checker sends a `User-Agent` of `WebScrapingToolkit/1.0` and follows redirects. Links behind Cloudflare "I'm under attack" mode or JS challenges will often return `403`.
- The endpoint is synchronous: it blocks until every checked link (up to 50) has completed or timed out. Because the checks run in parallel, budget up to ~10 s per call for the link checks, on top of the initial page fetch; use client-side concurrency to scan many pages at once.
- Links are deduplicated by absolute URL within a single call only, so a navbar link that appears on every page is still checked once for each page you scan.
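
The caveats above suggest not treating every flagged link the same way. The sketch below sorts results into three buckets. The function name `triage`, the bucket names, and the choice to group 403 with 405 as "needs manual review" are assumptions for illustration, not behavior of the API.

```python
def triage(links):
    """Split LinkResult dicts into broken / recheck / review buckets."""
    buckets = {"broken": [], "recheck": [], "review": []}
    for link in links:
        if not link["is_broken"]:
            continue
        code = link["status_code"]
        if code == 0:
            buckets["recheck"].append(link)  # timeout: may be slow but working
        elif code in (403, 405):
            buckets["review"].append(link)   # HEAD rejected or bot challenge
        else:
            buckets["broken"].append(link)   # -1 and other 4xx/5xx statuses
    return buckets
```

A CI gate could then fail only on `buckets["broken"]`, re-run the check for `buckets["recheck"]`, and post `buckets["review"]` as a warning. Note that a 404 returned only for `HEAD` cannot be told apart from a real 404 this way and still needs a manual spot-check.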