Broken Links

Fetch a URL, extract up to 50 unique outbound links from its <a href> tags, HEAD-check each one in parallel, and report which are broken.

A link is "broken" when the final response status is ≥ 400, the request timed out, or the connection failed (for example, the host did not resolve or the TLS handshake failed). mailto:, tel:, javascript:, and in-page (#fragment) links are skipped.

Use it for a one-click health report, a CI gate on your docs site, or as a scheduled sweep of your most-visited pages. For the fuller on-page SEO picture, combine with /v1/seo/audit.

Endpoint

GET /v1/seo/broken-links

Base URL: https://seo.toolkitapi.io

Query Parameters

Field	Type	Required	Description
`url`	string	Yes	Absolute URL whose links will be checked. `http://` or `https://`.

The endpoint always checks at most 50 unique outbound links per call (the first 50 in document order after dedupe). mailto:, tel:, javascript:, and #fragment hrefs are skipped. Each link is HEAD-checked with a 10 s timeout, following redirects.
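The selection rules above (skip non-HTTP schemes and fragments, resolve to absolute URLs, dedupe, keep the first 50 in document order) can be sketched client-side. This is a minimal illustration, not the service's actual implementation: `select_links` and `_HrefCollector` are hypothetical names, and the sketch assumes every remaining `<a href>` counts as a link, since the exact server-side definition of "outbound" is not documented here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Prefixes the endpoint is documented to skip.
SKIPPED_PREFIXES = ("mailto:", "tel:", "javascript:", "#")
MAX_LINKS = 50  # documented per-call cap


class _HrefCollector(HTMLParser):
    """Collects the href of every <a> tag in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = (dict(attrs).get("href") or "").strip()
            if href:
                self.hrefs.append(href)


def select_links(html, page_url, limit=MAX_LINKS):
    """Return (total_links, links_to_check) for one page (hypothetical helper)."""
    collector = _HrefCollector()
    collector.feed(html)

    seen = set()
    selected = []
    for href in collector.hrefs:
        if href.lower().startswith(SKIPPED_PREFIXES):
            continue  # mailto:, tel:, javascript:, #fragment
        absolute = urljoin(page_url, href)  # resolve relative hrefs
        if absolute in seen:
            continue  # dedupe by absolute URL
        seen.add(absolute)
        selected.append(absolute)
    # total counts every href before skipping and dedupe
    return len(collector.hrefs), selected[:limit]
```

Running this against your own HTML before calling the endpoint gives a rough idea of whether a page exceeds the 50-link cap.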

Response Fields

Field	Type	Description
`url`	string	The page URL you submitted.
`total_links`	integer	Total `<a href>` count on the page (before skipping mailto/tel/JS/fragment and dedupe).
`checked_links`	integer	How many links were actually HEAD-checked (≤ 50).
`broken_count`	integer	Number of entries in `links` with `is_broken: true`.
`links`	`LinkResult[]`	One entry per checked link. See below.

`LinkResult`

Field	Type	Description
`url`	string	The absolute URL that was checked (after resolving against the page's base URL).
`anchor_text`	string	Link anchor text, trimmed and truncated to 100 chars.
`status_code`	integer	HTTP status from the HEAD request. `0` = timeout. `-1` = network/connection error (DNS, TLS, reset).
`is_broken`	boolean	`true` when `status_code >= 400` or `status_code <= 0`.
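The `status_code` and `is_broken` rules in the table can be mirrored in a few lines, which is handy for labelling results or sanity-checking a response. The helper names below (`is_broken`, `describe_status`, `counts_are_consistent`) are hypothetical and only restate the documented rules.

```python
def is_broken(status_code):
    """Documented rule: broken when status_code >= 400 or status_code <= 0."""
    return status_code >= 400 or status_code <= 0


def describe_status(status_code):
    """Turn the sentinel values 0 and -1 into readable labels."""
    if status_code == 0:
        return "timeout"
    if status_code == -1:
        return "connection error"
    return f"HTTP {status_code}"


def counts_are_consistent(payload):
    """Check that broken_count and checked_links agree with the links array."""
    links = payload["links"]
    broken = sum(1 for link in links if link["is_broken"])
    return payload["broken_count"] == broken and payload["checked_links"] == len(links)
```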

Examples

curl

curl -G "https://seo.toolkitapi.io/v1/seo/broken-links" \
  -H "x-api-key: $TOOLKIT_API_KEY" \
  --data-urlencode "url=https://example.com/blog/post-123"

Python

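import os

import requests

API_KEY = os.environ["TOOLKIT_API_KEY"]  # same variable the curl example reads

resp = requests.get(
    "https://seo.toolkitapi.io/v1/seo/broken-links",
    params={"url": "https://example.com/blog/post-123"},
    headers={"x-api-key": API_KEY},
    timeout=60,  # up to 50 links, 10 s timeout each, checked concurrently; budget generously
)
resp.raise_for_status()  # raise on 400/401/422/502 instead of parsing an error body
data = resp.json()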

print(f"{data['broken_count']}/{data['checked_links']} broken "
      f"(total links on page: {data['total_links']})")

for link in data["links"]:
    if link["is_broken"]:
        reason = {0: "timeout", -1: "connection error"}.get(
            link["status_code"], f"HTTP {link['status_code']}"
        )
        print(f"  [{reason}] {link['url']}  «{link['anchor_text']}»")

JavaScript

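import * as core from "@actions/core"; // GitHub Actions toolkit, provides setFailed

const pageUrl = "https://example.com/blog/post-123"; // page to check
const params = new URLSearchParams({ url: pageUrl });

const resp = await fetch(
  `https://seo.toolkitapi.io/v1/seo/broken-links?${params}`,
  { headers: { "x-api-key": process.env.TOOLKIT_API_KEY } },
);
if (!resp.ok) {
  throw new Error(`broken-links request failed: HTTP ${resp.status}`);
}
const { broken_count, links } = await resp.json();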

// Fail a preview-deploy CI run if this page introduces a broken link.
if (broken_count > 0) {
  const lines = links.filter(l => l.is_broken)
                     .map(l => `  - ${l.status_code}  ${l.url}`);
  core.setFailed(
    `${broken_count} broken link(s) on ${pageUrl}\n${lines.join("\n")}`,
  );
}

Errors

Status	Condition
`400`	`url` is malformed, uses an unsupported scheme, or resolves to a disallowed/internal host (SSRF-blocked).
`401`	Missing / invalid API key.
`422`	`url` query parameter is missing.
`502`	The page itself could not be fetched (the outbound link checks each report their own status).
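A client can turn the statuses in this table into actionable messages. The sketch below is one possible mapping: `ERROR_HINTS`, `explain_error`, and `is_retryable` are hypothetical names, and treating `502` as worth retrying is an assumption (the target page may be only temporarily unreachable), not documented behavior.

```python
# Hints paraphrase the error table; wording is illustrative.
ERROR_HINTS = {
    400: "Check the url value: malformed, unsupported scheme, or blocked host.",
    401: "Missing or invalid API key: check the x-api-key header.",
    422: "The url query parameter is missing.",
    502: "The page itself could not be fetched.",
}


def explain_error(http_status):
    """Return a hint for an error status, or None for a successful response."""
    if http_status < 400:
        return None
    return ERROR_HINTS.get(http_status, f"Unexpected HTTP {http_status}")


def is_retryable(http_status):
    """Assumption: only 502 may be transient; 400/401/422 need a fixed request."""
    return http_status == 502
```

With `requests`, this would typically be called as `explain_error(resp.status_code)` before attempting `resp.json()`.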

Notes & gotchas

total_links vs checked_links. total_links counts every <a href> on the page (including mailto/tel/javascript/fragment). checked_links is capped at 50 and equals the number of entries in the links array. A page with 400 anchors surfaces at most 50 HEAD-check results.
Order matters. Links are deduplicated and kept in document order, so only the first 50 unique links are checked. The endpoint has no paging: to cover more than 50 links, split the page or crawl it section by section.
status_code: 0 is a timeout (10 s per link); status_code: -1 is a connection / protocol error. Both count as broken. A slow but working link can also report 0, so run the check again before alerting on timeouts.
Some sites reject HEAD requests outright (older WordPress, a few CDNs) and return 405 or even 404 for HEAD while GET works fine. Treat persistent 405s as likely false positives and spot-check them manually.
The checker sends a User-Agent of WebScrapingToolkit/1.0 and follows redirects. Links behind Cloudflare "I'm under attack" mode or JS challenges will often return 403.
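The endpoint is synchronous: it blocks until every checked link (up to 50) has responded or timed out. Because the checks run in parallel with a 10 s timeout each, budget roughly 10 s per call on top of the time needed to fetch the page itself, and use client-side concurrency to scan many pages at once.
Links are deduplicated by absolute URL within a single page, so a link repeated several times on one page is checked once. A navbar link that appears on every page is still checked once for each page you submit.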
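The gotchas above suggest that not every broken entry deserves the same response: timeouts are worth a second run, and 403/405 results are often false positives. A minimal triage sketch follows; the function name `triage` and the bucket names are hypothetical, and the choice of which codes land in which bucket is a judgment call based on these notes, not part of the API.

```python
def triage(links):
    """Sort broken LinkResult dicts into buckets by how much to trust them."""
    buckets = {"confirmed": [], "recheck": [], "spot_check": []}
    for link in links:
        if not link["is_broken"]:
            continue
        code = link["status_code"]
        if code == 0:
            # Timeout: may be a slow but working link, so run the check again.
            buckets["recheck"].append(link)
        elif code in (403, 405):
            # Bot challenges and HEAD-rejecting servers: verify manually.
            buckets["spot_check"].append(link)
        else:
            # Other 4xx/5xx and -1 (connection error) are treated as real.
            buckets["confirmed"].append(link)
    return buckets
```

A CI gate could then fail only on `buckets["confirmed"]` and log the other two buckets as warnings.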
