Broken Links

Fetch a URL, extract up to 50 unique outbound links from its <a href> tags, HEAD-check each in parallel, and report which are broken.

A link is "broken" when the final response status (after redirects) is ≥ 400, the request timed out, or the connection failed (DNS resolution, TLS handshake, connection reset). mailto:, tel:, javascript:, and in-page (#fragment) links are skipped.

Use it for a one-click health report, a CI gate on your docs site, or a scheduled sweep of your most-visited pages. For a fuller on-page SEO picture, combine it with /v1/seo/audit.

Endpoint

GET /v1/seo/broken-links

Base URL: https://seo.toolkitapi.io

Query Parameters

| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Absolute URL whose links will be checked. http:// or https://. |

The endpoint always checks at most 50 unique outbound links per call (the first 50 in document order after dedupe). mailto:, tel:, javascript:, and #fragment hrefs are skipped. Each link is HEAD-checked with a 10 s timeout, following redirects.
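
The selection rules above (skip non-HTTP hrefs, resolve against the page URL, dedupe, keep the first 50 in document order) can be sketched client-side. This is an illustrative reconstruction from the description, not the service's actual code: `select_links` and `_HrefCollector` are hypothetical names, and details such as how empty hrefs or `<base>` tags are treated are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

SKIPPED_SCHEMES = ("mailto:", "tel:", "javascript:")
MAX_LINKS = 50  # per-call cap stated in the docs


class _HrefCollector(HTMLParser):
    """Collects raw href values from <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href.strip())


def select_links(html, page_url, limit=MAX_LINKS):
    """Return (total_links, links_to_check) following the documented rules."""
    collector = _HrefCollector()
    collector.feed(html)

    seen, selected = set(), []
    for href in collector.hrefs:
        # Assumption: empty hrefs are skipped along with fragments and
        # mailto:/tel:/javascript: links.
        if not href or href.startswith("#"):
            continue
        if href.lower().startswith(SKIPPED_SCHEMES):
            continue
        absolute = urljoin(page_url, href)  # resolve relative links
        if absolute in seen:  # dedupe by absolute URL
            continue
        seen.add(absolute)
        selected.append(absolute)
        if len(selected) == limit:
            break
    # total_links counts every <a href>, including the skipped ones.
    return len(collector.hrefs), selected
```

Running the same selection locally is one way to predict which 50 links a call will cover on a long page before you spend a request on it.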

Response Fields

| Field | Type | Description |
|---|---|---|
| url | string | The page URL you submitted. |
| total_links | integer | Total <a href> count on the page (before skipping mailto/tel/JS/fragment and dedupe). |
| checked_links | integer | How many links were actually HEAD-checked (≤ 50). |
| broken_count | integer | Number of entries in links with is_broken: true. |
| links | LinkResult[] | One entry per checked link. See below. |

LinkResult

| Field | Type | Description |
|---|---|---|
| url | string | The absolute URL that was checked (after resolving against the page's base URL). |
| anchor_text | string | Link anchor text, trimmed and truncated to 100 chars. |
| status_code | integer | HTTP status from the HEAD request. 0 = timeout. -1 = network/connection error (DNS, TLS, reset). |
| is_broken | boolean | true when status_code >= 400 or status_code <= 0. |
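
If you consume the response in typed code, the two shapes above map naturally onto a pair of dataclasses. The sketch below is one possible client-side model, not an official SDK: the class names and `parse_report` are hypothetical, and the consistency checks simply restate the field descriptions (one `links` entry per checked link, `broken_count` equal to the number of broken entries, and the `is_broken` rule).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LinkResult:
    url: str
    anchor_text: str
    status_code: int  # 0 = timeout, -1 = connection error, else HTTP status
    is_broken: bool


@dataclass(frozen=True)
class BrokenLinksReport:
    url: str
    total_links: int
    checked_links: int
    broken_count: int
    links: List[LinkResult]


def parse_report(payload: dict) -> BrokenLinksReport:
    """Build a typed report from the decoded JSON body and sanity-check it."""
    links = [
        LinkResult(
            url=item["url"],
            anchor_text=item["anchor_text"],
            status_code=item["status_code"],
            is_broken=item["is_broken"],
        )
        for item in payload["links"]
    ]
    report = BrokenLinksReport(
        url=payload["url"],
        total_links=payload["total_links"],
        checked_links=payload["checked_links"],
        broken_count=payload["broken_count"],
        links=links,
    )
    # Documented rule: broken when status_code >= 400 or status_code <= 0.
    for link in links:
        expected = link.status_code >= 400 or link.status_code <= 0
        if link.is_broken != expected:
            raise ValueError(f"unexpected is_broken for {link.url}")
    if report.checked_links != len(links):
        raise ValueError("checked_links does not match the links array")
    if report.broken_count != sum(link.is_broken for link in links):
        raise ValueError("broken_count does not match the links array")
    return report
```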

Examples

curl

curl -G "https://seo.toolkitapi.io/v1/seo/broken-links" \
  -H "x-api-key: $TOOLKIT_API_KEY" \
  --data-urlencode "url=https://example.com/blog/post-123"

Python

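import os

import requests

API_KEY = os.environ["TOOLKIT_API_KEY"]  # same variable the curl example uses

resp = requests.get(
    "https://seo.toolkitapi.io/v1/seo/broken-links",
    params={"url": "https://example.com/blog/post-123"},
    headers={"x-api-key": API_KEY},
    timeout=60,  # link checks run concurrently (10 s timeout each); leave headroom for the page fetch
)
resp.raise_for_status()  # surface 4xx/5xx API errors instead of parsing an error body
data = resp.json()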

print(f"{data['broken_count']}/{data['checked_links']} broken "
      f"(total links on page: {data['total_links']})")

for link in data["links"]:
    if link["is_broken"]:
        reason = {0: "timeout", -1: "connection error"}.get(
            link["status_code"], f"HTTP {link['status_code']}"
        )
        print(f"  [{reason}] {link['url']}  «{link['anchor_text']}»")
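
To sweep several pages, the same call can be fanned out with a small thread pool. The sketch below is a minimal illustration under a few assumptions: `check_page` and `sweep` are hypothetical helper names, the standard-library `urllib` is used so the block has no dependencies, and `max_workers=4` is an arbitrary choice because this page does not state any rate limits.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed

ENDPOINT = "https://seo.toolkitapi.io/v1/seo/broken-links"


def check_page(page_url, api_key):
    """Call the endpoint for one page and return the decoded JSON body."""
    query = urllib.parse.urlencode({"url": page_url})
    request = urllib.request.Request(
        f"{ENDPOINT}?{query}", headers={"x-api-key": api_key}
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)


def sweep(page_urls, check, max_workers=4):
    """Run check(page_url) for every page concurrently.

    Returns {page_url: result}, where result is either the value returned
    by `check` or the exception it raised, so one bad page does not abort
    the whole sweep.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(check, url): url for url in page_urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going; report per page
                results[url] = exc
    return results


# Example usage (makes real requests; needs "import os" and the API key set):
# reports = sweep(pages, lambda u: check_page(u, os.environ["TOOLKIT_API_KEY"]))
```

Passing the checker in as a function keeps `sweep` easy to test with a stub and lets you swap in a `requests`-based call if you prefer.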

JavaScript

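// Assumes a GitHub Actions step: core comes from the Actions toolkit.
import * as core from "@actions/core";

// PAGE_URL is a hypothetical variable set by your workflow.
const pageUrl = process.env.PAGE_URL;
const params = new URLSearchParams({ url: pageUrl });

const resp = await fetch(
  `https://seo.toolkitapi.io/v1/seo/broken-links?${params}`,
  { headers: { "x-api-key": process.env.TOOLKIT_API_KEY } },
);
if (!resp.ok) throw new Error(`broken-links request failed: HTTP ${resp.status}`);
const { total_links, checked_links, broken_count, links } = await resp.json();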

// Fail a preview-deploy CI run if this page introduces a broken link.
if (broken_count > 0) {
  const lines = links.filter(l => l.is_broken)
                     .map(l => `  - ${l.status_code}  ${l.url}`);
  core.setFailed(
    `${broken_count} broken link(s) on ${pageUrl}\n${lines.join("\n")}`,
  );
}

Errors

| Status | Condition |
|---|---|
| 400 | url is malformed, unsupported scheme, or resolves to a disallowed/internal host (SSRF-blocked). |
| 401 | Missing / invalid API key. |
| 422 | url query parameter is missing. |
| 502 | The page itself could not be fetched (the outbound link checks each report their own status). |
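
A client can turn these statuses into readable failures with a small lookup. The following is a sketch only: `BrokenLinksAPIError`, `ERROR_HINTS`, and `raise_for_api_error` are hypothetical names, the hint texts paraphrase the table above, and the shape of the error response body is not documented here, so the code relies on the status code alone.

```python
class BrokenLinksAPIError(Exception):
    """Raised when the endpoint itself answers with an error status."""

    def __init__(self, status_code, hint):
        super().__init__(f"HTTP {status_code}: {hint}")
        self.status_code = status_code
        self.hint = hint


# Hints paraphrased from the documented error table.
ERROR_HINTS = {
    400: "url is malformed, uses an unsupported scheme, or points at a blocked host",
    401: "API key is missing or invalid",
    422: "url query parameter is missing",
    502: "the submitted page could not be fetched",
}


def raise_for_api_error(status_code):
    """Raise BrokenLinksAPIError for any status >= 400; otherwise do nothing."""
    if status_code < 400:
        return
    hint = ERROR_HINTS.get(status_code, "undocumented error status")
    raise BrokenLinksAPIError(status_code, hint)
```

With `requests`, this would be called as `raise_for_api_error(resp.status_code)` before reading `resp.json()`.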

Notes & gotchas

  • total_links vs checked_links. total_links counts every <a href> on the page (including mailto/tel/javascript/fragment). checked_links is capped at 50 and is what the links array corresponds to. A page with 400 anchors will only surface up to 50 HEAD-check results.
  • Order matters. Links are kept in document order and deduplicated. If you need to cover more than 50 links, split your page or crawl section-by-section — there is no paging on this endpoint.
  • status_code: 0 is a timeout (10 s per link); status_code: -1 is a connection / protocol error. Both count as broken. A slow-but-working link can show up as 0, so run the check again before alerting on timeouts.
  • Some sites reject HEAD requests outright (older WordPress, a few CDNs) and return 405 or even 404 for HEAD while GET works fine. Treat persistent 405s as likely false positives and spot-check them manually.
  • The checker sends a User-Agent of WebScrapingToolkit/1.0 and follows redirects. Links behind Cloudflare "I'm under attack" mode or JS challenges will often return 403.
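  • The endpoint is synchronous: it blocks until every link check (up to 50) completes or times out. The checks run in parallel, so a call takes roughly the page fetch plus the slowest link check, which can exceed 10 s when a link times out. Set a generous client timeout (the Python example uses 60 s) and use client-side concurrency to scan many pages at once.
  • Links are deduplicated by absolute URL within a single call, so a link repeated on the same page is checked once. Deduplication does not span calls: a navbar link that appears on every page is checked again for each page you submit.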
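
The gotchas above suggest that not every is_broken entry deserves the same reaction. One way to act on that is to triage results before alerting, as in the sketch below. The bucket names and the `triage` function are hypothetical, and which codes to treat as suspect (0 for timeouts, 403 and 405 for bot challenges and HEAD rejection) is a judgment call based on the notes, not behavior of the API.

```python
# Status codes that the notes flag as frequent false positives.
RETRY_CODES = {0}                 # timeout: re-run the check before alerting
VERIFY_WITH_GET_CODES = {403, 405}  # bot challenge or HEAD rejected


def triage(links):
    """Split broken LinkResult dicts into retry / verify_with_get / broken."""
    buckets = {"retry": [], "verify_with_get": [], "broken": []}
    for link in links:
        if not link["is_broken"]:
            continue
        code = link["status_code"]
        if code in RETRY_CODES:
            buckets["retry"].append(link)
        elif code in VERIFY_WITH_GET_CODES:
            buckets["verify_with_get"].append(link)
        else:
            # Everything else (404, 410, 5xx, -1, ...) is reported as-is.
            buckets["broken"].append(link)
    return buckets
```

A CI gate might then fail only on the "broken" bucket and log the other two as warnings until a second run or a manual GET confirms them.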