What's new in Screaming Frog SEO Spider 19.8

Mar 12, 2024
  • Fixed issue not being able to pause a crawl when API data is loaded post crawl.
  • Fixed issue with double hyphen not working in some URLs.
  • Fixed crash entering file:// prefixed URLs in Visual Custom Extraction; this is no longer allowed.
  • Fixed crash when dealing with really large robots.txt files.

New in Screaming Frog SEO Spider 19.7 (Mar 11, 2024)

  • Fixed issue not being able to pause a crawl when API data is loaded post crawl.
  • Fixed issue with double hyphen not working in some URLs.
  • Fixed crash entering file:// prefixed URLs in Visual Custom Extraction; this is no longer allowed.
  • Fixed crash on macOS when the app loses focus.
  • Fixed crash when dealing with really large robots.txt files.

New in Screaming Frog SEO Spider 19.6 (Feb 21, 2024)

  • Bug fixes.

New in Screaming Frog SEO Spider 19.5 (Feb 12, 2024)

  • Bug fixes.

New in Screaming Frog SEO Spider 19.4 (Nov 7, 2023)

  • Fixed an issue with JavaScript rendering and redirects.
  • Fixed an issue causing a crash when selecting the bounce rate metric in GA4.

New in Screaming Frog SEO Spider 19.3 (Oct 31, 2023)

  • Added Subscription, Paywalled Content and Vehicle Listing rich result features to structured data validation.
  • Removed Google How-To feature.
  • Updated to Schema.org version 23.
  • Brought Sitemap-only user-agent directive handling in line with Googlebot’s updated behaviour.
  • Introduced synced scrolling of duplicate content frames in the Duplicate Details Tab.
  • Updated ‘View > Reset Columns for All Tables’ to also reset visibility if disabled.
  • Added right-hand Segments Tab & Overview filter to ‘Reports’ to allow exporting in Scheduling / CLI.
  • Added indication as to why ‘OK’ button is disabled when there is an error on another tab in system config.
  • Fixed issue with page transfer size missing in rendered crawls.
  • Fixed issue with image in ‘Missing Alt Attribute’ filter during JavaScript rendering.
  • Fixed issue with PageSpeed tab not appearing if you enable PSI mid-crawl with focus mode enabled.
  • Fixed issue with Microdata not parsing.
  • Fixed issue with unreadable scheduled crawl history errors.
  • Fixed issue where Forms Auth was not using the user-agent from the working config.
  • Fixed issue with various sites not loading in forms based authentication.
  • Fixed various display issues with the config windows.
  • Fixed various crashes.
  • Updated Chrome to resolve the WebP exploit CVE-2023-4863.

New in Screaming Frog SEO Spider 19.2 (Aug 31, 2023)

  • Bug fixes.

New in Screaming Frog SEO Spider 19.1 (Jul 27, 2023)

  • Bug fixes.

New in Screaming Frog SEO Spider 19.0 (Jul 17, 2023)

  • Updated Design, Unified Config, Segments, Visual Custom Extraction, 3D Visualisations, New Filters & Issues.

New in Screaming Frog SEO Spider 18.5 (Apr 26, 2023)

  • Bug fixes.

New in Screaming Frog SEO Spider 18.4 (Mar 16, 2023)

  • Bug fixes.

New in Screaming Frog SEO Spider 18.3 (Mar 8, 2023)

  • Bug fixes

New in Screaming Frog SEO Spider 18.2 (Feb 13, 2023)

  • Bug fixes.

New in Screaming Frog SEO Spider 18.1 (Dec 14, 2022)

  • Bug fixes.

New in Screaming Frog SEO Spider 18.0 (Dec 5, 2022)

  • GA4 Integration:
  • It’s taken a little while, but like most SEOs, we’ve finally come to terms with the fact that we’ll have to actually switch to GA4. You’re now able to (begrudgingly) connect to GA4 and pull analytics data into a crawl via their new API.
  • Connect via ‘Config > API Access > GA4’, select from 65 available metrics, and adjust the date and dimensions.
  • Similar to the existing UA integration, data will quickly appear under the ‘Analytics’ and ‘Internal’ tabs in real time when you start crawling.
  • You can apply ‘filter’ dimensions as in the GA UI, including first user or session channel grouping, with dimension values such as ‘organic search’ to refine to a specific channel.
  • If there are any other dimensions or filters you’d like to see supported, then do let us know.
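
For anyone who wants to sanity-check the numbers outside the SEO Spider, here is a minimal sketch of the same kind of request against the GA4 Data API using Google's google-analytics-data Python client. The property ID, metric names and the 'Organic Search' channel filter are illustrative placeholders, not the Spider's own implementation.

```python
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Filter, FilterExpression, Metric, RunReportRequest,
)

# Credentials are picked up from GOOGLE_APPLICATION_CREDENTIALS.
client = BetaAnalyticsDataClient()

request = RunReportRequest(
    property="properties/123456789",  # hypothetical GA4 property ID
    dimensions=[Dimension(name="pagePath")],
    metrics=[Metric(name="sessions"), Metric(name="engagementRate")],
    date_ranges=[DateRange(start_date="30daysAgo", end_date="yesterday")],
    # Refine to a single channel, similar to the 'filter' dimension in the UI.
    dimension_filter=FilterExpression(
        filter=Filter(
            field_name="sessionDefaultChannelGroup",
            string_filter=Filter.StringFilter(value="Organic Search"),
        )
    ),
)

for row in client.run_report(request).rows:
    print(row.dimension_values[0].value, [m.value for m in row.metric_values])
```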
  • Parse PDFs:
  • PDFs are not the sexiest thing in the world, but due to the number of corporates and educational institutions that have requested this over the years, we felt compelled to provide support for parsing them. The SEO Spider will now crawl PDFs, discover links within them and show the document title as the page title.
  • This means users can check whether links within PDFs are functioning as expected, and issues like broken links will be reported in the usual way in the Response Codes tab. The Outlinks tab will be populated, and will include details such as response codes, anchor text and even which page of the PDF a link is on.
  • You can also choose to ‘Extract PDF Properties’ and ‘Store PDF’ under ‘Config > Spider > Extraction’ and the PDF subject, author, created and modified dates, page count and word count will be stored.
  • PDFs can be bulk saved and exported via ‘Bulk Export > Web > All PDF Documents’.
  • If you’re interested in how search engines crawl and index PDFs, check out a couple of tweets where we shared some insights from internal experiments for both Google and Bing.
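
As a rough illustration of the kind of data being surfaced here (document title, properties and outlinks), the sketch below uses the pypdf library; the file name is a placeholder and this is not the Spider's own parser.

```python
from pypdf import PdfReader

reader = PdfReader("whitepaper.pdf")  # placeholder file name

# Document properties, roughly what 'Extract PDF Properties' stores.
meta = reader.metadata
print("Title:", meta.title)           # shown as the page title in a crawl
print("Author:", meta.author)
print("Pages:", len(reader.pages))

# Link annotations (outlinks), noting which page of the PDF each sits on.
for page_number, page in enumerate(reader.pages, start=1):
    for annot in page.get("/Annots") or []:
        action = annot.get_object().get("/A", {})
        uri = action.get("/URI")
        if uri:
            print(f"Page {page_number}: {uri}")
```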
  • Validation Tab:
  • There’s a new Validation tab, which performs some basic best practice validations that can impact crawlers when crawling and indexing. This isn’t W3C HTML validation, which is a little too strict for this purpose; the aim of this tab is to identify issues that can prevent search bots from being able to parse and understand a page reliably.
  • Most SEOs know about invalid HTML elements in the head causing it to close early, but there are other interesting fix-ups and quirks that browsers like Chrome (and subsequently Google) perform if they see a non-head element prior to the <head> in the HTML (they create their own blank <head>), or if there are multiple or missing HTML elements, etc.
  • The new filters include –
  • Invalid HTML Elements In <head> – Pages with invalid HTML elements within the <head>. When an invalid element is used in the <head>, Google assumes the end of the <head> element and ignores any elements that appear after the invalid element. This means critical <head> elements that appear after the invalid element will not be seen. The <head> element as per the HTML standard is reserved for title, meta, link, script, style, base, noscript and template elements only.
  • <head> Not First In <html> Element – Pages with an HTML element that precedes the <head> element in the HTML. The <head> should be the first element in the <html> element. Browsers and Googlebot will automatically generate a <head> element if it’s not first in the HTML. While ideally head metadata elements would be within the <head>, if a valid head element is first in the <html> it will be considered as part of the generated <head>. However, if non-head elements such as <p>, <body>, <img> etc. are used before the intended <head> element and its metadata, then Google assumes the end of the <head> element. This means the intended <head> element and its metadata may only be seen in the <body> and ignored.
  • Missing <head> Tag – Pages missing a <head> element within the HTML. The <head> element is a container for metadata about the page, that’s placed between the <html> and <body> tag. Metadata is used to define the page title, character set, styles, scripts, viewport and other data that are critical to the page. Browsers and Googlebot will automatically generate a <head> element if it’s omitted in the markup, however it may not contain meaningful metadata for the page and this should not be relied upon.
  • Multiple <head> Tags – Pages with multiple <head> elements in the HTML. There should only be one <head> element in the HTML which contains all critical metadata for the document. Browsers and Googlebot will combine metadata from subsequent <head> elements if they are both before the <body>, however, this should not be relied upon and is open to potential mix-ups. Any <head> tags after the <body> starts will be ignored.
  • Missing <body> Tag – Pages missing a <body> element within the HTML. The <body> element contains all the content of a page, including links, headings, paragraphs, images and more. There should be one <body> element in the HTML of the page. Browsers and Googlebot will automatically generate a <body> element if it’s omitted in the markup, however, this should not be relied upon.
  • Multiple <body> Tags – Pages with multiple <body> elements in the HTML. There should only be one <body> element in the HTML which contains all content for the document. Browsers and Googlebot will try to combine content from subsequent <body> elements, however, this should not be relied upon and is open to potential mix-ups.
  • HTML Document Over 15MB – Pages which are over 15MB in document size. This is important as Googlebot limits its crawling and indexing to the first 15MB of an HTML file or supported text-based file. This size does not include resources referenced in the HTML such as images, videos, CSS, and JavaScript that are fetched separately. Google only considers the first 15MB of the file for indexing and stops crawling afterwards. The file size limit is applied to the uncompressed data. The median size of an HTML file is about 30 kilobytes (KB), so pages are highly unlikely to reach this limit.
  • We plan on extending our validation checks and filters over time.
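
As a minimal illustration of the ‘Invalid HTML Elements In <head>’ check described above, the sketch below flags any element inside <head> that the HTML standard doesn't allow there, using Python's built-in html.parser; the sample markup is hypothetical and this is only an approximation of the filter.

```python
from html.parser import HTMLParser

# Elements the HTML standard allows inside <head>.
ALLOWED_IN_HEAD = {"title", "meta", "link", "script", "style", "base",
                   "noscript", "template"}

class HeadValidator(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_head = False
        self.invalid = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head and tag not in ALLOWED_IN_HEAD:
            # Google assumes the <head> ends here and ignores later metadata.
            self.invalid.append(tag)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

sample = ("<html><head><title>Hi</title><img src='logo.png'>"
          "<meta name='robots' content='noindex'></head><body></body></html>")
validator = HeadValidator()
validator.feed(sample)
print("Invalid elements in <head>:", validator.invalid)  # ['img']
```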
  • In-App Updates:
  • Every time we release an update there will always be one or two users that remind us that they have to painstakingly visit our website, and click a button to download and install the new version.
  • WHY do we have to put them through this torture?
  • The simple answer is that historically we’ve thought it wasn’t a big deal and it’s a bit of a boring enhancement to prioritise over so many other super cool features we could build. With that said, we do listen to our users, so we went ahead and prioritised the boring-but-useful feature.
  • You will now be alerted in-app when there’s a new version available, which will have already silently downloaded in the background. You can then install in a few clicks.
  • We’re also planning on switching our installer, so fewer clicks to install and an automatic restart will be implemented soon, too. We can barely contain our excitement.
  • Authentication for Scheduling / CLI:
  • Previously, the only way to authenticate via scheduling or the CLI was to supply an ‘Authorization’ HTTP header with a username and password via the HTTP header config, which worked for standards based authentication – rather than web forms.
  • We’ve now made this much simpler, and not just for basic or digest authentication, but web form authentication as well. In ‘Config > Authentication’, you can now provide the username and password for any standards based authentication, which will be remembered so you only need to provide it once.
  • You can also login as usual via ‘Forms Based’ authentication and the cookies will be stored.
  • When you have provided the relevant details or logged in, you can visit the new ‘Profiles’ tab, and export a new .seospiderauthconfig file.
  • This file, which saves authentication for both standards and forms based authentication, can then be supplied in scheduling, or the CLI.
  • This means that for scheduled or automated crawls the SEO Spider can log in to not just standards based authentication, but web forms where feasible as well.
  • New Filters & Issues:
  • There’s a variety of new filters and issues available across existing tabs that help better filter data, or communicate issues discovered.
  • Many of these were already available either via another filter, or from an existing report like ‘Redirect Chains’. However, they now have their own dedicated filter and issue in the UI, to help raise awareness. These include –
  • ‘Response Codes > Redirect Chains’ – Internal URLs that redirect to another URL, which also then redirects. This can occur multiple times in a row, each redirect is referred to as a ‘hop’. Full redirect chains can be viewed and exported via ‘Reports > Redirects > Redirect Chains’.
  • ‘Response Codes > Redirect Loop’ – Internal URLs that redirect to another URL, which also then redirects. This can occur multiple times in a row, each redirect is referred to as a ‘hop’. This filter will only populate if a URL redirects to a previous URL within the redirect chain. Redirect chains with a loop can be viewed and exported via ‘Reports > Redirects > Redirect Chains’ with the ‘Loop’ column filtered to ‘True’.
  • ‘Images > Background Images’ – CSS background and dynamically loaded images discovered across the website, which should be used for non-critical and decorative purposes. Background images are not typically indexed by Google and browsers do not provide alt attributes or text on background images to assistive technology.
  • ‘Canonicals > Multiple Conflicting’ – Pages with multiple canonicals set for a URL that have different URLs specified (via either multiple link elements, HTTP header, or both combined). This can lead to unpredictability, as there should only be a single canonical URL set by a single implementation (link element, or HTTP header) for a page.
  • ‘Canonicals > Canonical Is Relative’ – Pages that have a relative rather than absolute rel=”canonical” link tag. While the tag, like many HTML tags, accepts both relative and absolute URLs, it’s easy to make subtle mistakes with relative paths that could cause indexing-related issues.
  • ‘Canonicals > Unlinked’ – URLs that are only discoverable via rel=”canonical” and are not linked-to via hyperlinks on the website. This might be a sign of a problem with internal linking, or the URLs contained in the canonical.
  • ‘Links > Non-Indexable Page Inlinks Only’ – Indexable pages that are only linked-to from pages that are non-indexable, which includes noindex, canonicalised or robots.txt disallowed pages. Pages with noindex and links from them will initially be crawled, but noindex pages will be removed from the index and be crawled less over time. Links from these pages may also be crawled less and it has been debated by Googlers whether links will continue to be counted at all. Links from canonicalised pages can be crawled initially, but PageRank may not flow as expected if indexing and link signals are passed to another page as indicated in the canonical. This may impact discovery and ranking. Robots.txt pages can’t be crawled, so links from these pages will not be seen.
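
For readers who want to reproduce the redirect chain and loop logic outside the Spider, here is a hedged sketch using the requests library; the starting URL and hop limit are placeholders, and this is a simplified approximation rather than the Spider's own implementation.

```python
from urllib.parse import urljoin
import requests

def trace_redirects(url, max_hops=10):
    """Follow redirects hop by hop, flagging loops back to an earlier URL."""
    seen, hops = set(), []
    while len(hops) < max_hops:
        seen.add(url)
        resp = requests.get(url, allow_redirects=False, timeout=10)
        if resp.status_code not in (301, 302, 303, 307, 308):
            break
        target = urljoin(url, resp.headers.get("Location", ""))
        hops.append((url, resp.status_code, target))
        if target in seen:  # redirecting back to an earlier hop: a loop
            print("Redirect loop detected at", target)
            break
        url = target
    return hops

for hop in trace_redirects("https://example.com/old-page"):  # placeholder URL
    print(hop)
```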
  • Flesch Readability Scores:
  • Flesch readability scores are now calculated and included within the ‘Content’ tab with new filters for ‘Readability Difficult’ and ‘Readability Very Difficult’.
  • Please note, the readability scores are suited to the English language, and we may provide support for additional languages or alternative readability scores for other languages in the future.
  • Readability scores can be disabled under ‘Config > Spider > Extraction’.
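
For reference, the sketch below shows the standard Flesch Reading Ease formula these filters are based on, with a deliberately crude English-only syllable heuristic; it is an approximation, not the Spider's exact implementation.

```python
import re

def count_syllables(word):
    # Very rough heuristic: count groups of vowels (English only).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Lower scores indicate harder-to-read text.
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (syllables / len(words)))

print(round(flesch_reading_ease("The cat sat on the mat. It was happy."), 1))
```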
  • Other Updates:
  • Auto Complete URL Bar:
  • The URL bar will now show suggested URLs to enter as you type based upon previous URL bar history, which a user can quickly select to help save precious seconds.
  • Response Code Colours for Visualisations:
  • You’re now able to select to ‘Use Response Code Node Colours’ in crawl visualisations.
  • This means nodes for no responses, 2XX, 3XX, 4XX and 5XX buckets will be coloured individually, to help users spot issues related to responses more effectively.
  • XML Sitemap Source In Scheduling:
  • You can now choose an XML Sitemap URL as the source in scheduling and via the CLI in list mode, like the regular UI.

New in Screaming Frog SEO Spider 17.2 (Sep 12, 2022)

  • Bug fixes.

New in Screaming Frog SEO Spider 17.1 (Aug 23, 2022)

  • Bug fixes.

New in Screaming Frog SEO Spider 17.0 (Aug 17, 2022)

  • Issues Tab, Links Tab, New Limits, ‘Multiple Properties’ Config For URL Inspection API, Apple Silicon Version & RPM for Fedora and Detachable Tabs.

New in Screaming Frog SEO Spider 16.7 (Mar 2, 2022)

  • This release is mainly bug fixes and small improvements:
  • URL inspection can now be resumed from a saved crawl.
  • The automated Screaming Frog Data Studio Crawl Report now has a URL Inspection page.
  • Added ‘Days Since Last Crawl’ column for the URL Inspection integration.
  • Added URL Inspection data to the lower ‘URL Details’ tab.
  • Translations are now available for the URL Inspection integration.
  • Fixed a bug moving tabs and filters related to URL Inspection in scheduling.
  • Renamed two ‘Search Console’ filters – ‘No Search Console Data’ to ‘No Search Analytics Data’ and ‘Non-Indexable with Search Console Data’ to ‘Non-Indexable with Search Analytics Data’ to be more specific regarding the API used.
  • Fix crash loading scheduled tasks.
  • Fix crash removing URLs.

New in Screaming Frog SEO Spider 16.6 (Mar 2, 2022)

  • Includes URL Inspection API integration

New in Screaming Frog SEO Spider 16.5 (Dec 21, 2021)

  • Update to Apache log4j 2.17.0 to fix CVE-2021-45046 and CVE-2021-45105.
  • Show more detailed crawl analysis progress in the bottom status bar when active.
  • Fix JavaScript rendering issues with POST data.
  • Improve Google Sheets exporting when Google responds with 403s and 502s.
  • Be more tolerant of leading/trailing spaces for all tab and filter names when using the CLI.
  • Add auto naming for GSC accounts, to avoid tasks clashing.
  • Fix crash running link score on crawls with URLs that have a status of “Rendering Failed”.

New in Screaming Frog SEO Spider 16.4 (Dec 14, 2021)

  • Bug fixes.

New in Screaming Frog SEO Spider 16.3 (Nov 4, 2021)

  • This release is mainly bug fixes and small improvements:
  • The Google Search Console integration now has new filters for search type (Discover, Google News, Web etc) and supports regex as per the recent Search Analytics API update.
  • Fix issue with Shopify and CloudFront sites loading in Forms Based authentication browser.
  • Fix issue with cookies not being displayed in some cases.
  • Give unique names to Google Rich Features and Google Rich Features Summary report file names.
  • Set timestamp on URLs loaded as part of JavaScript rendering.
  • Fix crash running on macOS Monterey.
  • Fix right click focus in visualisations.
  • Fix crash in Spelling and Grammar UI.
  • Fix crash when exporting invalid custom extraction tabs on the CLI.
  • Fix crash when flattening shadow DOM.
  • Fix crash generating a crawl diff.
  • Fix crash when the Chromium can’t be initialised.

New in Screaming Frog SEO Spider 16.2 (Oct 18, 2021)

  • Fix issue with corrupt fonts for some users.
  • Fix bug in the UI that allowed you to schedule a crawl without a crawl seed in Spider Mode.
  • Fix stall opening saved crawls.
  • Fix issues with upgrades of database crawls using excessive disk space.
  • Fix issue with exported HTML visualisations missing pop up help.
  • Fix issue with PSI going too fast.
  • Fix issue with Chromium requesting webcam access.
  • Fix crash when cancelling an export.
  • Fix crash during JavaScript crawling.
  • Fix crash accessing visualisations configuration using languages other than English.

New in Screaming Frog SEO Spider 16.1 (Sep 27, 2021)

  • Updated some Spanish translations based on feedback.
  • Updated SERP Snippet preview to be more in sync with current SERPs.
  • Fix issue preventing the Custom Crawl Overview report for Data Studio working in languages other than English.
  • Fix crash resuming crawls with saved Internal URL configuration.
  • Fix crash caused by highlighting a selection then clicking another cell in both list and tree views.
  • Fix crash duplicating a scheduled crawl.
  • Fix crash during JavaScript crawl.

New in Screaming Frog SEO Spider 16.0 (Sep 22, 2021)

  • Improved JavaScript Crawling
  • Automated Crawl Reports For Data Studio
  • Advanced Search & Filtering and Translated UI

New in Screaming Frog SEO Spider 15.2 (May 18, 2021)

  • Bug fixes.

New in Screaming Frog SEO Spider 15.1 (Apr 14, 2021)

  • Bug fixes.

New in Screaming Frog SEO Spider 15.0 (Apr 12, 2021)

  • Crawl Comparison, Site Structure Comparison, Change Detection and URL Mapping For Staging / Different URL Structure Comparison.

New in Screaming Frog SEO Spider 14.3 (Mar 17, 2021)

  • Bug fixes.

New in Screaming Frog SEO Spider 14.2 (Feb 18, 2021)

  • Core Web Vitals Assessment, Broken Bookmarks (or ‘Jump Links’) and bug fixes

New in Screaming Frog SEO Spider 14.1 (Dec 7, 2020)

  • Bug fixes.

New in Screaming Frog SEO Spider 14.0 (Nov 23, 2020)

  • Dark Mode, Google Sheets Export, HTTP Headers, Cookies, Aggregated Site Structure and New Configuration Options.

New in Screaming Frog SEO Spider 13.2 (Aug 4, 2020)

  • Bug fixes.

New in Screaming Frog SEO Spider 13.1 (Jul 15, 2020)

  • Bug fixes

New in Screaming Frog SEO Spider 13.0 (Jul 1, 2020)

  • Near Duplicate Content, Spelling & Grammar, Improved Link Data – Link Position, Path Type & Target, Security Checks and Improved UX Bits.

New in Screaming Frog SEO Spider 12.6 (Feb 3, 2020)

  • Keywords, traffic and value data, which is now available courtesy of the Ahrefs API.

New in Screaming Frog SEO Spider 12.1 (Oct 25, 2019)

  • Bug fixes.

New in Screaming Frog SEO Spider 12.0 (Oct 22, 2019)

  • PageSpeed Insights, Database Storage Crawl Auto Saving & Rapid Opening, Configurable Tabs, Configurable Page Elements, Configurable Link Elements For Focused Auditing In List Mode, More Extractors, Custom Search Improvements and Redirect Chain Report Improvements.

New in Screaming Frog SEO Spider 11.1 (Mar 13, 2019)

  • Bug fixes

New in Screaming Frog SEO Spider 11.0 (Mar 5, 2019)

  • Structured Data & Validation, Multi-Select Details & Bulk Exporting, Tree-View Export, Visualisations Improvements, Smart Drag & Drop and Configure Internal CDNs.

New in Screaming Frog SEO Spider 10.3 (Oct 24, 2018)

  • Bug fixes.

New in Screaming Frog SEO Spider 10.1 (Oct 3, 2018)

  • Bug Fixes

New in Screaming Frog SEO Spider 9.0 (May 29, 2018)

  • Database Storage Mode (Scale), In-App Memory Allocation, Store & View HTML & Rendered HTML, Custom HTTP Headers, XML Sitemap Improvements, Granular Search Functionality, Updated SERP Snippet Emulator and Post Crawl API Requests.

New in Screaming Frog SEO Spider 8.1 (Jul 27, 2017)

  • Bug fixes.

New in Screaming Frog SEO Spider 8.0 (Jul 18, 2017)

  • Updated User Interface:
  • The SEO Spider has ruined many an SEO’s slide deck over the years, with its retro (yet, beautifully functional) user interface, and we felt it was finally deserving of an update. However, please don’t panic – it retains the core usability and data led functionality of what made the interface loved by users. And, it still stays true to its fairly retro styling. As you can see, it’s a little more modern and has splashes of colour, but now also takes advantage of new technologies in the updated framework, and works with HDPI monitors by default.
  • External Link Metrics Integration:
  • You can now connect to Majestic, Ahrefs and Moz APIs and pull in external link metrics during a crawl. This has been a much-requested feature and is extremely useful for performing a content audit, or quickly bulk checking link metrics against a list of URLs. When you have connected to an API, link metrics will appear in real time, under the new ‘Link Metrics’ tab and in the ‘Internal’ tab, so they can be combined with all the usual crawl and analytical data. We’ve now also introduced an ‘API’ tab into the right-hand window pane, to allow users to keep an eye on progress.
  • You will be required to have an account with the tool providers to pull in data using your own API credentials. Each of the tools offers different functionality and metrics from their APIs, and you’re able to customise what data you want to pull in.
  • The SEO Spider will calculate the API usage of pulling data based upon your API plan (where possible via the API), and can even combine link counts for HTTP and HTTPS versions of URLs for Majestic and Ahrefs to help you save time.
  • Moz is the only tool with a free (slower and more limited) API, as well as a paid plan, which you can select and which allows requests to be super fast. So you can pull in Moz metrics such as Page Authority, Domain Authority, or Spam Score and lots more.
  • Custom Configuration Profiles:
  • You can already adjust and save your configuration to be the default, however, we know users want to be able to switch between multiple set-ups quickly, depending on the crawl type, client or objective. Hence, you are now able to create multiple custom configuration profiles and seamlessly switch between them.
  • There isn’t a limit to the number of profiles; you can create as many as you like. The custom configuration profiles are saved within your user directory, so you can also copy and share your favourite profiles with colleagues for them to load and use.
  • JavaScript Redirects:
  • The SEO Spider will now discover and report on JavaScript redirects. The SEO Spider was the first commercial crawler with JavaScript rendering, and this functionality has been advanced further to help identify client-side redirects, which is another first.
  • While not strictly speaking a response code, they can be viewed under the ‘Response Codes’ tab and ‘Redirection (JavaScript)’ filter. Meta Refreshes are now also included within this area and treated in a similar way to regular server-side and client-side redirect reporting.
  • HSTS Support:
  • HTTP Strict Transport Security (HSTS) is a server directive that forces all connections over HTTPS. If any ‘insecure’ links are discovered in a crawl with a Strict-Transport-Security header set, the SEO Spider will show a 307 response with a status message of ‘HSTS Policy’.
  • The SEO Spider will request the HTTPS version as instructed, but highlight this with a 307 response (inline with browsers, such as Chrome), to help identify when HSTS and insecure links are used (rather than just requesting the secure version, and not highlighting that insecure links actually exist).
  • The search engines and browsers will only request the HTTPS version, so obviously the 307 response HSTS policy should not be considered a real temporary redirect or ‘a redirect to fix’.
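
To illustrate the behaviour, the sketch below checks a response for a Strict-Transport-Security header with the requests library and shows an insecure link being treated as upgraded; the URLs are placeholders and this is only a simplified analogy for how the Spider reports it.

```python
import requests

resp = requests.get("https://example.com/", timeout=10)  # placeholder URL
hsts = resp.headers.get("Strict-Transport-Security")

if hsts:
    print("HSTS policy:", hsts)  # e.g. "max-age=31536000; includeSubDomains"
    # An insecure link to this host should be requested over HTTPS instead,
    # which the SEO Spider surfaces as a 307 'HSTS Policy' response.
    insecure_link = "http://example.com/page"
    upgraded = "https://" + insecure_link[len("http://"):]
    print(f"{insecure_link} -> {upgraded} (307 HSTS Policy)")
```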
  • Hreflang Auditing In XML Sitemaps:
  • The SEO Spider already extracts, crawls and reports on hreflang attributes delivered by HTML link element and HTTP Header, and will now do so for XML Sitemaps in list mode as well.
  • There’s now an extra column under the ‘hreflang’ tab, for ‘Sitemap hreflang’ which allows users to audit for common issues, such as missing confirmation links, incorrect language codes, not using the canonical, and much more.
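
As a rough sketch of the same kind of audit, the snippet below parses xhtml:link hreflang alternates from a sitemap with Python's ElementTree and flags missing confirmation (reciprocal) links; the sitemap file name is a placeholder and the check is far simpler than the Spider's.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}

tree = ET.parse("sitemap.xml")  # placeholder local sitemap file
alternates = {}  # page URL -> {hreflang: alternate URL}

for url in tree.getroot().findall("sm:url", NS):
    loc = url.find("sm:loc", NS).text.strip()
    links = url.findall("xhtml:link[@rel='alternate']", NS)
    alternates[loc] = {link.get("hreflang"): link.get("href") for link in links}

# Page A lists B as an alternate, but B's entry doesn't list A back.
for page, langs in alternates.items():
    for lang, alt in langs.items():
        if alt in alternates and page not in alternates[alt].values():
            print(f"Missing confirmation link: {alt} does not reference {page}")
```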
  • Fetch & Render Screenshots Exporting:
  • You can view the rendered page the SEO Spider crawled in the ‘Rendered Page’ tab at the bottom of the user interface when crawling in JavaScript rendering mode. These rendered screenshots are now viewable within the ‘C:\Users\User Name\.ScreamingFrogSEOSpider\screenshots-XXXXXXXXXXXXXXX’ folder, and can be exported via the ‘Bulk Export > Screenshots’ top level menu, to save navigating, copying and pasting. The rendered screenshots are stored on a temporary basis to this directory during a crawl, and while it’s still open. They will only be saved if you save the crawl project.
  • Other Updates:
  • The ‘Internal’ tab now has new columns for ‘Unique Inlinks’, ‘Unique Outlinks’ and ‘Unique External Outlinks’ numbers. The unique number of ‘inlinks’ was previously only available within the ‘Site Structure’ tab and displays a percentage of the overall number of pages linking to a page.
  • A new ‘Noindex Confirmation Links’ filter is available within the ‘Hreflang’ tab and corresponding export in the ‘Reports > Hreflang > Noindex Confirmation Links’ menu.
  • An ‘Occurrences’ column has been added to the Hreflang tab to count the number of hreflang entries on each page and identify potential problems.
  • A new ‘Encoded URL’ column has been added to the ‘Internal’ and ‘Response Codes’ tabs.
  • The ‘Level’ column has been renamed to ‘Crawl Depth’ to avoid confusion & support queries.
  • There’s a new ‘External Links’ export under the ‘Bulk Export’ top level menu, which provides all source pages with external links.
  • The SERP Snippet tool has been updated to refine pixel widths within the SERPs.
  • Java is now bundled with the SEO Spider, so it doesn’t have to be downloaded separately anymore.
  • Added a new preset user-agent for SeznamBot (for a search engine in the Czech Republic). Thanks to Jaroslav for the suggestion.
  • The insecure content report now includes hreflang and rel=“next” and rel=“prev” links.
  • You can highlight multiple rows, right click and open them all in a browser now.
  • List mode now supports Sitemap Index files (alongside usual sitemap .xml files).
  • We also fixed up some bugs:
  • Fixed a couple of crashes in JavaScript rendering.
  • Fixed parsing of query strings in the canonical HTTP header.
  • Fixed a bug with missing confirmation links of external URLs.
  • Fixed a few crashes in Xpath and in GA integration.
  • Fixed filtering out custom headers in rendering requests, causing some rendering to fail.

New in Screaming Frog SEO Spider 7.2 (Jan 30, 2017)

  • This release includes:
  • Prevent printing when using JavaScript rendering.
  • Prevent playing of audio when using JavaScript rendering.
  • Fix issue with SERP panel truncating.
  • Fix crash in hreflang processing.
  • Fix crash in tree view when moving columns.
  • Fix hreflang ‘missing confirmation links’ filter not checking external URLs.
  • Fix status code of ‘illegal cookie’.
  • Fix crash when going to ‘Configuration > API Access > Google Analytics’.
  • Fix crash when sorting on the redirect column.
  • Fix crash in custom extraction.
  • Fix ‘Enable Rendered Page Screen Shots’ setting not saving.
  • Fix ‘Inconsistent Language Confirmation Links’ report, reporting the wrong ‘Actual Language’.
  • Fix: NullPointerException when saving a crawl.

New in Screaming Frog SEO Spider 7.1 (Dec 15, 2016)

  • Show decoded versions of hreflang URLs in the UI.
  • Fix issue with connecting to SSLv3-only web servers.
  • Handle standards based authentication when performing forms based authentication.
  • Handle popup windows when performing forms based authentication.
  • Fix typo in hreflang filter.

New in Screaming Frog SEO Spider 7.0 (Dec 12, 2016)

  • ‘Fetch & Render’ (Rendered Screen Shots):
  • You can now view the rendered page the SEO Spider crawled in the new ‘Rendered Page’ tab which dynamically appears at the bottom of the user interface when crawling in JavaScript rendering mode. This populates the lower window pane when selecting URLs in the top window. This feature is enabled by default when using the new JavaScript rendering functionality, and allows you to set the AJAX timeout and viewport size to view and test various scenarios. With Google’s much discussed mobile first index, this allows you to set the user-agent and viewport as Googlebot Smartphone and see exactly how every page renders on mobile. Viewing the rendered page is vital when analysing what a modern search bot is able to see and is particularly useful when performing a review in staging, where you can’t rely on Google’s own Fetch & Render in Search Console.
  • Blocked Resources:
  • The SEO Spider now reports on blocked resources, which can be seen individually for each page within the ‘Rendered Page’ tab, adjacent to the rendered screen shots. The blocked resources can also be seen under ‘Response Codes > Blocked Resource’ tab and filter. The pages this impacts and the individual blocked resources can also be exported in bulk via the ‘Bulk Export > Response Codes > Blocked Resource Inlinks’ report.
  • Custom robots.txt:
  • You can download, edit and test a site’s robots.txt using the new custom robots.txt feature under ‘Configuration > robots.txt > Custom’. The new feature allows you to add multiple robots.txt at subdomain level, test directives in the SEO Spider and view URLs which are blocked or allowed. During a crawl you can filter blocked URLs based upon the custom robots.txt (‘Response Codes > Blocked by robots.txt’) and see the matched robots.txt directive line.
  • Custom robots.txt is a useful alternative if you’re uncomfortable using the regex exclude feature, or if you’d just prefer to use robots.txt directives to control a crawl.
  • The custom robots.txt uses the selected user-agent in the configuration, and works well with the new fetch and render feature, where you can test how a web page might render with blocked resources.
  • We considered including a check for a double UTF-8 byte order mark (BOM), which can be a problem for Google. According to the spec, it invalidates the line – however this will generally only ever be due to user error. We don’t have any problem parsing it and believe Google should really update their behaviour to make up for potential mistakes.
  • Please note – The changes you make to the robots.txt within the SEO Spider, do not impact your live robots.txt uploaded to your server.
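
The same kind of directive testing can be sketched with Python's built-in urllib.robotparser, as below; the rules and URLs are illustrative only, and (like the custom robots.txt feature) nothing here touches a live file on any server. Note that Python's parser applies rules in first-match order rather than Google's longest-match rule, hence the Allow line coming first.

```python
import urllib.robotparser

# Allow lines come first because Python's parser uses first-match ordering.
rules = """
User-agent: *
Allow: /private/public-report.html
Disallow: /private/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

for url in ("https://example.com/private/secret.html",
            "https://example.com/private/public-report.html",
            "https://example.com/blog/"):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", verdict)
```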
  • hreflang Attributes:
  • First of all, apologies this one has been a long time coming. The SEO Spider now extracts, crawls and reports on hreflang attributes delivered by HTML link element and HTTP Header. They are also extracted from Sitemaps when crawled in list mode.
  • While users have historically used custom extraction to collect hreflang, by default these can now be viewed under the ‘hreflang’ tab, with filters for common issues.
  • While hreflang is a fairly simple concept, there’s plenty of issues that can be encountered in the implementation. We believe this is the most comprehensive auditing for hreflang currently available anywhere and includes checks for missing confirmation links, inconsistent languages, incorrect language/regional codes, non-canonical confirmation links, multiple entries, missing self-reference, not using the canonical, missing the x-default, and missing hreflang completely.
  • Additionally, there are four new hreflang reports available to allow data to be exported in bulk (under the ‘reports’ top level menu) –
  • Errors – This report shows any hreflang attributes which are not a 200 response (no response, blocked by robots.txt, 3XX, 4XX or 5XX responses) or are unlinked on the site.
  • Missing Confirmation Links – This report shows the page missing a confirmation link, and which page requires it.
  • Inconsistent Language Confirmation Links – This report shows confirmation pages which use different language codes to the same page.
  • Non Canonical Confirmation Links – This report shows the confirmation links which are to non canonical URLs.
  • This feature can be fairly resource-intensive on large sites, so extraction and crawling are entirely configurable under ‘Configuration > Spider’.
  • rel=”next” and rel=”prev” Errors:
  • This report highlights errors and issues with rel=”next” and rel=”prev” attributes, which are of course used to indicate paginated content.
  • The report will show any rel=”next” and rel=”prev” URLs which have a no response, blocked by robots.txt, 3XX redirect, 4XX, or 5XX error (anything other than a 200 ‘OK’ response).
  • This report also provides data on any URLs which are discovered only via a rel=”next” and rel=”prev” attribute and are not linked-to from the site (in the ‘unlinked’ column when ‘true’).
  • Maintain List Order Export:
  • One of our most requested features has been the ability to maintain the order of URLs when uploaded in list mode, so users can then export the data in the same order and easily match it up against the original data.
  • Unfortunately it’s not as simple as keeping the order within the interface, as the SEO Spider performs some normalisation under the covers and removes duplicates, which meant it made more sense to produce a way to export data in the original order.
  • Hence, we have introduced a new ‘export’ button which appears next to the ‘upload’ and ‘start’ buttons at the top of the user interface (when in list mode) which produces an export with data in the same order as it was uploaded.
  • The data in the export will be in the same order and include all of the exact URLs in the original upload, including duplicates or any fix-ups performed.
  • Web Forms Authentication (Crawl Behind A Login):
  • The SEO Spider has supported basic and digest standards-based authentication for some time, which enables users to crawl staging and development sites. However, there are other web forms and areas which require you to log in with cookies which have been inaccessible, until now.
  • We have introduced a new ‘authentication’ configuration (under ‘Configuration > Authentication’), which allows users to log in to any web form within the SEO Spider Chromium browser, and then crawl it. This means virtually all password-protected areas, intranets and anything which requires a web form login can now be crawled. Please note – This feature is extremely powerful and often areas behind logins will contain links to actions which a user doesn’t want to press (for example ‘delete’). The SEO Spider will obviously crawl every link, so please use responsibly, and not on your precious fantasy football team. With great power comes great responsibility(!). You can block the SEO Spider from crawling links or areas by using the exclude or custom robots.txt.
  • Other Updates:
  • All images now appear under the ‘Images’ tab. Previously the SEO Spider would only show ‘internal’ images from the same subdomain under the ‘images’ tab. All other images would appear under the ‘external’ tab. We’ve changed this behaviour as it was outdated, so now all images appear under ‘images’ regardless.
  • The URL rewriting ‘remove parameters’ input is now a blank field (similar to ‘include‘ and ‘exclude‘ configurations), which allows users to bulk upload parameters one per line, rather than manually inputting and entering each separate parameter.
  • The SEO Spider will now find the page title element anywhere in the HTML (not just the HEAD), like Googlebot. Not that we recommend having it anywhere else!
  • Introduced tri-state row sorting, allowing users to clear a sort and revert back to crawl order.
  • The maximum XML sitemap size has been increased to 50MB from 10MB, in line with Sitemaps.org updated protocol.
  • Fixed a crash in custom extraction!
  • Fixed a crash when using the date range Google Analytics configuration.
  • Fixed exports ignoring column order and visibility.
  • Fixed cookies set via JavaScript not working in rendered mode.
  • Fixed issue where SERP title and description widths were different for master view and SERP Snippet table on Windows for Thai language.

New in Screaming Frog SEO Spider 6.2 (Aug 16, 2016)

  • Fix for several crashes.
  • Fix for the broken unavailable_after in the directives filter.
  • Multiple extraction instances are now grouped together.
  • Export now respects column order and visibility preferences.

New in Screaming Frog SEO Spider 6.1 (Aug 3, 2016)

  • Java 8 update 66 is now required on all platforms, as this update fixes several bugs in Java
  • Reduced certificate verification to be more tolerant when crawling HTTPS sites
  • Fixed a crash when using the date range configuration for Google Analytics integration
  • Fixed an issue with the lower window pane obscuring the main data window for some users
  • Fixed a crash in custom extraction
  • Fixed an issue in JavaScript rendering mode with the JS navigator.userAgent not being set correctly, causing sites performing UA profiling in JavaScript to misfire
  • Fixed crash when starting a crawl without a selection in the overview window
  • Fixed an issue with being too strict on parsing title tags. Google seem to use them regardless of valid HTML head elements
  • Fixed a crash for Windows XP/Vista/Server 2013/Linux 32 bit users, which are not supported for rendering mode

New in Screaming Frog SEO Spider 6.0 (Jul 28, 2016)

  • Rendered Crawling (JavaScript):
  • There were two things we set out to do at the start of the year. Firstly, understand exactly what the search engines are able to crawl and index. This is why we created the Screaming Frog Log File Analyser, as a crawler will only ever be a simulation of search bot behaviour
  • Secondly, we wanted to crawl rendered pages and read the DOM. It’s been known for a long time that Googlebot acts more like a modern day browser, rendering content, crawling and indexing JavaScript and dynamically generated content rather well. The SEO Spider is now able to render and crawl web pages in a similar way
  • You can choose whether to crawl the static HTML, obey the old AJAX crawling scheme or fully render web pages, meaning executing and crawling of JavaScript and dynamic content
  • Google deprecated their old AJAX crawling scheme and we have seen JavaScript frameworks such as AngularJS (with links or utilising the HTML5 History API) crawled, indexed and ranking like a typical static HTML site. I highly recommend reading Adam Audette’s Googlebot JavaScript testing from last year if you’re not already familiar.
  • After much research and testing, we integrated the Chromium project library for our rendering engine to emulate Google as closely as possible. Some of you may remember the excellent ‘Googlebot is Chrome‘ post from Mike King back in 2011 which discusses Googlebot essentially being a headless browser.
  • The new rendering mode is really powerful, but there are a few things to remember –
  • Typically crawling is slower even though it’s still multi threaded, as the SEO Spider has to wait longer for the content to load and gather all the resources to be able to render a page. Our internal testing suggests Google wait approximately 5 seconds for a page to render, so this is the default AJAX timeout in the SEO Spider. Google may adjust this based upon server response and other signals, so you can configure this to your own requirements if a site is slower to load a page.
  • The crawling experience is quite different as it can take time for anything to appear in the UI to start with, then all of a sudden lots of URLs appear together at once. This is due to the SEO Spider waiting for all the resources to be fetched to render a page, before the data is displayed.
  • To be able to render content properly, resources such as JavaScript and CSS should not be blocked from the SEO Spider. You can see URLs blocked by robots.txt (and the corresponding robots.txt disallow line) under ‘Response Codes > Blocked By Robots.txt’. You should also make sure that you crawl JS, CSS and external resources in the SEO Spider configuration.
  • It’s also important to note that the SEO Spider renders content like a browser from your machine, so this can impact analytics and anything else that relies upon JavaScript.
  • By default the SEO Spider excludes executing of Google Analytics JavaScript tags within its engine, however if a site is using other analytics solutions or JavaScript that shouldn’t be executed, remember to use the exclude feature.
  • Configurable Columns & Ordering:
  • You’re now able to configure which columns are displayed in each tab of the SEO Spider (by clicking the ‘+’ in the top window pane).
  • You can also drag and drop the columns into any order and this will be remembered (even after a restart).
  • To revert back to the default columns and ordering, simply right click on the ‘+’ symbol and click ‘Reset Columns’ or click on ‘Configuration > User Interface > Reset Columns For All Tables’.
  • XML Sitemap & Sitemap Index Crawling:
  • The SEO Spider already allows crawling of XML sitemaps in list mode, by uploading the .xml file (number 8 in the ‘10 features in the SEO Spider you should really know‘ post) which was always a little clunky to have to save it if it was already live (but handy when it wasn’t uploaded!).
  • So we’ve now introduced the ability to enter a sitemap URL to crawl it (‘List Mode > Download Sitemap’).
  • Previously if a site had multiple sitemaps, you’d have to upload and crawl them separately as well.
  • Now if you have a sitemap index file to manage multiple sitemaps, you can enter the sitemap index file URL and the SEO Spider will download all sitemaps and subsequent URLs within them! This should help save plenty of time!
  • Improved Custom Extraction – Multiple Values & Functions:
  • We listened to feedback that users often wanted to extract multiple values, without having to use multiple extractors. For example, previously to collect 10 values, you’d need to use 10 extractors and index selectors ([1],[2] etc) with Xpath.
  • We’ve changed this behaviour, so by default a single extractor will collect all values found and report them via a single extractor for XPath, CSS Path and Regex. If you have 20 hreflang values, you can use a single extractor to collect them all and the SEO Spider will dynamically add additional columns for however many are required. You’ll still have 9 extractors left to play with as well.
  • You can still choose to extract just the first instance by using an index selector as well.
  • Functions can also be used anywhere in XPath, and you can now use them on their own as well via the ‘function value’ dropdown.
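
To illustrate the difference, the sketch below uses lxml: one XPath expression returns every matching hreflang value, while an index selector returns just the first instance. The sample HTML is hypothetical and this simply mirrors the extraction idea, not the Spider's internals.

```python
from lxml import html

doc = html.fromstring("""
<html><head>
  <link rel="alternate" hreflang="en" href="https://example.com/" />
  <link rel="alternate" hreflang="de" href="https://example.com/de/" />
  <link rel="alternate" hreflang="fr" href="https://example.com/fr/" />
</head><body></body></html>
""")

# One expression, many values - a column would be added per value found.
print(doc.xpath("//link[@rel='alternate']/@hreflang"))       # ['en', 'de', 'fr']

# An index selector still limits extraction to the first instance.
print(doc.xpath("(//link[@rel='alternate']/@hreflang)[1]"))  # ['en']
```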
  • rel=“next” and rel=“prev” Elements Now Crawled:
  • The SEO Spider can now crawl rel=“next” and rel=“prev” elements whereas previously the tool merely reported them. Now if a URL has not already been discovered, the URL will be added to the queue and the URLs will be crawled if the configuration is enabled (‘Configuration > Spider > Basic Tab > Crawl Next/Prev’).
  • rel=“next” and rel=“prev” elements are not counted as ‘Inlinks’ (in the lower window tab) as they are not links in a traditional sense. Hence, if a URL does not have any ‘Inlinks’ in the crawl, it might well be due to discovery from a rel=“next” and rel=“prev” or a canonical. We recommend using the ‘Crawl Path Report‘ to show how the page was discovered, which will show the full path.
  • There’s also a new ‘respect next/prev’ configuration option (under ‘Configuration > Spider > Advanced tab’) which will hide any URLs with a ‘prev’ element, so they are not considered as duplicates of the first page in the series.
  • Updated SERP Snippet Emulator:
  • Earlier this year in May Google increased the column width of the organic SERPs from 512px to 600px on desktop, which means titles and description snippets are longer. Google displays and truncates SERP snippets based on characters’ pixel width rather than number of characters, which can make it challenging to optimise.
  • Our previous research showed Google used to truncate page titles at around 482px on desktop. With the change, we have updated our research and logic in the SERP snippet emulator to match Google’s new truncation point before an ellipsis (…), which for page titles on desktop is around 570px.
  • Our research shows that while the space for descriptions has also increased they are still being truncated far earlier at a similar point to the older 512px width SERP. The SERP snippet emulator will only bold keywords within the snippet description, not in the title, in the same way as the Google SERPs.
  • Please note – You may occasionally see our SERP snippet emulator be a word out in either direction compared to what you see in the Google SERP. There will always be some pixel differences, which means that the pixel boundary might not be in the exact same spot that Google calculate 100% of the time.
  • We are still seeing Google play to different rules at times as well, where some snippets have a longer pixel cut off point, particularly for descriptions! The SERP snippet emulator is therefore not always exact, but a good rule of thumb.
  • Other Updates:
  • A new ‘Text Ratio’ column has been introduced in the internal tab which calculates the text to HTML ratio.
  • Google updated their Search Analytics API, so the SEO Spider can now retrieve more than 5k rows of data from Search Console.
  • There’s a new ‘search query filter’ for Search Console, which allows users to include or exclude keywords (under ‘Configuration > API Access > Google Search Console > Dimension tab’). This should be useful for excluding brand queries for example.
  • There’s a new configuration to extract images from the IMG srcset attribute under ‘Configuration > Advanced’.
  • The new Googlebot smartphone user-agent has been included.
  • Updated our support for relative base tags.
  • Removed the blank line at the start of Excel exports.
  • Fixed a bug with word count which could make it less accurate.
  • Fixed a bug with GSC CTR numbers.

New in Screaming Frog SEO Spider 5.0 (Sep 8, 2015)

  • Google Search Analytics Integration:
  • You can now connect to the Google Search Analytics API and pull in impression, click, CTR and average position data from your Search Console profile. Alongside Google Analytics integration, this should be valuable for Panda and content audits respectively.
  • We were part of the Search Analytics beta, so have had this for some time internally, but delayed the release a little, while we finished off a couple of other new features detailed below, for a larger release
  • For those already familiar with our Google Analytics integration, the set-up is virtually the same. You just need to give permission to our app to access data under ‘Configuration > API Access > Google Search Console’ –
  • The Search Analytics API doesn’t provide us with the account name in the same way as the Analytics integration, so once connected it will appear as ‘New Account’, which you can rename manually for now.
  • You can then select the relevant site profile, date range, device results (desktop, tablet or mobile) and country filter. Similar again to our GA integration, we have some common URL matching scenarios covered, such as matching trailing and non trailing slash URLs and case sensitivity.
  • When you hit ‘Start’ and the API progress bar has reached 100%, data will appear in real time during the crawl under the ‘Search Console’ tab, and dynamically within columns at the far right in the ‘Internal’ tab if you’d like to export all data together.
  • There’s a couple of filters currently for ‘Clicks Above 0’ when a URL has at least a single click, and ‘No GSC Data’, when the Google Search Analytics API did not return any data for the URL.
  • The API is currently limited to 5k rows of data, which we hope Google will increase over time. We plan to extend our integration further as well, but at the moment the Search Console API is fairly limited.
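
For comparison, here is a hedged sketch of querying the Search Analytics API directly with the google-api-python-client library (the 'webmasters' v3 service); the site URL, date range and credentials are placeholders you would supply yourself, and this is not the Spider's own integration code.

```python
from googleapiclient.discovery import build

# `creds` must be an authorised OAuth2 credentials object for your account,
# e.g. obtained via google-auth-oauthlib's InstalledAppFlow.
creds = ...

service = build("webmasters", "v3", credentials=creds)

response = service.searchanalytics().query(
    siteUrl="https://www.example.com/",  # placeholder property
    body={
        "startDate": "2015-08-01",
        "endDate": "2015-08-31",
        "dimensions": ["page"],
        "rowLimit": 5000,  # the row limit mentioned above
    },
).execute()

for row in response.get("rows", []):
    print(row["keys"][0], row["clicks"], row["impressions"],
          row["ctr"], row["position"])
```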
  • View & Audit URLs Blocked By Robots.txt:
  • You can now view URLs disallowed by the robots.txt protocol during a crawl.
  • Disallowed URLs will appear with a ‘status’ as ‘Blocked by Robots.txt’ and there’s a new ‘Blocked by Robots.txt’ filter under the ‘Response Codes’ tab, where these can be viewed efficiently.
  • The ‘Blocked by Robots.txt’ filter also displays a ‘Matched Robots.txt Line’ column, which provides the line number and disallow path of the robots.txt entry that’s excluding each URL. This should make auditing robots.txt files simple!
  • Historically the SEO Spider hasn’t shown URLs that are disallowed by robots.txt in the interface (they were only available via the logs). I always felt that it wasn’t required as users should know already what URLs are being blocked, and whether robots.txt should be ignored in the configuration.
  • However, there are plenty of scenarios where using robots.txt to control crawling and understanding quickly what URLs are blocked by robots.txt is valuable, and it’s something that has been requested by users over the years. We have therefore introduced it as an optional configuration, for both internal and external URLs in a crawl. If you’d prefer to not see URLs blocked by robots.txt in the crawl, then simply untick the relevant boxes.
  • URLs which are linked to internally (or externally), but are blocked by robots.txt can obviously accrue PageRank, be indexed and appear under search. Google just can’t crawl the content of the page itself, or see the outlinks of the URL to pass the PageRank onwards. Therefore there is an argument that they can act as a bit of a dead end, so I’d recommend reviewing just how many are being disallowed, how well linked they are, and their depth for example.
  • GA & GSC Not Matched Report:
  • The ‘GA Not Matched’ report has been replaced with the new ‘GA & GSC Not Matched Report’ which now provides consolidated information on URLs discovered via the Google Search Analytics API, as well as the Google Analytics API, but were not found in the crawl.
  • This report can be found under ‘reports’ in the top level menu and will only populate when you have connected to an API and the crawl has finished.
  • There’s a new ‘source’ column next to each URL, which details the API(s) it was discovered via (sometimes this can be both GA and GSC), but not matched against any URLs found within the crawl.
  • You can see in the example screenshot above from our own website, that there are some URLs with mistakes, a few orphan pages and URLs with hash fragments, which can show as quick links within meta descriptions (and hence why their source is GSC rather than GA).
  • I discussed how this data can be used in more detail within the version 4.1 release notes and it’s a real hidden gem, as it can help identify orphan pages, other errors, as well as just matching problems between the crawl and API(s) to investigate.
  • Configurable Accept-Language Header:
  • Google introduced local-aware crawl configurations earlier this year for pages believed to adapt content served, based on the request’s language and perceived location
  • This essentially means Googlebot can crawl from different IP addresses around the world and with an Accept-Language HTTP header in the request. Hence, like Googlebot, there are scenarios where you may wish to supply this header to crawl locale-adaptive content, with various language and region pairs. You can already use the proxy configuration to change your IP as well
  • You can find the new ‘Accept-Language’ configuration under ‘Configuration > HTTP Header > Accept-Language’
  • We have some common presets covered, but the combinations are huge, so there is a custom option available which you can just set to any value required.
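
Supplying the header yourself is a one-liner with the requests library, as the sketch below shows; the URL, user-agent string and language/region pair are placeholders.

```python
import requests

headers = {
    "Accept-Language": "de-DE,de;q=0.9",  # German (Germany), like a preset
    "User-Agent": "Mozilla/5.0 (compatible; example-crawler)",
}
resp = requests.get("https://www.example.com/", headers=headers, timeout=10)
print(resp.status_code, resp.headers.get("Content-Language"))
```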
  • Smaller Updates & Fixes:
  • The Analytics and Search Console tabs have been updated to allow URLs blocked by robots.txt to appear, which we believe to be HTML, based upon file type.
  • The maximum number of Google Analytics metrics you can collect from the API has been increased from 20 to 30. Google restrict the API to 10 metrics for each query, so if you select more than 10 metrics (or multiple dimensions), then we will make more queries (and it may take a little longer to receive the data).
  • With the introduction of the new ‘Accept-Language’ configuration, the ‘User-Agent’ configuration is now under ‘Configuration > HTTP Header > User-Agent’.
  • We added the ‘MJ12Bot’ to our list of preconfigured user-agents after a chat with our friends at Majestic.
  • Fixed a crash in XPath custom extraction.
  • Fixed a crash on start up with Windows Look & Feel and JRE 8 update 60.
  • Fixed a bug with character encoding.
  • Fixed an issue with Excel file exports, which write numbers with decimal places as strings, rather than numbers.
  • Fixed a bug with Google Analytics integration where the use of hostname in some queries was causing ‘Selected dimensions and metrics cannot be queried together’ errors.

New in Screaming Frog SEO Spider 4.0 (Jul 10, 2015)

  • New Features:
  • Google Analytics Integration
  • Custom Extraction
  • Bug fixes and smaller updates:
  • Improved performance for users using large regexes in the custom filter & fixed a bug preventing crawls with these from being resumed quickly.
  • Fixed an issue reported by Kev Strong, where the SEO Spider was unable to crawl URLs with an underscore in the hostname.
  • Fixed X-Robots-Tags header to be case insensitive, as reported by Merlinox.
  • Fixed a URL encoding bug.
  • Fixed a bug where the SEO Spider didn’t recognise text/javascript as JavaScript.
  • Fixed a bug with displaying HTML content length as string length, rather than length in bytes.
  • Fixed a bug where manual entry in list mode doesn’t work if a file upload has happened previously.
  • Fixed a crash when opening the SEO Spider in SERP mode and hovering over bar graph which should then display a tooltip.

New in Screaming Frog SEO Spider 3.1 (Feb 24, 2015)

  • The insecure content report has been improved to also include canonicals. So if you have a secure HTTPS URL, with an insecure HTTP canonical, these will be identified within the ‘insecure content’ report now, as well.
  • Increased the size of the URL input field by 100px in Spider mode.
  • Fixed a bug with ‘Respect Canonicals’ option, not respecting HTTP Header Canonicals.
  • Fixed a bug with ‘Crawl Canonicals’ not crawling HTTP Header Canonicals.
  • Fixed a crash on Windows, when users try to use the ‘Windows look and feel’, but have an older version of Java, without JavaFX.
  • Fixed a bug where we were not respecting ‘nofollow’ directives in the X-Robots-Tag Header, as reported by Merlinox.
  • Fixed a bug with the XML Sitemap file writing the ‘priority’ attribute with a comma, rather than a full stop, due to the user’s locale (a brief illustration of the locale issue follows this list).
  • Updated the progress percentage & average response time to format according to default locale.
  • Fixed a crash caused by parsing pages with an embed tag containing an invalid src attribute, eg embed src=”about:blank”.
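
A brief illustration (in Python, purely for explanation; the SEO Spider itself is not Python) of how locale-aware number formatting produces the comma that caused the sitemap ‘priority’ bug above. The German locale is an assumption and must be installed for the call to succeed.

    import locale

    priority = 0.5

    locale.setlocale(locale.LC_NUMERIC, "de_DE.UTF-8")   # assumed locale; may not be installed
    print(locale.format_string("%.1f", priority))        # -> "0,5" (invalid in a sitemap)

    print("%.1f" % priority)                             # locale-independent -> "0.5"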

New in Screaming Frog SEO Spider 3.00 (Feb 11, 2015)

  • Tree View:
  • You can now switch from the usual ‘list view’ of a crawl to a more traditional directory ‘tree view’ format, while still maintaining the granular detail of each URL crawled that you see in the standard list view. This additional view will hopefully help provide an alternative perspective when analysing a website’s architecture.
  • The SEO Spider doesn’t crawl this way natively, so switching to ‘tree view’ from ‘list view’ will take a little time to build, & you may see a progress bar on larger crawls for instance. This has been requested as a feature for quite some time, so thanks to all for their feedback.
  • Insecure Content Report:
  • We have introduced a ‘protocol’ tab, to allow you to easily filter and analyse by secure and non-secure URLs at a glance (as well as other protocols potentially in the future). As an extension to this, there’s also a new ‘insecure content’ report which will show any HTTPS URLs which have insecure elements on them. It’s very easy to miss some insecure content, which often only gets picked up on go-live in a browser. So if you’re working on HTTP to HTTPS migrations, this should be particularly useful. This report will identify any secure pages which link out to insecure content, such as internal HTTP links, images, JS, CSS, external CDNs, social profiles etc. (a rough sketch of this kind of check follows these release notes).
  • Image Sitemaps & Updated XML Sitemap Features:
  • You can now add images to your XML sitemap or create an image sitemap file. You now have the ability to include images which appear under the ‘internal’ tab from a normal crawl, or images which sit on a CDN. Typically you don’t want to include images like logos in an image sitemap, so you can also choose to only include images with a certain number of source attribute references. To help with this, we have introduced a new column in the ‘images’ tab which shows how many times an image is referenced (IMG Inlinks).
  • This is a nice easy way to exclude logos or social media icons, which are often linked to sitewide, for example. Obviously you can also right-click and ‘remove’ any images or URLs you don’t want to include, too! The ‘IMG Inlinks’ column is also very useful when viewing images with missing alt text, as you may wish to ignore social profiles without them etc.
  • There are now also plenty more options when generating an XML sitemap. You can choose whether to include ‘noindex’, canonicalised or paginated URLs, or PDFs, in the sitemap for example. Plus you now also have greater control over the lastmod, priority and change frequency.
  • Paste URLs In List Mode:
  • To help save time, you can now paste URLs directly into the SEO Spider in ‘list’ mode, or enter URLs manually (into a window) and upload a file like normal. Hopefully these additional options will be useful and help save time, particularly when you don’t want to save a file first to upload.
  • Improved Bulk Exporting:
  • We plan on making the exporting function entirely customisable, but for now bulk exporting has been improved so you can export all inlinks (or ‘source’ links) to the custom filter and directives, such as ‘noindex’ or ‘canonicalised’ pages if you wish to analyse crawl efficiency for example.
  • Windows Look & Feel:
  • There’s a new ‘user interface’ configuration for Windows only, that allows users to enable ‘Windows look and feel’. This will then adhere to the scaling settings a user has, which can be useful for some newer systems with very high resolutions.
  • Other Updates:
  • You can now view the ‘Last-Modified’ header response within a column in the ‘Internal’ tab. This can be helpful for tracking down new or old pages, or pages within a certain date range. ‘Response time’ of URLs has also been moved into the ‘Internal’ tab as well (it used to just be in the ‘Response Codes’ tab; thanks to RaphSEO for that one).
  • The parser has been updated so it’s less strict about the validity of HTML mark-up. For example, in the past if you had invalid HTML mark-up in the HEAD, page titles, meta descriptions or word count may not always be collected. Now the SEO Spider will simply ignore it and collect the content of elements regardless.
  • There’s now a ‘mobile-friendly’ entry in the description prefix dropdown menu of the SERP panel. From our testing, these are not used within the description truncation calculations by Google (so you have the same amount of space for characters as prior to their introduction).
  • We now read the contents of robots.txt files only if the response code is 200 OK. Previously we read the contents irrespective of the response code.
  • Loading of large crawl files has been optimised, so this should be much quicker.
  • We now remove ‘tabs’ from links, just like Google do (again, as per internal testing). So if a link on a page contains the tab character, it will be removed.
  • We have formatted numbers displayed in filter total and progress at the bottom. This is useful when crawling at scale! For example, you will see 500,000 rather than 500000.
  • The number of rows in the filter drop down have been increased, so users don’t have to scroll.
  • The default response timeout has been increased from 10 secs to 20 secs, as there appears to be plenty of slow responding websites still out there unfortunately!
  • The lower window pane cells are now individually selectable, like the main window pane.
  • The ‘search’ button next to the search field has been removed, as it was fairly redundant as you can just press ‘Enter’ to search.
  • There have been a few updates and improvements to the GUI that you may notice.
  • Fixed reported bugs:
  • Fixed a bug with ‘Depth Stats’, where the percentage didn’t always add up to 100%.
  • Fixed a bug when crawling from the domain root (without www.) and the ‘crawl all subdomains’ configuration ticked, which caused all external domains to be treated as internal.
  • Fixed a bug with inconsistent URL encoding. The UI now always shows the non URL encoded version of a URL. If a URL is linked to both encoded and unencoded, we’ll now only show the URL once.
  • Fixed a crash in Configuration->URL Rewriting->Regex Replace, as reported by a couple of users.
  • Fixed a crash caused by a bounds checking issue.
  • Fixed a bug where unchecking the ‘Check External’ tickbox still checked external links that are not HTML anchors (so images, CSS etc. were still checked).
  • Fixed a bug where the leading international character was stripped out from SERP title preview.
  • Fixed a bug when crawling links which contained a new line. Google removes and ignores them, so we do now as well.
  • Fixed a bug with AJAX URLs that are UTF-16 encoded using a BOM. We now derive the encoding from a BOM, if it’s present.
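
As referenced in the insecure content report note above, here is a rough sketch of the kind of check involved: fetch an HTTPS page and flag any http:// resources it references. It assumes the ‘requests’ and ‘beautifulsoup4’ libraries and is not the SEO Spider’s own implementation.

    import requests
    from bs4 import BeautifulSoup

    def insecure_references(page_url):
        """Return (tag, url) pairs for http:// resources referenced by an https:// page."""
        html = requests.get(page_url).text
        soup = BeautifulSoup(html, "html.parser")
        found = []
        for tag, attr in (("img", "src"), ("script", "src"), ("link", "href"), ("a", "href")):
            for element in soup.find_all(tag):
                value = element.get(attr) or ""
                if value.startswith("http://"):
                    found.append((tag, value))
        return found

    for tag, url in insecure_references("https://www.example.com/"):
        print(f"insecure {tag}: {url}")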

New in Screaming Frog SEO Spider 2.55 (Jul 29, 2014)

  • Command Line Option To Start Crawls:
  • We are working on a scheduling feature and full command line option. In the meantime, we have made a quick and easy update which allows you to start the SEO Spider and launch a crawl via the command line, which means you can now schedule a crawl.
  • Supplying no arguments starts the application as normal. Supplying a single argument of a file path tries to load that file in as a saved crawl. Supplying the following '--crawl http://www.example.com/' starts the spider and immediately triggers the crawl of the supplied domain. This switches the spider to crawl mode if it’s not the last used mode, and uses your default configuration for the crawl (a small scheduling sketch follows these notes).
  • Note: If your last used mode was not crawl, “Ignore robots.txt” and “Limit Search Depth” will be overwritten.
  • Small Tweaks:
  • A new configuration for ‘User Interface’ allows graphs to be enabled and disabled. There are performance issues on Late 2013 Retina MacBook Pros with JavaFX, which we explain here. A bug has been raised with Oracle and we are pressing for a fix. In the meantime, affected users can work around this by disabling graphs or using low resolution mode. A restart is required for this to take effect.
  • We have also introduced a warning for affected Mac users on start up (and in their UI settings) that they can either disable graphs or open in low resolution mode to improve performance.
  • Mac memory allocation settings can now persist when the app is reinstalled rather than be overwritten. There is a new way of configuring memory settings detailed in our memory section of the user guide.
  • We have further optimised graphs to only update when visible.
  • We re-worded the spider authentication pop up, which often confused users who thought it was an SEO Spider login!
  • We introduced a new pop-up message for memory related crashes.
  • Bug Fixes:
  • Fixed a crash with invalid regex entered into the exclude feature.
  • Fixed a bug introduced in 2.50 where starting up in list mode, then moving to crawl mode left the crawl depth at 0.
  • Fixed a minor UI issue with the default config menu allowing you to clear default configuration during a crawl. It’s now greyed out at the top level to be consistent with the rest of the File menu.
  • Fixed various window size issues on Macs.
  • Detect buggy saved coordinates for version 2.5 users on the Mac, so they get full screen on start up.
  • Fixed a couple of other crashes.
  • Fixed a typo in the crash warning pop up.
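
To go with the command line option above, a hypothetical scheduling wrapper in Python: it simply launches the SEO Spider executable with the '--crawl' argument described earlier. The executable path is an assumption and will differ by platform and install location.

    import subprocess

    # Assumed Windows install path - adjust for your own platform/installation.
    SPIDER_EXE = r"C:\Program Files (x86)\Screaming Frog SEO Spider\ScreamingFrogSEOSpider.exe"

    # Launch the spider and immediately start crawling the supplied domain,
    # as per the '--crawl' argument documented above.
    subprocess.Popen([SPIDER_EXE, "--crawl", "http://www.example.com/"])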

New in Screaming Frog SEO Spider 2.50 (Jul 1, 2014)

  • Graph View:
  • We have introduced a new ‘graph view’ window into the lower half of the right hand overview window pane. This updates in real time, as you crawl.
  • When you click the different tabs at the top, or filters in the right hand window pane, the graphs update to help visualise the data.
  • If you click the ‘Site Structure’ or ‘Response Codes’ tabs, you can see better visualisations of this data as well.
  • At the moment you can’t export the graphs unfortunately, but we do have this planned for development.
  • Other Updates:
  • We became a verified publisher for the Windows platform.
  • Improved debugging and saving of log files for support issues directly from the ‘Help’ and ‘debug’ menu.
  • Improved logging so data is not overwritten with new sessions.
  • The SEO Spider now detects any crashes on a previous run and asks the user to save logs and send to our support team.
  • The SEO Spider now has greater tolerance around invalid gzip responses. We will now continue with what was read up to the point of error, logging appropriately in the log file.
  • Fixed a bug where the ‘ignore robots.txt’ configuration wouldn’t save as default.
  • Fixed a bug where the ‘respect canonicals’ configuration was case insensitive.
  • Fixed a bug where we failed to parse some pages containing meta refresh tags, resulting in invalid URLs.
  • Fixed a crash caused by the sequence add, add, delete, delete, add on the URL Rewriting -> Remove Parameters config menu :-).
  • Fixed a crash caused by starting in crawl mode, navigating to the outlinks tab and right clicking.

New in Screaming Frog SEO Spider 2.40 (Jun 30, 2014)

  • SERP Snippets Now Editable:
  • First of all, the SERP snippet tool we released in our previous version has been updated extensively to include a variety of new features. The tool now allows you to preview SERP snippets by device type (whether it’s desktop, tablet or mobile) which all have their own respective pixel limits for snippets. You can also bold keywords, add rich snippets or description prefixes like a date to see how the page may appear in Google.
  • SERP Mode For Uploading Page Titles & Descriptions:
  • You can now switch to ‘SERP mode’ and upload page titles and meta descriptions directly into the SEO Spider to calculate pixel widths. There is no crawling involved in this mode, so they do not need to be live on a website.
  • Crawl Overview Right Hand Window Pane:
  • We received a lot of positive feedback on our crawl overview report when it was released last year. However, we felt that it was a little hidden away, so we have introduced a new right-hand window which includes the crawl overview report as default. This overview pane updates alongside the crawl, which means you can see which tabs and filters are populated at a glance during the crawl, and their respective percentages.
  • Ajax Crawling:
  • Some of you may remember an older version of the SEO Spider which had an iteration of Ajax crawling, which was removed in a later version. We have redeveloped this feature, so the SEO Spider can now crawl Ajax as per Google’s Ajax crawling scheme, also sometimes (annoyingly) referred to as hashbang URLs (a sketch of the underlying URL mapping follows these release notes).
  • Canonical Errors Report:
  • Under the ‘reports‘ menu, we have introduced a ‘canonical errors’ report which includes any canonicals which have no response, are a 3XX redirect or a 4XX or 5XX error.
  • This report also provides data on any URLs which are discovered only via a canonical and are not linked to from the site (so have no HTML anchors pointing to the URL). This report will hopefully help save time, so canonicals don’t have to be audited separately via list mode.
  • We have also made a large number of other updates:
  • A ‘crawl canonicals‘ configuration option (which is ticked by default) has been included, so the user can decide whether they want to actually crawl canonicals or just reference them.
  • Added new Googlebot for Smartphones user-agent and retired the Googlebot-Mobile for Smartphones UA. Thanks to Glenn Gabe for the reminder.
  • The ‘Advanced Export’ has been renamed to ‘Bulk Export‘. ‘XML Sitemap‘ has been moved under a ‘Sitemaps’ specific navigation item.
  • Added a new ‘No Canonical’ filter to the directives tab which helps view any html pages or PDFs without a canonical.
  • Improved performance of .xlsx file writing to be close to .csv and .xls
  • ‘Meta data’ has been renamed to ‘Meta Robots’.
  • The SEO Spider now always supplies the Accept-Encoding header to work around several sites that are 404 or 301’ing based on it not being there (even though it’s not actually a requirement…).
  • Allow user to cancel when uploading in list mode.
  • Provide feedback in stages when reading a file in list mode.
  • Max out Excel lines per sheet limits for each format (65,536 for xls, and 1,048,576 for xlsx).
  • The lower window ‘URL info’ tab now contains much more data collected about the URL.
  • ‘All links’ in the ‘Advanced Export’ has been renamed to ‘All In Links’ to provide further clarity.
  • The UI has been lightened and there’s a little more padding now.
  • Fixed a bug where empty alt tags were not being picked up as ‘missing’. Thanks to the quite brilliant Ian Macfarlane for reporting it.
  • Fixed a bug upon some URLs erroring upon upload in list mode. Thanks again to Fili for that one.
  • Fixed a bug in the custom filter export due to the file name including a colon as default. Oops!
  • Fixed a bug with images disappearing in the lower window pane, when clicking through URLs.
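
For the Ajax crawling note above, a small sketch of the URL mapping behind Google’s Ajax crawling scheme: a ‘#!’ (hashbang) fragment is rewritten into an ‘_escaped_fragment_’ query parameter, which is the URL a crawler actually requests. This is an illustration of the scheme, not the SEO Spider’s internals.

    from urllib.parse import quote

    def escaped_fragment_url(url):
        """Map a #! URL to its _escaped_fragment_ equivalent."""
        if "#!" not in url:
            return url
        base, fragment = url.split("#!", 1)
        separator = "&" if "?" in base else "?"
        return base + separator + "_escaped_fragment_=" + quote(fragment, safe="=&")

    print(escaped_fragment_url("http://www.example.com/page#!key=value"))
    # -> http://www.example.com/page?_escaped_fragment_=key=value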

New in Screaming Frog SEO Spider 2.30 (Mar 21, 2014)

  • I am really pleased to announce version 2.30 of the Screaming Frog SEO spider code-named ‘Weckl’. This update includes new features that provide greater control of crawls, as well as how the data is analysed. Thanks again to everyone for their continued feedback, suggestions and support.
  • We still have lots of features currently in development and planned on our roadmap, so we expect to make another release fairly soon. For now, version 2.30 of the SEO Spider includes the following:
  • Pixel Width Calculated For Page Titles & Meta Descriptions:
  • Back in 2012 Google changed the way they display snippets in their search results. Historically this was simply a character limit of around 70 characters for page titles and 156 characters for meta descriptions. However, Google switched this to determine the actual pixel width of the characters used, which we believe is currently a limit of 512 pixels for page titles, which they truncate with CSS. You might expect meta descriptions to simply be double this 512px div figure (1,024px); however, they appear to be approximately 920 pixels wide.
  • Page titles or meta descriptions with lots of thin characters (such as ‘i’ or ‘l’ etc) can have more characters than the old set limit. Conversely, if particularly wide characters are used, Google can show far less than the old limits. I believe the first experiments around this behaviour at the time were from Darren Slatten, who had already built a very cool SERP Snippet tool long before.
  • Therefore, while character counts are still useful, they are not a particularly accurate measurement of what will display in the SERPs. We have been meaning to calculate pixel width in the tool for some time, and with Google’s recent search result redesign we reverse engineered Google’s logic to calculate pixel width and provide greater accuracy (a rough illustration of measuring pixel width follows these notes).
  • SERP Snippet Tool:
  • We have also introduced a ‘SERP snippet’ emulator into the lower window tabs, which will show you how a page (when selected in the top window pane) might look in the Google search results based upon the pixel width logic described above.
  • The emulator provides detail on the width in pixels and how many pixels a title or description is over or under the limit. Please note, this won’t always be exact and it is our first iteration. This feature will be developed over time to include much more (like dates, item counts, rich snippets etc.), based upon the ever changing search results of Google (and possibly even other search engines).
  • Configurable Preferences:
  • We have introduced a ‘preferences’ tab into the spider configuration which puts you in control over the character (and pixel width) limits and therefore filters for things like URLs, page titles, meta descriptions, headings, image alt text and image size.
  • So if you have performed your own research, or wish to amend the limits and filters in the interface, you can.
  • Canonicals Are Now Crawled:
  • The SEO Spider now crawls canonicals, whereas previously the tool merely reported them. Now if a URL has not already been discovered, it will be added to the queue and crawled, just as a search bot would do.
  • Canonicals are not counted as ‘In Links’ (in the lower window tab) as they are not links in a traditional sense. Hence, if a URL does not have any ‘In Links’ in the crawl, it might well be due to discovery from a canonical. We recommend using the ‘Crawl Path Report‘ to show how the page was discovered, which will show the full path and whether it was via a canonical.
  • More In-Depth Crawl Limits:
  • We have moved a few options into a ‘limits’ tab under the spider configuration and introduced new options such as limiting crawl by maximum URL length, maximum folder depth and by the number of parameter query strings.
  • I have been asked for a regex to exclude crawling of all parameters in a URL a lot over the past six months (it’s .*\?.* by the way), so this will mean users don’t have to brush up on regex to use the exclude function for query parameters. The new ‘limit number of query strings’ option can simply be set to ‘0’.
  • Microsoft Excel Support:
  • We added Excel support for .xlsx and .xls for both writing and reading (in list mode). The SEO Spider will automatically remember your preferred file type.
  • Other Smaller Updates:
  • Configuration for the maximum number of redirects to follow under the ‘advanced configuration’.
  • A ‘Canonicalised’ filter in the ‘directives’ tab, so you can easily filter URLs which don’t have matching (or self referencing) canonicals. So URLs which have been ‘canonicalised’ to another location.
  • The custom filter drop downs now have descriptions based on what’s inserted in the custom filter configuration.
  • Support for Mac Retina displays.
  • Further warnings when overwriting a file on export.
  • New timing to reading in list mode when uploading URLs.
  • Right click remove and re-spider now remove focus on tables.
  • The last window size and position is now remembered when you re-open.
  • Fixed missing dependency of xdg-utils for Linux.
  • Fixed a bug causing some crashes when using the right click function whilst also filtering.
  • Fixed some Java issues with SSLv3.
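
A rough illustration of the pixel width idea above, using Pillow rather than the SEO Spider’s own calculation. The font file, the 18px size and the 512px limit are assumptions for the sketch; Google’s exact metrics differ and change over time.

    from PIL import ImageFont

    TITLE_LIMIT_PX = 512  # approximate page title limit discussed above

    font = ImageFont.truetype("arial.ttf", 18)   # assumes an Arial TTF is available locally
    title = "Screaming Frog SEO Spider - Pixel Width Example Page Title"
    width = font.getlength(title)                # rendered width of the string in pixels

    print(f"{width:.0f}px", "(would truncate)" if width > TITLE_LIMIT_PX else "(fits)")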

New in Screaming Frog SEO Spider 2.22 (Dec 5, 2013)

  • Following feedback on individual cell and row selection in 2.21, we have improved this and introduced row numbers.

New in Screaming Frog SEO Spider 2.20 (Jul 15, 2013)

  • Redirect Chains Report:
  • There is a new ‘reports’ menu in the top level navigation of the UI, which contains the redirect chains report. This report essentially maps out chains of redirects, the number of hops along the way and will identify the source, as well as if there is a loop. This is really useful as the latency for users can be longer with a chain, a little extra PageRank can dissipate in each hop and a large chain of 301s can be seen as a 404 by Google.
  • Another very cool part of the redirect chain report is how it works for site migrations alongside the new ‘Always follow redirects‘ option (in the ‘advanced tab’ of the spider configuration). Now when you tick this box, the SEO spider will continue to crawl redirects even in list mode and ignore crawl depth.
  • Previously the SEO Spider would only crawl the first redirect and report the redirect URL target under the ‘Response Codes’ tab. However, as list mode essentially works at a crawl depth of ‘0’, you wouldn’t see the status of the redirect target, which is required particularly on migrations when a large number of URLs are changed. Potentially a URL could 301, then 301 again and then 404. To find this previously, you had to upload each set of target URLs each time to analyse responses and the destination. Now the SEO Spider will continue to crawl until it has found the final target URL, so you can view redirect chains in a nice little report in list mode (a simple sketch of following a chain hop by hop follows these notes).
  • Crawl Path Report:
  • Have you ever wanted to know how a URL was discovered? Obviously you can view ‘in links’ of a URL, but when there is a particularly deep page, or perhaps an infinite URLs issue caused by incorrect relative linking, it can be a pain to track down the originating source URL (Tip! – To find the source manually, sort URLs alphabetically and find the shortest URL in the sequence!). However, now on right click of a URL (under ‘export’), you can see how the spider discovered a URL and what crawl path it took from start to finish.
  • Respect noindex & Canonical:
  • You now have the option to ‘respect’ noindex and canonical directives. If you tick this box under the advanced tab of the spider configuration, the SEO Spider will respect them. This means ‘noindex’ URLs will obviously still be crawled, but they will not appear in the interface (in any tab) and URLs which have been ‘canonicalised’ will also not appear either. This is useful when analysing duplicate page titles, or descriptions which have been fixed by using one of these directives above.
  • rel=“next” and rel=“prev”:
  • The SEO Spider now collects these html link elements designed to indicate the relationship between URLs in a paginated series. rel=“next” and rel=“prev” can now be seen under the ‘directives’ tab.
  • Custom Filters Now Regex:
  • Similar to our include, exclude and internal search function, the custom filters now support regex, rather than just query string. Oh and we have increased the number of filters from five to ten and included ‘occurrences’. So if you’re searching for an analytics UA ID or a particular phrase, the number of times it appears within the source code of a URL will be reported as well.
  • Crawl Overview Report:
  • Under the new ‘reports’ menu discussed above, we have included a little ‘crawl overview report’. This does exactly what it says on the tin and provides an overview of the crawl, including the total number of URLs encountered, the total actually crawled, the content types, response codes etc. with proportions. This will hopefully provide another quick and easy way to analyse overall site health at a glance.
  • We have also changed the max page title length to 65 characters (although it now seems to be based on pixel width), added a few more preset mobile user-agents, fixed some bugs (such as large sitemaps being created over the 10Mb limit) and made other smaller tweaks along the way.
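
A simple sketch of what the redirect chains report does conceptually: follow each redirect hop by hop, recording the chain and spotting loops. It assumes the Python ‘requests’ library and is only an illustration of the idea.

    import requests
    from urllib.parse import urljoin

    def redirect_chain(url, max_hops=10):
        """Follow redirects one hop at a time, returning (url, status) pairs."""
        chain, seen = [], set()
        while url and len(chain) < max_hops:
            if url in seen:
                chain.append((url, "LOOP"))
                break
            seen.add(url)
            response = requests.get(url, allow_redirects=False)
            chain.append((url, response.status_code))
            location = response.headers.get("Location")
            url = urljoin(url, location) if location else None
        return chain

    for hop, status in redirect_chain("http://www.example.com/old-page"):
        print(status, hop)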

New in Screaming Frog SEO Spider 2.00 (Jun 28, 2012)

  • Word count:
  • The SEO spider now counts the number of words on a given URL between the body tags. This is useful for finding low-content pages; you can read our word count definition here.
  • URL rewriting:
  • The SEO spider now allows you to rewrite URLs. This is particularly useful for sites with session IDs or excess parameters, as you can now simply remove them from the URLs using this feature (a short sketch of the idea follows these notes). You can read about URL rewriting in our user guide.
  • Auto check for updates:
  • You don’t have to manually check for updates anymore, we let you know when one is available.
  • Remove URLs:
  • We allow you to delete URLs completely from the SEO spider (upon the right click). So if you only wish to export certain URLs, or create a sitemap with specific URLs, you can do it in the interface (rather than exporting to Excel).
  • Advanced exports:
  • We have renamed the ‘export’ option in the top level menu, to ‘Advanced export’ to differentiate it from the usual ‘export’ option. This area allows you to export in bulk, rather than just from the window in your current view. We have included additional exports under this section as well, including exporting of all alt text and anchor text. You can read more about the advanced export feature in our user guide.
  • Crawling outside of sub folders / domains:
  • By default, the SEO spider has always crawled from the sub domain or sub directory forwards. This is really useful for most sites, but there are some configurations where it can be a pain. So, we have provided a couple of extra options to crawl outside of the start sub folder or sub domain, for greater control of the crawl. Using this feature you can now start a crawl from anywhere you’d like on the site and we will still crawl all URLs, for example. Both of these new options can be found under the ‘include’ option in configuration.
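
A short sketch of the idea behind URL rewriting’s parameter removal, in Python for illustration only: strip named query parameters (session IDs, tracking parameters etc.) so variants of a URL collapse to one version. The parameter names are examples, not the tool’s defaults.

    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    REMOVE = {"jsessionid", "sessionid", "utm_source", "utm_medium"}   # illustrative names

    def rewrite(url):
        """Drop unwanted query parameters, keeping everything else intact."""
        parts = urlsplit(url)
        kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                if k.lower() not in REMOVE]
        return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))

    print(rewrite("http://www.example.com/page?sessionid=abc123&sort=price"))
    # -> http://www.example.com/page?sort=price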

New in Screaming Frog SEO Spider 1.90 (Feb 23, 2012)

  • There are not many feature updates in 1.90, but there has been a lot of work behind the scenes improving the product and fixing a few technical issues. These include:
  • Include follow true/false at link level in our bulk link exports. This will allow you to find use of internal ‘nofollow’ instantly.
  • The SEO spider now crawls .swf (flash) files.
  • You can now export all alt text via the bulk export.
  • When you’re saving files, if you rename it so that it doesn’t include the file extension, it is now automatically added. This applies to both .seospider and .csv files.
  • The SEO spider now has exit warnings when you have unsaved data and provides an option to file save at that point. We have also added a warning when exiting the spider when it’s still crawling.
  • We fixed a relative linking crawling issue.
  • We fixed a case sensitivity issue in our crawling.
  • There were some reported issues of difficulty allocating memory above 1,024mb on 64-bit machines. This update should solve these!
  • We now ignore newlines in anchor tags.