LinkChecker Changelog

New in LinkChecker 9.3 (Jan 7, 2015)

  • Features:
  • checking: Parse and check links in PDF files.
  • checking: Parse Refresh: and Content-Location: HTTP headers for URLs.
  • Changes:
  • plugins: PDF and Word checks are now parser plugins (PdfParser, WordParser). Neither plugin is enabled by default since both require third-party modules.
  • plugins: Print a warning for enabled plugins that could not import needed third party modules.
  • checking: Treat empty URLs the same as the parent URL.
  • installation: Replaced the twill dependency with local code.
  • Fixes:
  • checking: Catch XML parse errors in sitemap XML files and print them as warnings.
  • checking: Fix internal URL match pattern. Patch by Mark-Hetherington.
  • checking: Recalculate extern status after HTTP redirection.
  • checking: Do not strip quotes from already resolved URLs.
  • cgi: Sanitize configuration.
  • checking: Use user-supplied authentication and proxies when requesting robots.txt.
  • plugins: Fix Word file check plugin.
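
The Refresh: header parsing listed under Features for this release can be illustrated with a minimal sketch. This is not LinkChecker's actual code: the function name refresh_target and the regular expression are assumptions made for illustration, and the sketch targets Python 3.

```python
import re
from urllib.parse import urljoin

# Matches "<seconds>; url=<target>", with optional quotes around the target.
# The pattern is an illustrative assumption, not taken from LinkChecker.
_REFRESH_RE = re.compile(
    r"""^\s*\d+(?:\.\d+)?\s*[;,]\s*url\s*=\s*["']?(?P<url>[^"']+?)["']?\s*$""",
    re.IGNORECASE,
)


def refresh_target(header_value, base_url):
    """Return the absolute URL named in a Refresh header value, or None."""
    match = _REFRESH_RE.match(header_value)
    if match is None:
        # A bare delay such as "10" reloads the same page and names no URL.
        return None
    # Relative targets are resolved against the URL of the response.
    return urljoin(base_url, match.group("url").strip())
```

A checker could then queue the returned URL like any other link found in the response.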

New in LinkChecker 9.2 (Jul 1, 2014)

  • Fixes:
  • checking: Don't scan external robots.txt sitemap URLs.
  • installation: Correct case for pip install command.
  • Features:
  • checking: Parse and check HTTP Link: headers.
  • checking: Support parsing of HTML image srcset attributes.
  • checking: Support parsing of HTML schema itemtype attributes.
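
As a rough illustration of the srcset support listed above, the following sketch extracts the candidate URLs from an attribute value. The function name srcset_urls is hypothetical, and the simple comma split is an assumption that does not handle URLs which themselves contain commas (for example data: URLs).

```python
def srcset_urls(value):
    """Return the URLs listed in an HTML srcset attribute value.

    Each comma-separated candidate has the form "<url> [<descriptor>]",
    where the descriptor is something like "480w" or "2x".
    """
    urls = []
    for candidate in value.split(","):
        parts = candidate.split()
        if parts:
            # The first whitespace-separated token is the URL;
            # anything after it is the width or density descriptor.
            urls.append(parts[0])
    return urls
```

For the value "small.jpg 480w, large.jpg 2x" this yields the two image URLs, each of which can then be checked like an ordinary src attribute.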

New in LinkChecker 9.1 (Mar 31, 2014)

  • Features:
  • checking: Support parsing of sitemap and sitemap index XML files.
  • Closes: GH bug #413
  • checking: Add new HTTP header info plugin.
  • logging: Support arbitrary encodings in CSV output.
  • Closes: GH bug #467
  • installation: Use .gz compression for source release to support "pip install".
  • Closes: GH bug #461
  • Changes:
  • checking: Ignored URLs are reported earlier now.
  • checking: Updated the list of unknown or ignored URI schemes.
  • checking: Internal errors do not disable check threads anymore.
  • checking: Disable URL length warning for data: URLs.
  • checking: Do not warn about missing addresses on mailto links that have subjects.
  • checking: Check and display SSL certificate info even on redirects.
  • Closes: GH bug #489
  • installation: Check requirement for Python requests >= 2.2.0.
  • Closes: GH bug #478
  • logging: Display downloaded bytes.
  • Fixes:
  • checking: Fix internal errors in debug output.
  • Closes: GH bug #472
  • checking: Fix URL result caching.
  • checking: Fix assertion in external link checking.
  • checking: Fix SSL errors on Windows.
  • Closes: GH bug #471
  • checking: Fix error when SNI checks are enabled.
  • Closes: GH bug #488
  • gui: Fix warning regex settings.
  • Closes: GH bug #485

New in LinkChecker 9.0 (Mar 5, 2014)

  • Features:
  • checking: Support connection and content check plugins
  • checking: Move lots of custom checks like Antivirus and syntax checks into plugins (see upgrading.txt for more info)
  • checking: Add options to limit the number of requests per second, allowed URL schemes and maximum file or download size
  • checking: Support checking Sitemap: URLs in robots.txt files
  • checking: Reduced memory usage when caching checked links
  • gui: UI language can be changed dynamically
  • Changes:
  • checking: Use the Python requests module for HTTP and HTTPS requests
  • logging: Removed download, domains and robots.txt statistics
  • logging: HTML output is now in HTML5
  • checking: Removed 301 warning since 301 redirects are used a lot without updating the old URL links. Also, recursive redirection is not checked any more since there is a maximum redirection limit anyway
  • checking: Disallowed access by robots.txt is an info now, not a warning. Otherwise it produces a lot of warnings which is counter-productive
  • checking: Do not check SMTP connections for mailto: URLs anymore. It resulted in lots of false warnings since spam prevention usually disallows direct SMTP connections from unrecognized client IPs
  • checking: Only internal URLs are checked by default. To check external URLs use --check-extern
  • checking: Document that gconf and KDE proxy settings are parsed
  • checking: Disable twill page refreshing
  • checking: The default number of checking threads is 10 now instead of 100
  • Fixes:
  • logging: Status was printed every second regardless of the configured wait time
  • logging: Add missing column name to SQL insert command
  • checking: Several speed and memory usage improvements
  • logging: Fix --no-warnings option
  • logging: The -o none option now sets the exit code
  • checking: For login pages, use twill form field counter if the field has neither name nor id
  • configuration: Check regular expressions for errors
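
To illustrate the Sitemap: support for robots.txt files listed under Features, here is a minimal sketch that collects the sitemap URLs from the text of a robots.txt file. The function name sitemap_urls is hypothetical and the parsing rules are simplified assumptions, not LinkChecker's own implementation.

```python
def sitemap_urls(robots_txt):
    """Return the URLs of all "Sitemap:" records in a robots.txt text."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Split only at the first colon so the colon in "http://" survives.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls
```

Field names are matched case-insensitively, so both "Sitemap:" and "sitemap:" records are picked up.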

New in LinkChecker 8.4 (Jan 26, 2013)

  • Features:
  • checking: Support URLs.
  • logging: Sending a SIGUSR1 signal prints the stack trace of all currently running threads. This makes debugging deadlocks easier.
  • gui: Support Drag-and-Drop of local files. If the local file is a LinkChecker project (.lcp) file it is loaded, else the check URL is set to the local file URL.
  • Changes:
  • checking: Increase per-host connection limits to speed up checking.
  • Fixes:
  • checking: Fix a crash when closing a Word document after scanning failed.
  • Closes: GH bug #369
  • checking: Catch UnicodeError from idna.encode() fixing an internal error when trying to connect to certain invalid hostnames.
  • checking: Always close HTTP connections without body content.
  • See also http://bugs.python.org/issue16298
  • Closes: GH bug #376
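
The SIGUSR1 stack trace feature listed above can be approximated with the Python standard library. The sketch below is one possible approach under stated assumptions, not LinkChecker's actual handler: the function names are hypothetical, and it relies on sys._current_frames(), which is a CPython implementation detail.

```python
import signal
import sys
import threading
import traceback


def format_thread_stacks():
    """Return the stack traces of all running threads as one string."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (names.get(ident, "unknown"), ident))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


def install_stack_dump_handler():
    """Print all thread stacks to stderr on SIGUSR1; return False if unsupported."""
    if not hasattr(signal, "SIGUSR1"):
        # SIGUSR1 does not exist on Windows.
        return False
    # signal.signal() must be called from the main thread.
    signal.signal(
        signal.SIGUSR1,
        lambda signum, frame: sys.stderr.write(format_thread_stacks()),
    )
    return True
```

With the handler installed, running kill -USR1 with the process ID from another shell shows where each thread is currently blocked.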

New in LinkChecker 8.3 (Jan 17, 2013)

  • Features:
  • project: The project moved to GitHub.
  • Changes:
  • logging: Print system arguments (sys.argv) and variable values in internal error information.
  • installation: Install the dns Python module into linkcheck_dns subdirectory to avoid conflicts with an upstream python-dns installation.
  • Fixes:
  • gui: Fix storing of ignore lines in options.

New in LinkChecker 8.2 (Nov 10, 2012)

  • Changes:
  • checking: Print a warning when passwords are found in the configuration file and the file is accessible by others.
  • checking: Add debug statements for unparseable content types.
  • checking: Turn off caching. This improves memory performance drastically and it's a very seldom used feature - judging from user feedback over the years and my own experience.
  • checking: Only allow checking of local files when parent URL does not exist or it's also a file URL.
  • Fixes:
  • checking: Fix anchor checking of cached HTTP URLs.
  • checking: Fix cookie path matching with empty paths.
  • checking: Fix handling of non-ASCII exceptions (regression in 8.1).
  • configuration: Fix configuration directory creation on Windows systems.
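
The warning about password-bearing configuration files that are accessible by others could be based on a permission test like the one sketched here. The function name readable_by_others is hypothetical, and the check assumes POSIX permission bits; it is not meaningful on Windows.

```python
import os
import stat


def readable_by_others(path):
    """Return True if the group or other users may read the file at path."""
    mode = os.stat(path).st_mode
    # S_IRGRP and S_IROTH are the read bits for group and others.
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))
```

A file with mode 0644 would trigger the warning, while one restricted with chmod 600 would not.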

New in LinkChecker 8.1 (Oct 15, 2012)

  • Features:
  • checking: Allow specification of maximum checking time or maximum number of checked URLs.
  • checking: Send a HTTP Do-Not-Track header.
  • checking: Check URL length. Print an error for URLs longer than 2000 characters and a warning for URLs longer than 255 characters.
  • checking: Warn about duplicate URL contents.
  • logging: A new XML sitemap logger can be used that implements the defined protocol
  • Changes:
  • doc: Mention 7-zip and Peazip to extract the .tar.xz under Windows.
  • logging: Print download and cache statistics in text output logger.
  • logging: Print warning tag in text output logger. This makes filtering warnings easier.
  • logging: Make the last modification time a separate field in logging output. See doc/upgrading.txt for compatibility changes.
  • logging: All sitemap loggers log all valid URLs regardless of the warnings or --complete options. This way the sitemaps can be logged to file without changing the output of URLs in other loggers.
  • logging: Ignored warnings are now never logged, even when the URL has errors.
  • checking: Improved robots.txt caching by using finer grained locking.
  • checking: Limit number of concurrent connections to FTP and HTTP servers. This avoids spurious BadStatusLine errors.
  • Fixes:
  • logging: Close logger properly on I/O errors.
  • checking: Fix wrong method name when printing SSL certificate warnings.
  • checking: Catch ValueError on invalid cookie expiration dates.
  • checking: Detect and handle remote filesystem errors when checking local file links.
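
The URL length check listed under Features for this release uses two thresholds, which can be expressed as a small sketch. The constant and function names are hypothetical; only the limits of 2000 and 255 characters come from the entry above.

```python
# Thresholds taken from the changelog entry; the names are illustrative.
MAX_URL_LENGTH_ERROR = 2000
MAX_URL_LENGTH_WARNING = 255


def classify_url_length(url):
    """Return "error", "warning" or "ok" depending on the URL length."""
    length = len(url)
    if length > MAX_URL_LENGTH_ERROR:
        return "error"
    if length > MAX_URL_LENGTH_WARNING:
        return "warning"
    return "ok"
```

Both limits are exclusive, so a URL of exactly 255 characters passes without a warning.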

New in LinkChecker 8.0 (Sep 3, 2012)

  • Features:
  • checking: Verify SSL certificates for HTTPS connections. Both the hostname and the expiration date are checked.
  • checking: Always compare encoded anchor names.
  • checking: Support WML sites.
  • checking: Show number of parsed URLs in page content.
  • cmdline: Added Nagios plugin script.
  • Changes:
  • dependencies: Python >= 2.7.2 is now required
  • gui: Display debug output text with fixed-width font.
  • gui: Display the real name in the URL properties.
  • gui: Make URL properties selectable with the mouse.
  • checking: Ignore feed: URLs.
  • checking: --ignore-url now really ignores the URLs instead of checking only the syntax.
  • checking: Increase the default number of checker threads from 10 to 100.
  • Fixes:
  • gui: Fix saving of the debugmemory option.
  • checking: Do not handle attribute as parent URL but as normal URL to be checked.
  • checking: Fix UNC path handling on Windows.
  • checking: Detect more sites not supporting HEAD requests properly.