What's new in LinkChecker 9.3 (Jan 7, 2015)
- Features:
- checking: Parse and check links in PDF files.
- checking: Parse Refresh: and Content-Location: HTTP headers for URLs.
- Changes:
- plugins: PDF and Word checks are now parser plugins (PdfParser, WordParser). Neither plugin is enabled by default since they require third-party modules.
- plugins: Print a warning for enabled plugins that could not import needed third-party modules.
- checking: Treat empty URLs the same as the parent URL.
- installation: Replaced the twill dependency with local code.
- Fixes:
- checking: Catch XML parse errors in sitemap XML files and print them as warnings.
- checking: Fix internal URL match pattern. Patch by Mark-Hetherington.
- checking: Recalculate extern status after HTTP redirection.
- checking: Do not strip quotes from already resolved URLs.
- cgi: Sanitize configuration.
- checking: Use user-supplied authentication and proxies when requesting robots.txt.
- plugins: Fix Word file check plugin.
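The Refresh: header parsing listed under the 9.3 features can be illustrated with a minimal Python 3 sketch. The helper name refresh_url and the regular expression are assumptions made for this illustration; they are not taken from LinkChecker's source, which may handle more edge cases.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern (an assumption, not LinkChecker's code):
# "<seconds>; url=<target>" with an optional "url=" prefix.
_REFRESH_RE = re.compile(
    r"^\s*\d+(?:\.\d+)?\s*[;,]\s*(?:url\s*=\s*)?(?P<url>.+?)\s*$",
    re.IGNORECASE,
)


def refresh_url(header_value, base_url):
    """Return the absolute URL named in a Refresh header value, or None."""
    match = _REFRESH_RE.match(header_value)
    if match is None:
        # Only a delay was given, so there is no URL to check.
        return None
    # Some servers quote the target; drop surrounding quotes.
    target = match.group("url").strip("\"'")
    # Resolve relative targets against the URL that sent the header.
    return urljoin(base_url, target)
```

A value such as "5; url=/next.html" received from http://example.com/a/ would yield http://example.com/next.html as an additional URL to check.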
New in LinkChecker 9.2 (Jul 1, 2014)
- Fixes:
- checking: Don't scan external robots.txt sitemap URLs.
- installation: Correct case for pip install command.
- Features:
- checking: Parse and check HTTP Link: headers.
- checking: Support parsing of HTML image srcset attributes.
- checking: Support parsing of HTML schema itemtype attributes.
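As an illustration of the srcset support listed under the 9.2 features, the following Python 3 sketch extracts the candidate URLs from an attribute value. The function name srcset_urls is hypothetical, and the approach is deliberately simplified: it splits on commas, so URLs that themselves contain commas (for example some data: URLs) are not handled. It does not claim to match LinkChecker's own parser.

```python
def srcset_urls(value):
    """Return the image URLs named in an HTML srcset attribute value.

    Simplified sketch: each comma-separated candidate is expected to be
    "<url> [descriptor]", for example "large.jpg 2x" or "wide.jpg 640w".
    """
    urls = []
    for candidate in value.split(","):
        parts = candidate.split()
        if parts:
            # The first whitespace-separated token is the URL;
            # anything after it is a width or density descriptor.
            urls.append(parts[0])
    return urls
```

Each returned URL can then be queued for checking like any other image link.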
New in LinkChecker 9.1 (Mar 31, 2014)
- Features:
- checking: Support parsing of sitemap and sitemap index XML files.
- Closes: GH bug #413
- checking: Add new HTTP header info plugin.
- logging: Support arbitrary encodings in CSV output.
- Closes: GH bug #467
- installation: Use .gz compression for source release to support "pip install".
- Closes: GH bug #461
- Changes:
- checking: Ignored URLs are reported earlier now.
- checking: Updated the list of unknown or ignored URI schemes.
- checking: Internal errors do not disable check threads anymore.
- checking: Disable URL length warning for data: URLs.
- checking: Do not warn about missing addresses on mailto links that have subjects.
- checking: Check and display SSL certificate info even on redirects.
- Closes: GH bug #489
- installation: Check requirement for Python requests >= 2.2.0.
- Closes: GH bug #478
- logging: Display downloaded bytes.
- Fixes:
- checking: Fix internal errors in debug output.
- Closes: GH bug #472
- checking: Fix URL result caching.
- checking: Fix assertion in external link checking.
- checking: Fix SSL errors on Windows.
- Closes: GH bug #471
- checking: Fix error when SNI checks are enabled.
- Closes: GH bug #488
- gui: Fix warning regex settings.
- Closes: GH bug #485
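The sitemap and sitemap index support listed under the 9.1 features can be sketched in Python 3 with the standard library's ElementTree. The function name sitemap_locations is an assumption for this illustration, and the code is not LinkChecker's implementation; it only shows how both file types expose their URLs through loc elements in the sitemaps.org 0.9 namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemap and sitemap index files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locations(xml_text):
    """Return (kind, urls) for a sitemap or sitemap index document.

    kind is the root element name without namespace, i.e. "urlset" for a
    sitemap and "sitemapindex" for a sitemap index. ET.ParseError is
    raised for malformed XML; a caller may want to report it as a warning.
    """
    root = ET.fromstring(xml_text)
    kind = root.tag.replace(SITEMAP_NS, "")
    # Both <url> and <sitemap> entries carry their address in <loc>.
    urls = [
        loc.text.strip()
        for loc in root.iter(SITEMAP_NS + "loc")
        if loc.text
    ]
    return kind, urls
```

For a sitemap index, the returned URLs point at further sitemap files that would be parsed the same way.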
New in LinkChecker 9.0 (Mar 5, 2014)
- Features:
- checking: Support connection and content check plugins
- checking: Move lots of custom checks like Antivirus and syntax checks into plugins (see upgrading.txt for more info)
- checking: Add options to limit the number of requests per second, allowed URL schemes and maximum file or download size
- checking: Support checking Sitemap: URLs in robots.txt files
- checking: Reduced memory usage when caching checked links
- gui: UI language can be changed dynamically
- Changes:
- checking: Use the Python requests module for HTTP and HTTPS requests
- logging: Removed download, domains and robots.txt statistics
- logging: HTML output is now in HTML5
- checking: Removed 301 warning since 301 redirects are used a lot without updating the old URL links. Also, recursive redirection is not checked any more since there is a maximum redirection limit anyway
- checking: Disallowed access by robots.txt is an info now, not a warning. Otherwise it produces a lot of warnings which is counter-productive
- checking: Do not check SMTP connections for mailto: URLs anymore. It resulted in lots of false warnings since spam prevention usually disallows direct SMTP connections from unrecognized client IPs
- checking: Only internal URLs are checked by default. To check external URLs use --check-extern
- checking: Document that gconf and KDE proxy settings are parsed
- checking: Disable twill page refreshing
- checking: The default number of checking threads is 10 now instead of 100
- Fixes:
- logging: Status was printed every second regardless of the configured wait time
- logging: Add missing column name to SQL insert command
- checking: Several speed and memory usage improvements
- logging: Fix --no-warnings option
- logging: The -o none option now sets the exit code
- checking: For login pages, use twill form field counter if the field has neither name nor id
- configuration: Check regular expressions for errors
New in LinkChecker 8.4 (Jan 26, 2013)
- Features:
- checking: Support URLs.
- logging: Sending SIGUSR1 signal prints the stack trace of all current running threads. This makes debugging deadlocks easier.
- gui: Support Drag-and-Drop of local files. If the local file is a LinkChecker project (.lcp) file it is loaded, else the check URL is set to the local file URL.
- Changes:
- checking: Increase per-host connection limits to speed up checking.
- Fixes:
- checking: Fix a crash when closing a Word document after scanning failed.
- Closes: GH bug #369
- checking: Catch UnicodeError from idna.encode() fixing an internal error when trying to connect to certain invalid hostnames.
- checking: Always close HTTP connections without body content. See also http://bugs.python.org/issue16298
- Closes: GH bug #376
New in LinkChecker 8.3 (Jan 17, 2013)
- Features:
- project: The project moved to GitHub.
- Changes:
- logging: Print system arguments (sys.argv) and variable values in internal error information.
- installation: Install the dns Python module into linkcheck_dns subdirectory to avoid conflicts with an upstream python-dns installation.
- Fixes:
- gui: Fix storing of ignore lines in options.
New in LinkChecker 8.2 (Nov 10, 2012)
- Changes:
- checking: Print a warning when passwords are found in the configuration file and the file is accessible by others.
- checking: Add debug statements for unparseable content types.
- checking: Turn off caching. This improves memory performance drastically and it's a very seldom used feature - judging from user feedback over the years and my own experience.
- checking: Only allow checking of local files when parent URL does not exist or it's also a file URL.
- Fixes:
- checking: Fix anchor checking of cached HTTP URLs.
- checking: Fix cookie path matching with empty paths.
- checking: Fix handling of non-ASCII exceptions (regression in 8.1).
- configuration: Fix configuration directory creation on Windows systems.
New in LinkChecker 8.1 (Oct 15, 2012)
- Features:
- checking: Allow specification of maximum checking time or maximum number of checked URLs.
- checking: Send an HTTP Do-Not-Track header.
- checking: Check URL length. Print error on URL longer than 2000 characters, warning for longer than 255 characters.
- checking: Warn about duplicate URL contents.
- logging: A new XML sitemap logger can be used that implements the defined protocol
- Changes:
- doc: Mention 7-zip and Peazip to extract the .tar.xz under Windows.
- logging: Print download and cache statistics in text output logger.
- logging: Print warning tag in text output logger. Makes warning filtering easier.
- logging: Make the last modification time a separate field in logging output. See doc/upgrading.txt for compatibility changes.
- logging: All sitemap loggers log all valid URLs regardless of the warnings or --complete options. This way the sitemaps can be logged to file without changing the output of URLs in other loggers.
- logging: Ignored warnings are now never logged, even when the URL has errors.
- checking: Improved robots.txt caching by using finer grained locking.
- checking: Limit number of concurrent connections to FTP and HTTP servers. This avoids spurious BadStatusLine errors.
- Fixes:
- logging: Close logger properly on I/O errors.
- checking: Fix wrong method name when printing SSL certificate warnings.
- checking: Catch ValueError on invalid cookie expiration dates.
- checking: Detect and handle remote filesystem errors when checking local file links.
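The URL length check introduced in 8.1 uses two thresholds, which the following Python 3 sketch illustrates. The constant and function names are hypothetical and chosen for this example only; the thresholds are the ones stated in the 8.1 feature list, and the sketch does not model later changes such as the data: URL exemption from 9.1.

```python
# Thresholds from the 8.1 feature list; the names are assumptions.
URL_MAX_LENGTH = 2000   # longer than this: error
URL_WARN_LENGTH = 255   # longer than this: warning


def check_url_length(url):
    """Classify a URL by length as "error", "warning" or "ok"."""
    length = len(url)
    if length > URL_MAX_LENGTH:
        return "error"
    if length > URL_WARN_LENGTH:
        return "warning"
    return "ok"
```

Both comparisons are strict, so a URL of exactly 255 characters passes without a warning and one of exactly 2000 characters produces a warning rather than an error.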
New in LinkChecker 8.0 (Sep 3, 2012)
- Features:
- checking: Verify SSL certificates for HTTPS connections. Both the hostname and the expiration date are checked.
- checking: Always compare encoded anchor names.
- checking: Support WML sites.
- checking: Show number of parsed URLs in page content.
- cmdline: Added Nagios plugin script.
- Changes:
- dependencies: Python >= 2.7.2 is now required
- gui: Display debug output text with fixed-width font.
- gui: Display the real name in the URL properties.
- gui: Make URL properties selectable with the mouse.
- checking: Ignore feed: URLs.
- checking: --ignore-url now really ignores the URLs instead of checking only the syntax.
- checking: Increase the default number of checker threads from 10 to 100.
- Fixes:
- gui: Fix saving of the debugmemory option.
- checking: Do not handle attribute as parent URL but as normal URL to be checked.
- checking: Fix UNC path handling on Windows.
- checking: Detect more sites not supporting HEAD requests properly.