
HTML Parser

License: GPL

A Java library used to parse HTML in either a linear or nested fashion.

Description


HTML Parser is a Java library designed primarily for HTML transformation or extraction. It features filters, custom tags, visitors, and easy-to-use JavaBeans. HTML Parser is a robust, fast, and well-tested package.

The parser handles two fundamental use cases: extraction and transformation. (The synthesis use case, in which HTML pages are created from scratch, is better handled by other tools closer to the source of the data.)

In general, to use HTML Parser you will need to be able to write code in the Java programming language. Although some example programs are provided that may be useful as they stand, you will most likely need (or want) to write your own programs or modify the provided ones to suit your intended application.

To use the library, add either htmllexer.jar or htmlparser.jar to your classpath when compiling and running. The htmllexer.jar file provides low-level access to the generic string, remark, and tag nodes on the page in a linear, flat, sequential manner.
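To make the "linear, flat, sequential" view concrete, here is a minimal toy sketch. It is not the htmllexer.jar API: the class ToyLexer, its Kind enum, and the regular expression are hypothetical, chosen so the example runs with only the JDK. It splits a page into a flat list of tag, text (string), and remark tokens without building any nesting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy lexer for illustration only; NOT the htmllexer.jar API. */
final class ToyLexer {
    enum Kind { TAG, TEXT, REMARK }

    /** One flat token: its kind plus the exact source text it covers. */
    record Token(Kind kind, String raw) { }

    // Alternatives are tried in order: remark, tag, run of text, stray '<'.
    // Every character of the input ends up in exactly one token.
    private static final Pattern TOKEN =
        Pattern.compile("<!--.*?-->|<[^<>]*>|[^<]+|<", Pattern.DOTALL);

    static List<Token> tokenize(String html) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            String raw = m.group();
            Kind kind;
            if (raw.startsWith("<!--")) {
                kind = Kind.REMARK;
            } else if (raw.length() > 1 && raw.startsWith("<")) {
                kind = Kind.TAG;            // start or end tag, e.g. "<p>" or "</p>"
            } else {
                kind = Kind.TEXT;           // ordinary text, or a lone '<' treated as text
            }
            tokens.add(new Token(kind, raw));
        }
        return tokens;
    }
}
```

For example, tokenize("<p>Hi<!-- note --></p>") yields four tokens: TAG "<p>", TEXT "Hi", REMARK "<!-- note -->", TAG "</p>". The regular expression is a deliberate shortcut; a production lexer has to cope with malformed markup, attribute values containing '>', scripts, and character sets, which this sketch ignores.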

The htmlparser.jar file, which includes the classes found in htmllexer.jar, provides access to a page as a sequence of nested, differentiated tags containing string, remark, and other tag nodes.
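The difference between the flat and the nested view can be sketched with a toy tree builder. ToyTree is a hypothetical, JDK-only class, not the htmlparser.jar API. It keeps a stack of open elements: a start tag becomes a child of the element on top of the stack and is pushed, an end tag pops back to its matching start tag, and text becomes a leaf. As a simplifying assumption, a handful of void elements (br, img, and so on) and self-closing tags are never given children, and remarks are skipped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy tree builder for illustration only; not the htmlparser.jar API. */
final class ToyTree {
    /** An element (name set, text null) or a text leaf (text set, name null). */
    static final class Node {
        final String name;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String name, String text) {
            this.name = name;
            this.text = text;
        }
    }

    // Elements assumed never to contain children, so they are not pushed.
    private static final Set<String> VOID = Set.of("br", "hr", "img", "input", "link", "meta");

    // Group 1 is "/" for end tags, group 2 the tag name; remarks and <!...> are matched too.
    private static final Pattern TOKEN = Pattern.compile(
        "<!--.*?-->|<![^>]*>|<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|[^<]+|<", Pattern.DOTALL);

    static Node parse(String html) {
        Node root = new Node("#document", null);
        Deque<Node> open = new ArrayDeque<>();
        open.push(root);
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            String raw = m.group();
            if (raw.startsWith("<!")) {
                continue;                                  // remark or declaration: skipped
            }
            if (m.group(2) == null) {                      // text run (or a stray '<')
                open.peek().children.add(new Node(null, raw));
                continue;
            }
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (m.group(1).isEmpty()) {                    // start tag
                Node element = new Node(name, null);
                open.peek().children.add(element);
                if (!VOID.contains(name) && !raw.endsWith("/>")) {
                    open.push(element);                    // later nodes nest inside it
                }
            } else if (open.stream().anyMatch(n -> name.equals(n.name))) {
                Node closed;                               // end tag: pop up to its start tag
                do {
                    closed = open.pop();
                } while (!name.equals(closed.name));
            }                                              // unmatched end tags are ignored
        }
        return root;
    }
}
```

Parsing "<ul><li>One</li><li>Two<br>x</li></ul>" gives a root with one ul child, which in turn has two li children; the second li holds three nodes ("Two", br, "x"). The same input run through a flat lexer would simply be a list of eleven tokens.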

Extraction encompasses all the information retrieval programs that are not meant to preserve the source page.

This covers uses like:
· text extraction, for use as input for text search engine databases, for example
· link extraction, for crawling through web pages or harvesting email addresses
· screen scraping, for programmatic data input from web pages
· resource extraction, collecting images or sound
· a browser front end, the preliminary stage of page display
· link checking, ensuring links are valid
· site monitoring, checking for page differences beyond simplistic diffs
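Text extraction, the first use listed above, can be sketched roughly as follows. ToyTextExtractor is a hypothetical, JDK-only toy and not one of HTML Parser's classes: it drops remarks, whole script and style elements, and all other tags, decodes a few common entities, and collapses whitespace.

```java
import java.util.regex.Pattern;

/** Hypothetical toy text extractor for illustration only; not part of HTML Parser. */
final class ToyTextExtractor {
    // Remarks, complete <script>/<style> elements (content included), and any other tag.
    private static final Pattern MARKUP = Pattern.compile(
        "<!--.*?-->|<(script|style)\\b[^>]*>.*?</\\1\\s*>|<[^>]*>",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    static String extractText(String html) {
        // Replace each piece of markup with a space so words in adjacent blocks do not fuse.
        String text = MARKUP.matcher(html).replaceAll(" ");
        // Decode a few common entities; "&amp;" goes last so "&amp;lt;" stays "&lt;".
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        // Collapse runs of whitespace and trim the ends.
        return text.replaceAll("\\s+", " ").trim();
    }
}
```

Because every tag becomes a space, inline markup can leave stray spaces: "<p>Hello, <b>world</b>!</p>" comes out as "Hello, world !". An extractor that knows which elements are inline would avoid that.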

There are several facilities in the HTMLParser codebase to help with extraction, including filters, visitors and JavaBeans.
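To illustrate the filter idea, the sketch below collects a page's start tags and selects them with small predicates that can be combined, here with the JDK's Predicate.and. ToyLinkFilter, tagName, attributeStartsWith, and extract are hypothetical names for this JDK-only toy and do not correspond to HTML Parser's own filter classes. The example covers link extraction, including picking out mailto: links for email harvesting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy filter sketch for illustration only; not HTML Parser's filter classes. */
final class ToyLinkFilter {
    /** A start tag with lower-cased tag and attribute names. */
    record Tag(String name, Map<String, String> attributes) { }

    private static final Pattern START_TAG = Pattern.compile("<([A-Za-z][A-Za-z0-9]*)([^>]*)>");
    // name="value", name='value', or name=value
    private static final Pattern ATTR = Pattern.compile(
        "([A-Za-z_:][-A-Za-z0-9_:.]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))");

    /** Collects every start tag in document order, ignoring nesting. */
    static List<Tag> startTags(String html) {
        List<Tag> tags = new ArrayList<>();
        Matcher m = START_TAG.matcher(html);
        while (m.find()) {
            Map<String, String> attrs = new LinkedHashMap<>();
            Matcher a = ATTR.matcher(m.group(2));
            while (a.find()) {
                String value = a.group(2) != null ? a.group(2)
                             : a.group(3) != null ? a.group(3)
                             : a.group(4);
                attrs.put(a.group(1).toLowerCase(Locale.ROOT), value);
            }
            tags.add(new Tag(m.group(1).toLowerCase(Locale.ROOT), attrs));
        }
        return tags;
    }

    /** Filter: the tag has the given lower-case name. */
    static Predicate<Tag> tagName(String name) {
        return t -> t.name().equals(name);
    }

    /** Filter: the attribute is present and its value starts with prefix. */
    static Predicate<Tag> attributeStartsWith(String attr, String prefix) {
        return t -> {
            String v = t.attributes().get(attr);
            return v != null && v.startsWith(prefix);
        };
    }

    /** Values of attr from every start tag the filter accepts (tags lacking attr are skipped). */
    static List<String> extract(String html, Predicate<Tag> filter, String attr) {
        List<String> out = new ArrayList<>();
        for (Tag t : startTags(html)) {
            String v = t.attributes().get(attr);
            if (v != null && filter.test(t)) {
                out.add(v);
            }
        }
        return out;
    }
}
```

For instance, extract(html, tagName("a"), "href") returns every link target, while extract(html, tagName("a").and(attributeStartsWith("href", "mailto:")), "href") keeps only email links. Composing small predicates keeps each one trivial to test and lets the same pieces be reused across different extraction jobs.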

Transformation includes all processing where the input and the output are HTML pages.

Some examples are:
· URL rewriting, modifying some or all links on a page
· site capture, moving content from the web to local disk
· censorship, removing offending words and phrases from pages
· HTML cleanup, correcting erroneous pages
· ad removal, excising URLs referencing advertising
· conversion to XML, moving existing web pages to XML

Many transformation tasks can be accomplished "in place" by operating on the nodes during or after reading in a page; the result can then be output with the toHtml() method. Depending on the purpose of your application, you will probably want to look into node decorators, visitors, or custom tags in conjunction with the PrototypicalNodeFactory.
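The "modify nodes in place, then output with toHtml()" approach can be sketched for URL rewriting. ToyRewriter and its Node class are hypothetical and JDK-only; they are not HTML Parser's node classes. Rather than offering typed tag nodes with attribute accessors, this toy edits the raw markup of each anchor start tag with a regular expression and only handles quoted href values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy of in-place node transformation plus re-serialization; not HTML Parser's API. */
final class ToyRewriter {
    /** One flat node; its markup can be edited in place and printed back with toHtml(). */
    static final class Node {
        String html;

        Node(String html) {
            this.html = html;
        }

        String toHtml() {
            return html;
        }
    }

    private static final Pattern TOKEN =
        Pattern.compile("<!--.*?-->|<[^<>]*>|[^<]+|<", Pattern.DOTALL);
    private static final Pattern ANCHOR =
        Pattern.compile("<a\\s[^>]*>", Pattern.CASE_INSENSITIVE);
    // Group 2 is the quoted href value, including its quote characters.
    private static final Pattern HREF =
        Pattern.compile("(\\shref\\s*=\\s*)(\"[^\"]*\"|'[^']*')", Pattern.CASE_INSENSITIVE);

    static String rewriteLinks(String html, UnaryOperator<String> mapUrl) {
        // 1. Read the page into a flat list of nodes.
        List<Node> nodes = new ArrayList<>();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            nodes.add(new Node(m.group()));
        }

        // 2. Transform in place: only <a ...> start tags with a quoted href are changed.
        for (Node node : nodes) {
            if (!ANCHOR.matcher(node.html).matches()) {
                continue;
            }
            Matcher h = HREF.matcher(node.html);
            if (h.find()) {
                String quoted = h.group(2);
                char quote = quoted.charAt(0);
                String url = quoted.substring(1, quoted.length() - 1);
                node.html = node.html.substring(0, h.start(2))
                    + quote + mapUrl.apply(url) + quote
                    + node.html.substring(h.end(2));
            }
        }

        // 3. Output: concatenating every node's toHtml() rebuilds the page.
        StringBuilder out = new StringBuilder();
        for (Node node : nodes) {
            out.append(node.toHtml());
        }
        return out.toString();
    }
}
```

Calling rewriteLinks(page, u -> u.replace("http://old.example/", "https://new.example/")) moves every quoted link to the new host and leaves all other markup, including remarks, byte-for-byte unchanged. Keeping the transformation separate from reading and writing is what makes the same loop reusable for other tasks in the list above, such as ad removal.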

HTML Parser 1.6 / 2.0 Snapshot

runs on: Windows All
file size: 4.2 MB
main category: Programming
