A powerful web crawler that can also extract and manipulate documents in order to retrieve the required information on the Internet.
Search engines make the Internet navigable; without them, finding information would take far longer.
Because these utilities are essential when surfing online, developers are constantly working to improve them.
Norconex HTTP Collector is one such auxiliary tool that can be employed to crawl sites quickly and return results to a local folder or feed them directly to a search engine.
The application supports multi-threaded operation, so several pages can be fetched at once and results arrive sooner. This is especially useful when dealing with particularly large websites.
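To illustrate why multi-threading helps with large sites, here is a minimal Python sketch of fetching several URLs through a pool of worker threads. It is not Norconex's own API: `crawl_concurrently`, `fake_fetch`, and the `max_workers` parameter are hypothetical names chosen for this example, and the fetch step is a stand-in that makes no real HTTP request.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_concurrently(urls, fetch, max_workers=4):
    """Run fetch(url) for every URL on a thread pool and return {url: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so they pair up with urls.
        results = pool.map(fetch, urls)
        return dict(zip(urls, results))


def fake_fetch(url):
    # Hypothetical stand-in for an HTTP request: returns the URL length.
    return len(url)
```

Since fetching is mostly waiting on the network, running several requests in parallel threads tends to shorten the total crawl time.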
Once a target has been specified, the program automatically attempts to detect the language. Because the library supports OCR, text can also be extracted from attached pictures and PDFs.
Other formats, such as HTML and Office documents, are supported, and the spider can also process canonical URLs.
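A canonical URL is the preferred address a page declares for itself, usually through a `<link rel="canonical">` element, so a crawler can avoid indexing the same content twice. The following Python sketch shows one way to read that element with the standard library. It illustrates the concept only and is not how Norconex implements it; `CanonicalLinkParser` and `find_canonical_url` are hypothetical names.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def find_canonical_url(html):
    """Return the canonical URL declared in the HTML text, or None if absent."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical
```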
Several settings can be customized when starting jobs, such as the crawling speed. One can also configure the crawler to treat embedded documents as distinct files, and hierarchical fields can be built.
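Adjusting crawling speed generally comes down to enforcing a minimum delay between successive requests. The Python sketch below shows that idea under stated assumptions: `PoliteDelay` and its parameters are hypothetical and do not correspond to any Norconex setting, and the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time


class PoliteDelay:
    """Enforces a minimum number of seconds between successive calls to wait()."""

    def __init__(self, delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self._delay = delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        if self._last is not None:
            remaining = self._delay - (self._clock() - self._last)
            if remaining > 0:
                # Pause only for the part of the delay that has not yet elapsed.
                self._sleep(remaining)
        self._last = self._clock()
```

A crawler loop would call `wait()` just before each request; a larger `delay_seconds` means a slower crawl that is gentler on the target server.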
Output documents can be filtered by URL, HTTP headers, or metadata.
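As a rough illustration of filtering on URL and HTTP headers, the Python sketch below keeps a document only when its URL matches a regular expression and its `Content-Type` header is on an allow list. The function name `keep_document` and its parameters are assumptions made for this example, not part of Norconex HTTP Collector.

```python
import re


def keep_document(url, headers, url_pattern, allowed_content_types):
    """Return True when the URL matches url_pattern and the type is allowed."""
    if not re.search(url_pattern, url):
        return False
    # Drop parameters such as "; charset=utf-8" before comparing media types.
    content_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    return content_type in allowed_content_types
```

For example, `keep_document("https://example.com/docs/a.html", {"Content-Type": "text/html; charset=utf-8"}, r"/docs/", {"text/html"})` accepts the page, while a PDF under the same path would be rejected by that allow list.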
For ease of use, several samples are available, allowing developers or users to assess the power of the tool accurately.
A concise online manual addresses many common issues, and the forums offer further help for obtaining good results.
System requirements
- Java
- Internet connection
What's new in Norconex HTTP Collector 3.0.2:
- Fixed GenericSitemapResolver NPE when the sitemap content-type could not be detected. #803
- Updated Maven dependencies: norconex-commons-maven-parent 1.0.2, norconex-collector-core 2.0.2, norconex-importer 3.0.1, Guava 32.0.0-jre, Selenium 4.0.0, Jetty 9.4.51.v20230217.
Norconex HTTP Collector 3.0.2
- price: Free
- runs on: Windows 11, Windows 10 32/64 bit, Windows 8 32/64 bit, Windows 7 32/64 bit, Windows Vista 32/64 bit
- file size: 102 MB
- filename: norconex-collector-http-3.0.2.zip
- main category: Internet