Apache Nutch

Nutch is open-source web-search software. It builds on Lucene and Solr, adding web-specific components such as a crawler, a link-graph database, and parsers for HTML and other document formats.

'org.apache.nutch:nutch:2.0-dev'
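
The coordinate above uses the common group:artifact:version notation. As a minimal sketch, the Java snippet below splits such a string into its three parts and renders the matching Maven POM dependency element. The class name MavenCoordinate and its methods are hypothetical helpers written for illustration; they are not part of Nutch or Maven. The snippet only reformats the string, and it does not check whether a 2.0-dev build is available in any repository.

```java
import java.util.Objects;

/** Hypothetical helper (not part of Nutch or Maven): holds a group:artifact:version coordinate. */
final class MavenCoordinate {
    final String groupId;
    final String artifactId;
    final String version;

    private MavenCoordinate(String groupId, String artifactId, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
    }

    /** Splits "group:artifact:version" into its parts, rejecting any other shape. */
    static MavenCoordinate parse(String coordinate) {
        Objects.requireNonNull(coordinate, "coordinate");
        // Limit -1 keeps trailing empty parts so "a:b:" is rejected below.
        String[] parts = coordinate.trim().split(":", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Expected group:artifact:version but got: " + coordinate);
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException(
                    "Empty segment in coordinate: " + coordinate);
            }
        }
        return new MavenCoordinate(parts[0], parts[1], parts[2]);
    }

    /** Renders the equivalent <dependency> element for a Maven POM. */
    String toPomDependency() {
        return "<dependency>\n"
            + "  <groupId>" + groupId + "</groupId>\n"
            + "  <artifactId>" + artifactId + "</artifactId>\n"
            + "  <version>" + version + "</version>\n"
            + "</dependency>";
    }

    public static void main(String[] args) {
        // Coordinate taken verbatim from the listing above.
        MavenCoordinate c = parse("org.apache.nutch:nutch:2.0-dev");
        System.out.println(c.toPomDependency());
    }
}
```

Running main prints a dependency element that can be pasted into a POM, assuming the artifact is published to a repository the build is configured to use.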

Dependencies

Compile dependencies

Test dependencies