Norconex Importer

Norconex Importer is a Java library and command-line application that parses a computer file, whatever its format (HTML, PDF, Word, etc.), and extracts its content as plain text. It also lets you manipulate the extracted text before importing or using it in your own service or application.

'com.norconex.collectors:norconex-importer:2.3.1'
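
The coordinates above follow the group:artifact:version form. As a sketch, the same coordinates could be declared in a Maven pom.xml as shown here, assuming the artifact resolves from a repository your build is already configured to use (the listing does not say which repository hosts it):

```xml
<!-- Group, artifact, and version are taken from the coordinates listed above. -->
<dependency>
  <groupId>com.norconex.collectors</groupId>
  <artifactId>norconex-importer</artifactId>
  <version>2.3.1</version>
</dependency>
```

In a Gradle build, the quoted string would normally be passed unchanged to a dependency configuration such as implementation.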

Dependencies

Compile dependencies

Test dependencies