Apache Tika

The Apache Tika™ toolkit detects document types and extracts metadata and structured text content from a wide range of document formats, using existing parser libraries.

Coordinates: org.apache.tika:tika:1.7

Dependencies

No dependencies.