The external content extractor detects more than a thousand file formats, including PPT, XLS, and PDF, and extracts their metadata and text. Because a single interface parses all of these file types, it is useful for tasks such as search engine indexing, content analysis, and translation.
Why use a content extractor on Tika Server?
The Tika library runs inside the same Java Virtual Machine (JVM) as the mailbox and Zextras. With a Tika server, you can instead run one or more separate Tika servers that index content outside the mailbox process. If a Tika server crashes, the mailbox JVM is unaffected.
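To illustrate the isolation described above, here is a minimal Python sketch of a client that sends a file to a standalone Tika server over HTTP and moves on to the next server if one is unreachable. It relies on Tika Server's `/tika` endpoint, which accepts a `PUT` with the raw file as the body, and on its default port 9998. The host names, the `TIKA_SERVERS` list, and the helper names `build_extract_request` and `extract_text` are assumptions made for this example; they are not part of Zextras or its configuration.

```python
import http.client
import urllib.request

# Hypothetical server addresses (assumption); 9998 is Tika Server's default port.
TIKA_SERVERS = [
    "http://tika1.example.com:9998",
    "http://tika2.example.com:9998",
]


def build_extract_request(server_url, data):
    """Build a PUT request for the Tika Server /tika endpoint (plain-text output)."""
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/tika",
        data=data,
        method="PUT",
        headers={
            "Accept": "text/plain",
            # Generic type, so the server detects the real format itself.
            "Content-Type": "application/octet-stream",
        },
    )


def extract_text(data, servers=TIKA_SERVERS, timeout=30):
    """Try each server in turn; return the extracted text, or None if all fail."""
    for server in servers:
        request = build_extract_request(server, data)
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.read().decode("utf-8", errors="replace")
        except (OSError, http.client.HTTPException):
            # A crashed or unreachable server only costs this one attempt;
            # the calling process keeps running and tries the next server.
            continue
    return None
```

A caller would read a file as bytes and pass it in, for example `extract_text(open("report.pdf", "rb").read())`. The same request works for any supported format, and a failure on the extraction side surfaces as an ordinary network error instead of taking down the process that asked for the text.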