

The 3rd party services you sync with Workplace Search, such as Dropbox or Google Drive, usually contain a wide variety of documents and file types. Workplace Search will try to extract the content of these files, to transform the source document into a searchable document. To make the document searchable, the Workplace Search connector tries to extract text content into fields, and images into thumbnail previews.

Full text content extraction is available for many types of documents, including PDFs and most Office365 and GSuite formats. Thumbnail extraction is available for certain image formats. Nevertheless, you might be surprised that some of your documents are not having their content extracted, or that the extraction is not perfect. The following documentation covers the file extensions and media types supported by Workplace Search, as well as how to troubleshoot surprising results.

There are some important facts and figures to note up front:

- The maximum file size for content extraction is 20MB.
- The resulting text will be truncated if it exceeds 100KB.
- Content extraction from binary formats (e.g., images, audio, videos) is currently not supported.
- Encrypted documents are skipped by the extractor.

Disabling thumbnail or full-text extraction

For maximum performance and stability, ensure that Enterprise Search has at least 4GB of Heap. Thumbnail extraction is automatically disabled when less than 2GB of Heap is available.

You have the option to toggle off extracting thumbnails and/or full-text from files, if you need to save RAM. However, these toggle options are only available after you create the content source. When you create a content source this triggers a full sync immediately, including thumbnail and full-text extraction. If you want to avoid any thumbnail or full-text extraction you need to switch the toggle immediately after creating the content source.
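The heap thresholds mentioned in this section (at least 4GB recommended, thumbnails automatically disabled below 2GB) can be summarized in a small check. This is only an illustrative sketch: `heap_advice` and its constants are hypothetical names, not part of Enterprise Search, and the configured heap size is assumed to be supplied by the caller in gigabytes.

```python
# Thresholds quoted in this section. The names are hypothetical.
RECOMMENDED_HEAP_GB = 4.0    # recommended minimum for performance and stability
THUMBNAIL_MIN_HEAP_GB = 2.0  # below this, thumbnail extraction is auto-disabled


def heap_advice(heap_gb: float) -> dict:
    """Summarize what the documented thresholds imply for a given heap size."""
    return {
        # "less than 2GB" is read as a strict comparison
        "thumbnails_auto_disabled": heap_gb < THUMBNAIL_MIN_HEAP_GB,
        # "at least 4GB" is read as an inclusive comparison
        "meets_recommended_heap": heap_gb >= RECOMMENDED_HEAP_GB,
    }


if __name__ == "__main__":
    for size in (1.5, 2.0, 4.0):
        print(size, heap_advice(size))
```

A deployment between 2GB and 4GB keeps thumbnail extraction enabled but falls short of the recommended heap, which is the case where switching the toggles off manually may be worth considering.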

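When troubleshooting why a document has little or no extracted content, it can help to check it against the limits listed in this section before looking elsewhere. The sketch below is a local pre-check, not a Workplace Search API: `estimate_extraction` and `truncate_text` are hypothetical helpers, MB and KB are assumed to be binary units, and truncation is assumed to be measured in UTF-8 bytes. The actual extractor may differ on all three points.

```python
from dataclasses import dataclass

# Limits quoted in this section; binary units are an assumption.
MAX_FILE_SIZE_BYTES = 20 * 1024 * 1024  # 20MB maximum file size
MAX_TEXT_BYTES = 100 * 1024             # 100KB of extracted text


@dataclass
class ExtractionEstimate:
    will_extract: bool
    reason: str


def estimate_extraction(size_bytes: int,
                        is_binary_media: bool = False,
                        is_encrypted: bool = False) -> ExtractionEstimate:
    """Guess whether a file falls within the documented extraction limits."""
    if is_encrypted:
        return ExtractionEstimate(False, "encrypted documents are skipped")
    if is_binary_media:
        return ExtractionEstimate(False, "binary formats are not supported")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        return ExtractionEstimate(False, "file exceeds the 20MB limit")
    return ExtractionEstimate(True, "within documented limits")


def truncate_text(text: str, limit: int = MAX_TEXT_BYTES) -> str:
    """Mimic the documented truncation, assuming the limit counts UTF-8 bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text
    # Drop any multi-byte character that the cut leaves incomplete.
    return encoded[:limit].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    print(estimate_extraction(5 * 1024 * 1024))
    print(estimate_extraction(25 * 1024 * 1024))
    print(len(truncate_text("a" * 200_000)))
```

A file that passes this check can still produce imperfect results, since the check only covers size, encryption, and binary media, not the supported file extensions and media types.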