The DocumentSearchController interface defines the search API, which is designed to work with a Page object's PageText hierarchy. For each page of a PDF document that contains text, a call to page.getPageText() returns a non-null value. The PageText hierarchy is made up of child LineText, WordText and GlyphText objects. During page initialization the content stream is parsed and the PageText object and its child objects are created. The resulting structure mirrors ordinary text processing: a LineText represents a line of text and is made up of child WordText objects, one for each word. This data structure makes it possible to search and extract text in a contextual manner.
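To make the page, line and word nesting concrete, here is a minimal sketch of such a hierarchy. The types below (TextHierarchySketch, PageTextModel, Line, Word) are hypothetical stand-ins written for illustration; they are not ICEpdf's PageText, LineText, WordText or GlyphText classes, and the whitespace-based parsing is an assumption that only approximates what a content stream parser does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical stand-in types; these are NOT ICEpdf's PageText, LineText or WordText classes.
class TextHierarchySketch {

    /** A word; each character stands in for one glyph. */
    static final class Word {
        final String text;
        Word(String text) { this.text = text; }
        int glyphCount() { return text.length(); }
    }

    /** A line of text, made up of child words in reading order. */
    static final class Line {
        final List<Word> words = new ArrayList<>();
        String text() {
            StringJoiner joined = new StringJoiner(" ");
            for (Word word : words) joined.add(word.text);
            return joined.toString();
        }
    }

    /** All of the text on one page, made up of child lines. */
    static final class PageTextModel {
        final List<Line> lines = new ArrayList<>();
    }

    /** Builds the hierarchy from raw lines by splitting on whitespace (an assumption). */
    static PageTextModel parse(String... rawLines) {
        PageTextModel page = new PageTextModel();
        for (String raw : rawLines) {
            Line line = new Line();
            for (String token : raw.trim().split("\\s+")) {
                if (!token.isEmpty()) line.words.add(new Word(token));
            }
            page.lines.add(line);
        }
        return page;
    }

    /** Extracts the page text line by line, preserving word order. */
    static String extract(PageTextModel page) {
        StringJoiner joined = new StringJoiner("\n");
        for (Line line : page.lines) joined.add(line.text());
        return joined.toString();
    }

    /** Counts words by walking page, then line, then word. */
    static int countWords(PageTextModel page) {
        int total = 0;
        for (Line line : page.lines) total += line.words.size();
        return total;
    }

    public static void main(String[] args) {
        PageTextModel page = parse("Portable Document Format", "Part 1");
        System.out.println(countWords(page));
        System.out.println(extract(page));
    }
}
```

Because every word knows which line it belongs to, a search hit can be reported in context (the surrounding words on the same line) rather than as a bare character offset.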
The DocumentSearchControllerImpl is our reference implementation for searching text in PDF page content streams. If a search indexing system such as Lucene is required, it can be implemented and tied in with ICEpdf through the DocumentSearchController interface. The search API was designed to be extensible and simple to use. The following pseudocode shows how a typical search would be started and executed.
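As a rough illustration of that flow, the sketch below separates a small search contract from a naive in-memory implementation, so that an index-backed implementation could be swapped in behind the same interface. Everything here is an assumption made for the example: PageSearcher, InMemorySearcher and their method names are hypothetical and do not reproduce the DocumentSearchController signatures, and pages are modelled as plain word arrays.

```java
import java.util.ArrayList;
import java.util.List;

class SearchFlowSketch {

    /** Hypothetical, simplified search contract; not the DocumentSearchController API. */
    interface PageSearcher {
        void addTerm(String term);
        List<String> searchPage(int pageIndex);
        void clearAll();
    }

    /** Naive implementation that scans the words of a page on every search. */
    static final class InMemorySearcher implements PageSearcher {
        private final List<String[]> pages;              // one word array per page
        private final List<String> terms = new ArrayList<>();

        InMemorySearcher(List<String[]> pages) { this.pages = pages; }

        @Override public void addTerm(String term) { terms.add(term); }

        /** Returns the words on the page that equal any term, ignoring case. */
        @Override public List<String> searchPage(int pageIndex) {
            List<String> hits = new ArrayList<>();
            for (String word : pages.get(pageIndex)) {
                for (String term : terms) {
                    if (word.equalsIgnoreCase(term)) { hits.add(word); break; }
                }
            }
            return hits;
        }

        /** Removes every term, so later searches find nothing until terms are added again. */
        @Override public void clearAll() { terms.clear(); }
    }

    public static void main(String[] args) {
        List<String[]> pages = new ArrayList<>();
        pages.add("Portable Document Format".split(" "));
        pages.add("A document about search".split(" "));

        PageSearcher searcher = new InMemorySearcher(pages);
        searcher.addTerm("document");                    // 1. register a term
        System.out.println(searcher.searchPage(0));      // 2. search individual pages
        System.out.println(searcher.searchPage(1));
        searcher.clearAll();                             // 3. clear terms when done
        System.out.println(searcher.searchPage(0));
    }
}
```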
The search controller's job is quite simple: search a page, or clear a page's search results and search terms. Search terms are global to the search controller, and the current set of SearchTerms is used for all subsequent page searches. When one of the search controller's clear methods is called, all search terms are removed. Working examples of document searching can be found in the Viewer RI (http://www.icepdf.org/demo/jws/icepdf.jnlp) and the Search example (../display/PDF/Search+Example).
The SearchTerms class lets a user specify what the search controller should search for. A search term has three immutable instance variables that configure the search, including case sensitivity and whole-word matching.
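A minimal sketch of an immutable term object follows. The class name SearchTermSketch, its three fields and the matches method are assumptions made for illustration; the actual field names and matching rules of the ICEpdf class may differ.

```java
import java.util.Locale;

// Hypothetical illustration of an immutable search term; not the ICEpdf class.
final class SearchTermSketch {
    private final String term;            // text to look for
    private final boolean caseSensitive;  // compare with exact case when true
    private final boolean wholeWord;      // require the entire word to match when true

    SearchTermSketch(String term, boolean caseSensitive, boolean wholeWord) {
        this.term = term;
        this.caseSensitive = caseSensitive;
        this.wholeWord = wholeWord;
    }

    String getTerm() { return term; }
    boolean isCaseSensitive() { return caseSensitive; }
    boolean isWholeWord() { return wholeWord; }

    /** Tests a single word against this term using the two flags. */
    boolean matches(String word) {
        String candidate = caseSensitive ? word : word.toLowerCase(Locale.ROOT);
        String needle = caseSensitive ? term : term.toLowerCase(Locale.ROOT);
        return wholeWord ? candidate.equals(needle) : candidate.contains(needle);
    }

    public static void main(String[] args) {
        SearchTermSketch loose = new SearchTermSketch("pdf", false, false);
        SearchTermSketch strict = new SearchTermSketch("PDF", true, true);
        System.out.println(loose.matches("ICEpdf"));   // substring, case ignored
        System.out.println(strict.matches("ICEpdf"));  // whole word and exact case required
    }
}
```

Keeping the fields final means a term can be shared across page searches without the risk of its options changing between calls.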
The new search API implementation allows multiple search terms. For example, consider the following three-term search:
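The sketch below shows one way a three-term search over a single page could behave. It is a self-contained illustration, not ICEpdf code: the class MultiTermSearchSketch, the searchPage helper, the sample terms and the sample page text are all assumptions, and matching is reduced to case-sensitive substring tests for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the ICEpdf API.
class MultiTermSearchSketch {

    /**
     * For each term, collects the indexes of the page words that contain it.
     * Matching is case-sensitive substring matching, chosen only for brevity.
     */
    static Map<String, List<Integer>> searchPage(String[] pageWords, List<String> terms) {
        Map<String, List<Integer>> hits = new LinkedHashMap<>();
        for (String term : terms) {
            List<Integer> positions = new ArrayList<>();
            for (int i = 0; i < pageWords.length; i++) {
                if (pageWords[i].contains(term)) positions.add(i);
            }
            hits.put(term, positions);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative page content and terms.
        String[] page = "Part 1 Contents of the PDF Reference".split(" ");
        List<String> terms = Arrays.asList("PDF", "Part", "Contents");

        // Prints each term with the word positions it matched on this page.
        System.out.println(searchPage(page, terms));
    }
}
```

All three terms are evaluated in the same pass over the page, so the caller receives every match for the current term set at once.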
Specify the page to search for the terms. The search results are highlighted on that page.
The RI search panel is set up by default to use single-term searches. The search checkbox option "Cumulative" can be enabled to suppress, between search commands, the call to
Working examples of the search API can be found in Examples.
© Copyright 2017 ICEsoft Technologies Canada Corp.