
h3. DocumentSearchController Interface Notes

The DocumentSearchController interface defines the search API, which is designed to work with a Page object's PageText hierarchy. Each Page of a PDF document that contains text returns a non-null value from page.getPageText(). The PageText hierarchy is made up of child LineText, WordText and GlyphText objects. During Page initialization the content stream is parsed and the PageText object and its child objects are created. The resulting structure mirrors ordinary text processing: a LineText represents a line of text and is made up of one child WordText object per word. This data structure makes it possible to search and extract text in context.
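To make the line/word nesting concrete, here is a minimal sketch of such a hierarchy. The classes SketchPageText, SketchLine and SketchWord are hypothetical stand-ins invented for illustration; they are not the ICEpdf PageText, LineText or WordText classes, and the glyph level is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the hierarchy described above; NOT the ICEpdf classes.
class SketchWord {
    final String text;

    SketchWord(String text) {
        this.text = text;
    }
}

class SketchLine {
    final List<SketchWord> words = new ArrayList<SketchWord>();

    SketchLine(String... wordTexts) {
        for (String w : wordTexts) {
            words.add(new SketchWord(w));
        }
    }

    // Rebuild the line by joining its words with single spaces.
    String asText() {
        StringBuilder sb = new StringBuilder();
        for (SketchWord w : words) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(w.text);
        }
        return sb.toString();
    }
}

class SketchPageText {
    final List<SketchLine> lines = new ArrayList<SketchLine>();

    SketchPageText(SketchLine... pageLines) {
        lines.addAll(Arrays.asList(pageLines));
    }

    // Return the full text of every line that contains the given word
    // (exact, case-sensitive comparison), so a hit comes back with its context.
    List<String> linesContaining(String word) {
        List<String> hits = new ArrayList<String>();
        for (SketchLine line : lines) {
            for (SketchWord w : line.words) {
                if (w.text.equals(word)) {
                    hits.add(line.asText());
                    break;
                }
            }
        }
        return hits;
    }
}
```

Because each word knows which line it belongs to, a match can be reported together with the surrounding line, which is what searching "in context" amounts to in this sketch.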

The DocumentSearchControllerImpl is our reference implementation for searching for text in PDF page content streams. If a search indexing system such as Lucene is required, it can be implemented and tied into ICEpdf through the DocumentSearchController interface. The Search API was designed to be extensible and simple to use. The following pseudocode shows how a typical search would be started and executed.

# Get instance of SearchController from the SwingController
# Call clearAllSearchHighlight to remove previous search terms from controller.
# Add one or more Search Terms to the SearchController
# Execute the search for the specified terms

The search controller's job is quite simple: search a page, or clear a page's search results and search terms. Search terms are global to the search controller, and the current set of search terms is used for all subsequent page searches. When one of the search controller's clear methods is called, all search terms are removed. Working examples of document searching can be found in the Viewer RI and the Search example ([../display/PDF/Search+Example|Searching]).
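The steps and the term lifecycle described above can be sketched with a toy controller. ToySearchController is a hypothetical, simplified stand-in written for this illustration; it borrows the method names addSearchTerm and clearAllSearchHighlight from the text, but its signatures, the searchPage method and the plain substring matching are assumptions, not the ICEpdf implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a search controller; NOT the ICEpdf implementation.
class ToySearchController {
    private final List<String> pages;                        // one string of text per page
    private final List<String> terms = new ArrayList<String>();

    ToySearchController(List<String> pages) {
        this.pages = pages;
    }

    // Terms are stored on the controller, so they apply to every later page search.
    void addSearchTerm(String term) {
        if (term != null && !term.isEmpty()) {
            terms.add(term);
        }
    }

    // Clearing removes every stored term.
    void clearAllSearchHighlight() {
        terms.clear();
    }

    int termCount() {
        return terms.size();
    }

    // Count occurrences of all current terms on one page
    // (assumption: case-sensitive, non-overlapping substring matches).
    int searchPage(int pageIndex) {
        String text = pages.get(pageIndex);
        int hits = 0;
        for (String term : terms) {
            int from = text.indexOf(term);
            while (from >= 0) {
                hits++;
                from = text.indexOf(term, from + term.length());
            }
        }
        return hits;
    }
}
```

In this sketch a caller clears the controller, adds its terms once, and then calls searchPage for each page of interest; the terms stay in effect until the next clear.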

h4. Search Terms

The SearchTerm class allows a user to specify what they want the search controller to search for. A search term has three immutable instance variables: the term itself, plus two flags that configure case sensitivity and whole word matching.

* *Term* - string word or phrase to use as the search key.
* *Case-sensitive* - true indicates a case-sensitive search for the given term; false ignores case when searching for the term.
* *Whole word* - true indicates that a match must be a whole word equal in length to the search term; false indicates that the term may also match part of a longer word.
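The effect of the two flags can be illustrated with a small helper. TermMatchSketch and its matches method are hypothetical names introduced here; the sketch assumes that a whole word match means equality with the word and that a non-whole-word match means the term occurs anywhere inside the word. It is not ICEpdf's matching code.

```java
import java.util.Locale;

// Hypothetical helper illustrating the two flags; NOT ICEpdf's matching code.
class TermMatchSketch {

    static boolean matches(String word, String term,
                           boolean caseSensitive, boolean wholeWord) {
        // When case is ignored, compare lower-cased copies of both strings.
        String w = caseSensitive ? word : word.toLowerCase(Locale.ROOT);
        String t = caseSensitive ? term : term.toLowerCase(Locale.ROOT);
        // Whole word: the word must equal the term.
        // Otherwise: the term only needs to occur somewhere inside the word.
        return wholeWord ? w.equals(t) : w.contains(t);
    }
}
```

Under these assumptions, the term "search" matches the word "Searching" only when both flags are false, and matches the word "Search" as a whole word only when case is ignored.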

The new search API implementation allows for multiple search terms. For example, consider the following three-term search:

// clear previous search terms and highlighting
searchController.clearAllSearchHighlight();
// use the add search term call to add multiple terms
searchController.addSearchTerm(term1, isCaseSensitive, isWholeWord);
searchController.addSearchTerm(term2, isCaseSensitive, isWholeWord);
searchController.addSearchTerm(term3, isCaseSensitive, isWholeWord);
Next, specify the page to search for the terms. The search results are highlighted on that page.

// Search/highlight page y, with previously specified search terms

The RI search panel is set up by default to use single-term searches. The search checkbox option "Cumulative" can be enabled to suppress, between search commands, the call that clears previous search terms and highlighting.
Working examples of the search API can be found in [Examples].