Thanks for posting the file. It turns out that our sample code for text extraction isn't working very well in this case. The example code calls document.getPageText(), which eventually calls page.getText(), which is supposed to be optimized for text extraction. However, in this case it is a little too optimized, in that it only parses the watermark text.
The good news is that there is another way to get a page's text, which is the one used by the viewer RI. In this case the page is fully parsed, which is a little slower than the previous approach (I'll create a bug for the text extraction error). In our ./examples/extraction/PageTextExtraction.java, replace the first line below with the four lines that follow it:
PageText pageText = document.getPageText(pagNumber);
Object pageLock = new Object();
PageTree pageTree = document.getPageTree();
Page pg = pageTree.getPage(pagNumber, pageLock);
PageText pageText = pg.getViewText();
The extraction preserves the columns but always has problems with justified text layout, as it's really hard to get the spacing right. If this is a problem, there is a system property, org.icepdf.core.views.page.text.spaceFraction=3, which can be tweaked to help detect words. A value of zero does no space insertion, whereas a larger number will try to fractionally (based on the average glyph width) add more spaces.
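For anyone who would rather set the property from code than on the command line, here is a minimal sketch using only the standard System.setProperty call. The class and method names (SpaceFractionConfig, setSpaceFraction) are made up for illustration and are not part of ICEpdf. It also assumes the library reads the property when its text classes are first loaded, so the call should run before any document is opened.

```java
public class SpaceFractionConfig {
    // Property name as quoted in the post above.
    static final String SPACE_FRACTION =
            "org.icepdf.core.views.page.text.spaceFraction";

    // Hypothetical helper: stores the value as a JVM system property
    // and returns what is now in effect.
    static String setSpaceFraction(int fraction) {
        System.setProperty(SPACE_FRACTION, Integer.toString(fraction));
        return System.getProperty(SPACE_FRACTION);
    }

    public static void main(String[] args) {
        // Assumption: call this before opening a document, so the value is
        // already in place when the library reads it.
        String value = setSpaceFraction(3);
        System.out.println(SPACE_FRACTION + "=" + value);
    }
}
```

The same value can be passed at launch with the usual JVM flag, -Dorg.icepdf.core.views.page.text.spaceFraction=3, which avoids any question of ordering. Trying a few values against the justified pages and comparing the extracted words is probably the quickest way to find a setting that works for a given file.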