public class ParagraphAbsorber extends Object
Represents an absorber object of page structure objects such as sections and paragraphs.
Performs a search for sections and paragraphs of text and provides access to the rectangles and polygons that describe them in the text coordinate space.
Also performs a search for text segments and provides access to the search results via TextFragment
collections grouped by structure elements.
// Open document
Document doc = new Document("input.pdf");

// Create ParagraphAbsorber object
ParagraphAbsorber absorber = new ParagraphAbsorber();

// Accept the absorber for first page
absorber.visit(doc.getPages().get_Item(1));

// Get markup object of first page
PageMarkup markup = absorber.getPageMarkups().get(0);

// Loop through structure elements of the page text to find first text fragment of each paragraph
for (MarkupSection section : markup.getSections()) {
    for (MarkupParagraph paragraph : section.getParagraphs()) {
        TextFragment fragment = paragraph.getFragments().get_Item(0);

        // Update text properties
        fragment.getTextState().setBackgroundColor(Color.getLightBlue());
    }
}

// Save document
doc.save("output.pdf");
The ParagraphAbsorber.PageMarkups collection (returned by getPageMarkups()) contains PageMarkup
objects that represent the page structure as collections of MarkupSection
and MarkupParagraph objects.
The TextFragment
object provides access to the text of a search occurrence and its text properties, and allows editing the text and changing the text state (font, font size, color, etc.).

Constructor and Description |
---|
ParagraphAbsorber(): Initializes a new instance of ParagraphAbsorber that searches for sections and paragraphs of a document or page. |
ParagraphAbsorber(int sectionsSearchDepth): Initializes a new instance of ParagraphAbsorber that searches for sections and paragraphs of a document or page, using the specified search depth. |

Modifier and Type | Method and Description |
---|---|
List<PageMarkup> | getPageMarkups(): Gets the collection of PageMarkup objects that were absorbed. |
int | getSectionsSearchDepth(): Gets the value that specifies how many sequential searches for finer structure elements are performed. |
boolean | isMulticolumnParagraphsAllowed(): Gets the value that indicates whether the starting text lines of a section may be treated as a continuation of the last paragraph of the previous section. |
void | setMulticolumnParagraphsAllowed(boolean value): Sets the value that indicates whether the starting text lines of a section may be treated as a continuation of the last paragraph of the previous section. |
void | setSectionsSearchDepth(int value): Sets the value that specifies how many sequential searches for finer structure elements are performed. |
void | visit(Document doc): Performs a search for sections and paragraphs in the specified Document. |
void | visit(Page page): Performs a search in the specified Page. |

public ParagraphAbsorber()
Initializes a new instance of ParagraphAbsorber
that searches for sections and paragraphs of a document or page.
public ParagraphAbsorber(int sectionsSearchDepth)
Initializes a new instance of ParagraphAbsorber
that searches for sections and paragraphs of a document or page.
sectionsSearchDepth
- Number of sequential searches for finer structure elements that will be performed.
See getSectionsSearchDepth() and setSectionsSearchDepth(int) (the ParagraphAbsorber.SectionsSearchDepth
property) for more hints about the parameter.
public List<PageMarkup> getPageMarkups()
Gets the collection of PageMarkup
objects that were absorbed.
public int getSectionsSearchDepth()
Gets the value that specifies how many sequential searches for finer structure elements are performed. The default search depth is 3, which means three searches for horizontally divided sections (headers, paragraphs, etc.) and three searches for vertically divided ones (columns).
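
To make the idea of a search depth more concrete, the following is a minimal, self-contained sketch of a recursive X-Y cut, a classic page segmentation technique. It is a toy model only: this page does not state how ParagraphAbsorber works internally, and every name here (SearchDepthSketch, Box, cut, segment, samplePage, the minGap threshold) is hypothetical and not part of the Aspose.PDF API. Each pass performs one cut into horizontally divided bands followed by one cut into vertically divided columns, and the depth limits how many passes are applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy model; NOT the Aspose.PDF implementation.
final class SearchDepthSketch {

    /** Axis-aligned box standing in for one line of text on a page. */
    static final class Box {
        final double x0, y0, x1, y1;
        Box(double x0, double y0, double x1, double y1) {
            this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
        }
    }

    /** Groups boxes along one axis, starting a new group at every gap wider than minGap. */
    static List<List<Box>> cut(List<Box> boxes, boolean alongY, double minGap) {
        List<Box> sorted = new ArrayList<>(boxes);
        sorted.sort(Comparator.comparingDouble((Box b) -> alongY ? b.y0 : b.x0));
        List<List<Box>> groups = new ArrayList<>();
        List<Box> current = new ArrayList<>();
        double end = Double.NEGATIVE_INFINITY;
        for (Box b : sorted) {
            double start = alongY ? b.y0 : b.x0;
            if (!current.isEmpty() && start - end > minGap) {
                groups.add(current);
                current = new ArrayList<>();
            }
            current.add(b);
            end = Math.max(end, alongY ? b.y1 : b.x1);
        }
        if (!current.isEmpty()) groups.add(current);
        return groups;
    }

    /** Applies up to 'depth' passes; each pass cuts into bands, then each band into columns. */
    static List<List<Box>> segment(List<Box> boxes, int depth, double minGap) {
        if (depth <= 0 || boxes.size() <= 1) return Collections.singletonList(boxes);
        List<List<Box>> result = new ArrayList<>();
        for (List<Box> band : cut(boxes, true, minGap)) {
            for (List<Box> column : cut(band, false, minGap)) {
                if (column.size() == boxes.size()) {
                    result.add(column);            // nothing was split, so stop here
                } else {
                    result.addAll(segment(column, depth - 1, minGap));
                }
            }
        }
        return result;
    }

    /** A header over two columns; the right column has a vertical gap hidden by the left one. */
    static List<Box> samplePage() {
        return Arrays.asList(
            new Box(0, 0, 100, 10),                              // header
            new Box(0, 20, 45, 30), new Box(0, 32, 45, 42),      // left column
            new Box(0, 44, 45, 54), new Box(0, 56, 45, 66),
            new Box(55, 20, 100, 30), new Box(55, 56, 100, 66)); // right column
    }

    public static void main(String[] args) {
        for (int depth = 0; depth <= 3; depth++) {
            int groups = segment(samplePage(), depth, 5.0).size();
            System.out.println("depth " + depth + ": " + groups + " groups");
        }
    }
}
```

In this sample layout, a depth of 1 separates the header and the two columns (three groups), while a depth of 2 also finds the gap inside the right column (four groups). A larger depth changes nothing further here, which illustrates why a small default can be enough for simple pages.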
public void setSectionsSearchDepth(int value)
Sets the value that specifies how many sequential searches for finer structure elements are performed. The default search depth is 3, which means three searches for horizontally divided sections (headers, paragraphs, etc.) and three searches for vertically divided ones (columns).
value
- int value
public final boolean isMulticolumnParagraphsAllowed()
Gets the value that indicates whether the starting text lines of a section may be treated as a continuation of the last paragraph of the previous section.
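
The effect of this flag can be pictured with a small, self-contained sketch. It is an illustration under stated assumptions, not the library's logic: sections are modeled as plain lists of paragraph strings, and the rule used to decide that a paragraph is unfinished (no closing '.', '!' or '?') is an invented stand-in. The names MulticolumnSketch, flow, looksUnfinished, and sampleColumns are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; NOT the Aspose.PDF implementation.
final class MulticolumnSketch {

    /** Returns the paragraphs of all sections in reading order, optionally joining across sections. */
    static List<String> flow(List<List<String>> sections, boolean multicolumnAllowed) {
        List<String> out = new ArrayList<>();
        for (List<String> section : sections) {
            for (int i = 0; i < section.size(); i++) {
                String paragraph = section.get(i);
                boolean firstInSection = (i == 0);
                if (multicolumnAllowed && firstInSection && !out.isEmpty()
                        && looksUnfinished(out.get(out.size() - 1))) {
                    // Treat the opening lines of this section as a continuation
                    // of the last paragraph of the previous section.
                    int last = out.size() - 1;
                    out.set(last, out.get(last) + " " + paragraph);
                } else {
                    out.add(paragraph);
                }
            }
        }
        return out;
    }

    /** Assumed heuristic: a paragraph is unfinished if it lacks sentence-ending punctuation. */
    static boolean looksUnfinished(String text) {
        String t = text.trim();
        if (t.isEmpty()) return false;
        char c = t.charAt(t.length() - 1);
        return c != '.' && c != '!' && c != '?';
    }

    /** Two columns where a sentence starts in the first and ends in the second. */
    static List<List<String>> sampleColumns() {
        return Arrays.asList(
            Arrays.asList("Intro paragraph.", "The quick brown fox jumps over"),
            Arrays.asList("the lazy dog.", "Next paragraph."));
    }

    public static void main(String[] args) {
        System.out.println(flow(sampleColumns(), false)); // four separate paragraphs
        System.out.println(flow(sampleColumns(), true));  // the split sentence is joined
    }
}
```

With the flag off, the sentence that is split across the two columns stays in two paragraphs. With it on, the start of the second column is appended to the last paragraph of the first.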
public final void setMulticolumnParagraphsAllowed(boolean value)
Sets the value that indicates whether the starting text lines of a section may be treated as a continuation of the last paragraph of the previous section.
value
- boolean value
public void visit(Document doc)
Performs a search for sections and paragraphs in the specified Document.
doc
- PDF document object.
public void visit(Page page)
Performs a search in the specified Page.
page
- PDF document page object.

Copyright © 2023 Aspose. All Rights Reserved.