Tips on Advanced PDF Automation with Test Studio

This content originally appeared on Telerik Blogs and was authored by Petar Grigorov

Check out these more advanced tips for verifying your PDF’s images with Test Studio.

Whether it’s for platform compatibility, document integrity, security, size, compression, rich media support, ISO standardization, accessibility, archiving or just ease of creation and use, Portable Document Format (PDF) is one of the most widely used document formats in personal and professional settings alike. One of my colleagues goes even further, describing it in a blog post as “the alpha and omega of document processing for any business.”

A laptop whose screen has a dashboard loaded with some charts
Image source: Unsplash

Progress Telerik Test Studio is an easy-to-use automation tool for functional UI, load/performance and API testing of web and desktop applications. Whether you’re going codeless or choosing its code-based capabilities, Test Studio provides solutions for the entire team, empowering everyone, from junior testers to senior devs and from PMs to QA leads, to achieve max productivity in agile software delivery environments. I could go even further and describe it as “automated testing that just works.”

For a good beginner’s guide to getting started with PDF automation testing in Telerik Test Studio, I’d recommend the aforementioned blog post and, of course, the official documentation. The purpose of the current post, however, is to take you one step further and unveil some advanced tips, tricks and even fun (or maybe not so much) experiments. The main focus is image verification in a PDF file of your choice.

Understanding PDF Structure

PDFs are composed of various elements such as text, images and graphics organized in a structured manner. Their file structure can be divided into several sections:

Document catalog: The root of the document structure
Pages: Representations of individual pages within the PDF
Content streams: Sequences of instructions that describe the appearance of a page
Resources: Collections of objects like fonts and images used by the content streams

By embedding images as background elements within the content stream and providing correct layering and resource management, PDFs can maintain a consistent and visually appealing layout across different platforms and devices. If such an image is a bitmap (e.g., JPEG or PNG), it is embedded in the PDF as an image object. To use an SVG as a background, the SVG must first be converted into a form that PDF can interpret natively, such as PDF vector drawing instructions or a rasterized bitmap.

Counting Challenges

Let’s say you have exactly such a PDF. Real-life quality assurance scenarios would require answering questions like:

“How can I validate the header/footer properties?”

“How can I check if the entire text in an input box is visible or not?”

“How can I compare two images and get the match as a percentage?”

“Is the watermark present throughout all of the pages of the PDF file?”

Listing all such questions would render the blog post TL;DR, so let’s fast-forward to the explanation of why trying to automate an image embedded as a background in a PDF file can turn even the most seasoned automation testers into puzzled seasoned automation testers. The short answer: the image element is not available in the Document Object Model (the mighty DOM).

A closeup of a sculpture showing metal orbs connected with cylindrical tubes
Image source: Unsplash

Solving Challenges

When a PDF file is opened for validation with Test Studio, it looks and feels like you’re working inside a webpage, with the same functionality for recording elements and executing steps. Technically, Test Studio starts its built-in PDF viewer server and displays the file inside it, parsed to an HTML page. All of that happens seamlessly, so you do not have to worry about starting and maintaining the PDF viewer server.

From then on, you can validate any element inside the PDF file the way you’re used to from automating webpages: hover over it and choose the desired action from the context menu. In such cases, Test Studio’s pixel-by-pixel image verification feature and its element-by-image feature are usually the crucial tools.

However, when an image element is not recognized in the DOM, try to stay cool and adapt; overcoming and improving are even more crucial.

An illustration of the Telerik Test Studio Ninja mascot in a meditative pose

Before creating a test script with Telerik Test Studio, please make sure your monitor’s scaling is set to 100%. Note that this is the scale setting in the Windows system settings, not the screen resolution:

System - Display: Scale & Layout is set to the recommended 125%. The user has highlighted 100%

I created a sample web test called “True,” which opens a PDF file (called Report.html.pdf) created from the HTML export in the Test Studio Reports section. When you attach a recorder with hover-over element highlighting enabled to the PDF file, it will look like this:

Exploring the DOM (via the “Locate in DOM” option) should bring the following result, only showing a 1056 x 816px canvasWrapper div, which contains the Progress Telerik Test Studio logo in the header, the results graph and the results data grid.

In order to verify that the Progress Telerik Test Studio logo is visible, I took the following steps:

Adapt

While the Recorder was on and the canvasWrapper highlighted, I created a dummy element and called it “FooElement.”

Add to Elements

I located the new element in the Elements repository, clicked on Step Builder >> Verifications and selected Visible. (If you need more details on exactly how to do that, check the verification steps docs.)

The following step was created: Verify element ‘FooElement’ ‘is’ visible, with its SearchByImageFirst property set to True.

Verify FooElement is Visible searchbyimagefirst

Overcome

I edited the element’s attributes by assigning them dummy values (e.g., tagname = foo), so that the element could not be found by Test Studio’s default Smart Find Logic.

tagname is exactly foo

I replaced the element’s image with one that contains only what I need for the logo verification.

Upon running the test, the result is successful, as the new logo image is found in the PDF file. Note that such a verification checks the visibility property of the element. If an element is marked visible but is scrolled off the current window, the verification will still pass even though the element is not actually visible on the current screen or inside the scroll window.

Test Success - 3 passed out of 3 executed

Improve

I wanted to make sure the validation is not a false positive, so I added two more scenarios:

I unchecked the IsVisible property to make sure the step would fail upon execution

Test fails. 1 passed out of 2 executed - Verify FooElement is not visible - failed

I copied the original test (IsVisible is checked) to a new one, called False, but this time uploaded an additional and slightly different logo for the element image. Note that you can upload more than one image to an element and have different steps use different images.

Upon running the new test, the verification fails as expected, because the modified logo is not present in the PDF file.

Test fails. 1 passed out of 2 executed - Verify element FooElement is visible - did not pass

Following the flow applied for the logo validation, you could do the same for the graph image or any other and achieve the same results.

Improvise

The improvise part is always tricky, but I decided to add it nevertheless. Using some C# code, I might be able to calculate whether an element is actually visible on screen. That would require asking the browser for the current screen coordinates of the view window, whether that’s a scroll window or the browser window, and then for the current screen coordinates of the target element I want to verify. Next, I’d calculate whether the two rectangles intersect. If they do, the element is at least partially visible; if they do not, the element is not visible. But that is pretty advanced and maybe worthy of a separate blog post.
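The rectangle check itself is small. Below is a minimal sketch in Python rather than C#, and it does not call Test Studio at all: the `Rect` type, the function name and the sample coordinates are all hypothetical, and in a real test the two rectangles would have to come from the browser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in screen coordinates (pixels). Hypothetical helper type."""
    left: int
    top: int
    width: int
    height: int

    @property
    def right(self) -> int:
        return self.left + self.width

    @property
    def bottom(self) -> int:
        return self.top + self.height


def intersects(a: Rect, b: Rect) -> bool:
    """Return True when the two rectangles share a non-empty area.

    Rectangles that only touch along an edge are treated as not intersecting.
    """
    return (
        a.left < b.right
        and b.left < a.right
        and a.top < b.bottom
        and b.top < a.bottom
    )


# Sample values only: a 1056 x 816 view window and two made-up element positions.
view_window = Rect(left=0, top=0, width=1056, height=816)
logo_on_screen = Rect(left=40, top=20, width=200, height=60)
logo_scrolled_off = Rect(left=40, top=900, width=200, height=60)

print(intersects(view_window, logo_on_screen))     # True
print(intersects(view_window, logo_scrolled_off))  # False
```

An intersection only tells you that some part of the element is on screen. If the whole element must be visible, the check would instead compare all four edges of the element against the view window.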

I could go even further and take a snapshot of the entire browser or just a portion of it via ActiveBrowser.Window.GetBitmap(). Then, using the System.Drawing namespace, I could crop an area, save it and finally compare it to another image that I am using as a reference standard. I did this to experiment with Mean Squared Error (MSE), a common metric in image comparison for measuring the difference between two images.
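To illustrate just the cropping idea, here is a rough Python sketch that treats an image as a plain grid of grayscale values. It is a stand-in for a real bitmap and does not use ActiveBrowser or System.Drawing; the `crop` function and the sample data are hypothetical.

```python
from typing import List

Pixels = List[List[int]]  # rows of grayscale values; a stand-in for a real bitmap


def crop(image: Pixels, left: int, top: int, width: int, height: int) -> Pixels:
    """Return the width x height region whose top-left corner is (left, top)."""
    if left < 0 or top < 0 or width <= 0 or height <= 0:
        raise ValueError("crop region needs a non-negative origin and a positive size")
    rows = image[top:top + height]
    if top + height > len(image) or any(left + width > len(row) for row in rows):
        raise ValueError("crop region falls outside the image")
    return [row[left:left + width] for row in rows]


# A made-up 4 x 4 "snapshot" with a 2 x 2 area of interest in the middle.
snapshot = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 6, 0],
    [0, 0, 0, 0],
]
logo_area = crop(snapshot, left=1, top=1, width=2, height=2)
print(logo_area)  # [[9, 8], [7, 6]]
```

The cropped region is what would then be compared against the reference image.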

MSE is the average of the squared differences between corresponding pixel values of the original (reference) image and the modified (test) image. The lower the MSE, the more similar the two images are; an MSE of 0 means they are identical pixel for pixel. A higher MSE indicates greater dissimilarity.
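Written out as a formula, assuming two grayscale images of the same size, the definition looks like this. The symbols are my own notation: I is the reference image, K is the test image, and both are m pixels high and n pixels wide.

```latex
\mathrm{MSE} = \frac{1}{m\,n} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \bigl( I(i,j) - K(i,j) \bigr)^{2}
```

As a tiny worked example, take two 2 x 2 images with pixel values (10, 20, 30, 40) and (12, 20, 30, 36). The differences are (-2, 0, 0, 4), their squares are (4, 0, 0, 16), the sum is 20, and dividing by 4 pixels gives an MSE of 5.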

Just to give you a glimpse of the work involved before MSE can be relied on, you need to:

Align the images – Check that both images are of the same dimensions. If not, resize them appropriately.
Subtract pixel values – For each corresponding pixel in the two images, compute the difference.
Square the differences – Square each difference to ensure all values are positive.
Sum the squared differences – Add up all the squared differences.
Average the sum – Divide the total by the number of pixel pairs to get the mean squared error.
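The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the experiment: images are plain grids of 8-bit grayscale values, the function name and sample data are hypothetical, and mismatched sizes are rejected rather than resized.

```python
from typing import Sequence

Image = Sequence[Sequence[int]]  # rows of 8-bit grayscale pixel values (0-255)


def mean_squared_error(reference: Image, test: Image) -> float:
    """Compute the MSE between two same-sized grayscale images."""
    # Align: this sketch refuses mismatched sizes instead of resizing.
    if len(reference) != len(test) or any(
        len(ref_row) != len(test_row) for ref_row, test_row in zip(reference, test)
    ):
        raise ValueError("images must have identical dimensions")

    total = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for ref_px, test_px in zip(ref_row, test_row):
            diff = ref_px - test_px  # subtract pixel values
            total += diff * diff     # square the difference and add it to the sum
            count += 1
    if count == 0:
        raise ValueError("images must contain at least one pixel")
    return total / count             # average over all pixel pairs


# Made-up 2 x 2 images: two pixels differ, by 5 and by 10.
reference_logo = [[0, 255], [255, 0]]
captured_logo = [[0, 250], [255, 10]]
print(mean_squared_error(reference_logo, captured_logo))  # 31.25
```

In a test, the result would typically be compared against a tolerance chosen for the images at hand, since rendering differences rarely produce an MSE of exactly 0.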

Although that approach would provide endless opportunities, the maintenance of the code would eventually become a burden, so I’d prefer to stick to the low/no code approach. What would you do in such a case? Let us know in the comments section.

And if you haven’t done so, give Test Studio a try for free:

Try Test Studio

Happy testing!
