Basic Selenium – The Easy Peasy Introduction, Chapter 3 of 3

This content originally appeared on DEV Community and was authored by brk

Automating web-based tasks with Selenium? Efficiently. That's the name of the game here, so take the reins and make technology work for you.

A coding story in three chapters (with a bonus).

The Odyssey of the Code
- I. In the realm of the browser, a puppet is forged
- II. The opening of the file
- III. Sentence splitting action unleashed
- IV. The forging of the chunks
- V. Print statements illuminate the path
- VI. The sorcery of the selectors
- VII. The bridge of translation is crossed
- VIII. The file is written
Conclusion

The Odyssey of the Code

The Taming of the Firefox

I. In the realm of the browser, a puppet is forged

Where the mighty browser is bound to our will, as if by sorcery.

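# Imports assumed to sit in the script's header:
from selenium import webdriver
from selenium.webdriver.firefox.service import Service

def initialize_browser() -> webdriver.Remote:
    """Start geckodriver and return a Firefox driver (headless if HEADLESS is set)."""
    driver_service = Service(GECKODRIVER_PATH, log_output="geckodriver.log")
    options = webdriver.FirefoxOptions()
    options.binary_location = FIREFOX_PATH
    if HEADLESS:
        options.add_argument("-headless")

    return webdriver.Firefox(service=driver_service, options=options)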

First up, we've got this driver_service business. Now, I know what you're thinking - "George, what in the world is a driver service?" Well, let me tell you, it's the piece of code that's gonna make this whole Selenium thing work. It starts and stops the middleman that sits between our script and the browser, namely geckodriver! And once that service is up, the gecko’s gonna turn our Firefox into a nice marionette.
And where do we get this driver service, you ask? From the GECKODRIVER_PATH, of course! That's the path to the executable that's gonna make Firefox dance to our tune. Do you know what else it's gonna do? It's gonna log everything that's happening, right into the geckodriver.log file. That way, if anything goes sideways, we can take a peek under the hood and see why.
Now, the options. See, we're creating a shiny new set of FirefoxOptions, and we're gonna deck them out with some goodies. First up, we're pointing it to the FIREFOX_PATH, making sure we've got the right browser to work with.
And then, if that HEADLESS variable is set to True, we're gonna add a little -headless argument to the mix. That means the browser's gonna run without a visible window, stealthier than your average ninja. No need for all the bells and whistles (and windows), we're just here to get the job done, right?
And finally, we're wrapping it all up by returning a brand-spankin' new instance of the Firefox webdriver, complete with our custom service and options. This one's like a well-oiled machine, and we're the ones behind the wheel.
Just remember: if you ever find yourself scratching your head, wondering what in the world is going on, just take a peek at that geckodriver.log file. It's like a crystal ball, and it might tell you everything you need to know.
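
Peeking at that log doesn't have to mean opening the whole file. Here's a minimal sketch, in plain Python, that prints only the last few lines. The helper name tail_log and the default of 20 lines are my own assumptions for illustration; they are not part of the original script.

```python
from collections import deque
from pathlib import Path


def tail_log(path: str, lines: int = 20) -> list:
    """Return the last `lines` lines of a log file (empty list if it is missing)."""
    log_file = Path(path)
    if not log_file.exists():
        return []
    with log_file.open("r", encoding="utf-8", errors="replace") as f:
        # A deque with maxlen silently drops older lines as newer ones arrive.
        return [line.rstrip("\n") for line in deque(f, maxlen=lines)]


if __name__ == "__main__":
    # Hypothetical usage: show the most recent geckodriver messages.
    for line in tail_log("geckodriver.log"):
        print(line)
```

If the log doesn't exist yet (say, the browser never started), the helper simply returns an empty list instead of raising.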

II. The opening of the file

Like an ancient tome unveiling its secrets.

def load_input_file(file_name: str) -> str:

    try:
        with open(file_name, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        print(f"File {file_name} not found. Please check, and try again.")
        raise

Alright, folks, gather 'round, because we're about to dive into some more of this code. Now, this right here, this is the kind of stuff that separates the wheat from the chaff.
First up, we've got this try block. Now, I know what you're thinking: "George, what in the world is a 'try' block?" Well, let me tell you, it's the statement that’s gonna keep us from running headfirst into a brick wall every time something goes wrong. And trust me, in the universe of programming, something's always going wrong.
See, what we're doing here is we're trying to open a file, right? And we're using this fancy-schmancy open() function to do it. And that's where the try block comes in.
If everything goes according to plan, and the file is there, waiting for us with open arms, we're gonna read the contents and hand 'em back, no problem. "But George, what could possibly go wrong?" Well, my friends, the world is a cruel and unpredictable place (especially when you’re learning software development), and sometimes, those files, they just up and disappear. But don't you worry, we've got a plan and that's where the except block comes in.
If that FileNotFoundError rears its ugly head, we're gonna print out a nice, friendly message, letting the user know that the file they were looking for is... not there. Then that bare raise hands the error back up the chain, so the script stops right there instead of trying to translate thin air.
It's like a safety net, a way to make sure that when the road gets bumpy, we at least know which wheel fell off. And you know what they say, "the more you plan for the unexpected, the less unexpected it becomes." Or something like that... I don't know, I'm just making it up as I go along.

Anyway, the point is when life hands you lemons, you make lemonade. And when life hands you FileNotFoundError's, you just smile ;)
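
To see what that re-raise buys us, here's a small sketch of how a caller might react. The wrapper load_or_exit is hypothetical (the original script may handle this differently); load_input_file is repeated only so the snippet runs on its own.

```python
import sys


def load_input_file(file_name: str) -> str:
    # Same logic as the function above, repeated to keep the sketch self-contained.
    try:
        with open(file_name, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        print(f"File {file_name} not found. Please check, and try again.")
        raise


def load_or_exit(file_name: str) -> str:
    """Hypothetical wrapper: end the script with exit code 1 if the file is missing."""
    try:
        return load_input_file(file_name)
    except FileNotFoundError:
        # The friendly message was already printed; just stop cleanly.
        sys.exit(1)
```

Because the error is re-raised rather than swallowed, the caller gets to decide what "stop" means: exit, retry with another path, or let the traceback show.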

III. Sentence splitting action unleashed

The fellowship of words is broken.

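def split_text_into_sentences(text: str) -> list:
    """Split text right after each ".", "?" or "!" (the punctuation is kept)."""
    # Needs Python 3.7+, where re.split() accepts zero-width matches.
    return re.split(r"(?<=[.?!])", text)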

Okay, listen up, because we're about to dive into some serious sentence-splitting action.
Now, you might be looking at this line of code and thinking: "what in the world is going on here?" Well, let me tell you, it's a thing of beauty: we're taking this text that we've been handed, and we're gonna break it down into its individual sentences.
And how are we doing it, you ask? With the power of regular expressions (regex) and its re.split() implementation, that's how. "But George, what's a regular expression?" Well, my friends, it's a language all its own, a way of describing patterns in ways that would make your head spin. You can tinker with it in any online regex tester.
But don't worry, we're not gonna get too deep into the weeds here. All you need to know is that this little regular expression is the key to our success. It's gonna look for those periods, question marks, and exclamation marks, and it's gonna use them as the boundaries to split our text into individual sentences. The (?<=...) part is a lookbehind: the split happens right after the punctuation mark, so each sentence keeps its own full stop. But careful, this regex has its limits. It has no idea that the period in "Dr. Smith" or "3.14" doesn't end a sentence, so it'll happily cut there too. See, every language is like a delicate little dance, with each word and phrase movin' in perfect harmony. The pros are more aware of that than us, so they built nice little tools using carefully crafted and more accurate natural language processing (NLP) techniques (for example, you can experiment with the nltk library). For the sake of simplicity, let’s stick to the basics with the regex way.
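
Here's a quick experiment you can run to see exactly what the split hands back. The sample strings are made up for illustration. Notice that each piece keeps its leading space and that a trailing empty string shows up when the text ends with punctuation.

```python
import re


def split_text_into_sentences(text: str) -> list:
    # Same regex as above: split right after ".", "?" or "!".
    return re.split(r"(?<=[\.?!])", text)


print(split_text_into_sentences("Hello there. How are you? Fine!"))
# → ['Hello there.', ' How are you?', ' Fine!', '']

# The limits in action: an abbreviation and a decimal number get cut too.
print(split_text_into_sentences("Dr. Smith arrived."))
# → ['Dr.', ' Smith arrived.', '']
print(split_text_into_sentences("Pi is 3.14."))
# → ['Pi is 3.', '14.', '']
```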

The next time you find yourself staring at a wall of text, wondering how in the world you're gonna make sense of it all, just remember this little line of code and no more trying to figure out where one sentence ends and the next one begins. And who knows, maybe one day, you'll be the one writing the regular expressions turning chaos of strings into order.

IV. The forging of the chunks

The awakening of mighty blocks of text.

def generate_chunks(text: list, char_limit: int = CHAR_LIMIT) -> Generator[str, None, None]:
    """Yield space-joined groups of sentences, kept under char_limit where possible."""
    chunk = deque()
    chunk_length = 0

    for sentence in text:
        sentence_length = len(sentence) + 1  # Add 1 for the space character
        # Only close the chunk if it holds something; otherwise an oversized
        # first sentence would make us yield an empty chunk.
        if chunk and chunk_length + sentence_length > char_limit:
            yield " ".join(chunk)
            chunk.clear()
            chunk.append(sentence)
            chunk_length = sentence_length
        else:
            chunk.append(sentence)
            chunk_length += sentence_length

    yield " ".join(chunk) if chunk else ""

Hmm, we're about to dive into some serious text-chunking action now. You see, we've got this text that we need to translate, but we can't just send the whole thing off to the translation service all at once. Nah, that would be way too easy. Instead, we've gotta break it down into manageable chunks, little bite-sized pieces that the service can handle without breaking a sweat. That's where this generate_chunks() function comes in handy. It's like a master chef, carefully slicing and dicing the text, making sure each piece is the perfect size.
First, we set up a little deque, a fancy data structure that's gonna help us keep track of the current chunk.

A deque, that's just a chic way of saying double-ended queue. Obviously we deal with small amounts of data here, but I wanted to give this exotic thing a try. Your less sophisticated lists would work fine there too. Just remember that a list's usual append and pop methods are only fast at the end of the line; inserting or removing items at the front gets slower as the list grows. So Python’s collections module provides a class called deque that’s specially designed to provide fast and memory-efficient ways to append and pop items from both ends.
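
If you've never met a deque before, here's a tiny sketch of its four main moves. The letters are just placeholder data.

```python
from collections import deque

queue = deque(["b", "c"])
queue.append("d")        # add on the right, like a list
queue.appendleft("a")    # add on the left, no shifting of the other items
first = queue.popleft()  # remove from the left
last = queue.pop()       # remove from the right

print(first, last, list(queue))
# → a d ['b', 'c']
```

With a plain list, the left-side moves would be insert(0, item) and pop(0), which both have to shuffle every remaining item along. In our chunking function we only ever append on the right and clear, which is why a list would do the job just as well.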

And then, we start looping through the sentences of the "text" block (we provided it as an argument when we called the function), and for each sentence, we're gonna figure out its length, plus one for the space character that will join it to its neighbor.
Now, you might be wondering, "But George, how do you know when to start a new chunk?" Well, my friends, it's all about that char_limit variable and the one called chunk_length that we use as a counter to keep a running tally of the length of the current chunk. If the current sentence is gonna fit within the chunk size character limit, well, we're just gonna add it on and update the chunk length accordingly. That’s the normal way to go. Got it?
Now, if the length of the current sentence, plus the length of the chunk we've got so far, pushes us over the character limit we've set, well, we're gonna do a few things. First, we're gonna take all the sentences in the chunk and join 'em up with spaces, and then we're gonna yield that beasty. That just means we're gonna spit it out and move on to a new chunk.
Moving on to the next chunk means we're gonna clear out the chunk variable and start fresh, adding the current sentence to it and resetting the chunk length to just the length of that sentence, and then, just like before, we’ll keep rolling until we hit the limit again.
Finally, after we've gone through all the sentences, we're gonna yield the last chunk, if there is one.
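
To watch the slicing and dicing happen, here's a self-contained run with a deliberately tiny limit. The function is a compact copy (with a guard so an empty chunk is never emitted before the first sentence), and the sample sentences and the limit of 15 are assumptions picked for the demo.

```python
from collections import deque
from typing import Generator


def generate_chunks(text: list, char_limit: int) -> Generator[str, None, None]:
    chunk = deque()
    chunk_length = 0
    for sentence in text:
        sentence_length = len(sentence) + 1  # +1 for the joining space
        if chunk and chunk_length + sentence_length > char_limit:
            yield " ".join(chunk)  # current chunk is full: hand it over
            chunk.clear()
            chunk.append(sentence)
            chunk_length = sentence_length
        else:
            chunk.append(sentence)
            chunk_length += sentence_length
    yield " ".join(chunk) if chunk else ""


sentences = ["One.", "Two two.", "Three three three.", "Four."]
chunks = list(generate_chunks(sentences, char_limit=15))
print(chunks)
# → ['One. Two two.', 'Three three three.', 'Four.']
```

One thing this run makes visible: a single sentence longer than the limit (the 18-character one here) still goes out as its own chunk. The function never cuts inside a sentence, so the limit is only respected when every sentence fits.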

That’s it for our symphony of text-chunking perfection. And who knows.. maybe you'll even find yourself singing the praises of them deques and character limits soon.

V. Print statements illuminate the path

The chronicles of progress are written.

def print_preprocess_infos(input_text: str, chunks: List[str]) -> None:

    print("------------------")
    # Mirror the extra spaces that generate_chunks adds when it joins sentences
    input_text_with_spaces = input_text.replace(
        ".", ". ").replace("?", "? ").replace("!", "! ")
    print(
        f"Input text contains {len(input_text_with_spaces)-(len(chunks)-1)} characters.")
    print("------------------")
    print(f"Found {len(chunks)} chunks!")
    print(f"Sizes: {[len(s) for s in chunks]} characters each.")
    print("------------------")

Alright, "but George, why do we need all these print statements? Isn't the script just supposed to, you know, just work?"
Well see, we've got this script that's doing all sorts of stuff, translating text and whatnot. But how are we supposed to know if it's working, huh? That's where these print statements come in, my friends, they're like the canaries in the coal mine. They're the early warning system that's gonna deal us some intel to let us know something's gone awry.
We're gonna print out the length of the input text. Because let's be real, how are we supposed to know if this thing is working if we don't know how much text we're dealing with, am I right? It's like trying to bake a cake without knowing how much flour you've got. There's more... We're also gonna print out the number of chunks that the script has generated. And to top it all off, we're gonna print out the length of each of those chunks. Because, honestly, who doesn't love a good old-fashioned character count? It's like a little treasure hunt, but instead of finding gold, we're finding out how many characters are in each chunk.
A little aside for when you'll be contributing to bigger projects: there's that other breed of statements called loggers (Python ships them in its logging module). They're the quiet ones, they slip in the back door and get the job done without all the fanfare of the print. Just keep in mind they're the ones that keep a careful record when things start to get a little hairy.
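
For the curious, here's a rough sketch of what the same intel could look like when it goes through a logger instead of print. The logger name "translator", the message format and the function name log_preprocess_infos are all my own assumptions; the original script sticks to print.

```python
import logging

# Timestamps and severity levels come for free with a format string.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("translator")  # hypothetical logger name
logger.setLevel(logging.INFO)


def log_preprocess_infos(chunks: list) -> None:
    """Report chunk count and sizes through the logger instead of print."""
    logger.info("Found %d chunks", len(chunks))
    logger.info("Sizes: %s characters each", [len(s) for s in chunks])


log_preprocess_infos(["First chunk.", "Second one."])
```

The payoff shows up later: the same calls can be sent to a file, filtered by level, or silenced entirely, without touching the function itself.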

VI. The sorcery of the selectors

Their spell casts the way.

def get_input_textarea_element(driver: webdriver.Remote) -> WebElement:

    return WebDriverWait(driver, TIMEOUT).until(EC.element_to_be_clickable((By.CSS_SELECTOR, INPUT_AREA)))

Now it’s time to jump into some serious Selenium wizardry.

The input field our marionette will search for.
You see, we've got this input field that we need to find on the web page, and we can't just go barging in there, hoping it'll be there. Instead, we've gotta harness the power of Selenium, which is gonna help us navigate this digital bytescape.
To understand this whole pasta here, we need to know more about 'em selectors. See, when you're coding a website, just like a building, you got all sorts of elements fitting together - your headings, your paragraphs, buttons, an' the like. Each one of those elements, it's got its own little personality, just like a set of traits and characteristics. And to identify them, you can use all sorts of different selectors: element selectors, class selectors, ID selectors, and many more other lads. Find out more in any CSS selector reference.

That's where this get_input_textarea_element() function comes in. It's like a secret agent, carefully scanning the page, looking for that elusive input field where you insert the text you want to translate.
But there’s a challenge to it: what's the point of finding the input field if it's not even ready for us to use? So first, we create a WebDriverWait object, and we instruct it to wait for up to TIMEOUT seconds (we've set it at the beginning as a parameter, by the way) for the element to be loaded and clickable thanks to this intriguing EC.element_to_be_clickable() thing that tells Selenium exactly what we're looking for. In this case, we're saying, "Hey, Selenium, find me an element that's clickable, and it's gotta match this CSS selector." But... how did I find that damn selector we've got here? Well, invoke the inspector by hitting F12 in Firefox. Then try the Ctrl+Shift+C combo, point and click your target, and in the inspector you'll be able to right-click the highlighted piece of code and copy its CSS selector.

Concretely we’re looking to mimic human behavior just like when you paste your first chunk of text. Let’s break it down again:

You select the input field by clicking on it (it’s selected).
Depending on your laziness level, either you start typing or you just paste some text into that area.

Well, our driver should be instructed to do the same.
Now, I know what you're thinking, "But George, what if the element never becomes clickable? What then?" Well, friends, that's where the WebDriverWait comes to the rescue. If the element doesn't become clickable within the TIMEOUT period, the whole thing is gonna throw a TimeoutException, and we'll know that something's gone wrong. Maybe the website has been updated and our old CSS selector pal is no more, or a networking problem occurred somewhere. Whatever.
If everything goes according to plan, and Selenium manages to find that input field and confirm that it's ready for us to use, well, that's when the wonder happens. We're gonna return that element, and the rest of the script is gonna be able to work its magic.
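
If the waiting part still feels like black magic, here's the idea boiled down to plain Python. To be clear, this is not Selenium's code, just a toy model of an explicit wait: keep asking a question until the answer is truthy or the clock runs out. The name wait_until and its parameters are made up for the sketch, and it raises Python's built-in TimeoutError where Selenium would raise its own TimeoutException.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float, poll: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns something truthy.

    Raises TimeoutError if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # ready: hand back whatever the condition found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)  # take a short nap before asking again


# Toy usage: the "element" only shows up on the third look.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    return "input-field" if attempts["count"] >= 3 else None


print(wait_until(fake_lookup, timeout=2, poll=0.01))
# → input-field
```

The nice part of this pattern is that the script moves on as soon as the element is ready, instead of always sleeping for the worst case.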

VII. The bridge of translation is crossed

Meaning is conveyed across the void.

def translate_text(driver: webdriver.Remote, input_field: WebElement, chunks: list) -> list:

    translation = []

    for index, chunk in enumerate(chunks):
        print(f"Translating chunk {index+1} of {len(chunks)}")
        print("------------------")
        input_field.send_keys(chunk)
        print("Fetching translation...")
        time.sleep(SLEEP_TIME)

        translated_chunk = driver.find_element(
            By.CSS_SELECTOR, OUTPUT_AREA).text

        translation.append(str(translated_chunk))
        input_field.clear()

    return translation

Alright, friends, gather 'round, because we're about to witness the main event, the grand finale and the moment you've all been waiting for - the translation process! Now, I know what you're thinking, "But George, how in the world is this script gonna take all those chunks of text and turn them into a beautiful translation?" Well, my friends, watch out.

First, we're gonna set up an empty list called translation. Our little treasure trove of linguistic gold. This is where we're gonna store all the translated chunks.
And then, we're gonna start looping through those chunks, one by one. Now, I know what you're thinking, "But George, how are we gonna keep track of which chunk we're on?" Well, that's where the enumerate() function comes in, my friends. It's gonna give us the index of each chunk, so we can keep tabs on our progress.
As we loop through each chunk, we're gonna print out a little separator, just to let the user know that we're hard at work. And then, we're gonna send that chunk of text to the input field, using the send_keys() method. It's like we're typing the text over to the translation service, saying, "Here, take a look at this! A new text chunk for you to translate."
But we can't just sit back and wait, oh no, that would be way too easy. Instead, we're gonna print out a little message, letting the user know that we're fetching the translation. And then, we're gonna hit the time.sleep() function, giving the translation service some time to achieve its task.
When the time is up, we're gonna use Selenium again to find the output area on the web page, and we're gonna grab the text that's been translated by now. We're talking about pure algorithmic alchemy, folks, turning one language into another with the click of a button. The thing there is just that we’re not behind the wheel anymore.
So once we've put our hands on that translated chunk, we're gonna append it to our translation list. And then, just to be on the safe side, we're gonna clear out the input field, making sure we're ready for the next chunk.
When we've gone through all the chunks, and collected all those translated gems, we're gonna return the whole (translation) list which is the result of all the translations we’ve collected.

VIII. The file is written

The chronicle of the realm is inscribed.

def write_output_file(file_name: str, translation: list) -> None:

    with open(file_name, "w", encoding="utf-8") as f:
        f.writelines(translation)

The last step starts by calling write_output_file(), and let me tell you, it's one of the unsung heroes of this whole operation. Because, let's be honest, what's the point of all this translation process if we can't actually, you know, save the results somewhere?
So, here's how it works. First, we're gonna open up a file, using that with statement. And let me tell you, that with statement, it's like a handshake that tells Python, "Hey, I'm about to do some serious file-handling business, so make sure this file gets closed when I'm done, even if something blows up along the way!" And what are we gonna do with this file, you ask? Well, my friends, we're gonna write some text to it. But not just any text, oh no, we're talking about the fruits of our labor, the translated chunks that we've been slaving over for who knows how long.
Now, I know what you're thinking, "But George, how are we gonna get all those chunks into the file?" Well, that's where the f.writelines() method comes in. It's like a magic wand, taking our list of translated chunks and writing them to the file one after the other. Mind you, it adds nothing in between, no spaces and no line breaks, so the chunks land back to back. When that file finally gets saved, it's gonna be all those chunks of text, neatly packaged up into a tangible reality: a file!
But you know, it's not just about the end result, folks. It's about the journey, the process of getting there. And this write_output_file() function was the final step in that journey. A cherry on top, mic drop moment that says, "We did it, folks, we translated the heck out of that text!"
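
One detail worth checking for yourself is what writelines() puts between the items of the list: nothing at all. Here's a tiny sketch using an in-memory file (io.StringIO) so no real file gets touched. The two sample chunks are invented, and the joined variant is just one possible alternative, not what the original script does.

```python
import io

translation = ["First chunk.", "Second chunk."]

glued = io.StringIO()
glued.writelines(translation)        # items are written back to back
print(repr(glued.getvalue()))
# → 'First chunk.Second chunk.'

spaced = io.StringIO()
spaced.write(" ".join(translation))  # one way to keep the chunks apart
print(repr(spaced.getvalue()))
# → 'First chunk. Second chunk.'
```

Whether this matters for you depends on what the translation service hands back. If its output already ends with a space or a line break, writelines() alone is fine.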
kudos

Conclusion

We’ve been:

Taming that Selenium machine (installation and configuration)
Managing files (loading a source file and writing results into another one).
Witnessing that the true sorcery of this tutorial lies in the automated process of translating a text by splitting it into sentences, stacking them into piles of chunks sent one by one to the online translation service, through Selenium, without ever exceeding a character limit!
And the goodies: raise a glass to the coder's toolkit, by implementing some good practices (using a header, organizing the code using functions, parameters and a main(), using docstrings and type hints).

That's just the beginning, isn't it? This Selenium business, it's got so much more to offer. The possibilities, they're practically endless, my friends. So I want you to take this code, tinker with it, experiment, see what else you can make it do. You’re ready to go.
Try it out on different translation services, see how it handles the variations. Heck, see if you can make it do your taxes while you're at it (okay, maybe not that, but you get the idea). The point is, this is just the tip of the iceberg. So, what are you waiting for? Good luck!

The code is available on GitHub.

(Cover picture: Cat People, 1942).


APA

brk | Sciencx (2025-02-20T15:28:52+00:00) Basic Selenium – The Easy Peasy Introduction, Chapter 3 of 3. Retrieved from https://www.scien.cx/2025/02/20/basic-selenium-the-easy-peasy-introduction-chapter-3-of-3/
