Introducing gs-fastcopy

This content originally appeared on DEV Community and was authored by David Haley

These days, a single laptop can chomp through gigabytes of data in seconds. So why was it taking ~1.5 min to compress & upload 2 GB? Why was it taking ~10 s to download just 100 MB?

I get bothered by code that "should" be fast but isn't, especially when I have to wait around for it. Maybe it's 30+ years of experience with software and 25+ years in web dev: I have a pretty good sense of when something is slower than it "should" be.

And oh, I am never satisfied with needlessly slow code.

Time is both time and money. The faster cancer researchers can process data, the sooner we get to innovative treatments and save lives. And going 2x as fast on the same hardware typically means spending 1/2 as much. In an eventual clinical setting, every cent matters when it comes to tests being given freely… which can mean life or death.

I checked with my co-conspirator Lynn Langit: "these speeds, but really though?" She pointed me at the gcloud CLI tool's far superior file transfer performance.

That began an investigation into optimizing transfer: basically, the standard Python (& other) Blob implementation is single-threaded. So much computing power just … sitting there sad & idle.
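To make the single-threaded point concrete, here is a minimal sketch of the general idea behind parallel transfer: split the object into byte ranges, fetch the ranges concurrently, and reassemble them in order. This is not gs-fastcopy's actual implementation. The names `chunk_ranges`, `parallel_fetch`, and `fake_fetch` are hypothetical, and an in-memory byte string stands in for the remote object so the example runs without any cloud access.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(total_size, chunk_size):
    """Split [0, total_size) into (start, end) byte ranges, end exclusive."""
    return [
        (start, min(start + chunk_size, total_size))
        for start in range(0, total_size, chunk_size)
    ]


def parallel_fetch(fetch_range, total_size, chunk_size, max_workers=8):
    """Fetch every chunk concurrently and reassemble the chunks in order.

    fetch_range(start, end) is a hypothetical callable returning the bytes
    in [start, end). In a real setting it would issue a ranged read against
    the storage service.
    """
    ranges = chunk_ranges(total_size, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the chunks
        # come back in sequence even if they finish out of order.
        parts = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(parts)


# Stand-in for a remote object (assumption: purely illustrative data).
remote = bytes(range(256)) * 1000


def fake_fetch(start, end):
    """Pretend ranged read: slice the in-memory stand-in."""
    return remote[start:end]


data = parallel_fetch(fake_fetch, len(remote), chunk_size=64 * 1024)
```

Threads are a reasonable fit for this sketch because each ranged read mostly waits on the network, so the other workers can make progress in the meantime.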

It's nice when default settings "just work" – correctly, but also fast. The numpy library is absolutely brilliant because it brings all kinds of low-level hardware optimization into Python, so you don't have to think about it.

In that spirit, I hope to make cloud storage file transfer just that much easier, so that you don't have to think about it to get fast performance.

Without further ado: introducing gs-fastcopy:
https://medium.com/@dchaley/introducing-gs-fastcopy-36bb3bb71818

It's my first open-source public Python package 🐍 📦 🎉

Package: https://pypi.org/project/gs-fastcopy/
Source code: https://github.com/redwoodconsulting-io/gs-fastcopy-python

Now I download & uncompress those 100 MB in just a couple of seconds, not 10. I'll take a 5x speedup. And the impact only gets bigger as the files get larger.

Figure: bar chart of benchmark results for local & cloud environments, comparing Blob-based smart_open with gs-fastcopy.


David Haley | Sciencx (2024-07-21T23:23:35+00:00) Introducing gs-fastcopy. Retrieved from https://www.scien.cx/2024/07/21/introducing-gs-fastcopy/
