This content originally appeared on DEV Community and was authored by Wesley Miranda
NodeJS is well known for being single-threaded, but that is not entirely true: only the event loop runs on a single thread. NodeJS gives us two modules for running work in parallel, worker_threads and child_process.
- worker_threads are threads that run inside the same process and can share memory, which makes communication between them easier.
- child_process creates separate processes spawned from the main process. It is useful when we need direct communication with the operating system, but each process needs more memory to create.
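To make the shared-memory point concrete, here is a minimal sketch, not part of the original application. It assumes a hypothetical helper named countWithWorkers and passes the worker body as a string (eval: true) only to keep everything in one file. Each worker increments the same counter, which lives in a SharedArrayBuffer.

```javascript
const { Worker } = require('node:worker_threads')

// Hypothetical helper: spawns `workerCount` workers that each increment
// the same shared counter once, then resolves with the final value.
function countWithWorkers(workerCount) {
  // Memory backed by a SharedArrayBuffer is visible to every thread that receives it.
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT)
  const counter = new Int32Array(shared)

  // Worker body passed as a string (eval: true) to keep the sketch in one file.
  const workerSource = `
    const { workerData } = require('node:worker_threads')
    const counter = new Int32Array(workerData.shared)
    Atomics.add(counter, 0, 1)
  `

  const runs = Array.from({ length: workerCount }, () =>
    new Promise((resolve, reject) => {
      const worker = new Worker(workerSource, { eval: true, workerData: { shared } })
      worker.once('error', reject)
      worker.once('exit', resolve)
    })
  )

  // Read the counter only after every worker has exited.
  return Promise.all(runs).then(() => Atomics.load(counter, 0))
}

countWithWorkers(4).then((total) => console.log('shared counter:', total))
```

A child process could not do this directly, because it has its own memory space and would have to send the value back as a message.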
Application: The application we are going to create reads a folder's contents and uploads every file to Google Cloud Storage. The most interesting part is that we decide how many threads perform the uploads, which speeds up the upload process.
Main Principles:
- Worker threads: NodeJS module to create threads.
- Streams: if you don't know NodeJS streams I suggest you take a look at the tutorial that I've created here.
- File System: NodeJS provides us with a simple way to access the OS and manipulate files and folders.
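As a small illustration of the File System principle, the sketch below lists only the regular files in a folder. The helper name listFiles and the throwaway temp folder are assumptions made for the demo. The application itself calls readdir without options, which would also return sub-folder names.

```javascript
const { mkdtemp, mkdir, readdir, writeFile } = require('node:fs/promises')
const os = require('node:os')
const path = require('node:path')

// Hypothetical helper: returns only regular files, skipping sub-folders.
async function listFiles(folder) {
  const entries = await readdir(folder, { withFileTypes: true })
  return entries
    .filter((entry) => entry.isFile())
    .map((entry) => entry.name)
    .sort()
}

// Demo on a throwaway folder so the sketch runs anywhere.
async function demo() {
  const folder = await mkdtemp(path.join(os.tmpdir(), 'content-'))
  await writeFile(path.join(folder, 'a.txt'), 'a')
  await writeFile(path.join(folder, 'b.txt'), 'b')
  await mkdir(path.join(folder, 'nested'))
  return listFiles(folder)
}

demo().then((files) => console.log(files)) // [ 'a.txt', 'b.txt' ]
```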
Steps to reproduce:
- Load the folder's content
- Create the threads
- Create the upload worker
- Assign the threads to upload worker
Requirements:
- We are going to use NodeJS version 16.16.
- You need to have a Google account to access Google Cloud Services.
Cloud Storage Service
We need to install the Cloud Storage library to work with the Google Cloud service.
npm install @google-cloud/storage
If you need help configuring the Cloud Storage service on your Google account, many good tutorials cover it. That setup is outside the scope of this tutorial.
Let's create our first file, cloudStorageFileService.js, to work with our storage.
cloudStorageFileService.js
const { Storage } = require('@google-cloud/storage')
const path = require('path')

const serviceKey = path.join(__dirname, '../gkeys.json')

class CloudStorageFileService {
  // (1)
  constructor() {
    this.storage = new Storage({
      projectId: 'my-project-id',
      keyFilename: serviceKey
    })
  }

  // (2)
  async uploadFile(bucketName, destFileName) {
    return await this.storage
      .bucket(bucketName)
      .file(destFileName)
      .createWriteStream()
  }
}

module.exports = CloudStorageFileService
From the code sections above:
1. Basic configuration for the Cloud Storage service: the project ID and the path to your Google Cloud credentials.
2. Google Cloud Storage provides us with a Writable Stream for uploading files.
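If Writable Streams are new to you, the following sketch shows how data is piped into one. It does not talk to Google Cloud: createMemoryWritable is a hypothetical stand-in for the stream returned by createWriteStream(), and it collects the chunks in memory so the example runs without credentials.

```javascript
const { Readable, Writable } = require('node:stream')
const { pipeline } = require('node:stream/promises')

// Hypothetical stand-in for the Writable returned by createWriteStream():
// it collects chunks in memory instead of sending them to a bucket.
function createMemoryWritable(chunks) {
  return new Writable({
    write(chunk, encoding, callback) {
      chunks.push(chunk)
      callback()
    }
  })
}

// Pipes a small Readable into the stand-in Writable and returns what was stored.
async function copyToWritable(text) {
  const chunks = []
  await pipeline(Readable.from([Buffer.from(text)]), createMemoryWritable(chunks))
  return Buffer.concat(chunks).toString()
}

copyToWritable('hello storage').then((stored) => console.log(stored))
```

The promise returned by pipeline settles once the Writable has finished or failed, which is how the caller knows when the data has been fully written.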
Thread Controller
The thread controller handles the thread distribution. We want to give each file its own thread and upload the files separately.
threadController.js
const { Worker } = require('node:worker_threads')
const { readdir } = require('fs/promises')
const path = require('path')

class ThreadController {
  // (1)
  constructor(threadsNumber) {
    this.files = []
    this.threadsNumber = threadsNumber
    this.count = 0
  }

  // (2)
  async loadFiles() {
    this.files = await readdir(path.join(__dirname, '/content'))
  }

  // (3)
  async uploadThread(filePath) {
    return new Promise((resolve, reject) => {
      const worker = new Worker('./fileUploadWorker.js', {
        workerData: {
          file: filePath
        }
      })
      worker.once('error', reject)
      worker.once('exit', (code) => {
        // A non-zero exit code means the worker did not finish cleanly.
        if (code !== 0) {
          reject(new Error(`Worker for ${filePath} exited with code ${code}`))
        } else {
          resolve(filePath)
        }
      })
    })
  }

  // (4)
  async execute() {
    const init = performance.now()
    await this.loadFiles()
    let promises = []
    while (this.count < this.files.length) {
      // Start one worker per file, up to threadsNumber files per batch.
      for (let i = this.count; i < this.count + this.threadsNumber; i++) {
        if (this.files[i]) {
          promises.push(this.uploadThread(this.files[i]))
        }
      }
      // Wait for the whole batch before starting the next one.
      const result = await Promise.all(promises)
      promises = []
      this.count += this.threadsNumber
      console.log(result)
    }
    const end = performance.now()
    console.log(end - init)
  }
}

module.exports = ThreadController
From the code sections above:
1. Initialize our three main parameters: the number of threads we want, the files we want to upload, and a counter for the files already handed to threads.
2. Load all the files contained in the folder we want to process.
3. Create a Worker object, pass it the file name through workerData, and wait until the thread finishes its work.
4. Run everything together. We give one thread to each file, in batches, until there are no files left to process. For example, with 5 files and 3 threads, the first pass processes the first 3 files and the second pass processes the remaining 2. I also added a performance timer to compare the behavior with different numbers of threads.
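The batching idea from the last step can be tried in isolation. This is a minimal sketch under a few assumptions: runInBatches is a hypothetical helper, and fakeUpload stands in for uploadThread so that no workers or credentials are needed.

```javascript
// Hypothetical helper mirroring execute(): run `task` on at most
// `batchSize` items at a time and wait for each batch to finish.
async function runInBatches(items, batchSize, task) {
  const batches = []
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize)
    batches.push(await Promise.all(batch.map((item) => task(item))))
  }
  return batches
}

// Stand-in task: the real one would be uploadThread(file).
const fakeUpload = async (file) => `${file} uploaded`

runInBatches(['a.txt', 'b.txt', 'c.txt', 'd.txt', 'e.txt'], 3, fakeUpload)
  .then((batches) => console.log(batches.map((batch) => batch.length))) // [ 3, 2 ]
```

Note that each batch waits for its slowest item, so one large file can leave the other threads idle until the batch completes.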
File Upload Worker
The upload worker is the code that each thread runs. Here we put everything we want the thread to do.
fileUploadWorker.js
const {
  isMainThread, parentPort, workerData
} = require('node:worker_threads')
const path = require('path')
const { pipeline } = require('stream/promises')
const { createReadStream } = require('fs')
const CloudStorageFileService = require('./cloudStorageFileService')

class FileUploadWorker {
  // (1)
  constructor() {
    this.storage = new CloudStorageFileService()
    this.filePath = path.join(__dirname, '/content/', workerData.file)
    this.fileName = workerData.file
  }

  // (2)
  async upload() {
    if (!isMainThread) {
      await pipeline(
        createReadStream(this.filePath),
        await this.storage.uploadFile('myfileuploads', this.fileName)
      )
    }
  }
}

// (3)
;(async () => {
  const fileUploader = new FileUploadWorker()
  await fileUploader.upload()
})()
From the code sections above:
1. In the constructor we initialize the Storage service (we could also receive it as a parameter). We also get the file name from the parent thread through workerData.
2. Here we check whether we are in a thread we created dynamically or in the NodeJS main thread. If we are not in the main thread, we create a Readable Stream from the file and upload it.
3. This anonymous function runs the upload as soon as the thread starts.
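The worker above imports parentPort but does not use it. As a sketch of one way a worker could report back to its parent, the example below echoes the file name it received in workerData. The helper runWorker, the inline worker source (eval: true), and the shape of the message are assumptions for illustration, not part of the original code.

```javascript
const { Worker } = require('node:worker_threads')

// Worker body as a string (eval: true) so the sketch fits in one file.
const workerSource = `
  const { isMainThread, parentPort, workerData } = require('node:worker_threads')
  if (!isMainThread) {
    // The real worker would upload here; this one only reports back.
    parentPort.postMessage({ file: workerData.file, status: 'done' })
  }
`

// Hypothetical helper: resolves with the first message the worker sends.
function runWorker(file) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { file } })
    worker.once('message', resolve)
    worker.once('error', reject)
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`))
    })
  })
}

runWorker('photo.png').then((report) => console.log(report))
```

With a message like this, the controller could log per-file results instead of relying only on the exit event.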
Executing Everything
To test our application, I will use 9 threads, one for each file in my folder. You can experiment with other values to measure the performance.
fileUploadWorker.js
const ThreadController = require('./threadController')

const controller = new ThreadController(9)

;(async () => {
  await controller.execute()
})()
Takeaways
- NodeJS is not strictly single-threaded.
- Threads are handy when you need to process a heavy job and don't want to block the NodeJS main thread.
- We can also use multithreading for batch jobs.
You can take a look at the entire code here
Wesley Miranda | Sciencx (2023-03-19T17:44:11+00:00) Uploading multiple files at the same time using multithreading in NodeJS. Retrieved from https://www.scien.cx/2023/03/19/uploading-multiple-files-at-the-same-time-using-multithreading-in-nodejs/