Which GitHub repo is going to have the most commits?


This content originally appeared on DEV Community and was authored by [Cursors]

So I saw this post by Virej Dasani, "Which GitHub repo has the most commits?", and I wondered, "Is there a faster, more efficient way of reaching 3,000,000 commits?"

Well, of course the answer is yes, and it involves running several processes in parallel (loosely, multi-threading) plus some hacky Git tricks.

First off, I use the Master/slave design pattern:

if (__filename.split("/").reverse()[1] === "master") {
    // process is master (folder names determine status)

    console.log(`Spawning ${threads} slave${threads !== 1 ? "s" : ""}...`);

    for (let i = 0; i < threads; i++) {
        // spawn slaves
    }
} else {
    // process is slave
}
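The folder check above splits on "/", so it only works with POSIX-style paths. As a minimal sketch, the same idea can be written as a small function that accepts either separator. The names `Role` and `roleFromPath` are made up for this example and are not part of the original project.

```typescript
// Hypothetical helper: decide the role from the name of the folder that
// contains the script, accepting both "/" and "\" as separators.
type Role = "master" | "slave";

function roleFromPath(filename: string): Role {
    const parts = filename.split(/[\\/]/);
    // The last entry is the file name; the entry before it is the folder.
    const folder = parts.length >= 2 ? parts[parts.length - 2] : "";
    return folder === "master" ? "master" : "slave";
}

console.log(roleFromPath("/home/me/master/index.js"));
console.log(roleFromPath("C:\\work\\slave-3\\index.js"));
```

In the real script the argument would be `__filename`, as in the snippet above.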

You might think that you'd just spawn multiple processes that spam empty commits, just like how Virej used a Python loop, but no, that won't work.

It is almost certain that two processes will try to commit at the same time. Git guards its updates with lock files (such as HEAD.lock in the .git/ directory), so one of the two commits fails, either because the lock is already held or because HEAD no longer matches what that process expected.
Git treats this as a fatal error, and catching and ignoring it with a traditional try-catch is slower.
Plus, if we have many processes (10+), this will happen on almost every commit, hindering progress.
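For comparison, here is a rough sketch of what the rejected try-catch approach might look like. `commitWithRetry` is a hypothetical name, and the commit itself is passed in as a function so the example runs without Git; in practice it would wrap something like the `execSync` commit call. Every failed attempt is wasted work, which is why this gets worse as more processes are added.

```typescript
// Hypothetical sketch of the rejected approach: retry a commit until it succeeds.
// `commit` is injected so the example runs without a Git repository.
function commitWithRetry(commit: () => void, maxAttempts: number): number {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            commit();
            return attempt; // how many attempts it took
        } catch {
            // Another process got there first; loop and try again.
        }
    }
    throw new Error(`gave up after ${maxAttempts} attempts`);
}

// Simulated contention: the first two attempts fail, the third succeeds.
let calls = 0;
const flakyCommit = (): void => {
    calls++;
    if (calls < 3) throw new Error("simulated lock failure");
};

console.log(commitWithRetry(flakyCommit, 5));
```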

Instead, we will give each slave its own Git repository, where it can commit happily as it chooses, separate from the other slaves.

for (let i = 0; i < threads; i++) {
    execSync(`git clone ../master ../slave-${i}`);

    execSync(`tsc ../slave-${i}/index.ts`);

    const slave = fork(`../slave-${i}/index.js`, [threads, commits, i].map(String), { cwd: join(process.cwd(), "..", `slave-${i}`) });

    // ...
}

We also pass the identifier to the slave through fork's args parameter (it shows up in the slave's process.argv), and we set the cwd of the slave to its repository's root.

Each slave will make its own branch to commit on, and when it's done, it will notify the master process.
The master will then merge the slave's commits onto the master branch, delete the slave's Git repository, and kill its process.

const id = process.argv[4];

execSync(`git checkout -b slave-${id}`);

// `<=` so that exactly commits * 1000 commits are made (i starts at 1)
for (let i = 1; i <= commits * 1000; i++) {
    execSync(`git commit --allow-empty -m "[slave-${id}]: ${i}"`);

    process.send!(`commit ${i}`);
}

process.send!("EXIT");
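The slave snippet above only shows the id being read from process.argv; the thread count and commit count would arrive the same way, in the order they were passed to fork. A minimal sketch of parsing all three, assuming that order (threads, commits, id). `SlaveArgs` and `parseSlaveArgs` are hypothetical names for illustration.

```typescript
// Hypothetical shape of the values the master passes to each slave.
interface SlaveArgs {
    threads: number;
    commits: number; // in thousands, matching the loop bound `commits * 1000`
    id: string;
}

function parseSlaveArgs(argv: string[]): SlaveArgs {
    // argv[0] is the node binary and argv[1] the script path, so the values
    // passed to fork() start at index 2.
    const [threads, commits, id] = argv.slice(2);
    if (threads === undefined || commits === undefined || id === undefined) {
        throw new Error("expected <threads> <commits> <id>");
    }
    const parsed = { threads: Number(threads), commits: Number(commits), id };
    if (!Number.isInteger(parsed.threads) || !Number.isInteger(parsed.commits)) {
        throw new Error("threads and commits must be integers");
    }
    return parsed;
}

console.log(parseSlaveArgs(["node", "index.js", "8", "100", "3"]));
```

In the slave this would be called as `parseSlaveArgs(process.argv)`, and `id` corresponds to `process.argv[4]` in the snippet above.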

The slave reports back with process.send, and the master listens for those messages:

slave.on("message", (msg) => {
    if (msg === "EXIT") {
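        // Note: `git remote remove` fails if no remote named "local" exists,
        // so this assumes the master repository already has one.
        execSync(`git remote remove local`);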
        execSync(`git remote add local ../slave-${i}`);
        execSync(`git fetch local`);
        execSync(`git merge local/slave-${i}`);

        rmSync(`../slave-${i}`, { recursive: true, force: true });

        return slave.kill();
    }

    return console.log(`[slave-${i}]: ${msg.toString()}`);
});
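As an aside, the merge step does not strictly need a named remote. This is a different technique from the one used above, shown only as a sketch: Git can fetch directly from a path and records the fetched tip as FETCH_HEAD, which can then be merged. `mergeCommands` is a hypothetical helper that just builds the command strings.

```typescript
// Hypothetical variant of the merge step: fetch straight from the slave's
// path instead of removing and re-adding a remote named "local".
function mergeCommands(id: number): string[] {
    const repo = `../slave-${id}`;
    const branch = `slave-${id}`;
    return [
        // Fetch the slave's branch; Git records the fetched tip as FETCH_HEAD.
        `git fetch ${repo} ${branch}`,
        // Merge what was just fetched.
        `git merge FETCH_HEAD`,
    ];
}

console.log(mergeCommands(2).join("\n"));
```

Each string could be passed to `execSync` inside the "EXIT" branch of the handler, in place of the four remote-based commands.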

Finally, we can add a little more flair if we wish:

// A check so that we are in the correct directory...

if (join(process.cwd(), "index.ts") !== __filename) {
    console.log(`You must be inside the master repository.`);

    // exit with a non-zero code, since this is an error
    process.exit(1);
}

// When the master is interrupted it will clean up its mess...

process.on("SIGINT", () => {
    console.log(`Cleaning up... please wait.`);

    for (let i = 0; i < created; i++) {
        try {
            execSync(`git remote remove local`);
            execSync(`git remote add local ../slave-${i}`);
            execSync(`git fetch local`);
            execSync(`git merge local/slave-${i}`);
        } catch {
            console.log(`Unable to merge 'slave-${i}'`);
        }
    }

    execSync(`rm -rf ../slave-*`);

    process.exit();
});
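The `rm -rf ../slave-*` call relies on a Unix shell. A portable alternative, sketched here under the assumption that slave folders are always named `slave-<number>`, is to filter a directory listing and delete each match with `rmSync`, as the "EXIT" handler already does. `slaveDirs` is a hypothetical helper, and its pattern is stricter than the shell glob.

```typescript
// Hypothetical helper: pick out slave repositories from a directory listing.
// Only names of the form slave-<digits> match, unlike the broader slave-* glob.
function slaveDirs(entries: string[]): string[] {
    return entries.filter((name) => /^slave-\d+$/.test(name));
}

console.log(slaveDirs(["master", "slave-0", "slave-12", "slave-notes.txt"]));
// In the SIGINT handler the listing could come from readdirSync(".."), and each
// match could be removed with rmSync(join("..", name), { recursive: true, force: true }).
```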

Final notes:

  • GitHub repository
  • Leave a star if you liked it, or open an issue if you didn't.
  • I make all the commits locally and only then push them, so I'm not spamming GitHub's servers.
  • The .git directory will become very large, but nowhere close to GitHub's previously known limit of 100GB.
  • Write an implementation in other languages and make sure to share it with me!


[Cursors] | Sciencx (2021-09-20T22:40:46+00:00) Which GitHub repo is going to have the most commits?. Retrieved from https://www.scien.cx/2021/09/20/which-github-repo-is-going-to-have-the-most-commits/
