ci/github-script/bot: improve parallelism

We used to employ the worst possible strategy for parallelism: The rate
limiter capped us at one concurrent request per second, while 100+ items
were handled in parallel. This led to every item taking the full
duration of the job to complete, making the data fetched at the
beginning of the job stale by the end. This caused minor hiccups when
labeling, or the merge-bot posting comments after the PR had already
been closed.

GitHub allows 100 concurrent requests, but considers it a best practice
to serialize them. Since serializing all of them causes problems for us,
we should try to go higher.

Since other jobs are running in parallel, we use a conservative value of
20 concurrent requests here. We also introduce the same number of
workers going through the list of items, to make sure that each item is
handled in the shortest possible time from start to finish before
proceeding to the next. This gives us roughly 2.5 seconds per individual
item, but speeds up the overall execution of the scheduled job from 3-4
minutes to 20-30 seconds.
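
In the abstract, the pattern described above looks like the following
sketch (illustrative only, with a stand-in `handle` function rather than
the bot's actual helpers): a fixed pool of workers drains one shared
queue, so each item runs start-to-finish as soon as a worker picks it up.

// Minimal worker-pool sketch: `maxConcurrent` workers share one queue.
// Popping from the array is race-free because workers only interleave
// at `await` points; JavaScript runs them on a single thread.
async function drainQueue(items, handle, maxConcurrent = 20) {
  const queue = [...items]
  await Promise.all(
    Array.from({ length: maxConcurrent }, async () => {
      for (let item = queue.pop(); item !== undefined; item = queue.pop()) {
        await handle(item)
      }
    }),
  )
}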
commit 810b9ba51d
parent 6a54ddb71f
Author: Wolfgang Walther
Date:   2025-11-04 10:59:03 +01:00

2 changed files with 24 additions and 8 deletions


@@ -496,7 +496,11 @@ module.exports = async ({ github, context, core, dry }) => {
     }
   }
 
-  await withRateLimit({ github, core }, async (stats) => {
+  // Controls level of parallelism. Applies to both the number of concurrent requests
+  // as well as the number of concurrent workers going through the list of PRs.
+  const maxConcurrent = 20
+
+  await withRateLimit({ github, core, maxConcurrent }, async (stats) => {
     if (context.payload.pull_request) {
       await handle({ item: context.payload.pull_request, stats })
     } else {
@@ -625,11 +629,23 @@ module.exports = async ({ github, context, core, dry }) => {
           arr.findIndex((firstItem) => firstItem.number === thisItem.number),
       )
 
-      ;(await Promise.allSettled(items.map((item) => handle({ item, stats }))))
-        .filter(({ status }) => status === 'rejected')
-        .map(({ reason }) =>
-          core.setFailed(`${reason.message}\n${reason.cause.stack}`),
-        )
+      // Instead of handling all items in parallel we set up some workers to handle the queue
+      // with more controlled parallelism. This avoids problems with `pull_request` fetched at
+      // the beginning getting out of date towards the end, because it took the whole job 20
+      // minutes or more to go through 100's of PRs.
+      await Promise.all(
+        Array.from({ length: maxConcurrent }, async () => {
+          while (true) {
+            const item = items.pop()
+            if (!item) break
+            try {
+              await handle({ item, stats })
+            } catch (e) {
+              core.setFailed(`${e.message}\n${e.cause.stack}`)
+            }
+          }
+        }),
+      )
     }
   })
 }
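
One behavioral detail in the new worker loop above: `Promise.all`
rejects as soon as any of its promises rejects, so errors must be caught
inside each worker rather than collected afterwards as
`Promise.allSettled` did. A stripped-down illustration of the two shapes
(hypothetical `tasks` and `report` names, not the bot's code):

// Old shape: run everything at once, then inspect the settled results.
const results = await Promise.allSettled(tasks.map((t) => t()))
results
  .filter(({ status }) => status === 'rejected')
  .forEach(({ reason }) => report(reason))

// New shape: catch per item inside each worker, so the surrounding
// Promise.all resolves even when individual items fail.
const queue = [...tasks]
await Promise.all(
  Array.from({ length: 4 }, async () => {
    let task
    while ((task = queue.pop()) !== undefined) {
      try {
        await task()
      } catch (e) {
        report(e)
      }
    }
  }),
)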


@@ -1,4 +1,4 @@
-module.exports = async ({ github, core }, callback) => {
+module.exports = async ({ github, core, maxConcurrent = 1 }, callback) => {
   const Bottleneck = require('bottleneck')
 
   const stats = {
@@ -13,7 +13,7 @@ module.exports = async ({ github, core }, callback) => {
   // https://docs.github.com/en/rest/using-the-rest-api/best-practices-for-using-the-rest-api
   const allLimits = new Bottleneck({
     // Avoid concurrent requests
-    maxConcurrent: 1,
+    maxConcurrent,
     // Will be updated with first `updateReservoir()` call below.
     reservoir: 0,
   })
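
For context on the helper being extended here: the `reservoir` starts at
0 and is filled in by `updateReservoir()`. As a rough, hypothetical
sketch of what such a refresh can look like (the real module's logic may
well differ), GitHub's remaining quota is available via the standard
Octokit endpoint `github.rest.rateLimit.get()`, and Bottleneck exposes
`updateSettings()` to adjust the reservoir:

// Hypothetical sketch: refill Bottleneck's reservoir from the remaining
// primary rate limit quota reported by GitHub.
async function updateReservoir() {
  const { data } = await github.rest.rateLimit.get()
  allLimits.updateSettings({ reservoir: data.resources.core.remaining })
}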