We used to employ the worst strategy for parallelism possible: the rate limiter capped us at one concurrent request per second, while 100+ items were being handled in parallel. This led to every item taking the full duration of the job to complete, so data fetched at the beginning of the job was stale by the end. That in turn caused smaller hiccups when labeling, or the merge-bot posting comments after the PR had already been closed.

GitHub allows 100 concurrent requests, but considers it a best practice to serialize them. Since serializing all of them causes problems for us, we should try to go higher. Because other jobs run in parallel, we use a conservative value of 20 concurrent requests here. We also introduce the same number of workers going through the list of items, so that each item is handled in the shortest possible time from start to finish before the worker moves on to the next. This gives us roughly 2.5 seconds per individual item, but speeds up the overall execution of the scheduled job to 20-30 seconds, down from 3-4 minutes before.
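A minimal sketch of the intended pattern, assuming an asyncio-style implementation (the semaphore bound, worker count, and `github_request` helper are illustrative, not the actual code):

```python
import asyncio

MAX_CONCURRENT_REQUESTS = 20  # conservative bound; GitHub permits up to 100
WORKER_COUNT = 20             # same number of workers as concurrent requests

async def github_request(item: str) -> None:
    # Placeholder for a real GitHub API call.
    await asyncio.sleep(0.5)

async def handle_item(item: str, semaphore: asyncio.Semaphore) -> None:
    # Every request an item needs passes through the shared semaphore,
    # so at most MAX_CONCURRENT_REQUESTS are in flight at any time.
    async with semaphore:
        await github_request(item)   # e.g. fetch current PR state
    async with semaphore:
        await github_request(item)   # e.g. apply labels / post a comment

async def worker(queue: asyncio.Queue, semaphore: asyncio.Semaphore) -> None:
    # A worker carries one item from start to finish before taking the
    # next, so the data it fetched is still fresh when it acts on it.
    while True:
        item = await queue.get()
        try:
            await handle_item(item, semaphore)
        finally:
            queue.task_done()

async def main(items: list[str]) -> None:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
    queue: asyncio.Queue[str] = asyncio.Queue()
    for item in items:
        queue.put_nowait(item)

    workers = [asyncio.create_task(worker(queue, semaphore))
               for _ in range(WORKER_COUNT)]
    await queue.join()                # wait until every item is handled
    for task in workers:
        task.cancel()

if __name__ == "__main__":
    asyncio.run(main([f"item-{i}" for i in range(100)]))
```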