When 'bots crawl through my public repo, I get "get branches: exit status 1" errors on the logs. Why? #8001
Unanswered
GwynethLlewelyn asked this question in Q&A
Replies: 0 comments
Ok, so, perhaps I'm just overthinking a stupid issue here, and worried about nothing.
While upgrading from Gogs 0.13.2 to 0.13.3, I was carefully looking at the logs to spot any mistakes I might have made during the (manual) upgrade. And I noticed a strange, recurring error in the logs, which, however, does not happen all the time:
Now, what surprised me at first was that there were so many trace messages for my single-user Gogs installation. Most of it is public, and some of the repos there are either mirrored from GitHub/GitLab or are the origin for GitHub/GitLab to mirror them, so a few automated messages from the mirroring would be expected; otherwise, I would expect next-to-zero traffic. And, in fact, I am not really seeing any 500 errors. So, I thought, what is going on there?
And, most importantly, how can I prevent this error from happening? I first thought it was a template error (I'm using a tweaked version of a 'dark mode' template); but as you can see in the example above, several templates exhibit the error, and this doesn't really make sense.
The relevant bit of code which dumps the error comes from
...and, after unravelling this code, I sort of figured out that ultimately this is a call from the included github.com/gogs/git-module, which, to my surprise, contains wrapper code to fork an external instance of git to perform some of the operations (!). I always thought that everything was handled in native Go!
So, whatever is issuing this error, it comes from an attempt at figuring out which branches are active/available/listed for a specific repository; that operation fails, and an error is propagated until it reaches this level of the code. Unfortunately, the only "error" we get is that the forked process exited abnormally, but the reason for that remains unknown.
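To illustrate why only "exit status 1" surfaces, here is a minimal Go sketch of running an external command and keeping its stderr alongside the exit status. This is not the actual git-module code: runCaptured is a name I made up, and the repository path in main is hypothetical. It only shows the general os/exec pattern, under the assumption that the wrapper discards (or does not log) what git wrote to stderr.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// runCaptured (hypothetical helper) runs an external command. On failure it
// returns an error that includes what the child wrote to stderr, rather than
// only the bare "exit status N" that err.Error() gives by itself.
func runCaptured(name string, args ...string) (string, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("%s %s: %w: %s",
			name, strings.Join(args, " "), err,
			strings.TrimSpace(stderr.String()))
	}
	return stdout.String(), nil
}

func main() {
	// Hypothetical path: ask git for the branches of a repository that is
	// not there, and print the full reason instead of just the exit status.
	_, err := runCaptured("git", "-C", "/nonexistent/repo.git", "branch")
	fmt.Println(err)
}
```

With something like this in the call path, the log line would carry git's own complaint (missing directory, unknown revision, and so on), which is exactly the piece of information the current trace lacks.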
Theoretically, though, this shouldn't be happening "spontaneously", that is, "something" ought to be calling a specific page on Gogs, which (probably) requires a list of all branches, which, in turn, fails for some reason, but this trace doesn't log anything very useful for tracking down the culprit.
However, I had a way to figure it out!
You see, my Gogs installation, while open to the public, has two layers of protection. The first is the locally-running nginx server, which does some clever filtering (to catch the worst of the worst intruders) and a bit of additional caching; and on top of that, I run all my domains (well, except for two) behind the Cloudflare firewalled global caching system, which does much more clever checks than my humble nginx installation does.
Nevertheless, it's nginx that receives each URL and passes it to Gogs (which runs on localhost and is not directly accessible from the 'net), so such URLs get logged by nginx as well. To my utter surprise... there was quite a lot of traffic there! Namely, from legitimate 'bots: the usual suspects (Google, Bing, ahrefs...) but also the brand-new ones which scrape the 'net for content to train their AIs with, such as Meta's. All these are pre-vetted by Cloudflare, of course, and even my nginx configuration, by default, lets them through. I don't really have any reason to block them, unless they're disrespecting the limits set in robots.txt (which the legitimate 'bots rarely do).
I didn't notice any 'suspicious' activity, either, just the expected, regular web-scraping URLs. Nevertheless, it seems clear to me that some of the 'bots must have the wrong URLs (which nginx cannot check for), asking for data in repositories that don't exist any more or have gone through substantial changes, which, in turn, makes Gogs give an error when trying to retrieve a non-existing repository (or a deleted branch from an existing repository). In other words, this is all 'normal and expected behaviour', and I shouldn't worry too much about it... right?
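For anyone wanting to do the same cross-check between the nginx access log and the Gogs error log, here is a rough Go sketch. It assumes the access log uses nginx's default "combined" format; the hit type, parseLine, and the command-line prefix argument are names I invented for the example. It prints timestamp, status, path and user agent for every request under a given path prefix, so the timestamps can be compared by hand with the Gogs trace.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
)

// combinedRe matches nginx's default "combined" access log format:
// addr - user [time] "METHOD path proto" status bytes "referer" "agent"
var combinedRe = regexp.MustCompile(
	`^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"`)

// hit holds the fields of one access log line that matter here.
type hit struct {
	Time, Path, Status, Agent string
}

// parseLine extracts a hit from one log line; ok is false when the line
// does not look like the combined format.
func parseLine(line string) (h hit, ok bool) {
	m := combinedRe.FindStringSubmatch(line)
	if m == nil {
		return hit{}, false
	}
	return hit{Time: m[2], Path: m[4], Status: m[5], Agent: m[6]}, true
}

func main() {
	// Optional first argument: only show requests under this path prefix,
	// e.g. a (hypothetical) "/someuser/somerepo".
	prefix := "/"
	if len(os.Args) > 1 {
		prefix = os.Args[1]
	}
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		if h, ok := parseLine(sc.Text()); ok && strings.HasPrefix(h.Path, prefix) {
			fmt.Printf("%s\t%s\t%s\t%s\n", h.Time, h.Status, h.Path, h.Agent)
		}
	}
}
```

Feeding the access log to it on stdin and passing a repository path as the prefix should be enough to see which crawler asked for which URL around the time each "get branches" error was logged.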
On the other hand, if a substantial number of invalid pages are due to failed forked processes, these do consume resources shared with other, more important processes running on the same machine; and, naturally enough, I'd love to avoid those that are simply being thrown away because they're broken/non-existing, and to do that before the (useless) forking happens.
Is whatever I'm asking for feasible to do under the current version of Gogs?
If not... how could this somehow be circumvented?
That said, I thank you all in advance for any insights you may wish to share!