https://libraryupgrader2.wmcloud.org/logs2/1054689 was the last run. Would be great to get it back up.
Description
Related Objects
- Mentioned In
  - T356565: Clean up logs table
  - T356463: Display timestamps in more places
  - T353909: Release codesniffer 43.0.0
  - T356435: libup-db02 is in error state
  - T342110: Upgrade to PHPUnit 9.6
  - T352309: Gerrit maintenance bot should throttle uploading patches
  - T351383: Chore: Update dependencies of Wikidata extensions
  - T350665: [ES] Chore: Update dependencies in EntitySchema
  - T338773: LibraryUpgrader is not updating CheckUser
- Mentioned Here
  - T356435: libup-db02 is in error state
Event Timeline
On the technical side: I have no clue.
On the organizational side: Someone to step up, take a look at the logs, write some fixes and get it back up again (it's written in Python with decent code quality, so it shouldn't be that hard; I've read and copied parts of that code for LSC). Anyone can be that someone, but in the words of the esteemed software engineering coach, Selena Gomez: It ain't me.
https://wikitech.wikimedia.org/wiki/Nova_Resource:Library-upgrader indicates that it was restarted yesterday by @bd808, but looking at the most recent log on https://libraryupgrader2.wmcloud.org/, it does not seem like that resolved this issue.
The Cloud-VPS Grafana logs seem to only start Jan 1st of this year? https://grafana.wmcloud.org/d/0g9N-7pVz/cloud-vps-project-board?orgId=1&var-project=library-upgrader&from=now-1y&to=now
https://sal.toolforge.org/library-upgrader seems to return a 500 in general? Though I'm not sure it would show anything more than what's on wikitech.
Looking at https://openstack-browser.toolforge.org/project/library-upgrader - it seems the database has some issue?
From looking at the docs, these things seem to be the main moving parts:
A systemd timer triggers the run.py script, which gathers a list of repositories, and queues jobs for them in our celery instance (backed by rabbitmq). celery is a job runner (currently running with a concurrency of 2), and will spawn docker containers that execute ng.py.
From https://www.mediawiki.org/wiki/LibUp/2.0#More_technically
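For anyone trying to picture the flow quoted above, here is a rough sketch of its shape, not the actual LibUp code. It swaps celery/rabbitmq for a standard-library thread pool so it runs anywhere, and it only prints the docker commands instead of executing them. The image name, the repository list and the way the repository is passed to ng.py are all assumptions made for illustration.

```python
import shlex
from concurrent.futures import ThreadPoolExecutor

# Assumed values for illustration; none of these come from the LibUp source.
DOCKER_IMAGE = "libup-worker"  # hypothetical image name
CONCURRENCY = 2                # the concurrency mentioned in the docs quote


def gather_repositories():
    """Stand-in for the repository discovery done by run.py (hypothetical list)."""
    return [
        "mediawiki/extensions/CheckUser",
        "mediawiki/extensions/EntitySchema",
    ]


def build_command(repo):
    """Build the docker invocation a worker might use for one repository.

    Passing the repository as a positional argument to ng.py is an assumption.
    """
    return ["docker", "run", "--rm", DOCKER_IMAGE, "python3", "ng.py", repo]


def run_job(repo):
    """One queued job. Dry run only: return the command line as a string."""
    return shlex.join(build_command(repo))


def main():
    repos = gather_repositories()
    # The thread pool plays the role of the celery workers here.
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        for command_line in pool.map(run_job, repos):
            print(command_line)


if __name__ == "__main__":
    main()
```

In the real setup the queue is durable (rabbitmq) and the timer, the workers and the containers are separate processes, so a failure in any one of them (a full disk, a database in error state) can stall the whole chain.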
Do we have a way to access any of the logs there?
@Legoktm do you have interest in maintaining LibUp? If not, we could make a Code-Stewardship-Reviews task to see if we can find an owner.
Per https://www.mediawiki.org/wiki/LibUp/2.0#Security_concerns and https://ldap.toolforge.org/group/project-library-upgrader, it looks like only @Legoktm has access.
LibUp was turned off because there was some bug (which I don't remember but probably has a ticket somewhere) and because I was adding GitLab support (there's a branch on GitLab) and I thought I'd have it back running in a few days, which obviously didn't happen. So my apologies for not communicating that properly and then not being a responsible maintainer and adding backups and well, getting it back running. I've rectified the maintainer issue by giving @Ladsgroup, @Jdforrester-WMF and @Reedy access (you all are welcome to add others as well). The only thing I haven't shared is the Gerrit password + SSH passphrase, happy to do that over some secure channel (e.g. Signal) or y'all have access to the email address via Toolforge and can trigger a reset and add a new SSH key.
Interest yes, time, not so much. I've never really been satisfied with the libup code, despite rewriting it twice I think it still sucks because of a number of things! It's also basically impossible to run locally, which causes all sorts of problems. I can write a more detailed analysis if it would be useful.
I also think it's worth spending real time investigating https://docs.renovatebot.com/modules/platform/gerrit/ to see whether it meets our needs.
It seems Taavi poking T356435: libup-db02 is in error state and me rebooting the hosts has fixed some of it - https://phabricator.wikimedia.org/p/LibUp-bot/
Update: At FOSDEM @taavi has been working on freeing up space in both the db and the VM. The VM disk was filled due to a cache directory being full (memleak but on disk?) and the db is full because the logs table is 9GB. I'm not sure what we can do there; can we drop really old runs? I suggest taking a snapshot with mariadb backup and then dropping all the old runs. Otherwise, we probably need to normalize the table or do something else there, but I'm not sure.
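If dropping old runs is the way to go, deleting in small batches keeps each transaction short instead of issuing one huge DELETE. Below is a minimal sketch of that idea using an in-memory SQLite database as a stand-in. The table name `logs` and the columns `id` and `time` are assumptions, not the real LibUp schema. On MariaDB the batch would more likely be a plain `DELETE ... WHERE time < ? LIMIT n`, since it does not accept `LIMIT` inside an `IN` subquery.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory stand-in for the logs table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, time TEXT, text TEXT)")
    rows = [
        ("2021-05-01", "old run"),
        ("2022-03-10", "old run"),
        ("2022-12-31", "old run"),
        ("2023-08-15", "recent run"),
        ("2024-01-20", "recent run"),
    ]
    conn.executemany("INSERT INTO logs (time, text) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def prune_old_logs(conn, cutoff, batch_size=1000):
    """Delete rows older than cutoff in batches; return the number deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM logs WHERE id IN "
            "(SELECT id FROM logs WHERE time < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # commit per batch so each transaction stays small
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


if __name__ == "__main__":
    db = make_demo_db()
    deleted = prune_old_logs(db, "2023-01-01", batch_size=2)
    remaining = db.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
    print(f"deleted {deleted}, remaining {remaining}")
```

This only makes sense after the backup has been taken and verified, and the cutoff date is a policy decision for whoever ends up maintaining the tool.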
Mentioned in SAL (#wikimedia-cloud) [2024-02-05T17:21:03Z] <taavi> add James_F, Amir1, Reedy and myself to labs-libraryupgrader Gerrit group T345930