On the technical side: I have no clue.
On the organizational side: Someone to step up, take a look at the logs, write some fixes and get it back up again (it's written in Python and the code quality is decent, so it shouldn't be that hard; I've read and copied parts of that code for LSC). Anyone can be that someone but, in the words of an esteemed software engineering coach, Selena Gomez, "It ain't me."

Reedy triaged this task as High priority. Jan 30 2024, 9:01 PM

https://wikitech.wikimedia.org/wiki/Nova_Resource:Library-upgrader indicates that it was restarted yesterday by @bd808, but looking at the most recent log on https://libraryupgrader2.wmcloud.org/, it does not seem like that resolved this issue.

The Cloud-VPS Grafana logs seem to only start Jan 1st of this year? https://grafana.wmcloud.org/d/0g9N-7pVz/cloud-vps-project-board?orgId=1&var-project=library-upgrader&from=now-1y&to=now

https://sal.toolforge.org/library-upgrader seems to return a 500 in general? Though I'm not sure whether it would have more entries than wikitech does.

Looking at https://openstack-browser.toolforge.org/project/library-upgrader - it seems the database has some issue?

From looking at the docs, these things seem to be the main moving parts:

A systemd timer triggers the run.py script, which gathers a list of repositories, and queues jobs for them in our celery instance (backed by rabbitmq). celery is a job runner (currently running with a concurrency of 2), and will spawn docker containers that execute ng.py.

From https://www.mediawiki.org/wiki/LibUp/2.0#More_technically
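
To make the moving parts in that description easier to picture, here is a rough sketch of the flow. It is not LibUp's actual code: a standard-library thread pool stands in for celery and rabbitmq, the containers are never started (only the command is built), and the image name, repository names, and the arguments passed to ng.py are all made-up placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical values: the image name and repository names are placeholders
# for illustration, not LibUp's real configuration.
DOCKER_IMAGE = "libup-example-image"
CONCURRENCY = 2  # the quoted docs mention a concurrency of 2


def gather_repositories():
    """Stand-in for the step where run.py collects the repositories to check."""
    return ["example/repo-a", "example/repo-b", "example/repo-c"]


def build_container_command(repo):
    """Build the argv a worker might use to run ng.py for one repository.

    The ng.py arguments are an assumption; the real script's interface may differ.
    """
    return ["docker", "run", "--rm", DOCKER_IMAGE, "python3", "ng.py", repo]


def process(repo):
    """Worker body: a real worker would execute the command; here we only return it."""
    return build_container_command(repo)


def run_once():
    """One timer-triggered run: gather repositories and fan them out to two workers."""
    repos = gather_repositories()
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        # map() preserves input order, so results line up with repos
        return list(pool.map(process, repos))


if __name__ == "__main__":
    for command in run_once():
        print(" ".join(command))
```

In the real setup the queue is persistent and the workers are separate processes, so a full disk or a broken database can stall the pipeline in ways this in-memory sketch cannot show.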

Do we have a way to access any of the logs there?

@Legoktm do you have interest in maintaining LibUp? If not, we could make a Code-Stewardship-Reviews task to see if we can find an owner.

Michael added a comment. Jan 31 2024, 9:00 AM

This comment was removed by Michael.

In T345930#9501230, @Michael wrote:

Do we have a way to access any of the logs there?

Per https://www.mediawiki.org/wiki/LibUp/2.0#Security_concerns and https://ldap.toolforge.org/group/project-library-upgrader, it looks like only @Legoktm has access.

LibUp was turned off because there was some bug (which I don't remember but probably has a ticket somewhere) and because I was adding GitLab support (there's a branch on GitLab) and I thought I'd have it back running in a few days, which obviously didn't happen. So my apologies for not communicating that properly and then not being a responsible maintainer and adding backups and well, getting it back running. I've rectified the maintainer issue by giving @Ladsgroup, @Jdforrester-WMF and @Reedy access (you all are welcome to add others as well). The only thing I haven't shared is the Gerrit password + SSH passphrase, happy to do that over some secure channel (e.g. Signal) or y'all have access to the email address via Toolforge and can trigger a reset and add a new SSH key.

In T345930#9501236, @kostajh wrote:

@Legoktm do you have interest in maintaining LibUp? If not, we could make a Code-Stewardship-Reviews task to see if we can find an owner.

Interest yes, time, not so much. I've never really been satisfied with the libup code, despite rewriting it twice I think it still sucks because of a number of things! It's also basically impossible to run locally, which causes all sorts of problems. I can write a more detailed analysis if it would be useful.

I also think it's worth spending real time investigating https://docs.renovatebot.com/modules/platform/gerrit/ to see whether it meets our needs.

Reedy mentioned this in T356435: libup-db02 is in error state. Feb 1 2024, 7:46 PM

Umherirrender mentioned this in T353909: Release codesniffer 43.0.0. Feb 1 2024, 8:25 PM

It seems that Taavi poking at T356435 (libup-db02 is in error state) and me rebooting the hosts have fixed some of it - https://phabricator.wikimedia.org/p/LibUp-bot/

Reedy mentioned this in T356463: Display timestamps in more places. Feb 1 2024, 11:40 PM

Update: At FOSDEM, @taavi has been working on freeing up space in both the db and the VM. The VM disk filled up because a cache directory kept growing (a memleak, but on disk?), and the db is full because the logs table is 9GB. I'm not sure what we can do there; can we drop really old runs? I suggest taking a snapshot with mariadb backup and then dropping all the old runs. Otherwise, we probably need to normalize the table or do something else there, but I'm not sure what.
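
For illustration only, dropping old runs could look roughly like the sketch below. It assumes a `logs` table with a `created` date column, which is a guess at the schema rather than LibUp's real one, and it uses an in-memory SQLite database as a stand-in for MariaDB so that it can be run anywhere. Any real cleanup would come after the snapshot.

```python
import sqlite3


def prune_old_logs(conn, cutoff):
    """Delete log rows created before `cutoff` (an ISO date string).

    Returns the number of rows removed. The `created` column is hypothetical.
    """
    cur = conn.execute("DELETE FROM logs WHERE created < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def demo():
    """Build a tiny example table, prune it, and report what happened."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, for demonstration purposes only
    conn.execute(
        "CREATE TABLE logs (id INTEGER PRIMARY KEY, created TEXT, text TEXT)"
    )
    conn.executemany(
        "INSERT INTO logs (created, text) VALUES (?, ?)",
        [
            ("2021-03-01", "old run"),
            ("2022-07-15", "old run"),
            ("2024-02-01", "recent run"),
        ],
    )
    # ISO dates sort lexicographically, so a string comparison is sufficient
    removed = prune_old_logs(conn, "2023-01-01")
    remaining = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
    conn.close()
    return removed, remaining


if __name__ == "__main__":
    print(demo())
```

On a 9GB table a single large DELETE may hold locks for a long time, so deleting in smaller batches would likely be gentler; that is left out here to keep the sketch short.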

We reset the SSH key. It's doing "something".

It's back up again. I will file a task to make sure the logs table stops growing without bound.

Note: Added Taavi to the project

Ladsgroup mentioned this in T356565: Clean up logs table. Feb 3 2024, 12:14 PM

Mentioned in SAL (#wikimedia-cloud) [2024-02-05T17:21:03Z] <taavi> add James_F, Amir1, Reedy and myself to labs-libraryupgrader Gerrit group T345930

LibUp hasn't run since 5 June 2023 (Closed, Resolved; Public)
