User Details
- User Since: Oct 8 2014, 1:57 PM (535 w, 4 d)
- Availability: Available
- IRC Nick: question_mark
- LDAP User: Mark Bergsma
- MediaWiki User: Unknown
Aug 21 2024
Feb 20 2024
This is approved.
Sep 8 2023
Approved.
Sep 7 2023
Approved.
Apr 19 2023
@BTullis @odimitrijevic Given that this is an ongoing privacy leak, could we get some clarity on whether we can get this deployed soon, or how other teams may be able to help if needed?
Jul 18 2022
Jul 11 2022
This appears to be configurable now in Swift 2.24.0 and later (we currently seem to be running 2.26.0 on 6 of the 8 frontends), by enabling a piece of middleware and configuring RFC-compliant ETag responses for specific Swift user accounts or containers:
https://docs.openstack.org/swift/latest/middleware.html#module-swift.common.middleware.etag_quoter
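As a rough, untested sketch of what opting a container in might look like once that etag-quoter middleware is in the proxy pipeline: the auth endpoint, credentials and container name below are placeholders, and the metadata header name should be double-checked against the linked documentation.

```python
# Hypothetical sketch only: opt one container into RFC-compliant (quoted)
# ETag responses via container metadata, assuming the etag_quoter middleware
# is already enabled in the proxy pipeline. Endpoint, credentials and the
# container name are placeholders; verify the header name in the docs above.
from swiftclient import client as swift_client

conn = swift_client.Connection(
    authurl="https://swift.example.org/auth/v1.0",  # placeholder auth endpoint
    user="account:user",                            # placeholder credentials
    key="secret",
)

conn.post_container(
    "example-container",
    headers={"X-Container-Rfc-Compliant-Etags": "true"},
)
```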
Feb 24 2022
Jul 26 2021
Given that the underlying problem that this change might help with has already caused multiple full outages (all wikis affected) in the past year alone, and the extension is deployed on quite a few wikis, I'd like to ask for this to be looked into again in the near term. Raising priority to 'high'. Would this be in scope for PET's Clinic Duty? How can SRE help?
Feb 19 2021
Jan 22 2021
It's purely an idea I've had for a long time, to make it immediately obvious to anyone logging in what is backed up, and what isn't. That should help to:
Oct 8 2020
Hi all,
Sep 4 2020
Approved.
Sep 1 2020
Approved.
Approved.
Aug 20 2020
@wiki_willy @Papaul It seems we've had an ongoing pattern of crashes with this (rather important) backup host, which means we are not yet able to trust it. Until we resolve this, we cannot decommission the older hosts that this replaces either. At the moment the system doesn't even boot. Are there any steps we can take soon to debug this issue? Anything we can help with? Thanks!
Jul 10 2020
Approved.
May 27 2020
Apr 7 2020
Feb 21 2020
I am pretty sure there are a bunch of optics (of various kinds) in the "spare" switches at the bottom of rack OE15. Unfortunately those switches are not powered up, and certainly not configured or remotely manageable - something we should probably fix on the next visit.
Feb 18 2020
There are multiple 10G LR optics on-site for sure. Longer distance ones, less so.
Feb 13 2020
Personally I don't think PyBal should be rejecting that; it's a valid configuration from a technical standpoint, and there can be valid reasons to have it, at least temporarily. But we may decide that, in our specific environment, it should be avoided at all costs, so perhaps that logic should be implemented elsewhere - in the code that manages pooling state.
Feb 12 2020
@wiki_willy With Chris having been ill the past few days, what's a realistic new ETA for this?
Dec 10 2019
I agree - it seems that PyBal adds no real value here, because it's essentially load balancing the k8s load balancers. Why couldn't our caching layer do that itself: know about all the k8s proxies/nodes directly and do health checks for them?
Nov 28 2019
Nov 27 2019
Nov 26 2019
scs1-oe16-esams:psu1 {#20163} to ps2-oe16-esams:34
scs1-oe16-esams:psu2 {#20164} to ps1-oe16-esams:34
cr3-esams now has its power cables labeled:
cr2-esams now has its power cables labeled:
All duplicate ids have been fixed, labels replaced for one pair and updated in netbox.
I've filled out all red cells in the (original) bootstrap spreadsheet.
All 7 cable managers have been asset tagged and put into Netbox with the appropriate info and rack position.
All SERVER power cords have been audited in this sheet: https://docs.google.com/spreadsheets/d/1RMb6lMCc94wUj6MgSm1yYdnAC3SUsZIRj8zHLtxRx4o/edit?usp=sharing
Done.
Nov 25 2019
Nov 4 2019
CPT: please take a new look, thanks :)
Oct 24 2019
I'm a bit confused; as far as I know the old plan was always to have HA of Phabricator between eqiad and codfw, and the linked task T190572 also talks about that. So is that no longer the case, and if so, why? I believe there have been blockers & complications for that deployment, but are they documented anywhere? How does this task relate to those plans, and why do we feel failover within eqiad is (also) needed?
Oct 22 2019
Could CPT take a look at this please? Thanks!
Sep 17 2019
What's the status of this? Is this done and working?
Sep 12 2019
Sep 5 2019
Hi Anusha, Greg,
Aug 9 2019
The EX4200 can also have any port converted to a VC port - it just won't be as fast (max 10 Gbps).
Aug 6 2019
Approved for access.
Jul 23 2019
Because this means that right now stub dump generation for (at least) enwiki, dewiki and several other wikis is broken, we have only a few days to fix this before the dumps need to be done at the end of the month. Setting UBN...
Apr 16 2019
Apr 5 2019
While I agree with Daniel and others that use of the MediaWiki db connection/load balancing layer is an absolute minimum requirement, there are quite a few other potential problems that could affect the security/privacy, reliability or maintainability of our data and services if Doctrine is to be used to access MediaWiki's existing databases in any way (it would definitely be easier in separate, unconnected database clusters). However, this ticket is so far very sparse on details, and we don't have the information we need to make an informed decision. I requested access to the linked document yesterday, but it has not been granted yet. Alternatively, could it perhaps be replicated here on Phabricator so everyone involved can build an informed opinion? Thanks. :)
Apr 1 2019
There has been some concern from our DBAs that archiving the old policy will make it even harder for developers to find out what database-related requirements their code should fulfill, and what the processes are to get any schema or query changes deployed (such as a link to the Schema_changes page). The old information on database-related requirements, while admittedly a bit outdated, was discussed as an RFC at the time: https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2015-09-16
Feb 5 2019
Jan 23 2019
Yes, we should probably move over to prefix-limit to prevent (improving) filters from making accepted-prefix-limit ineffective.
Have a look at https://github.com/mwiget/bgp_graceful_shutdown for a JunOS op script (SLAX) that does this fully automatically for all peers with a single command.
This was solved by fixing the original bastion, a while ago.
I really don't see the point of this. With the scarcity of IPv4 space we only need to get MORE flexible about how we use our IP space, and we will almost certainly not be able to maintain a production-vs-other split between these address blocks in the future. Rather than spend time on renumbering, I think it's much more valuable to spend that effort on better managing our ACLs and on more automation.
Jan 11 2019
Right now I can only find a single graph with eqiad/codfw total (aggregated) power usage; proper per-rack power usage data is still entirely missing. This makes it very hard to determine the total amount of power used per rack (across all phases) and to monitor things like phase imbalance.
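To illustrate the kind of per-rack check we're missing, here is a minimal hypothetical sketch; the source of the per-phase readings and the 15% threshold are assumptions for illustration only.

```python
# Hypothetical sketch of a per-rack power summary: given per-phase current
# readings for one rack (from a PDU, SNMP, or a metrics store - assumed here),
# report the total draw and flag phase imbalance above an assumed threshold.

def rack_power_summary(phase_amps: dict[str, float], threshold: float = 0.15) -> str:
    """Summarise total load and flag imbalance across a rack's phases."""
    total = sum(phase_amps.values())
    mean = total / len(phase_amps)
    # Imbalance: largest deviation of any single phase from the per-phase mean.
    imbalance = max(abs(a - mean) for a in phase_amps.values()) / mean
    status = "IMBALANCED" if imbalance > threshold else "ok"
    return f"total={total:.1f}A imbalance={imbalance:.0%} [{status}]"

# Example readings for one rack's three phases (made-up numbers).
print(rack_power_summary({"L1": 9.8, "L2": 6.1, "L3": 7.4}))
```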
Dec 19 2018
I am getting the impression here that some things are being rushed and finalized without time for a proper discussion between people/teams about the different possible solutions and their impact, after this new discovery. Is that because goals are due to be posted now?
Oct 12 2018
Sep 18 2018
Although we didn't manage to discuss this in our SRE meeting yesterday, I discussed it with the relevant people afterwards.
Sep 11 2018
T97368 appears to be about the same issue.
Indeed, let's go with a "proper" Debian package; imho that's the cleanest way to go and conforms to how we do things.
Sep 3 2018
Yes, this can be merged once Nuria approves.
Aug 14 2018
@Dzahn please get her added to this list. Thanks!
Aug 13 2018
Aug 10 2018
Jul 30 2018
I am a bit confused by this RFC/proposal as it stands now, as I feel it doesn't really reflect the discussions we've been having.
Jul 25 2018
@ema: Has this been seen again? Does this need any work in Pybal?
The eqdfw-knams link needs to have a lower metric than the current primary path (codfw-eqiad + eqiad-esams) so that traffic from codfw to esams prefers it.
Jul 24 2018
Jul 16 2018
Jul 11 2018
This has been addressed in acdd0ebf74e5dd9e06c3216b9a93063ab8e91574