https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/501444/ seems to have changed the behaviour of the links update job to use the parser cache when a page has an entry.
As a result of this when a wikibase repo (wikidata) has an edit that it dispatches to a client to update its pages sometimes ContentAlterParserOutput will not run, which is currently what adds the wikibase_item page prop.
After a little more digging, it looks like perhaps the behaviour actually changed at the end of 2018 in https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/465157/
At least in this patch it looks as though refreshlinks requests the parser output from RevisionRendered with with generate-html set to the result of shouldCheckParserCache, which I believe would be false for a client page having a links update triggered by wikidata dispatching.
This will not happen to pages that get updated that do not have a parser cache entry.
All pages can be fixed with a null edit.
I also assume when a parser cache entry expires the new entry will update the page props correctly (TODO verify / confirm?)
Possible solutions
- Change where this (and potentially other) updates happen.
- Allow the wikibase update process to ignore the parser cache in the job
- Something else?
Bug report
Title: page_props missing links for some Commons category <-> Wikidata sitelinks
Some Commons categories are linked to Wikidata, but are missing from the page_props table in the database. Examples:
https://commons.wikimedia.org/wiki/Category:Broadway_East,_Baltimore
https://commons.wikimedia.org/wiki/Category:Buddhist_temples_in_Lamphun_Province
https://commons.wikimedia.org/wiki/Category:Buddhist_temples_in_Ubon_Ratchathani_Province
https://commons.wikimedia.org/wiki/Category:Civil_law_notaries
https://commons.wikimedia.org/wiki/Category:Climate_change_conferences
https://commons.wikimedia.org/wiki/Category:Former_components_of_the_Dow_Jones_Industrial_Average
https://commons.wikimedia.org/wiki/Category:Dukes_of_the_Archipelago
https://commons.wikimedia.org/wiki/Category:Eastern_Catholic_orders_and_societies
https://commons.wikimedia.org/wiki/Category:English_people_of_Turkish_descent
Lucas confirmed this on Twitter https://twitter.com/LucasWerkmeistr/status/1175747434208727040 and there's discussion at https://www.wikidata.org/wiki/User_talk:Jheald#Quarry_oddity . It's not clear why this is happening - the examples I've given are from categories I've found when looking through commons links from enwp (in alphabetical order, hence B-E).
Acceptance criteria
- page_props are always updated for client page when sitelink is added to the repository