Wiktionary:Beer parlour/2023/June

Homophones and Tonal languages

Is the use of Template:hmp for terms which have homophones but are tonally different, correct? Example would be for Punjabi باہَر (bā́r, “outside”) (vs بَہار (bā̀r, “spring [season]”) and بار (bār, “a bar; multiple meanings”)). نعم البدل (talk) 00:22, 1 June 2023 (UTC)[reply]

That doesn't seem right to me, but I also think it's valuable information to have (for instance, if someone learning a tonal language wanted to figure out exactly what gaffes they might be making by using the wrong tones with a given set of phonemes). Andrew Sheedy (talk) 20:20, 1 June 2023 (UTC)[reply]

Perhaps we want an English-based concept of near-homophone. For example, Anglophones speaking Thai notoriously have problems with not only tones but also the initial consonant contrast /b~p~pʰ/. Indeed, Fundamentals of the Thai Language (Campbell and Shaweevongse) completely failed to distinguish the last two. --RichardW57m (talk) 11:31, 2 June 2023 (UTC)[reply]

The issue with tones in Punjabi is that they're not marked. bā́r is actually باہَر (bāhar). bā̀r is actually بَہار (bahār) and bār is just bār. The average Punjabi speaker won't actually know about tonal sounds. نعم البدل (talk) 15:09, 2 June 2023 (UTC)[reply]

But they should still know that the words sound different. --RichardW57 (talk) 16:53, 2 June 2023 (UTC)[reply]

Agree. If tone is phonemic, then it's not a homophone. Ex: Yorùbá ọkọ̀ & ọkọ́ would not be homophonous to me. AG202 (talk) 21:27, 2 June 2023 (UTC)[reply]

Thank you, I'll remove the listed homophones in that case. نعم البدل (talk) 21:45, 2 June 2023 (UTC)[reply]

There appears to be usage in Chinese linguistics where “homophone” refers to the sound without the tonemes.

The Routledge Encyclopedia of the Chinese Language, 2016, page 235: “打铁 dǎtiě ‘hit + iron = to forge iron’ is often used instead of and with the meaning of 打贴 dǎtiē ‘writing a post’. In the last two examples, the character that replaces the correct one is not a homotone but only a homophone.”

François Martin (1989) “Travaux pratiques: Lecture du parallélisme dans deux huitains de Du Fu”, in Extrême-Orient Extrême-Occident, number 11, page 126:

La poitrine est dite dang. Ce mot, compte tenu d'interférences (cf. Shuowen, glose du Hanshu, etc.) avec un autre dang, homophone, homotone et quasi homographe, offre les sens suivants: « étendue d'eau, vaste (comme elle), agitée (comme elle), pur ou purifié (comme par l'onde), libre (comme un bateau non manœuvré, abandonné aux flots), libéré des conventions ».

I have not found an unambiguous correlate to homotone though, a term of the same phonemic segments bar tones, save other kludgy ad-hoc terms like segmental homonym, while from the history of the phonetic science and dogmatics, The Life of Daniel Jones p. 359, we are informed that the idea of a phoneme was only extended thirty years after the invention of the term phoneme to length (“chroneme”) and tone (“toneme”), so it has historical legitimacy and thus etymological consequence to use “homophone” in a sense restricted to segmental phonemes. Indeed, Daniel Jones who is claimed to have first introduced the term phoneme, has applied it in his 1917 article to only segmental phonemes of the Tswana language which has tones. Thus we need a retronym for the senior homonym.

The Wiktionary glossing of homonym is a wrong simplification anyway. The general definition is, in the words of Jan W. Mulder, Sándor G. Hervey (1972) Theory of the Linguistic Sign, The Hague: Mouton, page 30: (semiotics) Two tentative signs that are formally the same but denotationally different. Fay Freak (talk) 05:34, 4 June 2023 (UTC)[reply]
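
For anyone wanting to audit existing homophone lists, the distinction drawn in this thread (full homophones versus terms that coincide only once tone is ignored) can be checked mechanically on romanisations. The following is a minimal Python sketch, not an existing Wiktionary tool. It assumes that tone is written only with combining acute and grave accents, as in the bā́r/bā̀r/bār and ọkọ̀/ọkọ́ examples above, and the names TONE_MARKS, strip_tone and classify are hypothetical.

```python
import unicodedata

# Assumption: tone is marked only by combining grave (U+0300) and acute (U+0301).
# Other diacritics (macron for length, dot below for vowel quality) are kept.
TONE_MARKS = {"\u0300", "\u0301"}


def strip_tone(romanisation: str) -> str:
    """Return the romanisation with the assumed tone marks removed."""
    decomposed = unicodedata.normalize("NFD", romanisation)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def classify(a: str, b: str) -> str:
    """Compare two romanisations with and without the assumed tone marks."""
    if unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b):
        return "homophone"
    if strip_tone(a) == strip_tone(b):
        return "differs only in tone"
    return "different"


# Romanisations from the thread, written with escapes to avoid encoding issues.
BAR_HIGH = "b\u0101\u0301r"          # bā́r
BAR_LOW = "b\u0101\u0300r"           # bā̀r
BAR_PLAIN = "b\u0101r"               # bār
OKO_LOW = "\u1ecdk\u1ecd\u0300"      # ọkọ̀
OKO_HIGH = "\u1ecdk\u1ecd\u0301"     # ọkọ́
```

With these definitions, `classify(BAR_HIGH, BAR_LOW)` and `classify(OKO_LOW, OKO_HIGH)` both report a tone-only difference, which is the case the thread agrees should not be listed as a homophone. The sketch compares spellings, so it is only as reliable as the romanisation is at reflecting pronunciation.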

Blind Spot: OCLC Numbers

People change things about citations I've added, maybe even delete them, etc. from time to time. But no one ever changes the OCLC numbers I add.
(1) Has anyone reading this ever found a citation on Wiktionary that used the wrong OCLC number? If so, could you tell me a little about that? (Feel free to check my edits for any potentially dubious OCLC number.)
(2) If almost no one has ever changed an OCLC number or challenged an OCLC number, what's the likelihood that the OCLC numbers on Wiktionary are potentially systematically wrong on some level, or perhaps there are numerous undiscovered random errors?
(3) Is there any kind of mechanism or page where random citations, including OCLC numbers, are spot-checked?
(4) When there are multiple "versions" of a text with different OCLC numbers, are there criteria (either formally or informally) for determining which OCLC is more appropriate?
I guess this doesn't matter to most users, and even to the users it does matter to, they are primarily concerned with the ISBN being the one in the work.
If you can engage at all with even some part of these questions, please let me know. --Geographyinitiative (talk) 11:23, 1 June 2023 (UTC)[reply]

For me it's always an arbitrary call. A lot of the citations that I add don't have ISBN numbers but they do have OCLC on the worldcat website, which I'll always include. Realistically I'll always include some sort of identifier, if possible.

But as far as multiple and/or incorrect OCLC numbers are concerned, (and to be fair I've seen the same happen with ISBN - where there are multiple identifiers) I'll just always double check the year published, author etc, and just make an arbitrary call. You could also double check it with Google Books, if they have it on their catalogue. نعم البدل (talk) 13:43, 1 June 2023 (UTC)[reply]

Can the format of OCLC numbers be automatically checked, as with ISBNs? This sort of stuff should be done automatically, and not take up valuable editing time. – Jberkel 23:29, 1 June 2023 (UTC)[reply]

@Jberkel: Unlike ISBNs and ISSNs, I don't think OCLC numbers have check digits, which means there isn't a way to confirm if they are correct. They seem to just be issued sequentially by the OCLC. Unfortunately it does appear that the same OCLC numbers are sometimes associated with different works, perhaps by accident or due to a typo. — Sgconlaw (talk) 16:58, 2 June 2023 (UTC)[reply]
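
To illustrate the difference described above, here is a minimal Python sketch, not an existing Wiktionary module. The ISBN functions use the standard ISBN-10 and ISBN-13 check-digit rules. The OCLC function rests on the assumption stated in this thread, namely that OCLC numbers are plain sequential digits with no check digit, so the most an automatic check can do is confirm the shape of the value. All function names are hypothetical.

```python
def _clean(value: str) -> str:
    """Drop the hyphens and spaces commonly used to group ISBN digits."""
    return value.replace("-", "").replace(" ", "")


def isbn13_is_valid(isbn: str) -> bool:
    """ISBN-13: digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10."""
    digits = _clean(isbn)
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def isbn10_is_valid(isbn: str) -> bool:
    """ISBN-10: digits weighted 10 down to 1 must sum to a multiple of 11."""
    chars = _clean(isbn).upper()
    if len(chars) != 10:
        return False
    body, last = chars[:9], chars[9]
    if not (body.isascii() and body.isdigit()):
        return False
    if last == "X":
        last_value = 10  # 'X' stands for 10 in the final position only
    elif last.isascii() and last.isdigit():
        last_value = int(last)
    else:
        return False
    total = sum(int(d) * (10 - i) for i, d in enumerate(body)) + last_value
    return total % 11 == 0


def oclc_format_ok(oclc: str) -> bool:
    """Assumption: an OCLC number is just a non-empty run of ASCII digits.

    This can catch stray letters or punctuation, but not a mistyped digit.
    """
    value = oclc.strip()
    return value != "" and value.isascii() and value.isdigit()
```

A single mistyped digit makes `isbn13_is_valid` fail, whereas `oclc_format_ok` accepts any digit string, so a wrong OCLC number can only be caught by comparing it against catalogue data.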

Yeah, I can't find the discussion offhand but I recall a case earlier this year where either the printed book itself or whatever repository the person pulled the OCLC from had typoed it. How difficult would it be to program a bot to look up all our OCLCs and check e.g. how many of the words we say occur in the title also occur in the title listed on (wherever we're looking it up)? - -sche (discuss) 19:41, 2 June 2023 (UTC)[reply]
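
The comparison step of such a bot could look roughly like the Python sketch below. It is only an illustration under assumptions: the catalogue lookup itself is left out, so `catalogue_title` is simply whatever string the lookup returned, and the names `title_words` and `title_overlap` as well as any threshold for flagging a mismatch are hypothetical.

```python
import re


def title_words(title: str) -> set[str]:
    """Lower-case the title and split it into a set of words."""
    return set(re.findall(r"\w+", title.casefold()))


def title_overlap(cited_title: str, catalogue_title: str) -> float:
    """Fraction of the words in our cited title that also occur in the
    catalogue's title (0.0 when our title has no words at all)."""
    cited = title_words(cited_title)
    if not cited:
        return 0.0
    return len(cited & title_words(catalogue_title)) / len(cited)
```

For example, `title_overlap("Theory of the Linguistic Sign", "Theory of the linguistic sign.")` gives 1.0, while comparing the same citation with "Fundamentals of the Thai Language" gives 0.4, since only "of" and "the" match. Ignoring common function words would make low scores more meaningful, and low-scoring citations would still need a human to look at them.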

Thank you all for thinking about this. I'm a person that is prone to fuck shit up, so I'm always thinking "oh, where did I fuck this up this time?" If there were a bot or something to check these numbers somehow, it would be great. I will be brave and tell you a little about my OCLC number selection process. What I've noticed is that some OCLC entries are basically horseshit, and then there's one, two or three (for the same title) that are high quality. Usually I look for the OCLC entry with the same ISBN (obviously) and a total number of pages included, and I try to check to see that the number of pages is the same as the last page in the book (for Internet Archive) or is the same as listed (for Google Books in limited view). That's so primitive, but I don't want to turn around in ten years and someone say "you fucked up everything bro". --Geographyinitiative (talk) 19:48, 2 June 2023 (UTC)[reply]

Yeah, don't worry too much about fucking shit up, it happens to any user who produces lots of content. It's a wiki, and the important errors will eventually be fixed. But things like OCLCs, well frankly most people don't give a monkey's butt about them so those errors might never be caught. Quantity over quality, I says. Queen of Troubled Teens (talk) 11:41, 6 June 2023 (UTC)[reply]

Useful Lexical Ghosts

I have an issue with a Pali lexical ghost that derives from a real adverb Pali sāyaṃ (“in the evening”). On encountering it, it is natural for a user to stem it and obtain sāya (noun) and look that up. (At present that is all he can do, but that is because Wiktionary is a work-in-progress.) Traditional Pali word lists contain this word, and record it as masculine, but apart from this use of the accusative singular and some derivative words, it appears to be a lexical ghost. There is a learned claim that the adverb is not derived from a noun at all, which would imply that Vedic Sanskrit साय n (sāya, “termination, evening”) is a parallel back formation.

We can't make this word a reconstruction, for we have significant reason to doubt that the noun ever existed in Pali.

So, what should I do? I'm inclined to record it as a lexical ghost (are there templates for this?) with masculine gender and limit its declension to nominative singular (for the alternative citation form, which I can find in a Pali-Pali dictionary - "Tikaṃ sāyanhe. Sāyati dinaṃ avasāyatīti sāyo, sāyanto vā dinantaṃ karonto ayatīti sāyo, punnapuṃsake") and accusative singular. This runs counter to the general policy of not including lexical ghosts, though conceivably their presence in dictionaries can override this for LDLs, which is why I am stating my intent in the Beer Parlour rather than raising the issue in the Tea Room. --RichardW57m (talk) 11:07, 2 June 2023 (UTC)[reply]

Hmm. If we know the word didn't exist, ... technically perhaps LDL rules would permit an entry citing the other dictionaries and explaining why they're wrong, but perhaps something more like {{no entry}} is better, pointing users to the nearest real word and explaining the issue there. - -sche (discuss) 19:43, 2 June 2023 (UTC)[reply]

I've tried doing something like that. One problem I've found is that if I remove the headword line, the links to the pages for sāya and its equivalents across the Pali writing systems turn orange, which is extremely confusing. The orange link system is too clever by half. --RichardW57 (talk) 04:56, 3 June 2023 (UTC)[reply]

What part of speech should the non-entries have? I've currently recorded them as 'noun'. --RichardW57 (talk) 05:10, 3 June 2023 (UTC)[reply]

Updating WT:Etymology to ban listing of cognates when there is a page with a descendants section

In short, I think we should not list cognates on pages if those same cognates are on a Proto-page or something similar. I do not wish to discuss pages beyond that (i.e. pages connected ultimately through PIE), but I mean more within families. Vininn126 (talk) 18:50, 2 June 2023 (UTC)[reply]

Agreed. Nicodene (talk) 19:03, 2 June 2023 (UTC)[reply]

How can one determine when such a page exists? — Sgconlaw (talk) 19:17, 2 June 2023 (UTC)[reply]

Reconstruction:Proto-Slavic/mъlviti, abdomen#Latin. Pretty easy. Vininn126 (talk) 19:20, 2 June 2023 (UTC)[reply]

Such a page should be linked in the etymology. If one isn't linked, or doesn't exist, I wouldn't remove cognates. Nicodene (talk) 20:24, 2 June 2023 (UTC)[reply]

Strong oppose. Having one or two cognates is useful even if there is an ancestor. Thadh (talk) 19:22, 2 June 2023 (UTC)[reply]

Useful in what way? Vininn126 (talk) 19:23, 2 June 2023 (UTC)[reply]

First of all, it gives a quick insight in what lies ahead on the reconstructed page. It also is helpful for people that are familiar with the mentioned languages, because they now won't have to read a new page just to confirm their suspicions. And finally when dealing with larger families it just gives you more insight if you list Hawaiian and Maori than a list of all descendants of Proto-Polynesian at once. Thadh (talk) 22:38, 2 June 2023 (UTC)[reply]

As I've said before, there's no such thing as "one or two cognates" in many cases. There are lots of IP editors whose only edits consist of adding their own language to cognate lists. After all, you can't have Icelandic without Danish/Faroese/Nynorsk/Bokmål/Swedish/Westrobothnian- that wouldn't be fair... Chuck Entz (talk) 19:37, 2 June 2023 (UTC)[reply]

Well, the reasonable solution is to revert those IPs based on a community decision about which cognates are to be shown: We're not abolishing Proto-Dravidian just because there may be IPs whose only edits are claiming that every single term derives from Sanskrit, are we? Thadh (talk) 22:40, 2 June 2023 (UTC)[reply]

Is there such a community decision? All of the cognates are valid. Which languages are missing is often a matter of random chance. At any rate, I'm not arguing we should never list cognates, just pointing out that such phenomena exist and we need to consider them. Chuck Entz (talk) 23:03, 2 June 2023 (UTC)[reply]

For rarer words such a list is very much useful. It shows that the Proto-form is based on several descendants, not just from one. I also suspect that many readers are not going to click through a link to get to the descendants list, if they are even aware that it exists. The cognate list makes the relationship more obvious and easier to find. ᛙᛆᚱᛐᛁᚿᛌᛆᛌ ᛭ Proto-Norsing ᛭ Ask me anything 21:04, 2 June 2023 (UTC)[reply]

That user behaviour arises because they have been habituated to seeing cognates without an ancestor page. And surely IPs add cognates whether we want them or not because they have seen superfluous cognates on older pages where one would remove them; they have learnt that this is how the dictionary works or is read and edited. Fay Freak (talk) 21:13, 2 June 2023 (UTC)[reply]

I do not follow what you mean by this. At the given Proto-page all the descendants should be listed. If the proto-page is a redlink, the cognates can stay. Vininn126 (talk) 21:14, 2 June 2023 (UTC)[reply]

I'm for reducing duplicate listing of the same descendants/cognates/information in multiple places as much as possible. (I'm sure the IPs Chuck mentions will continue to add cognates whether we want it or not, and whether there are any already listed or not, alas; there are other users who love to copy-paste lengthy English etymologies into their own language, e.g. Italian or Cebuano, since it's only "fair" for those languages' etymologies to be just as elaborate, so they can fall out of sync and be unwieldy, which is worse IMO.) There are a few situations where it might be helpful for language X to mention the cognate in language Y, e.g. cases where laypeople think of the languages as dialects, like Scots mentioning the relevant English word which typically has much more etymological detail, or Gottscheerish [etc] mentioning German which is likewise typically more fleshed out, but perhaps those cases can be handled by rephrasing so that it's not just a redundant "Cognate to X" link at all but rather something specifically saying "For more, see X". I assume we'll continue mentioning cognates in cases where there is something exceptional to say beyond that the words are cognate (e.g. an Italian word where the derivation is unexpected and we're saying "Cognate to French foo, which shows the same unexpected [whatever]"). Perhaps the ideal thing to make the cognates maximally findable would be to either 1) have a template/module that could automatically pull and display a collapsed list of cognates from one central location, a la the various descendant-listing templates (which could satisfy the people who want their language listed as a cognate and might lead people to notice any errors in the cognate list), or 2) systematically add a little tag like "(see there for cognates)" or "(see there for other descendants)" to help deter the addition of them to the entries. - -sche (discuss) 21:05, 2 June 2023 (UTC)[reply]

Yes, my formulation is something like that they should not be added if they do not provide additional information if seen with ancestor pages, but there is a lot of excess verbiage at Wiktionary:Etymology#Cognates to get rid of: who even wrote “The inclusion of cognate words is allowed only for inherited words” which is obviously untrue? Our best etymologists have not edited nor likely read the page, and I am somebody who for due diligence always reads the manual or instructions before assembling anything. Fay Freak (talk) 21:08, 2 June 2023 (UTC)[reply]

Could you please give an example? I'm a bit confused as to what this'd entail. AG202 (talk) 21:25, 2 June 2023 (UTC)[reply]

Essentially if a descendants page lists multiple pages, then those corresponding pages should not list each other as cognates in their etymology section. Vininn126 (talk) 21:29, 2 June 2023 (UTC)[reply]

Ahh thanks. Yeah I’d oppose this as well. Per this policy, we wouldn’t be able to list cognates for, for example, the Jeju cognate of Korean 하다 on that page or Yoruboid cognates at ajá. Especially for the smaller language groups, it’s much easier to see the cognates listed at the pages they’re currently at rather than clicking to the proto or ancestor page and then clicking to another descendant. AG202 (talk) 02:47, 3 June 2023 (UTC)[reply]

My read is that the suggested policy would require removal of e.g. the list of cognates at worm ("Cognate with Dutch worm, West Frisian wjirm, German Wurm, Danish orm, Norwegian orm. Indo-European cognates include Latin vermis (“worm”), Lithuanian var̃mas (“insect, midge”), Albanian rrime (“rainworm”), Ancient Greek ῥόμος (rhómos, “woodworm”)") since the etymology links to *wurmiz and *wr̥mis which list the cognate sets. On the one hand, I agree that it's a terrible pain to have to update or correct these kinds of lists, and they either get unwieldy (if an attempt at a complete list is made) or there's no clear criteria for which cognates are included or left out. The biggest argument for keeping them I think is that it's inconvenient to have to click through links, possibly even more than once, to see relevant information (although it's not clear to me how many users find it useful to see the list of cognates).--Urszag (talk) 23:28, 2 June 2023 (UTC)[reply]

In the list of "worm" cognates, German Wurm shows you the PGmc and wGmc "wu", Danish orm shows you the nGmc "o" without the "w", Latin vermis shows the consonants in a non-Germanic word, Ancient Greek ῥόμος (rhómos) shows the result of an earlier syllabic "r̥", the circumflex accent in Lithuanian var̃mas likewise suggests a syllabic "r̥". I don't know enough about Albanian to say what it shows. The idea is that the cognates are there to show us something other than "x is related to this term". Of course, most people have no clue about some things like what happens to PIE syllabic sonorants in Lithuanian and Ancient Greek, but they can see that there's something different going on there, even if they don't know why. In that list, the Dutch and the Norwegian (added to go with the Danish here. Presumably it's Bokmål, which is, as expected, identical to the Danish) aren't really telling us much. It looks like the West Frisian and the Albanian might have something to say (perhaps "i" umlaut in the first case and syllabic "r̥" in the second?) but I can only guess. That's all subjective, so I don't think we can come up with a formulation that will hold up in the face of the inevitable edit wars. Chuck Entz (talk) 01:37, 3 June 2023 (UTC)[reply]

I’m pretty sure these could be generated automatically and displayed in a collapsible list by using a similar method to {{desctree}}, which would address the concerns of everyone. Theknightwho (talk) 03:07, 3 June 2023 (UTC)[reply]

In theory it'd be possible to automatically generate a lot through the etyline - you could potentially have a button there that could take you to another page with a navigable tree. Vininn126 (talk) 07:55, 3 June 2023 (UTC)[reply]

@Theknightwho, Erutuon, Benwing2, Mahagaja, -sche I've always liked the use of collapsible lists to reconcile divergent needs of users: scanning a large area of an entry (eg, multiple PoSes, many definitions) and also having quick access to lists of terms or of quotations, some of which should be visible simultaneously with other parts of the entry (eg, a definition). At what point does the proliferation of such collapsible content cause technical problems (eg, Lua memory limits, download lags)? DCDuring (talk) 16:29, 4 June 2023 (UTC)[reply]

@DCDuring It’s difficult to estimate this kind of thing. My experience is that page scraping (i.e. checking raw Wikitext of a page) is not very demanding, as it’s what Chinese lects use to achieve automatic transliteration - sometimes with over a thousand terms on the same page.

That being said, my idea of scraping cognate lists could be tricky, as it has the potential to get exponentially complex. {{desctree}} also has the same risk, but works in practice because you can only have so many descendants before you run out. Cognates aren’t necessarily so simple, and it could be tricky to work out a reliable way to tell the function when it’s time to stop checking pages for more cognates. Theknightwho (talk) 16:39, 4 June 2023 (UTC)[reply]
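
One common way to bound this kind of page-to-page collection is a breadth-first walk with a visited set and a fixed depth limit. The Python sketch below is only an illustration of that stopping rule, not how {{desctree}} or any Wiktionary module actually works; real modules are written in Lua and read wikitext, whereas here the link structure is a hand-made dictionary and every name (`SAMPLE_LINKS`, `collect_linked_terms`, the page labels) is hypothetical.

```python
from collections import deque

# Hypothetical link data: each page maps to the pages its etymology or
# descendants section links to. Note the cycles (pages link back to each other).
SAMPLE_LINKS = {
    "en:worm": ["gem-pro:wurmiz"],
    "gem-pro:wurmiz": ["en:worm", "de:Wurm", "da:orm", "ine-pro:wrmis"],
    "ine-pro:wrmis": ["gem-pro:wurmiz", "la:vermis"],
}


def collect_linked_terms(links, start, max_depth):
    """Return pages reachable from `start` in at most `max_depth` link hops.

    The `seen` set stops cycles from causing repeated visits, and the depth
    limit keeps the amount of work bounded even on a very large graph.
    """
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand pages that are already at the limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                found.append(target)
                queue.append((target, depth + 1))
    return found
```

With `max_depth=2` the walk from "en:worm" reaches the Proto-Germanic page and its direct links but stops before "la:vermis"; raising the limit to 3 brings it in. Choosing the limit is the hard part raised above, and this sketch does not solve it.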

Kinda oppose. We've got plenty of cognate lists at Turkic and Scandinavian words, mainly because these words haven't got a proper etymology. Cognate lists may also be useful if the cognates are similar or identical in spelling but differ in meaning (e.g. Swedish orm is not quite the same as English worm, though I personally don't see that as important). In other situations I see no reason to have long cognate lists if it is possible to just see them via the proto-word link. But isn't that kinda obvious? Do we really have so many cases where this logic is not followed? Tollef Salemann (talk) 09:57, 3 June 2023 (UTC)[reply]

Oppose per Thadh and longstanding tradition. I've been reading dictionaries ever since I was young and I've gotten used to seeing cognates. One reason why paper dictionaries might do this is that they don't have an appendix of parent language roots, yes .... but it's also a lot more convenient for our readers to have cognates close at hand instead of needing to click to another page and then find the ones they want in the desctree. I think it's better to have no policy than to have such a strict policy that inconveniences readers. —Soap— 15:22, 4 June 2023 (UTC)[reply]

Oppose. We don't need a blanket ban; we can use common sense to allow editors to list a handful of especially close or especially interesting cognates without going overboard and listing every possible cognate in every imaginable language. —Mahāgaja · talk 16:42, 4 June 2023 (UTC)[reply]

Sorta support. I think the number of cognates displayed can be excessive, and needs to be limited. I also notice one problem with many of the cognates. At the Thai word น้ำ (náam, “water”) there are 20 cognates listed, and six of them are redlinks. Even some of the blue linked words are sketchily supported, even by the standards of LDLs. Are we being equally lax with the lists of descendants? In this particular case, the reconstructed form's descendants don't look to be severely policed. --RichardW57m (talk) 14:27, 5 June 2023 (UTC)[reply]

We also have a potential purging issue with the SE Asian descendants of Pali and Sanskrit, where the diffusion patterns seem not to be well known. They may of course be multiple - for Northern Thai there has been a steady flow of Siamese loans along with earlier waves apparently from Khmer, surely some from Mon, and probably a steady flow of learned loaning direct from Pali to Northern Thai. --RichardW57m (talk) 14:27, 5 June 2023 (UTC)[reply]

Oppose. I think a ban is the wrong approach, but I don't object to anyone removing long lists of cognates redundant to descendants sections. Basically I second what -sche said. Also, I suggest we remove this line from WT:ETY:

For example, in English they, one may write: “From Old Norse þeir (Icelandic þeir, Swedish de, Danish de, Norwegian Bokmål de, Norwegian Nynorsk dei)”.

Yes, they're Germanic languages, but these feel too far removed to be useful to English speakers and learners. Some of the points raised here about which cognates are desirable belong at WT:ETY for sure. Ultimateria (talk) 02:04, 8 June 2023 (UTC)[reply]

Oppose I do not like the idea of making the reader click through more than one page to see the cognates. Listing a few interesting cognates is harmless even when there is a parent entry. Overburdening an etymology with lots of cognates should be and is discouraged, and is not widely practiced. -- 𝘗𝘶𝘭𝘪 𝘮𝘢𝘪𝘺𝘪^{(𝘵𝘢𝘭𝘬)} 06:26, 9 June 2023 (UTC)[reply]

User:Theknightwho causing troubles in Japanese modules again

He tries to change the definition of cat:Japanese terms with multiple readings without consensus. Japanese makes distinctions between "terms with multiple readings" and "homographs", but as I have stated several times before, User:Theknightwho does not speak Japanese, so I can not expect him to understand the difference. That is also why I asked him to get the consensus for this change, he needs the help of Japanese speakers.

But he replied "There's nothing about it which needs consensus.", which shows his lack of respect for either other editors with knowledge of the related language, or the proper decision making procedure. It is particularly ironic that "you do not have consensus" is one of his favored excuses to attack other editors.

He also immediately locked mod:Jpan-headword to win the ownership of the page, as usual. -- Huhu9001 (talk) 01:38, 4 June 2023 (UTC)[reply]

Adding comment from the previous thread, which Huhu9001 has moved:

~~This isn't vandalism.~~ You've now changed it to "causing trouble".
The consensus is actually against you, because there are hundreds of manual additions to the category that do not make the distinction you are claiming matters.
There is no category for homographs, and in any event there is nothing to suggest Wiktionary has ever made the distinction anyway. Homographs are not a concept unique to Japanese, and it's not even difficult to understand the concept, so please cut the Sapir-Whorf bullshit.
You haven't even explained the change I made - you just reverted the addition without discussion or even notifying me, which is something I've noticed you do a lot.
I locked the page to prevent you from edit warring, because you have a track record of it. You also ignored my previous offer to unprotect Module:Jpan-sortkey on the condition that you wouldn't edit war, which strongly suggests that you intend to keep engaging in edit warring behaviour.

Theknightwho (talk) 01:53, 4 June 2023 (UTC)[reply]

開ける, 戻る, etc. have never been in cat:multiple readings in all these years. This disproves his guess that Wiktionary does not distinguish "a term with multiple readings" from "different terms with the same spelling", and also his mistaken belief that he has the consensus with him. But again, User:Theknightwho does not speak Japanese, so he often has to guess wildly to make his stand. User:Theknightwho makes assumptions about what he sees as "manual additions to the category". But the Japanese infrastructure has changed a great deal over the years. It is not clear under what circumstances they were added or whether the Japanese community agreed to their addition. User:Theknightwho never tried to consult Japanese editors and just acts arbitrarily based on his guesses and feelings.

No matter what excuse User:Theknightwho uses, it does not justify his recurring behaviour of "immediately locking a page to win the ownership of it". It is a clear sign that he enjoys abusing his admin power. -- Huhu9001 (talk) 02:16, 4 June 2023 (UTC)[reply]

Finding some pages where people hadn’t got round to manually adding them yet doesn’t prove anything, and it certainly doesn’t mean I’m “guessing wildly”. You still haven’t actually made any kind of argument - just lots of made up rubbish about my supposed motivations. Theknightwho (talk) 03:35, 4 June 2023 (UTC)[reply]

From 1. "There's nothing about it which needs consensus." to 2. "The consensus is actually against you" to 3. "manually adding them (or not) yet doesn’t prove anything". User:Theknightwho has been evasive all along. Basically I knew from the beginning that any talk with this user would be futile. Would any Japanese editor like to speak? -- Huhu9001 (talk) 05:31, 4 June 2023 (UTC)[reply]

I’m not being evasive - you’re just intentionally taking what I’m saying out of context because you have a grudge. Theknightwho (talk) 11:59, 4 June 2023 (UTC)[reply]

By the way, User:Theknightwho locked the Japonic headword module indefinitely. Is he going to make mod:Jpan-headword his permanent private territory? -- Huhu9001 (talk) 02:19, 4 June 2023 (UTC)[reply]

Can we get someone who is not one of the above two parties to comment, esp. one of the Japanese editors? Benwing2 (talk) 05:40, 4 June 2023 (UTC)[reply]

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix): AG202 (talk) 13:14, 4 June 2023 (UTC)[reply]

Not a Japanese editor, but I do have some knowledge in Japanese, so this is perhaps a cent and a half from me. I should mention that 開ける has been in Category:Japanese words with multiple readings (which redirects to the terms one) since this edit by Shantian Tailang (the correctness of this edit is perhaps a separate matter, but I see that Huhu9001 has edited that page afterwards), whereas 戻る hasn't been in the category.

I do think, however, since TKW doesn't speak Japanese and there is an active Japanese editor community, he should have consulted them before making changes to the infrastructure. It's pretty jarring for me seeing the recent mass changes by TKW without any sort of discussion. Obviously as I'm not involved with editing in Japanese, I have no knowledge at all about whether or not there is a consensus on the issue of multiple readings; if not, there should somehow be one. – Wpi (talk) 13:49, 4 June 2023 (UTC)[reply]

That last paragraph is exactly what I feel and what I've told Theknightwho before many times as well. I don't know what's going on with the Okinawan {{ryu-noun}} template but it's being replaced with {{ryu-head|noun}} with no warning or explanation (at the very least write an edit summary!). It's unbelievably frustrating that this keeps happening. @Benwing2 AG202 (talk) 13:57, 4 June 2023 (UTC)[reply]

@AG202 The change is purely cosmetic in the wikitext, and was for the sake of consistency. It did not change the output. Theknightwho (talk) 14:06, 4 June 2023 (UTC)[reply]

Thanks, I noticed that it didn’t change the output, but next time, at the very least put why in the edit summary. I’m not sure why the consistency is needed considering that many other languages use x-noun templates, though. AG202 (talk) 16:10, 4 June 2023 (UTC)[reply]

@AG202 It’s because I created {{kzg-head}}, {{okn-head}} (etc.) for the other Ryukyuan languages, and wanted to avoid creating a bunch of headword templates for each language for no reason. Okinawan had a few headword templates, most of which were very little-used, but for the sake of consistency it made sense to switch it over to the same format. I honestly didn’t see it as a significant change, but you’re right I should have put an edit summary, as it would have cleared up why I was doing it. Theknightwho (talk) 16:31, 4 June 2023 (UTC)[reply]

Thanks for the clarification. AG202 (talk) 21:25, 7 June 2023 (UTC)[reply]

開ける can be あける or ひらける; 戻る can be もどる or もとる. I don't see the problem here. Or are you trying to distinguish between these and something like 寂しい(さびしい〜さみしい)? —Fish bowl (talk) 18:48, 4 June 2023 (UTC)[reply]

Likewise, I fail to understand what the underlying issue is supposed to be.

> Japanese makes distinctions between "terms with multiple readings" and "homographs"

This is news to me. As I understand it, a homograph is a single spelling that is used to signify multiple words, sometimes with multiple pronunciations. Meanwhile, in Japanese lexicography, a "term with multiple readings" is any word spelling that can be read (pronounced) multiple ways -- thus signifying multiple words. This seems to be two ways of referring to the same phenomenon. Compare English read as /ɹiːd/ (present) or /ɹɛd/ (past), lead as /liːd/ (verb) or /lɛd/ (noun), also Japanese 避ける (yokeru, “dodge”) or 避ける (sakeru, “avoid”), 被る (kōmuru, “to get, to receive, to earn”) or 被る (kaburu, “to wear on one's head”), etc.

@Huhu9001, could you expand on how you seek to distinguish "terms with multiple readings" and "homographs"? ‑‑ Eiríkr Útlendi │^{Tala við mig} 18:35, 5 June 2023 (UTC)[reply]

I'm not certain, but my guess is that Huhu9001 wants to reserve "terms with multiple readings" for cases like 愛猫家 (あいびょうか, あいねこか) where no difference in meaning is associated with the different pronunciations.--Urszag (talk) 22:58, 5 June 2023 (UTC)[reply]

In reality, many entries aren’t so simple. There are lots where two readings with identical meanings are split due to having different etymologies. Theknightwho (talk) 23:23, 5 June 2023 (UTC)[reply]

@Eirikr: Only 被る is a "term with multiple readings" because it is a single term with different pronunciations. よける and さける are not. They are different terms having different meanings. -- Huhu9001 (talk) 00:30, 6 June 2023 (UTC)[reply]

@Huhu9001, I'm afraid I still don't understand your distinction.

I agree that sakeru and yokeru are different words with different meanings, that simply happen to share the same spelling 避ける (sakeru, yokeru).

I do not understand your contention that “Only 被る is a "term with multiple readings" because it is a single term with different pronunciations.” The reading kaburu is phonetically distinct from kōmuru, and the semantics of the two are also distinct. Etymologically, kaburu and kōmuru share an origin, unlike sakeru and yokeru. That said, all four in the modern language are "different terms having different meanings".

Could you explain your position that kaburu and kōmuru are somehow "a single term with different pronunciations"? ‑‑ Eiríkr Útlendi │^{Tala við mig} 04:29, 6 June 2023 (UTC)[reply]

@Eirikr: Look at page 被る. Etymology 1 is a classical verb; leave it aside. The definitions of Etymology 2 かぶる and Etymology 3 かむる are essentially identical: "1. to cover 2. to wear 3. to receive". So this is "a single term with different readings". The same goes for Etymology 4 こうぶる and Etymology 5 こうむる. On the other hand さける and よける are semantically different. You can find many articles discussing their difference. The entry 被る itself also gives a link to such an article "「さける」と「よける」はどう違う？｜日本語・日本語教師｜アルク". So they are 2 different terms, not a single term with different readings. -- Huhu9001 (talk) 07:00, 6 June 2023 (UTC)[reply]

Now I don't know much about Japanese, but you seem to be beating around the bush. My superficial understanding was that the different 'readings' of a word mainly represent a distinction of register, with the higher register usually being a Chinese loan, a bit like in English eat and dine or drink and imbibe where the higher register word comes from French or Latin. This as opposed to cleave (“to stick”) and cleave (“to split”) which coincidentally sound and are written the same but otherwise have nothing to do with each other. A fine distinction in theory. In English we don't care much about it because eat and dine aren't written with the same character. What you should be discussing is whether this distinction can be made neatly across the board in Japanese and whether/why it is worth making.

What is going on in the entry of 被る anyway? Five words with the same etymology and meaning. Shouldn't it contain information helping me decide which one to use? You say the first one is historical. What distinguishes the other four? —Caoimhin ceallach (talk) 11:27, 6 June 2023 (UTC)[reply]

Certain Japanese lexical items have multiple possible pronunciations recognized by dictionaries. We (the Wiktionary community) don't always have the expertise needed to fully explain the subtler differences in regional and sociolectal usage. Plus, this is a volunteer project, and it's possible that we just ran out of time or interest, leaving the entry less finished than might be ideal. ‑‑ Eiríkr Útlendi │^{Tala við mig} 18:35, 6 June 2023 (UTC)[reply]

@Caoimhin ceallach: No, your guess is completely wrong. In Japanese, the so-called "higher register usually being a Chinese loan" terms actually have completely different spellings, like 食べる and 食事する, 飲む and 飲用する. They are irrelevant to this thread. In this thread we talk about things that are spelled the same. Your question "Shouldn't it contain information helping me decide which one to use?" is also irrelevant to this topic. I can tell you how kaburu is different from kōmuru in meaning and it may merit a Usage notes section in that entry, but this has nothing to do with cat:Japanese terms with multiple readings. -- Huhu9001 (talk) 01:02, 7 June 2023 (UTC)[reply]

I can loosely agree that kaburu and kamuru are essentially the same word, or that kōburu and kōmuru are similarly variations of each other.

That said, I cannot agree that kaburu and kōmuru (the two more mainstream forms in modern usage) are the same terms -- which is exactly why I used these two readings in my posts earlier in this thread.

@Huhu9001, do you hold that kaburu and kōmuru are somehow "a single term with different pronunciations"? ‑‑ Eiríkr Útlendi │^{Tala við mig} 18:33, 6 June 2023 (UTC)[reply]

@Eirikr: No, I did not say kaburu and kōmuru are the same term. What I said is that kaburu and kōmuru are different terms, but kaburu and kamuru are the same term, and kōburu and kōmuru are the same term, so the entry deserves to be in cat:Japanese terms with multiple readings. On the other hand 避ける should not be in this category because sakeru and yokeru are different terms. We can not find two readings that are the same term like kaburu and kamuru of 被る in 避ける. -- Huhu9001 (talk) 01:02, 7 June 2023 (UTC)[reply]

Okinoerabu and Tokunoshima

Discussion moved to Wiktionary:Requests for moves, mergers and splits.

Renaming Sarcee to Tsuut'ina

Tsuut'ina is a Na-Dené language native to Alberta. On Wiktionary, the language is called Sarcee. However, in most circles, even on Wikipedia (Tsuut'ina language), the language is called Tsuut'ina, which is a much more appropriate name for the language. Sarcee actually comes from a Blackfoot word, while Tsuut'ina is the native term.

Could we please rename Sarcee to Tsuut'ina? GKON (talk) 02:34, 5 June 2023 (UTC)[reply]

Support Wikipedia says that Sarcee is a former term. CitationsFreak: Accessed 2023/01/01 (talk) 21:44, 5 June 2023 (UTC)[reply]

Should I add this discussion to the Votes? Or is this too irrelevant for that? GKON (talk) 23:26, 5 June 2023 (UTC)[reply]

@GKØN440 We don’t usually have formal votes for renaming languages. Theknightwho (talk) 23:28, 5 June 2023 (UTC)[reply]

Next time this should be at WT:RFM, btw. If there's no opposition in the next 2 weeks or so, the change should just be made. AG202 (talk) 21:25, 7 June 2023 (UTC)[reply]

@GKØN440, AG202 Renamed. There may still be references to the old name in documentation, which should be fixed. Benwing2 (talk) 04:24, 24 July 2023 (UTC)[reply]

Word-like Stemming Artefacts

A user might stem a verb or deverbal adjective that is a compound of Pali kamati (“to go”) by removing its prefix. For a finite verb that will leave a form of kamati. However, for a past participle, that would leave kanta (“gone”), which appears not to be attested, and quite possibly simply wasn't used at all. I have a number of worries about supporting this process:

1. This 'kanta' is a morpheme, but I worry that if I label it as such, it may be objected to because it is decomposable into kam + -ta. Campaigns have been waged against compound morphemes. --RichardW57m (talk) 10:37, 7 June 2023 (UTC)[reply]

2. Possibly I should enter it as '-kanta', but as stemming the finite verb forms leads to 'kamati', this seems weird. Should I create '-kamati' as a soft redirect to 'kamati'? --RichardW57m (talk) 10:37, 7 June 2023 (UTC)[reply]

3. Pali compounds formed by adding a prefix tend to be very idiomatic, like English compound verbs. Perhaps we shouldn't encourage such an approach by supporting it. --RichardW57m (talk) 10:37, 7 June 2023 (UTC)[reply]

There may be problems with other non-finite forms, which do not appear to be well-attested for the simplex, and some of which, such as *kamma for the absolutive, are inherently unlikely - there is a rule that the absolutive has a different form for compounds, but it is not rigidly adhered to. --RichardW57m (talk) 10:37, 7 June 2023 (UTC)[reply]

User:Theknightwho is now messing up another Japanese module

Again, despite the fact that User:Theknightwho does not speak Japanese, he has decided to change the Japanese transliteration rules without any consensus from the Japanese community. For example, in special:diff/73363646, he changed the testcase to make "yi" and "wu" the expected transliterations of "いぃ" and "うぅ" in standard Japanese. I don't know when Japanese has ever had "yi" and "wu". His disrespect for proper decision making procedure has worsened since the last thread above.

Also, considering his past behaviour, I am afraid he will soon lock the related module pages to win the ownership of them once again. -- Huhu9001 (talk) 13:04, 7 June 2023 (UTC)[reply]

Wiktionary:No_personal_attacks#Community_spirit: "It is your responsibility to foster and maintain a positive online community in Wiktionary. Personal attacks against any user - regardless of the editor's past behavior - are contrary to this spirit."
Wiktionary:No personal attacks: "Accusatory comments such as "Bob is a troll", or "Jane is a bad editor" can be considered personal attacks if said repeatedly, in bad faith, or with sufficient venom."
Above: "It is an clear sign that he enjoys abusing his admin power."
"His disrespect for proper decision making procedure has worsen since the last thread above." --Geographyinitiative (talk) 13:37, 7 June 2023 (UTC)[reply]

I completely agree - Huhu9001 has repeatedly and maliciously broken WT:CIVIL. Twice in a row, his first response has been to create a new Beer Parlour thread to attack me, and he repeatedly refuses to even speak to me directly. It’s absolutely disgraceful, and clearly intended to harass me as some kind of revenge for being blocked for 24 hours several weeks ago. Theknightwho (talk) 14:50, 7 June 2023 (UTC)[reply]

TKH then proceeds to reply with a personal attack. --{{victar|talk}} 15:19, 7 June 2023 (UTC)[reply]

Nice bait. Theknightwho (talk) 15:26, 7 June 2023 (UTC)[reply]

@Huhu9001 This is now the fourth (fifth?) thread where you have behaved like this in response to changes you don’t like instead of engaging like an adult. You have also invariably been dishonest and - quite frankly - malicious when explaining what the problem is. I am reaching the point where I think you need a long-term block from Wiktionary, because you have proven yourself to be impossible to work with in any meaningful way. You’re also just outright wrong about this: these are rare phonemes, but exist in borrowings like ウゥルカーヌス (Wurukānusu) or プイィ・フュイッセ (Puyi Fyuisse), even if they are often not realised that way. Theknightwho (talk) 14:25, 7 June 2023 (UTC)[reply]

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix): For a third-party viewer, it's really difficult on both ends to see what's happened since some edits lack proper edit summaries. On the other hand, I will yet again state that major changes like this should have discussion before they are implemented. I'm looking at the testcases and even with my limited knowledge, there are some questions that I have. Ex: With ずぁ, is this something that can be found in running texts? Where is the romanization pattern coming from exactly? (This should probably be cited somewhere in the documentation.) Does this differ between Japonic languages? (I'm aware of pages like Module:ja-translit/data/ryu, but still.) There needs to be much more done and more cooperation/documentation. Also, while Huhu has been far from cordial, I do not think that a long-term block would be warranted as their concerns seem to be valid (and other admins should hopefully weigh in). I bet that as the person that created the module, they'd have more say and more worry in the matter. As you've been told before, Theknightwho, it's best to leave these personal disputes to other admins so that they don't get worse and keep blowing up. Saying things like "instead of engaging like an adult" only provokes and doesn't solve the issue. I would've instead pointed to documentation/sources about why you chose to add this AND addressed concerns without stooping to a similar level. Edit: Even if they don't respond, it at least shows to other people the rationale behind it. Side note: those aren't phonemes btw. AG202 (talk) 21:49, 7 June 2023 (UTC)[reply]

I don't know what to do. I want to work with Japonic (Ryukyuan; Okinawan) modules. Too lazy to read unfortunately. Chuterix (talk) 21:54, 7 June 2023 (UTC)[reply]

@AG202 These were all modifications involving very marginal syllables or strict transcriptions that don't generally crop up in standard Japanese, but can under rare circumstances. We already cover several similar instances as it is, so it made sense to broaden the coverage:

いぃ (yi) and うぅ (wu) (as above) - these are given in more recent versions of modified Hepburn romanisation.
あ with dakuten あ゙ ('a), which is mostly used in slang to represent a glottal (or otherwise distorted) vowel sound in interjections. There's no standard romanisation.
The nasal "u" う゚ (ũ) - the kana is officially defined for transcription purposes, but (again) there is no standard romanisation for it so far as I can tell. I used the tilde by analogy with romanisations used for Ryukyuan nasal vowels.
I fixed the handling of ゐ゙ (vi), ゑ゙ (ve) and を゙ (vo), which were being handled wrongly as they don't exist as separate characters in Unicode (unlike their katakana counterparts, which were being handled correctly). They're very rare, but they do exist.
I improved the handling of gemination when it occurs before a consonant that is romanised with an initial apostrophe, so that it doesn't merely duplicate the apostrophe (this mostly affects Ryukyuan languages). For example, Kunigami ま゚ ('ma) should become っま゚ ('mma), not っま゚ (''ma). The only time this could affect Japanese is with (mostly unused) か゚ (nga), which would become っか゚ ('nnga). I doubt it will ever come up, but if it does it would certainly be wrong to put っか゚ (''nga).
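As a rough illustration of the gemination point in the last item, here is a minimal Python sketch. It is not the module's actual Lua code: the function name `geminate` is hypothetical, it only models the Kunigami 'ma case described above, and it deliberately ignores Hepburn special cases and the か゚ example.

```python
def geminate(romaji: str) -> str:
    """Double the first consonant of a romanised mora.

    Assumption for this sketch: a leading apostrophe is a glottal marker
    that must stay single, so the consonant after it is doubled instead.
    """
    prefix = "'" if romaji.startswith("'") else ""
    rest = romaji[len(prefix):]
    if not rest or rest[0] in "aiueo":
        # Nothing to double in this simplified model (vowel-initial or empty).
        return romaji
    return prefix + rest[0] + rest


print(geminate("'ma"))  # the apostrophe is kept once, the m is doubled
print(geminate("ka"))
```

The design choice is simply to split off the apostrophe before doubling, so that the doubling step never sees it.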

The testcases are incomplete, and will likely need to be refined. However, to use your example ずぁ (zwa), it follows by direct analogy from くぁ (kwa), and can be seen in tables such as this one published by the Council on East Asian Libraries.

Finally, I have no idea what I could have done here to stop it "blowing up": Huhu9001's first response was to immediately make this thread, and it isn't the first time he's reacted like this. The guy refuses to even speak to me, so it's pretty futile trying to engage him in conversation; I have tried several times to do so, to no avail. Telling me I'm "stooping to a similar level" is self-evidently untrue. Theknightwho (talk) 22:31, 7 June 2023 (UTC)[reply]

I have no particular facility with Lua, and no time in my life to dedicate to that.

Re: the morae wu and yi, granted that these do not appear in the Japanese language, and most native Japanese speakers would likely hesitatingly pronounce these as either u / ū or i / ī.

That said, these kana combos うぅ・ウゥ and いぃ・イィ do appear in very restricted contexts as either borrowings or other attempts at rendering non-Japanese phonology. So while they are definitely not native to Japanese, they do appear in Japanese text. The ウゥ (wu) spelling has been used in the ウゥルカーヌス article on the JA WP since its creation in 2005, intended as Wurukānusu as an approximation of the Latin Vulcanus.

As best as I can tell from the diff to Module:ja-translit/testcases linked at the top of this thread, this shows that @Theknightwho added a new function t:test_transliterate_kana; inasmuch as I understand code generally, a new function like this should not have much impact on existing code, and should thus presumably be minimally disruptive.

In addition, I cannot think of any case where うぅ・ウゥ would normally be romanized as ū, or where いぃ・イィ would be romanized as ī, so the change on line 313 of the new version should similarly have minimal impact.

Re: Japonic languages, Module:ja-translit is only used for mainstream Japanese (as used in the media), and (as far as I understand it) should only be used for mainstream Japanese.

Agreed that name-calling and fault-finding are not helpful. Please stick to the issues at hand, and do not engage in ad hominem commentary. ‑‑ Eiríkr Útlendi │^{Tala við mig} 22:41, 7 June 2023 (UTC)[reply]

@Eirikr Just to clarify, the module is used for all Japonic languages, but (some of the) languages other than Japanese have a dedicated data module that modifies the output (e.g. Module:ja-translit/data/xug). Eventually, they will all have one. Theknightwho (talk) 22:44, 7 June 2023 (UTC)[reply]

Given our language codes, presumably the module should be moved to Module:jpx-translit then, no?

The ja language code is specifically for Japanese, which held true when the module was begun. But if it's been expanded to cover Japonic languages as a group, the ja language code is no longer accurate. ‑‑ Eiríkr Útlendi │^{Tala við mig} 00:26, 8 June 2023 (UTC)[reply]

@Eirikr I forgot to mention it also covers Ainu (Module:ja-translit/data/ain), so it's probably better to move it to Module:Jpan-translit. In theory, it could also be used for Taiwanese kana, too. Theknightwho (talk) 00:27, 8 June 2023 (UTC)[reply]

Hmm, interesting, thank you.

If memory serves, Jpan is a regional code, whereas the module is intended to process the two kana scripts, hiragana and katakana, regardless of region. Per ISO 15924, the script code for that would be hrkt. ‑‑ Eiríkr Útlendi │^{Tala við mig} 01:22, 8 June 2023 (UTC)[reply]

@Eirikr The module does contain logic that allows it to scrape transliterations (in the same manner as Chinese), but it's not currently enabled in mainspace, so I think it's fair to say it covers Jpan in full. Theknightwho (talk) 01:29, 8 June 2023 (UTC)[reply]

@Eirikr On reflection, I've changed my mind - it would be helpful to have Module:Jpan-translit for scraping transliterations, and a separate Module:Hrkt-translit that handles kana (which would fill the role of kana_to_romaji). Jpan-translit would of course call Hrkt-translit once it's determined the correct kana. This separation would have four advantages:

It would make it possible to call Hrkt-translit if we know for certain that the input's just going to be kana. At the moment that's only possible by calling the module directly, but that's not ideal because there's a bunch of generic processing done by Module:languages that does a lot of the legwork for formatting etc, and that's bypassed at the moment.
It would allow languages that never mix kana with kanji like Ainu or (historically) Taiwanese Hokkien to call Hrkt-translit without having to hard encode exceptions for them in Jpan-translit.
Having a separate script code for Hrkt means we can put hentaigana under it instead of awkwardly shoving them under hiragana, as we do right now.
Sometimes it's useful to deal with kana as a whole without including kanji, whether for categorising things or for technical reasons, and it's not possible to do that without a dedicated script code for them.
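The two-layer split described above can be sketched roughly as follows. This is a Python illustration only (the real modules are Lua), and every name in it (`hrkt_translit`, `jpan_translit`, the tiny `KANA_TO_ROMAJI` table, the `readings` lookup) is hypothetical; it shows the calling relationship, not the actual Module:Jpan-translit or Module:Hrkt-translit code.

```python
from typing import Dict, Optional

# Tiny illustrative table; a real kana table is far larger.
KANA_TO_ROMAJI = {"ね": "ne", "こ": "ko", "い": "i", "ぬ": "nu"}


def hrkt_translit(kana: str) -> str:
    """Kana-only layer: romanise a string that is already pure kana."""
    return "".join(KANA_TO_ROMAJI[ch] for ch in kana)


def jpan_translit(text: str, readings: Dict[str, str]) -> Optional[str]:
    """Mixed-script layer: work out the kana, then delegate to the kana layer.

    `readings` stands in for whatever lookup would supply a kana reading
    for text containing kanji (an assumption made for this sketch).
    """
    if all(ch in KANA_TO_ROMAJI for ch in text):
        kana = text  # already kana, no lookup needed
    else:
        kana = readings.get(text)
    if kana is None:
        return None  # no reading known, so no transliteration
    return hrkt_translit(kana)


print(jpan_translit("猫", {"猫": "ねこ"}))
print(hrkt_translit("いぬ"))
```

A kana-only language could call `hrkt_translit` directly, which is the first two advantages listed above in miniature.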

Because of this, I've added Hrkt to Module:scripts/data. Theknightwho (talk) 18:15, 28 August 2023 (UTC)[reply]

While long term détente may, alas, depend on getting rid of one of the pair, there ought to be a technical solution to cases like this whereby testcases can include justifications. I've built something along those lines into Module:sa-convert/testcases. In this particular case, it may have been easier to add the contrary testcases with displayed justifications - one generally doesn't need a perfect testcase result until it is agreed what testcases are correct. It would be nicer if Module:UnitTests could be augmented to add a discussion column. @Theknightwho, Erutuon, Benwing2, Erutuon, Huhu9001 --RichardW57m (talk) 15:07, 8 June 2023 (UTC)[reply]
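One way to picture testcases that carry their own justification is the Python sketch below. It is not how Module:UnitTests or Module:sa-convert/testcases work; the `Case` type, the `run_cases` helper and the toy transliterator are all assumptions made for illustration.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Case(NamedTuple):
    source: str
    expected: str
    justification: str  # why this output is believed to be correct


def run_cases(
    translit: Callable[[str], Optional[str]], cases: List[Case]
) -> List[Tuple[str, str, Optional[str], str, str]]:
    """Return one row per case: input, expected, actual, status, justification."""
    rows = []
    for case in cases:
        actual = translit(case.source)
        status = "pass" if actual == case.expected else "FAIL"
        rows.append((case.source, case.expected, actual, status, case.justification))
    return rows


# Toy transliterator that does not yet know the contested form.
toy_translit = {"くぁ": "kwa", "ずぁ": "zua"}.get

cases = [
    Case("くぁ", "kwa", "existing behaviour"),
    Case("ずぁ", "zwa", "by analogy with くぁ; source still to be cited"),
]

for row in run_cases(toy_translit, cases):
    print(" | ".join(str(cell) for cell in row))
```

A failing row then shows the disputed expectation next to the reason given for it, so the discussion can be about the reason rather than about who made the edit.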

@Geographyinitiative, AG202: I want to clarify that the reason I always seem to be the one starting a BP thread is that User:Theknightwho has more convenient tools to move the dispute in his favor. For example, when I just barely made the first edit he doesn't like, he could immediately harass me on my talk page with what you can perhaps read as a block threat (special:diff/73303256) or lock the related page to make sure he wins the dispute (mod:Jpan-headword is still locked indefinitely now). On the other hand, I can do nothing else except for starting a BP thread. This results in my actions being disproportionately more visible than his, and may thus give the misimpression that I am overreacting. In fact User:Theknightwho seldom loses any chance to break WT:CIVIL any time he finds an edit he dislikes. If anyone does have much respect for this Wiktionary policy, I want to remind you it has almost become a joke after User:Theknightwho even used vulgar terms several times against me but simply got away with no consequence at all.

@Eirikr: Testcases of transliteration modules serve as a guide for future changes of those modules. So a change of testcases is a claim to change the actual transliteration module in the same manner later. Such a change by a non-Japanese speaker (I also believe User:Theknightwho speaks no other Japonic languages at all) without any discussions to get the consensus beforehand should not have been tolerated. That's why I have been always emphasizing the importance of "proper decision making procedure" which User:Theknightwho disrespects.

As for いぃ and うぅ, same-vowel small kana in Japanese are most often used for non-standard lengthening of a vowel, which you can not easily find in a dictionary. There are more of them in running texts of an informal register: "うぅ! " ("Uh!", non-standard lengthening of う) "かわいいぃぃ〜〜〜！" ("cuuuuute", non-standard lengthening of い), "優しいなぁ" ("You are so kind.", non-standard lengthening of な). -- Huhu9001 (talk) 01:20, 8 June 2023 (UTC)[reply]

Huhu9001 is lying here: when he says he "barely made the first edit [I] [don't] like", he means that he reverted everything without pinging me, and then he started a Grease Pit thread where he lied that I was engaged in vandalism. At no point has he made any attempt to speak to me, and given that he has repeatedly tried to get me sanctioned over and over as a direct response to me giving him a 24 hour ban a couple of months ago, it's very clear to me that he has absolutely no interest in engaging in a productive way. He has a track record of manipulation and dishonesty, and I am far from the first editor to have noticed this. I also have no idea where he is getting the idea that I threatened to block him, either, but it seems to be yet another one of his brazen lies. Theknightwho (talk) 01:34, 8 June 2023 (UTC)[reply]

Good. User:Theknightwho listed several instances he labeled as "my lying". Everyone can check with their own eyes whether these are actual lies, or excellent examples of User:Theknightwho unhesitatingly violating WT:CIVIL by making false accusations. -- Huhu9001 (talk) 02:02, 8 June 2023 (UTC)[reply]

@Huhu9001 Repeatedly lying and then trying to get me sanctioned for breaching policy when I point out what you're doing is some of the worst behaviour I have ever seen on Wiktionary. You should be indefinitely blocked. Theknightwho (talk) 02:04, 8 June 2023 (UTC)[reply]

As for mod:Jpan-headword being locked to administrators only, I can agree that that's not really the best idea; locking it to a wider group (maybe template editors) would be better, as that'd give more people oversight rather than just a single user. I see that Huhu is a template editor and if that's an issue that's come up, then that should be resolved imho. Also, a better place for this would maybe have been the talk pages, with a ping to Japanese editors at least first. I can see Huhu's point, because if an editor from an outside community comes and edits a module that an editing community has been using for ages with little to no prior discussion or warning and not much explanation at first, then I'd be frustrated too. I mean, I've given my frustrations to Theknightwho when Koreanic entries were broken for weeks due to sudden changes (which have since been fixed). This is why I continue to emphasize having discussions first before making substantial changes. This, however, does not excuse the language that's been used as it does not help at all. I'd really suggest again to the both of you to cool it with the personal comments and focus on what actually is the problem and what needs to be done, because it does not look good for either of y'all. AG202 (talk) 02:15, 8 June 2023 (UTC)[reply]

@AG202: I just don't understand how I can "focus on what actually is the problem and what needs to be done". As you can see clearly above, User:Theknightwho threatened to indefinitely block me. He has been good at finding any excuse to revert me, some even as ridiculous as this one (special:diff/72638224) "reverting pointless change made to get the last word". He also showed how he can arbitrarily label some of my comments he dislikes as "lying" regardless of the facts. Given the threat, maybe instead of reverting he will carry out the block for some random excuse the next time if I continue to edit where he tends to make his own territory, which is pretty much every module that works on a language scale or larger. You do give plenty of good advice, but User:Theknightwho listens to none of it. -- Huhu9001 (talk) 02:42, 8 June 2023 (UTC)[reply]

@AG202 In all seriousness, how am I supposed to have a discussion with somebody who literally refuses to talk to me, and acts like this every time he disagrees with anything I do? You can see even on this thread that Huhu9001 has not addressed a single comment to me. I should also point out that - other than Huhu9001 - the other users in the community have so far agreed with the changes I've made on both occasions: Huhu9001 is merely using them as an excuse to start a fight.

I also want to point out the blatant lie that I threatened to block him in the comment above, and that he has been like this for two months - saying we need to cool off misses the point, because the evidence suggests he's going to keep doing this forever. Theknightwho (talk) 03:15, 8 June 2023 (UTC)[reply]

You both need to cool off! Honestly at this rate, I'd suggest giving time and room for other users to comment/discuss and for both of you to stop editing the modules until the Japonic community can come to a consensus. Continuing to edit when you know the other party will be pissed off without some kind of discussion or consensus from a third-party is only going to cause more issues. At this rate I don't care what you both have said to each other because it's all blending together anyways. Either focus on the issue at hand and push away the personal attacks till a later time or take a break entirely. AG202 (talk) 03:22, 8 June 2023 (UTC)[reply]

@AG202: Sigh. I can see you call for "both of you to stop editing the modules" to which I doubt whether he will ever listen. I haven't edited the related module any more after starting the BP discussion but he keeps doing so. So you see the problem is User:Theknightwho has the privilege to unilaterally push everything he wants, which has somehow become an elephant in the room. -- Huhu9001 (talk) 03:33, 8 June 2023 (UTC)[reply]

@AG202 Please explain how it's possible to come to a consensus with somebody who literally refuses to engage in discussion with me. Theknightwho (talk) 03:35, 8 June 2023 (UTC)[reply]

@Theknightwho Notice how I said the Japonic community. That means everyone. I specifically implied that you two shouldn't engage with each other, but let others chime in. AG202 (talk) 04:06, 8 June 2023 (UTC)[reply]

@AG202 Sure, but if he's still doing this in two months then please don't brush it off again. Theknightwho (talk) 04:09, 8 June 2023 (UTC)[reply]

I'm not brushing it off if I'm even choosing to engage with the discussion and proposing some kind of solution (that others have also agreed with). But if this continues, yes, my response will be different. AG202 (talk) 04:13, 8 June 2023 (UTC)[reply]

(e/c) Yeah. For anyone reading this who comes across a change to a module they want to bring up for discussion, this is a great example of how not to do it. Post something like "hey, module ABC used to do X, but it's been changed to do Y, I think X was better because of [reasons]", and the community can evaluate whether X or Y is better for users. When someone posts "this user I've been pursuing a feud with for some time is making edits I don't like again!!" it comes across as, well, being about your personal feud and not about the substance of the change, which reduces the likelihood of people evaluating the substance of the change (I appreciate Eirikr trying to do so). Ideally users should discuss substantial changes beforehand, as AG202 says, and if a user makes a change they didn't think was substantial / didn't discuss beforehand and someone else disagrees with it and undoes/reverts it, then we have a "module ABC used to do X, but..." discussion (not a "this user I feud with edited again!!!!1" discussion), and go with whichever revision the community decides on, which should mean these modules can be open to all template editors and admins (if not all users), since there shouldn't be more than one undo/revert. (The obvious problem with this is, what to do if on a volunteer project no-one else with relevant knowledge has time to evaluate whether person A's or person B's preferred revision is better.) - -sche (discuss) 02:57, 8 June 2023 (UTC)[reply]

@-sche: While I admit "module ABC used to do X, but..." is a more ideal discussion style than "User:Theknightwho is now messing up another Japanese module", User:Theknightwho's response is far worse than the latter. As I have described above, he immediately opened a "bad faith removal" section on my talk page and locked the module page to make sure his version stands. That was before I could start any discussion in BP. In this case, User:Theknightwho is himself a problem, as well as the module change. So I can not avoid bringing it up in the discussion. -- Huhu9001 (talk) 03:19, 8 June 2023 (UTC)[reply]

Again, I think this is a good example that shows the benefit of admin privileges in manipulating public opinion. When admins use admin tools to unilaterally push the result they want, it is far less visible than non-admins opening threads on discussion pages. As a result, non-admins' edits are far easier targets for criticisms of being unfriendly etc. than the admins' ones. -- Huhu9001 (talk) 03:47, 8 June 2023 (UTC)[reply]

@Huhu9001 If you refuse to engage in discussion, it is impossible to come to a consensus with you. You could demonstrate your good faith by actually trying to discuss the issues with me - it really is as simple as that. Instead, you chose to do exactly the same thing again, making yet another Beer Parlour thread about me. At least you didn't ping 75 users this time. Theknightwho (talk) 03:49, 8 June 2023 (UTC)[reply]

As an outsider I agree with @AG202 and @-sche that it would be better for both of you to stop commenting for a while as you're only attacking each other and not adding any clarity to the discussion. Better to let other Japonic editors chime in. —Caoimhin ceallach (talk) 15:47, 8 June 2023 (UTC)[reply]

Huhu9001's synopsis above is painfully on point. To speak generally, many admins use their sysop powers to gatekeep, pushing their personal agendas. I believe the role of admins should primarily be preventing vandalism, and to further that, I pushed for the creation of the mover role, and in admin votes, always ask if the user truly needs admin privileges, or simply user group access, like template editor. Wiktionary has become increasingly hostile and divisive over the years, and I would attribute that, at least in part, to unchecked toxic admins abusing their position. --{{victar|talk}} 00:10, 9 June 2023 (UTC)[reply]

@Victar What power has been abused here, exactly? Theknightwho (talk) 05:47, 9 June 2023 (UTC)[reply]

"To speak generally". Even on your desynop vote, I only voted on the principle, because users have no voice to be heard other than here, and in votes. --{{victar|talk}} 07:29, 9 June 2023 (UTC)[reply]

Would like to add to the exacerbation: the {{ryu-readings}} template is based on {{ja-readings}}; the former has “On (unclassified)”, where the part in parentheses shouldn’t be there. Is there any research on go-on, kan’on, etc. for the Ryukyuan cognates? ～ POKéTalker （＝◉＝） 13:53, 8 June 2023 (UTC)[reply]

This was done by @Huhu9001. Theknightwho (talk) 14:01, 8 June 2023 (UTC)[reply]

@Poketalker Honestly, there should just be "On" for now. From what I've read, it seems like most On readings/terms are just direct borrowings from Japanese anyways. See: Wayne Lawrence (2015) “7. Lexicon”, in Patrick Heinrich, Shinsho Miyara, Michinori Shimoji, editors, Handbook of the Ryukyuan Languages, volume 11, Berlin/Boston/Munich: Walter de Gruyter, Inc., →ISBN, →ISSN, pages 168-171 AG202 (talk) 15:19, 8 June 2023 (UTC)[reply]

@Huhu9001 Hey, I don't know you, nor will I take the time to read all this. All I'm saying to you is that Wiktionary does not matter at all. It's not important. Who cares? You think I like everything that happens here? No I don't. But it ultimately doesn't matter. Just try to be 100% friendly with people, no matter what you think they are thinking, and that will affect how people interact with you and eventually it will change how you interact with them. Even if you are 100% right and the other person is 100% wrong, just don't worry about it. There are infinite things to do which are not even slightly controversial. Just do the other things. I do not like Wiktionary in several respects! There are things that are horrifying to me! Just do the other stuff. Stick to the fun stuff. --Geographyinitiative (talk) 15:44, 8 June 2023 (UTC)[reply]

@Geographyinitiative: Well, I guess you just haven't read w:Wikipedia:Civil POV pushing. That article may show you things actually work slightly differently here from what you have thought. (To clarify, I don't completely agree with that article. But obviously it has many supporters.) -- Huhu9001 (talk) 19:43, 8 June 2023 (UTC)[reply]

It's hard for me to make sense of the discussion. Both seem to be acting in good faith but we see mutual attacks. Theknightwho is already known to have made good efforts in making complex modules work and has introduced great improvements and new functionality. He does make mistakes sometimes but who doesn't? He does correct them and listens to advice. Huhu9001 also made a lot of good edits but I remember him attacking people in the past. He attacked Wyang even for the great contributions in a number of complex Asian languages (nothing is or was ever perfect) and even gloated when he left. It wasn't nice at all but we moved on. This new conflict just reminds me of that hate.

I think I am a bit biased, since I am happy about what Theknightwho has already introduced and hope he can do more great stuff but he is being discouraged now. I agree with those who suggest they both should stop posting about each other and help others understand what they are trying to achieve. Constructive comments have also been made by Huhu9001, but I don't know why the work that was started can't continue with all the corrections. The abuse-of-adminship comments were unwarranted, IMO. --Anatoli T. (обсудить/вклад) 06:23, 9 June 2023 (UTC)[reply]

@Atitarev Thank you Anatoli. Your comments are very constructive. I think for User:Theknightwho in particular it would be best for him to hold his tongue more often and wait for others to chime in. If someone is unreasonably attacking him, others will respond appropriately. For User:Huhu9001 please do assume good faith and instead of saying things like "XYZ is messing up a module", specify what you think should or shouldn't be done and why. Benwing2 (talk) 19:48, 10 June 2023 (UTC)[reply]

@Atitarev, Benwing2: Thank you for your advice but it simply doesn't work. You see the typical workflow is, first I make some edits TKW doesn't like, and second TKW starts edit warring and leaves a section called "Warning" or "Bad faith removal" in my talk page, or locks the page to "win the battle", or writes insulting edit summaries, leaving overt or covert threats of banning here and there. And yes no one will blame him for these deeds because they happen in a user talk page or page histories, very hard to notice. But I will be blamed for starting a BP thread complaining because only now everyone sees something unpleasant brought up. Basically there is nothing more I can do at this point. This happened again and again. How many times should I mention that the wrong information about |sc= TKW left still hangs on Template:ll/documentation? I am tired of seeing more unbiased negotiators saying things like "he should have discussed it beforehand", "he could have done better", "he could have left this to another admin". Why should he ever listen when he never gets any consequence for not listening? Your advice just sounds pretty much like "let them eat cake".

Since you have mentioned Wyang, I hope you do not forget what message he has left on his user page. Don't you think a person no longer needing to work for what he thinks is "a 乌烟瘴气 kludge" is beneficial for both himself and the website? What is "gloating"? -- Huhu9001 (talk) 23:49, 10 June 2023 (UTC)[reply]

@RichardW57 You mentioned above something about a discussion column in UnitTests. Do you mean a column with notes (which already exists in some uses of this module and for which support exists) or something that is editable so people can carry on an actual discussion? The latter would be significant work to implement and IMO is better left for talk pages. Benwing2 (talk) 23:55, 10 June 2023 (UTC)[reply]

Huhu9001 literally refuses to talk to me, but instead:

Reverts my work without discussion.
Talks about me in the third person while replying to my comments, despite being asked to stop, as it is an incredibly rude thing to do.
Makes accusatory and inflammatory Beer Parlour threads about me.

This all started after I gave him a 24 hour block for behaving in a rude manner, and he has only become worse. He has repeatedly engaged in dishonest, manipulative and - quite frankly - insulting behaviour, and has shown himself to be petty and vindictive in the process. I simply cannot believe he is engaging in good faith here, as he seems to be entirely driven by spite. His feigned exasperation seems to be yet another attempt to manipulate other users, because from my perspective he has made absolutely no effort to be collaborative whatsoever. Given that he has stated it numerous times by now, I should point out that his agenda seems to be getting me desysopped, which is why he keeps kicking up a huge fuss. It's vindictiveness, and nothing more. Theknightwho (talk) 00:01, 11 June 2023 (UTC)[reply]

Just a column with notes, but notes capable of nicely linking to a discussion. --RichardW57 (talk) 00:55, 11 June 2023 (UTC)[reply]

Quoting texts written in romanization

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix): Hopefully this will be my last ping to y'all for a while, sorry. How should we quote texts for Okinawan (or other languages) written by natives but in different romanization patterns, such as Kokuritsu Kokugo Kenkyūjo (国立国語研究所) (1963) 沖縄語辞典 (Okinawa-go Jiten) [Dictionary of the Okinawan Language] (in Japanese), Tokyo (東京): Okurashō Insatsu Kyoku (財務省印刷局) or Mitsugu Sakihara (2006 June) Okinawan-English Wordbook, →ISBN? Should they be normalized to mixed script? Katakana? Not everyone writes using Hepburn romanization either. Similar thing applies to Katakana-only texts, though I assume that we can leave those in Katakana. This also applies to linking within quotes or citing sources as well. Whatever is decided should also be listed at WT:About Okinawan. AG202 (talk) 15:26, 8 June 2023 (UTC)[reply]

I suggest mixed script. Chuterix (talk) 15:39, 8 June 2023 (UTC)[reply]

@Chuterix Just to be sure, would you support changing a quote to be normalized to Kanji? So like if I were to quote:

1963, “ʔicigu”, in 沖縄語辞典 [Okinawan Dictionary], page 246:
ʔicigu mamatumuti ʔikatareN sjașiga, satuja cimu kawati, 'jusuni nariti
ʔicigu mamatumuti ʔikatareN sjașiga, satuja cimu kawati, 'jusuni nariti

should it be instead something like this?:

1963, “ʔicigu”, in 沖縄語辞典 [Okinawan Dictionary], page 246:

一(いち)期(ぐ)ままとぅ思(む)てぃ御語(いかた)れん為(しゃ)すぃが、里(さとぅ)や肝(ちむ)変(か)わてぃ、余所(ゆす)に馴(な)りてぃ
ichigu mamatumuti ikata ren shasiga, satuya chimu kawati, yusuni nariti

(Also there should be a nocat parameter for this template and {{ja-usex}}.) Apologies for any errors. AG202 (talk) 19:36, 8 June 2023 (UTC)[reply]

Yes. Chuterix (talk) 19:39, 8 June 2023 (UTC)[reply]

Actually, would it be possible to have a like "normalization" parameter for the quote & cite templates? @Benwing2, @Surjection, @Erutuon? Mainly wondering about the technical limitations. Maybe something similar to the "literal translation" parameter. AG202 (talk) 16:05, 8 June 2023 (UTC)[reply]

From a technical standpoint it should be entirely possible to implement. It might also be handy for some other languages, so it could be considered. — SURJECTION ^{/ T / C / L /} 18:28, 9 June 2023 (UTC)[reply]

Would it make sense to allow an optional free text line to explain what 'normalised' meant? I suppose we would want templates for the common explanations. We might want different positions to be available. For example, 'read as' comments might follow the raw text, but a conversion from cuneiform transliteration to transcription would come later. --RichardW57 (talk) 18:50, 9 June 2023 (UTC)[reply]

I can't speak to technical limitations, but in terms of output I would prefer to have the text quoted as closely as possible to the original, with standardized forms under the quoted text. Thus:

1963, “ʔicigu”, in 沖縄語辞典 [Okinawan Dictionary], page 246:

[ʔicigu] mamatumuti ʔikatareN sjașiga, satuja cimu kawati, 'jusuni nariti.

ichigu mamatumuti ikata ren shasiga, satuya chimu kawati, yusuni nariti.

I thought when we met that we would be together for life, but you have had a change of heart and fallen for someone else.

(I'm not sure about the translation, though!)

In principle I prefer to leave the original unchanged as much as possible in order to give a "truer" sense of its form as well as meaning. Cnilep (talk) 22:51, 8 June 2023 (UTC)[reply]

I agree with Cnilep. Facts are sacred. Quote the original text. For example, back-converting Roman script to Brahmi is simply fraudulent. (This comes under the 'other language' part of the question.) Now, what one can legitimately do is bend the transliteration line to also show the text in the preferred script, which is akin to what I do when I have to change things for intelligibility, e.g. in the quotation for ສະລະນະ (salana). --RichardW57m (talk) 10:55, 9 June 2023 (UTC)[reply]

Yeah, but what main script? Mixed Japanese script or Latin script? Chuterix (talk) 13:43, 9 June 2023 (UTC)[reply]

Done. Now Okinawan ʔucinaa. Chuterix (talk) 13:56, 9 June 2023 (UTC)[reply]

Hmmm I don’t think we need entries like that right now. Romanization entries are allowed, but I’m not sure if we need a “Latin script” entry (especially since that schema is not used too too often and is only found in dictionaries). We don’t create Yale romanization entries for Korean for example. AG202 (talk) 14:26, 9 June 2023 (UTC)[reply]

@Chuterix, I agree with @AG202 here -- if we are to create romanized entries for Ryukyuan languages, we should follow the format we already have for romanized Japanese or romanized Gothic, even -- see agisleiks#Gothic, sakura#Japanese, etc. ‑‑ Eiríkr Útlendi │^{Tala við mig} 22:17, 9 June 2023 (UTC)[reply]

@Eirikr But ʔucinaa (uchinā) is semi-IPA transcription, not Hepburn; what do we do? Chuterix (talk) 22:37, 9 June 2023 (UTC)[reply]

I think we should not create romanized entries for any language, until and unless there is a clear standard.

I am not aware of any such clearly leading standard for the Ryukyuan languages, but then again I have not spent nearly as much time on these as I have on Japanese.

For now, for ʔucinaa, a quick look at online results shows many more hits for forms like uchina (google:"uchina" "okinawa": 68.9K hits) or uchinaa (google:"uchinaa" "okinawa": 7.1K hits), as compared to ʔucinaa (google:"ʔucinaa" "okinawa": 367 hits).

@Chuterix, I would counsel patience here. There is no burning need for us to create romanized entries for the Ryukyuan languages. We should take the time to research and determine if there is any kind of leading and widely used standard for transcribing these terms in the Latin alphabet, before we launch into creating these. ‑‑ Eiríkr Útlendi │^{Tala við mig} 23:08, 9 June 2023 (UTC)[reply]

@AG202, RichardW57 I agree with User:Surjection here; we could definitely add a normalization param without too much difficulty. This could be useful e.g. for Old English or other languages that have a standard form but where the texts don't always use it. However if we want the flexibility that RichardW57 proposes, that would take a bit more work, and before doing it I'd want to see a design proposal hashed out for what the params and values would look like and how that would translate into the display. Benwing2 (talk) 19:32, 10 June 2023 (UTC)[reply]

As I see the problem of "ʔucinaa" vs "uchina" has been brought up, it may be worth noting that the current orthographies Wiktionary is using for Ryukyuan languages are different from what seem to be the official(?) ones here: 沖縄県における「しまくとぅば」の表記について. The current system is probably borrowed from jlect.com (or just the one with more hits in Google results?) but I am not sure. @Chuterix: Perhaps Ryukyuan language editors want to explain and compare these different systems first? -- Huhu9001 (talk) 00:14, 11 June 2023 (UTC)[reply]

CC: @MiguelX413. I’ve personally been following the one listed by the Council listed at w:Okinawan_scripts for the little I’ve done, with changes for ゑ, since it seems like we use that one here already. AG202 (talk) 01:26, 11 June 2023 (UTC)[reply]

FYI: June Updates from Unicode

https://mailchi.mp/dc330f87a3c8/unicode-in-6240894 —Justin (koavf)❤T☮C☺M☯ 18:04, 8 June 2023 (UTC)[reply]

Confirmation vote

To increase admin accountability, I would like to submit myself to a confirmation vote. As a reminder, I've been made an admin a little more than a year ago, and I've tried to mostly stay away from controversies ever since, though I'm not sure I've entirely succeeded in that.

Are there major objections to the idea? P U C – 18:39, 9 June 2023 (UTC)[reply]

I think it would become a bit tedious if sysops did this regularly. I think that unless people bring complaints against you, you can assume that you are in good standing and that people would vote to maintain your admin privileges. Andrew Sheedy (talk) 18:40, 9 June 2023 (UTC)[reply]

Yeah, please don't. I am still in favor of changing the criteria for removing rights so that it is the complement of the criteria for giving rights, e.g. if 2/3s are required to give rights, 1/3 is sufficient to remove them. The logic being that if more than 1/3 wish to remove, a 2/3 vote would no longer pass. - TheDaveRoss 20:20, 9 June 2023 (UTC)[reply]

I'm also in favor of this change of criteria. Megathonic (talk) 03:38, 11 June 2023 (UTC)[reply]

I can’t think of any objections anyway, other than that making annoying votes will be enough for some people to vote contra, as there is no requirement for being reasonable or proportionate when voting. Fay Freak (talk) 00:17, 10 June 2023 (UTC)[reply]

Agreed that we don't need to do this and IMO it sets a bad precedent. (Also not in favor of a 1/3 vote to remove; there is far too much randomness in voting here. IMO such a system wouldn't work even if there is a fixed set of voters who regularly vote, such as in the US Congress or UK Parliament, much less in a fairly anarchic system such as we have here.) Benwing2 (talk) 19:36, 10 June 2023 (UTC)[reply]

I don't have any major objections, but I don't think it's necessary to have periodic confirmation votes for admins. Personally, I wouldn't create it. Megathonic (talk) 03:38, 11 June 2023 (UTC)[reply]

You could get yourself pointlessly desysopped and then have another vote. El Tío Medio Cabrón (talk) 14:51, 11 June 2023 (UTC)[reply]
I really think what Wiktionary needs is more voting and less editing. Vininn126 (talk) 14:54, 11 June 2023 (UTC)[reply]

Seems pointless and liable to trigger further willing or unwilling "confirmation votes". Equinox ◑ 14:56, 11 June 2023 (UTC)[reply]

I would support this, although I wouldn't make it a practice to require confirmation votes for admins to maintain their positions. I think such votes would give interesting information. I've thought before about starting an informal vote for myself to gauge my approval rating, although without any implications tied to the outcome of the vote. Imetsia (talk) 17:56, 15 June 2023 (UTC)[reply]

Handling Neapolitan and Sicilian

Under these two labels fall a wide range of dialects which are in modern Romance linguistics deemed 'Upper Southern Italian' and 'Extreme Southern Italian' respectively. The two dialect groups show a great deal of internal diversity, and neither features an overall standard or even any sense of unity on the part of their speakers. The average man from Salento would no doubt be baffled to hear the local dialect labelled 'Sicilian'.

So one might ask: what spellings, and what pronunciations, should we indicate on Wiktionary for two dialect groups that have no standard spelling, much less a standard pronunciation?

I propose that, for any given lexeme, we opt for a traditional, preferably conservative, spelling based on some prominent dialect, such as that of the city of Naples ('Neapolitan' proper). Then we may indicate orthographies for other dialects as alternative spellings on the same entry. This does amount to an ultimately arbitrary in-house standardization, for the purposes of Wiktionary, but the alternative would be dealing with the chaos of having potentially dozens of equal lemmas, all under the name 'Neapolitan', and this I would prefer to avoid.

As concerns pronunciation, I would caution against trying to fit dozens of dialectal variants under a single 'perfect' phonemic representation. @Catonif made a valiant effort to do this for Sicilian, in the wake of an older discussion, and it proved to be all but impossible in the end.

Instead, I propose adding phonetic transcriptions labelled according to the specific (sub-)dialect that they are meant to represent.

I have made basic entries for Neapolitan miedeco and Sicilian medicu to illustrate how this would work in practice.

As a final thought, I would like to add that 'Tarantino', a purported language added to Wiktionary by SemperBlotto, is probably best merged into Neapolitan in the manner described above. This was brought up in an earlier thread by @Jamala. Nicodene (talk) 22:01, 12 June 2023 (UTC)[reply]

It is definitely the case that spelling in Neapolitan is varied and my understanding is that Neapolitan dictionaries use Naples-style spelling, so I would be in favor of using that as the standard spellings and others as variants. I'm too ignorant of Sicilian to even speculate there. —Justin (koavf)❤T☮C☺M☯ 22:10, 12 June 2023 (UTC)[reply]

What you propose seems like a good idea. Thadh (talk) 09:30, 13 June 2023 (UTC)[reply]

I mostly agree.

For Extreme Southern Italian orthography, what I've been lemmatising at over time is the (minorly polished) Sicilian w:Giovanni Meli orthography, which itself is not much different from the result of modern attempts of standardisation of the insular language (with the one great difference that of course the latter are by far more unnatural and artificial) or from the one of dictionaries like w:it:Antonino Traina's and Vincenzo Mortillaro's. Of course, Calabrian only or Salento only terms would follow their own lemmatisation rules and should not be "Sicilianised". There probably shouldn't be too many hard rules; I leave much to common sense, while trying to keep artificial normalisation to a minimum.

About phonemic and phonetic transcription, I've given up on both, as both are bound to be misleading and unavoidably wrong when given by someone like I am who doesn't speak the language natively. I'd abstain from adding them unless they're from a precise and up-to-date source. If provided with relative certainty, on the other hand, it is a great idea, as it portrays the diversity of the dialects which is very important and I see many people gloss over and assume one big happy unified language.

Finally, I agree Tarantino L2 should go (but only on theoretical grounds, as I wouldn't partake in the cleanup process :p). Catonif (talk) 09:34, 13 June 2023 (UTC)[reply]

(As a Spanish speaker & Portuguese reader....) This is fascinating, & reasonable-sounding.

Editors of Asia-Pacific languages should take note. Please check out MEDICU & MIEDECO. @Wpi @Justinrleung @mgc 釆 (talk) 15:56, 4 July 2023 (UTC)[reply]

Rethinking Finnish (Finnic) infinitives

@Hekaheka, Brittletheories, Thadh (tried to ping every active editor who could be interested, sorry if I forgot anyone)

The concepts of Finnish "fourth infinitives" and "fifth infinitives" appear to be obsolete. For example, ISK considers Finnish to only have three infinitives. Their view is that

the "fourth infinitive" is just a verbal construct employing the verbal noun with olla (note that this is strictly about the construct "se on tekeminen", "sitä ei ole tekemistä", and thus only the nominative and partitive forms exist), and
the "fifth infinitive" is a kind of adverbial

I would agree that these infinitives are not like the others, because they cannot stand alone without olla (“to be”).

Thus I think we should retire them too. We should instead consider -minen to purely be verbal nouns, while the fifth infinitives could stay around as is but renamed to something else. One possibility is "proximative adverbial" (my own creation, based roughly on the introduction of Ylikoski 2003, a paper covering the usage of the forms, as well as the aforementioned ISK classification). We could still keep both in the conjugation tables.

Some dissenting opinions do also exist. Verkkokielioppi considers the fourth infinitive to be legitimate, but not the fifth infinitive (at least based on how the sentence describing it is worded).

For the record, Estonian already considers -mine forms to purely be verbal nouns. — SURJECTION ^{/ T / C / L /} 18:51, 13 June 2023 (UTC)[reply]

I personally think our current way of handling these is fine. Since -minen is 100% productive - every verb can have these - it seems reasonable to handle these as verb forms (so, infinitives), rather than derived nominals.

I'm not familiar with the Estonian situation. If there, too, -mine forms can be derived from any verb, then I also think these should be handled as verb forms. Thadh (talk) 19:23, 13 June 2023 (UTC)[reply]

This all may also interest @Joonas07, Mölli-Möllerö, and maybe they know even more people to ping. Thadh (talk) 19:26, 13 June 2023 (UTC)[reply]

My quick response: -minen/-mistA forms are verb forms as they can be used in sentences like "hänen ei ole itkeminen"/"sinne ei ole menemistä". (I'm not going to attempt to translate those sentences, sorry. I couldn't get the nuances right if I tried; someone else can try if they so wish.) In these cases the -minen form is obligatory as "hänen ei ole itku" and "sinne ei ole menoa" do not sound good. Also tekemäisillään is productive. I don't really care that much what these forms are called, but I really hope that they will be listed in verb conjugation tables also in the future. Mölli-Möllerö (talk) 07:02, 14 June 2023 (UTC)[reply]

We can show -minen due to productivity, but I don't think this form is necessarily distinct. I would be inclined to agree with ISK's view that this is a special use of the -minen verbal noun. — SURJECTION ^{/ T / C / L /} 10:12, 14 June 2023 (UTC)[reply]

Hello!

I don't speak Finnish so I don't know the nuances of the different infinitives and frankly don't understand the meaning of the 4th and 5th infinitive, as in these example sentences. All I can say for Estonian is that, as far as I understand Finnish grammar, Estonian only has the Finnish 1st infinitive (called the da-infinitive), and the Finnish 3rd infinitive (ma-infinitive and also the dictionary form). -mine forms are just verbal nouns, which can technically be derived from any verb in a grammatical sense. I don't understand how these could be considered infinitives though, since they are nouns derived from the verb. Joonas07 (talk) 21:53, 15 June 2023 (UTC)[reply]

There is a brief explanation of the supposed 4. and 5. infinitives here. — SURJECTION ^{/ T / C / L /} 15:08, 18 June 2023 (UTC)[reply]

I don't think it's fair to consider that as verb forms for only this reason. We have participles as a special case as well - if this is really such an issue (which I don't see it as), we could use a similar system. These aren't infinitives in the same sense as the first three. — SURJECTION ^{/ T / C / L /} 20:28, 13 June 2023 (UTC)[reply]

Now that I think about it, tekemäisillään is translated in English as "about to do". Note the to + dictionary form structure in the English translation as well, just like in 1st and 3rd infinitives (haluan tehdä = I want to do, lähden tekemään = I'll go to do). There is thus at least some argument to regard that as a special kind of infinitive as well. Mölli-Möllerö (talk) 07:02, 14 June 2023 (UTC)[reply]

tekemäisillään cannot be used without olla, unlike the first three infinitives. — SURJECTION ^{/ T / C / L /} 10:09, 14 June 2023 (UTC)[reply]

If this comes to a vote, or if there's a need to somehow "measure" the consensus, I think my following comment should count for a quarter of a vote/say:

I think Surjection's idea to convert 4 and 5 to verbal nouns and adverb makes a lot of sense, as the current wording of these definitions is very clunky and unintuitive for most. Furthermore, the fact they would always appear in that construction is something I would normally consider something the reader has to be aware of. Finally, there seems to be a trend of handling these forms as such anyway, so there's precedent. Vininn126 (talk) 19:29, 13 June 2023 (UTC)[reply]

I think in this case the language-specific tradition in grammars is more important than the panlinguistic one. Many Finnish grammars call it the fourth infinitive, and any speaker and learner of the language would (presumably) know what that means. Thadh (talk) 20:02, 13 June 2023 (UTC)[reply]

Sure, and yet there are also some sources that don't, as mentioned by Surjection. Vininn126 (talk) 20:03, 13 June 2023 (UTC)[reply]

The more recent the grammar is, the less likely it is that it uses "fourth infinitive" and "fifth infinitive". — SURJECTION ^{/ T / C / L /} 11:17, 14 June 2023 (UTC)[reply]

I believe our main strengths lie in Wiktionary's cross-linguistic nature, so the change seems warranted. However, I rarely edit nonlemma forms, which leaves me indifferent.

On an unrelated note, small parts of my UI have changed to Punjabi, mostly labels on some buttons, but it's wildly inconsistent. Most things have stayed English, and my preferences tab does claim the website to be in English. Has anyone else had this happen? brittletheories (talk) 20:47, 13 June 2023 (UTC)[reply]

As to the UI change, it has happened to others and there has been some discussion on the Discord server, but the source has yet to be identified. —The Editor's Apprentice (talk) 22:15, 13 June 2023 (UTC)[reply]

Looks like it was a wider problem, not just on Wiktionary, and is on its way to being fixed, see Github for the gritty details. —The Editor's Apprentice (talk) 23:02, 13 June 2023 (UTC)[reply]

I think it would be a pity to retire them. Every verb has these forms whether they are called infinitives or something else. Having them in the conjugation table makes sure that as soon as we add a verb these forms are included in Wiktionary and thus findable for a user that wonders what they might be. Could we simply stop calling them infinitives, rename them and keep a place for them in the table?--Hekaheka (talk) 11:51, 5 July 2023 (UTC)[reply]

I don't think anyone was suggesting getting rid of them entirely. Vininn126 (talk) 11:55, 5 July 2023 (UTC)[reply]

I don't think anyone suggested that someone would have proposed to remove all infinitives.--Hekaheka (talk) 10:52, 9 July 2023 (UTC)[reply]

Nobody proposed removing all infinitives, just the fourth and the fifth ones. We can keep the -minen form in the conjugation table as the "verbal noun" and the -maisillaan form can be renamed to something like "proximative" (since we can't call it a "fifth infinitive" without a fourth one). The biggest change is removing all the "fourth infinitive of..." definitions. — SURJECTION ^{/ T / C / L /} 13:53, 5 July 2023 (UTC)[reply]

Sounds reasonable to me.--Hekaheka (talk) 10:52, 9 July 2023 (UTC)[reply]

Bundling Participles under a Single PoS Header

I've been bundling homonymous participles under a single PoS header, without an associated etymology, so that they may share a common 48-cell (8 cases, 2 numbers, 3 genders) inflection table. My assumption was that the etymology could be kept under the verb. The downside of this is that someone, e.g. @Pulimaiyi, can see an 'adjective' Pali mata with the gloss 'dead' and add an etymology deriving it from Sanskrit, without noticing that one of the meanings obviously has a quite different Sanskrit correspondent. He's not the only one to make that type of mistake - it even happens when homographs have different PoS headings. Should I therefore desist and instead duplicate the inflection table? --RichardW57m (talk) 13:57, 14 June 2023 (UTC)[reply]

There are three reasons for hiving off the inflection of participles from the inflection of the verb:

Laziness - at the moment I only really need separate tables for verbs and nouns, though perhaps adjective tables should be stitched together properly from noun tables.
Multiple forms. Quite a few verbs have two or more different participles with the same meaning - and sometimes the form depends on the sense!
Past participles tend to sprout derivative adjectival senses.

--RichardW57m (talk) 13:57, 14 June 2023 (UTC)[reply]

@RichardW57m: Yes, it's better to split homonymous lemmas between two etymologies. That is done for all languages. As for the common declension table, if duplicating it is not desirable, we could perhaps put it above the two etymologies, thus indicating that it is common to both the etymologies. This is already being done in case of pronunciation; pronunciation precedes etymology when there are multiple etymologies. -- 𝘗𝘶𝘭𝘪 𝘮𝘢𝘪𝘺𝘪^{(𝘵𝘢𝘭𝘬)} 14:38, 14 June 2023 (UTC)[reply]

I don't quite follow. Is it wrong to say that Pali mata is descended from Sanskrit मृत (mṛta)? Can't both this be true and that it's the past participle of marati? —Caoimhin ceallach (talk) 15:14, 14 June 2023 (UTC)[reply]

IMO you should split the etymologies and duplicate the declension tables. When they're hidden by default it's not a big deal. I do this all the time for Italian, Spanish, etc. Benwing2 (talk) 15:31, 14 June 2023 (UTC)[reply]

@Benwing2: Just to confirm that we are talking about the same thing, may I have an example please. Note that the text of intended vote Stubbifying alternative forms would force these etymology sections to not contain any text outside the ~~per-etymology~~ subsections of the etymology sections. Just using multiple PoS headers would be compliant. @Pulimaiyi: In the case of mata, the inclusion of the sense 'dead' and the noun 'opinion' may just avoid this, though I'm not sure that the noun doesn't automatically follow from the participle by Pali grammar - an 'absolute' use of the adjective! --RichardW57m (talk) 16:19, 14 June 2023 (UTC)[reply]

Well, firstly one needs to explain that one is not talking normal English. Secondly, you've not noticed the unintended pitfall. The homonym mata (“thought”, adjective) is cognate with मत (mata); in this case it's the past participle of maññati. --RichardW57m (talk) 15:57, 14 June 2023 (UTC)[reply]

@RichardW57 You need at least two headers, each with their own inflection table: one for the adjective+participle derived from mṛta, the other for the adjective+participle derived from mata. Better would be to split the adjectives and participles as well, so that adjectives get an ====Adjective==== header and participles get a ====Participle==== header, in conformity with the way other languages work. Strictly speaking that would require four identical declension tables, which may be a bit much; in this case, you might consider putting the ===Declension=== table below both the adjective and participle headers, as L4 instead of L5 headers. Benwing2 (talk) 16:38, 14 June 2023 (UTC)[reply]

I don't know how to tell whether something that looks like a Pali participle is being used as an adjective instead. When I asked how to tell (Wiktionary:Beer_parlour/2023/February#Participles), all I got was a snarky assertion that I knew. So no can split.--RichardW57 (talk) 22:33, 14 June 2023 (UTC)[reply]

I asked at a Pali forum, and was told that in Thomas Oberlies' new book, what are generally called participles are simply described as 'verbal adjectives'! I was also asked for a sentence where not knowing would cause confusion. So what is the purpose of the distinction? Is it to aid parsing? Is it a filter for use when constructing new sentences in Pali? --RichardW57 (talk) 08:09, 15 June 2023 (UTC)[reply]

Moved the above to that older thread, where it better belongs. --RichardW57m (talk) 12:43, 15 June 2023 (UTC)[reply]

Just spitballing here, but if we really don't want to repeat the inflection tables, we could make some template like {{infl identical to|adjective}} that would display something like "Inflects identically to the adjective above." to be placed where the redundant inflection table would normally go, in an entry formatted like most other languages. But if one word or POS thereof inflects differently than another homograph on the same page, as at mata, it seems clearer to me to just repeat the table, which as Benwing says seems to be the usual approach in other languages regardless. If in general adjectives vs participles within the same ety section (or even regardless of ety) which are spelled the same inherently always and only ever inflected one way, the same way for everything that had a certain spelling, then it might make sense to do like Irish entries like gaile and earcán do with mutation info (and, in the latter entry, even declension) and have the two L3 Etymology 1 & 2 sections with L4 POS sections and then an L3 inflection section below it all — or, if the inflection is ety specific, then at L4 level with the POS within that ety section, as Benwing says. (But if homographs don't inherently inflect the same way, then this is not robust, and is liable to get borked if someone adds another homograph to the page that inflects differently.) - -sche (discuss) 18:15, 14 June 2023 (UTC)[reply]

@-sche: I'm surprised @JeffDoozan's AutoDooz lets those Irish examples alone. I think that having an L3 inflection section implies that all the homonyms have the same Pali part of speech, which in general isn't so. There are a few exceptions to homonymous adjectives (traditional definition) declining the same:

adjectives in the suffixes -mant and -vant decline differently to present participles;
the rare present participles in -anta (e.g. santa) homonymous decline differently to a-stem adjectives; and
the rare past active participles in -vant decline differently to present participles in -vant.
an a-stem adjective might have an ī-stem feminine, and I have no confidence homonyms would behave the same (this does not affect participles)

'Pronominal' a-stem adjectives decline differently to other a-stem adjectives, but I don't know of any homonyms between the two sets. --RichardW57 (talk) 21:32, 14 June 2023 (UTC)[reply]

Mutations are always L3 and apply consistently to both adjectives and nouns (with a few rare exceptions) so it makes sense for them to be L3 and the bot leaves them as-is. I agree with Ben above, just repeat the L4 Inflection table multiple times so that inflections are always in the same place relative to the definitions across all entries and all languages. If that's really super ugly, then -sche's idea works too, although as a user I'd be annoyed to open a collapsed section just to find a reference to a different collapsed section. JeffDoozan (talk) 23:27, 14 June 2023 (UTC)[reply]

@JeffDoozan Slightly off-topic (but this thread has reminded me): does the bot know how to handle situations like barc, where a mutated form is homonymous with a different lemma? Theknightwho (talk) 11:12, 15 June 2023 (UTC)[reply]

@Theknightwho: The bot doesn't have any special insight into the contents of the two mutation sections in barc, it just knows that if an Etymology contains both POS and Mutations subsections, Mutations should be sorted after the POS sections. Since barc has two Etymologies each with a POS and Mutation, the bot is happy and doesn't need to make any changes. JeffDoozan (talk) 19:35, 15 June 2023 (UTC)[reply]
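The ordering rule described above can be sketched roughly as follows. This is only an illustration, not AutoDooz's actual code: the function name, the `POS_HEADERS` set, and the exact heading text "Mutation" are all assumptions made for the example.

```python
# Hypothetical set of part-of-speech headings; the real bot's list is not given in this thread.
POS_HEADERS = {"Noun", "Verb", "Adjective", "Participle", "Adverb", "Proper noun"}


def sort_mutation_after_pos(subsections):
    """Return subsection titles with any 'Mutation' heading placed after the last POS heading.

    The list is returned unchanged when it lacks a POS heading or a 'Mutation'
    heading, or when 'Mutation' already follows every POS heading.
    """
    pos_indexes = [i for i, title in enumerate(subsections) if title in POS_HEADERS]
    if not pos_indexes or "Mutation" not in subsections:
        return list(subsections)
    last_pos = pos_indexes[-1]
    head = subsections[: last_pos + 1]
    # Keep everything up to the last POS heading in order, minus misplaced Mutation headings.
    kept = [title for title in head if title != "Mutation"]
    moved = [title for title in head if title == "Mutation"]
    return kept + moved + list(subsections[last_pos + 1 :])
```

Under this sketch, an etymology section that already lists its POS heading before its Mutation heading (as each etymology at barc does) comes back unchanged, so no edit would be made.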

He's back

Discussion moved from Wiktionary talk:Administrators.

Aside from the needless personal attack/snark (which he defends as due to my "bullying" no one in particular with my nonexistent power), this revert doesn't seem remotely accurate even for Wiktionary, let alone English dictionaries as a whole. Even if the OED is musty, cf is bog standard. If TKW isn't going to get any better and is now going to go out of his way to "correct" "mistakes" like this when he's bored, any hope of any other admin having a talk with the guy?

I get everyone is still annoyed by my sense of humor with the bad etymology, but c'mon. (Here per this content but as always feel free to move this around. Beer Parlor seemed like more general discussion than admin/dispute resolution, though. Maybe that just needs to be clarified on the Help page if it's really the default.) — LlywelynII 07:52, 15 June 2023 (UTC)[reply]

@LlywelynII Changing "cf." to "compare" with the edit summary "please don't use opaque abbreviations that no other entries contain." is a completely normal and reasonable thing for an admin to do. If that enrages you to the point where you have to start a new thread about me, then that suggests you are the problem. You also reinstated my change one minute after reverting me, which completely undermines your reason for creating this thread in the first place.

To be quite frank, I consider you starting this thread to be proof positive that you have some kind of personal vendetta: you created this thread as your first response, without even a cursory attempt to discuss the issue. Theknightwho (talk) 08:08, 15 June 2023 (UTC)[reply]

I also found the wording of that edit summary misguided, but it seems like overkill to bring it to the Beer parlour. What are we supposed to discuss? Do you want to create a formal policy about whether it's acceptable to use abbreviations like Cf., or is it a complaint about how it was handled rather than the object level issue? However, while I don't find this one incident heinous, I am bothered by TKW's apparent inability to refrain from vehemently denouncing the other party when responding to complaints like this. If there are relevant extenuating facts that are not apparent, why not just state them in a cooler tone and leave it at that? I get that it's not enjoyable to feel like you're under attack, but please put some trust in the ability of members of this site to use our own brains. It's not like we could never consider the possibility of other users having a vendetta against you unless you yourself point it out. It's unbecoming of a moderator to continually fan the flames of conflict rather than putting them out.--Urszag (talk) 08:57, 15 June 2023 (UTC)[reply]

@Urszag I take your point, and it's not unreasonable. However, how many times does this need to happen before something actually gets done about it? This was an entirely normal request, regardless of whether you agree with the specifics (though the ratio of "cf." to "compare" is about 3%, so I feel it was a justified request). Theknightwho (talk) 09:35, 15 June 2023 (UTC)[reply]

I don’t take issue at all with the substance of the edit, but I feel like it wouldn’t have hurt to have used more neutral wording in the edit summary (something like “Replacing abbreviation ‘Cf.’ with the less opaque synonym ‘Compare’”). The request for LlywelynII to abstain from using ‘Cf’ in future edits actually does seem unwise and unnecessary to me. It's fairly foreseeable that he might react badly to a demand from you phrased in that blunt (and inaccurate) wording. And unless there’s an established community consensus against using ‘Cf.’ this way (there could be, I haven’t checked--if so, it would have been better to have cited it when making the request) it’s not really great to insist on having your way in matters like this, rather than letting it go as a difference of opinion. If the prospect of LlywelynII continuing to use ‘Cf.’ in future edits struck you as truly intolerable, you could have started a Beer Parlour discussion about the issue (rather than the user) yourself, or proposed a vote on it. Even if it might not have made a difference in this case, given the preexisting bad blood between you two, the use of more conciliatory and less demanding wording might have made this come across as more of a nothingburger than it already is.--Urszag (talk) 10:35, 15 June 2023 (UTC)[reply]

Yep. @Urszag if any of this can sink in, then, yes, that would be all I was originally asking for. It's fine if there was a decision somewhere but the revert and attack were uncalled for and the commentary simply wrong. To any extent that it was intended to be unfair... well, I'd appreciate it if we simply didn't interact at all any more but obviously mods need mod. TKW just needs to handle it better, which you're helping him with now hopefully. — LlywelynII 13:05, 15 June 2023 (UTC)[reply]

There was no revert - I just changed a word. The "attack" is totally imaginary, too - "please do not do X" is pretty standard in admin edit summaries. Theknightwho (talk) 13:22, 15 June 2023 (UTC)[reply]

I much prefer more neutral wording than "please don't do X". You may not consider it an attack, but it's perceived as such by many, in part due to the limitations of the written mode of communication. Being circumspect about your phrasing prevents many hurt feelings and the occasional drama. —Caoimhin ceallach (talk) 17:00, 15 June 2023 (UTC)[reply]

Anyone who reads "please don't do X" as a personal attack should log off and maybe consider touching grass... Ioaxxere (talk) 23:27, 15 June 2023 (UTC)[reply]

Maybe I'm currently lying on the grass editing wiktionary. What now? Seriously though, a plain imperative can come across as a rebuke much faster online than in face to face communication. —Caoimhin ceallach (talk) 11:35, 16 June 2023 (UTC)[reply]

Nothing TKW did can be described as a revert. Reverting is changing a page back to a previous revision. The only revert on that page was your (temporary) reversal of TKW's edit. Since this is a wiki, we should all get used to other people editing our work, and not reflexively undo their edits. There is nothing wrong with making an edit to replace "Cf." with "Compare".--Urszag (talk) 21:24, 15 June 2023 (UTC)[reply]

Putting an untruth in a remonstrance is not a good idea. (3%, if distinct from 90%, is more than none.) It tends to enrage the recipient. --RichardW57m (talk) 10:44, 15 June 2023 (UTC)[reply]

It may be very hard sometimes to understand these kinds of abbreviations, even for people who know English very well. Just yesterday I met a language guy who tried to read a very advanced dictionary of his language, and he needed another dictionary to understand the abbreviations in it, even though some of them seemed fairly standard in some linguistic circles. Might it be a bad idea to use too many abbreviations, especially the less-used ones? I've seen cf before, but it seems much less common than abbreviations like etc and e.g., even if it is nice and short, so maybe it is best not to use it?

From where I stand, I've been used to writing f.ex. instead of e.g., but since I know it isn't as common in English as in my language, I use e.g., so more English-speaking people can understand it and read easily, or just write the whole thing out as for example, so everybody can read it and find it in any dictionary, independent of their language.

Might it be that Llywelyn's milieu uses cf very often, but in general English it is not so common? In that case it could be good to avoid using cf. But is it really such a big deal to argue about?

PS it seems like some other people use cf on Wiktionary, so it ain't no un-usual practise at least on Wiktionary. Tollef Salemann (talk) 10:25, 15 June 2023 (UTC)[reply]

Of course they do. It's absolutely standard although TheDaveRoss isn't wrong that we aren't space constrained. For my own part, it's more an issue of just very much appreciating the other admins finally discussing TKW's unadmin-like behavior until they completely knock it off.

There was a further message on my talk page accusing me of being the one 'with a beef' after this latest nonsense. I'm not perfect. I'm not an admin. I shouldn't be getting full-day blocks for first reverts or continuing harassment from someone you made a mod. I'm sure TKW is useful in some other capacity. Get him off my case and let him shine. — LlywelynII 12:51, 15 June 2023 (UTC)[reply]

I can't find it, but I remember recently reading a discussion where the consensus was that we should use 'compare' rather than 'cf'. --RichardW57m (talk) 10:37, 15 June 2023 (UTC)[reply]

Fair enough if that's true. Kindly link it. — LlywelynII 12:51, 15 June 2023 (UTC)[reply]

That was my intention, and this time I've found it - Bot remove of 'confer' from etymologies, but I may have been misled by the presumption of the topic. @Benwing2 has done some preliminary work. --RichardW57m (talk) 10:24, 16 June 2023 (UTC)[reply]

Wiktionary is not subject to any space constraints* like a paper dictionary is; we have no reason to use any abbreviations of the nature of "e.g.", "cf." and so on. I have been known to replace these abbreviations when I see them. (* Okay, some places do have constraints, like translation tables, but those are the exception.) This, that and the other (talk) 11:37, 15 June 2023 (UTC)[reply]

Cf. is fine, Compare is fine, that edit summary is fine, creating a Beer Parlour discussion over every perceived slight is not fine. Admins are not gods, you are not required to obey their requests if there is no consensus or policy backing them up. If you want to use Cf. go right ahead, if TKW wants to change that to Compare he can do that as well. If one or both of you feel there needs to be some policy standard around which of those are required (here's a hint, there doesn't, both are fine) feel free to create a discussion about that to see if there is a consensus. - TheDaveRoss 12:41, 15 June 2023 (UTC)[reply]

For what it's worth, I didn't. It was moved here. — LlywelynII 12:48, 15 June 2023 (UTC)[reply]

Can we please establish some kind of policy that editors must attempt to resolve issues before raising complaints about other users? I want to reiterate that this was LlywelynII's first response, which is clearly not appropriate by any reasonable measure. Theknightwho (talk) 12:53, 15 June 2023 (UTC)[reply]

It wasn't my response at all and you have started establishing a pattern of unwarranted abuse. I'd just appreciate—if you're going to be a mod—that someone would finally talk to you about acting like one instead of harassing me about standard abbreviations and then gaslighting on my talk page after asking you to, y'know, kindly *stop*. — LlywelynII 12:55, 15 June 2023 (UTC)[reply]

Yes, it was: your response was to (a) revert the change, (b) reinstate it, (c) start this thread. All of that happened with no input from me, so it absolutely was your first response. Please do not attempt to mislead other users. I treated you in exactly the same way I would any other user, and your response has been to make a mountain out of a molehill because of your own personal grievances. Theknightwho (talk) 12:57, 15 June 2023 (UTC)[reply]

I started it elsewhere. There shouldn't've been a revert. The note on the edit should've been neutral. (Not that it matters, but I let your edit stand out of the possibility cf was unhelpful.) The issue OP was complaining about wasn't my 'first response'. Your inability to let any of this go and to misrepresent it speaks to exactly why you need an intervention here. — LlywelynII 13:01, 15 June 2023 (UTC)[reply]

You started this thread at Wiktionary talk:Administrators raising a complaint about me, explicitly mentioned a dispute resolution procedure, and didn't even tag me in it. You are now trying to downplay it because it's an obvious overreaction. Just stop. All I did was tell you not to do something very minor, and you've just blown up! Theknightwho (talk) 13:04, 15 June 2023 (UTC)[reply]

It wasn't a complaint. It was a repeat of my continuing request that someone sit you down and explain that, no, you can't troll as a mod. Sorry it got moved here and became needlessly public. That said, sure, avoiding me altogether would be a pleasant outcome. There are other mods if I'm actually doing anything truly awful. (No, using cf. didn't really count.) — LlywelynII 13:12, 15 June 2023 (UTC)[reply]

I told you to do it because of the reasons others have outlined in this thread, not because I was "trolling" you. The fact you have a problem with me does not mean you get to start making wildly disproportionate accusations as you've just done (gaslighting, trolling, harassment, unwarranted abuse), because it's a pretty major breach of WT:CIVIL. Theknightwho (talk) 13:19, 15 June 2023 (UTC)[reply]

If a complaint thread is started at Wiktionary talk:Administrators that has absolutely nothing to do with the adminship of the complainee, it is IMO entirely appropriate to move it to a more appropriate venue. --Lambiam 09:05, 2 July 2023 (UTC)[reply]

Fwiw, someone did mention there might've been a "recent... discussion" preferring not to use cf. any more. If true, yeah, fine. Kindly link it and stop the personal attacks and hostile commentary as you go about necessary mod business. — LlywelynII 12:55, 15 June 2023 (UTC)[reply]

I don't know where it was discussed, but please note Wiktionary:Todo/unhelpful abbreviations, which has been there since 2010, and which was created expressly to correct the use of such abbreviations. While it's true that TKW hasn't been terribly diplomatic, I seem to remember it taking several years before you learned to humor those poor unfortunates without the benefit of your superior knowledge and follow consensus... Chuck Entz (talk) 14:20, 15 June 2023 (UTC)[reply]

@LlywelynII See WT:Style guide (which says to expand "cf." to "compare") and Wiktionary:Grease pit/2023/February#bot removal of obsolete "confer" from etymologies (where a bot run was done to replace "confer" with "compare" in etymologies). Benwing2 (talk) 18:38, 15 June 2023 (UTC)[reply]

@Equinox This is just the kind of thing I was talking about. I'd already said I would be doing the thing you'd asked and was just talking about this kind of thing as a general solution to the 'problem', so the abuse that followed was unwarranted (at least in that context). Still, yeah, it's your talk page and I'll stay off of it in the future. Apologies for whatever you read between my lines. — LlywelynII 04:20, 18 June 2023 (UTC)[reply]

This is the lamest discussion of the year so far (and there have been plenty of lame threads). Thyself be knowne (talk) 13:03, 15 June 2023 (UTC)[reply]
Nah, the discussion from some other guy about removing TKW entirely from modding was even worse since he was almost entirely wrong on the merits. Here, it's just whether there's a new policy about avoiding any traditional abbreviations, plus general rudeness in edit commentaries. — LlywelynII 13:10, 15 June 2023 (UTC)[reply]
It would have been great if that's what it was about, but you called it "He's back" not "Consensus on common abbreviations (and rudeness in edit commentaries)". Instead of making it about the issue, you made it about the person. Andrew Sheedy (talk) 18:07, 15 June 2023 (UTC)[reply]
But that's exactly what this is: An informal personal request for someone to take the guy aside. Everyone already understands edit commentaries shouldn't be rude. The policy on not harassing users should be pretty solid, as should avoiding them when there's personal history and they've asked to be let alone. It got moved here to a public setting where more formal phrasing would've been more appropriate, sure, but the main point is this particular admin's particular actions and pattern of behavior. — LlywelynII 04:24, 18 June 2023 (UTC)[reply]

Agree we should prefer "compare" over "cf.". We don't need to save space like a paper dictionary. I also dislike LlywelynII's habit of using the pedantic, obsolete "&c." in place of "etc." (Of course I don't think we should write "et cetera" either, since while we don't need to save space, "etc." is the normal, standard everyday form.) Equinox ◑ 18:23, 15 June 2023 (UTC)[reply]

I found "Wiktionary:Style guide#Abbreviations", but this is not formal policy. — Sgconlaw (talk) 18:35, 15 June 2023 (UTC)[reply]

I don't think a formal policy should be required. It should, however, be standard practice to write reasonably current 21st-century English in etymologies, usage notes, etc. and not to use forms that are measurably stilted and obsolete. We have enough trouble with the "archaists" as it is. Equinox ◑ 01:24, 17 June 2023 (UTC)[reply]

Again, though, cf. is bog standard and in the OED. Avoiding it—if necessary—should just be a stated policy. Otherwise you end up with mistaken/gaslighting harassment ("no one ever..." &c.) like what happened here. I get I'm unsympathetic in your eyes but it's still not good to have secret or arbitrarily enforced policies. [edit: That said, Benwing seems to link to some above. I'll read through the Style Guide again since I obviously missed that last time.] — LlywelynII 04:15, 18 June 2023 (UTC)[reply]

I think that I speak for everybody here when I say that Wiktionary was a wonderful place, almost utopian, right up until TKW had to come along and ruin everything for everyone. Now there isn’t a single edit that anybody can make without the fear of invoking his wrath. Mark my words: it’s only a matter of time before he enslaves us all, whipping our half‐naked bodies in the scorching hot sun as we’re forced to make edits, templates, and entries for him under his yokes and chains. We must put a stop to this knightly tyrant before it’s too late!!!!!! —(((Romanophile))) ♞ (contributions) 19:42, 15 June 2023 (UTC)[reply]

I think we have had one or two similarly abrasive or prickly contributors before, but they are all gone now. DCDuring (talk) 21:23, 15 June 2023 (UTC)[reply]

I, for one, welcome any half-naked whipping Feel more heavy (talk) 21:27, 15 June 2023 (UTC)[reply]

Well, Equinox asked me not to discuss the syn template further on their talk page (overly rudely given that I was just asking about his reasoning and the actual policy behind it but, hey, it's their talk page and they're still annoyed at me for a previous thing over an etym I got cranky about), but I'll just link and document here that I politely asked TKW to just stop bothering/following/attacking me and he couldn't even do it for the length of that thread on a separate admin's talk page. Roman might have been sarcastic that the TKWs we have with us always but it's actually fairly accurate at least in my experience. The entirely unwarranted personal abuse and inability for the guy to control himself really does all on its own make coming here incredibly unpleasant. — LlywelynII 04:09, 18 June 2023 (UTC)[reply]

There has been no personal abuse: you are slandering me at this point. You don’t get a free pass just because you complain about me a lot. Theknightwho (talk) 17:26, 18 June 2023 (UTC)[reply]

Why is [Biblical] Aramaic unqualified?

Aramaic isn't a single language, but rather a family of languages spoken to this day. The current practice regarding Aramaic looks strange, though, in that all varieties in the family receive a qualifier or an alternative name (e.g., Samaritan Aramaic, Imperial Aramaic, Classical Syriac, Assyrian Neo-Aramaic) except for Biblical Aramaic (aka Jewish Babylonian Aramaic), which receives no qualifier and is simply called "Aramaic". I find it a little strange that this variety, in Hebrew square script, was chosen, arbitrarily as far as I can tell, to be the type species (cf. in גברא where it functions as a sort of 'hub') of the whole family, though, especially since its 1773 entries are fewer than those of CS (2535) or ANA (2008), and it's a dead language, too. Normally what I've seen on Wiktionary is that ancient, no-longer-spoken languages with a homonymous descendant receive some qualifier (often "Old" or "Ancient"), which would make Aramaic a conspicuous exception.

Tangential to this is when a Biblical Aramaic form in square script is given as the ancestor to a Classical Syriac form (e.g. *yad-) when a more proper ancestor would be Imperial Aramaic, instead: BA was the Hebrew-influenced variety used by a specific ethnolinguistic group of a specific place (and has no descendants to my knowledge), whereas IA was the lingua franca for the whole Near East at the time.

I think a structure where IA is taken to be ancestral to the other varieties (or at least standing as if it were ancestral; cf. Classical Latin) makes more sense, and this would then allow words to be linked across scripts and longer time depths via etymological relationships and descendant forms, similar to how Romance cognates all congregate under the Latin entry's Descendants section. RagingPichu (talk) 15:58, 16 June 2023 (UTC)[reply]

@RagingPichu Maybe the Aramaic and/or Hebrew editors can comment: (Notifying 334a, Rhemmiel, Shuraya, ZxxZxxZ, Ruakh, Qehath, Mnemosientje, Isaacmayer9, Metaknowledge, Sartma): I'm sure there were prior discussions on this. BTW this is more of a Beer Parlour discussion, maybe you can move it there. Benwing2 (talk) 21:58, 16 June 2023 (UTC)[reply]

I'm surprised by your phrase "Biblical Aramaic (aka Jewish Babylonian Aramaic)"; in my experience those refer to two different things. (See, for example, the Wikipedia articles Biblical Aramaic and Jewish Babylonian Aramaic.)

I doubt that anyone intended to have "Aramaic" mean specifically Biblical Aramaic and/or Jewish Babylonian Aramaic, but it seems possible en.wikt has come to use "Aramaic" that way as a result of different editors doing different things. If so, then I agree with you that that's a problem, and should be fixed.

(Note: I say "possible" and "if so" because, with my very limited knowledge of Aramaic, I can't tell if that's actually the case. Like, for example, you say that at Reconstruction:Proto-Semitic/yad-, "a Biblical Aramaic form in square script is given as the ancestor to a Classical Syriac form", but of course the page itself doesn't say that that's what it's doing, so I'm not sure how you know. Are you saying that the forms יְדָא (yəḏā) and יַד (yaḏ) don't exist outside of Biblical Aramaic? Or are you making a supposition that the person who put it there was thinking of Biblical Aramaic (as evidenced by their use of the square script)? Or something else? Not that I particularly think you're mistaken, just that it's not obvious to me one way or the other.)

—Ruakh_TALK 23:26, 16 June 2023 (UTC)[reply]

You're right regarding JBA and BA, I accidentally confused myself. They're not very well distinguished here in my experience, which exacerbated my confusion and points out the problem more. Regarding my comment on *yad-, I'm saying the latter. I've anecdotally noticed a tendency to conflate IA and BA here and elsewhere, which is not correct; I was trying to demonstrate an example of BA being used as a substitute for IA. IA has 𐡉𐡃 (yd) if memory serves; no idea about the vocalism. RagingPichu (talk) 06:08, 17 June 2023 (UTC)[reply]

@RagingPichu: You need to be careful to distinguish the script used for the entry names from the language. Most of the scripts used for Aramaic are rather similar in composition to the Phoenician script they all came from, so it's not difficult to switch the letters from one script to the other without changing the content. My impression is that use of the Hebrew square letters is just a convention that arose because at one time it was a script that a lot of the scholars who dealt with Aramaic as an academic subject were familiar and comfortable with. As for Wiktionary's practice, look at the entry for Aramaic 𐡂𐡁𐡓𐡀 (gbrʾ): as you can see, it's a full entry with a Descendants section showing Jewish Babylonian Aramaic under Eastern Aramaic and Biblical Aramaic under Western Aramaic. The main problem seems to be more lack of consistency as to which script to use for the main entry than of privileging one historical Aramaic language over the others. Chuck Entz (talk) 00:14, 17 June 2023 (UTC)[reply]

I see. It would be nice, though, if it were made consistent. RagingPichu (talk) 06:09, 17 June 2023 (UTC)[reply]

Can an "until..." phrase be a conjunction?

Please see User_talk:Equinox#until_hell_freezes_over. I am pretty much certain this is a PP but the uncharitable DCDuring seems to think I was drunk. @DCDuring. As I wrote there, I gather that the difference is that a conjunction is connecting things of equal grammatical "depth" (I can't remember the correct terms) whereas "until" is creating a subclause of some kind. Grammar nerds please confirm/deny. Equinox ◑ 01:22, 17 June 2023 (UTC)[reply]

(Chuck Entz has just drawn my attention to the difference between coordinator and subordinator. But if "until hell freezes over" is to be interpreted as using one of these, then what would the PoS be? Not Prepositional phrase but ... Conjunctional phrase? Or the Adverb wastebasket again?) Equinox ◑ 01:27, 17 June 2023 (UTC)[reply]

Functionally it is an adverb, but the appropriate wastebasket PoS is IMHO "phrase". DCDuring (talk) 15:22, 17 June 2023 (UTC)[reply]

until is a (subordinating) conjunction here and until hell freezes over is a subordinate clause. Benwing2 (talk) 18:09, 17 June 2023 (UTC)[reply]

However User:DCDuring is right that we don't have a POS "clause" or "subordinate clause" so we're reduced to calling it a "phrase". Benwing2 (talk) 18:10, 17 June 2023 (UTC)[reply]

Would we call something a clause only if it included a subordinating conjunction, or only if it did not, or would we call both clauses? I think we would have to exclude the conjunction or include all the attestable collocations with subordinating conjunctions like when, after, if, till, that.

Also freezes is not the only verb form for freeze: eg. "I would see hell freeze over before ....", "Has hell frozen over?", "He regarded hell freezing over as an unlikely event." I don't think these are exclusively literary.

I conclude that hell freezes over is a better location for the lemmas, though it would need a good number of redirects and usage examples, about the same number that the current entry should have. DCDuring (talk) 18:41, 17 June 2023 (UTC)[reply]

FWIW, modern linguists of English categorize until as a preposition whose complement is the clause/sentence "hell freezes over". Traditional/school grammar, however, doesn't accept the notion of a preposition whose complement is a clause/sentence, and instead says that until is a "preposition" when its complement is a noun phrase and a "subordinating conjunction" when its complement is a clause. (Not sure how it would handle a case like "until recently".) Our entry, [[until#English]], takes the latter approach (as do entries in most dictionaries).

Likewise, modern linguists of English would call until hell freezes over a "prepositional phrase" or "preposition phrase", whereas traditional/school grammar would call it an "adverbial clause" or "adverb clause".

—Ruakh_TALK 19:04, 17 June 2023 (UTC)[reply]

@Ruakh Just curious, which modern linguists are you referring to? Is this across the board, or specifically Chomskyan? Benwing2 (talk) 22:59, 17 June 2023 (UTC)[reply]

I'm not well-informed enough to say "across the board" (and certainly it's not "across the board" if we count dictionaries and grade-school grammar textbooks as linguistic works), but it's not specifically Chomskyan. Examples include Jespersen's 1924 Philosophy of Grammar [page 89] (which is the source that newer sources point to), McCawley's 1998 Syntactic Phenomena of English, and Huddleston and Pullum's 2002 Cambridge Grammar of the English Language (CGEL). McCawley's book is Chomskyan, but CGEL is not, and Philosophy of Grammar was published before Chomsky was even born. —Ruakh_TALK 01:16, 18 June 2023 (UTC)[reply]

pectinate(d)

Should pectinate and pectinated be merged, or cross-linked? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:59, 17 June 2023 (UTC)[reply]

@Pigsonthewing: There are loads of botany/biology terms where ...e/...ed are synonymous. Of course the pronunciations are different so I don't like to call it "spelling variation": only one of them ends with a /d/ sound. But I do love to dedupe because DRY. You may either make one "alternative form" of the other, or "synonym of". It doesn't really matter. Thanks. Equinox ◑ 11:05, 17 June 2023 (UTC)[reply]

Done I just merged everything into one entry, because it's frankly easier. I have no doubt we could cite all senses in either form. If you doubt it, bring the cites. Thanks. Equinox ◑ 11:07, 17 June 2023 (UTC)[reply]

I dimly recall encountering pectinated in recipes to mean "has had pectin added to it" -- presumably as the past tense of to pectinate. See also google:"pectinated" recipe (you might have to click the extra link to tell Google that "yes, really, I used the damn quotes to search specifically for the fucking quoted term, you imbecilic POS AI..." [apologies for my frustration]). ‑‑ Eiríkr Útlendi │^{Tala við mig} 18:50, 20 June 2023 (UTC)[reply]

I've added the relevant sense. There are ample quotes that could be plucked from various sources, but I'm really not up to speed at EN entry and quote format, so I must ask that others fill in any quotes or other needed details. ‑‑ Eiríkr Útlendi │^{Tala við mig} 17:42, 11 July 2023 (UTC)[reply]

Do we include digraphs as well as letters?

It's not always clear in the orthography of a particular language whether an un-analyzable sequence of two letters counts as a letter or as a digraph, and which it is may vary regionally or over time (e.g. Spanish ch). I see a number of digraphs on Wikt, but some obvious gaps -- such as English. This would seem to be something we should include. E.g. the OED not only has an entry for the English digraph ch, but has a section break before it so that it's more salient than word entries. (The entry starts, "CH, a consonantal digraph, which in various languages (e.g. Welsh, Spanish, Bohemian) is treated as a distinct letter, placed in the Alphabet after C. In English it is not so treated formally, but in its characteristics and proper sound (tʃ) which it has in all native words, it practically adds an additional symbol to the alphabet.")

It goes on like that for half a page.

Given that the OED should, IMO, be a model for how to create Wikt., I would think we should include articles on digraphs. However, the fact that we don't have them for English makes me wonder if there's been consensus against this. If we do include them, would the POS heading be 'Letter'? kwami (talk) 17:50, 17 June 2023 (UTC)[reply]

I would say we should not, unless a given language considers a particular digraph a single letter (confer Hungarian orthography). However, the vast majority of languages do not, and in such cases I think we should not include them. Vininn126 (talk) 17:52, 17 June 2023 (UTC)[reply]

What of Spanish, where 'ch' is or is not a letter depending on the country and era? kwami (talk) 18:18, 17 June 2023 (UTC)[reply]

For Spanish I'd create a letter entry with the label {{lb|obsolete}}. I honestly think what we currently have is fine, though. Otherwise, I generally agree with Vininn, if it's not a part of the language's alphabet, I wouldn't put it under the "Letter" L2 (though I disagree that it's not a vast majority of languages). It should be under a different header imho, if anything (like "Symbol"). AG202 (talk) 18:33, 17 June 2023 (UTC)[reply]

I think there's possibly value in it when it creates a distinctive phoneme - for example, I’d be okay with English entries at th or ch. Quite a lot of Caucasian languages blur the line here, especially when you get into the dialects: Adyghe “officially” has х (x), хъ (χ) & хъу (χʷu) but not ху (xʷu) as multigraphs in its alphabet, even though the latter exists dialectally, and it’s trivial to see where it would go if it were to be included. Theknightwho (talk) 17:22, 18 June 2023 (UTC)[reply]

@Theknightwho do you understand the difference between alphabets and phonemes? I agree with Vininn, let's not add digraphs unless someone wants to create a separate template specifically for digraphs and trigraphs. But why stop there? We could go further and create separate templates even for tetragraphs in German or Irish. Or how would you treat "gh," which in English can be pronounced as /f/ or can be silent in diphthongs, huh??? And "Symbol" isn't specific enough. Shumkichi (talk) 15:30, 19 June 2023 (UTC)[reply]

Also, foreign clusters like "ps", "chth", "ks" etc. in English, when they are at the beginning of words, can also be considered blablablagraphs as they technically stand for single phones. Would you add them too? Shumkichi (talk) 15:35, 19 June 2023 (UTC)[reply]

@Shumkichi: I think you mean 'ugh' rather than 'gh'. In the name of the artist, habit effectively leads to 'van Gogh' being misread as 'van Gough', whence the British pronunciation. --RichardW57m (talk) 15:42, 19 June 2023 (UTC)[reply]

or it can be analysed more simply as ou + gh and au + gh, since they usually stand for different clusters? anyway, still, let's not add these multigraphs when they aren't counted as individual letters in an alphabet (they are in languages like Czech, Slovak, or Hungarian though but I think they're already added). Shumkichi (talk) 15:48, 19 June 2023 (UTC)[reply]

If there is a tradition of calling it a letter, then I think 'letter' is the appropriate part of speech.

For other combinations, something like 'digraph' would be better, but I'm not sure of the best term. Thinking of German, what would we do with 'sch' (which I think has some standing as a heading in tabulation) and 'tsch'? There's also an issue of how we count characters if we distinguish 'digraph' and 'trigraph' - for Tai languages in the Lanna script, it would be somewhat off to count Northern Thai ᩉᩖ (/⁠l^H⁠/) as a digraph but its despised alternative ᩉ᩠ᩃ (/⁠l^H⁠/) as a trigraph just because the latter uses three characters to express its two graphemes. However, I can probably use published abecedaries to promote these two sequences to letters. --RichardW57m (talk) 10:26, 19 June 2023 (UTC)[reply]

"Multigraph" is the generic term, so we could use that for ambiguous cases, or possibly for anything other than a clear-cut digraph.

But in your Lanna example, ᩉ᩠ᩃ would still be a digraph, because we count letters, not characters. The joiner isn't visible in print and so is irrelevant for anything but encoding.

That's assuming the sound value of ᩉ᩠ᩃ is not predictable from its components. If it is, then it's just a letter sequence, not a multigraph. kwami (talk) 10:32, 19 June 2023 (UTC)[reply]

@Kwami: Multigraph in this sense is a new word to me, and I'm tempted to RfV it. --RichardW57m (talk) 11:33, 19 June 2023 (UTC)[reply]

Ah, but is the second character in the first Lanna script example a letter? Formally, in Unicode, it's a combining mark, as in the Lao script. I think it's natively regarded as a letter, but I could be wrong. Another Northern Thai example, which gets a dictionary entry, is ᨷᩕ (/⁠p^h⁠/), where the partially encircling mark is usually regarded as an aspiration mark, though historically it's just one of two forms of the 'subscript' <r>-letter. In this case, the voicelessness is what is unpredictable. --RichardW57m (talk) 11:33, 19 June 2023 (UTC)[reply]

The formal clusters in the pair above are semi-predictable. Anomalously, it is not the second consonant that is dropped, but rather the first consonant drops apart from its rôle as a tone modifier. This is why they show up in abecedaries. However, this applies to all instances of that first consonant that I am aware of. There is one other consonant that works like that, in about half a dozen words. (We get similar behaviour in Thai and Lao-script Lao, where combination has proceeded to give us ໜ.) --RichardW57m (talk) 11:33, 19 June 2023 (UTC)[reply]

Related to this question, there are things like Vietnamese ố, which are not letters of the alphabet, but are not digraphs either. Unless we delete all those articles, and leave them as red links in the Latin Extended Additional appendix, we'd probably use the POS "letter" even though that's not technically accurate. And if we do that, is there any significant problem with using the POS "letter" for an English digraph? kwami (talk) 10:58, 19 June 2023 (UTC)[reply]

Well, Unicode classifies them as letters. However, an example such as this is an SoP - it's the Vietnamese ô (letter) plus tone mark ◌́ --RichardW57m (talk) 11:45, 19 June 2023 (UTC)[reply]
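The sum-of-parts reading can be checked against the Unicode Character Database. Below is a minimal sketch using Python's standard unicodedata module; the helper name one_step is made up for illustration and is not part of any Wiktionary tooling. It reads a character's canonical mapping one step at a time, which for ố gives the letter ô plus a combining tone mark, whereas full NFD normalization goes further and splits the circumflex off the base letter as well.

```python
import unicodedata

def one_step(ch):
    """Return the single-step canonical decomposition of ch as a list of characters.

    Hypothetical helper for illustration: returns [ch] when the character has
    no decomposition or only a compatibility one (tagged like "<compat>").
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping or mapping.startswith("<"):
        return [ch]
    return [chr(int(cp, 16)) for cp in mapping.split()]

o_with_tone = "\u1ED1"  # Vietnamese ố as a single precomposed code point

# One step: the letter ô followed by the combining tone mark.
for part in one_step(o_with_tone):
    print(f"U+{ord(part):04X}", unicodedata.name(part))

# Full canonical decomposition (NFD): the circumflex is split off too.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", o_with_tone)])
```

The one-step view matches the "letter plus tone mark" analysis, while NFD is what software sees after full decomposition; which of the two is linguistically relevant is the point under discussion here, and the sketch does not settle it.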

"Letter" in Unicode is an indication of behaviour. It's not necessarily a "letter" in the non-Unicode world. kwami (talk) 11:47, 19 June 2023 (UTC)[reply]

I think a good indication of "letter" (not the only, mind you) is how things are standardly alphabetized within a language. Vininn126 (talk) 11:48, 19 June 2023 (UTC)[reply]

Aren't you rather assuming standards, and at that, standards that are followed? One can't even rely on dictionaries being internally consistent! As an aside, Thai dictionaries follow a standard ordering, but it's not quite what most Thais expect! --RichardW57m (talk) 12:05, 19 June 2023 (UTC)[reply]

In this case I mean something that the majority of speakers use - standard as in widespread, i.e. the vast majority of speakers do XYZ. Vininn126 (talk) 12:07, 19 June 2023 (UTC)[reply]

I would say that a scheme understood by 20% of the population counts as significant. Despite its diversity of systems, I think we can extract the sometimes discontiguous compound vowels of Lao as letters. I am not sure about Thai - most Thais more or less learn the order of the compound vowel symbols, but foreigners only need to learn the order of their constituents! --RichardW57m (talk) 13:37, 19 June 2023 (UTC)[reply]

Although such letters are sums of parts, I think we should make an exception because of form NFC. It may not be easy for a user to decompose them, unlike normal sums of parts. Whether we then need a different part of speech is another question. We already have 'ligature', and characters like these are like ligatures, but in the backing store rather than on paper. --RichardW57m (talk) 13:44, 19 June 2023 (UTC)[reply]

I don’t agree - Unicode form NFC exists because it was necessary to encode certain characters with diacritics in a precomposed format in order to maintain one-to-one correspondence with tons of pre-existing encoding standards. i.e. it was done for backwards compatibility, to ease the conversion to Unicode. It doesn’t make them any more or less linguistically interesting than all the letters with diacritics that weren’t encoded that way, and if Unicode were being created from scratch today I suspect they wouldn’t be included at all. In short, I just don’t think it has any relevance to a dictionary. Theknightwho (talk) 18:28, 28 June 2023 (UTC)[reply]

There were complaints from Africa about letters like ɛ ɔ ŋ not being available with pre-composed tone diacritics, unlike similar letters needed for European languages (and even for obscure academic disciplines), but by that time Unicode had decided to stop encoding pre-composed characters. Given the disparity in software support in Europe and Africa, or between e.g. paleography and basic-literacy programs, it should've been the other way around. If the encoding were to be done from scratch today, I wonder if Unicode would even bother with separate capital forms. kwami (talk) 19:00, 28 June 2023 (UTC)[reply]

@Theknightwho: So how do you propose that the average user, who's never even heard of the Unicode Character Database, look up the sequence of units in Vietnamese ố? --RichardW57 (talk) 21:18, 28 June 2023 (UTC)[reply]

@RichardW57 There's nothing wrong with having an entry for it in Vietnamese, but the fact it's encoded as an atomic character in Unicode is completely irrelevant. Theknightwho (talk) 21:31, 28 June 2023 (UTC)[reply]

Agreed. I don't think we should have 'translingual' entries for Unicode characters unless the character is truly international or interlingual apart from being in Unicode, e.g. mathematical notation, punctuation, basic letters of international/inter-ethnic scripts, etc. If they aren't, it's far more useful to give language-specific info, and if we can't provide that, IMO we shouldn't have an article. kwami (talk) 21:38, 28 June 2023 (UTC)[reply]

@Theknightwho: Except that ố isn't a letter of the Vietnamese language; it's a 'backing store ligature', being a combination of a letter and a tone mark. It can be a single character in Unicode, but there's nothing 'atomic' about it. There are editing tools (cursed be their authors) where backwards deletion will take out a base character plus diacritics even when the combination has to be multiple characters in Unicode. --RichardW57 (talk) 23:26, 28 June 2023 (UTC)[reply]

@RichardW57 Either it's of linguistic value or it isn't - the fact that it's encoded atomically in Unicode is irrelevant in either case. Theknightwho (talk) 23:31, 28 June 2023 (UTC)[reply]

How is it less significant than the digraph त्र (tra)? Or is the latter justified because we don't have red-link support for it or Northern Thai ᨬ᩠ᨬ? --RichardW57 (talk) 23:53, 28 June 2023 (UTC)[reply]

Correction: The Devanagari example is labelled as a 'ligature'. --RichardW57 (talk) 00:01, 29 June 2023 (UTC)[reply]

And in some cases it will not be clear whether a compound character is a letter or not, or it may be a separate letter (e.g. alphabetized separately) in some sources but not in others, especially in languages without a strong literary tradition. So I think we should either agree to be lax about the scope of the 'letter' categories, both for cases like ố and for digraphs, or we can come up with additional categories to classify them by. We can be more precise in the section heading so as not to confuse the reader; most readers are only going to see that it's 'letter', 'digraph', 'symbol' etc. and aren't going to care what the auto-generated category is. And since editors already accept that 'lemma' isn't being used precisely (e.g. ǧ is not a lemma as it's not a word, yet it generates a 'lemma' category for every language it's listed under), they should be able to be similarly flexible about 'letter' categories. kwami (talk) 23:46, 28 June 2023 (UTC)[reply]

RichardW57: You will see that in the preview infobox if you follow a red link, but personally I think we should have articles on Unicode characters if we can demonstrate independent use (i.e. it's not a spurious character) and can provide a non-Unicode definition (not just a visual description), even if that's no more than 'a letter in the Fooese alphabet'. But I think we should do the same for composite characters that are not included in Unicode (though maybe not something trivially predictable such as vowel + tone diacritic), and I would extend that to digraphs, which by definition are not SOP. kwami (talk) 21:32, 28 June 2023 (UTC)[reply]

@Kwami: It doesn't work for U+FB4E HEBREW LETTER PE WITH RAFE, I suspect because it gets decomposed into two characters in form NFC. --RichardW57m (talk) 12:07, 30 June 2023 (UTC)[reply]

Odd. And if you backspace on פֿ in the URL, the rafe is deleted, so it would appear that our software is treating it as a composite character, not as a precomposed character. But also if we were to create an article, {character info} would not work: we'd get the error message

Lua error in Module:character_info at line 71: Page title is not a single Unicode character.

So this is a broader issue in how we handle precomposed characters. kwami (talk) 19:57, 30 June 2023 (UTC)[reply]

Unicode defines U+FB4E as

≡ 05E4 [pe] 05BF [rafe]

The ⟨≡⟩ presumably means canonical equivalence. The characters identified with ⟨≡⟩ don't work with either preview or {character info}, while those defined with ⟨≈⟩, such as U+FB4F ﭏ, work just fine. kwami (talk) 20:04, 30 June 2023 (UTC)[reply]

It's not quite as dire as that. {{character info|0xFB4E}} seems to work fine. And most characters identified with ⟨≡⟩, which indeed denotes canonical equivalence in Unicode's code charts, work fine, such as à. The ones that don't are the 'composition exclusions', the 'scalar values' which expand to two or more in NFC (Normal Form Composed) as well as in NFD (Normal Form Decomposed). RichardW57 (talk) 21:53, 30 June 2023 (UTC)[reply]
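The three cases mentioned above (à, U+FB4E and U+FB4F) can be compared with Python's standard unicodedata module. This is only a sketch of the Unicode side of the question: it says nothing about how MediaWiki or Module:character_info actually process page titles, and the helper name survives_nfc is made up for illustration.

```python
import unicodedata

def survives_nfc(ch):
    """True if ch is left unchanged by NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", ch) == ch

samples = {
    "\u00E0": "a with grave: canonical decomposition, recomposed by NFC",
    "\uFB4E": "pe with rafe: canonical decomposition, composition exclusion",
    "\uFB4F": "alef-lamed ligature: compatibility decomposition only",
}

for ch, note in samples.items():
    nfc = unicodedata.normalize("NFC", ch)
    print(
        f"U+{ord(ch):04X}",
        "mapping:", unicodedata.decomposition(ch) or "(none)",
        "| NFC:", " ".join(f"U+{ord(c):04X}" for c in nfc),
        "| survives:", survives_nfc(ch),
        "|", note,
    )
```

If the page-title problem really does come from NFC normalization, as suspected above, a check along these lines would pick out the characters that cannot be the title of a single-character page.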

Latin terms suffixed with -o (n-stem nouns)

There are a number of Latin third-declension nouns with a stem ending in -ōn- (that have -ō in the nominative, -ōnis in the genitive) that appear to be derived in some way from other words. An issue with sorting these into categories is that this ending does not always have the same meaning—in some cases the meaning, aside from its role in marking the word as a noun, is not even easy to identify. Sometimes it forms a derogatory epithet for a person (but often the derogatory sense is contributed to by the meaning of the root as well as the suffix) but not all derived nouns in -ō are derogatory; sometimes it forms a proper noun; sometimes it forms the name of a profession, or something that could be viewed as a type of agent noun (mūliō (“muleteer”), centuriō (“centurion”)) but not all derived nouns in -ō are agent nouns. Currently there is no general category for these nouns; instead, they are split based on their sense. Since 2016, we have had Category:Latin terms suffixed with -o (name) (6 entries currently) and Category:Latin terms suffixed with -o (agent noun) (37 entries currently). In March of this year, I created Category:Latin terms suffixed with -o (inanimate noun) to have a place to put some suffixed nouns that I thought could not be accurately described as agent nouns. But today, I'm thinking it would be better to just replace all three of these with a single category with a more generic name like "Latin terms suffixed with -o (n-stem noun)", since this does not require using any unclear subjective criteria to distinguish between words with formally identical noun-forming suffixes (and I think that etymologically, this is just one suffix). Does anyone object to this or have further thoughts? @Nicodene, LlywelynII Urszag (talk) 08:02, 18 June 2023 (UTC)[reply]

@Urszag I'm sorry. What am I missing as far as what's included and what isn't that "Category:Latin terms suffixed with -o (n-stem noun)" is necessary instead of just "Latin terms suffixed with -o (noun)"? — LlywelynII 08:06, 18 June 2023 (UTC)[reply]

That's certainly fine too. I guess I might wonder whether readers will by default realize that this title is referring to third-declension nouns (rather than e.g. the dative/ablative singular of second-declension nouns), but given the convention on Wiktionary of using the nominative for article names, there is no practical issue of ambiguity. None of the other subcategories of Category:Latin terms suffixed with -o contain nouns.--Urszag (talk) 08:48, 18 June 2023 (UTC)[reply]

I also lean towards merging them, as I don't see the need to split suffix categories by sense. Where precisely the line between the first and second categories lies has never been clear to me, in any case; cf. frontō 'forehead-man' (name) vs. aleō 'dice-man' (agentive?).

The third category that you have labelled 'inanimate noun' is of some diachronic interest, as it is populated entirely by words first attested in Late or Medieval Latin, probably reflecting a shift in the usage of the suffix that presages the Romance -ón, -one, etc. It would, however, be sufficient to indicate this with for instance an {{lb|Late Latin}} for sense 3 on the entry for Latin -ō. Nicodene (talk) 17:36, 18 June 2023 (UTC)[reply]

Good idea, I edited to add that. I will be moving over the nouns to a single merged category now. Something that I just noticed: we currently show dungio as a feminine noun, which would make it an outlier in the category, but all of the descendants (dungun, donjon, dongione) seem to be marked as masculine. The way it's currently formatted, the page appears to cite Niermeyer for the form and gender, but I checked and I don't actually see any gender given by that source. I also didn't see a gender in Wartburg. Do you think it is justifiable to change the gender of dungio to masculine?--Urszag (talk) 10:09, 20 June 2023 (UTC)[reply]

I suspect that the (rather questionable) Frankish etymology was the only reason it was marked as feminine. That would also explain why the specific variant dungiō was lemmatized, contrary to the cited sources.

Considering that it is only attested from the 11th century, per Niermeyer at least, I would relabel it as a borrowing from Romance and set up a reconstruction page per the FEW's and van Osta's *dominiō(nem). Nicodene (talk) 15:29, 20 June 2023 (UTC)[reply]

I just found another similar example that I wonder about, pīpiō f vs. piccione m and pyjon m. As a somewhat separate issue, there are three remaining nouns that seem separate enough to not go in the same category as the rest since they are both feminine and non-personal; consortio, vorago and vertigo. I question whether these are best analyzed as synchronically ending in a suffix -ō in Latin; even if they originally were derived by means of a nasal suffix early on, I'm inclined to think they would have been reanalyzed as ending in -io, -ago and -igo respectively.--Urszag (talk) 10:52, 20 June 2023 (UTC)[reply]

I imagine pipio f was an oversight that went unnoticed.

It seems beyond doubt that consortio represents consors + -io, and I suspect that this too was an oversight.

For vorago and vertigo, I'm inclined to wonder whether their 'surface etymologies' are in fact the real ones. I have not been able to find an attestation of *voraco, incidentally. Nicodene (talk) 15:45, 20 June 2023 (UTC)[reply]

@Urszag: We probably need some policing of the assignments - I just found and fixed the problem that the suffix in the centūriō (noun) had sense ID 'denominative', which is for verbs, instead of 'agent noun'. --RichardW57m (talk) 09:23, 19 June 2023 (UTC)[reply]

What would make sense is to have a merged category, consisting only of the categories of noun with the -ō suffixes. That can be created manually. If we split the suffix by sense ID, and put the sense ID in derivations, then the category split follows automatically. --RichardW57m (talk) 09:23, 19 June 2023 (UTC)[reply]

My proposal is to have one sense ID for the noun suffix -ō, -ōnis, that will distinguish it from e.g. the verb suffix -ō, -āre, but not distinguish between subsenses of the noun suffix. That will make it easier to clean up unassigned and misassigned words. All assignments to categories will of course be made by using the sense ID parameter in affix or suffix template. Unless I misunderstood you, I think we're talking about the same thing.--Urszag (talk) 10:09, 20 June 2023 (UTC)[reply]

No, I don't favour having just one noun suffix. What I suggested is a supercategory containing the nouns formed from the various senses. --RichardW57 (talk) 00:15, 29 June 2023 (UTC)[reply]

OK. What sense distinctions do you propose that we make between masculine nouns formed with the suffix -ō?--Urszag (talk) 00:46, 29 June 2023 (UTC)[reply]

sunshiney

I merged this entry into the entry for the more accepted spelling, sunshiny, then decided to go back and try to find a quotation to put under the -ey spelling. None to be found in literature, journalism, or even a blog; I would label it a misspelling, except that it seems to be common in such uses as product names. Is there a category or procedure for handling spelling variants that show up only in advertising and the like? – 2603:6081:8040:E92C:30EB:5C8E:11EE:13BF 08:10, 18 June 2023 (UTC)[reply]

Product names don't really count as words unless they pass the very rigid WT:BRAND criteria, which basically means only Google, Facebook, etc. get listed. Or did you mean it's a common spelling in use in advertising, like lite? I can see the appeal of the expanded spelling, as sunshiny, while correct, seems to cut off too early. —Soap— 00:44, 19 June 2023 (UTC)[reply]

I meant the latter – lite is the first example that came to my mind as well. I looked up lite, though, and sunshiney isn't really in the same category after all. It's hard to do research on sunshiney because Google results consist almost entirely of sites selling dresses and the like. However, judging from the fact that it doesn't even appear in more than two or three of a couple of dozen online and print dictionaries I glanced at, I suspect that originally it was an unintentional error that has caught on in the world of marketing, Pinterest, and so forth. It also shows up a lot in non-scholarly references to song titles and lyrics. So again I'm wondering if Wiktionary has a good bin into which we can put words that are no longer arrant "misspellings" but that still haven't attained widespread usage. 2603:6081:8040:E92C:30EB:5C8E:11EE:13BF 01:36, 19 June 2023 (UTC)[reply]

I don't know of any category we could put this word in that stands in the middle between a misspelling and an alternative form. My personal instinct is to keep it as an alternative form, however, as I suspect the people using this spelling are using it on purpose because it has sunshine in it, and is easily recognizable as sunshine + y, whereas the standard spelling looks at first glance like sun + shiny, which doesn't have quite the same happy feel to it. There may be subtle mindplay behind lite as well, such that one could write a book on it. —Soap— 12:56, 24 June 2023 (UTC)[reply]

Question about the use of the `|year=` parameter in `{{quote-book}}`

Hello, everyone
Among the older texts I usually employ for quotations, there is the Nuova Cronica. Now, the work is divided into 13 books: the first 11 were written by Giovanni Villani, the remaining 2 by his relatives Matteo (Giovanni's brother) and Filippo Villani (Matteo's son). The first edition (dating back to 1537, despite Giovanni Villani's death occurring in 1348) only included the first 10 books, while the one including all 11 books written by Giovanni was only published in 1559. The first edition of the version including the additions by Matteo and Filippo dates back to 1729. The Wikisource version I use for quotations, published in 1991, includes all 13 books.
Now, my question is: when using {{quote-book}}, what should I use as the year of publication of the original work? Should I use the year of the complete version, since the text I'm referencing is based on that? Should I use the year of the very first edition (and, in that case, is the fact that there are almost 200 years between the composition and the publication problematic?) Should I use the year of Filippo Villani's death (1407), as he's the last person to have worked on the text (despite it not being published for another 130 years)? I'm really unsure which one is to be considered the original work here.
Any input on the matter is most welcome. — GianWiki (talk) 12:03, 19 June 2023 (UTC)[reply]

I would go with the earliest possible date, since the language has changed over time, and post-dating it gives a false impression. To make an analogy, if somehow a lost Shakespeare manuscript had turned up in 1757, I would strongly oppose dating quotes to it from 1757, because it wouldn't represent the English language as it was spoken in 1757. This work was probably written in the early 1300s, so I would hope we could represent that date with something like c. 1325 in our quotes, not with the date of publication. But I'm not aware of any rules for or against this, so I will defer to others' answers. —Soap— 12:40, 19 June 2023 (UTC)[reply]

I agree generally with this sentiment, although it is important to recognize when something was edited much later than it was written, because often editorial changes are material, e.g. when a spelling changes over time and the editor updates the spelling for a modern spelling, if we then back-date that quote to the date originally written it gives the false impression that a newer spelling is older. There is some amount of discretion which needs to be applied here. - TheDaveRoss 13:25, 19 June 2023 (UTC)[reply]

Maybe there could be some indication of this through the use of the |other= parameter?

Something like:

#* {{quote-book|it|year=c. 1395|author=Giovanni Villani|authorlink=Giovanni Villani|title={{w|Nuova Cronica}}|trans-title=New Chronicle|url=https://it.wikisource.org/wiki/Nuova_Cronica|editor=Giovanni Porta|publisher=Ugo Guanda|year_published=1991|other=first published 16<sup>th</sup> century|chapter=Book, chapter|text=text|t=translation}}

- c. 1395, Giovanni Villani, “Book, chapter”, in Giovanni Porta, editor, Nuova Cronica [New Chronicle]‎^[5], Ugo Guanda, published 1991, first published 16^th century:
  text
  translation

Or through the use of |origyear=:

#* {{quote-book|it|year=16<sup>th</sup> century|author=Giovanni Villani|authorlink=Giovanni Villani|title={{w|Nuova Cronica}}|trans-title=New Chronicle|url=https://it.wikisource.org/wiki/Nuova_Cronica|editor=Giovanni Porta|publisher=Ugo Guanda|year_published=1991|origyear=c. 1395|chapter=Book, chapter|text=text|t=translation}}

- 16^th century [c. 1395], Giovanni Villani, “Book, chapter”, in Giovanni Porta, editor, Nuova Cronica [New Chronicle]‎^[6], Ugo Guanda, published 1991:
  text
  translation

What do you think? — GianWiki (talk) 13:49, 19 June 2023 (UTC)[reply]

Information on the work is a bit more confused than I had initially assessed. Italian Wikipedia says that Giovanni wrote eleven books, and Matteo later wrote eleven more (the work only has thirteen books). English Wikipedia's entry on Filippo Villani says “[t]he second edition of Villani's histories came out in either 1395 or 1396.” Perhaps c. 1395 could be used for good measure? — GianWiki (talk) 13:33, 19 June 2023 (UTC)[reply]

Are you aware of the parameter |newversion=? It can also be assigned a value implicitly. It allows a parameter |year2= for what you are actually quoting. I understand the policy to be to date works to their first publication. However, if the original manuscript exists, as Wikipedia seems to suggest, you can use its date for the sections it covers. Normally, we use the date of publication as the date of origin in |year=, so I end up with quotations like the one in anupādā-aparitassanā, which is formatted by the call of {{cite-book}} in Template:RQ:pi:PTS Samyutta Nikaya 3. I've chosen the original date of 'c. 50 BC' because the Tipitaka is reported to have been written down in the first century BC (two widely different dates are deduced from the various records), though parts are centuries older and the whole text is strongly suspected to have had its grammar 'corrected' over a millennium later. I hope my experience helps. --RichardW57m (talk) 12:42, 19 June 2023 (UTC)[reply]

@GianWiki: as expressed by others above, |year= should be used to indicate the original year of publication. If required, I have used |month= to qualify that year; for example, |month=(date written) or |month=(first performance). You can then use |year_published= or the second set of parameters (i.e., |title2=, |year2=, etc.) to specify the year of publication of the actual work quoted from. If quoting from a version of the text that has been modified from the original, I generally add |footer={{small|The spelling has been modernized.}}. — Sgconlaw (talk) 13:36, 19 June 2023 (UTC)[reply]

Romanesco

I'm not sure I understand what the Wiktionary policy on Romanesco is. I wasn't able to find any actual guidelines on the subject.
I see the masculine singular definite article er/'r was added, and I wanted to second this as-of-yet-unanswered question. How much room is there for the creation of purely-Romanesco entries?
Thanks for any input. — GianWiki (talk) 12:58, 19 June 2023 (UTC)[reply]

Hi @GianWiki. Of course all terms in all dialects should be created whenever possible. While contributing to Romanesco (but I believe the rules apply to other central dialects we keep under ==Italian== as well) I've been minorly normalising and tried to more or less adapt where possible. See for example imbruttire for 'mbruttì, fargliela for fajela, fusaglie for fusaje, etc. (an exception is vordì, because it fossilised enough in that shape). Aside from the obvious differences, Romanesco is grammatico-morphologically close enough to the standard to not require much particular policy. The question becomes more tricky when we have to deal with Sabine dialects. Catonif (talk) 13:52, 19 June 2023 (UTC)[reply]

I can honestly get behind the /j/ pronunciation of the -gl(i)- cluster being considered a mere dialectal variant, but I'm not as sure about the truncated infinitives.

I mean, it's generally close enough to Standard Italian, but there are also some undeniably clear differences.

The first thing that comes to my mind is the contractions: for instance, the preposition /ko/ (“with”) (I'm using phonemic transcription in order to circumvent the orthography problem, which is also kind of relevant) unites with definite articles in very peculiar ways, when compared to Standard Italian.

/ko/ + /(e)r/ → /kor/ (“with the [masc. sing.]”, which is, admittedly the most similar to SI)

/ko/ + /a/ → /kaː/ (“with the [fem. sing.]”)

/ko/ + /i/ → /kiː/ (“with the [masc. plur.]”)

/ko/ + /e/ → /keː/ (“with the [fem. plur.]”)

This also applies to prepositions like /a/ (“at, to”), /de/ (“of”), /in/ (“in”; It. nello → /noː/, It. nella → /naː/, It. nei → /niː/, It. nelle → /neː/), /pe/ (“for”)

Also, things like Italian quello scemo becoming Romanesco /kwoːʃˈʃemo/; It. quella → /kwaː/; It. non lo sapevo → /ˌnoː saˈpevo/

I'm also thinking of a phrase like Italian non ce la faccio (“I can't do it; I can't take it”), which has varying degrees of contraction:

→ /ˌnujje la ˈfatt͡ʃo/

→ /ˌnujjaː ˈfatt͡ʃo/

→ /ˌnuɲɲaː ˈfatt͡ʃo/

→ /ɲaː ˈfatt͡ʃo/

I guess what I'm trying to say is that I think normalized spellings can only work to a very limited extent. — GianWiki (talk) 14:29, 19 June 2023 (UTC)[reply]

@GianWiki Well, we can create co', pe', de, se (for si, also see the Rome sense there, which would never be used in the form but only in se), ce, etc. which are true for most of Italy other than Rome. The contractions are a bit harder since the duplicated vowels, which you transcribe as long with /ː/, aren't consistently written. I usually write them double (as can be seen in fargliela, where the /ɲaː ˈfatt͡ʃo/ you mentioned appears as <gnaa faccio>) but I've also often seen them written as a single, e.g. <quo là> for /kwoolˈla/. And the contractions as well are most likely true for most of (at least central-southern) Italy. Anyways, those are indeed tricky to deal with, I planned to deal with them, see User:Catonif/common-italian, but later gave up. If you plan on dealing with dialectal particles and contractions then be my very guest. Aside from this stuff I think adding Romanesco terms shouldn't be too hard. Catonif (talk) 15:51, 19 June 2023 (UTC)[reply]

@Catonif: I think writing long vowels with a single vowel letter creates unnecessary ambiguity. I'm thinking of minimal pairs, such as /kebˈbɛlle/ (“how beautiful! how nice! [fem. plur.]”; It. che belle!) vs. /keːbˈbɛlle/ (“with the beautiful/pretty (ones) [fem.]”; It. con le belle), or /kwa/ (“here”; It. qua) vs. /kwaː/ (“that [fem.]”, adjective; It. quella), just to name a couple of them.

I believe coming up with an orthographic standard to follow would be a very good start. Speaking of which, let me see if I got the point of your common-italian page: are/were you aiming for a single orthographic standard to be used for both Standard Italian and any dialect thereof? — GianWiki (talk) 18:12, 19 June 2023 (UTC)[reply]

@GianWiki I know vowel length is distinctive, which is why I personally write double vowels. I was noting that single vowels are also often used. Cf. "gna faccio" with 11100 google hits and "gnaa faccio" with 416. But aside from that, what? You want to come up with an orthographic standard for Modern Romanesco? Sorry, but I really don't follow you on this. To make myself clear, I'll break up Romanesco content in two:

1. small grammatical particles, prepositions, pronouns, etc. These work quite differently from SI, given the great contractions. So we even have stuff like /ɲɔː/* for <non gliel'ho>, etc. These are written differently by each person, e.g. I don't even know how to write the example I just provided, <gnhoo>? <gno ho>? <nj'hoo>? while most people I know would just opt for <gnò>, likely with additional apostrophes all over the place. These entries are hence hard to make since these things are rarely written down, and when they are, they are written badly. Overall this is the kind of content that can more comfortably house under a ===Phonetics=== header of a Wikipedia entry, like w:Romanesco dialect. But, you want to add the contractions? Cool, but then add them descriptively, that is, mentioning all the weird ways they are spelled. Note: all the contractions are never Rome-exclusive, as they can be found in most central-southern Italy, so we have to be careful when labelling.
2. actual words, such as nouns, adjectives, verbs, etc. These, like the examples I gave above, can be handled without too many shenanigans. A note about verbs, we can't make them end in -à, -é, -e, -ì because (1) that would be a mess and (2) @Benwing2's modules can't handle that (and shouldn't, really, because (1) that would be a mess).

Now, I don't really understand where the "orthographic standard" comes in. On entries of point 1? But why, if they have to be made, at least they should be descriptive. On entries of point 2? But why? I mean, what would the standard contain? Maybe in usage examples? I admit I chose pretty much arbitrarily what to use in them in entries like ahó, fargliela, etc. But is it just about usage examples then? About my common-italian userpage, I wasn't aiming for a single orthographic standard, I was planning how to best deal with the mess in the monosyllaboids entries to add "spoken Italian" information, like us saying co and not con etc. which isn't mentioned on the project yet.

I think the most interesting part of a dialect and what really needs our attention is the peculiar words (e.g. girarello, eccallà, etc.). Those are fun and are very informative. Contractions are wobbly and boring. Catonif (talk) 19:10, 19 June 2023 (UTC)[reply]

@Catonif Yeah, I think I get what you're saying, about words being more important than the orthographic representation of phonotactics. I guess I was just kind of excited at the idea of... I don't know, creating a somewhat-agreed-upon standard for the dialect.

One thing about truncated infinitives: if they are to be “normalized” orthographically, should the truncation be indicated in pronunciation instead? For example, with imbruttire:

IPA^(key): /im.brutˈti.re/, (Romanesco) /im.brutˈti/^**
Rhymes: -ire, (Romanesco) -i

Also, how would this apply to reflexive infinitives? For example, ingrifarsi:

IPA^(key): /in.ɡriˈfar.si/, (Romanesco) /in.ɡriˈfas.se/
Rhymes: -arsi, (Romanesco) -asse

Thanks in advance for any input on the subject. — GianWiki (talk) 07:43, 22 June 2023 (UTC)[reply]

@GianWiki Sorry if I was too direct, I'm overall very happy you chose to contribute to this dialect and wish you good luck. I guess keeping the verb truncations in pronunciations is fine, though if you're going to do that, use only one asterisk for the gemination, as I don't think I've ever heard a stress-final infinitive not geminate (except of course when before words like la and 'na, which have a beginning ° themselves). This would be only on Romanesco-only terms, right? Since pretty much all of central Italy says -à, -ì, -asse, etc. (except for Tuscany which says -à, -ì, -assi), so a label like "Romanesco" wouldn't be appropriate in terms widely used outside of the dialect. Catonif (talk) 09:20, 22 June 2023 (UTC)[reply]

Of course. I was thinking in terms of Romanesco only because it's pretty much the only central Italian dialect whose phonology I know for sure. I really wouldn't know—as of right now—which Romanesco terms are also used by other dialects, and by which ones.

Also, is there a plan (maybe plan could be somewhat of a strong word, I don't know) for representing other Italian dialects? As in, which ones—apart from Romanesco—could be picked as “representative”? I can only really think of Tuscan right now. — GianWiki (talk) 09:54, 22 June 2023 (UTC)[reply]

I honestly don't know either, there hasn't been much activity in the project in those fields, nor do I foresee anything happening soon with them. From my experience, Sabine dialects have some distinctive grammatical, inflectional and phonetic characteristics which often make them more different from the standard, than, say, Sicilian dialects, to the point where they might even warrant a separate L2. On the other hand, the dialects of the Tuscia viterbese, I think similar enough to Tuscan and Romanesco, shouldn't be too hard to deal with. I'm too ignorant on Umbrian and central Marchigian dialects to say anything about them. Catonif (talk) 12:13, 22 June 2023 (UTC)[reply]

@Catonif Speaking of Sabine dialects, I think they should be treated as a separate language, since, as you said, they present a number of differences that set them apart from Standard Italian. For example, the distinction between word-final /o/ and /u/, the almost complete merger of the second and fourth conjugations, the verb aé ("to have") being taken as a model for the conjugations of other verbs (stà, dà and partially jì), as well as the presence of metaphonesis and possessive clitics aren't small things that can be overlooked. If possible I would like to request the separation of the Sabine dialects (treated as a single language of course) from Standard Italian, on account of what is stated above, plus of course the different lexicon. Trimpulot (talk) 09:58, 23 June 2023 (UTC)[reply]

@Catonif, @Trimpulot: I think it's all kind of confusing. Wikipedia puts central Italian as a whole under Central Italo-Dalmatian languages, as follows:

Italian (Regional Italian) • Central dialects • Tuscan • Corsican (Gallurese) • Sassarese

But, unless I'm mistaken, Tuscan is classified as an Italian dialect, and Corsican, Gallurese, and Sassarese are regarded as separate (albeit similar) languages here.

I think that, if we consider Corsican, Gallurese, and Sassarese as three separate languages, we also ought to regard central Italian—as a whole—as its own thing, on the grounds of morphological and lexical differences. In general, I believe Sabino is better represented as a dialect of central Italian than as its own separate language.

Any thoughts? GianWiki (talk) 07:27, 27 June 2023 (UTC)[reply]

@GianWiki I don’t know enough about central Italian dialects other than Romanesco and Sabino to judge whether they’re similar enough to be classified as one language or not. All I know is that I wouldn’t consider Romanesco and Sabino to be the same language. Trimpulot (talk) 17:43, 28 June 2023 (UTC)[reply]

using `{{l}}` and `{{m}}` in definitions

Should {{l}} and {{m}} be used in definitions? For example, the first two lines in climate change have several words linked with the {{l}} template, and though I don't have an example handy, I suspect there are many pages using {{m}} in the definition lines, since that is what the documentation of that template (they share the same code) recommends.

The problem with using m, as I see it, is that it italicizes the word, making it stand out from the words that are linked using the "staple" brackets [[ ]]. It seems using l would make more sense. However, as I understand it, L stands for "list" and M stands for "mention", which is why I assume the template documentation suggests using m.

Since the l template just renders like a normal link would, it seems that a normal link would do. Should we maybe not be using either of them in definitions? Or should we still use m, but specify that it should only be used in certain circumstances, since it italicizes the word? If there is some function of the template that we need that a normal link can't do, would l be the best choice after all, since it won't italicize, despite it otherwise being used for lists? I admit that it took me quite a while to sort out the difference between the two templates and that saying L is for "lists and definitions" might make it even more confusing. Best regards, —Soap— 13:50, 19 June 2023 (UTC)[reply]

“l” stands for “link” (which is the full name of the template). J3133 (talk) 13:54, 19 June 2023 (UTC)[reply]

I would say no for now. Although this would be good to have in the longer run, the way to do it is not to introduce tons of {{l|en|ugly}} code into the definitions that make them harder to parse. Much better to have a dedicated definitions template that works in the same way as {{non-gloss}}, where the links are converted automatically. However, anything like that will need a lot of discussion and will need to be done properly. Theknightwho (talk) 14:01, 19 June 2023 (UTC)[reply]

I think we shouldn't use these. English is on the top of the page for a reason, and bare links are much lighter on the memory and editors alike. Thadh (talk) 14:10, 19 June 2023 (UTC)[reply]

Taught likewise. --RichardW57m (talk) 14:15, 19 June 2023 (UTC)[reply]

Agree. Let's keep the use of {{l}} and {{m}} to the etymology and terms sections. — Sgconlaw (talk) 14:18, 19 June 2023 (UTC)[reply]

I agree except in two situations: (1) an English word in the definition is spelled identical to the foreign-language word (in which case I propose using {{l|en|...}} to avoid an unlinked bold word), and (2) when foreign words are cited in definitions. As an example of #2, consider the definition of Spanish jurelero, which currently reads as follows:

{{lb|es|relational}} of or pertaining to [[horse mackerel]], or other fish called {{m|es|jurel}}

I actually have a script that implements (1), converting all English words in definitions that are linked with {{l}} to raw but converting raw words identical to the pagename to {{l|en|...}}, and likewise for [[#English|foo]] links. Benwing2 (talk) 18:23, 19 June 2023 (UTC)[reply]
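For readers who want to see what rule (1) could look like in practice, here is a minimal Python sketch. It is not Benwing2's actual script, which is not shown in this thread; the function name `normalize_definition` and both regular expressions are hypothetical, and the sketch deliberately leaves alone any template call that carries extra parameters (such as |id=) as well as links with section anchors.

```python
import re

# Hypothetical patterns, written for this illustration only.
# Matches {{l|en|term}} with a single positional term and no extra parameters.
PLAIN_L_EN = re.compile(r"\{\{l\|en\|([^|{}=]+)\}\}")
# Matches a bare [[term]] link with no pipe and no section anchor.
BARE_LINK = re.compile(r"\[\[([^\[\]|#]+)\]\]")


def normalize_definition(line: str, pagename: str) -> str:
    """Prefer bare links in a definition line, except for links to the page's own title."""

    def template_to_bare(match: re.Match) -> str:
        term = match.group(1)
        # Keep the template when the term equals the page title:
        # a bare self-link would render as unlinked bold text.
        return match.group(0) if term == pagename else f"[[{term}]]"

    def bare_to_template(match: re.Match) -> str:
        term = match.group(1)
        # Only a bare link identical to the page title is wrapped in {{l|en|...}}.
        return f"{{{{l|en|{term}}}}}" if term == pagename else match.group(0)

    line = PLAIN_L_EN.sub(template_to_bare, line)
    return BARE_LINK.sub(bare_to_template, line)
```

Calls such as {{m|es|jurel}} or {{l|en|foo|id=bar}} pass through unchanged, which matches the exceptions raised in this thread.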

I have no strong feelings (I mostly use bare links but it would not have occurred to me to see any problem with people using {{l}}), in the majority of cases where the result is the same. But I agree with Benwing, there can't be an absolute ban, since there are always edge cases. Another case where foreign words are used/mentioned in definitions is where multiple related/connected languages have words for some food, clothing, etc that English doesn't have an attested word for; in that case I've just defined it in one place. - -sche (discuss) 19:36, 19 June 2023 (UTC)[reply]

These should not be used in def-lines unless there is special need to call to a particular word in a language. I have seen some Polish and Finnish definitions where one needs to point to a particular word within that language, usually for historical/regional things, and almost always as a supplement to a definition, however, as just the link itself, they shouldn't be used. Vininn126 (talk) 19:42, 19 June 2023 (UTC)[reply]

I should mention that I remembered I do use {{l}} in definitions when a term requires a gloss. However, I don’t think I’ve used {{m}}. — Sgconlaw (talk) 19:47, 19 June 2023 (UTC)[reply]

Ah, yes, that's another place I've also used them. - -sche (discuss) 15:52, 20 June 2023 (UTC)[reply]

{{l}}, at least in my experience, is totally useless. Is there any reason we need it at all? Ioaxxere (talk) 21:02, 19 June 2023 (UTC)[reply]

Yes. For example, if you have a list of derived/related terms in a non-English language, you want the links to take you to that language section, not to the top of the page. Andrew Sheedy (talk) 16:07, 20 June 2023 (UTC)[reply]

Also useful for English when a page has a Translingual section. It also handles script codes, translation etc. It’s definitely not “totally useless”. Theknightwho (talk) 16:39, 20 June 2023 (UTC)[reply]

FWIW, I have found a good number of uses of {{m|mul}} to generate italicized taxonomic names, not always in prescribed taxonomic form. I convert such to wikitext. DCDuring (talk) 18:40, 20 June 2023 (UTC)[reply]

A comment. {{l|en|foo|id=bar}} allows a link to a particular senseid. This is occasionally helpful. Quercus solaris (talk) 03:50, 22 June 2023 (UTC)[reply]

A better comment: I am fine with adhering to the spirit of this advice (i.e., don't use those templates inside definitions unless there is a good reason), and I'll begin adhering. I will just note here a few corollaries: (1) fine to use {{l|en|foo|id=bar}} when id is helpful (targeted landing), and (2) fine to use [[foo#Noun|foo]] instead of [[foo]] when pos is helpful (targeted landing) (e.g., [[note#Noun|note]] versus [[note#Verb|note]], and [[lead#Etymology_1|lead]] versus [[lead#Etymology_2|lead]]). Quercus solaris (talk) 23:25, 23 June 2023 (UTC)[reply]

@Quercus solaris -- FWIW, I generally advise others against linking to numbered etymology sections -- entries might get reorganized, such that the former #Etymology_2 section becomes #Etymology_3, and suddenly all those older links to #Etymology_2 are now pointing at the wrong content.

As a workaround, I'll find some short label that makes sense and add in an {{etymid}} at the top of that etymology section, and target that label instead of the number. See a couple examples here at Japanese ん, and here at maroon. ‑‑ Eiríkr Útlendi │^{Tala við mig} 00:05, 24 June 2023 (UTC)[reply]
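As a rough illustration of how one might locate the fragile links described above before replacing them with a label-based target, here is a small Python sketch. The pattern and the function name `find_numbered_etymology_links` are assumptions made for this example; it only reports links of the form [[page#Etymology_N|text]] and does not edit anything.

```python
import re

# Hypothetical pattern: a wikilink whose section anchor is "Etymology_N" or "Etymology N".
NUMBERED_ETYM_LINK = re.compile(
    r"\[\[([^\[\]|#]*)#Etymology[_ ](\d+)(?:\|[^\[\]]*)?\]\]"
)


def find_numbered_etymology_links(wikitext: str) -> list:
    """Return (page, etymology number) pairs for links that target a numbered etymology section."""
    return [
        (match.group(1), int(match.group(2)))
        for match in NUMBERED_ETYM_LINK.finditer(wikitext)
    ]
```

Links that target a part-of-speech header, such as [[note#Noun|note]], are not reported, since those do not shift when etymology sections are renumbered.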

@Eiríkr Útlendi — That's a great idea. I hadn't known about that template, so thanks for sharing. Quercus solaris (talk) 02:49, 24 June 2023 (UTC)[reply]

Silesian's place in the Proto-Slavic tree (let's try and squash this)

@Thadh @PUC @KamiruPL @Sławobóg and also (Notifying Solvyn, Atitarev, Benwing2, Hergilei, Zhnka, Jan.Kamenicek): for Lach I propose we make Silesian a descendant of Old Polish. The form of Silesian we are documenting is a new literary standard being spread throughout the area and comes from the end of the Old Polish/beginning of the Middle Polish era, as can be evidenced by regular sound correspondences - it retains -y- in -cyja words of latinate origin, it has regular reflexes of Old Polish long vowels, and many of its core words are of Old Polish origin (granted, they could have been borrowed, but considering sound changes made to them in that time, inheritance makes more sense).

There is the issue of the Lach dialect, however the general consensus is that Lach is of Czech origin and just happens to share a lot with Silesian. Vininn126 (talk) 14:05, 19 June 2023 (UTC)[reply]

Agree. I have nothing else to add tbh xd Shumkichi (talk) 16:05, 19 June 2023 (UTC)[reply]

Regardless of the tree structure, FWIW I have proposed unifying Polish and Silesian similar to unified Chinese, to avoid Silesian ending up badly maintained and having duplication with Polish. (I proposed the same thing for Scots and English, since Scots is hardly maintained at this point and a lot of Scots words are duplicated with dialectal English words.) This is probably a nonstarter since Shumkichi got angry about this suggestion when I made it before (people seem to have difficulty distinguishing the concept of "unified macrolanguage" from the idea that unification implies demotion to dialect status, which it does not at all), and maybe the Lechitic editors are willing to put the time in to maintaining Silesian so it doesn't end up a red-headed stepchild (so to speak). I will say no more about this unless others speak up in favor. Benwing2 (talk) 18:17, 19 June 2023 (UTC)[reply]

I have to say I also strongly oppose merging the two. Most institutions consider Silesian a separate language - political issues aside there are significant grammatical and lexical differences between the two and I do not think they are similar enough to merge. Most of the scholars who claim it's Polish have ulterior motives - and the same goes for the people who claim it's Czech. Glottolog and ISO both have codes for Silesian. Vininn126 (talk) 18:27, 19 June 2023 (UTC)[reply]

@Benwing2 "people seem to have difficulty distinguishing the concept of "unified macrolanguage" from the idea that unification implies demotion to dialect status, which it does not at all" - are you Xi Jinping in disguise or what? it literally does imply that. look at what's happening in China now, politicians have appropriated the concept of macrolanguages to artificially erase minorities. if let's say Dutch was "badly maintained", would u merge it with German? Also, why don't we merge Slovene with Serbo-Croatian since the former is rather badly maintained? The truth is that "macrolanguage" is a useless political concept that has nothing to do with linguistics, and it's only really applied to Chinese dialects because they use the same non-alphabetic script so we can pretend they are the same language but the truth is that some of these dialects are less mutually intelligible than e.g. Dutch and German. Shumkichi (talk) 19:04, 19 June 2023 (UTC)[reply]

also Vininn is right, Silesian can be compared to Scots in that both are derived NOT from English and Polish, respectively, but from Middle English and Middle Polish ALONGSIDE English and Polish. So they are really sister languages, their relationship is not asymmetrical (like e.g. Afrikaans which is a daughter language that comes from Dutch). Merging them sounds like the same rhetoric that some Russians use when they describe Ukrainian and Belarusian as "merely" dialects of Russian while they come from two different subdialect groups, with Ukrainian and Belarusian forming the Rusyn languages and Russian coming from Old Moscovian IIRC. Shumkichi (talk) 19:10, 19 June 2023 (UTC)[reply]

The main problem with the macrolanguage approach is what to call the broader language. With Chinese and Arabic, you have standard lects with accepted names that many speakers of the individual lects think of as the main language that their lect is attached to. "Polish" just gives the false impression that Silesian is a mere dialect of Polish. A two-headed monster like "Polish-Silesian" would come across to speakers as obviously made up. Is there a term for the umbrella lect that speakers would recognize?

There are all kinds of situations with a low-prestige language used at home and among members of a community vs. an unrelated high-prestige language used in school and in formal situations- for instance, in the US that applies to pretty much all combinations of English and just about everything else- so that in itself doesn't make bilingualism into a macrolanguage. If there isn't already an actual macrolanguage, I don't feel comfortable making one up for our purposes. Chuck Entz (talk) 19:09, 19 June 2023 (UTC)[reply]

The only term I can think of is "Lechitic", which sounds very forced. Thadh (talk) 19:13, 19 June 2023 (UTC)[reply]

Only linguists use such technical terms as "Lechitic" because they are useful in classification but normal people have never even heard the word in their lives, and good for them. So yeah, that would be extremely artificial. Shumkichi (talk) 19:19, 19 June 2023 (UTC)[reply]

Historical differences aside, there is still a strong political aspect to this - but even still, Silesian has a different grammar, phonology, and lexicon - enough to make it distinct in my opinion. It has retained the aorist, changed long vowels rather than merging them, mostly denasalized vowels, among others. Not to mention it should retain its LDL status without hassle. Vininn126 (talk) 19:27, 19 June 2023 (UTC)[reply]

@Shumkichi I would bite my tongue but calling me "Xi Jinping" is completely beyond the pale. You might as well call me Putin (which you almost did anyway with your reference to "some Russians") or Hitler. And yes I know the historical derivation of all these languages; I have a Ph.D. in linguistics after all. So please STFU, thank you.

On a more constructive note, maybe there is a way to use {{transclude sense}} or something similar to reduce the maintenance burden for the large number of technical terms and proper nouns that I suspect will have identical meanings between Polish and Silesian. Benwing2 (talk) 20:48, 19 June 2023 (UTC)[reply]

I have been trying to use this template more especially with technical terms and internationalisms as it is. At the moment I've been trying to focus on core vocabulary, where that isn't something we should employ, but with terms like that, I don't see why not; I'll just add that's a cross-linguistic problem. Also, most will be lemmatized at different places, since the spelling is different or even the entire form (Silesian latinate -tio nouns are -cyjŏ (a reflex of long -a in the end, visible in Middle Polish as well) whereas Polish is -cja (with elision!)), so really the problem there is comparable IMO to the difference between Polish -cja nouns and Russian -ция nouns. Vininn126 (talk) 20:57, 19 June 2023 (UTC)[reply]

@Benwing2 I think a good entry that illustrates why Silesian should remain separate is dziedzina - Silesian and Polish, as far as I can tell, share no lexical similarity with this word, however if Silesian were considered part of it, one could easily assume it has all the same meanings as Polish and that those other meanings are additional, unless you add a long usage note explaining the difference, and such a usage note would have to be in most entries! So the core vocabulary, grammar, and phonology differ, and yes, technical terms might be similar but that is a problem amongst even unrelated languages. Vininn126 (talk) 21:03, 19 June 2023 (UTC)[reply]

also, cognate words with the same spelling can have different semantic fields in different languages. there are some words that have slightly different or additional meanings in either Polish or Silesian, so treating them as variants of the same word will not always work. Shumkichi (talk) 21:10, 19 June 2023 (UTC)[reply]

+ declension and conjugation patterns are slightly different so even if the nominative forms look the same, the other forms are different Shumkichi (talk) 21:27, 19 June 2023 (UTC)[reply]

Why are you throwing such a tantrum over a little joke? xd And no, I did not compare you to Putin, stop overreacting and reading too much into my comment. "So please STFU, thank you" - you're lucky I don't argue with ppl here anymore, so I'll ignore your rude comment.

Also, maybe familiarise yourself with Silesian orthography first because it is the technical vocab, especially borrowings from Latin, that often has different forms in Silesian (cf. religia vs religijŏ, relacja vs. relacyjŏ, and thousands more). Shumkichi (talk) 21:04, 19 June 2023 (UTC)[reply]

@Shumkichi Trying to excuse your ad-hominem attack as a "little joke" is hardly cool. And what does "you're lucky" mean? Is this an implied threat to resort to more ad-hominem attacks? Please Assume Good Faith, which you seem to have problems doing. Benwing2 (talk) 21:39, 19 June 2023 (UTC)[reply]

yeah, it was literally an innocent joke, drama queen. "Please Assume Good Faith" - then maybe stop telling me to "stfu"? Shumkichi (talk) 21:44, 19 June 2023 (UTC)[reply]

@Shumkichi Uh huh. "Drama queen" is an ad-hominem attack, in case you didn't realize. Benwing2 (talk) 21:48, 19 June 2023 (UTC)[reply]

ooch nie, straszna obraza :( Shumkichi (talk) 21:49, 19 June 2023 (UTC)[reply]

Given the user's history of doing things like this, and of responding to being asked to stop by continuing to (or, when blocked, saying he will stop, and then continuing to), I have blocked him for two weeks. (IMO this seems like an example of how making good edits buys you the ability to make bad edits, because I suspect someone with fewer mainspace edits who acted this recalcitrant with a block log this long would've been blocked by someone else sooner for longer, but when this user has been blocked for any appreciable length of time, another user reduces those blocks, so...two weeks, but IMO block length should increase if misbehaviour continues.) - -sche (discuss) 23:47, 19 June 2023 (UTC)[reply]

@-sche: "this seems like an example of how making good edits buys you the ability to make bad edits": well yes, and I hope it stays that way. I'm not saying a valuable contributor should be given a free pass to say or do whatever they want simply on account of their being a valuable contributor, but they should certainly be cut more slack (within reasonable bounds) than people who produce nothing of value and just waste other people's time. P U C – 19:34, 20 June 2023 (UTC)[reply]

POG, Ph.D. in linguistics? I hope that the commission was not under the influence of alcohol. Ok. ɶLerman (talk) 00:09, 20 June 2023 (UTC)[reply]

Also let me clarify what I meant by "denasalization" - in Polish you get nasal vowel assimilation dębu being pronounced dembu - in Silesian the reflex of Old Polish -ęb-/-ęp- for example is -ymp- so the nasal is still present, and thus evidence of inheritance, but the assimilation has fully phonemicized. Vininn126 (talk) 21:21, 19 June 2023 (UTC)[reply]

Finally! An interesting tidbit worth considering is the chronology - the first mentions of Silesians in text as they are now date to the 16th century, which is the cutoff date for Old Polish and Silesian. (The first texts written in Silesian were from the 17th century.) Vininn126 (talk) 21:27, 19 June 2023 (UTC)[reply]

BTW @Vininn126 the reason for my suggestions is based on the fact that minority languages are usually heavily influenced by the majority language in the country they are within and tend to borrow technical terms wholesale, so I would expect more similarity between Silesian and Polish in technical fields than e.g. Silesian and Russian. Mostly I don't want to see Wiktionary's Silesian coverage end up like the mess we currently have for Scots, although if Silesian has a single standardizing body and a single accepted orthography, and its dictionaries/grammars are available on the Internet, that should help. Benwing2 (talk) 21:39, 19 June 2023 (UTC)[reply]

@Benwing2 I understand that and I don't think the fear is unfounded - however I feel that with the recent rise of a literary standard and all the other reasons listed, Silesian should be an L2 and a descendant of Old Polish, at that. If I am being honest, as to that matter, I'm not seeing many counter arguments, and if I don't soon, I'd like to push that forward. Vininn126 (talk) 21:42, 19 June 2023 (UTC)[reply]

On a different subject, can anyone explain how the new edit conflict interface is supposed to work? If there's a conflict it gives me the option only of selecting one text or the other, not of merging the two, so inevitably if I select my own text it erases the intervening text of others (or it duplicates my text, like it just did). Benwing2 (talk) 21:42, 19 June 2023 (UTC)[reply]

@Vininn126 Yeah I'm not objecting to making Silesian an L2. Benwing2 (talk) 21:44, 19 June 2023 (UTC)[reply]

Okay, so if/when we decide to make it a descendant of Old Polish, I'd need to update descendant sections and etymologies, particularly in relation to Proto-Slavic. This would probably have to be done manually, but I'd like a bot to find mentions of it in descendant sections so that I can update it manually. I should be able to just use CAT:Silesian terms inherited from Proto-Slavic to update etymologies there. Vininn126 (talk) 21:47, 19 June 2023 (UTC)[reply]

@Vininn126 Can you clarify what you need done by bot? Benwing2 (talk) 22:13, 19 June 2023 (UTC)[reply]

Actually, I suppose I could just do a search for -insource:/\{\{desc\|szl\| and any variations thereof, and then use the category, so maybe I don't. The only technical help I'd need is setting it as a descendant. Vininn126 (talk) 22:17, 19 June 2023 (UTC)[reply]
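For anyone who would rather run that kind of scan over saved wikitext than through the site search, here is a minimal sketch. It assumes the page text is already available as a string; the helper name find_szl_descendants, the list of template aliases (desc, descendant, desctree) and the sample entries are illustrative assumptions, not an existing bot or module.

```python
import re

# Assumed template aliases; extend this list if other variants are in use.
DESC_SZL = re.compile(r"\{\{\s*(?:desc|descendant|desctree)\s*\|\s*szl\s*\|")


def find_szl_descendants(wikitext):
    """Return (line_number, line) pairs that mention a Silesian descendant template."""
    return [
        (number, line)
        for number, line in enumerate(wikitext.splitlines(), start=1)
        if DESC_SZL.search(line)
    ]


# Placeholder wikitext, not real entries.
sample = "* {{desc|pl|foo}}\n* {{desc|szl|bar}}\n* {{desctree|szl|baz}}"
hits = find_szl_descendants(sample)
```

The matching lines would still need to be reviewed and edited by hand, as described above; the sketch only locates them.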

@Vininn126 OK, let me know if you need anything specific. Benwing2 (talk) 02:01, 21 June 2023 (UTC)[reply]

@Benwing2 Since it seems this thread has calmed down and no one else wants to add anything, I'd like to move forward with this. Vininn126 (talk) 12:04, 26 June 2023 (UTC)[reply]

@Vininn126 No objections from me but it looks like User:Theknightwho already made the changes. Benwing2 (talk) 07:12, 28 June 2023 (UTC)[reply]

@Benwing2 Just for the record, Vin asked me to do this last night - I didn’t make the change without being prompted. Theknightwho (talk) 18:19, 28 June 2023 (UTC)[reply]

Can confirm. Vininn126 (talk) 18:21, 28 June 2023 (UTC)[reply]

Not a problem, not objecting to this change. Benwing2 (talk) 20:51, 29 June 2023 (UTC)[reply]

Not in any case. ɶLerman (talk) 23:36, 19 June 2023 (UTC)[reply]

What do you mean? I don't understand this message. Vininn126 (talk) 11:58, 20 June 2023 (UTC)[reply]

@Shumkichi: The rhetoric of “some Russians” that Ukrainian and Belarusian are “just” dialects is unscientific and modernly politicized. The correct political view on this issue dates back to 1917 and at the same time it was resolved and closed. The Ukrainian and Belarusian languages do not originate from two different dialect groups and they do not form the Rusyn languages. The Russian does not originate from the Old Moscovian.

Also, no one artificially erases minorities, there is nothing wrong with the fact that the Bats language or the Ingrian language will disappear, but there is also nothing wrong with maintaining these languages. ɶLerman (talk) 23:44, 19 June 2023 (UTC)[reply]

@Vininn126: I didn't see a conclusion for consensus here. Zombear didn't even write here. Mostly Shumkichi, you and Benwing2 wrote here. ɶLerman (talk) 15:19, 2 July 2023 (UTC)[reply]

Block Shumkichi

I believe that the blocking of Shumkichi is not justified and that administrator -she violates the UCoC. It is absolutely unclear what "learn to leave the 4chan behaviour on 4chan 🙄" means. ɶLerman (talk) 00:38, 20 June 2023 (UTC)[reply]

Seems perfectly clear to me. Nicodene (talk) 00:55, 20 June 2023 (UTC)[reply]

How would it violate the Universal Code of Conduct? —Justin (koavf)❤T☮C☺M☯ 01:00, 20 June 2023 (UTC)[reply]

Speaking of UCoC, user should have been permablocked a year ago. --{{victar|talk}} 03:12, 20 June 2023 (UTC)[reply]

Shumkichi brought the bulk of the most reasonable arguments. At the 19:27 message by Vininn126 when I read the thread it looked all okay, and comprehensive, to the larger part due to his participation. Then Benwing took offense at Shumkichi’s question “are you Xi Jinping in disguise or what?” which I found to bring forward a ponderable thought as a rhetorical question, while nobody would assume Benwing to be associated with the Pooh party, it was all about the dangers of problematic ways of thinking group representations. Therefore indeed it was “a little joke”, I don’t know how else to deescalate other than other formulations likewise expressing “I didn’t mean it that way”; there is a lot to say about the social meanings of jokes, which all take too lightly. There is too much reasonable argument for “4chan behaviour”. Basically Koavf is discriminating against mentally disabled people who do their best, to answer the question. It is clear that social behaviour is more challenging for some, without culpability, as many must be insane to spend that much time on an internet community project, and if you follow some users longer there is no doubt about at least some personality disorder, the kind of which is manageable, as many “problematic behaviours”. You can call people out or call them to order for being too retarded again according to your taste, which you may have and contend, but such a ban should not be expected. Wiktionary is far from becoming 4chan because of Shumkichi: if 4chan were to come here, Shumkichi would be the quickest to dissuade them, definitely skilled in toxic rhetorics, where and why do people learn this? Normalcy is also deadly. Fay Freak (talk) 10:44, 20 June 2023 (UTC)[reply]

A Topic and Category for School Exercises

What does anyone here think of setting up a category topic of school exercises, like dictation and vulgus? -- Apisite (talk) 07:07, 22 June 2023 (UTC)[reply]

Sounds good, if you can find other stuff to add to it then great. There's already a Category:en:Schools, but that appears to be about types of schools themselves, so maybe there might be some interesting things to collect in such a category. Kiril kovachev (talk) 23:57, 27 June 2023 (UTC) @Apisite[reply]

A few ideas: assignment, bookwork, certamen, closed-book, cloze, coursework, dictation, essay, examination, homework, open-book, open-notes, pizza fractions, quiz bowl, seatwork, science fair, show and tell, take-home, theme. Equinox ◑ 00:12, 28 June 2023 (UTC)[reply]

Don’t think it’s necessary to have a subcategory for this. Either “Category:Education” or “Category:Schools” is sufficient. In any case “School exercises” may be vague—people may think it is for types of physical exercise done in schools. Also, are examinations and tests a type of “exercise”? I would have regarded them as forms of assessment. — Sgconlaw (talk) 01:30, 28 June 2023 (UTC)[reply]

cisgender

There's a growing argument over the definition of cisgender in the talk page. I, and what appears to be the majority of the people in the discussion as of this edit, believe that the presence of the label "sometimes offensive" in the definition is inappropriate at best. Vininn126 believes it's appropriate for the label to be there because "many people feel it's offensive". I think it might be best to get another admin to take a look at the situation, particularly given how other Wikimedia Foundation pages regarding the term have become especially contentious within the last couple of days.

(I would have put this in WT:RFVE given my position that the label lacks valid sources, but since the definition page is currently protected that's a bit tricky at the moment.) --Pikavangelist (talk) 15:45, 22 June 2023 (UTC)[reply]

There's already a discussion at Wiktionary:Information_desk/2023/June#"Cisgender"_engenders_controversy. Let's keep discussion centralized. —Justin (koavf)❤T☮C☺M☯ 16:26, 22 June 2023 (UTC)[reply]

Will do. Thanks, completely missed that that was there. --Pikavangelist (talk) 04:30, 23 June 2023 (UTC)[reply]

Scots as an LDL

@Theknightwho, @Thadh, @Soap: Per our discussion on Discord and prior discussions, I'd like to suggest that we move Scots from being a Well-documented language to being a Limited Documentation Language. It seems that it's currently listed as a WDL due to its close relation to English, but then that excludes a lot of entries that have seen use from being included. Being an LDL doesn't mean that everything seen in a Scots text must be included (we still have standards with other LDLs); it just means that we have more flexibility with how best to proceed with updating Scots entries, templates, modules, etc. If we want to have solid coverage, this must be something that we fix now. Per WT:WDL, this does not need a formal vote and only requires a consensus here. AG202 (talk) 15:45, 22 June 2023 (UTC)[reply]

Support. Theknightwho (talk) 16:17, 22 June 2023 (UTC)[reply]

Support and we've had this discussion multiple times before, it may be worth looking them up. If I remember correctly, there was even a vote on this. Thadh (talk) 16:21, 22 June 2023 (UTC)[reply]

@Thadh: you might be thinking of Wiktionary:Votes/pl-2019-02/Treat Scots as English. -- Sokkjō 04:57, 23 June 2023 (UTC)[reply]

Support--Urszag (talk) 16:24, 22 June 2023 (UTC)[reply]

Support. Has that LDL-ness to it. Reasoning for it not being that is stupid. CitationsFreak: Accessed 2023/01/01 (talk) 17:25, 22 June 2023 (UTC)[reply]

Support. Vininn126 (talk) 17:30, 22 June 2023 (UTC)[reply]

Comment: there are various Scots dictionaries extant, including the Dictionaries of the Scots Language containing numerous quotations. Is Scots really documented in a limited manner? — Sgconlaw (talk) 18:07, 22 June 2023 (UTC)[reply]

@Sgconlaw The vast majority of languages on Wiktionary are LDLs: if you look at the list of WDLs here, Scots is the only non-constructed language listed that isn't a major language. Theknightwho (talk) 19:29, 22 June 2023 (UTC)[reply]

To be honest I'm surprised at how few WDLs we have. I'm not against changing Scots in itself, but it makes me wonder if we're too permissive in general as to what qualifies as an LDL. This means that we won't need three cites anymore, right? And a sort of disconnected thought, not a consequence of the above .... from what I can see, the situation of Scots is similar to that of Basque, in that it is widely spoken but also a secondary language in its own territory. But we classify Basque as WDL. Is Basque different? Is it because Basque is so very different from its surrounding languages that it can't just shade into Spanish the way Scots shades into English? This isn't a vote against the change so much as a couple of unrelated puzzled thoughts. Thanks, —Soap— 11:15, 25 June 2023 (UTC)[reply]

@Soap: Basque is now a language of education, so speakers are far more likely to be able to write in it. By contrast, most speakers of Scots can write it with at best great difficulty, so it's not a natural choice of idiom even for parochial writing. Recall how long Amaryllis Gardener got away with cod Scots for the Scots Wikipedia. --RichardW57m (talk) 08:54, 26 June 2023 (UTC)[reply]

For reference, I would like to keep this open for at least 2 weeks to reach a consensus, unless it looks like it's unanimous in either way. AG202 (talk) 18:35, 22 June 2023 (UTC)[reply]

For reference, this was previously discussed at Wiktionary:Beer_parlour/2020/November#Attestation_of_Scots (which was mostly, but not entirely, supportive of making it an LDL). - -sche (discuss) 20:42, 22 June 2023 (UTC)[reply]

I just saw that discussion, and if that's enough to move forward with marking it as an LDL, then please go ahead. AG202 (talk) 20:55, 22 June 2023 (UTC)[reply]

Meh, no reason not to leave this open at least a few more days to let people comment. - -sche (discuss) 22:11, 22 June 2023 (UTC)[reply]

@AG202, -sche Shall we move forward with this? It’s been over a week and there are no objections. Theknightwho (talk) 14:48, 1 July 2023 (UTC)[reply]

Seems reasonable. - -sche (discuss) 03:06, 2 July 2023 (UTC)[reply]

Done Theknightwho (talk) 02:00, 9 July 2023 (UTC)[reply]

Scots and Middle English to RFVE?

On this topic, I have been thinking for some time that it would be better to move Middle English and Scots RFVs from RFVN to RFVE. The resources required to verify Middle English overlap significantly with those required to verify Early Modern English words, and the main Scots dictionaries provide their definitions in English. Moreover, we often see requests at RFVE for verification of words that are "actually Scots" or "actually Middle English", so these two languages already have an indirect but strong presence at RFVE. It would be easier for Middle English and Scots editors to participate in those RFVs if they were at the same venue as RFVs in their languages. (As for Old English, it is a very different beast and requires its own specialist skill-set, so it would be better to keep it at RFVN.) Thoughts? This, that and the other (talk) 04:01, 25 June 2023 (UTC)[reply]

A premiere on Wiktionary?

2 weeks ago @Fenakhay blocked me, if I understand it correctly, for adding sources to a page. I figured out at the time I might have done it improperly, so I stalked his edits to see how I should do it next time.

In vain – I have yet to find a single etymology Fenakhay has added using a reference.

What sickens me is the sheer arbitrariness of it. What wrongs have I done to the page? What the hell have I done to him?

Is Wiktionary a place where admins enforce rules crafted through community consensus, or one where admins make up rules on the spot? (= 3rd world cops) Synotia (talk) 20:59, 22 June 2023 (UTC)[reply]

@Fenakhay:, can you provide a diff(s) of the edits that led you to block the above user? —Justin (koavf)❤T☮C☺M☯ 21:06, 22 June 2023 (UTC)[reply]

On the page حاسوب he added a “cognate” which wasn’t a cognate and then he added a strangely formatted reference section containing references which he hasn’t consulted and about which he doesn’t know whether they actually support the claim, when his source is a webpage having them in the references section, continuing to make things that don’t make sense in spite of having received explanation that they don’t make sense, basically repeatedly committing the basic internet offence of adding irrelevant information (off-topic); for example how does ISIL marking Christian houses with a ن for نصراني support the claim of it having a pejorative sense? Fay Freak (talk) 22:17, 22 June 2023 (UTC)[reply]

Thanks. @Synotia: Did you actually read these first-hand sources or just reproduce them after finding them online? —Justin (koavf)❤T☮C☺M☯ 23:04, 22 June 2023 (UTC)[reply]

The bottom reference supports the claim. But I have trouble tolerating this hypocrisy, of having to be lectured by people who never practice what they preach.

how does ISIL marking Christian houses with a ن for نصراني support the claim of it having a pejorative sense? How does it not? Christians are kuffar to them. Synotia (talk) 09:18, 23 June 2023 (UTC)[reply]

They could mark it with a neutral sign for the same purpose, and even if they wanted to see it as pejorative this says little about the language community at large, as terrorist usage is fringe, requiring to weigh the values of things unusually to come to terrorist conclusions. But evidently you are bad at discerning or recognizing the nature of the fringe. Fay Freak (talk) 13:47, 23 June 2023 (UTC)[reply]

evidently you are bad at discerning or recognizing the nature of the fringe.

I don't know. To me, it was a fitting example in the category of pejorative usage. I doubt ISIL would care about neutral terms since Islamic extremists are fond of religious slurs :))

But in any case, this is a subjective matter up to discussion. Not exactly up to intimidation and blocking, like I'm some kind of vandal pest. Synotia (talk) 16:24, 23 June 2023 (UTC)[reply]

This user has been known to add either fringe-theory cognates or just generally cognates with no relation and has been asked many times to stop. Vininn126 (talk) 22:23, 22 June 2023 (UTC)[reply]

Such as where? What's fringe about saying that حاسوب is made using the ح س ب root related to computation, after the computer term as used in English since the late 19th century, from which calques in plenty of other languages are derived? The Arabic Wikipedia page on حاسوب mentions the same. Synotia (talk) 09:18, 23 June 2023 (UTC)[reply]

Inuktitut syllabic transliteration

The current system of transliterating Inuktitut was probably based on Cree transliteration, so that /nunavut/ is rendered as ᓄᓇᕗᑦ nonafot for some reason. This does not fit well with Inuktitut phonology, which only has /a i u/ and no /f/, so it should be transcribed as /nunavut/. Kwékwlos (talk) 10:16, 23 June 2023 (UTC)[reply]

Non sequitur. It's a transliteration, not a transcription. --RichardW57 (talk) 21:42, 23 June 2023 (UTC)[reply]

Doubt this was Cree, it doesn't have /f/ either. I'm fine with changing it to something more fitting. Thadh (talk) 21:48, 23 June 2023 (UTC)[reply]

Why does the transliteration/transcription not match the Latin script form? (Why are we giving a transliteration when, for this language, we could just give the Latin script form, in the same way that the romanization provided for Japanese is romaji because the language already has a [namely: that] Latin script form?) - -sche (discuss) 01:41, 24 June 2023 (UTC)[reply]

@-sche I don’t really know much about Inuktitut, but transliterations and romanisations aren’t the same thing. As an example: Mongolian used a Latin script for a while, but even if they’d never switched to Cyrillic it would be totally inappropriate to use it to transliterate the Mongolian script, as it works in a very different way that isn’t even close to one-to-one. Maybe it’s a situation like Serbo-Croatian, where the romanisation also happens to be good for transliteration, but not every language that uses Latin alongside other scripts is in that position. Theknightwho (talk) 18:12, 28 June 2023 (UTC)[reply]

I'm not all that up on Inuktitut, but I've never encountered nonafot before. Also, somewhat damningly, the detailed results from google:"inuktitut" "nonafot" return all of five hits, entirely from Wiktionary. Compare to google:"inuktitut" "nunavut", which gets us an ostensible hit count of 357K.

If nonafot were actually in use anywhere, we should see more than our own content. ‑‑ Eiríkr Útlendi │^{Tala við mig} 18:38, 28 June 2023 (UTC)[reply]

My rudimentary understanding of Inuktitut is that the prevailing modern Latin script spellings and the prevailing modern syllabic spellings, at least the normalized ones cited in dictionaries, are intended to be interchangeable for given varieties, or are at least frequently interchanged. Indeed, I encounter the Latin script forms more than the syllabic, although I haven't cared to object when people have lemmatized syllabic script forms. I don't know who invented the romanization found here, but we should probably switch the relevant modules to use the usual Latin script equivalences. Obviously, as Wikipedia puts it, "Because the Inuit languages are spread over such a large area, divided between different nations and political units and originally reached by Europeans of different origins at different times, there is no uniform way of writing the Inuit language." We probably have to use modern, normalized/standardized spellings as 'hubs' for the various Latin- and syllabic- script spellings from different regions and eras, which may need manual transliteration if they use letters for substantially different-than-modern values. - -sche (discuss) 21:18, 28 June 2023 (UTC)[reply]

@Eirikr Just to clarify: I don’t doubt that nonafot is wrong - I’m just saying that the Latin script form of a language isn’t always appropriate to use as a transliteration. But as @-sche says, it sounds like in this case that it would make sense to do so. Theknightwho (talk) 14:44, 1 July 2023 (UTC)[reply]
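To make the proposal concrete, here is a minimal sketch of transliterating by the usual Latin-script equivalences rather than Cree-style values. It covers only the four syllabics in the example word discussed above; the table and function names are illustrative assumptions and not the code of the existing transliteration module.

```python
# Only the four characters from the example word; a real table would cover
# the whole Inuktitut syllabary.
SYLLABIC_TO_LATIN = {
    "ᓄ": "nu",
    "ᓇ": "na",
    "ᕗ": "vu",
    "ᑦ": "t",
}


def transliterate(text):
    """Replace each known syllabic character with its Latin equivalent.

    Unknown characters are passed through unchanged.
    """
    return "".join(SYLLABIC_TO_LATIN.get(char, char) for char in text)


result = transliterate("ᓄᓇᕗᑦ")  # "nunavut"
```

Regional or historical spellings that use letters with different values would still need manual transliteration, as noted above.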

Why is Hokkien GÂU treated as being "cognate" to ZH (Zhongwen) 𠢕?

There is no firm evidence that Hokkien GÂU has any etymological tie to any (other) etymon ever represented by 𠢕.

NOTE. Much less in the pan-East Asian written koine; 𠢕 has never been in "international" circulation, and the 廣韻 excluded it from its 20000+ entries.

I imagine somebody might respond here in the Beer Parlour that there is something to this, and that GÂU should indeed be treated under an "Etymology 2" section.

However, such etymological mischaracterisations for various etyma seem to number in the four digits (if not "in the thousands") for Hokkien only. My point here is not that this specimen should be corrected and the other thousand(s) left alone. Rather, the error is clearly (in a broad sense) structural. The structural (in a broad sense) error should be corrected.

To be clear, we're not dealing with incomplete information at this point. We're dealing with pretty complete information that is incorrect.

This leads into the related question of why GÂU should be "Etymology 2" under 𠢕 at all. There is no firm evidence of an etymological tie, and GÂU has traditionally been written as 賢 or 爻 or 肴 or GÂU, with the first of these being recognisable to by far the greatest number of people into this century. 𠢕 is just one of MANY fantasy Neo-Sinographs proposed for GÂU, mostly after about 1975. The placement of GÂU under 𠢕 is supposed to be for etymological considerations; however, the purported etymological link is not proven.

NOTE. While pre-modern Sinology tended to ignore etyma with no ties to Sinographs, Neo-Chinese Sinology seeks to demonstrate that there are no etyma (unless of recent "foreign" origin) with no ties to Sinographs; it is assumed to be impossible in a practical sense for a "non-foreign" word like GÂU to have no "cognate" Sinograph from the Neo-Chinese canon (which of course excludes Vietnamese Nôm, for instance). In the Neo-Chinese logic, GÂU is categorically assumed to have at least one "cognate" Sinograph from the canon, and if 𠢕 can be shown to be the least unlikely candidate, the etymology is conclusively enshrined with or without proof.

It would seem that the place of a word or etymon in English Wiktionary generally doesn't depend on purported etymologies. Rather, words are grouped together (or not) based on established knowledge; unestablished etymologies are at most noted as such, with links between the possibly related roots. Why is Hokkien dealt with so differently? The rationale seems to be that this is a valid — if exceptional — shortcut, but the shortcut seems to have generated a great deal of misleading, invalid (mis)information.

NOTE. I must take a bite out of my own question by noting that etyma can only be cognate (or not) to etyma; there's no way for a written symbol to be "cognate" to anything. In this case — of course — we want to refer to an etymon (perhaps presuming, as we often do, that there has only ever been one) represented at some point by the symbol 𠢕. As a convenient & indispensable shorthand, we refer to 𠢕 as if it was the etymon itself. This shorthand is unique to modern Sinology (Sinology as informed by Neo-Chinese nationalism), and has never come up for public review or discussion. 釆 (talk) 04:50, 24 June 2023 (UTC)[reply]

Hello, not a Chinese editor but I thought I would give my input. What page are you referring to when you talk about "GÂU"? There is no etymology under gâu, let alone an Etymology 2, so I am slightly confused what you're referring to here. Moreover, you mean to say that "GÂU" is not related to MC /ɦɑu/, OC *ŋaːw? The relationship seems quite plausible to me, if not in fact obvious, so the Hokkien spoken rendition of "𠢕" and "GÂU", rather than being "cognate" are literally the same term, no? Kiril kovachev (talk) 23:48, 27 June 2023 (UTC)[reply]

@釆 Kiril kovachev (talk) 23:49, 27 June 2023 (UTC)[reply]

@Kiril kovachev Thank you for your input. This is the page I'm referring to:

https://en.wiktionary.org/wiki/𠢕

This page implies that the word represented as 𠢕 in (for example) the 廣韻 (an imperial rimebook, or semi-dictionary) is a word, timelessly; and that Hokkien GÂU is simply the same word in a certain dialect of the same language, Chinese.

Just in case, note that the word GÂU has typically been written in one of four ways (賢, 爻, 肴 & GÂU), and 𠢕 is not one of those. So GÂU is not here due to orthographical considerations. 𠢕-as-GÂU is a (recent) theory of etymology.

To the extent that people write GÂU as 𠢕, yes, the two are logically just the same thing. However, GÂU is not generally written 𠢕.

Fundamentally, Wiktionary has GÂU in this awkward position because it holds 𠢕-as-GÂU to be etymologically true, w/o what would constitute proof in studies of Austronesian, Romance, Dravidian, etc. etymology.

The logic of 𠢕-as-GÂU is basically circular, but — as you can see — this is difficult to illustrate whenever it is first presented as Truth to the uninitiated. 釆 (talk) 19:58, 29 June 2023 (UTC)[reply]

@Kiril kovachev (Far from being offended, I would welcome constructive criticism of my maybe-tortured attempt at explaining what I have just tried to explain.) 釆 (talk) 20:05, 29 June 2023 (UTC)[reply]

You must forgive me, since like you identified, I have insufficient experience to properly understand the problem. Let me see if I've understood correctly:

The word "GÂU", in Hokkien, exists in some capacity (what is it actually meant to mean, by the way?), but it is not usually written as 𠢕, which is what the Wiktionary entry for 𠢕 would suggest.

My failure to understand comes from the following:

- Aren't there numerous syllables pronounced "GÂU" in Hokkien, each with different meanings, and hence different assigned characters?

- What is the concrete etymology of the word we're talking about here, and what would we change to reflect the correct etymology? Since filing it under 𠢕 is a mistake, where would it go?

- If 𠢕 can be read as "GÂU", then isn't it suitable for "GÂU" to be placed under that Chinese entry as one of its renderings? Are you perhaps suggesting that Hokkien should not be placed under the unified Chinese header?

Kiril kovachev (talk) 22:46, 29 June 2023 (UTC)[reply]

@Kiril kovachev > Let me see if I've understood correctly: The word "GÂU", in Hokkien, exists in some capacity …, but it is not usually written as 𠢕, which is what the Wiktionary entry for 𠢕 would suggest.

Yes, on all counts. (The Wiktionary entry is misleading. Unfortunately, it’s one of thousands reflecting the same complex of interlocking biases.)

GÂU means “clever; wise; skilful; good at [something]” (Douglas).

Good questions. I’ll try to answer them as concisely as I can.

Hokkien GÂU is cognate to Teochew GÂU. (Wiktionary & ISO treat Hokkien & Teochew as one language, which is unjustified, & inconsistent with their treatment of many other language pairs.) I suspect there are cognates in Hokchew (Foochow) & Hokchia (福清語). There are no cognates in Hakka, Cantonese or Vietnamese, AFAIK. I’m not aware of cognates in the “Wu” languages, but I do wonder.

(This answer may be unsatisfying for most. There’s a default expectation that words in Hokkien should have some kind of pan-Chinese link. But there is none, to the best of my knowledge. And many scholars have put their minds to finding one over the years, with a single-mindedness that might be considered unscientific. I would point out that when a Hokkien etymon has pan-Chinese ties, there are almost always cognates in Hakka, the Wu languages, or both; and this meshes with what we know of the relevant linguistic history. Even Sinologists will admit that a decent share of the Hokkien lexicon is non-pan-Chinese, in theory; but whenever a specific etymon is discussed, the default assumption is that “this will be not one of those”.) 釆 (talk) 09:00, 1 July 2023 (UTC)[reply]

@Kiril kovachev GÂU has no homophones in spoken Hokkien. Whether there are unrelated Sinographs (not used to write this GÂU) that also have a reading of GÂU … is another matter.

To simplify, there are cosmopolitan Sinographs that have readings in every Sinospheric language. These are the Sinographs used in the Confucian classics, or which pertain to the “international” pan-East Asian written koine (called “Chinese” in English) that, say, a captain from Kyushu in the 1200s or 1800s would’ve used in written correspondence with a Korean port official. And then there are vernacular Sinographs (or “Sinoid graphs”) that are only used in certain languages or in certain regions.

Vietnamese 𠀧 (BA, “three”) would be an example of the latter. 𠀧 is a Vietnamese Sinoid graph that doesn’t have a reading in Japanese, Mandarin, or Hokkien. What I mean to emphasise here for a moment is that just b/c a Sinoid graph exists doesn’t mean that it has a reading in every Sinospheric language. So there’s a non-frivolous — but kind of forbidden — question as to whether 𠢕 has a reading in Hokkien at all.

The 19th century Hokkien rimebook 彙音妙悟 has 8 graphs in the *GÂU section, incl. 爻 and 肴, & a few that strike me as being obscure. (Note that these are generally not “syllables pronounced GÂU in Hokkien”; they’re just Hokkien readings of cosmopolitan Sinographs, analogous to the Hakka or Vietnamese or Korean readings of the same graphs.) But no 𠢕. Nor is 𠢕 found under *HÔ͘ or *NGÔ͘.

Another, more renowned 19th century Hokkien rimebook, 彙集十五音雅俗通, has just 2 graphs in *GÂU: 嶤, and 賢, the latter being simply the then-as-now most common way to write the vernacular word GÂU. (爻, 肴, etc. are found under a different reading, reflecting what could be considered differences of dialect.) 𠢕 is not found.

The 20th century Campbell “dictionary” (technically Hokkien-Taiwanese, & not a conventional dictionary) was marinated in Chinese nationalism and tried to include a much wider palette of graphs than the pre-modern rimebooks … yet only listed two graphs in GÂU: 賢 & a graphic variant of 賢. Again, 嶤, 爻, 肴, etc. are found under different readings. 𠢕 is not found.

Even the leanest of these books (彙音妙悟) carried somewhere around 10000 graphs. Nor did the Cantonese rimebook 分韻撮要 (the slicker product of a wealthier, more highly literate region) see fit to include 𠢕. To be sure, these books (the Hokkien rimebooks & Campbell, at least) weren’t considered authoritative in their time; teachers were. But these pre-modern & early modern books serve as proxies. It’s safe to say that 𠢕 wasn’t part of the written koine as taught & used in the Hokkien-speaking region (or anywhere, apparently) in late imperial times. Nor was it a vernacular Hokkien graph.

It’s possible that a Hokkien-speaking teacher in the 1930s would’ve turned to the KANGXI GRAPH BOOK and given us a reading like HÔ or NGÔ͘ or similar based on the reading encoded in the KANGXI. Something like this was described in an essay by Ông Io̍k Tek 王育德 (1924-85). And this is what a Chinese nationalist would expect. But there is no indication that this would've been the case before about the 1870s. (The non-koine, poss. KANGXI-derived graph 鑫 suddenly appears in personal names in Formosa in the early 20th cen....)

So what is 𠢕 as captured in the 11th century 廣韻 & 集韻 and later copied into the KANGXI? My guess is that it was a graph used somewhere in the heartland of China — maybe slightly outside the seats of power — to write an infrequent etymon, or one syllable of an infrequent polysyl. etymon. The 集韻 suggests that this etymon or syllable had two pronunciations, and that some were inclined to use the (cosmopolitan) graph 豪 instead.

From a Hokkien POV, 𠢕 is no less obscure & out-there than 𠀧. To a Chinese nationalist, though, it would be sacrilegious to suggest that any graph in “canonical Chinese” (as defined in Chinese-nationalist terms) — and KANGXI & 廣韻 are squarely canonical — lacks a reading in any Chinese language. (And even Japanese or Vietnamese are just “the ones that got away”.) While 𠢕 may be undefined for Hokkien in a practical AND historical sense, in a Chinese-nationalist light, 𠢕 is canonical and MUST be assigned a Hokkien reading — logically *HÔ/*HÔ͘ and/or *NGÔ͘/*GÔ (but *GÂU might be fine, especially if paired with a micro-narrative that insinuates that an etymologically “orphaned” vernacular word is in fact the fossil of a long-lost timeless Chinese word).

So, under Chinese nationalism, 𠢕 (but not 𠀧) is assumed a priori to have a Hokkien reading. Although almost no Sinograph-literate Hokkien speakers past & present would recognise 𠢕, the Chinese-nationalistic take would be that “they just haven’t learned it yet”. But this is beyond the realm of science. 釆 (talk) 09:25, 1 July 2023 (UTC)[reply]

@Kiril kovachev > Are you perhaps suggesting that Hokkien should not be placed under the unified Chinese header?

Hokkien — if Taiwanese is excluded — is clearly “a Chinese language” in a social sense. This is the social reality, which in turn reflects six centuries of political reality. As for the current treatment of Hokkien as a dialect of “Chinese”, that’s fictional, and not in line with how Wiktionary deals with “non-Chinese” languages in general. That’s an interlocking issue, but it might have to be dealt with separately.

The outlines of the Hokkien-as-dialect issue are clearly visible from here, though. For over a thousand years, serious discourse in East Asia took place chiefly in a stable written koine. If desired, writings in the koine could be read aloud using local sets of readings; these sets of readings seemed to often be regarded as the most wholesome & significant facet of the spoken vernaculars that they were siamesed to. At the dawn of the modern era, the koine was replaced by four elevated vernacular languages, of which Mandarin Chinese was & is touted by Chinese nationalists to be the successor of the old koine — although the fringe nations of East Asia don’t recognise this b/c they “lack loyalty” and China hasn’t been “made great again” yet. The idea of written Mandarin as the modern continuation of the sacred koine underlies the current scholarly pretense that Mandarin syntax & lexicon are essentially a pan-Chinese syntax & lexicon, with various other vernaculars departing from pan-Chinese syntax & lexicon much less often than not; hence the idea that it would be a waste of scholarly effort & resources to simply treat (say) Hokkien, Cantonese, Hakka & Mandarin as four different languages, no matter how little mutual intelligibility there is. 釆 (talk) 09:52, 1 July 2023 (UTC)[reply]

Forgot to add: In a sense Standard Mandarin, the vernacular, & Standard-Mandarin-in-vernacular-readings are in a three-way relationship. While it’s hard to justify treating Standard Mandarin & the (non-Mandarin) vernacular as one language, it’s also hard to see Standard-Mandarin-in-vernacular-readings as anything but a facet of Standard Mandarin. If the modern speaker subjectively perceives the vernacular & Standard-Mandarin-in-vernacular-readings as low & high registers of the vernacular, it follows that they’d perceive the (low) vernacular to be a low-register extension of “Chinese”.

If anything, maybe repositories of knowledge should decouple the “low” vernacular from Standard-Mandarin-in-vernacular-readings? The latter — of considerable ceremonial importance to many — would continue to be appended to Standard Mandarin…. Maybe this would end the tug-of-war. 釆 (talk) 11:30, 1 July 2023 (UTC)[reply]

I once suggested that we should show the vernacular and literary readings separately, or at least indicate them clearly, but it turns out that for some words it's difficult to do so (e.g. the ones at 生 or 爭) due to how different lects may use vernacular or literary readings differently for the same sense (and often the sources do not document which sense is in the vernacular/literary reading); let alone using separate pronunciation headers, or even separating the vernacular ones under a different language header.

I think at the very least (and this is what I would do if this were a Cantonese word, but I usually don't touch non-Cantonese stuff), we should be putting the modern Hokkien senses under a different etymology section, cf. 杰#Etymology 3.

Also of note is that the literary readings do not always come from Standard Mandarin.

Of course, this is still a controversial issue, and I'm still thinking of ideas to improve this – some of our entries like 下 and 夾 are just abhorrent. Feel free to suggest practical solutions if you have some. – Wpi (talk) 16:09, 1 July 2023 (UTC)[reply]

@Wpi Why not the solution that's in effect for non-Chinese languages? English "royal" & "regal" are cognates, but we find them on separate pages, together with homographs in various languages, regardless of etymology.

https://en.wiktionary.org/wiki/royal

https://en.wiktionary.org/wiki/regal

To extend one of your points: The formal 漢文 readings generally don't come from Standard Mandarin at all. At the same time, many people have — quite forcefully — reinterpreted Standard Mandarin to be the "new" 漢文, i.e. what the 漢文 readings are chiefly intended to be used to read.

The phenomenon of, say, mainstream Cantopop lyrics being written in (put plainly) a localised dialect of Mandarin, or Cantonese church congregations reading the Mandarin Bible "in Cantonese", using an extended set of 漢文 readings ... is something that's unique to Chinese; maybe the current pronunciation boxes (for "Chinese" words) do do it justice? Meanwhile, a "mainstream Wiktionary" approach seems to be a much better fit for the vernaculars. 釆 (talk) 16:04, 2 July 2023 (UTC)[reply]

@Wpi (Also, "vernacular readings" is an artificial concept. 漢文 readings (called "HÁN" in Vietnamese studies) are what exist in the social reality.

Novices have trouble determining whether HÁI (for 海) in any given context is a Han (漢) reading in Hokkien-Taiwanese or not. The social reality is that when Han readings are desired, HÁI is fine; that's what matters, socially. In linguistic terms, HÁI as in HÁI-ÉNG 海泳 or HÁI-KÎᴺ-Á 海墘仔 is almost certainly "of greater antiquity" than the bulk of Han readings. But there's no social relevance or reality to that, which is why artificial categorisation tends to run aground or fall apart. 釆 (talk) 16:25, 2 July 2023 (UTC)[reply]

@Wpi (Was that clear? Or was what I wrote too abbreviated?) 釆 (talk) 06:37, 3 July 2023 (UTC)[reply]

@釆—Thanks very much for your response. I'm afraid there is little level on which I can respond to your quite grave concerns except emotionally, because I am familiar with the reality of "sub-standard" languages being repressed by major governments; this is a worldwide reality that is slaughtering a great wealth of culture by the day and year, and for which reason I'm sorry to hear that this is the treatment of Hokkien as well; inasmuch as the Wiktionary project is able to do anything, what solution do you think would be possible? The thing is, I don't see the "Chinese" and "Hokkien" languages being decoupled in all likelihood, and words will continue to be indexed by their perceived canonical Chinese representation, even if that today is the modern Mandarin form, and even if that results in the imposition of pseudo-Hokkien readings onto spellings that just aren't used. What I can see as a fix for now is to remove Hokkien pronunciations on a given sinograph's page—if you are dead certain that that reading does not exist. Sorry for my, I imagine, poor understanding, but I support your pursuit of the most accurate possible representation of languages here. Kiril kovachev (talk) 21:04, 8 July 2023 (UTC)[reply]

@Kiril kovachev Thank you. No doubt such occurrences are many; Hokkien & its sister languages (Teochew, etc.) are probably just the most spectacular examples in terms of linguistic distance — which is politically (almost) irrelevant, of course.

It seems that the best solution would be to recognise Hokkien and Mandarin-in-Hokkien-readings as separate entities. The former would be a separate language. (This is not impossible. In fact, Wiktionary seems to have handled Hokkien that way till the 2010s.) The latter would be a special dialect of Mandarin. (While this dialect only exists marginally, it has political-cultural or even spiritual weight, and it may be harmless to exaggerate its importance as long as it's not at the expense of the actual Hokkien language itself.)

In other words, the systematically presented Hokkien readings of Mandarin words should be presented as exactly that, rather than as some kind of pseudo-spoken Hokkien. And actual Hokkien words should be separately presented as such, whether or not there is a homographic cognate in other languages (Chinese, Japanese, etc.). 釆 (talk) 14:33, 9 July 2023 (UTC)[reply]

@釆 Kiril kovachev (talk) 22:46, 29 June 2023 (UTC)[reply]

Add Vim language

Hi to all people! Since Vim is an artificial language, like Toki Pona, and is pronounced exactly like the English language, please create words, create some templates, etc! Here is: Appendix:Vim Thanks for all the people! Mihai Popa 😃📃 ^{Talk to me! 💬} 『My contributions! 🕔🕖』 12:59, 24 June 2023 (UTC)[reply]

It's nice to meet you. Are you looking to create a collaborative project where anyone who joins can help out? That sounds like a lot of fun, but there are other places to go for things like that. I know a lot about artificial languages, but I admit I've never been much of a team player .... all the goals I set are so far afield of what others want .... so I can't point you anywhere in particular that might be a good place to start a new collaborative project. Hopefully someone else here can offer better advice. Best wishes, —Soap— 13:11, 24 June 2023 (UTC)[reply]

Just FYI it appears this user is the owner of a sock farm, many of whose accounts have been blocked on the English Wikipedia. The user's accounts mostly all contain the word "Mihai" in them and sometimes are even completely transparent in their intent (e.g. "MihaiUnblocked"). See [7]. Benwing2 (talk) 07:11, 28 June 2023 (UTC)[reply]

Okay thanks for letting me know. I suppose he can get blocked here too if he does anything like that, but so far his edits seem good-faith, if admittedly rather playful. —Soap— 19:47, 29 June 2023 (UTC)[reply]

Holy Rolling

Obviously the current edits to the cluster of pages around English saint are just quick kludges and crosslinks. There are a couple of related issues:

1st, uncapitalized saint currently has a section treating it as a prefix with the idea that it can sometimes be added to completely random things to form placenames. Either that's just completely the wrong way to go about talking about the idea or we're missing a bunch of content at Category:English terms prefixed with saint and related forms such as san, santa, sainte, &c. Even if we now handle separate initial words as prefixes (?), this would only ever show up as capitalized Saint in practice, right? And even then, it would only ever apply to unusual uses of the word, with everything actually named after supposed saints just handled from the base terms.

2nd, in 2021, User:030BeterHe sensibly moved all the 'Sainte' names from saint to sainte. The listed reason was 'reduced bloating', which isn't quite right. Somewhere there's going to be a list of all the placenames with saint in them and it's going to be very very long. Still, it was the right move: most if not all placenames with 'Saint_e_' aren't native English variants of saint at all. They're just borrowings directly from French grammar. The problem is that right now the derivations are all over the place: there are separate lists of overlapping placenames from the exact same word/senses at saint, Saint, and St. Shouldn't we have all uses of the word as a word at saint, all the uses of the word as a title for specific people at Saint, and absolutely nothing at the abbreviations even when the abbreviation is extremely commonly used in the written form? or do all the title uses really go into 2 derivation lists based on formatting?

Even without the crosslinks I just added, there should probably be some indicator to readers where to find what they're looking for and to editors where to put things properly since it seems to be a recurring problem with these entries.

3rd, shouldn't all the various forms of abbreviation of Saint just point at the primary abbreviation? St is where we're currently parking the translations of abbreviation of Saint, so there, right?

4th, we really should have a list somewhere of the various hyponyms for saints in various faiths but that will bloat up with notes and subcategories very quickly. Better to just have a new category for some version of :en:Holy people and let the curious just click around? better to have a Thesaurus:saint entry to park everything? or better to just let the saint entry #Hyponyms section bloat up for now? — LlywelynII 13:36, 24 June 2023 (UTC)[reply]

Less important, 5th, aren't both of the "nickname" senses at Saint just occasional direct application of the "any good person" sense of saint and a new "pej. any person assumed to believe themselves morally superior" sense that ought to be added to the uncapitalized form? Is it normal to treat general potential nicknames this way?

and, 6th, what initially set this off was that we have entries for yͭ, yͤ, 2ᵈ, 2ⁿᵈ (which looks off to my browser), and so on. I was hoping there already was an entry for capital S and superscript t to just copy/paste into some formatting. We don't have one yet but it does match the formatting and former commonality of those other examples. Just create an "archaic form of" entry at Sᵗ? or is there a particular kind of superscript to use with our entries for these forms? or just don't create that in this case at all? — LlywelynII 14:13, 24 June 2023 (UTC)[reply]

Regarding just your first paragraph ..... there are some placenames in the Gaspé Peninsula of Quebec with names like Saint-Pamphile and the eye-catching Saint-Louis-du-Ha! Ha!. These are not named after canonized saints, so far as we know. My guess is, that's the basis for the third definition, and although I don't know of any English examples offhand, there are probably at least some. We already have a large number of French and Canadian French toponyms listed, so if we can't find English examples, maybe we could just move definition 3 to French. —Soap— 22:40, 24 June 2023 (UTC)[reply]

There's also St. Tiggywinkles, based on the tendency for hospitals in some countries to be named after saints, as was the old TV series St. Elsewhere. Chuck Entz (talk) 22:56, 24 June 2023 (UTC)[reply]

For point 3, I'd say that the translations for "saint" go under saint, and the abbreviations go under St.

Point 4, I feel like en:Holy People would work, but double-check. CitationsFreak: Accessed 2023/01/01 (talk) 00:55, 25 June 2023 (UTC)[reply]

About Judeo-Italian

Hello, everyone.
I've recently acquired a copy of Una traduzione giudeo-romanesca del Libro di Giona [A Judeo-Roman translation of the Book of Jonah] by Luisa Cuomo (1988), and—on the basis of this text—I've started adding a couple of entries in the Judeo-Italian language, namely דוּמֵידֵית (Dumedet), פַארַאוֵילַה (paravela), and פִילְייוֹ (figlio). I've formatted the entries in a manner similar to that of Hebrew entries, with the page name not containing diacritics, which are instead shown in the headword. But, since the situation with vowel diacritics is simpler, more reminiscent of Yiddish than of Hebrew, I was wondering if they should be added instead. Any ideas on this?
Thanks for any input on the subject. — GianWiki (talk) 13:11, 25 June 2023 (UTC)[reply]

I think it depends how the language is normally written. I think the choice to only include nikkud in the headword is because Hebrew is normally written without them, not because the system is complicated. The nikkud are generally used for disambiguation, in Hebrew Scriptures (to ensure the preservation of proper pronunciation), and in dictionaries. So if most Judeo-Italian texts include them, then maybe it would be better to follow the practice for Yiddish. Andrew Sheedy (talk) 20:51, 25 June 2023 (UTC)[reply]

Good point. As far as I've been able to see, it seems Judeo-Italian texts were written with full vocalization (albeit the niqqud signs used seem to vary to some extent, perhaps depending on dialect).

Thank you. — GianWiki (talk) 07:53, 26 June 2023 (UTC)[reply]

Transliteration in `{{quote-book}}` and the like

In {{quote-book}}, is there a way to achieve transliteration for parameters like |title= and |chapter= that I'm not aware of? I haven't been able to find anything on the subject.
Any contribution is more than welcome. GianWiki (talk) 07:35, 27 June 2023 (UTC)[reply]

We don't even have such an option for the names of authors, where transliteration might perhaps be more useful, except, of course, standard Wiktionary transliteration would be unhelpful. Presumably the names of authors should be the form most commonly seen in English. --RichardW57m (talk) 10:02, 27 June 2023 (UTC)[reply]

I'm not trying to say it's fundamental for comprehension, but I felt it could be a nice plus. —— GianWiki (talk) 11:09, 27 June 2023 (UTC)[reply]

There's also a converse pressure, to keep the bibliographical information short. It would be nice to have |info_link= to link, when useful, to more information about the cited item itself. If the reference is implemented via a template, this information could be displayed on the template's page. Or perhaps we just automatically hide some of this arguably excess information. --RichardW57m (talk) 11:47, 27 June 2023 (UTC)[reply]

All that is available for titles and chapters is translation. Automated transliteration beyond the level of single words is not standardly available on Wiktionary, not even excluding cases where no transliteration has yet been released. Non-standard mark-up is required for Thai, Chinese and other languages because too many editors hate standard mark-up such as <wbr> for invisible word boundaries, and actual transliteration in the sense of a reversible system is refused for Thai and many other languages and may well not exist for Chinese (unlike Akkadian). Templates such as {{zh-x}}, {{th-x}} and {{mnw-quote}} take marked up text and display cleaned up text and transliteration. Thai has the additional problem that WT:About Thai#Thai personal names precludes entries for people's names. Of course, it's not beyond the wit of man to make these schemes into standard or alternative generally available transliteration schemes, though the interfaces in the {{quote-book}} family will need some thought - note the musings at Burmese book titles. --RichardW57m (talk) 10:02, 27 June 2023 (UTC)[reply]

I wasn't really talking about automated transliteration (although that's always nice when available). What I had in mind was a parameter allowing for manual insertion of a transliteration of a title, or chapter title (or author name, as RichardW57m said above). Something like |translit-author=, |translit-title=, |translit-chapter=. —— GianWiki (talk) 11:17, 27 June 2023 (UTC)[reply]

Or we expect the simple parameters to give the information in Roman script, and use |script-title= etc. to give the items in the 'original' script. That concept could be fun if the author never had the work written down or the work quoted from is a transliteration into a script created long after his death. I've recently added some quotations from a work composed by Buddhaghosa and republished in the Chakma script, e.g. 𑄢𑄇𑄴𑄈𑄖𑄨 (rakkhati). The modern spelling is part of the reason for the quotation. --RichardW57m (talk) 12:22, 27 June 2023 (UTC)[reply]

@Sgconlaw, GianWiki Any thoughts? I think it should be fine to add params for manual transliteration of titles or chapters. What should the format be, esp. if combined with a translation? Benwing2 (talk) 21:20, 27 June 2023 (UTC)[reply]

@Benwing2 I'll make an example using {{quote-book}} and Arabic. Right now, the following code

#* {{quote-book|ar|year=2023|author=مُؤَلِّف|title=عُنْوَان|trans-title=Title|chapter=فَصْل|trans-chapter=Chapter}}

gives us

- 2023, مُؤَلِّف, “فَصْل [Chapter]”, in عُنْوَان [Title]:

What I'm thinking about could mimic the style of the {{l}} and {{m}} templates, so that something like

#* {{quote-book|ar|year=2023|author=مُؤَلِّف|translit-author=Muʔallif|title=عُنْوَان|trans-title=Title|translit-title=ʕunwān|chapter=فَصْل|trans-chapter=Chapter|translit-chapter=Faṣl}}

gives us

- 2023, مُؤَلِّف [Muʔallif], “فَصْل [Faṣl, Chapter]”, in عُنْوَان [ʕunwān, Title]:

Also, I'm not sure how to achieve it, but I believe there should be a way for the template to recognize the script in order to show it with the proper font (because, in this example, مُؤَلِّف looks definitely more readable than مُؤَلِّف), maybe the same way whereby {{l}} and {{m}} use the language parameter to get that information.

What do you think? —— GianWiki (talk) 07:57, 28 June 2023 (UTC)[reply]
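To make the proposed layout concrete, here is a rough Python sketch of the bracket logic only. It is not how {{quote-book}} is actually implemented (that lives in Lua modules); the function names `annotate` and `format_citation` and the keyword names are made up for illustration, loosely following the |translit-…= and |trans-…= parameters suggested above.

```python
def annotate(text, translit=None, translation=None):
    """Append a bracketed note "[translit, translation]" when either is given."""
    notes = [n for n in (translit, translation) if n]
    return f"{text} [{', '.join(notes)}]" if notes else text


def format_citation(year, author, title, chapter=None,
                    translit_author=None,
                    translit_title=None, trans_title=None,
                    translit_chapter=None, trans_chapter=None):
    """Hypothetical helper: build the citation line in the proposed style."""
    parts = [str(year), annotate(author, translit_author)]
    if chapter:
        # Chapter goes in curly quotes, followed by "in <title>".
        parts.append("“" + annotate(chapter, translit_chapter, trans_chapter) + "”")
        parts.append("in " + annotate(title, translit_title, trans_title))
    else:
        parts.append(annotate(title, translit_title, trans_title))
    return ", ".join(parts) + ":"


print(format_citation(
    2023, "مُؤَلِّف", "عُنْوَان", chapter="فَصْل",
    translit_author="Muʔallif",
    translit_title="ʕunwān", trans_title="Title",
    translit_chapter="Faṣl", trans_chapter="Chapter",
))
```

With only the translations supplied, the same helper falls back to the current "[Chapter]" / "[Title]" display, so the extra parameters would be purely additive.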

Looking back at what I just typed, I can now see that it could generate confusion at some level.

There's also the option of putting transliterations in tooltips, so that

#* {{quote-book|ar|year=2023|author=مُؤَلِّف|translit-author=Muʔallif|title=عُنْوَان|trans-title=Title|translit-title=ʕunwān|chapter=فَصْل|trans-chapter=Chapter|translit-chapter=Faṣl}}

returns

- 2023, “فَصْل [Chapter]”, مُؤَلِّف [here you can insert a version of the author's name that's not just merely transliterated, if there is one, e.g. Muallif], in عُنْوَان [Title]:

It seems way cleaner this way. —— GianWiki (talk) 08:18, 28 June 2023 (UTC)[reply]

Also, about the name of the parameters: since |translation= is shortened as |t=, and |transliteration= as |tr=, I dare to propose — mainly in order to facilitate the ease of use — to change

|trans-title= and |trans-chapter= (or |trans-entry=)

to

|t-title= and |t-chapter= (or |t-entry=)

and add

|tr-title= and |tr-chapter= (or |tr-entry=)

for transliteration purposes. — GianWiki (talk) 12:14, 28 June 2023 (UTC)[reply]

Transliteration standard for Judeo-Italian

Hi.
I've recently added a few entries in Judeo-Italian, and I have more or less employed the transliteration system used in the only published work I've been able to get my hands on that has transliteration, Una traduzione giudeo-romanesca del Libro di Giona [A Judeo-Roman translation of the Book of Jonah] by Luisa Cuomo (1988), and I was wondering if it would be better to establish a transliteration standard beforehand, or—considering the lack of an official one—it would be fine to just use whatever (within reasonable limits, of course).
I know the language is very much a minority one, so it's likely that most people will see this as a non-issue, but I wanted to ask nonetheless. I thank you all in advance for your suggestions, and I certainly won't hold it against you if you tell me there are bigger issues to take care of around here. —— GianWiki (talk) 16:50, 27 June 2023 (UTC)[reply]

Do you mean whether we should use nikudes (niqqudoth) in the page name, or only in the page text (like we do with Hebrew and Yiddish)? I don't see any reason for using nikudes in a page name, just as we don't use accent signs in page names in Serbian, Ukrainian, Slovenian and Russian. On the other hand, you use them in Chinese and Greek page names for some reason. Tollef Salemann (talk) 17:24, 27 June 2023 (UTC)[reply]

@Tollef Salemann: See the discussion above (two discussions above this one, on the same page). Andrew Sheedy (talk) 17:43, 27 June 2023 (UTC)[reply]

No. By transliteration, I mean representing Hebrew-script characters with Latin-script characters. Stuff like "What Latin character should represent the Hebrew character י? ⟨i⟩ ? ⟨y⟩? something else?"

Also, if I remember correctly, Yiddish pagenames do use nikud. —— GianWiki (talk) 20:03, 27 June 2023 (UTC)[reply]

Ah yeah sorry. By Yiddish nikud I mean you avoid nikud in many words, but yeah, Hebrew ones. With "figlioli" example, do you mean, the variant "figliyoli" is more accurate? Tollef Salemann (talk) 23:43, 27 June 2023 (UTC)[reply]

Japanese – automated accent scraping proposal

Hello, I was recently considering a new bot task, whose purpose would be to use the contents of Japanese dictionaries such as Daijirin or the NHK Accent Dictionary to harvest the pitch accent information for Japanese words, which could then be automatically transferred to Wiktionary entries. I wanted to gauge opinion as to whether this is a productive, sensible, acceptable, and desirable task altogether, since I feel there may be objection to this idea, and I therefore wanted to hear everyone's thoughts before committing to any concrete coding, etc.

In more detail, I would go through existing terms on Wiktionary, maybe through a dump, and find terms with no accent information (or, use Special:WhatLinksHere/Template:tracking/ja-pron/no accent as an index), and subsequently reference a local dictionary file to extract the relevant data. (Since the pitch accent data is not transparently available in the dictionary file, but rather as a plain-text section of text in the main definition body, it would require some string manipulation to extract, which may figure as a source of complication.) I would then use the accent, making sure the kana matches that of the {{ja-pron}} template it's being added to, and then append the gathered accent.

What do you think? Kiril kovachev (talk) 23:36, 27 June 2023 (UTC)[reply]
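A minimal Python sketch of the workflow described above, for discussion only. The dictionary notation assumed here (an accent number in square brackets such as "[0]" somewhere in the definition text) is an assumption, not a statement about how Daijirin or the NHK dictionary actually encode it, and the helper names `extract_accent` and `add_accent` are hypothetical. The sketch only touches a bare {{ja-pron|kana}} call and returns a warning instead of editing whenever anything is uncertain.

```python
import re

# Assumed notation for the accent inside the definition text, e.g. "[0]".
ACCENT_RE = re.compile(r"\[(\d+)\]")
# Matches only a bare {{ja-pron|kana}} with no other parameters.
JA_PRON_RE = re.compile(r"\{\{ja-pron\|([^|{}]+)\}\}")


def extract_accent(definition_text):
    """Return the first accent number found in the text, or None."""
    m = ACCENT_RE.search(definition_text)
    return int(m.group(1)) if m else None


def add_accent(wikitext, dict_kana, definition_text, ref="DJR"):
    """Return (new_wikitext, warning); warning is None when the edit is safe."""
    accent = extract_accent(definition_text)
    if accent is None:
        return wikitext, "no accent found in dictionary text"
    found = JA_PRON_RE.findall(wikitext)
    if len(found) != 1:
        return wikitext, "expected exactly one bare ja-pron template"
    if found[0] != dict_kana:
        return wikitext, "kana mismatch between entry and dictionary"

    def repl(m):
        return ("{{ja-pron|" + m.group(1) + "|acc=" + str(accent)
                + "|acc_ref=" + ref + "}}")

    return JA_PRON_RE.sub(repl, wikitext), None


entry = "===Pronunciation===\n{{ja-pron|へきぎょく}}\n"
new_text, warning = add_accent(entry, "へきぎょく", "へきぎょく [0] 【碧玉】")
print(new_text if warning is None else "skipped: " + warning)
```

Entries with several readings, or with a {{ja-pron}} that already carries parameters, fall through to the warning path and would be left for manual review.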

@Kiril kovachev I have done things conceptually similar to this so it's definitely possible to do in a reliable fashion. I would advise you to punt all the edge cases where you're not 100% sure they're correct (i.e. rather than making the change automatically, output a warning so you can review it manually). The only potential concern would be a legal one based on scraping a likely copyrighted dictionary; but the pitch info is so basic that I'm not sure there would be an issue, particularly if it's describing a standardized dialect (facts can't be copyrighted although layout of facts and choice of what to present can be). Benwing2 (talk) 07:06, 28 June 2023 (UTC)[reply]

@Benwing2 Yes, I was also preoccupied with the copyright point, but as you say I believe the pitch information isn't "creative" or otherwise original enough to be considered copyrightable... although I'm no expert here, I agree we should be fine on that front, especially considering I already copy pitch accents by hand from those dictionaries, and I don't feel I should feel any different compunction just because it's being done automatically by a script. I'm glad to hear your approval, and of course I'll be taking your advice of staying on the safe side—if I get any work done coding-wise, I'll show you the script again when it's pretty much finished. Thanks! Kiril kovachev (talk) 12:43, 28 June 2023 (UTC)[reply]

@Benwing2 Hello again! I've developed what I believe to be a satisfactory solution to our problem here. As usual, it is available on GitHub (link) if you wish to inspect (if you would also like to run it, there are more particular instructions over there as to how to access the dictionary file); you can also see some of the test edits done so far:

I would specifically like to consult about what happens when the script is run on a page with {{C}} category entries at the bottom: the script attempts to generate the ===References=== section after the categories, which is improper. My question is: are these types of category inclusions the only possible straggling content that can occur at the bottom? (Ultimately, the true question revolves around how to position the references section after the main body, but before any of this extra metadata at the bottom; but this might be a starting point.)

Here is an example of what happens if there are such categories (on the page 碧玉):

==Japanese==
{{ja-kanjitab|へき|ぎょく|yomi=o}}

===Pronunciation===
{{ja-pron|へきぎょく|acc=0|acc_ref=DJR}}

===Noun===
{{ja-noun|へきぎょく}}

# [[jasper]] {{gloss|precious stone}}

{{c|ja|Gems|Minerals|sort=へきぎょく}}

===References===
<references />

Clearly no good.

In other news, I have currently structured the code around human intervention, such that I still need to review edits for the time being, before they are published to the site. Do you find it sensible to delegate this task fully to the bot, if we can ascertain it is behaving safely?

As a final description of its behaviour, the bot reads from Special:WhatLinksHere/Template:tracking/ja-pron/no accent, and additionally maintains a local blacklist of entries for which it has already failed to find a pitch accent, perhaps because the dictionary lacks one. Thus, when the program restarts and encounters entries that have already failed, they are skipped.

I intend to do some more manual inspection of its behaviour for a bit, but in the meantime I would be eager to hear your thoughts. Kiril kovachev (talk) 20:37, 10 July 2023 (UTC)[reply]
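For anyone curious what the skip-list behaviour amounts to, here is a small Python sketch under my own assumptions. The class name `SkipList` and the JSON file layout are hypothetical and not taken from the actual script on GitHub; the point is only that failed titles are persisted so that a restart does not retry them.

```python
import json
from pathlib import Path


class SkipList:
    """Hypothetical persistent set of page titles that already failed."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.failed = set(json.loads(self.path.read_text(encoding="utf-8")))
        else:
            self.failed = set()

    def mark_failed(self, title):
        """Record a failure and write the list back to disk immediately."""
        self.failed.add(title)
        self.path.write_text(
            json.dumps(sorted(self.failed), ensure_ascii=False),
            encoding="utf-8",
        )

    def pending(self, titles):
        """Return the titles that have not failed before, in order."""
        return [t for t in titles if t not in self.failed]
```

Writing the file on every failure is slower than batching, but it means nothing is lost if the run is interrupted halfway through the tracking list.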

I just expanded a bit at ロックアウト, adding in more refs for the pitch accent, among other things. FWIW, I think the examples you linked above look good. Thank you for doing this! 😄 ‑‑ Eiríkr Útlendi │^{Tala við mig} 18:03, 11 July 2023 (UTC)[reply]

@Kiril kovachev: The short answer to your question is that besides {{C}}, there are a handful of other templates and alternative spellings like {{c}}, {{cat}}, {{top}}, {{topic}}, {{topics}}, {{categorize}}, {{catlangname}}, {{catlangcode}}, {{cln}}, {{zh-cat}}, plus the wikitext [[C:]] and [[Categorize:]]. See here for the regular expression I use to match categories.

While categories *should* be at the end of an L2 entry, that's not always the case. In practice, this means you should never trust that the first category template you encounter is a sign that everything following it is "straggling content". To find the spot to append a References section, the safest way is to find the L2 section header of your target language, then check all of the following lines until you either encounter another L2 section, or the end of the page. Once you have all of the lines in your section, working backwards from the last line you can check if the line is a category, or a blank line until you get to something you don't recognize. Then insert your references section at that spot and you should be good.

Of course, that ignores the existence of an Anagrams section, which should be below your References section. To handle that you can just search for ===Anagrams===. If it exists, you can just insert ===References=== immediately before it and skip worrying about categories. JeffDoozan (talk) 14:12, 13 July 2023 (UTC)[reply]
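The procedure above can be sketched in Python roughly as follows. This is an illustration, not JeffDoozan's regular expression or the bot's actual code: the pattern is built only from the template names listed above (plus plain [[Category:…]], which I added myself), and the function name `insert_references` is hypothetical.

```python
import re

# Template names taken from the list above.
CATEGORY_TEMPLATES = ("C", "c", "cat", "top", "topic", "topics", "categorize",
                      "catlangname", "catlangcode", "cln", "zh-cat")
# A line consisting solely of one category template or category link.
# "Category" is my own addition to the wikitext prefixes mentioned above.
CATEGORY_LINE = re.compile(
    r"^\s*(?:\{\{(?:%s)\s*\|.*\}\}|\[\[(?:C|Categorize|Category):[^\]]*\]\])\s*$"
    % "|".join(re.escape(t) for t in CATEGORY_TEMPLATES)
)
# Level-2 header such as ==Japanese== (but not ===Noun===).
L2_HEADER = re.compile(r"^==[^=](?:.*[^=])?==\s*$")


def insert_references(wikitext, language="Japanese"):
    """Insert a References section into the given language's L2 section."""
    lines = wikitext.split("\n")
    target = "==" + language + "=="
    starts = [i for i, line in enumerate(lines) if line.strip() == target]
    if not starts:
        return wikitext  # language section not found: leave the page alone
    start = starts[0]

    # The section runs until the next L2 header or the end of the page.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if L2_HEADER.match(lines[i]):
            end = i
            break

    # If there is an Anagrams section, References goes immediately before it.
    for i in range(start + 1, end):
        if lines[i].strip() == "===Anagrams===":
            block = ["===References===", "<references />", ""]
            return "\n".join(lines[:i] + block + lines[i:])

    # Otherwise walk backwards over trailing blank and category lines.
    pos = end
    while pos > start + 1 and (lines[pos - 1].strip() == ""
                               or CATEGORY_LINE.match(lines[pos - 1])):
        pos -= 1
    block = ["", "===References===", "<references />"]
    if pos < len(lines) and lines[pos].strip() != "":
        block.append("")  # keep a blank line before whatever follows
    return "\n".join(lines[:pos] + block + lines[pos:])
```

Run on the 碧玉 example quoted earlier, this places ===References=== between the definition line and the {{c|ja|…}} line. A real bot would presumably also need to check whether a References section already exists and handle category templates that span several lines, neither of which is attempted here.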

@JeffDoozan Thanks for this advice, this is very helpful to know. I'd forgotten about this endeavour for a while, but I'll be getting back to it soon so my thanks for sharing this wisdom. Kiril kovachev (talk) 18:20, 28 July 2023 (UTC)[reply]

Issues with Carl Francis, again

Bringing up major issues with behavior of Carl Francis, the main editor for Cebuano.

Issues:

removes diacritics from Cebuano entry headwords. These are not ordinarily used in writing, but are provided in dictionaries as pronunciation aid.
formatting issues, penchant for adding articles onto translations, and duplication
rudeness toward other editors (see edit history for pantalan and ulod)

He's valuable in expanding our Cebuano coverage, but we need to monitor his behavior. TagaSanPedroAko (talk) 09:45, 28 June 2023 (UTC)[reply]

Announcing the new Elections Committee members

You can find this message translated into additional languages on Meta-wiki.

More languages • Please help translate to your language

Hello there,

We are glad to announce the new members and advisors of the Elections Committee. The Elections Committee assists with the design and implementation of the process to select Community- and Affiliate-Selected trustees for the Wikimedia Foundation Board of Trustees. After an open nomination process, the strongest candidates spoke with the Board and four candidates were asked to join the Elections Committee. Four other candidates were asked to participate as advisors.

Thank you to all the community members who submitted their names for consideration. We look forward to working with the Elections Committee in the near future.

On behalf of the Wikimedia Foundation Board of Trustees,

RamzyM (WMF) 18:00, 28 June 2023 (UTC)[reply]

Lacuna in documentation of Module:links

@Theknightwho, Benwing2, Huhu9001: Please add the documentation of field no_check_redundant_translit in the parameter data of exported function full_link(). It is used in Module:pi-decl/noun, ultimately because transliteration from Pali needs to know which writing system is being used. So doing should prevent that part of the interface being moved again without remedial action being taken promptly. --RichardW57 (talk) 18:08, 29 June 2023 (UTC)[reply]

@RichardW57 I will as soon as I can. I am currently very busy with something, but will do this when I get the opportunity. Theknightwho (talk) 18:10, 29 June 2023 (UTC)[reply]

@RichardW57 I made this change. I tried to find all existing places that use it but I seem to have missed Module:pi-decl/noun. Benwing2 (talk) 20:52, 29 June 2023 (UTC)[reply]

I see, you call it "suppress_check" internally; I was searching for "no_check_redundant_translit". Benwing2 (talk) 20:53, 29 June 2023 (UTC)[reply]

@Benwing2: That wouldn't explain why 'tracking' failed. However, it can take a week for all languages' terms' pages to be regenerated; presumably there aren't enough Lao script or alphabetic Thai script Pali lemmas for one to be processed in the short time you were monitoring. Unfortunately, the name's a bit of a misnomer - it suppresses checks not only for redundant transliteration, but also for different transliteration. --RichardW57 (talk) 21:33, 29 June 2023 (UTC)[reply]

@RichardW57 Recommendations for a better name? Benwing2 (talk) 21:37, 29 June 2023 (UTC)[reply]

@Benwing2: Well, trust_manual_translit would work, though it's not as clearly imperative as I would like. Of course, changing the name breaks the interface. --RichardW57 (talk) 22:02, 29 June 2023 (UTC)[reply]

@RichardW57 Since I just changed the interface there's no problem with changing it again, but I'm not sure trust_manual_translit is especially clear. Benwing2 (talk) 23:49, 29 June 2023 (UTC)[reply]

@Benwing2: The normal operation is to use maintenance categories to flag manual transliteration as something to worry about. Without this flag, a manual transliteration would only be accepted without placement in a maintenance category if automatic transliteration did not deliver a result. --RichardW57 (talk) 05:29, 30 June 2023 (UTC)[reply]
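
The behaviour described in this thread can be modelled roughly as follows. This is a minimal sketch in Python, not the Lua code of Module:links: the function name, the argument names and the returned labels are all hypothetical, and only the decision logic is taken from the comments above (the flag suppresses both the "redundant" and the "different" check, and without it a manual transliteration escapes a maintenance category only when automatic transliteration returns nothing).

```python
from typing import Optional

def translit_check(manual: Optional[str],
                   automatic: Optional[str],
                   suppress_check: bool = False) -> Optional[str]:
    """Return a hypothetical maintenance label, or None if nothing is flagged.

    Illustrative model only; the labels are invented for this sketch.
    """
    if manual is None:
        return None            # no manual transliteration, nothing to check
    if suppress_check:
        return None            # flag set: trust the manual transliteration
    if automatic is None:
        return None            # no automatic result to compare against
    if manual == automatic:
        return "redundant manual transliteration"
    return "manual transliteration differs from automatic"

print(translit_check("buddho", "buddho"))
print(translit_check("buddho", "buddhō"))
print(translit_check("buddho", "buddhō", suppress_check=True))
```

Seen this way, the flag acts more like "accept the manual transliteration as given" than "skip the redundancy check", which is the naming concern raised above.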

Precomposed characters?

I haven't been able to find anything clear about this, so I ask: is there a preference towards (or against) precomposed characters?
For a specific example: should a page name use פֿ (pe + combining rafeh) or פֿ (precomposed pe with rafeh)? —— GianWiki (talk) 19:08, 29 June 2023 (UTC)[reply]

@GianWiki Can you clarify what you mean? Internally the MediaWiki software converts all page names and page content into Unicode precomposed form. Benwing2 (talk) 20:54, 29 June 2023 (UTC)[reply]

@Benwing2: Unfortunately, pedantry is helpful here. Technically, it uses form NFC ('normal form composed'), but there are some precomposed characters that aren't allowed in form NFC. That includes all Hebrew letters including RAFE and I think all including DAGESH. There's a complete list of them at https://unicode.org/Public/draft/UCD/ucd/CompositionExclusions.txt . This is important information for those writing fast transliterators. --RichardW57 (talk) 21:53, 29 June 2023 (UTC)[reply]

@RichardW57 Hmm, I did not know this. In that case, I don't know the answer although I suspect we should not use these unusual precomposed chars. Benwing2 (talk) 23:48, 29 June 2023 (UTC)[reply]

@Benwing2, GianWiki: A short answer is that it is OK to type them; they will be normalised away as soon as the page is saved. On the other hand, don't use (numeric) character entities for them. There might be problems using them with some spell-checkers; I got irritated with a Northern Thai spell checker which objected to tone marks before subscripted consonants until I tweaked it to effectively normalise before checking. RichardW57 (talk) 05:40, 30 June 2023 (UTC)[reply]

@RichardW57 I thought you said they will NOT be normalized because they are outside of NFC? Benwing2 (talk) 06:00, 30 June 2023 (UTC)[reply]

@Benwing2: No. On its own, U+FB4E HEBREW LETTER PE WITH RAFE normalises to <U+05E4 HEBREW LETTER PE, U+05BF HEBREW POINT RAFE> for both form NFC and form NFD (normal form decomposed). The scalar value U+FB4E is not allowed in either of forms NFC and NFD. --RichardW57 (talk) 06:21, 30 June 2023 (UTC)[reply]
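
The claim above is easy to check outside the wiki. The sketch below uses Python's standard `unicodedata` module (not MediaWiki's own normaliser, so it only illustrates the Unicode rule); the helper `codepoints` is a name made up for this example.

```python
import unicodedata

PE_WITH_RAFE = "\uFB4E"        # HEBREW LETTER PE WITH RAFE (presentation form)
PE_PLUS_RAFE = "\u05E4\u05BF"  # HEBREW LETTER PE + HEBREW POINT RAFE

def codepoints(text):
    """List the scalar values of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

# U+FB4E is a composition exclusion, so NFC does not recompose it:
# both normal forms yield the two-character sequence.
nfc = unicodedata.normalize("NFC", PE_WITH_RAFE)
nfd = unicodedata.normalize("NFD", PE_WITH_RAFE)

print(codepoints(nfc))
print(codepoints(nfd))
print(nfc == nfd == PE_PLUS_RAFE)
```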

@RichardW57 OK does this mean that chars like U+FB4E HEBREW LETTER PE WITH RAFE are deprecated, similar to Arabic presentation forms? Benwing2 (talk) 06:24, 30 June 2023 (UTC)[reply]

@Benwing2: There are 13 deprecated characters, and neither is amongst them. Now, Arabic presentation forms may be discouraged, but the composition exclusions don't cause any problems beyond the usual failure of basic tools to understand canonical equivalence. However, many would regard characters that are composition exclusions as serving no useful purpose, though they may work better with simple-minded fonts, just as unnormalised text renders best for the Arabic, Hebrew and Lanna scripts. (Really smart fonts are unfazed by normalised text.) --RichardW57 (talk) 06:48, 30 June 2023 (UTC)[reply]

@RichardW57 Blah, it is very frustrating to talk with you because you're so literal minded. If Arabic presentation forms aren't "deprecated" then they are indeed discouraged/dispreferred/whatever, and you didn't answer my question: Is U+FB4E HEBREW LETTER PE WITH RAFE discouraged/dispreferred/whatever-you-want-to-call-it? Benwing2 (talk) 06:52, 30 June 2023 (UTC)[reply]

@Benwing2: The simple answer to the question would have been 'No.'. The Arabic presentation forms are discouraged, but the Hebrew presentation forms are not. A Unicode-compliant process would have no problems with them. --RichardW57 (talk) 07:09, 30 June 2023 (UTC)[reply]

Unnormalised Tibetan text is also easier for some purposes. On Wiktionary, we currently include processes to unnormalise Tibetan text. --RichardW57 (talk) 07:09, 30 June 2023 (UTC)[reply]

@RichardW57 Unicode normalized Arabic text is hard to work with because there's a bug in the ordering of short vowels w.r.t. shadda. Whenever you decompose Arabic text you have to reverse the order of short vowels and shadda, do your operations, and undo the reversal. Benwing2 (talk) 07:13, 30 June 2023 (UTC)[reply]
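
The reorder-operate-restore workaround described above might look like the following. This is a minimal Python sketch, not the code of any Wiktionary module; the helper names `shadda_first` and `restore_canonical` are invented here. It relies on the fact that the short vowels and tanwin (U+064B to U+0650) have lower canonical combining classes than shadda (U+0651), so normalisation always places them before the shadda.

```python
import re
import unicodedata

SHADDA = "\u0651"
SHORT_VOWELS = "\u064B-\u0650"  # fathatan .. kasra (combining classes 27-32)

def shadda_first(text):
    """Move shadda in front of a following... rather, a preceding short vowel,
    giving the order that string operations usually find more convenient."""
    return re.sub(f"([{SHORT_VOWELS}]){SHADDA}", SHADDA + r"\1", text)

def restore_canonical(text):
    """Undo the reordering by renormalising to NFC."""
    return unicodedata.normalize("NFC", text)

typed = "\u0628\u0651\u064E"  # beh + shadda + fatha, as commonly typed
normalised = unicodedata.normalize("NFC", typed)

print([f"U+{ord(c):04X}" for c in normalised])                 # vowel now precedes shadda
print(shadda_first(normalised) == typed)                       # reversal for processing
print(restore_canonical(shadda_first(normalised)) == normalised)
```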

Or get a more intelligent font. --RichardW57 (talk) 07:41, 30 June 2023 (UTC)[reply]

Sorry, you're talking of more general operations. --RichardW57m (talk) 08:37, 30 June 2023 (UTC)[reply]

@Benwing2: It's even awkward with Vietnamese - Ợ gets decomposed into O plus tone mark plus diacritic forming a different letter. --RichardW57m (talk) 12:12, 30 June 2023 (UTC)[reply]
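
For anyone wanting to inspect the decomposed order directly, here is a small Python sketch using the standard `unicodedata` module; `decomposed_names` is a helper name made up for this example. The order of the marks follows their canonical combining classes, so it differs between letters: in the standard library's data, Ợ comes out with the horn before the dot below, while Ộ puts the dot below (the tone mark) between the base letter and the circumflex.

```python
import unicodedata

def decomposed_names(ch):
    """Return the Unicode names of the characters in the NFD form of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]

print(decomposed_names("\u1EE2"))  # Ợ, O with horn and dot below
print(decomposed_names("\u1ED8"))  # Ộ, O with circumflex and dot below
```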

Thank you for asking, because it made me realize my question was missing a kind-of-important part.

My understanding is that this specific presentation form (U+FB4E HEBREW LETTER PE WITH RAFE) can be used for a page name without problems, but the page title will eventually be changed, and end up using base character + combining diacritic anyway. Am I correct in assuming this means that any links to said page will have to also use character + combining diacritic? —— GianWiki (talk) 09:49, 30 June 2023 (UTC)[reply]

It doesn't necessarily follow. Given how the servers fold spaces and underscores, I would expect them to normalise the internal path name parts of URIs. My recollection is that normalisation to form NFC happens whenever one previews a page that one is editing. I became sensitive to this behaviour when using a non-normalising spell-checker. --RichardW57m (talk) 10:11, 30 June 2023 (UTC)[reply]

I'm not sure I explained myself correctly. If I were to create an entry with pe + rafeh, the page name would — in any case — end up showing character + combining diacritic. Now, if I were to use, say, {{l}} or {{m}} to link to the aforementioned page, should I employ as a parameter the form used as the page name (character + combining diacritic) or could I use the precomposed character? Would that end up being normalized as well? —— GianWiki (talk) 11:45, 30 June 2023 (UTC)[reply]

The worst that could happen is that you would get a wrongful red link when you first viewed that page, but I think not even that happens. Subsequent views would definitely link to that page, whichever way you typed the name of the page, for by then Wikimedia would have converted the text to NFC. I've a feeling Wiktionary might also normalise the text before generating the link as part of the functioning of the template. For example, if I type a link using {{l}} to the non-existent page for the character, I get the same response either way, and the new page name has two Unicode characters, as is to be expected with form NFC. --RichardW57m (talk) 12:02, 30 June 2023 (UTC)[reply]

I see. Thank you for the explanation. —— GianWiki (talk) 12:35, 30 June 2023 (UTC)[reply]

Also, I managed to satisfy myself that I could access à using both composed and decomposed URLs. In my experiment, I found I reached the same page for both NFC and NFD strings using MS Edge. The strings acted differently when I edited them on the address line, so one reaches the same Wiktionary page if the external URIs are canonically equivalent. --RichardW57m (talk) 15:35, 30 June 2023 (UTC)[reply]
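
The experiment can be mimicked offline. The following Python sketch shows two percent-encoded paths that differ as strings but name the same title once normalised. `canonical_title` is a hypothetical stand-in that assumes the server decodes the path, folds underscores to spaces and applies NFC; it is not MediaWiki's actual implementation.

```python
import unicodedata
from urllib.parse import quote, unquote

def canonical_title(path_part):
    """Hypothetical title canonicalisation: decode, fold underscores, apply NFC."""
    title = unquote(path_part).replace("_", " ")
    return unicodedata.normalize("NFC", title)

A_GRAVE = "\u00E0"  # à as a single precomposed character

nfc_url = quote(unicodedata.normalize("NFC", A_GRAVE))  # one encoded character
nfd_url = quote(unicodedata.normalize("NFD", A_GRAVE))  # 'a' + encoded combining grave

print(nfc_url, nfd_url)
print(nfc_url == nfd_url)                                     # the URLs differ as strings
print(canonical_title(nfc_url) == canonical_title(nfd_url))   # but resolve to one title
```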

Eliminate distinction between "obsolete terms" and "obsolete forms"

Same for "archaic terms" vs. "archaic forms", "rare terms" vs. "rare forms", etc. Currently we have three categories CAT:English obsolete terms, CAT:English terms with obsolete senses and CAT:English obsolete forms. I gather the intended distinction between "obsolete term" and "obsolete form" is (maybe?) that obsolete forms are obsolete variants of current terms (e.g. absolete in place of obsolete), and that obsolete terms don't necessarily have a replacement in Modern English, but this distinction is not consistently made in English and I doubt it's consistently made anywhere. Furthermore, CAT:English obsolete forms is a subcategory of CAT:English terms with obsolete senses, which seems wrong in that most of these obsolete forms are not merely a single sense of a term with other non-obsolete senses (the intended use of CAT:English terms with obsolete senses), but obsolete in all of their senses (usually only one). {{obsolete form of}} categorizes into CAT:English obsolete forms while the 'obsolete' label categorizes into either CAT:English obsolete terms or CAT:English terms with obsolete senses depending on whether {{lb}} or {{tlb}} is used. There's an additional issue in that {{obsolete form of}} doesn't allow for a distinction between lemmas and non-lemma forms (and "forms" in the category name suggests to me non-lemma forms, which isn't the case here). I propose to redefine {{obsolete form of|LANG|FOO}} to be equivalent to {{lb|LANG|obsolete}} {{alt form|LANG|FOO}}. This at least deals with the inconsistent distribution of terms across the "terms" vs. "forms" categories, and simplifies the category structure. In the longer run we can then maybe think of using a bot to make the above conversion and deprecating {{obsolete form of}}. Benwing2 (talk) 00:08, 1 July 2023 (UTC)[reply]

The example you used ({{obsolete form of|LANG|FOO}}) is intended to communicate that the term is an obsolete alternative form of FOO. Wouldn't {{lb|LANG|obsolete}} {{alt form|LANG|FOO}} just categorize the term as having an obsolete sense (the implication being that it is an obsolete sense wherein it is an alternative form of FOO)? Did you mean to use {{tlb}}? —— GianWiki (talk) 06:13, 1 July 2023 (UTC)[reply]

@GianWiki Hmm, yes, you are right, {{tlb}} would be better in most cases (unless there is another definition of the same term that isn't obsolete, which is possible but unlikely). If we do the bot conversion, the bot can check to see whether there's another definition and automatically generate {{lb}} or {{tlb}} appropriately. Benwing2 (talk) 06:37, 1 July 2023 (UTC)[reply]

I have tried hard to dissect the issue from your motion (and seemingly other people have even more difficulties or similarly require exertion to see the problem). I am not sure that {{obsolete form of}} should allow distinction between lemmas and non-lemma forms; it does not suggest non-lemma forms to me, in so far as I conceptualize lemmas as centralized main-entries, so technically only main entries can be obsolete terms and have obsolete senses (even if in some cases individual senses are only found at an alternative form entry, which I even avoid if the form chosen as main form does not strictly have the sense in the main form, by employing context labels and usage notes).

It may thus be an artificially, and unnecessarily, attained belief that we need further gradation (Abstufungen) of alternative forms which are obsolete by distinguishing whether they relate to lemma forms directly or to non-lemma forms (which is a rather artificial distinction anyway because we just need a citation form to portray information). We will not be able to make a split anyway if you don’t suggest terminology, like hyponyms of the current “obsolete forms”. But I like the distinction of obsolete forms (and rare etc. forms) from “obsolete terms” in that only the latter contain definitional content making them main-entries. Pages categorized as having obsolete senses, in turn, are pages that contain definitional content making them main-entries while not solely consisting of obsolete definitions. So how could I see even more complication? Everything obsolete but not in definitional pages makes “obsolete forms”, for now, whether it relates to particular inflections or whole paradigms. The distinction has no quandaries, as far as I see.

It is true that our English pages fail to adhere to the theoretical foundation, since most of them were created before our category systems and the corresponding templates (in the middle of the last decade by Benwing, Rua and Erutuon, as I remember it), and older editors continue to apply our templates with less logic (admittedly, one needs to be here for quite a few months to find out about many things, and one then acquires habits which put pressure upon reason). But adherence has improved in recent years, and for more recently covered languages it is quite reliable: Category:Arabic obsolete terms, Category:Arabic terms with obsolete senses, Category:Arabic obsolete forms. I can tell since I largely made them and set standards for other editors, who copied my editing patterns, while new editors tend to learn from foreign-language formatting, as the thousandfold mass of English entries has been recognized as the worst offender. Fay Freak (talk) 20:33, 2 July 2023 (UTC)[reply]

I'm much in favour of eliminating the distinction between "CAT:English terms with X senses" and "CAT:English X terms". {{tlb}} is an utter failure. You are meant to put it at the end of the headword line, but it's too difficult to notice it there! Therefore critical information, like the term being obsolete, is missed. The drastic underpopulation of the "X terms" categories says it all.

Having said that, I think there is some value in keeping the "forms" separate from the "terms". Unlike what you say, it seems to me that the distinction between the two is generally well maintained:

in Cat:English obsolete forms I find originall = original, origyne = origin, and orizon = orison - all old forms of current vocabulary.
in Cat:English terms with obsolete senses we have originary (“being the origin of”), orignal (“moose”), and orismologic (“relating to the explanation of technical terms”) - dusty old words in and of themselves.

This, that and the other (talk) 10:02, 6 July 2023 (UTC)[reply]