Wiktionary:Beer parlour/2019/March

Mozilla releases 1,400 hours of voice recordings

https://venturebeat.com/2019/02/28/mozilla-updates-common-voice-dataset-with-1400-hours-of-speech-across-19-languages/ —Justin (koavf)TCM 02:44, 1 March 2019 (UTC)[reply]

It's whole sentences, rather than isolated words, so I think it's not especially useful for us at this point. —Μετάknowledgediscuss/deeds 04:35, 1 March 2019 (UTC)[reply]
Perhaps we could use them as usexes that have audio, if the license is compatible. —Suzukaze-c 03:05, 4 March 2019 (UTC)[reply]
@Suzukaze-c: It's CC-0. —Justin (koavf)TCM 03:27, 4 March 2019 (UTC)[reply]
Just had a look at the Italian portion of the dataset and it's definitely good usex material, with lots of natural-sounding language. The recording quality varies; some recordings have audible background noise. – Jberkel 22:36, 4 March 2019 (UTC)[reply]

Limit the table of contents to language names

In the table of contents I don't think there's a lot of use in listing subsections beyond the different language entries. In the vast majority of entries you can see all the different subsections within the space of a screen anyway. Personally, I have never ever tried to go to a specific subsection of an entry through the table of contents, whereas I have spent a surprising amount of time scrolling through long and messy tables of contents trying to find the language I'm looking for. The exception would be articles not in the main namespace. ─ ReconditeRodent « talk · contribs » 00:11, 3 March 2019 (UTC)[reply]

I tend to agree, but is there an easy way to do that? DTLHS (talk) 00:12, 3 March 2019 (UTC)[reply]
I find it useful to be able to click to the etymology when there are a number of different ones. When there are 5 or 6 different homographs, I find it easier to navigate using the ToC. Andrew Sheedy (talk) 00:37, 3 March 2019 (UTC)[reply]
Fair enough, though I'm assuming this is for when you're already familiar with an entry(?) (since "Etymology #" isn't very descriptive otherwise.) ─ ReconditeRodent « talk · contribs » 01:31, 3 March 2019 (UTC)[reply]
The following CSS should hide all but the top-level headings in the ToC in the main namespace: .ns-0 .toclevel-1 ul { display: none; }. Add to your common.css page, or try it out by entering mw.util.addCSS('.ns-0 .toclevel-1 ul { display: none; }') in your browser's JavaScript console. — Eru·tuon 00:47, 3 March 2019 (UTC)[reply]
I'm with Andrew. Has RR considered using the right hand side placement of the table of contents, achieved by a gadget? I also wonder whether a gadget could accomplish selective repression of the offending parts of the ToC. DCDuring (talk) 00:49, 3 March 2019 (UTC)[reply]
@Erutuon I take it that one could specify .toclevel-[2,3,4,etc] with the corresponding reduced display. DCDuring (talk) 00:52, 3 March 2019 (UTC)[reply]
@DCDuring: Yep, that works too if you want to show more header levels. I imagine getting it to look consistent (for instance, to always show part-of-speech headers even when they are at different header levels) would require JavaScript, though. — Eru·tuon 01:00, 3 March 2019 (UTC)[reply]
Hey, wow, that's cool! Thanks!
Well I guess I'm happy then, though instinctively I still feel this would be a better default. Would it be impertinent to suggest a vote/!vote? ─ ReconditeRodent « talk · contribs » 01:31, 3 March 2019 (UTC)[reply]
@ReconditeRodent: It's a good idea to make sure that the vote has some chance of passing first. For my part, I am not in favor. — Eru·tuon 01:41, 3 March 2019 (UTC)[reply]
I like to see how many etymologies there are. Equinox 01:43, 3 March 2019 (UTC)[reply]
Through the table of contents the editor can see if he has sorted the headings wrongly or used unbalanced equal signs and the like, which former cannot be easily seen since level 4 and level 5 are of the same size. But I do not even peruse this advantage since I use tabbed browsing. Fay Freak (talk) 12:27, 3 March 2019 (UTC)[reply]
I prefer seeing the other headers, although I don't know if that means it should be the default for everyone as opposed to just something users like me opt-out of changing. In any event it might be useful to make the code for hiding non-language headers a gadget users could find in their Gadgets tab. - -sche (discuss) 17:45, 3 March 2019 (UTC)[reply]
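The per-level selectors discussed above (.toclevel-1, .toclevel-2, and so on) follow a simple pattern, which the sketch below spells out. It is only an illustration: the function names toc_depth_rule and console_snippet are hypothetical, and the only things taken from the thread are the .ns-0 and .toclevel-N class names and the mw.util.addCSS call.

```python
def toc_depth_rule(visible_levels: int, namespace_class: str = "ns-0") -> str:
    """Build the CSS rule that keeps `visible_levels` ToC levels visible.

    Hiding the nested list under .toclevel-N leaves levels 1..N visible,
    as described in the thread. Hypothetical helper, not part of MediaWiki.
    """
    if visible_levels < 1:
        raise ValueError("at least one ToC level must stay visible")
    return f".{namespace_class} .toclevel-{visible_levels} ul {{ display: none; }}"


def console_snippet(rule: str) -> str:
    """Wrap a rule in the mw.util.addCSS call quoted in the thread,
    ready to paste into the browser's JavaScript console."""
    return f"mw.util.addCSS('{rule}')"


if __name__ == "__main__":
    # Languages only, then languages plus one further level of headers.
    for depth in (1, 2):
        print(console_snippet(toc_depth_rule(depth)))
```

With visible_levels=1 this reproduces the rule posted earlier in the thread; larger values show more header levels, which still cannot guarantee that part-of-speech headers appear consistently, since those sit at different depths from entry to entry.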

This thread inspired me to make a super compact TOC CSS work again, and here it is:

/* Use simple horizontal TOC */
/* Appearance: Language names are laid out as a horizontal list and are the only items
   shown in the TOC; borders are only horizontal ones; the result is very compact
   and minimalistic. */
.ns-0 div#toc ul ul { display: none; } /* Reduce the depth of shown headings in TOC */
div#toc span.tocnumber { display: none; } /* Hide numbers in TOC */
.ns-0 .toclevel-1 ul { display: none; }
.ns-0 div#toc li { display: inline; }
.ns-0 div#toc li + li:before { content: ' · '; }
.ns-0 div.toctitle { display: none; }
.ns-0 div#toc { border-color: #DDDDFF; border-right: none; border-left: none; background-color: white; padding-top: 0px; }

In kilo-, it produces approximately this:

English · Czech · Danish · Dutch · Finnish · German · Hungarian · Italian · Latvian · Norwegian Bokmål · Norwegian Nynorsk · Polish · Portuguese · Romanian · Slovak · Slovene · Spanish · Swedish · Turkish

--Dan Polansky (talk) 18:48, 3 March 2019 (UTC)[reply]

Wow! That's lovely. It's amazing what can be done with CSS. I would use it if I didn't sometimes want to find subheaders from the ToC. — Eru·tuon 20:18, 3 March 2019 (UTC)[reply]
I might try to figure out how to make a compact layout that uses two levels of headings, but I am no CSS guru; the core ideas of the posted code were provided by someone else on en wikt. Incidentally, Wiktionary:Votes/2012-10/Enabling Tabbed Languages passed, and the super compact TOC is no worse than tabbed languages as for availability of subheaders. I used to use the compact TOC CSS before the tabbed language vote, and I am using it right now. --Dan Polansky (talk) 20:41, 3 March 2019 (UTC)[reply]
As an aside, thank you for that mw.util.addCSS hint; it is very nice for finetuning CSS. --Dan Polansky (talk) 20:47, 3 March 2019 (UTC)[reply]

For what it's worth, my personal CSS gives me this sort of experience. It's not nearly as wonderfully minimalist as Dan's CSS; it keeps all headings. —Suzukaze-c 03:02, 4 March 2019 (UTC)[reply]

@Suzukaze-c: I like that because I can still see all the headers, but it requires less scrolling. I've enabled my own slightly modified version (not yet saved on-wiki). — Eru·tuon 03:23, 5 March 2019 (UTC)[reply]
How about this? It is quite hacky though. It hides all subsections but still keeps the numbered etymology sections visible as numbers after the language. — surjection? 15:56, 11 March 2019 (UTC)[reply]

From Okinawan onwards I keep getting the error message: “Lua error: not enough memory”. ---> Tooironic (talk) 10:02, 3 March 2019 (UTC)[reply]

I noticed the same thing with "me". Happens for every template. ─ ReconditeRodent « talk · contribs » 13:02, 3 March 2019 (UTC)[reply]
Some of the modules used (via templates) on the page use a lot of memory, more memory than pages are allotted. This has also hit e.g. water and man in the past (and discussions can be found in the archives of this page and the Grease Pit) and led to translations being moved to subpages. Transliteration (whether generating it or just checking a manually input one) seems to be among the things which are "expensive". Ultimately, we're going to have to do fewer "expensive" things with Lua, or at least (as we did with {{t-simple}}) have a set of much simpler or even Lua-less templates for use on large pages like this; for example, pages like this could use simpler headword templates that would just have the romaji input manually as a parameter and not invoke Lua to generate or check it. - -sche (discuss) 16:47, 3 March 2019 (UTC)[reply]
(edit conflict) See CAT:E. I've cleared all the module errors except for this one, so, for the moment, it doubles as a list of pages with this problem. This is a recurring problem with large entries: each template that calls a module uses memory for that module, and no entry is allowed to use more than 50 MB of module memory. The location where it starts getting the error isn't all that significant, since it results from the system's order of executing the modules. Generally it's not any specific item, but the total number of them that causes the problem.
The only solution is to reduce the total execution time of all the templates in the entry. This is not easy, but here are a few tips (I'm sure @Erutuon can expand on/correct this):
  1. The easiest step is adding the entry to the opt-out list at {{redlink category}}. This template is called by every linking template such as {{l}}, {{m}}, {{t}}, and {{t+}}, so its module use is multiplied over the entire number of linking templates in the entry. That's already been done for the 6 entries in question.
  2. Get rid of any unnecessary module-using content, such as duplication.
  3. Replace linking templates with ones that use less memory. Linking templates do a lot of work behind the scenes to check things and get the information needed to produce the correct link and display the text properly. The {{t-simple}} template does only the bare minimum of this for a translation template, and may be substituted for the regular ones where the loss of functionality isn't a problem.
  4. Replace linking templates with hard-coded wikitext: plain wikitext doesn't use module memory. For links to English entries, there's no need to look up the language information for displaying the text, since English is the default language here: {{l|en|word}} is functionally equivalent to [[word#English|word]].
  5. Move the largest blocks of linking templates such as translation tables or derived-term/compound lists to a subpage or appendix and provide a link to it in the entry. Moving quotes to the citation tab is another variant. Some of the most intractable memory-hogs have required this.
Chuck Entz (talk) 17:03, 3 March 2019 (UTC)[reply]
All good tips. Another way to reduce memory usage, which I just did for do, is to replace a bunch of individual linking templates ({{l}}) with a column template (in this case, {{rel2}}). Each template invokes a module, and each invocation uses a certain amount of "startup" memory, so fewer module-based templates tend to use less memory, when they are doing roughly the same job. — Eru·tuon 20:13, 3 March 2019 (UTC)[reply]
Thanks for the suggestions, but three CJKV characters, , , have been placed in Category:Pages with module errors for a very long time. Sometimes the memory works fine, but after a few days (even though no one has edited the entry), the page runs out of memory again. And after a few days it works again. This has been going on for a long time. Any idea what is the main cause of this issue? KevinUp (talk) 13:31, 4 March 2019 (UTC)[reply]
You can preview each section and look at the parser profiling data at the bottom of the page to see how much memory is used by that section (some memory use is shared between sections, so the numbers may add up to more than 50MB). The basic issue is that those entries have lots and lots of templates which call lots and lots of modules. When you consider all the things that these modules do, it's not surprising that they're pushing the limits of what the system will allow. Chuck Entz (talk) 14:48, 4 March 2019 (UTC)[reply]
  • This seems to be a significant issue that has a real impact on users' experience of Wiktionary, at least those who want to look up simple words with multiple definitions in multiple languages. I would recommend we use language-specific soft-redirects (i.e. sub-pages) for words like , , etc. Surely this would free up Lua memory enough so all the information can be displayed for users, even if a second click is required. As someone with no IT expertise, I can only hope one of our talented editors can facilitate such a solution. ---> Tooironic (talk) 05:32, 9 March 2019 (UTC)[reply]
Update: Thanks to User:Erutuon, Lua memory in is now reduced to 44.98 MB by substituting {{ja-r}} with {{ja-r/multi}} and {{ja-r/args}} as well as {{zh-der}} with {{zh-der/fast}}. KevinUp (talk) 09:19, 11 March 2019 (UTC)[reply]
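Tip 4 in the list earlier in this thread (replacing {{l|en|word}} with [[word#English|word]]) is mechanical enough to sketch in code. The following is a minimal illustration, not an existing bot or gadget: the function name hardcode_english_links is hypothetical, and it deliberately handles only the plainest form of the template, leaving anything with extra or named parameters untouched so that no functionality is silently lost.

```python
import re

# Matches only {{l|en|word}} with exactly one positional argument after the
# language code. Pipes, braces and "=" are excluded so that calls with extra
# or named parameters are skipped.
SIMPLE_EN_LINK = re.compile(r"\{\{l\|en\|([^|{}=]+)\}\}")


def hardcode_english_links(wikitext: str) -> str:
    """Rewrite simple {{l|en|word}} calls as plain [[word#English|word]] links."""
    return SIMPLE_EN_LINK.sub(
        lambda m: f"[[{m.group(1)}#English|{m.group(1)}]]",
        wikitext,
    )


if __name__ == "__main__":
    sample = "* {{l|en|water}}, {{l|en|ice|t=frozen water}}, {{l|fr|eau}}"
    print(hardcode_english_links(sample))
```

Only the first link in the sample is rewritten; the second carries an extra parameter and the third is not English, so both are left for the template to handle.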

Pseudo-X-isms by language

There's CAT:Pseudo-anglicisms by language (which incidentally should perhaps use a capital A for consistency with other such terms). There are pseudo-Latinisms like noli illegitimi carborundum, which could go in a subcat of CAT:Pseudo-Latinisms by language. There are pseudo-Gallicisms like quoi ci quoi ça and double entendre. There's also CAT:Pseudo-Italianisms by language (and some English pseudo-Italianisms discussed here). There must be others. I think these should be grouped into a category for "Pseudo-X-isms by language", similar to "Borrowed terms by language". What should they be called? "Pseudo-borrowings", "pseudo-loans"? - -sche (discuss) 17:20, 3 March 2019 (UTC)[reply]

I thought of "pseudo-foreignisms" at some point. Per utramque cavernam 17:38, 3 March 2019 (UTC)[reply]
Now that I've paged through Google Books results, I think "pseudo-loans" is most common, followed by "pseudo-borrowings", followed by "pseudo-foreignisms". - -sche (discuss) 19:12, 3 March 2019 (UTC)[reply]
I set up Category:Pseudo-loans by language. At the moment both "Portuguese pseudo-loans" and "Pseudo-anglicisms by language" -type categories are subcategories of it; somebody may want to change that or privilege the latter to be at the start of the list (with *) or the like. - -sche (discuss) 22:46, 7 March 2019 (UTC)[reply]

Stress of compound words

Why isn't the stress(es) of compound words and the like reflected in their entries? E.g. ,bee's knees vs 'bee sting --Backinstadiums (talk) 19:24, 4 March 2019 (UTC)[reply]

@Backinstadiums: What do you mean by "reflected"? Stress is usually given in the Pronunciation section. You can include the stress by adding pronunciation transcriptions to bee's knees and bee sting. — Eru·tuon 22:47, 4 March 2019 (UTC)[reply]
@Erutuon: I meant addition. Is there any compound of the like that shows it? They're lexicalized sometimes and I am not a native speaker --Backinstadiums (talk) 01:50, 5 March 2019 (UTC)[reply]
@Backinstadiums: You can find examples of English words that are categorized as compounds and have a stress mark in an IPA template by searching: incategory:"English compound words" hastemplate:"IPA" insource:/\{\{IPA\|[^|}]+ˈ/. Some are very old compounds that are not felt as compounds anymore (like island), but others are probably more "fresh" compounds. — Eru·tuon 01:57, 5 March 2019 (UTC)[reply]
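To see what the insource pattern in the search above accepts, here is a small sketch that runs the same pattern with Python's re module. This is an approximation: the wiki search engine uses its own regex dialect, and the sample template calls are assumptions made up for illustration (they put the transcription in the first parameter, which is what the pattern expects). The function name is hypothetical.

```python
import re

# The pattern from the search query: "{{IPA|", then one or more characters
# that are neither "|" nor "}", then a primary stress mark (U+02C8).
STRESSED_IPA = re.compile(r"\{\{IPA\|[^|}]+\u02c8")


def has_stress_in_first_param(wikitext: str) -> bool:
    """True if an IPA template's first parameter contains a primary stress mark
    after at least one other character (such as the opening slash)."""
    return STRESSED_IPA.search(wikitext) is not None


if __name__ == "__main__":
    samples = [
        "{{IPA|/\u02c8a\u026a.l\u0259nd/|lang=en}}",  # stress mark in first parameter
        "{{IPA|/b\u025bd/|lang=en}}",                 # no stress mark
        "{{IPA|en|/\u02c8a\u026a.l\u0259nd/}}",       # stress mark only in second parameter
    ]
    for s in samples:
        print(s, has_stress_in_first_param(s))
```

Because [^|}]+ cannot cross a pipe, the pattern only finds a stress mark inside the first parameter; a call that lists a language code first would not match.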

@Erutuon: According to the Longman Pronunciation Dictionary, "Usually, compound words / phrases have early/late stress, respectively. Yet, among grammatical compounds pronounced with late stress are those where the first element names the material or ingredient (except for the terms cake, juice, water, so ˈorange juice), so a ˌpork ˈpie, a ˌrubber ˈduck, or a ˌpaper ˈbag (bag made of paper) but ˈpaper bag (bag for newspapers)." --Backinstadiums (talk) 17:20, 6 March 2019 (UTC)[reply]

Other Germanic languages have the same stress distinction. The Dutch equivalents of those phrases all have stress on the same part as in English, but in each case the distinction is also visible in spelling because late stress has a space while early stress does not: ˈsinaasappelsap, ˌrubberen ˈeend (but ˈrubbereend has early stress), paˌpieren ˈzak (but paˈpierzak has early stress). Late stress is associated with adjective-noun phrases like wooden box, which suggests that "rubber duck" and relatives are in fact also syntactically an adjective-noun phrase and not compounds. —Rua (mew) 17:15, 9 March 2019 (UTC)[reply]
The crux here is clearly defining what a compound is. JMGN (talk) 20:25, 24 October 2024 (UTC)[reply]

What is the policy on suprasegmental prosody? --Backinstadiums (talk) 17:08, 9 March 2019 (UTC)[reply]

affricates: / d͡ʒ͜ɹ , ʃ͡ɹ /

Pondering some pronunciations of words such as imagery /ˈɪmɪ.d͡ʒ͜ɹɪ/ or dangerous, I infer that the IPA should recognize as affricates sequences such as / d͡ʒ͜ɹ / (and even / ʃ͡ɹ / in shrub), just as / t̠ɹ̠̊˔ / currently is. What are the guidelines on this issue in Wiktionary? https://www.youtube.com/watch?v=mH5FbbusdkI --Backinstadiums (talk) 01:52, 5 March 2019 (UTC)[reply]

If the IPA doesn't recognize it, why should we blaze the trail? What's the distinction between /ʃ͡ɹ/ and /ʃɹ/? /ʃ͡ɹ/ is certainly going to confuse people, and I don't see a value add.--Prosfilaes (talk) 05:22, 5 March 2019 (UTC)[reply]
[d͡ʒ͜ɹ] and [ʃ͡ɹ] (they should be in square brackets because they are not phonemes) don't really look like affricates. Affricates are basically stops with a fricative release. They don't qualify as affricates: [d͡ʒ͜ɹ] has an extra approximant at the end (d, ʒ, ɹ) and [ʃ͡ɹ] has a fricative and an approximant, not a stop and fricative. About guidelines, guidelines for what? — Eru·tuon 06:10, 5 March 2019 (UTC)[reply]

Read-only mode for up to 15 minutes on 19 March 15:00 UTC

Hi everyone, a short notice. On 19 March 15:00 UTC your wiki will briefly be in read-only mode. That means that you’ll be able to read it, but not edit. This is because of network maintenance. It will last up to 15 minutes, but probably shorter. You can read more on Phabricator (phab:T217441, phab:T187960), or write on my talk page if you’ve got any questions. /Johan (WMF) (talk) 14:52, 5 March 2019 (UTC)[reply]

Unprotection of user scripts

User:Yair rand/newentrywiz.js is currently admin-protected. It's unnecessary (at least now, maybe not back when it was protected) because only admins and interface admins can edit user JavaScript pages. Could it be unprotected so that lowly interface administrators like me can edit it? — Eru·tuon 01:35, 6 March 2019 (UTC)[reply]

Done. It should be moved to MW namespace. Dixtosa (talk) 19:03, 6 March 2019 (UTC)[reply]

Order of etymologies

Does Wiktionary have a policy for the order etymologies should go in? I noticed that on fly, the first etymology listed is an obscure (relative to the other ones) dialectal word meaning "wing." It seems to me that when some etymologies are significantly more notable than others, they should go first. Is there a policy I'm unaware of that makes the current order correct, or should it be changed? Nloveladyallen (talk) 00:20, 7 March 2019 (UTC)[reply]

The Japanese entry ない has the same problem. Etymology 1 is an unproductive suffix only found in a small number of words. --Dine2016 (talk) 05:00, 7 March 2019 (UTC)[reply]
I am not aware of a policy, but I think you can be bold and re-order the sections to what you deem the most logical. If someone else disagrees they can bring it up for discussion. - TheDaveRoss 13:25, 7 March 2019 (UTC)[reply]
When I see obscure/dialect things as the first ety for an everyday word, I swap them around. Equinox 16:45, 7 March 2019 (UTC)[reply]
I agree, though I also tend to move up older etymologies, provided they still have at least one definition in common, widespread use. DCDuring (talk) 17:58, 7 March 2019 (UTC)[reply]
As to why it's like that on some entries: some people, at least historically, preferred to put the oldest / first-attested etymologies/words first. (And some users, at least historically, straight-up put uncommon or obsolete or dialectal etymologies/words first even when more common ones are equally old...) Please do re-order them. - -sche (discuss) 22:52, 7 March 2019 (UTC)[reply]
  • It's a matter of the editor's personal preference. AFAIK the order of etymologies (and definitions) has not been enshrined in Wiktionary policy. IMO we should put the most common usages first, and that's the way I have been editing. ---> Tooironic (talk) 05:27, 9 March 2019 (UTC)[reply]

[Japanese] Should historical Kanji readings always be noted whenever applicable?

For example, the historical inscription of 川 in Kunyomi is かは (which has since been reformed to かわ), so should every instance of 川 in a word being read as かわ have かは as a historical hiragana? I noticed that most entries do not bother adding it but a few of them do. --Four-fifths (talk) 04:29, 9 March 2019 (UTC)[reply]

You could start adding them, of course. It's just missing information. —Suzukaze-c 04:38, 9 March 2019 (UTC)[reply]
I would prefer for historical kanji readings/spellings to be added only if the historical reading is attestable in historical literature. KevinUp (talk) 08:19, 9 March 2019 (UTC)[reply]
Some historical literature is not spelled according to 歴史的仮名遣い. Here is an example where 全(まと)う, reduced from 全(まっと)う, is spelled またふ despite the correct historical spelling being またう. --Dine2016 (talk) 10:59, 9 March 2019 (UTC)[reply]
IFF you're transcribing some text that includes 川, and other parts of that text use historical spellings, then sure, add かは as furigana for 川.
However, "should every instance of 川 in a word being read as かわ have かは as a historical hiragana?" -- no, there's no value in doing so, and instead you risk confusing users who might think the historical kana spelling is still in use. And while 川 has only one possible historical kana spelling, as Dine2016 notes, various words could be rendered in kana in multiple ways to achieve the appropriate reading. For the example of 全う, both まとう and またう resulted in the same pronunciation from around the 1700s or 1800s as the /ɔː/ vowel converged with /oː/. There are many such instances of historical kana spellings that could technically be called "misspellings", so it is not always safe to assume that the "correct" historical kana spelling was the one always used for a given word. ‑‑ Eiríkr Útlendi │Tala við mig 05:07, 2 April 2019 (UTC)[reply]

Language family trees in category pages

Hi, JohnC5 and I would like to add language family trees (generated by Module:family tree) to language categories, probably at the bottom of the text, directly above the lists of subcategories and pages in the category. This has been a plan for a while, but thanks to some HTML and CSS work by Suzukaze-c (and some work by me), the tree is finally in a presentable state.

As an example, the following tree would be added to Category:Proto-Germanic language. It shows the descendants of Proto-Germanic, based on the language data that is used in our entries. Click "Expand" to see the tree.

Some aspects of the tree are confusing. Etymology languages (such as American English) are shown as children of the languages, language families, or language variants that they belong to (in this case, English). This does not mean that they are descendants (like English is a descendant of Middle English); we simply don't have a better way to display them in the tree.

Currently, language families have a tree emoji after them (🌳) and etymology languages have a speech bubble (💭). This could be changed.

Some aspects of the style of the tree are not set in stone. One disagreement is the position of the tree icon: on the left side or the right side of the language family name. Currently it's on the right so that all the language and language family names line up. If you have an opinion on this either way, please let us know.

Is there any opposition to this idea? Also, any ideas for improvements? — Eru·tuon 08:14, 9 March 2019 (UTC)[reply]

Since nobody objected, the trees have been added to language categories. The icons have changed, though. Suggestions still welcome. — Eru·tuon 05:19, 17 March 2019 (UTC)[reply]
@Erutuon I've been looking at this and noticed some room for improvement. Families are defined by being descended from a particular ancestral language. The Frisian languages are those descended from Old Frisian, the North Germanic are those descended from Proto-Norse, and so on. I think it makes more sense to make families and their corresponding proto-languages a single node in the tree rather than two. Otherwise, every family would in theory have exactly one child: its proto-language. —Rua (mew) 19:10, 4 April 2019 (UTC)[reply]
@Rua: There are currently undocumented display options |protounderfam= and |famunderproto= that do something similar to what you describe: show proto-languages directly below the family that they are the parent of, if they belong to that family, and the reverse. — Eru·tuon 19:16, 4 April 2019 (UTC)[reply]
What does it do if neither of those options is given? —Rua (mew) 19:20, 4 April 2019 (UTC)[reply]
You can see what it does in the tree above, with North Germanic and Proto-Norse. — Eru·tuon 19:22, 4 April 2019 (UTC)[reply]
That's the same as what |protounderfam= is supposed to do though, right? —Rua (mew) 19:24, 4 April 2019 (UTC)[reply]
Never mind, I see the difference now. I think my preference would be this format if the family has a proto-language: protolanguage (code) [family (code)]. —Rua (mew) 19:27, 4 April 2019 (UTC)[reply]
I'll implement that at some point. If there's ever some kind of vote on this, people will have to see all the possible options. — Eru·tuon 08:21, 6 April 2019 (UTC)[reply]

Encoding of apostrophe-like palatalisation marks in various languages

There are various languages written in the Latin alphabet that use a mark resembling an apostrophe or prime to indicate palatalization. The exact Unicode code point to use is often not specified in the language, or used haphazardly without regard to the function that Unicode designates for the character. As a result, there are many variations in use, often within the same language as well. The difficulty of producing the correct mark often leads language users to use the simple ASCII apostrophe ('), which is not well suited for that purpose. More generally, Unicode characters that are designated "punctuation" are often used as well, even though the palatalization mark is not a punctuation character and is sometimes considered a proper letter of the alphabet in question.

As far as the orthography of Skolt Sami is concerned, however, the codepoint to use is actually standardised: ʹ (U+02B9 MODIFIER LETTER PRIME). This character is intended in Unicode for use in linguistics to represent palatalisation, and we use it in our transliterations of Russian as well. More importantly, because it's considered a letter and not punctuation by Unicode, applications will not use it to separate words and will select it along with the rest of a word when you doubleclick on it. Therefore, this seems like the character we should use, and I hereby propose we make this the standard for all such cases across languages. This would affect various Finnic languages (Veps, Võro and Votic), but I'm sure there are others that I'm not familiar with. Spellings with alternative palatalization signs can become redirects to the spellings using the proposed symbol. —Rua (mew) 15:06, 10 March 2019 (UTC)[reply]

Seems right. Fay Freak (talk) 16:24, 10 March 2019 (UTC)[reply]
If you can read French, you may be interested in Wiktionnaire:Apostrophes. We try to record all the apostrophe-like marks we should use in any language. Pamputt (talk) 09:51, 11 March 2019 (UTC)[reply]
Please don't change the transliteration for Russian or other languages, which transliterate "ь" as "ʹ" (not a plain apostrophe), e.g. мать (matʹ, mother): "matʹ"! (Asking just in case). In Ukrainian and Belarusian a plain apostrophe is also a standard letter (different from "ь", which is also used) and Uzbek seems to use "ʻ".
Czech and Slovak use a symbol that is merged with the letter it palatalises: e.g. ť as in mať (mother). --Anatoli T. (обсудить/вклад) 11:55, 11 March 2019 (UTC)[reply]
I think you misunderstood. —Rua (mew) 19:31, 11 March 2019 (UTC)[reply]
Yes, I did. --Anatoli T. (обсудить/вклад) 23:20, 11 March 2019 (UTC)[reply]
Seems OK, yes. Per utramque cavernam 19:37, 11 March 2019 (UTC)[reply]
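The claim in the proposal at the top of this thread, that U+02B9 is a letter rather than punctuation and therefore does not split words, can be checked against the Unicode character database. The sketch below uses Python's standard unicodedata and re modules. The helper names are hypothetical, and the sample word (written with escapes so the code points are unambiguous) is used only as input for illustration.

```python
import re
import unicodedata

# Candidate marks mentioned or implied in the thread.
APOSTROPHE = "\u0027"             # ASCII apostrophe
RIGHT_SINGLE_QUOTE = "\u2019"     # typographic apostrophe
MODIFIER_LETTER_PRIME = "\u02b9"  # the proposed palatalisation mark


def general_category(ch: str) -> str:
    """Return the Unicode general category, e.g. 'Lm' (modifier letter)
    or 'Po' (other punctuation)."""
    return unicodedata.category(ch)


def word_tokens(text: str) -> list[str]:
    """Split text into runs of word characters, roughly what a
    double-click selection or a simple tokenizer would treat as one word."""
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    for mark in (APOSTROPHE, RIGHT_SINGLE_QUOTE, MODIFIER_LETTER_PRIME):
        print(f"U+{ord(mark):04X}", unicodedata.name(mark), general_category(mark))

    # Sample word with the mark in the middle (sample input only).
    template = "s\u00e4\u00e4{}m\u01e9i\u00f5ll"
    print(word_tokens(template.format(MODIFIER_LETTER_PRIME)))  # stays one token
    print(word_tokens(template.format(APOSTROPHE)))             # splits in two
```

The two punctuation characters break the word into two tokens, while the modifier letter keeps it whole, which is the behaviour the proposal relies on.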

: When Japanese Kyujitai and Traditional Chinese shapes for the same codepoint differ.

I've noticed that the Japanese Kyujitai form and the Traditional Chinese form, while sharing a Unicode codepoint, differ in that the Japanese form has an extra stroke joining the two stacked rectangles whereas the Traditional Chinese form does not. (Do they officially differ by stroke count?)

I'm assuming this is systematic across the majority of Japanese vs Traditional Chinese fonts.

I know we have some mechanisms for documenting when Simplified and Traditional Chinese forms have differing appearances but a shared codepoint. Do we do the same with Kyujitai vs Traditional? If so should we add that to this entry? If not should we start doing so? And what to do if such variation is not systematic but varies from font to font? — hippietrail (talk) 06:32, 11 March 2019 (UTC)[reply]

@Hippietrail: perhaps like 蝉#Translingual, where the usage notes describe the difference, or 浅#Translingual, where the IDS describes the difference. @KevinUp —Suzukaze-c 07:35, 11 March 2019 (UTC)[reply]
No, both Traditional Chinese (Taiwan/Hong Kong standard) and Japanese kyūjitai of have the same stroke number and glyph appearance. The difference occurs due to Xin Zixing (新字形) in mainland China, which substitutes all characters containing with which is one stroke less. KevinUp (talk) 08:40, 11 March 2019 (UTC)[reply]
The revised form of (containing rather than ) can be found in books published in mainland China, such as calligraphic books and modern dictionaries such as Xiandai Hanyu Cidian. The different glyph forms have been noted in this edit. KevinUp (talk) 08:40, 11 March 2019 (UTC)[reply]

Very interesting - thanks everyone! Should we do more to document these differences? Especially in the translingual section that shows the forms? This is a kind of variant form after all, even though sharing a codepoint. Do we have a category for characters affected by Xin Zixing? Are there cases where the Xin Zixing does get its own Unicode codepoint due to differences being regarded more significant?

Also this means my photo is a bit of a quirk. I'm in Taiwan and I've taken photos of three different Japanese style "open for business" signs and all three actually use this Xin Zixing form rather than the Kyujitai or Traditional form. If anyone knowledgeable would like to alter the captions and descriptions here and/or over on Commons that would be great. I might upload the pictures of the other two signs too. — hippietrail (talk) 09:45, 11 March 2019 (UTC)[reply]

It turns out my photos are one of each form! Two are already at 営業中. I'll upload the third and I might make cropped versions of each too... — hippietrail (talk) 09:57, 11 March 2019 (UTC)[reply]
The characters in "commons:File:Japanese "open" sign in traditional characters.jpg" are written in nonstandard form:
  • The character (with instead of the orthodox ) is recorded as A02453-025 in 教育部異體字字典 (Dictionary of Chinese Character Variants).
  • If you look closely at the second character the bottom component of is written ⿳䒑一木 rather than its correct form ⿱䒑未.
  • I've modified the description of the file at Wikimedia Commons.
(By the way, this discussion belongs in the Tea Room) KevinUp (talk) 10:42, 11 March 2019 (UTC)[reply]
Thanks once more! Sorry I've been away from beer parlours and tea rooms for so long I've forgotten. Please feel free to move the discussion in case I do it the wrong way and make a mess. I've just uploaded the third variant too:
營業中
hippietrail (talk) 11:17, 11 March 2019 (UTC)[reply]

I've drafted this vote to define the supermajority we use, as well as what "fail" and "no consensus" mean. Please give me feedback, particularly regarding the higher standard for modifications to WT:CFI and WT:EL. —Μετάknowledgediscuss/deeds 00:29, 12 March 2019 (UTC)[reply]

Reminder to contribute to the discussion at Wiktionary talk:Votes/2019-03/Defining a supermajority for passing votes, particularly on issues like whether only admins should close votes, and the higher standard for CFI and EL mentioned above. —Μετάknowledgediscuss/deeds 14:44, 15 March 2019 (UTC)[reply]

Standardizing some template shortcuts

Can we pick a standard for the shortcuts of alternative forms templates? Currently we have:

I am constantly forgetting which ones use a hyphen, a space, neither, or some combination thereof. Personally I would prefer a space with no hyphen for all of them: alt sp, alt form, and alt caps (without of) seem like the clearest option to me. Ultimateria (talk) 17:44, 12 March 2019 (UTC)[reply]

I take it back, I would keep of to be consistent with {{form of}}, {{synonym of}}, etc. Ultimateria (talk) 19:43, 12 March 2019 (UTC)[reply]

Page-deleter role?

It's been suggested several times in the past few months that we break up admin responsibilities into smaller roles. Personally, I think one such role could be that of "page-deleter".

As I've written here, I see the blocking tool as "the most powerful tool, and the one requiring the most discernment"; this means that someone trusted with it can easily be trusted with all the rest and be made an admin (I'm not the only one thinking as much). The reverse is not necessarily true: one could be trusted to make a good job as a page-deleter, but not as a blocker. That's why I think having the possibility of granting page-deleting rights as separate from adminship could be useful.

A user entrusted with that role would be able to delete entries that failed RFD, vandalistic entries, spam entries or spam user pages, empty categories tagged for deletion, wrong bot entries, unwanted redirects, etc.

If this is accepted, another question would arise: on what basis should it be granted: a vote? A whitelisting-like nomination?

What do you think? Per utramque cavernam 18:14, 12 March 2019 (UTC)[reply]

While I feel slightly less strongly about this idea than the blocker version, it still feels like a solution in search of a problem, and I would still probably prefer that we just have a single role for blocking/deleting. Deletion is also a multifaceted function, which of the following functions would be included: page delete, page undelete, revision delete, revision undelete, view deleted/hidden revisions, delete logs entries, delete tags, mass delete? If it were to be a new role, I would suggest it be voted on. - TheDaveRoss 12:47, 13 March 2019 (UTC)[reply]

WMF proposes rebranding Wiktionary as a "Wikipedia project"

WMF conducted a study discovering that "Wikipedia" is the most recognized name and project, while "Wikimedia" is less recognized. It proposes rebranding Wiktionary as a "Wikipedia project". For public feedback, you should go to meta:Talk:Communications/Wikimedia brands/2030 research and planning/community review; for private feedback, email brandproject(at)wikimedia.org. --George Ho (talk) 21:12, 12 March 2019 (UTC)[reply]

My feedback is that Wikipedia is awful and I'd hate to be affiliated with it. DTLHS (talk) 21:16, 12 March 2019 (UTC)[reply]
Nothing like a needless "rebrand" to suck up volunteer donations :( Equinox 21:54, 12 March 2019 (UTC)[reply]

You may now become 'Wiktionary — A Wikipedia project'

According to this discussion at Meta, Wikimedia Foundation is considering rebranding. This means for you, that rather than Wiktionary being a Wikimedia project, it would become a Wikipedia project.

The proposed changes also include

  • Providing clearer connections to the sister projects from Wikipedia to drive increased awareness, usage and contributions to all movement projects.

While raising such awareness in my opinion is a good thing, do you think classifying you as a 'Wikipedia' project would cause confusion? Do you think newcomers would have a high risk of erroneously applying some of Wikipedia principles and policies here which do not apply? If so, what confusion? Could you please detail this. I have raised a query about that HERE in general, but I am looking for specific feedback.

Please translate this message to other languages. --Gryllida 23:05, 12 March 2019 (UTC)[reply]

@Gryllida: This is a terrible idea. We frequently have newcomers, both with and without actual experience editing Wikipedia, attempting to apply English Wikipedia policies like notability (which has a local, but very different lexicographical equivalent) and 3RR (which does not exist here). We try to patiently point them toward noticing the name of the website they are currently editing, and to acknowledge that they are separate projects. I can only imagine how much more confusion there will be if this were to go through. —Μετάknowledgediscuss/deeds 00:30, 13 March 2019 (UTC)[reply]
Thank you for these clarifications. Three questions:
  • Apart from notability and 3RR, is there anything else that is different?
  • Would you be willing to give examples of these confused newcomers and the communication with them?
  • I've found Wiktionary:Wiktionary_for_Wikipedians. It talks about the differences. Is it up to date? Is there any other relevant documentation that you would share in response to this question? Gryllida 01:09, 13 March 2019 (UTC)[reply]
    I find it confusing that in this discussion Wikipedia appears to stand for the English Wikipedia, and Wiktionary for the English Wiktionary. For each language, these have their own policies and customs. As to the respective English-language projects, it is easier to list the commonalities: (0) Like all Wikimedia projects, both use MediaWiki software; (1) Anyone can edit Wiktionary (but, unlike on Wikipedia, also anonymous IPs can create pages); (2) Users who are apparently not there to contribute to the project will soon find themselves blocked. (3) Only administrators, who get that role only after having been approved by the user community, can block users and delete pages. That’s about it.  --Lambiam 10:15, 13 March 2019 (UTC)[reply]
Currently we can speak of "Wikipedia policies" and "Wiktionary policies" (or "...votes", "editors", etc.). How are we supposed to distinguish these things, in speech and writing, after the word "Wikipedia" subsumes Wiktionary? Equinox 00:49, 13 March 2019 (UTC)[reply]
If the rebranding is approved, the name 'Wiktionary' will remain. As I understand, it will become named 'a Wikipedia project' (the new branding) instead of 'a Wikimedia project' (the current branding), that is all.
While at the moment we see the 'a Wikimedia project' only at certain pages (the main page; {{sisterprojects}}; documentation; these places are pretty hard to discover), if the rebranding is approved, the belonging of the project to the family of Wikimedia (to-be Wikipedia) projects may be featured more prominently. Gryllida 01:05, 13 March 2019 (UTC)[reply]
Today I can say "he edits Wikipedia but not Wiktionary". How would I say that afterwards? Equinox 01:07, 13 March 2019 (UTC)[reply]
The same phrasing and names would apply. Their first name (Wiktionary, Wikipedia, etc) would remain the same and their last names ('a Wikimedia project', which nobody sees now, but after the rebranding they may become 'a Wikipedia project' and become more prominently shown to readers) would change, so to speak. Gryllida 01:12, 13 March 2019 (UTC)[reply]
How would you think of renaming 'Wikimedia' to 'Wikimania'? To name it 'a Wikimania project'? Perhaps Wikimedia Foundation likes this brand, and it does not cause as much confusion as 'a Wikipedia project'. It is probably not too bad that there is a conference with this name, it is about the same movement anyway. Gryllida 01:16, 13 March 2019 (UTC)[reply]
  • The problem is that most of the world is confused about everything in WikiWorld except Wikipedia. So for outward-facing presentation purposes we probably benefit from a more explicit connection with WP. This seems to me to be a lot like what I have to do when I explain this project which has consumed much of my time for more than a decade. I have to say "Wiktionary is like Wikipedia, except it's a dictionary. It's supported by the same foundation that supports Wikipedia." Two sentences; two mentions of Wikipedia. To me this re-branding is almost a non-event. It seems like a simple recognition of where we stand in the eyes of the world. DCDuring (talk) 02:29, 13 March 2019 (UTC)[reply]
    Thank you DCDuring. Since your position is that Wikipedia brand would not cause harm, how do you think about the Wikimania brand? Do you think naming Wiktionary 'a Wikimania project' would make any harm? Do you think this change would be as good as the 'a Wikipedia project' name? Gryllida 03:29, 13 March 2019 (UTC)[reply]
I agree that this isn't really a thing. Really what is happening is that Wikimedia is rebranding itself as Wikipedia. We wouldn't need to change anything around here. - TheDaveRoss 03:06, 13 March 2019 (UTC)[reply]
I'd also welcome the opportunity to know your opinion about the 'Wikimania' brand as well. It is not confirmed by Wikimedia at this stage but knowing your views about it would be nice. Gryllida 03:30, 13 March 2019 (UTC)[reply]
I thought you were joking. The Wiki movement has worked quite hard to be taken seriously and has finally achieved the objective for many audiences. 'Wikimania' would undermine all that progress IMO. It seems to convey the image of the lunatics running the asylum. DCDuring (talk) 03:47, 13 March 2019 (UTC)[reply]
Yeah, I find "Wikimania" a bit harder to take seriously. Equinox 03:54, 13 March 2019 (UTC)[reply]
I am not an expert but my gut feeling is that the way things are now is fine. There is the saying: if it's not broken, don't fix it. As a frequent editor, if this change was made, it would not affect me too much. Geographyinitiative (talk) 05:20, 13 March 2019 (UTC)[reply]
It aint broke - so don't try to fix it. SemperBlotto (talk) 06:55, 13 March 2019 (UTC)[reply]
While I agree with some sentiments that this could marginalize smaller projects, the decision to rebrand makes sense. As a word, Wikimedia is just too close to Wikipedia and gets easily confused, both in reading and speaking. The choice of Wikimedia as an umbrella term was unfortunate in the first place. – Jberkel 11:59, 13 March 2019 (UTC)[reply]
I think the branding is broken. The proposed change seems reasonable.
I still get confused navigating among Wikimedia Foundation, MediaWiki, and Meta-Wiki. I hope that my confusion is not an indicator of the confusion of others. DCDuring (talk) 12:26, 13 March 2019 (UTC)[reply]
The MediaWiki vs Wikimedia naming is unfortunate. Back in 2003 the naming committees really got stuck on a theme. I don't think Wikimania is a better brand or name than any of the alternatives, I don't think it has any cachet outside of a subset of the Wikimedia community and I don't think it is strictly worse at indicating what the thing is that it is naming. Wikicon would be a better name for Wikimania to begin with, at least that follows the form of the thousands of other conventions. - TheDaveRoss 12:39, 13 March 2019 (UTC)[reply]

────────────────────────────────────────────────────────────────────────────────────────────────────

Assuming that the purpose of having a unified brand is facilitating publicity for all projects, a major consideration is how evocative and easy-to-remember the brand name is. While currently Wikipedia is the best-known name associated with Wikimedia, with the right approach any well-chosen name can quickly become widely recognized; it is just a matter of generating publicity. I agree that Wikimedia was an unfortunate choice: not appropriately evocative (“media” is not a unifying focus), and easily confused with Wikipedia or MetaWiki. Replacing it by Wikipedia will raise the confusion to an unmanageable level. Wikimania may seem cool but has bad connotations that are just too strong and is irresistibly inviting of the derived term Wikimaniac, which is fine for internal use, but we would not be able to keep its use contained. Why does the WMF not open up a contest for a unified brand name in the style of WikiXXX for some suitable term replacing XXX, with (after a preliminary selection producing a shortlist) the user community selecting the winner? My submission: Wikiworld. That certainly covers everything and has a nice alliteration. (I know there used to be a WikiWorld, but that has now been defunct for over 10 years.)  --Lambiam 14:52, 13 March 2019 (UTC)[reply]

I'm skeptical that any change will improve whatever perceived problem there might be. If Dine Brands Global Inc. changed their name, would people eat at Applebee's or IHOP more often? I doubt anyone would notice. -Mike (talk) 16:51, 13 March 2019 (UTC)[reply]
I'm boycotting Google because they aren't Google on the stock exchange anymore. DCDuring (talk) 17:05, 13 March 2019 (UTC)[reply]
There is a proposal at meta to have a brainstorming for different names. (Thanks Lambiam) The names proposed so far are 'wikipedia', 'wikiworld', 'wikimania', 'wikiweb'. Please share your proposals either here or there, at your convenience. Gryllida 00:43, 14 March 2019 (UTC)[reply]

Not only will it cause confusion because of an old sense competing with a new sense, if you rebrand Wikimedia to Wikipedia or any other project’s name, but it will also be factually untrue if you call Wiktionary “a Wikipedia project”. Wiktionary isn’t a Wikipedia project, won’t become one, shouldn’t become one, even if you do hold the Wikipedia brand in higher esteem, which I do not, thinking that the abyss would stare back; the confusion and separation issue is enough of a reason. If you do a rebranding do it only if that is worth it and don’t mingle projects in so much as they are intentionally separate.
Currently your issues are that Wikimedia is not distinctive enough, being only different in one grapheme or phoneme, though this issue is minor and can be ignored as it has been ignored until this proposal, and that on the other hand the merits of Wiktionary, as a project being as much of higher quality as it works distinctly, – the analogous with other projects like Wikispecies – are not highlighted enough. If you show an attachment of Wiktionary to Wikipedia you will pull it down and achieve the opposite of what you want to achieve. The messages must be and stay: Wiktionary will give you an experience that is well above that on Wikipedia. Wikipedia has lost its chances to be taken seriously, I am sorry to blackpill you, though the usefulness of Wikipedia is of course not debated by anyone, and Wiktionary is currently above it, as is Wikispecies, but people do not know the difference, only know Wikipedia. It is important to make known for those who have, rightly, lost hope in Wikipedia, that Wiktionary is 1. made by other editors 2. editors working pursuant to dissimilar principles and workflows, even if they also edit Wikipedia 3. describes a wholly unlike subject matter, hence the resulting project should be put not all on one level with Wikipedia. Fay Freak (talk) 04:27, 15 March 2019 (UTC)[reply]

Who is the “you” implied in “your issues”, used above? Are you addressing the Wikimedia Foundation? I don’t expect them to be monitoring the discussion on this page.  --Lambiam 09:16, 15 March 2019 (UTC)[reply]
Yes, Wikimedia’s, and also like one’s, the editors who try not to confuse when explaining; though I am not sure if they don’t even monitor this where they have posted, this being a Wikimedia project. Well I could repost it under meta:Talk:Communications/Wikimedia brands/2030 research and planning/community review#Wikipedia I guess since I now do not discern a different place for it; it would be a comparatively long answer there though. Fay Freak (talk) 14:38, 15 March 2019 (UTC)[reply]

Μετάknowledge, Fay Freak, , Μετάknowledge, DCDuring: As an alternative to saying 'a Wikipedia project' there is the possibility of saying 'a sister of Wikipedia'. This in my opinion may reduce confusion: it makes the sister project stand out as a separate project more clearly. That's what I commonly do when speaking with people about one of the sister wikis, when asking them to release an image under a free licence. They usually understand quickly. Do you think this option can reduce confusion here caused by people misinterpreting Wikipedia policies as Wiktionary's own? --Gryllida 18:25, 25 March 2019 (UTC)[reply]

People believe that WP rules apply here now, even though our name is distinct. I'm not sure that any plausible renaming will change that. DCDuring (talk) 19:59, 25 March 2019 (UTC)[reply]
At the same time, I don't see how labelling everything a "sister of Wikipedia" is any good either. It's like your only claim to fame is being a sibling of someone famous, rather than on your own merit. —Rua (mew) 20:11, 25 March 2019 (UTC)[reply]
Right, and therefore the effect of nivellating the image of Wiktionary would be the same. Fay Freak (talk) 15:09, 26 March 2019 (UTC)[reply]

FYI, I started a discussion about deleting this feature on the talk page of the template. Noting here in case not everyone notices the discussion there. - TheDaveRoss 15:55, 14 March 2019 (UTC)[reply]

I hope this doesn't need a vote. It seems to me to merit a BP discussion, especially because the idea behind it is potentially of wider application. DCDuring (talk) 16:04, 14 March 2019 (UTC)[reply]
Although this is a really awful hacksaw-and-bailing-wire-and-duct-tape way of doing this (run a module from every linking template on every page that has linking templates, every time any such page loads, with an expensive parser function run every time if the template is linking to one of the target languages- really? To populate todo lists?), there are people who find the information it generates very useful, and no one seems to want to spend the time and effort to generate it by other, more sensible methods. Chuck Entz (talk) 03:19, 15 March 2019 (UTC)[reply]
It's not too much effort to implement something like this by analyzing the dumps. It would work for all languages, and could take other ways of linking into account (plain wikilinks), and perhaps even indicate orangelinks. – Jberkel 10:30, 15 March 2019 (UTC)[reply]
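As a rough illustration of the dump-based approach suggested above, here is a minimal Python sketch that collects link targets from page wikitext (both linking templates and plain wikilinks) and counts those that have no page yet. The names `LINK_TEMPLATES`, `extract_links` and `wanted_entries` are hypothetical, the template list is a small assumed sample, and reading the dump itself is left out: the sketch assumes the pages are already available as (title, wikitext) pairs.

```python
import re
from collections import Counter

# Assumed sample of linking template names; the real set is larger.
LINK_TEMPLATES = {"l", "m", "t", "t+"}

# Matches the start of a template call: {{name|lang|target ...
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)\|([^{}|]+)\|([^{}|]+)")
# Matches the target of a plain wikilink: [[target]] or [[target|label]]
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)")


def extract_links(wikitext):
    """Return (language code or None, target) pairs found in the wikitext."""
    links = []
    for name, lang, target in TEMPLATE_RE.findall(wikitext):
        if name.strip() in LINK_TEMPLATES:
            links.append((lang.strip(), target.strip()))
    # Plain wikilinks carry no language code, so None is recorded instead.
    # Note: this simple pattern also picks up category and file links.
    for target in WIKILINK_RE.findall(wikitext):
        links.append((None, target.strip()))
    return links


def wanted_entries(pages, existing_titles):
    """Count link targets that have no page; pages yields (title, wikitext)."""
    wanted = Counter()
    for _title, text in pages:
        for lang, target in extract_links(text):
            if target not in existing_titles:
                wanted[(lang, target)] += 1
    return wanted
```

Because a check like this would run offline over a dump, it would add no cost at page-load time. Telling redlinks apart from orangelinks would additionally need per-language section information, which this sketch does not attempt.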

category: silent t

Please could somebody create a category for words with a silent t: moisten, often, thistle etc. --Backinstadiums (talk) 14:45, 15 March 2019 (UTC)[reply]

Pronunciation trivia of this sort seems more suited for an appendix really. --Tropylium (talk) 09:47, 18 March 2019 (UTC)[reply]
This category would be very useful to learners. — Ungoliant (falai) 15:40, 26 March 2019 (UTC)[reply]

Translations in languages you don't know

In User talk:Panglossa#Translations in languages you don't know, we read "Please avoid adding these. It is very easy to make mistakes, and even if you get the content right, you may end up adding it in the wrong way, as you did at walk, thus requiring someone else to clean up after you."

The original poster to the user talk page added e.g. Czech pivní sýr, and admitted they do not know Czech.

As far as I know, multiple established editors add translations in languages they do not know. A very recent example is diff, where at least Estonian and Greek do not match any Babel box.

Do we want users to receive such messages on their talk pages? Do we want to introduce a policy or recommendation to the effect of that message on the user talk page? --Dan Polansky (talk) 09:59, 17 March 2019 (UTC)[reply]

The way the message is phrased makes it sound like the boilerplate language of Wikipedia warning templates (“When moving pages, please remember to fix any double redirects”). Warning users when they make a kind of mistake that they are likely to repeat is by itself a good thing. It would have been better (I think) if the mistake had been specified more, and I think I might have phrased the warning like “please be extremely careful when...”. Perhaps a readable essay for new editors with positive advice on how one can contribute to Wiktionary (focus on languages you are familiar with) works better than introducing guidelines on what to avoid.  --Lambiam 12:08, 17 March 2019 (UTC)[reply]
Elsewhere, I made the following proposal:
Editors can contribute new entries even for languages that they do not know and have not studied. However, in such case, they are strongly encouraged to work very carefully with sources, and get acquainted with the lemmatization practice of the English Wiktionary for the language. For instance, for Latin, some dictionaries use e.g. stare as the lemma while Wiktionary uses sto as the lemma.
Whether that should have the status of policy, guideline or advice is a little less important, I think. --Dan Polansky (talk) 12:17, 17 March 2019 (UTC)[reply]
The status is less important than how easy a read it is.  --Lambiam 13:56, 18 March 2019 (UTC)[reply]
I agree that there shouldn't be an absolute prohibition from editing or adding translations in languages one doesn't know. I think in such cases one should be very careful, but certainly there are cases when being pretty much sub-A1 level in a given language doesn't preclude one from being able to consistently add correct and useful content in that language. There is in my experience a gradation in the degree to which one can be unfamiliar with languages not listed on one's Babel. For example: I am absolutely lost when confronted with Chinese or Nahuatl texts, but if you give me a Romanian word I am confident that I could with some effort find out whether it is in use, whether it is SOP or what its lemma form is. Perhaps we could for convenience create a Wiktionary namespace page or a new section on a relevant extant page with advice and warnings regarding possible pitfalls when editing/translating in languages one doesn't know (with a shortcut à la WT:ATTEST or WT:EL like idk, WT:UNFAMILIAR or w/e), but it'd be undesirable imo to prohibit such editing entirely (not least because proficiency is self-reported anyway, making such a rule difficult to enforce). — Mnemosientje (t · c) 13:08, 17 March 2019 (UTC)[reply]
Again, Dan, this isn't a matter of policy. I don't leave these messages for everyone doing it, and I wasn't going to for Panglossa until they made a mistake that I had to clean up. At that point, their contributions became a slight waste of another editor's time, and I therefore wanted them to stop doing that. It's that simple. —Μετάknowledgediscuss/deeds 15:09, 17 March 2019 (UTC)[reply]
I add such entries from time to time, and while I usually make sure the entry is correct, either by checking a dictionary or by asking a native speaker, I understand I can make mistakes, especially regarding the form of the entry. I welcome Μετάknowledge's warning about the correct procedure, but I also understand this a collective project, we contribute what we can and more knowledgeable peers will correct it if necessary. I will certainly be more careful from now on, but whenever I find something worth including, I will do so. Panglossa (talk) 15:19, 17 March 2019 (UTC)[reply]
@Panglossa: Thank you. If you're willing to put in the care to check both correctness and that the lemma/spelling/etc. meets with Wiktionary's standards, then I am perfectly satisfied. —Μετάknowledgediscuss/deeds 17:08, 17 March 2019 (UTC)[reply]
What about adding these under "Translation to be checked"? Panglossa (talk) 15:22, 17 March 2019 (UTC)[reply]
That's not really the purpose of the "Translations to be checked" sections, which are for translations where it's not known which of the translation sections a translation belongs to. Instead, you should use the template {{t-check}} where you would use {{t}}. This automatically tags it for checking by someone who knows the language, and also displays a message saying that it needs to be checked. This will alleviate much of the problem, though it still requires someone spending time later to clean up.
The biggest potential problem is that someone may add a translation that's wrong and that goes unnoticed. We don't have a really convenient way of finding all the translations in a given language, so it could be a long, long time before it's fixed. Translations are very hard to patrol, since they involve language-specific knowledge that no one person has for every language, and there's no way to check where the contributors got them.
If I see someone add or change translations in a large number of unrelated languages, that immediately raises my suspicions. Yesterday, a Canadian IP completely reworked the translation tables at middle in a single edit, with changes in multiple languages that I don't know. Fortunately, one involved changing an uppercase German noun to lowercase, which no one who knows anything about German would ever do, so I reverted all their edits and blocked them. They could have been mostly right, but the difficulty of sorting through all of their changes in all of those languages made throwing all of it out the only practical option once I knew they were seriously wrong on one aspect. Chuck Entz (talk) 15:59, 17 March 2019 (UTC)[reply]

Pronunciation respelling for English

I propose adding the corresponding entries for the graphemes of some pronunciation respellings for English, especially the one used by Wikipedia, that is, WIK-ih-PEE-dee-ə. --Backinstadiums (talk) 14:51, 17 March 2019 (UTC)[reply]

Oppose. These aren't words, nor do they have any meaning in a language. —Rua (mew) 15:12, 17 March 2019 (UTC)[reply]
If you mean adding WIK-ih-PEE-dee-ə, then no. If you mean adding to e.g. ee that it's used to represent /i/, then maaaybe, but pronunciation respelling schemes are possibly too varied for us to want to try to include them all; they are as Rua says not words. (And in many works that use them, they're explained in appendices already.) - -sche (discuss) 19:43, 19 March 2019 (UTC)[reply]

Attestations of native toponyms mentioned in Latin texts

Many old toponyms of Europe are found only in the form of mentions within Latin texts. Because the text itself is Latin, it seems that our CFI would treat these words as Latin. However, they are generally not Latin grammatically (i.e. they lack Latin endings), and are by and large written down by native speakers of the area in question, not native speakers of Latin. Thus, it can be argued that this is simply code-switching, inserting for example an Old Dutch name in its native form into an otherwise Latin text. If they are considered an attestation of the native language, we can include them in etymologies of modern place names, which is great. It wouldn't make sense to say that a modern Dutch place name is descended from a Latin name merely because the Old Dutch name was quoted in a Latin text. There really isn't anything Latin about these other than the language of the text they happen to appear in.

My question is whether these toponyms count as attestations for the local language, rather than Latin. I'm not sure if CFI says anything about this either way, but it certainly seems like it would be desirable to be able to include these. —Rua (mew) 15:12, 17 March 2019 (UTC)[reply]

If an undeclinable Latin form and an Old Dutch form (code-switched into Latin) are indistinguishable by form, then according to some one and the same occurrence attests this form for both languages, since seemingly people still fail to see criteria for the lexicographical quality of an occurrence with regard to code-switching.
I’d argue for a “favour for the smaller language.” If you say it is Latin one expects a bit clearer evidence that shows that these are names used in Latin, otherwise one could add place names without end because they somewhere appeared in Latin, which would be insipid. Whereas if you see such a thing for Old Dutch, one naturally can’t expect to pull out more.
In my view toponyms and personal names should not even get their own language sections. They should be under L2 headers called “Name” or similar, other spellings being soft redirects like سميث being “Arabic spelling of Smith” for instance; also using own linking templates perchance. Things like Timișoara and its argument “is this Spanish?” will get on everybody’s wick at some point. Why do we need hundred entries for Srebrenica only because history books about the Yugoslav Wars have been written in hundred languages? Why is Karadžić according to Wikipedia an English name spoken /ˈkærədʒɪtʃ/? I don’t believe in the “pronunciation information” arguments. If a Turk lives in Germany his name will stay bare Turkish for seven generations and beyond. Eindeutschung according to peculiarities of law won’t help. Kowalski is still not a German name. And yeah, all the entries in Category:English surnames from German are German lexemes used in English discourses, if not the German spellings of Slavic names etc. Kaufman is German and not English. People just don’t realize that they don’t talk English any more when they use these names. No, this is not code-switching. Names work differently. In other words languages are sets that do not contain proper nouns, since, rightly observed, these stay if you switch the language. Fay Freak (talk) 17:30, 17 March 2019 (UTC)[reply]
Why don't English speakers need to know how Karadžić is pronounced by English speakers? Why is Kaufman German--as our entry points out, and w:Kaufman (surname) shows, it's not a name used in Germany. According to Kaufmann, the basic form is attested back to Old High German, so if it's not English, it's not German either.
Proper nouns do not necessarily stay if you switch language, as the translation table for Rome makes clear. Even a new city like Las Vegas has six Latin-script Wikipedias that chose a name for their article other than "Las Vegas" (or "Las Vegas, Nevada"), with Navajo ranking in as the most unusual with Naʼazhǫǫsh Hatsoh. Tokyo is named Tokyo, Tokio, Tòquio, Tokyô, Tōkyō, Tóquiu, Tókio and Tang-kiaⁿ-to͘. To go at it from another direction, Perth may be spelled the English way in many languages, but it is not pronounced the English way in most of them, the dental fricative being rare among the world's languages. It's a complex mess, and your rant bluntly ignores all the hard details.--Prosfilaes (talk) 09:22, 18 March 2019 (UTC)[reply]
“Perth may be spelled the English way in many languages, but it is not pronounced the English way in most of them, the dental fricative being rare among the world's languages” – does not seem so. This name does not appear in discourse in German and if a German tries to pronounce it he tries a dental fricative. There is currently no place in Australia or Scotland having a lexicalized German pronunciation. How would a Russian pronounce it? It would likely also be with a dental fricative, if the speaker knows about its existence in other languages.
Las Vegas being primarily inhabited by English speakers, it would of course be notable in its section. Or in general, if we have “Name” sections, we can put the English pronunciations at first; it would also include a German pronunciation, which also is lexicalized, /las ˈveːɡas/. But I would be wary not to conjecture any like you do for “Perth”.
The spelling or pronunciation, or inflection, apparently does not say anything about nativization; it is not constitutive and can not be taken as an indication for a name being included in a language, not related to what lexicalization means. “Kaufman” is a German name, only and even though used in Germany in a different spelling. But the pronunciation information would not get lost, as I said. It is important to see that one can’t just talk about “names used in Germany”, “a name used in the United States” and the like. “Names” aren’t “used” this way. They aren’t used because they belong to a language but because they belong to a specific entity referred to; with rare exceptions. Only one in England for German, a few more in Italy (Rom, Mailand, Venedig, Florenz, Turin, Padua, Genua, Neapel and then it ends for any speaker, if I haven’t missed one, other places are perceived and spoken as if bare Italian, ignoring those in the now or once German-settled areas), and for Poland in former German-settled places both compete. Is Szczecin German because it is sometimes used in this form and not Stettin in German newspapers and the like? No, this is a wrong question, not even Stettin is German in this sense: “Rome” being different by language, even names calqued does not say anything, since names are changed even for one language: Са́нкт-Петербу́рг (Sánkt-Peterbúrg), Ленингра́д (Leningrád). See, place names and people can just be “renamed” pursuant to the law, this also shows how names work differently: This question “is this of the language X” does not arise in such a form in nature for names but you ask this only because on Wiktionary you group all under a language, by grouping names independently you avoid such questions which are wrong.
I also want to emphasize that place names and personal names slant the statistics in the categories “X terms borrowed from Y”. One could go around in Germany and quote the local Russian journals for any commune in Germany, we get 11,000 “terms” borrowed from German into Russian this way (the approximate number of communes in Germany). No, this is underplayed, since the towns also have districts, so the number is actually higher, even if we count stretches of land of which seemingly no Russian has ever heard of. Fay Freak (talk) 14:26, 18 March 2019 (UTC)[reply]
A German who knows no English who tries a dental fricative does not in fact produce a dental fricative. Vowels will consistently get mangled, as you point out with Las Vegas. Names get mangled, both spelling and pronunciation, into various languages, particularly when the place or person doesn't speak the original language. Every major city of Europe has one or more cities named after it in the US, and all of those cities have their pronunciations anglicized. One of my friends grew up near Venice, Missouri, and it took her years to realize that the city was named after a place in Italy, as the cities, even in English, were not pronounced anywhere near the same.
Again, why is Kaufman German? If the Anglicization doesn't make it English, then the modernization doesn't make it other than Old High German.
Compare w:en:List_of_sovereign_states_and_dependent_territories_in_Europe, w:de:Liste_der_Länder_Europas, w:az:Avropa_ölkələrinin_siyahısı and w:lv:Eiropas_valstu_un_atkarīgo_teritoriju_uzskaitījums. Comparing the first and second lists makes it clear that English and German disagree on the names of about half of the nations of Europe. An examination of the third and fourth lists shows that Latvian and Azerbaijani make a habit of changing the spelling of names.
Spellings are changed all the time by law, and language regulators like Académie française change the words for things. Places have different spellings and names depending on the language, and even pronunciations for the same name: /ˈkaʊ̯fˌman/ versus /kaʊfmæn/ for the name you brought up.
I understand that most place names just get adopted as is, with no real nativized pronunciation. But I don't think we can deal with that without recognizing that place names can be as intertangled with their language as any other noun.--Prosfilaes (talk) 01:43, 19 March 2019 (UTC)[reply]
Avoiding for a moment the question of what language to consider them, I'd note that they can still be mentioned in etymologies even if they're considered Latin, without saying the e.g. Dutch name is derived from Latin. Lüneburg#German uses the "First mentioned in 956, in Latin, as Luniburc"; a similar approach would be to say something like "from Old High German *Foo,"—(or Old Dutch, or whatever)—"attested in Latin in 632 as Fou". - -sche (discuss) 19:50, 19 March 2019 (UTC)[reply]
I suppose so, but the Lüneburg example is exactly what I'm referring to in this question. To me, it seems weird to treat Luniburc as Latin, it doesn't look at all like Latin to me and has no Latin grammatical endings. —Rua (mew) 19:55, 19 March 2019 (UTC)[reply]

Old Gutnish

I don't know if this has been discussed before, but I am wondering what people think of adding Old Gutnish as an etymology-only language, with its parent language being either Old Norse or Gutnish. It shows up in descendants sections of Old Norse entries, and in Gutnish etymologies. However, as I understand, it is a dialect of Old Norse as are Old East Norse and Old West Norse, which do not have their own codes, so I'm on the fence. Julia 04:44, 18 March 2019 (UTC)[reply]

Old West Norse and Old East Norse are already mentioned in entries using {{label}} and have categories, so I think it wouldn't hurt to add etymology language codes for them. Jonteemil suggested adding them last year, but nothing came of it. I don't think their absence is a good reason not to add Old Gutnish. — Eru·tuon 05:59, 18 March 2019 (UTC)[reply]
Added. I also added it as a label, like OEN and OWN. The only concern I have is that while not including it at all was clearly problematic, including it as Old Norse may or may not go far enough: some references treat it as its own language. - -sche (discuss) 03:17, 23 March 2019 (UTC)[reply]

Words whose letters are in alphabetical order

Do we have a category for words (such as "biopsy, almost, chintz") whose letters are in alphabetical order? SemperBlotto (talk) 13:22, 19 March 2019 (UTC)[reply]

I'd expect such a category to be in Category:English terms by orthographic property, but there's only Category:English words that use all vowels in alphabetical order. — Eru·tuon 17:53, 19 March 2019 (UTC)[reply]
I don't know if there is a name for these, I call them "alphagram words". Here is a list of a bunch of terms we already have which qualify. - TheDaveRoss 19:16, 19 March 2019 (UTC)[reply]
Created a list, though not restricted to English entries if that was what you were thinking of, from the latest dump. — Eru·tuon 19:25, 19 March 2019 (UTC)[reply]
Your list is a lot more permissive than mine. You have a slutty list. - TheDaveRoss 19:29, 19 March 2019 (UTC)[reply]
Interesting, your list is more permissive in another way, because it allows letters to be repeated. — Eru·tuon 19:53, 19 March 2019 (UTC)[reply]
Not hard to do this in Module:en-headword. Would Category:English words with letters in alphabetical order be an okay category name? I suppose Category:English words whose letters are in alphabetical order is clearer. The function here is the one I used for the list above. Might want to exclude words with uppercase letters or with at least two consecutive uppercase letters (acronym-like). — Eru·tuon 19:41, 19 March 2019 (UTC)[reply]
Nice lists. What's the longest alphagram word, by the way? Interesting to know according to both ASSes (Alphagram Sluttiness Systems) - the DASI (TheDaveRoss Alphagram Sluttiness Index) and the EASI (Erutuon Alphagram Sluttiness Index). --I learned some phrases (talk) 13:33, 20 March 2019 (UTC)[reply]
For DASI, aegilops (which is what Wikipedia lists as the longest) and affinors are the only 8-letter options. Aegilops has the advantage of not having any repeated letters, so it exists in EASI as well. It is also not plural, so it just feels good as the winner. If capital letters are allowed you can add DDMMYYYY to the 8-letter list, but that isn't a word. - TheDaveRoss 14:15, 20 March 2019 (UTC)[reply]
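The two criteria compared in this thread (letters in alphabetical order with repeats allowed, versus strictly increasing letters with no repeats) can be sketched as follows. This is only an illustration in Python, not the function actually used to build either list; the name `is_alphabetical` and the `allow_repeats` flag are made up here, and lowercasing the word before comparing is an assumption.

```python
def is_alphabetical(word, allow_repeats=True):
    """Return True if the letters of `word` are in alphabetical order.

    allow_repeats=True accepts doubled letters (e.g. "affinors");
    allow_repeats=False requires every letter to be strictly later
    in the alphabet than the one before it.
    """
    letters = word.lower()
    if not letters.isalpha():
        # Skip entries containing spaces, hyphens, digits, etc.
        return False
    pairs = zip(letters, letters[1:])
    if allow_repeats:
        return all(a <= b for a, b in pairs)
    return all(a < b for a, b in pairs)


for w in ("biopsy", "almost", "chintz", "aegilops", "affinors", "word"):
    print(w, is_alphabetical(w), is_alphabetical(w, allow_repeats=False))
```

Under these assumptions "affinors" passes only the permissive check because of its doubled f, while "aegilops" passes both.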

{{lb|neologism}}

Wiktionary:Neologisms doesn't give any guidance on when a neologism stops being one. This came up when I was looking at our entry for Latin@, which has just been added to the OED with citations going back 19 years. Is that long enough to be considered no longer a neologism, or where should we designate the cut-off? Ƿidsiþ 13:11, 21 March 2019 (UTC)[reply]

I agree it'd help to have at least a soft cutoff. I recall we listed thon as a neologism for a long time even though we quoted uses from the 1880s to the 1980s(!) (I see someone fixed that in 2012). I'm having a hard time finding a book that gives a clear definition / cutoff. Simple English WP says "15-20 years" (and cites sources, but I'm not sure they're sources for the cutoff per se), and poking around google books:neologism years, I see many books talking about neologisms from the last 10, 15 or 20 years, but it's not clear if they mean that's the cutoff for when something stops being a neologism, or just the cutoff for what they looked at. Still, 15 years seems reasonable to me (or 20 if we wanna be more conservative). - -sche (discuss) 19:52, 21 March 2019 (UTC)[reply]
Pick a number, take a poll, make a vote. 20 years seems good enough. Actually using {{defdate}}, based on attestation or use on line would be nice. DCDuring (talk) 20:39, 21 March 2019 (UTC)[reply]
I believe the OED has hard cutoffs like "no uses in the past 100 years = obsolete". This of course assumes that someone has actually made a good effort to find historical examples and that the researcher actually has access to a representative corpus to search in. Both of which may be questionable for us. Not to mention that these labels need to be reviewed periodically. DTLHS (talk) 04:45, 22 March 2019 (UTC)[reply]
So yeah I am opposed to this, since it will become another label like {{defdate}} that gets slapped indiscriminately on entries without any supporting evidence. At least now when we say something is obsolete we're not making any more than a vague statement (which is all we're equipped to make). DTLHS (talk) 05:21, 22 March 2019 (UTC)[reply]
Ah, good point; I hadn't interpreted Widsith's question as being about when to add the tag to entries, but only when it's OK to remove it. Perhaps we should, as you seem to be suggesting(?), remove it from all entries and only focus on things like whether a new word is rare. - -sche (discuss) 07:03, 22 March 2019 (UTC)[reply]
Yeah, that's what I meant. But I suppose the broader question is – what's the point of this label? I mean, I can see the point of "protologism" because it basically means "this may not have the citations a word would normally require". But "neologism" doesn't really mean anything except "it's kind of new in an undefined way, but it can still be cited normally", in which case it seems rather pointless. Ƿidsiþ 13:43, 22 March 2019 (UTC)[reply]
I looked over Category:English neologisms and removed ~24 entries that jumped out at me as being from the 1990s or earlier. Removal of the tag from all entries might require more discussion. If the label is kept, we should at least treat its category as a 'check back on' category like the hot word category. - -sche (discuss) 01:41, 23 March 2019 (UTC)[reply]
Recently added to blurb, now removed. That word is older than me, I reckon. DonnanZ (talk) 01:09, 24 March 2019 (UTC)[reply]
Remove all. Not useful information, especially since it is applied so randomly, as when somebody rarely thinks about it. A word being a “neologism” does not say anything to anyone about whether or where he should use it or could encounter it. Fay Freak (talk) 23:34, 23 March 2019 (UTC)[reply]
I agree. It's also used as a context label here, while it says nothing about usage context. It's an etymological detail. —Rua (mew) 18:21, 24 March 2019 (UTC)[reply]
I'm going to tweak [[neologism]]'s usage notes to reflect this, btw, emphasizing the 15-20 year cutoff over "being felt to have always been valid", which isn't even the case for non-neologisms (like "ain't"). - -sche (discuss) 04:27, 26 March 2019 (UTC)[reply]
I think that this label has some of the same pros and cons as {{lb|nonstandard}}. There are elements of subjectivity and authority shopping and the question of when something becomes acceptable, albeit informal or colloquial. But there is also the fact that it is the kind of useful information that people expect from a dictionary. Perhaps we should let volumes like Garner's Modern American Usage go without competition from us, since we won't have our hearts into doing a good job of it anyway. DCDuring (talk) 12:02, 26 March 2019 (UTC)[reply]
I'm not sure it is really useful information. "Nonstandard" tells you a bit about how you can use a word or can't, about as much as one word can. Many entirely standard words are neologisms, and a lot of slang, no matter how long it's been around, is and always will be slang. The word fanac is mid-20th century in origin, and yet it's still slang for a specific audience.--Prosfilaes (talk) 06:50, 27 March 2019 (UTC)[reply]

Categorize Japanese verbs by their classical conjugations?

-- Huhu9001 (talk) 10:44, 22 March 2019 (UTC)[reply]

Language explicitly stated in quotation of use

Since recently, some Czech quotations now show "(in Czech)", which I find annoying and unnecessary. Of course Czech attesting quotations are in Czech; what else could they be. An example entry is být na dvě věci. Thoughts? --Dan Polansky (talk) 20:28, 22 March 2019 (UTC)[reply]

I completely agree. —Μετάknowledgediscuss/deeds 20:30, 22 March 2019 (UTC)[reply]
Maybe it's semantically useful to store this information but it's idiotic to display it when the entry is the same language as the citation. Equinox 20:34, 22 March 2019 (UTC)[reply]
Agreed. - TheDaveRoss 21:50, 22 March 2019 (UTC)[reply]
I agree; display should be suppressed by default. Is it intended to represent the language of the work as potentially distinct from the language of the quoted snippet? (E.g., for a mostly-English anthology with one Spanish paper in it?) Then it should be suppressed unless those two languages are different. Otherwise, visible display should be suppressed by default (enable-able by some other parameter?), if not in all cases. - -sche (discuss) 01:33, 23 March 2019 (UTC)[reply]
The template documentation for {{quote-book}} includes |lang= in the “most basic” parameters for both English and non-English quotations, thus encouraging its pointless use. The parameter |worklang= still makes sense in case the book is not in the same language as the quotation.  --Lambiam 19:39, 23 March 2019 (UTC)[reply]
Yes. The worst example, as it appears to me, is “Qur'an (in Arabic).” I don’t even see a single reason to name the language by default. Fay Freak (talk) 23:29, 23 March 2019 (UTC)[reply]
OK, I can change this. The display of (in Foo) has always been there but the difference is I added language codes to the various quotes so they get categorized and formatted properly. Benwing2 (talk) 04:38, 27 March 2019 (UTC)[reply]
I changed this so the annotation is only displayed in two cases: (1) |worklang= is given (in which case |worklang= is displayed); (2) |termlang= is given and is different from |lang= (in which case |lang= is displayed). The former case is intended to handle the situation where the language of the work as a whole is different from the language of the quote, and the latter case handles the situation where the language of the term is different from the language of the quote. Benwing2 (talk) 04:52, 27 March 2019 (UTC)[reply]
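The display rule described in the comment above can be sketched like this. It is a rough Python illustration, not the template's or module's actual code; the function name `language_annotation` is hypothetical, language codes stand in for the displayed language names, and checking |worklang= before |termlang= is an assumption, since the comment does not say which case wins when both are given.

```python
def language_annotation(lang, termlang=None, worklang=None):
    """Return the "(in ...)" annotation to display, or None to show nothing.

    lang     -- language of the quoted passage (|lang=)
    termlang -- language of the term being illustrated (|termlang=)
    worklang -- language of the work as a whole (|worklang=)
    """
    if worklang:
        # Case 1: the work as a whole is in another language than the quote.
        return f"(in {worklang})"
    if termlang and termlang != lang:
        # Case 2: the term's language differs from the language of the quote.
        return f"(in {lang})"
    # Default: a Czech quotation in a Czech entry gets no annotation.
    return None


print(language_annotation("cs"))
print(language_annotation("en", termlang="cs"))
print(language_annotation("cs", worklang="en"))
```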

Can we now get rid of all these Webster's 1913 requests for quotes?

Many years ago - so long that I can no longer find the discussion - I proposed that we get rid of thousands of templates in entries asking for quotes from specific historical figures. An example would be absolution, which includes the following entries:

  1. The form of words by which a penitent is absolved. [First attested around 1350 to 1470.]
    (Can we find and add a quotation of Shipley to this entry?)

...

  1. (obsolete) Delivery, in speech.
    (Can we find and add a quotation of Ben Jonson to this entry?)

When I first proposed getting rid of these, the determination of the community was to keep them on the grounds that these requests hinted at sources of citations for the definitions, and might soon be fulfilled. Years later, I see no sign of that happening, certainly not on any kind of scale sufficient to suggest that the nearly ten-thousand requests will ever be addressed. In the meantime, they are just pollution in the entry, an eyesore, a permanent signifier of incompleteness that falsely suggests to the reader that a specific quotation is required to have the complete definition.

Furthermore, our entries are in no way contingent on having quotes used by another dictionary. For most of the words for which such a template exists, there are thousands of sources to which we can turn to cite the word in general. There is nothing magical about Shipley or Jonson that makes their quotes particularly significant to the meaning of the words, nor do we have any guidance for which specific quote by these subjects the authors of Webster's 1913 may have been referring to in their inclusion of these names.

I therefore again propose that we get rid of these requests for quotes. I would propose that a reasonable alternative to having them in the entries would be to have a bot move all of them to a project page, so that those who are really interested in hunting down these citations can look there to see which entries they are associated with. bd2412 T 20:53, 22 March 2019 (UTC)[reply]

@Aabull2016, you're someone who I've seen actually try to fulfill these requests. DTLHS (talk) 21:06, 22 March 2019 (UTC)[reply]
I would be in favor of, at least, having the templates not display in the entry but simply categorize. Moving them to a project page is even better. - TheDaveRoss 21:48, 22 March 2019 (UTC)[reply]
I very occasionally fill a request when I'm working on a page for some other reason, most recently (yesterday) at [[disadvise]]. DCDuring (talk) 22:39, 22 March 2019 (UTC)[reply]
We might get rid of them from the display (if we can store them elsewhere, if only in HTML comments!) but I strongly disagree with removing that data altogether: otherwise, these senses will be RFVed with no evidence available and probably deleted, whereas they might survive with these hints at how to find them. Legitimate definitions are more important than prettiness of a Web page. Equinox 00:18, 23 March 2019 (UTC)[reply]
@Equinox: Are HTML comments visible in the dumps? Specifically: Articles, templates, media/file descriptions, and primary meta-pages: enwiktionary-20190320-pages-articles.xml.bz2? DCDuring (talk) 04:02, 23 March 2019 (UTC)[reply]
Though I wasn't the one you asked, yes, they are. The pages-articles and pages-meta-current dumps (which have the same format) contain the wikitext just as it appears when editing the source of a page, except that < and > are encoded as &lt; and &gt;. — Eru·tuon 23:04, 23 March 2019 (UTC)[reply]
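To illustrate the point about recoverability: a minimal Python sketch, assuming the raw dump text is read without an XML parser (a parser would already decode the entities). The comment format "quotation wanted: ..." and the names `hidden_comments` and `SAMPLE_DUMP_TEXT` are invented for the example; nobody in this thread has proposed a specific format.

```python
import html
import re

# Matches HTML comments, including ones that span several lines.
COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def hidden_comments(dump_text):
    """Decode &lt; / &gt; as found in the raw dump and return comment bodies."""
    wikitext = html.unescape(dump_text)
    return [body.strip() for body in COMMENT.findall(wikitext)]


# Hypothetical line as it might appear in the raw dump text.
SAMPLE_DUMP_TEXT = (
    "# (obsolete) Delivery, in speech. "
    "&lt;!-- quotation wanted: Ben Jonson --&gt;"
)

print(hidden_comments(SAMPLE_DUMP_TEXT))
```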
Thanks. I was just interested in whether the data would be recoverable if Equinox's suggestion was implemented. I have other reasons stated below for wanting these kept as is. DCDuring (talk) 02:31, 24 March 2019 (UTC)[reply]
By the way, you are criticising some sources because they are other dictionaries: that's fine (usage vs. mention rule) but note that a lot (I think a very huge majority) of these requests are not actually references to dictionaries, but rather to real writers using a term. Equinox 00:19, 23 March 2019 (UTC)[reply]
I am not criticizing them because of the source of the quote, but because all of these "hints" are derived from other dictionaries. In other words, another dictionary somewhere is saying "use this as a source", and we are acting as if we are constrained by the decision of that other dictionary to use their preferred source. bd2412 T 15:22, 31 March 2019 (UTC)[reply]
The suggestion above of having the templates display nothing (but be present in the wikitext, and add categories) might be a compromise. However, I'm not really opposed to removing them altogether: the main benefit is that they're a starting point for trying to cite an obscure sense you're not sure is real, but in that case just turn to a free online copy of Webster's old dictionary and look up the entry (and its quotations) there, the way I check old public domain copies of Century to see if they have citations if I'm trying to cite some obsolete word/sense. - -sche (discuss) 01:23, 23 March 2019 (UTC)[reply]

I think the notion of making Wiktionary look superficially like a finished product when, 1., it was founded on the principle of user participation in its construction and improvement, 2., it remains far from being a finished product, and, 3., it is likely to remain unfinished for quite some time is positively wrong-headed. We need to offer more ordinary-user-facing evidence of incompleteness to help lure potential contributors into the process. Those patient enough to cite entries would be particularly desirable. DCDuring (talk) 03:54, 23 March 2019 (UTC)[reply]

Thanks for tagging me in this discussion, @DTLHS. I can appreciate the arguments for and against having these requests visible; however, I believe it would be a terrible loss to remove them altogether. In my own work, they very often provide an entry point into work that becomes much more extensive. The idea of checking *every* obscure sense in a free online copy of Webster's 1913 dictionary is not at all appealing as it would involve a lot of wasted time and chasing down dead-ends. Having worked with a very large number of these requests, I can definitely confirm @Equinox's contention that the vast majority of these are not references to dictionaries. The few that do refer to dictionaries inevitably involve specialized dictionaries (e.g. nautical, technical, agricultural) and can also provide useful information to clarify senses and periods of use. Aabull2016 (talk) 16:34, 23 March 2019 (UTC)[reply]
I have seen it happening. Sometimes even IPs add quotes “as requested”, so it does lure in editors. I have also solved a few of these, working on pages for other reasons. Also this template has been used for other languages already where there aren’t corpora available whence you could easily get other quotes (“thousands of sources to which we can turn”). Fay Freak (talk) 23:25, 23 March 2019 (UTC)[reply]
  • Delete all these. I've even seen requests from Shakespeare for words that didn't exist in his time. Requesting a quotation is fine but requesting one from a particular dictionary or Samuel Johnson is too much. —Justin (koavf)TCM 23:31, 23 March 2019 (UTC)[reply]
    • This abuse must be pretty rare and does not say anything about the template in general. It is like arguing Gothic translations should not be allowed because people add Gothic translations for things that did not exist with the Goths. Samuel Johnson is not too much either, one can search the term plus his name or databases of texts of him with the term. But even if Samuel Johnson is not so relevant, there are more interesting use cases, I think about historical uses of plant names difficult to identify. Fay Freak (talk) 23:39, 23 March 2019 (UTC)[reply]
      • One thing that could be done is to remove the requests that are obviously for dictionaries. We could compile a hit list of dictionary-only sources with links to their request categories, and those who are bothered by them could remove them. That would also have the benefit of improving the focus on the most worthy types of requests. Chuck Entz (talk) 00:39, 24 March 2019 (UTC)[reply]

Hey look at me! I'm in bold at the bottom. If consensus says these must go then I would at least like a list of what they were, so I can try to add those senses, which will otherwise be stupidly thrown away. Can someone help me with a bit of botting? Please consider this before doing a blanket deletion. Equinox 04:48, 26 March 2019 (UTC)[reply]

Theoretically all of the senses should exist, right? These requests were added in conjunction with the Webster's 1913 definition. If there are instances where a request exists but no definition that should probably stay, and have an {{rfdef}} added to boot. - TheDaveRoss 12:26, 26 March 2019 (UTC)[reply]
Doesn't appear to me as if there's anything like a consensus. Of course all the *senses* exist; it's the citations that will need to be added to illustrate / confirm them. As mentioned earlier, where those senses are obsolete and/or rare, they are liable to be removed after failing verification requests. It's a large amount of useful information to flush down the toilet. Aabull2016 (talk) 15:04, 26 March 2019 (UTC)[reply]
@Aabull2016 Do we actually know that all the senses exist? I don't know what the criteria for inclusion in Webster's 1913 were, but it is possible that they included senses that actually would not meet our CFI. bd2412 T 15:26, 31 March 2019 (UTC)[reply]
We know that the professional lexicographers thought that readers might need the definitions, possibly just to understand the way the author named used it. They had the notable-work attestation criterion for inclusion. (What were they thinking!?!?!?) In some cases they were just a century or more closer than we are to the use of the definition. I think we need their help to understand some of the older meanings of words. We could use dated citations to help us know in what time period to look for citations, especially for uncommon, dated (etc) definitions of polysemic words, which can be very hard to cite. Or we could just risk COPYVIO and use the OED's citations. DCDuring (talk) 17:55, 31 March 2019 (UTC)[reply]
The First Edition of the OED up to N should be in the public domain worldwide, and up to Th in the US.--Prosfilaes (talk) 02:43, 1 April 2019 (UTC)[reply]
@BD2412 “it is possible that they included senses that actually would not meet our CFI?” Did you have any specific senses in mind? Of the many instances I’ve worked on, I have not run into any cases of this. As DCDuring rightly points out, in many cases it would be extremely time-consuming and difficult to chase down citations without clues such as those provided by Webster’s 1913. Aabull2016 (talk) 03:36, 1 April 2019 (UTC)[reply]
So far as I have seen, Webster's gives a clue for a single citation for each word, whereas our CFI requires three citations. If the senses with these templates were taken to RfV, then that clue might be useful for finding one cite, but we are still on the hook for the other two. bd2412 T 04:33, 1 April 2019 (UTC)[reply]
...while, for senses added randomly by some anon Internet drive-by, we have no proof whatsoever. At least Webster gives us a clue to ONE of them. Equinox 05:33, 1 April 2019 (UTC)[reply]
I have recently encountered some interesting use cases in Latin entries, so in quodsī and sonīvius. And for obnūbilus Ennius is the only author, hence it can’t be too much to request a quote of a particular author. And see for plant names which sometimes had particular uses in Arab Spain Category:Requests for quotation/كتاب عمدة الطبيب في معرفة النبات لكل لبيب. Those occurrences in Andalusi authors mentioned by sigles in this Andalusi plants glossary should ideally be all quoted, hence requests. Fay Freak (talk) 15:26, 26 March 2019 (UTC)[reply]
To revise my earlier stance since we have multiple people who are working on this, "leave them but maybe make them invisible" seems like a decent compromise if that's possible. Perhaps even make them only visible for people who opt in. That way people who want to add them can. Or we could of course just try and fulfil them all... - -sche (discuss) 17:00, 26 March 2019 (UTC)[reply]
Is there really a consensus that we should conceal this and other evidence of the incompleteness of Wiktionary? Why aren't we advertising all the incompleteness to try to lure more contributors?
Are we really going to a hidden category for each author? Why bother to hide these categories anyway? DCDuring (talk) 20:13, 26 March 2019 (UTC)[reply]
We already have a category for each author...? - -sche (discuss) 20:21, 26 March 2019 (UTC)[reply]
As I understand it, categories and static lists are good if many members are involved. Dynamic lists with many members are slow and resource-intensive because the search is repeated fairly often as one works through the list. I believe that the list is not updated after every relevant change or, if it is, the lag can be a matter of minutes. DCDuring (talk) 11:56, 27 March 2019 (UTC)[reply]
@Equinox: If it comes to that, I have a program that can grab all instances of {{rfquotek}} from the dump. Something like this, though you'd need more of the text of the page; at the very least the definition above the template. — Eru·tuon 20:44, 26 March 2019 (UTC)[reply]
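A rough idea of what such an extraction could look like, pairing each request with the definition line above it. This is a Python sketch and not the program mentioned above; it assumes the author is the last positional parameter of {{rfquotek}} and that the request sits on or below its definition line. The sample wikitext, `SAMPLE`, and `find_requests` are made up for illustration.

```python
import re

# Matches {{rfquotek|...}} when it contains no nested templates.
RFQUOTEK = re.compile(r"\{\{rfquotek\|([^{}]*)\}\}")


def find_requests(wikitext):
    """Return (definition, author) pairs for each {{rfquotek}} in a page."""
    results = []
    last_definition = None
    for line in wikitext.splitlines():
        # Definition lines start with "#"; "#:" and "#*" lines are
        # usage examples and quotations, not definitions.
        if line.startswith("#") and not line.startswith(("#:", "#*")):
            last_definition = line.lstrip("# ").strip()
        for match in RFQUOTEK.finditer(line):
            # Ignore named parameters; assume the author comes last.
            params = [p for p in match.group(1).split("|") if "=" not in p]
            author = params[-1].strip() if params else ""
            results.append((last_definition, author))
    return results


# Hypothetical wikitext modelled on the absolution example in this thread.
SAMPLE = """# The form of words by which a penitent is absolved.
#: {{rfquotek|en|Shipley}}
# {{lb|en|obsolete}} Delivery, in speech.
#: {{rfquotek|en|Ben Jonson}}"""

for definition, author in find_requests(SAMPLE):
    print(author, "->", definition)
```

Keeping the definition text next to the author name is what would make a standalone list usable for re-adding senses later.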
It seems to me that the logical solution, if we are to remove these requests from entries, is to move them to the Citations page. Andrew Sheedy (talk) 23:43, 1 June 2019 (UTC)[reply]
The information these requests provide seems useful but the presentation might be improved. Thus, instead of
(Can we find and add a quotation of Ben Jonson to this entry?)
it could say
(Per Webster 1913, a quotation by Ben Jonson supports this sense. Can we find and add it?)
Admittedly, it is more wordy. Someone might come up with a better idea. Unless people find these items annoying (I guess some do), they could stay, I think. And above there is at least one editor that finds them useful. The requests would probably not be so easy to discover (hit upon) in the Citations namespace. Let me point out these are not the kind of information-free requests like RFP that some people like to put into our entries, or let us recall all those requests for translation in our translation tables. Some numbers: insource:/\{rfquotek/ finds 9,359 pages; insource:/\{rfp/ finds 5,043 pages; insource:/\{t-needed/ finds 11,749 pages; insource:/\{rfe/ finds 36,358 pages. --Dan Polansky (talk) 07:30, 2 June 2019 (UTC)[reply]
I would agree with moving these to the citations page. For many entries, that would create the citations page. The notation would then only present itself to readers who are actually looking for a citation. However, I still find it problematic that the "clue" appears like some cryptic puzzle, particularly where the name offered is only an obscure surname. For example, in tacking, "Can we find and add a quotation of Bouvier to this entry?", "Can we find and add a quotation of Kent to this entry?" - is that Jacqueline Bouvier and Clark Kent? bd2412 T 13:21, 2 June 2019 (UTC)[reply]
See W1913 appendix for a list mapping short names to full names. I believe we have a copy of that in our own appxes somewhere. Equinox 13:29, 2 June 2019 (UTC)[reply]
The average reader, of course, will have no clue about that. I didn't know about that until you just said it here. bd2412 T 18:19, 2 June 2019 (UTC)[reply]
Why not just expand these names then. Moving them makes the request only less likely to be solved. What DCDuring above said: “Why aren't we advertising all the incompleteness to try to lure more contributors?”
Also why still do as if the template is only used for Webster 1913? The documentation does not say this and the US is not the center of the world.
Also whoso is offended by requests can perhaps solve them. Fay Freak (talk) 15:22, 2 June 2019 (UTC)[reply]
Well let me ask this, then. Suppose I go and find a half dozen quotes for one of these entries, but none of them come from the source cited by Webster 1913. Is our entry therefore still incomplete? Are these entries incomplete in some lexical sense unless and until they include a specific quote hinted at in Webster 1913? bd2412 T 18:21, 2 June 2019 (UTC)[reply]
I object to people removing the W1913 requests and not supplying 3 citations (of any kind), because then the sense may be entirely unsupported and fail RFV, where otherwise we would have at least some initial evidence for it. W1913's cites are great too, good old classical ones and not Spongebob :) but I don't mind if you find three other cites instead, or simply leave the W1913 alone. Equinox 18:39, 2 June 2019 (UTC)[reply]
We sometimes claim to be a historical dictionary, among other things. To make good on that claim we should have older citations that support some of the older wording of some of our definitions. There was a recent TR discussion of only which seemed to suggest that only was formerly used with meanings we do not currently distinguish from its other senses. The older citations that would be cavalierly deprecated by deletion or exile to the citations page, to an Appendix, or, heaven forfend, a user page are useful to help free us from the shackles of the present. If we would like to stop claiming to be a historical dictionary, we should eliminate our coverage of all dead languages, like Latin, Middle English, Egyptian, Sanskrit, etc, and forbid any new attestation from before the current millennium. DCDuring (talk) 19:09, 2 June 2019 (UTC)[reply]
We can use older quotes that are not the quotes hinted at by Webster (and, in fact, even if we find a quote by an author referenced by Webster, we may not ever know if that was the specific quote from that author that Webster intended). My point is that there is nothing lexically deficient about a definition that doesn't have a Webster 1913-approved quote, any more than there is anything lexically deficient about a definition that doesn't have a quote approved by Oxford's. bd2412 T 00:07, 3 June 2019 (UTC)[reply]
Of course not. But, having tried to cite some obsolete senses, some with and some without suggestions from other dictionaries, I'd like to have all the help I can get from lexicographers who were better read in older literature than I and closer in time to the usage. And I'd like to recruit as many as possible from our users who might find it interesting to go after such citations. I'd be in favor of putting displays beneath ALL of our uncited definitions calling on our users to find attestation. I only wish we had a budget for bounties for such work by new contributors. DCDuring (talk) 00:20, 3 June 2019 (UTC)[reply]
  • I suspect that you guys were spending too much time whining about these, and about the general and eternal incompleteness on WT. Not that I'm bragging (OK, I kinda am, lol), I just added about 300 quotations using these cleanup tags - the tags are extremely useful, especially for obsolete terms. Searching stuff like stramazoun brings up easy hits. When it comes to common words with obscure senses, like for say meaning "To try; to assay.", it's a bit trickier, but my citation-searching machine hasn't been cranked up to 11 for some time... Jonely Mash (talk) 21:04, 5 January 2021 (UTC)[reply]

Eliminating the difference in formatting between no-etymology, single-etymology and multiple-etymology entries

Right now, the formatting of pages with regard to etymologies is an inconsistent mess. To me, the real problem is how etymologies aren't nested under the term they describe, but I'll not go into that any further. As of right now, there are roughly three different formats when it comes to etymologies:

  • Entry with no etymology at all. POS sections are at level 3.
  • Entry with a single etymology. POS sections are also at level 3, so at the same level as the etymology section and not nested within it.
  • Entry with multiple etymologies. POS sections are now at level 4, nested under the etymology section they belong to, and everything else has to be bumped up a level too.

This is rather inconsistent. We nest POS sections under the etymology section when there are multiple etymology sections, but not when there is only one. This has been a nice breeding ground for headings with incorrect levels, and it's pretty bad that you have to re-nest the entire entry whenever you go from one to two etymologies. This is frustrating and pointless and there needs to be a better and more consistent way of doing this.

A first possibility is to change single-etymology pages to use the same format as multiple-etymology pages. This means that the POS sections are at level 4 nested under the etymology section, regardless of how many etymology sections are in the entry:

==English==

===Etymology===

====Noun====

=====Derived terms=====

However, this really just shifts the inconsistency around rather than eliminating it, because there are still the pages without an etymology to account for. The POS sections can't be at level 4, because there is no level 3 heading to nest them under. Having a section more than one level below its immediate parent is undesirable. Moreover, people have rightly complained in the past that the difference in heading size between levels 4 and 5 is not easily visible (or not visible at all). The conclusion I draw from this is that it is necessary to eliminate heading level 5 altogether.

That brings me to the second possibility: POS sections always appear on the same heading level as the etymology section. This is like we currently do for single-etymology and no-etymology entries, but now we extend that format to entries with multiple etymologies as well.

==English==

===Etymology 1===

===Noun===

====Derived terms====

===Etymology 2===

===Adjective===

====Derived terms====

This has the advantage of not only eliminating level 5 headings, but it also means that every heading has exactly one level that it's allowed to appear as. POS sections will always be level 3. Inflection, Derived terms, Descendants etc will always be level 4. There will no longer be a need to re-nest the sections; once a section is added, it can always stay at that level no matter how the entry is changed in the future. We could also decide to remove the numbers from the Etymology sections. They don't really serve a purpose after all, and we don't number any other section that way.
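
As a rough illustration of what "every heading has exactly one level" would mean for a conversion bot, here is a minimal Python sketch. It is only a sketch under stated assumptions: the function name `flatten_headings`, the `LEVEL_3` set and the `drop_numbers` switch are hypothetical, and the list of level-3 headings is illustrative rather than the full set allowed by the entry layout.

```python
import re

# Matches a wikitext heading such as "===Noun===" (same number of "=" on both sides).
HEADING = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$")

# Assumption: headings that always sit at level 3 under the second proposal.
# This set is illustrative only; a real bot would need the complete list.
LEVEL_3 = {"Etymology", "Pronunciation", "Noun", "Verb", "Adjective", "Adverb"}


def flatten_headings(wikitext: str, drop_numbers: bool = False) -> str:
    """Rewrite heading levels so each heading type has one fixed level."""
    out = []
    for line in wikitext.splitlines():
        m = HEADING.match(line)
        # Leave ordinary text and the level-2 language heading untouched.
        if not m or len(m.group(1)) < 3:
            out.append(line)
            continue
        title = m.group(2)
        # "Etymology 2" counts as an "Etymology" heading for level purposes.
        base = re.sub(r"\s+\d+$", "", title)
        if drop_numbers and base == "Etymology":
            title = base
        level = 3 if base in LEVEL_3 else 4
        out.append("=" * level + title + "=" * level)
    return "\n".join(out)
```

Run on an entry in the current multiple-etymology format, this would promote the POS headings to level 3 and the Derived terms headings to level 4, without the manual re-nesting described above.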

While the second possibility definitely has my preference over the current format, it's not without its downsides too. It eliminates the use of nesting as an indicator of what goes with what, instead substituting it with a rule that says "the first etymology section above a POS is the one belonging to that POS". Since we already do exactly that for Pronunciation sections as well as single-etymology entries, it's not a huge problem, but it's not the nicest layout either. Moreover, it encourages editors to add new POS sections that automatically subsume the existing etymology section, whether that is correct or not. A lot of editors, especially inexperienced ones, don't pay particular attention to such tricky details, they just want to add content, which we should applaud them for. It's the same situation that causes Synonyms sections to become incorrect when new senses are added but no {{sense}} template has been added to the synonyms.

For that reason, I still think that the best solution is to have etymology and pronunciation at level 4, nested under the POS they apply to, rather than the current situation. But if people won't agree to that, I think either of these proposals is still better than what we have now, proposal 2 in particular. —Rua (mew) 17:47, 24 March 2019 (UTC)[reply]

I guess I could support this. I would like to see some CSS work done to make sections more visually distinct somehow, if it's possible. Such as automatically indenting everything "under" a single level 3 Etymology header. DTLHS (talk) 18:14, 24 March 2019 (UTC)[reply]
That's not possible, because our HTML output doesn't actually contain any sections. In the HTML, it's simply a series of a header followed (without nesting) by paragraphs of text. So there is no notion of the sections "containing" the subsections. —Rua (mew) 18:18, 24 March 2019 (UTC)[reply]
You're right, it would have to be a JS thing. DTLHS (talk) 18:28, 24 March 2019 (UTC)[reply]
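
To make the point about the flat HTML concrete: any script that wants to indent or collapse "contained" subsections first has to rebuild the nesting from the sequence of heading levels. The following Python sketch shows only that grouping step, not an actual gadget (which would be JavaScript running on the rendered page); the function name `nest_headings` and the dictionary layout are assumptions made for illustration.

```python
def nest_headings(headings):
    """Turn a flat list of (level, title) pairs into a nested tree.

    Each node is a dict with "level", "title" and "children" keys.
    A heading becomes a child of the nearest preceding heading
    that has a smaller level number.
    """
    root = {"level": 0, "title": None, "children": []}
    stack = [root]  # path from the root to the most recent heading
    for level, title in headings:
        # Close every open section that is at the same depth or deeper.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"level": level, "title": title, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]
```
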
I do use level 5 where necessary, but it has the same font size as level 4, which makes it difficult to distinguish. DonnanZ (talk) 18:23, 24 March 2019 (UTC)[reply]
I like this (specifically I like etymology subordinate to the POS sections), and further I like the hinted-at notion of etymology being subordinate to the definition lines (and probably subst'ed in there so that a single etymology could be applied across many definitions without having to keep it in sync). - TheDaveRoss 23:14, 24 March 2019 (UTC)[reply]
All of these are improvements. I have a strong preference for proposals in which the presence or absence of an etymology section does not affect the level of the other headings. Conceptually the simplest is to include the etymology sections in the headings after the definitions. That leaves an issue, however: how to deal with the frequent cases where different parts of speech share an etymology, like for abrupt or wash.  --Lambiam 06:33, 25 March 2019 (UTC)[reply]
I would argue that those terms don't actually share an etymology, we're just being sloppy and pretending they do. Imagine that the adjective abrupt and the verb abrupt were spelled differently, and thus put on separate pages. Would the etymology section for both of them be exactly the same? I doubt it; the verb would have an etymology section indicating it is derived from the adjective in some way. The same should be done even if the spelling happens to be the same. So really, they don't have the same etymology, the verb is actually missing its etymology. The same for the noun of course. See Wiktionary:Beer_parlour/2018/November#Per-lemma_etymologies for a previous discussion that never went anywhere. —Rua (mew) 18:34, 25 March 2019 (UTC)[reply]
I agree that some terms for which we show just one etymology, like nouns and verbs in English especially, really should show two different etymologies (pinging Equinox because I recall he's a fan of merging these types of etymologies), but I am not so sure about Ancient Greek words that happen to be both an adverb and a preposition, or a pronoun and a determiner. I suppose we could try to figure out which part of speech was earlier and have the full etymology there, with the other etymology saying "derived from the part of speech word". I seem to recall that prepositions are thought to usually derive from adverbs in Ancient Greek for instance. That would involve some work. It is simpler to show just one etymology. — Eru·tuon 20:33, 25 March 2019 (UTC)[reply]
@Rua, Erutuon About that, see Wiktionary:Etymology scriptorium/2017/July § legitimate. I wrote then that "The etymologies aren't exactly the same, since the verb comes from the adjective by conversion. It probably doesn't warrant two headers though". I have changed my mind, and actually agree with Rua that there being two - however slightly - different etymologies does warrant two headers. ChignonПучок 20:52, 25 March 2019 (UTC)[reply]

I realised I forgot to discuss the Pronunciation section with regard to this, the placement of which is even less consistent. When there is one etymology, it goes below the etymology section, but when there are multiple etymology sections it goes above the first etymology. Again, messy and inconsistent, and thus prone to errors (I had to fix one such error just now, diff). Moreover, sometimes we nest pronunciation sections under etymology sections (at level 4) if each etymology has a different pronunciation. As the relationship between pronunciations, etymologies and POS sections becomes more complex, the entry structure itself becomes more complex. What about an entry with multiple etymologies, where one etymology contains multiple POS sections that each have their own pronunciation? Or where there are 3 etymologies but only 2 pronunciations? The second proposal above would somewhat solve this, in that there is no complicated nesting but the POS simply subsumes the nearest pronunciation above it. But it can lead to this weird structure:

==English==

===Etymology===

===Pronunciation===

===Noun===

===Etymology===

===Adjective===

===Pronunciation===

===Verb===

Is it clear at all which POS has which pronunciation? Following the rule of "nearest section above", the noun goes with the first pronunciation and the first etymology, the adjective goes with the first pronunciation and the second etymology, and the verb goes with the second pronunciation and the second etymology. Not very clear if you ask me. This, again, is why I prefer the structure in which only POS is at level 3. Then it's immediately clear what belongs to what:

==English==

===Noun===

====Etymology====

====Pronunciation====

===Adjective===

====Etymology====

====Pronunciation====

===Verb===

====Etymology====

====Pronunciation====

With this structure there is no complicated nesting, everything is clear and every section has the same level everywhere. The downside of this method is of course potential duplication of the pronunciations, but it seems that's unavoidable without going back to a weird nesting structure again. Duplication of etymology is not really an issue, because as I mentioned above and in past discussions, no two terms actually have the same etymology. —Rua (mew) 20:03, 25 March 2019 (UTC)[reply]
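
For comparison, the "nearest section above" rule discussed earlier in this post can be written down as a short Python sketch. It is only an illustration under stated assumptions: `resolve_nearest_above` is a hypothetical name, the POS set is deliberately tiny, and etymology and pronunciation sections are identified simply by their order of appearance within a language section.

```python
import re

# Matches a wikitext heading such as "===Noun===".
HEADING = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$")

# Assumption: a small illustrative set of POS headings.
POS = {"Noun", "Verb", "Adjective", "Adverb"}


def resolve_nearest_above(wikitext):
    """Return (POS, etymology index, pronunciation index) for each POS heading.

    Indices count sections in order of appearance, starting at 1;
    None means no such section precedes the POS heading.
    """
    etym = pron = None
    etym_count = pron_count = 0
    result = []
    for line in wikitext.splitlines():
        m = HEADING.match(line)
        if not m:
            continue
        level, title = len(m.group(1)), m.group(2)
        if level == 2:
            # A new language section starts: forget everything seen so far.
            etym = pron = None
            etym_count = pron_count = 0
        elif title.startswith("Etymology"):
            etym_count += 1
            etym = etym_count
        elif title == "Pronunciation":
            pron_count += 1
            pron = pron_count
        elif title in POS:
            result.append((title, etym, pron))
    return result
```

Applied to the "weird structure" example, this should pair the noun with etymology 1 and pronunciation 1, the adjective with etymology 2 and pronunciation 1, and the verb with etymology 2 and pronunciation 2. A reader has to run the same bookkeeping in their head, which is the objection raised above.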

I have always assumed that Etymology was meant to be the word's etymology and not the sense etymology so whether the noun mutated into an adjective or the adjective into the verb is of little concern. Trying to document it seems like splitting hairs. However, if you are making this change, how would you handle multiple etymologies for the same POS? Would you have, say, "Noun 1" and "Noun 2" or put both etymologies under one "Noun" heading? The latter would muddy the waters further if there were also adjective or verb POSs for those multiple etymologies so you end up with something like the 2nd adjective etymology is derived from the 3rd noun etymology. -Mike (talk) 18:34, 26 March 2019 (UTC)[reply]

We would have separate noun headings for each noun. We would not number them, as the numbering isn't actually useful and is subject to change anyway. Instead, we'd clarify which one we mean using glosses and {{senseid}}, like we already do currently.
As for "splitting hairs", I think I'd prefer being correct over being accused of doing that. From a linguistic point of view, the conversion of one POS into another without any change in the lemma form is no less notable than the conversion of one POS to another with a change in lemma form. Moreover, it's actually possible to choose the lemma form in such a way that identical terms are no longer identical, or the reverse. Imagine that we chose the imperative form for Dutch verbs instead of the infinitive. Then, verbs would have the same lemma form as the noun or adjective they might derive from. Would that in itself make a difference in how much etymological detail we give to the user? Of course not. Likewise, if history had chosen to declare the past tense of English verbs as the lemma form instead of the infinitive like now, then we'd be forced to put them on different pages and give them separate etymologies. If we then simply left the etymologies of the derived terms blank, it would be a disservice to our users and nobody would stand for it. In the same way, it does a disservice to the user to not include such information merely because the lemma forms happen to be identical. I therefore stand by the position that the question of whether we provide a separate etymology for a term should be independent of whether the lemma form of the term is identical to another. —Rua (mew) 21:39, 26 March 2019 (UTC)[reply]

I also sympathize with this proposal, “in which the presence or absence of an etymology section does not affect the level of the other headings”. Less strain for the eyes, less wasted time for resorting, easier botting. What will we do though with language sections containing multiple etymologies where some etymology is not written but it is self-evident from the gloss that it is to be found at a further link, e.g. we have some complicated etymology 1 and an etymology 2 which is merely a lame alternative form of some other term one does not necessarily even lose words about in the etymology section for example because it would be a duplication of what is in the gloss or in the main form where all alternative forms are explained? For if we just reassign levels there will be empty etymology sections, so apparently we need additional text at some places. For form etymologies we use {{nonlemma}} which refers us to a “main entry” – if the same is to be used in alternative forms entries {{nonlemma}} has to be renamed because this template shall be used independently of whether something is non-lemma but depending on whether one refers aliō. Fay Freak (talk) 14:37, 28 March 2019 (UTC)[reply]

This is an advantage of nesting etymology under POS. In the current setup, as well as the first two possibilities above, etymology still drives the whole structure of the entry, so we must include it for the sake of structure regardless of whether we want to. We've had to create silly templates like {{nonlemma}} to have something to put into those etymology sections that we'd rather not put anything in. If we nest etymology under POS, then we can just omit the section, and nothing in the page structure will break as a result. Instead, POS will drive the entry structure, which makes a lot more sense to me. —Rua (mew) 22:36, 28 March 2019 (UTC)[reply]
I’d rather have the setup where neither is nested under the other. Etymology level 3, POS level 3, Etymology level 3, POS level 3. For such cases like poucave where you can’t know whether noun or verb came first, plus the etymologies are often written in a fashion independent of the part of speech (including the frequent “from the root”, without there being a need to tell anything further about the derivation type of a certain pattern or POS: if form II has a causative meaning you don’t need to write in an extra section that it is a causative, it generally is, and the Appendix:Arabic verbs is linked by the “II” in the headword). Without nesting by the mere order it should be clear to the averagely observant reader what belongs to what. According to the current layout POS comes under the etymology, so the reader holds it the same if it is sequentially under the etymology, but not nested under it. Nesting etymology under POS would be an unnecessarily great change, given this easier way to achieve a state “in which the presence or absence of an etymology section does not affect the level of the other headings”. Just this minor issue with then it being more needed to write something under the etymology headers like with {{nonlemma}}, but maybe not even since people will learn that the etymology sections themselves do not need to contain anything but are there without anything to signify that “here starts a different etymology”. PLUS amn’t I right to observe that this second variant with etymology under POS needs manual care while the conservative variant only needs to have etymology numbers removed and the level promoted which would be done by bot? Fay Freak (talk) 13:03, 29 March 2019 (UTC)[reply]
If it needs manual care, then that implies it's not machine readable and that's a problem in itself. Nesting etymology under POS has the least potential for mistakes, the most potential to spot mistakes, and it is immediately understood what goes with what. Your idea of empty etymology sections, just as a signal that the next POS does not belong to the previous etymology, is exactly what is broken about the current etymology-driven approach, and sadly neither of the proposals above solve that issue. At least with POS at level 4 you can still speak of the etymology section "containing" its subsections in some way, which does not apply when both etymology and POS are at level 3. Then you just have an empty etymology section that's prone to be removed by other editors because it appears to be useless. Moreover, not all editors will be aware that a new POS they insert already has an etymology section that will automatically apply to it, leading to errors. With nesting etymology under POS, that kind of mistake becomes impossible. The goal is to be more explicit while also reducing the number of different formats an entry can have. Both proposals above reduce the messiness of entries, but only etymology under POS can properly solve the problem. —Rua (mew) 17:17, 29 March 2019 (UTC)[reply]

Redundant messages on new talk pages

Why do we have two almost equally large and visible warnings at the top of the page when you create a new talk page? Here they are, for comparison:

  1. NOTE! Wiktionary's talk pages are usually not regularly followed by other editors. If you want to discuss this entry, please go to Wiktionary:Tea room instead, where more people will see your message. For general questions, please leave a message at Wiktionary:Information desk.
  2. Talk pages of individual entries are not usually monitored by editors, and messages posted here may not be noticed or responded to. You may want to post your message to the Tea Room or Information desk instead.

Can we please get rid of one of these? The more warnings there are to read, the more I imagine new users choose to ignore them. Ultimateria (talk) 17:44, 25 March 2019 (UTC)[reply]

I removed the more garish message from MediaWiki:Newarticletext. - TheDaveRoss 21:15, 25 March 2019 (UTC)[reply]
It looks so much better! Thank you. Ultimateria (talk) 15:21, 26 March 2019 (UTC)[reply]

Remove lemmas from Korean hanja and Vietnamese Hán tự entries

I suggest stripping the Korean and Vietnamese Chinese character entries in Category:Korean Han characters and Category:Vietnamese Han tu of their lemma status (and of any PoS and topical categories). The same would apply to Zhuang and some other languages where the main form is not written in Chinese characters. It may take longer to remove the topical categories but it's the right thing to do.

A simple example: 한자 (hanja), written in hangeul is a lemma and a noun and 漢字, written in hanja is its hanja form and should only belong to Category:Korean Han characters. --Anatoli T. (обсудить/вклад) 23:48, 25 March 2019 (UTC)[reply]

If they are not lemmas, then what are they a form of? Non-lemmas are always a form of something else. —Rua (mew) 21:49, 26 March 2019 (UTC)[reply]
@Rua: It's a special case for these languages. They are not the main spelling form, consider them soft redirects, like transliterations, e.g. Category:Mandarin pinyin. We also agreed with User:Benwing2 to strip Category:Russian spellings with е instead of ё of the lemma status. It's OK to keep adding them to language name non-lemma forms categories. --Anatoli T. (обсудить/вклад) 22:05, 26 March 2019 (UTC)[reply]
@Atitarev I agree with you. These are alternative spellings, similar to transliterations, and not lemmas. Benwing2 (talk) 00:57, 27 March 2019 (UTC)[reply]
How are they like transliterations if they have been used by principle historically? Why not decategorize Russian pre-1918 spellings then? And why not decategorize Serbo-Croatian Cyrillic spellings if you are that far, since these basically double the category entries? Plus they aren’t non-lemma forms even if you remove the lemma category. Those Russian spellings like актер (akter) are categorized neither as lemma nor as non-lemma. These Russian entries are sorted as Category:Russian spellings with е instead of ё, hence it would make sense to have entries for “Vietnamese lemmas in Chinese characters” resp. “Korean lemmas in Chinese characters” as distinguished from the “normal lemmas”, or similar. Lemma—non-lemma is a false dichotomy, this is a tertium, as has already been conceded by categorizing the Russian е-instead-of-ё-entries as neither. Fay Freak (talk) 14:24, 27 March 2019 (UTC)[reply]
Yes, they are neither lemma nor non-lemma forms. Benwing2 (talk) 15:16, 27 March 2019 (UTC)[reply]
I suggest having these: (1) Category:Korean terms in Han script and (2) Category:Vietnamese terms in Han script. KevinUp (talk) 08:55, 28 March 2019 (UTC)[reply]
Support —Μετάknowledgediscuss/deeds 01:58, 27 March 2019 (UTC)[reply]
Support KevinUp (talk) 12:18, 27 March 2019 (UTC)[reply]
The following PoS categories are available for Korean Han characters:
  1. Category:Korean nouns in Han script
  2. Category:Korean proper nouns in Han script
  3. Category:Korean adverbs in Han script
  4. Category:Korean pronouns in Han script
The following PoS categories are available for Vietnamese Han characters:
  1. Category:Vietnamese nouns in Han script
  2. Category:Vietnamese proper nouns in Han script
  3. Category:Vietnamese adjectives in Han script
  4. Category:Vietnamese verbs in Han script
  5. Category:Vietnamese adverbs in Han script
  6. Category:Vietnamese idioms in Han script
  7. Category:Vietnamese proverbs in Han script
For Vietnamese, there's also (1) Category:Vietnamese Han tu, linked by either {{vi-readings|hanviet=}} or {{han tu form of}}, and (2) Category:Vietnamese Nom, linked by either {{vi-readings|nom=}} or {{Nom form of}}.
However, I would prefer to have a separate category for single character entries:
  1. Category:Korean Han characters for single character hanja.
(This category currently contains both single character hanja and hanja compounds)
  1. Category:Vietnamese Han characters for single character Hán Nôm (both chữ Hán and chữ Nôm).
(This category currently contains only single character entries provided by deprecated {{vi-hantu}} that will be deleted soon)
What does the community think of having something like (1) Category:Korean terms in Han script and (2) Category:Vietnamese terms in Han script for compound word entries, after stripping them of their lemma status?
The reason for this is that I currently use incategory:"Korean lemmas" intitle:中 [1] and incategory:"Vietnamese lemmas" intitle:中 [2] to search for derived terms of Korean 中 and Vietnamese 中. KevinUp (talk) 12:18, 27 March 2019 (UTC)[reply]
I don't think we need to make a distinction between single-word and compound-word entries, it's enough just to have a single category for all such terms with Han characters. Benwing2 (talk) 15:16, 27 March 2019 (UTC)[reply]
Single character terms in Sino-Xenic languages are notoriously difficult and extra effort is always required to provide disambiguation on endless homophones or homographs (in the main script), e.g. the Korean syllable/word/component (na) and a list (incomplete) of hanja with the same reading: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , etc. These ARE required for disambiguation but they are not the main form or lemma. --Anatoli T. (обсудить/вклад) 03:26, 28 March 2019 (UTC)[reply]
And as for Vietnamese Han characters, we need to further distinguish between chữ Hán (literary Chinese characters) and chữ Nôm (characters used for native Vietnamese). Also, a distinct category would help to monitor edits done on Korean/Vietnamese single Han character entries. KevinUp (talk) 08:55, 28 March 2019 (UTC)[reply]
Take a look here: Special:Contributions/27.3.73.64. The edits are correct (incorrect readings have been removed), but most Nôm readings have also been removed. I'll clean it up later but the point is, we need a separate category for single character entries, which are prone to all kinds of discreet vandalism due to it being rarely used. KevinUp (talk) 10:48, 28 March 2019 (UTC)[reply]
I agree we need a separate category for chữ Nôm characters. It doesn't need to have lemmata either. --Anatoli T. (обсудить/вклад) 00:12, 29 March 2019 (UTC)[reply]

Making etymological derivations more specific, retiring {{der}}

When {{der}} was first created, it was just a generic replacement for {{etyl}}. But since then, we've created {{inh}}, {{bor}} and {{calque}}, all of which categorize more specifically. This has relegated {{der}} to the role of indicating "other" derivations. From what I can tell, though, it really only indicates three things:

  1. Indirect derivations, i.e. cases where a term was taken into an intermediate language, from which the term was then directly inherited, borrowed or calqued.
  2. Morphological derivations, especially from roots. These are really, in theory, instances of the other three; *hundaz is inherited from a term in some ancestral Pre-Germanic Post-PIE language we may call X, which itself was derived within that language from a descendant of *ḱwṓ. The actual sequence of events is then really PG *hundaz <inh (intermediate term in X) <affix (intermediate term in X) <inh PIE *ḱwṓ. The reason we use {{der}} here is that we don't know the intermediate term, the language in which it existed, nor the derivational morphology by which one was derived from the other within X.
  3. Cases that really should be labelled with one of the more specific templates, but where the editor just converted {{etyl}} to {{der}} without further thought.

I propose that we close the gap further by creating new templates specifically for the first two of these roles. I believe that this would then make the derivation-from-another-language templates/categories exhaustive, and then we may be able to retire {{der}} altogether once the erroneous uses of the third type are cleaned up. The category for "terms derived from" would remain, but only as a parent category for the more specific ones to be categorised in.

We may further distinguish between indirect borrowings and indirect inheritance, depending on what the relationship in the intermediate language is. While our current practice for {{bor}} is to use it only for borrowings into the current language, I have begun to think that this may be a mistake. English is inherited from Middle English, and so in a sense, it inherits borrowings from it as well. Distinguishing terms borrowed into English from terms borrowed into Middle English and then inherited doesn't make that much sense in practice; they are both terms that were borrowed sometime during the long unbroken chain of inheritance stretching from modern English back to pre-Indo-European times and beyond. So perhaps we change one of two things here:

  1. Label modern terms that are inherited from ancestral borrowings as both "English terms derived from (ancestor)" and "English terms borrowed from X", ignoring the fact that the borrowing language was not modern English but its ancestor.
  2. Refine this concept in the form of "English terms borrowed from X inherited from Middle English". Then we can still distinguish English terms borrowed within English from terms borrowed into Middle English, Old English, Proto-Germanic and Proto-Indo-European as appropriate.

The first solution is easier, but the second conveys more information to our users. If we do either of these things, then what remains of indirect derivations consists only of indirect borrowings, so we can label them as such.
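
To compare the two labelling options side by side, here is a small Python sketch that builds the category names each option would produce. It is purely illustrative: the function `borrowing_categories` and its parameters are hypothetical, the category wording is copied from the two options above, and none of this reflects how the existing templates categorise.

```python
def borrowing_categories(lang, source, via=None, option=1):
    """Return category names for a term borrowed from `source`.

    lang   -- the language of the entry, e.g. "English"
    source -- the language the term was borrowed from
    via    -- the ancestor of `lang` that did the borrowing,
              or None if `lang` itself borrowed the term
    option -- 1 or 2, matching the two options listed above
    """
    if via is None:
        # Direct borrowing: both options agree.
        return [f"{lang} terms borrowed from {source}"]
    if option == 1:
        # Option 1: ignore that the borrowing language was an ancestor.
        return [
            f"{lang} terms derived from {via}",
            f"{lang} terms borrowed from {source}",
        ]
    if option == 2:
        # Option 2: record the ancestor that did the borrowing.
        return [f"{lang} terms borrowed from {source} inherited from {via}"]
    raise ValueError("option must be 1 or 2")
```

For a term that Middle English borrowed from X, option 1 yields the same "borrowed from X" category as a direct modern borrowing, while option 2 keeps the two cases in separate categories, which is the trade-off described above.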

Wiktionary is often praised for its etymological content, so this could be a way to reinforce an existing strength. —Rua (mew) 02:11, 28 March 2019 (UTC)[reply]

If possible, can we use a bot to convert existing {{etyl}} to {{der}} before ultimately retiring {{der}} itself? KevinUp (talk) 09:18, 28 March 2019 (UTC)[reply]
That would just increase the number of erroneous uses of {{der}}. —Rua (mew) 12:06, 28 March 2019 (UTC)[reply]

1:
So {{inhbor|entry language|source language}} for inherited borrowings?
2:
Ergo {{inhaf|lingua lemmatis|lingua quae affixit|lingua radicem impetrans}}, e. g.: {{inhaf|de|gem-pro|ine-pro|*hundaz|*ḱwṓ|af=affixum quod cognoverimus}}?
3:
> pretending that with further thought, even with ideal literature acquaintance, one would always know the nature of the derivation
Maybe one needs a special template signifying “I specifically used this template because I don’t know the exact derivation”, which could be {{der}} but isn’t it as this has been used now with ineradicable different signification. There are also cases of a word claimed to be X, and claimed to be Y, which are incompatible but all categorize fully. Fay Freak (talk) 14:19, 28 March 2019 (UTC)[reply]

Oppose retiring {{der}}. An interesting idea to become more specific, but the end goal is not one that I see as desirable. Besides unknowns, as Fay Freak mentioned, there are other issues like words in creoles derived from their lexifiers or Yiddish words from Hebrew — you would probably subsume those as a borrowing, but it is not how those linguistic traditions consider these words, and we use {{der}} as a result. —Μετάknowledgediscuss/deeds 14:41, 28 March 2019 (UTC)[reply]
Being able to retire {{der}} is not the end goal, it's only the logical consequence if we have replaced it in every case by something more specific. All I did here is lay down some of those specifics, where they lead is another matter. It shouldn't be a reason for objecting to the first sensible step merely because you don't like where it might lead. —Rua (mew) 22:33, 28 March 2019 (UTC)[reply]
And I'm not objecting to the first step of creating specialised templates. But I pointed out multiple examples of {{der}} being used in ways you seemingly hadn't considered. (They might need more specialised templates, in fact.) —Μετάknowledgediscuss/deeds 23:31, 28 March 2019 (UTC)[reply]
Yes, I figured there would be cases I hadn't considered. I did say "from what I can tell" after all. —Rua (mew) 23:35, 28 March 2019 (UTC)[reply]
  • Keep der, avoiding forcing contributors to be more specific than their knowledge affords. Furthermore, I consider this whole inherited vs. borrowed business in our etymologies to be rather unimportant and a nuisance, and I would be happy to see it removed, which is not going to happen but anyway. --Dan Polansky (talk) 19:26, 29 March 2019 (UTC)[reply]
  • Keep: I agree that people have been a bit lazy using {{der}} in cases where they could be using {{af}}, etc., but I see no reason to get rid of {{der}}. --{{victar|talk}} 05:09, 30 March 2019 (UTC)[reply]
  • I can see the etymology templates heading in the direction of the old category-boiler templates, where only a few specialists will know the right template to use. Sure, a few obvious cases like {{inh}} and {{bor}} save some typing, but if we try to make a template for every possibility, you start running into etymological spaghetti like pikake and creme anglaise. If you want to start getting complex, add parameters to {{der}} so all that complexity is explained on one template's documentation page. As for your other idea: I don't think both mister and magister or script and shrift belong in Category:English terms borrowed from Latin. Chuck Entz (talk) 06:01, 30 March 2019 (UTC)[reply]
    Yes, I seem to recall that one reason we decided back when {{bor}} was adopted to only use it for borrowing by the L2 language is that it would feel odd to say e.g. English iron was "borrowed from Proto-Celtic" as if the two languages were coeval or modern English speakers intentionally adopted a dead language's word like they did with ceorl or ghrelin. I grant that that's subjective and other people feel otherwise; I also grant that happenstance does lead to 'inconsistency' when a word that Hebrew borrowed 3,000 years ago would be considered to be "borrowed" into Hebrew just as much as a word borrowed today, because we don't split that language up by age.
    I'm really unconvinced that "English terms borrowed from X inherited from Middle English" would be a good idea. - -sche (discuss) 05:10, 31 March 2019 (UTC)[reply]
  • Keep per Chuck. DCDuring (talk) 15:55, 30 March 2019 (UTC)[reply]
  • I need to keep using der. A word may be directly borrowed from language A but indirectly derived from another language B, in which case we cannot use either bor or inh to indicate B. Sometimes we do not know whether a word is directly borrowed from language A, so we cannot use bor in that case either. --Octahedron80 (talk) 05:28, 31 March 2019 (UTC)[reply]

Unicode images


An image of Unicode characters must show up together with the actual characters in their entries so that they can be seen regardless of browser, fonts etc. For example, no image of U+2053 appears either in swung dash or in the entry for the character itself. --Backinstadiums (talk) 19:26, 28 March 2019 (UTC)[reply]

Not replace, but possibly supplement. Equinox 19:30, 28 March 2019 (UTC)[reply]
@Equinox: I've edited OP. --Backinstadiums (talk) 19:34, 28 March 2019 (UTC)[reply]
Then do it. There doesn't seem to be a need for a policy here; just add images where you feel they are helpful. --Prosfilaes (talk) 00:34, 29 March 2019 (UTC)[reply]

I'm leaning against this: how do we decide which characters to show as images? All of them? —Justin (koavf)TCM 19:36, 28 March 2019 (UTC)[reply]

Welcome, foreigners


Many Wiktionaries have a Wiktionary:Welcome, newcomers page. But for Swedish Wiktionary, I also added a Welcome page in English for those who don't speak much Swedish (yet). I think they can still make useful contributions. Why don't you try it? I'd welcome feedback. Feel free to copy the idea. --LA2 (talk) 22:37, 28 March 2019 (UTC)[reply]

What language would we write it in on the English Wiktionary, though? Swedish? —Rua (mew) 22:38, 28 March 2019 (UTC)[reply]
I'll leave that to each volunteer to figure out. I assume that many users here, just like you and me, in addition to English, speak one lesser known language whose Wiktionary could need some more contributors. --LA2 (talk) 22:54, 28 March 2019 (UTC)[reply]
I don't know how I feel about encouraging people to create barebones entries. You probably don't have much in the way of infrastructure at sv.wikt like we do here for the likes of Swahili, but I wouldn't want people to create entries lacking basic information like noun class (compare our sukari). —Μετάknowledgediscuss/deeds 23:29, 28 March 2019 (UTC)[reply]
A major difference is that en.wiktionary has 4000 entries in Swahili and sv.wiktionary so far only has 200. Swahili is not my main concern, but it was a good neutral example of a language for which we currently did not even have a translation of sugar. We'd be happy if someone comes by and adds the names of the months. The other example was Ukrainian, for which sv.wiktionary has 1200 entries and a fully developed system of inflection templates for nouns, adjectives and verbs, e.g. sv:писати. --LA2 (talk) 00:52, 29 March 2019 (UTC)[reply]
Maybe Esperanto would be a good choice? – Jberkel 21:04, 1 April 2019 (UTC)[reply]

a distrust of politicians


distrust says it's uncountable, but one can find "a distrust of politicians". Is the situation here similar to "a fear of heights"? The Oxford Genie dictionary includes "a" in the entries of both terms, but I do not know exactly why. --Backinstadiums (talk) 17:49, 29 March 2019 (UTC)[reply]

Similar to disbelief, it should say: (usually uncountable, plural distrusts): [3], [4], [5].  --Lambiam 19:13, 29 March 2019 (UTC)[reply]
I updated that, because I agree. - TheDaveRoss 14:47, 1 April 2019 (UTC)[reply]
Is this the new term for a group of politicians, like a flock of birds? :D —Rua (mew) 21:19, 31 March 2019 (UTC)[reply]
This has entered my vocabulary. - TheDaveRoss 14:47, 1 April 2019 (UTC)[reply]
It's a construction used to describe characteristics, along the same lines as "she has a lively personality", "the bird has a large beak", "he has a peculiar laugh", etc. This particular variation isn't really countable, though: you would say "they all have a distaste for formality", and never "both have fears of heights". Chuck Entz (talk) 00:02, 1 April 2019 (UTC)[reply]
I have added the noun plural, which was missing. It is almost never heard. I did find a couple of examples in old religious writing: "So often did they provoke God by their distrusts and murmurings..."; "The titles here given them, were enough to shame them out of their distrusts." Equinox 00:09, 1 April 2019 (UTC)[reply]
The three links I put above are to examples of (in these cases non-religious) use.  --Lambiam 14:19, 1 April 2019 (UTC)[reply]

CFI-amendment: excluding typos and scans


In light of this comment, I've decided to draft a new proposal: Wiktionary:Votes/2019-03/Excluding typos and scannos. Comments and improvements are welcome. ChignonПучок 18:25, 29 March 2019 (UTC)[reply]

@Chignon: It seems good to me. You should set a start (and end) date, because there's no point in waiting much longer; you won't get more feedback once this is buried. I'd recommend starting a week from when you posted this, following usual custom. —Μετάknowledgediscuss/deeds 04:33, 31 March 2019 (UTC)[reply]

Should an optional orthographic convention in Pashto to distinguish final /ə/ and /a/ be followed?


The existing declension templates for Pashto (Category:Pashto declension-table templates) use a final he-hamza ۀ to denote final /ə/, which Anne Boyle David's grammar (Descriptive Grammar of Pashto and its Dialects) describes as a "suggestion" (p. 29). This would contrast with plain final he ه, which would denote final /a/. I cannot even find the character (U+06C0) on an OS X Afghan Pashto keyboard. Note that non-final /ə/ is not marked with any diacritic in the Pashto Perso-Arabic script. Should newer entries respect this suggestion or not, and should the existing templates be left as they are or modified to show both variants? Bringing in Vahagn Petrosyan, Qehath, and Adjutor101. Sinonquoi (talk) 12:01, 30 March 2019 (UTC)[reply]

I don't know anything about this subject. --Vahag (talk) 12:21, 30 March 2019 (UTC)[reply]
Similarly, my Pashto isn't good enough that I can make any useful comment. — [ זכריה קהת ] Zack. 02:06, 31 March 2019 (UTC)[reply]
@Sinonquoi, could you give an example? --{{victar|talk}} 02:14, 31 March 2019 (UTC)[reply]
The word ویښتۀ for instance which is often written simply as ویښته. Sinonquoi (talk) 04:48, 31 March 2019 (UTC)[reply]
@Sinonquoi: The Pashto-French dictionary (Dr. M. Akbar Wardag, Qamosona.com) just uses the spelling وېښته (weӽtǝ́) with the normal final ه and transliterates it with an "-ǝ". Pashto Wiktionary also has وېښته. If the spelling with ۀ (ë) is stricter and more reflective of the pronunciation, I guess we can use it as the main dictionary form and make the one with ه (a) a redirect or an alternative form.
(I am just using Qamosona's transcription "weӽtǝ́", perhaps we should transliterate ویښتۀ as "weẍtë", as per WT:PS TR.) --Anatoli T. (обсудить/вклад) 05:43, 31 March 2019 (UTC)[reply]
What about including the /ǝ/ spelling in links and headwords, but stripping it from page titles, like Russian or Latin diacritics? —Suzukaze-c 05:57, 31 March 2019 (UTC)[reply]
(edit conflict) @Suzukaze-c It's an option, especially considering that in Persian, the same letter ۀ is not part of the headword, probably not considered a separate letter when used in ezafe, e.g. ایالات متحده آمریکا (the United States of America), transliterated as "eyâlât-e mottahede-ye âmrikâ" uses "ایالات متحدۀ آمریکا" in the headword. It's a different usage in Persian, though. I'm stretching my knowledge here. All depends on how it's perceived. The Arabic ة used to be perceived as a letter ه with diacritics but now it's a letter. We also write out hamza over and under alif. That's why I asked if it's considered a stricter spelling (or a diacritic). --Anatoli T. (обсудить/вклад) 06:12, 31 March 2019 (UTC)[reply]
@Sinonquoi, I would support moving ویښتۀ to وېښته and placing |head=ویښتۀ in the {{head}}. We can also strip ۀ from links, effectively redirecting them to ه pages. --{{victar|talk}} 06:06, 31 March 2019 (UTC)[reply]
I will support depending on what the letter means to Pashto speakers, a diacritic or stricter spelling. We use "tāʾ marbūṭa" and hamza over and under alif in headwords but write out diacritics, which only serve as a pronunciation guide. --Anatoli T. (обсудить/вклад) 06:12, 31 March 2019 (UTC)[reply]
@Sinonquoi: You haven't expressed your own opinion. This can go either way. Meanwhile, I have made the entry وېښته using "ویښتۀ" in the header. --Anatoli T. (обсудить/вклад) 00:40, 1 April 2019 (UTC)[reply]
I think it's best to use just ه since it's prevalent. ۀ is used very little. That only leaves us with the problem of the current templates which all use it. Sinonquoi (talk) 08:32, 14 April 2019 (UTC)[reply]
@Sinonquoi: I have swapped the entries around. --Anatoli T. (обсудить/вклад) 08:50, 14 April 2019 (UTC)[reply]
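
The suggestion above (show ۀ in headwords and links, but strip it from page titles) can be sketched roughly as follows. This is only an illustration in Python, not the actual Wiktionary module code; the function names `page_title` and `make_link` are hypothetical, and it assumes the optional spelling is always encoded as the single code point U+06C0.

```python
# Hypothetical sketch: keep the optional spelling for display,
# but point links at the page title written with plain final he.
HE_WITH_HAMZA = "\u06C0"  # ۀ ARABIC LETTER HEH WITH YEH ABOVE
PLAIN_HE = "\u0647"       # ه ARABIC LETTER HEH


def page_title(display_form: str) -> str:
    """Return the page title for a display form by replacing ۀ with ه."""
    return display_form.replace(HE_WITH_HAMZA, PLAIN_HE)


def make_link(display_form: str) -> str:
    """Build a piped wikilink: target is the stripped title, label keeps ۀ."""
    return f"[[{page_title(display_form)}|{display_form}]]"
```

With this approach, a form that already uses plain ه is returned unchanged, so the same function can be applied to every link without checking which spelling was typed.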

Audio files for example sentence


Hi, everyone. I am an intermediate English learner. Recently I collected and uploaded some audio files of English speeches to Wikimedia Commons and linked to those audio files from the English and Korean Wiktionaries. The purpose is educational: English learners like me can listen to sentences in which a word is used. So I think it is very useful and meaningful work. But some people don't agree with me. Please see this. [6] I wonder what other people think about it. HappyMidnight (talk) 00:58, 31 March 2019 (UTC)[reply]

The audio recording in question sounds bizarre. We don't normally have audio for quotations, but if we do, it should be good audio, not that dreck. —Μετάknowledgediscuss/deeds 04:30, 31 March 2019 (UTC)[reply]
I did not find those audio files bizarre or dreck, but now I understand. Thank you. HappyMidnight (talk) 05:29, 31 March 2019 (UTC)[reply]
From the VOA website: “Learning English [broadcasts] use a limited vocabulary and are read at a slower pace than VOA's other English broadcasts. Previously known as Special English.”  --Lambiam 10:02, 31 March 2019 (UTC)[reply]