GFM autolink extension (`www.`, `https?://` parts): links don’t work when after bracket · Issue #278 · github/cmark-gfm · GitHub
Open
@wooorm

Problem

Consider:

x www.example.com

[ www.example.com

![ www.example.com

[^ www.example.com

[] www.example.com

---

x https://example.com

[ https://example.com

![ https://example.com

[^ https://example.com

[] https://example.com

---

x contact@example.com

[ contact@example.com

![ contact@example.com

[^ contact@example.com

[] contact@example.com

This is currently rendered as follows, with the www. and https:// forms left unlinked after a bracket and the emails linked:

x www.example.com

[ www.example.com

![ www.example.com

[^ www.example.com

[] www.example.com


x https://example.com

[ https://example.com

![ https://example.com

[^ https://example.com

[] https://example.com


x contact@example.com

[ contact@example.com

![ contact@example.com

[^ contact@example.com

[] contact@example.com


The reason is that, for performance, GitHub parses its autolink extension with two algorithms: www. and https?:// are handled while parsing, whereas emails are handled in postprocessing.
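To make the split concrete, here is a minimal Python sketch of the two passes. It is not cmark-gfm's code: the patterns, the function names, and the bracket counter are all simplifying assumptions, chosen only to show why a parse-time matcher that gives up inside an open bracket behaves differently from a postprocessing pass that never looks at brackets.

```python
import re

# Hypothetical, simplified patterns; the real extension's rules are stricter.
WWW_OR_HTTP = re.compile(r"(?:https?://|www\.)[^\s<\]]+")
EMAIL = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def parse_time_links(line):
    """Pass 1 (while parsing): collect www./http(s) links, skipping any
    candidate that starts while a bracket is still open."""
    links, open_brackets, i = [], 0, 0
    while i < len(line):
        ch = line[i]
        if ch == "[":
            open_brackets += 1
        elif ch == "]" and open_brackets:
            # Assumption: a closing bracket ends the bracket context.
            open_brackets -= 1
        else:
            m = WWW_OR_HTTP.match(line, i)
            if m and (i == 0 or line[i - 1].isspace()):
                # Assumed behaviour: the matcher exits when inside a bracket.
                if open_brackets == 0:
                    links.append(m.group())
                i = m.end()
                continue
        i += 1
    return links


def postprocess_links(line):
    """Pass 2 (after parsing): emails are found in plain text, with no
    knowledge of brackets at all."""
    return EMAIL.findall(line)
```

Under these assumptions, `parse_time_links("[ www.example.com")` finds nothing while `postprocess_links("[ contact@example.com")` still finds the address, which matches the asymmetry described above.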

One solution

One solution is to do everything in postprocessing (just like the new mailto and xmpp protocols).
However, postprocessing has a problem: it does not take character escapes or character references into account:

contact\@example.com

contact&#64;example.com

contact&commat;example.com

Yields:

contact@example.com

contact@example.com

contact@example.com


These are examples of someone trying to prevent an email from being linked, using methods that work in the rest of markdown, but GFM ignores them.
A similar problem exists for math on GitHub, which some users are unhappy about and which leads to weird, unintuitive ways of escaping it.
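The escape problem can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: `decode_inline`, `naive_emails`, `escape_aware_emails`, and the email pattern are hypothetical names and rules, not part of cmark-gfm. The first matcher works on fully decoded text, as a postprocessing pass would. The second shows the extra bookkeeping such a pass would need in order to respect escapes and character references.

```python
import html
import re

# Hypothetical, simplified email pattern.
EMAIL = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
# A backslash escape of ASCII punctuation, or a character reference.
TOKEN = re.compile(r"\\([!-/:-@\[-`{-~])|(&#?\w+;)")


def decode_inline(source):
    """Return (text, protected): the decoded text, plus the indices of
    characters that came from an escape or a character reference."""
    out, protected, pos, length = [], set(), 0, 0
    for m in TOKEN.finditer(source):
        literal = source[pos:m.start()]
        out.append(literal)
        length += len(literal)
        if m.group(1) is not None:
            decoded = m.group(1)
        else:
            decoded = html.unescape(m.group(2))
        protected.update(range(length, length + len(decoded)))
        out.append(decoded)
        length += len(decoded)
        pos = m.end()
    out.append(source[pos:])
    return "".join(out), protected


def naive_emails(source):
    """Postprocessing as described: match on decoded text only."""
    text, _ = decode_inline(source)
    return EMAIL.findall(text)


def escape_aware_emails(source):
    """Variant that refuses matches touching an escaped character."""
    text, protected = decode_inline(source)
    return [m.group() for m in EMAIL.finditer(text)
            if not protected & set(range(m.start(), m.end()))]
```

With this sketch, `naive_emails` returns the address for `contact\@example.com` and `contact&#64;example.com` just as it does for the plain form, while `escape_aware_emails` returns it only for the plain form.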

A better solution

I think it should be possible to either:

  • use a different node type than CMARK_NODE_LINK in extensions/autolink.c
  • add a field to CMARK_NODE_LINK, to differentiate extension links from “normal” links

Then, update the extension to not exit when in a bracket.

Finally, when compiling, output just the URL (not wrapped in a link) for an extension autolink that is already inside a link.
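The steps above could look roughly like the following Python sketch. It models the second option (a flag on the link node) with a toy tree. The `Text` and `Link` classes, the `from_autolink_extension` field, and `render` are hypothetical stand-ins for illustration, not cmark-gfm's node types or renderer.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Text:
    value: str


@dataclass
class Link:
    url: str
    children: list = field(default_factory=list)
    # Hypothetical flag standing in for "a field on CMARK_NODE_LINK".
    from_autolink_extension: bool = False


def render(node, inside_link=False):
    """Render a node to HTML; an extension autolink nested in another
    link is emitted as bare text instead of a nested <a>."""
    if isinstance(node, Text):
        return escape(node.value)
    inner = "".join(render(child, inside_link=True) for child in node.children)
    if node.from_autolink_extension and inside_link:
        return inner
    return f'<a href="{escape(node.url)}">{inner}</a>'
```

A standalone extension autolink still renders as a normal anchor. The same node placed inside a regular link contributes only its text, so no nested anchors are produced.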

This solution does come with an additional problem, but it can be mitigated.
Because URL parsing is so loose, it matches ](xxx). For example:

a www.example.com](www.example.org) b

[ a www.example.com](www.example.org) b

Yields:

a www.example.com](www.example.org) b

a www.example.com b

It can be mitigated by stopping at a ] when the next character is ( (or [).
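A minimal Python sketch of that stopping rule follows. The scanner is deliberately loose and is not the extension's real URL matcher: `autolink_end` and its `stop_at_link_close` parameter are hypothetical, and trailing-punctuation trimming is not modelled.

```python
def autolink_end(text, start, stop_at_link_close=True):
    """Return the index just past a loosely matched URL beginning at
    `start`. The URL runs until whitespace or `<`."""
    i = start
    while i < len(text) and not text[i].isspace() and text[i] != "<":
        # Proposed mitigation: a `]` directly followed by `(` or `[`
        # looks like the end of a link's text, so stop before it.
        if (stop_at_link_close and text[i] == "]"
                and i + 1 < len(text) and text[i + 1] in "(["):
            break
        i += 1
    return i
```

For the input `a www.example.com](www.example.org) b`, scanning from the `w` with the mitigation enabled yields `www.example.com`. With it disabled, the match swallows `](www.example.org)` as well.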
