Skip to content

cfcs/ocaml-punycode

Repository files navigation

ocaml-punycode Build status

RFC 3492: IDNA Punycode implementation. It deals with full DNS domain names and their individual labels, and as such can be used for encoding/decoding these, but it does not expose a generic Bootstring or Punycode implementation.

Domain name / label validation is provided by Hannes Mehnert's domain-name library

String handling and Unicode support comes courtesy of David Bünzli's astring and uutf libraries.

Please report any issues or requests for additional features on the issue tracker.

Interface:

(** {1 Error messages} *)


type illegal_ascii_label =
  | Illegal_label_size of string
  | Label_contains_illegal_character of string
  | Label_starts_with_illegal_character of char
  | Label_ends_with_hyphen of string

type punycode_decode_error =
  | Overflow_error
  | Invalid_domain_name of string
  | Illegal_label of illegal_ascii_label

type punycode_encode_error =
  | Malformed_utf8_input of string
  | Overflow
  | Invalid_domain_name of string
  | Illegal_label of illegal_ascii_label

val msg_of_encode_error : punycode_encode_error -> [> `Msg of string]
(** [msg_of_encode_error err] is [err] transcribed to a [Rresult.R.msg],
    making it easier to display the error and use it with the Rresult monad.
    Can be used like this:
    [R.reword_error msg_of_encode_error (Punycode.to_ascii "example.com")]
*)

val msg_of_decode_error : punycode_decode_error -> [> `Msg of string]
(** [msg_of_decode_error err] is [err] transcribed to a [Rresult.R.msg],
    making it easier to display the error and use it with the Rresult monad.
    Can be used like this:
    [R.reword_error msg_of_decode_error (Punycode.to_utf8 "example.com")]
*)


(** {1:unicode2punycode Unicode -> Punycode}*)


val to_encoded_domain_name : string ->
  (Domain_name.t, punycode_encode_error) Rresult.result
(** [to_encoded_domain_name domain_name] is the ASCII-only Punycode
    representation of [domain_name] with each label prefixed by ["xn--"],
    and where [domain_name] is a UTF-8-encoded DNS domain name (or label).
    An error is returned if the input is not valid UTF-8, or if the resulting
    encoded string would be an invalid domain name, for example if:
    - a non-Punycode label starts with a hyphen
    - a label is >= 64 ASCII characters or has a zero length
    - the total length is >= 255 ASCII characters including trailing ['.']
      (if not present it will be assumed).
    See {Domain_name.of_strings} for more information regarding the
    produced value and its validation.
*)

val to_ascii : string -> (string, punycode_encode_error) Rresult.result
(** [to_ascii domain_name] is the ASCII-only Punycode
    representation of [domain_name] with each label prefixed by ["xn--"],
    joined by ['.']
    See {!to_encoded_domain_name} for more information.
*)


(** {1:punycode2unicode Punycode -> Unicode} *)


val of_domain_name : Domain_name.t ->
  (Uchar.t list list, punycode_decode_error) Rresult.result
(** [of_domain_name domain] is [domain] decoded to a list of [Uchar.t] elements
    for each label in the domain name with each label prefixed by ["xn--"]
    decoded using the Punycode algorithm.*)

val to_utf8_list : string -> (string list, punycode_decode_error) Rresult.result
(** [to_utf8_list domain] is the UTF-8 representation of [domain]
    where each label prefixed by ["xn--"] is decoded using the
    Punycode algorithm.
    The implementation strives to only accept valid domain names,
    see {!to_ascii}.
    If [domain] is a FQDN (has a trailing ['.']), this is ignored.
*)

val to_utf8 : string -> (string, punycode_decode_error) Rresult.result
(** [to_utf8 domain] is {!to_utf8_list} concatenated with dots.
    Contrary to {!to_utf8_list}, If [domain] is a FQDN (has a trailing ['.']),
    the decoded string will also have a trailing ['.'].
*)

Examples

utop # Punycode.to_ascii "☫.example.ir";;
- : (string, Punycode.punycode_encode_error) result = Ok "xn--s4h.example.ir"

utop # Punycode.to_utf8 "xn--s4h.example.ir";;
- : (string, Punycode.punycode_decode_error) result =
Result.Ok "☫.example.ir"

utop # Punycode.to_ascii "n☢clear.disarmament.☮.example.com";;
- : (string, Punycode.punycode_encode_error) result =
Ok "xn--nclear-3b9c.disarmament.xn--v4h.example.com"

utop # Punycode.to_utf8 "xn--nclear-3b9c.disarmament.xn--v4h.example.com";;
- : (string, Punycode.punycode_decode_error) result =
Result.Ok "n☢clear.disarmament.☮.example.com"

Run tests

This project organizes its tests using Alcotest, and this dune alias runs them:

dune runtest --force --no-buffer

Using --no-buffer gets you colors in output.

About

RFC 3492: IDNA Punycode implementation

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy