It would be nice to be able to segment character iterators, especially for interoperability with the unicode-normalization crate. This could provide a solution to #7 when/if io::Chars stabilizes. In particular, I'd like to write a tokenizer like this:
let input: BufRead = my_input();
let tokens = input.chars().nfkc().split_word_bounds();
One issue I see is that most of the public structs provide an as_str method that returns "the underlying data (the part yet to be iterated) as a slice of the original string". This obviously won't work with streaming types, since a character iterator has no original string to borrow a slice from.
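To make the as_str concern concrete, here is a rough sketch of what a streaming adapter could look like. Everything in it is hypothetical: `CharSegments`, `Class`, `classify`, and `split_coarse_bounds` are made-up names, not part of this crate, and the boundary rule is a crude stand-in (runs of alphanumerics, runs of whitespace, everything else one char at a time) rather than the real UAX #29 word-boundary rules. The point is only the shape of the API: segments come back as owned `String`s, and there is nothing an as_str method could return.

```rust
use std::iter::Peekable;

/// Coarse character classes used by this sketch.
/// This is a simplification, NOT the UAX #29 word-break property.
#[derive(Clone, Copy, PartialEq)]
enum Class {
    Word,
    Space,
    Other,
}

fn classify(c: char) -> Class {
    if c.is_alphanumeric() {
        Class::Word
    } else if c.is_whitespace() {
        Class::Space
    } else {
        Class::Other
    }
}

/// Hypothetical streaming segmenter over any `Iterator<Item = char>`.
/// It only ever looks one char ahead, so it never needs the whole input.
struct CharSegments<I: Iterator<Item = char>> {
    chars: Peekable<I>,
}

impl<I: Iterator<Item = char>> Iterator for CharSegments<I> {
    // Segments are owned, because there is no backing string to slice.
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let first = self.chars.next()?;
        let class = classify(first);
        let mut segment = String::new();
        segment.push(first);

        // Extend runs of word or space chars; `Other` chars are emitted singly.
        if class != Class::Other {
            while let Some(&c) = self.chars.peek() {
                if classify(c) != class {
                    break;
                }
                segment.push(c);
                self.chars.next();
            }
        }
        Some(segment)
    }
}

/// Hypothetical entry point, loosely mirroring `split_word_bounds`.
fn split_coarse_bounds<I: Iterator<Item = char>>(chars: I) -> CharSegments<I> {
    CharSegments {
        chars: chars.peekable(),
    }
}
```

Because the adapter accepts any char iterator, it could sit after a normalization adapter in a chain like the one above. The cost is one allocation per segment, which the existing &str-based iterators avoid by returning slices.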