- unicode_map(+In, -Out, +Options) is det
- Perform Unicode normalization operations on In, yielding Out.
Options is a list of operations. Defined operations are:
- stable
- Unicode Versioning Stability has to be respected.
- compat
- Compatibility decomposition (i.e., formatting information is lost).
- compose
- Return a result with composed characters.
- decompose
- Return a result with decomposed characters.
- ignore
- Strip "default ignorable characters"
- rejectna
- Return an error if the input contains unassigned code
points.
- nlf2ls
- Indicates that NLF-sequences (LF, CRLF, CR, NEL)
represent a line break and should be converted to the
Unicode line separator character (LS).
- nlf2ps
- Indicates that NLF-sequences represent a paragraph
break and should be converted to the Unicode paragraph
separator character (PS).
- nlf2lf
- Indicates that the meaning of NLF-sequences is unknown.
- stripcc
- Strips and/or converts control characters.
NLF-sequences are transformed into a space, unless one of
the nlf2ls, nlf2ps or nlf2lf options is given.
HorizontalTab (HT) and FormFeed (FF) are treated as an
NLF-sequence in this case.
All other control characters are simply removed.
- casefold
- Performs Unicode case folding, which makes
case-insensitive string comparison possible.
- charbound
- Inserts a 0xFF byte at the beginning of each sequence that
represents a single grapheme cluster (see UAX#29).
- lump
- Lumps certain characters together
(e.g. HYPHEN U+2010 and MINUS U+2212 to ASCII "-").
(See module header for details.)
If nlf2lf is set, this includes a transformation of
paragraph and line separators to ASCII line-feed (LF).
- stripmark
- Strips all character markings
(non-spacing, spacing and enclosing), i.e. accents.
NOTE: this option works only with compose or decompose.
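
The option list above can be combined freely. The following is a minimal sketch of a few common combinations. It assumes that unicode_map/3 is exported by a module loadable as library(unicode), as in SWI-Prolog's utf8proc-based package; the helper names nfd/2, nfc/2 and fold_key/2 are hypothetical and exist only for illustration.

```prolog
% Assumption: unicode_map/3 is provided by library(unicode).
:- use_module(library(unicode)).

% Canonical decomposition: precomposed characters are split into
% a base character followed by its combining marks.
nfd(In, Out) :-
    unicode_map(In, Out, [stable, decompose]).

% Canonical composition: base character plus combining marks are
% merged into a precomposed character where one exists.
nfc(In, Out) :-
    unicode_map(In, Out, [stable, compose]).

% A comparison key that ignores case and accents. stripmark is
% combined with decompose because it only works together with
% compose or decompose.
fold_key(In, Key) :-
    unicode_map(In, Key, [stable, compat, decompose, stripmark, casefold]).

% True if A and B are equal after case and accent folding.
same_text(A, B) :-
    fold_key(A, Key),
    fold_key(B, Key).
```

For example, the query `nfd('\u00E9', D), atom_codes(D, Codes)` shows the code points produced by decomposing U+00E9, and `same_text('Caf\u00E9', 'CAFE')` compares two spellings under the folding key. Keeping each combination behind a small named predicate avoids repeating option lists at every call site.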