Trait textwrap::word_separators::WordSeparator
Describes where words occur in a line of text.
The simplest approach is to say that words are separated by one or
more ASCII spaces (' '). This works for Western languages
without emojis. A more complex approach is to use the Unicode line
breaking algorithm, which finds break points in non-ASCII text.
The line breaks occur between words. See the WordSplitter trait for
options on how to handle hyphenation of individual words.
Examples
    use textwrap::core::Word;
    use textwrap::word_separators::{WordSeparator, AsciiSpace};

    let words = AsciiSpace.find_words("Hello World!").collect::<Vec<_>>();
    assert_eq!(words, vec![Word::from("Hello "), Word::from("World!")]);
Required methods
fn find_words<'a>(
&self,
line: &'a str
) -> Box<dyn Iterator<Item = Word<'a>> + 'a>
Find all words in line.
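As a rough illustration of what implementing this method can look like, here is a minimal sketch that does not depend on textwrap itself. The `Word` struct and `WordSeparator` trait below are simplified local stand-ins that only mirror the shape of the signature above, and `SlashSeparator` is a hypothetical separator invented for this example.

```rust
// Simplified stand-in for textwrap::core::Word (assumption: the real
// type carries more information than a single string slice).
#[derive(Debug, PartialEq)]
struct Word<'a> {
    text: &'a str,
}

// Local stand-in trait with the same method shape as the
// `find_words` signature shown above.
trait WordSeparator {
    fn find_words<'a>(&self, line: &'a str) -> Box<dyn Iterator<Item = Word<'a>> + 'a>;
}

// Hypothetical separator: a word ends after every '/' character.
struct SlashSeparator;

impl WordSeparator for SlashSeparator {
    fn find_words<'a>(&self, line: &'a str) -> Box<dyn Iterator<Item = Word<'a>> + 'a> {
        // `split_inclusive` keeps the '/' attached to the preceding piece,
        // so concatenating all words gives back the original line.
        Box::new(line.split_inclusive('/').map(|text| Word { text }))
    }
}
```

Because the returned iterator borrows only from `line` and not from `self`, it can be boxed with the `'a` lifetime exactly as the signature requires.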
Implementations on Foreign Types
impl WordSeparator for Box<dyn WordSeparator>
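An implementation on a boxed trait object is what makes it possible to pick a separator at runtime and still pass it to generic code. The sketch below shows the general pattern with a local stand-in trait, not textwrap's own code; `Spaces`, `Slashes`, `pick_separator` and `count_words` are hypothetical names, and the forwarding body is an assumption about how such an impl is typically written.

```rust
// Local stand-in trait; words are plain string slices to keep the sketch short.
trait WordSeparator {
    fn find_words<'a>(&self, line: &'a str) -> Box<dyn Iterator<Item = &'a str> + 'a>;
}

// Two hypothetical separators.
struct Spaces;
struct Slashes;

impl WordSeparator for Spaces {
    fn find_words<'a>(&self, line: &'a str) -> Box<dyn Iterator<Item = &'a str> + 'a> {
        Box::new(line.split_inclusive(' '))
    }
}

impl WordSeparator for Slashes {
    fn find_words<'a>(&self, line: &'a str) -> Box<dyn Iterator<Item = &'a str> + 'a> {
        Box::new(line.split_inclusive('/'))
    }
}

// Forwarding impl: the box delegates to whatever separator it contains.
impl WordSeparator for Box<dyn WordSeparator> {
    fn find_words<'a>(&self, line: &'a str) -> Box<dyn Iterator<Item = &'a str> + 'a> {
        (**self).find_words(line)
    }
}

// Choose a separator at runtime.
fn pick_separator(by_slash: bool) -> Box<dyn WordSeparator> {
    if by_slash {
        Box::new(Slashes)
    } else {
        Box::new(Spaces)
    }
}

// Generic code that accepts any separator, including a boxed one.
fn count_words<S: WordSeparator>(separator: &S, line: &str) -> usize {
    separator.find_words(line).count()
}
```

Without the forwarding impl, `count_words(&pick_separator(true), ...)` would not compile, since `Box<dyn WordSeparator>` would not itself satisfy the `S: WordSeparator` bound.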
Implementors
impl WordSeparator for AsciiSpace
Split line into words separated by regions of ' ' characters.
Examples
    use textwrap::core::Word;
    use textwrap::word_separators::{AsciiSpace, WordSeparator};

    let words = AsciiSpace.find_words("Hello World!").collect::<Vec<_>>();
    assert_eq!(words, vec![Word::from("Hello "), Word::from("World!")]);
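To make the idea of "regions of ' ' characters" concrete, the following is a minimal standard-library sketch of that splitting rule. It is not textwrap's implementation: `split_on_space_regions` is a hypothetical helper that returns plain string slices, with each run of spaces kept attached to the word before it.

```rust
/// Split `line` into pieces, each consisting of a word followed by the
/// run of ASCII spaces that comes after it (if any).
fn split_on_space_regions(line: &str) -> Vec<&str> {
    let mut words = Vec::new();
    let mut start = 0; // byte offset where the current piece begins
    let mut in_spaces = false; // true while scanning a run of ' '

    for (idx, ch) in line.char_indices() {
        if ch == ' ' {
            in_spaces = true;
        } else if in_spaces {
            // First non-space after a run of spaces: the previous piece
            // (word plus trailing spaces) ends right before this character.
            words.push(&line[start..idx]);
            start = idx;
            in_spaces = false;
        }
    }

    // Whatever remains after the last boundary is the final piece.
    if start < line.len() {
        words.push(&line[start..]);
    }
    words
}
```

Only U+0020 is treated as a separator here, so tabs and non-breaking spaces stay inside words, which matches the "ASCII spaces" wording above.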
impl WordSeparator for UnicodeBreakProperties
Split line into words using Unicode break properties.
This word separator uses the Unicode line breaking algorithm
described in Unicode Standard Annex #14 to find legal places
to break lines. There is a small difference in that the U+002D
(Hyphen-Minus) and U+00AD (Soft Hyphen) don’t create a line break:
to allow a line break at a hyphen, use the HyphenSplitter. Soft
hyphens are not currently supported.
Examples
Unlike AsciiSpace, the Unicode line breaking algorithm will
find line break opportunities between some characters with no
intervening whitespace:
    #[cfg(feature = "unicode-linebreak")] {
        use textwrap::word_separators::{WordSeparator, UnicodeBreakProperties};
        use textwrap::core::Word;

        assert_eq!(UnicodeBreakProperties.find_words("Emojis: 😂😍").collect::<Vec<_>>(),
                   vec![Word::from("Emojis: "), Word::from("😂"), Word::from("😍")]);

        assert_eq!(UnicodeBreakProperties.find_words("CJK: 你好").collect::<Vec<_>>(),
                   vec![Word::from("CJK: "), Word::from("你"), Word::from("好")]);
    }
To override the defaults and keep characters together, insert a U+2060 (Word Joiner) character between them:
    #[cfg(feature = "unicode-linebreak")] {
        use textwrap::word_separators::{UnicodeBreakProperties, WordSeparator};
        use textwrap::core::Word;

        assert_eq!(UnicodeBreakProperties.find_words("Emojis: 😂\u{2060}😍").collect::<Vec<_>>(),
                   vec![Word::from("Emojis: "), Word::from("😂\u{2060}😍")]);
    }
The Unicode line breaking algorithm will also automatically suppress line breaks around certain punctuation characters:
    #[cfg(feature = "unicode-linebreak")] {
        use textwrap::word_separators::{UnicodeBreakProperties, WordSeparator};
        use textwrap::core::Word;

        assert_eq!(UnicodeBreakProperties.find_words("[ foo ] bar !").collect::<Vec<_>>(),
                   vec![Word::from("[ foo ] "), Word::from("bar !")]);
    }