The process of splitting text into pieces called “tokens”.
A tokenizer is software that takes a string as input and returns the sequence of tokens extracted from that string. Tokens are typically words, but they do not have to be: a token can be a punctuation mark, a word, or in some cases a combination of words.
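As a rough illustration of the definition above, here is a minimal sketch of a tokenizer in Python. The function name `tokenize` and the pattern `TOKEN_PATTERN` are hypothetical names chosen for this example, and the splitting rule (runs of word characters become one token, each punctuation mark becomes its own token, whitespace is discarded) is only one of many possible choices, not a standard.

```python
import re
from typing import List

# Assumed rule for this sketch: a token is either a run of word characters
# or a single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Return the tokens found in `text`, in order of appearance."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

This sketch never produces a token made of several words. A tokenizer that should treat a phrase such as a multi-word name as a single token would need an extra rule or a lookup list on top of this pattern.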