Character filters are used to ``tidy up'' a string before it is tokenized.
For instance, if our text is in HTML format, it will contain HTML tags like
`<p>` or `<div>` that we don’t want to be indexed. We can use the
{ref}/analysis-htmlstrip-charfilter.html[`html_strip` character filter]
to remove all HTML tags and to convert HTML entities like `&Aacute;` into the
corresponding Unicode character `Á`.
An analyzer may have zero or more character filters.
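
To make the idea of ``tidying up'' concrete, here is a minimal Python sketch of the kind of transformation a character filter such as `html_strip` performs: it removes tags, then decodes entities. This is an illustration only, not how Elasticsearch implements the filter. The function name `tidy_up` and the simple regular expression are assumptions made for the example, and the real filter handles cases (comments, malformed markup, line breaks for block tags) that this sketch ignores.

```python
import html
import re

# Naive pattern for an HTML tag: "<", anything that is not ">", then ">".
# Assumption for illustration; real HTML parsing is more involved.
TAG_PATTERN = re.compile(r"<[^>]+>")


def tidy_up(text: str) -> str:
    """Hypothetical stand-in for a character filter: strip tags, decode entities."""
    # Remove tags first, so that escaped markup such as "&lt;b&gt;" survives
    # as literal text instead of being decoded and then stripped.
    without_tags = TAG_PATTERN.sub("", text)
    # Convert entities like "&Aacute;" into the matching Unicode character.
    return html.unescape(without_tags)


if __name__ == "__main__":
    print(tidy_up("<p>&Aacute;gua <b>fresca</b></p>"))
```

The cleaned string is what the tokenizer would then receive. In this sketch the tags are simply dropped, so the tokenizer never sees `p` or `b` as terms.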