SentencePieceNormalizer Class

Definition

Normalizes a string according to the SentencePiece normalization rules.

public sealed class SentencePieceNormalizer : Microsoft.ML.Tokenizers.Normalizer
type SentencePieceNormalizer = class
    inherit Normalizer
Public NotInheritable Class SentencePieceNormalizer
Inherits Normalizer
Inheritance
Object → Normalizer → SentencePieceNormalizer

Constructors

SentencePieceNormalizer(Boolean, Boolean, Boolean, Boolean, IReadOnlyDictionary<String,Int32>)

Creates a SentencePieceNormalizer object.

Properties

AddDummyPrefix

Gets a value indicating whether the dummy prefix character U+2581 is emitted at the beginning of the sentence token during encoding.

EscapeWhiteSpaces

Gets a value indicating whether white spaces are escaped using the dummy prefix character U+2581.

RemoveExtraWhiteSpaces

Gets a value indicating whether extra white spaces are removed from the original string during normalization.

SpecialTokens

Gets the added (special) tokens.

TreatWhitespaceAsSuffix

Gets a value indicating whether white space is treated as a suffix rather than a prefix.
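
Example

The following is a minimal, self-contained sketch of how the four Boolean options described above could interact. It is not the library's implementation: the class name SentencePieceNormalizationSketch and its Normalize method are hypothetical, and the exact rules (collapsing only U+0020 spaces, falling back to a plain space as the prefix when escaping is disabled, placing the marker at the end when white space is treated as a suffix) are assumptions made for illustration. Special tokens are ignored here.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of Microsoft.ML.Tokenizers.
public static class SentencePieceNormalizationSketch
{
    // U+2581, the dummy prefix character named in the property descriptions.
    public const char DummyPrefix = '\u2581';

    public static string Normalize(
        string input,
        bool removeExtraWhiteSpaces,
        bool addDummyPrefix,
        bool escapeWhiteSpaces,
        bool treatWhitespaceAsSuffix)
    {
        string text = input;

        if (removeExtraWhiteSpaces)
        {
            // Assumption: trim both ends and collapse runs of spaces into one space.
            var sb = new StringBuilder();
            bool previousWasSpace = false;
            foreach (char c in text.Trim(' '))
            {
                if (c == ' ')
                {
                    if (!previousWasSpace) sb.Append(c);
                    previousWasSpace = true;
                }
                else
                {
                    sb.Append(c);
                    previousWasSpace = false;
                }
            }
            text = sb.ToString();
        }

        // Nothing to mark up in an empty string.
        if (text.Length == 0) return text;

        // Assumption: when escaping is off, the prefix is a plain space.
        char marker = escapeWhiteSpaces ? DummyPrefix : ' ';

        if (escapeWhiteSpaces)
        {
            // Replace every remaining space with U+2581.
            text = text.Replace(' ', DummyPrefix);
        }

        if (addDummyPrefix)
        {
            // Assumption: suffix mode appends the marker instead of prepending it.
            text = treatWhitespaceAsSuffix ? text + marker : marker + text;
        }

        return text;
    }
}
```

Under these assumptions, SentencePieceNormalizationSketch.Normalize("  Hello   world ", true, true, true, false) returns "▁Hello▁world", and the same call with treatWhitespaceAsSuffix set to true returns "Hello▁world▁". The actual class may differ in details such as which characters count as white space and how special tokens are handled.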

Methods

Normalize(ReadOnlySpan<Char>)

Normalizes the original text, given as a read-only character span, according to the SentencePiece normalization rules.

Normalize(String)

Normalizes the original string according to the SentencePiece normalization rules.
