SentencePieceTokenizer.Create Method

Definition

Creates an instance of SentencePieceTokenizer. The model stream should contain a SentencePiece model as specified in the following documentation: https://github.com/google/sentencepiece/blob/master/src/sentencepiece_model.proto.

public static Microsoft.ML.Tokenizers.SentencePieceTokenizer Create(System.IO.Stream modelStream, bool addBeginOfSentence = true, bool addEndOfSentence = false, System.Collections.Generic.IReadOnlyDictionary<string,int>? specialTokens = default);
static member Create : System.IO.Stream * bool * bool * System.Collections.Generic.IReadOnlyDictionary<string, int> -> Microsoft.ML.Tokenizers.SentencePieceTokenizer
Public Shared Function Create (modelStream As Stream, Optional addBeginOfSentence As Boolean = true, Optional addEndOfSentence As Boolean = false, Optional specialTokens As IReadOnlyDictionary(Of String, Integer) = Nothing) As SentencePieceTokenizer

Parameters

modelStream
Stream

The stream containing the SentencePiece BPE or Unigram model.

addBeginOfSentence
Boolean

Indicates whether to emit the beginning-of-sentence token during encoding.

addEndOfSentence
Boolean

Indicates whether to emit the end-of-sentence token during encoding.

specialTokens
IReadOnlyDictionary<String,Int32>

The additional tokens to add to the vocabulary.

Returns

SentencePieceTokenizer

Remarks

When creating the tokenizer, ensure that the model stream comes from a trusted source.
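
The following is a minimal sketch of how Create might be called, not an official sample. It assumes the Microsoft.ML.Tokenizers package is referenced and that a SentencePiece model file from a trusted source is available locally. The file name tokenizer.model and the input text are hypothetical placeholders. The calls to EncodeToIds and Decode are inherited from the Tokenizer base class.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.ML.Tokenizers;

// Hypothetical path: replace with a SentencePiece model obtained from a trusted source.
const string modelPath = "tokenizer.model";

// Open the model file as a stream; the using declaration disposes it at the end of scope.
using Stream modelStream = File.OpenRead(modelPath);

// Create the tokenizer, spelling out the default values for clarity:
// emit the beginning-of-sentence token, do not emit the end-of-sentence token.
SentencePieceTokenizer tokenizer = SentencePieceTokenizer.Create(
    modelStream,
    addBeginOfSentence: true,
    addEndOfSentence: false);

// Encode a sample string to token IDs. The actual IDs depend on the loaded model.
IReadOnlyList<int> ids = tokenizer.EncodeToIds("Hello, world!");
Console.WriteLine(string.Join(", ", ids));

// Decode the IDs back to text. Depending on the model's normalization,
// the result may not match the input character for character.
string decoded = tokenizer.Decode(ids);
Console.WriteLine(decoded);
```

Passing addBeginOfSentence and addEndOfSentence by name keeps the call readable, since two adjacent Boolean arguments are easy to swap by accident.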