LlamaTokenizer.Create Method

Definition

Creates a LlamaTokenizer, which is based on SentencePieceTokenizer, from the given model stream. The model stream should contain a SentencePiece BPE model that conforms to the specification at https://github.com/google/sentencepiece/blob/master/src/sentencepiece_model.proto.

public static Microsoft.ML.Tokenizers.LlamaTokenizer Create(System.IO.Stream modelStream, bool addBeginOfSentence = true, bool addEndOfSentence = false, System.Collections.Generic.IReadOnlyDictionary<string,int>? specialTokens = default);
static member Create : System.IO.Stream * bool * bool * System.Collections.Generic.IReadOnlyDictionary<string, int> -> Microsoft.ML.Tokenizers.LlamaTokenizer
Public Shared Function Create (modelStream As Stream, Optional addBeginOfSentence As Boolean = true, Optional addEndOfSentence As Boolean = false, Optional specialTokens As IReadOnlyDictionary(Of String, Integer) = Nothing) As LlamaTokenizer

Parameters

modelStream
Stream

The stream containing the SentencePiece BPE model.

addBeginOfSentence
Boolean

Indicates whether to emit the beginning-of-sentence token during encoding. The default is true.

addEndOfSentence
Boolean

Indicates whether to emit the end-of-sentence token during encoding. The default is false.

specialTokens
IReadOnlyDictionary<String,Int32>

The additional tokens to add to the vocabulary, each mapped to its integer ID. The default is null.

Returns

LlamaTokenizer

The tokenizer created from the model stream.

Remarks

When creating the tokenizer, ensure that the model stream (modelStream) is sourced from a trusted provider.
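
The following is a minimal usage sketch, not an official sample. It assumes the Microsoft.ML.Tokenizers package is referenced and that a SentencePiece model file from a trusted source exists at the hypothetical path tokenizer.model. The EncodeToIds and Decode calls come from the base Tokenizer class and are assumed to be available in the installed package version.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.ML.Tokenizers;

// Hypothetical path: point this at a SentencePiece model file from a source you trust.
const string modelPath = "tokenizer.model";

// Open the model file as a stream; it is disposed when the program ends.
using Stream modelStream = File.OpenRead(modelPath);

// Pass the documented defaults explicitly: emit the beginning-of-sentence
// token during encoding, but not the end-of-sentence token.
LlamaTokenizer tokenizer = LlamaTokenizer.Create(
    modelStream,
    addBeginOfSentence: true,
    addEndOfSentence: false);

// Assumption: EncodeToIds and Decode are inherited from the base Tokenizer class.
IReadOnlyList<int> ids = tokenizer.EncodeToIds("Hello, world!");
Console.WriteLine(string.Join(", ", ids));
Console.WriteLine(tokenizer.Decode(ids));
```

The named arguments mirror the parameter list above. specialTokens is omitted here, so it takes its default value.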
