LlamaTokenizer.Create Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Creates a LlamaTokenizer, which is based on SentencePieceTokenizer, from the given model stream. The model stream should contain a SentencePiece BPE model that conforms to the https://github.com/google/sentencepiece/blob/master/src/sentencepiece_model.proto specification.
public static Microsoft.ML.Tokenizers.LlamaTokenizer Create(System.IO.Stream modelStream, bool addBeginOfSentence = true, bool addEndOfSentence = false, System.Collections.Generic.IReadOnlyDictionary<string,int>? specialTokens = default);
static member Create : System.IO.Stream * bool * bool * System.Collections.Generic.IReadOnlyDictionary<string, int> -> Microsoft.ML.Tokenizers.LlamaTokenizer
Public Shared Function Create (modelStream As Stream, Optional addBeginOfSentence As Boolean = true, Optional addEndOfSentence As Boolean = false, Optional specialTokens As IReadOnlyDictionary(Of String, Integer) = Nothing) As LlamaTokenizer
Parameters
- modelStream
- Stream
The stream containing the SentencePiece BPE model.
- addBeginOfSentence
- Boolean
Indicates whether to emit the beginning-of-sentence token during encoding. The default is true.
- addEndOfSentence
- Boolean
Indicates whether to emit the end-of-sentence token during encoding. The default is false.
- specialTokens
- IReadOnlyDictionary<String,Int32>
Additional tokens to add to the vocabulary, with each token mapped to its ID. This parameter is optional.
Returns
- LlamaTokenizer
The LlamaTokenizer created from the given model stream.
Remarks
When creating the tokenizer, ensure that the model stream is sourced from a trusted provider.
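The following is a minimal sketch of one way to call Create, not an official sample. It assumes a local SentencePiece model file and a SHA-256 hash published by the model provider, both passed as command-line arguments; these inputs are hypothetical. The hash comparison is only one possible integrity check and does not replace obtaining the model from a trusted source. The sketch also assumes that the EncodeToIds and Decode members inherited from Tokenizer are available in the package version you are using.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using Microsoft.ML.Tokenizers;

// Hypothetical inputs: the model path and the expected SHA-256 hash (hex string)
// published by the model provider.
if (args.Length < 2)
{
    Console.Error.WriteLine("usage: <modelPath> <expectedSha256Hex>");
    return 1;
}

string modelPath = args[0];
string expectedSha256 = args[1];

using FileStream modelStream = File.OpenRead(modelPath);

// Assumed integrity check: hash the whole file and compare it with the published value.
using (SHA256 sha256 = SHA256.Create())
{
    string actualSha256 = Convert.ToHexString(sha256.ComputeHash(modelStream));
    if (!actualSha256.Equals(expectedSha256, StringComparison.OrdinalIgnoreCase))
    {
        Console.Error.WriteLine("Model hash mismatch; the model was not loaded.");
        return 1;
    }
}

// Hashing consumed the stream, so rewind before handing it to Create.
modelStream.Position = 0;

// The named arguments repeat the defaults from the signature to make them explicit.
LlamaTokenizer tokenizer = LlamaTokenizer.Create(
    modelStream,
    addBeginOfSentence: true,
    addEndOfSentence: false);

// Encode a string to token IDs, then decode the IDs back to text.
IReadOnlyList<int> ids = tokenizer.EncodeToIds("Hello, world!");
Console.WriteLine(string.Join(", ", ids));
Console.WriteLine(tokenizer.Decode(ids));
return 0;
```

The token IDs printed depend on the model file, so no expected output is shown.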