TiktokenTokenizer.CreateForModelAsync Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Asynchronously creates a Tiktoken tokenizer for the given model name, reading the vocabulary from the supplied stream.
public static System.Threading.Tasks.Task<Microsoft.ML.Tokenizers.TiktokenTokenizer> CreateForModelAsync(string modelName, System.IO.Stream vocabStream, System.Collections.Generic.IReadOnlyDictionary<string,int>? extraSpecialTokens = default, int cacheSize = 8192, Microsoft.ML.Tokenizers.Normalizer? normalizer = default, System.Threading.CancellationToken cancellationToken = default);
static member CreateForModelAsync : string * System.IO.Stream * System.Collections.Generic.IReadOnlyDictionary<string, int> * int * Microsoft.ML.Tokenizers.Normalizer * System.Threading.CancellationToken -> System.Threading.Tasks.Task<Microsoft.ML.Tokenizers.TiktokenTokenizer>
Public Shared Function CreateForModelAsync (modelName As String, vocabStream As Stream, Optional extraSpecialTokens As IReadOnlyDictionary(Of String, Integer) = Nothing, Optional cacheSize As Integer = 8192, Optional normalizer As Normalizer = Nothing, Optional cancellationToken As CancellationToken = Nothing) As Task(Of TiktokenTokenizer)
Parameters
- modelName
- String
The name of the model for which to create the tokenizer.
- vocabStream
- Stream
A stream containing the BPE vocabulary file.
- extraSpecialTokens
- IReadOnlyDictionary<String,Int32>
Extra special tokens to use in addition to the model's built-in ones.
- cacheSize
- Int32
The size of the cache to use.
- normalizer
- Normalizer
The normalizer to apply to the text before tokenization.
- cancellationToken
- CancellationToken
A CancellationToken used to request cancellation of the operation.
Returns
A task that represents the asynchronous operation. The task result is the created TiktokenTokenizer.
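Examples
The following is a minimal sketch of one way this method might be called, not a definitive usage pattern. It assumes a local vocabulary file at the hypothetical path `cl100k_base.tiktoken` that matches the chosen model, uses `"gpt-4"` as an illustrative model name, and assumes the `EncodeToIds(string)` method on the returned tokenizer. Because this API is prerelease, member names and signatures may differ in the package version you have installed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ML.Tokenizers;

class Example
{
    static async Task Main()
    {
        // Hypothetical path: a local copy of the BPE vocab file for the model.
        const string vocabPath = "cl100k_base.tiktoken";

        // Cancel the load if it takes longer than 30 seconds (arbitrary choice).
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

        // Open the vocab file as a stream; it is disposed when Main exits.
        await using FileStream vocabStream = File.OpenRead(vocabPath);

        // extraSpecialTokens, cacheSize, and normalizer keep their defaults.
        TiktokenTokenizer tokenizer = await TiktokenTokenizer.CreateForModelAsync(
            "gpt-4",
            vocabStream,
            cancellationToken: cts.Token);

        // Assumed API: encode a string to token IDs and report how many there are.
        IReadOnlyList<int> ids = tokenizer.EncodeToIds("Hello, world!");
        Console.WriteLine($"Token count: {ids.Count}");
    }
}
```

The optional parameters are skipped here by passing `cancellationToken` as a named argument. To register additional special tokens, pass an `IReadOnlyDictionary<string, int>` as `extraSpecialTokens`.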