TiktokenTokenizer.CreateForModelAsync Method

Definition

Asynchronously creates a Tiktoken tokenizer for the given model name from a stream containing the vocab file.

public static System.Threading.Tasks.Task<Microsoft.ML.Tokenizers.TiktokenTokenizer> CreateForModelAsync(string modelName, System.IO.Stream vocabStream, System.Collections.Generic.IReadOnlyDictionary<string,int>? extraSpecialTokens = default, int cacheSize = 8192, Microsoft.ML.Tokenizers.Normalizer? normalizer = default, System.Threading.CancellationToken cancellationToken = default);
static member CreateForModelAsync : string * System.IO.Stream * System.Collections.Generic.IReadOnlyDictionary<string, int> * int * Microsoft.ML.Tokenizers.Normalizer * System.Threading.CancellationToken -> System.Threading.Tasks.Task<Microsoft.ML.Tokenizers.TiktokenTokenizer>
Public Shared Function CreateForModelAsync (modelName As String, vocabStream As Stream, Optional extraSpecialTokens As IReadOnlyDictionary(Of String, Integer) = Nothing, Optional cacheSize As Integer = 8192, Optional normalizer As Normalizer = Nothing, Optional cancellationToken As CancellationToken = Nothing) As Task(Of TiktokenTokenizer)

Parameters

modelName
String

The name of the model to create the tokenizer for.

vocabStream
Stream

A stream containing the BPE vocab file.

extraSpecialTokens
IReadOnlyDictionary<String,Int32>

Additional special tokens to use beyond the ones built in for the model.

cacheSize
Int32

The size of the cache to use. The default is 8192.

normalizer
Normalizer

The normalizer applied to the text before tokenization.

cancellationToken
CancellationToken

The CancellationToken used to request cancellation of the operation.

Returns

Task<TiktokenTokenizer>

A task that completes with the created tokenizer.
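
Example

The following is a minimal sketch of how the method might be called, not an official sample. The vocab file path `cl100k_base.tiktoken`, the model name `"gpt-4"`, and the 30-second timeout are assumptions chosen for illustration; substitute a vocab file that matches your model. The optional `extraSpecialTokens`, `cacheSize`, and `normalizer` parameters are left at their defaults.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ML.Tokenizers;

public static class CreateForModelAsyncExample
{
    public static async Task Main()
    {
        // Assumption: a tiktoken vocab file for the chosen model exists at this local path.
        const string vocabPath = "cl100k_base.tiktoken";

        // Cancel the load if it takes longer than 30 seconds (illustrative timeout).
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

        // Open the vocab file as a stream; the caller owns and disposes the stream.
        await using FileStream vocabStream = File.OpenRead(vocabPath);

        // Only the required arguments and the cancellation token are passed;
        // extraSpecialTokens, cacheSize, and normalizer keep their default values.
        TiktokenTokenizer tokenizer = await TiktokenTokenizer.CreateForModelAsync(
            "gpt-4",
            vocabStream,
            cancellationToken: cts.Token);

        // Encode some text to token IDs, then decode the IDs back to text.
        IReadOnlyList<int> ids = tokenizer.EncodeToIds("Hello, world!");
        Console.WriteLine(string.Join(", ", ids));
        Console.WriteLine(tokenizer.Decode(ids));
    }
}
```

Passing `cancellationToken` by name skips the three optional parameters that precede it in the signature.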
