

TiktokenTokenizer.CreateAsync Method

Definition

Overloads

CreateAsync(Stream, PreTokenizer, Normalizer, IReadOnlyDictionary<String,Int32>, Int32, CancellationToken)

Source:
TiktokenTokenizer.cs

Asynchronously creates a new TiktokenTokenizer object from a vocabulary stream.

public static System.Threading.Tasks.Task<Microsoft.ML.Tokenizers.TiktokenTokenizer> CreateAsync(System.IO.Stream vocabStream, Microsoft.ML.Tokenizers.PreTokenizer? preTokenizer, Microsoft.ML.Tokenizers.Normalizer? normalizer, System.Collections.Generic.IReadOnlyDictionary<string,int>? specialTokens = default, int cacheSize = 8192, System.Threading.CancellationToken cancellationToken = default);
static member CreateAsync : System.IO.Stream * Microsoft.ML.Tokenizers.PreTokenizer * Microsoft.ML.Tokenizers.Normalizer * System.Collections.Generic.IReadOnlyDictionary<string, int> * int * System.Threading.CancellationToken -> System.Threading.Tasks.Task<Microsoft.ML.Tokenizers.TiktokenTokenizer>
Public Shared Function CreateAsync (vocabStream As Stream, preTokenizer As PreTokenizer, normalizer As Normalizer, Optional specialTokens As IReadOnlyDictionary(Of String, Integer) = Nothing, Optional cacheSize As Integer = 8192, Optional cancellationToken As CancellationToken = Nothing) As Task(Of TiktokenTokenizer)

Parameters

vocabStream
Stream

A stream containing the BPE vocabulary file.

preTokenizer
PreTokenizer

The pre-tokenizer to use.

normalizer
Normalizer

The normalizer to use.

specialTokens
IReadOnlyDictionary<String,Int32>

A dictionary that maps special tokens to their IDs.

cacheSize
Int32

The size of the cache to use. Defaults to 8192.

cancellationToken
CancellationToken

CancellationToken used to request cancellation of the operation.

Returns

A task that represents the asynchronous operation. The task result is the created TiktokenTokenizer object.

Remarks

When creating the tokenizer, ensure that the vocabulary stream is sourced from a trusted provider.
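The following is a minimal sketch of calling this overload, not an official sample. It assumes a local vocabulary file at the hypothetical path `vocab.tiktoken`, passes `null` for the nullable `preTokenizer`, `normalizer`, and `specialTokens` parameters, and uses a 30-second timeout chosen only for illustration. The `EncodeToIds` call is assumed to be available from the base `Tokenizer` class.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ML.Tokenizers;

class StreamOverloadExample
{
    static async Task Main()
    {
        // Hypothetical path; replace with a vocabulary file from a trusted provider.
        const string vocabPath = "vocab.tiktoken";

        // Cancel the load if it takes longer than 30 seconds (illustrative value).
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

        // Open the vocabulary file as a stream and dispose it after loading.
        await using FileStream vocabStream = File.OpenRead(vocabPath);

        TiktokenTokenizer tokenizer = await TiktokenTokenizer.CreateAsync(
            vocabStream,
            preTokenizer: null,   // assumption: no pre-tokenization for this sketch
            normalizer: null,     // assumption: no normalization
            specialTokens: null,  // same as the default
            cacheSize: 8192,      // same as the default
            cancellationToken: cts.Token);

        // Assumed usage: encode a string to token IDs with the new tokenizer.
        IReadOnlyList<int> ids = tokenizer.EncodeToIds("Hello, world!");
        Console.WriteLine(string.Join(", ", ids));
    }
}
```

If the token is canceled before loading completes, awaiting the returned task is expected to throw an `OperationCanceledException`, so callers that pass a token would typically wrap the call in a `try`/`catch`.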

Applies to

CreateAsync(String, PreTokenizer, Normalizer, IReadOnlyDictionary<String,Int32>, Int32, CancellationToken)

Source:
TiktokenTokenizer.cs

Asynchronously creates a new TiktokenTokenizer object from a vocabulary file.

public static System.Threading.Tasks.Task<Microsoft.ML.Tokenizers.TiktokenTokenizer> CreateAsync(string vocabFilePath, Microsoft.ML.Tokenizers.PreTokenizer? preTokenizer, Microsoft.ML.Tokenizers.Normalizer? normalizer, System.Collections.Generic.IReadOnlyDictionary<string,int>? specialTokens = default, int cacheSize = 8192, System.Threading.CancellationToken cancellationToken = default);
static member CreateAsync : string * Microsoft.ML.Tokenizers.PreTokenizer * Microsoft.ML.Tokenizers.Normalizer * System.Collections.Generic.IReadOnlyDictionary<string, int> * int * System.Threading.CancellationToken -> System.Threading.Tasks.Task<Microsoft.ML.Tokenizers.TiktokenTokenizer>
Public Shared Function CreateAsync (vocabFilePath As String, preTokenizer As PreTokenizer, normalizer As Normalizer, Optional specialTokens As IReadOnlyDictionary(Of String, Integer) = Nothing, Optional cacheSize As Integer = 8192, Optional cancellationToken As CancellationToken = Nothing) As Task(Of TiktokenTokenizer)

Parameters

vocabFilePath
String

The path to the BPE vocabulary file.

preTokenizer
PreTokenizer

The pre-tokenizer to use.

normalizer
Normalizer

The normalizer to use.

specialTokens
IReadOnlyDictionary<String,Int32>

A dictionary that maps special tokens to their IDs.

cacheSize
Int32

The size of the cache to use. Defaults to 8192.

cancellationToken
CancellationToken

CancellationToken used to request cancellation of the operation.

Returns

A task that represents the asynchronous operation. The task result is the created TiktokenTokenizer object.

Remarks

When creating the tokenizer, ensure that the vocabulary file is sourced from a trusted provider.
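One way to act on the trust requirement is to compare the file's digest against a value published by the provider before loading it. The sketch below is an illustration under assumptions, not a prescribed procedure: the path `vocab.tiktoken`, the `ExpectedSha256` placeholder, and the special-token entry with its ID are all hypothetical, and the code requires .NET 5 or later for `SHA256.HashData` and `Convert.ToHexString`. The `CountTokens` call is assumed to be available from the base `Tokenizer` class.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;
using Microsoft.ML.Tokenizers;

class PathOverloadExample
{
    // Hypothetical placeholder: the hex SHA-256 digest published by the vocabulary's provider.
    const string ExpectedSha256 = "<expected-sha256-hex>";

    static async Task Main()
    {
        // Hypothetical path to a local BPE vocabulary file.
        const string vocabFilePath = "vocab.tiktoken";

        // Hash the file and refuse to load it if the digest does not match.
        byte[] fileBytes = await File.ReadAllBytesAsync(vocabFilePath);
        string actualSha256 = Convert.ToHexString(SHA256.HashData(fileBytes));
        if (!actualSha256.Equals(ExpectedSha256, StringComparison.OrdinalIgnoreCase))
        {
            throw new InvalidDataException("Vocabulary file digest does not match the expected value.");
        }

        // Illustrative special token; the token text and ID must match the vocabulary actually in use.
        var specialTokens = new Dictionary<string, int>
        {
            ["<|endoftext|>"] = 100257,
        };

        TiktokenTokenizer tokenizer = await TiktokenTokenizer.CreateAsync(
            vocabFilePath,
            preTokenizer: null,  // assumption: no pre-tokenization for this sketch
            normalizer: null,    // assumption: no normalization
            specialTokens: specialTokens);

        // Assumed usage: count the tokens produced for a string.
        Console.WriteLine(tokenizer.CountTokens("Hello, world!"));
    }
}
```

The digest check runs before `CreateAsync` so that an unexpected file is never parsed. `cacheSize` and `cancellationToken` are omitted here and take their defaults.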

Applies to