TiktokenTokenizer.CreateAsync Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads
| CreateAsync(Stream, PreTokenizer, Normalizer, IReadOnlyDictionary<String,Int32>, Int32, CancellationToken) | Asynchronously creates a new TiktokenTokenizer from a vocabulary stream. |
| CreateAsync(String, PreTokenizer, Normalizer, IReadOnlyDictionary<String,Int32>, Int32, CancellationToken) | Asynchronously creates a new TiktokenTokenizer from a vocabulary file path. |
CreateAsync(Stream, PreTokenizer, Normalizer, IReadOnlyDictionary<String,Int32>, Int32, CancellationToken)
- Source:
- TiktokenTokenizer.cs
Asynchronously creates a new TiktokenTokenizer from a vocabulary stream.
public static System.Threading.Tasks.Task<Microsoft.ML.Tokenizers.TiktokenTokenizer> CreateAsync(System.IO.Stream vocabStream, Microsoft.ML.Tokenizers.PreTokenizer? preTokenizer, Microsoft.ML.Tokenizers.Normalizer? normalizer, System.Collections.Generic.IReadOnlyDictionary<string,int>? specialTokens = default, int cacheSize = 8192, System.Threading.CancellationToken cancellationToken = default);
static member CreateAsync : System.IO.Stream * Microsoft.ML.Tokenizers.PreTokenizer * Microsoft.ML.Tokenizers.Normalizer * System.Collections.Generic.IReadOnlyDictionary<string, int> * int * System.Threading.CancellationToken -> System.Threading.Tasks.Task<Microsoft.ML.Tokenizers.TiktokenTokenizer>
Public Shared Function CreateAsync (vocabStream As Stream, preTokenizer As PreTokenizer, normalizer As Normalizer, Optional specialTokens As IReadOnlyDictionary(Of String, Integer) = Nothing, Optional cacheSize As Integer = 8192, Optional cancellationToken As CancellationToken = Nothing) As Task(Of TiktokenTokenizer)
Parameters
- vocabStream
- Stream
The stream containing the BPE vocabulary data.
- preTokenizer
- PreTokenizer
The pre-tokenizer to use.
- normalizer
- Normalizer
The normalizer to use.
- specialTokens
- IReadOnlyDictionary<String,Int32>
The dictionary mapping special tokens to their IDs.
- cacheSize
- Int32
The size of the cache to use. The default is 8192.
- cancellationToken
- CancellationToken
The CancellationToken used to request cancellation of the operation.
Returns
A task that represents the asynchronous operation. The task result is the created TiktokenTokenizer.
Remarks
When creating the tokenizer, ensure that the vocabulary stream is sourced from a trusted provider.
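The following is a minimal sketch of calling the stream overload, not a definitive usage pattern. It assumes the Microsoft.ML.Tokenizers package is referenced and that the vocabulary uses the tiktoken-style text format of one base64-encoded token followed by an integer rank per line. The three-token vocabulary is a toy built in memory purely for illustration; a real vocabulary would come from a trusted source, as noted above.

```csharp
using System;
using System.IO;
using System.Text;
using Microsoft.ML.Tokenizers;

// Build a toy vocabulary in memory.
// Assumed line format: "<base64-encoded token> <rank>".
string[] tokens = { "a", "b", "ab" };
var vocabText = new StringBuilder();
for (int rank = 0; rank < tokens.Length; rank++)
{
    string base64 = Convert.ToBase64String(Encoding.UTF8.GetBytes(tokens[rank]));
    vocabText.Append(base64).Append(' ').Append(rank).Append('\n');
}

// Any readable Stream can be passed; a MemoryStream keeps the sketch self-contained.
using var vocabStream = new MemoryStream(Encoding.UTF8.GetBytes(vocabText.ToString()));

// preTokenizer and normalizer are nullable, so null is passed here for simplicity.
// specialTokens, cacheSize, and cancellationToken keep their default values.
TiktokenTokenizer tokenizer = await TiktokenTokenizer.CreateAsync(
    vocabStream,
    preTokenizer: null,
    normalizer: null);

// Encode a short string that the toy vocabulary can represent.
Console.WriteLine(string.Join(", ", tokenizer.EncodeToIds("ab")));
```

Passing null for the pre-tokenizer keeps the sketch short. Production code would normally supply a pre-tokenizer that matches the model the vocabulary was built for.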
CreateAsync(String, PreTokenizer, Normalizer, IReadOnlyDictionary<String,Int32>, Int32, CancellationToken)
- Source:
- TiktokenTokenizer.cs
Asynchronously creates a new TiktokenTokenizer from a vocabulary file path.
public static System.Threading.Tasks.Task<Microsoft.ML.Tokenizers.TiktokenTokenizer> CreateAsync(string vocabFilePath, Microsoft.ML.Tokenizers.PreTokenizer? preTokenizer, Microsoft.ML.Tokenizers.Normalizer? normalizer, System.Collections.Generic.IReadOnlyDictionary<string,int>? specialTokens = default, int cacheSize = 8192, System.Threading.CancellationToken cancellationToken = default);
static member CreateAsync : string * Microsoft.ML.Tokenizers.PreTokenizer * Microsoft.ML.Tokenizers.Normalizer * System.Collections.Generic.IReadOnlyDictionary<string, int> * int * System.Threading.CancellationToken -> System.Threading.Tasks.Task<Microsoft.ML.Tokenizers.TiktokenTokenizer>
Public Shared Function CreateAsync (vocabFilePath As String, preTokenizer As PreTokenizer, normalizer As Normalizer, Optional specialTokens As IReadOnlyDictionary(Of String, Integer) = Nothing, Optional cacheSize As Integer = 8192, Optional cancellationToken As CancellationToken = Nothing) As Task(Of TiktokenTokenizer)
Parameters
- vocabFilePath
- String
The path to the BPE vocabulary file.
- preTokenizer
- PreTokenizer
The pre-tokenizer to use.
- normalizer
- Normalizer
The normalizer to use.
- specialTokens
- IReadOnlyDictionary<String,Int32>
The dictionary mapping special tokens to their IDs.
- cacheSize
- Int32
The size of the cache to use. The default is 8192.
- cancellationToken
- CancellationToken
The CancellationToken used to request cancellation of the operation.
Returns
A task that represents the asynchronous operation. The task result is the created TiktokenTokenizer.
Remarks
When creating the tokenizer, ensure that the vocabulary file is sourced from a trusted provider.
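As a hedged illustration of the file path overload with its optional parameters, the sketch below writes a toy vocabulary to a temporary file and loads it with a special-token map, a smaller cache, and a cancellation timeout. It assumes the Microsoft.ML.Tokenizers package is referenced. The file name, the special token ID, the cache size, and the five-second timeout are all hypothetical values chosen for illustration, as is the tiktoken-style "base64 token, space, rank" line format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using Microsoft.ML.Tokenizers;

// Write a toy vocabulary ("a", "b", "ab" in base64) to a temporary file.
// The file name is hypothetical; a real vocabulary file would be used in practice.
string vocabFilePath = Path.Combine(Path.GetTempPath(), "toy-vocab.tiktoken");
File.WriteAllLines(vocabFilePath, new[] { "YQ== 0", "Yg== 1", "YWI= 2" });

// Hypothetical special token mapped to an ID outside the toy vocabulary's ranks.
var specialTokens = new Dictionary<string, int> { ["<|endoftext|>"] = 3 };

// Request cancellation if loading takes longer than five seconds (illustrative value).
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));

try
{
    TiktokenTokenizer tokenizer = await TiktokenTokenizer.CreateAsync(
        vocabFilePath,
        preTokenizer: null,
        normalizer: null,
        specialTokens: specialTokens,
        cacheSize: 1024,
        cancellationToken: cts.Token);

    Console.WriteLine(tokenizer.CountTokens("ab"));
}
catch (OperationCanceledException)
{
    // Assumption: a canceled load surfaces as OperationCanceledException.
    Console.WriteLine("Tokenizer creation was canceled.");
}
finally
{
    // Clean up the temporary vocabulary file.
    File.Delete(vocabFilePath);
}
```

The stream overload may be preferable when the vocabulary is embedded as a resource or downloaded, while this overload is convenient when the file already exists on disk.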