TiktokenTokenizer.CreateForModel Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads
| CreateForModel(String, IReadOnlyDictionary<String,Int32>, Normalizer) | Creates a tokenizer based on the model name. |
| CreateForModel(String, Stream, IReadOnlyDictionary<String,Int32>, Int32, Normalizer) | Creates a Tiktoken tokenizer based on the model name and a vocabulary file. |
CreateForModel(String, IReadOnlyDictionary<String,Int32>, Normalizer)
- Source:
- TiktokenTokenizer.cs
Creates a tokenizer based on the model name.
public static Microsoft.ML.Tokenizers.TiktokenTokenizer CreateForModel(string modelName, System.Collections.Generic.IReadOnlyDictionary<string,int>? extraSpecialTokens = default, Microsoft.ML.Tokenizers.Normalizer? normalizer = default);
static member CreateForModel : string * System.Collections.Generic.IReadOnlyDictionary<string, int> * Microsoft.ML.Tokenizers.Normalizer -> Microsoft.ML.Tokenizers.TiktokenTokenizer
Public Shared Function CreateForModel (modelName As String, Optional extraSpecialTokens As IReadOnlyDictionary(Of String, Integer) = Nothing, Optional normalizer As Normalizer = Nothing) As TiktokenTokenizer
Parameters
- modelName
- String
The name of the model to create the tokenizer for.
- extraSpecialTokens
- IReadOnlyDictionary<String,Int32>
Additional special tokens, mapped to their token IDs, to use alongside the model's built-in special tokens.
- normalizer
- Normalizer
The normalizer to apply to the text before tokenization.
Returns
The tokenizer for the specified model.
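The following is a minimal sketch of calling this overload, not an official sample. It assumes that the Microsoft.ML.Tokenizers package is referenced, that `gpt-4` is a model name the library recognizes, and that the vocabulary data for that model is available to the application. The special token `<|custom|>` and its ID `100300` are hypothetical values chosen for illustration. Because this API is prerelease, the `EncodeToIds` and `Decode` calls may differ in the version you use.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// Assumption: "gpt-4" is a supported model name and its vocabulary data is available.
TiktokenTokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-4");

// Encode text to token IDs, then decode the IDs back to text.
string text = "Hello, world!";
IReadOnlyList<int> ids = tokenizer.EncodeToIds(text);
Console.WriteLine($"Token count: {ids.Count}");
Console.WriteLine($"Round trip: {tokenizer.Decode(ids)}");

// Optional: register an extra special token.
// Hypothetical token text and ID, chosen only for illustration.
var extraSpecialTokens = new Dictionary<string, int>
{
    ["<|custom|>"] = 100300
};
TiktokenTokenizer customTokenizer =
    TiktokenTokenizer.CreateForModel("gpt-4", extraSpecialTokens);
```

Both `extraSpecialTokens` and `normalizer` are optional, so the single-argument call is enough when the model's built-in special tokens and default text handling suffice.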
CreateForModel(String, Stream, IReadOnlyDictionary<String,Int32>, Int32, Normalizer)
- Source:
- TiktokenTokenizer.cs
Creates a Tiktoken tokenizer based on the model name and a vocabulary file.
public static Microsoft.ML.Tokenizers.TiktokenTokenizer CreateForModel(string modelName, System.IO.Stream vocabStream, System.Collections.Generic.IReadOnlyDictionary<string,int>? extraSpecialTokens = default, int cacheSize = 8192, Microsoft.ML.Tokenizers.Normalizer? normalizer = default);
static member CreateForModel : string * System.IO.Stream * System.Collections.Generic.IReadOnlyDictionary<string, int> * int * Microsoft.ML.Tokenizers.Normalizer -> Microsoft.ML.Tokenizers.TiktokenTokenizer
Public Shared Function CreateForModel (modelName As String, vocabStream As Stream, Optional extraSpecialTokens As IReadOnlyDictionary(Of String, Integer) = Nothing, Optional cacheSize As Integer = 8192, Optional normalizer As Normalizer = Nothing) As TiktokenTokenizer
Parameters
- modelName
- String
The name of the model to create the tokenizer for.
- vocabStream
- Stream
A stream over the BPE vocabulary file.
- extraSpecialTokens
- IReadOnlyDictionary<String,Int32>
Additional special tokens, mapped to their token IDs, to use alongside the model's built-in special tokens.
- cacheSize
- Int32
The size of the cache to use. The default is 8192.
- normalizer
- Normalizer
The normalizer to apply to the text before tokenization.
Returns
The tokenizer for the specified model.
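As a rough illustration of this overload, the sketch below loads the vocabulary from a local file instead of relying on data bundled with the library. The file name `cl100k_base.tiktoken` is a hypothetical path, and pairing it with the model name `gpt-4` is an assumption. Substitute the vocabulary file that matches your model. `CountTokens` is used only to show that the tokenizer works, and its signature may differ across prerelease versions.

```csharp
using System;
using System.IO;
using Microsoft.ML.Tokenizers;

// Hypothetical path to a BPE vocabulary file in Tiktoken format.
const string vocabPath = "cl100k_base.tiktoken";

// Open the vocabulary file as a stream and pass it to the factory method.
using Stream vocabStream = File.OpenRead(vocabPath);

// cacheSize is set to its documented default (8192) to show where it goes.
TiktokenTokenizer tokenizer = TiktokenTokenizer.CreateForModel(
    "gpt-4",
    vocabStream,
    cacheSize: 8192);

Console.WriteLine(tokenizer.CountTokens("Hello, world!"));
```

Passing the stream explicitly can be useful when the vocabulary file ships with the application or is fetched at deployment time.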