TiktokenTokenizer.CreateForEncoding Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Creates a tokenizer for the specified encoding name.
public static Microsoft.ML.Tokenizers.TiktokenTokenizer CreateForEncoding(string encodingName, System.Collections.Generic.IReadOnlyDictionary<string,int>? extraSpecialTokens = default, Microsoft.ML.Tokenizers.Normalizer? normalizer = default);
static member CreateForEncoding : string * System.Collections.Generic.IReadOnlyDictionary<string, int> * Microsoft.ML.Tokenizers.Normalizer -> Microsoft.ML.Tokenizers.TiktokenTokenizer
Public Shared Function CreateForEncoding (encodingName As String, Optional extraSpecialTokens As IReadOnlyDictionary(Of String, Integer) = Nothing, Optional normalizer As Normalizer = Nothing) As TiktokenTokenizer
Parameters
- encodingName
- String
The name of the encoding to create the tokenizer for.
- extraSpecialTokens
- IReadOnlyDictionary<String,Int32>
Additional special tokens, mapped to their token IDs, to use alongside the encoding's built-in special tokens. Optional; defaults to null.
- normalizer
- Normalizer
The normalizer to apply to the text before tokenization. Optional; defaults to null.
Returns
A TiktokenTokenizer instance for the specified encoding.
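Examples
The following is a minimal sketch of how this method might be called, not an official sample. It assumes that "cl100k_base" is an encoding name the library recognizes and that any data package needed for that encoding is referenced by the project. The special token "<|my_marker|>" and its ID 100300 are hypothetical values chosen for illustration; check them against the encoding's vocabulary before relying on them.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// Assumption: "cl100k_base" is a supported encoding name.
TiktokenTokenizer tokenizer = TiktokenTokenizer.CreateForEncoding("cl100k_base");

string text = "Hello, tokenizer!";

// Encode the text to token IDs, then decode the IDs back to text.
IReadOnlyList<int> ids = tokenizer.EncodeToIds(text);
Console.WriteLine($"Token count: {ids.Count}");
Console.WriteLine($"Round trip: {tokenizer.Decode(ids)}");

// Hypothetical extra special token; both the name and the ID are
// illustrative assumptions and must not collide with existing entries.
var extraSpecialTokens = new Dictionary<string, int>
{
    ["<|my_marker|>"] = 100300
};

// The normalizer parameter is omitted here, so it keeps its default of null.
TiktokenTokenizer custom =
    TiktokenTokenizer.CreateForEncoding("cl100k_base", extraSpecialTokens);
Console.WriteLine($"Custom token count: {custom.CountTokens("<|my_marker|>Hello")}");
```

Passing the extra tokens as a Dictionary<string, int> works because that type implements IReadOnlyDictionary<string, int>.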