

TiktokenTokenizer.CreateForModel Method

Definition

Overloads

CreateForModel(String, IReadOnlyDictionary<String,Int32>, Normalizer)

Creates a Tiktoken tokenizer based on the model name.

CreateForModel(String, Stream, IReadOnlyDictionary<String,Int32>, Int32, Normalizer)

Creates a Tiktoken tokenizer based on the model name and a vocab file.

CreateForModel(String, IReadOnlyDictionary<String,Int32>, Normalizer)

Source:
TiktokenTokenizer.cs

Creates a Tiktoken tokenizer based on the model name.

public static Microsoft.ML.Tokenizers.TiktokenTokenizer CreateForModel(string modelName, System.Collections.Generic.IReadOnlyDictionary<string,int>? extraSpecialTokens = default, Microsoft.ML.Tokenizers.Normalizer? normalizer = default);
static member CreateForModel : string * System.Collections.Generic.IReadOnlyDictionary<string, int> * Microsoft.ML.Tokenizers.Normalizer -> Microsoft.ML.Tokenizers.TiktokenTokenizer
Public Shared Function CreateForModel (modelName As String, Optional extraSpecialTokens As IReadOnlyDictionary(Of String, Integer) = Nothing, Optional normalizer As Normalizer = Nothing) As TiktokenTokenizer

Parameters

modelName
String

The name of the model the tokenizer is created for.

extraSpecialTokens
IReadOnlyDictionary<String,Int32>

Additional special tokens, beyond the model's built-in ones, mapped to their token IDs. Optional; defaults to null.

normalizer
Normalizer

The normalizer applied to the text before tokenization. Optional; defaults to null.

Returns

TiktokenTokenizer

The tokenizer for the specified model.
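
Example

The following is a minimal sketch of calling this overload, not an official sample. It assumes that "gpt-4" is a model name the library recognizes and that the vocabulary data for that model's encoding is available to the application (for example, through a separately referenced data package). The sample text and variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// Assumption: "gpt-4" is a recognized model name and its vocabulary data
// can be resolved at run time; otherwise this call is expected to throw.
TiktokenTokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-4");

string text = "Hello, world!";

// Encode the text to token IDs, then decode the IDs back to text.
IReadOnlyList<int> ids = tokenizer.EncodeToIds(text);
string roundTrip = tokenizer.Decode(ids);

Console.WriteLine($"Token count: {ids.Count}");
Console.WriteLine($"Round trip:  {roundTrip}");
```

Both optional parameters are left at their defaults here, so only the model's built-in special tokens are used and no normalizer is applied.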


CreateForModel(String, Stream, IReadOnlyDictionary<String,Int32>, Int32, Normalizer)

Source:
TiktokenTokenizer.cs

Creates a Tiktoken tokenizer based on the model name and a vocab file.

public static Microsoft.ML.Tokenizers.TiktokenTokenizer CreateForModel(string modelName, System.IO.Stream vocabStream, System.Collections.Generic.IReadOnlyDictionary<string,int>? extraSpecialTokens = default, int cacheSize = 8192, Microsoft.ML.Tokenizers.Normalizer? normalizer = default);
static member CreateForModel : string * System.IO.Stream * System.Collections.Generic.IReadOnlyDictionary<string, int> * int * Microsoft.ML.Tokenizers.Normalizer -> Microsoft.ML.Tokenizers.TiktokenTokenizer
Public Shared Function CreateForModel (modelName As String, vocabStream As Stream, Optional extraSpecialTokens As IReadOnlyDictionary(Of String, Integer) = Nothing, Optional cacheSize As Integer = 8192, Optional normalizer As Normalizer = Nothing) As TiktokenTokenizer

Parameters

modelName
String

The name of the model the tokenizer is created for.

vocabStream
Stream

A stream over the BPE vocab file.

extraSpecialTokens
IReadOnlyDictionary<String,Int32>

Additional special tokens, beyond the model's built-in ones, mapped to their token IDs. Optional; defaults to null.

cacheSize
Int32

The size of the cache to use. Defaults to 8192.

normalizer
Normalizer

The normalizer applied to the text before tokenization. Optional; defaults to null.

Returns

TiktokenTokenizer

The tokenizer for the specified model.
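
Example

The following is a minimal sketch of calling this overload with a caller-supplied vocab stream, not an official sample. The file name `cl100k_base.tiktoken` is a hypothetical local path; it assumes you already have a BPE vocab file that matches the encoding of the named model. The model name "gpt-4" is likewise an assumption.

```csharp
using System;
using System.IO;
using Microsoft.ML.Tokenizers;

// Hypothetical path: a local copy of the BPE vocab file for the model's encoding.
const string vocabPath = "cl100k_base.tiktoken";

// The stream only needs to stay open while the tokenizer is being created.
using Stream vocabStream = File.OpenRead(vocabPath);

TiktokenTokenizer tokenizer = TiktokenTokenizer.CreateForModel(
    "gpt-4",
    vocabStream,
    extraSpecialTokens: null, // no tokens beyond the model's built-in ones
    cacheSize: 8192);         // same value as the documented default

int tokenCount = tokenizer.CountTokens("Hello, world!");
Console.WriteLine($"Token count: {tokenCount}");
```

Passing the stream explicitly can be useful when the vocab file is shipped with the application or loaded from a custom location rather than resolved by the library.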
