EnglishRobertaTokenizer.Create Method

Definition

Overloads

Create(Stream, Stream, Stream)

Creates the tokenizer's model object to use with the English RoBERTa model.

Create(String, String, String)

Creates the tokenizer's model object to use with the English RoBERTa model.

Create(Stream, Stream, Stream, PreTokenizer, Normalizer, Boolean)

Creates the tokenizer's model object to use with the English RoBERTa model.

Create(String, String, String, PreTokenizer, Normalizer, Boolean)

Creates the tokenizer's model object to use with the English RoBERTa model.

Create(Stream, Stream, Stream)

Source:
EnglishRobertaTokenizer.cs

Creates the tokenizer's model object to use with the English RoBERTa model.

public static Microsoft.ML.Tokenizers.EnglishRobertaTokenizer Create(System.IO.Stream vocabularyStream, System.IO.Stream mergeStream, System.IO.Stream highestOccurrenceMappingStream);
static member Create : System.IO.Stream * System.IO.Stream * System.IO.Stream -> Microsoft.ML.Tokenizers.EnglishRobertaTokenizer
Public Shared Function Create (vocabularyStream As Stream, mergeStream As Stream, highestOccurrenceMappingStream As Stream) As EnglishRobertaTokenizer

Parameters

vocabularyStream
Stream

The stream of a JSON file containing the dictionary of string keys and their ids.

mergeStream
Stream

The stream of a file containing the token merge pairs list.

highestOccurrenceMappingStream
Stream

The stream of a file that remaps the original GPT-2 model IDs to high-occurrence ranks and values.

Returns

EnglishRobertaTokenizer

The created tokenizer.

Remarks

When creating the tokenizer, ensure that the vocabulary stream is sourced from a trusted provider.
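
A minimal usage sketch of this overload. The file names (`vocab.json`, `merges.txt`, `dict.txt`) are illustrative placeholders; substitute the paths to your own RoBERTa vocabulary, merge, and occurrence-mapping assets.

```csharp
using System.IO;
using Microsoft.ML.Tokenizers;

// Open the three required assets as streams; the using declarations
// dispose them when they go out of scope.
using Stream vocabStream = File.OpenRead("vocab.json");
using Stream mergeStream = File.OpenRead("merges.txt");
using Stream mappingStream = File.OpenRead("dict.txt");

EnglishRobertaTokenizer tokenizer =
    EnglishRobertaTokenizer.Create(vocabStream, mergeStream, mappingStream);
```

The stream overload is useful when the assets are not on disk, for example when they are read from embedded resources or downloaded content.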

Applies to

Create(String, String, String)

Source:
EnglishRobertaTokenizer.cs

Creates the tokenizer's model object to use with the English RoBERTa model.

public static Microsoft.ML.Tokenizers.EnglishRobertaTokenizer Create(string vocabularyPath, string mergePath, string highestOccurrenceMappingPath);
static member Create : string * string * string -> Microsoft.ML.Tokenizers.EnglishRobertaTokenizer
Public Shared Function Create (vocabularyPath As String, mergePath As String, highestOccurrenceMappingPath As String) As EnglishRobertaTokenizer

Parameters

vocabularyPath
String

The JSON file path containing the dictionary of string keys and their ids.

mergePath
String

The path of the file containing the token merge pairs list.

highestOccurrenceMappingPath
String

The path of the file that remaps the original GPT-2 model IDs to high-occurrence ranks and values.

Returns

EnglishRobertaTokenizer

The created tokenizer.

Remarks

When creating the tokenizer, ensure that the vocabulary file is sourced from a trusted provider.
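
A minimal usage sketch of the path-based overload. The file names are illustrative placeholders for your own RoBERTa assets.

```csharp
using Microsoft.ML.Tokenizers;

// Pass file paths directly; the tokenizer reads the assets itself.
EnglishRobertaTokenizer tokenizer = EnglishRobertaTokenizer.Create(
    "vocab.json",   // vocabulary: JSON dictionary of tokens to IDs
    "merges.txt",   // merge pairs list
    "dict.txt");    // GPT-2 ID to high-occurrence-rank mapping
```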

Applies to

Create(Stream, Stream, Stream, PreTokenizer, Normalizer, Boolean)

Source:
EnglishRobertaTokenizer.cs

Creates the tokenizer's model object to use with the English RoBERTa model.

public static Microsoft.ML.Tokenizers.EnglishRobertaTokenizer Create(System.IO.Stream vocabularyStream, System.IO.Stream mergeStream, System.IO.Stream highestOccurrenceMappingStream, Microsoft.ML.Tokenizers.PreTokenizer? preTokenizer = default, Microsoft.ML.Tokenizers.Normalizer? normalizer = default, bool filterUnsupportedChars = true);
static member Create : System.IO.Stream * System.IO.Stream * System.IO.Stream * Microsoft.ML.Tokenizers.PreTokenizer * Microsoft.ML.Tokenizers.Normalizer * bool -> Microsoft.ML.Tokenizers.EnglishRobertaTokenizer
Public Shared Function Create (vocabularyStream As Stream, mergeStream As Stream, highestOccurrenceMappingStream As Stream, Optional preTokenizer As PreTokenizer = Nothing, Optional normalizer As Normalizer = Nothing, Optional filterUnsupportedChars As Boolean = true) As EnglishRobertaTokenizer

Parameters

vocabularyStream
Stream

The stream of a JSON file containing the dictionary of string keys and their ids.

mergeStream
Stream

The stream of a file containing the token merge pairs list.

highestOccurrenceMappingStream
Stream

The stream of a file that remaps the original GPT-2 model IDs to high-occurrence ranks and values.

preTokenizer
PreTokenizer

The pre-tokenizer to use.

normalizer
Normalizer

The normalizer to use.

filterUnsupportedChars
Boolean

Indicates whether to filter out unsupported characters during decoding.

Returns

EnglishRobertaTokenizer

The created tokenizer.

Remarks

When creating the tokenizer, ensure that the vocabulary stream is sourced from a trusted provider.
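
A sketch of this overload using its optional parameters. The file names are placeholders; passing `null` for `preTokenizer` and `normalizer` keeps the overload's defaults, and `filterUnsupportedChars: false` preserves unsupported characters in decoded output.

```csharp
using System.IO;
using Microsoft.ML.Tokenizers;

using Stream vocabStream = File.OpenRead("vocab.json");
using Stream mergeStream = File.OpenRead("merges.txt");
using Stream mappingStream = File.OpenRead("dict.txt");

EnglishRobertaTokenizer tokenizer = EnglishRobertaTokenizer.Create(
    vocabStream, mergeStream, mappingStream,
    preTokenizer: null,            // use the default pre-tokenizer
    normalizer: null,              // use the default normalizer
    filterUnsupportedChars: false); // keep unsupported characters when decoding
```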

Applies to

Create(String, String, String, PreTokenizer, Normalizer, Boolean)

Source:
EnglishRobertaTokenizer.cs
Source:
EnglishRobertaTokenizer.cs
Source:
EnglishRobertaTokenizer.cs

Creates the tokenizer's model object to use with the English RoBERTa model.

public static Microsoft.ML.Tokenizers.EnglishRobertaTokenizer Create(string vocabularyPath, string mergePath, string highestOccurrenceMappingPath, Microsoft.ML.Tokenizers.PreTokenizer? preTokenizer = default, Microsoft.ML.Tokenizers.Normalizer? normalizer = default, bool filterUnsupportedChars = true);
static member Create : string * string * string * Microsoft.ML.Tokenizers.PreTokenizer * Microsoft.ML.Tokenizers.Normalizer * bool -> Microsoft.ML.Tokenizers.EnglishRobertaTokenizer
Public Shared Function Create (vocabularyPath As String, mergePath As String, highestOccurrenceMappingPath As String, Optional preTokenizer As PreTokenizer = Nothing, Optional normalizer As Normalizer = Nothing, Optional filterUnsupportedChars As Boolean = true) As EnglishRobertaTokenizer

Parameters

vocabularyPath
String

The JSON file path containing the dictionary of string keys and their ids.

mergePath
String

The path of the file containing the token merge pairs list.

highestOccurrenceMappingPath
String

The path of the file that remaps the original GPT-2 model IDs to high-occurrence ranks and values.

preTokenizer
PreTokenizer

The pre-tokenizer to use.

normalizer
Normalizer

The normalizer to use.

filterUnsupportedChars
Boolean

Indicates whether to filter out unsupported characters during decoding.

Returns

EnglishRobertaTokenizer

The created tokenizer.

Remarks

When creating the tokenizer, ensure that the vocabulary file is sourced from a trusted provider.
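
A sketch of the path-based overload with its optional parameters. The file names are placeholders for your own RoBERTa assets; a custom `PreTokenizer` or `Normalizer` instance can be supplied in place of the `null` defaults.

```csharp
using Microsoft.ML.Tokenizers;

EnglishRobertaTokenizer tokenizer = EnglishRobertaTokenizer.Create(
    "vocab.json",
    "merges.txt",
    "dict.txt",
    preTokenizer: null,            // use the default pre-tokenizer
    normalizer: null,              // use the default normalizer
    filterUnsupportedChars: false); // keep unsupported characters when decoding
```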

Applies to