EnglishRobertaTokenizer.Create Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads
| Create(Stream, Stream, Stream) | Creates the tokenizer's model object to use with the English RoBERTa model. |
| Create(String, String, String) | Creates the tokenizer's model object to use with the English RoBERTa model. |
| Create(Stream, Stream, Stream, PreTokenizer, Normalizer, Boolean) | Creates the tokenizer's model object to use with the English RoBERTa model. |
| Create(String, String, String, PreTokenizer, Normalizer, Boolean) | Creates the tokenizer's model object to use with the English RoBERTa model. |
Create(Stream, Stream, Stream)
- Source:
- EnglishRobertaTokenizer.cs
Creates the tokenizer's model object to use with the English RoBERTa model.
public static Microsoft.ML.Tokenizers.EnglishRobertaTokenizer Create(System.IO.Stream vocabularyStream, System.IO.Stream mergeStream, System.IO.Stream highestOccurrenceMappingStream);
static member Create : System.IO.Stream * System.IO.Stream * System.IO.Stream -> Microsoft.ML.Tokenizers.EnglishRobertaTokenizer
Public Shared Function Create (vocabularyStream As Stream, mergeStream As Stream, highestOccurrenceMappingStream As Stream) As EnglishRobertaTokenizer
Parameters
- vocabularyStream
- Stream
The stream of a JSON file containing the dictionary of string keys and their ids.
- mergeStream
- Stream
The stream of a file containing the token pairs (merges) list.
- highestOccurrenceMappingStream
- Stream
The stream of a file that remaps the original GPT-2 model IDs to high-occurrence ranks and values.
Returns
EnglishRobertaTokenizer
The tokenizer's model object created from the provided streams.
Remarks
When creating the tokenizer, ensure that the vocabulary stream is sourced from a trusted provider.
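Example
A minimal usage sketch, assuming the model assets are available locally. The file names vocab.json, merges.txt, and dict.txt are illustrative placeholders (not files shipped with the library), and EncodeToIds is the encoding method inherited from the Tokenizer base class.

using System.Collections.Generic;
using System.IO;
using Microsoft.ML.Tokenizers;

// Open the three model asset streams (file names are assumptions).
using Stream vocabStream = File.OpenRead("vocab.json");
using Stream mergeStream = File.OpenRead("merges.txt");
using Stream mappingStream = File.OpenRead("dict.txt");

EnglishRobertaTokenizer tokenizer =
    EnglishRobertaTokenizer.Create(vocabStream, mergeStream, mappingStream);

// Encode text to token IDs via the inherited Tokenizer API.
IReadOnlyList<int> ids = tokenizer.EncodeToIds("Hello world!");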
Applies to
Create(String, String, String)
- Source:
- EnglishRobertaTokenizer.cs
Creates the tokenizer's model object to use with the English RoBERTa model.
public static Microsoft.ML.Tokenizers.EnglishRobertaTokenizer Create(string vocabularyPath, string mergePath, string highestOccurrenceMappingPath);
static member Create : string * string * string -> Microsoft.ML.Tokenizers.EnglishRobertaTokenizer
Public Shared Function Create (vocabularyPath As String, mergePath As String, highestOccurrenceMappingPath As String) As EnglishRobertaTokenizer
Parameters
- vocabularyPath
- String
The JSON file path containing the dictionary of string keys and their ids.
- mergePath
- String
The file path containing the token pairs (merges) list.
- highestOccurrenceMappingPath
- String
The path of the file that remaps the original GPT-2 model IDs to high-occurrence ranks and values.
Returns
EnglishRobertaTokenizer
The tokenizer's model object created from the specified files.
Remarks
When creating the tokenizer, ensure that the vocabulary file is sourced from a trusted provider.
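Example
A minimal sketch using file paths instead of streams; the paths are placeholder assumptions, and EncodeToIds and Decode are the methods inherited from the Tokenizer base class.

using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// Create the tokenizer directly from asset files on disk (paths are assumptions).
EnglishRobertaTokenizer tokenizer = EnglishRobertaTokenizer.Create(
    "vocab.json", "merges.txt", "dict.txt");

// Round-trip a string through the tokenizer.
IReadOnlyList<int> ids = tokenizer.EncodeToIds("Hello world!");
string roundTrip = tokenizer.Decode(ids);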
Applies to
Create(Stream, Stream, Stream, PreTokenizer, Normalizer, Boolean)
- Source:
- EnglishRobertaTokenizer.cs
Creates the tokenizer's model object to use with the English RoBERTa model.
public static Microsoft.ML.Tokenizers.EnglishRobertaTokenizer Create(System.IO.Stream vocabularyStream, System.IO.Stream mergeStream, System.IO.Stream highestOccurrenceMappingStream, Microsoft.ML.Tokenizers.PreTokenizer? preTokenizer = default, Microsoft.ML.Tokenizers.Normalizer? normalizer = default, bool filterUnsupportedChars = true);
static member Create : System.IO.Stream * System.IO.Stream * System.IO.Stream * Microsoft.ML.Tokenizers.PreTokenizer * Microsoft.ML.Tokenizers.Normalizer * bool -> Microsoft.ML.Tokenizers.EnglishRobertaTokenizer
Public Shared Function Create (vocabularyStream As Stream, mergeStream As Stream, highestOccurrenceMappingStream As Stream, Optional preTokenizer As PreTokenizer = Nothing, Optional normalizer As Normalizer = Nothing, Optional filterUnsupportedChars As Boolean = true) As EnglishRobertaTokenizer
Parameters
- vocabularyStream
- Stream
The stream of a JSON file containing the dictionary of string keys and their ids.
- mergeStream
- Stream
The stream of a file containing the token pairs (merges) list.
- highestOccurrenceMappingStream
- Stream
The stream of a file that remaps the original GPT-2 model IDs to high-occurrence ranks and values.
- preTokenizer
- PreTokenizer
The pre-tokenizer to use.
- normalizer
- Normalizer
The normalizer to use.
- filterUnsupportedChars
- Boolean
Indicates whether to filter out unsupported characters during decoding.
Returns
EnglishRobertaTokenizer
The tokenizer's model object created from the provided streams.
Remarks
When creating the tokenizer, ensure that the vocabulary stream is sourced from a trusted provider.
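Example
A minimal sketch showing the optional parameters with this stream-based overload; the file names are placeholder assumptions. Passing null for preTokenizer and normalizer keeps the overload's defaults, per the signature above.

using System.IO;
using Microsoft.ML.Tokenizers;

using Stream vocabStream = File.OpenRead("vocab.json");
using Stream mergeStream = File.OpenRead("merges.txt");
using Stream mappingStream = File.OpenRead("dict.txt");

// Keep the default pre-tokenizer and normalizer, but retain unsupported
// characters in decoded output instead of filtering them.
EnglishRobertaTokenizer tokenizer = EnglishRobertaTokenizer.Create(
    vocabStream, mergeStream, mappingStream,
    preTokenizer: null,
    normalizer: null,
    filterUnsupportedChars: false);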
Applies to
Create(String, String, String, PreTokenizer, Normalizer, Boolean)
- Source:
- EnglishRobertaTokenizer.cs
Creates the tokenizer's model object to use with the English RoBERTa model.
public static Microsoft.ML.Tokenizers.EnglishRobertaTokenizer Create(string vocabularyPath, string mergePath, string highestOccurrenceMappingPath, Microsoft.ML.Tokenizers.PreTokenizer? preTokenizer = default, Microsoft.ML.Tokenizers.Normalizer? normalizer = default, bool filterUnsupportedChars = true);
static member Create : string * string * string * Microsoft.ML.Tokenizers.PreTokenizer * Microsoft.ML.Tokenizers.Normalizer * bool -> Microsoft.ML.Tokenizers.EnglishRobertaTokenizer
Public Shared Function Create (vocabularyPath As String, mergePath As String, highestOccurrenceMappingPath As String, Optional preTokenizer As PreTokenizer = Nothing, Optional normalizer As Normalizer = Nothing, Optional filterUnsupportedChars As Boolean = true) As EnglishRobertaTokenizer
Parameters
- vocabularyPath
- String
The JSON file path containing the dictionary of string keys and their ids.
- mergePath
- String
The file path containing the token pairs (merges) list.
- highestOccurrenceMappingPath
- String
The path of the file that remaps the original GPT-2 model IDs to high-occurrence ranks and values.
- preTokenizer
- PreTokenizer
The pre-tokenizer to use.
- normalizer
- Normalizer
The normalizer to use.
- filterUnsupportedChars
- Boolean
Indicates whether to filter out unsupported characters during decoding.
Returns
EnglishRobertaTokenizer
The tokenizer's model object created from the specified files.
Remarks
When creating the tokenizer, ensure that the vocabulary file is sourced from a trusted provider.
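Example
A minimal sketch of the path-based overload with named optional arguments; the file paths are placeholder assumptions.

using Microsoft.ML.Tokenizers;

EnglishRobertaTokenizer tokenizer = EnglishRobertaTokenizer.Create(
    "vocab.json", "merges.txt", "dict.txt",
    preTokenizer: null,             // null keeps the default pre-tokenizer
    normalizer: null,               // null keeps the default (no) normalization
    filterUnsupportedChars: false); // keep unsupported characters when decoding
Applies to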