EnglishRobertaTokenizer.Create Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads
| Create(Stream, Stream, Stream) | Creates the tokenizer's model object to use with the English RoBERTa model. |
| Create(String, String, String) | Creates the tokenizer's model object to use with the English RoBERTa model. |
| Create(Stream, Stream, Stream, PreTokenizer, Normalizer, Boolean) | Creates the tokenizer's model object to use with the English RoBERTa model. |
| Create(String, String, String, PreTokenizer, Normalizer, Boolean) | Creates the tokenizer's model object to use with the English RoBERTa model. |
Create(Stream, Stream, Stream)
- Source:
- EnglishRobertaTokenizer.cs
Creates the tokenizer's model object to use with the English RoBERTa model.
public static Microsoft.ML.Tokenizers.EnglishRobertaTokenizer Create(System.IO.Stream vocabularyStream, System.IO.Stream mergeStream, System.IO.Stream highestOccurrenceMappingStream);
static member Create : System.IO.Stream * System.IO.Stream * System.IO.Stream -> Microsoft.ML.Tokenizers.EnglishRobertaTokenizer
Public Shared Function Create (vocabularyStream As Stream, mergeStream As Stream, highestOccurrenceMappingStream As Stream) As EnglishRobertaTokenizer
Parameters
- vocabularyStream
- Stream
The stream of a JSON file containing the dictionary of string keys and their ids.
- mergeStream
- Stream
The stream of a file containing the token pairs (merges) list.
- highestOccurrenceMappingStream
- Stream
The stream of a file that remaps the original GPT-2 model IDs to high-occurrence ranks and values.
Returns
EnglishRobertaTokenizer
The tokenizer's model object created from the provided streams.
Remarks
When creating the tokenizer, ensure that the vocabulary stream is sourced from a trusted provider.
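Example
A minimal usage sketch, assuming the model assets are available locally. The file names vocab.json, merges.txt, and dict.txt are illustrative placeholders (not files shipped with the library), and EncodeToIds is the encoding method inherited from the Tokenizer base class.

using System.Collections.Generic;
using System.IO;
using Microsoft.ML.Tokenizers;

// Open the three model asset streams (file names are assumptions).
using Stream vocabStream = File.OpenRead("vocab.json");
using Stream mergeStream = File.OpenRead("merges.txt");
using Stream mappingStream = File.OpenRead("dict.txt");

EnglishRobertaTokenizer tokenizer =
    EnglishRobertaTokenizer.Create(vocabStream, mergeStream, mappingStream);

// Encode text to token IDs via the inherited Tokenizer API.
IReadOnlyList<int> ids = tokenizer.EncodeToIds("Hello world!");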
Applies to
Create(String, String, String)
- Source:
- EnglishRobertaTokenizer.cs
Creates the tokenizer's model object to use with the English RoBERTa model.
public static Microsoft.ML.Tokenizers.EnglishRobertaTokenizer Create(string vocabularyPath, string mergePath, string highestOccurrenceMappingPath);
static member Create : string * string * string -> Microsoft.ML.Tokenizers.EnglishRobertaTokenizer
Public Shared Function Create (vocabularyPath As String, mergePath As String, highestOccurrenceMappingPath As String) As EnglishRobertaTokenizer
Parameters
- vocabularyPath
- String
The JSON file path containing the dictionary of string keys and their ids.
- mergePath
- String
The file path containing the token pairs (merges) list.
- highestOccurrenceMappingPath
- String
The path of the file that remaps the original GPT-2 model IDs to high-occurrence ranks and values.
Returns
EnglishRobertaTokenizer
The tokenizer's model object created from the specified files.
Remarks
When creating the tokenizer, ensure that the vocabulary file is sourced from a trusted provider.
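Example
A minimal sketch using file paths instead of streams; the paths are placeholder assumptions, and EncodeToIds and Decode are the methods inherited from the Tokenizer base class.

using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// Create the tokenizer directly from asset files on disk (paths are assumptions).
EnglishRobertaTokenizer tokenizer = EnglishRobertaTokenizer.Create(
    "vocab.json", "merges.txt", "dict.txt");

// Round-trip a string through the tokenizer.
IReadOnlyList<int> ids = tokenizer.EncodeToIds("Hello world!");
string roundTrip = tokenizer.Decode(ids);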
Applies to
Create(Stream, Stream, Stream, PreTokenizer, Normalizer, Boolean)
- Source:
- EnglishRobertaTokenizer.cs
Creates the tokenizer's model object to use with the English RoBERTa model.
public static Microsoft.ML.Tokenizers.EnglishRobertaTokenizer Create(System.IO.Stream vocabularyStream, System.IO.Stream mergeStream, System.IO.Stream highestOccurrenceMappingStream, Microsoft.ML.Tokenizers.PreTokenizer? preTokenizer = default, Microsoft.ML.Tokenizers.Normalizer? normalizer = default, bool filterUnsupportedChars = true);
static member Create : System.IO.Stream * System.IO.Stream * System.IO.Stream * Microsoft.ML.Tokenizers.PreTokenizer * Microsoft.ML.Tokenizers.Normalizer * bool -> Microsoft.ML.Tokenizers.EnglishRobertaTokenizer
Public Shared Function Create (vocabularyStream As Stream, mergeStream As Stream, highestOccurrenceMappingStream As Stream, Optional preTokenizer As PreTokenizer = Nothing, Optional normalizer As Normalizer = Nothing, Optional filterUnsupportedChars As Boolean = true) As EnglishRobertaTokenizer
Parameters
- vocabularyStream
- Stream
The stream of a JSON file containing the dictionary of string keys and their ids.
- mergeStream
- Stream
The stream of a file containing the token pairs (merges) list.
- highestOccurrenceMappingStream
- Stream
The stream of a file that remaps the original GPT-2 model IDs to high-occurrence ranks and values.
- preTokenizer
- PreTokenizer
The pre-tokenizer to use.
- normalizer
- Normalizer
The normalizer to use.
- filterUnsupportedChars
- Boolean
Indicates whether to filter out unsupported characters during decoding.
Returns
EnglishRobertaTokenizer
The tokenizer's model object created from the provided streams.
Remarks
When creating the tokenizer, ensure that the vocabulary stream is sourced from a trusted provider.
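Example
A minimal sketch showing the optional parameters with this stream-based overload; the file names are placeholder assumptions. Passing null for preTokenizer and normalizer keeps the overload's defaults, per the signature above.

using System.IO;
using Microsoft.ML.Tokenizers;

using Stream vocabStream = File.OpenRead("vocab.json");
using Stream mergeStream = File.OpenRead("merges.txt");
using Stream mappingStream = File.OpenRead("dict.txt");

// Keep the default pre-tokenizer and normalizer, but retain unsupported
// characters in decoded output instead of filtering them.
EnglishRobertaTokenizer tokenizer = EnglishRobertaTokenizer.Create(
    vocabStream, mergeStream, mappingStream,
    preTokenizer: null,
    normalizer: null,
    filterUnsupportedChars: false);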
Applies to
Create(String, String, String, PreTokenizer, Normalizer, Boolean)
- Source:
- EnglishRobertaTokenizer.cs
Creates the tokenizer's model object to use with the English RoBERTa model.
public static Microsoft.ML.Tokenizers.EnglishRobertaTokenizer Create(string vocabularyPath, string mergePath, string highestOccurrenceMappingPath, Microsoft.ML.Tokenizers.PreTokenizer? preTokenizer = default, Microsoft.ML.Tokenizers.Normalizer? normalizer = default, bool filterUnsupportedChars = true);
static member Create : string * string * string * Microsoft.ML.Tokenizers.PreTokenizer * Microsoft.ML.Tokenizers.Normalizer * bool -> Microsoft.ML.Tokenizers.EnglishRobertaTokenizer
Public Shared Function Create (vocabularyPath As String, mergePath As String, highestOccurrenceMappingPath As String, Optional preTokenizer As PreTokenizer = Nothing, Optional normalizer As Normalizer = Nothing, Optional filterUnsupportedChars As Boolean = true) As EnglishRobertaTokenizer
Parameters
- vocabularyPath
- String
The JSON file path containing the dictionary of string keys and their ids.
- mergePath
- String
The file path containing the token pairs (merges) list.
- highestOccurrenceMappingPath
- String
The path of the file that remaps the original GPT-2 model IDs to high-occurrence ranks and values.
- preTokenizer
- PreTokenizer
The pre-tokenizer to use.
- normalizer
- Normalizer
The normalizer to use.
- filterUnsupportedChars
- Boolean
Indicates whether to filter out unsupported characters during decoding.
Returns
EnglishRobertaTokenizer
The tokenizer's model object created from the specified files.
Remarks
When creating the tokenizer, ensure that the vocabulary file is sourced from a trusted provider.
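Example
A minimal sketch of the path-based overload with named optional arguments; the file paths are placeholder assumptions.

using Microsoft.ML.Tokenizers;

EnglishRobertaTokenizer tokenizer = EnglishRobertaTokenizer.Create(
    "vocab.json", "merges.txt", "dict.txt",
    preTokenizer: null,             // null keeps the default pre-tokenizer
    normalizer: null,               // null keeps the default (no) normalization
    filterUnsupportedChars: false); // keep unsupported characters when decoding
Applies to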