BertTokenizer.EncodeToIds Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads
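| EncodeToIds(String, Int32, Boolean, String, Int32, Boolean, Boolean) | Encodes a string to at most maxTokenCount token Ids with an explicit addSpecialTokens switch, and reports the normalized text and the number of characters consumed. |
| EncodeToIds(ReadOnlySpan<Char>, Int32, Boolean, String, Int32, Boolean, Boolean) | Encodes a character span to at most maxTokenCount token Ids with an explicit addSpecialTokens switch, and reports the normalized text and the number of characters consumed. |
| EncodeToIds(String, Int32, String, Int32, Boolean, Boolean) | Encodes a string to at most maxTokenCount token Ids, and reports the normalized text and the number of characters consumed. |
| EncodeToIds(ReadOnlySpan<Char>, Int32, String, Int32, Boolean, Boolean) | Encodes a character span to at most maxTokenCount token Ids, and reports the normalized text and the number of characters consumed. |
| EncodeToIds(ReadOnlySpan<Char>, Boolean, Boolean, Boolean) | Encodes a character span to token Ids with an explicit addSpecialTokens switch. |
| EncodeToIds(String, Boolean, Boolean) | Encodes a string to token Ids. |
| EncodeToIds(ReadOnlySpan<Char>, Boolean, Boolean) | Encodes a character span to token Ids. |
| EncodeToIds(String, Boolean, Boolean, Boolean) | Encodes a string to token Ids with an explicit addSpecialTokens switch. |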
EncodeToIds(String, Int32, Boolean, String, Int32, Boolean, Boolean)
- Source:
- BertTokenizer.cs
Encodes input text to token Ids.
public System.Collections.Generic.IReadOnlyList<int> EncodeToIds(string text, int maxTokenCount, bool addSpecialTokens, out string? normalizedText, out int charsConsumed, bool considerPreTokenization = true, bool considerNormalization = true);
override this.EncodeToIds : string * int * bool * string * int * bool * bool -> System.Collections.Generic.IReadOnlyList<int>
Public Function EncodeToIds (text As String, maxTokenCount As Integer, addSpecialTokens As Boolean, ByRef normalizedText As String, ByRef charsConsumed As Integer, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of Integer)
Parameters
- text (String): The text to encode.
- maxTokenCount (Int32): The maximum number of tokens to return.
- addSpecialTokens (Boolean): Indicates whether to add special tokens to the encoded Ids.
- normalizedText (String): Output parameter that receives the normalized text. Declared as string? in C#, so it can be null.
- charsConsumed (Int32): Output parameter that receives the number of characters consumed from the input text.
- considerPreTokenization (Boolean): Indicates whether to apply pre-tokenization before tokenization. Optional; defaults to true.
- considerNormalization (Boolean): Indicates whether to apply normalization before tokenization. Optional; defaults to true.
Returns
The list of encoded Ids.
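The following is a minimal sketch of calling this overload, not an official sample. It assumes the Microsoft.ML.Tokenizers package is referenced and that BertTokenizer.Create accepts a vocabulary file path (an assumption; this page does not document how the tokenizer is constructed). The toy vocabulary and the temp-file name are hypothetical stand-ins for a real model's vocab.txt.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.ML.Tokenizers;

// Hypothetical toy vocabulary, one token per line; a real model ships its own vocab.txt.
string vocabPath = Path.Combine(Path.GetTempPath(), "toy-bert-vocab.txt");
File.WriteAllLines(vocabPath, new[]
{
    "[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world", "again"
});

// Assumption: BertTokenizer.Create(string) builds a tokenizer from a vocabulary file.
BertTokenizer tokenizer = BertTokenizer.Create(vocabPath);

string text = "Hello world again";

// Ask for at most 4 Ids, including any special tokens that are added.
IReadOnlyList<int> ids = tokenizer.EncodeToIds(
    text,
    4,      // maxTokenCount
    true,   // addSpecialTokens
    out string? normalizedText,
    out int charsConsumed);

Console.WriteLine($"Ids: {string.Join(", ", ids)}");
Console.WriteLine($"Normalized text: {normalizedText ?? "(null)"}");
Console.WriteLine($"Characters consumed: {charsConsumed} of {text.Length}");
```

Comparing charsConsumed with text.Length is one way to detect that the input was truncated by maxTokenCount.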
EncodeToIds(ReadOnlySpan<Char>, Int32, Boolean, String, Int32, Boolean, Boolean)
- Source:
- BertTokenizer.cs
Encodes input text to token Ids.
public System.Collections.Generic.IReadOnlyList<int> EncodeToIds(ReadOnlySpan<char> text, int maxTokenCount, bool addSpecialTokens, out string? normalizedText, out int charsConsumed, bool considerPreTokenization = true, bool considerNormalization = true);
override this.EncodeToIds : ReadOnlySpan<char> * int * bool * string * int * bool * bool -> System.Collections.Generic.IReadOnlyList<int>
Public Function EncodeToIds (text As ReadOnlySpan(Of Char), maxTokenCount As Integer, addSpecialTokens As Boolean, ByRef normalizedText As String, ByRef charsConsumed As Integer, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of Integer)
Parameters
- text (ReadOnlySpan<Char>): The text to encode.
- maxTokenCount (Int32): The maximum number of tokens to return.
- addSpecialTokens (Boolean): Indicates whether to add special tokens to the encoded Ids.
- normalizedText (String): Output parameter that receives the normalized text. Declared as string? in C#, so it can be null.
- charsConsumed (Int32): Output parameter that receives the number of characters consumed from the input text.
- considerPreTokenization (Boolean): Indicates whether to apply pre-tokenization before tokenization. Optional; defaults to true.
- considerNormalization (Boolean): Indicates whether to apply normalization before tokenization. Optional; defaults to true.
Returns
The list of encoded Ids.
EncodeToIds(String, Int32, String, Int32, Boolean, Boolean)
- Source:
- BertTokenizer.cs
Encodes input text to token Ids.
public System.Collections.Generic.IReadOnlyList<int> EncodeToIds(string text, int maxTokenCount, out string? normalizedText, out int charsConsumed, bool considerPreTokenization = true, bool considerNormalization = true);
override this.EncodeToIds : string * int * string * int * bool * bool -> System.Collections.Generic.IReadOnlyList<int>
Public Function EncodeToIds (text As String, maxTokenCount As Integer, ByRef normalizedText As String, ByRef charsConsumed As Integer, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of Integer)
Parameters
- text (String): The text to encode.
- maxTokenCount (Int32): The maximum number of tokens to return.
- normalizedText (String): Output parameter that receives the normalized text. Declared as string? in C#, so it can be null.
- charsConsumed (Int32): Output parameter that receives the number of characters consumed from the input text.
- considerPreTokenization (Boolean): Indicates whether to apply pre-tokenization before tokenization. Optional; defaults to true.
- considerNormalization (Boolean): Indicates whether to apply normalization before tokenization. Optional; defaults to true.
Returns
The list of encoded Ids.
EncodeToIds(ReadOnlySpan<Char>, Int32, String, Int32, Boolean, Boolean)
- Source:
- BertTokenizer.cs
Encodes input text to token Ids.
public System.Collections.Generic.IReadOnlyList<int> EncodeToIds(ReadOnlySpan<char> text, int maxTokenCount, out string? normalizedText, out int charsConsumed, bool considerPreTokenization = true, bool considerNormalization = true);
override this.EncodeToIds : ReadOnlySpan<char> * int * string * int * bool * bool -> System.Collections.Generic.IReadOnlyList<int>
Public Function EncodeToIds (text As ReadOnlySpan(Of Char), maxTokenCount As Integer, ByRef normalizedText As String, ByRef charsConsumed As Integer, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of Integer)
Parameters
- text (ReadOnlySpan<Char>): The text to encode.
- maxTokenCount (Int32): The maximum number of tokens to return.
- normalizedText (String): Output parameter that receives the normalized text. Declared as string? in C#, so it can be null.
- charsConsumed (Int32): Output parameter that receives the number of characters consumed from the input text.
- considerPreTokenization (Boolean): Indicates whether to apply pre-tokenization before tokenization. Optional; defaults to true.
- considerNormalization (Boolean): Indicates whether to apply normalization before tokenization. Optional; defaults to true.
Returns
The list of encoded Ids.
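One possible use of this overload is splitting a long input into windows of bounded token count by slicing a span instead of allocating substrings. The sketch below is illustrative only. It assumes the Microsoft.ML.Tokenizers package is referenced, that BertTokenizer.Create accepts a vocabulary file path, and that charsConsumed counts characters of the input span as the parameter description states; if normalization changes the text length in your scenario, verify the offsets before relying on them. The toy vocabulary and the window size of 5 are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.ML.Tokenizers;

// Hypothetical toy vocabulary; a real model ships its own vocab.txt.
string vocabPath = Path.Combine(Path.GetTempPath(), "toy-bert-vocab.txt");
File.WriteAllLines(vocabPath, new[]
{
    "[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world", "again"
});

// Assumption: BertTokenizer.Create(string) builds a tokenizer from a vocabulary file.
BertTokenizer tokenizer = BertTokenizer.Create(vocabPath);

string text = "hello world again hello world again";
const int maxTokensPerChunk = 5; // hypothetical window size
var chunks = new List<IReadOnlyList<int>>();
int offset = 0;

while (offset < text.Length)
{
    // Slice the unread part of the input without copying it.
    ReadOnlySpan<char> remaining = text.AsSpan(offset);

    IReadOnlyList<int> ids = tokenizer.EncodeToIds(
        remaining, maxTokensPerChunk, out _, out int charsConsumed);

    // Stop if the tokenizer made no progress, to avoid looping forever.
    if (charsConsumed <= 0)
    {
        break;
    }

    chunks.Add(ids);
    offset = Math.Min(text.Length, offset + charsConsumed);
}

Console.WriteLine($"Produced {chunks.Count} chunk(s); consumed {offset} of {text.Length} characters.");
```

The zero-progress check matters when maxTokenCount is too small to fit any token of the remaining text.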
EncodeToIds(ReadOnlySpan<Char>, Boolean, Boolean, Boolean)
- Source:
- BertTokenizer.cs
Encodes input text to token Ids.
public System.Collections.Generic.IReadOnlyList<int> EncodeToIds(ReadOnlySpan<char> text, bool addSpecialTokens, bool considerPreTokenization = true, bool considerNormalization = true);
override this.EncodeToIds : ReadOnlySpan<char> * bool * bool * bool -> System.Collections.Generic.IReadOnlyList<int>
Public Function EncodeToIds (text As ReadOnlySpan(Of Char), addSpecialTokens As Boolean, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of Integer)
Parameters
- text (ReadOnlySpan<Char>): The text to encode.
- addSpecialTokens (Boolean): Indicates whether to add special tokens to the encoded Ids.
- considerPreTokenization (Boolean): Indicates whether to apply pre-tokenization before tokenization. Optional; defaults to true.
- considerNormalization (Boolean): Indicates whether to apply normalization before tokenization. Optional; defaults to true.
Returns
The list of encoded Ids.
EncodeToIds(String, Boolean, Boolean)
- Source:
- BertTokenizer.cs
Encodes input text to token Ids.
public System.Collections.Generic.IReadOnlyList<int> EncodeToIds(string text, bool considerPreTokenization = true, bool considerNormalization = true);
override this.EncodeToIds : string * bool * bool -> System.Collections.Generic.IReadOnlyList<int>
Public Function EncodeToIds (text As String, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of Integer)
Parameters
- text (String): The text to encode.
- considerPreTokenization (Boolean): Indicates whether to apply pre-tokenization before tokenization. Optional; defaults to true.
- considerNormalization (Boolean): Indicates whether to apply normalization before tokenization. Optional; defaults to true.
Returns
The list of encoded Ids.
EncodeToIds(ReadOnlySpan<Char>, Boolean, Boolean)
- Source:
- BertTokenizer.cs
Encodes input text to token Ids.
public System.Collections.Generic.IReadOnlyList<int> EncodeToIds(ReadOnlySpan<char> text, bool considerPreTokenization = true, bool considerNormalization = true);
override this.EncodeToIds : ReadOnlySpan<char> * bool * bool -> System.Collections.Generic.IReadOnlyList<int>
Public Function EncodeToIds (text As ReadOnlySpan(Of Char), Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of Integer)
Parameters
- text (ReadOnlySpan<Char>): The text to encode.
- considerPreTokenization (Boolean): Indicates whether to apply pre-tokenization before tokenization. Optional; defaults to true.
- considerNormalization (Boolean): Indicates whether to apply normalization before tokenization. Optional; defaults to true.
Returns
The list of encoded Ids.
EncodeToIds(String, Boolean, Boolean, Boolean)
- Source:
- BertTokenizer.cs
Encodes input text to token Ids.
public System.Collections.Generic.IReadOnlyList<int> EncodeToIds(string text, bool addSpecialTokens, bool considerPreTokenization = true, bool considerNormalization = true);
override this.EncodeToIds : string * bool * bool * bool -> System.Collections.Generic.IReadOnlyList<int>
Public Function EncodeToIds (text As String, addSpecialTokens As Boolean, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of Integer)
Parameters
- text (String): The text to encode.
- addSpecialTokens (Boolean): Indicates whether to add special tokens to the encoded Ids.
- considerPreTokenization (Boolean): Indicates whether to apply pre-tokenization before tokenization. Optional; defaults to true.
- considerNormalization (Boolean): Indicates whether to apply normalization before tokenization. Optional; defaults to true.
Returns
The list of encoded Ids.
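To illustrate the effect of addSpecialTokens, the sketch below encodes the same string twice and prints both results. It is an example under stated assumptions, not an official sample: it assumes the Microsoft.ML.Tokenizers package is referenced and that BertTokenizer.Create accepts a vocabulary file path, and it uses a hypothetical toy vocabulary. The argument is passed by name because EncodeToIds(String, Boolean, Boolean) and EncodeToIds(String, Boolean, Boolean, Boolean) both accept a string followed by a Boolean, so naming addSpecialTokens makes the intended overload explicit.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.ML.Tokenizers;

// Hypothetical toy vocabulary; a real model ships its own vocab.txt.
string vocabPath = Path.Combine(Path.GetTempPath(), "toy-bert-vocab.txt");
File.WriteAllLines(vocabPath, new[]
{
    "[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world", "again"
});

// Assumption: BertTokenizer.Create(string) builds a tokenizer from a vocabulary file.
BertTokenizer tokenizer = BertTokenizer.Create(vocabPath);

string text = "hello world";

// Name the argument so the overload with addSpecialTokens is selected unambiguously.
IReadOnlyList<int> withSpecial = tokenizer.EncodeToIds(text, addSpecialTokens: true);
IReadOnlyList<int> withoutSpecial = tokenizer.EncodeToIds(text, addSpecialTokens: false);

Console.WriteLine($"With special tokens:    {string.Join(", ", withSpecial)}");
Console.WriteLine($"Without special tokens: {string.Join(", ", withoutSpecial)}");
```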