PreTokenizer.CreateWhiteSpace(IReadOnlyDictionary<String,Int32>) Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Creates a new instance of the PreTokenizer class that splits text at whitespace.
public static Microsoft.ML.Tokenizers.PreTokenizer CreateWhiteSpace(System.Collections.Generic.IReadOnlyDictionary<string,int>? specialTokens = default);
static member CreateWhiteSpace : System.Collections.Generic.IReadOnlyDictionary<string, int> -> Microsoft.ML.Tokenizers.PreTokenizer
Public Shared Function CreateWhiteSpace (Optional specialTokens As IReadOnlyDictionary(Of String, Integer) = Nothing) As PreTokenizer
Parameters
- specialTokens
- IReadOnlyDictionary<String,Int32>
An optional dictionary that maps special tokens to their corresponding IDs. Defaults to null.
Returns
A pre-tokenizer that splits text at whitespace.
Remarks
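This pre-tokenizer uses the regex pattern "\S+" to split the text into tokens. The pattern matches each run of consecutive non-whitespace characters, so punctuation stays attached to the adjacent word.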
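The following is a minimal sketch of what the "\S+" pattern selects. It applies the pattern directly with System.Text.RegularExpressions and does not call Microsoft.ML.Tokenizers, so it only illustrates the matching rule and not the library's own implementation or its handling of specialTokens. The sample text is made up for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustration only: runs the "\S+" pattern with the standard .NET regex engine.
// The input string is a hypothetical sample, not taken from the library.
string text = "Hello,  world!\tfoo";

foreach (Match match in Regex.Matches(text, @"\S+"))
{
    // Index and Length locate each piece inside the original string.
    Console.WriteLine($"'{match.Value}' at offset {match.Index}, length {match.Length}");
}

// Prints:
// 'Hello,' at offset 0, length 6
// 'world!' at offset 8, length 6
// 'foo' at offset 15, length 3
```

Consecutive spaces and the tab are skipped without producing empty pieces, and "Hello," keeps its comma because "\S" matches any non-whitespace character, including punctuation.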