The error you are seeing, Microsoft.Exchange.Management.Tasks.ErrorFileHasNoTextContentException, indicates that the parser did not find any extractable text in the file, even though it is a regular .docx file. Here are some possible causes and troubleshooting steps:
- Check Document Content: Ensure that the text in the document is not embedded in a way that makes it unreadable by the parser (e.g., as an image or within text boxes).
- Image Content: Since your document contains an image, check whether the image is interfering with text extraction, for example by testing a copy of the document with the image removed. Also confirm that the text sits in the main body; text that appears only in headers, footers, or text boxes may not be detected.
- File Integrity: Ensure that the .docx file is not corrupted. Try opening it in Microsoft Word and saving it again to create a fresh copy.
- Test with Different Formats: Since you mentioned that the SIT works with a .txt file, save the content of the .docx file as .txt and test again. If the .txt version works, the problem most likely lies in how the .docx file is parsed rather than in the text itself.
- Document Size and Complexity: If the document is large or complex, test a simplified copy (for example, one with plain body text only) to see if that resolves the issue.
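To narrow down the first few checks, here is a minimal Python sketch that looks inside the .docx package directly. It uses only the standard library; the function name inspect_docx and the report keys are made up for this illustration. It does not reproduce the parser that raises the exception, so treat the result as a hint about where the text lives, not as proof of what that parser can read.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the parts of a .docx package.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_T = f"{{{W_NS}}}t"               # a run of text
W_TXBX = f"{{{W_NS}}}txbxContent"  # the content of a text box


def _collect(elem, in_box, outside, boxed):
    """Walk the XML tree and split text into 'outside' and 'inside text boxes'."""
    if elem.tag == W_TXBX:
        in_box = True
    if elem.tag == W_T and elem.text:
        (boxed if in_box else outside).append(elem.text)
    for child in elem:
        _collect(child, in_box, outside, boxed)


def inspect_docx(path):
    """Hypothetical helper: report where text lives in a .docx file.

    'path' may be a file name or a binary file-like object.
    Text-box text can be counted twice when Word stores a fallback copy,
    so read the counts as "zero or not zero", not as exact lengths.
    """
    report = {
        "bad_member": None,        # first corrupt part in the ZIP, if any
        "body_chars": 0,           # main-body text outside text boxes
        "textbox_chars": 0,        # text found inside text boxes
        "header_footer_chars": 0,  # text in headers and footers
    }
    with zipfile.ZipFile(path) as zf:
        report["bad_member"] = zf.testzip()  # CRC check of every part
        for name in zf.namelist():
            if not (name.startswith("word/") and name.endswith(".xml")):
                continue
            part = name[len("word/"):]
            is_body = part == "document.xml"
            is_hf = part.startswith(("header", "footer"))
            if not (is_body or is_hf):
                continue
            outside, boxed = [], []
            _collect(ET.fromstring(zf.read(name)), False, outside, boxed)
            report["textbox_chars"] += len("".join(boxed))
            key = "body_chars" if is_body else "header_footer_chars"
            report[key] += len("".join(outside))
    return report
```

Calling inspect_docx("sample.docx") (with your own file name) returns a small dictionary. If body_chars is 0 while textbox_chars or header_footer_chars is not, the text is probably stored somewhere other than the main body, so moving it into ordinary paragraphs is worth a try. A bad_member value other than None, or a zipfile.BadZipFile error, points to a damaged file.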
If these steps do not resolve the issue, you may want to consult further documentation on sensitive information types and their requirements for processing.