Unsupported File Type Extension while Following Azure Exercise for RAG-based Solutions

Jonathan Nguyen 0 Reputation points
2025-12-05T19:43:18.0966667+00:00

Hey,

I'm following along with this exercise: https://learn.microsoft.com/en-us/training/modules/build-copilot-ai-studio/5-exercise

I downloaded the provided brochure.zip file, which only contains PDF files.

  • I was able to successfully upload the brochure.zip as data in my AIHUB Foundry
  • I created an index off of this data.

Index creation failed after about 5-10 minutes with the following error message:


input_data=/mnt/azureml/cr/j/3780882ac51644ffb9d4a2905d7b30e9/cap/data-capability/wd/INPUT_input_data/brochures.zip    

num_embedded = create_embeddings(
  File "/azureml-envs/rag-embeddings/lib/python3.9/site-packages/azureml/rag/tasks/embed.py", line 312, in create_embeddings
    for chunk in chunks:
  File "/azureml-envs/rag-embeddings/lib/python3.9/site-packages/azureml/rag/tasks/crack_and_chunk_and_embed.py", line 218, in documents_to_embed
    for chunked_doc in chunked_docs:
  File "/azureml-envs/rag-embeddings/lib/python3.9/site-packages/azureml/rag/documents/chunking.py", line 169, in split_documents
    for i, document in enumerate(documents):
  File "/azureml-envs/rag-embeddings/lib/python3.9/site-packages/azureml/rag/documents/cracking.py", line 343, in crack_documents
    for i, source in enumerate(sources):
  File "/azureml-envs/rag-embeddings/lib/python3.9/site-packages/azureml/rag/tasks/crack_and_chunk_and_embed.py", line 120, in sources_to_embed
    for source_doc in source_documents:
  File "/azureml-envs/rag-embeddings/lib/python3.9/site-packages/azureml/rag/tasks/crack_and_chunk.py", line 144, in filter_and_log_extensions
    raise Exception(
Exception: None of the provided file extensions are supported. List of supported file extensions is ['.txt', '.md', '.html', '.htm', '.py', '.pdf', '.ppt', '.pptx', '.doc', '.docx', '.xls', '.xlsx', '.csv', '.json']


This doesn't seem to make much sense to me since the .zip clearly contains supported file formats.

Azure AI Content Safety
An Azure service that enables users to identify content that is potentially offensive, risky, or otherwise undesirable. Previously known as Azure Content Moderator.

1 answer

  1. Q&A Assist
    2025-12-05T19:43:28.0566667+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    The error indicates that the indexing job filters its input by file extension, and the only file it received was the ZIP archive itself. The .zip extension is not in the supported list, and the traceback suggests that the job does not unpack archives before applying the filter, so the PDF files inside the archive are never seen. Here are some steps you can take to resolve this issue:

    1. Extract the PDF Files: Unzip the brochure.zip file on your local machine and ensure that you are uploading the individual PDF files rather than the ZIP file itself.
    2. Check File Integrity: Ensure that the PDF files are not corrupted and are in a supported format. You can open them locally to verify.
    3. Upload Supported Formats: Make sure that you are only uploading files with the supported extensions listed in the error message: .txt, .md, .html, .htm, .py, .pdf, .ppt, .pptx, .doc, .docx, .xls, .xlsx, .csv, .json.
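
    As a rough illustration of steps 1 to 3, the following Python sketch extracts the archive locally, keeps only the files whose extension appears in the supported list from the error message, and runs a cheap check that a PDF begins with the %PDF- header. The function names extract_supported and looks_like_pdf are hypothetical and chosen for this example; none of this is part of the azureml-rag package, and the extracted files still have to be uploaded as data in the usual way.

```python
import zipfile
from pathlib import Path

# Copied from the exception message in the question.
SUPPORTED_EXTENSIONS = {
    ".txt", ".md", ".html", ".htm", ".py", ".pdf", ".ppt", ".pptx",
    ".doc", ".docx", ".xls", ".xlsx", ".csv", ".json",
}


def extract_supported(zip_path, out_dir):
    """Extract the archive and return the extracted files with a supported extension."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out)
    # The comparison is case-insensitive here; whether the indexing job
    # treats ".PDF" like ".pdf" is an assumption, not something the
    # traceback confirms.
    return sorted(
        p for p in out.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def looks_like_pdf(path):
    """Cheap sanity check: a PDF file normally starts with the bytes %PDF-."""
    with open(path, "rb") as handle:
        return handle.read(5) == b"%PDF-"
```

    For example, calling extract_supported("brochures.zip", "brochures") would return the paths of the extracted files to upload individually, and looks_like_pdf can be applied to each .pdf path before uploading. The header check only catches obviously wrong files; opening the PDFs locally remains the more reliable test.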

    If you follow these steps and still encounter issues, it may be helpful to check the documentation for any additional requirements or limitations regarding file uploads in Azure AI services.


