
Microsoft Graph API PDF Document Content

admin-blechinger 0 reputation points
2025-10-22T15:01:04.97+00:00

We want to request the content of a PDF file via the Microsoft Graph API. When requesting the document, we get an error. The resource exists in SharePoint, and requesting a DOCX document from the same SharePoint folder works.

This is the request: https://graph.microsoft.com/v1.0/sites/driveid/root:/folder/document.pdf:/content
I have replaced site, driveid, folder and document name with placeholders.

Please support.

Community Center | Not monitored

1 answer

  1. Sridevi Machavarapu 15,235 reputation points, Microsoft External Staff, Moderator
    2025-10-23T11:02:33.79+00:00

    Hello admin-blechinger,

    I understand you are trying to get the content of a PDF file from SharePoint using the Microsoft Graph API, but your request fails while DOCX files work.

    The Graph API /content endpoint returns the file as a raw binary stream, not as extracted text. To work with the text of a PDF, you need to download the binary content first and then extract the text using an external library.
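Since the body is raw bytes, it helps to handle it as `bytes` rather than decoded text and to confirm that it really is a PDF before passing it to a parser. The following is a minimal sketch, assuming the response body has already been downloaded; the helper name `looks_like_pdf` is hypothetical, and the check relies only on the `%PDF-` signature that PDF files start with.

```python
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(body: bytes) -> bool:
    """Return True if the downloaded bytes start with the PDF file signature."""
    return body.startswith(PDF_SIGNATURE)


if __name__ == "__main__":
    # A real PDF body starts with a header such as "%PDF-1.7".
    print(looks_like_pdf(b"%PDF-1.7\nrest of the file"))
    # A JSON error body returned instead of the file does not.
    print(looks_like_pdf(b'{"error": {"code": "itemNotFound"}}'))
```

If the check fails, the body is probably an error payload rather than the file, so it is worth printing it before attempting any text extraction.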

    For testing, I created a sample PDF file with some text content and uploaded it to a folder in a SharePoint document library.

    You can run the API calls below in Graph Explorer to get the site ID and the drive ID:

    Site ID:

    GET https://graph.microsoft.com/v1.0/sites/root:/sites/siteName
    


    Drive ID:

    GET https://graph.microsoft.com/v1.0/sites/siteIDfromAbove/drives?$filter=name eq 'driveName'
    
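Outside Graph Explorer, the two lookups above need their URLs built in code, and the `$filter` expression contains spaces and quotes that should be percent-encoded. The following is a minimal sketch under that assumption; the helper names `site_lookup_url` and `drive_lookup_url` are hypothetical, and only the URL shapes shown above are taken from this answer.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def site_lookup_url(site_name: str) -> str:
    """Build the URL that resolves a site by its name under /sites/."""
    return f"{GRAPH_BASE}/sites/root:/sites/{quote(site_name)}"


def drive_lookup_url(site_id: str, drive_name: str) -> str:
    """Build the URL that lists the drives of a site, filtered by drive name."""
    # OData string literals escape a single quote by doubling it.
    literal = drive_name.replace("'", "''")
    # Percent-encode the whole filter expression (spaces and quotes included).
    filter_expr = quote(f"name eq '{literal}'")
    return f"{GRAPH_BASE}/sites/{site_id}/drives?$filter={filter_expr}"


if __name__ == "__main__":
    print(site_lookup_url("siteName"))
    print(drive_lookup_url("siteIDfromAbove", "driveName"))
```

The site ID is inserted as returned by the first call; it is a comma-separated value and is left unencoded here.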


    In my case, I used the client credentials flow to get the token and granted the Files.Read.All permission of Application type.
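If the download is rejected with an authorization error, one quick check is whether the app-only token actually carries the granted permission. The following is a debugging sketch only, assuming the access token is a JWT whose payload can be decoded and that application permissions appear in its `roles` claim; the helper name `token_roles` is hypothetical, and the signature is deliberately not verified.

```python
import base64
import json


def token_roles(access_token: str) -> list:
    """Return the 'roles' claim of a JWT access token (signature is NOT verified)."""
    payload_b64 = access_token.split(".")[1]
    # JWT segments are base64url-encoded without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims.get("roles", [])
```

For example, `"Files.Read.All" in token_roles(access_token)` should be true once admin consent has been granted and a fresh token has been requested.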


    I then used the Python code below to get an access token, download the PDF from SharePoint, and extract its text using PyPDF2:

    # pip install msal requests PyPDF2
    from urllib.parse import quote

    import msal
    import requests
    from PyPDF2 import PdfReader

    # Placeholders: replace with your own tenant, app registration, and file location.
    TENANT_ID = "tenantId"
    CLIENT_ID = "appId"
    CLIENT_SECRET = "secret"

    SITE_ID = "siteId"
    DRIVE_ID = "driveId"
    FILE_PATH = "/DemoFolder/demo.pdf"

    authority = f"https://login.microsoftonline.com/{TENANT_ID}"
    scope = ["https://graph.microsoft.com/.default"]

    app = msal.ConfidentialClientApplication(
        client_id=CLIENT_ID,
        client_credential=CLIENT_SECRET,
        authority=authority
    )

    # Client credentials flow: request an app-only token for Microsoft Graph.
    result = app.acquire_token_for_client(scopes=scope)

    access_token = result.get("access_token")
    if not access_token:
        raise Exception(f"Failed to acquire access token: {result.get('error_description')}")

    # Percent-encode the path so that names with spaces or special characters still resolve.
    content_url = f"https://graph.microsoft.com/v1.0/sites/{SITE_ID}/drives/{DRIVE_ID}/root:{quote(FILE_PATH)}:/content"
    headers = {"Authorization": f"Bearer {access_token}"}

    # /content returns the raw bytes of the file; requests follows the download redirect automatically.
    response = requests.get(content_url, headers=headers, timeout=30)

    if response.status_code == 200:
        pdf_file = "demo_downloaded.pdf"
        # Write in binary mode: the body is a PDF byte stream, not text.
        with open(pdf_file, "wb") as f:
            f.write(response.content)
        print(f"PDF downloaded successfully: {pdf_file}")
    else:
        # Include the response body, which carries the Graph error code and message.
        raise Exception(f"Failed to download PDF: {response.status_code} {response.text}")

    # Extract the text page by page from the downloaded file.
    reader = PdfReader(pdf_file)
    print("\n--- Extracted Text from PDF ---")
    for page_num, page in enumerate(reader.pages, start=1):
        text = page.extract_text()
        if text:
            print(f"\n--- Page {page_num} ---\n{text.strip()}")
        else:
            # Scanned or image-only pages have no text layer to extract.
            print(f"\n--- Page {page_num} ---\n(No text found)")
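When the download fails, the status code alone rarely explains why. Microsoft Graph normally returns a JSON body with an `error` object containing `code` and `message`, so surfacing those makes failures like the one in the question easier to diagnose. The following is a minimal sketch; the helper name `describe_graph_error` is hypothetical, and it falls back to the raw body when the response is not in that shape.

```python
import json


def describe_graph_error(status_code: int, body: str) -> str:
    """Summarize a failed Graph response, using the error code and message when present."""
    try:
        parsed = json.loads(body)
    except ValueError:
        # Not JSON (for example an HTML error page or an empty body).
        parsed = None
    error = parsed.get("error") if isinstance(parsed, dict) else None
    if isinstance(error, dict):
        return f"HTTP {status_code}: {error.get('code', 'unknown')}: {error.get('message', '')}"
    return f"HTTP {status_code}: {body.strip() or 'no response body'}"
```

In the script above, this could be used as `raise Exception(describe_graph_error(response.status_code, response.text))` in place of the status-code-only message.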
    


    Hope this helps! Let me know if you need more help. Happy to assist.


    If this answers your question, please click Accept Answer and Yes for "Was this answer helpful", which may help members with similar questions.


    If you have any other questions or are still experiencing issues, feel free to ask in the "comments" section, and I'd be happy to help.

