Hello admin-blechinger,
I understand you are trying to get the content of a PDF file from SharePoint using the Microsoft Graph API, but the request fails for PDFs while DOCX files work.
The reason is that PDF files are returned as a binary stream from the Graph API /content endpoint. To handle PDFs, you need to download the binary content first and then extract text using external libraries.
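Because the endpoint hands back raw bytes, it can help to confirm the payload really is a PDF before passing it to a parser, since a failed request may return a JSON error body instead. The sketch below is only an illustration: `looks_like_pdf` is a hypothetical helper name, and the 1024-byte search window is an assumption based on how lenient PDF readers locate the `%PDF-` header.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload carries the PDF signature near its start."""
    # Lenient readers tolerate a few junk bytes before the header,
    # so search the first 1024 bytes instead of requiring offset 0.
    return b"%PDF-" in data[:1024]


if __name__ == "__main__":
    print(looks_like_pdf(b"%PDF-1.7\n%binary data"))      # PDF header present
    print(looks_like_pdf(b'{"error": {"code": "x"}}'))    # JSON error body
```

In the download step you could call this on `response.content` and raise a clear error when it returns False, rather than letting the PDF parser fail later.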
I have a PDF file with the sample content below:
I uploaded this PDF file to a SharePoint folder at this path:
You can run the API calls below in Graph Explorer to get the site ID and drive ID, respectively:
Site ID:
GET https://graph.microsoft.com/v1.0/sites/root:/sites/siteName
Drive ID:
GET https://graph.microsoft.com/v1.0/sites/siteIDfromAbove/drives?$filter=name eq 'driveName'
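If you would rather build these two lookup URLs in code than type them into Graph Explorer, a minimal sketch could look like the following. The function names are hypothetical, the URL shapes simply mirror the two requests above, and the percent-encoding and quote-doubling are my own additions for names that contain spaces or apostrophes.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def site_lookup_url(site_name: str) -> str:
    """Build the URL that resolves a site by its name under /sites/."""
    return f"{GRAPH_BASE}/sites/root:/sites/{quote(site_name)}"


def drive_lookup_url(site_id: str, drive_name: str) -> str:
    """Build the URL that lists the site's drives filtered by name."""
    # OData string literals escape a single quote by doubling it.
    escaped = drive_name.replace("'", "''")
    filter_expr = quote(f"name eq '{escaped}'")
    return f"{GRAPH_BASE}/sites/{site_id}/drives?$filter={filter_expr}"


if __name__ == "__main__":
    print(site_lookup_url("siteName"))
    print(drive_lookup_url("siteIDfromAbove", "driveName"))
```

Both URLs still need an `Authorization: Bearer` header when requested; the `id` field of each response is the value to reuse in later calls.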
In my case, I used the client credentials flow to get a token and granted the Files.Read.All permission of Application type.
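When the client credentials request fails (for example, because of a wrong secret or missing admin consent), the result dictionary returned by MSAL carries `error` and `error_description` fields instead of `access_token`. The sketch below surfaces those details; `require_access_token` is a hypothetical helper, not part of MSAL, and it only inspects a plain dictionary.

```python
def require_access_token(result: dict) -> str:
    """Return the access token or raise with the error details from the result."""
    token = result.get("access_token") if result else None
    if token:
        return token
    # Fall back to generic text when the result carries no error details.
    details = result or {}
    error = details.get("error", "unknown_error")
    description = details.get("error_description", "no description returned")
    raise RuntimeError(f"Token request failed: {error}: {description}")


if __name__ == "__main__":
    print(require_access_token({"access_token": "example-token"}))
```

Passing the result of `acquire_token_for_client` through a helper like this makes a failed sign-in easier to diagnose than a generic "failed to acquire token" message.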
I then used the Python code below to get an access token, download the PDF from SharePoint, and extract its text using PyPDF2:
import msal
import requests
# pip install PyPDF2
from PyPDF2 import PdfReader

# Placeholder values: replace with your own tenant, app, site, and drive details
TENANT_ID = "tenantId"
CLIENT_ID = "appId"
CLIENT_SECRET = "secret"
SITE_ID = "siteId"
DRIVE_ID = "driveId"
FILE_PATH = "/DemoFolder/demo.pdf"

authority = f"https://login.microsoftonline.com/{TENANT_ID}"
scope = ["https://graph.microsoft.com/.default"]

# Client credentials flow (app-only token)
app = msal.ConfidentialClientApplication(
    client_id=CLIENT_ID,
    client_credential=CLIENT_SECRET,
    authority=authority
)

# Try the token cache first, then request a new token
result = app.acquire_token_silent(scope, account=None)
if not result:
    result = app.acquire_token_for_client(scopes=scope)

access_token = result.get("access_token")
if not access_token:
    raise Exception("Failed to acquire access token")

# Download the raw PDF bytes from the /content endpoint
content_url = f"https://graph.microsoft.com/v1.0/sites/{SITE_ID}/drives/{DRIVE_ID}/root:{FILE_PATH}:/content"
headers = {"Authorization": f"Bearer {access_token}"}
response = requests.get(content_url, headers=headers)

if response.status_code == 200:
    # Save the binary content to a local file
    pdf_file = "demo_downloaded.pdf"
    with open(pdf_file, "wb") as f:
        f.write(response.content)
    print(f"PDF downloaded successfully: {pdf_file}")
else:
    raise Exception(f"Failed to download PDF: {response.status_code}")

# Extract the text page by page
reader = PdfReader(pdf_file)
print("\n--- Extracted Text from PDF ---")
for page_num, page in enumerate(reader.pages, start=1):
    text = page.extract_text()
    if text:
        print(f"\n--- Page {page_num} ---\n{text.strip()}")
    else:
        # Scanned or image-only pages have no extractable text layer
        print(f"\n--- Page {page_num} ---\n(No text found)")
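One thing to watch in the code above is that `FILE_PATH` is interpolated into the URL as-is, which works for `/DemoFolder/demo.pdf` but can break for names containing spaces or characters such as `#`. A small sketch of a safer builder follows; `build_content_url` is a hypothetical helper, and percent-encoding each path segment is my suggestion rather than something the original code does.

```python
from urllib.parse import quote


def build_content_url(site_id: str, drive_id: str, file_path: str) -> str:
    """Build the /content URL for a file addressed by its path in the drive."""
    # The path-based syntax expects a leading slash after "root:".
    if not file_path.startswith("/"):
        file_path = "/" + file_path
    # Encode everything except the slashes that separate folders.
    encoded_path = quote(file_path, safe="/")
    return (
        f"https://graph.microsoft.com/v1.0/sites/{site_id}"
        f"/drives/{drive_id}/root:{encoded_path}:/content"
    )


if __name__ == "__main__":
    print(build_content_url("siteId", "driveId", "/Demo Folder/report #1.pdf"))
```

You could then replace the `content_url` f-string with `build_content_url(SITE_ID, DRIVE_ID, FILE_PATH)`.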
Hope this helps! Let me know if you need more help. Happy to assist.