This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

[Bug]: SharePointReader fails to load file directory #901

Open
jamiesun opened this issue Jan 26, 2024 · 12 comments
Labels
bug (Something isn't working), triage

Comments

@jamiesun

jamiesun commented Jan 26, 2024

Bug Description


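import os

from llama_index import download_loader

# The reader is registered on LlamaHub as "SharePointReader"
SharePointLoader = download_loader("SharePointReader")

loader = SharePointLoader(
    client_id=os.environ.get("TEST_APP_CLIENT_ID"),
    client_secret=os.environ.get("TEST_APP_CLIENT_SECRET"),
    tenant_id=os.environ.get("TEST_TENENT_ID"),  # env var name kept as in the report
)

documents = loader.load_data(
    sharepoint_site_name="GPT",
    sharepoint_folder_path="Python",
    recursive=True,
)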

An error occurred while accessing SharePoint: {'code': 'itemNotFound', 'message': 'The resource could not be found.'}

Version

main

Steps to Reproduce

SharePoint config:

[image]

Relevant Logs/Tracebacks

No response

@jamiesun added the bug and triage labels on Jan 26, 2024
@arun04cbe

@jamiesun could you check whether the permissions are set as described here: https://llamahub.ai/l/microsoft_sharepoint?from=loaders

@brandon-vidoori

Same issue. Permissions are set up correctly in Azure/SharePoint

@arun04cbe

@jamiesun or @brandon-vidoori could you confirm whether you were trying to access only folders in the Documents component of the SharePoint site, and not other components such as Pages or Site Contents?

@brandon-vidoori

brandon-vidoori commented Jan 27, 2024

Yes, I am only trying to access folders/documents present in the Documents folder. It seems to fail on the Graph search for the SharePoint site; specifically, the query for the site name returns nothing.

f"https://graph.microsoft.com/v1.0/sites?search={sharepoint_site_name}"
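For anyone debugging the same step, here is a minimal offline sketch of what that search involves: building the URL with the site name percent-encoded, then picking a site out of an already-parsed response. The helper names `build_site_search_url` and `pick_site_by_name` are hypothetical and not part of SharePointReader, and the response shape (a `value` list whose entries carry `displayName` and `name`) is an assumption based on the Graph v1.0 site search response.

```python
from typing import Optional
from urllib.parse import quote

GRAPH_SITES_SEARCH = "https://graph.microsoft.com/v1.0/sites?search={query}"


def build_site_search_url(sharepoint_site_name: str) -> str:
    """Return the Graph search URL with the site name percent-encoded."""
    return GRAPH_SITES_SEARCH.format(query=quote(sharepoint_site_name, safe=""))


def pick_site_by_name(response_json: dict, sharepoint_site_name: str) -> Optional[dict]:
    """Return the first site whose displayName or name matches exactly (case-insensitive)."""
    wanted = sharepoint_site_name.casefold()
    for site in response_json.get("value", []):
        # Either field may be missing or None in a response, so fall back to "".
        names = (site.get("displayName") or "", site.get("name") or "")
        if wanted in (n.casefold() for n in names):
            return site
    return None
```

Printing the built URL and the raw `value` list next to each other should show whether the search itself comes back empty or whether the match is lost afterwards.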

@rupache

rupache commented Jan 27, 2024

I am also getting the same error: An error occurred while accessing SharePoint: {'code': 'itemNotFound', 'message': 'The resource could not be found.'}

for doc in documents:
TypeError: 'NoneType' object is not iterable
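The TypeError looks like a follow-on symptom: judging from the two messages above, `load_data` seems to print the SharePoint error and return `None` rather than raise (an assumption from the traceback, not confirmed in the loader source). A small guard turns that into a clearer failure. `require_documents` is a hypothetical helper, not part of the loader.

```python
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def require_documents(documents: Optional[Iterable[T]]) -> List[T]:
    """Return the documents as a list, or raise a clear error if the loader returned None."""
    if documents is None:
        raise RuntimeError(
            "load_data returned None; check the SharePoint error printed above "
            "(site name, folder path, permissions)."
        )
    return list(documents)
```

Usage would be along the lines of `for doc in require_documents(documents): ...`.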

@jamiesun
Author

@arun04cbe @brandon-vidoori

I switched to a different site without changing any code, and the code executed successfully; I tried other sites as well, and they worked too.

This seems a little strange to me. The site that fails is named GPT, and I don't know whether the name has something to do with it. When I created that site I used a mixture of English and Chinese in the title, and the system automatically generated GPT as the site name.

The sites that I succeeded in executing were all single English word site names without exception.

I'm not sure if it's a problem with SharePoint itself.

@brandon-vidoori

@jamiesun @arun04cbe

"The sites that I succeeded in executing were all single English word site names without exception."

The site I have been testing with has a name like “Data Science”, so that might be causing the issue. I will try with a site named “Data” to see if that succeeds.

The documentation for the Graph REST API site search is not clear on the expected behavior for partial matches, and it seems to suggest that a search for “Data” would return sites named “Data Science” and “Data Management”. GetSite seems more appropriate when an exact match is required, so the inflexibility we are noticing is bizarre, to say the least.

@arun04cbe

arun04cbe commented Jan 29, 2024

I faced this issue too. I am reading through the Microsoft documentation for a fix. The current loader is application-based only, but we also need a user-based loader, which I am working on. I will post updates here once it is fixed.

@rupache

rupache commented Jan 31, 2024

Can we store indexes in the SharePoint document library itself for persistence? That way the data will stay secure within the same domain.

@brandon-vidoori

@jamiesun @arun04cbe

SharePointReader returned the same error with a site that has a single-English-word name, "Data" from my previous example.

I know I have set up permissions correctly because I can debug SharePointReader locally, set a breakpoint, and step through the code until the access_token is generated, then use that same access_token in Postman with GET https://graph.microsoft.com/v1.0/sites?search=Data, and it succeeds. Not only do I get the Data site, but I also get the DataScience site that I previously created.
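The Postman check can be reproduced from a script. The sketch below only builds the two requests and does not send them. It assumes the standard OAuth 2.0 client-credentials flow against the Microsoft identity platform token endpoint, which may differ from what SharePointReader does internally. The function names are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) the client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": "https://graph.microsoft.com/.default",
        }
    ).encode("utf-8")
    return Request(url, data=body, method="POST")


def build_site_search_request(access_token: str, site_name: str) -> Request:
    """Build (but do not send) the authenticated Graph site search request."""
    url = f"https://graph.microsoft.com/v1.0/sites?search={quote(site_name, safe='')}"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"}, method="GET")
```

Each request can then be sent with `urllib.request.urlopen(request)` and the body parsed with `json.load`. The token response should carry the token under `access_token`.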

@arun04cbe

@brandon-vidoori Thanks for pointing out the exact problem. Will look to fix this up.

@brandon-vidoori

brandon-vidoori commented Feb 1, 2024

@arun04cbe

I eventually got it working.

This seems like less of a bug and more a case of the documentation on the config variables needing to be clearer. Provided permissions are set up correctly, consider the following:

sharepoint_site_name is just the name of the site, like “Data Science” or “Data”.

sharepoint_folder_path is just the name of any top-level folder in Documents, like “Tests”. If you pass “Documents/Tests” or “/Tests” it will fail. Use only the folder name. Note: I only tested with recursive = True.
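The folder-path rule above can be checked before calling the loader. This is a minimal sketch based only on the behavior reported in this comment (a bare top-level folder name works; “Documents/Tests” and “/Tests” fail). `check_top_level_folder` is a hypothetical helper, not part of SharePointReader.

```python
def check_top_level_folder(sharepoint_folder_path: str) -> str:
    """Return the folder name (whitespace-stripped) if it is a bare name; raise ValueError otherwise."""
    name = sharepoint_folder_path.strip()
    if not name:
        raise ValueError("sharepoint_folder_path is empty")
    if "/" in name or "\\" in name:
        # Any separator means a path was passed instead of a folder name.
        raise ValueError(
            f"{sharepoint_folder_path!r} looks like a path; pass only the folder name, e.g. 'Tests'"
        )
    return name
```

For example, `loader.load_data(sharepoint_site_name="Data", sharepoint_folder_path=check_top_level_folder("Tests"), recursive=True)` would pass the check, while `"Documents/Tests"` would be rejected before any request is made.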
