This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

feat: load SharePoint Pages content, feat: load docs from root folder in drive, feat: optionally only load specific file types. #930

79 changes: 79 additions & 0 deletions llama_hub/microsoft_sharepoint/README.md
@@ -20,12 +20,18 @@ More info on Microsoft Graph APIs - [Refer here](https://learn.microsoft.com/en-

To use this loader, the `client_id`, `client_secret`, and `tenant_id` of the app registered in the Microsoft Azure Portal are required.
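
One way to avoid hard-coding these three values is to read them from environment variables. The snippet below is only a sketch: the variable names (`SHAREPOINT_CLIENT_ID` and so on) and the `read_credentials` helper are hypothetical and not part of the loader.

```python
import os

# Hypothetical environment variable names; use whatever your deployment defines.
ENV_KEYS = {
    "client_id": "SHAREPOINT_CLIENT_ID",
    "client_secret": "SHAREPOINT_CLIENT_SECRET",
    "tenant_id": "SHAREPOINT_TENANT_ID",
}


def read_credentials(env=None):
    """Return the three loader credentials, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_KEYS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    # Map each constructor argument name to the value of its variable.
    return {arg: env[var] for arg, var in ENV_KEYS.items()}
```

The returned dictionary uses the same keys as the constructor arguments shown in the examples, so it could be unpacked as `SharePointLoader(**read_credentials())`.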

This loader can:
- Load files present in a specific folder in SharePoint
- Load all files present in the drive of a SharePoint site
- Load all pages under a SharePoint site

This loader loads the files present in a specific folder in SharePoint.

If the files are in the `Test` folder under the `root` directory of the SharePoint site, then the `file_path` input for the loader is `Test`.

![FilePath](file_path_info.png)

### Example loading a single folder
```python
from llama_index import download_loader
SharePointLoader = download_loader("SharePointReader")
@@ -43,5 +49,78 @@ documents = loader.load_data(
)
```

### Example loading all files and pages
```python
from llama_index import download_loader
SharePointLoader = download_loader("SharePointReader")

loader = SharePointLoader(
    client_id="<Client ID of the app>",
    client_secret="<Client Secret of the app>",
    tenant_id="<Tenant ID of the Microsoft Azure Directory>",
)

# Load both the site pages and the documents in the drive.
documents = loader.load_data(
    sharepoint_site_name="<SharePoint Site Name>",
    recursive=True,
    include=["pages", "documents"],
)
```

### Example loading just pages
```python
from llama_index import download_loader
SharePointLoader = download_loader("SharePointReader")

loader = SharePointLoader(
    client_id="<Client ID of the app>",
    client_secret="<Client Secret of the app>",
    tenant_id="<Tenant ID of the Microsoft Azure Directory>",
)

# Load only the site pages; documents in the drive are skipped.
documents = loader.load_data(
    sharepoint_site_name="<SharePoint Site Name>",
    recursive=True,
    include=["pages"],
)
```

### Example loading just documents
```python
from llama_index import download_loader
SharePointLoader = download_loader("SharePointReader")

loader = SharePointLoader(
    client_id="<Client ID of the app>",
    client_secret="<Client Secret of the app>",
    tenant_id="<Tenant ID of the Microsoft Azure Directory>",
)

# Load only the documents in the drive; site pages are skipped.
documents = loader.load_data(
    sharepoint_site_name="<SharePoint Site Name>",
    recursive=True,
    include=["documents"],
)
```

### Example loading just documents with filetype .docx or .pdf
```python
from llama_index import download_loader
SharePointLoader = download_loader("SharePointReader")

loader = SharePointLoader(
    client_id="<Client ID of the app>",
    client_secret="<Client Secret of the app>",
    tenant_id="<Tenant ID of the Microsoft Azure Directory>",
)

# Load only documents, restricted to the listed file types.
documents = loader.load_data(
    sharepoint_site_name="<SharePoint Site Name>",
    recursive=True,
    include=["documents"],
    file_types=["docx", "pdf"],
)
```
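
To illustrate what a `file_types` restriction means in practice, here is a minimal sketch of extension-based filtering. It is not the loader's actual implementation: `matches_file_types` is a hypothetical helper, and the case-insensitive matching and tolerance for a leading dot are assumptions rather than documented behavior.

```python
from pathlib import PurePosixPath


def matches_file_types(file_name, file_types=None):
    """Return True if file_name has one of the allowed extensions.

    Assumption: an empty or missing file_types list means "no restriction".
    """
    if not file_types:
        return True
    # Normalize entries such as "pdf", ".pdf", or "PDF" to a common form.
    allowed = {t.lower().lstrip(".") for t in file_types}
    suffix = PurePosixPath(file_name).suffix.lower().lstrip(".")
    return suffix in allowed
```

With `file_types=["docx", "pdf"]`, a file named `Report.PDF` would pass this check, while `notes.txt` or a file with no extension would not.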

The loader doesn't access other components of the `SharePoint Site`.
