Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce LanguageModel/TextProcessing OCP API #38854

Merged
merged 77 commits into from
Jul 21, 2023
Merged

Introduce LanguageModel/TextProcessing OCP API #38854

merged 77 commits into from
Jul 21, 2023

Conversation

marcelklehr
Copy link
Member

@marcelklehr marcelklehr commented Jun 16, 2023

Summary

This adds a LanguageModel namespace with an ILanguageModelManager that queries registered ILanguageModelProviders to either synchronously run or asynchronously schedule ILanguageModelTasks. When scheduled asynchronously, the tasks will be run in a background job and either a TaskFailedEvent or a TaskSuccessfulEvent is emitted with error message or the output, respectively. Currently implemented are the following task types: SummarizeTask, HeadlineTask, TopicsTask and FreePromptTask.

In coordination with #julien-nc we reworked this PR to merge it with another upcoming API. Now, this adds a TextProcessing namespace with an IManager that queries registered IProviders to either synchronously run or asynchronously schedule Tasks. When scheduled asynchronously, the tasks will be run in a background job and either a TaskFailedEvent or a TaskSuccessfulEvent is emitted with error message or the output, respectively. Currently implemented are the following task types: SummaryTaskType, HeadlineTaskType, TopicsTaskType and FreePromptTaskType. Apps can register providers for existing task types in core or use their own task types, which we may pull into core at some point, if we want.

TODO

  • ocs API with a polling endpoint to query the result
    • db needs an output column and ILanguageModelTask probably also needs getters and setters for that
  • make linters happy
  • automatically expire tasks in db

Checklist

@ChristophWurst ChristophWurst added enhancement 2. developing Work in progress integration pending documentation This pull request needs an associated documentation update labels Jun 16, 2023
@marcelklehr marcelklehr marked this pull request as ready for review June 21, 2023 08:41
Copy link
Member

@julien-nc julien-nc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noice!

  • Setting default provider per task type

Any idea on how this would be done? We could start and avoid thinking about making this appear in the UI. How would the information be set and stored? occ command, OCS endpoint? Stored in a dedicated table? Consumed by the manager's runTask method so it would really reflect the current value?

core/Controller/LanguageModelApiController.php Outdated Show resolved Hide resolved
lib/private/LanguageModel/LanguageModelManager.php Outdated Show resolved Hide resolved
lib/public/LanguageModel/ITopicsProvider.php Outdated Show resolved Hide resolved
lib/public/LanguageModel/SummaryTask.php Outdated Show resolved Hide resolved
@marcelklehr
Copy link
Member Author

Setting default provider per task type

Any idea on how this would be done?

I think it would make sense to do this in a new PR together with the settings for translations and STT.

@kesselb
Copy link
Contributor

kesselb commented Jun 27, 2023

Looks nice, thank you!

I've suggested replacing !empty with count() > 0. Backstory is https://localheinz.com/articles/2023/05/10/avoiding-empty-in-php/#content-array which recommends not to use empty when the code is properly typed.

lib/public/LanguageModel/ILanguageModelManager.php Outdated Show resolved Hide resolved
lib/public/LanguageModel/ILanguageModelManager.php Outdated Show resolved Hide resolved
lib/public/LanguageModel/ILanguageModelManager.php Outdated Show resolved Hide resolved
lib/public/LanguageModel/ILanguageModelProvider.php Outdated Show resolved Hide resolved
lib/public/LanguageModel/ILanguageModelProvider.php Outdated Show resolved Hide resolved
lib/public/LanguageModel/ILanguageModelProvider.php Outdated Show resolved Hide resolved
lib/public/LanguageModel/ISummaryProvider.php Outdated Show resolved Hide resolved
lib/public/LanguageModel/TopicsTask.php Outdated Show resolved Hide resolved
@marcelklehr
Copy link
Member Author

I've never written tests for server before, so please bear with me. I hope this first commit goes in the right direction :)

@marcelklehr
Copy link
Member Author

Tests are in :)

@ChristophWurst

This comment was marked as off-topic.

@marcelklehr
Copy link
Member Author

Hey :)

Are you familiar with the concept of unit tests, integration tests and their differences?

I'm somewhat familiar with unit and integration tests. I tend to not buy into unit tests and rather go with integration tests, as I feel unit tests have a low return on investment.

Asking because right now the tests do a lot and are not isolated and therefore probably not the fastest.

I did mock most if not all of the unrelated classes. Currently it takes 6s for me.

In my opinion there should be full coverage with real unit tests and integration tests for selected paths that are most important.

I will get to writing unit tests, then :) Can we finalize the API first, then? That would avoid blocking you for implementing the downstream stuff and me for the docs.

@ChristophWurst

This comment was marked as off-topic.

Copy link
Member

@julien-nc julien-nc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

lib/public/TextProcessing/Task.php Outdated Show resolved Hide resolved
Comment on lines +159 to +161
$this->jobList->add(TaskBackgroundJob::class, [
'taskId' => $task->getId()
]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best case scenario, the job runs next time cron.php is executed, right?
If so, maybe we could think about allowing providers to say they will be fast and run the task immediately. All good for now, just thinking ahead.

*
* @PublicPage
*/
public function taskTypes(): DataResponse {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, maybe it makes sense to have another endpoint to list the providers (maybe grouped by task type).

Also, maybe we could include the "selected provider" for each task type in the result of this taskTypes method.

WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need the providers on the frontend though? I think we would usually want to hide them, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah they wouldn't be displayed in the text processing UI component but maybe it's useful for the settings part where admins can choose the "default" provider for each task type.

Also, maybe we could include the "selected provider" for each task type in the result of this taskTypes method.

What do you think about that other aspect? It might make sense to let advanced users know which provider is used when choosing a task type. This information would not necessarily be very visible in the UI. But just in case we find a nice way to give more details if users want to know.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "selected provider" for each task type

I tend to think this should be hidden from the user. Only the admin should see this.

maybe it's useful for the settings part where admins can choose the "default" provider for each task type

I'll deal with that in the follow up PR :)

Signed-off-by: Marcel Klehr <[email protected]>
Copy link
Member

@julien-nc julien-nc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@marcelklehr marcelklehr merged commit 7c80d66 into master Jul 21, 2023
36 checks passed
@marcelklehr marcelklehr deleted the enh/llm-api branch July 21, 2023 09:20
@marcelklehr
Copy link
Member Author

🎉 🚢

@marcelklehr
Copy link
Member Author

Docs: nextcloud/documentation#10714

@AndyScherzinger
Copy link
Member

@marcelklehr backport to 27 needed?

@marcelklehr
Copy link
Member Author

@AndyScherzinger Yes! If we backport now, this is gonna land in 27.0.2, though.

@AndyScherzinger
Copy link
Member

Ah, right 👍

Good point 🫣

@marcelklehr
Copy link
Member Author

/backport to stable27

@backportbot-nextcloud
Copy link

The backport to stable27 failed. Please do this backport manually.

# Switch to the target branch and update it
git checkout stable27
git pull origin stable27

# Create the new backport branch
git checkout -b fix/foo-stable27

# Cherry pick the change from the commit sha1 of the change against the default branch
# This might cause conflicts. Resolve them.
git cherry-pick abc123

# Push the cherry pick commit to the remote repository and open a pull request
git push origin fix/foo-stable27

More info at https://docs.nextcloud.com/server/latest/developer_manual/getting_started/development_process.html#manual-backport

* along with this program. If not, see <http://www.gnu.org/licenses/>
*
*/
namespace OCP\Common\Exception;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite an anti pattern and we really try hard to get rid of the OC_Util and OCP\Util, so let's not start the next thing we can never change because it is used by multiple mechanisms in different ways.

Can we deprecate this (or even revert since it was never released apart from beta) and instead add a dedicated TaskNotFound exception inside the text processing namespace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was suggested to me by other reviewers 🤷
I'm on vacation until end of August. Feel free to change this in my absence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just read up on the comments... alright then :(

And enjoy your hols! 😎

$taskEntity = $this->taskMapper->find($id);
return $taskEntity->toPublicTask();
} catch (DoesNotExistException $e) {
throw new NotFoundException('Could not find task with the provided id');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per other comment, either (re)throw DoesNotExistException which is already a generic exception with the same meaning, or a dedicated TaskNotFound

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2. developing Work in progress backport-request enhancement integration pending documentation This pull request needs an associated documentation update
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add API to summarize text Localhosted language model (like llama.cpp)
8 participants