feat: classify emails by importance based on subjects #10277
Open
st3iny wants to merge 37 commits into main from enh/noid/classification-based-on-subject-V
37 commits
bca2af7 Classify emails based on subjects (st3iny)
dc47dc1 fixup! Classify emails based on subjects (st3iny)
056f67f fixup! Classify emails based on subjects (st3iny)
0be1a21 Cache features per sender (st3iny)
99dfa0a Implement preprocess command (st3iny)
28762e6 feat(importance-classifier): Reduce text feature vector (ChristophWurst)
af70adf fixup! feat(importance-classifier): Reduce text feature vector (ChristophWurst)
08b1e1b fixup! feat(importance-classifier): Reduce text feature vector (ChristophWurst)
d7cca9c fixup! feat(importance-classifier): Reduce text feature vector (ChristophWurst)
a9f7399 fixup! feat(importance-classifier): Reduce text feature vector (st3iny)
cee58bf fixup! feat(importance-classifier): Reduce text feature vector (st3iny)
0e82c52 fixup! feat(importance-classifier): Reduce text feature vector (st3iny)
bd82bea fixup! feat(importance-classifier): Reduce text feature vector (st3iny)
6c6e2ca fixup! feat(importance-classifier): Reduce text feature vector (st3iny)
18767c7 fixup! feat(importance-classifier): Reduce text feature vector (st3iny)
c764944 fixup! feat(importance-classifier): Reduce text feature vector (st3iny)
974ee46 fixup! feat(importance-classifier): Reduce text feature vector (st3iny)
f68501b fixup! feat(importance-classifier): Reduce text feature vector (st3iny)
7127cf7 fixup! feat(importance-classifier): Reduce text feature vector (st3iny)
c8c214c fixup! feat(importance-classifier): Reduce text feature vector (st3iny)
51f31bf fixup! fixup! feat(importance-classifier): Reduce text feature vector (st3iny)
3bc398b Try wcv -> tfidf pipeline (st3iny)
fed2011 Fix transformer persistence (st3iny)
e2c057c Refactor classifcation of new messages (st3iny)
bb9056d Refactor peristence (st3iny)
71b1b5f Adjust meta estimator params (st3iny)
877be83 Change training sample size to 300 (st3iny)
7c740b1 Adjust tuned knn params (st3iny)
1a5c5b5 Fix reuse compliance (st3iny)
a7510ca Run composer cs:fix (st3iny)
dade1df Fix most psalm issues (st3iny)
a808b2e Persist classifiers in memory cache only (st3iny)
379484a Revert "Adjust tuned knn params" (st3iny)
7da784e Finalize code changes (st3iny)
7b62b39 Run compser cs:fix (st3iny)
21d45eb Fix all remaining psalm issues (st3iny)
a7ea9c0 Run composer cs:fix (st3iny)
@@ -0,0 +1,65 @@
<?php

declare(strict_types=1);

/**
 * SPDX-FileCopyrightText: 2024 Nextcloud GmbH and Nextcloud contributors
 * SPDX-License-Identifier: AGPL-3.0-or-later
 */

namespace OCA\Mail\Command;

use OCA\Mail\Service\AccountService;
use OCA\Mail\Service\PreprocessingService;
use OCP\AppFramework\Db\DoesNotExistException;
use Psr\Log\LoggerInterface;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use function memory_get_peak_usage;

class PreprocessAccount extends Command {
	public const ARGUMENT_ACCOUNT_ID = 'account-id';

	private AccountService $accountService;
	private PreprocessingService $preprocessingService;
	private LoggerInterface $logger;

	public function __construct(AccountService $service,
		PreprocessingService $preprocessingService,
		LoggerInterface $logger) {
		parent::__construct();

		$this->accountService = $service;
		$this->preprocessingService = $preprocessingService;
		$this->logger = $logger;
	}

	/**
	 * @return void
	 */
	protected function configure() {
		$this->setName('mail:account:preprocess');
		$this->setDescription('Preprocess all mailboxes of an IMAP account');
		$this->addArgument(self::ARGUMENT_ACCOUNT_ID, InputArgument::REQUIRED);
	}

	protected function execute(InputInterface $input, OutputInterface $output): int {
		$accountId = (int)$input->getArgument(self::ARGUMENT_ACCOUNT_ID);

		try {
			$account = $this->accountService->findById($accountId);
		} catch (DoesNotExistException $e) {
			$output->writeln("<error>Account $accountId does not exist</error>");
			return 1;
		}

		$this->preprocessingService->process(4294967296, $account);

		$mbs = (int)(memory_get_peak_usage() / 1024 / 1024);
		$output->writeln('<info>' . $mbs . 'MB of memory used</info>');

		return 0;
	}
}
@@ -0,0 +1,116 @@
<?php

declare(strict_types=1);

/**
 * SPDX-FileCopyrightText: 2024 Nextcloud GmbH and Nextcloud contributors
 * SPDX-License-Identifier: AGPL-3.0-or-later
 */

namespace OCA\Mail\Command;

use OCA\Mail\Service\AccountService;
use OCA\Mail\Service\Classification\ImportanceClassifier;
use OCA\Mail\Support\ConsoleLoggerDecorator;
use OCP\AppFramework\Db\DoesNotExistException;
use OCP\IConfig;
use Psr\Log\LoggerInterface;
use Rubix\ML\Backends\Amp;
use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\CrossValidation\KFold;
use Rubix\ML\CrossValidation\Metrics\FBeta;
use Rubix\ML\GridSearch;
use Rubix\ML\Kernels\Distance\Euclidean;
use Rubix\ML\Kernels\Distance\Jaccard;
use Rubix\ML\Kernels\Distance\Manhattan;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

class RunMetaEstimator extends Command {
	public const ARGUMENT_ACCOUNT_ID = 'account-id';
	public const ARGUMENT_SHUFFLE = 'shuffle';

	private AccountService $accountService;
	private LoggerInterface $logger;
	private ImportanceClassifier $classifier;
	private IConfig $config;

	public function __construct(
		AccountService $accountService,
		LoggerInterface $logger,
		ImportanceClassifier $classifier,
		IConfig $config,
	) {
		parent::__construct();

		$this->accountService = $accountService;
		$this->logger = $logger;
		$this->classifier = $classifier;
		$this->config = $config;
	}

	protected function configure(): void {
		$this->setName('mail:account:run-meta-estimator');
		$this->setDescription('Run the meta estimator for an account');
		$this->addArgument(self::ARGUMENT_ACCOUNT_ID, InputArgument::REQUIRED);
		$this->addOption(self::ARGUMENT_SHUFFLE, null, null, 'Shuffle data set before training');
	}

	public function isEnabled(): bool {
		return $this->config->getSystemValueBool('debug');
	}

	protected function execute(InputInterface $input, OutputInterface $output): int {
		$accountId = (int)$input->getArgument(self::ARGUMENT_ACCOUNT_ID);
		$shuffle = (bool)$input->getOption(self::ARGUMENT_SHUFFLE);

		try {
			$account = $this->accountService->findById($accountId);
		} catch (DoesNotExistException $e) {
			$output->writeln("<error>Account $accountId does not exist</error>");
			return 1;
		}

		$consoleLogger = new ConsoleLoggerDecorator(
			$this->logger,
			$output
		);

		$estimator = static function () use ($consoleLogger) {
			$params = [
				[5, 10, 15, 20, 25, 30, 35, 40], // Neighbors
				[true, false], // Weighted?
				[new Euclidean(), new Manhattan(), new Jaccard()], // Kernel
			];

			$estimator = new GridSearch(
				KNearestNeighbors::class,
				$params,
				new FBeta(),
				new KFold(5)
			);
			$estimator->setLogger($consoleLogger);
			$estimator->setBackend(new Amp());
			return $estimator;
		};

		/** @var GridSearch $metaEstimator */
		$metaEstimator = $this->classifier->train(
			$account,
			$consoleLogger,
			$estimator,
			$shuffle,
			false,
		);

		if ($metaEstimator !== null) {
			$output->writeln("<info>Best estimator: {$metaEstimator->base()}</info>");
		}

		$mbs = (int)(memory_get_peak_usage() / 1024 / 1024);
		$output->writeln('<info>' . $mbs . 'MB of memory used</info>');
		return 0;
	}
}
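To get a feel for how much work the grid in `RunMetaEstimator` implies, the sketch below enumerates a parameter grid of the same shape and counts the combinations. It does not use Rubix ML: `cartesianProduct()` is a hypothetical helper written for this illustration, and the kernel objects are replaced by plain strings. It assumes that every combination is validated once per fold, which is how grid search with k-fold cross-validation is commonly described; any final refit on the full data set is not counted.

```php
<?php

declare(strict_types=1);

/**
 * Build the Cartesian product of a list of parameter lists.
 * Hypothetical helper for illustration, not part of Rubix ML.
 *
 * @param list<list<mixed>> $grid
 * @return list<list<mixed>>
 */
function cartesianProduct(array $grid): array {
	$combinations = [[]];
	foreach ($grid as $values) {
		$next = [];
		foreach ($combinations as $combination) {
			foreach ($values as $value) {
				// Extend every partial combination by one more parameter value
				$next[] = [...$combination, $value];
			}
		}
		$combinations = $next;
	}
	return $combinations;
}

// Same shape as $params in the diff; kernel objects replaced by their names
$grid = [
	[5, 10, 15, 20, 25, 30, 35, 40], // Neighbors
	[true, false], // Weighted?
	['Euclidean', 'Manhattan', 'Jaccard'], // Kernel
];

$combinations = cartesianProduct($grid);
$folds = 5; // matches new KFold(5) in the diff

echo count($combinations) . " parameter combinations\n";
echo count($combinations) * $folds . " train/validate rounds with {$folds}-fold cross-validation\n";
```

Under these assumptions the grid yields 48 combinations and 240 train/validate rounds, which may be part of why the command is only enabled in debug mode.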
I need an explanation for 4294967296
Lol, I don't even ...
I'll remove this whole command. It only made sense when the preview text was considered, so we needed to pre-process first. Now only the subject is needed, so the command doesn't make sense anymore.
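As a small aside on the literal questioned in the review: 4294967296 is exactly 2^32. The diff does not show the signature of `PreprocessingService::process`, so what the first argument means is not known from this page. The sketch below only checks the arithmetic and shows, purely as a hypothetical reading, which date the value would represent if it were interpreted as a Unix timestamp. It assumes a 64-bit PHP build.

```php
<?php

declare(strict_types=1);

// The literal passed to process() in the diff
$literal = 4294967296;

// It is exactly 2^32 (stays an int on a 64-bit PHP build)
var_dump($literal === 2 ** 32); // bool(true)

// Hypothetical reading only: the value interpreted as a Unix timestamp in UTC
echo gmdate('Y-m-d H:i:s', $literal) . " UTC\n"; // 2106-02-07 06:28:16 UTC
```

Whatever the intended meaning, a named constant with a short comment would probably have answered the reviewer's question at the call site.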