Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Save, saves related data also (not changed) #16343

Closed
niden opened this issue May 29, 2023 · 45 comments
Closed

[BUG]: Save, saves related data also (not changed) #16343

niden opened this issue May 29, 2023 · 45 comments
Assignees
Labels
5.0 The issues we want to solve in the 5.0 release bug A bug report status: medium Medium

Comments

@niden
Copy link
Member

niden commented May 29, 2023

I have interesting situation.... @niden can u check out that? A help!

        $contractor = \Models\Contractors::findFirst();
        $company = $contractor->getCompany();

        $contractor->save();

In that case, save will be executed in relation too. I mean object $contractor will be save and object $company too. Why is that?

Originally posted by @borisdelev in #16336 (comment)

@niden niden self-assigned this May 29, 2023
@niden niden added bug A bug report status: unverified Unverified 5.0 The issues we want to solve in the 5.0 release labels May 29, 2023
@borisdelev
Copy link
Contributor

More details here and description... lets say we have 2 Models: Companies & Contractors. Contractors connected Companies with belongsTo relation (alias Company).

Somewhere in controller action:

$contractor = \Models\Contractors::findFirst();
$company = $contractor->getCompany(); // Get related company

...

$contractor->save();

... this will invoke ->save method on $contractor but and ->save method in company (i added dump in beforeValidation method on Company model).
Expected behaviour is to save only $contractor.

PhalconPHP 5.2.*

Not sure is it a feature or a bug... anyway. I want to help somehow.

@andrewrbe
Copy link

This is a bug. It appeared in 4.1 branch. To be precise now we are running 4.1.2. There is no such behavior in php7.4-phalcon_4.0.4-908. We even fixed this package in our Docker but now we had to update and got a lot issues.

Lets assume we have a code

class Spendings extends BaseModel {
....
public function initialize(): void {
    $this->hasOne('spendings_categories_id',
            SpendingsCategories::class,
            'id',
            [
                'reusable' => true,
                'alias'    => 'Category',
            ]);

$sp = Spendings::findFirst(627979);
echo  $sp->Category->name; //Relation
$sp->amount += 0.01;
sp->save();

Queries will be:

       START TRANSACTION
      UPDATE `spendings` SET `status` = 1, `.... WHERE `id` = 627979
(!!!) UPDATE `spendings_categories` SET `name` = 'test', `.... WHERE `id` = 12601
       COMMIT

On the one hand let it be, just additional update. But if we changed spendings_categories_id in spendings to another the second update will wrongly update a row in spendings_categories. For example you had spendings_categories:
id | name
1 | category 1
2 | category 2
You change spendings.spending_category_id from 1 to 2, now you have
1 | category 2
2 | category 2

There is workaround with switches off relation update

public function save(): bool {
        $this->related = [];
        return parent::save();
    }

Do not hesitate to contact me if you need more details or tests.

@rudiservo
Copy link
Contributor

TLDR: this happens in collectRelatedToSave()

Starts in the Model::__get

if typeof relation == "object" {
            /**
             * There might be unsaved related records that can be returned
             */
            if isset this->dirtyRelated[lowerProperty] {
                return this->dirtyRelated[lowerProperty];
            }

            /**
             * Get the related records
             */
            return this->getRelated(lowerProperty);
        }

if you fetch anything it does not get set in the dirtyRelated, I think the ideas was to only execute presaverelated() and postsaverelated() if hasRelated() and collectRelatedToSave()

Issue is in the collectRelatedToSave()

for name, record in related {
            if isset dirtyRelated[name] {
                continue;
            }

            if typeof record !== "object" || !(record instanceof ModelInterface) {
                continue;
            }

            record->setDirtyState(self::DIRTY_STATE_TRANSIENT);
            let dirtyRelated[name] = record;
        }

so the getRelated() will set this->related[lower_property] = related
and the collectRelatedToSave() will iterate on the related
But since the model is not in the this->dirtyRelated it will be set Transient with record->setDirtyState(self::DIRTY_STATE_TRANSIENT);
So it can be executed in preSaveRelatedRecord with if record->dirtyState !== Model::DIRTY_STATE_PERSISTENT && !record->save()

Right now it's a feature because probably there "is" a bug
So lets suppose you have C belongs to B and B belongs to A

if we A->B->C->setSomething(value);
C will be in a transient state, but B will not
So saving A will not save C unless B is also in a transient State

I can think of 3 ways to fix this:

  1. has a static "cache" of real dirtyModels to be executed, this could trigger a save on unwanted and unrelated models.
  2. cascade dirtyState to parents, this will consume a lot of cycles since you have to check if there is a loaded relation and cascade on it everytime you do a __set
  3. check all loaded relations recursively for any transient models on save, this could trigger infinite loops (my guess is it does already) so an EXECUTED and EXECUTED_TRANSIENT states will/might be needed to differentiate and avoid infinite save loops, this might be possible without a First Level Cache, I am not sure.

FYI to trigger an infinite loop ATM you just have to

A->B
B->setA(A)
A->save()

I am working currently on this in my repo while trying to add FirstLevelCache... not easy without breaking a few things, sorry.

@rudiservo
Copy link
Contributor

Forgot to mention, my proposal for extra Executed States if not optional they might be a breaking change on model before and prepare Events with business logic.
for example if someone as a beforesave or preparesave that B changes A, if A is already executed and B makes a change in A before update, that change will not be saved and might only happen on hasOne and hasMany relations.

@noone-silent
Copy link
Contributor

This is weird.

We use Phalcon 4.1.3 and have this bug too. But in 4.1.3 in the collectRelatedToSave() doesn't contain the mentioned setDirtyState() call. Anyway, we tried to overwrite the collectRelatedToSave() with this:

    protected function collectRelatedToSave(): array
    {
        $related = $this->related;
        $dirtyRelated = $this->dirtyRelated;
        foreach ($related as $name => $record) {
            if (isset($dirtyRelated[$name]) === true) {
                continue;
            }

            if ($record instanceof ModelInterface === false) {
                continue;
            }

            // We mimic the same behavior like without snapshots.
            $changed = true;
            if ($record instanceof Model === true) {
                $changed = $record->hasChanged();
            }

            if ($changed === true || $record->getDirtyState() !== self::DIRTY_STATE_PERSISTENT) {
                $dirtyRelated[$name] = $record;
            }
        }

        return $dirtyRelated;
    }

It completely resolved the issue! We use snapshots, else hasChanged() wouldn't work. So there is definitely something broken.

@rudiservo
Copy link
Contributor

This is weird.

We use Phalcon 4.1.3 and have this bug too. But in 4.1.3 in the collectRelatedToSave() doesn't contain the mentioned setDirtyState() call. Anyway, we tried to overwrite the collectRelatedToSave() with this:

    protected function collectRelatedToSave(): array
    {
        $related = $this->related;
        $dirtyRelated = $this->dirtyRelated;
        foreach ($related as $name => $record) {
            if (isset($dirtyRelated[$name]) === true) {
                continue;
            }

            if ($record instanceof ModelInterface === false) {
                continue;
            }

            // We mimic the same behavior like without snapshots.
            $changed = true;
            if ($record instanceof Model === true) {
                $changed = $record->hasChanged();
            }

            if ($changed === true || $record->getDirtyState() !== self::DIRTY_STATE_PERSISTENT) {
                $dirtyRelated[$name] = $record;
            }
        }

        return $dirtyRelated;
    }

It completely resolved the issue! We use snapshots, else hasChanged() wouldn't work. So there is definitely something broken.

It's a quick fix if you are only trying to save the "first ring" or first level of relations, if you go beyond that it is an issue.

If you have a second level or a second ring, A->B->C and changed only C and trigger A->save();

All ok here A checks for B hasChanged, nothing is changed on B so move on.

Problem is we only changed C and it will never be saved unless you call C->save somewhere in your code.
If you have a really heavy business logic in the models, this is going to cause issues with models not being saved.

It's a quick fix for simple level 1 relations, anything above that is broken.

@noone-silent
Copy link
Contributor

noone-silent commented Jul 7, 2023

This is weird.
We use Phalcon 4.1.3 and have this bug too. But in 4.1.3 in the collectRelatedToSave() doesn't contain the mentioned setDirtyState() call. Anyway, we tried to overwrite the collectRelatedToSave() with this:

    protected function collectRelatedToSave(): array
    {
        $related = $this->related;
        $dirtyRelated = $this->dirtyRelated;
        foreach ($related as $name => $record) {
            if (isset($dirtyRelated[$name]) === true) {
                continue;
            }

            if ($record instanceof ModelInterface === false) {
                continue;
            }

            // We mimic the same behavior like without snapshots.
            $changed = true;
            if ($record instanceof Model === true) {
                $changed = $record->hasChanged();
            }

            if ($changed === true || $record->getDirtyState() !== self::DIRTY_STATE_PERSISTENT) {
                $dirtyRelated[$name] = $record;
            }
        }

        return $dirtyRelated;
    }

It completely resolved the issue! We use snapshots, else hasChanged() wouldn't work. So there is definitely something broken.

It's a quick fix if you are only trying to save the "first ring" or first level of relations, if you go beyond that it is an issue.

If you have a second level or a second ring, A->B->C and changed only C and trigger A->save();

All ok here A checks for B hasChanged, nothing is changed on B so move on.

Problem is we only changed C and it will never be saved unless you call C->save somewhere in your code. If you have a really heavy business logic in the models, this is going to cause issues with models not being saved.

It's a quick fix for simple level 1 relations, anything above that is broken.

We never used C level, because of the way getRelated() handles not found or null/false results. We also still use php 7.4 so we do not have ?->.

The solution to your case, would be, to call the same method on every $record:

    protected function collectRelatedToSave(): array
    {
        $related = $this->related;
        $dirtyRelated = $this->dirtyRelated;
        foreach ($related as $name => $record) {
            if (isset($dirtyRelated[$name]) === true) {
                continue;
            }

            if ($record instanceof ModelInterface === false) {
                continue;
            }

            $changed = true;
            if ($record instanceof Model === true) {
                $hasDirtyRelated = $record->collectRelatedToSave();
                if (\count($hasDirtyRelated) > 0) {
                    $record->save();
                }
                $changed = $record->hasChanged();
            }

            if ($changed === true || $record->getDirtyState() !== self::DIRTY_STATE_PERSISTENT) {
                $dirtyRelated[$name] = $record;
            }
        }

        return $dirtyRelated;
    }

That should solve C, D and WhatEverLevel.

@rudiservo
Copy link
Contributor

@noone-silent no you can't do that! you have prepare, before and after that must me run on it's order of type of relation, that will trigger prepareSave and beforeSave no matter what type of relation there is, it is a breaking change, massive one.

@rudiservo
Copy link
Contributor

rudiservo commented Jul 7, 2023

@noone-silent the only way to fix it correctly is with state flags and first level cache (thread cache).
Because you need to avoid save loops, run the whole relations to check for changed data and prepareSave might change data, and FLC to keep a single source of truth where if you find a model it will always return the same object instance.

This is something that Java Hibernate ORM has by default and you can't disable it.

https://vladmihalcea.com/jpa-hibernate-first-level-cache/

https://vladmihalcea.com/the-anatomy-of-hibernate-dirty-checking/

I wish it was an easier problem to solve, but it's not, all magic comes with a price, ORM magic comes with a heavy price of data flow complexity, consistency complexity and a lot of cycles to ensure that everything checks out.

@rudiservo
Copy link
Contributor

rudiservo commented Jul 7, 2023

To be honest I think prepareSave, while it has good intentions of enabling making a lot of changes just before the model gets saved, it is a bad practice if you rely on it for data consistently.
Because the ORM can only run the model once, if devs depend on prepareSave for calculations or data consistency and are not using their own logic and functions before triggering a save(), it will lead to data not being changed because of the type and complexity of relations.

I have a very complex project that relations can go to 5 or 6 layers deep, don't like it but it's the way the customer needs it.

We could however have a dirty cache inside a thread cache that when a save gets triggered it runs normally then checks the cache for unsaved models, this adds more cycles and checks every time a model gets changed or save, and might even be unwanted behaviour, I don't recommend it like this.

So my proposal is dirty_state execution flags to avoid loops, resultset with added logic to append/add and persist models with a clear way for dev to program to make sure that it's get triggered by save(), and avoid magic has much has possible.

I am working on it has we speak, it's slow progress because I am trying not to break previous behaviour, i.e. saving hasMany that is an array instead of a resultset object, keeping collectRelatedToSave function.
The hardest part of them all is when you delete or remove a related model, this is the one thing that will drive me nuts, because the model still exists in the database, but you do not want it to be used in a resultset because you deleted it, just not persisted it.

@larsverp
Copy link

@rudiservo Is this related to this comment: #15148 (comment)

Because we are having massive performance issues in our application since the Phalcon 5 update :)

@noone-silent
Copy link
Contributor

To be honest I think prepareSave, while it has good intentions of enabling making a lot of changes just before the model gets saved, it is a bad practice if you rely on it for data consistently. Because the ORM can only run the model once, if devs depend on prepareSave for calculations or data consistency and are not using their own logic and functions before triggering a save(), it will lead to data not being changed because of the type and complexity of relations.

I have a very complex project that relations can go to 5 or 6 layers deep, don't like it but it's the way the customer needs it.

We could however have a dirty cache inside a thread cache that when a save gets triggered it runs normally then checks the cache for unsaved models, this adds more cycles and checks every time a model gets changed or save, and might even be unwanted behaviour, I don't recommend it like this.

So my proposal is dirty_state execution flags to avoid loops, resultset with added logic to append/add and persist models with a clear way for dev to program to make sure that it's get triggered by save(), and avoid magic has much has possible.

I am working on it has we speak, it's slow progress because I am trying not to break previous behaviour, i.e. saving hasMany that is an array instead of a resultset object, keeping collectRelatedToSave function. The hardest part of them all is when you delete or remove a related model, this is the one thing that will drive me nuts, because the model still exists in the database, but you do not want it to be used in a resultset because you deleted it, just not persisted it.

If you find a solution, please patch it into phalcon 4 too. People still use it

@larsverp
Copy link

I have also replicated this issue in our application. We have installed ClockWork that allows us to see all database calls made in a call. I ran this code:

public function phalconAction()
    {
        /** @var EventTicket $ticket */
        $ticket = EventTicket::query()->andWhere('uuid = "93b006a8-188b-4c2b-bd3c-64f8c6f3101e"')->execute()->getFirst();

        $orderItem   = $ticket->orderItem;
        $orderStatus = $orderItem->order->status;

        if ($orderStatus === OrderStatus::COMPLETE) {
            $ticket->status = EventTicketStatus::RESERVED;
            $ticket->save();
        }
    }

That results in the following ClockWork result:

image

The EventTicket Query is not on the screenshot, but you can see that 2 UPDATE queries are done for Order & OrderItem. The data in the UPDATE query is not different from what is currently in the database.

The example code I ran is not too bad, but in our production application this creates a massive performance issue.

@rudiservo
Copy link
Contributor

@rudiservo Is this related to this comment: #15148 (comment)

Because we are having massive performance issues in our application since the Phalcon 5 update :)

At first glance yes, do you have dynamic update? Just to know if it influences it.

@larsverp
Copy link

@rudiservo Pleas explain what you mean with dynamic updates? The code example I provided in my previous comment shows what the issue is :)

@rudiservo
Copy link
Contributor

@rudiservo Pleas explain what you mean with dynamic updates? The code example I provided in my previous comment shows what the issue is :)

It checks the model for changed fields before making and update to the database.
It uses more RAM because it keeps a snapshot of original values, and a few more CPU cycles to check for changes.

The increased load you are seeing is in the PHP or the database?

public function initialize()
    {
        $this->useDynamicUpdate(true);
    }

@larsverp
Copy link

@rudiservo Pleas explain what you mean with dynamic updates? The code example I provided in my previous comment shows what the issue is :)

It checks the model for changed fields before making and update to the database. It uses more RAM because it keeps a snapshot of original values, and a few more CPU cycles to check for changes.

The increased load you are seeing is in the PHP or the database?

public function initialize()
    {
        $this->useDynamicUpdate(true);
    }

Our database is the biggest issue. But that's understandable looking at my example. (#16343 (comment))

Every model select triggers a database update as well. That explains why our database has been having a hard time since the Phalcon 5 update.

@rudiservo
Copy link
Contributor

rudiservo commented Jul 16, 2023

@larsverp well dynamic updates will save writes to your DB, also they will not issue an update for all the fields of the model, only the ones that have changed, making the update query smaller.
It's easier to scale up php workers than it is to scale up the DB, with Relational DB clusters the writing usually happens always in the same node, so pulling easing off the writing will improve the database by a lot.

Test it and give your feedback on how it went.

The solution for this without using dynamic updates, is to forget about "dirty state" on relations has you have to always check the relations for changes and just use the dirty state only for changed fields.

@larsverp
Copy link

@rudiservo Alright thanks I will do some testing with the Dynamic update. Just to be on the same line, the issue we're trying to solve is about Phalcon marking a model as dirty while no fields are updated , right? Is there a possibility for us to disable Phalcon setting a Dirty state on a model? I'd rather check it myself while it's not working.

@larsverp
Copy link

@rudiservo We already seem to use DynamicUpdates so this doesn't do anything for us. It still tries to update the entire model in the DB even no fields have been changed.

@larsverp
Copy link

larsverp commented Jul 17, 2023

@rudiservo I think I was able to write a workaround for our application.

I found the following code in the CPhalcon repo:

protected function collectRelatedToSave() -> array
    {
        var name, record;
        array related, dirtyRelated;

        /**
         * Load previously queried related records
         */
        let related = this->related;

        /**
         * Load unsaved related records
         */
        let dirtyRelated = this->dirtyRelated;

        for name, record in related {
            if isset dirtyRelated[name] {
                continue;
            }

            if typeof record !== "object" || !(record instanceof ModelInterface) {
                continue;
            }

            record->setDirtyState(self::DIRTY_STATE_TRANSIENT);
            let dirtyRelated[name] = record;
        }

        return dirtyRelated;
    }

Source: https://github.com/phalcon/cphalcon/blob/5.0.x/phalcon/Mvc/Model.zep#L1126

In our application all models extend from the ModelBase class (Which extends from the Phalcon Model class) in the ModelBase class I overwrote the collectRelatedToSave function as follows:

public function collectRelatedToSave(): array
{
    $related      = $this->related;
    $dirtyRelated = $this->dirtyRelated;

    foreach ($related as $name => $record) {
        if (isset($dirtyRelated[$name])) {
            continue;
        }

        if (!is_object($record) || !($record instanceof ModelInterface)) {
            continue;
        }

        if (empty($record->getChangedFields()) && empty($record->collectRelatedToSave())) {//<------ This is new
            continue;
        }

        $record->setDirtyState(1);
        $dirtyRelated[$name] = $record;
    }

    return $dirtyRelated;
}

What's your opinion? It seems to work for our application as far as I've tested it.

@rudiservo
Copy link
Contributor

@larsverp it's a quick fix, probably not ideal, it may have a big performance impact, also needs to pass the tests.

@rudiservo
Copy link
Contributor

rudiservo commented Jul 18, 2023

@larsverp Will only work if Dynamic update is set, so you have to put it in the doLowUpdate function.
Also there is a bit of an issue, with field types.
In do lowUpdate you have to cast the values to compare in order to assert if the values are what you want.

i.e. "4" and "4.0" are not exactly the same for PHP, a lot of dev's do not cast the value on set, so this needs to be done.

So the getChangedFields, might not get all changed fields, the function needs a revamp and doLowUpdate needs a revamp to.

The weird part is where you say you have dynamicUpdates, and still saves the model, it should not unless you have an issue in your snapshots, or the type of data you have in your database and model.

Check this to see if there is something different about your model so we can do something about it.
https://github.com/phalcon/cphalcon/blob/5.0.x/phalcon/Mvc/Model.zep#L3911

@rudiservo
Copy link
Contributor

rudiservo commented Jul 18, 2023

@borisdelev do you have dynamicUpdate set?
because if you do and it is executing an update, that behaviour should not happen.
Can you debug and tell us what are types of the fields getting updated, or is it all of them?

@borisdelev
Copy link
Contributor

Nope, didnt set dynamic update. Its a little project.
In my case all fields are updated... And is merged/saved with relations somehow.

@larsverp
Copy link

@rudiservo Thanks for the comment. I'll do some research to find out if this is the case for us. :)

@noone-silent
Copy link
Contributor

Just for everyone to understand, there is no magical way for the system to know what model have changed properties in them not without either the dev marks that something has changed, or the system has to forcefully compare with prior data.

But we have this already? Like I said, the default should be dynamicUpdates and keepSnapshots enabled. Then we KNOW what have changed in every single Model and saving can be skipped. If any of this is disabled, then what we have now as a bug, should be default behavior, because we DON'T KNOW the state of the models and just save them all

@rudiservo
Copy link
Contributor

@noone-silent I can make those changes, but that is not for me to decide, project maintainers have to decide if it is a bug, then it is the default, otherwise it could be a breaking change and that would require a new major version.
For what is worth, Java-Hibertante and PHP-Doctrine keep a snapshot and do dynamic updates mandatory, it's not even an option.

FYI Checking for dynamic updates costs a lot, I do think it should be the default, but if you want performance the developers should "tag" the model has dirty, that way it would only do the DU if it really is dirty.

Magic != performance.

@niden
Copy link
Member Author

niden commented Aug 9, 2023

This has been one of the most painful areas of the framework to be honest. This particular discussion goes back to even v3. When to update? That is the question.

Personally, I don't let the ORM update things for me, I do the updates where I need them to, wrapping the whole operation in a transaction. But that is just me.

I like the idea from @noone-silent. We can go with that and set the behavior for updates once and for all.

@rudiservo
Copy link
Contributor

@niden DynamicUpdates and snapshots by default or mandatory?

FYI, this is just quick fix, still I my advice would be to only run save on relations that we set i.e. $a->b = $b (WIP), I will create an issue for this.

@niden
Copy link
Member Author

niden commented Aug 9, 2023

I would say default. Developers can then change the behavior if they need to.

rudiservo added a commit to rudiservo/cphalcon that referenced this issue Aug 11, 2023
rudiservo added a commit to rudiservo/cphalcon that referenced this issue Aug 11, 2023
@niden niden added status: medium Medium and removed status: unverified Unverified labels Aug 11, 2023
rudiservo added a commit to rudiservo/cphalcon that referenced this issue Aug 11, 2023
rudiservo added a commit to rudiservo/cphalcon that referenced this issue Aug 11, 2023
rudiservo added a commit to rudiservo/cphalcon that referenced this issue Aug 11, 2023
rudiservo added a commit to rudiservo/cphalcon that referenced this issue Aug 11, 2023
rudiservo added a commit to rudiservo/cphalcon that referenced this issue Aug 11, 2023
rudiservo added a commit to rudiservo/cphalcon that referenced this issue Aug 11, 2023
rudiservo added a commit to rudiservo/cphalcon that referenced this issue Aug 11, 2023
rudiservo added a commit to rudiservo/cphalcon that referenced this issue Aug 12, 2023
niden added a commit that referenced this issue Aug 12, 2023
Fix #16343, DynamicUpdate is now enabled system wide
@niden
Copy link
Member Author

niden commented Aug 12, 2023

Resolved in #16400

Thank you @rudiservo

@niden niden closed this as completed Aug 12, 2023
@noone-silent
Copy link
Contributor

Does it really fix it? This is just the dynamic update enabled by default. The check in collectRelatedToSave what we discussed before is still needed or not?

@rudiservo
Copy link
Contributor

@noone-silent Yes.

The collectRelatedToSave will call the relation but the save will not trigger a SQL update.
In order for the collectRelatedToSave not to grab all relations we need to agree on how will the dev flag that a specific relation has changes to be checked and only those will be returned by collectRelatedToSave.
In my opinion this is needed for control and performance, but it should probably be new issue.

@noone-silent
Copy link
Contributor

Then this issue is not fixed. This issue is about saving a model will also save all related models. Not about making a config setting as default. The config setting as default is just a side effect needed for this ticket.

@rudiservo You complicate things too much here. There have been made many supposed fixes for this already, all using the dynamic update setting. You have concerns about performance, but performance is not the main aspect someone uses ORM. We use ORM because of convenience. If we need pure performance or have hundreds or thousands of queries, we should act like @niden does and do transactions directly, or if we only work like this, don't use ORM at all.

Nobody is using 200 relations and all these relations also use relations of the hundreds. Yes, then performance and even resources would be a problem. But not in those simple use cases we have here.

@rudiservo
Copy link
Contributor

rudiservo commented Aug 13, 2023

@noone-silent it's not fixed?
OK, sorry but I do not understand what is you expectation of it being fixed?
You want for the ORM to check for changedFields in the collectRelatedToSave() or not to collect them at all?
Because checking for fields on collectRelatedToSave or in a save() is about the same performance cost!

I can not really just bluntly check for changed fields on the first level of relations because the ORM does not know how deep in the chain relation anyone made a change to one model.
If and that is a big IF, this change was made, how do you fix the problem of the ORM not checking the whole relation chain for changes? Because it will break someone's code!

You got $A->B->C, a change was made in C but you save A, C does not get saved, how do you fix it honestly?

@rudiservo You complicate things too much here.

Thank you I guess? Yeah I do that.
Honestly I do think you oversimplify things too much here also, the ORM internals are hard by default, right now it's complex and has a lot issues.

Simple != Easy
Complex != Hard

Nobody is using 200 relations and all these relations also use relations of the hundreds.

Do not get this the wrong way, but think about it from the maintainers point of view, is your statement a fact, personal observation or a guess?

I would love to make changes on what I think I need and no one else is using, I can't propose those changes without a really good justification.

BTW sometimes I use 20 relations with 5 relations with 3 more relations and 1 more at the end, why? Bureaucracy and complex business logic that the customer needs, that if I where to manually do it in SQL statements it would cost me more time to develop without any real gains in performance, I did try, and to make matters worse sometimes the customer can be unforgiving in deadlines.

Look if you have any idea how to fix it, do a PR like I do, the maintainers would very much appreciate that, it's actually quite fun and a good learning experience.

You can look at Doctrine, Hibernate and Propel for ideas and how they solved their problems, it's a lot of code but it's fun to read.

Also would advice you to read this blog post from a Zig maintainer, it is quite insightful.
https://kristoff.it/blog/simple-not-just-easy/

@noone-silent
Copy link
Contributor

@noone-silent it's not fixed?
No it's not. Does the save work now like before or is it still saving all related models even without change? It is still saving all related models. And this issue/ticket is about that.

OK, sorry but I do not understand what is you expectation of it being fixed? You want for the ORM to check for changedFields in the collectRelatedToSave() or not to collect them at all? Because checking for fields on collectRelatedToSave or in a save() is about the same performance cost!

I want that save only is saving stuff that needs to be saved. Like before. This is what this issue is about. The performance impact now is much bigger than any change of checking the chain of responsibility will ever create!

I can not really just bluntly check for changed fields on the first level of relations because the ORM does not know how deep in the chain relation anyone made a change to one model. If and that is a big IF, this change was made, how do you fix the problem of the ORM not checking the whole relation chain for changes? Because it will break someone's code!

But you have to. How else you want to check the chain? And why would it break the code. The code is breaking right now, because it does not work like people used it to be or expect it to be. You check the first level and give it down the relation chain and in the end you have all models you need to save. $A->B->C->D->...->Z would be solved with this.

You got $A->B->C, a change was made in C but you save A, C does not get saved, how do you fix it honestly?

You do not stop on the first level. You give it down the relation chain.
It's a chain of responsibility, triggered with the command save and needs to go down the whole chain and call save on every model that is part of this relation.

@rudiservo You complicate things too much here.

Thank you I guess? Yeah I do that. Honestly I do think you oversimplify things too much here also, the ORM internals are hard by default, right now it's complex and has a lot issues.

It was not an attack or anything. Don't get me wrong.

Simple != Easy Complex != Hard

Nobody is using 200 relations and all these relations also use relations of the hundreds.

Do not get this the wrong way, but think about it from the maintainers point of view, is your statement a fact, personal observation or a guess?

I fixed my code with this. Every model is now calling collectRelatedToSave() with the getChangedFields method and guess what? The performance is back to normal.

I would love to make changes on what I think I need and no one else is using, I can't propose those changes without a really good justification.

BTW sometimes I use 20 relations with 5 relations with 3 more relations and 1 more at the end, why? Bureaucracy and complex business logic that the customer needs, that if I where to manually do it in SQL statements it would cost me more time to develop without any real gains in performance, I did try, and to make matters worse sometimes the customer can be unforgiving in deadlines.

Yes, I do that too. We have a billing and invoicing system. A lot of relations there.

Look if you have any idea how to fix it, do a PR like I do, the maintainers would very much appreciate that, it's actually quite fun and a good learning experience.

The fix of it has been proposed many times here already. People implement it already by themself, because this issue is a breaking one. If your code was saving in 250ms before it is now up to 2s. That is breaking.

I will look into it.

@rudiservo
Copy link
Contributor

@noone-silent It's not working has before because someone submitted an issue, v4 broke behaviour that was on v3 and prior, and I've using Phalcon since v1 and honestly skipped v4.

Here is the issue.
#15148

DynamicUpdate is better than getChangedFields, look at the code! In fact I don't even know if getChangedFields is reliable if you compare the code, I have to recheck what is going on.
it should have always been in the first place to save the DB from getting hammered.

I want that save only is saving stuff that needs to be saved. Like before. This is what this issue is about. The performance impact now is much bigger than any change of checking the chain of responsibility will ever create!

We can't have it both ways, save all relations and not save all relations without any input from the dev that triggers either.

What you are proposing is a breaking change, that means a v6, and honestly if we are to make a v6 this is definitely not the way to fix this because it is also flawed.

In Example, the way Doctrine 2 works is opt-out with a readOnly flag, that does not mean it wont go rampant all the way on all relations, yes it's heavy!

You have to opt-in opt-out, it can not be none and things just work out of magic.

I on the other hand think that it should be opt-in, you flag that a relation chain has changes, readOnly flag is if you only want to check relations and skip any update checks, more work for the dev, but far more control over the dataflow.

IMO this is the better solution, it's really good for performance and I stand by it, if you didn't care about performance this would not be an issue for out.

This is what I have been trying to explain to you and why I needed to know what was your expectation.

This change has to be well thought for ramifications, it has to pass the tests and create new ones, it's a ton of work.

Now look, the old behaviour has flaws, the current behaviour has flaws, honestly both are bad, either we can agree on a new behaviour that really fixes the issue once and for all and hopefully the maintainers agree to be the default behaviour, otherwise, it's a setting, not that bad IMO.

You can check my proposal, the relations get flagged with changes WHEN you set them or by another type of opt-in behaviour.

It was not an attack or anything. Don't get me wrong.

I didn't take it like one, you just come out like you don't seem to understand the ramifications and caveats of a complex system like the ORM, I understand you want it fixed for your use case and it's frustrating, but the team can't do it without screwing up someone else, concessions have to be made to make this work otherwise it's not a reliable ORM and we can't please both sides with magic.

But you have to. How else you want to check the chain? And why would it break the code. The code is breaking right now, because it does not work like people used it to be or expect it to be. You check the first level and give it down the relation chain and in the end you have all models you need to save. $A->B->C->D->...->Z would be solved with this.

You really didn't understand, if you don't return B, B never calls C, C never calls D... Z, it wouldn't honestly, I don't how it did before without some opt-in or an overload function to make it work.

You do not stop on the first level. You give it down the relation chain.
It's a chain of responsibility, triggered with the command save and needs to go down the whole chain and call save on every model that is part of this relation.

That is what is doing right now, it did not do before!

Your solution of not returning that model in the collectRelatedToSave, actually breaks that, think about it! if you don't change B and set it, B never calls C.

I fixed my code with this. Every model is now calling collectRelatedToSave() with the getChangedFields method and guess what? The performance is back to normal.

It's a BAD FIX, IF you are not setting the relation or trigger save() on relations in the middle of the model, specially the belongsTo and hasOne, you are going to get unsaved data and a lot of issues, either way, IT'S A BAD UNRELIABLE FIX and it will come back to bite you, I've been there!

I will propose my opt-in solution anyway, it probably has to be a new issue because it's going to be a long conversations with a lot decisions and it's needed for the management of the project, also the maintainers have to accept it.
Sorry but it is what is it, no magic bullet, just really hard software engineering.

rudiservo added a commit to rudiservo/cphalcon that referenced this issue Aug 15, 2023
rudiservo added a commit to rudiservo/cphalcon that referenced this issue Aug 15, 2023
rudiservo added a commit to rudiservo/cphalcon that referenced this issue Aug 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5.0 The issues we want to solve in the 5.0 release bug A bug report status: medium Medium
Projects
Status: Released
Development

No branches or pull requests

6 participants