Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DB constraints for model validation #28

Open
motasem-salem opened this issue Sep 5, 2014 · 28 comments
Open

DB constraints for model validation #28

motasem-salem opened this issue Sep 5, 2014 · 28 comments
Labels

Comments

@motasem-salem
Copy link
Member

There are two different opinions:

1 - Use DB level constraints in addition to model validations.
2 - Rely only on model validations.

Original discussion: #24 (diff)

@motasem-salem
Copy link
Member Author

I vote for 1

@PurityControl
Copy link
Contributor

For model validation I vote for 1 - db validation.
If there is resistance to it I don't mind only model validation.
However having some things we validate at the db level and not others is something I am quite strongly against.

@sampritipanda
Copy link
Member

I vote for 2.

Like @PurityControl I am against having some validations in the db and others in the model. I think all of the validations should be done in the model. The database should be only used for uniqueness validations using indexes.

As I have said before, you can't have all the validations in the DB; but you can have all the validations in the model.

@tochman
Copy link
Member

tochman commented Sep 5, 2014

I usually don't refer to conventions and best practice so I just state my personal opinion: I would definitely put all validations in the model. ;-)

@yggie
Copy link

yggie commented Sep 5, 2014

there should only be one source of truth in my opinion, model validations can be seen at a glance. So its a 2 for me 😉

@NikitaAvvakumov
Copy link
Member

Voting for 1.
Importantly, as we discussed, models remain as the one single source of truth, the db merely conforms to it and backs it up. Models retain all validations, nothing is deferred to the db.

@NikitaAvvakumov NikitaAvvakumov mentioned this issue Sep 5, 2014
3 tasks
@techsailor
Copy link

I'm not voting but just highlighting uniqueness issue from rails tutorial

Using validates :uniqueness does not guarantee uniqueness.

D’oh! But what can go wrong? Here’s what:

Alice signs up for the sample app, with address [email protected].
Alice accidentally clicks on “Submit” twice, sending two requests in quick succession.
The following sequence occurs: request 1 creates a user in memory that passes validation, request 2 >does the same, request 1’s user gets saved, request 2’s user gets saved.
Result: two user records with the exact same email address, despite the uniqueness validation.
If the above sequence seems implausible, believe me, it isn’t: it can happen on any Rails website with >significant traffic. Luckily, the solution is straightforward to implement; we just need to enforce >uniqueness at the database level as well. Our method is to create a database index on the email >column, and then require that the index be unique.

But I would lean towards model validations personally in the majority, but a few db validations (where critical) might be useful too.

@PurityControl
Copy link
Contributor

@techsailor nice insight

@PurityControl
Copy link
Contributor

I think some of the comments suggest there is confusion. The vote is not for one or the other, it is whether we implement db constraints on top of model validations.
Not having model validations is not an option.

@yggie
Copy link

yggie commented Sep 6, 2014

Thanks for clarifying @PurityControl 😀, I obviously didn’t read the description correctly. Well my opinion is then to avoid DB validations for the time being. At present, it doesn't add much value, and is much harder to keep track of and change. If you find a need to add it (for a more robust validation), the database constraints can still be added in a separate PR on top of the model validations. Although I find it highly unlikely given the use case that you would ever experience race conditions (especially if you are running on just a single dyno on Heroku), so it is premature optimization at this point.

@betasve
Copy link
Contributor

betasve commented Sep 6, 2014

I tend to agree with @yggie. Definitely voting for (2). I am concerned about the cases of email duplication, because of double-clicking a button. Unfortunately according to my observations, lots of newbie internet users are doing it. They think the in-browser env is the same like their desktop. Pretty lame if you as me :) Nevertheless if this duplication happens only when the user double-clicks and server is overloaded, then the chance of this happening is small.
I voted for option 2, mostly because I think this is the agile way. No need to implement something of questionable value at this stage of the project.

@NikitaAvvakumov
Copy link
Member

@yggie Point taken - at the present moment, this is probably overkill. However, going forward with a constantly changing distributed team, where the people working on the app tomorrow may be entirely different from the ones working on it today, it will be too easy to miss the moment where database-level constraints become valuable or even necessary. The app's primary purpose is data management and manipulation, it is essentially an online database. As such, data integrity should be very high on our list of priorities. Database constraints are the tools made for this purpose, and can guard not only against invalid user input but also against the actions of an ill-informed developer.

@yggie
Copy link

yggie commented Sep 6, 2014

@betasve to address your concern, we need to look deeper into how Rails works underneath. Rails is essentially single-threaded, which means it can only process one request at a time. Taking @techsailor's quoted example of Alice accidentally clicking twice, the two requests sent will be resolved in the order that they are sent. By the time the second request is ready to be processed, with the appropriate model constraints, it will be rejected. Not to say Rails can't process requests in parallel, but with the default configuration, we won't see this problem anytime soon.

@betasve
Copy link
Contributor

betasve commented Sep 6, 2014

@yggie that sounds ok

@NikitaAvvakumov
Copy link
Member

@yggie What about the situation described here: two separate web processes requesting uniqueness validations near-simultaneously? OSRA runs Unicorn, on a single Heroku dyno AFAIK, but it can still have multiple web processes, right? So, it's not just a case of Alice clicking twice, but two OSRA employees entering data simultaneously.

@yggie
Copy link

yggie commented Sep 6, 2014

@NikitaAvvakumov depends on how you set it up. For example, in WSO we have the following configuration: worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3) in config/unicorn.rb. If you are really concerned about the issue, just set the number of workers to 1.

@PurityControl
Copy link
Contributor

The problem I seem to have with this discussion is the mismatch between our process and the technologies we use.

  • If we want the flexibility of rails and simpler schema changes along with validation only in the model why are we using a relational database and normalising the data to the extent we have when we could have a much simpler datastore. This decision goes beyond constraints and even affects things like when a call to the different models should be in a transaction.

Also I don't think we gain that much leaving them out. Our options are:

1.We have db constraints and we have the upfront work of making them not nil with a default (possibly empty value).
2. we have the choice of making those columns in the model
validates :presence => true, :allow_blank => true
3. When creating scopes or queries that revolve around empty data we have to remember to check for both blank and nil.

3 seems to be me by far the most error prone.
1 and 2 seem to have similar amounts of work but 2 can be contravened by using rails methods that can bypass validations or by a tool that talks to the database outside of rails.

@NikitaAvvakumov
Copy link
Member

@PurityControl 👍

@NikitaAvvakumov
Copy link
Member

@yggie At what point would you start worrying about the uniqueness issue? Is there a metric for number of users, amount of traffic, db hits etc. you keep an eye on? Gut feeling? Or you don't see this as an issue at all?

Do you prefer setting workers = 1 over enforcing uniqueness in the db?

Sent from Mailbox

On Sat, Sep 6, 2014 at 12:45 PM, Bryan Yap [email protected]
wrote:

@NikitaAvvakumov depends on how you set it up. For example, in WSO we have the following configuration: worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3) in config/unicorn.rb. If you are really concerned about the issue, just set the number of workers to 1.

Reply to this email directly or view it on GitHub:
#28 (comment)

@yggie
Copy link

yggie commented Sep 6, 2014

@NikitaAvvakumov I don’t see this as an issue at this point. For one thing, this system isn’t even deployed yet, so we will never experience the issue. Even if it is, given the nature of the project, it is highly unlikely that there will be enough write traffic to make this a big issue.

I am not saying we should avoid database constraints entirely, my argument was that database schemas are harder to keep track of, while AR model validations are much easier to keep track of. So unless you already have a fixed schema, I would suggest continue developing with model validations until you have reached a satisfactory schema, then consider adding database constraints for a more robust system.

@sampritipanda
Copy link
Member

I agree with @yggie on the point that it much easier to organize all the validations in the model rather than having some validations in the model and others in the database and the duplicate validations.

@NikitaAvvakumov
Copy link
Member

Comparing schema.rb to the combined model definitions, I don't see how the validations in the latter are particularly easier to keep track of than the constraints in the former. And "easier" doesn't sound like the best motivation, either - it would be easier not to write any tests, for example 😵

I agree with @PurityControl that having chosen an SQL database over a "dumb" key-value store, leaving out the constraints feels like a half-measure. We like and take advantage of the fact that our boolean columns can't contain strings, or that our integer columns can't contain floats, so why not go a small step further and ensure that the boolean columns can only hold true and false, or that the number of siblings must be filled in, and that neither of them can be nil?

Since no one is backing down from their position, should we just flip a coin and learn to live with the consequences?

@PurityControl
Copy link
Contributor

I don't mind living with either decision.

These issues aren't really ruby or rails criticisms, they are relational database criticisms. I think in the main rdbms are pretty amazing but when we use them we are forced to live within relational constraints which can be good and bad. However, we also have to live with some bad design decisions, in particular the use of nil in the database and because of this we need to guard against issues we wouldn't necessarily see in other data stores. I think not dealing with those issues head on is a little short sighted.

Having said that whatever we choose to do is not going to be so disastrous it can't be fixed down the line. So if it will help move things forward I am willing to abstain on the vote if that brings us closer to consensus.

@motasem-salem
Copy link
Member Author

@motasem-salem
Copy link
Member Author

I have just realized we don't get foreign key constraints automatically when we use has_one or belongs_to associations for example, we just get a integer column with an _id suffix and hope that there is a matching row with the same integer for an id on the other side of the relationship!!! I took it for granted that this will be provided by Rails and I never bothered to check the actual db schema until now. For me, that's far more important than guarding against nulls in some columns.

To say that this made me flabbergasted would be an understatement :) This goes against everything I learned or have worked with in the past. Maybe I just don't get the Rails "Philosophy".

@sampritipanda
Copy link
Member

@motasem-salem You need to have a validates_presence_of :parent on the child model. I do this almost everywhere.

@motasem-salem
Copy link
Member Author

Thanks @sampritipanda. Do you prefer this to using something like: https://github.com/matthuhiggins/foreigner

In either case, do you think we should create a new story for adding these foreign keys?

@sampritipanda
Copy link
Member

@motasem-salem I would go for adding these foreign keys. However I would prefer to do it manually

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants