[enhancement] never display sequential primary keys: postgresql solution #133
Comments
Part of the reason not to display sequential primary keys comes down to user experience; numeric identifiers convey no meaning to a user and are virtually impossible to use as a reference point from memory to find something later. If you're going to be displaying something in the URL on a public facing page, you should really be using a slug.
The other half of not displaying sequential keys is to give you security-by-depth. Your users should not be able to just iterate through a range of numbers to figure out what the shape of your dataset is. Random number sets are a step in the right direction, but I don't feel like they go far enough. UUIDv4 is complex and random enough that it makes any kind of experimentation to discover the shape of your dataset practically infeasible. I.e., the cost and time of attempting aren't generally worth the payoff.
For the above reasons, I'd strongly advise simply using UUIDv4 as a PK and providing a user-facing slug whenever possible. That being said, there are a couple of Django-specific corner cases where switching the PK to something generated at runtime isn't really feasible; e.g., if you want to check to see if an object exists by referencing the pk (which won't have been set until the model instance has been saved). I personally feel like this is a bit of an anti-pattern, but it certainly exists in the wild. If you're dealing with a situation like this that isn't particularly feasible to fix, randomizing the PK pool as above is probably a decent compromise.
Nathan,
I was proceeding based on the content of the book.
The book covers slugs and UUIDs, and mentions they both have disadvantages. The book also advises against mere obfuscation. (Randomized integer primary keys are not mere obfuscation because you can pick your own secret sauce.)
Your objections to randomized primary keys are largely or wholly outside of the book content.
UUIDv4 is complex and random enough that it makes any kind of experimentation to discover the shape of your dataset practically infeasible. I.e., the cost and time of attempting aren't generally worth the payoff.
I would say that applies equally to randomized integer keys. Because you can pick your own secret sauce, maybe randomized integer primary keys are better than UUIDs.
Note the cited real-world example: eBay item numbers.
Q: Why doesn't eBay just use an alphanumeric scheme for IDs
A: While alphanumeric IDs would work from a functional perspective, there are major performance reasons that favor the use of numeric IDs. EBay's scalability challenges are tremendous. Numeric IDs are more space-efficient than alphanumeric IDs. In larger scale tables, indexes on alphanumeric columns are slower than on numeric columns.
https://ebaydts.com/eBayKBDetails?KBid=468
Databases handle integer keys better and faster. UUIDs and slugs introduce extra overhead.
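To put rough numbers on the space argument, here is a small Python sketch comparing the raw size of each key type. The 4-byte and 8-byte figures correspond to PostgreSQL's `integer` and `bigint`, and 16 bytes to its native `uuid` type; the 36-character figure assumes a UUID stored as text. The `sizes` dictionary is only an illustrative name, and this measures key width alone, not index performance.

```python
import struct
import uuid

u = uuid.uuid4()

# Raw width of each candidate key, in bytes (or characters for text).
sizes = {
    "int4": struct.calcsize(">i"),   # 32-bit integer key
    "int8": struct.calcsize(">q"),   # 64-bit integer key
    "uuid_binary": len(u.bytes),     # UUID stored in a native binary type
    "uuid_text": len(str(u)),        # UUID stored as its hyphenated string
}

for name, size in sizes.items():
    print(f"{name}: {size}")
```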
The book lists UUIDs and slugs as options, but randomized integer primary keys are arguably better.
Rick
From: Nathan Cox <[email protected]>
To: twoscoops/two-scoops-of-django-1.11 <[email protected]>
Cc: Rick Graves <[email protected]>; Author <[email protected]>
Sent: Wednesday, May 16, 2018 5:23 AM
Subject: Re: [twoscoops/two-scoops-of-django-1.11] [enhancement] never display sequential primary keys: postgresql solution (#133)
Location within the Book 365-366
You advise never to display sequential primary keys. There is a solution that might be worth mentioning: in the tables where, by default, Django displays the key in the URL to access the record, use non-sequential integer primary keys.
Databases in general have an easier time with integer keys. So non-sequential integer keys might be a better option than slugs or UUIDs. And making the integer keys non-sequential avoids the additional overhead of adding an extra field and index.
The idea has come up before, and there are solutions out there. Here are the links I found:
Pseudo_encrypt
Pseudo_encrypt_constrained_to_an_arbitrary_range
Here is the code I used to make integer keys at least 7 digits in length:
Make your own "secret sauce"! Tweak the numbers:
(((1366 * r1 + 150889) % 714025) / 714025.0)
As explained here:
sql-keys-in-depth
When tweaking the numbers, note that the last two must be the "same" value: the last one is the float version of the preceding integer.
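For readers who want to see where that expression sits, here is a minimal Python sketch of a three-round Feistel permutation built around it. It assumes the structure of the `pseudo_encrypt` function linked above (16-bit halves, three rounds, halves swapped at the end); the names `round_fn`, `a`, `b`, and `m` are hypothetical, and the rounding may not match PostgreSQL's float-to-int cast in every case. Treat it as an illustration of the idea, not as the SQL used in this issue.

```python
MASK16 = 0xFFFF


def round_fn(r, a=1366, b=150889, m=714025):
    """Map a 16-bit half-block to a pseudo-random value in 0..32767.

    a, b and m are the "secret sauce". m is used twice: once as the
    modulus and once (as a float) as the divisor, which is why the last
    two numbers in the expression must stay the same.
    """
    return round(((a * r + b) % m) / float(m) * 32767)


def pseudo_encrypt(value, a=1366, b=150889, m=714025, rounds=3):
    """Permute an unsigned 32-bit integer with a small Feistel network."""
    if not 0 <= value < 2**32:
        raise ValueError("value must fit in an unsigned 32-bit integer")
    left = (value >> 16) & MASK16
    right = value & MASK16
    for _ in range(rounds):
        # Classic Feistel step: the new left is the old right, and the
        # new right is the old left XORed with the round function output.
        left, right = right, left ^ round_fn(right, a, b, m)
    # Swap the halves on output, as in a standard Feistel construction.
    return (right << 16) + left


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, pseudo_encrypt(n))
```

Because every round uses the same function and the halves are swapped at the end, applying `pseudo_encrypt` twice returns the original value, so distinct inputs always give distinct outputs. That is what makes it usable for keys: feeding it a sequence counter cannot produce collisions.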
Yes, in my implementation, I tweaked the numbers. But for the following example, I used the non-tweaked numbers:
This solution can be implemented as a retrofit: just make the minimum value of the non-sequential key bigger than the biggest sequential key in your tables. Change this line:
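One way to picture the retrofit, as a minimal Python sketch rather than the code used above: add a fixed floor to the permuted sequence value, with the floor chosen above the largest existing sequential key. The names `permute32`, `nonsequential_key`, and `floor` are hypothetical, and the permutation is the same assumed Feistel structure as `pseudo_encrypt`. Adding a constant keeps the keys unique, and a floor of 1,000,000 guarantees at least 7 digits.

```python
def permute32(value, a=1366, b=150889, m=714025):
    """Assumed 3-round Feistel permutation on unsigned 32-bit integers."""
    left, right = (value >> 16) & 0xFFFF, value & 0xFFFF
    for _ in range(3):
        f = round(((a * right + b) % m) / float(m) * 32767)
        left, right = right, left ^ f
    return (right << 16) + left


def nonsequential_key(seq, floor=1_000_000):
    """Turn a sequence counter into a non-sequential key >= floor.

    floor should exceed the largest sequential key already in the table,
    so that new keys can never collide with the old ones.
    """
    if not 0 <= seq < 2**32:
        raise ValueError("seq must fit in an unsigned 32-bit integer")
    return floor + permute32(seq)


if __name__ == "__main__":
    for seq in range(1, 6):
        print(seq, nonsequential_key(seq))
```

Note that with this particular sketch the result can exceed the signed 32-bit range, so the key column would need to be a 64-bit integer.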
Also note that the secret sauce values are stored in the database, so keeping them out of version control is feasible.
I would not recommend this for the user table, as there is utility in having a recognizable alpha user name. This option can be a good fit for any other table.
Real-world example: eBay item numbers