Set numpy and python random seeds for md output #124

Open
wants to merge 3 commits into
base: main

Bluesy1 (Collaborator) commented Aug 29, 2024

This should let us limit changes to the public/instructor variants of every question each time a new question is added to a problem bank.

Also, this makes it so tests don't need to set seeds in the import section of files, making the templates closer to how they should actually be used.
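
To illustrate the idea, here is a minimal sketch of seeding both generators in one place before a question is rendered, so that a template does not need to seed anything in its own import section. The names `seed_all` and `sample_variant` are hypothetical and are not taken from this PR or from `problem_bank_helpers`; the seed value 111 is the one that appears later in this thread.

```python
import random

import numpy as np


def seed_all(seed: int) -> None:
    """Seed Python's `random` module and NumPy's legacy global RNG."""
    random.seed(seed)
    np.random.seed(seed)


def sample_variant(seed: int) -> tuple[int, float]:
    # Hypothetical stand-in for a question template that draws random values.
    # Seeding happens here, once, instead of in the template's imports.
    seed_all(seed)
    mass = random.randint(1, 100)
    speed = float(np.random.uniform(0.0, 10.0))
    return mass, speed


# Two builds with the same seed produce the same variant.
first = sample_variant(111)
second = sample_variant(111)
print(first == second)
```

With a fixed seed, rebuilding the site after adding an unrelated question should leave existing variants unchanged, assuming each question is seeded independently of the others.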

Bluesy1 (Collaborator, Author) commented Aug 29, 2024

@firasm I don't know if this is something you want or not. I'll leave this for your consideration, since it's a somewhat substantial change to how the public bank sites get built.

firasm (Contributor) commented Aug 30, 2024

Well, the original intent was to constantly change it (on every push) so that they're less googlable and scrapeable by Chegg.

The world has now changed and now we have bigger problems (ChatGPT), where it's irrelevant to change features on every push.


PS. I noticed that the random seeds were removed from the source test files when you solved the problem with the duplicated imports.

This is addressed by this PR, right?

Bluesy1 (Collaborator, Author) commented Aug 30, 2024

> Well, the original intent was to constantly change it (on every push) so that they're less googlable and scrapeable by Chegg.
>
> The world has now changed and now we have bigger problems (ChatGPT), where it's irrelevant to change features on every push.

I see. I had looked at this from the opposite angle: by changing the variant on every commit, after enough time we've effectively published every variant of a question to anyone who looks through the public git history of one of the public problem banks.

If you really want to stop scraping, you should probably disallow scraping of the problem banks on your website by updating your robots.txt.

I would imagine something like this would do:

Simpler

User-agent: *
Disallow: /oer/ # block all of the oer resources

Sitemap: https://firas.moosvi.com/sitemap.xml

More Precise

User-agent: *
Disallow: /oer/datascience_bank/
Disallow: /oer/physics_bank/
Disallow: /oer/stats_bank/ 

Sitemap: https://firas.moosvi.com/sitemap.xml
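
As a rough way to sanity-check rules like these before deploying them, the sketch below feeds the "More Precise" variant to Python's standard-library `urllib.robotparser` and asks whether a crawler may fetch a few paths. The user agent `ExampleBot` and the page paths are hypothetical and exist only for illustration. Note that robots.txt is advisory: it only affects crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# The "More Precise" rules from this comment, inlined for a local check.
ROBOTS_TXT = """\
User-agent: *
Disallow: /oer/datascience_bank/
Disallow: /oer/physics_bank/
Disallow: /oer/stats_bank/

Sitemap: https://firas.moosvi.com/sitemap.xml
"""

SITE = "https://firas.moosvi.com"


def is_allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, SITE + path)


# Hypothetical paths, used only to exercise the rules.
for path in ("/oer/physics_bank/q1.html", "/oer/", "/blog/"):
    print(path, is_allowed(path))
```

Under these rules only the three bank directories are blocked; the `/oer/` index itself stays crawlable, which is the difference from the "Simpler" variant.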

> PS. I noticed that the random seeds were removed from the source test files when you solved the problem with the duplicated imports.
>
> This is addressed by this PR, right?

They were never fully removed, just no longer duplicated (current main):

import random as rd; rd.seed(111)
import pandas as pd
import problem_bank_helpers as pbh
