Document menu scraping #33

Open
qinghao1 opened this issue Aug 2, 2020 · 10 comments

Comments

@qinghao1

qinghao1 commented Aug 2, 2020

Hi there, I don't think the code to scrape the OHS menu is in this repo, and the scraper that is here is for MongoDB. Could you add the code you're using, either here or in a separate repo?

@jonchan51
Contributor

Hi, you can refer to this branch for a psql version. I didn't merge it in because the way we were scraping the PDFs was rather awkward: the uhs filenames weren't standardized. You'll need to modify menu_downloader.py to make it work with the new filenames they are now using on the website.

@qinghao1
Author

qinghao1 commented Aug 2, 2020

Thanks! I see. It might be better to get the PDF URL by parsing https://uci.nus.edu.sg/ohs/current-residents/students/daily-menu-2/ instead of relying on the filenames. I'll see if I can help you with that if I have the time, haha.
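
A minimal sketch of that idea, assuming the menu page links to each PDF through an ordinary `<a href="...">` tag (I haven't verified the page's actual markup). `PdfLinkParser`, `find_pdf_links`, and the sample HTML are hypothetical names and data for illustration; none of them come from the repo. It uses only the standard library, so there is nothing extra to install.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Page mentioned in this thread; the links on it are assumed to point at PDFs.
MENU_PAGE_URL = "https://uci.nus.edu.sg/ohs/current-residents/students/daily-menu-2/"


class PdfLinkParser(HTMLParser):
    """Collect the absolute URL of every <a> tag whose href ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the extension.
        if href and href.split("?")[0].lower().endswith(".pdf"):
            # Resolve relative links against the page URL.
            self.pdf_urls.append(urljoin(self.base_url, href))


def find_pdf_links(html, base_url=MENU_PAGE_URL):
    """Return the PDF links found in an HTML string, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls


# Made-up HTML standing in for the real page, for illustration only.
sample_html = """
<ul>
  <li><a href="/ohs/files/menu-week-1.pdf">Week 1</a></li>
  <li><a href="https://example.com/files/Menu%20Week%202.PDF">Week 2</a></li>
  <li><a href="/ohs/contact/">Contact</a></li>
</ul>
"""

pdf_links = find_pdf_links(sample_html)
```

The downloader could then fetch whatever links come back instead of guessing the filename pattern, which is the part that keeps breaking.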

@qinghao1
Author

qinghao1 commented Aug 6, 2020

Just an update on this: I think they actually changed the URLs (https://uci.nus.edu.sg/ohs/current-residents/students/daily-menu-2/). I'm going to work on the parsing, but please let me know if you have already found another way around it. Thanks!

@moziliar
Contributor

Hi @qinghao1, may I know if the issue has been resolved? I was only in charge of the bot and am not very up to date on the scraper, which was done by my teammates.

@qinghao1
Author

Hi there, I think it hasn't been fixed, but this PR should provide everything you need to fix it. That's on the scraper side though, so I don't think it has anything to do with the bot itself.

@moziliar
Contributor

Thanks. I just tried running the scraper on the CentOS container again, and lru seems to be breaking on it without a meaningful error message. I replaced it with a normal dict and it still doesn't work. May I have your input on this?

@qinghao1
Author

What's the error that you're seeing? It might be the case that the OHS website is blocked. Maybe try running it locally?
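
One way to tell "the site is blocked" apart from "the scraper is broken" is to request the page from inside the container and look at what comes back. This is only a sketch: `check_reachable` is a hypothetical helper, not something in the repo, and the `opener` parameter exists purely so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def check_reachable(url, timeout=10, opener=urllib.request.urlopen):
    """Return a short, human-readable description of what happens on a GET."""
    try:
        with opener(url, timeout=timeout) as response:
            return f"OK: HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 403 or 404).
        return f"HTTP error: {exc.code} {exc.reason}"
    except urllib.error.URLError as exc:
        # DNS failure, refused connection, or similar network-level problem.
        return f"Connection failed: {exc.reason}"
    except OSError as exc:
        # Timeouts and other socket errors not wrapped in URLError.
        return f"Connection failed: {exc}"


# Example call to run inside the container and again locally:
# print(check_reachable("https://uci.nus.edu.sg/ohs/current-residents/students/daily-menu-2/"))
```

If it reports OK locally but fails inside the container, the problem is likely network access from the container rather than the scraper code.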

@qinghao1
Author

Also, you'd have to install lru with pipenv.

@moziliar
Contributor

I did run pipenv install with lru included, but the installation just outputs a stack trace without a meaningful error message.

@qinghao1
Author

I guess you could just replace it with a normal dict. I don't think it will use too much memory under normal use.
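
If memory ever does become a concern, a bounded cache can be built from the standard library alone, which avoids the failing install entirely. The sketch below assumes the scraper only needs simple get/put access by key; I haven't checked how lru is actually used in the code. `BoundedCache` is a hypothetical name, not something from the repo.

```python
from collections import OrderedDict


class BoundedCache:
    """Small least-recently-used cache built on OrderedDict."""

    def __init__(self, max_size=128):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        # Mark the key as most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict the least recently used entry once over capacity.
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)


cache = BoundedCache(max_size=2)
cache.put("monday", "menu A")
cache.put("tuesday", "menu B")
cache.get("monday")             # touch "monday" so "tuesday" is now the oldest
cache.put("wednesday", "menu C")  # evicts "tuesday"
```

A plain dict is still the simplest fix. This only matters if the number of cached entries can grow without bound.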
