Web scraping / API re-introduction #264
Conversation
…ge; scraping section done
@joelostblom this should be good now, but I did want your opinion on one thing: the last block of code in the API section. It looks like this:
data_dict = {
    "date": [],
    "title": [],
    "copyright": [],
    "url": []
}
for item in nasa_data:
    data_dict["copyright"].append(item["copyright"] if "copyright" in item else None)
    for entry in ["url", "title", "date"]:
        data_dict[entry].append(item[entry])
nasa_df = pd.DataFrame(data_dict)
nasa_df
It is a little bit complicated for chapter 2 of the book, so I currently put a note box warning before it. But do you have any idea for how to simplify it?
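One possible simplification (a sketch using made-up sample data; the real `nasa_data` comes from the API response) is to replace the ternary expression with `dict.get`, which already returns `None` for missing keys:

```python
import pandas as pd

# Hypothetical sample standing in for the real nasa_data from the API;
# the second item deliberately lacks the "copyright" key.
nasa_data = [
    {"date": "2023-01-01", "title": "A", "copyright": "X", "url": "u1"},
    {"date": "2023-01-02", "title": "B", "url": "u2"},
]

data_dict = {
    "date": [],
    "title": [],
    "copyright": [],
    "url": []
}
for item in nasa_data:
    # dict.get returns None when the key is missing,
    # so no ternary expression or if/else branch is needed
    data_dict["copyright"].append(item.get("copyright"))
    for entry in ["url", "title", "date"]:
        data_dict[entry].append(item[entry])
nasa_df = pd.DataFrame(data_dict)
```

This keeps the same loop structure as the original snippet while dropping the conditional entirely.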
I didn't go through this in detail, but from skimming it looks great overall!
I don't have anything drastically simpler for your snippet at the end. Maybe students would find it easier without the ternary expression?
data_dict = {
    "date": [],
    "title": [],
    "copyright": [],
    "url": []
}
for item in nasa_data:
    if "copyright" in item:
        data_dict["copyright"].append(item["copyright"])
    else:
        data_dict["copyright"].append(None)
    for entry in ["url", "title", "date"]:
        data_dict[entry].append(item[entry])
nasa_df = pd.DataFrame(data_dict)
nasa_df
One note is that we use 4 spaces for indentation in the rest of the book, but it seems like you used 8 here. Another note is that if we want to extend this chapter sometime in the future, I think https://scrapy.org/ is both more powerful and more intuitive in many cases than using Beautiful Soup directly.
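An even shorter option worth considering (again a sketch with hypothetical sample data) is to skip the loop altogether: the `pd.DataFrame` constructor accepts a list of dicts directly and fills missing keys with `NaN`:

```python
import pandas as pd

# Hypothetical sample; real entries come from the NASA API response.
nasa_data = [
    {"date": "2023-01-01", "title": "A", "copyright": "X", "url": "u1"},
    {"date": "2023-01-02", "title": "B", "url": "u2"},
]

# pd.DataFrame accepts a list of dicts and fills any missing keys
# (like "copyright" in the second item) with NaN, no loop required.
nasa_df = pd.DataFrame(nasa_data, columns=["date", "title", "copyright", "url"])
```

Whether `NaN` instead of `None` is acceptable here depends on how the column is used later in the chapter.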
Closes #64
Closes #270
Closes #272
- `.html` file from wiki (behind the scenes loads that file)
- `lxml` to the book image
- `update_environment` workflow