You need to install Python, NumPy, Pandas, Matplotlib and Seaborn. For that, you can follow the instructions from 06-environment.md.
What's the version of Pandas that you installed?
You can get the version information using the __version__ field:
pd.__version__
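As a minimal sketch, the check above can be run as a standalone script. The only assumption is that Pandas is imported under the usual alias pd; the variable name version is just for illustration.

```python
import pandas as pd

# __version__ is a plain string attribute exposed by the package.
version = pd.__version__
print(version)
```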
For this homework, we'll use the California Housing Prices dataset.
You can download it with wget:
wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/housing.csv
Or just open it with your browser and click "Save as...".
Now read it with Pandas.
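A minimal sketch of reading a CSV with Pandas. To stay self-contained it parses a tiny made-up CSV from memory; the rows are invented for illustration and are not taken from the real dataset. With the downloaded file you would pass its path to pd.read_csv instead, assuming it was saved as housing.csv in the working directory.

```python
import io

import pandas as pd

# Toy stand-in for the downloaded file (made-up rows, illustration only).
csv_text = """longitude,latitude,median_house_value,ocean_proximity
-122.2,37.8,300000.0,NEAR BAY
-118.3,34.1,150000.0,INLAND
"""

# With the real file, assuming it is saved next to the script:
#   df = pd.read_csv("housing.csv")
df = pd.read_csv(io.StringIO(csv_text))

# head() shows the first rows, which is a quick sanity check after loading.
print(df.head())
```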
How many columns are in the dataset?
- 10
- 6560
- 10989
- 20640
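One way to count rows and columns is the shape attribute of a DataFrame. The sketch below uses a hypothetical toy frame, so its numbers say nothing about the homework dataset.

```python
import pandas as pd

# Hypothetical toy frame: 3 rows, 2 columns.
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# shape is a (rows, columns) tuple.
n_rows, n_cols = toy.shape
print(n_rows, n_cols)  # 3 2

# len(toy.columns) gives the same column count.
print(len(toy.columns))  # 2
```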
Which columns in the dataset have missing values?
- total_rooms
- total_bedrooms
- both of the above
- no empty columns in the dataset
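A possible way to find columns with missing values is to count the nulls per column and keep the non-zero counts. The column names col_a and col_b below are placeholders on a made-up frame, not columns of the homework dataset.

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value in col_b.
toy = pd.DataFrame({
    "col_a": [1.0, 2.0, 3.0],
    "col_b": [10.0, np.nan, 30.0],
})

# isnull() marks missing cells; sum() counts them per column.
missing_counts = toy.isnull().sum()

# Keep only the columns that have at least one missing value.
columns_with_missing = missing_counts[missing_counts > 0].index.tolist()
print(columns_with_missing)  # ['col_b']
```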
How many unique values does the ocean_proximity column have?
- 3
- 5
- 7
- 9
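Counting distinct values in a column can be sketched with nunique, and unique lists the values themselves. The category column and its labels are invented for this example.

```python
import pandas as pd

# Made-up categorical column with repeated labels.
toy = pd.DataFrame({"category": ["red", "blue", "red", "green", "blue"]})

# nunique() counts distinct values (missing values are excluded by default).
n_unique = toy["category"].nunique()
print(n_unique)  # 3

# unique() returns the distinct labels, handy for checking exact spellings.
labels = toy["category"].unique()
print(labels)
```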
What's the average value of the median_house_value for the houses located near the bay?
- 49433
- 124805
- 259212
- 380440
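An average over a subset of rows can be sketched by filtering with a boolean mask and then calling mean. The group and value columns are hypothetical; in the homework you would filter on the exact label that ocean_proximity uses for bay-side locations, which is worth checking with unique() first.

```python
import pandas as pd

# Made-up data: two groups with two values each.
toy = pd.DataFrame({
    "group": ["A", "B", "A", "B"],
    "value": [10.0, 20.0, 30.0, 40.0],
})

# Boolean mask selects the rows of one group; mean() averages the column.
mean_a = toy.loc[toy["group"] == "A", "value"].mean()
print(mean_a)  # 20.0

# groupby gives the same statistic for every group at once.
means = toy.groupby("group")["value"].mean()
print(means)
```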
- Calculate the average of total_bedrooms column in the dataset.
- Use the fillna method to fill the missing values in total_bedrooms with the mean value from the previous step.
- Now, calculate the average of total_bedrooms again.
- Has it changed?
Has it changed?
Hint: take into account only 3 digits after the decimal point.
- Yes
- No
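The steps of this question could be sketched as follows on a made-up column x. Note that mean() skips missing values by default, and that fillna returns a new Series, so the result is assigned back to the column.

```python
import numpy as np
import pandas as pd

# Made-up column with one missing value.
toy = pd.DataFrame({"x": [1.0, 2.0, np.nan, 3.0]})

# Step 1: mean of the observed values (NaN is skipped by default).
mean_before = toy["x"].mean()

# Step 2: fill the gaps with that mean and assign the result back.
toy["x"] = toy["x"].fillna(mean_before)

# Step 3: recompute the mean and compare at 3 decimal places.
mean_after = toy["x"].mean()
print(round(mean_before, 3), round(mean_after, 3))
```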
- Select all the rows located on islands.
- Select only columns housing_median_age, total_rooms, total_bedrooms.
- Get the underlying NumPy array. Let's call it X.
- Compute matrix-matrix multiplication between the transpose of X and X. To get the transpose, use X.T. Let's call the result XTX.
- Compute the inverse of XTX.
- Create an array y with values [950, 1300, 800, 1000, 1300].
- Multiply the inverse of XTX with the transpose of X, and then multiply the result by y. Call the result w.
- What's the value of the last element of w?
Note: You just implemented linear regression. We'll talk about it in the next lesson.
- -1.4812
- 0.001
- 5.6992
- 23.1233
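The same sequence of steps, w = inverse(X.T X) X.T y, can be sketched on a made-up frame. The columns f1, f2 and label, and the values of y, are invented for illustration; y is chosen as 2 * f1 + 3 * f2 on the kept rows so the expected weights are known in advance.

```python
import numpy as np
import pandas as pd

# Made-up data; "label" plays the role of the filter column.
toy = pd.DataFrame({
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.0, 1.0, 4.0, 3.0],
    "label": ["keep", "keep", "drop", "keep"],
})

# Filter rows, select the feature columns, get the NumPy array.
subset = toy[toy["label"] == "keep"]
X = subset[["f1", "f2"]].values

# Gram matrix and its inverse.
XTX = X.T @ X
XTX_inv = np.linalg.inv(XTX)

# Targets built as 2 * f1 + 3 * f2 for the three kept rows.
y = np.array([8.0, 7.0, 17.0])

# Evaluated left to right: (XTX_inv @ X.T) @ y.
w = XTX_inv @ X.T @ y
print(w)      # approximately [2. 3.]
print(w[-1])  # last element of w
```

Because y is an exact linear combination of the columns of X here, the recovered weights match the ones used to build it, up to floating-point error.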
- Submit your results here: https://forms.gle/jneGM91mzDZ23i8HA
- You can submit your solution multiple times. In this case, only the last submission will be used.
- If your answer doesn't match options exactly, select the closest one
The deadline for submitting is 18 September 2023 (Monday), 23:00 CEST (Berlin time).
After that, the form will be closed.