Chapters 5-8 production merge (#139)
* welcome (#74)

* update the welcome page to indicate this is the python version, include links to the R version

* minor edits (newline for better diffs)

Co-authored-by: Trevor Campbell <[email protected]>

* put headings on the table of contents for easier navigation (#75)

* Update metadata and add chapter numbering (#44)

* Update _config.yml

* Update repo and branch info

* Make some visible changes

* Include packages we actually use and updated jupyterbook to fix netlify

* Test rebuilding notebooks again

* Fix typo

* Update ToC

* Add ibis

* Always execute all notebooks

* Delete requirements.txt

* Update source/_config.yml

* Update source/_config.yml

* Change path and branch

* Increase timeout

* Update _config.yml (#90)

* Ch3: wrangling (#76)

* wip on ch3

* working on wrangling chapter

* move chaining content to the intro

* update content on summary statistics

* update the discussion on apply

* remove the discussion on What is a List

* move the assign content to the very end

* minor wordsmithing on welcome page

* edited learning objs in ch1 for new chaining

* update discussion on chaining, edits from Trevor

* update discussion of split

* remove unnecessary call to `info`

* take care of colons preceding code blocks

* take care of chapter references

* add discussion of lists and dicts

* add table and discussion on basic data structures in python:

* add description of info

* some general cleanup in apply, assign

* typo fix in the intro

* a couple of type fixes:

* polish chaining/multiline exps

* polish ch3 up to and incl tidy data

* polish indexing

* more polish ch3

* fix python exercises link

* more on groupby

* improve groupby and discussion of lambda functions

* try re-ordering the assign and apply content

* global find replace to remove . in naming conventions

* caption on fig24 fixed

* polish up to apply

* cleanup through Using  to create new columns

* through the summary

* add :tags: [output_scroll] for large code outputs, change figure types

* trim vertical whitespace on figures:

* Update source/wrangling.md

Co-authored-by: Joel Ostblom <[email protected]>

* Apply suggestions from code review

Co-authored-by: Joel Ostblom <[email protected]>

* polishing ch3

* final polish on wrangling

* final polish on ch3 joel comments

Co-authored-by: Trevor Campbell <[email protected]>
Co-authored-by: Joel Ostblom <[email protected]>

* added altair_saver extension

* update build_html.sh script with new docker image

* Ch4: Viz (#77)

* code formatting for viz

* update viz chapter

* updating the viz chapter

* comments addressed through faithful dataset

* more progress on the viz chapter (part way through morley data)

* add back code to create the csv for mauna_loa

* a couple minor typo fixes

* polishing ch4

* minor polish on ch4

* code tags in learning objs

* polish on ch4, fixed number -> percentage in figure labels

* re-added other filetypes...

* better line formatting in saving section

* ignore altair warnings; committed faithful plots

* moved faithful plots to img/

* done polishing ch4

Co-authored-by: Trevor Campbell <[email protected]>

* removed unused material

* Front matter (#96)

* preface python

* remove foreword

* added editors page

* fix appendix,references

* added py acks

* minor ed

* Update editors.md

add Lindsey bio

* Add joels bio

Co-authored-by: Lindsey Heagy <[email protected]>
Co-authored-by: Joel Ostblom <[email protected]>

* Add jupyterlab help section (#101)

* Ch1 fig cleanup (#99)

* first figures in ch1:

* code figures for ch1, including ppt to edit them

* update figure sizes

* remove old lingering image

* removed hidden pptx cache file

Co-authored-by: Trevor Campbell <[email protected]>

* Ch2 fig cleanup (#102)

* update output scrolling for ch2

* update scrolling of large output tables

* Ch3 fig cleanup (#103)

* figure polishing for ch3

* more ch3 figures

* Jupyter and Version Control (#98)

* added jupyter and version control back in

* converted bookdown to jupyterbook images in jupyter chapter

* jupyterbook index entries in jupyter

* fig references to jupyterbook in jupytermd

* updated unicode characters in jupyter; added a caveat at the beginning regarding images

* initial index and figure template add to version control

* R -> Python in version control

* fixed figure refs bookdown->jupyterbook in vsn ctl

* jupyterbook index style vsn ctl

* fixed images in vsn ctl

* minor polish

* added caveat

* Update source/version-control.md

* admonitions for caveats

Co-authored-by: Lindsey Heagy <[email protected]>

* remove the visible call of `glue` in the saving section (#109)

* remove the visible call of `glue` in the saving section

* remove size calculation code

* trim whitespace from end of line for better diffs (#111)

* Chapter 5 production polish (#78)

* starting work on ch5+6; categorical type change; remove commented out R code

* value counts, class name remap, replace in ch5

* remove warnings

* polished ch5+6 up to euclidean dist

* minor bugfix

* minor bugfix

* fixed worksheets link at end of chp

* fix minor section heading wording in Ch1

* added nsmallest + note; better chaining for dist comps; removed comments; fixed colors (not working yet)

* initial fit and predict polished; model spec -> model object

* polishing preprocessing

* balancing polished

* pipelines

* learning objs

* mute warnings in ch5

* restore cls2 to main branch

* Update classification1.md

add output scroll for large table

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* remove random state specification from ch5

* fixed fill plots

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* better intro of meshgrid

* better warning filtering in chapters 4 and 5

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* remove np.number and replace with just 'number' in dtype selection

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* properly gluing column names

Co-authored-by: Lindsey Heagy <[email protected]>
Co-authored-by: Joel Ostblom <[email protected]>

* minor fixes post-merge

* Update viz.md (#137)

* Chapter 6 production polish (#86)

* starting work on ch5+6; categorical type change; remove commented out R code

* value counts, class name remap, replace in ch5

* remove warnings

* polished ch5+6 up to euclidean dist

* minor bugfix

* minor bugfix

* fixed worksheets link at end of chp

* fix minor section heading wording in Ch1

* added nsmallest + note; better chaining for dist comps; removed comments; fixed colors (not working yet)

* initial fit and predict polished; model spec -> model object

* polishing preprocessing

* balancing polished

* pipelines

* learning objs

* mute warnings in ch5

* warn mute code; fixed links at end

* restore cls2 to main branch

* remove caption hack; minor fix to learning objs

* Remove caption hack

* initial improved seed explanation

* random seed section polish done

* polished ch6 up to tuning

* initial cross val example done

* in python -> in scikit

* working on cross-val

* polished ch6 up to predictor selection

* commented out predictor selection

* done ch6 except final under/overfit plot

* warnings filter in ch6; remove seed hack cell

* remove reference to random state in train/test split

* minor typesetting .method() vs method

* put setup.md back in to fix broken links

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* values -> to_numpy in randomness section

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* remove code for area plot at the end of ch6

Co-authored-by: Joel Ostblom <[email protected]>

* Chapter 7 production polish (#125)

* fix exercises link

* wip

* polished ch7 up to under/overfitting

* done polishing ch7

* Update source/regression1.md

Co-authored-by: Lindsey Heagy <[email protected]>

* Update source/regression1.md

Co-authored-by: Lindsey Heagy <[email protected]>

* Update source/regression1.md

Co-authored-by: Lindsey Heagy <[email protected]>

* Update source/regression1.md

Co-authored-by: Lindsey Heagy <[email protected]>

* Update source/regression1.md

Co-authored-by: Lindsey Heagy <[email protected]>

* Update source/regression1.md

Co-authored-by: Lindsey Heagy <[email protected]>

Co-authored-by: Lindsey Heagy <[email protected]>

* Chapter 8 production polish (#126)

* fix exercises link

* polishing ch8

* done polishing ch8

* Update source/regression2.md

Co-authored-by: Lindsey Heagy <[email protected]>

* Update source/regression2.md

Co-authored-by: Lindsey Heagy <[email protected]>

* Update source/regression2.md

Co-authored-by: Lindsey Heagy <[email protected]>

Co-authored-by: Lindsey Heagy <[email protected]>

Co-authored-by: Lindsey Heagy <[email protected]>
Co-authored-by: Joel Ostblom <[email protected]>
Co-authored-by: GitHub Action <[email protected]>
4 people authored Jan 18, 2023
1 parent c4c13dd commit 4947023
Showing 18 changed files with 2,017 additions and 2,510 deletions.
2 changes: 1 addition & 1 deletion source/_toc.yml
@@ -9,7 +9,7 @@ parts:
- file: acknowledgements-python.md
- file: authors.md
- file: editors.md
#- file: setup.md
- file: setup.md
- caption: Chapters
numbered: 3
chapters:
6 changes: 3 additions & 3 deletions source/acknowledgements-python.md
@@ -15,11 +15,11 @@ kernelspec:

# Acknowledgments for the Python Edition

We'd like to thank everyone that has contributed to the development of
We'd like to thank everyone that has contributed to the development of
[*Data Science: A First Introduction (Python Edition)*](https://ubc-dsci.github.io/introduction-to-datascience-python/).
This is an open source Python translation of the original [*Data Science: A First Introduction*](https://datasciencebook.ca);
the original focused on the R programming language. Both of these books are
used to teach DSCI 100, a new introductory data science course
the original focused on the R programming language. Both of these books are
used to teach DSCI 100, a new introductory data science course
at the University of British Columbia (UBC).

We will finalize this acknowledgements section after the book is complete!
16 changes: 8 additions & 8 deletions source/acknowledgements.md
@@ -15,20 +15,20 @@ kernelspec:

# Acknowledgments

We'd like to thank everyone that has contributed to the development of
We'd like to thank everyone that has contributed to the development of
[*Data Science: A First Introduction*](https://datasciencebook.ca).
This is an open source textbook that began as a collection of course readings
for DSCI 100, a new introductory data science course
for DSCI 100, a new introductory data science course
at the University of British Columbia (UBC).
Several faculty members in the UBC Department of Statistics
were pivotal in shaping the direction of that course,
and as such, contributed greatly to the broad structure and
Several faculty members in the UBC Department of Statistics
were pivotal in shaping the direction of that course,
and as such, contributed greatly to the broad structure and
list of topics in this book. We would especially like to thank Matías
Salibían-Barrera for his mentorship during the initial development and roll-out
of both DSCI 100 and this book. His door was always open when
we needed to chat about how to best introduce and teach data science to our first-year students.

We would also like to thank all those who contributed to the process of
We would also like to thank all those who contributed to the process of
publishing this book. In particular, we would like to thank all of our reviewers for their feedback and suggestions:
Rohan Alexander, Isabella Ghement, Virgilio Gómez Rubio, Albert Kim, Adam Loy, Maria Prokofieva, Emily Riederer, and Greg Wilson.
The book was improved substantially by their insights.
@@ -37,8 +37,8 @@ for his support and encouragement throughout the process, and to
Roger Peng for graciously offering to write the Foreword.

Finally, we owe a debt of gratitude to all of the students of DSCI 100 over the past
few years. They provided invaluable feedback on the book and worksheets;
they found bugs for us (and stood by very patiently in class while
few years. They provided invaluable feedback on the book and worksheets;
they found bugs for us (and stood by very patiently in class while
we frantically fixed those bugs); and they brought a level of enthusiasm to the class
that sustained us during the hard work of creating a new course and writing a textbook.
Our interactions with them taught us how to teach data science, and that learning
10 changes: 5 additions & 5 deletions source/appendixA.md
@@ -13,14 +13,14 @@ kernelspec:
name: python3
---

# Downloading files from JupyterHub
# Downloading files from JupyterHub

This section will help you
save your work from a JupyterHub web-based platform to your own computer.
save your work from a JupyterHub web-based platform to your own computer.
Let's say you want to download everything inside a folder called `your_folder`
in your home directory.
First open a terminal \index{JupyterHub!file download} by clicking "terminal" in the Launcher tab.
Next, type the following in the terminal to create a
First open a terminal \index{JupyterHub!file download} by clicking "terminal" in the Launcher tab.
Next, type the following in the terminal to create a
compressed `.zip` archive for the work you are interested in downloading:

```
@@ -29,6 +29,6 @@ zip -r hub_folder.zip your_folder

After the compressing process is complete, right-click on `hub_folder.zip`
in the JupyterHub file browser
and click "Download". After the download is complete, you should be
and click "Download". After the download is complete, you should be
able to find the `hub_folder.zip` file on your own computer,
and unzip the file (typically by double-clicking on it).
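
For readers who would rather stay inside a notebook than open a terminal, the same archive can probably be produced from Python. The following is only a sketch, not part of the appendix itself: the helper name `zip_folder` is hypothetical, and it assumes the standard-library `shutil.make_archive` function is an acceptable stand-in for the `zip -r` command. The folder and archive names are the ones used in the text.

```python
import shutil
from pathlib import Path


def zip_folder(folder, archive_name):
    """Compress `folder` (and everything inside it) into `archive_name`.zip.

    Hypothetical helper, roughly equivalent to `zip -r archive_name.zip folder`.
    Returns the path of the archive that was written.
    """
    folder = Path(folder).expanduser().resolve()
    # Use an absolute archive path so the result does not depend on the
    # current working directory; make_archive appends ".zip" itself.
    base_name = Path(archive_name).expanduser().resolve()
    archive = shutil.make_archive(
        base_name=str(base_name),
        format="zip",
        root_dir=folder.parent,  # directory that contains the folder
        base_dir=folder.name,    # keep the folder name inside the archive
    )
    return Path(archive)
```

Under these assumptions, calling `zip_folder("~/your_folder", "~/hub_folder")` should write `hub_folder.zip` to the home directory, where it can be downloaded from the JupyterHub file browser as described above.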