Chapter 5 production polish #78

Merged
merged 38 commits into from
Jan 11, 2023

Conversation

trevorcampbell
Contributor

@trevorcampbell trevorcampbell commented Dec 31, 2022

Ignore the branch name -- I originally was planning to do both ch5+6 in this PR, but figured it would be better to just split them.

@trevorcampbell trevorcampbell changed the title [WIP] Chapters 5 & 6 production polish [WIP] Chapter 5 production polish Jan 2, 2023
@trevorcampbell trevorcampbell changed the title [WIP] Chapter 5 production polish Chapter 5 production polish Jan 2, 2023
Collaborator

@joelostblom joelostblom left a comment

Looking great overall, nicely done!

source/classification1.md
Comment on lines +256 to +263
the `groupby` and `count` methods to find the number and percentage
of benign and malignant tumor observations in our data set. When paired with
`groupby`, `count` counts the number of observations for each value of the `Class`
variable. Then we calculate the percentage in each group by dividing by the total
number of observations and multiplying by 100. We have
{glue:}`benign_count` ({glue:}`benign_pct`\%) benign and
{glue:}`malignant_count` ({glue:}`malignant_pct`\%) malignant
tumor observations.
Collaborator

Suggested change
the `groupby` and `size` methods to find the number and percentage
of benign and malignant tumor observations in our data set. When paired with
`groupby`, `size` counts the number of observations for each value of the `Class`
variable.

Collaborator

I moved the rest down so that we do one step at a time
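
To illustrate why the suggestion swaps `count` for `size`, here is a minimal sketch. It assumes a small made-up data frame standing in for the chapter's `cancer` data; the `Area` column and its values are hypothetical and only serve to show how the two methods treat missing values.

```python
import pandas as pd

# Hypothetical stand-in for the chapter's `cancer` data frame.
cancer = pd.DataFrame({
    "Class": ["Benign", "Benign", "Malignant", "Benign", "Malignant"],
    "Area": [1.2, None, 3.4, 0.8, 2.9],
})

# size(): one number per group, the number of rows, missing values included.
class_sizes = cancer.groupby("Class").size()

# count(): one number per group and per remaining column, non-missing values only.
class_counts = cancer.groupby("Class").count()

print(class_sizes)
print(class_counts)
```

In this sketch `size` returns a single series of group sizes, while `count` returns a data frame with one column of counts for every non-grouping column, and those counts drop when a column has missing entries.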

Comment on lines +268 to 270
explore_cancer['percentage'] = 100 * explore_cancer['count']/len(cancer)
explore_cancer
```
Collaborator

Suggested change
```
Then we calculate the percentage in each group by dividing by the total
number of observations and multiplying by 100. We have
{glue:}`benign_count` ({glue:}`benign_pct`\%) benign and
{glue:}`malignant_count` ({glue:}`malignant_pct`\%) malignant
tumor observations.
```{code-cell} ipython3
100 * cancer.groupby('Class').size() / cancer.shape[0]
```

`shape` is preferred over `len` on DataFrames, since it is already computed and is in the spirit of using DataFrame attributes and methods rather than global Python functions in general.
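
As a rough sketch of the percentage calculation in the suggestion, the snippet below uses a tiny made-up `cancer` data frame (the four rows are hypothetical) and checks that `shape[0]` and `len` agree on the number of rows.

```python
import pandas as pd

# Hypothetical stand-in for the chapter's `cancer` data frame.
cancer = pd.DataFrame({"Class": ["Benign", "Benign", "Malignant", "Benign"]})

# shape is a (rows, columns) tuple; its first entry matches len(cancer).
n_rows = cancer.shape[0]
assert n_rows == len(cancer)

# Percentage of observations in each class.
class_pct = 100 * cancer.groupby("Class").size() / n_rows
print(class_pct)
```

Both spellings give the same row count, so choosing `shape[0]` here is a matter of style and consistency rather than a change in the result.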

Contributor Author

Opened an issue about converting all uses of `len` to `shape` throughout.

I will open another issue about `size` -- we may need to edit earlier chapters to do that.

add output scroll for large table
Collaborator

@lheagy lheagy left a comment

@joelostblom did a thorough review -- I just added a minor comment (and a commit that enables scroll for the cancer data table because it is quite large). It otherwise looks good to me!

@trevorcampbell trevorcampbell merged commit 2a45dd7 into main Jan 11, 2023
trevorcampbell added a commit that referenced this pull request Jan 18, 2023
* welcome (#74)

* update the welcome page to indicate this is the python version, include links to the R version

* minor edits (newline for better diffs)

Co-authored-by: Trevor Campbell <[email protected]>

* put headings on the table of contents for easier navigation (#75)

* Update metadata and add chapter numbering (#44)

* Update _config.yml

* Update repo and branch info

* Make some visible changes

* Include packages we actually use and updated jupyterbook to fix netlify

* Test rebuilding notebooks again

* Fix typo

* Update ToC

* Add ibis

* Always execute all notebooks

* Delete requirements.txt

* Update source/_config.yml

* Update source/_config.yml

* Change path and branch

* Increase timeout

* Update _config.yml (#90)

* Ch3: wrangling (#76)

* wip on ch3

* working on wrangling chapter

* move chaining content to the intro

* update content on summary statistics

* update the discussion on apply

* remove the discussion on What is a List

* move the assign content to the very end

* minor wordsmithing on welcome page

* edited learning objs in ch1 for new chaining

* update discussion on chaining, edits from Trevor

* update discussion of split

* remove unnecessary call to

* take care of colons preceding code blocks

* take care of chapter references

* add discussion of lists and dicts

* add table and discussion on basic data structures in python:

* add description of info

* some general cleanup in apply, assign

* typo fix in the intro

* a couple of type fixes:

* polish chaining/multiline exps

* polish ch3 up to and incl tidy data

* polish indexing

* more polish ch3

* fix python exercises link

* more on groupby

* improve groupby and discussion of lambda functions

* try re-ordering the assign and apply content

* global find replace to remove . in naming conventions

* caption on fig24 fixed

* polish up to apply

* cleanup through Using  to create new columns

* through the summary

* add :tags: [output_scroll] for large code outputs, change figure types

* trim vertical whitespace on figures:

* Update source/wrangling.md

Co-authored-by: Joel Ostblom <[email protected]>

* Apply suggestions from code review

Co-authored-by: Joel Ostblom <[email protected]>

* polishing ch3

* final polish on wrangling

* final polish on ch3 joel comments

Co-authored-by: Trevor Campbell <[email protected]>
Co-authored-by: Joel Ostblom <[email protected]>

* added altair_saver extension

* update build_html.sh script with new docker image

* Ch4: Viz (#77)

* code formatting for viz

* update viz chapter

* updating the viz chapter

* comments addressed through faithful dataset

* more progress on the viz chapter (part way through morley data)

* add back code to create the csv for mauna_loa

* a couple minor typo fixes

* polishing ch4

* minor polish on ch4

* code tags in learning objs

* polish on ch4, fixed number -> percentage in figure labels

* re-added other filetypes...

* better line formatting in saving section

* ignore altair warnings; committed faithful plots

* moved faithful plots to img/

* done polishing ch4

Co-authored-by: Trevor Campbell <[email protected]>

* removed unused material

* Front matter (#96)

* preface python

* remove foreword

* added editors page

* fix appendix,references

* added py acks

* minor ed

* Update editors.md

add Lindsey bio

* Add joels bio

Co-authored-by: Lindsey Heagy <[email protected]>
Co-authored-by: Joel Ostblom <[email protected]>

* Add jupyterlab help section (#101)

* Ch1 fig cleanup (#99)

* first figures in ch1:

* code figures for ch1, including ppt to edit them

* update figure sizes

* remove old lingering image

* removed hidden pptx cache file

Co-authored-by: Trevor Campbell <[email protected]>

* Ch2 fig cleanup (#102)

* update output scrolling for ch2

* update scrolling of large output tables

* Ch3 fig cleanup (#103)

* figure polishing for ch3

* more ch3 figures

* Jupyter and Version Control (#98)

* added jupyter and version control back in

* converted bookdown to jupyterbook images in jupyter chapter

* jupyterbook index entries in jupyter

* fig references to jupyterbook in jupytermd

* updated unicode characters in jupyter; added a caveat at the beginning regarding images

* initial index and figure template add to version control

* R -> Python in version control

* fixed figure refs bookdown->jupyterbook in vsn ctl

* jupyterbook index style vsn ctl

* fixed images in vsn ctl

* minor polish

* added caveat

* Update source/version-control.md

* admonitions for caveats

Co-authored-by: Lindsey Heagy <[email protected]>

* remove the visible call of `glue` in the saving section (#109)

* remove the visible call of `glue` in the saving section

* remove size calculation code

* trim whitespace from end of line for better diffs (#111)

* Chapter 5 production polish (#78)

* starting work on ch5+6; categorical type change; remove commented out R code

* value counts, class name remap, replace in ch5

* remove warnings

* polished ch5+6 up to euclidean dist

* minor bugfix

* minor bugfix

* fixed worksheets link at end of chp

* fix minor section heading wording in Ch1

* added nsmallest + note; better chaining for dist comps; removed comments; fixed colors (not working yet)

* initial fit and predict polished; model spec -> model object

* polishing preprocessing

* balancing polished

* pipelines

* learning objs

* mute warnings in ch5

* restore cls2 to main branch

* Update classification1.md

add output scroll for large table

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* remove random state specification from ch5

* fixed fill plots

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* better intro of meshgrid

* better warning filtering in chapters 4 and 5

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* remove np.number and replace with just 'number' in dtype selection

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification1.md

Co-authored-by: Joel Ostblom <[email protected]>

* properly gluing column names

Co-authored-by: Lindsey Heagy <[email protected]>
Co-authored-by: Joel Ostblom <[email protected]>

* minor fixes post-merge

* Update viz.md (#137)

* Chapter 6 production polish (#86)

* starting work on ch5+6; categorical type change; remove commented out R code

* value counts, class name remap, replace in ch5

* remove warnings

* polished ch5+6 up to euclidean dist

* minor bugfix

* minor bugfix

* fixed worksheets link at end of chp

* fix minor section heading wording in Ch1

* added nsmallest + note; better chaining for dist comps; removed comments; fixed colors (not working yet)

* initial fit and predict polished; model spec -> model object

* polishing preprocessing

* balancing polished

* pipelines

* learning objs

* mute warnings in ch5

* warn mute code; fixed links at end

* restore cls2 to main branch

* remove caption hack; minor fix to learning objs

* Remove caption hack

* initial improved seed explanation

* random seed section polish done

* polished ch6 up to tuning

* initial cross val example done

* in python -> in scikit

* working on cross-val

* polished ch6 up to predictor selection

* commented out predictor selection

* done ch6 except final under/overfit plot

* warnings filter in ch6; remove seed hack cell

* remove reference to random state in train/test split

* minor typesetting .method() vs method

* put setup.md back in to fix broken links

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* values -> to_numpy in randomness section

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* remove code for area plot at the end of ch6

Co-authored-by: Joel Ostblom <[email protected]>

* Chapter 7 production polish (#125)

* fix exercises link

* wip

* polished ch7 up to under/overfitting

* done polishing ch7

* Update source/regression1.md

Co-authored-by: Lindsey Heagy <[email protected]>

* Update source/regression1.md

Co-authored-by: Lindsey Heagy <[email protected]>

* Update source/regression1.md

Co-authored-by: Lindsey Heagy <[email protected]>

* Update source/regression1.md

Co-authored-by: Lindsey Heagy <[email protected]>

* Update source/regression1.md

Co-authored-by: Lindsey Heagy <[email protected]>

* Update source/regression1.md

Co-authored-by: Lindsey Heagy <[email protected]>

Co-authored-by: Lindsey Heagy <[email protected]>

* Chapter 8 production polish (#126)

* fix exercises link

* polishing ch8

* done polishing ch8

* Update source/regression2.md

Co-authored-by: Lindsey Heagy <[email protected]>

* Update source/regression2.md

Co-authored-by: Lindsey Heagy <[email protected]>

* Update source/regression2.md

Co-authored-by: Lindsey Heagy <[email protected]>

Co-authored-by: Lindsey Heagy <[email protected]>

Co-authored-by: Lindsey Heagy <[email protected]>
Co-authored-by: Joel Ostblom <[email protected]>
Co-authored-by: GitHub Action <[email protected]>
Development

Successfully merging this pull request may close these issues.

Suggestions for Ch5 - Classification 1 classification1
3 participants