Chapter 5 production polish #78
…nts; fixed colors (not working yet)
Looking great overall, nicely done!
the `groupby` and `count` methods to find the number and percentage
of benign and malignant tumor observations in our data set. When paired with
`groupby`, `count` counts the number of observations for each value of the `Class`
variable. Then we calculate the percentage in each group by dividing by the total
number of observations and multiplying by 100. We have
{glue:}`benign_count` ({glue:}`benign_pct`\%) benign and
{glue:}`malignant_count` ({glue:}`malignant_pct`\%) malignant
tumor observations.
the `groupby` and `size` methods to find the number and percentage
of benign and malignant tumor observations in our data set. When paired with
`groupby`, `size` counts the number of observations for each value of the `Class`
variable.
I moved the rest down so that we do one step at a time
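As a rough illustration of the `count` to `size` change suggested above, the sketch below uses a small made-up data frame (the `toy` name, its `Area` column, and its values are assumptions, not the book's cancer data). It shows that `size` returns one row count per group, while `count` returns non-missing counts for each remaining column.

```python
import pandas as pd

# Hypothetical toy data standing in for the cancer data set.
toy = pd.DataFrame({
    "Class": ["Benign", "Benign", "Malignant", "Benign"],
    "Area": [1.0, None, 3.0, 4.0],
})

# size: number of rows in each group, returned as a Series.
class_sizes = toy.groupby("Class").size()

# count: number of non-missing values per column in each group,
# returned as a DataFrame, so missing values lower the count.
class_counts = toy.groupby("Class").count()

print(class_sizes)
print(class_counts)
```

With this toy data, the two results differ for the `Benign` group because one `Area` value is missing.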
explore_cancer['percentage'] = 100 * explore_cancer['count']/len(cancer)
explore_cancer
```
Then we calculate the percentage in each group by dividing by the total
number of observations and multiplying by 100. We have
{glue:}`benign_count` ({glue:}`benign_pct`\%) benign and
{glue:}`malignant_count` ({glue:}`malignant_pct`\%) malignant
tumor observations.
```{code-cell} ipython3
100 * cancer.groupby('Class').size() / cancer.shape[0]
```
`shape` is preferred over `len` on dataframes since it is already computed, and it is in the spirit of using dataframe methods instead of global Python functions in general.
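A minimal sketch of the percentage step, again on made-up data (the `toy` frame and its values are assumptions for illustration only). It divides the group sizes by `shape[0]` and checks that this matches the row count from `len`.

```python
import pandas as pd

# Hypothetical toy data; only the Class column matters here.
toy = pd.DataFrame({"Class": ["Benign", "Benign", "Malignant", "Benign"]})

# shape is a (rows, columns) tuple, so shape[0] is the number of rows.
n_rows = toy.shape[0]

# Percentage of observations in each class.
percentages = 100 * toy.groupby("Class").size() / n_rows

# len() on a data frame gives the same row count.
same_as_len = n_rows == len(toy)

print(percentages)
```

Both spellings give the same number, so the choice between them is about style and consistency rather than the result.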
Opened an issue about converting all to shape throughout
I will open another issue about size -- we may need to edit earlier chapters to do that
add output scroll for large table
@joelostblom did a thorough review -- I just added a minor comment (and a commit that enables scroll for the cancer data table because it is quite large). It otherwise looks good to me!
Co-authored-by: Joel Ostblom <[email protected]>
* welcome (#74) * update the welcome page to indicate this is the python version, include links to the R version * minor edits (newline for better diffs) Co-authored-by: Trevor Campbell <[email protected]> * put headings on the table of contents for easier navigation (#75) * Update metadata and add chapter numbering (#44) * Update _config.yml * Update repo and branch info * Make some visible changes * Include packages we actually use and updated jupyterbook to fix netlify * Test rebuilding notebooks again * Fix typo * Update ToC * Add ibis * Always execute all notebooks * Delete requirements.txt * Update source/_config.yml * Update source/_config.yml * Change path and branch * Increase timeout * Update _config.yml (#90) * Ch3: wrangling (#76) * wip on ch3 * working on wrangling chapter * move chaining content to the intro * update content on summary statistics * update the disucssion on apply * remove the discussion on What is a List * move the assign content to the very end * minor wordsmithing on welcome page * edited learning objs in ch1 for new chaining * update discussion on chaining, edits from Trevor * update discussion of split * remove unnecessary call to
* take care of colons preceding code blocks * take care of chapter references * add discussion of lists and dicts * add table and discussion on basic data structures in python: * add description of info * some general cleanup in apply, assign * typo fix in the intro * a couple of type fixes: * polish chaining/multiline exps * polish ch3 up to and incl tidy data * polish indexing * more polish ch3 * fix python exercises link * more on groupby * improve groupy and discussion of lambda functions * try re-ordering the assign and apply content * global find replace to remove . in naming conventions * caption on fig24 fixed * polish up to apply * cleanup through Using to create new columns * through the summery * add :tags: [output_scroll] for large code outputs, change figure types * trim vertical whitespace on figures: * Update source/wrangling.md Co-authored-by: Joel Ostblom <[email protected]> * Apply suggestions from code review Co-authored-by: Joel Ostblom <[email protected]> * polishing ch3 * final polish on wrangling * final polish on ch3 joel comments Co-authored-by: Trevor Campbell <[email protected]> Co-authored-by: Joel Ostblom <[email protected]> * added altair_saver extension * update build_html.sh script with new docker image * Ch4: Viz (#77) * code formatting for viz * update viz chapter * updating the viz chapter * comments addressed through faithful dataset * more progress on the viz chapter (part way through morley data) * add back code to create the csv for mauna_loa * a couple minor typo fixes * polishing ch4 * minor polish on ch4 * code tags in learning objs * polish on ch4, fixed number -> percentage in figure labels * re-added other filetypes... 
* better line formatting in saving section * ignore altair warnings; committed faithful plots * moved faithful plots to img/ * done polishing ch4 Co-authored-by: Trevor Campbell <[email protected]> * removed unused material * Front matter (#96) * preface python * remove foreword * added editors page * fix appendix,references * added py acks * minor ed * Update editors.md add Lindsey bio * Add joels bio Co-authored-by: Lindsey Heagy <[email protected]> Co-authored-by: Joel Ostblom <[email protected]> * Add jupyterlab help section (#101) * Ch1 fig cleanup (#99) * first figures in ch1: * code figures for ch1, including ppt to edit them * update figure sizes * remove old lingering image * removed hidden pptx cache file Co-authored-by: Trevor Campbell <[email protected]> * Ch2 fig cleanup (#102) * update output scrolling for ch2 * update scrolling of large output tables * Ch3 fig cleanup (#103) * figure polishing for ch3 * more ch3 figures * Jupyter and Version Control (#98) * added jupyter and version control back in * converted bookdown to jupyterbook images in jupyter chapter * jupyterbook index entries in jupyter * fig references to jupyterbook in jupytermd * updated unicode characters in jupyter; added a caveat at the beginning regarding images * initial index and figure template add to version control * R -> Python in version control * fixed figure refs bookdown->jupyterbook in vsn ctl * jupyterbook index style vsn ctl * fixed images in vsn ctl * minor polish * added caveat * Update source/version-control.md * admonitions for caveats Co-authored-by: Lindsey Heagy <[email protected]> * remove the visible call of `glue` in the saving section (#109) * remove the visible call of in the saving section * remove size calculation code * trim whitespace from end of line for better diffs (#111) * Chapter 5 production polish (#78) * starting work on ch5+6; categorical type change; remove commented out R code * value counts, class name remap, replace in ch5 * remove warnings 
* polished ch5+6 up to euclidean dist * minor bugfix * minor bugfix * fixed worksheets link at end of chp * fix minor section heading wording in Ch1 * added nsmallest + note; better chaining for dist comps; removed comments; fixed colors (not working yet) * initial fit and predict polished; model spec -> model object * polishing preprocessing * balancing polished * pipelines * learning objs * mute warnings in ch5 * restore cls2 to main branch * Update classification1.md add output scroll for large table * Update source/classification1.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification1.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification1.md Co-authored-by: Joel Ostblom <[email protected]> * remove random state specificaiton from ch5 * fixed fill plots * Update source/classification1.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification1.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification1.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification1.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification1.md Co-authored-by: Joel Ostblom <[email protected]> * better intro of meshgrid * better warning filtering in chapters 4 and 5 * Update source/classification1.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification1.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification1.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification1.md Co-authored-by: Joel Ostblom <[email protected]> * remove np.number and replace with just 'number' in dtype selection * Update source/classification1.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification1.md Co-authored-by: Joel Ostblom <[email protected]> * properly gluing column names Co-authored-by: Lindsey Heagy <[email protected]> Co-authored-by: Joel 
Ostblom <[email protected]> * minor fixes post-merge * Update viz.md (#137) * Chapter 6 production polish (#86) * starting work on ch5+6; categorical type change; remove commented out R code * value counts, class name remap, replace in ch5 * remove warnings * polished ch5+6 up to euclidean dist * minor bugfix * minor bugfix * fixed worksheets link at end of chp * fix minor section heading wording in Ch1 * added nsmallest + note; better chaining for dist comps; removed comments; fixed colors (not working yet) * initial fit and predict polished; model spec -> model object * polishing preprocessing * balancing polished * pipelines * learning objs * mute warnings in ch5 * warn mute code; fixed links at end * restore cls2 to main branch * remove caption hack; minor fix to learning objs * Remove caption hack * initial improved seed explanation * random seed section polish done * polished ch6 up to tuning * initial cross val example done * in python -> in scikit * working on cross-val * polished ch6 up to predictor selection * commented out predictor selection * done ch6 except final under/overfit plot * warnings filter in ch6; remove seed hack cell * remove reference to random state in train/test split * minor typesetting .method() vs method * put setup.md back in to fix broken links * Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]> * Update 
source/classification2.md Co-authored-by: Joel Ostblom <[email protected]> * values -> to_numpy in randomness section * Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]> * Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]> * remove code for area plot at the end of ch6 Co-authored-by: Joel Ostblom <[email protected]> * Chapter 7 production polish (#125) * fix exercises link * wip * polished ch7 up to under/overfitting * done polishing ch7 * Update source/regression1.md Co-authored-by: Lindsey Heagy <[email protected]> * Update source/regression1.md Co-authored-by: Lindsey Heagy <[email protected]> * Update source/regression1.md Co-authored-by: Lindsey Heagy <[email protected]> * Update source/regression1.md Co-authored-by: Lindsey Heagy <[email protected]> * Update source/regression1.md Co-authored-by: Lindsey Heagy <[email protected]> * Update source/regression1.md Co-authored-by: Lindsey Heagy <[email protected]> Co-authored-by: Lindsey Heagy <[email protected]> * Chapter 8 production polish (#126) * fix exercises link * polishing ch8 * done polishing ch8 * Update source/regression2.md Co-authored-by: Lindsey Heagy <[email protected]> * Update source/regression2.md Co-authored-by: Lindsey Heagy <[email protected]> * Update source/regression2.md Co-authored-by: Lindsey Heagy <[email protected]> Co-authored-by: Lindsey Heagy <[email protected]> Co-authored-by: Lindsey Heagy <[email protected]> Co-authored-by: Joel Ostblom <[email protected]> Co-authored-by: GitHub Action <[email protected]>
Ignore the branch name -- I originally was planning to do both ch5+6 in this PR, but figured it would be better to just split them.