This project is a set of skeleton scripts for running experiments, analyzing them, and reporting the results in an academic paper. Specifically, the scripts:
- Automatically run experiments using a script and upload the results to a Google Docs spreadsheet and/or a RabbitMQ queue.
- Download the results into R.
- Generate statistics and figures.
- Incorporate the statistics and figures into LaTeX with Sweave.

This process has a number of benefits for academic paper writing:
- You can watch your experiments progress by opening Google Docs and watching the new data come in. This often gives you an early indication of the effect, if any, of your most recent change.
- You can run experiments up to the deadline. Got improved numbers at the last minute? Simply run `make` to regenerate the paper. Did your last run hurt your numbers? Simply revert your Google Doc to the last good version.
- Years later, you will still be able to find your experimental data. Need to make a new figure? No problem. Need to share your data? Just share the Google Doc, and optionally your R script for analyzing it.

To get started:

- Add your Google Docs spreadsheet key and login information to `config.py`.
- Run `python autoexp_pool.py`.
- Edit `analyze.R` to use your own Google Docs spreadsheet (you can get this URL from File -> Publish to the web).
- Run `make`.

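
As a rough illustration of the first step, `config.py` might contain settings along these lines. Only `use_google` and `output_rabbitmq` are names mentioned in this README; every other variable name and value is a hypothetical placeholder, so check the real file for the actual names.

```python
# config.py (hypothetical sketch; only use_google and output_rabbitmq
# are names taken from this README, everything else is assumed)

# Google account and spreadsheet settings (placeholder values).
google_user = "you@example.com"
google_password = "your-password-here"
spreadsheet_key = "YOUR_SPREADSHEET_KEY"  # taken from the spreadsheet's URL

# Where results are sent.
use_google = True        # upload each result row to the spreadsheet
output_rabbitmq = False  # alternatively, push results to a RabbitMQ queue

# RabbitMQ server used when output_rabbitmq is True (assumed name).
rabbitmq_host = "localhost"
```
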
The scripts in this project run a fictitious experiment that measures how long it takes to type my name (Ed) and the name of my co-author (Thanassis). Thanassis helped me write some of these scripts.
The main experiment script is `autoexp.py`. It is written in Python and is primarily set up to time external commands. The current version "measures" the time it takes to type a name by generating a random number. However, it should be easily adapted to real experiments. Before it can be used, you must put your Google account information and the Google Docs spreadsheet key in the `config.py` file. You can find a spreadsheet's key by looking at its URL.

You can run the experiment using multiple cores on a single machine by running `autoexp_pool.py`. Alternatively, you can run the experiment on multiple machines with the help of a RabbitMQ server by running `autoexp_producer.py` on any machine, and then running `autoexp_consumer.py` on each worker machine. In either case, the scripts should add a new table called `paper` to the Google Docs spreadsheet. It should look like this. If you look at the spreadsheet while the script is running, you should be able to see each row being added to the spreadsheet. This is more useful for experiments that take hours or days to run.

Alternatively, for experiments that produce outputs too large for a spreadsheet, you can save the results in a RabbitMQ queue by setting `use_google=False` and `output_rabbitmq=True` in `config.py`. You can then download the results into a CSV file by running `autoexp_download_output.py`.

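
The idea of timing external commands on several cores can be sketched as follows. This is a minimal illustration, not the code in `autoexp.py` or `autoexp_pool.py`: the function names, the result-row fields, and the stand-in command are all assumptions, and it uses a standard-library thread pool simply to keep several external commands running at once.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def time_command(args):
    """Run one external command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(args, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def run_trial(name):
    """Produce one result row for a hypothetical 'type this name' trial."""
    # Stand-in command: start a Python interpreter that does nothing.
    # A real experiment would invoke the program being measured instead.
    elapsed = time_command([sys.executable, "-c", "pass"])
    return {"name": name, "seconds": elapsed}


def run_all(names, workers=2):
    """Run the trials concurrently; each thread waits on its own subprocess."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_trial, names))


if __name__ == "__main__":
    for row in run_all(["Ed", "Thanassis"] * 2):
        print(row)  # a real script would upload the row here
```

Each row is a plain dictionary, so the upload step (spreadsheet or queue) can stay separate from the measurement itself.
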
`analyze.R` is an R script that analyzes the uploaded data. It reads the experiment data directly from Google Docs, counts the number of samples, computes the mean time to "type" both Ed and Thanassis, and then produces two figures. You will probably want to edit this script so that it uses your own Google Docs CSV file instead of mine. I often start analyzing experimental data by opening R and running `source("analyze.R")` to download the experimental data. There are many tools inside of R for exploratory data analysis, but I personally prefer visualization using the ggplot2 package.

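
`analyze.R` does this work in R. For readers who want to see the shape of the computation, here is a rough Python sketch of the counting and averaging steps only. The column names `name` and `seconds` and the sample rows are made up for illustration and do not reflect the real spreadsheet.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up sample rows; real data would come from the published spreadsheet.
SAMPLE_CSV = """name,seconds
Ed,1.0
Thanassis,4.0
Ed,2.0
Thanassis,5.0
"""


def summarize(csv_text):
    """Return {name: (sample_count, mean_seconds)} for each name in the CSV."""
    times = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        times[row["name"]].append(float(row["seconds"]))
    return {name: (len(vals), mean(vals)) for name, vals in times.items()}


if __name__ == "__main__":
    for name, (count, avg) in summarize(SAMPLE_CSV).items():
        print(f"{name}: n={count}, mean={avg:.2f}s")
```
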
Finally, the results from `analyze.R` can also be incorporated into a LaTeX paper. This is done using the Sweave file `Stats.Rnw`, which creates a LaTeX command for each statistic in R that needs to be referenced in the paper. An example LaTeX paper is in `paper.tex`. The final step is to use the `Makefile` to process the Sweave file and run LaTeX. The result should be a file similar to `paper.pdf`.

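
Sweave itself is driven from R, so the following Python sketch only illustrates the general idea of turning each statistic into a LaTeX command that the paper can reference. The helper name `latex_command` and the macro name `meanEd` are hypothetical and are not taken from `Stats.Rnw`.

```python
def latex_command(name, value, digits=2):
    """Format one statistic as a LaTeX \\newcommand definition."""
    # LaTeX command names may only contain letters.
    if not (name.isascii() and name.isalpha()):
        raise ValueError("LaTeX command names must consist of letters only")
    return "\\newcommand{\\" + name + "}{" + format(value, f".{digits}f") + "}"


if __name__ == "__main__":
    # The paper could then write \meanEd wherever the number is needed.
    print(latex_command("meanEd", 1.234))
```

Because the paper refers to the statistic by name rather than by value, regenerating the definitions is enough to update every place the number appears.
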
It is also possible to generate nice figures for inclusion in PowerPoint slides. To produce `histsummary-slides.emf`, which is suitable for inclusion in Windows PowerPoint, convert `histsummary-slides.svg` to EMF format using Inkscape on Windows. An example can be seen in `slides.pptx`. Note that Mac PowerPoint cannot view EMF files properly. For Mac PowerPoint, you can simply use `histsummary-slides.pdf`, which is generated by `analyze.R`.