An example of automatic logging of numerical outputs using Fortran, MATLAB, and LaTeX

Roy H. Goodman

September 1, 2009

1 Autobiography/Introduction

A common problem with running numerical simulations is what to do with all the output. If I were trained as an experimental scientist, say a biologist or chemist, I would have been shown, on joining my advisor’s lab, a protocol for recording all my experiments (the conditions under which they were run, measurements of output, my observations, etc.) in a lab notebook. I, however, am an applied mathematician. My advisor and I decided I should run some numerical simulations, so I started writing FORTRAN 77 code. Each time, I’d record the outputs in an ad hoc manner by redirecting the standard output to a file with a name like run_July_16_1998_epsilon_half that tried to remind me of the day I ran the code and what parameters were used. Of course, the program had a number of parameters, too many to record in the filename, so I bought a bound composition notebook and tried to use it as a lab notebook. This lasted about a day, and I never got past the first page. I quickly realized that the computer was much better than I was at recording this information, and I added code that would create a directory in which to store all the input and output parameters and figures that I created. In tandem, I created a simple lab notebook file called log.txt in which I typed a few notes about each run. Eventually, I automated the creation and management of the output directory and the logfile, and added LaTeX commands to it. I think it got to be pretty good, so I decided to clean it up and write this explanation. In the course of trying to create something that other people might like to use, I came up with a handful of additional improvements to the code, trying to make it as modular and reusable as I could.


This package contains a number of programs for creating a type of lab notebook for numerical simulations. The simulations are run in Fortran, which automatically creates directories in which it stores the inputs and outputs. MATLAB routines read these files and create figures. LaTeX code synthesizes these two types of data into a hyperlinked document with two parts. The first is a table listing the parameters under which each program was run and the directory in which its data is stored. In the second, each line of the table is linked to a page that contains summary figures for that run. Before each run, the user edits a single data file that can be read by Fortran or MATLAB. This file is copied to the output directory. In addition to all the parameters with which to run the simulation, the data file contains space for a remark giving the user’s reasoning for running this code. This remark is imported by the LaTeX logfile and displayed both in the table of runs and on the summary page for each run. The MATLAB program creates the figures and then asks the user to make another remark, recording observations on what the figures show.

2 A demonstration

The gzipped tarfile logging.tgz contains a directory Logging. Unpack the tarfile (type tar xzvf logging.tgz) and cd into this directory. Here is what you will find:

Directories Data, source, and string
• Data contains a bunch of MATLAB .m files and two subdirectories, latex and html. Each of these contains three LaTeX files that will be used to typeset the log: a main file and two files containing macros.
• source contains the source code for the two executables html_logging and latex_logging, as well as several auxiliary programs used by one or both programs.
• string contains the string library, written by Giulio Vistoli & Alex Pedretti. This contains some simple codes that modernize the poor string-handling capabilities of Fortran 77, as well as additional codes that I wrote which use the library to simplify basic UNIX file handling. This stuff is all pretty UNIX-specific. I make no guarantees about how it will work on the strangely popular Windows platform.

Files data_html and data_latex
Datafiles used by the two executables.

Type ‘make’ and, with luck, the Fortran code will compile. If it does not, you probably have to edit the Makefile and make sure that the variable F77 is set to the Fortran compiler on your computer (see later sections for descriptions of how it all works).

The latex_logging program

If everything compiles, you should have two executable files, html_logging and latex_logging. Run latex_logging. Instead of doing actual calculations, this program just creates a whole lot of Gaussians for you to plot using the MATLAB programs contained in the Data directory. These will be stored in the directory Data/latex/001. This directory contains:
• A copy of the datafile data_latex containing the parameters under which the simulation was run.
• A directory params full of small files, each with a .tex extension (see footnote 1). These are the same as the elements of the datafile, but written this way so that they can be loaded by the LaTeX logfile. Because each parameter is given its own
• Seven directories labeled y.001 through y.007. These are the outputs of our fake simulation. File number k contains the data to plot the function
  $y(x) = A e^{-\sigma (x - x_k)^2}$
  where the parameters are described below.

In addition, the directory Data/latex contains a file dataList.tex that will be used in typesetting the logfile and a simple plaintext logfile log.txt. The top-level directory will contain two hidden files (another UNIX-specific thing) named .latex_number and .latex_processed. These are created the first time the program is run. The former will contain the number 1, meaning that the most recently created data directory is Data/latex/001. It will be updated each time the executable is called. The latter contains the number zero. This number is updated whenever the MATLAB program view_latex is run with zero input arguments, which processes all unprocessed data directories (that is, it creates summary figures and records the experimenter’s observations for each).

Footnote 1: I didn’t want to put this extension there, but the LaTeX “include” statement works only on files with the .tex extension.
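The numbering scheme described above can be sketched in a few lines. The following Python is an illustration only, not the package's Fortran or MATLAB code. The function names are hypothetical, and the sketch assumes that each hidden file holds a single integer and that .latex_processed stores the number of the last run that has been processed.

```python
from pathlib import Path


def read_counter(path: Path) -> int:
    """Return the integer stored in a counter file, or 0 if the file is absent."""
    return int(path.read_text().strip()) if path.exists() else 0


def next_run_directory(top: Path, kind: str = "latex") -> Path:
    """Create the next numbered data directory and update the hidden counter.

    Assumed layout: <top>/.latex_number and <top>/Data/latex/NNN.
    """
    counter = top / f".{kind}_number"
    run = read_counter(counter) + 1
    run_dir = top / "Data" / kind / f"{run:03d}"
    run_dir.mkdir(parents=True, exist_ok=False)  # refuse to overwrite a run
    counter.write_text(f"{run}\n")
    return run_dir


def unprocessed_runs(top: Path, kind: str = "latex") -> list:
    """List the run directories created since the last processing pass.

    Assumption: .latex_processed holds the number of the last processed run.
    """
    last_created = read_counter(top / f".{kind}_number")
    last_processed = read_counter(top / f".{kind}_processed")
    return [f"{n:03d}" for n in range(last_processed + 1, last_created + 1)]
```

Under these assumptions, two calls to next_run_directory create 001 and 002, and unprocessed_runs reports both until the processed counter is advanced.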

The parameters under which the program is run are stored in the file data_latex:

npoints The number of x-values in the computational grid.
nshifts The number of simulations.
amplitude The amplitude A above.
sigma The number σ above. The width of the Gaussians is proportional to $\sigma^{-1/2}$.
xmin and xmax The plotting limits.
shiftmin and shiftmax Used together with nshifts to determine the values of $x_k$.
remark This is a remark that will appear in the log files. It starts with a single percent sign (so that MATLAB will treat it as a comment and this file can be loaded as a column vector of numbers). It can be as many lines long as need be, each no more than 70 characters. The remark is terminated by a line starting with two percent signs (%%).

Now edit the file data_latex. Change some of the parameters and the remark, then run latex_logging again. You should now have an additional directory Data/latex/002. The file .latex_number should now contain the number 2, while .latex_processed still contains the number zero.

The next step is to create the summary figures and comments using MATLAB. In MATLAB, navigate to the Data directory and type view_latex with no arguments. It will loop through both directories, create a summary figure for each, and prompt you for your observations on the figure it produced. Type something and use a blank line to terminate your comment. Note that this can contain LaTeX instructions, since it will be included in the final PDF logfile. I like to use pdfLaTeX, so my code makes a call to the shell to run epstopdf on each of the encapsulated PostScript files. If you don’t want to do this or are having problems (e.g. MATLAB giving path errors), you can comment out this part of the program.

Once this is done, typeset the thing. In the directory Data/latex, type pdflatex latex_log. Rerun this about four times (it takes this many runs for the longtable and hyperref packages to work out all their differences) to create a nice PDF file with a formatted table indexing the runs on page one. Note that the red directory names 001 and 002 are hyperlinked to the summary sheets on the next two pages.
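To make the datafile format concrete, here is a minimal Python sketch of a reader for it, together with the Gaussian family it parameterizes. This is not the package's Fortran or MATLAB code. It assumes that the numbers appear one per line in the order listed above, that every remark line begins with a single percent sign, and that the shifts are evenly spaced between shiftmin and shiftmax. None of these details is stated explicitly in the text, and all function names are hypothetical.

```python
import math

# Assumed order of the numeric entries; the text lists these parameters
# but does not spell out the file layout.
PARAMETER_NAMES = [
    "npoints", "nshifts", "amplitude", "sigma",
    "xmin", "xmax", "shiftmin", "shiftmax",
]


def parse_datafile(text):
    """Split datafile text into a parameter dict and the remark string."""
    values, remark_lines = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("%%"):    # two percent signs end the remark
            break
        if stripped.startswith("%"):     # one percent sign: remark text
            remark_lines.append(stripped[1:].strip())
        elif stripped:                   # anything else: one number per line
            values.append(float(stripped.split()[0]))
    if len(values) != len(PARAMETER_NAMES):
        raise ValueError(
            f"expected {len(PARAMETER_NAMES)} numbers, got {len(values)}"
        )
    return dict(zip(PARAMETER_NAMES, values)), " ".join(remark_lines)


def linspace(start, stop, count):
    """Return count evenly spaced values from start to stop inclusive."""
    if count == 1:
        return [start]
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]


def gaussian_family(params):
    """Evaluate y(x) = A exp(-sigma (x - x_k)^2) for each shift x_k.

    Assumption: the shifts x_k are evenly spaced on [shiftmin, shiftmax].
    """
    xs = linspace(params["xmin"], params["xmax"], int(params["npoints"]))
    shifts = linspace(
        params["shiftmin"], params["shiftmax"], int(params["nshifts"])
    )
    a, sigma = params["amplitude"], params["sigma"]
    curves = [
        [a * math.exp(-sigma * (x - xk) ** 2) for x in xs] for xk in shifts
    ]
    return xs, curves
```

Stopping at the first line that begins with %% mirrors the terminator rule above, and skipping percent-prefixed lines when collecting numbers mirrors the reason given for the percent sign: MATLAB ignores those lines when loading the file as a column of numbers.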

The html_logging program

The above works well if each run produces just a small amount of data. If your program generates a larger amount of data, e.g. if you’re exploring a large parameter space, then one summary figure per run will be insufficient. A better solution is to create a larger number of figures and arrange them into a thumbnail gallery with links to the originals. We do this in the second example. A web page provides a much more convenient interface to data than does a PDF. It is still useful to use LaTeX to programmatically create the HTML file. There exist many tools to convert LaTeX to HTML. The best, as of today and after I have spent a lot of time trying out the alternatives, seems to be plasTeX (see footnote 2), written by Kevin Smith in order to format the documentation for SAS software. This processes an ordinary LaTeX file using code written in the Python programming language and requires a small amount of setting up (see footnote 3).

This program is run the same way as the other one, by editing data_html and running html_logging. It creates similar hidden number files and numbered data directories. In this version of the program, both the shift variable and the amplitude variable are allowed to vary. The figures are created, as in the first example, by running the program view_html.m in the Data directory. Run html_logging several times and create the graphics summary files.

Before running plasTeX to typeset the data into HTML, open the file part2commands.tex in the Data/html directory. You will see the command \newcommand{\logginghome}{file:///Users/roy/Logging}. Change this to the directory where you have installed the files. We want plasTeX to point to the PDF files that are created by the MATLAB viewing programs, and it needs absolute references for this. The HTML log file will be contained in the file Data/html/log_html/index.html, which you can view in a web browser. You may have to run plasTeX a few times to get the references correct.

Running plasTeX fills my terminal with error messages that look like this:

GPL Ghostscript 8.63: Unrecoverable error, exit code 1 [21

These seem to be harmless on the first few pages of output. plasTeX converts all the included graphics to the Portable Network Graphics (PNG) format. Eventually, however, the errors pile up and the program stops producing these PNG-format graphics files. Notice in the included HTML document that, starting on page 004, the thumbnails fail to appear. My computer has dvipng version 1.11 installed (part of the MacTeX package). I have tried without success to compile version 1.12.

Footnote 2: http://plastex.sourceforge.net

Footnote 3: Most of the installation is straightforward, albeit involved, especially if you don’t have Python installed. Version 0.9.2 of plasTeX, current as of August 2009, cannot display the thumbnail galleries nicely in a manner similar to figures in LaTeX. In response to my requests, Kevin Smith has added a small fix in the next version. If you want it to work now, there are two choices: build plasTeX from the current CVS snapshot, or do the following small hack. Find the file plasTeX/Renderers/XHTML/Themes/default/styles/styles.css and append to it the following snippet of CSS code:

.subfloat { display: inline-table; padding: 1em 3em 1em 3em; text-align: center }

3 Other things

Okay, now we have a system that can make an index of our numerical simulations. What else could we want to do with it? I hope others might come up with (and let me know about) lots of good ideas. I have one more, which I will call pruning. A lot of the simulations you run will be uninteresting, irrelevant, or simply mistaken. The system has been written in such a way that it is easy to get rid of the results of these simulations. Suppose you don’t like the run stored in directory 004. You can delete it safely. In addition to deleting it, you also must delete (or comment out) the line \insertData{004} from the file dataList.tex. The next time you typeset the logfile, this run won’t appear. What can you add to this?
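The pruning step lends itself to a small helper. The following Python sketch is one possible way to automate it and is not part of the package. The function name prune_run is hypothetical, and the sketch assumes that dataList.tex holds one \insertData{NNN} entry per line and sits in the same directory as the numbered run directories (Data/latex in the first example).

```python
import shutil
from pathlib import Path


def prune_run(log_dir: Path, run: str, delete_data: bool = True) -> bool:
    """Comment out \\insertData{run} in dataList.tex and optionally delete the run.

    Returns True if a matching line was found and commented out.
    Assumption: each run appears on its own line of dataList.tex.
    """
    data_list = log_dir / "dataList.tex"
    marker = f"\\insertData{{{run}}}"
    changed = False
    kept = []
    for line in data_list.read_text().splitlines(keepends=True):
        if line.lstrip().startswith(marker):
            kept.append("% " + line)  # comment out rather than delete
            changed = True
        else:
            kept.append(line)
    data_list.write_text("".join(kept))
    if delete_data:
        # Remove the run's data directory; ignore it if already gone.
        shutil.rmtree(log_dir / run, ignore_errors=True)
    return changed
```

Commenting the line out rather than deleting it keeps a trace of the pruned run in dataList.tex. Passing delete_data=False leaves the data on disk, so a run can be hidden from the log without being lost.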
