README file for software to accompany

P. D. Asimow, L. C. Stein, J. L. Mosenfelder and G. R. Rossman
Quantitative polarized infrared analysis of trace OH in populations
	of randomly oriented mineral grains
Submitted to American Mineralogist, March 2005

1. This README accompanies and documents two programs, pbg and pdc.
pbg is for subtracting backgrounds from infrared spectra. pdc is for
determining orientation of the polarization vector in measured spectra
using standard spectra in the principal directions on the same mineral,
and for synthesizing principal spectra from a set of randomly oriented
spectra. See the paper for details.

2. Each program is distributed as C source code and as executables
compiled for Windows, MacOS X, and Linux. The source code may be freely
modified and redistributed, but please cite the above paper for all uses
and ensure that a comparable message accompanies any modified versions.

3. It is best to run pbg from a terminal window. The way to get a
terminal window varies among operating systems. On Linux, any shell or
command window will suffice. On MacOS, under Applications/Utilities, use
the Terminal program. On Windows, on the Start Menu under Programs/
Accessories you will find the "Command Prompt" program, which brings up a
command line window.

4. The versions that are being distributed with the paper are version 1.0.
Updates will be posted at http://minerals.gps.caltech.edu. Please send bug
reports to Paul Asimow at asimow@gps.caltech.edu.

5. How to run pbg

a. If you just invoke pbg from the command line with no arguments, it
prints a help summary with a list of all the command line arguments and a
short description of each.

b. If you run pbg -i, you get an interactive mode, where you will be
prompted to set each of the options that control the behavior of the
program. This way you don't have to remember the meaning of any of the
other command line options.

c. The program requires at least one input file with a spectrum to be
backgrounded. This is a plain text file, csv format, with a wavenumber, 
absorbance pair on each line. There must be a comma between the wavenumber 
and the absorbance; whitespace characters are ignored. Any line endings 
(DOS, Windows, Mac, Unix) are OK. There are no header lines. The file 
examplespectrumraw.csv included with this distribution is a legal input 
file. It is best if the filename ends with .csv for all inputs.
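
The line format described above can be read with a single sscanf call. The
following is only a sketch of one way to do it, not the parser used in pbg;
the function name parse_spectrum_line is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Parse one "wavenumber, absorbance" line.
   Returns 1 if both numbers and the comma were found, 0 otherwise.
   Hypothetical helper for illustration; not taken from the pbg source. */
int parse_spectrum_line(const char *line, double *wavenumber, double *absorbance)
{
    assert(line != NULL && wavenumber != NULL && absorbance != NULL);
    /* Spaces in the format skip any whitespace around the comma;
       trailing line-ending characters are simply left unread. */
    return sscanf(line, " %lf , %lf", wavenumber, absorbance) == 2;
}
```

A line without a comma between the two numbers is rejected, which matches
the requirement stated above.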

d. You always get an output file matching each input file with either the 
background or the background-subtracted spectrum (see the -bgs option 
below) in the same format as the input; the filenames are generated by 
inserting .bg or .bgs before the .csv. There are also options to generate 
an omnibus output file with all the backgrounds or background-subtracted 
spectra (-o) generated in a given execution, and to concatenate all the 
input spectra into a single file with columns labeled by the input 
filenames (-oi).
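
The output naming rule can be sketched as follows. This is an illustration
only; the function name make_output_name is hypothetical, and the behavior
for an input name that does not end in .csv (the tag and .csv are appended)
is an assumption, not documented behavior of pbg.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "name.bg.csv" or "name.bgs.csv" from "name.csv".
   subtracted != 0 selects the .bgs tag (the -bgs case).
   ASSUMPTION: if the input does not end in ".csv", the tag and ".csv"
   are appended to the whole name. */
void make_output_name(const char *in, int subtracted, char *out, size_t outsize)
{
    const char *tag = subtracted ? ".bgs" : ".bg";
    size_t len = strlen(in);
    size_t stem = len;                  /* length of the part before ".csv" */

    assert(out != NULL && outsize > 0);
    if (len >= 4 && strcmp(in + len - 4, ".csv") == 0)
        stem = len - 4;
    snprintf(out, outsize, "%.*s%s.csv", (int)stem, in, tag);
}
```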

e. All the options:

-i
Puts you in interactive mode. Default is non-interactive. The rest of the 
command line is still read, but the answers to prompts can override 
command-line options.

-bgs
Tells the program to output background-subtracted spectra. The default is 
to output the backgrounds instead. You could then subtract them yourself 
elsewhere. If -bgs is active, the output filenames end with .bgs.csv; 
otherwise .bg.csv.

-o omnibusfilename
Activates output of a file into which all the backgrounds or background-
subtracted spectra from a given run are concatenated. Each column is 
labeled by the filename of the input. Default is no omnibus output file.

-oi omnibusinputfile
Activates output of a file into which all input spectra used are echoed. 
Each column is labeled by the filename of the input. Note, the name of 
this option is perhaps a bit misleading - this file is written, not read, 
and it is not a way to get a bunch of spectra input to the program all at 
once. Default is no omnibus input file.

-p degree
Tells the program to fit a polynomial of the specified degree to the 
selected anchor points rather than a cubic spline. We have had better luck 
with the cubic spline fit, but the polynomial fit is available as an 
alternative. Default is cubic spline.

-1 lowwavenumber
-2 highwavenumber
Indicates the wavenumber range over which background fitting is performed. 
Data outside this range are copied as-is into the background-subtracted 
output file or set to zero in the background output file. The default range 
is 1100 to 3970 cm^-1.

-a #anchors
Specifies how many anchor points are located for fitting the background to 
the spectrum. The program begins with a line connecting the datapoint at 
lowwavenumber to the datapoint at highwavenumber. It then searches for the 
point on the spectrum between these ends that is farthest below this line. 
It adds this as an anchor and then searches for the point that is farthest 
below the piecewise linear fit through these three points. This is 
iterated until #anchors are found. Default 25. Values up to 100 give a 
closer fit to the spectrum, but may overfit noise in some cases.

-z low1 high1 [-z low2 high2] ...
This is one way of dealing with artifactual peaks, for example from organic 
contamination. Within each range specified by a low and high wavenumber 
limit, the difference between the data and a line between the endpoints is 
added to the background. This has the result that the background-subtracted 
spectrum is linear between the endpoints. Peaks that occur in these ranges 
are therefore taken to be background features, not data. This operation is
done AFTER the anchor search. Use this option for positive peaks; see the 
-pz option for negative peaks (e.g. from bad correction of air CO2 
absorption). Specify as many -z regions as you like. Default is no -z 
regions.

-pz low1 high1 [-pz low2 high2] ...
Another way of dealing with artifacts. In this case, the region between the 
low and high wavenumber limits specified is replaced with a line BEFORE the 
anchor search. This is needed when the spectrum contains negative 
background peaks, such as those from undercorrection for CO2 in air along 
the optical path. Without the -pz option, the anchor search will pick the 
bottoms of these negative peaks as anchors and will therefore not fit the 
desired background. Removing these peaks with -pz prevents this problem. 
Specify as many -pz regions as you like. Default is no -pz regions.
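
Replacing a region with a line, as -pz does before the anchor search, can
be sketched as follows. This is an illustration only, not the code in pbg;
the name flatten_region is hypothetical, and lo and hi are taken to be the
array indices of the samples at the low and high wavenumber limits.

```c
#include <assert.h>
#include <stddef.h>

/* Overwrite y[lo+1..hi-1] with the straight line joining
   (x[lo], y[lo]) and (x[hi], y[hi]). The end points are left unchanged.
   Hypothetical helper for illustration. */
void flatten_region(const double *x, double *y, size_t lo, size_t hi)
{
    assert(lo < hi);
    for (size_t i = lo + 1; i < hi; i++) {
        double t = (x[i] - x[lo]) / (x[hi] - x[lo]);
        y[i] = y[lo] + t * (y[hi] - y[lo]);
    }
}
```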

-f fixedanchor1 [-f fixedanchor2] ...
This option inserts a fixed anchor, i.e. a wavenumber value at which the 
background is forced to equal the data spectrum, and the background-
subtracted spectrum is forced to be zero (exactly zero for cubic spline 
fitting, optimized towards zero in polynomial fitting). Use this option 
when the automatic anchor search does not do what you want. For example, if 
your background has a concave-down region, you will need some fixed anchors 
to pull the background up into the concavity. Default is no -f points.

-x low1 high1 [-x low2 high2] ...
This specifies forbidden regions for the automatic anchor search. It will 
not pick any anchors between the specified low and high limits. This is 
another way of dealing with troublesome regions like negative background 
peaks. Specify as many -x regions as you like. Default is no -x regions.

-g linearpoint1 [-g linearpoint2] ...
Specifies a location where the cubic spline between the anchors immediately 
to either side will be replaced by a linear fit instead. This helps deal 
with locations where the cubic spline is "too wiggly". Note that, by the 
properties of the cubic spline, introducing linear regions does not 
introduce any kinks: the first derivative remains continuous at the 
anchor points bounding a linear region. -g has no effect during polynomial 
fitting. Default is no -g points.

-Z epoxyfilename
Activates subtraction of a spectrum given in epoxyfilename, and scaled as 
specified by the -Zp parameter or its default. This is used to remove the 
complete spectrum of a known contaminant, for example mounting epoxy, 
instead of individual contamination peaks as with -z and -pz above.
This assumes that all the spectral features due to contamination scale 
linearly, which may not be a very good assumption. The spectrum given in 
epoxyfilename should be background-subtracted already! Please ensure that 
the wavenumbers sampled in the epoxy spectrum are exactly the same as
those in the data spectrum being subtracted. By default this feature is 
off.

-Zp peaklocation
Indicates the wavenumber at which the amplitude of the contamination 
spectrum is estimated. The height of the data spectrum above the background 
calculated before epoxy subtraction is compared to the absolute height at 
this wavenumber in epoxyfilename, and the entire epoxy spectrum is scaled 
by the ratio of these and then subtracted from the background-subtracted
output or added to the background output. Default 3925.126. If the data and 
epoxy spectra do not have a sample at exactly this wavenumber, the nearest 
one is used.
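
The scaling described above can be sketched as follows for the background-
subtracted output. This is an illustration written from the description,
not the code in pbg; the names nearest_index and subtract_epoxy are
hypothetical. It assumes the data and epoxy spectra share the same
wavenumber samples, as requested under -Z.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the sample whose wavenumber is nearest to target. */
size_t nearest_index(const double *x, size_t n, double target)
{
    assert(n > 0);
    size_t best = 0;
    double best_d = x[0] > target ? x[0] - target : target - x[0];
    for (size_t i = 1; i < n; i++) {
        double d = x[i] > target ? x[i] - target : target - x[i];
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}

/* sub[] holds the data minus the background computed before epoxy
   subtraction; epoxy[] holds the background-subtracted contaminant
   spectrum on the same wavenumber grid x[]. The epoxy spectrum is scaled
   so that it matches sub[] at the sample nearest to peak, then subtracted
   in place. Returns the scale factor used. */
double subtract_epoxy(const double *x, double *sub, const double *epoxy,
                      size_t n, double peak)
{
    size_t k = nearest_index(x, n, peak);
    assert(epoxy[k] != 0.0);
    double scale = sub[k] / epoxy[k];
    for (size_t i = 0; i < n; i++)
        sub[i] -= scale * epoxy[i];
    return scale;
}
```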

infile1 [infile2] [infile3] ...
All other arguments on the command line are taken to be input files. See 
above for the format.

----------

f. The following command string, for example, does a good job on 
examplespectrumraw.csv:

pbg -bgs -a 100 -g 1800 -pz 2300 2400 -z 2820 3050 examplespectrumraw.csv

This should produce the background-subtracted spectrum given in 
examplespectrumraw.bgs.csv.

6. How to run pdc

a. If you just invoke pdc from the command line with no arguments, it 
prints a help summary with a list of all the command line arguments and a 
short description of each.

b. If you run pdc -i, you get an interactive mode, where you will be 
prompted to set each of the options that control the behavior of the 
program. This way you don't have to remember the meaning of any of the 
other command line options.

c. The program requires an input file with all the spectra to be analyzed.
This is a plain text file, csv format, with the following specification.
	- Optional header lines at the beginning, for example to label the 
	spectra. The number of header lines is set by the user with the
	-hl flag (default is 2). These are not processed; they are simply 
	copied to the output file.
	- A line of sample thicknesses, in the same units to which your 
	standards are normalized. So, if the standards are given as 1 mm 
	thick, then give unknown sample thicknesses in mm. Note, this line 
	should start with a comma, since the first column in subsequent 
	lines is the wavenumber, and the samples start at the second 
	column. Hence whatever precedes the first comma on the thickness 
	line is ignored.
	- All subsequent lines give the wavenumber followed by the 
	background-subtracted absorbance of each unknown spectrum, one 
	column per spectrum. All spectra must be sampled at the same 
	wavenumbers. Spectra can be either normalized or not (see the -n 
	flag below), but all must be treated the same way.
See the example in examplepdcinput.csv. Any line-endings (DOS, Windows, 
Mac, Unix) are OK.
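
Reading the thickness line can be sketched as follows. This is an
illustration only, not the parser used in pdc; the function name
parse_thickness_line is hypothetical. Whatever precedes the first comma is
skipped, as described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Read the sample thicknesses from a line such as ",0.5,0.62,0.48".
   The field before the first comma is ignored. At most max values are
   stored in thickness[]. Returns the number of values read.
   Hypothetical helper for illustration. */
size_t parse_thickness_line(const char *line, double *thickness, size_t max)
{
    size_t n = 0;
    const char *p;

    assert(line != NULL && thickness != NULL);
    p = strchr(line, ',');              /* skip the ignored first field */
    while (p != NULL && n < max) {
        char *end;
        double v = strtod(p + 1, &end); /* strtod skips leading whitespace */
        if (end == p + 1)
            break;                      /* no number after this comma */
        thickness[n++] = v;
        p = strchr(end, ',');
    }
    return n;
}
```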

d. It also needs a standard file. This file, also csv format, contains the 
E||a, E||b, and E||c standard absorbance spectra, background-subtracted and 
normalized. There are no header lines, and the format of each line is 
wavenumber, a, b, c.

e. You get an output file giving the orientation angles determined for each 
spectrum in the input file, the thickness ratios, and the goodness of fit; 
the filename is generated by inserting .decon before the .csv. The header 
lines are copied from the input file in order to label the orientations the 
same way the spectra were labeled in the input file.

f. If the -y synthfile option is set, there is another output file with the
synthetic principal spectra, csv format. It also gives the integrals of 
these spectra between the limits given, and the water contents estimated 
from these integrals.

g. All the options:

-i
Puts you in interactive mode. Default is non-interactive. The rest of the 
command line is still read, but the answers to prompts can override 
command-line options.

-t
Indicates fixed thickness in the orientation fitting. Otherwise thickness 
is taken as a fitting parameter, and the ratio of best-fitting to nominal 
thickness is reported in the output file. Default is thickness fitting on.

-n
Indicates that the input spectra are normalized to a thickness of 1 of the 
units in which thickness is given, i.e. the same normalization as the 
standards. In this case they are denormalized to the measured thickness 
before fitting. The default is non-normalized input.

-s stdfilename
Gives the filename of the standards csv file, see above.

-o outputfilename
Specifies the output filename. By default .decon is added to the input 
filename before the .csv.

-hl headerlines
Dictates how many lines occur in the input file before the line of sample 
thicknesses. These are not read; they are merely copied to the output file. 
Default is 2.

-1 lowwavenumber
-2 highwavenumber
These indicate the range over which the unknown spectra are compared to the 
standard spectra to determine their orientation. The default is 1200 to 
2200 cm^-1.

-x low1 high1 [-x low2 high2] ...
These indicate regions between low and high limits that are ignored in the 
orientation fitting. If there are contamination peaks in the silicate 
region used for fitting, their effect can be neutralized with this option. 
Specify as many -x regions as you like. Default is none.

-y synthfile
Turns on the option to generate synthetic principal-axis spectra and gives 
the filename where they are written. The synthfile also contains the 
integrals of the principal-axis spectra, and the water contents converted 
from each orientation and total, using the Bell et al. calibration factor 
of 0.188 ppm-cm.

-int low high
Gives the wavenumber range for integration of the synthetic principal-axis 
spectra. Default 3000 to 3750 cm^-1.
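
Integrating a spectrum between the -int limits and converting the result
with the 0.188 factor can be sketched as follows. This is an illustration
only, not the code in pdc. The trapezoidal rule, the handling of sample
intervals that cross a limit (they are skipped), and the function names
integrate_band and water_from_integral are all assumptions. The units and
thickness normalization required for the factor to apply should be taken
from the paper.

```c
#include <assert.h>
#include <stddef.h>

/* Trapezoidal integral of y over the sample intervals that lie entirely
   within [low, high]. x may be ascending or descending.
   ASSUMPTION: intervals that cross a limit are skipped. */
double integrate_band(const double *x, const double *y, size_t n,
                      double low, double high)
{
    double area = 0.0;

    assert(low < high);
    for (size_t i = 0; i + 1 < n; i++) {
        double a = x[i], b = x[i + 1];
        double left = a < b ? a : b;
        double right = a < b ? b : a;
        if (left >= low && right <= high)
            area += 0.5 * (y[i] + y[i + 1]) * (right - left);
    }
    return area;
}

/* Convert an integrated absorbance to a water content using the factor
   quoted above. Hypothetical helper; see the paper for units. */
double water_from_integral(double area)
{
    return 0.188 * area;
}
```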

infile
The filename of the input file with the unknown spectra. See above.

--------

h. The following command string, for example, takes examplepdcinput.csv and 
does the orientation and synthesis exercises using the standards file 
grr997.csv:

pdc -t -n -s grr997.csv -hl 1 -x 1720 1740 -y examplesynth.csv examplepdcinput.csv

This should produce the orientations given in examplepdcinput.decon.csv and 
the synthetic spectra given in examplesynth.csv.