- Vannevar Bush
"Whenever logical processes of thought are employed, that is, whenever thought for a time runs along an accepted groove, there is an opportunity for the machine."

"A mathematician is not a man (?) who can readily manipulate figures; often he cannot. He is not even a man who can readily perform the transformations of equations by the use of calculus. He is primarily an individual who is skilled in the use of symbolic logic on a high plane, and especially he is a man of intuitive judgment in the choice of the manipulative processes he employs.

All else he should be able to turn over to his mechanism, just as confidently as he turns over the propelling of his car to the intricate mechanism under the hood."

The Atlantic Monthly, July 1945

As We May Think

by Vannevar Bush

- Bin Yu

Computation for Statistical Inference

- First generation computation in statistics, before computers: use parametric models with closed-form solutions for maximum likelihood estimators or Bayes estimators.

- Second generation computation, with computers: design statistically optimal procedures and worry about computation later. Call optimization routines.

- Third generation computation: form statistical goals with computation in mind and take advantage of special features of statistical computation.
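
The contrast between the first two generations can be sketched in a few lines. This is only an illustration under assumptions of our own, not an example from Bin Yu: the exponential model, the made-up data values, and the function names are all hypothetical, and a hand-written golden-section search stands in for the "optimization routines" a statistician would normally call from a library.

```python
import math

def exp_loglik(rate, data):
    """Log-likelihood of i.i.d. exponential data with the given rate."""
    return len(data) * math.log(rate) - rate * sum(data)

def mle_closed_form(data):
    """First generation: the exponential MLE has the closed form n / sum(x)."""
    return len(data) / sum(data)

def mle_numeric(data, lo=1e-6, hi=100.0, tol=1e-8):
    """Second generation: hand the log-likelihood to a generic optimizer.

    Golden-section search is used here as a stand-in for a library routine.
    It assumes the log-likelihood is unimodal and that the maximizer lies
    inside the bracket [lo, hi].
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)  # left interior point
        d = a + inv_phi * (b - a)  # right interior point
        if exp_loglik(c, data) > exp_loglik(d, data):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2.0

# Made-up illustrative observations (not from any real data set).
data = [0.8, 1.9, 0.3, 2.4, 1.1, 0.5]

closed = mle_closed_form(data)
numeric = mle_numeric(data)
print("closed form:", closed)
print("numerical:  ", numeric)
```

Both routes should agree to several decimal places. The point of the comparison is that the closed form exists only for convenient models, while the numerical route works for any likelihood one can evaluate, at the cost of choosing a bracket, a tolerance, and an algorithm.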

- Research experience. Last week we heard from Dr. Yu Han's adviser, Professor Song, who presented an ingenious solution to a complex estimation problem. A central role was played by a tuning parameter, for which Professor Song derived several theorems. At dinner, I learned that computational exploration came first, then the theorems were worked out.
- My own research experience:
- The experience of grid computing at Iowa State University.
- The importance of user interface, and of computing, despite reluctance in some quarters. (That is, if you are interested in this methodological approach, I hope my comments are of some use to you in justifying your interest.)
- Cited Examples:
- Jennifer Hoeting's comments to us that releasing R software increased interest in her research.
- Our teacher, Di Cook, elected a Fellow of the American Statistical Association for her work on statistical graphics and high-dimensional data visualization.

- [UPDATE, Summer 2008]
(Iowa State University Statistics Department News -- 2008):
#### Statistics Department Ranks 5th in Nation!

Ken Koehler, University Professor and chair of statistics, attributed the high ranking to several factors: a $1.8 million National Science Foundation grant helped the department improve its graduate program and attract more doctoral students (it now has 90); and the department has one of the best survey sampling programs in the world. ... __the department recently hosted a major conference for users of software for statistical computing;__

- History of computing in statistics at ISU: 1924, Henry A. Wallace and "Machine Calculations for Statistical Methods."
- Why I think this research seminar session might be of help.
- We generally lack time to search out new ways of doing things.

- The statistical task is a complex, multilayered one, and computation has not reached Bush's dream of not needing to know what is under the hood. Or rather, perhaps driving a car involves a different set of complex behavior patterns than walking.
- You can call the suite of tools we have passed out, organized as they are, a type of Integrated Development Environment (IDE).
- Why an IDE? The statistical task has many components. Switching interfaces can be time consuming and inefficient.

Let us look at what the authors of the component that integrates our statistical package into our IDE have to say about the statistical task:

"Complex contacts between interfaces can involve the coordination
of several data files, multiple statistical software packages, and the
corresponding source code in each of these languages, combined for a
single analysis. ESS, as part of Emacs, has tools which assist in this,
including support for version and source code control systems, tools
for accessing programs or files on remote machines, and interfaces
to documentation systems including LaTeX and XML. In addition,
Emacs can assist with, or be programmed to perform, many tasks
related to data cleaning, management, and editing."

-- Rossini et al. (citation follows).

The ultimate completion of this IDE is Linux, in which the whole operating system is designed with this philosophy and purpose. As someone doing scientific computing, you are not working at cross purposes with your operating system as in the case of Windows, which wants you to sit still at your desk and play solitaire. (This is a joke referring to a very popular card game that ships with Windows.)

"ESS is one of the first IDEs intended for statisticians. It provides an enhanced, powerful interface for efficient interactive data analysis and statistical programming. It is completely customizable to satisfy individual desires for interface styles as well as being extensible to additional statistical languages and analysis packages."

Rossini, Machler, Hornik, Heiberger, Sparapani (2001)

Emacs Speaks Statistics: A Universal Interface for Statistical Analysis

http://ess.r-project.org/

How does this connect to statisticians? We will consider one major statistical program that ESS integrates:

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

- an effective data handling and storage facility,
- a suite of operators for calculations on arrays, in particular matrices,
- a large, coherent, integrated collection of intermediate tools for data analysis,
- graphical facilities for data analysis and display either on-screen or on hard copy, and
- a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

Why R for researchers and students?

- Flexible and general design useful for exploration and learning.
- Site of active statistical research.

Presenting results in journals (and homework) is an important part of the statistical task.

LaTeX is the best way to write math on a computer.

It is based on TeX, which Donald Knuth created when he was dismayed at the poor quality of one of his mathematical books produced by computer.
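
To give a feel for what writing math in LaTeX looks like, here is a minimal sketch of a complete document. The statistical content (a normal model with known variance) is our own illustrative choice, not something taken from the sources quoted in these notes, and `amsmath` is simply a commonly used package for mathematical typesetting.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}

For i.i.d.\ observations $x_1, \dots, x_n$ from $N(\mu, \sigma^2)$ with
known $\sigma^2$, the log-likelihood is
\begin{equation}
  \ell(\mu) = -\frac{n}{2}\log(2\pi\sigma^2)
              - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 ,
\end{equation}
which is maximized at the sample mean,
\begin{equation}
  \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i .
\end{equation}

\end{document}
```

The source describes the structure of the formula (fractions, sums, subscripts) rather than its appearance, and TeX takes care of spacing and numbering.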

Once a project gets big enough, or once people start to collaborate, project management becomes essential, and a good version control tool helps a great deal with this task. We therefore introduce Subversion, a version control system currently favored by many developers.

