Engineering considerations for large astrophysics projects

David W. Hogg
Center for Cosmology and Particle Physics, Department of Physics, New York University
Max-Planck-Institut für Astronomie, Heidelberg, Germany

2014 January 9

punchlines

▶ Calibration programs are wasteful and reduce the accuracy of your end-of-mission results.
  ▶ (you will need to adjust your observing strategy)
▶ Homogeneity and uniformity of survey samples are impossible, unnecessary, and harmful goals.
  ▶ (you will need to implement some probability theory)
▶ Proper uncertainty propagation is not easy.
  ▶ (I got nothing)
▶ The challenge is to make precise measurements and keep discovery space open.
  ▶ (you will need to understand, quantitatively, your goals)

my teachers (incomplete list)

▶ Gerry Neugebauer (Caltech, emeritus)
▶ Sam Roweis (Toronto & NYU, deceased)
▶ Dave Schlegel (LBL) & Scott Burles (Cutler)
▶ Mike Blanton (NYU)
▶ Dustin Lang (CMU) & Jo Bovy (IAS) & Dan Foreman-Mackey (NYU)

survey-centric context

▶ Gaia
▶ SKA and pathfinders
▶ Euclid
▶ LSST
▶ SDSS-IV ... and many more
▶ (I am going to get mean at the end.)

my day job

▶ Astrometry.net and The Tractor
▶ emcee and kplr
▶ precision measurement, probabilistic inference
▶ data-driven models

homogeneity and uniformity are impossible

▶ weather
▶ target selection
▶ hardware evolution
▶ efficiency considerations

probabilistic target selection

▶ SDSS-III BOSS quasar target selection
  ▶ in SDSS bandpasses, z ∼ 3 quasars look like A-type stars
  ▶ stars outnumber quasars enormously
  ▶ don’t have good models of either
  ▶ Bovy et al. arXiv:1011.6392
▶ this target selection cannot be uniform
  ▶ heterogeneous data quality means heterogeneous target selection
  ▶ star density varies on the sky
  ▶ suck it up!
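
Spelled out (this is standard Bayes' rule; the notation below is mine, not that of the Bovy et al. pipeline), an object with photometry d gets a quasar probability

```latex
p(\mathrm{QSO}\mid d) \;=\;
\frac{p(d\mid \mathrm{QSO})\,\pi_{\mathrm{QSO}}}
     {p(d\mid \mathrm{QSO})\,\pi_{\mathrm{QSO}} \;+\; p(d\mid \mathrm{star})\,\bigl(1-\pi_{\mathrm{QSO}}\bigr)}
```

and targets are ranked or thresholded on that probability. Because p(d | class) depends on the noise in d, and the prior π_QSO tracks the local star density, the selection is necessarily non-uniform across the survey.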

homogeneity and uniformity are unnecessary

▶ correct the data
  ▶ compute inverse selection “volume” or probabilities, 1/Vmax (ish)
  ▶ re-weight the data using these inverse volumes
  ▶ very wrong!
▶ forward modeling
  ▶ write down the uncensored p0(data | parameters)
  ▶ multiply by (one minus) the censoring rate η(data)
  ▶ renormalize to get the expected p(data | parameters)
  ▶ this is a likelihood function
▶ (but: visualizing a forward model)
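
The forward-modeling recipe on this slide, written out as a sketch (notation follows the slide; 1 − η(d) is the probability that a datum d survives the selection):

```latex
p(d \mid \theta) \;=\;
\frac{p_0(d \mid \theta)\,\bigl[1-\eta(d)\bigr]}
     {\displaystyle\int p_0(d' \mid \theta)\,\bigl[1-\eta(d')\bigr]\,\mathrm{d}d'} ,
\qquad
\ln \mathcal{L}(\theta) \;=\; \sum_i \ln p(d_i \mid \theta) .
```

The normalizing integral in the denominator does the job that 1/Vmax weights are (incorrectly) asked to do.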

estimators

▶ Cramér–Rao bound
  ▶ example: Gaia astrometry
▶ likelihood principle(s)
  ▶ it is our duty to analyze our very limited data with optimal methods
  ▶ the output of any data analysis must be a likelihood function
  ▶ WMAP, Planck
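
For reference, the quantitative content of the Cramér–Rao bound in its standard textbook form (nothing here is specific to Gaia): for an unbiased estimator of a single parameter θ,

```latex
\mathrm{Var}(\hat{\theta}) \;\ge\; \frac{1}{I(\theta)},
\qquad
I(\theta) \;=\; -\,\mathrm{E}\!\left[\frac{\partial^2 \ln p(d\mid\theta)}{\partial\theta^2}\right]
         \;=\; \mathrm{E}\!\left[\left(\frac{\partial \ln p(d\mid\theta)}{\partial\theta}\right)^{\!2}\right].
```

The Fisher information I(θ) is a property of the likelihood function, which is one more reason the output of any data analysis should be (at least) a likelihood function.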

likelihood principle

▶ I said “function”.
▶ p(data | parameters)

living the likelihood dream

▶ don’t make a catalog of objects
  ▶ that’s some kind of (probably inefficient) estimator
  ▶ even with error bars it can’t transmit the full information
▶ produce a likelihood function in catalog space
  ▶ Lang et al. http://TheTractor.org/
  ▶ Brewer et al. arXiv:1211.5805
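
A minimal sketch of what "a likelihood function in catalog space" can mean in code. This is the spirit of forward-modeling codes like The Tractor, not their actual API; the toy scene, parameter names, and noise model below are illustrative assumptions.

```python
import numpy as np

def render_model(params, shape=(32, 32)):
    """Render a toy scene: one circular-Gaussian source on a constant sky.
    params = (x0, y0, flux, psf_width, sky); purely illustrative."""
    x0, y0, flux, width, sky = params
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    psf = np.exp(-0.5 * ((xx - x0) ** 2 + (yy - y0) ** 2) / width ** 2)
    psf /= 2.0 * np.pi * width ** 2
    return sky + flux * psf

def make_lnlike(image, invvar):
    """Return a function of catalog parameters: ln p(image | params) under Gaussian pixel noise."""
    def lnlike(params):
        resid = image - render_model(params, image.shape)
        return -0.5 * np.sum(resid ** 2 * invvar)
    return lnlike

# Toy usage: the deliverable is the *function* lnlike, not a best-fit row with error bars.
truth = (15.0, 17.0, 500.0, 1.5, 10.0)
image = render_model(truth) + np.random.normal(0.0, 1.0, size=(32, 32))
lnlike = make_lnlike(image, invvar=np.ones((32, 32)))
print(lnlike(truth), lnlike((15.0, 17.0, 450.0, 1.5, 10.0)))
```

Downstream users can hand such a function to an optimizer or a sampler (e.g. emcee), or multiply it into likelihoods from other images; nothing is lost to a catalog row.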

homogeneity and uniformity are unnecessary?

▶ special case of two-point functions (and higher orders)
▶ currently an unsolved problem
  ▶ (but papers from Wandelt’s group)

homogeneity and uniformity are harmful

▶ can’t be uniform in everything
  ▶ the uniformity you choose only helps one of your customers!
▶ uniform samples end up requiring a lot of time on the least useful objects
▶ reduces the heterogeneity that is essential to calibration

self-calibration

▶ final imaging calibration of SDSS
  ▶ made no use at all of the calibration program data
  ▶ Padmanabhan et al. arXiv:astro-ph/0703454

calibration programs are wasteful

▶ there are more photons in the science data
  ▶ therefore, the science data contain more information about calibration
  ▶ (exceptions abound)
▶ you must take your data with proper heterogeneity!
  ▶ Kepler tiling patterns
  ▶ Holmes et al. arXiv:1203.6255

[figures: survey-strategy panels A–D in Sky Position α, β (deg), and the corresponding focal-plane maps in Focal Plane Position x, y (deg) with Residuals (%); Holmes et al. arXiv:1203.6255]

Self-calibration of imaging

▶ A good survey:
  ▶ every star appears in many images
  ▶ in different images, the star is in different places
  ▶ every image contains many stars
  ▶ Holmes et al. arXiv:1203.6255
▶ Kepler and Spitzer exoplanet photometry is pessimal for self-calibration...
  ▶ ...but for a very good reason!
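
A minimal sketch of the self-calibration idea (a toy model, not the Padmanabhan et al. or Holmes et al. implementations): if the measured counts of star i in image j are roughly zero_point[j] * flux[i] plus noise, and the survey strategy puts every star in many images at many focal-plane positions, then the per-image calibrations and the star fluxes can be solved for jointly from the science data alone, here by unweighted alternating least squares.

```python
import numpy as np

def self_calibrate(counts, star_idx, img_idx, n_stars, n_imgs, n_iter=50):
    """Toy self-calibration for measurements counts[k] ~ zp[img_idx[k]] * flux[star_idx[k]].
    Assumes every star and every image appears in at least one measurement."""
    flux = np.ones(n_stars)
    zp = np.ones(n_imgs)
    for _ in range(n_iter):
        # best-fit zero points given the current fluxes (plain least squares, per image)
        num = np.bincount(img_idx, weights=counts * flux[star_idx], minlength=n_imgs)
        den = np.bincount(img_idx, weights=flux[star_idx] ** 2, minlength=n_imgs)
        zp = num / den
        # best-fit fluxes given the current zero points (per star)
        num = np.bincount(star_idx, weights=counts * zp[img_idx], minlength=n_stars)
        den = np.bincount(star_idx, weights=zp[img_idx] ** 2, minlength=n_stars)
        flux = num / den
        # break the overall-scale degeneracy (flux -> c*flux, zp -> zp/c) by pinning flux[0] = 1
        zp *= flux[0]
        flux /= flux[0]
    return flux, zp
```

If the dithers are tiny (the Kepler and Spitzer exoplanet case), each star revisits the same detector positions, the calibration and flux parameters become nearly degenerate, and that is the sense in which such data are "pessimal" for self-calibration.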

target selection is classification

▶ SDSS-III BOSS is taking spectra of quasars, not stars
▶ stars outnumber (relevant) quasars by factors of hundreds
▶ observations are noisy and theoretical models are incomplete
▶ want to find only the quasars... or do we?

classification algorithms

▶ Support Vector Machine, Random Forest, Artificial Neural Net
  ▶ all bad!
▶ value of a causal model
  ▶ training and test samples don’t match
  ▶ need to classify new data taken under different conditions
  ▶ make use of our technical knowledge about the data
  ▶ Bovy et al. arXiv:1011.6392

[figure: panels labelled “1-epoch”, “model”, “30-epoch”]
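
One way to read "value of a causal model" in code (a sketch in the spirit of, but not reproducing, the extreme-deconvolution approach of Bovy et al.): model the noise-free color distributions of the classes once, then for each new object convolve those distributions with that object's own measurement covariance before computing class probabilities. The same model then classifies single-epoch and co-added data consistently, with no retraining; the densities and prior below are hypothetical placeholders.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical noise-free ("deconvolved") color distributions, single Gaussians for brevity;
# realistic models would be mixtures of Gaussians fit to training data.
quasar_mean, quasar_cov = np.array([0.3, 0.1]), np.diag([0.02, 0.02])
star_mean, star_cov = np.array([0.8, 0.2]), np.array([[0.04, 0.01], [0.01, 0.09]])
prior_quasar = 1.0 / 300.0  # stars outnumber relevant quasars by factors of hundreds

def p_quasar(colors, obs_cov):
    """Quasar probability for one object, using that object's own measurement covariance."""
    lq = multivariate_normal.pdf(colors, quasar_mean, quasar_cov + obs_cov) * prior_quasar
    ls = multivariate_normal.pdf(colors, star_mean, star_cov + obs_cov) * (1.0 - prior_quasar)
    return lq / (lq + ls)

colors = np.array([0.45, 0.12])
print(p_quasar(colors, np.diag([0.10, 0.10])))    # noisy single-epoch photometry
print(p_quasar(colors, np.diag([0.003, 0.003])))  # well-measured co-added photometry
```

A discriminative classifier trained on co-added photometry has no principled way to be applied to single-epoch data; the generative (causal) model handles the change of observing conditions by construction.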

aside: discovery as classification

▶ found an exoplanet?
  ▶ That’s a model selection move.
  ▶ Bayes doesn’t tell you how to make decisions.
▶ utility arises
  ▶ Make decisions that maximize expected (scientific?) return.
▶ Astrometry.net has an explicit utility model
  ▶ Automatic calibration of an image successful?
  ▶ Our “customer model” is that they are offended by false positives.
  ▶ Lang et al. arXiv:0910.2233
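
A minimal sketch of "maximize expected (scientific?) return" as a decision rule. The utility numbers are made up for illustration; they are not Astrometry.net's actual customer model. The point is structural: a strong penalty on false positives pushes the announcement threshold far above a posterior probability of 0.5.

```python
# Made-up utilities in arbitrary "science dollars".
UTILITY = {
    ("announce", "real"): 100.0,
    ("announce", "bogus"): -1000.0,  # customers are offended by false positives
    ("pass", "real"): -10.0,         # a missed discovery is mildly costly
    ("pass", "bogus"): 0.0,
}

def expected_utility(action, p_real):
    """Expected utility of an action, given the posterior probability the candidate is real."""
    return (p_real * UTILITY[(action, "real")]
            + (1.0 - p_real) * UTILITY[(action, "bogus")])

def decide(p_real):
    """Choose the action with the larger expected utility."""
    return max(("announce", "pass"), key=lambda a: expected_utility(a, p_real))

for p in (0.5, 0.9, 0.95):
    print(p, decide(p))  # with these numbers, "announce" only wins for p above about 0.90
```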

utility considerations

▶ might be worth taking a source unlikely to be a quasar, as long as it is likely to be interesting
  ▶ need to be able to make these trade-offs quantitatively
  ▶ requires a specification of utility
  ▶ needs to be measured in dollars (or equivalent)
  ▶ long-term future discounted free cash flow
▶ the “game” of proposal writing
  ▶ we aren’t honest in our proposals about what we want
  ▶ SDSS was over-designed by any measure that was valuable!

over-design

▶ SDSS was seriously over-designed to measure the large-scale structure
  ▶ (no-one thinks that was a bad idea)
  ▶ could have done all the large-scale structure in less than one year of observing
▶ we might have to be more honest going forward
▶ if we want to use resources efficiently, we need to face a trade-off between efficiency and discovery
  ▶ At present, everything is heuristics.
  ▶ I say we make this trade-off explicitly, not implicitly.

utopia

▶ every part of your data analysis pipeline returns a likelihood function
  ▶ likelihood is p(data | parameters)
  ▶ information propagation through the pipeline always by likelihood function
  ▶ implications are severe
▶ you can simulate data under different experimental designs
▶ you have a specified utility function
  ▶ converts information in your answer into dollars
▶ every decision can now be an optimization
  ▶ detectors, optical path, spectral elements
  ▶ filters, exposure times, cadences
  ▶ targets
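
A cartoon of the utopia loop, under heavy assumptions: the design grid, the noise model, the "analysis" (here just the uncertainty on a simulated mean), the utility function, and the cost model below are all invented stand-ins for whatever a real project would specify. The structure is the point: simulate data under each candidate design, propagate to a parameter uncertainty, convert that uncertainty into dollars, subtract the cost, and take the argmax.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy design space: exposure time per visit (s) and number of visits.
designs = [{"t_exp": t, "n_visits": n} for t in (30, 60, 120) for n in (100, 200, 400)]

def simulate_and_analyze(design, truth=1.0):
    """Simulate visits under the design; return the uncertainty on the mean,
    a stand-in for the width of the likelihood over the parameter of interest."""
    sigma_visit = 1.0 / np.sqrt(design["t_exp"])            # toy photon-limited noise
    data = truth + sigma_visit * rng.standard_normal(design["n_visits"])
    return np.std(data, ddof=1) / np.sqrt(design["n_visits"])

def utility(sigma):
    """Invented utility: dollars earned as a function of the final uncertainty."""
    return 1.0e7 * np.exp(-sigma / 0.002)

def cost(design):
    """Invented cost model: dollars per second of open-shutter time."""
    return 10.0 * design["t_exp"] * design["n_visits"]

# In practice one would average the utility over many simulated realizations per design.
best = max(designs, key=lambda d: utility(simulate_and_analyze(d)) - cost(d))
print(best)
```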

example: bandpasses

▶ LSST plans to do imaging in ugrizy
▶ I am going to smash that r filter!
  ▶ why not do ugWizy?
▶ easy example because
  ▶ zero-cost change
  ▶ doesn’t require full utility specification
  ▶ bet it is much better for low-s/n objects

hardware vs software trades

▶ P1640
  ▶ Oppenheimer et al. arXiv:1303.2627
  ▶ Fergus et al. in prep
▶ glitter cam
  ▶ Fergus et al. MIT-CSAIL-TR-2006-058

open-source surveys

▶ Hipparcos example
▶ SDSS calibration example
▶ enormous benefits accrue from making the data re-reducible from scratch

throwing down the gauntlet

▶ Gaia uncertainty propagation (qualitative)
▶ Euclid observing strategy for imaging
▶ LSST bandpass, cadence, and exposure-time settings
▶ SKA pathfinder image products
▶ eBOSS two-point function estimators
▶ APOGEE & HERMES signal-to-noise requirements
▶ (My hourly rates are a bargain.)
▶ (These surveys are all awesome!)

punchlines

▶ Calibration programs are wasteful and reduce the accuracy of your end-of-mission results.
  ▶ (you will need to adjust your observing strategy)
▶ Homogeneity and uniformity of survey samples are impossible, unnecessary, and harmful goals.
  ▶ (you will need to implement some probability theory)
▶ Proper uncertainty propagation is not easy.
  ▶ (I got nothing)
▶ The challenge is to make precise measurements and keep discovery space open.
  ▶ (you will need to understand, quantitatively, your goals)
