Basic Astrometry Fit

Here is a worked example of a one-planet model fit to relative astrometry (positions measured between the planet and the host star).

Start by loading the Octofitter and Distributions packages:

using Octofitter, Distributions

Specifying the data

We will create a likelihood object to contain our relative astrometry data. We can specify this data in several formats. It can be listed in the code or loaded from a file (eg. a CSV file, FITS table, or SQL database).

astrom_like = PlanetRelAstromLikelihood(
    #        MJD         mas                       mas                        mas         mas
    (epoch = 50000, ra = -505.7637580573554, dec = -66.92982418533026, σ_ra = 10, σ_dec = 10, cor=0),
    (epoch = 50120, ra = -502.570356287689, dec = -37.47217527025044, σ_ra = 10, σ_dec = 10, cor=0),
    (epoch = 50240, ra = -498.2089148883798, dec = -7.927548139010479, σ_ra = 10, σ_dec = 10, cor=0),
    (epoch = 50360, ra = -492.67768482682357, dec = 21.63557115669823, σ_ra = 10, σ_dec = 10, cor=0),
    (epoch = 50480, ra = -485.9770335870402, dec = 51.147204404903704, σ_ra = 10, σ_dec = 10, cor=0),
    (epoch = 50600, ra = -478.1095526888573, dec = 80.53589069730698, σ_ra = 10, σ_dec = 10, cor=0),
    (epoch = 50720, ra = -469.0801731788123, dec = 109.72870493064629, σ_ra = 10, σ_dec = 10, cor=0),
    (epoch = 50840, ra = -458.89628893460525, dec = 138.65128697876773, σ_ra = 10, σ_dec = 10, cor=0),
    instrument_name = "GPI" # optional -- name for this group of data
)
PlanetRelAstromLikelihood Table with 6 columns and 8 rows:
     epoch  ra        dec       σ_ra  σ_dec  cor
   ┌────────────────────────────────────────────
 1 │ 50000  -505.764  -66.9298  10    10     0
 2 │ 50120  -502.57   -37.4722  10    10     0
 3 │ 50240  -498.209  -7.92755  10    10     0
 4 │ 50360  -492.678  21.6356   10    10     0
 5 │ 50480  -485.977  51.1472   10    10     0
 6 │ 50600  -478.11   80.5359   10    10     0
 7 │ 50720  -469.08   109.729   10    10     0
 8 │ 50840  -458.896  138.651   10    10     0

You can also specify it in separation (mas) and positon angle (rad):

astrom_like_2 = PlanetRelAstromLikelihood(
    (epoch = 50000, sep = 505.7637580573554, pa = deg2rad(24.1), σ_sep = 10, σ_pa =deg2rad(1.2), cor=0),
    # ...etc.
    instrument_name = "GPI" # optional -- name for this group of data
)

Another way we could specify the data is by column:

astrom_like = PlanetRelAstromLikelihood(
    Table(
        epoch= [50000,  50120, 50240, 50360,50480, 50600, 50720, 50840,], # MJD
        ra = [-505.764, -502.57, -498.209, -492.678,-485.977, -478.11, -469.08, -458.896,], # mas
        dec = [-66.9298, -37.4722, -7.92755, 21.6356, 51.1472,  80.5359,  109.729,  138.651, ], # mas
        σ_ra = fill(10.0, 8),
        σ_dec = fill(10.0, 8),
        cor = fill(0.0, 8),
    ),
    instrument_name = "GPI" # optional -- name for this group of data
)

Finally we could also load the data from a file somewhere. Here is an example of loading a CSV:

using CSV # must install CSV.jl first
astrom_data = CSV.read("mydata.csv", Table)
astrom_like = PlanetRelAstromLikelihood(
    astrom_data,
    instrument_name="GPI"
)

You can also pass a DataFrame or any other table format.

In Octofitter, epoch is always the modified Julian date (measured in days). If you're not sure what this is, you can get started by just putting in arbitrary time offsets measured in days.

In this case, we specified ra and dec offsets in milliarcseconds. We could instead specify sep (projected separation) in milliarcseconds and pa in radians. You cannot mix the two formats in a single PlanetRelAstromLikelihood but you can create two different likelihood objects, one for each format.

Creating a planet

We now create our first planet model. Let's name it planet b. The name of the planet will be used in the output results.

In Octofitter, we specify planet and system models using a "probabilistic programming language". Quantities with a ~ are random variables. The distributions on the right hand sides are priors. You must specify a proper prior for any quantity which is allowed to vary.

We now create our planet b model using the @planet macro.

@planet b Visual{KepOrbit} begin
    a ~ truncated(Normal(10, 4), lower=0.1, upper=100)
    e ~ Uniform(0.0, 0.5)
    i ~ Sine()
    ω ~ UniformCircular()
    Ω ~ UniformCircular()
    θ ~ UniformCircular()
    tp = θ_at_epoch_to_tperi(system,b,50420)
end astrom_like

In the model definition, b is the name of the planet (it can be anything), Visual{KepOrbit} is the type of orbit parameterization (see the PlanetOrbits.jl documentation page).

After the begin comes our variable specification. Quantities with a ~ are random variables aka. our priors. You must specify a proper prior for any quantity which is allowed to vary. "Uninformative" priors like 1/x must be given bounds, and can be specified with LogUniform(lower, upper).

Warning

Make sure that variables like mass and eccentricity can't be negative. You can pass a distribution to truncated to prevent this, e.g. M ~ truncated(Normal(1, 0.1),lower=0).

Priors can be any univariate distribution from the Distributions.jl package.

For a KepOrbit you must specify the following parameters:

  • a: Semi-major axis, astronomical units (AU)
  • i: Inclination, radians
  • e: Eccentricity in the range [0, 1)
  • ω: Argument of periastron, radius
  • Ω: Longitude of the ascending node, radians.
  • tp: Epoch of periastron passage

Many different distributions are supported as priors, including Uniform, LogNormal, LogUniform, Sine, and Beta. See the section on Priors for more information. The parameters can be specified in any order.

You can also hardcode a particular value for any parameter if you don't want it to vary. Simply replace eg. e ~ Uniform(0, 0.999) with e = 0.1. This = syntax works for arbitrary mathematical expressions and even functions. We use it here to reparameterize tp.

tp is a date which sets the location of the planet around its orbit. It repeats every orbital period and the orbital period depends on the scale of the orbit. This makes it quite hard to sample from. We therefore reparameterize using θ parameter, representing the position angle of the planet at a given reference epoch. This parameterization speeds up sampling quite a bit!

After the variables block are zero or more Likelihood objects. These are observations specific to a given planet that you would like to include in the model. If you would like to sample from the priors only, don't pass in any observations.

For this example, we specify PlanetRelAstromLikelihood block. This is where you can list the position of a planet relative to the star at different epochs.

Creating a system

A system represents a host star with one or more planets. Properties of the whole system are specified here, like parallax distance and mass of the star. This is also where you will supply data like images, astrometric acceleration, or stellar radial velocity since they don't belong to any planet in particular.

@system Tutoria begin
    M ~ truncated(Normal(1.2, 0.1), lower=0.1)
    plx ~ truncated(Normal(50.0, 0.02), lower=0.1)
end b

Tutoria is the name we have given to the system. It could be eg PDS70, or anything that will help you keep track of the results.

The variables block works just like it does for planets. Here, the two parameters you must provide are:

  • M: Gravitational parameter for the total mass of the system, expressed in units of Solar mass.
  • plx: Distance to the system expressed in milliarcseconds of parallax.

Make sure to truncate the priors to prevent unphysical negative masses or parallaxes.

After that, just list any planets that you want orbiting the star. Here, we pass planet b. This is also where we could pass likelihood objects for system-wide data like stellar radial velocity.

Prepare model

We now convert our declarative model into efficient, compiled code:

model = Octofitter.LogDensityModel(Tutoria)
LogDensityModel for System Tutoria of dimension 11 and 11 epochs with fields .ℓπcallback and .∇ℓπcallback

This type implements the julia LogDensityProblems.jl interface and can be passed to a wide variety of samplers.

Initialize starting points for chains

Run the initialize! function to find good starting points for the chain. You can provide guesses for parameters if you want to.

init_chain = initialize!(model) # No guesses provided, slower global optimization will be used
init_chain = initialize!(model, (;
    plx = 50,
    M = 1.21,
    planets = (;
        b=(;
            a = 10.0,
            e = 0.01,
            # note! Never initialize a value on the bound, exactly 0 eccentricity is disallowed by the `Uniform(0,1)` prior
        )
    )
))
Chains MCMC chain (1000×16×1 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 1.18 seconds
Compute duration  = 1.18 seconds
parameters        = M, plx, b_a, b_e, b_i, b_ωy, b_ωx, b_Ωy, b_Ωx, b_θy, b_θx, b_ω, b_Ω, b_θ, b_tp
internals         = logpost

Summary Statistics
  parameters         mean         std      mcse    ess_bulk    ess_tail      r ⋯
      Symbol      Float64     Float64   Float64     Float64     Float64   Floa ⋯

           M       1.2021      0.0771    0.0025    945.7619    904.5700    0.9 ⋯
         plx      50.0001      0.0227    0.0007    947.5717    992.3671    0.9 ⋯
         b_a      11.0798      1.0734    0.0308   1193.7284    998.9087    0.9 ⋯
         b_e       0.1938      0.0805    0.0026    939.5029    876.3151    0.9 ⋯
         b_i       0.6335      0.1209    0.0037   1043.4855   1004.6833    0.9 ⋯
        b_ωy       0.5533      0.6286    0.0191   1112.6303    868.6575    0.9 ⋯
        b_ωx       0.3457      0.4084    0.0124   1060.9354    927.7721    0.9 ⋯
        b_Ωy       0.7399      0.3578    0.0114    967.9296    893.1140    0.9 ⋯
        b_Ωx       0.3874      0.4217    0.0140    914.2489    985.6804    0.9 ⋯
        b_θy       0.0741      0.0089    0.0003    922.0427    835.5932    1.0 ⋯
        b_θx      -0.9773      0.0729    0.0024    893.1805    706.4381    0.9 ⋯
         b_ω       0.6543      1.0287    0.0335   1107.7191    957.7191    1.0 ⋯
         b_Ω       0.5075      0.5860    0.0191    927.0425    988.0099    1.0 ⋯
         b_θ      -1.4951      0.0064    0.0002   1042.6948    867.5731    0.9 ⋯
        b_tp   43102.2933   2620.5422   85.8354    894.2458    895.0928    0.9 ⋯
                                                               2 columns omitted

Quantiles
  parameters         2.5%        25.0%        50.0%        75.0%        97.5%
      Symbol      Float64      Float64      Float64      Float64      Float64

           M       1.0435       1.1688       1.2125       1.2395       1.3380
         plx      49.9593      49.9878      49.9968      50.0166      50.0400
         b_a       8.6875      10.9533      11.3261      11.7166      12.6266
         b_e       0.0184       0.1476       0.1894       0.2333       0.3255
         b_i       0.4170       0.5101       0.6658       0.7285       0.8288
        b_ωy      -1.0241       0.6685       0.7894       0.9176       1.0410
        b_ωx      -0.6057       0.2492       0.4384       0.6374       0.8444
        b_Ωy      -0.0144       0.5071       0.9332       0.9785       1.1420
        b_Ωx      -0.1255       0.0529       0.3258       0.8132       1.0610
        b_θy       0.0596       0.0689       0.0742       0.0819       0.0883
        b_θx      -1.1091      -1.0255      -0.9773      -0.9479      -0.8006
         b_ω      -0.8341       0.2851       0.6069       0.7948       2.8935
         b_Ω      -0.1247       0.0576       0.3249       1.0220       1.5855
         b_θ      -1.5087      -1.4994      -1.4965      -1.4900      -1.4844
        b_tp   40667.6779   41054.1065   42090.6675   44753.6968   49035.4725

Visualize the starting points

Plot the inital values to make sure that they are reasonable, and match your data. This is a great time to confirm that your data were entered in correctly.

using CairoMakie
octoplot(model, init_chain)
Example block output

The starting points for sampling look reasonable!

Note

The return value from initialize! is a "variational approximation". You can pass that chain to any function expecting a chain argument, like Octofitter.savechain or octocorner. It gives a rough approximation of the posterior we expect. The central values are probably close, but the uncertainties are unreliable.

Sampling

Now we are ready to draw samples from the posterior:

chain = octofit(model)
Chains MCMC chain (1000×28×1 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 3.08 seconds
Compute duration  = 3.08 seconds
parameters        = M, plx, b_a, b_e, b_i, b_ωy, b_ωx, b_Ωy, b_Ωx, b_θy, b_θx, b_ω, b_Ω, b_θ, b_tp
internals         = n_steps, is_accept, acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, is_adapt, loglike, logpost, tree_depth, numerical_error

Summary Statistics
  parameters         mean         std       mcse   ess_bulk   ess_tail      rh ⋯
      Symbol      Float64     Float64    Float64    Float64    Float64   Float ⋯

           M       1.1811      0.0959     0.0036   704.2649   648.3496    1.00 ⋯
         plx      49.9993      0.0187     0.0006   907.4223   593.8112    1.00 ⋯
         b_a      11.7780      2.2752     0.1402   245.7569   313.0496    1.00 ⋯
         b_e       0.1576      0.1138     0.0065   307.8376   419.0989    1.00 ⋯
         b_i       0.6327      0.1540     0.0095   283.6325   261.1875    1.00 ⋯
        b_ωy      -0.0431      0.7495     0.0579   200.0304   781.2544    1.00 ⋯
        b_ωx      -0.0540      0.6751     0.0551   166.6325   757.9030    1.00 ⋯
        b_Ωy       0.0608      0.7448     0.1102    55.1621   385.3888    1.00 ⋯
        b_Ωx       0.0833      0.6773     0.0984    57.5994   241.0397    1.00 ⋯
        b_θy       0.0749      0.0109     0.0004   759.6095   306.3232    0.99 ⋯
        b_θx      -1.0052      0.0987     0.0035   836.8297   560.7032    0.99 ⋯
         b_ω      -0.2901      1.8579     0.1386   231.8360   650.7426    1.00 ⋯
         b_Ω      -0.2650      1.7502     0.2173    96.8815   317.6240    1.00 ⋯
         b_θ      -1.4965      0.0074     0.0003   813.6123   641.3347    1.00 ⋯
        b_tp   42629.5788   5214.0891   329.0673   246.9617   642.8424    1.00 ⋯
                                                               2 columns omitted

Quantiles
  parameters         2.5%        25.0%        50.0%        75.0%        97.5%
      Symbol      Float64      Float64      Float64      Float64      Float64

           M       1.0131       1.1128       1.1804       1.2451       1.3823
         plx      49.9606      49.9869      49.9996      50.0108      50.0356
         b_a       8.1027      10.0905      11.3544      13.2475      16.8267
         b_e       0.0067       0.0627       0.1326       0.2429       0.4096
         b_i       0.2797       0.5401       0.6487       0.7436       0.9009
        b_ωy      -1.0628      -0.7656      -0.1987       0.7528       1.0796
        b_ωx      -1.0546      -0.6855      -0.1096       0.5695       1.0296
        b_Ωy      -1.0772      -0.7069       0.1283       0.8148       1.0964
        b_Ωx      -1.0340      -0.5315       0.2036       0.6881       1.0733
        b_θy       0.0560       0.0671       0.0744       0.0819       0.0980
        b_θx      -1.2069      -1.0713      -0.9999      -0.9338      -0.8258
         b_ω      -2.9793      -2.1048      -0.1922       1.0944       3.0030
         b_Ω      -2.9719      -2.0871       0.2641       1.0562       2.9208
         b_θ      -1.5111      -1.5015      -1.4963      -1.4914      -1.4825
        b_tp   30235.5670   39389.3753   43629.4224   46367.4052   49877.5208

You will get an output that looks something like this with a progress bar that updates every second or so. You can reduce or completely silence the output by reducing the verbosity value down to 0.

Once complete, the chain object will hold the sampler results. Displaying it prints out a summary table like the one shown above. For a basic model like this, sampling should take less than a minute on a typical laptop.

Sampling can take much longer when you have measurements with very small uncertainties.

Diagnostics

The first thing you should do with your results is check a few diagnostics to make sure the sampler converged as intended.

A few things to watch out for: check that you aren't getting many numerical errors (ratio_divergent_transitions). This likely indicates a problem with your model: either invalid values of one or more parameters are encountered (e.g. the prior on semi-major axis includes negative values) or that there is a region of very high curvature that is failing to sample properly. This latter issue can lead to a bias in your results.

One common mistake is to use a distribution like Normal(10,3) for semi-major axis. This left tail of this distribution includes negative values, and our orbit model is not defined for negative semi-major axes. A better choice is a truncated(Normal(10,3), lower=0.1) distribution (not including zero, since a=0 is not defined).

You may see some warnings during initial step-size adaptation. These are probably nothing to worry about if sampling proceeds normally afterwards.

You should also check the acceptance rate (mean_accept) is reasonably high and the mean tree depth (mean_tree_depth) is reasonable (~4-8). Lower than this and the sampler is taking steps that are too large and encountering a U-turn very quicky. Much larger than this and it might be being too conservative.

Next, you can make a trace plot of different variabes to visually inspect the chain:

using CairoMakie
lines(
    chain["b_a"][:],
    axis=(;
        xlabel="iteration",
        ylabel="semi-major axis (AU)"
    )
)
Example block output

And an auto-correlation plot:

using StatsBase
using CairoMakie
lines(
    autocor(chain["b_e"][:], 1:500),
    axis=(;
        xlabel="lag",
        ylabel="autocorrelation",
    )
)
Example block output

This plot shows that these samples are not correlated after only about 5 iterations. No thinning is necessary.

To confirm convergence, you may also examine the rhat column from chains. This diagnostic approaches 1 as the chains converge and should at the very least equal 1.0 to one significant digit (3 recommended).

Finaly, you might consider running multiple chains. Simply run octofit multiple times, and store the result in different variables. Then you can combine the chains using chainscat and run additional inter-chain convergence diagnostics:

using MCMCChains
chain1 = octofit(model)
chain2 = octofit(model)
chain3 = octofit(model)
merged_chains = chainscat(chain1, chain2, chain3)
gelmandiag(merged_chains)
Gelman, Rubin, and Brooks diagnostic
  parameters      psrf    psrfci
      Symbol   Float64   Float64

           M    1.0005    1.0028
         plx    1.0001    1.0005
         b_a    1.0015    1.0061
         b_e    1.0011    1.0046
         b_i    1.0085    1.0137
        b_ωy    1.0007    1.0032
        b_ωx    1.0009    1.0031
        b_Ωy    1.0002    1.0020
        b_Ωx    1.0019    1.0064
        b_θy    0.9998    1.0003
        b_θx    1.0010    1.0042
         b_ω    1.0016    1.0064
         b_Ω    1.0010    1.0050
         b_θ    1.0002    1.0018
        b_tp    1.0007    1.0038

This will check that the means and variances are similar between chains that were initialized at different starting points.

Analysis

As a first pass, let's plot a sample of orbits drawn from the posterior. The function octoplot is a conveninient way to generate a 9-panel plot of velocities and position:

using CairoMakie
octoplot(model,chain)
Example block output

This function draws orbits from the posterior and displays them in a plot. Any astrometry points are overplotted.

You can control what panels are displayed, the time range, colourscheme, etc. See the documentation on octoplot for more details.

Pair Plot

A very useful visualization of our results is a pair-plot, or corner plot. We can use the octocorner function and our PairPlots.jl package for this purpose:

using CairoMakie
using PairPlots
octocorner(model, chain, small=true)
Example block output

Remove small=true to display all variables.

In this case, the sampler was able to resolve the complicated degeneracies between eccentricity, the longitude of the ascending node, and argument of periapsis.

Saving your chain

Variables can be retrieved from the chains using the following sytnax: sma_planet_b = chain["b_a",:,:]. The first index is a string or symbol giving the name of the variable in the model. Planet variables are prepended by the name of the planet and an underscore.

You can save your chain in FITS table format by running:

Octofitter.savechain("mychain.fits", chain)

You can load it back via:

chain = Octofitter.loadchain("mychain.fits")

Saving your model

You may choose to save your model so that you can reload it later to make plots, etc:

using Serialization
serialize("model1.jls", model)

Which can then be loaded at a later time using:

using Serialization
using Octofitter # must include all the same imports as your original script
model = deserialize("model1.jls")
Warning

Serialized models are only loadable/restorable on the same computer, version of Octofitter, and version of Julia. They are not intended for long-term archiving. For reproducibility, make sure to keep your original model definition script.

Comparing chains

We can compare two different chains by passing them both to octocorner. Let's compare the init_chain with the full results from octofit:

octocorner(model, chain, init_chain, small=true)
Example block output