Generating and Fitting Simulated Data

This tutorial demonstrates how to use Octofitter to simulate synthetic data, fit models to that data, and validate the results. This is a crucial workflow for understanding model performance and testing analysis pipelines.

We'll simulate data from a known set of parameters, fit a model to recover those parameters, and compare the posterior to the true values used for simulation.

Setup

using Octofitter, OctofitterRadialVelocity, Distributions
using CairoMakie, PairPlots, Pigeons
using CSV, DataFrames

Define the Model

In order to simulate data from Octofitter, you start by defining a real model with data. The simulate step will use the provided epochs, data types, and uncertainties when simulating data. If you don't have real data, you can enter in arbitrary values for e.g. delta R.A., but use expected epochs and uncertainties.

For this example, we'll use the combined astrometry, proper motion anomaly, and radial velocity model from the Astrometry, PMA, and RV tutorial to define the epochs and uncertainties.

# HGCA likelihood for HD 91312
hgca_like = HGCALikelihood(gaia_id=756291174721509376)

# Simulated relative astrometry data (from discovery paper)
astrom_dat = Table(;
    epoch = [mjd("2016-12-15"), mjd("2017-03-12"), mjd("2017-03-13"), mjd("2018-02-08"), mjd("2018-11-28"), mjd("2018-12-15")],
    ra    = [133., 126., 127., 083., 058., 056.],
    dec   = [-174., -176., -172., -133., -122., -104.],
    σ_ra  = [07.0, 04.0, 04.0, 10.0, 10.0, 08.0],
    σ_dec = [07.0, 04.0, 04.0, 10.0, 20.0, 08.0],
    cor   = [0.2, 0.3, 0.1, 0.4, 0.3, 0.2]
)

astrom_like = PlanetRelAstromLikelihood(
    astrom_dat,
    name = "SCExAO",
    variables = @variables begin
        jitter = 0
        northangle = 0
        platescale = 1
    end
)

# Simulated RV data
rv_dat = Table(;
    epoch = [mjd("2008-05-01"), mjd("2010-02-15"), mjd("2016-03-01")],
    rv    = [1300, 700, -2700],
    σ_rv  = [150, 150, 150]
)

rvlike = StarAbsoluteRVLikelihood(
    rv_dat,
    name="SOPHIE",
    variables=@variables begin
        jitter ~ truncated(Normal(10, 5), lower=0)
        offset ~ Normal(0, 1000)
    end
)

# Planet model
planet_b = Planet(
    name="b",
    basis=AbsoluteVisual{KepOrbit},
    likelihoods=[ObsPriorAstromONeil2019(astrom_like)],
    variables=@variables begin
        a ~ LogUniform(0.1,400)
        e ~ Uniform(0,0.999)
        ω ~ Uniform(0, 2pi)
        i ~ Sine()
        Ω ~ Uniform(0, 2pi)
        mass = system.M_sec
        θ ~ Uniform(0, 2pi)
        M = system.M
        tp = θ_at_epoch_to_tperi(θ, 57737.0; M, e, a, i, ω, Ω)
        F = 0.0
    end
)

# System model
ra = 158.30707896392835
dec = 40.42555422701387

sys = System(
    name="HD91312_simulation",
    companions=[planet_b],
    likelihoods=[hgca_like, rvlike],
    variables=@variables begin
        M_pri ~ truncated(Normal(1.61, 0.1), lower=0.1)
        M_sec ~ LogUniform(0.5, 1000)
        M = M_pri + M_sec*Octofitter.mjup2msol
        plx ~ gaia_plx(gaia_id=756291174721509376)
        pmra ~ Normal(-137, 10)
        pmdec ~ Normal(2, 10)
        ra = $ra
        dec = $dec
        rv = 0*1e3
        ref_epoch = Octofitter.meta_gaia_DR3.ref_epoch_mjd
    end
)

model = Octofitter.LogDensityModel(sys)
┌ Warning: Only stars with solution types 5 are currently supported. This sol is 7.0. If you need solution type 1, 7, or 9, please open an issue on GitHub and we would be happy to add it.
└ @ Octofitter ~/work/Octofitter.jl/Octofitter.jl/src/likelihoods/hipparcos.jl:192
[ Info: renormalizing hipparcos uncertainties according to Nielsen et al (2020). Pass `renormalize=false` to disable.
[ Info: No scan law table provided. We will fetch an approximate solution from the GOST webservice.
┌ Info: Detected known gap in Gaia scans; skipping.
│   window = 57175.48233750556
└   note = "Decontamination"
[ Info: Preparing model
┌ Info: Determined number of free variables
└   D = 13
┌ Info: Determined number type
└   T = Float64
ℓπcallback(θ): 0.000152 seconds (78 allocations: 84.312 KiB)
∇ℓπcallback(θ): 0.001028 seconds (24 allocations: 152.391 KiB)

Generate Synthetic Data

We have three choices for generating simulated data:

  1. Draw values from the priors
  2. Use values from a fitted posterior
  3. Specifying values manually

We will look at each.

1. Draw values from the priors

We can draw a value from the priors like so:

params_to_simulate = Octofitter.drawfrompriors(model.system)
(M_pri = 1.5562986319255552, M_sec = 10.347465336014558, plx = 29.118268387196267, pmra = -127.94961948113367, pmdec = -11.183875245402653, M = 1.566176262671512, ra = 158.30707896392835, dec = 40.42555422701387, rv = 0.0, ref_epoch = 57388.5, observations = (SOPHIE = (jitter = 5.499230860761885, offset = -187.44375737101257),), planets = (b = (a = 13.332176157916468, e = 0.7046170387475686, ω = 3.172046394443539, i = 1.8933071613532866, Ω = 5.469217899998308, θ = 5.639780995632805, mass = 10.347465336014558, M = 1.566176262671512, tp = 54694.40569689927, F = 0.0, observations = (obspri_SCExAO = (jitter = 0, northangle = 0, platescale = 1),)),))

2. Use values from a fitted posterior

We can select a particular draw from the posterior and use this to generate new data

# Perform MCMC fitting on real data
using Pigeons
chain_real, pt = octofit_pigeons(model, n_rounds=5)

# Use one particular draw as the basis of our simulation
draw_number = 1
params_to_simulate = Octofitter.mcmcchain2result(model, chain_real, draw_number)

Specifying values manually

We can also specify all values for the simulation manually. This process is a bit more involved.

Start by drawing parameters from the priors:

template = Octofitter.drawfrompriors(model.system)
(M_pri = 1.5476050121013458, M_sec = 673.0608196917726, plx = 29.133801224413578, pmra = -144.65193222974793, pmdec = 4.050878734755731, M = 2.1901049896897793, ra = 158.30707896392835, dec = 40.42555422701387, rv = 0.0, ref_epoch = 57388.5, observations = (SOPHIE = (jitter = 12.466066600549926, offset = 1338.673719329046),), planets = (b = (a = 38.122780455556125, e = 0.1700453043911487, ω = 1.5602361119870258, i = 0.37314499294146414, Ω = 5.108649912355986, θ = 1.0219160163400411, mass = 673.0608196917726, M = 2.1901049896897793, tp = 53772.267370459755, F = 0.0, observations = (obspri_SCExAO = (jitter = 0, northangle = 0, platescale = 1),)),))

Copy this output as a template, and replace values as needed. Note that if some parameters are calculated based on others in your model, you will have to repeat those calculations here.

Warning

Note that the output below is just an example, you must generate your own template from your model and modify it as needed. The exact structure is not garuanteed to be stable between versions of Octofitter.

# Define our "true" parameter values for simulation
M_pri = 1.61
M_sec = 85.0
M = M_pri + M_sec*Octofitter.mjup2msol
params_to_simulate = (
    M_pri = M_pri,
    M_sec = M_sec,  # Jupiter masses
    M = M,
    plx = 21.5,
    pmra = -137.0,
    pmdec = 2.0,
    ra = ra,
    dec = dec,
    rv = 0.0,
    ref_epoch = Octofitter.meta_gaia_DR3.ref_epoch_mjd,
    observations = (
        SOPHIE = (jitter = 15.0, offset = 0.0),
    ),
    planets = (
        b = (
            a = 45.0,
            e = 0.15,
            ω = 1.2,
            mass = M_sec,
            i = 0.8,
            Ω = 2.1,
            θ = 1.5,
            M = M,
            tp = θ_at_epoch_to_tperi(1.5, 57737.0; M=M, e=0.15, a=45.0, i=0.8, ω=1.2, Ω=2.1),
            F = 0.0,
            observations = NamedTuple()
        ),
    )
)
(M_pri = 1.61, M_sec = 85.0, M = 1.6911405098873926, plx = 21.5, pmra = -137.0, pmdec = 2.0, ra = 158.30707896392835, dec = 40.42555422701387, rv = 0.0, ref_epoch = 57388.5, observations = (SOPHIE = (jitter = 15.0, offset = 0.0),), planets = (b = (a = 45.0, e = 0.15, ω = 1.2, mass = 85.0, i = 0.8, Ω = 2.1, θ = 1.5, M = 1.6911405098873926, tp = -4260.340997684594, F = 0.0, observations = NamedTuple()),))

Generate synthetic system with simulated data

sim_system = Octofitter.generate_from_params(model.system, params_to_simulate)
sim_model = Octofitter.LogDensityModel(sim_system)
LogDensityModel for System HD91312_simulation of dimension 13 and 10 epochs with fields .ℓπcallback and .∇ℓπcallback

Let's plot the simulated orbit and data to see what we generated:

# Convert true parameters to chain format for plotting
true_chain = Octofitter.result2mcmcchain([params_to_simulate])

# Plot the simulated system with true parameters
fig = octoplot(sim_model, true_chain, colormap=:red)
fig
Example block output

Fit the Simulated Data

Now we'll sample from the posterior using the simulated data:

# Sample from the simulated data
chain, pt = octofit_pigeons(sim_model, n_rounds=8, explorer=SliceSampler())
display(chain)
[ Info: Sampler running with multiple threads     : true
[ Info: Likelihood evaluated with multiple threads: false
┌ Info: Starting values not provided for all parameters! Guessing starting point using global optimization:
│   num_params = 13
└   num_fixed = 0
┌ Info: Found sample of initial positions
│   logpost_range = (-79.16063454708186, -68.94063608018524)
└   mean_logpost = -71.95246722456072
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  scans     restarts      Λ        Λ_var      time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)    min(αₑ)   mean(αₑ)
────────── ────────── ────────── ────────── ────────── ────────── ────────── ────────── ────────── ────────── ──────────
        2          0       1.69       3.39       4.02   2.32e+09  -4.09e+04          0      0.836          1          1
        4          0       8.33       4.05       7.04   4.27e+09  -2.54e+03          0      0.601          1          1
        8          0       8.02       4.94       13.6   8.32e+09       -449          0      0.582          1          1
       16          0       8.16       7.51       26.7   1.63e+10       -150   7.06e-72      0.494          1          1
       32          0       8.48       7.25       51.8   3.18e+10        -87   2.15e-15      0.493      0.999          1
       64          1       9.07       5.61        103    6.3e+10      -80.4      0.241      0.527          1          1
      128          6       9.61       5.46        206   1.27e+11      -80.2      0.261      0.514          1          1
      256          6        9.3        5.7        413   2.54e+11      -80.1      0.296      0.516          1          1
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Compare Results

Let's visualize how well our sampling recovered the true parameters:

# Plot the posterior from fitting simulated data
fig = octoplot(sim_model, chain)
fig
Example block output

Overlay True and Recovered Parameters

For a direct comparison, we can overlay the true orbit with the posterior samples:

# Create astrometry plot showing both posterior and true orbit
fig = Octofitter.astromplot(sim_model, chain, use_arcsec=false, ts=1:2)
ax = fig.content[1]

# Add true orbit in red
Octofitter.astromplot!(ax, sim_model, true_chain, use_arcsec=false, ts=1:2, colormap=Makie.cgrad([:red]))

# Make true orbit line more visible
ax.scene.plots[6].linewidth = 6

fig
Example block output

Corner Plot Comparison

Compare the parameters used to generate the simulated data, and the recovered posterior:

# Create corner plot showing both posterior and true values
octocorner(
    sim_model,
    chain,
    small=true,
    truth=(PairPlots.Truth((;
        M=collect(true_chain[:M][:]),
        b_a=collect(true_chain[:b_a][:]),
        b_e=collect(true_chain[:b_e][:]),
        b_i=collect(true_chain[:b_i][:]),
        b_mass=collect(true_chain[:b_mass][:]),
    ),label="Simulated Orbit", color=:red)=>(
        PairPlots.MarginLines(),
        PairPlots.BodyLines(),
    ),)
)
Example block output