Generating and Fitting Simulated Data
This tutorial demonstrates how to use Octofitter to simulate synthetic data, fit models to that data, and validate the results. This is a crucial workflow for understanding model performance and testing analysis pipelines.
We'll simulate data from a known set of parameters, fit a model to recover those parameters, and compare the posterior to the true values used for simulation.
Setup
using Octofitter, OctofitterRadialVelocity, Distributions
using CairoMakie, PairPlots, Pigeons
using CSV, DataFrames
Define the Model
In order to simulate data from Octofitter, you start by defining a real model with data. The simulate step will use the provided epochs, data types, and uncertainties when simulating data. If you don't have real data, you can enter in arbitrary values for e.g. delta R.A., but use expected epochs and uncertainties.
For this example, we'll use the combined astrometry, proper motion anomaly, and radial velocity model from the Astrometry, PMA, and RV tutorial to define the epochs and uncertainties.
# HGCA likelihood for HD 91312
hgca_like = HGCALikelihood(gaia_id=756291174721509376)
# Simulated relative astrometry data (from discovery paper)
astrom_dat = Table(;
epoch = [mjd("2016-12-15"), mjd("2017-03-12"), mjd("2017-03-13"), mjd("2018-02-08"), mjd("2018-11-28"), mjd("2018-12-15")],
ra = [133., 126., 127., 083., 058., 056.],
dec = [-174., -176., -172., -133., -122., -104.],
σ_ra = [07.0, 04.0, 04.0, 10.0, 10.0, 08.0],
σ_dec = [07.0, 04.0, 04.0, 10.0, 20.0, 08.0],
cor = [0.2, 0.3, 0.1, 0.4, 0.3, 0.2]
)
astrom_like = PlanetRelAstromLikelihood(
astrom_dat,
name = "SCExAO",
variables = @variables begin
jitter = 0
northangle = 0
platescale = 1
end
)
# Simulated RV data
rv_dat = Table(;
epoch = [mjd("2008-05-01"), mjd("2010-02-15"), mjd("2016-03-01")],
rv = [1300, 700, -2700],
σ_rv = [150, 150, 150]
)
rvlike = StarAbsoluteRVLikelihood(
rv_dat,
name="SOPHIE",
variables=@variables begin
jitter ~ truncated(Normal(10, 5), lower=0)
offset ~ Normal(0, 1000)
end
)
# Planet model
planet_b = Planet(
name="b",
basis=AbsoluteVisual{KepOrbit},
likelihoods=[ObsPriorAstromONeil2019(astrom_like)],
variables=@variables begin
a ~ LogUniform(0.1,400)
e ~ Uniform(0,0.999)
ω ~ Uniform(0, 2pi)
i ~ Sine()
Ω ~ Uniform(0, 2pi)
mass = system.M_sec
θ ~ Uniform(0, 2pi)
M = system.M
tp = θ_at_epoch_to_tperi(θ, 57737.0; M, e, a, i, ω, Ω)
F = 0.0
end
)
# System model
ra = 158.30707896392835
dec = 40.42555422701387
sys = System(
name="HD91312_simulation",
companions=[planet_b],
likelihoods=[hgca_like, rvlike],
variables=@variables begin
M_pri ~ truncated(Normal(1.61, 0.1), lower=0.1)
M_sec ~ LogUniform(0.5, 1000)
M = M_pri + M_sec*Octofitter.mjup2msol
plx ~ gaia_plx(gaia_id=756291174721509376)
pmra ~ Normal(-137, 10)
pmdec ~ Normal(2, 10)
ra = $ra
dec = $dec
rv = 0*1e3
ref_epoch = Octofitter.meta_gaia_DR3.ref_epoch_mjd
end
)
model = Octofitter.LogDensityModel(sys)
┌ Warning: Only stars with solution types 5 are currently supported. This sol is 7.0. If you need solution type 1, 7, or 9, please open an issue on GitHub and we would be happy to add it.
└ @ Octofitter ~/work/Octofitter.jl/Octofitter.jl/src/likelihoods/hipparcos.jl:192
[ Info: renormalizing hipparcos uncertainties according to Nielsen et al (2020). Pass `renormalize=false` to disable.
[ Info: No scan law table provided. We will fetch an approximate solution from the GOST webservice.
┌ Info: Detected known gap in Gaia scans; skipping.
│ window = 57175.48233750556
└ note = "Decontamination"
[ Info: Preparing model
┌ Info: Determined number of free variables
└ D = 13
┌ Info: Determined number type
└ T = Float64
ℓπcallback(θ): 0.000152 seconds (78 allocations: 84.312 KiB)
∇ℓπcallback(θ): 0.001028 seconds (24 allocations: 152.391 KiB)
Generate Synthetic Data
We have three choices for generating simulated data:
- Draw values from the priors
- Use values from a fitted posterior
- Specifying values manually
We will look at each.
1. Draw values from the priors
We can draw a value from the priors like so:
params_to_simulate = Octofitter.drawfrompriors(model.system)
(M_pri = 1.5562986319255552, M_sec = 10.347465336014558, plx = 29.118268387196267, pmra = -127.94961948113367, pmdec = -11.183875245402653, M = 1.566176262671512, ra = 158.30707896392835, dec = 40.42555422701387, rv = 0.0, ref_epoch = 57388.5, observations = (SOPHIE = (jitter = 5.499230860761885, offset = -187.44375737101257),), planets = (b = (a = 13.332176157916468, e = 0.7046170387475686, ω = 3.172046394443539, i = 1.8933071613532866, Ω = 5.469217899998308, θ = 5.639780995632805, mass = 10.347465336014558, M = 1.566176262671512, tp = 54694.40569689927, F = 0.0, observations = (obspri_SCExAO = (jitter = 0, northangle = 0, platescale = 1),)),))
2. Use values from a fitted posterior
We can select a particular draw from the posterior and use this to generate new data
# Perform MCMC fitting on real data
using Pigeons
chain_real, pt = octofit_pigeons(model, n_rounds=5)
# Use one particular draw as the basis of our simulation
draw_number = 1
params_to_simulate = Octofitter.mcmcchain2result(model, chain_real, draw_number)
Specifying values manually
We can also specify all values for the simulation manually. This process is a bit more involved.
Start by drawing parameters from the priors:
template = Octofitter.drawfrompriors(model.system)
(M_pri = 1.5476050121013458, M_sec = 673.0608196917726, plx = 29.133801224413578, pmra = -144.65193222974793, pmdec = 4.050878734755731, M = 2.1901049896897793, ra = 158.30707896392835, dec = 40.42555422701387, rv = 0.0, ref_epoch = 57388.5, observations = (SOPHIE = (jitter = 12.466066600549926, offset = 1338.673719329046),), planets = (b = (a = 38.122780455556125, e = 0.1700453043911487, ω = 1.5602361119870258, i = 0.37314499294146414, Ω = 5.108649912355986, θ = 1.0219160163400411, mass = 673.0608196917726, M = 2.1901049896897793, tp = 53772.267370459755, F = 0.0, observations = (obspri_SCExAO = (jitter = 0, northangle = 0, platescale = 1),)),))
Copy this output as a template, and replace values as needed. Note that if some parameters are calculated based on others in your model, you will have to repeat those calculations here.
Note that the output below is just an example, you must generate your own template from your model and modify it as needed. The exact structure is not garuanteed to be stable between versions of Octofitter.
# Define our "true" parameter values for simulation
M_pri = 1.61
M_sec = 85.0
M = M_pri + M_sec*Octofitter.mjup2msol
params_to_simulate = (
M_pri = M_pri,
M_sec = M_sec, # Jupiter masses
M = M,
plx = 21.5,
pmra = -137.0,
pmdec = 2.0,
ra = ra,
dec = dec,
rv = 0.0,
ref_epoch = Octofitter.meta_gaia_DR3.ref_epoch_mjd,
observations = (
SOPHIE = (jitter = 15.0, offset = 0.0),
),
planets = (
b = (
a = 45.0,
e = 0.15,
ω = 1.2,
mass = M_sec,
i = 0.8,
Ω = 2.1,
θ = 1.5,
M = M,
tp = θ_at_epoch_to_tperi(1.5, 57737.0; M=M, e=0.15, a=45.0, i=0.8, ω=1.2, Ω=2.1),
F = 0.0,
observations = NamedTuple()
),
)
)
(M_pri = 1.61, M_sec = 85.0, M = 1.6911405098873926, plx = 21.5, pmra = -137.0, pmdec = 2.0, ra = 158.30707896392835, dec = 40.42555422701387, rv = 0.0, ref_epoch = 57388.5, observations = (SOPHIE = (jitter = 15.0, offset = 0.0),), planets = (b = (a = 45.0, e = 0.15, ω = 1.2, mass = 85.0, i = 0.8, Ω = 2.1, θ = 1.5, M = 1.6911405098873926, tp = -4260.340997684594, F = 0.0, observations = NamedTuple()),))
Generate synthetic system with simulated data
sim_system = Octofitter.generate_from_params(model.system, params_to_simulate)
sim_model = Octofitter.LogDensityModel(sim_system)
LogDensityModel for System HD91312_simulation of dimension 13 and 10 epochs with fields .ℓπcallback and .∇ℓπcallback
Let's plot the simulated orbit and data to see what we generated:
# Convert true parameters to chain format for plotting
true_chain = Octofitter.result2mcmcchain([params_to_simulate])
# Plot the simulated system with true parameters
fig = octoplot(sim_model, true_chain, colormap=:red)
fig

Fit the Simulated Data
Now we'll sample from the posterior using the simulated data:
# Sample from the simulated data
chain, pt = octofit_pigeons(sim_model, n_rounds=8, explorer=SliceSampler())
display(chain)
[ Info: Sampler running with multiple threads : true
[ Info: Likelihood evaluated with multiple threads: false
┌ Info: Starting values not provided for all parameters! Guessing starting point using global optimization:
│ num_params = 13
└ num_fixed = 0
┌ Info: Found sample of initial positions
│ logpost_range = (-79.16063454708186, -68.94063608018524)
└ mean_logpost = -71.95246722456072
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
scans restarts Λ Λ_var time(s) allc(B) log(Z₁/Z₀) min(α) mean(α) min(αₑ) mean(αₑ)
────────── ────────── ────────── ────────── ────────── ────────── ────────── ────────── ────────── ────────── ──────────
2 0 1.69 3.39 4.02 2.32e+09 -4.09e+04 0 0.836 1 1
4 0 8.33 4.05 7.04 4.27e+09 -2.54e+03 0 0.601 1 1
8 0 8.02 4.94 13.6 8.32e+09 -449 0 0.582 1 1
16 0 8.16 7.51 26.7 1.63e+10 -150 7.06e-72 0.494 1 1
32 0 8.48 7.25 51.8 3.18e+10 -87 2.15e-15 0.493 0.999 1
64 1 9.07 5.61 103 6.3e+10 -80.4 0.241 0.527 1 1
128 6 9.61 5.46 206 1.27e+11 -80.2 0.261 0.514 1 1
256 6 9.3 5.7 413 2.54e+11 -80.1 0.296 0.516 1 1
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Compare Results
Let's visualize how well our sampling recovered the true parameters:
# Plot the posterior from fitting simulated data
fig = octoplot(sim_model, chain)
fig

Overlay True and Recovered Parameters
For a direct comparison, we can overlay the true orbit with the posterior samples:
# Create astrometry plot showing both posterior and true orbit
fig = Octofitter.astromplot(sim_model, chain, use_arcsec=false, ts=1:2)
ax = fig.content[1]
# Add true orbit in red
Octofitter.astromplot!(ax, sim_model, true_chain, use_arcsec=false, ts=1:2, colormap=Makie.cgrad([:red]))
# Make true orbit line more visible
ax.scene.plots[6].linewidth = 6
fig

Corner Plot Comparison
Compare the parameters used to generate the simulated data, and the recovered posterior:
# Create corner plot showing both posterior and true values
octocorner(
sim_model,
chain,
small=true,
truth=(PairPlots.Truth((;
M=collect(true_chain[:M][:]),
b_a=collect(true_chain[:b_a][:]),
b_e=collect(true_chain[:b_e][:]),
b_i=collect(true_chain[:b_i][:]),
b_mass=collect(true_chain[:b_mass][:]),
),label="Simulated Orbit", color=:red)=>(
PairPlots.MarginLines(),
PairPlots.BodyLines(),
),)
)
