Full G23H Example Script

This page provides a complete, production-ready script for fitting orbital models using the G23H catalog. This script supports:

Single or multi-planet systems
Optional Gaussian Process modeling of stellar activity in RV data
RV data from DACE or custom RDB files
Flexible selection of which astrometric observation types to include
Incremental sampling with checkpoint saving

Prerequisites

Before running this script, ensure you have installed the necessary packages:

Required packages:

using Pkg
Pkg.add([
    "Octofitter",
    "OctofitterRadialVelocity",
    "Arrow",
    "DataFrames",
    "Distributions",
    "CairoMakie",
    "PairPlots",
    "Pigeons",
    "MCMCChains",
    "ArgParse",
    "DelimitedFiles",
    "FiniteDiff",
    "DifferentiationInterface",
])

Optional for DACE RV data:

using CondaPkg
# Then in Pkg mode: conda pip_add dace-query

Script Overview

The script is organized into several sections:

Argument Parsing - Command-line interface for specifying targets and options
Observation Setup - Create G23H astrometry likelihood object (catalog downloaded automatically)
RV Data Loading - Load RV data from DACE or RDB files
Model Definition - Define planet and system models with optional GP
Sampling - Run MCMC with Pigeons
Analysis - Generate plots and save results

Complete Script

#=
G23H Joint Astrometry + RV Fitting Script
=========================================

This script fits orbital models using the G23H catalog for joint
Gaia-Hipparcos astrometry modeling, optionally combined with RV data.

Usage:
    julia script.jl --hip 12345 --host-mass 1.0 --n-rounds 10
    julia script.jl --gaia 756291174721509376 --host-mass 1.0 --dace-rv
    julia script.jl --hip 12345 --host-mass 1.0 --rdb-rv mydata.rdb --gp

=#

# First time only: install dace-query
# using CondaPkg;  Then type "]" to go into Pkg-mode and paste: conda pip_add dace-query

using ArgParse
using Arrow
using CairoMakie
using DataFrames
using Dates
using Distributions
using Octofitter, OctofitterRadialVelocity
using LinearAlgebra
using PairPlots
using Pigeons
using Printf
using Statistics
using StatsBase
using MCMCChains
using DelimitedFiles
using FiniteDiff, DifferentiationInterface

## ============================================================================
## ARGUMENT PARSING
## ============================================================================

function parse_commandline()
    s = ArgParseSettings()

    @add_arg_table! s begin
        "--host-mass"
            help = "Host star mass in solar masses"
            arg_type = Float64
            required = true
        "--n-planets"
            help = "Number of planets in the system"
            arg_type = Int
            default = 1
        "--hip"
            help = "Hipparcos ID (mutually exclusive with --gaia)"
            arg_type = Int
        "--gaia"
            help = "Gaia source ID (mutually exclusive with --hip)"
            arg_type = Int64
        "--gaia-rv"
            help = "Include Gaia DR3 RV in astrometry likelihood"
            action = :store_true
        "--dace-rv"
            help = "Query DACE for RV data"
            action = :store_true
        "--rdb-rv"
            help = "Load RV data from RDB file (can be specified multiple times)"
            action = :append_arg
        "--gp"
            help = "Enable Gaussian Process kernel for stellar activity"
            action = :store_true
        "--n-rounds"
            help = "Number of MCMC rounds"
            arg_type = Int
            default = 12
        "--obs-types"
            help = "Comma-separated list of observation types to include (default: all)"
            arg_type = String
        "--circular"
            help = "Force a circular (e=0) orbit for all companions"
            action = :store_true
    end

    return parse_args(s)
end

args = parse_commandline()

# Validate mutually exclusive arguments
if isnothing(args["hip"]) && isnothing(args["gaia"])
    error("Either --hip or --gaia must be specified")
end
if !isnothing(args["hip"]) && !isnothing(args["gaia"])
    error("Cannot specify both --hip and --gaia")
end

# Extract configuration
host_mass = args["host-mass"]
num_planets = args["n-planets"]
n_rounds = args["n-rounds"]
use_gaia_rv = args["gaia-rv"]
use_dace_rv = args["dace-rv"]
rdb_files = isnothing(args["rdb-rv"]) ? String[] : args["rdb-rv"]
use_gp = args["gp"]
circular = args["circular"]

# Generate planet names
planet_names = Symbol.(Char.(Int('b') .+ (0:num_planets-1)))

output_dir = "results"
mkpath(output_dir)

## ============================================================================
## CREATE G23H OBSERVATION OBJECT
## ============================================================================

# G23HObs accepts either gaia_id or hip_id directly.
# The G23H catalog is automatically downloaded on first use via DataDeps.
if !isnothing(args["hip"])
    hip_id = args["hip"]
    sysname′ = "HIP$hip_id"
    absastrom = Octofitter.G23HObs(;
        hip_id=hip_id,
        include_rv=use_gaia_rv,
    )
else
    gaia_id = args["gaia"]
    absastrom = Octofitter.G23HObs(;
        gaia_id=gaia_id,
        include_rv=use_gaia_rv,
    )
    # Determine system name from catalog
    if isfinite(absastrom.catalog.hip_id)
        hip_id = round(Int, absastrom.catalog.hip_id)
        sysname′ = "HIP$hip_id"
    else
        sysname′ = "GDR3$gaia_id"
    end
end

ra = absastrom.catalog.ra
dec = absastrom.catalog.dec
println("Found target: RA=$ra, Dec=$dec")

# Determine which observation types to include
if isnothing(args["obs-types"])
    # Default: include all available types
    obs_to_include = [
        :iad_hip,
        :ra_hip,  :dec_hip,
        :ra_hg,   :dec_hg,
        :ra_dr2,  :dec_dr2,
        :ra_dr32, :dec_dr32,
        :ra_dr3,  :dec_dr3,
        :ueva_dr3,
    ]
    if use_gaia_rv
        push!(obs_to_include, :rv_dr3)
    end
else
    # Parse user-specified observation types
    obs_to_include = Symbol.(split(args["obs-types"], ","))
    # Add rv_dr3 if --gaia-rv is specified
    if use_gaia_rv && !(:rv_dr3 in obs_to_include)
        push!(obs_to_include, :rv_dr3)
    end
end

indices = findall([
    available_obs_kind ∈ obs_to_include
    for available_obs_kind in absastrom.table.kind
])
absastrom = Octofitter.likeobj_from_epoch_subset(absastrom, indices)

sysname = "$sysname′-" * join(string.(absastrom.table.kind), "-")

## ============================================================================
## GAUSSIAN PROCESS KERNEL (OPTIONAL)
## ============================================================================

if use_gp
    function gauss_proc_approx_quasi_periodic(θ_system)
        B = θ_system.gp_B
        C = θ_system.gp_C
        L = θ_system.gp_L
        Prot = θ_system.gp_Prot

        kernel = OctofitterRadialVelocity.Celerite.RealTerm(
            log(B * (1 + C) / (2 + C)),  # log_a
            log(1 / L)                    # log_c
        ) + OctofitterRadialVelocity.Celerite.ComplexTerm(
            log(B / (2 + C)),            # log_a
            -Inf,                         # log_b
            log(1 / L),                  # log_c
            log(2π / Prot)               # log_d
        )

        return OctofitterRadialVelocity.Celerite.CeleriteGP(kernel)
    end
end

## ============================================================================
## RV HELPER FUNCTIONS
## ============================================================================

"""
Create RV likelihood for a single instrument.
Handles both GP and non-GP cases using composable variable blocks.
"""
function create_rv_likelihood(dat, name, mean_epoch, use_gp)
    # Base variables always present
    base_vars = @variables begin
        offset ~ Uniform(-1000, 1000)
        jitter ~ LogUniform(0.1, 100)
        m = system.m
    end

    # GP variables (conditional)
    gp_vars = @variables begin
        gp_B = system.gp_B
        gp_C = system.gp_C
        gp_L = system.gp_L
        gp_Prot = system.gp_Prot
    end

    # Combine variables based on whether GP is enabled
    vars = use_gp ? vcat(base_vars, gp_vars) : base_vars

    # Create likelihood with appropriate configuration
    return StarAbsoluteRVObs(
        dat;
        name,
        gaussian_process = use_gp ? gauss_proc_approx_quasi_periodic : nothing,
        variables = vars,
        trend_function = (θ_obs, epoch) -> θ_obs.m * (epoch - mean_epoch)
    )
end

"""
Perform outlier rejection on RV data using MAD (Median Absolute Deviation).
"""
function reject_rv_outliers(d, inst_names; clip_threshold=3.0)
    d_filt_parts = DataFrame[]
    total_original = 0
    total_filtered = 0

    for inst in inst_names
        d_inst = d[d.ins_name .== inst, :]
        n_inst_original = nrow(d_inst)
        total_original += n_inst_original

        # Outlier rejection using MAD within this instrument
        sigma = 1.4826 * mad(d_inst.rv)  # MAD to std conversion
        median_rv = median(d_inst.rv)

        # Create mask for non-outliers
        rv_mask = abs.(d_inst.rv .- median_rv) .< (clip_threshold * sigma)
        d_inst_filt = d_inst[rv_mask, :]

        n_inst_filtered = nrow(d_inst_filt)
        n_inst_rejected = n_inst_original - n_inst_filtered

        println("  $inst: kept $n_inst_filtered/$n_inst_original (rejected $n_inst_rejected)")

        if n_inst_filtered < 3
            @warn "Too few measurements for $inst after filtering ($n_inst_filtered < 3); keeping all for this instrument"
            d_inst_filt = d_inst
        end

        d_inst_filt.rv .-= median(d_inst_filt.rv)

        push!(d_filt_parts, d_inst_filt)
        total_filtered += nrow(d_inst_filt)
    end

    d_filt = vcat(d_filt_parts...)
    println("Total: kept $total_filtered/$total_original measurements across all instruments")

    return d_filt
end

"""
Process RV data for multiple instruments: outlier rejection, plotting, and likelihood creation.
Reusable for both DACE and RDB data sources.
"""
function process_rv_data(d_filt, inst_names; sysname, use_gp, source_name="")
    # Plot the filtered data
    f, a, p = errorbars(
        d_filt.rjd,
        d_filt.rv,
        d_filt.rv_err,
        color = map(i -> findfirst(==(i), inst_names), d_filt.ins_name),
        colormap = :turbo,
        axis = (; xlabel="time", ylabel="rv [m/s]", title=source_name)
    )

    plot_filename = isempty(source_name) ?
        "$sysname-rvs.png" :
        "$sysname-rvs-$(lowercase(source_name)).png"
    save(plot_filename, f, px_per_unit=4)

    # Create observations for each instrument
    observations = map(inst_names) do name
        dat = d_filt[d_filt.ins_name .== name, :]
        dat.epoch .= dat.rjd
        dat.rv .= dat.rv
        dat.σ_rv .= dat.rv_err
        dat.rv .-= median(dat.rv)

        mean_epoch = mean(dat.epoch)

        create_rv_likelihood(dat, name, mean_epoch, use_gp)
    end

    return observations
end

## ============================================================================
## LOAD RV DATA
## ============================================================================

rvlikes = []

# Load RV data from DACE
if use_dace_rv
    @info "Loading RV data from DACE"
    using PythonCall
    dace_spectroscopy = pyimport("dace_query.spectroscopy")

    hip_for_dace = !isnothing(args["hip"]) ? args["hip"] : absastrom.catalog.hip_id
    result = dace_spectroscopy.Spectroscopy.get_timeseries(
        target="HIP $(Int(hip_for_dace))",
        sorted_by_instrument=false,
        output_format="pandas"
    )
    d = DataFrame(pyconvert(PyTable, result))

    if isempty(d)
        @warn "NO RVS Found from DACE!"
    else
        # Perform outlier rejection per instrument
        println("Performing per-instrument outlier rejection for DACE data...")
        d_insts = unique(d.ins_name)

        d_filt = reject_rv_outliers(d, d_insts)

        # Process and create observations using helper function
        dace_likes = process_rv_data(d_filt, d_insts; sysname, use_gp, source_name="dace")
        append!(rvlikes, dace_likes)
    end
end

# Load RV data from RDB files
for rdb_file in rdb_files
    @info "Loading RV data from RDB file: $rdb_file"

    if !isfile(rdb_file)
        @warn "RDB file not found: $rdb_file"
        continue
    end

    # Read RDB file (tab-separated, skip first 2 header lines)
    d_raw = readdlm(rdb_file, '\t', skipstart=2)

    # Create DataFrame from RDB data
    # Columns: rjd, vrad, svrad, s_mw, sig_s, ha, sig_ha, weight
    d = DataFrame(
        rjd = Float64.(d_raw[:, 1]),
        rv = Float64.(d_raw[:, 2]),      # vrad
        rv_err = Float64.(d_raw[:, 3]),  # svrad
    )

    # Use filename as instrument name
    inst_name = replace(basename(rdb_file), ".rdb" => "")
    d.ins_name .= inst_name

    # Basic outlier rejection using MAD
    println("Performing outlier rejection for RDB file: $inst_name...")
    d_insts = [inst_name]

    d_filt = reject_rv_outliers(d, d_insts)

    # Process and create observations using helper function
    rdb_likes = process_rv_data(d_filt, d_insts; sysname, use_gp, source_name=inst_name)
    append!(rvlikes, rdb_likes)
end

if isempty(rvlikes)
    @info "No RV data loaded"
end

has_rv_data = !isempty(rvlikes)

## ============================================================================
## DEFINE PLANET MODEL(S)
## ============================================================================

# We should always use RUWE mode
ueva_mode = :RUWE
OrbitType = AbsoluteVisual{KepOrbit}

ref_planet_pos = mean(absastrom.gaia_table.epoch)

planets = Planet[]
for planet_i in 1:num_planets
    planet = Planet(
        name=planet_names[planet_i],
        basis=OrbitType,
        observations=[],
        variables=@variables begin
            planet_i = $planet_i
            a ~ LogUniform(0.001, 1024)
            planet_present = system.n_planets >= planet_i
            mass′ ~ LogUniform(0.01, host_mass / Octofitter.mjup2msol)
            mass = mass′ * planet_present
            e ~ Uniform(0, 0.9)
            M = system.M_pri + mass * Octofitter.mjup2msol
            ω ~ Uniform(0, 2pi)
            i = system.i
            Ω = system.Ω
            τ ~ Uniform(0.0, 1.0)
            P = √(a^3 / M)
            tp = τ * P * 365.25 + $ref_planet_pos
        end
    )
    push!(planets, planet)
end

if num_planets == 0
    planets = [Planet(name=:b, basis=OrbitType, variables=@variables begin
        a = 1.0
        mass = 0.0
        e = 0.0
        M = system.M_pri
        ω = 0.0
        i = 0.0
        Ω = 0.0
        tp = 0.0
    end)]
end

## ============================================================================
## DEFINE SYSTEM MODEL
## ============================================================================

rv = isnan(absastrom.catalog.radial_velocity) ? 0 : absastrom.catalog.radial_velocity * 1e3

observations = []
push!(observations, absastrom)
append!(observations, rvlikes)

if num_planets > 1
    push!(observations, PlanetOrderPrior(planets...))
    push!(observations, NonCrossingPrior())
end

ref_epoch = Octofitter.meta_gaia_DR3.ref_epoch_mjd

# Base system variables
base_system_vars = @variables begin
    M_pri = $host_mass

    n_planets ~ truncated(NegativeBinomial(), upper=num_planets)

    ref_epoch = $ref_epoch
    plx ~ truncated(Normal(absastrom.catalog.parallax, absastrom.catalog.parallax_error),
                    lower=absastrom.catalog.parallax / 2)

    pmra ~ Uniform((Float64(absastrom.catalog.pmra_dr3) .+ (-10, 10))...)
    pmdec ~ Uniform((Float64(absastrom.catalog.pmdec_dr3) .+ (-10, 10))...)

    dec = $dec
    ra = $ra
    rv = $rv

    i ~ Sine()
    Ω ~ Uniform(0, 2pi)
end

# RV trend variable (conditional on having RV data)
rv_trend_vars = @variables begin
    m ~ Normal(0, 0.1)  # m/s/d trend
end

# GP hyperparameters (conditional on GP being enabled)
gp_system_vars = @variables begin
    gp_B ~ Uniform(0.00001, 2000000)
    gp_C ~ Uniform(0.00001, 200)
    gp_L ~ Uniform(40, 2000)
    gp_Prot ~ Uniform(35, 45)
end

# Compose system variables based on configuration
system_vars = base_system_vars
if has_rv_data
    system_vars = vcat(system_vars, rv_trend_vars)
end
if use_gp
    system_vars = vcat(system_vars, gp_system_vars)
end

sys = System(;
    name=sysname,
    companions=planets,
    observations,
    variables=system_vars
)

model = Octofitter.LogDensityModel(sys; verbosity=4, autodiff=AutoFiniteDiff())

## ============================================================================
## INITIALIZE AND SAMPLE
## ============================================================================

# Enable threaded Kepler solving for performance
Octofitter._kepsolve_use_threads[] = true

# Find good starting positions
initial_θ = collect(Octofitter.guess_starting_position(model, 10000)[1])
model.starting_points = fill(collect(model.link(initial_θ)), 100)
@time model.ℓπcallback(model.starting_points[1])

# Run initial sampling
explorer = SliceSampler()

chain, pt = octofit_pigeons(
    model,
    n_chains=32,
    n_chains_variational=0,
    variational=nothing,
    n_rounds=1,
    explorer=explorer,
    multithreaded=true,
)

## ============================================================================
## INCREMENTAL SAMPLING WITH CHECKPOINTS
## ============================================================================

while length(pt.shared.reports.summary.last_round_max_time) < n_rounds
    global chain, pt
    increment_n_rounds!(pt, 1)
    chain, pt = octofit_pigeons(pt)
    display(chain)

    # Save checkpoint
    chain = MCMCChains.setinfo(chain, (; chain.info...,
        logevidence_ratio=Pigeons.stepping_stone_pair(pt)[2]))
    chain_file = joinpath(output_dir, "$sysname-post-round$(pt.inputs.n_rounds).fits")
    Octofitter.savechain(chain_file, chain)

    # generate dotplot
    Octofitter.dotplot(model, chain)

    # Save sampling summary
    CSV.write(joinpath(output_dir, "$(sysname)-pigeons-summary.csv"),
              pt.shared.reports.summary)
end

## ============================================================================
## ANALYSIS AND VISUALIZATION
## ============================================================================

# Final chain summary
println("\n" * "="^60)
println("FINAL RESULTS")
println("="^60)
display(chain)

# Corner plot
corner_fig = octocorner(model, chain, small=true)
save(joinpath(output_dir, "$(sysname)-corner.png"), corner_fig, px_per_unit=2)

# Orbit plot
orbit_fig = octoplot(model, chain)
save(joinpath(output_dir, "$(sysname)-orbits.png"), orbit_fig, px_per_unit=2)

# RV plot (if we have RV data)
if has_rv_data
    chain_pl = chain[chain["b_planet_present"][:] .> 0]
    fig = Octofitter.rvpostplot(model, chain_pl, rand(1:size(chain_pl, 1)),
                                 show_summary=true,
                                 fname="$sysname-rvpostplot.png")
end

println("\nResults saved to: $output_dir")

Command Line Usage

Run the script from the command line with various options:

# Basic usage with Hipparcos ID
julia script.jl --hip 12345 --host-mass 1.0 --n-rounds 10

# Using Gaia source ID
julia script.jl --gaia 756291174721509376 --host-mass 1.0

# With RV data from DACE
julia script.jl --hip 12345 --host-mass 1.0 --dace-rv

# With RV data from an RDB file
julia script.jl --hip 12345 --host-mass 1.0 --rdb-rv harps_data.rdb

# Multiple RDB files
julia script.jl --hip 12345 --host-mass 1.0 --rdb-rv harps.rdb --rdb-rv hires.rdb

# With Gaussian Process for stellar activity
julia script.jl --hip 12345 --host-mass 1.0 --dace-rv --gp

# Multi-planet fit
julia script.jl --hip 12345 --host-mass 1.0 --n-planets 2 --n-rounds 15

# Force circular orbits
julia script.jl --hip 12345 --host-mass 1.0 --circular

# Include Gaia DR3 RV variability constraint
julia script.jl --hip 12345 --host-mass 1.0 --gaia-rv

# Select specific observation types
julia script.jl --hip 12345 --host-mass 1.0 --obs-types "ra_hip,dec_hip,ra_hg,dec_hg,ueva_dr3"

Command Line Arguments

Argument	Type	Required	Description
`--hip`	Int	Yes*	Hipparcos ID
`--gaia`	Int64	Yes*	Gaia DR3 source ID
`--host-mass`	Float64	Yes	Host star mass in solar masses
`--n-planets`	Int	No	Number of planets (default: 1)
`--n-rounds`	Int	No	Number of MCMC rounds (default: 12)
`--dace-rv`	Flag	No	Query DACE for RV data
`--rdb-rv`	String	No	Path to RDB file (can repeat)
`--gp`	Flag	No	Enable GP for stellar activity
`--gaia-rv`	Flag	No	Include Gaia RV variability
`--obs-types`	String	No	Comma-separated observation types
`--circular`	Flag	No	Force circular orbits

*One of --hip or --gaia must be specified.

Performance Tips

Start with fewer rounds for initial exploration, then increase for publication-quality results.

Use freeze_epochs=true for faster sampling during testing:

absastrom = G23HObs(;
    gaia_id=gaia_id,
    freeze_epochs=true,  # Faster but approximate
)

Run Julia with multiple threads:

julia --threads=auto script.jl --hip 12345 --host-mass 1.0

Consider using a subset of observations for initial exploration, then add more data for final fits.