Fit Gaussian Process

This example shows how to fit a Gaussian process to model stellar activity in RV data. It continues from Basic RV Fit.

Note

Radial velocity modelling is supported in Octofitter via the extension package OctofitterRadialVelocity. To install it, run pkg> add OctofitterRadialVelocity

There are two different GP packages supported by OctofitterRadialVelocity: AbstractGPs and Celerite. Important note: Celerite.jl does not support Julia 1.0+, so we currently bundle a fork that has been patched to work. When/if Celerite.jl is updated, we will switch back to the public package.

Computational cost

GP models are significantly more computationally expensive than non-GP models. Plan for longer sampling times when using Gaussian processes.

For this example, we will fit the orbit of the planet K2-131 b to perform the same fit as in the RadVel Gaussian Process Fitting tutorial.

We will use the following packages:

using Octofitter
using OctofitterRadialVelocity
using PlanetOrbits
using CairoMakie
using PairPlots
using CSV
using DataFrames
using Distributions

We will pick up from our tutorial Basic RV Fit with the data already downloaded and available as a table called rv_dat:

rv_file = download("https://raw.githubusercontent.com/California-Planet-Search/radvel/master/example_data/k2-131.txt")
rv_dat_raw = CSV.read(rv_file, DataFrame, delim=' ')
rv_dat = DataFrame();
rv_dat.epoch = jd2mjd.(rv_dat_raw.time)
rv_dat.rv = rv_dat_raw.mnvel
rv_dat.σ_rv = rv_dat_raw.errvel
tels = sort(unique(rv_dat_raw.tel))
2-element Vector{InlineStrings.String7}:
 "harps-n"
 "pfs"
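The jd2mjd call above converts the Julian Dates in the RadVel data file into the Modified Julian Dates that Octofitter uses throughout; the conversion is just a constant offset. A minimal sketch of the equivalent arithmetic (jd2mjd_sketch is a hypothetical stand-in for the exported helper):

```julia
# MJD = JD - 2400000.5 (the standard Modified Julian Date offset)
jd2mjd_sketch(jd) = jd - 2400000.5

jd2mjd_sketch(2457782.5)  # → 57782.0
```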

Gaussian Process Fit with AbstractGPs

Let us now add a Gaussian process to model stellar activity. This should improve the fit.

We start by writing a function that creates a Gaussian process kernel from a set of system parameters. We will create a quasi-periodic kernel. We provide this function as the gaussian_process argument to the likelihood constructor:

using AbstractGPs

gp_explength_mean = 9.5*sqrt(2.) # sqrt(2)*tau in Dai+ 2017 [days]
gp_explength_unc = 1.0*sqrt(2.)
gp_perlength_mean = sqrt(1. /(2. *3.32)) # sqrt(1/(2*gamma)) in Dai+ 2017
gp_perlength_unc = 0.019
gp_per_mean = 9.64 # T_bar in Dai+ 2017 [days]
gp_per_unc = 0.12

rvlike_harps = StarAbsoluteRVObs(
    rv_dat[rv_dat_raw.tel .== "harps-n",:],
    name="harps-n",
    gaussian_process = θ_obs -> GP(
        θ_obs.η_1^2 *
        (SqExponentialKernel() ∘ ScaleTransform(1/(θ_obs.η_2))) *
        (PeriodicKernel(r=[θ_obs.η_4]) ∘ ScaleTransform(1/(θ_obs.η_3)))
    ),
    # Note: variables= must come last
    variables=@variables begin
        offset ~ Normal(-6693,100) # m/s
        jitter ~ LogUniform(0.1,100) # m/s
        # Add priors on GP kernel hyper-parameters.
        η_1 ~ truncated(Normal(25,10),lower=0.1,upper=100)
        # Important: ensure the period and exponential length scales
        # have physically plausible lower and upper limits to avoid poor numerical conditioning
        η_2 ~ truncated(Normal(gp_explength_mean,gp_explength_unc),lower=5,upper=100)
        η_3 ~ truncated(Normal(gp_per_mean,1),lower=2, upper=100)
        η_4 ~ truncated(Normal(gp_perlength_mean,gp_perlength_unc),lower=0.2, upper=10)
    end
)
rvlike_pfs = StarAbsoluteRVObs(
    rv_dat[rv_dat_raw.tel .== "pfs",:],
    name="pfs",
    gaussian_process = θ_obs -> GP(
        θ_obs.η_1^2 *
        (SqExponentialKernel() ∘ ScaleTransform(1/(θ_obs.η_2))) *
        (PeriodicKernel(r=[θ_obs.η_4]) ∘ ScaleTransform(1/(θ_obs.η_3)))
    ),
    # Note: variables= must come last
    variables=@variables begin
        offset ~ Normal(0,100) # m/s
        jitter ~ LogUniform(0.1,100) # m/s
        # Add priors on GP kernel hyper-parameters.
        η_1 ~ truncated(Normal(25,10),lower=0.1,upper=100)
        # Important: ensure the period and exponential length scales
        # have physically plausible lower and upper limits to avoid poor numerical conditioning
        η_2 ~ truncated(Normal(gp_explength_mean,gp_explength_unc),lower=5,upper=100)
        η_3 ~ truncated(Normal(gp_per_mean,1),lower=2, upper=100)
        η_4 ~ truncated(Normal(gp_perlength_mean,gp_perlength_unc),lower=0.2, upper=10)
    end
)
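Written out, the composition above is the standard quasi-periodic kernel k(Δt) = η₁² · exp(−Δt²/(2η₂²)) · exp(−sin²(πΔt/η₃)/(2η₄²)). A sketch checking the composed kernel against this closed form, assuming KernelFunctions.jl's conventions for SqExponentialKernel and PeriodicKernel (the hyperparameter values here are placeholders, not fitted results):

```julia
using AbstractGPs  # re-exports the KernelFunctions building blocks

η₁, η₂, η₃, η₄ = 25.0, 13.4, 9.64, 0.38  # placeholder values

# Closed form: k(Δt) = η₁² · exp(−Δt²/(2η₂²)) · exp(−sin²(πΔt/η₃)/(2η₄²))
k_manual(Δt) = η₁^2 * exp(-Δt^2 / (2η₂^2)) * exp(-sin(π*Δt/η₃)^2 / (2η₄^2))

# The same kernel composed as in the likelihood constructors above:
k = η₁^2 *
    (SqExponentialKernel() ∘ ScaleTransform(1/η₂)) *
    (PeriodicKernel(r=[η₄]) ∘ ScaleTransform(1/η₃))

k(0.0, 3.0) ≈ k_manual(3.0)  # should hold for any Δt
```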
## No change to the rest of the model

planet_1 = Planet(
    name="b",
    basis=RadialVelocityOrbit,
    observations=[],
    variables=@variables begin
        e = 0
        ω = 0.0
        # To match RadVel, we set a prior on Period and calculate semi-major axis from it
        P ~ truncated(
            Normal(0.3693038/365.256360417, 0.0000091/365.256360417),
            lower=0.0001
        )
        M = system.M
        a = cbrt(M * P^2) # note the equals sign.
        τ ~ UniformCircular(1.0)
        tp = τ*P*365.256360417 + 57782 # reference epoch for τ. Choose an MJD date near your data.
        # minimum planet mass [jupiter masses]. really m*sin(i)
        mass ~ LogUniform(0.001, 10)
    end
)

sys = System(
    name = "k2_132",
    companions=[planet_1],
    observations=[rvlike_harps, rvlike_pfs],
    variables=@variables begin
        M ~ truncated(Normal(0.82, 0.02),lower=0.1) # (Baines & Armstrong 2011).
    end
)

model = Octofitter.LogDensityModel(sys)
LogDensityModel for System k2_132 of dimension 17 and 71 epochs with fields .ℓπcallback and .∇ℓπcallback

Note that the two instruments do not need to use the same Gaussian process kernels, nor the same hyperparameter names.

Note

Tip: If you want the instruments to share the Gaussian process kernel hyperparameters, move the variables up to the system's @variables block and forward them to each observation's variables block, e.g. η₁ = system.η₁, η₂ = system.η₂.
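As a sketch, after declaring the hyperparameter priors once on the system, each instrument's block would forward them like this (names mirror the per-instrument ones used above):

```julia
# System @variables block declares the shared priors once, e.g.:
#   η_1 ~ truncated(Normal(25,10), lower=0.1, upper=100)
#   ... (η_2, η_3, η_4 likewise)
# Each instrument's variables block then forwards them:
variables=@variables begin
    offset ~ Normal(0, 100)       # m/s
    jitter ~ LogUniform(0.1, 100) # m/s
    η_1 = system.η_1  # shared across instruments (note: = not ~)
    η_2 = system.η_2
    η_3 = system.η_3
    η_4 = system.η_4
end
```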

Initialize the starting points, and confirm the data are entered correctly:

init_chain = initialize!(model)
fig = Octofitter.rvpostplot(model, init_chain)
Example block output

Sample from the model using MCMC (the No-U-Turn sampler, NUTS):

# Seed the random number generator
using Random
rng = Random.Xoshiro(0)

chain = octofit(
    rng, model,
    adaptation = 100,
    iterations = 100,
)
Chains MCMC chain (100×36×1 Array{Float64, 3}):

Iterations        = 1:1:100
Number of chains  = 1
Samples per chain = 100
Wall duration     = 421.2 seconds
Compute duration  = 421.2 seconds
parameters        = M, harps_n_offset, harps_n_jitter, harps_n_η_1, harps_n_η_2, harps_n_η_3, harps_n_η_4, pfs_offset, pfs_jitter, pfs_η_1, pfs_η_2, pfs_η_3, pfs_η_4, b_P, b_τx, b_τy, b_mass, b_τ, b_e, b_ω, b_M, b_a, b_tp
internals         = n_steps, is_accept, acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, is_adapt, loglike, logpost, tree_depth, numerical_error

Summary Statistics
      parameters         mean       std      mcse   ess_bulk   ess_tail      r ⋯
          Symbol      Float64   Float64   Float64    Float64    Float64   Floa ⋯

               M       0.8189    0.0160    0.0017    93.1304    78.3393    0.9 ⋯
  harps_n_offset   -6682.0492   22.5053   18.3658     1.5601    21.0246    1.8 ⋯
  harps_n_jitter       0.8351    0.8202    0.1223    52.6255    45.6025    1.0 ⋯
     harps_n_η_1      28.8879    7.2690    1.7279    17.9194    71.2875    1.0 ⋯
     harps_n_η_2      13.5104    1.4612    0.1461    92.2428   116.7774    0.9 ⋯
     harps_n_η_3       9.5645    1.0160    0.0941   111.5546    54.1425    1.0 ⋯
     harps_n_η_4       0.3841    0.0176    0.0016   128.4996    57.2146    1.0 ⋯
      pfs_offset     -12.4595   20.4596    5.8070    15.6997    31.7840    1.0 ⋯
      pfs_jitter       4.9953    1.5315    0.1302   137.6414    76.6284    1.0 ⋯
         pfs_η_1      29.1882    6.9501    0.6491   113.8957    96.8032    1.0 ⋯
         pfs_η_2      13.1196    1.4715    0.1318   126.2657   101.7705    1.0 ⋯
         pfs_η_3       9.4102    0.9551    0.1028    81.0630    78.3393    1.0 ⋯
         pfs_η_4       0.3896    0.0202    0.0018   131.0251   116.7774    0.9 ⋯
             b_P       0.0010    0.0000    0.0000   131.4862   112.1358    1.0 ⋯
            b_τx       0.6216    0.1496    0.0134   137.1875    66.2307    1.0 ⋯
            b_τy      -0.7585    0.1611    0.0154   114.9962    58.9391    1.0 ⋯
          b_mass       0.0199    0.0048    0.0005    93.1109    55.4729    1.0 ⋯
        ⋮              ⋮           ⋮         ⋮         ⋮          ⋮          ⋮ ⋱
                                                    2 columns and 6 rows omitted

Quantiles
      parameters         2.5%        25.0%        50.0%        75.0%        97 ⋯
          Symbol      Float64      Float64      Float64      Float64      Floa ⋯

               M       0.7928       0.8065       0.8165       0.8315       0.8 ⋯
  harps_n_offset   -6715.7575   -6698.2424   -6687.3062   -6664.1587   -6643.2 ⋯
  harps_n_jitter       0.1376       0.2601       0.4826       1.0688       2.9 ⋯
     harps_n_η_1      18.2165      23.6859      27.9533      33.4977      43.1 ⋯
     harps_n_η_2      10.8265      12.6903      13.4108      14.2725      16.4 ⋯
     harps_n_η_3       7.6149       8.9876       9.5380      10.1647      11.5 ⋯
     harps_n_η_4       0.3499       0.3732       0.3834       0.3945       0.4 ⋯
      pfs_offset     -60.1584     -23.3148     -13.4610       4.0936      18.5 ⋯
      pfs_jitter       2.3555       3.9354       4.9785       6.1307       8.4 ⋯
         pfs_η_1      17.8788      23.9963      28.8934      34.1997      43.5 ⋯
         pfs_η_2      10.3851      12.2623      12.8803      14.2659      15.8 ⋯
         pfs_η_3       7.5106       8.8893       9.3257       9.9677      11.2 ⋯
         pfs_η_4       0.3511       0.3766       0.3896       0.4045       0.4 ⋯
             b_P       0.0010       0.0010       0.0010       0.0010       0.0 ⋯
            b_τx       0.3806       0.5214       0.6279       0.7093       0.9 ⋯
            b_τy      -1.0412      -0.8668      -0.7700      -0.6839      -0.4 ⋯
          b_mass       0.0112       0.0169       0.0193       0.0224       0.0 ⋯
        ⋮              ⋮            ⋮            ⋮            ⋮            ⋮   ⋱
                                                     1 column and 6 rows omitted

For real data, we would want to increase the adaptation and iterations to about 1000 each.
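For such a production run, the call is the same with larger settings (expect roughly proportionally longer wall time):

```julia
chain = octofit(
    rng, model,
    adaptation = 1000,
    iterations = 1000,
)
```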

Plot one sample from the results:

fig = Octofitter.rvpostplot(model, chain) # saved to "k2_132-rvpostplot.png"
Example block output

Plot many samples from the results:

fig = octoplot(
    model,
    chain,
    # Some optional tweaks to the appearance:
    N=50, # only plot 50 samples
    figscale=1.5, # make it larger
    alpha=0.05, # make each sample more transparent
    colormap="#0072b2",
) # saved to "k2_132-plot-grid.png"
Example block output

Corner plot:

octocorner(model, chain, small=true) # saved to "k2_132-pairplot-small.png"
Example block output

Gaussian Process Fit with Celerite

We now demonstrate an approximate quasi-periodic kernel implemented using Celerite. For the class of kernels supported by Celerite, the cost of evaluating the likelihood scales linearly with the number of data points, rather than cubically as for a general GP. This makes it a good choice for modelling large RV datasets.

Warning

Make sure that you type using OctofitterRadialVelocity.Celerite and not using Celerite. Celerite.jl does not support Julia 1.0+, so we currently bundle a fork that has been patched to work. When/if Celerite.jl is updated, we will switch back to the public package.

using OctofitterRadialVelocity.Celerite

rvlike_harps = StarAbsoluteRVObs(
    rv_dat[rv_dat_raw.tel .== "harps-n",:],
    name="harps-n",
    gaussian_process = θ_obs -> Celerite.CeleriteGP(
        Celerite.RealTerm(
            #=log_a=# log(θ_obs.B*(1+θ_obs.C)/(2+θ_obs.C)),
            #=log_c=# log(1/θ_obs.L)
        ) + Celerite.ComplexTerm(
            #=log_a=#  log(θ_obs.B/(2+θ_obs.C)),
            #=log_b=#  -Inf,
            #=log_c=#  log(1/θ_obs.L),
            #=log_d=#  log(2pi/θ_obs.Prot)
        )
    ),
    # Note: variables= must come last
    variables=@variables begin
        offset ~ Normal(-6693,100) # m/s
        jitter ~ LogUniform(0.1,100) # m/s
        # Add priors on GP kernel hyper-parameters.
        B ~ Uniform(0.00001, 2000000)
        C ~ Uniform(0.00001, 200)
        L ~ Uniform(2, 200)
        Prot ~ Uniform(8.5, 20) # rotation period [days]
    end
)
rvlike_pfs = StarAbsoluteRVObs(
    rv_dat[rv_dat_raw.tel .== "pfs",:],
    name="pfs",
    gaussian_process = θ_obs -> Celerite.CeleriteGP(
        Celerite.RealTerm(
            #=log_a=# log(θ_obs.B*(1+θ_obs.C)/(2+θ_obs.C)),
            #=log_c=# log(1/θ_obs.L)
        ) + Celerite.ComplexTerm(
            #=log_a=#  log(θ_obs.B/(2+θ_obs.C)),
            #=log_b=#  -Inf,
            #=log_c=#  log(1/θ_obs.L),
            #=log_d=#  log(2pi/θ_obs.Prot)
        )
    ),
    # Note: variables= must come last
    variables=@variables begin
        offset ~ Normal(0,100) # m/s
        jitter ~ LogUniform(0.1,100) # m/s
        # Add priors on GP kernel hyper-parameters.
        B ~ Uniform(0.00001, 2000000)
        C ~ Uniform(0.00001, 200)
        L ~ Uniform(2, 200)
        Prot ~ Uniform(8.5, 20) # rotation period [days]
    end
)
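The RealTerm + ComplexTerm pair above assembles the approximate quasi-periodic kernel of Foreman-Mackey et al. (2017): k(τ) = B/(2+C) · exp(−τ/L) · (cos(2πτ/Prot) + 1 + C). A sketch checking that decomposition term by term (the hyperparameter values are placeholders):

```julia
B, C, L, Prot = 20.0, 1.0, 12.0, 9.64  # placeholder values

# Full approximate quasi-periodic kernel:
k_qp(τ) = B/(2+C) * exp(-τ/L) * (cos(2π*τ/Prot) + 1 + C)

# The two celerite terms it is split into above:
k_real(τ)    = B*(1+C)/(2+C) * exp(-τ/L)             # RealTerm: a·e^(−cτ)
k_complex(τ) = B/(2+C) * exp(-τ/L) * cos(2π*τ/Prot)  # ComplexTerm with b = 0

k_qp(3.0) ≈ k_real(3.0) + k_complex(3.0)  # true by construction
```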

## No change to the rest of the model

planet_1 = Planet(
    name="b",
    basis=RadialVelocityOrbit,
    observations=[],
    variables=@variables begin
        e = 0
        ω = 0.0
        # To match RadVel, we set a prior on Period and calculate semi-major axis from it
        P ~ truncated(
            Normal(0.3693038/365.256360417, 0.0000091/365.256360417),
            lower=0.0001
        )
        M = system.M
        a = cbrt(M * P^2) # note the equals sign.
        τ ~ UniformCircular(1.0)
        tp = τ*P*365.256360417 + 57782 # reference epoch for τ. Choose an MJD date near your data.
        # minimum planet mass [jupiter masses]. really m*sin(i)
        mass ~ LogUniform(0.001, 10)
    end
)

sys = System(
    name = "k2_132",
    companions=[planet_1],
    observations=[rvlike_harps, rvlike_pfs],
    variables=@variables begin
        M ~ truncated(Normal(0.82, 0.02),lower=0.1) # (Baines & Armstrong 2011).
    end
)

using DifferentiationInterface
using FiniteDiff
model = Octofitter.LogDensityModel(sys, autodiff=AutoFiniteDiff())
LogDensityModel for System k2_132 of dimension 17 and 71 epochs with fields .ℓπcallback and .∇ℓπcallback

The Celerite implementation doesn't support our default autodiff backend (ForwardDiff.jl), so we have two options: (A) disable gradients by switching to finite differences and use the Pigeons slice sampler, which doesn't require them, or (B) use Enzyme autodiff. Here we take option (A).
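For the Enzyme option, a sketch might look like the following. AutoEnzyme is the backend selector from the ADTypes interface used by DifferentiationInterface; whether the Celerite path differentiates cleanly with it should be verified on your own setup:

```julia
# Sketch (unverified): swap the finite-difference backend for Enzyme.
using DifferentiationInterface
using Enzyme
model = Octofitter.LogDensityModel(sys, autodiff=AutoEnzyme())
chain = octofit(model)  # gradient-based NUTS sampling
```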

Initialize the starting points, and confirm the data are entered correctly:

init_chain = initialize!(model)
fig = Octofitter.rvpostplot(model, init_chain)
Example block output
using Pigeons
chain, pt = octofit_pigeons(model, n_rounds=7)
fig = Octofitter.rvpostplot(model, chain)
Example block output