Tutorial - Using ACID

These tutorials requires use of the example data included in the example_ folder. You can find the script in example/tutorial_code.py

The architecture of ACID_v2 is different to the original ACID code. ACID now works under the hood as a class (called Acid), rather than previously as a function. The main result of Acid class or ACID function is also now a Result class with its own methods and attributes that allow for simple analysis. The legacy ACID and ACID_HARPS functions are still available for backwards compatibility, however it is recommended to use the Acid class for new applications. The tutorials below walk through how to use ACID for a variety of applications using the new class structure.

Quickstart

ACID returns LSD profiles based on input spectra. First, lets walk through an example for a single spectrum.

ACID requires an input spectrum and stellar line list. An example spectrum and line list are contained in the ‘example’ directory of the source code. In the ‘example’ directory we can set up our inputs are follows:

from astropy.io import fits

spec_file = fits.open('sample_spec_1.fits')

wavelength = spec_file[0].data   # Wavelengths in Angstroms
spectrum = spec_file[1].data     # Spectral Flux
error = spec_file[2].data        # Spectral Flux Errors
sn = spec_file[3].data           # SN of Spectrum

linelist = 'example_linelist.txt' # Insert path to line list

The stellar line list can also be obtained from VALD using their ‘Extract Stellar’ feature. You should input stellar parameters that correspond to your object and ensure that the wavelength range input covers the entire wavelength range of your spectrum. The detection threshold input to VALD must be less than 1/(3*SN) where SN is the signal-to-noise of the spectrum.

We can then run ACID and plot the final results:

import numpy as np
import ACID_code_v2 as acid

# choose a velocity grid for the final profile(s)
deltav = acid.calc_deltav(wavelength)   # velocity pixel size must not be smaller than the spectral pixel size - can use acid.calc_deltav function if unsure what this would be.
velocities = np.arange(-25, 25, deltav)

# Initiate Acid
Acid = acid.Acid(velocities=velocities, linelist_path=linelist_path)
# Run ACID
result = Acid.ACID(wavelength, spectrum, error, sn, nsteps=2000)

# Plot your final profile
result.plot_profiles() # See documentation for more plot kwarg options

# You can also plot walkers, corner plots, etc. See documentation.

# If you feel the need to continue sampling, you can do so with:
# result.continue_sampling(nsteps=2000) # And plot walkers to see the difference!

Multiple frames

Multiple frames of data can be input to directly to ACID. ACID adjusts these frames and performs the continuum fit on a combined spectrum (constructed from all frames). For this reason, frames must be from the same observation night where little variation is expected in the spectral continuum. As in the previous example, we must first read in the data:

from astropy.io import fits
import glob

# finds sample files in 'example directory'. Each file is a different frame.
files = glob.glob('sample_spec_*.fits')

# create lists for wavelengths, spectra, errors and sn for all frames
wavelengths = []
spectra = []
errors = []
sns = []

for file in files:
   spec_file = fits.open('%s'%file)

   wavelengths.append(spec_file[0].data)    # Wavelengths in Angstroms
   spectra.append(spec_file[1].data)        # Spectral Flux
   errors.append(spec_file[2].data)         # Spectral Flux Errors
   sns.append(float(spec_file[3].data))     # SN of Spectrum

linelist = 'example_linelist.txt' # Insert path to line list

Once the inputs have been constructed ACID can be applied and the results plotted.

import ACID_code_v2 as acid
import numpy as np

# choose a velocity grid for the final profile(s)
deltav = acid.calc_deltav(wavelength)
velocities = np.arange(-25, 25, deltav)

# run ACID function
Acid = acid.Acid(velocities=velocities, linelist_path=linelist_path)
result = Acid.ACID(wavelengths, spectra, errors, sns, nsteps=2000)

# plot results
result.plot_profiles()

# Remember you can obtain the entire result array via:
all_frames = result[:] # Which converts it to a numpy array of shape (frames, orders, 2, pixels)
# or you can do
all_frames = result.all_frames

Multiple wavelength ranges

In this example we will only consider one frame, however this example can be combined with the previous example to apply ACID to multiple frames and orders. Firstly, we will read in the data (exactly how we did in the quickstart tutorial).

from astropy.io import fits

spec_file = fits.open('sample_spec_1.fits')

wavelength = spec_file[0].data   # Wavelengths in Angstroms
spectrum = spec_file[1].data     # Spectral Flux
error = spec_file[2].data        # Spectral Flux Errors
sn = spec_file[3].data           # SN of Spectrum

linelist = 'example_linelist.txt' # Insert path to line list

We can then loop through our desired wavelength ranges, run ACID and plot the final results. In this example we will split the wavelength ranges into 1000Å chunks. When looping over wavelength ranges we also need to provide the result array (‘all_frames’) to keep all results in the same array.

import ACID_code_v2 as acid
import numpy as np

# choose a velocity grid for the final profile(s)
deltav = acid.calc_deltav(wavelength)
velocities = np.arange(-25, 25, deltav)

# choose size of wavelength ranges (or chunks)
wave_chunk = 25
chunks_no = int(np.floor((max(wavelength)-min(wavelength))/wave_chunk))

min_wave = min(wavelength)
max_wave = min_wave+wave_chunk

# create result array of shape (no. of frames, no. of chunks, 2, no. of velocity pixels)
result = np.zeros((1, chunks_no, 2, len(velocities)))

# Initiate Acid
Acid = acid.Acid(velocities=velocities, linelist_path=linelist_path)
for i in range(chunks_no):

   # use indexing to select correct chunk of spectrum
   idx = np.logical_and(wavelength>=min_wave, wavelength<=max_wave)

   # You can recursively call ACID to fill in each order
   # In the future, this loop will be handled internally by ACID and this
   # example will be updated accordingly.
   result = Acid.ACID(wavelength[idx], spectrum[idx], error[idx], sn,
                      all_frames=result, order=i, nsteps=2000)

   min_wave += wave_chunk
   max_wave += wave_chunk

# reset min and max wavelengths
min_wave = min(wavelength)
max_wave = min_wave+wave_chunk

# plot results
result.plot_profiles()
# Reember you can save and load results via:
# result.save_result("example/multi_order_result.pkl")
# result = acid.Result.load_result("example/multi_order_result.pkl")

HARPS data

ACID can also be directly applied to HARPS data from DRS pipeline 3.5. To apply ACID in this way all files must be contained in the same directory.

If applying to ‘s1d’ files, the corresponding ‘e2ds’ files must also be contained in this directory.

If applying to ‘e2ds’ files, the corresponding blaze files must be present in this directory as indicated in the FITS header of the e2ds file.

This application only requires a filelist of the HARPS FITS files, a line list that covers the entire wavelength range and a chosen velocity range. For ‘e2ds’ spectra the resolution of the profiles are optimized when the velocity pixel size is equal to the spectral resolution, i.e. 0.82 km/s.

import glob
import numpy as np

file_type = 'e2ds'
e2ds_files = glob.glob('tests/data/*e2ds_A*.fits') # Returns list of HARPS files
linelist_path = 'example/example_linelist.txt'
save_path = 'no save'
order_range = np.arange(41, 43) # Specify which orders to run ACID on (here we do 41 and 42 as an example)

# choose a velocity grid for the final profile(s)
deltav = 0.82     # velocity pixel size for HARPS e2ds data from DRS pipeline 3.5
velocities = np.arange(-25, 25, deltav)

These inputs can be input into the HARPS function of ACID (ACID_HARPS):

import ACID_code_v2 as acid

# run ACID function
Acid = acid.Acid(velocities=velocities, linelist_path=linelist_path)

# Due to legacy behaviour, the function returns BJDs, profiles and errors separately when indexed,
# not all_frames as in other examples. All frames can still be accessed via result.all_frames

result = Acid.ACID_HARPS(filelist=e2ds_files, file_type='e2ds', save_path=save_path, nsteps=2000,
                         order_range=order_range)

# BJDs, profiles, profile_errors = result
all_frames = result.all_frames
result.plot_profiles()

ACID computes and returns the Barycentric Julian Date, average profile and errors for each frame. The average profile is computed using a weighted mean across all orders.

Multiprocessing

The default multiprocessing setting is True for ACID, which means that ACID will automatically use all available CPU cores to run the MCMC sampler in parallel. According to emcee documentation, they recommend setting the environment variable: OMP_NUM_THREADS=1. With some testing, this is also absolutely necessary for ACID to avoid massive transfer overheads. We also recommend setting the environment variable: MKL_NUM_THREADS=1 to avoid similar issues.

For unknown reasons, on most standard machines, you can set these two variables to false just before the start of multiprocessing (which ACID does), but in some environments, eg. some HPC environments, they must be set either in the terminal with:

or right at the top of the page before ALL other imports:

We simply just recommend for all users to make sure these are set correctly before ACID is run. ACID will raise an exception in SLURM environments if they are not set when multiprocessing is True, but in other environments, if they are not set, ACID will run but will be extremely slow. You have been warned!

Reminder: you can always turn off multiprocessing in ACID by setting parallel=False.

Deterministic profile in MCMC fitting

As of 1.4.0, ACID can infer the profile points at each MCMC step from the continuum parameters. This means the sampler does not fit the profile points, but instead fits only the continuum parameters and calculates the profile from the alpha matrix and the continuum model. The end result is the same, given enough steps.

Basic testing with this feature enabled shows a slight speedup in the time per iteration, but a significant reduction in the number of iterations to convergence. This is because of the cumbersome nature of fitting every profile point as a free parameter in the MCMC. Convergence takes a long time. This is most obvious when we try ACID when running until convergence (see next section).

The feature can be enabled simply by setting deterministic_profile=True when calling ACID. By default, this is set to False to maintain the same behaviour as previous versions of ACID. In general we recommend this to be set to True for most applications.

# ... same code as before to set up data and run ACID ...

Acid = acid.Acid(velocities=velocities, linelist_path=linelist)
result = Acid.ACID(wavelength, spectrum, error, sn, nsteps=2000, deterministic_profile=True)
# Now runs ACID with deterministic profile points inferred from the continuum parameters.

Running ACID Until Convergence

As of 1.4, ACID can also detect if the sampler has converged based on the computed autocorrelation time of the sampler. You can configure the following parameters by passing them to the ACID function:

check_interval : int, optional
   Interval (in steps) at which to check for MCMC convergence if max_steps is set, by default 1000.
   Only used if max_steps is set.
min_checks : int, optional
   Minimum number of checks before MCMC can be stopped, by default 3. Only used if max_steps is set.
min_tau_factor : int, optional
   Minimum tau factor for MCMC stopping criterion, by default 50. Only used if max_steps is set.
tau_tol : float, optional
   Tolerance for tau convergence in MCMC stopping criterion, by default 0.01. Only used if max_steps is set.
   If the sampler has not converged, it will print a warning. You can configure this by setting the max_steps parameter to your choosing when calling ACID.
   The sampler will then run either until convergence is reached or the maximum number of steps is reached.
# ... same code as before to set up data and run ACID ...

Acid = acid.Acid(velocities=velocities, linelist_path=linelist)
result = Acid.ACID(wavelength, spectrum, error, sn, max_steps=5000)

The above code will still have a fully working sampler, which can plot the profiles as per normal (see the Results class, or the Other functions page for possible plotting options). The sampler will give the following warning however:

The number of MCMC steps is less than 50 times the maximum autocorrelation time.
The sampler may not have converged. Consider running more steps or checking the walker plots.
The max autocorrelation time is 651.30, therefore the minimum number of steps should be roughly 32565.
Disabling burnin from autocorrelation time, instead using burnin=steps-1000

If we turn on the deterministic profile feature, we see a significant improvement in convergence:

# ... same code as before to set up data and run ACID ...

Acid = acid.Acid(velocities=velocities, linelist_path=linelist)
result = Acid.ACID(wavelength, spectrum, error, sn, max_steps=5000, deterministic_profile=True)
Iteration 1/5, last tolerance: inf>0.05, neff: 0.00<50: 100% 1000/1000 [00:03<00:00, 282.63it/s]
Iteration 2/5, last tolerance: inf>0.05, neff: 0.00<50: 100% 1000/1000 [00:03<00:00, 284.76it/s]
Iteration 3/5, last tolerance: 0.0793>0.05, neff: 26.00<50: 100% 1000/1000 [00:03<00:00, 286.68it/s]
Iteration 4/5, last tolerance: 0.0181<0.05, neff: 38.00<50: 100% 1000/1000 [00:03<00:00, 286.20it/s]
Converged at step 4000. Final tolerance: 0.0066, final effective sample size: 51.00.