To Infinity And Beyond: Hunting For Exoplanets With Machine Learning & Kepler Data

15th September 2018 By admin

Exoplanets are planets outside our solar system. Astronomers detect them by monitoring the intensity of their parent stars. A planet gives off almost no light of its own. So when an orbiting planet passes in front of its star along our line of sight, the measured intensity drops below the star's normal background level for a certain amount of time. This dip, called a transit, is how exoplanets are detected with the transit method.
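The transit idea can be illustrated with a minimal Python sketch. It assumes a hypothetical noisy light curve with box-shaped dips. Real transits have rounded edges, and real data contain gaps and stellar variability. The helper names `simulate_transits` and `find_dips` are invented for illustration. The detector simply flags samples that sit well below the star's median brightness.

```python
import numpy as np

def simulate_transits(n_samples=1000, period=200, duration=10, depth=0.01,
                      noise_std=0.001, seed=0):
    """Hypothetical toy light curve: a star of constant brightness 1.0, dimmed by
    `depth` for `duration` samples once every `period` samples, plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples)
    in_transit = (t % period) < duration
    flux = np.ones(n_samples)
    flux[in_transit] -= depth
    flux += rng.normal(0.0, noise_std, n_samples)
    return t, flux, in_transit

def find_dips(flux, n_sigma=5.0):
    """Flag samples lying more than n_sigma noise levels below the baseline brightness."""
    baseline = np.median(flux)
    # Robust noise estimate: median absolute deviation, scaled to a Gaussian sigma
    sigma = 1.4826 * np.median(np.abs(flux - baseline))
    return flux < baseline - n_sigma * sigma

t, flux, truth = simulate_transits()
flagged = find_dips(flux)
print(f"{flagged.sum()} samples flagged, {truth.sum()} actually in transit")
```

The median and the median absolute deviation are used here because the dips themselves would pull a plain mean and standard deviation toward the signal being searched for.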

Kepler is NASA's exoplanet-hunting spacecraft, named after Johannes Kepler and launched in 2009. It collects far more data than astronomers can examine by hand. Finding these intensity dips manually and proposing that a planet exists takes a great deal of time. Instead, machine learning algorithms can be used to train a model that predicts which signals are exoplanets. Finding exoplanets is very difficult because:

1) They are very far away

2) The intensity dip a planet causes is tiny compared with the intensity of its host star. A planet also produces a dip only if its orbit carries it across the face of the star as seen from Earth. As a result, many planets orbiting their stars never produce a dip at all.

However, new machine learning algorithms have made exoplanets relatively easier to find.
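Point 2 can be quantified with a standard back-of-the-envelope estimate that is not part of the original article. Ignoring limb darkening, the fractional dip equals the ratio of the planet's disk area to the star's, (Rp / Rs)^2. The sketch below assumes nominal radii in kilometres. The function name `transit_depth` is illustrative.

```python
# Nominal radii in km (IAU nominal solar radius; mean radii for Jupiter and Earth)
R_SUN_KM = 695_700.0
R_JUPITER_KM = 69_911.0
R_EARTH_KM = 6_371.0

def transit_depth(planet_radius_km, star_radius_km=R_SUN_KM):
    """Fraction of starlight blocked: planet disk area / stellar disk area (no limb darkening)."""
    return (planet_radius_km / star_radius_km) ** 2

for name, radius in [("Jupiter-size", R_JUPITER_KM), ("Earth-size", R_EARTH_KM)]:
    depth = transit_depth(radius)
    print(f"{name} planet, Sun-like star: depth = {depth:.2e} ({depth * 1e6:.0f} ppm)")
```

For a Sun-like star this gives roughly a 1% dip for a Jupiter-sized planet. For an Earth-sized planet it gives only about 84 parts per million, which is why such dips are easily lost in noise.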

Advantages Of Using ML In The Exoplanet Search

Machine learning offers flexibility because the algorithm learns from examples rather than from explicitly programmed rules. The basic idea is to teach an algorithm to classify whether an input intensity curve (a transit graph) comes from an exoplanet or not. This saves the time astronomers would otherwise spend analysing each candidate to reach a conclusion.

  • Neural network: A deep convolutional neural network is trained to decide whether the transiting object causing an intensity dip is an exoplanet or not. Applied to Kepler data, this neural network has already led to the discovery of two new exoplanets. One is part of a five-planet resonant chain around Kepler-80. The other orbits Kepler-90, a star already known to host seven planets. Light curves with known labels were used as training inputs. Each light curve is flattened, folded on the candidate's orbital period, and binned to produce a 1D vector that serves as the network's input. Three types of neural network were compared for classifying dips as exoplanets or non-exoplanets. Each type was given three input representations: a global view (the whole orbit), a local view (a close-up of the transit), and both views together.
  • Linear architecture: the baseline model is a neural network with zero hidden layers. It assumes that exoplanets and non-exoplanets are linearly separable, that is, that a linear decision surface in the input space divides the two classes.
  • Fully connected architecture: a fully connected neural network, which makes the fewest assumptions about the input.
  • Convolutional architecture: a convolutional neural network, suited to spatially structured input data such as the data used in speech synthesis and image classification.
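The folding and binning step described in the neural-network bullet can be sketched as follows. This is a simplified illustration, not the published pipeline. It makes three assumptions:

- the light curve has already been flattened;
- the orbital period and the transit time `t0` are known;
- each bin holds the median of its samples.

The function name `fold_and_bin` and the bin counts are hypothetical. A wide window gives a global view of the whole orbit, and a narrow window around the transit gives a local view.

```python
import numpy as np

def fold_and_bin(time, flux, period, t0, n_bins, half_width=0.5):
    """Phase-fold a (flattened) light curve on `period` with a transit at `t0`,
    keep phases within +/- half_width of the transit, and return the median
    flux in each of `n_bins` equal-width phase bins (NaN for empty bins)."""
    # Phase in [-0.5, 0.5), with the transit centred at phase 0
    phase = ((time - t0) / period + 0.5) % 1.0 - 0.5
    keep = np.abs(phase) <= half_width
    phase, flux = phase[keep], flux[keep]
    edges = np.linspace(-half_width, half_width, n_bins + 1)
    which = np.clip(np.digitize(phase, edges) - 1, 0, n_bins - 1)
    binned = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = flux[which == b]
        if in_bin.size:
            binned[b] = np.median(in_bin)
    return binned

# Hypothetical noise-free example: period 10 days, transit centred at day 2, lasting 0.5 days
time = np.arange(0.0, 100.0, 0.05)
phase = ((time - 2.0) / 10.0 + 0.5) % 1.0 - 0.5
flux = np.where(np.abs(phase) * 10.0 < 0.25, 0.99, 1.0)
global_view = fold_and_bin(time, flux, period=10.0, t0=2.0, n_bins=51)
local_view = fold_and_bin(time, flux, period=10.0, t0=2.0, n_bins=21, half_width=0.1)
```

Both views are fixed-length 1D vectors, regardless of how many observations the light curve contains. This is what lets them be fed to a network with a fixed input size.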

The models were implemented in TensorFlow.
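The linear baseline, a network with zero hidden layers, amounts to a single sigmoid unit over the input vector, which is logistic regression. The sketch below implements it in plain NumPy rather than TensorFlow. It trains on hypothetical toy "views": positives carry a unit-depth dip in the central bins, and negatives are pure noise. All names and settings are illustrative assumptions.

```python
import numpy as np

def make_toy_views(n_per_class, n_bins=31, dip_width=5, noise_std=0.3, seed=0):
    """Hypothetical binned views: positives have a unit-depth dip in the central bins,
    negatives are noise only. Labels: 1 = exoplanet, 0 = not."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, noise_std, size=(2 * n_per_class, n_bins))
    lo = n_bins // 2 - dip_width // 2
    X[:n_per_class, lo:lo + dip_width] -= 1.0
    y = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])
    return X, y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_linear_baseline(X, y, lr=0.5, epochs=500):
    """Zero-hidden-layer network: one sigmoid output unit on the raw input vector,
    fitted by full-batch gradient descent on the mean binary cross-entropy."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        err = p - y  # gradient of the loss with respect to the pre-sigmoid output
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def predict(X, w, b):
    return (sigmoid(X @ w + b) >= 0.5).astype(int)

X_train, y_train = make_toy_views(200, seed=0)
X_test, y_test = make_toy_views(200, seed=1)
w, b = train_linear_baseline(X_train, y_train)
print("test accuracy:", (predict(X_test, w, b) == y_test).mean())
```

In this toy data the two classes differ along a single direction (the dip), so a linear decision surface is enough. Real light curves are not this clean, which is the motivation for the fully connected and convolutional architectures.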

  • SVM: A Support Vector Machine (SVM) can also be trained to recognise exoplanet signals. Examples of exoplanet signals are much scarcer than non-exoplanet examples, so the training set is first balanced. Instead of making exact copies of the few positive examples, new examples are generated as modified versions of the existing ones. This gives enough positive as well as negative training examples. Keeping the positive and negative examples equal in number helps the algorithm perform better on new data it has not seen before. An exoplanet signal is a change in light intensity over time, so it can be thought of as a range of different frequencies mixed together. The task is to learn which frequencies belong to exoplanets and which belong to non-exoplanets. The Fourier transform is used to decompose the signal into these frequencies. It converts intensity over time into intensity over frequency, which gives the algorithm more distinctive, spike-like features to train on.
  • LPP and k-nearest neighbours: Locality Preserving Projections (LPP) dimensionality reduction, combined with k-nearest neighbours, is another way to determine whether a given signal comes from a transiting exoplanet. The light curve is first folded and binned. Machine learning techniques that draw on our knowledge of what a transit signal looks like in Kepler data are then applied to calculate the LPP transit metric. This approach is more reliable and less time-consuming than manual analysis. LPP preserves only nearest-neighbour relationships. It constructs a symmetric weight matrix whose entry is 1 when two data points are connected as neighbours and 0 when they are not.
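The Fourier-transform preprocessing and the class balancing from the SVM bullet can be sketched in NumPy. This illustration makes three assumptions:

- the light curve is hypothetical;
- augmentation adds fresh Gaussian noise to existing positive examples, which is one simple way to create modified versions rather than copies;
- the helper names are invented.

The SVM itself is not shown. The resulting feature vectors could be passed to an off-the-shelf classifier such as scikit-learn's `sklearn.svm.SVC`.

```python
import numpy as np

def fourier_features(flux):
    """Magnitude spectrum of the mean-subtracted light curve, DC term dropped.
    Entry k-1 corresponds to frequency bin k, i.e. k cycles over the whole series."""
    spectrum = np.fft.rfft(flux - flux.mean())
    return np.abs(spectrum)[1:]

def balance_by_augmentation(positives, n_target, noise_std, seed=0):
    """Grow the minority (positive) class to n_target rows by adding modified
    versions of existing light curves (fresh Gaussian noise), not exact copies."""
    rng = np.random.default_rng(seed)
    rows = list(positives)
    i = 0
    while len(rows) < n_target:
        base = positives[i % len(positives)]
        rows.append(base + rng.normal(0.0, noise_std, base.shape))
        i += 1
    return np.array(rows)

# Hypothetical light curve: 256 samples, a dip of depth 0.01 lasting 8 samples every 32 samples
rng = np.random.default_rng(1)
t = np.arange(256)
flux = 1.0 - 0.01 * ((t % 32) < 8) + rng.normal(0.0, 0.001, t.size)
features = fourier_features(flux)
print("strongest frequency bin:", np.argmax(features) + 1)  # 8, i.e. 256 / 32
```

A dip repeating every 32 samples in a 256-sample series shows up as a spike at frequency bin 256 / 32 = 8, with weaker spikes at its harmonics. Shifting the transits in time changes only the phase of the Fourier coefficients, not their magnitude. These features therefore do not depend on when the first transit occurred.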
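The LPP plus k-nearest-neighbours approach can be sketched as follows. It assumes hypothetical toy binned light curves and uses the textbook LPP formulation, a generalized eigenproblem built from the graph Laplacian. This is not the published LPP transit metric pipeline. The function names, the neighbour count k = 5 and the two output dimensions are illustrative. `knn_adjacency` builds the symmetric 0/1 weight matrix described above.

```python
import numpy as np

def make_toy_views(n_per_class, n_bins=31, dip_width=5, noise_std=0.3, seed=0):
    """Hypothetical binned light curves: positives (label 1) have a unit-depth central dip."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, noise_std, size=(2 * n_per_class, n_bins))
    lo = n_bins // 2 - dip_width // 2
    X[:n_per_class, lo:lo + dip_width] -= 1.0
    y = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])
    return X, y

def knn_adjacency(X, k):
    """Symmetric 0/1 weights: W[i, j] = 1 if j is among i's k nearest neighbours or vice versa."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros_like(d2)
    W[np.repeat(np.arange(len(X)), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)

def lpp(X, k=5, n_components=2, ridge=1e-6):
    """Locality Preserving Projections: solve X^T L X a = lam X^T D X a (rows of X are
    samples) and keep the eigenvectors with the smallest eigenvalues."""
    W = knn_adjacency(X, k)
    D = np.diag(W.sum(axis=1))
    L = D - W  # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + ridge * np.eye(X.shape[1])  # small ridge keeps B invertible
    # Reduce the generalized problem to a standard symmetric one via B^(-1/2)
    evals, evecs = np.linalg.eigh(B)
    B_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    _, vecs = np.linalg.eigh(B_inv_sqrt @ A @ B_inv_sqrt)  # eigenvalues in ascending order
    return B_inv_sqrt @ vecs[:, :n_components]  # projection matrix (n_features, n_components)

def knn_predict(Z_train, y_train, Z_query, k=5):
    """Majority vote among the k nearest training points in the projected space."""
    d2 = ((Z_query[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

X_train, y_train = make_toy_views(100, seed=0)
X_test, y_test = make_toy_views(100, seed=1)
P = lpp(X_train)
pred = knn_predict(X_train @ P, y_train, X_test @ P)
print("test accuracy:", (pred == y_test).mean())
```

Choosing an odd k avoids tied votes between the two classes.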

Algorithms are also used to map the temperature, composition and cloudiness of exoplanets from the transit itself. An exoplanet's atmosphere filters the light from its host star, so the resulting absorption spectrum gives clues about the planet's atmosphere, temperature and composition. The machine learning algorithm is trained on thousands of complicated spectra. A random forest is trained on a precomputed grid of atmospheric models, which gives the distribution of certain molecules and of atmospheric clouds. The researchers used a precomputed grid so that a large part of the computational burden is shifted offline. The model was tested on a transmission spectrum of the exoplanet WASP-12b. The algorithm places each spectrum as a point in an N-dimensional space, where N is the number of wavelength bins in the spectrum. It then identifies clusters in this multidimensional space. Atmospheres belonging to the same cluster are similar. So, given a real observed spectrum, the algorithm assigns it the physical attributes of the nearest cluster.
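A full random forest is too long for a short example, so the sketch below swaps in the simpler idea the paragraph describes, in three steps:

1. Treat each spectrum as a point in N-dimensional space, with one coordinate per wavelength bin.
2. Precompute a grid of model spectra offline.
3. Give an observed spectrum the parameters of its nearest grid model.

The forward model `toy_transmission_spectrum`, its wavelength bins and its parameter ranges are entirely hypothetical. They are not a real radiative-transfer model.

```python
import numpy as np

# Hypothetical wavelength bins in microns (N = 13 bins); not a real instrument setup
WAVELENGTHS = np.linspace(1.1, 1.7, 13)

def toy_transmission_spectrum(temperature, abundance):
    """Hypothetical toy forward model (NOT real radiative transfer): transit depth in
    percent = flat baseline + temperature-dependent slope + one Gaussian absorption
    band centred at 1.4 microns whose strength scales with `abundance`."""
    baseline = 1.4
    slope = 1e-4 * temperature * (WAVELENGTHS - 1.4)
    band = abundance * np.exp(-0.5 * ((WAVELENGTHS - 1.4) / 0.05) ** 2)
    return baseline + slope + band

def build_grid(temperatures, abundances):
    """Precompute (offline) one model spectrum per parameter combination."""
    params = np.array([(T, a) for T in temperatures for a in abundances], dtype=float)
    spectra = np.array([toy_transmission_spectrum(T, a) for T, a in params])
    return params, spectra

def nearest_grid_match(observed, params, spectra):
    """Treat each spectrum as a point in N-dimensional space and return the
    parameters of the grid spectrum closest (Euclidean distance) to `observed`."""
    distances = np.linalg.norm(spectra - observed, axis=1)
    return params[np.argmin(distances)]

params, spectra = build_grid([1000, 1500, 2000, 2500, 3000], [0.0, 0.02, 0.05, 0.1])
rng = np.random.default_rng(0)
observed = toy_transmission_spectrum(2000, 0.05) + rng.normal(0.0, 0.001, WAVELENGTHS.size)
T_fit, a_fit = nearest_grid_match(observed, params, spectra)
print(f"nearest grid model: T = {T_fit:.0f} K, abundance = {a_fit}")
```

All the expensive work (building `spectra`) happens once, before any observation arrives. Matching a new spectrum is then only a distance computation, which mirrors the reason given for using a precomputed grid.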

Research is also under way to detect the Milky Way's Hot Jupiters through machine learning. The computer is taught what the phase curves (light curves with sinusoidal variation) of exoplanets and non-exoplanets look like. This is done by giving it many examples, each labelled with the category it belongs to. Specific features of the light curves, such as the amplitude and period of the planet's signal, are also used in training. The labelled data can be partitioned into training and test sets. The training set is used to train a classifier to distinguish exoplanets from non-exoplanets, and the test set is later used to check how well the classifier works. Phase curves can also be used to study the planets' atmospheres, and they make it possible to discover non-transiting Hot Jupiters.
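The two light-curve features mentioned above, the period and amplitude of the sinusoidal signal, can be estimated by least-squares fitting of a sinusoid over a grid of trial periods. Labelled examples can then be shuffled and partitioned into training and test sets. This is a minimal sketch under assumptions: the phase curve is hypothetical, the helper names are invented, and the choice of classifier is left open.

```python
import numpy as np

def phase_curve_features(t, flux, trial_periods):
    """Fit flux ~ c0 + a*sin(2*pi*t/P) + b*cos(2*pi*t/P) by least squares for each
    trial period P; return (best period, amplitude sqrt(a^2 + b^2))."""
    best_period, best_amp, best_rss = None, 0.0, np.inf
    for period in trial_periods:
        arg = 2.0 * np.pi * t / period
        design = np.column_stack([np.sin(arg), np.cos(arg), np.ones_like(t)])
        coef, *_ = np.linalg.lstsq(design, flux, rcond=None)
        rss = np.sum((flux - design @ coef) ** 2)
        if rss < best_rss:
            best_period, best_amp, best_rss = period, np.hypot(coef[0], coef[1]), rss
    return best_period, best_amp

def partition(X, y, test_fraction=0.25, seed=0):
    """Shuffle labelled examples and split them into training and test sets."""
    idx = np.random.default_rng(seed).permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], y[train], X[test], y[test]

# Hypothetical phase curve: period 1.3 days, amplitude 0.002, small Gaussian noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 500)
flux = 1.0 + 0.002 * np.sin(2 * np.pi * t / 1.3 + 0.4) + rng.normal(0.0, 0.0002, t.size)
period, amplitude = phase_curve_features(t, flux, np.linspace(0.5, 3.0, 251))
print(f"period = {period:.2f} d, amplitude = {amplitude:.4f}")
```

Each light curve would be reduced to a feature row such as `[period, amplitude]` before `partition` is applied. Keeping the test rows out of training is what makes the later check of the classifier meaningful.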

Future Exoplanets

As in many other astronomical applications, machine learning can be applied successfully to Kepler data. It detects exoplanets in far less time than the manual process astronomers go through.


Courtesy: AIM
