Data-X Students assist Breakthrough Listen at SETI in search for extraterrestrial intelligence

Introduced by Tin Hang, Anshul Jain, Pushya Mitra, Diego Silva, Arundhishaan

Does life exist beyond Earth? To answer this question, scientists at the Breakthrough Foundation use radio telescopes to collect radio data from planets in space. They believe these planets might be leaking signals just as Earth does, so they study the data for anything interesting. The researchers have come across a few candidates, but these usually turn out to be interference from a nearby WiFi or radio source. Because the volume of data is huge, it is extremely difficult to keep track of every signal they capture.

Searching E-T is a project that we started with the Berkeley SETI Research Center under the guidance of Steve Croft, a researcher in the astronomy department and an outreach specialist for Breakthrough Listen, an open initiative to find extraterrestrial intelligence. Data from the Automated Planet Finder and the Green Bank Telescope flow into Breakthrough Listen’s public archive.

Berkeley SETI has started many programs through which researchers from across the world can use the data and be part of this $100M-funded initiative. Large chunks of signal data are gathered using these telescopes, and each signal needs to be classified as terrestrial or non-terrestrial. We are in the process of identifying terrestrial signals and clustering them. If we can account for all the known signals, then whatever remains is a candidate for an unknown signal from space.

Signal classification is an active area of research that touches on modulation and demodulation techniques, IoT applications, satellite communication, and more, and data science is now widely used for it. As part of SCET’s Data-X class, we started by identifying FM stations in the Bay Area. There are 46 FM stations in total around the Bay Area, of which 40 were correctly identified during testing.

We took several approaches to identifying FM stations. We used an RTL-SDR dongle, which has a frequency range of 60 MHz to 2400 MHz, to capture our signals. Initially, we captured the data and converted it into a Pandas dataframe with three columns for training: frequency, power and Is_FM. We then fed this data to various classification models, including those provided by scikit-learn, and observed that the XGBoost and KNN models gave the best accuracy. However, because XGBoost takes a long time to both train and test, we chose the KNN model. For validation, we recorded another CSV using the RTL-SDR, converted the raw CSV into usable data with a function similar to the one used for training, fed that data to the model, and got an accuracy score of around 88%.
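The nearest-neighbour idea behind the KNN step can be sketched in a few lines. This is a minimal illustration, not the project's code: it uses only the Python standard library instead of scikit-learn, the function name `knn_predict` is hypothetical, and the readings are made up. Each row mirrors the three columns described above (frequency, power, Is_FM).

```python
from collections import Counter
import math


def knn_predict(train_rows, query, k=3):
    """Label `query` (frequency_mhz, power_db) by majority vote of its k nearest rows."""
    # Each training row is (frequency_mhz, power_db, is_fm).
    by_distance = sorted(
        train_rows,
        key=lambda row: math.dist((row[0], row[1]), query),
    )
    votes = Counter(row[2] for row in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up readings for illustration only; not measurements from the project.
train_rows = [
    (88.5, -20.0, 1),
    (94.1, -18.5, 1),
    (102.1, -22.0, 1),
    (106.1, -19.0, 1),
    (90.0, -55.0, 0),
    (97.0, -58.0, 0),
    (433.0, -50.0, 0),
    (915.0, -52.0, 0),
]

queries = [(98.1, -21.0), (700.0, -51.0)]
predictions = [knn_predict(train_rows, q) for q in queries]
print(predictions)  # [1, 0]
```

The sketch leaves the features unscaled, so frequency differences dominate the distance. With real data, a library implementation such as scikit-learn's `KNeighborsClassifier`, combined with feature scaling, would be the more usual choice.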

The first part of our project was to train a classification model that can identify radio stations across the entire range of frequencies that GQRX can capture, i.e. from 60 MHz to 2400 MHz, and we achieved 75% accuracy. Our model flagged 51 frequencies as FM stations, of which 40 were actual FM stations.
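The station counts quoted in this article can also be read as precision and recall. The short calculation below is a sketch under one assumption: that the 46 Bay Area stations, the 51 flagged frequencies and the 40 correct hits all refer to the same test. The article does not say how its 75% and 88% accuracy figures were computed, so these derived numbers are not a substitute for them.

```python
# Counts taken from the article; assumed to describe the same test run.
total_fm_stations = 46      # FM stations around the Bay Area
predicted_fm = 51           # frequencies the model flagged as FM
correctly_identified = 40   # flagged frequencies that were real stations

precision = correctly_identified / predicted_fm       # share of flags that were right
recall = correctly_identified / total_fm_stations     # share of stations that were found
false_positives = predicted_fm - correctly_identified
missed_stations = total_fm_stations - correctly_identified

print(f"precision = {precision:.3f}")   # precision = 0.784
print(f"recall    = {recall:.3f}")      # recall    = 0.870
print(false_positives, missed_stations)  # 11 6
```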

A second approach to finding FM stations is to collect the I/Q data through the RTL-SDR, pre-process the signal directly, and run a neural net to identify the FM stations in the specified frequency range. The receiver delivers the signal as I/Q (in-phase and quadrature) samples, a representation that preserves both the amplitude and the phase of the carrier and therefore the information modulated onto it. More capable tools such as the HackRF One can be used to capture signals like WiFi, even above the 1 GHz range.
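To make the I/Q representation concrete, here is a small sketch of two common pre-processing steps: estimating the instantaneous frequency from the phase change between consecutive samples (the basic idea behind FM demodulation) and computing mean power in dB. It is an illustration under assumptions, not the pre-processing used in the project: the function names are hypothetical, and the input is a synthetic tone rather than captured data.

```python
import cmath
import math


def instantaneous_frequency(iq_samples, sample_rate_hz):
    """Estimate the frequency offset (Hz) between each pair of consecutive I/Q samples."""
    freqs = []
    for prev, curr in zip(iq_samples, iq_samples[1:]):
        # Phase advance from one sample to the next, in radians.
        phase_step = cmath.phase(curr * prev.conjugate())
        freqs.append(phase_step * sample_rate_hz / (2 * math.pi))
    return freqs


def mean_power_db(iq_samples):
    """Mean power of the samples, in dB relative to unit amplitude."""
    mean_power = sum(abs(s) ** 2 for s in iq_samples) / len(iq_samples)
    return 10 * math.log10(mean_power)


# Synthetic unit-amplitude tone, 5 kHz above the tuned centre frequency.
sample_rate_hz = 48_000
tone_hz = 5_000
iq = [cmath.exp(2j * math.pi * tone_hz * n / sample_rate_hz) for n in range(64)]

freqs = instantaneous_frequency(iq, sample_rate_hz)
power_db = mean_power_db(iq)
print(round(freqs[0]), round(power_db, 6))
```

For a real FM broadcast the instantaneous frequency varies with the audio, so the output of `instantaneous_frequency` is, up to scaling, the demodulated signal.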

The second part of our project was to classify, in real time, whether the FM stations we found are playing music. We separated the data into four classes: music, noise, talking and no sound. We pre-trained the model on music data comprising 4800 samples, and used our interface to control the GQRX TCP server so that it generated a 1-second .wav file at each FM station we found. Using a neural net built with the TensorFlow API, we were able to classify whether an FM station is playing music and return a list with the appropriate data.
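The article does not list the commands sent to GQRX, so the following is only a sketch of how a one-second clip might be requested over its remote-control TCP interface. It assumes Gqrx is running with DSP started and remote control enabled on the default port 7356, and it relies on the `F` (set frequency), `AOS` (start audio recording) and `LOS` (stop audio recording) commands from the Gqrx remote-control documentation. The function names, the settle delay and the reply handling are hypothetical.

```python
import socket
import time

GQRX_HOST = "127.0.0.1"
GQRX_PORT = 7356  # assumed: Gqrx's default remote-control port


def recording_commands(frequency_hz):
    """Commands to tune to a station, then start and stop audio recording."""
    return [f"F {int(round(frequency_hz))}", "AOS", "LOS"]


def record_clip(frequency_hz, seconds=1.0, host=GQRX_HOST, port=GQRX_PORT):
    """Tune Gqrx and record roughly `seconds` of audio; returns the raw replies."""
    tune, start, stop = recording_commands(frequency_hz)
    replies = []
    with socket.create_connection((host, port), timeout=5) as conn:
        # The pause after tuning (0.5 s) is a guess to let the receiver settle.
        for command, pause in ((tune, 0.5), (start, seconds), (stop, 0.0)):
            conn.sendall((command + "\n").encode("ascii"))
            replies.append(conn.recv(1024).decode("ascii").strip())
            time.sleep(pause)
    return replies


# Building the command list needs no running Gqrx instance.
commands = recording_commands(94.1e6)
print(commands)  # ['F 94100000', 'AOS', 'LOS']
```

Calling `record_clip(94.1e6)` against a live Gqrx instance would leave a .wav file in Gqrx's configured recording directory, which could then be passed to the classifier.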

Much can still be done, such as identifying other signals like WiFi and GPS and capturing a wider spectrum. We can also work on identifying the genre of music while it is being played on an FM station. Understanding all of the known signals paves the way to finding the unknown, and the search continues.

Taking the project further: Future Developments

  • Capturing signals across a larger frequency range, such as WiFi, GPS, and other terrestrial signals, using more capable hardware like the HackRF One, then identifying and clustering these signals.
  • We distinguished music from non-music on FM stations and sought to understand the characteristics of these signals. This work has other applications, such as car radios and apps, and could be improved by identifying the genres of music played on each FM station.
  • Current research is focused more on modulation and demodulation techniques, IoT applications, satellite communication, etc.