How NASA's Mission to Pluto Almost Vanished
The untold story of the New Horizons probe
On the morning of Saturday, July 4, 2015, Alan Stern, head of NASA's New Horizons mission to Pluto, was working in his office near the mission control center when his phone rang. He knew it was Independence Day, but he was far more interested in the fact that the day marked "flyby minus ten days". The New Horizons spacecraft, which had been the central focus of his career for the past 14 years, was only 10 days away from its goal: an encounter with the most distant [dwarf] planet ever explored.
Alan was immersed in work that day, preparing for the flyby. He was used to getting little sleep during this final phase of the mission, but that day he had gotten up in the middle of the night and gone to the Mission Operations Center (MOC) for the upload of a large and critically important set of computer instructions that would control the spacecraft during the flyby. This command set was the product of ten years of hard work, and that morning it was sent by radio at the speed of light, racing to catch up with New Horizons as the ship approached Pluto.
Alice Bowman, the Mission Operations Manager, or MOM, at work in the operations center; everyone there calls her MOM. The photo was taken during the ship's final wake-up from hibernation on December 6, 2014. Bowman said: "I was probably asking about configuration changes or about the telemetry I was watching on the monitors."
Alan glanced at the ringing phone and was surprised to see that the caller was Glen Fountain, the longtime project manager of New Horizons. A chill ran down his spine, because he knew Glen was taking time off at his home nearby before the last and most intense phase of the mission, the flyby. Why would Glen be calling him?
Alan picked up the phone. "Glen, what's happened?" "We've lost contact with the ship." Alan replied: "I'll meet you at the operations center in five minutes." He hung up and sat at his desk in shock for several seconds, shaking his head in disbelief. No spacecraft should suddenly lose contact with Earth. It had never happened to New Horizons before, in all nine years of the flight from Earth to Pluto. How could it be happening now, just 10 days out from Pluto?
* * *
For the nine long years of the journey to the ninth planet, radio communication with New Horizons was the lifeline that allowed the team to contact the ship, control it, and receive its status and observation data. The farther the ship traveled toward the outskirts of the Solar System, the longer the communication delay became, growing to the nine hours that a radio signal, moving at the speed of light, needed to make the round trip.
To stay in touch, New Horizons, like all spacecraft on distant frontiers, relies on a little-known and unsung marvel of planetary exploration: NASA's Deep Space Network (DSN). This is a trio of giant dish antennas located in Goldstone (California), Madrid (Spain), and Canberra (Australia). As the Earth rotates on its axis, they hand off to one another, without a break, the job of relaying messages. The three stations are spread around the planet so that, wherever a ship is in deep space, at least one of the antennas can be pointed at it at any time.
But at that moment the DSN had lost contact with one of its most valuable assets, New Horizons.
Had this been an orbiter, or a rover sitting safely on the surface of an alien planet, the team could have taken its time analyzing the problem, making recommendations, and trying out various courses of action. But New Horizons was a flyby mission. The ship was rushing toward Pluto at 1.2 million km per day, or about 50,000 km per hour. Whether or not it was restored to working order, it would fly past the planet on July 14 and never return. It was impossible to stop the ship and sort out the problem. There was only one chance to gather data about Pluto: the ship had no backup, no second chance, and no way to postpone its encounter with the dwarf planet.
* * *
Earlier that July 4, Alice Bowman, the extremely competent and cool-headed leader of mission operations, a veteran with 14 years of experience as Mission Operations Manager (hence the nickname MOM), was in the operations center with a small operations crew, waiting for the report from New Horizons that would confirm the new instructions had been received and stored. Following this long command sequence, the ship was to make hundreds of scientific observations over the nine days when it would be closest to Pluto. This thoroughly tested command sequence was central to the entire mission: it would govern every turn and maneuver of the ship, every write to memory, every data transmission to Earth, every camera exposure, and so on.
At about one o'clock in the afternoon, right on schedule, the first signals began to arrive confirming receipt of the command sequence. Alice:
Everything was going fine until 13:55. Suddenly we lost contact with the ship completely. Silence. Nothing. The link was gone, and it was not coming back.
Nine times out of ten, when a signal is lost, the problem is at the ground station: something is misconfigured, or something like that. Because this upload was so important, we had all the network operations engineers on the line. We call them by the acronym NOPE [Network OPerations Engineers]. We also have the Pluto Aces, the supervisors from our operations center. So we asked the Pluto Aces to have the NOPEs working in Australia check the system configuration. All the checks showed that everything in the ground system was normal.
This meant the problem was not on Earth: not in Maryland, where Alice and her team of Pluto Aces had gathered, and not in Australia, where the NOPEs were gathered at the Deep Space Network station in Canberra that had been receiving the signal from New Horizons. The loss of signal meant a problem with the ship itself.
Loss of signal is one of the most unsettling problems a mission control team can face. It means that the connection with Earth has been lost. But that is not the worst of it: such a problem may mean that the spacecraft has suffered a catastrophic failure. Alice felt a stab of fear she had never known before:
You know that feeling in your stomach when something happens and you can't believe it? Nine and a half years had passed since the start of our journey, and I couldn't believe what had happened; we had never lost contact before. For 5 to 10 seconds you can allow yourself to feel the fear and disbelief, but then everything we learned in training kicks in.
New Horizons was still millions of kilometers from Pluto and from any hazards it might pose. The chances of colliding with anything in interplanetary space were absurdly small. Even so, a terrible thought occurred to the whole team: could the ship have hit something?
* * *
The team had the telemetry received from the ship before the signal stopped, and Chris Hersman, head of the spacecraft engineering team, along with his engineers, who kept arriving at the center, had already developed some working theories. They quickly found that shortly before contact was lost, the ship's main computer had been running two tasks at once, both of them quite demanding. One task was compressing 63 previously taken images of Pluto to free up memory for the new images to be taken during the approach. At the same time, the computer was receiving the upload from Earth and storing it in memory. Could the computer have been overloaded by these tasks and rebooted?
That was Brian Bauer's theory. He was the autonomy systems engineer who had programmed the recovery procedure the ship was to carry out in such a situation. Brian told Alice: "If that's what happened, the ship will reboot onto the backup computer, and in 60 to 90 minutes we will receive a signal that it is running on the backup computer."
Someone once described the First World War as "months of boredom punctuated by moments of terror." The same could be said of long space missions. It was a long and terrifying wait, hoping for a signal from the ship. The engineers, the Aces, and Alice, Glen, and Alan waited out those long minutes, preparing plans in case Brian's hypothesis proved wrong. But sure enough, after 90 minutes a signal arrived from the ship reporting that it had switched to the backup computer.
Communication had been restored, and the fear of a catastrophic loss of the ship evaporated. But the crisis was not over; it had simply entered a new phase.
The operations center and the adjoining rooms quickly filled with engineers, members of the flight control team, and other project members who had cut short their holiday weekend to come and help. People arrived in picnic clothes, wearing shorts and flip-flops; they had dropped everything and hurried to the operations center.
As more and more telemetry came in from their bird, they discovered that all the files containing the command sequence for the flyby, which had been uploaded to the main computer, were erased when the ship rebooted onto the backup computer. This meant that the entire command sequence sent that morning had to be sent again. Worse, many supporting files needed to run the main command sequence also had to be re-sent, some of which had been loaded onto the ship back in December. Alice recalls: "We had never had to recover from an anomaly like this. The question was whether we had enough time to do it before the flyby sequence was supposed to start on July 7."
Scenes like this could be seen in the science room every time new images of Pluto came in.
This meant the team had only three days to put Humpty Dumpty back together again, and it had to be done from a distance of 5 billion km. If they failed, then with every passing day they would lose dozens of carefully planned opportunities to observe Pluto at close range. The team suddenly found itself having to rebuild in three days what had taken years to plan and months to upload.
Returning the ship to working order after any anomaly relies on formal meetings of the Anomaly Review Board (ARB). Shortly after 4 pm, just 45 minutes after communication with the ship was restored, the first ARB meeting took place in the conference room next to the operations center.
At this first meeting, the team members needed to assess what had happened, work out how to restore the flyby plan, and make sure that nothing they did during the recovery could trigger another problem. The extent to which the switch to the backup computer had set them back was daunting. They quickly realized that they would have to redo several weeks of work in just three days to start the flyby sequence on July 7. And all of it had to be done without a single mistake.
Worse, every command had to be carried out remotely, with the signal taking nine hours to make the round trip. In school we are taught how fast light is: a signal at that speed can circle the Earth in an eighth of a second and travel to the Moon and back in just two and a half seconds. But for the New Horizons team, trying to restore the ship as it closed in on Pluto, the vast distance from Earth made the speed of light painfully slow.
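The delays quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the 5-billion-km distance is the rounded figure given in this article, the Earth circumference and Earth-Moon distance are standard approximate values, and the function and variable names are invented for the example.

```python
# Rough light-travel-time arithmetic (illustrative; inputs are rounded values).
C_KM_PER_S = 299_792.458          # speed of light in vacuum, km/s


def light_time_s(distance_km: float) -> float:
    """Seconds a radio signal needs to cover distance_km one way."""
    return distance_km / C_KM_PER_S


EARTH_CIRCUMFERENCE_KM = 40_075   # approximate equatorial circumference
EARTH_MOON_KM = 384_400           # approximate mean Earth-Moon distance
EARTH_TO_SHIP_KM = 5e9            # rounded distance quoted in this article

around_earth_s = light_time_s(EARTH_CIRCUMFERENCE_KM)
moon_round_trip_s = 2 * light_time_s(EARTH_MOON_KM)
ship_round_trip_h = 2 * light_time_s(EARTH_TO_SHIP_KM) / 3600

print(f"Around the Earth:      {around_earth_s:.2f} s")
print(f"Moon and back:         {moon_round_trip_s:.2f} s")
print(f"New Horizons and back: {ship_round_trip_h:.1f} h")
```

With these rounded inputs the round trip to the ship comes out a little over nine hours, consistent with the figure the team worked with.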
* * *
The people gathering for the board meeting knew that, with all the media attention, the whole world would soon learn that New Horizons had stumbled almost at the end of its journey. In just 10 days the spacecraft would fly through the Pluto system, since nothing can stop celestial mechanics, but whether it would gather all the data it had traveled almost ten years to collect was another question entirely.
Alan and Glen opened the meeting by telling the participants that they had never seen a better team than the one working on New Horizons, and that if any team could bring the ship back, it was the people gathered in that room. Then Alice took the floor and began to lay out a recovery plan.
She immediately spelled out for Alan exactly which scientific observations would be lost that day and over the next three days, before the flyby command sequence was due to start on July 7. She needed to know whether the mission leader wanted the team to try to recover those observations as well, on top of reconfiguring the ship and uploading all the files and commands needed for the flyby. Alan:
I did not discuss this question with the scientists from our team who were in the room. I did not even give the floor to Leslie, my chief flyby planner. I knew that Alice needed clear direction, with no ambiguity, and that the team needed to concentrate on the main event rather than on the preliminary observations we were losing while the spacecraft was down after the reboot. I told Alice that they should not be distracted by anything other than getting the spacecraft back to working order and starting the flyby sequence.
Alice had her direction. Her only job was to save the main flyby command sequence; everything else could be sacrificed. But could it be done in time?
Alice and her team quickly but carefully drew up a recovery plan. Over the next three days, they had to develop procedures to return the spacecraft to operation on the main computer, re-send all the lost commands and supporting files for the main command sequence, and test all of this on NHOPS (the New Horizons Operations Simulator) before sending anything, to make sure every action worked the first time; there was no room for do-overs. They knew the main command sequence was scheduled to start at noon on July 7. So Alice's team calculated the time remaining until then and divided it into nine-hour segments, the round-trip travel time of the signal. In nine hours, one set of procedures could be sent to the ship and confirmation of its successful execution received. Taking into account everything else that had to be done on the ground, they found they would have time for only three such communication cycles before the main command sequence started on July 7.
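The scheduling constraint can be illustrated with a rough calculation. This is a sketch under assumptions, not the team's actual planning: the start time is taken as about 4 pm on July 4 (when the first review meeting began), the deadline as noon on July 7, time zones are ignored, and all variable names are invented here.

```python
from datetime import datetime

ROUND_TRIP_H = 9                         # signal round-trip time from the text
start = datetime(2015, 7, 4, 16, 0)      # assumed: first review meeting, ~4 pm
deadline = datetime(2015, 7, 7, 12, 0)   # main command sequence start, noon

# Hours available between the start of planning and the deadline.
available_h = (deadline - start).total_seconds() / 3600

# Upper bound: round trips that would fit if no ground work were needed.
max_back_to_back = int(available_h // ROUND_TRIP_H)

# The text says only three cycles fit; the remainder of each cycle is
# ground time (writing, simulating, reviewing), averaged over the cycles.
planned_cycles = 3
ground_h_per_cycle = available_h / planned_cycles - ROUND_TRIP_H

print(f"Hours available:            {available_h:.0f}")
print(f"Round trips, no ground work: {max_back_to_back}")
print(f"Ground hours per cycle:      {ground_h_per_cycle:.1f}")
```

Under these assumptions, about seven round trips would fit if nothing else had to happen. Fitting only three suggests that writing, simulating, and reviewing each batch of commands took more time than the signal travel itself.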
The recovery therefore had to be divided into three steps. First, the team would command the ship to return from emergency mode to normal communication mode. This would increase the data rate 100-fold and make it possible to complete everything else on time. They estimated that this first step alone, programming the procedure, testing it, sending it to the ship, and receiving confirmation of success, would take half a day. Tick-tock.
Next, the team would command the ship to reboot onto the main computer, which was needed to run the main flyby sequence. They had never had to perform such a reboot in flight. The procedure therefore had to be developed, programmed, tested on NHOPS, and its test results carefully examined before it was sent to the ship. Finally, the team would have to methodically restore all the supporting files and start the flyby sequence. This plan was not finished until midnight, and there was no time to lose: almost 10 hours had passed since the signal was lost. Tick-tock.
Alice's team, working closely with the spacecraft systems team led by Chris Hersman, wrote, tested, and sent the first set of commands at about 3:15 am on July 5, roughly 12 hours after communication with the ship had been reestablished.
Nine hours later, on July 5, the operations center received confirmation that normal communication mode had been restored. By then, however, a day had passed, and the ship had traveled another one and a half million kilometers toward its target, Pluto. The first recovery step was complete, but only two days remained before the main command sequence was due to start. Tick-tock.
* * *
The New Horizons team organized its work and life over the next few days around the nine-hour communication cycles with the spacecraft. They worked on very little sleep and a great deal of adrenaline. They had worked together for more than ten years and had dealt with spacecraft problems before, but never with problems of this magnitude when the cost of a mistake was so high. The situation demanded a round-the-clock presence in the control center, and the team did whatever was needed to provide it.
Glen recalls: "The team just did what was required of it. I started looking for places for people to sleep, something more comfortable than the floor of their office." Alice recalls: "We found cots, blankets, and pillows, and someone brought inflatable mattresses. There were not enough of them, so we took turns." Alan:
It was something to see. Without a single complaint, people worked day and night, without a change of clothes, without a comfortable place to sleep or a shower, in some cases for four days in a row. Some people slept on tables. Some slept two or three hours a day. There was no time to eat in the cafeteria. Some people did nothing but bring in food and feed everyone else.
To make sure every step would work correctly, each procedure simply had to be tested on NHOPS. Because this system faithfully emulated the spacecraft, every command could be run on it to find errors and to make sure that nothing sent to the ship contained mistakes.
As it turned out, a decision made many years earlier saved the day during the recovery. Alan had once worried that the team had no backup for NHOPS, and had ordered a duplicate to be built. Over the July 4 weekend it became clear that a single simulator would simply not allow enough time to test all the new procedures. So the second NHOPS was pressed into service alongside the first, doubling the number of test runs. Without the second simulator, the recovery would have taken several days longer, and a huge amount of unique scientific data would have been lost forever.
The intermediate step, bringing New Horizons out of safe mode and switching it to the main flight computer using procedures tested on NHOPS-1 and NHOPS-2, was successful, as telemetry sent by the ship on July 6 confirmed.
After that, the ship had to be configured exactly as it had been before the attempt to upload the flyby command sequence on July 4, and then, as the last step, all those commands had to be sent again, along with the dozens of supporting files lost when the computer rebooted. These steps, their testing on the simulators, and several Anomaly Review Board meetings at which each step was planned and approved took up all of July 6.
And somehow, by the morning of July 7, all the recovery work was complete. Exhausted to the limit, the team had returned the spacecraft to operating mode and prepared it for the flyby. They finished with only four hours to spare before the command sequence was due to start.
* * *
What scientific data was lost because of the July 4 anomaly and the recovery? In saving the situation, Alice and her team followed Alan's instruction to do everything possible to save the main command sequence. As a result, they gave up all the observations that would have taken place during the three days of recovery work, because it was simply impossible to replan them while also bringing the ship out of safe mode and preparing it to start the flyby sequence.
However, Alice's team did manage to save the 63 images the computer had been compressing at the time of the anomaly. These images had to be compressed for storage, and the larger, unprocessed versions then deleted, to free up memory for the flyby data. During the recovery, Alice's team found an open window in the ship's schedule and managed to reschedule the image compression, saving every one of the 63 priceless photos.
And what about all the observations that were supposed to be made during the approach to the planet but never happened because of the recovery work? Alan asked the chief flyby planner, Leslie Young, to assemble a team of specialists in particularly difficult cases to look into it. Leslie and her team examined each observation opportunity lost during the three days of recovery work and its effect on the mission's scientific return. They found that in every case but one the observation could be made later, at higher resolution or from closer range, so no objectives were affected. The exception was the final search for moons of Pluto, planned for July 5 and 6, when the ship was still far enough from the planet to image the space around it. That search would have been several times more sensitive than the one carried out a few days earlier. When the New Horizons team thoroughly studied all the images taken for the moon searches, no new moons were found. This surprised many team members, because every time the Hubble Space Telescope had looked more carefully, it had found new moons. Would New Horizons have found moons in that last, best search that never took place? No one knows, and we will find out only when some future mission arrives at Pluto to look for them again.
Alan Stern is the principal investigator of the New Horizons mission, leading NASA's exploration of the Pluto system and the Kuiper Belt. A planetary scientist, space program executive, aerospace consultant, and author, he has participated in two dozen scientific space missions.
David Grinspoon is an astrobiologist, science communicator, and author.
Excerpted from the book "Chasing New Horizons: Inside the Epic First Mission to Pluto" by Alan Stern and David Grinspoon (2018).