Experience upgrading Oracle 11.2.0.4 to 12c

Hello everyone. I work in the billing systems development department of a regional telecom operator, and I want to share our experience of upgrading Oracle from 11.2.0.4 to 12c (12.2.0.1).
(For some reason many people confuse an upgrade with a migration; here it is clearly explained when each term applies.) All of these events took place last year.
Organizational activities
Organizational preparations began at the start of the year. The first task was to deploy a test zone. We did have a test zone, but Oracle there ran on SUSE, whereas in production Oracle runs on IA-64 (Itanium) servers under HP-UX. Deploying HP-UX in a virtual environment turned out to be quite a quest: HP-UX runs as a guest OS under only one hypervisor, Integrity Virtual Machines, which itself must be installed on a server of the same architecture. In the end we decided to do the work on the production standby-db server.
The second step was configuring HP-UX as recommended by Oracle. Since we had neither an HP-UX test zone nor technical support (TP) for HP-UX, we started looking for an HP-UX specialist who would take on this work with the risks in mind. None of the specialists we knew could do it, so we started calling HP managers. Of course, the conversation with the manager came down to buying technical support; without it there was nothing to talk about.
We had last calculated the cost of HP technical support (hardware and software) about five years earlier. Out of curiosity, I requested a quote. For the price we were quoted, you could deploy a small data center and get three years of technical support for all of it on top, so the question of purchasing it was dropped.
We tried other options: technical support for software only, limited support for the duration of the work, and asking system integrators for a used Itanium server to serve as a test zone. All to no avail.
In the end, a specialist from our billing vendor agreed to help, for which many thanks to him.
The OS settings were applied, as mentioned above, on the standby-db server. Oracle itself was upgraded in the test zone under SUSE, and everything went smoothly. Encouraged by the result, we scheduled the production upgrade for the night of October 24 to 25, so that everything would be working by the start of the next working day (according to our ideal plan).
Preparing for the upgrade process
We started preparations on the afternoon of October 24, together with the DBA assigned to us.
We set the required parameters, took out-of-schedule backups, and purged some "heavy" tables. Then we stopped services, business processes, jobs, and so on. Overall, the preparation took almost the whole day; the upgrade itself began around midnight and finished at 4 in the morning. Then we started bringing everything back up in reverse order. By about 6 in the morning we were mentally already at home and ready for bed, but no such luck. All services started except the application server. We found that its service refused to start because its Oracle Client version (10) was too old for the new server version, so we had to update the client on every server where it was below the supported version. Once updated, it worked.
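In hindsight, an inventory check of client versions before the upgrade would have caught this. The following is only a minimal sketch: the host names are hypothetical, and the minimum version is a placeholder assumption, since the real threshold depends on the server's sqlnet.ora settings and Oracle's client/server interoperability matrix.

```python
from typing import Dict, List, Tuple

# Assumption: placeholder for the oldest client version the upgraded
# server will accept. Check Oracle's interoperability matrix and the
# server's sqlnet.ora before relying on any concrete value.
MIN_CLIENT_VERSION = "11.2.0.3"


def parse_version(text: str, width: int = 5) -> Tuple[int, ...]:
    """Turn '11.2.0.4' into (11, 2, 0, 4, 0) so versions compare numerically."""
    parts = [int(part) for part in text.strip().split(".")]
    parts += [0] * (width - len(parts))  # pad short versions with zeros
    return tuple(parts[:width])


def outdated_clients(inventory: Dict[str, str],
                     minimum: str = MIN_CLIENT_VERSION) -> List[str]:
    """Return the hosts whose Oracle Client is older than `minimum`."""
    floor = parse_version(minimum)
    return sorted(host for host, version in inventory.items()
                  if parse_version(version) < floor)


if __name__ == "__main__":
    # Hypothetical inventory: host name -> installed Oracle Client version.
    inventory = {
        "app-server": "10.2.0.5",
        "crm": "11.2.0.4",
        "sbms": "12.2.0.1",
    }
    for host in outdated_clients(inventory):
        print(f"{host}: client {inventory[host]} must be updated")
```

Comparing parsed tuples rather than raw strings matters here: as plain strings, "10.2.0.5" and "9.2.0.8" would sort in the wrong order.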
That turned out to be the smallest of our problems. During the health check of business processes, we found that the subscriber profile server was not functioning correctly. Accessing certain data raised ORA-00600 [qmcxdGetQNameInfo2], which was in fact the root of the problem. We opened a service request (SR) with Oracle Support at Severity 2 and, in parallel, searched for possible solutions ourselves. The situation was aggravated by the fact that we could neither register nor serve subscribers: the customer service system (SBMS) could not create a profile when registering a subscriber, and CRM did not work either.
By evening the load subsided and the situation calmed down. We also had a possible solution. We had found that the errors occurred only when accessing fields of type XMLType, while the XDB component (Oracle XML DB) itself was valid. We decided to try setting the COMPATIBLE parameter to 12.2; at that time it was still at 11.2. The Database Upgrade Guide for version 12.2 warns: "Do not make this change until you are ready to upgrade, because a downgrade back to an earlier compatibility level is not possible after you raise the COMPATIBLE initialization parameter value." That is, once this parameter is raised, returning to the previous version becomes impossible. But another Oracle document (Doc ID 1292089.1) contains the following remark: "... after the upgrade operation, you must set the database compatibility to at least 12.1.0.1. If the compatibility is less than 12.1.0.1 then an error is raised when you try to use Oracle XML DB." So we decided to try to fix the situation by sacrificing the ability to downgrade. As it turned out, this brought no result. We then postponed further work until the morning: lack of sleep was taking its toll, and thinking got harder with every hour. Besides, we had to wait for an answer from Oracle Support.
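The two facts checked above, the current COMPATIBLE value and the status of components such as XDB, can be read from the standard views v$parameter and dba_registry. The sketch below assumes only a Python DB-API 2.0 cursor connected with sufficient privileges; the driver and connection details are left out, and the function name is made up for illustration.

```python
from typing import List, Optional, Tuple

# Standard Oracle dictionary views; the queries are read-only.
COMPATIBLE_SQL = "SELECT value FROM v$parameter WHERE name = 'compatible'"
REGISTRY_SQL = "SELECT comp_id, version, status FROM dba_registry"

RegistryRow = Tuple[str, str, str]


def read_upgrade_state(cursor) -> Tuple[Optional[str], List[RegistryRow]]:
    """Return the COMPATIBLE value and every component that is not VALID.

    `cursor` is any DB-API 2.0 cursor (hypothetical here; obtaining it
    depends on the driver in use).
    """
    cursor.execute(COMPATIBLE_SQL)
    row = cursor.fetchone()
    compatible = row[0] if row else None

    cursor.execute(REGISTRY_SQL)
    # Anything other than VALID deserves a look before and after an upgrade,
    # although some statuses (for example OPTION OFF) can be legitimate.
    not_valid = [(comp_id, version, status)
                 for comp_id, version, status in cursor.fetchall()
                 if status != "VALID"]
    return compatible, not_valid
```

Note that this only reports state. As our case showed, a component can be reported as VALID while the data that depends on it is already unreadable.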
The solution to the main problem (BUG 26814058)
On the morning of October 26, Oracle replied and identified the problem as unpublished BUG 26814058 - SELECT FROM TABLE WITH XMLTYPE FAILS WITH ORA-600 [QMCXDGETQNAMEINFO2], classified as a "Code / Hardware Bug" in status 11. The bug was relatively fresh (registered on September 16) and had been filed against the previous version (12.1). Neither a patch nor a workaround existed for it at that time (and possibly still does not). Status 11 means that work on a patch is underway, but Oracle gives no exact timeline for its release: it could be a couple of days or much longer. We raised our SR to Severity 1, but had no hope of a quick solution from Oracle. We had to decide: return to version 11 or try to fix what we had.
The second day without customer service was taking its toll, so we decided to wait until night and roll back to version 11 using the standby-db server, since the standard downgrade procedure was no longer available because of the COMPATIBLE parameter. After some discussion and analysis of the tables, it turned out that although a significant part of the services did not work, all of them were tied to the profile server (GUP). Moreover, we managed to localize the problem to two tables. One of them was particularly difficult, since it contained more than 10 million records (about 4 GB). The error occurred both when processing data in XMLType columns and when exporting the table data. A "create table ... as select ..." operation completed but did not help: the data in the new table was also corrupted.
We decided to try extracting the data with Data Pump from the standby-db server, which was still in the "BEFORE the upgrade" state. But the data there turned out to be corrupted as well. The reason is that one of the preparation steps for the upgrade was rebuilding XDB in accordance with Oracle recommendations. Presumably the data in the XMLType columns was corrupted by that operation, which, incidentally, also happens during the upgrade to Oracle 12.2 itself, since starting with version 12 the XDB component is mandatory and can no longer be reinstalled. However, the upgrade instructions require all database components to be valid at upgrade time; otherwise they should be reinstalled. Thus, by the end of the working day the problem had come down to finding a way to retrieve intact table data.
Conclusion
At the most critical moment, when we had tried every option and nothing helped, we were saved by a backup of the standby-db server that a colleague had taken before the upgrade began on October 24. Using RMAN, we were able to partially restore the standby database (only the required tablespaces) to the point "before rebuilding XDB" and extract the necessary data. Then, using Data Pump, the problem tables were transferred from the standby to the main-db server. This operation was completed at approximately 2 a.m. on October 27, after which the GUP server was functional again. Although the data in the restored tables dated from before the upgrade, it was still current, because all attempts to process the damaged data had failed. In other words, the database had effectively been in a read-only state, and the data in it had not changed since the "BEFORE the upgrade" moment.
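For readers who have not used Data Pump for a table-level transfer, here is a rough sketch of generating the parameter files for expdp and impdp. It is not the exact procedure we ran: the table names, directory object, and file names are hypothetical, and it assumes the restored copy is open and reachable by expdp. The parameters themselves (DIRECTORY, DUMPFILE, LOGFILE, TABLES, TABLE_EXISTS_ACTION) are standard Data Pump parameters.

```python
from typing import Dict, Iterable


def render_parfile(params: Dict[str, str]) -> str:
    """Render KEY=value lines in the format expected by expdp/impdp parfiles."""
    return "\n".join(f"{key.upper()}={value}"
                     for key, value in params.items()) + "\n"


def export_params(tables: Iterable[str], directory: str,
                  dump_name: str) -> Dict[str, str]:
    """Parameters for a table-mode export of the listed tables."""
    return {
        "directory": directory,            # Oracle directory object, not an OS path
        "dumpfile": f"{dump_name}.dmp",
        "logfile": f"{dump_name}_exp.log",
        "tables": ",".join(tables),
    }


def import_params(tables: Iterable[str], directory: str,
                  dump_name: str) -> Dict[str, str]:
    """Parameters for importing the same tables over the damaged ones."""
    return {
        "directory": directory,
        "dumpfile": f"{dump_name}.dmp",
        "logfile": f"{dump_name}_imp.log",
        "tables": ",".join(tables),
        # REPLACE drops and recreates each existing table before loading it.
        "table_exists_action": "REPLACE",
    }


if __name__ == "__main__":
    # Hypothetical schema and table names, for illustration only.
    tables = ["GUP.PROFILE_DATA", "GUP.PROFILE_HISTORY"]
    print(render_parfile(export_params(tables, "DP_DIR", "gup_tables")))
    print(render_parfile(import_params(tables, "DP_DIR", "gup_tables")))
```

Each generated file would then be passed to the tool with parfile=..., for example `expdp parfile=gup_export.par` on the source side.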
If we had had a matching test zone, we might have caught this problem while preparing for the upgrade. But since Oracle classified the bug as a "Code / Hardware Bug", reproducing it required matching not only the Oracle version and user data, but also the platform and OS version, which was not possible for us.
Do not forget to make backups. Good luck to everyone.