How to find the line where a mainframe program crashes using a CEE dump
Some time ago I had to learn to work with C++ on a mainframe, and I ran into the problem of figuring out where a program crashes and why. A caveat up front: everything here concerns programming on mainframes under the z/OS operating system, in USS (UNIX System Services). On the one hand the technique is elementary, but finding all of it in the IBM documentation is not so easy. You also need to be able to at least read HLASM.
In this article I will try to describe how to find the line where a program crashes using a CEE dump.
Below is a short program written so that it is guaranteed to crash as soon as it is launched. I deliberately kept it simple, and to C++ programmers it will look trivial, but the task is not to spot the error in the code right away but to arrive at it through the listings. Real programs are much larger.
void f1();

int f1(int a, char *b)
{
    char *value = 0L;
    value[0] = '\0';
    return(0);
}
int main()
{
    f1(1, "");
    return(1);
}
To find the line, you need to compile the program with a listing. This is done as follows:
1. without XPLINK
c++ -Wc,list'(t1.list)' -c t1.C
c++ -o t1 t1.o
2. with XPLINK
c++ -Wc,xplink -Wc,list'(t2.list)' -c t1.C
c++ -Wl,xplink -o t2 t1.o
The key option is -Wc,list'(t1.list)': it forces a listing to be generated into the named file.
In a Makefile, it is added like this:
# Here is where we get convlit(iso8859-1) and __LIBASCII
#
XCCFLAGS = $(CCFLAGS_ASCII) -Wc,list'($*.list)'
XCPPFLAGS = $(CPPFLAGS_ASCII) -Wc,list'($*.list)'
The first line is for C programs, the second for C++.
Running the program produces a dump and the messages below.
/u/mddegt/sb/omni/cpp/test:>t1
CEE3204S The system detected a protection exception (System Completion Code=0C4).
From entry point f1(int,char*) at compile unit offset +0000002A at entry offset +0000002A at address 1000A95A.
Segmentation fault
/u/mddegt/sb/omni/cpp/test:>t2
CEE3204S The system detected a protection exception (System Completion Code=0C4).
From entry point f1(int,char*) at compile unit offset +00000016 at entry offset +00000016 at address 100063D6.
Segmentation fault
Even without going into details, we already have the information we need here. We see that the program crashed in the f1 function, and we have an offset. Now we need the listing that was generated in the same directory as the source (t1.list for the program without XPLINK, t2.list for the program with XPLINK).
In the listing, we look for the beginning of the function in which the program crashed.
1. without XPLINK
15694A01 V1 R6 z/OS C++ t1.C: f1(int,char*) 02/14/06 15:42:32 3
OFFSET OBJECT CODE LINE# FILE# P S E U D O A S S E M B L Y L I S T I N G
000001 | * void f1();
000002 | *
000003 | * int f1(int a, char *b)
f1(int,char*)
000018 000003 | DS 0D
000018 47F0 F001 000003 | B 1(,r15)
00001C 01C3C5C5 CEE eyecatcher
000020 000000C8 DSA size
000024 000000D8 =A(PPA1-f1(int,char*))
000028 5050 D028 000003 | ST r5,40(,r13)
00002C 5850 D04C 000003 | L r5,76(,r13)
000030 End of Prolog
2. with XPLINK
15694A01 V1 R6 z/OS C++ t1.C: f1(int,char*) 02/14/06 16:02:24 3
OFFSET OBJECT CODE LINE# FILE# P S E U D O A S S E M B L Y L I S T I N G
000001 | * void f1();
000002 | *
000003 | * int f1(int a, char *b)
000018 @1L0 DS 0D
000018 00C300C5 =F'12779717' XPLink entrypoint marker
00001C 00C500F1 =F'12910833'
000020 00000090 =F'144'
000024 00000088 =F'136'
f1(int,char*)
000028 000003 | DS 0D
000028 9067 4788 000003 | STM r6,r7,1928(r4)
00002C End of Prolog
The first column is the offset, next to it is the generated object code, and the third column is the line number in the source file.
As you can see, a function compiled without XPLINK starts immediately, at offset 000018, whereas a function compiled with XPLINK is preceded by a header (the XPLink entry point marker at offsets 000018 through 000024), and the function itself starts at offset 000028. This gives us the starting offset of the function. Now add the offset at which the program crashed to this starting offset. All calculations are in hexadecimal.
1. 18 + 2A = 42
2. 28 + 16 = 3E
All that remains is to find, in the listing, the line on which the program crashed. To do this, we go down through the function until we reach the resulting offset. While doing so, you must stay within the function itself. If you run past the end of the function, there can be several reasons:
1. The source differs from the one the program was built from.
2. We made a mistake somewhere in the offset calculations.
3. We calculated the wrong thing altogether.
1. without XPLINK
000034 5020 50C0 000003 | ST r2,b(,r5,192)
000004 | * {
000005 | * .char *value = 0L;
000038 4110 0000 000005 | LA r1,0
00003C 1821 000005 | LR r2,r1
00003E 5020 50C4 000005 | ST r2,value(,r5,196)
000006 | * .value[0] = '\0';
000042 9200 2000 000006 | MVI (char)(r2,0),0
000007 | * .return(0);
000008 | * }
000046 000008 | @1L6 DS 0H
2. with XPLINK
000030 5020 4844 000003 | ST r2,b(,r4,2116)
000004 | * {
000005 | * .char *value = 0L;
000034 4130 0000 000005 | LA r3,0
000038 1813 000005 | LR r1,r3
00003A 5010 47E0 000005 | ST r1,value(,r4,2016)
000006 | * .value[0] = '\0';
00003E 9200 1000 000006 | MVI (char)(r1,0),0
000007 | * .return(0);
000008 | * }
000042 000008 | @1L6 DS 0H
In both cases we arrive at the same line number, 6. Ignoring the assembler, we read the line number from the third column: 000006. If you like, you can keep hunting for the error in the listing, but I would go to the source.
From here on I will not show both variants, without and with XPLINK, and will limit myself to the first.
If we did not have the stderr output shown above (that is, the error message), the same information can be found in the CEE dump file. Open the CEE dump and look for the very first table.
Traceback:
DSA Addr Program Unit PU Addr PU Offset Entry E Addr E Offset Statement Load Mod Service Status
10020CF0 CEEHDSP 046C0B00 +000048DA CEEHDSP 046C0B00 +000048DA CEEPLPKA UK10749 Call
100202B0 1000A930 +0000002A f1(int,char*)
1000A930 +0000002A *PATHNAM Exception
10020210 1000A968 +0000006E main 1000A968 +0000006E *PATHNAM Call
100200F8 044EFCB6 +000000B4 EDCZMINV 044EFCB6 +000000B4 CEEEV003 Call
10020030 CEEBBEXT 046C69E8 +000001A6 CEEBBEXT 046C69E8 +000001A6 CEEPLPKA HLE7709 Call
In this table we look for Exception in the last column (Status). That row is the function in which the program crashed: the Entry column holds the function name, and the E Offset column holds the offset at which the program crashed.
The same table shows the full call stack (Entry column). In the same way, we can determine on which line of the calling function the call to the called function was made.
What if the source code does not make it clear where the error is?
To answer that, we look at the assembler code for the failing line (a description of the assembler instructions for z/OS can be found in the z/Architecture Principles of Operation, chapter 7).
000006 | * .value[0] = '\0';
000042 9200 2000 000006 | MVI (char)(r2,0),0
As you can see, this instruction tries to write 0 to the address (r2 + 0). The displacement could be any other value instead of zero, but in this case it is 0. The question is: what does register r2 contain?
To answer that, the CEE dump contains printouts of the registers and of the storage areas they point to.
Condition Information for Active Routines
Condition Information for (DSA address 100202B0)
CIB Address: 10021630
Current Condition:
CEE3204S The system detected a protection exception (System Completion Code=0C4).
Location:
Program Unit: Entry: f1(int,char*)
Statement: Offset: +0000002A
Machine State:
ILC..... 0004 Interruption Code..... 0004
PSW..... 078D1400 9000A95E
GPR0..... 100087E8 GPR1..... 00000000 GPR2..... 00000000 GPR3..... 00000001
GPR4..... 1000A9A8 GPR5..... 100202B0 GPR6..... 1000AAAC GPR7..... 1000A0F0
GPR8..... 00000030 GPR9..... 80000000 GPR10.... 844EFCAA GPR11.... 846C69E8
GPR12.... 1001A7D0 GPR13.... 100202B0 GPR14.... 9000A9D8 GPR15.... 1000A930
FPC...... 00000000
FPR0..... 4DBE5D7C A198209F FPR1..... 00000000 00000000
FPR2..... 00000000 00000000 FPR3..... 00000000 00000000
FPR4..... 00000000 00000000 FPR5..... 00000000 00000000
FPR6..... 00000000 00000000 FPR7..... 00000000 00000000
FPR8..... 00000000 00000000 FPR9..... 00000000 00000000
FPR10.... 00000000 00000000 FPR11.... 00000000 00000000
FPR12.... 00000000 00000000 FPR13.... 00000000 00000000
FPR14.... 00000000 00000000 FPR15.... 00000000 00000000
Storage dump near condition, beginning at location: 1000A94A
+000000 1000A94A 50BC5020 50C04110 00001821 502050C4 92002000 5850D028 47F0E004 070747F0 |&.&.&.......&.&Dk....&...0.....0|
The same registers can also be found further down in the dump:
f1(int,char*) (DSA address 100202B0):
UPSTACK DSA
Saved Registers:
GPR0..... 100087E8 GPR1..... 00000000 GPR2..... 00000000 GPR3..... 00000001
GPR4..... 1000A9A8 GPR5..... 100202B0 GPR6..... 1000AAAC GPR7..... 1000A0F0
GPR8..... 00000030 GPR9..... 80000000 GPR10.... 844EFCAA GPR11.... 846C69E8
GPR12.... 1001A7D0 GPR13.... 100202B0 GPR14.... 9000A9D8 GPR15.... 1000A930
GPREG STORAGE:
Storage around GPR0 (100087E8)
-0020 100087C8 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 |................................|
+0000 100087E8 1001C028 00000000 00000000 00000000 10017728 10017732 00000000 00000000 |................................|
+0020 10008808 00000000 00000000 00000000 00000000 00000001 00000000 00000001 00000000 |................................|
Storage around GPR1 (00000000)
+0000 00000000 Inaccessible storage.
+0020 00000020 Inaccessible storage.
+0040 00000040 Inaccessible storage.
Storage around GPR2 (00000000)
+0000 00000000 Inaccessible storage.
+0020 00000020 Inaccessible storage.
+0040 00000040 Inaccessible storage.
Storage around GPR3 (00000001)
-0001 00000000 Inaccessible storage.
+001F 00000020 Inaccessible storage.
+003F 00000040 Inaccessible storage.
Storage around GPR4 (1000A9A8)
-0020 1000A988 05404140 401E07F4 90E5D00C 58E0D04C 4100E0A0 5500C314 4140F040 4720F014 |. . ..4.V.....<......C.. 0 ..0.|
+0000 1000A9A8 5000E04C 9210E000 50D0E004 18DE5800 C1F45000 D098C050 00000021 5800D098 |&..
Note that the register printout must be the one that belongs to this function.
Looking at the register values, we get r2 = 0 and therefore the address (0 + 0). Now everything is clear: the program is trying to write to address 0, which is accessible neither for reading nor for writing.
That is all. What remains is to find out from the source code why this happens, that is, why value points to 0, but that is another topic.
References: z/Architecture Principles of Operation