How to find the line where a mainframe program crashes using a CEE dump

I once had to learn to work in C++ on the mainframe, and ran into the problem of figuring out where a program crashes and why. I should say up front that everything here applies to programming on mainframes under the z/OS operating system in USS. On the one hand this is elementary, but finding it all in the IBM documentation is not so simple. In addition, you need to be able to at least read HLASM.

Below I will try to describe how to find the line where a program crashes, using a CEE dump.

Below is a short program written so that it is certain to crash as soon as it is launched. I kept the program simple, and to C++ programmers it will seem trivial, but the task is not to spot the error in the code right away but to arrive at it through the listings. Real programs are much larger and longer.

int f1(int a, char *b) 
{
 char *value = 0L;
 value[0] = '\0'; 
 return(0);
}
int main()
{
 f1(1, "");
 return(1);
}


To find the line, the program must be compiled with a listing. This is done as follows:

1. without XPLINK
c++ -Wc,list'(t1.list)' -c t1.C
c++ -o t1 t1.o


2. with XPLINK
c++ -Wc,xplink -Wc,list'(t2.list)' -c t1.C
c++ -Wl,xplink -o t2 t1.o


The key option is -Wc,list'(t1.list)', which makes the compiler generate a listing for this file.
In a Makefile it is added like this:
#  Here is where we get convlit(iso8859-1) and __LIBASCII
#
XCCFLAGS = $(CCFLAGS_ASCII) -Wc,list'($*.list)'
XCPPFLAGS = $(CPPFLAGS_ASCII) -Wc,list'($*.list)'


The first line is for C programs, the second for C++.
I ran both programs and got a dump and the messages below.

/u/mddegt/sb/omni/cpp/test:>t1
CEE3204S The system detected a protection exception (System Completion Code=0C4).
         From entry point f1(int,char*) at compile unit offset +0000002A at entry offset +0000002A at address 1000A95A.
Segmentation fault
/u/mddegt/sb/omni/cpp/test:>t2
CEE3204S The system detected a protection exception (System Completion Code=0C4).
         From entry point f1(int,char*) at compile unit offset +00000016 at entry offset +00000016 at address 100063D6.
Segmentation fault


Even without going into details, we already have the information we need here. We see that the program crashed in the f1 function, and we have an offset. Now we need the listing that was generated in the same directory as the source (t1.list for the program without XPLINK, t2.list for the program with XPLINK).
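If there are many such messages (for example, in a log), the entry offset can be pulled out of the message text mechanically. Below is a minimal sketch; the function name entryOffsetFromMessage is my own invention, and it assumes the message contains the phrase "at entry offset +" followed by hexadecimal digits, as in the output above.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of any IBM tool): extracts the hexadecimal
// value that follows "at entry offset +" in a CEE3204S message line.
// Returns false if the phrase or the digits are missing.
bool entryOffsetFromMessage(const std::string& message, unsigned long& offset)
{
    const std::string key = "at entry offset +";
    const std::size_t pos = message.find(key);
    if (pos == std::string::npos)
        return false;

    // Collect the run of hexadecimal digits right after the key.
    const std::size_t start = pos + key.size();
    std::size_t end = start;
    while (end < message.size() &&
           std::isxdigit(static_cast<unsigned char>(message[end])))
        ++end;
    if (end == start)
        return false;

    offset = std::strtoul(message.substr(start, end - start).c_str(), nullptr, 16);
    return true;
}
```

For the t1 message above this yields 0x2A, and for the t2 message 0x16.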

In the listing, we look for the beginning of the function in which the program crashed.

1. without XPLINK

15694A01 V1 R6 z/OS C++                                                 t1.C: f1(int,char*)       02/14/06 15:42:32            3
 OFFSET OBJECT CODE        LINE#  FILE#    P S E U D O   A S S E M B L Y   L I S T I N G 
                           000001 |       *  void f1();
                           000002 |       *  
                           000003 |       *  int f1(int a, char *b) 
                                          f1(int,char*)
 000018                    000003 |                 DS    0D
 000018  47F0  F001        000003 |                 B     1(,r15)
 00001C  01C3C5C5                                         CEE eyecatcher
 000020  000000C8                                         DSA size
 000024  000000D8                                         =A(PPA1-f1(int,char*))
 000028  5050  D028        000003 |                 ST    r5,40(,r13)
 00002C  5850  D04C        000003 |                 L     r5,76(,r13)
 000030                    End of Prolog


2. with XPLINK

15694A01 V1 R6 z/OS C++                                                 t1.C: f1(int,char*)       02/14/06 16:02:24            3
 OFFSET OBJECT CODE        LINE#  FILE#    P S E U D O   A S S E M B L Y   L I S T I N G                                        
                           000001 |       *  void f1();
                           000002 |       *  
                           000003 |       *  int f1(int a, char *b) 
 000018                                    @1L0     DS    0D
 000018  00C300C5                                         =F'12779717'       XPLink entrypoint marker
 00001C  00C500F1                                         =F'12910833'       
 000020  00000090                                         =F'144'            
 000024  00000088                                         =F'136'            
                                          f1(int,char*)
 000028                    000003 |                 DS    0D
 000028  9067  4788        000003 |                 STM   r6,r7,1928(r4)
 00002C                    End of Prolog


The first column is the offset, the second is the generated object code, and the third is the line number in the source file.
As you can see, the function without XPLINK starts right away, at offset 000018, whereas the function with XPLINK is preceded by a header (the XPLink entry point marker), and its code starts at offset 000028. This gives us the starting offset of the function. Now we add the offset at which our program crashed to this starting offset. All calculations are in hexadecimal.

1. 18 + 2A = 42
2. 28 + 16 = 3E
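The same arithmetic as a small sketch, for those who do not like adding hexadecimal numbers in their head. The helper names are hypothetical; the only assumption is that the OFFSET column of the listing is printed as six uppercase hexadecimal digits, as in the excerpts above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: offset in the listing = offset of the function's
// entry point in the listing + entry offset reported in the CEE3204S message.
unsigned long listingOffset(unsigned long functionStart, unsigned long entryOffset)
{
    return functionStart + entryOffset;
}

// Formats an offset the way the OFFSET column of the listing shows it:
// six uppercase hexadecimal digits, zero padded.
std::string asListingColumn(unsigned long offset)
{
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%06lX", offset);
    return std::string(buffer);
}
```

Calling asListingColumn(listingOffset(0x18, 0x2A)) gives the text to search for in t1.list, and asListingColumn(listingOffset(0x28, 0x16)) the text for t2.list.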

Now it remains to find the line in the listing where the crash happened. To do this, we go down through the function until we see the resulting offset. We must not go beyond the end of the function itself. If we do, there may be several reasons:
1. The source differs from the one used when the program was built.
2. We made a mistake somewhere in the offset calculations.
3. We were calculating the wrong thing altogether.

1. without XPLINK
 000034  5020  50C0        000003 |                 ST    r2,b(,r5,192)
                           000004 |       *  {
                           000005 |       *  .char *value = 0L;
 000038  4110  0000        000005 |                 LA    r1,0
 00003C  1821              000005 |                 LR    r2,r1
 00003E  5020  50C4        000005 |                 ST    r2,value(,r5,196)
                           000006 |       *  .value[0] = '\0';
 000042  9200  2000        000006 |                 MVI   (char)(r2,0),0
                           000007 |       *  .return(0);
                           000008 |       *  }
 000046                    000008 |        @1L6     DS    0H

2. with XPLINK
 000030  5020  4844        000003 |                 ST    r2,b(,r4,2116)
                           000004 |       *  {
                           000005 |       *  .char *value = 0L;
 000034  4130  0000        000005 |                 LA    r3,0
 000038  1813              000005 |                 LR    r1,r3
 00003A  5010  47E0        000005 |                 ST    r1,value(,r4,2016)
                           000006 |       *  .value[0] = '\0';
 00003E  9200  1000        000006 |                 MVI   (char)(r1,0),0
                           000007 |       *  .return(0);
                           000008 |       *  }
 000042                    000008 |        @1L6     DS    0H

In both cases we got the same line number, 6. Ignoring the assembler, we look at the line number in the third column, 000006. If you like, you can keep hunting for the error in the listing, but I would go to the source.
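The lookup itself can be sketched in code. This is only an illustration with hypothetical names; it assumes the layout seen in the excerpts above: an instruction row starts with one blank and a six-digit hexadecimal OFFSET, and the LINE# is the last token before the '|' character. Rows without an offset or without '|' (source comments, "End of Prolog", constants) are skipped.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical parser for one row of the pseudo assembly listing.
// Returns true only for instruction rows that carry both OFFSET and LINE#.
bool parseListingRow(const std::string& row, unsigned long& offset, unsigned long& line)
{
    // Instruction rows look like " 000042  9200  2000        000006 | ...".
    if (row.size() < 8 || row[0] != ' ')
        return false;
    for (std::size_t i = 1; i <= 6; ++i)
        if (!std::isxdigit(static_cast<unsigned char>(row[i])))
            return false;

    const std::size_t bar = row.find('|');
    if (bar == std::string::npos || bar < 8)
        return false;

    // LINE# is the last blank-delimited token before '|'.
    const std::size_t end = row.find_last_not_of(' ', bar - 1);
    if (end == std::string::npos)
        return false;
    const std::size_t blank = row.find_last_of(' ', end);
    if (blank == std::string::npos || blank < 7)
        return false;               // the token would be the OFFSET itself
    const std::string token = row.substr(blank + 1, end - blank);
    for (char c : token)
        if (!std::isdigit(static_cast<unsigned char>(c)))
            return false;

    offset = std::strtoul(row.substr(1, 6).c_str(), nullptr, 16);
    line = std::strtoul(token.c_str(), nullptr, 10);
    return true;
}

// Scans the rows of one function and reports the source line whose
// instruction starts exactly at the wanted offset.
bool lineForOffset(const std::vector<std::string>& listing,
                   unsigned long wanted, unsigned long& line)
{
    for (const std::string& row : listing) {
        unsigned long offset = 0, rowLine = 0;
        if (parseListingRow(row, offset, rowLine) && offset == wanted) {
            line = rowLine;
            return true;
        }
    }
    return false;
}
```

If lineForOffset returns false for the rows of the function, that corresponds to the "went beyond the function" situation described above, and the source or the calculation should be rechecked.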

From here on I will not show both variants, without and with XPLINK; I will limit myself to the first.

If we did not have the stderr output shown above (that is, the error message), the same information can also be found in the CEE dump file. To do this, open the CEE dump and look for the very first table.

  Traceback:
    DSA Addr  Program Unit  PU Addr   PU Offset  Entry         E Addr    E  Offset   Statement  Load Mod  Service  Status
    10020CF0  CEEHDSP       046C0B00  +000048DA  CEEHDSP       046C0B00  +000048DA              CEEPLPKA  UK10749  Call
    100202B0                1000A930  +0000002A  f1(int,char*)
                                                               1000A930  +0000002A              *PATHNAM           Exception
    10020210                1000A968  +0000006E  main          1000A968  +0000006E              *PATHNAM           Call
    100200F8                044EFCB6  +000000B4  EDCZMINV      044EFCB6  +000000B4              CEEEV003           Call
    10020030  CEEBBEXT      046C69E8  +000001A6  CEEBBEXT      046C69E8  +000001A6              CEEPLPKA  HLE7709  Call

In this table we look for Exception in the last column (Status). That row is the function in which the program crashed: the Entry column holds the name of the function, and the E Offset column holds the offset at which the program crashed.
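Finding that row can also be sketched in code. The function name is hypothetical, and the sketch relies only on what is visible in the table above: the row of interest ends with the word Exception, and the E Offset is the last token on that row that starts with '+'. The entry name may be wrapped onto the previous line, as it is for f1(int,char*) here, so the sketch does not try to extract it.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: finds the traceback row whose Status is "Exception"
// and returns the last "+hhhhhhhh" token on it (the E Offset column).
bool exceptionEntryOffset(const std::vector<std::string>& traceback,
                          unsigned long& offset)
{
    for (const std::string& row : traceback) {
        // Split the row into blank-delimited tokens.
        std::istringstream in(row);
        std::vector<std::string> tokens;
        std::string token;
        while (in >> token)
            tokens.push_back(token);

        if (tokens.empty() || tokens.back() != "Exception")
            continue;

        // Walk backwards to the last token of the form +<hex digits>.
        for (std::size_t i = tokens.size(); i-- > 0; ) {
            const std::string& t = tokens[i];
            if (t.size() < 2 || t[0] != '+')
                continue;
            char* stop = nullptr;
            const unsigned long value = std::strtoul(t.c_str() + 1, &stop, 16);
            if (*stop == '\0') {
                offset = value;
                return true;
            }
        }
    }
    return false;
}
```

For the traceback above the result is 0x2A, the same offset that the CEE3204S message printed.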

In the same table we see the full call stack (Entry column). In the same way, we can determine on which line of the calling function the called function was invoked.

What to do if the source code does not show where the error is.

To do this, we look at the assembler code in the listing (a description of the assembler instructions for z/OS can be found in the z/Architecture Principles of Operation, chapter 7).

                           000006 |       *  .value[0] = '\0';
 000042  9200  2000        000006 |                 MVI   (char)(r2,0),0

As you can see, this instruction tries to write 0 to the address (r2 + 0). Any other displacement could appear in place of the zero, but here it is 0. The question is: what does register r2 contain?

For this, the CEE dump contains printouts of the registers and of the memory areas these registers point to.

  Condition Information for Active Routines
    Condition Information for  (DSA address 100202B0)
      CIB Address: 10021630
      Current Condition:
        CEE3204S The system detected a protection exception (System Completion Code=0C4).
      Location:
        Program Unit:  Entry: f1(int,char*)
        Statement:     Offset: +0000002A
      Machine State:
        ILC..... 0004    Interruption Code..... 0004
        PSW..... 078D1400 9000A95E
        GPR0..... 100087E8  GPR1..... 00000000  GPR2..... 00000000  GPR3..... 00000001
        GPR4..... 1000A9A8  GPR5..... 100202B0  GPR6..... 1000AAAC  GPR7..... 1000A0F0
        GPR8..... 00000030  GPR9..... 80000000  GPR10.... 844EFCAA  GPR11.... 846C69E8
        GPR12.... 1001A7D0  GPR13.... 100202B0  GPR14.... 9000A9D8  GPR15.... 1000A930
        FPC...... 00000000
        FPR0..... 4DBE5D7C  A198209F            FPR1..... 00000000  00000000
        FPR2..... 00000000  00000000            FPR3..... 00000000  00000000
        FPR4..... 00000000  00000000            FPR5..... 00000000  00000000
        FPR6..... 00000000  00000000            FPR7..... 00000000  00000000
        FPR8..... 00000000  00000000            FPR9..... 00000000  00000000
        FPR10.... 00000000  00000000            FPR11.... 00000000  00000000
        FPR12.... 00000000  00000000            FPR13.... 00000000  00000000
        FPR14.... 00000000  00000000            FPR15.... 00000000  00000000
    Storage dump near condition, beginning at location: 1000A94A
      +000000 1000A94A  50BC5020 50C04110 00001821 502050C4  92002000 5850D028 47F0E004 070747F0  |&.&.&.......&.&Dk....&...0.....0|
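The machine state above also allows a cross-check of the offset. The sketch below is my own illustration with hypothetical names, and it rests on two assumptions that happen to agree with the numbers in this dump: the second word of the PSW holds the address of the next instruction with the high bit used as the addressing-mode flag, and the ILC value is printed as a length in bytes.

```cpp
#include <cstdint>

// Hypothetical helper: address of the instruction that caused the exception.
// Assumes the PSW address word points past the failing instruction and that
// ilcBytes is the instruction length in bytes as printed in the dump.
std::uint32_t failingInstructionAddress(std::uint32_t pswAddressWord,
                                        std::uint32_t ilcBytes)
{
    // Drop the high bit (addressing-mode flag) to get the 31-bit address.
    const std::uint32_t nextInstruction = pswAddressWord & 0x7FFFFFFFu;
    return nextInstruction - ilcBytes;
}

// Distance of an address from a starting address, for example from the
// entry point (E Addr) or from the beginning of a storage dump.
std::uint32_t offsetFrom(std::uint32_t address, std::uint32_t start)
{
    return address - start;
}
```

With PSW word 9000A95E and ILC 4 this gives 1000A95A, the address from the CEE3204S message. Subtracting the entry address 1000A930 gives 2A again, and subtracting the start of the storage dump, 1000A94A, gives 10: sixteen bytes into that dump line are the bytes 92002000, the MVI instruction itself.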

The same registers can also be found further down in the dump, in the section for the function itself:

    f1(int,char*) (DSA address 100202B0):
      UPSTACK DSA
      Saved Registers:
        GPR0..... 100087E8  GPR1..... 00000000  GPR2..... 00000000  GPR3..... 00000001
        GPR4..... 1000A9A8  GPR5..... 100202B0  GPR6..... 1000AAAC  GPR7..... 1000A0F0
        GPR8..... 00000030  GPR9..... 80000000  GPR10.... 844EFCAA  GPR11.... 846C69E8
        GPR12.... 1001A7D0  GPR13.... 100202B0  GPR14.... 9000A9D8  GPR15.... 1000A930
    GPREG STORAGE:
      Storage around GPR0 (100087E8)
        -0020 100087C8  00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000  |................................|
        +0000 100087E8  1001C028 00000000 00000000 00000000  10017728 10017732 00000000 00000000  |................................|
        +0020 10008808  00000000 00000000 00000000 00000000  00000001 00000000 00000001 00000000  |................................|
      Storage around GPR1 (00000000)
        +0000 00000000    Inaccessible storage.
        +0020 00000020    Inaccessible storage.
        +0040 00000040    Inaccessible storage.
      Storage around GPR2 (00000000)
        +0000 00000000    Inaccessible storage.
        +0020 00000020    Inaccessible storage.
        +0040 00000040    Inaccessible storage.
      Storage around GPR3 (00000001)
        -0001 00000000    Inaccessible storage.
        +001F 00000020    Inaccessible storage.
        +003F 00000040    Inaccessible storage.
      Storage around GPR4 (1000A9A8)
        -0020 1000A988  05404140 401E07F4 90E5D00C 58E0D04C  4100E0A0 5500C314 4140F040 4720F014  |. .  ..4.V.....<......C.. 0 ..0.|
        +0000 1000A9A8  5000E04C 9210E000 50D0E004 18DE5800  C1F45000 D098C050 00000021 5800D098  |&..
Note that the register printout must be the one that belongs to this function.

Looking at the register values, we get r2 = 0, and therefore the address (0 + 0). Now everything is clear: the program is trying to write data to address 0, which is accessible neither for reading nor for writing.
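The object code 9200 2000 can be taken apart by hand, but here is a small sketch of the same decoding. MVI is an SI-format instruction: 8 bits of opcode, 8 bits of immediate data, 4 bits of base register and 12 bits of displacement. The type and function names are hypothetical, and the 31-bit masking of the result is an assumption based on the addressing mode seen in this dump.

```cpp
#include <cstdint>

// Fields of an SI-format instruction such as MVI D1(B1),I2.
struct SiInstruction {
    std::uint8_t  opcode;        // bits 0-7, 0x92 for MVI
    std::uint8_t  immediate;     // bits 8-15, the byte to be stored (I2)
    std::uint8_t  base;          // bits 16-19, base register number (B1)
    std::uint16_t displacement;  // bits 20-31, displacement (D1)
};

// Splits the 4 bytes of object code into the SI fields.
SiInstruction decodeSi(std::uint32_t word)
{
    SiInstruction si;
    si.opcode       = static_cast<std::uint8_t>((word >> 24) & 0xFFu);
    si.immediate    = static_cast<std::uint8_t>((word >> 16) & 0xFFu);
    si.base         = static_cast<std::uint8_t>((word >> 12) & 0x0Fu);
    si.displacement = static_cast<std::uint16_t>(word & 0x0FFFu);
    return si;
}

// Target address of the store: contents of the base register plus the
// displacement. A base field of 0 means "no base register", not GPR0.
// The result is masked to 31 bits (assumed addressing mode).
std::uint32_t effectiveAddress(const SiInstruction& si, const std::uint32_t gpr[16])
{
    const std::uint32_t base = (si.base == 0) ? 0u : gpr[si.base];
    return (base + si.displacement) & 0x7FFFFFFFu;
}
```

Decoding 0x92002000 gives opcode 92, immediate 00, base register 2 and displacement 0. With the GPR values from the dump, where GPR2 is 00000000, the effective address comes out as 0.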

That's all. What remains is to find out from the source code why this happens, why value points to 0, but that is another topic.

References: z/Architecture Principles of Operation
