JunOS update on EX4500 switches in VirtualChassis - what could go wrong? Part 2

    Without further delay, here is the second part of the earlier post. Thank you for getting the first part published; it is nice that the article interested you and that the topic has found a continuation.

    Let me remind you that the last part ended with one of the VC devices failing to come up properly after the reboot. As one of the comments rightly noted, it read as if I went home after that. No: the first part covers about 20 minutes of my almost five-hour saga. Buckled up? Let's go!

    After the reboot it is not clear what happened, or whether anything happened at all, but most importantly, client traffic is flowing. I am connected through the dedicated management Ethernet interface, and the first surprise is that member1 has become the master RE:

    login as: user
    user@switch password:
    --- JUNOS 12.3R12.4 built 2016-01-20 04:27:51 UTC
    {master:1}
    user@switch>

    In principle this happens and is nothing to worry about: the devices are identical, the VC configuration is pre-provisioned, and either of them can be the master. The OS has been updated, which is good. But this is not good:

    user@switch> show chassis routing-engine
    Routing Engine status:
    Slot 1:
    Current state Master
    DRAM 1024
    Memory utilization 45 percent
    CPU utilization:
    User 14 percent
    Background 0 percent
    Kernel 11 percent
    Interrupt 1 percent
    Idle 74 percent
    Model EX4500-40F
    Serial ID
    Start time 2016-06-02 01:28:45
    Uptime 34 minutes, 55 seconds
    Last reboot reason Router rebooted after a normal shutdown.
    Load averages: 1 minute 5 minute 15 minute
    0.59 0.80 0.66
    {master:1}
    user@switch>

    The device sees only one RE, and there should be two. Further investigation confirms that the unlit LEDs are dark for a reason:

    user@switch> show virtual-chassis
    Preprovisioned Virtual Chassis
    Virtual Chassis ID:
    Virtual Chassis Mode: Enabled
    Mstr Mixed Neighbor List
    Member ID Status Serial No Model prio Role Mode ID Interface
    0 (FPC 0) Inactive ХХХХХ ex4500-40f 129 Linecard N 1 vcp-1
    1 vcp-0
    1 (FPC 1) Prsnt ХХХХХ ex4500-40f 129 Master* N 0 vcp-1
    0 vcp-0
    {master:1}
    user@switch>

    The first device, member0, is recognized as Linecard and has the status Inactive, which means it is not taking an active part in the virtual chassis. The dedicated stack interfaces (vcp-0 and vcp-1) are up, so I can try connecting to it locally:

    Connection and verification
    {master:1}
    user@switch> request session member 0
    --- JUNOS 11.1R3.5 built 2011-06-25 01:18:46 UTC
    {linecard:0}
    user@switch> show system storage
    fpc0:
    - Filesystem Size Used Avail Capacity Mounted on
    /dev/da0s1a 370M 142M 198M 42% /
    devfs 1.0K 1.0K 0B 100% /dev
    /dev/md0 37M 37M 0B 100% /packages/mnt/jbase
    /dev/md1 12M 7.3M 3.6M 67% /packages/mfs-jcrypto-ex
    /dev/md2 22M 22M 0B 100% /packages/mnt/jcrypto-ex-11.1R3.5
    /dev/md3 8.7M 4.1M 3.9M 51% /packages/mfs-jdocs-ex
    /dev/md4 6.3M 6.3M 0B 100% /packages/mnt/jdocs-ex-11.1R3.5
    /dev/md5 64M 61M -1.4M 102% /packages/mfs-jkernel-ex
    /dev/md6 162M 162M 0B 100% /packages/mnt/jkernel-ex-11.1R3.5
    /dev/md7 13M 8.5M 3.5M 71% /packages/mfs-jpfe-ex45x
    /dev/md8 24M 24M 0B 100% /packages/mnt/jpfe-ex45x-11.1R3.5
    /dev/md9 20M 15M 2.9M 84% /packages/mfs-jroute-ex
    /dev/md10 47M 47M 0B 100% /packages/mnt/jroute-ex-11.1R3.5
    /dev/md11 16M 11M 3.2M 78% /packages/mfs-jswitch-ex
    /dev/md12 35M 35M 0B 100% /packages/mnt/jswitch-ex-11.1R3.5
    /dev/md13 12M 7.8M 3.6M 68% /packages/mfs-jweb-ex
    /dev/md14 22M 22M 0B 100% /packages/mnt/jweb-ex-11.1R3.5
    /dev/md15 126M 8.0K 116M 0% /tmp
    /dev/da0s3e 243M 4.4M 219M 2% /var
    /dev/da0s3d 727M 130K 668M 0% /var/tmp
    /dev/da0s4d 123M 492K 113M 0% /config
    /dev/md16 118M 14M 95M 13% /var/rundb
    procfs 4.0K 4.0K 0B 100% /proc
    /var/jail/etc 243M 4.4M 219M 2% /packages/mnt/jweb-ex-11.1R3.5/jail/var/etc
    /var/jail/run 243M 4.4M 219M 2% /packages/mnt/jweb-ex-11.1R3.5/jail/var/run
    /var/jail/tmp 243M 4.4M 219M 2% /packages/mnt/jweb-ex-11.1R3.5/jail/var/tmp
    /var/tmp 727M 130K 668M 0% /packages/mnt/jweb-ex-11.1R3.5/jail/var/tmp/uploads
    devfs 1.0K 1.0K 0B 100% /packages/mnt/jweb-ex-11.1R3.5/jail/dev

    fpc1:
    - Filesystem Size Used Avail Capacity Mounted on
    /dev/da0s2a 363M 130M 204M 39% /
    devfs 1.0K 1.0K 0K 100% /dev
    /dev/md0 69M 69M 0B 100% /packages/mnt/jbase
    /dev/md1 5.8M 1.1M 4.2M 21% /packages/mfs-fips-mode-powerpc
    /dev/md2 2.9M 2.9M 0B 100% /packages/mnt/fips-mode-powerpc-12.3R12.4
    /dev/md3 9.1M 4.4M 3.9M 53% /packages/mfs-jcrypto-ex
    /dev/md4 12M 12M 0B 100% /packages/mnt/jcrypto-ex-12.3R12.4
    /dev/md5 8.1M 3.5M 4.0M 47% /packages/mfs-jdocs-ex
    /dev/md6 6.2M 6.2M 0B 100% /packages/mnt/jdocs-ex-12.3R12.4
    /dev/md7 43M 39M 616K 98% /packages/mfs-jkernel-ex
    /dev/md8 109M 109M 0B 100% /packages/mnt/jkernel-ex-12.3R12.4
    /dev/md9 12M 7.9M 3.6M 69% /packages/mfs-jpfe-ex45x
    /dev/md10 22M 22M 0B 100% /packages/mnt/jpfe-ex45x-12.3R12.4
    /dev/md11 17M 12M 3.2M 79% /packages/mfs-jroute-ex
    /dev/md12 38M 38M 0B 100% /packages/mnt/jroute-ex-12.3R12.4
    /dev/md13 12M 7.2M 3.6M 67% /packages/mfs-jswitch-ex
    /dev/md14 21M 21M 0B 100% /packages/mnt/jswitch-ex-12.3R12.4
    /dev/md15 14M 9.5M 3.4M 73% /packages/mfs-jweb-ex
    /dev/md16 25M 25M 0B 100% /packages/mnt/jweb-ex-12.3R12.4
    /dev/da0s3e 243M 20M 204M 9% /var
    /dev/md17 252M 12K 232M 0% /tmp
    /dev/da0s3d 727M 107M 561M 16% /var/tmp
    /dev/da0s4d 123M 494K 113M 0% /config
    /dev/md18 118M 22M 86M 20% /var/rundb
    procfs 4.0K 4.0K 0B 100% /proc
    /var/jail/etc 243M 20M 204M 9% /packages/mnt/jweb-ex-12.3R12.4/jail/var/etc
    /var/jail/run 243M 20M 204M 9% /packages/mnt/jweb-ex-12.3R12.4/jail/var/run
    /var/jail/tmp 243M 20M 204M 9% /packages/mnt/jweb-ex-12.3R12.4/jail/var/tmp
    /var/tmp 727M 107M 561M 16% /packages/mnt/jweb-ex-12.3R12.4/jail/var/tmp/uploads
    devfs 1.0K 1.0K 0B 100% /packages/mnt/jweb-ex-12.3R12.4/jail/dev

    {linecard:0}
    user@switch> exit
    rlogin: connection closed
    {master:1}
    user@switch>

    That's it! The OS was updated only on the second device, while the first still runs the old one (compare the version in the package names for fpc0 and fpc1), so the VC logic deactivated it. One way or another, the device is reachable and I can try to update it again. One problem: during the update I followed the Juniper guides and put the image in /var/tmp, which is now empty, so the image has to be uploaded again. I focus on this switch and try several times to update and reboot it alone (member1 continues to work):

    {master:1}
    user@switch> request system software add /var/tmp/jinstall-XXX.tgz validate member 0
    user@switch> request system reboot member 0
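
    The version mismatch found above (11.1R3.5 on member 0, 12.3R12.4 on member 1) was spotted by eye from the login banners. As a minimal sketch, the same check can be scripted against captured banner lines. This is not a Juniper tool: the names `banner_version` and `versions_match` are hypothetical, and the pattern assumes release strings shaped like the two seen in this post.

```python
import re

# Matches the release string in a login banner such as
# "--- JUNOS 12.3R12.4 built 2016-01-20 04:27:51 UTC".
# Assumption: releases look like <major>.<minor><letter><number>[.<number>].
BANNER_RE = re.compile(r"JUNOS\s+(\d+\.\d+[A-Z]\d+(?:\.\d+)?)\s+built")


def banner_version(banner):
    """Return the release string from a banner line, or None if absent."""
    match = BANNER_RE.search(banner)
    return match.group(1) if match else None


def versions_match(banners):
    """True when every member banner reports the same, parseable release."""
    versions = {banner_version(b) for b in banners.values()}
    return len(versions) == 1 and None not in versions


# Banner lines copied from the sessions shown earlier in the post.
banners = {
    "member0": "--- JUNOS 11.1R3.5 built 2011-06-25 01:18:46 UTC",
    "member1": "--- JUNOS 12.3R12.4 built 2016-01-20 04:27:51 UTC",
}

print({member: banner_version(line) for member, line in banners.items()})
print("versions match:", versions_match(banners))
```

    Running a check like this right after the reboot would have flagged the half-finished upgrade before looking at LEDs or storage listings.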

    At the end of every upload and update attempt I see:
    Installing disk0s3d:/jinstall-ex-4500-12.3R12.4-domestic-signed.tgz
    Verified jinstall-ex-4500-12.3R12.4-domestic.tgz signed by PackageProduction_12_3_0
    mode = 040700, inum = 38, fs = /instrootmnt/var
    panic: ffs_valloc: dup alloc
    ###Entering boot mastership relinquish phase
    KDB: enter: panic
    ###Entering boot mastership relinquish phase
    [thread pid 316 tid 100041 ]
    Stopped at kdb_enter+0x1a0: addis r3, r0, -0x7fa4
    db>

    Even with my limited knowledge of the Unix that JunOS is based on, the line “KDB: enter: panic” is not encouraging. On top of that, the system drops into the kernel debugger (db>), and this is very bad. For reference, Juniper devices have several console modes: the familiar CLI, where a working device is configured and from which you can reach the Unix shell as root, among other things; the loader> bootloader mode for restoring and installing the operating system image, roughly corresponding to rommon> on Cisco; and the db> debug mode, which the system enters after a kernel panic, for example when there are problems with the physical components of the device. There is very little you can do in this mode unless you are a Juniper TAC engineer. At that moment I don't really understand what it is and, as a proud Windows user, I try to click "Next":

    db> help
    DDB Quick Help
    -------------------
    Type 'c' to continue, 'reset' or 'panic' to restart.

    print p examine x search set write
    w delete d break dwatch watch dhwatch
    hwatch step s continue c until next
    match trace alltrace where bt call show
    ps gdb reset kill watchdog thread panic
    ddbdumpsys dumpsys halt reboot
    db> c
    Uptime: 2m41s
    Cannot dump. No dump device defined.
    Automatic reboot in 15 seconds - press a key on the console to abort
    Rebooting...

    ...Lots of output during the reboot...

    ***** FILE SYSTEM MARKED CLEAN *****
    switch (ttyu0)
    login: user
    Logging to master
    ...
    Connection to master failed, enabling local login
    Password:
    --- JUNOS 11.1R3.5 built 2011-06-25 01:18:46 UTC
    {linecard:0}
    user@switch>
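
    The three console modes described above can be told apart by their prompts alone, which is handy when going through a captured console log afterwards. The following is a rough sketch under that assumption; `console_mode` and `last_mode` are hypothetical helper names, and the CLI pattern simply assumes a `user@host>` style prompt as seen in this post.

```python
import re

# Prompt patterns for the three console modes described in the text.
# The specific prompts are tested before the generic CLI one.
PROMPTS = [
    ("kernel debugger", re.compile(r"^db>(\s|$)")),
    ("loader", re.compile(r"^loader>(\s|$)")),
    ("cli", re.compile(r"^\S+@\S+>(\s|$)")),
]


def console_mode(line):
    """Classify a console line by its leading prompt; None if there is none."""
    stripped = line.strip()
    for mode, pattern in PROMPTS:
        if pattern.match(stripped):
            return mode
    return None


def last_mode(console_log):
    """Return the mode of the last prompt seen in a captured console log."""
    mode = None
    for line in console_log.splitlines():
        mode = console_mode(line) or mode
    return mode


# Excerpt of the failed install shown above.
PANIC_LOG = """\
panic: ffs_valloc: dup alloc
KDB: enter: panic
[thread pid 316 tid 100041 ]
db>
"""

print(last_mode(PANIC_LOG))
```

    A log that ends in the kernel debugger is a strong hint that the console needs attention before any further upgrade attempt.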

    A miracle: the system boots, albeit with the old version. At the time I did not realize that this old version was loaded from the backup partition (the alternate slice), because the updated version had been written to the primary partition and, in my case, could not boot from it. This is why it is so important to update the bootloader whenever possible: it is one more lifeline in case of problems. As an aside, note the lines “Logging to master ... Connection to master failed”. All devices joined in a VC share a single management console, so when you connect, for example over SSH, you land straight on the master's console. Since the VC is not operational in my case, I end up in the local management mode of this device.

    Along the way I come up with the idea of uploading the OS image to the working RE and copying it between the VC members: it is faster, and I no longer have to keep switching to WinSCP. This works even in my case, since the links between the devices are up.

    user@switch> file copy fpc1:/var/tmp/jinstall-XXX.tgz fpc0:/var/tmp/jinstall-XXX.tgz

    Nevertheless, every attempt to update and reboot gives the same result: I end up in the kernel debugger, after which the old version can be booted. The problem is evidently persistent, and repeating the same steps will get me nowhere. Then I get another idea: after all, I have a device with a working system (member1) and a flash drive onto which I can write a snapshot and boot from it. So I do:

    {master:1}
    umass1: SanDisk Corporation U3 Cruzer Micro, rev 2.00/0.10, addr 4
    da1 at umass-sim1 bus 1 target 0 lun 0
    da1: Removable Direct Access SCSI-2 device
    da1: 40.000MB/s transfers
    da1: 973MB (1994385 512 byte sectors: 64H 32S/T 973C)
    user@switch> request system snapshot local partition media external
    user@switch> show system snapshot media external
    fpc0:
    --------------------------------------------------------------------------
    error: external media missing or invalid

    fpc1:
    --------------------------------------------------------------------------
    Information for snapshot on external (/dev/da1s1a) (backup)
    Creation date: Jun 2 02:28:20 2016
    JUNOS version on snapshot:
    jbase : 11.1R3.5
    jkernel-ex: 11.1R3.5
    jcrypto-ex: 11.1R3.5
    jdocs-ex: 11.1R3.5
    jswitch-ex: 11.1R3.5
    jpfe-ex45x: 11.1R3.5
    jroute-ex: 11.1R3.5
    jweb-ex: 11.1R3.5
    Information for snapshot on external (/dev/da1s2a) (primary)
    Creation date: Jun 2 02:29:21 2016
    JUNOS version on snapshot:
    jbase : ex-12.3R12.4
    jkernel-ex: 12.3R12.4
    jcrypto-ex: 12.3R12.4
    jdocs-ex: 12.3R12.4
    jswitch-ex: 12.3R12.4
    jpfe-ex45x: 12.3R12.4
    jroute-ex: 12.3R12.4
    jweb-ex: 12.3R12.4
    fips-mode-powerpc: 12.3R12.4
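
    The snapshot listing above is regular enough to parse. As a hedged sketch (the function name `parse_snapshot` and both patterns are my own assumptions based only on the output shown here), the slices and their package releases can be collected like this:

```python
import re

# Header line, e.g. "Information for snapshot on external (/dev/da1s1a) (backup)".
SLICE_RE = re.compile(r"Information for snapshot on \S+ \((/dev/\S+)\) \((\w+)\)")
# Package line, e.g. "jkernel-ex: 12.3R12.4" or "jbase : ex-12.3R12.4".
PACKAGE_RE = re.compile(r"^\s*([\w-]+)\s*:\s*\S*?(\d+\.\d+[A-Z]\d+(?:\.\d+)?)\s*$")


def parse_snapshot(text):
    """Map each slice role (primary/backup) to its device and package releases."""
    slices = {}
    current = None
    for line in text.splitlines():
        header = SLICE_RE.search(line)
        if header:
            device, role = header.groups()
            current = {"device": device, "packages": {}}
            slices[role] = current
            continue
        package = PACKAGE_RE.match(line)
        if package and current is not None:
            name, release = package.groups()
            current["packages"][name] = release
    return slices


# Abbreviated copy of the fpc1 section shown above.
SAMPLE = """\
Information for snapshot on external (/dev/da1s1a) (backup)
Creation date: Jun 2 02:28:20 2016
JUNOS version on snapshot:
jbase : 11.1R3.5
jkernel-ex: 11.1R3.5
Information for snapshot on external (/dev/da1s2a) (primary)
Creation date: Jun 2 02:29:21 2016
JUNOS version on snapshot:
jbase : ex-12.3R12.4
jkernel-ex: 12.3R12.4
"""

snapshot = parse_snapshot(SAMPLE)
for role, info in snapshot.items():
    releases = sorted(set(info["packages"].values()))
    print(role, info["device"], releases)
```

    Listing the distinct releases per slice makes it obvious at a glance which slice carries the new system and which one still holds the old release.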

    Note the messages printed when the flash drive is connected: it is detected as system device da1, which will matter later. The snapshot on the external flash drive mirrors the internal storage of the device: version 12.3 on the primary partition (/dev/da1s2a) and 11.1 on the backup (/dev/da1s1a). The slice names can also come in handy if you want to boot the system from a specific partition. I insert the USB flash drive into the problem device and continue:

    user@switch> request session member 0
    --- JUNOS 11.1R3.5 built 2011-06-25 01:18:46 UTC
    {linecard:0}
    user@switch> request system reboot member 0 media external
    Reboot the system ? [yes,no] (no) yes

    Here, again as a precaution, I opened a local session on the device; most likely member0 could have been rebooted from the master's console just as well. After the restart I see an endlessly repeating sequence:

    U-Boot 1.1.6 (Mar 26 2011 - 04:34:19)
    Board: EX4500-40F 10.4
    EPLD: Version 6.2 (0x81)
    DRAM: Initializing (1024 MB)
    FLASH: 8 MB
    Firmware Version: 01.00.00
    USB: scanning bus for devices... 3 USB Device(s) found
    scanning bus for storage devices... 1 Storage Device(s) found
    ELF file is 32 bit
    Consoles: U-Boot console
    FreeBSD/PowerPC U-Boot bootstrap loader, Revision 2.4
    (hmerge@svl-junos-pool130.juniper.net, Sat Mar 26 02:46:28 PDT 2011)
    Memory: 1024MB
    bootsequencing is enabled
    bootsuccess is not set
    new boot device = disk2
    can't load '/kernel'
    can't load '/kernel.old'
    Press Enter to stop auto bootsequencing and to enter loader prompt.

    Watchdog timed out. Resetting the board.

    The switch gets no further than these repeating lines. What the?!? It can't find the kernel? After a while I notice the penultimate line, press Enter and land in the loader:

    loader> ?
    Available commands:
    bcachestat get disk block cache stats
    boot boot a file or loaded kernel
    autoboot boot automatically after a delay
    help detailed help
    ? list commands
    show show variable(s)
    set set a variable
    unset unset a variable
    echo echo arguments
    read read input from the terminal
    more show contents of a file
    nextboot set next boot device
    lsdev list all devices
    install install JUNOS
    include read commands from a file
    ls list files
    load load a kernel or module
    unload unload all modules
    lsmod list loaded modules
    export export variables to U-Boot environment
    save save U-Boot environment
    heap show heap usage

    Not much to go on, but still better than a reboot loop. The loader mode is designed precisely for restoring the system, so I am in the right place. I have now been at it for over 2 hours... I try different locations for the system image and different install options, without result.

    loader> install /var/tmp/jinstall-ex-4500-12.3R12.4-domestic-signed.tgz
    invalid URL
    loader> install --format file:///jinstall-ex-4500-12.3R12.4-domestic-signed.tgz
    cannot open package (error 22)
    loader> install --format file:///jinstall-ex-4500-12.3R12.4-domestic-signed.tgz
    Device NOT ready
    Request Sense returned 06 28 00
    cannot open package (error 5)

    These commands should actually work, but for some reason they don't: either I was no longer thinking straight by then, or it was something else. I see the same reboot loop and the same complaints about a missing kernel. During the constant reboots, another interesting detail catches my eye:

    Firmware Version: 01.00.00
    USB: scanning bus for devices... 3 USB Device(s) found
    scanning bus for storage devices... 1 Storage Device(s) found
    ELF file is 32 bit
    Consoles: U-Boot console
    FreeBSD/PowerPC U-Boot bootstrap loader, Revision 2.4
    (hmerge@svl-junos-pool130.juniper.net, Sat Mar 26 02:46:28 PDT 2011)
    Memory: 1024MB
    bootsequencing is enabled
    bootsuccess is not set
    new boot device = disk2

    At that moment this is nothing more than a hunch, but bearing in mind that Juniper numbers devices from 0, “disk2” looks strange to me: I have a single flash drive. Besides, when I inserted the flash drive it was detected as da1. Looking back a little, you can see that the device tried to boot from disk2 right after the reboot from the console (when I specified the external USB flash drive as the boot device), but I had not noticed it until now. I return to the loader and confirm my fears: there is no disk2, and the flash drive is device zero:

    loader> lsdev
    disk devices:
    disk0 - USB storage device 0
    net devices:
    net0:
    loader> nextboot disk0:
    loader> reboot
    Resetting...
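
    The comparison just made by hand, the boot device named in the boot log versus the disks that `lsdev` actually lists, can be sketched in a few lines. The helper names `disks_from_lsdev` and `requested_disk` are hypothetical, and the parsing assumes only the output format seen in this post.

```python
import re


def disks_from_lsdev(lsdev_output):
    """Return the disk names listed by the loader's lsdev, e.g. {"disk0"}."""
    return set(re.findall(r"^\s*(disk\d+)\b", lsdev_output, flags=re.MULTILINE))


def requested_disk(boot_log):
    """Return the disk named in the last 'new boot device = ...' line, if any."""
    found = re.findall(r"new boot device = (disk\d+)", boot_log)
    return found[-1] if found else None


# Excerpts copied from the boot loop and the loader session above.
BOOT_LOG = """\
bootsequencing is enabled
bootsuccess is not set
new boot device = disk2
can't load '/kernel'
can't load '/kernel.old'
"""

LSDEV = """\
disk devices:
disk0 - USB storage device 0
net devices:
net0:
"""

wanted = requested_disk(BOOT_LOG)
present = disks_from_lsdev(LSDEV)
print(f"boot log asks for {wanted}, lsdev shows {sorted(present)}")
print("boot device present:", wanted in present)
```

    The mismatch is exactly the situation in the text: the boot sequence points at a disk the loader does not list.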

    Done? Not a chance! The system again tries to boot from disk2, but now I feel I am on the right track. Along the way I go through the neighbouring options with different slices on the flash drive (nextboot diskXsY), with no result. Almost desperate by now, I find information that the boot device should be set as an environment variable from U-Boot mode. I don't know how best to describe this fourth mode or what else can be done there, but you get into it by interrupting the boot process with Ctrl+C at the very beginning, while the system is scanning for USB devices (USB: scanning bus for devices...). In the original output the first line shows INTERRUPT inside <> delimiters, but they broke the markup and fonts, so I removed the delimiters:

    => INTERRUPT
    => setenv loaddev disk1
    => saveenv
    Saving Environment to Flash...
    . done
    Un-Protected 1 sectors
    Erasing Flash...
    . done
    Erased 1 sectors
    Writing to Flash... writing to flash...
    done
    . done
    Protected 1 sectors
    => reset

    ...Reboot...

    ...
    Boot media /dev/da1 has dual root support
    WARNING: JUNOS versions running on dual partitions are not same
    ** /dev/da1s1a
    FILE SYSTEM CLEAN; SKIPPING CHECKS
    clean, 274948 free (84 frags, 34358 blocks, 0.0% fragmentation)
    switch (ttyu0)
    login: user
    Logging to master
    ...
    Connection to master failed, enabling local login
    Password:

    --- JUNOS 12.3R12.4 built 2016-01-20 04:27:51 UTC
    warning: This chassis is operating in a non-master role as part of a virtual-chassis (VC) system.
    warning: Use of interactive commands should be limited to debugging and VC Port operations.
    warning: Full CLI access is provided by the Virtual Chassis Master (VC-M) chassis.
    warning: The VC-M can be identified through the show virtual-chassis status command executed at this console.
    warning: Please logout and log into the VC-M to use CLI.
    {linecard:1}
    user@switch>

    WARNING: cli has been replaced by an updated version:
    CLI release 12.3R12.4 built by builder on 2016-01-20 03:55:45 UTC
    Restart cli using the new version ? [yes,no] (yes)

    Restarting cli ...
    {master:0}
    user@switch>

    Let's go over what I saw after the reboot:

    “WARNING: JUNOS versions running on dual partitions are not same” is expected and nothing to worry about, because the new version exists only on the primary slice of the device.

    “Connection to master failed ...” and “warning: This chassis is operating in a non-master role ...” are nothing to worry about either, since the VC needs time to restore communication between members and synchronize the configuration.

    After a few minutes of waiting, the system itself offers to restart the CLI (WARNING: cli has been replaced by an updated version), and now the new version is running on the correct RE.

    Let's check:

    user@switch> show chassis routing-engine
    Routing Engine status:
    Slot 0:
    Current state Master
    DRAM 1024
    Memory utilization 50 percent
    CPU utilization:
    User 43 percent
    Background 0 percent
    Kernel 24 percent
    Interrupt 1 percent
    Idle 32 percent
    Model EX4500-40F
    Serial ID
    Start time 2016-06-02 03:43:20
    Uptime 3 minutes, 22 seconds
    Last reboot reason Router rebooted after a normal shutdown.
    Load averages: 1 minute 5 minute 15 minute
    2.40 1.12 0.46
    Routing Engine status:
    Slot 1:
    Current state Backup
    DRAM 1024
    Memory utilization 44 percent
    CPU utilization:
    User 40 percent
    Background 0 percent
    Kernel 30 percent
    Interrupt 1 percent
    Idle 28 percent
    Model EX4500-40F
    Serial ID
    Start time 2016-06-02 01:28:45
    Uptime 2 hours, 17 minutes, 57 seconds
    Last reboot reason Router rebooted after a normal shutdown.
    Load averages: 1 minute 5 minute 15 minute
    0.49 0.46 0.44

    {master:0}
    user@switch> show virtual-chassis

    Preprovisioned Virtual Chassis
    Virtual Chassis ID:
    Virtual Chassis Mode: Enabled
    Mstr Mixed Neighbor List
    Member ID Status Serial No Model prio Role Mode ID Interface
    0 (FPC 0) Prsnt ХХХХ ex4500-40f 129 Master* N 1 vcp-1
    1 vcp-0
    1 (FPC 1) Prsnt ХХХХ ex4500-40f 129 Backup N 0 vcp-1
    0 vcp-0

    {master:0}
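
    The difference between the broken and the healthy `show virtual-chassis` output comes down to the Status and Role columns. Below is a minimal sketch of that check, assuming the row layout seen in this post; `parse_members` and `unhealthy` are hypothetical names, and the serial numbers are placeholders.

```python
import re

# A member row starts with "<id> (FPC <n>) <status> ...".
MEMBER_RE = re.compile(r"^\s*(\d+) \(FPC \d+\)\s+(\S+)\s+(.*)$")
ROLES = ("Master", "Backup", "Linecard")


def parse_members(output):
    """Return {member_id: (status, role)} from show virtual-chassis text."""
    members = {}
    for line in output.splitlines():
        match = MEMBER_RE.match(line)
        if not match:
            continue  # neighbor continuation lines and headers are skipped
        member_id, status, rest = match.groups()
        role = next((r for r in ROLES if r in rest), None)
        members[int(member_id)] = (status, role)
    return members


def unhealthy(members):
    """Members whose status is anything other than 'Prsnt'."""
    return sorted(m for m, (status, _) in members.items() if status != "Prsnt")


BEFORE = """\
0 (FPC 0) Inactive XXXXX ex4500-40f 129 Linecard N 1 vcp-1
1 vcp-0
1 (FPC 1) Prsnt XXXXX ex4500-40f 129 Master* N 0 vcp-1
0 vcp-0
"""

AFTER = """\
0 (FPC 0) Prsnt XXXXX ex4500-40f 129 Master* N 1 vcp-1
1 vcp-0
1 (FPC 1) Prsnt XXXXX ex4500-40f 129 Backup N 0 vcp-1
0 vcp-0
"""

print("before:", parse_members(BEFORE), "unhealthy:", unhealthy(parse_members(BEFORE)))
print("after: ", parse_members(AFTER), "unhealthy:", unhealthy(parse_members(AFTER)))
```

    An empty unhealthy list together with one Master and one Backup role corresponds to the state shown in the final output above.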

    Victory! Complete and unconditional! To say I was pleased with myself is to say nothing: my ego simply went through the roof. The work took about 4 hours, but that hardly mattered, since the clients did not notice a thing. I not only awarded myself a virtual medal but also saved my company a lot of money. I took in so many impressions during those 4 hours that it then took many days (and beers) to put everything together and see the whole picture.

    All that remains now is to take snapshots to the internal storage: to the primary partition right away and, after a week or two, to the backup one. Why wait a week? To let the new version prove itself in production, since booting the old version of the system from the backup partition is much easier than downgrading the entire device.

    Let's analyze the situation.

    According to Juniper TAC, the upgrade problems were caused by damage to the primary boot partition. Nothing can be done about it, and the switch has to be replaced under warranty. I still very much hope that the problem was caused by file system damage (an unclean reboot or the like) and was fixed during the recovery, when I set the environment variable (Un-Protected 1 sectors, Erasing Flash... done).

    Why on earth the device wanted to boot from disk2, when nobody had explicitly pointed it there and no such disk existed in the system, remains unclear; TAC also found it hard to comment. The logs even show disk2 appearing out of nowhere (note how new boot device = disk1s2 changes to new boot device = disk2):

    Change boot device
    user@switch> request system reboot member 0 media external
    Reboot the system ? [yes,no] (no) yes
    Rebooting fpc0
    *** FINAL System shutdown message from root@switch ***
    System going down IMMEDIATELY
    {linecard:0}
    iuriia@CORE> JWaiting (max 300 seconds) for system process `vnlru_mem' to stop...done
    Waiting (max 300 seconds) for system process `vnlru' to stop...done
    Waiting (max 300 seconds) for system process `bufdaemon' to stop...done
    Waiting (max 300 seconds) for system process `syncer' to stop...
    Syncing disks, vnodes remaining...2 2 2 0 1 1 1 0 0 0 0 0 done
    syncing disks... All buffers synced.
    Uptime: 23m53s
    recorded reboot as normal shutdown
    Rebooting...
    U-Boot 1.1.6 (Mar 26 2011 - 04:34:19)
    Board: EX4500-40F 10.4
    EPLD: Version 6.2 (0x82)
    DRAM: Initializing (1024 MB)
    FLASH: 8 MB
    Firmware Version: 01.00.00
    USB: scanning bus for devices... 3 USB Device(s) found
    scanning bus for storage devices... 1 Storage Device(s) found
    ELF file is 32 bit
    Consoles: U-Boot console
    FreeBSD/PowerPC U-Boot bootstrap loader, Revision 2.4
    (hmerge@svl-junos-pool130.juniper.net, Sat Mar 26 02:46:28 PDT 2011)
    Memory: 1024MB
    bootsequencing is enabled
    bootsuccess is set
    new boot device = disk1s2:

    can't load '/kernel'
    can't load '/kernel.old'
    Press Enter to stop auto bootsequencing and to enter loader prompt.
    Watchdog timed out. Resetting the board.
    U-Boot 1.1.6 (Mar 26 2011 - 04:34:19)
    Board: EX4500-40F 10.4
    EPLD: Version 6.2 (0x81)
    DRAM: Initializing (1024 MB)
    FLASH: 8 MB
    Firmware Version: 01.00.00
    USB: scanning bus for devices... 3 USB Device(s) found
    scanning bus for storage devices... 1 Storage Device(s) found
    ELF file is 32 bit
    Consoles: U-Boot console
    FreeBSD/PowerPC U-Boot bootstrap loader, Revision 2.4
    (hmerge@svl-junos-pool130.juniper.net, Sat Mar 26 02:46:28 PDT 2011)
    Memory: 1024MB
    bootsequencing is enabled
    bootsuccess is not set
    new boot device = disk2
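
    The change from disk1s2 to disk2 is easy to miss in a long console capture. One way to make it stand out, sketched here under the assumption that every boot attempt starts with a "U-Boot <version>" banner as in the log above (`boot_attempts` is a hypothetical helper), is to summarise each attempt separately:

```python
import re


def boot_attempts(console_log):
    """Split a console log on U-Boot banners and summarise each boot attempt."""
    attempts = []
    # Everything before the first banner is not part of any boot attempt.
    chunks = re.split(r"^\s*U-Boot \d", console_log, flags=re.MULTILINE)[1:]
    for chunk in chunks:
        success = re.search(r"bootsuccess is (not set|set)", chunk)
        device = re.search(r"new boot device = (\S+)", chunk)
        attempts.append({
            "bootsuccess": (success.group(1) == "set") if success else None,
            "device": device.group(1).rstrip(":") if device else None,
        })
    return attempts


# Abbreviated copy of the two boot attempts shown above.
CONSOLE_LOG = """\
Rebooting...
U-Boot 1.1.6 (Mar 26 2011 - 04:34:19)
bootsequencing is enabled
bootsuccess is set
new boot device = disk1s2:
can't load '/kernel'
Watchdog timed out. Resetting the board.
U-Boot 1.1.6 (Mar 26 2011 - 04:34:19)
bootsequencing is enabled
bootsuccess is not set
new boot device = disk2
"""

attempts = boot_attempts(CONSOLE_LOG)
for number, attempt in enumerate(attempts, start=1):
    print(number, attempt)
```

    Printed side by side, the first attempt shows bootsuccess set with disk1s2 and the second shows it cleared with disk2, which is the transition the text points at.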

    This problem alone added an hour and a half to the job. Yes, the switch also complains about a missing kernel, but why the system then tries to use disk2, when the loader does not even see such a disk, is unclear. I can only assume that when there are boot problems the device cycles through the disks, but again, the system never saw a disk2 device. How and why the very same flash drive later booted the device successfully also raises questions.

    It is possible that I made a mistake here:

    loader> nextboot disk0:
    loader> reboot

    because the loader's settings are lost on restart. I should have tried “boot” instead of “reboot”, but at the time I didn't.

    The new version of the system noticeably increased the load on the device. On the old version the CPU load during the day was about 27-30%; after the update it is 45-48%, although neither the fairly simple configuration of the device nor the traffic profile has changed. Several remote sessions with Juniper TAC failed to establish the cause: there were theories about a memory leak and similar problems, but none were confirmed. Strange, but I had to accept it as a fact.
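
    To put the load figures above into perspective, here is a small worked calculation. Taking the midpoint of each range is my own simplification, not something measured.

```python
def midpoint(low, high):
    """Middle of a percentage range."""
    return (low + high) / 2


before = midpoint(27, 30)  # daytime CPU load on 11.1R3.5, percent
after = midpoint(45, 48)   # daytime CPU load on 12.3R12.4, percent

absolute = after - before            # increase in percentage points
relative = absolute / before * 100   # increase relative to the old load

print(f"{before:.1f}% -> {after:.1f}%: +{absolute:.1f} points, +{relative:.0f}% relative")
```

    In other words, the same configuration and traffic cost roughly 18 percentage points more CPU, which is well over half again the old load.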

    An attentive reader may have noticed that the device names shown in the loader (disk0) and those used to boot successfully (disk1 and then /dev/da1s1a) differ. I won't venture to say what this is connected with. My guess is that the names change depending on how far the system has booted: in the loader you get one set of device names, coming from db> there will be others, and from the CLI devices are referred to simply as “media external” and “media internal”. So far this is only an assumption.

    I had put most of the above notes and commands together in a guide long before the update, and after that I periodically reread it and extended it whenever a possible problem occurred to me. The only things missing from it were the db> mode and the => setenv procedure. Clearly, I could not foresee everything, and some things did not work as they should. But honestly, without this guide and the time spent rehearsing it in my head, I would have given up. Besides, this was night work, and my mind was not at its sharpest.

    Backups: although they did not help me much, having them put my conscience and soul at ease. In the worst case, even if the entire internal storage had been damaged, I would have pasted the text config in through the console. These two things guarantee that you will concentrate on the work itself rather than on figuring out how to return everything to its original state and what to do next.

    Among the significant shortcomings: during the work I ran several PuTTY tabs that all logged to a single file. Afterwards it was very hard to sort everything out by device and timestamp. It would have been better to use SecureCRT or to run a separate window per device, especially since I had the means to do so.
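
    If the damage is already done and everything sits in one file, the `{role:member}` banners that the CLI prints before each prompt offer a rough way to pull the sessions apart. This is only a heuristic sketch: `split_by_member` is a hypothetical helper, and it assumes the tabs did not interleave their output in the middle of a command.

```python
import re
from collections import defaultdict

# Banner such as "{master:1}" or "{linecard:0}" on a line of its own.
BANNER_RE = re.compile(r"^\s*\{(\w+):\s*(\d+)\}\s*$")


def split_by_member(log_text):
    """Group log lines under the most recent {role:member} banner seen."""
    sections = defaultdict(list)
    current = "unknown"  # lines seen before the first banner
    for line in log_text.splitlines():
        banner = BANNER_RE.match(line)
        if banner:
            current = f"{banner.group(1)}:{banner.group(2)}"
            continue
        sections[current].append(line)
    return dict(sections)


# Made-up combined log in the style of the sessions in this post.
COMBINED = """\
{master:1}
user@switch> show virtual-chassis
{linecard:0}
user@switch> show system storage
{master:1}
user@switch> request system reboot member 0
"""

for member, lines in split_by_member(COMBINED).items():
    print(member, lines)
```

    Separate log files per session remain the better option; a split like this only helps with reading a capture that already exists.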

    And finally, a picture from the scene. I hope this post proves useful to you. Good luck with your upcoming updates!



    P.S. For the command output I used plain code markup, which looks worse than the markup with a source-code background for a particular language or Bash. However, plain “code” markup allows bold text, which mattered to me for highlighting the interesting places in the output. If anyone can share how to get both (background plus bold inside), I will be grateful and promise to use it in the future.
    Update: it turned out that the code markup renders differently in different browsers and versions. I will keep digging into how to make the text clearer and more readable.
