What to do when RAM fails: anamnesis and treatment

    RAM is the part of a system that is least likely to fail. Yet it can cause many symptoms: spontaneous reboots with or without a BSOD, crashes in games or other software, and incorrect results from heavy workloads. Such problems are in fact fairly common. They are mostly caused by incorrect settings made by the user, although hardware faults cannot be ruled out. In this article, we look at current memory modules for desktop systems, discuss the problems they can have and why those problems arise, and help with diagnostics. Why else can memory failures occur? What should and shouldn't you do? Let's answer these questions.



    What does a memory module consist of?


    In terms of circuit design, a RAM module is a very simple device compared with the other electronic components of the system, leaving aside fans (some of which have a simple controller implementing PWM control). What components are modules assembled from?

    1. The memory chips themselves, the key elements that determine memory performance.
    2. SPD (Serial Presence Detect), a separate chip that stores information about the particular module.
    3. The key, a notch in the printed circuit board that prevents a module of one type from being installed in a board that does not support it.
    4. The printed circuit board itself.
    5. Various SMD components mounted on the printed circuit board.
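    The SPD chip stores a small table of bytes laid out according to JEDEC specifications. Suppose you already have a raw SPD dump as a bytes object; reading one depends on the platform and tool and is not covered here. A minimal sketch can then decode byte 2, the "key byte" that identifies the memory type. The mapping covers only three common values, and the function name and sample bytes are hypothetical and for illustration only.

```python
# Byte 2 of a JEDEC SPD dump is the "key byte" that identifies the DRAM type.
# Only a few common values are mapped here; others are reported as unknown.
SPD_DRAM_TYPES = {
    0x0B: "DDR3 SDRAM",
    0x0C: "DDR4 SDRAM",
    0x12: "DDR5 SDRAM",
}


def spd_dram_type(spd: bytes) -> str:
    """Return the DRAM type encoded in byte 2 of a raw SPD dump (hypothetical helper)."""
    if len(spd) < 3:
        raise ValueError("SPD dump is too short to contain the key byte")
    key = spd[2]
    return SPD_DRAM_TYPES.get(key, f"unknown (0x{key:02X})")


# Hypothetical first bytes of a DDR4 module's SPD, for illustration only.
sample = bytes([0x23, 0x11, 0x0C, 0x02])
print(spd_dram_type(sample))  # DDR4 SDRAM
```

    A motherboard reads this kind of information at startup to pick working settings. That is why a damaged SPD chip can leave an otherwise healthy module unusable.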



    Of course, this list is far from exhaustive, but it is enough for a module to work. What else can there be? Most often, heatsinks. They help cool high-frequency chips that run at elevated voltage (though not always a high one), and they also help when the user overclocks the memory.



    Some will say this is just marketing. In some cases it is, but not with HyperX. Predator modules rated at 4000 MHz easily heat their heatsinks to around 43 °C, as we found in our review of them. Overheating is discussed later in this article.



    Next comes lighting. Some manufacturers install single-color LEDs, others full RGB lighting. It can sometimes be configured with switches on the modules themselves, through plug-in cables or with motherboard software.



    HyperX engineers, for example, went further: they placed infrared sensors on the board, which are needed to fully synchronize the lighting.



    We will not go into this here: the article is about something else, and we have covered these modules before. If you are interested, see that earlier coverage; otherwise, read on.



    What cannot be avoided


    If you choose budget memory from little-known manufacturers, you are buying a pig in a poke. Such modules may be assembled "on the knee in Uncle Liao's basement" by people who have never heard of quality control. In other words, problems can appear the very first time you power on. Kingston ValueRAM, of course, does not fall into this category, even though its prices are close to the minimum. Given the previous section, some users may argue that the more components a module has, the higher the chance that one of them breaks. Logically, that is hard to refute. But HyperX is so confident in its products (in particular the Predator RGB modules) that it offers a lifetime warranty! So what can actually fail? We leave LEDs and similar decorative elements out of consideration.

    Damage to memory cells.

    Each memory chip contains a huge number of cells, to which data is written and from which it is read. If data is written to a damaged cell, it gets corrupted, which causes the system or an application to malfunction.

    Overclocking, improper timings and voltage.

    Almost everyone has tried, or wants to try, overclocking memory. Not every platform allows the memory frequency to be raised, but if you have a motherboard that supports overclocking, you may run into certain problems along the way. Today, memory overclocking depends not only on the chips themselves but also on the memory controller built into the processor and on the trace layout of the motherboard. These last two factors affect overclocking less than the memory chips do. The higher you push the clock speed of the modules, the more likely errors become. With timings it is the other way round: lowering them can make operation unstable. Raising the memory voltage can improve the stability of overclocked memory. However, it also increases heating, shortens the module's overall service life and creates the potential for failure at any time. In general, if the system is unstable, first return all settings to their defaults.
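    The trade-off between frequency and timings can be made concrete with a common rule of thumb that is not taken from this article. It converts CAS latency from clock cycles into nanoseconds. DDR memory transfers data twice per clock, so the "MHz" figure on the box (strictly, the data rate in MT/s) is double the actual clock. The formula follows from that: latency in ns = CL × 2000 / data rate. The kits below are hypothetical examples, and the formula ignores every other timing.

```python
def cas_latency_ns(cl: int, data_rate_mts: int) -> float:
    """Convert CAS latency (in clock cycles) to nanoseconds.

    DDR transfers twice per clock, so the clock is data_rate_mts / 2 MHz and
    one cycle lasts 1000 / (data_rate_mts / 2) = 2000 / data_rate_mts ns.
    """
    if cl <= 0 or data_rate_mts <= 0:
        raise ValueError("CL and data rate must be positive")
    return cl * 2000 / data_rate_mts


# Hypothetical kits: a higher frequency can offset a higher CL.
for name, cl, rate in [("DDR4-3200 CL16", 16, 3200), ("DDR4-4000 CL19", 19, 4000)]:
    print(f"{name}: {cas_latency_ns(cl, rate):.2f} ns")
# DDR4-3200 CL16: 10.00 ns
# DDR4-4000 CL19: 9.50 ns
```

    This is why raising the frequency usually means loosening the timings: the chips need roughly the same absolute time per operation. Pushing both at once is what typically produces errors.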

    Overheating.

    Yes, high memory temperatures can also hurt system stability. So if you choose high-frequency kits, take care of their cooling: at a minimum, they should have heatsinks. The same applies to low-frequency modules that you overclock yourself. Planning to install a fast kit in a workstation that relies on it for heavy computations? Don't believe that modern DDR4 running at 1.2 V can get very hot? See for yourself! The chips on modules without heatsinks come close to 85 °C, which is the limit for most chips. Impressive, isn't it?



    Mechanical damage
    One careless move, and you can damage a memory module: chip a memory chip or the SPD, or crack a trace on the circuit board. With some kinds of damage the memory may still work, but with critical errors. In other cases the module stops working altogether; for example, a damaged SPD chip can make it completely inoperable. Heatsinks, by the way, reduce the probability of mechanical damage to almost zero, unless, of course, you spill tea or coffee on the module...



    Other sources of memory problems, when the memory itself is not to blame.

    It should be said separately that memory can also be unstable for reasons other than those described above. The problem may lie in the processor or the motherboard. In modern processors the memory controller is built into the processor itself, and it can misbehave for various reasons, especially when overclocked. It also happens that even after you reset the settings to nominal, a "dead" memory channel does not come back to life. In that case, replacing the module achieves nothing. Physical damage to the processor socket or motherboard (bent pins or other external or internal damage) can also make memory work incorrectly. So we will keep urging you to check all components separately before buying a new memory kit, which may turn out to be a waste of money. Kingston goes further: it offers a configurator that lets you quickly and easily find memory modules compatible with specific systems. You can find it at https://www.kingston.com/en/memory/searchoptions.
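    Checking components separately largely comes down to a swap matrix. You test each module alone in each slot and see whether the errors follow the module or stay with the slot. Below is a minimal sketch of that reasoning, assuming you record every run as (module, slot) -> "errors found". Both that convention and the function name are hypothetical, and you would fill in the results from your own test passes (for example, MemTest86). The verdict is a heuristic, not proof.

```python
from collections import defaultdict


def localize_fault(results: dict[tuple[str, str], bool]) -> str:
    """Guess whether memory errors follow a module or a slot (hypothetical helper).

    results maps (module, slot) -> True if errors were found with only that
    module installed in that slot.
    """
    if not any(results.values()):
        return "no errors found"

    outcomes_by_module = defaultdict(list)
    outcomes_by_slot = defaultdict(list)
    for (module, slot), failed in results.items():
        outcomes_by_module[module].append(failed)
        outcomes_by_slot[slot].append(failed)

    # A module that fails in every slot it was tried in is the prime suspect.
    bad_modules = sorted(m for m, runs in outcomes_by_module.items() if all(runs))
    # A slot that fails with every module points at the slot, channel or CPU.
    bad_slots = sorted(s for s, runs in outcomes_by_slot.items() if all(runs))

    if bad_modules and not bad_slots:
        return "module(s): " + ", ".join(bad_modules)
    if bad_slots and not bad_modules:
        return "slot/channel/CPU: " + ", ".join(bad_slots)
    return "inconclusive: check contacts, settings and the rest of the platform"


# Module A fails in both slots, module B passes in both.
runs = {
    ("A", "DIMM1"): True, ("A", "DIMM2"): True,
    ("B", "DIMM1"): False, ("B", "DIMM2"): False,
}
print(localize_fault(runs))  # module(s): A
```

    If the errors stay with one slot no matter which module is installed, buying new memory will not help. That matches the "dead channel" case described above.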

    Take care ...

    Few people know that three letters can make choosing system components easier: QVL. It stands for Qualified Vendors List, essentially a compatibility list. It includes the components that the motherboard manufacturer has tested with its product and guarantees to work correctly. For obvious reasons, no one can test every one of hundreds of products. But every self-respecting manufacturer offers a fairly extensive list of supported models, which in our case means RAM.

    Blue screens of death, freezes and reboots: the fault is definitely in...

    What is the minimum set of electronic components for a PC, laptop or all-in-one? A motherboard, processor, drive, power supply and RAM. All of these components are interconnected, so if one of them is unstable, the whole system crashes. The most reliable way to diagnose the problem is to test each component in another system. By elimination, we can then find the weakest link and replace it. But another system is not always available for this. For example, not every friend will have a board capable of testing modules rated at 4000 MHz or so. Suppose a problem has been identified and it lies in the memory. We reseat the modules several times, in different slots and on a couple of motherboards, and they start working stably. Magic? As the Marvel universe says, magic is just technology that has not been explored yet, and in our case the secret is very simple. The contacts on memory modules oxidize over time, which prevents them from working correctly. Removing and reinserting a module several times polishes them slightly, after which everything works normally. In fact, contact oxidation is the most common cause of RAM failures (and not only RAM failures). So make it a rule: if you have any problems with the platform, arm yourself with an ordinary stationery eraser and gently wipe the contacts on both sides. This applies specifically when problems appear with the memory running at nominal settings after months or years of trouble-free operation.



    If the eraser didn’t help

    What next? If the system fails catastrophically, the only option is to check the components on a known-good platform. If you specifically suspect the memory running at nominal settings, you can run several tests. There are free and paid programs for this. Some run under Windows or Linux, others from DOS or even UEFI.

    Let's start with a tool that every user of Windows 7 and later already has. Oddly enough, the built-in Windows Memory Diagnostic works quite effectively and can detect errors. It can be launched in two ways. From the Start menu:



    Or via Win + R by running mdsched.exe:



    Either way, the result is the same:



    If the Basic or Standard tests found no errors, you should definitely run the Extended mode. It includes the tests from the previous modes, supplemented by MATS+, Stride38, WSCHCKR, WStride-6, CHCKR4, WCHCKR3, ERAND, Stride6 and CHCKR8.
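    To show how a test like MATS+ catches a damaged cell, here is its textbook form. It writes 0 to every cell, then walks up the addresses reading 0 and writing 1, then walks down reading 1 and writing 0; any mismatch marks a faulty cell. The minimal sketch below runs it on a simulated memory with injected "stuck-at" cells. The FaultyMemory class is hypothetical, and this is not a description of how Windows implements the test internally.

```python
from typing import Dict, List, Optional


class FaultyMemory:
    """Simulated bit memory with optional stuck-at cells (hypothetical)."""

    def __init__(self, size: int, stuck_at: Optional[Dict[int, int]] = None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}  # address -> value the cell is stuck at

    def write(self, addr: int, value: int) -> None:
        # A stuck cell ignores writes and keeps its stuck value.
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr: int) -> int:
        return self.cells[addr]

    def __len__(self) -> int:
        return len(self.cells)


def mats_plus(mem: FaultyMemory) -> List[int]:
    """Run MATS+ {any(w0); up(r0,w1); down(r1,w0)} and return failing addresses."""
    failures = set()
    n = len(mem)
    for addr in range(n):               # M0: write 0 everywhere (order does not matter)
        mem.write(addr, 0)
    for addr in range(n):               # M1: ascending, expect 0, then write 1
        if mem.read(addr) != 0:
            failures.add(addr)
        mem.write(addr, 1)
    for addr in reversed(range(n)):     # M2: descending, expect 1, then write 0
        if mem.read(addr) != 1:
            failures.add(addr)
        mem.write(addr, 0)
    return sorted(failures)


mem = FaultyMemory(16, stuck_at={5: 1, 11: 0})
print(mats_plus(mem))  # [5, 11]
```

    Cell 5 (stuck at 1) fails the "expect 0" pass, and cell 11 (stuck at 0) fails the "expect 1" pass. Real testers run many such patterns because different faults show up under different access orders.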



    You can view the results in Event Viewer under "Windows Logs" > "System". If there are many events, the easiest way to find the right entry is to search (Ctrl + F) for MemoryDiagnostics-Results.
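    The same log can also be queried from the command line with the built-in wevtutil tool, without scrolling through Event Viewer. Below is a minimal Python sketch. It assumes the results are logged under the provider name Microsoft-Windows-MemoryDiagnostics-Results, which is what the search above matches. It also assumes it runs on Windows; on other systems it simply returns None. The helper names are hypothetical.

```python
import subprocess
import sys

# Assumed provider name of the Windows Memory Diagnostic result events.
PROVIDER = "Microsoft-Windows-MemoryDiagnostics-Results"


def build_wevtutil_command(count: int = 1) -> list:
    """Build a wevtutil query for the newest `count` memory-diagnostic events."""
    query = f"*[System[Provider[@Name='{PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{query}",   # XPath filter on the event provider
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # newest events first
        "/f:text",       # human-readable output
    ]


def latest_memory_diagnostic_result(count: int = 1):
    """Return the text of the newest result events, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    completed = subprocess.run(
        build_wevtutil_command(count), capture_output=True, text=True, check=True
    )
    return completed.stdout


if __name__ == "__main__":
    print(latest_memory_diagnostic_result() or "Not running on Windows.")
```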



    To check memory, it is best to use programs that run before the OS loads. That way, the largest possible amount of free memory can be tested, which increases the chance of detecting errors, if there are any. A very popular program is MemTest86. It comes in two versions: one for legacy (BIOS) systems and one for UEFI platforms. The UEFI version is paid, although a free edition with limited functionality is also available. If you are interested, a comparison table of editions is available on the developer's official website: https://www.memtest86.com/features.htm.

    This program is the best solution for finding memory errors: it has enough settings and presents results clearly. How long should you test? If errors are rare, the longer you test, the better. If a memory chip is clearly faulty, the result will not be long in coming.



    There is also MemTest for Windows. You can use it too, but it makes less sense: it cannot test the memory occupied by the OS and by programs running in the background.



    Since this program is not new, enthusiasts (mostly from Asia) write additional shells for it. These make it possible to quickly and conveniently launch several copies at once to test a large amount of memory.
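    The arithmetic behind running several copies is simple: the memory to be tested is split into chunks, one per instance. Below is a minimal sketch of that split. The per-instance limit and the amount left for the OS are hypothetical parameters you would choose yourself; they are not values taken from MemTest or from these shells.

```python
import math


def plan_instances(free_mb: int, per_instance_mb: int, reserve_mb: int = 0) -> list:
    """Split the memory to be tested into per-instance sizes in MB (hypothetical helper).

    free_mb: memory currently free; reserve_mb: memory left untouched for the
    OS and background programs; per_instance_mb: the most one instance tests.
    """
    if per_instance_mb <= 0:
        raise ValueError("per_instance_mb must be positive")
    to_test = max(free_mb - reserve_mb, 0)
    count = math.ceil(to_test / per_instance_mb)
    sizes = [per_instance_mb] * (count - 1)   # all instances but the last are full
    if count:
        sizes.append(to_test - per_instance_mb * (count - 1))  # remainder
    return sizes


print(plan_instances(16000, 2048, reserve_mb=1000))
# [2048, 2048, 2048, 2048, 2048, 2048, 2048, 664]
```

    Leaving a reserve matters: if the instances try to grab more than is actually free, the OS starts paging to disk and the test slows to a crawl.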



    Unfortunately, updated versions of these shells are usually available only in Chinese.



    Our own enthusiasts write software too. A striking example is TestMem5 by Serj.



    Linpack could also be added to the list of tests, but it needs a full processor load, which can lead to overheating, especially when AVX instructions are used. Moreover, this test is not really suited to checking memory. It is better suited to heating up the processor to evaluate the efficiency of the cooling system, and to looking at performance numbers. In general, it is not a benchmark for home use; it has a completely different purpose.

    Quick solution to all problems


    Unfortunately, there is none. The exception is a thick wallet that lets you hand your PC over for diagnosis and repair, and even then money will not make it quick, unless you simply buy a set of new components. To answer the questions posed at the very beginning of the article: RAM-related system failures can have several causes. Not all of them relate directly to the memory modules; the processor or the motherboard may also be at fault. As for the memory itself, overclocking in any form affects stability. A module can also be killed outright by accident, through static discharge or a careless movement. Suppose you have ruled out the board and processor, made sure temperatures are normal, removed any overclocking and checked the modules in another system, and they still produce errors. Then it is time to go to the warranty service or, if the warranty has expired, buy new modules. Only a few users will be able to fix the problem themselves. That requires finding the faulty chip, replacing it with a new one and, if necessary, modifying the SPD. Difficult, but possible. And do not forget about the eraser: the problem may be solved very quickly :)



    For more information on HyperX and Kingston products, visit the companies' websites.
