#+TITLE: RAM failures: How to detect and fix
#+DATE: <2022-10-26 Wed>
#+LANGUAGE: en

* RAM failures: How to detect and fix

#+BEGIN_abstract
If your system breaks in unexpected ways and you can't understand why,
the cause might be faulty RAM. In this article, I will try to explain
why RAM fails, how to detect it, and how to potentially fix it, without
going too deep into details.
#+END_abstract

#+ATTR_HTML: :loading lazy
[[file:../../../public/images/thinkpad_ram_fail.jpg]]

After years of using my laptop, I started running into segfaults and
other problems more and more often. It was not critical, but it was
annoying, and the cause turned out to be a broken RAM module. I tested
the module, confirmed that it was faulty, and removed it completely.
That left me with only 2GiB of RAM, but I came to the conclusion that
2GiB is enough for my work if I apply some optimizations to my
GNU+Linux system.

** What RAM is used for
RAM holds the code and data of running programs. When you run a binary,
its CPU instructions are loaded into RAM, and the CPU then fetches and
executes them, mostly one after another. While your program runs, it
also keeps intermediate results such as variables and data structures
in RAM. After the program finishes, the memory that held its
instructions and intermediate results is released.

[[[https://en.wikipedia.org/wiki/Random-access_memory][Wikipedia: Random-access memory]]]

** Why RAM can be damaged
I can't say exactly why RAM gets damaged; overheating is probably one
of the factors. There is also a chance that the RAM itself is not
damaged at all. In fact, the memory itself seems to fail rarely. What
is more often "damaged" is the module's contacts, and that is easy to
fix: take the RAM modules out and clean their contacts with an eraser.
But how do you know that you have a problem in the first place?

** How to understand that RAM is broken
The most obvious sign of a RAM failure is the [[https://computerhope.com/beep.htm][BIOS signalling]] it: the
PC speaker emits a special beep pattern at startup. Check your
motherboard manual to find out what a given pattern means. Usually it
means that the computer won't start because a RAM module is
"completely" broken.

If the system boots just fine but you experience problems along the
way, such as random segfaults, program crashes, kernel panics and so
on, you might have broken segments of RAM. To detect such segments you
can use the programs listed below. Checking RAM is usually not a fast
process, so you will probably need to leave your device running for
several hours.

*** Memtest86+
Memtest86+ runs from the GRUB menu, before your OS boots. It has to run
this way because it needs access to the whole range of RAM; a running
system occupies part of the RAM and would not allow it to be checked
properly. Memtest86+ runs many different tests over every segment of
your RAM. While checking, it lists the broken segments it finds, so you
can write them down.

You can install it using your GNU+Linux package manager, such as apt.
The package is usually called ~memtest86+~. There is a small caveat,
though: older versions (before 6.0) won't work on UEFI systems.

If the packaged version doesn't work, you can write a newer Memtest86+
release to a USB stick and boot from that. It should work on both UEFI
and BIOS systems. It can be downloaded from the official website.

[[[https://memtest.org/][Official Website]]]

*** Memtester
Memtester has the same purpose as Memtest86+, but it runs while your
system is running. It therefore can't check the whole RAM range, only
the amount you specify, which is limited by the RAM that is free at the
moment you run it. It can be installed with your package manager of
choice; the package is named ~memtester~.

[[[https://pyropus.ca./software/memtester/][Official Website]]]

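Since memtester can only test memory that is currently free, it helps
to estimate how much you can hand to it. Below is a minimal Python
sketch of that estimate. It assumes a Linux ~/proc/meminfo~ that
contains a ~MemAvailable~ line; the function name
~suggest_test_size_mib~ and the 256 MiB safety margin are my own
choices for illustration and are not part of memtester. The resulting
number would go into a command of the form ~memtester <size>M <loops>~.

#+BEGIN_SRC python
import re


def suggest_test_size_mib(meminfo_text, margin_mib=256):
    """Return how many MiB could be tested while leaving margin_mib free.

    meminfo_text is the content of /proc/meminfo (values are in kB).
    The margin is an arbitrary safety buffer, not a memtester rule.
    """
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable not found in meminfo text")
    available_mib = int(match.group(1)) // 1024
    return max(available_mib - margin_mib, 0)


# Example with a hand-written meminfo excerpt (not real measurements).
sample = "MemTotal:        2000000 kB\nMemAvailable:    1048576 kB\n"
size = suggest_test_size_mib(sample)
print(f"sudo memtester {size}M 1")
#+END_SRC

On a real machine you would pass the content of ~/proc/meminfo~ instead
of the sample string.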
** How to fix broken RAM
#+ATTR_HTML: :loading lazy
[[file:../../../public/images/thinkpad_ram_repair.jpg]]

First of all, if neither Memtest86+ nor memtester shows any errors,
congratulations! You most likely don't have any problems with your RAM.

If they show a small number of errors, like one or two, you can make
the Linux kernel ignore those segments of RAM, so that programs never
use them and keep working reliably. To do that, add the ~memmap~ kernel
argument to your GRUB configuration. For example:
~memmap=0x100000$0x762ce9c38420,0x100000$0x34e03060,0x100000$0x87fce060,0x100000$0x23c63060,0x100000$0x87b6c060~.
Each entry has the form ~size$address~. The values are hexadecimal and
need the ~0x~ prefix, and the ~$~ may need to be escaped in the GRUB
configuration file. There is also a GRUB configuration variable called
~GRUB_BADRAM~, but it looks like it is deprecated and ~memmap~ is
preferred.

For more details about blacklisting bad segments of RAM, read [[https://unix.stackexchange.com/questions/75059/how-to-blacklist-a-correct-bad-ram-sector-according-to-memtest86-error-indicati][this comprehensive Stack Exchange answer]].

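As a rough illustration, the failing addresses written down from
Memtest86+ can be turned into a ~memmap~ argument with a few lines of
Python. This is only a sketch under my own assumptions: ~build_memmap~
is a hypothetical helper, and it reserves one 1 MiB region per bad
address, aligned down to a 1 MiB boundary, which is a design choice of
this example and not something the kernel requires. The ~$~ in the
output may still need escaping before it goes into the GRUB
configuration.

#+BEGIN_SRC python
REGION_SIZE = 0x100000  # 1 MiB per reserved region, as in the example above


def build_memmap(bad_addresses, region_size=REGION_SIZE):
    """Build a memmap=size$start,... string covering every bad address.

    Each address is rounded down to a multiple of region_size, and
    duplicates that fall into the same region are merged.
    """
    if region_size <= 0:
        raise ValueError("region_size must be positive")
    starts = sorted({addr - addr % region_size for addr in bad_addresses})
    entries = [f"{region_size:#x}${start:#x}" for start in starts]
    return "memmap=" + ",".join(entries)


# Two of the addresses from the example above.
print(build_memmap([0x34e03060, 0x23c63060]))
# prints: memmap=0x100000$0x23c00000,0x100000$0x34e00000
#+END_SRC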
If they show a huge number of errors, like many thousands, it probably
means that one of your RAM sticks is broken. To find out which one, you
can try to work it out from the failing addresses, or rerun the test
with only specific stick(s) installed and see whether the errors are
gone.

Be aware that if you are left with a single RAM stick, there is a
chance that the machine will only boot with it in a specific slot. Read
your motherboard manual if something doesn't work.

If you have a RAM stick with tons of errors, you can try to repair it.
I can't explain exactly how it is done or why it is done that way, but
you can find videos on fixing RAM sticks on YouTube and other
resources. Here is [[https://youtu.be/KVR91p-Bd6M][a link to one such video]].

** RAM Optimizations of GNU+Linux system
If your RAM was broken and you are left with much less memory than you
expected, don't rush out to buy new RAM sticks. There is a chance that
even with less RAM the system will work completely fine. Linux is
pretty good at running on low-end machines, and it has several ways to
cope with a lack of memory. It is also common nowadays to have hardware
that is far more powerful than its tasks require, like a gaming laptop
with a very powerful CPU and GPU that is mostly used to render text in
a text editor.

*** Swap
Swap is a partition (or a file) on your disk that is used when there is
no free RAM left. It is used for other purposes too, and having swap is
recommended on most GNU+Linux systems.

You can configure how readily the Linux kernel uses swap by changing
~swappiness~. You can read about changing that setting, and about swap
in general, at the link below.

[[[https://wiki.archlinux.org/title/swap][Arch Linux Wiki: Swap]]]

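To make the ~swappiness~ setting a bit more concrete, here is a small
Python sketch that builds the line you would put into a sysctl
configuration file. The helper name ~swappiness_conf_line~ is
hypothetical, and the accepted range of 0 to 200 is an assumption based
on recent kernels (older ones stop at 100), so check the documentation
for your kernel. The current value can usually be read from
~/proc/sys/vm/swappiness~.

#+BEGIN_SRC python
def swappiness_conf_line(value):
    """Return a sysctl configuration line that sets vm.swappiness.

    The 0..200 range is an assumption for recent kernels; older
    kernels only accept values up to 100.
    """
    if not 0 <= value <= 200:
        raise ValueError("swappiness must be between 0 and 200")
    return f"vm.swappiness = {value}"


# A low value makes the kernel less eager to swap.
print(swappiness_conf_line(10))
# prints: vm.swappiness = 10
#+END_SRC

The printed line would typically go into a file under ~/etc/sysctl.d/~;
the exact file name is up to you.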
*** Zram
Zram is a compressed block device that lives in RAM and is typically
used as swap space. In terms of performance it sits somewhere between
plain RAM and swap on disk. It helps your system stay responsive when
it swaps a lot, at the cost of higher CPU usage for compression. I use
Zram on a machine with 2GiB of RAM and 16GiB of swap, and it works
great even with many programs open at the same time (text editor,
browser, Docker container, messenger).

[[[https://wiki.archlinux.org/title/Improving_performance#zram_or_zswap][Arch Linux Wiki: Zram]]]

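To see whether Zram actually pays off on your machine, you can look at
the statistics the kernel exposes for the device. The Python sketch
below parses the first three fields of an ~mm_stat~ file (original data
size, compressed data size, and total memory used, all in bytes) and
computes how much data is stored per byte of RAM used. The device path
~/sys/block/zram0/mm_stat~ and the function name
~zram_compression_ratio~ are assumptions for illustration; your device
may have a different number.

#+BEGIN_SRC python
def zram_compression_ratio(mm_stat_text):
    """Return original_size / memory_used for a zram mm_stat line.

    Returns None when the device holds no data yet.
    """
    fields = mm_stat_text.split()
    if len(fields) < 3:
        raise ValueError("expected at least three fields in mm_stat")
    orig_data_size, _compr_data_size, mem_used_total = (int(x) for x in fields[:3])
    if mem_used_total == 0:
        return None
    return orig_data_size / mem_used_total


# Hand-written sample line, not a real measurement.
sample = "4096 1024 2048 0 2048 0 0 0"
print(zram_compression_ratio(sample))
# prints: 2.0
#+END_SRC

A ratio well above 1 means the compressed swap holds noticeably more
data than the RAM it occupies.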
*** Less bloat software
Alternatively, you can simply use less bloated software, so that you
don't need so much RAM in the first place. In many cases good software
doesn't require a lot of RAM, while bad software often leaks memory, so
you need many GiBs of RAM to use it comfortably. The most bloated
software tends to be web browsers such as Chromium and Firefox, and
browser-based apps built on Electron such as Slack, VSCode and other
proprietary products.

** Conclusions
Now you know what to do when you suspect a RAM failure. The same
knowledge is also useful for testing used memory sticks that you buy
from someone else.