source: at-w96k/content/en/posts/ram-fix.org@ ebbbf4f

Last change on this file since ebbbf4f was ebbbf4f, checked in by w96k <w96k@…>, on Nov 13, 2022 at 11:57:43 PM

Fix typos

#+TITLE: RAM failures: How to detect and fix
#+DATE: <2022-10-26 Wed>
#+LANGUAGE: en

* RAM failures: How to detect and fix

#+BEGIN_abstract
If your system breaks in unexpected ways and you can't understand why,
a RAM issue might be the cause. In this article, I will try to explain
why RAM fails, how to detect it, and how to potentially fix it
without going too deep into details.
#+END_abstract

#+ATTR_HTML: :loading lazy
[[file:../../../public/images/thinkpad_ram_fail.jpg]]

After years of using my laptop, I started running into segfaults and
other problems more and more often. It was not critical, but it was
annoying, and the cause turned out to be a broken RAM module. I tested
it, and once I had confirmed that it was broken I removed it completely.
That left me with only 2GiB of RAM, but I came to the conclusion that
2GiB is enough for my work if I apply some optimizations to my
GNU+Linux system.

** What RAM is used for
RAM holds the code and data of running programs. When you run a binary,
its CPU instructions are loaded into RAM, and the CPU reads and executes
them from there. While the program runs, it also keeps intermediate
results such as variables and data structures in RAM. When the program
exits, the memory that held its instructions and data is freed so that
other programs can use it.

[[[https://en.wikipedia.org/wiki/Random-access_memory][Wikipedia: Random-access memory]]]

** Why RAM can be damaged
I can't say exactly why it gets damaged; overheating is probably one
of the factors. There is also a chance that the RAM itself is not
damaged at all. In fact, it seems that the chips themselves rarely
break. What is often "damaged" is the contacts of the RAM module, and
that is easy to fix: take out the RAM modules and clean their contacts
with an eraser. But how do you find out that you have a problem in the
first place?

** How to understand that RAM is broken
The most obvious sign of a RAM failure is the [[https://computerhope.com/beep.htm][BIOS signaling]] that it is
broken. It beeps a special code through the PC speaker. You can read
your motherboard manual to understand what the code means. Usually, it
means that a RAM module is "completely" broken and the computer won't
start with it.

If the system boots just fine, but you experience problems along the way
such as random segfaults, program crashes, kernel panics, and so on,
you might have broken segments of RAM. To detect such segments you can
use the programs listed below. Checking RAM is usually not a fast
process, so you will probably need to leave your device running for
several hours.

*** Memtest86+
Memtest86+ is started from the GRUB menu instead of your OS. It has to
run this way because it needs access to the whole range of RAM, and a
running system would be using part of that range. It runs many
different tests over every segment of your RAM and, while checking,
lists the broken addresses so that you can write them down.

You can install it using your GNU+Linux package manager such as apt. The
package is usually called ~memtest86+~. There is a small caveat,
though: old versions don't work on UEFI systems.

If it doesn't work, you can write a newer Memtest86+ release to a USB
stick and boot Memtest86+ from that. It should work on both UEFI and
BIOS systems. It can be downloaded from the official website.

[[[https://memtest.org/][Official Website]]]

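Whether an older Memtest86+ package will start on your machine depends
on how the machine was booted. Here is a minimal sketch for checking
that from a running system. It assumes a Linux kernel that exposes the
~/sys/firmware/efi~ directory on UEFI boots; the ~boot_mode~ variable
name is only for illustration.

#+BEGIN_SRC sh
# /sys/firmware/efi exists only when the kernel was booted through UEFI.
if [ -d /sys/firmware/efi ]; then
    boot_mode="UEFI"
else
    boot_mode="BIOS"
fi
echo "Booted in $boot_mode mode"

# On Debian-like systems the package can then be installed with apt
# (run as root); the package name may differ on other distributions.
# apt install memtest86+
#+END_SRC

If it reports UEFI and the packaged version refuses to start, the USB
stick route described above is probably the easier one.
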
*** Memtester
It has the same purpose as Memtest86+, but it runs while your system is
running, so it can't check the whole RAM range, only the amount of
free RAM that you ask it to test at the moment you run it. It can be
installed using your package manager of choice; the package is called
~memtester~.

[[[https://pyropus.ca./software/memtester/][Official Website]]]

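A minimal sketch of preparing a memtester run. It assumes that you want
to test about three quarters of the memory that is currently available
and to repeat the test twice; both numbers are arbitrary choices for
illustration. It also assumes a kernel that reports ~MemAvailable~ in
~/proc/meminfo~. memtester takes the amount of memory and an optional
number of loops as arguments, and it usually needs root so that it can
lock the memory it tests.

#+BEGIN_SRC sh
# MemAvailable in /proc/meminfo is reported in KiB.
avail_kib=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)

# Leave some headroom so the running system doesn't start swapping.
test_mib=$(( avail_kib * 3 / 4 / 1024 ))

# Print the command instead of running it, so you can review it first.
echo "sudo memtester ${test_mib}M 2"
#+END_SRC

Closing other programs before the run frees more memory, so more of it
gets tested.
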
** How to fix broken RAM
#+ATTR_HTML: :loading lazy
[[file:../../../public/images/thinkpad_ram_repair.jpg]]

First of all, if neither Memtest86+ nor memtester reports any errors,
congratulations! Your RAM is most likely fine.

If they report a small number of errors, like one or two, you can tell
the Linux kernel to ignore those segments of RAM, so that programs
never use them and keep working reliably. To do that, add the ~memmap~
kernel argument to your GRUB configuration. Each entry has the form
~size$address~ and reserves that many bytes starting at that address.
For example:
~memmap=0x100000$0x762ce9c38420,0x100000$0x34e03060,0x100000$0x87fce060,0x100000$0x23c63060,0x100000$0x87b6c060~. There
is also a GRUB configuration variable called ~GRUB_BADRAM~, but it looks
like it is deprecated and ~memmap~ is preferred.

For more details about blacklisting bad segments of RAM read [[https://unix.stackexchange.com/questions/75059/how-to-blacklist-a-correct-bad-ram-sector-according-to-memtest86-error-indicati][this
comprehensive Unix & Linux Stack Exchange answer]].

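Writing a long ~memmap~ argument by hand is error-prone, so here is a
minimal sketch that builds one from a list of failing addresses.
~build_memmap~ is a hypothetical helper written for this article, not
part of GRUB or the kernel, and the fixed region size is an assumption
taken from the example above. The escaping shown in the comments is
what is commonly reported to work in ~/etc/default/grub~; treat it as a
starting point and verify the result on your own system.

#+BEGIN_SRC sh
# Build a memmap= kernel argument that reserves SIZE bytes at each address.
# build_memmap is a hypothetical helper, not part of GRUB or the kernel.
build_memmap() {
    size=$1
    shift
    regions=""
    for addr in "$@"; do
        # \$ keeps the dollar sign literal: each entry is SIZE$ADDRESS.
        regions="${regions:+$regions,}${size}\$${addr}"
    done
    printf 'memmap=%s\n' "$regions"
}

# Two of the addresses from the example above, written with a 0x prefix.
build_memmap 0x100000 0x34e03060 0x87fce060
# prints: memmap=0x100000$0x34e03060,0x100000$0x87fce060

# In /etc/default/grub the dollar sign is commonly reported to need
# escaping, because both the shell and GRUB expand it, for example:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet memmap=0x100000\\\$0x34e03060"
# After regenerating grub.cfg and rebooting, /proc/cmdline shows what
# the kernel actually received.
#+END_SRC
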
If they report a large number of errors, like many thousands, one of
your RAM sticks is probably broken. To find out which one, you can
look at the failing addresses, or run the test again with only
specific stick(s) of RAM installed and see whether the errors are gone.

Be aware that if you are left with a single RAM stick, there is a
chance that the machine will only boot with it in a specific slot. Read
your motherboard manual if something doesn't work.

If you have a RAM stick with tons of errors, you can try to
repair it. I can't tell you exactly how it is done or why it is done
that way, but you can find videos on fixing RAM sticks
on YouTube and other resources. Here is [[https://youtu.be/KVR91p-Bd6M][a link to one such video]].

** RAM Optimizations of GNU+Linux system
If your RAM was broken and you are left with much less memory than you
expected, don't rush out to buy new RAM sticks. There is a chance that
even with less RAM the system will work completely fine. Linux is pretty
good at working on low-end machines and it has different ways to handle
a lack of memory. In the modern world people often have devices that
are far more powerful than their tasks need, like a gaming laptop with
a very powerful CPU and GPU that is mostly used to render text in a
text editor.

*** Swap
Swap is space on your disk, usually a dedicated partition, that the
system uses when there is no RAM left. It is used for other purposes
too (hibernation, for example), and having it is recommended on most
GNU+Linux systems.

You can configure how readily a Linux system uses swap by changing
~swappiness~. You can read about changing that setting and learn about
swap in general at the link below.

[[[https://wiki.archlinux.org/title/swap][Arch Linux Wiki: Swap]]]

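A minimal sketch of inspecting and changing ~swappiness~ through the
~vm.swappiness~ sysctl. The value 10 and the file name
~99-swappiness.conf~ are arbitrary choices for illustration, not
recommendations. The commands that change the setting need root, so
they are left commented out here.

#+BEGIN_SRC sh
# Show the current value (the kernel default is usually 60).
current=$(cat /proc/sys/vm/swappiness)
echo "vm.swappiness is $current"

# Lower it until the next reboot (needs root):
# sysctl vm.swappiness=10

# Keep the setting across reboots (needs root):
# echo 'vm.swappiness=10' > /etc/sysctl.d/99-swappiness.conf
#+END_SRC

Lower values make the kernel hold on to RAM longer before it starts
swapping; higher values make it swap more readily.
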
*** Zram
Zram sits between RAM and swap in terms of performance: it keeps
swapped-out pages compressed in RAM instead of writing them to disk.
It helps your system stay responsive if it swaps a lot, at the cost of
higher CPU usage. I use Zram on a machine with 2GiB of RAM and 16GiB of
swap, and it works great even with many programs open at the same time
(text editor, browser, Docker container, messenger).

[[[https://wiki.archlinux.org/title/Improving_performance#zram_or_zswap][Arch Linux Wiki: Zram]]]

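A minimal sketch of setting up a zram swap device by hand, assuming the
~zram~ kernel module and the ~zramctl~ tool from util-linux are
available. ~setup_zram~ is a hypothetical helper name, and the size and
the priority value of 100 are arbitrary choices for illustration. The
function is only defined here, not called; it has to be run as root.

#+BEGIN_SRC sh
# setup_zram is a hypothetical helper; run it as root, e.g.: setup_zram 2G
setup_zram() {
    size=$1
    modprobe zram || return 1
    # Take the first unused zram device and give it the requested size.
    dev=$(zramctl --find --size "$size") || return 1
    mkswap "$dev" || return 1
    # A higher priority makes the kernel fill zram before any disk swap.
    swapon --priority 100 "$dev"
}
#+END_SRC

This setup does not survive a reboot. Many distributions ship a service
that does the same thing at boot, which is probably the more convenient
option for everyday use.
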
*** Less bloat software
Alternatively, you can simply use less bloated software, so
you don't need so much RAM in the first place. In many cases, good
software doesn't require a lot of RAM, while bad software tends to leak
memory, so you need many GiBs of RAM to use it comfortably. The most
bloated software tends to be web browsers such as Chromium and Firefox,
and browser-based apps built with Electron such as Slack, VSCode, and
other proprietary products.

** Conclusions
Now you know what to do when you suspect a RAM failure. The same
knowledge is also useful for testing used memory sticks that you buy
from someone else.