source: content/en/posts/ram-fix.org@ 709b33e

Last change on this file since 709b33e was 709b33e, checked in by w96k <w96k@…>, on Nov 13, 2022 at 3:44:35 AM

#+TITLE: RAM failures: How to detect and fix
#+DATE: <2022-10-26 Wed>
#+LANGUAGE: en

* RAM failures: How to detect and fix

#+BEGIN_abstract
If your system breaks in unexpected ways and you can't understand why,
the RAM might be at fault. In this article, I will try to explain why
RAM breaks, how to detect it, and how to potentially fix it, without
going too deep into details.
#+END_abstract

#+ATTR_HTML: :loading lazy
[[file:../../../public/images/thinkpad_ram_fail.jpg]]

After years of using my laptop, I started running into segfaults and
other problems more and more often. It was not critical, but it was
annoying, and it turned out that the cause was a broken RAM module. I
tested it and, once I was sure it was broken, removed it completely.
That left me with only 2GiB of RAM, but I came to the conclusion that
2GiB is enough for my work if I apply some optimizations to my
GNU+Linux system.

** What RAM is used for
RAM holds a program's code and data while it runs. When you run a
binary, its CPU instructions are loaded into RAM, and the CPU fetches
and executes them from there. While your program executes, it stores
intermediate results such as variables and data structures in RAM.
After the program finishes, the operating system reclaims the memory
that held its instructions and intermediate results.

[[[https://en.wikipedia.org/wiki/Random-access_memory][Wikipedia: Random-access memory]]]

** Why RAM can be damaged
I can't say exactly why RAM gets damaged; overheating is probably one
of the factors. There is a chance that the RAM itself is not damaged at
all. In fact, it seems that the memory itself rarely breaks. What can
be "damaged" is the module's contacts, and that is easy to fix: take
out the RAM modules and clean their contacts with an eraser. But how do
you find out that you have a problem in the first place?

** How to understand that RAM is broken
The most obvious sign of a RAM failure is the [[https://computerhope.com/beep.htm][BIOS signalling]] that it is
broken: it plays a special beep pattern through the PC speaker. Read
your motherboard manual to find out what the pattern means. Usually it
means that the computer won't start at all because a RAM module is
"completely" broken.

If the system boots just fine, but you experience problems along the
way such as random segfaults, program crashes, kernel panics and so on,
you might have broken segments of RAM. To detect such segments you can
use the programs listed below. Checking RAM is usually not a fast
process, so you will probably need to leave your device running for
several hours.

*** Memtest86+
Memtest86+ runs from the GRUB menu before your OS starts. It has to run
this way because it needs access to the whole range of RAM; a running
system occupies part of the RAM and would not allow that part to be
checked properly. It runs many different tests over every segment of
your RAM. While checking, it logs the list of broken segments, which
you can write down.

You can install it using your GNU+Linux package manager such as apt. The
package is usually called ~memtest86+~. But there is a small caveat: old
versions won't work on UEFI systems.

If it doesn't work, you can write a newer version of Memtest86+ to a USB
stick and boot from that. It should work on both UEFI and BIOS systems.
It can be downloaded from the official website.

[[[https://memtest.org/][Official Website]]]

*** Memtester
It has the same purpose as Memtest86+, but it runs while your system is
running, so it doesn't check the whole RAM range, only the amount of
free RAM that you tell it to test at the moment you run it. It can be
installed using your package manager of choice; the package is called
~memtester~.

[[[https://pyropus.ca./software/memtester/][Official Website]]]

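Choosing how much memory to hand to memtester can be sketched in a few
lines of Python. This is only an illustration: the function name
~suggest_memtester_size~ and the 256MiB headroom are my assumptions, not
something memtester prescribes. It reads the ~MemAvailable~ field in the
format used by ~/proc/meminfo~ and leaves some room for the rest of the
system.

#+BEGIN_SRC python
def suggest_memtester_size(meminfo_text, headroom_mib=256):
    """Return a size argument such as '768M' for memtester, or None.

    meminfo_text is the content of /proc/meminfo. headroom_mib is an
    assumed safety margin left for the running system (hypothetical default).
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            available_kib = int(line.split()[1])  # the value is reported in kB
            test_mib = available_kib // 1024 - headroom_mib
            return f"{test_mib}M" if test_mib > 0 else None
    return None  # field not present in the given text


# Example with a made-up excerpt of /proc/meminfo.
sample = "MemTotal:        2048000 kB\nMemAvailable:    1048576 kB\n"
size = suggest_memtester_size(sample)
print(f"sudo memtester {size} 1")  # one pass over the suggested amount
#+END_SRC

On a real machine you would pass the content of ~/proc/meminfo~ instead
of the sample string. The trailing ~1~ is memtester's loop count; without
it memtester keeps looping until you stop it.
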
** How to fix broken RAM
#+ATTR_HTML: :loading lazy
[[file:../../../public/images/thinkpad_ram_repair.jpg]]

First of all, if Memtest86+ and/or memtester don't show any errors,
congratulations! Your RAM is most likely fine.

If they show a small number of errors, like one or two, you can make
the Linux kernel ignore those segments of RAM, so that programs never
use them and keep working reliably. To do that, use the ~memmap~ kernel
argument in your GRUB configuration. Each entry has the form
~size$address~, and hexadecimal numbers are written with a ~0x~ prefix.
For example:
~memmap=0x100000$0x762ce9c38420,0x100000$0x34e03060,0x100000$0x87fce060,0x100000$0x23c63060,0x100000$0x87b6c060~. There
is also a GRUB config option called ~GRUB_BADRAM~, but it looks like it
is deprecated and ~memmap~ is preferred.

For more details about blacklisting bad segments of RAM, read [[https://unix.stackexchange.com/questions/75059/how-to-blacklist-a-correct-bad-ram-sector-according-to-memtest86-error-indicati][this
comprehensive Unix & Linux Stack Exchange answer]].

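Writing a long ~memmap~ argument by hand is error-prone, so here is a
small Python sketch that builds one from the failing addresses you wrote
down. The function name ~build_memmap_arg~ is hypothetical, and the
choice to reserve one aligned 1MiB region around each address is my
assumption; the example above instead starts each region at the failing
address itself. Treat it as a starting point, not as the one correct way
to do it.

#+BEGIN_SRC python
def build_memmap_arg(bad_addresses, region_size=0x100000):
    """Build a memmap= kernel argument covering every bad address.

    Each address is rounded down to a multiple of region_size, so the
    reserved region [start, start + region_size) contains it. Addresses
    that fall into the same region are merged into one entry.
    """
    starts = sorted({addr - addr % region_size for addr in bad_addresses})
    entries = [f"{region_size:#x}${start:#x}" for start in starts]
    return "memmap=" + ",".join(entries)


# Addresses taken from the example above; the second one is a made-up
# neighbour that lands in the same 1MiB region as the first.
print(build_memmap_arg([0x34e03060, 0x34e03fff, 0x23c63060]))
#+END_SRC

When the result goes into a GRUB configuration file, the ~$~ character
usually has to be escaped; the linked answer discusses that in detail.
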
If they show a huge number of errors, like many thousands, it probably
means that one of your sticks of RAM is broken. To find out which one,
you can look at the failing addresses, or run the test again with only
specific stick(s) of RAM installed and see whether the errors are gone.

Be aware that if you are left with one RAM stick, there is a chance
that the machine will only boot with it in a specific RAM slot. Read
your motherboard manual if something doesn't work.

If you have a RAM stick with tons of errors, you can try to repair it.
I can't tell you exactly how it is done or why it is done that way. You
can find videos on fixing RAM sticks on YouTube and other resources.
Here is [[https://youtu.be/KVR91p-Bd6M][a link to one such video]].

** RAM optimizations for a GNU+Linux system
If your RAM was broken and you are left with much less memory than you
expected, don't rush to buy new RAM sticks. There is a chance that the
system will work completely fine even with less RAM. Linux is pretty
good at running on low-end machines and has several ways to cope with a
lack of memory. These days hardware is often far more powerful than its
tasks require, like a gaming laptop with a very powerful CPU and GPU
that is mostly used to render text in a text editor.

*** Swap
Swap is a partition on your hard drive that is used when there is no
RAM left. It is used for other reasons too, and having such a partition
is recommended on most GNU+Linux systems.

You can configure how readily the Linux system uses swap by changing
~swappiness~. You can read about changing that setting, and about swap
in general, at the link below.

[[[https://wiki.archlinux.org/title/swap][Arch Linux Wiki: Swap]]]

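Before tuning ~swappiness~ it helps to know whether the system is
swapping at all. The sketch below computes swap usage from the
~SwapTotal~ and ~SwapFree~ fields of ~/proc/meminfo~; the function name
~swap_usage~ is made up for this example. The current ~swappiness~ value
itself can be read from ~/proc/sys/vm/swappiness~.

#+BEGIN_SRC python
def swap_usage(meminfo_text):
    """Return (used_kib, total_kib) of swap from /proc/meminfo content."""
    wanted = {"SwapTotal": 0, "SwapFree": 0}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted and rest.split():
            wanted[key] = int(rest.split()[0])  # the value is reported in kB
    total = wanted["SwapTotal"]
    return total - wanted["SwapFree"], total


# Example with a made-up excerpt of /proc/meminfo.
sample = "MemTotal: 2048000 kB\nSwapTotal: 16777212 kB\nSwapFree: 16000000 kB\n"
used, total = swap_usage(sample)
print(f"swap used: {used} kB of {total} kB")
#+END_SRC
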
*** Zram
Zram sits between RAM and swap in terms of performance: it keeps
swapped-out pages compressed in RAM instead of writing them to disk. It
helps your system stay responsive if it swaps a lot, but it increases
CPU usage because of the compression. I use Zram on a machine with 2GiB
of RAM and 16GiB of swap, and it works great even with many programs
open at the same time (text editor, browser, Docker container,
messenger).

[[[https://wiki.archlinux.org/title/Improving_performance#zram_or_zswap][Arch Linux Wiki: Zram]]]

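To check whether Zram pays off on your machine, you can look at how well
your data compresses. The kernel exposes statistics for each zram device
in ~/sys/block/zram0/mm_stat~ (assuming the device is called ~zram0~);
the first two numbers are the original and the compressed data size in
bytes. The helper name ~zram_compression_ratio~ below is hypothetical.

#+BEGIN_SRC python
def zram_compression_ratio(mm_stat_text):
    """Return original size / compressed size from mm_stat, or None.

    mm_stat is a single line of whitespace-separated integers; the first
    two are orig_data_size and compr_data_size, both in bytes.
    """
    fields = [int(value) for value in mm_stat_text.split()]
    orig_data_size, compr_data_size = fields[0], fields[1]
    if compr_data_size == 0:
        return None  # nothing has been stored in the device yet
    return orig_data_size / compr_data_size


# Example with made-up numbers in the mm_stat layout.
print(zram_compression_ratio("1000 250 300 0 300 0 0 0"))
#+END_SRC

A ratio well above 1 means that the compressed pages take much less
space than they would uncompressed, which is the case where Zram helps
most.
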
*** Less bloated software
Alternatively, you can simply use less bloated software, so you don't
need so much RAM in the first place. In many cases good software
doesn't require a lot of RAM, while bad software tends to leak memory,
so you need many GiBs of RAM to use it properly. The most bloated
software is web browsers such as Chromium and Firefox, and
browser-based apps built on Electron such as Slack, VSCode and other
proprietary products.

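To see which programs actually eat your RAM, you can rank processes by
resident memory. The sketch below reads the ~Name~ and ~VmRSS~ fields
from ~/proc/<pid>/status~; the function names are made up for this
example, and keep in mind that resident memory counts shared pages once
per process, so the numbers overstate the real cost of multi-process
programs such as browsers.

#+BEGIN_SRC python
from pathlib import Path


def parse_status(status_text):
    """Return (name, rss_kib) from the content of /proc/<pid>/status."""
    name, rss_kib = None, None
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss_kib = int(line.split()[1])  # the value is reported in kB
    return name, rss_kib


def top_memory_users(limit=5, proc=Path("/proc")):
    """Return up to `limit` (rss_kib, name) pairs, largest first."""
    results = []
    for status in proc.glob("[0-9]*/status"):
        try:
            name, rss_kib = parse_status(status.read_text())
        except OSError:
            continue  # the process exited while we were reading it
        if rss_kib is not None:  # kernel threads have no VmRSS line
            results.append((rss_kib, name))
    return sorted(results, reverse=True)[:limit]
#+END_SRC

Calling ~top_memory_users()~ on a running GNU+Linux system returns the
five largest processes; anything that surprises you in that list is a
candidate for replacement.
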
** Conclusions
Now you know what to do when you suspect a RAM failure. The same
knowledge is also useful for testing used memory sticks that you buy
from someone else.