Discussion:
Server with 3TB Crashing at boot
Michael Fuckner
2015-03-11 13:46:04 UTC
Hi,

I have a server with 4 Xeon E7-8857 v2 CPUs and 96x32GB DIMMs, organized as 8
memory risers with 12 DIMMs each. With 2 risers I can boot the system
with FreeBSD 10.1-p6; when adding a third riser (above 1TB RAM) the
system crashes on boot.
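
The riser arithmetic can be checked with a short sketch. It assumes 32 GB DIMMs (96 of them give the 3TB in the subject line); the function name is made up for illustration.

```python
# Assumption: each DIMM is 32 GB, so 96 DIMMs give the 3 TB from the subject.
DIMM_GIB = 32
DIMMS_PER_RISER = 12
RISERS = 8


def installed_gib(risers: int) -> int:
    """RAM in GiB with the given number of risers populated."""
    return risers * DIMMS_PER_RISER * DIMM_GIB


for n in (2, 3, RISERS):
    gib = installed_gib(n)
    print(f"{n} risers: {gib} GiB, above 1 TiB: {gib > 1024}")
```

Under that assumption, two risers stay below 1 TiB and the third riser is the first step past it, which matches the point at which the crash starts.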


http://dedi3.fuckner.net/~molli123/temp/3tb_ap1_phy2.png
http://dedi3.fuckner.net/~molli123/temp/3tb_crash.avi


Here are the dmesg and the verbose dmesg, but I believe the verbose one got
truncated. Is there anything more I can do than boot verbose and
copy /var/run/dmesg.boot?

http://dedi3.fuckner.net/~molli123/temp/dmesg_q71l-4u_verbose.txt
http://dedi3.fuckner.net/~molli123/temp/dmesg_q71l-4u_10.1.txt

Regards,
Michael!

PS: already posted to freebsd-amd64, but I was told these lists are better.
Adrian Chadd
2015-03-11 15:33:42 UTC
Hm, have you tried with just one TB of RAM? I haven't had access to
systems with 3TB of RAM - I'm just about to get 1TB in a box. :)

Hm, other hackers - what's the current size of the AMD64 direct map?



-adrian
Oliver Pinter
2015-03-11 15:45:03 UTC
Post by Adrian Chadd
Hm, have you tried with just one TB of RAM? I haven't had access to
systems with 3TB of RAM - I'm just about to get 1TB in a box. :)
Hm, other hackers - what's the current size of the AMD64 direct map?
4TB - https://github.com/freebsd/freebsd/blob/master/sys/amd64/include/vmparam.h#L157
Julian Elischer
2015-03-11 19:13:52 UTC
Post by Oliver Pinter
Post by Adrian Chadd
Hm, have you tried with just one TB of RAM? I haven't had access to
systems with 3TB of RAM - I'm just about to get 1TB in a box. :)
Hm, other hackers - what's the current size of the AMD64 direct map?
4TB - https://github.com/freebsd/freebsd/blob/master/sys/amd64/include/vmparam.h#L157
Yeah, but since the direct map is, well, directly mapped, you might have
3TB of RAM but it might be spread over a larger address range.
There may be holes in it. It would be worth knowing the apparent
layout of the RAM.
Michael Fuckner
2015-03-12 09:32:55 UTC
Post by Oliver Pinter
Post by Adrian Chadd
Hm, have you tried with just one TB of RAM? I haven't had access to
systems with 3TB of RAM - I'm just about to get 1TB in a box. :)
Hm, other hackers - what's the current size of the AMD64 direct map?
4TB -
https://github.com/freebsd/freebsd/blob/master/sys/amd64/include/vmparam.h#L157
yeah but since direct-map is, well, directly mapped, you might have 3TB
of ram but it might be spread over a larger range.
there may be holes in it.. it would be worth knowing the apparent
layout of the ram.
Is this what you are looking for (from openSUSE 13.2)?


http://dedi3.fuckner.net/~molli123/temp/dmesg.smp.disabled.txt
http://dedi3.fuckner.net/~molli123/temp/dmesg-s4l_opensuse13.2.txt


[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x00000000000997ff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000099800-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000784affff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000784b0000-0x0000000078c63fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000078c64000-0x0000000078ca6fff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x0000000078ca7000-0x000000007a268fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x000000007a269000-0x000000007bdc3fff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000007bdc4000-0x000000007bdc4fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007bdc5000-0x000000007be4afff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000007be4b000-0x000000007bffffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000080000000-0x000000008fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000003007fffffff] usable
Julian Elischer
2015-03-12 18:39:41 UTC
Post by Michael Fuckner
Post by Oliver Pinter
Post by Adrian Chadd
Hm, have you tried with just one TB of RAM? I haven't had access to
systems with 3TB of RAM - I'm just about to get 1TB in a box. :)
Hm, other hackers - what's the current size of the AMD64 direct map?
4TB -
https://github.com/freebsd/freebsd/blob/master/sys/amd64/include/vmparam.h#L157
yeah but since direct-map is, well, directly mapped, you might have 3TB
of ram but it might be spread over a larger range.
there may be holes in it.. it would be worth knowing the apparent
layout of the ram.
is it this you are looking for (from OpenSUSE 13.2)?
http://dedi3.fuckner.net/~molli123/temp/dmesg.smp.disabled.txt
http://dedi3.fuckner.net/~molli123/temp/dmesg-s4l_opensuse13.2.txt
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x00000000000997ff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000099800-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000784affff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000784b0000-0x0000000078c63fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000078c64000-0x0000000078ca6fff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x0000000078ca7000-0x000000007a268fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x000000007a269000-0x000000007bdc3fff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000007bdc4000-0x000000007bdc4fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007bdc5000-0x000000007be4afff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000007be4b000-0x000000007bffffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000080000000-0x000000008fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000003007fffffff] usable
OK, it looks like it is in one big chunk, so that is not an issue.
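
The "one big chunk" reading can be checked roughly with the sketch below. It parses the usable ranges from the e820 map quoted above and compares the highest usable address with the 4TB direct-map figure given earlier in the thread. The names are illustrative, and the 4TB limit is taken from the thread, not verified against the kernel source.

```python
import re

# Usable ranges copied from the e820 map quoted above.
E820_USABLE = """\
[mem 0x0000000000000000-0x00000000000997ff] usable
[mem 0x0000000000100000-0x00000000784affff] usable
[mem 0x000000007bdc4000-0x000000007bdc4fff] usable
[mem 0x000000007be4b000-0x000000007bffffff] usable
[mem 0x0000000100000000-0x000003007fffffff] usable
"""

TIB = 1 << 40
DIRECT_MAP_LIMIT = 4 * TIB  # assumption: the 4TB figure quoted in the thread

RANGE_RE = re.compile(r"\[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\]\s+usable")


def usable_ranges(text):
    """Return (start, end_exclusive) pairs for every usable e820 range."""
    return [(int(a, 16), int(b, 16) + 1) for a, b in RANGE_RE.findall(text)]


ranges = usable_ranges(E820_USABLE)
total = sum(end - start for start, end in ranges)
largest = max(end - start for start, end in ranges)
top = max(end for _, end in ranges)

print(f"usable total : {total / TIB:.3f} TiB")
print(f"largest chunk: {largest / TIB:.3f} TiB")
print(f"top address  : {top:#x} ({top / TIB:.3f} TiB)")
print(f"fits under 4 TiB: {top <= DIRECT_MAP_LIMIT}")
```

Nearly all usable memory sits in the last range, and its end is about 2 GiB above the 3 TiB mark, so the layout alone does not push RAM past a 4TB direct map.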
Adrian Chadd
2015-03-12 19:30:56 UTC
Right. Try booting it with SMP disabled but with all the RAM.

I think it's 'kern.smp.disabled=1' at the bootloader, then 'boot -v'


-a
Garrett Cooper
2015-03-13 07:36:45 UTC
Post by Adrian Chadd
Right. Try booting it with SMP disabled but with all the RAM.
Is this possible? Can one CPU see the RAM on another CPU if that CPU
isn't enabled in the kernel?
I could be wrong, but I think Adrian's recommending that the number of variables be reduced so the root cause could be better isolated.

Thanks!
Konstantin Belousov
2015-03-13 07:52:30 UTC
Post by Adrian Chadd
Right. Try booting it with SMP disabled but with all the RAM.
Is this possible? Can one CPU see the RAM on another CPU if that CPU
isn't enabled in the kernel?
Yes, of course. The disabled state means that the core is never started,
so it does not execute any instruction stream. The memory controller,
address decoder, inter-socket links, and PCIe links are configured by
the motherboard firmware during POST. Our kernel leaves them alone.
Michael Fuckner
2015-03-13 17:50:42 UTC
Post by Adrian Chadd
Right. Try booting it with SMP disabled but with all the RAM.
I think it's 'kern.smp.disabled=1' at the bootloader, then 'boot -v'
My problem seems to be my /boot/loader.conf.

Which is the correct option for verbose booting?
When loading nvme/nvd/zfs it crashes.
Why is mpr1 detected after loading nvme? (I boot from mpr0.)


hw.memtest.tests=0
#kern.smp.disabled=1
console=comconsole
boot_verbose="YES"
verbose_loading="YES"
zfs_load="NO"
nvme_load="NO"
nvd_load="NO"

http://dedi3.fuckner.net/~molli123/temp/kldload-nvme_nvd_zfs.txt
http://dedi3.fuckner.net/~molli123/temp/freebsd-10.1_smp-disabled_verbose.txt
http://dedi3.fuckner.net/~molli123/temp/freebsd-10.1_smp-enabled_verbose.txt


Any idea?

Regards,
Michael!
Adrian Chadd
2015-03-13 18:02:38 UTC
Post by Michael Fuckner
Post by Adrian Chadd
Right. Try booting it with SMP disabled but with all the RAM.
I think it's 'kern.smp.disabled=1' at the bootloader, then 'boot -v'
My problem seems to be my /boot/loader.conf.
Which is the correct option for verbose booting?
When loading nvme/nvd/zfs it crashes.
Why is mpr1 detected after loading nvme? (I boot from mpr0.)
hw.memtest.tests=0
#kern.smp.disabled=1
console=comconsole
boot_verbose="YES"
verbose_loading="YES"
zfs_load="NO"
nvme_load="NO"
nvd_load="NO"
http://dedi3.fuckner.net/~molli123/temp/kldload-nvme_nvd_zfs.txt
http://dedi3.fuckner.net/~molli123/temp/freebsd-10.1_smp-disabled_verbose.txt
http://dedi3.fuckner.net/~molli123/temp/freebsd-10.1_smp-enabled_verbose.txt
Any idea?
Hi,

boot_verbose=YES looks to be right.

So hm, just to be clear - it boots fine if you don't load zfs/nvme/nvd?


-adrian
Michael Fuckner
2015-03-13 19:50:41 UTC
Hi!
Post by Adrian Chadd
boot_verbose=YES looks to be right.
***@s4l:~ # grep verbose /boot/defaults/loader.conf
verbose_loading="NO" # Set to YES for verbose loader output

OK, this confused me :-(
Post by Adrian Chadd
So hm, just to be clear - it boots fine if you don't load zfs/nvme/nvd?
Yes, probably zfs. But why does loading nvme also bring up other devices
like mpr?
Chris H
2015-03-13 20:26:32 UTC
Post by Michael Fuckner
Hi!
Post by Adrian Chadd
boot_verbose=YES looks to be right.
verbose_loading="NO" # Set to YES for verbose loader output
OK, this confused me :-(
In case it's still not clear:
The values in your copy of loader.conf, located at /boot/loader.conf,
override the values set in /boot/defaults/loader.conf.

--Chris
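
The override rule can be illustrated with a simplified model. This is not the loader's actual parser. It reads name=value lines and lets later files win. The defaults excerpt contains only the one line shown in the thread, and the local excerpt is taken from the loader.conf posted earlier.

```python
def parse_conf(text):
    """Parse simple name=value lines; '#' starts a comment."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings


# Excerpt of /boot/defaults/loader.conf: only the line quoted in the thread.
DEFAULTS = 'verbose_loading="NO"  # Set to YES for verbose loader output\n'

# Excerpt of /boot/loader.conf as posted earlier in the thread.
LOCAL = 'boot_verbose="YES"\nverbose_loading="YES"\n#kern.smp.disabled=1\n'

# Later dictionaries win, mirroring "loader.conf overrides the defaults".
effective = {**parse_conf(DEFAULTS), **parse_conf(LOCAL)}
print(effective)
```

The model also shows why the two names are easy to confuse: verbose_loading and boot_verbose are separate settings, and the thread identifies boot_verbose as the one for a verbose boot.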
Michael Fuckner
2015-03-13 20:38:49 UTC
Post by Chris H
Hi!
Post by Adrian Chadd
boot_verbose=YES looks to be right.
verbose_loading="NO" # Set to YES for verbose loader output
OK, this confused me :-(
The values in your copy of loader.conf, located at /boot/loader.conf,
override the values set in /boot/defaults/loader.conf.
that is clear to me, but I was confused by verbose_loading vs boot_verbose
Chris H
2015-03-13 23:59:51 UTC
Post by Michael Fuckner
On Fri, 13 Mar 2015 20:49:44 +0100 (CET), Michael Fuckner wrote:
Hi!
Post by Adrian Chadd
boot_verbose=YES looks to be right.
verbose_loading="NO" # Set to YES for verbose loader output
OK, this confused me :-(
The values in your copy of loader.conf, located at /boot/loader.conf,
override the values set in /boot/defaults/loader.conf.
that is clear to me, but I was confused by verbose_loading vs boot_verbose
LOL in all honesty, that one got me too, at first. :-/
It *is* confusing.

--Chris
Dirk-Willem van Gulik
2015-03-11 17:08:29 UTC
Post by Adrian Chadd
Hm, have you tried with just one TB of RAM? I haven't had access to
systems with 3TB of RAM - I'm just about to get 1TB in a box. :)
Hm, other hackers - what's the current size of the AMD64 direct map?
Since 10.0, https://svnweb.freebsd.org/base?view=revision&revision=254466 has made 2TB work reliably for me.

Dw.

*: See also the 10.0 release notes at https://www.freebsd.org/releases/10.0R/relnotes.html: "[amd64] The maximum amount of memory the FreeBSD kernel can address has been increased from 1TB to 4TB."
Neel Natu
2015-03-11 17:35:01 UTC
Hi Michael,
Hi,
I have a server with 4 Xeon E7-8857 v2 CPUs and 96x32GB DIMMs, organized as 8
memory risers with 12 DIMMs each. With 2 risers I can boot the system with
FreeBSD 10.1-p6; when adding a third riser (above 1TB RAM) the system crashes
on boot.
Can you try to boot the system with SMP disabled and more than 2
risers populated?

LOADER> set kern.smp.disabled=1

best
Neel
Michael Fuckner
2015-03-12 09:30:45 UTC
Post by Neel Natu
Hi Michael,
Hi,
I have a server with 4 Xeon E7-8857 v2 CPUs and 96x32GB DIMMs, organized as 8
memory risers with 12 DIMMs each. With 2 risers I can boot the system with
FreeBSD 10.1-p6; when adding a third riser (above 1TB RAM) the system crashes
on boot.
Can you try to boot the system with SMP disabled and more than 2
risers populated?
LOADER> set kern.smp.disabled=1
With 3 risers it boots; with 8 it crashes (but I couldn't read the error
message fast enough). And with 3 modules USB was not working anymore.
Is the USB controller perhaps attached to another CPU?

http://dedi3.fuckner.net/~molli123/temp/dmesg.smp.disabled.txt
http://dedi3.fuckner.net/~molli123/temp/dmesg-s4l_opensuse13.2.txt