Discussion:
Reset Problem with SATA Port Multiplier
Bob Bawn
2013-07-22 14:36:26 UTC
Hello,

I'm testing high-density SATA storage with FreeBSD 9.1-STABLE. The
hardware is:

Drives: 45 * Seagate Altos ST3000NC002
Port Multipliers: 9 * SiI3826
SATA Controller: 3 * Marvell 88SX7042


After a few hours of a database-like workload over ZFS (NCQ enabled, disk
write caches disabled), a disk becomes unresponsive (we think due to a
drive firmware problem):

Jun 14 21:39:54 adlax12st002 root: sysbench tests are now underway
Jun 15 12:12:07 adlax12st002 kernel: mvsch1: SNTF 15
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: Timeout on slot 12
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: iec 00000000 sstat 00000123 serr 00400000 edma_s 00000024 dma_c 10000708 dma_s 00000008 rs 08c81408 status 40
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: ... waiting for slots 08c80408
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: Timeout on slot 3
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: iec 00000000 sstat 00000123 serr 00400000 edma_s 00000024 dma_c 10000708 dma_s 00000008 rs 08c81408 status 40
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: ... waiting for slots 08c80400
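The hex values in these messages can be decoded by hand. The following is a minimal sketch, assuming that "rs" and "waiting for slots" are bitmaps with one bit per outstanding command slot (the arithmetic in the log is consistent with that reading, but it has not been checked against the mvs(4) source) and that "sstat" is the standard SATA SStatus register. The helper names are made up for illustration.

```python
def set_bits(mask):
    """Return the positions of the bits set in a 32-bit mask, lowest first."""
    return [bit for bit in range(32) if mask & (1 << bit)]


def decode_sstatus(sstat):
    """Split a SATA SStatus value into its DET, SPD and IPM fields."""
    return {
        "det": sstat & 0xF,         # 3 = device present, PHY communication established
        "spd": (sstat >> 4) & 0xF,  # 2 = Gen2 (3.0 Gbit/s) link speed negotiated
        "ipm": (sstat >> 8) & 0xF,  # 1 = interface in active state
    }


# Values copied from the log lines above.
rs = 0x08C81408

# Assumption: each set bit is a command slot that was still outstanding.
print(set_bits(rs))

# Clearing the timed-out slot reproduces each "waiting for slots" value.
after_slot_12 = rs & ~(1 << 12)
after_slot_3 = after_slot_12 & ~(1 << 3)
print(f"{after_slot_12:08x}")  # matches "waiting for slots 08c80408"
print(f"{after_slot_3:08x}")   # matches "waiting for slots 08c80400"

print(decode_sstatus(0x00000123))
```

Read this way, seven commands were in flight on the channel when the first timeout fired, and the link itself still reported an established, active connection.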

After a few timeout/reset cycles, the afflicted device is removed:

Jun 15 12:13:41 adlax12st002 kernel: (aprobe1:mvsch1:0:1:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jun 15 12:13:41 adlax12st002 kernel: (aprobe1:mvsch1:0:1:0): CAM status: Command timeout
Jun 15 12:13:41 adlax12st002 kernel: (aprobe1:mvsch1:0:1:0): Error 5, Retry was blocked
Jun 15 12:13:41 adlax12st002 kernel: (ada6:mvsch1:0:1:0): lost device
Jun 15 12:13:41 adlax12st002 kernel: (pass7:mvsch1:0:1:0): lost device
Jun 15 12:13:41 adlax12st002 kernel: (pass7:mvsch1:0:1:0): removing device entry
Jun 15 12:13:41 adlax12st002 kernel: mvsch1: MVS reset: device ready after 500ms


All of that seems like reasonable OS behavior when a drive is
unresponsive. In fact Linux/CentOS/ZoL behaves pretty much the same up
to this point.

The problem is that the other four drives behind the port multiplier
start timing out and get removed, one at a time, in target order, over
the next few minutes:

# grep "lost device" adlax12st002-messages.log
Jun 15 12:13:41 adlax12st002 kernel: (ada6:mvsch1:0:1:0): lost device
Jun 15 12:13:41 adlax12st002 kernel: (pass7:mvsch1:0:1:0): lost device
Jun 15 12:16:16 adlax12st002 kernel: (ada7:mvsch1:0:2:0): lost device
Jun 15 12:16:16 adlax12st002 kernel: (pass8:mvsch1:0:2:0): lost device
Jun 15 12:18:50 adlax12st002 kernel: (ada8:mvsch1:0:3:0): lost device
Jun 15 12:18:50 adlax12st002 kernel: (pass9:mvsch1:0:3:0): lost device
Jun 15 12:22:23 adlax12st002 kernel: (ada9:mvsch1:0:4:0): lost device
Jun 15 12:22:23 adlax12st002 kernel: (pass10:mvsch1:0:4:0): lost device
Jun 15 12:26:57 adlax12st002 kernel: (ada5:mvsch1:0:0:0): lost device
Jun 15 12:26:57 adlax12st002 kernel: (pass6:mvsch1:0:0:0): lost device
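To make the removal order and the spacing between removals easier to see, here is a minimal sketch that parses the "lost device" lines above. The regular expression, the function names and the hard-coded year are assumptions made for illustration; syslog lines carry no year, so 2013 is taken from the date of this post.

```python
import re
from datetime import datetime

# The grep output above, copied verbatim.
LOG = """\
Jun 15 12:13:41 adlax12st002 kernel: (ada6:mvsch1:0:1:0): lost device
Jun 15 12:13:41 adlax12st002 kernel: (pass7:mvsch1:0:1:0): lost device
Jun 15 12:16:16 adlax12st002 kernel: (ada7:mvsch1:0:2:0): lost device
Jun 15 12:16:16 adlax12st002 kernel: (pass8:mvsch1:0:2:0): lost device
Jun 15 12:18:50 adlax12st002 kernel: (ada8:mvsch1:0:3:0): lost device
Jun 15 12:18:50 adlax12st002 kernel: (pass9:mvsch1:0:3:0): lost device
Jun 15 12:22:23 adlax12st002 kernel: (ada9:mvsch1:0:4:0): lost device
Jun 15 12:22:23 adlax12st002 kernel: (pass10:mvsch1:0:4:0): lost device
Jun 15 12:26:57 adlax12st002 kernel: (ada5:mvsch1:0:0:0): lost device
Jun 15 12:26:57 adlax12st002 kernel: (pass6:mvsch1:0:0:0): lost device
"""

# Device path fields are name:controller:bus:target:lun.
PATTERN = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: "
    r"\((ada\d+):(\w+):(\d+):(\d+):(\d+)\): lost device$"
)


def lost_devices(text, year=2013):
    """Return (timestamp, device, target) for each ada "lost device" line."""
    events = []
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if not match:
            continue  # skips the duplicate pass(4) lines and anything unrelated
        when = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        events.append((when, match.group(2), int(match.group(5))))
    return events


def gaps_in_seconds(events):
    """Return the time between consecutive removals, in seconds."""
    times = [when for when, _, _ in events]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


events = lost_devices(LOG)
for when, device, target in events:
    print(when.time(), device, "target", target)
print(gaps_in_seconds(events))
```

For this log the targets are removed in the order 1, 2, 3, 4, 0, with gaps of 155, 154, 213 and 274 seconds between removals.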

It looks like the timeout/reset/recovery sequence for the initial frozen
disk has somehow broken connectivity to all the drives behind the port
multiplier. This part does not happen on Linux. Sometimes the entire
machine is locked up after the "lost device" sequence. In all cases, a
full power cycle is required to make the devices available again. When I
soft reset the box over IPMI, the boot process gets stuck in a loop with
"mvsch2: MVS reset" and "mvsch2: Wait status d0".

Full /var/log/messages are at:

http://pastebin.com/xCJyfvSN

Unfortunately, I failed to grab the dmesg output and the box has since
been re-imaged. Here is a dmesg from a machine which I believe to be
identical to the test box:

http://pastebin.com/NYjezuMX

/var/log/messages for the CentOS/Linux case is at:

http://pastebin.com/qrWm0HJ0

Maybe this is a topic for a different post, but has anybody successfully
used high-density port-multiplied SATA platforms with FreeBSD? I've
heard lots of anecdotes about hardware and/or driver flakiness (like the
above), undocumented hardware, etc. (Actually, I've heard similar
complaints from Linux folks.) SAS machines seem to handle this workload
without any problems. We have tried 9.1-RELEASE and the behavior was
worse.

We're actually more interested in archive type workloads than this
database workload and we have not observed the problem with an archive
workload. However, we're worried that general single-drive failures
could turn into unavailability of five drives regardless of workload.

Any guidance would be appreciated.

Thanks!
Bob Bawn
Dieter BSD
2013-07-22 23:17:28 UTC
Post by Bob Bawn
Drives: 45 * Seagate Altos ST3000NC002
Port Multipliers: 9 * SiI3826
SATA Controller: 3 * Marvell 88SX7042
After a few hours of a database-like workload over ZFS (NCQ enabled, disk
write caches disabled), a disk becomes unresponsive (we think due to a
I have an 8.2 machine with SiI3132 controllers and a SiI3726 pm with a variety
of drives. I have been getting "Timeout on slot <small integer>"
followed by "lost device". Sometimes the device reappears (although
the /dev/ufs/label does *not* reappear :-( ). I have not seen the other
drives on the pm get removed, nor have I had to power cycle to recover.
Seagate ST3000DM001 drives with CC4B firmware seem especially bad;
ST3000DM001 drives with CC24 firmware have been OK. So your theory that
the drive firmware has a problem seems promising.

Sounds like FreeBSD is doing something bad to the pm, which Linux
isn't doing. Perhaps log the commands the OS sends to the
controller (over the network to a 2nd machine, or to a local
disk not on a pm) and compare BSD to Linux? Perhaps start
logging when you get the first timeout, to save hours of commands
to wade through.

Alternately you could stare at the driver sources until enlightenment
occurs.

AFAIK FreeBSD has never gotten a proper workaround for the quirk in
the first-generation SiI SATA controllers, while they run fine on NetBSD.
There might be a bug/quirk in the pm's firmware that FreeBSD triggers
but Linux doesn't.
Lev Serebryakov
2013-07-25 11:13:41 UTC
Hello, Bob.
You wrote on 22 July 2013, 18:35:57:

BB> Drives: 45 * Seagate Altos ST3000NC002
BB> Port Multipliers: 9 * SiI3826
BB> SATA Controller: 3 * Marvell 88SX7042
I've heard that only the SiI3132 (2-port controller) works really well with
port multipliers.
But we should wait for an answer from Alexander Motin (mav@) to be sure :)
--
// Black Lion AKA Lev Serebryakov <***@FreeBSD.org>
Dieter BSD
2013-07-25 21:42:39 UTC
Post by Lev Serebryakov
I've heard, that only SiI3132 (2 port controller) works really well with
port multipliers
I've heard the theory that you are better off matching a Silicon Image
port multiplier with a Silicon Image controller (e.g. 3132 or 3124),
but with no data to back it up. For me, the SiI3726 pm seems to work as
well with a JMB363 (ahci(4)) controller as with a SiI3132 (siis(4)).

My theory is that as long as everything is working normally,
it works fine. But if you get some glitch (caused by hardware,
firmware, whatever), the recovery may not be as smooth as it
should be, or it may not recover at all.

Anyone have a way to inject various SATA faults on demand?

BTW, would the -drivers list be a better place to discuss this?
Sam Fourman Jr.
2013-07-26 00:07:18 UTC
Post by Dieter BSD
Post by Lev Serebryakov
I've heard, that only SiI3132 (2 port controller) works really well with
port multipliers
I've heard the theory that you are better off matching a Silicon Image
port multiplier with a Silicon Image controller (e.g. 3132 or 3124),
but with no data to back it up. For me, the SiI3726 pm seems to work as
well with a JMB363 (ahci(4)) controller as with a SiI3132 (siis(4)).
I can confirm I have this SAME exact problem on -HEAD ... both of my
mirrored ZFS disks disappear seemingly at random during a build world....
there is a dmesg for the motherboard in question, in a different post... I
also have pciconf output... I hope there is a patch for this problem...
--
Sam Fourman Jr.
Sam Fourman Jr.
2013-07-26 00:07:45 UTC
Post by Sam Fourman Jr.
Post by Dieter BSD
Post by Lev Serebryakov
I've heard, that only SiI3132 (2 port controller) works really well with
port multipliers
I've heard the theory that you are better off matching a Silicon Image
port multiplier with a Silicon Image controller (e.g. 3132 or 3124),
but with no data to back it up. For me, the SiI3726 pm seems to work as
well with a JMB363 (ahci(4)) controller as with a SiI3132 (siis(4)).
I can confirm I have this SAME exact problem on -HEAD ... both of my
mirrored ZFS disks disappear seemingly at random during a build world....
there is a dmesg for the motherboard in question, in a different post... I
also have pciconf output... I hope there is a patch for this problem...
--
forgot to paste the link :)
http://lists.freebsd.org/pipermail/freebsd-current/2013-July/043239.html
Post by Sam Fourman Jr.
--
Sam Fourman Jr.
Bob Bawn
2013-07-26 22:16:55 UTC
Sounds like FreeBSD is doing something bad to the pm, which Linux isn't
doing. Perhaps log the commands the OS sends to the controller (over the
network to a 2nd machine, or to a local disk not on a pm) and compare
BSD to Linux? Perhaps start logging when you get the first timeout, to
save hours of commands to wade through.
Yes, that occurred to me. I was hoping to avoid learning how to build
kernels with debug messages but I suppose it's a good skill to have. :-)
Alternately you could stare at the driver sources until enlightenment
occurs.
I did a little of this and superficially it does seem like there could
be differences between FreeBSD and Linux in the treatment of the
mysterious 6th port (SEMB) on the 5-port multiplier. Hopefully, the
logging you suggest will clarify the situation.

Thanks for your help.
Sam Fourman Jr.
2013-07-26 22:30:05 UTC
Post by Bob Bawn
I did a little of this and superficially it does seem like there could
be differences between FreeBSD and Linux in the treatment of the
mysterious 6th port (SEMB) on the 5-port multiplier. Hopefully, the
logging you suggest will clarify the situation.
Thanks for your help.
If you have a script or a way to build a kernel to help debug this, I will
run it if you post it here. I have the same issue on a 3-port multiplier
using -HEAD.
--
Sam Fourman Jr.
Dieter BSD
2013-07-28 00:08:40 UTC
Post by Bob Bawn
After a few hours of a database-like workload
A faster way to trigger the problem would be useful.
Post by Bob Bawn
We're actually more interested in archive type workloads than this
database workload and we have not observed the problem with an archive
workload.
So perhaps something about the timing triggers the bug?

Sam writes
Post by Sam Fourman Jr.
if you have a script or a way to build a kernel to help debug this I will
run it if you post it here... I have the same issue on a 3 port multiplier
using -HEAD
Can you share the make and model of this 3 port multiplier?
If it is happening with more than one model of pm, it is more likely
some generic problem, rather than triggering some model-specific quirk/bug.
Has anyone seen this problem with an older OS release? (say 7.x or 8.x?)
If the problem was introduced recently, we might be able to find it
by looking at what changed in the source code. I haven't seen the
problem with 8.2 or earlier.

Looks like a verbose boot will give a little more info.
But I suspect that adding more log(9) statements will be needed.
Unless mav has a better idea?
Alexander Motin
2013-07-28 10:23:59 UTC
Post by Dieter BSD
Post by Bob Bawn
After a few hours of a database-like workload
A faster way to trigger the problem would be useful.
Post by Bob Bawn
We're actually more interested in archive type workloads than this
database workload and we have not observed the problem with an archive
workload.
So perhaps something about the timing triggers the bug?
Sam writes
Post by Sam Fourman Jr.
if you have a script or a way to build a kernel to help debug this I will
run it if you post it here... I have the same issue on a 3 port multiplier
using -HEAD
Can you share the make and model of this 3 port multiplier?
If it is happening with more than one model of pm, it is more likely
some generic problem, rather than triggering some model-specific quirk/bug.
Has anyone seen this problem with an older OS release? (say 7.x or 8.x?)
If the problem was introduced recently, we might be able to find it
by looking at what changed in the source code. I haven't seen the
problem with 8.2 or earlier.
Looks like a verbose boot will give a little more info.
But I suspect that adding more log(9) statements will be needed.
Unless mav has a better idea?
There are two sides to this problem: the original issue and imperfect error
recovery. The first one is a big question. I can't say what is actually
going on there that causes the problem. Just recently I made one more
attempt to get some documentation on SATA controllers from Marvell. But
even after signing an NDA, the process stopped again, since I am not buying
thousands of their chips as a vendor and they do not provide support to
end users. The situation is similar with other vendors.

As for the recovery, the problem is that neither CAM nor the mvs driver
currently tracks the faulty status of devices. So if some disk's firmware
gets stuck and starts causing infinite timeouts, that will substantially
interrupt operation of the other devices sharing that SATA port. The
mechanism of dropping a faulty device could probably be improved somehow.
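
The idea of tracking faulty status can be illustrated with a small sketch. This is purely hypothetical: it is not how CAM or mvs(4) behave today, the class and method names are invented, and the threshold of three consecutive timeouts is an arbitrary assumption. It only shows the bookkeeping that would let a driver give up on one target early instead of repeatedly resetting the port it shares with healthy drives.

```python
from collections import defaultdict


class FaultTracker:
    """Hypothetical per-target timeout bookkeeping for one shared SATA port."""

    def __init__(self, max_consecutive_timeouts=3):
        # Assumed policy: drop a target after this many timeouts in a row.
        self.max_consecutive_timeouts = max_consecutive_timeouts
        self.timeouts = defaultdict(int)
        self.faulty = set()

    def on_timeout(self, target):
        """Record a timeout; return True if the target should be dropped."""
        if target in self.faulty:
            return True
        self.timeouts[target] += 1
        if self.timeouts[target] >= self.max_consecutive_timeouts:
            self.faulty.add(target)
            return True
        return False

    def on_success(self, target):
        """A completed command clears the streak for a healthy target."""
        if target not in self.faulty:
            self.timeouts[target] = 0


tracker = FaultTracker()
for _ in range(3):
    drop = tracker.on_timeout(1)  # target 1 keeps timing out
tracker.on_timeout(2)             # target 2 times out once as collateral
tracker.on_success(2)             # then completes a command normally
print(drop, sorted(tracker.faulty))
```

In this toy run only target 1 ends up marked faulty; target 2's single timeout is forgiven once it completes a command.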

As for SAS, mentioned here: that is a quite different, more
expensive market. And even though the protocols are much more
sophisticated, and the hardware, firmware and software there are much
better tested, situations still sometimes happen where a single
misbehaving device brings down the whole fabric.
--
Alexander Motin
Willem Jan Withagen
2013-07-28 11:23:42 UTC
Post by Alexander Motin
Just recently I made one more
attempt to get some documentation on SATA controllers from Marvell. But
even after signing an NDA, the process stopped again, since I am not buying
thousands of their chips as a vendor and they do not provide support to
end users. The situation is similar with other vendors.
I heard from several board design houses and distributors that Marvell
is among the hardest to get the stuff from. So your story is not unlike
other stories I've heard.

In our previous company we needed support from a rather big chip vendor
as well (Philips NXP), which sort of worked while they were still in
Eindhoven. But once this division got sold, things got a lot worse.

And in the end we had to go through the designer/manufacturer of our
boards and a second-tier distributor. And again everything under the
original NDA.

Perhaps this is something you could also do for this problem: find a
cooperative PM board manufacturer and piggyback on their support with
the promise to "support" their PM boards in FreeBSD.

Motivation for them would have to be that there could be a sales
advantage in selling PM boards to FreeBSDers. And given that we
advertise that ZFS does not need complex/expensive RAID controllers,
this will increase the usage of simpler devices.

The other way to go about it would be to involve the FreeBSD Foundation
and get them to do the legal part as an umbrella for the developers,
and then delegate (again under NDA) work to developers who want to work
on Marvell stuff.

--WjW
