Discussion:
mfi driver performance too bad on LSI MegaRAID SAS 9260-8i
Jason Zhang
2016-06-17 10:54:24 UTC
Hi,

I am working on a storage service based on FreeBSD. I expected good results because many professional storage companies use FreeBSD as their OS, but I am disappointed with the poor performance I am seeing. I tested the LSI MegaRAID 9260-8i and got the following poor results:

1. Test environment:
(1) OS: FreeBSD 10.0 release
(2) Memory: 16G
(3) RAID adapter: LSI MegaRAID 9260-8i
(4) Disks: 9 SAS hard drives (10,000 rpm); each drive performs as expected on its own
(5) Test tool: fio with iodepth=1, 32 threads, and a block size of 64 KB or 1 MB
(6) RAID configuration: RAID 5, stripe size 1 MB

2. Test result:
(1) Write performance is very poor: 20 MB/s throughput and 200 random write IOPS
(2) Read performance is as expected: 700 MB/s throughput and 1500 random read IOPS
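
For reference, the write test was roughly equivalent to an fio invocation like the one below (a sketch only; the device name /dev/mfid0 and the exact job options are illustrative):

-----
# 64 KB sequential writes, 32 threads, queue depth 1, against the raw RAID volume
fio --name=seqwrite --filename=/dev/mfid0 --rw=write --bs=64k \
    --ioengine=psync --iodepth=1 --numjobs=32 --direct=1 \
    --runtime=60 --time_based --group_reporting
-----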


I tested the same hardware configuration with CentOS Linux, and its write performance is 5 times better than FreeBSD's.


Has anyone encountered the same performance problem? Does the mfi driver have a performance issue, or should I give up on FreeBSD?





Jason
Jan Bramkamp
2016-06-17 13:42:44 UTC
Post by Jason Zhang
Hi,
(1) OS: FreeBSD 10.0 release
(2) Memory: 16G
(3) RAID adapter: LSI MegaRAID 9260-8i
(4) Disks: 9 SAS hard drives (10,000 rpm); each drive performs as expected on its own
(5) Test tool: fio with iodepth=1, 32 threads, and a block size of 64 KB or 1 MB
(6) RAID configuration: RAID 5, stripe size 1 MB
(1) Write performance is very poor: 20 MB/s throughput and 200 random write IOPS
(2) Read performance is as expected: 700 MB/s throughput and 1500 random read IOPS
I tested the same hardware configuration with CentOS Linux, and its write performance is 5 times better than FreeBSD's.
Has anyone encountered the same performance problem? Does the mfi driver have a performance issue, or should I give up on FreeBSD?
I have an old server with the same LSI 2108 chip as its onboard HW RAID
controller, but I can't run performance tests on it right now. As far as
I remember the controller was a bit annoying to configure, but it works
well for a SAS HW-RAID controller once configured.

The real question is why you even want to use a hardware RAID
controller. In most cases you're better off with a dumb SAS HBA and ZFS.
Hardware RAID controllers just get in the way. If you want to build a
FreeBSD-based storage system, invest in more RAM or low-latency SSDs as
dedicated ZIL (SLOG) devices instead of hardware RAID controllers.
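
For example (a sketch only; pool and device names are placeholders), a low-latency SSD is added as a dedicated log device like this:

-----
# pool on plain da disks behind an HBA, with an SSD attached as SLOG
zpool create tank raidz2 da0 da1 da2 da3 da4 da5
zpool add tank log da6
-----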

Which filesystem did you use for your benchmark?

Please note that FreeBSD 10.0 is EoL. The latest FreeBSD release is 10.3,
and FreeBSD 11.0 is just around the corner.
Mark Felder
2016-06-21 16:29:57 UTC
Post by Jason Zhang
Hi,
I am working on a storage service based on FreeBSD. I expected good
results because many professional storage companies use FreeBSD as their
OS, but I am disappointed with the poor performance.
(1) OS: FreeBSD 10.0 release
10.0-RELEASE is no longer supported. Can you reproduce this on
10.3-RELEASE?
--
Mark Felder
***@feld.me
Mark Felder
2016-06-21 16:37:00 UTC
Post by Jason Zhang
Hi,
I am working on a storage service based on FreeBSD. I expected good
results because many professional storage companies use FreeBSD as their
OS, but I am disappointed with the poor performance.
(1) OS: FreeBSD 10.0 release
10.0-RELEASE is no longer supported. Can you test this on 10.3-RELEASE?

Have you confirmed that both servers are using identical RAID controller
settings? It's possible the CentOS install has enabled write caching but
it's disabled on your FreeBSD server. Are you using UFS or ZFS on
FreeBSD? Do you have atime enabled? I believe CentOS is going to have
"relatime" or "nodiratime" by default to mitigate the write penalty on
each read access.
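
If you do test on UFS, atime updates can be disabled per filesystem; an illustrative /etc/fstab entry (adjust device and mount point to your setup):

-----
/dev/mfid0p2   /data   ufs   rw,noatime   2   2
-----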

We need more data :-)
--
Mark Felder
ports-secteam member
***@FreeBSD.org
Jason Zhang
2016-06-22 02:14:40 UTC
Mark,

Thanks

We have the same RAID settings on both FreeBSD and CentOS, including the cache settings. On FreeBSD I enabled the write cache, but the performance is the same.

We don't use ZFS or UFS; we test performance on the raw GEOM disk ("mfidX") exported by the mfi driver. We looked at the "gstat" output and found that the write latency
is too high: when we "dd" to the disk with 8 KB blocks the latency is below 1 ms, but a 64 KB write takes about 6 ms. It seems that each single write operation is very slow, but I don't know
whether it is a driver problem or not.
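
Roughly, the measurement looked like this (a sketch; device name and block count are illustrative):

-----
# write 64 KB blocks to the raw volume ...
dd if=/dev/zero of=/dev/mfid0 bs=64k count=16384
# ... while watching per-request write latency (ms/w column) in another terminal
gstat -f mfid0
-----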


Jason
Post by Mark Felder
Post by Jason Zhang
Hi,
I am working on a storage service based on FreeBSD. I expected good
results because many professional storage companies use FreeBSD as their
OS, but I am disappointed with the poor performance.
(1) OS: FreeBSD 10.0 release
10.0-RELEASE is no longer supported. Can you test this on 10.3-RELEASE?
Have you confirmed that both servers are using identical RAID controller
settings? It's possible the CentOS install has enabled write caching but
it's disabled on your FreeBSD server. Are you using UFS or ZFS on
FreeBSD? Do you have atime enabled? I believe CentOS is going to have
"relatime" or "nodiratime" by default to mitigate the write penalty on
each read access.
We need more data :-)
--
Mark Felder
ports-secteam member
Borja Marcos
2016-06-22 07:05:23 UTC
Post by Jason Zhang
Mark,
Thanks
We have the same RAID settings on both FreeBSD and CentOS, including the cache settings. On FreeBSD I enabled the write cache, but the performance is the same.
We don't use ZFS or UFS; we test performance on the raw GEOM disk ("mfidX") exported by the mfi driver. We looked at the "gstat" output and found that the write latency
is too high: when we "dd" to the disk with 8 KB blocks the latency is below 1 ms, but a 64 KB write takes about 6 ms. It seems that each single write operation is very slow, but I don't know
whether it is a driver problem or not.
There is an option you can use (I do it all the time!) to make the card behave as a plain HBA so that the disks are handled by the “da” driver.

Add this to /boot/loader.conf

hw.mfi.allow_cam_disk_passthrough=1
mfip_load="YES"

And do the tests accessing the disks as "da". To avoid confusion, it's better to make sure the disks are not part of a "jbod" or logical volume configuration.
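
If the passthrough is active, the drives should become visible to CAM; a quick way to check (output omitted):

-----
camcontrol devlist    # the SAS drives should now show up as daN devices
-----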




Borja.
O. Hartmann
2016-08-01 08:41:49 UTC
On Wed, 22 Jun 2016 08:58:08 +0200
Post by Borja Marcos
Post by Jason Zhang
Mark,
Thanks
We have the same RAID settings on both FreeBSD and CentOS, including the
cache settings. On FreeBSD I enabled the write cache, but the performance
is the same.
We don't use ZFS or UFS; we test performance on the raw GEOM disk
("mfidX") exported by the mfi driver. We looked at the "gstat" output and
found that the write latency is too high: when we "dd" to the disk with
8 KB blocks the latency is below 1 ms, but a 64 KB write takes about 6 ms.
It seems that each single write operation is very slow, but I don't know
whether it is a driver problem or not.
There is an option you can use (I do it all the time!) to make the card
behave as a plain HBA so that the disks are handled by the “da” driver.
Add this to /boot/loader.conf
hw.mfi.allow_cam_disk_passthrough=1
mfip_load="YES"
And do the tests accessing the disks as "da". To avoid confusion, it's
better to make sure the disks are not part of a "jbod" or logical volume
configuration.
Borja.
[...]

How is this supposed to work when ALL the disks (including the boot device) are
attached to the mfi controller itself (in our case a Fujitsu CP400i, based on the
LSI3008 and detected by FreeBSD 11-BETA and 12-CURRENT)?

I did not find any way to force the CP400i into a mode in which it acts as a
plain HBA (we intend to use all drives with ZFS and let the FreeBSD
kernel/ZFS control everything).

The boot device is a 256 GB Samsung enterprise SSD, and putting the UEFI
loader onto an EFI partition from 11-CURRENT-ALPHA4 is even worse: dd takes up to
almost a minute to put the image onto the SSD. The SSD activity LED is blinking
all the time, indicating activity. Caches are off. I tried to enable the cache
via 'mfiutil cache mfid0 enable', but it failed ... It
failed on all other attached drives as well.
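
For reference, this is roughly what I tried (volume name as reported on this box; without the "enable" argument mfiutil should just print the current cache settings):

-----
mfiutil cache mfid0           # show current cache policy for the volume
mfiutil cache mfid0 enable    # attempt to enable caching (fails here)
-----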

I haven't investigated further yet, since the experience with the EFI boot
loader already makes me suspect bad performance, and that is harsh, so to
speak. Glad to have found this thread anyway.

I am cross-posting this to CURRENT as well, as it might be an issue with CURRENT ...

Kind regards,

Oliver Hartmann
Borja Marcos
2016-08-01 09:56:58 UTC
Post by O. Hartmann
On Wed, 22 Jun 2016 08:58:08 +0200
Post by Borja Marcos
There is an option you can use (I do it all the time!) to make the card
behave as a plain HBA so that the disks are handled by the “da” driver.
Add this to /boot/loader.conf
hw.mfi.allow_cam_disk_passthrough=1
mfip_load="YES"
And do the tests accessing the disks as "da". To avoid confusion, it's
better to make sure the disks are not part of a "jbod" or logical volume
configuration.
Borja.
[...]
How is this supposed to work when ALL the disks (including the boot device) are
attached to the mfi controller itself (in our case a Fujitsu CP400i, based on the
LSI3008 and detected by FreeBSD 11-BETA and 12-CURRENT)?
I did not find any way to force the CP400i into a mode in which it acts as a
plain HBA (we intend to use all drives with ZFS and let the FreeBSD
kernel/ZFS control everything).
Have you tried that particular option?

With reasonably recent LSI-based cards you have three options:

- The most common, and definitely NOT RECOMMENDED, option is to define a logical volume per disk,
which is what LSI Logic used to call JBOD mode. It's not recommended at all if you want to run ZFS.

- Recent cards, I think I saw this first on the LSI3008, have a JBOD mode that exposes the drives as “mfisyspd” devices.
I don’t recommend it either, because the syspd drives are a sort of limited version of a disk device. With SSDs, especially, you
don’t have access to the TRIM command.

- The third option is to make the driver expose the SAS devices like an HBA would do, so that they are visible to the
CAM layer, and disks are handled by the stock “da” driver, which is the ideal solution.

However, this third option might not be available in some custom firmware versions from certain manufacturers; I don't
know. And I would hesitate to make the conversion on a production machine unless you have a complete and reliable
full backup of all the data in case you need to rebuild it.

In order to do it you need a couple of things. You need to set the variable hw.mfi.allow_cam_disk_passthrough=1
and to load the mfip.ko module.

When booting installation media, enter command mode and use these commands:

-----
set hw.mfi.allow_cam_disk_passthrough=1
load mfip
boot
-----


Remember that after installation you need to update /boot/loader.conf in the system you just installed with the
following contents:

hw.mfi.allow_cam_disk_passthrough=1
mfip_load="YES"


A note regarding CAM and MFI visibility: on some old firmware versions for the LSI2008 I've even seen the disks
available both as "mfi" and "da" devices. If possible, you should set the drives up as "unconfigured good" in the RAID
firmware. Use the RAID firmware setup utility or maybe mfiutil(8).

Also, make sure you don’t create any logical volumes on the disks you want exposed to CAM. You should delete
the logical volumes so that the MFI firmware doesn’t do anything with them.
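
Roughly along these lines (a sketch only; check mfiutil(8) on your version, and double-check volume and drive IDs before deleting anything):

-----
mfiutil show config      # list existing volumes and the drives behind them
mfiutil delete mfid0     # delete a logical volume (this destroys its data!)
mfiutil good 4           # mark a drive (by its ID) as unconfigured good
-----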

AND BEWARE: Doing these changes to a system in production with valuable data is dangerous. Make sure you have a full
and sound backup before making these changes.

As a worst case, the card could expose the devices both as "syspd" and CAM (i.e., "da") drives, but as long as you don't
touch the syspd devices the card won't do anything to them as far as I know. It could be a serious problem, however, if you
access a drive that is part of a logical volume through CAM, as RAID cards tend to do "patrol reads" and other stuff on them.

Provided it’s safe to do what I recommended, try it and follow up by email.





Borja.
O. Hartmann
2016-08-01 13:12:29 UTC
On Mon, 1 Aug 2016 11:48:30 +0200
Borja Marcos <***@sarenet.es> wrote:

Hello.

First, thanks for responding so quickly.
Post by Borja Marcos
Post by O. Hartmann
On Wed, 22 Jun 2016 08:58:08 +0200
Post by Borja Marcos
There is an option you can use (I do it all the time!) to make the card
behave as a plain HBA so that the disks are handled by the “da” driver.
Add this to /boot/loader.conf
hw.mfi.allow_cam_disk_passthrough=1
mfip_load="YES"
And do the tests accessing the disks as "da". To avoid confusion, it's
better to make sure the disks are not part of a "jbod" or logical volume
configuration.
Borja.
[...]
How is this supposed to work when ALL the disks (including the boot device)
are attached to the mfi controller itself (in our case a Fujitsu CP400i,
based on the LSI3008 and detected by FreeBSD 11-BETA and 12-CURRENT)?
I did not find any way to force the CP400i into a mode in which it acts as
a plain HBA (we intend to use all drives with ZFS and let the FreeBSD
kernel/ZFS control everything).
Have you tried that particular option?
I have, indeed, used the "JBOD" function of the PRAID CP400i controller, and the
intention of my posting was the suspicion that this is, as mentioned in
many posts concerning RAID controllers and ZFS, the reason for the poor
performance. And as I can see, that has been confirmed, sadly.
Post by Borja Marcos
- The most common, and definitely NOT RECOMMENDED, option is to define a logical
volume per disk, which is what LSI Logic used to call JBOD mode. It's not
recommended at all if you want to run ZFS.
This is the only way to expose each disk as-is to the OS with the PRAID
CP400i built into our RX1330-M2 server (Xeon Skylake based). I ordered that
specific box with an HBA-capable controller. Searching the net reveals that
there is another controller, called PSAS CP400i, which is also based on the LSI/Avago
SAS3008 and for which the ability to expose drives as-is is explicitly mentioned. I
do not know whether this is a software feature, as I suspect, or something
that is hardwired into the controller.
Post by Borja Marcos
- Recent cards, I think I saw this first on the LSI3008, have a JBOD mode
that exposes the drives as “mfisyspd” devices. I don’t recommend it either,
because the syspd drives are a sort of limited version of a disk device. With
SSDs, especially, you don’t have access to the TRIM command.
They expose the drives as "mfidX" if set up as JBOD.
Post by Borja Marcos
- The third option is to make the driver expose the SAS devices like a HBA
would do, so that they are visible to the CAM layer, and disks are handled by
the stock “da” driver, which is the ideal solution.
I didn't find any switch which offers me the opportunity to put the PRAID
CP400i into a simple HBA mode.
Post by Borja Marcos
However, this third option might not be available in some custom firmware
versions from certain manufacturers; I don't know. And I would hesitate to
make the conversion on a production machine unless you have a complete and
reliable full backup of all the data in case you need to rebuild it.
The boxes are empty and ready for installation, so I am not worried. What is more
worrying is this stupid software-based strangulation of options by Fujitsu,
if that is what it is. I do not want to blame them before I have double-checked.
Post by Borja Marcos
In order to do it you need a couple of things. You need to set the variable
hw.mfi.allow_cam_disk_passthrough=1 and to load the mfip.ko module.
-----
set hw.mfi.allow_cam_disk_passthrough=1
load mfip
boot
-----
Well, I'm now truly aware of this problem and its solution, but I run into a
chicken-and-egg problem, literally. As long as I can boot off the installation
medium, I have a kernel which honors the setting. But the boot medium is
supposed to be an SSD attached to the PRAID CP400i controller itself! So I will
never be able to boot the system without crippling the ability to have a
full-speed ZFS configuration, which I expect to get with HBA mode but not
with any of the forced RAID modes offered by the controller.


I will check with Fujitsu for a solution. Maybe the PRAID CP400i is somehow
capable of also being a PSAS CP400i, even if that is not exposed by the
currently installed firmware.

Kind regards,
Oliver
Post by Borja Marcos
Remember that after installation you need to update /boot/loader.conf in the
hw.mfi.allow_cam_disk_passthrough=1
mfip_load="YES"
A note regarding CAM and MFI visibility: On some old firmware versions for
the LSI2008 I’ve even seen the disks available both as “mfi” and “da”
drivers. If possible, you should try to set them up as “unconfigured good” on
the RAID firmware. Use the RAID firmware set up or maybe mfiutil(8)
Also, make sure you don’t create any logical volumes on the disks you want
exposed to CAM. You should delete the logical volumes so that the MFI
firmware doesn’t do anything with them.
AND BEWARE: Doing these changes to a system in production with valuable data
is dangerous. Make sure you have a full and sound backup before making these
changes.
As a worst case, the card could expose the devices both as “syspd” and CAM
(i.e., “da” drives) but as long as you don’t touch the syspd devices the card
won’t do anything to them as far as I know. It could be a serious problem,
however, if you access a drive part of a logical volume through CAM, as RAID
cards tend do to “patrol reads” and other stuff on them.
Provided it’s safe to do what I recommended, try it and follow up by email.
Borja.
Borja Marcos
2016-08-01 13:31:18 UTC
Post by O. Hartmann
First, thanks for responding so quickly.
Post by Borja Marcos
- The third option is to make the driver expose the SAS devices like a HBA
would do, so that they are visible to the CAM layer, and disks are handled by
the stock “da” driver, which is the ideal solution.
I didn't find any switch which offers me the opportunity to put the PRAID
CP400i into a simple HBA mode.
The switch is in the FreeBSD mfi driver, the loader tunable I mentioned, regardless of what the card
firmware does or pretends to do.

It’s not visible doing a "sysctl -a”, but it exists and it’s unique even. It’s defined here:

https://svnweb.freebsd.org/base/stable/10/sys/dev/mfi/mfi_cam.c?revision=267084&view=markup
(line 93)
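
If you have set it in /boot/loader.conf, you can at least confirm that the tunable reached the kernel environment:

-----
kenv hw.mfi.allow_cam_disk_passthrough
-----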
Post by O. Hartmann
Post by Borja Marcos
In order to do it you need a couple of things. You need to set the variable
hw.mfi.allow_cam_disk_passthrough=1 and to load the mfip.ko module.
-----
set hw.mfi.allow_cam_disk_passthrough=1
load mfip
boot
-----
Well, I'm now truly aware of this problem and its solution, but I run into a
chicken-and-egg problem, literally. As long as I can boot off the installation
medium, I have a kernel which honors the setting. But the boot medium is
supposed to be an SSD attached to the PRAID CP400i controller itself! So I will
never be able to boot the system without crippling the ability to have a
full-speed ZFS configuration, which I expect to get with HBA mode but not
with any of the forced RAID modes offered by the controller.
Been there plenty of times; I have even argued quite strongly about the advantages of ZFS over hardware-based RAID
5 cards. :) I remember when the Dell salesmen couldn't possibly understand why I wanted a "software-based RAID rather than a
robust, hardware-based solution" :D

At worst, you can set up a simple boot from a thumb drive or, even better, a SATADOM installed inside the server. I guess it will
have SATA ports on the mainboard. That's what I usually do. FreeNAS uses a similar approach as well. And some modern servers
can also boot from an SD card, which you can use just to load the kernel.

Depending on the number of disks you have, you can also sacrifice two of them to set up a mirror with a "normal" boot system, and use
the rest of the disks for ZFS. Actually, I've got an old server I set up in 2012. It has 16 disks, and I created a logical volume (mirror)
with 2 disks for boot, leaving the other 14 disks for ZFS.

If I installed this server now I would do it differently, booting off a thumb drive. But I was younger and more naive :)
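
A sketch of that kind of layout (device names are examples; the 2-disk hardware mirror holds the OS on UFS, the remaining 14 disks go to ZFS):

-----
zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6 \
                  raidz2 da7 da8 da9 da10 da11 da12 da13
-----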






Borja.
Michelle Sullivan
2016-08-01 20:34:22 UTC
Post by Borja Marcos
Post by O. Hartmann
First, thanks for responding so quickly.
Post by Borja Marcos
- The third option is to make the driver expose the SAS devices like a HBA
would do, so that they are visible to the CAM layer, and disks are handled by
the stock “da” driver, which is the ideal solution.
I didn't find any switch which offers me the opportunity to put the PRAID
CP400i into a simple HBA mode.
The switch is in the FreeBSD mfi driver, the loader tunable I mentioned, regardless of what the card
firmware does or pretends to do.
https://svnweb.freebsd.org/base/stable/10/sys/dev/mfi/mfi_cam.c?revision=267084&view=markup
(line 93)
Post by O. Hartmann
Post by Borja Marcos
In order to do it you need a couple of things. You need to set the variable
hw.mfi.allow_cam_disk_passthrough=1 and to load the mfip.ko module.
-----
set hw.mfi.allow_cam_disk_passthrough=1
load mfip
boot
-----
Well, I'm now truly aware of this problem and its solution, but I run into a
chicken-and-egg problem, literally. As long as I can boot off the installation
medium, I have a kernel which honors the setting. But the boot medium is
supposed to be an SSD attached to the PRAID CP400i controller itself! So I will
never be able to boot the system without crippling the ability to have a
full-speed ZFS configuration, which I expect to get with HBA mode but not
with any of the forced RAID modes offered by the controller.
Been there plenty of times, even argued quite strongly about the advantages of ZFS against hardware based RAID
5 cards. :) I remember when the Dell salesmen couldn’t possibly understand why I wanted a “software based RAID rather than a
robust, hardware based solution” :D
There are reasons for using either...

Nowadays it seems the conversations have degenerated into those like
Windows vs Linux vs Mac, where everyone thinks their answer is the right
one (just as you suggested you (Borja Marcos) did with the Dell
salesman), where in reality each has its own advantages and
disadvantages. E.g.: I'm running 2 ZFS servers on 'LSI 9260-16i's... big
mistake! (the ZFS, not the LSI's)... one is a 'movie server', the other a
'postgresql database' server... Most would agree the latter is a bad
use of ZFS; the die-hards won't, but then they don't understand database
servers and how they work on disk. The former gets mixed views: some
argue that ZFS is the only way to ensure the movies will always work;
personally, I think of all the years before ZFS when my data on disk
worked without failure until the disks themselves failed... and RAID
stopped that happening... what suddenly changed, are disks and RAM
suddenly not reliable at transferring data? .. anyhow, back to the issue:
there is another aspect of this particular hardware that people just
throw away...

The LSI 9260-* controllers have been designed to provide on-card hardware
RAID. The caching, whether using the CacheCade SSD or just the onboard ECC
memory, is *ONLY* used when running some sort of RAID set and LVs... this
is why LSI recommend 'MegaCli -CfgEachDskRaid0': because it does enable
caching. A good read on how to set up something similar is here:
https://calomel.org/megacli_lsi_commands.html (disclaimer: I haven't
parsed it all, so the author could be clueless, but it seems to give
generally good advice.) Going the way of 'JBOD' is a bad thing to do,
just don't, performance sucks. As for the recommended command above, I
can't comment, because currently I don't use it nor will I need to in the
near future... but...
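
For reference, the per-disk RAID 0 setup mentioned above looks roughly like this (a sketch only; verify the exact options against your MegaCli version and the calomel.org guide before running anything):

-----
MegaCli -CfgEachDskRaid0 -aALL    # create one RAID 0 logical volume per physical disk
MegaCli -LDInfo -Lall -aALL       # check the resulting logical drives and their cache policy
-----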

If you (O. Hartmann) want to, or need to, use ZFS with any OS including
FreeBSD, don't go with the LSI 92xx series controllers; it's just the
wrong thing to do. Pick an HBA that is designed to give you direct
access to the drives, not one you have to kludge and cajole. This can
include LSI controllers with caches that use the mfi driver, just not
those that are not designed to work in a non-RAID mode (with or without
the passthrough command/mode above).
Post by Borja Marcos
At worst, you can set up a simple boot from a thumb drive or, even better, a SATADOM installed inside the server. I guess it will
have SATA ports on the mainboard. That’s what I use to do. FreeNAS uses a similar approach as well. And some modern servers
also can boot from a SD card which you can use just to load the kernel.
Depending on the number of disks you have, you can also sacrifice two to set up a mirror with a “nomal” boot system, and using
the rest of the disks for ZFS. Actually I’ve got an old server I set up in 2012. It has 16 disks, and I created a logical volume (mirror)
with 2 disks for boot, the other 14 disks for ZFS.
If I installed this server now I would do it different, booting off a thumb drive. But I was younger and naiver :)
If I installed mine now I would do them differently as well... neither
would run ZFS; both would use their on-card RAID and UFS on top
of it... ZFS would be reserved for the multi-user NFS file servers.
(And trust me here, when it comes to media servers - where the media is
just stored, not changed/updated/edited - the 16i with a good high-speed
SSD as 'CacheCade' really performs well... and on a moderately powerful
MB/CPU combo with good RAM and several gigabit interfaces it's
surprising how many unicast transcoded media streams it can handle...
(read: my twin fibres are saturated before the machine reaches anywhere
near full load, and I can still write at 13 MB/s from my old Mac Mini
over NFS... which is about all it can do without any load either.)

So, the moral of the story/choices: don't go with ZFS because people tell
you it's best, because it isn't; go with ZFS if it suits your hardware
and application, and if ZFS suits your application, get hardware for it.

Regards,
--
Michelle Sullivan
http://www.mhix.org/
Ultima
2016-08-02 03:23:02 UTC
If anyone is interested, as Michelle Sullivan just mentioned, one problem I
found when looking for an HBA is that they are not so easy to find. Scouring
the internet for a backup HBA, I came across these -
http://www.avagotech.com/products/server-storage/host-bus-adapters/#tab-12Gb1

I can only speak for the SAS 9305-24i. All 24 bays are occupied and I am quite pleased
with the performance compared to its predecessor. It was originally going
to be a backup unit; however, that changed after running a scrub and seeing the
number of hours to complete cut in half (from around 30 to 15 for 35T). And
of course, the reason for this post: it replaced a RAID card in passthrough
mode.

Another note: because it is an HBA, flashing the firmware is once
again possible! (yay!)

+1 to HBAs + ZFS; if possible, replace the RAID card with an HBA.
Borja Marcos
2016-08-02 08:27:22 UTC
There are reasons for using either…
Indeed, but my decision was to run ZFS. And getting an HBA in some configurations can be difficult, because vendors insist on selling
RAID adapters. After all, that's what most of their customers demand.

Fortunately, at least some Avago/LSI cards can work as HBAs pretty well. An example is the now venerable LSI2008.
Nowadays its seems the conversations have degenerated into those like Windows vs Linux vs Mac where everyone thinks their answer is the right one (just as you suggested you (Borja Marcos) did with the Dell salesman), where in reality each has its own advantages and disadvantages.
I know, but this is not the case here. It's quite frustrating to try to order a server with an HBA rather than a RAID controller and receive an answer such as
"the HBA option is not available". That's why people are zapping, flashing and, generally, torturing HBA cards rather cruelly ;)

So, in my case, it's not about what's better or worse. It's a simpler issue: the customer (myself) has made a decision, which can be right or wrong, and the manufacturer fails to deliver what I need. If it were only one manufacturer, well, off with them, but the issue is widespread in the industry.
Eg: I'm running 2 zfs servers on 'LSI 9260-16i's... big mistake! (the ZFS, not LSI's)... one is a 'movie server' the other a 'postgresql database' server... The latter most would agree is a bad use of zfs, the die-hards won't but then they don't understand database servers and how they work on disk. The former has mixed views, some argue that zfs is the only way to ensure the movies will always work, personally I think of all the years before zfs when my data on disk worked without failure until the disks themselves failed... and RAID stopped that happening... what suddenly changed, are disks and ram suddenly not reliable at transferring data? .. anyhow back to the issue there is another part with this particular hardware that people just throw away…
Well, silent corruption can happen. I've seen it once, caused by a flaky HBA, and ZFS saved the cake. Yes, there were reliable replicas. Still, rebuilding would have been a pain in the ass.
The LSI 9260-* controllers have been designed to provide on hardware RAID. The caching whether using the Cachecade SSD or just oneboard ECC memory is *ONLY* used when running some sort of RAID set and LVs... this is why LSI recommend 'MegaCli -CfgEachDskRaid0' because it does enable caching.. A good read on how to setup something similar is here: https://calomel.org/megacli_lsi_commands.html (disclaimer, I haven't parsed it all so the author could be clueless, but it seems to give generally good advice.) Going the way of 'JBOD' is a bad thing to do, just don't, performance sucks. As for the recommended command above, can't comment because currently I don't use it nor will I need to in the near future... but…
Actually, it's not a good idea to use heavy disk caching when running ZFS. Its reliability depends on being able to commit metadata to disk, so I don't care about that caching option. Provided you have enough RAM, ZFS is very effective at caching data itself.
If you (O Hartmann) want to use or need to use ZFS with any OS including FreeBSD don't go with the LSI 92xx series controllers, its just the wrong thing to do.. Pick an HBA that is designed to give you direct access to the drives not one you have to kludge and cajole.. Including LSI controllers with caches that use the mfi driver, just not those that are not designed to work in a non RAID mode (with or without the passthru command/mode above.)
As I said, the problem is, sometimes it’s not so easy to find the right HBA.
So moral of the story/choices. Don't go with ZFS because people tell you its best, because it isn't, go with ZFS if it suits your hardware and application, and if ZFS suits your application, get hardware for it.
Indeed, I second this. But really, "hardware for it" covers a rather broad category ;) ZFS can even manage to work on hardware that works _against_ it.






Borja.
