Step one: hardware
Back of the controller has 5 eSATA ports: one built-in, plus two PCIe SATA cards with two ports apiece. The drive bay has two eSATA ports.
Need to remember where to plug each of the two eSATA cables. Hopefully FreeNAS knows the disk identities and doesn’t care too much if they show up under the incorrect/alternate port-multiplier device?
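In theory it shouldn’t matter: ZFS stamps identifying metadata on each member disk, and FreeNAS refers to disks by GPT label rather than by device node, so the pool should assemble itself no matter which port a drive lands behind. If I ever want to double-check the label-to-device mapping, FreeBSD’s glabel(8) can show it. A minimal sketch (not output from my box):

glabel status | grep gptid    # map the gptid/... labels zpool status reports to adaXpY device nodes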
Start with some housekeeping: re-seat all the drives, and get rid of any accumulated dust. This really wasn’t bad. I must have cleaned it before packing up to move a year and a half ago.
Power up, experiment a bit.
Card 1: StarTech card using an ASMedia 1061 chip. It sees all 10 drives.
Card 2: SiI 3132 card. It only sees two drives, one per channel.
I’m going to assume I was using the ASMedia SATA card, based on the fact that its BIOS sees all my drives. (Why I left both cards installed is beyond me.)
Boot up, and the controller’s internal drive is still good. FreeNAS starts to boot. And problems start.
pmp0 and pmp1 are both detected (SATA port-multipliers; I pronounce them pimp-zero and pimp-one). However, I’m immediately getting command timeouts and uncorrectable parity errors. I would not be surprised if these eSATA cables are toast. I seem to recall getting these sorts of errors before I shut the whole thing down years ago before I moved.
Troubleshooting step one is to try unplugging it and plugging it back in, for all the cables and drives. Each drive is in a hotswap sled, so eject and re-insert each of those. No improvement, still getting errors.
Let me order some replacement cables that aren’t so beat up. They are cheap, so that’s a low-cost step. Worst case, I’ll have some extra eSATA cables kicking around.
…
Sunday–got the cables. No instant success. More troubleshooting, I guess.
I removed anything that was extraneous. This included the extra SiI card and all the drives.
The reason I removed all the drives is to isolate any failures of the pimps from any failures of the drives. I would expect errors from a drive to be reported with the full device name, e.g. pmp0.3 or something, and not just pmp0, but better to rule it out and be sure.
So, without any drives, I still get pimp errors.
Okay, maybe I guessed wrong and it was actually the other card I was using way back? Let’s try the SiI card by itself (no sense leaving both cards in; future-Aimee will thank me). IIRC, FreeBSD didn’t have great driver support for one of the cards, but I honestly do not remember which one. Let’s boot this shit! No errors?! However, the card’s BIOS only detected two drives (one per port multiplier) on boot. Let’s see how this plays out. The FreeBSD kernel seems to detect both pimps and all 10 drives. Woohoo! And, drumroll………..
It sees my ZFS array!
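For the record, the quick way to confirm from a shell that the kernel sees the port multipliers and all ten disks, rather than squinting at boot messages as they scroll by, is camcontrol(8) plus dmesg. Illustrative commands, not a capture from my box:

camcontrol devlist                 # every attached disk should show up here as an adaN device
dmesg | egrep -i 'pmp|siis|ahci'   # port-multiplier / controller attach lines (and any errors)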
Step two: software
Okay, on to the software itself. This machine is running FreeNAS (a handy-dandy software suite built on FreeBSD), and an old version at that: FreeNAS 9.2. FreeNAS is actually no more; it now lives on under the name TrueNAS Core, which is up to version 13.0 as of this writing.
uname: FreeBSD brainslug.chezpina.net 9.2-RELEASE-p3 FreeBSD 9.2-RELEASE-p3 #0 r262572+38751c8: Thu Mar 20 21:13:02 PDT 2014
First up: reset the networking config and login creds. The former is just wrong and the latter is long forgotten. Check. I’m in like Flynn. Let’s see what was detected on these drives.
[root@brainslug ~]# zpool list
NAME        SIZE   ALLOC   FREE    CAP  DEDUP  HEALTH   ALTROOT
nas_data   36.2T   3.40T   32.8T    9%  1.00x  ONLINE   /mnt
test_data      -       -       -     -      -  FAULTED  -
[root@brainslug ~]# zpool status
  pool: nas_data
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 7h41m with 0 errors on Sun Mar 26 07:41:51 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        nas_data                                        ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/5233078a-8ba6-11e3-b882-000c2913f279  ONLINE       0     0     0
            gptid/530fa2ad-8ba6-11e3-b882-000c2913f279  ONLINE       0     0     0
            gptid/53ebb48a-8ba6-11e3-b882-000c2913f279  ONLINE       0     0     0
            gptid/54c4d1e1-8ba6-11e3-b882-000c2913f279  ONLINE       0     0     0
            gptid/559a1bb8-8ba6-11e3-b882-000c2913f279  ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            gptid/5cadafcc-c59f-11e3-b4c2-80ee7370128f  ONLINE       0     0     0
            gptid/5d88a252-c59f-11e3-b4c2-80ee7370128f  ONLINE       0     0     0
            gptid/5e5e861f-c59f-11e3-b4c2-80ee7370128f  ONLINE       0     0     0
            gptid/5f37a745-c59f-11e3-b4c2-80ee7370128f  ONLINE       0     0     0
            gptid/600d39f3-c59f-11e3-b4c2-80ee7370128f  ONLINE       0     0     0

errors: No known data errors

  pool: test_data
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                    STATE     READ WRITE CKSUM
        test_data               UNAVAIL      0     0     0
          5436593538603718601   UNAVAIL      0     0     0  was /dev/ada0
          235558561425173661    UNAVAIL      0     0     0  was /dev/ada1
          10027892940568442642  UNAVAIL      0     0     0  was /dev/ada2
          2689412369469951613   UNAVAIL      0     0     0  was /dev/ada3
          2952459394722526230   UNAVAIL      0     0     0  was /dev/ada4
[root@brainslug ~]#
Pool Info
Okay, two immediate tasks:
- Delete the test_data pool. I’m not using it and it looks like it won’t work anyway (I think this was leftover from when I first started playing with FreeNAS and only had the first pimp populated).
- Last scrub of nas_data was early 2017 😳, let’s fix that and run a scrub.
Deleting the test pool.
[root@brainslug ~]# zpool destroy test_data
Destroy! Destroy!
Starting the scrub.
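For the record, kicking one off is a one-liner; progress then shows up under zpool status:

zpool scrub nas_data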
  scan: scrub in progress since Sun Dec 4 14:22:33 2022
        2.55G scanned out of 3.40T at 84.1M/s, 11h45m to go
        0 repaired, 0.07% done

... some time later ...

  scan: scrub repaired 0 in 8h24m with 0 errors on Mon Dec 5 01:46:46 2022
Scrub a dub dub
Okay, now while that scrub is running, let’s see what else we can get fixed up. Things to check on:
- Samba/NFS shares – what is shared? With what permissions? Are they actually accessible now that my IP range is completely different? (NFS, looking at you here.)
- Maintenance schedule – do I have automatic scrubs scheduled? I can’t recall if the last scrub being 2017 is because that’s when I last had the NAS powered on, or because that’s the last time I manually ran a scrub. (A sketch of what a scheduled scrub boils down to follows this list.)
- more
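On the scrub-schedule point above: FreeNAS drives this from its web UI, but underneath it boils down to a cron entry. A hypothetical sketch of what I want to end up with, not my actual config:

# /etc/crontab: run a scrub on the 1st of every month at 03:00
0  3  1  *  *  root  /sbin/zpool scrub nas_data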
You fucked up? Yeah… LOOK A PINK CAR!
Whoopsadoodle. Me, a security-conscious person: “Ah, HTTPS, I should enable that for the NAS web UI. Obviously.” FreeNAS, an ancient piece of software: “Okay, I will now exclusively speak SSL 1.0.”
Now, I have the same ERR_SSL_VERSION_OR_CIPHER_MISMATCH error I was getting with the iDRAC card. Fuck.
Phew. SSH still works, so I was able to get in and manually edit the SQLite DB and set the protocol back to plain HTTP. Though I had to reboot the machine to have it picked up, since I didn’t know what process needed to get a kick to notice the change and listen again (apparently BSD doesn’t have netstat -p to list the process that has a port open?).
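For posterity, the manual fix amounted to something like the below. The DB path is FreeNAS’s standard config location, but the table and column names are from memory and may not match 9.2 exactly, so treat this as a sketch rather than a recipe. The sockstat line is the FreeBSD-native answer to the netstat -p itch:

# flip the web UI protocol back to plain HTTP in the FreeNAS config DB (column name from memory)
sqlite3 /data/freenas-v1.db "UPDATE system_settings SET stg_guiprotocol = 'http';"

# FreeBSD's way to list which process has a port open (what I wanted netstat -p for)
sockstat -4l | grep :80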
1… Day… Later…
No errors found!
But, wait, what the fuck? How did this come back?
[root@brainslug] ~# zpool status
...
  pool: test_data
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                    STATE     READ WRITE CKSUM
        test_data               UNAVAIL      0     0     0
          5436593538603718601   UNAVAIL      0     0     0  was /dev/ada0
          235558561425173661    UNAVAIL      0     0     0  was /dev/ada1
          10027892940568442642  UNAVAIL      0     0     0  was /dev/ada2
          2689412369469951613   UNAVAIL      0     0     0  was /dev/ada3
          2952459394722526230   UNAVAIL      0     0     0  was /dev/ada4
[root@brainslug] ~#
Testing what?
Declaring success
I’m going to declare success at this point. No drive errors (honestly, I’m shocked), the thing is running fine, and I can access my files.
However, I don’t trust it. I’m going to copy everything of value off onto an external drive as quickly as possible. Thankfully I was only using 3.4T (the NAS was young), so copying everything onto a 4T drive should work.
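The copy itself will probably be something unglamorous like the below. The external drive’s mount point is hypothetical, and rsync means I can just re-run it if (when) something gets interrupted:

# -a preserves permissions and timestamps, -h gives human-readable sizes, --progress gives me something to stare at
rsync -avh --progress /mnt/nas_data/ /mnt/external_backup/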