Restoring the NAS

Trying to boot my old NAS, fix its eSATA expanders, and recover my zpools.

By Aimee, 2022-11-30

Step one: hardware

The back of the controller has five eSATA ports: one built-in, plus two PCIe SATA cards with two ports apiece. The drive bay has two eSATA ports.

Need to remember where to plug each of the two eSATA cables. Hopefully FreeNAS knows the disk identities and doesn’t care too much if they show up under the incorrect/alternate port-multiplier device?
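
Spoiler from further down: it doesn’t seem to care. The pool members end up addressed by GPT label rather than by port, and on FreeBSD you can always map those labels back to whichever device nodes the disks landed on. A quick sketch, for reference:

# Show which gptid/ label lives on which device node.
glabel status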

Start with some housekeeping: re-seat all the drives and get rid of any accumulated dust. This really wasn’t bad. I must have cleaned it before packing up to move a year and a half ago.

Power up, experiment a bit.

Card 1: StarTech card using an ASMedia 1061 chip. It sees all 10 drives.

Card 2: SiI 3132 card. It only sees two drives, one per channel.

I’m going to assume I was using the ASMedia SATA card, based on the fact that its BIOS sees all my drives. (Why I left both cards installed is beyond me.)
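
For the record, FreeBSD can also tell you which controllers it actually sees, which beats squinting at BIOS splash screens. A rough sketch; ahci(4) drives the ASMedia card and siis(4) drives the SiI:

# List PCI devices with vendor/device strings; SATA controllers
# show up attached to their respective drivers.
pciconf -lv | grep -B4 -i sata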

Boot up, and the controller’s internal drive is still good. FreeNAS starts to boot. And problems start.

pmp0 and pmp1 are both detected (SATA port multipliers; I pronounce them pimp-zero and pimp-one). However, I immediately start getting command timeouts and uncorrectable parity errors. I would not be surprised if these eSATA cables are toast; I seem to recall getting these sorts of errors before I shut the whole thing down to move, years ago.

Troubleshooting step one: unplug it and plug it back in again, for all the cables and drives. Each drive is in a hot-swap sled, so eject and re-insert each of those. No improvement; still getting errors.

Let me order some replacement cables that aren’t so beat up. They are cheap, so that’s a low-cost step. Worst case, I’ll have some extra eSATA cables kicking around.

Sunday: got the cables. No instant success. More troubleshooting, I guess.

I removed anything extraneous. This included the extra SiI card and all the drives.

The reason I removed all the drives is to isolate failures of the pimps from failures of the drives. I would expect a drive error to be reported with the full device name, e.g. pmp0.3 or something, and not just pmp0, but better to rule it out and be sure.

So, without any drives, I still get pimp errors.
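
A quick way to double-check where the kernel is pointing its finger; the patterns here are just a sketch, adjust to taste:

# Errors attributed to pmp0/pmp1 implicate a multiplier or its
# cable; errors attributed to an adaN device would implicate a drive.
dmesg | grep -Ei 'pmp|timeout|parity'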

Okay, maybe I guessed wrong and it was actually the other card I was using way back when? Let’s try the SiI card by itself (no sense leaving both cards in; future-Aimee will thank me). IIRC, FreeBSD didn’t have great driver support for one of the cards, but I honestly do not remember which one. Let’s boot this shit! No errors?! However, the card’s BIOS only detected two drives (one per port multiplier) on boot. Let’s see how this plays out. The FreeBSD kernel seems to detect both pimps and all 10 drives. Woohoo! And, drumroll………..

It sees my ZFS array!

Step two: software

Okay, on to the software itself. This machine is running FreeNAS (a handy-dandy NAS suite built on FreeBSD), and an old version at that: FreeNAS 9.2. FreeNAS is actually no more; it lives on under the name TrueNAS Core, which is up to version 13.0 as of this writing.

uname: FreeBSD brainslug.chezpina.net 9.2-RELEASE-p3 FreeBSD 9.2-RELEASE-p3 #0 r262572+38751c8: Thu Mar 20 21:13:02 PDT 2014

First up: reset the networking config and login creds. The former is just wrong and the latter is long forgotten. Check. I’m in like Flynn. Let’s see what was detected on these drives.

[root@brainslug ~]# zpool list
NAME        SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
nas_data   36.2T  3.40T  32.8T     9%  1.00x  ONLINE  /mnt
test_data      -      -      -      -      -  FAULTED  -
[root@brainslug ~]# zpool status
  pool: nas_data
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 7h41m with 0 errors on Sun Mar 26 07:41:51 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        nas_data                                        ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/5233078a-8ba6-11e3-b882-000c2913f279  ONLINE       0     0     0
            gptid/530fa2ad-8ba6-11e3-b882-000c2913f279  ONLINE       0     0     0
            gptid/53ebb48a-8ba6-11e3-b882-000c2913f279  ONLINE       0     0     0
            gptid/54c4d1e1-8ba6-11e3-b882-000c2913f279  ONLINE       0     0     0
            gptid/559a1bb8-8ba6-11e3-b882-000c2913f279  ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            gptid/5cadafcc-c59f-11e3-b4c2-80ee7370128f  ONLINE       0     0     0
            gptid/5d88a252-c59f-11e3-b4c2-80ee7370128f  ONLINE       0     0     0
            gptid/5e5e861f-c59f-11e3-b4c2-80ee7370128f  ONLINE       0     0     0
            gptid/5f37a745-c59f-11e3-b4c2-80ee7370128f  ONLINE       0     0     0
            gptid/600d39f3-c59f-11e3-b4c2-80ee7370128f  ONLINE       0     0     0

errors: No known data errors

  pool: test_data
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                    STATE     READ WRITE CKSUM
        test_data               UNAVAIL      0     0     0
          5436593538603718601   UNAVAIL      0     0     0  was /dev/ada0
          235558561425173661    UNAVAIL      0     0     0  was /dev/ada1
          10027892940568442642  UNAVAIL      0     0     0  was /dev/ada2
          2689412369469951613   UNAVAIL      0     0     0  was /dev/ada3
          2952459394722526230   UNAVAIL      0     0     0  was /dev/ada4
[root@brainslug ~]#

Pool Info

Okay, two immediate tasks:

  1. Delete the test_data pool. I’m not using it, and it looks like it won’t work anyway. (I think this was left over from when I first started playing with FreeNAS and only had the first pimp populated.)
  2. The last scrub of nas_data was in early 2017 😳; let’s fix that and run a scrub.

Deleting the test pool.

[root@brainslug ~]# zpool destroy test_data

Destroy! Destroy!

Starting the scrub.
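
Kicking it off is a one-liner:

# Start a scrub; it runs in the background.
zpool scrub nas_data

Progress shows up in zpool status: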

  scan: scrub in progress since Sun Dec  4 14:22:33 2022
        2.55G scanned out of 3.40T at 84.1M/s, 11h45m to go
        0 repaired, 0.07% done

... some time later ...

  scan: scrub repaired 0 in 8h24m with 0 errors on Mon Dec  5 01:46:46 2022

Scrub a dub dub

Okay, now while that scrub is running, let’s see what else we can get fixed up. Things to check on:

  1. Samba/NFS shares – what is shared? With what permissions? Is it actually accessible now that my IP range is completely different? (NFS, looking at you here.) Some quick checks for this are sketched after the list.
  2. Maintenance schedule – do I have automatic scrubs scheduled? I can’t recall if the last scrub being 2017 is because that’s when I last had the NAS powered on, or because that’s the last time I manually ran a scrub.
  3. more
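
For the share check, a rough sketch from the NAS itself. These are generic FreeBSD/Samba commands, nothing FreeNAS-specific; FreeNAS generates the underlying config files from its database, so the paths are my best recollection:

# What NFS exports exist, and to which hosts/networks?
showmount -e localhost
cat /etc/exports

# Dump the effective Samba configuration, shares included.
testparm -s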

You fucked up? Yeah… LOOK A PINK CAR!

Whoopsadoodle. Me, a security-conscious person: “Ah, HTTPS, I should enable that for the NAS web UI. Obviously.” FreeNAS, an ancient piece of software: “Okay, I will now exclusively speak SSL 1.0.”

Now, I have the same ERR_SSL_VERSION_OR_CIPHER_MISMATCH error I was getting with the iDRAC card. Fuck.

Phew. SSH still works, so I was able to get in and manually edit the SQLite DB to set the protocol back to plain HTTP. Though I had to reboot the machine to have the change picked up, since I didn’t know which process needed a kick to notice it and start listening again. (Apparently BSD doesn’t have netstat -p to list the process that has a port open?)
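
For posterity, a sketch of both halves of that. FreeBSD’s stand-in for netstat -p is sockstat, and FreeNAS 9 keeps its config in an SQLite database at /data/freenas-v1.db. The table and column names below are from memory of 9.x, so verify with .schema before trusting them:

# FreeBSD’s equivalent of Linux’s `netstat -lp`: list listening
# sockets along with the owning process.
sockstat -4 -l

# Flip the web UI back to plain HTTP. (Table/column names are my
# recollection of the FreeNAS 9.x schema; check .schema first!)
sqlite3 /data/freenas-v1.db \
  "UPDATE system_settings SET stg_guiprotocol = 'http';"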

1… Day… Later…

No errors found!

But, wait, what the fuck? How did this come back?

[root@brainslug] ~# zpool status

  ...

  pool: test_data
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                    STATE     READ WRITE CKSUM
        test_data               UNAVAIL      0     0     0
          5436593538603718601   UNAVAIL      0     0     0  was /dev/ada0
          235558561425173661    UNAVAIL      0     0     0  was /dev/ada1
          10027892940568442642  UNAVAIL      0     0     0  was /dev/ada2
          2689412369469951613   UNAVAIL      0     0     0  was /dev/ada3
          2952459394722526230   UNAVAIL      0     0     0  was /dev/ada4
[root@brainslug] ~#

Testing what?
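
My best guess at the how: all of test_data’s devices are missing, so zpool destroy couldn’t actually write anything to their labels, and FreeNAS remembers pools on its own (in that same config database, plus a cached pool list) and re-imports them at boot. The reboot for the HTTP fix-up probably resurrected the ghost. If that guess is right, something like this would show whether the middleware still knows about it (table name is, again, my recollection of the 9.x schema):

# Does FreeNAS’s own database still list test_data?
sqlite3 /data/freenas-v1.db "SELECT * FROM storage_volume;"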

Declaring success

I’m going to declare success at this point. No drive errors (honestly, I’m shocked), the thing is running fine, and I can access my files.

However, I don’t trust it. I’m going to copy everything of value onto an external drive as quickly as possible. Thankfully I was only using 3.4T (the NAS was young), so copying everything onto a 4T drive should work.
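
A sketch of the evacuation, assuming the external drive ends up formatted and mounted at /mnt/evac (a hypothetical mount point; the pool itself lives under /mnt, per the ALTROOT above):

# Copy everything off, preserving permissions and timestamps.
# /mnt/evac is a made-up path for the external drive.
rsync -avh --progress /mnt/nas_data/ /mnt/evac/

A zfs send to a pool on the external drive would preserve more (snapshots, dataset properties), but for a one-way evacuation rsync is hard to beat for simplicity.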

Extra stuff

Output of zfs list, dmesg, and smartctl -ia /dev/ada0.
