Safety Engineer, Dad, Husband, Pilot, Musician. Not necessarily in that order.

  • 1 Post
  • 30 Comments
Joined 1 year ago
Cake day: June 11th, 2023

  • Then why do you think manufacturers still list these failure rates (to be sure, it is marked as a limit, not an actual rate)? I’m not being sarcastic or facetious, just genuinely curious. Do you know for certain that it doesn’t happen regularly? During a scrub, these are the kinds of errors that are quietly corrected (although the scrub log would list them), as they are during normal operation (also logged).

    My theory is that they are being cautious and/or perhaps don’t have any high-confidence data that is more recent.


  • Hopfgeist@feddit.de to Selfhosted@lemmy.world: How to fix my ZFS pool mistakes (3 months ago)

    Bit error rates have barely improved since then, while disk capacities have grown considerably. So the probability of an error when reading a substantial fraction of a disk is now higher than it was in 2013.

    But, as others have pointed out, RAID is not, and never was, a substitute for a backup. Its purpose is to increase availability. If availability is critical to your enterprise, these things need to be taken into account, and it may turn out that raidz1 with 8 TB disks is fine for your application, or it may not. For private use, I wouldn’t fret, but make frequent backups.

    This article was not about total disk failure, but about the much more insidious undetected bit error.


  • Let’s do the math:

    The error rate of modern hard disks is usually on the order of one unrecoverable read error per 1E15 bits read; see, for example, the data sheet for the Seagate Exos 7E10. An 8 TB disk contains 6.4E13 (usable) bits, so when reading the whole disk you have roughly a 1 in 16 chance of an unrecoverable read error. That is fine with zfs as long as all disks are working: the error correction will detect and correct it. During a resilver, however, it can be a big problem, because in a raidz1 with one disk already gone there is no redundancy left to repair the affected block.
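    The arithmetic above can be checked with a few lines of Python. This is a sketch assuming errors occur independently at the data-sheet limit of one per 1E15 bits (a Poisson model) and decimal terabytes; the four-disk raidz1 in the last lines is a hypothetical example, not a figure from the data sheet.

```python
import math

TB = 1e12  # decimal terabyte, as used on drive labels


def expected_errors(n_bytes: float, bits_per_error: float = 1e15) -> float:
    """Expected number of unrecoverable read errors when reading n_bytes once."""
    return n_bytes * 8 / bits_per_error


def p_any_error(n_bytes: float, bits_per_error: float = 1e15) -> float:
    """Probability of at least one error, treating errors as a Poisson process."""
    return 1.0 - math.exp(-expected_errors(n_bytes, bits_per_error))


# Reading one full 8 TB disk: 6.4E13 bits, i.e. about 0.064 expected errors.
p_single = p_any_error(8 * TB)
print(f"one 8 TB disk: P = {p_single:.3f}, about 1 in {1 / p_single:.0f}")

# Hypothetical 4-disk raidz1 of 8 TB disks: after one disk fails, the resilver
# reads the 3 survivors. Assumes a full pool (zfs only resilvers allocated data).
p_resilver = p_any_error(3 * 8 * TB)
print(f"resilver reading 3 x 8 TB: P = {p_resilver:.3f}")
```

    Under these assumptions, a single full-disk read comes out at about 6 % (the 1 in 16 above), and a resilver that must read three full 8 TB disks at roughly 17 %.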


  • Not so much server-based, but the experimental part of “lab” is well covered: I replaced my late-2013 27" iMac’s internal HDD with an SSD. It’s a really delicate procedure, as the display is glued to the chassis; it needs to be cut loose and very carefully removed (it’s tempered glass), and then re-glued with special adhesive strips. But the performance gain is worth it. In addition, it now runs Ventura, even with the NVIDIA card, thanks to OpenCore Legacy Patcher. It feels like a new machine now and, with its 32 GB of RAM, is perfectly adequate even for small video-editing tasks.


  • I don’t think there’s anything intrinsically wrong, but as far as I can see you are using only a single disk for the zfs pool, which will give you integrity checks (you know when something is corrupted) but no way to fix it.

    Since this is, by today’s standards, a tiny disk at 100G, I assume this is just a test setup? I’m not sure zfs is particularly well suited to running inside virtual machines. I think it is better to have the host handle the physical data integrity, either by putting the disk image on a zfs filesystem or by giving the VM a zfs volume (block device) directly.
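    To illustrate why a single disk can detect corruption but not fix it, here is a toy model in Python. It is only a sketch of the idea behind zfs’s self-healing reads, not how zfs implements them: each block is stored with a SHA-256 checksum (chosen for the toy), and `read_block` is a hypothetical helper that returns a copy whose checksum matches and overwrites any bad copies with it.

```python
from __future__ import annotations

import hashlib
from typing import Optional


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def read_block(copies: list[bytearray], expected: bytes) -> Optional[bytes]:
    """Return verified data, repairing bad copies; None if no copy is intact."""
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        return None  # corruption detected, but nothing to repair it from
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good  # self-heal the bad copy from the good one
    return good


data = b"important block"
expected = checksum(data)

# Single disk: one copy, with a flipped bit.
single = [bytearray(data)]
single[0][0] ^= 0x01
print("single disk:", read_block(single, expected))  # detected, not fixable

# Two-way mirror: one bad copy, one good copy.
mirror = [bytearray(data), bytearray(data)]
mirror[0][0] ^= 0x01
print("mirror:", read_block(mirror, expected))
print("repaired:", bytes(mirror[0]) == data)
```

    On a single disk, zfs’s `copies=2` dataset property gives a similar effect for the data stored with it, at the cost of twice the space, but it does not help if the whole disk fails.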


  • What are the advantages of raid10 over zfs raidz2? It requires more raw disk space per unit of usable space as soon as you have more than 4 disks, it doesn’t have zfs’s automatic checksum-based error correction, and it is, in general, less resilient against multiple disk failures. In the worst case, two lost disks can mean the loss of the whole array, whereas raidz2 can tolerate the loss of any 2 disks. Plus, with raid you still need an additional volume manager and filesystem.
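    The capacity and failure-tolerance claims above can be checked with a small enumeration. This Python sketch assumes a classic raid10 built from mirror pairs of adjacent disks and a single raidz2 vdev; `usable_fraction`, `survives`, and `two_disk_survival` are hypothetical helpers written for this illustration, and the model ignores checksums and rebuild behavior.

```python
from __future__ import annotations

from itertools import combinations


def usable_fraction(layout: str, n_disks: int) -> float:
    """Fraction of raw capacity available for data."""
    if layout == "raid10":
        return 0.5  # every block is mirrored once
    if layout == "raidz2":
        return (n_disks - 2) / n_disks  # two disks' worth of parity
    raise ValueError(layout)


def survives(layout: str, n_disks: int, failed: set[int]) -> bool:
    if layout == "raid10":
        # Assumed layout: mirror pairs (0,1), (2,3), ...; the array is lost
        # if both disks of any pair fail.
        return not any({2 * i, 2 * i + 1} <= failed for i in range(n_disks // 2))
    if layout == "raidz2":
        return len(failed) <= 2
    raise ValueError(layout)


def two_disk_survival(layout: str, n_disks: int) -> float:
    """Share of all possible two-disk failures the array survives."""
    pairs = list(combinations(range(n_disks), 2))
    ok = sum(survives(layout, n_disks, set(p)) for p in pairs)
    return ok / len(pairs)


for n in (4, 6, 8):
    for layout in ("raid10", "raidz2"):
        print(f"{n} disks {layout}: usable {usable_fraction(layout, n):.0%}, "
              f"survives {two_disk_survival(layout, n):.0%} of 2-disk failures")
```

    Under these assumptions, raid10 survives 4 of the 6 possible two-disk failures with four disks and 12 of 15 with six, whereas raidz2 survives all of them; the usable fractions are equal at four disks and favor raidz2 above that.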


  • Yes. I use a G7 N36L as an offsite-backup server in my second apartment. Works great with NetBSD and zfs, using rsnapshot to make remote backups every night.

    Since it is only active for an hour and a half each night, it is the only one of my servers that puts its disks into power-save mode for the rest of the time. Its computing performance is so low that I don’t even run a folding@home client on it: it usually cannot finish a work package before the deadline.


  • For large storage, ECC helps a lot for avoiding storage corruption. In combination with a redundant architecture in zfs it is almost bullet-proof. (Make no mistake, redundant storage is no substitute for backups! You still need those.)

    One option is to use comparatively old server hardware. I have some pretty old stuff (around 10 years) that uses DDR3 RAM, which is dirt cheap, even with ECC (somewhere around 1 €/GB). And it will be more than fast enough for most applications. The downside is higher power consumption for the same performance. To give you a ballpark figure, the Dell T320 I have, with eight 3.5" SAS disks and 32 GB RAM, draws some 140 W.
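    To put the power-consumption downside into numbers, here is a minimal sketch, assuming the server runs around the clock at a constant draw (real consumption varies with load and disk spin-down). The electricity price is a hypothetical placeholder, not a figure from the text.

```python
HOURS_PER_YEAR = 24 * 365


def annual_kwh(watts: float) -> float:
    """Energy used per year by a machine drawing `watts` continuously."""
    return watts * HOURS_PER_YEAR / 1000


def annual_cost(watts: float, price_per_kwh: float) -> float:
    return annual_kwh(watts) * price_per_kwh


# The Dell T320 figure from the text; the price is a hypothetical placeholder.
t320_watts = 140
price = 0.30  # hypothetical €/kWh, substitute your own tariff
print(f"{annual_kwh(t320_watts):.0f} kWh/year, "
      f"about €{annual_cost(t320_watts, price):.0f}/year")
```

    At a constant 140 W that is about 1,226 kWh per year, which is worth weighing against the savings on cheap ECC RAM.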


  • What’s your problem with DAVx^5? It’s completely and permanently free and fully featured on F-Droid. Only the Play Store version costs money. The authors aren’t out to make money; they want to motivate you to move away from Google infrastructure.

    If you only need address/phone number sync, then Nextcloud is probably overkill, but I use it and it works great, also for calendar sync and file storage.

    (You don’t need to put the community name in the title, especially not with “@”, which signifies usernames. Communities are prefixed by “!”.)