• NoSpotOfGround@lemmy.world
    link
    fedilink
    English
    arrow-up
    172
    ·
    edit-2
    2 days ago

    The real meat of the story is in the referenced blog post: https://blog.codingconfessions.com/p/how-unix-spell-ran-in-64kb-ram

    TL;DR

    If you’re short on time, here’s the key engineering story:

    • McIlroy’s first innovation was a clever linguistics-based stemming algorithm that reduced the dictionary to just 25,000 words while improving accuracy.

    • For fast lookups, he initially used a Bloom filter—perhaps one of its first production uses. Interestingly, Dennis Ritchie provided the implementation. They tuned it to have such a low false positive rate that they could skip actual dictionary lookups.

    • When the dictionary grew to 30,000 words, the Bloom filter approach became impractical, leading to innovative hash compression techniques.

    • They computed that 27-bit hash codes would keep collision probability acceptably low, but needed compression.

    • McIlroy’s solution was to store differences between sorted hash codes, after discovering these differences followed a geometric distribution.

    • Using Golomb’s code, a compression scheme designed for geometric distributions, he achieved 13.60 bits per word—remarkably close to the theoretical minimum of 13.57 bits.

    • Finally, he partitioned the compressed data to speed up lookups, trading a small memory increase (final size ~14 bits per word) for significantly faster performance.

  • Troy@lemmy.ca
    link
    fedilink
    English
    arrow-up
    55
    ·
    2 days ago

    Long article for one sentence of trivia and no info on the algo itself. The death of the internet is upon us.

    • Em Adespoton@lemmy.ca
      link
      fedilink
      English
      arrow-up
      32
      ·
      edit-2
      2 days ago

      Doesn’t even name the algorithm, and somehow spells LZMA wrong, despite just having written it out longhand.

      Well, it’s PC Gamer.

      [edit] I still can’t figure out if they’re referencing LZW encoding… the L and Z being the same Lempel and Ziv from LZMA, but with Welch having a different solution for the rest of the algorithm due to size constraints.

    • GrabtharsHammer@lemmy.world
      link
      fedilink
      English
      arrow-up
      15
      ·
      2 days ago

      I’d like to imagine they took the short trivia fact and applied the inverse of the compression algorithm to bloat it into something that satisfied the editor.

  • 0x0@programming.dev
    link
    fedilink
    English
    arrow-up
    1
    arrow-down
    1
    ·
    2 days ago

    Only 1 GiB of RAM? Moooom!
    Shut up Johnny, Voyager’s still out there with way less.

    • rmuk@feddit.uk
      link
      fedilink
      English
      arrow-up
      2
      ·
      1 day ago

      Yeah, but I’ve not got two hundred Firefox tabs open on Voyager.