Perl Weekly Challenge # 7: Niven Numbers and Word Ladders



  • These are some answers to the Week 7 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

    Challenge # 1: Niven Numbers

    Print all the niven numbers from 0 to 50 inclusive, each on their own line. A niven number is a non-negative number that is divisible by the sum of its digits.

    A Niven number or harshad number is a strictly positive integer that can be evenly divided by the sum of its digits. Note that this property depends on the number base in which the number is expressed (the divisibility property is intrinsic to a pair of numbers, but the sum of digits of a given number depends on the number base in which the number is expressed). Here we will consider only numbers in base 10.

    Please also note that 0 cannot be a divisor of a number. Therefore, 0 cannot really be a Niven number. We'll start at 1 to avoid problems.

    Niven Numbers in Perl 5

    For a simple problem like this, I can't resist doing in with a Perl one-liner:

    $ perl -E 'for my $num(1..50) { my $sum = 0; $sum += $_ for (split //, $num); say $num if $num % $sum == 0;}'
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    12
    18
    20
    21
    24
    27
    30
    36
    40
    42
    45
    48
    50
    

    If you don't like one-liners, this how a full-fledged Perl 5 script could look like:

    use strict;
    use warnings;
    use feature qw/say/;
    
    for my $num(1..50) { 
        my $sum = 0; 
        for (split //, $num) { 
            $sum += $_; 
        } 
        say $num if $num % $sum == 0;
    }
    

    This prints the same as the above one-liner.

    Niven Numbers in Perl 6

    We can simply translate the P5 one-liner in P6:

    $ perl6 -e 'for 1..50 -> $num { my $sum = [+] $num.comb; say $num if $num %% $sum}'
    

    This prints out the same values as before.

    Note that the code is more concise that its P5 counterpart, in large part because the use of the [+] hyper-operator makes is possible to compute the sum of the digits in just one expression.

    Since you can have both an if and a for statement modifier in the same statement in Perl 6, you can make this one-liner even more concise:

    $ perl6 -e '.say if $_ %% [+] $_.comb for 1..50'
    

    And it you prefer a real script, this is one way it could be done:

    use v6;
    .say for gather { 
        for 1..50 -> $num { 
            my $sum = [+] $num.comb; 
            take $num if $num %% $sum 
        }
    }
    

    Challenge # 2: Word Ladders

    A word ladder is a sequence of words [w0, w1, …, wn] such that each word wi in the sequence is obtained by changing a single character in the word wi-1. All words in the ladder must be valid English words.

    Given two input words and a file that contains an ordered word list, implement a routine that finds the shortest ladder between the two input words. For example, for the words cold and warm, the routine might return:

    ("cold", "cord", "core", "care", "card", "ward", "warm")
    

    However, there’s a shortest ladder: (“cold”, “cord”, “card”, “ward”, “warm”).

    The text of the challenge provides the following additional information:

    • Givens:

      1. All words in the list have the same length.

      2. All words contain only lowercase alphabetical characters.

      3. There are no duplicates in the word list.

      4. The input words aren't empty and aren't equal but they have the same length as any word in the word list.

    • Requirements:

      1. The routine must return a list of the words in the ladder if it exists. Otherwise, it returns an empty list.

      2. If any of the input words is the wrong length (i.e., its length is different to a random from the word list) or isn’t in the word list, return an empty list.

    According to this Wikipedia page, word ladder puzzles were invented by the famous writer and mathematician Lewis Carroll in 1877, at a time when there was obviously no computer.

    To comply with given # 1, I'll break up my word.txt list into files with 2-letter words, 3-letter words, and so on, although this is really not necessary: it would be trivial to filter out the words with a different letter count when reading the word.txt file.

    My word.txt input file only contains words with only lowercase alphabetical ASCII characters.

    I'll slightly depart from given # 4 and Requirement # 1: if the input words are equal, I'll simply return that word as being the ladder. And I'll abort the program if the input words have different lengths.

    Just as for some previous challenges, I will use a words.txt file containing 113,809 lower-case English words usually accepted for crossword puzzles and other word games. The words.txt file can be found on my Github repository. The original list was contributed to the public domain by Internet activist Grady Ward in the context of the Moby Project. This word list is also mirrored at Project Gutenberg.

    For the purpose of testing the programs below, the words.txt file is located in my current directory. Obviously, when we will be reading the list, we will need to keep only the words having the same length as the two input words.

    This is task that is much more complicated than the other challenge of this week (and than most previous challenges). In fact, my first reaction when reading the problem was, "Gosh, I've got no idea how I'm going to solve that." In such case, it is often a good idea to try to break up the problem into smaller ones.

    The first thing that we must be able to do is to figure out whether one word can be transformed into another with just one letter change. It would probably be also very useful to know whether this can be done with two letter changes, three letter changes, etc. For this, we may want to use a well-known CS string metric named the Levenshtein distance or Levenshtein edit distance, which is the smallest number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. In the case of this challenge, however, we probably don't need to consider insertions and deletions, but are interested only in substitutions.

    Once we have a routine to compute the Levenshtein distance, we might try to use brute force with backtracking to test all possibilities, or an optimized version thereof able to remove non optimal paths relatively early in the process, or a branch and bound algorithm, or implement some form of Dijstra's algorithm for shortest paths.

    Word Ladders in Perl 5

    Computing the Edit Distance In Perl 5

    There are a number of CPAN modules to compute the Levenshtein distance on the CPAN, such as Text::Levenshtein and various variants. However, we will not use any of them for several reasons. One is that using a ready-made library for solving such a challenge is kind of cheating; I especially don't wish to use a non-core module. The second reason is that, as we have seen above, the general Levenshtein distance is the smallest number of single-character edits (insertions, deletions or substitutions) required to change one word into the other, where as, since our input words have the same length, we're interested with only the smallest number of substitutions. The last reason is that, because of the second reason just stated, computing the full Levenshtein distance would be overkill: we really need to compute only the number of letter differences for each given position in the input words. Because of that, I'll use the term edit distance, rather than Levenshtein distance.

    Once we've clarified in our head what we need, it is quite simple to write our edit_distance subroutine and some code to test it:

    use strict;
    use warnings;
    use feature qw/say/;
    sub edit_distance {
        my ($word1, $word2) = @_;
        die "Words $word1 and $word2 have different lengths\n" unless length $word1 == length $word2;
        my @w1 = split //, $word1;
        my @w2 = split //, $word2;
        my $dist = 0;
        for my $i (0..$#w1) {
            $dist++ if $w1[$i] ne $w2[$i];
        }
        return $dist;
    }
    for my $word_pair_ref (["cold", "cord"], ["cord", "core"], ["cord", "cord"], 
            ["cold", "warm"], ["kitten", "sittin"], ["kitten", "sitting"]) {
        my ($w1, $w2) = @$word_pair_ref;
        say "Distance between $w1 and $w2 is: \t", edit_distance ($w1, $w2);
    }
    

    This script displays the expected output:

    Distance between cold and cord is:      1
    Distance between cord and core is:      1
    Distance between cord and cord is:      0
    Distance between cold and warm is:      4
    Distance between kitten and sittin is:  2
    Words kitten and sitting have different lengths
    

    So, we have a working edit_distance subroutine, a ladder subroutine might look in part like this:

    sub ladder {
        my $word1, $word2) = @_;
        return ($word1) if edit_distance($word1, $word2) == 0;
        return ($word1, $word2) if edit_distance($word1, $word2) == 1;
        # ...
    }
    

    The problem, of course, is to write the code that will replace the ellipsis. Eventually, the real code will not really look like that, but it could.

    Finding the Ladders in Perl 5

    Let's try an improved brute force algorithm. For this, we will first analyze our input file and prepare data structures that are likely to make brute force relatively fast.

    Preparing the Data

    To start with, we will use our words.txt file to create files of words having 2, 3, 4, ... 9, 10 letters. This can be done with simple one-liners such as:

    $ perl -nE 'chomp; say if length == 10' words.txt >  words10.txt
    

    We're not really interested with longer words, because it will become increasingly difficult and unlikely to find complete ladders between such longer words.

    Our 9 files have the following word counts:

    113809  113809 1016714 words.txt
    9199     9199  101189 words10.txt
      85       85     255 words2.txt
     908      908    3632 words3.txt
    3686     3686   18430 words4.txt
    8258     8258   49548 words5.txt
    14374   14374  100618 words6.txt
    21727   21727  173816 words7.txt
    26447   26447  238023 words8.txt
    16658   16658  166580 words9.txt
    220447  220447 1932357 total
    

    The next thing we want to do is to build hashes which, for each word in the input file, lists all words that are a single-letter-change away from that given word. This implies nested loops on large data collections; this can take a very long time to run, so we want to cache the result in order to avoid having to recompute it every time. That's what the first 40 code or so lines in the program below do. We use the Storable core module to store the generated hash in a file. Note that it would be very easy to store our %words hash of arrays in a plain text file and to retrieve it on demand (for example, one line per hash key, with the key and the list of words on eash such line), and that's what we'll do below in P6, but Storable does it well and fast for us. The basic idea is as follows: we get two words as parameters to the program; if the words have the same length, say 5, we look for a data store file named word_store_5 on the disk. If the file is there, we just retrieve the data and load the hash from it (which takes a split second); if not, we generate the hash and store it. This way, these long computations can be done only once for each word length.

    The %words hash of arrays is in effect a graph of the single-letter-edit connections between words.

    Some words are what I call "orphans" in the code, i.e. there are not connected to any other word through a single character edit. When legendary computer scientist Donald Knuth studied the problem several decades ago, he found that the word "aloof" was one such word. So he named "aloof" all of these words not connected to any other. In the code below, these words are stored in "aloof" files, despite the fact that my word list has in fact the word "kloof" (whatever it means) which is one character edit away from "aloof." We remove these words from our hash, since they cannot participate in any path between two words.

    The table below summarizes word counts and run times for this part of the process.

    | File        | Words   | Run time   | Aloof words |
    | ----------- | ------- | ---------- | ----------- |
    | words2.txt  |  85     | 0m0.102s   | 0           |
    | words3.txt  |  908    | 0m2.519s   | 6           |
    | words4.txt  |  3686   | 0m38.162s  | 68          |
    | words5.txt  |  8258   | 4m46.641s  | 711         |
    | words6.txt  |  14374  | 16m29.848s | 3093        |
    | words7.txt  |  21727  | 39m5.278s  | 7348        |
    | words8.txt  |  26447  | 71m47.795s | 12516       |
    | words9.txt  |  16658  | 28m6.781s  | 10096       |
    | words10.txt |  8258   | 7m51.305s  | 6494        |
    

    When words have 9 characters or more, significantly more than half of them are "aloof" words. It probably becomes relatively difficult to find pairs of words that are connected with a ladder.

    As you can see, this process takes quite a lot of time when there are many words in the input file (more than 71 minutes for words having 8 characters), it is good to store the data produced to avoid recomputing it. Note that is might be quicker not to do that and to go directly for the ladders if we were to determine a path between only two words, but I ran the program probably a couple of hundred times for the purpose of testing and finding information about our word lists, so I'm happy that I first took the time to prepare the data for faster processing later. When the word store already exists, it takes about 0.1 to 0.3 second to reconstruct the hash from the store.

    Finding the Word Ladders

    This is my Perl 5 code for finding word ladders:

    use strict;
    use warnings;
    use feature qw/say/;
    use Storable;
    use Data::Dumper;
    
    die "Please pass two words as parameters" unless @ARGV == 2;
    my ($word1, $word2)= @ARGV;
    my $length = length $word1;
    die "The two words must have the same length\n" if $length != length $word2;
    
    my $store_file = "word_store_$length";
    my ($store_ref, %words);
    if (-e $store_file) {
        my $store_ref = retrieve($store_file);
        %words = %$store_ref;
    } else {
        my $file = "words$length.txt";
        open my $IN, '<', $file or die "Cannot open $file$!";
        while (my $word = <$IN>) {
            chomp $word;
            $words{$word} = [];
            for my $key (keys %words) {
                if (edit_distance($key, $word) == 1) {
                    push @{$words{$key}}, $word;
                    push @{$words{$word}}, $key;
                }
            }
        }
        close $IN;
        my $orphans = "aloof_$length.txt";
        open my $OUT, ">", $orphans or die "Cannot open file $orphans$!";
        for my $key (keys %words){
            if (scalar keys $words{$key} == 0) {
                say $OUT "$key";
                delete $words{$key}; 
            }
        }
        close $OUT;
        store \%words, $store_file;  
    }
    
    my $max = $le   ngth * 2;
    
    sub edit_distance {
        my ($word1, $word2) = @_;
        # die "Words $word1 and $word2 ..." -> No longer needed as this is checked before
        my @w1 = split //, $word1;
        my @w2 = split //, $word2;
        my $dist = 0;
        for my $i (0..$#w1) {
            $dist++ if $w1[$i] ne $w2[$i];
        }
        return $dist;
    }
    
    sub ladder {
        my ($word1, $word2, $tmp_result) = @_;
        return $tmp_result if $word1 eq $word2;
        return [] if scalar @$tmp_result >= $max;
        my @temp_solutions;
        for my $word (@{$words{$word1}}) {
            next if $word eq $word1;
            next if grep { $_ eq $word } @$tmp_result; # not really needed but a bit faster
            push @temp_solutions, [@$tmp_result, $word] and last if $word eq $word2;
            my $new_tmp = ladder($word, $word2, [@$tmp_result, $word]);
            next if scalar @$new_tmp == scalar @$tmp_result;
            next unless scalar @$new_tmp;
            push @temp_solutions, $new_tmp;
        }
        return [] unless @temp_solutions;
        my $best_sol = (sort { scalar @$a <=> scalar @$b } @temp_solutions)[0];
        $max = scalar @$best_sol if scalar @$best_sol < $max;
        return $best_sol;
    }
    
    for ($word1, $word2) {
        die "Word $_ not found\n" unless exists $words{$_};
    }
    my $ladder = ladder $word1, $word2, [$word1];
    
    if (@$ladder) {
        say join "->", @$ladder;
    } else {
        say "No ladder found for $word1 and $word2"
    }
    

    The bulk of the work is done in the ladder subroutine, which calls itself recursively for all words connected to the $word1 input word. The $max variable, which controls how deep we go into the recursive research, may not be needed for the correctness of the algorithm (provided we find another way to stop recursion), but it enables the program to run tens to hundreds of times faster, depending on the input words.

    This is an example run:

    $ time perl ladder2.pl warm cold
    warm->ward->card->cord->cold
    
    real    0m2.959s
    user    0m2.859s
    sys     0m0.077s
    

    Printing out intermediate results shows that there are other shortest ladders between these two words, for example:

    warm worm word wold cold
    warm worm word cord cold
    warm worm corm cord cold
    warm ward card cord cold
    

    There is one caveat: it should be noted that I have initialized the $max variable to twice the length of the input words. In a way, this is a bug because there are certainly some (probably very rare) pairs of four-letter words for which the shortest ladder contains more than 8 words, but I have kept it this way as a trade-off because the program runs so dramatically faster with a relatively low value for $max. If we needed to make sure that we don't miss any shortest ladder, we could run it the way it is now (so that it is fairly fast most of the time) and, when no ladder is found, run it again with a much larger initial value of $max. According to this site, the longest shortest ladder (with words of six letters) has 50 steps.

    As a conclusion to the Perl 5 implementation of word ladders, I should say that, although it seems to work properly (subject to the caveat just above), I'm not really fully satisfied with this solution: I think it should be possible to make something simpler (and perhaps faster), but I don't have the time at the moment to investigate further.

    Word Ladders in Perl 6

    Let's try to adapt the P5 script to P6.

    Serializing the Word Hash of Arrays

    There doesn't seem to be a Storable module in Perl 6, but I did not worry about that, since I thought that the gist routine would provide a serialized image of the $words hash which could then be EVALed to retrieve the hash. This seems to work fine with a small hash. But that does not work with our big %words hash of arrays, because it turns out that, apparently, gist truncates its output to a few thousands characters.

    It would probably be possible to serialize the %words hash with some JSON or YAML module, but there doesn't seem to be any core module for that.

    As noted in the P5 section of this challenge, there is nothing complicated in writing our own plain text serializer for a simple hash of arrays. For example, we can write a plain text file with one line for each hash item, with the key at the beginning of the line and then a list of the values. Let's do it in a little toy module, which could look like something like this:

    unit package Store;
    
    sub store (%hash, $file) is export {
        # stores a hash or array as lines containing key et values
        # such as: key | val1 val2 val3
        my $out;
        for %hash.kv -> $key, $val {
            $out ~= "$key | $val \n";
        }
        spurt $file, $out;
    }
    sub retrieve (%hash, $file) is export {
        # populates a hash of arrays with stored data
        for $file.IO.lines -> $line {
            my ($key, $val) = split /\s?\|\s?/, $line;
            %hash{$key} = $val.words;
        }
    }
    

    As an example, the first few lines of the word_store_4 file look like this:

    yawl | bawl pawl wawl yawn yawp yaws yowl 
    pled | fled bled pied plea peed gled pleb plod sled 
    pita | dita pima pica pika pina pith pits pity vita 
    keir | heir weir 
    quag | quad quai quay 
    frug | frig frog frag drug
    

    The Ladder Script in Perl 6

    Now that we have solved the problem of storing and retrieving the hash or arrays, adapting the P5 script into Perl 6 is fairly easy:
    
    use v6;
    use Store;
    
    die "Please pass two words as parameters" unless @*ARGS == 2;
    my ($word1, $word2)= @*ARGS;
    my $length = $word1.chars;
    die "The two words must have the same length\n" if $length != $word2.chars;
    
    my $max = 2 * $length;
    my $store-file = "word_store_$length";
    my ($stored, %words);
    if ($store-file.IO.e) {
        retrieve %words, $store-file;
    } else {
        for "words$length.txt".IO.lines -> $word { 
            %words{$word} = [];
            for keys %words -> $key {
                if (edit-distance($key, $word) == 1) {
                    push @(%words{$key}), $word;
                    push @(%words{$word}), $key;
                }
            }
        }
        %words = grep { $_.value.elems > 0 }, %words.pairs; 
        store %words, $store-file;
    }
    
    sub edit-distance (Str $word1, Str $word2) {
        my @w1 = $word1.comb;
        my @w2 = $word2.comb;
        my $dist = 0;
        $dist++ if @w1[$_] ne @w2[$_] for (0..@w1.end) ;
        return $dist;
    }
    
    sub ladder (Str $word1, Str $word2, $tmp-result) {
        return $tmp-result if ($word1 eq $word2);
        return [] if @$tmp-result.elems >= $max;
        my @temp-solutions;
        for @(%words{$word1}) -> $word {
            next if $word eq $word1;
            next if grep { $_ eq $word }, @$tmp-result;
            push @temp-solutions, [|@$tmp-result, $word] and last if $word eq $word2;
            my $new_tmp = ladder($word, $word2, [|@$tmp-result, $word]);
            next if @$new_tmp.elems == @$tmp-result.elems;
            next unless @$new_tmp.elems;
            push @temp-solutions, $new_tmp;
        }
        return [] if @temp-solutions.elems == 0;
        my $best_sol = (sort { $_.elems }, @temp-solutions)[0];
        $max = @$best_sol.elems if @$best_sol.elems < $max;
        return $best_sol;
    }
    
    for ($word1, $word2) {
        die "Word $_ not found\n" unless  %words{$_} :exists;
    }
    my $ladder = ladder $word1, $word2, [$word1];
    
    if (@$ladder) {
        say join "->", @$ladder;
    } else {
        say "No ladder found for $word1 and $word2"
    }
    

    Running the script with the words "warm" and "cold" produces the following output:

    $ perl6  ladder.p6 warm cold
    warm->worm->word->cord->cold
    
    $ perl6  ladder.p6 cold warm
    cold->wold->word->worm->warm
    

    Wrapping up

    The next week Perl Weekly Challenge is due to start very soon. If you're interested in participating to this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 6 p.m. BST (British summer time) on next Sunday, May 19. And, please, also spread the word about the Perl Weekly Challenge if you can.



    http://blogs.perl.org/users/laurent_r/2019/05/perl-weekly-challenge-7-niven-numbers-and-word-ladders.html

Log in to reply
 

© Lightnetics 2019