Perl 6 and Markov sequences
- Translation
Let's look at a non-numeric sequence built by applying a Markov chain to text. Each new character in the sequence is chosen at random based on the previous two, with probabilities that follow the patterns found in a source text.
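use v6;
use List::Utils;
# Part 1: read the model text from standard input and normalize it.
my $model-text = $*IN.slurp.lc;
$model-text .=subst(/<[_']>/, "", :global);       # drop underscores and apostrophes
$model-text .=subst(/<-alpha>+/, " ", :global);   # turn each non-alphabetic run into one space

# Part 2: count how often each character follows each pair of characters.
my %next-step;
for sliding-window($model-text.comb, 3) -> $a, $b, $c {
    %next-step{$a ~ $b}{$c}++;
}

# Part 3: generate the chain lazily, starting from the first two characters.
my $first = $model-text.substr(0, 1);
my $second = $model-text.substr(1, 1);
my @chain := $first, $second, -> $a, $b { %next-step{$a ~ $b}.roll.key } ... *;
say @chain.munch(80);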
After initialization, three parts are clearly visible in the code.
The first part reads in the model text and strips out non-alphabetic characters. The first statement after the use lines calls slurp to read all of standard input ($*IN) into a single string, and lc converts it to lowercase. The first subst removes all underscores and apostrophes; the second replaces every run of non-alphabetic characters with a single space.
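The cleanup step can be tried on its own. The following is a minimal sketch that uses a made-up sample string instead of standard input; the variable names $raw and $clean are illustrative and not part of the original script.

```raku
# Hypothetical sample input standing in for the slurped model text.
my $raw = "Don't stop... it's_fine, 42 times!";

# Lowercase everything, as the script does right after slurping.
my $clean = $raw.lc;

# Remove underscores and apostrophes outright, so "don't" becomes "dont".
$clean .= subst(/<[_']>/, "", :global);

# Replace every run of non-alphabetic characters with a single space.
$clean .= subst(/<-alpha>+/, " ", :global);

say $clean;
```

The underscore gets its own substitution because the alpha character class counts it as alphabetic, so the second substitution would otherwise leave it in place.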
The second part uses the sliding-window function from List::Utils and a bit of Perl magic.
$model-text.comb splits the text into individual characters.
sliding-window walks through the list and produces N elements (here, 3) starting at each element of the list in turn. That is, you first get the 1st, 2nd and 3rd elements, then the 2nd, 3rd and 4th, and so on.
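For readers without List::Utils at hand, the same windows can be produced with the built-in rotor method. This is a sketch of an alternative, not the function the script itself calls; the sample word and the name @windows are assumptions made for illustration.

```raku
# Split a sample word into single characters, as $model-text.comb does.
my @chars = "aqualung".comb;

# rotor(3 => -2) takes three elements, then steps back two,
# so consecutive windows overlap by two characters.
my @windows = @chars.rotor(3 => -2).map(*.join);

say @windows;   # [aqu qua ual alu lun ung]
```

By default rotor drops the incomplete windows at the end of the list, so every window holds exactly three characters.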
The loop builds a hash of hashes. The outer keys are the first two of three consecutive characters. The inner key is the third character, and its value is the number of times that character follows the first two. For example, feeding the program the lyrics of Aqualung gives %next-step{"qu"} contents of the form:
{"a" => 5, "e" => 2}
That is the result when, in the source text, “qu” is followed by “a” five times and by “e” twice.
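Here is a small sketch of how such a table fills up, using a short invented text rather than the Aqualung lyrics. It swaps in the built-in rotor method for sliding-window, which is an assumption made so that the snippet runs without extra modules.

```raku
# Invented sample text in which "qu" is followed by "a" three times and "e" once.
my $text = "quack quake queen quail";

my %next-step;
for $text.comb.rotor(3 => -2) -> ($a, $b, $c) {
    # Outer key: the two leading characters. Inner key: the character after them.
    %next-step{$a ~ $b}{$c}++;
}

# Prints the followers of "qu" with their counts (hash order may vary).
say %next-step<qu>;
```

The inner hashes spring into existence on first use, so no explicit initialization is needed before the increment.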
The third part of the code uses this data to build the sequence. We take the first two characters of the model text, since we know that some character follows them. Then we create a sequence that starts with these two characters and uses -> $a, $b { %next-step{$a ~ $b}.roll.key } as its generator. The generator looks up the frequency hash for the previous two characters and picks the third from it. The roll method returns one random entry from the hash, chosen according to its weight, and key extracts the character. In the “qu” example, imagine rolling a seven-sided die on which five faces read “a” and two read “e”. If no character is known to follow the previous two (for example, the pair appears only at the very end of the text), an undefined value is returned and the sequence stops.
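The seven-sided die can be checked empirically. The sketch below coerces the frequency hash to a Bag before rolling, which is one way to get weighted picks in present-day Raku; that coercion is a substitution made here, not what the original script does, and the names %followers, $die and $share-of-a are illustrative.

```raku
# Frequency hash for the pair "qu", taken from the example in the text.
my %followers = a => 5, e => 2;

# Coercing to a Bag treats each value as the weight of its key.
my $die = %followers.Bag;

# Roll the die many times and measure how often "a" comes up.
my @rolls = $die.roll(7000);
my $share-of-a = @rolls.grep('a').elems / @rolls.elems;

say "share of 'a': $share-of-a";
```

Over many rolls the share of “a” should settle near 5/7, roughly 0.71.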
The munch method then gives us the first 80 characters of the sequence.
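The lazy sequence and the act of taking a fixed number of items from it can be sketched with a tiny deterministic table. As far as I know, munch is not available in current Raku releases, so this sketch uses head instead; the %next table is invented and maps each pair to a single fixed follower so that the output is predictable.

```raku
# Invented transition table: every pair has exactly one follower.
my %next = ab => 'c', bc => 'a', ca => 'b';

# The generator block receives the previous two items of the sequence
# and returns the next one; "... *" keeps the sequence lazy and unbounded.
my @chain = 'a', 'b', -> $a, $b { %next{$a ~ $b} } ... *;

# Only the first eight items are ever computed.
say @chain.head(8).join;   # abcabcab
```

Because the sequence is lazy, asking for 80 items instead of 8 only changes how far the generator is run.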
Running the script on the Aqualung lyrics produces sequences like
“t carealven thead you he sing i withe and upon a put saves pinsest to laboonfeet” and “t steall gets sill a creat ren ther he crokin whymn the gook sh an arlieves grac”.
The program has no hard-coded character set. Anything that Perl 6 recognizes as an alphabetic character will be processed. Feeding it the standard “Land der Berge” file, which p6eval uses as stdin, yields lines like “laß in ber bist brüften las schören zeites öst froher land der äckerzeichöne lan”.
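A quick way to see this is to run the same cleanup on text with non-ASCII letters. The sample string below is made up for illustration and is not the actual “Land der Berge” file.

```raku
# Made-up German sample containing an umlaut and a sharp s.
my $text = "Land der Äcker, zukunftsreich! Groß".lc;

# The alpha class is Unicode-aware, so ä and ß survive the cleanup
# while punctuation and spaces collapse into single spaces.
$text .= subst(/<-alpha>+/, " ", :global);

say $text;   # land der äcker zukunftsreich groß
```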