A beautiful purple-haired woman wearing a long purple velvet ankle-length dress with flouncy sleeves and purple suede high heels. She is holding several pieces of string, which have letters and symbols threaded on them.
In order to optimise our FSM LLM, we now store incoming training data as syllables rather than as characters or dictionary words. Ideally we need fewer than 256 orthogonal combinations of (consonant, vowel), e.g. WOO, YAY, so that our nodes may remain byte-oriented rather than 16-bit-cell-oriented. A lexer preprocessor coroutine will convert incoming character-based data on-the-fly into single-byte syllables, which will then form our Markov FSM. For example, if ttyin() returns "HELLO WORLD" the lexer will return (HE)(Lé)(W)( )(Wé)(R)(L)(D) or better.
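As a rough illustration of the lexer idea, here is a minimal Python sketch that uses a generator as the coroutine. It is purely orthographic and ASCII-only, so it pairs each consonant with the written vowel that follows it rather than with a phonetic vowel like the "é" above. The name `syllables` is hypothetical, and any iterable of characters stands in for ttyin().

```python
import string

VOWELS = set("AEIOU")
CONSONANTS = set(string.ascii_uppercase) - VOWELS

def syllables(chars):
    """Lazily group an iterable of characters into syllable tokens."""
    pending = None                    # consonant waiting to see if a vowel follows
    for ch in chars:
        ch = ch.upper()
        if pending is not None:
            if ch in VOWELS:
                yield pending + ch    # consonant + vowel becomes one syllable
                pending = None
                continue
            yield pending             # consonant with no following vowel
            pending = None
        if ch in CONSONANTS:
            pending = ch              # hold the consonant back for one step
        else:
            yield ch                  # bare vowel, space or punctuation passes through
    if pending is not None:
        yield pending                 # flush a trailing consonant

print(list(syllables("HELLO WORLD")))
# ['HE', 'L', 'LO', ' ', 'WO', 'R', 'L', 'D']
```

Because the generator only ever holds one consonant back, it can sit between the input stream and the FSM without buffering a whole line.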
Phonetic syllables allow us to retain the one-bytedness of our FSM, without going to the 16-bittedness of a worded dictionary and the consequent massive increase in nodal valency (i.e. every node would need up to 65536 exits if we used a vocabulary-based system with up to 65536 dictionary words). They also make the discontinuous nature of the Markov creativity "softer". For example, if our two threads flowing through the Markov chain returned "WOO" and "YAY" in response to a prompt that ::SHE+ILA:: hadn't been trained for, then the output (being a linear superposition of WOO and YAY) may be YOO or WAY, rather than some gibberish like WAO or YZX. I'm sure you've all seen the attempts by various AI models to render a superposition of two ASCII characters in an output literal, producing a blurred pseudo-character (technically known as a "squish quad"). In fact my "screen name" SQUISHY is a superposition of a comment made by my mum one Xmas when she saw one of my plushies ("They get squishier every time I see them") and the term "squish quad" ("[]") from APL.
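One way to picture the WOO/YAY example is to treat each syllable as an (onset, vowel) pair and let a blend take its onset from one thread and its vowel from the other. This is only an interpretation of "superposition", not a description of how ::SHE+ILA:: actually mixes threads, and the function name `blends` is hypothetical.

```python
from itertools import product

def blends(a, b):
    """All syllables obtainable by mixing the parts of two (onset, vowel) pairs."""
    onsets = (a[0], b[0])
    vowels = (a[1], b[1])
    return sorted({onset + vowel for onset, vowel in product(onsets, vowels)})

woo = ("W", "OO")
yay = ("Y", "AY")
print(blends(woo, yay))
# ['WAY', 'WOO', 'YAY', 'YOO']
```

Every result is still a well-formed syllable, which is the "softness": a character-level mix of the same two strings could just as easily land on WAO.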
Anyway, we can pack a consonant and a vowel into a single byte, 4 bits for each (like the way an opcode and an addressing mode are packed into an instruction). We can use the values $80_$E3 since we don't need them as BCD digits (packed decimal). We would use ^N and ^O (Shift Out and Shift In) as embedded delimiters to denote that our string contains phonemes rather than alphabetics.
A possible implementation is given below:
$01_$1A Plain consonants with no following vowel
$41_$5A Consonants followed by "A"
$61_$7A Consonants followed by "E"
$81_$9A Consonants followed by "I"
$A1_$BA Consonants followed by "O"
$C1_$DA Consonants followed by "U"
Note that the control characters ^A_^Z have been reused, hence cannot be present in a syllable string; however the punctuation ($20_$3F) and tags ($E4_$FF) are unaltered.
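Read as bit fields, the ranges above amount to a vowel selector in the top three bits and a letter index (A=1 .. Z=26) in the bottom five. The Python sketch below encodes and decodes on that assumption. The names `encode` and `decode` are hypothetical, and the sketch does not check that the letter passed in is actually a consonant.

```python
# Base value for each vowel row of the table; "" means "no following vowel".
VOWEL_BASE = {"": 0x00, "A": 0x40, "E": 0x60, "I": 0x80, "O": 0xA0, "U": 0xC0}
BASE_VOWEL = {base: vowel for vowel, base in VOWEL_BASE.items()}

def encode(consonant, vowel=""):
    """Pack one letter plus an optional vowel into a single byte value."""
    index = ord(consonant.upper()) - ord("A") + 1    # A=1 .. Z=26
    if not 1 <= index <= 26:
        raise ValueError(f"not a letter: {consonant!r}")
    return VOWEL_BASE[vowel.upper()] + index

def decode(byte):
    """Unpack a syllable byte; punctuation and tag bytes raise ValueError."""
    base = byte & 0xE0     # top three bits select the vowel row
    index = byte & 0x1F    # bottom five bits select the letter
    if base not in BASE_VOWEL or not 1 <= index <= 26:
        raise ValueError(f"not a syllable byte: ${byte:02X}")
    return chr(ord("A") + index - 1) + BASE_VOWEL[base]

print(f"${encode('W', 'O'):02X}")   # $B7
print(decode(0xB7))                 # WO
```

Bytes in $20_$3F and $E4_$FF fall outside every row, so `decode` rejects them, matching the note that punctuation and tags are unaltered.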