::SHE+ILA::'s Dictionary Storage Format

Glamorous Woman in Purple Dress by Filing Cabinets
  • Artist: Squishy Plushie
  • DDG Model: DaVinci2

More about ::SHE+ILA::'s Dictionary Storage Format

As discussed in the previous picture, all permanent strings are stored in a special "Dictionary" segment. This segment holds a table of strings, compressed in ETAIONS format, each with an entry number, a language ID, a 64-bit hash and a part-of-speech bitmask with one bit for each of: noun, adjective, verb, adverb, identifier, conjunction, preposition, article, miscellaneous and stropped token. A stropped token is a reserved word such as .and., .or. or .not.; ::SHE+ILA:: uses the Fortran stropping convention of leading and trailing dots, unlike Algol, which uses single quotes, or Pascal, which uses no explicit stropping. The string table has the format:
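#include <stdint.h>

typedef uint16_t bitmask16;   /* ten part-of-speech flags fit in 16 bits */
typedef uint8_t  ETAIONSchar; /* assumed width; the format does not state it */

struct DictEntry {              /* tag name is illustrative */
    int32_t      entrynum;      /* entry number */
    bitmask16    partofspeech;  /* one bit per part of speech */
    int16_t      languageID;    /* language identifier */
    int64_t      hash;          /* hash of the string, checked before the characters */
    ETAIONSchar *string;        /* ETAIONS-compressed string */
};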
There are also four index tables. The first sorts the strings alphabetically; the second sorts entries by entrynum in ascending order, so an entry can be found by binary chop (binary search); the third groups entries by part of speech (for GPT-like generation); and the fourth maps the hash of an incoming string directly to a chain of buckets referencing the string-table entries. The dictionary is pre-trained (the P in GPT) on a standard vocabulary, enabling procedural and AI-based prose generation (the G in GPT). The hash speeds up lookup: the incoming string is hashed, and each candidate entry's stored hash is checked first, so the actual characters are compared only when the hashes match.
