::SHE+ILA::'s Dictionary Storage Format

Glamorous Woman in Purple Dress by Filing Cabinets
  • Artist
    Squishy Plushie
  • DDG Model
    DaVinci2
  • Access
    Public
  • Created
    1mo ago

More about ::SHE+ILA::'s Dictionary Storage Format

As discussed in the previous picture, all permanent strings are stored in a special "Dictionary" segment. It holds a table of strings compressed in ETAIONS format, each with an entry number and a part-of-speech bit mask (one bit each for: noun, adjective, verb, adverb, identifier, conjunction, preposition, article, miscellaneous, stropped token). A stropped token is a reserved word such as .and., .or., .not., etc. ::SHE+ILA:: uses the Fortran stropping convention of leading and trailing dots, unlike Algol, which commonly uses single quotes, or Pascal, which uses no explicit stropping. The string table has the format:
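struct {
    int32 entrynum;          /* entry number */
    bitmask16 partofspeech;  /* one bit per part of speech */
    int16 languageID;
    int64 hash;              /* hash of the string, checked before the characters */
    int64 genclass;          /* dictionary entry naming this object's class */
    ETAIONSchar* string;     /* the string, compressed in ETAIONS format */
};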
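
The part-of-speech flags described above fit comfortably in the 16-bit partofspeech field. A minimal sketch in C follows. The bit positions are an assumption (the text lists the flags but does not say which bit is which), and the POS_* names, pos_has and looks_stropped are hypothetical helpers, not part of ::SHE+ILA:: itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit assignment: the order follows the list in the text,
   but the real bit positions are not documented. */
enum {
    POS_NOUN        = 1 << 0,
    POS_ADJECTIVE   = 1 << 1,
    POS_VERB        = 1 << 2,
    POS_ADVERB      = 1 << 3,
    POS_IDENTIFIER  = 1 << 4,
    POS_CONJUNCTION = 1 << 5,
    POS_PREPOSITION = 1 << 6,
    POS_ARTICLE     = 1 << 7,
    POS_MISC        = 1 << 8,
    POS_STROPPED    = 1 << 9
};

/* True if every flag in `wanted` is set in `partofspeech`. */
static bool pos_has(uint16_t partofspeech, uint16_t wanted)
{
    return (partofspeech & wanted) == wanted;
}

/* True if `s` has the dotted shape of a stropped token such as ".and.":
   a leading dot, a trailing dot, and at least one character between. */
static bool looks_stropped(const char *s)
{
    size_t n = strlen(s);
    return n >= 3 && s[0] == '.' && s[n - 1] == '.';
}
```

Because each part of speech has its own bit, a word that can be a noun, an adjective and a verb simply sets three flags, and pos_has(flags, POS_NOUN | POS_VERB) tests for both at once.
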
There are also four index tables. One sorts the strings alphabetically, one sorts the entries by entrynum in ascending order (for binary chop), and one groups entries with similar parts of speech (for GPT-like application). The fourth maps the hash of an incoming string directly to a chain of buckets referencing the string table entries. The hash speeds up the search for an incoming string: the incoming string is hashed, and the result is checked against the hash in the table before the actual characters are compared.

The genclass field references another dictionary entry, which names the class of the object in this entry, for example REVOLVER -> GUN -> WEAPON. The dictionary is pre-trained (the P in GPT) on a standard vocabulary, enabling procedural and AI-based prose generation (the G in GPT).
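
The hash-first lookup and the genclass chain can be sketched in C as follows. This is an illustration under stated assumptions, not the actual ::SHE+ILA:: code: plain C strings stand in for ETAIONS-compressed text, the hash is FNV-1a (the real hash function is not specified), the hash is held unsigned so the arithmetic is well defined, the bucket chain is threaded through a hypothetical `next` field instead of a separate index table, and NO_ENTRY, NBUCKETS and the dict_* names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

#define NBUCKETS 8      /* assumed; kept small for illustration */
#define NO_ENTRY (-1)   /* assumed marker for "no entry" / "no class" */

/* Simplified stand-in for the string-table record. */
struct entry {
    int32_t     entrynum;
    uint64_t    hash;
    int64_t     genclass;  /* entrynum of the class, or NO_ENTRY */
    const char *string;    /* plain text instead of ETAIONS */
    int32_t     next;      /* next entry in the same bucket chain */
};

struct dict {
    struct entry *entries;            /* entries[i].entrynum == i */
    int32_t       count, capacity;
    int32_t       buckets[NBUCKETS];  /* head of each chain */
};

/* FNV-1a, 64-bit: an assumed hash function. */
static uint64_t hash_string(const char *s)
{
    uint64_t h = 14695981039346656037ULL;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

static void dict_init(struct dict *d, struct entry *storage, int32_t capacity)
{
    d->entries = storage;
    d->count = 0;
    d->capacity = capacity;
    for (int i = 0; i < NBUCKETS; i++)
        d->buckets[i] = NO_ENTRY;
}

/* Append a string and link it at the head of its bucket chain.
   Returns the new entrynum, or NO_ENTRY if the table is full. */
static int32_t dict_add(struct dict *d, const char *s, int64_t genclass)
{
    if (d->count >= d->capacity)
        return NO_ENTRY;
    uint64_t h = hash_string(s);
    int32_t n = d->count++;
    struct entry *e = &d->entries[n];
    e->entrynum = n;
    e->hash = h;
    e->genclass = genclass;
    e->string = s;
    e->next = d->buckets[h % NBUCKETS];
    d->buckets[h % NBUCKETS] = n;
    return n;
}

/* Hash the incoming string, walk its bucket chain, and compare the
   characters only when the stored hash already matches. */
static int32_t dict_find(const struct dict *d, const char *s)
{
    uint64_t h = hash_string(s);
    for (int32_t i = d->buckets[h % NBUCKETS]; i != NO_ENTRY;
         i = d->entries[i].next) {
        const struct entry *e = &d->entries[i];
        if (e->hash == h && strcmp(e->string, s) == 0)
            return e->entrynum;
    }
    return NO_ENTRY;
}

/* Follow genclass links upward: is `entry` a kind of `cls`?
   Assumes the class links contain no cycles. */
static int dict_is_a(const struct dict *d, int32_t entry, int32_t cls)
{
    for (int64_t i = entry; i != NO_ENTRY; i = d->entries[i].genclass)
        if (i == cls)
            return 1;
    return 0;
}
```

With WEAPON, GUN and REVOLVER added in that order and linked through genclass, dict_find(&d, "REVOLVER") returns the REVOLVER entry and dict_is_a reports that it is both a GUN and a WEAPON. Comparing the 64-bit hashes first means most non-matching entries in a chain are rejected without touching their characters.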
