• partial_accumen@lemmy.world · 3 days ago

    Understanding how LLMs actually work, where each word is a token (or sometimes smaller pieces of a word) and the model outputs a calculated probability for whichever token comes next, this output makes me think the training data heavily included social media or pop culture, specifically around “teen angst”.
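
    Roughly what I mean, as a toy Python sketch with made-up numbers (not a real model or a real vocabulary):

    ```python
    # Toy sketch of next-token prediction with made-up numbers; a real model
    # scores tens of thousands of sub-word tokens, not four whole words.
    next_token_probs = {
        "whatever": 0.41,
        "fine": 0.27,
        "ugh": 0.19,
        "okay": 0.13,
    }

    # Greedy decoding: pick the single most probable next token.
    best_token = max(next_token_probs, key=next_token_probs.get)
    print(best_token)  # -> whatever
    ```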

    I wonder if in-context training would help mask the “edgelord” training data sets.
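
    Something like this is what I have in mind, as a toy sketch with hypothetical wording, so the prompt itself shifts which continuations come out most probable:

    ```python
    # Rough sketch of in-context steering: prepend an instruction so the
    # conditioning context pushes the output away from the "edgelord" register.
    # All strings here are hypothetical, just to show the idea.
    system_instruction = (
        "You are a calm, professional assistant. "
        "Avoid sarcasm, angst, and melodramatic phrasing."
    )
    user_message = "Summarize today's meeting notes."

    # The model only ever sees one long token sequence, so "masking" the style
    # it learned from training data amounts to conditioning on this prefix.
    prompt = f"{system_instruction}\n\nUser: {user_message}\nAssistant:"
    print(prompt)
    ```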