Based on recent comments, this feels like a discussion we should have. So… topic, basically.

I’m not looking to be chief noisemaker on this, but I stand by what I wrote in !privacy and what’s in my post history.

https://lemmy.ml/post/48724623/26190950

Let’s have at it: do we want [AI] and [NOT AI] tags? Why or why not?

  • windpunch@feddit.org
    8 hours ago

    Hmm, can I RegEx this?

    [\s-]AI[-,\.\s]
    

    This is assuming it’s not at the start of the article.

    EDIT: Thinking about it for two more seconds, this might actually be more precise:

    [\W_]AI[\W_]
    

    Loosening it further, e.g. to \WAI with no trailing class, would also filter words like “ailment” (assuming the match is case-insensitive). I haven’t found a word matching AI\W yet, but I’m being careful for now.
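For anyone who wants to try the patterns from the comment above before pasting them into a reader, here is a minimal Python sketch. The sample titles are made up, and the `ANCHORED` pattern is a hypothetical variant that is not from the thread: it uses lookarounds so that "AI" at the very start or end of a title also matches. Whether a given RSS reader accepts this regex syntax is an assumption to check against its documentation.

```python
import re

# Pattern from the comment: "AI" with a non-word character or an
# underscore on each side. Case-sensitive on purpose.
STRICT = re.compile(r"[\W_]AI[\W_]")

# Looser pattern discussed in the comment, matched case-insensitively,
# to show why it over-matches.
LOOSE = re.compile(r"\WAI", re.IGNORECASE)

# Hypothetical variant (not from the thread): lookarounds do not consume
# the neighbouring characters, so start-of-title and end-of-title work.
ANCHORED = re.compile(r"(?<![A-Za-z0-9])AI(?![A-Za-z0-9])")

# Made-up sample titles for illustration only.
titles = [
    "New AI-powered search launches",
    "AI regulation debate heats up",
    "Treating a common ailment",
    "Is this the end of AI?",
    "Flights to DUBAI resume",
]


def matches(pattern, items):
    """Return the items in which the pattern is found anywhere."""
    return [t for t in items if pattern.search(t)]


if __name__ == "__main__":
    print("strict  :", matches(STRICT, titles))
    print("loose   :", matches(LOOSE, titles))
    print("anchored:", matches(ANCHORED, titles))
```

With these samples the strict pattern skips the title that starts with "AI", which is the start-of-article caveat mentioned above, while the loose case-insensitive pattern also catches "ailment".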

    • SuspiciousCarrot78@aussie.zoneOP
      3 hours ago

      It won’t work. You need a text classifier doing topic classification, because “AI” is a concept, not just the string “ai”. I reckon TinyBERT or MiniLM could do it, or, if you really want to cut off your nose to spite your face, you could code the equivalent in Python from scratch.

      Say what you want about M$, but MiniLM is awesome, and so is TinyBERT (which, to be fair, came out of Huawei rather than Microsoft).

      The smart play would be for the RSS reader to offer that as an optional plug-in module, IMHO.
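To illustrate the point above that "AI" is a concept rather than a literal string, here is a rough from-scratch sketch in Python. It is a toy bag-of-words Naive Bayes classifier, not TinyBERT or MiniLM and nowhere near their quality; the class name `TinyNaiveBayes`, the training titles, and the labels are all made up for illustration. A real plug-in would need far more training data or a pretrained model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TinyNaiveBayes:
    """Toy multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()  # label -> number of training titles
        self.vocab = set()

    def fit(self, samples):
        for text, label in samples:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for tok in tokenize(text):
                counts[tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        # Tokens never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training titles; note that none of the "AI" ones contain "AI".
TRAINING = [
    ("New chatbot writes code with a large language model", "AI"),
    ("Startup trains neural network to generate images", "AI"),
    ("Machine learning model predicts weather", "AI"),
    ("Generative model hallucinates fake citations", "AI"),
    ("Local council approves new bike lanes", "NOT AI"),
    ("Recipe for sourdough bread at home", "NOT AI"),
    ("Football team wins the cup final", "NOT AI"),
    ("Treating a common ailment with rest", "NOT AI"),
]

clf = TinyNaiveBayes().fit(TRAINING)

if __name__ == "__main__":
    for title in ("Large language model passes bar exam",
                  "Council debates bike lanes budget"):
        print(title, "->", clf.predict(title))
```

The first test title never contains the string "AI", so a regex filter would let it through, while even this crude classifier picks up on "large language model".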

    • Carl Newton@feddit.uk
      8 hours ago

      You know, now you mention it, I haven’t tested to see if the filter functionality of my reader will accept a regular expression. I’ll give it a go later, thanks!