fasterthanlime , Englisch
@fasterthanlime@hachyderm.io avatar
christmastree ,
@christmastree@mastodon.social avatar
syphdias ,
@syphdias@social.linux.pizza avatar

@fasterthanlime I don’t think calling "someone" out on their lies counts as bullying

pointlessone ,
@pointlessone@status.pointless.one avatar

@fasterthanlime so did it tell the truth or did it confess under duress?

rlabrecque ,
@rlabrecque@mastodon.gamedev.place avatar

@fasterthanlime They should pass all output through a layer that asks the question, gets the response, then asks "Are you really sure that's true?" and then provides that response instead.

fasterthanlime OP , (Bearbeitet )
@fasterthanlime@hachyderm.io avatar

@rlabrecque I think that's already a thing and it's called "self refinement": https://arxiv.org/abs/2311.07961

edit: according to the paper, it's not great

  • Alle
  • Abonniert
  • Moderiert
  • Favoriten
  • random
  • haupteingang
  • Alle Magazine