• jballs@sh.itjust.works
    English
    arrow-up
    7
    ·
    8 months ago

    Lol the part about non-copyrighted text definitely should be read with a wink.

    You can use any text that you want, but please, do not choose something copyrighted. The New York Times is currently suing OpenAI for training ChatGPT on its copyrighted material. Reddit’s data is uniquely valuable, since it’s not subject to those kinds of copyright restrictions, so it would be tragic if users were to decide to intermingle such a robust corpus of high-quality training data with copyrighted text.

  • Dr. Wesker@lemmy.sdf.org
    English
    arrow-up
    6
    ·
    edit-2
    8 months ago

    Does Reddit not persist post and comment revision history? If they do, then as a developer imagining myself in charge of such a feature, I would just use the full post and comment revision history for training, directly from the database.

    This extension probably feels great, but may accomplish very little.

    • kadu@lemmy.world
      English
      arrow-up
      2
      ·
      8 months ago

      They do. After I deleted all my comments, both with automatic tools (that replace the text) and manually, Reddit keeps restoring them. In fact, this might sound a bit like a conspiracy, but I’ve noticed that the comments that do come back are the ones people reach from Google.

      So everything is deleted, then some user searches Google for a solution and my comment is the only result. As soon as they click the post, my comment is back and shows up in my account. It is the original comment, not even the modified version that should’ve replaced it.

      So 100% Reddit keeps everything.

      • SkyezOpen@lemmy.world
        English
        arrow-up
        1
        ·
        8 months ago

        Then the solution is to continue posting on Reddit, poison your own posts with random nonsense words, and hope that does something, I guess.

  • Bob Robertson IX@lemmy.world
    English
    arrow-up
    2
    ·
    8 months ago

    I have mixed feelings about this… Reddit was an incredible source of knowledge and now it feels like the Library of Alexandria is burning down.

    I would much rather see an extension that copies your comments off of Reddit and onto another location… Ideally into an open source LLM.

    • theluddite@lemmy.ml (OP, M)
      English
      arrow-up
      2
      ·
      8 months ago

      Yeah, me too. Here’s how I think about it, though: The French are famously proud of Paris. They love it. The French government also knows that if they push their citizens too hard, they will burn Paris to the ground. This is, surprisingly, very healthy, and has allowed the French to resist the neoliberalization that has swept the rest of the west much more successfully. Meanwhile, Americans would never do such a thing, so we don’t get healthcare, pensions, vacation days, etc. Tech companies are insufficiently afraid of their users. They should know that we’ll burn down the internet should they displease us. We might end up losing a few valuable things in the short term, but in the long term, we’ll have a much healthier relationship.

    • jballs@sh.itjust.works
      English
      arrow-up
      2
      ·
      8 months ago

      Yeah, would be awesome if you could move them to a Lemmy community and then point to that. Then replace it with “I’ve moved to the Fediverse and so should you. To see this comment, follow this link.”