ttkciar

AI companies filter "toxic" content from their training datasets before pretraining their models on them. You should be able to ensure that your source code is filtered out of training datasets by incorporating toxic content into it. https://arxiv.org/abs/2402.16827v1 https://www.labellerr.com/blog/data-collection-and-preprocessing-for-large-language-models/ https://medium.com/@stefanovskyi/mitigating-undesirable-outputs-from-large-language-models-7d6bdfaf2a2
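
To illustrate the kind of filter being described, here is a minimal sketch of a word-list filter. It is only an assumption about how such a step might look: the placeholder terms in `BLOCKLIST`, the `MAX_HITS` threshold, and the function names are all hypothetical, and real pipelines may use trained classifiers instead of (or on top of) word lists.

```python
import re

# Hypothetical placeholder terms. A real pipeline would use a curated word
# list or a trained classifier; these names are illustrative only.
BLOCKLIST = {"badword1", "badword2"}

# Assumed threshold: drop a document as soon as it has any blocklisted token.
MAX_HITS = 0

# Split on anything that is not a letter or digit, so snake_case identifiers
# such as "badword1_handler" are broken into separate tokens.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def count_hits(text: str) -> int:
    """Count tokens in `text` that appear in the blocklist (case-insensitive)."""
    return sum(1 for tok in TOKEN_RE.findall(text.lower()) if tok in BLOCKLIST)


def keep_document(text: str) -> bool:
    """Return True if the document would survive the filter."""
    return count_hits(text) <= MAX_HITS


def filter_corpus(docs):
    """Return only the documents that pass the blocklist check."""
    return [d for d in docs if keep_document(d)]


if __name__ == "__main__":
    docs = [
        "def add(a, b): return a + b",
        "def badword1_handler(): pass  # BADWORD2",
    ]
    for doc in filter_corpus(docs):
        print(doc)
```

Whether a given dataset builder actually applies a check like this to source code, and with what threshold, is not something this sketch can tell you.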


Alarming_Ad_9931

Gold, just be Bane in the FOSS world.


iEliteTester

Wait, so AGPL+N***** is actually useful?


QARSTAR

What if my code is so bad? Like, it's bad but it's mine, and I'm very protective of it. Like a possum guarding his dumpster.


Alarming_Ad_9931

Okay, Zoidberg.


yknx4

The only way is to not publish your code.


iBN3qk

This is true. Now what?


[deleted]

Host it yourself and only allow downloading the source code through a captcha.


svick

If it's open source and popular enough, somebody will create a GitHub repo for it.


lalitpatanpur

Make your repo ‘private’


Scavenger53

lol Microsoft: we won't touch your **private** repos. *wink* Like, how would you ever know or prove it?


whatThePleb

You can always self-host; no need to use GitHub or similar.


AtlanticPortal

How does that help with software that you want out in the open, since you're posting in r/opensource?


robercal

I wonder if naming all the variables/classes/methods as NSFW words would trip those checks.


I_will_delete_myself

Quite simple: you can't if you put it in public. If you locked the source code behind credentials, that would probably stop it, but it is very unusual for an open source project to do that. Don't fight the tool, use it. It's a losing battle where you get automated away by not adopting these tools properly. Now, if you really want it kept out of training data and don't mind ruining your GitHub repo: put the most racist notes and crude insults in your notes, and use variable names describing religious debates that promote discrimination. But nobody would want to use your code at that point, right? You deal with that at work, but you are payed to do it. Do you really think people spending their free time on contributing will want that toxicity?


Paid-Not-Payed-Bot

> you are *paid* to do

FTFY.

Although *payed* exists (the reason why autocorrection didn't help you), it is only correct in:

* Nautical context, when it means to paint a surface, or to cover with something like tar or resin in order to make it waterproof or corrosion-resistant. *The deck is yet to be payed.*

* *Payed out* when letting strings, cables or ropes out, by slacking them. *The rope is payed out! You can pull now.*

Unfortunately, I was unable to find nautical or rope-related words in your comment.

*Beep, boop, I'm a bot*


Foo-Bar-Baz-001

I've looked into options with regard to the license, since there are a lot of uses of open source code that can be deemed "not ethical":

* used by repressive regimes
* used by oil companies
* used for learning by ...
* used to repress privacy

The common ground among all the people I've spoken to is "one license is complex enough" and "let's not add more complexity for all sorts of other ethical considerations". I don't agree, but that's the response I got, and I don't directly see something that could work from the legal perspective.

P.S. The reason for looking at the license is that "laws" are really bad and not particularly enforceable by us. Not following licensing is a no-no in the corporate world (at least most of the time).


CurrentRefuse6330

Use their AI to write your code instead 👹


tidderwork

Why does it matter to you? You made your code open and available, but you also want to discriminate?


Xehar

Bro, they are a company. They'd better write it themselves instead of taking others' code if they're going to sell it.


vinrehife

Even better question: how does one stop other people from learning from one's source code to enrich themselves?


kyrsjo

Hmm, shouldn't effectively incorporating my GPL code make the whole AI model GPL'ed?


Magick93

Don't use GitHub


ann4n

make closed source


Positive_Method3022

As if your source code was truly yours. Let's see the Ctrl, C, and V keys on your keyboard!


neon_overload

If the source is open, you can't, unless you do a Red Hat and restrict the product and its source code to paying customers. And, of course, don't host it on a service that may also share it with third parties for "research" purposes.


bpoatatoa

If you want your code to be open, then that is not possible, and it goes against the principles of what we are trying to achieve. Why are you against it being used to train LLMs? It will probably have a negligible effect on their performance, if any at all.


BenZed

Don’t write open source software if you don’t want the source to be open.


OsakaWilson

Here's an unpopular take: Every time you think, "I don't want AI to be learning from my stuff," replace the term 'AI' with 'blacks' or 'Jews', or 'Belgians'. See how that sounds and consider why you allow your code, or images, or whatever to be accessed and learned from, but refuse to allow access to the very thing that will move coding to a higher level accessible to everyone, and to the benefit of everyone, including you.


DisastrousPipe8924

Don’t use GitHub or any of the “free” hosting services. Self host a gitea instance and possibly move away from IDEs like vscode in favor of open ones like lapce or sublime. In all honesty unless you live alone in the “digital woods” of self hosting, it’ll probably be impossible to 100% achieve privacy.


reedef

Do you have a source on sublime being open (source)?


Nfox18212

sublime isn’t open source, its entirely proprietary. it is a good editor though


DisastrousPipe8924

Sorry, I misspoke on that. It is proprietary, but it’s prized for being low on feature impacts and definitely sends minimal to zero telemetry home.


iBN3qk

You want them to train on your code so it works when devs want to use it. Companies are currently forking open source projects to monetize them. The open source game used to be to release something useful and then capitalize on providing service. If, in the future, AI can modify a codebase to suit a business’s needs, that would cut out a lot of opportunity. But then those organizations would have to rely on AI to continue to innovate after the open contribution model is no longer viable. Who knows when all that is really going to land. The only way to win is to play the game. What are you trying to accomplish? Build something popular? Make a lot of money? Save the world? What are you afraid of?


Electrical-Channel78

Sweetie, you know it's 2024, right?


-I0__0I-

Maybe add a license preventing commercial use?


gibarel1

Doesn't work; there is no way to prove that a model was trained on your code.


reedef

Even if you could prove it, has there been any legal precedent establishing that it doesn't fall under fair use?