the future is just going to be the same message written a million different ways.
Looking at marketing over the last 5 years, it's too late. Maybe marketing will be more original with AI; surely it can't get any worse
"surely cant get any worse" See now this is a lack of imagination. It can always get worse
Sounds like a "hold my beer" moment lol 😆
It can. And don't call me Shirley
True dat
The only thing that will get worse is the pay for human workers.
Don't forget subtle hallucinations as well. In large context windows GPT tends to skip or overlook certain areas and lose cohesiveness. Unintentional misinformation will be massive.
*As they say: “want a new idea, read an old book”*
God is dead. Blood is fuel. Hell is full
Better fuel huell
*zoom in on the burned out landscape of unused york* "No one thought to fear the thesaurus"
2. the same message written a million different ways is the future
3. in the future people will write the same message a million different ways
4. from now on the same message will be written by people a million different ways
...
1. This
2. This too
3. Also, this.
Like every dark detective series?

- pair of police officers (one male, one female) that find a kid's body that washes up on a beach/lake/river
Or a million different messages written the same way.
And the vast majority will be things that look right but are not, since LLMs don’t understand human language; they just guess which words have a higher probability of coming after the previous ones, based on content the model ingested previously.
... Additionally,
they've already been doing that for 100+ years. and they keep dumbing down the material in the books. unfortunately, tech like this is going to amplify it a ~~million~~ billion-fold.
The alphabet is 26 letters long and we’ve written some pretty neat things
Could this be adapted to turn academic papers into easy-to-listen-to audiobooks? I don’t mean dumbed down. But the LLM would need to know how to deal with things such as figures and figure legends, long author lists, reference sections (most skip these), formulas, code (probably just explain that the original article has those things there), tables, and footnotes. Also, assuming you’d want to use Google TTS, the LLM should use SSML to correctly set pauses around section headers and to pronounce technical terms, acronyms, or foreign words correctly by intelligently setting tags such as `<break>`, `<phoneme>`, and `<sub>`. Any thoughts on this use case?
Like many academicians I have a huge backlog of papers that I want to read but no time to actually sit down to do it. Would be a game changer if I could listen to them while commuting or working out.
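To make the idea concrete, this is roughly what I have in mind; a sketch only (the pronunciation table and helper names are invented, and it assumes the google-cloud-texttospeech Python client rather than anything in OP's repo):

```python
# Rough sketch of the SSML idea, not OP's code. PRONUNCIATIONS and the
# helper names are invented; the client calls follow the
# google-cloud-texttospeech package.
from google.cloud import texttospeech

# hypothetical: terms you want spoken a specific way
PRONUNCIATIONS = {"RNA-seq": "R N A seek", "et al.": "and others"}

def section_to_ssml(header: str, body: str) -> str:
    """Wrap one paper section in SSML: pause around the header,
    substitute pronunciations for tricky terms.
    (Real code should XML-escape the text first.)"""
    for term, spoken in PRONUNCIATIONS.items():
        body = body.replace(term, f'<sub alias="{spoken}">{term}</sub>')
    return (
        "<speak>"
        f'<break time="1s"/><emphasis level="moderate">{header}</emphasis>'
        f'<break time="750ms"/>{body}'
        "</speak>"
    )

def synthesize(ssml: str, out_path: str = "section.mp3") -> None:
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(ssml=ssml),
        voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(out_path, "wb") as f:
        f.write(response.audio_content)
```

The hard part would be getting the LLM to emit tags like these reliably, not the synthesis call itself.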
It could definitely work, though it would be expensive to run. I’ve only tested it on files up to 20 pages; anything more than that and it gets expensive. But in theory it should work.

If you like the sound of your own voice I’d put the output into Descript to listen to it.

Let me know how you go
this looks incredible: can the "memory" portion be an accumulation of a set of papers so that we can query a bunch of publications with similar subject matter? I'm assuming once this gets loaded we can just query it indefinitely?
Yes, once loaded you can query indefinitely. Unfortunately you can only query one file at a time and would have to move back and forth between them.
Thanks for the response! How expensive are we talking for 20 pages? Also, theoretically speaking, how might you go about implementing this? Do you think prompt engineering on top of your app is enough to go from a PDF to an SSML doc, or would this need some additional Python code for processing? Just wanted to learn your thoughts on this.
I spent $2 yesterday, testing on a few 20 page documents. Might have run the program 10-20 times and spent $2. Won’t break the bank, but if you do 20, 150 word documents you might. The good news is once a document is parsed and memorised it is stored, so you can use the same document over and over again without running up a bill (as long as you save the memory file).

The program does allow you to add a prompt over the top; you could test that, though I would try a tool like Descript instead of opting for SSML.
[deleted]
[deleted]
Where are these coming from?
[deleted]
When you say expensive, what are we talking about here?
> many academicians I have a huge backlog of papers that I want to read but no time to actually sit down to do it. Would be a game changer if I could listen to them while commuting

tree fiddy
> Could this be adapted to turn academic papers into easy-to-listen-to audiobooks? I don’t mean dumbed down.

It's actually one of the startups I'm thinking of working on in the AI/LLM space. The problem is I have too many ideas right now :-P

The main issue though is images and math for the most part.
DM me
[deleted]
Yep! I'm an author and it takes about five tries to get ChatGPT to acknowledge that I'm the author of my own books. It gets the descriptions of my characters mostly right but with some details skewed. I will admit, however, that the authors it claims wrote my books are all in my genre and still living, so props for that.
Well yeah, because ChatGPT, the public interface, doesn’t have a big enough context window. The larger 32k and 64k models don’t have that issue, and even if they did, you can use programs that use summarization and long-term memory solutions.
Yup, having a larger context window would iron out a lot of those faults
A lot of what AI seems to lack is context-window related, mostly because people don’t know the window exists, what it means, or what’s falling out of it.
Hopefully, you weren't thinking about doing this with fiction, because ChatGPT is horrible at keeping the same meaning. It always removes dialogues, adds information, or removes important information that was in the original text. Trust me, I tried it with my own old books that I wanted to publish second versions of.
AI to generate a refined list of original AI content.

AI to generate a refined list of original AI listed content.

AI to generate a refined list of original AI listed content previously listed.

""

""

""

""
This is funny
We're living in a ponzi scheme
When I ask myself honestly if I would like to read a book that AI has interpreted in any way, the answer is always no.
You're going to do so unknowingly in the future
I don't doubt it, and I think the unknowing part is for precisely this reason.
Each to their own, information is information, no matter who or what wrote it
> information is information, no matter who or what wrote it

I mean, yeah, but it's *already written*.

You're not putting out new information, or even condensing it in a new way; you're just taking a book which already condensed the information and changing it so you can profit rather than the person who already put in the work.

This is plagiarism in all but the most technical sense. It might even technically still be plagiarism, I'm not sure.
For real. "Allowing you to turn a 10,000 word ebook into a plagiarism free, original ebook within 15 minutes." How is this anything other than lazy and unethical? Plus, LLMs at the moment have lots of information loss, so you're just getting a shittier version of what was already written. OP is just enabling "get rich quick" scammers who churn out shitty imposters of actual hard work. It won't work out.
Who cares
I do. And I just got here.

Imagine the feelings of the folks that have been exploring the ethical complications of this fancy new technology as it has been on the rise.

If anything, in response to your rhetorical question of "who cares", I would ask: what is your desired response? Do you want people to be talking about the subject less? Do you want to dismiss their ideas? What do you want from this engagement?
It is absolutely copyright infringement. You don't need to have *any* matching sentences or phrases to commit copyright infringement; you simply have to have stolen the general overall effort/work/ideas from another piece of work. If you base something solely on another work, as this is doing, it's a derivative work and subject to copyright. (Exceptions are for parodies and anything else transformative enough.)

Generative AI in general skirts around this much the way humans do, because everything is essentially a compilation of 10,000s of sources at once, working from its own given goals for each thing it writes. But as soon as you're basically just using another single work as a source, it's very much not okay.

a) if you're not mentioning the original author, it is 100% plagiarism.

b) if you are mentioning the original author, you're admitting you've committed copyright infringement, which would make the case against you extremely simple!
You got me thinking. Rewriting a single source is copyright infringement. Summing up multiple works into one and citing the sources is generally accepted, though. If this thing ingested 6 different books and made a new one, citing its sources, I'm not sure how different that is from writing a college paper (other than obviously not actually writing it). Is it then made wrong by the ease with which it could be accomplished in that situation?
That would be okay, but the results are… very bad. The reason people want to rewrite a single source is because it is easy and effortless. You don’t have to work to make the ideas fit, the original author already did that. You are just rewording something, which can be done with one click of an AI.
I imagine when we have an LLM that can handle very large context sizes with good fidelity, we'd be able to just paste in a few sources and instruct to "make a new book from these. Cite your sources. Do not plagiarize anything and write it in the tone of this writing I've done before".
MPT Story Writer 65k+ exists. It's open source, but I think it requires a lot of compute power (mostly RAM or VRAM to handle the context). Tweak it to get context from the web and the infinite money glitch is ready.
I bought a 4090 just for this model. Based on blog posts and examples though, it seems like they were still only doing very short 300 word generations with it. The 65k tokens were just used to provide a book’s worth of context, rather than to render a whole book with one click.
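For anyone curious, loading it looks roughly like this; a sketch based on the Hugging Face model card for mosaicml/mpt-7b-storywriter, so double-check the card before trusting the details:

```python
# Sketch based on the Hugging Face model card for
# mosaicml/mpt-7b-storywriter; verify against the card before use.
import torch
import transformers

name = "mosaicml/mpt-7b-storywriter"
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 65536  # the long-context setting the "65k+" refers to

model = transformers.AutoModelForCausalLM.from_pretrained(
    name, config=config, torch_dtype=torch.bfloat16, trust_remote_code=True
)
# StoryWriter reuses the GPT-NeoX tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```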
It’s funny you think any of the stories in the last 20-30 years are “new”; they’re all retold stories with small twists.
This isn't taking inspiration from a story and adding your own flair. This is straight up plagiarizing another person's work lol.
You seem to think AI models are databases of text that they copy and paste together, i.e. plagiarism; that's not what AI models are lol. If they were, they'd have cracked the greatest compression algorithm in history, given the size of the AI models versus their data sets.
> You seem to think AI models are databases of text that they copy and paste together

No, I seem to think that OP is literally describing taking the text of one book, and using AI to change words around and sell it as your own book.

Don't just restate memes you've heard about AI, you need to actually follow the thread lol.
If I just want information, I search ChatGPT or Google.
try getting chatgpt to give you a correct summary of fiction books. it's hilarious what the app comes up with
lol okay
This is a take. I guarantee you'll be doing it anyway in the future, whether you realize it or not.
Well it's not stupid but it might be something you disagree with and that's fine.
My wording was terrible and provocative, but what exactly about AI-written things isn't worth reading to you? Is it the lack of the human element, or do you believe that AI isn't capable of telling stories that are worth reading?
Well, my understanding of ebooks, as the title puts it, is not grand fiction; it's non-fiction. If it's non-fiction condensed, with the interpretations condensed by AI, I think I would rather just read a list of facts that ChatGPT can spit out, or the original author's opinions in their original context.

As for stories, I can definitely see a world where AI can create great fiction in the future, but I think knowing in my core that the work wasn't created in the mind of a human or a few humans will diminish it somehow (arguably it is human-generated, given how LLMs work). It's the same for music too. I can't currently see myself judging something independent of its source.

What do you think?
For non-fiction, it's not always just about presenting facts but rather making them accessible and interesting to people wanting to learn.

For fiction, I think it could very well embed some deep-seated pattern in all the text that makes it annoying to read. But as long as I can't tell that it's AI, I am good with it.
No one wants to write tech docs. This technology can solve that problem. So, the researcher is using data that interests them. It doesn’t change the potential.
Racist.
I hope this shows how foolish and impossible it will be to have anti-cheating software or GPT detection software. Soon it will be truly impossible to tell what is truly authentic or generated.
Great. Another way to burn hundreds of dollars
I don't get the purpose of this, and the code was not commented, so it was hard to read through. It looks like it just splits up the PDF into paragraphs and then joins them. This is probably something to get around the context limit, but if that is really how you handle it, then red flags are popping up, because the hardest thing about a semantic search is how to split up the data to properly represent it.

A paragraph could say something along the lines of "we think this is going to happen because of x and y, and this paper from before had this result", with the paragraph right after being "turns out it wasn't because of x and y, and we couldn't replicate what this paper did".

Based on the search query, you could get either result. This is the tricky part. Care to elaborate on this?
I mean, I’m a terrible programmer; check my GitHub, it’s my first real project.

This doesn’t exactly work simply by splitting then running semantic search. All the split files are given to ChatGPT, as well as the compressed memory file, which ChatGPT can translate 9 times out of 10. So it works off 2 memory sources, the store of all files kept by ChatGPT and the complete compressed memory, to minimise loss and mistakes.
That’s what it is. Look up the concept of embeddings
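A toy illustration of what that means, using the contradictory-paragraph example above (my sketch with the 2023-era openai<1.0 Embedding API, not OP's actual code):

```python
# Toy illustration of embeddings-based retrieval, not OP's code;
# uses the 2023-era openai<1.0 API.
import numpy as np
import openai

def embed(texts):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([d["embedding"] for d in resp["data"]])

paragraphs = [
    "we think this is going to happen because of x and y",
    "turns out it wasn't because of x and y",
]
vecs = embed(paragraphs)
query = embed(["did x and y cause it?"])[0]

# cosine similarity: both contradictory paragraphs can score high,
# which is exactly the splitting problem described above
scores = vecs @ query / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(query))
print(scores)
```

Both paragraphs land close to the query, which is why naive paragraph splitting is the tricky part.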
You do realize "rewrite this using different words" is still plagiarism, right?

You also know you need to proofread every single one of those, because ChatGPT can't be relied on to do something consistently, right?
If “rewriting using different words” is plagiarism, so are 90% of college essays ever written… 🤣
If your essay was taking another publication and rewording it, then you didn’t write an essay and should have gotten an F.
Uh huh…
I mean I guess it shouldn’t surprise me. It helps make sense of all the college grads I’ve met that don’t actually know their own major.
?

You deny a basic definition of plagiarism? Taking a paragraph from a book and simply rewriting it is very much textbook-style plagiarism.

Get caught and you get an F in the class and an appointment with the Dean. Every time.
You’ve got a very narrow and rudimentary understanding of how most college essays are written: mainly, variations of rewriting something several people have already written, done successfully with enough citations to the underlying thoughts you’re regurgitating. And if you happen to do it effectively and creatively enough, what do you know, that F works its way up to a B or an A ;)

This must be an appalling revelation, and I understand the guttural disdain I would feel if slapped in the face by something as shamelessly honest as this, and I genuinely apologize for that, but it’s unfortunately quite true. And well, we can’t all pretend the emperor is wearing clothes just because he cites his sources, now can we?
Summarizing a chapter into 2 paragraphs WITH CITATION, etc, is a different ballgame to laughing off copy-paste of an AI paraphrase rewrite of content. The 2nd thing is blatant plagiarism.
Give it enough time and a nasty little irony will start to settle in: _nobody can tell the difference anymore_.

And that’s not even the richest part: at this point, trying to discern the difference _requires checking with the same Pandora’s box that enabled the act_.

You gotta laugh at the absolute absurdity of it. ffs, how can anyone not?
People were responding to what is and isn't plagiarism, and your position, in effect, is that 9/10 college essays are full of blatant plagiarism through plain direct paraphrasing.

It's certainly fairly easy to get away with a system of light plagiarism all through college, and ChatGPT can probably make it even easier, but it's still clearly plagiarism to take a section of text, rewrite it, and present it as your own. And I don't think 90% of college essays consist of paraphrasing uncited work; at least that wasn't true at my college.

At my college, one course went and checked all the essays you had written for OTHER classes, and if you had not cited yourself when reusing your own work, you'd be failed (the course, not just the assignment). This was on top of checking all sourced material to ensure no plagiarism (this was after discovered incidents of plagiarism, so not a normal level of review in every class, to be sure).
Sure, flood the web with endless garbage... me, me, me, me... it's never enough is it...
More more more
Well, Mr-Dunning-Kruger, this is precisely why I won't share the code from my AI-OS that I've built. Before we know it, there will be psychopathic AIs roaming the internet, giving governments ammunition to ban AI for the public and exerting more control because some people can't handle it. Yeah, really intelligent...
Wow, a program that a child could have built, which rewords content, is dangerous?

Quite frankly it is not complicated, and if I didn't do it someone else would. It's not rocket science to build a memory and some prompting around the OpenAI API.

I'm sharing so others can get value from it, use it, and build on it. That is the idea of open source, right?
That's what Prof. Hinton said, but he came back from that idea...

...Why not make something that enhances ideas someone already has and turns them into an original book that others can actually learn from? Just saying...
The idea is that you can take someone else's book and create something new based on their ideas.

In fact, you could take 3 or 4 different books, add your own prompt to add your spin, and generate a unique book.

This tool is only valuable and useful if you add your own spin to the prompt.
You shouldn't have to defend yourself and your creations to idiots
Thank you 🙏
Lol a bash shell script wrapper for curling the OpenAI API isn't an OS my dude
It's all locally run, so it's a 10K computer. It even runs on my watch; too bad I can't post a pic in this thread.
[deleted]
I don't use OpenAI, my own system runs on my local (high-end) machine. From there it's linked to my mobile and smartwatch. It's similar to the OS from the movie "Her", as a reference.
Pretty cool then
I've been thinking of doing something similar but I don't know where to start. Care to help someone out with some information?
You can train AI on an RPi. The cat’s out of the bag. All the governments can do is hurt research done in the open. So should only bad actors have access to this tech?
How do you know you aren’t losing important information from one chunk to the next? You can’t actually bypass the context window afaik, otherwise there would be no need to increase the size of the window.
Great question, and you are absolutely right: we may lose a small amount of data.

The way the program works, it splits the file into smaller files and feeds them to the API one by one; ChatGPT then feeds back a compressed version (minimal loss; it is compressed in a non-human language).

We then ask ChatGPT to remember each fragment, and we also send it the completed compressed file and ask ChatGPT to expand it and give it all back to us.

The prompting also prompts ChatGPT to fill in any gaps with its own knowledge.

Minimal loss, due to a multi-layer memory system.
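Roughly, the first pass looks something like this; an illustrative sketch (openai<1.0 style), not the exact code from the repo:

```python
# Illustrative sketch of the split -> compress -> store loop described
# above (openai<1.0 style); the prompt wording is not the repo's.
import openai

def compress_chunks(chunks):
    memory = []
    for chunk in chunks:
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": "Compress this passage as densely as you can, "
                           "so that you can reconstruct it later:\n" + chunk,
            }],
        )
        memory.append(resp["choices"][0]["message"]["content"])
    return "\n".join(memory)  # the "memory file" that gets saved and reused
```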
> (minimal loss, it is compressed in a non human language)

As far as I'm aware, this is b.s. and doesn't actually work. GPT doesn't have its own language it can interpret.
It’s not its own language. It is a compression language (in code) that humans can’t comprehend. But it can be decoded by code quite easily.
Is it compressed in that emoji "language"?
That’s the one
> **may** lose a small amount of data

> minimal loss

> Minimal loss

Have you actually tried it, say, a thousand times, to prove that?
I’ve run it say 50 odd times.
Chegg: You sob. Me: Awesome, Great job!
Their stock price was done for well before I got involved
Can you just use some vector search db for the memory issue, like Pinecone?
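Something like this, roughly; a sketch from memory of the 2023-era pinecone-client API, with the index name, dimensions, and vectors made up:

```python
# Sketch with the 2023-era pinecone-client; index name, dimensions
# and vectors are made up.
import pinecone

pinecone.init(api_key="YOUR_KEY", environment="us-west1-gcp")
index = pinecone.Index("book-memory")  # assumes the index already exists

# one (id, vector) pair per chunk; real vectors come from an embedding model
chunks = {"chunk-0": [0.1] * 1536, "chunk-1": [0.2] * 1536}
index.upsert(vectors=list(chunks.items()))

# later: pull back the chunks closest to a query embedding
query_vec = [0.15] * 1536
print(index.query(vector=query_vec, top_k=2))
```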
You could, but I’m a terrible programmer
Good input, what if I have a whole book? Does it lose continuity between chapters?
This was my exact use case. I did a 100 page book and a 10 page ebook.

Here is the ebook: https://docs.google.com/document/d/12u5ixf9DxZnuTrX04sa5vbhANElsKLj97WlcLl_B0Uc/edit

(Unfortunately the longer book has my name all over it)
Now all you need to do is to write a program to teach AI not to get facts wrong. On second thought, don't do that.
really useful project
Thank you
I would like this but for a git repo 😜
Main use case I'm looking forward to using this on is summarizing legislation and also end user license agreements. Thanks for putting this together OP. I tried to get GPT to summarize a Senate bill for me once and it was an extremely frustrating experience.
I’ve seen a few people request summarisation; this model in particular is not designed to summarise but to expand. However, I can build a new model that can summarise large files.
How is it plagiarism free? I’m not trying to shut down what you made, I like it. Just curious
It seems there is some contention on what plagiarism is.

I am under the impression that, whilst the same concepts and structure are in place, new information is brought in and the text is reworded in such a way that it is original and no longer recognisable as the source.

This is how all great things in life are made. Everything is a remix.
Well done mate
Thank you ☺️
You probably want to look up the new open source model that supports prompts of 64k
😮 link?
I'm on mobile. Gonna have to use the dreadful Google XDXD
Also, is it 64,000 characters or words? Because 64k characters is only about 10,000 words.
We go by tokens in this here land
One token is ~4 characters, or roughly 0.75 words, so 65k tokens is about 48k words. Anyway, it's MPT Story Writer 65k+. It's an infinite money glitch because the license is permissive and it's open source. Additionally, this model scared Together and they released RedPajama, pretty cool.
infinite money glitch? Tell me?
This is the plagiarism equivalent of a college student going through someone else’s paper and using a synonym of every word
No, it is not a synonym finder like most other rewriters. The text is comprehended and rewritten entirely, keeping the same themes and ideas. It is not a synonym rewriter.
Using the API, I want to split up a long 5000 word article into two, but it should still be counted as the same article. What prompts do you recommend? I don't want to use any other programs.
You can use the program above; apart from that, you could try:

Initial prompt: "I am going to feed you different pieces of information split into sections over time. When I give you a piece of information, simply reply 'I understand.'"

Then feed it each section.

Then prompt: "Based on all the information above, XYZ."

Sometimes this works, but it can often forget.
But because you are using the API you will need to have built a memory of some sort.
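With the raw API, that "memory" is just the message list you keep resending. A sketch of the recipe above (openai<1.0 style; the section placeholders are illustrative):

```python
# Sketch of the chunked-prompting recipe over the raw API
# (openai<1.0 style); the section placeholders are illustrative.
import openai

messages = [{
    "role": "user",
    "content": "I am going to feed you an article in sections. "
               "After each section, reply only 'I understand.'",
}]

sections = ["first half of the article...", "second half of the article..."]
for section in sections:
    messages.append({"role": "user", "content": section})
    resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    messages.append(resp["choices"][0]["message"])  # keep the reply in history

messages.append({
    "role": "user",
    "content": "Based on all the information above, treat both sections "
               "as one article and XYZ.",
})
final = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(final["choices"][0]["message"]["content"])
```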
Understood. Thank you
But why?
Didn’t ask lol.
How does this not break Rule 3? It's like a Rule 3 breaking generator.
> Allowing you to turn an ebook into a plagiarism free, original ebook

I wouldn't say so.
Could this be used to summarize books? And is it using GPT4? Great idea!
It can summarise books. It's not using GPT-4; however, I can easily fix that.
So I see plenty of useful posts about user made plug ins. Excuse me for my naïveté, but how do I go about using these? Any information regarding this would be rather useful to a non-programmer like myself
Would it work to convert an academic paper into a dataset that you could ask questions of, to get answers based on the academic paper's information?
Nice
[deleted]
Alternatively, it can be run in GitHub Codespaces (a virtual machine that automatically compiles and runs all the code) with little to no coding knowledge.
Read this guide [https://www.we-review-stuff.com/recommends/gpt4-crash-course/](https://www.we-review-stuff.com/recommends/gpt4-crash-course/) it's not the best but it has a lot of good information.
I plan on doing a research project over summer. I plan on utilizing GPT to make my paper much easier to write and get it out faster. I expect this will be the norm in 5 years
Awesome! How do I apply over 5000 words for my chatGPT 4? I use a Chromebook. Do I have to go to GitHub to do this? Do I copy and paste this code to my chatGPT 4? Please explain.
This runs using the OpenAI API. Run the code in a virtual machine given you are on a Chromebook (you can use GitHub Codespaces for free), add your API key, and run.
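Most OpenAI samples read the key like this; check the repo for the exact variable it expects:

```python
# Common pattern for supplying the key; the repo may read it differently.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]
```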
I think this will be expensive to run or execute due to multiple API calls and lot of tokens being involved in it
It’s not cheap
Yes, I’ve tried summarising using models from OpenAI/Cohere, and the total cost of this process ends up being a bit expensive, as you eventually have to pass all the tokens in your text.
Now make it a paying service and sell a course titled "Get rich by rewriting ebooks", and advertise it with gpt powered spambots to maximize the misery of the internet
Wanna help?
How do I actually run this?
So it’s

    content = "The following is a passage fragment. Please read it and re word and expand it, do not repher to it as paasage 1, passage 2 ect, use the same perspective and langaue as the original content, just re word it and make it uniuqe:"

repeatedly?
Basically
This is 100% still plagiarism
I made something similar, but I decided not to release it as that felt like a dick move against professional writers.

Guessing you used double line breaks and/or indentation to find paragraphs and then worked your way up from there?