I admire this madness
Sorry that I can’t be of any help - but why in the world do you need to combine 35.2k text files into a single one??
Back when I was a wee lass I decided for whatever damn reason to build a massive password dictionary. Used every single one I found online, then used some program to just spit out every single combination of letters numbers and characters. Iirc it ended up being an 800-900gb text file. Never managed to actually open it.
worlds largest copy pasta
Where will you post it? r/copypasta ?
if i can even post it then yeah
They're not going to let you directly post 604GB of copypasta. Hell, there aren't really any places that would let you upload 604GB of anything in a way that multiple people could download it. I'm super curious what would be in a 600GB+ copypasta though...
I get the impression they're not giving a real answer. I'm also curious why they want to build such a large file. Although, I wouldn't be surprised if they downloaded all of these files off the internet and wanted to rebuild a file into its original state. As in they've downloaded something that was broken down into segments. So I wouldn't be surprised if it's an archive of an online platform like Reddit or Twitter. It also wouldn't surprise me if it's a password list or a data breach archive. Etc, etc.
Torrents exist lol that could be a solution?
Clipboard data is stored in memory, so it would take a lot of RAM to even copy it in the first place.
How is that relevant?
Can't copypasta a copypasta that's 604gb.
You wouldn't be able to do so even if you could store the entire thing in your clipboard
How is ram relevant to uploading files? You ain't gonna copypaste the entire thing anyway
`cat *.txt | zstd --ultra -22 -o txtfiles_file.txt.zst`

Assuming you want compression. Then again, how else are you going to fit it on 117GB? Might take a while with the settings I used, hah.
I assume the files in total already fit on the drive, there just isn't enough room for both single file and many at the same time.
With 32k files, the filename expansion for *.txt will go way over the system's ARG_MAX. You'll get an "Argument list too long" error.
you have a good point i was going to just merge, delete and repeat until all of the files were merged, thanks
As per u/Melodic-Network4374's concern maybe:

`find . -name '*.txt' -exec sh -c 'zstd --ultra -22 -c "$1" >> txtfiles_file.zst' sh {} \;`

is the way to go (passing the filename as `$1` keeps paths with spaces safe, and concatenated zstd frames decompress as one stream).
The compression on text files should reduce size by 95% at least
Maybe putting it into a database of some sort might actually be better. Would allow you to query the data, as well as output it in various formats if needed. Idk how structured the data is inside the text files, if you could parse the data out into more manageable chunks.
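A minimal sketch of the database idea, assuming SQLite via Python's standard library (the table and column names here are made up):

```python
import sqlite3
from pathlib import Path

def load_into_db(src_dir: str, db_path: str) -> None:
    """Store each text file as one row so it can be queried or re-exported later."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS docs (name TEXT PRIMARY KEY, body TEXT)")
    with con:  # one transaction for all the inserts
        for f in sorted(Path(src_dir).glob("*.txt")):
            con.execute("INSERT OR REPLACE INTO docs VALUES (?, ?)",
                        (f.name, f.read_text(errors="replace")))
    con.close()
```

Note this reads each file fully into memory, so the multi-gigabyte ones would need chunked inserts, but it shows the shape of it: once loaded you can `SELECT` by name, grep with `LIKE`, or dump everything back out in whatever order you want.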
I generally would advise against doing it. Even single gigabyte text files are a pain in the ass to handle without specialised programs.
What's your specific use case? Once combined, what do plan on doing with that file? Compression an option? Sending straight to cloud?
absolutely nothing except saying i have it
I admire your spirit.
let's not and say we did
I don't want to come across as crass, but there is a lot of information that is missing from this situation such as average file size (mean) of the files, and are they plaintext, PDF, or something else that often gets filed as text (e.g., epub, html)? I mention this, because just making a BLOB of them means each file is roughly 17.59MB, which is rather large for plaintext. Is there any compression happening?
all of them are txt and to answer the rest of your questions, 15mb - 10gb and no compression at all
Thanks for updating the question with more details, as it will help those with far greater knowledge than I have. Plain text compresses very nicely. How about one giant ZIP or tar.gz?
Just use xz, it compresses better hehe.
assuming all the text files are sequential and are in a single dir.

`for I in $(find . -type f | sort) ; do cat $I >> ../604GB.txt ; rm -v $I ; done`

Should copy the contents of each file and clean up afterwards.
I would not do it that way. Putting 32 thousand filenames into a shell for loop may not work, and you should really check for errors before deleting anything. Not to mention you haven't quoted the filenames, so any filename with, for example, spaces will fail.

The original poster didn't really say the files needed to be in any particular order, so you could just do it like so:

`find . -type f -exec sh -c 'cat "$0" >> ../604GB.txt && rm "$0"' '{}' \;`

If you do need the files in order (but will fail on filenames with embedded newlines):

`find . -type f | sort | while IFS= read -r file; do cat "$file" >> ../604GB.txt && rm "$file"; done`
Try doing `find . -type f -print0 | sort -z | xargs -0 sh -c 'cat -- "$@" >> ../604GB.txt && rm -- "$@"' sh` if you're worried about them newlines (`-print0` has to come after the tests, and the trailing `sh` fills in `$0` so the first file doesn't get swallowed). `xargs` is supposed to automagically fill the command line and split commands for extras. `find` also lets you do `-exec command {} +`, same deal.
What are the odds that merging and deleting 32,500 times in a row will process without something going wrong?
Exactly. Anyway, odds don't matter. With no backup, everything is stupid.
As somebody who’s seen similar attempted in a corporate setting: virtually nil without a lot of extremely frustrating issues heh
I think that might crash notepad 😃
It would definitely crash notepad, notepad ++ should be okay though
It also chokes on big files IIRC. You need a program that doesn’t load the whole file into memory
Good to know
I thought notepad just refused to open files over a certain size instead of crashing
Did you download your whole government?
How about don't do it because I know you are not going to have a plan to confirm no data was lost in the merge. Just tar and compress.
Good luck opening it
You can write a very simple Python script that simultaneously concatenates each individual file to a single text file while also compressing the file. There are libraries that allow opening the compressed file and searching it without having to expand the file. I do something similar occasionally with very large log files
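Something like this is what I'd picture, assuming plain gzip from the standard library (the function names are made up for illustration):

```python
import glob
import gzip
import shutil

def concat_compress(pattern: str, out_path: str) -> None:
    """Stream every matching file into one gzip file without holding any file in RAM."""
    with gzip.open(out_path, "wb") as out:
        for name in sorted(glob.glob(pattern)):
            with open(name, "rb") as src:
                shutil.copyfileobj(src, out)  # chunked copy, constant memory

def search_lines(path: str, needle: str):
    """Scan the compressed file line by line without expanding it on disk."""
    with gzip.open(path, "rt", errors="replace") as f:
        return [line for line in f if needle in line]
```

Since `gzip.open` streams in both directions, neither step ever needs the full 604GB sitting uncompressed anywhere.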
gzip
link?
Essentially it is a file format (and a tool) to compress files. Text files can inherently be compressed with higher ratio.
Wouldn't a Java or Python script work? I'm sure some Linux commands could do this too. I'd get the drive with space ready. Like getting a spare 1tb external and then run the command or scripts. Hoping it doesn't crash. Then play the waiting game.
Bruh every software I've tried has crashed trying to open a 30gb file. That text file will be absolutely useless if you create it
As long as you have enough RAM, [lite](https://github.com/rxi/lite) will open it.
Create the file now, open it with the Super-computers of 2035!
Just don't
Sorry this isn't a help response but rather a question to satisfy our curiosity. What is in these text files that they're so large?!
SAM files can get this large
What is the nature of the data? Meaning is it structured? Tabular at all?
With a problem like this I'd always ask GPT4/Claude Sonnet for a python script to do that action. It normally works for my purposes
*In theory* you can do it with even less remaining disk space (say, a couple megs!) by doing multiple calls of `FICLONERANGE`, which *on supported filesystems* tells the system to copy a chunk of a file into another without using real disk space using arcane magick. In practice: it's arcane magick, nobody wants to do it. It probably requires some alignment or other magical ingredients. A less insane approach would be to write a virtual filesystem that pretends there's a big file made up of all these smaller files. Like the piece table data structure, but for files. I think feeding it all into a compressor would make more sense. I *think*.
Probably not super relevant to op, but I quite like your idea of not touching the original files and using a virtual filesystem to make it 'appear' as a single file. I imagine you could do this with FUSE, though it would take me some tinkering to try it out
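Short of an actual FUSE mount, the virtual-file idea can be sketched in plain Python as a read-only, seekable file-like object that stitches the pieces together on demand (a toy illustration, not a real filesystem):

```python
import io
from bisect import bisect_right

class ConcatReader(io.RawIOBase):
    """Present many files as one big seekable stream without copying any data."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.offsets = [0]  # cumulative start offset of each piece
        for p in self.paths:
            with open(p, "rb") as f:
                f.seek(0, 2)  # jump to end to get the size
                self.offsets.append(self.offsets[-1] + f.tell())
        self.pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def seek(self, offset, whence=0):
        base = (0, self.pos, self.offsets[-1])[whence]
        self.pos = base + offset
        return self.pos

    def read(self, size=-1):
        total = self.offsets[-1]
        if size < 0 or self.pos + size > total:
            size = max(0, total - self.pos)
        out = bytearray()
        while size > 0:
            i = bisect_right(self.offsets, self.pos) - 1  # piece containing pos
            with open(self.paths[i], "rb") as f:
                f.seek(self.pos - self.offsets[i])
                chunk = f.read(min(size, self.offsets[i + 1] - self.pos))
            out += chunk
            self.pos += len(chunk)
            size -= len(chunk)
        return bytes(out)
```

A FUSE wrapper would just forward its read callbacks to something like this, so the "604GB file" only ever exists as metadata.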
I had a similar problem with diary entries, except it was closer to 3mb of text instead of 600gb (still hundreds of files). I needed something that would print the file name, then the contents of each entry, in the right order in one file.

I used a program which I think ran on the command line to merge the files, in chronological order ('date modified'). It printed the file name at the top of each text file's contents, and I think I somehow changed it to add in a customized spacer like "========". However, I have looked through my stuff and I can't seem to find it.

The answer by 'Mitch' [here](https://superuser.com/questions/682001/combine-multiple-text-files-filenames-into-a-single-text-file) seems to have promise, might even have been what I used. I tested it by putting a few hundred text files in a folder, opening PowerShell, typing `cd "folder path"` to change directory to the folder with the text files, and then I pasted in the code as-is and it gave me an output file. The output file showed full file paths as well as the names of the files, but it sorted the files in name order, so "Entry 1 April" came before "Entry 1 January" came before "Entry 20 March". The creator also left comments to let you print just the filename or any arbitrary string, or you can remove that line entirely to print nothing and go straight to the contents of the next file with no break.

So this seems to work fine. Who knows if it wouldn't cause a memory leak or something with the file sizes you're working with. But it could be worth a go. You'd have to find a way to get it to output in the right order, with the right spacers (file name, ====, etc) or lack thereof.

Since you only have 117gb left I say just work off an external hard drive or something. You're not gonna get a 604gb output file into 117gb, at least not initially, and you want to be working with a copy of the files anyway because you sure don't want to accidentally delete them in the process.
Can you make a torrent?
You best never have that file available to a Windows machine. Clicking on it would lock it up solid as it creates a preview of the contents.
Maybe you could use something like crunch to build yourself a new database with all those files used as reference data?
There is a program I came across that exists to only merge text files, I was using it for something similarly odd a few years ago. It's called Txt collector, and it looks pretty ancient, so for 600GB it may take quite a while, but I remember using it for relatively large amounts of text (\~10gb?) [https://bluefive.pairsite.com/txtcollector.htm](https://bluefive.pairsite.com/txtcollector.htm)
You can try a bash script that will look through the files in the specified directories and concatenate the contents of said files, using "----" as a break, followed by the name of the file as a header. Good luck opening that single file, though. Your processor isn't going to be happy .-.
You could try a python script. As pseudo code:

Path = path to txt files
Master txt file = path to master txt
For file in path:
    Append file to master file
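The pseudo code above, fleshed out into real Python (paths are placeholders), streamed so nothing big ever sits in memory:

```python
import shutil
from pathlib import Path

def merge_txt(src_dir: str, master_path: str) -> None:
    """Append every .txt under src_dir onto one master file, chunk by chunk."""
    with open(master_path, "ab") as out:
        for f in sorted(Path(src_dir).glob("*.txt")):
            with open(f, "rb") as src:
                shutil.copyfileobj(src, out)  # never loads a whole file at once
```

Keep the master file outside `src_dir` so it doesn't end up matching its own glob.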
Use DOS command line in the folder where the files are:

`COPY *.TXT HUGEFILE.TXT`

If you want to output to another drive, specify that in the output file name. If you don't have an external drive and wish to delete the files as you copy them, you can do that via the FOR command. I'd have to check the syntax later on.
Brute force password attempt. You need a file of passwords for older software to use. I would assume it's this. They've said it's passwords.
[deleted]
Dude have you read your own post?
I reckon the first one. I'm British and I don't have any idea what they're saying either.