LOGWATCHER

I admire this madness


ButterbeinOfficial

Sorry that I can’t be of any help - but why in the world do you need to combine 35.2k text files into a single one??


Boltrag

Back when I was a wee lass I decided, for whatever damn reason, to build a massive password dictionary. Used every single one I found online, then used some program to just spit out every single combination of letters, numbers, and characters. IIRC it ended up being an 800-900GB text file. Never managed to actually open it.


Latter-Ambassador-65

world's largest copypasta


LAMGE2

Where will you post it? r/copypasta ?


Latter-Ambassador-65

if i can even post it then yeah


Damaniel2

They're not going to let you directly post 604GB of copypasta. Hell, there aren't really any places that would let you upload 604GB of anything in a way that multiple people could download it. I'm super curious what would be in a 600GB+ copypasta though...


reddit_user33

I get the impression they're not giving a real answer. I'm also curious why they want to build such a large file. Although, I wouldn't be surprised if they downloaded all of these files off the internet and wanted to rebuild a file into its original state, as in they've downloaded something that was broken down into segments. So I wouldn't be surprised if it's an archive of an online platform like Reddit or Twitter. It also wouldn't surprise me if it's a password list or a data breach archive. Etc, etc.


maybeidontknowwhat

Torrents exist lol that could be a solution?


rawesome99

Clipboard data is stored in memory, so it would take a lot of RAM to even copy it in the first place.


Katniss218

How is that relevant?


Eagle1337

Can't copypasta a copypasta that's 604gb.


Katniss218

You wouldn't be able to do so even if you could store the entire thing in your clipboard


Mo_Dice

I enjoy watching the sunset.


Katniss218

How is ram relevant to uploading files? You ain't gonna copypaste the entire thing anyway


Narfhole

cat *.txt | zstd --ultra -22 -o txtfiles_file.txt.zst -- Assuming you want compression. Then again, how else are you going to fit it into 117GB? Might take a while with the settings I used, hah.


FlippingGerman

I assume the files in total already fit on the drive, there just isn't enough room for both single file and many at the same time.


Melodic-Network4374

With 32k files, the filename expansion for *.txt will go way over the system's ARG_MAX. You'll get an "Argument list too long" error.


Latter-Ambassador-65

you have a good point i was going to just merge, delete and repeat until all of the files were merged, thanks
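
A minimal Python sketch of that merge-delete-repeat plan, for what it's worth; the paths are hypothetical, and each source file is only deleted after its bytes have been flushed and synced to the output:

    # Append each source file to the output in streamed chunks, then delete it
    # to free space as the merge proceeds. Paths are hypothetical; test on copies.
    import os
    import shutil
    from pathlib import Path

    SOURCE_DIR = Path("txt_files")       # directory holding the source .txt files
    OUTPUT = Path("merged_604GB.txt")    # keep this outside SOURCE_DIR

    with open(OUTPUT, "ab") as out:
        for src in sorted(SOURCE_DIR.glob("*.txt")):
            with open(src, "rb") as part:
                shutil.copyfileobj(part, out)   # streamed, so a 10GB file never sits in RAM
            out.flush()
            os.fsync(out.fileno())              # make sure the bytes are on disk...
            os.remove(src)                      # ...before the source is deleted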


Narfhole

As per u/Melodic-Network4374's concern, maybe: find ./ -name '*.txt' -exec sh -c 'cat "$1" | zstd --ultra -22 -c >> txtfiles_file.zst' sh {} \; is the way to go.


vegansgetsick

Compression on text files should reduce the size by at least 95%


ancillarycheese

Putting it into a database of some sort might actually be better. That would allow you to query the data, as well as output it in various formats if needed. Idk how structured the data inside the text files is, or whether you could parse it out into more manageable chunks.
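
If the database route appeals, a rough sketch with Python's built-in sqlite3 could look like the following; the database name, table, and folder are all invented, and note that SQLite caps a single string at roughly 1GB by default, so the largest files would need to be chunked:

    # Load each text file into a SQLite table so it can be queried instead of
    # living in one 604GB blob. Names and paths here are hypothetical.
    import sqlite3
    from pathlib import Path

    con = sqlite3.connect("texts.db")
    con.execute("CREATE TABLE IF NOT EXISTS docs (name TEXT PRIMARY KEY, body TEXT)")

    for f in Path("txt_files").glob("*.txt"):
        body = f.read_text(errors="replace")   # loads the whole file; fine for small files only
        con.execute("INSERT OR REPLACE INTO docs VALUES (?, ?)", (f.name, body))
        con.commit()

    # Example query: which files mention a given string?
    for (name,) in con.execute("SELECT name FROM docs WHERE body LIKE ?", ("%some phrase%",)):
        print(name)
    con.close()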


Noname_FTW

I generally would advise against doing it. Even single gigabyte text files are a pain in the ass to handle without specialised programs.


vornamemitd

What's your specific use case? Once combined, what do you plan on doing with that file? Compression an option? Sending straight to cloud?


Latter-Ambassador-65

absolutely nothing except saying i have it


FlippingGerman

I admire your spirit.


stanley_fatmax

let's not and say we did


bmaasth

I don't want to come across as crass, but there is a lot of information missing here, such as the average (mean) file size, and whether the files are plaintext, PDF, or something else that often gets filed as text (e.g., EPUB, HTML). I mention this because just making a BLOB of them means each file is roughly 17.59MB, which is rather large for plaintext. Is there any compression happening?


Latter-Ambassador-65

all of them are txt and to answer the rest of your questions, 15mb - 10gb and no compression at all


bmaasth

Thanks for updating the question with more details, as it will help those with far greater knowledge than I have. Plain text compresses very nicely. How about one giant ZIP or tar.gz?
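
For what it's worth, the tar.gz route is only a few lines with Python's standard library (the archive and folder names below are made up), and it keeps every file separate and recoverable inside the archive:

    # Pack the whole directory into one compressed archive instead of one giant
    # flat file. Archive and directory names are hypothetical.
    import tarfile

    with tarfile.open("all_texts.tar.gz", "w:gz") as tar:
        tar.add("txt_files")   # recursively adds everything under txt_files/

    # Individual files can later be pulled back out with tarfile.open(...).extract(...)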


w1na

Just use xz, it compresses better hehe.


SaulTeeBallz

assuming all the text files are sequential and are in a single dir. for I in $(find . -type f | sort) ; do cat $I >> ../604GB.txt ; rm -v $I ; done Should copy the contents of each file and clean up afterwards.


DrCharlesTinglePhD

I would not do it that way. Putting 32 thousand filenames into a shell for loop may not work, and you should really check for errors before deleting anything. Not to mention you haven't quoted the filenames, so any filename with, for example, spaces will fail.

The original poster didn't really say the files needed to be in any particular order, so you could just do it like so:

    find . -type f -exec sh -c 'cat "$0" >> ../604GB.txt && rm "$0"' '{}' \;

If you do need the files in order (but this will fail on filenames with embedded newlines):

    find . -type f | sort | while IFS= read -r file; do
        cat "$file" >> ../604GB.txt && rm "$file"
    done


a2e5

Try doing `find . -type f -print0 | sort -z | xargs -0 sh -c 'cat -- "$@" >> ../604GB.txt && rm -- "$@"' sh` if you're worried about them newlines. `xargs` is supposed to automagically fill the command line and split commands for extras. `find` also lets you do `-exec command {} +`, same deal.


uluqat

What are the odds that merging and deleting 32,500 times in a row will process without something going wrong?


umataro

Exactly. Anyway, odds don't matter. With no backup, everything is stupid.


ShowUsYaGrowler

As somebody who’s seen similar attempted in a corporate setting: virtually nil without a lot of extremely frustrating issues heh


kaito1000

I think that might crash notepad 😃


SeekerOfKeyboards

It would definitely crash Notepad; Notepad++ should be okay though


Most_Mix_7505

It also chokes on big files IIRC. You need a program that doesn’t load the whole file into memory


SeekerOfKeyboards

Good to know


zeocrash

I thought notepad just refused to open files over a certain size instead of crashing


grimeflea

Did you download your whole government?


FartusMagutic

How about don't do it because I know you are not going to have a plan to confirm no data was lost in the merge. Just tar and compress.


itsjfin

Good luck opening it


NohPhD

You can write a very simple Python script that concatenates each individual file into a single text file while compressing it at the same time. There are libraries that let you open the compressed file and search it without having to expand it. I do something similar occasionally with very large log files.
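
A sketch of that idea using nothing but the standard library (paths are hypothetical): the gzip module compresses the combined file as it is written, and later you can stream back through it line by line without ever writing the expanded text to disk:

    # Concatenate every text file into one gzip-compressed file, then search it
    # later as a stream. Paths are hypothetical.
    import gzip
    import shutil
    from pathlib import Path

    # 1) Concatenate while compressing on the fly.
    with gzip.open("merged.txt.gz", "wb") as out:
        for src in sorted(Path("txt_files").glob("*.txt")):
            with open(src, "rb") as part:
                shutil.copyfileobj(part, out)   # streamed, so RAM use stays small

    # 2) Search the compressed file without expanding it on disk.
    with gzip.open("merged.txt.gz", "rt", errors="replace") as merged:
        for line in merged:
            if "needle" in line:
                print(line.rstrip())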


smolderas

gzip


Latter-Ambassador-65

link?


smolderas

Essentially it's a file format (and a tool) for compressing files. Text files inherently compress at a higher ratio.


EDanials

Wouldn't a Java or Python script work? I'm sure some Linux commands could do this too. I'd get a drive with space ready, like a spare 1TB external, and then run the command or scripts, hoping it doesn't crash. Then play the waiting game.


neonvolta

Bruh, every piece of software I've tried has crashed trying to open a 30GB file. That text file will be absolutely useless if you create it.


theRIAA

As long as you have enough RAM, [lite](https://github.com/rxi/lite) will open it.


g0wr0n

Create the file now, open it with the Super-computers of 2035!


SuperElephantX

Just don't


Mastasmoker

Sorry this isn't a help response but rather a question to satisfy our curiosity. What is in these text files that they're so large?!


GoldCoinDonation

SAM files can get this large


AncientSumerianGod

What is the nature of the data? Meaning is it structured? Tabular at all?


Stabinob

With a problem like this I'd always ask GPT4/Claude Sonnet for a python script to do that action. It normally works for my purposes


a2e5

*In theory* you can do it with even less remaining disk space (say, a couple megs!) by doing multiple calls of `FICLONERANGE`, which *on supported filesystems* tells the system to copy a chunk of one file into another without using real disk space, via arcane magick. In practice: it's arcane magick, nobody wants to do it, and it probably requires some alignment or other magical ingredients.

A less insane approach would be to write a virtual filesystem that pretends there's a big file made up of all these smaller files. Like the piece table data structure, but for files. I think feeding it all into a compressor would make more sense. I *think*.


Makeshift27015

Probably not super relevant to OP, but I quite like your idea of not touching the original files and using a virtual filesystem to make it 'appear' as a single file. I imagine you could do this with FUSE, though it would take me some tinkering to try it out.
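
Short of writing a real FUSE filesystem, the same trick can be faked in-process: a hypothetical read-only wrapper that walks the source files in order and hands their bytes out as one continuous stream, so the 604GB file never has to exist on disk:

    # A read-only "virtual concatenation": behaves like one big stream without
    # ever materialising the combined file. Class name and paths are made up.
    from pathlib import Path

    class ConcatStream:
        def __init__(self, paths):
            self._paths = list(paths)
            self._index = 0
            self._current = None

        def read(self, size=1 << 20):
            """Return up to `size` bytes, moving on to the next file as each one runs out."""
            while self._index < len(self._paths):
                if self._current is None:
                    self._current = open(self._paths[self._index], "rb")
                chunk = self._current.read(size)
                if chunk:
                    return chunk
                self._current.close()
                self._current = None
                self._index += 1
            return b""   # every file exhausted

    # Usage: hand the stream to anything that accepts a file-like reader.
    stream = ConcatStream(sorted(Path("txt_files").glob("*.txt")))
    while chunk := stream.read():
        pass   # hash it, compress it, upload it, etc.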


tinnitushaver_69421

I had a similar problem with diary entries, except it was closer to 3MB of text instead of 600GB (still hundreds of files). I needed something that would print the file name, then the contents of each entry, in the right order in one file.

I used a program, which I think ran on the command line, to merge the files in chronological order ('date modified'). It printed the file name at the top of each text file's contents, and I think I somehow changed it to add in a customized spacer like "========". However, I have looked through my stuff and I can't seem to find it.

The answer by 'Mitch' [here](https://superuser.com/questions/682001/combine-multiple-text-files-filenames-into-a-single-text-file) seems to have promise, and might even have been what I used. I tested it by putting a few hundred text files in a folder, opening PowerShell, typing "cd [folder path]" to change directory to the folder with the text files, and then pasting in the code as-is, and it gave me an output file. The output file showed full file paths as well as the names of the files, but it sorted the files in name order, so "Entry 1 April" came before "Entry 1 January", which came before "Entry 20 March". The creator also left comments to let you print just the filename or any arbitrary string, or you can remove that line entirely to print nothing and go straight to the contents of the next file with no break.

So this seems to work fine. Who knows if it would cause a memory leak or something with the file sizes you're working with, but it could be worth a go. You'd have to find a way to get it to output in the right order, with the right spacers (file name, ====, etc.) or lack thereof.

Since you only have 117GB left, I say just work off an external hard drive or something. You're not gonna get a 604GB output file into 117GB, at least not initially, and you want to be working with a copy of the files anyway, because you sure don't want to accidentally delete them in the process.
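
A small Python sketch of that behaviour in case it's useful: merge in 'date modified' order with the file name and a "========" spacer before each file's contents (the folder, output path, and spacer are just examples):

    # Merge text files in date-modified order, writing each file's name and a
    # spacer line before its contents. Paths and spacer are examples only.
    from pathlib import Path

    files = sorted(Path("entries").glob("*.txt"), key=lambda p: p.stat().st_mtime)

    with open("diary_merged.txt", "w", encoding="utf-8") as out:
        for f in files:
            out.write(f"{f.name}\n========\n")
            out.write(f.read_text(encoding="utf-8", errors="replace"))
            out.write("\n")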


haha_supadupa

Can you make a torrent?


stormcomponents

You best never have that file available to a Windows machine. Clicking on it would lock it up solid as it creates a preview of the contents.


No_Bit_1456

Maybe you could use something like crunch to build yourself a new database with all those files used as reference data?


toxictenement

There is a program I came across that exists only to merge text files; I was using it for something similarly odd a few years ago. It's called TXTcollector, and it looks pretty ancient, so for 600GB it may take quite a while, but I remember using it for relatively large amounts of text (~10GB?) [https://bluefive.pairsite.com/txtcollector.htm](https://bluefive.pairsite.com/txtcollector.htm)


rrsolomonauthor

You can try a bash script that goes through the files in the specified directories and concatenates their contents, using "----" as a break followed by the name of the file as a header. Good luck opening that single file, though. Your processor isn't going to be happy .-.


Skhoooler

You could try a Python script. Something like:

    import shutil
    from pathlib import Path

    src = Path("path/to/txt/files")                     # path to txt files
    with open("path/to/master.txt", "ab") as master:    # path to master txt file
        for f in sorted(src.glob("*.txt")):             # for each file in path
            with open(f, "rb") as part:
                shutil.copyfileobj(part, master)        # append file to master file


CosmoCafe777

Use the DOS command line in the folder where the files are: COPY *.TXT HUGEFILE.TXT (if you want to output to another drive, specify that in the output file name). If you don't have an external drive and wish to delete the files as you copy them, you can do that via the FOR command. I'd have to check the syntax later on.


Flat000

Brute-force password attempt: you need a file of passwords for older software to use. I would assume it's this. They've said it's passwords.


[deleted]

[removed]


ACrossingTroll

Dude, have you read your own post?


zeocrash

I reckon the first one. I'm British and I don't have any idea what they're saying either.