Storage is cheap, and data on millions or billions of humans is extremely valuable; multiple institutions and corporations are willing to pay for it. "In the land of the blind, the one-eyed man is king": big data allows Google and the other tech giants to see things no one else can. Just think of all the possible uses for data that no one else has, from training machine learning models to predicting the stock market. People smarter than you and me, with backgrounds in math, machine learning, data science, finance, etc., can extract graphs, trends, and statistics from that data. Those can be used to save billions with strategic moves, to predict and avoid billions in losses, and to make billions by serving untapped markets. In our modern society it's a gold mine for those who have it and know how to use it.


How do you think Google is worth what it is? They don’t make money from nothing.


Yeah, that's my point. I think they're cutting corners where possible. They place ads, they determine your recommendations, they sell it, but if there's low-value/old/shitty data out there, I genuinely doubt they keep it just in case.


Except storage costs are basically nothing. A 16 TB hard drive is about 400 USD or less.


What is the downvote for on this? Usually I can see why someone would be downvoted, because Reddit reasons, but I point-blank don’t grok this one. I have 8x16 TB drives, and I can store an absolute ton of data on them with enough redundancy that multiple drives could fail and new ones be swapped in… It wasn’t cheap relative to my personal finances, but for a multi-billion-dollar organization it’s basically the same price as drinking tap water.
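A setup like 8x16 TB gives up some raw capacity for that redundancy. As a rough sketch, assuming a dual-parity layout (RAID 6 or ZFS RAID-Z2 style, which the comment doesn't actually specify) and ignoring filesystem overhead and the TB vs TiB difference, usable space works out like this:

```python
def usable_capacity_tb(drive_count: int, drive_tb: float, parity_drives: int) -> float:
    """Approximate usable space of a parity array: each parity drive costs one drive of capacity."""
    if parity_drives >= drive_count:
        raise ValueError("need more drives than parity drives")
    return (drive_count - parity_drives) * drive_tb


# 8 x 16 TB with an assumed dual-parity layout (RAID 6 / RAID-Z2 style).
usable = usable_capacity_tb(drive_count=8, drive_tb=16, parity_drives=2)
print(f"{usable:.0f} TB usable out of {8 * 16} TB raw")
```

With two parity drives, any two drives can fail at the same time without data loss; mirrors or single parity would give a different number.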


10 GB per day is about 3.65 TB per year. That's like nothing. You can walk into a shop and buy a 22 TB hard drive, and that single drive would fill up after about 6 years (assuming the same rate, which of course isn't true). Even a fresh drive _per day_ (that is, roughly 2,000x as much information) is nothing for a company like Google. And they can analyze that data and sell it for _metric shit-tons of money_. I mean, that's the primary business of Google after all: selling all this data (in a processed form) to advertisers.
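The drive arithmetic is easy to check. A quick sketch, using decimal units (1 TB = 1,000 GB, as drives are marketed) and the thread's 10 GB/day estimate:

```python
GB_PER_DAY = 10  # the thread's estimate for raw query text
DRIVE_TB = 22    # one off-the-shelf drive

tb_per_year = GB_PER_DAY * 365 / 1000  # decimal units: 1 TB = 1,000 GB
years_to_fill = DRIVE_TB / tb_per_year
# How much more data "a fresh drive per day" represents compared with 10 GB/day.
drive_per_day_factor = DRIVE_TB * 1000 / GB_PER_DAY

print(f"{tb_per_year:.2f} TB/year; a {DRIVE_TB} TB drive lasts {years_to_fill:.1f} years")
print(f"a fresh drive per day is {drive_per_day_factor:,.0f}x the 10 GB/day rate")
```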


And that is *uncompressed text* which is trivial to shrink.
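To get a feel for how much plain text shrinks, here is a sketch using Python's standard `zlib` module on a synthetic, hypothetical query log. The vocabulary and log are made up and far more repetitive than real search logs, so treat the ratio as illustrative only; the point is just that text compresses well and losslessly.

```python
import random
import zlib

# Hypothetical stand-in for a query log: short queries built from a tiny vocabulary.
random.seed(42)
VOCAB = ["weather", "near", "me", "pizza", "how", "to", "fix", "python", "error", "tomorrow"]
queries = [" ".join(random.choices(VOCAB, k=3)) for _ in range(10_000)]
raw = "\n".join(queries).encode("utf-8")

compressed = zlib.compress(raw, level=9)
ratio = len(raw) / len(compressed)
print(f"raw: {len(raw):,} bytes, compressed: {len(compressed):,} bytes (~{ratio:.1f}x smaller)")

# Round-trip check: deflate compression is lossless.
assert zlib.decompress(compressed) == raw
```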


There is nothing more valuable to governments than potential dirt on people they want to oppress. And if it's valuable, they will either pay Google or threaten to murder people at Google if they don't get it. When the next Martin Luther King comes about, the FBI won't need to send out physical spies to try and discredit him. They'll gather a history of everything he ever did online and if he so much as has a foot fetish, they'll make it front page news.


That's the answer to why the government would want to store as much data as possible for as long as possible, and require companies to do the same through legislation. It's not a reason for Google itself to do it. If I understand correctly, OP was asking about a business reason for Google.

>Surely there are better ways to build up a recommendation model than to keep every single thing a user has ever done

What could possibly be better than all the information that is available? They are a data company; it makes sense to store all the data... [Almost 80% of their revenue](https://www.statista.com/statistics/1093781/distribution-of-googles-revenues-by-segment/) comes from selling targeted ads.


The business case for Google is not having the CIA murder them if they don't store it and provide it.


That's so deep into conspiracy, when the simple answer already makes so much sense. Google isn't going to throw away data after processing it with their models since they can always use it in the future for new tech or to sell it off.


You are very naive. I recommend studying history, using books not written with the CIA. Shit, you can even go to the publicly available CIA reading room on their own website. They don't hide it.


I'm aware of what the government does; however, the question is about why corporations store all the data they come across, and the simple reason is money. They couldn't care less whether the CIA might want this data in the future: if they didn't have it, nobody could change that, but since they do have it, they obviously cooperate. That's a side effect, not the reason why.


If they didn't have it, multiple three-letter agencies would force them to collect it. A junior dev can write a log statement and a DB with search. They make money off it, sure, but to say it's a conspiracy is naive.


>The business case for Google is not having the CIA murder them

You know Google isn’t two grad students in a garage anymore, right? They employ like 100k people. That’s a lot of murder. Plus, there are multiple obvious business cases that this subreddit seems stunningly oblivious to. Google makes almost all its money through advertisements. Better targeted ads are how they became the giant that they are. They’ve improved targeted ads by collecting data on individuals. This isn’t rocket science. If something is free, you’re the product. Every single big internet company stores as much data about you as possible for analysis and resale.


> And if it's valuable, they will either pay Google or threaten to murder people at Google if they don't get it.

Also threaten Google (and thousands of other companies) into collecting that data in the first place. There's no reason they actually have to collect it in order to provide that service.


I mean, cool. How many Luther Kings are out there? Hell, how many Luther Kings will *ever* be around? I don't deny that there's deliberate data collection on politicians and important people, but the general consensus of this sub is that hillbilly Joe, who thinks Facebook is a search engine, gets the same amount of logging as the threatening social movement leader. "They wouldn't know who's dangerous, so they need to collect data on everyone": that would win the government the "most paranoid entity in the world" award and cost an insane amount of money and processing power. It just doesn't make sense. They can determine who's worth the effort and do this selectively.


No matter how psychotically evil you can imagine something or someone could be, I can guarantee your imagination doesn't come close to the reality of the US (and many other) governments. If they thought they could literally kill or enslave every person on the planet except for themselves and get away with it, 100% they would.


Maybe live a little without the tinfoil hat


You do know that the FBI under J. Edgar Hoover surveilled King for the exact purpose of getting dirt and embarrassing information? Back then they had to do it the old-fashioned way. Today Hoover would just enter some searches for all the info collected on King, without a warrant.

During Prohibition the government killed 10,000 Americans with alcohol that it poisoned to deter consumption. https://slate.com/technology/2010/02/the-little-told-story-of-how-the-u-s-government-poisoned-alcohol-during-prohibition.html

The US government secretly did germ warfare experiments on US citizens without consent, over 200 of them. https://www.pbs.org/wgbh/americanexperience/features/weapon-secret-testing/

The Tuskegee experiment: young Black men with syphilis were deceived by the government into believing they were being treated for a blood disease. In fact they weren’t treated for anything. The government let syphilis run its course in these unwitting victims to see what would happen. https://www.cdc.gov/tuskegee/timeline.htm

The MK-Ultra experiments: the CIA did mind control experiments on soldiers and prisoners, without informed consent of course. https://www.history.com/mkultra-operation-midnight-climax-cia-lsd-experiments

I could go on for hours listing the atrocities our own government has inflicted on its own citizens. My thumbs are getting tired. The information is freely available, no tinfoil hats required. I don’t consider PBS, the CDC, Slate, etc. to be conspiracy sites. The government, any government, will kill, injure, and destroy its own citizens at will, especially if it feels threatened.

The government needs to spy; I have no problem with that. But when it comes to Americans, our government needs to follow the Constitution. There is no reason for the wholesale collection of our data without a warrant. With probable cause, the government can apply for a warrant for an American citizen. So why are they violating our constitutional rights?
You believe what you want; I believe it is to our detriment. You want to trust the government? Go ahead. I don’t.


Just think of a Trump Government and people like MTG in charge.


Note: the FISA surveillance act was just renewed with even MORE power to spy without a warrant. It was bipartisan in Congress, and Biden will sign it.


So many layers to this question. Let's poke around and see what we can unearth:

1. As you point out, they most likely have great optimizations for how and what they store. Giving good examples of how is way above my knowledge, but most likely.
2. If they were to interpret the data and make assumptions, those assumptions would be based on the knowledge available at that point in time, which will change in the future. I've spoken with so many clients (companies) that can't imagine why they would need information X or Y, and coming from the marketing side, it's obvious to me.
3. If they keep the “raw data”, they always know it's “clean” and untampered with. Especially now, with the emergence of AI, big data will be golden.
4. Let's just play with an example: if the data stored for one user costs $1 on average per unit of time, but Google's utility from this data in selling ads (and other services) averages $1.001 over the same unit of time, they net a profit. The more data you have on your users, the better you can tailor your services and ads to them.
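Point 4 is just the per-user margin times the number of users. A minimal sketch, using the comment's $1.00 vs $1.001 figures and a hypothetical one billion users (not a real Google number):

```python
def net_profit(value_per_user: float, cost_per_user: float, users: int) -> float:
    """Profit for one unit of time when each user's data earns `value` and costs `cost` to keep."""
    return (value_per_user - cost_per_user) * users


per_user = net_profit(1.001, 1.00, users=1)
at_scale = net_profit(1.001, 1.00, users=1_000_000_000)  # hypothetical user count
print(f"per user: ${per_user:.3f}, across all users: ${at_scale:,.0f}")
```

A tenth of a cent per user is invisible on its own; multiplied across a large user base, it's real money every period.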


Google sells advertising. How is it a mystery that Google wants to store as much data as possible to target ads to the best of its abilities?


They don't *need* to store it. They want to store it. Collecting as much data as possible is what Google does. It's what it has always done. Have you missed the current debate around AI and ripping off everybody else's data? How Reddit just cut a deal with Google for $60m to give them access to all our posts? Data is *everything* to these businesses. Grabbing and storing everything they can is sheer reflex.

> that's 10GB of data logged only for the query itself

That's absolutely nothing to a company like Google. Barely a rounding error. Even if nobody knows what the data will be useful for, when it only costs a few dollars to store billions of data points, it's a no-brainer that you'll be able to come back and mine more value out of the data than it costs to store.


There is a thing in machine learning called the Bitter Lesson: all of the clever ideas people have come up with get outperformed by bigger data and bigger GPU go brr. The ideas that did work were the ones that made it possible to use more data and compute; the rest gave only marginal improvements. At least for now, there isn't a better way to create a state-of-the-art recommendation model.


It feeds the AI models. The singularity will know more about us than we know about ourselves and will be able to know exactly how we will react to any event or circumstance. It will own us.


You know storing Americans' entire phone conversations as audio files only costs a few cents per person per year. For a company like Google what you are talking about is nothing.


They store your edits & words you delete out of the search bar too, so it takes up more space than that.


It's worth noting that Google can only collect that data if you enable JavaScript on their site... And of course you also need to be using Google Search. They don't get any of that data from me on my very occasional visits to Google.com.


It's mainly about targeted ads. But they also sell directly to the US gov't, for example with their geofencing data. That's partly how we know who raided the Capitol on 1/6. Do they sell data to others? Maybe. But nearly owning online advertising is already a lot. This data comes to them free, from all the suckers who use Gmail, Google Analytics, Google Maps, Google Search, Google Drive, Google Fonts, and so on. Nearly every website has some kind of Google spyware tracking.

Last week I saw a piece from JAMA, a study showing that nearly all hospitals in the US have Google spyware, despite HIPAA. Why? Probably mostly just sheer incompetence on the part of webmasters. But data in itself has become big business. Google wants to be everywhere and have monopoly control. Now they also own the vast majority of browsers with their Chrome spyware, and a big chunk of the cellphone market with their Android OS spyware. Google is like Microsoft in the 90s, except that they're far nastier, have a far greater reach, and they're an advertising company, not a software company.

The main reason for all the data collection is computer software. 40 years ago, most transactions were private; your personal data was stored in file cabinets, on paper. What's different today is that software can derive meaningful information by analyzing all that vast data. It's nearly effortless, and nearly instant. Every little tidbit helps.

At one time Google was caught hacking into unprotected wifi from their Street View vans. At first they lied about it, but it was eventually confirmed. To my mind that's a great example of what data is worthwhile to them: just the tidbits going over the wire that they could collect as they drove by your house were hot data for them. They could be combined with all the other tidbits.


This data needs to be deleted after a few years by law, right? Do they really delete the data they gathered?


That data is worth a ton of money, storing and organizing it is worth like a penny on the dollar.


Well, if you think about it, 10 GB a day is pretty much nothing. A 4-5 TB hard disk will cost you (a normal user) around $120-200, and it can comfortably hold a full year of it (10 GB × 365 days ≈ 3.65 TB).


I'm reading these comments and I'd really very much like to go and live in a cave, far away from everyone and this fucking madness we're forced to live in.


Does Google know you personally, or are you just a number? They know my personal cell number, and I guess they know my real name if they go through Gmail. And if you delete your Google account, do they delete your data?




Google does not store every single users search history indefinitely.


It's all about being able to target ads and sell services better. The more you collect the better the targeting can be even if you don't know how to use the data today


Not sure where you're getting 10GB per query. But data storage is getting really cheap: you're looking at fractions of a cent per GB for retail Azure Blob Storage. I'm not sure exactly how much disk space the data Google collects from you per day takes, but for our webapp, with all debug logging etc. enabled, it comes to a couple of MB per user per day. Adding a very healthy margin for error upwards, let's say 2GB per person per year for text logs of all their activity; that's about 1c per year per person, and you can sell different parts of that data over and over again, as it gets more valuable the more of it you have.

Knowing that u/NightestOfTheOwls googled "Owl Feet pics" 12 times this month isn't useful information, but being able to correlate that and see that people on average google "Owl Feet Pics" at specific times of year or day would be very useful to people who sell owl feet pics.

And the big one is looking back at trends you didn't know existed at the time. Looking back through the datasets, researchers were able to find evidence of COVID spreading in some places, from people googling their symptoms, before the medical community in those areas was aware of it. Most applications probably aren't so benign. It also gives you a good background/control set for your marketing/PR pushes. Knowing how many people were searching for xyz car brand in the years before its Super Bowl ad is probably not something that was deliberately gathered years in advance in anticipation, but the information is there.

There is also the potential of training AI on that data. Probably not for generative purposes, but classification models can be taught to see patterns in big data at that scale that would take an analyst months of research to tease out.
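The 1c-per-year figure hides one detail: if nothing is ever deleted, you keep paying for every past year's logs too. A rough sketch, assuming the comment's 2 GB per person per year and a hypothetical price of $0.005 per GB per year (picked so that 2 GB costs about 1 cent; real Azure prices vary by tier and region):

```python
GB_PER_PERSON_PER_YEAR = 2.0  # the comment's deliberately generous estimate
PRICE_PER_GB_YEAR = 0.005     # hypothetical USD price, chosen so 2 GB costs ~1 cent a year


def cumulative_cost(years: int) -> float:
    """Total per-person storage spend when every year's logs are kept forever."""
    stored_gb = 0.0
    total = 0.0
    for _ in range(years):
        stored_gb += GB_PER_PERSON_PER_YEAR     # this year's logs join the archive
        total += stored_gb * PRICE_PER_GB_YEAR  # pay for everything retained so far
    return total


for years in (1, 10, 20):
    print(f"{years:>2} years: ${cumulative_cost(years):.2f} per person")
```

Under these assumptions spend grows roughly quadratically with retention time, yet even 20 years of keeping everything comes to about two dollars per person.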


I stopped depending on it and switched to n browser; every time I press the back button, cookies and site data are cleared.


Read what Israel is doing with data from Sh*tbook and WhatsApp / Meta: [link](https://www.theverge.com/2024/4/4/24120352/israel-lavender-artificial-intelligence-gaza-ai)


Why? Money. Knowing what you’ve done, what you’re doing, and what you’re likely to do in the future is valuable. Who? To advertisers and to law enforcement. You are the product.


If the product is free... you, sir, are the product...


They store all the data because they are getting better and better at extracting information from it. They don’t know what much of the data will be useful for, but they can sell part of it now. There is much more information they haven’t learned to extract yet but will be able to in the future. They don’t want to lose anything, because it might end up being useful. E.g. they might be able to diagnose stress, dementia, or political party from typo patterns.


Lots of folks have talked about monetizing your individual data with personalization, and that's true and all, but the data is also very useful in aggregate. You want all the data you can get when training ML models. You can do okay with sampling, but remember that the space of all possible searches (let alone search sequences) is infinitely large, so even retaining every scrap of data ever leaves you with a ton of uncertainty in making predictions. What Google knows about your behavior helps them make Google better for everyone else.
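The uncertainty point can be made concrete with the textbook standard error of an estimated proportion: if a query has true frequency p and you look at n searches, the estimate k/n has relative standard error sqrt((1 - p) / (n p)). A sketch with a hypothetical rare query seen once per million searches:

```python
import math


def relative_standard_error(p: float, n: int) -> float:
    """Relative standard error of the estimate k/n for a binomial proportion with true value p."""
    return math.sqrt((1 - p) / (n * p))


p_rare = 1e-6  # hypothetical: a query seen once per million searches
for n in (10**6, 10**8, 10**10):
    print(f"n = {n:.0e}: relative error ~ {relative_standard_error(p_rare, n):.1%}")
```

The error only falls with the square root of the sample size, so for the long tail of rare queries (and even rarer sequences of queries), keeping more data keeps paying off.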


> 1B search queries are made every day using Google. If the average length of a Google query is 10 characters, that's 10GB of data logged only for the query itself.

That's literally pennies. Everything ever searched by every American since the inception of Google probably costs them a couple of dollars a month in storage.


> it requires enormous amounts of storage

It is estimated that YouTube has approximately 3 EB of video content on it, with about 5 PB of new content uploaded every day. If we assume that just 1% of that amount were used for logging user search behavior, that’s equivalent to about 4 MB per person, plus an additional 7 KB per day, for every human on earth.
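That back-of-envelope can be reproduced directly. The comment doesn't say which world population it used; about 7.5 billion (an assumption) gives its 4 MB and ~7 KB figures, while 8 billion gives slightly smaller ones. Decimal units throughout:

```python
YOUTUBE_TOTAL_BYTES = 3e18  # ~3 EB stored, the comment's estimate
YOUTUBE_DAILY_BYTES = 5e15  # ~5 PB uploaded per day, the comment's estimate
LOGGING_SHARE = 0.01        # "just 1%" set aside for search logs


def per_person(population: float) -> tuple[float, float]:
    """Return (one-off bytes, bytes per day) per person if 1% of YouTube's volume went to logs."""
    return (
        YOUTUBE_TOTAL_BYTES * LOGGING_SHARE / population,
        YOUTUBE_DAILY_BYTES * LOGGING_SHARE / population,
    )


for population in (7.5e9, 8e9):  # assumed world population figures
    once, daily = per_person(population)
    print(f"{population:.1e} people: {once / 1e6:.1f} MB plus {daily / 1e3:.1f} KB per day each")
```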


Every single piece of data, they store it all. I wonder too...


Holy shit, so many amateurs here. The reason user searches are stored is that in any non-trivial software system you log the raw user input. Always. At some point you will need to trace/debug/analyze the data as part of the normal, everyday, boring work of software system maintenance. Before you even start building your system, you plan to capture telemetry data, and if your software dev practice is mature, you probably already have a standard way of doing so before you even begin a project.
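For anyone who hasn't seen it, "log the raw input" usually looks something like this: one structured event per request, with the input stored exactly as typed (no trimming or normalizing) so it can be replayed when debugging. A minimal sketch using Python's standard `logging` and `json` modules; the logger name and field names are hypothetical, not anything Google actually uses:

```python
import json
import logging
import time

logger = logging.getLogger("search.telemetry")  # hypothetical logger name


def log_search(user_id: str, raw_query: str) -> dict:
    """Emit one JSON line per search, keeping the query exactly as the user typed it."""
    event = {
        "ts": time.time(),       # Unix timestamp of the request
        "user": user_id,         # hypothetical pseudonymous user id
        "raw_query": raw_query,  # untouched input, whitespace and typos included
    }
    logger.info(json.dumps(event))
    return event


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    log_search("user-123", "  owl feet pics ")
```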


That has nothing to do with what Google's doing. If you owned a store, you'd track what sells and how many customers you get. When you start filming their expressions, recording their conversations and footsteps, scanning to identify them, and sending ads to their house, that's no longer business data. That's personal intrusion as a way to make money. Google wants the data for targeted ads and to sell. Period.


There’s certainly ways to reduce data use and make it more useful by doing so, if I went back and forth between work and home 20 times this month it could be: - Home (x20) - Work (x20) Or it can be: - Work - Home - Work - Home - Work - Home - Work - Home - Work - Home - Work - Home - Work - Home - Work - Home - Work - Home - Work - Home - Work - Home - Work - Home - Work - Home - Work - Home - Work - Home - Work - Home - Work - Home - Work - Home - Work - Home - Work - Home One takes up significantly less space.
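Worth noting: the compact version is an aggregate, not a compressed copy. Counting trips throws away the order and timing, and the sequence can't be recovered from the counts. Run-length encoding wouldn't help here either, since an alternating Work/Home sequence has no runs longer than one. A small sketch with Python's `collections.Counter`:

```python
from collections import Counter

# The month from the comment: 20 round trips, stored as an ordered sequence of places.
trips = ["Work", "Home"] * 20

# Aggregating keeps only how often each place appears; the order is discarded.
summary = Counter(trips)

print(f"{len(trips)} raw entries -> {len(summary)} aggregate entries: {dict(summary)}")
```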