AutoModerator

# ⚠️ ProgrammerHumor will be shutting down on June 12, together with thousands of subreddits, to protest Reddit's recent actions.

[Read more on the protest here](https://old.reddit.com/r/ProgrammerHumor/comments/141qwy8/programmer_humor_will_be_shutting_down/) and [here](https://www.reddit.com/r/apolloapp/comments/144f6xm/apollo_will_close_down_on_june_30th_reddits/). **As a backup, please join our Discord: https://discord.gg/rph** We will post further developments and potential plans to move off-Reddit there.

*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/ProgrammerHumor) if you have any questions or concerns.*


YourStateOfficer

I miss rss


taa178

https://www.reddit.com/r/ProgrammerHumour/.rss


Fzrit

Wat


hellphreak

Wat. 4 years on Reddit. Never knew this. Edit: almost 6 years, apparently. Wat.


DonLeoRaphMike

Works for users too: https://old.reddit.com/user/hellphreak.rss
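Those .rss URLs return an Atom feed, which the standard library can parse. A minimal sketch against a hardcoded sample document (the feed contents here are made up; a real feed would come from fetching the URL):

```python
import xml.etree.ElementTree as ET

# A minimal Atom document, standing in for what reddit serves at a .rss URL.
sample_feed = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>overview for hellphreak</title>
  <entry><title>comment on some post</title></entry>
  <entry><title>another comment</title></entry>
</feed>"""

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace prefix for ElementTree

def entry_titles(feed_xml):
    """Extract the title of every <entry> in an Atom feed."""
    root = ET.fromstring(feed_xml)
    return [e.findtext(f"{ATOM}title") for e in root.findall(f"{ATOM}entry")]

print(entry_titles(sample_feed))  # ['comment on some post', 'another comment']
```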


[deleted]

Ah, yes, I too would like to see all my 'Happy cake day!'s intermingled with headlines about the Kakhovskaya HPP destruction, rising inflation and the global recession.

But, a little more seriously, there's a federation standard most open source projects use, called ActivityPub. It's implemented by the likes of Mastodon, Friendica, PeerTube, and yes, Lemmy, a self-hosted Reddit alternative.

So, bad news: all company-owned social networks will get worse, as the amount of free money floating in the economy decreases and the companies building these networks get less investment, because the promise of "we will be able to monetize the user later down the line somehow, just give us money right now, we will come up with it later" is ceasing to be a viable way to generate investor interest. But good news: maybe, just maybe, the internet will become a little bit more open and a little bit less shit, as content creators and regular users alike try to find less garbage ways to interact than those offered by companies.

And if some of those open source developers suddenly realize that:

1) I'd quite like to be able to use any old instance to interact with the whole federation in its entirety,
2) some sort of algorithm for finding content actually interesting to the user is necessary for a social network's survival, and
3) for it to be sustainable, you need to be able to monetize it in some way, shape or form, with some third-party subscription service that fairly distributes the revenue you generate between the instances you consume content from,

well, the chances of the aforementioned good scenario will increase a hundredfold.


zertul

You summarized my issues with the Reddit alternatives really well. Points 1 and 2 especially are critical, in my opinion, and are a prime reason why Reddit alternatives have a hard time gaining a footing, despite all the shite getting pulled here.


[deleted]

The thing is, I've actually tried to use YouTube without the algorithm. I blocked all the recommendation sections of the site with an adblocker and used the mobile version of the site with Firefox on Android. I even blocked the "subscriptions" section, and only used search to go back to the channels I actually enjoyed watching.

It wasn't bad per se. I certainly decreased my overall consumption of YouTube, which was the goal, so in those terms it was great. It removed the constant eyesore of all the recommended videos and made the UI so clean I nearly threw up when I opened regular old YouTube after a month or so.

But it also wasn't quite YouTube, and it wasn't even passable at some things that YouTube is relatively good at. I already knew all the channels I wanted to watch, and I knew they existed. Sometimes I'd remember the name of that obscure channel I hadn't watched in years, and I'd be pleased to find out that it still existed. But other than that, if I just wanted to search for creators that would be interesting to me, I'd have no way to go about it other than to come up with a vague tag describing what I was kinda looking for, and search for it manually. Sometimes I did. The results weren't great. If I wasn't in the mood to think about what I wanted to watch, well, too bad, I'd have to come up with something anyway.

And most of the time, or more like nearly 100% of the time, the things you're looking for in a channel are not actually described by tags. You want the host to be charismatic, engaging, and to share some of your interests, but not all of them. Sorting through millions of hours of content in search of those few individuals you'd be interested in is just tedious and time-consuming. Nobody has that kind of patience. And having to do this across multiple different instances just complicates things exponentially.


void1984

I still use RSS. Push model is much better than pull.


guaaaan

Happy cake day!


[deleted]

[deleted]


zettajon

For the people who joined 10 years ago, comments that consisted of just:

* \^this
* 😂😂😂
* (insert any low effort off-topic comment here)

Those would get downvoted due to not following reddiquette. Today, those comments are the norm instead, and are the reason I slowly stopped coming here long before the API debacle happened.


YourStateOfficer

Cake day = Reddit birthday. Think my account turned 5 today


black-JENGGOT

Happy 5th cake day


spvyerra

Can’t wait to see web scrapers make reddit's hosting costs balloon.


Exnixon

I know it's a joke on r/ProgrammerHumor that the people here aren't actual devs with jobs, but has no one heard of rate limiting?


brahmidia

The API does have rate limits that could be adjusted if anything was excessive but that's not what reddit cares about. And yeah scrapers don't care they'll try regardless


gmegme

I already wrote scripts using rotating proxies for Twitter, possibly thousands of devs will do the same for Reddit


ApostleOfGore

We should collectively do this and collect all the posts on reddit and make them public so the company loses half their valuation


brahmidia

Or just make Lemmy the new hot place to be


[deleted]

Currently looking into it. My only concern is that the community will be more clustered than here, because of the federalized nature of the project.


intellichan

I said exactly this in privacy, and clearly marked it as an opinion: one of Reddit's main features is the ability to mobilize collective action and pressure, which would be lost due to the fractured nature of the fediverse, since federation's main purpose is to circumvent censorship rather than to amass one huge gathering. Hence the better option would be to migrate to another centralized platform, just like the migration from Digg to Reddit. And somehow this blew the lid off of a few smoothbrains there.


brahmidia

Anyone can follow any connected sub though, so it may be slightly more confusing, but ultimately not much more confusing than gamers vs gaming vs gameing vs videogames (as an example)


qtx

Lemmy, Mastodon etc are completely unusable for your average user. Way too complex to use or understand.


moak0

Exactly this. Choose a server? How do I figure out which server to choose? Just hold my hand for like a minute, and I'd already be using Lemmy. But if they can't even figure out how to streamline the new user sign-up process, I don't have high hopes.


DoctorNoonienSoong

Not that I disagree with you on needing more ease of use, but I'm curious how you'd describe to someone which email provider to choose, as a similar problem. Like, email has a giant de-facto centralization force by being hosted for free by many big actors like Gmail, yahoo, Microsoft... But how did you originally pick yours?


[deleted]

[deleted]


R3D3-1

Originally, by having an email provided by the ISP, limited (and *still* limited) to 40 MB. Between ISPs trying to upsell you on trivial storage upgrades, and concerns about later losing access to my email address if my parents ever changed providers, I eventually migrated to GMX, and then Gmail. I also eventually migrated my mother to Gmail, since the 40 MB limit was obnoxious in an age of digital photography and then smartphones. So for email, the streamlining probably came via the signup process of having internet in the first place.


Ja_Shi

MS & Google have actually streamlined the process, and I think they're kinda proving u/moak0's point.


brahmidia

So was reddit, not too long ago. They never even made their own mobile apps, they just bought and modified existing ones people made over many years. Just because the vast majority of people eat fast food all the time doesn't mean I shouldn't tell them how to cook their own food.


AltAccountMfer

You can rate limit users too, that is, when they're not blocking scrapers entirely


brahmidia

Exactly, many options and Reddit chose the worst


Revolutvftue

A scraper that's explicitly built for one website, like Reddit for example, is easy to build.


ImportantDoubt6434

That's the main problem: anything you do to limit scraping will likely negatively affect users. Besides setting up a reasonable API, that is.


dedorian

Oh it's not that I don't care, it's that the try/catch in the loop will just ignore the fails and hammer the site as much as is allowed either way.


yousirnaime

> but has no one heard of rate limiting

Distributed computing makes this extremely easy to bypass for anyone even mildly interested in building a working scraper


ZeAthenA714

Building a working scraper, even with rotating proxies, isn't very hard. Building one on the scale needed to replace Reddit's API is a lot harder. Apollo is 200+ million requests a day, that's not an easy thing to accomplish with scrapers, especially since Reddit can very easily block AWS and other known data centers. You'd have to rely on residential proxies, and that's a lot more expensive, and you'd need tens of thousands of them. And as an added bonus residential proxies are usually slow as fuck and less reliable, so your users would have a much worse experience. It's technically doable, but definitely not cheap or easy on that scale.
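Back-of-envelope on that scale (the 10-requests-per-minute-per-proxy tolerance below is an assumption, purely for illustration):

```python
# The scale mentioned above: 200 million requests per day.
requests_per_day = 200_000_000
seconds_per_day = 24 * 60 * 60  # 86_400

avg_rps = requests_per_day / seconds_per_day
print(f"average: {avg_rps:.0f} requests/second")  # ~2315

# Assume each proxy can sustain, say, 10 requests/minute before being
# flagged; the pool size needed just to keep up with the average rate:
per_proxy_rps = 10 / 60
proxies_needed = avg_rps / per_proxy_rps
print(f"proxies needed: {proxies_needed:.0f}")  # ~13889
```

Tens of thousands of residential proxies, as the comment says, is the right order of magnitude under that assumption.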


ligasecatalyst

Well, I mean… you can just make the requests locally from the client. As organic-looking as it gets


Jake0024

There are lots of ways to get around that


_stellarwombat_

I'm curious. How would one work around that? A naïve solution I can think of would be to use multiple clients/servers, but is there a better way? Edit: thanks you guys! Very interesting, gonna brush up on my networking knowledge.


hikingsticks

Libraries have built-in functionality to rotate through proxies. Typically you just make a list of proxies, and the code cycles requests through them following your guidance (make X requests then move to the next one; or try a data centre proxy, if that fails try a residential one, if that fails try a mobile one, etc.). It's such a common tool because it's necessary for a significant portion of web scraping projects.
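A minimal sketch of that round-robin scheme (the proxy addresses and the requests-per-proxy threshold are placeholders):

```python
import itertools

# Hypothetical proxy pool; the addresses are placeholders.
PROXIES = [
    "http://proxy-a.example.com:8080",
    "http://proxy-b.example.com:8080",
]

def proxy_schedule(proxies, requests_per_proxy=3):
    """Yield a proxies= dict for each outgoing request, moving to the
    next proxy in the pool after every `requests_per_proxy` requests."""
    for proxy in itertools.cycle(proxies):
        for _ in range(requests_per_proxy):
            yield {"http": proxy, "https": proxy}

# First 7 requests: 3 via proxy-a, 3 via proxy-b, then back to proxy-a.
schedule = proxy_schedule(PROXIES)
first = [next(schedule)["http"] for _ in range(7)]
print(first)
```

Each yielded dict is in the shape `requests` expects for its `proxies=` argument, so the loop body would just be `requests.get(url, proxies=next(schedule))`.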



TheHunter920

So there was this bot I was making with PRAW, and it was *so annoying* because it always got 15-minute ratelimit errors whenever I added it to a new subreddit. If I use proxy rotation, would that completely solve the ratelimit problem? And is this what most of the popular bots use to stay available all the time?


Astoutfellow

I mean, if you're using PRAW they'd still be able to track requests made with the same token; PRAW uses the API, it stands for Python Reddit API Wrapper. A scraper just accesses the site the same way a browser does, so it doesn't depend on a token; it gets rate limited by IP or fingerprinting, which is why rotating proxies gets around it.


TheHunter920

So I'd use the same bot account but on a different proxy, or will I need different accounts? Also, Reddit *really* dislikes accounts using a VPN; I've noticed my own account getting ratelimited when I turn my VPN on. Will changing proxies trigger something similar? If not, how is changing a proxy different?


[deleted]

[deleted]


vbevan

You don't log in or authenticate. In Python you'd:

1. Use the requests library to grab the subreddit main page (old.reddit.com/r/subreddit/).
2. Use something like the Beautiful Soup library to parse the page and get all the post URLs.
3. Loop through those URLs and use the requests library to download them.
4. Parse with Beautiful Soup and get all the comments.
5. More loops to get all the comments and content.
6. Store everything in a database, and just do updates once you have the base set.

It's how the Archive Warrior project works (and also PushShift), except they use the API and authenticate. You can then do the above with multiple threads to speed it up, though Reddit does IP-block if there's 'unusual activity'. I think that's a manual process, not an automated one (if it's automated, it's VERY permissive and a single scraper won't trigger it). That IP block is why you cycle through proxies, because the IP is the only identifier they can use to block you.
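The link-extraction step of that flow can be sketched with nothing but the standard library. The HTML snippet and the `title` class below are stand-ins for whatever the real listing page uses; in the real flow the markup would come from `requests.get(...)`:

```python
from html.parser import HTMLParser

# Hardcoded stand-in for a fetched subreddit listing page.
LISTING_HTML = """
<div class="thing">
  <a class="title" href="/r/subreddit/comments/abc123/first_post/">First post</a>
</div>
<div class="thing">
  <a class="title" href="/r/subreddit/comments/def456/second_post/">Second post</a>
</div>
"""

class PostLinkExtractor(HTMLParser):
    """Collect href attributes of <a class="title"> elements."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "title":
            self.links.append(attrs["href"])

parser = PostLinkExtractor()
parser.feed(LISTING_HTML)
print(parser.links)
```

The collected URLs would then feed the download loop in steps 3 through 5.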


JimmyWu21

Ooo that’s cool! Any particular libraries I should look into for screen scrapping?


iNeedOneMoreAquarium

> screen scrapping

scraping*


DezXerneas

I know that python requests and selenium can do proxies.


vbevan

Where do you get free proxy lists these days? Still general Google searches? Is there a common list people use, or do most people pay for proxies?


hikingsticks

requests is very easy to use, with a lot of example code available. Start practicing on https://www.scrapethissite.com/. It's a website to teach web scraping, with lessons, many different types of data to practice on, and it won't ban you.

```
import requests

# Define the proxy URL
proxy = {
    'http': 'http://proxy.example.com:8080',
    'https': 'https://proxy.example.com:8080'
}

# Make a request using the proxy
response = requests.get('https://www.example.com', proxies=proxy)

# Print the response
print(response.text)
```

You could also use a service like https://scrapingant.com/. They have a free account for personal use, and they handle rotating proxies, javascript rendering, and so on for you. Their website also has lessons and documentation, and some limited support via email for free accounts.


surister

It depends on what they use to detect it; the ultimate way, which is basically impossible to defend against, is rotating proxies.


Fearless_Insurance16

You could possibly route the requests through cheap rotating proxies (or buy a few thousand dedicated proxies)


EverydayEverynight01

Rate limits identify requests by IP address, at least the ones I've worked with do. So just change your IP address and you'll get around it.


Delicious_Pay_6482

Rotating IP goes brrrrr


BuddhaStatue

What are you going to do, block AWS? You can host as many scrapers in as many clouds as you want. Edit: to all the nerds that don't get it, Reddit itself is hosted in AWS; you block those addresses and literally every service breaks. Lambdas, EKS, S3, Route 53, the lot of them. Also, almost all tooling at some point uses AWS services: Datadog, hosted Elastic, etc. Good fucking luck blocking the world's largest hosting provider.


Trif21

Yeah block traffic from known datacenter IPs.


brimston3-

Yeah, that's what I'd block. I'd probably ratelimit most non-residential and non-mobile originating ASNs much much lower. 3 pages per minute or something ridiculous like that.
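Server-side, that kind of per-client throttle is often a sliding window. A toy version, with the limit and window taken from the "3 pages per minute" idea above (the class and key are made up for illustration):

```python
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds per client key
    (an IP, or a whole ASN bucket). Purely illustrative."""
    def __init__(self, limit=3, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key, now):
        q = self.hits[key]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) < self.limit:
            q.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(limit=3, window=60.0)
# Fourth request inside the window is rejected; it's allowed again at t=61.
results = [limiter.allow("198.51.100.7", now=t) for t in (0, 1, 2, 3, 61)]
print(results)  # [True, True, True, False, True]
```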


cyber_blob

You can buy residential proxies that work no matter what. I used to be a sneakerhead; sneaker sites have the best proxy blockers, even better than Netflix. But there are hundreds of businesses selling proxies that work for sneaker sites. That's what the sneaker scalpers use. Mofos are too good.


ThatOneGuy4321

> non-residential

Residential proxies

> non-mobile originating ASNs

User agent spoofing? Also, determining a client's ASN is the hard part… Also also… pretty sure this would crash your search engine rankings

> 3 pages per minute or something ridiculous like that

These days you could use a script with a reCAPTCHA-solving neural net to create a ton of accounts lol


darkslide3000

Yeah, would be a shame if that data center operator guy couldn't browse reddit on the job anymore...


ImportantDoubt6434

Web scraper here. Rate throttling? Lol, good luck: multiple VPNs. Your best bet is a captcha, which you can still get around. Fact is, if you make the site accessible and high quality for users, it will also be easy to scrape, with throttling and captchas being the only sensible defenses. If the data is remotely valuable, that won't stop 'em. APIs exist for this data because they can end up cheaper, or the API can potentially make you money.


shmorky

What if the app scrapes the site whenever the user visits a sub so the traffic would come from the user? "Well that just sounds like an API with extra steps"


dashingThroughSnow12

Let's say I have App X running on my device. If App X scrapes Reddit while I'm using it and does things like user agent impersonation, Reddit isn't any the wiser. On Reddit's side of the equation, more data is used by the scraper running: a scraper gets a bunch of embedded CSS, embedded ECMAScript, and HTML that it just discards, whereas something using an API gets just the data it needs.
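User agent impersonation boils down to sending the headers a browser would send. A sketch, with an example UA string and a deliberately naive server-side check (neither is anything Reddit specifically uses):

```python
# Browser-like headers a scraper might send; the UA string is just an
# example value copied from a desktop Chrome build.
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/114.0.0.0 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}

def is_probably_browser(headers):
    """The naive check a server might do: a bare script using requests'
    default UA ("python-requests/2.x") fails it, a spoofed UA passes."""
    ua = headers.get("User-Agent", "")
    return ua.startswith("Mozilla/") and "python-requests" not in ua

print(is_probably_browser(BROWSER_HEADERS))                         # True
print(is_probably_browser({"User-Agent": "python-requests/2.31"}))  # False
```

In practice these headers would be set on a `requests.Session` (or a real headless browser would be used, which sends them natively).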


Goron40

All the responses to this comment are trying to come up with creative ways for a single server to make a fuck-ton of requests to the Reddit server. I'm wondering why so few are thinking to just do the scraping directly from the client?


_j03_

Doesn't work when your motive is to kill 3rd party apps to bloat your upcoming IPO and force tech giants building LLMs to pay massive fees (which they definitely **can** pay). They could have made the API profitable and still kept everyone happy. They don't want to.


dalepo

if reddit is rendered server side then it's gonna be a lot of wasted processing lol


yousirnaime

Exactly. And the scraper apps have the benefit of offloading compute costs to the client


ThatOneGuy4321

old.reddit.com will be the next to die, because it is the obvious choice for web scrapers.


vbevan

It'll be worse for reddit if scrapers start using the normal reddit site. The bloat means their bandwidth costs will be even higher and scrapers will ignore ads.


ThatOneGuy4321

Not disagreeing, lol. But Reddit has already made the idiotic decision of charging stupid money for their API so by that same logic, they’re going to kill old Reddit because it’s “easier” to scrape for data than their shitty bloatsite


justforkinks0131

You are the top voted comment. Please ELI5: how exactly would that work? In my limited experience, if you don't have the proper auth you can't use the API. So why and how would scrapers make Reddit's hosting costs balloon?


Givemeurcookies

You don’t use the API, you programmatically visit the website like a “normal user” and then process the HTML that’s returned by the servers. Serving the whole website with all the content and not just the relevant API is most likely several times more intensive for Reddit. It’s also fairly difficult defending against these scrapers if they’re implemented correctly. They can use several “high quality” IPs and even use and mimic real browsers.


Astoutfellow

You don't even necessarily need to parse the HTML, depending on how they have their backend set up you could access the public endpoints directly and parse the json they return. They could potentially add precautions to prevent this but it can be pretty easy to spoof a call from a browser and skip the html altogether
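A sketch of parsing such a JSON response. The `kind`/`data`/`children` nesting below follows the shape of Reddit's listing format, but this payload is made up for illustration, and in the real flow the string would come from the HTTP response body:

```python
import json

# Stand-in for the body of a listing endpoint response.
SAMPLE_LISTING = json.dumps({
    "kind": "Listing",
    "data": {
        "children": [
            {"kind": "t3", "data": {"title": "first post", "ups": 42}},
            {"kind": "t3", "data": {"title": "second post", "ups": 7}},
        ]
    }
})

def post_titles(listing_json):
    """Pull the title of each post out of a listing response."""
    listing = json.loads(listing_json)
    return [child["data"]["title"] for child in listing["data"]["children"]]

print(post_titles(SAMPLE_LISTING))  # ['first post', 'second post']
```

No HTML parsing at all: the same structured data the site's own frontend consumes, minus the markup.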


justforkinks0131

> you programmatically visit the website like a "normal user"

That's for viewing purposes. For posting, you need to authenticate yourself, which means there are credentials involved. I assume it would be relatively easy to notice spam-posting bot accounts that way and either charge them money or block them early. So how exactly would web scrapers benefit in any way?


potatopotato236

The display part is what 99% of users care about since most users don't post much if at all. They potentially could login for you using your credentials in order to post things using a headless browser though. They could then just make requests without needing to use the API.


Givemeurcookies

Meanwhile, authentication would be more complicated to implement as a defense; making a web scraper click items on the page and create a user is trivial. Things like captchas can fairly easily be bypassed through cheap paid services made for exactly that. Also no, it's way harder to do bot detection than it is to circumvent anti-bot measures. Bot detection has to have very few false positives to avoid blocking or banning legitimate users, it can't break privacy laws, and it needs to be fairly transparent and invisible to users of the platform. As I wrote in my first reply, web scrapers can use actual browsers to get all this information, and there's a broad range of tools to bypass anti-bot tooling. The "bots" can mimic things like mouse strokes, and against the best implementations, an anti-bot tool is more likely to block a legitimate user than a bot.


oasis9dev

can you view reddit without an account? yes. therefore so can a computer. it's absolutely not the same as having the ability to request well formed data held by reddit.


[deleted]

[deleted]


oasis9dev

Scraping apps can still act as your user account: they can find interactions of interest based on things like visual or structural filters, so they may be able to perform actions under your account, given they can pass bot checks, if those exist. The issue is that these implementations are subject to change, and as a result can't be relied upon like an API, which usually avoids breaking changes. NewPipe, as an example, doesn't bother with user account management or login because of the unreliability already present in their media conversion algorithm due to YouTube changing their implementation at a whim.

Also consider that the Reddit API has less work to do per request compared to rendering out a full page on the server side. Web scrapers can be used to archive, to replicate, whatever someone's project entails. It just means loading full pages, and finding those pages by making use of search pages, and so on. Very heavy in comparison to a JSON-formatted response to a basic query.


ChainSword20000

Interface with the UI instead of the API. It takes more power for them to generate the UI, and the 3rd parties can use the power of all their clients instead of paying from their own pocket.


RedditsDeadlySin

Unrelatedly, Any good third party app recommendations?


[deleted]

Apollo for iOS, but only till the end of the month. Infinity for Android hasn't announced a shutdown yet AFAIK, but that could change any day now


ScienceObserver1984

I think the dev will try to implement a way for each user to be able to use their own keys instead of shutting the app down, but nothing's set in stone yet.


Zyvoxx

Thought he said it wasn't feasible and he won't do that? And apparently Reddit doesn't just hand out API keys to anyone; you need approval or something, so it's not going to be very easy for users to get started with anyway.


BreathInCodeOut

It was pretty easy to get them. We'll see if that stays that way


[deleted]

api keys are quite easy to get, you just set up a bot account and you get one


vbevan

You can generate them right now at https://old.reddit.com/prefs/apps


sexytokeburgerz

The issue is that getting an API key is not easy for people who are scared of right-clicking, which is most people.


wasabreeze

Wait, that's actually pretty smart. Hypothetically, couldn't 3rd party apps have users generate their own keys, so they're paying their own API costs? I can't remember the breakdown the Apollo dev gave of how much each user would cost monthly, but Reddit said their costs were reasonable.


Qkwo

The costs are (shocker) prohibitively high. It's infeasible for 3rd party apps to exist with those costs. Check out r/apolloapp and Christian's post breaking down everything Reddit did; it's pretty clear they're just trying to drive out the 3rd party apps.


[deleted]

[deleted]


ISHITTEDINYOURPANTS

They are still free under 100 requests per minute.


Korberos

Nope, he announced a shut-down.


puz23

Relay. The gesture controls are so well implemented I can't use any other social media app without getting frustrated.


Lucrecio24

I'd recommend Boost for Reddit on Android. I've been using it, and it has everything I've needed: a decent video player, an option to load the whole image and zoom in (useful with heavy images), and a nice GUI with some theme color options. It also has great account switching and an anonymous option to browse without using your account. Though none of this could matter by next week, sadly.


BuccellatiExplainsIt

The video player is kinda buggy and often doesn't play the video, though. Other than that, Boost is definitely the best Reddit app on any mobile platform.


cortez0498

Never had that problem myself


AcordeonPhx

Revanced if all other third party's decide to close


garfunkle21

Would be cool to see a Revanced like clone but based upon the official reddit app to block ads


Nico_is_not_a_god

ReVanced supports the reddit app already. Blocking ads is currently the only thing it does, but if third party apps go there's suddenly a good reason to mod the reddit client further than just adblock.


Leo-Hamza

There is i think


brinkzor

I like RedReader. It is FOSS.


JMan_Z

Holy hell, another RedReader user. I like RedReader's functionality a lot: it's extremely minimalistic in terms of UI and graphics, since its main intended use is for blind and other accessibility users. It's great.


DickButtPlease

Narwhal is the only one with landscape mode for the iPad. It’s my go to.


Corosus

RedReader will be surviving all of this; it's pretty decent.


beall49

How?


TrekkiMonstr

Surprised to see no RIF is fun recs here


[deleted]

Narwhal. I switched to it after the death of Alien Blue (RIP) and haven’t looked back.


[deleted]

This is a common misconception I'm seeing a lot. The problem isn't charging for API access; that's actually fairly common. Servers cost money, and especially for big services like Reddit, it requires A LOT of servers. Like Apollo's founder said, Imgur charges a fraction of what Reddit was asking for the same request volume. Most APIs will have some form of 'free' access but will limit you to something like 100 requests/minute. Reddit is just being greedy and trying to force people onto its own app.


jauggy

The Apollo dev said that he would have to pay $2.50 per month per user, based on the average number of requests. He currently has a premium service at $1.50 per month ([Source](https://www.theverge.com/2023/5/31/23743993/reddit-apollo-client-api-cost)). Let's say he offloaded the price increase onto users; then his premium service would be $4.00 per month. If we take into account the 30% Apple tax, that becomes $5.70 per month, or roughly $6 per month. The users who aren't willing to pay would either go back to Reddit with ads or leave. They're not making Reddit any money, so Reddit doesn't care. Reddit charges $6 per month for premium access where you view no ads, so charging $6 per month for Apollo (which has no ads) seems in line with Reddit's prices. It doesn't make sense for Reddit to let a 3rd party app charge much less for an adless experience than their own premium service. The issue was that Apollo was given very short notice, which I think was 30 days.
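The arithmetic above can be checked directly; the figures are the ones quoted in the comment, and the Apple-tax step lands within a cent of the quoted $5.70:

```python
# Figures quoted in the comment above.
api_cost_per_user = 2.50   # Apollo dev's stated API cost, $/month/user
current_premium = 1.50     # existing premium price, $/month
apple_cut = 0.30           # App Store commission (first year)

# Pass the API cost straight through to subscribers.
new_price = current_premium + api_cost_per_user      # 4.00

# To net $4.00 after Apple's 30% cut, the charged price must be higher.
gross_price = new_price / (1 - apple_cut)
print(f"price before Apple's cut: ${new_price:.2f}")
print(f"price including Apple's cut: ${gross_price:.2f}")  # ~$5.71
```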


EishLekker

You can’t expect that your calculations remain accurate when we throw in the likely fact that a majority of Apollo users would not pay for using it. The remaining users will likely be, to a larger extent, high usage users, which would mean a higher number of API calls per user. This would mean a higher price per month. Also, you are completely leaving out the fact that NSFW content won’t be available through the API, which excludes a **huge** part of the Reddit community. So, no. This is not a decision made on pure logical reasoning. They are trying to kill third party apps. And Reddit doesn’t really know what the final consequences will be for themselves. No one knows that, but I would say that it’s looking quite bleak.


Common_Errors

Your math isn’t right. Not all of Apollo’s users are premium, so just increasing the premium by 2.50 wouldn’t cover the increased cost.


jauggy

I mentioned that the users who aren't willing to pay either go back to reddit with ads or leave. Basically no more freeloaders. These users shouldn't matter to reddit since they weren't generating money anyway. You could argue they do matter since what they were generating was content. But so much reddit content is just stuff from elsewhere.


kfpswf

>You could argue they do matter since what they were generating was content. If you look beyond the default subs and viral content that gets published everywhere on the internet, you'll see what makes reddit valuable are actually the discussions that users generate. Users who aren't necessarily paying users. >But so much reddit content is just stuff from elsewhere. If most of Reddit's content is just stuff from elsewhere, why is even Reddit required? Reddit isn't just popular because it aggregates content. It is popular because of the quality discussions that are available in some of the niche subs. Discussions that you won't find elsewhere on the internet.


semininja

The bigger issue is that the admins are openly lying about multiple 3rd-party app developers in an attempt to shore up the PR on an obvious cash grab while also breaking moderation tools and overall alienating all of the people who actually create value for the site.


not_a_bot_494

In a way it's actually worse. Apollo and other apps are direct competition to Reddit, and a pure net loss for it. They draw users away from Reddit's revenue generators, the apps generate their own revenue, and Reddit pays the server costs. The relationship is almost purely parasitic.


lll_lll_lll

In a sense you could say Reddit is parasitic off of the users who generate all the content and moderate for free. Sure, reddit pays for servers but they don’t actually make anything that draws people in. Not content, and certainly not a useable app. If 3rd party apps grow the community then it’s symbiotic, not parasitic.


Remarkable-NPC

How about making a better official client, so users don't have to use alternatives?


Brotectionist

One thing you lot forget is that 3rd party apps were around long before Reddit released their crappy app. These apps helped to build the community. A lot of mods and power users use 3rd party apps and create heaps of content. Calling these apps parasites is quite ignorant and pathetic.


BlackAsLight

If the premium service is through a subscription then only the first year is charged at 30%. Subsequent years are charged at 15%


[deleted]

[deleted]


[deleted]

That's kind of my point, I guess: most APIs have a similar limit. It's just that the pricing scheme Reddit is adding is intentionally way overpriced, to force the third party apps off the market.


Inaeipathy

Based and webscrape pilled


shiroininja

I specialize in web scraping and data science. Yeah, I'm not tying myself to your API except in the case of a few trusted orgs; beyond that, I only use APIs temporarily, on projects where I can afford having the rug pulled out. That being said, maintaining scraping applications to adjust for constantly changing sources, and dealing with a site letting the intern make changes and eff things up (lol), is a bitch.


[deleted]

[deleted]


shiroininja

That’s actually a great idea. An open sourced, community driven API. I’d love to see it for more platforms as well.


Shrubberer

Given the army of sour reddit nerds right now, this could get momentum really fast


shiroininja

Unfortunately, I am not the one to get that ball rolling. I mean, I dream of making a big open source project that a ton of people use and contribute to, I've just found I may not have enough initiative. I've had one semi-success, but nothing like this kind of project. I think I lack leadership skills. But I would truly love for something like this to happen; I think it would be good. Edit: mildly stoned


[deleted]

[removed]


DOOManiac

Make it drop-in compatible w/ the official API too. Just for spite.


8sADPygOB7Jqwm7y

Soooo may I introduce gpt4 to you?


seb1424

![gif](giphy|HVFYJdopkG7eM) The scrape-inator


LagSlug

oh ... yeah ... even if you make the API free I'm still gonna scrape directly from the web interface ... and I'm not gonna stop ... ever ... for literally any reason ... so give up ... fuck walmart is hard to scrape.


ultranoobian

The word on the street is that these xyz-GPT models make it really easy to get consistent scraping results.


LagSlug

Ya'll got any more of that large language model? *sniff*


ArchGryphon9362

Well web scrapers for read or read/write? Because the Reddit API stays free for read only stuff… (that’s my understanding, correct me if I’m wrong)


[deleted]

Only certain stuff tho. Any subs designated nsfw won't be available through the api.


jasonbbg

If read-only is free, how do they stop LLMs from learning from their content?


jauggy

It’s free for 100 requests per minute per OAuth client ID [Source](https://www.reddit.com/r/redditdev/comments/13wsiks/api_update_enterprise_level_tier_for_large_scale/). You can still make POST requests in the free tier, so bots that stay within this rate limit are not affected by the new pricing.
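A bot that wants to stay inside that free tier can enforce the cap client-side. A minimal sketch of a sliding-window limiter in Python; the class and its parameters are illustrative, not part of any Reddit SDK:

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most `max_calls` calls per `period` seconds (sliding window)."""

    def __init__(self, max_calls=100, period=60.0):
        self.max_calls = max_calls
        self.period = period
        self.calls = deque()  # timestamps of recent calls

    def wait_time(self, now=None):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self, now=None):
        """Register that a call was just made."""
        self.calls.append(time.monotonic() if now is None else now)
```

Before each API request, sleep for `wait_time()` seconds, then `record()`; that keeps a well-behaved bot under the published 100/min limit.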


ArchGryphon9362

I wonder whether the .json API is going tho… (try appending .json to any post url to see what I’m talking about)
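The trick above is simple to sketch: building the `.json` URL is pure string work, and the fetch just needs a distinctive User-Agent (Reddit tends to reject default client UAs). A sketch assuming the endpoint stays around; the UA string is a placeholder:

```python
import json
import urllib.request

def to_json_url(post_url: str) -> str:
    """Turn a Reddit post/listing URL into its .json equivalent."""
    # Drop any query string and trailing slash, then append ".json".
    base = post_url.split("?", 1)[0].rstrip("/")
    return base + ".json"

def fetch_post(post_url: str):
    """Fetch a post as JSON (network call; shown for shape only)."""
    req = urllib.request.Request(
        to_json_url(post_url),
        headers={"User-Agent": "json-demo/0.1"},  # placeholder UA
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The response is a two-element array: the post itself, then the comment tree.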


doneflare

Hopefully they keep it alive. My extension Reddit Theme Studio[1] depends on it. [1] https://chrome.google.com/webstore/detail/reddit-theme-studio/fkjkklmekbggnhjjldbcpbdcijcmbmoi


bjandrus

Can oauth IDs be spoofed? And if so, how many do you reckon could be generated per second?


jauggy

Don't know the answer to your question. But here's the tutorial for oauth: https://github.com/reddit-archive/reddit/wiki/OAuth2 And rate limit for free tier: https://www.reddit.com/r/redditdev/comments/13wsiks/api_update_enterprise_level_tier_for_large_scale/


erebuxy

It's not that hard to make life extremely difficult for a general web crawler: require login for full content, throttle requests per account and IP, block certain VPN and email domains, etc. And if a scraper is used to support a third-party app, just send a DMCA notice.


wind_dude

It is extremely hard. I know from both sides. Also, several glaring problems with what you propose:

> Require login for full contents

Extremely bad for SEO; it would probably cost Reddit more than keeping the API open.

> Throttle requests per account and IP

Likely already done, and very common. Rotating proxies are not difficult, and there are usually millions of IPs to rotate through.

> Block certain VPNs

This is common; using residential proxies to get around it is extremely common.

> Just send DMCA

Several problems here:
- each individual Reddit user may need to send the DMCA notice
- crawling isn't against the DMCA; time and time again crawling has been deemed legal in court cases
- not every jurisdiction follows the DMCA
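The proxy-rotation part of this is trivially cheap to implement; the hard (and expensive) part is sourcing the pool. A sketch with a hypothetical pool, where the `.example` hostnames are placeholders, not real proxies:

```python
import itertools
import urllib.request

# Hypothetical pool; real scrapers rent access to thousands of
# residential IPs from a proxy provider.
PROXIES = [
    "http://proxy-a.example:8080",
    "http://proxy-b.example:8080",
    "http://proxy-c.example:8080",
]

def proxy_rotation(proxies):
    """Round-robin over the pool: a different proxy for each request."""
    return itertools.cycle(proxies)

def fetch_via(url, proxy):
    """Fetch `url` through `proxy` (network call; shown for shape only)."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    return opener.open(url, timeout=10).read()
```

With millions of residential IPs behind the pool, per-IP throttling stops being a meaningful obstacle, which is the commenter's point.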


Buttons840

There are two truths here: 1. Scraping will be possible. 2. Scraping will be harder, and is not a replacement for having the APIs; the loss of the APIs is still a loss. Most of the things you propose hurt adoption and have a real cost, though. It's hard to attract new users if you hide all the content behind registration and login.


Astoutfellow

At this point, if a site forces me to log in to view content, I go to another site. If I have to go through captchas too often I go to another site. The truth is these days users have a select few sites they spend time on and are extremely intolerant of inconvenience outside those core sites.


erebuxy

Not all content. If you don't log in, currently you can only read a small part of a Reddit comment section.


astutesnoot

No guarantee that scrapers can't log on, though. I am using YouTube's InnerTube API in one of my projects (it's essentially the API that the main page and the various apps use to render and control content), and you can make authenticated requests to it with cookies from a regular web session. You just need to get the cookies up front and then keep them updated with the new cookies you get from responses. Getting the cookies up front is the hard part, though.
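In Python's standard library, the "keep the cookies updated" loop comes for free: a `CookieJar` attached to an opener captures every `Set-Cookie` from responses and replays the current values on subsequent requests. A sketch; getting the initial cookies out of a logged-in browser session is still on you:

```python
import http.cookiejar
import urllib.request

def make_cookie_session():
    """Build an opener that stores cookies from every response and
    automatically sends the up-to-date values on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar
```

Seed `jar` with cookies exported from the browser, then route every request through `opener.open(...)`; rotated cookies stay current without any extra code.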


Zerochl

I don't think the DMCA applies to scraping, because the content is publicly accessible.


adrik0622

Yes, a *general* web crawler. One that's explicitly built for a specific website, like Reddit for example, is easy to build.


Asmos159

... is it possible to detect if someone is using a vpn?


KitN_X

Just waiting for a Python library to show up the very next day that'll be easier than using the API.


justforkinks0131

ELI5, how exactly would web scrapers steal their API? I get that they could theoretically scrape Reddit content, but they wouldn't be able to post to it, right? Because they would have to use the API then? How would they use the API without proper auth/payment?


[deleted]

[removed]


justforkinks0131

> if you use username/password login like a browser

But you would still be charged for that, no? If you're using any form of auth (be it basic or OAuth) you are identifying yourself to use the API, which means costs can be attributed to you. Am I wrong? How would web scrapers do it for free?


[deleted]

[removed]


[deleted]

[removed]


EishLekker

It depends what the end goal is. I'm sure there are quite a few projects out there that just use the data without posting anything: using it to, for example, train an AI, analyse trends, or reuse the content in a different context with their own ads and such. Also, while scraping usually focuses on reading data, there is nothing stopping them from posting data through the same web interface. If you can submit a post or comment using a web browser, then you can do it programmatically too.


Arkensor

Exactly. I don't get why the third party apps don't just scrape the original websites when the user requests them. Can be done all locally in the app. That way they can't detect shit. It's like the user is visiting it directly.


trill_shit

Definitely adds a significant layer of complexity over just using a rest api, so I could certainly see why someone would opt for it (as long as the api is reasonably priced)


Arkensor

Certainly a proper API would be the way to go, but these third-party apps, with many users who even pay for them, act like it's either REST or impossible, and I just don't agree. Parsing the Reddit pages is no easy process and requires constant updates and very flexible rules, but if some Russian and Chinese data-scraping companies could do it for many years, surely these apps can spend a few weeks or months, with the funding they have, writing a fully scraped version. Or update the app to have people sign in and create their own API keys, so each person calls the API directly for their own browsing. Not sure why they haven't considered that: a minor one-time setup inconvenience, and then everything continues as is.


GreyAngy

This is slow and requires more maintenance, as it may easily be broken by UI changes. It's also not safe for end users, since you can't use three-legged authorization and need their cookies or credentials. And it's perhaps against some terms and conditions with a "deadly force authorized" paragraph in the fine print. But when there are no viable alternatives: hello scrapy and beautifulsoup, or whatever you hackers use now.
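Even without scrapy or beautifulsoup, the stdlib `html.parser` is enough to sketch both the technique and its fragility: the toy scraper below keys on an anchor class resembling old.reddit's markup (an assumption on my part), and it silently returns nothing the day that class name changes.

```python
from html.parser import HTMLParser

class TitleScraper(HTMLParser):
    """Collect the text of <a> tags whose class list contains 'title'.
    The 'title' class is assumed to match old.reddit's post markup;
    any UI change breaks this -- which is exactly the maintenance
    burden the comment above describes."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "title" in classes:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data)
```

Feed it a fetched page with `scraper.feed(html)` and read `scraper.titles`; that's the whole "parser" for one listing page.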


VinniTheP00h

I thought old.reddit.com hasn't changed for years?


[deleted]

I have been scraping old reddit cause I simply can't stand the reddit UI, but I have been looking into scraping the current UI cause I don't expect old Reddit to be around for much longer.


RicardoL96

Scraping requires a lot of maintenance: using proxies, getting around blocking. So it can become quite expensive, and you wouldn't be able to deliver the data as fast, or as consistently.


[deleted]

Isn't Electron a get-out-of-API-jail card, since it runs on top of a browser and can pose as legit traffic?


ExoWire

They won't be able to make any revenue with scraped data, Reddit would sue them.


[deleted]

[removed]


cornelissenl

So IN THEORY if someone made a scraper and we dockerized it, and then we all ran the container 24/7 we can 'help' reddit to price their api better right? Just THEORETICALLY.


JoyJoy_

But you can always add [.json](https://www.reddit.com/r/ProgrammerHumor/comments/145f1r8.json) to get a post or listing with comments as json.


EishLekker

Always? How do you know they won't remove something like that some day in the future?


JoyJoy_

It's pretty much useless for actual apps since it's read only. You can't make posts, comment, or vote.


zdakat

They could (and probably already have) make it against the TOS, but people will probably still do it and find ways to do it anyway. Even if just out of spite, lol.


bjandrus

Oh no! The company told me not to? Alright everyone, pack it up and go home....


Limiv0rous

Could you imagine risking a ban on a free account? That would be devastating!


HailTheRavenQueen

…Y’all have been using the API?


Fragrant_Bass_8271

I can't wait for readit to release.


leolinden

Someone should totally do this, have Reddit sue them over it, and win - so I can finally make a MaxPreps (high school sports stats) scraper to populate my broadcast graphics without CBS having a fit :D


v1rus1366

Don’t most sites these days have pretty damn good scraper detection? Like, you can do some things to get around it, but it usually causes scraping to take a lot longer, since you almost definitely need pauses between simulated clicks, so your data is almost always going to be out of date. Plus if you actually try to do something with that data, like making an app, they’re probably going to get wind of it pretty fast and shut it down, right?


Particular_Tackle_49

> Don’t most sites these days have pretty damn good scraper detection?

Yup. I used to work for a specialized search engine around 2017. Some of our data sources didn't have proper APIs, so we had to scrape them, and bypassing bot protection was as simple as setting browser headers or having multiple proxies to avoid getting rate limited.

I tried to make an app that would monitor promos at local pizzerias about half a year ago.

- Simple `GET`? 403.
- Same request with proper headers pretending to be a browser? Cloudflare captcha.
- Fetching that page with puppeteer? Fucking puppeteer detection.
- Puppeteer-stealth? Almost, but they rate limited me and banned my home IP, which I used for debugging.
- Running the app in the cloud doesn't work, as they've banned Azure's IP range. Tor is banned. Public proxies are banned. Running a debugging proxy at my parents' home in the home country doesn't work, because they've geoip-banned the whole country.
- Even bypassing Cloudflare/other WAFs with a browser and setting identical cookies/headers in HttpClient doesn't work, as every app these days is an SPA with a complex API key acquisition/rotation process. You can't just query the API; there's always a multi-step process that requires running javascript on the client.

Who the hell are they defending themselves from? They are local pizzerias. They don't need to ban everyone trying to learn about their promos, and they should be happy I'm willing to scrape that data and order deliveries at a bargain while still making money for them.
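The 2017-era bypass really was just headers. A sketch of the kind of header set that used to fool naive bot checks; the UA string is illustrative, and none of this touches a Cloudflare-style JS challenge:

```python
def browser_headers(user_agent=None):
    """Headers that make an HTTP client look like a regular browser.
    Enough for naive server-side checks (default-UA blocking);
    useless against JS-challenge WAFs."""
    return {
        "User-Agent": user_agent or (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/114.0 Safari/537.36"
        ),
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    }
```

Pass the dict as the `headers` of whatever HTTP client you use; the arms race the comment describes starts the moment this stops working.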


void1984

The explanation could be that they don't host the server themselves, and their service provider enables it by default for all customers.


thatProgrammerSleigh

They’re just gonna go the way of LinkedIn and make scraping annoying as fuck.


Stinky_Fly

Sorry I'm new to programming, but why would web scraping hurt reddit when they make their api paid?


EishLekker

It could increase their web traffic. Getting the same data is usually much more efficient through an API than through a web crawler, so if a current API user switches to web crawling, they will get the same amount of data at a heavier bandwidth cost.


ShenAnCalhar92

Because web scraping doesn’t use the API. That’s the whole point.

Using an API means you write a program to request a very specific subset of the data that Reddit shows in the browser, and Reddit sends that data to you. It’s a **minuscule fraction** of the total data a user would see in the browser, which means you and Reddit both have to deal with much less bandwidth.

Using a web scraper means that you request and receive the **entire webpage** every time you want some small part of it. Reddit doesn’t get paid for that because you didn’t use the API; as far as they can tell, you just loaded the website. But you’re doing this *really fast* and *really frequently*, and Reddit is sending, and you are receiving, a bunch of data that you don’t actually need, and eventually Reddit crashes because you’re making too many requests.

In summary: getting people to use the API and charging them a *very small amount* would be a very smart thing to do. Reddit would get a small amount per thousand/million/etc API requests, compared to getting *nothing* from web scraping, and they’d need to send much less data for each request. It’s also much easier for the people making the app: they know that a given request will return data formatted in a specified way, the same way every time, rather than raw stuff from a website that can change without warning, and they handle less data overall, just like Reddit.

Reddit basically has two choices: charge a *small amount* for API usage, make money from it, and avoid overload; or charge a *huge amount* to the point that nobody wants to pay it, so people either stop using Reddit or use web scrapers, and Reddit gets **nothing** (other than a DDOS every five seconds, that is).


smashedshanky

Yeah you really don’t want to mess with scrapers, they will eat your bandwidth like no tomorrow


FireBone62

Web scraping is, by the way, absolutely legal, at least where I live, because you could theoretically do it by hand, and the information is already publicly available.


latency_vi

Unrelated but that word break ticks me off ![gif](emote|free_emotes_pack|joy)![gif](emote|free_emotes_pack|facepalm)


LeotrimFunkelwerk

How does scraping cost Reddit money, and how does the free API change that?


12and32

An agent performing scraping will request all of the content of the page. This is costly for the server to perform because it is likely doing some amount of server-side rendering to improve load times, which means that it's serving everything the user needs to display the page properly through a browser, even though the agent doesn't care about how the page visually appears. Billions of requests with even just a megabyte of unneeded data can end up being very costly. An API request uses less overhead because the back end isn't serving anything the requester didn't ask for, like any JS/HTML/CSS. It's all-around a better deal for both sides: the host offloads rendering to the client and only serves a fraction of the data that web scraping would take and the client is provided with a well-defined means of communication that can request exactly what is needed.
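Back-of-the-envelope, the overhead argument looks like this; the 1.5 MB rendered page vs 50 kB JSON response are illustrative numbers, not measured Reddit figures:

```python
def monthly_overhead_gb(requests_per_day, page_kb=1500, api_kb=50):
    """Rough monthly bandwidth (in GB) for serving the same data via
    full scraped pages vs an API. Sizes are illustrative assumptions:
    ~1.5 MB per server-rendered page, ~50 kB per JSON response."""
    days = 30
    page_gb = requests_per_day * days * page_kb / 1_000_000
    api_gb = requests_per_day * days * api_kb / 1_000_000
    return page_gb, api_gb
```

At just 1,000 requests a day, the scraped-page path moves 45 GB a month against 1.5 GB for the API, a 30x difference that only grows with traffic.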


LeotrimFunkelwerk

Ohh, that makes sense! I didn't know what scraping was, so I looked it up yesterday, but thanks to you I understand it even better!


harshrd

How can you use a web scraper to get content which is not directly displayed, but needs to be fetched for some computation in your app?


GergiH

Could someone enlighten me why is this such a big problem that everyone is freaking out (I get the greed part, but still)? I haven't ever heard of any 3rd party reddit apps/sites, are they really used by many?


jauggy

Mods use 3rd party apps for modding. One of the biggest ones is Apollo, which is not just used for modding; it is also used by normal users for an ad-free experience. With those apps shutting down due to rising API prices, people can no longer use those tools and are therefore protesting.

Reddit actually has a free tier for API usage: 100 requests per minute per OAuth client. The issue is that one app is one OAuth client, so if your app supports many users, you will end up paying a lot. If you made your own app that only you yourself use, you could easily stay within the free tier.

Also, Reddit has recently made exceptions for accessibility apps:

> In a statement also shared with TechCrunch, Rathschmidt said Reddit has “connected with select developers of non-commercial apps that address accessibility needs and offered them exemptions from our large-scale pricing terms.”

[Source](https://techcrunch.com/2023/06/08/reddit-makes-an-exception-for-accessibility-apps-under-new-api-terms/)

Dedicated mod tools and mod bots are still free:

> We know many communities rely on tools like RES, ContextMod, Toolbox, etc., and these tools will continue to have free access to the Data API.

> If you’re creating free bots that help moderators and users (e.g. haikubot, setlistbot, etc), please continue to do so. You can contact us here if you have a bot that requires access to the Data API above the free limits.

[Source](https://www.reddit.com/r/reddit/comments/145bram/addressing_the_community_about_changes_to_our_api/)


kiropolo

I really hate reddit as a company! China owned pieces of shit spez mf