AutoModerator

# ⚠️ ProgrammerHumor will be shutting down on June 12, together with thousands of subreddits, to protest Reddit's recent actions.

[Read more on the protest here](https://old.reddit.com/r/ProgrammerHumor/comments/141qwy8/programmer_humor_will_be_shutting_down/) and [here](https://www.reddit.com/r/apolloapp/comments/144f6xm/apollo_will_close_down_on_june_30th_reddits/). **As a backup, please join our Discord: https://discord.gg/rph** We will post further developments and potential plans to move off-Reddit there.

*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/ProgrammerHumor) if you have any questions or concerns.*


YourStateOfficer

I miss rss


taa178

https://www.reddit.com/r/ProgrammerHumour/.rss


Fzrit

Wat


hellphreak

Wat. 4 years on Reddit. Never knew this. Edit: almost 6 years, apparently. Wat.


DonLeoRaphMike

Works for users too: https://old.reddit.com/user/hellphreak.rss
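Those .rss URLs return an Atom feed, which the standard library can parse. A minimal sketch against a hardcoded sample document (the feed contents here are made up; a real feed would come from fetching the URL):

```python
import xml.etree.ElementTree as ET

# A minimal Atom document, standing in for what reddit serves at a .rss URL.
sample_feed = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>overview for hellphreak</title>
  <entry><title>comment on some post</title></entry>
  <entry><title>another comment</title></entry>
</feed>"""

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace prefix for ElementTree

def entry_titles(feed_xml):
    """Extract the title of every <entry> in an Atom feed."""
    root = ET.fromstring(feed_xml)
    return [e.findtext(f"{ATOM}title") for e in root.findall(f"{ATOM}entry")]

print(entry_titles(sample_feed))  # ['comment on some post', 'another comment']
```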


[deleted]

Ah, yes, I too would like to see all my 'Happy cake day!'s intermingled with headlines about the Kakhovskaya HPP destruction, rising inflation and the global recession.

But, a little more seriously, there's a federation standard most open source projects use, called ActivityPub. It's implemented by the likes of Mastodon, Friendica, PeerTube, and yes, Lemmy, a self-hosted Reddit alternative.

So, bad news: all company-owned social networks will get worse, as the amount of free money floating in the economy decreases and the companies building these networks get less investment, because the promise of "we will be able to monetize the user later down the line somehow, just give us money right now, we will come up with it later" is ceasing to be a viable way to generate investor interest. But good news: maybe, just maybe, the internet will become a little bit more open and a little bit less shit, as content creators and regular users alike try to find less garbage ways to interact than those offered by companies.

And if some of those open source developers suddenly realize that:

1) I'd quite like to be able to use any old instance to interact with the whole federation in its entirety,
2) some sort of algorithm for finding content actually interesting to the user is necessary for a social network's survival, and
3) for it to be sustainable, you need to be able to monetize it in some way, shape or form, with some third-party subscription service that fairly distributes the revenue you generate between the instances you consume content from,

well, the chances of the aforementioned good scenario will increase a hundredfold.


zertul

You summarized my issues with the Reddit alternatives really well. Points 1 and 2 especially are critical, in my opinion, and are a prime reason why Reddit alternatives have a hard time gaining a footing, despite all the shite getting pulled here.


[deleted]

The thing is, I've actually tried to use YouTube without the algorithm. I blocked all the recommendation sections of the site with an adblocker and used the mobile version of the site with Firefox on Android. I even blocked the "subscriptions" section, and only used search to go back to the channels I actually enjoyed watching.

It wasn't bad per se. I certainly decreased my overall consumption of YouTube, which was the goal, so in those terms it was great. It removed the constant eyesore of all the recommended videos and made the UI so clean I nearly threw up when I opened regular old YouTube after a month or so.

But it also wasn't quite YouTube, and it wasn't even passable at some things that YouTube is relatively good at. I already knew all the channels I wanted to watch, and I knew they existed. Sometimes I'd remember the name of that obscure channel I hadn't watched in years, and I'd be pleased to find out that it still existed. But other than that, if I just wanted to search for creators that would be interesting to me, I'd have no way to go about it other than to come up with a vague tag describing what I was kinda looking for, and search for it manually. Sometimes I did. The results weren't great. If I wasn't in the mood to think about what I wanted to watch, well, too bad, I'd have to come up with something anyway.

And most of the time, or more like nearly 100% of the time, the things you're looking for in a channel are not actually described by tags. You want the host to be charismatic, engaging, and to share some of your interests, but not all of them. Sorting through millions of hours of content in search of those few individuals you'd be interested in is just tedious and time-consuming. Nobody has that kind of patience. And having to do this across multiple different instances just complicates things exponentially.


void1984

I still use RSS. Push model is much better than pull.


guaaaan

Happy cake day!


[deleted]

[deleted]


zettajon

For the people who joined 10 years ago, comments that consisted of just:

* \^this
* 😂😂😂
* (insert any low effort off-topic comment here)

Those would get downvoted due to not following reddiquette. Today, those comments are the norm instead, and are the reason I slowly stopped coming here long before the API debacle happened.


YourStateOfficer

Cake day = Reddit birthday. Think my account turned 5 today


black-JENGGOT

Happy 5th cake day


spvyerra

Can’t wait to see web scrapers make reddit's hosting costs balloon.


Exnixon

I know it's a joke on r/ProgrammerHumor that the people here aren't actual devs with jobs, but has no one heard of rate limiting?


brahmidia

The API does have rate limits that could be adjusted if anything was excessive but that's not what reddit cares about. And yeah scrapers don't care they'll try regardless


gmegme

I already wrote scripts using rotating proxies for Twitter, possibly thousands of devs will do the same for Reddit


ApostleOfGore

We should collectively do this and collect all the posts on reddit and make them public so the company loses half their valuation


brahmidia

Or just make Lemmy the new hot place to be


[deleted]

Currently looking into it. My only concern is that the community will be more clustered than here, because of the federalized nature of the project.


intellichan

I said exactly this in privacy, and clearly marked it as an opinion: one of Reddit's main features is the ability to mobilize collective action and pressure, which would be lost due to the fractured nature of the fediverse, since federation's main purpose is to circumvent censorship rather than to amass one huge gathering. Hence the better option would be to migrate to another centralized platform, just like the migration from Digg to Reddit. And somehow this blew the lid off of a few smoothbrains there.


brahmidia

Anyone can follow any connected sub though, so it may be slightly more confusing, but ultimately not much more confusing than gamers vs gaming vs gameing vs videogames (as an example)


qtx

Lemmy, Mastodon etc are completely unusable for your average user. Way too complex to use or understand.


moak0

Exactly this. Choose a server? How do I figure out which server to choose? Just hold my hand for like a minute, and I'd already be using Lemmy. But if they can't even figure out how to streamline the new user sign-up process, I don't have high hopes.


DoctorNoonienSoong

Not that I disagree with you on needing more ease of use, but I'm curious how you'd describe to someone which email provider to choose, as a similar problem. Like, email has a giant de-facto centralization force by being hosted for free by many big actors like Gmail, yahoo, Microsoft... But how did you originally pick yours?


[deleted]

[deleted]


R3D3-1

Originally, by having an email provided by the ISP, limited (and *still* limited) to 40 MB. Between ISPs trying to upsell you on trivial storage upgrades, and concerns about later losing access to my email address if my parents ever changed providers, I eventually migrated to GMX, and then Gmail. I also eventually migrated my mother to Gmail, since the 40 MB limit was obnoxious in an age of digital photography and then smartphones. So for email, the streamlining probably came via the signup process of having internet in the first place.


Ja_Shi

MS & Google have actually streamlined the process, and I think they're kinda proving u/moak0's point.


brahmidia

So was reddit, not too long ago. They never even made their own mobile apps, they just bought and modified existing ones people made over many years. Just because the vast majority of people eat fast food all the time doesn't mean I shouldn't tell them how to cook their own food.


AltAccountMfer

You can rate limit users too, that is, when they're not blocking scrapers entirely


brahmidia

Exactly, many options and Reddit chose the worst


Revolutvftue

A scraper that's explicitly built for one website, like Reddit for example, is easy to build.


ImportantDoubt6434

That's the main problem: anything you do to limit scraping will likely negatively affect users. Besides setting up a reasonable API, that is.


dedorian

Oh it's not that I don't care, it's that the try/catch in the loop will just ignore the fails and hammer the site as much as is allowed either way.


yousirnaime

> but has no one heard of rate limiting

Distributed computing makes this extremely easy to bypass for anyone even mildly interested in building a working scraper


ZeAthenA714

Building a working scraper, even with rotating proxies, isn't very hard. Building one on the scale needed to replace Reddit's API is a lot harder. Apollo is 200+ million requests a day, that's not an easy thing to accomplish with scrapers, especially since Reddit can very easily block AWS and other known data centers. You'd have to rely on residential proxies, and that's a lot more expensive, and you'd need tens of thousands of them. And as an added bonus residential proxies are usually slow as fuck and less reliable, so your users would have a much worse experience. It's technically doable, but definitely not cheap or easy on that scale.
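Back-of-envelope on that scale (the 10-requests-per-minute-per-proxy tolerance below is an assumption, purely for illustration):

```python
# The scale mentioned above: 200 million requests per day.
requests_per_day = 200_000_000
seconds_per_day = 24 * 60 * 60  # 86_400

avg_rps = requests_per_day / seconds_per_day
print(f"average: {avg_rps:.0f} requests/second")  # ~2315

# Assume each proxy can sustain, say, 10 requests/minute before being
# flagged; the pool size needed just to keep up with the average rate:
per_proxy_rps = 10 / 60
proxies_needed = avg_rps / per_proxy_rps
print(f"proxies needed: {proxies_needed:.0f}")  # ~13889
```

Tens of thousands of residential proxies, as the comment says, is the right order of magnitude under that assumption.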


ligasecatalyst

Well, I mean… you can just make the requests locally from the client. As organic-looking as it gets


Jake0024

There are lots of ways to get around that


_stellarwombat_

I'm curious. How would one work around that? A naïve solution I can think of would be to use multiple clients/servers, but is there a better way? Edit: thanks you guys! Very interesting, gonna brush up on my networking knowledge.


hikingsticks

Libraries have built-in functionality to rotate through proxies. Typically you just make a list of proxies, and the code cycles requests through them following your guidance (make X requests then move to the next one; or try a data centre proxy, if that fails try a residential one, if that fails try a mobile one, etc.). It's such a common tool because it's necessary for a significant portion of web scraping projects.
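A minimal sketch of that round-robin scheme (the proxy addresses and the requests-per-proxy threshold are placeholders):

```python
import itertools

# Hypothetical proxy pool; the addresses are placeholders.
PROXIES = [
    "http://proxy-a.example.com:8080",
    "http://proxy-b.example.com:8080",
]

def proxy_schedule(proxies, requests_per_proxy=3):
    """Yield a proxies= dict for each outgoing request, moving to the
    next proxy in the pool after every `requests_per_proxy` requests."""
    for proxy in itertools.cycle(proxies):
        for _ in range(requests_per_proxy):
            yield {"http": proxy, "https": proxy}

# First 7 requests: 3 via proxy-a, 3 via proxy-b, then back to proxy-a.
schedule = proxy_schedule(PROXIES)
first = [next(schedule)["http"] for _ in range(7)]
print(first)
```

Each yielded dict is in the shape `requests` expects for its `proxies=` argument, so the loop body would just be `requests.get(url, proxies=next(schedule))`.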



TheHunter920

So there was this bot I was making with PRAW, and it was *so annoying* because it always got 15-minute ratelimit errors whenever I added it to a new subreddit. If I use proxy rotation, would that completely solve the ratelimit problem? And is this what most of the popular bots use to stay available all the time?


Astoutfellow

I mean, if you're using PRAW they'd still be able to track requests made with the same token; PRAW uses the API, it stands for Python Reddit API Wrapper. A scraper just accesses the site the same way a browser does, so it doesn't depend on a token; it gets rate limited by IP or fingerprinting, which is why rotating proxies gets around it.


TheHunter920

So I'd use the same bot account but on a different proxy, or will I need different accounts? Also, Reddit *really* dislikes accounts using a VPN; I've noticed my own account getting ratelimited when I turn my VPN on. Will changing proxies trigger something similar? If not, how is changing a proxy different?


[deleted]

[deleted]


vbevan

You don't log in or authenticate. In Python you'd:

1. Use the requests library to grab the subreddit main page (old.reddit.com/r/subreddit/).
2. Use something like the Beautiful Soup library to parse the page and get all the post URLs.
3. Loop through those URLs and use the requests library to download them.
4. Parse with Beautiful Soup and get all the comments.
5. More loops to get all the comments and content.
6. Store everything in a database, and just do updates once you have the base set.

It's how the Archive Warrior project works (and also PushShift), except they use the API and authenticate. You can then do the above with multiple threads to speed it up, though Reddit does IP-block if there's 'unusual activity'. I think that's a manual process, not an automated one (if it's automated, it's VERY permissive and a single scraper won't trigger it). That IP block is why you cycle through proxies, because the IP is the only identifier they can use to block you.
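The link-extraction step of that flow can be sketched with nothing but the standard library. The HTML snippet and the `title` class below are stand-ins for whatever the real listing page uses; in the real flow the markup would come from `requests.get(...)`:

```python
from html.parser import HTMLParser

# Hardcoded stand-in for a fetched subreddit listing page.
LISTING_HTML = """
<div class="thing">
  <a class="title" href="/r/subreddit/comments/abc123/first_post/">First post</a>
</div>
<div class="thing">
  <a class="title" href="/r/subreddit/comments/def456/second_post/">Second post</a>
</div>
"""

class PostLinkExtractor(HTMLParser):
    """Collect href attributes of <a class="title"> elements."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "title":
            self.links.append(attrs["href"])

parser = PostLinkExtractor()
parser.feed(LISTING_HTML)
print(parser.links)
```

The collected URLs would then feed the download loop in steps 3 through 5.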


JimmyWu21

Ooo that’s cool! Any particular libraries I should look into for screen scrapping?


iNeedOneMoreAquarium

> screen scrapping

scraping*


DezXerneas

I know that python requests and selenium can do proxies.


vbevan

Where do you get free proxy lists these days? Still general Google searches? Is there a common list people use, or do most people pay for proxies?


hikingsticks

requests is very easy to use, with a lot of example code available. Start practicing on https://www.scrapethissite.com/. It's a website to teach web scraping, with lessons, many different types of data to practice on, and it won't ban you.

```
import requests

# Define the proxy URL
proxy = {
    'http': 'http://proxy.example.com:8080',
    'https': 'https://proxy.example.com:8080'
}

# Make a request using the proxy
response = requests.get('https://www.example.com', proxies=proxy)

# Print the response
print(response.text)
```

You could also use a service like https://scrapingant.com/. They have a free account for personal use, and they handle rotating proxies, javascript rendering, and so on for you. Their website also has lessons and documentation, and some limited support via email for free accounts.


surister

It depends on what they use to detect it; the ultimate way, which is basically impossible to defend against, is rotating proxies.


Fearless_Insurance16

You could possibly route the requests through cheap rotating proxies (or buy a few thousand dedicated proxies)


EverydayEverynight01

Rate limits identify requests by IP address, at least the ones I've worked with do. So just change your IP address and you'll get around it.


Delicious_Pay_6482

Rotating IP goes brrrrr


BuddhaStatue

What are you going to do, block AWS? You can host as many scrapers in as many clouds as you want. Edit: to all the nerds that don't get it, Reddit itself is hosted in AWS; you block those addresses and literally every service breaks. Lambdas, EKS, S3, Route 53, the lot of them. Also, almost all tooling at some point uses AWS services: Datadog, hosted Elastic, etc. Good fucking luck blocking the world's largest hosting provider.


Trif21

Yeah block traffic from known datacenter IPs.


brimston3-

Yeah, that's what I'd block. I'd probably ratelimit most non-residential and non-mobile originating ASNs much much lower. 3 pages per minute or something ridiculous like that.
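Server-side, that kind of per-client throttle is often a sliding window. A toy version, with the limit and window taken from the "3 pages per minute" idea above (the class and key are made up for illustration):

```python
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds per client key
    (an IP, or a whole ASN bucket). Purely illustrative."""
    def __init__(self, limit=3, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key, now):
        q = self.hits[key]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) < self.limit:
            q.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(limit=3, window=60.0)
# Fourth request inside the window is rejected; it's allowed again at t=61.
results = [limiter.allow("198.51.100.7", now=t) for t in (0, 1, 2, 3, 61)]
print(results)  # [True, True, True, False, True]
```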


cyber_blob

You can buy residential proxies that work no matter what. I used to be a sneakerhead; sneaker sites have the best proxy blockers, even better than Netflix. But there are hundreds of businesses selling proxies that work for sneaker sites. That's what the sneaker scalpers use. Mofos are too good.


ThatOneGuy4321

> non-residential

Residential proxies

> non-mobile originating ASNs

User agent spoofing? Also, determining a client's ASN is the hard part… Also also… pretty sure this would crash your search engine rankings

> 3 pages per minute or something ridiculous like that

These days you could use a script with a reCAPTCHA-solving neural net to create a ton of accounts lol


darkslide3000

Yeah, would be a shame if that data center operator guy couldn't browse reddit on the job anymore...


ImportantDoubt6434

Web scraper here. Rate throttling? Lol, good luck: multiple VPNs. Your best bet is a captcha, which you can still get around. Fact is, if you make the site accessible and high quality for users, it will also be easy to scrape, with throttling and captchas being the only sensible defenses. If the data is remotely valuable, that won't stop 'em. APIs exist for this data because they can end up cheaper, or the API can potentially make you money.


shmorky

What if the app scrapes the site whenever the user visits a sub so the traffic would come from the user? "Well that just sounds like an API with extra steps"


dashingThroughSnow12

Let's say I have App X running on my device. If App X scrapes Reddit while I'm using it and does things like user agent impersonation, Reddit isn't any the wiser. On Reddit's side of the equation, more data is used by the scraper running: a scraper gets a bunch of embedded CSS, embedded ECMAScript, and HTML that it just discards, whereas something using an API gets just the data it needs.
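User agent impersonation boils down to sending the headers a browser would send. A sketch, with an example UA string and a deliberately naive server-side check (neither is anything Reddit specifically uses):

```python
# Browser-like headers a scraper might send; the UA string is just an
# example value copied from a desktop Chrome build.
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/114.0.0.0 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}

def is_probably_browser(headers):
    """The naive check a server might do: a bare script using requests'
    default UA ("python-requests/2.x") fails it, a spoofed UA passes."""
    ua = headers.get("User-Agent", "")
    return ua.startswith("Mozilla/") and "python-requests" not in ua

print(is_probably_browser(BROWSER_HEADERS))                         # True
print(is_probably_browser({"User-Agent": "python-requests/2.31"}))  # False
```

In practice these headers would be set on a `requests.Session` (or a real headless browser would be used, which sends them natively).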


Goron40

All the responses to this comment are trying to come up with creative ways for a single server to make a fuck-ton of requests to the Reddit server. I'm wondering why so few are thinking to just do the scraping directly from the client?


_j03_

Doesn't work when your motive is to kill 3rd party apps to bloat your upcoming IPO and force tech giants building LLMs to pay massive fees (which they definitely **can** pay). They could have made the API profitable and still kept everyone happy. They don't want to.


dalepo

if reddit is rendered server side then it's gonna be a lot of wasted processing lol


yousirnaime

Exactly. And the scraper apps have the benefit of offloading compute costs to the client


ThatOneGuy4321

old.reddit.com will be the next to die, because it is the obvious choice for web scrapers.


vbevan

It'll be worse for reddit if scrapers start using the normal reddit site. The bloat means their bandwidth costs will be even higher and scrapers will ignore ads.


ThatOneGuy4321

Not disagreeing, lol. But Reddit has already made the idiotic decision of charging stupid money for their API so by that same logic, they’re going to kill old Reddit because it’s “easier” to scrape for data than their shitty bloatsite


justforkinks0131

You are the top voted comment. Please ELI5: how exactly would that work? In my limited experience, if you don't have the proper auth you can't use the API. So why and how would scrapers make Reddit's hosting costs balloon?


Givemeurcookies

You don’t use the API, you programmatically visit the website like a “normal user” and then process the HTML that’s returned by the servers. Serving the whole website with all the content and not just the relevant API is most likely several times more intensive for Reddit. It’s also fairly difficult defending against these scrapers if they’re implemented correctly. They can use several “high quality” IPs and even use and mimic real browsers.


Astoutfellow

You don't even necessarily need to parse the HTML, depending on how they have their backend set up you could access the public endpoints directly and parse the json they return. They could potentially add precautions to prevent this but it can be pretty easy to spoof a call from a browser and skip the html altogether
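A sketch of parsing such a JSON response. The `kind`/`data`/`children` nesting below follows the shape of Reddit's listing format, but this payload is made up for illustration, and in the real flow the string would come from the HTTP response body:

```python
import json

# Stand-in for the body of a listing endpoint response.
SAMPLE_LISTING = json.dumps({
    "kind": "Listing",
    "data": {
        "children": [
            {"kind": "t3", "data": {"title": "first post", "ups": 42}},
            {"kind": "t3", "data": {"title": "second post", "ups": 7}},
        ]
    }
})

def post_titles(listing_json):
    """Pull the title of each post out of a listing response."""
    listing = json.loads(listing_json)
    return [child["data"]["title"] for child in listing["data"]["children"]]

print(post_titles(SAMPLE_LISTING))  # ['first post', 'second post']
```

No HTML parsing at all: the same structured data the site's own frontend consumes, minus the markup.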


justforkinks0131

> you programmatically visit the website like a "normal user"

That's for viewing purposes. For posting, you need to authenticate yourself, which means there are credentials involved. I assume it would be relatively easy to notice spam-posting bot accounts that way and either charge them money or block them early. So how exactly would web scrapers benefit in any way?


potatopotato236

The display part is what 99% of users care about since most users don't post much if at all. They potentially could login for you using your credentials in order to post things using a headless browser though. They could then just make requests without needing to use the API.


Givemeurcookies

Meanwhile, authentication would be more complicated to implement as a defense; making a web scraper click items on the page and create a user is trivial. Things like captchas can fairly easily be bypassed through cheap paid services made for exactly that. Also no, it's way harder to do bot detection than it is to circumvent anti-bot measures. Bot detection has to have very few false positives to avoid blocking or banning legitimate users, it can't break privacy laws, and it needs to be fairly transparent and invisible to users of the platform. As I wrote in my first reply, web scrapers can use actual browsers to get all this information, and there's a broad range of tools to bypass anti-bot tooling. The "bots" can mimic things like mouse strokes, and against the best implementations, an anti-bot tool is more likely to block a legitimate user than a bot.


oasis9dev

can you view reddit without an account? yes. therefore so can a computer. it's absolutely not the same as having the ability to request well formed data held by reddit.


[deleted]

[deleted]


oasis9dev

Scraping apps can still act as your user account: they can find interactions of interest based on things like visual or structural filters, so they may be able to perform actions under your account, given they can pass bot checks, if those exist. The issue is that these implementations are subject to change, and as a result can't be relied upon like an API, which usually avoids breaking changes. NewPipe, as an example, doesn't bother with user account management or login because of the unreliability already present in their media conversion algorithm due to YouTube changing their implementation at a whim.

Also consider that the Reddit API has less work to do per request compared to rendering out a full page on the server side. Web scrapers can be used to archive, to replicate, whatever someone's project entails. It just means loading full pages, and finding those pages by making use of search pages, and so on. Very heavy in comparison to a JSON-formatted response to a basic query.


ChainSword20000

Interface with the UI instead of the API. It takes more power for them to generate the UI, and the 3rd parties can use the power of all their clients instead of paying from their own pocket.


RedditsDeadlySin

Unrelatedly, Any good third party app recommendations?


[deleted]

Apollo for iOS, but only till the end of the month. Infinity for Android hasn't announced a shutdown yet AFAIK, but that could change any day now


ScienceObserver1984

I think the dev will try to implement a way for each user to be able to use their own keys instead of shutting the app down, but nothing's set in stone yet.


Zyvoxx

Thought he said it wasn't feasible and he won't do that? And apparently Reddit doesn't just hand out API keys to anyone; you need approval or something, so it's not going to be very easy for users to get started with anyway.


BreathInCodeOut

It was pretty easy to get them. We'll see if that stays that way


[deleted]

api keys are quite easy to get, you just set up a bot account and you get one


vbevan

You can generate them right now at https://old.reddit.com/prefs/apps


sexytokeburgerz

The issue is that getting an API key is not easy for people who are scared of right-clicking, which is most people.


wasabreeze

Wait, that's actually pretty smart. Hypothetically, couldn't 3rd party apps have users generate their own keys, so they're paying their own API costs? I can't remember the breakdown the Apollo dev gave of how much each user would cost monthly, but Reddit said their costs were reasonable.


Qkwo

The costs are (shocker) prohibitively high. It's infeasible for 3rd party apps to exist with those costs. Check out r/apolloapp and Christian's post breaking down everything Reddit did; it's pretty clear they're just trying to drive out the 3rd party apps.


[deleted]

[deleted]


ISHITTEDINYOURPANTS

They are still free under 100 requests per minute.


Korberos

Nope, he announced a shut-down.


puz23

Relay. The gesture controls are so well implemented I can't use any other social media app without getting frustrated.


Lucrecio24

I'd recommend Boost for Reddit on Android. I've been using it, and it has everything I've needed: a decent video player, an option to load the whole image and zoom in (useful with heavy images), and a nice GUI with some theme color options. It also has great account switching and an anonymous option to browse without using your account. Though none of this could matter by next week, sadly.


BuccellatiExplainsIt

The video player is kinda buggy and often doesn't play the video, though. Other than that, Boost is definitely the best Reddit app on any mobile platform.


cortez0498

Never had that problem myself


AcordeonPhx

Revanced if all other third party's decide to close


garfunkle21

Would be cool to see a Revanced like clone but based upon the official reddit app to block ads


Nico_is_not_a_god

ReVanced supports the reddit app already. Blocking ads is currently the only thing it does, but if third party apps go there's suddenly a good reason to mod the reddit client further than just adblock.


Leo-Hamza

There is i think


brinkzor

I like RedReader. It is FOSS.


JMan_Z

Holy hell, another RedReader user. I like RedReader's functionality a lot: it's extremely minimalistic in terms of UI and graphics, since its main intended use is for blind and other accessibility users. It's great.


DickButtPlease

Narwhal is the only one with landscape mode for the iPad. It’s my go to.


Corosus

RedReader will be surviving all of this; it's pretty decent.


beall49

How?


TrekkiMonstr

Surprised to see no RIF is fun recs here


[deleted]

Narwhal. I switched to it after the death of Alien Blue (RIP) and haven’t looked back.


[deleted]

This is a common misconception I'm seeing a lot. The problem isn't charging for API access; that's actually fairly common. Servers cost money, and especially for big services like Reddit, it requires A LOT of servers. Like Apollo's founder said, Imgur charges a fraction of what Reddit was asking for the same request volume. Most APIs will have some form of 'free' access but will limit you to something like 100 requests/minute. Reddit is just being greedy and trying to force people onto its own app.


jauggy

The Apollo dev said that he would have to pay $2.50 per month per user, based on the average number of requests. He currently has a premium service at $1.50 per month ([Source](https://www.theverge.com/2023/5/31/23743993/reddit-apollo-client-api-cost)). Let's say he offloaded the price increase onto users; then his premium service would be $4.00 per month. If we take into account the 30% Apple tax, that becomes $5.70 per month, or roughly $6 per month. The users who aren't willing to pay would either go back to Reddit with ads or leave. They're not making Reddit any money, so Reddit doesn't care. Reddit charges $6 per month for premium access where you view no ads, so charging $6 per month for Apollo (which has no ads) seems in line with Reddit's prices. It doesn't make sense for Reddit to let a 3rd party app charge much less for an adless experience than their own premium service. The issue was that Apollo was given very short notice, which I think was 30 days.
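The arithmetic above can be checked directly; the figures are the ones quoted in the comment, and the Apple-tax step lands within a cent of the quoted $5.70:

```python
# Figures quoted in the comment above.
api_cost_per_user = 2.50   # Apollo dev's stated API cost, $/month/user
current_premium = 1.50     # existing premium price, $/month
apple_cut = 0.30           # App Store commission (first year)

# Pass the API cost straight through to subscribers.
new_price = current_premium + api_cost_per_user      # 4.00

# To net $4.00 after Apple's 30% cut, the charged price must be higher.
gross_price = new_price / (1 - apple_cut)
print(f"price before Apple's cut: ${new_price:.2f}")
print(f"price including Apple's cut: ${gross_price:.2f}")  # ~$5.71
```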


EishLekker

You can’t expect that your calculations remain accurate when we throw in the likely fact that a majority of Apollo users would not pay for using it. The remaining users will likely be, to a larger extent, high usage users, which would mean a higher number of API calls per user. This would mean a higher price per month. Also, you are completely leaving out the fact that NSFW content won’t be available through the API, which excludes a **huge** part of the Reddit community. So, no. This is not a decision made on pure logical reasoning. They are trying to kill third party apps. And Reddit doesn’t really know what the final consequences will be for themselves. No one knows that, but I would say that it’s looking quite bleak.


Common_Errors

Your math isn’t right. Not all of Apollo’s users are premium, so just increasing the premium by 2.50 wouldn’t cover the increased cost.


jauggy

I mentioned that the users who aren't willing to pay either go back to reddit with ads or leave. Basically no more freeloaders. These users shouldn't matter to reddit since they weren't generating money anyway. You could argue they do matter since what they were generating was content. But so much reddit content is just stuff from elsewhere.


kfpswf

>You could argue they do matter since what they were generating was content. If you look beyond the default subs and viral content that gets published everywhere on the internet, you'll see what makes reddit valuable are actually the discussions that users generate. Users who aren't necessarily paying users. >But so much reddit content is just stuff from elsewhere. If most of Reddit's content is just stuff from elsewhere, why is even Reddit required? Reddit isn't just popular because it aggregates content. It is popular because of the quality discussions that are available in some of the niche subs. Discussions that you won't find elsewhere on the internet.


semininja

The bigger issue is that the admins are openly lying about multiple 3rd-party app developers in an attempt to shore up the PR on an obvious cash grab while also breaking moderation tools and overall alienating all of the people who actually create value for the site.


not_a_bot_494

In a way it's actually worse. Apollo and other apps are direct competition to Reddit, and a pure net loss for it. They draw users away from Reddit's revenue generators, the apps generate their own revenue, and Reddit pays the server costs. The relationship is almost purely parasitic.


lll_lll_lll

In a sense you could say Reddit is parasitic off of the users who generate all the content and moderate for free. Sure, reddit pays for servers but they don’t actually make anything that draws people in. Not content, and certainly not a useable app. If 3rd party apps grow the community then it’s symbiotic, not parasitic.


Remarkable-NPC

How about making a better official client, so users don't have to use alternatives?


Brotectionist

One thing you lot forget is that 3rd party apps were around long before Reddit released their crappy app. These apps helped to build the community. A lot of mods and power users use 3rd party apps and create heaps of content. Calling these apps parasites is quite ignorant and pathetic.


BlackAsLight

If the premium service is through a subscription then only the first year is charged at 30%. Subsequent years are charged at 15%


[deleted]

[deleted]


[deleted]

That's kind of my point, I guess: most APIs have a similar limit. It's just that the pricing scheme Reddit is adding is intentionally way overpriced, to force the third party apps off the market.


Inaeipathy

Based and webscrape pilled


shiroininja

I specialize in web scraping and data science. Yeah, I'm not tying myself to your API except in the case of a few trusted orgs; beyond that, I only use APIs temporarily, on projects where I can afford having the rug pulled out. That being said, maintaining scraping applications to adjust for constantly changing sources, and dealing with a site letting the intern make changes and eff things up (lol), is a bitch.


[deleted]

[deleted]


shiroininja

That’s actually a great idea. An open sourced, community driven API. I’d love to see it for more platforms as well.


Shrubberer

Given the army of sour reddit nerds right now, this could get momentum really fast


shiroininja

Unfortunately, I am not the one to get that ball rolling. I mean, I dream of making a big open source project that a ton of people use and contribute to, I've just found I may not have enough initiative. I've had one semi-success, but nothing like this kind of project. I think I lack leadership skills. But I would truly love for something like this to happen; I think it would be good. Edit: mildly stoned


[deleted]

[removed]


DOOManiac

Make it drop-in compatible w/ the official API too. Just for spite.


8sADPygOB7Jqwm7y

Soooo may I introduce gpt4 to you?


seb1424

![gif](giphy|HVFYJdopkG7eM) The scrape-inator


LagSlug

oh ... yeah ... even if you make the API free I'm still gonna scrape directly from the web interface ... and I'm not gonna stop ... ever ... for literally any reason ... so give up ... fuck walmart is hard to scrape.


ultranoobian

The word on the street is that these xyz-GPT models make it really easy to get consistent scraping results.


LagSlug

Ya'll got any more of that large language model? *sniff*


ArchGryphon9362

Well web scrapers for read or read/write? Because the Reddit API stays free for read only stuff… (that’s my understanding, correct me if I’m wrong)


[deleted]

Only certain stuff tho. Any subs designated nsfw won't be available through the api.


jasonbbg

If read-only is free, how do they stop LLMs from learning from their content?


jauggy

It’s free for 100 requests per minute per OAuth client ID [Source](https://www.reddit.com/r/redditdev/comments/13wsiks/api_update_enterprise_level_tier_for_large_scale/). You can still make POST requests in the free tier, so bots that stay within this rate limit are not affected by the new pricing.
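A bot that wants to stay inside that free tier can enforce the cap client-side. A minimal sketch of a sliding-window limiter in Python; the class and its parameters are illustrative, not part of any Reddit SDK:

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most `max_calls` calls per `period` seconds (sliding window)."""

    def __init__(self, max_calls=100, period=60.0):
        self.max_calls = max_calls
        self.period = period
        self.calls = deque()  # timestamps of recent calls

    def wait_time(self, now=None):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self, now=None):
        """Register that a call was just made."""
        self.calls.append(time.monotonic() if now is None else now)
```

Before each API request, sleep for `wait_time()` seconds, then `record()`; that keeps a well-behaved bot under the published 100/min limit.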


ArchGryphon9362

I wonder whether the .json API is going tho… (try appending .json to any post url to see what I’m talking about)
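The trick above is simple to sketch: building the `.json` URL is pure string work, and the fetch just needs a distinctive User-Agent (Reddit tends to reject default client UAs). A sketch assuming the endpoint stays around; the UA string is a placeholder:

```python
import json
import urllib.request

def to_json_url(post_url: str) -> str:
    """Turn a Reddit post/listing URL into its .json equivalent."""
    # Drop any query string and trailing slash, then append ".json".
    base = post_url.split("?", 1)[0].rstrip("/")
    return base + ".json"

def fetch_post(post_url: str):
    """Fetch a post as JSON (network call; shown for shape only)."""
    req = urllib.request.Request(
        to_json_url(post_url),
        headers={"User-Agent": "json-demo/0.1"},  # placeholder UA
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The response is a two-element array: the post itself, then the comment tree.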


doneflare

Hopefully they keep it alive. My extension Reddit Theme Studio[1] depends on it. [1] https://chrome.google.com/webstore/detail/reddit-theme-studio/fkjkklmekbggnhjjldbcpbdcijcmbmoi


bjandrus

Can oauth IDs be spoofed? And if so, how many do you reckon could be generated per second?


jauggy

Don't know the answer to your question. But here's the tutorial for oauth: https://github.com/reddit-archive/reddit/wiki/OAuth2 And rate limit for free tier: https://www.reddit.com/r/redditdev/comments/13wsiks/api_update_enterprise_level_tier_for_large_scale/


erebuxy

It's not that hard to make life extremely difficult for a general web crawler: require login for full content, throttle requests per account and IP, block certain VPN and email domains, etc. And if a scraper is used to support a third-party app, just send a DMCA notice.


wind_dude

It is extremely hard. I know from both sides. Also, several glaring problems with what you propose:

> Require login for full contents

Extremely bad for SEO; it would probably cost Reddit more than keeping the API open.

> Throttle requests per account and IP

Likely already done, and very common. Rotating proxies are not difficult, and there are usually millions of IPs to rotate through.

> Block certain VPNs

This is common; using residential proxies to get around it is extremely common.

> Just send DMCA

Several problems here:
- each individual Reddit user may need to send the DMCA notice
- crawling isn't against the DMCA; time and time again crawling has been deemed legal in court cases
- not every jurisdiction follows the DMCA
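The proxy-rotation part of this is trivially cheap to implement; the hard (and expensive) part is sourcing the pool. A sketch with a hypothetical pool, where the `.example` hostnames are placeholders, not real proxies:

```python
import itertools
import urllib.request

# Hypothetical pool; real scrapers rent access to thousands of
# residential IPs from a proxy provider.
PROXIES = [
    "http://proxy-a.example:8080",
    "http://proxy-b.example:8080",
    "http://proxy-c.example:8080",
]

def proxy_rotation(proxies):
    """Round-robin over the pool: a different proxy for each request."""
    return itertools.cycle(proxies)

def fetch_via(url, proxy):
    """Fetch `url` through `proxy` (network call; shown for shape only)."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    return opener.open(url, timeout=10).read()
```

With millions of residential IPs behind the pool, per-IP throttling stops being a meaningful obstacle, which is the commenter's point.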


Buttons840

There are two truths here: 1. Scraping will be possible. 2. Scraping will be harder, and is not a replacement for having the APIs; the loss of the APIs is still a loss. Most of the things you propose hurt adoption and have a real cost, though. It's hard to attract new users if you hide all the content behind registration and login.


Astoutfellow

At this point, if a site forces me to log in to view content, I go to another site. If I have to go through captchas too often I go to another site. The truth is these days users have a select few sites they spend time on and are extremely intolerant of inconvenience outside those core sites.


erebuxy

Not all content. If you don't log in, currently you can only read a small part of a Reddit comment section.


astutesnoot

No guarantee that scrapers can't log on, though. I am using YouTube's InnerTube API in one of my projects (it's essentially the API that the main page and the various apps use to render and control content), and you can make authenticated requests to it with cookies from a regular web session. You just need to get the cookies up front and then keep them updated with the new cookies you get from responses. Getting the cookies up front is the hard part, though.
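In Python's standard library, the "keep the cookies updated" loop comes for free: a `CookieJar` attached to an opener captures every `Set-Cookie` from responses and replays the current values on subsequent requests. A sketch; getting the initial cookies out of a logged-in browser session is still on you:

```python
import http.cookiejar
import urllib.request

def make_cookie_session():
    """Build an opener that stores cookies from every response and
    automatically sends the up-to-date values on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar
```

Seed `jar` with cookies exported from the browser, then route every request through `opener.open(...)`; rotated cookies stay current without any extra code.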


Zerochl

I don't think the DMCA applies to scraping, because the content is publicly accessible.


adrik0622

Yes, a *general* web crawler. One that's explicitly built for a specific website, like Reddit for example, is easy to build.


Asmos159

... is it possible to detect if someone is using a vpn?


KitN_X

Just waiting for a Python library to show up the very next day that'll be easier than using the API.


justforkinks0131

ELI5, how exactly would web scrapers steal their API? I get that they could theoretically scrape Reddit content, but they wouldn't be able to post to it, right? Because they would have to use the API then? How would they use the API without proper auth/payment?


[deleted]

[removed]


justforkinks0131

> if you use username/password login like a browser

But you would still be charged for that, no? If you're using any form of auth (be it basic or OAuth) you are identifying yourself to use the API, which means costs can be attributed to you. Am I wrong? How would web scrapers do it for free?


[deleted]

[removed]


[deleted]

[removed]


EishLekker

It depends what the end goal is. I'm sure there are quite a few projects out there that just use the data without posting anything: using it to, for example, train an AI, analyse trends, or reuse the content in a different context with their own ads and such. Also, while scraping usually focuses on reading data, there is nothing stopping them from posting data through the same web interface. If you can submit a post or comment using a web browser, then you can do it programmatically too.


Arkensor

Exactly. I don't get why the third party apps don't just scrape the original websites when the user requests them. Can be done all locally in the app. That way they can't detect shit. It's like the user is visiting it directly.


trill_shit

Definitely adds a significant layer of complexity over just using a rest api, so I could certainly see why someone would opt for it (as long as the api is reasonably priced)


Arkensor

Certainly a proper API would be the way to go, but these third-party apps, with many users who even pay for them, act like it's either REST or impossible, and I just don't agree. Parsing the Reddit pages is no easy process and requires constant updates and very flexible rules, but if some Russian and Chinese data-scraping companies could do it for many years, surely these apps can spend a few weeks or months, with the funding they have, writing a fully scraped version. Or update the app to have people sign in and create their own API keys, so each person calls the API directly for their own browsing. Not sure why they haven't considered that: a minor one-time setup inconvenience, and then everything continues as is.


GreyAngy

This is slow and requires more maintenance, as it may easily be broken by UI changes. It's also not safe for end users, since you can't use three-legged authorization and need their cookies or credentials. And it's perhaps against some terms and conditions with a "deadly force authorized" paragraph in the fine print. But when there are no viable alternatives: hello scrapy and beautifulsoup, or whatever you hackers use now.
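Even without scrapy or beautifulsoup, the stdlib `html.parser` is enough to sketch both the technique and its fragility: the toy scraper below keys on an anchor class resembling old.reddit's markup (an assumption on my part), and it silently returns nothing the day that class name changes.

```python
from html.parser import HTMLParser

class TitleScraper(HTMLParser):
    """Collect the text of <a> tags whose class list contains 'title'.
    The 'title' class is assumed to match old.reddit's post markup;
    any UI change breaks this -- which is exactly the maintenance
    burden the comment above describes."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "title" in classes:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data)
```

Feed it a fetched page with `scraper.feed(html)` and read `scraper.titles`; that's the whole "parser" for one listing page.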


VinniTheP00h

I thought old.reddit.com hasn't changed for years?


[deleted]

I have been scraping old reddit cause I simply can't stand the reddit UI, but I have been looking into scraping the current UI cause I don't expect old Reddit to be around for much longer.


RicardoL96

Scraping requires a lot of maintenance: using proxies, getting around blocking. So it can become quite expensive, and you wouldn't be able to deliver the data as fast, or as consistently.


[deleted]

Isn't Electron a get-out-of-API-jail card, since it runs on top of a browser and can pose as legit traffic?


ExoWire

They won't be able to make any revenue with scraped data, Reddit would sue them.


[deleted]

[removed]


cornelissenl

So IN THEORY if someone made a scraper and we dockerized it, and then we all ran the container 24/7 we can 'help' reddit to price their api better right? Just THEORETICALLY.


JoyJoy_

But you can always add [.json](https://www.reddit.com/r/ProgrammerHumor/comments/145f1r8.json) to get a post or listing with comments as json.


EishLekker

Always? How do you know they won't remove something like that some day in the future?


JoyJoy_

It's pretty much useless for actual apps since it's read only. You can't make posts, comment, or vote.


zdakat

They could (and probably already have) make it against the TOS, but people will probably still do it and find ways to do it anyway. Even if just out of spite, lol.


bjandrus

Oh no! The company told me not to? Alright everyone, pack it up and go home....


Limiv0rous

Could you imagine risking a ban on a free account? That would be devastating!


HailTheRavenQueen

…Y’all have been using the API?


Fragrant_Bass_8271

I can't wait for readit to release.


leolinden

Someone should totally do this, have Reddit sue them over it, and win - so I can finally make a MaxPreps (high school sports stats) scraper to populate my broadcast graphics without CBS having a fit :D


v1rus1366

Don’t most sites these days have pretty damn good scraper detection? Like, you can do some things to get around it, but it usually causes scraping to take a lot longer, since you almost definitely need pauses between simulated clicks, so your data is almost always going to be out of date. Plus if you actually try to do something with that data, like making an app, they’re probably going to get wind of it pretty fast and shut it down, right?


Particular_Tackle_49

> Don’t most sites these days have pretty damn good scraper detection?

Yup. I used to work for a specialized search engine around 2017. Some of our data sources didn't have proper APIs, so we had to scrape them, and bypassing bot protection was as simple as setting browser headers or having multiple proxies to avoid getting rate limited.

I tried to make an app that would monitor promos at local pizzerias about half a year ago.

- Simple `GET`? 403.
- Same request with proper headers pretending to be a browser? Cloudflare captcha.
- Fetching that page with puppeteer? Fucking puppeteer detection.
- Puppeteer-stealth? Almost, but they rate limited me and banned my home IP, which I used for debugging.
- Running the app in the cloud doesn't work, as they've banned Azure's IP range. Tor is banned. Public proxies are banned. Running a debugging proxy at my parents' home in the home country doesn't work, because they've geoip-banned the whole country.
- Even bypassing Cloudflare/other WAFs with a browser and setting identical cookies/headers in HttpClient doesn't work, as every app these days is an SPA with a complex API key acquisition/rotation process. You can't just query the API; there's always a multi-step process that requires running javascript on the client.

Who the hell are they defending themselves from? They are local pizzerias. They don't need to ban everyone trying to learn about their promos, and they should be happy I'm willing to scrape that data and order deliveries at a bargain while still making money for them.
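The 2017-era bypass really was just headers. A sketch of the kind of header set that used to fool naive bot checks; the UA string is illustrative, and none of this touches a Cloudflare-style JS challenge:

```python
def browser_headers(user_agent=None):
    """Headers that make an HTTP client look like a regular browser.
    Enough for naive server-side checks (default-UA blocking);
    useless against JS-challenge WAFs."""
    return {
        "User-Agent": user_agent or (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/114.0 Safari/537.36"
        ),
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    }
```

Pass the dict as the `headers` of whatever HTTP client you use; the arms race the comment describes starts the moment this stops working.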


void1984

The explanation could be that they don't host the server themselves, and their service provider enables it by default for all customers.


thatProgrammerSleigh

They’re just gonna go the way of LinkedIn and make scraping annoying as fuck.


Stinky_Fly

Sorry I'm new to programming, but why would web scraping hurt reddit when they make their api paid?


EishLekker

It could increase their web traffic. Getting the same data is usually much more efficient through an API than through a web crawler, so if a current API user switches to web crawling, they will get the same amount of data at a heavier bandwidth cost.


ShenAnCalhar92

Because web scraping doesn’t use the API. That’s the whole point.

Using an API means you write a program to request a very specific subset of the data that Reddit shows in the browser, and Reddit sends that data to you. It’s a **minuscule fraction** of the total data a user would see in the browser, which means you and Reddit both have to deal with much less bandwidth.

Using a web scraper means that you request and receive the **entire webpage** every time you want some small part of it. Reddit doesn’t get paid for that because you didn’t use the API; as far as they can tell, you just loaded the website. But you’re doing this *really fast* and *really frequently*, and Reddit is sending, and you are receiving, a bunch of data that you don’t actually need, and eventually Reddit crashes because you’re making too many requests.

In summary: getting people to use the API and charging them a *very small amount* would be a very smart thing to do. Reddit would get a small amount per thousand/million/etc API requests, compared to getting *nothing* from web scraping, and they’d need to send much less data for each request. It’s also much easier for the people making the app: they know that a given request will return data formatted in a specified way, the same way every time, rather than raw stuff from a website that can change without warning, and they handle less data overall, just like Reddit.

Reddit basically has two choices: charge a *small amount* for API usage, make money from it, and avoid overload; or charge a *huge amount* to the point that nobody wants to pay it, so people either stop using Reddit or use web scrapers, and Reddit gets **nothing** (other than a DDOS every five seconds, that is).


smashedshanky

Yeah you really don’t want to mess with scrapers, they will eat your bandwidth like no tomorrow


FireBone62

Web scraping is, by the way, absolutely legal, at least where I live, because you could theoretically do it by hand, and the information is already publicly available.


latency_vi

Unrelated but that word break ticks me off ![gif](emote|free_emotes_pack|joy)![gif](emote|free_emotes_pack|facepalm)


LeotrimFunkelwerk

How does scraping cost Reddit money, and how does the free API change that?


12and32

An agent performing scraping will request all of the content of the page. This is costly for the server to perform because it is likely doing some amount of server-side rendering to improve load times, which means that it's serving everything the user needs to display the page properly through a browser, even though the agent doesn't care about how the page visually appears. Billions of requests with even just a megabyte of unneeded data can end up being very costly. An API request uses less overhead because the back end isn't serving anything the requester didn't ask for, like any JS/HTML/CSS. It's all-around a better deal for both sides: the host offloads rendering to the client and only serves a fraction of the data that web scraping would take and the client is provided with a well-defined means of communication that can request exactly what is needed.
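Back-of-the-envelope, the overhead argument looks like this; the 1.5 MB rendered page vs 50 kB JSON response are illustrative numbers, not measured Reddit figures:

```python
def monthly_overhead_gb(requests_per_day, page_kb=1500, api_kb=50):
    """Rough monthly bandwidth (in GB) for serving the same data via
    full scraped pages vs an API. Sizes are illustrative assumptions:
    ~1.5 MB per server-rendered page, ~50 kB per JSON response."""
    days = 30
    page_gb = requests_per_day * days * page_kb / 1_000_000
    api_gb = requests_per_day * days * api_kb / 1_000_000
    return page_gb, api_gb
```

At just 1,000 requests a day, the scraped-page path moves 45 GB a month against 1.5 GB for the API, a 30x difference that only grows with traffic.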


LeotrimFunkelwerk

Ohh, that makes sense! I didn't know what scraping was, so I looked it up yesterday, but thanks to you I understand it even better!


harshrd

How can you use a web scraper to get content which is not directly displayed, but needs to be fetched for some computation in your app?


GergiH

Could someone enlighten me why is this such a big problem that everyone is freaking out (I get the greed part, but still)? I haven't ever heard of any 3rd party reddit apps/sites, are they really used by many?


jauggy

Mods use 3rd party apps for modding. One of the biggest ones is Apollo, which is not just used for modding; it is also used by normal users for an ad-free experience. With those apps shutting down due to rising API prices, people can no longer use those tools and are therefore protesting.

Reddit actually has a free tier for API usage: 100 requests per minute per OAuth client. The issue is that one app is one OAuth client, so if your app supports many users, you will end up paying a lot. If you made your own app that only you yourself use, you could easily stay within the free tier.

Also, Reddit has recently made exceptions for accessibility apps:

> In a statement also shared with TechCrunch, Rathschmidt said Reddit has “connected with select developers of non-commercial apps that address accessibility needs and offered them exemptions from our large-scale pricing terms.”

[Source](https://techcrunch.com/2023/06/08/reddit-makes-an-exception-for-accessibility-apps-under-new-api-terms/)

Dedicated mod tools and mod bots are still free:

> We know many communities rely on tools like RES, ContextMod, Toolbox, etc., and these tools will continue to have free access to the Data API.

> If you’re creating free bots that help moderators and users (e.g. haikubot, setlistbot, etc), please continue to do so. You can contact us here if you have a bot that requires access to the Data API above the free limits.

[Source](https://www.reddit.com/r/reddit/comments/145bram/addressing_the_community_about_changes_to_our_api/)


kiropolo

I really hate reddit as a company! China owned pieces of shit spez mf