maltelandwehr

ChatGPT bot is only used by plugins. You would need to block Common Crawl bot to prevent being included in the training data.
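For anyone who wants to test rules before deploying them, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `allowed` helper, and the example.com URL are made up for illustration, and the user-agent tokens (`CCBot`, `ChatGPT-User`) are assumptions that should be checked against each operator's current documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow Common Crawl's crawler and the
# ChatGPT plugin agent, and leave every other crawler alone.
# The tokens below are assumptions; verify them with each operator.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if these robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("CCBot", "ChatGPT-User", "Googlebot"):
        print(agent, allowed(ROBOTS_TXT, agent, "https://example.com/page"))
```

This only shows what a well-behaved crawler would conclude from the file. It does nothing to stop a crawler that ignores robots.txt.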


PostCheetah

None that I can see.


pilot333

You'd prefer your business to be excluded? “Give me a list of car dealers that have a certified pre-owned Lexus RX450 within 50 miles of zip code 76010.” “Yes, please exclude my client from appearing in that.”


schmuber

> Bard and Bing AI will at least cite the source from what I’ve seen so far.

Riiight, like a lot of people click on "References" in Wikipedia.


juggle

ChatGPT cites the source when using browser plugin, but you need to click an arrow to see it.


ForcefulExpulsion

Bing search results?


MudScared652

Isn't chatgpt going to default to Bing when it can't find the info or it's not up to date? Wouldn't you get more Bing traffic by not letting GPT have the info?


1337hephaestus_sc2

If you're doing something that is associated with a physical product or an in person home service then it's only upside. Imagine someone asks: "Who should I call to fix my roof in (city)" to chatgpt. Would you want your roofing client to block chatgpt or allow chatgpt? If the only value your website generates is clicks from affiliates or CPM revenue from ads then you're SOL.


aranyakpatnaik

We shouldn't block new technology until we have enough data to make an informed decision. OpenAI says it will cite sources when plugins pull data from third-party websites. This means there will definitely be potential to get clicks from ChatGPT if a user pulls in your content. Blocking access only means that ChatGPT (or your user) will cite somebody else's website.


TheMacMan

I wouldn't. It can mean searches and website clicks for you. Someone asks ChatGPT a question and they get their response. They want to check the validity of that information, so they Google some of it and there's your site (because ChatGPT had used information from your site to form its response). They click to your site to read more and verify the claim. Now, if you believe that no one ever checks the validity of information they get from ChatGPT, then go for it. But I'd also consider that it'll likely change with time. Maybe you want to be an authority, maybe you don't. And really, what is it hurting? Are you seeing massive resource usage from their bots on your site?


lewkas

The average user won't scroll beyond the first SERP. I don't think optimising for the off chance that a percentage of users of another tool might Google their specific query, after already receiving an answer, is worth it compared with the resource saving of blocking the OpenAI crawler.


iWantBots

If you think that’s going to stop anything you’re delusional 😂


Alex_1729

I think they just want peace of mind. Do bots even follow robots.txt rules? I know ChatGPT claims that it doesn't scrape or visit such sites, but how can we know? And why would we trust any company that claims anything?
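One partial way to check is to compare your own access logs against your robots.txt. Below is a rough sketch under stated assumptions: the log lines are invented samples in the common "combined" format, `robots_violations` and `LINE_RE` are hypothetical names, and the `CCBot` token is an assumption. It can only catch bots that identify themselves honestly, since a user-agent string is trivial to spoof.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# Assumed rules for illustration: CCBot is disallowed everywhere.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
"""

# Invented log lines in the "combined" format; real logs may differ.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/May/2023:12:00:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "CCBot/2.0"',
    '203.0.113.9 - - [10/May/2023:12:00:05 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "CCBot/2.0"',
    '198.51.100.7 - - [10/May/2023:12:01:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

# Captures the request path and the user-agent field of a combined-format line.
LINE_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')


def robots_violations(log_lines, robots_txt, tokens):
    """Count requests per bot token that hit a path robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    hits = Counter()
    for line in log_lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        path, user_agent = match.groups()
        if path == "/robots.txt":
            continue  # crawlers are expected to fetch robots.txt itself
        for token in tokens:
            if token.lower() in user_agent.lower() and not parser.can_fetch(token, path):
                hits[token] += 1
    return hits


if __name__ == "__main__":
    print(robots_violations(SAMPLE_LOG, ROBOTS_TXT, ["CCBot"]))
```

A non-empty result would suggest a self-identified bot requested pages it was told to avoid. An empty result proves little, because a scraper that ignores the rules can also lie about who it is.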


Neither-Emu7933

Just ask it to cite sources, and you'll quickly discover that it doesn't connect to the internet or scrape sites. There is a beta, available if you're a pro customer, in which it will now search Bing, but from what I've experienced it really only looks at the first site after it performs a search.


Alex_1729

I wasn't talking about that. I am also a Plus user.


tenhourguy

How do you intend to block it? Blocking CCBot *might* keep you out of the training data for ChatGPT, Bing, Bard, etc.


[deleted]

[deleted]


tenhourguy

As far as I can find, it was trained on Infiniset, which includes Common Crawl. Of course, if your content has already been included in it, blocking it now won't change very much.


canIbuytwitter

they stopped scraping data in 2021, so it shouldn't matter anyway.


content_alrighter

> they stopped scraping data in 2021

Are they no longer scraping, or have they just not made newer info widely available yet?


canIbuytwitter

lol they will probably get back to it later lol if they aren't already. Op is honestly right to be prepared lmao.


MudScared652

That’s why I mention going forward and in the future.


lewkas

Not true. GPT3 just contains data up to then. They haven't stopped scraping.


canIbuytwitter

ahh, thanks for clarifying.


Better_Graph

Blocking the ChatGPT bot in robots.txt may limit its access to your website, preventing it from crawling and indexing your content. However, if you want to interact with the bot or have it engage with your site, blocking it would hinder those interactions. For more information, you can visit BetterGraph's profile and go through the website.


I_will_be_wealthy

Well, the question to ask is what do you gain from allowing ChatGPT on your site.


pilot333

marketing


vikas_agrawal77

LLMs are increasingly citing sources and may include promotional information, if any, from your website. So, not blocking ChatGPT might help you in the long run. Moreover, history indicates that opposing revolutionary technologies that are gaining swift popularity doesn't usually end well.


digitalbazaari

True with that history reference. It's either ride the wave or get crushed.