The Right robots.txt Rules to Block AI Crawlers on WordPress

My own site has genuinely been crawled to the point of crashing by the Facebook crawler and Amazonbot, those two pests, so I now add a robots.txt file to every WordPress site I run.

It's worth learning each crawler's name. Besides the familiar search-engine bots, there are plenty of junk crawlers that do nothing but burn your server resources: bots that crawl your site just to produce SEO reports for other people, for instance. Who can put up with that?

About the two pests that crashed my site: Facebook's crawler identifies itself as meta-externalagent, and Amazon's is Amazonbot. OpenAI's GPTBot is just as scary. These AI-company crawlers are all ferocious.

If you can't wait for robots.txt to take effect (crawlers only re-fetch it periodically, which can take up to 24 hours or longer), pull their IPs straight from your server logs and block the whole IP ranges. Tutorial on blocking IP ranges: https://www.shoushai.com/p/982
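To find which IPs to block, you can mine the access log yourself. Here is a minimal Python sketch, assuming the common Nginx/Apache "combined" log format; the bot list mirrors the crawlers named in this post, and the log path in the usage comment is hypothetical:

```python
"""Minimal sketch: find the noisiest crawler IPs in a web server access log.

Assumes the common Nginx/Apache "combined" log format, where the client IP
is the first field and the User-Agent is the last quoted field.
"""
import re
from collections import Counter

# Captures the client IP (first field) and the final quoted field (User-Agent).
LINE_RE = re.compile(r'^(\S+) .*"([^"]*)"$')

# Crawlers called out in this post; extend as needed.
BAD_BOTS = ("meta-externalagent", "Amazonbot", "GPTBot")

def top_bot_ips(lines, limit=10):
    """Count requests per IP for lines whose User-Agent mentions a bad bot."""
    hits = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # skip lines that don't look like the combined format
        ip, ua = m.groups()
        if any(bot in ua for bot in BAD_BOTS):
            hits[ip] += 1
    return hits.most_common(limit)

# Usage (the log path is hypothetical; adjust to your server):
#   with open("/var/log/nginx/access.log", errors="replace") as f:
#       for ip, n in top_bot_ips(f):
#           print(f"{n:8d}  {ip}")
```

The IPs this prints are candidates for a firewall or IP-range ban as described in the tutorial linked above.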

robots.txt rules (for WordPress):

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-login.php?redirect_to=*
Disallow: /go?_=*
Allow: /wp-admin/admin-ajax.php
User-agent: GPTBot
Disallow: /
User-agent: meta-externalagent
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: MJ12bot
Disallow: /
User-agent: YisouSpider
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: SemrushBot-SA
Disallow: /
User-agent: SemrushBot-BA
Disallow: /
User-agent: SemrushBot-SI
Disallow: /
User-agent: SemrushBot-SWA
Disallow: /
User-agent: SemrushBot-CT
Disallow: /
User-agent: SemrushBot-BM
Disallow: /
User-agent: SemrushBot-SEOAB
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: DotBot
Disallow: /
User-agent: Uptimebot
Disallow: /
User-agent: MegaIndex.ru
Disallow: /
User-agent: ZoominfoBot
Disallow: /
User-agent: Mail.Ru
Disallow: /
User-agent: BLEXBot
Disallow: /
User-agent: ExtLinksBot
Disallow: /
User-agent: aiHitBot
Disallow: /
User-agent: Researchscan
Disallow: /
User-agent: DnyzBot
Disallow: /
User-agent: spbot
Disallow: /
User-agent: YandexBot
Disallow: /
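Since crawlers parse robots.txt mechanically, it's worth sanity-checking the rules before deploying them. Below is a minimal sketch using Python's standard `urllib.robotparser` against an abridged copy of the rules above. One caveat: Python's parser honors the first matching rule within a group, so the `Allow` line is placed before the broader `Disallow` in this test copy (search engines like Google instead use longest-match precedence):

```python
"""Minimal sketch: verify the robots.txt rules block the right bots,
using Python's stdlib parser. The rules string is an abridged copy of
the file above."""
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/

User-agent: GPTBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())

# The AI crawlers are shut out of the whole site,
# ordinary browsers can still read posts,
# and the admin area stays off-limits to everyone.
print(rp.can_fetch("GPTBot", "https://example.com/some-post/"))
print(rp.can_fetch("Mozilla/5.0", "https://example.com/some-post/"))
print(rp.can_fetch("Mozilla/5.0", "https://example.com/wp-admin/"))
```

If a check comes back wrong, fix the rules before the real file goes live; a crawler that reads a broken robots.txt may treat the whole site as allowed.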

This article was contributed by a reader and does not represent the views of Shoushai. When reposting, please credit the source: https://www.shoushai.com/p/985
