How do you check whether your website is being crawled by a bot or spider? And what if you want to expose certain sections of your website to search engines that would otherwise be hidden from them because they are protected resources?
First of all, you need to know which bots and spiders are out there.
You can find lists of search engine bots/spiders and their user agents at the links below:
http://www.robotstxt.org/db.html
https://github.com/monperrus/crawler-user-agents/blob/master/crawler-user-agents.json
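The second link is a machine-readable file, so you can match against it instead of maintaining your own keyword list. Below is a minimal sketch of that idea. It assumes the JSON is an array of objects whose `pattern` field holds a regular expression (please confirm the format against the repository's README), and the function names `loadCrawlerPatterns()` and `matchesCrawlerList()` are hypothetical, made up for this example.

```php
<?php
// Sketch: match a User-Agent against patterns from a crawler list.
// Assumption: the JSON is an array of objects with a "pattern" key holding a regex
// (the format assumed for crawler-user-agents.json). Function names are hypothetical.

function loadCrawlerPatterns(string $json): array
{
    $entries = json_decode($json, true);
    if (!is_array($entries)) {
        return []; // invalid JSON or not an array
    }
    $patterns = [];
    foreach ($entries as $entry) {
        // Keep only entries that actually carry a string "pattern".
        if (is_array($entry) && isset($entry['pattern']) && is_string($entry['pattern'])) {
            $patterns[] = $entry['pattern'];
        }
    }
    return $patterns;
}

function matchesCrawlerList(string $userAgent, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        // Wrap the raw pattern in "~" delimiters (assumes no pattern contains "~").
        // preg_match() returns 1 on a match, 0 on no match, false on a bad pattern.
        if (preg_match('~' . $pattern . '~', $userAgent) === 1) {
            return true;
        }
    }
    return false;
}
```

In practice you would read a downloaded copy of the file once (for example with `file_get_contents()`, checking that the read succeeded), pass it to `loadCrawlerPatterns()`, and cache the resulting array rather than re-parsing the JSON on every request.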
Now to the main part: how do we detect them?
In most cases, the logic can be as simple as recognizing the User-Agent string the crawler sends.
Here is a function that I use most of the time:
```php
function isBot() {
    return (
        isset($_SERVER['HTTP_USER_AGENT']) // check if the user agent header key exists
        && preg_match('/bot|crawl|spider|mediapartners|slurp|patrol/i', $_SERVER['HTTP_USER_AGENT'])
    );
}
```
You can then call the function like this:
```php
if (isBot()) {
    // expose something here
} else {
    // protect the resource
}
```
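Because `isBot()` reads `$_SERVER` directly, it is awkward to test outside a real web request. A minimal sketch of one way around that, keeping exactly the same keyword regex, is to pass the User-Agent in as a parameter. The name `isBotUserAgent()` is hypothetical, introduced only for this example.

```php
<?php
// Sketch: the same keyword check as isBot(), but with the User-Agent passed in,
// so it can be exercised without a web server. isBotUserAgent() is a hypothetical name.

function isBotUserAgent(?string $userAgent): bool
{
    // A missing header (null) is not treated as a bot, mirroring the isset() check.
    if ($userAgent === null) {
        return false;
    }
    // Same case-insensitive keyword list; preg_match() returns 1 on a match.
    return preg_match('/bot|crawl|spider|mediapartners|slurp|patrol/i', $userAgent) === 1;
}

// Try a few sample User-Agent strings.
$samples = [
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];
foreach ($samples as $ua) {
    echo (isBotUserAgent($ua) ? 'bot:   ' : 'human: ') . $ua . PHP_EOL;
}
```

`isBot()` could then simply `return isBotUserAgent($_SERVER['HTTP_USER_AGENT'] ?? null);`. Keep in mind that a plain substring match such as `bot` also flags any User-Agent that merely contains those letters, so this is a heuristic rather than an exact classifier.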
This is the simplest answer to the question of how to detect search engine bot and spider visits to your site.
I hope this helps you detect crawlers with PHP.
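One caveat: the User-Agent header is supplied by the client, so anyone can send a string that claims to be Googlebot. If you use the check to unlock protected content, a stricter approach (the one Google documents for verifying Googlebot) is a reverse DNS lookup on the visitor's IP, a check that the hostname belongs to the crawler's domain, and a forward lookup confirming the hostname resolves back to the same IP. Below is a minimal sketch under stated assumptions: it needs PHP 8.0+ for `str_ends_with()`, the helper names `hostnameHasSuffix()` and `verifyCrawlerIp()` are hypothetical, and the resolvers can be injected so the logic can be tested without network access.

```php
<?php
// Sketch (PHP 8.0+ for str_ends_with): confirm that an IP claiming to be a crawler
// resolves to the crawler operator's domain and back again.
// hostnameHasSuffix() and verifyCrawlerIp() are hypothetical helper names.

function hostnameHasSuffix(string $host, array $allowedSuffixes): bool
{
    $host = strtolower(rtrim($host, '.')); // normalise case and any trailing root dot
    foreach ($allowedSuffixes as $suffix) {
        // Require a dot boundary so "evilgooglebot.com" does not pass as "googlebot.com".
        if (str_ends_with($host, '.' . strtolower($suffix))) {
            return true;
        }
    }
    return false;
}

function verifyCrawlerIp(
    string $ip,
    array $allowedSuffixes,
    ?callable $reverse = null,
    ?callable $forward = null
): bool {
    // Default to PHP's resolvers; callers may inject their own (e.g. for tests).
    $reverse ??= 'gethostbyaddr';
    $forward ??= 'gethostbynamel';

    // 1. Reverse lookup: gethostbyaddr() returns the IP unchanged on failure
    //    and false on malformed input.
    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }

    // 2. The hostname must belong to one of the crawler's domains.
    if (!hostnameHasSuffix($host, $allowedSuffixes)) {
        return false;
    }

    // 3. Forward lookup must map the hostname back to the same IP.
    //    gethostbynamel() returns IPv4 addresses only (or false), so IPv6
    //    visitors will not verify with the default resolver.
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

In a request handler you might call `verifyCrawlerIp($_SERVER['REMOTE_ADDR'], ['googlebot.com', 'google.com'])` only after `isBot()` returns true, and cache the result per IP, since each check costs two DNS lookups. The domain list here is an assumption; confirm it against each search engine's own documentation.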
Best,