To give proper credit (I have no affiliation with the site), here is the link to where I found the info below. I was searching Google for the term "search engine friendly forum" and stumbled on this site. Curious to know your thoughts. I know session IDs have been discussed already, but I'm still curious, especially after reading the snippet below, courtesy of:
rayburgettdesigns (.) com/adding-a-search-engine-friendly-forum/
"What is a Search Engine Friendly Forum?
The major problem with forums is the usage of sessionIDs. SessionIDs are URL parameters that are used to identify forum visitors. They look like this: http://www.domain.com/page.php?PHPSESSID=as8d87ad68a7sd9a6 When you first open a forum page, you are assigned a unique sessionID parameter, which is added to all the forum URLs that you visit. The forum script will identify you by this unique session value. That situation creates many different URLs for the same pages. From a search engine point of view, this is a lot of duplicate content (many urls with the same content).
To make your forum spider-friendly you simply have to get rid of the sessionIDs. Unfortunately, most popular forums need additional modifications to implement this behavior.
Crawlers behave just like anonymous visitors. A search engine friendly forum is a forum that does not use sessionIDs for guest visitors. Additionally, I recommend that you disable sessionIDs for the visitors who have cookies enabled. You can find the solution at the bottom of this article."
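The rule the article describes (no session ID for guests, which includes crawlers, and none for members whose browser already sends the session cookie) can be sketched roughly like this. This is not ElkArte's code. `forum_link`, the example base URL and the second session value are hypothetical. It also assumes the parameter is PHP's default `PHPSESSID` and that query strings are `&`-separated:

```python
from urllib.parse import urlencode

# Assumption: PHP's default session name and "&"-separated query strings.
SESSION_PARAM = "PHPSESSID"


def forum_link(script_url: str, params: dict, session_id: str,
               *, is_guest: bool, has_session_cookie: bool) -> str:
    """Hypothetical link builder: append the session ID only when it is needed."""
    query = dict(params)
    if not is_guest and not has_session_cookie:
        # Logged-in member without cookies: the URL is the only way to carry the session.
        query[SESSION_PARAM] = session_id
    return f"{script_url}?{urlencode(query)}" if query else script_url


# Two guest visits (e.g. two crawler passes) with different sessions
# produce the same URL, so there is no duplicate content.
base = "https://example.com/index.php"
a = forum_link(base, {"topic": "3989.0"}, "as8d87ad68a7sd9a6",
               is_guest=True, has_session_cookie=False)
b = forum_link(base, {"topic": "3989.0"}, "0f1e2d3c4b5a6978",
               is_guest=True, has_session_cookie=False)
print(a == b)  # True
```

Only the cookie-less, logged-in case still gets `?topic=3989.0&PHPSESSID=...`. Crawlers never log in, so they would never see it.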
Thoughts, Elksters?
Doesn't Elk normally leave the session ID out of the query parameters? Anyway, I think all modern search engines do something like this:
What may be unique to Google is that you can explicitly tell it to ignore specific URL parameters (see the page I quoted from).
That's interesting. I took a snippet from one of my members' posts and searched for it in Google, in quotes. Instead of taking me to the actual post, Google takes me to the member's profile page. There are no other results in Google, just the member's profile. I wonder why it's doing that? Maybe it will come back, crawl again, and give me the link to the actual post. It was a relatively new post.
Google gives extra weight to recent content, so pages that update more often, such as member profiles, will rank higher. I'm sure some robots.txt trickery could suppress that.
Also worth noting: my new site http://discuss.scot, which currently only has bots posting, is only a few weeks old and is being indexed with PHP session IDs in the URLs too. Whether that's optimal remains to be seen, but it's working. Feel free to sign up to bump the member count :P
https://www.google.co.uk/search?q=site:discuss.scot&client=safari&rls=en&biw=1920&bih=924&ei=Hx6bWLfkJ92ogAaSr5_ACQ&start=10&sa=N
From a quick read, though, sitemaps may sort this out. I'll open a feature request.
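A sitemap helps because it hands the search engine a list of clean, session-free URLs to prefer. Below is a minimal sketch using Python's standard `xml.etree.ElementTree` and the sitemaps.org 0.9 namespace. `build_sitemap` is hypothetical, and the `?topic=N.0` URL scheme is only modelled on the topic links quoted later in this thread:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(script_url: str, topic_ids: list[int]) -> bytes:
    """Return sitemap XML with one session-free URL per topic (hypothetical scheme)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for topic_id in topic_ids:
        url = ET.SubElement(urlset, "url")
        # No PHPSESSID here: only the canonical form of each topic URL is listed.
        ET.SubElement(url, "loc").text = f"{script_url}?topic={topic_id}.0"
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


print(build_sitemap("https://example.com/index.php", [3887, 3989]).decode("utf-8"))
```

A sitemap is only a hint. Crawlers can still discover session-laden URLs through links, so this complements rather than replaces keeping session IDs out of guest pages.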
I thought the system was supposed to try to detect when it's being browsed by a bot and, in those cases, not add PHPSESSID to the URL (for all the reasons above).
I'll have a look and see whether the detection can be improved or, if it's fine, why PHPSESSID is being added when possibly_robot is true.
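Bot detection of this kind usually comes down to matching the User-Agent header. Here is a rough sketch of such a check. It is not ElkArte's actual `possibly_robot` logic. The pattern is a hypothetical, deliberately short list, and treating an empty User-Agent as a robot is an assumption:

```python
import re
from typing import Optional

# Hypothetical, deliberately short pattern; real forum software keeps a
# much longer, maintained list of known spiders.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def possibly_robot(user_agent: Optional[str]) -> bool:
    """Guess whether a request comes from a crawler, based on its User-Agent."""
    if not user_agent:
        # Assumption: a missing User-Agent is more likely a script than a browser.
        return True
    return BOT_PATTERN.search(user_agent) is not None


print(possibly_robot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True
```

A check like this can only catch crawlers that identify themselves. Any fetcher that presents a normal browser User-Agent will be treated as a regular guest and, depending on cookie handling, may still be handed session-laden URLs.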
You're welcome to admin access if you want to poke about. The session IDs don't bother me, since indexing is happening, as we can see.
Last time I checked, the session ID is not added for robots, but Google doesn't really behave like a bot any more and doesn't much care about what we think it should care about, so on the first pass it picks up the PHPSESSID in the URLs.
After a while, though, the canonical attribute kicks in and the session ID is removed:
https://www.google.com/search?q=site:elkarte.net
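The canonical attribute works because each page declares its own session-free address in a `<link rel="canonical">` tag. Search engines can then fold the PHPSESSID variants into that one URL. The sketch below shows how such a tag could be derived from the requested URL. It is not ElkArte's implementation. It assumes the parameter is `PHPSESSID` and the query string is `&`-separated, and `strip_session_id` and `canonical_link_tag` are hypothetical names:

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAM = "PHPSESSID"  # assumption: PHP's default session name


def strip_session_id(url: str) -> str:
    """Return the URL with the session-ID query parameter removed."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != SESSION_PARAM]
    return urlunsplit(parts._replace(query=urlencode(query)))


def canonical_link_tag(request_url: str) -> str:
    """Build the <link rel="canonical"> tag for the session-free URL."""
    return f'<link rel="canonical" href="{escape(strip_session_id(request_url))}">'


print(canonical_link_tag(
    "http://www.elkarte.net/community/index.php?PHPSESSID=as8d87ad68a7sd9a6&topic=3989.0"))
# <link rel="canonical" href="http://www.elkarte.net/community/index.php?topic=3989.0">
```

Every session variant of a page maps to the same tag, which is why the PHPSESSID URLs eventually drop out of the index.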
Anyway, @elk_is_cool, if you look through your posts, you'll see we have already discussed these topics:
http://www.elkarte.net/community/index.php?topic=3989.0
http://www.elkarte.net/community/index.php?topic=3887.0
I know; I did mention that in my opening post. It was the article I came across that renewed my curiosity about this subject. I guess I should have posted in one of the original threads about session IDs. I'm just trying to get the "cleanest URL", not only for Google but also for Bing, Yahoo, WebCrawler, AOL, etc., kind of like how WordPress does it.