Roundup Thursday for the Week of 6/29/08

July 3rd, 2008 admin Posted in Seo | No Comments »

Posted by rebecca

Stories, news, and other notable items from the past week:

One star link:

  • Hacker News started a thread asking why hackers hate SEO. Most of the answers are frustrating, though some folks chime in to defend our industry.

Five star links:

  • Over at SEO Scoop, James Duthie compares the quality of traffic from SEO and SMO and concludes that "search engines generate superior quality traffic." It’s important to keep in mind that while social media marketing is still a great tactic to leverage, it’s not a replacement for natural search engine traffic.

Featured resumes:

Currently looking:

  • Austin Wisner is an SEO and PPC marketer who also has extensive writing experience.
  • David Arrington is the founder and CEO of Blue Fusion Marketing and has expertise in running a successful online marketing company.
  • Joe Williams is seeking an organic SEO position with an SEO/marketing firm or long-term contract work with an organization interested in moving to the top of major search engines. He brings SEO knowledge and experience building SEO-optimized websites and integrating e-commerce packages.
  • Babul Paul is an SEO/SEM manager who is looking to leverage his experience in Internet marketing, web development, and email marketing.

Happily employed:

  • Samual is an SEM expert with over two years’ experience in Internet marketing and SEO.

Reddit, Stumbleupon, Del.icio.us and Hacker News Algorithms Exposed!

July 3rd, 2008 admin Posted in Seo | No Comments »

Posted by Danny Dover

It is deeply ironic that algorithms, the quintessential example of all that is not human, are so fundamental to social media. Last week I wrote a post about how Google gathers user data. This week I continue by exposing how popular social media websites use algorithms to put user data to work.

Although humans power social media, it is algorithms that provide the frameworks that make user input useful. As the countless social sites online show, finding the correct mix of participation and rules can be extremely difficult. Below are some of the algorithms that, when combined with the right people, have proven successful.

Popular Social Media Algorithms

Y Combinator’s Hacker News:

Hacker News Rank

Formula:

(p - 1) / (t + 2)^1.5

Description:

Votes divided by an age factor.

p = votes (points) from users.
t = time since submission, in hours.

1 is subtracted from p to cancel out the submitter’s own vote.
The age factor is (time since submission in hours, plus two) raised to the power of 1.5.

Source: Paul Graham, creator of Hacker News
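
As a rough illustration, the formula above can be written as a small Python function. The function name, the sample stories, and the `gravity` parameter name are assumptions made for this sketch; only the formula itself comes from the description.

```python
def hacker_news_rank(points: int, age_hours: float, gravity: float = 1.5) -> float:
    """Score a story as (p - 1) / (t + 2) ** 1.5, per the formula quoted above."""
    # Subtract 1 to cancel out the submitter's own vote.
    votes = points - 1
    # Age factor: hours since submission plus two, raised to the power of 1.5.
    age_factor = (age_hours + 2) ** gravity
    return votes / age_factor


# Hypothetical stories: (title, points, hours since submission).
stories = [
    ("old-but-popular", 120, 24.0),
    ("fresh", 15, 1.0),
    ("brand-new", 1, 0.0),
]

# Highest score first: the ordering the front page would use under this formula.
ranked = sorted(stories, key=lambda s: hacker_news_rank(s[1], s[2]), reverse=True)
for title, points, age in ranked:
    print(f"{title}: {hacker_news_rank(points, age):.3f}")
```

In this sample, the one-hour-old story with 15 points outranks the day-old story with 120 points, which shows how quickly the age factor outweighs raw votes.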


Reddit:

Reddit Rank

Formula:

Reddit Algorithm

Description:

First of all, the time 7:46:43 am on December 8th, 2005 is a constant used to determine the relative age of a submission. (It is likely the time the site launched, but I have not been able to confirm this.) The time the story was submitted minus this constant is ts. ts works as the force that pulls stories down the front page.
y represents the relationship of up votes to down votes.

45000 is the number of seconds in 12.5 hours. This constant is used in combination with y and ts to "water down" votes as they are made farther and farther from the time the article was submitted.

log10 is also used to make early votes carry more weight than late votes. In this case, the first 10 votes have exactly as much weight as votes 11 through 100.

Source: code.reddit.com, Redflavor.com and Hacker News user Aneesh
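
The formula image did not survive here, so the following Python sketch is pieced together from the description and should be read as an assumption, not as reddit's actual code. In particular, treating the constant as UTC, taking y as the sign of (up votes minus down votes), and the name z for the size of the net vote count (at least 1) are guesses that happen to fit the description of ts, 45000, and log10.

```python
from datetime import datetime, timedelta, timezone
from math import log10

# The constant from the description; reading it as UTC is an assumption.
REDDIT_EPOCH = datetime(2005, 12, 8, 7, 46, 43, tzinfo=timezone.utc)


def reddit_rank(ups: int, downs: int, submitted: datetime) -> float:
    """Assumed form of the ranking: log10(z) + y * ts / 45000."""
    net = ups - downs
    y = (net > 0) - (net < 0)  # 1, 0, or -1 (assumed meaning of y)
    z = max(abs(net), 1)       # assumed: net votes, floored at 1 so log10 is defined
    ts = (submitted - REDDIT_EPOCH).total_seconds()
    return log10(z) + y * ts / 45000


t0 = datetime(2008, 7, 3, tzinfo=timezone.utc)
# Under this sketch, ten net votes are worth the same as being submitted 12.5 hours later.
early_popular = reddit_rank(10, 0, t0)
later_single = reddit_rank(1, 0, t0 + timedelta(seconds=45000))
print(early_popular, later_single)
```

If this reading is right, every tenfold increase in net votes buys a story the same boost as 12.5 hours of freshness, which is why newer stories steadily displace older ones.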


StumbleUpon:

Stumbleupon Rank

Formula:

(Initial stumbler audience / # domain) + ((% stumbler audience / # domain) + organic bonus – nonfriend) – (% stumbler audience + organic bonus) + N

Description:

The initial stumbler’s "power" (the audience of the initial stumbler divided by the number of times that stumbler has stumbled the given domain) is added to the sum of all the subsequent stumblers’ powers.

Subsequent stumbler power is ((percentage of the audience the stumbler makes up divided by the number of times the given stumbler has stumbled the domain) + a predetermined power boost for using the toolbar - a predetermined power drain if stumblers are connected) + (% of the stumbler audience + a predetermined boost for using the toolbar).

N is a "safety variable" so that the assumed algorithm is flexible. It represents a random number.

Source: Tim Nash at The Venture Skills Blog (2007). Please see his blog post for more in-depth information.


Del.icio.us:

Del.icio.us Rank

Formula:

Points = (Number of times story has been bookmarked in the last 3600 seconds)

Description:

Rank on Del.icio.us Popular is determined by comparing points. Points represent the number of times a story has been bookmarked in the last hour. The higher the rate, the higher the points. Every bookmark counts as one point.
3600 is the number of seconds in one hour.

Source: Based on my extended observations of Del.icio.us Popular
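
A minimal Python sketch of that observation follows. The function name and the use of Unix timestamps are assumptions for illustration; this models my reading of the description, not Del.icio.us’s actual implementation.

```python
import time

WINDOW_SECONDS = 3600  # one hour, as in the formula above


def delicious_points(bookmark_times, now=None):
    """Count bookmarks made within the last hour; each counts as one point."""
    now = time.time() if now is None else now
    return sum(1 for t in bookmark_times if 0 <= now - t < WINDOW_SECONDS)


# Hypothetical bookmark timestamps, in seconds, relative to a fixed "now".
now = 10_000
bookmarks = [now - 10, now - 3599, now - 3600, now - 7200]
print(delicious_points(bookmarks, now=now))
```

Only the two bookmarks made less than 3600 seconds ago score here, so a story’s points fall back toward zero as soon as people stop bookmarking it.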


Avoiding the 10,000 lb Gorilla in the Room (Digg.com)

Digg is different. The company is a lot less transparent than the companies mentioned above. It is fearful of being gamed and in response has created a secretive algorithm that appears to be far more complex than those of its competition.

At a minimum I expect that Digg’s algorithm takes into account the following factors:

  • Submission Time
  • Submission Category
  • Submitter’s Digg authority
  • Submitter’s website-wide activity
  • Submitter’s friends and fans
  • Subsequent digger’s authority
  • Subsequent digger’s friends and fans
  • Subsequent digger’s geolocation
  • Subsequent digger’s HTTP referer

I look forward to hearing from the social media experts in the comments. Please let me know how I can improve this post. If any of you (experienced or inexperienced) would prefer to contact me privately, please feel free to do so via email, LinkedIn, or Twitter. Thanks!

Cracking Google’s 1,000 Page Barrier

July 2nd, 2008 admin Posted in Seo | No Comments »

Posted by Dr. Pete

One of the frustrations of doing SEO for large websites is the fact that Google makes it very difficult to see more than a small part of the search index. Even in Webmaster Tools, Google’s index search is built on the same mechanics as its web search, which only lets you see the first 1,000 pages of any result. Whether you’re trying to get pages discovered, struggling with duplicate content, confirming robots.txt changes, or doing advanced index sculpting, that 1,000-page barrier can be extremely limiting when you’re dealing with a site with 10,000 or more indexed pages.

So, how can we dig deeper into the index and really see the big picture?

The Tools – Site: and Inurl:

First off, you’re going to need a couple of tools. I’ll assume that most of you are familiar with Google’s "site:" command, which returns the indexed pages from any given domain or subdomain. Let’s take our friends here at SEOmoz as an example. Type "site:seomoz.org" into Google’s search box, and you’ll see a list of the site’s indexed pages along with Google’s estimate of the total.

The other command we’ll be using is "inurl:", which, paired with other search terms, restricts the results to only those containing a specific keyword in the URL. Paired with the "site:" command, Google only reveals indexed pages which contain those URL keywords.

The Tactic – Index Deconstruction

Using our SEOmoz example, how can we find out which pages are included in the roughly 12,000-page index when we can only see those pages 1,000 at a time? Those last three words are the key: we can only see 1,000 pages at a time, but depending on how we construct our searches, they don’t have to be the same 1,000 pages. By splitting up our index searches logically, we can break the full index up into manageable chunks. We’ll do this by using "inurl:" to force the "site:" command to show us the index through smaller windows.

An Example – Deconstructing SEOmoz

This is one of those techniques that’s much easier to illustrate with an example. Let’s say that we needed to dig deeply into SEOmoz’s 12,000 indexed pages. The first thing that we might do is to take a look at the main navigation to get an idea of the URL/folder structure of the site. Looking at the top-right navigation on SEOmoz, we see the following (I’ve added the numbers 1-6 - see below):

Other than "Home," the first link goes to the "/blog" folder. That looks promising, so let’s try out our combined "site:" and "inurl:" search: "site:seomoz.org inurl:blog".

After clicking the "omitted results" link to see the full list, we get 2,430 pages of the index that contain the word "blog." That’s a good start, so let’s see what we can do with a few more of the major folders (numbered above):

  1. inurl:blog - 2,430
  2. inurl:ugc - 712
  3. inurl:articles - 96
  4. inurl:tools - 29
  5. inurl:users - 5,880
  6. inurl:marketplace - 787

Not bad: with just 6 subfolders, we’ve accounted for 9,934 pages or over 80% of the index. This, of course, assumes minimal overlap, and the accuracy of Google’s numbers may be questionable (I’ll discuss some issues with "inurl:" at the end of the post), but it’s more than adequate to get the job done.
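
The arithmetic behind that figure can be checked with a few lines of Python. The counts are the ones listed above and the 12,000 total is the rough index size quoted earlier; the variable names are only illustrative.

```python
# Page counts reported by each "site:seomoz.org inurl:<folder>" search above.
folder_counts = {
    "blog": 2430,
    "ugc": 712,
    "articles": 96,
    "tools": 29,
    "users": 5880,
    "marketplace": 787,
}

INDEX_ESTIMATE = 12_000  # rough size of the full index, per the post

accounted = sum(folder_counts.values())
coverage = accounted / INDEX_ESTIMATE

# Folders that still exceed the 1,000-result limit and need further splitting.
oversized = [name for name, count in folder_counts.items() if count > 1000]

print(f"{accounted} pages accounted for ({coverage:.1%} of the index)")
print("Still over 1,000 results:", oversized)
```

Like the estimate in the text, this assumes the folders barely overlap; any page matching two keywords would be counted twice.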

Now, we’re left with a couple of groups, such as (5), that are still greater than 1,000 pages. At this point, you’ll have to use some logic and your knowledge of the site in question. As a frequent Moz user, I know that the "users" folder contains all of the user profiles. Digging a little, I can easily find that those profiles all contain "users/view." A new search on "inurl:users/view" reveals 5,810 user profiles, making up almost all of the pages in the "users" folder and almost half of the total index.

An Example – Canonical URLs

Most of the time, we aren’t going to be trying to deconstruct the entire Google index for a site, but just need to answer a specific question. Let’s take my own company site/blog as an example. Recently, I realized that I had left some loose ends in the code that were revealing both canonical and non-canonical URLs. So, for example, the same blog post might have the following two URLs:

  1. http://www.usereffect.com/topic/the-last-spam-youll-ever-need
  2. http://www.usereffect.com/index.php?id=154

I’ve recently made some code changes to fix the problem, but how do I find out if my fix is working? I simply look for "id" in the URL with a search command like "site:usereffect.com inurl:id". As of this writing, that search only shows 1 result, suggesting that my changes are having the desired effect.

Advanced Inurl Tips

I hope that I’ve demonstrated just how powerful two relatively simple search tools can be when effectively combined. Before you go out and put this to work, though, a couple of warnings about "inurl:", which has a tendency to misbehave.

First, "inurl:" seems to ignore punctuation, for the most part. A targeted search on the folder "inurl:/blog" returns the same results as "inurl:blog," which is to say that it returns every page that contains "blog" anywhere in the URL. In some cases, this won’t be a problem, but you’ll have to judge that on a case-by-case basis. Like standard Google search terms, "inurl:" only searches on whole words (but doesn’t seem to allow word stems), and you can only use a single word at a time in any given "inurl:" statement.

You can use multiple "inurl:" statements (one for each word) in your search, which are automatically combined with a logical AND. You can also use "-inurl:" to exclude specific URL keywords from any given search. Finally, you can combine "site:", "inurl:" and stand-alone keywords to target indexed pages by URL and content keywords in one statement.
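
To keep these rules straight when running many searches, a small helper can assemble the query string. This is only a sketch: the function and parameter names are made up for illustration, and it simply builds text to paste into Google’s search box rather than calling any Google API.

```python
def _check_single_word(word: str) -> str:
    """Each inurl: statement takes a single word, so reject empty or multi-word input."""
    if not word or any(ch.isspace() for ch in word):
        raise ValueError(f"inurl: expects a single word, got {word!r}")
    return word


def build_index_query(site, include=(), exclude=(), keywords=()):
    """Combine site:, inurl:, -inurl: and plain keywords into one search string."""
    parts = [f"site:{site}"]
    parts += [f"inurl:{_check_single_word(w)}" for w in include]   # combined with AND
    parts += [f"-inurl:{_check_single_word(w)}" for w in exclude]  # excluded URL keywords
    parts += list(keywords)                                        # content keywords
    return " ".join(parts)


print(build_index_query("seomoz.org", include=["blog"]))
print(build_index_query("usereffect.com", include=["id"]))
print(build_index_query("seomoz.org", include=["users"], exclude=["view"], keywords=["seo"]))
```

The first two calls reproduce the searches used in the examples above; the third combines an included URL keyword, an excluded one, and a content keyword in a single statement.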

Capitalising On The Ultimate Form Of Duplicate Content

July 1st, 2008 admin Posted in Seo | No Comments »

Posted by Jane Copland

The first time I ever accessed the Internet was from my mother’s work computer in late 1995. I was eleven years old and her homepage was set to Yahoo. I can’t really remember what it looked like, but Googling (oh, I hate the irony too) "Yahoo in 1995" produced a post by John Battelle with a magnificent screen cap of the portal in the mid-90s. This was thirteen years ago (so, over half my lifetime), and my memory might not be serving me very well, but I’m fairly sure that the first thing I ever searched for was song lyrics. Probably to a very bad 1995 song. My father wanted to try it next and he searched for the lyrics to "Flower of Scotland." That, I remember.

Today, searching for lyrics is a horrendous task. Most top-ranked lyrics websites look like MySpace threw up on GeoCities and, if I dare to click on a result, inundate my computer with pop-up advertising. Earlier today, I actually stumbled on an instance of a robotic voice congratulating me for having won two iPod nanos. To get a coherent result and not be presented with the "Are You Stupid?" test, you have to memorise which sites are worthwhile to click on.

How do search engines really determine which sites should rank well for song lyrics-related material? This niche seems to be relatively competitive, with advertising being the business model of choice. The first big problem is certainly duplicate content. This is an especially important question when it comes to lyrics because of people’s tendency to take a sample of a song they’ve heard and search for it without knowing the song’s name. If there are thousands of instances of the same song present online, how does a site make sure its version is ranked?

The suggestions Google shows for searches beginning with "lyrics" are a good place to start when analysing what search engines value for these types of searches.

Currently popular music obviously dominates. Choosing the search "lyrics to take a bow," you’ll see that Google presents results for a currently popular song with that name by Rihanna, as well as a track from 2007 by Leona Lewis and a fourteen-year-old song by Madonna. Edit: two YouTube videos have made it into the mix in the last twenty-four hours, taking out the Leona Lewis song.

The top three results, plus results five, six, seven and nine are all for the same Rihanna track. In the pages’ inlinks, I’ve included internal links, as some of these sites do interesting things with their internal link structure. When you look at the links for the LyricsMode.com page, you’ll see that tens of thousands of them appear to come from pages like this, which are results pages for failed queries. Instead of displaying no content, the site shows the top 100 most popular songs at the given moment. Given that the page has only 29 links from external sources, I have to believe that its internal work is quite important here.

1)  http://www.metrolyrics.com/take-a-bow-lyrics-rihanna.html
     663 inlinks - PR3
2)  http://www.completealbumlyrics.com/lyric/133088/Rihanna+-+Take+A+Bow.html
     72 inlinks - PR2
3)  http://justjared.buzznet.com/2008/03/14/rihanna-take-a-bow-lyrics/
     109 inlinks - PR6
5)  http://www.lyricsmode.com/lyrics/r/rihanna/take_a_bow.html
     68,982 inlinks - PR0
6)  http://www.lyricstop.com/t/takeabow-rihanna.html
     6 inlinks - PR0
7)  http://www.musicloversgroup.com/rihanna-take-a-bow-video-and-lyrics/
     415 inlinks - PR0
9)  http://www.celebridiot.com/2008/04/25/rihanna-take-a-bow-video-and-lyrics/
     130 inlinks - unranked

For comparison’s sake, here are the links and PageRanks for the domains:

http://www.metrolyrics.com/ - 1,879,225 inlinks - PR5
http://www.completealbumlyrics.com/ - 39,336 inlinks - PR6
http://justjared.buzznet.com/ - 447,097 inlinks - PR6
http://www.lyricsmode.com - 906,098 inlinks - PR5
http://www.lyricstop.com/ - 11,433 inlinks - PR5
http://www.musicloversgroup.com/ - 59,875 inlinks - PR4
http://www.celebridiot.com/ - 526,323 inlinks - PR4

On the surface, this seems totally unexplainable. Aside from a manual tweak which somehow acknowledges that lyrics are inherently duplicated, how do search engines justify ranking the same content over and over again?

Or is this the result of literally everything related to this query being duplicate content? If search engines filter duplicate content by simply lowering results that are duplicated, then it stands to reason that if all the results are duplicates, there is nothing else to be shown above the affected results. However, you’d think that adding your own content and hiding the lyrics with something like an iframe, but still optimising for lyrics searches, would be beneficial. Or would this be considered too manipulative? Obviously, this would rule out searches where people type in snippets of songs they’ve heard and want to find. For those, could you pick out which parts of songs people are most likely to include in search queries (first words, repeated phrases, hooks, etc.) and include only those as indexable content, excluding the rest with whichever technique you choose? It could certainly be done, especially with iframes, and could probably look relatively natural.

The answer in regards to the Rihanna song may well be that the content is not in fact the same. Many of these lyrics websites rely on users to provide their content, and it seems to be rare that words are actually taken from official resources. Each result’s lyrics are slightly different.

The common wisdom is that duplicate content will still be singled out if a degree of similarity is detected. How similar do results have to be in order to be filtered? Also, there is unique content on each of these pages, the easiest and most common being user comments about the song. How much of the content has to be duplicated, and should it make a difference that the original comments are virtually hidden whilst the lyrics are front-and-centre?

If having only ever-so-slight unique content is all it takes, this changes our duplicate content landscape a bit. Currently, we’ll give people advice such as presenting duplicated (or substantially similar) content in an iframe and surrounding it with unique content to prevent a page from being filtered. Is it really enough to change instances of "closing" to "closin’ " and "cause" to "cuz?"
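
To get a feel for how much (or how little) such spelling swaps change a text, here is a small Python sketch using `difflib.SequenceMatcher` from the standard library. It is purely illustrative: nothing here reflects how any search engine actually measures duplication, and the two sample lines are made up rather than taken from a real song.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity between two texts, from 0.0 (no overlap) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


# Hypothetical lyric line and a user-submitted variant with two spelling swaps.
official = "the door is closing cause you never called"
variant = "the door is closin' cuz you never called"

print(f"{similarity(official, variant):.2f}")
```

Two changed words out of eight already pull the word-level score down to 0.75 on a single line. Across a full song with only a handful of such swaps the score would sit much closer to 1.0, which is what makes the question of where a filter draws its line so interesting.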

Perhaps a better indication of truly duplicate content would be a lesser-known song and one that has less room for interpretation when it comes to lyrics. For this, I chose "No Aphrodisiac" by Australian band The Whitlams. The song has two lines which could be up for interpretation as far as punctuation and spelling go. However, upon searching for "lyrics to no aphrodisiac," I see that all but one site replicates the same spelling and same punctuation.

I’d like to see what would happen if a site like Last.fm began offering lyrics. Last.fm, Pandora, and similar sites provide some of the highest quality online music content and are miles ahead of Lyricsdepot, A-Z Lyrics, and other lyrics databases. Last.fm has the web presence and the community to make such a campaign work: the main question would be whether they’d be interested in harnessing that market. For informed users, a Last.fm result would be far more satisfying than the pop-up ridden, hideous results that currently rule the SERPs.

Last and Pandora would also be optimising for a different purpose: 99.9% (I’d say 100% but someone would have to prove me wrong) of ranking lyrics websites are pushing ringtone advertisements; Last and Pandora sell premium subscriptions to their online "radio" stations. Both sites show advertising, but not nearly with the saturation of lyrics databases. I have little experience with Pandora, but Last also touts links to Amazon for users to purchase CDs and mp3s. These business models are very different and undoubtedly, very few lyrics searchers will end up converting into paying Last members. However, those who do often end up providing quite a healthy stream of income as repeat customers, and the commission earned from the Amazon links probably doesn’t go astray either.

Given the duplicate content and the overall horrifying quality of lyrics sites, I wonder how difficult it would be to rank well for these searches. Some of these sites’ link profiles are quite impressive, but if search engines’ goal is to provide the highest quality content to users, they would surely love to see a high-quality competitor take hold of the niche, whether that competitor was selling premium content or making its money through advertising.

As an aside, I have always found LyricsMode to be a lot better than most of these sites and it’s good to see their rankings steadily improving. I do believe, however, that there’s plenty of room for improvement in this lucrative market and if someone dares to make vast improvements, the rest of the market will follow suit.

An Initial Review of Boudica, the Social News Site for Women

July 1st, 2008 admin Posted in Seo | No Comments »

Posted by rebecca

Danny Sullivan’s lovely wife, Lorna Harris (who once lent me a hat and gloves when Danny took me to see Stonehenge on an especially cold, windy day), recently created Boudica, a social news site for women.

The site is pretty new and is currently in Beta testing mode, but I thought I’d give an initial review of things thus far. I’ll start with the following caveat: I’m not an especially girly female. I love Digg and reddit and don’t get offended by the "omg hot girls" content that frequents more male-dominated social media sites. However, having said that, though I’m not fawning over the latest fashion trends or counting down the days until the next Matthew McConaughey shirtless pec-baring chick flick rom com, I am still female, and thus I was intrigued by Boudica and wanted to take a peek under the hood and see what sort of content it has, the community it’s building, etc. I don’t intend this to be a scathing review of a site that’s "too girly" for my tomboy tastes. I’ll aim to be fair yet straightforward, and keep in mind that the following are nothing more than my personal opinion.

What I Like About Boudica

  1. The community doesn’t seem stereotypically "girly." Thankfully, unlike this satirical glimpse of what the Internet would look like if it were "ruled by females," Boudica features stories about women on the web, lists of "super foods," crazy fad diets, travel tips, gadgets, geek stuff, and more. In other words, you’ll find a lot of stories that frequent Digg and other social news sites. Sure, there are submissions about Sex and the City, body figures, and Desperate Housewives dinnerware, but there’s also content that I find appealing.
  2. I can discover articles and stories that may slip through the cracks on other social news sites. There was a submission that linked to a study claiming that gay men have similar brains to straight women. It’s an interesting article, and I hadn’t seen it on other social news sites. While I saw some submissions that were already prevalent on Digg, del.icio.us, etc, I did find some interesting submissions that I’d not seen before.
  3. The site goes beyond social news and focuses on the community. Though I haven’t delved too deeply into Boudica, there’s plenty for me to do here. I can send private messages to other members, post a blog entry, submit stories to the social news section, scrawl a quick message on the "Scrawl Wall," and interact with members beyond simply adding them to my friends list. The various features are a nice change of pace from other social news sites–Boudica encourages participation and discussion, and it seems to reward/appreciate users who put a lot of time and effort into using the site.
  4. I like the marketing potential. Women-oriented sites can craft link bait and interesting articles/blog posts that appeal to women rather than trying to figure out a way to put a "techy/young male" spin on a story in hopes of getting a piece on Digg or reddit or Propeller. If Boudica gains in popularity, it can be a great marketing resource for sites that produce content/offer products that are more female-centric.

What I Don’t Like About Boudica

  1. Purple and pink aren’t really my cup of tea. Does "female" always have to equate to "pink and purple"? I like bright greens and oranges and other "web 2.0" colors–I’d love to see Boudica have a hip, cool design that’s fresh and clean but doesn’t feel blatantly "feminine." Also, from a usability perspective, the pink links are a bit light and can be difficult to read.
  2. I don’t understand the top-level navigation. At this point I can’t tell the difference between the News, Arts, House, Time-Off, and Talk categories. They all seem to list submissions. "News" and "Arts" could be different subject categories, but what’s "House"? Are they stories that deal with home matters? That are appealing to housewives? Is it a category entirely devoted to Hugh Laurie? Is it mighty migh-tay, just lettin’ it all hang out? The same goes for "Time-Off" and "Talk"–I don’t understand what they signify. I think a brief but clear explanation for each category would be useful (and perhaps it’s necessary to re-name the categories with something more intuitive).
  3. The site lists users by "karma" but doesn’t explain what "karma" is. I can imagine that Karma is like your popularity or signifies the strength of your account, but it’s never explicitly defined on the site. Why is karma good? How do you get more karma? Can you lose karma? What’s the benefit of increasing your karma points? Do you get a nifty badge or title that you can display?  This is another feature that I’d like to see fleshed out a bit more.
  4. The site layout is a bit too cluttered for my taste. On one page I can see submitted stories, my account information, a list of recent blog posts, a list of people on my friends list, a list of the best karma users, my friends’ recent blog posts, an invitation to invite a friend to use Boudica, a tag cloud, the best published "scoops," the best upcoming "scoops," and the Scrawl Wall. It’s a bit of an information overload. I don’t really need to see my list of friends–that could be something I can click on and see within my profile. Recent blog posts/friends’ blog posts could maybe go under a "Blog" section that gets added to the top-level navigation. I think the Scrawl Wall is cute, but it could move further down the page so that more important information (like upcoming and published scoops) can get moved further up. It seems like a lot of the information featured on the page can be better placed elsewhere on the site.

Obviously, a lot of my gripes can easily be due to the fact that Boudica’s still in Beta and is working out the kinks. For example, I like the tag cloud, but it’s not working properly right now (I clicked on "book review," which seemed like a popular tag based on its font size, but no stories pulled up). I certainly don’t fault the site for any of the problems I’ve identified–I’ve no doubt that Boudica will strive to improve user experience and functionality based on its Beta feedback.

That being said, overall I have to say that Boudica is a pretty interesting site. I’m really curious to see how popular it will become, not just among female SEOs and marketers, but what its adoption rate will be for other women (such as people like my sister, a teacher with two kids who casually uses the Internet but isn’t uber-net savvy, yet sends me interesting stories and photos she comes across every now and then). I hope the site gains in popularity–if anything, it’d be a fascinating ethnographic study to see what sort of women frequent the site and what information they think is interesting. And, of course, as I said, the marketing potential is huge (hey, you can take the girliness out of the marketer…). Obviously I’ll keep hanging around Digg, reddit, Propeller, Mixx, Yahoo! Buzz, del.icio.us, StumbleUpon, and other social news and social media sites, but although I’ll certainly keep playing with the boys in their treehouse, I’m happy to sit down for a tea party with the girls every now and then, too. :)
