Fog Creek Software
Discussion Board

If you like this...

What types of things can use 'if you like this, you may also like this'? E.g. books, music, ...

How much 'mass' would be required to make such an application worthwhile and efficient?


Social Programmer
Thursday, August 26, 2004

This is quite possibly the worst phrasing for a question I've ever seen.

Do you mean that you want to know what sorts of things have widespread appeal to the degree that they're valid candidates for the "If you like X" cliche?

If your marketing is based upon "If you like X, then WOW!!", I'd recommend that you fire your sales staff.

Thursday, August 26, 2004

I'll try to be a little less condescending.

If you're talking about automating the generation of similar objects, then there's a number of ways you can do it.

You can design a list of rules specifying if someone chooses A, they'll likely be interested in at least looking at D, R, and W as well. Repeat for every item in your collection.

Or, record all previous choices, and mine out the relationships afterwards. Amazon can look back at months of data and say:

1. 20% of people who bought A also looked up D, R, and W at some point in time.
2. 20% of these people also bought D, R, and W.
3. Since only 2% of people looking at the D page buy it, there is likely a relationship between A and D.

So, they might give a link to D from the A page, or give a discount on D, whatever.
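
A minimal Python sketch of that comparison, under some loud assumptions: the function name, the `baskets` structure and the sample data are all made up for illustration, and the baseline here is the overall purchase rate of D across all customers rather than the "2% of people looking at the D page" figure, since page-view data is not modelled.

```python
def purchase_lift(baskets, a, d):
    """Ratio of 'bought d, given bought a' to the overall purchase rate of d.

    baskets maps a customer id to the set of items that customer bought.
    A result well above 1.0 hints at a relationship between a and d.
    Returns None when there is not enough data to form the ratio.
    """
    total = len(baskets)
    bought_a = [items for items in baskets.values() if a in items]
    if total == 0 or not bought_a:
        return None
    rate_d_given_a = sum(1 for items in bought_a if d in items) / len(bought_a)
    rate_d_overall = sum(1 for items in baskets.values() if d in items) / total
    if rate_d_overall == 0:
        return None
    return rate_d_given_a / rate_d_overall


# Hypothetical purchase history: 10 customers, items named by letter.
baskets = {
    "c1": {"A", "D"}, "c2": {"A", "D"}, "c3": {"A"}, "c4": {"A"},
    "c5": {"D"}, "c6": {"B"}, "c7": {"B"}, "c8": {"B"},
    "c9": {"B"}, "c10": {"B"},
}
lift = purchase_lift(baskets, "A", "D")  # (2/4) / (3/10)
```

With so few customers the ratio is mostly noise, which is the point made about needing a lot of data.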

This approach requires a very large amount of data (assuming a large collection of objects) to produce any sort of meaningful results. No idea if Amazon uses this, but I've heard of this being done elsewhere.

Thursday, August 26, 2004

For a music website I run, we have something like this. We have a database of (currently) about 10,000 users, 6,000 artists and 50,000 ratings (users can 'rate' artists out of 10). These are run through a vector-based algorithm every night to calculate a lookup table of 'similarity' between each pair of artists and each pair of users, based on the correlation between their ratings. These in turn are used to generate recommendations for users, based on similarities to artists they've rated well. It can also tell you which other users have the most similar (or differing) music taste to you. Works pretty well and on this sort of scale can be implemented quite easily using MySQL (or similar) to do all the grunt-work - the nightly regeneration of lookup tables takes about 20 minutes.
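
A rough Python sketch of the last step, turning a precomputed similarity lookup table into recommendations for one user. The scoring rule (similarity weighted by the user's rating), the threshold of 7 out of 10 and all the names are assumptions for illustration; the post does not say how the site actually ranks its suggestions.

```python
def recommend(user_ratings, similarity, liked_threshold=7, top_n=5):
    """Rank artists the user has not rated by similarity to artists they rated well.

    user_ratings: {artist: rating out of 10} for one user.
    similarity:   {(artist_a, artist_b): score}, the precomputed lookup table,
                  assumed to contain both orderings of every pair.
    """
    scores = {}
    for (a, b), sim in similarity.items():
        liked = a in user_ratings and user_ratings[a] >= liked_threshold
        if liked and b not in user_ratings:
            # Weight each candidate by how much the user liked the source artist.
            scores[b] = scores.get(b, 0.0) + sim * user_ratings[a]
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [artist for artist, _ in ranked[:top_n]]


# Hypothetical lookup table and one user's ratings.
similarity = {
    ("A", "C"): 0.8, ("C", "A"): 0.8,
    ("A", "D"): 0.2, ("D", "A"): 0.2,
    ("B", "D"): 0.9, ("D", "B"): 0.9,
    ("A", "B"): 0.1, ("B", "A"): 0.1,
}
user_ratings = {"A": 9, "B": 3}
suggestions = recommend(user_ratings, similarity)
```

Here artist B is rated poorly, so its strong similarity to D contributes nothing, and C ends up ranked ahead of D.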

Thursday, August 26, 2004

...and the recommendations are generally quite good, although obviously it helps if you rate a lot of bands to get good recommendations.

On a larger scale I'd probably write optimised code in C to do the calculations, to reduce the load and speed things up, rather than just scripting the DBMS to do it.

Thursday, August 26, 2004

The "people who bought X also bought Y" recommendations on Amazon used to be quite reliable for me.

The recommendations seemed to get worse around the time when they started doing "the page you made".  Pure speculation on my part, but I think they jumbled together the "people who looked at X also looked at Y" data with the actual purchase data.

I think you could probably do it with links, based on a dataset like or furl, and maybe with photographs, and almost certainly with groceries.  If you're conservative with it (ie wait until a sizeable proportion of a large dataset indicates something) then I don't think you can go far wrong.

Tom (a programmer)
Thursday, August 26, 2004

I'm not sure which site Matt runs, but Yahoo's Launch has a similar feature.

If you go to a performer's page, it will suggest other performers by both suggesting a 'Fan Station' and by direct links - "Listen to Y or Z and other artists liked by fans."

To Quote the Launch help "A fan station plays songs that fans of a particular artist may also enjoy. The musical selection is based on what other music is highly rated by LAUNCHcast listeners who like that artist."

It can be interesting and useful. For instance the SheDaisy page ( ) has links to other Country performers but it also suggests Nickelback.

I've found an interesting mix of music this way just by browsing these links.

Thursday, August 26, 2004

Jeff - How could someone suggesting Nickelback ever be construed as 'useful' ? ;)

The music site is - and yes we didn't start doing the recommendations until a decent data  set of ratings was built up. If you're not sure whether you've reached that point, try it and see - if you get nothing but junk out perhaps best wait til you have more data. A few thousand 'ratings' were enough for us although those are quite clearly-defined/chosen by the user - if you're collecting vaguer data based on click patterns or something you'd probably need a lot more.

Choosing what algorithm to use is also quite tricky. The vector-based one we use seems to work well and is pretty simple to implement - it literally calculates the (cosine of the) 'angle' between pairs of artists in a high-dimensional user-ratings space.
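
As a minimal Python sketch of that calculation (not the site's actual code, which runs in the database): each artist is a sparse vector of `{user: rating}`, users who have not rated an artist count as zero, and `similarity_table` is a made-up helper that fills in the lookup table for every pair of artists.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine of the angle between two sparse ratings vectors ({user: rating})."""
    # Only users who rated both artists contribute to the dot product.
    dot = sum(r * v[user] for user, r in u.items() if user in v)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_table(ratings_by_artist):
    """Build {(artist_a, artist_b): similarity} for every unordered pair of artists."""
    return {
        (a, b): cosine(ratings_by_artist[a], ratings_by_artist[b])
        for a, b in combinations(sorted(ratings_by_artist), 2)
    }


# Hypothetical ratings out of 10.
ratings_by_artist = {
    "X": {"u1": 8, "u2": 6},
    "Y": {"u1": 4, "u2": 3},
    "Z": {"u3": 10},
}
table = similarity_table(ratings_by_artist)
```

X and Y come out with similarity 1 because their ratings point in the same direction even though Y's are half the size; Z shares no raters with either, so its similarity to both is 0. Comparing every pair is quadratic in the number of artists, which is fine for a few thousand but is where the scaling concerns start.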

Other methods may scale better or may be slightly more powerful but tend to be harder to get to grips with. There are lots of research papers out there on the subject - 'collaborative filtering' is the key phrase to search for.

Thursday, August 26, 2004

In the simplest case, if your web-store only sells two products, A and B, you can quickly calculate that anyone who buys A might buy B as well.

All other cases are left as an exercise to the reader.

Thursday, August 26, 2004

Check out at the University of Minnesota.

Matt Cruikshank
Thursday, August 26, 2004


Have you ever tried a factor analysis on your ratings data? I'd be curious to find out if musical tastes can be boiled down to a few simple factors, or if tastes truly are complex.

Rob VH
Thursday, August 26, 2004

If you like pussy you may like the other hole near that.

Thursday, August 26, 2004

"The "people who bought X also bought Y" recommendations on Amazon used to be quite reliable for me."

I always thought this broke down when new Harry Potter movies came out:

People who bought Joel on Software: And on Diverse and Occasionally Related Matters That Will Prove of Interest to Software Developers, Designers, and Managers, and to Those Who, Whether by Good Fortune or Ill Luck, Work with Them in Some Capacity also bought Harry Potter.

People who bought Spot Goes to the Beach also bought Harry Potter.


"Have you ever tried a factor analysis on your ratings data?"

Would require someone coming up with factors in the first place. It's simply not as clean as simple user ratings.
Thursday, August 26, 2004

Mark, the vector method does attempt to compensate for popularity - as it depends on the angles between ratings vectors, and not their magnitude, Harry Potter's popularity shouldn't (in theory) buy him any extra similarity to Joel.

Dunno what Amazon uses.

About factor analysis - I read a bit about something called (I think) Latent Factor Analysis - the statistical model has people's ratings patterns parameterised as a linear combination of unknown 'factors' (factors as in decision factors, not factors in the mathematical sense), but what those factors are/correspond to doesn't need to be specified beforehand. The statistical analysis finds them for you. Although it's still up to you to figure out what (if anything) the factors/groupings correspond to in real life once it's done that.

Rob: Does that sound like what you were talking about? The paper I read made it sound rather complicated and so I went for the simpler option :) If you have any links to understandable material on how to implement it I might give it a go.

Thursday, August 26, 2004

All I know is if 100% of the people on Amazon buy Harry Potter, 100% of the people who bought Joel's book will also have bought Harry Potter, and therefore it will get recommended.
Friday, August 27, 2004

The generic term for features like this is "Collaborative Filtering".

And the other poster was right -- Amazon's recommendations have really gone downhill.  Could be because they're now mixing in the stuff people looked at, but didn't buy.  Or because some (probably all, >sigh<) are now sponsored.

Friday, August 27, 2004

Mark: As the similarity metric is symmetric it also takes into account the fact that the vast majority of Harry Potter readers /aren't/ into Joel. A book with a high association both ways would be a much better fit for similarity, ie a book whose readers like Joel, and which Joel readers like. Even if it doesn't come close to Harry Potter in overall popularity, or even popularity amongst Joel readers.

The formula is  similarity(a,b) = cos(theta) = (a . b) / (|a| |b|)

where a and b are ratings vectors for two artists/products, with each entry in the vector corresponding to the rating given by a particular user,  . is the scalar product, and |a| is the magnitude of the vector a. The crucial bit is that we're dividing through by the magnitudes. Harry will have a large magnitude but most of the entries in his ratings vector will be matched by corresponding zeroes in Joel's ratings vector. So the similarity should still be small.
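
To put rough numbers on that, here is a small Python sketch. It assumes plain bought/didn't-buy data (every vector entry is 1 or 0) instead of ratings, in which case the cosine reduces to shared buyers divided by the square root of the product of the two buyer counts. The buyer counts and the "niche" title are invented for illustration.

```python
import math


def share_who_also_bought(buyers_a, buyers_b):
    """Fraction of A's buyers who also bought B (the one-way 'also bought' figure)."""
    return len(buyers_a & buyers_b) / len(buyers_a)


def cosine_binary(buyers_a, buyers_b):
    """Cosine similarity when each vector entry is 1 (bought) or 0 (did not)."""
    return len(buyers_a & buyers_b) / math.sqrt(len(buyers_a) * len(buyers_b))


# Hypothetical customer ids.
potter = set(range(10_000))   # 10,000 customers bought Harry Potter
joel = set(range(100))        # 100 bought Joel's book, every one of them a Potter buyer
niche = set(range(50, 130))   # 80 bought a niche title, 50 of them also Joel buyers

one_way = share_who_also_bought(joel, potter)   # 100 / 100
joel_potter = cosine_binary(joel, potter)       # 100 / sqrt(100 * 10,000)
joel_niche = cosine_binary(joel, niche)         # 50 / sqrt(100 * 80)
```

The one-way figure says every Joel buyer bought Harry Potter, yet the cosine for that pair is only 0.1, while the niche title with just half of Joel's buyers scores roughly 0.56.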

Friday, August 27, 2004

Ah. I see. So while 100% of Joel readers read Harry Potter, only 1/1,000,000 Harry Potter readers read Joel. Not enough similarity for a match.

Sounds better than Amazon. :)
Friday, August 27, 2004
