Booting up a personal recommendation system for news

As I mentioned yesterday, I’m a big fan of ‘Rebooting the News’. That goes for both meanings: I love the podcast series by Jay Rosen and Dave Winer; and I’m also totally intrigued by the phenomenal transition of our system of news which is happening right under our noses.

In the 9-minute passage of RBTN 82 that I transcribed, our hosts talk about an idea that Dave put forward in a recent blog post, ‘Find me stuff that I’m interested in’. It’s a discussion about the concepts of a personal recommendation system for news, on Dave’s part inspired by the collaborative filtering technology which underpins Amazon’s personal product recommendations.

Not only do I agree with all the conceptual choices that Jay and Dave favor (such as avoiding categories, using gestures, using feeds, looking at other users’ previous behavior, including information about authoring as well as consumption, including serendipity…), I have actually been thinking about these exact concepts for years.

Now, I’m not going to say, “It’s all been done already”, because Dave would think I’m trying to pitch a product :-) Truth is, had it been done, we would all be using it. A personal system of highly relevant information is pretty much the Holy Grail of the Internet.

One potential complication with applying collaborative filtering to news content is that, when news breaks, there is no critical mass of gestures from previous users. This may cause some delay in the build-up of a recommendation. Instead of immediate, mass-scale amplification of the breaking news event, the news item might be a more slowly developing “trending topic” as per Twitter.

Also, when the news is very fresh, and its relevance is very personal (i.e. highly relevant to a small number of people), it may take too much time for a collaborative filtering system à la Amazon to collect sufficient gestures from other users in order to deliver the recommendation to the right people.

Therefore, rather than waiting for a new news item to pick up the critical mass which can enable collaborative filtering the Amazon way, we could instead look at the *history* of users’ gestures. If the stuff I have “gestured” in the past is very similar to the stuff you have “gestured” in the past, there is a likelihood that what you “gesture” next will be of interest to me.

So what I propose, instead of collecting many gestures from different users in order to generate a recommendation to one specific user, is to identify pairs of users whose gesture behavior is most similar, and let their behavior inform their mutual recommendations.
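To make the pairing idea a bit more concrete, here is a minimal sketch in Python. It assumes that each user's gesture history can be reduced to a set of item IDs, and it uses Jaccard overlap as the similarity measure; both are my own simplifications for illustration, and the names (`jaccard`, `most_similar_pairs`, `recommend`, the sample users) are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of gestured items two users have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar_pairs(gestures: dict[str, set[str]]) -> list[tuple[float, str, str]]:
    """Score every pair of users, most similar pair first."""
    scored = [
        (jaccard(gestures[u], gestures[v]), u, v)
        for u, v in combinations(sorted(gestures), 2)
    ]
    return sorted(scored, reverse=True)


def recommend(user: str, gestures: dict[str, set[str]]) -> set[str]:
    """Items the user's closest match has gestured that the user has not."""
    others = [u for u in gestures if u != user]
    if not others:
        return set()
    peer = max(others, key=lambda u: jaccard(gestures[user], gestures[u]))
    return gestures[peer] - gestures[user]


# Hypothetical gesture histories: user -> items they linked, rated or shared.
history = {
    "ann": {"a1", "a2", "a3", "a4"},
    "bob": {"a1", "a2", "a3", "a5"},
    "cas": {"a6", "a7"},
}

print(most_similar_pairs(history)[0])
print(recommend("ann", history))
```

Note that nothing here waits for a new item to gather a crowd: as soon as Bob gestures "a5", it can be offered to Ann on the strength of their shared history alone.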

One could calculate a “similarity percentage” for each combination of two users based on their gestures. With a view to serendipity, the ideal similarity is not necessarily approaching 100 percent. The system could offer users a feature to mix their own doses of serendipity. Want more off-beat news today? Turn the dial down to 70 percent signal and get 30 percent noise!
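One possible reading of that serendipity dial, sketched in Python: fill a fixed-size feed with a chosen share of top-ranked "signal" items and top it up with randomly picked "noise" items. The function name `mix_feed`, its parameters and the sample items are all hypothetical, and the sketch assumes the relevant items arrive already ranked.

```python
import random


def mix_feed(relevant, offbeat, size, signal=0.7, seed=None):
    """Build a feed of up to `size` items.

    A `signal` share comes from the top of `relevant` (assumed already
    ranked); the remainder is sampled at random from `offbeat`.
    """
    if not 0.0 <= signal <= 1.0:
        raise ValueError("signal must be between 0 and 1")
    n_signal = min(round(size * signal), len(relevant))
    n_noise = min(size - n_signal, len(offbeat))
    rng = random.Random(seed)
    feed = relevant[:n_signal] + rng.sample(offbeat, n_noise)
    rng.shuffle(feed)  # interleave so the off-beat items are not all at the end
    return feed


# Hypothetical item IDs: "r" items are recommendations, "o" items are off-beat.
relevant = [f"r{i}" for i in range(10)]
offbeat = [f"o{i}" for i in range(10)]

feed = mix_feed(relevant, offbeat, size=10, signal=0.7, seed=1)
print(feed)
```

Turning `signal` up to 1.0 gives a feed of pure recommendations; turning it down hands more of the feed over to chance.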

BTW, one headache which this idea would take care of is the eternal question: “What is news?” Whatever news means to you is defined by what you “gesture”. Hence the more accurate question to ask would be: “What is relevant?” or, indeed: “What is interesting?”

As I said, I’ve been pondering this stuff for a while and I’d just love the opportunity to help make it happen.

‘Channels’ does not sufficiently describe the dynamics of distributed online conversations

Interesting conversation about "channels" developing here with Bill French.

Totally agree that people create channels in efforts to create order from chaos. The way I used "channels" in my post on ‘The End of Channels?’ was with the traditional notion of, if you will, media titles, in mind: TV/radio channels or shows, zines, newspapers, websites, blogs, forums…

I suppose what they have in common is that they all have a name, an address, and usually a more or less defined scope. They are often furnished with editorial policies and they may be designed to further particular political or commercial interests. Also, most often they have a brand identity.

But if we look past the keeper of the gate and over the garden wall, I am willing to accept that channels – as in "meta-handlers" – are not necessarily disappearing, but rather evolving into new forms, such as distributed conversations connected by tags.

The point I am trying to make is that old-style channels are designed to contain conversations within them. Sure, they are helpful as meta-handlers in creating order. And, agreed, the new meta-handlers are facilitated by social media, e.g. through tags. However, I hesitate to go as far as to call those tag-connected (micro-content contributions to) conversations, ehm, "channels".

In Dutch, we use the same word for channel and canal: "kanaal". So it won’t surprise you that I quite strongly associate the word channel with a human-made, one-directional, controlled flow.

Bill writes:

"(…) People tend to prefer the benefits that channels provide – they create the notion of a "meta-handle" that makes it easier for them to understand, know about, and share. (…)"

Well, I won’t deny that people find channels convenient. Still, to me, even "virtual channel" or "conversation channel" doesn’t quite sufficiently express the dynamic nature of distributed online conversations. These conversations do not have ONE name, ONE address or even a defined scope.

Tags are useful in searching and navigating these conversations – in particular because they add social filtering to the mix – and "tag cloud" is a metaphor that helps people venture into the Web 2.0 era.

And yet, even tag clouds cannot contain or accurately scope conversations. The Web, and in particular the social media web, makes our culture and economy more "probabilistic", as Chris Anderson puts it in The Long Tail.

So, why not liberate the conversations from their channels and simply call them "conversations"?

(See also: www.josschuurmans.com: ‘The concept of "conversation" as in the Long Tail of Conversations’)

Privacy concerns about Smart Digg Button

[UPDATE, May 24, 2007: In his post on HitTail's blog, 'Online Marketing Webinars Coming Soon to HitTail', Mike Levin inadvertently links to the post you're looking at. Instead, the link from Mike's post should really point at: 'Can I talk to you about "northern exposure videos" for a moment?']

Out of the 74 responses to date to Derek van Vliet’s Smart Digg Button for Firefox, only one, by Muhammad, expresses concerns about privacy, "(…) as this extension tells Digg about every page you’re visiting for as long as it’s enabled (…)".

Derek, I’m certain this is a serious concern to many potential users. Would you care to respond?

Digg.com serves “Story of the Century” on my first day

(Image: Our Solar System, from Wikipedia’s entry on "Planets", public domain, by NASA.)

As luck would have it… the day I started using Digg.com, it offered me what a journalist on the BBC World Service this morning called "(…) potentially the news story of the century in terms of the future of human kind":

First Habitable Planet Outside Of The Solar System

(The BBC, of course, have their own version: ‘New ‘super-Earth’ found in space’)

The reason I started "digging" yesterday is that a colleague of mine – after hearing about some of the stuff I do in corporate communications – had asked me if I’d like to help introduce some new collaborative news selection features to Nokia’s intranet.

So I decided to get some first-hand experience with Digg. The concept seemed interesting and so I wanted to get a "feel" for how it works, what works and what doesn’t.

Being a novice to Digg, when I came across the "habitable planet" story I wasn’t quite sure how seriously to take it. After all, I had just been taught by Neil Patel’s Beginner’s Guide to Digg that, in order to get read, submissions to Digg should:

  1. Make a statement and do not be dull;
  2. Be controversial and make false promises; and/or
  3. Use keywords in the title that diggers love and that are also relevant to the story.

But, whoa, am I convinced now! Every news junkie has to get hooked on Digg!

By the way, the planet story reminds me of a presentation to Pop!Tech, podcast on IT Conversations, in which Carolyn Porco, Imaging Team Leader of the Cassini Imaging Central Laboratory for Operations (CICLOPS), makes a passionate appeal to humankind to make space exploration a top priority.

From the podcast description:

"(…) [One of Saturn's moons], Titan, is where the Huygens probe landed in January 2005. From the panoramic images taken during the decent and the all the data that has been collected since, the CICLOPS team is excited to see signs that fluids once flowed over the surface, that the atmosphere has precipitation and that the probe itself may have landed on a shoreline. All-in-all, the Titan moon may give us a significant glimpse of what the Earth was like before living organisms. (…)"

PS: To support the point of yesterday’s piece on Wikipedia As A News Medium, the site’s entry on Gliese 581 c seems nicely up-to-date.

A business model for collaboratively filtered news?

SUMMARY: Amazon.com has demonstrated the power of collaborative filtering when it comes to selling books. But how about a hyper-personalized, collaboratively filtered news offering? The main challenge may be the business model. Would ad revenue be able to cover the cost?

Via ‘Wink Search’, an entry on Zeevveez’s QTSaver blog, I found an interesting review of ‘New Ideas in Search (Wink, Gravee)’, on Business 2.0’s B2Day blog. Erick Schonfeld writes:

(…) Wink is very much a social search engine, since results are based on how other people previously rated and tagged things. The question is: Will a search based on public tags turn up substantially different results than a regular Google search based on link popularity? After all, at their core both are based on humans making their preferences public (one by explicitly tagging a Website with a descriptive keyword, the other by linking to it). (…)

I agree with Erick that one might wonder if Wink can do a better job than Google, considering that both engines rank search results by popularity. (And by the way, when it comes to tagging and searching, del.icio.us does a very good job, too.)

My interest in search is inspired primarily by one use case: hyper-personalized news provision. When it comes to search relevance, I’m convinced that artificial intelligence is the Holy Grail.

So I’ve been wondering if anyone is working on an Amazon.com for news. RSS feeds rule, tagging is the tool, Google is gool, but the best way to filter news by relevance is by looking at the news preferences of like-minded users.

Think about a collaboratively filtered news offering. If you and I have had very similar patterns of news consumption in the past, and you have already read and rated a particular piece of news, chances are that I will be interested in reading it, too.

There is an important difference between link popularity and collaborative filtering. Link popularity tells us which search results are considered most relevant to a particular search query by "everybody" (that is, anybody who ever published a link or tagged a piece of published content). Collaborative filtering, on the other hand, sorts search results on the basis of what is known about me, compared to people like me.
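The difference can be shown with a small Python sketch. It assumes we already have some similarity weight between me and every other user (how that weight is computed is left open), and all names and data below are hypothetical. Popularity counts every endorsement equally for everybody; the collaborative variant weights each endorsement by how much the endorser resembles me.

```python
from collections import Counter


def rank_by_popularity(endorsements):
    """Link-popularity style: every endorsement counts the same, for everyone."""
    counts = Counter(item for items in endorsements.values() for item in items)
    return [item for item, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def rank_for_user(endorsements, similarity_to_me):
    """Collaborative-filtering style: weight endorsements by similarity to me."""
    scores = Counter()
    for user, items in endorsements.items():
        weight = similarity_to_me.get(user, 0.0)  # unknown users count for nothing
        for item in items:
            scores[item] += weight
    return [item for item, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical data: who linked to or tagged which story.
endorsements = {
    "u1": {"celebrity-story"},
    "u2": {"celebrity-story"},
    "u3": {"celebrity-story", "budget-analysis"},
    "u4": {"intranet-tools"},
}
# Hypothetical similarity between me and each user (0.0 to 1.0).
similarity_to_me = {"u1": 0.1, "u2": 0.0, "u3": 0.2, "u4": 0.9}

print(rank_by_popularity(endorsements))
print(rank_for_user(endorsements, similarity_to_me))
```

In this toy data the most popular story tops the "everybody" ranking, while the one story endorsed by the user most like me tops mine.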

That’s the power of Amazon.com when it comes to selling books.

So perhaps the main challenge with collaboratively filtered news is the business model. It can only work given a critical mass of users. Which means that the service should probably be offered for free on the Internet. Does that mean ads would have to pay for it? And could they?
