Booting up a personal recommendation system for news

As I mentioned yesterday, I’m a big fan of ‘Rebooting the News’. That goes for both meanings: I love the podcast series by Jay Rosen and Dave Winer; and I’m also totally intrigued by the phenomenal transition of our system of news which is happening right under our noses.

In the 9-minute passage of RBTN 82 that I transcribed, our hosts talk about an idea that Dave put forward in a recent blog post, ‘Find me stuff that I’m interested in’. It’s a discussion about the concepts behind a personal recommendation system for news, inspired on Dave’s part by the collaborative filtering technology that underpins Amazon’s personal product recommendations.

Not only do I agree with all the conceptual choices that Jay and Dave favor (avoiding categories, using gestures, using feeds, looking at other users’ previous behavior, including information about authoring as well as consumption, including serendipity…), I have actually been thinking about these exact concepts for years.

Now, I’m not going to say, “It’s all been done already”, because Dave would think I’m trying to pitch a product :-)   Truth is, had it been done, we would all be using it. A personal system of highly relevant information is pretty much the Holy Grail of the Internet.

One potential complication with applying collaborative filtering to news content is that, when news breaks, there is no critical mass of gestures from previous users. This may cause some delay in the build-up of a recommendation. Instead of immediate, mass-scale amplification of the breaking news event, the news item might be a more slowly developing “trending topic” as per Twitter.

Also, when the news is very fresh, and its relevance is very personal (i.e. highly relevant to a small number of people), it may take too much time for a collaborative filtering system à la Amazon to collect sufficient gestures from other users in order to deliver the recommendation to the right people.

Therefore, rather than waiting for a new news item to pick up the critical mass which can enable collaborative filtering the Amazon way, we could instead look at the *history* of users’ gestures. If the stuff I have “gestured” in the past is very similar to the stuff you have “gestured” in the past, there is a likelihood that what you “gesture” next will be of interest to me.

So what I propose, instead of collecting many gestures from different users in order to generate a recommendation to one specific user, is to identify pairs of users whose gesture behavior is most similar, and let their behavior inform their mutual recommendations.
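To make the pairing idea a bit more concrete, here is a minimal sketch in Python. Everything in it is my own assumption rather than a worked-out design: each user's history is a plain set of article IDs they have "gestured", the similarity measure is the Jaccard index (shared items divided by all items either user touched), and the names `jaccard`, `most_similar_pairs` and `recommend_from` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of items both users gestured, out of all items either one gestured."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar_pairs(gestures):
    """Rank every pair of users by the overlap of their gesture histories."""
    pairs = [
        (jaccard(gestures[u], gestures[v]), u, v)
        for u, v in combinations(sorted(gestures), 2)
    ]
    return sorted(pairs, reverse=True)


def recommend_from(gestures, me, neighbor):
    """Items my neighbor has gestured that are not yet in my own history."""
    return gestures[neighbor] - gestures[me]


# Made-up sample data: user name -> set of article IDs they linked to or read.
gestures = {
    "ann": {"a1", "a2", "a3", "a4"},
    "bob": {"a2", "a3", "a4", "a5"},
    "cy": {"a9"},
}

best_score, user_a, user_b = most_similar_pairs(gestures)[0]
suggestions = recommend_from(gestures, user_a, user_b)
```

With this sample data, ann and bob come out as the closest pair, so whatever bob gestures next (here "a5") can go straight to ann without waiting for a crowd of other users to pick it up.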

One could calculate a “similarity percentage” for each combination of two users based on their gestures. With a view to serendipity, the ideal similarity is not necessarily approaching 100 percent. The system could offer users a feature to mix their own doses of serendipity. Want more off-beat news today? Turn the dial down to 70 percent signal and get 30 percent noise!
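The serendipity dial could be sketched roughly like this. It is only an illustration under my own assumptions: the function name `mix_feed` and its parameters are hypothetical, "signal" is taken to mean items coming from my most similar users, and "noise" means items from everyone else, both already sorted with the best candidates first.

```python
def mix_feed(signal_items, noise_items, signal_percent=70, size=10):
    """Fill a feed of `size` items, `signal_percent` of them from similar users."""
    if not 0 <= signal_percent <= 100:
        raise ValueError("signal_percent must be between 0 and 100")
    # Integer arithmetic keeps the split exact, e.g. 70 percent of 10 is 7.
    n_signal = size * signal_percent // 100
    n_noise = size - n_signal
    return signal_items[:n_signal] + noise_items[:n_noise]


# Made-up candidates: s* come from similar users, n* from dissimilar ones.
signal_items = [f"s{i}" for i in range(10)]
noise_items = [f"n{i}" for i in range(10)]

todays_feed = mix_feed(signal_items, noise_items, signal_percent=70, size=10)
```

At the default setting the feed holds seven items from like-minded users and three off-beat ones. A real system would probably shuffle the two groups together instead of listing the noise at the end.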

BTW, one headache which this idea would take care of is the eternal question: “What is news?” Whatever news means to you is defined by what you “gesture”. Hence the more accurate question to ask would be: “What is relevant?” or, indeed: “What is interesting?”

As I said, I’ve been pondering this stuff for a while, and I’d just love the opportunity to help make it happen.

Transcript of 9 minutes ‘Rebooting the News’, episode 82

I have listened to all 82 episodes of ‘Rebooting the News’, the podcast series by Jay Rosen and Dave Winer. (This probably means that I’m their biggest fan and/or that I should get a life.)

I felt an urge to transcribe the following 9-minute passage from episode 82, recorded on February 14, 2011. (More about that later)

Forgive the occasional typos and other glitches.

How is that for a gesture? :-)

http://soundcloud.com/josschuurmans/rbtn82-9mins

[STARTING AT 04:43]

Jay Rosen: ‘Find me stuff that I’m interested in‘.

Dave Winer: Yeah, oh, that’s not a question for me, is it?

J: That’s an opening for our next theme here. This is something that’s interested you for a while; it’s interested me for a while.

D: I don’t know. No, actually this is a recent thing. This is recent. This is like the mantra, you know, when you are a product developer camped out in a category, you know – if you’re listening -, you know what people want. I mean, you get that short list of features that everybody wants and on that list are some things that you have no clue how to do. But you’re listening and trying to understand it. And *this* is at the top of the list.

Absolutely the one that you hear the most often is: ‘Just find me what I want.’ Now, my brain kinda turns off when I hear that, ’cause what I think is going to happen if you ever trust somebody to do that for you, they are not going to give you what *you* want; they are gonna give you what *they* want you to have.

That’s what I worry about, that you’re not gonna get… So, any diet of news that I’m interested in has to also include subscriptions to places that are going to give me news that I don’t know that I’m interested in.

J: That’s one of the problems.

D: Well, that’s easily solved, actually. Just take… but you know, here is the model. You might say that I’m addicted to Amazon. I just, like, in an idle moment, if there’s nothing happening in the world, I’ll go to Amazon, I’ll go through their recommendations, right?

J: Recommendations for what?

D: For products that they want me to buy. Things they want me to buy. So I can influence that, I can definitely influence it. Like, I was looking for a lamp a couple of weeks ago. And now they show me lamps. Or, I buy a lot of shirts through Amazon and… I always get shirts. I buy a lot of books, I get a lot of books. I’ve bought stereo equipment, computer nerd stuff, vitamins… This is an interesting mix…

J: It’s just reacting to what you bought before.

D: And I can manipulate it by just looking at things. I can inform them that this is an interest of mine. And they will start recommending things for me. I think, well, the epiphany was, why don’t we do this for news?

What we need is a way of expressing an interest in a news area, right?

J: Right.

D: In other words, the equivalent of looking at lamps. Or the equivalent of looking at cameras. Well, I look at a story about prince Charles, right? So, the system infers… Maybe I don’t look at a story, but I tweet a link to it.

J: Well that would be a stronger signal.

D: And maybe that’s the only signal I want it to use, is the fact that – and this is a way that I have become… I think of this as becoming my own editor-in-chief.

J: Yeah, I would love that. If it took everything that I tweeted…

D: Actually, you know, the technology…

J: That’s not a bad idea.

D: The technology here is…

J: It can’t be that far away.

D: It’s not far away. I was about to say, we know how to do this. This is like a well-worn path. It’s not something, not a whole lot of innovation, *no* innovation needed here.

The bad news is that, as far as I could tell, only one or two people reading that blog post understood what I was talking about. ‘Cause the responses that I got were like, oh that’s already been done.

J: People always say that.

D: They do. And they’re always wrong. Because usually they are the people who made the product and they are pitching it. They are trying to sneak in all their spam there.

So, I don’t know, if anybody listening to this wants to do this, just let me know. I want to do it. I’d like to get into a position to do this.

J: Every time I look at a product that [claims] to be able to do this, to send me a quote-unquote ‘personalized news stream’, the problem I find is that they have these pre-fab categories that represent what *they* think of as the significant divisions of news, right? Like: ‘business’. Well, I’m not interested in ‘business’.

D: That’s bogus. This is why I get bored, my eyes glaze over…

J: It’s a category of production, it’s not a category of use.

D: Correct.

J: And that’s the problem…

D: Do you know why it’s a problem for them, is that they’re not… First of all, this wouldn’t work for everybody. Okay? Let’s be clear about this.

J: Right. What I want is something that works for me.

D: Exactly. And you would be easy, because we already have a very good handle on your stuff.

J: [Well, I would be...]

D: We have it in a database. I have your links in a database, right?

J: Right.

D: So, building it for you would be easy. And you know, once you get a little critical mass thing going there, it is just self-maintaining. Because the things I link to, you know, if I have a hundred people in this mix, I can now do collaborative filtering. That’s the name for the technology you use here.

J: Right, so give me a quick sketch: what is collaborative filtering? I think I know, but…

D: ‘People who like this, also like this’. That’s the idea.

J: Right right right.

D: It’s like Facebook recommending friends to you. It has noticed that you and this person are friends with five other people. Therefore we might guess that you might like anybody that this person is a friend with. So we’ll start suggesting this to you.

[Technician:] …products…

J: Product, yes, Amazon does that.

D: News is a product just like that. There is no reason news can’t submit to this. Also there is another source of valuable information here that could be used is your blog. You know, I’ve been blogging for god knows how long. That is an incredible base of information about my interests.

J: Right.

D: So, I’ve always said Google ought to take that into account. I ought to be able to tell Google, ‘Hey Google, this is my blog and I can prove it to you.’ Okay? Now I want… or, why should I have to prove it? All I’m saying is I want my search results to be customized for the author of this blog.

J: Right.

D: Period. You know, that’s how Google… Everybody says, oh, we need a new generation of search. Why hasn’t anybody tried this yet?

J: Right. So, instead of using consumption behavior as the signal, you use authoring behavior as the signal.

D: That’s correct. Yeah.

J: Now we’re cooking.

D: I think we’re definitely cooking. I think, this is a business model by the way that would work for editorial organizations because the way we evolve something like this requires an understanding of news. Which the tech industry typically, as you have noted, doesn’t really have.

J: And also, the more of your user base you have authoring, the better the recommendation engine gets.

D: Always. That’s exactly how this stuff works.

J: And that’s the incentive – right? – to get more people blogging at your site and recommending things and sending links and comments and… yeah.

D: Well I don’t want people blogging on anybody else’s site. ’cause I want them to operate their own infrastructure.

J: Right.

D: But we’ll get to that later.

J: But you could affiliate your blog with the news system you’re using and it could therefore learn from what you blog about [...]

D: Oh, absolutely. Just give me the pointer to the feed. Or give me the pointer to the blog, from there you can get to everything. There is no… absolutely. But I think… just because you use Tumblr and I use Tumblr doesn’t mean we have anything in common as far as our interests are. [...] the list of feeds that I subscribe to might give you another good idea.

J: Here’s what I like to do. I don’t read most 99% of the news or commentary written about the NBA. I’m not a big fan of the NBA. However, if anybody writes an article about race and the NBA, I want it. Because it’s like this hidden subject that almost never gets talked about. Like, black players, white players, white coaches, black players, the compositions, the racial mix, different ways that these things play out in the politics of the sport. Like, I’m totally fascinated by that. But, the system as it stands says, ‘Do you want NBA news?’ No, I only want ‘race and the NBA’-news.

D: You can’t… I think that the point here is that you could never ever customize… you can never be the editor-in-chief of your own news channel by setting up queries like that. It has to be done with gestures. It has to be inferred from…

J: ‘With gestures’. What do you mean by that?

D: Gestures would mean pointing to… pointing to an article is a gesture. Reading an article is also a gesture.

J: Right.

D: As you pointed out, pointing to intuitively feels as a stronger endorsement, a stronger gesture if you will.

J: Well, let’s move on. That’s definitely need.

[ENDING AT 13:53]

Removing the clutter around published content with Readability

I heard about Readability on the Rebooting the News (RBTN) podcast, episode 55, where Rich Ziade was a guest. Seems very useful. I'm thinking that we might want to apply a similar technology in the content retrieval algorithm of Cluetail Radar Pro – Cluetail Ltd's on-line Media Intelligence tool.

http://vimeo.com/moogaloop.swf?clip_id=8798492&server=vimeo.com&show_title=1&show_byline=1&show_portrait=0&color=&fullscreen=1

Readability – Installation Video for Firefox, Safari & Chrome from Arc90 on Vimeo.

Capturables from Rebooting the News #10

Just arrived at the office. Lots of stuff I feel like unloading.

On my way here I listened to episode 10 of Rebooting the News. I think it was one of the best shows in the series so far (among the first 10, that is – I have some catching up to do).

Jay Rosen makes two very pertinent connections between the tech world and journalism. The first connection is about bug catching, a very common and appreciated practice in software development, but very under-utilized and unappreciated in journalism.

In software development, everyone acknowledges that you cannot ship a perfect product. There will always be bugs and users are actually thanked for pointing them out. In journalism however, the expectation is that journalists check and double-check before they publish, and then ship a "perfect" product. If a reader points out a mistake or contradiction, typically the journalist either doesn't respond at all, or responds in a defensive fashion. Jay explains it as tribalism.

Blogging seems to allow for a less defensive attitude. Blog posts are perceived as less finished or less perfect, and bloggers seem more willing to correct and update their copy, while acknowledging readers' feedback.

It's an interesting phenomenon to point out and certainly something that needs to be addressed in the "new news system".

The second connection Jay makes is about usability. Why are geeks not better at making things easy to use? Dave Winer says it's because it's so damn hard to do. And it requires a great sense of empathy – the ability to put oneself in the users' shoes. He mentions Martin Scorsese and Marlon Brando.

Jay sees a nice parallel in that journalism is about making it easy for users to use their own democracy, lowering barriers to participate without much prior knowledge. (This is so true and elegant!)

What else? The Church of the Savvy. That's Jay's description of the undeclared religion of the press. Above anything else, journalists will value, remain loyal to and defend their savvy-ness.

Jay's inspiration of the week is Elvis Costello's recording of Nick Lowe's classic, 'What's So Funny 'Bout Peace Love and Understanding'.

Note-to-self: action points:

  1. Check out Jay's tumblr blog – I didn't know he had one, and I was wondering why Google Reader hasn't served me any blog content from Jay lately (I've subscribed to PressThink);
  2. Check out blogtalkradio, which is what Dave is using for these podcasts. I need to figure out a way to produce podcasts easily and economically.

[REPEAT from June 1: Dave built a dedicated site for 'Rebooting the News', at http://rebootnews.com/. He also created an RSS feed of this podcast series, at http://rebootnews.com/rss.xml. And a package of the first ten episodes which he uploaded as a torrent to Mininova at http://www.mininova.org/tor/2637891. He announced all of this here: http://www.scripting.com/stories/2009/05/30/rebootingTheNews110.html]

And don't miss the FriendFeed room either!