Part I – Google News Personalization: Scalable Online Collaborative Filtering.

These are some notes for my data mining class. I haven’t read the entire article yet, but this is what I have so far.

Main Focus:
The paper’s main focus is Google’s news website, http://news.google.com, and how to recommend news stories to its users. In particular, it describes a recommendation system that helps users find useful information that can’t easily be located with a search engine.

Why Google News
Google News was chosen because of the number of people who use the service (millions of users a month, according to the article) and the amount of “item churn” the site produces. To clarify, item churn is the process by which news stories are added to and removed from the site. On Google News, item churn is high: new articles are added every few minutes, and existing articles are removed just as quickly.

Rating System vs Points Earned System
The paper touches on a few algorithms, but I’ll just go over the overall idea. Instead of asking for ratings, the system treats a user’s click on an article as a positive point. For example, if I click on an article, the article gets one point, while not clicking gives it no points at all. This is in contrast to a rating system like Amazon.com’s, where a user gives an item a rating from 1 to 5.
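Here is a minimal Python sketch of the difference, written by me as an illustration and not taken from the paper. The (user, story) click format, the helper names, and the rule that repeat clicks by the same user count only once are my own assumptions.

```python
from collections.abc import Iterable

# Minimal sketch: implicit click points vs. explicit 1-5 ratings.
# Assumption: repeat clicks by the same user on the same story count only once.


def click_points(clicks: Iterable[tuple[str, str]]) -> dict[tuple[str, str], int]:
    """Turn (user, story) click pairs into binary points: 1 if clicked, absent otherwise."""
    return {(user, story): 1 for user, story in clicks}


def points_for(points: dict[tuple[str, str], int], user: str, story: str) -> int:
    # A story the user never clicked gets 0 points; it is not recorded as a dislike.
    return points.get((user, story), 0)


# Explicit ratings, Amazon-style: the user chooses a value from 1 (bad) to 5 (great).
ratings = {("user1", "book_x"): 1, ("user1", "book_y"): 5}

clicks = [("user1", "story_a"), ("user1", "story_b"), ("user1", "story_a")]
points = click_points(clicks)
print(points_for(points, "user1", "story_a"))  # 1: clicked twice, still one point
print(points_for(points, "user1", "story_c"))  # 0: never clicked
print(ratings[("user1", "book_x")])            # 1: an explicit negative opinion
```

Only the explicit ratings can hold a negative opinion; the click points are either 1 or missing.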

Limitations
The proposed approach can’t give an article a negative rating. If a user clicks on an article and then hates it, there is no way to detect this; the article keeps its positive point. Compare this with Amazon’s and Netflix’s rating systems, where a rating of 1 is a clear indicator of negative feelings toward the item.

The Math (Notations Only)
N = total number of users.
U = the set of users, {u1, u2, ..., uN}, each of whom has a click history.
Cu = the click history of a specific user u.
s = a story within a click-history set.
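To make the notation concrete, here is a minimal Python sketch (my own illustration, not code from the paper) that stores the users and their click histories in one dictionary: each key is a user u, and each value is that user’s click history Cu, a set of stories s. The user names and story ids are hypothetical.

```python
# Sketch: the notation as Python data (user names and story ids are hypothetical).
# Each key is a user u; each value is that user's click history Cu, a set of stories s.
click_histories: dict[str, set[str]] = {
    "user1": {"s1", "s2", "s3"},
    "user2": {"s2"},
    "user3": set(),  # a user who has not clicked anything yet
}

N = len(click_histories)            # total number of users
C_user1 = click_histories["user1"]  # Cu for u = user1

for s in sorted(C_user1):  # sorted only so the printed order is stable
    print("user1 clicked", s)
print("N =", N)
```

A set fits here because a click history only records which stories were clicked, not the order they were clicked in.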

What is a click history? A click history is the collection of articles a user has visited within a specific period of time. For example, if I click on the news article “Snoopy, hacked the Pentagon”, then read “Armandos Dog hacked the Pentagon”, and finally read “Worlds greatest dog is computer wiz – owner say, ‘Pssh i know’”, the click history for user1 will be:

Cu = {“Snoopy, hacked the Pentagon”,
“Armandos Dog hacked the Pentagon”,
“Worlds greatest dog is computer wiz – owner say, ‘Pssh i know’”}

Here, each s is one of the stories above, taken in no particular order, since a click history is a set.
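The “specific period of time” part of the definition can be sketched as a filter over a timestamped click log. This is a minimal Python sketch under assumptions of my own: the (user, story, time) log format, the hypothetical `click_history` helper, the made-up timestamps, and the one-day window are illustrations, not details taken from the paper.

```python
from datetime import datetime, timedelta

# Hypothetical click log: (user, story title, time of the click).
click_log = [
    ("user1", "Snoopy, hacked the Pentagon", datetime(2008, 1, 1, 9, 0)),
    ("user1", "Armandos Dog hacked the Pentagon", datetime(2008, 1, 1, 9, 5)),
    ("user1", "Worlds greatest dog is computer wiz – owner say, ‘Pssh i know’",
     datetime(2008, 1, 1, 9, 12)),
    ("user1", "Some older story", datetime(2007, 12, 20, 8, 0)),
]


def click_history(log, user, now, window):
    """Return Cu: the set of stories `user` clicked between `now - window` and `now`."""
    start = now - window
    return {story for u, story, t in log if u == user and start <= t <= now}


now = datetime(2008, 1, 1, 10, 0)
C_u = click_history(click_log, "user1", now=now, window=timedelta(days=1))
print(len(C_u))  # 3: the older story falls outside the one-day window
```

Because the result is a set, feeding the same clicks in a different order produces the same Cu.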

I’ll add more and revise as I keep reading the paper.
http://www2007.org/papers/paper570.pdf

Armando Padilla
