Wednesday, February 11, 2004

My favorites: A killer application for WinFS?

Update 1:
I realize that some human readers tried to go through the following post but had to give up after a few paragraphs of my horrible English prose. I will try to improve it, but it will take time... It is 3 AM and I have drunk three different kinds of excellent white wine and one kind of red wine, so I had better not touch it right now...

Update 2:
I see some people still try and fail to read this article. The only comment so far comes from a guy who is working along the same lines on the Mozilla platform (it seems to be very interesting, so go check the link in the comments). So I am doing my best right now to improve the readability.

The original article:
I have been playing in my mind with the idea of a better Favorites feature for years, long before I heard about WinFS. The idea first came out of my own basic needs, and I have been gradually enriching it with different concepts I have picked up from everywhere.

When I saw the "Stuff I've Seen" demo I realized that this idea is closer than ever to being implemented by Microsoft using WinFS.

Problem statement:

Every day in my geek life, I have to do some research on different subjects. I usually start in Google by typing some words, and in a few minutes I have hundreds of Internet Explorer windows open. I try to discard the useless material as quickly as possible, but usually I find lots of interesting and lengthy articles that I mostly read only superficially. Very often I have to go home while I still have dozens of windows open with possibly promising content.

The spectrum of subjects I keep track of is so diverse and is scattered across so many dimensions! Besides my current work, I usually keep an eye on technical subjects like operating systems, development platforms, hardware technologies, technology companies, certifications, etc. I also like to read fresh news about some computer science subjects like evolutionary computation, neural networks, and other forms of machine learning. Moreover, there are people I like who have their own web pages and publish pictures of their families. I also read lots of different news pages and blogs (although I am aware of the alternative of using an aggregator for blogs).

So I am always trying to keep a collection of valuable links on the most diverse array of subjects for future reference. The problem is that there seems to be no effective way to keep this huge link collection sorted out in a way that makes it useful later.

The Favorites feature in Internet Explorer (and in other browsers) is for me, in many ways, the equivalent of the "filling forms of metadata" approach that provides such a poor experience for average users. It takes too much time and energy to keep an organized tree of all the things I am interested in! Whenever I try to clean up my Favorites, it can take a day, so I don't try anymore.

So what I do lately is the most ridiculous thing: I don't use Favorites; instead I right-click on the page and use the "Create Shortcut" menu (sometimes I cannot do it this way because of frames, but that is another story).

So yes, it is true: Jim Allchin cleaned up my desktop so I could fill it with hundreds of "e" icons.

The reason this is not an absolutely ugly solution for me is the same reason caches are such a successful means of improving performance: temporal locality. The links on my desktop are usually the most recently created ones, so they are a good sample of what I have been investigating in the last few days and of what I will need to find in the near future.

From time to time I clean up by moving the links to a special folder (stupidly named "Stuff") that I have on my desktop.

Currently the "Stuff" folder is the software incarnation of a black hole. There are more than two thousand links in there, with no basic structure or taxonomic index. I only keep the page titles (which too often are "Cannot find server" because of a conspicuous bug in Internet Explorer) and the dates.

I am sure 10% of the links point to pages that are not online anymore. I also know some links are more important to me than others, some are more popular than others, some pages link to each other, some must be child pages of others, some are updated often, and some I have visited more than twice in the last month while others I haven't. All this could be useful metadata if there were a simple way to gather it.
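The small amount of metadata already sitting in a folder like "Stuff" can be gathered with very little code. The following is a minimal sketch, assuming the shortcuts are the usual Internet Shortcut (`.url`) files, which are INI-style text with an `[InternetShortcut]` section and a `URL` key. The function names `read_shortcut` and `scan_folder` are hypothetical, and the file's modification time stands in for "the date".

```python
import configparser
import os
from datetime import datetime


def read_shortcut(path):
    """Return title, URL and modification time for one .url shortcut file."""
    # interpolation=None: URLs often contain '%', which configparser would
    # otherwise try to interpret. strict=False tolerates duplicate keys.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(path)
    url = parser.get("InternetShortcut", "URL", fallback=None)
    # The shortcut's file name (without extension) is the saved page title.
    title = os.path.splitext(os.path.basename(path))[0]
    modified = datetime.fromtimestamp(os.path.getmtime(path))
    return {"title": title, "url": url, "modified": modified}


def scan_folder(folder):
    """Collect a record for every .url file directly inside `folder`."""
    records = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(".url"):
            records.append(read_shortcut(os.path.join(folder, name)))
    return records
```

Calling `scan_folder` on the folder would give a list of dictionaries that any later step (sorting, flagging, mapping) could start from.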

So why do I still feed this black hole? Well, all this shit is waiting for the right killer application to sort it out, to make multiple graphical maps of it in multiple dimensions, and to help me filter it every time I am looking for something I know I have already seen.

As I said at the beginning of this entry, I have been playing with the idea of implementing such a tool for years now. Of course I haven't done it. Instead I have come to the conclusion that I won't ever do it alone. So I am blogging it in case somebody has already written the program, or in case somebody wants to take up the challenge. Either way, I would love to participate in a project like this.

Envisioning the solution:

I have seen many technologies I thought I could use to build such an application. Here is a short list of those technologies:

Web services: Of course, some of the required processing is too heavy for a client computer with a thin pipe to the Internet. There is huge potential in a smart client connected to a set of heavyweight web services.

Latent Semantic Indexing: I found out about this when researching alternative ways to evaluate the semantic proximity of documents.
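To make "semantic proximity" a bit more concrete, here is a minimal sketch of the plain bag-of-words baseline: each page becomes a vector of word counts, and two pages are compared by the cosine of the angle between their vectors. This is not LSI itself. LSI goes one step further and applies a truncated singular value decomposition to the term-document matrix, so that pages can come out as close even when they share no literal words. The function names and the sample texts are made up for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count lowercase alphanumeric tokens in a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical page texts, standing in for fetched page content.
som = term_vector("self organizing maps neural networks")
nn = term_vector("neural networks machine learning")
wine = term_vector("white wine red wine")

related = cosine_similarity(som, nn)     # shares "neural" and "networks"
unrelated = cosine_similarity(som, wine)  # shares no words at all
```

With this baseline the two machine learning pages score higher than the machine learning page against the wine page, which is the kind of signal a link organizer could use to group related entries.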

"Stuff I've Seen" (SIS): It is an application that Susan Dumais demoed during the Bill Gates speech at Comdex 2003. Actually, Susan was a prominent researcher in the Latent Semantic Indexing camp before joining Microsoft. I am not very familiar with SIS, but when I saw it, I realized it provides most of what I have been thinking about and then some. I would just add the ability to "flag" pages I see in Internet Explorer with different levels of importance, or with some different connotation. I don't want to remember everything I have seen! The best thing I could think of is that it should allow me to assign an emoticon to each entry, so that SIS also has a chance to push down the things I don't really care about. That would completely eliminate the need for "favorites". OneNote-like functionality should also be very tightly integrated with SIS. Of course, SIS also has the potential to record how much time I have spent with each piece of information, or even to use some biometric signal to determine my level of interest. I would also like it to be more of a "launching pad" for all my day's activities. It could become some kind of mix of Outlook Today and the Start Menu. It should be time-aware in another sense, too. For example, if I read the same group of blogs every day at 9:00 AM, it should automatically show those links on top.

Of course, at the core of "Stuff I've Seen" you will find WinFS.

WebSOM: An application of self-organizing maps to Internet Exploration. I have been fascinated with the potential of Kohonen's SOMs for a long time. I once implemented one in Visual Basic!
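For readers who have not met a self-organizing map before, here is a minimal sketch of the idea, assuming a one-dimensional row of units and tiny numeric feature vectors. It is not the WebSOM implementation, and all names and parameter values are illustrative choices. Each input pulls its best matching unit, and to a lesser extent that unit's neighbors, toward itself, while the learning rate and neighborhood radius shrink over time.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )


def train_som(data, n_units=4, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D self-organizing map and return its weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Start from random weights in [0, 1); inputs are assumed to be scaled
    # to roughly the same range.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Units near the winner on the map move more than far ones.
                influence = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * influence * (x[d] - w[d])
    return weights


# Two loose groups of hypothetical 2-D "page features".
pages = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
         [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
som_weights = train_som(pages)
positions = [best_matching_unit(som_weights, p) for p in pages]
```

After training, `positions` holds the map unit assigned to each page, and pages with similar features should tend to land on the same or neighboring units. A real link map would use a 2-D grid and document vectors instead of these toy points.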

Buzan Centres: A powerful graphical representation technique that tries to mirror the way we organize information "inside".

Hyperbolic trees: While looking for the original company behind hyperbolic trees (it took half an hour because I couldn't find the link in my Favorites!) I found that they have already coupled their visualization technology with some advanced knowledge management tools. The only bad thing is that it has nothing to do with managing Favorites links, but I have to say wow!

PageRank: It would be great to be able to integrate some of Google's web services. I would ask Google for the PageRank of each page, store it, and use it for sorting search results.


Semantic Web: I don't consider myself either a believer or a detractor. I just think that the SW can provide an even cleverer heuristic than PageRank, and thus I see it as valuable.

MSDN Favorites Web Service: It was an attempt to build a Web Services version of the Windows Favorites feature. It was interesting. Unfortunately it seems to be no longer available.

What else would I do?
  • I would also like to use a taxonomic web service, if one existed out there. You pass your URL and you get the ID of a node in a taxonomic tree.
  • I would cache versions of pages locally in case they become inaccessible or change over time.
  • I would flag the ones that are no longer online.
  • I would keep the names of the links updated with the current titles of the pages.
  • I would make use of time data like creation time, last access time, and usage frequency as a way to sort things out in graphical representations.
  • Etc.
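The time-based item in the list above can be sketched as a simple score that combines how often and how recently a link was used, in the same spirit as the temporal locality argument earlier in this post. This is only an illustration under assumptions: the `Link` record, the `score` formula, and the 30-day half-life are all hypothetical choices, not part of any existing tool.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Link:
    title: str
    url: str
    created: datetime
    last_access: datetime
    visits: int


def score(link, now, half_life_days=30.0):
    """Frequency weight times a recency factor that halves every half-life."""
    age_days = max((now - link.last_access).total_seconds() / 86400.0, 0.0)
    recency = 0.5 ** (age_days / half_life_days)
    frequency = 1.0 + math.log1p(link.visits)  # diminishing returns on visits
    return frequency * recency


def rank(links, now):
    """Return links sorted from most to least relevant right now."""
    return sorted(links, key=lambda link: score(link, now), reverse=True)


now = datetime(2004, 2, 11)
links = [
    Link("B", "http://example.com/b", datetime(2003, 1, 1),
         datetime(2003, 4, 17), visits=10),   # popular once, long untouched
    Link("C", "http://example.com/c", datetime(2004, 2, 9),
         datetime(2004, 2, 10), visits=1),    # new, seen once yesterday
    Link("A", "http://example.com/a", datetime(2003, 6, 1),
         datetime(2004, 2, 10), visits=10),   # popular and recent
]
ordered = [link.title for link in rank(links, now)]
```

Here the popular and recently used link comes first, the fresh but rarely visited one second, and the long-abandoned one last. The same score could drive the size or position of an entry in a graphical map.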

Moving to MSDN

I haven't decided yet, but it is very likely that I will stop blogging here for some time. For some background, I have moved to the sate...