Zoetrope promises a view of the Web over time
While it's just as easy to locate a 15-year-old page in HTML 1.0 as a blog post published this afternoon, the front page of last Friday's BetaNews is as gone as 49-cent gas. Enter Zoetrope, aiming to track Web information over time.
Time keeps on slippin' online -- but wouldn't it be cool to actually watch it happen in something faster than real-time?
For now, Zoetrope is mainly a research project -- a joint effort from researchers at the University of Washington and Adobe Systems. The paper "Zoetrope: Interacting with the Ephemeral Web" (PDF available here) describes a fairly complex system of tools working in concert:
- a visual programming toolkit and set of interactions for building queries
- a new semantics and set of operators for manipulating streams of data gathered over time
- indexing structures for handling the dataset
- new ways of re-rendering pages in the dataset, and
- the dataset itself, collected as often as needed (hourly? daily? by the minute, for certain purposes?) by a newly designed Web crawler.
Hasn't archive.org already been invented, you ask? Yes, and if you're wondering how a certain site has looked over time, it's a marvelous library...as long as by "over time" you generally mean "every few months." Zoetrope, on the other hand, envisions you tracking changes by the day or even by the hour, and linking them to changing data on other sites.
Eytan Adar, a researcher at the University of Washington, has published a number of papers on "temporal informatics" -- how the Web changes over time, and how we interact with that changing data. Zoetrope is part of his dissertation, currently underway, and he is the project lead.
Watching the successive changes to, say, BetaNews could be amusing, but the juice of the Zoetrope is in its lenses -- bits of code that excerpt part of a page and allow you to follow strictly that bit over time.
The flashiest version of the tech is the visual lens, which would let the user simply drag the mouse to tell the browser what to keep an eye on. But there are also structural lenses and textual lenses, both of which compensate to some degree for the annoying tendency of site designers to scramble things around because they can.
Lenses, in other words, focus on one chunk of information on one page over time. That could be a lot of data, so the plan provides for filters on lenses -- "Watch the top-headline spot and grab anything that mentions the iPhone," for instance, or, "If nothing's changed on this page, don't retain a copy of the current peek."
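To make the lens-plus-filter idea concrete, here is a minimal Python sketch. It is not Zoetrope's actual code: the `Snapshot` structure, the `textual_lens` function and its parameters are all hypothetical names invented for illustration, and it assumes each crawled copy of a page has already been split into named regions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, Iterator, Optional, Tuple


@dataclass
class Snapshot:
    """One crawled copy of a page (hypothetical structure, not Zoetrope's)."""
    captured_at: datetime
    regions: dict  # region name -> text found in that region at capture time


def textual_lens(
    snapshots: Iterable[Snapshot],
    region: str,
    keep: Optional[Callable[[str], bool]] = None,
    skip_unchanged: bool = True,
) -> Iterator[Tuple[datetime, str]]:
    """Follow one region of a page over time.

    `keep` is an optional filter on the region's text; `skip_unchanged`
    drops a capture whose text matches the previous capture.
    """
    previous = None
    for snap in sorted(snapshots, key=lambda s: s.captured_at):
        text = snap.regions.get(region)
        if text is None:
            continue  # the region was missing from this capture
        if skip_unchanged and text == previous:
            continue  # nothing changed, so don't retain this copy
        previous = text
        if keep is None or keep(text):
            yield snap.captured_at, text
```

Calling `textual_lens(snapshots, "top-headline", keep=lambda t: "iPhone" in t)` would correspond to the "watch the top-headline spot and grab anything that mentions the iPhone" example, with the region name again being an assumption.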
Lenses and filters are "bound" or "stacked" to form queries -- say, "tell me how the stock market reacts to news of variations in component supplies from various manufacturers, how long it takes for the market to show those effects, and which news sources have the biggest impact on prices." Results could be shown in various formats (timelines, cluster visualizations, movies), and the team has created a tool that ships data out to a Google Spreadsheet for even more fun.
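One way to picture binding two lenses into a query is to line up their outputs by timestamp. The Python sketch below is purely illustrative and assumes each lens has already produced a time-sorted list of `(time, value)` pairs; `price_change_after` and the fixed `lag` are hypothetical, not part of the Zoetrope paper.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta
from typing import List, Tuple


def price_change_after(
    headlines: List[Tuple[datetime, str]],
    prices: List[Tuple[datetime, float]],
    lag: timedelta,
) -> List[Tuple[str, float]]:
    """Pair each headline with the price move observed `lag` later.

    `prices` must be sorted by time. Headlines with no price reading
    before them, or none at or after headline time + lag, are skipped.
    """
    times = [t for t, _ in prices]
    results = []
    for when, text in headlines:
        before = bisect_right(times, when) - 1  # last price at or before the headline
        after = bisect_left(times, when + lag)  # first price at or after headline + lag
        if before < 0 or after >= len(times):
            continue
        results.append((text, prices[after][1] - prices[before][1]))
    return results
```

Running the same function with several values of `lag`, or once per news source, is one rough way to approach the "how long does it take" and "which sources matter most" parts of the example query.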
There are already sites that let you track data over time, of course. Zoetrope's strength, as Adar explains, comes when people ask new kinds of questions that require synthesis of data from multiple sources. That may sound like deep water for people who don't spend their days correlating data, but don't count out the civilians. Adar says that though some advanced users would be comfortable designing their own queries, "What we've been thinking about recently is how to go beyond this core set of users to a broader audience. An interesting direction is to let people export the visualizations they create and allow those to be embedded in Web pages or e-mailed around."
It all sounds splendid, especially if you've ever searched cached Google pages or archive.org for something you know you saw once upon a time. As the project moves from research-project status to reality, though, it's going to take resources beyond the university. Adar expressed an enthusiastic wish to find some way to work with the archive.org crew, and foresees several possibilities for managing the tech resources necessary to make Zoetrope spin.
As Zoetrope's currently conceived, it doesn't require any action from the owners of searched sites -- only from the would-be Zoetrope user. Not that a little help wouldn't be welcomed, of course.
"It would be wonderful if site providers found enough value in Zoetrope to be willing to consider modifying their sites," says Adar. "I actually think that very small markup changes to the sites would help Zoetrope a great deal. For example, if the sites added 'hints' to the HTML that better marked what parts of the document were going to be consistent that would really help the way Zoetrope worked. But we're pretty pragmatic about it...it would be great to have, but we have to try to build solutions for Web sites that don't provide this information."
Even you, honored BetaNews reader, could be part of the solution. "We also think there is an opportunity to have Zoetrope work in a distributed fashion (P2P), since it would be hard to have one central service collecting every version of every page," says Adar. "Being able to have multiple 'observers' working together would be great, and having some standard to share this data would make a lot of things easier. An API for letting people build new applications using the Zoetrope data might also be something for us to look at in the future."