dLIS 540 Information Systems, Architecture and Retrieval
Autumn 2007

Terry surveys Digitopia! Docusphere! Infomania! and other stuff






A student writes...

you mean that future generations will approach information retrieval systems and use words that we cannot anticipate? i'm not sure how i can anticipate words. future words. how does xml help me predict the future?

I completely share your concern that tomorrow somebody will approach an information retrieval system and use a word that I've never heard of.

I, too, don't know how you can anticipate new words.

As far as I know, XML is a language for marking up information and has nothing to do with predicting the future.




A student writes...

1. on your blog you specify that we can now predict how language will evolve, so i am a little bit lost there. i think maybe that we can let language evolve, but still understand xml documents because of how they are coded so we can understand the past and put them into whatever program, whenever. which is sorta the point of decorating martha's data.

On my blog I pointed to an article published in Nature where some linguists had studied the growth and change in word forms. It might be a little strong to say as you do that "we can predict how language will evolve" but I'm sympathetic to your point.
Your second thought is that we will be able to understand xml documents because of how they are coded, even though language evolves.

Generally speaking, I would caution against mixing these two ideas together.
THOUGHT # 1: Scientists seem to have evidence that language evolves.
THOUGHT # 2: Because of the universality of XML as an archiving and distribution format, many (but not all) software applications will be able to suck up information marked up as XML.

If we mix these two ideas together, we might ask: if I mark up my information as XML, does that mean I can finesse the evolution of language? My considered opinion: No.

To illustrate my point, I received an e-mail this morning from MorUmp, a caveman who lived 10,000 years ago (i.e., the Internet connection to his cave had been offline for a long time and just now, messages that he composed long ago are being sent). MorUmp sent me the following XML document, but as you can see even though his XML document is well formed, the evolution of language makes it hard to understand:

<poorut>
<oopMo>BarBQ tonight</oopMo>
<suumTop>7:30 pm my place</suumTop>
</poorut>


As this example illustrates, the language of XML tags would evolve as well and thus we "moderns" have difficulty understanding exactly what MorUmp was attempting to communicate. Sure hope this explanation helps...
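
For the curious, here is a minimal Python sketch of the same point, using the standard library's xml.etree.ElementTree parser. It assumes the closing tag of MorUmp's first element is spelled to match its opening tag (oopMo), and the function name describe is made up for this illustration. The parser happily accepts the message and lists every tag, but it has no way of telling us what any of the tags mean.

```python
import xml.etree.ElementTree as ET

# MorUmp's message, assuming the closing tag of <oopMo> matches its opening tag.
MESSAGE = """<poorut>
<oopMo>BarBQ tonight</oopMo>
<suumTop>7:30 pm my place</suumTop>
</poorut>"""


def describe(xml_text):
    """Parse xml_text and return the root tag plus (tag, text) pairs for its children.

    ET.fromstring raises ET.ParseError if the text is not well formed.
    """
    root = ET.fromstring(xml_text)
    return root.tag, [(child.tag, child.text) for child in root]


root_tag, fields = describe(MESSAGE)
print(root_tag)
for tag, text in fields:
    print(tag, "=", text)
```

Well-formedness is a matter of syntax only. The parser can hand back 'suumTop' and its contents, but what a suumTop is remains MorUmp's secret.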



A student writes...

Hi Terry,
I'm wondering if you could explain what the professors can see as to student activity on go post. In the past Grace has said they can track all activity when a student is signed into the class. Recently I've heard that professors receive a daily report of who has posted to go-post. In particular, I'm wondering if Instructors can see how much time a student spends on line reading the threads. This really has nothing to do with LIS540, since you don't care about weekly postings. Rather, I feel comfortable asking you the question, I think you know the answer, and it would be convenient to receive the answer on your blog.
Generally speaking, I'm not comfortable posting, but I always keep up with the discussions. In the past I've lost too many grade points for not posting, so I'm just curious as to if my time spent reading posts can be taken into account.
Just wondering,



Hi "Just wondering"
It would appear that the new Catalyst GoPost tool keeps a profile of activity of each participant. Here is mine as of October 15, 2007



As I said at the residency, I couldn't care less if you participate on the chat lists or not. Life is too short for such silliness.



A student writes...

I tried a simple "XML" file, but I am not sure it was really XML - that is, I don't know if I marked it correctly as far as the computer is concerned. Excel sure rejected it as XML. It didn't import into Access (or File Maker Pro - I am a Mac user, after all) either.

Here is the file that the student created...



No wonder nobody liked this file. Look at all the formatting garbage at the top and the tags are not well formed and balanced.
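
A rough way to check a file before feeding it to Excel or Access is to hand it to any XML parser first. The sketch below uses Python's standard xml.etree.ElementTree. The three sample strings and the is_well_formed helper are made up for illustration; they are not the student's actual file, and the word-processor header in the third sample is only an assumed example of formatting garbage.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if an XML parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, invented for this illustration.
BALANCED = "<book><title>Dewey</title></book>"
UNBALANCED = "<book><title>Dewey</book>"  # <title> is never closed
GARBAGE_ON_TOP = r"{\rtf1\ansi}<book><title>Dewey</title></book>"  # stray header before the root

for name, sample in [("balanced", BALANCED),
                     ("unbalanced", UNBALANCED),
                     ("garbage on top", GARBAGE_ON_TOP)]:
    print(name, "->", is_well_formed(sample))
```

Only the balanced sample gets past the parser. Saving from a plain text editor rather than a word processor is the usual way to keep formatting garbage out of the file.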


Here is a simple XML file that I created. Notice that it doesn't have any formatting garbage in it and that it is well formed.



So I fed my XML file into MS Excel (Version 2007) and this is what my XML file looks like as a spreadsheet!



Then I fed my XML file into MS Access (Version 2007) and this is what my XML file looks like as a database!



Then I fed my XML file into MS Word (Version 2007) and this is what my XML file looks like as a literary document!





The first deliverable just got harder to do!

Now we can predict how quickly language evolves...Ouch!



I thought that we could just build huge databases and information retrieval would be cool because language was static. No, wait a minute, maybe only documents are static. Language is dynamic but documents are static! That means that future generations will approach information retrieval systems and use words we cannot anticipate! Good grief! There goes information retrieval! But at least we'll have huge databases of documents. Nobody will be able to use them, but that's another issue!



October 11, 2007
...Germans [say] "schwanz" and the French [say] "queue" to describe what English speakers call a 'tail', but all of these languages use a related form of 'two' to describe the number after one. Among more than 100 Indo-European languages and dialects, the words for some meanings (such as 'tail') evolve rapidly, being expressed across languages by dozens of unrelated words, while others evolve much more slowly—such as the number 'two', for which all Indo-European language speakers use the same related word-form. No general linguistic mechanism has been advanced to explain this striking variation in rates of lexical replacement among meanings. Here we use four large and divergent language corpora (English, Spanish, Russian and Greek) and a comparative database of 200 fundamental vocabulary meanings in 87 Indo-European languages to show that the frequency with which these words are used in modern language predicts their rate of replacement over thousands of years of Indo-European language evolution. Across all 200 meanings, frequently used words evolve at slower rates and infrequently used words evolve more rapidly. This relationship holds separately and identically across parts of speech for each of the four language corpora, and accounts for approximately 50% of the variation in historical rates of lexical replacement. We propose that the frequency with which specific words are used in everyday language exerts a general and law-like influence on their rates of evolution. Our findings are consistent with social models of word change that emphasize the role of selection, and suggest that owing to the ways that humans use language, some words will evolve slowly and others rapidly across all languages.
"Frequency of word-use predicts rates of lexical evolution throughout Indo-European history" Nature, 449, 717-720 (11 October 2007)



Search without a query

October 8, 2007
There was an interesting article in the October 7, 2007 NY Times about StumbleUpon. I wove it into our course on the "Finding lots of stuff" page.

A Web service called StumbleUpon has spent the last six years trying to satisfy such a need, perfecting a formula to help you discover content you are likely to find interesting. You tell the service about your professional interests or your hobbies, and it serves up sites to match them. As you “stumble” from site to site, you will feel as if you are channel-surfing the Internet, or rather, a corner of the Internet that is most relevant to you.
Web discovery, or search without a query, is still a niche activity, but StumbleUpon’s growth to 3.5 million registered users from 600,000 two years ago suggests it is on a path to becoming more mainstream.

What is historically significant about 'search without a query' is that it begins to free search from being an exclusively language-based activity.


A Greasemonkey example

October 5, 2007
In the category of "practicing what you preach", here is a real example of altering web pages for personal convenience. I need to log on to the Wiley Manuscript Central site about once a month, and I can never remember the password, etc. So I wrote myself a Greasemonkey script that replaces a bunch of needless verbiage with my important information. [Don't ask me questions such as 'why not let your browser remember the password?' or 'why not write the password on a yellow sticky and put it next to your computer monitor?' or 'why not fight mental decline by memorising the password?' I have no good answers for these questions.]


UWEO Server down! Ouch!

September 30, 2007

Dear students,
The server that hosts the distance lectures will probably be down the entire weekend. When someone is back in the office at UWEO on Monday, I expect the problem will be resolved very quickly. I do not know anything more about the problem than the server is inaccessible.
My apologies for this downtime. Please focus on the readings, discussion and assignment sections of your first module while this is resolved.
Sincerely,
Grace Whiteaker

NOT THE DMLIS 540 COURSE! We're being hosted by the iSchool server! Hooray for us!



Online advertising and Internet searching: Conversion attribution

September 26, 2007
Microsoft is taking solid aim at a business that is arguably outside its core competence: advertising.
Mr. McAndrews has a long-term strategy that boils down to divorcing online advertising from Internet searches. The two have been viewed as a couple, because so many people use portals and search engines as their home base on the Web, but Mr. McAndrews says that model shortchanges advertisers and Web publishers.
Mr. McAndrews’s proposed system, called “conversion attribution,” would track all of the online places where consumers see ads and give advertisers a fuller picture of the various ways that consumers reach them. Tracking is important, because the site that gets credit for prompting a user’s visit is the one that gets paid for it.
Mr. McAndrews contends that search engines, which long have claimed credit for sending people to companies’ Web sites, do not deserve it all.
“Google gets all the credit, and in fact, you might have just gone to Google to type in the U.R.L.,” Mr. McAndrews said, pointing out that people often search for companies’ names after seeing their ads elsewhere.
Using technology from aQuantive’s Atlas division, Microsoft will be able to provide advertisers with a log of all the places on the Internet where people see ads before going to the advertisers’ Web sites. The data is based on individual computers’ electronic signatures, not individual people.
"Microsoft Takes Aim at Google’s Ad Supremacy" The NY Times, September 26, 2007

Snipes and bugs

September 24, 2007
In the category of 'you learn something new every day':

Snipes are just the latest effort by network executives to cram promotions onto television screens in the age of channel surfing, ad skipping and screen-based multitasking. At first, viewers may feel a slight jolt of pleasure at the sight of a new visual effect, they say, but over time the intrusions contribute to the sense that the screen is far more cluttered — not just with ads, but with news crawls and other streams of information.
For better or worse, viewers say, the additions are making the experience of watching television more closely mirror the feeling of using a computer.
That may be so, network executives say, but the extra content is here to stay. The snipes — not to be confused with bugs, those network logos that pop up in screen corners during shows — are important enough to the beleaguered television industry that the networks plan to tolerate the backlash.
"As the Fall season arrives, TV screens get more cluttered" The NY Times, September 24, 2007

A student writes...

Hi Terry,
Am I correct that in your essay on Orthography for the first week 'EPIC' should be 'ERIC'? I am finding 'EPIC' repeatedly, but it seems to be referring to 'ERIC' since the first occurrence clearly refers to OCLC's on-line database retrieval system. I have found this enough now that it has become a little confusing. I thought you might wish to know.

I wrote that essay a number of years ago and it reflects searching during the "modern database period" (i.e., before the Web), say, 1960-1990. During this period one could search the ERIC database on Dialog, DataStar, BRS and OCLC EPIC. Each had its own search language and conventions. The point of the essay is to display how orthography is handled variously by these popular search query languages.
Nowadays, I believe that OCLC doesn't offer EPIC searching anymore; BRS is gone bye bye, and Dialog, which bought DataStar, has been marginalized. I believe that most people search the ERIC database on the Web for free. How tragic for me! I once was an expert on those systems (i.e., fluent in the four searching languages). So much of my youth was spent learning those systems; how we misspend our youth!
Note that the orthography problems haven't gone away...they simply re-appear on the Web. The orthography problems are intellectual/conceptual/linguistic problems that are sand in the gears of such information systems. [Such is my opinion, you are free to disagree.]



A student writes...

Dear professor,
Considering that we are the readers for the first reading I would like to ask whether we should start posting related to it starting from next Monday, when the quarter officially starts, or only after the residency?

Do it!
Digital media transcend time and space...our course has 'already begun', the course pages are 'already available', you don't have to wait for the residency. At the residency, however, I'll explain the function of our 'expert readers' and how I would like them to read their paper, create a chat thread about their paper and stand ready for the questions and comments from their classmates.




A student writes...

Hello again, Professor:
I meant to ask if it was intended that the class web-page not be secured. For all of my other courses, to the present, I have had to login through MYUW to get to the class web-site, but with this course I did not. I thought I'd mention it in case this was an oversight or there is a glitch.

No oversight, no glitch. Like the NY Times, I'm an open Web kinda guy. See entry below about the open access advantage...twice the number of citations = twice the influence, etc. I actually log the use of my web pages...point your browser at http://faculty.washington.edu/tabrooks/MonthlyUse.htm and examine the traffic to my web pages.



The NY Times says it's so!

September 19, 2007
Times Select: I read the NY Times every day, and as a home subscriber I was given special online privileges as a Times Select customer. If you weren't a home subscriber, then the Times Select privileges cost $. Oops! The open nature of the Web changes everything:

Dear Home Delivery Subscriber,
We are ending TimesSelect, effective today. This will not affect any services you are already receiving as a home delivery customer.
The Times's Op-Ed and news columns are now available to everyone free of charge, along with Times File and News Tracker. In addition, The New York Times online Archive is now free back to 1987 for all of our readers.
Why the change?
Since we launched TimesSelect, the Web has evolved into an increasingly open environment. Readers find more news in a greater number of places and interact with it in more meaningful ways. This decision enhances the free flow of New York Times reporting and analysis around the world. It will enable everyone, everywhere to read our news and opinion - as well as to share it, link to it and comment on it.



[A teaching moment]

Information architecture: "increasingly open environment"

Information systems: "interact with it"

Information retrieval: "Readers find"


A student writes...

Quick question: do you want us to share our deliverables with the class here or just submit them to you...where? Exactly?
I really like your "classroom design" and the assignments look like fun too.

Today is Thursday September 13 and the natives are getting restless!
At our residency I'll explain how the course works, where the deliverables are, how to deliver the deliverables, how we will use our chat list, etc.
I'll be happy to have you looking at the first deliverable, but hold off until we make a proper beginning at the residency.
Our residency will be on Friday September 28 in Parrington Hall 101 from 1:30 pm to 5:20 pm.




Open access advantage?

July 12
There are many observations/arguments that putting a paper out in open access on the Web gives it a great advantage in visibility (i.e., increased citation), but apparently this is not true for the Astrophysical Journal.

Here we have shown conclusively that for the Astrophysical Journal there is no cost component to the citation differential, confirming our previous result in astrophysics (Kurtz, et al. 2005a) and Moed's (2007) in condensed matter physics. There are a number of excellent arguments in favor of changing the scientific publication system to an open access model. The open access citation advantage is not one of them.
"Open Access does not increase citations for research articles from The Astrophysical Journal"



A student writes...

I'm quite familiar with distance course model. I falsely was under the assumption that since there are two separate sections that each section would have their own online discussion board. In my experience so far in this program I've had better experiences in smaller classes in regards to sharing ideas on discussion boards. Conversely in larger discussion boards thoughts and ideas can easily be lost amongst the glut of threads and posts. However if both sections are going to be participating on one discussion board then my concerns are irrelevant.

July 15, 2007
I take a "the more the merrier", "are we having fun yet?", and "if there's no blood, there's no foul" approach to teaching this distance class. We have one big discussion board that everyone can use. Usually, the vast majority of students just lurk anyway and it is only a vocal minority that contributes. I wouldn't worry about your thoughts and ideas being lost. Point your browser at "https://catalysttools.washington.edu/gopost/board/tabrooks/1814/", log on and have at it. Start a new conversation thread if you like. If you are concerned with the information management aspects of many active threads, I've written a Greasemonkey script that places all the conversation threads with 'new' content at the top of your screen. Point your browser at "http://projects.ischool.washington.edu/tabrooks/GreaseMonkey/ePostRemix/ePostRemix.htm". It does require the Firefox browser, however.



Information architecture DOES matter

July 14, 2007
Dewey classification = Information architecture = Library environment
Trying to build popularity, many public libraries across the country have been looking more like big chain bookstores, offering comfortable easy chairs, coffee bars and displays of the latest best sellers. But the new library in this growing Phoenix suburb has gone a step further. It is one of the first in the nation to have abandoned the Dewey Decimal System of classifying books, in favor of an approach similar to that at Barnes & Noble, say, where books are shelved in 'neighborhoods' based on subject matter.
"Dewey? At this library with a very different outlook, they don't" The NY Times, July 14, 2007

Information architecture DOES matter

July 10, 2007

AJAX is a web page architecture that permits portions of a web page to be updated instead of the whole page being refreshed. An example: Google Maps uses an Ajaxian web page architecture...you're dragging the little map around...without you noticing it, the web page is signaling the web server to send it fresh new map segments...the whole effect is of a huge map hiding behind the little square hole in your web page.
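
Here is a toy Python sketch of the "little square hole" idea. It is only an illustration under stated assumptions, not how Google Maps is actually built: the 256-pixel tile size, the MapWindow class, the tiles_in_view function and fake_fetch (a stand-in for the background request to the web server) are all names invented for this example. The point is that each drag asks the server only for the map segments that are newly visible, and the rest of the page stays put.

```python
TILE = 256  # assumed tile size in pixels, chosen for illustration


def tiles_in_view(x, y, width, height, tile=TILE):
    """Return the set of (col, row) tiles that overlap the viewport."""
    first_col, last_col = x // tile, (x + width - 1) // tile
    first_row, last_row = y // tile, (y + height - 1) // tile
    return {(col, row)
            for col in range(first_col, last_col + 1)
            for row in range(first_row, last_row + 1)}


class MapWindow:
    """A viewport that fetches only the tiles it has not already loaded."""

    def __init__(self, fetch):
        self.fetch = fetch  # stand-in for the background request to the server
        self.cache = {}     # tiles already delivered to the page

    def drag_to(self, x, y, width=512, height=512):
        """Move the viewport and return the list of tiles newly requested."""
        needed = tiles_in_view(x, y, width, height)
        missing = sorted(needed - set(self.cache))
        for tile in missing:
            self.cache[tile] = self.fetch(tile)
        return missing


def fake_fetch(tile):
    """Pretend server: returns a label instead of a real image."""
    return "image for tile %s" % (tile,)


window = MapWindow(fake_fetch)
first = window.drag_to(0, 0)     # initial view: every visible tile is requested
second = window.drag_to(256, 0)  # drag one tile to the right
print(first)
print(second)
```

The first call requests all four visible tiles. The second requests only the two tiles in the newly exposed column, which is also why page views stop being a useful yardstick: the page never reloads.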

A leading online measurement service will scrap rankings based on the longtime industry yardstick of page views and begin tracking how long visitors spend at the sites...Yahoo and others, however, are increasingly using a software trick called Ajax that allows sites to update data automatically and continually, without users needing to pull up new pages. Page views decline as a result.
"Nielsen Revises its gauge of web page rankings" The NY Times, July 10, 2007



Wikipedia: Encyclopedia or news wire?

July 9, 2007
"...the bizarre tale of the professional wrestler Christopher Benoit who during one weekend last month killed his wife, Nancy, and later his son and himself. The police didn't find the bodies until Monday afternoon, June 25, but a Wikipedia entry on Mr. Benoit had reported his wife's death matter-of-factly 13 hours earlier.
...with more and more kinds of media, there are more and more intermediate levels of info availability
...This is the crucial dividing line: between reporting on events in as close to real time as possible--which can prove jarring to society, and journalists in particular, but hardly supernatural--and predicting things around the bend."
"In the blink of a byte, future becomes the past" The NY Times, July 9, 2007


Former student in NY Times!

July 9, 2007
"Jessamyn West, 38, an editor of 'Revolting Librarians Redux: Radical Librarians Speak Out', a book that promotes social responsibility in librarianship, and the librarian behind the Web site librarian.net (its tagline is "putting the rarin' back in librarian since 1999"), agreed that many new librarians are attracted to what they call the "library 2.0" phenomenon. "It's become a techie profession," she says."
"A hipper crowd of shushers" The NY Times, July 8, 2007.

I remember when Jessamyn sat in the front row of my LIS 503 class.


Net consensus replaces literary taste!

July 6, 2007
James Surowiecki in The New Yorker, July 9 & 16, 2007 writes "Last month, the publisher Simon & Schuster announced a partnership with a Web site called MediaPredict which would use the collective judgment of readers to evaluate book proposals. The deal drew scorn from many, who saw it as evidence that publishers, in an era of stagnant sales, had so lost confidence in their own judgment that they were reduced to the methods of 'American Idol.' Asking readers to weigh in on a book's commercial prospects was a recipe for mediocrity, and the experiment was 'doomed to fail.' Yet even the idea's critics recognized that it was a response to a real problem: most books today are not economically successful, which means that much of the time and money that publishers invest in projects is wasted."


Words still have a future!

July 2, 2007
There was an article in the NY Times Sunday Magazine "All the news that's fit to print out" by Jonathan Dee about Wikipedia. The article quotes Jimmy Wales, founder of Wikipedia, commenting on media and writing: "The classic question I get at conferences is 'Do you think that Wikipedia will remain text, or will it be more and more video in the future?' I think it's pretty hard to beat written words. Especially for collaboration because words are the most fluid medium for shaping and reshaping and collaboratively negotiating something. It's kind of hard to do with video, and I don't think that's just a technical barrier."

June 29, 2007
This is my course blog. I'll put announcements here and as the quarter rolls along, I'll post comments about leading edge events in digitopia, docusphere, infomania, etc., etc.