LIS 540 Information Systems, Architecture and Retrieval
Autumn 2007
Week two: Information architecture in the 21st century!
This week's lecture begins with this timeless observation about human writing systems:
What's the big deal about separating content from presentation, huh?
Consider the dynamics of living in a world of billions of web pages, millions of database records, etc., etc., and somebody has to 'manage' all that stuff.
Have a look here: "Content management systems" - Wikipedia
Computers process all text - letters or numbers - as series of binary numerical codes - 1's and 0's. When a computer writes the letter 'A' on to your hard drive, it doesn't create an image of the letter 'A', but writes a series of 1's and 0's that represent the letter 'A' from a table of code. When your computer "reads" the letter 'A' from your hard drive, it really reads a series of 1's and 0's and then consults a font file for selecting the character shape of 'A' that it shows on the computer monitor.
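To make the point concrete, here is a minimal Python sketch (not part of the original lecture) that looks up the numeric code for 'A' and prints the 1's and 0's that stand for it. The variable names are illustrative only.

```python
# Show the numeric code and the bit pattern a computer stores for a character.
letter = "A"

code_point = ord(letter)           # the number assigned to 'A' in the code table
bits = format(code_point, "08b")   # the same number written as eight binary digits
stored = letter.encode("ascii")    # the single byte that would be written to disk

print(letter, code_point, bits, stored)
```

Nothing in those bits describes the shape of the letter. The shape comes from the font that the display software consults afterwards.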
Bob Bemer developed the American Standard Code for Information Interchange, ASCII. In 1960, there was no such standardization. IBM's equipment alone used nine different character sets. "They were starting to talk about families of computers, which need to communicate. I said, 'Hey, you can't even talk to each other, let alone the outside world,'" says Bemer, who worked at IBM from 1956 to 1962.
ASCII is a seven-bit code: 128 numeric codes, zero through 127, assigned to letters, digits, punctuation marks, and the most common special characters. "Extended ASCII" character sets (there are several, not one) use an eighth bit to add another 128 codes, 128 through 255, for additional special, mathematical, graphic, and foreign characters.
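A short Python sketch of the seven-bit limit follows. It assumes ISO 8859-1 (Latin-1) as one example of an eight-bit extended set. That choice is an illustration, since the lecture does not name a particular one, and `is_ascii_code` is a hypothetical helper.

```python
def is_ascii_code(n: int) -> bool:
    """Return True if n fits in seven bits, i.e. lies in 0 through 127."""
    return 0 <= n < 2 ** 7

text = "caf\u00e9"  # "café": the last letter lies outside ASCII

for ch in text:
    print(repr(ch), ord(ch), is_ascii_code(ord(ch)))

# Plain ASCII cannot represent the accented letter at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Assumption: Latin-1 stands in for "an extended set"; it maps é to code 233.
latin1 = text.encode("latin-1")
print(ascii_ok, list(latin1))
```

The first three letters have codes below 128, while the accented letter needs a code from the upper half of an eight-bit table.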
During the 1980s, researchers at Xerox began mapping every character to a 16-bit code. They developed a "unique, universal and uniform character encoding" - UNICODE.
Unicode provides a consistent way of encoding multilingual text and makes it easier to exchange text files internationally. The design of Unicode is based on the simplicity and consistency of ASCII, but goes far beyond ASCII's limited ability to encode only the Latin alphabet. The Unicode Standard provides the capacity to encode all of the characters used for the written languages of the world. To keep character coding simple and efficient, the Unicode Standard assigns each character a unique numeric value and name.
The original goal was to use a single 16-bit encoding that provides code points for more than 65,000 characters. While 65,000 characters are sufficient for encoding most of the many thousands of characters used in major languages of the world, the Unicode standard and ISO/IEC 10646 now support three encoding forms that use a common repertoire of characters but allow for encoding as many as a million more characters. This is sufficient for all known character encoding requirements, including full coverage of all historic scripts of the world, as well as common notational systems.
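The three encoding forms are known as UTF-8, UTF-16 and UTF-32. The sketch below is an illustration with sample characters chosen here, not taken from the lecture. It counts how many bytes each form needs for one character inside the original 16-bit range and for one beyond it.

```python
# Sample characters (chosen for illustration): two fit in 16 bits, one does not.
samples = {
    "Latin capital A": "A",
    "Euro sign": "\u20ac",
    "Linear B character (historic script)": "\U00010000",
}

def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes each Unicode encoding form uses for ch."""
    forms = ("utf-8", "utf-16-be", "utf-32-be")
    return {form: len(ch.encode(form)) for form in forms}

for label, ch in samples.items():
    print(f"{label}: U+{ord(ch):04X}", encoded_sizes(ch))
```

The Linear B character has a code point above 65,535, so UTF-16 has to spend two 16-bit units on it. That mechanism is how the standard reaches past the original 65,000-character ceiling.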
"Self-referential" ... a drawn hand drawing a hand ... text that refers to other text ...
What is so clever about the name "Mark Up"? Examine the following:
Nobody has used the name "Mark Up" so far. There's "Mary Up" and "Margaret Up", even "Luann Up", but no "Mark Up". Clever, no?
The digital processing of text requires distinguishing the "content" text from flags or signs embedded in the text that signal how the content text should be processed.
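As a rough sketch of that distinction, the Python below uses the standard library's `html.parser` to sort a marked-up string into its flags (tags) and its content text. The class name `MarkupSplitter` and the sample string are made up for this example.

```python
from html.parser import HTMLParser

class MarkupSplitter(HTMLParser):
    """Collect the opening tags and the content text of a document separately."""

    def __init__(self):
        super().__init__()
        self.tags = []      # the embedded signals
        self.content = []   # the text those signals apply to

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.content.append(data.strip())

splitter = MarkupSplitter()
splitter.feed("<h1>Week two</h1><p>Markup is <em>not</em> content.</p>")
splitter.close()

print("markup: ", splitter.tags)
print("content:", splitter.content)
```

The parser never confuses the two because the angle brackets mark exactly where content stops and processing instructions begin.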
SGML differs from other markup languages in that it does not simply indicate where a change of appearance occurs, or where a new element starts. SGML sets out to clearly identify the boundaries of every part of a document. To allow the computer to do as much of the work as possible, SGML requires users to provide a model of the document being produced. This model, called a Document Type Definition (DTD), describes each element of the document in a form that the computer can understand. The DTD shows how the various elements that make up a document relate to one another.
HTML is a document-layout and hyperlink-specification language. It defines the syntax and placement of special, embedded directions that aren't displayed by the browser, but tell it how to display the contents of the document, including text, images, and other support media.
"Yield to the browser. Let it format your document in whatever way it deems best. Recognize that the browser's job is to present your documents to the user in a consistent, usable way. Your job, in turn, is to use HTML effectively to mark up your documents so that the browser can do its job effectively. Spend less time trying to achieve format-oriented goals. Instead, focus your efforts on creating the actual document content and adding the HTML tags to structure that content effectively." Chuck Musciano & Bill Kennedy. HTML: The Definitive Guide O'Reilly, 1997
Here's a little question: Who is really in control of HTML?
The Reader or the Writer?
Who is the real architect of information?
Required reading: "No Bad Webpages: Reader Empowerment and the Web" by T.A. Brooks
I.B.M. has posted a tutorial for its mash-up tool, QEDWiki, on YouTube.
Now mash-ups are poised to hit the mainstream, and to spread well beyond music. Yahoo, I.B.M., Microsoft and others are creating systems to let ordinary people who’ve never been near a Java class create useful computer applications by combining, or “mashing up,” different online information sources.
If the technology catches on, many of us may become part-time programmers, instead of waiting for the people in information technology to help.
Here’s just one example: An employee at a chain of hardware stores creates a mash-up that combines inventory data, storm forecasts and the telephone numbers of branch managers. Then, when snow is on the way, the application sends text messages to the managers’ cellphones, telling them how many shovels to order.
Devising that sort of mash-up, which handles multiple data sources to produce a customized solution, is typically the province of a professional. But the new systems are designed, their creators say, so people with modest technical skills can tailor applications to their needs — while writing little or no code.
"Do the Mash (Even if You Don’t Know All the Steps)" The NY Times, September 2, 2007
Don't believe Terry?
What I did to the Catalyst Portfolio tool?
What I will do to the Catalyst ePost tool?
"Adding effective Dynamic HTML (DHTML) content to your pages requires an understanding of other technologies, specified by additional standards that exist outside the charter of the original HTML Working group...DHTML is an amalgam of specifications that stem from multiple standards efforts and proprietary technologies that are built into the two most popular DHTML-capable browsers, Netscape Navigator and Internet Explorer, beginning with Version 4 of each browser." Danny Goodman, Dynamic HTML: The Definitive Reference O'Reilly, 1998
Technologies covered by Goodman: (1) Cascading stylesheets and (2) JavaScript.
[Note: This web page is an example of DHTML]
XML is text-based markup that permits authors to invent their own tags, hence Semantic Markup
<?xml version="1.0" encoding="UTF-8" ?>
<pets>
  <dog>
    <name>Fido</name>
  </dog>
  <cat>
    <name>Fluffy</name>
  </cat>
</pets>
One consequence of permitting authors to invent their own tags is that XML coding must be strictly correct - no broken or missing tags.
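Here is a small Python sketch of that strictness, using the standard library's `xml.etree.ElementTree`. The pets document is the one from the lecture. The broken variant and the helper `is_well_formed` are invented for illustration.

```python
import xml.etree.ElementTree as ET

well_formed = (
    b'<?xml version="1.0" encoding="UTF-8" ?>'
    b"<pets><dog><name>Fido</name></dog><cat><name>Fluffy</name></cat></pets>"
)
# Invented counter-example: the <name> tag is never closed.
broken = b"<pets><dog><name>Fido</dog></pets>"

def is_well_formed(document: bytes) -> bool:
    """Return True if the parser accepts the document, False if it rejects it."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True

print(is_well_formed(well_formed), is_well_formed(broken))

# Because the tags are trustworthy, pulling out content is straightforward.
names = [element.text for element in ET.fromstring(well_formed).iter("name")]
print(names)
```

An HTML browser would try to guess what the broken document meant. An XML parser refuses it outright.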
Associated technologies are XSLT (Extensible Stylesheet Language Transformations) and XML Schemas. Schemas act as definitions for XML documents by declaring their structure. An XML schema validates an instance of an XML document. Validation is important because it permits you to be sure that the XML instance you have is correctly structured according to its definition.
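Python's standard library has no XML Schema validator, so the sketch below is a deliberate stand-in rather than a real schema. It hand-codes one made-up structural rule for the pets document (each `<dog>` or `<cat>` must contain exactly one `<name>`) to show what checking an instance against a definition means. The rule, the constant `ALLOWED_PETS` and the function `validate_pets` are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical definition: <pets> may hold <dog> or <cat>, each with one <name>.
ALLOWED_PETS = {"dog", "cat"}

def validate_pets(document: str) -> list[str]:
    """Return a list of structural problems; an empty list means 'valid'."""
    problems = []
    root = ET.fromstring(document)
    if root.tag != "pets":
        problems.append(f"root element is <{root.tag}>, expected <pets>")
        return problems
    for pet in root:
        if pet.tag not in ALLOWED_PETS:
            problems.append(f"unexpected element <{pet.tag}>")
        elif len(pet.findall("name")) != 1:
            problems.append(f"<{pet.tag}> needs exactly one <name>")
    return problems

print(validate_pets("<pets><dog><name>Fido</name></dog></pets>"))
print(validate_pets("<pets><fish><name>Wanda</name></fish></pets>"))
```

Both sample documents are well-formed XML, but only the first one fits the definition. That gap between well-formed and valid is what a schema is meant to catch.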
Jon Bosak is Sun's XML architect. He organized and led the working group that created XML and served for two years as chair of the W3C XML Coordination Group. He is a founding member of OASIS, the Organization for the Advancement of Structured Information Standards, and of its predecessor, SGML Open. At Sun he holds the position of Distinguished Engineer.
Required reading: The Birth of XML: A Personal Recollection by Jon Bosak
XHTML reformulates HTML to make it XML compliant. This permits standard XML tools to view, edit and validate XHTML documents. "The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its attendant benefits, while still remaining confident in their content's backward and future compatibility." XHTML 1.0, W3C Recommendation, January 26, 2000
Like XML, JSON is used to share information among applications, and it is designed to be easy both for people to read and for machines to parse. "While JSON is often positioned 'against' XML, it's not uncommon to see both JSON and XML used in the same application" (Wikipedia: JSON)
An example of a JSON object describing football players and their positions:
{ "players" : [
{ "firstName" : "Ryan", "lastName" : "Campbell", "position" : "S" },
{ "firstName" : "Chris", "lastName" : "Campbell", "position" : "QB" },
{ "firstName" : "Kevin", "lastName" : "Hale", "position" : "DT" }
]}
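To see how little work it takes for a program to consume that object, here is a minimal Python sketch using the standard `json` module. The variable names are illustrative, and the data is the roster shown above.

```python
import json

roster_text = """
{ "players" : [
  { "firstName" : "Ryan",  "lastName" : "Campbell", "position" : "S"  },
  { "firstName" : "Chris", "lastName" : "Campbell", "position" : "QB" },
  { "firstName" : "Kevin", "lastName" : "Hale",     "position" : "DT" }
]}
"""

# One call turns the text into ordinary dictionaries and lists.
roster = json.loads(roster_text)

# Index the players by position to show the parsed data in use.
by_position = {
    player["position"]: f'{player["firstName"]} {player["lastName"]}'
    for player in roster["players"]
}

print(len(roster["players"]), "players")
print("Quarterback:", by_position["QB"])
```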
Feeling slightly nauseous with all these acronyms? Head swimming? What you're witnessing is the rapid development of many different information architectures to solve various problems. Some of these architectures are for presentation (e.g. HTML), some are for modifying presentation (e.g. the stylesheets and JavaScript of DHTML), some are for lightweight data exchange between applications (e.g. JSON), and some are for heavy-duty information dissemination (e.g. XML).
Some typical scenarios for different information architectures: