The Paris Pages Experience
Classifying Data
Web pages fundamentally work with data, and for that reason some notions about data, its classification, and how to describe the relations between data are essential. One such notion is the data type. In many contexts, this idea refers to broad categories into which data can be classified. For example:
- Images
- Sounds
- Text
- Video
If the data at hand consisted of only these four types, and if there were not too much of it, then these basic data types might serve well. Suppose I am interested in an image of the Eiffel Tower. I immediately look at the list of images. This list contains only five images, one of which is the Eiffel Tower. Navigation through the data is easy and uncomplicated. If, however, the amount of data is overwhelming, it may not be possible even to find the Eiffel Tower image: it is lost in a list of millions and millions of images. Thus, these four data types are usually too general to help much except in the simplest cases. Because we are interested in data for the City of Paris, more categories are needed, and we might consider the following list of data classifications:
- Architecture
- History
- Art
- Transport
- Culture
- Geography
- Plants
- Maps
- Music
- Medicine
- Literature
- Textiles
- Industry
Yet nothing in the above list is actually particular to Paris. The data for any city might be broken down into these categories. Furthermore, many objects in Paris would properly be arranged under several of these categories. The Musée du Louvre, for example, is important for its architecture, history, art, and culture. Consider then a simple list of data classifications adapted to the City of Paris. Such a list might be:
- Museums
- Monuments
- Cafés
- Métro
- Streets
- Houses
- Gardens & Parks
- Cemeteries
- Arrondissements (Districts / Quartiers)
- Bridges
- Churches, Mosques, Synagogues
- The Seine
- The Canals
- The Sewers
- Daily Scenes
- Events
This list is specific enough to Paris that one could guess the city even without being told. Thus, this list might provide a more efficient representation of the city than either of the first two lists. That said, some objects, such as the Musée du Louvre, still properly fit into several of the categories (Museums, Monuments, Arrondissements). Furthermore, because there are over 150 museums and hundreds of monuments, each of those sections will need to be broken down into subgroups.
Some appropriate subgroups that come to mind are those in the second list above: Architecture, History, etc. No sooner is this said, however, than the question arises whether Architecture is a subset of Museums, or vice versa. If a particular hierarchy is chosen, are some data classification options irrevocably lost?
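One way to look at this question is to store each object with flat labels rather than at a fixed position in a tree, and to derive either hierarchy when it is needed. The following is only a minimal sketch under that assumption; the record layout, the field names `kind` and `themes`, and the function `build_tree` are hypothetical, and the sample labels are illustrative rather than a complete description of either object.

```python
from collections import defaultdict

# Illustrative records only: each object carries flat labels on two
# independent facets instead of a fixed position in one tree.
RECORDS = [
    {
        "name": "Musée du Louvre",
        "kind": "Museums",
        "themes": {"Architecture", "History", "Art", "Culture"},
    },
    {
        "name": "Eiffel Tower",
        "kind": "Monuments",
        "themes": {"Architecture", "History"},
    },
]


def build_tree(records, outer):
    """Group records two levels deep; `outer` picks which facet is on top."""
    tree = defaultdict(lambda: defaultdict(list))
    for rec in records:
        for theme in sorted(rec["themes"]):
            if outer == "kind":
                tree[rec["kind"]][theme].append(rec["name"])
            else:
                tree[theme][rec["kind"]].append(rec["name"])
    return tree


# The same records yield both orderings of the hierarchy.
by_kind = build_tree(RECORDS, outer="kind")    # Museums > Architecture > ...
by_theme = build_tree(RECORDS, outer="theme")  # Architecture > Museums > ...

print(by_kind["Museums"]["Architecture"])
print(by_theme["Architecture"]["Museums"])
```

In this sketch neither ordering is lost, because the hierarchy is a view computed from the labels. The loss the question worries about arises when the tree itself is the only place the relationships are recorded.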
Nature of the Data Problem
The above illustrates the fundamental issue of data classification and data presentation. As the amount of data grows, more detail is required, and hence more detailed data classifications. As the list of classes or categories grows, it needs to be split up into levels (sub-categories). One level with millions of categories is not useful, and one is driven to a structure which is not unmanageably wide. As the number of classification levels grows, their order becomes ambiguous.
Furthermore, the classification of the data has an impact, in most cases, on the actual directory structure of the web site. This can impose fixed relationships between the data (hierarchical versus relational structures), and a fixed structure cannot yield a proper presentation of the data in all cases. Notice that in web sites which are essentially database query engines, no HTML pages actually exist; they are generated 'on-the-fly' in response to a query. In principle, any desired data structure can be mimicked in this way. In practice, however, this is a difficult problem.
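To make the contrast between a fixed directory structure and a query-driven site concrete, here is a small sketch using Python's standard `sqlite3` module. It is not how any particular site is built: the table names (`item`, `category`, `item_category`), the sample rows, and the function `render_page` are all assumptions made for illustration. A link table records which object belongs to which category, and a page is assembled only when a category is requested.

```python
import sqlite3
from html import escape

# Hypothetical schema: a many-to-many link table lets one object
# belong to any number of categories.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE item_category (
        item_id INTEGER REFERENCES item(id),
        category_id INTEGER REFERENCES category(id),
        PRIMARY KEY (item_id, category_id)
    );
""")

# Illustrative sample data, not a complete classification.
SAMPLE = {
    "Musée du Louvre": ["Museums", "Architecture", "History", "Art"],
    "Eiffel Tower": ["Monuments", "Architecture"],
}
for item_name, categories in SAMPLE.items():
    item_id = conn.execute(
        "INSERT INTO item (name) VALUES (?)", (item_name,)
    ).lastrowid
    for cat in categories:
        conn.execute("INSERT OR IGNORE INTO category (name) VALUES (?)", (cat,))
        cat_id = conn.execute(
            "SELECT id FROM category WHERE name = ?", (cat,)
        ).fetchone()[0]
        conn.execute("INSERT INTO item_category VALUES (?, ?)", (item_id, cat_id))


def render_page(category):
    """Build an HTML listing for one category at request time."""
    rows = conn.execute(
        """
        SELECT item.name
        FROM item
        JOIN item_category ON item_category.item_id = item.id
        JOIN category ON category.id = item_category.category_id
        WHERE category.name = ?
        ORDER BY item.name
        """,
        (category,),
    ).fetchall()
    items = "".join(f"<li>{escape(name)}</li>" for (name,) in rows)
    return f"<h1>{escape(category)}</h1><ul>{items}</ul>"


print(render_page("Architecture"))
```

No page for "Architecture" is stored anywhere in this sketch; it exists only as the result of the query. The difficulty noted above remains, since someone still has to decide which categories exist and assign every object to them.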