Xrefer.com and Nupedia.com: a comparison

Comparison between xrefer.com and nupedia.com, two sites offering encyclopedic content on the web.


The web is, in theory, the ideal medium for the exploration and propagation of knowledge, and has been so since its inception. When Vannevar Bush described in 1945 a device often seen as a forerunner of hypertext and the web, he saw it primarily as an aid to the human brain: "Consider a future device for individual use, which is a sort of mechanized private file and library [...] in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory." Computers could make information instantly accessible and thus facilitate the development of human understanding. Similarly, when Ted Nelson coined the term "hypertext" in 1965, the purpose of hypertext links was to make navigation through information easier than with fixed media, thereby allowing a greater quantity of information to be accessed more easily. When ARPANET, one of the first wide-area computer networks, was set up in the late 1960s, its aim was to let knowledge circulate within research communities and so increase the speed and efficiency of research. It would thus seem to follow that the popularization of the internet should lead to the dissemination of knowledge, or at least to its presentation in a manner suited to the new medium. Xrefer.com and nupedia.com have both set themselves this task, and the aim of this essay is to examine how successful each has been. In comparing these two web sites, we will not only assess their approach against their avowed aims, but also look briefly at how they have attempted to exploit the new medium as a tool for navigating information, and thus at how successful they, as representatives of encyclopedic knowledge on the web, have been in making the internet a tool for navigating knowledge.

This essay thus comprises three parts: first, an examination and assessment of each of the two sites; second, the conclusions to be drawn from this examination; and finally, a short discussion of how far both sites make use of the facilities offered by the internet to deliver their content.


Xrefer and Nupedia both aim to offer encyclopedic content to users over the internet. Beyond that, however, the similarity between the two sites all but ends: whilst their core aim is identical, their approaches are radically different, and this has naturally shaped the routes they have taken to achieve it. Xrefer is set up as a company aiming for profitability, with content licensed from major publishers; Nupedia.com, on the other hand, is an open-source project: its content is contributed by volunteers, and both its content and its code are freely available for download, modification and redistribution.

Xrefer's board of directors includes the former director of Oxford University Press's Electronic Publishing division and a member of Macmillan Publishing's marketing team, with a VP from Flextech Telewest and the CEO of Macmillan Publishing as non-executive directors. The stated aim of the site is to achieve profitability through advertising revenue and its subscription service; the site includes an extensive "corporate information" section giving the prices of advertising on the site, demographic information on its users, and the cost of registering for xrefer's subscription service ("xrefer plus"), which offers access to additional titles.

The fact that the company's goal is profitability has to a great extent shaped its approach to the design and conception of the web site. The content of the site is licensed from renowned publishers such as Columbia University Press (U.P.), Cambridge U.P., Macmillan, Penguin and Routledge, among others. The company have carried out some research into their audience for the purpose of selling advertising [1]; amongst the most interesting findings are that the audience is spread roughly across the 18 to 54 age range; that its members are frequent (82% every day) and seasoned (70% for more than two years) web users; that their incomes are more evenly distributed from the lower to the higher brackets than in the population nationally; and finally that xrefer consider "knowledge surfers - quiz players, crossword junkies etc." to be as much a part of their market as "professionals with specific factual requirements", "educators and learners", and "web experts who need high-quality information fast". Xrefer have recently tried to ingratiate themselves with that user group by offering "Friday night quizzes" several Fridays a month, a marketing strategy aimed at increasing what is known as "site stickiness", that is, the frequency with which users return to the site.

However, xrefer also offer a paid subscription service which, according to their literature, is aimed less at individuals than at institutions; this carries with it requirements which are somewhat at odds with what is generally seen as necessary to appeal to users, namely the creation of a "community web site". It is generally thought that by creating a web site for a specific community of users, such as "knowledge surfers", market penetration is increased through the loyalty of the users to the site and the frequency with which they use it, and advertising becomes easier to target. However, some of the requirements of such a site, such as tightly focused content, communication between users (and hence permissive moderation), and the development of relationships between users and the site, could alienate more casual and infrequent users, and thus make the site an unattractive proposition for institutional buyers. Seemingly for this reason, xrefer has opted for a more fixed-content approach: hardly any interaction between users is allowed for, the content does not change very often, and the focus is chiefly on providing information.

Similarly, the design of the site seems to be geared to the widest possible audience: the dominant colours are white and grey, with a touch of dark, discreet purple introduced for visual clarity and for highlighting, and the site follows a very strict and restrictive design template. The fonts used are standard web fonts, not only in the textual content but also in the images, where designers are usually not restricted by the fonts available on users' machines [4]. The font used across the site is Trebuchet, a very sober typeface with a slightly modern slant, and even the xrefer logo seems to be derived from the Times font family.

The structure of the site is similarly geared to simplicity, and essentially consists of three pages. The home page contains little but basic information about the site and a form in which to enter the term to be queried, in a manner that seems inspired by the Google search engine; the results then appear in a very straightforward and uncluttered manner, and the article page contains nothing but information immediately relevant to the search term. Here also, the information is laid out in a direct and uncluttered style, navigation is straightforward, and the emphasis is on the core informative content. The site thus follows a classical tree structure, from the index page down to the search results page and finally to the article itself. This structure is classical not so much in web sites as in books, with the book at the top of the tree, the table of contents indicating the chapters available and their location, and the text itself within each chapter. It can be noted that the physical layout of the article page follows the classical "given/new" semiotic opposition, whereby the "given" is placed on the left of the page and the "new" on the right: the links relevant to the search term appear on the left of the page, representing the "given" of the information we have just requested, whilst the article pertaining to the query, the "new" information we are now discovering, is placed on the right. Both of these strands of information are in turn placed in an overarching "given/new" relationship: the xrefer logo at the top of the page and a form to begin a new search remind us of the "given", that we are on the xrefer site with the intention of finding information, whilst both the article and the links below represent the "new" information we have been searching for.

The content itself is licensed from the most reputable academic sources, and consists of reference works such as dictionaries and encyclopedias, for example the Oxford Paperback Encyclopedia and The Macmillan Encyclopedia 2001. One of xrefer's main selling points is its cross-referencing capability [5]: from any given search result, it offers links from inside the article's text to other relevant articles. Thus, if we take as an example the article on atonality from the Macmillan Encyclopedia, we find links within the text to other articles from the same source on related topics, such as "Schoenberg" and "tonality". On the left of the screen, we also find links to articles on the same subject from different sources ("atonality" in the Oxford dictionary), and to related articles from the same source ("tonality" from the Macmillan Encyclopedia) and from different sources ("tonality" from the Oxford dictionary). This is of course a very attractive feature, and it comes close to fulfilling one of the evident aims of using the internet as a tool for navigating encyclopedic information; in this case, however, the execution falls short of the concept.
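The three kinds of left-hand link described for the atonality article can be modelled in a few lines of Python. This is a hypothetical sketch, not xrefer's actual implementation: the `Article` record, its `related` field and the `sidebar_links` function are assumptions made for illustration, and the small catalogue simply mirrors the atonality example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """One article in the catalogue (hypothetical model, not xrefer's data)."""
    term: str
    source: str
    related: tuple[str, ...] = ()  # terms this article is assumed to point to


def sidebar_links(current: Article, catalogue: list[Article]) -> dict[str, list[Article]]:
    """Sort the other articles into the three groups of links described above."""
    groups: dict[str, list[Article]] = {
        "same term, other sources": [],
        "related terms, same source": [],
        "related terms, other sources": [],
    }
    for art in catalogue:
        if art == current:
            continue  # never link an article to itself
        if art.term == current.term and art.source != current.source:
            groups["same term, other sources"].append(art)
        elif art.term in current.related:
            if art.source == current.source:
                groups["related terms, same source"].append(art)
            else:
                groups["related terms, other sources"].append(art)
    return groups


catalogue = [
    Article("atonality", "Macmillan Encyclopedia", related=("tonality",)),
    Article("atonality", "Oxford Dictionary"),
    Article("tonality", "Macmillan Encyclopedia"),
    Article("tonality", "Oxford Dictionary"),
]

for heading, articles in sidebar_links(catalogue[0], catalogue).items():
    print(heading, [f"{a.term} ({a.source})" for a in articles])
```

For the Macmillan article on atonality, this places the Oxford "atonality" article in the first group and splits the two "tonality" articles by source. Note that nothing in the model merges complementary articles; it only lists them side by side.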

There are several points which one may wish to question, from both a conceptual and a contextual point of view. Conceptually, we are led to wonder why there are several different articles on the same subject from different sources; if the aim of the user is to find information about a given subject, it would follow that his primary concern is the information itself rather than its origins or the existence of different but similar articles on the same subject. Although the origin of an article may be reassuring as to the quality of its content, and the presence of several articles may be interesting for analysing the editorial slants of the different sources, these concerns seem secondary to the information itself. Taking the articles on atonality as references, we find that the article from the Macmillan Encyclopedia and that from the Oxford Dictionary offer similar but complementary definitions. The Macmillan article mentions chromaticism in the 19th century and gives more detailed information about Schoenberg, whereas the Oxford article is more specific about the musical theory behind atonality and refers to different composers. From the user's point of view, it would seem logical to amalgamate these articles, rather than to have to trawl through a list of different yet complementary definitions.

As regards the content itself, and once again using the articles on atonality as a reference point, one could be disappointed by the lack of depth of the definitions given. It is forgivable for a paper encyclopedia to offer only a rather limited definition of atonality, given both the need to simplify in order to satisfy many different levels of readership (from students in need of a definition to musicians seeking in-depth knowledge) and the limits of the medium in terms of available space; it seems less forgivable once these limits are removed. It should now be possible to offer definitions of a term to several audiences with different contextual interests without compromising the clarity of the result: one could imagine first a dictionary definition, leading on to more elaborate articles on the term from the points of view of several different subjects, such as history or musical theory. For example, one could expect content about the importance of atonality from a music-historical perspective, detailed technical information about how it is achieved, how it differed from previous methods of composition, or how it might have influenced or been influenced by other disciplines, such as philosophy. Owing to the particular selection of works it presents online, xrefer does have a little technical information about atonality from a musical perspective, from the Penguin and Oxford dictionaries of music, but this content is not placed in relation to the other definitions; moreover, no other perspectives on the term are present.

This stems in many ways from the purpose xrefer has set itself and the target audience it has chosen. The purpose of the company is financial profit, which is mainly attained by attracting the largest possible number of clients at the lowest possible cost. As such, it is not so much content-oriented as customer-oriented, and its main customers are advertisers and institutions. For this reason, it is natural for it to prefer content already published in book form by prestigious publishing houses, in its original form. First of all, the technical cost of using these sources is minimal, as the works are in most cases already available in digital form, which reduces the cost of putting them online. By contrast, editing these works to make the content more suited to the medium has two disadvantages: on the one hand, it greatly increases the cost involved, and on the other, it reduces the brand appeal of the content. Consolidating the content would require both skilled and expensive academic editors and a more sophisticated search engine to retrieve it, the details of which we will briefly examine later. Moreover, brands such as the "Oxford Dictionary of English" and the "Macmillan Encyclopedia" have a successful sales history and thus a high public profile; it is therefore more appealing for advertisers to advertise on a site offering branded content than unbranded content, in the expectation that the public will trust and prefer the former. If the content were edited, the association with the brand would be more difficult to maintain, as the content would no longer be the same as the branded original. The same reasoning applies to institutional clients: it is much easier to sell a product which has already been successful than a completely new one, and clients might thus prefer to invest in known brand names rather than in as-yet unproven content.
Xrefer offers both those brand names and the reduced costs of providing content over the internet rather than in physical form (less storage space required, lower risk of physical damage to the content and hence lower replacement costs, etc.). It can thus be said that xrefer is in essence converting existing content to digital format, or putting books on screen; and whilst it does offer cross-referencing, the manner in which this has been done and the depth of the available content mean that the possibilities of the internet as a medium have yet to be exploited.


One of Nupedia.com's most appealing aspects is that it is not directly bound by such commercial requirements. Whilst Nupedia [6] is also a registered company aiming to make a profit, the way in which it plans to do so is very different from xrefer's. The content on Nupedia.com is written and edited by volunteers, and remains free of licensing restrictions: it can be downloaded, edited and redistributed free of charge, on condition that all modifications to the original content are made available under the same conditions. This type of content is known as "open-source"; other open-source projects include the Linux operating system, whose source code is freely available and has been contributed to by programmers worldwide, and the Mozilla web browser, on which Netscape 6 is based. Nupedia's core business is thus not the content itself but its repackaging and distribution, much as Red Hat Inc. has repackaged and redistributed Linux. Because Nupedia's content is independent of any commercial restrictions, one would expect it not to be constrained in the same way as xrefer's, and thus to suffer from fewer of its defects. Put concisely, because its content is free of commercial considerations, Nupedia.com can afford to be entirely content-driven, an advantage xrefer does not possess.

Nupedia is a relatively new project, having been set up in 1999/2000, and the amount of content available on its site is therefore still relatively small. Because Nupedia relies on voluntary contributions for its content, the site may be oriented more towards potential contributors than towards content delivery. This seems to be confirmed by the home page: the centre of the page is occupied by information about the workings of the site rather than by content search, and all the links to the left of the main section, apart from the forms, lead to information about the site rather than about its content. The site thus caters to the providers of content rather than to those who are looking to read it.

According to the information provided by the site, both the writers and the editors of nupedia.com come from an academic background, ranging from university students to university professors and holders of PhDs and other postgraduate diplomas, but also including members with less formal education who are nonetheless academically credible. The majority of members also seem to be male; and because the site has received relatively little publicity outside the internet, we can also assume that most are regular and fluent internet users. Their willingness to provide content free of charge probably also indicates an emphasis on content over form; at the same time, the variety of professions mentioned would seem to indicate a strong diversity of both age and social groups.

The design of the site reflects the indefinite nature of its target audience. Social and age groups are traditionally the driving factors in the visual targeting of web sites, but this data is absent here; moreover, unlike subjects with strong visual associations such as rock bands or horror films, encyclopedic content has no established graphic identity, with existing and successful models from which to derive a design. As a result, the site's design is rather indefinite: much use is made of blue on a white background, both neutral colours, and the typeface is the standard browser font; on the whole, the overall brand image of the site is weak. The site logo uses the same blue gradient as the title bar of a Microsoft Windows window, and is thereby standard and inoffensive; the font used in the logo appears to be Arial, another standard typeface. The only distinctive feature of the logo is the calligraphic "N", which looks out of place rather than giving the logo character. On the whole, the graphic identity of the site seems content with legibility, emphasizing the content not so much through its design as through the absence of a strong visual identity.

The structure of the site is similarly indefinite: the index page contains more information about how to provide content than about how to find it, although it does place a search form in a prominent position at the upper left of the page. The index page also contains the latest news concerning the site, which suggests a fairly community-oriented web site; however, because we are concerned primarily with the way content is delivered, we shall not examine these pages in detail. It is nonetheless worth pointing out that the index page serves as an umbrella over all the other pages, and that navigation suffers as a result. When browsing through pages related to the user community, two different templates are in use, as exemplified by the policy page and the press page, which makes navigation confusing and hinders moving quickly from one subject to another.

This apparent lack of navigational structure also applies to the search results page and the article page. The results page lists all results found in no particular order and without further information about each entry; in the case of the "atonality" search, two entries were found which are long and short versions of the same article, though the results page made no mention of this. The articles themselves are presented without any links to other relevant articles, and the bibliographies refer to off-line works, thereby defeating the purpose of presenting encyclopedic information on the internet. The lack of content on the site does limit the amount of cross-referencing possible; however, neither the layout nor the structure of the site seems to make any provision for it if and when the amount of available content increases. On the whole, neither the structure nor the layout of the site is conducive to pleasant browsing, and neither provides any facilities beyond traditional referencing methods.

The main advantage Nupedia has over xrefer is the quality of its content. Although there are at the moment only around 30 completed articles on the site (and over 1500 on its sister site wikipedia.com, which is technologically less developed and whose lack of editorial policy makes the quality of its content hard to assess), their superiority in quality and focus over xrefer's is palpable. The entry on atonality is very complete and covers many aspects of the term: its musical context and influence are explained, links to other disciplines are mentioned, and the origins of the term are discussed; on the whole, the article gives an excellent insight into the concepts behind atonality. All of the information is also concentrated in a single article, which is an advantage for a general definition. It is still true, however, that none of these subjects is discussed in depth; for example, whilst the article mentions the similarity between the self-defining nature of groups of chords in atonal music and Kandinsky's work, no explanation is given as to why this is the case. Similarly, one could deplore the lack of accompanying musical notation exemplifying uses of atonality; admittedly, neither of these belongs in a general definition. However, because of the nature of the internet as a medium, it should be possible to discuss these issues without harming the continuity of a more general definition. Whereas on paper it is impossible to go into such depth when defining a term, the small amount of space taken by text in a digital medium, combined with the navigational facilities of hypertext, makes the internet an ideal medium for content of almost limitless depth; in the context of an encyclopedia, it becomes possible to give not only a general description of a subject, but also in-depth analysis of certain of its aspects, without disrupting the flow of navigation.
Thus, although Nupedia is clearly more content-oriented than xrefer, and relatively successful in that respect, it still does not exploit the capacities of the medium to the full.


Both xrefer and Nupedia have their strengths and weaknesses. Xrefer is essentially a market-oriented product; as such, its aesthetic design is very successful, it offers the useful possibility of cross-referencing, albeit only up to a point, and it succeeds very adequately in putting pre-existing content online within an agreeable interface. However, its function is in many ways limited to the online browsing of traditionally off-line material; and though it does this successfully, it fails to take advantage of its medium to provide truly relevant content, content which differs both in depth and in structure from traditional off-line models. The content available on Nupedia, on the other hand, is less subject to the requirements of traditional media, and as such is more successful in providing relevant material. However, both its visual and its structural design are poor, and the site seems geared towards potential content providers, making it difficult to use as a reference site. In both cases, however, the possibilities of the medium in terms of navigation and depth of content have not been exploited. At this point it becomes necessary to examine the implications of the medium in slightly more depth.

There is a difference between using the internet as a navigational tool to browse information more effectively, and offering encyclopedic information over the internet. The core difference is one of approach: the latter offers information, whilst the former offers navigation through information. The difference in structure can be compared to that between a book and the mechanism Vannevar Bush described in his early essay: the information in a book is freely available, but accessing it is physically time-consuming, whereas Bush was describing a mechanism in which information was immediately available on demand. The implications are immense. With a book, the user searching for information has to physically locate the book, which might not be immediately available, then search through it for the information he is looking for, locate it and turn to the appropriate page, and only then is his query satisfied. This rapidly becomes tedious when the user is researching a broad subject about which he knows little, because the amount of physical searching required grows accordingly. With the possibilities of hypertext and document indexing that computers offer, the internet makes it possible to realize Vannevar Bush's vision: information on any particular subject could be instantly available through hyperlinked documents. Thus, as we have seen with the example of atonality, it would become possible to offer both a general definition of a concept and in-depth coverage of more specific issues relating to the term, whilst preserving the ease of navigation inherent to the medium. However, as the examples of xrefer and Nupedia show, this is not currently the case on the internet.
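The "document indexing" mentioned above is what turns a physical search into an immediate one. A minimal sketch of the idea, assuming a toy inverted index: the functions `tokenize`, `build_index` and `lookup` and the three short entries are hypothetical, and neither site is claimed to work this way. Each word is mapped once to the entries containing it, so a query becomes a few dictionary look-ups instead of a page-by-page search.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(entries: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: each word maps to the titles of the entries containing it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for title, text in entries.items():
        for word in tokenize(text):
            index[word].add(title)
    return dict(index)


def lookup(index: dict[str, set[str]], query: str) -> set[str]:
    """Titles of the entries that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))  # copy, so the index is not mutated
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical toy entries, for illustration only.
entries = {
    "atonality": "The use in music of all 12 notes of the scale in such a way as to avoid tonality",
    "tonality": "The organization of music around a central key",
    "Schoenberg": "Composer associated with atonal music",
}
index = build_index(entries)
print(sorted(lookup(index, "music tonality")))  # ['atonality']
print(sorted(lookup(index, "music")))           # ['Schoenberg', 'atonality', 'tonality']
```

The index is built once, up front; after that, the cost of a query no longer depends on reading through every entry, which is the practical difference between Bush's mechanism and the book.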

Although a full treatment is beyond the scope of this essay, a brief attempt shall be made to suggest why this might be the case from a technical point of view. Both xrefer and Nupedia use XML to store their data, rather than database-driven systems such as, for instance, the one behind the Google search engine. XML is a customizable markup language; as such, its main advantage is its ability to describe the data it contains, as opposed to a database, which focuses on the relationships between different types of data [7], or HTML, which describes the layout of data. In the case of an encyclopedia, for instance, it becomes possible to define tags which describe the different types of data concerned, such as "entries", "definitions" or "synonyms". A very rudimentary XML tagset for an encyclopedia might look like this:

<entry>
  <word>Atonality</word>
  <definition type="general">The use in music of all 12 notes of the scale in such a way as to avoid tonality</definition>
</entry>

By expanding this tagset, one could offer several different "types" of definition, such as "musical", "historical" or "etymological", or more in-depth discussion of specific relevant issues, whilst at the same time permitting cross-references to other relevant articles. This would take full advantage of XML's ability to describe data, and thereby facilitate its indexing and, in turn, navigation.
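The expanded tagset just described can be sketched with Python's standard `xml.etree.ElementTree` module. The extra `type` values and the `<see ref="..."/>` cross-reference element are assumptions made for illustration (neither site is claimed to use them), and the bracketed definition texts are placeholders rather than real content. The point is that once definitions are typed, a program can list which kinds exist, pull out one of them, and follow cross-references without any knowledge of page layout.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical expansion of the rudimentary entry: typed definitions plus
# <see> cross-references. Bracketed texts are placeholders, not real content.
ENTRY_XML = """
<entry>
  <word>Atonality</word>
  <definition type="general">The use in music of all 12 notes of the scale
    in such a way as to avoid tonality</definition>
  <definition type="musical">[technical article on how atonality is achieved]</definition>
  <definition type="historical">[article on its place in music history]</definition>
  <definition type="etymological">[origins of the term]</definition>
  <see ref="tonality"/>
  <see ref="schoenberg"/>
</entry>
"""


def definition_types(entry: ET.Element) -> list[str]:
    """List the kinds of definition this entry offers, in document order."""
    return [d.get("type", "") for d in entry.iter("definition")]


def definition(entry: ET.Element, kind: str) -> Optional[str]:
    """Return the text of the definition of the given type, or None if absent."""
    for d in entry.iter("definition"):
        if d.get("type") == kind:
            return " ".join((d.text or "").split())  # collapse line breaks
    return None


def cross_references(entry: ET.Element) -> list[str]:
    """Identifiers of the other entries this one links to."""
    return [see.get("ref", "") for see in entry.iter("see")]


entry = ET.fromstring(ENTRY_XML.strip())
print(entry.findtext("word"))     # Atonality
print(definition_types(entry))    # ['general', 'musical', 'historical', 'etymological']
print(definition(entry, "general"))
print(cross_references(entry))    # ['tonality', 'schoenberg']
```

A reader interface could then offer the general definition first and the other types as optional deeper layers, which is the structure the essay argues for.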

However, neither Nupedia's nor xrefer's tagset uses XML to describe data as such. Xrefer uses a rather general tagset, similar to HTML, which focuses on describing the hierarchical relationships between the elements of an entry rather than the nature of the data. Nupedia's tagset also focuses mainly on hierarchical relationships, though some of its tags, such as <pronunciation/>, do offer a useful basis for describing information. On the whole, though, neither gives any indication of the different types of encyclopedic content, and both tagsets seem to be used more for layout and perhaps for variable assignment (where XML is used to assign values to variables, replacing $foo="bar" with <foo>bar</foo>). Because the data is not described as such, even within the content already available on both sites, it becomes more difficult to draw links between different entries. For instance, the relationship between an entry on Schoenberg's compositional techniques and one on the influence of musical currents on Kandinsky's painting might be obvious if one knows that both pertain to music in general, to a certain period, to certain artistic currents and to the use of atonality, but it is far less obvious if these shared characteristics are not recorded. Because of the tagsets used by both xrefer and Nupedia, their search engines would have no way of knowing of any relationship between the two articles, and links from one entry to the other are unlikely. It can thus be said that one of the reasons existing web encyclopedias fail to take advantage of the medium to offer virtually limitless depth of definition is the way they describe, and therefore use, the data available to them.
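The point about shared characteristics can be made concrete with a short Python sketch. Everything in it is hypothetical: the descriptor pairs stand in for what data-describing tags such as `<subject>`, `<concept>` or `<period>` might record, the entry titles are illustrative, and the rule "link two entries when they share at least two descriptors" is one simple heuristic among many, not how either site works. With descriptors, the two atonality-related entries are linked automatically; with a layout-only tagset, the same entries carry nothing to compare, and no link can be inferred.

```python
from itertools import combinations

Descriptor = tuple[str, str]  # (kind, value), e.g. ("concept", "atonality")

# Hypothetical descriptors that a data-describing tagset might record.
SEMANTIC: dict[str, set[Descriptor]] = {
    "Atonal compositional techniques": {
        ("subject", "music"), ("concept", "atonality"), ("period", "early 20th century"),
    },
    "Musical currents in Kandinsky's painting": {
        ("subject", "painting"), ("subject", "music"),
        ("concept", "atonality"), ("period", "early 20th century"),
    },
    "Photosynthesis": {("subject", "biology")},
}

# With a layout-only tagset, the same entries carry no descriptors at all.
LAYOUT_ONLY: dict[str, set[Descriptor]] = {title: set() for title in SEMANTIC}


def related_pairs(entries: dict[str, set[Descriptor]], min_shared: int = 2):
    """Propose a link between every pair of entries sharing >= min_shared descriptors."""
    links = []
    for (a, desc_a), (b, desc_b) in combinations(entries.items(), 2):
        shared = desc_a & desc_b
        if len(shared) >= min_shared:
            links.append((a, b, sorted(shared)))
    return links


print(related_pairs(SEMANTIC))     # one link, between the two atonality entries
print(related_pairs(LAYOUT_ONLY))  # []
```

The threshold `min_shared` is an arbitrary design choice here; a real system would more likely weight descriptors, since sharing a concept such as "atonality" says more than sharing a broad subject such as "music".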

Moreover, it is interesting to look at the navigational interfaces used by existing encyclopedic models. Xrefer's interface is successful at presenting the content the site contains, but would it be able to cope with a large amount of in-depth data? It might be difficult to present the relevant issues as a long list of links to the left of the general definition, even if these were broken down into subcategories: in the case of atonality, would it be possible to list every instance of the influence of atonality on painting to the left of a general article on the question? Several alternative methods of browsing can be suggested. The first is to use JavaScript to hide or reveal parts of a page, so that a long list of links can be collapsed under a set of headings. It might also be worth considering alternatives to textual browsing, such as visual browsing, although once again only briefly, given the scope of this essay. Plumb Design have pioneered an interesting visual navigation system for a thesaurus, whereby the user jumps from word to word by selecting one of its synonyms. The list of possible synonyms is updated as the word changes, so the browser becomes dynamic, and it becomes possible to link a large number of terms to a given entry contextually. Applying this reasoning to encyclopedic content, it might be possible to link the "atonality" entry to different types of definition, such as "musical" or "philosophical", which could in turn be linked to further subcategories, with the ease of navigation afforded by graphical methods. It might therefore be advantageous to consider alternative methods of navigation when delivering encyclopedic content, although this area requires further research.
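The dynamic, thesaurus-like navigation described above can be sketched as a small link graph, here in Python rather than as a graphical interface. The node names, the `GRAPH` dictionary and the `Navigator` class are hypothetical and do not describe the thesaurus software mentioned in the text. What the sketch shows is the structural point: each move recomputes the set of visible links, so an entry can be connected to many others without all of them having to be listed on a single page.

```python
# Hypothetical link graph: each node lists the nodes reachable from it.
GRAPH: dict[str, list[str]] = {
    "atonality": ["atonality: musical", "atonality: philosophical", "tonality"],
    "atonality: musical": ["Schoenberg", "atonality: influence on painting", "atonality"],
    "atonality: influence on painting": ["Kandinsky", "atonality: musical"],
    "atonality: philosophical": ["atonality"],
}


class Navigator:
    """Thesaurus-style browsing: the visible links are recomputed at every step."""

    def __init__(self, graph: dict[str, list[str]], start: str) -> None:
        self.graph = graph
        self.current = start
        self.history = [start]  # path taken so far, e.g. for a "back" button

    def options(self) -> list[str]:
        """Links shown around the current node (empty for leaf entries)."""
        return self.graph.get(self.current, [])

    def jump(self, target: str) -> list[str]:
        """Move to a linked node and return the new set of options."""
        if target not in self.options():
            raise ValueError(f"{target!r} is not linked from {self.current!r}")
        self.current = target
        self.history.append(target)
        return self.options()


nav = Navigator(GRAPH, "atonality")
print(nav.options())
print(nav.jump("atonality: musical"))
print(nav.jump("atonality: influence on painting"))
print(nav.history)
```

Only the neighbours of the current node are ever displayed, which is what would let a graphical front end of this kind scale to the depth of content discussed earlier; the drawing itself is left out of the sketch.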


In this essay, we have thus been able to examine and assess the successes and failures of xrefer and Nupedia, two sites which aim to offer encyclopedic content over the internet. We have assessed them not only with regard to their own stated aims, but also with regard to the more general goal of exploiting the possibilities offered by a medium seemingly tailored to delivering such content in an accessible manner. We concluded the essay with a brief discussion of possible reasons for their shortcomings in that area, together with some suggestions which may help to solve the problems found. Most importantly, however, the subject has been opened for discussion: the comparison between xrefer and Nupedia has served the more general purpose of examining both existing and potential methods of delivering encyclopedic knowledge over the internet, and as such it is hoped that this essay will provide a foundation for deeper research in this area.
