Sunday, November 30, 2003
Re-Inventing Subject Access for the Semantic Web by Rosemary Franklin
Early research on the Web did not have an adequate system of authority control, but now web-based research is beginning to combine subject access with the principles of bibliographic control and cataloging. Organizing Web content with standards and controlled vocabulary will improve search and retrieval, increase relevance, and provide better use of technology. This short essay will focus on the New Research Methods section of this article.
Today research on the World Wide Web is all about computers and computer applications; online systems are now pointing scholarly research in new directions with new products of research. This turn of events requires technology that allows information to be retrieved in a precise and controllable manner. Basic to this new research capability are the principles of bibliographic control and subject classification.
Scholars are now approaching the subject of their research in a different way. For these scholars the computer does more than serve as a means of accessing information; it is now driving the scholarship and shaping its objectives. Computer applications that support scholarship vary widely. Author-based collections can be arranged around specific writers, themes, genres, or periods. There are also search engines that search records containing valuable metadata. Dublin Core, a standard set of metadata elements, makes possible a wide array of search and retrieval.
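As a rough illustration of how metadata records support this kind of retrieval, here is a minimal sketch in Python. It assumes each record is a plain dictionary keyed by a few Dublin Core element names (title, creator, subject, date); the `search` function and the sample records are hypothetical and are not taken from the article.

```python
# Sample records keyed by a few Dublin Core element names.
# The record layout and the search() helper are assumptions for illustration.
records = [
    {"title": "Leaves of Grass", "creator": "Whitman, Walt",
     "subject": ["American poetry"], "date": "1855"},
    {"title": "Walden", "creator": "Thoreau, Henry David",
     "subject": ["Nature"], "date": "1854"},
    {"title": "Poems", "creator": "Dickinson, Emily",
     "subject": ["American poetry"], "date": "1890"},
]


def search(records, element, value):
    """Return records whose given element contains value (case-insensitive)."""
    needle = value.lower()
    hits = []
    for record in records:
        field = record.get(element, [])
        # An element may hold one string or a list of strings.
        values = field if isinstance(field, list) else [field]
        if any(needle in v.lower() for v in values):
            hits.append(record)
    return hits


poetry = search(records, "subject", "poetry")
print([r["title"] for r in poetry])
```

Because every record uses the same element names, one query can be limited to a single field such as subject or creator, which is the precise and controllable retrieval the article is describing.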
The old Web did not guarantee quality or integrity of data. In those days, one way to gauge authority was to check the domain: .edu, .gov, and .org sites were thought to harbor accurate information. To a certain degree, this is true today. But the Web is changing, and authority control of information is setting standards based on the principles of bibliographic control and subject access.
Librarians and other information management professionals are now "identifying, evaluating, cataloging, and adding subject classification as a way of ensuring value and authority. The high speed semantic Web will need to incorporate standards that are basic to integrity of data. Future scholarship will be based on this premise".
In the class you have spoken about this issue, so I enjoyed reading about the coming changes.
Wednesday, November 05, 2003
Subject Language Syntax from The Intellectual Foundation of Information Organization by Elaine Svenonius
There has been a slow trend over the past century in subject language development: a shift from enumerative to synthetic languages. Three theorists, Derek Austin, Julius Kaiser, and S.R. Ranganathan, contributed to the synthesis movement. Although their works are no longer widely considered or used, they have guided those who are now developing other languages that will demonstrate "better economic backing and survival power".
Library of Congress subject headings will be discussed here as a section of Chapter 10, along with definitions of terms. The subject languages in use today include LCSH, DDC, LCC, and the UDC, which are further characterized according to whether they are term or string languages, precoordinate or postcoordinate languages, and enumerative or synthetic languages. A term language does not employ a syntax; an example of this would be a language that uses a thesaurus as a vocabulary tool. A string language combines terms into larger expressions. In alphabetic subject languages these expressions are called strings or subject headings, and in classificatory languages they are known as synthesized or built numbers. The subject languages today are a mixture of enumeration and synthesis--LCSH and DDC are examples.
An LCSH string starts with a main heading that tries to express what the document is about. Sometimes subdivisions follow the main heading. There are rules that detail when these subdivisions can be used and in what order. The facets used in LCSH are Topic, Place, Time, and Form. Most commonly the string looks like this:
Topic main heading--Place--Topic--Time--Form. An example of this would be Art--Censorship--Europe--Twentieth Century--Exhibitions. In this example Topic and Place are reversed: the topical subdivision (Censorship) comes before the place (Europe).
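To make the string syntax concrete, here is a small Python sketch that assembles the example heading from labeled facets. The `build_heading` and `split_heading` helpers are hypothetical names used only for illustration; the sketch joins terms with the double dash and does not enforce any of the LCSH rules about which subdivisions are permitted or in what order.

```python
# Each term is paired with the facet it belongs to (Topic, Place, Time, Form).
# The helper names below are assumptions for illustration only.
facets = [
    ("Topic", "Art"),
    ("Topic", "Censorship"),
    ("Place", "Europe"),
    ("Time", "Twentieth Century"),
    ("Form", "Exhibitions"),
]


def build_heading(facets):
    """Join the terms, in the order given, with the LCSH double dash."""
    return "--".join(term.strip() for _, term in facets)


def split_heading(heading):
    """Break a subject heading string back into its individual terms."""
    return heading.split("--")


heading = build_heading(facets)
print(heading)                  # Art--Censorship--Europe--Twentieth Century--Exhibitions
print(split_heading(heading))
```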
Syntax rules dictate that there are two types of main headings:
1. Those that list subdivisions permissible for a main heading type, such as ethnic groups, persons, groups of persons, etc.
2. Those that specify a pattern to be followed for a main-heading type such as languages and diseases.
Among the classes of subdivisions, those most often used are the common, or free-floating, subdivisions, which can be used to divide any heading, subject to certain restrictions. Guidance on their use can be found in subject-specific notes and references in the authority list of LCSH terms, and in the lists of subdivisions and information sheets for individual subdivisions in the Library of Congress Subject Cataloging Manual: Subject Headings.
This chapter tracks the lecture content. For me it will still require another reading or two to understand everything. Particularly interesting were the sections on the three theorists and their ideas about languages.
Wednesday, October 29, 2003
How a Search Engine Works by Elizabeth Liddy
Search engine is a more colorful term that is commonly used in place of the academic term information retrieval system. Most users would prefer the term be changed to finding engine because we would rather find something quickly and have that retrieval be what we needed rather than having to spend time searching for it.
Search engines match queries against an index that has been created. This index includes the words in each document, along with pointers to their locations within the documents. A search engine has four parts:
document processor
query processor
search and matching function
ranking capability
The document processor is charged with preparing, processing, and inputting the documents that a user searches. Its steps include a three-step preprocessing module, identifying the elements to index, deleting stop words, stemming, extracting the index entries from the original document, assigning weights to terms, and, finally, creating an index.
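The document-processing steps can be sketched in a few lines of Python. This is a toy version under stated assumptions, not the pipeline from the article: the stop-word list is tiny, `stem` is a crude suffix stripper standing in for a real stemmer, and all function names are hypothetical. The index maps each stemmed term to the documents that contain it, with pointers to the word positions.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}  # tiny illustrative list


def stem(word):
    """Very crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return [w for w in words if w]


def build_index(docs):
    """Map each stemmed term to {document id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            if word in STOP_WORDS:
                continue  # stop words are not indexed
            index[stem(word)].setdefault(doc_id, []).append(position)
    return index


docs = {
    "d1": "The library catalogs the books.",
    "d2": "Indexing books in the library catalog.",
}
index = build_index(docs)
print(index["book"])     # {'d1': [4], 'd2': [1]}
print(index["catalog"])  # {'d1': [2], 'd2': [5]}
```

The length of each position list is the raw term frequency, which could serve as a very simple term weight.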
Query processing also has at least seven steps, although some systems can skip steps and proceed directly to matching the query to the inverted file. Many of the query processing steps are similar to the document processing steps. More steps and more documents make this process expensive in terms of resources and responsiveness. Usually the longer the wait for results, the better the results. In view of this, search engine designers must decide which is more important to the user--time or quality. Many choose time over quality.
The third part of a search engine is the search and match function. The mechanism of this step varies with the system designer's philosophy. Searching the inverted files for documents that match the query is a standard binary search. The simpler the document representation, the query, and the matching sequence, the less relevant the results. The exception to this is the direct, one-word query asking for the most general of information.
After the system computes the similarity of each document in the matched subset to the query, the user gets a ranked list and begins clicking on each document to determine its relevance to the query.
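Matching and ranking can be sketched the same way. The example below is a simplification built on assumptions: the inverted file is a hypothetical dictionary of term frequencies, a plain dictionary lookup stands in for the search of the inverted file, and the similarity score is just the sum of term frequencies rather than any weighting scheme a real engine would use.

```python
# Hypothetical inverted file: term -> {document id: term frequency}.
inverted_file = {
    "library": {"d1": 2, "d2": 1},
    "catalog": {"d1": 1, "d3": 3},
    "web": {"d2": 4},
}


def rank(query_terms, inverted_file):
    """Score each matching document by summed term frequency, best first."""
    scores = {}
    for term in query_terms:
        for doc_id, tf in inverted_file.get(term, {}).items():
            scores[doc_id] = scores.get(doc_id, 0) + tf
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(rank(["library", "catalog"], inverted_file))
# [('d1', 3), ('d3', 3), ('d2', 1)]
```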
The author states that, at least as of 2001, search engine providers offered less rather than more in the way of complex processing of documents and queries. The searcher is left with much work to do in order to get good results. I believe that there have been improvements in this area of information retrieval.
I liked this article. There is no philosophy and not too much theory, just a straightforward account of how it works. I particularly liked the example of the reduced text that was then inserted and stored in the inverted file that lists the index entries (the Milosevic and Yugoslavia example).
Wednesday, October 15, 2003
The Subject Indexing Process Section of Deconstructing the Indexing Process by Jens-Erik Mai
Library and Information Science has studied the representation of knowledge in documents for some time, but what has been missing from much research in this area is the intellectual process that takes place in indexing. There are rules and manuals to consult on how to index, but very little information on the mental process involved in getting it done.
Hutchins stated that "the literature of indexing and classification contains remarkably little discussion of the processes of indexing and classifying. We find a great deal about the construction of index languages and classification systems, about the correct formulation of index entries...but very little about how indexers and classifiers decide what the subject of a document is, how they decide what it is about".
The Dewey Decimal Classification says that the indexer should consider the title, the table of contents and/or chapter headings, the preface and/or introduction, the text, bibliographical references, and outside sources like reviews, reference works, and subject experts. This guideline gives places for the indexer to look, but says nothing about the intellectual exercise the indexer uses to arrive at a subject.
The author then examines the ISO standard for choosing a subject heading, and arrives at the same conclusion--the standard lists places to consult for a subject heading, but gives no information on just how the sources should be looked at, or really what the indexer should look for. It offers only sources for finding the subject.
Wilson postulated four ways to arrive at a subject heading:
1. Purposive Method--The indexer tries to find out "what the writer is trying to describe, report, narrate, prove, show, question, explain". With this method, hopefully, the author will lay out his purpose, but many times this is not the case.
2. Figure-Ground Method--Here the indexer tries to determine what aspects of the document appear to be emphasized most. The problem with this is that it relies on the indexer's impression of the document, and not everyone sees it the same way.
3. The Constantly Referred to Method--This method is objective as the subject is determined by counting the number of times a word appears in a document. Again a problem arises if that word has nothing to do with the content of an item.
4. Appeal to Unity Method--This method tries to determine what makes the document stay together, or what makes the document a whole. The subject is what ties the whole work together. There are indexer differences to be considered here--no two indexers might agree on what makes the document whole.
There is one more method of arriving at a subject heading, the user-oriented method, which is similar to the purposive method. In this method the indexer attempts to identify the users' potential information needs and indexes the document according to this identification.
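Of the methods above, the constantly-referred-to method is the one that can be written down mechanically. The Python sketch below counts how often each word appears and reports the most frequent ones. The stop-word list, the function name, and the sample sentence are all assumptions made for illustration; nothing here comes from the article itself.

```python
from collections import Counter

STOP_WORDS = {"a", "also", "and", "is", "of", "the"}  # tiny illustrative list


def most_referred_terms(text, n=3):
    """Return the n most frequent non-stop words with their counts."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOP_WORDS)
    return counts.most_common(n)


text = ("The catalog lists books. The catalog also lists maps, "
        "and the catalog is online.")
print(most_referred_terms(text, 2))  # [('catalog', 3), ('lists', 2)]
```

The second result shows the weakness noted in the list: "lists" is frequent, but it says little about what the text is about.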
To me, the purposive method and the user-oriented method are the best ones to use to arrive at a subject heading. I don't think there will ever be a perfect way to do this. One must choose the one that works best for the situation. I also know that it is important to outline the intellectual process that an indexer goes through, but I am not sure that should be among the primary topics for research. Learn how to do it first.
Entering the Millennium: A New Century for LCSH by Lois Chan and Theodora Hodges
In the past 105 years LCSH, originally designed for the use of the Library of Congress, has evolved into the main subject retrieval tool for libraries around the world. Electronic technology expanded the role of LCSH; its role continues to grow due to increasing outside use along with access through the Internet and other online capabilities. One question addressed in this article is the future role of LCSH. Will it continue to be useful? What changes, if any, need to be implemented?
There seems to be one certainty about the LCSH system: in order to continue its role in subject access to information, it must recognize and adapt to a new environment. The authors list three considerations that LCSH must address: simplicity, interoperability, and scalability. One way to measure simplicity is to determine whether non-catalogers are able to use the system. For better or worse, because of the Web there are now people not trained in cataloging who are providing subject data for information resources.
Today most information systems do not operate in a vacuum; interoperability is critical. Ideally a system would be able to search across disciplines as well as across retrieval and storage systems. Scalability involves providing for flexibility of use in situations that vary in depth and sophistication. For instance, the number of subject headings assigned to each record has been scaled down. These are just abbreviated examples of the three considerations.
Building on these three considerations, the authors list ways in which the LCSH system is disadvantageous:
1. It is complex, requiring trained personnel.
2. It is costly to maintain subject heading strings in bibliographic or metadata records.
3. It is not compatible in syntax with other controlled vocabularies.
4. Subject heading strings are not easy to map onto other controlled vocabularies.
5. It is not easily used outside the online catalog environment--this is true especially of search engines on the web.
The authors then list the advantages of the system:
1. LCSH has a comprehensive, rich vocabulary
2. It has synonym and homonym control
3. LCSH has cross-references among terms
4. It is a universal controlled vocabulary
5. LCSH is compatible with subject data in MARC records.
Two possible solutions to the LCSH problem:
1. Develop a simplified syntax for applying LCSH terminology. For instance, the OPAC environment usually has trained cataloging personnel, but in situations where personnel and finances are short and there is a lot of information to be handled, LCSH could be applied using simpler syntaxes.
2. Use a postcoordinate approach to simplify the LCSH string.
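The second solution can be pictured with a short Python sketch. It assumes a handful of hypothetical records, each carrying one precoordinated heading string, and shows the postcoordinate idea: the string is broken into separate terms, and the searcher combines terms at query time with a Boolean AND. The function names and sample headings are illustrative only and are not the authors' proposal in detail.

```python
# Hypothetical records, each carrying one precoordinated LCSH-style string.
records = {
    "r1": "Art--Censorship--Europe--Exhibitions",
    "r2": "Art--Europe--History",
    "r3": "Censorship--United States",
}


def to_terms(heading):
    """Break a precoordinated string into separate, unordered terms."""
    return {part.strip().lower() for part in heading.split("--")}


def postcoordinate_search(records, *wanted):
    """Return ids of records containing every wanted term (Boolean AND)."""
    wanted_terms = {w.lower() for w in wanted}
    return sorted(
        rid for rid, heading in records.items()
        if wanted_terms <= to_terms(heading)
    )


print(postcoordinate_search(records, "Art", "Europe"))      # ['r1', 'r2']
print(postcoordinate_search(records, "Censorship", "Art"))  # ['r1']
```

The order of terms in the query does not matter, so the person assigning terms no longer needs to know the string syntax, although the context carried by the order of the original string is lost.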
In summary, the system was developed as the 20th century was approaching. Over 100 years later the LCSH system is preparing for another century with a new set of problems and a few of the old ones. More changes are expected, especially in the electronic technology arena.
I liked this article. It was readable. It was organized well--the LCSH to date and the LCSH in the future. The more I read about bibliographic control, subject analysis, thesaurus construction, and controlled vocabulary, the more the entire subject comes into focus. This also helps with my internship.
I keep forgetting to put my e-mail address
beepus19 @aol.com
[ Wed Oct 15, 08:45:35 AM | maryann putt ]
The Subject Indexing Process Section of Deconstructing the Indexing Process by Jens-Erik Mai
[ Tue Oct 07, 08:11:48 AM | maryann putt ]
Entering the Millennium: A New Century for LCSH by Lois Chan and Theodora Hodges
In the past 105 years the LCSH, originally designed for the use of the Library of Congress, has evolved into the main subject retrieval tool for libraries around the world. Electronic technology expanded the role of LCSH; its role continues to grow due to increasing outside use along with access through the Internet and other online capabilities. One question addressed in this article is the future role of LCHS. Will it continue to be useful? What changes, if any, need to be implemented.
There seems to be one certainty about the LCSH system. In order for LCSH to continue its role in subject access to information it must recognize and adapt to a new environment. The author lists three considerations that LCSH must address: simplicity, interoperability, and scalability. One way to measure scalability is to determine if non-catalogers are able to use the system. Whether a good thing or a bad thing, due to the web, there are those not trained in cataloging who are providing subject data for information resources.
Today most information systems do not operate in a vacuum; interoperability is critical. Ideally a system would be able to search across disciplines as well as retrieval and storage systems. Scalability involves providing for flexibility of use in situations that vary in depth and sophistication. For instance, the number of subject headings assigned to each record has been scaled down. These are just abbreviated examples of the three considerations.
Building on these three considerations, the author lists ways in which the LCSH system is disadvantageous:
1. It is complex requiring trained personnel
2. It is costly to maintain subject heading strings in bibliographic or metadata records.
3. It is not compatible in syntax with other controlled vocabularies
4. Subject heading strings are not easy to map onto other controlled vocabularies.
5. It is not easily used outside the online catalog environment--this is true especially of search engines on the web
The author then lists the advantages to the system:
1. LCSH has a comprehensive, rich vocabulary
2. It has synonym and homonym control
3. LCHS has cross-references among terms
4. It is a universal controlled vocabulary
5. LCSH is compatible with subject data in MARC records.
Two possible solutions to the LCSH problem :
1. Develop a simplified syntax according to which LCSH terminology will be used--for instance, in the OPAC environment there is usually trained personnel for cataloging, but in situations where personnel and finances are short, but that there is a lot of information to be handled, LCSH could be applied using simpler syntaxes.
2. Use a post coordinate approach to simplify the LCSH string.
In summary, the system was developed as the 20th century was approaching. Over 100 years later the LCSH system is preparing for another century with a new set of problems and a few of the old ones. More changes are expected, especially in the electronic technology arena.
I liked this article. It was readable. It was organized well --the LCSH to date and the LCSH in the future. The more I read about bibliographic control and subject analysis and thesaurus construction and controlled vocabulary the more the entire subject begins to focus. This also helps with my internship.
I keep forgetting to put my e-mail address
beepus19 @aol.com
In the past 105 years the LCSH, originally designed for the use of the Library of Congress, has evolved into the main subject retrieval tool for libraries around the world. Electronic technology expanded the role of LCSH; its role continues to grow due to increasing outside use along with access through the Internet and other online capabilities. One question addressed in this article is the future role of LCHS. Will it continue to be useful? What changes, if any, need to be implemented.
[ Wed Oct 15, 08:45:35 AM | maryann putt ]
The Subject Indexing Process Section of Deconstructing the Indexing Process by Jens-Erik Mai
[ Tue Oct 07, 08:11:48 AM | maryann putt ]
Entering the Millennium: A New Century for LCSH by Lois Chan and Theodora Hodges
In the past 105 years the LCSH, originally designed for the use of the Library of Congress, has evolved into the main subject retrieval tool for libraries around the world. Electronic technology expanded the role of LCSH; its role continues to grow due to increasing outside use along with access through the Internet and other online capabilities. One question addressed in this article is the future role of LCSH. Will it continue to be useful? What changes, if any, need to be implemented?
There seems to be one certainty about the LCSH system. In order for LCSH to continue its role in subject access to information, it must recognize and adapt to a new environment. The authors list three considerations that LCSH must address: simplicity, interoperability, and scalability. One way to measure simplicity is to determine whether non-catalogers are able to use the system. Whether it is a good thing or a bad thing, the web means that people not trained in cataloging are now providing subject data for information resources.
Today most information systems do not operate in a vacuum; interoperability is critical. Ideally a system would be able to search across disciplines as well as retrieval and storage systems. Scalability involves providing for flexibility of use in situations that vary in depth and sophistication. For instance, the number of subject headings assigned to each record has been scaled down. These are just abbreviated examples of the three considerations.
Building on these three considerations, the authors list ways in which the LCSH system is disadvantageous:
1. It is complex, requiring trained personnel.
2. It is costly to maintain subject heading strings in bibliographic or metadata records.
3. It is not compatible in syntax with other controlled vocabularies.
4. Subject heading strings are not easy to map onto other controlled vocabularies.
5. It is not easily used outside the online catalog environment--this is true especially of search engines on the web.
The authors then list the advantages of the system:
1. LCSH has a comprehensive, rich vocabulary.
2. It has synonym and homonym control.
3. LCSH has cross-references among terms.
4. It is a universal controlled vocabulary.
5. LCSH is compatible with subject data in MARC records.
Two possible solutions to the LCSH problem:
1. Develop a simplified syntax for applying LCSH terminology. In the OPAC environment there are usually trained personnel for cataloging, but where personnel and finances are short and there is a lot of information to be handled, LCSH could be applied using simpler syntaxes.
2. Use a post-coordinate approach to simplify the LCSH string.
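To make the post-coordinate idea concrete for myself, here is a minimal Python sketch. It assumes a heading string is stored with "--" between its parts; the function names, record ids, and sample headings are my own illustrations and are not taken from the article. Instead of matching the whole pre-coordinated string, the string is broken into separate terms and the searcher combines terms at search time.

```python
def split_heading(heading: str, delimiter: str = "--") -> list[str]:
    """Break a pre-coordinated subject string into its separate terms."""
    return [part.strip() for part in heading.split(delimiter) if part.strip()]


def postcoordinate_search(records: dict[str, list[str]], *terms: str) -> list[str]:
    """Return ids of records whose separated terms include every requested term."""
    wanted = {t.casefold() for t in terms}
    hits = []
    for record_id, headings in records.items():
        # Collect every individual term from every heading string on the record.
        facets = {f.casefold() for h in headings for f in split_heading(h)}
        if wanted <= facets:
            hits.append(record_id)
    return hits


# Illustrative records only (hypothetical ids and headings).
records = {
    "rec1": ["France--History--Revolution, 1789-1799"],
    "rec2": ["France--Description and travel"],
}

print(split_heading(records["rec1"][0]))
print(postcoordinate_search(records, "History", "France"))
```

The order of the terms in the query does not matter here, which is the point: the person assigning or searching the terms does not need to know the rules for building the full string.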
In summary, the system was developed as the 20th century was approaching. Over 100 years later the LCSH system is preparing for another century with a new set of problems and a few of the old ones. More changes are expected, especially in the electronic technology arena.
I liked this article. It was readable. It was organized well--the LCSH to date and the LCSH in the future. The more I read about bibliographic control, subject analysis, thesaurus construction, and controlled vocabulary, the more the entire subject comes into focus. This also helps with my internship.
I keep forgetting to put my e-mail address
beepus19 @aol.com
Wednesday, October 01, 2003
Thesaurus Construction: Problems and Their Roots by Uri Miller
According to Uri Miller there are three basic reasons for the current change and development of library search and retrieval mechanisms: the placement of computers in libraries, the evolution of advanced search tools, and the Internet. These innovations have led to an analysis of construction methods and their influence on library and database processes. A discussion of thesaurus and classification is part of that analysis.
Aitchison and Gilchrist stated that a thesaurus was essentially an information retrieval tool--a vocabulary of a controlled indexing language, organized so that relationships between concepts are made clear. This makes a thesaurus an artificial language that was created for a specific purpose. Some authors believe that this is also the definition of a classification scheme, but, despite that, there are still differences between the two information processing and retrieving tools.
Bates says that "index terms are fundamentally linguistic, and classification schemes organize conceptual categories. The intent with indexing vocabularies is to find good, compact words or phrases to use to describe documents...with classification schemes, on the other hand, the goal ... is to have completely distinct conceptual categories that are mutually exclusive and jointly exhaustive...as a part making these rigorous categorical distinctions, classifications are generally further organized in a structural manner not seen in index vocabularies". Others disagreed with Bates' definitions; some even arrived at totally opposite conclusions.
The author believes that the difference between thesaurus and classification lies within their own distinctive structures and construction methods. These differences rest on how each treats the concept of hierarchy and the concept of association. Miller defines hierarchy as "a system with grades of authority, or status from lowest to highest". By this definition, a classification scheme is monohierarchical: it is limited to a single aspect, and every concept is taken apart and put into one category. A thesaurus is polyhierarchical because it offers access in many ways.
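To picture the monohierarchical versus polyhierarchical distinction, here is a small Python sketch. The terms and the two dictionaries are hypothetical examples of my own, not taken from Miller's article, and the representation (a term mapped to its broader terms) is only one possible way to model it. In the classification each concept has exactly one broader class; in the thesaurus a term may have more than one, so there is more than one route to it.

```python
# Hypothetical example terms; not taken from the article.
# Classification scheme: every concept sits under exactly one broader class.
classification = {
    "Pianos": ["Keyboard instruments"],
    "Keyboard instruments": ["Musical instruments"],
}

# Thesaurus: a term may list several broader terms.
thesaurus = {
    "Pianos": ["Keyboard instruments", "Stringed instruments"],
    "Keyboard instruments": ["Musical instruments"],
    "Stringed instruments": ["Musical instruments"],
}


def paths_to_top(term, broader):
    """List every chain of broader terms leading from `term` up to a top term."""
    parents = broader.get(term, [])
    if not parents:
        return [[term]]
    return [[term] + path
            for parent in parents
            for path in paths_to_top(parent, broader)]


for name, scheme in (("classification", classification), ("thesaurus", thesaurus)):
    for path in paths_to_top("Pianos", scheme):
        # Print each route from the top term down to "Pianos".
        print(name, ":", " > ".join(reversed(path)))
```

The classification yields a single route to the term, while the thesaurus yields two, which is how I understand "access in many ways".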
This was a complicated article for me. I wanted to read it because I thought it might be challenging and it was. I think I will understand these terms better after the lecture. I believe I remember this article from last semester's Fundamental of Information Science class. One thing mentioned several times was the fact that a thesaurus must be user friendly. If problems existed in 1996 with thesaurus construction and definitions of terms--classification and thesaurus--then I am wondering if in 2003 things are any clearer with regard to the problem.
Monday, September 22, 2003
Summary of Authority Control by Kerrie Talmacs
This article briefly discusses authority control of the Internet, authority work in online catalogs, and the use of references in online catalogs. Talmacs lists several definitions that are important to understanding the subject:
authority control--the process that ensures catalog headings are consistent
authority work--the intellectual decision making that determines the authorized heading and its variant forms
authority records--the authoritative form of an access point (heading)
authority file--an organized assemblage of authority records
Authority control is essential in online catalogs. Users benefit from authority work, but so do acquisitions, interlibrary loan and even cataloguers themselves. Even public services find the authority file useful in their work. It is true that online catalog users have excellent searching tools; nevertheless, their searches are still greatly enhanced by authority control.
Another point to consider is that not all online catalogs are alike. They differ in the way they are searched and in the way they store and process headings, which sometimes results in user confusion. Keyword searching varies significantly among catalogs. Some allow the retrieval of too many records, although in some catalogs a seasoned searcher can successfully narrow the search. Authority control is involved in the way online catalogs are searched and in the results received.
A study by Watson and Taylor revealed that a significant percentage of personal name references (41.5%) and corporate name references (21.9%) would be unnecessary in a system with keyword searching and truncation; another study produced similar percentages. On the other hand, yet another study produced entirely different results. The final analysis was that, in general, keyword searching and truncation are not a good substitute for see references.
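A small Python sketch helped me see why truncation cannot replace every see reference. It assumes a very simplified authority file, a dictionary mapping one authorized heading to its variant forms; the function names and the sample entry are my own illustration, not from Talmacs or from the studies she cites. Truncation only helps when the variant begins the same way as the authorized heading.

```python
# Simplified, illustrative authority file: authorized heading -> variant forms.
authority_file = {
    "Twain, Mark, 1835-1910": [
        "Twain, M.",
        "Clemens, Samuel Langhorne, 1835-1910",
    ],
}

# Headings actually used on catalog records (authorized forms only).
catalog_headings = list(authority_file)


def truncation_search(headings, pattern):
    """Right truncation: 'Twain, M*' matches headings starting with 'Twain, M'."""
    stem = pattern.rstrip("*").casefold()
    return [h for h in headings if h.casefold().startswith(stem)]


def resolve_see_reference(authority, query):
    """Return the authorized heading if the query is that heading or a variant."""
    q = query.casefold()
    for authorized, variants in authority.items():
        if q == authorized.casefold() or q in (v.casefold() for v in variants):
            return authorized
    return None


# A variant that shares its opening with the heading is found by truncation alone.
print(truncation_search(catalog_headings, "Twain, M*"))
# A variant with a different opening is not found by truncation...
print(truncation_search(catalog_headings, "Clemens*"))
# ...but the see reference in the authority file still leads to the heading.
print(resolve_see_reference(authority_file, "Clemens, Samuel Langhorne, 1835-1910"))
```

In this sketch the "Twain, M." reference is the kind that keyword searching and truncation would make unnecessary, while the "Clemens" reference is the kind that still needs authority work.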
Authority control of Internet resources is another consideration. Some order could be brought to the disorganization by improving searching technologies. Problems with authority control of the Internet include those of author and title, and of providing subject access. Mandel and Wolven say that "trained catalogers will need to be involved if name collocation, subject access, version control, and genre identification are to be achieved for Internet resources".
This was an article that I read several times to at least get a basic understanding of authority control. I am still not sure I understand but have yet to hear the lecture. I notice that the conclusion says, at least in 1998, there was a move to reconsider the basics of cataloging, and that at the same time a re-evaluation of authority control was also needed. I was wondering if this had been done yet.
Wednesday, September 17, 2003
Summary of Standards for Descriptive Cataloging: Two Perspectives on the past 20 years by Tom Delsey
Descriptive cataloging has existed for over a hundred years, and its form has changed over that time, whether through new cataloging theory or simply by force of practice. In the last twenty of those hundred years, two things have influenced the development of cataloging standards: shared cataloging and computer technology.
The argument for shared cataloging, mainly an economic one, is simply that a standard description for an item that is created by a proper cataloging authority should be used by others cataloging that same item. The idea behind this is that time will be saved by not duplicating the record, and that in the long run money will be saved.
Shared cataloging may not be ideal and may not always work as well as one hopes, but it should be implemented anyway for the above reasons. On the international level, language and culture pose a problem, as modifications generally must be made. Nationally, different audiences and the purposes of various catalogs will lead to adapting the record to local standards. Despite these shortcomings, standardization does have its advantages and must be pursued. Thus over the years there has been a more broadly based approach to the development of cataloging standards, plus an intensive network of support to assist in standardizing cataloging rules.
As expected, computer technology brought about demands for precision and logic in the recording of bibliographic data. Now with the machine-readable format there is less reason to have variations in format and less reason for inconsistencies in general. The cataloging rules were affected, and AACR authors began to see that computer technology demanded a consistent record structure in material-specific formats, especially in databases. This brought about the formation of the General International Standard Bibliographic Description, which served as a framework for all materials--"print and non-print, monographic/serial, and published and manuscript".
As of 1989, when the article was written, the AACR2 rules for the description of machine-readable files and a new ISBD for computer files had been completed. Computer technology will most likely continue to influence the development of structures for representation of bibliographic data.
This article was published in 1989. Since that time computer technology has become a most important part of that last sentence--influencing the development of structures for representation of bibliographic data. This was a good article--organized, readable, and comprehensible. Shared cataloging and computer technology in descriptive cataloging have come a long way. By the way, it is interesting to note the spelling of catalog or catalogue.