The most direct way to understand how a program works is to try it, so we start with the usage section.
bibtex2rdf is provided as a jar file (the source code will be published soon).
The current version is 1.0 beta 5. Download it here.
It is a command-line tool and requires a Java JDK/JRE 1.5. To translate sample.bib to sample.rdf, type
java -jar bibtex2rdf.jar sample.bib sample.rdf
The complete call syntax is java -jar bibtex2rdf.jar [-schema <file>] [-baseuri <uri>] [-enc <enc>] <bibtex> [<output>]. The parameters are explained in the following table. The application writes all warnings and errors to a log file (bibtex2rdf.log).
Parameter | Description |
---|---|
-schema <file> | Optional schema file; see section Mapping Configuration |
-baseuri <uri> | Prefix all generated URIs with the specified base URI. If omitted, file-local URIs are used |
-enc <enc> | Use the specified encoding. Default is ISO-8859-1. To generate Unicode output, use UTF-8 or UTF-16. |
<bibtex> | The file to translate to RDF. If it is a directory, bibtex2rdf scans it (and its subdirectories) and translates all files with a .bib suffix |
<output> | The file the result is written to. If omitted, the result is written to stdout |
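The call syntax can also be assembled programmatically, for example when converting many files from a script. The following Python sketch only builds the argument list from the options in the table; the helper name `build_command`, the base URI and the assumption that the jar sits in the working directory are illustrative and not part of bibtex2rdf.

```python
def build_command(bibtex, output=None, schema=None, baseuri=None, enc=None,
                  jar="bibtex2rdf.jar"):
    """Assemble the argument list for one bibtex2rdf call.

    Hypothetical helper: it mirrors the documented syntax
    java -jar bibtex2rdf.jar [-schema <file>] [-baseuri <uri>] [-enc <enc>] <bibtex> [<output>]
    """
    cmd = ["java", "-jar", jar]
    if schema is not None:
        cmd += ["-schema", schema]
    if baseuri is not None:
        cmd += ["-baseuri", baseuri]
    if enc is not None:
        cmd += ["-enc", enc]
    cmd.append(bibtex)
    if output is not None:
        cmd.append(output)
    return cmd


# Example values (assumed): UTF-8 output with a made-up base URI.
command = build_command("sample.bib", "sample.rdf",
                        baseuri="http://example.org/bib/", enc="UTF-8")
print(" ".join(command))

# To actually run the conversion (requires Java and the jar file):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.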
At least one resource is generated for each BibTeX entry. If an entry has authors and/or editors, a separate resource is generated to describe each person, so the same resource can be reused when one person has authored or edited several publications. The same applies to publications that are part of a collection (conference, journal, book, etc.): the collection is modeled as a separate resource, and all fields that relate to the collection rather than to the publication (e.g. publisher, editor, address, year, month) also become properties of the collection. We do not (yet) create a resource to identify a journal or conference series; in other words, each journal number is modeled as a separate collection. Also, in contrast to person data, we do not (yet) attempt to identify identical collections and merge them into one resource.
The second most direct way to understand the translation is an example, so we present one here before going into the details. The BibTeX entry
@InProceedings{aberer2003chatty,
  author = {Karl Aberer and Philippe Cudré-Mauroux and Manfred Hauswirth},
  title = {The Chatty Web: Emergent Semantics Through Gossiping},
  booktitle = {Proceedings of the Twelfth International World Wide Web Conference},
  location = {Budapest, Hungary},
  year = {2003},
  month = {May},
  pages = {197--206},
  publisher = {ACM Press},
  address = {New York, USA},
  url = {http://www2003.org/cdrom/papers/refereed/p471/471-aberer.html}
}
is converted to
<?xml version='1.0' encoding='ISO-8859-1'?>
<rdf:RDF
    xmlns:bibtex="http://www.edutella.org/bibtex#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <bibtex:InProceedings rdf:about="aberer2003chatty" dc:date="2003-05" bibtex:pages="197-206">
    <dc:title>The Chatty Web: Emergent Semantics Through Gossiping</dc:title>
    <dct:isPartOf>
      <bibtex:Proceedings dc:date="2003-05">
        <dc:title>Proceedings of the Twelfth International World Wide Web Conference</dc:title>
        <vcard:ADR vcard:Locality="Budapest" vcard:Country="Hungary"/>
        <dc:publisher rdf:resource="aberer2003chatty:ACM_Press"/>
      </bibtex:Proceedings>
    </dct:isPartOf>
    <dc:identifier>http://www2003.org/cdrom/papers/refereed/p471/471-aberer.html</dc:identifier>
    <dc:publisher rdf:resource="aberer2003chatty:ACM_Press"/>
    <dc:creator>
      <rdf:Seq>
        <rdf:li rdf:resource="aberer2003chatty:Aberer_Karl"/>
        <rdf:li rdf:resource="aberer2003chatty:Cudré-Mauroux_Philippe"/>
        <rdf:li rdf:resource="aberer2003chatty:Hauswirth_Manfred"/>
      </rdf:Seq>
    </dc:creator>
  </bibtex:InProceedings>
  <bibtex:Person rdf:about="aberer2003chatty:Aberer_Karl">
    <vcard:FN>Karl Aberer</vcard:FN>
    <vcard:N vcard:Family="Aberer" vcard:Given="Karl"/>
  </bibtex:Person>
  <bibtex:Person rdf:about="aberer2003chatty:Hauswirth_Manfred">
    <vcard:N vcard:Given="Manfred" vcard:Family="Hauswirth"/>
    <vcard:FN>Manfred Hauswirth</vcard:FN>
  </bibtex:Person>
  <bibtex:Person rdf:about="aberer2003chatty:Cudré-Mauroux_Philippe">
    <vcard:FN>Philippe Cudré-Mauroux</vcard:FN>
    <vcard:N vcard:Given="Philippe" vcard:Family="Cudré-Mauroux"/>
  </bibtex:Person>
  <bibtex:Organization rdf:about="aberer2003chatty:ACM_Press">
    <vcard:ADR rdf:parseType="Resource">
      <vcard:Country>USA</vcard:Country>
      <vcard:Locality>New York</vcard:Locality>
    </vcard:ADR>
    <vcard:FN>ACM Press</vcard:FN>
  </bibtex:Organization>
</rdf:RDF>
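To illustrate how the separate person resources and the rdf:Seq fit together, the following Python sketch reads an abridged copy of the output above with the standard library and resolves the ordered author references to full names. It is only one possible way to consume the result; the function name `authors_in_order` is made up for this example, and a real application would more likely use an RDF library.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "vcard": "http://www.w3.org/2001/vcard-rdf/3.0#",
    "bibtex": "http://www.edutella.org/bibtex#",
}
RDF = "{%s}" % NS["rdf"]

# Abridged copy of the generated output (XML declaration and some properties omitted).
SAMPLE = """<rdf:RDF
    xmlns:bibtex="http://www.edutella.org/bibtex#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <bibtex:InProceedings rdf:about="aberer2003chatty">
    <dc:title>The Chatty Web: Emergent Semantics Through Gossiping</dc:title>
    <dc:creator>
      <rdf:Seq>
        <rdf:li rdf:resource="aberer2003chatty:Aberer_Karl"/>
        <rdf:li rdf:resource="aberer2003chatty:Cudré-Mauroux_Philippe"/>
        <rdf:li rdf:resource="aberer2003chatty:Hauswirth_Manfred"/>
      </rdf:Seq>
    </dc:creator>
  </bibtex:InProceedings>
  <bibtex:Person rdf:about="aberer2003chatty:Aberer_Karl">
    <vcard:FN>Karl Aberer</vcard:FN>
  </bibtex:Person>
  <bibtex:Person rdf:about="aberer2003chatty:Hauswirth_Manfred">
    <vcard:FN>Manfred Hauswirth</vcard:FN>
  </bibtex:Person>
  <bibtex:Person rdf:about="aberer2003chatty:Cudré-Mauroux_Philippe">
    <vcard:FN>Philippe Cudré-Mauroux</vcard:FN>
  </bibtex:Person>
</rdf:RDF>"""


def authors_in_order(root):
    """Return (title, [full names]) per entry, keeping the rdf:Seq order."""
    # Map each person resource URI to its full name (vcard:FN).
    names = {
        person.get(RDF + "about"): person.findtext("vcard:FN", namespaces=NS)
        for person in root.findall("bibtex:Person", NS)
    }
    result = []
    for entry in root.findall("bibtex:InProceedings", NS):
        refs = [li.get(RDF + "resource")
                for li in entry.findall("dc:creator/rdf:Seq/rdf:li", NS)]
        title = entry.findtext("dc:title", namespaces=NS)
        # Fall back to the raw URI if no person resource is found.
        result.append((title, [names.get(ref, ref) for ref in refs]))
    return result


for title, authors in authors_in_order(ET.fromstring(SAMPLE)):
    print(title, "-", ", ".join(authors))
```

Note that the person resources appear in a different order than the authors; the author order is carried only by the rdf:Seq.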
Abbreviation | Namespace URI | Comment |
---|---|---|
dc | http://purl.org/dc/elements/1.1/ | Dublin Core Standardized Schema for basic bibliographic metadata |
dct | http://purl.org/dc/terms/ | Dublin Core Metadata Terms to refine the basic elements |
vcard | http://www.w3.org/2001/vcard-rdf/3.0# | RDF mapping of vCard person and address data |
bibtex | http://www.edutella.org/bibtex# | New schema to capture all remaining elements |
BibTeX Type | RDF Type | Comment |
---|---|---|
@article | bibtex:Article | An article from a journal or magazine. |
@book | bibtex:Book | A book with an explicit publisher. |
@booklet | bibtex:Booklet | A work that is printed and bound, but without a named publisher or sponsoring institution. |
@conference | bibtex:InProceedings | The same as inproceedings. |
@inbook | bibtex:InBook | A part of a book, which may be a chapter (or section or whatever) and/or a range of pages. |
@incollection | bibtex:InCollection | A part of a book having its own title. |
@inproceedings | bibtex:InProceedings | An article in a conference proceedings. |
@manual | bibtex:Manual | Technical documentation. |
@mastersthesis | bibtex:MastersThesis | A Master's thesis. |
@misc | bibtex:Misc | Use this type when nothing else fits. |
@phdthesis | bibtex:PhDThesis | A PhD thesis. |
@proceedings | bibtex:Proceedings | The proceedings of a conference. |
@techreport | bibtex:TechReport | A report published by a school or other institution, usually numbered within a series. |
@unpublished | bibtex:Unpublished | A document having an author and title, but not formally published. |
BibTeX field | RDF property | BibTeX Comment | RDF Mapping Comment |
---|---|---|---|
address | vcard:Locality, vcard:Country | For journals, books, etc. usually the address of the publisher or other type of institution. For proceedings, often the location of the event | If the field contains a comma the text after the last comma is taken as vcard:Country, everything before the comma as vcard:Locality |
annote | bibtex:annote | An annotation. It is not used by the standard bibliography styles, but may be used by others that produce an annotated bibliography. | The content in this field is not cleared from special TeX formatting, but left unchanged |
author | dc:creator | The name(s) of the author(s), in the format described in the LaTeX book. | Authors are listed as elements of an rdf:Seq. For each author, a resource with the properties vcard:FN and vcard:N is created. The vcard:N resource gets the properties vcard:Family, vcard:Given, vcard:Other, vcard:Prefix and vcard:Suffix, if these parts appear in the name. |
booktitle | dct:isPartOf | Title of a book (or other collection), part of which is being cited. | Booktitles are handled as collections. If the entry is of type inproceedings, the representing resource is typed bibtex:Proceedings |
chapter | bibtex:chapter | A chapter (or section or whatever) number. | default mapping. The LaTeX commands commonly used in bibtex files (e.g. accents) are handled |
crossref | dct:isPartOf | The database key of the entry being cross referenced. Any fields that are missing from the current record are inherited from the field being cross referenced. | A reference to the resource created for the crossref'd entry is created |
edition | bibtex:edition | The edition of a book, for example "Second". | default mapping |
editor | dc:creator | Name(s) of editor(s), typed as indicated in the LaTeX book. If there is also an author field, then the editor field gives the editor of the book or collection in which the reference appears. | the editor is added to the collection resource, as author. |
howpublished | bibtex:howpublished | How something strange has been published. | default mapping |
institution | bibtex:institution | The sponsoring institution of a technical report. | This field is handled as organization |
journal | dct:isPartOf | A journal name. | Journals are handled as collections. Resources representing Journals are typed bibtex:Journal. |
key | dc:identifier | Used for alphabetizing, cross referencing, and creating a label when the "author" information is missing. | default mapping |
location | vcard:Locality, vcard:Country | A location associated with the entry, such as the city in which a conference took place. | handled as address |
month | dc:date | The month in which the work was published or, for an unpublished work, in which it was written. | the month field is merged with the year field to form the dc:date property |
note | bibtex:note | Any additional information that can help the reader. | The content in this field is not cleared from special TeX formatting, but left unchanged |
number | bibtex:number | The number of a journal, magazine, technical report, or of a work in a series. | default mapping. This information is added to the collection resource |
organization | bibtex:organization | The organization that sponsors a conference or that publishes a manual. | This field is handled as organization |
pages | bibtex:pages | One or more page numbers or range of numbers, such as 42--111 or 7,41,73--97 or 43+ | Consecutive hyphen chars are transformed to exactly one hyphen. |
publisher | dc:publisher | The publisher's name. | This field is handled as organization |
school | bibtex:school | The name of the school where a thesis was written. | This field is handled as organization |
series | dct:isPartOf | The name of a series or set of books. | This field is handled as collection |
title | dc:title | The work's title, typed as explained in the LaTeX book. | default mapping |
type | bibtex:type | The type of a technical report, for example "Research Note". | default mapping |
url | dc:identifier | The WWW Universal Resource Locator that points to the item being referenced | default mapping |
volume | bibtex:volume | The volume of a journal or multi-volume book. | default mapping. This information is added to the collection resource |
year | dc:date | The year of publication or, for an unpublished work, the year it was written. | In W3CDTF format. If the month field is available, month and year information are merged. |
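Three of the mapping rules in the table are simple string transformations: splitting an address at the last comma, collapsing hyphen runs in page ranges, and merging year and month into one date. The Python sketch below restates them under stated assumptions; it is not the code used by bibtex2rdf. The function names are invented, month names are assumed to be English (full or three-letter) or numeric, and an address without a comma is assumed to be a locality only, which the table does not specify.

```python
import re

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def split_address(value):
    """Text after the last comma becomes the country, the rest the locality."""
    if "," not in value:
        # Assumption: without a comma, treat the whole value as the locality.
        return {"Locality": value.strip()}
    locality, country = value.rsplit(",", 1)
    return {"Locality": locality.strip(), "Country": country.strip()}


def normalize_pages(value):
    """Collapse each run of consecutive hyphens into exactly one hyphen."""
    return re.sub(r"-{2,}", "-", value)


def merge_date(year, month=None):
    """Merge year and optional month into W3CDTF form (YYYY or YYYY-MM)."""
    year = str(year).strip()
    if month is None:
        return year
    month = str(month).strip()
    if month.isdigit():
        number = int(month)
    else:
        # Raises ValueError for month names this sketch does not know.
        number = MONTHS.index(month.lower()[:3]) + 1
    return "%s-%02d" % (year, number)


# Values taken from the sample entry shown earlier.
print(split_address("New York, USA"))
print(normalize_pages("197--206"))
print(merge_date("2003", "May"))
```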
The mapping can be configured via a Java properties file. The commented sample file below represents the default configuration, which is exactly what is used when no '-schema' argument is specified. If you write your own configuration file, you only need to include the properties whose values differ from the default. See below for links to other configuration files.
Important Note: the mapping configuration file format is subject to change in upcoming beta versions.
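The rule that a custom file only needs the properties that differ from the default can be pictured as an overlay of two key/value maps. The Python sketch below illustrates that idea only; `parse_properties` is a hypothetical helper that handles a small subset of the properties format (comments, key=value pairs, backslash line continuation), and the two override values are arbitrary examples. How bibtex2rdf itself loads the file is not shown here.

```python
def parse_properties(text):
    """Parse a small subset of the Java properties format into a dict.

    Supports '#' comment lines, blank lines, key=value pairs and a trailing
    backslash as line continuation. Escapes and ':' separators are ignored.
    """
    props = {}
    pending = None  # holds the start of a continued line
    for raw in text.splitlines():
        line = raw.strip()
        if pending is not None:
            line = pending + line
            pending = None
        elif not line or line.startswith("#"):
            continue
        if line.endswith("\\"):
            pending = line[:-1]
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# A few entries copied from the default configuration.
DEFAULTS = parse_properties("""
createDate=true
createSeqForPersonList=true
createDatatypes=false
title=dc:title
""")

# A custom file lists only what should differ (example choices).
OVERRIDES = parse_properties("""
# plain author properties instead of an rdf:Seq, and typed literals
createSeqForPersonList=false
createDatatypes=true
""")

# Later values win, everything else keeps its default.
effective = {**DEFAULTS, **OVERRIDES}
for key in sorted(effective):
    print(key, "=", effective[key])
```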
##########################################################
# Namespaces
##########################################################

# declare namespaces in the form
# ns_<shorthand>=<uri>
ns_rdfs=http://www.w3.org/2000/01/rdf-schema#
ns_dc=http://purl.org/dc/elements/1.1/
ns_dct=http://purl.org/dc/terms/
ns_vcard=http://www.w3.org/2001/vcard-rdf/3.0#

# if a namespace with shorthand 'unknown' is declared,
# this namespace is used to create RDF property names
# for unknown BibTeX fields.
# By default, unknown fields are ignored.
#ns_unknown=http://www.edutella.org/bibtex_unknown/

# if a namespace with shorthand 'bibtex' is declared,
# this namespace is used to create RDF property names
# for known BibTeX types and fields, if they
# arent mapped to DC or RDF properties
ns_bibtex=http://www.edutella.org/bibtex#

##########################################################
# Flags
##########################################################

# flags control the way the output is structured

# create a property where year and month are merged as
# one property (in the form YYYY-MM)
createDate=true

# try to create an address resource and split the
# address field into components (Locality and Country)
createAddressResource=true

# create a Seq for author and editor lists.
# if this flag is set to false, each author/editor
# is added directly as property to the entry resource
createSeqForPersonList=true

# create a separate resource for each author/editor.
# If this flag is false, the fullname is used as
# property value.
createPersonResource=true

# create separate resources for collections
# (proceedings, journals, etc.).
# If this flag is false, the collection title
# (and all other collection related information)
# is added directly to the entry resource.
createCollectionResource=true

# add a Seq of all generated entries to the output.
# creates a Seq which contains all entry references.
# this allows to preserve the entry order information.
# the URI of this sequence will be <baseUri>+"referenceList"
createEntryList=true

# add datatype declarations to all literals
createDatatypes=false

# to overwrite default datatypes, use the following entries
# all other fields are of type xsd:string and currently not overwritable
yearType=http://www.w3.org/2001/XMLSchema#nonNegativeInteger
numberType=http://www.w3.org/2001/XMLSchema#nonNegativeInteger
volumeType=http://www.w3.org/2001/XMLSchema#nonNegativeInteger
chapterType=http://www.w3.org/2001/XMLSchema#nonNegativeInteger
dateType=http://www.w3.org/2001/XMLSchema#gYearMonth

##########################################################
# Field output lists
##########################################################

# field output lists have three purposes:
# - they allow to restrict the output to a selected subset
#   of fields
# - they allow to specify which fields are collection
#   information and which are entry information
# - they allow to specify if some special output is
#   requested which doesn't directly correspond to
#   a BibTeX field
# Such additional properties are:
# - sourceFile: outputs source file information
# - label: adds a label to a resource
# - shortTitle: tries to extract a short title from
#   the title and adds it as separate
#   property
#
# as shorthand, the pseudo-field "all" is allowed to
# specify that all fields for which a mapping is available
# should be mapped. Note that the latter three special
# properties are not included in "all". To get these, you
# have to specify them additionally, as in "all, label, sourceFile".

# for BibTeX entries, output all fields, but nothing special.
entryProperties=all

# for person and organization resources, output all
# available fields. This is a full name and a
# structured name resource according to vCard.
personProperties=all

# assign the following fields to collection resources.
# this is the default for all collection types.
collectionProperties=address, booktitle, crossref, editor, journal, location,\
 month, number, publisher, series, volume, year, shortTitle

# if you want to assign different fields to specific collection
# types, you can overwrite the default by setting the following properties.
proceedingsProperties=address, booktitle, location, publisher, month, volume, year
journalProperties=address, journal, month, number, publisher, volume, year
seriesProperties=publisher,series
bookProperties=booktitle, editor, series, year

# if you only want the fullname, use
# personProperties=personFullname

# if you want to output some fields just as strings, add them to the following list
verbatimProperties=note, annote, key

##########################################################
# Type mappings
##########################################################

# Types start with an upper case letter

# Default entry types and their associated RDF types
Article=bibtex:Article
Book=bibtex:Book
Booklet=bibtex:Publication
InBook=bibtex:InBook
InCollection=bibtex:InCollection
InProceedings=bibtex:InProceedings
Manual=bibtex:Manual
MastersThesis=bibtex:Masterthesis
Misc=bibtex:Misc
Periodical=bibtex:Publication
PhdThesis=bibtex:PhDThesis
Proceedings=bibtex:Proceedings
TechReport=bibtex:TechnicalReport
Unpublished=bibtex:Unpublished
Conference=bibtex:Conference

# You may add new non-standard entry types
# which will be translated according to the
# specified mapping
# Matharticle=bibtex:Article
# Mastersthesis=bibtex:Masterthesis
# Masterthesis=bibtex:Masterthesis
# Mscthesis=bibtex:Masterthesis
# Periodical=bibtex:Publication

# Types assigned to person and organization
# resources from the corresponding BibTeX field.
Author=bibtex:Person
Editor=bibtex:Person
Organization=bibtex:Organization
Institution=bibtex:Organization
School=bibtex:Organization
Publisher=bibtex:Organization

# Types assigned to collection resources
# They are inferred from the entry type
# and from the BibTeX field.
# Proceedings and Books are already defined above

# collection for @article
Journal=bibtex:Journal
# collection for series field
Series=bibtex:Series
# everything else
Collection=bibtex:Collection

# special cases
# resource for the 'and other' author/editor part
EtAl=bibtex:EtAl
# type of resource which represents the source file
BibFile=bibtex:SourceFile

##########################################################
# Field mappings
##########################################################

# fields start with a lower case letter

# address related fields
address=vcard:ADR
location=vcard:ADR

# date related fields
year=bibtex:year
month=bibtex:month

# title related fields
title=dc:title

# collection related fields
#
# Note that in the collection resource these
# fields are always mapped to the title property.
#
# if you set createCollectionResource to false,
# you also need to change the mapping for these fields.
booktitle=dct:isPartOf
journal=dct:isPartOf
series=dct:isPartOf
crossref=dct:isPartOf

# person or organization related fields
author=dc:creator
editor=bibtex:editor
publisher=dc:publisher
institution=bibtex:institution
organization=bibtex:organization
school=bibtex:school

# identifier fields
url=dc:identifier
key=dc:identifier

# all other bibtex fields
annote=bibtex:annote
chapter=bibtex:chapter
edition=bibtex:edition
howpublished=bibtex:howpublished
note=bibtex:note
number=bibtex:number
pages=bibtex:pages
type=bibtex:type
volume=bibtex:volume

# fields derived from BibTeX information

# used if createAddressResource
addressCountry=vcard:Country
addressLocality=vcard:Locality

# used for the merged date
date=dc:date

# used for person and organization resources
personFullname=vcard:FN
personStructuredName=vcard:N

# the structured name has several parts.
# "Charles Louis Xavier Joseph de la Vallee Poussin Jr" is
# split as follows:
# nameFamily = "Vallee Poussin"
# namePrefix = "de la"
# nameSuffix = "Jr"
# nameGiven = "Charles"
# nameOther = "Louis Xavier Joseph"
nameFamily=vcard:Family
namePrefix=vcard:Prefix
nameSuffix=vcard:Suffix
nameGiven=vcard:Given
nameOther=vcard:Other

# property used to attach a label
label=rdfs:label

# While Persons and Organizations always get their full name as label,
# you can specify a label pattern for entries. Use <field> to refer
# to a BibTex field and 'text' to include fixed text.
# you can concatenate any elements using +.
# to add label components only if a specific field x exists, use
# (<x>: ...), e.g. (<year>: ', '+<year>)
# This is the default setting:
defaultLabelPattern=<title>

# if you want to use different pattern for different types,
# you can overwrite the default, e.g.:
# articleLabelPattern=(<author>:<author>+'. ')+<title>+(<journal>:'. '+<journal>+(<volume>:' '+<volume>+(<number>:'('+<number>+')')))+(<year>:', '+<year>)

# property used to attach source file information
sourceFile=bibtex:sourceFile

# property used to add the absolute path as string
# to the source file resource
fileAbsolutePath=bibtex:absolutePath
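The structured-name example in the configuration comments ("Charles Louis Xavier Joseph de la Vallee Poussin Jr") can be reproduced with a simple heuristic. The Python sketch below is an illustration under assumptions, not the algorithm bibtex2rdf uses: it only handles names written in "Given ... prefix Family Suffix" order, it treats lowercase words as the prefix, and the suffix list is a guess. The function name `split_name` is invented; the keys match the property names in the configuration.

```python
# Assumed list of suffixes; the tool's actual list is not documented here.
SUFFIXES = {"Jr", "Jr.", "Sr", "Sr.", "II", "III"}


def split_name(full_name):
    """Split a name into the parts used for the vcard:N resource."""
    tokens = full_name.split()
    parts = {}
    if len(tokens) > 1 and tokens[-1] in SUFFIXES:
        parts["nameSuffix"] = tokens.pop()
    # Lowercase words such as "de la" or "von" are taken as the prefix.
    lower = [i for i, token in enumerate(tokens) if token[0].islower()]
    if lower and lower[-1] < len(tokens) - 1:
        first, last = lower[0], lower[-1]
        parts["namePrefix"] = " ".join(tokens[first:last + 1])
        family_start = last + 1
        before = tokens[:first]
    else:
        # No prefix: the last word is the family name.
        family_start = max(len(tokens) - 1, 0)
        before = tokens[:family_start]
    parts["nameFamily"] = " ".join(tokens[family_start:])
    if before:
        parts["nameGiven"] = before[0]
    if len(before) > 1:
        parts["nameOther"] = " ".join(before[1:])
    return parts


print(split_name("Charles Louis Xavier Joseph de la Vallee Poussin Jr"))
print(split_name("Karl Aberer"))
```

Names in BibTeX's "Family, Given" form would need an extra branch that this sketch deliberately leaves out.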
Several other RDF mappings for BibTeX already exist. For each one we know of, we provide a configuration file. Note that we have not analyzed these mappings in depth, so a configuration file might produce slightly different results than the original converter.
This is an online service provided by the SemanticWeb@VU initiative at Vrije Universiteit Amsterdam. It is available at http://www.cs.vu.nl/~mcaklein/bib2rdf/. This translator creates person and organization resources, but no collection resources. Also, all fields are translated into properties of the same schema. Use the file VUMapping.properties (coming soon) to create similar output.
A Java application for conversion is provided by AIFB, Universität Karlsruhe, as part of the SWAP project. It is downloadable at http://www.aifb.uni-karlsruhe.de/WBS/pha/bib/index.html. This translator creates person and organization resources, and adds source file information to them. The file SWRCMapping.properties creates similar output, with the following exceptions: a) In the original output, the RDF type and property URIs don't have a namespace. According to Jena 2.0, this is no longer valid. b) While person resources are created, the author/editor property isn't added to the entry resources; as this seems to be just an omission, we add these properties.
An OWL ontology for BibTeX is available at http://visus.mit.edu/bibtex/0.1/. This seems to be just a flat schema. An RDF model according to this schema can be generated with VisusMapping.properties.
As part of its mediator architecture, MarcOnt also provides a translator from BibTeX to RDF (see http://www.marcont.org/mediations.jsp). The OWL schema used is available at http://www.marcont.org/marcont/marcont.owl and refers to the Visus ontology. The output created by the online converter is completely flat; even all author names are put into one string. An RDF model according to this schema can be generated with MarcOntMapping.properties.
bibtex2rdf uses several open source libraries to fulfill its task:
Library | Description |
---|---|
JavaBib | a highly capable BibTeX parser, provided by Johannes Henkel |
Jena | the well-known RDF library provided by HP Research Labs |
Log4J | the equally well-known logging toolkit provided by the Apache Software Foundation |
MacBinary Toolkit 2 for Java | provides an AccentComposer to create accented Unicode characters; provided by Gregory Guerin |