Content types

The org.eclipse.core.runtime.content package provides support for defining content types for data streams. Content types are used by several content-sensitive features of Eclipse, such as automatic encoding determination, comparison editor selection, and menu contributions. A central content registry, managed by an IContentTypeManager, allows plug-ins to define content types and to specify a class that knows how to read and describe the content. Contributing content types requires a basic understanding of the content framework.

Note: For this discussion, we deliberately avoid the word file when talking about content. The runtime content engine does not assume that content is stored in a file in the file system. It does, however, include protocol that allows content types to be associated with file-naming patterns. In practice, these names refer to files in the file system, but nothing in the implementation of the content system depends on that. The file-oriented content types contributed by the platform resources plug-in are discussed in File encoding and content types.

Defining and describing content

The platform defines some fundamental content types, such as plain text and XML data streams. These content types are defined the same way as those contributed by other plug-ins. We'll look at how the platform defines the text content type in order to better understand the content type framework.

Plug-ins define content types by contributing an extension for the extension point org.eclipse.core.runtime.contentTypes. In this extension, a plug-in specifies an id and name for the content type, and an IContentDescriber which can read an input stream and supply a description of the content. The following snippet is the runtime plug-in's contribution for the text content type:

    <extension point="org.eclipse.core.runtime.contentTypes">
        <content-type id="text" name="%textContentTypeName"
            priority="high"
            file-extensions="txt">
            <describer class="org.eclipse.core.internal.content.TextContentDescriber"/>    
        </content-type>
        ...

The TextContentDescriber reads an input stream and quickly determines whether the supplied content is a valid sample of text. The platform calls describe(inputStream, description) whenever it is trying to determine the content type of a particular data stream. An IContentDescriber must decide quickly whether the contents represent a valid sample of its content type and return a constant indicating the result: IContentDescriber.VALID, INVALID, or INDETERMINATE. If the content does match its type, the describer should also fill in the supplied IContentDescription with information about the data.

The IContentDescription stores content-specific attributes as key/value pairs. Which attributes are present depends on the content type. The platform specifies attributes for the character set (IContentDescription.CHARSET) and the byte order mark (IContentDescription.BYTE_ORDER_MARK) of text content, but others can be defined.
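
To make the describer contract more concrete, here is a minimal plain-Java sketch of the kind of check a describer for text content might perform: look for a byte order mark at the start of the stream and record what was found. It is not the platform's TextContentDescriber and does not use the Eclipse interfaces, so that it can run on its own. The class name BomDescriberSketch, the map standing in for IContentDescription, the attribute keys, and the numeric result codes are all assumptions made for this illustration, as is the choice to report VALID when a mark is found.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;

// Hypothetical stand-in for a content describer; it does not use the Eclipse types.
class BomDescriberSketch {
    // Result codes local to this sketch.
    static final int INDETERMINATE = 1;
    static final int VALID = 2;

    // Hypothetical attribute keys for the description map.
    static final String CHARSET = "charset";
    static final String BYTE_ORDER_MARK = "byteOrderMark";

    private static final byte[] UTF_8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
    private static final byte[] UTF_16BE_BOM = {(byte) 0xFE, (byte) 0xFF};
    private static final byte[] UTF_16LE_BOM = {(byte) 0xFF, (byte) 0xFE};

    // Inspects at most three bytes so that the check stays cheap.
    static int describe(InputStream contents, Map<String, Object> description) throws IOException {
        byte[] head = new byte[3];
        int length = 0;
        while (length < head.length) {
            int read = contents.read(head, length, head.length - length);
            if (read < 0) {
                break; // end of stream
            }
            length += read;
        }
        if (startsWith(head, length, UTF_8_BOM)) {
            return record(description, "UTF-8", UTF_8_BOM);
        }
        if (startsWith(head, length, UTF_16BE_BOM)) {
            return record(description, "UTF-16BE", UTF_16BE_BOM);
        }
        if (startsWith(head, length, UTF_16LE_BOM)) {
            return record(description, "UTF-16LE", UTF_16LE_BOM);
        }
        // No mark found: the content may still be text, but this sketch cannot tell.
        return INDETERMINATE;
    }

    private static boolean startsWith(byte[] head, int length, byte[] mark) {
        if (length < mark.length) {
            return false;
        }
        for (int i = 0; i < mark.length; i++) {
            if (head[i] != mark[i]) {
                return false;
            }
        }
        return true;
    }

    // Fills in the description only when the content matched.
    private static int record(Map<String, Object> description, String charset, byte[] mark) {
        description.put(CHARSET, charset);
        description.put(BYTE_ORDER_MARK, mark.clone());
        return VALID;
    }
}
```

A real describer would implement IContentDescriber and write into the IContentDescription it is handed. The sketch only shows the shape of the decision: read as little as possible, report a result code, and fill in attributes only on a match.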

Finding out about content types

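IContentTypeManager defines the protocol for the content registry. Clients can use this interface to determine the content type of a content stream or to find out about other content types in the system. The platform's content type manager is available from Platform.getContentTypeManager().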

Content types are represented by IContentType. An instance of this interface represents a unique content type that knows how to read a data stream and interpret content type-specific information. Content types are hierarchical. For example, the platform defines the content type for XML data as a child of the text content type:

        <content-type id="xml" name="%xmlContentTypeName"     
            base-type="text"
            priority="high"            
            file-extensions="xml"
            default-charset="UTF-8">
            <describer class="org.eclipse.core.internal.content.XMLContentDescriber"/>
        </content-type>

This allows new content types to leverage the attributes or behavior of more general content types.
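
The parent/child relationship can be pictured with a small plain-Java model. It is only a sketch: TypeNodeSketch is a hypothetical class invented here, not the platform's IContentType, although its isKindOf method is meant to mirror the idea that a type is a kind of itself and of every type it descends from.

```java
// Hypothetical model of a content type hierarchy; not the Eclipse classes.
final class TypeNodeSketch {
    final String id;
    final TypeNodeSketch baseType; // null for a root type such as text

    TypeNodeSketch(String id, TypeNodeSketch baseType) {
        this.id = id;
        this.baseType = baseType;
    }

    // True if this type is the given type or descends from it, directly or indirectly.
    boolean isKindOf(TypeNodeSketch other) {
        for (TypeNodeSketch type = this; type != null; type = type.baseType) {
            if (type == other) {
                return true;
            }
        }
        return false;
    }
}
```

With a text node and an xml node whose base type is text, xml.isKindOf(text) holds while text.isKindOf(xml) does not, which is why a feature registered for text content can also be offered for XML content.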

Character sets

The platform text content type does not define a character set for text content. Children of the text content type are free to specify a default character set when appropriate, as the XML content type does. The default character set for XML streams is UTF-8, which means that when XML content does not state its encoding explicitly, it is assumed to be encoded in UTF-8.
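
The fallback rule for XML can be sketched as follows. This is a simplified illustration, not the platform's XMLContentDescriber: the class XmlCharsetSketch and its method are hypothetical, and the sketch assumes the beginning of the content has already been decoded well enough to read the XML declaration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the Eclipse API.
class XmlCharsetSketch {
    // Default taken from the default-charset attribute of the XML content type.
    static final String XML_DEFAULT_CHARSET = "UTF-8";

    // Matches the encoding pseudo-attribute of an XML declaration at the very start of the content.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']([^\"']+)[\"']");

    // Returns the declared encoding, or the content type default when none is declared.
    static String charsetFor(String startOfContent) {
        Matcher matcher = ENCODING.matcher(startOfContent);
        return matcher.find() ? matcher.group(1) : XML_DEFAULT_CHARSET;
    }
}
```

Content that begins with an XML declaration naming ISO-8859-1 resolves to ISO-8859-1, while content with no encoding in its declaration, or no declaration at all, falls back to UTF-8.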

Content type collisions

Two independent plug-ins may contribute a content type for the same kind of content. In this case, the platform selects only one content describer for the content, based on the priority attribute that can be specified in the contentTypes extension. If two plug-ins contribute a content type for the same content with the same priority, which describer is selected is indeterminate. Once a content describer is selected, however, all registry references to the "losing" content describer are aliased to the one that was chosen.
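
The selection rule can be modeled in a few lines of plain Java. This is a sketch under stated assumptions rather than the platform's implementation: PrioritySketch, Candidate, and select are hypothetical names, and the tie-breaking behavior shown (keep the earlier entry) is an arbitrary choice for the example, since the platform leaves ties indeterminate.

```java
import java.util.List;

// Hypothetical model of the selection rule; not the platform's implementation.
class PrioritySketch {
    // Mirrors the low, normal and high values of the priority attribute.
    enum Priority { LOW, NORMAL, HIGH }

    static final class Candidate {
        final String describerClass;
        final Priority priority;

        Candidate(String describerClass, Priority priority) {
            this.describerClass = describerClass;
            this.priority = priority;
        }
    }

    // Returns the candidate with the highest priority, or null for an empty list.
    // On a tie this sketch keeps the earlier entry; the platform promises no such order.
    static Candidate select(List<Candidate> candidates) {
        Candidate best = null;
        for (Candidate candidate : candidates) {
            if (best == null || candidate.priority.compareTo(best.priority) > 0) {
                best = candidate;
            }
        }
        return best;
    }
}
```

Because ties are indeterminate in the platform, a plug-in that needs its describer to win should not rely on registration order and should declare an explicit priority instead.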