JAR File Specification

Introduction

The JAR file format is based on the popular ZIP file format and is used for aggregating many files into one. A JAR file is essentially a zip file that contains an optional META-INF directory. A JAR file can be created by the command-line jar tool or by using the java.util.jar API in the Java platform. There is no restriction on the name of a JAR file; it can be any legal file name on a particular platform.
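As a rough illustration of the programmatic route, the sketch below builds a one-entry JAR in memory with java.util.jar.JarOutputStream and reads it back with JarInputStream. The class name JarCreateSketch, its helper methods and the entry name hello.txt are hypothetical and chosen only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

final class JarCreateSketch {
    /** Builds a one-entry JAR in memory and returns its bytes. */
    static byte[] buildJar(String entryName, String text) {
        Manifest manifest = new Manifest();
        // Manifest.write requires Manifest-Version in the main attributes.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // This constructor writes META-INF/MANIFEST.MF as the first entry.
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            jar.putNextEntry(new JarEntry(entryName));
            jar.write(text.getBytes(StandardCharsets.UTF_8));
            jar.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns the name of the first entry after the manifest, or null if none. */
    static String firstEntryName(byte[] jarBytes) {
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            // JarInputStream consumes a leading manifest entry itself.
            JarEntry entry = in.getNextJarEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] jar = buildJar("hello.txt", "hello");
        System.out.println(firstEntryName(jar));
    }
}
```

The same streams work against files by swapping the byte-array streams for FileOutputStream and FileInputStream.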

In many cases, JAR files are not just simple archives of Java class files and/or resources. They are used as building blocks for applications and extensions. The META-INF directory, if it exists, is used to store package and extension configuration data, including security, versioning, extension and services.

The META-INF directory

The following files/directories in the META-INF directory are recognized and interpreted by the Java 2 Platform to configure applications, extensions, class loaders and services:

MANIFEST.MF
  The manifest file that is used to define extension and package related data.

INDEX.LIST
  This file is generated by the new "-i" option of the jar tool, which contains location information for packages defined in an application or extension. It is part of the JarIndex implementation and used by class loaders to speed up their class loading process.

x.SF
  The signature file for the JAR file. 'x' stands for the base file name.

x.DSA
  The signature block file associated with the signature file with the same base file name. This file stores the digital signature of the corresponding signature file.

services/
  This directory stores all the service provider configuration files.

Name-Value pairs and Sections

Before we go into the details of the contents of the individual configuration files, some format conventions need to be defined. In most cases, information contained within the manifest file and signature files is represented as so-called "name: value" pairs inspired by the RFC822 standard. We also call these pairs headers or attributes.

Groups of name-value pairs are known as a "section". Sections are separated from other sections by empty lines.

Binary data of any form is represented as base64. Continuations are required for binary data which causes line length to exceed 72 bytes. Examples of binary data are digests and signatures.

Implementations shall support header values of up to 65535 bytes.
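The continuation rule can be observed with java.util.jar.Manifest, which folds long header lines when writing and rejoins them when reading. The sketch below is only an illustration: the class name ContinuationSketch and the header name X-Long are hypothetical, and the long value stands in for a base64 digest or signature.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

final class ContinuationSketch {
    /** Serializes a manifest holding one extra main attribute. */
    static String writeManifest(String headerName, String value) {
        Manifest manifest = new Manifest();
        // Manifest.write requires Manifest-Version in the main attributes.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().putValue(headerName, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // Long lines are folded into continuation lines starting with a space.
            manifest.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Parses manifest text and returns one main attribute value. */
    static String readHeader(String manifestText, String headerName) {
        byte[] bytes = manifestText.getBytes(StandardCharsets.UTF_8);
        try {
            // Reading rejoins continuation lines into a single value.
            Manifest manifest = new Manifest(new ByteArrayInputStream(bytes));
            return manifest.getMainAttributes().getValue(headerName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder value = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            value.append('x');
        }
        String text = writeManifest("X-Long", value.toString());
        System.out.print(text);
        System.out.println(readHeader(text, "X-Long").length());
    }
}
```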

All the specifications in this document use the same grammar in which terminal symbols are shown in fixed width font and non-terminal symbols are shown in italic type face.

Specification:

  section:           *header +newline
  nonempty-section:  +header +newline
  newline:           CR LF | LF | CR (not followed by LF)
  header:            name : value
  name:              alphanum *headerchar
  value:             SPACE *otherchar newline *continuation
  continuation:      SPACE *otherchar newline
  alphanum:          {A-Z} | {a-z} | {0-9}
  headerchar:        alphanum | - | _
  otherchar:         any UTF-8 character except NUL, CR and LF

; Also: To prevent mangling of files sent via straight e-mail, no
; header will start with the four letters "From".
 

Non-terminal symbols defined in the above specification will be referenced in the following specifications.
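To make the grammar concrete, here is a minimal hand-written parser sketch that splits text into sections of name-value pairs and joins continuation lines. It is not the platform's parser: the class name SectionParserSketch is hypothetical, name characters are not validated, and a repeated header within a section simply overwrites the earlier value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SectionParserSketch {
    /** Parses text into sections, each an ordered map of header name to value. */
    static List<Map<String, String>> parse(String text) {
        List<Map<String, String>> sections = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        String lastName = null;
        // newline: CR LF | LF | CR; the -1 limit keeps trailing empty lines.
        for (String line : text.split("\r\n|\n|\r", -1)) {
            if (line.isEmpty()) {
                // An empty line terminates the current section.
                if (!current.isEmpty()) {
                    sections.add(current);
                    current = new LinkedHashMap<>();
                }
                lastName = null;
            } else if (line.charAt(0) == ' ') {
                // continuation: SPACE *otherchar newline
                if (lastName == null) {
                    throw new IllegalArgumentException("continuation without a header");
                }
                current.put(lastName, current.get(lastName) + line.substring(1));
            } else {
                // header: name ':' SPACE value
                int colon = line.indexOf(": ");
                if (colon <= 0) {
                    throw new IllegalArgumentException("malformed header: " + line);
                }
                lastName = line.substring(0, colon);
                current.put(lastName, line.substring(colon + 2));
            }
        }
        if (!current.isEmpty()) {
            sections.add(current);
        }
        return sections;
    }

    public static void main(String[] args) {
        String text = "Manifest-Version: 1.0\r\nX-Key: abc\r\n def\r\n\r\nName: a/b.class\n";
        System.out.println(parse(text));
    }
}
```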

JAR Manifest

Overview

A JAR file manifest consists of a main section followed by a list of sections for individual JAR file entries, each separated by a newline. Both the main section and individual sections follow the section syntax specified above. They each have their own specific restrictions and rules.

The main section contains security and configuration information about the JAR file itself, as well as the application or extension that this JAR file is a part of. It also defines main attributes that apply to every individual manifest entry. No attribute in this section may be named "Name". This section is terminated by an empty line.

The individual sections define various attributes for packages or files contained in this JAR file. Not all files in the JAR file need to be listed in the manifest as entries, but all files which are to be signed must be listed. The manifest file itself must not be listed. Each section must start with an attribute named "Name", whose value must be a relative path to the file or an absolute URL referencing data outside the archive.

If there are multiple individual sections for the same file entry, the attributes in these sections are merged. If a certain attribute has different values in different sections, the last one is recognized.

Attributes which are not understood are ignored. Such attributes may include implementation specific information used by applications.

Manifest Specification:

  manifest-file:       main-section newline *individual-section
  main-section:        version-info newline *main-attribute
  version-info:        Manifest-Version : version-number
  version-number:      digit+{.digit+}*
  main-attribute:      (any legitimate main attribute) newline
  individual-section:  Name : value newline *perentry-attribute
  perentry-attribute:  (any legitimate perentry attribute) newline
  newline:             CR LF | LF | CR (not followed by LF)
  digit:               {0-9}

In the above specification, attributes that can appear in the main section are referred to as main attributes, whereas attributes that can appear in individual sections are referred to as per-entry attributes. Certain attributes can appear both in the main section and the individual sections, in which case the per-entry attribute value overrides the main attribute value for the specified entry. The two types of attributes are defined as follows.
 

Main Attributes

Main attributes are the attributes that are present in the main section of the manifest. They fall into the following different groups:

Per-Entry Attributes

Per-entry attributes apply only to the individual JAR file entry with which the manifest entry is associated. If the same attribute also appears in the main section, then the value of the per-entry attribute overrides the main attribute's value. For example, if the manifest of JAR file a.jar sets "Sealed: true" in its main section and "Sealed: false" in an individual section named "foo/bar/", then all the packages archived in a.jar are sealed, except that package foo.bar is not.
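The override rule can be sketched with java.util.jar.Manifest. The manifest text below is an assumed reconstruction consistent with the a.jar example, and the class name EffectiveAttributeSketch and the helper effectiveValue are hypothetical. The helper shows only the generic lookup order, not the platform's actual sealing logic.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

final class EffectiveAttributeSketch {
    // Assumed manifest for a.jar: everything sealed except foo/bar/.
    static final String EXAMPLE =
        "Manifest-Version: 1.0\n"
        + "Sealed: true\n"
        + "\n"
        + "Name: foo/bar/\n"
        + "Sealed: false\n"
        + "\n";

    static Manifest parse(String text) {
        try {
            return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the per-entry value if present, otherwise the main-section value. */
    static String effectiveValue(Manifest manifest, String entryName, String attribute) {
        Attributes perEntry = manifest.getAttributes(entryName);
        if (perEntry != null) {
            String value = perEntry.getValue(attribute);
            if (value != null) {
                return value; // the per-entry attribute overrides the main attribute
            }
        }
        return manifest.getMainAttributes().getValue(attribute);
    }

    public static void main(String[] args) {
        Manifest manifest = parse(EXAMPLE);
        System.out.println(effectiveValue(manifest, "foo/bar/", "Sealed"));
        System.out.println(effectiveValue(manifest, "foo/baz/", "Sealed"));
    }
}
```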

The per-entry attributes fall into the following groups:

Signed JAR File

Overview

A JAR file can be signed by using the command-line jarsigner tool or directly through the java.security API. Every file entry will be signed if the JAR file is signed by the jarsigner tool. Subsets of a JAR file can be signed by using the java.security API. A signed JAR file is exactly the same as the original JAR file, except that its manifest is updated and two additional files are added to the META-INF directory: a signature file and a signature block file. When jarsigner is not used, the signing program has to construct both the signature file and the signature block file.

For every file entry that is signed in the signed JAR file, an individual manifest entry is created for it if one does not already exist in the manifest. Each manifest entry lists one or more digest attributes and an optional Magic attribute.

Signature File

Each signer is represented by a signature file with extension .SF. The major part of the file is similar to the manifest file. It consists of a main section which includes information supplied by the signer but not specific to any particular jar file entry, followed by a list of individual entries whose names must also be present in the manifest file. Each individual entry must contain at least the digest of the corresponding entry in the manifest file.

Paths or URLs appearing in the manifest file but not in the signature file are not used in the calculation.

Signature Validation

The signature is first verified when the manifest is first parsed. This verification can be remembered, for efficiency. This only validates the signature directions themselves, not the actual archive files.

To validate a file, a digest value in the signature file is compared against a digest calculated against the corresponding entry in the manifest file. Then, a digest value in the manifest file is compared against a digest calculated against the actual data referenced in the "Name:" attribute, which specifies either a relative file path or URL.
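The second comparison, manifest digest against actual data, might look like the following sketch. The attribute naming pattern "<algorithm>-Digest" is an assumption modeled on the SHA-Digest examples in this document, and the class name EntryDigestSketch and its methods are hypothetical. The first comparison has the same shape, with the bytes of the manifest entry taking the place of the file data.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.jar.Attributes;

final class EntryDigestSketch {
    /** Computes the base64 form of a digest over the given data. */
    static String base64Digest(String algorithm, byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm).digest(data);
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("unknown digest algorithm: " + algorithm, e);
        }
    }

    /** True if the entry's "<algorithm>-Digest" attribute matches the data. */
    static boolean matchesManifest(Attributes entryAttributes, String algorithm, byte[] data) {
        // Assumed attribute name, for example "SHA-256-Digest".
        String recorded = entryAttributes.getValue(algorithm + "-Digest");
        return recorded != null && recorded.equals(base64Digest(algorithm, data));
    }

    public static void main(String[] args) {
        byte[] data = "class file bytes".getBytes(StandardCharsets.UTF_8);
        Attributes entry = new Attributes();
        entry.putValue("SHA-256-Digest", base64Digest("SHA-256", data));
        System.out.println(matchesManifest(entry, "SHA-256", data));
    }
}
```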

Example manifest file:
 

The corresponding signature file would be:
 

The Magic Attribute

Another requirement to validate the signature on a given manifest entry is that the verifier understand the value or values of the Magic key-pair value in that entry's manifest entry.

The Magic attribute is optional but it is required that a parser understand the value of an entry's Magic key if it is verifying that entry's signature.

The value or values of the Magic attribute are a set of comma-separated context-specific strings. The spaces before and after the commas are ignored. Case is ignored. The exact meaning of the magic attributes is application specific. These values indicate how to compute the hash value contained in the manifest entry, and are therefore crucial to the proper verification of the signature. The keywords may be used for dynamic or embedded content, multiple hashes for multilingual documents, etc.

Here are two examples of the potential use of Magic attribute in the manifest file:

        Name: http://www.scripts.com/index#script1
        SHA-Digest: (base64 representation of SHA hash)
        Magic: JavaScript, Dynamic

        Name: http://www.tourist.com/guide.html
        SHA-Digest: (base64 representation of SHA hash)
        SHA-Digest-French: (base64 representation of SHA hash)
        SHA-Digest-German: (base64 representation of SHA hash)
        Magic: Multilingual

In the first example, these Magic values may indicate that the result of an http query is the script embedded in the document, as opposed to the document itself, and also that the script is generated dynamically. These two pieces of information indicate how to compute the hash value against which to compare the manifest's digest value, and thus how to validate the signature.

In the second example, the Magic value indicates that the document retrieved may have been content-negotiated for a specific language, and that the digest to verify against is dependent on which language the document retrieved is written in.
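Parsing a Magic value under the stated rules (comma-separated, surrounding spaces ignored, case ignored) could be sketched as follows. The class name MagicSketch and the notion of a "supported" keyword set are hypothetical; what each keyword means remains application specific.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class MagicSketch {
    /** Splits a Magic value into lower-cased keywords, ignoring surrounding spaces. */
    static Set<String> parseMagic(String value) {
        Set<String> keywords = new LinkedHashSet<>();
        for (String part : value.split(",")) {
            String keyword = part.trim().toLowerCase(Locale.ROOT);
            if (!keyword.isEmpty()) {
                keywords.add(keyword);
            }
        }
        return keywords;
    }

    /**
     * True if every keyword in the Magic value is in the supported set.
     * The supported set is expected to hold lower-case keywords.
     */
    static boolean understandsAll(String magicValue, Set<String> supported) {
        return supported.containsAll(parseMagic(magicValue));
    }

    public static void main(String[] args) {
        Set<String> supported = new LinkedHashSet<>(Arrays.asList("javascript", "dynamic"));
        System.out.println(parseMagic("JavaScript, Dynamic"));
        System.out.println(understandsAll("Multilingual", supported));
    }
}
```

A verifier following this approach would decline to validate an entry whenever understandsAll returns false.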

Digital Signatures

A digital signature is a signed version of the .SF signature file. These are binary files not intended to be interpreted by humans.

Digital signature files have the same filename as the .SF file but different extension. The extension varies depending on the type of digital signature.

  .RSA      (PKCS7 signature, MD5 + RSA)
  .DSA      (PKCS7 signature, DSA)
  .PGP      (Pretty Good Privacy Signature)

For those formats that do not support external signed data, the file shall consist of a signed copy of the .SF file. Thus some data may be duplicated and a verifier ought to compare the two files.

Formats that support external data either reference the .SF file, or perform calculations on it with implicit reference.

Each .SF file may have multiple digital signatures, but those signatures ought to be generated by the same legal entity.

File name extensions may be 1 to 3 alphanum characters. Unrecognized extensions are ignored.

Notes on Manifest and Signature Files

Following is a list of additional restrictions and rules that apply to manifest and signature files.

JAR Index

Overview

JarIndex was introduced in 1.3 to optimize the class searching process of class loaders for network applications, especially applets. Originally, an applet class loader used a simple linear search algorithm to search each element on its internal search path, which is constructed from the "ARCHIVE" tag or the "Class-Path" main attribute. The class loader downloads and opens each element in its search path until the class or resource is found. If the class loader tries to find a nonexistent resource, then all the jar files within the application or applet have to be downloaded. For large network applications and applets this can result in slow startup, sluggish response and wasted network bandwidth. The JarIndex mechanism collects the contents of all the jar files defined in an applet and stores the information in an index file in the first jar file on the applet's class path. After the first jar file is downloaded, the applet class loader uses the collected content information to download jar files efficiently.

The existing jar tool is enhanced to be able to examine a list of jar files and generate directory information as to which classes and resources reside in which jar file. This directory information is stored in a simple text file named INDEX.LIST in the META-INF directory of the root jar file. When the class loader loads the root jar file, it reads the INDEX.LIST file and uses it to construct a hash table of mappings from file and package names to lists of jar file names. In order to find a class or a resource, the class loader queries the hash table to find the proper jar file and then downloads it if necessary.

Once the class loader finds an INDEX.LIST file in a particular jar file, it always trusts the information listed in it. If a mapping is found for a particular class, but the class loader fails to find it by following the link, an InvalidJarIndexException is thrown. When this occurs, the application developer should rerun the jar tool on the extension to get the right information into the index file.

To prevent adding too much space overhead to the application and to speed up the construction of the in-memory hash table, the INDEX.LIST file is kept as small as possible. For classes with non-null package names, mappings are recorded at the package level. Normally one package name is mapped to one jar file, but if a particular package spans more than one jar file, then the mapped value of this package will be a list of jar files. For resource files with non-empty directory prefixes, mappings are also recorded at the directory level.  Only for classes with null package name, and resource files which reside in the root directory, will the mapping be recorded at the individual file level.
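The package-level versus file-level rule can be sketched as a key computation followed by a hash table lookup. This is an illustration only: the class name IndexKeySketch is hypothetical, and it assumes keys are slash-separated directory prefixes with no trailing slash.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IndexKeySketch {
    /**
     * Returns the key under which an entry would be indexed: its directory
     * prefix if it has one, otherwise the entry name itself.
     */
    static String indexKey(String entryPath) {
        int slash = entryPath.lastIndexOf('/');
        return slash < 0 ? entryPath : entryPath.substring(0, slash);
    }

    /** Looks up the jar files that may contain the entry, or an empty list. */
    static List<String> jarsFor(Map<String, List<String>> index, String entryPath) {
        return index.getOrDefault(indexKey(entryPath), Collections.emptyList());
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = new HashMap<>();
        index.put("foo/bar", Collections.singletonList("lib.jar"));    // package-level mapping
        index.put("Root.class", Collections.singletonList("app.jar")); // file-level mapping
        System.out.println(jarsFor(index, "foo/bar/Baz.class"));
        System.out.println(jarsFor(index, "Root.class"));
    }
}
```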

Index File Specification

The INDEX.LIST file contains one or more sections each separated by a single blank line. Each section defines the content of a particular jar file, with a header defining the jar file path name, followed by a list of package or file names, one per line.  All the jar file paths are relative to the code base of the root jar file. These path names are resolved in the same way as the current extension mechanism does for bundled extensions.

The UTF-8 encoding is used to support non-ASCII characters in file or package names in the index file.
 

Specification

    index file:      version-info blankline section*
    version-info:    JarIndex-Version: version-number
    version-number:  digit+{.digit+}*
    section:         body blankline
    body:            header name*
    header:          char+.jar newline
    name:            char+ newline
    char:            any valid Unicode character except NULL, CR and LF
    blankline:       newline newline
    newline:         CR LF | LF | CR (not followed by LF)
    digit:           {0-9}
 
The INDEX.LIST file is generated by running jar -i. See the jar man page for more details.
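A reader for this format might build the in-memory table as in the sketch below. It is a simplified illustration rather than the class loader's implementation: the class name IndexListSketch and the jar and package names in main are hypothetical, and the version number and the .jar suffix of section headers are not validated.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IndexListSketch {
    /** Maps each package or file name to the jar files that list it. */
    static Map<String, List<String>> parse(String text) {
        String[] lines = text.split("\r\n|\n|\r", -1);
        if (!lines[0].startsWith("JarIndex-Version:")) {
            throw new IllegalArgumentException("missing JarIndex-Version header");
        }
        Map<String, List<String>> index = new LinkedHashMap<>();
        String currentJar = null;
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            if (line.isEmpty()) {
                currentJar = null;   // a blank line ends the current section
            } else if (currentJar == null) {
                currentJar = line;   // the first line of a section is the jar path
            } else {
                index.computeIfAbsent(line, key -> new ArrayList<>()).add(currentJar);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String text = "JarIndex-Version: 1.0\n\n"
            + "main.jar\ncom/example/app\n\n"
            + "lib/util.jar\ncom/example/util\ncom/example/app\n\n";
        System.out.println(parse(text));
    }
}
```

A package that spans several jar files, such as com/example/app in the usage above, ends up mapped to a list with one element per jar.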

Backward Compatibility

The new class loading scheme is fully backward compatible with applications developed on top of the current extension mechanism. When the class loader loads the first jar file and an INDEX.LIST file is found in the META-INF directory, it constructs the index hash table and uses the new loading scheme for the extension. Otherwise, the class loader simply uses the original linear search algorithm.

Service Provider

Overview

Files in the META-INF/services directory are service provider configuration files. A service is a well-known set of interfaces and (usually abstract) classes. A service provider is a specific implementation of a service. The classes in a provider typically implement the interfaces and subclass the classes defined in the service itself. Service providers may be installed in an implementation of the Java platform in the form of extensions, that is, jar files placed into any of the usual extension directories. Providers may also be made available by adding them to the applet or application class path or by some other platform-specific means.

A service is represented by an abstract class. A provider of a given service contains one or more concrete classes that extend this service class with data and code specific to the provider. This provider class will typically not be the entire provider itself but rather a proxy that contains enough information to decide whether the provider is able to satisfy a particular request together with code that can create the actual provider on demand. The details of provider classes tend to be highly service-specific; no single class or interface could possibly unify them, so no such class has been defined. The only requirement enforced here is that provider classes must have a zero-argument constructor so that they may be instantiated during lookup.
 

Provider-Configuration File

A service provider identifies itself by placing a provider-configuration file in the resource directory META-INF/services. The file's name should consist of the fully-qualified name of the abstract service class. The file should contain a newline-separated list of unique concrete provider-class names. Space and tab characters, as well as blank lines, are ignored. The comment character is '#' (0x23); on each line all characters following the first comment character are ignored. The file must be encoded in UTF-8.
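The parsing rules for a provider-configuration file are simple enough to sketch directly. The class name ProviderConfigSketch is hypothetical, the provider name com.example.OtherCodec is made up, and the sketch assumes the file contents have already been decoded from UTF-8 into a String.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ProviderConfigSketch {
    /** Extracts the unique provider class names, in file order. */
    static List<String> parseProviderNames(String fileContents) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : fileContents.split("\r\n|\n|\r")) {
            // Everything after the first '#' on a line is a comment.
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            // Space and tab characters are ignored.
            String name = line.replace(" ", "").replace("\t", "");
            if (!name.isEmpty()) {   // blank lines are ignored
                names.add(name);
            }
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        String contents = "# Codec providers\n"
            + "sun.io.StandardCodec    # Standard codecs for the platform\n"
            + "\n"
            + "\tcom.example.OtherCodec\n";
        System.out.println(parseProviderNames(contents));
    }
}
```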
 

Example

Suppose we have a service class named java.io.spi.CharCodec. It has two abstract methods:

  public abstract CharEncoder getEncoder(String encodingName);
  public abstract CharDecoder getDecoder(String encodingName);

Each method returns an appropriate object or null if it cannot translate the given encoding. Typical CharCodec providers will support more than one encoding.

If sun.io.StandardCodec is a provider of the CharCodec service then its jar file would contain the file META-INF/services/java.io.spi.CharCodec. This file would contain the single line:

  sun.io.StandardCodec    # Standard codecs for the platform

To locate an encoder for a given encoding name, the internal I/O code would do something like this:

  CharEncoder getEncoder(String encodingName) {
      // Iterate over every installed provider of the CharCodec service
      Iterator ps = Service.providers(CharCodec.class);
      while (ps.hasNext()) {
          CharCodec cc = (CharCodec)ps.next();
          CharEncoder ce = cc.getEncoder(encodingName);
          // The first provider that supports the encoding wins
          if (ce != null)
              return ce;
      }
      return null;
  }
 

The provider-lookup mechanism always executes in the security context of the caller. Trusted system code should typically invoke the methods in this class from within a privileged security context.
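On Java 6 and later, the public java.util.ServiceLoader class reads the same META-INF/services files, so the lookup could be sketched with it instead of the Service class used above. This is a swapped-in alternative, not the mechanism this document specifies. The nested CharCodec and CharEncoder types are hypothetical stand-ins for the example service, and with no provider-configuration file on the class path the lookup finds nothing.

```java
import java.util.ServiceLoader;

final class ServiceLoaderSketch {
    /** Hypothetical stand-in for the encoder type of the example service. */
    public abstract static class CharEncoder {
    }

    /** Hypothetical stand-in for the example service class. */
    public abstract static class CharCodec {
        public abstract CharEncoder getEncoder(String encodingName);
    }

    /** Returns the first encoder any installed provider offers, or null. */
    static CharEncoder getEncoder(String encodingName) {
        // Providers are located and instantiated lazily during iteration,
        // using each provider class's zero-argument constructor.
        for (CharCodec codec : ServiceLoader.load(CharCodec.class)) {
            CharEncoder encoder = codec.getEncoder(encodingName);
            if (encoder != null) {
                return encoder;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // No provider is registered for the stand-in service in this sketch.
        System.out.println(getEncoder("UTF-8"));
    }
}
```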
