Class ContainerFactory


  • public final class ContainerFactory
    extends Object
    Aggregates SAMRecord objects into one or more Containers, each composed of one or more Slices, based on a set of rules implemented by this class in combination with the parameter values provided via a CRAMEncodingStrategy object. The general call pattern is to pass records in one at a time, and process Containers as they are returned:
    
      long containerOffset = initialOffset; // after writing header, etc
      ContainerFactory containerFactory = new ContainerFactory(...);
      // retrieve input records and obtain/emit Containers as they are produced by the factory...
      while (inputSAM.hasNext()) {
         Container container = containerFactory.getNextContainer(inputSAM.next(), containerOffset);
         if (container != null) {
             containerOffset = writeContainer(container...);
         }
      }
    
      // if there is a final Container, retrieve and emit it
      Container finalContainer = containerFactory.getFinalContainer(containerOffset);
      if (finalContainer != null) {
          containers.add(finalContainer);
      }
      
     
    Multiple slices are only aggregated into a single container if slices/container is > 1, *and* all of the slices are SINGLE_REFERENCE and have the same (mapped) reference context. MULTI_REFERENCE slices are never aggregated with other slices into a single container, no matter how many slices/container are requested, since it can be very inefficient to do so (the spec requires that if any slice in a container is MULTI_REFERENCE, all slices in the container must also be MULTI_REFERENCE). For coordinate sorted inputs, a MULTI_REFERENCE slice is only created when there are not enough reads mapped to a single reference sequence to reach the MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD. This usually only happens near the end of the reads mapped to a given sequence. When that happens, the remaining reads mapped to the previous sequence, plus some subsequent records, are accumulated into a small MULTI_REFERENCE slice until MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD is hit, and the resulting MULTI_REFERENCE slice is emitted into its own container.
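The threshold rule for coordinate sorted inputs can be illustrated with a minimal sketch. It is not the ContainerFactory implementation: the class name, method name, and parameters below are hypothetical, and treating "reaching" the threshold as a greater-than-or-equal comparison is an assumption.

```java
// Hypothetical illustration only; none of these names come from htsjdk.
final class SliceKindSketch {

    enum SliceKind { SINGLE_REFERENCE, MULTI_REFERENCE }

    /**
     * Decide what kind of slice the accumulated records should become when
     * the next record maps to a different reference sequence.
     *
     * @param recordsOnCurrentReference records accumulated so far for the current reference
     * @param minSingleRefThreshold stand-in for MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD
     */
    static SliceKind kindAtReferenceChange(final int recordsOnCurrentReference,
                                           final int minSingleRefThreshold) {
        // Assumption: "reaching" the threshold means >=.
        // Enough reads: they can stand alone as a SINGLE_REFERENCE slice.
        // Too few reads: keep accumulating across the reference boundary,
        // which makes the resulting slice MULTI_REFERENCE.
        return recordsOnCurrentReference >= minSingleRefThreshold
                ? SliceKind.SINGLE_REFERENCE
                : SliceKind.MULTI_REFERENCE;
    }
}
```

With a hypothetical threshold of 1000, a tail of 12 reads at the end of one sequence would fall into the MULTI_REFERENCE branch, which matches the "small MULTI_REFERENCE slice" case described above.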
    • Method Detail

      • getNextContainer

        public final Container getNextContainer​(SAMRecord samRecord,
                                                long containerByteOffset)
        Add a new SAMRecord object to the factory, obtaining a Container if one is returned.
        Parameters:
        samRecord - the next SAMRecord to be written
        containerByteOffset - the byte offset to record in the Container if one is created
        Returns:
        a Container if the threshold for emitting a Container has been reached, otherwise null
      • getFinalContainer

        public Container getFinalContainer​(long containerByteOffset)
        Obtain a Container from any remaining accumulated SAMRecords, if any.
        Parameters:
        containerByteOffset - the byte offset to record in the newly emitted Container if one is created
        Returns:
        a Container if any records have been accumulated, otherwise null
      • shouldEmitContainer

        public boolean shouldEmitContainer​(int currentReferenceContextID,
                                           int nextRecordIndex,
                                           int numberOfSliceEntries)
        Determine if a Container should be emitted based on the current reference context, the reference context for the next record to be processed, and the encoding strategy parameters. A container is emitted if:
        - the requested number of slices per container has been reached, or
        - a multi-reference slice has been accumulated (a multi-ref slice will always be emitted into its own container as soon as it's generated, since we don't want to confer multi-ref-ness on the next slice, which might otherwise be single-ref), or
        - we haven't reached the requested number of slices, but we're changing reference contexts. We don't want to create a MULTI-REF container out of two or more SINGLE_REF slices with different contexts, since by the spec we'd be forced to call that container MULTI-REF, and thus the slices would have to be multi-ref. So instead, emit a single-ref container.
        Parameters:
        currentReferenceContextID -
        nextRecordIndex -
        numberOfSliceEntries -
        Returns:
        true if a Container should be emitted, otherwise false
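
The three emit conditions described for shouldEmitContainer can be sketched as a small predicate. This is a simplified illustration, not the method's actual body: the class, the parameter names, the MULTI_REF placeholder value, and the guard for "nothing accumulated yet" are all assumptions made for the example, and the signature deliberately differs from the documented one.

```java
// Hypothetical illustration only; not htsjdk's implementation.
final class EmitDecisionSketch {

    /** Placeholder ID for a multiple-reference context; the value is assumed for this sketch. */
    static final int MULTI_REF = -2;

    /**
     * @param currentContextId   reference context of the slices accumulated so far
     * @param nextContextId      reference context of the next record to be processed
     * @param slicesAccumulated  number of slices already accumulated for the container
     * @param slicesPerContainer requested slices per container (from the encoding strategy)
     */
    static boolean shouldEmit(final int currentContextId,
                              final int nextContextId,
                              final int slicesAccumulated,
                              final int slicesPerContainer) {
        // Assumption: with no accumulated slices there is nothing to emit.
        if (slicesAccumulated == 0) {
            return false;
        }
        // Condition 1: the requested number of slices has been reached.
        if (slicesAccumulated >= slicesPerContainer) {
            return true;
        }
        // Condition 2: a multi-ref slice always goes into its own container.
        if (currentContextId == MULTI_REF) {
            return true;
        }
        // Condition 3: the reference context is about to change, so emit now
        // rather than mix single-ref slices with different contexts.
        return nextContextId != currentContextId;
    }
}
```

For example, with four slices per container requested, one accumulated single-ref slice on context 0 is held back while the next record is also on context 0, and emitted as soon as the next record belongs to context 1.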