.. _cluster:

###############
*cluster*
###############

|

.. image:: ../images/tool-glyphs/cluster-glyph.png
    :width: 600pt

|

Similar to :doc:`../tools/merge`, ``cluster`` reports each set of overlapping
or "book-ended" features in an interval file. In contrast to ``merge``,
``cluster`` does not flatten the cluster of intervals into a new
meta-interval; instead, it appends a cluster ID to each record, and all
records in the same cluster share the same ID. This is useful for having fine
control over how sets of overlapping intervals in a single interval file are
combined.

.. note::

    ``bedtools cluster`` requires that you presort your data by chromosome and
    then by start position (e.g., ``sort -k1,1 -k2,2n in.bed > in.sorted.bed``
    for BED files).

.. seealso::

    :doc:`../tools/merge`

==========================================================================
Usage and option summary
==========================================================================
**Usage**:
::

  bedtools cluster [OPTIONS] -i

**(or)**:
::

  clusterBed [OPTIONS] -i



=========================== ===============================================================================================================================================================================================================
Option                      Description
=========================== ===============================================================================================================================================================================================================
**-s**                      Force strandedness. That is, only cluster features that are on the same strand. *By default, this is disabled*.
**-d**                      Maximum distance between features allowed for features to be clustered. *Default is 0. That is, overlapping and/or book-ended features are clustered*.
=========================== ===============================================================================================================================================================================================================

==========================================================================
Default behavior
==========================================================================
By default, ``bedtools cluster`` collects overlapping (by at least 1 bp) and/or
book-ended intervals into distinct clusters. In the example below, the 4th
column is the cluster ID.

.. code-block:: bash

  $ cat A.bed
  chr1  100  200
  chr1  180  250
  chr1  250  500
  chr1  501  1000

  $ bedtools cluster -i A.bed
  chr1  100  200  1
  chr1  180  250  1
  chr1  250  500  1
  chr1  501  1000  2

==========================================================================
``-s`` Enforcing "strandedness"
==========================================================================
The ``-s`` option will only cluster intervals that are overlapping/book-ended
*and* are on the same strand. In the example below, the cluster ID is appended
as the last column.

.. code-block:: bash

  $ cat A.bed
  chr1  100  200   a1  1  +
  chr1  180  250   a2  2  +
  chr1  250  500   a3  3  -
  chr1  501  1000  a4  4  +

  $ bedtools cluster -i A.bed -s
  chr1  100  200   a1  1  +  1
  chr1  180  250   a2  2  +  1
  chr1  501  1000  a4  4  +  2
  chr1  250  500   a3  3  -  3

==========================================================================
``-d`` Controlling how close two features must be in order to cluster
==========================================================================
By default, only overlapping or book-ended features are placed in the same
cluster. However, one can force ``cluster`` to group more distant features
with the ``-d`` option. For example, with ``-d 1000``, any features that
overlap or lie within 1000 base pairs of one another are assigned to the same
cluster.

.. code-block:: bash

  $ cat A.bed
  chr1  100  200
  chr1  501  1000

  $ bedtools cluster -i A.bed
  chr1  100  200   1
  chr1  501  1000  2

  $ bedtools cluster -i A.bed -d 1000
  chr1  100  200   1
  chr1  501  1000  1
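
==========================================================================
Working with cluster IDs downstream
==========================================================================
Because ``cluster`` only labels records rather than flattening them, the
decision of how to combine each cluster is left to downstream tools. The
sketch below illustrates one possibility: keeping the longest interval in each
cluster with ``awk``. It is an illustration, not a feature of ``bedtools``.
The file names ``clustered.bed`` and ``longest.bed`` are hypothetical, the
input is simply the output of the default example above written out with tabs,
and it assumes the cluster ID is in the 4th column.

.. code-block:: bash

  # Hypothetical input: the clustered output of the default example
  # (columns: chrom, start, end, cluster ID), tab-separated.
  printf 'chr1\t100\t200\t1\nchr1\t180\t250\t1\nchr1\t250\t500\t1\nchr1\t501\t1000\t2\n' > clustered.bed

  # Keep the longest interval in each cluster; ties keep the first record seen.
  awk 'BEGIN { FS = OFS = "\t" }
       {
           len = $3 - $2
           if (!($4 in best) || len > best_len[$4]) {
               best[$4] = $0
               best_len[$4] = len
           }
       }
       END { for (id in best) print best[id] }' clustered.bed \
      | sort -k4,4n > longest.bed

  cat longest.bed

For this input, the records kept are ``chr1 250 500`` for cluster 1 and
``chr1 501 1000`` for cluster 2. With ``-s`` or extra BED columns, the cluster
ID moves to a later column, so the column references in the ``awk`` program
would need to be adjusted accordingly.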