Explanation of Partek's Hilbert Curve Visualization

What is a Hilbert Space-Filling Curve?

A Hilbert Space-Filling Curve (Hilbert Curve for short) is a way to view one-dimensional data in two-dimensional space. It has good locality-preserving behavior, meaning points close to each other in 1D space are on average close to each other in 2D space. Viewing 1D data in two dimensions allows you to "see" more of your data by utilizing more of your computer screen's real estate.
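
As a rough illustration of the mapping described above, the sketch below converts a 1D position along the curve into 2D grid coordinates using the widely known iterative index-to-coordinate conversion. The function names `hilbert_d2xy` and `layout_on_grid` are hypothetical and chosen for this example only; this is not necessarily how Partek implements the view.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order by 2**order grid."""
    n = 1 << order          # side length of the grid
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def layout_on_grid(values, order):
    """Place a 1D list of values on a square grid, following the curve.

    Cells beyond the end of `values` are left as None.
    """
    n = 1 << order
    grid = [[None] * n for _ in range(n)]
    for d, value in enumerate(values[: n * n]):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = value
    return grid


if __name__ == "__main__":
    # Walk the 16 cells of a 4 x 4 grid in curve order.
    print([hilbert_d2xy(2, d) for d in range(16)])
```

Consecutive positions along the curve always land in neighboring grid cells, which is the locality property that makes clusters in the 1D data appear as compact patches in the 2D picture.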

How do Hilbert Curves Apply to Next Generation Sequencing?

Hilbert curves have been successfully applied to genomic data as a way of visualizing intensity values at each genomic position (HilbertVis). This approach is especially well suited to genome-wide data such as those obtained from next-generation sequencing. At a glance, you can see the distribution of read coverage and the location of read clusters that you could not easily discern from a 1D genome view alone.

What Exactly Am I Looking At?

The Partek Hilbert Visualization displays the chromosomes of your next-generation sequencing samples. First, the chromosomes are divided into non-overlapping bins of a user-specified size. The default bin size is 10,000 base pairs. Next, the number of reads that fall within each bin is calculated. You can view either the total number of reads in each bin or the maximum coverage of each bin. You may hover the mouse over the Hilbert Visualization or the cytoband view to display information about genomic position and read coverage. You may also right-click and zoom to that location in the Partek Chromosome Viewer.
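
The binning step can be sketched as follows. This is a minimal example under stated assumptions, not Partek's actual implementation: reads are given as 0-based half-open (start, end) intervals on a single chromosome, a read is counted in the bin that contains its start position, and "maximum coverage" is taken to mean the highest per-base read depth found within the bin. The function name `bin_reads` is hypothetical.

```python
def bin_reads(reads, chrom_length, bin_size=10_000):
    """Summarize reads on one chromosome into fixed-size, non-overlapping bins.

    reads: iterable of (start, end) intervals, 0-based, half-open (assumed format).
    Returns (counts, max_cov), two lists with one entry per bin.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins

    # Difference array: +1 where a read begins, -1 just past where it ends.
    diff = [0] * (chrom_length + 1)
    for start, end in reads:
        # Assumption: a read belongs to the bin containing its start.
        counts[start // bin_size] += 1
        diff[start] += 1
        diff[min(end, chrom_length)] -= 1

    # A running sum of the difference array gives per-base depth;
    # keep the largest depth seen inside each bin.
    max_cov = [0] * n_bins
    depth = 0
    for pos in range(chrom_length):
        depth += diff[pos]
        b = pos // bin_size
        if depth > max_cov[b]:
            max_cov[b] = depth
    return counts, max_cov


if __name__ == "__main__":
    # Toy chromosome of 300 bases with 100-base bins.
    toy_reads = [(0, 50), (10, 60), (95, 120), (250, 300)]
    counts, max_cov = bin_reads(toy_reads, chrom_length=300, bin_size=100)
    print(counts)   # total reads per bin
    print(max_cov)  # maximum coverage per bin
```

In the toy data the second bin contains no read starts, yet it still has nonzero maximum coverage because a read that starts in the first bin extends into it. This is one way the two display options can differ.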

Example Profiles:

Here are some example pictures of next-generation data viewed with Partek's Hilbert Curve Visualization.

ChIP-Seq: Transcription Factor Binding

Notice that the peaks are localized (tiny red dots), which is expected for transcription factor binding peaks in ChIP-Seq data.

ChIP-Seq: Methylation

Unlike the transcription factor binding peaks, these peaks are not localized but are more spread out and diffuse.

RNA-Seq

This shows the distribution of expressed genes in a skeletal muscle sample.