Title: | A Method to Analyze Recurrent DNA Copy Number Aberrations in Tumors |
---|---|
Description: | In tumor tissue, underlying genomic instability can lead to DNA copy number alterations, e.g., copy number gains or losses. Sporadic copy number alterations occur randomly throughout the genome, whereas recurrent alterations are observed in the same genomic region across multiple independent samples, perhaps because they provide a selective growth advantage. This package implements the DiNAMIC procedure for assessing the statistical significance of recurrent DNA copy number aberrations (Bioinformatics (2011) 27(5) 678 - 685). |
Authors: | Vonn Walter [aut, cre] |
Maintainer: | Vonn Walter <[email protected]> |
License: | GPL-3 |
Version: | 1.0.1 |
Built: | 2025-03-11 02:43:30 UTC |
Source: | https://github.com/cran/dinamic |
Cytoband annotation information from the hg19 genome build
annot.file
This four-column data frame contains cytoband annotation data that is used by the makeCytoband function.
Each row corresponds to a distinct cytoband: column 1 contains the chromosome number, column 2
contains the start position (in base pairs), column 3 contains the end position (in base pairs), and column 4
contains the cytoband name (e.g. p21.3). Additional columns may be present, but they are not used.
The file cytoBand.txt.gz for the hg19 build can be downloaded from the UCSC Genome Browser at
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/. The format of cytoBand.txt differs from that
of annot.file, but it can be used by the function makeCytoband if reformat.annot = TRUE.
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/
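The difference between the two formats can be pictured with a short sketch. It assumes the UCSC cytoBand.txt layout (chrom, chromStart, chromEnd, name, gieStain) and converts a few illustrative rows into the four-column form described here. The output column names and the choice to drop non-autosomal rows are assumptions made for this example; the steps that makeCytoband performs when reformatting is requested are not shown in this manual and may differ.

```r
# A few rows in the UCSC cytoBand.txt layout, for illustration only.
raw <- read.table(text = "
chr1 0 2300000 p36.33 gneg
chr1 2300000 5400000 p36.32 gpos25
chr2 0 4400000 p25.3 gneg
chrX 0 4300000 p22.33 gneg
", col.names = c("chrom", "chromStart", "chromEnd", "name", "gieStain"),
   stringsAsFactors = FALSE)

# Keep autosomes only (assumption: sex chromosomes are not needed here).
keep <- raw[raw$chrom %in% paste0("chr", 1:22), ]

# Four-column form: chromosome number, start, end, cytoband name.
# The column names below are hypothetical; only the column order matters.
annot <- data.frame(
  chromosome = as.integer(sub("^chr", "", keep$chrom)),
  start = keep$chromStart,
  end = keep$chromEnd,
  band = keep$name,
  stringsAsFactors = FALSE
)
```

The gieStain column is simply dropped, since only the first four columns of the annotation are used.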
Assessing the Significance of Recurrent DNA Copy Number Aberrations
detailedLook( x, marker.data, annot.file, num.perms, num.iters, gain.loss = "gain", reformat.annot = FALSE, random.seed = NULL )
x |
An n by m numeric matrix containing DNA copy number data from n subjects at m markers. |
marker.data |
A data frame containing marker position data for markers in the autosomes. Column 1 contains the chromosome number for each marker, and column 2 contains the position (in base pairs) for each marker. Additional columns, if present, represent information about the markers (e.g. probe names). |
annot.file |
A cytoband annotation dataframe. Each row corresponds to a distinct cytoband, and column 1 contains the chromosome number, column 2 contains the start position (in base pairs), column 3 contains the end position (in base pairs), and column 4 contains the cytoband name (e.g. p21.3). Additional columns may be present, but they are not used. |
num.perms |
A positive integer that represents the number of cyclic shifts used to create the empirical null distribution. |
num.iters |
A positive integer that represents the number of distinct gain (loss) loci that will be assessed. |
gain.loss |
A character string that indicates whether recurrent gains (gain.loss = "gain") or recurrent losses (gain.loss = "loss") are assessed (default = "gain"). |
reformat.annot |
A logical value that indicates whether annot.file needs to be reformatted
(default = FALSE). See the "note" section of |
random.seed |
An optional random seed (default = NULL). |
This function applies the Detailed Look version of DiNAMIC's cyclic shift procedure to assess
the statistical significance of recurrent DNA copy number aberrations. Either recurrent gains
(gain.loss = "gain") or recurrent losses (gain.loss = "loss") are assessed using a null
distribution based on num.perms cyclic shifts of x. Iterative calls to DiNAMIC's peeling procedure
(implemented here in the peeling function) allow users to assess the statistical significance of
num.iters distinct gains (losses). As noted in Bioinformatics (2011) 27(5) 678 - 685, the Detailed Look
procedure recalculates the null distribution after each iteration of the peeling procedure. While this
approach is more computationally intensive, simulations suggest that it provides more power to detect
recurrent gains (losses).
A matrix with num.iters rows. The entries of each row correspond to the marker that is
being assessed. More specifically, the entries are (1) the chromosome number, (2) the marker position
(in base pairs), (3) additional marker information present in marker.data, (4) the marker number,
(5) the p-value obtained from the null distribution, and (6) the endpoints of the peak interval (in base
pairs), as described in Bioinformatics (2011) 27(5) 678 - 685.
detailedLook(wilms.data, wilms.markers, annot.file, 100, 3)
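The difference between recomputing the null distribution after every peel and computing it once can be sketched in base R. Everything below is a simplified stand-in written for this illustration: null_by_shifts, crude_peel and look_sketch are hypothetical names, the "peel" merely zeroes one column, and only gains are handled. It is not the package's implementation.

```r
# Stand-in null: maxima of column sums after random row-wise cyclic shifts.
null_by_shifts <- function(x, num.perms) {
  m <- ncol(x)
  replicate(num.perms, {
    shifted <- t(apply(x, 1, function(r) {
      s <- sample.int(m, 1)            # s = 1 leaves the row unchanged
      r[c(s:m, seq_len(s - 1))]
    }))
    max(colSums(shifted))
  })
}

# Stand-in "peel": zero out column k (NOT the package's peeling procedure).
crude_peel <- function(x, k) {
  x[, k] <- 0
  x
}

# recompute = TRUE rebuilds the null after each peel; FALSE reuses the first one.
look_sketch <- function(x, num.perms, num.iters, recompute) {
  null <- null_by_shifts(x, num.perms)
  calls <- 1
  p <- numeric(num.iters)
  for (i in seq_len(num.iters)) {
    if (recompute && i > 1) {
      null <- null_by_shifts(x, num.perms)
      calls <- calls + 1
    }
    cs <- colSums(x)
    k <- which.max(cs)                 # most aberrant gain
    p[i] <- mean(null >= cs[k])        # empirical p-value
    x <- crude_peel(x, k)
  }
  list(p.values = p, null.computations = calls)
}

set.seed(1)
x <- matrix(rnorm(8 * 40), nrow = 8)   # 8 subjects, 40 markers (toy data)
x[, 12] <- x[, 12] + 3                 # plant a recurrent gain at marker 12
detailed <- look_sketch(x, num.perms = 50, num.iters = 3, recompute = TRUE)
quick <- look_sketch(x, num.perms = 50, num.iters = 3, recompute = FALSE)
```

With recompute = TRUE the null is built num.iters times, which is where the extra computational cost comes from.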
Find the chromosome arm for each marker
makeCytoband(marker.data, annot.file, reformat.annot = FALSE)
marker.data |
A two-column numeric matrix of marker position data for markers in the autosomes.
Column 1 contains the chromosome number for each marker, and column 2 contains the position (in base
pairs) for each marker. This is a submatrix of the marker position matrix used by |
annot.file |
A dataframe containing cytoband annotation for the autosomes. Each row corresponds to a distinct cytoband, and column 1 contains the chromosome number, column 2 contains the start position (in base pairs), column 3 contains the end position (in base pairs), and column 4 contains the cytoband name (e.g. p21.3). Additional columns may be present, but they are not used. |
reformat.annot |
A logical value that indicates whether annot.file needs to be reformatted (default = FALSE). |
DiNAMIC's peeling procedure is detailed in Bioinformatics (2011) 27(5) 678 - 685, and it is performed
by the peeling function. By construction, the peeling procedure only affects markers in a given
chromosome arm. This function is used internally by the peeling function to restrict the peeling
procedure to the chromosome arm containing the marker that corresponds to max(colSums(x)).
A character vector of length m, where m is the number of markers.
wilms.pq = makeCytoband(wilms.markers, annot.file)
#A character vector of length 3288, and each entry is either
#"p" or "q", depending on the chromosome arm of the given marker.
table(wilms.pq)
#Produces the following output:
#wilms.pq
#   p    q
#1147 2141
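The idea of mapping markers to chromosome arms can be sketched in base R. The helper below, arm_for_markers, is a hypothetical name, and the approach (find the cytoband whose interval contains the marker, then take the first character of the band name) is an assumption about one way to obtain "p" or "q"; the package's own lookup may work differently. The coordinates are made up.

```r
# For each marker, find a cytoband on the same chromosome that contains its
# position and return the arm, i.e. the first character of the band name.
arm_for_markers <- function(marker.data, annot.file) {
  vapply(seq_len(nrow(marker.data)), function(i) {
    hit <- which(annot.file[, 1] == marker.data[i, 1] &
                 annot.file[, 2] <= marker.data[i, 2] &
                 annot.file[, 3] >= marker.data[i, 2])
    if (length(hit) == 0) NA_character_
    else substr(annot.file[hit[1], 4], 1, 1)
  }, character(1))
}

# Toy annotation in the four-column form (made-up coordinates).
annot <- data.frame(
  chr = c(1, 1, 2, 2),
  start = c(0, 100, 0, 80),
  end = c(100, 200, 80, 160),
  band = c("p11", "q11", "p11", "q11"),
  stringsAsFactors = FALSE
)

# Toy two-column marker matrix: chromosome number, position.
markers <- cbind(c(1, 1, 2), c(50, 150, 120))
arms <- arm_for_markers(markers, annot)
```

Markers that fall outside every listed cytoband get NA in this sketch, which is a choice made here rather than documented behavior.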
Apply the peeling procedure at a given marker
peeling(x, marker.data, cytoband, k)
x |
An n by m numeric matrix containing DNA copy number data from n subjects at m markers. |
marker.data |
A two-column numeric matrix of marker position data for markers in the
autosomes. Column 1 contains the chromosome number for each marker, and column 2 contains the position
(in base pairs) for each marker. This is a submatrix of the marker position matrix used by
|
cytoband |
A character vector of length m that contains the chromosome arm (p or q) for each
marker. This is produced by the |
k |
A positive integer between 1 and m that represents the most aberrant marker. |
The peeling procedure is detailed in Algorithm 2 of Bioinformatics (2011) 27(5) 678 - 685, but here
we provide a brief overview. By construction, marker k represents the most aberrant gain (loss).
The peeling procedure rescales all copy number values in x that contribute to making marker k
aberrant, so that after applying the peeling procedure marker k is "null." By construction, the
rescaling procedure is restricted to entries in x that correspond to markers in the same chromosome
arm as k. This allows users to assess the statistical significance of multiple gains (losses) throughout
the genome.
A list containing two components: (1) the n by m matrix produced by applying the peeling algorithm
to the matrix x at marker k, and (2) the peak interval around marker k, as described
in Bioinformatics (2011) 27(5) 678 - 685.
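To make the overview concrete, here is a deliberately simplified rescaling in base R. It is not Algorithm 2 from the paper: peel_sketch is a hypothetical function, the "null" level is assumed to be the mean column sum, the interval is taken to be the contiguous run of markers around k (same arm) whose column sums exceed that level, and only gains are handled. The arm argument is a hypothetical label vector that combines chromosome and arm, and the interval is returned as marker indices rather than base pairs.

```r
peel_sketch <- function(x, arm, k) {
  cs <- colSums(x)
  target <- mean(cs)                   # assumed "null" level for a column sum
  # Markers on the same arm as k whose column sum exceeds the target.
  above <- arm == arm[k] & cs > target
  # Extend left and right from k while markers stay above the target.
  lo <- k
  while (lo > 1 && above[lo - 1]) lo <- lo - 1
  hi <- k
  while (hi < length(cs) && above[hi + 1]) hi <- hi + 1
  # Shrink the positive entries of each column in the interval so that
  # its column sum equals the target.
  for (j in lo:hi) {
    pos <- x[, j] > 0
    P <- sum(x[pos, j])
    N <- sum(x[!pos, j])
    if (P > 0) x[pos, j] <- x[pos, j] * (target - N) / P
  }
  list(x = x, interval = c(lo, hi))
}

# Toy data: 4 subjects, 10 markers, a gain spanning markers 4 to 6.
x <- matrix(0, nrow = 4, ncol = 10)
x[1:3, 4:6] <- 1
x[4, 5] <- 1
arm <- rep(c("1p", "1q"), each = 5)    # markers 1-5 on 1p, 6-10 on 1q
k <- which.max(colSums(x))             # marker 5
out <- peel_sketch(x, arm, k)
```

In the toy data marker 6 is as elevated as marker 4, but it sits on the other arm and is left untouched, which mirrors the restriction to a single chromosome arm described above.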
Find DiNAMIC's null distribution
quickLook( x, marker.data, annot.file, num.perms, num.iters, gain.loss = "gain", reformat.annot = FALSE, random.seed = NULL )
x |
An n by m numeric matrix containing DNA copy number data from n subjects at m markers. |
marker.data |
A data frame containing marker position data for markers in the autosomes. Column 1 contains the chromosome number for each marker, and column 2 contains the position (in base pairs) for each marker. Additional columns, if present, represent information about the markers (e.g. probe names). |
annot.file |
A cytoband annotation dataframe. Each row corresponds to a distinct cytoband, and column 1 contains the chromosome number, column 2 contains the start position (in base pairs), column 3 contains the end position (in base pairs), and column 4 contains the cytoband name (e.g. p21.3). Additional columns may be present, but they are not used. |
num.perms |
A positive integer that represents the number of cyclic shifts used to create the empirical distribution. |
num.iters |
A positive integer that represents the number of distinct gain (loss) loci that will be assessed. See "Details" for more information. |
gain.loss |
A character string that indicates whether recurrent gains (gain.loss = "gain") or recurrent losses (gain.loss = "loss") are assessed (default = "gain"). |
reformat.annot |
A logical value that indicates whether annot.file needs to be reformatted (default = FALSE).
See the "Note" section of |
random.seed |
An optional random seed (default = NULL). |
This function applies the "Quick Look" version of DiNAMIC's cyclic shift procedure to assess the statistical
significance of recurrent DNA copy number aberrations. Either recurrent gains (gain.loss = "gain") or
recurrent losses (gain.loss = "loss") are assessed using a null distribution based on num.perms cyclic shifts
of x. Iterative calls to DiNAMIC's peeling procedure (implemented here in the peeling function)
allow users to assess the statistical significance of num.iters distinct gains (losses). As noted in Bioinformatics
(2011) 27(5) 678 - 685, the "Quick Look" procedure calculates the null distribution once, and the same distribution
is used to assess the statistical significance of the most aberrant gain or loss after each iteration of the peeling
procedure. This approach is less computationally intensive than "Detailed Look" because the null distribution is
only computed once, but simulations suggest that it provides less power to detect recurrent gains (losses). The
resulting p-values are corrected for multiple comparisons because the null distribution is based on computing
max(colSums(x)) or min(colSums(x)).
A matrix with num.iters rows. The entries of each row correspond to the marker that is
being assessed. More specifically, the entries are (1) the chromosome number, (2) the marker position
(in base pairs), (3) additional marker information present in marker.data, (4) the marker number,
(5) the p-value obtained from the null distribution, and (6) the endpoints of the peak interval (in base pairs),
as described in Bioinformatics (2011) 27(5) 678 - 685.
quickLook(wilms.data, wilms.markers, annot.file, 100, 3)
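The remark about multiple comparisons can be illustrated with a small base R sketch. Each null draw is the most extreme column sum over all markers after random row-wise cyclic shifts, so comparing an observed extreme against it already accounts for having searched every marker. The helpers shift_rows and extreme_p are hypothetical names, and the p-value convention used (the proportion of null draws at least as extreme as the observed value) is an assumption; the package may compute it differently.

```r
# Shift each row of x cyclically by its own random offset.
shift_rows <- function(x) {
  m <- ncol(x)
  t(apply(x, 1, function(r) {
    s <- sample.int(m, 1)              # s = 1 leaves the row unchanged
    r[c(s:m, seq_len(s - 1))]
  }))
}

# Empirical p-value of the most extreme column sum against one fixed null.
extreme_p <- function(x, num.perms, gain.loss = "gain") {
  stat <- if (gain.loss == "gain") max else min
  null <- replicate(num.perms, stat(colSums(shift_rows(x))))
  obs <- stat(colSums(x))
  if (gain.loss == "gain") mean(null >= obs) else mean(null <= obs)
}

set.seed(1)
x <- matrix(0, nrow = 6, ncol = 30)    # 6 subjects, 30 markers (toy data)
x[, 10] <- 1                           # every subject gains at marker 10
p.gain <- extreme_p(x, 200, "gain")
p.loss <- extreme_p(x, 200, "loss")
```

In this toy matrix the gain at marker 10 is shared by every subject, so shifted copies almost never reproduce it and p.gain is small, while there is no loss anywhere and p.loss equals 1.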
Recode binary vectors
recodeBinary(binary.vec, k)
binary.vec |
A binary vector of length m. |
k |
A positive integer. |
This function is called internally by peeling.
A binary vector of length m that contains a single contiguous string of 1's, namely the string that contains the 1 in the kth position of binary.vec.
test = c(1, 0, 0, 1, 1, 0, 0, 1, 0)
recodeBinary(test, 5)
#Returns (0, 0, 0, 1, 1, 0, 0, 0, 0)
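The behavior described above can be reproduced with a few lines of base R. The function recode_run is a hypothetical re-implementation written for illustration, not the package source, and returning all zeros when position k holds a 0 is an assumption, since that case is not documented here.

```r
# Keep only the contiguous run of 1's that contains position k.
recode_run <- function(binary.vec, k) {
  m <- length(binary.vec)
  out <- numeric(m)
  if (binary.vec[k] != 1) return(out)  # assumption: no run through k
  lo <- k
  while (lo > 1 && binary.vec[lo - 1] == 1) lo <- lo - 1
  hi <- k
  while (hi < m && binary.vec[hi + 1] == 1) hi <- hi + 1
  out[lo:hi] <- 1
  out
}

test <- c(1, 0, 0, 1, 1, 0, 0, 1, 0)
res <- recode_run(test, 5)
```

For the vector in the example, the runs at positions 1 and 8 are discarded and only the run covering positions 4 and 5 survives.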
Probe-level DNA copy number data from Wilms' tumor (Natrajan et al., 2006)
wilms.data
A 97 by 3288 numeric matrix containing DNA copy number data, as described below.
Natrajan et al. (J. Pathology (2006) 210: 49 - 58) used array comparative genomic hybridization to obtain genome-wide DNA copy number data from 97 Wilms' tumor samples at 3288 markers. This matrix contains the DNA copy number data after applying the bias-correction procedure outlined in Bioinformatics (2011) 27(5) 678 - 685. Each row corresponds to DNA copy number from one subject at 3288 markers, while each column contains DNA copy number data for 97 subjects at one marker.
https://www.ebi.ac.uk/biostudies/arrayexpress accession number E-TABM-10.
Array comparative genomic hybridization marker data from Natrajan et al. (2006)
wilms.markers
A data frame with 3288 observations on the following 3 variables.
Chromosome
The chromosome for the given marker
Position
The position (in bp) for the given marker
Name
The name of the marker (e.g., R:A-MEXP-192:RP11-465B22)
Natrajan et al. (J. Pathology (2006) 210: 49 - 58) used array comparative genomic hybridization to obtain genome-wide DNA copy number data from 97 Wilms' tumor samples at 3288 markers. This data frame contains genomic position data for the probes in the array.
https://www.ebi.ac.uk/biostudies/arrayexpress accession number E-TABM-10.