Sunday, June 24, 2007

Illumina data analysis

I am working on the first dataset from Charlie to pilot our analysis of Illumina data.

The lumi package is the one to use. Initial problems I ran into:

1. Different versions of BeadStudio produce different output formats. Charlie is using BeadStudio 3.0.14. One interesting difference is that the current version reports STD_ERR, whereas earlier versions reported STD_DEV.
2. The annotation package to use is lumiMouseV1, which is for the mouse chip v1.1.
3. The output file generated directly by BeadStudio does not pad a row with \t when its trailing columns have no data, so rows have different numbers of columns. For example, some rows have 50 columns (including DEFINATION and SYNONYM), but rows with no data in the last several columns may have only 48 columns or fewer. The fix is to call read.table with fill = TRUE, like the following:
read.table("pmt1_all.txt", sep = "\t", fill = TRUE, header = TRUE, skip = 8)

4. The pmt1_all.txt file is too big. When I tried to read it on my laptop with 1.5 GB of memory, R could only read in 3817 lines. I had to trim most of the columns to read in all the data.
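
Problems 3 and 4 can be reproduced on a toy file. The sketch below is only an illustration, not the real BeadStudio output: the file contents, the four column names, and the choice of which columns to keep are all made up. It uses count.fields to confirm that rows are ragged, fill = TRUE to pad the short rows, and colClasses = "NULL" as one possible way to drop unwanted columns at read time instead of trimming the file by hand.

```r
# Toy stand-in for the BeadStudio export (invented data): 8 preamble lines
# to skip, then a tab-delimited table whose rows omit trailing empty fields.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  paste("preamble line", 1:8),
  "TargetID\tAVG_Signal\tSTD_ERR\tSYNONYM",
  "t1\t100.5\t2.1\tfoo",
  "t2\t88.0\t1.7",
  "t3\t42.3"
), tmp)

# Diagnose the ragged rows: number of tab-separated fields on each line.
n_fields <- count.fields(tmp, sep = "\t", skip = 8)
print(n_fields)

# fill = TRUE adds blank fields to short rows so every row has the same
# width; blanks become NA in numeric columns and "" in character columns.
full <- read.table(tmp, sep = "\t", fill = TRUE, header = TRUE, skip = 8,
                   stringsAsFactors = FALSE)
print(full)

# To save memory, skip columns while reading: a colClasses entry of "NULL"
# drops that column. Here only the first two columns are kept (assumption:
# these are the ones needed downstream).
keep <- c("character", "numeric", "NULL", "NULL")
slim <- read.table(tmp, sep = "\t", fill = TRUE, header = TRUE, skip = 8,
                   colClasses = keep)
print(slim)
```

For the real file, the colClasses vector would need one entry per column of pmt1_all.txt, so it is worth running count.fields first to find the widest row.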

R Script:

1. Generate a summary table

for(i in 1:542) {
tid = upTargetIDAB[[i]]
ABresults[i,1] = pmt1[which(pmt1$TargetID == tid),2]