2013-09-08

Use KOBAS to identify enriched pathways

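Looking at the KEGG enrichment list the company prepared for me, I kept wanting to try it myself and see how well I could do. I first planned to write a Python script, but once I started I realized this is not a job for a few lines of code. Along the way I did find the keggrest package, a nice implementation of the KEGG API that covers the basic operations: get, link, and list. Today I discovered KOBAS and learned that researchers in China have already built a web server that makes KEGG annotation nearly foolproof. All I have to do is import gene IDs in a suitable format and click run.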

###ID preparation
My gene IDs currently look like this:

AT1G01170
AT1G01480
AT1G01490

I need to convert them to:

ath:AT1G01170
ath:AT1G01480
ath:AT1G01490
ath:AT1G01610

This is the KEGG-format gene ID that UniProt recognizes. The conversion takes a few lines of Python:

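if __name__ == "__main__":
  # Read one gene ID per line.
  with open('inputgene.txt', 'r') as f:
    genelist = f.readlines()
  # Open the output once; 'w' avoids piling up duplicates on reruns.
  with open('output.txt', 'w') as f:
    for g in genelist:
      g = 'ath:' + g.strip()  # prepend the KEGG organism code
      f.write(g)
      f.write('\n')
  print("conversion complete!")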
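A malformed ID will simply fail to map later on, so it can help to check the input before prefixing it. Below is a minimal sketch of that idea, assuming the input uses Arabidopsis AGI locus codes (`AT`, a chromosome of 1-5, C or M, `G`, then five digits). The names `AGI_PATTERN` and `to_kegg_ids` are my own for illustration, not part of KOBAS or UniProt.

```python
import re

# Assumed AGI locus pattern: "AT", chromosome (1-5, C or M), "G", five digits.
AGI_PATTERN = re.compile(r"^AT[1-5CM]G\d{5}$")

def to_kegg_ids(lines, organism="ath"):
    """Return (kegg_ids, rejected) for an iterable of raw gene ID lines."""
    kegg_ids, rejected, seen = [], [], set()
    for line in lines:
        gene = line.strip().upper()
        if not gene:
            continue  # skip blank lines
        if not AGI_PATTERN.match(gene):
            rejected.append(gene)  # keep for manual inspection
            continue
        if gene not in seen:  # drop duplicates, keep first-seen order
            seen.add(gene)
            kegg_ids.append(organism + ":" + gene)
    return kegg_ids, rejected

ids, bad = to_kegg_ids(["AT1G01170\n", "at1g01480\n", "", "AT1G01170", "LOC123"])
print(ids)  # valid, de-duplicated IDs with the organism prefix
print(bad)  # entries that did not look like AGI codes
```

Returning the rejected entries instead of silently dropping them makes it easy to see which genes never reached the mapping step.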

###ID mapping

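Use the UniProt ID Mapping tool: select KEGG as the source format and UniProtKB AC as the target format (this is also one of the formats KOBAS accepts). Once the mapping finishes, download the result: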

From To
ath:AT1G01170 Q2HIQ2
ath:AT1G01480 Q06402
ath:AT1G01490 O03982

I need to extract the second column:

Q2HIQ2
Q06402
O03982

The Python code I used:

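import re
# Read every line of the downloaded mapping table.
with open('uniportlist.txt', 'r') as f:
  genelist = f.readlines()
with open('cleanlist.txt', 'w') as f:
  for g in genelist[1:]:  # skip the "From To" header row
    unip = re.search(r'\s(\S+)', g)  # second whitespace-separated field
    if unip is None:
      continue  # blank or malformed line
    f.write(unip.group(1))
    f.write('\n')
print('extraction complete!')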
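The same extraction can also be done without a regular expression. Here is a rough sketch using the standard `csv` module, assuming the downloaded table is tab-separated with a `From`/`To` header row. The function name `read_accessions` is hypothetical, and the sample rows are the ones shown above with one line repeated to exercise de-duplication.

```python
import csv
import io

def read_accessions(handle):
    """Collect the unique values of the second column, in file order."""
    accessions = []
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        if len(row) < 2:
            continue  # blank or malformed line
        if row[0] == "From":
            continue  # header row
        acc = row[1].strip()
        if acc and acc not in accessions:
            accessions.append(acc)
    return accessions

# In-memory stand-in for the downloaded file.
sample = io.StringIO(
    "From\tTo\n"
    "ath:AT1G01170\tQ2HIQ2\n"
    "ath:AT1G01480\tQ06402\n"
    "ath:AT1G01490\tO03982\n"
    "ath:AT1G01490\tO03982\n"
)
result = read_accessions(sample)
print(result)
```

For a real file, pass an open file handle instead of the `io.StringIO` object. Dropping repeated accessions keeps the list that goes into KOBAS free of duplicates.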

After that, choose the parameters, click run, and identify the enriched pathways.
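To get a feel for what "enriched" means here, the sketch below computes a one-sided hypergeometric p-value for a single pathway, which is one common test for this kind of analysis. It is only an illustration: the test KOBAS actually applies depends on the options selected on the server, and real analyses also correct for multiple testing. The function name `hypergeom_pvalue` and the numbers are toy values I made up, not results from my gene list.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k pathway genes in a sample.

    N: genes in the background, K: background genes in the pathway,
    n: genes in my list, k: genes in my list that are in the pathway.
    """
    upper = min(n, K)
    total = comb(N, n)  # all ways to draw n genes from the background
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Toy numbers: background of 10 genes, 3 in the pathway,
# a list of 2 genes, both of which fall in the pathway.
p = hypergeom_pvalue(2, 2, 3, 10)
print(p)
```

A small p-value means the overlap between the gene list and the pathway is larger than random sampling from the background would usually give.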

popucui
