Use KOBAS to identify enriched pathways

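Looking at the KEGG enrichment list the company delivered, I kept wanting to see how well I could do it myself. I first planned to write a Python script for it, but once I started I realized this is not a job a few lines of script can handle. Along the way I did find the keggrest package, a nice implementation of the KEGG API that covers most of the operations such as get, link, and list. Today I discovered KOBAS and learned that researchers in China have already built a web server that makes KEGG annotation almost foolproof. All I have to do is import gene IDs in a suitable format and hit run.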

### ID preparation
My gene IDs currently look like this:

  • AT1G01170
  • AT1G01480
  • AT1G01490

I need to convert them to:

  • ath:AT1G01170
  • ath:AT1G01480
  • ath:AT1G01490
  • ath:AT1G01610

This is the KEGG-format gene ID that UniProt recognizes. It takes a few lines of Python:

    if __name__ == "__main__":
        # Read one gene ID per line
        with open('inputgene.txt', 'r') as f:
            genelist = f.readlines()
        # Prefix each ID with 'ath:' and write one ID per line
        with open('output.txt', 'w') as f:
            for g in genelist:
                f.write('ath:' + g.strip() + '\n')
        print("conversion complete!")
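The conversion above trusts every input line. A slightly more defensive sketch is shown here, assuming the input holds plain AGI locus IDs (`AT`, a chromosome code of 1 to 5, C, or M, then `G` and five digits). The function name `to_kegg_ids` and the pattern `AGI_PATTERN` are made up for this illustration and are not part of KOBAS or KEGG.

```python
import re

# Assumed AGI locus pattern: AT + chromosome (1-5, C or M) + G + five digits.
AGI_PATTERN = re.compile(r'^AT[1-5CM]G\d{5}$')


def to_kegg_ids(lines, organism='ath'):
    """Return (kegg_ids, rejected) for an iterable of raw ID lines."""
    kegg_ids, rejected, seen = [], [], set()
    for line in lines:
        gene = line.strip().upper()
        if not gene:
            continue  # ignore blank lines
        if not AGI_PATTERN.match(gene):
            rejected.append(gene)  # keep anything that does not look like a locus ID
            continue
        if gene not in seen:  # drop duplicates, keep first-seen order
            seen.add(gene)
            kegg_ids.append(organism + ':' + gene)
    return kegg_ids, rejected


if __name__ == '__main__':
    ids, bad = to_kegg_ids(['AT1G01170\n', 'at1g01480\n', '\n', 'AT1G01170\n', 'foo\n'])
    print(ids)
    print(bad)
```

Returning the rejected lines separately makes it easy to see which IDs never reached the mapping step. Note that IDs with an isoform suffix such as `.1` would be rejected by this pattern.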

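### ID mapping

Use the UniProt ID Mapping tool: choose KEGG as the source format and UniProtKB AC as the target format (this is also one of the formats KOBAS recognizes). When the mapping finishes, download the result: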

  • From To
  • ath:AT1G01170 Q2HIQ2
  • ath:AT1G01480 Q06402
  • ath:AT1G01490 O03982

I need to extract the second column:

  • Q2HIQ2
  • Q06402
  • O03982

The Python code I used:

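    import re

    # Read the tab-separated mapping table downloaded from UniProt
    with open('uniportlist.txt', 'r') as f:
        genelist = f.readlines()

    with open('cleanlist.txt', 'w') as f:
        for g in genelist:
            # Capture the second column (the UniProt accession)
            unip = re.search(r'\t(\S+)', g)
            if unip is None or g.startswith('From'):
                continue  # skip the header row and lines without a second column
            f.write(unip.group(1) + '\n')
    print('extraction complete!')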
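An alternative to the regular expression is the standard `csv` module, which also makes it easy to check which submitted IDs got no mapping at all. This is only a sketch, assuming the download is a tab-separated table with a `From`/`To` header as shown above. The function name `extract_accessions` is made up, and `ath:AT1G99999` in the usage is a placeholder ID, not a real result.

```python
import csv
import io


def extract_accessions(mapping_text, submitted_ids):
    """Return (accessions, unmapped) from a tab-separated From/To table."""
    reader = csv.reader(io.StringIO(mapping_text), delimiter='\t')
    accessions, mapped = [], set()
    for row in reader:
        if len(row) < 2 or row[0] == 'From':
            continue  # skip blank lines and the header row
        mapped.add(row[0])
        if row[1] not in accessions:  # one gene may map to several accessions
            accessions.append(row[1])
    # IDs that were submitted but never appear in the first column
    unmapped = [g for g in submitted_ids if g not in mapped]
    return accessions, unmapped


if __name__ == '__main__':
    text = 'From\tTo\nath:AT1G01170\tQ2HIQ2\nath:AT1G01480\tQ06402\n'
    accs, missing = extract_accessions(text, ['ath:AT1G01170', 'ath:AT1G01480', 'ath:AT1G99999'])
    print(accs)
    print(missing)
```

Knowing the unmapped IDs matters because genes that silently drop out here shrink the list that KOBAS sees.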

After that, it is just a matter of choosing the parameters, running the job, and identifying the enriched pathways.
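KOBAS does the statistics on the server, so none of this has to be coded by hand. Still, to get a feel for what "enriched" means, here is a minimal sketch of a hypergeometric test, a common choice for this kind of analysis. It is not KOBAS's actual implementation. The function name `hypergeom_pvalue` and all numbers in the usage are invented for illustration.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) under the hypergeometric distribution.

    k: genes in my list that belong to the pathway
    n: size of my gene list
    K: genes in the background that belong to the pathway
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability of every outcome at least as extreme as k
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


if __name__ == '__main__':
    # Made-up numbers: 10 background genes, 4 in the pathway,
    # a list of 3 genes of which 2 fall in the pathway.
    print(hypergeom_pvalue(2, 3, 4, 10))
```

A small p-value means that seeing so many pathway genes in the list would be unlikely by chance. In practice the p-values also need a multiple-testing correction, since many pathways are tested at once.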