How do I calculate the information content or entropy of a distribution?

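The information content or entropy of a distribution reflects how concentrated (biased) the distribution is. Both can be calculated with static methods of the DistributionTools class.
The Shannon information is returned as a double and reflects the total information content. The entropy is returned as a HashMap that maps each Symbol to its corresponding entropy. The following program calculates both values for a heavily biased distribution.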

import java.util.*;
import org.biojava.bio.dist.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.symbol.*;

public class Entropy {
  public static void main(String[] args){
    Distribution dist = null;

    try{
      // create a biased distribution over the DNA alphabet
      dist = DistributionFactory.DEFAULT.createDistribution(DNATools.getDNA());
      // set the weight of a to 0.97
      dist.setWeight(DNATools.a(),0.97);
      // set each of the remaining bases to 0.01
      dist.setWeight(DNATools.c(),0.01);
      dist.setWeight(DNATools.g(),0.01);
      dist.setWeight(DNATools.t(),0.01);
    }
    catch(Exception ex){
      ex.printStackTrace();
      System.exit(-1);
    }

    // calculate the information content
    double info = DistributionTools.bitsOfInformation(dist);
    System.out.println("Information = "+info+" bits");
    System.out.print("\n");

    // calculate the entropy (using a log base of 2)
    HashMap entropy = DistributionTools.shannonEntropy(dist,2.0);

    // print the entropy of each base
    System.out.println("Symbol\tEntropy");
    for ( Iterator i = entropy.keySet().iterator(); i.hasNext(); ){
      Symbol sym = (Symbol) i.next();
      System.out.println(sym.getName()+" \t"+entropy.get(sym));
    }
  }
}
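
To make the numbers easier to interpret, the following is a minimal sketch in plain Java (no BioJava dependency) of the textbook definitions. It assumes that the per-symbol entropy term is -p * log2(p) and that the information content is log2(N) minus the total entropy, where N is the alphabet size. The class EntropySketch and its methods are hypothetical names used only for illustration; this is not BioJava's implementation, and the sign or scaling of the values BioJava returns may differ.

```java
public class EntropySketch {

    // Entropy contribution of one symbol with probability p, in bits.
    // By convention a symbol with probability 0 contributes 0.
    static double entropyTerm(double p) {
        if (p <= 0.0) {
            return 0.0;
        }
        return -p * (Math.log(p) / Math.log(2.0));
    }

    // Total Shannon entropy H of the distribution, in bits.
    static double totalEntropy(double[] probs) {
        double h = 0.0;
        for (double p : probs) {
            h += entropyTerm(p);
        }
        return h;
    }

    // Information content under the assumed definition log2(N) - H,
    // where N is the number of symbols in the alphabet.
    static double information(double[] probs) {
        double maxEntropy = Math.log(probs.length) / Math.log(2.0);
        return maxEntropy - totalEntropy(probs);
    }

    public static void main(String[] args) {
        // Same weights as the biased DNA distribution above.
        String[] names = {"a", "c", "g", "t"};
        double[] probs = {0.97, 0.01, 0.01, 0.01};

        System.out.println("Information = " + information(probs) + " bits");
        System.out.println();
        System.out.println("Symbol\tEntropy");
        for (int i = 0; i < probs.length; i++) {
            System.out.println(names[i] + " \t" + entropyTerm(probs[i]));
        }
    }
}
```

Under these assumptions the biased distribution has a total entropy of about 0.24 bits and an information content of about 1.76 bits, while a uniform DNA distribution would have 2 bits of entropy and no information content.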


Maintained by Wu Xin, CBI, Peking University, China, 2003