Atom typing is an important and integral part of any chemoinformatics software. Most of the calculations, and more importantly the assignment of implicit/explicit hydrogen(s) depend on this. Thanks to Egon, Rajarshi and others, the CDK has improved a lot over the years.
I (with Nimish a summer intern) did a quick test on the KEGG molecules (before KEGG becomes a paid service on 30th June, which is a pity…..!).
The good news is that only 1.37% molecules failed the “Atom Typing” test, as compared to last year (the failure rate was approximately 10%, mostly phosphates). Most of the failed atoms are metals (Cl,N,S,Fe,Mn,Zn,Cu, etc).
Here are the raw results: https://gist.github.com/1016328
Wish list: I hope the CDK developers or chemists can fix these too!
Kindly comment and leave your suggestions.