# GNNExplainer on MUTAG Dataset
## MUTAG Dataset
### Abstract
A review of the literature yielded data on over 200 aromatic and heteroaromatic nitro compounds tested for mutagenicity in the Ames test using S. typhimurium TA98. From the data, a quantitative structure-activity relationship (QSAR) has been derived for 188 congeners. The main determinants of mutagenicity are the hydrophobicity (modeled by octanol/water partition coefficients) and the energies of the lowest unoccupied molecular orbitals calculated using the AM1 method. It is also shown that chemicals possessing three or more fused rings possess much greater mutagenic potency than compounds with one or two fused rings. Since the QSAR is based on a very wide range in structural variation, aromatic rings from benzene to coronene are included as well as many different types of heterocycles, it is a significant step toward a predictive toxicology of value in the design of less mutagenic bioactive compounds. Ref (https://pubs-acs-org.libproxy.utdallas.edu/doi/pdf/10.1021/jm00106a046)
### Data
| # of Graphs | # of Classes | Total # of Nodes | Total # of Edges |
| ----------- | ------------ | ---------------- | ---------------- |
| 188 | 2 | 3371 | 7442 |

Nodes are labeled by atom type and edges by bond type.

| Encoded ID | Node labels | Edge labels | Classes |
| ---------- | ----------- | ----------- | ---------- |
| -1 | - | - | Nonmutagen |
| 0 | C | Aromatic | - |
| 1 | N | Single | Mutagen |
| 2 | O | Double | - |
| 3 | F | Triple | - |
| 4 | I | - | - |
| 5 | Cl | - | - |
| 6 | Br | - | - |

Data Download (https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/MUTAG.zip)
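
The encoding table above can be turned into lookup dictionaries for decoding graphs. This is a minimal sketch: the function name `decode_graph`, its argument layout, and the toy fragment are assumptions made for illustration and are not part of the dataset files.

```python
# Label encodings copied from the table above.
NODE_LABELS = {0: "C", 1: "N", 2: "O", 3: "F", 4: "I", 5: "Cl", 6: "Br"}
EDGE_LABELS = {0: "Aromatic", 1: "Single", 2: "Double", 3: "Triple"}
CLASS_LABELS = {-1: "Nonmutagen", 1: "Mutagen"}


def decode_graph(node_ids, edges, graph_class):
    """Translate one encoded graph into readable atom, bond, and class names.

    node_ids:    encoded atom types, one per node
    edges:       (source, target, encoded bond type) tuples, 0-based node indices
    graph_class: encoded class label (-1 or 1)
    """
    atoms = [NODE_LABELS[i] for i in node_ids]
    bonds = [(atoms[s], atoms[t], EDGE_LABELS[b]) for s, t, b in edges]
    return atoms, bonds, CLASS_LABELS[graph_class]


# Hypothetical toy fragment (a carbon attached to a nitro-like group),
# not an actual record from the dataset.
atoms, bonds, label = decode_graph(
    node_ids=[0, 1, 2, 2],
    edges=[(0, 1, 1), (1, 2, 2), (1, 3, 1)],
    graph_class=1,
)
print(atoms, bonds, label)
```

Keeping the three mappings in separate dictionaries avoids the ambiguity of the combined table, where a single encoded ID means different things for nodes, edges, and classes.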
### Ground Truth
1. Hydrophobicity is clearly a major factor in the mutagenic potency of aromatic nitro compounds. (P 796)
2. Electron-attracting elements conjugated with nitro groups enhance mutagenicity. (P 796)
   - Order of electron-attracting elements (most to least): F > O > N > Cl > Br > C > I
3. Compounds with three or more fused rings are much more mutagenic, other factors being equal, than those with one or two. (P 796)
4. The GNNExplainer paper identifies C, O, H, and N atoms as the features that are important for predicting a molecule’s mutagenicity. (P 9)
## Explanation of the Classification Model
### Results
We trained a graph classification model that reaches an accuracy of 81% on the MUTAG dataset.
The explanations of the classifier identify the following as the most influential nodes for classification:




As the figures show, the most influential elements for classifying a molecule as mutagenic are **C, O, and N** (greatest to least). The most influential bond types between elements are **double and single**.
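
One way to arrive at a per-element ranking like the one above is to sum the explainer's per-node importance scores by atom type. The sketch below assumes that approach; the function name `importance_by_element` and the scores are hypothetical, and the aggregation actually used for the figures may differ.

```python
from collections import defaultdict

# Atom-type encoding from the Data section.
NODE_LABELS = {0: "C", 1: "N", 2: "O", 3: "F", 4: "I", 5: "Cl", 6: "Br"}


def importance_by_element(node_ids, node_scores):
    """Sum per-node importance scores by atom type, highest total first."""
    totals = defaultdict(float)
    for node_id, score in zip(node_ids, node_scores):
        totals[NODE_LABELS[node_id]] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for a five-atom graph (illustrative values only,
# not output from the trained model).
ranking = importance_by_element(
    node_ids=[0, 0, 1, 2, 2],
    node_scores=[0.9, 0.8, 0.3, 0.4, 0.5],
)
elements_in_order = [element for element, _ in ranking]
print(elements_in_order)
```

Summing favors elements that occur often, such as carbon. Averaging the scores per element instead would measure how important a typical atom of that type is, which can give a different ranking.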
### Findings
Does the explanation match the ground truth of the dataset?
* Ground truth (2) states that electron-attracting elements conjugated with nitro groups enhance mutagenicity. Our explanations identify C, O, and N as the most influential elements. O has high electronegativity, and when it is bonded to the N of a nitro group it can increase mutagenicity. This agrees with the edges most often marked as influential (O - N or O - C).
* The results of the explanation also match the ground truth reported in the GNNExplainer paper, which identifies C, O, H, and N atoms as the features that are important for predicting a molecule’s mutagenicity.
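
The comparison with the GNNExplainer paper can be made explicit as a set overlap. This is a small sketch; the helper name `compare_with_ground_truth` is an assumption introduced here for illustration.

```python
def compare_with_ground_truth(found, expected):
    """Split elements into those matched, missed, and found beyond the ground truth."""
    found, expected = set(found), set(expected)
    return {
        "matched": sorted(found & expected),
        "missed": sorted(expected - found),
        "extra": sorted(found - expected),
    }


# Elements reported in Results vs. those listed in the GNNExplainer paper.
result = compare_with_ground_truth(
    found=["C", "O", "N"],
    expected=["C", "O", "H", "N"],
)
print(result)
```

In this comparison H is the only element from the paper that the explanation does not surface. The node label table in the Data section lists no H entry, so hydrogen atoms do not appear to be encoded as nodes in this version of the dataset and could not have been marked as influential.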