[P] A new open-source MLP symbolic distillation and analysis tool
Open-source tool extracts symbolic formulas from trained models, potentially letting you delete the original network.
Developer Mate Kobiashvili has released SDHCE (Symbolic Distillation via Hierarchical Concept Extraction), an open-source tool for a novel kind of post-hoc neural network analysis. After a model is trained, SDHCE extracts a human-readable concept hierarchy directly from the network weights, requiring no additional data. The tool then verifies whether this distilled hierarchy alone can reproduce the network's original predictions. If the check passes, users get a compact symbolic formula they could in principle implement by hand, discarding the original neural network entirely.
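That verification step can be pictured as a simple fidelity check. The sketch below trains a small MLP on Iris and compares its predictions against a hand-written symbolic rule of the kind SDHCE aims to extract; `symbolic_rule` and its thresholds are illustrative stand-ins, not SDHCE's output or API.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

# Train a small MLP on Iris (stand-in for the user's original network).
X, y = load_iris(return_X_y=True)
net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000,
                    random_state=0).fit(X, y)

# A hand-written symbolic rule of the shape SDHCE aims to produce
# (thresholds here are classic Iris heuristics, not SDHCE output).
def symbolic_rule(x):
    petal_length, petal_width = x[2], x[3]
    if petal_length < 2.5:
        return 0                            # setosa
    return 2 if petal_width > 1.7 else 1    # virginica vs. versicolor

# Fidelity check: does the rule reproduce the network's predictions?
rule_preds = np.array([symbolic_rule(x) for x in X])
fidelity = (rule_preds == net.predict(X)).mean()
print(f"agreement with network: {fidelity:.1%}")
```

If agreement is 100% on the data of interest, the symbolic rule is a faithful drop-in for the network there, which is exactly the condition that would justify deleting the original model.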
The tool's innovation lies in its 'concept arithmetic' naming system. Instead of simply concatenating layer names, SDHCE traces every computational path back to raw input features, sums signed contributions, and cancels out opposing signals. This means if two network paths pull a feature like 'petal_length' in opposite directions, that feature disappears from the final concept name rather than creating clutter. The system also handles arbitrary interval granularity automatically, creating splits like low/mid/high without manual intervention.
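The cancellation idea is easiest to see in a linear toy case. In a two-layer network, the effective input-to-output weight is the path-summed product `W2 @ W1`, so contributions with opposite signs cancel arithmetically and the feature drops out of the concept name. The weights and naming scheme below are invented for illustration; how SDHCE handles nonlinear activations is not shown.

```python
import numpy as np

# Toy two-layer linear net: the effective input->output weight is W2 @ W1.
# Each entry sums signed contributions over all hidden paths, so paths
# that pull a feature in opposite directions cancel arithmetically.
feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
W1 = np.array([[ 0.9, 0.0, 1.2, 0.0],
               [-0.9, 0.0, 0.3, 0.5]])   # input -> hidden
W2 = np.array([[ 1.0, 1.0]])             # hidden -> output

net_contrib = (W2 @ W1)[0]   # signed, path-summed contribution per feature

# Name the concept from features whose net contribution survives cancellation.
eps = 1e-6
concept = " + ".join(
    f"{'-' if c < 0 else ''}{name}"
    for name, c in zip(feature_names, net_contrib)
    if abs(c) > eps
)
print(concept)  # sepal_length cancels (0.9 - 0.9 = 0) and is dropped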
Initial testing on the classic Iris dataset demonstrated SDHCE's potential: a 4-layer neural network distilled down to exactly two core concepts that fully reproduced all predictions. The resulting symbolic formula was compact enough to fit in a simple text file. While the results are promising for interpretability, the developer is seeking feedback on whether the concept naming holds up on more complex, 'messier' real-world datasets beyond clean benchmarks.
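Part of what keeps such a formula compact is the automatic interval splitting mentioned above. One plausible mechanism is quantile-based binning into low/mid/high labels, sketched below; `interval_labels` is a hypothetical helper, and SDHCE's actual splitting criterion may differ.

```python
import numpy as np

# Hypothetical sketch: derive low/mid/high labels for a feature via
# quantile splits (SDHCE's actual splitting criterion isn't documented here).
def interval_labels(values, n_bins=3, names=("low", "mid", "high")):
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return [names[np.searchsorted(edges, v, side="right")] for v in values]

petal_length = np.array([1.4, 1.3, 4.7, 4.5, 5.9, 6.1])
print(interval_labels(petal_length))
# ['low', 'low', 'mid', 'mid', 'high', 'high']
```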
- Extracts human-readable concept hierarchies directly from neural network weights with no extra data needed
- Uses 'concept arithmetic' to trace paths to input features and cancel opposing signals for cleaner formulas
- Tested on Iris dataset, distilling a 4-layer network to 2 concepts that fully reproduced predictions
Why It Matters
Could make complex AI models interpretable and replaceable with simpler, verifiable symbolic formulas for critical applications.