Although deep learning has had great success on unstructured data such as images, audio, and natural language, it is markedly less effective on structured data: the kind that fits neatly into an Excel spreadsheet, for instance. On such data, shallow neural networks perform at about the same level as deep ones.
Why would that be the case though?
The key lies in sparse feature selection: most deep networks built for tabular data assume dense, continuous features, but this assumption seldom holds for tabular data. Instead,
- only a small number of features account for most of the correlation with the labels, and
- categorical features abound.
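To make the first point concrete: sparse feature selection is often implemented with a sparsemax projection (the mechanism TabNet uses in its attentive masks), which, unlike softmax, can assign exactly zero weight to irrelevant features. Below is a minimal NumPy sketch; the function and the toy logits are illustrative, not taken from any particular library.

```python
import numpy as np

def sparsemax(z):
    """Project logits z onto the probability simplex (Martins & Astudillo, 2016).

    Returns a distribution that, unlike softmax, can be exactly zero
    for low-scoring entries -- i.e., it selects a sparse subset of features.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]          # sort logits in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum  # entries that stay in the support
    k_max = k[support][-1]               # size of the support set
    tau = (cumsum[k_max - 1] - 1) / k_max
    return np.maximum(z - tau, 0.0)      # everything below tau is zeroed out

# One feature clearly dominates: sparsemax puts all mass on it,
# where softmax would still spread nonzero weight over every feature.
mask = sparsemax([2.0, 1.0, 0.1])
```

Here `mask` sums to 1 like a softmax output, but the two weaker features receive exactly zero weight, which is the behavior you want when only a handful of columns actually matter.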
Fortunately, there is now new work that addresses the issue.
I write about this in more depth here.
Update: I now have code for my modifications to TabNet right here. Check it out for yourself!