Project

Feature selection and classification in high dimensional data based on Neuroevolution (FWOSB21)

Problem definition and objectives

There is a growing trend towards high dimensional datasets, which result from measuring a large number of characteristics (features) simultaneously on a small number of samples. In classification tasks this setting gives rise to the 'curse of dimensionality', so feature selection methods are an indispensable module of the classification pipeline: they identify the relevant features that describe the underlying problem. Besides a good feature selection scheme, the classifier's optimal parameters are very important, but they are not easy to tune.

In this project I am going to develop an embedded feature selection method based on Neuroevolution that automatically performs feature selection and learns the optimal weights and topology of Artificial Neural Networks, thus freeing the user from manually defining the right network topology. The proposed algorithm will be based on NEAT and its successors (FS/FD-NEAT). The goal is a robust, scalable algorithm able to deal with high dimensionality. The method will extend similar existing algorithms by evolving simpler networks that are faster, easier to interpret and much less computationally demanding, yet retain the same expressive power, so that application to high dimensional data becomes feasible. To this end, contributions are needed in the domains of evolutionary computation and machine learning. The main objectives are to improve design choices that were made arbitrarily in previous versions, to evolve heterogeneous networks, to control the way new structure is added, and to reduce the number of training cycles so that convergence is accelerated.

Finally, the algorithm will be applied to biomedical datasets including imaging, genomics, pathological and clinical data, i.e. datasets that require feature selection and classification. This will improve insight into the key factors related to particular diseases and help interpret data on the degree of response to a therapy, thus allowing better diagnosis and personalized treatment options.

Methodology and technology used

To achieve these objectives, a literature study is needed to gain a deep understanding of the current approaches, after which a systematic method will be implemented to explore better design choices. In addition, through algorithmic development, experimentation and validation on well-known benchmark datasets, it will become possible to apply a new robust algorithm to contemporary problems in healthcare. Similar algorithms have been implemented in MATLAB. To allow comparison with these reference implementations, MATLAB will be the first programming language used to develop the algorithm. Later, the algorithm will be implemented in a more robust and scalable programming language such as C++, to make it suitable for commercialization as software for medical applications.

Research group

The research will take place in ETRO-IRIS of VUB, which has a long-standing track record in neuroevolutionary paradigms and medical imaging. I will collaborate closely with Prof Bart Jansen and Dr Jef Vandemeulebroucke, who are experts in neuroevolutionary computation/machine learning and medical imaging respectively. The research group to which I belong also has close collaborations with UZ Brussel, in particular with Dr Johan de Mey and Dr Nico Buls. A new collaboration with Dr Yan Liu from EORTC and an industrial user board are currently being established to ensure that the scope of the project fits the needs of the market.

Date: 1 Jan 2016 → 31 Dec 2019

Keywords: Neuroevolution

Disciplines: Image processing

Project type: Collaboration project
