I am a student in computer engineering and data science at INSA Rouen, in France. I love science, especially maths, and I am curious about everything. My studies cover data mining, optimization, signal processing, image processing, graph theory and software engineering. I would love to apply data mining to the medical field, so that my work serves a meaningful purpose.
I also love reading and taking pictures, and I am always eager to travel and discover new places and cultures; that is why I spent a year as an exchange student in the US in 2011.
During my summer internship at Creative Data in 2015, I worked on a Kaggle competition whose goal was to identify different hand movements from electroencephalograms (EEG). I wrote several Python scripts to tackle this problem.
I first tried classic machine learning algorithms such as logistic regression, SVMs and random forests. I then tried neural networks, which performed much better on my validation set: a dense neural network made of two dense/dropout layer pairs, and a convolutional neural network made of a convolution layer, a max-pooling layer and a dense layer. My final predictions were a weighted mean of the scores output by these two networks.
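The final blending step can be sketched as a per-class weighted mean of the two networks' scores. This is a minimal illustration in plain Python, assuming both models output one score per class for the same samples; the function name `blend_scores` and its default weight are hypothetical, since the weight actually used is not stated here and would normally be chosen on the validation set.

```python
from typing import List, Sequence


def blend_scores(
    dense_scores: Sequence[Sequence[float]],
    cnn_scores: Sequence[Sequence[float]],
    w_dense: float = 0.5,  # hypothetical weight; in practice tuned on the validation set
) -> List[List[float]]:
    """Weighted mean of two models' per-class scores, sample by sample."""
    if not 0.0 <= w_dense <= 1.0:
        raise ValueError("w_dense must be in [0, 1]")
    if len(dense_scores) != len(cnn_scores):
        raise ValueError("both models must score the same samples")
    w_cnn = 1.0 - w_dense
    blended = []
    for d_row, c_row in zip(dense_scores, cnn_scores):
        if len(d_row) != len(c_row):
            raise ValueError("both models must output the same number of classes")
        # Each class score is w * dense + (1 - w) * cnn.
        blended.append([w_dense * d + w_cnn * c for d, c in zip(d_row, c_row)])
    return blended
```

Setting `w_dense` to 1.0 or 0.0 recovers a single network, so the weight can be searched over a grid on the validation set like any other hyperparameter.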
Since one class was over-represented, I tried rebalancing the classes by keeping only 20% of the samples from the largest class. This did not work well because too much data was discarded, so I left it out of my final solution. I also applied a band-pass filter, and used a Common Spatial Pattern (CSP) algorithm to create new features. Finally, I tried reducing the number of features, but this increased the test error.
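The rebalancing experiment amounts to random undersampling of the majority class. Below is a minimal sketch, assuming one label per sample; the function name `undersample_majority` and the fixed random seed are hypothetical choices for illustration.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def undersample_majority(
    samples: Sequence[T],
    labels: Sequence[Hashable],
    keep_fraction: float = 0.2,  # keep 20% of the largest class
    seed: int = 0,  # hypothetical fixed seed, for reproducibility
) -> Tuple[List[T], List[Hashable]]:
    """Randomly keep `keep_fraction` of the most frequent class; keep every other sample."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if not labels:
        raise ValueError("no samples given")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    majority, count = Counter(labels).most_common(1)[0]
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    kept = set(rng.sample(majority_idx, max(1, round(keep_fraction * count))))
    # Preserve the original order of the surviving samples.
    keep = [i for i, y in enumerate(labels) if y != majority or i in kept]
    return [samples[i] for i in keep], [labels[i] for i in keep]
```

With 10 majority-class samples and `keep_fraction=0.2`, only 2 are kept and the other 8 are discarded, which shows how quickly this approach throws away information.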
Recognition of 3D point clouds
For a school project, I implemented solutions from the paper "Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes" by Andrew E. Johnson and Martial Hebert. The goal was to classify 3D point clouds representing objects on the road (pedestrians, cars, etc.).
We first worked on only two classes, comparing a Gaussian-kernel SVM with a linear SVM, and kept the linear SVM for the multi-class algorithm. We implemented both a one-versus-all and a one-versus-one SVM. For the one-versus-one SVM, we tried two ways of validating the hyperparameter: choosing it separately for each pairwise SVM, or using the same value for all of them.
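The two multi-class schemes differ in how they combine binary linear SVMs at prediction time: one-versus-all trains one SVM per class and takes the highest score, while one-versus-one trains K(K-1)/2 pairwise SVMs for K classes and takes a majority vote. The sketch below shows only that combination step, assuming the binary decision functions are already trained; the helper names and the hand-set weights used with them are hypothetical, and training and hyperparameter validation are left out.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Sequence, Tuple

Vector = Sequence[float]
Decision = Callable[[Vector], float]  # signed score of a binary linear SVM


def linear_decision(w: Vector, b: float) -> Decision:
    """Decision function f(x) = w.x + b of an (assumed already trained) linear SVM."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_one_vs_all(x: Vector, scorers: Dict[Hashable, Decision]) -> Hashable:
    """One-vs-all: one SVM per class (class vs rest); pick the class with the largest score."""
    return max(scorers, key=lambda c: scorers[c](x))


def predict_one_vs_one(
    x: Vector, pairwise: Dict[Tuple[Hashable, Hashable], Decision]
) -> Hashable:
    """One-vs-one: one SVM per pair (a, b); f(x) > 0 votes for a, otherwise for b.

    The class with the most votes wins; ties go to the class that was counted first.
    """
    votes: Counter = Counter()
    for (a, b), f in pairwise.items():
        votes[a if f(x) > 0 else b] += 1
    return votes.most_common(1)[0][0]
```

The two validation strategies then differ only in how the pairwise SVMs are trained: with the per-pair strategy each of the K(K-1)/2 decision functions gets its own hyperparameter, while with the shared strategy a single value is validated for all of them.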
We chose the attributes following the recommendations of the article we were studying: statistics on the intensity of the points, the bounding box, and the scatter-ness, linear-ness and surface-ness attributes. Since one class was over-represented compared to the others, we also rebalanced the classes.
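Linear-ness, surface-ness and scatter-ness are commonly computed from the sorted eigenvalues l1 >= l2 >= l3 of the point cloud's covariance matrix, as (l1 - l2)/l1, (l2 - l3)/l1 and l3/l1. The exact formulas used in the project are not stated here, so this is one common definition, assumed for illustration. The sketch below uses plain Python with a closed-form eigenvalue solver for symmetric 3x3 matrices, and also returns the bounding-box extents; all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def covariance(points: Sequence[Point]) -> List[List[float]]:
    """3x3 (population) covariance matrix of a point cloud."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    return [
        [sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n for j in range(3)]
        for i in range(3)
    ]


def sym3_eigenvalues(a: List[List[float]]) -> Tuple[float, float, float]:
    """Eigenvalues of a symmetric 3x3 matrix in decreasing order (trigonometric method)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # already diagonal: the eigenvalues are the diagonal entries
        l1, l2, l3 = sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
        return l1, l2, l3
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    # B = (A - q I) / p
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (
        b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
        - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
        + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0])
    )
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding errors
    phi = math.acos(r) / 3.0
    l1 = q + 2.0 * p * math.cos(phi)
    l3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    l2 = 3.0 * q - l1 - l3  # the trace equals the sum of the eigenvalues
    return l1, l2, l3


def shape_features(points: Sequence[Point]) -> Dict[str, float]:
    """Eigenvalue-based shape attributes (assumed definition) plus bounding-box extents."""
    l1, l2, l3 = (max(v, 0.0) for v in sym3_eigenvalues(covariance(points)))
    if l1 == 0.0:
        raise ValueError("degenerate cloud: all points are identical")
    extents = [max(p[k] for p in points) - min(p[k] for p in points) for k in range(3)]
    return {
        "linearness": (l1 - l2) / l1,
        "surfaceness": (l2 - l3) / l1,
        "scatterness": l3 / l1,
        "bbox_dx": extents[0],
        "bbox_dy": extents[1],
        "bbox_dz": extents[2],
    }
```

With this definition the three shape attributes always sum to 1: a pole-like object scores close to 1 in linear-ness, a wall or road surface in surface-ness, and a bush-like blob in scatter-ness.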
I have studied data mining for three semesters. It is hard to sum up everything we covered and all the algorithms we implemented during this time, but here are some of the topics:
- Unsupervised data mining: k-means, fuzzy k-means, hierarchical clustering, PCA;
- Optimization: unconstrained (gradient descent, Newton's method) and constrained;
- Regression: linear and polynomial regression;
- Classification: k-nearest neighbours, SVM, neural networks, random forest, bagging, Bayesian decision, logistic regression;
- Focus on SVM: multi-class SVM, hyperparameter tuning, kernels;
- Others: Lasso and ridge regression.
What I do
As part of my studies, I am currently working on an end-of-studies project in an 8-member team, following the Scrum method, 22 hours a week. Our client is Libon, a VoIP application by Orange Vallée. We studied the different applications of graph processing and data mining to social networks in order to decide which would be the most valuable for Libon. We are now developing a program to detect communities in the network formed by Libon users, using Scala, Spark and GraphX.
In summer 2015, I did my specialization internship at Creative Data, a French start-up in Rouen. I wrote a Scala program that automated fetching and cleaning open data from the French platform data.gouv and saving it to HDFS and Hive tables. I also evaluated different prediction tools for the company, worked in Python on a Kaggle competition whose goal was to identify different hand movements from electroencephalograms, and followed a sales-forecasting project, for which I wrote the unit tests in R.
In 2014, I was the president of AJIR, the Junior Entreprise of INSA Rouen. A Junior Entreprise is a student-run association whose goal is to carry out projects for companies. As president, I managed a team of 10 students, set the association's strategy, met with clients and supervised scientific projects. I was also an auditor for the CNJE (the French confederation of Junior Entreprises), auditing other Junior Entreprises in France to help protect the label and advise them on their development.
I love learning new things, which is why studying only what I am taught at school is not enough for me. I took an online Machine Learning course on Coursera, which gave me a different perspective on subjects I study at school, such as data mining. I applied some of what I learned to a school project on semi-supervised learning for clustering, and to many other projects, such as image processing projects and the data mining projects presented above.