Development & Middleware
Wednesday, 14 October 2015, 11:30
Hall 3 Restaurant
In predictive analysis it is common to face a large number of variables that may be useful but may also be irrelevant, or even counterproductive, if included in the final predictive model. In real-world cases it is not uncommon to encounter data sets with 1000 or more variables. Manual selection, even when done well, costs a substantial number of work hours and is subject to varying degrees of human error. In our practice we implemented a solution using Oracle R Enterprise that automates the whole process of grading variable usability. This has resulted in a substantially smaller number of variables while preserving model accuracy, and it takes almost no time. Our solution presents the user with a clearly outlined usability grade for every available variable and performs a quick preselection based on attribute importance. Results are made available as an ORE data.frame, in graphical form, or as a spreadsheet, for maximal usability and for further analysis. With this information, the user can clearly choose how many and which variables to use in order to maximize the predictive capability of the resulting model. Attendees of this talk can expect to learn more about variable selection in predictive analysis, learn about our approach built on Oracle technology, and hear about some of our experiences with variable selection in practice.
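The general idea of grading variables and preselecting by importance can be illustrated with a minimal sketch. This is not the Oracle R Enterprise solution described in the talk: it is plain Python on synthetic data, and it swaps in absolute Pearson correlation with the target as a simple stand-in importance score. The function names, the column names, and the 0.15 threshold are all hypothetical, chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_variables(columns, target):
    """Grade every variable and return (name, score) pairs, best first.

    The score is |Pearson correlation| with the target, a stand-in
    (assumption) for a real attribute-importance measure.
    """
    scores = {name: abs(pearson(values, target))
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def preselect(ranking, threshold):
    """Keep the names whose score reaches the (hypothetical) threshold."""
    return [name for name, score in ranking if score >= threshold]


# Synthetic data: one strong predictor, one weak one, pure noise,
# and a constant column that carries no information at all.
rng = random.Random(0)
n = 500
signal = [rng.gauss(0, 1) for _ in range(n)]
weak = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]
constant = [1.0] * n
target = [2.0 * s + 0.5 * w + rng.gauss(0, 0.5)
          for s, w in zip(signal, weak)]

columns = {"signal": signal, "weak": weak,
           "noise": noise, "constant": constant}

ranking = rank_variables(columns, target)
selected = preselect(ranking, threshold=0.15)

# The full ranking is kept so the user can still choose how many
# and which variables to use, rather than only seeing the cut.
for name, score in ranking:
    print(f"{name:10s} {score:.3f}")
print("preselected:", selected)
```

Returning the complete ranking alongside the preselection mirrors the point made above: the automated step proposes a cut, while the final choice of how many variables to keep stays with the user.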