Business Intelligence & Analytics
Wednesday, 19. October 2016., 09:00
Data, both structured and unstructured, continue to grow in volume and generally represent an increasing storage and maintenance cost for companies. Usually, data are kept for long periods only to satisfy legislation or because small pieces of the stored information must occasionally be retrieved. System and database administrators therefore prefer that older data be moved to slower media and that models be reduced to a minimum.

However, this large amount of data can be of exceptional value for the company: in-depth analysis can extract valuable new conclusions that are rarely available from "common sense" alone. Patterns, links and correlations among phenomena can be unexpectedly discovered through analysis of time courses and of spatial and social groupings.

The biggest problems data analysts and scientists currently face are not only setting up statistical models, selecting adequate analytical tools and distinguishing "real" from spurious correlations among a large number of variables, but also the basis on which all of this rests: the data itself. Unstructured, "dirty" and missing data must be structured, refined and shaped just to get the analysis started, which, according to analysts themselves, can take up to 80% of their time. The missing-data problem is usually resolved via extrapolation or guessing based on experience, which introduces additional uncertainty about the accuracy of the analysis. Moreover, since such analyses often take unexpected turns and end up in blind alleys, they must be repeated several times, and each restart requires transforming, aggregating or adjusting the raw data to meet new initial assumptions. With huge amounts of data, this is an extremely demanding process, both time- and performance-wise, and, given tight deadlines, it usually lowers the quality of the delivered results.

Although the title suggests that the lecture primarily deals with adjustments of data structures in the Kapsch Fraud Management System 4.1, the idea is to set out some general rules for creating new models and improving existing ones, in order to facilitate complex analyses in the best possible way.
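As a small illustration of the missing-data handling mentioned above: a minimal sketch of gap-filling by linear interpolation, using pandas on a made-up hourly series (the data and the choice of method are assumptions for illustration, not part of the lecture). Note that, as the abstract points out, any such imputed values carry extra uncertainty into the analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly measurements with gaps (NaN = missing value).
readings = pd.Series(
    [10.0, np.nan, 14.0, np.nan, np.nan, 20.0],
    index=pd.date_range("2016-10-19 09:00", periods=6, freq="h"),
)

# Fill each gap by linear interpolation between its known neighbours.
filled = readings.interpolate(method="linear")
```

Here the two consecutive gaps between 14.0 and 20.0 are filled with evenly spaced estimates (16.0 and 18.0); extrapolating beyond the last known value, or guessing from experience, would require a different and less certain strategy.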