Optimisation of the OASIS data warehouse (OASIS)

Research project AG/JJ/142 (Research action AG)

Persons :

  • Prof. dr.  BONTEMPI Gianluca - Université Libre de Bruxelles (ULB)
    Coordinator of the project
    Financed Belgian partner
    Duration: 1/10/2007-30/9/2009
  • M.  PACOLET Joseph - Katholieke Universiteit Leuven (KU Leuven)
    Financed Belgian partner
    Duration: 1/10/2007-30/9/2009
  • Dr.  SAERENS Marco - Université Catholique de Louvain (UCLouvain)
    Financed Belgian partner
    Duration: 1/10/2007-30/9/2009

Description :

The four federal departments of social investigation (SPF Sécurité sociale, Office national de Sécurité sociale, SPF Emploi Travail et Concertation sociale, Office national de l'emploi) currently use the OASIS data warehouse (Organisation Anti-fraude des Services d’Inspection Sociale), which centralizes administrative data from different federal services. At present, the OASIS data warehouse helps social investigators of the fraud squad to determine which companies they should monitor. The OASIS_AGORA research project has two complementary goals: to enhance the OASIS data warehouse by adding data mining tools, and to improve the understanding of social fraud through knowledge extraction. The data-driven modelling provided by data mining techniques is expected to give a more accurate and adaptive insight into fraud mechanisms. This will provide added value to the investigating services by helping them to better predict and understand this social phenomenon in our country and to improve the planning and targeting of their audits, thereby contributing to their effectiveness and efficiency. An expected outcome of the project is an interface that provides data mining assisted access to the data for the different communities (investigating services, public services, political services, scientists). Through this user interface, each user community will be able to consult and exploit the data and results related to its needs.

Up until now, the OASIS indicators of fraud risk (operational since 2002) have been based on expert knowledge and derive from generic administrative information, data sharing and some data matching. They have not yet been confronted with the data warehouse on audits and their results, operational since 2005 (GENESIS register of audits - kadaster van onderzoeken / cadastre des enquêtes), which contains information on discovered fraud and violations of certain laws. Confronting the OASIS warning indicators with the discovered fraud will not only validate the alarms and algorithms in use, but may also uncover new and better-performing indicators, and could lead to a ‘Copernican’ revolution in the data mining of the social inspectorates.
At the start, the project was also re-oriented from the existing OASIS data warehouse, which is limited to some industries, to the information available in the Crossroad Data Warehouse on Employment and Social Protection, which covers all industries. This also coincides better with the coverage of the GENESIS register of audits, which likewise covers the complete economy (with the exception of independent work).
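
As an illustration of what confronting warning indicators with audit outcomes could look like, the following minimal Python sketch compares a binary indicator with audit results. The function name, the dictionary layout and the company identifiers are hypothetical and are not taken from OASIS or GENESIS; the metrics (precision, recall, lift) are standard choices, not necessarily those used by the project.

```python
from dataclasses import dataclass


@dataclass
class Validation:
    precision: float  # share of flagged companies where the audit found fraud
    recall: float     # share of fraudulent companies that had been flagged
    lift: float       # precision divided by the fraud rate among audited companies


def validate_indicator(flags, audit_found_fraud):
    """Compare a binary warning indicator with audit outcomes.

    flags: dict company_id -> bool (hypothetical indicator output)
    audit_found_fraud: dict company_id -> bool (hypothetical audit extract)
    Only companies present in both inputs are compared.
    """
    common = flags.keys() & audit_found_fraud.keys()
    if not common:
        raise ValueError("no audited company carries an indicator value")

    true_pos = sum(1 for c in common if flags[c] and audit_found_fraud[c])
    flagged = sum(1 for c in common if flags[c])
    fraud = sum(1 for c in common if audit_found_fraud[c])

    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / fraud if fraud else 0.0
    base_rate = fraud / len(common)
    lift = precision / base_rate if base_rate else 0.0
    return Validation(precision, recall, lift)


# Toy data, invented for illustration only.
flags = {"A": True, "B": True, "C": False, "D": False, "E": True}
audits = {"A": True, "B": False, "C": True, "D": False}
result = validate_indicator(flags, audits)
```

A lift close to 1 would suggest that the indicator does no better than the average hit rate of the audits. Since audited companies are usually not a random sample, figures of this kind describe the audited population only.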

Added value of the project to social science research: The OASIS_AGORA consortium considers that the information available in administrative sources has not been sufficiently exploited as far as the size of undeclared work is concerned. International and national efforts have concentrated too heavily on indirect (mostly macro-economic) or direct (survey-based) methodologies, and have thereby missed the potential to learn from administrative information. In addition, there has been (i) no joint effort by the social and fiscal administrations to analyse and compare their data, (ii) no joint analysis of this information with data on independent workers (although the national accounts situate large amounts of undeclared work there) and (iii) no reconciliation of macro estimates with administrative sources. The services themselves also had only limited information at their disposal, and limited reconciliation of that information. The OASIS_AGORA project will, of course, not be able to address all of these issues; rather, it will focus on the use of micro-information to estimate the risk and volume of fraud. The outcome will also be useful for targeting control activities, improving their return and efficiency as well as their preventive effect on the non-compliant behaviour of individuals and firms. Estimating not only the risk of fraud but also its volume will help to target controls and to adapt fines, making the fines more effective by keeping them in proportion to the potential benefit from fraud.
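
One simple way to combine risk and volume when targeting controls is to rank candidates by expected fraud volume, i.e. estimated risk multiplied by estimated volume. The Python sketch below illustrates this under stated assumptions: the function name, the tuple layout and the figures are hypothetical, and the project does not specify this ranking rule.

```python
def rank_by_expected_volume(candidates, budget):
    """Return the ids of the `budget` candidates with the highest expected volume.

    candidates: list of (company_id, risk, volume_if_fraud) tuples, where
        risk is an estimated probability of fraud in [0, 1] and
        volume_if_fraud is the estimated amount involved if fraud occurs.
    budget: number of audits that can be carried out.
    """
    for company_id, risk, _ in candidates:
        if not 0.0 <= risk <= 1.0:
            raise ValueError(f"risk for {company_id} must lie in [0, 1]")

    # Expected volume = probability of fraud x amount at stake.
    ranked = sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)
    return [company_id for company_id, _, _ in ranked[:budget]]


# Toy figures, invented for illustration only.
candidates = [
    ("A", 0.9, 1000.0),   # high risk, small volume   -> expected 900
    ("B", 0.2, 10000.0),  # low risk, large volume    -> expected 2000
    ("C", 0.5, 3000.0),   # medium risk and volume    -> expected 1500
]
selected = rank_by_expected_volume(candidates, budget=2)
```

In this toy example the highest-risk company is not selected, because its expected volume is the lowest of the three; ranking on risk alone would give a different audit plan.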

Additional value will derive from quantifying the total amount of undeclared work and from increasing the visibility of this hidden side of the economy (the shadow economy) and of the fight against it. In particular, making the fight against fraud more visible will make it more effective, since demonstrating its success, return and persistence is known to have a preventive effect and to support law-abiding behaviour.

One of the major problems in fraud detection is the so-called ‘ghosts’, namely fraudulent players who are not present in the registers. Belgium has a growing strategy and tradition of introducing and enlarging the electronic administrative coverage of economic units. This coverage will be made more exhaustive by combining the information coming from social security for workers, independent persons and the fiscal administration. We therefore expect these new data sources to offer a clear potential for detecting non-compliance within the administrative sources.

Expertise of the project partners:

- ULB:
A major area of expertise of the ULB-MLG team is the use of feature selection techniques in high-dimensional data analysis problems. Feature selection is a topic in machine learning whose objective is to select, among a set of variables, those that will lead to the best predictive and explanatory model. Selecting features can also increase the intelligibility of a model while at the same time decreasing measurement and storage requirements. The OASIS_AGORA project is expected to benefit from this expertise through an enhanced understanding of the social fraud indicators. Another research topic is time-series prediction, which is well suited to tasks such as outlier (suspicious claim) detection and to tracking changes in claimant behaviour in order to predict the likelihood of abuse of the benefit system.
Web page: www.ulb.ac.be/di/mlg
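
To make the idea of feature selection concrete, here is a minimal Python sketch of a univariate filter method: each candidate variable is scored by the absolute Pearson correlation with a binary fraud label, and the top-scoring variables are kept. This is a deliberately simple stand-in chosen for illustration, not the method of the ULB-MLG team; the function names, variable names and data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, k):
    """Return the names of the k variables most correlated with the target.

    columns: dict variable_name -> list of numeric values (one per company)
    target: list of 0/1 labels (1 = fraud found), same length as each column
    """
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data, invented for illustration only.
target = [0, 0, 1, 1]
columns = {
    "late_declarations": [0, 1, 3, 4],  # rises with the label
    "staff_turnover": [4, 3, 2, 0],     # falls with the label
    "noise": [1, 0, 0, 1],              # unrelated to the label
    "constant": [2, 2, 2, 2],           # carries no information
}
kept = select_features(columns, target, k=2)
```

A univariate filter of this kind scores each variable in isolation, so it ignores redundancy and interactions between variables; it is shown here only because it is short and easy to read.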

- KUL:
The Higher Institute of Labour Studies has carried out several research projects on the structure and level of social fraud since 1996. This has resulted in extensive experience of the socio-economic relevance of administrative information, obtained by studying the links between revealed fraud and other information. At the meso and macro level, the information on control activities and detected fraud has been used to estimate the size and scope of undeclared work and fraud.
Web page: www.hiva.

- UCL:
The UCL-MLG group has developed expertise, and plays an active role in the international scientific community, in a number of techniques including: nonlinear statistical tools, artificial neural networks, probabilistic automata, hidden Markov models, formal grammar and finite-state automata induction, fuzzy logic, genetic algorithms, support vector machines, etc.
Web page: www.ucl.ac.be/mlg/

- ISTI:
The international partner takes part in several projects aiming at building and deploying a data mining application in the field of fiscal fraud detection. One of the main research activities of this partner outside Belgium is the development of a system for constraint-based pattern discovery. Another important research area is privacy-preserving data mining, i.e., the development of data mining techniques that are aware and respectful of the privacy and anonymity of the citizens whose data are stored and are going to be analysed.
Web page: www.isti.cnr.it/