ITC516 | Privacy in Data Mining

Data mining

Data mining is the process of generating a sequence of meaningful and correct queries to retrieve information from a wide range of data in a database. Data mining techniques have long been regarded as important for addressing problems in database security. As the field has grown, however, it has become a serious concern that data mining techniques can themselves pose security issues. Many security professionals see data mining as one of the primary issues that customers will face in the next decade. Security in data mining attracts this attention because people who use data mining work with large sets of information and gain easier access to it, which is risky when the data is not handled securely.

Security issues of data mining

One of the major issues raised by this technology is not a technological or business one but a social one: individual privacy. Data mining makes it possible to analyse routine business transactions and glean a significant amount of data about individuals' purchasing habits and preferences. Another major security issue is data integrity, where a key implementation challenge is aggregating conflicting or redundant information from distinct sources. There can also be issues associated with cost. Although the hardware cost of systems has dropped dramatically in recent years, data mining tends to be self-reinforcing: the more powerful the data mining queries, the greater the utility of the information gleaned from the data. Several other issues related to data mining are as follows:

Data quality: Data quality is a multi-faceted challenge and a major issue for data mining. It refers to the completeness and accuracy of data, and it is affected by the consistency and structure of the data being analysed. Duplicate records, the timeliness of updates, human error and a lack of data standards all reduce the effectiveness of highly complex data mining techniques, which are sensitive to subtle differences in the data. To improve data quality, the data must be cleaned, which involves removing duplicate records and normalising values so that information is presented consistently in the database.
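
The cleaning steps described above can be sketched in a few lines of Python. This is only an illustration, not a method taken from any particular data mining tool: the record fields ("name" and "email") and the function names are assumptions made for the example, and real data would usually need more rules than these.

```python
def normalise(record):
    """Return a copy of the record with whitespace and letter case standardised.

    The "name" and "email" fields are hypothetical, chosen for illustration.
    """
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }


def clean_records(records):
    """Normalise every record, then drop duplicates while keeping input order."""
    seen = set()
    cleaned = []
    for record in records:
        norm = normalise(record)
        key = (norm["name"], norm["email"])
        if key not in seen:  # keep only the first copy of each record
            seen.add(key)
            cleaned.append(norm)
    return cleaned
```

Normalising before comparing matters here: two records that differ only in spacing or letter case would otherwise be treated as different and both kept.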

Interoperability: Closely related to data quality is the challenge of making several databases interoperable. Interoperability is the ability of a system to work with other systems or data by means of common processes. For data mining, database interoperability is essential because it allows different databases to be searched and analysed concurrently. Data mining projects that try to take advantage of existing legacy databases may run into interoperability problems. Organisations that are developing advanced databases and data sharing initiatives therefore need to account for interoperability in the planning phase to ensure their effectiveness.
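
One simple way to picture the interoperability problem is two legacy databases that store the same information under different field names. The sketch below maps both onto a common schema so that they can be analysed together. The source names ("legacy_crm", "billing") and all field names are hypothetical, and this covers only naming differences, not differences in data types or meaning.

```python
# Hypothetical mapping from each source's field names to a common schema.
FIELD_MAPS = {
    "legacy_crm": {"cust_name": "name", "mail": "email"},
    "billing": {"fullName": "name", "emailAddress": "email"},
}


def to_common_schema(source, record):
    """Rename the fields of a record from the given source to the common names."""
    mapping = FIELD_MAPS[source]
    return {common: record[original] for original, common in mapping.items()}


def merge_sources(records_by_source):
    """Combine records from several sources into one list in the common schema."""
    merged = []
    for source, records in records_by_source.items():
        for record in records:
            merged.append(to_common_schema(source, record))
    return merged
```

Agreeing on such a mapping during planning is far cheaper than discovering the mismatches after the mining work has started.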

Privacy: Privacy concerns focus both on the actual projects defined and on the potential for data mining applications to be extended beyond their original objectives. In the context of security, data mining has proved useful in countering various forms of attack on computer systems. At the same time, the data integration and analysis efforts of businesses and government agencies raise fears about privacy, and these fears motivate privacy-preserving data mining. One aim of privacy-preserving data mining is to let organisations apply data mining algorithms without observing sensitive data values. Without such protection, an adversary can use data mining technology to gain access to confidential information that cannot be reached through ordinary querying tools, thereby jeopardising individual privacy.
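
To illustrate how an analysis can be run without observing sensitive values, the sketch below uses randomised response, a well-known survey technique that is chosen here only as an example and is not named in the discussion above. Each person reports the true yes/no answer with probability p_truth and otherwise reports a fair coin flip, so no single report reveals the truth, yet the overall rate can still be estimated. The function names and the parameter p_truth are assumptions made for the example.

```python
import random


def randomise(true_answer, p_truth, rng):
    """Report the true answer with probability p_truth, else a fair coin flip."""
    if rng.random() < p_truth:
        return true_answer
    return rng.random() < 0.5


def estimate_true_rate(reports, p_truth):
    """Estimate the fraction of true 'yes' answers from the randomised reports.

    The expected observed rate is p_truth * rate + (1 - p_truth) / 2,
    so the formula below solves that expression for rate.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) / 2) / p_truth
```

The estimate becomes more accurate as the number of reports grows, while a lower p_truth gives each individual more protection at the price of a noisier result.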

Other interesting issues

  • Security leakage in existing privacy-preserving data mining methods
  • Impact of distributed data sources on security and privacy
  • Statistical disclosure control applied to privacy-preserving data mining

