ITC516 | Security Privacy and Ethics in Data Mining | Information Technology

Task

Question 1 - Written Exercise

Topic: Security, Privacy and Ethics in Data Mining.

In this task, you are required to read the journal articles provided below and write a short discussion paper on the topic of security, privacy and ethics in data mining. You must:

  • evaluate how significant the security, privacy and ethical implications of data mining are for the business sector; and
  • support your response with appropriate examples and references.

 

The task is worth 10 marks of the overall marks available for assessment. The recommended word length for this posting is 500 to 800 words.

 

Journal articles:

 

Ryoo, J. ‘Big data security problems threaten consumers’ privacy’ (March 23, 2016)

Tasioulas, J. ‘Big Data, Human Rights and the Ethics of Scientific Research’ (December 1, 2016)

Question 2 - Practical

 

Consider the data set below, which represents the assessment results of 40 students in a subject consisting of four assignments and a final exam.

Assignment-1, Assignment-2, Assignment-3, Assignment-4, Final_Exam

?,94,34,30,42
35,94,85,33,45
31,46,22,35,48
46,90,60,36,50
52,94,49,48,50
58,94,30,34,51
47,90,?,23,52
37,94,25,?,52
35,94,45,31,54
57,94,100,29,54
51,94,5,30,54
45,94,33,33,55
44,0,35,36,55
52,95,56,42,56
35,94,?,36,57
57,97,57,42,57
45,90,71,43,57
39,94,54,33,57
31,94,63,31,57
45,94,?,29,59
35,90,84,49,59
37,90,40,50,61
83,97,26,39,61
68,97,55,45,62
50,95,56,46,62
77,93,?,41,63
84,48,18,35,63
45,90,21,38,63
62,95,38,?,63
38,94,40,39,64
50,90,?,29,64
32,90,38,32,64
44,90,43,36,65
57,94,52,39,68
50,94,39,42,70
55,90,62,?,71
43,94,54,36,72
50,90,30,30,74
54,90,82,28,77
64,95,5,8,78
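
Each row above is a comma-separated record in which `?` marks a missing mark, which is also how Weka's ARFF format writes a missing value. As a rough sketch of how the header and rows could map onto an ARFF file, the Python below builds the file text from a few sample rows. It assumes every attribute is numeric; the relation name `student_results` and the helper `to_arff` are hypothetical names chosen for illustration.

```python
# Illustrative sketch only: `to_arff` and the relation name are hypothetical.
ATTRIBUTES = ["Assignment-1", "Assignment-2", "Assignment-3",
              "Assignment-4", "Final_Exam"]


def to_arff(relation, attributes, rows):
    """Build ARFF text from comma-separated rows, declaring every attribute numeric."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in attributes]
    lines += ["", "@data"]
    for row in rows:
        values = [v.strip() for v in row.split(",")]
        if len(values) != len(attributes):
            raise ValueError(f"expected {len(attributes)} values: {row!r}")
        # '?' is ARFF's marker for a missing value, so it passes through unchanged.
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"


# First three rows of the data set; extend with the remaining rows as needed.
sample_rows = ["?,94,34,30,42", "35,94,85,33,45", "31,46,22,35,48"]
arff_text = to_arff("student_results", ATTRIBUTES, sample_rows)
print(arff_text)
```

Saving `arff_text` to a file with an `.arff` extension should give something Weka can open, although the same header and rows can equally be typed directly into a text editor.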

 

  1. Create an ARFF file for this data set using a text editor, and open the ARFF file in Weka [4 marks].

  2. Observe the summary data for the data set and the histograms for all attributes on the Preprocess tab page. Use the Visualize tab page to view the scatter plots between the variables of the data set. Put a screenshot of the tab in your assignment [2 marks].

  3. Apply the unsupervised Discretize filter to the Assignment-4 marks. Put a screenshot of the filter output in your assignment and make some remarks on the data [2 marks].

  4. Practise filling in the missing values for all columns in the Viewer window in Weka, both manually and by using filters. Put a screenshot of the filter outputs in your assignment and comment on the values that Weka suggests for the missing values [2 marks].
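
To sanity-check what the Discretize and missing-value filters report, the same ideas can be sketched outside Weka. The Python below assumes Weka's documented defaults: the unsupervised Discretize filter uses equal-width binning with 10 bins, and a filter such as ReplaceMissingValues replaces a missing numeric value with the attribute mean. The function names are hypothetical, and bin-boundary handling may differ slightly from Weka's, so treat the result as a cross-check rather than a reproduction of the filter output.

```python
def equal_width_bins(values, bins=10):
    """Map each value to a bin index 0..bins-1 over [min, max]; None stays None."""
    present = [v for v in values if v is not None]
    low, high = min(present), max(present)
    width = (high - low) / bins
    result = []
    for v in values:
        if v is None:
            result.append(None)
        elif width == 0:
            result.append(0)  # all present values are identical
        else:
            # Clamp so the maximum value falls in the last bin.
            result.append(min(int((v - low) / width), bins - 1))
    return result


def fill_with_mean(values):
    """Replace each None with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


# Toy column, not taken from the data set.
marks = [0, 5, 10, None]
print(equal_width_bins(marks, bins=2))
print(fill_with_mean(marks))
```

Running the two functions on a real column, with `None` in place of each `?`, gives values to compare against the bin counts and the suggested replacements shown in Weka.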

Rationale

These tasks aim to assess your progress towards being able to:

  • identify and analyse business requirements for the identification of patterns and trends in data sets;
  • appraise the different approaches and categories of data mining problems;
  • compare and evaluate output patterns;
  • compare and evaluate appropriate techniques for detecting and evaluating patterns in a given data set; and
  • identify and evaluate the security, privacy and ethical implications in data mining.

Marking criteria

The grade you receive for this assessment as a whole is determined by the cumulative marks gained for each question. The tasks in this assessment involve a sequence of several steps, and you will therefore be marked on the correctness of your answer as well as the clear and neat presentation of your diagrams, where required.

Question 1 - Written Exercise

 

Criteria: Demonstrate an ability to analyse, reason and discuss the concepts learned in the subject (this includes content from online meetings, textbook chapters, modules, readings and forum discussions).

HD: Demonstrate an ability to analyse, reason and discuss the concepts to draw justified conclusions that are logically supported by examples and best practice. Answers succinctly integrate and link information into a cohesive and coherent piece of analysis and consistently use correct data mining terminologies and sophisticated language.

DI: Demonstrate an ability to analyse, reason and discuss the concepts to draw justified conclusions that are logically supported by examples and best practice. The answers are logically structured to create a cohesive and coherent piece of analysis that consistently uses correct data mining terminologies.

CR: Demonstrate an ability to analyse, reason and discuss the concepts to draw justified conclusions that are generally logically supported by examples and best practice. The answers are generally logically structured to create a comprehensive, mainly descriptive piece of analysis. Some use of correct data mining terminologies.

PS: Demonstrate an ability to analyse, reason and discuss most concepts to draw justified conclusions that are generally logically supported by examples and best practice. The answers are partially structured into loosely-linked rudimentary sentences to create a comprehensive, descriptive piece of analysis. Some use of correct data mining terminologies.

 

Question 2 - Practical

The grade you receive for this task is determined by the cumulative marks gained for each question (FL 0-49%; PS 50-64%; CR 65-74%; DI 75-84%; HD 85-100%).

Criteria Description

  • Correctness of the formatted ARFF file, which must be readable by Weka
  • Clear and neat screenshots
  • Clear communication of your answer, outlining what you have done with an appropriate mix of text and diagrams

Solution