I need your help with all the attached questions for the Data Mining course
Question
I need your help with all the attached questions for the Data Mining management course.
Please answer in your own words; do not copy the answers.
1- Define the KDD process and describe all its steps.
2- Differentiate between classification of data and clustering of data with the help of suitable examples. Explain different approaches to handle the problem of missing attribute values during data cleaning.
3- Why do we need preprocessing of the data? Explain any four data preprocessing techniques with the help of suitable examples.
4-
Explanation / Answer
1.
KDD stands for Knowledge Discovery in Databases.
It refers to the overall process of finding knowledge in data, and it emphasizes the high-level application of particular data mining methods.
The KDD process is of interest to researchers in machine learning, databases, statistics, artificial intelligence, and data visualization.
The steps of the KDD process, in order:
- Developing an understanding of the application domain.
- Creating a target data set.
- Data cleaning and preprocessing.
- Data reduction and projection.
- Data mining.
- Interpreting the mined patterns.
- Consolidating the discovered knowledge.
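To make the flow between the steps concrete, here is a minimal Python sketch of a toy pipeline. Everything in it is an assumption made for illustration: the records, the field names (`age`, `bought`), and the helper functions are hypothetical, and the "mining" step is only a grouped purchase rate standing in for a real data mining algorithm.

```python
# Toy KDD pipeline; all data, field names, and helpers are hypothetical.
raw_records = [
    {"id": 1, "age": 25, "city": "A", "bought": "yes"},
    {"id": 2, "age": None, "city": "B", "bought": "no"},
    {"id": 3, "age": 41, "city": "A", "bought": "yes"},
    {"id": 4, "age": 37, "city": "B", "bought": "no"},
    {"id": 5, "age": 30, "city": "A", "bought": "no"},
]

def select_target(records):
    """Create the target data set: keep only the attributes of interest."""
    return [{"age": r["age"], "bought": r["bought"]} for r in records]

def clean(records):
    """Data cleaning: drop records whose age is missing."""
    return [r for r in records if r["age"] is not None]

def reduce_and_project(records):
    """Data reduction: replace the exact age with a coarse age group."""
    return [
        {"group": "under_40" if r["age"] < 40 else "40_plus", "bought": r["bought"]}
        for r in records
    ]

def mine(records):
    """Data mining (stand-in): purchase rate per age group."""
    totals, buyers = {}, {}
    for r in records:
        totals[r["group"]] = totals.get(r["group"], 0) + 1
        if r["bought"] == "yes":
            buyers[r["group"]] = buyers.get(r["group"], 0) + 1
    return {g: buyers.get(g, 0) / totals[g] for g in totals}

def interpret(rates):
    """Interpretation: turn the mined pattern into a readable statement."""
    best = max(rates, key=rates.get)
    return f"Highest purchase rate: {best} ({rates[best]:.0%})"

rates = mine(reduce_and_project(clean(select_target(raw_records))))
summary = interpret(rates)
print(summary)
```

Understanding the application domain and consolidating the discovered knowledge are human activities, so they do not appear as functions in this sketch.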
2.
Difference between classification of data and clustering of data:
- Classification example: new data arrive, and each record has to be assigned a label.
- Clustering example: a set of historical transactions that record who bought what, with no labels attached.
- Classification needs labeled samples from a set of known classes.
- Clustering needs only unlabeled samples.
- Classification use case: classify a new sample into one of the known classes.
- Clustering use case: suggest groups based on patterns in the data.
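The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2-D points and the "low spender" / "high spender" labels are made up, a 1-nearest-neighbor rule stands in for a classifier, and a plain k-means loop with a naive initialisation stands in for a clustering algorithm.

```python
import math

def nearest_neighbor_classify(labeled, point):
    """Classification: give `point` the label of its closest labeled sample."""
    closest = min(labeled, key=lambda item: math.dist(item[0], point))
    return closest[1]

def kmeans(points, k, iterations=10):
    """Clustering: group unlabeled points around k centroids (plain k-means)."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster (unchanged if empty).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters

# Classification needs labeled training samples (hypothetical labels).
labeled = [
    ((1.0, 1.0), "low spender"), ((1.5, 2.0), "low spender"),
    ((8.0, 8.0), "high spender"), ((9.0, 9.5), "high spender"),
]
predicted = nearest_neighbor_classify(labeled, (8.5, 9.0))

# Clustering receives the same kind of points, but without any labels.
unlabeled = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5)]
groups = kmeans(unlabeled, k=2)

print(predicted)
print(groups)
```

The classifier returns one of the class names it was trained on, while k-means returns only anonymous groups; it is up to the analyst to decide what each group means.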
3.
Data cleaning is the data preprocessing technique that cleans bad data, filters incorrect data out of the data set, and reduces unnecessary data.
Approaches (replacement policies) for handling missing attribute values during data cleaning:
- Ignore the records with missing values.
- Replace the missing values with a global constant.
- Fill in the missing values manually based on your domain knowledge.
- Replace them with the variable mean or the most frequent value.
- Use modeling techniques to predict the missing values.
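A minimal Python sketch of the simpler policies, using only the standard library. The rows, column names, and helper functions are hypothetical, and `None` is assumed to mark a missing value. Manual filling and model-based prediction are not shown.

```python
from statistics import mean, mode

# Hypothetical data set; None marks a missing value.
rows = [
    {"age": 25, "city": "A"},
    {"age": None, "city": "B"},
    {"age": 35, "city": None},
    {"age": 30, "city": "A"},
]

def drop_incomplete(rows):
    """Ignore every record that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def fill_constant(rows, column, constant):
    """Replace missing values in `column` with a global constant."""
    return [{**r, column: constant if r[column] is None else r[column]} for r in rows]

def fill_mean(rows, column):
    """Replace missing numeric values with the mean of the observed ones."""
    observed = [r[column] for r in rows if r[column] is not None]
    return fill_constant(rows, column, mean(observed))

def fill_mode(rows, column):
    """Replace missing values with the most frequent observed value."""
    observed = [r[column] for r in rows if r[column] is not None]
    return fill_constant(rows, column, mode(observed))

complete_only = drop_incomplete(rows)
with_constant = fill_constant(rows, "city", "Unknown")
with_mean = fill_mean(rows, "age")
with_mode = fill_mode(rows, "city")
print(len(complete_only), with_constant[2], with_mean[1], with_mode[2])
```

Dropping records is the simplest policy but loses data; the fill policies keep every record at the cost of introducing values that were never observed.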
4.
- We need preprocessing because raw data contains noise and unimportant data, and preprocessing techniques remove them.
- Data preprocessing is all about transforming or converting the raw data into an understandable format.
- The dataset also becomes smaller and easier to handle.
The four data preprocessing techniques are:
- Data cleaning.
- Data transformation.
- Data integration.
- Missing Data Imputation.
Data Cleaning: This technique cleans bad data, filters incorrect data out of the data set, and reduces unnecessary data.
Data Transformation: This technique converts and consolidates the data into a form to which the mining process can be applied.
Data Integration: This technique merges data from multiple sources.
It must be implemented very carefully in order to avoid redundancies and inconsistencies in the data set.
Missing Data Imputation: This technique imputes (fills in) the missing values throughout the data set, including gaps left after data from different sources have been integrated.
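As an example of cleaning, transformation, and integration, here is a minimal Python sketch. The customer and billing records, the column names (`customer_id`, `cust_id`, `total_spent`), and the helpers are all hypothetical. Min-max normalization is used as one common transformation; the answer above does not name a specific method, so that choice is an assumption.

```python
def clean(records):
    """Data cleaning: drop duplicate rows and rows with an impossible age."""
    seen, cleaned = set(), []
    for r in records:
        key = (r["customer_id"], r["age"])
        if key in seen or not 0 <= r["age"] <= 120:
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned

def min_max_normalize(values):
    """Data transformation: rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal; avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]

def integrate(customers, billing):
    """Data integration: join two sources on the customer key."""
    totals = {}
    for b in billing:
        # The second source names the key "cust_id"; it refers to the same entity.
        # Redundant rows collapse to one entry per customer. A real pipeline
        # would also flag rows whose values conflict instead of overwriting.
        totals[b["cust_id"]] = b["total_spent"]
    return [{**c, "total_spent": totals.get(c["customer_id"])} for c in customers]

customers = [
    {"customer_id": 1, "age": 20},
    {"customer_id": 2, "age": 40},
    {"customer_id": 2, "age": 40},  # duplicate row
    {"customer_id": 3, "age": -5},  # invalid value
    {"customer_id": 4, "age": 60},
]
billing = [
    {"cust_id": 1, "total_spent": 120.0},
    {"cust_id": 2, "total_spent": 75.5},
    {"cust_id": 2, "total_spent": 75.5},  # redundant copy
    {"cust_id": 4, "total_spent": 10.0},
]

cleaned = clean(customers)
scaled_ages = min_max_normalize([r["age"] for r in cleaned])
integrated = integrate(cleaned, billing)
print(cleaned)
print(scaled_ages)
print(integrated)
```

Note how the two sources use different names for the same key and one of them contains a redundant row; these are the kinds of inconsistencies and redundancies that integration has to resolve.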