In the references below three research papers for data quality in big data are
Question
The references below present three research papers on data quality in big data. Write paragraphs to report on a comparison between the three papers:
1- Big Data Preprocessing: A Quality Framework.
2- Evaluating the Quality of Social Media Data in Big Data Architecture.
3- Data Quality: The other Face of Big Data.
You need to clearly identify the criteria you use for this comparison. You also need to identify the motivations, objectives, advantages, limitations, contributions, approach, methodology, issues, and problems of each paper. Discuss how you can use such frameworks in Data and Information Quality Management.
References
[1] Immonen, Anne, Pääkkönen, Pekka, and Ovaska, Eila. Evaluating the quality of social media data in big data architecture. IEEE Access, 2015, vol. 3, p. 2028-2043.
[2] Taleb, Ikbal, Dssouli, Rachida, and Serhani, Mohamed Adel. Big data pre-processing: A quality framework. In: Big Data (BigData Congress), 2015 IEEE International Congress on. IEEE, 2015. p. 191-198.
[3] Saha, Barna, and Srivastava, Divesh. Data quality: The other face of big data. In: 2014 IEEE International Conference on Data Engineering (ICDE). IEEE, 2014. p. 1294-1297.
Explanation / Answer
* Big Data Preprocessing: A Quality Framework:
Big Data has become a leading approach for acquiring, processing, and analyzing large amounts of
heterogeneous data from many raw sources in order to derive valuable evidence.
The size, speed, and formats in which data is generated and processed affect the overall quality of the information.
Quality of Big Data (QBD) has therefore become an important factor: data quality must be maintained
at every phase of Big Data processing.
This paper addresses QBD at the pre-processing phase, which includes sub-processes such as cleansing, integration, filtering, and normalization.
The authors propose a QBD model incorporating processes to support data quality profile selection and adaptation.
The model also tracks and registers, in a data provenance repository, the effect of every data transformation performed in the pre-processing phase. The evaluation uses a large EEG dataset.
In Data and Information Quality Management: the reported results illustrate the importance of addressing QBD at an early phase of the Big Data processing lifecycle, since doing so saves significant cost and supports accurate data analysis.
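The idea of recording the quality effect of each pre-processing step can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `ProvenanceLog`, `drop_incomplete`, and `min_max_normalize` are hypothetical, completeness is just one possible quality dimension, and the three records are made-up sample data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]


def completeness(records: List[Record]) -> float:
    """Share of field values that are not None (1.0 for an empty dataset)."""
    total = sum(len(r) for r in records)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for v in r.values() if v is not None)
    return filled / total


@dataclass
class ProvenanceLog:
    """Hypothetical provenance repository: one entry per applied transformation."""
    entries: List[dict] = field(default_factory=list)

    def apply(self, name: str,
              step: Callable[[List[Record]], List[Record]],
              records: List[Record]) -> List[Record]:
        # Measure quality before and after the step and keep both values.
        before = completeness(records)
        result = step(records)
        self.entries.append({
            "step": name,
            "rows_in": len(records),
            "rows_out": len(result),
            "completeness_before": before,
            "completeness_after": completeness(result),
        })
        return result


def drop_incomplete(records: List[Record]) -> List[Record]:
    """Cleansing step: remove records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def min_max_normalize(records: List[Record], key: str = "signal") -> List[Record]:
    """Normalization step: rescale `key` to [0, 1]; assumes no missing values remain."""
    if not records:
        return []
    values = [r[key] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [{**r, key: (r[key] - lo) / span} for r in records]


# Made-up sample data for illustration only.
raw = [
    {"signal": 2.0, "channel": 1.0},
    {"signal": None, "channel": 2.0},
    {"signal": 6.0, "channel": 3.0},
]
log = ProvenanceLog()
clean = log.apply("cleansing", drop_incomplete, raw)
normalized = log.apply("normalization", min_max_normalize, clean)
```

After the run, `log.entries` holds one record per step, so the cost of cleansing (one row dropped) and its benefit (higher completeness) can be compared before any analysis starts.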
* Evaluating the Quality of Social Media Data in Big Data Architecture:
The amount of data available online is increasing day by day, and many companies are recognizing the value of
this data for understanding their clients and their business.
Much of this online data comes from social media such as Facebook.
Properly treated, it can assist in business decision making.
However, unstructured and uncertain big data presents a new kind of challenge.
The central question is how to evaluate the quality of the data and manage its value within a big data architecture.
The paper addresses this challenge by introducing a new architectural solution to evaluate
and manage the quality of social media data in each processing phase of the big data pipeline.
In Data and Information Quality Management: the proposed solution improves business decision making
by providing the user with validated data in real time.
The solution is validated with an industrial case example, in which the customer insight is extracted from
social media data in order to determine the customer satisfaction regarding the quality of a product.
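Evaluating quality in each phase of a pipeline can be sketched as follows. This is only an illustration under stated assumptions, not the architecture from the paper: the phase names, the two metrics (`completeness` and `mentions_product`), the 0.7 threshold, and the sample posts are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Post = Dict[str, str]
Phase = Tuple[str, Callable[[List[Post]], List[Post]], Callable[[List[Post]], float]]


def completeness(posts: List[Post]) -> float:
    """Share of posts that have both a non-empty author and non-empty text."""
    if not posts:
        return 0.0
    ok = sum(1 for p in posts if p.get("author") and p.get("text"))
    return ok / len(posts)


def mentions_product(posts: List[Post], keyword: str = "phone") -> float:
    """Crude relevance metric: share of posts whose text mentions the keyword."""
    if not posts:
        return 0.0
    hits = sum(1 for p in posts if keyword in p.get("text", "").lower())
    return hits / len(posts)


def keep_all(posts: List[Post]) -> List[Post]:
    return posts


def drop_incomplete(posts: List[Post]) -> List[Post]:
    return [p for p in posts if p.get("author") and p.get("text")]


def evaluate_pipeline(posts: List[Post], phases: List[Phase],
                      threshold: float) -> List[dict]:
    """Run each phase, score its output, and record whether it meets the threshold."""
    report = []
    for name, transform, metric in phases:
        posts = transform(posts)
        score = metric(posts)
        report.append({"phase": name, "posts": len(posts),
                       "score": score, "passed": score >= threshold})
    return report


# Made-up posts for illustration only.
posts = [
    {"author": "a", "text": "Love my new phone"},
    {"author": "", "text": "phone battery is bad"},
    {"author": "c", "text": "Nice weather today"},
    {"author": "d", "text": "The Phone camera is great"},
]
phases = [
    ("extraction", keep_all, completeness),
    ("processing", drop_incomplete, mentions_product),
]
report = evaluate_pipeline(posts, phases, threshold=0.7)
```

Each entry in `report` plays the role of quality metadata attached to one phase, so a later decision step can see where in the pipeline the data fell below the required level.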
* Data Quality: The other Face of Big Data:
Big data is being generated, collected, and analyzed at an unprecedented scale,
and data-driven decision making is spreading through all aspects of society.
Poor-quality data is now prevalent in large databases and on the Web.
Poor-quality data can have serious consequences for the results of data analysis.
The importance of veracity, the fourth 'V' of big data, is therefore increasingly being recognized, and the paper
highlights the challenges that the first three 'V's (volume, velocity, and variety) bring to dealing with veracity in big data.
Due to the sheer volume and velocity of data, one needs to understand and repair erroneous data in a scalable and timely manner.
With the variety of data, often from a diversity of sources, data quality rules cannot be specified a priori;
one needs to let the data speak for itself in order to discover the semantics of the data.
In Data and Information Quality Management: the paper frames big data quality management around two major dimensions
- discovering quality issues from the data itself, and
- trading off accuracy against efficiency.
It also identifies a range of open problems for the community.
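Both dimensions can be illustrated with a small Python sketch that searches for approximate functional dependencies (rules of the form "column A determines column B") directly in the data. This is one possible technique chosen for illustration, not an algorithm taken from the paper; the function names, the error tolerance, the optional sampling, and the sample rows are all assumptions.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]


def fd_violation_rate(rows: List[Row], lhs: str, rhs: str) -> float:
    """Minimum fraction of rows to remove so that lhs -> rhs holds exactly."""
    if not rows:
        return 0.0
    groups: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    # In each lhs group, keep only the rows with the most common rhs value.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1.0 - kept / len(rows)


def discover_rules(rows: List[Row], max_error: float,
                   sample_size: Optional[int] = None,
                   seed: int = 0) -> List[Tuple[str, str]]:
    """Return column pairs (lhs, rhs) whose violation rate is at most max_error.

    Passing sample_size checks only a random sample: faster on large data,
    but the discovered rules are then only estimates (accuracy vs efficiency).
    """
    if sample_size is not None and sample_size < len(rows):
        rows = random.Random(seed).sample(rows, sample_size)
    columns = sorted(rows[0]) if rows else []
    return [(lhs, rhs) for lhs in columns for rhs in columns
            if lhs != rhs and fd_violation_rate(rows, lhs, rhs) <= max_error]


# Made-up rows for illustration; "Newark" under zip 10001 is a planted error.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Newark"},
    {"zip": "60601", "city": "Chicago"},
    {"zip": "60601", "city": "Chicago"},
]
exact_rules = discover_rules(rows, max_error=0.0)
tolerant_rules = discover_rules(rows, max_error=0.25)
sampled_rules = discover_rules(rows, max_error=0.25, sample_size=3)
```

With no tolerance only `city -> zip` is found; allowing some error also recovers `zip -> city`, and the rows that violate it (here the "Newark" row) become candidates for repair. The `sample_size` parameter shows the trade-off in miniature: less data to scan, at the price of rules that may be missed or wrongly accepted.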