ID: 3819234 • Letter: T
Question
Topics: Please select one of the following topics, research their current background, analyze the purpose of requirements, and design/implement a new information system including a relational database gathering and justify its role in any development process. Then write the term paper.
1. Data Warehousing Systems
2. Data Visualization/Virtual Reality Information Systems
3. Artificial Intelligence and Expert Systems
4. Data Mining Systems
5. Group Decision Support Systems
Including Subjects in the Selected Topic:
1. Describe how the database supports the selected topic
2. Discuss the role the database plays in implementing the selected topic
3. Discuss your ideas for improving the current topic in line with progress in database technology
4. Draw each component for the database and explain their function related with the topic
5. What is the future work for improving your research?
Assignment Instructions:
1. No ZIP file
2. The submitted assignment must be typed by ONE Single MS word file
3. At least 15 pages and 5 references
4. Use 12-font size and 1.5 lines space
5. No more than 4 Figures and 3 tables
6. Follow APA style and content format
You should follow the following content format:
Title: Topic
Abstract:
I. Introduction
II. Background
III. Current Issues and Suggest Topics
IV. Methods and Evaluations
V. Future Works
VI. Summary
References
Explanation / Answer
Data Mining Systems
Introduction –
Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data.
The information or knowledge so extracted can be used for any of the following applications –
Data mining is highly useful in the following domains –
Apart from these, data mining can also be used in the areas of production control, customer retention, science exploration, sports, astrology, and Internet Web Surf-Aid.
Data mining deals with the kind of patterns that can be mined. On the basis of the kind of data to be mined, there are two categories of functions involved in data mining: descriptive functions, and classification and prediction.
Descriptive Function
The descriptive function deals with the general properties of data in the database. Here is the list of descriptive functions
Classification and Prediction-
Classification is the process of finding a model that describes the data classes or concepts. The purpose is to be able to use this model to predict the class of objects whose class label is unknown. This derived model is based on the analysis of sets of training data. The derived model can be presented in the following forms
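As a rough illustration of the classification process described above, the sketch below derives a very simple model (one mean point, or centroid, per class) from a small training set and then uses it to predict the class of an unlabeled object. The customer data, the class names, and the functions train_centroids and predict are all made up for this example; nearest-centroid is only one of many possible classifiers.

```python
from collections import defaultdict
from math import dist

def train_centroids(samples):
    """Learn one centroid (mean feature vector) per class from labeled training data."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(centroids, features):
    """Assign the class whose centroid is closest to the unlabeled object."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical training set: (monthly_spend, visits_per_month) -> customer class
training = [
    ((20.0, 1.0), "occasional"),
    ((30.0, 2.0), "occasional"),
    ((200.0, 12.0), "loyal"),
    ((220.0, 10.0), "loyal"),
]
model = train_centroids(training)
print(predict(model, (210.0, 11.0)))  # object whose class label is unknown
print(predict(model, (25.0, 1.0)))
```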
Background knowledge-
Background knowledge allows data to be mined at multiple levels of abstraction. Concept hierarchies, for example, are one form of background knowledge that makes this possible.
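To make the idea of a concept hierarchy more concrete, here is a minimal sketch in which the same records are counted at two levels of abstraction. The city/country/continent hierarchy, the PARENT table, and the generalize function are assumptions invented for illustration only.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> country -> continent
PARENT = {
    "Chicago": "USA",
    "New York": "USA",
    "Toronto": "Canada",
    "USA": "North America",
    "Canada": "North America",
}

def generalize(value, levels):
    """Climb the hierarchy by the given number of levels (stops at the top)."""
    for _ in range(levels):
        value = PARENT.get(value, value)
    return value

sales_cities = ["Chicago", "Toronto", "New York", "Chicago"]

# The same data mined at two different levels of abstraction
by_country = Counter(generalize(city, 1) for city in sales_cities)
by_continent = Counter(generalize(city, 2) for city in sales_cities)
print(by_country)
print(by_continent)
```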
Representation for visualizing the discovered patterns-
This refers to the form in which discovered patterns are to be displayed. These representations may include the following
Fraud Detection-
Data mining is also used in the fields of credit card services and telecommunication to detect fraud. For fraudulent telephone calls, it helps to find the destination of the call, the duration of the call, the time of day or week, etc. It also analyzes the patterns that deviate from expected norms.
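One simple way to look for values that deviate from expected norms is a z-score test, sketched below on call durations. This is only an assumed illustration: the data, the threshold of 2.0, and the function flag_deviations are hypothetical, and real fraud detection systems use far richer features and models.

```python
from statistics import mean, pstdev

def flag_deviations(values, threshold=2.0):
    """Return the indexes of values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical call durations in minutes for one subscriber
call_minutes = [3, 4, 5, 4, 3, 5, 4, 4, 3, 95]
suspicious = flag_deviations(call_minutes)
print(suspicious)  # indexes of calls worth a closer look
```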
Background-
Necessity is the mother of invention. Since ancient times, our ancestors have been searching for useful information from data by hand.
However, with the rapidly increasing volume of data in modern times, more automatic and effective mining approaches are required.
Early methods such as Bayes' theorem in the 1700s and regression analysis in the 1800s were some of the first techniques used to identify patterns in data. After the 1900s, with the proliferation, ubiquity, and continuously developing power of computer technology, the capacity for data collection and data storage grew remarkably.
As data sets have grown in size and complexity, direct hands-on data analysis has increasingly been augmented with indirect, automatic data processing. This has been aided by other discoveries in computer science, such as neural networks, clustering, and genetic algorithms in the 1950s, decision trees in the 1960s, and support vector machines in the 1990s. Data mining is the process of applying these methods to data with the intention of uncovering hidden patterns.
Data mining technology has been used for many years in many fields, by businesses, scientists, and governments. It is used to sift through volumes of data such as airline passenger trip information, population data, and marketing data to generate market research reports, although that reporting is sometimes not considered to be data mining.
Data mining commonly involves four classes of tasks:
(1) Classification, which arranges the data into predefined groups
(2) Clustering, which is like classification except that the groups are not predefined, so the algorithm tries to group similar items together
(3) Regression, which attempts to find a function that models the data with the least error
(4) Association rule learning, which searches for relationships between variables.
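Of the four task classes above, clustering is perhaps the easiest to show in a few lines. The sketch below is a minimal one-dimensional k-means, assuming the number of groups and the starting centroids are supplied by the caller; the function kmeans_1d and the sample points are hypothetical.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Group similar numbers together; the groups are not predefined, only their count."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # assign each point to its nearest current centroid
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [1, 2, 3, 10, 11, 12]
centers, groups = kmeans_1d(points, centroids=[1.0, 12.0])
print(centers)
print(groups)
```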
According to Han and Kamber, data mining functionalities include data characterization, data discrimination, association analysis, classification, clustering, outlier analysis, and data evolution analysis.
Data characterization is a summarization of the general characteristics or features of a target class of data.
Data discrimination is a comparison of the general features of target class objects with the general features of objects from one or a set of contrasting classes.
Association analysis is the discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data.
Classification is the process of finding a set of models or functions that describe and distinguish data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown.
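For the association analysis mentioned above, the two standard measures of a rule "A implies B" are support (how often A and B occur together) and confidence (how often B occurs when A does). The sketch below computes both on a made-up set of shopping baskets; the data and the function names are illustrative assumptions.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```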
Current Issues-
Data mining is not an easy task: the algorithms used can get very complex, and the data is not always available in one place. It needs to be integrated from various heterogeneous data sources. These factors create several issues. The major issues concern mining methodology and user interaction, performance, and diverse data types.
Mining Methodology and User Interaction Issues-
Performance Issues-
There can be performance-related issues such as the following
Diverse Data Types Issues-
Methods and Evaluations-
Data Warehouse
A data warehouse exhibits the following characteristics to support the management's decision-making process
Query-Driven Approach-
This is the traditional approach to integrate heterogeneous databases. This approach is used to build wrappers and integrators on top of multiple heterogeneous databases. These integrators are also known as mediators.
Process of Query-Driven Approach
Mentioned below is the process of the query-driven data warehousing approach
Update-Driven Approach-
Today's data warehouse systems follow the update-driven approach rather than the traditional approach discussed earlier. In the update-driven approach, the information from multiple heterogeneous sources is integrated in advance and stored in a warehouse. This information is available for direct querying and analysis.
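A minimal sketch of the update-driven idea, using Python's built-in sqlite3 module: two sources with different layouts are integrated in advance into one warehouse table, which can then be queried directly. The two sources, the table name sales_fact, and the function build_warehouse are hypothetical and only stand in for real operational systems.

```python
import sqlite3

# Two hypothetical heterogeneous sources with different layouts
store_sales = [("2024-01-05", "widget", 3, 999)]  # date, product, quantity, unit price in cents
web_orders = [{"sku": "widget", "day": "2024-01-06", "total_cents": 1998}]

def build_warehouse(conn):
    """Integrate both sources in advance into one common schema."""
    conn.execute(
        "CREATE TABLE sales_fact (sale_date TEXT, product TEXT, revenue_cents INTEGER, source TEXT)"
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, 'store')",
        [(day, product, qty * price) for day, product, qty, price in store_sales],
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, 'web')",
        [(o["day"], o["sku"], o["total_cents"]) for o in web_orders],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
build_warehouse(conn)

# Analysis now runs directly against the warehouse, not against the sources
rows = conn.execute(
    "SELECT product, SUM(revenue_cents) FROM sales_fact GROUP BY product"
).fetchall()
print(rows)
```

In the query-driven approach, by contrast, the translation between layouts would happen each time a query is issued instead of once at load time.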
From Data Warehousing (OLAP) to Data Mining (OLAM)-
Online Analytical Mining (OLAM) integrates Online Analytical Processing (OLAP) with data mining and with mining knowledge in multidimensional databases.
[Figure: integration of OLAP and OLAM]
How does data mining work -
Data mining is used primarily in end-user queries to analyze patterns and relationships between data. Usually four different types of relationships are sought.
1. Classes: Data is sorted into predetermined groups.
2. Clusters: Data items are grouped based on a logical parameter or user preference.
3. Associations: Data can be mined to find associations between two types of data.
4. Sequential patterns: Data is mined to determine patterns and trends over time.
Data mining consists of five major elements-
1. Extract, transform, and load transaction data onto the data warehouse system.
2. Store and manage the data in a multidimensional database system.
3. Provide data access to business analysts and information technology professionals.
4. Analyze the data by application software.
5. Present the data in a useful format, such as a graph or table.
Stages of Data Mining-
There are three separate stages of data mining
(1) Exploration
(2) Model building
(3) Deployment
Exploration-
This stage starts with preparing the data: cleaning, transformation, selecting records, and so on. Depending on the nature of the problem, this first stage of the data mining process may range from a simple choice of predictors for a regression model to an analysis that identifies the most relevant variables and determines the complexity and/or the general nature of the models that can be taken into account in the next stage.
Model building and validation-
This stage involves choosing the best model based on predictive performance. This sounds like an easy task but can be difficult. Several different methods may be used to determine which model is best for you.
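One common way to compare candidate models is to hold back part of the data and measure prediction error on it. The sketch below, under the assumption of a tiny exactly linear data set, compares a constant (mean) model with a least-squares line; all names and data are hypothetical, and holdout validation is just one of the several methods that could be used.

```python
def fit_mean(train):
    """Baseline model: always predict the average training outcome."""
    avg = sum(y for _, y in train) / len(train)
    return lambda x: avg

def fit_line(train):
    """Least-squares straight line through the training points."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sum((x - mx) * (y - my) for x, y in train) / sxx
    return lambda x: my + slope * (x - mx)

def mse(model, data):
    """Mean squared prediction error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Hypothetical data following y = 2x + 1
data = [(x, 2 * x + 1) for x in range(10)]
train, holdout = data[:7], data[7:]

candidates = {"mean": fit_mean(train), "line": fit_line(train)}
scores = {name: mse(model, holdout) for name, model in candidates.items()}
best = min(scores, key=scores.get)  # lowest holdout error wins
print(scores, best)
```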
Deployment-
This stage combines the previous two by implementing the model you chose and applying it to the data to generate predictions or estimates of the outcome.
What the future holds for data mining -
Data stored in colossal repositories is analyzed to extract meaningful information and then relevant knowledge. Data mining, the process of knowledge discovery, has become one of the best tools in the business field, drawing on advanced statistics, machine learning, artificial intelligence, pattern recognition, and computation capabilities. In the future, it is very likely that data mining will become predictive analysis (Agosta, 2004). Applications of data mining that will enrich human life in fields such as business, education, medicine, science, and politics (Kumar and Bhardwaj, 2011) include:
- Data Mining in Security and Privacy Preserving. For example, records of electronic communication such as email logs and web logs capture human processes.
- Challenges in Mining Financial Data. For example, investors use models of asset prices to gain bigger profits.
- Detecting Eco-System Disturbances.
- Distributed Data Mining. Distributed algorithms are being developed for tasks such as association analysis and parallel decision tree construction.
Coenen (2011) wrote that data mining would go beyond simple tabular mining for further applications on:
- Text mining: An example is the use of opinion or questionnaire mining where the objective is to obtain useful information.
- Image Mining: One of the examples is the classification of retinal image data and magnetic resonance imaging scan data to identify disorders.
- Graph mining: An example is social network analysis, where a service such as Facebook wants to identify groupings (communities) within its networks.
Future Trends and Applications-
1. DISTRIBUTED/COLLECTIVE DATA MINING-
One area of data mining which is attracting a good amount of attention is that of distributed and collective data mining. Much of the data mining which is being done currently focuses on a database or data warehouse of information which is physically located in one place. However, situations arise where the information is spread across different physical locations.
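A very small sketch of the distributed idea: each site summarizes its own data locally, and only the summaries are combined, so the raw records never have to be moved to one place. The two sites, their transactions, and the functions local_item_counts and merge are assumptions made up for illustration.

```python
from collections import Counter

def local_item_counts(transactions):
    """Run at each site: count how many local transactions contain each item."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))
    return counts, len(transactions)

def merge(partials):
    """Run at the coordinator: combine the per-site summaries into global support values."""
    total = Counter()
    n = 0
    for counts, size in partials:
        total += counts
        n += size
    return {item: c / n for item, c in total.items()}

# Hypothetical data held at two physically separate sites
site_a = [["bread", "milk"], ["bread"]]
site_b = [["milk"], ["bread", "milk"], ["butter"]]

global_support = merge([local_item_counts(site_a), local_item_counts(site_b)])
print(global_support)
```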
2. UBIQUITOUS DATA MINING (UDM)-
The advent of laptops, palmtops, cell phones, and wearable computers is making ubiquitous access to large quantities of data possible. Advanced analysis of data for extracting useful knowledge is the next natural step in the world of ubiquitous computing. Accessing and analyzing data from a ubiquitous computing device poses many challenges.
REFERENCES-
1. Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, From Data Mining to Knowledge Discovery in Databases, 1996.
2. Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, London: Academic Press, 5, 2001.
3. Mehmed Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms, New York: John Wiley & Sons Inc., 2003.
4. Michael Chau, Reynold Cheng, Ben Kao, Uncertain Data Mining: A New Research Direction, Introduction, 2005.
5. Mika Sato, Yoshiharu Sato, L. C. Jain, J. Kacprzyk, Fuzzy Clustering Models and Applications, UK: Physica-Verlag Heidelberg, 1999.
6. Ouri Wolfson, A. Prasad Sistla, Sam Chamberlain, Yelena Yesha, Updating and Querying Databases that Track Mobile Units, MA: Kluwer Academic Publishers, 1999.
7. Reynold Cheng, Dmitri V. Kalashnikov, Sunil Prabhakar, Evaluating Probabilistic Queries over Imprecise Data, UK: Elsevier Science Ltd, 2007.
8. Sapphire, Large Scale Data Mining and Pattern Recognition, https://computation.llnl.gov/casc/sapphire/overview/data_mining_steps.gif, 1999.