Clinical data can be obtained from various sources, such as medical transcription files and Electronic Medical Records (EMR). Can we build a new clinical database that accumulates large amounts of patient information and medical conditions from these two sources? Relationships and patterns in this data could provide new medical knowledge.

Importance of Clinical Data Mining

1. In 2010, over 30 million people were treated for fatal diseases; cancer and heart disease are just two of them. Early signs of cancer and heart disease can be identified, and doing so can save thousands of lives. Analyzing a database of thousands of patients can provide valuable information about possible causes, the nature of progression, and so on. This can help develop systems that identify a disease at its first sign, which leads to timely treatment and prevention techniques.

2. Every year, new guidelines are issued on the use and dosage of different drugs. Sometimes the guidelines state that certain drugs, when used in combination, can produce side effects. A recent example:

On June 8, 2011, the FDA came out with new guidelines for the use of simvastatin, in particular for specific drug combinations that are now defined as contraindicated at each dose of simvastatin.

Using this knowledge, we can identify patients who are taking these drugs in conflict with the guidelines.
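
One way to find patients whose medication lists conflict with such a guideline could look like the minimal sketch below. The rule table, the drug names drug_x and drug_y, and the function names are hypothetical placeholders for illustration only; real rules would have to be taken from the current FDA labeling.

```python
# Hypothetical rule table: each entry is a pair of drugs that should not
# be prescribed together. "drug_x" and "drug_y" are placeholders, not
# clinical guidance.
CONTRAINDICATED_PAIRS = {
    frozenset({"simvastatin", "drug_x"}),
    frozenset({"simvastatin", "drug_y"}),
}


def find_conflicts(medications):
    """Return the contraindicated pairs present in one medication list."""
    meds = {m.lower() for m in medications}
    # A pair conflicts when both of its drugs appear in the list.
    hits = [pair for pair in CONTRAINDICATED_PAIRS if pair <= meds]
    return sorted(tuple(sorted(pair)) for pair in hits)


def flag_patients(patients):
    """Map each patient id to its conflicts, skipping patients with none."""
    flagged = {}
    for patient_id, medications in patients.items():
        conflicts = find_conflicts(medications)
        if conflicts:
            flagged[patient_id] = conflicts
    return flagged


if __name__ == "__main__":
    patients = {
        "p1": ["Simvastatin", "drug_x"],
        "p2": ["simvastatin"],
        "p3": ["drug_x", "drug_y"],
    }
    print(flag_patients(patients))
```

Storing each rule as a frozenset keeps the check independent of the order in which the drugs are listed. Dose-specific limits would need an extra field per rule and are left out here.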

Approach to clinical data mining:

The data mining process is divided into four phases: i) data collection, ii) pre-processing, iii) data analysis, iv) application of knowledge.

1. Data collection: The clinical data of a patient is stored in two different formats: i) the medical transcription file (containing about 25 to 30% of the information) and ii) the EMR (containing about 75 to 80% of the information). In this phase, each patient's information from the transcription file and from the EMR is mapped together.

2. Pre-processing: For the analysis engine to work accurately, the input document must be in Clinical Document Architecture (CDA) format. So in the pre-processing phase, input documents are converted into CDA format.

3. Data analysis: The pre-processed data is analyzed and stored in a single structured format. Here, negation, SNOMED codes, RxNorm codes, ICD-9 codes, measurements, medication dosages, allergies, and smoking status are detected.

4. Application of knowledge: Using this knowledge, we build a new database. Querying this database can be useful in medical research and in improving patient health.
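
The analysis and application phases can be sketched roughly as follows, assuming the notes are already available as plain text. This is only an illustration: the cue list, the dosage pattern, the table layout, and all function names are assumptions, and mapping terms to SNOMED, RxNorm, or ICD-9 codes is left out entirely.

```python
import re
import sqlite3

# Hypothetical cue words; a real system would use a fuller negation lexicon.
NEGATION_CUES = ("no", "denies", "without", "negative for")

# Matches a word followed by a number and a unit, e.g. "simvastatin 40 mg".
DOSAGE_RE = re.compile(
    r"\b([A-Za-z]+)\s+(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml)\b", re.IGNORECASE
)


def extract_dosages(text):
    """Return (drug, amount, unit) tuples found in free text."""
    return [
        (m.group(1).lower(), float(m.group(2)), m.group(3).lower())
        for m in DOSAGE_RE.finditer(text)
    ]


def is_negated(sentence, term):
    """True if a negation cue appears before `term` in the sentence."""
    lowered = sentence.lower()
    idx = lowered.find(term.lower())
    if idx == -1:
        return False
    prefix = lowered[:idx]
    return any(
        re.search(r"\b" + re.escape(cue) + r"\b", prefix)
        for cue in NEGATION_CUES
    )


def build_database(records):
    """Load extracted facts into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE medication "
        "(patient_id TEXT, drug TEXT, amount REAL, unit TEXT)"
    )
    conn.execute(
        "CREATE TABLE finding (patient_id TEXT, term TEXT, negated INTEGER)"
    )
    for patient_id, note in records:
        for drug, amount, unit in extract_dosages(note):
            conn.execute(
                "INSERT INTO medication VALUES (?, ?, ?, ?)",
                (patient_id, drug, amount, unit),
            )
        # Split into sentences so negation is judged per sentence.
        for sentence in re.split(r"(?<=[.!?])\s+", note):
            if "smoking" in sentence.lower():
                conn.execute(
                    "INSERT INTO finding VALUES (?, ?, ?)",
                    (patient_id, "smoking", int(is_negated(sentence, "smoking"))),
                )
    return conn


if __name__ == "__main__":
    notes = [
        ("p1", "Patient takes simvastatin 40 mg daily. Denies smoking."),
        ("p2", "Started metformin 500 mg. Reports smoking one pack per day."),
    ]
    db = build_database(notes)
    query = "SELECT patient_id FROM finding WHERE term = 'smoking' AND negated = 0"
    print(db.execute(query).fetchall())
```

Keeping the negation flag next to each finding means a query can separate patients who smoke from patients whose notes only mention smoking in order to rule it out.
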
This process needs to be systematic and automatic. In this study, a model called the "data preparation framework" is proposed for data preparation. In the proposed model, the data from the source database is transformed into a flat table and verified by specialists, so that a dataset suitable for further machine learning can be produced.
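
Flattening source data into one row per patient could be sketched as below. The two input lists, their field names, and the function name are assumptions made for illustration; the article does not describe the actual source schema.

```python
from collections import defaultdict


def flatten(visits, labs):
    """Merge per-visit and per-lab rows into one flat row per patient."""
    table = defaultdict(dict)
    for visit in visits:
        row = table[visit["patient_id"]]
        row["visit_count"] = row.get("visit_count", 0) + 1
    for lab in labs:
        row = table[lab["patient_id"]]
        # Later rows overwrite earlier ones, so the input is assumed
        # to be in chronological order.
        row["lab_" + lab["test"]] = lab["value"]
    # Every row gets the same columns; missing values become None so a
    # specialist can spot and review them.
    columns = sorted({col for row in table.values() for col in row})
    flat = []
    for patient_id, row in sorted(table.items()):
        flat_row = {"patient_id": patient_id}
        for col in columns:
            flat_row[col] = row.get(col)
        flat.append(flat_row)
    return flat


if __name__ == "__main__":
    visits = [{"patient_id": "p1"}, {"patient_id": "p1"}, {"patient_id": "p2"}]
    labs = [{"patient_id": "p1", "test": "ldl", "value": 130.0}]
    for flat_row in flatten(visits, labs):
        print(flat_row)
```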

To validate the proposed model, a number of experiments were conducted. Two types of measurements were used as performance indicators: machine learning performance, measured by the area under the receiver operating characteristic (ROC) curve, and the clinical relevance of the selected variables, as assessed by physicians.
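
For reference, the area under the ROC curve can be computed directly from labels and predicted scores. The sketch below uses the pairwise definition (the fraction of positive/negative pairs in which the positive example scores higher, with ties counted as half); the function name and the sample values are illustrative and not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of predicted scores, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

The pairwise form is quadratic in the number of examples, which is fine for a small illustration; larger datasets would normally use a rank-based computation or a library routine.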

The results show significant performance improvements for each of the three preprocessing principles, as indicated by both types of measurement. Suitable data and heuristic rules to guide the work can help reduce the effort required, and thus make such development feasible.

Author's Bio: 

Joseph Hayden writes articles on Data Scraping Services, Web Data Scraping, Website Data Scraping, Web Screen Scraping, Web Data Mining, Web Data Extraction, etc.