This dataset includes 286 instances described by nine attributes, some of which are categorical. It is an example of an imbalanced dataset. The goal of the corresponding predictive task is to predict the recurrence of breast cancer.
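Because the classes are imbalanced, it helps to measure the imbalance and to keep class proportions intact when splitting the data. The sketch below uses only the Python standard library. The function names `imbalance_ratio` and `stratified_split` are hypothetical helpers, and the labels are synthetic (0 = majority, 1 = minority). They are not loaded from the dataset.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split sample indices so each class keeps roughly the same proportion
    in the train and test parts (hypothetical helper, not an OpenML API)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class only
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Synthetic labels for illustration: 70 majority (0) and 30 minority (1) samples.
labels = [0] * 70 + [1] * 30
print(imbalance_ratio(labels))  # ratio of 70 majority to 30 minority samples

train_idx, test_idx = stratified_split(labels, test_fraction=0.3)
print(Counter(labels[i] for i in test_idx))  # both classes kept at ~30% each
```

Splitting each class separately means the minority class cannot be left out of the test set by chance, which matters most when it is small.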
The dataset contains electronic health record (EHR) data collected over ten years (1999-2008) in clinical care units at 130 US hospitals and integrated delivery networks. It includes 101,766 observations, each describing the patient's condition at admission, the diagnosis, and the number of tests performed.
The dataset contains laboratory results for 615 individuals, both blood donors and Hepatitis C patients. Demographic features such as age are reported alongside the laboratory values.
This dataset was curated by combining five datasets over 11 common features, making it, according to its curators, the largest heart disease dataset available for research. Although it is shared on OpenML as a single dataset, the data originates from separate research studies and was merged as part of a meta-analysis.
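Combining several studies over a shared feature set amounts to keeping only the columns present in every source and then stacking the rows. The sketch below assumes each source is a list of records (dicts). The source names, column names, and values are hypothetical, and the code illustrates the general idea rather than the curators' actual procedure. Each row is also tagged with its origin so that study-level differences stay traceable after the merge.

```python
def merge_on_common_features(sources):
    """Stack records from several sources, keeping only the features that
    appear in every record, plus a 'source' column naming the origin."""
    # Intersect the column sets of all records across all sources.
    common = None
    for records in sources.values():
        for record in records:
            keys = set(record)
            common = keys if common is None else common & keys
    common = sorted(common or set())

    merged = []
    for name, records in sources.items():
        for record in records:
            row = {col: record[col] for col in common}
            row["source"] = name  # keep track of which study the row came from
            merged.append(row)
    return common, merged


# Hypothetical toy sources; column names and values are made up.
sources = {
    "study_a": [{"age": 54, "chol": 239, "thal": 3}],
    "study_b": [{"age": 61, "chol": 263, "smoker": 1}],
}
features, rows = merge_on_common_features(sources)
print(features)  # ['age', 'chol']
print(rows)
```

Features missing from even one source (here `thal` and `smoker`) are dropped, which is why a merged dataset can end up with fewer features than any of its inputs.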
This dataset was collected to identify patients with liver disease. The data comes from Andhra Pradesh, India, and contains information about 583 patients described by 11 variables.
The dataset originally came from the National Institute of Diabetes and Digestive and Kidney Diseases, but the data was restricted because of ethical guidelines. The objective is to predict whether a patient has diabetes based on certain diagnostic measurements. It is one of the most popular datasets used to introduce machine learning methods.
This dataset was created by combining six different sources, all collected in Australia. It is used to identify prognostic factors for thyroid disease among 30 features, which include blood test results as well as information from patient interviews.