Cardea
Problem Definition is a fundamental component that formulates the task for Machine Learning models. It involves generating and identifying two main concepts: the target variable and the cutoff times.
Therefore, the first step in working with Cardea is to define a Machine Learning task (or to use one of the predefined tasks). For example, Missed Appointment is a common task that aims to predict whether a patient will show up for an appointment, helping hospitals optimize their scheduling policies and use their resources efficiently.
Continuing with the previous example, the Missed Appointment task is currently defined in the system as a binary classification task that determines, from the point at which the appointment is scheduled, whether a patient will show up for it.
Usually, the outcome is defined over the FHIR data schema, using the resource id values for references between instances.
As stated above, the success of the Problem Definition step and its outcome depend on two main concepts: the target variable and the cutoff times. The target variable defines the model output, and Cardea generates it automatically if it does not already exist in the dataset. Cutoff times, on the other hand, split the data so that events before the cutoff time are used for training, while events after the cutoff time are used for testing. The following code shows the format of these values in the Missed Appointment task:
In [1]: from cardea import Cardea

In [2]: cardea = Cardea()

In [3]: cardea.load_entityset(data='kaggle')

In [4]: cardea.select_problem('MissedAppointment')
Out[4]:
                             time  instance_id   label
5642903 2016-04-29 18:38:08+00:00      5642903  noshow
5642503 2016-04-29 16:08:27+00:00      5642503  noshow
5642549 2016-04-29 16:19:04+00:00      5642549  noshow
5642828 2016-04-29 17:29:31+00:00      5642828  noshow
5642494 2016-04-29 16:07:23+00:00      5642494  noshow
...                           ...          ...     ...
5651768 2016-05-03 09:15:35+00:00      5651768  noshow
5650093 2016-05-03 07:27:33+00:00      5650093  noshow
5630692 2016-04-27 16:03:52+00:00      5630692  noshow
5630323 2016-04-27 15:09:23+00:00      5630323  noshow
5629448 2016-04-27 13:30:56+00:00      5629448  noshow

[110527 rows x 3 columns]
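To make the role of the cutoff time concrete, here is a minimal sketch in plain Python of the split described above. It is not Cardea's implementation: the `EVENTS` list, the `split_by_cutoff` helper, and the chosen cutoff are hypothetical, and treating an event stamped exactly at the cutoff as "before" is an assumption.

```python
from datetime import datetime, timezone

# Hypothetical appointment events: (instance_id, scheduling time in UTC).
EVENTS = [
    (5629448, datetime(2016, 4, 27, 13, 30, 56, tzinfo=timezone.utc)),
    (5630323, datetime(2016, 4, 27, 15, 9, 23, tzinfo=timezone.utc)),
    (5642494, datetime(2016, 4, 29, 16, 7, 23, tzinfo=timezone.utc)),
    (5650093, datetime(2016, 5, 3, 7, 27, 33, tzinfo=timezone.utc)),
]


def split_by_cutoff(events, cutoff):
    """Partition events into (before, after) relative to the cutoff time.

    Assumption: an event stamped exactly at the cutoff counts as "before".
    """
    before = [event for event in events if event[1] <= cutoff]
    after = [event for event in events if event[1] > cutoff]
    return before, after


# Hypothetical cutoff: midnight UTC on 2016-04-29.
cutoff = datetime(2016, 4, 29, tzinfo=timezone.utc)
train_events, test_events = split_by_cutoff(EVENTS, cutoff)
print(len(train_events), len(test_events))
```

With this sample data, the two events from 27 April fall before the cutoff and the other two fall after it.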
Cardea encapsulates six prediction problems for users to explore easily. They are described as follows:
Diagnosis Prediction: Predicts whether a patient will be diagnosed with a specified diagnosis.
Length of Stay: Predicts how many days the patient will be in the hospital.
Missed Appointment: Predicts whether the patient will show up for the appointment or not.
Mortality Prediction: Predicts whether a patient will die.
Prolonged Length of Stay: Predicts whether a patient will stay in the hospital for more or less than a given period of time (a week by default).
Readmission: Predicts whether a patient will revisit the hospital within a certain period of time (a month by default).
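The two threshold-based problems can be illustrated with a small sketch of how their binary labels might be derived from dates. This is an assumption-laden illustration, not Cardea's code: the function names are hypothetical, "a month" is approximated as 30 days, and a stay or gap exactly equal to the threshold is handled by an arbitrary choice noted in the comments.

```python
from datetime import date, timedelta


def is_prolonged_stay(admitted, discharged, threshold=timedelta(days=7)):
    """Return True if the stay lasted longer than the threshold.

    Assumption: a stay of exactly the threshold length is not prolonged.
    """
    return (discharged - admitted) > threshold


def is_readmission(discharged, next_admitted, window=timedelta(days=30)):
    """Return True if the next admission falls within the window.

    Assumptions: "a month" is approximated as 30 days, the window is
    inclusive, and next_admitted is None when there is no later visit.
    """
    if next_admitted is None:
        return False
    gap = next_admitted - discharged
    return timedelta(0) <= gap <= window


# A nine-day stay exceeds the default one-week threshold.
print(is_prolonged_stay(date(2020, 1, 1), date(2020, 1, 10)))
# A return visit 15 days after discharge falls inside the default window.
print(is_readmission(date(2020, 1, 10), date(2020, 1, 25)))
```

Passing a different `threshold` or `window` mirrors the idea that the default periods can be changed.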
You can see the list of problems using the list_problems(...) method. For example:
In [5]: from cardea import Cardea

In [6]: cardea = Cardea()

In [7]: cardea.list_problems()
Out[7]:
{'DiagnosisPrediction',
 'LengthOfStay',
 'MissedAppointment',
 'MortalityPrediction',
 'ProlongedLengthOfStay',
 'Readmission'}
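Since problems are selected by name, it can help to check a name against the available set before using it. The sketch below does this in plain Python with the set shown in the output above copied into a constant. The `check_problem_name` helper is hypothetical and not part of Cardea, and in practice the set would presumably come from `cardea.list_problems()`.

```python
from difflib import get_close_matches

# Problem names copied from the list_problems() output.
PROBLEMS = {
    'DiagnosisPrediction',
    'LengthOfStay',
    'MissedAppointment',
    'MortalityPrediction',
    'ProlongedLengthOfStay',
    'Readmission',
}


def check_problem_name(name, available=PROBLEMS):
    """Return name if it is a known problem, else raise ValueError.

    Hypothetical helper: suggests the closest known name on a typo.
    """
    if name in available:
        return name
    suggestions = get_close_matches(name, sorted(available), n=1)
    hint = f" Did you mean {suggestions[0]!r}?" if suggestions else ""
    raise ValueError(f"Unknown problem {name!r}.{hint}")


print(check_problem_name('Readmission'))
```

A misspelled name such as 'MissedApointment' raises a ValueError whose message suggests 'MissedAppointment'.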