What should I look for when exploring a dataset?

During the Explore phase, you should (i) validate that the system has loaded and parsed your data correctly, and (ii) look for obvious correlations, features, or insights in your dataset that can help you interpret model results later, during the Training stage. 

  • Data validation. In the Load Dataset step, once you've loaded the data, take a quick look at the sample data shown in the Source Data card. Do all the fields appear? Is the data parsed correctly? Is this the dataset you intended to process? 
  • Data insights. In the Explore Variables step, take a quick look at some of the variables of interest. First, check whether the metadata makes sense: is the fill rate (the percentage of rows with a non-missing value for this field) what you expected? Is the unique count what you expected? Keep in mind that the system computes these statistics on a sample of the dataset, not the full dataset (see Sample Count in the variables list). Then select a variable in the list and look at the distribution of its values in the Variable Detail card. Is it what you expected? You can also examine correlations between variables in the Correlation Analysis card. Again, are they what you expected? 
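The product computes these statistics for you. If you also want to sanity-check the same quantities on your own export of the data, here is a minimal sketch in plain Python. It assumes each row is a dictionary keyed by field name and treats `None` and empty strings as missing values (an assumption; adjust it to your data). It uses Pearson correlation, which may not be the measure the Correlation Analysis card uses. All function names and the sample rows are hypothetical.

```python
import math
import random
from typing import Any, Dict, Iterable, List, Optional, Set

# Hypothetical representation: each row is a dict keyed by field name.
Row = Dict[str, Any]


def is_missing(value: Any) -> bool:
    # Assumption: None and empty strings count as missing values.
    return value is None or value == ""


def missing_fields(rows: List[Row], expected: Iterable[str]) -> Set[str]:
    # Data validation: expected fields that never appear in any row.
    present = {key for row in rows for key in row}
    return set(expected) - present


def sample_rows(rows: List[Row], k: int, seed: int = 0) -> List[Row]:
    # Mimic computing statistics on a random sample rather than the full dataset.
    if len(rows) <= k:
        return list(rows)
    return random.Random(seed).sample(rows, k)


def fill_rate(rows: List[Row], field: str) -> float:
    # Percentage of rows with a non-missing value for `field`.
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if not is_missing(row.get(field)))
    return 100.0 * filled / len(rows)


def unique_count(rows: List[Row], field: str) -> int:
    # Number of distinct non-missing values for `field`.
    return len({row[field] for row in rows if not is_missing(row.get(field))})


def pearson(rows: List[Row], x_field: str, y_field: str) -> Optional[float]:
    # Pearson correlation over the rows where both fields are present.
    pairs = [
        (float(row[x_field]), float(row[y_field]))
        for row in rows
        if not is_missing(row.get(x_field)) and not is_missing(row.get(y_field))
    ]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined when a variable is constant
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample rows for illustration.
rows = [
    {"age": 34, "income": 52000, "city": "Paris"},
    {"age": 28, "income": 41000, "city": ""},
    {"age": 45, "income": 75000, "city": "Lyon"},
    {"age": None, "income": 39000, "city": "Paris"},
]

sample = sample_rows(rows, k=1000)
print(missing_fields(sample, ["age", "income", "city", "zip"]))  # {'zip'}
print(fill_rate(sample, "city"))  # 75.0
print(unique_count(sample, "city"))  # 2
print(pearson(sample, "age", "income"))
```

With a real export, you would replace `rows` with your loaded records. Because the product works on a sample, small differences between these numbers and the Variable Detail or Correlation Analysis cards are expected; large differences may point to a parsing problem.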

Once you've confirmed that your dataset meets your expectations, you can move on to the Training stage. 

