TwoRavens

TypeGuessUtil

TypeGuessUtil is the first utility object used during the profiling process. It provides several functions for checking the data type of the current variable.

TypeGuessUtil (col_series, col_info)

      Creates a TypeGuessUtil, runs the data type analysis, and fills the corresponding variables in the given ColumnInfo object.


check_types ()

      Runs the data type analysis and fills the corresponding variables in the given ColumnInfo object.


is_not_numeric (var_series)

      Static method. Checks whether the input column is numeric. Returns True if it is not numeric, False otherwise.
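
      The inverted return value is easy to misread, so here is a minimal sketch of the convention. It is not the TwoRavens implementation: looks_not_numeric is a hypothetical stand-in that takes a plain iterable rather than a pandas Series, and it assumes that "numeric" means every non-missing value can be parsed as a float.

```python
def looks_not_numeric(values):
    """Hypothetical stand-in: True if any non-missing value is not a number."""
    for value in values:
        if value is None:
            continue  # assumption: missing values are ignored
        try:
            float(value)
        except (TypeError, ValueError):
            return True  # found a value that cannot be read as a number
    return False  # every value parsed, so the column is numeric


print(looks_not_numeric(["1", 2.5, None]))  # False: the column is numeric
print(looks_not_numeric(["1", "abc"]))      # True: the column is not numeric
```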


is_not_logical (var_series)

      Static method. Checks whether the input column contains boolean values.


check_nature (data_series, continuous_check)

      Static method. Checks the nature of the input column and returns the corresponding nature type.


check_time (var_series)

      Static method. Checks whether the input column is a time instance. Returns the date format where possible, ‘?’ if the format cannot be determined, and None if it is not a time instance.

This runs a series of tests on each value in the series, verifies that the ratio of good matches to sample size meets the (currently predefined) threshold, and then returns the most common match (or None).

  1. Sanitize value
  2. If value is an int between 1600 and 2100, it is a year
  3. Check if value is a month such as Jul or July
  4. Check if value is a day such as Sat or Saturday
  5. Filter out non-date values such as 0.1
  6. Check if value passes dateutil.parser.parse
  7. Check if variable is called ‘year’
  8. If “pandas.core.tools.datetimes._guess_datetime_format” is valid, return that value, otherwise ‘?’
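
The first few steps and the threshold vote can be sketched with the standard library alone. This is an illustration under stated assumptions, not the TwoRavens code: classify_value and most_common_match are hypothetical names, the 0.5 threshold is a placeholder for the predefined one, the labels "year", "month", and "day" are invented for the sketch, and steps 5 to 8 (dateutil parsing, the variable-name check, and format guessing) are left out.

```python
import calendar
from collections import Counter

# Full and abbreviated English month/day names, lower-cased.
# Index 0 of the calendar month lists is an empty string, so it is skipped.
MONTHS = {m.lower() for m in list(calendar.month_name)[1:] + list(calendar.month_abbr)[1:]}
DAYS = {d.lower() for d in list(calendar.day_name) + list(calendar.day_abbr)}


def classify_value(value):
    """Return a coarse label for one value, or None if no test matches."""
    text = str(value).strip().lower()  # step 1: sanitize
    if text.isdecimal() and 1600 <= int(text) <= 2100:
        return "year"  # step 2
    if text in MONTHS:
        return "month"  # step 3
    if text in DAYS:
        return "day"  # step 4
    return None  # values such as 0.1 fall through; steps 5-8 are not modelled


def most_common_match(values, threshold=0.5):
    """Return the most common label if matches / sample size >= threshold."""
    values = list(values)
    labels = [label for label in map(classify_value, values) if label is not None]
    if not values or len(labels) / len(values) < threshold:
        return None
    return Counter(labels).most_common(1)[0][0]


print(most_common_match(["Jul", "July", "Sat", 0.1]))  # month
print(most_common_match([0.1, 0.2, "abc"]))            # None
```

In the first call three of the four values match (two months, one day), which clears the placeholder threshold, and "month" wins the vote.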

check_location (var_series)

      Static method. Checks whether the input column is a location. Returns ‘US state’, ‘country’, or ‘country subdivision’ if it is, None otherwise.

This runs a series of tests on each value in the series, verifies that the ratio of good matches to sample size meets the (currently predefined) threshold, and then returns the most common match (or None).

  1. Sanitize value
  2. Check if value is a US state using the us package
  3. Check if value is a country using the pycountry package
  4. Check if value is a country subdivision using the pycountry package
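
The order of the checks matters when a value is ambiguous. The sketch below assumes that the first matching check wins, which the step order suggests but the description does not state. The small lookup tables are hand-made stand-ins for the us and pycountry lookups, and classify_location is a hypothetical name.

```python
# Stand-in tables (assumption): the real checks query the us and pycountry packages.
US_STATES = {"ga", "georgia", "md", "maryland"}
COUNTRIES = {"ga", "gabon", "de", "germany"}
SUBDIVISIONS = {"de-by", "us-ca"}

# Order mirrors steps 2-4 of the list above.
CHECKS = [
    ("US state", US_STATES),
    ("country", COUNTRIES),
    ("country subdivision", SUBDIVISIONS),
]


def classify_location(value):
    """Return the label of the first matching check, or None."""
    text = str(value).strip().lower()  # step 1: sanitize
    for label, table in CHECKS:
        if text in table:
            return label
    return None


print(classify_location(" GA "))     # US state (also Gabon's code, but step 2 runs first)
print(classify_location("Germany"))  # country
print(classify_location("DE-BY"))    # country subdivision
print(classify_location("42"))       # None
```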