TwoRavens

Preprocess Runner

PreprocessRunner is the core component of our service. It takes the input data frame, stores the task configuration, and runs the variable analysis. Its constructor and methods are listed below.


PreprocessRunner (dataframe, job_id=None, jsonld_citation=None, schema_info_dict=None, data_source_info=None, celery_task=None)

      Returns a new PreprocessRunner with the specified settings.


run_preprocess ()

      Runs the data profiling process. Returns True if the process completes correctly; otherwise an error message is logged and False is returned.


load_from_file (input_file)

      Static method. Creates a data frame by reading the given file. On success, returns an initialized PreprocessRunner and None. If any error occurs, returns None and an error message.
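
      The return conventions of load_from_file() and run_preprocess() suggest a calling pattern like the one sketched below. StubRunner is a hypothetical stand-in written only for this illustration: it reads a CSV with the standard library and is not the TwoRavens implementation. The profile() helper and the row_count attribute are likewise assumptions.

```python
import csv


class StubRunner:
    """Hypothetical stand-in that mimics the documented return conventions.

    This is NOT the TwoRavens PreprocessRunner.
    """

    def __init__(self, rows):
        self.rows = rows
        self.row_count = None  # assumed attribute, filled in by run_preprocess()

    @staticmethod
    def load_from_file(input_file):
        """Return (runner, None) on success or (None, error_message) on failure."""
        try:
            with open(input_file, newline="") as handle:
                rows = list(csv.DictReader(handle))
        except OSError as err:
            return None, f"Could not read {input_file}: {err}"
        return StubRunner(rows), None

    def run_preprocess(self):
        """Return True if profiling succeeded, False otherwise."""
        if not self.rows:
            return False
        self.row_count = len(self.rows)
        return True


def profile(input_file):
    """Load a file, run the profiling step, and pass any error back to the caller."""
    runner, err = StubRunner.load_from_file(input_file)
    if err is not None:
        # Loading failed: there is no runner to work with.
        return None, err
    if not runner.run_preprocess():
        return None, "profiling failed"
    return runner, None
```

      The caller checks the error slot before touching the runner, and checks the boolean from run_preprocess() before reading any results.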


load_update_file (preprocess_input, update_input)

      Static method. Initializes a PreprocessRunner from a JSON file that contains sufficient information. The JSON file should have the same content as the output of get_final_json_indented().


get_self_section ()

      Returns a JSON string containing only the information in the self section.


get_datset_level_section ()

      Returns a JSON string containing only the information in the dataset-level section.


show_final_info (indent=None)

      Prints a JSON string containing the self section, dataset-level section, variable section, and variable display section.


get_final_json_indented ()

      Returns an indented JSON string with the same content as the string printed by show_final_info().


get_final_dict (as_string)

      Returns a non-indented JSON result string with the same content as the string printed by show_final_info().
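
      The difference between the indented and non-indented outputs can be illustrated with the standard json module. This is only a sketch: the section keys and values below are hypothetical placeholders, and nothing here claims that PreprocessRunner produces its strings this way.

```python
import json

# Hypothetical section names and values, for illustration only.
result = {
    "self": {"description": "example"},
    "dataset": {"row_cnt": 2},
    "variables": {"a": {"numchar": "numeric"}},
    "variable_display": {"a": {"viewable": True}},
}

# Indented form: one key per line, easier for people to read.
indented = json.dumps(result, indent=4)

# Non-indented form: a single line, more compact for storage or transfer.
compact = json.dumps(result)

# Both strings decode to the same content; only the whitespace differs.
same_content = json.loads(indented) == json.loads(compact)
```

      Because both forms carry identical content, either can be parsed back into the same structure.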