




1 TABLE OF CONTENTS
















2 PROLOGUE


To give the reader some context about this project, this chapter provides general information on:

  • the assignment
  • the main script
  • the report

A summary of the analysis is not included in this chapter; it can be found in the chapter SYNOPSIS.




2.1 About The Assignment

This project was created for the 2nd peer-graded assignment of:

Course 5: Reproducible Research,
from Data Science Specialization,
by Johns Hopkins University,
at Coursera

The course is taught by:

  • Jeff Leek, PhD
  • Roger D. Peng, PhD
  • Brian Caffo, PhD

As stated by the instructors of the course:

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

The assignment asks for two questions to be addressed:

Your data analysis must address the following questions:

Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Question 2: Across the United States, which types of events have the greatest economic consequences?

based on the observations in the supplied dataset:

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size.

Some quite general guidelines and a tip were provided:

Consider writing your report as if it were to be read by a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events. However, there is no need to make any specific recommendations in your report.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

A more educational approach was deliberately adopted, aiming to produce a well-justified and self-explanatory product that can serve as a guide for beginners on how to build a basic pipeline that produces a report with an analysis from scratch.

All the requirements for the assignment were followed, with one exception:

  • due to the book-like structure that was adopted for the report it was considered more appropriate to include the SYNOPSIS not immediately after the title, but as a separate chapter after the PROLOGUE








2.2 About The Main Script

The GitHub repository https://github.com/jzstats/Reproducible-Research--2nd-Assignment hosts all the material relevant to this project, including the main script RepRes_____analysis.Rmd, which contains the code used to conduct the analysis.

When knitted directly from RStudio, it produces the Markdown file RepRes_____analysis.md with the analysis.

In addition, it was rendered with the script render_____RepRes_analysis.R (as explained in the following section of this chapter, 2.3 About The Report) to produce a bookdown variation that was uploaded to RPubs and used to populate the webpage created to showcase this project.








2.3 About The Report

The main Rmd file, RepRes_analysis.Rmd, which contains the code to conduct the analysis and produces the Markdown document RepRes_analysis.md, was rendered with the script render_____RepRes_analysis.R to create a bookdown version of the report with the analysis. That version is hosted at the webpage created to showcase this project.















3 SYNOPSIS


The U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Events Database was explored to identify the most harmful weather event types, among the weather phenomena defined in NATIONAL WEATHER SERVICE INSTRUCTION 10-1605, AUGUST 17, 2007 (at chapter 7), with respect to population health and the economy.

The raw data was loaded in R from the supplied file and preprocessed, the target data subset was extracted, in-record validation was conducted, the majority of missing values were imputed (via a deterministic and conservative approach), the observations were cross-validated, and finally the table with the processed data was created, which contained all the information needed to address the two questions of interest:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

For the first question, the harm to population health caused by each weather event type was evaluated (separately) based on the average impact of the observations that resulted in non-zero damage, for each of the three perspectives (fatalities, injuries and casualties) that were considered to be of importance.

Similarly, for the second question, the harm to the economy caused by each weather event type was evaluated (separately) based on the average impact of the observations that resulted in non-zero damage, for each of the three perspectives (property damage, crop damage and economic damage) that were considered to be of importance.

For both questions, the main criterion used to rank the included weather event types (from the most to the least harmful) for each perspective was the overall average damage, calculated from the observations that caused non-zero damage. In addition, the average for the 90% of cases with the lowest impact and the average for the 10% of cases with the highest impact were reported for each included weather event type. These provide a more complete and insightful ‘picture’ of the consequences of each weather event type, because for all perspectives the distributions for the majority of weather event types were highly positively skewed.
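
The three averages described above can be illustrated with a minimal sketch. The helper summarise_nonzero_damage() and the example values are hypothetical and are not taken from the main script. The sketch also assumes that the split between the lowest 90% and the highest 10% is made by rank, with at least one case in the top group; the original analysis may define the split differently.

```r
# Hypothetical helper (not part of the original script): summarise the
# non-zero damage values recorded for a single weather event type.
summarise_nonzero_damage <- function(damage) {
    # Keep only the observations with non-zero damage, sorted ascending
    nonzero <- sort(damage[!is.na(damage) & damage > 0])
    n <- length(nonzero)
    if (n == 0L) {
        return(c(mean_overall = NA_real_,
                 mean_lowest_90 = NA_real_,
                 mean_highest_10 = NA_real_))
    }
    # Assumption: the highest 10% is taken by rank, with at least one case
    n_top <- max(1L, n %/% 10L)
    n_low <- n - n_top
    c(
        mean_overall    = mean(nonzero),
        mean_lowest_90  = if (n_low > 0L) mean(head(nonzero, n_low)) else NA_real_,
        mean_highest_10 = mean(tail(nonzero, n_top))
    )
}

# Made-up fatality counts for one event type (zeros are dropped)
fatalities_example <- c(0, 0, 1, 1, 1, 1, 2, 2, 3, 5, 10, 100)
damage_summary <- summarise_nonzero_damage(fatalities_example)
print(damage_summary)
```

With a strongly skewed example like this one, the overall mean sits far above the mean of the lowest 90% of cases, which is why reporting all three values gives a fuller picture than the overall mean alone.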

The analysis was structured, performed and documented in a way that strengthens the reproducibility of the report and explains every required detail, so that even a non-expert can follow the procedure and understand the thought process behind the decisions made at each stage.















4 STORM EVENTS DATASET


To conduct the analysis for this project, the file with the raw data, repdata_data_StormData.csv.bz2, was used. It contains data from the Storm Events Dataset, gathered and made publicly available by the U.S. National Oceanic and Atmospheric Administration (NOAA).

Some general information about the dataset, as well as two points of interest, are discussed to provide the necessary insight into why the decisions that govern the approach adopted in this analysis were made.




4.1 General Information

The version of the dataset used in this analysis contains observations for the severe weather events that happened (or more accurately, began) from January 1950 to November 2011 in the United States.

Further details about the dataset used in this analysis can be found in the supplemental material provided with the instructions of the assignment:

For additional information on the Storm Events Dataset, as well as an updated and cleaner version of the data, with observations from January 1950 up to January 2020 (at the time this report was produced; it is expected to continue to be updated), it is recommended to visit and explore:

Finally, a document with detailed information on the history of the dataset was available at NOAA’s Storm Events Dataset webpage for the version history.








4.2 Points Of Interest

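In order to understand why some of the decisions that govern the approach adopted in this analysis were made, it is essential to take into account two crucial facts about the observations recorded in the dataset:

  • the changes in the composition of weather event types
  • the eligibility criteria for inclusion of weather events in the dataset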




4.2.1 Changes in the composition of weather event types

Through the years, as the publicity of the dataset soared, several aspects of the data collection procedure changed in order to expand, enrich and strengthen the quality of the data.

As a result, the number of defined weather event types that were collected increased several times, starting from just one (TORNADO) for the first few years and expanding to 48 defined weather event types at the time the dataset used in this analysis was created. Consequently, there are inconsistencies in the composition of weather event types between different periods that could affect the integrity of the analysis.

Furthermore, for the period from 1996 to 2000, while the number of weather event types being recorded had already increased significantly, the values for the weather event type were entered through a free text field, resulting in more than 950 distinct entries.

For this reason, it was decided to use only the observations since January 2001 for the analysis. From that point on, a drop-down menu replaced the free text field for entering the weather event type, so the majority of observations do not suffer from such problems, and the weather event types they contain include the majority of the latest defined weather event types.
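
As a rough illustration of this restriction, the sketch below keeps only the rows whose BGN_DATE falls on or after 1 January 2001. The three rows are made up, and the object names storm_events and events_since_2001 are hypothetical. The date format is inferred from the BGN_DATE values shown in the overview of the raw data, so it is an assumption rather than the main script's actual approach.

```r
# Made-up rows in the same layout as the BGN_DATE column of the supplied file
storm_events <- data.frame(
    REFNUM   = c(1L, 2L, 3L),
    BGN_DATE = c("4/18/1950 0:00:00", "12/31/2000 0:00:00", "1/1/2001 0:00:00"),
    stringsAsFactors = FALSE
)

# Parse only the date part; the trailing " 0:00:00" is ignored
# because the format string stops after the year
begin_date <- as.Date(storm_events$BGN_DATE, format = "%m/%d/%Y")

# Keep the observations for events that began on or after 1 January 2001
events_since_2001 <- storm_events[begin_date >= as.Date("2001-01-01"), ]
print(events_since_2001)
```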






4.2.2 Eligibility criteria for inclusion of weather events in the dataset

Out of all weather events that happened in the period from January 2001 to November 2011 in the United States and were classified as one of the types recorded at the time they occurred, only those that belonged to at least one of the following three groups were eligible to be included in the dataset:

  1. The occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce.
  2. Rare, unusual, weather phenomena that generate media attention.
  3. Other significant meteorological events, such as record maximum or minimum temperatures or precipitation that occur in connection with another event.

An important implication of the above policy must be highlighted:

  • Of all the weather phenomena that happened in the period from January 2001 to November 2011 in the United States and were of a type that was recorded at the time they occurred, the dataset contains only those that either resulted in harm (to population health or to the economy) or attracted high publicity.
  • Conversely, all the weather phenomena that happened in the period from January 2001 to November 2011 in the United States and neither caused any harm (to population health or to the economy) nor attracted high public interest were ignored, even if they were of a type that was recorded at the time they occurred.

Consequently, any conclusion drawn about a weather event type in general will inevitably be biased: it will overestimate the harm caused (to population health or to the economy), because the available sample is by design not representative of the overall population of weather phenomena (of the types that were recorded).

For this reason it was decided to use for the analysis:

  • Only the subset of observations that resulted in non-zero harm with respect to each of the perspectives of interest (fatalities, injuries and casualties) in order to determine the most harmful weather event types for the population health.
  • Only the subset of observations that resulted in non-zero harm with respect to each of the perspectives of interest (property damage, crop damage and economic damage) in order to determine the most harmful weather event types for the economy.















5 PRELIMINARY ACTIVITIES


Four preliminary tasks are executed to ensure (and to set up, when needed and possible) that the working directory and the R session are ready to proceed with the analysis.




5.1 Set The Random Seed

To strengthen the reproducibility of random events, the number 1234567890 was explicitly chosen and set as the random seed.

Note that the only random events in this analysis were the assignment of random positions to the labels in the plots by the function geom_label_repel() from the ggrepel library.
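
A minimal sketch of the effect of setting the seed is shown below. The seed value is the one stated above; the calls to runif() are only a stand-in for the random label placement and are not part of the main script.

```r
# Set the random seed stated in the text
set.seed(1234567890)
first_draw <- runif(3)

# Resetting the same seed reproduces exactly the same random numbers
set.seed(1234567890)
second_draw <- runif(3)

print(identical(first_draw, second_draw))
```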








5.3 Create All Required Directories

During the execution of the main script, RepRes_analysis.Rmd, several outputs are produced (and also included in the report), mostly to further enhance the reproducibility of the analysis.

All those files are exported to appropriate sub-directories inside a directory named outputs, which is created in the working directory.

# Create a list with the paths to all sub-directories 
# of the directory tree for the outputs of this analysis
directory_tree_____outputs <- list(
    "filepath_____outputs_____processed_data" = 
        file.path("outputs", "processed_data"),
    "filepath_____outputs_____harm_on_population_health_____figures" = 
        file.path("outputs", "harm_on_population_health", "figures"),
    "filepath_____outputs_____harm_on_population_health_____results" = 
        file.path("outputs", "harm_on_population_health", "results"),
    "filepath_____outputs_____harm_on_economy_____figures" = 
        file.path("outputs", "harm_on_economy", "figures"),
    "filepath_____outputs_____harm_on_economy_____results" = 
        file.path("outputs", "harm_on_economy", "results"),
    "filepath_____outputs_____reproducibility_support_____r_session" = 
        file.path("outputs", "reproducibility_support", "r_session"),
    "filepath_____outputs_____reproducibility_support_____MD5_checksums" =
        file.path("outputs", "reproducibility_support", "MD5_checksums")
    
)

# Create the directory tree for the outputs of the analysis.  
invisible(lapply(
    X = directory_tree_____outputs,
    FUN = function(filepath_of_subdirectory) {
        if ( ! dir.exists(filepath_of_subdirectory) ) {
            dir.create(filepath_of_subdirectory, recursive = TRUE)
        }
    }
))


# Check if all subdirectories of the directory for the outputs of the analysis 
# were successfully created.
do_the_directories_exists <- vapply(
    X = directory_tree_____outputs,
    FUN = dir.exists,
    FUN.VALUE = logical(1)
)

# If any of the sub-directories required for the outputs
# of the analysis could not be created,
# the process terminates
if (any(!do_the_directories_exists)) {
    stop(
        "\n",
        "Failed to create the directories: ", "\n",
        paste0("\t", directory_tree_____outputs[!do_the_directories_exists], "\n"),
        "The process is aborted for now.", "\n",
        "Please rerun the script or create the required sub-directories manually.", 
        "\n"
    )
}

If any of the sub-directories in the directory tree for the outputs of the analysis cannot be created, the process terminates.








5.4 Access The File With The Raw Data

The file named repdata_data_StormData.csv.bz2, which contains data from the Storm Events Dataset, was supplied for this assignment and used to conduct the analysis.

If the file doesn’t already exist in the working directory, an attempt is made to download it automatically.

# Path to the file with the compressed raw data.
filepath_____unprocessed_data <- "repdata_data_StormData.csv.bz2"

# The link supplied by the instructions of the assignment 
# to download the file with the compressed raw data.
url_to_download_the_data_file <- 
  "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# Check if the file 'repdata_data_StormData.csv.bz2', 
# with the compressed raw data is available at the working directory. 
## if it doesn't exist...
if ( !file.exists(filepath_____unprocessed_data) ) {

  message(
    "\n", 
    "The file, '", filepath_____unprocessed_data, "'", "\n", 
    "doesn't exist in the working directory.",
    "\n"
  )
  message(
    "\n", "Trying to download the file, ", "\n",
    "'", filepath_____unprocessed_data, "' ", "\n",
    "with the raw data from the url: ", "\n",
    "\t", "'",  url_to_download_the_data_file, "'"
  )
  
  ### ...an attempt is made to download it from the link supplied by assignment
  try(
    download.file(
      url = url_to_download_the_data_file,
      destfile = filepath_____unprocessed_data)
  )
  
  ## Checks if the file 'repdata_data_StormData.csv.bz2' 
  ## was successfully downloaded.  
  ### in case the file is not found at the working directory 
  ### after the attempt to download 
  ### the process terminates with an informative message 
  ### that explains the situation to the user
  if ( !file.exists(filepath_____unprocessed_data)  ) {
    stop(
      "\n", 
      "Failed to download the required file,", "\n",
      "'", filepath_____unprocessed_data, "'", "\n",
      "with the raw data.", "\n",
      "The process is aborted for now."
    )
  } 
} 

If the download fails, the process terminates.















6 DATA PROCESSING


The data processing pipeline started with the supplied file, repdata_data_StormData.csv.bz2, which contained raw data from the Storm Events Dataset, and produced the table with the processed data.

The pipeline consists of seven distinct stages:

  1. Load The Raw Data In R
    • The table with the raw data was created by loading the raw data from the supplied compressed file in R, with all variables coerced to character type. Post validation was conducted and an overview of the table with the raw data was presented.
  2. Preprocess The Raw Data
    • To create the table with the preprocessed data, prerequisites were verified for the variables required for the analysis; those variables were then selected from the table with the raw data and coerced to their appropriate types, and a key was set for the table. Post validation was conducted and an overview of the table with the preprocessed data was presented.
  3. Extract The Target Data Subset
    • From the table with the preprocessed data, only the subset of observations for the weather events that happened in the period from 2001 to 2011 and caused non-zero fatalities, injuries, property damage or crop damage was extracted. Post validation was conducted and an overview of the table with the target data subset was presented.
  4. Conduct In-Record Data Validation
    • The values of each variable in the table with the target data subset were validated against appropriate constraints for each column separately (independently of the other variables), and the entries that were found invalid were substituted with NAs to create the table with the in-record validated data. Post validation was conducted and an overview of the table with the in-record validated data was presented.
  5. Impute Missing Values
    • The missing values of each variable in the table with the in-record validated data were examined, and those that could be retrieved (in a deterministic and conservative way) were imputed to produce the table with the imputed data. Post validation was conducted and an overview of the table with the imputed data was presented.
  6. Conduct Cross-Record Data Validation
    • Each observation from the table with the imputed data was validated against appropriate constraints that spanned all available variables, and only those that were found valid were used to create the table with the cross-record validated data. Post validation was conducted and an overview of the table with the cross-record validated data was presented.
  7. Produce The Processed Data
    • From the table with the cross-record validated data, by appropriately transforming the available information, the table with the processed data was created. It contained the variables required to identify the most harmful weather event types with respect to population health and to the economy. Post validation was conducted.

At each stage of the data processing procedure, any fact that played a major role was highlighted and, when needed, examined. In keeping with the spirit of the assignment, the aim was to supply all the facts necessary to understand how and why the decisions behind this analysis were made, in order to create a well-justified, documented and reproducible report.




6.1 Load The Raw Data In R

Summary

The raw data was loaded in R from the supplied file repdata_data_StormData.csv.bz2 (which contains data from the Storm Events Dataset) to create the table with the raw data, which was then post validated, and some basic facts about it were highlighted.

Steps




6.1.1 Create the table with the raw data

The raw data was loaded in R directly from the supplied file repdata_data_StormData.csv.bz2 (a CSV file compressed with the bzip2 algorithm), with all variables deliberately coerced to character type in order to ensure that no information was lost or altered as a side effect of coercion. The first row of the file contains headers, which were used to automatically assign the names of all the variables in the table with the raw data that was created.
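
The idea of loading everything as character can be sketched with base R. The code below is not the main script's loading code: it writes a tiny, made-up bzip2-compressed CSV to a temporary file (the real file is repdata_data_StormData.csv.bz2) and reads it back with read.csv(), using colClasses = "character" so that no value is altered by automatic type detection.

```r
# Write a tiny, made-up bzip2-compressed CSV to a temporary file
example_file <- tempfile(fileext = ".csv.bz2")
connection <- bzfile(example_file, open = "wt")
writeLines(
    c("REFNUM,EVTYPE,FATALITIES",
      "1.00,TORNADO,0.00",
      "2.00,HAIL,1.00"),
    connection
)
close(connection)

# Read it back: the compression is detected automatically, the first
# row supplies the variable names, and every column stays 'character'
raw_data_example <- read.csv(
    example_file,
    header = TRUE,
    colClasses = "character",
    stringsAsFactors = FALSE
)
str(raw_data_example)
```

Because nothing is coerced, a value such as "1.00" is kept exactly as it appears in the file, and the decision on how to convert it can be made explicitly at a later stage.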






6.1.2 Conduct post validation for the table with the raw data

The table with the raw data was post validated to ensure that the data from the file was loaded in R correctly.
Three simple constraints were applied:

  1. It should contain 37 variables.
  2. It should contain 902297 observations.
  3. The type of all variables should be ‘character’.

(The expected number of variables and the expected number of observations were acquired interactively before the execution of the main script RepRes_analysis.Rmd and were then used to form the constraints for the post validation.)

The table with the raw data was valid.

TABLE 6.1.3-1: The results of post validation for the table with the raw data.
name                             items  passes  fails  nNA  error  warning
expected_number_of_variables     1      1       0      0    FALSE  FALSE
expected_number_of_observations  1      1       0      0    FALSE  FALSE
expected_variable_types          37     37      0      0    FALSE  FALSE
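
The three constraints can be expressed as a short base R sketch. This is an illustration rather than the code behind the table above. The two-row table is made up, and the expected counts are set to match it (for the real file they would be 37 and 902297).

```r
# Made-up stand-in for the table with the raw data
raw_data_example <- data.frame(
    REFNUM = c("1.00", "2.00"),
    EVTYPE = c("TORNADO", "HAIL"),
    stringsAsFactors = FALSE
)

# Expected dimensions (37 and 902297 for the real file)
expected_number_of_variables    <- 2L
expected_number_of_observations <- 2L

# One logical result per constraint
post_validation <- c(
    expected_number_of_variables =
        ncol(raw_data_example) == expected_number_of_variables,
    expected_number_of_observations =
        nrow(raw_data_example) == expected_number_of_observations,
    expected_variable_types =
        all(vapply(raw_data_example, is.character, logical(1)))
)

# Stop with an informative message if any constraint is violated
if (!all(post_validation)) {
    stop(
        "Post validation failed for: ",
        paste(names(post_validation)[!post_validation], collapse = ", ")
    )
}
print(post_validation)
```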






6.1.3 Overview of the table with the raw data

The table with the raw data contained 37 variables that were all of type ‘character’ and 902297 observations.

## Classes 'data.table' and 'data.frame':   902297 obs. of  37 variables:
##  $ STATE__   : chr  "1.00" "1.00" "1.00" "1.00" ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : chr  "97.00" "3.00" "57.00" "89.00" ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : chr  "0.00" "0.00" "0.00" "0.00" ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: chr  "0.00" "0.00" "0.00" "0.00" ...
##  $ COUNTYENDN: chr  "" "" "" "" ...
##  $ END_RANGE : chr  "0.00" "0.00" "0.00" "0.00" ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : chr  "14.00" "2.00" "0.10" "0.00" ...
##  $ WIDTH     : chr  "100.00" "150.00" "123.00" "100.00" ...
##  $ F         : chr  "3" "2" "2" "2" ...
##  $ MAG       : chr  "0.00" "0.00" "0.00" "0.00" ...
##  $ FATALITIES: chr  "0.00" "0.00" "0.00" "0.00" ...
##  $ INJURIES  : chr  "15.00" "0.00" "2.00" "2.00" ...
##  $ PROPDMG   : chr  "25.00" "2.50" "25.00" "2.50" ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : chr  "0.00" "0.00" "0.00" "0.00" ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : chr  "3040.00" "3042.00" "3340.00" "3458.00" ...
##  $ LONGITUDE : chr  "8812.00" "8755.00" "8742.00" "8626.00" ...
##  $ LATITUDE_E: chr  "3051.00" "0.00" "0.00" "0.00" ...
##  $ LONGITUDE_: chr  "8806.00" "0.00" "0.00" "0.00" ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : chr  "1.00" "2.00" "3.00" "4.00" ...
##  - attr(*, ".internal.selfref")=<externalptr>

There were no missing values (coded as NAs) in any of the variables it contained, but there were many empty values, which probably represent missing values. For some of the variables, a suspiciously large or small number of distinct values was observed.

TABLE 6.1.3-2: Facts about the variables at the table with the raw data.
Variable    Number of Distinct Values  Number of NAs  Number of Empty Values
STATE__     70                         0              0
BGN_DATE    16335                      0              0
BGN_TIME    3608                       0              0
TIME_ZONE   22                         0              0
COUNTY      557                        0              0
COUNTYNAME  29601                      0              1589
STATE       72                         0              0
EVTYPE      985                        0              0
BGN_RANGE   272                        0              0
BGN_AZI     35                         0              547332
BGN_LOCATI  54429                      0              287743
END_DATE    6663                       0              243411
END_TIME    3647                       0              238978
COUNTY_END  1                          0              0
COUNTYENDN  1                          0              902297
END_RANGE   266                        0              0
END_AZI     24                         0              724837
END_LOCATI  34506                      0              499225
LENGTH      568                        0              0
WIDTH       293                        0              0
F           7                          0              843563
MAG         226                        0              0
FATALITIES  52                         0              0
INJURIES    200                        0              0
PROPDMG     1390                       0              0
PROPDMGEXP  19                         0              465934
CROPDMG     432                        0              0
CROPDMGEXP  9                          0              618413
WFO         542                        0              142069
STATEOFFIC  250                        0              248769
ZONENAMES   25112                      0              594029
LATITUDE    1781                       0              47
LONGITUDE   3841                       0              0
LATITUDE_E  1729                       0              40
LONGITUDE_  3778                       0              0
REMARKS     436906                     0              287433
REFNUM      902297                     0              0
Note:
The table with the raw data contains 37 variables that are all of type ‘character’
and 902297 observations.
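
A per-variable summary of this kind can be computed with a few lines of base R. The sketch below uses a made-up two-column table, and the column names of variable_facts are hypothetical. As in the table above, an empty string counts as one of the distinct values.

```r
# Made-up stand-in for the table with the raw data
raw_data_example <- data.frame(
    EVTYPE     = c("TORNADO", "HAIL", "TORNADO"),
    CROPDMGEXP = c("", "K", ""),
    stringsAsFactors = FALSE
)

# One row of facts per variable
variable_facts <- data.frame(
    variable = names(raw_data_example),
    distinct_values = vapply(
        raw_data_example, function(x) length(unique(x)), integer(1)
    ),
    nas = vapply(
        raw_data_example, function(x) sum(is.na(x)), integer(1)
    ),
    empty_values = vapply(
        raw_data_example, function(x) sum(!is.na(x) & x == ""), integer(1)
    ),
    row.names = NULL,
    stringsAsFactors = FALSE
)
print(variable_facts)
```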








6.2 Preprocess The Raw Data

Summary

From the table with the raw data which contains 37 variables, only 9 were selected to create the table with preprocessed data and proceed with this analysis:

  1. REFNUM : an id that uniquely identifies each observation
  2. BGN_DATE : the date when each weather event began
  3. EVTYPE : the type of each weather event
  4. FATALITIES : the number of fatalities
  5. INJURIES : the number of injuries
  6. PROPDMG : the magnitude value of the damage caused in properties that could have been expressed in thousands, millions or billions of dollars, depending on the corresponding indicator value at the variable PROPDMGEXP
  7. PROPDMGEXP : an indicator value that denotes whether the corresponding magnitude value at the variable PROPDMG refers to thousands, millions or billions of dollars
  8. CROPDMG : the magnitude value of the damage caused in crops that could have been expressed in thousands, millions or billions of dollars, depending on the corresponding indicator value at the variable CROPDMGEXP
  9. CROPDMGEXP : an indicator value that denotes whether the corresponding magnitude value at the variable CROPDMG refers to thousands, millions or billions of dollars
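
To illustrate how a magnitude value and its indicator combine into a dollar amount, the sketch below assumes that the indicators are coded as "K", "M" and "B" for thousands, millions and billions. Only "K" is visible in the overview of the raw data, so the other two codes are an assumption. Any other indicator (including an empty one) yields NA here; how the main script treats such cases is not covered by this sketch.

```r
# Assumed indicator codes: K = thousands, M = millions, B = billions
damage_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Made-up magnitude values and indicators
PROPDMG    <- c(25, 2.5, 1.2, 0)
PROPDMGEXP <- c("K", "M", "B", "")

# Look up the multiplier for each indicator; unknown or empty
# indicators have no match and therefore produce NA
property_damage_usd <- PROPDMG * unname(damage_multiplier[PROPDMGEXP])
print(property_damage_usd)
```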

Because all variables in the table with the raw data were (deliberately) loaded as type ‘character’, some prerequisites about the format of the character string values they contained had to be verified before they were coerced to their appropriate types.

The variable REFNUM, once it was verified that its values uniquely identify each observation, was set as the key for the table with the preprocessed data.

Finally, post validation was conducted and some facts about the table with the preprocessed data were highlighted.

Steps




6.2.1 Verify the prerequisites for the selected variables

Two key points were checked for the values of the selected variables from the table with the raw data before proceeding to create the table with the preprocessed data:

  • the coercibility of the values for the selected variables
  • the uniqueness of the key values




6.2.1.1 Verify the coercibility of the values for the selected variables

The format of the character string values of the selected variables (REFNUM, BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP) from the table with the raw data was checked to verify that it was compatible with the new type that each of them should be coerced to. The purpose was to avoid losing information unknowingly, since values that are incompatible with the new type are automatically substituted by NAs during coercion.

The variables EVTYPE, BGN_DATE, PROPDMGEXP and CROPDMGEXP were already in their appropriate type (which is ‘character’), so no further adjustments were needed. On the other hand the variables REFNUM, FATALITIES and INJURIES had to be coerced from ‘character’ type to ‘integer’, while the type of the remaining two variables, PROPDMG and CROPDMG had to change from ‘character’ to ‘double’.

A validation was conducted to verify that:

  1. the values of the variable REFNUM can be coerced to ‘integer’ type
  2. the values of the variable FATALITIES can be coerced to ‘integer’ type
  3. the values of the variable INJURIES can be coerced to ‘integer’ type
  4. the values of the variable PROPDMG can be coerced to ‘double’ type
  5. the values of the variable CROPDMG can be coerced to ‘double’ type

The values of all selected variables were found to be compatible with the new type that each of them should be coerced to.

Table 6.2.1.1-1: The results of the validation for the compatibility of the format of the character string values at the selected variables from the table with raw data with the appropriate type that each of them should be coerced to, at the table of preprocessed data.
name                                      items   passes  fails  nNA  error  warning
REFNUM_value_is_coercible_to_integer      902297  902297  0      0    FALSE  FALSE
FATALITIES_value_is_coercible_to_integer  902297  902297  0      0    FALSE  FALSE
INJURIES_value_is_coercible_to_integer    902297  902297  0      0    FALSE  FALSE
PROPDMG_value_is_coercible_to_double      902297  902297  0      0    FALSE  FALSE
CROPDMG_value_is_coercible_to_double      902297  902297  0      0    FALSE  FALSE
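
A check of this kind can be sketched in base R as follows. The helper functions is_coercible_to_double() and is_coercible_to_integer() are hypothetical and may differ from the rules actually used in the main script. The idea is that a string is coercible if converting it produces no NA and, for integers, if the resulting number is whole and within the integer range.

```r
# Hypothetical helper: TRUE where the string converts to a number
is_coercible_to_double <- function(x) {
    !is.na(suppressWarnings(as.numeric(x)))
}

# Hypothetical helper: TRUE where the string converts to a whole
# number that fits in R's integer range
is_coercible_to_integer <- function(x) {
    as_number <- suppressWarnings(as.numeric(x))
    !is.na(as_number) &
        as_number == round(as_number) &
        abs(as_number) <= .Machine$integer.max
}

# Made-up values in the style of the raw data, plus two invalid ones
REFNUM  <- c("1.00", "2.00", "3.00")
PROPDMG <- c("25.00", "2.50", "abc")

print(is_coercible_to_integer(REFNUM))
print(is_coercible_to_double(PROPDMG))
print(is_coercible_to_integer("2.50"))
```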




6.2.1.2 Verify the uniqueness of the key values

The variable REFNUM, coerced to its proper type (‘integer’), was expected to uniquely identify each observation, making it an excellent choice for the key of the table with the preprocessed data (as well as for the rest of the tables generated at the following stages of the data processing pipeline).

Before setting REFNUM as the key, the claim that it uniquely identifies each observation was checked, to avoid surprises that could jeopardize the reproducibility of the analysis.

All values of the variable REFNUM were found to be distinct, and consequently they uniquely identify each observation.

Table 6.2.1.2-1: The results from the validation for the uniqueness of values from REFNUM variable at the table with the raw data.
name items passes fails nNA error warning
value_uniquely_identifies_the_observation 902297 902297 0 0 FALSE FALSE
Note:
The values at REFNUM variable were coerced to ‘integer’ type
before checking if they uniquely identify each observation.
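
A per-observation uniqueness test of this kind could look roughly like the sketch below. The helper name `is_unique_key` and the sample keys are assumptions made for illustration; the report's own validator is not reproduced here.

```r
# Hypothetical helper: TRUE where a key value occurs exactly once.
is_unique_key <- function(key) {
  # duplicated() flags repeats after the first occurrence; scanning from
  # both ends flags every member of a duplicated group
  !(duplicated(key) | duplicated(key, fromLast = TRUE))
}

# Made-up keys, coerced to integer before the check (as in the note above)
refnum_sketch <- as.integer(c("1", "2", "3", "3"))
unique_flags <- is_unique_key(refnum_sketch)
key_is_valid <- all(unique_flags)
```

A key is usable only when every flag is TRUE, which corresponds to zero ‘fails’ in the table above.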








6.2.2 Create the table with the preprocessed data

Having identified the variables from the table with the raw data that were required to proceed with the analysis (REFNUM, BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP), and having verified that they satisfied the necessary prerequisites, the table with the preprocessed data was created by selecting those 9 variables and coercing them to their appropriate type:

  1. REFNUM was selected and coerced from ‘character’ type to ‘integer’
  2. BGN_DATE was selected (no coercion happened as it was already of the proper type, ‘character’)
  3. EVTYPE was selected (no coercion happened as it was already of the proper type, ‘character’)
  4. FATALITIES was selected and coerced from ‘character’ type to ‘integer’
  5. INJURIES was selected and coerced from ‘character’ type to ‘integer’
  6. PROPDMG was selected and coerced from ‘character’ type to ‘double’
  7. PROPDMGEXP was selected (no coercion happened as it was already of the proper type, ‘character’)
  8. CROPDMG was selected and coerced from ‘character’ type to ‘double’
  9. CROPDMGEXP was selected (no coercion happened as it was already of the proper type, ‘character’)

and finally setting the variable REFNUM as the key of the table.
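
The select-and-coerce step can be sketched in base R as below. This is an assumption-laden illustration: `raw_sketch` is a made-up two-row stand-in for the raw data (with one extra column, STATE, that is not selected), and ordering by REFNUM only imitates the key. With a data.table, as used in the report, `setkey(preprocessed_data, REFNUM)` is the call that sorts the table and marks the key.

```r
# Made-up stand-in for the raw data; every column is read as character
raw_sketch <- data.frame(
  REFNUM = c("2", "1"),
  BGN_DATE = c("2/20/1951 0:00:00", "4/18/1950 0:00:00"),
  EVTYPE = c("TORNADO", "TORNADO"),
  FATALITIES = c("0", "0"),
  INJURIES = c("2", "15"),
  PROPDMG = c("25.00", "2.50"),
  PROPDMGEXP = c("K", "K"),
  CROPDMG = c("0.00", "0.00"),
  CROPDMGEXP = c("", ""),
  STATE = c("AL", "AL"),
  stringsAsFactors = FALSE
)

selected_vars <- c("REFNUM", "BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES",
                   "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
integer_vars <- c("REFNUM", "FATALITIES", "INJURIES")
double_vars <- c("PROPDMG", "CROPDMG")

# Select the 9 variables, then coerce the ones that need a new type
preprocessed_sketch <- raw_sketch[, selected_vars]
preprocessed_sketch[integer_vars] <-
  lapply(preprocessed_sketch[integer_vars], as.integer)
preprocessed_sketch[double_vars] <-
  lapply(preprocessed_sketch[double_vars], as.numeric)

# Base R has no table key; sorting by REFNUM only mimics its ordering
preprocessed_sketch <- preprocessed_sketch[order(preprocessed_sketch$REFNUM), ]
```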






6.2.3 Conduct post validation for the table with the preprocessed data

The table with the preprocessed data was post validated to ensure that:

  1. all and only, the variables required for the analysis were included
  2. all the observations from the table with the raw data were transferred
  3. each of the selected variables was coerced to its appropriate type
  4. no missing values were introduced as a result of the coercion
  5. REFNUM was set as the key of the table

# The validate package provides validator() and confront().
library(validate)

# Create a vector with the names of the expected variables
# at the table with the preprocessed data.
expected_variables_at_the_table_with_preprocessed_data <- c(
  "REFNUM", "BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", 
  "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP"
)

# Create a validator for the post validation of the preprocessed data.
V_____post_validation_of_table_with_preprocessed_data <- validator(
  # check if the table contains all and only the required variables 
  "all_and_only_the_required_variables_are_included" = 
    ( names(.) == expected_variables_at_the_table_with_preprocessed_data ),
  # check if all the observations were included.
  "all_observations_were_transfered" = nrow(.) == nrow(raw_data),
  # checks if each variable is coerced to its appropriate type
  "REFNUM_is_integer" = 
    ( paste(class(.[["REFNUM"]]), collapse = ",") == "integer" ),
  "BGN_DATE_is_character" = 
    ( paste(class(.[["BGN_DATE"]]), collapse = ",") == "character" ),
  "EVTYPE_is_character" = 
    ( paste(class(.[["EVTYPE"]]), collapse = ",") == "character" ),
  "FATALITIES_is_integer" = 
    ( paste(class(.[["FATALITIES"]]), collapse = ",") == "integer" ),
  "INJURIES_is_integer" = 
    ( paste(class(.[["INJURIES"]]), collapse = ",") == "integer" ),
  "PROPDMG_is_numeric" = 
    ( paste(class(.[["PROPDMG"]]), collapse = ",") == "numeric" ),
  "PROPDMGEXP_is_character" = 
    ( paste(class(.[["PROPDMGEXP"]]), collapse = ",") == "character" ),
  "CROPDMG_is_numeric" = 
    ( paste(class(.[["CROPDMG"]]), collapse = ",") == "numeric" ),
  "CROPDMGEXP_is_character" = 
    ( paste(class(.[["CROPDMGEXP"]]), collapse = ",") == "character" ),
  # check that no missing values were introduced as a result of coercion
  "no_missing_values_introduced" = ( mean(complete.cases(.)) == 1 ),
  # checks if the REFNUM is set as the key of the table
  "REFNUM_is_the_key_of_the_table" = ( attributes(.)[["sorted"]] == "REFNUM" )
)

# Confront the table with the preprocessed data with the validator
# which contains the constraints for the validity of the preprocessed data.
CF_____post_validation_of_table_with_preprocessed_data <- confront(
  dat = preprocessed_data,
  V_____post_validation_of_table_with_preprocessed_data
)

The table with the preprocessed data was valid.

Table 6.2.3-1: The results of post validation for the table with the preprocessed data.
name items passes fails nNA error warning
all_and_only_the_required_variables_are_included 9 9 0 0 FALSE FALSE
all_observations_were_transfered 1 1 0 0 FALSE FALSE
REFNUM_is_integer 1 1 0 0 FALSE FALSE
BGN_DATE_is_character 1 1 0 0 FALSE FALSE
EVTYPE_is_character 1 1 0 0 FALSE FALSE
FATALITIES_is_integer 1 1 0 0 FALSE FALSE
INJURIES_is_integer 1 1 0 0 FALSE FALSE
PROPDMG_is_numeric 1 1 0 0 FALSE FALSE
PROPDMGEXP_is_character 1 1 0 0 FALSE FALSE
CROPDMG_is_numeric 1 1 0 0 FALSE FALSE
CROPDMGEXP_is_character 1 1 0 0 FALSE FALSE
no_missing_values_introduced 1 1 0 0 FALSE FALSE
REFNUM_is_the_key_of_the_table 1 1 0 0 FALSE FALSE






6.2.4 Overview of the table with the preprocessed data

The table with the preprocessed data, contained 9 variables and 902297 observations.

The variable REFNUM was set as the key of the table.

## Classes 'data.table' and 'data.frame':   902297 obs. of  9 variables:
##  $ REFNUM    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: int  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : int  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"








6.3 Extract The Target Data Subset

Summary

From all the observations that the table with the preprocessed data contains, only the subset of the weather phenomena that began in 2001 or later and resulted in non-zero harm either to population health (caused fatalities or injuries) or to the economy (caused property damage or crop damage) will be used for this analysis (for the reasons that were discussed in detail at the section 4.2 Points Of Interest about the Storm Events Dataset).

The consistency of the date format at the BGN_DATE variable (which indicates when each weather phenomenon began) was checked first, as the variable was intended to be used for the identification. The eligible observations for the target data subset were then identified and extracted to create the table with the target data subset.

Finally post validation was conducted and some facts about the table with the target data subset were highlighted.

Steps




6.3.1 Identify the target subset of observations

Out of all available observations, it was decided to use for the analysis only the subset of weather phenomena that began in 2001 or later (due to the implications of changes in the composition of weather event types) and resulted in non-zero harm either to population health (caused fatalities or injuries) or to the economy (caused property damage or crop damage) (due to the implications of the eligibility criteria for inclusion of weather events in the dataset).

The format of the date values at the BGN_DATE variable from the table with preprocessed data had to be checked for consistency across all observations before it was used to form the first of the two constraints.

The eligible observations were finally identified by their key value (denoted by the variable REFNUM).

  • 6.3.1.1 Verify the consistency of date format
    • Verifies that the character string format of the values at the BGN_DATE variable is consistent.
  • 6.3.1.2 Identify the eligible observations
    • Identifies by their key value the observations at the table with preprocessed data that began in 2001 or later and resulted in non-zero harm either to population health (caused fatalities or injuries) or to the economy (caused property damage or crop damage)




6.3.1.1 Verify the consistency of date format

The year in the value of date at the variable BGN_DATE was intended to be used as one of the two criteria to identify the eligible observations for the target data subset at the next subsubsection.

That is why it was crucial at this point to verify that the date values are in the expected format, which, as indicated by the overview of the table with preprocessed data (as well as some interactive examination), seems to be:

  • M/D/YYYY 0:00:00
    • M stands for 1 or 2 characters for the month (e.g. “4/18/1950 0:00:00”)
    • D stands for 1 or 2 characters for the day (e.g. “6/8/1951 0:00:00”)
    • YYYY stands for 4 characters for the year
    • the value of year is followed by a space
    • 0:00:00 is a dummy part that stands for the time

Indeed all values for dates were found to be in the expected format.

Table 6.3.1.1-1: The results of the validation for the format of the character string values of dates from the variable BGN_DATE at the table with preprocessed data.
name items passes fails nNA error warning
expected_character_string_format_of_date 902297 902297 0 0 FALSE FALSE
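
A format check like the one reported above can be expressed with a regular expression. The pattern below is an assumption rather than the report's own rule: judging by the values shown in the overview of the preprocessed data (such as "4/18/1950 0:00:00"), it accepts one or two digits for the month and the day. The helper name `has_expected_date_format` is hypothetical.

```r
# Assumed pattern: 1-2 digit month, 1-2 digit day, 4 digit year,
# a space, and the dummy time part "0:00:00"
date_pattern <- "^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4} 0:00:00$"

# Hypothetical helper: TRUE where a value matches the assumed pattern
has_expected_date_format <- function(x) grepl(date_pattern, x)

# Two values in the expected format and two made-up counterexamples
date_sketch <- c("4/18/1950 0:00:00", "12/31/2011 0:00:00",
                 "1950-04-18", "4/18/50 0:00:00")
format_check <- has_expected_date_format(date_sketch)
```

Anchoring the pattern with `^` and `$` matters here, because an unanchored pattern would also pass values that merely contain a well-formed date.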




6.3.1.2 Identify the eligible observations

According to the discussion for the two points of interest for the Storm Events Dataset only a subset of observations will be used for this analysis. This target data subset includes only the observations which refer to weather phenomena that simultaneously satisfy the following two criteria:

  • began in 2001 or later, due to the implications of changes in the composition of weather event types
    • the year (extracted from the date value of the BGN_DATE variable and coerced to integer) must be equal to or larger than 2001
  • resulted in non-zero harm either to population health (caused fatalities or injuries) or to the economy (caused property damage or crop damage), due to the implications of the eligibility criteria for inclusion of weather events in the dataset
    • the value of at least one of the variables FATALITIES, INJURIES, PROPDMG and CROPDMG must be positive

Out of the 902297 observations from the table with preprocessed data, there were found:

  • 488692 observations which refer to weather phenomena that began in 2001 or later
  • 254633 observations which refer to weather phenomena that resulted in non-zero harm either to population health (caused fatalities or injuries) or to the economy (caused property damage or crop damage)

Table 6.3.1.2-1: The results for the eligibility criteria for inclusion of observations from the table with the preprocessed data in the target data subset.
name items passes fails nNA error warning
begin_date_from_2001_and_later 902297 488692 413605 0 FALSE FALSE
non_zero_damage_to_population_health_or_economy 902297 254633 647664 0 FALSE FALSE

The observations that satisfied both criteria simultaneously, and were therefore to be included in the target data subset, were identified by their key value (denoted by the variable REFNUM).

Exactly 144826 observations were found eligible to be included in the table with the target data subset.

Table 6.3.1.2-2: The number of observations that were found eligible to get included in the table with the target data subset.
Number of Eligible Observations for the Target Data Subset
144826
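
The two criteria and their intersection can be sketched as follows. The data frame `eligibility_sketch` is made up for illustration, and extracting the year with `sub()` is an assumption about how it might be done, not the report's own code.

```r
# Made-up observations, for illustration only
eligibility_sketch <- data.frame(
  REFNUM = 1:4,
  BGN_DATE = c("4/18/1950 0:00:00", "1/19/2001 0:00:00",
               "6/1/2005 0:00:00", "3/3/2010 0:00:00"),
  FATALITIES = c(1L, 0L, 0L, 0L),
  INJURIES = c(0L, 0L, 4L, 0L),
  PROPDMG = c(25, 0, 0, 0),
  CROPDMG = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Criterion 1: the year part of BGN_DATE, coerced to integer, is >= 2001
year <- as.integer(
  sub("^[0-9]{1,2}/[0-9]{1,2}/([0-9]{4}) .*$", "\\1", eligibility_sketch$BGN_DATE)
)
begins_from_2001 <- year >= 2001L

# Criterion 2: at least one of the four harm variables is positive
non_zero_harm <- with(
  eligibility_sketch,
  FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0
)

# Only observations that satisfy both criteria are eligible
eligible_refnum <- eligibility_sketch$REFNUM[begins_from_2001 & non_zero_harm]
```

Because the criteria are combined with a logical AND, the number of eligible observations (144826 in the report) is smaller than the number that passes either criterion on its own.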






6.3.2 Create the table with the target data subset

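From the table with the preprocessed data, the table with the target data subset was created by including only those observations that simultaneously satisfied the two criteria described at 6.3.1.2 Identify the eligible observations:

  • the weather phenomenon began in 2001 or later
  • the weather phenomenon resulted in non-zero harm either to population health or to the economy

The observations were identified and extracted by their key value (denoted by the variable REFNUM).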






6.3.3 Conduct post validation for the table with the target data subset

Post validation was conducted to verify that all observations contained at the table with the target data subset were eligible.

The same constraints that were used to identify the eligible observations from the table with preprocessed data were used to verify the eligibility of observations at the table with the target data subset.

All observations contained at the table with the target data subset were eligible.

Table 6.3.3-1: The results of the post validation for the table with the target data subset.
name items passes fails nNA error warning
begin_date_from_2001_and_later 144826 144826 0 0 FALSE FALSE
non_zero_damage_to_population_health_or_economy 144826 144826 0 0 FALSE FALSE
Note:
The same constraints that were used to identify the eligible observations from the table with the preprocessed data
were used for the post validation of the observations at the table with the target data subset.






6.3.4 Overview of the table with the target data subset

The table with the target data subset contained 9 variables and 144826 observations.

The variable REFNUM was set as the key of the table.

## Classes 'data.table' and 'data.frame':   144826 obs. of  9 variables:
##  $ REFNUM    : int  413607 413608 413609 413610 413611 413612 413613 413614 413615 413616 ...
##  $ BGN_DATE  : chr  "1/19/2001 0:00:00" "1/19/2001 0:00:00" "1/19/2001 0:00:00" "1/19/2001 0:00:00" ...
##  $ EVTYPE    : chr  "TSTM WIND" "TSTM WIND" "TSTM WIND" "TSTM WIND" ...
##  $ FATALITIES: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ INJURIES  : int  0 0 0 0 0 0 0 4 0 0 ...
##  $ PROPDMG   : num  10 8 2 15 5 3 10 450 150 3 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"

The variables EVTYPE, PROPDMGEXP and CROPDMGEXP contained suspiciously many distinct values: more than the 48 event types and the 3 magnitude codes (K, M and B) that the Storm Data Documentation defines.

Table 6.3.4: Facts about the variables at the table with the target data subset.
Variable Name Number of Distinct Values
REFNUM 144826
BGN_DATE 3746
EVTYPE 97
FATALITIES 31
INJURIES 101
PROPDMG 1162
PROPDMGEXP 4
CROPDMG 269
CROPDMGEXP 4
Note:
The table with the target data subset contains 9 variables
and 144826 observations without any missing value (coded as NA).
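
Counts of distinct values per variable, like those in the table above, can be obtained with a one-line helper. The name `count_distinct` and the sample data are hypothetical.

```r
# Hypothetical helper: number of distinct values in each column
count_distinct <- function(data) {
  vapply(data, function(column) length(unique(column)), integer(1))
}

# Made-up sample with an unexpected fourth code ("") in PROPDMGEXP
distinct_sketch <- data.frame(
  EVTYPE = c("TSTM WIND", "HAIL", "HAIL", "TORNADO", "HAIL"),
  PROPDMGEXP = c("K", "M", "B", "", "K"),
  stringsAsFactors = FALSE
)
distinct_counts <- count_distinct(distinct_sketch)
```

Comparing such counts with the number of values that the documentation allows is a cheap first signal that a variable contains invalid entries.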








6.4 Conduct In-Record Data Validation

Summary

Through the in-record data validation stage the values of each variable from the table with the target data subset were examined independently of the corresponding values at other variables, in order to identify invalid entries which were then substituted by missing values (coded properly as NA) to create the table with the in-record validated data.

Finally post validation was conducted and some facts about the table with the in-record validated data were highlighted.

Steps




6.4.1 Introduce information from the Storm Data Documentation

Some constants with the valid values for the variables EVTYPE, PROPDMGEXP and CROPDMGEXP (as stated at the Storm Data Documentation) were created and used in order to form their respective constraints.




6.4.1.1 Valid values for the EVTYPE variable

The entries of the variable EVTYPE according to the NATIONAL WEATHER SERVICE INSTRUCTION 10-1605, AUGUST 17, 2007 (at chapter 7), must take one of the 48 character values that correspond to the defined weather event types:

  1. ASTRONOMICAL LOW TIDE
  2. AVALANCHE
  3. BLIZZARD
  4. COASTAL FLOOD
  5. COLD/WIND CHILL
  6. DEBRIS FLOW
  7. DENSE FOG
  8. DENSE SMOKE
  9. DROUGHT
  10. DUST DEVIL
  11. DUST STORM
  12. EXCESSIVE HEAT
  13. EXTREME COLD/WIND CHILL
  14. FLASH FLOOD
  15. FLOOD
  16. FROST/FREEZE
  17. FUNNEL CLOUD
  18. FREEZING FOG
  19. HAIL
  20. HEAT
  21. HEAVY RAIN
  22. HEAVY SNOW
  23. HIGH SURF
  24. HIGH WIND
  25. HURRICANE/TYPHOON
  26. ICE STORM
  27. LAKE-EFFECT SNOW
  28. LAKESHORE FLOOD
  29. LIGHTNING
  30. MARINE HAIL
  31. MARINE HIGH WIND
  32. MARINE STRONG WIND
  33. MARINE THUNDERSTORM WIND
  34. RIP CURRENT
  35. SEICHE
  36. SLEET
  37. STORM SURGE/TIDE
  38. STRONG WIND
  39. THUNDERSTORM WIND
  40. TORNADO
  41. TROPICAL DEPRESSION
  42. TROPICAL STORM
  43. TSUNAMI
  44. VOLCANIC ASH
  45. WATERSPOUT
  46. WILDFIRE
  47. WINTER STORM
  48. WINTER WEATHER




6.4.1.2 Valid values for the PROPDMGEXP variable

The variable PROPDMGEXP indicates whether the magnitude of the economic damage (denoted by the PROPDMG variable) refers to thousands, millions or billions of dollars. According to the information provided by NATIONAL WEATHER SERVICE INSTRUCTION 10-1605, AUGUST 17, 2007 (at chapter 2.7), its entries must take one of the following 3 character values:

  1. K which corresponds to thousands of dollars
  2. M which corresponds to millions of dollars
  3. B which corresponds to billions of dollars




6.4.1.3 Valid values for the CROPDMGEXP variable

The variable CROPDMGEXP indicates whether the magnitude of the economic damage (denoted by the CROPDMG variable) refers to thousands, millions or billions of dollars. According to the information provided by NATIONAL WEATHER SERVICE INSTRUCTION 10-1605, AUGUST 17, 2007 (at chapter 2.7), its entries must take one of the following 3 character values:

  1. K which corresponds to thousands of dollars
  2. M which corresponds to millions of dollars
  3. B which corresponds to billions of dollars
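
To make the relation between a magnitude and its code concrete, the sketch below turns the three codes into multipliers. It only illustrates the meaning of K, M and B as described above; the names `magnitude_multiplier` and `to_dollars` are hypothetical, and the report's own conversion is not shown in this section.

```r
# The three valid codes and the factor that each one stands for
magnitude_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: damage in dollars for a magnitude and its code.
# Codes that are not K, M or B have no multiplier and yield NA.
to_dollars <- function(magnitude, code) {
  unname(magnitude * magnitude_multiplier[code])
}

# A PROPDMG of 25 with PROPDMGEXP "K" stands for 25 thousand dollars
damage_sketch <- to_dollars(c(25, 2.5), c("K", "M"))
unknown_code <- to_dollars(10, "")
```

Returning NA for any other code keeps invalid entries visible instead of silently treating them as zero damage.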






6.4.2 Conduct in-record data validation for each variable

To create the constraints for the in-record validation of each variable from the table with the target data subset, some ‘common world knowledge’ was combined with the information provided by the Storm Data Documentation about the valid values of the available variables.

Specifically:

  1. The REFNUM variable’s values must be unique for each observation.
  2. The BGN_DATE variable’s values must contain a year part that is from 2001 up to 2011.
  3. The EVTYPE variable’s values must be one of the 48 defined event types.
  4. The FATALITIES variable’s values must be non-negative.
  5. The INJURIES variable’s values must be non-negative.
  6. The PROPDMG variable’s values must be non-negative.
  7. The PROPDMGEXP variable’s values must be K, M or B.
  8. The CROPDMG variable’s values must be non-negative.
  9. The CROPDMGEXP variable’s values must be K, M or B.

It was not strictly necessary to test the constraints for all variables included in the table with the target data subset, because some of them had already been verified in previous stages of the data processing procedure (the uniqueness of the values in the key variable REFNUM, the fact that the year indicated in the BGN_DATE variable was from 2001 to 2011, and the fact that the values of the variables FATALITIES, INJURIES, PROPDMG and CROPDMG were non-negative). Such tests were nevertheless included in order to provide a detailed and complete overview of all the in-record constraints for the entries of each variable at the validated data table.

Actually, only the variables EVTYPE, PROPDMGEXP and CROPDMGEXP needed to be validated in this stage, as these were the ones that had not been checked properly yet.

According to the results of the in-record data validation, a substantial share of the values of EVTYPE (32775, or 22.63%), PROPDMGEXP (4158, or 2.87%) and CROPDMGEXP (55041, or 38.00%) were invalid.

Table 6.4.2-1: The results of the in-record data validation for the table with the target data subset.
name items passes fails nNA error warning
REFNUM 144826 144826 0 0 FALSE FALSE
BGN_DATE 144826 144826 0 0 FALSE FALSE
EVTYPE 144826 112051 32775 0 FALSE FALSE
FATALITIES 144826 144826 0 0 FALSE FALSE
INJURIES 144826 144826 0 0 FALSE FALSE
PROPDMG 144826 144826 0 0 FALSE FALSE
PROPDMGEXP 144826 140668 4158 0 FALSE FALSE
CROPDMG 144826 144826 0 0 FALSE FALSE
CROPDMGEXP 144826 89785 55041 0 FALSE FALSE

The invalid values for each variable were identified by their key value (denoted by the variable REFNUM).
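
As described in the summary of this section, the entries that fail their constraint are then substituted by NA. A minimal base R sketch of that substitution, with the hypothetical helper `blank_invalid` and made-up sample values, could be:

```r
valid_damage_codes <- c("K", "M", "B")

# Hypothetical helper: keep the valid entries and replace the rest by NA
blank_invalid <- function(x, valid) {
  x[!(x %in% valid)] <- NA
  x
}

# Made-up PROPDMGEXP-like values: "" and "0" are not valid codes
code_sketch <- c("K", "", "M", "0")
validated_codes <- blank_invalid(code_sketch, valid_damage_codes)
```

The vector keeps its character type and its length, so the observations stay aligned with their key values.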






6.4.4 Conduct post validation for the table with the in-record validated data

Post validation was conducted to verify that the values of the variables at the table with the in-record validated data were valid according to the same constraints that were used to identify the invalid values for each variable at the table with the target data subset.

No value at the table with the in-record validated data failed its constraint: the entries of EVTYPE, PROPDMGEXP and CROPDMGEXP that had been invalid (32775, 4158 and 55041 respectively) now appear as missing values in the nNA column.

Table 6.4.4-1: The results of post validation for the table with the in-record validated data.
name items passes fails nNA error warning
REFNUM 144826 144826 0 0 FALSE FALSE
BGN_DATE 144826 144826 0 0 FALSE FALSE
EVTYPE 144826 112051 0 32775 FALSE FALSE
FATALITIES 144826 144826 0 0 FALSE FALSE
INJURIES 144826 144826 0 0 FALSE FALSE
PROPDMG 144826 144826 0 0 FALSE FALSE
PROPDMGEXP 144826 140668 0 4158 FALSE FALSE
CROPDMG 144826 144826 0 0 FALSE FALSE
CROPDMGEXP 144826 89785 0 55041 FALSE FALSE
Note:
The same constraints that were used to identify the invalid values of each variable at the table with the target data subset
were used for the post validation of the observations at the table with the in-record validated data.






6.4.5 Overview of the table with the in-record validated data

The table with the in-record validated data contained 9 variables and 144826 observations.

The variable REFNUM was set as the key of the table.

## Classes 'data.table' and 'data.frame':   144826 obs. of  9 variables:
##  $ REFNUM    : int  413607 413608 413609 413610 413611 413612 413613 413614 413615 413616 ...
##  $ BGN_DATE  : chr  "1/19/2001 0:00:00" "1/19/2001 0:00:00" "1/19/2001 0:00:00" "1/19/2001 0:00:00" ...
##  $ EVTYPE    : chr  NA NA NA NA ...
##  $ FATALITIES: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ INJURIES  : int  0 0 0 0 0 0 0 4 0 0 ...
##  $ PROPDMG   : num  10 8 2 15 5 3 10 450 150 3 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  NA NA NA NA ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"

Plenty of missing values were introduced for the variables EVTYPE, PROPDMGEXP and CROPDMGEXP as a result of the in-record validation procedure, but the number of distinct values no longer indicated any obvious abnormalities.

Table 6.4.5-1: Facts about the table with in-record validated data.
Variable Number of Distinct Values Number of NAs Proportion of NAs
REFNUM 144826 0 0.0000000
BGN_DATE 3746 0 0.0000000
EVTYPE 46 32775 0.2263061
FATALITIES 31 0 0.0000000
INJURIES 101 0 0.0000000
PROPDMG 1162 0 0.0000000
PROPDMGEXP 3 4158 0.0287103
CROPDMG 269 0 0.0000000
CROPDMGEXP 3 55041 0.3800492
Note:
The table with the in-record validated data contained 9 variables and 144826 observations.
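
A summary of this shape can be computed per column as sketched below. The helper `summarise_missing` is hypothetical; it assumes that NA is not counted as a distinct value, which agrees with the counts of 3 reported above for PROPDMGEXP and CROPDMGEXP.

```r
# Hypothetical helper: distinct non-missing values, number of NAs
# and proportion of NAs for each column
summarise_missing <- function(data) {
  data.frame(
    variable = names(data),
    n_distinct = vapply(
      data, function(column) length(unique(column[!is.na(column)])), integer(1)
    ),
    n_na = vapply(data, function(column) sum(is.na(column)), integer(1)),
    proportion_na = vapply(data, function(column) mean(is.na(column)), numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up sample with one missing code out of four
missing_sketch <- data.frame(
  REFNUM = 1:4,
  PROPDMGEXP = c("K", NA, "M", "K"),
  stringsAsFactors = FALSE
)
missing_summary <- summarise_missing(missing_sketch)
```

Note that `mean(is.na(column))` yields a proportion between 0 and 1, like the values in the last column of the table above, rather than a percentage.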








6.5 Impute Missing Values

Summary

In this stage of data processing procedure, an attempt was made to maximize the amount of available information for the analysis, by imputing some of the missing values that exist at the table with the in-record validated data with plausible values.

There were 3 variables (EVTYPE, PROPDMGEXP and CROPDMGEXP) that contained NAs, all of which were introduced through the in-record data validation stage.

Via a conservative deterministic approach, which aimed to retrieve the missing values only in the cases where there was almost no doubt about the values to be imputed, the majority of those entries were successfully restored.

However, for the variable EVTYPE there is no guarantee that the imputed values are error-free. The associations were made by the analyst, who has no expertise in weather or meteorology, based on the invalid values found at the table with the target data subset (which were substituted by NAs) and on the information available in NATIONAL WEATHER SERVICE INSTRUCTION 10-1605, AUGUST 17, 2007 (at chapter 7).

On the other hand, the missing values that were imputed for the variables PROPDMGEXP and CROPDMGEXP are almost certainly correct (and even if they are not, this did not affect the results of the analysis in any significant way, as they all correspond to observations that resulted in 0 property and crop damage respectively, while the analysis focused on the observations for the weather events that caused non-zero harm).

Steps




6.5.1 Impute missing values at the variable EVTYPE

The invalid values for the variable EVTYPE at the table with the target data subset (before they got substituted by NAs at the in-record data validation stage) were examined and associations were made to plausible valid substitutions. Those observations with missing values that corresponded to successfully associated plausible substitutions, were identified by their key values and were imputed.




6.5.1.1 Examine the invalid values from the variable EVTYPE

For the variable EVTYPE at the table with the in-record validated data, out of the total 144826 observations 32775 (22.63%) were NAs.

Table 6.5.1.1-1: Information on the missing values for the variable EVTYPE at the table with the in-record validated data.
Variable Total Number of Values Number of Missing Values Proportion of Missing Values
EVTYPE 144826 32775 0.2263061

These 32775 missing values at the table with the in-record validated data, corresponded to 51 distinct invalid entries at the table with the target data subset before they got substituted by NAs at the in-record data validation stage.

Table 6.5.1.1-2: Information on the distinct invalid values for the variable EVTYPE at the table with the target data subset which got substituted by NAs at the in-record validation stage.
Invalid Values Number of Occurrences
TSTM WIND 31453
LANDSLIDE 189
WINTER WEATHER/MIX 139
WILD/FOREST FIRE 132
RIP CURRENTS 115
URBAN/SML STREAM FLD 115
MARINE TSTM WIND 109
TSTM WIND/HAIL 108
STORM SURGE 86
HEAVY SURF/HIGH SURF 50
HURRICANE 38
LIGHT SNOW 38
FOG 32
WIND 26
EXTREME COLD 24
DRY MICROBURST 17
HEAVY SURF 12
MIXED PRECIPITATION 12
COASTAL FLOODING 11
ASTRONOMICAL HIGH TIDE 8
STRONG WINDS 6
SNOW 5
FREEZE 4
SMALL HAIL 4
GUSTY WINDS 4
MUDSLIDE 3
HIGH SEAS 3
SNOW SQUALLS 3
EXTREME WINDCHILL 3
WINTER WEATHER MIX 2
FALLING SNOW/ICE 2
ROUGH SEAS 2
LIGHT FREEZING RAIN 2
LATE SEASON SNOW 1
THUNDERSTORM 1
ROGUE WAVE 1
NON-TSTM WIND 1
NON TSTM WIND 1
OTHER 1
LAKE EFFECT SNOW 1
MUD SLIDE 1
BRUSH FIRE 1
BLOWING DUST 1
GUSTY WIND 1
HIGH WATER 1
HIGH SURF ADVISORY 1
HAZARDOUS SURF 1
COLD WEATHER 1
WHIRLWIND 1
ICE ON ROAD 1
DROWNING 1
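
A frequency table of the invalid entries, sorted by the number of occurrences, can be built as sketched below. The helper `tabulate_invalid` and the sample values are made up for illustration.

```r
# Hypothetical helper: occurrences of each value that is not in `valid`,
# most frequent first
tabulate_invalid <- function(x, valid) {
  invalid <- x[!(x %in% valid)]
  counts <- sort(table(invalid), decreasing = TRUE)
  data.frame(
    invalid_value = as.character(names(counts)),
    occurrences = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Made-up sample in which only "HAIL" is a defined event type
event_sketch <- c("TSTM WIND", "TSTM WIND", "HAIL", "LANDSLIDE", "TSTM WIND")
invalid_summary <- tabulate_invalid(event_sketch, valid = "HAIL")
```

Sorting by frequency shows at a glance which invalid entries matter most for the imputation; in the table above a single value, TSTM WIND, accounts for 31453 of the 32775 invalid entries.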




6.5.1.2 Associate plausible substitutions to the invalid values from the variable EVTYPE

To impute the corresponding NAs, associations were made from the invalid entries to defined weather event types.

  • Some of the associations were based solely on the invalid values, which directly corresponded to defined event types, as they seem to be either typos (e.g. “RIP CURRENTS” instead of “RIP CURRENT”) or abbreviations of the expected values (e.g. “TSTM WIND” instead of “THUNDERSTORM WIND”).
  • For others, where a variation of a defined value had been supplied, the description of each one of the 48 event types that was available in NATIONAL WEATHER SERVICE INSTRUCTION 10-1605, AUGUST 17, 2007 (at chapter 7) was taken into account (e.g. “URBAN/SML STREAM FLD” instead of “HEAVY RAIN”).

Nevertheless, it is stressed that the associations mainly depend on ‘common sense’ judgment (instead of solid professional expertise) and are in no way guaranteed to be error-free, despite the best efforts made to impute only the most obvious cases.

In total, 28 distinct invalid values were associated to some of the 48 defined weather event types:

  1. ‘COASTAL FLOODING’ –> COASTAL FLOOD
  2. ‘COLD WEATHER’ –> COLD/WIND CHILL
  3. ‘LANDSLIDE’ –> DEBRIS FLOW
  4. ‘MUDSLIDE’ –> DEBRIS FLOW
  5. ‘MUD SLIDE’ –> DEBRIS FLOW
  6. ‘DROWNING’ –> DROUGHT
  7. ‘EXTREME COLD’ –> EXTREME COLD/WIND CHILL
  8. ‘EXTREME WINDCHILL’ –> EXTREME COLD/WIND CHILL
  9. ‘FREEZE’ –> FROST/FREEZE
  10. ‘SMALL HAIL’ –> HAIL
  11. ‘URBAN/SML STREAM FLD’ –> HEAVY RAIN
  12. ‘HEAVY SURF/HIGH SURF’ –> HIGH SURF
  13. ‘HEAVY SURF’ –> HIGH SURF
  14. ‘HAZARDOUS SURF’ –> HIGH SURF
  15. ‘ HIGH SURF ADVISORY’ –> HIGH SURF
  16. ‘HURRICANE’ –> HURRICANE/TYPHOON
  17. ‘LAKE EFFECT SNOW’ –> LAKE-EFFECT SNOW
  18. ‘MARINE TSTM WIND’ –> MARINE THUNDERSTORM WIND
  19. ‘RIP CURRENTS’ –> RIP CURRENT
  20. ‘STORM SURGE’ –> STORM SURGE/TIDE
  21. ‘STRONG WINDS’ –> STRONG WIND
  22. ‘TSTM WIND’ –> THUNDERSTORM WIND
  23. ‘DRY MICROBURST’ –> THUNDERSTORM WIND
  24. ‘THUNDERSTORM’ –> THUNDERSTORM WIND
  25. ‘WILD/FOREST FIRE’ –> WILDFIRE
  26. ‘BRUSH FIRE’ –> WILDFIRE
  27. ‘WINTER WEATHER/MIX’ –> WINTER WEATHER
  28. ‘WINTER WEATHER MIX’ –> WINTER WEATHER

(The 15th invalid value contains 3 spaces before ‘HIGH SURF ADVISORY’, but for some unknown reason only 1 space appears after rendering.)

On the other hand, there were 23 distinct invalid values that could not be associated (with relatively high confidence) with any of the 48 defined event types:

  1. ‘FOG’
  2. ‘LIGHT SNOW’
  3. ‘WIND’
  4. ‘LIGHT FREEZING RAIN’
  5. ‘MIXED PRECIPITATION’
  6. ‘ASTRONOMICAL HIGH TIDE’
  7. ‘GUSTY WINDS’
  8. ‘SNOW’
  9. ‘HIGH SEAS’
  10. ‘ROUGH SEAS’
  11. ‘SNOW SQUALLS’
  12. ‘FALLING SNOW/ICE’
  13. ‘GUSTY WIND’
  14. ‘HIGH WATER’
  15. ‘OTHER’
  16. ‘BLOWING DUST’
  17. ‘ICE ON ROAD’
  18. ‘LATE SEASON SNOW’
  19. ‘NON TSTM WIND’
  20. ‘NON-TSTM WIND’
  21. ‘ROGUE WAVE’
  22. ‘WHIRLWIND’
  23. ‘TSTM WIND/HAIL’




6.5.1.3 Identify the imputable missing values at the variable EVTYPE

After the associations for the invalid entries of the variable EVTYPE had been established, the observations that contained retrievable values were identified.

# Create a validator that flags, for each defined event type, the observations
# whose original EVTYPE value (at the table with the target data subset)
# is one of the invalid values associated with that event type
# in the list with the associations.
V_________identification_test_of_imputable_missing_values_for_EVTYPE <- validator(
  "COASTAL FLOOD" = EVTYPE %in% associations_on_defined_event_types[["COASTAL FLOOD"]],
  "COLD/WIND CHILL" = EVTYPE %in% associations_on_defined_event_types[["COLD/WIND CHILL"]],
  "DEBRIS FLOW" = EVTYPE %in% associations_on_defined_event_types[["DEBRIS FLOW"]],
  "DROUGHT" = EVTYPE %in% associations_on_defined_event_types[["DROUGHT"]],
  "EXTREME COLD/WIND CHILL" = EVTYPE %in% associations_on_defined_event_types[["EXTREME COLD/WIND CHILL"]],
  "FROST/FREEZE" = EVTYPE %in% associations_on_defined_event_types[["FROST/FREEZE"]],
  "HAIL" = EVTYPE %in% associations_on_defined_event_types[["HAIL"]],
  "HEAVY RAIN" = EVTYPE %in% associations_on_defined_event_types[["HEAVY RAIN"]],
  "HIGH SURF" = EVTYPE %in% associations_on_defined_event_types[["HIGH SURF"]],
  "HURRICANE/TYPHOON" = EVTYPE %in% associations_on_defined_event_types[["HURRICANE/TYPHOON"]],
  "LAKE-EFFECT SNOW" = EVTYPE %in% associations_on_defined_event_types[["LAKE-EFFECT SNOW"]],
  "MARINE THUNDERSTORM WIND" = EVTYPE %in% associations_on_defined_event_types[["MARINE THUNDERSTORM WIND"]],
  "RIP CURRENT" = EVTYPE %in% associations_on_defined_event_types[["RIP CURRENT"]],
  "STORM SURGE/TIDE" = EVTYPE %in% associations_on_defined_event_types[["STORM SURGE/TIDE"]],
  "STRONG WIND" = EVTYPE %in% associations_on_defined_event_types[["STRONG WIND"]],
  "THUNDERSTORM WIND" = EVTYPE %in% associations_on_defined_event_types[["THUNDERSTORM WIND"]],
  "WILDFIRE" = EVTYPE %in% associations_on_defined_event_types[["WILDFIRE"]],
  "WINTER WEATHER" = EVTYPE %in% associations_on_defined_event_types[["WINTER WEATHER"]]
)

# Confront the table with the target data subset with the validator with 
# the criteria for the association of invalid entries for the variable EVTYPE.
CF_________identification_test_of_imputable_missing_values_for_EVTYPE <- confront(
  dat = target_data_subset[is.na(in_record_validated_data[["EVTYPE"]])],
  V_________identification_test_of_imputable_missing_values_for_EVTYPE
)

Out of the 32775 missing values for the variable EVTYPE at the table with the in-record validated data, 32520 (99.22%) could be imputed, while the remaining 255 (0.78%) could not be safely associated with any of the 48 defined event types.

Table 6.5.1.3-1: Information on the imputable and not imputable missing values at the variable EVTYPE.
Variable | Number of Missing Values | Number of Imputable Missing Values | Number of Not Imputable Missing Values | Percentage of Imputable Missing Values | Percentage of Not Imputable Missing Values
EVTYPE | 32775 | 32520 | 255 | 0.9922197 | 0.0077803
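The shares reported in Table 6.5.1.3-1 follow directly from the counts, as the short check below illustrates. The variable names are chosen here for illustration only.

```r
# Counts taken from Table 6.5.1.3-1.
n_missing   <- 32775
n_imputable <- 32520

# The values that could not be imputed are the remainder.
n_not_imputable <- n_missing - n_imputable

# Shares as fractions (as stored in the table) and as rounded percentages (as quoted in the text).
share_imputable     <- n_imputable / n_missing
share_not_imputable <- n_not_imputable / n_missing
round(100 * c(share_imputable, share_not_imputable), 2)
```

Note that the table columns labelled as percentages hold these fractions, while the text quotes them multiplied by 100.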

The imputable missing values were distributed, according to the associations, across 18 defined event types.

Table 6.5.1.3-2: Information on the number of invalid values that can be imputed by one of the 48 defined weather event types for the variable EVTYPE at the table with the imputed data.
name items passes fails nNA error warning
COASTAL.FLOOD 32775 11 32764 0 FALSE FALSE
COLD.WIND.CHILL 32775 1 32774 0 FALSE FALSE
DEBRIS.FLOW 32775 193 32582 0 FALSE FALSE
DROUGHT 32775 1 32774 0 FALSE FALSE
EXTREME.COLD.WIND.CHILL 32775 27 32748 0 FALSE FALSE
FROST.FREEZE 32775 4 32771 0 FALSE FALSE
HAIL 32775 4 32771 0 FALSE FALSE
HEAVY.RAIN 32775 115 32660 0 FALSE FALSE
HIGH.SURF 32775 64 32711 0 FALSE FALSE
HURRICANE.TYPHOON 32775 38 32737 0 FALSE FALSE
LAKE.EFFECT.SNOW 32775 1 32774 0 FALSE FALSE
MARINE.THUNDERSTORM.WIND 32775 109 32666 0 FALSE FALSE
RIP.CURRENT 32775 115 32660 0 FALSE FALSE
STORM.SURGE.TIDE 32775 86 32689 0 FALSE FALSE
STRONG.WIND 32775 6 32769 0 FALSE FALSE
THUNDERSTORM.WIND 32775 31471 1304 0 FALSE FALSE
WILDFIRE 32775 133 32642 0 FALSE FALSE
WINTER.WEATHER 32775 141 32634 0 FALSE FALSE
Note:
The subset of the 32775 observations with missing values
was used for the identification of imputable invalid values.

The key value (denoted by the REFNUM variable) for the observations that successfully got associated with a defined weather event type was identified.
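One way to picture the identification of keys and the later substitution is the base R sketch below. The toy tables, the REFNUM values and the two-entry `lookup` vector are made up for illustration, and the original analysis may have proceeded differently.

```r
# Toy stand-in for the table with the target data subset (original EVTYPE values).
target_data_subset <- data.frame(
  REFNUM = 1:4,
  EVTYPE = c("TSTM WIND", "HAIL", "FOG", "RIP CURRENTS"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the in-record validated table: invalid values were replaced by NA.
in_record_validated_data <- target_data_subset
in_record_validated_data$EVTYPE[in_record_validated_data$EVTYPE != "HAIL"] <- NA

# Hypothetical lookup from invalid value to defined event type.
lookup <- c("TSTM WIND" = "THUNDERSTORM WIND", "RIP CURRENTS" = "RIP CURRENT")

# Keys of observations whose missing value can be retrieved from the original entry.
is_missing     <- is.na(in_record_validated_data$EVTYPE)
is_imputable   <- is_missing & target_data_subset$EVTYPE %in% names(lookup)
imputable_keys <- target_data_subset$REFNUM[is_imputable]

# Substitute the associated event type, matching rows by key.
imputed_data <- in_record_validated_data
original     <- target_data_subset$EVTYPE[match(imputable_keys, target_data_subset$REFNUM)]
rows         <- match(imputable_keys, imputed_data$REFNUM)
imputed_data$EVTYPE[rows] <- unname(lookup[original])
```

In this toy case the entry ‘FOG’ has no association, so it stays missing, which mirrors the 255 values that could not be imputed.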




6.5.2 Impute missing values at the variable PROPDMGEXP

The invalid values for the variable PROPDMGEXP at the table with the target data subset (before they got substituted by NAs at the in-record data validation stage) were examined and associations were made to plausible valid substitutions. The observations with missing values that corresponded to successfully associated plausible substitutions were identified by their key values and were imputed.




6.5.2.1 Examine the invalid values from the variable PROPDMGEXP

For the variable PROPDMGEXP, at the table with the in-record validated data, out of the total 144826 observations, 4158 (2.87%) were NAs.

Table 6.5.2.1-1: Information on missing values for the variable PROPDMGEXP at the table with the in-record validated data.
Variable | Total Number of Values | Number of Missing Values | Percentage of Missing Values
PROPDMGEXP | 144826 | 4158 | 0.0287103

Those 4158 missing values at the table with the in-record validated data corresponded to empty values at the table with the target data subset, before they got substituted by NAs at the in-record data validation stage.

Table 6.5.2.1-2: The distinct invalid values for the variable ‘PROPDMGEXP’ that were substituted by NAs at the in-record data validation stage.
Distinct Values | Number of Observations
(empty value) | 4158




6.5.2.2 Associate plausible substitutions to the invalid values from the variable PROPDMGEXP

A single association (which works perfectly, as shown in the next subsubsection) was made for the missing values that corresponded to empty values:

  • The entries that correspond to property damage with zero magnitude (denoted by the value 0 at the variable PROPDMG) could be associated with any of the valid values (“K”, “M”, “B”).




6.5.2.3 Identify the imputable missing values at the variable PROPDMGEXP

The observations that satisfied the criterion imposed by the association made for the invalid values from the variable PROPDMGEXP were identified.

All missing values at the variable PROPDMGEXP (4158 in total) corresponded to observations for which the magnitude of property damage (denoted by the variable PROPDMG) was zero.

Table 6.5.2.3-1: Results from identification of imputable missing values at the variable PROPDMGEXP.
name items passes fails nNA error warning
imputable_missing_values_at_PROPDMGEXP 4158 4158 0 0 FALSE FALSE
Note:
The subset of the 4158 observations with missing values was used for the identification of imputable invalid values

The key values (denoted by the variable REFNUM) of the observations for which the missing values at the variable PROPDMGEXP could be retrieved were identified.




6.5.2.4 Substitute the imputable missing values at the variable PROPDMGEXP

The value “K” was imputed to all observations with imputable missing values at the variable PROPDMGEXP (which were identified by their key value).

(They could have been substituted by any of the valid values (“K”, “M” or “B”) for the variable PROPDMGEXP without changing the fact that they refer to $0 of property damage.)
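The rule described above can be sketched in base R as follows. The toy table and its values are made up for illustration and do not come from the dataset.

```r
# Toy table: two observations have a missing exponent code and zero magnitude.
toy <- data.frame(
  REFNUM     = 1:4,
  PROPDMG    = c(10, 0, 0, 2.5),
  PROPDMGEXP = c("K", NA, NA, "M"),
  stringsAsFactors = FALSE
)

# A missing exponent is imputable when the magnitude is zero.
imputable <- is.na(toy$PROPDMGEXP) & toy$PROPDMG == 0
toy$PROPDMGEXP[imputable] <- "K"

# The choice of code is immaterial: zero times any power of ten is still zero.
all(0 * 10^c(3, 6, 9) == 0)
```

The same reasoning applies to any exponent variable whose missing codes only occur together with a zero magnitude.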






6.5.3 Impute missing values at the variable CROPDMGEXP

The invalid values for the variable CROPDMGEXP at the table with the target data subset (before they got substituted by NAs at the in-record data validation stage) were examined and associations were made to plausible valid substitutions. The observations with missing values that corresponded to successfully associated plausible substitutions were identified by their key values and were imputed.




6.5.3.1 Examine the invalid values from the variable CROPDMGEXP

For the variable CROPDMGEXP, at the table with the in-record validated data, out of the total 144826 observations, 55041 (38.00%) were NAs.

Table 6.5.3.1-1: Information on missing values for the variable CROPDMGEXP at the table with the in-record validated data.
Variable | Total Number of Values | Number of Missing Values | Percentage of Missing Values
CROPDMGEXP | 144826 | 55041 | 0.3800492

Those 55041 missing values at the table with the in-record validated data corresponded to empty values at the table with the target data subset, before they got substituted by NAs at the in-record data validation stage.

Table 6.5.3.1-2: The distinct invalid values for the variable ‘CROPDMGEXP’ that were substituted by NAs at the in-record data validation stage.
Distinct Values | Number of Observations
(empty value) | 55041




6.5.3.2 Associate plausible substitutions to the invalid values from the variable CROPDMGEXP

A single association (which works perfectly, as shown in the next subsubsection) was made for the missing values that corresponded to empty values:

  • The entries that correspond to crop damage with zero magnitude (denoted by the value 0 at the variable CROPDMG) could be associated with any of the valid values (“K”, “M”, “B”).




6.5.3.3 Identify the imputable missing values at the variable CROPDMGEXP

The observations that satisfied the criterion imposed by the association made for the invalid values from the variable CROPDMGEXP were identified.

All missing values at the variable CROPDMGEXP (55041 in total) corresponded to observations for which the magnitude of crop damage (denoted by the variable CROPDMG) was zero.

Table 6.5.3.3-1: Results from identification of imputable missing values at the variable CROPDMGEXP.
name items passes fails nNA error warning
imputable_missing_values_at_CROPDMGEXP 55041 55041 0 0 FALSE FALSE
Note:
The subset of the 55041 observations with missing values was used for the identification of imputable invalid values.

The key values (denoted by the variable REFNUM) of the observations for which the missing values at the variable CROPDMGEXP could be retrieved were identified.




6.5.3.4 Substitute the imputable missing values at the variable CROPDMGEXP

The value “K” was imputed to all observations with imputable missing values at the variable CROPDMGEXP (which were identified by their key value).

(They could have been substituted by any of the valid values (“K”, “M” or “B”) for the variable CROPDMGEXP without changing the fact that they refer to $0 of crop damage.)







6.5.4 Conduct post validation for the table with the imputed data

Post validation was conducted to verify that the values of the variables at the table with the imputed data were valid according to the same constraints that were used to identify the invalid values for each variable at the table with the target data subset.

All values for each variable at the table with the imputed data were valid.

Table 6.5.4-1: The results of post validation for the table with the imputed data.
name items passes fails nNA error warning
REFNUM 144826 144826 0 0 FALSE FALSE
BGN_DATE 144826 144826 0 0 FALSE FALSE
EVTYPE 144826 144571 0 255 FALSE FALSE
FATALITIES 144826 144826 0 0 FALSE FALSE
INJURIES 144826 144826 0 0 FALSE FALSE
PROPDMG 144826 144826 0 0 FALSE FALSE
PROPDMGEXP 144826 144826 0 0 FALSE FALSE
CROPDMG 144826 144826 0 0 FALSE FALSE
CROPDMGEXP 144826 144826 0 0 FALSE FALSE
Note:
The same constraints that were used to identify the invalid values of each variable at the table with the target data subset
were used for the post validation of the observations at the table with the imputed data.






6.5.5 Overview of the table with the imputed data

The table with the imputed data contained 9 variables and 144826 observations.

The variable REFNUM was set as the key of this table.

## Classes 'data.table' and 'data.frame':   144826 obs. of  9 variables:
##  $ REFNUM    : int  413607 413608 413609 413610 413611 413612 413613 413614 413615 413616 ...
##  $ BGN_DATE  : chr  "1/19/2001 0:00:00" "1/19/2001 0:00:00" "1/19/2001 0:00:00" "1/19/2001 0:00:00" ...
##  $ EVTYPE    : chr  "THUNDERSTORM WIND" "THUNDERSTORM WIND" "THUNDERSTORM WIND" "THUNDERSTORM WIND" ...
##  $ FATALITIES: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ INJURIES  : int  0 0 0 0 0 0 0 4 0 0 ...
##  $ PROPDMG   : num  10 8 2 15 5 3 10 450 150 3 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "K" "K" "K" "K" ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"

The only missing values left were 255 at the variable EVTYPE (those that couldn’t be safely imputed). The number of distinct values at each of the variables didn’t indicate the presence of obvious abnormalities.

Table 6.5.5-1: Facts about the table with the imputed data.
Variable | Number of Distinct Values | Number of Missing Values | Percentage of Missing Values
REFNUM | 144826 | 0 | 0.0000000
BGN_DATE | 3746 | 0 | 0.0000000
EVTYPE | 47 | 255 | 0.0017607
FATALITIES | 31 | 0 | 0.0000000
INJURIES | 101 | 0 | 0.0000000
PROPDMG | 1162 | 0 | 0.0000000
PROPDMGEXP | 3 | 0 | 0.0000000
CROPDMG | 269 | 0 | 0.0000000
CROPDMGEXP | 3 | 0 | 0.0000000
Note:
The table with the imputed data contained 9 variables and 144826 observations.








6.6 Conduct Cross-Record Data Validation

Summary

Each observation at the table with the imputed data was checked to verify that its entries were valid across all variables simultaneously. The observations that were valid were used to create the table with the cross-record validated data.

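Steps

  • 6.6.1 Identify all valid observations
  • 6.6.2 Create the table with the cross-record validated data
  • 6.6.3 Conduct post validation for table with the cross-record validated data
  • 6.6.4 Overview of the table with the cross-record validated data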




6.6.1 Identify all valid observations

A single constraint that spanned all available variables at the table with the imputed data was created and used to identify the valid observations.

Specifically, each observation had to satisfy the 4 criteria below simultaneously in order to be considered valid:

  1. The id must be unique (and non-missing).
  2. The weather event type value must be one of the defined weather events (and non-missing).
  3. The year must be in the period from 2001 to 2011 (and non-missing).
  4. There must be non-zero harm either to population health or to economy, so:
    • either fatalities must be positive (and non-missing),
    • or injuries must be positive (and non-missing),
    • or property damage (in dollars) must be retrievable and positive (and non-missing),
    • or crop damage (in dollars) must be retrievable and positive (and non-missing).
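As an illustration, the four criteria could be combined into one logical test per observation, as in the base R sketch below. The toy table, the three-entry `defined_event_types` vector and the helper `positive()` are assumptions made for this example; the analysis itself expresses its constraints as validation rules.

```r
# Illustrative subset of the 48 defined event types.
defined_event_types <- c("HAIL", "THUNDERSTORM WIND", "TORNADO")
valid_exponents     <- c("K", "M", "B")

# Toy table with the same variables as the table with the imputed data.
imputed_data <- data.frame(
  REFNUM     = c(1L, 2L, 3L, 4L),
  BGN_DATE   = c("1/19/2001 0:00:00", "6/2/2005 0:00:00",
                 "7/4/2011 0:00:00", "3/1/2003 0:00:00"),
  EVTYPE     = c("THUNDERSTORM WIND", NA, "HAIL", "TORNADO"),
  FATALITIES = c(0L, 1L, 0L, 0L),
  INJURIES   = c(0L, 0L, 0L, 0L),
  PROPDMG    = c(10, 0, 0, 0),
  PROPDMGEXP = c("K", "K", "K", "K"),
  CROPDMG    = c(0, 0, 5, 0),
  CROPDMGEXP = c("K", "K", "M", "K"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE only for non-missing, strictly positive values.
positive <- function(x) !is.na(x) & x > 0

# Year of the event, parsed from the date part of BGN_DATE.
year <- as.integer(format(as.Date(imputed_data$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

is_valid <- with(imputed_data,
  # 1. unique, non-missing id
  !is.na(REFNUM) & !duplicated(REFNUM) & !duplicated(REFNUM, fromLast = TRUE) &
  # 2. defined, non-missing event type
  EVTYPE %in% defined_event_types &
  # 3. year within the period of interest
  !is.na(year) & year >= 2001 & year <= 2011 &
  # 4. non-zero harm to population health or to the economy
  (positive(FATALITIES) | positive(INJURIES) |
     (positive(PROPDMG) & PROPDMGEXP %in% valid_exponents) |
     (positive(CROPDMG) & CROPDMGEXP %in% valid_exponents))
)

valid_keys <- imputed_data$REFNUM[is_valid]
```

In this toy case the second observation fails because its event type is missing, and the fourth because it records no harm at all.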

Out of the total of 144826 observations at the table with the imputed data, 144571 were valid across all variables, while only 255 were found to be invalid.

Table 6.6.1-1: The results of the cross-record data validation for the observations contained at the table with the imputed data.
name items passes fails nNA error warning
valid_observations 144826 144571 255 0 FALSE FALSE

The value of the key (denoted by the variable REFNUM) was used to identify the observations that were valid.






6.6.2 Create the table with the cross-record validated data

From the table with the imputed data, the table with the cross-record validated data was created, by including only the observations that contained valid (and non-missing) values across all variables.






6.6.3 Conduct post validation for table with the cross-record validated data

Post validation was conducted to verify that all observations at the table with the cross-record validated data were valid according to the same constraint that was used to identify the valid observations at the table with the imputed data.

All the observations at the table with the cross-record validated data were valid.

Table 6.6.3-1: The results of the post validation for the table with the cross-record validated data.
name items passes fails nNA error warning
valid_observations 144571 144571 0 0 FALSE FALSE






6.6.4 Overview of the table with the cross-record validated data

The table with the cross-record validated data contained 9 variables and 144571 observations.

The variable REFNUM was set as the key of this table.

## Classes 'data.table' and 'data.frame':   144571 obs. of  9 variables:
##  $ REFNUM    : int  413607 413608 413609 413610 413611 413612 413613 413614 413615 413616 ...
##  $ BGN_DATE  : chr  "1/19/2001 0:00:00" "1/19/2001 0:00:00" "1/19/2001 0:00:00" "1/19/2001 0:00:00" ...
##  $ EVTYPE    : chr  "THUNDERSTORM WIND" "THUNDERSTORM WIND" "THUNDERSTORM WIND" "THUNDERSTORM WIND" ...
##  $ FATALITIES: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ INJURIES  : int  0 0 0 0 0 0 0 4 0 0 ...
##  $ PROPDMG   : num  10 8 2 15 5 3 10 450 150 3 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "K" "K" "K" "K" ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"

All the observations at the table with the cross-record validated data are complete, as indicated by the results of the post validation for that table.








6.7 Produce The Processed Data

Summary

After the target data for the period of interest had been identified, validated and imputed, the variables from the table with the cross-record validated data were transformed to construct the table with the processed data. This table contained all the information that was necessary in order to proceed with the analysis and address the two questions of interest.

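Steps

  • 6.7.1 Create the table with the processed data
  • 6.7.2 Conduct post validation for the table with the processed data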




6.7.1 Create the table with the processed data

The following transformations were applied to the variables from the table with the cross-record validated data, in order to construct the table with the processed data:

  1. the variable REFNUM was transferred unchanged
  2. the variable BGN_DATE was omitted
  3. the variable EVTYPE was transferred and renamed to EVENT_TYPE
  4. the variable FATALITIES was transferred unchanged
  5. the variable INJURIES was transferred unchanged
  6. the variables FATALITIES and INJURIES were added in order to create the variable CASUALTIES
  7. the variables PROPDMG (that denoted the magnitude of property damage) and PROPDMGEXP (that indicated if the value of PROPDMG referred to thousands, millions or billions) were combined appropriately to retrieve the property damage in dollars in order to create the variable PROPERTY_DAMAGE
  8. the variables CROPDMG (that denoted the magnitude of crop damage) and CROPDMGEXP (that indicated if the value of CROPDMG referred to thousands, millions or billions) were combined appropriately to retrieve the crop damage in dollars in order to create the variable CROP_DAMAGE
  9. the variables PROPERTY_DAMAGE and CROP_DAMAGE were added in order to create the variable ECONOMIC_DAMAGE
# Create the table with the processed data 
# from the information contained 
# at the table with cross-record validated data.
processed_data <- cross_validated_data[
  ,
  list(
    # REFNUM variable doesn't need to change
    "REFNUM" = REFNUM,
    # EVTYPE variable should be renamed to EVENT_TYPE
    "EVENT_TYPE" = EVTYPE,
    # FATALITIES variable doesn't need to change
    "FATALITIES" = FATALITIES,
    # INJURIES variable doesn't need to change
    "INJURIES" = INJURIES,
    # PROPERTY_DAMAGE is created by combining the information
    # from the PROPDMG variable which denotes the magnitude of property damage
    # and the PROPDMGEXP variable that indicates if the magnitude
    # refers to thousands (K), millions (M) or billions (B) of dollars
    "PROPERTY_DAMAGE" = (function(magnitude, coded_exponent, code_dictionary) {
      recoded_exponent <- str_replace_all(
        string = coded_exponent,
        code_dictionary
      ) %>%
        as.integer()
      ## the magnitude is multiplied by a coefficient
      ## with base 10 raised to the appropriate power
      ## (3 for thousands, 6 for millions or 9 for billions)
      ## to retrieve the value of property damage 
      reconstructed_number <- magnitude * 10^recoded_exponent
    })(PROPDMG, PROPDMGEXP, c("K" = "3", "M" = "6", "B" = "9")),
    # CROP_DAMAGE is created by combining the information
    # from the CROPDMG variable which denotes the magnitude of crop damage
    # and the CROPDMGEXP variable that indicates if the magnitude
    # refers to thousands (K), millions (M) or billions (B) of dollars
    "CROP_DAMAGE" = (function(magnitude, coded_exponent, code_dictionary) {
      recoded_exponent <- str_replace_all(
        string = coded_exponent,
        code_dictionary
      ) %>%
        as.integer()
      ## the magnitude is multiplied by a coefficient
      ## with base 10 raised to the appropriate power
      ## (3 for thousands, 6 for millions or 9 for billions)
      ## to retrieve the value of crop damage 
      reconstructed_number <- magnitude * 10^recoded_exponent
    })(CROPDMG, CROPDMGEXP, c("K" = "3", "M" = "6", "B" = "9"))
  )
  ][
    ,
    # Create a variable with the number of casualties
    # caused by each weather event type 
    # by adding the fatalities and injuries
    CASUALTIES := FATALITIES + INJURIES][
      ,
      # Create a variable with the economic damage
      # caused by each weather event type 
      # by adding the property damage and crop damage
      ECONOMIC_DAMAGE := PROPERTY_DAMAGE + CROP_DAMAGE
      ][
        ,
        # Re-arrange the order of the variables 
        list(
          REFNUM, EVENT_TYPE, 
          FATALITIES, INJURIES, CASUALTIES,
          PROPERTY_DAMAGE, CROP_DAMAGE, ECONOMIC_DAMAGE
        )
        ]
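The decoding of the exponent codes used above can also be illustrated in isolation. The sketch below is an alternative base R formulation that uses a named lookup vector instead of `str_replace_all`; the names `exponent_of` and `to_dollars` are hypothetical and do not appear in the analysis.

```r
# Power of ten encoded by each valid exponent code.
exponent_of <- c(K = 3, M = 6, B = 9)

# Hypothetical helper: combine a magnitude with its exponent code to get dollars.
# Codes that are not in the lookup yield NA instead of a silently wrong amount.
to_dollars <- function(magnitude, coded_exponent) {
  magnitude * 10^unname(exponent_of[coded_exponent])
}

# Magnitudes 10 and 450 with code "K" correspond to 10000 and 450000 dollars.
to_dollars(c(10, 450, 2.5, 1), c("K", "K", "M", "B"))
```

Both formulations rely on the same idea: the magnitude is multiplied by 10 raised to 3, 6 or 9 depending on the code.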






6.7.2 Conduct post validation for the table with the processed data

Post validation was conducted to verify that all observations contained at the table with the processed data were valid across all variables it contained.

One constraint was created and used, which consists of three parts that must hold simultaneously for each observation:

  • The key for each observation, denoted by the variable REFNUM, must be unique.
  • The event type for each observation, denoted by the variable EVENT_TYPE, must be one of the 48 defined weather event types according to the NATIONAL WEATHER SERVICE INSTRUCTION 10-1605, AUGUST 17, 2007 (at chapter 7).
  • At least one of the six variables that indicate the harm (either to population health or to economy), namely FATALITIES, INJURIES, CASUALTIES, PROPERTY_DAMAGE, CROP_DAMAGE or ECONOMIC_DAMAGE, must be positive.
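A compact way to express the three parts of this constraint is sketched below in base R. The toy table and the three-entry `defined_event_types` vector are assumptions made for illustration.

```r
# Illustrative subset of the 48 defined event types.
defined_event_types <- c("HAIL", "TORNADO", "FLASH FLOOD")

# Toy table with the same variables as the table with the processed data.
processed_data <- data.frame(
  REFNUM          = 1:3,
  EVENT_TYPE      = c("HAIL", "TORNADO", "HAIL"),
  FATALITIES      = c(0L, 2L, 0L),
  INJURIES        = c(0L, 5L, 0L),
  PROPERTY_DAMAGE = c(1000, 0, 0),
  CROP_DAMAGE     = c(0, 0, 500),
  stringsAsFactors = FALSE
)
processed_data$CASUALTIES      <- processed_data$FATALITIES + processed_data$INJURIES
processed_data$ECONOMIC_DAMAGE <- processed_data$PROPERTY_DAMAGE + processed_data$CROP_DAMAGE

harm_columns <- c("FATALITIES", "INJURIES", "CASUALTIES",
                  "PROPERTY_DAMAGE", "CROP_DAMAGE", "ECONOMIC_DAMAGE")

# Keys that occur more than once anywhere in the table.
repeated_keys <- processed_data$REFNUM[duplicated(processed_data$REFNUM)]

valid_observation <- unname(
  !(processed_data$REFNUM %in% repeated_keys) &              # unique key
    processed_data$EVENT_TYPE %in% defined_event_types &     # defined event type
    rowSums(processed_data[harm_columns] > 0) > 0            # at least one positive harm
)

all(valid_observation)
```

Counting the positive harm variables per row with `rowSums` avoids writing out six separate comparisons.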

All the 144571 observations included in the processed data table were found to satisfy the condition.

Table 6.7.2-1: The results of post validation for the table with the processed data.
name items passes fails nNA error warning error.1 warning.1
valid_observation 144571 144571 0 0 FALSE FALSE FALSE FALSE















7 PROCESSED DATA


The table with the processed data (which was the result of the data processing pipeline) contains all the information that was used in the subsequent analysis chapters in order to address the two questions of interest for this analysis.

Details about the variables it contains and a short overview are presented in this chapter.

Finally, in order to assist any attempt to reproduce the analysis, a file with the processed data was exported to serve as a checkpoint.




7.1 Information For The Table With The Processed Data

There are 8 variables at the table with the processed data:

  1. REFNUM (int) : a value that uniquely identifies each observation and was used as the key of the table
  2. EVENT_TYPE (chr) : the type of each weather event
  3. FATALITIES (int) : the number of fatalities
  4. INJURIES (int) : the number of injuries
  5. CASUALTIES (int) : the number of casualties (injuries and fatalities)
  6. PROPERTY_DAMAGE (num) : the property damage in dollars
  7. CROP_DAMAGE (num) : the crop damage in dollars
  8. ECONOMIC_DAMAGE (num): the economic damage in dollars (property damage and crop damage)








7.2 Overview Of The Table With The Processed Data

The processed data consists of 8 variables and 144571 observations.

The variable REFNUM was set as the key of this table.

## Classes 'data.table' and 'data.frame':   144571 obs. of  8 variables:
##  $ REFNUM         : int  413607 413608 413609 413610 413611 413612 413613 413614 413615 413616 ...
##  $ EVENT_TYPE     : chr  "THUNDERSTORM WIND" "THUNDERSTORM WIND" "THUNDERSTORM WIND" "THUNDERSTORM WIND" ...
##  $ FATALITIES     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ INJURIES       : int  0 0 0 0 0 0 0 4 0 0 ...
##  $ CASUALTIES     : int  0 0 0 0 0 0 0 4 0 0 ...
##  $ PROPERTY_DAMAGE: num  10000 8000 2000 15000 5000 3000 10000 450000 150000 3000 ...
##  $ CROP_DAMAGE    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ ECONOMIC_DAMAGE: num  10000 8000 2000 15000 5000 3000 10000 450000 150000 3000 ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"

All the observations included in the processed data table are complete.

Table 7.2-1: The percentage of complete observations at the table with the processed data.
Percentage Of Complete Observations
100%

The number of distinct values of each variable complies with what was expected.

Table 7.2-2: The number of distinct values for each variable at the table with the processed data.
Variable Number of Distinct Values
REFNUM 144571
EVENT_TYPE 47
FATALITIES 31
INJURIES 101
CASUALTIES 113
PROPERTY_DAMAGE 1369
CROP_DAMAGE 331
ECONOMIC_DAMAGE 1647
Note:
The table with the processed data consists of 8 variables
and 144571 observations.
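The two checks reported in this section (share of complete observations and number of distinct values per variable) can be sketched in base R as follows. The toy table is made up for illustration.

```r
# Toy stand-in for the table with the processed data.
processed_data <- data.frame(
  REFNUM     = 1:4,
  EVENT_TYPE = c("HAIL", "HAIL", "TORNADO", "FLOOD"),
  FATALITIES = c(0L, 0L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Share of observations without any missing value.
share_complete <- mean(complete.cases(processed_data))

# Number of distinct values for each variable.
distinct_values <- vapply(processed_data, function(x) length(unique(x)), integer(1))
```

A key variable such as REFNUM should have as many distinct values as there are observations, which is what Table 7.2-2 shows for the actual data.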








7.3 Export The Table With The Processed Data

The table with the processed data was exported (as an R file) to the following sub-directory of the working directory:

  • outputs –> processed_data

with filename:

  • table_with_the_precessed_data.R

The main reason for exporting a file with the processed data was to supply a checkpoint for any attempts to reproduce the analysis.
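The report does not show how the file was written. One possible way to write and restore such a checkpoint as an R source file is sketched below, under the assumption that `dput()` and `dget()` are acceptable; the temporary path and file name are placeholders rather than the location used by the analysis.

```r
# Toy stand-in for the table with the processed data.
processed_data <- data.frame(
  REFNUM     = 1:2,
  EVENT_TYPE = c("HAIL", "TORNADO"),
  stringsAsFactors = FALSE
)

# Placeholder location; the analysis used its own outputs sub-directory.
checkpoint_file <- file.path(tempdir(), "processed_data_checkpoint.R")

# Write an R expression that recreates the table, then read it back.
dput(processed_data, file = checkpoint_file)
restored <- dget(checkpoint_file)

# The restored table should match the exported one.
isTRUE(all.equal(restored, processed_data))
```

A text representation like this can be inspected and version-controlled, although a binary format may be preferable for large tables.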















8 HARM ON POPULATION HEALTH


In this chapter an attempt was made to quantify the harm on population health based on the information from the table with the processed data.

The harm on population health was examined over three perspectives:

  1. The harm on population health with respect to fatalities caused by each weather event type, based on the observations for weather events that resulted in non-zero fatalities in the United States in the period from 2001 to 2011.
  2. The harm on population health with respect to injuries caused by each weather event type, based on the observations for weather events that resulted in non-zero injuries in the United States in the period from 2001 to 2011.
  3. The harm on population health with respect to casualties (sum of fatalities and injuries) caused by each weather event type, based on the observations for weather events that resulted in non-zero casualties in the United States in the period from 2001 to 2011.

The weather event types for which fewer than 10 observations with non-zero harm were available with respect to a perspective of interest were omitted (from the analysis of that particular perspective), to avoid highly misleading statistics. Consequently, the subset of weather event types that were included is different for each of the three perspectives.

Because, for all perspectives, the values of interest for the observations of most weather event types were highly positively skewed, it was considered important to examine them over three different aspects in order to obtain an insightful picture of their consequences:

  1. The overall harm on population health caused by each weather event type.
  2. The harm on population health caused by the 90% of cases with the lowest impact of each weather event type.
  3. The harm on population health caused by the 10% of cases with the highest impact of each weather event type.

For every aspect the sample size, the skewness and the mean of the values that encapsulated the harm with respect to each perspective were summarized by each weather event type and reported.
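The summary described above (sample size, skewness and mean by weather event type, after dropping types with fewer than 10 non-zero observations) can be sketched in base R as follows. The toy data and the moment-based `skewness()` helper are assumptions for illustration; the analysis may have used a different skewness estimator.

```r
# Moment-based sample skewness (one common definition, assumed here).
skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Toy data: type "A" has 12 non-zero observations, type "B" only 4.
toy <- data.frame(
  EVENT_TYPE = rep(c("A", "B"), times = c(12, 4)),
  FATALITIES = c(rep(1, 10), 3, 20, 1, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Keep observations with non-zero harm, then drop event types with fewer than 10 of them.
non_zero <- toy[toy$FATALITIES > 0, ]
counts   <- table(non_zero$EVENT_TYPE)
kept     <- non_zero[non_zero$EVENT_TYPE %in% names(counts)[counts >= 10], ]

# Summarize sample size, skewness and mean by event type.
summary_by_type <- do.call(rbind, lapply(split(kept, kept$EVENT_TYPE), function(d) {
  data.frame(
    EVENT_TYPE = d$EVENT_TYPE[1],
    n          = nrow(d),
    skewness   = skewness(d$FATALITIES),
    mean       = mean(d$FATALITIES),
    stringsAsFactors = FALSE
  )
}))
```

In the toy data a single observation with 20 fatalities pulls the mean of type "A" well above its typical value of 1, which is the kind of positive skew that motivates reporting more than the overall mean.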

The results obtained for the harm on population health by each weather event type were presented at the section 10.1 Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? of the chapter 10 RESULTS.

For each of the three perspectives that were examined for the harm on population health by each weather event type, a multiplot was created to visualize the respective results. Those multiplots constitute the three parts of Figure 1, which was composed and presented at the subsection 10.1.1 Overview of results for the harm on population health of the chapter 10 RESULTS.

(In compliance with the restrictions of the assignment, according to which at least 1 but no more than 3 figures should be included in the report, the multiplots, as well as the elementary plots they contain, were NOT displayed separately and can ONLY be examined as parts of Figure 1 at the subsection 10.1.1 Overview of results for the harm on population health of the chapter 10 RESULTS.)




8.1 Harm On Population Health With Respect To Fatalities By Each Weather Event Type

Summary

The required variables and the target data subset of observations for the harm on population health with respect to fatalities were extracted from the table with the processed data, and processed to create a new variable that divided the observations for each of the included weather event types into two complementary groups:

  • the 90% of observations with the lowest impact
  • the 10% of observations with the highest impact

before the information for the harm on population health with respect to fatalities was summarized by each weather event type.

Three aspects were examined:

  1. The overall average number of fatalities by each weather event type.
  2. The average number of fatalities by each weather event type for the 90% of cases with the lowest impact.
  3. The average number of fatalities by each weather event type for the 10% of cases with the highest impact.

For each aspect, the average number of fatalities by each weather event type, the number of its available observations (based on which the average was computed) and their skewness were examined.

The overall average number of fatalities was used as the main criterion to determine which weather event types caused the most harm on population health with respect to fatalities. It is nevertheless important to take into account the other two aspects in order to obtain a more insightful and complete ‘picture’ of their consequences (especially given that, for most weather event types, the fatalities were highly positively skewed).

The table with the results for the harm on population health with respect to fatalities by each weather event type was presented at the subsection 10.1.2 Most harmful event types with respect to fatalities of the chapter 10 RESULTS.

Finally the Multiplot 1.1 was created to visualize the results for the harm on population health with respect to fatalities by each weather event type.

(Note that neither the Multiplot 1.1 nor the elementary plots that it contains were presented in this section due to the restrictions imposed by the assignment to include in the report at least 1 but no more than 3 figures. It can be examined at the subsection 10.1.1 Overview of results for the harm on population health at the chapter 10 RESULTS, where the Figure 1 was presented, of which the Multiplot 1.1 constitutes the PART 1.)

Steps




8.1.1 Extract the target data for harm on population health with respect to fatalities

In order to examine the harm on population health with respect to fatalities caused by each weather event type, the variables REFNUM, EVENT_TYPE and FATALITIES were selected from the table with the processed data and only the observations that refer to weather events that resulted in non-zero fatalities were extracted.

Furthermore, to avoid highly misleading statistics caused by the small number of observations for some weather event types, a lower bound of 10 weather events that caused non-zero fatalities (for each of the included weather event types) was selected (subjectively by the analyst) and applied.

Although this lower bound may seem (and generally is) too small to yield trustworthy statistics, it was considered to be “good enough”, taking into account that:

  1. the analysis focuses on describing historical data without trying to make inferences that would demand substantially bigger samples. Even so, any statistic based on fewer than 10 observations could not be taken seriously, especially in cases (such as in this analysis) where the distribution of fatalities for each weather event type was skewed.
  2. a period of 10 years (from 2001 to 2011), in which the observations used in the analysis occurred, is relatively short for producing big samples of weather events that caused non-zero fatalities for some of the weather event types. Thus, if a higher bound such as 100 or 300 observations had been selected to get more robust statistics, the majority of weather event types would have been excluded, making the results of the analysis trivial.

The table with the target data for the harm on population health with respect to fatalities consists of 3175 observations.
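
The two filters described above (non-zero fatalities, then at least 10 remaining observations per event type) can be sketched as follows. This is a minimal base R illustration on a hypothetical miniature table, not the code used in the report, which works on the full processed data and is not shown in this subsection. The names `events`, `min_obs` and `target` are assumptions introduced for the example.

```r
# Hypothetical miniature of the processed data (illustration only).
events <- data.frame(
  REFNUM     = 1:16,
  EVENT_TYPE = c(rep("TORNADO", 12), rep("DEBRIS FLOW", 4)),
  FATALITIES = c(0, 1, 1, 2, 1, 1, 3, 1, 1, 1, 25, 1, 0, 1, 2, 1),
  stringsAsFactors = FALSE
)

min_obs <- 10  # lower bound on observations per event type

# Keep only the events with at least one fatality.
nonzero <- events[events$FATALITIES > 0, ]

# Count the remaining observations per event type.
n_per_type <- table(nonzero$EVENT_TYPE)

# Keep only the event types that reach the lower bound.
kept_types <- names(n_per_type)[n_per_type >= min_obs]
target <- nonzero[nonzero$EVENT_TYPE %in% kept_types, ]

print(table(target$EVENT_TYPE))
```

In this toy table only one event type keeps enough non-zero observations, which mirrors how a stricter bound would quickly empty the real list of event types.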

## Classes 'data.table' and 'data.frame':   3175 obs. of  3 variables:
##  $ REFNUM    : int  413652 413757 413763 413862 414153 414183 414184 414187 414200 414267 ...
##  $ EVENT_TYPE: chr  "THUNDERSTORM WIND" "TORNADO" "HIGH WIND" "THUNDERSTORM WIND" ...
##  $ FATALITIES: int  1 2 1 1 1 1 1 1 1 2 ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"

The variable EVENT_TYPE includes 26 distinct weather event types, for most of which the variable FATALITIES was highly positively skewed.

Table 8.1.1-1: Facts about the table with the target data subset of observations for the harm on population health with respect to fatalities.
EVENT_TYPE                  N  SKEWNESS
AVALANCHE                 129    2.2979
BLIZZARD                   15    2.6185
COLD/WIND CHILL            75    2.9759
DEBRIS FLOW                11    1.6608
EXCESSIVE HEAT            296    5.4405
EXTREME COLD/WIND CHILL   103    4.5318
FLASH FLOOD               392    8.0755
FLOOD                     187    5.0049
HEAT                      127    4.1476
HEAVY RAIN                 34    2.5950
HEAVY SNOW                 18    0.9923
HIGH SURF                  86    2.2931
HIGH WIND                  92    3.4457
HURRICANE/TYPHOON          23    2.1981
ICE STORM                  20    2.7519
LIGHTNING                 387    5.3156
MARINE STRONG WIND         12    1.7889
MARINE THUNDERSTORM WIND   12    2.3158
RIP CURRENT               384    5.3801
STRONG WIND                90    2.6667
THUNDERSTORM WIND         195    6.4762
TORNADO                   339   13.5732
TROPICAL STORM             20    3.8434
WILDFIRE                   31    2.6290
WINTER STORM               51    0.9436
WINTER WEATHER             46    3.7781
Note:
N is the number of observations with non-zero fatalities. The skewness was rounded to 4 decimal places.

It is worth noting that the weather event types with the most observations also tended to show the highest skewness in fatalities. This suggests that the corresponding distributions of fatalities have a heavy tail that could not be observed when only a few observations were available.
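
The package that supplies the `skewness()` function used in the report is not named in this section. As a hedged illustration of what such a statistic measures, the sketch below implements one common moment-based definition, the third central moment divided by the second central moment raised to the power 3/2. The function name `moment_skewness` and the sample values are assumptions, and the estimator used in the report may differ.

```r
# Moment-based sample skewness: m3 / m2^(3/2),
# where m2 and m3 are the second and third central moments.
# Assumption: the skewness() used in the report may use another estimator.
moment_skewness <- function(x) {
  d  <- x - mean(x)
  m2 <- mean(d^2)
  m3 <- mean(d^3)
  m3 / m2^(3 / 2)
}

symmetric  <- c(1, 2, 3, 4, 5)
heavy_tail <- c(1, 1, 1, 1, 1, 1, 2, 2, 3, 50)  # hypothetical values

print(moment_skewness(symmetric))   # 0 for a symmetric sample
print(moment_skewness(heavy_tail))  # clearly positive
```

A single extreme event among many small ones is enough to push the statistic well above zero, which is the pattern the table above shows for most event types.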






8.1.2 Process the target data for harm on population health with respect to fatalities

To create the table with the processed data for the harm on population health with respect to fatalities from the corresponding target data subset for this perspective, a new variable was created that divides the observations for each of the included weather event types in two complementary levels:

  • one that contains the 90% of cases with lowest impact
  • the other that contains the 10% of cases with highest impact

This decision was made due to the high skewness that was observed for the values of the variable FATALITIES for most weather event types, which indicates that the underlying distributions of such phenomena have a heavy tail that causes this heterogeneity in the observations. As a result, the majority of cases that resulted in non-zero fatalities caused a small number of fatalities, while the few cases with the highest impact caused many.

Given that the average number of fatalities is used to determine which weather event types were the most harmful to population health (with respect to fatalities), and that the average does not represent highly skewed variables well because it is strongly affected by the most extreme values, it was considered necessary to examine the subsets created by those two levels in order to obtain an insightful picture.
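
Before the full implementation, the split rule can be illustrated on a single toy vector. The sketch below is a simplified two-bin version under stated assumptions: the helper name `split_low_high` and the sample values are hypothetical, and `rank(x, ties.method = "first")` is used as a readable equivalent of the `order(order(x))` idiom. The size of the low-impact bin is `floor(n * 0.9)`, so with 15 observations 13 fall in the lowest 90% and 2 in the highest 10%.

```r
# Assign each value to the lowest-90% or highest-10% bin by its rank.
# Simplified two-bin sketch; `split_low_high` is a hypothetical helper.
split_low_high <- function(x, p = 0.9) {
  labels <- c(paste0("(0% - ", p * 100, "%]"),
              paste0("(", p * 100, "% - 100%]"))
  # Size of the low-impact bin (the rest goes to the high-impact bin).
  n_low <- floor(length(x) * p)
  # Rank 1 is the smallest value; ties are broken by position.
  r <- rank(x, ties.method = "first")
  factor(ifelse(r <= n_low, labels[1], labels[2]),
         levels = labels, ordered = TRUE)
}

fatalities <- c(2, 1, 30, 1, 1, 4, 1, 1, 1, 1, 1, 7, 1, 2, 1)  # hypothetical
bins <- split_low_high(fatalities)
print(table(bins))
```

Because the labels are assigned by rank rather than by a cut-off value, tied observations can land in different bins, but the bin sizes are always exactly those given by the floor rule.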

# Create the table with the processed data 
# for the harm on population health with respect to fatalities.
processed_data_____harm_on_population_health_____fatalities <- 
  target_data_____harm_on_population_health_____fatalities[
    ,
    ## Create a new variable that divides the observations
    ## for each weather event type into two complementary groups:  
    ##   - the 90% of weather events that resulted in the lowest fatalities
    ##   - the 10% of weather events that resulted in the highest fatalities
    BIN_GROUP_PER_EVENT_TYPE := (function(x, p_bins) {
      
      # add 0 and 1 to the percentiles supplied in the argument 'p_bins' 
      # (as the start and the end of the range respectively) 
      # and sort them in ascending order
      p_bins_increasing <- sort(c(0, p_bins, 1))
      
      # create the labels of the bins from the percentiles; 
      # these labels will be the values of the new variable
      bin_labels <- paste0("(", p_bins_increasing[-length(p_bins_increasing)]*100,
                           "% - ", p_bins_increasing[-1]*100, "%]")
      
      # identify the number of observations that correspond to each label
      n_times <- vapply(2:length(p_bins_increasing),
                        function(i) {
                          as.integer(floor(length(x) * p_bins_increasing[i]) -
                                       floor(length(x) * p_bins_increasing[i - 1]))
                        }, integer(1))
      
      # repeat each label as many times as the number of its observations
      x_bins_expanded <- rep(x = bin_labels, times = n_times)
      
      # reorder the labels to match the values of the corresponding vector 
      # (order(order(x)) gives the rank of each value)
      x_bins_expanded_reordered <- x_bins_expanded[order(seq_along(x)[order(x)])]
      
      ## Coerce the character vector with the labels of the bins to an ordered 
      ## factor; 'levels' keeps both bins defined even if one of them is empty
      x_bins_factor <- factor(x_bins_expanded_reordered, levels = bin_labels, ordered = TRUE)
      
    })(FATALITIES, 0.9)
    , by = EVENT_TYPE
  ][
    ## Coerce the EVENT_TYPE variable to factor
    , EVENT_TYPE := as.factor(EVENT_TYPE) 
  ]

The table with the processed data for the harm on population health with respect to fatalities contains 4 variables:

  1. REFNUM (int) : an id that uniquely identifies each observation
  2. EVENT_TYPE (Factor w/ 26 levels) : the type of each weather event
  3. FATALITIES (int) : the number of fatalities
  4. BIN_GROUP_PER_EVENT_TYPE (Ord.factor w/ 2 levels) : a factor that divides the observations for each weather event type to two complementary levels, one with the 90% of observations with the lowest impact and another with the 10% of observations with the highest impact.

and 3175 observations.

## Classes 'data.table' and 'data.frame':   3175 obs. of  4 variables:
##  $ REFNUM                  : int  413652 413757 413763 413862 414153 414183 414184 414187 414200 414267 ...
##  $ EVENT_TYPE              : Factor w/ 26 levels "AVALANCHE","BLIZZARD",..: 21 22 13 21 16 10 16 19 13 22 ...
##  $ FATALITIES              : int  1 2 1 1 1 1 1 1 1 2 ...
##  $ BIN_GROUP_PER_EVENT_TYPE: Ord.factor w/ 2 levels "(0% - 90%]"<"(90% - 100%]": 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"






8.1.3 Summarize the processed data for harm on population health with respect to fatalities by each weather event type

To evaluate the harm on population health by each weather event type with respect to fatalities a simplistic approach was adopted :

  • the weather event types were ranked from the most harmful to the least based on the overall average number of fatalities of the weather events that resulted in non-zero fatalities

The overall average number of fatalities caused by each weather event type was initially examined along with the skewness of the number of fatalities for each weather event type. In most cases the skewness was high (or even extremely high), so it was possible that the overall mean misrepresented the consequences of each weather event type.

For this reason, the average number of fatalities for the 90% of weather events with the lowest impact and the average number of fatalities for the 10% of weather events with the highest impact were also computed and examined.

Note that for many weather event types only a few observations were available for the 10% of cases with the highest impact, so the corresponding mean values should be interpreted with caution.
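
The effect of ranking by the overall mean can be seen on a small example. The sketch below uses base R and hypothetical data for three made-up event types "A", "B" and "C". The helper name `summarise_type` is an assumption, and the column names only imitate those of the summary table. It is an illustration of the idea, not the summary code of the report.

```r
# Hypothetical non-zero fatality counts for three event types.
toy <- data.frame(
  EVENT_TYPE = rep(c("A", "B", "C"), each = 10),
  FATALITIES = c(
    1, 1, 1, 1, 1, 1, 1, 1, 1, 41,   # A: one extreme event
    3, 3, 3, 3, 3, 3, 3, 3, 3, 3,    # B: consistently moderate
    1, 1, 1, 1, 1, 1, 1, 2, 2, 2     # C: consistently low
  )
)

# Summarise one event type: overall mean plus the means of the
# lowest 90% and highest 10% of its cases (floor-based split).
summarise_type <- function(x) {
  x <- sort(x)
  n_low <- floor(length(x) * 0.9)
  c(N         = length(x),
    AVRG      = mean(x),
    AVRG_LOW  = mean(x[seq_len(n_low)]),
    AVRG_HIGH = mean(x[-seq_len(n_low)]))
}

per_type <- t(sapply(split(toy$FATALITIES, toy$EVENT_TYPE), summarise_type))

# Rank from the most harmful to the least by the overall mean.
per_type <- per_type[order(-per_type[, "AVRG"]), ]
per_type <- cbind(RANK = seq_len(nrow(per_type)), per_type)
print(per_type)
```

Type "A" ranks first on the overall mean only because of its single extreme event. Its typical case (AVRG_LOW) is milder than that of type "B", which is why the two additional aspects are reported next to the rank.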

# Create the table with the summary for the harm on population health 
# with respect to fatalities for each weather event type.
summary_____harm_on_population_health______fatalities <- 
  processed_data_____harm_on_population_health_____fatalities[
  ,
  list(
    ## The total number of observation by each weather event type.
    "N" = .N,
    ## The average number of fatalities caused by each weather event type.
    "AVRG" = round(mean(FATALITIES), 2),
    ## The skewness of fatalities for the observations by each weather event type.
    "SKEWNESS" = round(skewness(FATALITIES), 4),
    ## The number of observations for the 90% of cases with the lowest impact 
    ## by each weather event type.
    "N_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , .N],
    ## The average number of fatalities caused by each weather event type 
    ## for the 90% of cases with the lowest impact.
    "AVRG_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , round(mean(FATALITIES), 2)],
    ## The skewness of fatalities for the 90% of cases with the lowest impact 
    ## by each weather event type.
    "SKEWNESS_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , round(skewness(FATALITIES), 4)],
    ## The number of observations for the 10% of cases with the highest impact 
    ## by each weather event type.
    "N_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , .N],
    ## The average number of fatalities caused by each weather event type 
    ## for the 10% of cases with the highest impact.
    "AVRG_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , round(mean(FATALITIES), 2)],
    ## The skewness of fatalities for the 10% of cases with the highest impact 
    ## by each weather event type.
    "SKEWNESS_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , round(skewness(FATALITIES), 4)]
  ),
  by = "EVENT_TYPE"
  ][
    ## The average number of fatalities is used to order the rows of the table
    ## from the most harmful weather event type to the least.
    order(-AVRG),
    ## Create a variable with the rank of the harmfulness of each weather event type.
    RANK := 1:length(EVENT_TYPE)
    ][
      ,
      ## Reorder the variables at the table.
      list(
        RANK, EVENT_TYPE, N, AVRG, SKEWNESS, N_LOW, AVRG_LOW, SKEWNESS_LOW, N_HIGH, AVRG_HIGH, SKEWNESS_HIGH
      )
      ]

The results of the table with the summary for the harm on population health with respect to fatalities by each weather event type that was created in this section were presented at the subsection 10.1.2 Most harmful event types with respect to fatalities of the chapter 10 RESULTS.

The table with the summary for the harm on population health with respect to fatalities by each weather event type was exported (as an R file), in the folder of the working directory:

  • outputs –> harm_on_population_health –> results

with filename:

  • summary______harm_on_population_health______fatalities.R

The main reason for exporting the file with the summary for the harm on population health with respect to fatalities by each weather event type was to supply a checkpoint for any attempts to reproduce the analysis.
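
The export call itself is not shown in this section. One way to write an R object as an R source file and read it back is the base R pair `dput()` / `dget()`, sketched below on a hypothetical stand-in table written to a temporary folder. This is an assumption about how such a checkpoint could be produced and checked, and the report may have used a different function.

```r
# Hypothetical summary table standing in for the real one.
summary_tbl <- data.frame(
  RANK       = 1:2,
  EVENT_TYPE = c("A", "B"),
  AVRG       = c(5.0, 3.0),
  stringsAsFactors = FALSE
)

# Write the object as R source into a temporary folder, then read it back.
checkpoint <- file.path(tempdir(), "summary_checkpoint.R")
dput(summary_tbl, file = checkpoint)
restored <- dget(checkpoint)

# A reproduced table can be compared against the stored checkpoint.
print(all.equal(summary_tbl, restored))
```

Comparing a freshly computed summary with the stored one in this way gives a quick signal that a reproduction attempt has reached the same intermediate result.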






8.1.4 Visualize the results of the summary for the harm on population health with respect to fatalities by each weather event type

From the table with the summary for the harm on population health by each weather event type with respect to fatalities the Multiplot 1.1 was created to present an overview of the results for the three different aspects that were examined for this perspective.

The following elementary plots were created:

  • 8.1.4.1.1 Create The Plot 1.1.1
    • Displays the overall average number of fatalities caused by each weather event type based on all the cases of weather events that resulted in non-zero fatalities.
  • 8.1.4.1.2 Create The Plot 1.1.2
    • Displays the average number of fatalities caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero fatalities.
  • 8.1.4.1.3 Create The Plot 1.1.3
    • Displays the average number of fatalities caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero fatalities.
  • 8.1.4.1.4 Create The Plot 1.1.4
    • Displays a comparison for each weather event type, of the average number of fatalities for the 90% of its observations with the lowest impact versus the average number of fatalities for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero fatalities.

which were then combined in order to obtain the Multiplot 1.1.

It constitutes the PART 1 of the Figure 1 that displays the overview of the harm on population health by each weather event type.

(Note that neither the Multiplot 1.1 nor the elementary plots that it contains were presented in this section due to the restrictions imposed by the assignment to include in the report at least 1 but no more than 3 figures. The Multiplot 1.1 can be examined at the subsection 10.1.1 Overview of results for the harm on population health at the chapter 10 RESULTS, where the Figure 1 was presented, of which the Multiplot 1.1 constitutes the PART 1.)




8.1.4.1 Create the components of Multiplot 1.1

Four elementary plots were created to visualize the results for the aspects that were examined for the harm on population health with respect to fatalities by each weather event type.

  • 8.1.4.1.1 Create The Plot 1.1.1
    • Displays the overall average number of fatalities caused by each weather event type based on all the cases of weather events that resulted in non-zero fatalities.
  • 8.1.4.1.2 Create The Plot 1.1.2
    • Displays the average number of fatalities caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero fatalities.
  • 8.1.4.1.3 Create The Plot 1.1.3
    • Displays the average number of fatalities caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero fatalities.
  • 8.1.4.1.4 Create The Plot 1.1.4
    • Displays a comparison for each weather event type, of the average number of fatalities for the 90% of its observations with the lowest impact versus the average number of fatalities for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero fatalities.

The elementary plots were used to compose the Multiplot 1.1.




8.1.4.1.1 Create The Plot 1.1.1

The Plot 1.1.1 displays the overall average number of fatalities caused by each weather event type, taking into account only the observations that resulted in non-zero fatalities.

Each weather event type was matched with a number that represents its rank, assigned from the most harmful to the least with respect to population health, based on the overall average number of fatalities it caused.

The skewness of the number of fatalities for the observations of each weather event type (from which the overall average number of fatalities was computed) was encoded in the color of the point and the line associated with it.
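
The plot code that follows rebuilds EVENT_TYPE with reversed factor levels. ggplot2 places the first level of a discrete y axis at the bottom, so reversing the levels puts the alphabetically first event type at the top. The base R sketch below shows only the level reversal, on a hypothetical three-value factor, without any plotting.

```r
event_type <- factor(c("TORNADO", "AVALANCHE", "FLOOD"))  # hypothetical subset

# Default levels are sorted alphabetically.
print(levels(event_type))

# Rebuild the factor with reversed levels; the values are unchanged,
# only the level order (and therefore the axis order) differs.
reversed <- factor(event_type, levels = rev(levels(event_type)))
print(levels(reversed))
```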

# Create the Elementary Plot 1.1.1 that displays 
# the overall average number of fatalities 
# by each weather event type for all cases. 
elementary_plot_1_1_1 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_population_health______fatalities,
    mapping = aes(
      x = AVRG,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to make them displayed alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE)
        )
      ) 
    )
  ) +
  ## Draw a square shaped point to the position that corresponds to 
  ## the average number of fatalities caused by each weather event type, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(color = SKEWNESS),
    shape = 15, 
    size = 4.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average number of fatalities.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG, 
      group = EVENT_TYPE, 
      color = SKEWNESS
    )
    ) +
  ## Draw, inside the square point that displays the average, a number 
  ## that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall 
  ## average number of fatalities it caused.
  geom_text(
    mapping = aes(
      label = RANK
    ), 
    size = 2.5
  ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average number of fatalities for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 1.1 will be composed from the four elementary plots. 
    limits = c(-2, 14), 
    midpoint = 7, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
  ) +
  ## Supply descriptive labels.  
  labs(
    title = "Plot 1.1.1", 
    subtitle = "Aspect: Overall",
    x = "Average Number of Fatalities\n",
    y = "Weather Event Types \n"
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    )
  )




8.1.4.1.2 Create The Plot 1.1.2

The Plot 1.1.2 displays the average number of fatalities caused by each weather event type for the 90% of its cases with the lowest impact, among the observations that resulted in non-zero fatalities.

Each weather event type was matched with a number that represents its rank, assigned from the most harmful to the least with respect to population health, based on the overall average number of fatalities it caused (so the rank is NOT based on the average number of fatalities caused by the 90% of cases with the lowest impact of each weather event type).

The skewness of the number of fatalities for the observations of each weather event type (from which the average number of fatalities for the 90% of cases with the lowest impact was computed) was encoded in the color of the point and the line associated with it.

# Create the Elementary Plot 1.1.2 that displays 
# the average number of fatalities by each weather event type 
# for the 90% of its cases with the lowest impact.
elementary_plot_1_1_2 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_population_health______fatalities,
    mapping = aes(
      x = AVRG_LOW,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to display them alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE)
        )
      ) 
    )
  ) +
  ## Draw a circle shaped point to the position that corresponds to 
  ## the average number of fatalities caused by each weather event type
  ## for the 90% of its cases with the lowest impact, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(
      color = SKEWNESS_LOW
    ), 
    size = 3.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average number of fatalities 
  ## for the 90% of its cases with the lowest impact.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG_LOW, 
      group = EVENT_TYPE, 
      color = SKEWNESS_LOW
    )
  ) +
  ## Draw, inside the circle point that displays the average, a number 
  ## that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall 
  ## average number of fatalities it caused.
  geom_text(
    mapping = aes(
      label = RANK
    ), 
    size = 2
    ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average number of fatalities for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 1.1 will be composed from the four elementary plots.
    limits = c(-2, 14), 
    midpoint = 7, 
    low = "lightgreen",
    mid = "orange",
    high = "purple"
    ) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 1.1.2",
    subtitle = "Aspect: 90% of cases with the lowest impact",
    x = paste0(
      "Average Number of Fatalities for the 90% ", "\n",
      "of Observations with the Lowest Impact" 
    )
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    ),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.title.y = element_blank()
  )




8.1.4.1.3 Create The Plot 1.1.3

The Plot 1.1.3 displays the average number of fatalities caused by each weather event type for the 10% of its cases with the highest impact, among the observations that resulted in non-zero fatalities.

Each weather event type was matched with a number that represents its rank, assigned from the most harmful to the least with respect to population health, based on the overall average number of fatalities it caused (so the rank is NOT based on the average number of fatalities caused by the 10% of cases with the highest impact of each weather event type).

The skewness of the number of fatalities for the observations of each weather event type (from which the average number of fatalities for the 10% of cases with the highest impact was computed) was encoded in the color of the point and the line associated with it.

# Create the Elementary Plot 1.1.3 that displays 
# the average number of fatalities by each weather event type 
# for the 10% of its cases with the highest impact.
elementary_plot_1_1_3 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_population_health______fatalities,
    mapping = aes(
      x = AVRG_HIGH,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to display them alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE)
        )
      ) 
    )
  ) +
  ## Draw a diamond shaped point to the position that corresponds to 
  ## the average number of fatalities caused by each weather event type
  ## for the 10% of its cases with the highest impact, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(
      color = SKEWNESS_HIGH
    ), 
    shape = 18, 
    size = 4.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average number of fatalities 
  ## for the 10% of its cases with the highest impact.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG_HIGH, 
      group = EVENT_TYPE, 
      color = SKEWNESS_HIGH
    )
  ) +
  ## Draw, inside the diamond point that displays the average, a number 
  ## that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall 
  ## average number of fatalities it caused.
  geom_text(
    mapping = aes(
      label = RANK
    ),
    size = 2
  ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average number of fatalities for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 1.1 will be composed from the four elementary plots.
    limits = c(-2, 14), 
    midpoint = 7, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
  ) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 1.1.3",
    subtitle ="Aspect: 10% of cases with the highest impact",
    x = paste0(
      "Average Number of Fatalities for the 10% ", "\n", 
      "of Observations with the Highest Impact" 
    )
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    ),
    ### Remove the text, ticks and title of the y axis 
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.title.y = element_blank()
  )




8.1.4.1.4 Create The Plot 1.1.4

The Plot 1.1.4 displays a compact overview of all three aspect that were examined for the harm on population health with respect to fatalities.

For each weather event type, the comparison was visualized for the average number of fatalities for the 90% of cases with the lowest impact versus the average number of fatalities for the 10% of cases with the highest impact.

The weather event types were matched with a number that represents the rank which was assigned to each of them from the most harmful to the least with respect to population health, based on the overall average number of fatalities they caused.

The skewness of the number of fatalities for the observations of each weather event type (based on which the overall average number of fatalities was computed) was encoded in the fill color of the point and the label associated with each of them.

# Create the Elementary Plot 1.1.4 that displays 
# by each weather event type the comparison of 
# the average number of fatalities 
# for the 90% of cases with the lowest impact
# versus the average number of fatalities 
# for the 10% of cases with the highest impact.
elementary_plot_1_1_4 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_population_health______fatalities,
    mapping = aes(
      x = AVRG_HIGH, 
      y = AVRG_LOW
    )
  ) +
  geom_point(
    mapping = aes(
      fill = SKEWNESS
    ), 
    shape = 21
  ) +
  ## Draw a label with a number that indicates the rank assigned 
  ## to each weather event type (from the most harmful to the least) 
  ## based on the overall average number of fatalities it caused.
  geom_label_repel(
    mapping = aes(
      label = RANK, 
      fill = SKEWNESS
    ),
    size = 2.5
  ) +
  ## Adjust the scale for the fill of each label.
  scale_fill_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average number of fatalities for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 1.1 will be composed from the four elementary plots.
    limits = c(-2, 14),
    midpoint = 7, 
    low = "lightgreen",
    mid = "orange", 
    high = "purple"
  ) +
  ## Set proper limits to the plot.
  xlim(c(1, 18)) +
  ylim(c(0.75, 2)) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 1.1.4",
    subtitle = paste0(
      "Comparison of the average number of fatalities ", 
      "for the 90% of observations with the lowest impact ", 
      "versus the average number of fatalities ", 
      "for the 10% of observations with the highest impact."
    ),
    x = paste0(
      "Average Number of Fatalities by each Weather Event Type ", 
      "for the 10% of its Observations with the Highest Impact"
    ),
    y = paste0(
      "Average Number of Fatalities by each Weather Event Type ", "\n", 
      "for the 90% of its Observations with the Lowest Impact."
    ),
    ### Add a descriptive label for the legend.
    fill = paste0(
      "The color indicates the skewness ",
      "of fatalities for each weather event type. ",
      "(the color scale is common to all four plots of PART 1) ", "\n",
      "When the color of a bar is gray, the skewness was indeterminable ",
      "due to the fact that all observations for that weather event type ",
      "took the same value."
    )
  ) +
  ## Select a theme.
  theme_linedraw() +
  ## Customize the selected theme.
  theme(
    ### Adjust the legend.
    legend.position = "bottom",
    legend.direction = "horizontal",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    )
  )






8.1.4.2 Compose the Multiplot 1.1

The four elementary plots that were created from the results of the summary for the harm on population health with respect to fatalities by each weather event type were combined to construct a single multiplot that displays the complete picture for this perspective.

# Create a multiplot that displays the overview of the summary 
# for the harm on population health with respect to fatalities
# by each weather event type.
multiplot_1_1 <- arrangeGrob(
  grobs = list(
      
    # Title
    textGrob(
      label = paste0(
        "\n",
        "PART 1: Harm on population health by each weather event type ", 
        "with respect to fatalities ", "\n", 
        "based on the cases of weather events ", 
        "that resulted in non-zero fatalities.", "\n", 
        "\n"
      ),
       gp=gpar(
         fontsize = 16, 
         fontface = "bold"
       )
    ),
    
    # Subtitle
    textGrob(
      label = paste0(
          "\n", 
          "The results include only the weather event types, ", 
          "for which at least 10 observations ", 
          "that resulted in non-zero fatalities were available. ", "\n",
          "The number associated with each weather event type ", 
          "represents the rank (from the most harmful to the least) ", 
          "which was assigned based on the overall average number of fatalities.", "\n",
          "Because for most of the weather event types ", 
          "high positive skewness was observed for the number of fatalities, ",
          "the average of the 90% of cases with lowest impact ", "\n",
          "and the 10% of cases with highest impact were reported ", 
          "to provide a more representative picture of their consequences.","\n",
          "\n"
      ),
       gp=gpar(
         fontsize = 14, 
         fontface = "bold"
       )
    ),
    
    # Plot 1.1.1
    # Elementary plot for the average number of fatalities 
    # by each weather event type for all cases.
    elementary_plot_1_1_1,
    
    # Plot 1.1.2
    # Elementary plot for the average number of fatalities 
    # by each weather event type for 90% of cases with the lowest impact.
    elementary_plot_1_1_2,
    
    # Plot 1.1.3
    # Elementary plot for the average number of fatalities 
    # by each weather event type for 10% of cases with the highest impact.
    elementary_plot_1_1_3,
    
    # Plot 1.1.4
    # Elementary plot for the comparison of 
    # the average number of fatalities 
    # for the 90% of cases with the lowest impact versus 
    # the 10% of cases with the highest impact.
    elementary_plot_1_1_4
  ),
  # Set the layout for these elementary plots.
  layout_matrix = 
    matrix(
      c(1,1,1,1,1,1,1,1,1,
        2,2,2,2,2,2,2,2,2,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6
      ),
      byrow = TRUE, 
      nrow = 13
    )
)
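
As a rough check of the layout above, the following base R sketch (not part of the original analysis) rebuilds the same layout matrix and counts how many cells of the 13 x 9 grid each grob occupies. The names layout_cells and cells_per_grob are hypothetical. The numbers 1 to 6 index the elements of grobs in the order they were listed (title, subtitle, Plots 1.1.1 to 1.1.4), and NA marks a cell that is left empty.

```r
# Rebuild the layout matrix that was passed to arrangeGrob() (illustrative only).
layout_cells <- matrix(
  c(
    rep(1, 9),                                      # row 1: title
    rep(2, 9),                                      # row 2: subtitle
    rep(c(3, 3, 3, 3, 3, 4, 4, 5, 5), times = 5),   # rows 3-7: Plots 1.1.1 to 1.1.3
    rep(c(NA, 6, 6, 6, 6, 6, 6, 6, 6), times = 6)   # rows 8-13: Plot 1.1.4
  ),
  byrow = TRUE,
  nrow = 13
)

# Count the grid cells assigned to each grob (NA = empty cell).
cells_per_grob <- table(layout_cells, useNA = "ifany")
print(dim(layout_cells))
print(cells_per_grob)
```

Under this layout the comparison plot (grob 6) receives the largest share of the figure, and the first column of its rows is left blank.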

(Note that the Multiplot 1.1 was NOT presented in this section due to the restriction imposed by the assignment to include in the report at least 1 but no more than 3 figures. It can be examined at the subsection 10.1.1 Overview of results for the harm on population health of the chapter 10 RESULTS, where the Figure 1 was presented, of which the Multiplot 1.1 constitutes the PART 1.)










8.2 Harm On Population Health With Respect To Injuries By Each Weather Event Type

Summary

The required variables and the target data subset of observations for the harm on population health with respect to injuries were extracted from the table with the processed data, and processed to create a new variable that divided the observations for each of the included weather event types into two complementary groups:

  • the 90% of observations with the lowest impact
  • the 10% of observations with the highest impact

before the information for the harm on population health with respect to injuries was summarized by each weather event type.

Three aspects were examined:

  1. The overall average number of injuries by each weather event type.
  2. The average number of injuries by each weather event type for the 90% of cases with the lowest impact.
  3. The average number of injuries by each weather event type for the 10% of cases with the highest impact.

For each aspect, the average number of injuries by each weather event type, the number of its available observations (based on which the average was computed) and their skewness were examined.

The overall average number of injuries was used as the main criterion to determine which weather events caused the most harm on population health with respect to injuries, but it is important to take into account the other two aspects that were presented in order to obtain a more insightful and complete ‘picture’ of their consequences (especially given the fact that for most of the weather event types the injuries were highly positively skewed).

The table with the results for the harm on population health with respect to injuries by each weather event type was presented at the subsection 10.1.3 Most harmful event types with respect to injuries of the chapter 10 RESULTS.

Finally, the Multiplot 1.2 was created to visualize the results of the harm on population health with respect to injuries by each weather event type.

(Note that neither the Multiplot 1.2 nor the elementary plots that it contains were presented in this section due to the restriction imposed by the assignment to include in the report at least 1 but no more than 3 figures. They can be examined at the subsection 10.1.1 Overview of results for the harm on population health of the chapter 10 RESULTS, where the Figure 1 was presented, of which the Multiplot 1.2 constitutes the PART 2.)

Steps




8.2.1 Extract the target data for harm on population health with respect to injuries

In order to examine the harm on population health with respect to injuries caused by each weather event type, the variables REFNUM, EVENT_TYPE and INJURIES were selected from the table with the processed data and only the observations that refer to weather events that resulted in non-zero injuries were extracted.

Furthermore, in an attempt to avoid highly misleading statistics due to the small number of observations for some of the weather event types, a lower bound of 10 weather events that caused non-zero injuries (for each of the included weather event types) was selected (subjectively by the analyst) and applied.

Although this lower bound may seem (and generally is) too small to yield trustworthy statistics, it was considered to be “good enough” taking into account that:

  1. the analysis focuses on describing historical data without trying to make inferences that would demand substantially bigger samples. Even so, a statistic based on fewer than 10 observations could not be taken seriously, especially in cases (such as in this analysis) where the distribution of injuries for each weather event type was skewed.
  2. a period of 10 years (from 2001 to 2011), in which the observations that were used in the analysis occurred, is a relatively short time to produce big samples of weather events that caused non-zero injuries for some of the weather event types. Thus, if a higher bound such as 100 or 300 observations had been selected to get more robust statistics, the majority of weather event types would have been excluded, making the results of the analysis trivial.
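
The extraction code is not displayed in this subsection. A minimal base R sketch of the two filters described above (non-zero injuries, then at least 10 such events per type) could look as follows. The table toy_processed and its values are hypothetical stand-ins for the table with the processed data, and the analysis itself may well have used data.table syntax instead.

```r
# Hypothetical stand-in for the table with the processed data.
toy_processed <- data.frame(
  REFNUM = 1:20,
  EVENT_TYPE = c(rep("TORNADO", 14), rep("DUST DEVIL", 6)),
  INJURIES = c(rep(c(1, 0), times = c(12, 2)), rep(c(2, 0), times = c(3, 3))),
  stringsAsFactors = FALSE
)

# Keep the three variables and only the events with non-zero injuries.
toy_nonzero <- toy_processed[
  toy_processed$INJURIES > 0,
  c("REFNUM", "EVENT_TYPE", "INJURIES")
]

# Keep only the event types with at least 10 such events (the lower bound).
lower_bound <- 10
n_per_type <- table(toy_nonzero$EVENT_TYPE)
kept_types <- names(n_per_type)[n_per_type >= lower_bound]
toy_target <- toy_nonzero[toy_nonzero$EVENT_TYPE %in% kept_types, ]

print(table(toy_target$EVENT_TYPE))
```

In this toy input the second event type has only 3 events with non-zero injuries, so it is dropped by the lower bound.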

The table with the target data for the harm on population health with respect to injuries consists of 5581 observations.

## Classes 'data.table' and 'data.frame':   5581 obs. of  3 variables:
##  $ REFNUM    : int  413614 413649 413652 413663 413737 413743 413746 413757 413763 413795 ...
##  $ EVENT_TYPE: chr  "TORNADO" "THUNDERSTORM WIND" "THUNDERSTORM WIND" "THUNDERSTORM WIND" ...
##  $ INJURIES  : int  4 2 4 1 6 1 1 1 1 1 ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"

The variable EVENT_TYPE includes 27 distinct weather event types, for most of which the variable INJURIES was highly positively skewed.

Table 8.2.1-1: Facts about the table with the target data subset of observations for the harm on population health with respect to injuries.

EVENT_TYPE                    N    SKEWNESS
AVALANCHE                    80      3.2455
BLIZZARD                     12      2.0441
DEBRIS FLOW                  12      0.6818
DENSE FOG                    20      1.4182
DUST DEVIL                   10      1.8590
DUST STORM                   22      1.5095
EXCESSIVE HEAT               86      4.1751
FLASH FLOOD                 190      9.4282
FLOOD                        61      4.6609
HAIL                        109      5.8015
HEAT                         36      2.1619
HEAVY RAIN                   50      4.0900
HEAVY SNOW                   31      4.3682
HIGH SURF                    54      5.7692
HIGH WIND                   220     10.7119
HURRICANE/TYPHOON            15      2.7730
ICE STORM                    25      3.4714
LIGHTNING                  1411      6.6360
MARINE THUNDERSTORM WIND     11      2.2867
RIP CURRENT                 149      4.5935
STRONG WIND                 142      2.9883
THUNDERSTORM WIND          1236      9.0224
TORNADO                    1252     16.3086
TROPICAL STORM               19      3.8833
WILDFIRE                    230      5.8510
WINTER STORM                 51      3.1228
WINTER WEATHER               47      4.1679

Note: The skewness was rounded to 4 decimal places.

It is worth noting that the weather event types with the highest number of observations tended to show the highest skewness for the values of injuries, indicating that the corresponding distribution of injuries has a heavy tail that could not be observed when only a few observations were available.
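
The package that supplied the skewness() function is not stated in this section. A minimal sketch, assuming the usual moment-based definition (the third central moment divided by the second central moment raised to the power 3/2), is given below. The helper sample_skewness is hypothetical and is not the function used in the analysis.

```r
# Moment-based sample skewness (an assumed definition, for illustration only).
sample_skewness <- function(x) {
  centered <- x - mean(x)
  m2 <- mean(centered^2)   # second central moment
  m3 <- mean(centered^3)   # third central moment
  m3 / m2^(3 / 2)
}

print(sample_skewness(c(1, 1, 1, 1, 10)))  # heavy right tail: positive value
print(sample_skewness(c(1, 2, 3)))          # symmetric values: zero
print(sample_skewness(c(2, 2, 2)))          # constant values: NaN (0 / 0)
```

The last call shows why the skewness can be indeterminable: when all observations of a weather event type take the same value, both moments are zero and the ratio is undefined.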






8.2.2 Process the target data for harm on population health with respect to injuries

To create the table with the processed data for the harm on population health with respect to injuries from the corresponding target data subset for this perspective, a new variable was created that divides the observations for each of the included weather event types into two complementary levels:

  • one that contains the 90% of cases with lowest impact
  • the other that contains the 10% of cases with highest impact

This decision was made due to the high skewness that was observed for the values of the variable INJURIES for most weather event types, which indicates that the underlying distributions of such phenomena have a heavy tail that causes this heterogeneity in the observations. As a result, a small number of injuries was observed for the majority of cases that resulted in non-zero injuries, while the few cases with the highest impact caused a large number of injuries.

The average number of injuries was used to determine which weather event types were the most harmful to population health (with respect to injuries). Because the average does not represent well the distribution of variables with high skewness, as it is highly affected by the most extreme values, it was considered necessary to examine the subsets created by those two levels in order to obtain an insightful picture.

# Create the table with the processed data 
# for the harm on population health with respect to injuries.
processed_data_____harm_on_population_health_____injuries <- 
  target_data_____harm_on_population_health_____injuries[
    ,
    ## Create a new variable that divides the observations
    ## for each weather event type into two complementary groups:  
    ##   - the 90% of weather events that resulted in the fewest injuries
    ##   - the 10% of weather events that resulted in the most injuries
    BIN_GROUP_PER_EVENT_TYPE := (function(x, p_bins) {
      
      # add 0 and 1 to the start and the end, respectively, 
      # of the percentiles supplied at the argument 'p_bins' 
      # and sort them in ascending order
      # (this assumes that 0 and 1 are not already included in 'p_bins')
      p_bins_increasing <- sort(c(0, p_bins, 1))
      
      # create the character strings that label the bins defined by the values 
      # supplied at the argument 'p_bins'; they will be the values of the new variable
      bin_labels <- paste0("(", p_bins_increasing[-length(p_bins_increasing)]*100,
                           "% - ", p_bins_increasing[-1]*100, "%]")
      
      # identify the number of occurrences that correspond to each label
      n_times <- vapply(2:length(p_bins_increasing),
                        function(i) {
                          as.integer(floor(length(x) * p_bins_increasing[i]) -
                                       floor(length(x) * p_bins_increasing[i - 1]))
                        }, integer(1))
      
      # repeat each label as many times as its number of occurrences
      x_bins_expanded <- rep(x = bin_labels, times = n_times)
      
      # order the labels to match the values of the corresponding vector
      x_bins_expanded_reordered <- x_bins_expanded[order(seq_along(x)[order(x)])]
      
      # coerce the character vector with the labels of bins to an ordered factor
      # (the levels are set explicitly, so their order does not depend 
      # on the alphabetical order of the labels)
      x_bins_factor <- factor(x_bins_expanded_reordered, levels = bin_labels, ordered = TRUE)
      
    })(INJURIES, 0.9)
    , by = EVENT_TYPE
  ][
    ## Coerce the EVENT_TYPE variable to factor
    , EVENT_TYPE := as.factor(EVENT_TYPE) 
  ]
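
To illustrate what the new variable looks like, here is a simplified base R sketch of the same rank-based split for a single weather event type. The helper split_low_high and the toy vector are hypothetical, and the sketch handles only one cut point, whereas the function above accepts several.

```r
# Simplified, single-cut-point version of the rank-based split (illustration only).
split_low_high <- function(x, p = 0.9) {
  labels <- c(
    paste0("(0% - ", p * 100, "%]"),
    paste0("(", p * 100, "% - 100%]")
  )
  # Number of observations that fall in the lower group.
  n_low <- floor(length(x) * p)
  # Labels listed in increasing order of the values: low group first, then high group.
  labels_by_rank <- rep(labels, times = c(n_low, length(x) - n_low))
  # Map each observation to the label of its rank (ties are broken by position).
  factor(
    labels_by_rank[rank(x, ties.method = "first")],
    levels = labels,
    ordered = TRUE
  )
}

toy_injuries <- c(1, 1, 2, 1, 50, 1, 3, 1, 1, 2)
toy_groups <- split_low_high(toy_injuries)
print(table(toy_groups))
print(toy_injuries[toy_groups == "(90% - 100%]"])
```

With 10 observations, 9 fall in the lower group and the single most extreme event forms the upper group.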

The table with the processed data for the harm on population health with respect to injuries contains 4 variables:

  1. REFNUM (int) : an id that uniquely identifies each observation
  2. EVENT_TYPE (Factor w/ 27 levels) : the type of each weather event
  3. INJURIES (int) : the number of injuries
  4. BIN_GROUP_PER_EVENT_TYPE (Ord.factor w/ 2 levels) : a factor that divides the observations for each weather event type into two complementary levels, one with the 90% of observations with the lowest impact and another with the 10% of observations with the highest impact.

and 5581 observations.

## Classes 'data.table' and 'data.frame':   5581 obs. of  4 variables:
##  $ REFNUM                  : int  413614 413649 413652 413663 413737 413743 413746 413757 413763 413795 ...
##  $ EVENT_TYPE              : Factor w/ 27 levels "AVALANCHE","BLIZZARD",..: 23 22 22 22 22 22 22 23 15 18 ...
##  $ INJURIES                : int  4 2 4 1 6 1 1 1 1 1 ...
##  $ BIN_GROUP_PER_EVENT_TYPE: Ord.factor w/ 2 levels "(0% - 90%]"<"(90% - 100%]": 1 1 1 1 2 1 1 1 1 1 ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"






8.2.3 Summarize the processed data for harm on population health with respect to injuries by each weather event type

To evaluate the harm on population health by each weather event type with respect to injuries, a simplistic approach was adopted:

  • the weather event types were ranked from the most harmful to the least based on the overall average number of injuries of the weather events that resulted in non-zero injuries

The overall average number of injuries caused by each weather event type was initially examined along with the skewness of the number of injuries for each weather event type. In most cases the skewness was high (or even extremely high), so it was possible that the overall mean misrepresented the consequences of each weather event type.

That is the reason why the average number of injuries for the 90% of weather events with the lowest impact and the average number of injuries for the 10% of weather events with the highest impact were also computed and examined.

It is highlighted that for the average number of injuries that refers to the 10% of the cases that had the highest impact, there were few observations available for a lot of weather event types and the corresponding mean values should be interpreted with caution.

# Create the table with the summary for the harm on population health 
# with respect to injuries for each weather event type.
summary_____harm_on_population_health______injuries <- 
  processed_data_____harm_on_population_health_____injuries[
  ,
  list(
    ## The total number of observations by each weather event type.
    "N" = .N,
    ## The average number of injuries caused by each weather event type.
    "AVRG" = round(mean(INJURIES), 2),
    ## The skewness of injuries for the observations by each weather event type.
    "SKEWNESS" = round(skewness(INJURIES), 4),
    ## The number of observations for the 90% of cases with the lowest impact 
    ## by each weather event type.
    "N_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , .N],
    ## The average number of injuries caused by each weather event type 
    ## for the 90% of cases with the lowest impact.
    "AVRG_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , round(mean(INJURIES), 2)],
    ## The skewness of injuries for the 90% of cases with the lowest impact 
    ## by each weather event type.
    "SKEWNESS_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , round(skewness(INJURIES), 4)],
    ## The number of observations for the 10% of cases with the highest impact 
    ## by each weather event type.
    "N_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , .N],
    ## The average number of injuries caused by each weather event type 
    ## for the 10% of cases with the highest impact.
    "AVRG_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , round(mean(INJURIES), 2)],
    ## The skewness of injuries for the 10% of cases with the highest impact 
    ## by each weather event type.
    "SKEWNESS_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , round(skewness(INJURIES), 4)]
  ),
  by = "EVENT_TYPE"
  ][
    ## The average number of injuries determines the rank:
    ## ranks are assigned from the most harmful weather event type to the least
    ## (the rows of the table themselves are not reordered by this step).
    order(-AVRG),
    ## Create a variable with the rank of the harmfulness of each weather event type.
    RANK := seq_len(.N)
    ][
      ,
      ## Reorder the variables at the table.
      list(
        RANK, EVENT_TYPE, N, AVRG, SKEWNESS, N_LOW, AVRG_LOW, SKEWNESS_LOW, N_HIGH, AVRG_HIGH, SKEWNESS_HIGH
      )
      ]
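
The following base R sketch mirrors the logic of the summary on a toy input, to show how the three averages and the rank relate to each other. The names toy_events, summarise_type and toy_summary, as well as the values, are hypothetical. Ties in the rank are broken by position here, which is an assumption of the sketch.

```r
# Toy input: two event types with 10 events each (illustration only).
toy_events <- data.frame(
  EVENT_TYPE = rep(c("A", "B"), each = 10),
  INJURIES = c(rep(1, 9), 91, rep(2, 9), 12),
  stringsAsFactors = FALSE
)

# Overall, low-impact (90%) and high-impact (10%) averages for one event type.
summarise_type <- function(x, p = 0.9) {
  is_high <- rank(x, ties.method = "first") > floor(length(x) * p)
  c(
    N = length(x),
    AVRG = mean(x),
    AVRG_LOW = mean(x[!is_high]),
    AVRG_HIGH = mean(x[is_high])
  )
}

toy_summary <- as.data.frame(
  do.call(
    rbind,
    lapply(split(toy_events$INJURIES, toy_events$EVENT_TYPE), summarise_type)
  )
)

# Rank from the most harmful to the least, based on the overall average.
toy_summary$RANK <- rank(-toy_summary$AVRG, ties.method = "first")
print(toy_summary)
```

In this toy case type A ranks first on the overall average although its typical (90%) event is milder than that of type B, which is the kind of situation the two additional aspects are meant to expose.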

The results of the table with the summary for the harm on population health with respect to injuries by each weather event type that was created in this section were presented at the subsection 10.1.3 Most harmful event types with respect to injuries of the chapter 10 RESULTS.

The table with the summary for the harm on population health with respect to injuries by each weather event type was exported (as an R file) to the following folder of the working directory:

  • outputs –> harm_on_population_health –> results

with filename:

  • summary______harm_on_population_health______injuries.R

The main reason for exporting the file with the summary for the harm on population health with respect to injuries by each weather event type was to supply a checkpoint for any attempts to reproduce the analysis.
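
The export call itself is not shown in this section. One common way to write an R object to a plain-text .R file and read it back is dput() together with dget(); the sketch below assumes that approach purely for illustration. The names toy_table and checkpoint_path are hypothetical, and a temporary directory is used instead of the outputs folder.

```r
# Toy table standing in for the summary (illustration only).
toy_table <- data.frame(
  RANK = 1:2,
  EVENT_TYPE = c("A", "B"),
  AVRG = c(10, 3),
  stringsAsFactors = FALSE
)

# Write the object as R code to a temporary file, then read it back.
checkpoint_path <- file.path(tempdir(), "toy_table.R")
dput(toy_table, file = checkpoint_path)
restored_table <- dget(checkpoint_path)

# Compare the restored object with the original.
print(all.equal(restored_table, toy_table))
```

A comparison of this kind is what makes such a file usable as a checkpoint: a reproduction attempt can recompute the summary and compare it with the stored version.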






8.2.4 Visualize the results of the summary for the harm on population health with respect to injuries by each weather event type

From the table with the summary for the harm on population health by each weather event type with respect to injuries the Multiplot 1.2 was created to present an overview of the results for the three different aspects that were examined for this perspective.

Four elementary plots were created:

  • 8.2.4.1.1 Create The Plot 1.2.1
    • Displays the overall average number of injuries caused by each weather event type based on all the cases of weather events that resulted in non-zero injuries.
  • 8.2.4.1.2 Create The Plot 1.2.2
    • Displays the average number of injuries caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero injuries.
  • 8.2.4.1.3 Create The Plot 1.2.3
    • Displays the average number of injuries caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero injuries.
  • 8.2.4.1.4 Create The Plot 1.2.4
    • Displays a comparison for each weather event type, of the average number of injuries for the 90% of its observations with the lowest impact versus the average number of injuries for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero injuries.

which were then combined in order to obtain the Multiplot 1.2.

It constitutes the PART 2 of the Figure 1 that displays the overview of the harm on population health by each weather event type.

(Note that neither the Multiplot 1.2 nor the elementary plots that it contains were presented in this section due to the restriction imposed by the assignment to include in the report at least 1 but no more than 3 figures. They can be examined at the subsection 10.1.1 Overview of results for the harm on population health of the chapter 10 RESULTS, where the Figure 1 was presented, of which the Multiplot 1.2 constitutes the PART 2.)




8.2.4.1 Create the components of Multiplot 1.2

Four elementary plots were created to visualize the results for the aspects that were examined for the harm on population health with respect to injuries by each weather event type:

  • 8.2.4.1.1 Create The Plot 1.2.1
    • Displays the overall average number of injuries caused by each weather event type based on all the cases of weather events that resulted in non-zero injuries.
  • 8.2.4.1.2 Create The Plot 1.2.2
    • Displays the average number of injuries caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero injuries.
  • 8.2.4.1.3 Create The Plot 1.2.3
    • Displays the average number of injuries caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero injuries.
  • 8.2.4.1.4 Create The Plot 1.2.4
    • Displays a comparison for each weather event type, of the average number of injuries for the 90% of its observations with the lowest impact versus the average number of injuries for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero injuries.




8.2.4.1.1 Create The Plot 1.2.1

The Plot 1.2.1 displays the overall average number of injuries caused by each weather event type, taking into account all and only the observations that resulted in non-zero injuries.

The weather event types were matched with a number that represents the rank which was assigned to each of them from the most harmful to the least with respect to population health, based on the overall average number of injuries they caused.

The skewness of the number of injuries for the observations of each weather event type (based on which the overall average number of injuries was computed) was encoded in the color of the point and the line associated with each of them.

# Create the Elementary Plot 1.2.1 that displays 
# the overall average number of injuries 
# by each weather event type for all cases. 
elementary_plot_1_2_1 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_population_health______injuries,
    mapping = aes(
      x = AVRG,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to make them displayed alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE))
      ) 
    )
  ) +
  ## Draw a square shaped point to the position that corresponds to 
  ## the average number of injuries caused by each weather event type, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(color = SKEWNESS),
    shape = 15, 
    size = 4.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average number of injuries.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG, 
      group = EVENT_TYPE, 
      color = SKEWNESS
    )
  ) +
  ## Draw a number that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall average
  ## number of injuries it caused inside the square point 
  ## that displays the average.
  geom_text(
    mapping = aes(
      label = RANK
    ), 
    size = 2.5
  ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average number of injuries for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 1.2 will be composed from the four elementary plots. 
    limits = c(-2, 17), 
    midpoint = 7, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
  ) +
  ## Supply descriptive labels.  
  labs(
    title = "Plot 1.2.1", 
    subtitle = "Aspect: Overall",
    x = "Average Number of Injuries\n",
    y = "Weather Event Types \n"
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    )
  )
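
The shared limits only work if every skewness value that has to be drawn lies inside them, because continuous ggplot2 scales by default treat values outside the limits as missing and draw them in the NA color (gray). A small base R sketch of such a check follows. The helper inside_limits is hypothetical, and the example values are the lowest and the highest overall skewness from Table 8.2.1-1 plus a NaN that stands for a group whose observations all took the same value; the skewness of the 90% and 10% subsets is not listed in this section and is therefore not checked here.

```r
# Flag the values that a scale with the given limits can display (illustration only).
inside_limits <- function(values, limits) {
  !is.na(values) & values >= limits[1] & values <= limits[2]
}

example_skewness <- c(TORNADO = 16.3086, `DEBRIS FLOW` = 0.6818, CONSTANT = NaN)

# Limits used for the injuries plots (Multiplot 1.2).
print(inside_limits(example_skewness, limits = c(-2, 17)))

# Limits used for the fatalities plots (Multiplot 1.1), for comparison.
print(inside_limits(example_skewness, limits = c(-2, 14)))
```

The second call indicates why the upper limit was raised for the injuries plots: the limits c(-2, 14) would not cover the overall skewness of TORNADO.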




8.2.4.1.2 Create The Plot 1.2.2

The Elementary Plot 1.2.2 displays the average number of injuries for the 90% of cases with the lowest impact caused by each weather event type from all the observations that resulted in non-zero injuries.

The weather event types were matched with a number that represents the rank which was assigned to each of them from the most harmful to the least with respect to population health, based on the overall average number of injuries they caused (so it is NOT based on the average number of injuries caused by the 90% of cases with the lowest impact of each weather event type).

The skewness of the number of injuries for the observations of each weather event type (based on which the average number of injuries for the 90% of cases with the lowest impact was computed) was encoded in the color of the point and the line associated with each of them.

# Create the Elementary Plot 1.2.2 that displays 
# the average number of injuries by each weather event type 
# for the 90% of its cases with the lowest impact.
elementary_plot_1_2_2 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_population_health______injuries,
    mapping = aes(
      x = AVRG_LOW,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to display them alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE))
      ) 
    )
  ) +
  ## Draw a circle shaped point to the position that corresponds to 
  ## the average number of injuries caused by each weather event type
  ## for the 90% of its cases with the lowest impact, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(
      color = SKEWNESS_LOW
    ), 
    size = 3.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average number of injuries 
  ## for the 90% of its cases with the lowest impact.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG_LOW, 
      group = EVENT_TYPE, 
      color = SKEWNESS_LOW
    )
  ) +
  ## Draw a number that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall average
  ## number of injuries it caused inside the circle shaped point 
  ## that displays the average.
  geom_text(
    mapping = aes(
      label = RANK
    ), 
    size = 2
  ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average number of injuries for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 1.2 will be composed from the four elementary plots.
    limits = c(-2, 17), 
    midpoint = 7, 
    low = "lightgreen",
    mid = "orange",
    high = "purple"
  ) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 1.2.2",
    subtitle = "Aspect: 90% of cases with the lowest impact",
    x = paste0(
      "Average Number of Injuries for the 90% ", "\n",
      "of Observations with the Lowest Impact" 
    )
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    ),
    ### Remove the text, ticks and title of the y axis.
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.title.y = element_blank()
  )
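
The shared limits of the color scale only serve their purpose if every skewness value of the three aspects falls inside them. As far as the documented defaults of ggplot2 go, a continuous scale replaces values outside `limits` with NA and draws them with the `na.value` color (a gray), so an out-of-range skewness would look the same as an undefined one. The sketch below is a minimal check in base R; the values in `skewness_values` are hypothetical and only illustrate the idea.

```r
# Hypothetical skewness values (illustrative only, not results of the analysis).
skewness_values <- c(overall = 12.4, low = 1.8, high = -0.6, undefined = NaN)

# The limits shared by all elementary plots of the multiplot.
shared_limits <- c(-2, 17)

# TRUE where a defined value lies inside the shared limits;
# undefined values (NA or NaN) are reported as FALSE.
inside_limits <- !is.na(skewness_values) &
  skewness_values >= shared_limits[1] &
  skewness_values <= shared_limits[2]

print(inside_limits)
```

Any FALSE for a defined value would suggest widening the limits before the elementary plots are combined.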




8.2.4.1.3 Create The Plot 1.2.3

The Plot 1.2.3 displays, for each weather event type, the average number of injuries for the 10% of cases with the highest impact, computed from the observations that resulted in non-zero injuries.

Each weather event type was matched with a number that represents its rank, assigned from the most harmful to the least with respect to population health, based on the overall average number of injuries it caused
(so the rank is NOT based on the average number of injuries caused by the 10% of cases with the highest impact of each weather event type).

The skewness of the number of injuries for the observations of each weather event type (based on which the average number of injuries for the 10% of cases with the highest impact was computed) was encoded in the color of the point and line associated with it.

# Create the Elementary Plot 1.2.3 that displays 
# the average number of injuries by each weather event type 
# for the 10% of its cases with the highest impact.
elementary_plot_1_2_3 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_population_health______injuries,
    mapping = aes(
      x = AVRG_HIGH,
      ### Reverse the order of the levels of the EVENT_TYPE variable 
      ### to display them alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE))
      ) 
    )
  ) +
  ## Draw a diamond shaped point to the position that corresponds to 
  ## the average number of injuries caused by each weather event type
  ## for the 10% of its cases with the highest impact, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(
      color = SKEWNESS_HIGH
    ), 
    shape = 18, 
    size = 4.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average number of injuries 
  ## for the 10% of its cases with the highest impact.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG_HIGH, 
      group = EVENT_TYPE, 
      color = SKEWNESS_HIGH
    )
  ) +
  ## Draw a number that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall average
  ## number of injuries it caused inside the diamond shaped point 
  ## that displays the average.
  geom_text(
    mapping = aes(
      label = RANK
    ),
    size = 2
  ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average number of injuries for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 1.2 will be composed from the four elementary plots.
    limits = c(-2, 17), 
    midpoint = 7, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
  ) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 1.2.3",
    subtitle = "Aspect: 10% of cases with the highest impact",
    x = paste0(
      "Average Number of Injuries for the 10% ", "\n", 
      "of Observations with the Highest Impact" 
    )
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    ),
    ### Remove the text, ticks and title of the y axis 
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.title.y = element_blank()
  )




8.2.4.1.4 Create The Plot 1.2.4

The Plot 1.2.4 displays a compact overview of all three aspects that were examined for the harm on population health with respect to injuries.

For each weather event type, the comparison was visualized for the average number of injuries for the 90% of cases with the lowest impact versus the average number of injuries for the 10% of cases with the highest impact.

The weather event types were matched with a number that represents the rank which was assigned to each of them from the most harmful to the least with respect to population health, based on the overall average number of injuries they caused.

The skewness of the number of injuries for the observations of each weather event type (based on which the overall average number of injuries was computed) was encoded in the fill color of the point and label associated with it.

# Create the Elementary Plot 1.2.4 that displays 
# by each weather event type the comparison of 
# the average number of injuries 
# for the 90% of cases with the lowest impact
# versus the average number of injuries 
# for the 10% of cases with the highest impact.
elementary_plot_1_2_4 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_population_health______injuries,
    mapping = aes(
      x = AVRG_HIGH, 
      y = AVRG_LOW
    )
  ) +
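  ## Draw a filled circle at the position given by the two averages 
  ## of each weather event type, of which the fill indicates 
  ## the overall skewness of its observations.
  geom_point(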
    mapping = aes(
      fill = SKEWNESS
    ), 
    shape = 21
  ) +
  ## Draw a label with a number that indicates the rank assigned 
  ## to each weather event type (from the most harmful to the least) 
  ## based on the overall average number of injuries it caused.
  geom_label_repel(
    mapping = aes(
      label = RANK, 
      fill = SKEWNESS
    ),
    size = 2.5
  ) +
  ## Adjust the scale for the fill of each label.
  scale_fill_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average number of injuries for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 1.2 will be composed from the four elementary plots.
    limits = c(-2, 17),
    midpoint = 7, 
    low = "lightgreen",
    mid = "orange", 
    high = "purple"
  ) +
  ## Set proper limits to the plot.
  xlim(c(-20, 550)) +
  ylim(c(-1, 17)) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 1.2.4",
    subtitle = paste0(
      "Comparison of the average number of injuries ", 
      "for the 90% of observations with the lowest impact ", 
      "versus the average number of injuries ", 
      "for the 10% of observations with highest impact. "
    ),
    x = paste0(
      "Average Number of Injuries by each Weather Event Type ", 
      "for the 10% of its Observations with the Highest Impact"
    ),
    y = paste0(
      "Average Number of Injuries by each Weather Event Type ", "\n", 
      "for the 90% of its Observations with the Lowest Impact."
    ),
    ### Add a descriptive label for the legend.
    fill = paste0(
      "The color indicates the skewness ",
      "of injuries for each weather event type. ",
      "(the color scale is the same for all four plots of PART 2) ", "\n",
      "When the color of a bar is gray, the skewness was indeterminable ",
      "due to the fact that all observations for that weather event type ",
      "took the same value."
    )
  ) +
  ## Select a theme.
  theme_linedraw() +
  ## Customize the selected theme.
  theme(
    ### Adjust the legend.
    legend.position = "bottom",
    legend.direction = "horizontal",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    )
  )






8.2.4.2 Compose the Multiplot 1.2

The four elementary plots that were created from the results of the summary for the harm on population health with respect to injuries by each weather event type were combined into a single multiplot that displays the complete picture for this perspective.

# Create a multiplot that displays the overview of the summary 
# for the harm on population health with respect to injuries
# by each weather event type.
multiplot_1_2 <- arrangeGrob(
  grobs = list(
      
    # Title
    textGrob(
      label = paste0(
        "\n",
        "PART 2: Harm on population health by each weather event type ", 
        "with respect to injuries ", "\n", 
        "based on the cases of weather events ", 
        "that resulted in non-zero injuries.", "\n", 
        "\n"
      ),
      gp = gpar(
        fontsize = 16, 
        fontface = "bold"
      )
    ),
    
    # Subtitle
    textGrob(
      label = paste0(
          "\n", 
          "The results include only the weather event types, ", 
          "for which at least 10 observations ", 
          "that resulted in non-zero injuries were available. ", "\n",
          "The number associated with each weather event type ", 
          "represents the rank (from the most harmful to the least) ", 
          "which was assigned based on the overall average number of injuries.", "\n",
          "Because for most of the weather event types ", 
          "high positive skewness was observed for the number of injuries, ",
          "the average of the 90% of cases with lowest impact ", "\n",
          "and the 10% of cases with highest impact were reported ", 
          "to provide a more representative picture of their consequences.","\n",
          "\n"
      ),
      gp = gpar(
        fontsize = 14, 
        fontface = "bold"
      )
    ),
    
    # ELEMENTARY PLOT 1.2.1
    # Elementary plot for the average number of injuries 
    # by each weather event type for all cases.
    elementary_plot_1_2_1,
    
    # ELEMENTARY PLOT 1.2.2
    # Elementary plot for the average number of injuries 
    # by each weather event type for 90% of cases with the lowest impact.
    elementary_plot_1_2_2,
    
    # ELEMENTARY PLOT 1.2.3
    # Elementary plot for the average number of injuries 
    # by each weather event type for 10% of cases with the highest impact.
    elementary_plot_1_2_3,
    
    # ELEMENTARY PLOT 1.2.4
    # Elementary plot for the comparison of 
    # the average number of injuries 
    # for the 90% of cases with the lowest impact versus 
    # the 10% of cases with the highest impact.
    elementary_plot_1_2_4
  ),
  # Set the layout for these elementary plots.
  layout_matrix = 
    matrix(
      c(1,1,1,1,1,1,1,1,1,
        2,2,2,2,2,2,2,2,2,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6
      ),
      byrow = TRUE, 
      nrow = 13
    )
)
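
In the layout matrix, each number is the position of a grob in the `grobs` list and each cell is one unit of a grid with 13 rows and 9 columns, while NA leaves a cell empty. The sketch below rebuilds the same matrix in base R and counts the cells occupied by each grob; `layout` and `cells_per_grob` are names introduced here for illustration only.

```r
# Rebuild the layout matrix row by row:
#   1 = title, 2 = subtitle, 3 to 6 = the four elementary plots.
layout <- matrix(
  c(rep(1, 9),
    rep(2, 9),
    rep(c(3, 3, 3, 3, 3, 4, 4, 5, 5), times = 5),
    rep(c(NA, rep(6, 8)), times = 6)),
  byrow = TRUE,
  nrow = 13
)

# Count the number of cells assigned to each grob (NA counts the empty cells).
cells_per_grob <- table(layout, useNA = "ifany")
print(cells_per_grob)
```

The counts make the proportions explicit: the Plot 1.2.1 takes 5 of the 9 columns in the middle band, and the Plot 1.2.4 fills the lower six rows except for the first column.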

(Note that the Multiplot 1.2 was NOT presented in this section due to the restriction imposed by the assignment to include at least 1 but no more than 3 figures in the report. It can be examined at the subsection 10.1.1 Overview of results for the harm on population health of the chapter 10 RESULTS, where the Figure 1 was presented, of which the Multiplot 1.2 constitutes the PART 2.)








8.3 Harm On Population Health With Respect To Casualties By Each Weather Event Type

Summary

The required variables and the target data subset of observations for the harm on population health with respect to casualties were extracted from the table with the processed data. A new variable was then created that divided the observations for each of the included weather event types into two complementary groups:

  • the 90% of observations with the lowest impact
  • the 10% of observations with the highest impact

The information for the harm on population health with respect to casualties was then summarized by each weather event type.

Three aspects were examined:

  1. The overall average number of casualties by each weather event type.
  2. The average number of casualties by each weather event type for the 90% of cases with the lowest impact.
  3. The average number of casualties by each weather event type for the 10% of cases with the highest impact.

For each aspect, the average number of casualties by each weather event type, the number of its available observations (based on which the average was computed) and their skewness were examined.

The overall average number of casualties was used as the main criterion to determine which weather event types caused the most harm on population health with respect to casualties. It is nevertheless important to take into account the other two aspects that were presented in order to obtain a more insightful and complete ‘picture’ of their consequences (especially given that, for most of the weather event types, the casualties were highly positively skewed).

The table with the results for the harm on population health with respect to casualties by each weather event type was presented at the subsection 10.1.4 Most harmful event types with respect to casualties of the chapter 10 RESULTS.

Finally, the Multiplot 1.3 was created to visualize the results of the harm on population health with respect to casualties by each weather event type.

(Note that neither the Multiplot 1.3 nor the elementary plots that it contains were presented in this section due to the restriction imposed by the assignment to include at least 1 but no more than 3 figures in the report. They can be examined at the subsection 10.1.1 Overview of results for the harm on population health of the chapter 10 RESULTS, where the Figure 1 was presented, of which the Multiplot 1.3 constitutes the PART 3.)

Steps




8.3.1 Extract the target data for harm on population health with respect to casualties

In order to examine the harm on population health with respect to casualties caused by each weather event type, the variables REFNUM, EVENT_TYPE and CASUALTIES were selected from the table with the processed data and only the observations that refer to weather events that resulted in non-zero casualties were extracted.

Furthermore, in an attempt to avoid highly misleading statistics due to the small number of observations for some of the weather event types, a lower bound of 10 weather events that caused non-zero casualties (for each of the included weather event types) was selected (subjectively by the analyst) and applied.

Although this lower bound may seem (and generally is) too small to yield trustworthy statistics, it was considered to be “good enough”, taking into account that:

  1. the analysis focuses on describing historical data without trying to make inferences that would demand substantially bigger samples. Even so, a statistic based on fewer than 10 observations could not be taken seriously, especially in cases (such as this analysis) where the distribution of casualties for each weather event type was skewed.
  2. the period of 10 years (from 2001 to 2011) in which the observations that were used in the analysis occurred is relatively short to produce big samples of weather events that caused non-zero casualties for some of the weather event types. Thus, if a higher bound had been selected to get more robust statistics (such as samples of 100 or 300), the majority of weather event types would have been excluded, making the results of the analysis trivial.
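
The effect of the lower bound can be sketched on a toy table. The example below uses base R rather than the data.table syntax of the analysis, and the data frame `toy_events`, its values and the helper names are hypothetical; it only illustrates how weather event types with fewer than 10 observations of non-zero casualties drop out.

```r
# Hypothetical toy data: one row per weather event with non-zero casualties.
toy_events <- data.frame(
  EVENT_TYPE = c(rep("TORNADO", 12), rep("DUST DEVIL", 10), rep("TSUNAMI", 3)),
  CASUALTIES = c(1:12, rep(1, 10), c(2, 40, 5))
)

# The lower bound for the number of observations per weather event type.
minimum_observations <- 10

# Count the observations of each weather event type.
observations_per_type <- table(toy_events$EVENT_TYPE)

# Keep only the weather event types that reach the lower bound.
kept_types <- names(observations_per_type)[
  as.vector(observations_per_type) >= minimum_observations
]
toy_target <- toy_events[toy_events$EVENT_TYPE %in% kept_types, ]

print(kept_types)
print(nrow(toy_target))
```

In this toy case the type with 3 observations is excluded, although one of its events caused 40 casualties, which shows the trade-off that the lower bound implies.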

The table with the target data for the harm on population health with respect to casualties consists of 7936 observations.

## Classes 'data.table' and 'data.frame':   7936 obs. of  3 variables:
##  $ REFNUM    : int  413614 413649 413652 413663 413737 413743 413746 413757 413763 413795 ...
##  $ EVENT_TYPE: chr  "TORNADO" "THUNDERSTORM WIND" "THUNDERSTORM WIND" "THUNDERSTORM WIND" ...
##  $ CASUALTIES: int  4 2 5 1 6 1 1 3 2 1 ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"

The variable EVENT_TYPE includes 30 distinct weather event types, for most of which the variable CASUALTIES was highly positively skewed.

It is worth noting that the weather event types with the highest number of observations also showed the highest skewness for the values of casualties. This indicates that the corresponding distribution of casualties has a heavy tail, which could not be observed when only a few observations were available.

Table 8.3.1-1: Facts about the table with the target data subset of observations for the harm on population health with respect to casualties.
EVENT_TYPE                   N  SKEWNESS
AVALANCHE                  180    2.3975
BLIZZARD                    22    2.3705
COLD/WIND CHILL             76    5.0297
DEBRIS FLOW                 19    2.2183
DENSE FOG                   20    1.3831
DUST DEVIL                  12    2.1224
DUST STORM                  23    1.5025
EXCESSIVE HEAT             350    8.3298
EXTREME COLD/WIND CHILL    107    4.3053
FLASH FLOOD                540   14.4341
FLOOD                      231    9.3312
HAIL                       110    5.8303
HEAT                       154    5.2894
HEAVY RAIN                  75    5.0249
HEAVY SNOW                  45    5.2993
HIGH SURF                  119    8.3730
HIGH WIND                  279   11.3363
HURRICANE/TYPHOON           33    4.4573
ICE STORM                   38    4.3115
LIGHTNING                 1657    6.9576
MARINE STRONG WIND          16    1.9270
MARINE THUNDERSTORM WIND    17    2.3442
RIP CURRENT                475    6.9329
STRONG WIND                211    3.0745
THUNDERSTORM WIND         1364    9.4260
TORNADO                   1327   17.6038
TROPICAL STORM              34    5.3288
WILDFIRE                   244    6.5566
WINTER STORM                84    3.9675
WINTER WEATHER              74    5.2237
Note:
The skewness was rounded to 4 decimal places.
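
The skewness values of the table can be read against a moment-based definition of sample skewness. Which package supplied the `skewness()` function used in the analysis is not shown in this section, so the helper `sample_skewness()` below is an assumption and may differ from it by a small-sample correction factor. For this particular estimator, the value cannot exceed (n - 2) / sqrt(n - 1) for a sample of size n, which is one way to see why small samples cannot display very high skewness.

```r
# Moment-based sample skewness (an assumed definition, for illustration only):
# the third central moment divided by the second central moment to the power 3/2.
sample_skewness <- function(x) {
  deviations <- x - mean(x)
  m2 <- mean(deviations^2)
  m3 <- mean(deviations^3)
  m3 / m2^(3 / 2)
}

# Largest value that this estimator can take for a sample of size n.
skewness_upper_bound <- function(n) {
  (n - 2) / sqrt(n - 1)
}

# A symmetric sample has zero skewness.
symmetric <- c(1, 2, 3, 4, 5)
print(sample_skewness(symmetric))

# Nine identical values and one extreme value reach the bound for n = 10.
heavy_tail <- c(rep(1, 9), 50)
print(sample_skewness(heavy_tail))
print(skewness_upper_bound(10))
```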






8.3.2 Process the target data for harm on population health with respect to casualties

To create the table with the processed data for the harm on population health with respect to casualties from the corresponding target data subset for this perspective, a new variable was created that divides the observations for each of the included weather event types into two complementary levels:

  • one that contains the 90% of cases with lowest impact
  • the other that contains the 10% of cases with highest impact

This decision was made due to the high skewness that was observed for the values of the variable CASUALTIES for most weather event types, which indicates that the underlying distributions of such phenomena have a heavy tail that causes this heterogeneity in the observations. As a result, a small number of casualties was observed for the majority of cases that resulted in non-zero casualties, while the few cases with the highest impact caused many casualties.

Given that the average number of casualties was used to determine which weather event types were the most harmful to population health (with respect to casualties), and that the average does not represent well the distribution of variables with high skewness, as it is highly affected by the most extreme values, it was considered necessary to examine the subsets created by those two levels in order to obtain an insightful picture.

# Create the table with the processed data 
# for the harm on population health with respect to casualties.
processed_data_____harm_on_population_health_____casualties <- 
  target_data_____harm_on_population_health_____casualties[
    ,
    ## Create a new variable that divides the observations
    ## for each weather event type into two complementary groups:  
    ##   - the 90% of weather events that resulted in the lowest casualties
    ##   - the 10% of weather events that resulted in the highest casualties
    BIN_GROUP_PER_EVENT_TYPE := (function(x, p_bins) {
      
      # Add 0 and 1 to the percentiles supplied at the argument 'p_bins' 
      # and sort them in ascending order 
      # (this assumes that 'p_bins' does not already contain 0 or 1).
      p_bins_increasing <- sort(c(0, p_bins, 1))
      
      # Create the labels of the bins from the sorted percentiles; 
      # these will be the values of the new variable.
      bin_labels <- paste0("(", p_bins_increasing[-length(p_bins_increasing)]*100,
                           "% - ", p_bins_increasing[-1]*100, "%]")
      
      # Identify the number of observations that correspond to each label.
      n_times <- vapply(2:length(p_bins_increasing),
                        function(i) {
                          as.integer(floor(length(x) * p_bins_increasing[i]) -
                                       floor(length(x) * p_bins_increasing[i - 1]))
                        }, integer(1))
      
      # Repeat each label as many times as its number of observations.
      x_bins_expanded <- rep(x = bin_labels, times = n_times)
      
      # Reorder the labels to match the positions of the values of 'x': 
      # the index is equivalent to order(order(x)), the rank of each value.
      x_bins_expanded_reordered <- x_bins_expanded[order(seq_along(x)[order(x)])]
      
      ## Coerce the character vector with the labels of bins to an ordered factor.
      ## The levels are set explicitly, so that they do not depend 
      ## on which labels occur in a group.
      x_bins_factor <- factor(x_bins_expanded_reordered, levels = bin_labels, ordered = TRUE)
      
    })(CASUALTIES, 0.9), 
    by = EVENT_TYPE
  ][
    ## Coerce the EVENT_TYPE variable to factor.
    , EVENT_TYPE := as.factor(EVENT_TYPE) 
  ]
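
The way the bin labels are aligned with the observations can be followed on a single small group. The sketch below is a simplified base R version of the same idea for one hypothetical group of 10 observations; the vector `casualties` and the variable names are introduced here for illustration and are not part of the analysis.

```r
# Hypothetical casualties of 10 weather events of one weather event type.
casualties <- c(4, 2, 5, 1, 6, 1, 1, 3, 2, 1)

# The two complementary bins and their sizes (floor-based, as in the code above).
bin_labels <- c("(0% - 90%]", "(90% - 100%]")
n_low <- floor(length(casualties) * 0.9)
n_high <- length(casualties) - n_low

# The labels in the order of the sorted observations ...
labels_in_sorted_order <- rep(bin_labels, times = c(n_low, n_high))

# ... are mapped back to the original positions through the rank of each value.
position_in_sorted_order <- order(order(casualties))
bin_group <- factor(
  labels_in_sorted_order[position_in_sorted_order],
  levels = bin_labels,
  ordered = TRUE
)

# The averages of the two groups versus the overall average.
mean_low <- mean(casualties[bin_group == "(0% - 90%]"])
mean_high <- mean(casualties[bin_group == "(90% - 100%]"])
print(c(overall = mean(casualties), low = mean_low, high = mean_high))
```

Only the single largest observation lands in the upper bin here, so its average rests on one event, while tied values are assigned in order of appearance.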

The table with the processed data for the harm on population health with respect to casualties contains 4 variables:

  1. REFNUM (int) : an id that uniquely identifies each observation
  2. EVENT_TYPE (Factor w/ 30 levels) : the type of each weather event
  3. CASUALTIES (int) : the number of casualties
  4. BIN_GROUP_PER_EVENT_TYPE (Ord.factor w/ 2 levels) : a factor that divides the observations for each weather event type to two complementary levels, one with the 90% of observations with the lowest impact and another with the 10% of observations with the highest impact.

and 7936 observations.

## Classes 'data.table' and 'data.frame':   7936 obs. of  4 variables:
##  $ REFNUM                  : int  413614 413649 413652 413663 413737 413743 413746 413757 413763 413795 ...
##  $ EVENT_TYPE              : Factor w/ 30 levels "AVALANCHE","BLIZZARD",..: 26 25 25 25 25 25 25 26 17 20 ...
##  $ CASUALTIES              : int  4 2 5 1 6 1 1 3 2 1 ...
##  $ BIN_GROUP_PER_EVENT_TYPE: Ord.factor w/ 2 levels "(0% - 90%]"<"(90% - 100%]": 1 1 2 1 2 1 1 1 1 1 ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"






8.3.3 Summarize the processed data for harm on population health with respect to casualties by each weather event type

To evaluate the harm on population health by each weather event type with respect to casualties, a simplistic approach was adopted:

  • the weather event types were ranked from the most harmful to the least based on the overall average number of casualties of the weather events that resulted in non-zero casualties

The overall average number of casualties caused by each weather event type was initially examined along with the skewness of the number of casualties for each weather event type. In most cases the skewness was high (or even extremely high), so it was possible that the overall mean misrepresented the consequences of each weather event type.

For that reason, the average number of casualties for the 90% of weather events with the lowest impact and the average number of casualties for the 10% of weather events with the highest impact were also computed and examined.

It is highlighted that for the average number of casualties that refers to the 10% of the cases that had the highest impact, there were few observations available for a lot of weather event types and the corresponding mean values should be interpreted with caution.
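
How few observations can fall in the top 10% can be estimated from the counts of Table 8.3.1-1. The sketch below assumes the same floor-based split as the binning code of subsection 8.3.2; the resulting counts are derived here for illustration and are not the reported values of the summary table.

```r
# Number of observations of some weather event types (from Table 8.3.1-1).
observations <- c(
  "DUST DEVIL" = 12,
  "MARINE STRONG WIND" = 16,
  "BLIZZARD" = 22,
  "TORNADO" = 1327,
  "LIGHTNING" = 1657
)

# Assumed split: the lowest 90% is rounded down, the rest is the top 10%.
n_low <- floor(observations * 0.9)
n_high <- observations - n_low

print(n_high)
```

Under this assumption, the average for the top 10% of the smallest groups would rest on only two or three weather events.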

# Create the table with the summary for the harm on population health 
# with respect to casualties for each weather event type.
summary_____harm_on_population_health______casualties <- 
  processed_data_____harm_on_population_health_____casualties[
  ,
  list(
    ## The total number of observations by each weather event type.
    "N" = .N,
    ## The average number of casualties caused by each weather event type.
    "AVRG" = round(mean(CASUALTIES), 2),
    ## The skewness of casualties for the observations by each weather event type.
    "SKEWNESS" = round(skewness(CASUALTIES), 4),
    ## The number of observations for the 90% of cases with the lowest impact 
    ## by each weather event type.
    "N_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , .N],
    ## The average number of casualties caused by each weather event type 
    ## for the 90% of cases with the lowest impact.
    "AVRG_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , round(mean(CASUALTIES), 2)],
    ## The skewness of casualties for the 90% of cases with the lowest impact 
    ## by each weather event type.
    "SKEWNESS_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , round(skewness(CASUALTIES), 4)],
    ## The number of observations for the 10% of cases with the highest impact 
    ## by each weather event type.
    "N_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , .N],
    ## The average number of casualties caused by each weather event type 
    ## for the 10% of cases with the highest impact.
    "AVRG_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , round(mean(CASUALTIES), 2)],
    ## The skewness of casualties for the 10% of cases with the highest impact 
    ## by each weather event type.
    "SKEWNESS_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , round(skewness(CASUALTIES), 4)]
  ),
  by = "EVENT_TYPE"
  ][
    ## The average number of casualties is used to order the rows
    ## from the most harmful weather event type to the least
    ## when the rank is assigned.
    order(-AVRG),
    ## Create a variable with the harmfulness rank of each weather event type.
    RANK := 1:length(EVENT_TYPE)
    ][
      ,
      ## Reorder the variables at the table.
      list(
        RANK, EVENT_TYPE, N, AVRG, SKEWNESS, N_LOW, AVRG_LOW, SKEWNESS_LOW, N_HIGH, AVRG_HIGH, SKEWNESS_HIGH
      )
      ]

The results of the table with the summary for the harm on population health with respect to casualties by each weather event type that was created in this section were presented at the subsection 10.1.4 Most harmful event types with respect to casualties of the chapter 10 RESULTS.

The table with the summary for the harm on population health with respect to casualties by each weather event type was exported (as an R file) to the following folder of the working directory:

  • outputs –> harm_on_population_health –> results

with filename:

  • summary______harm_on_population_health______casualties.R

The main reason for exporting the file with the summary for the harm on population health with respect to casualties by each weather event type was to supply a checkpoint for any attempts to reproduce the analysis.






8.3.4 Visualize the results of the summary for the harm on population health with respect to casualties by each weather event type

From the table with the summary for the harm on population health by each weather event type with respect to casualties, the Multiplot 1.3 was created to present an overview of the results for the three different aspects that were examined for this perspective.

Four elementary plots were created:

  • 8.3.4.1.1 Create The Plot 1.3.1
    • Displays the overall average number of casualties caused by each weather event type based on all the cases of weather events that resulted in non-zero casualties.
  • 8.3.4.1.2 Create The Plot 1.3.2
    • Displays the average number of casualties caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero casualties.
  • 8.3.4.1.3 Create The Plot 1.3.3
    • Displays the average number of casualties caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero casualties.
  • 8.3.4.1.4 Create The Plot 1.3.4
    • Displays a comparison for each weather event type, of the average number of casualties for the 90% of its observations with the lowest impact versus the average number of casualties for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero casualties.

which were then combined in order to obtain the Multiplot 1.3.

It constitutes the PART 3 of the Figure 1 that displays the overview of the harm on population health by each weather event type.

(Note that neither the Multiplot 1.3 nor the elementary plots that it contains were presented in this section, due to the restriction imposed by the assignment to include in the report at least 1 but no more than 3 figures. They can be examined at the subsection 10.1.1 Overview of results for the harm on population health of the chapter 10 RESULTS, where the Figure 1 was presented, of which the Multiplot 1.3 constitutes the PART 3.)




8.3.4.1 Create the components of Multiplot 1.3

Creates four elementary plots to visualize the results for the aspects that were examined for the harm on population health with respect to casualties by each weather event type.

  • 8.3.4.1.1 Create The Plot 1.3.1
    • Displays the overall average number of casualties caused by each weather event type based on all the cases of weather events that resulted in non-zero casualties.
  • 8.3.4.1.2 Create The Plot 1.3.2
    • Displays the average number of casualties caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero casualties.
  • 8.3.4.1.3 Create The Plot 1.3.3
    • Displays the average number of casualties caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero casualties.
  • 8.3.4.1.4 Create The Plot 1.3.4
    • Displays a comparison for each weather event type, of the average number of casualties for the 90% of its observations with the lowest impact versus the average number of casualties for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero casualties.




8.3.4.1.1 Create The Plot 1.3.1

The Plot 1.3.1 displays the overall average number of casualties caused by each weather event type, taking into account only the observations that resulted in non-zero casualties.

The weather event types were matched with a number that represents the rank assigned to each of them, from the most harmful to the least with respect to population health, based on the overall average number of casualties they caused.

The skewness of the number of casualties for the observations of each weather event type (based on which the overall average number of casualties was computed) was encoded in the color of the point and line associated with each of them.

# Create the Elementary Plot 1.3.1 that displays 
# the overall average number of casualties 
# by each weather event type for all cases. 
elementary_plot_1_3_1 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_population_health______casualties,
    mapping = aes(
      x = AVRG,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to make them displayed alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE)
        )
      ) 
    )
  ) +
  ## Draw a square shaped point to the position that corresponds to 
  ## the average number of casualties caused by each weather event type, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(color = SKEWNESS),
    shape = 15, 
    size = 4.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average number of casualties.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG, 
      group = EVENT_TYPE, 
      color = SKEWNESS
    )
  ) +
  ## Draw a number that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall average
  ## number of casualties it caused, inside the square point 
  ## that displays the average.
  geom_text(
    mapping = aes(
      label = RANK
    ), 
    size = 2.5
  ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average number of casualties for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 1.3 will be composed from the four elementary plots. 
    limits = c(-2, 18), 
    midpoint = 8, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
  ) +
  ## Supply descriptive labels.  
  labs(
    title = "Plot 1.3.1", 
    subtitle = "Aspect: Overall",
    x = "Average Number of Casualties\n",
    y = "Weather Event Types \n"
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    )
  )
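
The level reversal applied to the y axis above can be checked outside ggplot2. The following is a minimal sketch in base R, assuming that EVENT_TYPE is stored as a factor in the summary table (the `levels()` call in the plot code suggests so, since it returns NULL for a character vector). The toy vector is illustrative only.

```r
# Toy factor with alphabetically ordered levels (illustrative values only).
event_type <- factor(c("FLOOD", "HAIL", "TORNADO"))

# ggplot2 places the first level of a discrete y axis at the bottom,
# so reversing the levels puts the alphabetically first type at the top.
event_type_reversed <- factor(
  x = event_type,
  levels = rev(x = levels(x = event_type))
)

levels(event_type)           # "FLOOD"   "HAIL"    "TORNADO"
levels(event_type_reversed)  # "TORNADO" "HAIL"    "FLOOD"
```

Only the order of the levels changes; the values stored in the variable stay the same.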




8.3.4.1.2 Create The Plot 1.3.2

The Elementary Plot 1.3.2 displays, for each weather event type, the average number of casualties for the 90% of its cases with the lowest impact, among the observations that resulted in non-zero casualties.

The weather event types were matched with a number that represents the rank assigned to each of them, from the most harmful to the least with respect to population health, based on the overall average number of casualties they caused (so it is NOT based on the average number of casualties caused by the 90% of cases with the lowest impact of each weather event type).

The skewness of the number of casualties for the observations of each weather event type (based on which the average number of casualties for the 90% of cases with the lowest impact was computed) was encoded in the color of the point and line associated with each of them.

# Create the Elementary Plot 1.3.2 that displays 
# the average number of casualties by each weather event type 
# for the 90% of its cases with the lowest impact.
elementary_plot_1_3_2 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_population_health______casualties,
    mapping = aes(
      x = AVRG_LOW,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to display them alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE)
        )
      ) 
    )
  ) +
  ## Draw a circle shaped point to the position that corresponds to 
  ## the average number of casualties caused by each weather event type
  ## for the 90% of its cases with the lowest impact, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(
      color = SKEWNESS_LOW
    ), 
    size = 3.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average number of casualties 
  ## for the 90% of its cases with the lowest impact.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG_LOW, 
      group = EVENT_TYPE, 
      color = SKEWNESS_LOW
    )
  ) +
  ## Draw a number that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall average
  ## number of casualties it caused, inside the circle shaped point 
  ## that displays the average.
  geom_text(
    mapping = aes(
      label = RANK
    ), 
    size = 2
  ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average number of casualties for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 1.3 will be composed from the four elementary plots.
    limits = c(-2, 18), 
    midpoint = 8, 
    low = "lightgreen",
    mid = "orange",
    high = "purple"
  ) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 1.3.2",
    subtitle = "Aspect: 90% of cases with the lowest impact",
    x = paste0(
      "Average Number of Casualties for the 90% ", "\n",
      "of Observations with the Lowest Impact" 
    )
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    ),
    ### Remove the text, ticks and title of the y axis.
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.title.y = element_blank()
  )
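
The RANK labels are described above as based on the overall average rather than on the average of the 90% of cases with the lowest impact. The following is a minimal sketch of why the two orderings can differ, using made-up averages and base R `rank()`. The column names mirror the summary table, but how RANK was actually computed is not shown in this section, so the `rank()` call and its tie handling are assumptions.

```r
# Hypothetical averages for three event types (not values from the report).
toy_summary <- data.frame(
  EVENT_TYPE = c("A", "B", "C"),
  AVRG       = c(12.0, 30.0, 5.0),
  AVRG_LOW   = c(4.0, 2.5, 3.0)
)

# Rank 1 = most harmful by overall average.
toy_summary$RANK <- rank(-toy_summary$AVRG, ties.method = "first")

# The same ranking applied to the low-impact average gives another order.
toy_summary$RANK_LOW <- rank(-toy_summary$AVRG_LOW, ties.method = "first")

toy_summary
```

In this toy case type "B" ranks first overall but last among the low-impact averages, which is why the numbers printed in the plot need not follow the order of the points along the x axis.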




8.3.4.1.3 Create The Plot 1.3.3

The Plot 1.3.3 displays, for each weather event type, the average number of casualties for the 10% of its cases with the highest impact, among the observations that resulted in non-zero casualties.

The weather event types were matched with a number that represents the rank assigned to each of them, from the most harmful to the least with respect to population health, based on the overall average number of casualties they caused (so it is NOT based on the average number of casualties caused by the 10% of cases with the highest impact of each weather event type).

The skewness of the number of casualties for the observations of each weather event type (based on which the average number of casualties for the 10% of cases with the highest impact was computed) was encoded in the color of the point and line associated with each of them.

# Create the Elementary Plot 1.3.3 that displays 
# the average number of casualties by each weather event type 
# for the 10% of its cases with the highest impact.
elementary_plot_1_3_3 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_population_health______casualties,
    mapping = aes(
      x = AVRG_HIGH,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to display them alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE)
        )
      ) 
    )
  ) +
  ## Draw a diamond shaped point to the position that corresponds to 
  ## the average number of casualties caused by each weather event type
  ## for the 10% of its cases with the highest impact, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(
      color = SKEWNESS_HIGH
    ), 
    shape = 18, 
    size = 4.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average number of casualties 
  ## for the 10% of its cases with the highest impact.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG_HIGH, 
      group = EVENT_TYPE, 
      color = SKEWNESS_HIGH
    )
  ) +
  ## Draw a number that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall average
  ## number of casualties it caused, inside the diamond shaped point 
  ## that displays the average.
  geom_text(
    mapping = aes(
      label = RANK
    ),
    size = 2
  ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average number of casualties for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 1.3 will be composed from the four elementary plots.
    limits = c(-2, 18), 
    midpoint = 8, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
  ) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 1.3.3",
    subtitle = "Aspect: 10% of cases with the highest impact",
    x = paste0(
      "Average Number of Casualties for the 10% ", "\n", 
      "of Observations with the Highest Impact" 
    )
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    ),
    ### Remove the text, ticks and title of the y axis 
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.title.y = element_blank()
  )
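
The fixed limits `c(-2, 18)` are shared by all the plots of the Multiplot 1.3. By default, continuous ggplot2 scales treat values outside `limits` as missing (they are drawn in the scale's `na.value`), so it can be worth confirming that every skewness column fits inside them. The following is a minimal sketch in base R; the helper name and the toy values are hypothetical.

```r
# Hypothetical helper: TRUE when every non-missing value lies inside the limits.
fits_scale_limits <- function(values, limits) {
  observed <- range(values, na.rm = TRUE)
  observed[1] >= limits[1] && observed[2] <= limits[2]
}

# Made-up skewness values standing in for the columns
# SKEWNESS, SKEWNESS_LOW and SKEWNESS_HIGH of the summary table.
toy_skewness <- c(-0.4, 2.1, 7.9, 16.3)

fits_scale_limits(values = toy_skewness, limits = c(-2, 18))         # TRUE
fits_scale_limits(values = c(toy_skewness, 25), limits = c(-2, 18))  # FALSE
```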




8.3.4.1.4 Create The Plot 1.3.4

The Plot 1.3.4 displays a compact overview of all three aspects that were examined for the harm on population health with respect to casualties.

For each weather event type, the average number of casualties for the 90% of cases with the lowest impact was plotted against the average number of casualties for the 10% of cases with the highest impact.

The weather event types were matched with a number that represents the rank assigned to each of them, from the most harmful to the least with respect to population health, based on the overall average number of casualties they caused.

The skewness of the number of casualties for the observations of each weather event type (based on which the overall average number of casualties was computed) was encoded in the fill color of the point and label associated with each of them.

# Create the Elementary Plot 1.3.4 that displays 
# by each weather event type the comparison of 
# the average number of casualties 
# for the 90% of cases with the lowest impact
# versus the average number of casualties 
# for the 10% of cases with the highest impact.
elementary_plot_1_3_4 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_population_health______casualties,
    mapping = aes(
      x = AVRG_HIGH, 
      y = AVRG_LOW
    )
  ) +
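  ## Draw a filled circle at the position that corresponds to 
  ## the two averages of each weather event type, 
  ## of which the fill indicates the overall skewness.
  geom_point(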
    mapping = aes(
      fill = SKEWNESS
    ), 
    shape = 21
  ) +
  ## Draw a label with a number that indicates the rank assigned 
  ## to each weather event type (from the most harmful to the least) 
  ## based on the overall average number of casualties it caused.
  geom_label_repel(
    mapping = aes(
      label = RANK, 
      fill = SKEWNESS
    ),
    size = 2.5
  ) +
  ## Adjust the scale for the fill of each label.
  scale_fill_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average number of casualties for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 1.3 will be composed from the four elementary plots.
    limits = c(-2, 18),
    midpoint = 8, 
    low = "lightgreen",
    mid = "orange", 
    high = "purple"
  ) +
  ## Set proper limits to the plot.
  xlim(c(0, 320)) +
  ylim(c(0.5, 7)) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 1.3.4",
    subtitle = paste0(
      "Comparison of the average number of casualties ", 
      "for the 90% of observations with the lowest impact ", 
      "versus the average number of casualties ", 
      "for the 10% of observations with highest impact. "
    ),
    x = paste0(
      "Average Number of Casualties by each Weather Event Type ", 
      "for the 10% of its Observations with the Highest Impact"
    ),
    y = paste0(
      "Average Number of Casualties by each Weather Event Type ", "\n", 
      "for the 90% of its Observations with the Lowest Impact."
    ),
    ### Add a descriptive label for the legend.
    fill = paste0(
      "The color indicates the skewness ",
      "of casualties for each weather event type ",
      "(the same color scale is used for all four plots of PART 3). "
    )
  ) +
  ## Select a theme.
  theme_linedraw() +
  ## Customize the selected theme.
  theme(
    ### Adjust the legend.
    legend.position = "bottom",
    legend.direction = "horizontal",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    )
  )






8.3.4.2 Compose the Multiplot 1.3

The four elementary plots that were created from the results of the summary for the harm on population health with respect to casualties by each weather event type were combined to construct a single multiplot that displays the complete picture for this perspective.

# Create a multiplot that displays the overview of the summary 
# for the harm on population health with respect to casualties
# by each weather event type.
multiplot_1_3 <- arrangeGrob(
  grobs = list(
      
    # Title
    textGrob(
      label = paste0(
        "\n",
        "PART 3: Harm on population health by each weather event type ", 
        "with respect to casualties ", "\n", 
        "based on the cases of weather events ", 
        "that resulted in non-zero casualties.", "\n", 
        "\n"
      ),
      gp = gpar(
        fontsize = 16, 
        fontface = "bold"
      )
    ),
    
    # Subtitle
    textGrob(
      label = paste0(
          "\n", 
          "The results include only the weather event types, ", 
          "for which at least 10 observations ", 
          "that resulted in non-zero casualties were available. ", "\n",
          "The number associated with each weather event type ", 
          "represents the rank (from the most harmful to the least) ", 
          "which was assigned based on the overall average number of casualties.", "\n",
          "Because for most of the weather event types ", 
          "high positive skewness was observed for the number of casualties, ",
          "the average of the 90% of cases with lowest impact ", "\n",
          "and the 10% of cases with highest impact were reported ", 
          "to provide a more representative picture of their consequences.","\n",
          "\n"
      ),
      gp = gpar(
        fontsize = 14, 
        fontface = "bold"
      )
    ),
    
    # ELEMENTARY PLOT 1.3.1
    # Elementary plot for the average number of casualties 
    # by each weather event type for all cases.
    elementary_plot_1_3_1,
    
    # ELEMENTARY PLOT 1.3.2
    # Elementary plot for the average number of casualties 
    # by each weather event type for 90% of cases with the lowest impact.
    elementary_plot_1_3_2,
    
    # ELEMENTARY PLOT 1.3.3
    # Elementary plot for the average number of casualties 
    # by each weather event type for 10% of cases with the highest impact.
    elementary_plot_1_3_3,
    
    # ELEMENTARY PLOT 1.3.4
    # Elementary Plot 1.3.4 for the comparison of 
    # the average number of casualties 
    # for the 90% of cases with the lowest impact versus 
    # the 10% of cases with the highest impact.
    elementary_plot_1_3_4
  ),
  # Set the layout for the title, the subtitle and the four elementary plots
  layout_matrix = 
    matrix(
      c(1,1,1,1,1,1,1,1,1,
        2,2,2,2,2,2,2,2,2,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6
      ),
      byrow = TRUE, 
      nrow = 13
    )
)
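
The layout matrix above can also be written more compactly. The following sketch uses only base R to build the same 13 x 9 matrix from one row pattern per band; the variable names are illustrative and not part of the original analysis.

```r
# One row pattern per band of the layout (9 columns each).
title_row    <- rep(1, times = 9)
subtitle_row <- rep(2, times = 9)
top_row      <- c(rep(3, times = 5), rep(4, times = 2), rep(5, times = 2))
bottom_row   <- c(NA, rep(6, times = 8))

# Stack the bands: 1 title row, 1 subtitle row,
# 5 rows for the plots 1.3.1 to 1.3.3 and 6 rows for the plot 1.3.4.
layout_compact <- rbind(
  title_row,
  subtitle_row,
  matrix(top_row, nrow = 5, ncol = 9, byrow = TRUE),
  matrix(bottom_row, nrow = 6, ncol = 9, byrow = TRUE),
  deparse.level = 0
)

dim(layout_compact)  # 13  9
```

The numbers refer to the position of each grob in the `grobs` list, and the `NA` cells in the first column leave the space to the left of the plot 1.3.4 empty.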

(Note that the Multiplot 1.3 was NOT presented in this section, due to the restriction imposed by the assignment to include in the report at least 1 but no more than 3 figures. It can be examined at the subsection 10.1.1 Overview of results for the harm on population health of the chapter 10 RESULTS, where the Figure 1 was presented, of which the Multiplot 1.3 constitutes the PART 3.)















9 HARM ON ECONOMY


In this chapter an attempt was made to quantify the harm on economy based on the information from the table with the processed data.

The harm on economy was examined from three perspectives:

  1. The harm on economy with respect to property damage caused by each weather event type, based on the observations for weather events that resulted in non-zero property damage in the United States in the period from 2001 to 2011.
  2. The harm on economy with respect to crop damage caused by each weather event type, based on the observations for weather events that resulted in non-zero crop damage in the United States in the period from 2001 to 2011.
  3. The harm on economy with respect to economic damage (sum of property damage and crop damage) caused by each weather event type, based on the observations for weather events that resulted in non-zero economic damage in the United States in the period from 2001 to 2011.

The weather event types for which fewer than 10 observations that resulted in non-zero harm were available with respect to a perspective of interest were omitted (from the analysis of that particular perspective), to avoid highly misleading statistics. Consequently, the subset of weather event types that was included differs between the three perspectives.

Because for all perspectives the values of interest for the observations of most weather event types were highly positively skewed, it was considered important, in order to obtain an insightful picture of their consequences, to examine them over three different aspects:

  1. The overall harm on economy caused by each weather event type.
  2. The harm on economy caused by the 90% of cases with the lowest impact of each weather event type.
  3. The harm on economy caused by the 10% of cases with the highest impact of each weather event type.

For every aspect, the sample size, the skewness and the mean of the values that encapsulate the harm with respect to each perspective were summarized by each weather event type and reported.
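
The three aspects can be illustrated on a single made-up weather event type. The following is a minimal sketch in base R, assuming that the 90% group consists of the lowest `floor(0.9 * n)` values; the damage values are hypothetical and were chosen only to show how one extreme case dominates the overall mean.

```r
# Hypothetical damage values for one event type (10 non-zero observations).
damage <- c(1, 2, 2, 3, 3, 4, 5, 5, 6, 500)

# Sort ascending and split: lowest 90% of cases versus highest 10%.
damage_sorted <- sort(damage)
n_low   <- floor(length(damage_sorted) * 0.9)
low_90  <- damage_sorted[seq_len(n_low)]
high_10 <- damage_sorted[-seq_len(n_low)]

# Mean of each aspect.
aspects <- c(
  overall = mean(damage_sorted),
  low_90  = mean(low_90),
  high_10 = mean(high_10)
)
aspects
```

Here the overall mean (53.1) describes neither the typical case (about 3.4) nor the extreme one (500), which is the motivation for reporting all three aspects.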

The results obtained for the harm on economy by each weather event type were presented at the section 10.2 Question 2 : Across the United States, which types of events have the greatest economic consequences? of the chapter 10 RESULTS.

(In compliance with the restrictions of the assignment, according to which at least 1 but no more than 3 figures should be included in the report, the multiplots as well as the elementary plots that they contain were NOT displayed separately and can ONLY be examined as PARTs of the Figure 2 at the subsection 10.2.1 Overview of results for the harm on economy of the chapter 10 RESULTS.)




9.1 Harm On Economy With Respect To Property Damage By Each Weather Event Type

Summary

The required variables and the target data subset of observations for the harm on economy with respect to property damage were extracted from the table with the processed data, and processed to create a new variable that divides the observations for each of the included weather event types into two complementary groups:

  • the 90% of observations with the lowest impact
  • the 10% of observations with the highest impact

before the information for the harm on economy with respect to property damage was summarized by each weather event type.

Three aspects were examined:

  1. The overall average property damage by each weather event type.
  2. The average property damage by each weather event type for the 90% of cases with the lowest impact.
  3. The average property damage by each weather event type for the 10% of cases with the highest impact.

For each aspect, the average property damage by each weather event type, the number of its available observations (based on which the average was computed) and their skewness were examined.

The overall average property damage was used as the main criterion to determine which weather events caused the most harm on economy with respect to property damage, but it is important to take into account the other two aspects that were presented in order to obtain a more insightful and complete ‘picture’ of their consequences (especially given the fact that, for most of the weather event types, the property damage was highly positively skewed).

The table with the results for the harm on economy with respect to property damage by each weather event type was presented at the subsection 10.2.2 Most harmful event types with respect to property damage of the chapter 10 RESULTS.

Finally the Multiplot 2.1 was created to visualize the results of the harm on economy with respect to property damage by each weather event type.

(Note that neither the Multiplot 2.1 nor the elementary plots that it contains were presented in this section, due to the restriction imposed by the assignment to include in the report at least 1 but no more than 3 figures. They can be examined at the subsection 10.2.1 Overview of results for the harm on economy of the chapter 10 RESULTS, where the Figure 2 was presented, of which the Multiplot 2.1 constitutes the PART 1.)

Steps




9.1.1 Extract the target data for harm on economy with respect to property damage

In order to examine the harm on economy with respect to property damage caused by each weather event type, the variables REFNUM, EVENT_TYPE and PROPERTY_DAMAGE were selected from the table with the processed data and only the observations that refer to weather events that resulted in non-zero property damage were extracted.

Furthermore, in an attempt to avoid highly misleading statistics due to the small number of observations for some of the weather event types, a lower bound of 10 weather events that caused non-zero property damage (for each of the included weather event types) was selected (subjectively by the analyst) and applied.

Although this lower bound may seem (and generally is) too low to yield trustworthy statistics, it was considered to be “good enough” taking into account that:

  1. the analysis focuses on describing historical data without trying to make inferences that would demand substantially bigger samples, although any statistic based on fewer than 10 observations could not be taken seriously, especially in cases (such as in this analysis) where the distribution of property damage for each weather event type was skewed.
  2. the period of 11 years (from 2001 to 2011) in which the observations used in the analysis occurred is relatively short to produce big samples of weather events that caused non-zero property damage for some of the weather event types. Thus, if a higher bound had been selected to get more robust statistics, such as samples of 100 or 300, the majority of weather event types would have been excluded, making the results of the analysis trivial.
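
The extraction described above (non-zero property damage only, at least 10 such observations per weather event type) can be sketched as follows. This is a minimal illustration in base R on toy data, not the chunk used in the analysis; the variable names other than EVENT_TYPE and PROPERTY_DAMAGE are hypothetical.

```r
# Toy observations: type "A" has 12 non-zero cases,
# type "B" has 3 cases, one of them with zero property damage.
toy_data <- data.frame(
  EVENT_TYPE      = c(rep("A", 12), rep("B", 3)),
  PROPERTY_DAMAGE = c(seq(1000, 12000, by = 1000), 500, 0, 2500)
)

# Keep only the observations with non-zero property damage.
non_zero <- toy_data[toy_data$PROPERTY_DAMAGE > 0, ]

# Count the remaining observations per event type and apply the lower bound.
min_observations <- 10
counts <- table(non_zero$EVENT_TYPE)
kept_types <- names(counts)[counts >= min_observations]

target_data <- non_zero[non_zero$EVENT_TYPE %in% kept_types, ]
nrow(target_data)  # 12
```

The count is taken after the zero-damage rows are dropped, so an event type is judged only on the observations that actually enter the averages.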

The table with the target data for the harm on economy with respect to property damage consists of 136928 observations.

## Classes 'data.table' and 'data.frame':   136928 obs. of  3 variables:
##  $ REFNUM         : int  413607 413608 413609 413610 413611 413612 413613 413614 413615 413616 ...
##  $ EVENT_TYPE     : chr  "THUNDERSTORM WIND" "THUNDERSTORM WIND" "THUNDERSTORM WIND" "THUNDERSTORM WIND" ...
##  $ PROPERTY_DAMAGE: num  10000 8000 2000 15000 5000 3000 10000 450000 150000 3000 ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"

The variable EVENT_TYPE includes 37 distinct weather event types, for most of which the variable PROPERTY_DAMAGE was highly positively skewed.

Table 9.1.1-1: Facts about the table with the target data subset of observations for the harm on economy with respect to property damage.

EVENT_TYPE                     N  SKEWNESS
AVALANCHE                     33    3.4882
BLIZZARD                     129   10.5403
COASTAL FLOOD                152    4.5996
COLD/WIND CHILL               14    1.5907
DEBRIS FLOW                  189    6.0565
DENSE FOG                     56    3.7347
DROUGHT                       30    4.9802
DUST DEVIL                    60    2.4345
DUST STORM                    60    3.7794
EXCESSIVE HEAT                20    4.0309
EXTREME COLD/WIND CHILL       22    4.0178
FLASH FLOOD                13902   61.0935
FLOOD                       7072   83.9862
FROST/FREEZE                  18    1.7679
HAIL                       14584   69.4449
HEAVY RAIN                   836   11.4264
HEAVY SNOW                   573    7.0114
HIGH SURF                     76    5.0462
HIGH WIND                   3851   37.6952
HURRICANE/TYPHOON            107    4.9333
ICE STORM                    410    8.6732
LAKE-EFFECT SNOW             195   13.1024
LIGHTNING                   6162   22.3701
MARINE HIGH WIND              18    3.8120
MARINE STRONG WIND            34    5.3773
MARINE THUNDERSTORM WIND     127   10.0994
STORM SURGE/TIDE             131    9.6344
STRONG WIND                 3179   51.6282
THUNDERSTORM WIND          73657  167.8966
TORNADO                     8552   55.2385
TROPICAL DEPRESSION           35    5.4232
TROPICAL STORM               363   18.5864
TSUNAMI                       14    2.7176
WATERSPOUT                    12    3.0130
WILDFIRE                     832   15.4642
WINTER STORM                 930   29.7861
WINTER WEATHER               493    9.2933

Note: The skewness was rounded to 4 decimal places.

It is worth noting that the weather event types with the highest number of observations also showed the highest skewness for the values of property damage, indicating that the corresponding distributions of property damage have a heavy tail that could not be observed when only a few observations were available.
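
Part of this pattern may be arithmetic rather than meteorological. Assuming that the skewness was computed as the moment coefficient g1 = m3 / m2^(3/2) (the estimator that was actually used is not shown in this section), g1 for a sample of size n cannot exceed (n - 2) / sqrt(n - 1), so small samples cannot display very high skewness whatever the underlying distribution. The following is a minimal sketch in base R; both function names are hypothetical.

```r
# Moment coefficient of skewness (assumed estimator, see above).
moment_skewness <- function(x) {
  deviations <- x - mean(x)
  m2 <- mean(deviations^2)
  m3 <- mean(deviations^3)
  m3 / m2^(3 / 2)
}

# Largest value that g1 can take for a sample of size n.
max_skewness <- function(n) {
  (n - 2) / sqrt(n - 1)
}

# The bound is reached when all values but one are equal.
extreme_sample <- c(rep(0, times = 11), 1)
moment_skewness(extreme_sample)  # about 3.015
max_skewness(n = 12)             # about 3.015
max_skewness(n = 73657)          # about 271.4
```

For WATERSPOUT (N = 12) the reported skewness of 3.0130 sits just under this ceiling of about 3.015, which is consistent with the assumed estimator, although it does not prove that it was the one used.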






9.1.2 Process the target data for harm on economy with respect to property damage

To create the table with the processed data for the harm on economy with respect to property damage from the corresponding target data subset for this perspective, a new variable was created that divides the observations for each of the included weather event types into two complementary levels:

  • one that contains the 90% of cases with lowest impact
  • the other that contains the 10% of cases with highest impact

This decision was made due to the high skewness that was observed for the values of the variable PROPERTY_DAMAGE for most weather event types, which indicates that the underlying distributions of such phenomena have a heavy tail that causes this heterogeneity in the observations. As a result, small property damage was observed for the majority of cases that resulted in non-zero property damage, while the few cases with the highest impact caused very large property damage.

Bearing in mind that the average property damage would be used to determine which weather event types were the most harmful to economy (with respect to property damage), and that the average does not represent well the distribution of variables with high skewness, as it is highly affected by the most extreme values, it was considered necessary to examine the subsets created by those two levels in order to obtain an insightful picture.

# Create the table with the processed data 
# for the harm on economy with respect to property damage.
processed_data_____harm_on_economy_____property_damage <- 
  target_data_____harm_on_economy_____property_damage[
    ,
    ## Create a new variable that divides the observations
    ## for each weather event type into two complementary groups:  
    ##   - the 90% of weather events that resulted in the lowest property damage
    ##   - the 10% of weather events that resulted in the highest property damage
    BIN_GROUP_PER_EVENT_TYPE := (function(x, p_bins) {
      
      # add 0 and 1 to the start and the end, respectively, of the vector 
      # of percentiles supplied at the argument 'p_bins' 
      # and sort the result in ascending order
      p_bins_increasing <- sort(c(0, p_bins, 1))
      
      # create the character strings that label the bins defined by the values supplied at 
      # the argument 'p_bins'; they will be the values of the new variable
      bin_labels <- paste0("(", p_bins_increasing[-length(p_bins_increasing)]*100,
                           "% - ", p_bins_increasing[-1]*100, "%]")
      
      # identify the number of occurrences that correspond to each label
      n_times <- vapply(2:length(p_bins_increasing),
                        function(i) {
                          as.integer(floor(length(x) * p_bins_increasing[i]) -
                                       floor(length(x) * p_bins_increasing[i - 1]))
                        }, integer(1))
      
      # repeat each label as many times as its number of occurrences
      x_bins_expanded <- rep(x = bin_labels, times = n_times)
      
      # reorder the labels to match the values of the corresponding vector
      x_bins_expanded_reordered <- x_bins_expanded[order(seq_along(x)[order(x)])]
      
      ## Coerce the character vector with the labels of bins to an ordered factor
      ## (the levels are set explicitly, so they do not depend on alphabetical sorting)
      x_bins_factor <- factor(x_bins_expanded_reordered, levels = bin_labels, ordered = TRUE)
      
    })(PROPERTY_DAMAGE, 0.9), 
    by = EVENT_TYPE
  ][
    ## Coerce the EVENT_TYPE variable to factor
    , EVENT_TYPE := as.factor(EVENT_TYPE) 
  ]
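The binning logic of the anonymous function above can also be exercised on its own. The following is a minimal base R sketch of the same idea, under the hypothetical name `bin_by_rank` and applied to a made-up toy vector. Unlike the original, it drops duplicated cut points with `unique()` and uses `rank(ties.method = "first")`, which is equivalent to the double `order()` call.

```r
# Hypothetical standalone restatement of the binning step (base R only).
bin_by_rank <- function(x, p_bins) {
  # Cut points from 0 to 1 in ascending order, without duplicates.
  p <- sort(unique(c(0, p_bins, 1)))
  # Labels such as "(0% - 90%]" and "(90% - 100%]".
  bin_labels <- paste0("(", p[-length(p)] * 100, "% - ", p[-1] * 100, "%]")
  # Number of observations that fall in each bin.
  n_times <- diff(floor(length(x) * p))
  # Labels in the order of the sorted values.
  labels_sorted <- rep(bin_labels, times = n_times)
  # rank() gives the position of each value in the sorted vector,
  # so indexing by it puts every label back next to its own value.
  factor(labels_sorted[rank(x, ties.method = "first")],
         levels = bin_labels, ordered = TRUE)
}

# Toy values (not taken from the dataset): one extreme case among ten.
toy_damage <- c(500, 20, 90000, 10, 300, 40, 70, 1000, 60, 80)
toy_bins <- bin_by_rank(toy_damage, 0.9)
table(toy_bins)
```

With ten toy values and a single cut point at 0.9, nine values receive the label of the lower bin and only the largest value (90000) receives the label of the upper bin.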

The table with the processed data for the harm on economy with respect to property damage contains 4 variables:

  1. REFNUM (int) : an id that uniquely identifies each observation
  2. EVENT_TYPE (Factor w/ 37 levels) : the type of each weather event
  3. PROPERTY_DAMAGE (num) : the property damage in dollars
  4. BIN_GROUP_PER_EVENT_TYPE (Ord.factor w/ 2 levels) : a factor that divides the observations for each weather event type to two complementary levels, one with the 90% of observations with the lowest impact and another with the 10% of observations with the highest impact.

and 136928 observations.

## Classes 'data.table' and 'data.frame':   136928 obs. of  4 variables:
##  $ REFNUM                  : int  413607 413608 413609 413610 413611 413612 413613 413614 413615 413616 ...
##  $ EVENT_TYPE              : Factor w/ 37 levels "AVALANCHE","BLIZZARD",..: 29 29 29 29 29 29 29 30 29 29 ...
##  $ PROPERTY_DAMAGE         : num  10000 8000 2000 15000 5000 3000 10000 450000 150000 3000 ...
##  $ BIN_GROUP_PER_EVENT_TYPE: Ord.factor w/ 2 levels "(0% - 90%]"<"(90% - 100%]": 1 1 1 1 1 1 1 1 2 1 ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"






9.1.3 Summarize the processed data for harm on economy with respect to property damage by each weather event type

To evaluate the harm on economy by each weather event type with respect to property damage, a simple approach was adopted:

  • the weather event types were ranked from the most harmful to the least based on the overall average property damage of the weather events that resulted in non-zero property damage

The overall average property damage caused by each weather event type was initially examined along with the skewness of the property damage for each weather event type. In most cases the skewness was high (or even extremely high), so the overall mean could misrepresent the consequences of each weather event type.

For this reason, the average property damage for the 90% of weather events with the lowest impact and the average property damage for the 10% of weather events with the highest impact were also computed and compared.

Note that, for many weather event types, the average property damage of the 10% of cases with the highest impact was based on only a few observations, so the corresponding mean values should be interpreted with caution.
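The effect described above can be seen on a small made-up example. The sketch below uses base R only and toy values that are not taken from the dataset; it compares the overall mean with the means of the lowest 90% and the highest 10% of the values.

```r
# Toy values (assumption): nine small damages and one extreme damage.
toy_damage <- c(rep(1000, 9), 1e6)

# The overall mean is pulled far away from the typical case.
overall_avg <- mean(toy_damage)                      # 100900

# Split the sorted values at the 90% mark.
sorted_damage <- sort(toy_damage)
n_low <- floor(length(sorted_damage) * 0.9)

avg_low  <- mean(sorted_damage[seq_len(n_low)])      # 1000
avg_high <- mean(sorted_damage[-seq_len(n_low)])     # 1e+06
```

In this toy case the overall mean is about a hundred times larger than the damage of a typical event, which is why the two partial averages were reported next to it.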

# Create the table with the summary for the harm on economy 
# with respect to property damage for each weather event type.
summary_____harm_on_economy______property_damage <- 
  processed_data_____harm_on_economy_____property_damage[
  ,
  list(
    ## The total number of observation by each weather event type.
    "N" = .N,
    ## The average property damage caused by each weather event type.
    "AVRG" = round(mean(PROPERTY_DAMAGE), 0),
    ## The skewness of property damage for the observations by each weather event type.
    "SKEWNESS" = round(skewness(PROPERTY_DAMAGE), 4),
    ## The number of observations for the 90% of cases with the lowest impact 
    ## by each weather event type.
    "N_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , .N],
    ## The average property damage caused by each weather event type 
    ## for the 90% of cases with the lowest impact.
    "AVRG_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , round(mean(PROPERTY_DAMAGE), 0)],
    ## The skewness of property damage for the 90% of cases with the lowest impact 
    ## by each weather event type.
    "SKEWNESS_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , round(skewness(PROPERTY_DAMAGE), 4)],
    ## The number of observations for the 10% of cases with the highest impact 
    ## by each weather event type.
    "N_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , .N],
    ## The average property damage caused by each weather event type 
    ## for the 10% of cases with the highest impact.
    "AVRG_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , round(mean(PROPERTY_DAMAGE), 0)],
    ## The skewness of property damage for the 10% of cases with the highest impact 
    ## by each weather event type.
    "SKEWNESS_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , round(skewness(PROPERTY_DAMAGE), 4)]
  ),
  by = "EVENT_TYPE"
  ][
    ## The average property damage determines the order in which the ranks
    ## are assigned, from the most harmful weather event type to the least.
    order(-AVRG),
    ## Create a variable with the harmfulness rank of each weather event type.
    RANK := 1:length(EVENT_TYPE)
    ][
      ,
      ## Reorder the variables at the table.
      list(
        RANK, EVENT_TYPE, N, AVRG, SKEWNESS, N_LOW, AVRG_LOW, SKEWNESS_LOW, N_HIGH, AVRG_HIGH, SKEWNESS_HIGH
      )
      ]

The table with the summary for the harm on economy with respect to property damage by each weather event type, which was created in this section, was presented in subsection 10.2.2 Most harmful event types with respect to property damage of chapter 10 RESULTS.

The table was also exported (as an R file) to the following folder of the working directory:

  • outputs --> harm_on_economy --> results

with filename:

  • summary_____harm_on_economy______property_damage.R

In addition, a txt file that contains the MD5 hash of the file was created and saved in the same directory with filename:

  • summary_____harm_on_economy______property_damage.R-----(MD5 HASH).txt

The main reason for exporting the file with the summary for the harm on economy with respect to property damage by each weather event type was to supply a checkpoint for any attempts to reproduce the analysis.
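The export code itself is not shown in this section. A checkpoint of this kind could be sketched as follows, under the assumptions that the R file is written with `dput()` and that the hash is computed with `tools::md5sum()`. The table `toy_summary`, its values, the file name `toy_summary.R` and the use of a temporary directory are all hypothetical.

```r
# Toy table standing in for the summary (values are made up).
toy_summary <- data.frame(
  EVENT_TYPE = c("TYPE_A", "TYPE_B"),
  AVRG = c(2000, 1000)
)

# Write into a temporary folder that mimics the output folder structure.
out_dir <- file.path(tempdir(), "outputs", "harm_on_economy", "results")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
out_file <- file.path(out_dir, "toy_summary.R")

# Export the table as R code and store its MD5 hash next to it.
dput(toy_summary, file = out_file)
md5_hash <- unname(tools::md5sum(out_file))
hash_file <- paste0(out_file, "-----(MD5 HASH).txt")
writeLines(md5_hash, hash_file)

# Someone reproducing the analysis can check the hash and reload the table.
hash_matches <- identical(readLines(hash_file), unname(tools::md5sum(out_file)))
restored <- dget(out_file)
```

Comparing a freshly computed hash with the stored one shows whether the exported file is byte-for-byte the same as the checkpoint.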






9.1.4 Visualize the results of the summary for the harm on economy with respect to property damage by each weather event type

From the table with the summary for the harm on economy by each weather event type with respect to property damage the Multiplot 2.1 was created to present an overview of the results for the three different aspects that were examined for this perspective.

Four elementary plots were created:

  • 9.1.4.1.1 Create The Plot 2.1.1
    • Displays the overall average property damage caused by each weather event type based on all the cases of weather events that resulted in non-zero property damage.
  • 9.1.4.1.2 Create The Plot 2.1.2
    • Displays the average property damage caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero property damage.
  • 9.1.4.1.3 Create The Plot 2.1.3
    • Displays the average property damage caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero property damage.
  • 9.1.4.1.4 Create The Plot 2.1.4
    • Displays a comparison for each weather event type, of the average property damage for the 90% of its observations with the lowest impact versus the average property damage for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero property damage.

which were then combined in order to obtain the Multiplot 2.1.

It constitutes PART 1 of Figure 2, which displays the overview of the harm on economy by each weather event type.

(Note that neither the Multiplot 2.1 nor the elementary plots that it contains were presented in this section, because the assignment requires the report to include at least 1 but no more than 3 figures. They can be examined in subsection 10.2.1 Overview of results for the harm on economy of chapter 10 RESULTS.)




9.1.4.1 Create the components of Multiplot 2.1

Four elementary plots were created to visualize the results for the aspects that were examined for the harm on economy with respect to property damage by each weather event type:

  • 9.1.4.1.1 Create The Plot 2.1.1
    • Displays the overall average property damage caused by each weather event type based on all the cases of weather events that resulted in non-zero property damage.
  • 9.1.4.1.2 Create The Plot 2.1.2
    • Displays the average property damage caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero property damage.
  • 9.1.4.1.3 Create The Plot 2.1.3
    • Displays the average property damage caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero property damage.
  • 9.1.4.1.4 Create The Plot 2.1.4
    • Displays a comparison for each weather event type, of the average property damage for the 90% of its observations with the lowest impact versus the average property damage for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero property damage.




9.1.4.1.1 Create The Plot 2.1.1

The Plot 2.1.1 displays the overall average property damage caused by each weather event type, taking into account only the observations that resulted in non-zero property damage.

Each weather event type was labeled with a number that represents its rank, from the most harmful to the least with respect to economy, based on the overall average property damage it caused.

The skewness of the property damage for the observations of each weather event type (from which the overall average property damage was computed) was encoded in the color of the bar associated with it.

# Create the Elementary Plot 2.1.1 that displays 
# the overall average property damage 
# by each weather event type for all cases. 
elementary_plot_2_1_1 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_economy______property_damage,
    mapping = aes(
      x = AVRG,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to make them displayed alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE)
        )
      ) 
    )
  ) +
  ## Draw a square shaped point to the position that corresponds to 
  ## the average property damage caused by each weather event type, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(color = SKEWNESS),
    shape = 15, 
    size = 4.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average property damage.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG, 
      group = EVENT_TYPE, 
      color = SKEWNESS
    )
  ) +
  ## Draw a number that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall average
  ## property damage it caused, inside the square point 
  ## that displays the average.
  geom_text(
    mapping = aes(
      label = RANK
    ), 
    size = 2.5
  ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average property damage for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 2.1 will be composed from the four elementary plots. 
    limits = c(-5, 170), 
    midpoint = 70, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
  ) +
  ## Supply descriptive labels.  
  labs(
    title = "Plot 2.1.1", 
    subtitle = "Aspect: Overall",
    x = "Average Property Damage\n",
    y = "Weather Event Types \n"
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    )
  )
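The fixed `limits = c(-5, 170)` can only serve as a common color scale if every skewness value of the three aspects lies inside them, because by default ggplot2 treats values outside the limits of a continuous scale as missing. A check of this kind could be sketched in base R as follows. The helper `within_limits` is hypothetical, the three valid values are the overall skewness of TSUNAMI, TORNADO and THUNDERSTORM WIND from the table above, and the value 171 is made up to show a failing case.

```r
# Limits used by the common color scale of the four elementary plots.
skewness_limits <- c(-5, 170)

# Hypothetical helper: TRUE when all non-missing values lie inside the limits.
within_limits <- function(values, limits) {
  values <- values[!is.na(values)]
  all(values >= limits[1] & values <= limits[2])
}

# Overall skewness of three event types taken from the table above.
observed_skewness <- c(2.7176, 55.2385, 167.8966)
limits_ok <- within_limits(observed_skewness, skewness_limits)

# A made-up value above the upper limit would fall outside the scale.
limits_fail <- within_limits(c(observed_skewness, 171), skewness_limits)
```

In the actual analysis the same check would be applied to the columns SKEWNESS, SKEWNESS_LOW and SKEWNESS_HIGH of the summary table.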




9.1.4.1.2 Create The Plot 2.1.2

The Plot 2.1.2 displays, for each weather event type, the average property damage for the 90% of cases with the lowest impact, taking into account only the observations that resulted in non-zero property damage.

Each weather event type was labeled with a number that represents its rank, from the most harmful to the least with respect to economy, based on the overall average property damage it caused (so the rank is NOT based on the average property damage caused by the 90% of cases with the lowest impact of each weather event type).

The skewness of the property damage for the observations of each weather event type (from which the average property damage for the 90% of cases with the lowest impact was computed) was encoded in the color of the bar associated with it.

# Create the Elementary Plot 2.1.2 that displays 
# the average property damage by each weather event type 
# for the 90% of its cases with the lowest impact.
elementary_plot_2_1_2 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_economy______property_damage,
    mapping = aes(
      x = AVRG_LOW,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to display them alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE)
        )
      ) 
    )
  ) +
  ## Draw a circle shaped point to the position that corresponds to 
  ## the average property damage caused by each weather event type
  ## for the 90% of its cases with the lowest impact, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(
      color = SKEWNESS_LOW
    ), 
    size = 3.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average property damage 
  ## for the 90% of its cases with the lowest impact.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG_LOW, 
      group = EVENT_TYPE, 
      color = SKEWNESS_LOW
    )
  ) +
  ## Draw a number that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall average
  ## property damage it caused, inside the circle shaped point 
  ## that displays the average.
  geom_text(
    mapping = aes(
      label = RANK
    ), 
    size = 2
  ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average property damage for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 2.1 will be composed from the four elementary plots.
    limits = c(-5, 170), 
    midpoint = 70, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
  ) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 2.1.2",
    subtitle = "Aspect: 90% of cases with the lowest impact",
    x = paste0(
      "Average Property Damage for the 90% ", "\n",
      "of Observations with the Lowest Impact" 
    )
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    ),
    ### Remove the text, ticks and title of the y axis 
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.title.y = element_blank()
  )




9.1.4.1.3 Create The Plot 2.1.3

The Plot 2.1.3 displays, for each weather event type, the average property damage for the 10% of cases with the highest impact, taking into account only the observations that resulted in non-zero property damage.

Each weather event type was labeled with a number that represents its rank, from the most harmful to the least with respect to economy, based on the overall average property damage it caused (so the rank is NOT based on the average property damage caused by the 10% of cases with the highest impact of each weather event type).

The skewness of the property damage for the observations of each weather event type (from which the average property damage for the 10% of cases with the highest impact was computed) was encoded in the color of the bar associated with it.

# Create the Elementary Plot 2.1.3 that displays 
# the average property damage by each weather event type 
# for the 10% of its cases with the highest impact.
elementary_plot_2_1_3 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_economy______property_damage,
    mapping = aes(
      x = AVRG_HIGH,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to display them alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE)
        )
      ) 
    )
  ) +
  ## Draw a diamond shaped point to the position that corresponds to 
  ## the average property damage caused by each weather event type
  ## for the 10% of its cases with the highest impact, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(
      color = SKEWNESS_HIGH
    ), 
    shape = 18, 
    size = 4.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average property damage 
  ## for the 10% of its cases with the highest impact.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG_HIGH, 
      group = EVENT_TYPE, 
      color = SKEWNESS_HIGH
    )
  ) +
  ## Draw a number that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall average
  ## property damage it caused, inside the diamond shaped point 
  ## that displays the average.
  geom_text(
    mapping = aes(
      label = RANK
    ),
    size = 2
  ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average property damage for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 2.1 will be composed from the four elementary plots.
    limits = c(-5, 170), 
    midpoint = 70, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
  ) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 2.1.3",
    subtitle ="Aspect: 10% of cases with the highest impact",
    x = paste0(
      "Average Property Damage for the 10% ", "\n", 
      "of Observations with the Highest Impact" 
    )
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    ),
    ### Remove the text, ticks and title of the y axis 
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.title.y = element_blank()
  )




9.1.4.1.4 Create The Plot 2.1.4

The Plot 2.1.4 displays a compact overview of all three aspects that were examined for the harm on economy with respect to property damage.

For each weather event type, the average property damage for the 90% of cases with the lowest impact was plotted against the average property damage for the 10% of cases with the highest impact.

Each weather event type was labeled with a number that represents its rank, from the most harmful to the least with respect to economy, based on the overall average property damage it caused.

The skewness of the property damage for the observations of each weather event type (from which the overall average property damage was computed) was encoded in the fill color of the point and of the label associated with it.

# Create the Elementary Plot 2.1.4 that displays 
# by each weather event type the comparison of 
# the average property damage 
# for the 90% of cases with the lowest impact
# versus the average property damage 
# for the 10% of cases with the highest impact.
elementary_plot_2_1_4 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_economy______property_damage,
    mapping = aes(
      x = AVRG_HIGH, 
      y = AVRG_LOW
    )
  ) +
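  ## Draw a filled circle for each weather event type, 
  ## of which the fill indicates the overall skewness of its property damage.
  geom_point(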
    mapping = aes(
      fill = SKEWNESS
    ), 
    shape = 21
  ) +
  ## Draw a label with a number that indicates the rank assigned 
  ## to each weather event type (from the most harmful to the least) 
  ## based on the overall average property damage it caused.
  geom_label_repel(
    mapping = aes(
      label = RANK, 
      fill = SKEWNESS
    ),
    size = 2.5
  ) +
  ## Adjust the scale for the fill of each label.
  scale_fill_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average property damage for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 2.1 will be composed from the four elementary plots.
    limits = c(-5, 170), 
    midpoint = 70, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
  ) +
  ## Set proper limits to the plot.
  xlim(c(-0.5e9, 6e9)) +
  ylim(c(-1e7, 8.5e7)) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 2.1.4",
    subtitle = paste0(
      "Comparison of the average property damage ", 
      "for the 90% of observations with the lowest impact ", 
      "versus the average property damage ", 
      "for the 10% of observations with the highest impact. "
    ),
    x = paste0(
      "Average Property Damage by each Weather Event Type ", 
      "for the 10% of its Observations with the Highest Impact"
    ),
    y = paste0(
      "Average Property Damage by each Weather Event Type ", "\n", 
      "for the 90% of its Observations with the Lowest Impact."
    ),
    ### Add a descriptive label for the legend.
    fill = paste0(
      "The color indicates the skewness ",
      "of property damage for each weather event type. ",
      "(the color scale is common to all four plots of PART 1) "
    )
  ) +
  ## Select a theme.
  theme_linedraw() +
  ## Customize the selected theme.
  theme(
    ### Adjust the legend.
    legend.position = "bottom",
    legend.direction = "horizontal",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    )
  )






9.1.4.2 Compose the Multiplot 2.1

The four elementary plots that were created from the results of the summary for the harm on economy with respect to property damage by each weather event type, were combined to construct a single multiplot that displays the complete picture for this perspective.

# Create a multiplot that displays the overview of the summary 
# for the harm on economy with respect to property damage
# by each weather event type.
multiplot_2_1 <- arrangeGrob(
  grobs = list(
      
    # Title
    textGrob(
      label = paste0(
        "\n",
        "PART 1: Harm on economy by each weather event type ", 
        "with respect to property damage ", "\n", 
        "based on the cases of weather events ", 
        "that resulted in non-zero property damage.", "\n", 
        "\n"
      ),
       gp=gpar(
         fontsize = 16, 
         fontface = "bold"
       )
    ),
    
    # Subtitle
    textGrob(
      label = paste0(
          "\n", 
          "The results include only the weather event types, ", 
          "for which at least 10 observations ", 
          "that resulted in non-zero property damage were available. ", "\n",
          "The number associated with each weather event type ", 
          "represents the rank (from the most harmful to the least) ", 
          "which was assigned based on the overall average property damage.", "\n",
          "Because for most of the weather event types ", 
          "high positive skewness was observed for the property damage, ",
          "the average of the 90% of cases with lowest impact ", "\n",
          "and the 10% of cases with highest impact were reported ", 
          "to provide a more representative picture of their consequences.","\n",
          "\n"
      ),
       gp=gpar(
         fontsize = 14, 
         fontface = "bold"
       )
    ),
    
    # Plot 2.1.1
    # Elementary plot for the average property damage 
    # by each weather event type for all cases.
    elementary_plot_2_1_1,
    
    # Plot 2.1.2
    # Elementary plot for the average property damage 
    # by each weather event type for 90% of cases with the lowest impact.
    elementary_plot_2_1_2,
    
    # Plot 2.1.3
    # Elementary plot for the average property damage 
    # by each weather event type for 10% of cases with the highest impact.
    elementary_plot_2_1_3,
    
    # Plot 2.1.4
    # Elementary plot for the comparison of 
    # the average property damage 
    # for the 90% of cases with the lowest impact versus 
    # the 10% of cases with the highest impact.
    elementary_plot_2_1_4
  ),
  # Set the layout for these elementary plots
  layout_matrix = 
    matrix(
      c(1,1,1,1,1,1,1,1,1,
        2,2,2,2,2,2,2,2,2,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6
      ),
      byrow = TRUE, 
      nrow = 13
    )
)
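In `arrangeGrob()` from gridExtra, each integer in `layout_matrix` refers to the position of a grob in the `grobs` list, and `NA` leaves a cell empty. The base R sketch below rebuilds the same 13 x 9 matrix in a more compact way and counts how many cells each grob occupies; the names `layout_cells` and `cells_per_grob` are hypothetical.

```r
# Rebuild the layout matrix used above, row by row.
layout_cells <- matrix(
  c(
    rep(1, 9),                             # row 1: title
    rep(2, 9),                             # row 2: subtitle
    rep(c(3, 3, 3, 3, 3, 4, 4, 5, 5), 5),  # rows 3 to 7: Plots 2.1.1 to 2.1.3
    rep(c(NA, rep(6, 8)), 6)               # rows 8 to 13: Plot 2.1.4
  ),
  byrow = TRUE,
  nrow = 13
)

# Number of cells assigned to each grob (NA cells are not counted).
cells_per_grob <- table(layout_cells)
cells_per_grob
```

The counts show that Plot 2.1.1 (grob 3) receives five of the nine columns in its band, since it is the only plot of the three that carries the y axis labels, while Plot 2.1.4 (grob 6) fills the lower six rows except for an empty first column.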

(Note that the Multiplot 2.1 was NOT presented in this section, because the assignment requires the report to include at least 1 but no more than 3 figures. It can be examined in subsection 10.2.1 Overview of results for the harm on economy of chapter 10 RESULTS, where the Figure 2 was presented, of which the Multiplot 2.1 constitutes PART 1.)








9.2 Harm On Economy With Respect To Crop Damage By Each Weather Event Type

Summary

The required variables and the target data subset of observations for the harm on economy with respect to crop damage were extracted from the table with the processed data, and processed to create a new variable that divided the observations for each of the included weather event types into two complementary groups:

  • the 90% of observations with the lowest impact
  • the 10% of observations with the highest impact

before the information for the harm on economy with respect to crop damage was summarized by each weather event type.

Three aspects were examined:

  1. The overall average crop damage by each weather event type.
  2. The average crop damage by each weather event type for the 90% of cases with the lowest impact.
  3. The average crop damage by each weather event type for the 10% of cases with the highest impact.

For each aspect, the average crop damage by each weather event type, the number of its available observations (based on which the average was computed) and their skewness were examined.

The overall average crop damage was used as the main criterion to determine which weather events caused the most harm on economy with respect to crop damage. It is important, however, to take into account the other two aspects that were presented in order to obtain a more insightful and complete ‘picture’ of their consequences (especially given the fact that, for most of the weather event types, the crop damage was highly positively skewed).

The table with the results for the harm on economy with respect to crop damage by each weather event type was presented at the subsection 10.2.3 Most harmful event types with respect to crop damage of the chapter 10 RESULTS.

Finally, the Multiplot 2.2 was created to visualize the results of the harm on economy with respect to crop damage by each weather event type.

(Note that neither the Multiplot 2.2 nor the elementary plots that it contains were presented in this section due to the restriction imposed by the assignment to include in the report at least 1 but no more than 3 figures. They can be examined at the subsection 10.2.1 Overview of results for the harm on economy of the chapter 10 RESULTS, where the Figure 2 was presented, of which the Multiplot 2.2 constitutes the PART 2.)

Steps




9.2.1 Extract the target data for harm on economy with respect to crop damage

In order to examine the harm on economy with respect to crop damage caused by each weather event type, the variables REFNUM, EVENT_TYPE and CROP_DAMAGE were selected from the table with the processed data and only the observations that refer to weather events that resulted in non-zero crop damage were extracted.

Furthermore, to avoid highly misleading statistics due to the small number of observations for some of the weather event types, a lower bound of 10 weather events that caused non-zero crop damage (for each of the included weather event types) was selected (subjectively by the analyst) and applied.

Although this lower bound may seem (and generally is) too small to yield trustworthy statistics, it was considered “good enough”, taking into account that:

  1. the analysis focuses on describing historical data without trying to make inferences that would demand substantially bigger samples. Even so, a statistic based on fewer than 10 observations could not be taken seriously, especially in cases (such as this analysis) where the distribution of crop damage for each weather event type was skewed.
  2. a period of 10 years (from 2001 to 2011), in which the observations used in the analysis occurred, is a relatively short time to produce big samples of weather events that caused non-zero crop damage for some of the weather event types. Thus, if a higher bound had been selected to get more robust statistics (such as samples of 100 or 300), the majority of weather event types would have been excluded, making the results of the analysis trivial.
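
The two filtering rules described above (non-zero crop damage, and at least 10 such events per weather event type) can be sketched as follows. This is only an illustration in base R on a made-up table; the report itself works with a data.table, and the names `toy`, `min_events`, `kept_types` and `target` are assumptions introduced here.

```r
# Toy stand-in for the table with the processed data (all values are made up).
toy <- data.frame(
  REFNUM      = 1:25,
  EVENT_TYPE  = c(rep("HAIL", 14), rep("FLOOD", 11)),
  CROP_DAMAGE = c(rep(c(3000, 0), 7), rep(5000, 10), 0),
  stringsAsFactors = FALSE
)

# Lower bound on the number of events with non-zero crop damage per event type.
min_events <- 10

# Rule 1: keep only the events that resulted in non-zero crop damage.
non_zero <- toy[toy$CROP_DAMAGE > 0, ]

# Rule 2: keep only the event types that still have at least 'min_events' events.
n_per_type <- table(non_zero$EVENT_TYPE)
kept_types <- names(n_per_type)[n_per_type >= min_events]
target     <- non_zero[non_zero$EVENT_TYPE %in% kept_types, ]

n_per_type   # HAIL has 7 non-zero events, FLOOD has 10
kept_types   # only FLOOD reaches the lower bound
```

In this toy table HAIL is dropped even though it has more rows than FLOOD, because only the events with non-zero crop damage count towards the lower bound.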

The table with the target data for the harm on economy with respect to crop damage consists of 12177 observations.

## Classes 'data.table' and 'data.frame':   12177 obs. of  3 variables:
##  $ REFNUM     : int  413886 413890 413893 415001 415205 415230 415477 415533 415652 416062 ...
##  $ EVENT_TYPE : chr  "HAIL" "HAIL" "HAIL" "HAIL" ...
##  $ CROP_DAMAGE: num  3000 3000 3000 5000 2500 3000 5000 2500 100000 30000 ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"

The variable EVENT_TYPE includes 16 distinct weather event types, for most of which the variable CROP_DAMAGE was highly positively skewed.

It is worth noting that the weather event types with the highest number of observations also showed the highest skewness for the values of crop damage. This indicates that the corresponding distribution of crop damage has a heavy tail, which could not be observed when only a few observations were available.

Table 9.2.1-1: Facts about the table with the target data subset of observations for the harm on economy with respect to crop damage.

EVENT_TYPE                    N    SKEWNESS
DROUGHT                     158      4.9333
EXTREME COLD/WIND CHILL      11      1.6402
FLASH FLOOD                1296     13.5455
FLOOD                      1263     19.0535
FROST/FREEZE                106      5.8134
HAIL                       5590     18.5382
HEAVY RAIN                   75      7.8538
HIGH WIND                   123      7.5985
HURRICANE/TYPHOON            48      5.6962
LIGHTNING                    50      6.2946
STRONG WIND                  94      8.5291
THUNDERSTORM WIND          2321     13.4840
TORNADO                     889     27.0249
TROPICAL STORM               52      3.4070
WILDFIRE                     91      5.3055
WINTER STORM                 10      2.6305

Note: The skewness was rounded to 4 decimal places.
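
To make the link between sample size and observable skewness more concrete, the sketch below computes the moment-based coefficient of skewness by hand. It is an assumption that this matches the estimator behind the SKEWNESS column: the package that supplies `skewness()` is not shown in this section, and some implementations apply a small-sample correction. The names `moment_skewness` and `max_skewness` are hypothetical.

```r
# Moment-based skewness: third central moment over the 1.5 power of the second.
moment_skewness <- function(x) {
  centered <- x - mean(x)
  m2 <- mean(centered^2)
  m3 <- mean(centered^3)
  m3 / m2^(3 / 2)
}

# Upper limit of this estimator for a sample of n values: (n - 2) / sqrt(n - 1).
max_skewness <- function(n) (n - 2) / sqrt(n - 1)

symmetric  <- c(1, 2, 3, 4, 5)        # made-up symmetric sample
heavy_tail <- c(rep(1000, 9), 1e6)    # made-up sample: 9 small values, 1 extreme

moment_skewness(symmetric)    # zero for a symmetric sample
moment_skewness(heavy_tail)   # reaches the upper limit for n = 10
max_skewness(10)              # 8 / 3, roughly 2.67
max_skewness(5590)            # roughly 74.7, so far larger values become possible
```

Under this estimator, a weather event type with only 10 observations cannot show a skewness above about 2.67 no matter how extreme its largest value is, which is consistent with the low values reported for the event types with few observations.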






9.2.2 Process the target data for harm on economy with respect to crop damage

To create the table with the processed data for the harm on economy with respect to crop damage from the corresponding target data subset for this perspective, a new variable was created that divides the observations for each of the included weather event types into two complementary levels:

  • one that contains the 90% of cases with the lowest impact
  • the other that contains the 10% of cases with the highest impact

This decision was made due to the high skewness that was observed for the values of the variable CROP_DAMAGE for most weather event types, which indicates that the underlying distributions of such phenomena have a heavy tail that causes this heterogeneity in the observations. As a result, small crop damage was observed for the majority of cases that resulted in non-zero crop damage, while the few cases with the highest impact caused a lot of crop damage.

Bearing in mind that the average crop damage is used to determine which weather event types were the most harmful to the economy (with respect to crop damage), and that the average does not represent well the distribution of highly skewed variables because it is strongly affected by the most extreme values, it was considered necessary to examine the subsets created by those two levels in order to obtain an insightful picture.

# Create the table with the processed data 
# for the harm on economy with respect to crop damage.
processed_data_____harm_on_economy_____crop_damage <- 
  target_data_____harm_on_economy_____crop_damage[
    ,
    ## Create a new variable that divides the observations
    ## for each weather event type into two complementary groups:  
    ##   - the 90% of weather events that resulted in the lowest crop damage
    ##   - the 10% of weather events that resulted in the highest crop damage
    BIN_GROUP_PER_EVENT_TYPE := (function(x, p_bins) {
      
      # add 0 and 1 to the percentiles supplied at the argument 'p_bins' 
      # (as the start and the end of the range respectively)  
      # and sort them in ascending order
      p_bins_increasing <- sort(c(0, p_bins, 1))
      
      # create the labels of the bins from the percentiles;
      # these labels will be the values of the new variable
      bin_labels <- paste0("(", p_bins_increasing[-length(p_bins_increasing)]*100,
                           "% - ", p_bins_increasing[-1]*100, "%]")
      
      # identify the number of occurrences that correspond to each label
      n_times <- vapply(2:length(p_bins_increasing),
                        function(i) {
                          as.integer(floor(length(x) * p_bins_increasing[i]) -
                                       floor(length(x) * p_bins_increasing[i - 1]))
                        }, integer(1))
      
      # repeat each label as many times as the number of its occurrences
      x_bins_expanded <- rep(x = bin_labels, times = n_times)
      
      # reorder the labels to match the values of the corresponding vector
      x_bins_expanded_reordered <- x_bins_expanded[order(seq_along(x)[order(x)])]
      
      ## Coerce the character vector with the labels of bins to an ordered factor;
      ## the levels are set explicitly so their order does not depend on sorting
      x_bins_factor <- factor(x_bins_expanded_reordered, levels = bin_labels, ordered = TRUE)
      
    })(CROP_DAMAGE, 0.9), 
    by = EVENT_TYPE
  ][
    ## Coerce the EVENT_TYPE variable to a factor
    , EVENT_TYPE := as.factor(EVENT_TYPE) 
  ]
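
The binning logic used above can be tried on a small vector. The following is a simplified base R sketch for the single split at 90%, with made-up values; `x`, `p_low`, `bins` and `group_sizes` are names introduced only for this illustration.

```r
# Ten made-up crop damage values for one hypothetical weather event type.
x <- c(50, 10, 70, 20, 1000, 30, 60, 40, 90, 80)

p_low      <- 0.9
bin_labels <- c("(0% - 90%]", "(90% - 100%]")

# Number of observations that fall in the low group (rounded down).
n_low <- floor(length(x) * p_low)

# Labels in sorted order: first the low group, then the high group.
sorted_labels <- rep(bin_labels, times = c(n_low, length(x) - n_low))

# Give each observation the label that corresponds to its rank.
bins <- factor(
  sorted_labels[rank(x, ties.method = "first")],
  levels  = bin_labels,
  ordered = TRUE
)

table(bins)          # 9 observations in the low group, 1 in the high group
bins[x == 1000]      # the extreme value lands in "(90% - 100%]"

# Because of the rounding, the split is only approximately 90% / 10%.
group_sizes <- function(n, p = 0.9) c(low = floor(n * p), high = n - floor(n * p))
group_sizes(11)      # 9 low and 2 high, so the high group holds about 18%
```

For weather event types with few observations, the high group therefore contains only one or two events, which is worth keeping in mind when reading the corresponding averages.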

The table with the processed data for the harm on economy with respect to crop damage contains 4 variables:

  1. REFNUM (int) : an id that uniquely identifies each observation
  2. EVENT_TYPE (Factor w/ 16 levels) : the type of each weather event
  3. CROP_DAMAGE (num) : the crop damage in dollars
  4. BIN_GROUP_PER_EVENT_TYPE (Ord.factor w/ 2 levels) : a factor that divides the observations for each weather event type into two complementary levels, one with the 90% of observations with the lowest impact and another with the 10% of observations with the highest impact.

and 12177 observations.

## Classes 'data.table' and 'data.frame':   12177 obs. of  4 variables:
##  $ REFNUM                  : int  413886 413890 413893 415001 415205 415230 415477 415533 415652 416062 ...
##  $ EVENT_TYPE              : Factor w/ 16 levels "DROUGHT","EXTREME COLD/WIND CHILL",..: 6 6 6 6 6 6 6 6 3 8 ...
##  $ CROP_DAMAGE             : num  3000 3000 3000 5000 2500 3000 5000 2500 100000 30000 ...
##  $ BIN_GROUP_PER_EVENT_TYPE: Ord.factor w/ 2 levels "(0% - 90%]"<"(90% - 100%]": 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"






9.2.3 Summarize the processed data for harm on economy with respect to crop damage by each weather event type

To evaluate the harm on economy by each weather event type with respect to crop damage, a simplistic approach was adopted:

  • the weather event types were ranked from the most harmful to the least based on the overall average crop damage of the weather events that resulted in non-zero crop damage

The overall average crop damage caused by each weather event type was initially examined along with the skewness of the crop damage for each weather event type. In most cases the skewness was high (or even extremely high), so it was possible that the overall mean misrepresented the consequences of each weather event type.

For this reason, the average crop damage for the 90% of weather events with the lowest impact and the average crop damage for the 10% of weather events with the highest impact were also computed and compared.

Note that, for the average crop damage of the 10% of cases with the highest impact, only a few observations were available for many weather event types, so the corresponding mean values should be interpreted with caution.

# Create the table with the summary for the harm on economy 
# with respect to crop damage for each weather event type.
summary_____harm_on_economy______crop_damage <- 
  processed_data_____harm_on_economy_____crop_damage[
  ,
  list(
    ## The total number of observations by each weather event type.
    "N" = .N,
    ## The average crop damage caused by each weather event type.
    "AVRG" = round(mean(CROP_DAMAGE), 0),
    ## The skewness of crop damage for the observations by each weather event type.
    "SKEWNESS" = round(skewness(CROP_DAMAGE), 4),
    ## The number of observations for the 90% of cases with the lowest impact 
    ## by each weather event type.
    "N_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , .N],
    ## The average crop damage caused by each weather event type 
    ## for the 90% of cases with the lowest impact.
    "AVRG_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , round(mean(CROP_DAMAGE), 0)],
    ## The skewness of crop damage for the 90% of cases with the lowest impact 
    ## by each weather event type.
    "SKEWNESS_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , round(skewness(CROP_DAMAGE), 4)],
    ## The number of observations for the 10% of cases with the highest impact 
    ## by each weather event type.
    "N_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , .N],
    ## The average crop damage caused by each weather event type 
    ## for the 10% of cases with the highest impact.
    "AVRG_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , round(mean(CROP_DAMAGE), 0)],
    ## The skewness of crop damage for the 10% of cases with the highest impact 
    ## by each weather event type.
    "SKEWNESS_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , round(skewness(CROP_DAMAGE), 4)]
  ),
  by = "EVENT_TYPE"
  ][
    ## The average crop damage is used to order the rows of the table
    ## from the most harmful weather event type to the least.
    order(-AVRG),
    ## Create a variable with the rank of the harmfulness of each weather event type.
    RANK := 1:length(EVENT_TYPE)
    ][
      ,
      ## Reorder the variables at the table.
      list(
        RANK, EVENT_TYPE, N, AVRG, SKEWNESS, N_LOW, AVRG_LOW, SKEWNESS_LOW, N_HIGH, AVRG_HIGH, SKEWNESS_HIGH
      )
      ]
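
The columns of this summary are linked by a simple identity: the overall average is the average of the two group averages, weighted by the number of observations in each group. The base R sketch below checks this on made-up values (the variable names are lowercase stand-ins for the columns N_LOW, AVRG_LOW, N_HIGH, AVRG_HIGH and AVRG); in the actual table the identity holds only approximately, because the averages were rounded.

```r
# Made-up crop damage values for one hypothetical weather event type.
damage <- c(rep(1000, 9), 1e6)

# Flag the 10% of observations with the highest impact (rounded as in the report).
is_high <- rank(damage, ties.method = "first") > floor(length(damage) * 0.9)

n_low     <- sum(!is_high)
n_high    <- sum(is_high)
avrg      <- mean(damage)             # overall average: 100900
avrg_low  <- mean(damage[!is_high])   # average of the low group: 1000
avrg_high <- mean(damage[is_high])    # average of the high group: 1e+06

# Weighted average of the two group averages.
recombined <- (n_low * avrg_low + n_high * avrg_high) / (n_low + n_high)
all.equal(recombined, avrg)
```

In this example a single extreme event moves the overall average two orders of magnitude above the typical damage, which is the distortion that the two additional aspects are meant to expose.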

The results of the table with the summary for the harm on economy with respect to crop damage by each weather event type that was created in this section were presented at the subsection 10.2.3 Most harmful event types with respect to crop damage of the chapter 10 RESULTS.

The table with the summary for the harm on economy with respect to crop damage by each weather event type was exported (as an R file) to the following folder of the working directory:

  • outputs –> harm_on_economy –> results

with filename:

  • summary______harm_on_economy______crop_damage.R

The main reason for exporting the file with the summary for the harm on economy with respect to crop damage by each weather event type was to supply a checkpoint for any attempts to reproduce the analysis.
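
The export call itself is not shown in this section. As an illustration only, one way a table could be written to and restored from an R file in base R is `dput()` together with `dget()`; the sketch below uses a temporary file and a made-up table, so it should not be read as the method actually used in the analysis.

```r
# Made-up stand-in for a summary table.
summary_toy <- data.frame(
  EVENT_TYPE = c("A", "B"),
  AVRG       = c(10, 5),
  stringsAsFactors = FALSE
)

# Write the object as R code to a temporary file (hypothetical checkpoint).
checkpoint_file <- tempfile(fileext = ".R")
dput(summary_toy, file = checkpoint_file)

# Restore the object from the file and compare it with the original.
restored <- dget(checkpoint_file)
all.equal(restored, summary_toy)
```

A reproduction attempt can then compare its own summary table against the restored checkpoint before moving on to the plots.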






9.2.4 Visualize the results of the summary for the harm on economy with respect to crop damage by each weather event type

From the table with the summary for the harm on economy by each weather event type with respect to crop damage the Multiplot 2.2 was created to present an overview of the results for the three different aspects that were examined for this perspective.

Four elementary plots were created:

  • 9.2.4.1.1 Create The Plot 2.2.1
    • Displays the overall average crop damage caused by each weather event type based on all the cases of weather events that resulted in non-zero crop damage.
  • 9.2.4.1.2 Create The Plot 2.2.2
    • Displays the average crop damage caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero crop damage.
  • 9.2.4.1.3 Create The Plot 2.2.3
    • Displays the average crop damage caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero crop damage.
  • 9.2.4.1.4 Create The Plot 2.2.4
    • Displays a comparison for each weather event type, of the average crop damage for the 90% of its observations with the lowest impact versus the average crop damage for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero crop damage.

which were then combined in order to obtain the Multiplot 2.2.

It constitutes the PART 2 of the Figure 2 that displays the overview of the harm on economy by each weather event type.

(Note that neither the Multiplot 2.2 nor the elementary plots that it contains were presented in this section due to the restriction imposed by the assignment to include in the report at least 1 but no more than 3 figures. They can be examined at the subsection 10.2.1 Overview of results for the harm on economy of the chapter 10 RESULTS.)




9.2.4.1 Create the components of Multiplot 2.2

Creates four elementary plots to visualize the results for the aspects that were examined for the harm on economy with respect to crop damage by each weather event type.

  • 9.2.4.1.1 Create The Plot 2.2.1
    • Displays the overall average crop damage caused by each weather event type based on all the cases of weather events that resulted in non-zero crop damage.
  • 9.2.4.1.2 Create The Plot 2.2.2
    • Displays the average crop damage caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero crop damage.
  • 9.2.4.1.3 Create The Plot 2.2.3
    • Displays the average crop damage caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero crop damage.
  • 9.2.4.1.4 Create The Plot 2.2.4
    • Displays a comparison for each weather event type, of the average crop damage for the 90% of its observations with the lowest impact versus the average crop damage for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero crop damage.




9.2.4.1.1 Create The Plot 2.2.1

The Plot 2.2.1 displays the overall average crop damage caused by each weather event type, taking into account all and only the observations that resulted in non-zero crop damage.

The weather event types were matched with a number that represents the rank which was assigned to each of them from the most harmful to the least with respect to economy, based on the overall average crop damage they caused.

The skewness of the crop damage for the observations of each weather event type (based on which the overall average crop damage was computed) was encoded in the color of the bar associated with each of them.

# Create the Elementary Plot 2.2.1 that displays 
# the overall average crop damage 
# by each weather event type for all cases. 
elementary_plot_2_2_1 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_economy______crop_damage,
    mapping = aes(
      x = AVRG,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to make them displayed alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE)
        )
      ) 
    )
  ) +
  ## Draw a square shaped point to the position that corresponds to 
  ## the average crop damage caused by each weather event type, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(color = SKEWNESS),
    shape = 15, 
    size = 4.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average crop damage.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG, 
      group = EVENT_TYPE, 
      color = SKEWNESS
    )
    ) +
  ## Draw, inside the square point that displays the average, 
  ## a number that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall average 
  ## crop damage it caused.
  geom_text(
    mapping = aes(
      label = RANK
    ), 
    size = 2.5
  ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average crop damage for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 2.2 will be composed from the four elementary plots. 
    limits = c(0, 28), 
    midpoint = 14, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
  ) +
  ## Supply descriptive labels.  
  labs(
    title = "Plot 2.2.1", 
    subtitle = "Aspect: Overall",
    x = "Average Crop Damage\n",
    y = "Weather Event Types \n"
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    )
  )
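
The shared limits `c(0, 28)` of the color scale can be sanity-checked against the data. With the default settings of continuous scales in ggplot2, values outside the limits are treated as missing and drawn in the gray `na.value` color, which would make them indistinguishable from the cases where the skewness was indeterminable. The base R sketch below runs this check for the overall skewness values of Table 9.2.1-1 only; the skewness values of the two subgroups are not listed in this section, so they are not covered here.

```r
# Overall skewness by weather event type, copied from Table 9.2.1-1.
skewness_overall <- c(
  "DROUGHT" = 4.9333, "EXTREME COLD/WIND CHILL" = 1.6402,
  "FLASH FLOOD" = 13.5455, "FLOOD" = 19.0535, "FROST/FREEZE" = 5.8134,
  "HAIL" = 18.5382, "HEAVY RAIN" = 7.8538, "HIGH WIND" = 7.5985,
  "HURRICANE/TYPHOON" = 5.6962, "LIGHTNING" = 6.2946,
  "STRONG WIND" = 8.5291, "THUNDERSTORM WIND" = 13.4840,
  "TORNADO" = 27.0249, "TROPICAL STORM" = 3.4070, "WILDFIRE" = 5.3055,
  "WINTER STORM" = 2.6305
)

# Limits of the color scale used in all four elementary plots.
scale_limits <- c(0, 28)

# Event types whose skewness would fall outside the color scale.
outside <- skewness_overall < scale_limits[1] | skewness_overall > scale_limits[2]
names(skewness_overall)[outside]   # empty: every value fits inside the limits

# The largest value decides how high the upper limit has to be.
names(which.max(skewness_overall))
```

The upper limit of 28 sits just above the largest overall value (TORNADO, 27.0249), so the whole color range is used.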




9.2.4.1.2 Create The Plot 2.2.2

The Elementary Plot 2.2.2 displays the average crop damage caused by each weather event type for the 90% of its cases with the lowest impact, among all the observations that resulted in non-zero crop damage.

The weather event types were matched with a number that represents the rank which was assigned to each of them from the most harmful to the least with respect to economy, based on the overall average crop damage they caused (so it is NOT based on the average crop damage caused by the 90% of cases with the lowest impact of each weather event type).
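
A small made-up example shows why this distinction matters: the order of the weather event types by their overall average can differ from their order by the average of the low-impact group. The table `toy_summary` and the column RANK_LOW are hypothetical and exist only in this sketch.

```r
# Made-up summary for three hypothetical weather event types.
toy_summary <- data.frame(
  EVENT_TYPE = c("A", "B", "C"),
  AVRG       = c(900, 500, 100),   # overall average crop damage
  AVRG_LOW   = c(20, 80, 50),      # average for the 90% with the lowest impact
  stringsAsFactors = FALSE
)

# Rank used in the plots: based on the overall average only.
toy_summary$RANK <- rank(-toy_summary$AVRG, ties.method = "first")

# Hypothetical alternative rank, based on the low-impact average.
toy_summary$RANK_LOW <- rank(-toy_summary$AVRG_LOW, ties.method = "first")

toy_summary
```

Here the type "A" is ranked first overall but last within the low-impact group, so the numbers drawn in the plot do not necessarily decrease along the horizontal axis.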

The skewness of the crop damage for the observations of each weather event type (based on which the average crop damage for the 90% of cases with the lowest impact was computed) was encoded in the color of the bar associated with each of them.

# Create the Elementary Plot 2.2.2 that displays 
# the average crop damage by each weather event type 
# for the 90% of its cases with the lowest impact.
elementary_plot_2_2_2 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_economy______crop_damage,
    mapping = aes(
      x = AVRG_LOW,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to display them alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE)
        )
      ) 
    )
  ) +
  ## Draw a circle shaped point to the position that corresponds to 
  ## the average crop damage caused by each weather event type
  ## for the 90% of its cases with the lowest impact, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(
      color = SKEWNESS_LOW
    ), 
    size = 3.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average crop damage 
  ## for the 90% of its cases with the lowest impact.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG_LOW, 
      group = EVENT_TYPE, 
      color = SKEWNESS_LOW
    )
  ) +
  ## Draw, inside the circle point that displays the average, 
  ## a number that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall average 
  ## crop damage it caused.
  geom_text(
    mapping = aes(
      label = RANK
    ), 
    size = 2
    ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average crop damage for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 2.2 will be composed from the four elementary plots.
    limits = c(0, 28), 
    midpoint = 14, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
    ) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 2.2.2",
    subtitle = "Aspect: 90% of cases with the lowest impact",
    x = paste0(
      "Average Crop Damage for the 90% ", "\n",
      "of Observations with the Lowest Impact" 
    )
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    ),
    ### Remove the text, ticks and title of the y axis 
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.title.y = element_blank()
  )




9.2.4.1.3 Create The Plot 2.2.3

The Plot 2.2.3 displays the average crop damage caused by each weather event type for the 10% of its cases with the highest impact, among all the observations that resulted in non-zero crop damage.

The weather event types were matched with a number that represents the rank which was assigned to each of them from the most harmful to the least with respect to economy, based on the overall average crop damage they caused (so it is NOT based on the average crop damage caused by the 10% of cases with the highest impact of each weather event type).

The skewness of the crop damage for the observations of each weather event type (based on which the average crop damage for the 10% of cases with the highest impact was computed) was encoded in the color of the bar associated with each of them.

# Create the Elementary Plot 2.2.3 that displays 
# the average crop damage by each weather event type 
# for the 10% of its cases with the highest impact.
elementary_plot_2_2_3 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_economy______crop_damage,
    mapping = aes(
      x = AVRG_HIGH,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to display them alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE)
        )
      ) 
    )
  ) +
  ## Draw a diamond shaped point to the position that corresponds to 
  ## the average crop damage caused by each weather event type
  ## for the 10% of its cases with the highest impact, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(
      color = SKEWNESS_HIGH
    ), 
    shape = 18, 
    size = 4.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average crop damage 
  ## for the 10% of its cases with the highest impact.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG_HIGH, 
      group = EVENT_TYPE, 
      color = SKEWNESS_HIGH
    )
  ) +
  ## Draw, inside the diamond point that displays the average, 
  ## a number that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall average 
  ## crop damage it caused.
  geom_text(
    mapping = aes(
      label = RANK
    ),
    size = 2
  ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average crop damage for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 2.2 will be composed from the four elementary plots.
    limits = c(0, 28), 
    midpoint = 14, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
  ) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 2.2.3",
    subtitle ="Aspect: 10% of cases with the highest impact",
    x = paste0(
      "Average Crop Damage for the 10% ", "\n", 
      "of Observations with the Highest Impact" 
    )
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    ),
    ### Remove the text, ticks and title of the y axis 
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.title.y = element_blank()
  )




9.2.4.1.4 Create The Plot 2.2.4

The Plot 2.2.4 displays a compact overview of all three aspects that were examined for the harm on economy with respect to crop damage.

For each weather event type, the average crop damage for the 90% of cases with the lowest impact was plotted against the average crop damage for the 10% of cases with the highest impact.

The weather event types were matched with a number that represents the rank which was assigned to each of them from the most harmful to the least with respect to economy, based on the overall average crop damage they caused.

The skewness of the crop damage for the observations of each weather event type (based on which the overall average crop damage was computed) was encoded in the fill color of the point and label associated with each of them.

# Create the Elementary Plot 2.2.4 that displays 
# by each weather event type the comparison of 
# the average crop damage 
# for the 90% of cases with the lowest impact
# versus the average crop damage 
# for the 10% of cases with the highest impact.
elementary_plot_2_2_4 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_economy______crop_damage,
    mapping = aes(
      x = AVRG_HIGH, 
      y = AVRG_LOW
    )
  ) +
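  ## Draw a filled circle at the position that corresponds to 
  ## the two averages of each weather event type, of which the fill 
  ## indicates the overall skewness of its observations.
  geom_point(
    mapping = aes(
      fill = SKEWNESS
    ), 
    shape = 21
  ) +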
  ## Draw a label with a number that indicates the rank assigned 
  ## to each weather event type (from the most harmful to the least) 
  ## based on the overall average crop damage it caused.
  geom_label_repel(
    mapping = aes(
      label = RANK, 
      fill = SKEWNESS
    ),
    size = 2.5
  ) +
  ## Adjust the scale for the fill of each label.
  scale_fill_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average crop damage for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 2.2 will be composed from the four elementary plots.
    limits = c(0, 28), 
    midpoint = 14, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
  ) +
  ## Set proper limits to the plot.
  xlim(c(-0.25e8, 5.2e8)) +
  ylim(c(-0.2e7, 1.5e7)) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 2.2.4",
    subtitle = paste0(
      "Comparison of the average crop damage ", 
      "for the 90% of observations with the lowest impact ", 
      "versus the average crop damage ", 
      "for the 10% of observations with highest impact. "
    ),
    x = paste0(
      "Average Number of Crop Damage by each Weather Event Type ", 
      "for the 10% of its Observations with the Highest Impact"
    ),
    y = paste0(
      "Average Number of Crop Damage by each Weather Event Type ", "\n", 
      "for the 90% of its Observations with the Lowest Impact."
    ),
    ### Add a descriptive label for the legend.
    fill = paste0(
      "The color indicates the skewness ",
      "of crop damage for each weather event type. ",
      "(the color scale is unique for all four plots of PART 2) ", "\n",
      "When the color of a bar is gray, the skewness was indeterminable ",
      "due to the fact that all observations for that weather event type ",
      "took the same value."
    )
  ) +
  ## Select a theme.
  theme_linedraw() +
  ## Customize the selected theme.
  theme(
    ### Adjust the legend.
    legend.position = "bottom",
    legend.direction = "horizontal",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    )
  )






9.2.4.2 Compose the Multiplot 2.2

The four elementary plots that were created from the results of the summary for the harm on economy with respect to crop damage by each weather event type, were combined to construct a single multiplot that displays the complete picture for this perspective.

# Create a multiplot that displays the overview of the summary 
# for the harm on economy with respect to crop damage
# by each weather event type.
multiplot_2_2 <- arrangeGrob(
  grobs = list(
      
    # Title
    textGrob(
      label = paste0(
        "\n",
        "PART 2: Harm on economy by each weather event type ", 
        "with the respect to crop damage ", "\n", 
        "based on the cases of weather events ", 
        "that resulted in non-zero crop damage.", "\n", 
        "\n"
      ),
       gp=gpar(
         fontsize = 16, 
         fontface = "bold"
       )
    ),
    
    # Subtitle
    textGrob(
      label = paste0(
          "\n", 
          "The results include only the weather event types, ", 
          "for which at least 10 observations ", 
          "that resulted in non-zero crop damage were available. ", "\n",
          "The number associated with each weather event type ", 
          "represents the rank (from the most harmful to the least) ", 
          "which was assigned based on the overall average crop damage.", "\n",
          "Because for most of the weather event types ", 
          "high positive skewness was observed for the crop damage, ",
          "the average of the 90% of cases with lowest impact ", "\n",
          "and the 10% of cases with highest impact were reported ", 
          "to provide a more representative picture of their consequences.","\n",
          "\n"
      ),
       gp=gpar(
         fontsize = 14, 
         fontface = "bold"
       )
    ),
    
    # Plot 2.2.1
    # Elementary plot for the average crop damage 
    # by each weather event type for all cases.
    elementary_plot_2_2_1,
    
    # Plot 2.2.2
    # Elementary plot for the average crop damage 
    # by each weather event type for 90% of cases with the lowest impact.
    elementary_plot_2_2_2,
    
    # Plot 2.2.3
    # Elementary plot for the average crop damage 
    # by each weather event type for 10% of cases with the highest impact.
    elementary_plot_2_2_3,
    
    # Plot 2.2.4
    # Elementary Plot 2.2.4 for the comparison of 
    # the average crop damage 
    # for the 90% of cases with the lowest impact versus 
    # the 10% of cases with the highest impact.
    elementary_plot_2_2_4
  ),
  # Set the layout for these elementary plots
  layout_matrix = 
    matrix(
      c(1,1,1,1,1,1,1,1,1,
        2,2,2,2,2,2,2,2,2,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6
      ),
      byrow = TRUE, 
      nrow = 12
    )
)
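
To make the layout matrix above easier to read, the following minimal sketch rebuilds it in base R and counts how many grid cells each grob occupies. The object names `layout`, `cells_per_grob` and `rows_per_band` are hypothetical and were introduced only for this illustration; no plotting package is required.

```r
# Rebuild the 12 x 9 layout matrix used for the Multiplot 2.2.
# Each number is the position of a grob in the 'grobs' list;
# NA marks an empty cell.
layout <- matrix(
  c(rep(1, 9),
    rep(2, 9),
    rep(c(rep(3, 5), rep(4, 2), rep(5, 2)), times = 4),
    rep(c(NA, rep(6, 8)), times = 6)),
  byrow = TRUE,
  nrow = 12
)

# Number of grid cells occupied by each grob (NA = empty cells).
cells_per_grob <- table(layout, useNA = "ifany")
print(cells_per_grob)

# Share of the 12 rows taken by each horizontal band of the multiplot.
rows_per_band <- c(title = 1, subtitle = 1, plots_1_to_3 = 4, plot_4 = 6) / nrow(layout)
print(round(rows_per_band, 2))
```

The counts show that the comparison plot (grob 6) receives the largest area, while the first column of its rows is left empty.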

(Note that the Multiplot 2.2 was NOT presented in this section due to the restrictions imposed by the assignment to include in the report at least 1 but no more than 3 figures. It can be examined at the subsection 10.2.1 Overview of results for the harm on economy of the chapter 10 RESULTS, where the Figure 2 was presented, of which the Multiplot 2.2 constitutes the PART 2.)








9.3 Harm On Economy With Respect To Economic Damage By Each Weather Event Type

Summary

The required variables and the target data subset of observations for the harm on economy with respect to economic damage were extracted from the table with the processed data, and processed to create a new variable that divided the observations for each of the included weather event types into two complementary groups:

  • the 90% of observations with the lowest impact
  • the 10% of observations with the highest impact

before the information for the harm on economy with respect to economic damage was summarized by each weather event type.

Three aspects were examined:

  1. The overall average economic damage by each weather event type.
  2. The average economic damage by each weather event type for the 90% of cases with the lowest impact.
  3. The average economic damage by each weather event type for the 10% of cases with the highest impact.

For each aspect, the average economic damage by each weather event type, the number of its available observations (based on which the average was computed) and their skewness were examined.

The overall average economic damage was used as the main criterion to determine which weather events caused the most harm on economy with respect to economic damage. However, it is important to take into account the other two aspects that were presented in order to obtain a more insightful and complete ‘picture’ of their consequences (especially given the fact that, for most of the weather event types, the economic damage was highly positively skewed).
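
The effect of positive skewness on the overall average can be illustrated with a minimal base R sketch. The damage values below are hypothetical and are not taken from the dataset; the variable names were introduced only for this example.

```r
# Hypothetical damage values (in dollars) for one event type:
# nine modest events and one catastrophic event.
damage <- c(1000, 2000, 2000, 3000, 5000, 5000, 8000, 10000, 15000, 5000000)

# Split point: the 90% of observations with the lowest values.
sorted_damage <- sort(damage)
n_low <- floor(length(sorted_damage) * 0.9)

# The three aspects examined in this section.
avg_overall <- mean(sorted_damage)
avg_low     <- mean(sorted_damage[seq_len(n_low)])
avg_high    <- mean(sorted_damage[-seq_len(n_low)])

print(c(overall = avg_overall, low_90 = avg_low, high_10 = avg_high))
```

In this toy case a single extreme observation pulls the overall average far above the average of the other nine events, which is why the three aspects were reported side by side.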

The table with results for the harm on economy with respect to economic damage by each weather event type was presented at the subsection 10.2.4 Most harmful event types with respect to economic damage of the chapter 10 RESULTS.

Finally, the Multiplot 2.3 was created to visualize the results of the harm on economy with respect to economic damage by each weather event type.

(Note that neither the Multiplot 2.3 nor the elementary plots that it contains were presented in this section due to the restrictions imposed by the assignment to include in the report at least 1 but no more than 3 figures. It can be examined at the subsection 10.2.1 Overview of results for the harm on economy of the chapter 10 RESULTS, where the Figure 2 was presented, of which the Multiplot 2.3 constitutes the PART 3.)

Steps




9.3.1 Extract the target data for harm on economy with respect to economic damage

In order to examine the harm on economy with respect to economic damage caused by each weather event type, the variables REFNUM, EVENT_TYPE and ECONOMIC_DAMAGE were selected from the table with the processed data and only the observations that refer to weather events that resulted in non-zero economic damage were extracted.

Furthermore, in an attempt to avoid highly misleading statistics due to the small number of observations for some of the weather event types, a lower bound of 10 weather events that caused non-zero economic damage (for each of the included weather event types) was selected (subjectively by the analyst) and applied.

Although this lower bound may seem (and generally is) too small to yield trustworthy statistics, it was considered to be “good enough”, taking into account that:

  1. the analysis focuses on describing historical data without trying to make inferences that would demand substantially bigger samples. Even so, any statistic based on fewer than 10 observations could not be taken seriously, especially in cases (such as in this analysis) where the distribution of economic damage for each weather event type was skewed.
  2. a period of 10 years (from 2001 to 2011), in which the observations that were used in the analysis occurred, is a relatively short time to produce big samples of weather events that caused non-zero economic damage for some of the weather event types. Thus, if a higher bound (such as 100 or 300 observations) had been selected to get more robust statistics, the majority of weather event types would have been excluded, making the results of the analysis trivial.
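
The extraction described above (non-zero economic damage, at least 10 observations per weather event type) can be sketched in base R as follows. This is only an illustration on a hypothetical toy table; the report itself works on a data.table, and the object names `toy`, `non_zero`, `min_obs`, `kept_types` and `target` are assumptions made for this example.

```r
# Toy stand-in for the processed data (all names and values are hypothetical).
toy <- data.frame(
  EVENT_TYPE = c(rep("HAIL", 12), rep("TSUNAMI", 3), rep("FLOOD", 11)),
  ECONOMIC_DAMAGE = c(rep(c(0, 500), 6), 100, 200, 300, rep(1000, 11))
)

# Keep only the events with non-zero economic damage.
non_zero <- toy[toy$ECONOMIC_DAMAGE > 0, ]

# Count the remaining observations per event type and apply the lower bound.
min_obs <- 10
n_per_type <- table(non_zero$EVENT_TYPE)
kept_types <- names(n_per_type)[n_per_type >= min_obs]

# Target data: only the event types that satisfy the lower bound.
target <- non_zero[non_zero$EVENT_TYPE %in% kept_types, ]
print(table(target$EVENT_TYPE))
```

Note that the bound is applied after the zero-damage events are dropped, so an event type with many records but few damaging ones (HAIL in this toy table) is excluded.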

The table with the target data for the harm on economy with respect to economic damage consists of 140236 observations.

## Classes 'data.table' and 'data.frame':   140236 obs. of  3 variables:
##  $ REFNUM         : int  413607 413608 413609 413610 413611 413612 413613 413614 413615 413616 ...
##  $ EVENT_TYPE     : chr  "THUNDERSTORM WIND" "THUNDERSTORM WIND" "THUNDERSTORM WIND" "THUNDERSTORM WIND" ...
##  $ ECONOMIC_DAMAGE: num  10000 8000 2000 15000 5000 3000 10000 450000 150000 3000 ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"

The variable EVENT_TYPE includes 37 distinct weather event types, for most of which the variable ECONOMIC_DAMAGE was highly positively skewed.

It is worth noting that the weather event types with the highest number of observations also showed the highest skewness for the values of economic damage, indicating that the corresponding distribution of economic damage has a heavy tail that could not be observed when only a few observations were available.

Table 9.3.1-1: Facts about the table with the target data subset of observations for the harm on economy with respect to economic damage.

EVENT_TYPE                     N  SKEWNESS
AVALANCHE                     33    3.4882
BLIZZARD                     129   10.5403
COASTAL FLOOD                152    4.5996
COLD/WIND CHILL               16    1.2895
DEBRIS FLOW                  189    5.6453
DENSE FOG                     56    3.7347
DROUGHT                      171    4.6871
DUST DEVIL                    60    2.4345
DUST STORM                    62    5.4939
EXCESSIVE HEAT                21    4.2483
EXTREME COLD/WIND CHILL       32    3.5596
FLASH FLOOD                13954   58.0040
FLOOD                       7368   85.7213
FROST/FREEZE                 120    6.1949
HAIL                       16305   72.6945
HEAVY RAIN                   883   19.2418
HEAVY SNOW                   573    7.0098
HIGH SURF                     76    5.0462
HIGH WIND                   3863   37.0482
HURRICANE/TYPHOON            108    4.7929
ICE STORM                    410    8.6435
LAKE-EFFECT SNOW             195   13.1024
LIGHTNING                   6199   22.3186
MARINE HIGH WIND              18    3.8120
MARINE STRONG WIND            34    5.3773
MARINE THUNDERSTORM WIND     128   10.1387
STORM SURGE/TIDE             131    9.6344
STRONG WIND                 3251   53.9812
THUNDERSTORM WIND          74183  166.2756
TORNADO                     8782   55.9160
TROPICAL DEPRESSION           35    5.4232
TROPICAL STORM               370   18.7288
TSUNAMI                       14    2.7178
WATERSPOUT                    12    3.0130
WILDFIRE                     878   15.6629
WINTER STORM                 931   29.8022
WINTER WEATHER               494   19.5434

Note:
The skewness was rounded to 4 decimal places.
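
For readers unfamiliar with the statistic, the sketch below computes a moment-based sample skewness in base R. It is an assumption that the `skewness()` function called in the report follows this definition (the package that supplies it is not shown in this section), and `sample_skewness` is a hypothetical helper written only for this illustration.

```r
# Moment-based sample skewness: third central moment divided by
# the second central moment raised to the power 3/2.
# 'sample_skewness' is a hypothetical helper, not the function used in the report.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  m2 <- mean(centered^2)
  m3 <- mean(centered^3)
  # Undefined when every observation takes the same value.
  if (m2 == 0) return(NaN)
  m3 / m2^(3 / 2)
}

symmetric    <- c(1, 2, 3, 4, 5)
right_skewed <- c(1, 1, 1, 2, 100)

print(sample_skewness(symmetric))
print(sample_skewness(right_skewed))
print(sample_skewness(rep(7, 5)))
```

A symmetric sample gives a skewness of zero, a sample with one large value gives a positive skewness, and a sample whose observations are all equal has no determinable skewness.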






9.3.2 Process the target data for harm on economy with respect to economic damage

To create the table with the processed data for the harm on economy with respect to economic damage from the corresponding target data subset for this perspective, a new variable was created that divides the observations for each of the included weather event types into two complementary levels:

  • one that contains the 90% of cases with lowest impact
  • the other that contains the 10% of cases with highest impact

This decision was made due to the high skewness that was observed for the values of the variable ECONOMIC_DAMAGE for most weather event types, which indicates that the underlying distributions of such phenomena have a heavy tail that causes this heterogeneity in the observations. As a result, small economic damage was observed for the majority of cases that resulted in non-zero economic damage, while the few cases with the highest impact caused very large economic damage.

Having in mind that the average economic damage would be used to determine which weather event types were the most harmful to economy (with respect to economic damage), and that the average does not represent well the distribution of variables with high skewness, as it is highly affected by the most extreme values, it was considered necessary to examine the subsets created by those two levels in order to obtain an insightful picture.

# Create the table with the processed data 
# for the harm on economy with respect to economic damage.
processed_data_____harm_on_economy_____economic_damage <- 
  target_data_____harm_on_economy_____economic_damage[
    ,
    ## Create a new variable that divides the observations
    ## for each weather event type into two complementary groups:  
    ##   - the 90% of weather events that resulted in the lowest economic damage
    ##   - the 10% of weather events that resulted in the highest economic damage
    BIN_GROUP_PER_EVENT_TYPE := (function(x, p_bins) {
      
      # add 0 and 1 to the start and the end, respectively, 
      # of the percentiles supplied at the argument 'p_bins' 
      # and sort them in ascending order
      p_bins_increasing <- sort(c(0, p_bins, 1))
      
      # create the character strings that label the bins defined by 
      # the values supplied at the argument 'p_bins'; 
      # these labels will be the values of the new variable
      bin_labels <- paste0("(", p_bins_increasing[-length(p_bins_increasing)]*100,
                           "% - ", p_bins_increasing[-1]*100, "%]")
      
      # identify the number of occurrences that correspond to each label
      n_times <- vapply(2:length(p_bins_increasing),
                        function(i) {
                          as.integer(floor(length(x) * p_bins_increasing[i]) -
                                       floor(length(x) * p_bins_increasing[i - 1]))
                        }, integer(1))
      
      # repeat each label as many times as the number of its occurrences
      x_bins_expanded <- rep(x = bin_labels, times = n_times)
      
      # reorder the labels to match the values of the corresponding vector
      x_bins_expanded_reordered <- x_bins_expanded[order(seq_along(x)[order(x)])]
      
      # coerce the character vector with the labels of the bins to an ordered factor
      # ('levels' keeps the order of the bins explicit even if a bin happens to be empty)
      x_bins_factor <- factor(x_bins_expanded_reordered, levels = bin_labels, ordered = TRUE)
      
    })(ECONOMIC_DAMAGE, 0.9), 
    by = EVENT_TYPE
  ][
    ## Coerce the EVENT_TYPE variable to a factor
    , EVENT_TYPE := as.factor(EVENT_TYPE) 
  ]
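
The binning logic above can be summarized with a simplified, rank-based sketch for the special case of a single cut point. The helper `bin_low_high` is hypothetical and is not part of the report; it is assumed here to reproduce the two-level split (lowest 90% versus highest 10%) with ties broken by position, and the toy vector is for illustration only.

```r
# Hypothetical helper: label each value as belonging to the lowest share 'p'
# or the remaining highest share of its group, using ranks
# (ties are broken by the position of the value in the vector).
bin_low_high <- function(x, p = 0.9) {
  labels <- paste0("(", c(0, p) * 100, "% - ", c(p, 1) * 100, "%]")
  n_low <- floor(length(x) * p)
  position <- rank(x, ties.method = "first")
  factor(
    ifelse(position <= n_low, labels[1], labels[2]),
    levels = labels,
    ordered = TRUE
  )
}

# Ten hypothetical damage values for one event type.
damage <- c(10000, 8000, 2000, 15000, 5000, 3000, 10000, 450000, 150000, 3000)
bins <- bin_low_high(damage)
print(data.frame(damage, bins))
```

With ten values and a 90% cut point, only the single largest value falls in the upper bin, which also shows why the averages for the highest 10% rest on very few observations for the rarer event types.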

The table with the processed data for the harm on economy with respect to economic damage contains 4 variables:

  1. REFNUM (int) : an id that uniquely identifies each observation
  2. EVENT_TYPE (Factor w/ 37 levels) : the type of each weather event
  3. ECONOMIC_DAMAGE (num) : the economic damage
  4. BIN_GROUP_PER_EVENT_TYPE (Ord.factor w/ 2 levels) : a factor that divides the observations for each weather event type to two complementary levels, one with the 90% of observations with the lowest impact and another with the 10% of observations with the highest impact.

and 140236 observations.

## Classes 'data.table' and 'data.frame':   140236 obs. of  4 variables:
##  $ REFNUM                  : int  413607 413608 413609 413610 413611 413612 413613 413614 413615 413616 ...
##  $ EVENT_TYPE              : Factor w/ 37 levels "AVALANCHE","BLIZZARD",..: 29 29 29 29 29 29 29 30 29 29 ...
##  $ ECONOMIC_DAMAGE         : num  10000 8000 2000 15000 5000 3000 10000 450000 150000 3000 ...
##  $ BIN_GROUP_PER_EVENT_TYPE: Ord.factor w/ 2 levels "(0% - 90%]"<"(90% - 100%]": 1 1 1 1 1 1 1 1 2 1 ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "REFNUM"






9.3.3 Summarize the processed data for harm on economy with respect to economic damage by each weather event type

To evaluate the harm on economy by each weather event type with respect to economic damage a simplistic approach was adopted :

  • the weather event types were ranked from the most harmful to the least based on the overall average economic damage of the weather events that resulted in non-zero economic damage

The overall average economic damage caused by each weather event type was initially examined along with the skewness of the economic damage for each weather event type. In most cases the skewness was high (or even extremely high), so it was possible that the overall mean misrepresented the consequences of each weather event type.

That is the reason why the average economic damage for the 90% of weather events with the lowest impact and the average economic damage for the 10% of weather events with the highest impact were also computed and examined.

It is highlighted that for the average economic damage that refers to the 10% of the cases that had the highest impact, there were few observations available for a lot of weather event types and the corresponding mean values should be interpreted with caution.

# Create the table with the summary for the harm on economy 
# with respect to economic damage for each weather event type.
summary_____harm_on_economy______economic_damage <- 
  processed_data_____harm_on_economy_____economic_damage[
  ,
  list(
    ## The total number of observations by each weather event type.
    "N" = .N,
    ## The average economic damage caused by each weather event type.
    "AVRG" = round(mean(ECONOMIC_DAMAGE), 0),
    ## The skewness of economic damage for the observations by each weather event type.
    "SKEWNESS" = round(skewness(ECONOMIC_DAMAGE), 4),
    ## The number of observations for the 90% of cases with the lowest impact 
    ## by each weather event type.
    "N_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , .N],
    ## The average economic damage caused by each weather event type 
    ## for the 90% of cases with the lowest impact.
    "AVRG_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , round(mean(ECONOMIC_DAMAGE), 0)],
    ## The skewness of economic damage for the 90% of cases with the lowest impact 
    ## by each weather event type.
    "SKEWNESS_LOW" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(0% - 90%]" , round(skewness(ECONOMIC_DAMAGE), 4)],
    ## The number of observations for the 10% of cases with the highest impact 
    ## by each weather event type.
    "N_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , .N],
    ## The average economic damage caused by each weather event type 
    ## for the 10% of cases with the highest impact.
    "AVRG_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , round(mean(ECONOMIC_DAMAGE), 0)],
    ## The skewness of economic damage for the 10% of cases with the highest impact 
    ## by each weather event type.
    "SKEWNESS_HIGH" = .SD[BIN_GROUP_PER_EVENT_TYPE == "(90% - 100%]" , round(skewness(ECONOMIC_DAMAGE), 4)]
  ),
  by = "EVENT_TYPE"
  ][
    ## The descending average economic damage determines the order in which 
    ## the ranks are assigned, from the most harmful weather event type to the least.
    order(-AVRG),
    ## Create a variable with the rank of the harmfulness of each weather event type.
    RANK := 1:length(EVENT_TYPE)
    ][
      ,
      ## Reorder the variables at the table.
      list(
        RANK, EVENT_TYPE, N, AVRG, SKEWNESS, N_LOW, AVRG_LOW, SKEWNESS_LOW, N_HIGH, AVRG_HIGH, SKEWNESS_HIGH
      )
      ]
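
The ranking step can be illustrated with a base R analogue. This is a minimal sketch on hypothetical averages; it assumes that assigning through an ordering index, as the code above does with `order(-AVRG)`, attaches the ranks to the rows without rearranging them.

```r
# Hypothetical averages for four event types.
avg <- c(FLOOD = 300, HAIL = 50, TORNADO = 900, DROUGHT = 300)

# Assign rank 1 to the largest average; the elements keep their original order.
# Ties receive consecutive ranks in order of appearance.
rank_by_avg <- integer(length(avg))
rank_by_avg[order(-avg)] <- seq_along(avg)
names(rank_by_avg) <- names(avg)

print(rank_by_avg)
```

Because ties are resolved by order of appearance, two event types with the same rounded average still receive distinct ranks.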

The results of the table with the summary for the harm on economy with respect to economic damage by each weather event type that was created in this section were presented at the subsection 10.2.4 Most harmful event types with respect to economic damage of the chapter 10 RESULTS.

The table with the summary for the harm on economy with respect to economic damage by each weather event type was exported (as an R file), in the folder of the working directory:

  • outputs –> harm_on_economy –> results

with filename:

  • summary______harm_on_economy______economic_damage.R

The main reason for exporting the file with the summary for the harm on economy with respect to economic damage by each weather event type was to supply a checkpoint for any attempts to reproduce the analysis.
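
The function that was used for the export is not shown in this section. As one possible approach (an assumption, not necessarily the method of the report), an object can be written as R code with `dput()` and restored with `dget()`; the table and the temporary file below are hypothetical.

```r
# Hypothetical miniature summary table.
summary_toy <- data.frame(
  RANK = 1:2,
  EVENT_TYPE = c("TORNADO", "FLOOD"),
  AVRG = c(900, 300),
  stringsAsFactors = FALSE
)

# Write the object as R code to a temporary file, then read it back.
checkpoint_file <- tempfile(fileext = ".R")
dput(summary_toy, file = checkpoint_file)
restored <- dget(checkpoint_file)

# Compare the restored object with the original one.
print(all.equal(summary_toy, restored))
```

A reproduction attempt could compare its own summary table against such a restored checkpoint in the same way.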






9.3.4 Visualize the results of the summary for the harm on economy with respect to economic damage by each weather event type

From the table with the summary for the harm on economy by each weather event type with respect to economic damage the Multiplot 2.3 was created to present an overview of the results for the three different aspects that were examined for this perspective.

Four elementary plots were created:

  • 9.3.4.1.1 Create The Plot 2.3.1
    • Displays the overall average economic damage caused by each weather event type based on all the cases of weather events that resulted in non-zero economic damage.
  • 9.3.4.1.2 Create The Plot 2.3.2
    • Displays the average economic damage caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero economic damage.
  • 9.3.4.1.3 Create The Plot 2.3.3
    • Displays the average economic damage caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero economic damage.
  • 9.3.4.1.4 Create The Plot 2.3.4
    • Displays a comparison for each weather event type, of the average economic damage for the 90% of its observations with the lowest impact versus the average economic damage for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero economic damage.

which were then combined in order to obtain the Multiplot 2.3.

It constitutes the PART 3 of the Figure 2 that displays the overview of the harm on economy by each weather event type.

(Note that neither the Multiplot 2.3 nor the elementary plots that it contains were presented in this section due to the restrictions imposed by the assignment to include in the report at least 1 but no more than 3 figures. It can be examined at the subsection 10.2.1 Overview of results for the harm on economy of the chapter 10 RESULTS, where the Figure 2 was presented, of which the Multiplot 2.3 constitutes the PART 3.)




9.3.4.1 Create the components of Multiplot 2.3

Four elementary plots were created to visualize the results for the aspects that were examined for the harm on economy with respect to economic damage by each weather event type.

  • 9.3.4.1.1 Create The Plot 2.3.1
    • Displays the overall average economic damage caused by each weather event type based on all the cases of weather events that resulted in non-zero economic damage.
  • 9.3.4.1.2 Create The Plot 2.3.2
    • Displays the average economic damage caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero economic damage.
  • 9.3.4.1.3 Create The Plot 2.3.3
    • Displays the average economic damage caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero economic damage.
  • 9.3.4.1.4 Create The Plot 2.3.4
    • Displays a comparison for each weather event type, of the average economic damage for the 90% of its observations with the lowest impact versus the average economic damage for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero economic damage.




9.3.4.1.1 Create The Plot 2.3.1

The Plot 2.3.1 displays the overall average economic damage caused by each weather event type, taking into account only the observations that resulted in non-zero economic damage.

The weather event types were matched with a number that represents the rank which was assigned to each of them from the most harmful to the least with respect to economy, based on the overall average economic damage they caused.

The skewness of the economic damage for the observations of each weather event type (based on which the overall average economic damage was computed) was encoded in the color of the point and the line associated with each of them.

# Create the Elementary Plot 2.3.1 that displays 
# the overall average economic damage 
# by each weather event type for all cases. 
elementary_plot_2_3_1 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_economy______economic_damage,
    mapping = aes(
      x = AVRG,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to make them displayed alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE)
        )
      ) 
    )
  ) +
  ## Draw a square shaped point to the position that corresponds to 
  ## the average economic damage caused by each weather event type, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(color = SKEWNESS),
    shape = 15, 
    size = 4.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average economic damage.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG, 
      group = EVENT_TYPE, 
      color = SKEWNESS
    )
    ) +
  ## Draw, inside the square point that displays the average, 
  ## the number that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall average 
  ## economic damage it caused.
  geom_text(
    mapping = aes(
      label = RANK
    ), 
    size = 2.5
  ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average economic damage for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 2.3 will be composed from the four elementary plots. 
    limits = c(-5, 170), 
    midpoint = 70, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
  ) +
  ## Supply descriptive labels.  
  labs(
    title = "Plot 2.3.1", 
    subtitle = "Aspect: Overall",
    x = "Average Number of Economic Damage\n",
    y = "Weather Event Types \n"
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    )
  )
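
Since the same scale limits are reused for every elementary plot, it can be useful to confirm that the skewness values fall inside them: by default, ggplot2 continuous scales treat values outside `limits` as missing, so such values would lose their gradient color. The base R sketch below checks a subset of the overall skewness values from Table 9.3.1-1; the object names are hypothetical.

```r
# Overall skewness values taken from Table 9.3.1-1 (a subset, for brevity).
skewness_values <- c(
  "COLD/WIND CHILL"   = 1.2895,
  "FLOOD"             = 85.7213,
  "HAIL"              = 72.6945,
  "THUNDERSTORM WIND" = 166.2756
)

# Limits used for the shared color scale of the Multiplot 2.3.
scale_limits <- c(-5, 170)

# Values outside the limits (or missing) would not receive a gradient color.
outside <- is.na(skewness_values) |
  skewness_values < scale_limits[1] |
  skewness_values > scale_limits[2]

print(names(skewness_values)[outside])
```

The same check could be repeated for the SKEWNESS_LOW and SKEWNESS_HIGH columns of the summary table, which are not listed in this section.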




9.3.4.1.2 Create The Plot 2.3.2

The Elementary Plot 2.3.2 displays the average economic damage for the 90% of cases with the lowest impact caused by each weather event type, from all the observations that resulted in non-zero economic damage.

The weather event types were matched with a number that represents the rank which was assigned to each of them from the most harmful to the least with respect to economy, based on the overall average economic damage they caused.
(so it is NOT based on the average economic damage caused by the 90% of cases with the lowest impact of each weather event type).

The skewness of the economic damage for the observations of each weather event type (based on which the average economic damage for the 90% of cases with the lowest impact was computed) was encoded in the color of the point and the line associated with each of them.

# Create the Elementary Plot 2.3.2 that displays 
# the average economic damage by each weather event type 
# for the 90% of its cases with the lowest impact.
elementary_plot_2_3_2 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_economy______economic_damage,
    mapping = aes(
      x = AVRG_LOW,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to display them alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE)
        )
      ) 
    )
  ) +
  ## Draw a circle shaped point to the position that corresponds to 
  ## the average economic damage caused by each weather event type
  ## for the 90% of its cases with the lowest impact, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(
      color = SKEWNESS_LOW
    ), 
    size = 3.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average economic damage 
  ## for the 90% of its cases with the lowest impact.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG_LOW, 
      group = EVENT_TYPE, 
      color = SKEWNESS_LOW
    )
  ) +
  ## Draw, inside the circle point that displays the average, 
  ## the number that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall average 
  ## economic damage it caused.
  geom_text(
    mapping = aes(
      label = RANK
    ), 
    size = 2
    ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average economic damage for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 2.3 will be composed from the four elementary plots.
    limits = c(-5, 170), 
    midpoint = 70, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
    ) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 2.3.2",
    subtitle = "Aspect: 90% of cases with the lowest impact",
    x = paste0(
      "Average Number of Economic Damage for the 90% ", "\n",
      "of Observations with the Lowest Impact" 
    )
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    ),
    ### Remove the text, ticks and title of the y axis.
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.title.y = element_blank()
  )




9.3.4.1.3 Create The Plot 2.3.3

The Plot 2.3.3 displays the average economic damage caused by each weather event type for the 10% of its cases with the highest impact, computed from all the observations that resulted in non-zero economic damage.

The weather event types were matched with a number that represents the rank which was assigned to each of them, from the most harmful to the least with respect to economy, based on the overall average economic damage they caused (so it is NOT based on the average economic damage caused by the 10% of cases with the highest impact of each weather event type).
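The distinction made above can be illustrated with a minimal sketch in base R. The table summary_sketch and all of its values are hypothetical (they are not results of this analysis); only the column names AVRG, AVRG_HIGH and RANK follow the naming used in the code of this section. The rank is derived from the overall average, so the order it implies may differ from the order of the averages for the 10% of cases with the highest impact.

```r
# Hypothetical summary table (illustrative values only).
summary_sketch <- data.frame(
  EVENT_TYPE = c("TYPE A", "TYPE B", "TYPE C"),
  AVRG = c(2.5e6, 9.0e7, 4.0e5),       # overall average economic damage
  AVRG_HIGH = c(2.0e7, 3.0e8, 3.5e7)   # average for the 10% highest impact
)

# Rank 1 is assigned to the highest OVERALL average (most harmful).
summary_sketch$RANK <- rank(
  x = -summary_sketch$AVRG,
  ties.method = "min"
)

# For comparison only: the rank that AVRG_HIGH alone would have produced.
summary_sketch$RANK_BY_HIGH <- rank(
  x = -summary_sketch$AVRG_HIGH,
  ties.method = "min"
)

print(summary_sketch)
```

In this toy table TYPE A is ranked above TYPE C by its overall average, although TYPE C has the larger average among its most severe cases.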

The skewness of the economic damage for the observations of each weather event type (based on which the average economic damage for the 10% of cases with the highest impact was computed) was encoded in the color of the point and the line associated with each of them.

# Create the Elementary Plot 2.3.3 that displays 
# the average economic damage by each weather event type 
# for the 10% of its cases with the highest impact.
elementary_plot_2_3_3 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_economy______economic_damage,
    mapping = aes(
      x = AVRG_HIGH,
      ### Reverse the order of the factors for the EVENT_TYPE variable 
      ### to display them alphabetically from top to bottom.
      y = factor(
        x = EVENT_TYPE, 
        levels = rev(x = levels(x = EVENT_TYPE)
        )
      ) 
    )
  ) +
  ## Draw a diamond shaped point to the position that corresponds to 
  ## the average economic damage caused by each weather event type
  ## for the 10% of its cases with the highest impact, 
  ## of which the color indicates the skewness of observations 
  ## based on which each average was computed.
  geom_point(
    mapping = aes(
      color = SKEWNESS_HIGH
    ), 
    shape = 18, 
    size = 4.5
  ) +
  ## Draw a line that visually associates each weather event type 
  ## with its respective average economic damage 
  ## for the 10% of its cases with the highest impact.
  geom_linerange(
    mapping = aes(
      xmin = 0, 
      xmax = AVRG_HIGH, 
      group = EVENT_TYPE, 
      color = SKEWNESS_HIGH
    )
  ) +
  ## Draw a number that indicates the rank assigned to each weather event type 
  ## (from the most harmful to the least) based on the overall average
  ## economic damage it caused, inside the diamond shaped point 
  ## that displays the average.
  geom_text(
    mapping = aes(
      label = RANK
    ),
    size = 2
  ) +
  ## Adjust the scale for the color of each point.
  scale_color_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average economic damage for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 2.3 will be composed from the four elementary plots.
    limits = c(-5, 170), 
    midpoint = 70, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
  ) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 2.3.3",
    subtitle = "Aspect: 10% of cases with the highest impact",
    x = paste0(
      "Average Number of Economic Damage for the 10% ", "\n", 
      "of Observations with the Highest Impact" 
    )
  ) +
  ## Select a theme.
  theme_linedraw() + 
  ## Customize the selected theme.
  theme(
    ### Remove the legend.
    legend.position = "none",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    ),
    ### Remove the text, ticks and title of the y axis 
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.title.y = element_blank()
  )




9.3.4.1.4 Create The Plot 2.3.4

The Plot 2.3.4 displays a compact overview of all three aspects that were examined for the harm on economy with respect to economic damage.

For each weather event type, the comparison was visualized for the average economic damage for the 90% of cases with the lowest impact versus the average economic damage for the 10% of cases with the highest impact.

The weather event types were matched with a number that represents the rank which was assigned to each of them from the most harmful to the least with respect to economy, based on the overall average economic damage they caused.

The skewness of the economic damage for the observations of each weather event type (based on which the overall average economic damage was computed) was encoded in the fill color of the point and the label associated with each of them.

# Create the Elementary Plot 2.3.4 that displays 
# by each weather event type the comparison of 
# the average economic damage 
# for the 90% of cases with the lowest impact
# versus the average economic damage 
# for the 10% of cases with the highest impact.
elementary_plot_2_3_4 <-
  ## Supply the constant arguments for the aesthetics of all included geoms.
  ggplot(
    data = summary_____harm_on_economy______economic_damage,
    mapping = aes(
      x = AVRG_HIGH, 
      y = AVRG_LOW
    )
  ) +
  ## Draw a circle shaped point for each weather event type, 
  ## of which the fill indicates the overall skewness of its observations.
  geom_point(
    mapping = aes(
      fill = SKEWNESS
    ), 
    shape = 21
  ) +
  ## Draw a label with a number that indicates the rank assigned 
  ## to each weather event type (from the most harmful to the least) 
  ## based on the overall average economic damage it caused.
  geom_label_repel(
    mapping = aes(
      label = RANK, 
      fill = SKEWNESS
    ),
    size = 2.5
  ) +
  ## Adjust the scale for the fill of each label.
  scale_fill_gradient2(
    ### Choose such limits and midpoint for the colorbar of the legend
    ### that they can be used unchanged to correctly display 
    ### the skewness of the observations based on which 
    ### the average economic damage for all three aspects: 
    ###   1. overall
    ###   2. 90% of cases with the lowest impact 
    ###   3. 10% of cases with the highest impact
    ### was computed. 
    ### This will allow to include only one common legend when the 
    ### Multiplot 2.3 will be composed from the four elementary plots.
    limits = c(-5, 170), 
    midpoint = 70, 
    low = "lightgreen", 
    mid = "orange", 
    high = "purple"
  ) +
  ## Set proper limits to the plot.
  xlim(c(-0.5e9, 8.5e9)) +
  ylim(c(-1e7, 9.5e7)) +
  ## Supply descriptive labels. 
  labs(
    title = "Plot 2.3.4",
    subtitle = paste0(
      "Comparison of the average economic damage ", 
      "for the 90% of observations with the lowest impact ", 
      "versus the average economic damage ", 
      "for the 10% of observations with highest impact. "
    ),
    x = paste0(
      "Average Number of Economic Damage by each Weather Event Type ", 
      "for the 10% of its Observations with the Highest Impact"
    ),
    y = paste0(
      "Average Number of Economic Damage by each Weather Event Type ", "\n", 
      "for the 90% of its Observations with the Lowest Impact."
    ),
    ### Add a descriptive label for the legend.
    fill = paste0(
      "The color indicates the skewness ",
      "of economic damage for each weather event type ",
      "(the color scale is common to all four plots of PART 3). "
    )
  ) +
  ## Select a theme.
  theme_linedraw() +
  ## Customize the selected theme.
  theme(
    ### Adjust the legend.
    legend.position = "bottom",
    legend.direction = "horizontal",
    ### Adjust the title.
    plot.title = element_text(
      size = 12,
      face = "bold"
    ),
    ### Adjust the subtitle.
    plot.subtitle = element_text(
      size = 10
    )
  )






9.3.4.2 Compose the Multiplot 2.3

The four elementary plots that were created from the results of the summary for the harm on economy with respect to economic damage by each weather event type were combined into a single multiplot that displays the complete picture for this perspective.

# Create a multiplot that displays the overview of the summary 
# for the harm on economy with respect to economic damage
# by each weather event type.
multiplot_2_3 <- arrangeGrob(
  grobs = list(
      
    # Title
    textGrob(
      label = paste0(
        "\n",
        "PART 3: Harm on economy by each weather event type ", 
        "with respect to economic damage ", "\n", 
        "based on the cases of weather events ", 
        "that resulted in non-zero economic damage.", "\n", 
        "\n"
      ),
      gp = gpar(
        fontsize = 16, 
        fontface = "bold"
      )
    ),
    
    # Subtitle
    textGrob(
      label = paste0(
          "\n", 
          "The results include only the weather event types, ", 
          "for which at least 10 observations ", 
          "that resulted in non-zero economic damage were available. ", "\n",
          "The number associated with each weather event type ", 
          "represents the rank (from the most harmful to the least) ", 
          "which was assigned based on the overall average economic damage.", "\n",
          "Because for most of the weather event types ", 
          "high positive skewness was observed for the economic damage, ",
          "the average of the 90% of cases with lowest impact ", "\n",
          "and the 10% of cases with highest impact were reported ", 
          "to provide a more representative picture of their consequences.","\n",
          "\n"
      ),
      gp = gpar(
        fontsize = 14, 
        fontface = "bold"
      )
    ),
    
    # Plot 2.3.1
    # Elementary plot for the average economic damage 
    # by each weather event type for all cases.
    elementary_plot_2_3_1,
    
    # Plot 2.3.2
    # Elementary plot for the average economic damage 
    # by each weather event type for 90% of cases with the lowest impact.
    elementary_plot_2_3_2,
    
    # Plot 2.3.3
    # Elementary plot for the average economic damage 
    # by each weather event type for 10% of cases with the highest impact.
    elementary_plot_2_3_3,
    
    # Plot 2.3.4
    # Elementary Plot 2.3.4 for the comparison of 
    # the average economic damage 
    # for the 90% of cases with the lowest impact versus 
    # the 10% of cases with the highest impact.
    elementary_plot_2_3_4
  ),
  # Set the layout for the title, the subtitle and the four elementary plots.
  layout_matrix = 
    matrix(
      c(1,1,1,1,1,1,1,1,1,
        2,2,2,2,2,2,2,2,2,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        3,3,3,3,3,4,4,5,5,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6,
        NA,6,6,6,6,6,6,6,6
      ),
      byrow = TRUE, 
      nrow = 13
    )
)

(Note that the Multiplot 2.3 is NOT presented in this section, because the assignment requires the report to include at least 1 but no more than 3 figures. It can be examined in the subsection 10.2.1 Overview of results for the harm on economy of the chapter 10 RESULTS, where the Figure 2 is presented, of which the Multiplot 2.3 constitutes the PART 3.)
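To make the layout of the Multiplot 2.3 easier to verify, the sketch below (base R only) rebuilds the same 13 by 9 layout matrix and reports, for each grob index, the block of rows and columns it occupies. The helper describe_layout is hypothetical and is not part of the analysis; the indices 1 to 6 are assumed to follow the order of the grobs list (title, subtitle and the four elementary plots).

```r
# Rebuild the layout matrix of the Multiplot 2.3 (13 rows by 9 columns).
layout_matrix <- matrix(
  c(
    rep(x = 1, times = 9),
    rep(x = 2, times = 9),
    rep(x = c(3, 3, 3, 3, 3, 4, 4, 5, 5), times = 5),
    rep(x = c(NA, 6, 6, 6, 6, 6, 6, 6, 6), times = 6)
  ),
  byrow = TRUE,
  nrow = 13
)

# Hypothetical helper: report the rows and columns covered by each grob.
describe_layout <- function(layout) {
  # sort() drops the NA cells, which stay empty in the multiplot.
  grob_ids <- sort(unique(as.vector(layout)))
  do.call(rbind, lapply(grob_ids, function(id) {
    cells <- which(layout == id, arr.ind = TRUE)
    data.frame(
      GROB = id,
      ROWS = paste(range(cells[, "row"]), collapse = "-"),
      COLS = paste(range(cells[, "col"]), collapse = "-")
    )
  }))
}

layout_extent <- describe_layout(layout = layout_matrix)
print(layout_extent)
```

The output shows that the title and the subtitle each span one full row, the Plots 2.3.1 to 2.3.3 share rows 3 to 7, and the Plot 2.3.4 occupies rows 8 to 13 with the first column left empty.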















10 RESULTS


The unprocessed raw data from the file repdata_data_StormData.csv.bz2, which contains observations from the Storm Events Dataset that was created and made publicly available by the U.S. National Oceanic and Atmospheric Administration (NOAA), was processed to obtain the table with the processed data (through a processing pipeline which is described in detail in the chapter 6 DATA PROCESSING).

The table with the processed data contains valid observations for weather events that happened in the United States in the period from 2001 to 2011 and caused harm either to population health (resulted in fatalities or injuries) or to economy (resulted in property or crop damage). Based on this table, the results of the analysis were produced for the two questions of interest set by the assignment (for which the guidelines can be found in the section 2.1 About The Assignment). They are presented in the sections of this chapter that follow.




10.1 Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

In an attempt to identify the most harmful weather event types with respect to population health three different perspectives were examined (for which the analysis can be examined at the chapter 8 HARM ON POPULATION HEALTH).

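A short overview of the results is presented in the subsection 10.1.1 Overview of results for the harm on population health.

Further details on each of the three perspectives are available in the following subsections:

  • 10.1.2 Most harmful event types with respect to fatalities
  • 10.1.3 Most harmful event types with respect to injuries
  • 10.1.4 Most harmful event types with respect to casualties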

It is highlighted that the results must be evaluated under the following context in order to be meaningful:

  • The results for any perspective (fatalities, injuries or casualties) refer specifically to the harm that was caused when harm with respect to that particular perspective was observed.

(In other words, the results do not describe the harm to expect whenever a weather event of an included type occurs, regardless of whether or not it caused harm with respect to the perspective that was examined.)

In addition, only the weather event types with at least 10 observations of non-zero harm with respect to a given perspective were included, so the set of weather event types differs across the three perspectives.
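The two points above can be made concrete with a minimal sketch in base R. The table events and all of its values are hypothetical (they are not observations from the dataset); the threshold of 10 observations is the one stated above. The sketch contrasts the average that is conditional on harm having been observed, which is the kind reported in this chapter, with the unconditional average over all events of a type.

```r
# Hypothetical observations (illustrative values only).
events <- data.frame(
  EVENT_TYPE = c(rep("TYPE A", 12), rep("TYPE B", 12)),
  INJURIES = c(0, rep(3, 11), rep(0, 8), rep(20, 4))
)

# Keep only the events that caused harm for the perspective examined.
harmful <- events[events$INJURIES > 0, ]

# Include only the event types with at least 10 such observations.
n_harmful <- table(harmful$EVENT_TYPE)
included <- names(n_harmful)[n_harmful >= 10]

# Conditional average: harm per event GIVEN that harm was observed.
conditional_average <- sapply(
  X = split(x = harmful$INJURIES, f = harmful$EVENT_TYPE),
  FUN = mean
)[included]

# Unconditional average: harm per event over ALL events (not reported here).
unconditional_average <- sapply(
  X = split(x = events$INJURIES, f = events$EVENT_TYPE),
  FUN = mean
)

print(conditional_average)
print(unconditional_average)
```

In this toy table TYPE B is excluded because only 4 of its events caused injuries, even though its unconditional average is the higher of the two.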

For each perspective, it was considered appropriate to present three aspects in order to supply an insightful picture of the consequences caused by each weather event type:

  • the overall average harm
  • the average harm of 90% of cases with lower impact
  • the average harm of 10% of cases with higher impact

The number of observations as well as their skewness were summarized by each weather event type for every aspect and presented along with the corresponding average.

Although the overall average harm was used as the primary criterion to determine the most harmful events, it should be examined along with the average harm for the two other subgroups, especially when the overall skewness for a weather event type of interest is high.
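The three aspects can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code of the analysis: the helper names skewness_g1 and summarise_aspects are hypothetical, the split rule (the ceiling(n / 10) largest observations form the high impact group) is an assumption that is consistent with the group sizes reported in this chapter (for example 339 observations split into 305 and 34), and the moment-based skewness formula is likewise an assumption. The input values are made up.

```r
# Hypothetical helper: moment-based sample skewness, m3 / m2^(3/2).
skewness_g1 <- function(x) {
  deviations <- x - mean(x)
  m2 <- mean(deviations^2)
  if (m2 == 0) {
    return(NA_real_)  # undefined when all values are equal
  }
  mean(deviations^3) / m2^1.5
}

# Hypothetical helper: the three aspects for one weather event type.
summarise_aspects <- function(harm) {
  # Only the observations with non-zero harm are considered.
  harm <- sort(harm[harm > 0])
  n <- length(harm)
  # Assumed split rule: the ceiling(n / 10) largest values form the top group.
  n_high <- ceiling(n / 10)
  low <- head(x = harm, n = n - n_high)
  high <- tail(x = harm, n = n_high)
  data.frame(
    ASPECT = c("overall", "90% lowest impact", "10% highest impact"),
    N = c(n, length(low), length(high)),
    AVRG = c(mean(harm), mean(low), mean(high)),
    SKEWNESS = c(skewness_g1(harm), skewness_g1(low), skewness_g1(high))
  )
}

# Made-up fatality counts for a single weather event type.
fatalities <- c(1, 1, 1, 1, 2, 2, 3, 1, 1, 2, 24)
aspects <- summarise_aspects(harm = fatalities)
print(aspects)
```

With a single extreme observation the overall average is pulled well above the average of the 90% of cases with the lowest impact, which is why the overall average is best read together with the other two aspects when the overall skewness is high.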




10.1.1 Overview of results for the harm on population health

In order to display an overview of the results for the harm on population health by each weather event type, the Figure 1 was created.

The Figure 1 consists of three parts, one for each of the three perspectives examined:

  • PART 1
    • Contains the Multiplot 1.1 which was constructed at the subsection 8.1.4 Visualize the results of the summary for the harm on population health with respect to fatalities by each weather event type and displays the results for the harm on population health with respect to fatalities by each weather event type for all the aspects that were examined. It consists of four plots:
      • Plot 1.1.1
        • Displays the overall average number of fatalities caused by each weather event type based on all the cases of weather events that resulted in non-zero fatalities.
      • Plot 1.1.2
        • Displays the average number of fatalities caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero fatalities.
      • Plot 1.1.3
        • Displays the average number of fatalities caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero fatalities.
      • Plot 1.1.4
        • Displays a comparison for each weather event type, of the average number of fatalities for the 90% of its observations with the lowest impact versus the average number of fatalities for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero fatalities.
  • PART 2
    • Contains the Multiplot 1.2 which was constructed at the subsection 8.2.4 Visualize the results of the summary for the harm on population health with respect to injuries by each weather event type and displays the results for the harm on population health with respect to injuries by each weather event type for all the aspects that were examined. It consists of four plots:
      • Plot 1.2.1
        • Displays the overall average number of injuries caused by each weather event type based on all the cases of weather events that resulted in non-zero injuries.
      • Plot 1.2.2
        • Displays the average number of injuries caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero injuries.
      • Plot 1.2.3
        • Displays the average number of injuries caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero injuries.
      • Plot 1.2.4
        • Displays a comparison for each weather event type, of the average number of injuries for the 90% of its observations with the lowest impact versus the average number of injuries for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero injuries.
  • PART 3
    • Contains the Multiplot 1.3 which was constructed at the subsection 8.3.4 Visualize the results of the summary for the harm on population health with respect to casualties by each weather event type and displays the results for the harm on population health with respect to casualties by each weather event type for all the aspects that were examined. It consists of four plots:
      • Plot 1.3.1
        • Displays the overall average number of casualties caused by each weather event type based on all the cases of weather events that resulted in non-zero casualties.
      • Plot 1.3.2
        • Displays the average number of casualties caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero casualties.
      • Plot 1.3.3
        • Displays the average number of casualties caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero casualties.
      • Plot 1.3.4
        • Displays a comparison for each weather event type, of the average number of casualties for the 90% of its observations with the lowest impact versus the average number of casualties for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero casualties.

The Figure 1 was exported (as a png file), in the folder of the working directory:

  • outputs –> harm_on_population_health –> figures

with filename:

  • figure_1.png
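A minimal sketch of such an export in base R is given below. It is not the export code of the analysis: a temporary directory stands in for the working directory, a placeholder text grob stands in for the composed Figure 1, and the pixel dimensions are arbitrary assumptions. Only the folder structure and the filename follow the description above.

```r
# Assumption: a temporary directory stands in for the working directory.
working_directory <- tempdir()

# Build the folder outputs/harm_on_population_health/figures.
figure_folder <- file.path(
  working_directory, "outputs", "harm_on_population_health", "figures"
)
dir.create(path = figure_folder, recursive = TRUE, showWarnings = FALSE)
figure_path <- file.path(figure_folder, "figure_1.png")

# Placeholder grob; in the report this would be the composed Figure 1.
figure_1_placeholder <- grid::textGrob(label = "Figure 1 (placeholder)")

# Export only if this R build supports the png device.
if (capabilities("png")) {
  png(filename = figure_path, width = 1600, height = 1200)
  grid::grid.draw(figure_1_placeholder)
  dev.off()
}
```

Using file.path keeps the path portable across operating systems, and dir.create with recursive = TRUE creates the nested folders in one call.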






10.1.2 Most harmful event types with respect to fatalities

According to the summary for the harm on population health with respect to fatalities by each weather event type (which was obtained in the section 8.1 Harm On Population Health With Respect To Fatalities By Each Weather Event Type), out of the 26 included weather event types (for each of which at least 10 observations that resulted in non-zero fatalities in the United States in the period from 2001 to 2011 were available) 7 stand out:

  1. When a weather event of type TORNADO resulted in fatalities, it caused about 3.4 fatalities on average (based on 339 observations that had extreme positive skewness equal to 13.5732). For 9 out of 10 times of such cases, an average of 1.88 fatalities was observed (based on the 90% of cases with the lower impact for which 305 observations were available, that had moderate positive skewness equal to 1.812), while for the remaining 1 out of 10 times it caused around 17 fatalities on average (based on the 10% of cases with the higher impact for which 34 observations were available, that had high positive skewness equal to 4.9099).
  2. When a weather event of type DEBRIS FLOW resulted in fatalities, it caused about 3.36 fatalities on average (based on 11 observations that had moderate positive skewness equal to 1.6608). For 9 out of 10 times of such cases, an average of 1.44 fatalities was observed (based on the 90% of cases with the lower impact for which only 9 observations were available, that had moderate positive skewness equal to 2.0673), while for the remaining 1 out of 10 times it caused around 12 fatalities on average (based on the 10% of cases with the higher impact for which only 2 observations were available, that had a skewness equal to 0).
  3. When a weather event of type HURRICANE/TYPHOON resulted in fatalities, it caused about 2.96 fatalities on average (based on 23 observations that had moderate positive skewness equal to 2.1981). For 9 out of 10 times of such cases, an average of 1.95 fatalities was observed (based on the 90% of cases with the lower impact for which 20 observations were available, that had moderate positive skewness equal to 1.6605), while for the remaining 1 out of 10 times it caused around 9.67 fatalities on average (based on the 10% of cases with the higher impact for which only 3 observations were available, that had low positive skewness equal to 0.7071).
  4. When a weather event of type EXCESSIVE HEAT resulted in fatalities, it caused about 2.89 fatalities on average (based on 296 observations that had high positive skewness equal to 5.4405). For 9 out of 10 times of such cases, an average of 1.51 fatalities was observed (based on the 90% of cases with the lower impact for which 266 observations were available, that had moderate positive skewness equal to 1.9625), while for the remaining 1 out of 10 times it caused around 15.17 fatalities on average (based on the 10% of cases with the higher impact for which 30 observations were available, that had moderate positive skewness equal to 1.6149).
  5. When a weather event of type WILDFIRE resulted in fatalities, it caused about 2.61 fatalities on average (based on 31 observations that had moderate positive skewness equal to 2.629). For 9 out of 10 times of such cases, an average of 1.59 fatalities was observed (based on the 90% of cases with the lower impact for which 27 observations were available, that had moderate positive skewness equal to 1.2688), while for the remaining 1 out of 10 times it caused around 9.5 fatalities on average (based on the 10% of cases with the higher impact for which only 4 observations were available, that had low negative skewness equal to -0.278).
  6. When a weather event of type TROPICAL STORM resulted in fatalities, it caused about 2.5 fatalities on average (based on 20 observations that had high positive skewness equal to 3.8434). For 9 out of 10 times of such cases, an average of 1.33 fatalities was observed (based on the 90% of cases with the lower impact for which 18 observations were available, that had moderate positive skewness equal to 2.3814), while for the remaining 1 out of 10 times it caused around 13 fatalities on average (based on the 10% of cases with the higher impact for which only 2 observations were available, that had a skewness equal to 0).
  7. When a weather event of type HEAT resulted in fatalities, it caused about 1.81 fatalities on average (based on 127 observations that had high positive skewness equal to 4.1476). For 9 out of 10 times of such cases, an average of 1.26 fatalities was observed (based on the 90% of cases with the lower impact for which 114 observations were available, that had moderate positive skewness equal to 1.912), while for the remaining 1 out of 10 times it caused around 6.62 fatalities on average (based on the 10% of cases with the higher impact for which 13 observations were available, that had moderate positive skewness equal to 1.4602).






10.1.3 Most harmful event types with respect to injuries

According to the summary for the harm on population health with respect to injuries by each weather event type (which was obtained in the section 8.2 Harm On Population Health With Respect To Injuries By Each Weather Event Type), out of the 27 included weather event types (for each of which at least 10 observations that resulted in non-zero injuries in the United States in the period from 2001 to 2011 were available) 3 stand out.

Specifically:

  1. When a weather event of type HURRICANE/TYPHOON resulted in injuries, it caused about 86.07 injuries on average (based on 15 observations that had moderate positive skewness equal to 2.773). For 9 out of 10 times of such cases, an average of 15 injuries was observed (based on the 90% of cases with the lower impact for which 13 observations were available, that had moderate positive skewness equal to 2.8806), while for the remaining 1 out of 10 times it caused around 548 injuries on average (based on the 10% of cases with the higher impact for which only 2 observations were available, that had a skewness equal to 0).
  2. When a weather event of type EXCESSIVE HEAT resulted in injuries, it caused about 37.7 injuries on average (based on 86 observations that had high positive skewness equal to 4.1751). For 9 out of 10 times of such cases, an average of 16.48 injuries was observed (based on the 90% of cases with the lower impact for which 77 observations were available, that had moderate positive skewness equal to 1.2674), while for the remaining 1 out of 10 times it caused around 219.22 injuries on average (based on the 10% of cases with the higher impact for which only 9 observations were available, that had low positive skewness equal to 0.7763).
  3. When a weather event of type HEAT resulted in injuries, it caused about 33.94 injuries on average (based on 36 observations that had moderate positive skewness equal to 2.1619). For 9 out of 10 times of such cases, an average of 13.56 injuries was observed (based on the 90% of cases with the lower impact for which 32 observations were available, that had moderate positive skewness equal to 2.4589), while for the remaining 1 out of 10 times it caused around 197 injuries on average (based on the 10% of cases with the higher impact for which only 4 observations were available, that had moderate negative skewness equal to -1.0869).






10.1.4 Most harmful event types with respect to casualties

According to the summary for the harm on population health with respect to casualties by each weather event type (which was obtained in the section 8.3 Harm On Population Health With Respect To Casualties By Each Weather Event Type), out of the 30 included weather event types (for each of which at least 10 observations that resulted in non-zero casualties in the United States in the period from 2001 to 2011 were available) 7 stand out.

Specifically:

  1. When a weather event of type HURRICANE/TYPHOON resulted in casualties, it caused about 41.18 casualties on average (based on 33 observations that had high positive skewness equal to 4.4573). For 9 out of 10 times of such cases, an average of 3.93 casualties was observed (based on the 90% of cases with the lower impact for which 29 observations were available, that had moderate positive skewness equal to 2.1573), while for the remaining 1 out of 10 times it caused around 311.25 casualties on average (based on the 10% of cases with the higher impact for which only 4 observations were available, that had low positive skewness equal to 0.7473).
  2. When a weather event of type EXCESSIVE HEAT resulted in casualties, it caused about 11.71 casualties on average (based on 350 observations with extreme positive skewness of 8.3298). In 9 out of 10 such cases, an average of 2.85 casualties was observed (based on the 90% of cases with the lower impact: 315 observations with moderate positive skewness of 2.7042), while in the remaining 1 out of 10 cases it caused around 91.43 casualties on average (based on the 10% of cases with the higher impact: 35 observations with moderate positive skewness of 2.7186).
  3. When a weather event of type TORNADO resulted in casualties, it caused about 11.67 casualties on average (based on 1327 observations with extreme positive skewness of 17.6038). In 9 out of 10 such cases, an average of 4.29 casualties was observed (based on the 90% of cases with the lower impact: 1194 observations with moderate positive skewness of 1.936), while in the remaining 1 out of 10 cases it caused around 77.9 casualties on average (based on the 10% of cases with the higher impact: 133 observations with high positive skewness of 6.2215).
  4. When a weather event of type DUST STORM resulted in casualties, it caused about 9.7 casualties on average (based on 23 observations with moderate positive skewness of 1.5025). In 9 out of 10 such cases, an average of 6.35 casualties was observed (based on the 90% of cases with the lower impact: 20 observations with moderate positive skewness of 1.2737), while in the remaining 1 out of 10 cases it caused around 32 casualties on average (based on the 10% of cases with the higher impact: only 3 observations with low positive skewness of 0.4703).
  5. When a weather event of type HEAT resulted in casualties, it caused about 9.43 casualties on average (based on 154 observations with high positive skewness of 5.2894). In 9 out of 10 such cases, an average of 1.7 casualties was observed (based on the 90% of cases with the lower impact: 138 observations with moderate positive skewness of 2.459), while in the remaining 1 out of 10 cases it caused around 76.12 casualties on average (based on the 10% of cases with the higher impact: 16 observations with low positive skewness of 0.9965).
  6. When a weather event of type TROPICAL STORM resulted in casualties, it caused about 9.32 casualties on average (based on 34 observations with high positive skewness of 5.3288). In 9 out of 10 such cases, an average of 1.9 casualties was observed (based on the 90% of cases with the lower impact: 30 observations with moderate positive skewness of 1.4887), while in the remaining 1 out of 10 cases it caused around 65 casualties on average (based on the 10% of cases with the higher impact: only 4 observations with moderate positive skewness of 1.1226).
  7. When a weather event of type DENSE FOG resulted in casualties, it caused about 7.6 casualties on average (based on 20 observations with moderate positive skewness of 1.3831). In 9 out of 10 such cases, an average of 5.83 casualties was observed (based on the 90% of cases with the lower impact: 18 observations with low positive skewness of 0.5675), while in the remaining 1 out of 10 cases it caused around 23.5 casualties on average (based on the 10% of cases with the higher impact: only 2 observations, whose skewness is 0, as it necessarily is for 2 distinct values).








10.2 Question 2 : Across the United States, which types of events have the greatest economic consequences?

In an attempt to identify the most harmful weather event types with respect to the economy, three different perspectives were examined (the analysis is presented in chapter 9 HARM ON ECONOMY).

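A short overview of the results is presented in subsection 10.2.1 Overview of results for the harm on economy.

Further details on each of the three perspectives are available in the following subsections:

  • 10.2.2 Most harmful event types with respect to property damage
  • 10.2.3 Most harmful event types with respect to crop damage
  • 10.2.4 Most harmful event types with respect to economic damage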

Note that the results are meaningful only when read in the following context:

  • The results for a perspective (property damage, crop damage or economic damage) refer specifically to the harm that was caused when harm with respect to that perspective was observed.

(In other words, the results are conditional on harm having occurred: they do not describe the harm to be expected whenever an event of an included type occurs, regardless of whether it caused harm with respect to the perspective examined.)

In addition, only the weather event types with at least 10 observations of non-zero harm with respect to the perspective examined were included, so the set of weather event types differs between the three perspectives.

For each perspective, it was considered appropriate to present three aspects in order to give an insightful picture of the consequences caused by each weather event type:

  • the overall average harm
  • the average harm of 90% of cases with lower impact
  • the average harm of 10% of cases with higher impact

The number of observations and their skewness were summarized for each weather event type and each aspect, and are presented along with the corresponding average.

Although the overall average harm was used as the primary criterion to determine the most harmful events, it should be examined along with the average harm for the two other subgroups, especially when the overall skewness for a weather event type of interest is high.
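The three aspects described above can be sketched in base R as follows. This is a minimal illustration, not the script behind the report: the helper names (`skewness`, `summarise_harm`), the toy data and its columns (`type`, `harm`) are hypothetical, and the rule used to split the observations (the `floor(0.9 * n)` lowest values form the lower group) is an assumption. The skewness helper computes the third standardised moment, which is believed to be what `moments::skewness()` returns.

```r
# Population skewness (third standardised moment), written out in base R
# so that the sketch has no package dependencies.
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Summarise one event type: overall mean, mean of the lower 90% and mean of
# the upper 10% of its non-zero observations (the split rule is an assumption).
summarise_harm <- function(harm) {
  harm    <- sort(harm[harm > 0])        # keep only cases with non-zero harm
  n       <- length(harm)
  n_lower <- floor(0.9 * n)              # size of the lower-impact group
  lower   <- harm[seq_len(n_lower)]
  upper   <- harm[(n_lower + 1):n]
  data.frame(
    n = n, mean_all = mean(harm), skew_all = skewness(harm),
    n_lower = n_lower, mean_lower = mean(lower), skew_lower = skewness(lower),
    n_upper = n - n_lower, mean_upper = mean(upper), skew_upper = skewness(upper)
  )
}

# Hypothetical toy data: type "A" has 20 harmful events, type "B" only 5.
events <- data.frame(
  type = c(rep("A", 22), rep("B", 5)),
  harm = c(0, 0, 1:18, 50, 100, 2, 4, 6, 8, 10)
)

harmful  <- events[events$harm > 0, ]
counts   <- table(harmful$type)
included <- names(counts)[counts >= 10]  # at least 10 non-zero observations

result <- do.call(rbind, lapply(included, function(tp) {
  cbind(type = tp, summarise_harm(harmful$harm[harmful$type == tp]))
}))
print(result)
```

With this split rule, 107 observations divide into 96 and 11, and 131 into 117 and 14, which agrees with the group sizes reported in this chapter; the rule actually used in the analysis may nevertheless differ. In the toy data the upper group of type "A" has only 2 observations, so its skewness is 0.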




10.2.1 Overview of results for the harm on economy

Figure 2 was created to display an overview of the results for the harm on economy by each weather event type.

Figure 2 consists of three parts, one for each of the three perspectives examined:

  • PART 1
    • Contains Multiplot 2.1, which was constructed in section 9.1 Harm On Economy With Respect To Property Damage By Each Weather Event Type and displays the results for the harm on economy with respect to property damage by each weather event type for all the aspects that were examined. It consists of four plots:
      • Plot 2.1.1
        • Displays the overall average property damage caused by each weather event type based on all the cases of weather events that resulted in non-zero property damage.
      • Plot 2.1.2
        • Displays the average property damage caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero property damage.
      • Plot 2.1.3
        • Displays the average property damage caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero property damage.
      • Plot 2.1.4
        • Displays a comparison for each weather event type, of the average property damage for the 90% of its observations with the lowest impact versus the average property damage for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero property damage.
  • PART 2
    • Contains Multiplot 2.2, which was constructed in subsection 9.2.4 Visualize the results of the summary for the harm on economy with respect to crop damage by each weather event type and displays the results for the harm on economy with respect to crop damage by each weather event type for all the aspects that were examined. It consists of four plots:
      • Plot 2.2.1
        • Displays the overall average crop damage caused by each weather event type based on all the cases of weather events that resulted in non-zero crop damage.
      • Plot 2.2.2
        • Displays the average crop damage caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero crop damage.
      • Plot 2.2.3
        • Displays the average crop damage caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero crop damage.
      • Plot 2.2.4
        • Displays a comparison for each weather event type, of the average crop damage for the 90% of its observations with the lowest impact versus the average crop damage for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero crop damage.
  • PART 3
    • Contains Multiplot 2.3, which was constructed in section 9.3 Harm On Economy With Respect To Economic Damage By Each Weather Event Type and displays the results for the harm on economy with respect to economic damage by each weather event type for all the aspects that were examined. It consists of four plots:
      • Plot 2.3.1
        • Displays the overall average economic damage caused by each weather event type based on all the cases of weather events that resulted in non-zero economic damage.
      • Plot 2.3.2
        • Displays the average economic damage caused by each weather event type based on 90% of weather events with the lowest impact (for each weather event type) that resulted in non-zero economic damage.
      • Plot 2.3.3
        • Displays the average economic damage caused by each weather event type based on 10% of weather events with the highest impact (for each weather event type) that resulted in non-zero economic damage.
      • Plot 2.3.4
        • Displays a comparison for each weather event type, of the average economic damage for the 90% of its observations with the lowest impact versus the average economic damage for the 10% of its observations with the highest impact based only on the weather events that resulted in non-zero economic damage.

Figure 2 was exported as a PNG file to the following folder of the working directory:

  • outputs –> harm_on_economy –> figures

with filename:

  • figure_2.png






10.2.2 Most harmful event types with respect to property damage

According to the summary of the harm on economy with respect to property damage by each weather event type (obtained in section 9.1 Harm On Economy With Respect To Property Damage By Each Weather Event Type), 2 of the 37 included weather event types stand out (each included type had at least 10 observations of non-zero property damage in the United States in the period from 2001 to 2011):

  1. When a weather event of type HURRICANE/TYPHOON resulted in property damage, it caused about $676,106,028 of property damage on average (based on 107 observations with high positive skewness of 4.9333). In 9 out of 10 such cases, an average of $81,701,511 of property damage was observed (based on the 90% of cases with the lower impact: 96 observations with high positive skewness of 3.4556), while in the remaining 1 out of 10 cases it caused around $5,863,636,364 of property damage on average (based on the 10% of cases with the higher impact: 11 observations with moderate positive skewness of 1.5154).
  2. When a weather event of type STORM SURGE/TIDE resulted in property damage, it caused about $364,969,183 of property damage on average (based on 131 observations with extreme positive skewness of 9.6344). In 9 out of 10 such cases, an average of $749,256 of property damage was observed (based on the 90% of cases with the lower impact: 117 observations with moderate positive skewness of 2.9093), while in the remaining 1 out of 10 cases it caused around $3,408,807,143 of property damage on average (based on the 10% of cases with the higher impact: 14 observations with moderate positive skewness of 2.7389).
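The three averages reported for an event type are linked: the overall average is the observation-weighted average of the two subgroup averages. The short check below uses the HURRICANE/TYPHOON figures listed above; the variable names are illustrative only, and because the reported averages are rounded to whole dollars the recomputed value can differ by a fraction of a dollar.

```r
# Figures reported above for HURRICANE/TYPHOON property damage (in dollars).
n_lower    <- 96
mean_lower <- 81701511
n_upper    <- 11
mean_upper <- 5863636364
mean_all   <- 676106028

# Total damage in each group, rebuilt from group size and group mean.
total_lower <- n_lower * mean_lower
total_upper <- n_upper * mean_upper

# The overall mean is the observation-weighted mean of the two group means.
recomputed <- (total_lower + total_upper) / (n_lower + n_upper)
print(recomputed)

# Share of the total reported damage that comes from the top 10% of cases.
share_upper <- total_upper / (total_lower + total_upper)
print(round(share_upper, 3))
```

The share printed at the end is roughly 0.89, which shows why the overall average should be read together with the two subgroup averages when the skewness is high.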






10.2.3 Most harmful event types with respect to crop damage

According to the summary of the harm on economy with respect to crop damage by each weather event type (obtained in section 9.2 Harm On Economy With Respect To Crop Damage By Each Weather Event Type), 2 of the 16 included weather event types stand out (each included type had at least 10 observations of non-zero crop damage in the United States in the period from 2001 to 2011):

  1. When a weather event of type HURRICANE/TYPHOON resulted in crop damage, it caused about $63,684,017 of crop damage on average (based on 48 observations with high positive skewness of 5.6962). In 9 out of 10 such cases, an average of $13,275,181 of crop damage was observed (based on the 90% of cases with the lower impact: 43 observations with moderate positive skewness of 2.4986), while in the remaining 1 out of 10 cases it caused around $497,200,000 of crop damage on average (based on the 10% of cases with the higher impact: only 5 observations with moderate positive skewness of 1.3378).
  2. When a weather event of type DROUGHT resulted in crop damage, it caused about $42,389,146 of crop damage on average (based on 158 observations with high positive skewness of 4.9333). In 9 out of 10 such cases, an average of $11,981,373 of crop damage was observed (based on the 90% of cases with the lower impact: 142 observations with moderate positive skewness of 2.3645), while in the remaining 1 out of 10 cases it caused around $312,258,125 of crop damage on average (based on the 10% of cases with the higher impact: 16 observations with moderate positive skewness of 1.8881).






10.2.4 Most harmful event types with respect to economic damage

According to the summary of the harm on economy with respect to economic damage by each weather event type (obtained in section 9.3 Harm On Economy With Respect To Economic Damage By Each Weather Event Type), 2 of the 16 included weather event types stand out (each included type had at least 10 observations of non-zero economic damage in the United States in the period from 2001 to 2011):

  1. When a weather event of type HURRICANE/TYPHOON resulted in economic damage, it caused about $698,149,795 of economic damage on average (based on 108 observations with high positive skewness of 4.7929). In 9 out of 10 such cases, an average of $92,388,431 of economic damage was observed (based on the 90% of cases with the lower impact: 97 observations with high positive skewness of 3.0615), while in the remaining 1 out of 10 cases it caused around $6,039,863,636 of economic damage on average (based on the 10% of cases with the higher impact: 11 observations with moderate positive skewness of 1.3803).
  2. When a weather event of type STORM SURGE/TIDE resulted in economic damage, it caused about $364,975,672 of economic damage on average (based on 131 observations with extreme positive skewness of 9.6344). In 9 out of 10 such cases, an average of $756,521 of economic damage was observed (based on the 90% of cases with the lower impact: 117 observations with moderate positive skewness of 2.898), while in the remaining 1 out of 10 cases it caused around $3,408,807,143 of economic damage on average (based on the 10% of cases with the higher impact: 14 observations with moderate positive skewness of 2.7389).















11 REPRODUCIBILITY DETAILS


To help in any attempt to reproduce this report, several details are provided in addition to the structure of the script and the detailed description of the procedure it follows.

Specifically, in this chapter, information is supplied about:

  1. the R session
  2. the R options
  3. the MD5 checksums of some important files
  4. the random seed




11.1 Session Info

The details of the operating system, the R version and the versions of the libraries used to create this report are supplied to help in any attempt to reproduce the report.

## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Linux Mint 18.3
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
##  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=el_GR.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=el_GR.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=el_GR.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      tools     stats     graphics  grDevices utils     datasets  methods  
## [9] base     
## 
## other attached packages:
##  [1] rsconnect_0.8.16  gridExtra_2.3     ggrepel_0.8.2     ggplot2_3.3.0    
##  [5] moments_0.14      stringr_1.4.0     validate_0.9.3    data.table_1.12.8
##  [9] DT_0.13           magrittr_1.5      kableExtra_1.1.0  knitr_1.28       
## [13] rmdformats_0.3.7  rmarkdown_2.1    
## 
## loaded via a namespace (and not attached):
##  [1] settings_0.2.4    tidyselect_1.1.0  xfun_0.13         purrr_0.3.4      
##  [5] colorspace_1.4-1  vctrs_0.3.0       htmltools_0.4.0   viridisLite_0.3.0
##  [9] yaml_2.2.1        rlang_0.4.6       R.oo_1.23.0       pillar_1.4.4     
## [13] glue_1.4.0        withr_2.2.0       R.utils_2.9.2     lifecycle_0.2.0  
## [17] munsell_0.5.0     gtable_0.3.0      rvest_0.3.5       R.methodsS3_1.8.0
## [21] htmlwidgets_1.5.1 evaluate_0.14     labeling_0.3      crosstalk_1.1.0.1
## [25] curl_4.3          highr_0.8         Rcpp_1.0.4.6      readr_1.3.1      
## [29] scales_1.1.1      jsonlite_1.6.1    webshot_0.5.2     farver_2.0.3     
## [33] hms_0.5.3         packrat_0.5.0     digest_0.6.25     stringi_1.4.6    
## [37] bookdown_0.18     dplyr_0.8.5       tibble_3.0.1      crayon_1.3.4     
## [41] pkgconfig_2.0.3   ellipsis_0.3.0    xml2_1.3.2        assertthat_0.2.1 
## [45] httr_1.4.1        rstudioapi_0.11   R6_2.4.1          compiler_3.6.3

An object with the session information was also exported to the following folder of the working directory:

  • outputs –> reproducibility_support –> r_session

with filename:

  • session_info.R
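An export of this kind could be done along the following lines. This is a minimal sketch, not the script behind the report: it assumes that the object was written as parseable text with `dput()`, and it writes under `tempdir()` (a stand-in for the working directory) so that nothing in a real project is overwritten.

```r
# Capture the session details (R version, platform, attached packages).
session_info <- sessionInfo()

# Assumption: tempdir() stands in for the working directory of the report.
out_dir <- file.path(tempdir(), "outputs", "reproducibility_support", "r_session")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
out_file <- file.path(out_dir, "session_info.R")

# Write a text representation of the object that R can parse back.
dput(session_info, file = out_file)

# Read the object back and show the R version it recorded.
restored <- dget(out_file)
print(restored$R.version$version.string)
```

Comparing the restored object with `sessionInfo()` of a new session is a quick way to spot differences in the R version or in package versions before trying to reproduce the report.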








11.2 Options

The R options that were active while the script that produced the report was executed are supplied to help in any attempt to reproduce the report.

## $add.smooth
## [1] TRUE
## 
## $askpass
## function (prompt) 
## {
##     .Call("rs_askForPassword", prompt)
## }
## <environment: 0x55b8df09dda0>
## 
## $asksecret
## function (name, title = name, prompt = paste(name, ":", sep = "")) 
## {
##     result <- .Call("rs_askForSecret", name, title, prompt, .rs.isPackageInstalled("keyring"), 
##         .rs.hasSecret(name))
##     if (is.null(result)) 
##         stop("Ask for secret operation was cancelled.")
##     result
## }
## <environment: 0x55b8df09dda0>
## 
## $bitmapType
## [1] "cairo"
## 
## $browser
## function (url) 
## {
##     .Call("rs_browseURL", url)
## }
## <environment: 0x55b8ddacd658>
## 
## $browserNLdisabled
## [1] FALSE
## 
## $buildtools.check
## function (action) 
## {
##     if (identical(.Platform$pkgType, "mac.binary.mavericks")) {
##         .Call("rs_canBuildCpp")
##     }
##     else {
##         if (!.Call("rs_canBuildCpp")) {
##             .rs.installBuildTools(action)
##             FALSE
##         }
##         else {
##             TRUE
##         }
##     }
## }
## <environment: 0x55b8df09dda0>
## 
## $buildtools.with
## function (code) 
## {
##     .rs.addRToolsToPath()
##     on.exit(.rs.restorePreviousPath(), add = TRUE)
##     force(code)
## }
## <environment: 0x55b8df09dda0>
## 
## $CBoundsCheck
## [1] FALSE
## 
## $check.bounds
## [1] FALSE
## 
## $citation.bibtex.max
## [1] 1
## 
## $connectionObserver
## $connectionObserver$connectionOpened
## function (type, host, displayName, icon = NULL, connectCode, 
##     disconnect, listObjectTypes, listObjects, listColumns, previewObject, 
##     connectionObject, actions = NULL) 
## {
##     if (!inherits(listObjectTypes, "function")) {
##         stop("listObjectTypes must be a function returning a list of object types", 
##             call. = FALSE)
##     }
##     promote <- function(name, l) {
##         if (length(l) == 0) 
##             return(list())
##         if (is.null(l$contains)) {
##             return(list(list(name = name, icon = l$icon, contains = "data")))
##         }
##         else {
##             return(unlist(append(list(list(list(name = name, 
##                 icon = l$icon, contains = names(l$contains)))), 
##                 lapply(names(l$contains), function(name) {
##                   promote(name, l$contains[[name]])
##                 })), recursive = FALSE))
##         }
##         return(list())
##     }
##     objectTree <- listObjectTypes()
##     objectTypes <- lapply(names(objectTree), function(name) {
##         promote(name, objectTree[[name]])
##     })[[1]]
##     connection <- list(type = type, host = host, displayName = displayName, 
##         icon = icon, connectCode = connectCode, disconnect = disconnect, 
##         objectTypes = objectTypes, listObjects = listObjects, 
##         listColumns = listColumns, previewObject = previewObject, 
##         actions = actions, connectionObject = connectionObject)
##     class(connection) <- "rstudioConnection"
##     .rs.validateConnection(connection)
##     cacheKey <- paste(connection$type, connection$host, .Call("rs_generateShortUuid"), 
##         sep = "_")
##     assign(cacheKey, value = connection, envir = .rs.activeConnections)
##     invisible(.Call("rs_connectionOpened", connection))
## }
## <environment: 0x55b8e0039108>
## 
## $connectionObserver$connectionClosed
## function (type, host, ...) 
## {
##     .rs.validateCharacterParams(list(type = type, host = host))
##     name <- .rs.findConnectionName(type, host)
##     if (!is.null(name)) 
##         rm(list = name, envir = .rs.activeConnections)
##     invisible(.Call("rs_connectionClosed", type, host))
## }
## <environment: 0x55b8e0039108>
## 
## $connectionObserver$connectionUpdated
## function (type, host, hint, ...) 
## {
##     .rs.validateCharacterParams(list(type = type, host = host, 
##         hint = hint))
##     invisible(.Call("rs_connectionUpdated", type, host, hint))
## }
## <environment: 0x55b8e0039108>
## 
## 
## $continue
## [1] "+ "
## 
## $contrasts
##         unordered           ordered 
## "contr.treatment"      "contr.poly" 
## 
## $datatable.alloccol
## [1] 1024
## 
## $datatable.allow.cartesian
## [1] FALSE
## 
## $datatable.auto.index
## [1] TRUE
## 
## $datatable.dfdispatchwarn
## [1] TRUE
## 
## $datatable.old.unique.by.key
## [1] FALSE
## 
## $datatable.optimize
## [1] Inf
## 
## $datatable.print.class
## [1] FALSE
## 
## $datatable.print.colnames
## [1] "auto"
## 
## $datatable.print.keys
## [1] FALSE
## 
## $datatable.print.nrows
## [1] 100
## 
## $datatable.print.rownames
## [1] TRUE
## 
## $datatable.print.topn
## [1] 5
## 
## $datatable.use.index
## [1] TRUE
## 
## $datatable.verbose
## [1] FALSE
## 
## $datatable.warnredundantby
## [1] TRUE
## 
## $defaultPackages
## [1] "datasets"  "utils"     "grDevices" "graphics"  "stats"     "methods"  
## 
## $demo.ask
## [1] "default"
## 
## $deparse.cutoff
## [1] 60
## 
## $device
## function (width = 7, height = 7, ...) 
## {
##     grDevices::pdf(NULL, width, height, ...)
## }
## <bytecode: 0x55b8e1318000>
## <environment: namespace:knitr>
## 
## $device.ask.default
## [1] FALSE
## 
## $digits
## [1] 7
## 
## $download.file.method
## [1] "libcurl"
## 
## $dplyr.show_progress
## [1] TRUE
## 
## $dvipscmd
## [1] "dvips"
## 
## $echo
## [1] TRUE
## 
## $editor
## [1] "vi"
## 
## $encoding
## [1] "native.enc"
## 
## $error
## (function () 
## {
##     .rs.recordTraceback(TRUE, 5, .rs.enqueueError)
## })()
## 
## $example.ask
## [1] "default"
## 
## $expressions
## [1] 5000
## 
## $ggvis.renderer
## [1] "svg"
## 
## $help.search.types
## [1] "vignette" "demo"     "help"    
## 
## $help.try.all.packages
## [1] FALSE
## 
## $help_type
## [1] "html"
## 
## $HTTPUserAgent
## [1] "RStudio Desktop (1.2.5033); R (3.6.3 x86_64-pc-linux-gnu x86_64 linux-gnu)"
## 
## $httr_oauth_cache
## [1] NA
## 
## $httr_oob_default
## [1] FALSE
## 
## $internet.info
## [1] 2
## 
## $keep.parse.data
## [1] TRUE
## 
## $keep.parse.data.pkgs
## [1] FALSE
## 
## $keep.source
## [1] TRUE
## 
## $keep.source.pkgs
## [1] FALSE
## 
## $knitr.in.progress
## [1] TRUE
## 
## $knitr.table.format
## [1] "html"
## 
## $locatorBell
## [1] TRUE
## 
## $mailer
## [1] "mailto"
## 
## $matprod
## [1] "default"
## 
## $max.print
## [1] 1000
## 
## $menu.graphics
## [1] FALSE
## 
## $na.action
## [1] "na.omit"
## 
## $nwarnings
## [1] 50
## 
## $OutDec
## [1] "."
## 
## $pager
## function (files, header, title, delete.file) 
## {
##     for (i in 1:length(files)) {
##         if ((i > length(header)) || !nzchar(header[[i]])) 
##             fileTitle <- title
##         else fileTitle <- header[[i]]
##         .Call("rs_showFile", fileTitle, files[[i]], delete.file)
##     }
## }
## <environment: 0x55b8df09dda0>
## 
## $page_viewer
## function (url, title = "RStudio Viewer", self_contained = FALSE) 
## {
##     if (!is.character(url) || (length(url) != 1)) 
##         stop("url must be a single element character vector.", 
##             call. = FALSE)
##     if (!is.character(title) || (length(title) != 1)) 
##         stop("title must be a single element character vector.", 
##             call. = FALSE)
##     if (!is.logical(self_contained) || (length(self_contained) != 
##         1)) 
##         stop("self_contained must be a single element logical vector.", 
##             call. = FALSE)
##     invisible(.Call("rs_showPageViewer", url, title, self_contained))
## }
## <environment: 0x55b8ddacd658>
## 
## $papersize
## [1] "a4"
## 
## $PCRE_limit_recursion
## [1] NA
## 
## $PCRE_study
## [1] 10
## 
## $PCRE_use_JIT
## [1] TRUE
## 
## $pdfviewer
## [1] "/usr/bin/xdg-open"
## 
## $pkgType
## [1] "source"
## 
## $plumber.swagger.url
## function (url) 
## {
##     invisible(.Call("rs_plumberviewer", url, getwd(), 3))
## }
## <environment: 0x55b8df09dda0>
## attr(,"plumberViewerType")
## [1] 3
## 
## $printcmd
## [1] "/usr/bin/lpr"
## 
## $profvis.keep_output
## [1] TRUE
## 
## $profvis.print
## function (x) 
## {
##     envir <- as.environment(which(search() == "tools:rstudio"))
##     eval(substitute(.rs.profilePrint(x), list(x = x)), envir = envir)
## }
## <environment: 0x55b8dd66cb80>
## 
## $profvis.prof_extension
## [1] ".Rprof"
## 
## $profvis.prof_output
## [1] "/home/rick/Documents/training/coursera/spec__data_science_specialization/Reproducible-Research--2nd-Assignment/.Rproj.user/30C9779D/profiles-cache"
## 
## $prompt
## [1] "> "
## 
## $readr.show_progress
## [1] TRUE
## 
## $repos
##                          CRAN 
## "https://cloud.r-project.org" 
## 
## $restart
## function (afterRestartCommand = "") 
## {
##     afterRestartCommand <- paste(as.character(afterRestartCommand), 
##         collapse = "\n")
##     .Call("rs_restartR", afterRestartCommand, PACKAGE = "(embedding)")
## }
## <environment: 0x55b8df09dda0>
## 
## $reticulate.repl.hook
## function (buffer, contents, trimmed) 
## {
##     if (buffer$empty()) {
##         if (grepl("^[?]", trimmed)) {
##             text <- substring(trimmed, 2)
##             .Call("rs_showPythonHelp", text, PACKAGE = "(embedding)")
##             return(TRUE)
##         }
##         reHelp <- "help\\((.*)\\)"
##         if (grepl(reHelp, trimmed)) {
##             text <- gsub(reHelp, "\\1", trimmed)
##             .Call("rs_showPythonHelp", text, PACKAGE = "(embedding)")
##             return(TRUE)
##         }
##     }
##     FALSE
## }
## <environment: 0x55b8df09dda0>
## 
## $reticulate.repl.initialize
## function () 
## {
##     builtins <- reticulate::import_builtins(convert = FALSE)
##     help <- builtins$help
##     .rs.setVar("reticulate.help", builtins$help)
##     builtins$help <- function(...) {
##         dots <- list(...)
##         if (length(dots) == 0) {
##             message("Error: Interactive Python help not available within RStudio")
##             return()
##         }
##         help(...)
##     }
##     if (requireNamespace("png", quietly = TRUE) && reticulate::py_module_available("matplotlib")) {
##         matplotlib <- reticulate::import("matplotlib", convert = TRUE)
##         backend <- matplotlib$get_backend()
##         if (!identical(tolower(backend), "agg")) {
##             sys <- reticulate::import("sys", convert = TRUE)
##             if ("matplotlib.backends" %in% names(sys$modules)) 
##                 matplotlib$pyplot$switch_backend("agg")
##             else matplotlib$use("agg", warn = FALSE, force = TRUE)
##         }
##         plt <- matplotlib$pyplot
##         .rs.setVar("reticulate.matplotlib.show", plt$show)
##         plt$show <- .rs.reticulate.matplotlib.showHook
##     }
## }
## <environment: 0x55b8df09dda0>
## 
## $reticulate.repl.teardown
## function () 
## {
##     builtins <- reticulate::import_builtins(convert = FALSE)
##     builtins$help <- .rs.getVar("reticulate.help")
##     show <- .rs.getVar("reticulate.matplotlib.show")
##     if (!is.null(show)) {
##         matplotlib <- reticulate::import("matplotlib", convert = TRUE)
##         plt <- matplotlib$pyplot
##         plt$show <- show
##     }
## }
## <environment: 0x55b8df09dda0>
## 
## $rl_word_breaks
## [1] " \t\n\"\\'`><=%;,|&{()}"
## 
## $rsconnect.http.timeout
## [1] 5
## 
## $rsconnect.max.bundle.files
## [1] 10000
## 
## $rsconnect.max.bundle.size
## [1] 3145728000
## 
## $rstudio.notebook.executing
## [1] FALSE
## 
## $scipen
## [1] 0
## 
## $shinygadgets.showdialog
## function (caption, url, width = NULL, height = NULL) 
## {
##     if (!is.character(caption) || (length(caption) != 1)) 
##         stop("caption must be a single element character vector.", 
##             call. = FALSE)
##     if (!is.character(url) || (length(url) != 1)) 
##         stop("url must be a single element character vector.", 
##             call. = FALSE)
##     if (is.null(width)) 
##         width <- 600
##     if (is.null(height)) 
##         height <- 600
##     if (!is.numeric(width) || (length(width) != 1)) 
##         stop("width must be a single element numeric vector.", 
##             call. = FALSE)
##     if (!is.numeric(height) || (length(height) != 1)) 
##         stop("height must be a single element numeric vector.", 
##             call. = FALSE)
##     invisible(.Call("rs_showShinyGadgetDialog", caption, url, 
##         width, height))
## }
## <environment: 0x55b8ddacd658>
## 
## $shiny.launch.browser
## function (url) 
## {
##     invisible(.Call("rs_shinyviewer", url, getwd(), 3))
## }
## <environment: 0x55b8df09dda0>
## attr(,"shinyViewerType")
## [1] 3
## 
## $show.coef.Pvalues
## [1] TRUE
## 
## $show.error.messages
## [1] TRUE
## 
## $show.signif.stars
## [1] TRUE
## 
## $str
## $str$strict.width
## [1] "no"
## 
## $str$digits.d
## [1] 3
## 
## $str$vec.len
## [1] 4
## 
## $str$drop.deparse.attr
## [1] TRUE
## 
## $str$formatNum
## function (x, ...) 
## format(x, trim = TRUE, drop0trailing = TRUE, ...)
## <environment: 0x55b8debadb80>
## 
## 
## $str.dendrogram.last
## [1] "`"
## 
## $stringsAsFactors
## [1] TRUE
## 
## $terminal.manager
## $terminal.manager$terminalActivate
## function (id = NULL, show = TRUE) 
## {
##     if (!is.null(id) && (!is.character(id) || (length(id) != 
##         1))) 
##         stop("'id' must be NULL or a character vector of length one")
##     if (!is.logical(show)) 
##         stop("'show' must be TRUE or FALSE")
##     .Call("rs_terminalActivate", id, show)
##     invisible(NULL)
## }
## <environment: 0x55b8df09dda0>
## 
## $terminal.manager$terminalCreate
## function (caption = NULL, show = TRUE, shellType = NULL) 
## {
##     if (!is.null(caption) && (!is.character(caption) || (length(caption) != 
##         1))) 
##         stop("'caption' must be NULL or a character vector of length one")
##     if (is.null(show) || !is.logical(show)) 
##         stop("'show' must be a logical vector")
##     if (!is.null(shellType) && (!is.character(shellType) || (length(shellType) != 
##         1))) 
##         stop("'shellType' must be NULL or a character vector of length one")
##     validShellType = TRUE
##     if (!is.null(shellType)) {
##         validShellType <- tolower(shellType) %in% c("default", 
##             "win-cmd", "win-ps", "win-git-bash", "win-wsl-bash", 
##             "custom")
##     }
##     if (!validShellType) 
##         stop("'shellType' must be NULL, or one of 'default', 'win-cmd', 'win-ps', 'win-git-bash', 'win-wsl-bash', or 'custom'.")
##     .Call("rs_terminalCreate", caption, show, shellType)
## }
## <environment: 0x55b8df09dda0>
## 
## $terminal.manager$terminalClear
## function (id) 
## {
##     if (is.null(id) || !is.character(id) || length(id) != 1) 
##         stop("'id' must be a character vector of length one")
##     .Call("rs_terminalClear", id)
##     invisible(NULL)
## }
## <environment: 0x55b8df09dda0>
## 
## $terminal.manager$terminalList
## function () 
## {
##     .Call("rs_terminalList")
## }
## <environment: 0x55b8df09dda0>
## 
## $terminal.manager$terminalContext
## function (id) 
## {
##     if (is.null(id) || !is.character(id) || (length(id) != 1)) 
##         stop("'id' must be a single element character vector")
##     .Call("rs_terminalContext", id)
## }
## <environment: 0x55b8df09dda0>
## 
## $terminal.manager$terminalBuffer
## function (id, stripAnsi = TRUE) 
## {
##     if (is.null(id) || !is.character(id) || (length(id) != 1)) 
##         stop("'id' must be a single element character vector")
##     if (is.null(stripAnsi) || !is.logical(stripAnsi)) 
##         stop("'stripAnsi' must be a logical vector")
##     .Call("rs_terminalBuffer", id, stripAnsi)
## }
## <environment: 0x55b8df09dda0>
## 
## $terminal.manager$terminalVisible
## function () 
## {
##     .Call("rs_terminalVisible")
## }
## <environment: 0x55b8df09dda0>
## 
## $terminal.manager$terminalBusy
## function (id) 
## {
##     if (is.null(id) || !is.character(id)) 
##         stop("'id' must be a character vector")
##     .Call("rs_terminalBusy", id)
## }
## <environment: 0x55b8df09dda0>
## 
## $terminal.manager$terminalRunning
## function (id) 
## {
##     if (is.null(id) || !is.character(id)) 
##         stop("'id' must be a character vector")
##     .Call("rs_terminalRunning", id)
## }
## <environment: 0x55b8df09dda0>
## 
## $terminal.manager$terminalKill
## function (id) 
## {
##     if (is.null(id) || !is.character(id)) 
##         stop("'id' must be a character vector")
##     .Call("rs_terminalKill", id)
##     invisible(NULL)
## }
## <environment: 0x55b8df09dda0>
## 
## $terminal.manager$terminalSend
## function (id, text) 
## {
##     if (!is.character(text)) 
##         stop("'text' should be a character vector", call. = FALSE)
##     if (is.null(id) || !is.character(id) || length(id) != 1) 
##         stop("'id' must be a character vector of length one")
##     .Call("rs_terminalSend", id, text)
##     invisible(NULL)
## }
## <environment: 0x55b8df09dda0>
## 
## $terminal.manager$terminalExecute
## function (command, workingDir = NULL, env = character(), show = TRUE) 
## {
##     if (is.null(command) || !is.character(command) || (length(command) != 
##         1)) 
##         stop("'command' must be a single element character vector")
##     if (!is.null(workingDir) && (!is.character(workingDir) || 
##         (length(workingDir) != 1))) 
##         stop("'workingDir' must be a single element character vector")
##     if (!is.null(env) && !is.character(env)) 
##         stop("'env' must be a character vector")
##     if (is.null(show) || !is.logical(show)) 
##         stop("'show' must be a logical vector")
##     .Call("rs_terminalExecute", command, workingDir, env, show)
## }
## <environment: 0x55b8df09dda0>
## 
## $terminal.manager$terminalExitCode
## function (id) 
## {
##     if (is.null(id) || !is.character(id) || (length(id) != 1)) 
##         stop("'id' must be a single element character vector")
##     .Call("rs_terminalExitCode", id)
## }
## <environment: 0x55b8df09dda0>
## 
## 
## $texi2dvi
## [1] "/usr/bin/texi2dvi"
## 
## $tikzMetricsDictionary
## [1] "RepRes_analysis-tikzDictionary"
## 
## $timeout
## [1] 60
## 
## $try.outFile
## A connection with                            
## description "output"        
## class       "textConnection"
## mode        "wr"            
## text        "text"          
## opened      "opened"        
## can read    "no"            
## can write   "yes"           
## 
## $ts.eps
## [1] 1e-05
## 
## $ts.S.compat
## [1] FALSE
## 
## $unzip
## [1] "/usr/bin/unzip"
## 
## $useFancyQuotes
## [1] FALSE
## 
## $verbose
## [1] FALSE
## 
## $viewer
## function (url, height = NULL) 
## {
##     if (!is.character(url) || (length(url) != 1)) 
##         stop("url must be a single element character vector.", 
##             call. = FALSE)
##     if (identical(height, "maximize")) 
##         height <- -1
##     if (!is.null(height) && (!is.numeric(height) || (length(height) != 
##         1))) 
##         stop("height must be a single element numeric vector or 'maximize'.", 
##             call. = FALSE)
##     invisible(.Call("rs_viewer", url, height))
## }
## <environment: 0x55b8ddacd658>
## 
## $warn
## [1] 0
## 
## $warning.length
## [1] 1000
## 
## $width
## [1] 92

An object with the R options was also exported to the following subdirectory of the working directory:

  • outputs –> reproducibility_support –> r_session

with filename:

  • r_options.R
## Warning in saveRDS(object = r_options, file = filepath_____r_options): 'package:stats' may
## not be available when loading
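
As a minimal sketch of this kind of export (not the author's exact code), a list of options can be written with saveRDS() and read back with readRDS(). The path and the choice of options below are assumptions made for illustration. Restricting the list to plain scalar values avoids serializing functions, whose environments are presumably what triggers the warning shown above.

```r
# Collect a few scalar options; the full options() list also holds functions,
# which is presumably where the 'may not be available' warning comes from.
r_options_subset <- options()[c("digits", "OutDec", "scipen", "warn")]

# Hypothetical output path: a temporary file instead of the report's folder.
options_path <- tempfile(fileext = ".rds")
saveRDS(object = r_options_subset, file = options_path)

# Read the file back and check that the same list is restored.
restored_options <- readRDS(options_path)
print(identical(restored_options, r_options_subset))
```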








11.3 MD5 Checksums

To make it easy to verify the integrity of the important files that were imported or exported during the execution of the script that produces this report, their MD5 checksums were computed and exported as txt files with the help of a utility function, export_md5sum().

Three txt files with MD5 checksums were created:

  1. unprocessed_data_____MD5_checksum.txt
  2. processed_data_____MD5_checksum.txt
  3. results_____MD5_checksum.txt

and exported to the following subdirectory of the working directory:

  • output –> reproducibility_support –> MD5_checksums

The original files with the MD5 checksums have been uploaded to GitHub.
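
As a hedged illustration of how a reproduced checksum file could be compared with an original one, the sketch below parses two such files and compares the checksums of files that share the same name. The helpers read_md5_file() and compare_md5_files() are hypothetical and not part of the original script, and the checksums in the demonstration are placeholders rather than real values.

```r
# Hypothetical helper: parses a checksum file whose rows are
# "<md5><whitespace><path>" into a character vector named by file name.
read_md5_file <- function(path) {
  rows <- readLines(path)
  checksums <- sub("^([0-9a-fA-F]{32})\\s+.*$", "\\1", rows)
  files <- basename(sub("^[0-9a-fA-F]{32}\\s+", "", rows))
  stats::setNames(checksums, files)
}

# Hypothetical helper: for every file listed in both checksum files,
# reports whether the two checksums agree.
compare_md5_files <- function(reproduced_file, original_file) {
  reproduced <- read_md5_file(reproduced_file)
  original <- read_md5_file(original_file)
  common <- intersect(names(reproduced), names(original))
  reproduced[common] == original[common]
}

# Demonstration with placeholder checksums (32 repeated hex digits).
reproduced_file <- tempfile(fileext = ".txt")
original_file <- tempfile(fileext = ".txt")
writeLines(
  c(paste0(strrep("a", 32), "  data/input.csv.bz2"),
    paste0(strrep("b", 32), "  output/table.R")),
  reproduced_file
)
writeLines(
  c(paste0(strrep("a", 32), "  other/dir/input.csv.bz2"),
    paste0(strrep("c", 32), "  other/dir/table.R")),
  original_file
)

comparison <- compare_md5_files(reproduced_file, original_file)
print(comparison)
```

Matching on the file name rather than the full path is a design choice of this sketch: the directories in which the files live will usually differ between machines, while the checksums should not.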




11.3.1 Create a utility function to export MD5 checksums

To create the txt files with the MD5 checksums, the utility function export_md5sum() was created and used.

It takes as input two arguments:

  1. target_files
    • the paths to the files of which we want to compute the MD5 checksums
  2. output_file
    • the path at which the txt file with the MD5 checksums of the target files will be exported

Upon execution, it creates a txt file at the path denoted by the argument ‘output_file’ in which it stores the MD5 checksums of the files found at the paths supplied via the ‘target_files’ argument.

The txt file consists of one row for each of the target files, which:

  • begins with the MD5 checksum
  • is followed by two spaces
  • ends with the path of the file to which the MD5 checksum corresponds
# utility function: export_md5sum()
#
# Creates and exports a txt file with the MD5 checksums of some target files.  
#
# Arguments:
#  'target_files'  :  A character vector with the paths of the target files,
#                     of which the MD5 checksums will be computed.
#                     All supplied files must exist.
#
#  'output_file'   :  A character string with the path to the file 
#                     which will be created to store the MD5 checksums
#                     of the target files.
#                     The output file must end with the txt extension.
#                     Any number of directories can be included in the path 
#                     prior to the filename; they will be created 
#                     if they don't exist. 
#
# Return: 
#  If the function executes correctly it returns a named vector 
#  with the MD5 checksums of the target files, 
#  named after their corresponding paths. 

# Define a utility function to use in order to compute and export 
# the MD5 checksums of the files of interest. 
export_md5sum <- function(target_files, output_file = "MD5.txt") {

  # Check the validity of the supplied arguments.
  ## A single character string with a txt extension must have been supplied 
  ## as the value of the 'output_file' argument.
  stopifnot(
    is.character(output_file) &&
      ( length(output_file) == 1 ) &&
      ( tools::file_ext(output_file) == "txt" )
  )
  ## A character vector with an arbitrary number of EXISTING files 
  ## must have been supplied as the value of the 'target_files' argument. 
  do_all_target_files_exist <- file.exists(target_files) & !dir.exists(target_files)
  if (!all(do_all_target_files_exist)) {
    not_existing_target_files <- target_files[!do_all_target_files_exist]
    stop(
      "\n",
      "The following supplied target files do not exist: ", "\n",
      paste("\t", not_existing_target_files, "\n", sep = "")
    )
  }

  # Computes the MD5 checksums of the target files.
  md5_checksums_of_target_files <- tools::md5sum(target_files)

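  # Creates the content that will be written inside the output file.
  # paste0() is used so that exactly two spaces separate checksum and path
  # (paste() would add its default separator on both sides of the two spaces).
  content_of_output_file <- paste0(
    unname(md5_checksums_of_target_files),
    "  ",
    names(md5_checksums_of_target_files)
  )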

  # If the value of the output file contains some directory names 
  # they are identified and created (including any nested directories).
  dest_dir <- dirname(output_file)
  if (!dir.exists(dest_dir)) {
    dir.create(dest_dir, recursive = TRUE)
  }
  # A blank output file is created.
  file.create(output_file)

  # If the output file was successfully created...
  if (file.exists(output_file)) {
    # ...it is populated with the contents.
    # writeLines() accepts a file path and opens and closes the file itself.
    writeLines(text = content_of_output_file, con = output_file)
  } else {
    # else the operation fails and the execution stops.
    stop(
      "\n",
      "Failed to create the output file at the path:", "\n",
      "\t", output_file,
      "\n"
    )
  }

  # Return (invisibly) the named vector with the MD5 checksums.
  invisible(md5_checksums_of_target_files)
}
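
The row format described above (checksum, two spaces, path) can be illustrated on its own with a small, self-contained sketch. The temporary file stands in for a real target file and is an assumption of the example. The sketch also shows why the choice between paste0() and paste() matters for the separator.

```r
# Create a small temporary file to hash (a stand-in for a real target file).
target_file <- tempfile(fileext = ".txt")
writeLines("example content", target_file)

# tools::md5sum() returns a character vector named after the supplied paths.
checksums <- tools::md5sum(target_file)

# paste0() concatenates without inserting a separator, so each row is exactly
# "<checksum><two spaces><path>".
rows <- paste0(unname(checksums), "  ", names(checksums))

# paste() with its default sep = " " surrounds the two spaces with one more
# space on each side, giving four spaces between checksum and path.
rows_with_default_sep <- paste(unname(checksums), "  ", names(checksums))

print(rows)
print(rows_with_default_sep)
```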






11.3.2 MD5 checksum of the input file with the unprocessed data

The input file, repdata_data_StormData.csv.bz2, with the unprocessed data was downloaded from the link that was supplied by the instructions of the assignment.

The same file that was downloaded and used to produce the original report for this analysis was also uploaded to GitHub and can be accessed from the following link:

A txt file with the MD5 checksum of the input file, repdata_data_StormData.csv.bz2, was exported to the following subdirectory of the working directory:

  • output –> reproducibility_support –> MD5_checksums

with name:

  • unprocessed_data_____MD5_checksum.txt

To verify the input file with the unprocessed data, repdata_data_StormData.csv.bz2, compare the MD5 checksum contained in the file unprocessed_data_____MD5_checksum.txt, which was exported when you reproduced the analysis, with the original, which was uploaded to GitHub and can be accessed through the following link:






11.3.3 MD5 checksum of the output file with the processed data

An R file with the table of the processed data was exported through the execution of the script.

The original file has been uploaded to GitHub and can be accessed from the following link:

A txt file with the MD5 checksum of the output R file, table_with_the_processed_data.R, was exported to the following subdirectory of the working directory:

  • output –> reproducibility_support –> MD5_checksums

with name:

  • processed_data_____MD5_checksum.txt

To verify the table with the processed data, table_with_the_processed_data.R, compare the MD5 checksum contained in the file processed_data_____MD5_checksum.txt, which was exported when you reproduced the analysis, with the original, which was uploaded to GitHub and can be accessed through the following link:






11.3.4 MD5 checksum of the output files with the results

The results obtained in this analysis consist of 6 summary tables.

Three of them correspond to the results for the harm on population health (one for each of the three perspectives examined) and were exported as R files through the execution of the script, with names:

  1. summary_____harm_on_population_health______fatalities.R
  2. summary_____harm_on_population_health______injuries.R
  3. summary_____harm_on_population_health______casualties.R

The other three correspond to the results for the harm on economy (one for each of the three perspectives examined) and were also exported as R files through the execution of the script, with names:

  1. summary_____harm_on_economy______property_damage.R
  2. summary_____harm_on_economy______crop_damage.R
  3. summary_____harm_on_economy______economic_damage.R

A txt file with the MD5 checksums of all 6 output R files described above was exported to the following subdirectory of the working directory:

  • output –> reproducibility_support –> MD5_checksums

with name:

  • results_____MD5_checksum.txt

To verify the results, compare the MD5 checksums contained in the file results_____MD5_checksum.txt, which was exported when you reproduced the analysis, with the original, which was uploaded to GitHub and can be accessed through the link:








11.4 Random Seed

At the beginning of the analysis a random seed equal to 1234567890 was set to enhance the reproducibility of the report.

If the procedure has been reproduced correctly (with respect to random events), at this point it is expected to produce a sample from the standard normal distribution with the following 5 values:

  • -2.2152999
  • 0.4738228
  • -0.4869480
  • -0.5343663
  • 1.3206245
## [1] -0.5343663  1.3206245  1.5558662  2.6298662 -0.2373495

However, keep in mind that the only random events that took place through the execution of the script that produces this report happened at the creation of the plots:

  1. Plot 1.1.4
  2. Plot 1.2.4
  3. Plot 1.3.4
  4. Plot 2.1.4
  5. Plot 2.2.4
  6. Plot 2.3.4

by the function geom_label_repel(), which assigns the positions of the labels randomly.

So even if the random seed is not the same, only the labels in those plots should be in different places, while the actual results are expected to be identical.
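
The role of the seed can be sketched in base R, assuming the default random number generator. The sketch only checks that two samples drawn right after setting the same seed agree, and that a sample drawn after other random numbers have been consumed does not; it makes no claim about the particular values produced.

```r
# The seed value reported in the text.
seed <- 1234567890

# Setting the same seed before sampling yields the same sample.
set.seed(seed)
first_sample <- rnorm(5)
set.seed(seed)
second_sample <- rnorm(5)
print(identical(first_sample, second_sample))

# Drawing other random numbers in between advances the generator, so a later
# sample differs unless the seed is set again.
set.seed(seed)
invisible(rnorm(3))
shifted_sample <- rnorm(5)
print(identical(first_sample, shifted_sample))
```

This is why a check of this kind is sensitive to how many random numbers were drawn earlier in the session, and not only to the seed itself.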















12 LICENSE


The script RepRes_analysis.Rmd with the code to conduct the analysis, as well as any of the results and outputs obtained when it is executed, can be used freely for any purpose under the terms of the MIT License.

Copyright (c)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.















13 REFERENCES


NATIONAL WEATHER SERVICE INSTRUCTION 10-1605, AUGUST 17, 2007. URL https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf


Storm Data FAQ Page (2008). URL https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf


The History of the Storm Events Database (2014). URL https://www.ncdc.noaa.gov/stormevents/versions.jsp


R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.


RStudio Team (2019). RStudio: Integrated Development for R. RStudio, Inc., Boston, MA. URL http://www.rstudio.com/.


JJ Allaire and Yihui Xie and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone (2020). rmarkdown: Dynamic Documents for R. R package version 2.1. URL https://rmarkdown.rstudio.com.


Yihui Xie and J.J. Allaire and Garrett Grolemund (2018). R Markdown: The Definitive Guide. Chapman and Hall/CRC. ISBN 9781138359338. URL https://bookdown.org/yihui/rmarkdown.


Yihui Xie (2020). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.28.


Yihui Xie (2015) Dynamic Documents with R and knitr. 2nd edition. Chapman and Hall/CRC. ISBN 978-1498716963


Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595


Hao Zhu (2019). kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. R package version 1.1.0. https://CRAN.R-project.org/package=kableExtra


Stefan Milton Bache and Hadley Wickham (2014). magrittr: A Forward-Pipe Operator for R. R package version 1.5. https://CRAN.R-project.org/package=magrittr


Yihui Xie, Joe Cheng and Xianying Tan (2020). DT: A Wrapper of the JavaScript Library ‘DataTables’. R package version 0.13. https://CRAN.R-project.org/package=DT


Julien Barnier (2020). rmdformats: HTML Output Formats and Templates for ‘rmarkdown’ Documents. R package version 0.3.7. https://CRAN.R-project.org/package=rmdformats


Matt Dowle and Arun Srinivasan (2019). data.table: Extension of data.frame. R package version 1.12.8. https://CRAN.R-project.org/package=data.table


van der Loo M, de Jonge E (2019). “Data Validation Infrastructure for R.” Journal of Statistical Software, Accepted for publication. https://CRAN.R-project.org/package=validate


Hadley Wickham (2019). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.4.0. https://CRAN.R-project.org/package=stringr


Lukasz Komsta and Frederick Novomestky (2015). moments: Moments, cumulants, skewness, kurtosis and related tests. R package version 0.14. https://CRAN.R-project.org/package=moments


H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.


Kamil Slowikowski (2020). ggrepel: Automatically Position Non-Overlapping Text Labels with ‘ggplot2’. R package version 0.8.2. https://CRAN.R-project.org/package=ggrepel


Baptiste Auguie (2017). gridExtra: Miscellaneous Functions for “Grid” Graphics. R package version 2.3. https://CRAN.R-project.org/package=gridExtra


Yan Holtz (2018). PIMP MY RMD: GitHub Page with tips on refining an rmarkdown document. URL https://holtzy.github.io/Pimp-my-rmd/#github-link















END OF THE REPORT