TwoRavens

Two Ravens: Event Data

###

This version of TwoRavens is used to subset and aggregate event data. Events are classified by their time, location, actors, and actions. We currently offer twelve datasets with event data; however, some of these datasets lack one of the aforementioned categories of data. Below we list changes, standardizations, and constructions we have made to the datasets to allow cross-dataset analysis.

Subsets

Event data datasets generally classify events by date, location, actors, and actions. If a dataset is missing one of these categories, it is not included in the menus for subsetting. All edits to datasets are prefixed with "TwoRavens_" with the exception of actors; this allows users to download the dataset and drop all columns prefixed with "TwoRavens_" to restore the original dataset.

Date

We standardize date fields to a timestamp. All dates fields are converted and prefixed with "TwoRavens_". The three important fields used in EventData are: "TwoRavens_start date", "TwoRavens_end date", and "TwoRavens_date info". "TwoRavens_start date" and "TwoRavens_end date" correspond to the start and end of the event; all times are aligned to midnight of each date. "TwoRavens_date info" is used to represent the accuracy of the start and end date; a "0" represents an exact start and end date, a "1" represents the exact day is missing, a "2" represents the exact month is missing, and a "3" represents both the day and month are missing. Missing dates are aligned to the earliest date available: a missing day will be aligned to the first day of that month, and a missing month will be aligned to the first day of that year.

Location

We standardize location fields to two fields: "TwoRavens_country" and "TwoRavens_country_historic". The former is using the ISO-3 standard to represent the modern day country; the latter is using the Correlates of War (COW) standard to represent the historical state of the event. For datasets using alternate country codes, we use a translation table to convert to COW codes. If an alternate code is not in the original COW code list or if the COW code is referring to a modern-day region, we have created a substitute (historical regions and states are discussed further below). We try to pick substitute codes that are close to their counterpart. The following is a list of substitutes:

Substitute ISO-3

StateName COW COWcode ISO-3 UN M.49 Notes
Yugoslavia YUG 345 MNE 499 defaulted to Montenegro
Kosovo KOS 347 XKX 412 is listed as an alternate ISO-3 code

Substitute COW

StateName COW COWcode ISO-3 Notes
Puerto Rico PRI 3 PRI US territory
Virgin Island, US VIR 4 VIR US territory
American Samoa ASM 5 ASM US territory
Guam GUM 6 GUM US territory
Northern Mariana Islands MNP 7 MNP US Commonwealth
United States Minor Outlying Islands UMI 8 UMI US territories
Jersey JEY 203 JEY self-governing British dependency
Isle of Man IMN 202 IMN self-governing British dependency
Guernsey GGY 201 GGY island in English Channel
Saint Helena SHN 199 SHN British Overseas Territories (BOTs)
Bermuda BMU 198 BMU BOTs
Falkland Islands FLK 197 FLK BOTs
South Georgia and the South Sandwich Islands SGS 196 SGS BOTs
British Indian Ocean Territory IOT 195 IOT BOTs
Anguilla AIA 194 AIA BOTs
Cayman Islands CYM 193 CYM BOTs
Montserrat MSR 192 MSR BOTs
Turks and Caicos Islands TCA 191 TCA BOTs
Virgin Island, British VGB 190 VGB BOTs
Gibraltar GIB 189 GIB BOTs
Pitcairn PCN 188 PCN BOTs
Netherlands Antilles ANT 209 ANT former country of the Netherlands
Aruba ABW 207 ABW country of the Netherlands
Mayotte MYT 219 MYT overseas region of France
Reunion REU 218 REU overseas region of France
Saint Pierre and Miquelon SPM 217 SPM French territory
French Guiana GUF 216 GUF overseas region of France
Saint Barthelemy BLM 215 BLM French territory
Saint Martin (French part) MAF 214 MAF overseas region of France
New Caledonia NCL 213 NCL overseas region of France
Guadeloupe GLP 170 GLP overseas region of France
Martinique MTQ 171 MTQ overseas region of France
French Polynesia PYF 172 PYF overseas region of France
Wallis and Futuna WLF 173 WLF overseas region of France
French Southern Territories ATF 174 ATF overseas region of France
Holy See VAT 324 VAT Vatican City State
Serbia SRB 342 SRB parted from Serbia and Montenegro
Aland Islands ALA 374 ALA region of Finland
Bouvet Island BVT 384 BVT dependency of Norway
Svalbard and Jan Mayen SJM 383 SJM island of Norway
Greenland GRL 389 GRL autonomous Danish (Denmark) territory
Faroe Islands FRO 388 FRO autonomous country of Denmark
People's Republic of the Congo PRC 485 COG socialist state that was eventually replaced by Congo (Republic of)
Zanzibar TAZ 511 TZA semi-autonomous region of Tanzania
Western Sahara ESH 599 ESH disputed territory by Morocco
Palestinian Territory PSE 665 PSE occupied by Israel
Hong Kong HKG 709 HKG former British territory, Special Administrative Region (SAR) of China
Macao MCA 708 MAC SAR of China
Cocos Islands CCK 899 CCK territory of Australia
Christmas Island CXR 898 CXR territory of Australia
Norfolk Island NFK 897 NFK territory of Australia
Heard Island and McDonald Islands HMD 896 HMD territory of Australia
Cook Islands COK 919 COK island associated with New Zealand
Niue NIU 918 NIU island associated with New Zealand
Tokelau TKL 917 TKL dependent territory of New Zealand
Antartica ATA 999 ATA multiple territories

New ISO-3 and COW

StateName COW COWCode ISO-3 UN M.49 Notes
CuraƧao CUW 208 CUW 530 country from Netherlands Antilles after dissolution
Sint Maarten SXM 206 SXM 664 constituent country of Netherlands
Corsica CRS 175 CRS --- French Mediterranean island; added as part of GTD
International III 1 III 1 international group of countries; added as part of GTD
Multinational MTN 0 MTN 0 multinational group of countries; added as part of GTD
Asian ASN 1000 ASN --- group of countries in Asia; added as part of GTD

Dissolved states

For States that do not exist anymore, we store their date of dissolution as part of the translation table. This allows for greater detail when subsetting by date and location. The ISO-3 code is the modern day state name, whereas the COW code is the state name at the time of the event. The following is a list of countries that fall under this category:

StateName COW ISO-3 DateofDissolution Notes
Hanover HAN DEU August 23, 1866 Austro-Prussian War
Bavaria BAV DEU November 11, 1918 WW1
German Federal Republic GFR DEU October 3, 1990 German reunification
German Democratic Republic GDR DEU October 3, 1990 German reunification
Baden BAD DEU September 2, 1945 WW2
Saxony SAX DEU November 11, 1918 WW1
Wuerttemburg WRT DEU November 11, 1918 WW1
Hesse Electoral HSE DEU August 23, 1866 Austro-Prussian War
Hesse Grand Ducal HSG DEU November 11, 1918 WW1
Mecklenburg Schwerin MEC DEU November 11, 1918 WW1
Austria-Hungary AUH AUT November 11, 1918 WW1
Czechoslovakia CZE CZE January 1, 1993 CZE (COW) is for Czechoslovakia; CZE (ISO-3) is for Czech Republic
Papal States PAP ITA September 20, 1870 Capture of Rome
Two Sicilies SIC ITA March 17, 1861 Declaration of Unification
Modena MOD ITA December 3, 1859 Italian Unification
Parma PMA ITA December 3, 1859 Italian Unification
Tuscany TUS ITA December 8, 1859 Italian Unification
Yugoslavia YUG MNE June 3, 2006 split into Serbia and Montenegro; defaulted to Montenegro (MNE)
Yemen Arab Republic YAR YEM May 22, 1990 Yemeni unification
Yemen People's Republic YPR YEM May 22, 1990 Yemeni unification
Korea KOR KOR July 27, 1953 Korean War; defaulted to South Korea (KOR)
Republic of Vietnam RVN VNM July 2, 1976 Reunification of Vietnam; RVN is South Vietnam; DRV is modern day Vietnam
Netherlands Antilles ANT ANT October 10, 2010 Disestablishment of Netherlands Antilles
People's Republic of the Congo PRC COG January 31, 1969-December 31, 1992 socialist state that was eventually replaced by Congo (Republic of)

Notes and other standardizations

Note that the Soviet Union and the resulting Commonwealth of Independent States are not in COW; they are mapped to Russia (RUS).

For other standardizations (Gleditsch and Ward number (GW codes) and GTD currently), I have mapped them to the COW codes. If there are codes in these standardizations that are not in COW and are not in the datasets themselves, I have not included them (to do later).

Process

The standardization process begins with extracting the field in the dataset with location information. If coordinate data is present, this is used to reverse geolocate the following fields: "TwoRavens_address", "TwoRavens_city", "TwoRavens_country", "TwoRavens_postal", "TwoRavens_postal_ext", "TwoRavens_region", "TwoRavens_subregion". If a physical location name is present, this is used to geolocate the previous fields. Only "TwoRavens_country" is used in EventData; this is in ISO-3 format. We then map the ISO-3 code with the event date to COW to fill the "TwoRavens_country_historic" field.

The full table of alignments can be found in here. All references below refer to the column names in the JSON file.

Below is a list of the corresponding fields of datasets that have been standardized:

  • acled_africa: field is "ISO", in UN M.49 format
  • acled_asia: field is "ISO", in UN M.49 format
  • acled_middle_east: field is "ISO", in UN M.49 format
  • cline_phoenix_fbis: field is "countryname", in ISO-3 format; may have empty fields
  • cline_phoenix_nyt: field is "countryname", in ISO-3 format; may have empty fields
  • cline_phoenix_swb: field is "countryname", in ISO-3 format; may have empty fields
  • cline_speed: field is "GP7", "GP8" (coordinate data); may have empty TwoRavens_country fields
  • ged: field is "country_id", in GW format
  • gtd: field is "country", in GTD format
  • icews: field is "Country", in ICEWS format
  • terrier: field is "country_code", in ISO-2 format

Coordinate data

If longitude and latitude data are present, a subset option called "Coordinates" is available for regional subsetting.

Actors

We typically use the dataset's classification schema of actors. Actor data is represented as a source agent and a target agent. If no actor data is present, the subset menu for actors is not shown. If the dataset uses countries as actors, then we offer two versions of actors to subset on: modern or historic country codes (see Locations for more information). The modern codes are under "TwoRavens_country_src" and "TwoRavens_country_tgt", and the historic codes are under "TwoRavens_country_historic_src" and "TwoRavens_country_historic_tgt".

If the dataset stores actors in a list, these are parsed and each combination of actors is split into an individual event. For example, if a dataset has the source actors as [ctryA, ctryB] and the target actors as [ctryC, ctryD], then four events would replace the original: ctryA to ctryC, ctryA to ctryD, ctryB to ctryC, and ctryB to ctryD.

Actions

The dataset's classification schema of actors is used. If a conversion can be made to another format, then the option to subset on these different formats is made available. Below is a list of formats that we currently support conversions between:

  • CAMEO
  • CAMEO root code (first two digits of the CAMEO code)
  • Phoenix penta class (see here for conversion)
  • PLOVER