One of my friends is a Syrian refugee who was granted asylum in Sweden last year. I have also been wanting to try data analysis, so it seemed fitting to analyze something relevant to my friend. This is my first-ever analysis in pandas, so apologies in advance for the code abomination.
In this analysis, I use pandas for dataframes, numpy for dealing with numbers (because I need to count and do some math with them), and matplotlib for plotting graphs.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
The next step is to clean up the dataframe for further analysis. The steps are:
- Read csv
- Group by origin country and year resettled
- Remove destination country column (because it’s the same value)
- Remove non-integer values (because you can’t do math magic with them)
- Convert year and value to integer (hello, math magic)
## data prep
df = pd.read_csv('unhcr_resettlement_residence_swe.csv')[1:] # read the CSV and skip its first row
df = df.groupby(['Origin', 'Year'], as_index=False).sum() # one row per (origin, year) pair
df = df.drop('Country / territory of asylum/residence', axis=1) # drop destination country column
df = df[(df != '*').all(axis=1)] # keep only the rows that contain no '*' value
df['Year'] = df['Year'].astype(np.int64) # convert year to int
df['Value'] = df['Value'].astype(np.int64) # convert value to int
df
| | Origin | Year | Value |
|---|---|---|---|
0 | Afghanistan | 1984 | 7 |
1 | Afghanistan | 1985 | 4 |
2 | Afghanistan | 1986 | 4 |
3 | Afghanistan | 1987 | 1 |
4 | Afghanistan | 1988 | 1 |
5 | Afghanistan | 1991 | 2 |
6 | Afghanistan | 1992 | 18 |
7 | Afghanistan | 1997 | 1 |
8 | Afghanistan | 1998 | 5 |
9 | Afghanistan | 1999 | 16 |
10 | Afghanistan | 2000 | 339 |
11 | Afghanistan | 2001 | 270 |
12 | Afghanistan | 2002 | 156 |
13 | Afghanistan | 2003 | 244 |
14 | Afghanistan | 2004 | 314 |
15 | Afghanistan | 2005 | 183 |
16 | Afghanistan | 2006 | 353 |
17 | Afghanistan | 2007 | 185 |
18 | Afghanistan | 2008 | 414 |
19 | Afghanistan | 2009 | 318 |
20 | Afghanistan | 2010 | 336 |
21 | Afghanistan | 2011 | 404 |
22 | Afghanistan | 2012 | 438 |
23 | Afghanistan | 2013 | 219 |
24 | Afghanistan | 2014 | 328 |
25 | Afghanistan | 2015 | 222 |
26 | Afghanistan | 2016 | 20 |
27 | Albania | 1991 | 1 |
28 | Albania | 1992 | 1 |
29 | Albania | 2003 | 3 |
... | ... | ... | ... |
705 | Various/unknown | 2009 | 2 |
706 | Various/unknown | 2013 | 2 |
707 | Venezuela (Bolivarian Republic of) | 2015 | 4 |
708 | Viet Nam | 1984 | 76 |
709 | Viet Nam | 1985 | 48 |
710 | Viet Nam | 1986 | 171 |
711 | Viet Nam | 1987 | 232 |
712 | Viet Nam | 1988 | 94 |
713 | Viet Nam | 1990 | 939 |
714 | Viet Nam | 1991 | 656 |
715 | Viet Nam | 1992 | 474 |
716 | Viet Nam | 1993 | 197 |
717 | Viet Nam | 1994 | 32 |
718 | Viet Nam | 1995 | 4 |
719 | Viet Nam | 1997 | 21 |
720 | Viet Nam | 2002 | 1 |
721 | Viet Nam | 2004 | 10 |
722 | Viet Nam | 2006 | 10 |
723 | Viet Nam | 2009 | 2 |
724 | Viet Nam | 2010 | 6 |
726 | Yemen | 1992 | 1 |
727 | Yemen | 2004 | 1 |
728 | Yemen | 2005 | 4 |
729 | Yemen | 2006 | 1 |
730 | Zimbabwe | 2006 | 4 |
731 | Zimbabwe | 2008 | 1 |
732 | Zimbabwe | 2011 | 1 |
733 | Zimbabwe | 2014 | 7 |
734 | Zimbabwe | 2015 | 6 |
735 | Zimbabwe | 2016 | 9 |
725 rows × 3 columns
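As an aside, the `'*'` filtering and the integer conversion can also be done on the `Value` column alone with `pd.to_numeric`. This is only a minimal sketch on a made-up three-row frame (the rows are invented for illustration and are not taken from the UNHCR file), and it assumes `'*'` is the only non-numeric marker in the column:

```python
import pandas as pd

# Toy data, invented for illustration; the real CSV has more rows and columns.
toy = pd.DataFrame({
    'Origin': ['Country A', 'Country B', 'Country C'],
    'Year': ['1984', '1991', '1992'],
    'Value': ['7', '*', '1'],
})

# Turn anything that is not a number (such as '*') into NaN ...
toy['Value'] = pd.to_numeric(toy['Value'], errors='coerce')
# ... drop those rows, then convert both columns to integers.
toy = toy.dropna(subset=['Value'])
toy = toy.astype({'Year': 'int64', 'Value': 'int64'})
```

Unlike comparing the whole dataframe against `'*'`, this only inspects the column that is supposed to be numeric.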
Since I want to plot a graph with multiple lines, I need to supply one dataframe per line. This step creates one dataframe per origin country and cleans it up. For example, if no refugees were resettled in a given year, that year doesn’t exist in the dataframe, so I have to check for missing years and, for each one, create it and set its value to 0.
## create one dataframe per origin country
UniqueNames = df.Origin.unique()
# map each origin country to the rows of df that belong to it
DataFrameDict = {name: df[df.Origin == name] for name in UniqueNames}
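For comparison, the same "one line per country" layout can probably also be reached without a dictionary of dataframes, by reshaping the long table into a wide one with `DataFrame.pivot`. This is a sketch on made-up rows, not the approach used in this post:

```python
import pandas as pd

# Made-up long-format rows: one row per (Origin, Year) pair
toy = pd.DataFrame({
    'Origin': ['A', 'A', 'B'],
    'Year': [2000, 2001, 2001],
    'Value': [5, 8, 3],
})

# One column per origin country, one row per year; absent pairs become NaN
wide = toy.pivot(index='Year', columns='Origin', values='Value')
# Treat absent pairs as zero resettled refugees
wide = wide.fillna(0)
```

Calling `wide.plot()` on a table like this draws one line per column. Note that a year missing for every country would still be absent from the index.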
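def clean_up_dataframe(df):
    country = df.Origin.unique()[0] # every row has the same origin country
    df = df.drop('Origin', axis=1)
    df.index = df.Year # index by year so that values line up by year below
    df = df.drop('Year', axis=1)
    df = df.rename(columns={'Value': country})
    df2 = pd.DataFrame({'Year': range(1983, 2016 + 1), country: 0}) # dummy dataframe covering every year
    df2.index = df2.Year
    df2[country] = df[country] # years missing from df become NaN
    df2 = df2.fillna(0) # a missing year means no resettled refugees
    df2 = df2[country] # keep only the country's column
    return df2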
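The dummy-dataframe step above fills in missing years with 0. A shorter way to get a similar result is `Series.reindex` with `fill_value=0`. The following is a minimal sketch rather than the function used in this post; `fill_missing_years` is a hypothetical helper name and the input rows are made up:

```python
import pandas as pd

def fill_missing_years(country_df, first_year=1983, last_year=2016):
    """Return a Series of values indexed by year, with 0 for absent years."""
    country = country_df['Origin'].iloc[0]
    values = country_df.set_index('Year')['Value'].rename(country)
    # reindex adds the years that are missing and fills them with 0
    return values.reindex(range(first_year, last_year + 1), fill_value=0)

# Made-up rows with a gap in 1985
toy = pd.DataFrame({
    'Origin': ['Country A', 'Country A'],
    'Year': [1984, 1986],
    'Value': [7, 4],
})
filled = fill_missing_years(toy)
```

One small difference: because no NaN is ever introduced, the values stay integers instead of turning into floats.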
Because Syria is in the Middle East, I want to focus on the MENA region (Middle East and North Africa). However, the full list of countries is too big, and I’ve yet to figure out how to make the graph look pretty. What I do instead is group the countries into subregions and plot each one.
## original MENA, too big
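Another possible way to tame a list that is too big, besides grouping by subregion, is to keep only the countries with the largest totals. This is a minimal sketch assuming a frame with `Origin` and `Value` columns like the one above; `top_origins` is a hypothetical helper and the numbers are made up:

```python
import pandas as pd

def top_origins(frame, n):
    """Return the n origin countries with the largest summed Value."""
    totals = frame.groupby('Origin')['Value'].sum()
    return totals.nlargest(n).index.tolist()

# Made-up rows: A totals 10, B totals 3, C totals 21
toy = pd.DataFrame({
    'Origin': ['A', 'A', 'B', 'C', 'C'],
    'Year': [2000, 2001, 2000, 2000, 2001],
    'Value': [5, 5, 3, 20, 1],
})
biggest = top_origins(toy, 2)
```

The returned list could then stand in for a hand-written region list.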
UniqueNames_og_mena = ['Algeria', 'Bahrain', 'Djibouti', 'Egypt', 'Iran', 'Iraq', 'Israel', 'Jordan',
'Kuwait', 'Lebanon', 'Libya', 'Mauritania', 'Morocco', 'Oman', 'Palestine', 'Qatar',
'Sahrawi Arab Democratic Republic', 'Saudi Arabia', 'Somalia', 'Sudan', 'Syria', 'Tunisia',
'United Arab Emirates', 'Yemen', 'Afghanistan', 'Armenia', 'Azerbaijan', 'Chad', 'Comoros',
'Cyprus', 'Eritrea', 'Georgia', 'Mali', 'Niger', 'Pakistan', 'Turkey']
## MENA
UniqueNames_mena = ['Algeria', 'Bahrain', 'Djibouti', 'Egypt', 'Iran (Islamic Rep. of)', 'Iraq', 'Jordan',
'Kuwait', 'Lebanon', 'Libya', 'Mauritania', 'Saudi Arabia', 'Somalia', 'Sudan',
'Syrian Arab Rep.', 'Tunisia', 'Yemen', 'Afghanistan',
'Armenia', 'Azerbaijan', 'Chad', 'Eritrea', 'Georgia', 'Pakistan', 'Turkey']
## LEVANT
UniqueNames_levant = [ 'Iraq', 'Jordan', 'Lebanon', 'Syrian Arab Rep.']
## NORTH AFRICA
UniqueNames_north_africa = ['Algeria', 'Djibouti', 'Egypt', 'Libya', 'Mauritania', 'Somalia', 'Sudan',
'Tunisia', 'Chad', 'Eritrea']
def plot(region_name, region_list):
    # the first country creates the axes; the others are drawn onto the same axes
    df1 = clean_up_dataframe(DataFrameDict[region_list[0]])
    ax = df1.plot(figsize=(15,10))
    for i in region_list[1:]:
        df = clean_up_dataframe(DataFrameDict[i])
        df.plot(ax=ax)
    plt.xlabel('Year')
    plt.ylabel('Value')
    plt.title('Resettled Refugees in Sweden from {} Region Between 1983-2016'.format(region_name))
    ax.legend()
    plt.show()
## plot('All MENA', UniqueNames_og_mena) # list is too big
plot('MENA', UniqueNames_mena)
You can see that a lot of Iraqi refugees were resettled between 1990 and 1995, which coincides with the Gulf War (1990-1991).
plot('Levant', UniqueNames_levant)
This graph shows only refugees from the Levant region. As expected, a lot of Iraqis were resettled during the 1990s, while the number of Syrian refugees spiked after 2010, which coincides with the Arab Spring (2010-2012).
plot('North Africa', UniqueNames_north_africa)
In North Africa, the number of Somali refugees spiked around 2010, a result of a non-functioning government and the rising clan wars that followed. You can also see that there are a lot of Eritrean refugees, who are fleeing indefinite conscription. The families of those who flee the military are targeted as well.