Correlation of IQ with our life (Feature ranking)

  • Tutorial

Prologue


Each of us sometimes has a question that does not give us rest. And as a rule, the answer to such a question can only be obtained by analyzing the experience of a large number of people. I had the following question: “What factors influence IQ and is it even a little advantage?”. Of course, the reader may exclaim that everyone has known everything for a long time and you can read articles on this topic. To some extent you will be right, but alas, articles on IQ turned out to be extremely controversial and have imposed on me even more questions. Therefore, I decided to conduct my modest research on this topic.



Cambridge study in delinquent development


In 1962, in England began a large-scale and long-term study (20 years) on what factors affect asocial behavior. About 500 boys of 10 years were selected as respondents, 890 parameters are attached to each respondent, which describe his youth, growing up, the life of his family and his environment. Among these parameters was the IQ level, which led me to the idea of ​​studying the dependencies between it and other variables.



Import libraries and load data:


import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn import preprocessing
import warnings
warnings.filterwarnings('ignore')
import random as rn
from sklearn.cross_validation import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import svm

data = pd.read_stata('/Users/####/Downloads/ICPSR_08488/DS0001/08488-0001-Data.dta')

Data processing


The IQ coefficient was chosen as the target variable, and it needed a little adjustment:

data['V288'].replace('IQ75',75,inplace=True )
data['V288'].replace('IQ129',129,inplace=True)
data['V288'].replace('IQ128',128,inplace=True)

Selection of necessary features


#Создаем словарь, где мы будем хранить наши коэффициенты важности переменных
ranks = {}
# Создадим функцию для заполнения нашего словаря
def ranking(ranks, names, order=1):
    minmax = MinMaxScaler()
    ranks = minmax.fit_transform(order*np.array([ranks]).T).T[0]
    ranks = map(lambda x: round(x,2), ranks)
    return dict(zip(names, ranks))

#Зададим целевую переменную(Y) 
Y = data['V288'].values
#Убираем Y из нашей обучающей выборки 
IQ = data.drop(['V288'], axis=1)
X = data.as_matrix()
# Названия колонок
colnames = IQ.columns

%matplotlib inline
from sklearn.feature_selection import RFE, f_regression
from sklearn.linear_model import (LinearRegression, Ridge, Lasso, RandomizedLasso)
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestRegressor
# Теперь выявляем наиболее важные признаки 
rlasso = RandomizedLasso(alpha=0.04)
Y = data['V288'].values
rlasso.fit(X, Y)
ranks["rlasso/Stability"] = ranking(np.abs(rlasso.scores_), colnames)
print('finished')

In order not to overload the article with code, I gave a fragment of only one test for evaluating signs. If it is interesting, I can drop all the sources.

We display all the importance values ​​in our dictionary


r = {}
for name in colnames:
    r[name] = round(np.mean([ranks[method][name] 
                             for method in ranks.keys()]), 2)
methods = sorted(ranks.keys())
ranks["Mean"] = r
methods.append("Mean")
print("\t%s" % "\t".join(methods))
for name in colnames:
    print("%s\t%s" % (name, "\t".join(map(str, 
                         [ranks[method][name] for method in methods]))))

The results matrix is ​​as follows, and the last column displays the average value of importance based on all the tests:



Let's choose the top 100 variables by the average value:

sorted(r, key=r.get, reverse=True)[:100]

Description of the most significant signs of signs


Also, just in case, I rechecked these variables using the Pearson criterion.

1. The average IQ and life condition:
p_value 0.035
Average: 98.171533
terrible: 103.934307

2. Secondary IQ and behavior:
p_value 0.005
Rowdy: 102.395833
adequate: 98.286385

3. Secondary IQ and lies
p_value 0.004
Rarely lying: 94.357895
Periodically lie: 99.627907
lying Often: 101.702381
Always lying: 102.204545 4.

Average IQ and social support:
Implies subsidies and allowances.
p_value 0.004
Not supported by the state: 98.310976
Supported: 107.132530

5. Average IQ and appearance:
p_value 0.011
Neat: 96.295597
Average: 102.608696
Untidy: 100.526316 6.

Average IQ and attention span
p_value 0.007
Good concentration: 98.732218
Poor concentration: 105.186207

7. Average IQ and developmental problems in infancy
p_value 0.012
Normal: 99.294304
Development delay 25

interesting : 104.5 look at the dependence of what kind of IQ a child had at school and how much he has earned already at 30 (average weekly income is taken)

IQ and salary:

111 and higher: 17.500000
101-110: 16.906250
91-100: 17.364486
90 and lower: 17.558140

Conclusion


There are factors that are really capable of influencing our IQ, but on the other hand, IQ in the case of our sample could not affect the level of earnings.

Also popular now: