[75.06 / 95.58] Organización de Datos
Practical Assignment 2: Machine Learning

Creating DataFrames

Group 30: Datatouille

http://fdelmazo.github.io/7506-Datos/

This notebook, a rework of the Appendix Notebook from TP1, extracts features from the dataset and builds new attributes from them. The new attributes are stored in separate .csv files so they can later be concatenated to the main dataset.

In [1]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

df = pd.read_csv('./data/events_up_to_01062018.csv', low_memory=False)
df['timestamp'] = pd.to_datetime(df['timestamp'])
In [2]:
print("len: {}".format(len(df)))
df.head()
len: 2341681
Out[2]:
timestamp event person url sku model condition storage color skus ... search_engine channel new_vs_returning city region country device_type screen_resolution operating_system_version browser_version
0 2018-05-18 00:11:59 viewed product 4886f805 NaN 9288.0 Samsung Galaxy J7 Prime Excelente 32GB Dourado NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 2018-05-18 00:11:27 viewed product ad93850f NaN 304.0 iPhone 5s Muito Bom 32GB Cinza espacial NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 2018-05-18 00:11:16 viewed product 0297fc1e NaN 6888.0 iPhone 6S Muito Bom 64GB Prateado NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 2018-05-18 00:11:14 viewed product 2d681dd8 NaN 11890.0 iPhone 7 Bom 128GB Vermelho NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 2018-05-18 00:11:09 viewed product cccea85e NaN 7517.0 LG G4 H818P Excelente 32GB Branco NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 23 columns

brands.csv

Cell phone brands

In [3]:
df['model'].unique()
Out[3]:
array(['Samsung Galaxy J7 Prime', 'iPhone 5s', 'iPhone 6S', 'iPhone 7',
       'LG G4 H818P', nan, 'iPhone 6', 'iPhone 6 Plus',
       'Motorola Moto G5 Plus', 'Motorola Moto G4 Plus', 'iPad Air Wi-Fi',
       'Samsung Galaxy S6 Flat', 'Samsung Galaxy S8 Plus',
       'Samsung Galaxy S5', 'iPhone SE', 'Samsung Galaxy S7',
       'Motorola Moto X2', 'Motorola Moto G3 4G',
       'Motorola Moto X Play 4G Dual', 'Motorola Moto G4 Play',
       'Samsung Galaxy J5', 'Samsung Galaxy S5 New Edition Duos ',
       'Motorola Moto G3 HDTV', 'Motorola Moto G5 ', 'Samsung Galaxy J3',
       'Samsung Galaxy S6 Edge', 'Samsung Galaxy Note 8',
       'Sony Xperia Z2', 'Motorola Moto G2 3G Dual',
       'Motorola Moto Z Play', 'Samsung Galaxy S8', 'Samsung Galaxy On 7',
       'Samsung Galaxy S7 Edge', 'iPhone 8', 'iPhone 6S Plus',
       'LG K10 Novo', 'Samsung Galaxy Core Plus Duos TV', 'iPhone 5c',
       'iPhone 4S', 'Motorola Moto X Style', 'iPhone 7 Plus',
       'Samsung Galaxy A5 2016', 'Samsung Galaxy A5',
       'Motorola Moto G5S Plus', 'Samsung Galaxy J7',
       'Samsung Galaxy A3 2016', 'Samsung Galaxy S3 i9300',
       'Sony Xperia M4 Aqua Dual', 'Samsung Galaxy Note 3',
       'Motorola Moto G2 4G Dual', 'iPad Mini Wi-Fi + 4G', 'iPhone 5',
       'Samsung Galaxy Note 5', 'Sony Xperia M4 Aqua', 'LG G5 SE',
       'Motorola Moto Z', 'Samsung Galaxy Note 4', 'iPad Air 2 Wi-Fi',
       'Samsung Galaxy S4 i9515', 'Sony Xperia Z5',
       'Samsung Galaxy J5 PRO', 'Samsung Galaxy S6 Edge Plus',
       'Lenovo Vibe A7010 Dual Chip', 'Samsung Galaxy J1 Mini',
       'Samsung Galaxy S3 Mini', 'iPhone 4G', 'Samsung Galaxy Y Duos',
       'Motorola Moto X Force', 'Motorola Moto Z2 Play',
       'Samsung Galaxy Gran Prime Duos TV', 'Samsung Galaxy A5 2017',
       'Samsung Galaxy Gran Neo Duos', 'Motorola Moto MAXX ',
       'Motorola Moto G4 DTV', 'Samsung Galaxy A9 Pro 2016',
       'Samsung Galaxy S5 Mini Duos', 'Samsung Galaxy A7 2017',
       'Samsung Galaxy J2 Prime TV', 'Samsung Galaxy Gran Prime 3G Duos',
       'Sony Xperia Z3 Dual', 'Sony Xperia Z3 TV',
       'Sony Xperia Z3 Compact', 'Samsung Galaxy J5 Prime',
       'Motorola Moto E2 4G Dual', 'Samsung Galaxy Core 2 Duos',
       'LG Nexus 5 D821', 'LG K10 TV', 'Motorola Moto E2 3G Dual',
       'Samsung Galaxy J2 4G Duos TV', 'LG K10',
       'Motorola Moto G4 Play DTV', 'Samsung Galaxy A7', 'Quantum GO 4G',
       'iPad Air 2 Wi-Fi + 4G', 'LG G4 Stylus H630',
       'Samsung Galaxy J5 2016 Metal', 'LG G3 Beat D724',
       'Samsung Galaxy Win Duos', 'Samsung Gear S2',
       'Sony Xperia Z3 Plus', 'Samsung Galaxy S4 i9505', 'Lenovo Vibe K5',
       'iPhone X', 'LG G4 Beat H736', 'Samsung Galaxy S5 Duos',
       'iPhone 8 Plus', 'Samsung Galaxy A7 2016', 'LG G3 Stylus D690',
       'Samsung Galaxy E5 4G Duos', 'LG G3 D855',
       'Samsung Galaxy Grand Duos i9082', 'Samsung Galaxy J1 2016',
       'Samsung Galaxy J7 2016 Metal', 'Samsung Galaxy J7 Neo',
       'Samsung Galaxy S5 New Edition', 'Samsung Galaxy J2 4G Duos ',
       'Samsung Galaxy E7', 'Samsung Galaxy Gran 2 Duos TV',
       'Samsung Galaxy Win 2 Duos TV', 'Samsung Galaxy Note 3 Neo Duos',
       'LG Prime Plus H522', 'Sony Xperia Z3 ', 'Sony Xperia Z5 Premium',
       'Samsung Galaxy Note 2 N7100', 'Samsung Galaxy Note Edge',
       'iPad Mini 3 Wi-Fi + 4G', 'Samsung Galaxy Tab 3 10.1 Wi-Fi + 3G',
       'Samsung Galaxy S3 Duos', 'LG G4 H815P', 'Samsung Galaxy A3 Duos',
       'LG L Prime D337', 'iPad 3 Wi-Fi', 'Samsung Galaxy S5 Mini',
       'iPad Mini 4 Wi-Fi', 'Samsung Galaxy Tab S2 8 Wi-Fi + 4G',
       'Samsung Galaxy J7 PRO', 'LG  X Screen',
       'Samsung Galaxy S4 Mini Duos', 'Asus Zenfone 3 Max 16 GB',
       'Samsung Galaxy S3 Slim Duos', 'Quantum Muv Pro',
       'Samsung Galaxy Tab E 7 Wi-Fi', 'Quantum Muv',
       'Samsung Gear Fit 2 Pequeno', 'iPad Mini Wi-Fi',
       'iPad Air Wi-Fi + 4G', 'LG X Power', 'iPad 2 Wi-Fi + 3G',
       'iPad Mini 3 Wi-Fi', 'Samsung Galaxy S4 i9500',
       'Samsung Galaxy S4 Mini', 'Samsung Galaxy Tab S 10.5 Wi-Fi + 4G',
       'Motorola Moto Z Power Edition',
       'Samsung Galaxy Tab 4  10.1 Wi-Fi + 3G',
       'Samsung Galaxy Tab A 2016 10.1  W-Fi + 4G', 'iPad 4 Wi-Fi',
       'Asus Zenfone 3 Max  32 GB', 'Samsung Galaxy Tab S 8.4 Wi-Fi + 4G',
       'Samsung Galaxy Tab A com S Pen 8 Wi-Fi + 4G',
       'LG G4 Stylus HDTV H540T', 'LG K8', 'iPad 4 Wi-Fi + 4G', 'LG K4',
       'Motorola Moto G5S', 'Samsung Galaxy Tab Pro 10.1 Wi-Fi',
       'Samsung Gear Fit 2 Grande', 'LG L80 Dual', 'Asus Zenfone Selfie',
       'Motorola Moto Z2 Force', 'Quantum YOU',
       'Samsung Galaxy Gran Neo Plus Duos', 'Asus Zenfone 6',
       'iPad Mini 2 Wi-Fi', 'Samsung Galaxy Gran Prime Duos',
       'Asus Zenfone 5', 'Samsung Gear S3 Frontier',
       'Samsung Galaxy Tab S2 9.7 Wi-Fi + 4G', 'Samsung Galaxy Mega Duos',
       'Samsung Galaxy Tab E 9.6 Wi-Fi', 'iPad Mini 2 Wi-Fi + 4G',
       'iPad 3 Wi-Fi + 4G', 'Asus Zenfone 2 Deluxe',
       'Motorola Moto G1 4G', 'Samsung Gear S3 Classic', 'Quantum Muv Up',
       'Samsung Galaxy Tab E 9.6 Wi-Fi + 3G', 'iPad 2 Wi-Fi',
       'Asus Zenfone 2', 'Samsung Galaxy Tab E 7 Wi-Fi + 3G',
       'Asus Zenfone Go', 'Asus Zenfone 2 Laser', 'Asus Live',
       'Asus Zenfone 2 Laser 6"', 'Quantum GO 3G',
       'Samsung Galaxy S3 Neo Duos i9300i',
       'Samsung Galaxy Pocket 2 Duos', 'Samsung Galaxy Young 2 Duos TV',
       'iPad Mini 4 Wi-Fi + 4G', 'Motorola Moto E1',
       'Samsung Galaxy Tab S 10.5 Wi-Fi',
       'Samsung Galaxy Tab 4 10.1 Wi-Fi', 'Sony Xperia Z ULTRA',
       'Motorola Moto G1 3G', 'Xiaomi Redmi 2', 'LG G2 Mini D618',
       'LG Nexus 4', 'Outros TV LED 15', 'Motorola Moto E4 Plus',
       'Samsung Galaxy S Duos 2'], dtype=object)
In [4]:
checked = ['iphone', 'samsung', 'motorola', 'lenovo', 'sony', 'lg', 'ipad', 'asus', 'quantum', 'blackberry']
# Lowercase each model name and split it into words
model_parsed = df['model'].dropna().map(lambda x: x.lower())
model_parsed = model_parsed.map(lambda x: x.split())

def find_brand(model):
    # Return the first word of the model name that is a known brand
    for word in model:
        if word in checked:
            return word
    return "other"


df['brand'] = model_parsed.map(find_brand)
df['brand'] = df['brand'].astype('category')
In [5]:
df[['model', 'brand']].dropna().drop_duplicates().to_csv('data/brands.csv', index=False)
In [6]:
brands_csv = pd.read_csv("data/brands.csv", low_memory=False)
display(brands_csv.head())
display(brands_csv.shape)
                     model    brand
0  Samsung Galaxy J7 Prime  samsung
1                iPhone 5s   iphone
2                iPhone 6S   iphone
3                 iPhone 7   iphone
4              LG G4 H818P       lg
(208, 2)
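The introduction says these lookup files are later joined back to the main dataset. A minimal sketch of that step on made-up rows, assuming a left join on the `model` column (the `events` and `brands` frames below are illustrative stand-ins, not the real files):

```python
import pandas as pd

# Toy stand-ins for the events table and the brands.csv lookup (made-up rows).
events = pd.DataFrame({
    "person": ["a", "b", "c"],
    "model": ["iPhone 7", None, "LG K10"],
})
brands = pd.DataFrame({
    "model": ["iPhone 7", "LG K10"],
    "brand": ["iphone", "lg"],
})

# A left join keeps every event; events without a model get NaN as brand.
merged = events.merge(brands, on="model", how="left")
print(merged)
```

Because the lookup holds one row per distinct model, the join does not change the number of event rows.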

os.csv

Operating systems of the visiting devices

In [7]:
df['operating_system_version'].unique()
Out[7]:
array([nan, 'Android 5.0.2', 'Ubuntu ', 'Android 7', 'Android 6.0.1',
       'Windows 7 ', 'Windows 10 ', 'iOS 11.0.3', 'Android 6',
       'Android 4.4.4', 'Android 7.1.1', 'Mac OS X 10.12.6',
       'Android 5.1', 'Windows 8.1 ', 'Android 5.1.1', 'Android 8.1',
       'Windows 8 ', 'iOS 9.3.5', 'Android 4.2.2', 'Android 5',
       'iOS 11.3', 'Android 4.1.2', 'Android 4.4.2', 'Android 5.0.1',
       'iOS 11.1.1', 'Windows XP ', 'iOS 10.3.3', 'Windows Phone 8.1',
       'Chrome OS 10452.85', 'Android 8', 'Mac OS X 10.10.4',
       'iOS 11.2.6', 'Android ', 'Android 4.3', 'Mac OS X 10.11.6',
       'Windows Vista ', 'iOS 11.1.2', 'Fedora ', 'Windows Phone 10',
       'Linux ', 'Mac OS X 10.13.4', 'Android 7.1.2', 'iOS 8.1.3',
       'iOS 11.2.1', 'Android 4.0.3', 'FreeBSD ', 'iOS 11.2.2',
       'Android 2.3.6', 'iOS 10.2.1', 'iOS 7.1.2', 'Android 4.0.4',
       'Mac OS X 10.7.5', 'Chrome OS 9901.77', 'Chrome OS 10323.67',
       'Chrome OS 10452.96', 'Other ', 'iOS 8.1.1', 'iOS 11.0.2',
       'iOS 11.2.5', 'iOS 10.3.1', 'Mac OS X 10.10.5', 'Mac OS X 10.11.5',
       'iOS 10.3.2', 'iOS 11.2', 'Android 3.2', 'Mac OS X 10.13.3',
       'Mac OS X 10.13.1', 'Mac OS X 10.12.2', 'Android 4.4.3',
       'BlackBerry OS 10.3.2', 'Android 6.1', 'iOS 11.1',
       'Windows Phone 8', 'iOS 10.2', 'iOS 11.4', 'BlackBerry OS 10.3.3',
       'Android 4.1.1', 'Chrome OS 10452.74', 'Mac OS X 10.12',
       'Android 4.4', 'iOS 10.1', 'Mac OS X 10.13.2', 'iOS 10.1.1',
       'Mac OS X 10.12.5', 'Mac OS X 10.8.5', 'iOS 8.2', 'Mac OS X 10.10',
       'iOS 10.0.2', 'iOS 9.1', 'iOS 6.1.6', 'Mac OS X 10.9', 'iOS 9.2.1',
       'Mac OS X 10.9.5', 'Mac OS X 10.11', 'Mac OS X 10.6.8',
       'iOS 7.1.1', 'iOS 3.2', 'Windows RT ', 'Tizen 2.3', 'iOS 9.3.2',
       'Mac OS X 10.13', 'Mac OS X 10.11.4', 'Mac OS X 10.7', 'iOS 8.4.1',
       'iOS 11', 'iOS 11.0.1', 'Chrome OS 10032.86', 'Android 7.0.1',
       'iOS 5.0.1', 'iOS 7.0.3', 'Mac OS X 10.12.4', 'Android 7.1',
       'iOS 9.2', 'Android 3.2.2', 'Mac OS X 10.13.5', 'iOS 8.1',
       'Symbian OS ', 'Chrome OS 10176.76', 'Chrome OS 10176.66',
       'Chrome OS 10323.62', 'iOS 6.1.2', 'Tizen 3', 'iOS 10.3', 'iOS 5',
       'Tizen 2.4', 'Android 4', 'Mac OS X 10.10.1', 'Chrome OS 10032.75',
       'iOS 9.3.4', 'BlackBerry OS ', 'Android 10.0.2', 'iOS 8.3'],
      dtype=object)
In [8]:
checked = ['windows', 'android', 'linux', 'mac', 'ios', 'ubuntu', 'chrome os', 'tizen', 'blackberry', 'other']
os_version_parsed = df['operating_system_version'].dropna().map(lambda x: x.lower())

def find_os(os_version):
    # The first name in `checked` found as a substring wins
    for name in checked:
        if name in os_version:
            return name
    return "another"


df['operating_system'] = os_version_parsed.map(find_os)
df['operating_system'] = df['operating_system'].astype('category')
In [9]:
# Check how many operating systems ended up labeled 'another' (ideally, none)
df[['operating_system_version', 'operating_system']].dropna().head(10)
df[df['operating_system'] == 'another'][['operating_system_version', 'operating_system']].head(10)
Out[9]:
operating_system_version operating_system
2139528 Fedora another
2139529 Fedora another
2141817 FreeBSD another
2260955 Fedora another
2260956 Fedora another
2267906 Symbian OS another
In [10]:
df['operating_system'].value_counts()
Out[10]:
android       96901
windows       96531
ios            8312
mac             902
linux           766
ubuntu          230
chrome os       205
other           105
blackberry       74
tizen            37
another           6
Name: operating_system, dtype: int64
In [11]:
df[['operating_system_version', 'operating_system']].dropna().drop_duplicates().to_csv('data/os.csv', index=False)
In [12]:
os_csv = pd.read_csv("data/os.csv", low_memory=False)
display(os_csv.head())
display(os_csv.shape)
  operating_system_version operating_system
0            Android 5.0.2          android
1                   Ubuntu           ubuntu
2                Android 7          android
3            Android 6.0.1          android
4                Windows 7          windows
(131, 2)

browsers.csv

Web browsers from which the site was accessed

In [13]:
df['browser_version'].unique().tolist()
Out[13]:
[nan,
 'Chrome Mobile 66.0',
 'Firefox 57',
 'Chrome 66.0',
 'Chrome Mobile 34.0',
 'Chrome Mobile 65.0',
 'Facebook 172',
 'Facebook 173',
 'Mobile Safari 11',
 'Chrome 65.0',
 'Chrome 67.0',
 'Safari 11.1',
 'Firefox 60',
 'Firefox 59',
 'IE 11',
 'Chrome Mobile 39',
 'Chrome Mobile 56.0',
 'Chrome 63.0',
 'Chrome 64.0',
 'Chrome Mobile 64.0',
 'Mobile Safari 9',
 'Chrome 62.0',
 'Facebook 166',
 'Facebook 163',
 'Chrome Mobile 63.0',
 'Chrome 61.0',
 'Chrome 55.0',
 'Chrome 51.0',
 'Chrome Mobile 61.0',
 'Chrome Mobile 55.0',
 'Samsung Internet 6.4',
 'Opera 52.0',
 'Chrome 49.0',
 'Edge 13.10586',
 'Facebook 94',
 'Android 5.1',
 'Chrome Mobile 62.0',
 'Opera 53.0',
 'Edge 17.17134',
 'Facebook 91',
 'Facebook 92',
 'Facebook 88',
 'Facebook 89',
 'Chrome 68.0',
 'Edge 16.16299',
 'Chrome Mobile 50.0',
 'Chrome Mobile 58.0',
 'Chrome 56.0',
 'Mobile Safari 10',
 'Chrome 60',
 'Chrome Mobile 46.0',
 'Firefox 54',
 'Facebook 171',
 'IE Mobile 11',
 'Chrome 57.0',
 'Chrome Mobile iOS 64.0',
 'Safari 8.0',
 'Android 4.4',
 'Firefox 52',
 'Samsung Internet 7',
 'Samsung Internet 4',
 'Firefox 43',
 'Opera Mini 20.0',
 'Chrome Mobile 57.0',
 'Android 4.1',
 'Chrome Mobile 39.0',
 'Facebook 170',
 'Facebook 167',
 'Chrome Mobile 43.0',
 'Chrome 58.0',
 'Chrome Mobile iOS 63.0',
 'Android 4.3',
 'Chrome 39',
 'Firefox Mobile 60',
 'Chrome 58.5',
 'Chrome 40.0',
 'Chrome Mobile iOS 66.0',
 'Chrome Mobile 59.0',
 'Chrome Mobile 51.0',
 'Edge Mobile 13.10586',
 'Facebook 159',
 'Chrome Mobile 30',
 'Chrome Mobile 30.0',
 'Samsung Internet 4.2',
 'UC Browser 12.5',
 'Facebook 153',
 'Facebook 169',
 'Chrome Mobile 28.0',
 'Chrome 41.0',
 'Samsung Internet 3.5',
 'Samsung Internet 6.2',
 'Facebook 160',
 'Mobile Safari 8',
 'Facebook 161',
 'Facebook 162',
 'Chrome Mobile 53.0',
 'Chrome Mobile 54.0',
 'Chrome 42.0',
 'Edge Mobile 15.15063',
 'Chrome Mobile 40.0',
 'Chrome 43.0',
 'Chrome 44.0',
 'Chrome 59.0',
 'UC Browser 7.0',
 'Samsung Internet 3.3',
 'Chrome 46.0',
 'Chrome 23.0',
 'Android 2.3',
 'Facebook 164',
 'Mobile Safari 7',
 'Chromium 66.0',
 'Chrome Mobile 44.0',
 'Android 4.0',
 'Opera 50.0',
 'Firefox 17',
 'Opera Mini 8.0',
 'Facebook 111',
 'Opera Mobile 35.0',
 'IE 9',
 'Firefox 58',
 'Chrome Mobile iOS 65.0',
 'Facebook 170.1',
 'Facebook 168',
 'Chrome Mobile iOS 47.0',
 'Chrome 45.0',
 'WebKit Nightly 535.20',
 'Chrome Mobile 60.0',
 'Facebook 93',
 'Opera Mini 6.5',
 'Chrome Mobile 52.0',
 'Samsung Internet 5.4',
 'Chrome Mobile 40',
 'Chrome 39.0',
 'Opera 51.0',
 'Firefox 48',
 'Firefox Mobile 59',
 'Safari 10.1',
 'Samsung Internet 2.1',
 'Samsung Internet 5',
 'Facebook 174',
 'Facebook 154',
 'Edge Mobile 14.14393',
 'Chrome Mobile 35.0',
 'Firefox 49',
 'Chrome Mobile 38.0',
 'Facebook 86',
 'Vivaldi 1.96',
 'Firefox 10.0',
 'UC Browser 12.2',
 'Android 4.2',
 'Samsung Internet 5.2',
 'Chrome 54.0',
 'Facebook 156',
 'Opera Mini 33.0',
 'Chrome Mobile 18.0',
 'Chrome Mobile 47.0',
 'Edge 14.14393',
 'Firefox 37',
 'Firefox 45',
 'Puffin 7.5',
 'Puffin 7.1',
 'Samsung Internet 6',
 'Facebook 90',
 'Opera Mini 28.0',
 'Chrome Mobile 45.0',
 'Mobile Safari UI/WKWebView 10.2',
 'Firefox 36',
 'Mobile Safari UI/WKWebView 10.3',
 'Chrome 30',
 'Chrome 57.9',
 'UC Browser 10.10',
 'Chrome Mobile 49.0',
 'Opera Mini 5.0',
 'Safari 11.0',
 'Edge Mobile 15.15254',
 'Chrome 60.0',
 'Edge 12.10240',
 'IE 10',
 'Firefox 50',
 'Chrome 52.0',
 'Chrome 38.0',
 'Chrome Mobile 33',
 'Firefox 51',
 'Chromium 64.0',
 'Facebook 158',
 'Firefox 47',
 'Mobile Safari 10.3',
 'Chrome 57.14',
 'UC Browser 11.5',
 'BlackBerry WebKit 10.3',
 'Chrome 34.0',
 'Chrome 58.1',
 'Edge 15.15063',
 'Chrome Mobile 36.0',
 'Chrome 53.0',
 'Chrome 58.8',
 'Opera 49.0',
 'Opera 36.0',
 'IE Mobile 10',
 'Samsung Internet 2',
 'Chrome 32.0',
 'Chrome 48.0',
 'Firefox 53',
 'Facebook 155',
 'Facebook 165',
 'Mobile Safari 11.2',
 'Chrome 22.0',
 'Facebook 96',
 'Facebook 62',
 'Firefox 38',
 'Chrome 33',
 'Firefox 55',
 'Firefox 56',
 'Firefox 40',
 'Firefox 46',
 'Firefox Mobile 47',
 'Opera Mobile 33.0',
 'Samsung Internet 7.2',
 'Chrome Mobile iOS 59.0',
 'Opera Mini 7.1',
 'Opera 9.80',
 'Chrome 58.2',
 'Facebook 81',
 'Yandex Browser 18.4',
 'Mobile Safari UI/WKWebView 7.1',
 'Chrome Mobile 31.0',
 'Opera Mobile 46.1',
 'Opera Mobile 45.1',
 'Chrome 26.0',
 'Mobile Safari 7.1',
 'Mobile Safari 9.3',
 'Firefox 15',
 'Chromium 25.0',
 'Mobile Safari UI/WKWebView 11.2',
 'Chrome Mobile 67.0',
 'Firefox 24',
 'Chrome 50.0',
 'Facebook 95',
 'K-Meleon 76',
 'Facebook 135',
 'Chrome 27.0',
 'Facebook 87',
 'Chrome 57.16',
 'Chromium 65.0',
 'Chromium 62.0',
 'Facebook 145',
 'Facebook 175',
 'Firefox 31',
 'Chrome Mobile 42.0',
 'Facebook 79',
 'Chrome 47.0',
 'UC Browser 11.3',
 'Firefox Mobile 58',
 'Facebook 84',
 'Safari 8',
 'Facebook 157',
 'Chrome 11.0',
 'Chrome Mobile iOS 62.0',
 'Yandex Browser 18.3',
 'Mobile Safari 6',
 'Pinterest',
 'Chrome 54.8',
 'Mobile Safari 9.2',
 'Firefox 44',
 'Chrome 56.13',
 'Opera Mini 12.0',
 'Opera Mobile 28.0',
 'Firefox 32',
 'Opera Mini 7.0',
 'Facebook 85',
 'Chrome Mobile 48.0',
 'Vivaldi 1.95',
 'Safari 5.1',
 'Safari 9.1',
 'Chrome 30.0',
 'Mobile Safari 4.0',
 'Facebook 137',
 'Samsung Internet 1',
 'Chrome Mobile 26.0',
 'Facebook 68',
 'Other',
 'Mobile Safari UI/WKWebView 9.3',
 'Facebook 82',
 'Opera Mini 7.6',
 'UC Browser 10.9',
 'Firefox 35',
 'Opera Mini 31.0',
 'Opera Mini 30.0',
 'Opera Mobile 46.3',
 'Chrome 57.19',
 'Maxthon 5.2',
 'Chrome 57.12',
 'Mobile Safari 11.1',
 'Safari 9',
 'Mobile Safari 10.1',
 'Firefox 42',
 'Chrome 57.18',
 'Maxthon 5.1',
 'Chrome 57.2',
 'Opera Mini 35.0',
 'Opera 42.0',
 'UC Browser 12',
 'Chrome 57.20',
 'Chrome Mobile 37',
 'Firefox 41',
 'Chrome Mobile iOS 60.0',
 'Facebook 152',
 'Android 7',
 'Mobile Safari 11.0',
 'Chrome 31.0',
 'Opera 46.0',
 'Chrome 36.0',
 'Safari 10.0',
 'Chrome Mobile 33.0',
 'Firefox 26',
 'Firefox 33',
 'Mobile Safari 7.0',
 'Chrome 28.0',
 'Firefox 39',
 'Chromium 58.0',
 'Firefox 61',
 'Firefox Mobile 61',
 'Opera 48.0',
 'Android 3.2',
 'Chrome 57.17',
 'Facebook 149',
 'Chrome 35.0',
 'Facebook 80',
 'Facebook 83',
 'Chrome 56.17',
 'Puffin 5.2',
 'Opera Mobile 32.0',
 'Opera Mini 7.5',
 'Chrome 48.4',
 'Opera 47.0',
 'Facebook 144',
 'Opera Mini 4.2',
 'Opera Mini 4.5',
 'Edge 17.17133',
 'Opera Mini 32.0',
 'Mobile Safari UI/WKWebView 11.3',
 'Safari 3',
 'Firefox 15.0',
 'Opera 41.0',
 'Chrome 57.3',
 'Opera Mobile 25.0',
 'Mobile Safari 5.1',
 'Chrome Mobile 37.0',
 'Firefox 23',
 'Chrome 57.7',
 'Samsung Internet 1.1',
 'Chrome 57.21',
 'Safari 4',
 'Facebook 151',
 'Yandex Browser 18.1',
 'Safari 10',
 'Chrome 18.0',
 'Chrome Mobile 34',
 'Facebook 139',
 'BingPreview 1',
 'Firefox 28',
 'Chrome 53.19',
 'Facebook 97',
 'Facebook 147',
 'UC Browser 10.7',
 'Firefox Mobile 57',
 'Chromium 63.0']
In [14]:
# More specific names go first: the first substring match wins,
# so 'chrome mobile' must be tested before 'chrome'.
checked = ['mobile safari', 'chrome mobile', 'ie mobile', 'firefox mobile', 'edge mobile', 'opera mobile',
           'mobile', 'blackberry os', 'blackberry webkit', 'chrome', 'android', 'opera', 'ie', 'firefox',
           'facebook', 'samsung', 'chromium', 'edge', 'yandex', 'uc',
           'other', 'safari', 'puffin', 'maxthon', 'vivaldi', 'pinterest']
browser_version_parsed = df['browser_version'].dropna().map(lambda x: x.lower())

def find_browser(browser_version):
    for browser in checked:
        if browser in browser_version:
            return browser
    return "other"


df['browser'] = browser_version_parsed.map(find_browser)
df['browser'] = df['browser'].astype('category')
In [15]:
df[['browser_version', 'browser']].dropna()
display(df['browser'].value_counts().head())
df[['browser_version', 'browser']].head()
df[df['browser'] == 'other']['browser_version']
chrome           87918
chrome mobile    85915
mobile safari     7730
firefox           6777
samsung           4809
Name: browser, dtype: int64
Out[15]:
2143535    WebKit Nightly 535.20
2172409              K-Meleon 76
2203820                    Other
2239081                    Other
2254099    WebKit Nightly 535.20
2277183                    Other
2277184                    Other
2277185                    Other
2277186                    Other
2283962                    Other
2286442                    Other
2331163    WebKit Nightly 535.20
2337195                    Other
Name: browser_version, dtype: object
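One side effect of plain substring matching is worth noting: short names such as `'ie'` can match inside unrelated words. The string `'bingpreview 1'` contains `'ie'` (in "preview"), which is why BingPreview does not appear in the `'other'` list above. The sketch below contrasts the substring test with a hypothetical word-boundary variant. `find_browser_words` and the shortened `CANDIDATES` list are illustrative assumptions, not part of the notebook.

```python
import re

# Shortened candidate list for the example; order matters as in `checked`.
CANDIDATES = ["chrome mobile", "chrome", "ie mobile", "ie", "firefox"]

def find_browser_substring(version):
    """Plain substring test, as in find_browser above."""
    for name in CANDIDATES:
        if name in version:
            return name
    return "other"

def find_browser_words(version):
    """Hypothetical variant: the name must start and end on a word boundary."""
    for name in CANDIDATES:
        if re.search(r"\b" + re.escape(name) + r"\b", version):
            return name
    return "other"

print(find_browser_substring("bingpreview 1"))  # 'ie' (matched inside "preview")
print(find_browser_words("bingpreview 1"))      # 'other'
print(find_browser_words("ie 11"))              # 'ie'
```

Switching to the word-boundary variant would change the counts shown above, so it is offered only as an option to consider.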
In [16]:
df[['browser_version', 'browser']].dropna().drop_duplicates().to_csv('data/browsers.csv', index=False)
In [17]:
browsers_csv = pd.read_csv("data/browsers.csv", low_memory=False)
display(browsers_csv.head())
display(browsers_csv.shape)
      browser_version        browser
0  Chrome Mobile 66.0  chrome mobile
1          Firefox 57        firefox
2         Chrome 66.0         chrome
3  Chrome Mobile 34.0  chrome mobile
4  Chrome Mobile 65.0  chrome mobile
(366, 2)

sessions.csv

Events per user

We define a session as a series of events by the same user in which each event occurs no more than 30 minutes after the previous one. A gap of more than 30 minutes of inactivity starts a new session.
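A minimal sketch of this session rule on made-up events, assuming the same 30-minute threshold used by the cells that follow (the toy `events` frame is illustrative only):

```python
import pandas as pd

events = pd.DataFrame({
    "person": ["u1", "u1", "u1", "u2"],
    "timestamp": pd.to_datetime([
        "2018-05-17 12:00:00",
        "2018-05-17 12:10:00",  # 10 min after the previous event: same session
        "2018-05-17 13:00:00",  # 50 min after the previous event: new session
        "2018-05-17 09:00:00",
    ]),
}).sort_values(["person", "timestamp"])

# Gap to the previous event of the same person (NaT for the first event).
gap = events.groupby("person")["timestamp"].diff()

# A gap above the threshold opens a new session; NaT compares as False.
new_session = gap > pd.Timedelta(minutes=30)

# The running count of session starts per person is the session number.
events["session_id"] = new_session.groupby(events["person"]).cumsum()
print(events)
```

Each user's first session is numbered 0, and the counter restarts for every user.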

In [18]:
funnel = df

funnel = funnel.sort_values(['person', 'timestamp'])

funnel.head()
Out[18]:
timestamp event person url sku model condition storage color skus ... city region country device_type screen_resolution operating_system_version browser_version brand operating_system browser
1507286 2018-05-17 12:27:47 checkout 0008ed71 NaN 3372.0 Samsung Galaxy S6 Flat Muito Bom 32GB Dourado NaN ... NaN NaN NaN NaN NaN NaN NaN samsung NaN NaN
2336760 2018-05-17 13:44:59 visited site 0008ed71 NaN NaN NaN NaN NaN NaN NaN ... Unknown Unknown Brazil Computer 1920x1080 Windows 10 Chrome 66.0 NaN windows chrome
1507716 2018-05-17 13:45:00 checkout 0008ed71 NaN 8247.0 iPhone SE Bom 64GB Cinza espacial NaN ... NaN NaN NaN NaN NaN NaN NaN iphone NaN NaN
2336761 2018-05-17 16:21:54 visited site 0008ed71 NaN NaN NaN NaN NaN NaN NaN ... Unknown Unknown Brazil Computer 1920x1080 Windows 10 Chrome 66.0 NaN windows chrome
2122051 2018-05-17 16:22:06 generic listing 0008ed71 NaN NaN NaN NaN NaN NaN 6594,6651,6664,7253,2820,6706,6721,12606,480,1... ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 26 columns

In [19]:
funnel['time_diff'] = funnel.groupby('person')['timestamp'].diff()
# The first event of each person has no previous event (NaT): use a zero gap
funnel['time_diff'] = funnel['time_diff'].fillna(pd.Timedelta(0))
funnel['time_diff_min'] = funnel['time_diff'] / np.timedelta64(1, 'm')
In [20]:
funnel[['person', 'event', 'timestamp', 'time_diff', 'time_diff_min']].head()
Out[20]:
           person            event            timestamp time_diff  time_diff_min
1507286  0008ed71         checkout  2018-05-17 12:27:47  00:00:00       0.000000
2336760  0008ed71     visited site  2018-05-17 13:44:59  01:17:12      77.200000
1507716  0008ed71         checkout  2018-05-17 13:45:00  00:00:01       0.016667
2336761  0008ed71     visited site  2018-05-17 16:21:54  02:36:54     156.900000
2122051  0008ed71  generic listing  2018-05-17 16:22:06  00:00:12       0.200000
In [21]:
THRESHOLD = 30 # minutes

funnel['new_session'] = funnel['time_diff_min'] > THRESHOLD
funnel['session_id'] = funnel.groupby('person')['new_session'].cumsum()
funnel['session_id'] = funnel['session_id'].astype('int')
In [22]:
funnel[['person', 'new_session', 'session_id']].head(8)
Out[22]:
           person  new_session  session_id
1507286  0008ed71        False           0
2336760  0008ed71         True           1
1507716  0008ed71        False           1
2336761  0008ed71         True           2
2122051  0008ed71        False           2
1505383  0008ed71        False           2
2146920  00091926        False           0
121981   00091926        False           0
In [23]:
gb = funnel.groupby(['person', 'session_id'])


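# Position of each event within its session and total events in that session.
# np.repeat relies on funnel being sorted by person and timestamp (In [18]).
funnel['session_cumno'] = gb.cumcount()
funnel['session_total_events'] = pd.Series(np.repeat(gb.size(), gb.size().values)).values
funnel['session_first'] = funnel['session_cumno'] == 0
funnel['session_last'] = funnel['session_cumno'] == funnel['session_total_events'] - 1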
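The `np.repeat` pattern above broadcasts one value per session back to every event, and it is only correct while the rows are stored in group order. As a hedged alternative, `GroupBy.transform` aligns on the index and does not depend on row order. The sketch below compares both on a made-up frame (`toy` and its rows are illustrative only):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "person":     ["u1", "u1", "u1", "u2", "u2"],
    "session_id": [0, 1, 1, 0, 0],
    "event":      ["checkout", "visited site", "checkout",
                   "viewed product", "viewed product"],
})
keys = ["person", "session_id"]
gb = toy.groupby(keys)

# Pattern used above: repeat each group size once per row of the group.
by_repeat = np.repeat(gb.size().values, gb.size().values)

# Alternative: transform returns one value per row, aligned on the index.
by_transform = gb["event"].transform("size")

# With the rows reversed, transform still assigns each row its own group size.
shuffled = toy.iloc[::-1]
sizes_shuffled = shuffled.groupby(keys)["event"].transform("size")
print(by_repeat, by_transform.tolist(), sizes_shuffled.tolist())
```

On the sorted frame both approaches agree; only the `transform` version stays correct if the frame is reordered before the assignment.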
In [24]:
cols = ['person', 'timestamp', 'time_diff_min', \
        'session_id', 'event', 'session_total_events', \
        'session_cumno', 'session_first', 'session_last']

funnel[cols].head()
Out[24]:
person timestamp time_diff_min session_id event session_total_events session_cumno session_first session_last
1507286 0008ed71 2018-05-17 12:27:47 0.000000 0 checkout 1 0 True True
2336760 0008ed71 2018-05-17 13:44:59 77.200000 1 visited site 2 0 True False
1507716 0008ed71 2018-05-17 13:45:00 0.016667 1 checkout 2 1 False True
2336761 0008ed71 2018-05-17 16:21:54 156.900000 2 visited site 3 0 True False
2122051 0008ed71 2018-05-17 16:22:06 0.200000 2 generic listing 3 1 False False
In [25]:
funnel['is_conversion'] = funnel['event'] == 'conversion'
gb = funnel.groupby(['person', 'session_id'])['is_conversion']

funnel['session_total_conversions'] = pd.Series(np.repeat(gb.sum(), gb.size().values)).values
funnel['session_has_conversion'] = funnel['session_total_conversions'] > 0

funnel.head()
Out[25]:
timestamp event person url sku model condition storage color skus ... time_diff_min new_session session_id session_cumno session_total_events session_first session_last is_conversion session_total_conversions session_has_conversion
1507286 2018-05-17 12:27:47 checkout 0008ed71 NaN 3372.0 Samsung Galaxy S6 Flat Muito Bom 32GB Dourado NaN ... 0.000000 False 0 0 1 True True False 0.0 False
2336760 2018-05-17 13:44:59 visited site 0008ed71 NaN NaN NaN NaN NaN NaN NaN ... 77.200000 True 1 0 2 True False False 0.0 False
1507716 2018-05-17 13:45:00 checkout 0008ed71 NaN 8247.0 iPhone SE Bom 64GB Cinza espacial NaN ... 0.016667 False 1 1 2 False True False 0.0 False
2336761 2018-05-17 16:21:54 visited site 0008ed71 NaN NaN NaN NaN NaN NaN NaN ... 156.900000 True 2 0 3 True False False 0.0 False
2122051 2018-05-17 16:22:06 generic listing 0008ed71 NaN NaN NaN NaN NaN NaN 6594,6651,6664,7253,2820,6706,6721,12606,480,1... ... 0.200000 False 2 1 3 False False False 0.0 False

5 rows × 37 columns

In [26]:
funnel['is_checkout'] = funnel['event'] == 'checkout'
gb = funnel.groupby(['person', 'session_id'])['is_checkout']
funnel['session_total_checkouts'] = pd.Series(np.repeat(gb.sum(), gb.size().values)).values

funnel['session_has_checkout'] = funnel['session_total_checkouts'] > 0

funnel.head()
Out[26]:
timestamp event person url sku model condition storage color skus ... session_cumno session_total_events session_first session_last is_conversion session_total_conversions session_has_conversion is_checkout session_total_checkouts session_has_checkout
1507286 2018-05-17 12:27:47 checkout 0008ed71 NaN 3372.0 Samsung Galaxy S6 Flat Muito Bom 32GB Dourado NaN ... 0 1 True True False 0.0 False True 1.0 True
2336760 2018-05-17 13:44:59 visited site 0008ed71 NaN NaN NaN NaN NaN NaN NaN ... 0 2 True False False 0.0 False False 1.0 True
1507716 2018-05-17 13:45:00 checkout 0008ed71 NaN 8247.0 iPhone SE Bom 64GB Cinza espacial NaN ... 1 2 False True False 0.0 False True 1.0 True
2336761 2018-05-17 16:21:54 visited site 0008ed71 NaN NaN NaN NaN NaN NaN NaN ... 0 3 True False False 0.0 False False 1.0 True
2122051 2018-05-17 16:22:06 generic listing 0008ed71 NaN NaN NaN NaN NaN NaN 6594,6651,6664,7253,2820,6706,6721,12606,480,1... ... 1 3 False False False 0.0 False False 1.0 True

5 rows × 40 columns

In [27]:
funnel['ad_origin'] = (funnel['event'] == 'ad campaign hit') & ((funnel['session_first']) | (funnel['session_cumno'] == 1))
gb = funnel.groupby(['person', 'session_id'])['ad_origin']

funnel['session_ad'] = pd.Series(np.repeat(gb.sum().values, gb.size().values)).values
funnel['session_ad'] = funnel['session_ad'] > 0

funnel.head()
Out[27]:
timestamp event person url sku model condition storage color skus ... session_first session_last is_conversion session_total_conversions session_has_conversion is_checkout session_total_checkouts session_has_checkout ad_origin session_ad
1507286 2018-05-17 12:27:47 checkout 0008ed71 NaN 3372.0 Samsung Galaxy S6 Flat Muito Bom 32GB Dourado NaN ... True True False 0.0 False True 1.0 True False False
2336760 2018-05-17 13:44:59 visited site 0008ed71 NaN NaN NaN NaN NaN NaN NaN ... True False False 0.0 False False 1.0 True False False
1507716 2018-05-17 13:45:00 checkout 0008ed71 NaN 8247.0 iPhone SE Bom 64GB Cinza espacial NaN ... False True False 0.0 False True 1.0 True False False
2336761 2018-05-17 16:21:54 visited site 0008ed71 NaN NaN NaN NaN NaN NaN NaN ... True False False 0.0 False False 1.0 True False False
2122051 2018-05-17 16:22:06 generic listing 0008ed71 NaN NaN NaN NaN NaN NaN 6594,6651,6664,7253,2820,6706,6721,12606,480,1... ... False False False 0.0 False False 1.0 True False False

5 rows × 42 columns

In [28]:
gb = funnel.groupby(['person', 'session_id'])['timestamp']

funnel['session_timestamp_first'] = pd.Series(np.repeat(gb.min().values, gb.size().values)).values
funnel['session_timestamp_last'] = pd.Series(np.repeat(gb.max().values, gb.size().values)).values

funnel.head()
Out[28]:
timestamp event person url sku model condition storage color skus ... is_conversion session_total_conversions session_has_conversion is_checkout session_total_checkouts session_has_checkout ad_origin session_ad session_timestamp_first session_timestamp_last
1507286 2018-05-17 12:27:47 checkout 0008ed71 NaN 3372.0 Samsung Galaxy S6 Flat Muito Bom 32GB Dourado NaN ... False 0.0 False True 1.0 True False False 2018-05-17 12:27:47 2018-05-17 12:27:47
2336760 2018-05-17 13:44:59 visited site 0008ed71 NaN NaN NaN NaN NaN NaN NaN ... False 0.0 False False 1.0 True False False 2018-05-17 13:44:59 2018-05-17 13:45:00
1507716 2018-05-17 13:45:00 checkout 0008ed71 NaN 8247.0 iPhone SE Bom 64GB Cinza espacial NaN ... False 0.0 False True 1.0 True False False 2018-05-17 13:44:59 2018-05-17 13:45:00
2336761 2018-05-17 16:21:54 visited site 0008ed71 NaN NaN NaN NaN NaN NaN NaN ... False 0.0 False False 1.0 True False False 2018-05-17 16:21:54 2018-05-17 16:28:37
2122051 2018-05-17 16:22:06 generic listing 0008ed71 NaN NaN NaN NaN NaN NaN 6594,6651,6664,7253,2820,6706,6721,12606,480,1... ... False 0.0 False False 1.0 True False False 2018-05-17 16:21:54 2018-05-17 16:28:37

5 rows × 44 columns

In [29]:
funnel[cols].head()
Out[29]:
person timestamp time_diff_min session_id event session_total_events session_cumno session_first session_last
1507286 0008ed71 2018-05-17 12:27:47 0.000000 0 checkout 1 0 True True
2336760 0008ed71 2018-05-17 13:44:59 77.200000 1 visited site 2 0 True False
1507716 0008ed71 2018-05-17 13:45:00 0.016667 1 checkout 2 1 False True
2336761 0008ed71 2018-05-17 16:21:54 156.900000 2 visited site 3 0 True False
2122051 0008ed71 2018-05-17 16:22:06 0.200000 2 generic listing 3 1 False False
In [30]:
cols_csv = ['time_diff_min', \
        'session_id', 'session_total_events', \
        'session_cumno', 'session_first', 'session_last', \
        'session_total_conversions', 'session_has_conversion', \
        'session_total_checkouts', 'session_has_checkout', 'session_ad',
        'session_timestamp_first', 'session_timestamp_last']

funnel = funnel.sort_index()

funnel[cols].head()
Out[30]:
person timestamp time_diff_min session_id event session_total_events session_cumno session_first session_last
0 4886f805 2018-05-18 00:11:59 0.050000 0 viewed product 9 7 False False
1 ad93850f 2018-05-18 00:11:27 0.016667 2 viewed product 31 2 False False
2 0297fc1e 2018-05-18 00:11:16 0.116667 78 viewed product 2 1 False True
3 2d681dd8 2018-05-18 00:11:14 0.616667 0 viewed product 10 5 False False
4 cccea85e 2018-05-18 00:11:09 0.066667 11 viewed product 65 55 False False
In [31]:
# Check the session columns with the events ordered by person and timestamp
cols_check_ord = ['person', 'timestamp', 'time_diff_min',
                  'session_id', 'session_total_events',
                  'session_cumno', 'session_first', 'session_last',
                  'session_total_conversions', 'session_has_conversion',
                  'session_total_checkouts', 'session_has_checkout', 'session_ad',
                  'session_timestamp_first', 'session_timestamp_last']

funnel.sort_values(['person', 'timestamp'])[cols_check_ord].head()
Out[31]:
person timestamp time_diff_min session_id session_total_events session_cumno session_first session_last session_total_conversions session_has_conversion session_total_checkouts session_has_checkout session_ad session_timestamp_first session_timestamp_last
1507286 0008ed71 2018-05-17 12:27:47 0.000000 0 1 0 True True 0.0 False 1.0 True False 2018-05-17 12:27:47 2018-05-17 12:27:47
2336760 0008ed71 2018-05-17 13:44:59 77.200000 1 2 0 True False 0.0 False 1.0 True False 2018-05-17 13:44:59 2018-05-17 13:45:00
1507716 0008ed71 2018-05-17 13:45:00 0.016667 1 2 1 False True 0.0 False 1.0 True False 2018-05-17 13:44:59 2018-05-17 13:45:00
2336761 0008ed71 2018-05-17 16:21:54 156.900000 2 3 0 True False 0.0 False 1.0 True False 2018-05-17 16:21:54 2018-05-17 16:28:37
2122051 0008ed71 2018-05-17 16:22:06 0.200000 2 3 1 False False 0.0 False 1.0 True False 2018-05-17 16:21:54 2018-05-17 16:28:37
In [32]:
funnel[cols_csv].to_csv('data/sessions.csv', index=False)
In [33]:
sessions_csv = pd.read_csv("data/sessions.csv", low_memory=False)
display(sessions_csv.head())
display(sessions_csv.shape)
time_diff_min session_id session_total_events session_cumno session_first session_last session_total_conversions session_has_conversion session_total_checkouts session_has_checkout session_ad session_timestamp_first session_timestamp_last
0 0.050000 0 9 7 False False 0.0 False 1.0 True False 2018-05-18 00:07:22 2018-05-18 00:30:30
1 0.016667 2 31 2 False False 0.0 False 0.0 False True 2018-05-18 00:11:26 2018-05-18 00:23:33
2 0.116667 78 2 1 False True 0.0 False 0.0 False False 2018-05-18 00:11:09 2018-05-18 00:11:16
3 0.616667 0 10 5 False False 0.0 False 0.0 False False 2018-05-18 00:08:29 2018-05-18 00:11:25
4 0.066667 11 65 55 False False 0.0 False 0.0 False False 2018-05-17 23:52:57 2018-05-18 00:12:50
(2341681, 13)
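
sessions.csv is written without an index and has the same number of rows as the events file (2341681), so it can presumably be attached back to the main dataset by position. The following is a minimal sketch of that idea on toy data; the frames `events` and `sessions` and the name `events_with_sessions` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for the events DataFrame and for data/sessions.csv
# (illustrative frames; values copied from the first rows shown above).
events = pd.DataFrame({
    'person': ['4886f805', 'ad93850f', '0297fc1e'],
    'event': ['viewed product', 'viewed product', 'viewed product'],
})
sessions = pd.DataFrame({
    'session_id': [0, 2, 78],
    'session_total_events': [9, 31, 2],
})

# A positional join is only meaningful if both frames have the same
# number of rows and were written in the same row order.
assert len(events) == len(sessions)

# Reset both indexes so that pd.concat aligns rows by position.
events_with_sessions = pd.concat(
    [events.reset_index(drop=True), sessions.reset_index(drop=True)],
    axis=1,
)
print(events_with_sessions)
```

If the row order of either file changes (for example after sorting or filtering), a positional join like this silently misaligns the features, so checking the lengths first is a cheap safeguard.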

prices.csv

In [2]:
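# Numeric code for each condition; it is later used as the `condition` parameter of the search URL
mapping_conditions = {'Bom': 15,
                      'Muito Bom': 6,
                      'Excelente': 5,
                      'Bom - Sem Touch ID': 17,
                      'Novo': 18,
                      np.nan: 0
                     }

# Keep only the events that refer to a product; .copy() avoids modifying a view of df
df_tmp = df.loc[df['model'].notnull(), ['sku','model','color','storage','condition']].copy()
# Search text for each product: model + storage + color
df_tmp['to_search'] = df_tmp['model'] + ' ' + df_tmp['storage'].astype(str) + ' ' + df_tmp['color']
df_tmp['condition_n'] = df_tmp['condition'].transform(lambda x: mapping_conditions[x])
df_tmp['to_search+condition_n'] = df_tmp['to_search'] + ' ' + df_tmp['condition_n'].astype(str)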

df_tmp = df_tmp.drop_duplicates()
df_tmp = df_tmp.dropna()
display(df_tmp.head())

# Map each search text to the list of condition codes seen for it
dic_modelos_condiciones = {}
for m, c in zip(df_tmp['to_search'], df_tmp['condition_n']):
    dic_modelos_condiciones.setdefault(m, []).append(c)
sku model color storage condition to_search condition_n to_search+condition_n
0 9288.0 Samsung Galaxy J7 Prime Dourado 32GB Excelente Samsung Galaxy J7 Prime 32GB Dourado 5 Samsung Galaxy J7 Prime 32GB Dourado 5
1 304.0 iPhone 5s Cinza espacial 32GB Muito Bom iPhone 5s 32GB Cinza espacial 6 iPhone 5s 32GB Cinza espacial 6
2 6888.0 iPhone 6S Prateado 64GB Muito Bom iPhone 6S 64GB Prateado 6 iPhone 6S 64GB Prateado 6
3 11890.0 iPhone 7 Vermelho 128GB Bom iPhone 7 128GB Vermelho 15 iPhone 7 128GB Vermelho 15
4 7517.0 LG G4 H818P Branco 32GB Excelente LG G4 H818P 32GB Branco 5 LG G4 H818P 32GB Branco 5
In [5]:
def request_soup(url):
    """Download url and return its HTML parsed as a BeautifulSoup object."""
    try:
        req = requests.get(url)
    except requests.exceptions.ConnectionError as err:
        raise ConnectionError(f"Connection with {url} refused.") from err
    return BeautifulSoup(req.content, "lxml")

BASE = 'https://www.trocafone.com/comprar/list?'
MODELO = 'q'
CONDICION = 'condition'

precios = {}
for i, (modelo, condiciones) in enumerate(dic_modelos_condiciones.items(), start=1):
    print(f'Iteración {i} de {len(dic_modelos_condiciones)}:')
    modelo_s = modelo.replace(' ','+')
    for condicion in condiciones:
        url = f'{BASE}{MODELO}={modelo_s}&{CONDICION}={condicion}'
        print(f'\t * {url}: ',end='')
        soup = request_soup(url)
        precio = soup.find(class_='price-value')
        # Fall back to '0' when the page shows no price
        precio = precio.string if precio and precio.string else '0'
        print(f'${precio}')
        # Convert from the Brazilian format (e.g. 12.900,00) to a float (12900.0)
        precios[f'{modelo} {condicion}'] = float(precio.replace('.','').replace(',','.'))
Iteración 1 de 810:
	 * https://www.trocafone.com/comprar/list?q=Samsung+Galaxy+J7+Prime+32GB+Dourado&condition=5: $699,00
	 * https://www.trocafone.com/comprar/list?q=Samsung+Galaxy+J7+Prime+32GB+Dourado&condition=6: $2.569,00
	 * https://www.trocafone.com/comprar/list?q=Samsung+Galaxy+J7+Prime+32GB+Dourado&condition=15: $2.229,00
Iteración 2 de 810:
	 * https://www.trocafone.com/comprar/list?q=iPhone+5s+32GB+Cinza+espacial&condition=6: $879,00
	 * https://www.trocafone.com/comprar/list?q=iPhone+5s+32GB+Cinza+espacial&condition=5: $939,00
	 * https://www.trocafone.com/comprar/list?q=iPhone+5s+32GB+Cinza+espacial&condition=17: $949,00
	 * https://www.trocafone.com/comprar/list?q=iPhone+5s+32GB+Cinza+espacial&condition=15: $779,00
Iteración 3 de 810:
	 * https://www.trocafone.com/comprar/list?q=iPhone+6S+64GB+Prateado&condition=6: $1.779,00
	 * https://www.trocafone.com/comprar/list?q=iPhone+6S+64GB+Prateado&condition=5: 

KeyboardInterrupt: This process is very time-consuming. Execution was interrupted because its results had already been saved.
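
The scraped prices arrive as text in Brazilian format, with `.` as the thousands separator and `,` as the decimal separator (for example `2.569,00`). A minimal sketch of that conversion as a standalone helper follows; the name `parse_precio_br` is hypothetical and not part of the original notebook, and the fallback to 0.0 for a missing price is an assumption that mirrors the loop above.

```python
def parse_precio_br(texto):
    """Convert a Brazilian-formatted price such as '2.569,00' to a float.

    Returns 0.0 when no price text is available (assumed fallback).
    """
    if not texto:
        return 0.0
    # Drop the thousands separator first, then turn the decimal comma into a dot.
    return float(texto.strip().replace('.', '').replace(',', '.'))


print(parse_precio_br('699,00'))
print(parse_precio_br('2.569,00'))
print(parse_precio_br(None))
```

The order of the two replacements matters: swapping the comma first would leave two dots in the string and `float` would fail.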
In [ ]:
# Look up the scraped price of each row, in the same order as df_tmp
orden = df_tmp['to_search+condition_n']

df_tmp['precio_reales'] = [precios[modelo] for modelo in orden]
In [ ]:
df_tmp[['sku','precio_reales']].to_csv('data/prices.csv', index=False)
In [6]:
prices_csv = pd.read_csv("data/prices.csv", low_memory=False)
display(prices_csv.head())
display(prices_csv.shape)
sku precio_reales
0 9288.0 749.0
1 304.0 1679.0
2 6888.0 2329.0
3 11890.0 2469.0
4 7517.0 1939.0
(2328, 2)
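
Unlike sessions.csv, prices.csv has one row per product rather than one per event, so it would have to be joined to the main dataset by `sku`. The following is a minimal sketch of such a join on toy data, assuming a left merge is the intended way to attach the prices; the frames `events` and `prices` and the name `events_with_prices` are illustrative.

```python
import pandas as pd

# Toy stand-ins for the events DataFrame and for data/prices.csv
# (illustrative frames; values copied from rows shown in this notebook).
events = pd.DataFrame({
    'person': ['4886f805', 'ad93850f', '0008ed71'],
    'sku': [9288.0, 304.0, float('nan')],  # events without a product have no sku
})
prices = pd.DataFrame({
    'sku': [9288.0, 304.0, 6888.0],
    'precio_reales': [749.0, 1679.0, 2329.0],
})

# Keep a single price per sku so the merge cannot duplicate event rows.
prices_unique = prices.drop_duplicates(subset='sku')

# A left merge keeps every event; events without a matching sku get NaN.
events_with_prices = events.merge(prices_unique, on='sku', how='left')
print(events_with_prices)
```

A left merge preserves the number of events, which can be checked afterwards by comparing `len(events_with_prices)` with `len(events)`.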