API Connections, Dataframes + Pandas

Objectives
Introduction
HTTP Requests
OAuth
1. Access Tokens
2. OAuth Requests
Pandas
1. Debugging
Visualization
Summary
More APIs

Objectives

Explain an HTTP Request
Explain OAuth
Use the requests package to make HTTP GET requests
Use the requests package to make HTTP GET requests with headers and URL parameters
Create a DataFrame using the Pandas Package
Use basic Pandas methods such as:
- df.head() / df.tail()
- df[col].plot(kind = 'barh')
Visualize Results Using Folium

Introduction

APIs are a big buzzword in the tech industry. So what is an API, you ask? API stands for Application Programming Interface. Think of it as a protocol for how to make requests to, and communicate with, another server.

But before we get to APIs, we should have a general understanding of how HTTP requests work. Often, we just type a website's domain into the URL bar and hit Enter. Sometimes we don't even do that; we just google it and click the link. A lot is happening in the background, so let's explore this process a little further.

HTTP Requests

HTTP stands for Hypertext Transfer Protocol. Like many internet protocols, it is standardized by the Internet Engineering Task Force (IETF) through documents called Requests for Comments (RFCs). We're going to start with the simplest HTTP method: GET.

To learn more about HTTP methods see:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods
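To see what a GET request actually contains, here is a sketch of the raw text a client sends in HTTP/1.1. It has a request line (method, path, protocol version), a Host header, any other headers, and a blank line that ends the header section. The build_get_request helper is hypothetical and only builds the string. Nothing is sent over the network, and real clients such as browsers add many more headers.

```python
def build_get_request(host, path="/", headers=None):
    """Build the raw text of a minimal HTTP/1.1 GET request (illustration only)."""
    # Request line: method, path, and protocol version; Host is required in HTTP/1.1
    lines = ["GET {} HTTP/1.1".format(path), "Host: {}".format(host)]
    # Any extra headers go one per line as "Name: value"
    for name, value in (headers or {}).items():
        lines.append("{}: {}".format(name, value))
    # HTTP lines end in CRLF, and an empty line marks the end of the headers
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_get_request("flatironschool.com"))
```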

Python's Requests Package

The first thing to understand when dealing with APIs is how to make GET requests in general. To do this, we'll use Python's requests package.

https://requests.readthedocs.io/en/master/

Making a GET request

Let's take a look at how to make a GET request with Python to retrieve a web page.

In [1]:

import requests

In [2]:

response = requests.get('https://flatironschool.com')
print('Type:', type(response), '\n')
print('Response:', response, '\n')
print('Response text:\n', response.text[:500])

Type: <class 'requests.models.Response'> 

Response: <Response [200]> 

Response text:
 <!DOCTYPE html><html lang="en"><head><meta charSet="utf-8"/><meta http-equiv="x-ua-compatible" content="ie=edge"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/><meta name="google-site-verification" content="X--Dsxcv97NzPomhlz80wswUgUOF8iMxYhmaY-qNHFY"/><link rel="dns-prefetch" href="https://images.ctfassets.net"/><link rel="dns-prefetch" href="https://www.googletagmanager.com"/><link rel="dns-prefetch" href="https://www.google-analytics.com"/><link rel="dns-prefetch" hre

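Voila! We just retrieved a web page using Python rather than our web browser! First, printing the response shows its HTTP status code: 200 means the request succeeded. From there, we accessed the web page itself (or more precisely, its HTML document) through the .text attribute of the response object. Let's take a quick aside and look at some other common HTTP response codes. They are grouped by their first digit: 1xx codes are informational, 2xx mean success, 3xx mean redirection, 4xx mean a client error (such as 404 Not Found), and 5xx mean a server error.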

HTTP Response Codes
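Python's standard library includes an http.HTTPStatus enum that maps each status code to its standard reason phrase. This is a handy way to look up codes you run into. Here is a small sketch:

```python
from http import HTTPStatus

# Map a handful of common status codes to their standard reason phrases
common_codes = {code: HTTPStatus(code).phrase for code in (200, 301, 400, 401, 403, 404, 500)}

for code, phrase in common_codes.items():
    print(code, phrase)
```

This prints one line per code, for example `404 Not Found` and `500 Internal Server Error`.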

Let's try retrieving another webpage:

In [3]:

#The Electronic Frontier Foundation (EFF) website; advocating for data privacy and an open internet
response = requests.get('https://www.eff.org')
print(response)
print(response.text[:2500])

<Response [200]>
<!DOCTYPE html>
  <!--[if IEMobile 7]><html class="no-js ie iem7" lang="en" dir="ltr"><![endif]-->
  <!--[if lte IE 6]><html class="no-js ie lt-ie9 lt-ie8 lt-ie7" lang="en" dir="ltr"><![endif]-->
  <!--[if (IE 7)&(!IEMobile)]><html class="no-js ie lt-ie9 lt-ie8" lang="en" dir="ltr"><![endif]-->
  <!--[if IE 8]><html class="no-js ie lt-ie9" lang="en" dir="ltr"><![endif]-->
  <!--[if (gte IE 9)|(gt IEMobile 7)]><html class="no-js ie" lang="en" dir="ltr" prefix="fb: http://ogp.me/ns/fb# og: http://ogp.me/ns#"><![endif]-->
  <!--[if !IE]><!--><html class="no-js" lang="en" dir="ltr" prefix="fb: http://ogp.me/ns/fb# og: http://ogp.me/ns#"><!--<![endif]-->
<head>
  <meta charset="utf-8" />
<link href="https://www.eff.org/es" rel="alternate" hreflang="es" />
<link href="https://www.eff.org/sv" rel="alternate" hreflang="sv" />
<link href="https://www.eff.org/th" rel="alternate" hreflang="th" />
<link href="https://www.eff.org/tr" rel="alternate" hreflang="tr" />
<link href="https://www.eff.org/ur" rel="alternate" hreflang="ur" />
<link href="https://www.eff.org/vi" rel="alternate" hreflang="vi" />
<link rel="shortcut icon" href="https://www.eff.org/sites/all/themes/frontier/favicon.ico" type="image/vnd.microsoft.icon" />
<link href="https://www.eff.org/ru" rel="alternate" hreflang="ru" />
<link rel="profile" href="http://www.w3.org/1999/xhtml/vocab" />
<meta name="HandheldFriendly" content="true" />
<meta name="MobileOptimized" content="width" />
<meta http-equiv="cleartype" content="on" />
<link rel="apple-touch-icon-precomposed" href="https://www.eff.org/sites/all/themes/phoenix/apple-touch-icon-precomposed-72x72.png" sizes="72x72" />
<link rel="apple-touch-icon-precomposed" href="https://www.eff.org/sites/all/themes/phoenix/apple-touch-icon-precomposed-114x114.png" sizes="114x114" />
<link rel="apple-touch-icon-precomposed" href="https://www.eff.org/sites/all/themes/phoenix/apple-touch-icon-precomposed-144x144.png" sizes="144x144" />
<link href="https://www.eff.org/sh" rel="alternate" hreflang="sh" />
<link rel="apple-touch-icon-precomposed" href="https://www.eff.org/sites/all/themes/phoenix/apple-touch-icon-precomposed.png" />
<link href="https://www.eff.org/ro" rel="alternate" hreflang="ro" />
<link href="https://www.eff.org/am" rel="alternate" hreflang="am" />
<link href="https://www.eff.org/fr" rel="alternate" hreflang="fr" />
<link href="https://www.eff.org/" rel="alternate" hreflang="en" />
<link href="https://www.eff.org/nl" rel="alternate" hreflang="nl"

Success! As you can see, response.text holds the HTML code for the URL we requested. This same exchange is the basis of web browsers themselves. Every time you enter a new URL or click a link, your browser makes a GET request for that page. It then renders the returned HTML as a visual display on screen.
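A browser shows an error page when a request fails. In contrast, requests.get() returns normally even for a 404 or 500 response, so a script should check the status before trusting .text. Below is a minimal sketch that uses the Response object's raise_for_status() method plus a timeout. The function name get_page_html is hypothetical.

```python
import requests


def get_page_html(url, timeout=10):
    """Fetch a page and return its HTML, raising requests.HTTPError on 4xx/5xx codes."""
    # timeout (in seconds) keeps the call from hanging forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # raise_for_status() does nothing for successful codes and raises HTTPError for 4xx/5xx
    response.raise_for_status()
    return response.text
```

Usage would look like `html = get_page_html('https://www.eff.org')`. A missing page then fails loudly instead of silently returning an error page's HTML.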

OAuth

Some requests are a bit more complicated. Often, websites require identity verification, such as a login. This helps address a variety of issues: protecting privacy, limiting access to content, and tracking users' history. OAuth extends this idea by allowing third parties, such as apps, to access a user's information without the user handing over their password.

In the words of the Internet Engineering Task Force, "The OAuth 2.0 authorization framework enables a third-party application to obtain limited access to an HTTP service, either on behalf of a resource owner by orchestrating an approval interaction between the resource owner and the HTTP service, or by allowing the third-party application to obtain access on its own behalf. This specification replaces and obsoletes the OAuth 1.0 protocol described in RFC 5849."

See https://oauth.net/2/ or https://tools.ietf.org/html/rfc6749 for more details.

For a concrete example, see Yelp's authentication guide, which covers the API we're about to use!
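To make the RFC more concrete, here is a sketch of the token request in OAuth 2.0's client credentials grant (RFC 6749, section 4.4). In this grant, an application POSTs `grant_type=client_credentials` to a token endpoint and authenticates itself with HTTP Basic auth. The token URL and credentials below are hypothetical placeholders, and the request is only prepared, never sent. This is one OAuth flow among several. The Yelp code later in this lesson simply sends a key as a bearer token.

```python
import requests

# Hypothetical token endpoint and credentials, for illustration only
TOKEN_URL = "https://auth.example.com/oauth/token"
CLIENT_ID = "my-client-id"
CLIENT_SECRET = "my-client-secret"

# Client credentials grant: POST grant_type to the token endpoint,
# authenticating the client itself with HTTP Basic auth
token_request = requests.Request(
    "POST",
    TOKEN_URL,
    data={"grant_type": "client_credentials"},
    auth=(CLIENT_ID, CLIENT_SECRET),
).prepare()  # build the request without sending it

print(token_request.method, token_request.url)
print(token_request.headers["Content-Type"])
print(token_request.body)

# A real server would reply with JSON containing an "access_token", which later
# requests send in an "Authorization: Bearer <token>" header
```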

Access Tokens

With that, let's go grab an access token from an API site and make some API calls! Point your browser to Yelp's developer site and create an app to obtain an API access token.

Now it's time to start making some API calls!

In [4]:

#As a general rule of thumb, don't store passwords or keys in a main file like this!
#Instead, you would normally store them in a separate file, such as passwords.py, which you then import.
#Replace the '######' placeholders with your own credentials.
#This snippet may stop working if the same key is used repeatedly, since APIs usually limit calls per key.
client_id = '######'
api_key = '######'
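Another common alternative to a separate passwords.py file is to keep keys in environment variables. That way, they never appear in your code or notebook. Here is a minimal sketch. The variable name YELP_API_KEY and the helper load_api_key are hypothetical choices, not something Yelp requires.

```python
import os


def load_api_key(var_name="YELP_API_KEY"):
    """Read an API key from an environment variable instead of hard-coding it."""
    key = os.environ.get(var_name)
    if not key:
        # Fail loudly so a missing key isn't silently sent as an empty string
        raise RuntimeError("Set the {} environment variable first.".format(var_name))
    return key
```

For example, run `export YELP_API_KEY=...` in your terminal before starting Jupyter, then call `api_key = load_api_key()` in the notebook.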

Example Request with OAuth

Great! Now that we have our access token set up, we can make a request to the API. Every API has a specific format required to make requests, or calls, to the server. Here's how to call the Yelp API using the Python requests package. In future lessons, you'll practice translating an API's documentation into code yourself! As a reference, here's the documentation for the Yelp API: https://www.yelp.com/developers/documentation/v3/get_started

In [5]:

term = 'Mexican'
location = 'Astoria NY'
SEARCH_LIMIT = 10

url = 'https://api.yelp.com/v3/businesses/search'

#Yelp expects the API key as a bearer token in the Authorization header
headers = {
    'Authorization': 'Bearer {}'.format(api_key),
}

#requests URL-encodes these values itself (spaces become '+'), so no manual replacing is needed
url_params = {
    'term': term,
    'location': location,
    'limit': SEARCH_LIMIT
}
response = requests.get(url, headers=headers, params=url_params)
print(response)
print(type(response.text))
print(response.text[:1000])

<Response [200]>
<class 'str'>
{"businesses": [{"id": "yvva7IYpD6J7OfKlCdQrkw", "alias": "mi-espiguita-taqueria-astoria", "name": "Mi Espiguita Taqueria", "image_url": "https://s3-media2.fl.yelpcdn.com/bphoto/TEho39G01VJX05mNhI8W8A/o.jpg", "is_closed": false, "url": "https://www.yelp.com/biz/mi-espiguita-taqueria-astoria?adjust_creative=xNHtXRpNa-MXGFJJTHHUvw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=xNHtXRpNa-MXGFJJTHHUvw", "review_count": 113, "categories": [{"alias": "mexican", "title": "Mexican"}], "rating": 4.5, "coordinates": {"latitude": 40.7612033639422, "longitude": -73.9261436462402}, "transactions": ["pickup", "delivery"], "price": "$", "location": {"address1": "32-44 31st St", "address2": "", "address3": "", "city": "Astoria", "zip_code": "11106", "country": "US", "state": "NY", "display_address": ["32-44 31st St", "Astoria, NY 11106"]}, "phone": "+17187775648", "display_phone": "(718) 777-5648", "distance": 714.301080232381}, {"id": "jzVv_21473lAMYXIhVbuTA", "alias": "de-mole
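The requests package URL-encodes the params dictionary for you, so replacing spaces with '+' by hand is unnecessary. It actually backfires, because the literal '+' then gets encoded as %2B. You can check the final URL without calling the API by preparing, rather than sending, a request. The sketch below uses only the search URL and parameter values from the cell above.

```python
from requests import Request

SEARCH_URL = 'https://api.yelp.com/v3/businesses/search'

# Let requests encode the raw values: the space in 'Astoria NY' becomes '+'
plain = Request('GET', SEARCH_URL, params={'term': 'Mexican', 'location': 'Astoria NY', 'limit': 10}).prepare()
print(plain.url)

# Pre-replacing spaces by hand sends a literal '+', which gets encoded as %2B
manual = Request('GET', SEARCH_URL, params={'term': 'Mexican', 'location': 'Astoria+NY', 'limit': 10}).prepare()
print(manual.url)
```

The first URL ends in `location=Astoria+NY&limit=10`, while the second ends in `location=Astoria%2BNY&limit=10`.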

JSON

Mmm, look at that! We have a nice, nifty little return now! As you can see, the body of the response is a string, but what kind of data structures does it remind you of?

To start, there are the outer curly brackets:
{"businesses":

Hopefully you're thinking, 'Hey, that's just like a Python dictionary!'

Then within that, we have what appears to be a list of dictionaries:
[{"id": "yvva7IYpD6J7OfKlCdQrkw",

This response is an example of the JSON (JavaScript Object Notation) format. JSON objects map closely onto Python dictionaries, and JSON arrays map onto Python lists, so its structure should feel familiar.
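Python's built-in json module turns a JSON string into those familiar objects, and response.json() does the same parsing on the response body. Here is a small sketch on a trimmed-down string in the same shape as the Yelp output above. Only a few fields are kept.

```python
import json

# A trimmed-down string in the same shape as the Yelp response above
raw = '{"businesses": [{"id": "yvva7IYpD6J7OfKlCdQrkw", "name": "Mi Espiguita Taqueria", "rating": 4.5, "is_closed": false}]}'

data = json.loads(raw)  # parse the JSON text into Python objects
print(type(data))                          # <class 'dict'>  (JSON object)
print(type(data['businesses']))            # <class 'list'>  (JSON array)
print(data['businesses'][0]['name'])       # Mi Espiguita Taqueria
print(data['businesses'][0]['is_closed'])  # False  (JSON false becomes Python False)
```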

DataFrames and Pandas

We can also take JSON and convert it into a DataFrame, a spreadsheet-style object (à la Excel), using the Pandas package:

In [6]:

#Import the package under an alias (shorter to type later)
import pandas as pd

In [7]:

#Create a dataframe
df = pd.DataFrame.from_dict(response.json())

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-3fd8d17ec009> in <module>
      1 #Create a dataframe
----> 2 df = pd.DataFrame.from_dict(response.json())

~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in from_dict(cls, data, orient, dtype, columns)
   1136             raise ValueError('only recognize index or columns for orient')
   1137 
-> 1138         return cls(data, index=index, columns=columns, dtype=dtype)
   1139 
   1140     def to_numpy(self, dtype=None, copy=False):

~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    390                                  dtype=dtype, copy=copy)
    391         elif isinstance(data, dict):
--> 392             mgr = init_dict(data, index, columns, dtype=dtype)
    393         elif isinstance(data, ma.MaskedArray):
    394             import numpy.ma.mrecords as mrecords

~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in init_dict(data, index, columns, dtype)
    210         arrays = [data[k] for k in keys]
    211 
--> 212     return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    213 
    214 

~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype)
     49     # figure out the index, if necessary
     50     if index is None:
---> 51         index = extract_index(arrays)
     52     else:
     53         index = ensure_index(index)

~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in extract_index(data)
    318 
    319             if have_dicts:
--> 320                 raise ValueError('Mixing dicts with non-Series may lead to '
    321                                  'ambiguous ordering.')
    322 

ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.

Debugging

Whoops, what's going on here!? Notice from our earlier preview that the response has a hierarchy, with values nested inside other values. Let's investigate further to see what the problem is.

First, recall that the overall structure of the response is a dictionary. Let's look at its keys.

In [8]:

response.json().keys()

Out[8]:

dict_keys(['businesses', 'total', 'region'])

Now let's go a bit further and start to preview what's stored in each of the values for these keys.

In [9]:

for key in response.json().keys():
    print(key)
    value = response.json()[key] #Use standard dictionary indexing
    print(type(value)) #What type is it?
    print('\n\n') #Separate out data

businesses
<class 'list'>



total
<class 'int'>



region
<class 'dict'>
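These types explain the earlier ValueError. Roughly speaking, pandas turns each top-level key into a column. A list supplies values by position, a dict supplies values by label (its keys), and a scalar is broadcast. When a list and a dict are both present, pandas refuses to guess how to line up the rows. Here is a minimal sketch that reproduces this with a toy dictionary shaped like the response. Its names and numbers are made up.

```python
import pandas as pd

# A toy dictionary with the same shape as the Yelp response (values are made up)
toy_response = {
    'businesses': [{'name': 'Taqueria A'}, {'name': 'Taqueria B'}],  # list
    'total': 2,                                                      # int
    'region': {'center': {'latitude': 40.77, 'longitude': -73.92}},  # dict
}

try:
    pd.DataFrame.from_dict(toy_response)
except ValueError as err:
    print('ValueError:', err)

# Passing only the list of per-business dicts works: one row per business
toy_df = pd.DataFrame(toy_response['businesses'])
print(toy_df)
```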

Let's continue to preview these further to get a little better acquainted.

In [10]:

response.json()['businesses'][:2]

Out[10]:

[{'id': 'yvva7IYpD6J7OfKlCdQrkw',
  'alias': 'mi-espiguita-taqueria-astoria',
  'name': 'Mi Espiguita Taqueria',
  'image_url': 'https://s3-media2.fl.yelpcdn.com/bphoto/TEho39G01VJX05mNhI8W8A/o.jpg',
  'is_closed': False,
  'url': 'https://www.yelp.com/biz/mi-espiguita-taqueria-astoria?adjust_creative=xNHtXRpNa-MXGFJJTHHUvw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=xNHtXRpNa-MXGFJJTHHUvw',
  'review_count': 113,
  'categories': [{'alias': 'mexican', 'title': 'Mexican'}],
  'rating': 4.5,
  'coordinates': {'latitude': 40.7612033639422,
   'longitude': -73.9261436462402},
  'transactions': ['pickup', 'delivery'],
  'price': '$',
  'location': {'address1': '32-44 31st St',
   'address2': '',
   'address3': '',
   'city': 'Astoria',
   'zip_code': '11106',
   'country': 'US',
   'state': 'NY',
   'display_address': ['32-44 31st St', 'Astoria, NY 11106']},
  'phone': '+17187775648',
  'display_phone': '(718) 777-5648',
  'distance': 714.301080232381},
 {'id': 'jzVv_21473lAMYXIhVbuTA',
  'alias': 'de-mole-astoria-astoria',
  'name': 'De Mole Astoria',
  'image_url': 'https://s3-media4.fl.yelpcdn.com/bphoto/w56szEF0EMQ2s8DBGv4icg/o.jpg',
  'is_closed': False,
  'url': 'https://www.yelp.com/biz/de-mole-astoria-astoria?adjust_creative=xNHtXRpNa-MXGFJJTHHUvw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=xNHtXRpNa-MXGFJJTHHUvw',
  'review_count': 369,
  'categories': [{'alias': 'mexican', 'title': 'Mexican'}],
  'rating': 4.0,
  'coordinates': {'latitude': 40.7625999, 'longitude': -73.9129028},
  'transactions': ['pickup', 'delivery'],
  'price': '$$',
  'location': {'address1': '4220 30th Ave',
   'address2': '',
   'address3': '',
   'city': 'Astoria',
   'zip_code': '11103',
   'country': 'US',
   'state': 'NY',
   'display_address': ['4220 30th Ave', 'Astoria, NY 11103']},
  'phone': '+17187771655',
  'display_phone': '(718) 777-1655',
  'distance': 915.60744589032}]

In [11]:

response.json()['total']

Out[11]:

In [12]:

response.json()['region']

Out[12]:

{'center': {'longitude': -73.92219543457031, 'latitude': 40.76688875374591}}

As you can see, we're primarily interested in the 'businesses' entry, which holds a list of dictionaries, one per business.

Let's go ahead and create a dataframe from that.

In [13]:

df = pd.DataFrame.from_dict(response.json()['businesses'])
print(len(df)) #Print how many rows
print(df.columns) #Print column names
df.head() #Previews the first five rows. 
#You could also write df.head(10) to preview 10 rows or df.tail() to see the bottom

10
Index(['alias', 'categories', 'coordinates', 'display_phone', 'distance', 'id',
       'image_url', 'is_closed', 'location', 'name', 'phone', 'price',
       'rating', 'review_count', 'transactions', 'url'],
      dtype='object')

Out[13]:

	alias	categories	coordinates	display_phone	distance	id	image_url	is_closed	location	name	phone	price	rating	review_count	transactions	url
0	mi-espiguita-taqueria-astoria	[{'alias': 'mexican', 'title': 'Mexican'}]	{'latitude': 40.7612033639422, 'longitude': -7...	(718) 777-5648	714.301080	yvva7IYpD6J7OfKlCdQrkw	https://s3-media2.fl.yelpcdn.com/bphoto/TEho39...	False	{'address1': '32-44 31st St', 'address2': '', ...	Mi Espiguita Taqueria	+17187775648	$	4.5	113	[pickup, delivery]	https://www.yelp.com/biz/mi-espiguita-taqueria...
1	de-mole-astoria-astoria	[{'alias': 'mexican', 'title': 'Mexican'}]	{'latitude': 40.7625999, 'longitude': -73.9129...	(718) 777-1655	915.607446	jzVv_21473lAMYXIhVbuTA	https://s3-media4.fl.yelpcdn.com/bphoto/w56szE...	False	{'address1': '4220 30th Ave', 'address2': '', ...	De Mole Astoria	+17187771655		4.0	369	[pickup, delivery]	https://www.yelp.com/biz/de-mole-astoria-astor...
2	chela-and-garnacha-astoria	[{'alias': 'mexican', 'title': 'Mexican'}, {'a...	{'latitude': 40.7557171543477, 'longitude': -7...	(917) 832-6876	1318.326547	AUyKmFjpaVLwc3awfUnqgQ	https://s3-media1.fl.yelpcdn.com/bphoto/ChVbA1...	False	{'address1': '33-09 36th Ave', 'address2': '',...	Chela & Garnacha	+19178326876		4.5	374	[pickup, delivery]	https://www.yelp.com/biz/chela-and-garnacha-as...
3	la-flor-vieja-queens	[{'alias': 'mexican', 'title': 'Mexican'}]	{'latitude': 40.76401, 'longitude': -73.92234}	(347) 448-6120	322.282100	OwpRLG5SmzMn14WDvE9trQ	https://s3-media2.fl.yelpcdn.com/bphoto/lqMJgl...	False	{'address1': '3203 31st Ave', 'address2': None...	La flor vieja	+13474486120	NaN	5.0	4	[pickup, delivery]	https://www.yelp.com/biz/la-flor-vieja-queens?...
4	las-catrinas-mexican-bar-and-eatery-astoria	[{'alias': 'cocktailbars', 'title': 'Cocktail ...	{'latitude': 40.7614214682633, 'longitude': -7...	(917) 745-0969	642.525771	6AJwsgXr7YwsqneGVAdgzw	https://s3-media4.fl.yelpcdn.com/bphoto/xJzg6W...	False	{'address1': '32-02 Broadway', 'address2': '',...	Las Catrinas Mexican Bar & Eatery	+19177450969		4.0	319	[pickup, delivery]	https://www.yelp.com/biz/las-catrinas-mexican-...
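Several columns, such as coordinates and location, still hold nested dictionaries. If you'd rather have one value per cell, pandas' json_normalize function (available as pd.json_normalize in pandas 1.0 and later) flattens nested dicts into dotted column names. List-valued fields such as categories stay as lists. On the real data, this would be pd.json_normalize(response.json()['businesses']). The sketch below uses two records trimmed from the output above.

```python
import pandas as pd

# Two businesses trimmed from the Yelp output above (only a few fields kept)
businesses = [
    {'name': 'Mi Espiguita Taqueria', 'rating': 4.5,
     'coordinates': {'latitude': 40.7612033639422, 'longitude': -73.9261436462402}},
    {'name': 'De Mole Astoria', 'rating': 4.0,
     'coordinates': {'latitude': 40.7625999, 'longitude': -73.9129028}},
]

# json_normalize flattens nested dicts into dotted column names
flat = pd.json_normalize(businesses)
print(flat.columns.tolist())
print(flat[['name', 'coordinates.latitude', 'coordinates.longitude']])
```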

Visualization

Finally, let's put this all together and build a little map! The folium package makes this easy. First, let's build the base map.

In [14]:

import folium

In [15]:

#Retrieve the latitude and longitude of the search region's center from the Yelp response
lat_long = response.json()['region']['center']
lat = lat_long['latitude']
long = lat_long['longitude']

#Create a map of the area
yelp_map = folium.Map([lat, long])
yelp_map

Out[15]:

Adding the Mexican Restaurants

Now let's add the restaurants. Remember that in Jupyter you can pull up an object's docstring by appending a '?' to its name. Let's briefly inspect how the folium.Marker class works.

In [16]:

folium.Marker?

Great. Now let's finish building the map!

In [18]:

#Add one marker per restaurant, positioned using its nested coordinates dictionary
for row in df.index:
    lat_long = df['coordinates'][row]
    lat = lat_long['latitude']
    long = lat_long['longitude']
    name = df['name'][row]
    rating = df['rating'][row]
    price = df['price'][row] #NaN when Yelp lists no price
    details = '{} Price: {} Rating: {}'.format(name, price, rating) #Popup text
    marker = folium.Marker([lat, long], popup=details)
    marker.add_to(yelp_map)
yelp_map

Out[18]:
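One way to keep the mapping code tidy is to separate data preparation from drawing. First, build a list of (latitude, longitude, popup text) tuples with pandas, then hand each tuple to folium. Here is a sketch of the preparation step. The helper marker_records and its 'N/A' placeholder for missing prices are hypothetical choices. The example rows are copied from the DataFrame output above.

```python
import pandas as pd


def marker_records(df):
    """Return a (latitude, longitude, popup_text) tuple for each business in df."""
    records = []
    for coords, name, price, rating in zip(df['coordinates'], df['name'], df['price'], df['rating']):
        # Businesses without a listed price show up as NaN, which is not a string
        price_text = price if isinstance(price, str) else 'N/A'
        popup = '{} Price: {} Rating: {}'.format(name, price_text, rating)
        records.append((coords['latitude'], coords['longitude'], popup))
    return records


# Small example frame in the same shape as the Yelp DataFrame (second row has no price)
example = pd.DataFrame([
    {'name': 'Mi Espiguita Taqueria', 'price': '$', 'rating': 4.5,
     'coordinates': {'latitude': 40.7612033639422, 'longitude': -73.9261436462402}},
    {'name': 'La flor vieja', 'rating': 5.0,
     'coordinates': {'latitude': 40.76401, 'longitude': -73.92234}},
])
for record in marker_records(example):
    print(record)
```

Each tuple can then be drawn like in the loop above, with `for lat, lon, popup in marker_records(df): folium.Marker([lat, lon], popup=popup).add_to(yelp_map)`.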

Summary

Congratulations! We've covered a lot here! We started with HTTP, one of the fundamental protocols underlying the internet that we know and love, and made GET requests with the requests package. From there, we investigated OAuth and saw how to get an access token to use with an API such as Yelp's. Then we made requests that returned information in JSON format and transformed this data into a DataFrame using the Pandas package. Finally, we created an initial visualization of the data using folium.

More APIs to Checkout

Google Maps
Twitter
AWS
IBM's Watson
Yelp
