2. HTTP Programming

2.1. Standard library

2.1.1. http

2.1.2. urllib

Code Listing 2.5. Downloading data from the internet that must be unpacked. The data is in TSV (tab-separated values) format and, once extracted, can be parsed with the csv module by passing delimiter='\t'
import os
import urllib.request
import zipfile

data_path = 'data'

os.makedirs(data_path, exist_ok=True)

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00228/smsspamcollection.zip'
file_name = url.split('/')[-1]
dest_file = os.path.join(data_path, file_name)

data_file = 'SMSSpamCollection'
data_full = os.path.join(data_path, data_file)

urllib.request.urlretrieve(url, dest_file)

with zipfile.ZipFile(dest_file) as zip_file:
    zip_file.extract(data_file, path=data_path)
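Once the archive is extracted, the file can be read with the csv module, as the caption notes. A minimal sketch, assuming each row holds two tab-separated columns (a label and the message text); the two sample rows are made up for illustration and stand in for the real file:

```python
import csv
import io

# Hypothetical sample in the same tab-separated layout; with the real
# file, replace io.StringIO(...) with open(data_full, encoding='utf-8').
sample = io.StringIO('ham\tSee you at five\nspam\tYou have won a prize\n')

# QUOTE_NONE keeps quote characters inside messages as literal text
reader = csv.reader(sample, delimiter='\t', quoting=csv.QUOTE_NONE)
rows = [(label, text) for label, text in reader]

print(rows)
```

Disabling quoting is a design choice for free-form text: a message that contains a stray double quote would otherwise be merged with the lines that follow it.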

2.2. Third-party libraries

2.2.1. suds

2.2.2. requests

>>> import requests

>>> requests.put('http://httpbin.org/put', data = {'key':'value'})
>>> requests.delete('http://httpbin.org/delete')
>>> requests.head('http://httpbin.org/get')
>>> requests.options('http://httpbin.org/get')
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.get('http://httpbin.org/get', params=payload)
>>> print(r.url)

>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
>>> r = requests.get('http://httpbin.org/get', params=payload)
>>> print(r.url)

>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
>>> r.text
>>> r.json()
{'private_gists': 419, 'total_private_repos': 77, ...}
>>> url = 'https://api.github.com/some/endpoint'
>>> headers = {'user-agent': 'my-app/0.0.1'}

>>> r = requests.get(url, headers=headers)
>>> payload = {'key1': 'value1', 'key2': 'value2'}

>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print(r.text)
{
  ...
  "form": {
    "key2": "value2",
    "key1": "value1"
  },
  ...
}
>>> r = requests.head('http://github.com', allow_redirects=True)

>>> r.url

>>> r.history
[<Response [301]>]
>>> import json

>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}

>>> r = requests.post(url, data=json.dumps(payload))
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}

>>> r = requests.post(url, json=payload)
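The print(r.url) calls earlier in this section show how params ends up in the query string. As a rough offline illustration (it uses urllib.parse.urlencode from the standard library rather than requests itself, so treat it as an approximation of what requests produces), a list value is expanded into one key=value pair per item:

```python
from urllib.parse import urlencode

payload = {'key1': 'value1', 'key2': ['value2', 'value3']}

# doseq=True repeats the key for every item of a list value
query = urlencode(payload, doseq=True)
url = 'http://httpbin.org/get?' + query

print(url)
```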

2.2.3. Requests OAuth


$ pip install requests_oauthlib
Code Listing 2.6. Requests OAuth
from requests_oauthlib import OAuth2Session

from flask import Flask, request, redirect, session, url_for
from flask.json import jsonify

# This information is obtained upon registration of a new GitHub OAuth application
client_id = "<your client key>"
client_secret = "<your client secret>"
authorization_base_url = 'https://github.com/login/oauth/authorize'
token_url = 'https://github.com/login/oauth/access_token'

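app = Flask(__name__)
# Flask sessions need a secret key; the value below is a placeholder
app.secret_key = "<your secret key>"

# The route paths are assumptions made for this example
@app.route("/login")
def login():
    github = OAuth2Session(client_id)
    authorization_url, state = github.authorization_url(authorization_base_url)

    # State is used to prevent CSRF, keep this for later.
    session['oauth_state'] = state
    return redirect(authorization_url)

@app.route("/callback")
def callback():
    github = OAuth2Session(client_id, state=session['oauth_state'])
    # The provider redirects back here; the full callback URL carries the code
    token = github.fetch_token(token_url, client_secret=client_secret,
                               authorization_response=request.url)

    return jsonify(github.get('https://api.github.com/user').json())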

2.2.4. HTML Scraping and BeautifulSoup

$ pip install beautifulsoup4
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

soup.title
# <title>The Dormouse's story</title>

soup.title.name
# 'title'

soup.title.string
# "The Dormouse's story"

soup.title.parent.name
# 'head'

soup.p
# <p class="title"><b>The Dormouse's story</b></p>

soup.p['class']
# ['title']

soup.a
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

soup.find_all('a')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.find(id='link3')
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

# Print the target of every link in the document
for link in soup.find_all('a'):
    print(link.get('href'))

# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie

# Print only the text, with all tags stripped
print(soup.get_text())

# The Dormouse's story
# The Dormouse's story
# Once upon a time there were three little sisters; and their names were
# Elsie,
# Lacie and
# Tillie;
# and they lived at the bottom of a well.
# ...

2.3. The WSGI standard
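
WSGI (PEP 3333) defines the interface between a web server and a Python application: the application is a callable that takes the request environ and a start_response function and returns an iterable of bytes. A minimal sketch using only wsgiref from the standard library; the names application and call_app are made up for this illustration:

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A WSGI application: takes the request environ, returns body chunks."""
    body = b'Hello World!'
    headers = [('Content-Type', 'text/plain'),
               ('Content-Length', str(len(body)))]
    start_response('200 OK', headers)
    return [body]


def call_app(app, path='/'):
    """Call the app directly, the way a server would, without a socket."""
    environ = {'PATH_INFO': path}
    setup_testing_defaults(environ)   # fill in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers

    body = b''.join(app(environ, start_response))
    return captured['status'], body


print(call_app(application))
```

To serve the same callable over HTTP, it can be passed to wsgiref.simple_server.make_server('localhost', 8000, application) and started with serve_forever(); the host and port here are arbitrary.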

2.4. Web frameworks and technologies

2.4.1. Google App Engine

A powerful platform to build apps and scale automatically

  • Popular Languages - Build your application in Node.js, Java, Ruby, C#, Go, Python, or PHP—or bring your own language runtime
  • Open & Flexible - Custom runtimes allow you to bring any library and framework to App Engine by supplying a Docker container
  • Fully Managed - A fully managed environment lets you focus on code while App Engine manages infrastructure concerns
  • Monitoring, Logging & Diagnostics - Google Stackdriver gives you powerful application diagnostics to debug and monitor the health and performance of your app
  • Application Versioning - Easily host different versions of your app, easily create development, test, staging, and production environments
  • Traffic Splitting - Route incoming requests to different app versions, A/B test and do incremental feature rollouts
  • Services Ecosystem - Tap a growing ecosystem of GCP services from your app including an excellent suite of cloud developer tools

2.4.2. django

Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of Web development, so you can focus on writing your app without needing to reinvent the wheel. It’s free and open source.

  • Ridiculously fast - Django was designed to help developers take applications from concept to completion as quickly as possible.
  • Reassuringly secure - Django takes security seriously and helps developers avoid many common security mistakes.
  • Exceedingly scalable - Some of the busiest sites on the Web leverage Django’s ability to quickly and flexibly scale.
$ pip install django

2.4.3. flask

Flask is a microframework for Python based on Werkzeug, Jinja 2 and good intentions. And before you ask: It’s BSD licensed!

$ pip install Flask
$ python hello.py
 * Running on http://localhost:5000/
$ export FLASK_APP=hello.py
$ python -m flask run --host=
 * Running on
Code Listing 2.7. Simple usage of Flask
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

if __name__ == "__main__":
    app.run()
Code Listing 2.8. Flask using templates and data from user
from flask import json
from flask import Response
from flask import render_template
from flask import Flask

app = Flask(__name__)

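# The route paths below are illustrative assumptions

@app.route('/summary')
def summary():
    data = {'first_name': 'Jose', 'last_name': 'Jimenez'}

    # Serialize the dict and send it back as a JSON response
    return Response(
        response=json.dumps(data),
        status=200,
        mimetype='application/json')

@app.route('/post/<int:post_id>')
def show_post(post_id):
    # show the post with the given id, the id is an integer
    return 'Post %d' % post_id

@app.route('/hello/')
@app.route('/hello/<name>')
def hello(name=None):
    return render_template('hello.html', name=name)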

2.4.4. webapp2

webapp2 is a lightweight Python web framework compatible with Google App Engine’s webapp.

  • webapp2 is simple - it follows the simplicity of webapp, but improves it in some ways: it adds better URI routing and exception handling, a full featured response object and a more flexible dispatching mechanism.
  • webapp2 also offers the package webapp2_extras - with several optional utilities: sessions, localization, internationalization, domain and subdomain routing, secure cookies and others.
  • webapp2 can also be used outside of Google App Engine, independently of the App Engine SDK.
application: helloworld
version: 1
runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /.*
  script: main.app
import webapp2

class HelloWebapp2(webapp2.RequestHandler):
    def get(self):
        self.response.write('Hello, webapp2!')

app = webapp2.WSGIApplication([
    ('/', HelloWebapp2),
], debug=True)

2.4.5. tornado

Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed. By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections, making it ideal for long polling, WebSockets, and other applications that require a long-lived connection to each user.

import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")

def make_app():
    return tornado.web.Application([
        (r"/", MainHandler),
    ])

if __name__ == "__main__":
    app = make_app()
    app.listen(8888)  # port chosen for illustration
    tornado.ioloop.IOLoop.current().start()

2.4.6. Formatting JSON

$ echo '{"json": "obj"}' | python -m json.tool
{
    "json": "obj"
}
$ echo '{1.2:3.4}' | python -m json.tool
Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
$ curl https://api.github.com/repos/django/django/commits |python -m json.tool
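
The same pretty-printing is available from Python code. A minimal sketch with the standard json module; the 4-space indent is chosen here to match the json.tool output above:

```python
import json

raw = '{"json": "obj"}'

# Parse the text, then serialize it again with indentation
pretty = json.dumps(json.loads(raw), indent=4)
print(pretty)

# Invalid JSON (object keys must be double-quoted strings) raises an error
try:
    json.loads('{1.2:3.4}')
except json.JSONDecodeError as error:
    message = error.msg
    print(message)
```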


2.5. Utils

2.5.1. atlassian-python-api

from atlassian import Confluence
from atlassian import Jira

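# Connection details below are placeholders; replace them with your own
jira = Jira(
    url='<your Jira URL>',
    username='<your username>',
    password='<your password>')

confluence = Confluence(
    url='<your Confluence URL>',
    username='<your username>',
    password='<your password>')

# Fetch all open issues of the DEMO project
JQL = 'project = DEMO AND status NOT IN (Closed, Resolved) ORDER BY issuekey'
data = jira.jql(JQL)

# Publish the result as a new page; the space key is a placeholder
status = confluence.create_page(
    space='<your space key>',
    title='This is the title',
    body=f'This is the body. You can use <strong>HTML tags</strong>!<div>{data}</div>')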


2.6. Template

2.6.1. Jinja2

<title>{% block title %}{% endblock %}</title>
{% for user in users %}
  <li><a href="{{ user.url }}">{{ user.username }}</a></li>
{% endfor %}

2.7. Practical examples

2.7.1. A simple HTTP server

$ python -m http.server 8000 --bind
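import re
from http.server import BaseHTTPRequestHandler
from http.server import HTTPServer

SERVER = ('localhost', 8080)


class RequestHandler(BaseHTTPRequestHandler):
    def do_HEAD(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()

    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()
        # The response body must be bytes, not str
        self.wfile.write(b'<body>Hello World!</body>')

    def do_POST(self):
        if re.search('/api/v1/*', self.path):
            content_length = int(self.headers['Content-Length'])
            post_data = self.rfile.read(content_length)

            self.send_response(200)
            self.send_header('Content-type', 'text/html')
            self.end_headers()
            self.wfile.write(b'<body>Hello World!</body>')
        else:
            # Any other path is not handled by this example
            self.send_error(404)


try:
    print(f'Starting server {SERVER}, use <Ctrl-C> to stop')
    httpd = HTTPServer(SERVER, RequestHandler)
    httpd.serve_forever()

except KeyboardInterrupt:
    print('^C received, shutting down the web server...')
    httpd.socket.close()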

2.8. Exercises

2.8.1. REST API

  1. Using the Python standard library, fetch information about the repositories of the Django user on https://github.com

  2. In a web browser, generate a token in your profile at https://github.com/settings/tokens

  3. Then browse the list from Python and find the URL of the django repository.

    "name": "django",
    "full_name": "django/django",
    # look for "commits_url": ???
  4. Browse this repository and its list of commits.

  5. Give the date and description of the latest commit

  6. Find the ticket IDs (Fixed #...) from the issue tracker that were resolved in the last month

  7. Try using the requests package instead of the standard library


GET /orgs/django/repos
GET /repos/django/django/commits
$ curl https://api.github.com/orgs/django/repos
$ curl https://api.github.com/repos/django/django/commits
>>> import base64
>>> import json
>>> auth = b'username:token'
>>> key = base64.b64encode(auth).decode("ascii")
>>> headers = {
...     'Authorization': f'Basic {key}',
...     'User-Agent': 'Python HTTP',
... }

# ...

>>> body = resp.read().decode()
>>> data = json.loads(body)
What does the exercise test?
  • HTTP communication (request, response)
  • Parsing HTTP responses
  • Checking the connection status
  • JSON serialization and parsing
  • Working with an API and its documentation
  • Regular expressions
  • Using the standard library and third-party libraries
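
As a hint for step 6, ticket numbers can be pulled out of commit messages with a regular expression. A minimal sketch, assuming the messages follow the "Fixed #..." convention mentioned in the task; the messages list is made-up sample data standing in for the message fields of the API response:

```python
import re

# Hypothetical commit messages; real ones come from the commits endpoint
messages = [
    'Fixed #101 -- Corrected a typo in the docs.',
    'Refs #102 -- Added a regression test.',
    'Fixed #103 -- Handled empty input.',
]

TICKET = re.compile(r'Fixed #(\d+)')

# Collect the ID that directly follows "Fixed" in every message
ticket_ids = [int(number)
              for message in messages
              for number in TICKET.findall(message)]

print(ticket_ids)
```

Filtering by date (the "last month" part of the task) is left out here and would have to be done on the commit date returned by the API.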