GitHub
bowmanjd.com/ttl/python-data
repl.it

Data Wrangling with Python

Presented by Jonathan Bowman
jbowman@candoris.com

Salesforce Project Manager at Candoris

Interruptions welcome

Favorite Monty Python line

or

Most memorable snake story

Python

When “clicking through” your problems no longer works,

try Python

or PowerShell, or JavaScript, or Ruby, or Go...

Documentation

Automate the Boring Stuff with Python, by Al Sweigart

Should I use Python 2 or 3?

Installing Python

$ python
Python 3.7.3 (default, Mar 27 2019, 13:36:35) 
Type "help", "copyright", "credits" or "license" for more information.
>>> 
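
To confirm from inside a script that it is running under Python 3 (the banner above shows 3.7.3), a minimal check might look like this. The 3.6 minimum is an assumption, chosen because the f-strings used in the greet examples need Python 3.6 or newer.

```python
import sys

# Assumed minimum: f-strings need Python 3.6+.
MINIMUM_VERSION = (3, 6)


def python_is_recent_enough(minimum=MINIMUM_VERSION):
    """Return True if the running interpreter is at least `minimum`."""
    # sys.version_info compares element by element with a plain tuple.
    return sys.version_info >= minimum


if __name__ == "__main__":
    print(sys.version)
    print("Recent enough:", python_is_recent_enough())
```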

Consider version control

Editors

Python basics

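def greet(greeting):
    return greeting + ", World!"


greet("Hello")  # returns "Hello, World!"

def greet(greeting="Hello"):
    return greeting + ", World!"


greet()  # returns "Hello, World!" (uses the default)

def greet(greeting="Hello", audience="World"):
    return f"{greeting}, {audience}!"


greet()  # returns "Hello, World!"
greet("Salutations", "Galaxy")  # returns "Salutations, Galaxy!"
greet(audience="Galaxy")  # returns "Hello, Galaxy!"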

a_number = 12
another_number = 7.1
a_string = "Some text"
another_string = "Some more text"
a_range = range(10)  # 0 through 9
a_list = ["Some text", 14, another_number, "的", 1]
a_lonely_number = a_list[4]  # 1 (indexes start at 0)
a_dict = {"a_key": "a_value",
          "first_name": "Sheila",
          "pi": 3.14159}

archimedes_constant = a_dict["pi"]  # 3.14159
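
Indexing a dictionary with a missing key raises KeyError, and indexing past the end of a list raises IndexError. A short sketch of safer and handier access patterns, reusing the sample values above; the "middle_name" key is hypothetical, chosen precisely because it is absent.

```python
a_list = ["Some text", 14, 7.1, "的", 1]
a_dict = {"a_key": "a_value",
          "first_name": "Sheila",
          "pi": 3.14159}

# Negative indexes count from the end, so a_list[-1] is the last item.
last_item = a_list[-1]

# Slices return a new list; the end index is excluded.
first_two = a_list[:2]

# dict.get returns a default instead of raising KeyError.
# "middle_name" is a hypothetical key that is not in a_dict.
middle_name = a_dict.get("middle_name", "")

# Loop over keys and values together.
for key, value in a_dict.items():
    print(f"{key}: {value}")
```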
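
import random
import random as rnd
from random import randint

# Three equivalent calls; each returns an integer from 1 to 10, inclusive.
random.randint(1, 10)
rnd.randint(1, 10)
randint(1, 10)

def list_random_numbers(quantity, maximum=10):
    # Print `quantity` integers, each from 0 to `maximum`, inclusive.
    for _ in range(quantity):
        print(random.randint(0, maximum))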
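
Printing is handy at the REPL, but returning a list makes the numbers reusable. A sketch (the function name and the seed parameter are illustrative additions, not from the talk) that uses its own random.Random instance, so passing a seed gives repeatable results without touching the module-level generator:

```python
import random


def make_random_numbers(quantity, maximum=10, seed=None):
    """Return `quantity` integers from 0 to `maximum`, inclusive.

    Passing the same `seed` reproduces the same list; seed=None
    seeds the generator unpredictably.
    """
    rng = random.Random(seed)
    return [rng.randint(0, maximum) for _ in range(quantity)]


numbers = make_random_numbers(5, maximum=10, seed=42)
print(numbers)
```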

(If you need cryptographically strong random numbers, use the secrets module instead of the random module.)
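
For example, a sketch using three functions the secrets module provides (choice, randbelow, and token_hex); the helper name random_pin, the six-digit length, and the 16-byte token size are arbitrary choices for illustration.

```python
import secrets
import string


def random_pin(digits=6):
    """Return a string of `digits` random decimal digits, e.g. a one-time code."""
    return "".join(secrets.choice(string.digits) for _ in range(digits))


# An integer from 0 to 9 inclusive (randbelow excludes its argument).
digit = secrets.randbelow(10)

# 16 random bytes rendered as 32 hexadecimal characters.
token = secrets.token_hex(16)

pin = random_pin()
```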

def compare(a, b):
    if a == b:
        print("equality")
    if a != b:
        print("inequality")
    if a > b:
        print("greater than")
    else:
        print("less than or equal")

File handling

def print_file(filename="sample.csv"):
    f = open(filename)
    contents = f.read()
    print(contents)
    f.close()
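
def walk_file(filename="sample.csv"):
    f = open(filename)
    for line in f:
        # Each line already ends in "\n", so suppress print's own newline.
        print(line, end="")
    # Close once, after the loop; closing inside the loop would make
    # the next iteration fail on a closed file.
    f.close()
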
def write_new_file(infilename="sample.csv",
                   outfilename="output.csv"):
    infile = open(infilename, "r")
    outfile = open(outfilename, "w")
    for line in infile:
        if "Teacher" in line:
            outfile.write(line)
    infile.close()
    outfile.close()
    return outfilename
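
If an exception occurs between open() and close(), the file stays open. A with block closes it automatically, even on errors. A sketch of the same filter written that way; the function name is hypothetical, and the "Teacher" filter is kept from the original.

```python
def write_new_file_with(infilename="sample.csv", outfilename="output.csv"):
    """Copy only the lines containing "Teacher" from infilename to outfilename."""
    # Both files are closed when the with block ends, even if an error occurs.
    with open(infilename, "r") as infile, open(outfilename, "w") as outfile:
        for line in infile:
            if "Teacher" in line:
                outfile.write(line)
    return outfilename
```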

Contents of sample.csv:

First name,Last name,Role
Jonathan,Bowman,Project Manager
John,Cleese,Actor

The same data as JSON:

[
  {
    "First name": "Jonathan",
    "Last name": "Bowman",
    "Role": "Project Manager"
  },
  {
    "First name": "John",
    "Last name": "Cleese",
    "Role": "Actor"
  }
]
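
Converting the CSV into that JSON takes only the standard library: csv.DictReader turns each row into a dictionary keyed by the header row, and json.dumps serializes the list. A minimal sketch, assuming the file is UTF-8 encoded; the function name csv_to_json is illustrative.

```python
import csv
import json


def csv_to_json(filename="sample.csv"):
    """Return the rows of a CSV file as a JSON array of objects."""
    # newline="" is what the csv module expects when opening files.
    with open(filename, newline="", encoding="utf-8") as infile:
        rows = list(csv.DictReader(infile))
    # indent=2 produces the same layout as the JSON shown above.
    return json.dumps(rows, indent=2)
```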

Let's share some data wrangling scenarios.

And do some HTTP calls to a REST API.

Our mild-mannered REST API has one endpoint:
https://ttl2019.bowmanjd.com/responses

(No authentication is required, unlike any self-respecting REST API you will encounter in the wild.)

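Rabbit trail to subprocess

from subprocess import check_output

def show_kernel_info():
    # uname exists on Linux and macOS, not on Windows.
    cmd = ["uname", "-a"]
    # text=True (Python 3.7+) returns str instead of bytes.
    output = check_output(cmd, text=True)
    return output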
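
import json
from subprocess import check_output

def list_responses():
    # -s (silent) stops curl from printing its progress meter.
    cmd = ["curl", "-s",
           "https://ttl2019.bowmanjd.com/responses"]
    output = check_output(cmd)
    # json.loads accepts the raw bytes that check_output returns.
    responses = json.loads(output)
    return responses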
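
check_output returns bytes by default and raises CalledProcessError on a nonzero exit status. subprocess.run (Python 3.5+; its capture_output and text parameters need 3.7+) offers the same behavior with more control. A sketch; the helper name run_command is illustrative.

```python
import subprocess
import sys


def run_command(cmd):
    """Run `cmd` (a list of arguments) and return its standard output as text.

    check=True raises subprocess.CalledProcessError on a nonzero exit status,
    capture_output=True collects stdout and stderr instead of printing them,
    and text=True decodes the output to str.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Example: ask the current Python interpreter to print something.
print(run_command([sys.executable, "-c", "print('hello from a subprocess')"]))
```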

Return from rabbit trail

import requests

endpoint = "https://ttl2019.bowmanjd.com/responses"

def list_responses():
    r = requests.get(endpoint)
    # Raise requests.HTTPError if the server returns a 4xx or 5xx status.
    r.raise_for_status()
    responses = r.json()
    return responses

import requests

endpoint = "https://ttl2019.bowmanjd.com/responses"

def add_response(response):
    # json= serializes the dict and sets the Content-Type header.
    r = requests.post(endpoint, json=response)
    r.raise_for_status()
    count = r.json()
    return count
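
requests is a third-party package (pip install requests). If installing packages is not an option, the standard library's urllib.request can send the same JSON POST. A sketch; the helper names are hypothetical, and it assumes the endpoint accepts a JSON body just as it does with requests.post(..., json=...). The shape of the payload the server expects is not specified here.

```python
import json
import urllib.request

endpoint = "https://ttl2019.bowmanjd.com/responses"


def build_post_request(url, payload):
    """Build (but do not send) a POST request with a JSON body."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def add_response_stdlib(response):
    """Send `response` to the endpoint and return the decoded JSON reply."""
    request = build_post_request(endpoint, response)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx status codes.
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))
```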

What data wrangling scenarios do/could you face?

POST your response(s), and feel free to GET everyone else’s, using Python.

import csv

# The with block closes the file when the loop finishes.
with open("sample.csv", "r", newline="", encoding="utf-8") as infile:
    reader = csv.DictReader(infile)
    for row in reader:
        print(row["First name"])

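import csv

# Closing the file (here, at the end of the with block) flushes the
# buffered rows to disk.
with open("output.csv", "w", newline="", encoding="utf-8") as outfile:
    writer = csv.DictWriter(outfile, ["First", "Last"])
    writer.writeheader()
    new_row = {"First": "Jonathan",
               "Last": "Bowman"}
    writer.writerow(new_row)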
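
Putting the reader and writer together: a sketch that reads sample.csv and writes only the first and last names under the new "First" and "Last" headers. The function name copy_names is illustrative; the column names come from the examples above.

```python
import csv


def copy_names(infilename="sample.csv", outfilename="output.csv"):
    """Copy the name columns of infilename to outfilename under new headers."""
    with open(infilename, newline="", encoding="utf-8") as infile:
        with open(outfilename, "w", newline="", encoding="utf-8") as outfile:
            reader = csv.DictReader(infile)
            writer = csv.DictWriter(outfile, ["First", "Last"])
            writer.writeheader()
            for row in reader:
                # Map the input column names to the output column names.
                writer.writerow({"First": row["First name"],
                                 "Last": row["Last name"]})
    return outfilename
```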

For SSH and SFTP work, try Paramiko.

Data ninja stuff

Networking

Server management

Hypervisor management