Python
Following these steps makes the newest version of Python, version 3, the default. For backwards compatibility reasons, python refers to Python 2 on many systems. However, Python 2 reached end of life on January 1, 2020.
On macOS
Installing Python on macOS via Homebrew, at the time of writing, will not install the most current version of Python available, despite it being released 5 months ago.
brew install python@3.8
# Add to .zshenv
path=(/usr/local/opt/python@3.8/libexec/bin ${path})
On Debian
sudo apt install python3.8
sudo apt install python3-pip
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3 1
sudo update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1
On Ubuntu
Do you really need to close?
In business, yes. In Python, however, no*. For more information, see below for a copy of the explanation I gave to a collaborator on GitHub:
TL;DR:
Once the reference count of file reaches zero, which happens at the end of this try block, CPython will call the __del__() method of the underlying file object, which in turn calls close(). How Python manages memory does depend on which Python you're using, but most people use CPython, which uses this reference count system, so we have the guarantee that the file will receive the necessary call to close(). Since this is running within a Docker container, and we know we're using CPython, I don't think it's worth changing this line of code to preserve niche cross-Python compatibility concerns, like maintaining support for Jython.
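For code that does have to behave the same on every Python implementation, a with block closes the file deterministically instead of relying on reference counting. A minimal sketch, assuming a throwaway file in a temporary directory (the name example.txt is made up for illustration):

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration
path = os.path.join(tempfile.mkdtemp(), 'example.txt')

# The with block calls close() on exit, even if an exception is raised
with open(path, 'w') as handle:
    handle.write('hello\n')

# The file object still exists here, but it has already been closed
print(handle.closed)
# => True
```

This costs one extra level of indentation and does not depend on how the interpreter manages memory.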
Silencing the python Console Welcome Message
Normally, when you open the python console, the following welcome message will appear when you enter.
Python 3.7.3 (default, Jun 19 2019, 07:38:49)
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
To disable it, use the following command to enter the python console:
set()
A set in Python is implemented as a hash table. It has O(1) lookup and insertion time.
Creating a set and inserting values:
# Creating an empty set ({} would create an empty dict, not a set)
bag = set()
# Creating a set from a list of items
bag = {1, 2}
bag = set([1, 2])
Adding items to a set:
bag = {1, 2}
array = [1, 2, 3, 4, 4, 5]
# Adding 1 item with add()
bag.add(3)
# Adding a list of items using update()
bag.update(array)
# Adding a set of items using |=
bag |= set(array)
Create a dictionary, using the elements of a set as the keys
names = {'Tommy', 'Tina', 'Traveler'}
default_value = '+1 (123) 456-7890'
contacts = {name: default_value for name in names}
Create a dictionary from all the key: value pairs of an existing dictionary, excluding keys a, b, and c:
d = {'a': 'alpha', 'b': 'bravo', 'c': 'charlie', 'd': 'delta', 'e': 'echo', 'f': 'foxtrot'}
print({k:d[k] for k in d.keys() - {'a', 'b', 'c'}})
Output:
{'e': 'echo', 'd': 'delta', 'f': 'foxtrot'}
Testing for membership in dicts/sets:
hashmap = {
    'a': 1,
    'b': 2,
    'c': 3
}
keys = hashmap.keys()
# Test that {'a', 'b', 'c'} is a proper subset of hashmap's keys
print({'a', 'b', 'c'} < hashmap.keys())
# => False
# Test that {'a', 'b', 'c'} is a proper superset of hashmap's keys
print({'a', 'b', 'c'} > hashmap.keys())
# => False
# Test that 'a' and 'b' are a valid subset of hashmap's keys
print({'a', 'b'} <= hashmap.keys())
# => True
# Test that {'a', 'b', 'c'} is a superset of hashmap's keys
print({'a', 'b', 'c'} >= hashmap.keys())
# => True
os.path()
from os.path import expanduser, join, realpath
from glob import glob
home = expanduser('~')
# => /Users/tommy
sym_folder = join(home, 'tmp')
# => /Users/tommy/tmp
abs_folder = realpath(sym_folder)
# => /Users/tommy/real/location/folder
files = glob(expanduser('~/notes/*.md'))
# => ['/Users/tommy/notes/file1.md', '/Users/tommy/notes/file2.md']
Jupyter Notebook
A Jupyter Notebook (installed with brew install jupyter) is an open source web application that allows you to create and share documents containing snippets of pre-executed code.
The Jupyter Notebook supports the following languages, among others:
python, ruby, nodejs, c, c++, bash, zsh, go, perl, php, redis
The Jupyter Notebook will also render LaTeX and Markdown, allowing for flexible formatting of data.
You can execute bash scripts in your Jupyter notebook by starting any given cell with %%bash on its first line.
To add configuration files to your local machine's Jupyter notebook, type the following command. This will generate the folder ~/.jupyter and insert a file into it called jupyter_notebook_config.py.
jupyter notebook --generate-config
pass="letmein"
python -c "from notebook.auth import passwd; print(passwd('${pass}'))"
pylint
Setting Up pylint
On macOS
pip install pylint
export PATH="${HOME}/Library/Python/3.7/bin:${PATH}"
On Debian
pylint --generate-rcfile > ~/.pylintrc
If pylint notifies you about a linting error that you don't like, add it as a comma-separated entry to the disable option, e.g.
disable=missing-docstring,
invalid-name,
bare-except
Plotly
Getting Started
Plotly is a visualization tool you can install with pip3 install plotly. You can create an account at plot.ly and generate an API key. Once you have, edit the ~/.plotly/.credentials and insert the following information.
{
"username": "tommytrojan",
"api_key": "Fou4dE18o4TUtCz91n6O"
}
Plotting Data
Plotting Online
import pandas as pd
import plotly.plotly as py
import plotly.figure_factory as ff
df = pd.read_csv("earnings.csv")
table = ff.create_table(df)
py.iplot(table, filename='table1')
Plotting Offline
import plotly.offline as py
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, plot, iplot
init_notebook_mode()
data = go.Bar(x=df.height, y=df.weight)
figure = [data]
py.iplot(figure, filename='height_weight')
The only big difference between plotting online and plotting offline is whether to use plotly.plotly or plotly.offline for your import. Both will include the plot() and iplot() methods.
Regular Expressions
Use the re library for regular expressions in Python. Regular expression patterns are created by calling re.compile(), which accepts two arguments: a pattern (conventionally a raw string) and flags. To specify multiple flags, combine them with the bitwise OR operator |.
Regular Expression Flags
| Flag | Function |
|---|---|
| re.A | Make \w, \W, \b, \B, \d, \D, \s and \S match only ASCII characters |
| re.I | Make the pattern case insensitive |
| re.M | The ^ & $ special characters match the start/end of each line in a string, instead of the start/end of the string itself |
| re.S | Allow the . character to match newline characters |
| re.X | Ignore whitespace in the pattern definition, and allow for comments |
Special Characters
| Special Character | Function |
|---|---|
| + | Match 1 or more |
| * | Match 0 or more |
| ? | Match 0 or 1 |
| {k} | Match k consecutive occurrences of the preceding pattern |
| {m,n} | Match from m to n consecutive occurrences (inclusive) of the preceding pattern (as many as possible) |
| {m,n}? | Match from m to n consecutive occurrences (inclusive) of the preceding pattern (as few as possible) |
| . | Match any character except a newline \n |
| ^ | Match the start of the string |
| $ | Match the end of the string |
| ( | Specify the start of a capture group |
| ) | Specify the end of a capture group |
Escaped Characters
| Escaped Character | Matches |
|---|---|
| \t | Horizontal tab |
| \v | Vertical tab |
| \n | Newline |
| \f | Form feed |
| \r | Carriage return |
| \w | [a-zA-Z0-9_] |
| \d | [0-9] |
| \s | [ \t\n\r\f\v] |
| \b | Specify a word boundary |
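The tables above can be exercised with a few short patterns. This is only a sketch; the sample strings and the names greedy, lazy, words, and phone are made up for illustration:

```python
import re

# \d{2,4} is greedy: each match takes as many digits as it can (up to 4)
greedy = re.findall(r'\d{2,4}', '123456')
print(greedy)
# => ['1234', '56']

# \d{2,4}? is lazy: each match takes as few digits as it can (at least 2)
lazy = re.findall(r'\d{2,4}?', '123456')
print(lazy)
# => ['12', '34', '56']

# \b keeps 'cat' from matching inside 'concat'
words = re.findall(r'\bcat\b', 'cat concat cat.')
print(words)
# => ['cat', 'cat']

# re.X allows whitespace and comments inside the pattern itself
phone = re.compile(r'''
    (\d{3})  # first group of digits
    -        # separator
    (\d{4})  # second group of digits
''', re.X)
print(phone.search('call 555-1234').groups())
# => ('555', '1234')
```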
Coding example
# Import the RegEx library
import re
# Create a raw-string pattern
pattern = re.compile(r'http[s]?://([^/?:]+)', re.A|re.I)
# Search for the first match of the pattern in "text"
text = 'https://helpful.wiki/python'
match = pattern.search(text)
print(match.group(0))
# => https://helpful.wiki
Dates, Times, Timestamps
ISO8601 and RFC3339 are the documents that outline how date and time should be denoted on computers. 1996-12-20T00:39:57Z is an example of an ISO8601 timestamp. The Z denotes that this timestamp is in zulu time, the UTC timezone. The UTC timezone is Coordinated Universal Time and, depending on the time of year, is either 7 or 8 hours ahead of the time in California. The equivalent time in California would be represented as 1996-12-19T16:39:57-08:00
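The conversion described above can be reproduced with the standard library. A minimal sketch, assuming a fixed -08:00 offset is an acceptable stand-in for California in December (a real program would look up the timezone by name so that daylight saving is handled):

```python
from datetime import datetime, timedelta, timezone

# The example timestamp from above, expressed in UTC
utc_moment = datetime(1996, 12, 20, 0, 39, 57, tzinfo=timezone.utc)

# Assumption: a fixed -08:00 offset stands in for California in December
pacific_standard = timezone(timedelta(hours=-8))
local_moment = utc_moment.astimezone(pacific_standard)

print(utc_moment.isoformat())
# => 1996-12-20T00:39:57+00:00
print(local_moment.isoformat())
# => 1996-12-19T16:39:57-08:00
```

Both values describe the same instant, so comparing them with == returns True.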
The datetime module provides a few classes:
datetime.date
datetime.time
datetime.datetime
datetime.timedelta
datetime.timezone
datetime.tzinfo
Working with ISO formatted timestamp strings
import datetime
import time
event = datetime.date.fromisoformat("2018-12-31")
# -00:00 is an illegal format but Python will interpret it successfully
event = datetime.datetime.fromisoformat("2018-12-31T12:31:58-08:00")
# Using whitespace separator instead of 'T' character
event = datetime.datetime.fromisoformat("2018-12-31 04:31:58+00:00")
Parsing a non-ISO date into a datetime object
from datetime import datetime
moment = datetime.strptime('05-19-2018', '%m-%d-%Y')
print(moment)
# => 2018-05-19 00:00:00
Printing a timezone-aware UTC timestamp in RFC3339 format
from datetime import datetime, timezone
datetime.now(timezone.utc).isoformat(sep=' ', timespec='seconds')
# => '2023-04-21 23:50:30+00:00'
Printing a timezone-aware UTC timestamp in ISO8601 format
from datetime import datetime, timezone
datetime.now(timezone.utc).isoformat(sep='T', timespec='seconds')
# => '2023-04-21T23:50:30+00:00'
Printing a timezone-unaware UTC timestamp in RFC3339 format
datetime.utcnow().isoformat(sep=' ', timespec='seconds')
# => '2023-04-21 23:47:55'
Printing a timezone-unaware UTC timestamp in ISO8601 format
datetime.utcnow().isoformat(timespec='seconds')
# => '2023-04-21T23:47:55'
Parsing a non-ISO timestamp into a datetime object
moment = datetime.strptime('10-05-2017 05:00:00 PM', '%m-%d-%Y %I:%M:%S %p')
print(moment)
# => 2017-10-05 17:00:00
Working with daylight savings
# If daylight savings is currently in effect
# (time.daylight only says whether the local timezone defines DST at all)
if time.localtime().tm_isdst > 0:
    tz = time.altzone
    # => 25200 (# of seconds offset from UTC)
else:
    tz = time.timezone
    # => 28800 (# of seconds offset from UTC)
print(time.tzname)
# => ('PST', 'PDT')
Working with local timezones
print(time.localtime())
# => time.struct_time(tm_year=2019, tm_mon=7, tm_mday=17, tm_hour=11, tm_min=42, tm_sec=15, tm_wday=2, tm_yday=198, tm_isdst=1)
moment = datetime.datetime.utcnow()
print(moment.isoformat(timespec='seconds'))
# => 2019-07-17T11:25:07
print(moment.isoformat(timespec='milliseconds'))
# => 2019-07-31T02:21:15.125
print(moment.isoformat(sep=' ', timespec='microseconds'))
# => 2019-07-31 02:21:15.125991
Getting the local timezone information
import datetime
timezone = datetime.datetime.now().astimezone().tzinfo
Setting the timezone
import time
import os
os.environ['TZ'] = 'US/Eastern'
time.tzset()
print(time.tzname)
# => ('EST', 'EDT')
Printing out all available timezones
```py
from zoneinfo import ZoneInfo
from pathlib import Path
for area in ['America', 'Europe', 'Asia', 'Africa', 'Australia', 'Antarctica', 'Etc']:
    print(area)
    for zone in Path('/usr/share/zoneinfo').glob(f'{area}/*'):
        print(f'\t/{zone.name}')
```
Some common timezones have been included below:
```py
from zoneinfo import ZoneInfo
# Constructing a timezone object
tz = ZoneInfo('America/Los_Angeles')
# Other valid timezones included below
ZoneInfo('Pacific/Honolulu') # -10:00
ZoneInfo('America/Juneau') # -09:00
ZoneInfo('America/Los_Angeles') # -08:00
ZoneInfo('America/Denver') # -07:00
ZoneInfo('America/Chicago') # -06:00
ZoneInfo('America/New_York') # -05:00
ZoneInfo('Europe/London') # +00:00
ZoneInfo('Europe/Paris') # +01:00
ZoneInfo('Europe/Athens') # +02:00
ZoneInfo('Europe/Moscow') # +03:00
ZoneInfo('Asia/Tehran') # +03:30
ZoneInfo('Asia/Dubai') # +04:00
ZoneInfo('Asia/Kabul') # +04:30 (capital of Afghanistan)
ZoneInfo('Asia/Dushanbe') # +05:00 (capital of Tajikistan)
ZoneInfo('Asia/Kathmandu') # +05:45 (capital of Nepal)
ZoneInfo('Asia/Dhaka') # +06:00 (capital of Bangladesh)
ZoneInfo('Asia/Bangkok') # +07:00
ZoneInfo('Asia/Shanghai') # +08:00
ZoneInfo('Asia/Tokyo') # +09:00
ZoneInfo('Australia/Sydney') # +10:00
ZoneInfo('Pacific/Noumea') # +11:00 (capital of New Caledonia)
ZoneInfo('Pacific/Fiji') # +12:00
```
Custom Module Locations
If you have your own modules that you want to use, there's a way to tell Python where to look for modules that are loaded by the import keyword.
To do this, set the environment variable PYTHONPATH in your shell, and export that variable. For example, export PYTHONPATH=~/example/python/modules. Now, all of the folders within this directory will be considered a module. For instance, if ~/example/python/modules/ex was a folder containing python code, now you'd be able to type import ex in future python programs.
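Entries from PYTHONPATH end up in sys.path, the list Python searches on every import. As a rough runtime equivalent (a sketch only, reusing the example directory from above, which does not need to exist for the snippet to run), the same directory can be appended from inside a script:

```python
import os
import sys

# The example module directory from above, with ~ expanded
extra = os.path.expanduser('~/example/python/modules')

# Append it to the import search path unless it is already there
if extra not in sys.path:
    sys.path.append(extra)

# Entries are searched in order, so earlier directories win on name clashes
print(sys.path[-1])
```

Unlike PYTHONPATH, this only affects the current process, and only imports that run after the append.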
Sockets
Server-Side TCP Socket
import socket
# Create a TCP socket
mysocket = socket.socket(
    type=socket.SOCK_STREAM
)
# Create an address tuple: the interface to bind to, and a port number
address = ('0.0.0.0', 1234)
# Bind the socket to the address
mysocket.bind(address)
# Listen, allowing 1 pending connection
mysocket.listen(1)
# Accept connections until the process is killed
while True:
    # Save the accepted connection
    connection, client_address = mysocket.accept()
    print(f"Accepted connection from {client_address}")
    # Continue until the transmission has no more data
    while True:
        data = connection.recv(4096)
        # If there's no data being transmitted, exit
        if not data:
            break
        # For now, reply to the same connection, echoing the message
        reply = f"echo '{data.decode()}'".encode()
        connection.sendall(reply)
    # Close the connection now that the client has finished sending
    connection.close()
Client-Side TCP Socket
import socket
# Create a TCP socket (the default socket type)
mysocket = socket.socket()
address = ('127.0.0.1', 1234)
# Connect the socket to the server
mysocket.connect(address)
# Send a message to the server, registering the name of the client
# (args is assumed to come from an argparse parser defined elsewhere)
message = f'register {args.name}'
# Send the message over the connection
mysocket.sendall(message.encode())
# Save the reply from the server
reply = mysocket.recv(4096)
# Decode the reply's binary encoding, store as UTF-8 string
reply = reply.decode()
print(reply)
mysocket.close()
# Write a short log; the with block closes the file when it exits
with open(args.logfile, 'w') as ofile:
    ofile.write('connected to server and registered\n')
    ofile.write('waiting for messages...\n')
    ofile.write('exit')
Pandas
from pandas import DataFrame, Series, read_json
# barset is assumed to be an existing DataFrame of price bars with an 'o' (open) column
barset["EMA12"] = barset["o"].ewm(span=12).mean()
barset.to_json('ofile.json', date_format='iso', date_unit='s', orient='index')
df = read_json('ofile.json', orient='index', date_unit='s')
print(df.head())
Type Hints
from typing import List, Set, Dict, Tuple, Optional, Callable, Iterator, Union
# For simple built-in types, just use the name of the type
x: int = 1
x: float = 1.0
x: bool = True
x: str = "test"
x: bytes = b"test"
# For collections, the name of the type is capitalized, and the
# name of the type inside the collection is in brackets
x: List[int] = [1]
x: Set[int] = {6, 7}
# Same as above, but with type comment syntax
x = [1] # type: List[int]
# For mappings, we need the types of both keys and values
x: Dict[str, float] = {'field': 2.0}
# For tuples, we specify the types of all the elements
x: Tuple[int, str, float] = (3, "yes", 7.5)
# Use Optional[] for values that could be None
x: Optional[str] = some_function()
# Mypy understands a value can't be None in an if-statement
if x is not None:
    print(x.upper())
# If a value can never be None due to some invariants, use an assert
assert x is not None
print(x.upper())
# This is how you annotate a function definition
def stringify(num: int) -> str:
    return str(num)
# And here's how you specify multiple arguments
def plus(num1: int, num2: int) -> int:
    return num1 + num2
# Add default value for an argument after the type annotation
def f(num1: int, my_float: float = 3.5) -> float:
    return num1 + my_float
# This is how you annotate a callable (function) value
x: Callable[[int, float], float] = f
# A generator function that yields ints is secretly just a function that
# returns an iterator of ints, so that's how we annotate it
def g(n: int) -> Iterator[int]:
    i = 0
    while i < n:
        yield i
        i += 1
# You can of course split a function annotation over multiple lines
def send_email(address: Union[str, List[str]],
               sender: str,
               cc: Optional[List[str]],
               bcc: Optional[List[str]],
               subject='',
               body: Optional[List[str]] = None
               ) -> bool:
    ...
Subprocesses
Using the subprocess library, you can execute other commands from within your script, supply their standard input, and capture their standard output and standard error.
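A minimal sketch of capturing a command's output with subprocess.run(). The child command here is just another Python interpreter, chosen so the example does not depend on which shell tools are installed:

```python
import subprocess
import sys

# Run a child Python process; sys.executable avoids guessing the interpreter's name
result = subprocess.run(
    [sys.executable, '-c', 'import sys; print("out"); print("err", file=sys.stderr)'],
    capture_output=True,  # collect stdout and stderr instead of printing them
    text=True,            # decode the captured bytes into str
    check=True,           # raise CalledProcessError on a non-zero exit code
)
print(result.returncode)
# => 0
print(result.stdout.strip())
# => out
print(result.stderr.strip())
# => err
```

Passing the command as a list, rather than a single string with shell=True, avoids shell quoting problems.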
Filepaths
from pathlib import Path
filepath = Path.home() / 'Downloads' / 'meme.jpg'
Function Parameters
Python 3.8 introduced the function parameter syntax / to indicate that some function parameters must be specified positionally and cannot be used as keyword arguments.
def f(a, b, /, c, d, *, e, f):
    print(a, b, c, d, e, f)
One use case for this notation is that it allows pure Python functions to fully emulate behaviors of existing C coded functions. For example, the built-in divmod() function does not accept keyword arguments:
def divmod(a, b, /):
    "Emulate the built in divmod() function"
    return (a // b, a % b)
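To see the rule enforced, here is a small sketch; the function describe() and its parameter names are made up for illustration:

```python
def describe(a, b, /, c, *, d):
    """a and b are positional-only, c can be either, d is keyword-only."""
    return (a, b, c, d)

print(describe(1, 2, 3, d=4))
# => (1, 2, 3, 4)
print(describe(1, 2, c=3, d=4))
# => (1, 2, 3, 4)

# Passing a positional-only parameter by keyword raises TypeError
try:
    describe(a=1, b=2, c=3, d=4)
except TypeError as error:
    rejected = True
    print(type(error).__name__)
    # => TypeError
else:
    rejected = False
```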
Python Image Library
Comprehensions
Iterables
An iterable is a "sequence" of data that you can iterate over using a loop.
The simplest example of an iterable is a list of integers, such as [1, 2, 3, 4, 5, 6, 7].
However, it's possible to iterate over other types of data, like a str(), dict(), tuple(), set(), etc.
- Verify an object is iterable by checking that it has defined the __iter__() method
print(hasattr(str, '__iter__'))
# => True
print(hasattr(bool, '__iter__'))
# => False
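The same check can be written with the Iterable abstract base class, and iter() and next() show what a loop does with an iterable behind the scenes. A short sketch (the variable name letters is arbitrary):

```python
from collections.abc import Iterable

# isinstance() against the Iterable ABC checks for __iter__ under the hood
print(isinstance('abc', Iterable))
# => True
print(isinstance(True, Iterable))
# => False

# iter() returns an iterator; next() pulls one item at a time
letters = iter('ab')
first = next(letters)
second = next(letters)
print(first, second)
# => a b
# A third next(letters) would raise StopIteration, which is how a for loop knows to stop
```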
argparse
Attached below is a program I made to import CSV data exported from my Apple Card into the budgeting software YNAB
import webbrowser
from datetime import datetime
from json import load, dumps
from os import getenv
from sys import argv, exit, stderr, stdin, stdout
from csv import DictReader, DictWriter
from urllib.request import Request, urlopen
from argparse import ArgumentParser, FileType
parser = ArgumentParser(
prog='ynab',
usage='%(prog)s [CSV_FILE]',
description='%(prog)s: a data pipeline'
)
parser.add_argument(
'-v',
"--verbose",
dest='verbose',
action='store_true',
help='option to print CSV to stdout'
)
options = parser.parse_args()
endpoint = 'https://api.youneedabudget.com/v1/budgets/last-used'
def get_account_id():
    account_request = Request(
        url=f'{endpoint}/accounts'
    )
    account_request.add_header('Authorization', f'Bearer {api_token}')
    account_response = urlopen(account_request)
    accounts = load(account_response)['data']['accounts']
    ynab_account_id = None
    for account in accounts:
        if account['name'] == 'Apple Card':
            ynab_account_id = account['id']
    if ynab_account_id is None:
        raise ValueError('ynab: unable to find account "Apple Card"\n')
    return ynab_account_id
if (api_token := getenv('YNAB_TOKEN')) is None:
    exit('ynab: expected environment variable ${YNAB_TOKEN}')
# Force input to be provided via file redirection
if stdin.isatty():
    if len(argv) == 1:
        stderr.writelines([
            'ynab: please supply the CSV file via standard input\n',
            '\tusage: `ynab < ./Downloads/apple.csv > ~/ynab.csv`\n'
        ])
        exit(2)
apple_csv = DictReader(
f=stdin, # The file to read from (standard input)
fieldnames=None, # Assume the CSV file's first row contains the field names
dialect='unix', # Specify the encoding method for the CSV file
)
expected_fields = [
'Transaction Date',
'Clearing Date',
'Description',
'Merchant',
'Category',
'Type',
'Amount (USD)'
]
if apple_csv.fieldnames != expected_fields:
    stderr.writelines([
        'ynab: problem reading CSV header row\n',
        f'\texpected:\t{expected_fields}\n',
        f'\treceived:\t{apple_csv.fieldnames}\n'
    ])
    exit(1)
ynab_csv = DictWriter(
f=stdout, # The file to write to (standard output)
fieldnames=['Date', 'Payee', 'Memo', 'Amount'], # Specify the field names
dialect='unix', # Specify the encoding method for the CSV file
)
csv_transactions = list()
api_transactions = list()
account_id = get_account_id()
# Create a list of api_transactions
for row in apple_csv:
    # Format the date from 01/13/2020 to 2020-01-13
    date = datetime.strptime(
        row['Transaction Date'], '%m/%d/%Y'
    ).date().isoformat()
    # Store an entry for the CSV file
    csv_transactions.append({
        'Date': date,
        'Payee': row['Merchant'],
        'Amount': '{:.2f}'.format(float(row['Amount (USD)']) * -1),
        'Memo': ''
    })
    # Store the next transaction as a dictionary, append it to the list
    api_transactions.append({
        'account_id': account_id,
        'date': date,
        'payee_name': row['Merchant'],
        'cleared': 'cleared',
        'approved': False,
        'amount': int(float(row['Amount (USD)']) * -1_000)
    })
if options.verbose:
    # Write the header row to the CSV file (the field names)
    ynab_csv.writeheader()
    # Write each transaction in the list to a row in the CSV file
    ynab_csv.writerows(csv_transactions)
data = {
'transactions': api_transactions
}
transaction_request = Request(
headers={
'Authorization': f'Bearer {api_token}',
"Content-Type": 'application/json',
},
url=f'{endpoint}/transactions',
data=dumps(data).encode('utf-8')
)
transaction_response = urlopen(transaction_request)
# print(f'ynab: successfully imported {len(api_transactions)} into YNAB')
# Open YNAB for the user on their default web browser
webbrowser.open('https://app.youneedabudget.com')
Strings
There are two main ways to substitute values into a template string. You can use formatted string literals (more commonly known simply as "f-strings"), or you can use the old string formatting method, which uses the modulo % operator, reminiscent of the C-style printf() syntax.
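A side-by-side sketch of the two approaches; the values name and balance are made up for illustration:

```python
name = 'Tommy'
balance = 1234.5

# Formatted string literal (f-string): expressions go inside the braces
f_string = f'{name} has ${balance:,.2f}'
print(f_string)
# => Tommy has $1,234.50

# printf-style formatting: values are supplied as a tuple after the % operator
percent_string = '%s has $%.2f' % (name, balance)
print(percent_string)
# => Tommy has $1234.50
```

The f-string accepts the full format-spec mini-language (such as the , thousands separator), which the % operator does not.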
Python Caching
You can disable Python bytecode caching entirely by passing the -B flag, which stops Python from trying to write .pyc files when source modules are imported. Setting the environment variable PYTHONDONTWRITEBYTECODE has the same effect.
Starting from Python 3.8, you can configure the environment to prevent Python from reading and writing __pycache__ directories, sourcing them instead from a separate location on the filesystem, specified by you.
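Both settings are also visible from inside a running interpreter through the sys module. A small sketch, assuming Python 3.8 or newer; the separate cache location itself is normally chosen at startup, with the PYTHONPYCACHEPREFIX environment variable or the -X pycache_prefix=PATH option:

```python
import sys

# Same effect as disabling bytecode writing at startup,
# but only for imports that happen after this line
sys.dont_write_bytecode = True

# None means .pyc files live in __pycache__ next to the source files;
# otherwise this is the root of the separate cache directory
print(sys.pycache_prefix)
```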
plistlib
Email
Gmail's API requires messages to be submitted as MIME messages, and its documentation teaches you how to create one.
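A MIME message can be built with the standard library's email package. The sketch below stops short of calling Gmail: the addresses and text are made up, and the final step assumes the API expects the message as a base64url-encoded string in a field named raw, which should be checked against Gmail's own documentation:

```python
import base64
from email.message import EmailMessage

# Hypothetical addresses and text, used only for illustration
message = EmailMessage()
message['To'] = 'tina@example.com'
message['From'] = 'tommy@example.com'
message['Subject'] = 'Hello'
message.set_content('This is the plain-text body.')

# Serialize the MIME message, then base64url-encode it
raw = base64.urlsafe_b64encode(message.as_bytes()).decode()

# Assumption: the request body carries the encoded message under the key 'raw'
payload = {'raw': raw}
```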
Jupyter Notebook
To get started, you'll need to install some packages
pip install notebook ipywidgets
By default, when you launch a Jupyter notebook, it will be hosted at 127.0.0.1 on port 8888
Passwords
Changing a notebook's password the proper way
First, enter a Python shell
Run the passwd() function in the notebook library
from notebook.auth import passwd
passwd()
# Enter password:
# Verify password:
# => 'sha1:67c9e60bb8b6:9ffede0825894254b2e042ea597d771089e11aed'
Edit your jupyter_notebook_config.py file
# The password should be of the form 'type:salt:hash'
c.NotebookApp.password = 'sha1:0827b2390e3d:b54ee3e38895aaccc182705ad174bfb3c6e86a10'
Changing a notebook's password the lazy way
- Edit your
jupyter_notebook_config.py file
```py
from notebook.auth import passwd
c.NotebookApp.password = passwd('lol_nobody_will_see_this')
```
Plotly
Pandas
import pandas as pd
# Define an empty DataFrame with labeled rows and columns
df = pd.DataFrame(
index=['a', 'b', 'c'],
columns=['time', 'date', 'name']
)
# access the first row
df.loc['a']
# equivalent
df.iloc[0]
# select the date column from all rows, starting at the row labeled 'b'
df.loc['b': , 'date']
# equivalent
df.iloc[1: , 1]
# select all rows from the column labeled "time"
df['time']
# equivalent
df.loc[:, 'time']
# select two columns, 'time' and 'date'
df[['time', 'date']]
# select the 1st & 3rd rows only, and the column 'date'
bool_array = [True, False, True]
df.loc[bool_array , 'date']
# select the 1st & 3rd columns only, and all rows
df.loc[: , bool_array]
Package Management
pyproject.toml
Relevant Python documentation: Declaring Project Metadata
Relevant setuptools documentation: Configuring setuptools using pyproject.toml files
Example pyproject.toml boilerplate:
[project]
name = "myproject"
authors = [
{name = "Austin Traver", email = "austintraver@gmail.com"},
]
maintainers = [
{name = "Austin Traver", email = "austintraver@gmail.com"},
]
readme = "README.md"
license = {file = "LICENSE"}
description = "An example project to copy-paste into future projects."
# Reference: https://setuptools.pypa.io/en/latest/userguide/pyproject_config.html#dynamic-metadata
dynamic = ["version"]
requires-python = ">=3.10,<3.12"
dependencies = [
"jsonschema ~= 3.2",
"PyYAML ~= 6.0",
]
# Specify the PEP-508 "extras"
# This package, along with its extras,
# can be installed in 'editable mode' using the following command:
# `pip install --editable '.[dev]'`
[project.optional-dependencies]
dev = [
"autopep8",
"mypy",
"pytest",
]
[project.urls]
homepage = "https://github.com/austintraver/myexample"
documentation = "https://github.com/austintraver/myexample/tree/main/docs"
repository = "https://github.com/austintraver/myexample"
changelog = "https://github.com/austintraver/myexample/releases"
[project.scripts]
# This will create the command `mycli` in your shell, and calling it will invoke the function `main()` within your package.
mycli = "mypackage.__main__:main"
[build-system]
requires = ["setuptools>=45", "wheel", "setuptools_scm[toml]>=6.2"]
build-backend = "setuptools.build_meta"
# Reference: https://setuptools.pypa.io/en/latest/userguide/pyproject_config.html#dynamic-metadata
[tool.setuptools]
packages = [
"mypackage",
"mypackage.cli",
"mypackage.templates"
]
# Reference: https://setuptools.pypa.io/en/latest/userguide/datafiles.html
# Default value for 'include-package-data': true
include-package-data = true
# If not specified, setuptools will try to guess a reasonable default for the package
# Reference: https://setuptools.pypa.io/en/latest/userguide/pyproject_config.html
zip-safe = false
# Documentation: https://github.com/pypa/setuptools_scm
[tool.setuptools_scm]
# Documentation: https://docs.pytest.org/en/latest/reference/customize.html#pyproject-toml
# Configurations: https://docs.pytest.org/en/latest/reference/reference.html#ini-options-ref
[tool.pytest.ini_options]
minversion = "7.2"
Dynamic Version Numbering
The article Dynamic Versioning in the setuptools documentation explains more about dynamic fields.
dynamic = ["version"]
[build-system]
requires = ["setuptools>=45", "wheel", "setuptools_scm[toml]>=6.2"]
build-backend = "setuptools.build_meta"
[tool.setuptools_scm]
from argparse import ArgumentParser
from importlib.metadata import version
parser = ArgumentParser()
parser.add_argument(
'--version',
help='installed version of `mypackage`',
action='version',
version=f"v{version('mypackage')}"
)
Installing packages from GitHub
You can install packages from GitHub repositories (even private ones!) via SSH by using the following syntax:
pip install 'PACKAGE_NAME @ git+ssh://git@github.com/OWNER/REPO@REF'
Example of a branch endpoint:
package @ git+https://github.com/package@main
Example of a tag endpoint:
package @ git+https://github.com/package@v0.1.0
Example of a commit hash endpoint:
package @ git+https://github.com/package@fd80709
Example of a pull-request endpoint
package @ git+https://github.com/package@refs/pull/123/head
In the syntax above, REF is the git reference (branch, tag, or commit hash) that you want to install. You can even omit the REF, in which case pip installs from the head of the repository's default branch.
Note: You can check what the latest release is by going to this endpoint:
https://github.com/OWNER/REPO/releases/latest
Installing development dependencies
According to PEP-508, you can specify extras for a package. This allows you to install particular sets of extra dependencies or features for your package. For example, we can create an extra for the developer dependencies by adding the following section to pyproject.toml:
[project.optional-dependencies]
dev = [
"autopep8",
"mypy",
"pytest",
]
The package, along with its dev extra, can then be installed in editable mode:
pip install --editable '.[dev]'
First-time contributors to the project can then initialize the repository on their local machine using the following command
python3.10 -m venv .venv \
--clear \
--upgrade-deps
source .venv/bin/activate
python -m pip install \
--upgrade \
--upgrade-strategy 'only-if-needed' \
wheel
python -m pip install \
--editable \
--upgrade \
--upgrade-strategy 'only-if-needed' \
'.[dev]'
pytest
mypy
python -m mypy \
--install-types \
--non-interactive \
${ROOT_DIR}/package_1_top_folder/ \
${ROOT_DIR}/package_2_top_folder/
Python Wheels and Docker Images
You can use a multi-stage Docker build to build the wheels for your Python package and its dependencies in a dedicated stage, and then copy the wheels into the final image. This allows you to build a Docker image that is as small as possible. Additionally, installing dependencies from pre-built wheels dramatically speeds up the build process.
# ---------------------------------------------------
# Build the wheels in a dedicated builder stage.
FROM public.ecr.aws/docker/library/python:3.10-alpine AS builder
# Set environment values.
ENV WORKDIR /opt/worker
WORKDIR ${WORKDIR}
# Install needed system packages for `grpcio`
RUN apk add --no-cache build-base linux-headers
# Build wheels for the SDK and its dependencies.
COPY src/sdk ${WORKDIR}/sdk
RUN python -m pip wheel \
--disable-pip-version-check \
--no-cache-dir \
--wheel-dir /wheel \
${WORKDIR}/sdk
# Build wheels for the dependencies of this package.
COPY src/requirements.txt ${WORKDIR}/
RUN python -m pip wheel \
--disable-pip-version-check \
--no-cache-dir \
--wheel-dir /wheel \
--requirement ${WORKDIR}/requirements.txt
# ---------------------------------------------------
# Build the final image.
FROM public.ecr.aws/docker/library/python:3.10-alpine
ENV WORKDIR /opt/worker
WORKDIR ${WORKDIR}
# Include 'nmap', which is needed by the SDK.
RUN apk add --no-cache git nmap nmap-scripts
# Copy the wheels over from the builder stage.
COPY --from=builder /wheel/ /wheel
# Install the dependencies of this package,
# using the wheels from the builder stage.
COPY src/requirements.txt ${WORKDIR}/
RUN python -m pip install \
--disable-pip-version-check \
--progress-bar off \
--no-cache-dir \
--no-index \
--find-links /wheel \
--requirement ${WORKDIR}/requirements.txt
# Install the SDK and its dependencies,
# using the wheels from the builder stage.
COPY src/sdk ${WORKDIR}/sdk
RUN python -m pip install \
--disable-pip-version-check \
--progress-bar off \
--no-cache-dir \
--no-index \
--find-links /wheel \
${WORKDIR}/sdk
# Include source code
COPY src/pkg/ ${WORKDIR}/
ENTRYPOINT python ${WORKDIR}/main.py
In order to get it to run on multiple architectures, I had to consult the Docker documentation for buildx, which suggested that I deploy a custom builder:
docker buildx create \
--name 'custom-builder' \
--driver 'docker-container' \
--platform 'darwin,linux/amd64,linux/arm64' \
--bootstrap
Then, I could use this custom builder:
docker buildx use 'custom-builder'
Once I had done so, the following command successfully built images for multiple architectures, and no longer produced error messages on my M1 Mac:
docker buildx build \
--platform 'linux/amd64,linux/arm64' \
--progress plain \
--build-arg BUILDKIT_CONTEXT_KEEP_GIT_DIR=1 \
--build-arg BUILDKIT_MULTI_PLATFORM=1 \
--file path/to/Dockerfile \
--tag 'localhost/imagename' \
--output=docker \
.
If you need to push a multi-architecture image to Amazon ECR you'll first need to authenticate to the private registry:
aws ecr get-login-password --region AWS_REGION \
| docker login \
--username 'AWS' \
--password-stdin 'AWS_ACCOUNT_ID.dkr.ecr.AWS_REGION.amazonaws.com'
Note: The username AWS is supposed to stay unchanged. Don't replace it with your own username.
Once you have authenticated, you can push the Docker image to the Amazon ECR registry:
docker push AWS_ACCOUNT_ID.dkr.ecr.AWS_REGION.amazonaws.com/MY_ECR_REPO:TAG
If you can't get Podman to build AMD64 images on your M1 Mac, you can run the following commands:
podman machine ssh sudo rpm-ostree install qemu-user-static
podman machine ssh sudo systemctl reboot
After doing this, your build issues should go away.
podman build \
--no-cache \
--platform linux/amd64 \
--tag myimage:latest \
--file "path/to/Dockerfile" \
.