Fork me on GitHub

PyCon US 2017

пт 15 марта 2019, tags: pythonpycon

Keynote Day 1


Introduction. PyCon is changing. Over years the conference has changed from all mostly web-focused it to a very diverse conference. Now there’s a lot of scientists (numpy, scipy), devops (ansible, redhat), web (django, flask), data engineering (spark, dask), IOT (micropython) Now you can meet there anyone - professionals, students, scientists, hobbysts, teachers, analysts. People use Python differently, don’t be aggressive. What works for you - doesn’t work for others. Same with the code, code style, use, etc.

Funny: PyCon has dinner that you can attend for a payment, every year there were 100 ppl less than planned. They paid but didn’t show up

Experiment Assignment on the Web


A/B testing has a very high risk of having BIASed results.


  • Create and follow the rules
  • Create metrics! Have clear goals
  • Divide and rule, avoid testing many features on the same group of people at a time. Split into groups, test different feature on different groups
  • Have a timed results. E.g. 1 week experiment with A and 1 weeb with B, then compare and repeat

Big picture software testing unit testing, Lean Startup, and everything in between


How to test your software, common mistakes. 100% code coverage - often waste of time, often useless and doesn’t guarantee quality Testing strategy depends on the project, situation, stage of project, goals

Example (startup):

  • Have an idea
  • Test the idea (build a mvp)
  • Learn users (define requirements - collect specific requirements)
  • a/b
  • Logs
  • Analytics (kissmetrics)
  • Build product based on learned
  • Test correctness
  • Automated tests - often doesn’t prove correctness
  • Manual testing - proves correctness
  • Scaling and getting users
  • Test changes (preventing unexpected changes)
  • Automated test - effective for testing changes. Why it’s important? Once the product is proved to be correct, you need to prevent changes. Manual testing is not effective
  • Don’t try to automate testing of places that change often - waste of time, humans can do it better
  • After getting users - it’s time to do load testing, collect usage metrics (cpu, network, memory)

Next Level Testing


Exploring different types of testing

Property Based testing

  • Don’t return invalid results in case of error, raise an exception instead!

Passing all possible values to functions

  • Trying to send random values and expecting valid results

Fuzz Testing

  • Good for catching unhandled exceptions

Mutation Testing

  • Changing the code and trying to run. Very long, can be running days or weeks.

Dask A Pythonic Distributed Data Science Framework


Dask is a flexible parallel computing library for analytic computing -

Dask is composed of two components:

  • Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.
  • "Big Data" collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of the dynamic task schedulers.

Has multiple interfaces, can be used in Pandas, Asyncio, scikit-learn style. Allows to make any code distribtued

Keynote Day 2


How Instagram upgraded to Python 3.6 and how they scale django. It took 1 year to upgrade

How they upgraded their code:

  • Upgraded 3rd party libraries first to make sure they support python3, don’t allow new dependencies that don’t support python 3
  • Make the code python 2 and 3 compatible
  • Making 1 change at a time. E.g. fix all print statements, fix all imports, fix all iterators, fix all strings
  • Switched to Python 3 in preprod and kept Python 2 in prod
  • Long testing in preprod
  • Upgrading prod to Python 3

Common issues:

  • pickle. output not compatible between Python 2 and 3
  • Iterators! They lose items while iterating
  • Json.dumps produces different keys order. Use sort_keys=True
  • String! Binary, unicode, etc

They push straight to master (production)

Debugging in Python 3 6 Better, Faster, Stronger


Why running code in Python 2 with debugger is slow: tracing function is being executed for every line Python 3.6 has new API for tracing that allows to dynamically turn on/off tracing in the code during execution!

Readability Counts


How to make the code better to read. Write code for humans, not for machines. Line length - short line is not the goal, readability is Regular expressions - avoid writing long oneline expressions, use re.VERBOSE more that allows your to use comments in regexp Function arguments - always move to new line, it’s easier to read Re-read PEP8 every 6 months

Code Structure:

  • Use descriptive names, long but descriptive
  • Use tuple unpacking in loops
  • How to make sure that the code is readable: read your code out loud
  • Variables should describe what they’re doing
  • Write documentation for blocks of the code, describe what it does
  • Context managers - good way to hide exception handling (contextlib)
  • Replace loops on list comprehensions
  • Do not reimplement standard magic class methods like. Instead of .has_item() implement .__contains__
  • If you notice patterns that you pass same arguments in a set of functions, create class

How to make a good library API



Good Values:

  • Simplicity
  • Focus on 90% of cases. Documentation, examples (simple, easy to read)
  • Let use library internals (inheritance, import) for implementing other 10%
  • Use "good" default parameters that will work for 90%
  • Consistency
  • Keep code Pythonic
  • Safety
  • Flexibility

Too abstract API are far away from reality, but it depends on the case. Celery, Pillow, BeautifulSoup Complex is better than complicated Side effects - bad, try to write clean functions. E.g. print_formatted(‘a b c’) - bad, print(format(‘a b c’)) - good Design Patterns - caused by bad language design. Can be replaced by better solutions from Python. Learn different languages to find better solutions

Library UX Using abstraction towards friendlier APIs


Good interface

  • Reduces number of errors
  • Makes complex tasks simple
  • Helps to adapt technology

Python is in between low and high level languages Hide implementations in controlled environment Duplications is better than bad abstraction Top level should allow access to 2nd level. It will help to make library flexible Level of abstraction depends on users

No More Sad Pandas Optimizing Pandas Code for Speed and Efficiency


How to make Pandas fast

%timeit function - execution timing %load_ext line_profiler %lprun -f function - profile line by line

Apply in pandas 2 times faster than iterrows Numpy arrays faster than pandas (ndarray) Cython helps to speed up loops Don’t optimize things that should not be optimized. Aka don’t do premature optimization

Tracing, Fast and Slow Digging into and improving your web service’s performance


TLDR: use distributed tracing for microservices or project with many parts

  • Understanding performance
  • Understanding how systems affect each other

Setup nginx to send x-request-id Standartize log structure It depends on the goal what to track Use request id as id Tracing can and will make your system slower. But it’s not required to track all parts of the system Don’t always required

One Data Pipeline to Rule Them All


How to use Kafka efficiently and plan your data pipelines You need schema, without schema everything will be bad Schema changes within topic should be backwards compatible Migration to new schema: create new schema and switch from old one

Comments !