Data Wrangling with Python

Episode #90, published Wed, Dec 21, 2016, recorded Mon, Nov 28, 2016

Do you have a dirty, messy data problem? Whether you work as a software developer or as a data scientist, you've surely run across data that was malformed, incomplete, or maybe even wrong. Don't let messy data wreck your apps or generate wrong results.

What should you do? Listen to this episode of Talk Python To Me, in which Katharine Jarmul discusses the book she co-authored, Data Wrangling with Python, and her PyCon UK presentation, How to Automate your Data Cleanup with Python.

Links from the show:

Katharine on the web: kjamistan.com
Katharine on Twitter: @kjam
Book: Data Wrangling with Python: Tips and Tools to Make Your Life Easier: amzn.to/2fGc0Cx
PyCon UK 2016: How to Automate your Data Cleanup with Python: youtube.com/watch?v=gp-ngPV_ZX8

Packages from Data Cleanup talk
Dedupe Python Library: github.com/datamade/dedupe
probablepeople: github.com/datamade/probablepeople
usaddress: github.com/datamade/usaddress
jellyfish: github.com/jamesturk/jellyfish
Fuzzywuzzy: github.com/seatgeek/fuzzywuzzy
scrubadub: github.com/datascopeanalytics/scrubadub
pint: pint.readthedocs.io
arrow: github.com/crsmithdev/arrow
pdftables.six: github.com/vnaydionov/pdftables
Datacleaner: github.com/rhiever/datacleaner
Parserator: github.com/datamade/parserator
Gensim: radimrehurek.com/gensim
Faker: github.com/joke2k/faker
Dask: dask.pydata.org
spaCy: spacy.io
Airflow: airflow.incubator.apache.org
Luigi: luigi.readthedocs.io
Hypothesis (testing): hypothesis.works

Katharine's courses

Data Pipelines with Python
shop.oreilly.com/product/0636920055334.do
Data Wrangling & Analysis with Python. Learn Pandas
shop.oreilly.com/product/0636920051831.do

Sponsors
Rollbar: rollbar.com/talkpythontome
GoCD: go.cd

