
Episode #90: Data Wrangling with Python

Published Wed, Dec 21, 2016, recorded Mon, Nov 28, 2016.


Do you have a dirty, messy data problem? Whether you work as a software developer or as a data scientist, you've surely run across data that was malformed, incomplete, or maybe even wrong. Don't let messy data wreck your apps or generate wrong results.

What should you do? Listen to this episode of Talk Python To Me with Katharine Jarmul about the book she co-authored called Data Wrangling with Python and her PyCon UK presentation entitled How to Automate your Data Cleanup with Python.

Links from the show:

Katharine on the web: kjamistan.com
Katharine on Twitter: @kjam
Book: Data Wrangling with Python: Tips and Tools to Make Your Life Easier: amzn.to/2fGc0Cx
PyCon 2016: How to Automate your Data Cleanup with Python: youtube.com/watch?v=gp-ngPV_ZX8

Packages from the Data Cleanup talk
Dedupe Python Library: github.com/datamade/dedupe
probablepeople: github.com/datamade/probablepeople
usaddress: github.com/datamade/usaddress
jellyfish: github.com/jamesturk/jellyfish
Fuzzywuzzy: github.com/seatgeek/fuzzywuzzy
scrubadub: github.com/datascopeanalytics/scrubadub
pint: pint.readthedocs.io
arrow: github.com/crsmithdev/arrow
pdftables.six: github.com/vnaydionov/pdftables
Datacleaner: github.com/rhiever/datacleaner
Parserator: github.com/datamade/parserator
Gensim: radimrehurek.com/gensim
Faker: github.com/joke2k/faker
Dask: dask.pydata.org
SpaCy: spacy.io
Airflow: airflow.incubator.apache.org
Luigi: luigi.readthedocs.io
Hypothesis (testing): hypothesis.works

Katharine's courses

Data Pipelines with Python
shop.oreilly.com/product/0636920055334.do
Data Wrangling & Analysis with Python. Learn Pandas
shop.oreilly.com/product/0636920051831.do

Sponsors
Rollbar: rollbar.com/talkpythontome
GoCD: go.cd



Katharine Jarmul
Katharine Jarmul is a Pythonista based in Berlin, Germany, focused on data analysis. She's been writing Python for 8 years and has worked with several startups and larger corporations in her career, doing automation, web development, natural language processing, and data science. She's one of the founding members of PyLadies (in Los Angeles in 2011), and she recently co-authored a book for O'Reilly, Data Wrangling with Python. You can follow her work via Twitter (@kjam) or on her site: kjamistan.com.


Individuals: Support this podcast via Patreon or with a one-time contribution via Square Cash. Corporate sponsorship opportunities are also available.