
Data Wrangling with Python

Episode #90, published Wed, Dec 21, 2016, recorded Mon, Nov 28, 2016.

This episode is carbon neutral.
Do you have a dirty, messy data problem? Whether you work as a software developer or as a data scientist, you've surely run across data that was malformed, incomplete, or maybe even wrong. Don't let messy data wreck your apps or generate wrong results.

What should you do? Listen to this episode of Talk Python To Me with Katharine Jarmul. She discusses the book she co-authored, Data Wrangling with Python, and her PyCon UK presentation, How to Automate your Data Cleanup with Python.

Links from the show:

Katharine on the web: kjamistan.com
Katharine on Twitter: @kjam
Book: Data Wrangling with Python: Tips and Tools to Make Your Life Easier: amzn.to/2fGc0Cx
PyCon UK 2016: How to Automate your Data Cleanup with Python: youtube.com/watch?v=gp-ngPV_ZX8

Packages from the data cleanup talk
Dedupe Python Library: github.com/datamade/dedupe
probablepeople: github.com/datamade/probablepeople
usaddress: github.com/datamade/usaddress
jellyfish: github.com/jamesturk/jellyfish
FuzzyWuzzy: github.com/seatgeek/fuzzywuzzy
scrubadub: github.com/datascopeanalytics/scrubadub
pint: pint.readthedocs.io
arrow: github.com/crsmithdev/arrow
pdftables.six: github.com/vnaydionov/pdftables
datacleaner: github.com/rhiever/datacleaner
Parserator: github.com/datamade/parserator
Gensim: radimrehurek.com/gensim
Faker: github.com/joke2k/faker
Dask: dask.pydata.org
spaCy: spacy.io
Airflow: airflow.incubator.apache.org
Luigi: luigi.readthedocs.io
Hypothesis (testing): hypothesis.works

Katharine's courses

Data Pipelines with Python
Data Wrangling & Analysis with Python: Learn Pandas

Rollbar: rollbar.com/talkpythontome
GoCD: go.cd


Katharine Jarmul
Katharine Jarmul is a Pythonista based in Berlin, Germany, who focuses on data analysis. She has been writing Python for 8 years and has worked with several startups and larger corporations in her career, doing automation, web development, natural language processing, and data science. She is one of the founding members of PyLadies, which started in Los Angeles in 2011, and she recently co-authored Data Wrangling with Python for O'Reilly. You can follow her work on Twitter (@kjam) or on her site: kjamistan.com.