Clean Data
Year of publication: 2015
Author: Megan Squire
Publisher: Packt Publishing
ISBN: 9781785284014
Language: English
Format: ePub
Quality: Born-digital (eBook)
Interactive table of contents: Yes
Number of pages: 268
Description: Is much of your time spent on tedious tasks such as cleaning dirty data, accounting for lost data, and preparing data to be used by others? If so, having the right tools makes a critical difference and will be a great investment as you grow your data science expertise. The book starts by highlighting the importance of data cleaning in data science and shows you how to reap rewards from reforming your cleaning process. Next, you will cement your knowledge of the basic concepts that the rest of the book relies on: file formats, data types, and character encodings. You will also learn, through practical examples, how to extract and clean data stored in RDBMS, web files, and PDF documents. At the end of the book, you will be given a chance to tackle a couple of real-world projects.
Table of contents
1: Why Do You Need Clean Data?
    A fresh perspective
    The data science process
    Communicating about data cleaning
    Our data cleaning environment
    An introductory example
    Summary
2: Fundamentals – Formats, Types, and Encodings
    File formats
    Archiving and compression
    Data types, nulls, and encodings
    Summary
3: Workhorses of Clean Data – Spreadsheets and Text Editors
    Spreadsheet data cleaning
    Text editor data cleaning
    An example project
    Summary
4: Speaking the Lingua Franca – Data Conversions
    Quick tool-based conversions
    Converting with PHP
    Converting with Python
    The example project
    Summary
5: Collecting and Cleaning Data from the Web
    Understanding the HTML page structure
    Method one – Python and regular expressions
    Method two – Python and BeautifulSoup
    Method three – Chrome Scraper
    Example project – Extracting data from e-mail and web forums
    Summary
6: Cleaning Data in PDF Files
    Why is cleaning PDF files difficult?
    Try simple solutions first – copying
    Another technique to try – pdfMiner
    Third choice – Tabula
    When all else fails – the fourth technique
    Summary
7: RDBMS Cleaning Techniques
    Getting ready
    Step one – download and examine Sentiment140
    Step two – clean for database import
    Step three – import the data into MySQL in a single table
    Step four – clean the & character
    Step five – clean other mystery characters
    Step seven – separate user mentions, hashtags, and URLs
    Step eight – cleaning for lookup tables
    Summary
8: Best Practices for Sharing Your Clean Data
    Preparing a clean data package
    Documenting your data
    Setting terms and licenses for your data
    Publicizing your data
    Summary
9: Stack Overflow Project
    Step one – posing a question about Stack Overflow
    Step two – collecting and storing the Stack Overflow data
    Step three – cleaning the data
    Step four – analyzing the data
    Step five – visualizing the data
    Step six – problem resolution
    Moving from test tables to full tables
    Summary
10: Twitter Project
    Step one – posing a question about an archive of tweets
    Step two – collecting the data
    Step three – data cleaning
    Step four – simple data analysis
    Step five – visualizing the data
    Step six – problem resolution
    Moving this process into full (non-test) tables
    Summary