Practical Data Analysis Using Jupyter Notebook

Preface

Welcome, and thank you for taking the time to read this book. Throughout this book, I will take you on a journey through the evolution of data analysis in a very simple and easy-to-understand manner. The book will introduce you to modern tools, such as Jupyter Notebook and various Python libraries, to teach you how to work with data. In the process, you will learn about the many different types of data and how to clean, blend, visualize, and analyze data to gain useful insights.

Data literacy is the ability to read, work with, analyze, and argue with data. Data analysis is the process of cleaning and modeling your data to discover useful information. This book combines both concepts by sharing proven techniques and hands-on examples, so you can learn how to communicate effectively with data.

Complete with hands-on tutorials and real-world examples, this easy-to-follow guide will teach you concepts of data analysis using SQL, Python, and Jupyter Notebook.

Who this book is for

This book is for anyone who is looking to develop their skills to become data-driven, personally and professionally. No prior knowledge of data analysis or programming is required to get started with this book. Anyone looking for a new career working with data will enjoy reading the book.

What this book covers

Chapter 1, Fundamentals of Data Analysis, is a straightforward introduction to what data analysis is and how a blend of traditional and modern techniques can be used for data analysis.

Chapter 2, Overview of Python and Installing Jupyter Notebook, provides an introduction to the Python programming language using an open source data analysis tool called Jupyter Notebook.

Chapter 3, Getting Started with NumPy, is where you will learn about the key functions used for analysis with a powerful Python library named NumPy. You will also explore arrays and matrix data structures.

Chapter 4, Creating Your First pandas DataFrame, teaches you what a pandas DataFrame is and how to create one from different file type sources, such as CSV, JSON, and XML.

Chapter 5, Gathering and Loading Data in Python, shows you how to run SQL SELECT queries from Jupyter Notebook and how to load them into DataFrames.

Chapter 6, Visualizing and Working with Time Series Data, explores the process of making your first data visualization by breaking down the anatomy of a chart. Basic statistics, data lineage, and metadata (data about data) will be explained.

Chapter 7, Exploring, Cleaning, Refining, and Blending Datasets, focuses on essential concepts and numerical skills required to wrangle, sort, and explore a dataset.

Chapter 8, Understanding Joins, Relationships, and Aggregates, delves into constructing high-quality datasets for further analysis. The concepts of joining and summarizing data will be introduced.

Chapter 9, Plotting, Visualization, and Storytelling, continues to teach you how to visualize data by exploring additional chart options, such as histograms and scatterplots, to advance your data literacy and analysis skills.

Chapter 10, Exploring Text Data and Unstructured Data, introduces Natural Language Processing (NLP), which has become a must-have skill in data analysis. This chapter looks at the concepts you'll need to know in order to analyze narrative free text that can provide insights from unstructured data.

Chapter 11, Practical Sentiment Analysis, covers the basics of supervised machine learning, followed by a walk-through of sentiment analysis.

Chapter 12, Bringing It All Together, brings together many of the concepts covered in the book using real-world examples to demonstrate the skills needed to read, work with, analyze, and argue with data.

To get the most out of this book

This book is for anyone who is absolutely new to the field of data analysis. No prior knowledge or experience of working with data or programming is required. The book is a step-by-step guide that walks you through installations and exercises.

Only basic technical acumen is required. The ability to download files, access websites, and install applications on your computer is all that is needed in this regard.

If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packt.com.
  2. Select the Support tab.
  3. Click on Code Downloads.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Practical-Data-Analysis-using-Jupyter-Notebook. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781838826031_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "So, purchase_data.iloc[0] or purchase_data.ix[0] will both return the same results."

A block of code is set as follows:

product_data = {
    'product a': [13, 20, 0, 10],
    'project b': [10, 30, 17, 20],
    'project c': [6, 9, 10, 0]
}

Any command-line input or output is written as follows:

>cd \
>cd projects
>jupyter notebook

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Depending on the OS, such as Linux, a CSV would only include a line feed (LF) and not a carriage return (CR) for each row."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.