Author: Adam G. Dobrakowski
Editor: Zuzanna Kwiatkowska
Since I started working on Machine Learning projects a couple of years ago, I have been building a cheatsheet of the commands I need on a day-to-day basis.
Most of them are used rarely enough that they are hard to remember, yet often enough that I realised I was looking up the same information on Stack Overflow over and over again.
That’s why, in this short article, I would like to share a part of my cheatsheet with you. I hope that it is going to be useful in your work.
Tech Stack
The technologies I use most often in my projects are:
- Jupyter Notebook – for quick data analysis and experiments,
- Visual Studio Code – as an IDE to write Python code,
- Remote repository,
- Linux.
To analyse data quickly and efficiently, I use Python libraries such as Pandas and Matplotlib.
Let’s start with imports!
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import HTML, display
Pandas
1. Show all rows in a table
# turn on
pd.set_option('display.max_rows', None)
# turn off
pd.reset_option('display.max_rows')
or alternatively
with pd.option_context("display.max_rows", 1000):
    display(df)
In my opinion, the second option is better, because the setting is restored automatically when the with block ends, so we don’t have to remember to switch it off. This comes from my own experience: I often forgot about it and crashed my Jupyter notebook when trying to display a large table.
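To see that the context manager really restores the setting, here is a minimal sketch. The 200-row table is a made-up example, and repr(df) stands in for display(df) so that the snippet also runs outside a notebook.

```python
import pandas as pd

# Hypothetical example table, large enough to be truncated by default.
df = pd.DataFrame({"value": range(200)})

default_rows = pd.get_option("display.max_rows")

with pd.option_context("display.max_rows", 1000):
    # Inside the block the option is temporarily raised.
    rows_inside = pd.get_option("display.max_rows")
    full_text = repr(df)  # in a notebook you would call display(df) instead

# After the block the previous value is back, with no manual reset.
rows_after = pd.get_option("display.max_rows")
print(rows_inside, rows_after == default_rows)
```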
2. One-liners to make DataFrame processing easier
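I particularly like three of them. Each one returns a new DataFrame rather than modifying df in place, so assign the result back (df = ...) if you want to keep it.
To change the name of a single column, for example from “A” to “B”, use: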
df.rename(columns={'A': 'B'})
To add a column and automatically fill it with ones:
df.assign(one=1)
To delete a column called “campaign_id”:
df.drop(columns=['campaign_id'])
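Because each of these calls returns a new DataFrame, they can also be chained. The sketch below uses a made-up two-column table to illustrate this; the column values are assumptions for the example only.

```python
import pandas as pd

# Hypothetical input table.
df = pd.DataFrame({"A": [10, 20], "campaign_id": [1, 2]})

result = (
    df.rename(columns={"A": "B"})      # rename column A to B
      .assign(one=1)                   # add a column filled with ones
      .drop(columns=["campaign_id"])   # remove the campaign_id column
)

print(list(result.columns))
print(list(df.columns))  # the original DataFrame is left untouched
```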
3. Merging the DataFrames
If you want to do it row-wise, you can use pd.concat (the older df1.append(df2) was removed in pandas 2.0):
df = pd.concat([df1, df2])
When merging column-wise, you just have to add an additional argument:
pd.concat([df1, df2], axis=1)
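A small sketch of both directions with pd.concat, using made-up tables. The ignore_index=True argument is an addition not used above; it renumbers the rows so the result does not carry duplicate index labels.

```python
import pandas as pd

# Hypothetical example tables.
df1 = pd.DataFrame({"clicks": [1, 2]})
df2 = pd.DataFrame({"clicks": [3, 4]})
df3 = pd.DataFrame({"cost": [0.5, 0.7]})

# Row-wise: stack df2 under df1 and renumber the index 0..3.
rows = pd.concat([df1, df2], ignore_index=True)

# Column-wise: place df3 next to df1, aligned on the index.
cols = pd.concat([df1, df3], axis=1)

print(rows.shape, cols.shape)
```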
4. Converting DataFrame with 2 columns to dictionary
df.set_index('Column1')['Column2'].to_dict()
You can also do the reverse operation and quickly create a DataFrame from a dict, with the dict keys as the index:
pd.DataFrame.from_dict(my_dict, orient='index')
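Here is a round trip on a made-up table, as a sketch. Passing columns=["Column2"] is an addition to the call above; without it, the single value column is simply named 0.

```python
import pandas as pd

# Hypothetical two-column table.
df = pd.DataFrame({"Column1": ["a", "b"], "Column2": [1, 2]})

# DataFrame -> dict: Column1 becomes the keys, Column2 the values.
my_dict = df.set_index("Column1")["Column2"].to_dict()

# dict -> DataFrame: keys become the index, values go into one named column.
back = pd.DataFrame.from_dict(my_dict, orient="index", columns=["Column2"])

print(my_dict)
print(back)
```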
5. Creating additional column with percentage statistics
Imagine you have a table of ad campaigns. For each ad campaign, we know how many clicks it got and on which day. Now, for each day, we want to know how much each campaign contributed to all clicks within this day. Sounds difficult, but we can actually do that in a single line! The key is to group by day only, so that each row is divided by the total for its day:
df['clicks_perc'] = df['clicks'] / df.groupby('day')['clicks'].transform('sum')
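A worked example on made-up numbers, assuming the shares should add up to 1 within each day (so the grouping is by day only, and each campaign appears once per day):

```python
import pandas as pd

# Hypothetical click data: two campaigns over two days.
df = pd.DataFrame({
    "campaign_id": [1, 2, 1, 2],
    "day": ["mon", "mon", "tue", "tue"],
    "clicks": [30, 10, 5, 15],
})

# Total clicks of the day, repeated on every row of that day.
daily_total = df.groupby("day")["clicks"].transform("sum")

# Share of the day's clicks that each campaign contributed.
df["clicks_perc"] = df["clicks"] / daily_total

print(df)
```

On Monday the totals are 30 + 10 = 40, so campaign 1 gets 30 / 40 = 0.75 and campaign 2 gets 0.25.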
6. Plotting 2 variables in a single graph using Pandas
df[['income', 'cost']].plot()
plt.show()
7. Creating a plot with one line per category
Imagine you have a table of ad clicks, measured every hour for each of your websites. How would you create a plot showing the number of ad clicks over time for every website separately? My solution would be to put the hours in the index and the pages in the columns, so that each page becomes its own line:
plot_df = df[['clicks', 'page', 'hour']].set_index(['hour', 'page']).unstack('page')
plot_df.columns = [c for (_, c) in plot_df.columns]
plot_df.plot()
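The reshaping step is easiest to understand on a tiny made-up table. This sketch selects the clicks column before unstacking, which is a small variation that avoids having to flatten the column names afterwards. It assumes each (hour, page) pair occurs only once.

```python
import pandas as pd

# Hypothetical hourly click counts for two pages.
df = pd.DataFrame({
    "page": ["a.com", "a.com", "b.com", "b.com"],
    "hour": [0, 1, 0, 1],
    "clicks": [5, 7, 2, 4],
})

# Rows become hours, columns become pages, values are clicks.
plot_df = df.set_index(["hour", "page"])["clicks"].unstack("page")

print(plot_df)
# Calling plot_df.plot() would now draw one line per page, with hours on the x-axis.
```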
Plots
8. Quickly beautify plots in Matplotlib
plt.rcParams["figure.figsize"] = (20,10)
plt.rcParams["font.size"] = 22
plt.style.use('bmh')
# reset
plt.rcParams.update(plt.rcParamsDefault)
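In the same spirit as option_context in Pandas, Matplotlib offers plt.rc_context, which applies settings only inside a with block. The sketch below is an alternative to the manual reset, not a replacement for it. The Agg backend line is an assumption so that the snippet runs without a display; drop it inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, not needed in Jupyter
import matplotlib.pyplot as plt

default_font_size = plt.rcParams["font.size"]

# Settings apply only while the block is active.
with plt.rc_context({"figure.figsize": (20, 10), "font.size": 22}):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [1, 3, 2])  # made-up data
    size_inside = plt.rcParams["font.size"]
    fig_size = tuple(fig.get_size_inches())

plt.close(fig)

# Outside the block the previous font size is restored automatically.
print(size_inside, plt.rcParams["font.size"] == default_font_size)
```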
9. Add vertical and horizontal reference lines to your plot
For vertical:
plt.axvline(x=0, color='grey', linestyle='-')
And horizontal:
plt.axhline(y=0.0, color='k', linestyle='-')
Jupyter Notebook
10. Using Python code from .py files in the notebook
Imagine you have a directory with two sub-directories: ipython with Jupyter Notebooks and lib with your Python code in .py files. To import from lib inside the notebook, first move the working directory up to the project root:
import os
while 'ipython' in os.getcwd():
    os.chdir("../")
After that, lib can be imported like any other package.
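If you would rather not change the working directory (relative paths to data files keep working that way), a possible alternative is to add the project root to sys.path instead. This is only a sketch: find_project_root is a hypothetical helper, and unlike the loop above it matches a directory named exactly ipython rather than any path containing that text.

```python
import sys
from pathlib import Path


def find_project_root(start: Path, notebook_dir: str = "ipython") -> Path:
    """Walk up from `start` until no path component is named `notebook_dir`."""
    path = start
    while notebook_dir in path.parts:
        path = path.parent
    return path


# Make the project root importable without calling os.chdir.
root = find_project_root(Path.cwd())
if str(root) not in sys.path:
    sys.path.insert(0, str(root))
```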
11. Making the code cells wider
By default, the code cells in Jupyter don’t cover the full width of your browser. If you have a wide monitor, it may be frustrating, especially when you want to analyse tables with a lot of columns. You can change it using:
display(HTML("<style> .container {width: 100% !important; } </style>"))
12. Beautify HTML titles in the notebook
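For styled titles, one possible approach is to build a small piece of HTML and pass it to display(HTML(...)). The sketch below is an assumption about what such a helper could look like: title_html, its default colour and its font size are all hypothetical.

```python
from html import escape


def title_html(text: str, color: str = "#2a6f97", size_px: int = 28) -> str:
    """Return an <h2> element with inline styling (hypothetical helper)."""
    return f'<h2 style="color:{color}; font-size:{size_px}px;">{escape(text)}</h2>'


heading = title_html("Clicks & costs per campaign")
print(heading)
# In a notebook: from IPython.display import HTML, display; display(HTML(heading))
```

Escaping the text keeps characters such as & or < from breaking the markup.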
Terminal and Git
13. Displaying JSON-like format in your terminal
echo '{"a":[2,3]}' | json_pp
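When json_pp is not available, the same pretty-printing can be done from Python with the standard json module. This is a minimal sketch; the indent of 2 spaces is an arbitrary choice.

```python
import json

raw = '{"a":[2,3]}'

# Parse the compact string and serialise it again with indentation.
pretty = json.dumps(json.loads(raw), indent=2, sort_keys=True)
print(pretty)
```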
14. Find the system processes that use your computer memory the most
ps aux --sort=-%mem | head
15. Running Jupyter Notebooks from the terminal
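runipy -o my_notebook.ipynb
runipy has been deprecated in favour of nbconvert. With a current Jupyter installation, the same thing (execute the notebook and overwrite it with the results) can be done with:
jupyter nbconvert --to notebook --execute --inplace my_notebook.ipynb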
16. Choosing a file when you have a merge conflict in Git
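In my opinion, it’s particularly useful when you have a conflict between Jupyter Notebooks, which are hard to merge by hand. Pick one of the two flags: --theirs takes the version from the branch being merged in, --ours keeps the version from your current branch.
git checkout --theirs path/to/file
git checkout --ours path/to/file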
17. Reverting your commit
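Imagine you want to revert several older commits at once, for example the third, fourth and fifth most recent ones. You can pass a range to git revert:
git revert HEAD~5..HEAD~2
The range excludes its start (HEAD~5) and includes its end (HEAD~2), so this reverts HEAD~4, HEAD~3 and HEAD~2. Simply using HEAD~2 would only revert a single commit.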
Conclusions
I hope that some of these commands were new to you and that you’re going to use them!
Do you also have your own command and functions cheat sheet? If so, share your best ones on LinkedIn with us!