Filtering¶
We often only want to look at specific parts of the data. To be able to do so we can filter the data in a few ways.
Masks¶
We can use conditions which will return a Series of whether the column matches or doesn"t match the condition. This boolean Series, a so called filter mask can then be applied to the dataframe to only have the rows that are True (or False if we want that).
import pandas as pd
df_cars = pd.read_csv("data/cars.csv", sep=";", skiprows=[1])
mask = df_cars["Cylinders"] == "4"
mask
0 False 1 False 2 False 3 False 4 False ... 401 False 402 False 403 False 404 False 405 False Name: Cylinders, Length: 406, dtype: bool
df_cars[mask]
Car | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model | Origin |
---|
df_cars[df_cars["Cylinders"] == "4"] # all at once
Car | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model | Origin |
---|
Conditions can also be combined with & as and
, and | as or
.
df_cars[(df_cars["Cylinders"] == "4") & (df_cars["MPG"] == "24.0")]
Car | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model | Origin |
---|
df_cars[(df_cars["MPG"] == "4") | (df_cars["MPG"] == "24.0")]
Car | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model | Origin |
---|
Isin¶
We can check to see if certain values are in a list.
mpgs = ["24.0", "25.0", "30.0"]
df_cars[df_cars["MPG"].isin(mpgs)]
Car | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model | Origin |
---|
Str accessor¶
The functions and methods under the str accessor provide flexible ways to filter rows based on strings.
df_cars[df_cars["Car"].str.startswith("Toyota")]
Car | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model | Origin | |
---|---|---|---|---|---|---|---|---|---|
20 | Toyota Corolla Mark ii | 24.0 | 4 | 113.0 | 95.0 | 2372.0 | 15.0 | 70 | Japan |
37 | Toyota Corolla | 25.0 | 4 | 113.0 | 95.0 | 2228.0 | 14.0 | 71 | Japan |
60 | Toyota Corolla 1200 | 31.0 | 4 | 71.0 | 65.0 | 1773.0 | 19.0 | 71 | Japan |
64 | Toyota Corolla Hardtop | 24.0 | 4 | 113.0 | 95.0 | 2278.0 | 15.5 | 72 | Japan |
89 | Toyota Corolla Mark II (sw) | 23.0 | 4 | 120.0 | 97.0 | 2506.0 | 14.5 | 72 | Japan |
91 | Toyota Corolla 1600 (sw) | 27.0 | 4 | 97.0 | 88.0 | 2100.0 | 16.5 | 72 | Japan |
115 | Toyota Camry | 20.0 | 4 | 97.0 | 88.0 | 2279.0 | 19.0 | 73 | Japan |
130 | Toyota Mark II | 20.0 | 6 | 156.0 | 122.0 | 2807.0 | 13.5 | 73 | Japan |
138 | Toyota Corolla 1200 | 32.0 | 4 | 71.0 | 65.0 | 1836.0 | 21.0 | 74 | Japan |
151 | Toyota Corolla | 31.0 | 4 | 76.0 | 52.0 | 1649.0 | 16.5 | 74 | Japan |
174 | Toyota Corolla | 29.0 | 4 | 97.0 | 75.0 | 2171.0 | 16.0 | 75 | Japan |
178 | Toyota Corolla | 24.0 | 4 | 134.0 | 96.0 | 2702.0 | 13.5 | 75 | Japan |
212 | Toyota Corolla | 28.0 | 4 | 97.0 | 75.0 | 2155.0 | 16.4 | 76 | Japan |
217 | Toyota Mark II | 19.0 | 6 | 156.0 | 108.0 | 2930.0 | 15.5 | 76 | Japan |
242 | Toyota Corolla Liftback | 26.0 | 4 | 97.0 | 75.0 | 2265.0 | 18.2 | 77 | Japan |
274 | Toyota Corolla | 27.5 | 4 | 134.0 | 95.0 | 2560.0 | 14.2 | 78 | Japan |
277 | Toyota Celica GT Liftback | 21.1 | 4 | 134.0 | 95.0 | 2515.0 | 14.8 | 78 | Japan |
317 | Toyota Corolla Tercel | 38.1 | 4 | 89.0 | 60.0 | 1968.0 | 18.8 | 80 | Japan |
325 | Toyota Corolla Liftback | 29.8 | 4 | 134.0 | 90.0 | 2711.0 | 15.5 | 80 | Japan |
328 | Toyota Corolla | 32.2 | 4 | 108.0 | 75.0 | 2265.0 | 15.2 | 80 | Japan |
350 | Toyota Starlet | 39.1 | 4 | 79.0 | 58.0 | 1755.0 | 16.9 | 81 | Japan |
355 | Toyota Tercel | 37.7 | 4 | 89.0 | 62.0 | 2050.0 | 17.3 | 81 | Japan |
363 | Toyota Corolla | 32.4 | 4 | 108.0 | 75.0 | 2350.0 | 16.8 | 81 | Japan |
369 | Toyota Cressida | 25.4 | 6 | 168.0 | 116.0 | 2900.0 | 12.6 | 81 | Japan |
390 | Toyota Corolla | 34.0 | 4 | 108.0 | 70.0 | 2245.0 | 16.9 | 82 | Japan |
398 | Toyota Celica GT | 32.0 | 4 | 144.0 | 96.0 | 2665.0 | 13.9 | 82 | Japan |
df_cars[df_cars["Car"].str.contains("Corolla")]
Car | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model | Origin | |
---|---|---|---|---|---|---|---|---|---|
20 | Toyota Corolla Mark ii | 24.0 | 4 | 113.0 | 95.0 | 2372.0 | 15.0 | 70 | Japan |
37 | Toyota Corolla | 25.0 | 4 | 113.0 | 95.0 | 2228.0 | 14.0 | 71 | Japan |
60 | Toyota Corolla 1200 | 31.0 | 4 | 71.0 | 65.0 | 1773.0 | 19.0 | 71 | Japan |
64 | Toyota Corolla Hardtop | 24.0 | 4 | 113.0 | 95.0 | 2278.0 | 15.5 | 72 | Japan |
89 | Toyota Corolla Mark II (sw) | 23.0 | 4 | 120.0 | 97.0 | 2506.0 | 14.5 | 72 | Japan |
91 | Toyota Corolla 1600 (sw) | 27.0 | 4 | 97.0 | 88.0 | 2100.0 | 16.5 | 72 | Japan |
138 | Toyota Corolla 1200 | 32.0 | 4 | 71.0 | 65.0 | 1836.0 | 21.0 | 74 | Japan |
151 | Toyota Corolla | 31.0 | 4 | 76.0 | 52.0 | 1649.0 | 16.5 | 74 | Japan |
174 | Toyota Corolla | 29.0 | 4 | 97.0 | 75.0 | 2171.0 | 16.0 | 75 | Japan |
178 | Toyota Corolla | 24.0 | 4 | 134.0 | 96.0 | 2702.0 | 13.5 | 75 | Japan |
212 | Toyota Corolla | 28.0 | 4 | 97.0 | 75.0 | 2155.0 | 16.4 | 76 | Japan |
242 | Toyota Corolla Liftback | 26.0 | 4 | 97.0 | 75.0 | 2265.0 | 18.2 | 77 | Japan |
274 | Toyota Corolla | 27.5 | 4 | 134.0 | 95.0 | 2560.0 | 14.2 | 78 | Japan |
317 | Toyota Corolla Tercel | 38.1 | 4 | 89.0 | 60.0 | 1968.0 | 18.8 | 80 | Japan |
325 | Toyota Corolla Liftback | 29.8 | 4 | 134.0 | 90.0 | 2711.0 | 15.5 | 80 | Japan |
328 | Toyota Corolla | 32.2 | 4 | 108.0 | 75.0 | 2265.0 | 15.2 | 80 | Japan |
363 | Toyota Corolla | 32.4 | 4 | 108.0 | 75.0 | 2350.0 | 16.8 | 81 | Japan |
390 | Toyota Corolla | 34.0 | 4 | 108.0 | 70.0 | 2245.0 | 16.9 | 82 | Japan |
Tilde¶
The tilde operator is used for “not” logic in filtering. If we add the tilde operator before the filter expression, the rows that do not fit the condition are returned.
df_cars[~df_cars["Car"].str.contains("Corolla")]
Car | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model | Origin | |
---|---|---|---|---|---|---|---|---|---|
0 | Chevrolet Chevelle Malibu | 18.0 | 8 | 307.0 | 130.0 | 3504.0 | 12.0 | 70 | US |
1 | Buick Skylark 320 | 15.0 | 8 | 350.0 | 165.0 | 3693.0 | 11.5 | 70 | US |
2 | Plymouth Satellite | 18.0 | 8 | 318.0 | 150.0 | 3436.0 | 11.0 | 70 | US |
3 | AMC Rebel SST | 16.0 | 8 | 304.0 | 150.0 | 3433.0 | 12.0 | 70 | US |
4 | Ford Torino | 17.0 | 8 | 302.0 | 140.0 | 3449.0 | 10.5 | 70 | US |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
401 | Ford Mustang GL | 27.0 | 4 | 140.0 | 86.0 | 2790.0 | 15.6 | 82 | US |
402 | Volkswagen Pickup | 44.0 | 4 | 97.0 | 52.0 | 2130.0 | 24.6 | 82 | Europe |
403 | Dodge Rampage | 32.0 | 4 | 135.0 | 84.0 | 2295.0 | 11.6 | 82 | US |
404 | Ford Ranger | 28.0 | 4 | 120.0 | 79.0 | 2625.0 | 18.6 | 82 | US |
405 | Chevy S-10 | 31.0 | 4 | 119.0 | 82.0 | 2720.0 | 19.4 | 82 | US |
388 rows × 9 columns
df_cars[~df_cars["MPG"].isin(mpgs)]
Car | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model | Origin | |
---|---|---|---|---|---|---|---|---|---|
0 | Chevrolet Chevelle Malibu | 18.0 | 8 | 307.0 | 130.0 | 3504.0 | 12.0 | 70 | US |
1 | Buick Skylark 320 | 15.0 | 8 | 350.0 | 165.0 | 3693.0 | 11.5 | 70 | US |
2 | Plymouth Satellite | 18.0 | 8 | 318.0 | 150.0 | 3436.0 | 11.0 | 70 | US |
3 | AMC Rebel SST | 16.0 | 8 | 304.0 | 150.0 | 3433.0 | 12.0 | 70 | US |
4 | Ford Torino | 17.0 | 8 | 302.0 | 140.0 | 3449.0 | 10.5 | 70 | US |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
401 | Ford Mustang GL | 27.0 | 4 | 140.0 | 86.0 | 2790.0 | 15.6 | 82 | US |
402 | Volkswagen Pickup | 44.0 | 4 | 97.0 | 52.0 | 2130.0 | 24.6 | 82 | Europe |
403 | Dodge Rampage | 32.0 | 4 | 135.0 | 84.0 | 2295.0 | 11.6 | 82 | US |
404 | Ford Ranger | 28.0 | 4 | 120.0 | 79.0 | 2625.0 | 18.6 | 82 | US |
405 | Chevy S-10 | 31.0 | 4 | 119.0 | 82.0 | 2720.0 | 19.4 | 82 | US |
406 rows × 9 columns