- Learning pandas (Second Edition)
- Michael Heydt
- 544 words
- 2021-07-02 20:37:12
Slicing a Series into subsets
pandas Series support a feature called slicing. Slicing is a powerful way to retrieve subsets of data from a pandas object. Through slicing, we can select data based upon position or index labels and have greater control over the sequencing of the items that result (forwards or reverse) and the interval (every item, every other).
Slicing overloads the normal array [] operator (and also .loc[], .iloc[], and .ix[], the last of which was deprecated and then removed in pandas 1.0) to accept a slice object. A slice object is created using the syntax start:end:step, where start is the first item, end marks where the slice stops, and step is the increment between selected items.
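To make the slice object concrete, here is a minimal sketch showing that the start:end:step shorthand and an explicitly constructed Python `slice` select the same items. The Series contents are hypothetical sample data, and `.iloc[]` is used so the selection is unambiguously positional.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
s = pd.Series([10, 20, 30, 40, 50, 60, 70])

# start:end:step inside the brackets is shorthand for a slice object
explicit = s.iloc[slice(1, 6, 2)]
shorthand = s.iloc[1:6:2]

print(explicit.equals(shorthand))  # True
print(shorthand.tolist())          # [20, 40, 60]
```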
Each component of the slice is optional, which makes it convenient to select everything from the beginning, everything up to the end, or the entire Series simply by omitting a component of the slice specifier.
To start demonstrating slicing, we will use the following Series:
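The original listing is not reproduced here. As a stand-in, the sketch below builds a ten-item Series whose index labels (10 to 19) deliberately do not start at 0, so that positions and labels cannot be confused. Both the values and the labels are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Assumed sample data: values 100..109 with integer labels 10..19
s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))
print(s)
```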
We can select consecutive items using start:end for the slice. The following selects the five items in positions 1 through 5 in the Series. Since we did not specify a step component, it defaults to 1. Also note that the item at the end position is not included in the result:
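A minimal sketch of a start:end slice, assuming a sample Series with values 100 to 109 and labels 10 to 19 (illustrative data, not the original listing). `.iloc[]` keeps the selection positional regardless of the index type.

```python
import numpy as np
import pandas as pd

# Assumed sample data: values 100..109 with integer labels 10..19
s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

# Positions 1 through 5; position 6 is the end and is excluded
subset = s.iloc[1:6]
print(subset.tolist())        # [101, 102, 103, 104, 105]
print(subset.index.tolist())  # [11, 12, 13, 14, 15]
```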
This result is roughly equivalent to the following:
It is only roughly equivalent because this use of .iloc[], with an explicit list of positions, returns a copy of the data in the source, whereas a slice is a reference to the data in the source. Modifying the contents of the resulting slice will therefore affect the source Series (in pandas versions where Copy-on-Write is not enabled). We will examine this process further in a later section on modifying Series data in place.
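The comparison can be sketched as follows, again on an assumed sample Series. The example only checks that both forms select the same items. It does not demonstrate the copy-versus-reference difference, since that behavior depends on the pandas version and its Copy-on-Write setting.

```python
import numpy as np
import pandas as pd

# Assumed sample data: values 100..109 with integer labels 10..19
s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

by_slice = s.iloc[1:6]              # slice of positions 1..5
by_list = s.iloc[[1, 2, 3, 4, 5]]   # explicit list of the same positions

# Same labels and values either way
print(by_slice.equals(by_list))  # True
```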
A slice can return every other item by specifying a step of 2:
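As an illustration on assumed sample data, a step of 2 over positions 1 through 5 picks positions 1, 3, and 5:

```python
import numpy as np
import pandas as pd

# Assumed sample data: values 100..109 with integer labels 10..19
s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

every_other = s.iloc[1:6:2]
print(every_other.tolist())  # [101, 103, 105]
```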
As stated earlier, each component of the slice is optional. If the start component is omitted, the results will start at the first item. As an example, the following is a shorthand for .head():
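A short sketch of the omitted-start form, using an assumed sample Series. Because `.head()` returns the first five items by default, an end of 5 gives the same result:

```python
import numpy as np
import pandas as pd

# Assumed sample data: values 100..109 with integer labels 10..19
s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

first_five = s.iloc[:5]
print(first_five.equals(s.head()))  # True
```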
All items at and after a specific position can be selected by specifying the start component and omitting the end. The following selects all items, starting with the 4th:
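The omitted-end form might look like the sketch below, on an assumed sample Series. Positions are zero-based, so the 4th item sits at position 3. The choice of that start value is an assumption of this example.

```python
import numpy as np
import pandas as pd

# Assumed sample data: values 100..109 with integer labels 10..19
s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

# Zero-based position 3 is the 4th item; omitting the end runs to the last item
from_fourth = s.iloc[3:]
print(from_fourth.tolist())  # [103, 104, 105, 106, 107, 108, 109]
```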
A step can also be used in both of the two previous scenarios to skip over items:
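Both cases combined with a step, sketched on assumed sample data (the specific start, end, and step values are illustrative choices):

```python
import numpy as np
import pandas as pd

# Assumed sample data: values 100..109 with integer labels 10..19
s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

# Every other item among the first five (positions 0, 2, 4)
head_step = s.iloc[:5:2]
print(head_step.tolist())  # [100, 102, 104]

# Every other item from position 3 to the end (positions 3, 5, 7, 9)
tail_step = s.iloc[3::2]
print(tail_step.tolist())  # [103, 105, 107, 109]
```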
Use of a negative step value will reverse the result. The following demonstrates how to reverse the Series:
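A minimal sketch of reversal with a step of -1 and both start and end omitted, on an assumed sample Series. The index labels travel with their values:

```python
import numpy as np
import pandas as pd

# Assumed sample data: values 100..109 with integer labels 10..19
s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

reversed_s = s.iloc[::-1]
print(reversed_s.tolist())        # [109, 108, 107, 106, 105, 104, 103, 102, 101, 100]
print(reversed_s.index.tolist())  # [19, 18, 17, 16, 15, 14, 13, 12, 11, 10]
```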
A step of -2 will return every other item from the start position, working towards the beginning of the Series in reverse order. The following example returns every other item before and including the item at position 4:
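On an assumed sample Series, starting at position 4 with a step of -2 visits positions 4, 2, and 0 in that order:

```python
import numpy as np
import pandas as pd

# Assumed sample data: values 100..109 with integer labels 10..19
s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

backwards = s.iloc[4::-2]
print(backwards.tolist())  # [104, 102, 100]
```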
Negative values for the start and end of a slice have special meaning: they count from the end of the Series. A negative start value of -n, with the end omitted, selects the last n items:
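For instance, with n = 4 on an assumed sample Series (the value of n is an illustrative choice):

```python
import numpy as np
import pandas as pd

# Assumed sample data: values 100..109 with integer labels 10..19
s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

last_four = s.iloc[-4:]
print(last_four.tolist())  # [106, 107, 108, 109]
```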
A negative end value of -n, with the start omitted, will return all but the last n items:
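A sketch with n = 4 on an assumed ten-item Series, which leaves the first six items:

```python
import numpy as np
import pandas as pd

# Assumed sample data: values 100..109 with integer labels 10..19
s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

all_but_last_four = s.iloc[:-4]
print(all_but_last_four.tolist())  # [100, 101, 102, 103, 104, 105]
```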
Negative start and end components can be combined. The following takes the last four items and drops the final one, leaving the first three of those four:
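The combination can be sketched on an assumed sample Series as a single slice, which gives the same result as slicing in two steps:

```python
import numpy as np
import pandas as pd

# Assumed sample data: values 100..109 with integer labels 10..19
s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

combined = s.iloc[-4:-1]
print(combined.tolist())  # [106, 107, 108]

# Equivalent two-step form: last four, then all but the last one
two_step = s.iloc[-4:].iloc[:-1]
print(combined.equals(two_step))  # True
```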
It is also possible to slice a Series with a non-integer index. To demonstrate, let's use the following Series:
Using this Series, slicing with integer values will extract items based on position (as before):
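As a stand-in for the original listing, the sketch below assumes a five-item Series labeled 'a' through 'e' and slices it by position. The name `letters` and its contents are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed sample data: values 0..4 with string labels 'a'..'e'
letters = pd.Series(np.arange(0, 5), index=['a', 'b', 'c', 'd', 'e'])

# Integer slice components select by position: positions 1 and 2
by_position = letters.iloc[1:3]
print(by_position.index.tolist())  # ['b', 'c']
print(by_position.tolist())        # [1, 2]
```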
However, when the slice components are non-integer values, pandas treats them as index labels and selects the items between those labels. Unlike a positional slice, a label-based slice includes the end label. As an example, the following slices from 'b' through 'd', inclusive:
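A minimal sketch of a label-based slice, assuming a hypothetical five-item Series labeled 'a' through 'e'. With string labels, `[]` and `.loc[]` give the same result here, and 'd' is part of the output:

```python
import numpy as np
import pandas as pd

# Assumed sample data: values 0..4 with string labels 'a'..'e'
letters = pd.Series(np.arange(0, 5), index=['a', 'b', 'c', 'd', 'e'])

by_label = letters['b':'d']
print(by_label.index.tolist())  # ['b', 'c', 'd']
print(by_label.tolist())        # [1, 2, 3]

# .loc[] makes the label-based intent explicit
print(by_label.equals(letters.loc['b':'d']))  # True
```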