- Learning pandas (Second Edition)
- Michael Heydt
- 2021-07-02 20:37:12
Alignment via index labels
Alignment of Series data by index labels is a fundamental concept in pandas and one of its most powerful. Alignment automatically correlates related values in multiple Series objects based on their index labels. This saves a lot of error-prone effort that would otherwise go into matching data across multiple sets using standard procedural techniques.
To demonstrate alignment, let's add the values in two Series objects. We start with the following two Series objects, which represent two different samples of a set of variables (a and b):
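The original listing is not reproduced here, so the following is a minimal sketch with assumed values. The names `s1` and `s2` come from the text; the numbers, and the choice to store the labels in a different order in each Series, are illustrative assumptions.

```python
import pandas as pd

# Two samples of the same variables 'a' and 'b'.
# The values are assumed for illustration; note that the
# labels appear in a different order in each Series.
s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([4, 3], index=['b', 'a'])

print(s1)
print(s2)
```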
Now suppose we would like to total the values for each variable. We can express this simply as s1 + s2:
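As a sketch of this addition, assuming the same illustrative values for `s1` and `s2` (the numbers are not from the original listing), the values are paired by label rather than by position:

```python
import pandas as pd

# Assumed sample values; labels are deliberately in a different order.
s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([4, 3], index=['b', 'a'])

# Addition aligns on index labels, not on position.
total = s1 + s2
print(total)
# total['a'] is 1 + 3 = 4 and total['b'] is 2 + 4 = 6
```

Had the values been added by position instead, both results would have been 5, which is why label alignment matters here.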
pandas has matched the measurement for each variable in each Series, added those values, and returned the sum for each, all in one succinct statement.
It is also possible to apply a scalar value to a Series. The result will be that the scalar will be applied to each value in the Series using the specified operation:
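A minimal sketch of a scalar operation, using multiplication by 2 and assumed values for `s1`:

```python
import pandas as pd

# Assumed sample values for illustration.
s1 = pd.Series([1, 2], index=['a', 'b'])

# The scalar is applied to every value in the Series.
doubled = s1 * 2
print(doubled)
# a    2
# b    4
# dtype: int64
```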
Remember that we said earlier we would come back to creating a Series from a scalar value? When performing this type of operation, pandas behaves as if it performed the following actions:
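These actions can be sketched explicitly as follows. The name `scalar_series` and the values are assumptions for illustration, and this is a conceptual equivalent rather than a description of pandas internals.

```python
import pandas as pd

# Assumed sample values for illustration.
s1 = pd.Series([1, 2], index=['a', 'b'])

# Step 1: build a Series from the scalar, reusing the index of s1.
scalar_series = pd.Series(2, index=s1.index)

# Step 2: multiply the two Series; the labels align exactly.
result = s1 * scalar_series
print(result)

# The result matches the direct scalar operation.
print(result.equals(s1 * 2))
```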
The first step is the creation of a Series from the scalar value, but with the index of the target Series. The multiplication is then applied to the aligned values of the two Series objects, which perfectly align because the index is identical.
The labels in the indexes are not required to align. Where alignment does not occur, pandas will return NaN as the result:
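A minimal sketch of partially overlapping labels, assuming a second Series named `s3` with illustrative values. Only the label 'b' appears in both Series:

```python
import pandas as pd

# Assumed sample values; only the label 'b' is shared.
s1 = pd.Series([1, 2], index=['a', 'b'])
s3 = pd.Series([5, 6], index=['b', 'c'])

result = s1 + s3
print(result)
# 'b' is 2 + 5 = 7.0; 'a' and 'c' have no partner and become NaN
```

Because NaN is a floating-point value, the result is a float Series even though both inputs hold integers.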
NaN is, by default, the result of any pandas alignment where an index label in one Series has no match in the other. This is an important characteristic of pandas when compared to NumPy: if labels do not align, no exception is thrown. This helps when some data is missing and that is acceptable. Processing continues, but pandas lets you know there's an issue (though not necessarily a problem) by returning NaN.
Labels in a pandas index do not need to be unique. For duplicated labels, the alignment operation forms a Cartesian product of the matching labels in the two Series. If there are n 'a' labels in the first Series and m 'a' labels in the second, the result will have n*m rows labeled 'a'.
To demonstrate this, let's use the following two Series objects:
Adding the two Series results in six 'a' index labels, and NaN for 'b' and 'c':
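The original listing is not reproduced here. As a sketch, the following assumed Series have two 'a' labels in `s1` and three in `s2`, plus one label each ('b' and 'c') that the other lacks:

```python
import pandas as pd

# Assumed sample values with duplicate index labels.
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'a', 'b'])
s2 = pd.Series([4.0, 5.0, 6.0, 7.0], index=['a', 'a', 'c', 'a'])

print(s1)
print(s2)
```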
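A sketch of the addition, again with assumed values (two 'a' labels in `s1`, three in `s2`), so the Cartesian product yields 2 * 3 = 6 rows labeled 'a':

```python
import pandas as pd

# Assumed sample values with duplicate index labels.
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'a', 'b'])
s2 = pd.Series([4.0, 5.0, 6.0, 7.0], index=['a', 'a', 'c', 'a'])

# Every 'a' in s1 is paired with every 'a' in s2.
result = s1 + s2
print(result)

# Count the rows per label: six for 'a', one each for 'b' and 'c'.
print(result.index.value_counts())
```

The 'b' and 'c' rows are NaN because each of those labels exists in only one of the two Series.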