Avocado, a bright green fruit with a large pit and dark leathery skin, has become fashionable among millennials in America. Avocados are known for their unique flavor, rich texture, and high percentage of healthy fat, which makes them an ideal part of a healthy diet. However, avocados are not cheap on average, which keeps many millennials from achieving their “Avocado Freedom” dream. As an applicant to American master's programs and a fan of avocados, I was keen to conduct an exploratory data analysis of avocado prices, hoping to uncover hidden patterns such as seasonal or cyclical ones. I also evaluated related factors that may influence avocado prices, which could help these millennials find cheaper avocados in a scientific way.
Import the libraries
library(tidyr)
Load the data and return the head of data
df <- read.csv("/Users/yuxuan/Desktop/INT303-Avocado-prediction/avocado-updated-2020.csv")
head(df)
| | date | average_price | total_volume | X4046 | X4225 | X4770 | total_bags | small_bags | large_bags | xlarge_bags | type | year | geography |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2015-01-04 | 1.22 | 40873.28 | 2819.50 | 28287.42 | 49.90 | 9716.46 | 9186.93 | 529.53 | 0 | conventional | 2015 | Albany |
2 | 2015-01-04 | 1.79 | 1373.95 | 57.42 | 153.88 | 0.00 | 1162.65 | 1162.65 | 0.00 | 0 | organic | 2015 | Albany |
3 | 2015-01-04 | 1.00 | 435021.49 | 364302.39 | 23821.16 | 82.15 | 46815.79 | 16707.15 | 30108.64 | 0 | conventional | 2015 | Atlanta |
4 | 2015-01-04 | 1.76 | 3846.69 | 1500.15 | 938.35 | 0.00 | 1408.19 | 1071.35 | 336.84 | 0 | organic | 2015 | Atlanta |
5 | 2015-01-04 | 1.08 | 788025.06 | 53987.31 | 552906.04 | 39995.03 | 141136.68 | 137146.07 | 3990.61 | 0 | conventional | 2015 | Baltimore/Washington |
6 | 2015-01-04 | 1.29 | 19137.28 | 8040.64 | 6557.47 | 657.48 | 3881.69 | 3881.69 | 0.00 | 0 | organic | 2015 | Baltimore/Washington |
Check whether the dataset contains missing values
sum(is.na(df))
The dataset does not contain any missing values.
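A per-column count can be more informative than a single total, because it shows where any gaps sit. The sketch below is a minimal illustration on a small hypothetical data frame: `toy` and its two missing entries are made up for the example and are not taken from the avocado dataset.

```r
# Hypothetical toy data with two deliberately missing entries
toy <- data.frame(
  average_price = c(1.22, NA, 1.00),
  total_volume  = c(40873.28, 1373.95, NA),
  type          = c("conventional", "organic", "conventional")
)

total_missing  <- sum(is.na(toy))      # one number for the whole table
missing_by_col <- colSums(is.na(toy))  # named vector, one count per column

print(total_missing)
print(missing_by_col)
```

On the real data, `colSums(is.na(df))` would be the equivalent call.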
Explore the data and some clarification
Explain the features
- date - The date of the observation
- average_price - The average price of a single avocado
- total_volume - Total number of avocados sold
- year - The year
- type - conventional or organic
- geography - The city or region of the observation
X4046, X4225, and X4770 stand for the PLU (price look-up) codes of Hass avocados:
- Small/Medium Hass Avocado (~3-5oz avocado) | #4046
- Large Hass Avocado (~8-10oz avocado) | #4225
- Extra Large Hass Avocado (~10-15oz avocado) | #4770
Exploratory Data Analysis
Density plot of the price difference between the two avocado types.
levels(factor(df$type))
library(ggplot2)
library(dplyr)
type | average_volume | average_price | volume_percent |
---|---|---|---|
organic | 60127.48 | 1.624 | 3.201% |
conventional | 1818206.02 | 1.158 | 96.799% |
The density plot and the table show that:
- there are two types of avocado: organic and conventional
- organic avocados account for a small share (3.2%) of volume but have a high average price ($1.62)
- conventional avocados account for a large share (96.8%) of volume but have a relatively low average price ($1.16)
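A summary table of this shape can be sketched with `dplyr`. The example below runs on the six rows shown in the head of the data, so its numbers differ from the full-dataset table. The `volume_percent` formula (each type's average volume divided by the sum of the two averages) is an assumption, chosen because it reproduces the table: 60127.48 / (60127.48 + 1818206.02) ≈ 3.201%.

```r
library(dplyr)

# First six rows of the dataset, copied from the head() output
df_head <- data.frame(
  average_price = c(1.22, 1.79, 1.00, 1.76, 1.08, 1.29),
  total_volume  = c(40873.28, 1373.95, 435021.49, 3846.69, 788025.06, 19137.28),
  type          = rep(c("conventional", "organic"), times = 3)
)

type_summary <- df_head %>%
  group_by(type) %>%
  summarise(
    average_volume = mean(total_volume),
    average_price  = mean(average_price)
  ) %>%
  # Assumed definition: share of the summed per-type average volumes
  mutate(volume_percent = 100 * average_volume / sum(average_volume))

print(type_summary)
```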
Compare the volume of each avocado
x4770 <- df$X4770 %>% sum()
[1] "The sum of volume in x4770, x4046, x4225 is: 18181860558.8"
The sum of the three PLU columns differs from the total volume. The likely reason is that there are different kinds of avocados, and this dataset only itemizes the PLU codes of Hass avocados, so the two figures cannot match.
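One way to inspect this gap is to compute it row by row. The sketch below uses only the six rows shown in the head of the data. In these rows the remainder coincides with `total_bags`, which hints that bagged avocados may account for part or all of the difference; this is an observation on six rows, not something verified on the whole file.

```r
# First six rows of the dataset, copied from the head() output
head_rows <- data.frame(
  total_volume = c(40873.28, 1373.95, 435021.49, 3846.69, 788025.06, 19137.28),
  X4046        = c(2819.50, 57.42, 364302.39, 1500.15, 53987.31, 8040.64),
  X4225        = c(28287.42, 153.88, 23821.16, 938.35, 552906.04, 6557.47),
  X4770        = c(49.90, 0.00, 82.15, 0.00, 39995.03, 657.48),
  total_bags   = c(9716.46, 1162.65, 46815.79, 1408.19, 141136.68, 3881.69)
)

# Volume itemized under the three Hass PLU codes, and what is left over
plu_sum <- with(head_rows, X4046 + X4225 + X4770)
gap     <- head_rows$total_volume - plu_sum

print(data.frame(plu_sum, gap, total_bags = head_rows$total_bags))

# Compare the leftover with total_bags, allowing for floating-point noise
gap_matches_bags <- isTRUE(all.equal(gap, head_rows$total_bags, tolerance = 1e-6))
print(gap_matches_bags)
```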
Avocado price from 2015-2020
library(ggplot2)
ggplot(data=df, aes(x=date, y=average_price,col=type))+
- The price of organic avocados is always higher than that of conventional ones
- The prices of both types follow seasonal patterns, which matches the common view
- Whether a relationship exists between volume and price is explored in the following part
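A price-over-time chart of this kind can be sketched as follows. The original plotting code is not fully shown, so `geom_line()` and the axis labels are assumptions, and the `toy` data frame holds made-up weekly prices that exist only to make the sketch runnable. The date conversion matters because `read.csv()` returns dates as text.

```r
library(ggplot2)

# Hypothetical weekly prices, made up purely to make the sketch runnable
toy <- data.frame(
  date          = rep(c("2015-01-04", "2015-01-11", "2015-01-18"), times = 2),
  average_price = c(1.22, 1.24, 1.17, 1.79, 1.77, 1.75),
  type          = rep(c("conventional", "organic"), each = 3)
)

# Convert text dates so ggplot2 draws a continuous time axis
toy$date <- as.Date(toy$date)

p <- ggplot(toy, aes(x = date, y = average_price, col = type)) +
  geom_line() +  # assumed geometry; one line per avocado type
  labs(x = "Date", y = "Average price (USD)", col = "Type")
```

Calling `print(p)` renders the chart. With every region in one data frame, averaging the price per date and type first would avoid a zigzag line.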
Relationship between price and total volume for conventional and organic avocados
Split the data into two categories: conventional and organic
organic <- df %>% select(type,average_price,total_volume,date) %>% filter(type=="organic")
library(tibbletime)
Monthly price of conventional and organic avocados
- To find seasonal patterns, I used the average price and volume in each month
- Figures A and B show the average price in each month
- The blue dashed line marks the minimum value and the red dashed line marks the maximum value
- For conventional avocados, the most expensive month reached about $1.80 and the cheapest month about $0.82
- For organic avocados, the most expensive month reached about $2.10 and the cheapest month about $1.21
- Figures C and D show the volume in each month; the red line shows the trend
- Americans' love of avocados is consistent: the volume of conventional avocados sold continues to grow
- For organic avocados, the volume appears to weaken in 2019-2020, possibly owing to the economic downturn and the COVID-19 pandemic in 2020
- The volume may also follow seasonal patterns, which needs further analysis
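The monthly averages and the minimum and maximum months behind the dashed lines can be sketched with plain `dplyr`. The post loads `tibbletime` for this step; the version below swaps in `format()` plus `group_by()` instead, and the `weekly` data frame contains made-up values for illustration only.

```r
library(dplyr)

# Hypothetical weekly observations for one avocado type (values are made up)
weekly <- data.frame(
  date = as.Date(c("2015-01-04", "2015-01-18", "2015-02-01",
                   "2015-02-15", "2015-03-01", "2015-03-15")),
  average_price = c(1.10, 1.20, 0.90, 1.00, 1.30, 1.40)
)

monthly <- weekly %>%
  mutate(month = format(date, "%Y-%m")) %>%  # e.g. "2015-01"
  group_by(month) %>%
  summarise(avg_price = mean(average_price))

# Months that the minimum and maximum reference lines would mark
cheapest <- monthly$month[which.min(monthly$avg_price)]
priciest <- monthly$month[which.max(monthly$avg_price)]

print(monthly)
```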
Patterns among the years in each month (Autoplot library)
# Process the data into year and month format
Seasonal patterns analysis
Avocado price (both organic and conventional) distribution plot from 2015 to 2020
ggplot(seasonal_df,aes(x=average_price,fill=as.factor(year)))+
- The plot shows the price distribution for each year from 2015 to 2020
- 2017 is the year in which the price distribution most closely resembles a normal distribution
Seasonality patterns
Monthly analysis
conv_patterns_vol <- seasonal_df %>% select(monthabb,total_volume,type) %>% filter(type=="conventional") %>% group_by(monthabb) %>% summarise(avg=mean(total_volume)) %>%
- Overall, the avocado price tends to peak in September and October, and February tends to have the lowest price
- In terms of volume sold, Americans buy the most avocados in February and May and the fewest in November
- Based on my research, avocados tend to ripen in August and September; adding the time for transportation and packaging, some delay before they reach the market is unavoidable
- The plot shows a positive trend in August
- Volume sold and price are negatively correlated at some times, which matches common sense: people like buying things when prices are low
- Meanwhile, price and volume are positively correlated at other times. My hypothesis is that newly harvested avocados are of high quality while previously stored avocados have largely been consumed, leaving a gap between demand and supply
- Volumes increase over time, possibly because of wide publicity.
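The price and volume relationship read from the plots could also be quantified with a correlation coefficient. The sketch below is a minimal illustration: the `monthly` values are made up, and `avg_price` and `avg_vol` are hypothetical column names rather than names from the original code.

```r
# Hypothetical monthly averages, made up for illustration only
monthly <- data.frame(
  monthabb  = factor(month.abb[1:6], levels = month.abb),  # keeps calendar order
  avg_price = c(1.10, 1.00, 1.05, 1.15, 1.20, 1.25),
  avg_vol   = c(1.9, 2.1, 2.0, 1.8, 1.9, 1.7)              # in millions
)

# Pearson correlation between monthly price and volume; a negative value
# means cheaper months tend to be higher-volume months
r <- cor(monthly$avg_price, monthly$avg_vol)
print(r)
```

Computing `r` separately for different stretches of the year would show where the sign flips from negative to positive.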
Seasonal patterns
options(repr.plot.width=10,repr.plot.height=8)
- Spring (months 3-5), Summer (6-8), Autumn (9-11), Winter (12-2)
- Figures A and B show the average price of conventional and organic avocados
- Figures C and D show the volume sold for conventional and organic avocados, in millions (m)
- In summary:
  - it is relatively cheaper to buy avocados in spring or winter
  - avocados sell most in spring and summer, since this is close to the next round of ripening and the market is well supplied
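The season definition above can be written as a small lookup. This is a sketch: `month_to_season` is a hypothetical helper name, not a function from the original code.

```r
# Map a month number (1-12) to the season definition used above
month_to_season <- function(month) {
  stopifnot(all(month %in% 1:12))
  season <- c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
              "Summer", "Summer", "Autumn", "Autumn", "Autumn", "Winter")
  factor(season[month], levels = c("Spring", "Summer", "Autumn", "Winter"))
}

# Example dates, one per season plus December to show the year wrap-around
dates   <- as.Date(c("2015-01-04", "2015-04-05", "2015-07-05",
                     "2015-10-04", "2015-12-06"))
seasons <- month_to_season(as.integer(format(dates, "%m")))
print(as.character(seasons))
```

Returning a factor with fixed levels keeps the seasons in calendar order when they are used for grouping or on a plot axis.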
Find the cities where the avocado price is lowest
library(forcats)
- Compute the six cities or regions where the average avocado price is lowest
- Looking at the map, the top five are close to Mexico, where the avocado originated
- These places are possibly the regions that supply avocados
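The ranking step can be sketched with `dplyr`. The example uses only the six rows from the head of the data, so just three regions appear and both avocado types are averaged together; on the full data it may be worth excluding aggregate regions first, if the `geography` column contains any.

```r
library(dplyr)

# First six rows of the dataset, copied from the head() output
head_rows <- data.frame(
  average_price = c(1.22, 1.79, 1.00, 1.76, 1.08, 1.29),
  geography     = rep(c("Albany", "Atlanta", "Baltimore/Washington"), each = 2)
)

cheapest <- head_rows %>%
  group_by(geography) %>%
  summarise(avg_price = mean(average_price)) %>%
  arrange(avg_price) %>%  # lowest average price first
  head(6)                 # at most six regions; only three exist in this sample

print(cheapest)
```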
Future price prediction
library(prophet)
- Figure A shows the yearly trend: since 2015 the average price kept growing, and 2017 was a crazy year in which the trend rose above 1.3. After 2018 the average price tends to be stable, at about 1.28. The predicted range for the future is about [1.20, 1.34]; considering the pandemic in North America, the likely value is closer to 1.20
- Figure B shows the monthly pattern: the price starts rising in May and keeps rising until October, after which it starts to decrease, in line with the previous analysis
- Figure C shows the weekly pattern: the average price follows ordinary family shopping habits. Families prefer to shop on weekends, so prices on weekends could be higher than on other days
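A forecasting workflow with `prophet` might look roughly like the sketch below. The original fitting code is not shown, so everything here is an assumption: `make_prophet_input` and `fit_and_forecast` are hypothetical helpers, averaging the price per date is one possible aggregation, and the 52-week horizon is arbitrary. Only the data preparation runs on the sample rows; the fit is wrapped in a function because it needs the full history.

```r
# Build the two-column frame prophet expects: `ds` (date) and `y` (value)
make_prophet_input <- function(df) {
  # Assumed aggregation: mean price across regions and types for each date
  per_date <- aggregate(average_price ~ date, data = df, FUN = mean)
  data.frame(ds = as.Date(per_date$date), y = per_date$average_price)
}

# First six rows of the dataset, copied from the head() output
sample_df <- data.frame(
  date          = rep("2015-01-04", 6),
  average_price = c(1.22, 1.79, 1.00, 1.76, 1.08, 1.29)
)
prophet_input <- make_prophet_input(sample_df)
print(prophet_input)

# Not executed here: fit the model and forecast `weeks_ahead` weeks
fit_and_forecast <- function(input, weeks_ahead = 52) {
  m        <- prophet::prophet(input)
  future   <- prophet::make_future_dataframe(m, periods = weeks_ahead, freq = "week")
  forecast <- predict(m, future)
  prophet::prophet_plot_components(m, forecast)  # trend and seasonality panels
  forecast
}
```

With the full dataset loaded as `df`, the call would be `fit_and_forecast(make_prophet_input(df))`.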
Other platforms:
kaggle: https://www.kaggle.com/yuxuanwu17/eda-of-avocado-price-from-2015-to-2020
zhihu (Chinese version): https://zhuanlan.zhihu.com/p/332699778
Github: https://github.com/yuxuanwu17/INT303-Avocado-prediction
- Post title:Explorative data analysis of avocado price from 2015-2020
- Post author:Yuxuan Wu
- Create time:2021-01-25 01:21:26
- Post link:yuxuanwu17.github.io2021/01/25/avocado_pred/
- Copyright Notice:All articles in this blog are licensed under BY-NC-SA unless stating additionally.