The rescale_species function has min and max arguments to stop extreme values exerting undue influence on the geometric mean. We always assumed these would also address problems with zeros in the input data. However, they do not, because min and max are applied after the index values have been rescaled.
# Get the multipliers needed to achieve the index value
multipliers <- index / Data[1,2:ncol(Data)]
# Apply these multipliers
indicator_scaled <- t(t(Data[,2:ncol(Data)]) * multipliers)
# Make values over max == max, and < min == min
indicator_scaled[indicator_scaled < min & !is.na(indicator_scaled)] <- min
indicator_scaled[indicator_scaled > max & !is.na(indicator_scaled)] <- max
When Data==0 in the first year, the multiplier takes the value Inf. The first-year value of indicator_scaled then becomes NaN (0 * Inf), which is not captured by either the min or the max statement because both exclude missing values with !is.na(). Further, all subsequent non-zero years are first rescaled to Inf and then capped at max.
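A minimal reproduction of the behaviour described above, as a sketch only. It assumes Data is a numeric matrix with the year in column 1; the species names and values are made up, and max=10000 is an assumed value for illustration.

```r
# Toy input (hypothetical): column 1 is year, other columns are species
Data <- cbind(year = 2001:2004,
              sp_a = c(2, 4, 0, 8),   # zero inside the series
              sp_b = c(0, 1, 2, 3))   # zero in the first year
index <- 100
min <- 1
max <- 10000                          # assumed value for illustration

# Same steps as the snippet from rescale_species quoted above
multipliers <- index / Data[1, 2:ncol(Data)]            # sp_b gets Inf
indicator_scaled <- t(t(Data[, 2:ncol(Data)]) * multipliers)
indicator_scaled[indicator_scaled < min & !is.na(indicator_scaled)] <- min
indicator_scaled[indicator_scaled > max & !is.na(indicator_scaled)] <- max

print(multipliers)
print(indicator_scaled)
```

With this input, sp_b comes out as NaN in the first year and max in every later year, while the internal zero in sp_a is simply raised to min.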
We can't solve this using the min and max statements, since these values are relative to the index. Using them to test the raw data would have unintended consequences (e.g. for occupancy data, all the input data would be below the default value of min=1). Instead, we need two separate fixes:
Convert zeros at the start of each species' time series to NA
Convert internal zeros to some value that is smaller than any other value in the series.
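One possible shape for the two fixes, applied to a single species' series before rescaling. This is a sketch, not the package's implementation: fix_zeros and its shrink argument are hypothetical names, and replacing internal zeros with half the smallest positive value is just one arbitrary way to get "smaller than any other value in the series".

```r
# Hypothetical helper: not part of the package
fix_zeros <- function(x, shrink = 0.5) {
  positive <- which(!is.na(x) & x > 0)
  # No positive value to anchor on: treat the whole series as missing
  if (length(positive) == 0) return(rep(NA_real_, length(x)))
  first_pos <- positive[1]

  # Fix 1: zeros before the first positive value become NA
  lead_zero <- which(x == 0 & seq_along(x) < first_pos)
  x[lead_zero] <- NA

  # Fix 2: remaining (internal) zeros become a value below the series minimum
  internal_zero <- which(x == 0)
  x[internal_zero] <- shrink * min(x[positive])
  x
}

fix_zeros(c(0, 1, 2, 0, 5))
fix_zeros(c(NA, 0, 1, 2, 0, 5))
```

For a matrix laid out like Data in the snippet above, something like `apply(Data[, -1, drop = FALSE], 2, fix_zeros)` would apply it species by species.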
Yes, I see that a 0 in the first year is going to cause some problems!
Really this should be generalised to say that any time series starting with a 0 is a problem (i.e. 0,1,2,3 is problematic as is NA,NA,0,1). I think if you take this more general approach and change all 'leading zeros' to NAs then the zeros in the middle of sequences are no longer an issue? I believe once all the leading zeros are removed all the mid-series zeros will be multiplied by the multiplication factor (therefore still zero), and maybe capped to the min. So I don't see a problem there, but I might have overlooked something.
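A quick check of that reasoning on a single series, as a sketch. It assumes the series is rescaled relative to its first non-NA value, which is an assumption for illustration and not something the quoted snippet does; the numbers are made up.

```r
index <- 100
min <- 1
x <- c(NA, 2, 0, 4)                    # leading zero already converted to NA

first_obs <- x[which(!is.na(x))[1]]    # first observed value (assumed baseline)
multiplier <- index / first_obs        # finite, because first_obs is not 0
scaled <- x * multiplier               # the mid-series zero stays at 0
scaled[scaled < min & !is.na(scaled)] <- min   # and is then capped to min
print(scaled)
```

Under that assumption the mid-series zero ends up at min and nothing becomes Inf or NaN.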
If you have a leading zero in, e.g., a butterfly monitoring dataset for TRIM analysis, then there must be prior information available; otherwise the value would be NA until the first observation has been made. That is, the 0 means there was information saying that the species was there but wasn't found; I am not sure what the model assumptions say about situations like that. My understanding is that leading zeros are not possible before something has been observed, so 0,1,2,0,5 and NA,0,1,2,0,5 would become NA,1,2,0,5 and NA,NA,1,2,0,5 respectively. The zero inside the series is meaningful, though.
Thanks both.
I agree with your logic and suggestion for a fix, @AugustT. @larspett: I agree with you that leading zeros are typically informative. In the application I'm working on, leading zeros arise because I have set the baseline year for the multispecies indicator to be later than the start date for some of the contributing species. Specifically, I'm combining data from multiple taxonomic groups, and I want the indicator to start at the first year in which all groups are present (thus discarding all data prior to this point). It so happens that some species have a true zero in the baseline year I've chosen. I'm taking the path of least resistance by switching those zeros to NA, but I accept that smarter solutions could potentially be developed.
@JackHHatfield91 spotted this.