Introduction to PHEindicatormethods

Georgina Anderson

2020-01-13

Introduction

This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.

The following packages must be installed and loaded if not already available

Package functions

This vignette covers the following functions available within the first release of the package (v1.0.8) but has been updated to apply to these functions in their latest release versions. If further functions are added to the package in future releases these will be explained elsewhere.

Function Type Description
phe_proportion Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_rate Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_mean Aggregate Performs a calculation on each grouping set
phe_dsr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
phe_smr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
phe_isr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs

Non-aggregate functions

Create some test data for the non-aggregate functions

The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:

Execute phe_proportion and phe_rate

INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.

OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified

# default proportion
phe_proportion(df, obs, pop)
#>    area year obs pop     value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015  19 133 0.1428571 0.09340234 0.2123636        95% proportion of 1
#> 2 Area2 2015  22 130 0.1692308 0.11449563 0.2429531        95% proportion of 1
#> 3 Area3 2015  54 126 0.4285714 0.34553800 0.5158314        95% proportion of 1
#> 4 Area4 2015  95 177 0.5367232 0.46326231 0.6086239        95% proportion of 1
#> 5 Area1 2016  77 152 0.5065789 0.42792215 0.5849114        95% proportion of 1
#> 6 Area2 2016  69 162 0.4259259 0.35236420 0.5029193        95% proportion of 1
#> 7 Area3 2016  81 179 0.4525140 0.38135857 0.5256647        95% proportion of 1
#> 8 Area4 2016  98 136 0.7205882 0.63991250 0.7891448        95% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence=99.8)
#>    area year obs pop     value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015  19 133 0.1428571 0.07310538 0.2604596      99.8% proportion of 1
#> 2 Area2 2015  22 130 0.1692308 0.09120193 0.2925295      99.8% proportion of 1
#> 3 Area3 2015  54 126 0.4285714 0.30215585 0.5650514      99.8% proportion of 1
#> 4 Area4 2015  95 177 0.5367232 0.42200675 0.6476798      99.8% proportion of 1
#> 5 Area1 2016  77 152 0.5065789 0.38463503 0.6277451      99.8% proportion of 1
#> 6 Area2 2016  69 162 0.4259259 0.31330969 0.5467890      99.8% proportion of 1
#> 7 Area3 2016  81 179 0.4525140 0.34287697 0.5669610      99.8% proportion of 1
#> 8 Area4 2016  98 136 0.7205882 0.59027293 0.8219579      99.8% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify to output proportions as percentages
phe_proportion(df, obs, pop, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence  statistic method
#> 1 Area1 2015  19 133 14.28571  9.340234 21.23636        95% percentage Wilson
#> 2 Area2 2015  22 130 16.92308 11.449563 24.29531        95% percentage Wilson
#> 3 Area3 2015  54 126 42.85714 34.553800 51.58314        95% percentage Wilson
#> 4 Area4 2015  95 177 53.67232 46.326231 60.86239        95% percentage Wilson
#> 5 Area1 2016  77 152 50.65789 42.792215 58.49114        95% percentage Wilson
#> 6 Area2 2016  69 162 42.59259 35.236420 50.29193        95% percentage Wilson
#> 7 Area3 2016  81 179 45.25140 38.135857 52.56647        95% percentage Wilson
#> 8 Area4 2016  98 136 72.05882 63.991250 78.91448        95% percentage Wilson

# specify level of detail to output for proportion
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence  statistic method
#> 1 Area1 2015  19 133 14.28571  7.310538 26.04596      99.8% percentage Wilson
#> 2 Area2 2015  22 130 16.92308  9.120193 29.25295      99.8% percentage Wilson
#> 3 Area3 2015  54 126 42.85714 30.215585 56.50514      99.8% percentage Wilson
#> 4 Area4 2015  95 177 53.67232 42.200675 64.76798      99.8% percentage Wilson
#> 5 Area1 2016  77 152 50.65789 38.463503 62.77451      99.8% percentage Wilson
#> 6 Area2 2016  69 162 42.59259 31.330969 54.67890      99.8% percentage Wilson
#> 7 Area3 2016  81 179 45.25140 34.287697 56.69610      99.8% percentage Wilson
#> 8 Area4 2016  98 136 72.05882 59.027293 82.19579      99.8% percentage Wilson

# specify level of detail to output for proportion and remove metadata columns
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100, type="standard")
#>    area year obs pop    value   lowercl  uppercl
#> 1 Area1 2015  19 133 14.28571  7.310538 26.04596
#> 2 Area2 2015  22 130 16.92308  9.120193 29.25295
#> 3 Area3 2015  54 126 42.85714 30.215585 56.50514
#> 4 Area4 2015  95 177 53.67232 42.200675 64.76798
#> 5 Area1 2016  77 152 50.65789 38.463503 62.77451
#> 6 Area2 2016  69 162 42.59259 31.330969 54.67890
#> 7 Area3 2016  81 179 45.25140 34.287697 56.69610
#> 8 Area4 2016  98 136 72.05882 59.027293 82.19579

# default rate
phe_rate(df, obs, pop)
#>    area year obs pop    value   lowercl  uppercl confidence       statistic
#> 1 Area1 2015  19 133 14285.71  8596.978 22310.03        95% rate per 100000
#> 2 Area2 2015  22 130 16923.08 10601.917 25622.91        95% rate per 100000
#> 3 Area3 2015  54 126 42857.14 32193.437 55920.36        95% rate per 100000
#> 4 Area4 2015  95 177 53672.32 43423.011 65612.28        95% rate per 100000
#> 5 Area1 2016  77 152 50657.89 39976.969 63314.49        95% rate per 100000
#> 6 Area2 2016  69 162 42592.59 33138.127 53904.44        95% rate per 100000
#> 7 Area3 2016  81 179 45251.40 35934.947 56244.04        95% rate per 100000
#> 8 Area4 2016  98 136 72058.82 58499.433 87817.47        95% rate per 100000
#>   method
#> 1  Byars
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters
phe_rate(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence    statistic method
#> 1 Area1 2015  19 133 14.28571  6.217672 27.62798      99.8% rate per 100  Byars
#> 2 Area2 2015  22 130 16.92308  7.887686 31.33951      99.8% rate per 100  Byars
#> 3 Area3 2015  54 126 42.85714 27.047717 64.14027      99.8% rate per 100  Byars
#> 4 Area4 2015  95 177 53.67232 38.240074 72.97515      99.8% rate per 100  Byars
#> 5 Area1 2016  77 152 50.65789 34.659672 71.17371      99.8% rate per 100  Byars
#> 6 Area2 2016  69 162 42.59259 28.473385 60.95598      99.8% rate per 100  Byars
#> 7 Area3 2016  81 179 45.25140 31.278577 63.05818      99.8% rate per 100  Byars
#> 8 Area4 2016  98 136 72.05882 51.627792 97.52569      99.8% rate per 100  Byars

# specify rate parameters and reduce columns output and remove metadata columns
phe_rate(df, obs, pop, type="standard", confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl
#> 1 Area1 2015  19 133 14.28571  6.217672 27.62798
#> 2 Area2 2015  22 130 16.92308  7.887686 31.33951
#> 3 Area3 2015  54 126 42.85714 27.047717 64.14027
#> 4 Area4 2015  95 177 53.67232 38.240074 72.97515
#> 5 Area1 2016  77 152 50.65789 34.659672 71.17371
#> 6 Area2 2016  69 162 42.59259 28.473385 60.95598
#> 7 Area3 2016  81 179 45.25140 31.278577 63.05818
#> 8 Area4 2016  98 136 72.05882 51.627792 97.52569

These functions can also return aggregate data if the input dataframes are grouped:



Aggregate functions

The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.

The df test data generated earlier can be used to demonstrate phe_mean:

Execute phe_mean

INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified

Standardised Aggregate functions

Create some test data for the standardised aggregate functions

The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:

Execute phe_dsr

INPUT: The minimum input requirement for the phe_dsr function is a single data frame with columns representing the numerators and denominators for each standardisation category. This is sufficient if the data is:

The 2013 European Standard Population is provided within the package in vector form (esp2013) and is used by default by this function. Alternative standard populations can be used but must be provided by the user. When the function joins a standard population vector to the input data frame it does this by position so it is important that the data is sorted accordingly. This is a user responsibility.

The function can also accept standard populations provided as a column within the input data frame.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If standard populations are being provided as a column within the input data frame then the user must specify this using the stdpoptype argument as the function expects a vector by default. The function also accepts additional arguments to specify the standard populations, the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_dsr function and the arguments that can optionally be specified

# calculate separate dsrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop)
#> # A tibble: 40 x 12
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop standardised_pop value lowercl
#>    <fct> <int> <fct>       <int>     <int>            <dbl> <dbl>   <dbl>
#>  1 Area1  2006 Fema~        1902    280649             760.  760.    724.
#>  2 Area1  2006 Male         2238    284526             776.  776.    742.
#>  3 Area1  2007 Fema~        1939    283938             714.  714.    681.
#>  4 Area1  2007 Male         1427    286021             533.  533.    504.
#>  5 Area1  2008 Fema~        1901    270756             782.  782.    744.
#>  6 Area1  2008 Male         1404    277536             547.  547.    517.
#>  7 Area1  2009 Fema~        1813    289587             731.  731.    696.
#>  8 Area1  2009 Male         2510    266885             973.  973.    932.
#>  9 Area1  2010 Fema~        1724    275556             683.  683.    650.
#> 10 Area1  2010 Male         2273    280717             768.  768.    734.
#> # ... with 30 more rows, and 4 more variables: uppercl <dbl>, confidence <chr>,
#> #   statistic <chr>, method <chr>

# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, type="standard")
#> # A tibble: 40 x 8
#> # Groups:   area, year [20]
#>    area   year sex    total_count total_pop value lowercl uppercl
#>    <fct> <int> <fct>        <int>     <int> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female        1902    280649  760.    724.    797.
#>  2 Area1  2006 Male          2238    284526  776.    742.    812.
#>  3 Area1  2007 Female        1939    283938  714.    681.    748.
#>  4 Area1  2007 Male          1427    286021  533.    504.    564.
#>  5 Area1  2008 Female        1901    270756  782.    744.    821.
#>  6 Area1  2008 Male          1404    277536  547.    517.    579.
#>  7 Area1  2009 Female        1813    289587  731.    696.    766.
#>  8 Area1  2009 Male          2510    266885  973.    932.   1014.
#>  9 Area1  2010 Female        1724    275556  683.    650.    717.
#> 10 Area1  2010 Male          2273    280717  768.    734.    803.
#> # ... with 30 more rows

# calculate same specifying standard population in vector form
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013)
#> # A tibble: 40 x 12
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop standardised_pop value lowercl
#>    <fct> <int> <fct>       <int>     <int>            <dbl> <dbl>   <dbl>
#>  1 Area1  2006 Fema~        1902    280649             760.  760.    724.
#>  2 Area1  2006 Male         2238    284526             776.  776.    742.
#>  3 Area1  2007 Fema~        1939    283938             714.  714.    681.
#>  4 Area1  2007 Male         1427    286021             533.  533.    504.
#>  5 Area1  2008 Fema~        1901    270756             782.  782.    744.
#>  6 Area1  2008 Male         1404    277536             547.  547.    517.
#>  7 Area1  2009 Fema~        1813    289587             731.  731.    696.
#>  8 Area1  2009 Male         2510    266885             973.  973.    932.
#>  9 Area1  2010 Fema~        1724    275556             683.  683.    650.
#> 10 Area1  2010 Male         2273    280717             768.  768.    734.
#> # ... with 30 more rows, and 4 more variables: uppercl <dbl>, confidence <chr>,
#> #   statistic <chr>, method <chr>

# calculate the same dsrs by appending the standard populations to the data frame
df_std %>%
    mutate(refpop = rep(esp2013,40)) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs,pop, stdpop=refpop, stdpoptype="field")
#> # A tibble: 40 x 12
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop standardised_pop value lowercl
#>    <fct> <int> <fct>       <int>     <int>            <dbl> <dbl>   <dbl>
#>  1 Area1  2006 Fema~        1902    280649             760.  760.    724.
#>  2 Area1  2006 Male         2238    284526             776.  776.    742.
#>  3 Area1  2007 Fema~        1939    283938             714.  714.    681.
#>  4 Area1  2007 Male         1427    286021             533.  533.    504.
#>  5 Area1  2008 Fema~        1901    270756             782.  782.    744.
#>  6 Area1  2008 Male         1404    277536             547.  547.    517.
#>  7 Area1  2009 Fema~        1813    289587             731.  731.    696.
#>  8 Area1  2009 Male         2510    266885             973.  973.    932.
#>  9 Area1  2010 Fema~        1724    275556             683.  683.    650.
#> 10 Area1  2010 Male         2273    280717             768.  768.    734.
#> # ... with 30 more rows, and 4 more variables: uppercl <dbl>, confidence <chr>,
#> #   statistic <chr>, method <chr>

# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
    filter(ageband <= 70) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013[1:15])
#> # A tibble: 40 x 12
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop standardised_pop value lowercl
#>    <fct> <int> <fct>       <int>     <int>            <dbl> <dbl>   <dbl>
#>  1 Area1  2006 Fema~        1570    222016             713.  783.    744.
#>  2 Area1  2006 Male         1624    228497             672.  739.    703.
#>  3 Area1  2007 Fema~        1510    232395             626.  687.    652.
#>  4 Area1  2007 Male         1066    220432             470.  517.    486.
#>  5 Area1  2008 Fema~        1500    215180             719.  790.    749.
#>  6 Area1  2008 Male         1158    221122             503.  552.    520.
#>  7 Area1  2009 Fema~        1636    226142             698.  767.    730.
#>  8 Area1  2009 Male         1907    213639             872.  959.    915.
#>  9 Area1  2010 Fema~        1429    217487             617.  678.    643.
#> 10 Area1  2010 Male         1729    221139             703.  773.    736.
#> # ... with 30 more rows, and 4 more variables: uppercl <dbl>, confidence <chr>,
#> #   statistic <chr>, method <chr>
    
# calculate separate dsrs for persons for each area and year)
df_std %>%
    group_by(area, year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop)) %>%
    group_by(area, year) %>%
    phe_dsr(obs,pop)
#> # A tibble: 20 x 11
#> # Groups:   area [4]
#>    area   year total_count total_pop standardised_pop value lowercl uppercl
#>    <fct> <int>       <int>     <int>            <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006        4140    565175             730.  730.    707.    754.
#>  2 Area1  2007        3366    569959             614.  614.    593.    636.
#>  3 Area1  2008        3305    548292             645.  645.    622.    669.
#>  4 Area1  2009        4323    556472             822.  822.    797.    849.
#>  5 Area1  2010        3997    556273             712.  712.    689.    735.
#>  6 Area2  2006        4309    583335             752.  752.    728.    777.
#>  7 Area2  2007        3770    545306             682.  682.    659.    705.
#>  8 Area2  2008        4131    558671             742.  742.    718.    767.
#>  9 Area2  2009        4093    576157             707.  707.    684.    731.
#> 10 Area2  2010        4049    583252             698.  698.    675.    721.
#> 11 Area3  2006        3519    513099             695.  695.    671.    720.
#> 12 Area3  2007        4183    585328             742.  742.    718.    765.
#> 13 Area3  2008        3523    561450             637.  637.    615.    660.
#> 14 Area3  2009        3979    572040             694.  694.    671.    717.
#> 15 Area3  2010        4219    573184             750.  750.    726.    774.
#> 16 Area4  2006        3205    557733             613.  613.    591.    635.
#> 17 Area4  2007        3631    605148             602.  602.    581.    623.
#> 18 Area4  2008        3908    543962             780.  780.    754.    806.
#> 19 Area4  2009        3750    581381             670.  670.    647.    692.
#> 20 Area4  2010        3506    581500             617.  617.    595.    639.
#> # ... with 3 more variables: confidence <chr>, statistic <chr>, method <chr>

Execute phe_smr and phe_isr

INPUT: Unlike the phe_dsr function, there is no default standard or reference data for the phe_smr and phe_isr functions. These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.

The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:

OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (isr only), the smr or isr, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:

Here are some example code chunks to demonstrate the phe_smr function and the arguments that can optionally be specified

# calculate separate smrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_smr(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   observed expected value lowercl uppercl confidence
#>    <fct> <int> <fct>    <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~     1902    1904. 0.999   0.955   1.05  95%       
#>  2 Area1  2006 Male      2238    1952. 1.15    1.10    1.19  95%       
#>  3 Area1  2007 Fema~     1939    1965. 0.987   0.943   1.03  95%       
#>  4 Area1  2007 Male      1427    1969. 0.725   0.687   0.763 95%       
#>  5 Area1  2008 Fema~     1901    1849. 1.03    0.982   1.08  95%       
#>  6 Area1  2008 Male      1404    1882. 0.746   0.708   0.786 95%       
#>  7 Area1  2009 Fema~     1813    2010. 0.902   0.861   0.945 95%       
#>  8 Area1  2009 Male      2510    1810. 1.39    1.33    1.44  95%       
#>  9 Area1  2010 Fema~     1724    1886. 0.914   0.872   0.958 95%       
#> 10 Area1  2010 Male      2273    1935. 1.17    1.13    1.22  95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate the same smrs by appending the reference data to the data frame
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    phe_smr(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   observed expected value lowercl uppercl confidence
#>    <fct> <int> <fct>    <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~     1902    1904. 0.999   0.955   1.05  95%       
#>  2 Area1  2006 Male      2238    1952. 1.15    1.10    1.19  95%       
#>  3 Area1  2007 Fema~     1939    1965. 0.987   0.943   1.03  95%       
#>  4 Area1  2007 Male      1427    1969. 0.725   0.687   0.763 95%       
#>  5 Area1  2008 Fema~     1901    1849. 1.03    0.982   1.08  95%       
#>  6 Area1  2008 Male      1404    1882. 0.746   0.708   0.786 95%       
#>  7 Area1  2009 Fema~     1813    2010. 0.902   0.861   0.945 95%       
#>  8 Area1  2009 Male      2510    1810. 1.39    1.33    1.44  95%       
#>  9 Area1  2010 Fema~     1724    1886. 0.914   0.872   0.958 95%       
#> 10 Area1  2010 Male      2273    1935. 1.17    1.13    1.22  95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate separate smrs for each year and drop metadata columns from output
df_std %>%
    group_by(year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop)) %>%
    group_by(year) %>%
    phe_smr(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 x 6
#>    year observed expected value lowercl uppercl
#>   <int>    <int>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2006    15173   15173  1       0.984   1.02 
#> 2  2007    14950   15864. 0.942   0.927   0.958
#> 3  2008    14867   15145. 0.982   0.966   0.998
#> 4  2009    16145   15687. 1.03    1.01    1.05 
#> 5  2010    15771   15758. 1.00    0.985   1.02

The phe_isr function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:

# calculate separate isrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_isr(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 x 12
#> # Groups:   area, year [20]
#>    area   year sex   observed expected ref_rate value lowercl uppercl confidence
#>    <fct> <int> <fct>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~     1902    1904.     684.  683.    653.    715. 95%       
#>  2 Area1  2006 Male      2238    1952.     684.  784.    752.    817. 95%       
#>  3 Area1  2007 Fema~     1939    1965.     684.  675.    645.    705. 95%       
#>  4 Area1  2007 Male      1427    1969.     684.  495.    470.    522. 95%       
#>  5 Area1  2008 Fema~     1901    1849.     684.  703.    672.    735. 95%       
#>  6 Area1  2008 Male      1404    1882.     684.  510.    484.    538. 95%       
#>  7 Area1  2009 Fema~     1813    2010.     684.  617.    589.    646. 95%       
#>  8 Area1  2009 Male      2510    1810.     684.  948.    911.    986. 95%       
#>  9 Area1  2010 Fema~     1724    1886.     684.  625.    596.    655. 95%       
#> 10 Area1  2010 Male      2273    1935.     684.  803.    771.    837. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate the same isrs by appending the reference data to the data frame
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    phe_isr(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 x 12
#> # Groups:   area, year [20]
#>    area   year sex   observed expected ref_rate value lowercl uppercl confidence
#>    <fct> <int> <fct>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~     1902    1904.     684.  683.    653.    715. 95%       
#>  2 Area1  2006 Male      2238    1952.     684.  784.    752.    817. 95%       
#>  3 Area1  2007 Fema~     1939    1965.     684.  675.    645.    705. 95%       
#>  4 Area1  2007 Male      1427    1969.     684.  495.    470.    522. 95%       
#>  5 Area1  2008 Fema~     1901    1849.     684.  703.    672.    735. 95%       
#>  6 Area1  2008 Male      1404    1882.     684.  510.    484.    538. 95%       
#>  7 Area1  2009 Fema~     1813    2010.     684.  617.    589.    646. 95%       
#>  8 Area1  2009 Male      2510    1810.     684.  948.    911.    986. 95%       
#>  9 Area1  2010 Fema~     1724    1886.     684.  625.    596.    655. 95%       
#> 10 Area1  2010 Male      2273    1935.     684.  803.    771.    837. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate separate isrs for each year and drop metadata columns from output
df_std %>%
    group_by(year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop)) %>%
    group_by(year) %>%
    phe_isr(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 x 7
#>    year observed expected ref_rate value lowercl uppercl
#>   <int>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2006    15173   15173      684.  684.    673.    695.
#> 2  2007    14950   15864.     684.  644.    634.    655.
#> 3  2008    14867   15145.     684.  671.    660.    682.
#> 4  2009    16145   15687.     684.  704.    693.    715.
#> 5  2010    15771   15758.     684.  684.    674.    695.