ECA JAN2025: From Past to Present: An Analytical Overview of Singapore’s Parliamentary Elections

PI No. Y2510707

Name: Ng Chee Wee

Submission Date: 05.03.2025

Executive Summary

This report provides a factual analysis of Singapore’s electoral history, using quantitative data extracted from the Elections Department to present an objective assessment of electoral trends. As the nation approaches its next General Election, the report seeks to explain how demographic changes, past systemic and structural political reforms, and recent political developments have shaped Singapore’s parliamentary democracy over time, a history characterized by the People’s Action Party’s continued electoral success and the gradual emergence of opposition representation.

The report reveals several significant findings about Singapore’s evolving electoral landscape. First, the data show an expansion in the registered voter base. The steady increase reflects population growth, which is consistent with a young nation, and it has implications for electoral boundary revisions, which in turn affect the strategies of the political parties keen to contest the next election.

Another key trend is the significant fluctuation in the number of political parties, with notable peaks and troughs across pre- and post-independence Singapore. This suggests alternating periods of consolidation and diversification over the years.

Historically, the PAP has been the dominant political party and has maintained a parliamentary majority since 1959. Perhaps its most remarkable feat is its clean sweep of parliamentary seats in three consecutive elections, which led to systemic reform. These reforms and other critical findings are presented in later sections of this report.

Introduction

With the General Election due to be held by 23 November 2025, election season is well and truly underway as Singaporeans gear up for political hustings and wait with bated breath for Polling Day to cast their votes. Will the opposition parties, led by the Workers’ Party and the Progress Singapore Party (PSP), mount a serious challenge to the incumbent People’s Action Party (PAP) and gain more political ground? Or will the PAP, led by new Prime Minister Mr Lawrence Wong and the fourth-generation (4G) leaders, contesting their first election since Mr Lee Hsien Loong stepped down as Prime Minister, maintain a strong mandate from the electorate?

This report will look at some interesting data from past to present and try to analyse how various political reforms and developments over the years have come to shape Singapore’s current political landscape.

Data

For this report, two datasets from the Elections Department (ELD) were extracted via the data.gov.sg API. The datasets are “Parliamentary General Election Results by Candidate” and “Parliamentary General Election - Registered Electors Rejected Votes and Spoilt Ballots”.

The “Parliamentary General Election Results by Candidate” dataset consists of 8 variables and 1609 observations.

The “Parliamentary General Election - Registered Electors Rejected Votes and Spoilt Ballots” dataset consists of 6 variables and 753 observations.

One GeoJSON file, “ElectoralBoundary2020GEOJSON”, was also merged with one of the above datasets to plot a choropleth map. Details of the data wrangling work, as well as the embedded RMarkdown file, can be found in the Data Appendix.

Data Appendix

For this report, the election datasets used were from the Elections Department (ELD), extracted via the data.gov.sg API. The example here shows an API call that extracts the “Parliamentary GE Results by candidate” dataset, whose page is at https://data.gov.sg/datasets/d_581a30bee57fa7d8383d6bc94739ad00/view

The web API URL for this dataset has 3 main components:
1. Base url: https://data.gov.sg/
2. Resource path: api/action/datastore_search
3. Query: ?resource_id=d_581a30bee57fa7d8383d6bc94739ad00
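
As a minimal sketch, the three components above can be assembled by a small helper. The function name build_datastore_url and its arguments are illustrative and are not part of the data.gov.sg API; the limit parameter mirrors the “&limit=10000” used in this appendix.

```r
# Hypothetical helper: assemble the datastore_search URL from its parts.
build_datastore_url <- function(resource_id,
                                limit = 10000,
                                base_url = "https://data.gov.sg/",
                                resource_path = "api/action/datastore_search") {
  # sprintf("%d") keeps the limit out of scientific notation
  query <- sprintf("?resource_id=%s&limit=%d", resource_id, as.integer(limit))
  paste0(base_url, resource_path, query)
}

elections.url <- build_datastore_url("d_581a30bee57fa7d8383d6bc94739ad00")
print(elections.url)
```

Keeping the dataset id as an argument means the same helper could be reused for the second ELD dataset by passing a different id.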

# API to extract data on “Parliamentary GE Results by candidate”
library(jsonlite) # provides fromJSON()

# Constructing the URL
dataset_id <- "d_581a30bee57fa7d8383d6bc94739ad00"
elections.url <- paste0("https://data.gov.sg/api/action/datastore_search?resource_id=", dataset_id, "&limit=10000")

# Extracting the Data
out.elections <- fromJSON(elections.url, simplifyDataFrame = TRUE) # Fetching the data
df.elections <- out.elections$result$records # Saving the records


Sanity Check

The “Parliamentary GE Results by candidate” dataframe contains 8 variables and 1609 rows. The year, vote_count and vote_percentage variables were stored as character (“chr”).

## 'data.frame':    1609 obs. of  8 variables:
##  $ _id              : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ year             : chr  "1955" "1955" "1955" "1955" ...
##  $ constituency     : chr  "Bukit Panjang" "Bukit Panjang" "Bukit Timah" "Bukit Timah" ...
##  $ constituency_type: chr  "na" "na" "na" "na" ...
##  $ candidates       : chr  "Goh Tong Liang" "Lim Wee Toh" "S. F. Ho" "Lim Ching Siong" ...
##  $ party            : chr  "PP" "SLF" "PP" "PAP" ...
##  $ vote_count       : chr  "3097" "1192" "722" "3259" ...
##  $ vote_percentage  : chr  "0.7221" "0.2779" "0.1162" "0.5245" ...

Data Cleaning

The year, vote_count and vote_percentage columns were converted to numeric data type, and all “na” strings were replaced with NA.

#convert columns to numeric
df.elections <- df.elections %>%
  mutate(
    vote_count = as.numeric(vote_count),
    vote_percentage = as.numeric(vote_percentage),
    year = as.numeric(year)
  )

# Replace 'na' with NA for proper handling
df.elections[df.elections == "na"] <- NA
str(df.elections)
## 'data.frame':    1609 obs. of  8 variables:
##  $ _id              : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ year             : num  1955 1955 1955 1955 1955 ...
##  $ constituency     : chr  "Bukit Panjang" "Bukit Panjang" "Bukit Timah" "Bukit Timah" ...
##  $ constituency_type: chr  NA NA NA NA ...
##  $ candidates       : chr  "Goh Tong Liang" "Lim Wee Toh" "S. F. Ho" "Lim Ching Siong" ...
##  $ party            : chr  "PP" "SLF" "PP" "PAP" ...
##  $ vote_count       : num  3097 1192 722 3259 924 ...
##  $ vote_percentage  : num  0.722 0.278 0.116 0.524 0.149 ...
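
Because as.numeric() turns any non-numeric string into NA and raises a coercion warning, one possible variation is to replace the “na” placeholders first and convert afterwards. The sketch below uses a small made-up data frame (toy) and base R only, so it is an illustration rather than the code used for this report.

```r
# Made-up rows that mimic the "na" placeholders in the raw data
toy <- data.frame(
  constituency_type = c("na", "GRC"),
  vote_count        = c("3097", "na"),
  vote_percentage   = c("0.7221", "na"),
  stringsAsFactors  = FALSE
)

# Step 1: turn every "na" string into a real missing value
toy[toy == "na"] <- NA

# Step 2: convert the numeric columns; no coercion warning is raised now
num_cols <- c("vote_count", "vote_percentage")
toy[num_cols] <- lapply(toy[num_cols], as.numeric)

str(toy)
```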

Sanity Check

The “Parliamentary General Election - Registered Electors Rejected Votes and Spoilt Ballots” dataframe consists of 6 variables and 753 rows. On first inspection, several of the columns were not in the correct data types.

## 'data.frame':    753 obs. of  6 variables:
##  $ _id                       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ year                      : chr  "1955" "1955" "1955" "1955" ...
##  $ constituency              : chr  "Bukit Panjang" "Bukit Timah" "Cairnhill" "Changi" ...
##  $ no_of_registered_electors : chr  "8012" "9173" "13528" "11239" ...
##  $ no_of_rejected_votes      : chr  "66" "59" "65" "70" ...
##  $ no_of_spoilt_ballot_papers: chr  "7" "8" "12" "6" ...

Data Cleaning

Those columns were converted to numeric, and year was converted to integer.

# Convert character columns to numeric
df.regelectors <- df.regelectors %>%
  mutate(
    no_of_registered_electors = as.numeric(no_of_registered_electors),
    no_of_rejected_votes = as.numeric(no_of_rejected_votes),
    no_of_spoilt_ballot_papers = as.numeric(no_of_spoilt_ballot_papers),
    year = as.integer(year)
  )
str(df.regelectors)
## 'data.frame':    753 obs. of  6 variables:
##  $ _id                       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ year                      : int  1955 1955 1955 1955 1955 1955 1955 1955 1955 1955 ...
##  $ constituency              : chr  "Bukit Panjang" "Bukit Timah" "Cairnhill" "Changi" ...
##  $ no_of_registered_electors : num  8012 9173 13528 11239 12242 ...
##  $ no_of_rejected_votes      : num  66 59 65 70 93 88 61 144 120 60 ...
##  $ no_of_spoilt_ballot_papers: num  7 8 12 6 14 16 7 4 14 6 ...

Data Aggregation

From the “Registered Electors Rejected Votes and Spoilt Ballots” data frame, summary statistics were created by summing each of the 3 columns “no_of_registered_electors”, “no_of_rejected_votes” and “no_of_spoilt_ballot_papers” by year. The result was assigned to a new dataframe, “yearly_summary”.

# Create yearly summary
yearly_summary <- df.regelectors %>% 
  group_by(year) %>%
  summarise(
    total_registered = sum(no_of_registered_electors, na.rm=TRUE),
    total_rejected = sum(no_of_rejected_votes, na.rm=TRUE),
    total_spoilt = sum(no_of_spoilt_ballot_papers, na.rm=TRUE)
  )

The result is a new dataframe, yearly_summary, ready for visualization. The structure shown here also includes the two rate columns created in the Feature Engineering step.

## tibble [17 × 6] (S3: tbl_df/tbl/data.frame)
##  $ year            : int [1:17] 1955 1959 1963 1968 1972 1976 1980 1984 1988 1991 ...
##  $ total_registered: num [1:17] 300299 587780 617650 759367 908382 ...
##  $ total_rejected  : num [1:17] 1830 6650 5893 2058 15229 ...
##  $ total_spoilt    : num [1:17] 215 310 204 32 342 308 200 355 580 204 ...
##  $ rejection_rate  : num [1:17] 0.609 1.131 0.954 0.271 1.676 ...
##  $ spoilt_rate     : num [1:17] 0.0716 0.05274 0.03303 0.00421 0.03765 ...
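
The same group-and-sum idea can be checked on a few made-up rows. The sketch below uses base R’s aggregate() instead of dplyr so that it runs without extra packages; the toy values are illustrative only and are not taken from the ELD dataset.

```r
# Made-up constituency rows: two in 1955, one in 1959 with a missing count
toy <- data.frame(
  year = c(1955L, 1955L, 1959L),
  no_of_registered_electors = c(1000, 2000, 4000),
  no_of_rejected_votes      = c(10, 20, NA)
)

# Sum each column within year; na.rm = TRUE is passed on to sum()
toy_summary <- aggregate(
  toy[, c("no_of_registered_electors", "no_of_rejected_votes")],
  by = list(year = toy$year),
  FUN = sum, na.rm = TRUE
)
names(toy_summary) <- c("year", "total_registered", "total_rejected")

print(toy_summary)
```

With na.rm = TRUE a year whose only count is missing sums to 0 rather than NA, which is worth keeping in mind when reading very small totals.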

Feature Engineering

To plot the rejection and spoilt ballot rates, 2 new variables were created in the yearly_summary data frame: rejection_rate, which is total_rejected divided by total_registered, and spoilt_rate, which is total_spoilt divided by total_registered, both expressed as percentages. These features provide more meaningful insights than the raw counts from the original dataset, because they can be compared across years with different numbers of registered electors.

yearly_summary <- yearly_summary %>%
  mutate(
    rejection_rate = (total_rejected / total_registered) * 100,
    spoilt_rate = (total_spoilt / total_registered) * 100
  )
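
As a quick worked check, the 1955 totals shown in the yearly_summary output above (300299 registered electors, 1830 rejected votes and 215 spoilt ballot papers) can be plugged into the two formulas. The variable names below are only for this illustration.

```r
# 1955 totals as printed in the yearly_summary structure
total_registered <- 300299
total_rejected   <- 1830
total_spoilt     <- 215

# Same formulas as the mutate() step: share of registered electors, in percent
rejection_rate <- (total_rejected / total_registered) * 100
spoilt_rate    <- (total_spoilt / total_registered) * 100

print(round(rejection_rate, 3)) # 0.609
print(round(spoilt_rate, 4))    # 0.0716
```

Both values agree with the first entries of rejection_rate and spoilt_rate in the printed structure.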

Pre-processing before merging

# rename Name column to constituency
geo_data <- geo_data %>% 
  rename(constituency = Name)

# Filter to year 2020 & convert constituency name to Upper case
df.elections2020 <- df.elections %>%
  filter(year == 2020) %>%
  mutate(constituency = trimws(toupper(constituency)))
# Extract the winning party per constituency
winning_parties <- df.elections2020 %>%
  group_by(constituency) %>%
  filter(vote_percentage == max(vote_percentage, na.rm=TRUE)) %>%
  ungroup()
# Use a left join to merge geo_data & winning_parties by common column "constituency"
combined_data <- geo_data %>% 
  left_join(winning_parties, by = "constituency")

After the 2020 GeoJSON file was read in and assigned to geo_data, a sanity check found that its “Name” column holds the constituency names in CAPITAL letters. The “constituency” column of df.elections, on the other hand, holds constituency names that are not in CAPITAL letters.

Hence I renamed the “Name” column in the geo_data dataframe to “constituency”. In the df.elections dataframe, I filtered to the year 2020, converted “constituency” to upper case and assigned the result to df.elections2020.

As I wanted to show the winning party’s vote percentage in 2020, I created a new data frame, “winning_parties”, grouped by constituency and filtered to the highest vote percentage using max(). I used “na.rm = TRUE” so that max() skips rows with missing values instead of returning NA.

After that, I merged geo_data with winning_parties using left_join() on the common column “constituency” and named the new dataframe combined_data. The new dataframe is ready for plotting.
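
A left join keeps every map polygon, so a constituency whose name fails to match simply receives NA in the election columns. The sketch below, using made-up constituency and party names and base R’s merge() in place of left_join(), shows one way such unmatched rows might be spotted before plotting.

```r
# Made-up map names (already upper case) and mixed-case result rows
geo <- data.frame(constituency = c("ALPHA", "BETA", "GAMMA"),
                  stringsAsFactors = FALSE)
results <- data.frame(
  constituency    = c("Alpha ", "Alpha", "Beta", "Beta"),
  party           = c("P1", "P2", "P2", "P3"),
  vote_percentage = c(0.60, 0.40, 0.55, 0.45),
  stringsAsFactors = FALSE
)

# Normalise names the same way as in the pre-processing step
results$constituency <- trimws(toupper(results$constituency))

# Keep the row with the highest vote share in each constituency
best <- ave(results$vote_percentage, results$constituency,
            FUN = function(v) max(v, na.rm = TRUE))
winners <- results[which(results$vote_percentage == best), ]

# all.x = TRUE makes merge() behave like a left join on geo
combined <- merge(geo, winners, by = "constituency", all.x = TRUE)

# Constituencies on the map with no matching election row
unmatched <- combined$constituency[is.na(combined$party)]
print(unmatched)
```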

Marking locations on the map

I wanted to show the names of four GRCs on the choropleth map, so I looked up their respective longitudes and latitudes, assigned their names and constructed a data frame of the places to be marked, as shown in the code below.

#save longitude,latitude & location name
Jurong <- c(103.7216, 1.3346, 'Jurong GRC')
AMK <- c(103.84834, 1.36944, 'AMK GRC')
East <- c(103.9951, 1.3511, 'East Coast GRC')
West <- c(103.72767, 1.27931, 'West Coast GRC')

# Construct the data frame for the places to be marked
places <- rbind(Jurong, AMK, East, West) %>% as.data.frame()
colnames(places) <- c("long", "lat", "ID") # ID contains the name of the places
places$long <- as.numeric(places$long) # ensure the coordinates are numeric
places$lat <- as.numeric(places$lat)
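
The as.numeric() calls above are needed because c() coerces a mix of numbers and text to character. A possible alternative, sketched here with the same coordinates, is to build the data frame column by column so the coordinates stay numeric from the start.

```r
# Same four places, with coordinates kept numeric and names kept as text
places <- data.frame(
  long = c(103.7216, 103.84834, 103.9951, 103.72767),
  lat  = c(1.3346, 1.36944, 1.3511, 1.27931),
  ID   = c("Jurong GRC", "AMK GRC", "East Coast GRC", "West Coast GRC"),
  stringsAsFactors = FALSE
)

str(places)
```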

RMarkdown file