AWS S3 as a database with S3 Select
Have you ever wished you could pay little to nothing to run a database?
When I see AWS S3, I don't think of it as a database. I think of it as storage for big files like photos or videos. But one day I read somewhere on the Internet that S3 can be used as a database, which made me want to experiment with it. In this article, I will try to implement CRUD operations on AWS S3.
Prerequisites
In this experiment, we will be using AWS S3 and Python. Here is what I prepared beforehand:
- AWS Account
- S3 Bucket to store CSV File
- Python Environment
- A Dummy CSV File filled with one thousand users' information
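The code later in this article reads the bucket name from a small settings module and expects the CSV to have a header row. As a reference, here is a minimal sketch of both; the module name and bucket name value are assumptions, while the column names are taken from the code below:

# settings.py (hypothetical; only BUCKET_NAME is used by the code in this article)
BUCKET_NAME = 'my-s3-as-a-db-bucket'  # replace with your own bucket name

And the first line of user.csv:

id,firstname,lastname,email,email2,profession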
Implementing CRUD
And now, let's get our hands dirty and code this thing. First of all, it's good to know what we are going to build, so I wrote a test (not really) before implementing it.
from s3.user import read, delete, update, create
# Read Query
print("Read Data")
print(read("WHERE firstname = 'Devina'"))
# Output
# [['1274', 'Devina', 'Terencio', 'Devina.Terencio@yopmail.com', 'Devina.Terencio@gmail.com', 'doctor']]
# Update Query
print("Update Data")
update(
    "WHERE firstname = 'Devina'",
    {
        'lastname': 'Green'
    }
)
# Output
print(read("WHERE firstname = 'Devina'"))
# [['1274', 'Devina', 'Green', 'Devina.Terencio@yopmail.com', 'Devina.Terencio@gmail.com', 'doctor']]
# Delete Item
# print("Delete Item")
delete("WHERE firstname = 'Devina'")
# Output
print(read("WHERE firstname = 'Devina'"))
# []
# Create Item
print("Create Item")
create({
    'firstname': 'Devina',
    'lastname': 'Terencio',
    'email': 'Devina.Terencio@yopmail.com',
    'email2': 'Devina.Terencio@gmail.com',
    'profession': 'doctor'
})
# Output
print(read("WHERE firstname = 'Devina'"))
# [['2100', 'Devina', 'Terencio', 'Devina.Terencio@yopmail.com', 'Devina.Terencio@gmail.com', 'doctor']]
If what is printed in the terminal matches the expected outputs in the comments, we have successfully built our application.
Here is the GitHub link: github.com/alvinend/s3-as-a-db
Read
What do we do when we want to query some items?
- Use S3 Select to query the CSV. This returns a response object
- Get the payload string from the response object
- Convert the payload string to a list
Easy!
import boto3
import settings

s3_client = boto3.client('s3')

def read(
    query,
    select='*'
):
    # Use S3 Select to query the CSV
    res = s3_client.select_object_content(
        Bucket=settings.BUCKET_NAME,
        Key='user.csv',
        ExpressionType='SQL',
        Expression=f"SELECT {select} FROM S3Object s " + query,
        InputSerialization={
            'CompressionType': 'NONE',
            'CSV': {
                'FileHeaderInfo': 'Use',
                'RecordDelimiter': '\n',
                'FieldDelimiter': ','
            }
        },
        OutputSerialization={
            'CSV': {
                'RecordDelimiter': '\n',
                'FieldDelimiter': ','
            }
        }
    )

    # The payload arrives as a stream of events; concatenate every Records chunk
    records = ''
    for event in res['Payload']:
        if 'Records' in event:
            records += event['Records']['Payload'].decode('utf-8')

    # Split the payload string into rows: "1,Name1,Address1\n2,Name2,Address2\n" -> ["1,Name1,Address1", "2,Name2,Address2", ""]
    records = records.split('\n')

    # Remove the empty string element: ["1,Name1,Address1", "2,Name2,Address2", ""] -> ["1,Name1,Address1", "2,Name2,Address2"]
    records = filter(len, records)

    # Split each row into a list of fields: [["1", "Name1", "Address1"], ["2", "Name2", "Address2"]]
    records = list(map(lambda x: x.replace('\r', '').split(','), records))

    return records
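As a quick illustration of the select parameter, asking for only the id column of a matching row would return something like this (based on the test output above; the delete operation below relies on exactly this call):

print(read("WHERE firstname = 'Devina'", select='id'))
# [['1274']]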
Delete
Since S3 has no feature to manipulate a CSV file in place, we must do it manually. By manually, I mean downloading the file from S3, manipulating it, and replacing the old file with the manipulated one.
import boto3
import pandas
import settings
from io import StringIO

s3_resource = boto3.resource('s3')
s3_client = boto3.client('s3')

def get_df():
    # Download user.csv from S3 and load it into a DataFrame
    res = s3_client.get_object(Bucket=settings.BUCKET_NAME, Key="user.csv")
    df = pandas.read_csv(res.get("Body"))
    return df

def save_to_s3(df):
    # Write the DataFrame back to S3, replacing the old user.csv
    csv_buffer = StringIO()
    df.to_csv(csv_buffer, index=False)
    s3_resource.Object(settings.BUCKET_NAME, 'user.csv').put(Body=csv_buffer.getvalue())
We will reuse these helpers for the create and update operations too. For delete, we do these actions:
- Get Dataframe from S3
- Get Ids from S3 Select query
- Filter out selected ID
- Save Dataframe to S3 as CSV
from s3.user.utils import get_df, save_to_s3
from s3.user.read import read

def delete(query):
    # Get Dataframe
    df = get_df()

    # Get ids from query, e.g. [['1274']] -> [1274]
    ids = read(query, select='id')
    ids = list(map(lambda x: int("".join(x)), ids))

    # Filter out selected ids
    newdf = df[~df.id.isin(ids)]

    # Save Dataframe to S3 as CSV
    save_to_s3(newdf)
Update
The update operation is almost the same as delete, except that instead of filtering the matching rows out, we update them.
- Get Dataframe from S3
- Get Ids from S3 Select query
- Update selected ID with new values.
- Save Dataframe to S3 as CSV
from s3.user.utils import get_df, save_to_s3
from s3.user.read import read

def update(
    query,
    data_dict
):
    update_column_key = list(data_dict.keys())
    update_column_value = list(data_dict.values())

    # Get Dataframe
    df = get_df()

    # Get ids from query
    ids = read(query, select='id')
    ids = list(map(lambda x: int("".join(x)), ids))

    # Update the matching rows with the new values
    df.loc[df.id.isin(ids), update_column_key] = update_column_value

    # Save Dataframe to S3 as CSV
    save_to_s3(df)
Create
Last but not least is the create operation. Although inserting a row into a pandas DataFrame is easy, it has no auto-increment feature. So we have to manually look up the latest stored ID, increment it, and save the new data.
- Get Dataframe from S3
- Get latest id from S3 Select query
- Create new data with the incremented latest ID
- Save Dataframe to S3 as CSV
import pandas
from s3.user.read import read
from s3.user.utils import get_df, save_to_s3

def create(
    data_dict
):
    # Get Dataframe
    df = get_df()

    # Get the current maximum id via S3 Select
    max_id = read('', select='MAX(cast(id as int))')
    max_id = int(max_id[0][0])

    # Append the new row with an incremented id
    # (DataFrame.append was removed in pandas 2.0; concat does the same job)
    data_dict['id'] = max_id + 1
    df = pandas.concat([df, pandas.DataFrame([data_dict])], ignore_index=True)

    # Save Dataframe to S3 as CSV
    save_to_s3(df)
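One last assembly note: for the from s3.user import read, delete, update, create line in the test script to work, the package has to re-export the four functions. Here is a minimal sketch of what s3/user/__init__.py could look like; this layout is inferred from the imports in this article, so check the GitHub repo for the actual structure:

# s3/user/__init__.py (hypothetical layout inferred from the imports above)
from s3.user.read import read
from s3.user.delete import delete
from s3.user.update import update
from s3.user.create import create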
Closing
There you have it, S3 as a database! I know it is slower than a conventional relational database like MySQL and has no ACID compliance. But it is cheap when you are experimenting or making a prototype of something. It is nice to have a choice that does not require you to manage the database and pay monthly server costs.
As always, thank you for reading this post!