<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Alvin Endratno — Tech Founder, Engineer & Builder of PT Lensiro Digital Solusi]]></title><description><![CDATA[Thoughts on building SaaS products, AI deployments, offshore engineering, and running a tech company in Indonesia. Written by a founder who also codes.]]></description><link>https://blog.alvinend.tech</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1648274327673/PX2uP7jQX.png</url><title>Alvin Endratno — Tech Founder, Engineer &amp; Builder of PT Lensiro Digital Solusi</title><link>https://blog.alvinend.tech</link></image><generator>RSS for Node</generator><lastBuildDate>Tue, 14 Apr 2026 12:38:23 GMT</lastBuildDate><atom:link href="https://blog.alvinend.tech/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[The Founder's Perspective: Why Vertical SaaS?]]></title><description><![CDATA[Or: Why we stopped trying to build software for everyone, and built Lensiro just for optical shops.
If you've ever walked into an optical shop in Indonesia to buy a pair of glasses, you probably didn']]></description><link>https://blog.alvinend.tech/the-founder-s-perspective-why-vertical-saas</link><guid isPermaLink="true">https://blog.alvinend.tech/the-founder-s-perspective-why-vertical-saas</guid><category><![CDATA[SaaS]]></category><category><![CDATA[sales]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Tue, 31 Mar 2026 16:02:10 GMT</pubDate><content:encoded><![CDATA[<img src="https://cdn.hashnode.com/uploads/covers/61f4e5bb4bea13573e62c7da/91d93ddf-8296-4150-ad24-8d347fb9d87f.png" alt="" style="display:block;margin:0 auto" />

<p>Or: Why we stopped trying to build software for everyone, and built Lensiro just for optical shops.</p>
<p>If you've ever walked into an optical shop in Indonesia to buy a pair of glasses, you probably didn't think much about the software running on the computer behind the counter. But as an engineer and founder, I couldn't help but look.</p>
<p>What I saw was a nightmare of generic Point of Sale (POS) systems trying—and failing—to handle the incredibly specific workflows of optical retail. The system would say there were 5 frames in stock, but a customer with a specific prescription would need a lens that had to be custom-ordered, sent to a lab for faset (cutting and edging), and tracked through a multi-day quality control process.</p>
<p>A generic POS simply cannot handle that. It just sees "Item A" and "Item B". It doesn't understand that Item A (the frame) and Item B (the lens) need to be married together in a lab before the customer can pick them up.</p>
<img src="https://cdn.hashnode.com/uploads/covers/61f4e5bb4bea13573e62c7da/630e601d-b095-450d-8265-d3f0f56c898c.png" alt="" style="display:block;margin:0 auto" />

<p>But the problem isn't just on the business side. The customer experience (CX) of buying glasses hasn't fundamentally improved in decades. You walk in, stare at hundreds of frames on a wall, try on a few that probably don't fit your face shape or nose bridge, get your eyes checked, and then wait days with zero transparency on when your glasses will actually be ready.</p>
<p>This was the "aha" moment that led us to build Lensiro . We realized that to truly revolutionize the optical industry, we couldn't just fix the backend business operations—we had to completely reinvent the customer experience. And to do that, we needed Vertical SaaS.</p>
<h2>The Problem with Horizontal SaaS</h2>
<p>Horizontal SaaS—software built to serve a wide variety of industries—is great for general problems. Think of Slack for communication or generic CRM tools. But when it comes to the core operational workflows and customer touchpoints of a specific industry, horizontal software often falls short.</p>
<p>It's the classic "jack of all trades, master of none" problem. To appeal to a restaurant, a clothing store, and a pharmacy, a horizontal POS has to stay generic.</p>
<p>In the optical industry, a generic POS leads to:</p>
<ul>
<li><p>Inventory Chaos: Optical shops don't just sell items; they sell combinations of frames, lenses with specific prescriptions (spherical, cylinder, axis), and contact lenses.</p>
</li>
<li><p>Broken Workflows: There is no built-in way to track the Faset &amp; QC process, meaning customers are left in the dark about their order status.</p>
</li>
<li><p>Stagnant Customer Experience: There is no personalized styling, no seamless after-service follow-ups, and no modern digital touchpoints for the buyer.</p>
</li>
</ul>
<h2>The Vertical SaaS Advantage</h2>
<img src="https://cdn.hashnode.com/uploads/covers/61f4e5bb4bea13573e62c7da/fafab1c9-0093-4a85-bd31-a0f6e4afd5dc.png" alt="" style="display:block;margin:0 auto" />

<p>Vertical SaaS, on the other hand, is software built for a specific niche. And right now, it is dominating.</p>
<p>Recent market data shows that the global vertical software market is projected to reach USD 282.98 billion by 2031, growing at a CAGR of over 11%. Why? Because 87% of vertical SaaS users report that their provider truly understands their industry's needs, compared to just 62% for horizontal solutions.</p>
<p>When we decided to build Lensiro, we didn't want to build just another POS. We wanted to build the operating system for optical retail in Indonesia, with a dual mandate: bulletproof business operations and a revolutionary customer experience.</p>
<p>Here is why this specialist approach works:</p>
<h3>Revolutionizing the Customer Experience (CX)</h3>
<img src="https://cdn.hashnode.com/uploads/covers/61f4e5bb4bea13573e62c7da/9f77e308-7027-4e2d-a83b-f26582dd3ce3.png" alt="" style="display:block;margin:0 auto" />

<p>Because we only focus on optical retail, we can build customer-facing features that a generic POS company would never touch.</p>
<p>Our ultimate objective with Lensiro isn't just to make shop owners' lives easier; it's to make buying glasses feel like magic. That's why we are building Lensiro Fit—an AI-powered eyewear fitting system that uses MediaPipe and Gemini AI to analyze a customer's face shape, measure their nose bridge, and recommend the perfect frames based on a custom sizing and styling algorithm. (I'll be writing a deep-dive article specifically on Lensiro Fit soon!)</p>
<p>We also integrated the WhatsApp Business API so customers receive automated, branded invoices and real-time updates when their glasses move from the lab to "ready for pickup." We are turning a historically opaque, analog process into a modern, transparent digital experience.</p>
<h3>High Switching Costs and Deep Integration</h3>
<p>When you embed your software deeply into an industry's mission-critical workflows, you become indispensable. Lensiro handles everything from the initial DP (down payment), to the lab tracking, to the final customer WhatsApp notification.</p>
<p>When a system handles your double-entry bookkeeping, your multi-branch stock mutasi (inter-branch transfers), and your customer OTPs, replacing it becomes unthinkable. This is why top vertical SaaS platforms frequently achieve Net Revenue Retention (NRR) rates of over 120%.</p>
<h3>Solving the Unsexy Problems</h3>
<p>The best vertical SaaS companies solve the unsexy, deeply frustrating problems that horizontal players ignore. For us, that was the stock system. We built an event-sourced ledger architecture so that our stock counts cannot be wrong. (I wrote a whole deep-dive on that <a href="https://blog.alvinend.tech/building-a-stock-system-that-cannot-be-wrong">here</a>).</p>
<p>We also tackled the complexities of multi-branch management. Many optical chains in Indonesia have 5, 10, or 50+ branches. Managing roles, access, and real-time stock opname across that many locations requires a specialized architecture, not a generic "location" tag.</p>
<h2>The Future is Specialized</h2>
<img src="https://cdn.hashnode.com/uploads/covers/61f4e5bb4bea13573e62c7da/ddebb90e-98c6-49c5-8b10-6fb06d7e6f5b.png" alt="" style="display:block;margin:0 auto" />

<p>As AI continues to commoditize basic software functions, the true moat for SaaS companies will be deep, specialized domain knowledge and proprietary workflows that touch both the business backend and the end-consumer experience.</p>
<p>The horizontal player is like a general practitioner—good for a basic checkup. But the vertical player is the specialized surgeon. And in software, just like in medicine, specialists deliver better outcomes.</p>
<p>Building Lensiro has taught me that you don't need to build software for everyone to build a massive, impactful business. You just need to build the perfect software for someone, and completely change how their customers experience the world.</p>
<hr />
<h2>About Lensiro</h2>
<p><a href="https://lensiro.com/">Lensiro</a> is a complete retail management platform built specifically for optical stores. It handles everything an eyewear business needs in one place — from inventory and stock management across multiple branches, to point-of-sale, purchasing, member management, and accounting with double-entry bookkeeping. If you run an optical retail business and want a system that actually gets inventory right, check it out at <a href="https://lensiro.com/">lensiro.com</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Building a Stock System That Cannot Be Wrong]]></title><description><![CDATA[The Problem Every Retailer Knows
If you've ever managed inventory for a retail business, you know the nightmare: the system says you have 5 units, but you can only find 3 on the shelf. Or worse, you s]]></description><link>https://blog.alvinend.tech/building-a-stock-system-that-cannot-be-wrong</link><guid isPermaLink="true">https://blog.alvinend.tech/building-a-stock-system-that-cannot-be-wrong</guid><category><![CDATA[Next.js]]></category><category><![CDATA[prisma]]></category><category><![CDATA[PostgreSQL]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Fri, 20 Mar 2026 14:57:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/61f4e5bb4bea13573e62c7da/8d7e4675-1363-47a9-a7ee-649353ac21c7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>The Problem Every Retailer Knows</h2>
<p>If you've ever managed inventory for a retail business, you know the nightmare: the system says you have 5 units, but you can only find 3 on the shelf. Or worse, you sell something the system said was in stock, only to discover it was already gone.</p>
<p>Traditional inventory systems store stock as a <strong>single number in a database column</strong> — and every operation just increments or decrements that number. It works... until it doesn't. A failed API call, a race condition, a developer forgetting to update the count in one edge case — and suddenly your stock is off. Forever. And you have no idea <em>when</em> or <em>why</em> it went wrong.</p>
<p>I built Lensiro's stock system so that this class of bugs is <strong>architecturally impossible</strong>.</p>
<hr />
<h2>The Core Idea: Never Store the Stock Count</h2>
<p>Here's the fundamental insight: <strong>the stock quantity is never stored anywhere.</strong> There is no <code>quantity</code> column in the Stock table. Zero. None.</p>
<p>Instead, every time anyone asks "how many of item X do we have at branch Y?", the system <strong>calculates it from scratch</strong> by summing up every movement that has ever happened to that item.</p>
<pre><code class="language-plaintext">Stock = SUM(all inflows) - SUM(all outflows)
</code></pre>
<p>That's it. That's the whole trick.</p>
<img src="https://cdn.hashnode.com/uploads/covers/61f4e5bb4bea13573e62c7da/b0ab7d3b-fcbd-47f1-9543-5bf697786ef8.png" alt="Screenshot of the stock detail trail view showing a chronological ledger of all movements — purchases, sales, transfers, adjustments — with running totals" style="display:block;margin:0 auto" />

<hr />
<h2>The Ledger: Every Movement Tells a Story</h2>
<p>Every stock change is recorded as an <strong>immutable event</strong> in its own dedicated table. There are 8 types of movements:</p>
<p><strong>Things that add stock (+):</strong></p>
<ul>
<li><p><strong>Purchase Receive</strong> — goods physically received from a supplier</p>
</li>
<li><p><strong>Stock Opname (positive)</strong> — physical count reveals more stock than expected</p>
</li>
<li><p><strong>Transfer In</strong> — items arriving from another branch</p>
</li>
</ul>
<p><strong>Things that remove stock (-):</strong></p>
<ul>
<li><p><strong>Sale</strong> — items sold to customers</p>
</li>
<li><p><strong>Complimentary</strong> — items given away</p>
</li>
<li><p><strong>Purchase Return</strong> — items sent back to supplier</p>
</li>
<li><p><strong>Transfer Out</strong> — items shipped to another branch</p>
</li>
<li><p><strong>Stock Opname (negative)</strong> — physical count reveals less stock than expected</p>
</li>
</ul>
<img src="https://cdn.hashnode.com/uploads/covers/61f4e5bb4bea13573e62c7da/d859fc92-82e8-47e4-8e7d-7ed621ca759e.png" alt="Screenshot of the stock detail expanded view showing increase/decrease entries with dates, descriptions, and clickable links to source documents" style="display:block;margin:0 auto" />

<p>Each movement record contains:</p>
<ul>
<li><p>The count (how many)</p>
</li>
<li><p>The timestamp (when)</p>
</li>
<li><p>The cost price (for accounting)</p>
</li>
<li><p>A link back to its source document (the sale, the purchase order, the transfer)</p>
</li>
</ul>
<p>These records are <strong>append-only</strong>. Once a purchase receive is recorded, it stays recorded. You don't go back and edit the number — if there's a correction, you create a <em>new</em> adjustment record. The history is sacred.</p>
<hr />
<h2>This Was Not Easy to Implement</h2>
<p>The idea is simple — just sum up all movements. The implementation? Not so much. Here's what made it hard and what I did to make it work.</p>
<h3>The Query From Hell</h3>
<p>The stock count query unions <strong>9 different tables</strong> into a single CTE (Common Table Expression). Each sub-query has its own JOIN conditions, its own sign logic (positive for inflows, negative for outflows), and its own edge cases. The first version of this query was slow. Really slow. Loading the stock list page with a few thousand items could take seconds.</p>
<p>The fix was <strong>pre-aggregating inside each UNION ALL branch.</strong> Instead of dumping every individual row into the CTE and then aggregating at the end, each sub-query does its own <code>GROUP BY stockId</code> and <code>SUM(count)</code> first. This means the CTE combines 9 small, pre-summed result sets instead of potentially tens of thousands of raw rows. The difference was night and day.</p>
<p>I also split the query into layers using multiple CTEs:</p>
<ol>
<li><p><code>movements</code> — unions and pre-aggregates all 9 movement sources</p>
</li>
<li><p><code>filtered_stock</code> — applies all the filter conditions (branch, category, brand, color, prescription, price range, name search) against the Stock and Item tables</p>
</li>
<li><p><code>stock_counts</code> — joins filtered stocks with their movements, applies HAVING clauses for min/max stock filters</p>
</li>
</ol>
<p>This way the database can optimize each layer independently. Filters narrow down which stocks we care about <em>before</em> we join with movements, so we're not calculating counts for items nobody asked for.</p>
<h3>The Consistency Problem</h3>
<p>If you're not careful, event-sourced stock calculation can show inconsistent numbers. Imagine:</p>
<ul>
<li><p>The stock list page calculates stock using the <strong>SQL CTE</strong> on the server</p>
</li>
<li><p>A user clicks "Detail" on an item, which fetches all relations and calculates stock using <strong>TypeScript</strong> on the client (<code>calcStockPlus()</code> minus <code>calcStockMinus()</code>)</p>
</li>
<li><p>If the logic in SQL and TypeScript doesn't match <em>exactly</em>, the numbers disagree</p>
</li>
</ul>
<p>This actually happened during development. I added a new movement type (<code>StockBuffer</code>) to the SQL query but forgot to add it to the TypeScript calculation. The stock list showed one number, the detail view showed another. Staff would understandably lose trust in the system.</p>
<p>The solution was discipline: <strong>every time a new movement type is added, it must be added in three places</strong> — the SQL CTE, the TypeScript <code>calcStock</code> functions, and the <code>generateTrail</code> function that renders the timeline. There's no clever abstraction that enforces this automatically. It's just a rule you follow.</p>
<h3>FIFO Cost Price Calculation</h3>
<p>Getting the stock count right was only half the battle. For accounting, every outflow (sale, return, transfer) needs a <strong>cost price</strong> — how much did we pay for this specific unit? We use FIFO (First In, First Out): the oldest purchased units are assumed to be sold first.</p>
<p>The tricky part: what happens when you sell an item <em>before</em> the purchase is recorded? This happens in real retail — you receive goods on Monday, sell one on Tuesday, but the purchase order isn't entered until Wednesday. The outflow happened before the inflow exists in the system.</p>
<p>I handle this with <strong>retroactive cost pricing</strong>. Outflows that happen before sufficient inflows are queued as "pending." When a new purchase is recorded, the system re-processes pending movements and assigns them the correct cost price. This runs asynchronously using <code>waitUntil()</code> after the main transaction commits, so it doesn't block the user.</p>
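<p>A minimal sketch of the requeue idea. The names (<code>assignFifoCost</code> and its types) are illustrative, not Lensiro's actual API, and the real version runs inside the transaction boundaries described above:</p>
<pre><code class="language-typescript">type Inflow = { count: number; costPrice: number; receivedAt: Date };
type PendingOutflow = { id: string; count: number };

// Walk inflows oldest-first (FIFO) and price pending outflows from them.
// Simplification: each outflow takes one cost price from the oldest lot
// that can cover it; a real system may split an outflow across lots.
function assignFifoCost(inflows: Inflow[], pending: PendingOutflow[]) {
  const lots = inflows
    .map((i) =&gt; ({ ...i })) // copy so we can consume counts locally
    .sort((a, b) =&gt; a.receivedAt.getTime() - b.receivedAt.getTime());
  const assigned: { outflowId: string; costPrice: number }[] = [];

  for (const out of pending) {
    // Drop exhausted lots from the front of the queue
    while (lots.length &gt; 0 &amp;&amp; lots[0].count === 0) lots.shift();
    const lot = lots[0];
    if (!lot || lot.count &lt; out.count) break; // not enough inflow yet; stays pending
    assigned.push({ outflowId: out.id, costPrice: lot.costPrice });
    lot.count -= out.count;
  }
  return assigned; // persist these and clear the matching pending entries
}
</code></pre>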
<h3>Batch Operations</h3>
<p>Early on, creating stock records one at a time during bulk operations (like receiving a purchase order with 50 line items) was painfully slow — each one triggered a separate <code>findFirst</code> then <code>create</code> query. I replaced this with <code>fetchOrCreateStocksBatch()</code>, which finds all existing stocks in one query, identifies the missing ones, bulk-creates them with <code>createMany</code>, and fetches the new ones. What used to be 100+ queries became 3-4.</p>
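<p>The shape of that batch helper, sketched with Prisma (model and field names here are illustrative, not the exact schema):</p>
<pre><code class="language-typescript">import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

type StockKey = { itemId: string; branchId: string };

// Resolve every stock row for a bulk operation in a fixed number of
// queries, instead of a findFirst/create pair per line item.
async function fetchOrCreateStocksBatch(keys: StockKey[]) {
  // 1) One query: find the stock rows that already exist
  const existing = await prisma.stock.findMany({ where: { OR: keys } });
  const seen = new Set(existing.map((s) =&gt; `${s.itemId}:${s.branchId}`));

  // 2) One query: bulk-create the missing rows
  const missing = keys.filter((k) =&gt; !seen.has(`${k.itemId}:${k.branchId}`));
  if (missing.length &gt; 0) {
    await prisma.stock.createMany({ data: missing, skipDuplicates: true });
  }

  // 3) One query: fetch everything, now guaranteed to exist
  return prisma.stock.findMany({ where: { OR: keys } });
}
</code></pre>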
<hr />
<h2>Why "Cannot Be Wrong"?</h2>
<p>When I say this system "cannot be wrong," I mean something very specific. The stock count is <strong>mathematically derived</strong> from the complete set of transactions. If the transactions are recorded correctly, the stock count is correct — by definition. There is no way for the count to "drift" out of sync, because there is no separate count to drift.</p>
<p>Here's what makes this different from a traditional <code>UPDATE stock SET quantity = quantity - 1</code> approach:</p>
<h3>1. No Stale State</h3>
<p>There is no cached or stored quantity that can go stale. Every query recalculates from the source of truth. If you run the query right now and again in 5 minutes (with no new transactions), you get the exact same answer.</p>
<h3>2. Complete Auditability</h3>
<p>Every single unit of stock can be traced back to <em>how</em> it got there. Got 12 units? You can see: 10 came from a purchase on January 3rd, 3 came from a transfer on January 15th, and 1 was sold on January 20th. 10 + 3 - 1 = 12. The math is right there on screen.</p>
<img src="https://cdn.hashnode.com/uploads/covers/61f4e5bb4bea13573e62c7da/62d58518-a41c-4439-9a7f-e55fa355a7da.png" alt="Screenshot of the trail view showing the running total column, where each row adds or subtracts and you can visually verify the math" style="display:block;margin:0 auto" />

<h3>3. Self-Correcting with Stock Opname</h3>
<p>When staff does a physical stock take (we call it "Stock Opname"), they count what's actually on the shelf and enter it. The system compares: ledger says 12, physical count says 11. It creates an adjustment record of -1. Now the ledger says 11. The discrepancy is <em>recorded as an event itself</em> — not silently patched.</p>
<img src="https://cdn.hashnode.com/uploads/covers/61f4e5bb4bea13573e62c7da/ca462cbf-657d-4032-8b92-1d549746c838.png" alt="Screenshot of the Stock Opname page showing the adjustment creation flow" style="display:block;margin:0 auto" />

<h3>4. Atomic Transactions</h3>
<p>Every operation that touches stock is wrapped in a database transaction. If you're recording a sale with 3 line items, either all 3 items get their movement records, or none of them do. No partial states.</p>
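<p>In Prisma terms the pattern looks roughly like this (schema names are illustrative, not Lensiro's actual models):</p>
<pre><code class="language-typescript">import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

type LineItem = { stockId: string; count: number; price: number };

// Either the sale and every one of its movement rows are written,
// or none of them are. Any throw inside the callback rolls it all back.
async function recordSale(memberId: string, lineItems: LineItem[]) {
  return prisma.$transaction(async (tx) =&gt; {
    const sale = await tx.sale.create({ data: { memberId } });
    await tx.saleItem.createMany({
      data: lineItems.map((li) =&gt; ({ ...li, saleId: sale.id })),
    });
    return sale;
  });
}
</code></pre>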
<h3>5. Branch Isolation</h3>
<p>Each branch has its own stock ledger. Branch A's inventory is completely independent of Branch B's. When items move between branches, it creates a <em>pair</em> of records: a decrease at the source branch and an increase at the destination branch (only when the receiving branch confirms receipt).</p>
<img src="https://cdn.hashnode.com/uploads/covers/61f4e5bb4bea13573e62c7da/67731eb0-8cde-40eb-b280-20c6a089df86.png" alt="Screenshot of the stock list filtered by branch, showing branch selector and different stock counts per branch" style="display:block;margin:0 auto" />

<hr />
<h2>The SQL Behind It</h2>
<p>On the backend, a single SQL query with a Common Table Expression (CTE) unions all movement tables together and sums them up:</p>
<pre><code class="language-sql">WITH movements AS (
  SELECT stockId, SUM(count) as value
  FROM PurchaseReceiveItem
  GROUP BY stockId

  UNION ALL

  SELECT stockId, SUM(count) as value
  FROM StockAdjustment
  GROUP BY stockId

  UNION ALL

  SELECT targetStockId, SUM(count) as value
  FROM StockMutationItem
  WHERE received = true
  GROUP BY targetStockId

  UNION ALL

  SELECT stockId, -SUM(count) as value
  FROM SaleItem
  GROUP BY stockId

  -- ... and so on for all movement types
)
SELECT stockId, COALESCE(SUM(value), 0) AS stockCount
FROM movements
GROUP BY stockId
</code></pre>
<p>On the client side, the same logic is mirrored in TypeScript — <code>calcStockPlus()</code> sums all inflows, <code>calcStockMinus()</code> sums all outflows, and <code>calcStock()</code> returns the difference. Both server and client use the same formula, so as long as the two stay in sync, there's nowhere for the numbers to disagree.</p>
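<p>Conceptually, the client-side half looks something like this (relation names are illustrative; the real functions also handle the buffer described below):</p>
<pre><code class="language-typescript">type Movement = { count: number };

// The relations a fully-loaded stock record carries, one array per
// movement source (names here are illustrative)
type StockWithRelations = {
  purchaseReceiveItems: Movement[];
  positiveAdjustments: Movement[];
  transferIns: Movement[];
  saleItems: Movement[];
  complimentaryItems: Movement[];
  purchaseReturnItems: Movement[];
  transferOuts: Movement[];
  negativeAdjustments: Movement[];
};

const sum = (ms: Movement[]) =&gt; ms.reduce((acc, m) =&gt; acc + m.count, 0);

// Inflows: purchase receives, positive opname, incoming transfers
export const calcStockPlus = (s: StockWithRelations) =&gt;
  sum(s.purchaseReceiveItems) + sum(s.positiveAdjustments) + sum(s.transferIns);

// Outflows: sales, complimentary, purchase returns, outgoing transfers,
// negative opname
export const calcStockMinus = (s: StockWithRelations) =&gt;
  sum(s.saleItems) +
  sum(s.complimentaryItems) +
  sum(s.purchaseReturnItems) +
  sum(s.transferOuts) +
  sum(s.negativeAdjustments);

// The ledger total: inflows minus outflows
export const calcStock = (s: StockWithRelations) =&gt;
  calcStockPlus(s) - calcStockMinus(s);
</code></pre>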
<hr />
<h2>The Trail View: Show Your Work</h2>
<p>My favorite part of this system is the trail view. When a staff member clicks on any stock item, they see a full chronological timeline of every movement, each one clickable back to the source document.</p>
<img src="https://cdn.hashnode.com/uploads/covers/61f4e5bb4bea13573e62c7da/d859fc92-82e8-47e4-8e7d-7ed621ca759e.png" alt="Screenshot of the expanded trail view for a single item, showing the full timeline with purchase receives, sales, and transfers in chronological order" style="display:block;margin:0 auto" />

<p>It's like a bank statement for inventory. Every "deposit" and "withdrawal" is listed. The running total is visible. Anyone can verify the math by scrolling through the list.</p>
<p>This isn't just for debugging — it's a daily tool. When a staff member questions the count, they don't need to call IT. They click "Detail," read the trail, and understand exactly why the system says what it says.</p>
<hr />
<h2>But Wait — It Can Still Be Bugged</h2>
<p>I want to be honest here. When I say the system "cannot be wrong," I'm talking about the <em>architecture</em> — the structural impossibility of stock count drift. The system is designed so that <strong>if all movements are recorded, the count is mathematically guaranteed to be correct.</strong></p>
<p>But software is software. Bugs can still happen:</p>
<ul>
<li><p><strong>A movement could fail to be recorded.</strong> If a purchase receive action crashes halfway through and the transaction doesn't roll back properly, a movement might be lost. The stock count would then be wrong — not because of drift, but because of missing data.</p>
</li>
<li><p><strong>A movement could be recorded with the wrong count.</strong> If the UI sends <code>count: 10</code> instead of <code>count: 1</code> due to a frontend bug, the ledger faithfully records the wrong number. Garbage in, garbage out.</p>
</li>
<li><p><strong>A new movement type could be added without updating the calculation.</strong> If a developer adds a 9th movement type but forgets to include it in the calculation functions or the SQL query, those movements would be invisible.</p>
</li>
<li><p><strong>Concurrent transactions could interact unexpectedly.</strong> Although we use database transactions, extremely high concurrency on the same stock item could theoretically cause issues depending on the isolation level.</p>
</li>
<li><p><strong>The client and server calculations could diverge.</strong> We mirror the logic in both SQL and TypeScript. If someone updates one but not the other, you'd see different numbers depending on where you look.</p>
</li>
</ul>
<p>The architecture eliminates the <em>most common</em> class of inventory bugs — the silent drift that accumulates over months and is impossible to diagnose. But it doesn't make the software immune to all bugs. No architecture can.</p>
<p>What it <em>does</em> give you is <strong>debuggability</strong>. When something looks wrong, you can open the trail, read every movement, and find the problem. Compare that to a traditional system where the stock is just... a number. How did it get there? Who knows.</p>
<hr />
<h2>The Buffer System: Handling In-Progress Sales</h2>
<p>One more detail I'm proud of. In optical retail, there's often a gap between selling an item and physically handing it to the customer (because lenses need to be fitted into frames). During this gap, the item is "sold" but still physically in the store.</p>
<p>We handle this with a <strong>buffer</strong> concept. Sold-but-unfulfilled items are tracked as a buffer against the ledger total, and the UI shows a clear breakdown:</p>
<ul>
<li><p><strong>Total Stock:</strong> 5 (the ledger total)</p>
</li>
<li><p><strong>Buffer:</strong> 2 (sold but not yet fulfilled)</p>
</li>
<li><p><strong>Available:</strong> 3 (actually available to sell)</p>
</li>
</ul>
<img src="https://cdn.hashnode.com/uploads/covers/61f4e5bb4bea13573e62c7da/6667e159-83ad-4d6a-996e-462fdcde2e6b.png" alt="Screenshot of the buffer stock indicator showing the yellow warning box with Total Stock, Buffer, and Available counts" style="display:block;margin:0 auto" />

<p>This prevents double-selling without hiding information from staff.</p>
<hr />
<h2>Lessons Learned</h2>
<ol>
<li><p><strong>Derived state &gt; stored state.</strong> If you can calculate it, don't store it. Stored state lies. Calculated state can't.</p>
</li>
<li><p><strong>Make the audit trail a feature, not an afterthought.</strong> When staff can see <em>why</em> the number is what it is, trust in the system goes up dramatically.</p>
</li>
<li><p><strong>Event sourcing isn't just for distributed systems.</strong> Even a simple retail app benefits from treating every state change as an immutable event.</p>
</li>
<li><p><strong>The best debugging tool is transparency.</strong> When something goes wrong, a complete trail of events is worth more than any amount of logging.</p>
</li>
</ol>
<hr />
<img src="https://cdn.hashnode.com/uploads/covers/61f4e5bb4bea13573e62c7da/fbc5c4b0-53ce-4edf-a250-e986b0d3b566.png" alt="Screenshot of the full Lensiro stock management page with filters, table, and a detail trail expanded — the whole system working together" style="display:block;margin:0 auto" />

<hr />
<h2>About Lensiro</h2>
<p><a href="https://lensiro.com/">Lensiro</a> is a complete retail management platform built specifically for optical stores. It handles everything an eyewear business needs in one place — from inventory and stock management across multiple branches, to point-of-sale, purchasing, member management, and accounting with double-entry bookkeeping. If you run an optical retail business and want a system that actually gets inventory right, check it out at <a href="https://lensiro.com/">lensiro.com</a>.</p>
<p><em>Built with Next.js, PostgreSQL, Prisma, and a lot of stubbornness about doing inventory right.</em></p>
]]></content:encoded></item><item><title><![CDATA[Building a Local LLM Cluster with 200 Servers]]></title><description><![CDATA[Large-language-model (LLM) APIs are incredibly powerful, but they are also expensive. Local LLM inference, on the other hand, runs almost for free once the hardware is on hand. This post covers how we put the Apple Silicon laptops sleeping around our office to work, building a 200-node inference cluster that now handles 25% of our production traffic, with no data-center contract whatsoever.
Spoiler: it started in a dusty meeting room and ended with me rewiring the entire office netw...]]></description><link>https://blog.alvinend.tech/200llm</link><guid isPermaLink="true">https://blog.alvinend.tech/200llm</guid><category><![CDATA[Japanese]]></category><category><![CDATA[llm]]></category><category><![CDATA[SelfHosting]]></category><category><![CDATA[ansible]]></category><category><![CDATA[Grafana]]></category><category><![CDATA[Haproxy]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Fri, 04 Jul 2025 05:42:23 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1751607670862/50c1c901-6dda-4067-9d02-5ee6fa53d8af.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Large-language-model (LLM) APIs are incredibly powerful, but they are also expensive. Local LLM inference, on the other hand, runs almost for free once the hardware is on hand. This post covers how we put the Apple Silicon laptops sleeping around our office to work, building a <strong>200-node inference cluster</strong> that now handles 25% of our production traffic, with no data-center contract whatsoever.</p>
<p><em>Spoiler</em>: it started in a dusty meeting room and ended with me rewiring the entire office network at 3 a.m.</p>
<h2 id="heading-1macbook">Phase 1: Breathing Life into Sleeping MacBooks</h2>
<p>Our office shelves held twelve <strong>M1 MacBook Pros (32 GB RAM)</strong> that nobody was using. Their value would only depreciate over time, so I proposed turning them into LLM inference servers. The CEO loved the cost-saving idea, and we got moving right away.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751606134537/3d8d8cf0-72f4-459f-a41b-47d4f5c11a66.jpeg?auto=compress,format&amp;format=webp" alt /></p>
<h3 id="heading-44k544k44od44kv5qel5oiq">スタック構成</h3>
<p>特別なものはありません。</p>
<ul>
<li><p><strong>Ollama</strong> for model serving (our final pick of the four frameworks we tried)</p>
</li>
<li><p><strong>HAProxy</strong> on one MacBook for simple round-robin load balancing</p>
</li>
<li><p><strong>Prometheus + Grafana</strong> for metrics and dashboards</p>
</li>
<li><p>A cheap desktop fan to cool the “data center” (a 6 m² meeting room)</p>
</li>
<li><p>MacBooks set up <strong>one at a time</strong></p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751606710440/9b418d3e-ace1-4913-8b99-76fc7d19aea7.png?auto=compress,format&amp;format=webp" alt /></p>
<h3 id="heading-44ov44os44o844og44ov44o844kv5qu6lyd">フレームワーク比較</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Framework</td><td>Why it didn’t make the cut</td></tr>
</thead>
<tbody>
<tr>
<td><strong>LM Studio (MLX backend)</strong></td><td>Great MLX support, but froze on long contexts and parallel requests</td></tr>
<tr>
<td><strong>Raw MLX Library</strong></td><td>No OpenAI-style API; required custom parsing; high memory usage</td></tr>
<tr>
<td><strong>llama.cpp</strong></td><td>Impressive performance, but hard to automate at the time (no Ansible yet)</td></tr>
<tr>
<td><strong>Ollama</strong></td><td>Easy to deploy, solid performance, OpenAI-compatible endpoint. <strong>Our pick</strong></td></tr>
</tbody>
</table>
</div><p>Within a week, twelve MacBooks were running 30B models locally and handling roughly 5% of production traffic.</p>
<h2 id="heading-2mac-studio">Phase 2: The Mac Studio Detour (Bigger Isn’t Always Faster)</h2>
<p>Success breeds ambition. Convinced that more memory meant more throughput, management bought six <strong>Mac Studios (512 GB RAM, 80-core GPU)</strong>, expecting each one to do the work of eight MacBooks.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751606310302/9e838dd5-9c78-454c-bec2-4544fbb6024d.jpeg?auto=compress,format&amp;format=webp" alt /></p>
<p>The reality: LLM speed scales almost in proportion to <strong>GPU core count</strong>, and as <strong>Amdahl’s Law</strong> shows, parts of the pipeline cannot be parallelized. A single Mac Studio ended up only about <strong>3–4×</strong> faster than a MacBook (not 8×).</p>
<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/AmdahlsLaw.svg/1920px-AmdahlsLaw.svg.png" alt="undefined" /></p>
<p><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hYys61pY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nnn1wctofyp25352v51v.png" alt="Understanding Concurrency Through Amdahl's Law - DEV Community" class="image--center mx-auto" /></p>
<p>Lesson learned. Even so, the six Mac Studios raised our traffic coverage to about 25%.</p>
<h2 id="heading-3200mac-miniansible">Phase 3: 200 Mac minis and the Joy of Ansible</h2>
<h3 id="heading-mac-mini">Why Mac minis?</h3>
<p>A cost analysis showed that <strong>two Mac minis (20-core GPU)</strong> delivered better cost-performance per token than a single Mac Studio, so we bulk-ordered <strong>200</strong> of them.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751606390595/c7e32ec5-287c-49dc-adcc-fc8d90ed4fdb.jpeg?auto=compress,format&amp;format=webp" alt /></p>
<h3 id="heading-6ieq5yuv44ox44ot44ot44k444on44ol44oz44kw">自動プロビジョニング</h3>
<p>200台を手動セットアップなど論外。そこで<strong>Ansible</strong>を導入しました。</p>
<p>（以下は例。本番環境はもっと複雑です）</p>
<pre><code class="lang-bash"><span class="hljs-comment"># プレイブックの抜粋</span>
- hosts: mac
  tasks:
    - name: Install Ollama
      homebrew:
        name: ollama
        state: present
    - name: Configure model
      shell: ollama pull mistral:7b-instruct
    - name: Register with HAProxy
      template:
        src: haproxy.cfg.j2
        dest: /usr/<span class="hljs-built_in">local</span>/etc/haproxy/haproxy.cfg
</code></pre>
<p>The moment the first 50 machines came online at once felt like pure magic. From there it was just rinse and repeat.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751606439615/ef003b5f-8b5d-4580-ad36-13c58ce78d39.jpeg?auto=compress,format&amp;format=webp" alt /></p>
<h3 id="heading-3">午前3時のネットワーク大混乱</h3>
<p>最大の難所はネットワークでした。サーバー群用に<strong>専用VLAN</strong>が必要でしたが、手元のYamahaルーターのマニュアルは中途半端な翻訳PDF、前任のネットワーク技術者は退職済み。ポート設定を1つミスっただけでオフィスWi-Fiが全滅。12時間、コーヒー3ポット、VLANタグ付けの猛勉強の末、無事に両ネットワークが復旧しました。</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751606531418/1425a0b8-a726-49be-b09c-4ac876e80710.jpeg?auto=compress,format&amp;format=webp" alt /></p>
<h2 id="heading-5luk5b6m44gu6kii55s7">今後の計画</h2>
<p><strong>ラック化スケール</strong> – 200台は収まっているものの、エアフローとケーブル管理が限界。次は42Uラックと本格PDUを導入予定です。</p>
<p>パート2ではVLAN奮闘記の詳細と、クラスターを支えるGrafanaダッシュボードを紹介します。お楽しみに！</p>
<h3 id="heading-6kqt44kt44gn44gp44km44gm44gc44kk44gm44go44gg77yb">読んでくれてありがとう！</h3>
<p>質問やコメント、あなたの失敗談などあればぜひ教えてください。情報交換しましょう。</p>
]]></content:encoded></item><item><title><![CDATA[Building a 200‑Server Local LLM Cluster]]></title><description><![CDATA[Large‑language‑model (LLM) APIs are incredibly powerful—but they are also expensive. Local LLM inference, on the other hand, is almost free once the hardware is on your desk. This post walks through how we turned an ever‑growing pile of idle Apple si...]]></description><link>https://blog.alvinend.tech/building-a-200server-local-llm-cluster</link><guid isPermaLink="true">https://blog.alvinend.tech/building-a-200server-local-llm-cluster</guid><category><![CDATA[SelfHosting]]></category><category><![CDATA[llm]]></category><category><![CDATA[ansible]]></category><category><![CDATA[Grafana]]></category><category><![CDATA[Haproxy]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Fri, 04 Jul 2025 05:31:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1751605855973/ed56745b-87fe-4101-a2d3-24fbaf3d77c4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Large‑language‑model (LLM) APIs are incredibly powerful—but they are also expensive. Local LLM inference, on the other hand, is almost free once the hardware is on your desk. This post walks through how we turned an ever‑growing pile of idle Apple silicon laptops into a 200‑node inference cluster that now carries a quarter of our production traffic, all without a data‑center contract.</p>
<p><em>Spoiler</em>: it started with a dusty meeting room and ended with me rewiring the entire office network at 3 a.m.</p>
<h2 id="heading-phase-one-breathing-life-into-unused-macbooks">Phase One: Breathing Life into Unused MacBooks</h2>
<p>Our office shelves were lined with twelve M1 MacBook Pros (32 GB RAM) that nobody had touched in months. Instead of letting them depreciate silently, I proposed re‑purposing them as LLM inference servers. The CEO loved the cost‑saving angle, so we rolled up our sleeves.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751606134537/3d8d8cf0-72f4-459f-a41b-47d4f5c11a66.jpeg" alt class="image--center mx-auto" /></p>
<h3 id="heading-the-stack">The stack</h3>
<p>Nothing fancy:</p>
<ul>
<li><p><strong>Ollama</strong> for model serving (picked after trying four frameworks—see below)</p>
</li>
<li><p><strong>HAProxy</strong> on one MacBook for simple round‑robin load balancing</p>
</li>
<li><p><strong>Prometheus + Grafana</strong> for metrics and dashboards</p>
</li>
<li><p>A cheap desktop fan to keep the “data center” (a 6 m² meeting room) cool</p>
</li>
<li><p>Installing one MacBook at a time</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751606710440/9b418d3e-ace1-4913-8b99-76fc7d19aea7.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-framework-bakeoff">Framework bake‑off</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Framework</td><td>Why it didn’t make the cut</td></tr>
</thead>
<tbody>
<tr>
<td><strong>LM Studio (MLX backend)</strong></td><td>Great MLX support, but froze on long contexts and parallel requests</td></tr>
<tr>
<td><strong>Raw MLX Library</strong></td><td>No OpenAI‑style API; required custom parsing; high memory usage</td></tr>
<tr>
<td><strong>llama.cpp</strong></td><td>Impressive performance, but hard to automate at the time (no Ansible yet)</td></tr>
<tr>
<td><strong>Ollama</strong></td><td>Easiest to deploy, solid performance, OpenAI‑compatible endpoint. <strong>We are using this</strong></td></tr>
</tbody>
</table>
</div><p>Within a week we had twelve MacBooks serving local 30B models and handling ~5 % of live traffic.</p>
<h2 id="heading-phase-two-the-mac-studio-detourbigger-isnt-always-better">Phase Two: The Mac Studio Detour—Bigger Isn’t Always Better</h2>
<p>Success breeds ambition. Convinced that “more memory = more throughput,” management ordered six fully loaded <strong>Mac Studio (512 GB RAM, 80‑core GPU)</strong> machines, expecting each to replace eight MacBooks.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751606310302/9e838dd5-9c78-454c-bec2-4544fbb6024d.jpeg" alt class="image--center mx-auto" /></p>
<p>Reality check: LLM speed scaled almost linearly with <em>GPU cores</em>, not RAM, and <strong>Amdahl’s Law</strong> reminded us that some parts of the pipeline stay serial no matter what. A single Mac Studio was only about <strong>3–4× faster</strong> than a MacBook, not 8×.</p>
<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/AmdahlsLaw.svg/1920px-AmdahlsLaw.svg.png" alt="undefined" /></p>
<p><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hYys61pY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nnn1wctofyp25352v51v.png" alt="Understanding Concurrency Through Amdahl's Law - DEV Community" class="image--center mx-auto" /></p>
<p>Lesson learned, but the six Mac Studios still bumped us to ~25 % traffic coverage.</p>
<h2 id="heading-phase-three-200-mac-minis-and-the-joy-of-ansible">Phase Three: 200 Mac Minis and the Joy of Ansible</h2>
<h3 id="heading-why-mac-minis">Why Mac minis?</h3>
<p>A cost analysis showed that <strong>two Mac minis (20‑core GPU)</strong> delivered more tokens per dollar than a single Mac Studio. We bulk‑ordered <em>two hundred</em> of them.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751606390595/c7e32ec5-287c-49dc-adcc-fc8d90ed4fdb.jpeg" alt class="image--center mx-auto" /></p>
<h3 id="heading-automated-provisioning">Automated provisioning</h3>
<p>Hand‑installing 200 machines was a non‑starter, so I dove head‑first into <strong>Ansible</strong>:</p>
<p>(Below is an example; the real deal is much more complex.)</p>
<pre><code class="lang-bash"><span class="hljs-comment"># excerpt from playbook</span>
- hosts: mac
  tasks:
    - name: Install Ollama
      homebrew:
        name: ollama
        state: present
    - name: Configure model
      shell: ollama pull mistral:7b-instruct
    - name: Register with HAProxy
      template:
        src: haproxy.cfg.j2
        dest: /usr/<span class="hljs-built_in">local</span>/etc/haproxy/haproxy.cfg
</code></pre>
<p>Bringing the first 50 nodes online felt like magic. After that, it was rinse‑and‑repeat.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751606439615/ef003b5f-8b5d-4580-ad36-13c58ce78d39.jpeg" alt class="image--center mx-auto" /></p>
<h3 id="heading-the-3-am-network-meltdown">The 3 a.m. network meltdown</h3>
<p>The real nightmare was networking. We needed a <strong>separate VLAN</strong> for the “server farm,” but the only documentation for our Yamaha router was a half‑translated PDF, and the prior network engineer had left months earlier. One mis‑tagged port later, the office Wi‑Fi went dark. Twelve hours, three pots of coffee, and a crash course in VLAN tagging later, both networks were humming.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751606531418/1425a0b8-a726-49be-b09c-4ac876e80710.jpeg" alt class="image--center mx-auto" /></p>
<h2 id="heading-future-plans">Future Plans</h2>
<p><strong>Scale to racks</strong> – 200 minis fit, but airflow and cabling are becoming a headache. A 42U rack with proper PDUs is next.</p>
<p>Stay tuned for part 2; next time I’ll cover the VLAN saga in detail and share the Grafana dashboards that keep this Frankencluster alive.</p>
<h3 id="heading-thanks-for-reading">Thanks for reading!</h3>
<p>Questions, comments, or horror stories of your own? Let me know below; I’d love to compare notes.</p>
]]></content:encoded></item><item><title><![CDATA[Managing Snowflake's Procedure & UDF with Github]]></title><description><![CDATA[Snowflake is an incredible data platform that my company and I have been leveraging for about a year and a half. It's robust, reliable, and feature-rich, with an intuitive UI that makes it easy to navigate. Notably, we haven't experienced any acciden...]]></description><link>https://blog.alvinend.tech/managing-snowflakes-procedure-udf-with-github</link><guid isPermaLink="true">https://blog.alvinend.tech/managing-snowflakes-procedure-udf-with-github</guid><category><![CDATA[snowflake]]></category><category><![CDATA[big data]]></category><category><![CDATA[GitHub]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Mon, 24 Jun 2024 03:46:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1719200227948/b7835a7b-6125-4ef9-aaf3-611ff51151ac.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Snowflake is an incredible data platform that my company and I have been leveraging for about a year and a half. It's robust, reliable, and feature-rich, with an intuitive UI that makes it easy to navigate. Notably, we haven't experienced any accidents caused by Snowflake, underscoring its stability and reliability.</p>
<p>Among Snowflake's myriad features, User-Defined Functions (UDFs) and Procedures stand out. UDFs are functions that you can call within a <code>SELECT</code> (or other) query to format data to your specifications. Procedures, on the other hand, are functions that you can call to process tasks in the background, including creating, updating, and deleting records.</p>
<p>However, there was no way to manage, test, or review these functions before deploying them to production—until <a target="_blank" href="https://docs.snowflake.com/en/developer-guide/git/git-overview">Snowflake implemented Git integration</a>. Although it's still in open preview, this integration has the potential to solve that problem effectively.</p>
<h2 id="heading-prepare-codebase">Prepare Codebase</h2>
<p>Let's start by preparing the codebase. Here's the directory structure:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718629841361/7917707a-7af4-411f-b742-c7e24ff1fbcb.png" alt class="image--center mx-auto" /></p>
<p>As you can see, I'm using <a target="_blank" href="https://python-poetry.org/">Poetry</a> to set up the virtual environment and manage packages. I also employ <a target="_blank" href="https://docs.astral.sh/ruff/">Ruff</a> for linting, ensuring that my code remains clean and maintainable (well, one can hope (´ω`)).</p>
<p>The source code resides in the <code>/src</code> directory, which I’ve divided into <code>/src/procedures</code> and <code>/src/udf</code>. For this example, I'll create a simple fetch procedure that, given a table name, returns ten rows from it.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os

<span class="hljs-keyword">from</span> snowflake.snowpark <span class="hljs-keyword">import</span> Session, Table


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fetch</span>(<span class="hljs-params">session: Session, table_name: str</span>) -&gt; Table:</span>
    <span class="hljs-keyword">return</span> session.table(table_name).select([<span class="hljs-string">"id"</span>]).limit(<span class="hljs-number">10</span>)

<span class="hljs-keyword">if</span> __name__== <span class="hljs-string">"__main__"</span>:
    <span class="hljs-keyword">from</span> snowflake.snowpark <span class="hljs-keyword">import</span> Session

    session = Session.builder.configs({
        <span class="hljs-string">"account"</span>: os.getenv(<span class="hljs-string">"SNOWFLAKE__ACCOUNT"</span>),
        <span class="hljs-string">"user"</span>: os.getenv(<span class="hljs-string">"SNOWFLAKE__USER"</span>),
        <span class="hljs-string">"password"</span>: os.getenv(<span class="hljs-string">"SNOWFLAKE__PASSWORD"</span>),
        <span class="hljs-string">"role"</span>: os.getenv(<span class="hljs-string">"SNOWFLAKE__ROLE"</span>),
        <span class="hljs-string">"warehouse"</span>: os.getenv(<span class="hljs-string">"SNOWFLAKE__WAREHOUSE"</span>),
        <span class="hljs-string">"database"</span>: os.getenv(<span class="hljs-string">"SNOWFLAKE__DATABASE"</span>),
        <span class="hljs-string">"schema"</span>: os.getenv(<span class="hljs-string">"SNOWFLAKE__SCHEMA"</span>),
    }).create()

    table = fetch(
        session=session,
        table_name=<span class="hljs-string">"TEST_TABLE"</span>,
    )

    print(table.collect())
</code></pre>
<p>This simple example demonstrates how to fetch data from a Snowflake table using a stored procedure, managed and versioned in a GitHub repository. The <code>__main__</code> block lets you test it locally by running <code>poetry run python3 src/procedures/fetch/fetch.py</code>.</p>
<p>For the full code and additional examples, visit my <a target="_blank" href="https://github.com/alvinend/snowflake-extensions">GitHub repository.</a></p>
<h2 id="heading-create-github-personal-access-token">Create Github Personal Access Token</h2>
<p>We've written code for Snowflake to understand! Yay! But now we need Snowflake to read our code. To enable this, we need to generate a personal access token (PAT) that allows Snowflake to access our GitHub repository.</p>
<p>Here are the steps:</p>
<ol>
<li><p>Open the GitHub page and click on your profile picture in the top right corner.</p>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718630917467/10e05009-d7a6-4971-b052-8ec91196b175.png" alt class="image--center mx-auto" /></p>
<p> Click on <strong>Settings</strong>.</p>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718630941532/f57e04b6-d3fc-47e0-a467-59dfb311353e.png" alt class="image--center mx-auto" /></p>
<p> In the sidebar, click on <strong>Developer settings</strong>.</p>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718630977794/d9517849-2106-496c-af72-02117dd3d64a.png" alt class="image--center mx-auto" /></p>
<p> Go to <strong>Personal access tokens</strong> → <strong>Fine-grained tokens</strong>.</p>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718631030899/513e7e48-e7fc-4500-901f-5d7951d68d03.png" alt class="image--center mx-auto" /></p>
<p> Generate a new token. Select your repository and set the expiry date.</p>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718631134466/33b267f2-a785-4d0a-9c96-b3b4e7728013.png" alt class="image--center mx-auto" /></p>
<p> Grant it permission to read contents.</p>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718631191165/4061a9be-392b-4647-bb11-beeec1db88a9.png" alt class="image--center mx-auto" /></p>
<p> Done!</p>
</li>
</ol>
<p>You should now have your GitHub PAT, which will look something like this:</p>
<pre><code class="lang-plaintext">github_pat_xxxxxxxxxxxx....
</code></pre>
<h2 id="heading-create-needed-snowflake-resources">Create Needed Snowflake Resources</h2>
<p>Okay, let's jump into the Snowflake console and get our hands dirty. First, we need to save our GitHub Personal Access Token in Snowflake as a <strong>Secret</strong>.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> SECRET git_secret
    <span class="hljs-keyword">TYPE</span> = <span class="hljs-keyword">password</span>
    USERNAME = <span class="hljs-string">'dummy-username'</span>
    <span class="hljs-keyword">PASSWORD</span> = <span class="hljs-string">'github_pat_xxxxxxxxxxxx....'</span>;
</code></pre>
<p>The username doesn't really matter; you can input whatever you want. The password must be the generated GitHub PAT token.</p>
<p>Next, using that Secret, let's create our <strong>API Integration</strong> to integrate GitHub and Snowflake.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> API INTEGRATION git_api_integration
  API_PROVIDER = git_https_api
  API_ALLOWED_PREFIXES = (<span class="hljs-string">'https://github.com/alvinend'</span>)
  ALLOWED_AUTHENTICATION_SECRETS = (git_secret)
  ENABLED = <span class="hljs-literal">TRUE</span>;
</code></pre>
<p>Using that API Integration, let's create the Git repository.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> GIT REPOSITORY snowflake_extensions
  API_INTEGRATION = git_api_integration
  GIT_CREDENTIALS = git_secret
  ORIGIN = <span class="hljs-string">'https://github.com/alvinend/snowflake-extensions.git'</span>;
</code></pre>
<p>Invoke FETCH to sync the repository.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">ALTER</span> GIT REPOSITORY snowflake_extensions <span class="hljs-keyword">FETCH</span>;
</code></pre>
<p>Check if our Snowflake Git repository can actually read it.</p>
<pre><code class="lang-sql">LS @snowflake_extensions/branches/main;
</code></pre>
<p>Now, let's declare our procedure. This simple procedure accepts a table name and returns ten rows from that table.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> <span class="hljs-keyword">PROCEDURE</span> <span class="hljs-keyword">fetch</span>(
    table_name <span class="hljs-built_in">VARCHAR</span>
)
  <span class="hljs-keyword">RETURNS</span> <span class="hljs-keyword">TABLE</span>(
    <span class="hljs-keyword">ID</span> <span class="hljs-built_in">VARCHAR</span>,
    <span class="hljs-keyword">NAME</span> <span class="hljs-built_in">VARCHAR</span>
  )
  <span class="hljs-keyword">LANGUAGE</span> PYTHON
  RUNTIME_VERSION = <span class="hljs-string">'3.11'</span>
  PACKAGES = (<span class="hljs-string">'snowflake-snowpark-python'</span>)
  IMPORTS = (<span class="hljs-string">'@snowflake_extensions/branches/main/src/procedures/fetch/fetch.py'</span>)
  <span class="hljs-keyword">HANDLER</span> = <span class="hljs-string">'fetch.fetch'</span>;
</code></pre>
<p>All that's left to do is call it!</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CALL</span> <span class="hljs-keyword">fetch</span>(
    <span class="hljs-string">'TEST_TABLE'</span>
);
</code></pre>
<h2 id="heading-closing">Closing</h2>
<p>Managing Snowflake's UDFs and procedures with GitHub can significantly streamline your development process, enabling collaborative work and ensuring better code management. By leveraging Snowflake's powerful data platform and integrating it with GitHub, you can easily version-control your UDFs and procedures, reducing the risk of errors and making your workflows more efficient.</p>
<p>In this blog, we walked through the entire process—from creating a GitHub personal access token to integrating it with Snowflake. By following these steps, you can ensure that your Snowflake environment is robust, scalable, and easy to manage.</p>
<p>Happy coding, and <strong>may your data always be clean and your queries always be fast!</strong></p>
]]></content:encoded></item><item><title><![CDATA[Deploying Big Files with AWS Lambda and EFS Made Easy]]></title><description><![CDATA[Background
In some cases, we want to deploy our trained deep learning models or pre-trained models from platforms like Hugging Face to AWS Lambda for serverless inference.
While the official service for such tasks is AWS Sagemaker, it can sometimes b...]]></description><link>https://blog.alvinend.tech/deploying-big-files-with-aws-lambda-and-efs-made-easy</link><guid isPermaLink="true">https://blog.alvinend.tech/deploying-big-files-with-aws-lambda-and-efs-made-easy</guid><category><![CDATA[AWS]]></category><category><![CDATA[lambda]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Deep Learning]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Tue, 11 Jun 2024 02:53:48 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1718030225367/88de3134-349f-4542-8b9c-6aeddb8ef1c7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-background">Background</h2>
<p>In some cases, we want to deploy our trained deep learning models or pre-trained models from platforms like Hugging Face to AWS Lambda for serverless inference.</p>
<p>While the official service for such tasks is AWS Sagemaker, it can sometimes be overly complex for simple deployment needs. Although Sagemaker offers benefits like model management and MLOps, there are scenarios where a simpler solution is preferred.</p>
<p>Deploying large models with just Lambda presents challenges due to the service's size limitations—50 MB for direct uploads (zipped) and 250 MB (unzipped). Using Lambda Docker with ECR can support up to 10 GB, but storing all files in memory can lead to slower cold starts and increased costs.</p>
<p>To achieve efficient deployment, I recommend using Lambda with EFS as the file system.</p>
<h2 id="heading-how-it-works">How it Works</h2>
<p>EFS is a file system that Lambda can access if it is mounted properly. To achieve this, both the EFS resource and Lambda need to be within a VPC.</p>
<p>According to AWS best practices, only resources that need to be accessible from outside the VPC should be placed in public subnets; typically that is just the NAT gateway. Keeping everything else in private subnets adds an extra layer of network security by ensuring critical resources are only reachable from within the VPC.</p>
<p>Consequently, EFS mounts should be placed inside private subnets. Since Lambda functions do not have public IPs, they should also be placed in private subnets.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716721018210/96bb78c9-2587-4c46-abd1-4109ea46f314.png" alt class="image--center mx-auto" /></p>
<p>Here is an overview of the steps.</p>
<ol>
<li><p>Create a Lambda function in a VPC with a private subnet.</p>
</li>
<li><p>Create an EFS in the same VPC as the Lambda function, also in a private subnet.</p>
</li>
<li><p>Create an EFS Mount Target in the Availability Zones (AZs) where the Lambda function will be deployed.</p>
</li>
<li><p>Create a Security Group to enable the Lambda function to access the EFS.</p>
</li>
<li><p>Mount the EFS from the Lambda settings.</p>
</li>
</ol>
<p>To load files onto the EFS, I usually create a temporary EC2 instance, mount the EFS on it, and transfer the files through that instance.</p>
<p><strong>As a prerequisite, I assume that we already have a VPC with a NAT Gateway and internet access set up from the private subnet.</strong></p>
<h2 id="heading-sounds-hard-enough-how-to-build-it">Sounds Hard Enough... How to build it?</h2>
<p>There are many ways to build this infrastructure, and I generally recommend using some form of Infrastructure as Code (IaC) such as Terraform, CloudFormation, or AWS SAM.</p>
<p>Since we are dealing with Lambda, I will be using AWS SAM in this tutorial. If you already use Terraform in your stack, I suggest managing EFS and EC2 with Terraform and Lambda with AWS SAM.</p>
<p>However, for simplicity, I will be using 100% AWS SAM for this tutorial. The easiest way for us to jump-start this is by using <code>sam init</code>.</p>
<pre><code class="lang-yaml"><span class="hljs-string">sam</span> <span class="hljs-string">init</span>
</code></pre>
<p>After that, you should see the prompt below. Choose <code>1 - AWS Quick Start Templates</code>.</p>
<pre><code class="lang-yaml"><span class="hljs-string">Which</span> <span class="hljs-string">template</span> <span class="hljs-string">source</span> <span class="hljs-string">would</span> <span class="hljs-string">you</span> <span class="hljs-string">like</span> <span class="hljs-string">to</span> <span class="hljs-string">use?</span>
        <span class="hljs-number">1</span> <span class="hljs-bullet">-</span> <span class="hljs-string">AWS</span> <span class="hljs-string">Quick</span> <span class="hljs-string">Start</span> <span class="hljs-string">Templates</span>
        <span class="hljs-number">2</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Custom</span> <span class="hljs-string">Template</span> <span class="hljs-string">Location</span>
<span class="hljs-attr">Choice:</span>
</code></pre>
<p>There are a lot of templates. Choose <code>14 - Lambda EFS example</code>.</p>
<pre><code class="lang-yaml"><span class="hljs-string">Choose</span> <span class="hljs-string">an</span> <span class="hljs-string">AWS</span> <span class="hljs-string">Quick</span> <span class="hljs-string">Start</span> <span class="hljs-string">application</span> <span class="hljs-string">template</span>
        <span class="hljs-number">1</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Hello</span> <span class="hljs-string">World</span> <span class="hljs-string">Example</span>
        <span class="hljs-number">2</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Data</span> <span class="hljs-string">processing</span>
        <span class="hljs-number">3</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Hello</span> <span class="hljs-string">World</span> <span class="hljs-string">Example</span> <span class="hljs-string">with</span> <span class="hljs-string">Powertools</span> <span class="hljs-string">for</span> <span class="hljs-string">AWS</span> <span class="hljs-string">Lambda</span>
        <span class="hljs-number">4</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Multi-step</span> <span class="hljs-string">workflow</span>
        <span class="hljs-number">5</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Scheduled</span> <span class="hljs-string">task</span>
        <span class="hljs-number">6</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Standalone</span> <span class="hljs-string">function</span>
        <span class="hljs-number">7</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Serverless</span> <span class="hljs-string">API</span>
        <span class="hljs-number">8</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Infrastructure</span> <span class="hljs-string">event</span> <span class="hljs-string">management</span>
        <span class="hljs-number">9</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Lambda</span> <span class="hljs-string">Response</span> <span class="hljs-string">Streaming</span>
        <span class="hljs-number">10</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Serverless</span> <span class="hljs-string">Connector</span> <span class="hljs-string">Hello</span> <span class="hljs-string">World</span> <span class="hljs-string">Example</span>
        <span class="hljs-number">11</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Multi-step</span> <span class="hljs-string">workflow</span> <span class="hljs-string">with</span> <span class="hljs-string">Connectors</span>
        <span class="hljs-number">12</span> <span class="hljs-bullet">-</span> <span class="hljs-string">GraphQLApi</span> <span class="hljs-string">Hello</span> <span class="hljs-string">World</span> <span class="hljs-string">Example</span>
        <span class="hljs-number">13</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Full</span> <span class="hljs-string">Stack</span>
        <span class="hljs-number">14</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Lambda</span> <span class="hljs-string">EFS</span> <span class="hljs-string">example</span>
        <span class="hljs-number">15</span> <span class="hljs-bullet">-</span> <span class="hljs-string">DynamoDB</span> <span class="hljs-string">Example</span>
        <span class="hljs-number">16</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Machine</span> <span class="hljs-string">Learning</span>
</code></pre>
<p>Pick your Python version.</p>
<pre><code class="lang-yaml"><span class="hljs-string">Which</span> <span class="hljs-string">runtime</span> <span class="hljs-string">would</span> <span class="hljs-string">you</span> <span class="hljs-string">like</span> <span class="hljs-string">to</span> <span class="hljs-string">use?</span>
        <span class="hljs-number">1</span> <span class="hljs-bullet">-</span> <span class="hljs-string">python3.9</span>
        <span class="hljs-number">2</span> <span class="hljs-bullet">-</span> <span class="hljs-string">python3.8</span>
        <span class="hljs-number">3</span> <span class="hljs-bullet">-</span> <span class="hljs-string">python3.12</span>
        <span class="hljs-number">4</span> <span class="hljs-bullet">-</span> <span class="hljs-string">python3.11</span>
        <span class="hljs-number">5</span> <span class="hljs-bullet">-</span> <span class="hljs-string">python3.10</span>
</code></pre>
<p>By this point, we should have all we need to deploy Lambda and EFS to the cloud.</p>
<p>The complete generated template can be found here.</p>
<p><a target="_blank" href="https://github.com/alvinend/sample-lambda-efs">SAM Init, Lambda EFS example. Generated Template</a></p>
<h2 id="heading-okay-problem-solved">Okay, Problem solved?</h2>
<p>Not exactly. While using <code>sam init</code> can give us a jump start for our development, there are two problems when using it for Lambda with EFS.</p>
<ol>
<li><p>It doesn't provide a way to put data on the EFS.</p>
</li>
<li><p>The bigger problem is that we cannot invoke it locally. While there are <a target="_blank" href="https://github.com/aws/aws-sam-cli/issues/2589">issues on GitHub discussing this</a>, there is currently no supported way to test it locally.</p>
</li>
</ol>
<p>In the next sections, we will try to fix the above problems.</p>
<h2 id="heading-improve-sam-template-amp-add-ec2">Improve SAM Template &amp; Add EC2</h2>
<p>When adding an EC2 instance, it's not necessary to rewrite the entire template. However, the template generated by <code>sam init</code> can be hard to read and understand, so I've rewritten it to make clear which inputs are needed and which resources must be created: the EFS, the Lambda function, and the EC2 instance.</p>
<p>Also, to fix the second problem, we need our Lambda to be image-based instead of a zip file.</p>
<p>First, let's start with the parameters:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">AWSTemplateFormatVersion:</span> <span class="hljs-string">'2010-09-09'</span>
<span class="hljs-attr">Transform:</span> <span class="hljs-string">AWS::Serverless-2016-10-31</span>
<span class="hljs-attr">Description:</span> <span class="hljs-string">Build</span> <span class="hljs-string">Lambda</span> <span class="hljs-string">with</span> <span class="hljs-string">EFS!</span>

<span class="hljs-attr">Parameters:</span>
  <span class="hljs-attr">VpcId:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">String</span>
    <span class="hljs-attr">Default:</span> <span class="hljs-string">your-vpc-id</span>

  <span class="hljs-attr">VpcCidr:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">String</span>
    <span class="hljs-attr">Default:</span> <span class="hljs-number">12.1</span><span class="hljs-number">.0</span><span class="hljs-number">.0</span><span class="hljs-string">/16</span>

  <span class="hljs-attr">PublicSubnetId:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">String</span>
    <span class="hljs-attr">Default:</span> <span class="hljs-string">your-vpc-id's</span> <span class="hljs-string">public</span> <span class="hljs-string">subnet</span>

  <span class="hljs-attr">PrivateSubnetId:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">String</span>
    <span class="hljs-attr">Default:</span> <span class="hljs-string">your-vpc-id's</span> <span class="hljs-string">private</span> <span class="hljs-string">subnet</span>

<span class="hljs-attr">Resources:</span>
  <span class="hljs-comment"># We Fill this Later</span>
</code></pre>
<p>As you can see, we need the VPC ID, VPC CIDR, one public subnet ID, and one private subnet ID. Using these inputs, we can create our resources.</p>
<p><strong>Lambda Resources</strong></p>
<p>Next, we define the Lambda resources:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">Resources:</span>
  <span class="hljs-comment">#</span>
  <span class="hljs-comment"># Lambda</span>
  <span class="hljs-comment">#</span>
  <span class="hljs-attr">LambdaExecutionRole:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::IAM::Role</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">AssumeRolePolicyDocument:</span>
        <span class="hljs-attr">Version:</span> <span class="hljs-string">'2012-10-17'</span>
        <span class="hljs-attr">Statement:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">Effect:</span> <span class="hljs-string">Allow</span>
            <span class="hljs-attr">Principal:</span>
              <span class="hljs-attr">Service:</span> [<span class="hljs-string">lambda.amazonaws.com</span>]
            <span class="hljs-attr">Action:</span> [<span class="hljs-string">'sts:AssumeRole'</span>]
      <span class="hljs-attr">Policies:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">PolicyName:</span> <span class="hljs-string">root</span>
          <span class="hljs-attr">PolicyDocument:</span>
            <span class="hljs-attr">Version:</span> <span class="hljs-string">'2012-10-17'</span>
            <span class="hljs-attr">Statement:</span>
              <span class="hljs-bullet">-</span> <span class="hljs-attr">Effect:</span> <span class="hljs-string">Allow</span>
                <span class="hljs-attr">Action:</span>
                  <span class="hljs-bullet">-</span> <span class="hljs-string">logs:CreateLogGroup</span>
                  <span class="hljs-bullet">-</span> <span class="hljs-string">logs:CreateLogStream</span>
                  <span class="hljs-bullet">-</span> <span class="hljs-string">logs:PutLogEvents</span>
                  <span class="hljs-bullet">-</span> <span class="hljs-string">ec2:CreateNetworkInterface</span>
                  <span class="hljs-bullet">-</span> <span class="hljs-string">ec2:DescribeNetworkInterfaces</span>
                  <span class="hljs-bullet">-</span> <span class="hljs-string">ec2:DeleteNetworkInterface</span>
                <span class="hljs-attr">Resource:</span> <span class="hljs-string">'*'</span>
              <span class="hljs-bullet">-</span> <span class="hljs-attr">Effect:</span> <span class="hljs-string">Allow</span>
                <span class="hljs-attr">Action:</span>
                  <span class="hljs-bullet">-</span> <span class="hljs-string">elasticfilesystem:ClientMount</span>
                  <span class="hljs-bullet">-</span> <span class="hljs-string">elasticfilesystem:ClientWrite</span>
                <span class="hljs-attr">Resource:</span> <span class="hljs-string">'*'</span>

  <span class="hljs-attr">LambdaFunction:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::Serverless::Function</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">PackageType:</span> <span class="hljs-string">Image</span>
      <span class="hljs-attr">FunctionName:</span> <span class="hljs-string">my-function-name</span>
      <span class="hljs-attr">Role:</span> <span class="hljs-type">!GetAtt</span> <span class="hljs-string">LambdaExecutionRole.Arn</span>

      <span class="hljs-comment"># Compute</span>
      <span class="hljs-attr">Timeout:</span> <span class="hljs-number">600</span>
      <span class="hljs-attr">MemorySize:</span> <span class="hljs-number">4096</span>
      <span class="hljs-attr">Architectures:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">x86_64</span>

      <span class="hljs-comment"># Network</span>
      <span class="hljs-attr">FileSystemConfigs:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">Arn:</span> <span class="hljs-type">!GetAtt</span> <span class="hljs-string">EFSAccessPoint.Arn</span> <span class="hljs-comment"># <span class="hljs-doctag">TODO:</span> Implement</span>
          <span class="hljs-attr">LocalMountPath:</span> <span class="hljs-string">/mnt/files</span>
      <span class="hljs-attr">VpcConfig:</span>
        <span class="hljs-attr">SecurityGroupIds:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">EFSAccessSecurityGroup</span> <span class="hljs-comment"># <span class="hljs-doctag">TODO:</span> Implement</span>
        <span class="hljs-attr">SubnetIds:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">PrivateSubnetId</span>
    <span class="hljs-attr">Metadata:</span>
      <span class="hljs-attr">Dockerfile:</span> <span class="hljs-string">Dockerfile</span>
      <span class="hljs-attr">DockerContext:</span> <span class="hljs-string">./src</span>
      <span class="hljs-attr">DockerTag:</span> <span class="hljs-string">test</span>
</code></pre>
<p>To create the Lambda function, we need two resources: the Lambda function itself and an IAM Role for executing it (accessing EFS, writing logs, and so on).</p>
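<p>For reference, here is a minimal sketch of what the handler inside <code>./src</code> might look like. The file name, the module-level loading, and the listing logic are my own assumptions for illustration; only the mount path comes from the template above:</p>
<pre><code class="lang-python"># src/app.py - a minimal handler sketch; "/mnt/files" must match LocalMountPath.
import os

MOUNT_PATH = "/mnt/files"

# Anything initialized at module level survives across warm invocations,
# so load your (large) model from EFS here rather than inside the handler.
AVAILABLE_FILES = os.listdir(MOUNT_PATH)


def handler(event, context):
    # Listing the mount is a cheap way to confirm the EFS is wired up.
    return {"files": AVAILABLE_FILES}
</code></pre>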
<p><strong>EFS Resources</strong><br />Next, we create our EFS resources:</p>
<pre><code class="lang-yaml">  <span class="hljs-comment">#</span>
  <span class="hljs-comment"># EFS</span>
  <span class="hljs-comment">#</span>
  <span class="hljs-attr">EFSAccessSecurityGroup:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">'AWS::EC2::SecurityGroup'</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">GroupDescription:</span> <span class="hljs-string">Security</span> <span class="hljs-string">Group</span> <span class="hljs-string">for</span> <span class="hljs-string">Lambda</span> <span class="hljs-string">and</span> <span class="hljs-string">EFS</span> <span class="hljs-string">communication</span>
      <span class="hljs-attr">VpcId:</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">VpcId</span>
      <span class="hljs-attr">SecurityGroupIngress:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">IpProtocol:</span> <span class="hljs-string">tcp</span>
          <span class="hljs-attr">FromPort:</span> <span class="hljs-number">2049</span>  <span class="hljs-comment"># NFS port used by EFS</span>
          <span class="hljs-attr">ToPort:</span> <span class="hljs-number">2049</span>
          <span class="hljs-attr">CidrIp:</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">VpcCidr</span>

  <span class="hljs-attr">EFSFileSystem:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::EFS::FileSystem</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">Encrypted:</span> <span class="hljs-literal">false</span>

  <span class="hljs-attr">EFSMountTarget:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::EFS::MountTarget</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">FileSystemId:</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">EFSFileSystem</span>
      <span class="hljs-attr">SubnetId:</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">PrivateSubnetId</span>
      <span class="hljs-attr">SecurityGroups:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">EFSAccessSecurityGroup</span>

  <span class="hljs-attr">EFSAccessPoint:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::EFS::AccessPoint</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">FileSystemId:</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">EFSFileSystem</span>
      <span class="hljs-attr">PosixUser:</span>
        <span class="hljs-attr">Uid:</span> <span class="hljs-string">"1000"</span>
        <span class="hljs-attr">Gid:</span> <span class="hljs-string">"1000"</span>
      <span class="hljs-attr">RootDirectory:</span>
        <span class="hljs-attr">CreationInfo:</span>
          <span class="hljs-attr">OwnerGid:</span> <span class="hljs-string">"1000"</span>
          <span class="hljs-attr">OwnerUid:</span> <span class="hljs-string">"1000"</span>
          <span class="hljs-attr">Permissions:</span> <span class="hljs-string">"0777"</span>
</code></pre>
<p>The above defines the bare minimum so that our EFS can work. We need an EFS File System, a mount target, and an access point. Additionally, we need a Security Group to allow access between EFS and Lambda.</p>
<p><strong>EC2 Resources</strong></p>
<p>Finally, we add the EC2 instance to interact with EFS:</p>
<pre><code class="lang-yaml">  <span class="hljs-comment">#</span>
  <span class="hljs-comment"># EC2 Instance</span>
  <span class="hljs-comment">#</span>
  <span class="hljs-attr">EC2InstanceIAMRole:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::IAM::Role</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">AssumeRolePolicyDocument:</span>
        <span class="hljs-attr">Version:</span> <span class="hljs-string">'2012-10-17'</span>
        <span class="hljs-attr">Statement:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">Effect:</span> <span class="hljs-string">Allow</span>
            <span class="hljs-attr">Principal:</span>
              <span class="hljs-attr">Service:</span>
                <span class="hljs-bullet">-</span> <span class="hljs-string">ec2.amazonaws.com</span>
            <span class="hljs-attr">Action:</span>
              <span class="hljs-bullet">-</span> <span class="hljs-string">'sts:AssumeRole'</span>
      <span class="hljs-attr">ManagedPolicyArns:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">arn:aws:iam::aws:policy/AmazonElasticFileSystemClientFullAccess</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess</span>

  <span class="hljs-attr">EC2InstanceProfile:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::IAM::InstanceProfile</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">Roles:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">EC2InstanceIAMRole</span>

  <span class="hljs-attr">EC2Instance:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::EC2::Instance</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">ImageId:</span> <span class="hljs-string">ami-07c589821f2b353aa</span> <span class="hljs-comment"># ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20231207</span>
      <span class="hljs-attr">InstanceType:</span> <span class="hljs-string">t2.micro</span>
      <span class="hljs-attr">SubnetId:</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">PublicSubnetId</span>
      <span class="hljs-attr">SecurityGroupIds:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">LambdaSecurityGroup</span>
      <span class="hljs-attr">IamInstanceProfile:</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">EC2InstanceProfile</span>
</code></pre>
<p>(AMI ID might need to be updated based on the region and availability)</p>
<p>This allows the EC2 instance to access EFS. Additionally, you will need to mount the EFS to the EC2 instance before accessing it. You can find the mounting guide in the AWS official documentation.</p>
<p>The final template can be confirmed in my GitHub repository.</p>
<h2 id="heading-bigger-problem-local-invocation">Bigger Problem: Local Invocation.</h2>
<p>While the inability to test locally is not the end of the world, it is certainly a major inconvenience. Without local invocation, you need to deploy every time you want to test your code. This can take hours of your time and has certainly taken hours or even days of mine.</p>
<p>Worry no more, I have found a solution by reverse-engineering <code>sam-cli</code>.</p>
<p>The way it works is that when you run <code>sam local invoke</code>, <code>sam-cli</code> builds an image, starts a container from it, opens an endpoint, calls it, and then removes the container. So to solve this, we just need to start a container ourselves with a Docker volume attached at the same directory where our EFS would be mounted. In this case, we defined it as:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">LocalMountPath:</span> <span class="hljs-string">/mnt/files</span>
</code></pre>
<p>To make this happen, we build our SAM project:</p>
<pre><code class="lang-sh">sam build --cached
</code></pre>
<p>Using the built image, we run the container with the volume attached to it:</p>
<pre><code class="lang-sh">docker run \
    --rm \
    -p 8000:8080 \
    --platform linux/amd64 \
    -v $$(<span class="hljs-built_in">pwd</span>)/efs:/mnt/efs \
    -e AWS_LAMBDA_FUNCTION_MEMORY_SIZE=8192 \
    -e AWS_LAMBDA_FUNCTION_TIMEOUT=600 \
    -e AWS_LAMBDA_FUNCTION_NAME=my-function-name \
    -e AWS_ACCESS_KEY_ID=dummy \
    -e AWS_SECRET_ACCESS_KEY=dummy \
    lambdafunction:<span class="hljs-built_in">test</span>
</code></pre>
<p>After that, you can invoke your function with the following command:</p>
<pre><code class="lang-sh">curl \
    -X POST \
    http://localhost:8000/2015-03-31/<span class="hljs-built_in">functions</span>/<span class="hljs-keyword">function</span>/invocations \
    -d <span class="hljs-string">'{"test":"test"}'</span>
</code></pre>
<p>This approach allows you to test your Lambda function locally, saving you significant time and effort.</p>
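<p>If you prefer invoking it from Python, for example inside a test script, a minimal equivalent of the curl call above might look like this (it assumes the container from the previous step is still running):</p>
<pre><code class="lang-python">import requests  # pip install requests

# Invoke the locally running Lambda container via the Runtime Interface Emulator.
resp = requests.post(
    "http://localhost:8000/2015-03-31/functions/function/invocations",
    json={"test": "test"},
)
print(resp.status_code, resp.json())
</code></pre>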
<h2 id="heading-closing">Closing</h2>
<p>By improving our SAM template and adding the necessary configurations, we've streamlined the deployment process for Lambda and EFS resources. We've also implemented a solution for local invocation, significantly reducing development and testing time. Now, you can store your model in EFS and read it from Lambda, enabling efficient handling of large files for serverless inference.</p>
<p><strong>Thank you for following along, and happy coding!</strong></p>
]]></content:encoded></item><item><title><![CDATA[Exploring AWS Aurora: MySQL vs PostgreSQL]]></title><description><![CDATA[Last week, I created a blog post describing the differences between databases in AWS RDS. Now, it seems fitting to elucidate the differences between Aurora MySQL and PostgreSQL.
Breakdown
Security




MySQLPostgreSQL



Kerberos Auth✓✓


Aurora Postg...]]></description><link>https://blog.alvinend.tech/exploring-aws-aurora-mysql-vs-postgresql</link><guid isPermaLink="true">https://blog.alvinend.tech/exploring-aws-aurora-mysql-vs-postgresql</guid><category><![CDATA[AWS]]></category><category><![CDATA[aurora]]></category><category><![CDATA[AWS RDS]]></category><category><![CDATA[MySQL]]></category><category><![CDATA[PostgreSQL]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Mon, 07 Aug 2023 03:00:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1690628442458/d80227fe-86de-4b21-89cf-f4843023ace5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Last week, I created a blog post describing the <a target="_blank" href="https://blog.alvinend.tech/exploring-aws-rds-database-differences">differences between databases in AWS RDS</a>. Now, it seems fitting to elucidate the differences between Aurora MySQL and PostgreSQL.</p>
<h1 id="heading-breakdown">Breakdown</h1>
<h2 id="heading-security">Security</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>MySQL</td><td>PostgreSQL</td></tr>
</thead>
<tbody>
<tr>
<td>Kerberos Auth</td><td>✓</td><td>✓</td></tr>
</tbody>
</table>
</div><p>Aurora PostgreSQL has supported Kerberos Auth for a long time. As for MySQL, it appears to have been recently included, as detailed in <a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-mysql-kerberos.html">Using Kerberos authentication for Aurora MySQL</a>. Unfortunately, I could not ascertain the exact date this feature was added.</p>
<h2 id="heading-monitoring">Monitoring</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>MySQL</td><td>PostgreSQL</td></tr>
</thead>
<tbody>
<tr>
<td>Performance Insights</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Enhanced Monitoring</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Data Activity Stream (Sync)</td><td>×</td><td>✓</td></tr>
<tr>
<td>Data Activity Stream (Async)</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Advanced Auditing</td><td>✓</td><td>×</td></tr>
</tbody>
</table>
</div><p>Much like AWS RDS, both Aurora's MySQL and PostgreSQL support Performance Insights and Enhanced Monitoring features.</p>
<p>Both can also execute Data Activity Streams, essentially logs that record database activity. Examples of such activities include selecting commands against a database, connecting logs, and configuration changes.</p>
<p>However, only PostgreSQL supports Synchronous mode.</p>
<p>Furthermore, there is an Advanced Auditing feature for Aurora MySQL. This feature facilitates the collection of database activity logs and operates with high performance.</p>
<h2 id="heading-availability">Availability</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>MySQL</td><td>PostgreSQL</td></tr>
</thead>
<tbody>
<tr>
<td>Backtrack</td><td>✓</td><td>×</td></tr>
<tr>
<td>Fault Injection Queries</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Fast DDL (Instant DDL)</td><td>✓</td><td>×</td></tr>
<tr>
<td>Parallel Query</td><td>✓</td><td>×</td></tr>
<tr>
<td>Replica</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Cross-Region Replica</td><td>✓</td><td>×</td></tr>
<tr>
<td>Multi-Master Cluster</td><td>✓</td><td>×</td></tr>
</tbody>
</table>
</div><p>Backtracking "rewinds" the DB cluster to the time you specify. Simple as it may sound, it is a powerful feature: traditionally (for example, on AWS RDS MySQL), retrieving lost data meant restoring an instance from a backup and pulling the data out of it. Unfortunately, only Aurora MySQL supports backtracking at the time of writing.</p>
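<p>For illustration, here is a minimal boto3 sketch of triggering a backtrack. It assumes the cluster was created with a backtrack window enabled; the cluster identifier is a placeholder:</p>
<pre><code class="lang-python">import boto3
from datetime import datetime, timedelta, timezone

rds = boto3.client("rds")

# Rewind the cluster to five minutes ago ("my-aurora-cluster" is a placeholder).
response = rds.backtrack_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",
    BacktrackTo=datetime.now(timezone.utc) - timedelta(minutes=5),
    # Fall back to the earliest possible time if the exact target is unavailable.
    UseEarliestTimeOnPointInTimeUnavailable=True,
)
print(response["Status"])
</code></pre>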
<p>Fault Injection Queries are a way to "sabotage" your instance. Why do we do this? Mainly to test the failover capability of our cluster. They are supported by both Aurora MySQL and PostgreSQL.</p>
<p>Fast DDL (or Instant DDL, depending on the Aurora MySQL version) is DDL on steroids: creating, altering, and so on can be done faster with this feature. It is supported only by Aurora MySQL.</p>
<p>The Parallel Query feature improves I/O by pushing query processing down to Aurora's storage layer. It is supported only by Aurora MySQL.</p>
<p>Both engines support standard Aurora replicas, but only Aurora MySQL additionally supports cross-region replicas and multi-master clusters (replicas that can also accept write requests).</p>
<h2 id="heading-others">Others</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>MySQL</td><td>PostgreSQL</td></tr>
</thead>
<tbody>
<tr>
<td>Aurora Serverless v1</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Aurora Serverless v2</td><td>✓</td><td>×</td></tr>
<tr>
<td>Global Database</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Aurora Lab Mode</td><td>✓</td><td>×</td></tr>
<tr>
<td>Babelfish</td><td>×</td><td>✓</td></tr>
</tbody>
</table>
</div><p>Aurora Serverless lets us use Aurora without managing instances. v2 has features that v1 lacks, such as AWS Identity and Access Management (IAM) database authentication and Performance Insights. While we should use v2 when we can, it does not support Aurora PostgreSQL at the time of writing.</p>
<p>Aurora Global Database is a feature that allows replication to clusters in other regions, and it supports both MySQL and PostgreSQL.</p>
<p>Aurora Lab Mode is used to enable Aurora features that are available in the current Aurora database version but are not enabled by default. It is supported only by Aurora MySQL.</p>
<p>Babelfish for Aurora PostgreSQL is a new capability for Amazon Aurora PostgreSQL-Compatible Edition that enables Aurora to understand commands from applications written for Microsoft SQL Server.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>In conclusion, while both Aurora MySQL and PostgreSQL offer a multitude of robust features, there are distinct differences that could influence your choice between the two. When selecting a database, you must carefully consider the unique requirements and constraints of your use case. Features such as Backtracking, Fast DDL, and Advanced Auditing can be pivotal for certain applications and are thus only available for Aurora MySQL. Conversely, the synchronous mode in Data Activity Stream and Babelfish are exclusive to PostgreSQL. Therefore, understanding these differences can guide you in making a more informed choice, ensuring that you leverage the best of what AWS has to offer.</p>
<h1 id="heading-disclaimer">Disclaimer</h1>
<p>This document is intended as a general guide to the features of Amazon Aurora within AWS RDS as of the time of writing in July 2023. While every effort has been made to ensure accuracy, the rapidly evolving nature of cloud services means that some information may become outdated or inaccurate over time.</p>
<p>The document does not constitute professional advice, and decisions should not be made solely based on this content. It is recommended that readers refer to the latest official AWS documentation, or consult with an AWS certified professional or the AWS support team for the most accurate and up-to-date information.</p>
<p><strong>Feel free to correct me if I made any mistakes :)</strong></p>
]]></content:encoded></item><item><title><![CDATA[Exploring AWS RDS Database Differences]]></title><description><![CDATA[As an aspiring AWS Certified Database Specialist, I've spent a considerable amount of time diving into the extensive world of AWS Relational Database Services (RDS). The certification demands a deep understanding of AWS RDS, encompassing the various ...]]></description><link>https://blog.alvinend.tech/exploring-aws-rds-database-differences</link><guid isPermaLink="true">https://blog.alvinend.tech/exploring-aws-rds-database-differences</guid><category><![CDATA[AWS]]></category><category><![CDATA[AWS RDS]]></category><category><![CDATA[Databases]]></category><category><![CDATA[AWS certification]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Sat, 29 Jul 2023 08:58:25 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1690621043168/5c5b05c5-4778-4a18-be67-d07f4e8f9d41.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As an aspiring AWS Certified Database Specialist, I've spent a considerable amount of time diving into the extensive world of AWS Relational Database Services (RDS). The certification demands a deep understanding of AWS RDS, encompassing the various database engines it supports, their distinct features, capabilities, and how they can be optimally used to meet specific application needs.</p>
<p>Yet, as I ventured further into my studies, I quickly realized that grasping the full range of offerings across the different flavors of RDS was no small feat. The variety is astounding - MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server, each with its own set of strengths and functionalities. It became apparent that a comprehensive comparison of these systems, highlighting what each one can do, would be an invaluable resource not just for me, but for anyone on a similar path.</p>
<p>This sparked the idea to create this blog post, a concentrated effort to demystify the capabilities of the different RDS systems. The aim was to dissect the key aspects of each RDS flavor, focusing on categories like Backup, Storage, Security, Read Replica, Monitoring, and others, and present a clear, concise, and useful comparison that could serve as a go-to guide for AWS RDS.</p>
<h1 id="heading-breakdown">Breakdown</h1>
<h2 id="heading-general">General</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>MySQL</td><td>MariaDB</td><td>PostgreSQL</td><td>Oracle</td><td>SQL Server</td></tr>
</thead>
<tbody>
<tr>
<td>Parameter Group (PG)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Option Group (OG)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
</tbody>
</table>
</div><p>The above table gives a quick comparison of the AWS RDS offerings, indicating whether they support Parameter Group (PG) and Option Group (OG).</p>
<h2 id="heading-availability">Availability</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>MySQL</td><td>MariaDB</td><td>PostgreSQL</td><td>Oracle</td><td>SQL Server</td></tr>
</thead>
<tbody>
<tr>
<td>Automatic Backup</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Manual Backup (Snapshots)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Multi-AZ</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Database Mirroring and Always On</td><td>×</td><td>×</td><td>×</td><td>×</td><td>✓</td></tr>
</tbody>
</table>
</div><p>The above table showcases the high availability options supported by each RDS engine.</p>
<h3 id="heading-automatic-backup-manual-backup-and-multi-az">Automatic Backup, Manual Backup and Multi-AZ</h3>
<p>All systems in RDS support automatic backup, manual backup (Snapshots) and Multi-AZ for additional availability.</p>
<h3 id="heading-database-mirroring-and-always-on-for-sql-server">Database Mirroring and Always on for SQL Server</h3>
<p>Amazon RDS supports Multi-AZ deployments for Microsoft SQL Server by using either SQL Server Database Mirroring (DBM) or Always On Availability Groups (AGs).</p>
<p>When setting up SQL Server Multi-AZ in RDS, it automatically configures all databases on the instance to use DBM or AGs. Amazon RDS manages the primary, the witness, and the secondary DB instance for you. Because the configuration is automatic, RDS selects DBM or Always On AGs based on the version of SQL Server that you deploy.</p>
<p>Refer to the following official documentation for more details:</p>
<ul>
<li><a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_SQLServerMultiAZ.html">Amazon RDS - SQL Server Multi-AZ</a></li>
</ul>
<h2 id="heading-security">Security</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>MySQL</td><td>MariaDB</td><td>PostgreSQL</td><td>Oracle</td><td>SQL Server</td></tr>
</thead>
<tbody>
<tr>
<td>Windows Authentication</td><td>×</td><td>×</td><td>×</td><td>×</td><td>✓</td></tr>
<tr>
<td>Encryption in Transit</td><td>ALTER REQUIRE SSL</td><td>ALTER REQUIRE SSL</td><td>Set <code>rds.force_ssl</code> to 1 in PG</td><td>Add SSL in OG</td><td>Set <code>rds.force_ssl</code> to 1 in PG</td></tr>
<tr>
<td>Encryption at Rest</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Transparent Data Encryption (TDE)</td><td>×</td><td>×</td><td>×</td><td>Set TDE Option in OG</td><td>Set TRANSPARENT_DATA_ENCRYPTION in OG</td></tr>
</tbody>
</table>
</div><h3 id="heading-using-ssl-on-mysql-and-mariadb">Using SSL on MySQL and MariaDB</h3>
<p>Both MySQL and MariaDB support the use of SSL/TLS connections for specific user accounts. For example, you can use one of the following statements:</p>
<pre><code class="lang-bash">ALTER USER <span class="hljs-string">'encrypted_user'</span>@<span class="hljs-string">'%'</span> REQUIRE SSL;
</code></pre>
<p>Users can then connect over SSL with the following commands:</p>
<ul>
<li>MySQL 5.7+:</li>
</ul>
<pre><code class="lang-bash">mysql -h mysql–instance1.123456789012.us-east-1.rds.amazonaws.com --ssl-ca=global-bundle.pem --ssl-mode=VERIFY_IDENTITY -P 3306 -u myadmin -p
</code></pre>
<ul>
<li>MySQL &lt; 5.7:</li>
</ul>
<pre><code class="lang-bash">mysql -h mysql–instance1.123456789012.us-east-1.rds.amazonaws.com --ssl-ca=global-bundle.pem --ssl-verify-server-cert -P 3306 -u myadmin -p
</code></pre>
<ul>
<li>MariaDB:</li>
</ul>
<pre><code class="lang-bash">mysql -h mysql–instance1.123456789012.us-east-1.rds.amazonaws.com --ssl-ca=global-bundle.pem --ssl-mode=REQUIRED -P 3306 -u myadmin -p
</code></pre>
<p>Refer to the official documentation for more details:</p>
<ul>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/mariadb-ssl-connections.html">MariaDB SSL connections</a></p>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/mysql-ssl-connections.html">MySQL SSL connections</a></p>
</li>
</ul>
<h3 id="heading-using-ssl-on-postgresql-and-sql-server">Using SSL on PostgreSQL and SQL Server</h3>
<p>For PostgreSQL and SQL Server, you can set <code>rds.force_ssl</code> to 1 in the parameter group to require SSL. By default, this setting is off (0) in SQL Server and PostgreSQL versions below 15. It's on (1) in PostgreSQL version 15 and above.</p>
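<p>As an example, here is a minimal client-side sketch that verifies the server certificate when connecting to RDS for PostgreSQL. The endpoint, credentials, and database name are placeholders; <code>global-bundle.pem</code> is the AWS-provided CA bundle:</p>
<pre><code class="lang-python">import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect(
    host="postgres-instance1.123456789012.us-east-1.rds.amazonaws.com",
    port=5432,
    user="myadmin",
    password="my-password",
    dbname="mydb",
    sslmode="verify-full",            # verify the certificate and the hostname
    sslrootcert="global-bundle.pem",  # AWS-provided CA bundle
)
print(conn.info.ssl_in_use)  # True when the session is encrypted
conn.close()
</code></pre>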
<p>Refer to the official documentation for more details:</p>
<ul>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Concepts.General.SSL.html">PostgreSQL SSL</a></p>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/ssl-certificate-rotation-sqlserver.html">SQL Server SSL</a></p>
</li>
</ul>
<h3 id="heading-using-ssl-on-oracle">Using SSL on Oracle</h3>
<p>To enable SSL encryption for an Oracle DB instance, add the Oracle SSL option to the option group associated with the DB instance.</p>
<p>Refer to the official documentation for more details:</p>
<ul>
<li><a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Oracle.Concepts.SSL.html">Oracle SSL</a></li>
</ul>
<h3 id="heading-using-transparent-data-encryption-on-sql-server-and-oracle">Using Transparent Data Encryption on SQL Server and Oracle</h3>
<p>Amazon RDS supports TDE to encrypt stored data on your DB instances running Microsoft SQL Server or Oracle. TDE automatically encrypts data before it is written to storage and automatically decrypts data when it is read from storage.</p>
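<p>Operationally, enabling TDE boils down to adding the option to the instance's option group. Here is a minimal boto3 sketch; the option group name is a placeholder, and for SQL Server the option name would be TRANSPARENT_DATA_ENCRYPTION instead, as in the table above:</p>
<pre><code class="lang-python">import boto3

rds = boto3.client("rds")

# "oracle-og" is a placeholder for the option group attached to the Oracle instance.
rds.modify_option_group(
    OptionGroupName="oracle-og",
    OptionsToInclude=[{"OptionName": "TDE"}],
    ApplyImmediately=True,
)
</code></pre>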
<p>Refer to the official documentation for more details:</p>
<ul>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.SQLServer.Options.TDE.html">SQL Server TDE</a></p>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.Oracle.Options.AdvSecurity.html">Oracle TDE</a></p>
</li>
</ul>
<h2 id="heading-read-replica">Read Replica</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>MySQL</td><td>MariaDB</td><td>PostgreSQL</td><td>Oracle</td><td>SQL Server</td></tr>
</thead>
<tbody>
<tr>
<td>Read Replica (RR)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Second Tier RR</td><td>✓</td><td>×</td><td>×</td><td>×</td><td>×</td></tr>
<tr>
<td>Cross Region RR</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Enable Write RR</td><td>Set <code>read_only</code> to 0 in PG</td><td>Set <code>read_only</code> to 0 in PG</td><td>×</td><td>×</td><td>×</td></tr>
<tr>
<td>External Database RR</td><td>✓</td><td>✓</td><td>×</td><td>×</td><td>×</td></tr>
<tr>
<td>Automated Backup on RR</td><td>✓</td><td>✓</td><td>×</td><td>✓</td><td>×</td></tr>
<tr>
<td>Manual Snapshot on RR</td><td>✓</td><td>✓</td><td>✓</td><td>×</td><td>×</td></tr>
</tbody>
</table>
</div><h3 id="heading-second-tier-read-replica-for-mysql">Second Tier Read Replica for MySQL</h3>
<p>By creating a second-tier Read Replica, you can potentially distribute some of the replication load from the master database instance to a first-tier Read Replica.</p>
<p>AWS RDS allows the creation of 5 first-tier read replicas. Each read replica can create another 5 replicas. Thus, a total of 30 replicas (5 first-tier plus 25 second-tier) can exist with one master using this feature. However, note that second-tier read replicas may experience higher replication lag.</p>
<p><a target="_blank" href="https://aws.amazon.com/blogs/aws/new-read-replica-capabilities-for-amazon-rds/">Refer to this article for more details.</a></p>
<h3 id="heading-cross-region-read-replica">Cross Region Read Replica</h3>
<p>AWS RDS can create cross-region read replicas on any system. Notably, <a target="_blank" href="https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-rds-sql-server-cross-region-read-replica/">SQL Server began supporting cross-region read replicas recently.</a></p>
<p><a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RDS_Fea_Regions_DB-eng.Feature.CrossRegionReadReplicas.html">Refer to this document for more details.</a></p>
<h3 id="heading-write-on-read-replica-in-mysql-and-mariadb">Write on Read Replica in MySQL and MariaDB</h3>
<p>To enable write operations on read replicas in MySQL and MariaDB, set the read_only parameter to false for the DB parameter group associated with your DB instance. Other Amazon RDS engines, such as Amazon Aurora, do not permit the modification of the <code>read_only</code> parameter.</p>
<p><a target="_blank" href="https://repost.aws/knowledge-center/rds-read-replica">Refer to this article for more details.</a></p>
<h3 id="heading-external-database-read-replica-with-mysql-and-mariadb">External Database Read Replica with MySQL and MariaDB</h3>
<p>Replication can be set up between an RDS for MySQL or MariaDB DB instance and a MySQL or MariaDB instance that is external to Amazon RDS using binary log file replication.</p>
<p><a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.Procedural.Importing.External.Repl.html">Refer to this document for more details.</a></p>
<h3 id="heading-backup-on-read-replica-in-mysql-and-mariadb">Backup on Read Replica in MySQL and MariaDB</h3>
<p>Automatic backups and manual snapshots are supported on RDS for MySQL, RDS for Oracle or RDS for MariaDB read replicas.</p>
<p>For RDS for PostgreSQL, you can create a manual snapshot of read replicas. However, automated backups for read replicas are only supported for RDS for PostgreSQL 14.1 and higher versions. If you're using RDS for PostgreSQL versions earlier than 14.1 and want a backup, create a snapshot from a read replica.</p>
<p><a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html">Refer to this document for more details.</a></p>
<h2 id="heading-monitoring">Monitoring</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>MySQL</td><td>MariaDB</td><td>PostgreSQL</td><td>Oracle</td><td>SQL Server</td></tr>
</thead>
<tbody>
<tr>
<td>Common Metrics</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Audit Log</td><td>Enable MariaDB Audit Plugin on OG</td><td>Enable MariaDB Audit Plugin on OG</td><td>with <code>pgaudit</code> extension</td><td>✓</td><td>✓</td></tr>
<tr>
<td>AWS Trusted Advisor</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr>
<td>RDS Event Notifications</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Enhanced Monitoring</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr>
<td>Performance Insight</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr>
<td>RDS Recommendations</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr>
<td>CloudTrail</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr>
<td>CloudWatch Application Insight</td><td>×</td><td>×</td><td>×</td><td>×</td><td>✓</td></tr>
<tr>
<td>Set Log Retention</td><td>set with stored procedures</td><td>×</td><td><code>rds.log_retention_period</code> in PG</td><td>×</td><td>×</td></tr>
</tbody>
</table>
</div><h3 id="heading-common-metrics-and-enhanced-monitoring">Common Metrics and Enhanced Monitoring</h3>
<p>Check the available Cloudwatch Metrics <a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-metrics.html">here</a>.</p>
<p>Notes on Enhanced Monitoring:</p>
<ul>
<li><p>Currently, Physical Devices graphs are not available for Microsoft SQL Server DB instances.</p>
</li>
<li><p>Currently, viewing OS metrics for a Multi-AZ standby replica is not supported for MariaDB or Microsoft SQL Server DB instances.</p>
</li>
</ul>
<h3 id="heading-event-notifications">Event Notifications</h3>
<p>RDS Event Notifications can be attached to the following resource types (a minimal subscription sketch follows the list):</p>
<ul>
<li><p>DB instance</p>
</li>
<li><p>DB snapshot</p>
</li>
<li><p>DB parameter group</p>
</li>
<li><p>DB security group</p>
</li>
<li><p>RDS Proxy</p>
</li>
<li><p>Custom engine version</p>
</li>
</ul>
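<p>Here is a minimal boto3 sketch of such a subscription. The subscription name, topic ARN, and event categories are placeholders, and the SNS topic must already exist:</p>
<pre><code class="lang-python">import boto3

rds = boto3.client("rds")

rds.create_event_subscription(
    SubscriptionName="my-rds-events",
    SnsTopicArn="arn:aws:sns:us-east-1:123456789012:rds-events",
    SourceType="db-instance",
    EventCategories=["failover", "failure", "low storage"],
    Enabled=True,
)
</code></pre>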
<h3 id="heading-rds-recommendations">RDS Recommendations</h3>
<p>Amazon RDS provides automated recommendations for database resources such as DB instances, read replicas, and DB parameter groups. These recommendations offer best practice guidance by analyzing DB instance configuration, usage, and performance data.</p>
<p><a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/accessing-monitoring.html">Read more here.</a></p>
<h3 id="heading-aws-trusted-advisor">AWS Trusted Advisor</h3>
<p>From the Trusted Advisor dashboard, you can review the following cost optimization, security, fault tolerance, and performance improvement checks:</p>
<ul>
<li><p>Amazon RDS Idle DB Instances</p>
</li>
<li><p>Amazon RDS Security Group Access Risk</p>
</li>
<li><p>Amazon RDS Backups</p>
</li>
<li><p>Amazon RDS Multi-AZ</p>
</li>
</ul>
<p><a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MonitoringOverview.html">Read more here.</a></p>
<h3 id="heading-cloudwatch-application-insight">CloudWatch Application Insight</h3>
<p>When you add your applications to Amazon CloudWatch Application Insights, it scans the resources in the applications and recommends and configures metrics and logs on CloudWatch for application components. Example application components include SQL Server backend databases and Microsoft IIS/Web tiers.</p>
<p><a target="_blank" href="https://www.youtube.com/watch?v=9BODRmzpEao">Watch this video for more information.</a></p>
<p><a target="_blank" href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch-application-insights.html">Read more here.</a></p>
<h2 id="heading-others">Others</h2>
<ul>
<li>RDS for Oracle does not yet support RAC (Real Application Clusters)</li>
</ul>
<h1 id="heading-conclusion">Conclusion</h1>
<p>We have examined various features of the relational database engines offered through Amazon RDS, including MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server. These engines each possess unique strengths and offer different capabilities that can be leveraged depending on the specific requirements of your application. With careful consideration of these features, you can select the engine that best fits your needs, thus maximizing performance, cost-efficiency, and overall productivity.</p>
<p>Remember that, as cloud technologies and AWS services continue to evolve, the features and capabilities of these engines could change over time. Hence, it is crucial to keep up-to-date with the latest AWS documentation, updates, and best practice guidelines to ensure the optimal use of the selected engine.</p>
<h1 id="heading-disclaimer">Disclaimer</h1>
<p>This document is intended as a general guide to the features of relational database services within AWS RDS as of the time of writing in July 2023. While every effort has been made to ensure accuracy, the rapidly evolving nature of cloud services means that some information may become outdated or inaccurate over time.</p>
<p>The document does not constitute professional advice, and decisions should not be made solely based on this content. It is recommended that readers refer to the latest official AWS documentation, or consult with an AWS certified professional or the AWS support team for the most accurate and up-to-date information.</p>
<p><strong>Feel free to correct me if I made any mistakes :)</strong></p>
]]></content:encoded></item><item><title><![CDATA[Scrape and Extract  Information from Images with GCP's AutoML Vision]]></title><description><![CDATA[Nowadays, much information is stored in the form of images. That is also true on websites. The problem with that it is more challenging for computers to recognize the information and scrape it. This article will introduce a way to extract pieces of i...]]></description><link>https://blog.alvinend.tech/scrape-and-extract-information-from-images-with-gcps-automl-vision</link><guid isPermaLink="true">https://blog.alvinend.tech/scrape-and-extract-information-from-images-with-gcps-automl-vision</guid><category><![CDATA[Python 3]]></category><category><![CDATA[Scraping]]></category><category><![CDATA[Machine Learning]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Sat, 24 Dec 2022 07:16:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1671866029114/574121ac-dfd3-4805-9927-007c70422933.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Nowadays, much information is stored in the form of images. That is also true on websites. The problem is that it is more challenging for computers to recognize the information and scrape it. This article will introduce a way to extract pieces of information from images by utilizing one of Google Cloud Platform's services, the AutoML Vision.</p>
<h1 id="heading-about-web-scraping">About Web Scraping</h1>
<p>Web scraping has developed considerably in recent years. We even have tools that require no coding to scrape (ParseHub, Octoparse). While they are easy to use, they lack flexibility.</p>
<p>Building our own scraper from zero allows us to use it however we want. In Python, scraping can be done using libraries (Requests, BeautifulSoup) and frameworks like Scrapy and Selenium.</p>
<p>In this experiment, we will use BeautifulSoup to parse HTML documents to find the part we want. GCP's AutoML will also be used to extract text from images.</p>
<h1 id="heading-overview-process">Overview Process</h1>
<p>Before we go hands-on with implementation, I want to explain how it will work from accessing the web page until we get the pieces of information that we want.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671862283030/fc4e1698-f61a-44e7-b8a6-11524c3d3f4b.jpeg" alt="Illustration of Overview Process." class="image--center mx-auto" /></p>
<p>First, we need to access a web page containing the images we want to extract from. In this article, we will use my own blog page, both for privacy reasons and because I want to avoid getting in trouble for scraping another website and publishing an article about it. The goal is to extract the titles from the cover images of the articles I post.</p>
<p>After we get the HTML document of the web page, we extract the images by finding the "img" tags. We should have a list of "img" tags before moving to the next step.</p>
<p>From that "img" tags list we download images by extracting URL from "src" attribute. Download, as binary and store it as variable.</p>
<p>Next, we send a request containing the binary of each image to GCP's AutoML Vision. There is a free tier, so don't worry about spending money on this experiment.</p>
<p>The response from AutoML Vision will be a string, but it is far from pretty. We need to make those strings readable and usable for the final part.</p>
<h1 id="heading-implement">Implement</h1>
<p>Now that you have a high-level picture of what we will build, let us start coding!</p>
<p>First things first, we need to set up the environment. I use <code>venv</code> to keep my computer clean, but it is up to you; there are many ways to do this that I will not explain. Make sure to install the required packages (<code>requests</code>, <code>beautifulsoup4</code>, and <code>google-cloud-vision</code>). Below are the imports we need.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> io <span class="hljs-keyword">import</span> BytesIO
<span class="hljs-keyword">from</span> bs4 <span class="hljs-keyword">import</span> BeautifulSoup <span class="hljs-keyword">as</span> bs
<span class="hljs-keyword">from</span> google.cloud <span class="hljs-keyword">import</span> vision
<span class="hljs-keyword">import</span> urllib

<span class="hljs-keyword">import</span> re
<span class="hljs-keyword">import</span> requests
</code></pre>
<p>Our first method gets an HTML document from a URL (my blog page) and extracts the "img" tags from it. After that, it takes the links from the "src" attributes, downloads them, and stores them in a list. The return value is a list of image files as binaries.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_image_files_from_link</span>(<span class="hljs-params">url</span>):</span>
    r = requests.get(url)
    soup = bs(r.content)
    image_tags = soup.find_all(
        <span class="hljs-string">'img'</span>, {<span class="hljs-string">'src'</span>: re.compile(<span class="hljs-string">'(.*)(jpg|png|gif|JPG|PNG|GIF)'</span>)})
    image_srcs = list(map(<span class="hljs-keyword">lambda</span> x: x.attrs.get(<span class="hljs-string">'src'</span>), image_tags))

    image_files = []
    <span class="hljs-keyword">for</span> src <span class="hljs-keyword">in</span> image_srcs:
        <span class="hljs-keyword">try</span>:
            src = src.split(<span class="hljs-string">'='</span>)[<span class="hljs-number">1</span>]
            src = urllib.parse.unquote(src)
            response = requests.get(src)
            image_files.append(BytesIO(response.content))
        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            <span class="hljs-keyword">continue</span>

    <span class="hljs-keyword">return</span> image_files
</code></pre>
<p>Using the first method's return value, we create a method that loops over the images, detects (as in extracts) the text in each one, and prints the results. We will implement the "detect" and "parse_text" methods in a moment.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_text_from_image</span>(<span class="hljs-params">url</span>):</span>
    image_files = get_image_files_from_link(url)
    titles = []
    <span class="hljs-keyword">for</span> image_file <span class="hljs-keyword">in</span> image_files:
        title = detect(image_file)
        title = parse_text(title)
        <span class="hljs-keyword">if</span> title != <span class="hljs-string">''</span>:
            titles.append(title)

    print(titles)
</code></pre>
<p>For the "detect" method, we initialize GCP's client, build a request, and send it to GCP. The response will be an object with metadata on how it was processed, but the <code>text_annotations</code> content is essential. It has extracted text from an image.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">detect</span>(<span class="hljs-params">image_file</span>):</span>
    <span class="hljs-keyword">try</span>:
        client = vision.ImageAnnotatorClient()

        content = image_file.read()
        image = vision.Image(content=content)

        response = client.document_text_detection(image=image)
        labels = response.text_annotations

        <span class="hljs-comment"># Get longest text in labels.description</span>
        text = <span class="hljs-string">''</span>
        <span class="hljs-keyword">for</span> label <span class="hljs-keyword">in</span> labels:
            <span class="hljs-keyword">if</span> len(label.description) &gt; len(text):
                text = label.description

        <span class="hljs-keyword">return</span> text
    <span class="hljs-keyword">except</span> Exception:
        <span class="hljs-keyword">return</span> <span class="hljs-string">''</span>
</code></pre>
<p>If we return it as is, we get strings full of noise: stray spaces, line breaks, and words we don't want. Here is an example.</p>
<pre><code class="lang-bash">[<span class="hljs-string">'aend'</span>, <span class="hljs-string">'aend'</span>, <span class="hljs-string">'My First Talk\nas an Engineer\nAlvin Endratno\nFull Stack Engineer\nZel'</span>, <span class="hljs-string">''</span>, <span class="hljs-string">'Many Ways to\nDeploy Docker in\nAWS\nAlvin Endratno\nFull Stack Engineer\nZel'</span>, <span class="hljs-string">''</span>, <span class="hljs-string">'@alvinend\nUsing SQL to Query Data with\nDelta Lake\nAPACHE\nSpark A\nDELTA LAKE'</span>, <span class="hljs-string">''</span>, <span class="hljs-string">''</span>, <span class="hljs-string">'@alvinend\nSetup Jupyter in EC2 and\nApache Spark with Delta Lake\nconnection to S3\nAPACHE\nADELTA LAKE'</span>, <span class="hljs-string">"@alvinend\nConfigure Lambda's Provisioned\nConcurrency in Multi Environments\nAWS SAM\naws"</span>, <span class="hljs-string">'@alvinend\nComparing React JS and\nSolid JS in Syntax'</span>, <span class="hljs-string">'@alvinend\nServerless Cron Jobs with\nAWS Batch\naws'</span>, <span class="hljs-string">'@alvinend\nBuilding AWS EC2 Manager with\nLambda and Slack\naws\n●入米'</span>, <span class="hljs-string">'@alvinend\nAWS S3 as a Database\nwith S3 Select\naws\n2'</span>, <span class="hljs-string">'@alvinend\nBuilding Serverless Task Manager:\nIntroduction &amp; Design (Part 1)\nXd'</span>, <span class="hljs-string">'@alvinend\nKnowing How Javascript\nExecute Its Codes\nUS'</span>, <span class="hljs-string">'@alvinend\nBuilding CLI Tic Tac Toe Game\nwith Node JS\nUS\nXXO\nOox\nXXX'</span>]
</code></pre>
<p>To avoid that, we need to parse it. Usually this is done with a regular expression (regex) or some machine learning, but in this example we will simply drop every line shorter than five characters, remove the author's name, and trim the result.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">parse_text</span>(<span class="hljs-params">text</span>):</span>
    parsed_text = <span class="hljs-string">''</span>
    <span class="hljs-keyword">for</span> c <span class="hljs-keyword">in</span> text.split(<span class="hljs-string">'\n'</span>):
        <span class="hljs-keyword">if</span> len(c) &lt; <span class="hljs-number">5</span>:
            <span class="hljs-keyword">continue</span>

        parsed_text += c + <span class="hljs-string">' '</span>

    parsed_text = parsed_text.replace(<span class="hljs-string">'@alvinend'</span>, <span class="hljs-string">''</span>)
    parsed_text = parsed_text.replace(<span class="hljs-string">'Alvin Endratno'</span>, <span class="hljs-string">''</span>)
    parsed_text = parsed_text.replace(<span class="hljs-string">'Full Stack Engineer'</span>, <span class="hljs-string">''</span>)
    parsed_text = parsed_text.strip()

    <span class="hljs-keyword">return</span> parsed_text
</code></pre>
<p>The last thing to do is run it: <code>python3 main.py</code></p>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
    res = get_text_from_image(
        <span class="hljs-string">'https://blog.alvinend.tech/'</span>)
</code></pre>
<p>And the result is below.</p>
<pre><code class="lang-bash">[<span class="hljs-string">'My First Talk as an Engineer'</span>, <span class="hljs-string">'Many Ways to Deploy Docker in'</span>, <span class="hljs-string">'Using SQL to Query Data with Delta Lake APACHE Spark A DELTA LAKE'</span>, <span class="hljs-string">'Setup Jupyter in EC2 and Apache Spark with Delta Lake connection to S3 APACHE ADELTA LAKE'</span>, <span class="hljs-string">"Configure Lambda's Provisioned Concurrency in Multi Environments AWS SAM"</span>, <span class="hljs-string">'Comparing React JS and Solid JS in Syntax'</span>, <span class="hljs-string">'Serverless Cron Jobs with AWS Batch'</span>, <span class="hljs-string">'Building AWS EC2 Manager with Lambda and Slack'</span>, <span class="hljs-string">'AWS S3 as a Database with S3 Select'</span>, <span class="hljs-string">'Building Serverless Task Manager: Introduction &amp; Design (Part 1)'</span>, <span class="hljs-string">'Knowing How Javascript Execute Its Codes'</span>, <span class="hljs-string">'Building CLI Tic Tac Toe Game with Node JS'</span>]
</code></pre>
<p>It could be better, but it shows how to extract information from images with GCP's Cloud Vision API.</p>
<h1 id="heading-closing">Closing</h1>
<p>In real-life projects, we usually extract information from images when it is not available as text on the website. For example, a company site may show its phone number only in an image; we can recover it by running a phone-number regex over the raw string the Vision API returns.</p>
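<p>As a minimal sketch of that idea (the <code>raw_text</code> value and the pattern here are illustrative assumptions; real phone-number formats vary by country):</p>
<pre><code class="lang-python">import re

# Stand-in for the raw string the Vision API returned (assumption)
raw_text = 'Contact us\nPT Example\nTel: 021-555-0123\nJakarta'

# A deliberately loose pattern for digit groups separated by dashes or spaces;
# tighten it to the phone formats you actually expect
match = re.search(r'(\+?\d{2,4}[-\s]?\d{3,4}[-\s]?\d{3,4})', raw_text)
if match:
    print(match.group(1))  # prints 021-555-0123
</code></pre>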
<p>Also, it is crucial to make sure it is legal before scraping websites. Happy scraping!</p>
]]></content:encoded></item><item><title><![CDATA[My First Talk as an Engineer]]></title><description><![CDATA[This year's September, I was asked by my manager at Baseconnect to give a lightning talk to a startup community made by AWS called AWS Startup Community. I was pleasantly surprised even though I had no speech experience as an Engineer. Without a seco...]]></description><link>https://blog.alvinend.tech/my-first-talk-as-an-engineer</link><guid isPermaLink="true">https://blog.alvinend.tech/my-first-talk-as-an-engineer</guid><category><![CDATA[presentations]]></category><category><![CDATA[Public Speaking]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Tue, 08 Nov 2022 14:53:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1666161486383/FqkZj7yOa.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In September this year, I was asked by my manager at Baseconnect to give a lightning talk to a startup community run by AWS called the AWS Startup Community. I was pleasantly surprised, even though I had no speaking experience as an engineer. Without a second thought, I agreed to give my lightning talk.</p>
<h2 id="heading-preparation">Preparation</h2>
<p>After agreeing to speak, I was given some time to prepare. I began building the presentation slides. It was a tech event, so I talked about my project, where I was responsible for architecting servers with AWS. We used a rather unpopular AWS service and managed to create a great outcome with it, which made me think people would like to hear about it.</p>
<p>Done with the slides, I asked my colleagues to review them. My manager, an SRE, and I gathered in a meeting where they heard my presentation. I got a lot of feedback on my delivery and on errors in the technical content. I'm glad to have colleagues who have my back.</p>
<p>Finally, we asked the design team to decorate my slides. It was mind-blowing. Here is the first page. It says, "Implementation of Processing Large Amounts of Data using Serverless".</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1666160785458/U4HZWzw65.png" alt="image.png" /></p>
<p>I mainly talked about leveraging AWS serverless services (Lambda, API Gateway, and AWS Batch) to build highly available, low-cost services. It was similar to my <a target="_blank" href="https://blog.alvinend.tech/serverless-cron-jobs-with-aws-batch">blog post</a> published this May.</p>
<h2 id="heading-event-day">Event Day</h2>
<p>The event was held in Osaka, Japan. At the time, I was in Indonesia, so all the speakers were onsite except me; I presented over a video call.</p>
<p>I woke up in the morning, brushed my teeth, ate breakfast, and rehearsed many times, from morning until my turn came in the late evening at 9 PM. I was nervous.</p>
<p>Not only was this my first presentation as an engineer, but the other speakers were amazing people: a Head of Product Development, a CTO, and a Co-Founder all spoke at the event. As the cherry on top, when my turn came there was a technical problem, and my slot was pushed to last. Again, I was nervous.</p>
<p>However, a few minutes before my turn, I calmed down. I don't know why, but my heartbeat slowed, my mind cleared, and I could speak normally. The presentation went well.</p>
<p>Several days later, I received an email with comments about my presentation. It was nice to know that people took an interest in what I said.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1666161081086/sKzrgJPWi.png" alt="image.png" /></p>
<h2 id="heading-closing">Closing</h2>
<p>As an introverted engineer, I know how hard it is to talk in front of many people. But opportunities like this don't come around forever, and I want to seize every one to the best of my ability.</p>
]]></content:encoded></item><item><title><![CDATA[Many Ways to Deploy Docker in AWS]]></title><description><![CDATA[Docker has been around for more than a decade. Since then,  many applications have been used it. It also has a significant role in the recent boom of microservices. Therefore it is essential to understand how docker builds one and deploys it into the...]]></description><link>https://blog.alvinend.tech/many-ways-to-deploy-docker-in-aws</link><guid isPermaLink="true">https://blog.alvinend.tech/many-ways-to-deploy-docker-in-aws</guid><category><![CDATA[AWS]]></category><category><![CDATA[Web Development]]></category><category><![CDATA[Docker]]></category><category><![CDATA[containers]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Mon, 17 Oct 2022 18:42:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1666032114825/P9EoHDh1W.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Docker has been around for more than a decade, and since then many applications have used it. It also played a significant role in the recent boom of microservices. It is therefore essential to understand how to build a Docker image and deploy it to the cloud. This time we will look at how to deploy Docker on AWS, the largest cloud provider right now.</p>
<p>When you search Google for how to deploy Docker on AWS, chances are you will find AWS ECS (Elastic Container Service). In my opinion, it is a good service if you don't mind the complexity. However, if you are just starting out, another option may fit you better.</p>
<h1 id="heading-simple-compute-service">Simple compute service</h1>
<p>While some people may say that deploying Docker on a plain server is outdated, it is the simplest way to get your application up and running. If you want to learn more about AWS and its concepts, I suggest using EC2 (Elastic Compute Cloud). But if your only interest is getting stuff running, there is a service called AWS Lightsail.</p>
<h1 id="heading-platform-as-a-service-with-aws-elastic-beanstalk">Platform as a service with AWS Elastic Beanstalk</h1>
<p>AWS Elastic Beanstalk helps developers focus on applications by offering a way to quickly deploy and manage applications in the AWS Cloud without learning about the infrastructure that runs those applications. Elastic Beanstalk reduces management complexity without restricting choice or control. You upload your application, and Elastic Beanstalk automatically handles the details of capacity provisioning, load balancing, scaling, and application health monitoring.</p>
<p>For deploying Docker, AWS Elastic Beanstalk offers two platform branches: one called <code>Docker running on 64bit Amazon Linux 2</code> and another called <code>ECS running on 64bit Amazon Linux 2</code>. If you want to get your application started quickly but with some AWS infrastructure behind it, use the former; it will create the EC2 instances for you. Choose the latter if you want to build on ECS so it can scale.</p>
<h1 id="heading-the-famous-elastic-container-service">The Famous Elastic Container Service</h1>
<p>ECS is renowned for good reasons. For everything it does (managing and orchestrating many containers, health checks, and multi-Availability Zone clusters), ECS is straightforward and easy to understand, in my opinion. Not to mention it works great with the serverless container engine Fargate. That is, if you know AWS.</p>
<p>To build an ECS cluster, you need to understand how to create EC2 instances and set up networking in a VPC so that ECS can communicate with its agent. You also don't want a security issue, so you set up a firewall called a security group. Because you are dealing with a cluster, you will need a load balancer. Beyond that, there is CodePipeline to build CI/CD and the Elastic Container Registry (ECR) to store your image before ECS can use it.</p>
<p>ECS is an excellent service if you know what you are doing.</p>
<h1 id="heading-kubernetes-with-elastic-kubernetes-service">Kubernetes with Elastic Kubernetes Service</h1>
<p>I can only recommend EKS if you are comfortable with both AWS and Kubernetes. While I said that ECS is a simple service if you know AWS, EKS is, in my opinion, complex even if you know AWS. Learning Kubernetes alone may take some time. The merit I can think of is that Kubernetes is open source and offered by many cloud providers. That gives us the flexibility to switch, and because it is open source and widely known, it has broader appeal for hiring.</p>
<h1 id="heading-closing">Closing</h1>
<p>There are many ways to deploy Docker, and I think that is because needs vary so widely. For instance, someone may want to deploy without much thinking, while others need more control over the infrastructure. What I am trying to say is that there are no wrong choices or services.</p>
]]></content:encoded></item><item><title><![CDATA[Using SQL to Query Data with Delta Lake]]></title><description><![CDATA[Last time, we set up Jupyter in EC2 and Apache Spark with Delta Lake connection to S3. We will import data from the dataset and query it with SQL this time.
About Dataset
For this experiment, we will use a dataset about courses, students, and their i...]]></description><link>https://blog.alvinend.tech/using-sql-to-query-data-with-delta-lake</link><guid isPermaLink="true">https://blog.alvinend.tech/using-sql-to-query-data-with-delta-lake</guid><category><![CDATA[big data]]></category><category><![CDATA[spark]]></category><category><![CDATA[lakehouse]]></category><category><![CDATA[Data-lake]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Thu, 22 Sep 2022 17:27:04 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1663867336560/DMfuvFdA1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Last time, we set up Jupyter in EC2 and Apache Spark with Delta Lake connection to S3. We will import data from the dataset and query it with SQL this time.</p>
<h1 id="heading-about-dataset">About Dataset</h1>
<p>For this experiment, we will use a dataset about courses, students, and their interactions with Virtual Learning Environment (VLE) for seven selected courses (called modules). You can get it <a target="_blank" href="https://data.world/uci/open-university-learning-analytics-dataset">here</a>.</p>
<p>The dataset is large; one of its files is a roughly 450 MB CSV. We will use that to see how fast Delta Lake can insert and query the data.</p>
<h1 id="heading-import-dataset">Import Dataset</h1>
<p>After downloading the dataset and uploading it to the server through Jupyter, we need to read it with Apache Spark, write it as Delta's Parquet, and store it in S3.</p>
<p>Below is a method to do that.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">import_csv</span>(<span class="hljs-params">filename</span>):</span>
    <span class="hljs-comment"># Read CSV from local directory</span>
    df = spark \
        .read \
        .option(<span class="hljs-string">"header"</span>,<span class="hljs-string">"true"</span>) \
        .csv(<span class="hljs-string">f"./dataset/<span class="hljs-subst">{filename}</span>.csv"</span>)

    <span class="hljs-comment"># Write to S3</span>
    df.write\
        .mode(<span class="hljs-string">"overwrite"</span>)\
        .format(<span class="hljs-string">"delta"</span>)\
        .save(<span class="hljs-string">f"s3://s3-bucket-name/table/<span class="hljs-subst">{filename}</span>/"</span>)
</code></pre>
<p>After that, call it with the name of a CSV file as the parameter. Below is an example of importing vle.csv as Delta Parquet in S3.</p>
<pre><code class="lang-python">import_csv(<span class="hljs-string">"vle"</span>)
</code></pre>
<p>After it succeeds, we can confirm it by going to S3.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1663865304604/qRVRCnfId.png" alt="image.png" /></p>
<p>Repeat this for the remaining CSVs so that everything is imported.</p>
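<p>As a small convenience, we can loop over the file names instead of calling it by hand; here is a sketch, assuming the file names from the summary below:</p>
<pre><code class="lang-python"># File names taken from the dataset summary below
csv_names = [
    "assessments", "courses", "studentAssessment",
    "studentInfo", "studentRegistration", "studentVle", "vle",
]

for name in csv_names:
    import_csv(name)
</code></pre>
<p>Here is a summary of the file names and execution times.</p>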
<div class="hn-table">
<table>
<thead>
<tr>
<td>File Name</td><td>Size</td><td>Execution Time</td></tr>
</thead>
<tbody>
<tr>
<td>assessments.csv</td><td>8 kB</td><td>CPU times: user 8.07 ms, sys: 1.11 ms, total: 9.19 ms Wall time: 3.59 s</td></tr>
<tr>
<td>courses.csv</td><td>526 B</td><td>CPU times: user 18.2 ms, sys: 5.43 ms, total: 23.6 ms Wall time: 4.62 s</td></tr>
<tr>
<td>studentAssessment.csv</td><td>5.69 MB</td><td>CPU times: user 14.4 ms, sys: 5.06 ms, total: 19.5 ms Wall time: 7.12 s</td></tr>
<tr>
<td>studentInfo.csv</td><td>3.46 MB</td><td>CPU times: user 10.7 ms, sys: 9.68 ms, total: 20.4 ms Wall time: 5.93 s</td></tr>
<tr>
<td>studentRegistration.csv</td><td>1.13 MB</td><td>CPU times: user 11.9 ms, sys: 0 ns, total: 11.9 ms Wall time: 4.96 s</td></tr>
<tr>
<td>studentVle.csv</td><td>454 MB</td><td>CPU times: user 15 ms, sys: 5.31 ms, total: 20.3 ms Wall time: 27.5 s</td></tr>
<tr>
<td>vle.csv</td><td>271 kB</td><td>CPU times: user 11.9 ms, sys: 0 ns, total: 11.9 ms Wall time: 3.63 s</td></tr>
</tbody>
</table>
</div><h1 id="heading-querying-data">Querying Data</h1>
<p>After we import all our datasets into S3, we can start querying them with SQL. Before that, let me declare one helper method, which I will explain right after. </p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">table_dir</span>(<span class="hljs-params">tablename, with_as=True</span>):</span>
    <span class="hljs-keyword">if</span> with_as:
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"delta.`s3://s3-bucket-name/table/<span class="hljs-subst">{tablename}</span>/` AS <span class="hljs-subst">{tablename}</span>"</span>
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"delta.`s3://s3-bucket-name/table/<span class="hljs-subst">{tablename}</span>/`"</span>
</code></pre>
<p>When querying with Delta Lake, we need to specify the folder that holds the Delta Parquet files. Rather than writing the full path every time, I made a helper method for it. </p>
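<p>For example, the helper produces exactly the table reference Spark SQL expects:</p>
<pre><code class="lang-python">print(table_dir("vle"))
# delta.`s3://s3-bucket-name/table/vle/` AS vle

print(table_dir("vle", with_as=False))
# delta.`s3://s3-bucket-name/table/vle/`
</code></pre>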
<h2 id="heading-select">SELECT</h2>
<p>Start with an easy one. Let us write a SELECT query.</p>
<pre><code class="lang-python">spark.sql(<span class="hljs-string">f"""
    SELECT code_module, count(*) as module_count
    FROM <span class="hljs-subst">{table_dir(<span class="hljs-string">'studentVle'</span>)}</span>
    GROUP BY code_module
"""</span>).show()
</code></pre>
<p>Here we query a 400MB table and group it. It only took 1.74 seconds! </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1663866157710/gjC630spG.png" alt="image.png" /></p>
<h2 id="heading-join">JOIN</h2>
<p>Let's take it a little further. Try joining three tables.</p>
<pre><code class="lang-python">spark.sql(<span class="hljs-string">f"""
    SELECT activity_type, SUM(sum_click)
    FROM <span class="hljs-subst">{table_dir(<span class="hljs-string">'studentVle'</span>)}</span> 
    INNER JOIN <span class="hljs-subst">{table_dir(<span class="hljs-string">'vle'</span>)}</span>
    ON studentVle.id_site = vle.id_site
    INNER JOIN <span class="hljs-subst">{table_dir(<span class="hljs-string">'studentInfo'</span>)}</span>
    ON studentVle.id_student = studentInfo.id_student
    WHERE studentInfo.final_result = "Pass"
    GROUP BY vle.activity_type
"""</span>).show()
</code></pre>
<p>Here we have two joins, one WHERE clause, and one GROUP BY. Execution time is a little longer at 6.88 seconds, but it gets the job done.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1663866337171/DHFT0mLEi.png" alt="image.png" /></p>
<h2 id="heading-delete">DELETE</h2>
<p>Enough with reads. How about writes? Okay, let's try DELETE first.</p>
<pre><code class="lang-python">spark.sql(<span class="hljs-string">f"""
    DELETE FROM
        <span class="hljs-subst">{table_dir(<span class="hljs-string">'assessments'</span>)}</span>
    WHERE
        code_module = "AAA"
        AND code_presentation = "2013J"
        AND id_assessment = 1752 
"""</span>).show()
</code></pre>
<p>And yes, we can delete stuff.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1663866568164/GZogS_eOW.png" alt="image.png" /></p>
<h2 id="heading-insert">INSERT</h2>
<p>Insert it back.</p>
<pre><code class="lang-python">spark.sql(<span class="hljs-string">f"""
    INSERT INTO
        <span class="hljs-subst">{table_dir(<span class="hljs-string">'assessments'</span>, <span class="hljs-literal">False</span>)}</span>
    VALUES
        ("AAA","2013J","1752","TMA","19","10")
"""</span>).show()
</code></pre>
<p>We also can insert stuff.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1663866693133/VuLxCrHN-.png" alt="image.png" /></p>
<h1 id="heading-closing">Closing</h1>
<p>With this experiment, we know we can use Delta Lake to import and query data with our beloved SQL. It is worth noting that everything we did today can also be done with plain Apache Spark. Next time, we will run an experiment covering the Delta Lake features that make it special. Thank you for reading!</p>
]]></content:encoded></item><item><title><![CDATA[Setup Jupyter in EC2 and Apache Spark with Delta Lake connection to S3]]></title><description><![CDATA[Delta lake has been booming for the last two years after Databricks announce it as "New Generation Data Lakehouse," but behind the boom, there are not enough examples and posts of it. I want to change it by adding one article about it. This time we w...]]></description><link>https://blog.alvinend.tech/setup-jupyter-in-ec2-and-apache-spark-with-delta-lake-connection-to-s3</link><guid isPermaLink="true">https://blog.alvinend.tech/setup-jupyter-in-ec2-and-apache-spark-with-delta-lake-connection-to-s3</guid><category><![CDATA[lakehouse]]></category><category><![CDATA[Data-lake]]></category><category><![CDATA[#datawarehouse]]></category><category><![CDATA[big data]]></category><category><![CDATA[spark]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Fri, 09 Sep 2022 07:29:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1662708493625/8zGd5FlTM.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Delta Lake has been booming for the last two years, ever since Databricks announced it as the "new generation data lakehouse," but behind the boom there are not enough examples and posts about it. I want to change that by adding one more article. This time we will build an EC2 server running Apache Spark with Delta Lake, accessible through Jupyter from your local computer.</p>
<h1 id="heading-launching-ec2">Launching EC2</h1>
<p>We will not do a full EC2 launch tutorial, so I am just going to write the basic steps and hope you can manage! (Tip: there are a lot of tutorials about EC2; Google it.)</p>
<ol>
<li>Log in to your AWS Console</li>
<li>Launch EC2 with these settings</li>
</ol>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Name</td><td>Value</td></tr>
</thead>
<tbody>
<tr>
<td>OS</td><td>Ubuntu 22.04 64bit</td></tr>
<tr>
<td>Instance type</td><td>t3a.xlarge</td></tr>
<tr>
<td>Key Pair</td><td>(Fill it with your key)</td></tr>
<tr>
<td>Security Group</td><td>Open All Traffic to Public</td></tr>
<tr>
<td>Storage</td><td>gp2 80GB</td></tr>
</tbody>
</table>
</div><p>That is all! Try to SSH into the EC2 instance before continuing to the next step.</p>
<h1 id="heading-install-required-tools">Install Required Tools</h1>
<p>Next, let's install Python, pip, PySpark, and Java.</p>
<h3 id="heading-install-python">Install Python</h3>
<p>Ubuntu 22.04 LTS ships with the latest toolchains for Python, Rust, Ruby, Go, PHP and Perl, and users get first access to the latest updates for essential libraries and packages. Just in case, let's upgrade packages.</p>
<pre><code class="lang-bash">sudo apt update
sudo apt -y upgrade
</code></pre>
<p>And check the Python version.</p>
<pre><code>python3 -V
</code></pre><p>When I wrote this article, the Python version was <code>3.10.4</code>.</p>
<h3 id="heading-install-pip">Install PIP</h3>
<p>Although Python comes built into Ubuntu, it does not ship with the package manager <code>pip</code>. So let us install it! </p>
<pre><code class="lang-bash">sudo apt install -y python3-pip
</code></pre>
<p>As with Python, it is always good to check that it installed correctly. Run the command below to check its version.</p>
<pre><code class="lang-bash">pip -V
</code></pre>
<p>Here is my output.</p>
<pre><code class="lang-bash">pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)
</code></pre>
<h3 id="heading-install-apache-spark">Install Apache Spark</h3>
<p>Running Delta Lake requires Apache Spark, so let's install it!</p>
<pre><code class="lang-bash">pip install pyspark==3.3
</code></pre>
<h3 id="heading-install-java">Install Java</h3>
<p>Execute the following commands to install the JRE and JDK from OpenJDK 11.</p>
<pre><code class="lang-bash">sudo apt install default-jre
sudo apt install default-jdk
</code></pre>
<p>Again, it is good to verify the installation. Run the command below to check the version.</p>
<pre><code class="lang-bash">java -version
</code></pre>
<p>Here is my output.</p>
<pre><code>openjdk version <span class="hljs-string">"11.0.16"</span> <span class="hljs-number">2022</span><span class="hljs-number">-07</span><span class="hljs-number">-19</span>
OpenJDK Runtime Environment (build <span class="hljs-number">11.0</span><span class="hljs-number">.16</span>+<span class="hljs-number">8</span>-post-Ubuntu<span class="hljs-number">-0</span>ubuntu122<span class="hljs-number">.04</span>)
OpenJDK <span class="hljs-number">64</span>-Bit Server VM (build <span class="hljs-number">11.0</span><span class="hljs-number">.16</span>+<span class="hljs-number">8</span>-post-Ubuntu<span class="hljs-number">-0</span>ubuntu122<span class="hljs-number">.04</span>, mixed mode, sharing)
</code></pre><h1 id="heading-setting-up-jupyter">Setting up Jupyter</h1>
<p>Running Delta Lake from the CLI is cool, but it isn't enjoyable. So let us install Jupyter on Ubuntu and make it accessible from our local computer.</p>
<h3 id="heading-install-jupyter">Install Jupyter</h3>
<p>Believe it or not, to run jupyter, we need to install jupyter. Here is a command to install it.</p>
<pre><code class="lang-bash">pip install notebook
</code></pre>
<p>Easy, right? But in my case, I had a problem. Although it installed successfully, I couldn't run the <code>jupyter</code> command. Looking back at the install log, I found some warnings.</p>
<pre><code class="lang-bash">WARNING: The script jupyter-execute is installed <span class="hljs-keyword">in</span> <span class="hljs-string">'/home/ubuntu/.local/bin'</span>, <span class="hljs-built_in">which</span> is not on PATH.
  Consider adding this directory to PATH or, <span class="hljs-keyword">if</span> you prefer to suppress this warning, use --no-warn-script-location.
</code></pre>
<p>So I suspected it would work once I added that directory to PATH. To append the path to <code>~/.bashrc</code>, run the command below.</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">"export PATH=<span class="hljs-variable">$PATH</span>:<span class="hljs-variable">$HOME</span>/.local/bin"</span>  | tee -a ~/.bashrc
</code></pre>
<p>After reloading the shell configuration (for example with <code>source ~/.bashrc</code>), I successfully ran <code>jupyter</code> in my terminal.</p>
<h3 id="heading-expose-jupyter-to-public">Expose Jupyter to Public</h3>
<p>The next step is to make jupyter accessible from our local computer. First, generate jupyter's config file by executing the command below.</p>
<pre><code class="lang-bash">jupyter notebook --generate-config
</code></pre>
<p>It should output your config directory. Mine was <code>/home/ubuntu/.jupyter/jupyter_notebook_config.py</code>. Open that file.</p>
<pre><code class="lang-bash">vi /home/ubuntu/.jupyter/jupyter_notebook_config.py
</code></pre>
<p>Find, un-comment, and edit these options to expose your Jupyter.</p>
<p>The first one is <code>c.NotebookApp.ip</code>. It specifies the IP addresses the notebook server will listen on, so that we can access it via the EC2 public IP address.</p>
<pre><code class="lang-python">c.NotebookApp.ip = <span class="hljs-string">'*'</span>
</code></pre>
<p>The second is <code>c.NotebookApp.open_browser</code>, which we set to <code>False</code>. We don't want Ubuntu to try opening a browser when we start the server.</p>
<pre><code>c.NotebookApp.open_browser = False
</code></pre><p>And we are all set. The last thing to do is to start the notebook server.</p>
<pre><code class="lang-bash">jupyter notebook
</code></pre>
<p>Access it at <code>http://&lt;EC2 public IP&gt;:8888</code> (Jupyter's default port), and we should get a notebook similar to this.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1662690152111/pi63UL-FY.png" alt="image.png" /></p>
<h1 id="heading-run-spark-with-delta-lake">Run Spark with Delta Lake</h1>
<p>In the final step, let us start a Spark session and confirm that there are no errors. To do that, create a new notebook in the Jupyter console and run the code below. (It is set up to connect to S3.)</p>
<p>Note: I tried many ways to make this work, including the latest versions of Delta Lake and Hadoop (<code>3.3.x</code>), but it throws a Java error, and I could not find a way to fix it. If you get the latest versions working, please let me know in the comments. I referenced the code below from the <a target="_blank" href="https://towardsdatascience.com/getting-started-with-delta-lake-spark-in-aws-the-easy-way-9215f2970c58">Getting started with Delta Lake &amp; Spark in AWS — The Easy Way</a> post by Irfan Elahi (thank you!).</p>
<pre><code><span class="hljs-keyword">from</span> pyspark.sql <span class="hljs-keyword">import</span> SparkSession
spark_jars_packages = <span class="hljs-string">"com.amazonaws:aws-java-sdk:1.11.563,org.apache.hadoop:hadoop-aws:3.2.2,io.delta:delta-core_2.12:1.2.1"</span>
spark = (
    SparkSession.builder.master(<span class="hljs-string">"local[*]"</span>)
    .appName(<span class="hljs-string">"PySparkLocal"</span>)
    .config(<span class="hljs-string">"spark.sql.extensions"</span>, <span class="hljs-string">"io.delta.sql.DeltaSparkSessionExtension"</span>)
    .config(<span class="hljs-string">"spark.sql.catalog.spark_catalog"</span>, <span class="hljs-string">"org.apache.spark.sql.delta.catalog.DeltaCatalog"</span>)
    .config(<span class="hljs-string">"spark.hadoop.fs.s3.impl"</span>, <span class="hljs-string">"org.apache.hadoop.fs.s3a.S3AFileSystem"</span>)
    .config(<span class="hljs-string">"spark.hadoop.fs.AbstractFileSystem.s3.impl"</span>, <span class="hljs-string">"org.apache.hadoop.fs.s3a.S3AFileSystem"</span>)
    .config(<span class="hljs-string">"spark.delta.logStore.class"</span>, <span class="hljs-string">"org.apache.spark.sql.delta.storage.S3SingleDriverLogStore"</span>)
    .config(<span class="hljs-string">"spark.hadoop.fs.s3a.connection.timeout"</span>, <span class="hljs-string">"3600000"</span>)
    .config(<span class="hljs-string">"spark.hadoop.fs.s3a.connection.maximum"</span>, <span class="hljs-string">"1000"</span>)
    .config(<span class="hljs-string">"spark.hadoop.fs.s3a.threads.max"</span>, <span class="hljs-string">"1000"</span>)
    .config(<span class="hljs-string">"spark.jars.packages"</span>, spark_jars_packages)
    .config(<span class="hljs-string">"spark.sql.sources.partitionOverwriteMode"</span>, <span class="hljs-string">"dynamic"</span>)
    .config(<span class="hljs-string">"spark.databricks.delta.schema.autoMerge.enabled"</span>, <span class="hljs-string">"true"</span>)
    .config(<span class="hljs-string">"spark.hadoop.fs.s3a.endpoint"</span>, <span class="hljs-string">"s3.ap-southeast-2.amazonaws.com"</span>)
    .config(<span class="hljs-string">"spark.hadoop.fs.s3a.aws.credentials.provider"</span>, <span class="hljs-string">"com.amazonaws.auth.DefaultAWSCredentialsProviderChain"</span>)
    .getOrCreate()
)
</code></pre><p>And it should install the necessary packages and if there are no errors, congratulations you set up Apache Spark, Delta Lake, and Jupyter in EC2.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1662691806018/Z2wtE9Ntn.png" alt="image.png" /></p>
<h1 id="heading-closing">Closing</h1>
<p>When companies use Delta Lake for a lakehouse, they usually run it on Databricks, AWS EMR, or other services built for big-data processing rather than on traditional servers; maybe that is why only a few articles or tutorials show how to deploy it on a server like EC2. Next, I will run Delta Lake workloads in this notebook. Hope this helps you. Cheers!</p>
]]></content:encoded></item><item><title><![CDATA[Configure Lambda's Provisioned Concurrency in Multi Environments AWS SAM]]></title><description><![CDATA[Last week, I worked on a project that used Lambda as an API to run an algorithm triggered by multiple services in my company. I implemented it with AWS Serverless Application Model (AWS SAM), wrote the template, and set up CI/CD with no problem, exce...]]></description><link>https://blog.alvinend.tech/configure-lambdas-provisioned-concurrency-in-multi-environments-aws-sam</link><guid isPermaLink="true">https://blog.alvinend.tech/configure-lambdas-provisioned-concurrency-in-multi-environments-aws-sam</guid><category><![CDATA[AWS]]></category><category><![CDATA[aws lambda]]></category><category><![CDATA[serverless]]></category><category><![CDATA[Infrastructure as code]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Thu, 09 Jun 2022 09:01:34 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1654765179044/nKHq717Gm.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Last week, I worked on a project that used Lambda as an API to run an algorithm triggered by multiple services in my company. I implemented it with AWS Serverless Application Model (AWS SAM), wrote the template, and set up CI/CD with no problem, except for one, the first request was slow.</p>
<h1 id="heading-the-problem">The Problem</h1>
<p>A Lambda function instance is created on the first invocation and disappears after a period of inactivity. If the function is invoked after that period, there is a good chance the instance is gone and everything runs again from creation. This is what we call a cold start.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654761386296/bAe7WI0gr.png" alt="image.png" /></p>
<p>Source: https://www.slideshare.net/AmazonWebServices/become-a-serverless-black-belt-optimizing-your-serverless-applications-aws-online-tech-talks/14</p>
<p>If Lambda is invoked again before the function instance is lost, the container is reused, as shown in the following slide, and execution without startup time (a warm start) becomes possible. If Lambda functions are executed frequently, the container is more likely to be reused, and the probability of a cold start decreases.</p>
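<p>To see the effect yourself, you can time two back-to-back invocations. Here is a rough sketch with boto3, where <code>MyFunction</code> is a placeholder name:</p>
<pre><code class="lang-python">import time

import boto3

client = boto3.client("lambda")

# The first call likely pays the cold-start cost; the second should be warm
for i in range(2):
    start = time.time()
    client.invoke(FunctionName="MyFunction", Payload=b"{}")
    print(f"invocation {i + 1}: {time.time() - start:.2f}s")
</code></pre>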
<h1 id="heading-solution">Solution</h1>
<p>My goal is to prevent cold starts so that customers using our services get a good user experience. I could implement an event that fires the Lambda every minute, but in addition to the time-consuming setup, there is no fixed amount of time before a function instance is recycled. Periodic execution can therefore only reduce the probability of a cold start; it cannot guarantee a warm start.</p>
<p>Another way, which guarantees a warm start and can be done with a bit of setup, is <strong>Provisioned Concurrency</strong>.</p>
<h2 id="heading-configuration">Configuration</h2>
<p>In my case, I used an AWS SAM template to implement it. If you prefer to configure it in the console, I recommend reading <a target="_blank" href="https://aws.amazon.com/blogs/aws/new-provisioned-concurrency-for-lambda-functions/">AWS's official blog post</a> on how to do it.</p>
<h3 id="heading-aws-sam-template-example">AWS SAM Template Example</h3>
<p>It is as simple as adding these three lines to <code>template.yaml</code>. I will assume we all know about the SAM template and some basic Infrastructure as Code in AWS.</p>
<pre><code class="lang-yaml"><span class="hljs-string">....</span>

<span class="hljs-attr">Resources:</span>
  <span class="hljs-attr">MyFunction:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::Serverless::Function</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-string">.....</span>
      <span class="hljs-attr">AutoPublishAlias:</span> <span class="hljs-string">live</span>
      <span class="hljs-attr">ProvisionedConcurrencyConfig:</span>
        <span class="hljs-attr">ProvisionedConcurrentExecutions:</span> <span class="hljs-number">1</span>
      <span class="hljs-string">.....</span>
</code></pre>
<p>And if you have multiple environments (QA, test, etc.) and want to apply it only in production to save cost, here is my approach: pass the environment name in as a template parameter at deploy time.</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">Parameters:</span>
    <span class="hljs-attr">Environment:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">String</span>
    <span class="hljs-attr">Description:</span> <span class="hljs-string">'Enter Env Name'</span>

<span class="hljs-attr">Conditions:</span>
  <span class="hljs-attr">ConfigureConcurrency:</span> <span class="hljs-type">!Equals</span> 
    <span class="hljs-bullet">-</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">Environment</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">prd</span>

<span class="hljs-attr">Resources:</span>
  <span class="hljs-attr">MyFunction:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::Serverless::Function</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-string">.....</span>
      <span class="hljs-attr">AutoPublishAlias:</span> <span class="hljs-string">live</span>
      <span class="hljs-attr">ProvisionedConcurrencyConfig:</span>
        <span class="hljs-type">!If</span> 
          <span class="hljs-bullet">-</span> <span class="hljs-string">ConfigureConcurrency</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">ProvisionedConcurrentExecutions:</span> <span class="hljs-number">1</span>
          <span class="hljs-bullet">-</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">"AWS::NoValue"</span>
      <span class="hljs-string">.....</span>
</code></pre>
<p>This creates the Lambda alias in every environment but only sets provisioned concurrency when the <code>Environment</code> parameter equals <code>prd</code>, our short name for production.</p>
<h2 id="heading-note-about-provisioned-concurrency">Note About Provisioned Concurrency</h2>
<ol>
<li>Provisioning concurrency takes time. We can set a <code>DeploymentPreference</code> (<code>AllAtOnce</code>, <code>Canary10Percent5Minutes</code>, <code>Linear10PercentEvery1Minute</code>, etc.). As far as my research goes, I could not confirm whether it causes downtime or not.</li>
<li>We cannot use $LATEST; we must publish a version or create an alias.</li>
<li>To make sure we use provisioned concurrency, invoke the function through the alias's ARN (<code>arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME:ALIAS_NAME</code>).</li>
<li>It is not cheap (sadly). Here is a simple Lambda pricing simulation.</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654762972132/5HWW7Rrh_.png" alt="image.png" /></p>
<p>As far as I have found, here are some ways to lower the price:</p>
<ul>
<li>Lower the concurrency count; chances are we don't need much.</li>
<li>Use Application Auto Scaling to scale down outside busy hours (see the sketch after this list).</li>
<li>Use it only in an environment that needs it.</li>
</ul>
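<p>For the auto-scaling idea above, here is a rough boto3 sketch. The function name, alias, capacities, and schedule are assumptions; check the Application Auto Scaling docs before relying on it:</p>
<pre><code class="lang-python">import boto3

client = boto3.client("application-autoscaling")

# Register the alias's provisioned concurrency as a scalable target
client.register_scalable_target(
    ServiceNamespace="lambda",
    ResourceId="function:MyFunction:live",
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    MinCapacity=0,
    MaxCapacity=1,
)

# Drop to zero outside busy hours (the cron expression is in UTC)
client.put_scheduled_action(
    ServiceNamespace="lambda",
    ScheduledActionName="scale-down-nightly",
    ResourceId="function:MyFunction:live",
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    Schedule="cron(0 13 * * ? *)",
    ScalableTargetAction={"MinCapacity": 0, "MaxCapacity": 0},
)
</code></pre>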
<h1 id="heading-closing">Closing</h1>
<p>Here, I have summarized my understanding. I hope it helps those who have difficulty setting this up in an AWS SAM template or making it work across multiple environments.</p>
<p>Cheers!</p>
]]></content:encoded></item><item><title><![CDATA[Comparing React JS and Solid JS in Syntax]]></title><description><![CDATA[Many new front-end web frameworks have been released in the last few years. Svelte, Preact, Angular, Ember, and Remix (although it is not all front-end frameworks). Lately, Solid Js has been discussed, and I decided to look at it.
There are many arti...]]></description><link>https://blog.alvinend.tech/comparing-react-js-and-solid-js-in-syntax</link><guid isPermaLink="true">https://blog.alvinend.tech/comparing-react-js-and-solid-js-in-syntax</guid><category><![CDATA[THW Web Apps]]></category><category><![CDATA[Frontend Development]]></category><category><![CDATA[Web Development]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Thu, 19 May 2022 09:03:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1652950915881/qRI6m2ytk.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Many new front-end web frameworks have been released in the last few years: Svelte, Preact, Angular, Ember, and Remix (although not all of these are strictly front-end frameworks). Lately, Solid JS has been getting attention, and I decided to take a look at it.</p>
<p>There are many articles about how excellent Solid JS is and how blazingly efficient it is. But this time, I will focus on syntax. I'm a big fan of React; it is the framework that got me my job as an engineer and was my first "language" when it comes to frontend frameworks, so I will be comparing Solid to React.</p>
<h2 id="heading-state">State</h2>
<p>Let's start with state. The easiest way to show its power is by building a counter. In React, I would write it like this.</p>
<pre><code class="lang-js"><span class="hljs-keyword">import</span> React <span class="hljs-keyword">from</span> <span class="hljs-string">'react'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">App</span>(<span class="hljs-params">props</span>) </span>{
  <span class="hljs-keyword">const</span> [count, setCount] = React.useState(<span class="hljs-number">0</span>)

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>Count: {count}<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">onClick</span>=<span class="hljs-string">{()</span> =&gt;</span> setCount(count + 1)}&gt;add<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span></span>
  )
}
</code></pre>
<p>In Solid, it looks like this:</p>
<pre><code class="lang-js"><span class="hljs-keyword">import</span> { render } <span class="hljs-keyword">from</span> <span class="hljs-string">"solid-js/web"</span>;
<span class="hljs-keyword">import</span> { createSignal } <span class="hljs-keyword">from</span> <span class="hljs-string">"solid-js"</span>;

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">Counter</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> [count, setCount] = createSignal(<span class="hljs-number">0</span>);

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>{count()}<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"button"</span> <span class="hljs-attr">onClick</span>=<span class="hljs-string">{()</span> =&gt;</span> setCount(count() + 1)}&gt;
        add
      <span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span></span>
  )
}
</code></pre>
<p>Besides Solid calling it a <code>signal</code>, it is almost the same. The only difference is that when referencing state, Solid requires a function call (<code>count()</code>) while React does not (<code>count</code>).</p>
<h2 id="heading-props">Props</h2>
<p>As for props, they are almost the same: we hand a value to the child component, and the child reads it. One caveat is that Solid's reactivity is triggered by property access on prop and state objects, so referencing them outside of a binding or reactive computation will not be tracked. Use built-in utilities like <code>mergeProps</code> and <code>splitProps</code> to avoid that.</p>
<h2 id="heading-effect">Effect</h2>
<p>Next, I will try to replicate React's <code>useEffect</code> on Solid. Let's say we want to log our count when the state/signal is updated.</p>
<pre><code class="lang-js"><span class="hljs-keyword">import</span> React <span class="hljs-keyword">from</span> <span class="hljs-string">'react'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">App</span>(<span class="hljs-params">props</span>) </span>{
  <span class="hljs-keyword">const</span> [count, setCount] = React.useState(<span class="hljs-number">0</span>)

  React.useEffect(
    <span class="hljs-function">() =&gt;</span> {
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Count: <span class="hljs-subst">${count}</span>`</span>)
    },
    [count]
  )

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>Count: {count}<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">onClick</span>=<span class="hljs-string">{()</span> =&gt;</span> setCount(count + 1)}&gt;add<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span></span>
  )
}
</code></pre>
<p>In Solid, I would write it like this.</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">Counter</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> [count, setCount] = createSignal(<span class="hljs-number">0</span>);
  createEffect(<span class="hljs-function">() =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Count: <span class="hljs-subst">${count()}</span>`</span>)
  })

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>{count()}<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"button"</span> <span class="hljs-attr">onClick</span>=<span class="hljs-string">{()</span> =&gt;</span> setCount(count() + 1)}&gt;
        add
      <span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span></span>
  )
}
</code></pre>
<p>Pretty similar. With Solid, we don't need to fill out the dependencies manually. As for lifecycle, just as in React, we can use <code>createEffect</code> without any dependencies (or use <code>onMount</code>), and for cleanup we can use the <code>onCleanup</code> function, as opposed to React, where we return a cleanup function from the effect.</p>
<h2 id="heading-memos">Memo(s)</h2>
<p>Unlike React, Solid's components only mount once and don't rerender. Therefore, Solid doesn't need a referential-identity helper like <code>useCallback</code>. But Solid does have a <code>createMemo</code> function for caching purposes. It creates read-only derived values that recalculate only when their dependencies (in most cases, props and signals) change. In React, we would write it like this.</p>
<pre><code class="lang-js"><span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">App</span>(<span class="hljs-params">props</span>) </span>{
  <span class="hljs-keyword">const</span> [count, setCount] = React.useState(<span class="hljs-number">0</span>)

  <span class="hljs-keyword">const</span> calcValue = React.useMemo(
    <span class="hljs-function">() =&gt;</span> count**<span class="hljs-number">100</span>/<span class="hljs-number">5</span>**<span class="hljs-number">200</span>,
    [count]
  )

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>Count: {count}<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">onClick</span>=<span class="hljs-string">{()</span> =&gt;</span> setCount(count + 1)}&gt;add<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>Calc: {calcValue}<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span></span>
  )
}
</code></pre>
<p>In Solid, we would write it like this.</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">Counter</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> [count, setCount] = createSignal(<span class="hljs-number">0</span>);
  <span class="hljs-keyword">const</span> calcValue = createMemo(<span class="hljs-function">() =&gt;</span> count()**<span class="hljs-number">100</span>/<span class="hljs-number">5</span>**<span class="hljs-number">200</span>)

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>{count()}<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"button"</span> <span class="hljs-attr">onClick</span>=<span class="hljs-string">{()</span> =&gt;</span> setCount(count() + 1)}&gt;
        add
      <span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>Calc: {calcValue()}<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span></span>
  )
}
</code></pre>
<p>Again, the only difference is that with Solid we don't need to declare dependencies, and when referencing the memo we call it (<code>calcValue()</code>).</p>
<h2 id="heading-context">Context</h2>
<p>When building a large application, chances are we will need some kind of global state. In React, we have many options for this; the most popular is Redux. But this time, we will look at the built-in <code>Context</code> API. In React, we would write it like this.</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> CountContext = React.createContext({
  <span class="hljs-attr">count</span>: <span class="hljs-number">0</span>,
  <span class="hljs-attr">setCount</span>: <span class="hljs-function">() =&gt;</span> <span class="hljs-keyword">void</span> <span class="hljs-number">0</span>
});

<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">App</span>(<span class="hljs-params">props</span>) </span>{
  <span class="hljs-keyword">const</span> [count, setCount] = React.useState(<span class="hljs-number">0</span>)

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">CountContext.Provider</span> <span class="hljs-attr">value</span>=<span class="hljs-string">{{</span>
      <span class="hljs-attr">count</span>,
      <span class="hljs-attr">setCount</span>
    }}&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">Counter</span> /&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">CountContext.Provider</span>&gt;</span></span>
  )
}

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">Counter</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> countContext = React.useContext(CountContext)

  <span class="hljs-keyword">return</span> (<span class="xml"><span class="hljs-tag">&lt;&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>Count: {countContext.count}<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">onClick</span>=<span class="hljs-string">{()</span> =&gt;</span> countContext.setCount(count =&gt; count + 1)}&gt;add<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
  <span class="hljs-tag">&lt;/&gt;</span></span>)
}
</code></pre>
<p>In Solid, we could write it like this.</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> CountContext = createContext({
  <span class="hljs-attr">count</span>: <span class="hljs-number">0</span>,
  <span class="hljs-attr">setCount</span>: <span class="hljs-function">() =&gt;</span> <span class="hljs-keyword">void</span> <span class="hljs-number">0</span>
})

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">App</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> [count, setCount] = createSignal(<span class="hljs-number">0</span>)
  <span class="hljs-keyword">const</span> store = {
    count,
    setCount
  }

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">CountContext.Provider</span> <span class="hljs-attr">value</span>=<span class="hljs-string">{store}</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">Counter</span> /&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">CountContext.Provider</span>&gt;</span></span>
  );
}

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">Counter</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> countContext = useContext(CountContext)

  <span class="hljs-keyword">return</span> (<span class="xml"><span class="hljs-tag">&lt;&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>Count: {countContext.count()}<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">onClick</span>=<span class="hljs-string">{()</span> =&gt;</span> countContext.setCount(count =&gt; count + 1)}&gt;add<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
  <span class="hljs-tag">&lt;/&gt;</span></span>)
}
</code></pre>
<p>There is not much to comment on here; aside from reading the signal with <code>count()</code> instead of a plain property, we can write it the same way.</p>
<h1 id="heading-ending">Ending</h1>
<p>I'm surprised at how similar React and Solid are, and I'm excited about Solid's future. What do you think?</p>
]]></content:encoded></item><item><title><![CDATA[Serverless Cron Jobs with AWS Batch]]></title><description><![CDATA[When we think about Serverless Jobs in AWS, the first thing that comes to mind is AWS Lambda. AWS Lambda is a fantastic computing service that lets you run code without provisioning or managing servers. But AWS Lambda has some limitations; the functi...]]></description><link>https://blog.alvinend.tech/serverless-cron-jobs-with-aws-batch</link><guid isPermaLink="true">https://blog.alvinend.tech/serverless-cron-jobs-with-aws-batch</guid><category><![CDATA[Cloud Computing]]></category><category><![CDATA[AWS]]></category><category><![CDATA[app development]]></category><category><![CDATA[software development]]></category><category><![CDATA[THW Cloud Computing]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Wed, 18 May 2022 11:01:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1652871608555/64pbl_yDd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When we think about serverless jobs in AWS, the first thing that comes to mind is AWS Lambda. AWS Lambda is a fantastic computing service that lets you run code without provisioning or managing servers. But AWS Lambda has one notable limitation: a function can run for a maximum of 15 minutes. For cron jobs that need more than 15 minutes, there is AWS Batch.</p>
<h1 id="heading-what-is-aws-batch">What is AWS Batch</h1>
<p>AWS Batch is a set of batch management capabilities that enables developers, scientists, and engineers to quickly and efficiently run hundreds of thousands of batch computing jobs on AWS. </p>
<p>AWS Batch can be integrated with serverless container service (AWS Fargate) and set so that we only pay for what we use.</p>
<h1 id="heading-build-batch">Build Batch</h1>
<p>First of all, we need to build a working batch job.</p>
<h2 id="heading-write-the-code">Write the code</h2>
<p>The easiest way to build a batch job, in my opinion, is by creating a Docker image. As long as it runs in a Docker container, you can use any language in AWS Batch. In this example, we will be using Python.</p>
<h3 id="heading-dockerfile">Dockerfile</h3>
<p>To create a Docker image, we need a Dockerfile. We will create a simple Python container and install the packages listed in our <code>requirements.txt</code>.</p>
<p><code>./Dockerfile</code></p>
<pre><code>FROM python:3.9

ADD . /app

WORKDIR /app

RUN pip install -r requirements.txt

# Default command; the command in the Batch job definition can override this
CMD ["python", "run.py"]
</code></pre><h3 id="heading-sample-code">Sample Code</h3>
<p>For this example, we will call the installed package and confirm its output in the execution logs.</p>
<p><code>requirements.txt</code></p>
<pre><code class="lang-py">requests==<span class="hljs-number">2.27</span><span class="hljs-number">.1</span>
</code></pre>
<p><code>run.py</code></p>
<pre><code class="lang-py"><span class="hljs-keyword">import</span> requests

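# Fetch a page with the installed package and print it so the output shows up in the job logs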
res = requests.get(<span class="hljs-string">"https://blog.alvinend.tech/"</span>)
print(res.text)
</code></pre>
<h2 id="heading-put-image-in-ecr">Put Image in ECR</h2>
<p>The next step is putting our image in ECR. </p>
<h3 id="heading-create-new-repository">Create New Repository</h3>
<ol>
<li>Go to <a target="_blank" href="https://us-west-1.console.aws.amazon.com/ecr/home?region=us-west-1">AWS ECR</a> (Don't forget to check your AWS region)</li>
<li>From the sidebar, go to "Repositories" and click the "Create repository" button.</li>
<li>Choose "Visibility Settings" to "Private," Enter the repository name.</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1652865357648/Mo0Qhfbea5.png" alt="Screen Shot 2022-05-18 at 18.12.png" /></p>
<h3 id="heading-push-image">Push Image</h3>
<p>Select the created repository and click the "View push commands" button.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1652865640566/f3gw6-3V5.png" alt="blurred_push_command.png" /></p>
<p>Run those four commands, then check from the AWS console that the repository has been updated.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1652866184136/7QfOjnFza.png" alt="image.png" /></p>
<p>P.S. If you face any errors while pushing the image, there is a good chance it is an authentication problem. The link below might help you.</p>
<p>https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html</p>
<h2 id="heading-execute-batch-in-aws-batch">Execute Batch in AWS Batch</h2>
<p>There are four steps to get our job running.</p>
<h3 id="heading-create-compute-environment">Create Compute Environment</h3>
<p>As the name suggests, the Compute Environment is where our jobs will be executed. There are four types of compute environments:</p>
<ul>
<li>EC2</li>
<li>EC2 Spot</li>
<li>Fargate</li>
<li>Fargate Spot</li>
</ul>
<p>If you choose EC2, AWS will start a new EC2 instance and run your job there. Fargate is AWS's serverless container service. Spot is an option to use surplus capacity at a low price, in exchange for the risk that the resource might be reclaimed at any time. We will be using Fargate Spot.</p>
<ul>
<li>Go to <a target="_blank" href="https://us-west-1.console.aws.amazon.com/batch/">AWS Batch Page</a></li>
<li>Click "Compute Environment" from the sidebar</li>
<li>Click the "Create" button</li>
<li>Fill out Create Form<ul>
<li>For "Compute environment configuration."</li>
</ul>
</li>
</ul>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Key</td><td>Value</td><td>Explanation</td></tr>
</thead>
<tbody>
<tr>
<td>Compute environment type</td><td>Managed</td><td>Let's make AWS do complicated stuff for us 😀</td></tr>
<tr>
<td>Compute environment name</td><td>(anything)</td><td>Up to you 👍</td></tr>
<tr>
<td>Enable compute environment</td><td>True</td><td>We want to use it! 👊</td></tr>
</tbody>
</table>
</div><ul>
<li>For "Instance configuration."</li>
</ul>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Key</td><td>Value</td><td>Explanation</td></tr>
</thead>
<tbody>
<tr>
<td>Provisioning model</td><td>Fargate Spot</td><td>Fargate for serverless! Spot for a low price! 💰</td></tr>
<tr>
<td>Maximum vCPUs</td><td>1</td><td>Only doing simple execution, does not need high compute power</td></tr>
</tbody>
</table>
</div><ul>
<li>For "Networking." Just leave it! Default VPC has a public subnet that can connect internet.</li>
</ul>
<ul>
<li>Click "Create."</li>
</ul>
<p>Just like that, we created our compute environment.</p>
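<p>For reference, the console steps above map to a single API call. Here is a minimal boto3 sketch; the environment name, subnet, and security group IDs are placeholders for your own values:</p>
<pre><code class="lang-python">import boto3

batch = boto3.client('batch')

# Managed Fargate Spot environment, capped at 1 vCPU, in a public subnet
batch.create_compute_environment(
    computeEnvironmentName='my-fargate-spot-env',
    type='MANAGED',
    state='ENABLED',
    computeResources={
        'type': 'FARGATE_SPOT',
        'maxvCpus': 1,
        'subnets': ['subnet-XXXXXXXX'],
        'securityGroupIds': ['sg-XXXXXXXX'],
    },
)
</code></pre>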
<h3 id="heading-create-job-queue">Create Job Queue</h3>
<p>Jobs are submitted to a job queue where they reside until they can be scheduled to run in a Compute Environment.</p>
<ul>
<li>Go to <a target="_blank" href="https://us-west-1.console.aws.amazon.com/batch/">AWS Batch Page</a></li>
<li>Click "Job queues" from the sidebar</li>
<li>Click the "Create" button</li>
<li>Fill out Create Form<ul>
<li>For "Job queue configuration."</li>
</ul>
</li>
</ul>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Key</td><td>Value</td><td>Explanation</td></tr>
</thead>
<tbody>
<tr>
<td>Job queue name</td><td>(anything)</td><td>Again, Up to you 👍</td></tr>
<tr>
<td>Priority</td><td>1</td><td>Priority of this queue against other queues. Since we only create one, it doesn't matter.</td></tr>
</tbody>
</table>
</div><ul>
<li>For "Connected compute environments," select compute environment that we created before.</li>
</ul>
<ul>
<li>Click "Create."</li>
</ul>
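<p>The same queue can also be created from code. A minimal boto3 sketch; the queue name is a placeholder, and the compute environment name is the one from the previous step:</p>
<pre><code class="lang-python">import boto3

batch = boto3.client('batch')

# Queue with priority 1 that feeds the compute environment we just created
batch.create_job_queue(
    jobQueueName='my-job-queue',
    state='ENABLED',
    priority=1,
    computeEnvironmentOrder=[
        {'order': 1, 'computeEnvironment': 'my-fargate-spot-env'},
    ],
)
</code></pre>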
<h3 id="heading-create-job-definition">Create Job Definition</h3>
<p>Job Definition is a blueprint for creating jobs.</p>
<ul>
<li>Go to <a target="_blank" href="https://us-west-1.console.aws.amazon.com/batch/">AWS Batch Page</a></li>
<li>Click "Job Definition" from the sidebar</li>
<li>Click the "Create" button</li>
<li>Fill out Create Form<ul>
<li>For "Job type," Choose "Single Node" just because we don't need parallel execution.</li>
<li>For "General configuration," Enter any name you want and timeout of 300 (5 minutes).</li>
<li>For "Platform compatibility," Enter Fargate with the latest version, Check "Assign Public IP," and execution role to default.</li>
<li>For "Job configuration."</li>
</ul>
</li>
</ul>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Key</td><td>Value</td><td>Explanation</td></tr>
</thead>
<tbody>
<tr>
<td>Image</td><td>YOUR_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/REPOSITORY_NAME:latest</td><td>Point it to your ECR repository</td></tr>
<tr>
<td>Command syntax</td><td>Bash</td><td>Because we love Bash 💖</td></tr>
<tr>
<td>vCPUs</td><td>1.0</td><td>We don't need much</td></tr>
<tr>
<td>Memory</td><td>2GB</td><td>Leave it to default</td></tr>
</tbody>
</table>
</div><p>And leave the rest at their defaults.</p>
<ul>
<li>Click the "Create" button </li>
</ul>
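<p>Here is the boto3 equivalent of this job definition as a sketch; the image URI and execution role ARN are placeholders matching the settings above:</p>
<pre><code class="lang-python">import boto3

batch = boto3.client('batch')

batch.register_job_definition(
    jobDefinitionName='my-job-definition',
    type='container',
    platformCapabilities=['FARGATE'],
    timeout={'attemptDurationSeconds': 300},  # 5 minutes
    containerProperties={
        'image': 'YOUR_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/REPOSITORY_NAME:latest',
        'command': ['python', 'run.py'],
        'executionRoleArn': 'arn:aws:iam::YOUR_ACCOUNT_ID:role/ecsTaskExecutionRole',
        'networkConfiguration': {'assignPublicIp': 'ENABLED'},
        'resourceRequirements': [
            {'type': 'VCPU', 'value': '1'},
            {'type': 'MEMORY', 'value': '2048'},  # 2 GB
        ],
    },
)
</code></pre>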
<h3 id="heading-create-job">Create Job</h3>
<p>Preparation is done; let's test-run it.</p>
<ul>
<li>Stay on the "Job Definition" Page</li>
<li>Select created job definition</li>
<li>Click on the "Submit new job" Button</li>
<li>Enter a job name (again, whatever you want) and choose the job queue that we created</li>
<li>Click "Submit"</li>
</ul>
<p>Congratulations! You have created your first AWS Batch job 🎉. Let's sit back and wait for our execution logs. (Hint: they're not real-time, so there will be some delay.)</p>
<p>Logs:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1652870119749/aNiI0HKfK.png" alt="image.png" /></p>
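<p>For repeat test runs, submitting the job from code is quicker than clicking through the console. A minimal boto3 sketch; the names are whatever you chose above:</p>
<pre><code class="lang-python">import boto3

batch = boto3.client('batch')

# Submit a job using the queue and definition created earlier
response = batch.submit_job(
    jobName='my-test-job',
    jobQueue='my-job-queue',
    jobDefinition='my-job-definition',
)
print(response['jobId'])
</code></pre>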
<h1 id="heading-scheduled-jobs">Scheduled Jobs</h1>
<p>The last thing we need to do is set a schedule because, without it, this will be a "job," not a "cron job." 😆</p>
<h2 id="heading-set-rule-in-aws-eventbridge">Set Rule in AWS Eventbridge</h2>
<ul>
<li>Go to <a target="_blank" href="https://ap-northeast-1.console.aws.amazon.com/events/home?region=ap-northeast-1#/">AWS Eventbridge</a> </li>
<li>Click the "Create Rule" button<ul>
<li>For "Define rule detail," Enter "Name" and "Description" with anything you like. Leave "Event Bus" to default and Rule type to "Schedule."</li>
<li>For "Define schedule," Select what schedule you want to run the job. See <a target="_blank" href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html">AWS Schedule Expressions</a></li>
<li>For "Select target(s)," Select target types to "AWS Service" and "Select a target" to "Batch job queue." Enter your resource ARN for Job Queue and Job Definition. Enter any name you like in the Job Name.</li>
<li>Skip "Configure tags."</li>
<li>Review and create</li>
</ul>
</li>
</ul>
<p>If done correctly, it will appear in the "Rules" list.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1652870884759/GKnt1pIDi.png" alt="image.png" /></p>
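<p>The rule can also be created with boto3. A minimal sketch, assuming placeholder ARNs and an IAM role that allows EventBridge to call <code>batch:SubmitJob</code>:</p>
<pre><code class="lang-python">import boto3

events = boto3.client('events')

# Run once a day at 00:00 UTC; see the AWS schedule expression docs for other formats
events.put_rule(
    Name='daily-batch-job',
    ScheduleExpression='cron(0 0 * * ? *)',
    State='ENABLED',
)

# Point the rule at our job queue; the ARNs and role below are placeholders
events.put_targets(
    Rule='daily-batch-job',
    Targets=[{
        'Id': 'batch-target',
        'Arn': 'arn:aws:batch:YOUR_REGION:YOUR_ACCOUNT_ID:job-queue/my-job-queue',
        'RoleArn': 'arn:aws:iam::YOUR_ACCOUNT_ID:role/my-eventbridge-batch-role',
        'BatchParameters': {
            'JobDefinition': 'arn:aws:batch:YOUR_REGION:YOUR_ACCOUNT_ID:job-definition/my-job-definition',
            'JobName': 'daily-job',
        },
    }],
)
</code></pre>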
<h1 id="heading-ending">Ending</h1>
<p>This time we created a serverless cron job in AWS. Compared to AWS Lambda, AWS Batch needs more work to set up, but both have unique use cases, and in my opinion, no single tool fits every scenario. Knowing many of the tools out there will help us on our journey. Besides, I am happy there are so many tools we can use 😃.</p>
]]></content:encoded></item><item><title><![CDATA[Building AWS EC2 Manager with Lambda and Slack]]></title><description><![CDATA[If you, like me, have a low spec laptop but want to build a heavy application, you should rent a cloud server and do cloud development. In my case, I started EC2 Instance and did my work there. I start my EC2 Instance at the beginning of the day and ...]]></description><link>https://blog.alvinend.tech/building-aws-ec2-manager-with-lambda-and-slack</link><guid isPermaLink="true">https://blog.alvinend.tech/building-aws-ec2-manager-with-lambda-and-slack</guid><category><![CDATA[AWS]]></category><category><![CDATA[aws lambda]]></category><category><![CDATA[ec2]]></category><category><![CDATA[slack]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Sat, 26 Mar 2022 12:35:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1648298121390/t5zzcr-lD.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you, like me, have a low-spec laptop but want to build a heavy application, you should rent a cloud server and develop in the cloud. In my case, I spun up an EC2 instance and did my work there. The plan was to start the instance at the beginning of the day and stop it after I was done, or so I thought. I kept falling asleep without stopping it, and by the end of the month my bill had climbed noticeably. After that, and a lot of thinking, I began developing a solution to this problem: a way to start or stop the instance with less effort.</p>
<h1 id="heading-control-ec2-instance-with-slack-chat-with-lambda">Control EC2 Instance with Slack Chat with Lambda</h1>
<p>We will start by building a Slack bot for controlling our EC2 instance. For this, we need two Lambda functions and an API Gateway. The first function we will name <code>ec2-slack-handler</code>. It forwards requests that come from the Slack bot through API Gateway to our second function, <code>ec2-manager</code>, which starts and stops the instance for us.</p>
<h2 id="heading-build-slack-handler">Build Slack Handler</h2>
<p>You might be wondering why we need one Lambda function to handle the Slack event subscription (which sends a request to API Gateway whenever there is a chat message in Slack) and a separate one to start or stop the EC2 instance. I tried to do it all in one function, and after I finished implementing it, I encountered a strange bug: when I triggered the instance once, I got up to four invocations of my Lambda. After many hours of research, I found out that you must return a response to Slack quickly, or it retries the event. So, to avoid this bug, we need two Lambda functions.</p>
<blockquote>
<p>Your app should respond to the event request with an HTTP 2xx within three seconds. If it does not, we'll consider the event delivery attempt failed. After a failure, we'll retry three times, backing off exponentially.</p>
<p>Maintain a response success rate of at least 5% of events per 60 minutes to prevent automatic disabling.</p>
<p>Respond to events with an HTTP 200 OK as soon as you can. Avoid actually processing and reacting to events within the same process. Implement a queue to handle inbound events after they are received.</p>
</blockquote>
<p>https://api.slack.com/apis/connections/events-api</p>
<h3 id="heading-slack-handler-function">Slack Handler Function</h3>
<p>This time we will be using Python in Lambda. Here is a step-by-step guide to setting up the EC2 instance bot in your Slack workspace.</p>
<h4 id="heading-create-lambda-function-with-python-39-runtime">Create Lambda function with <code>python 3.9</code> runtime.</h4>
<p>First of all, we will create a Lambda that forwards Slack bot requests to the EC2 manager and returns a 200 response as soon as they come in.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1648277606009/MOgT-sqVp.png" alt="image.png" /></p>
<p>Before writing our "forward request" code, we need to finish the challenge that slack gives up when subscribing to the event. </p>
<pre><code><span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> boto3

lambda_client = boto3.client(<span class="hljs-string">'lambda'</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">lambda_handler</span>(<span class="hljs-params">event, context</span>):</span>
    print(<span class="hljs-string">f"Received event:\n<span class="hljs-subst">{event}</span>\nWith context:\n<span class="hljs-subst">{context}</span>"</span>)

    slack_body = event.get(<span class="hljs-string">"body"</span>)
    slack_event = json.loads(slack_body)
    challenge_answer = slack_event.get(<span class="hljs-string">"challenge"</span>)

    <span class="hljs-keyword">return</span> {
        <span class="hljs-string">'statusCode'</span>: <span class="hljs-number">200</span>,
        <span class="hljs-string">'body'</span>: challenge_answer
    }
</code></pre><p>Slack cannot reach Lambda directly; we need an entry point, and that is API Gateway. AWS makes it easy for us to set that up.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1648277817613/O1Z0tiin0.png" alt="image.png" /></p>
<p>We are done with setting up our AWS environment for now. Please take note of the API Gateway endpoint.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1648291819931/eXJf0brCG.png" alt="image.png" /></p>
<p>Next, let's set up our Slack app.</p>
<h4 id="heading-setting-up-slack-apps">Setting up Slack Apps</h4>
<p>We want to make a bot that sends chat messages from a certain channel to API Gateway. Here is how we make the Slack bot:</p>
<ol>
<li>Go to https://api.slack.com/ and click on the <code>Your Apps</code> Button</li>
<li>Login and create a new app in your workspace</li>
</ol>
<p>After you create your app, set up the event subscription that points to our Lambda:</p>
<ol>
<li>Go to "Event Subscriptions" from the sidebar.</li>
<li>In the Request URL input, enter the API Gateway endpoint we noted earlier.</li>
<li>In "Subscribe to bot events," add the "message.channels" event.</li>
<li>Save changes.</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1648297914486/soPGYuzZ1.png" alt="nVtnkXCaT.png" /></p>
<p>What we have set up so far is a flow that sends our Slack chats to API Gateway as requests. Besides that, we need a flow that lets our Lambda function send chats back to our Slack channel.</p>
<ol>
<li>From the sidebar, go to "OAuth &amp; Permissions".</li>
<li>In the "Scopes" section, set "Bot Token Scopes" to be the same as below.</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1648293701135/9i09ur4Rj.png" alt="image.png" /></p>
<p>Press "Install to Workspace," and don't forget to note down the bot token, which begins with <code>xoxb-</code>.</p>
<h4 id="heading-configure-slack-handler-to-forward-function">Configure Slack Handler to forward Function</h4>
<p>Since we are finished with the integration settings, we can delete the challenge code and replace it with the "request forwarding" code.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> boto3

lambda_client = boto3.client(<span class="hljs-string">'lambda'</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">lambda_handler</span>(<span class="hljs-params">event, context</span>):</span>
    print(<span class="hljs-string">f"Received event:\n<span class="hljs-subst">{event}</span>\nWith context:\n<span class="hljs-subst">{context}</span>"</span>)

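    # Forward the event to ec2-manager asynchronously (InvocationType='Event')
    # so this handler can return 200 to Slack within the three-second limit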
    <span class="hljs-keyword">try</span>:
      lambda_client.invoke(
        FunctionName=<span class="hljs-string">'ec2-manager'</span>,
        InvocationType=<span class="hljs-string">'Event'</span>,
        Payload= json.dumps(event)
      ) 

    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
      print(e)

    <span class="hljs-keyword">return</span> {
      <span class="hljs-string">'statusCode'</span>: <span class="hljs-number">200</span>
    }
</code></pre>
<h3 id="heading-setting-up-ec2-manager-lambda-function">Setting up EC2 Manager Lambda Function</h3>
<p>Halfway there! We just need to create a Lambda function to manage our instance, in this case, starting and stopping our EC2 instance.</p>
<p>First things first, we need to create the function and write its code.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> boto3
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> urllib.request
<span class="hljs-keyword">import</span> json

ec2 = boto3.resource(<span class="hljs-string">'ec2'</span>)
INSTANCE_ID = <span class="hljs-string">'YOUR_INSTANCE_ID_HERE'</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">send_text_response</span>(<span class="hljs-params">event, response_text</span>):</span>
    channel = event.get(<span class="hljs-string">"channel"</span>)
    ts = event.get(<span class="hljs-string">"ts"</span>)
    print(<span class="hljs-string">"Messaging Slack..."</span>)
    SLACK_URL = <span class="hljs-string">"https://slack.com/api/chat.postMessage"</span>

    data = urllib.parse.urlencode(
      (
        (<span class="hljs-string">"token"</span>, os.environ[<span class="hljs-string">"BOT_TOKEN"</span>]),
        (<span class="hljs-string">"channel"</span>, channel),
        (<span class="hljs-string">"thread_ts"</span>, ts),
        (<span class="hljs-string">"text"</span>, response_text)
      )
    )
    data = data.encode(<span class="hljs-string">"ascii"</span>)

    request = urllib.request.Request(SLACK_URL, data=data, method=<span class="hljs-string">"POST"</span>)
    request.add_header( <span class="hljs-string">"Content-Type"</span>, <span class="hljs-string">"application/x-www-form-urlencoded"</span> )

    print(<span class="hljs-string">'Fire off the request!'</span>)
    x = urllib.request.urlopen(request).read()
    print(x)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">start_workspace</span>():</span>
    instance = ec2.Instance(INSTANCE_ID)
    <span class="hljs-keyword">if</span> instance.state.get(<span class="hljs-string">'Name'</span>) == <span class="hljs-string">'running'</span>:
          <span class="hljs-keyword">return</span> <span class="hljs-string">f"Instance Already Running."</span>

    <span class="hljs-comment"># Start EC2 Instance</span>
    instance.start()

    <span class="hljs-comment"># Wait Until its Run</span>
    <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
      instance = ec2.Instance(INSTANCE_ID)
      <span class="hljs-keyword">if</span> instance.state.get(<span class="hljs-string">'Name'</span>) == <span class="hljs-string">'running'</span> <span class="hljs-keyword">and</span> instance.public_ip_address <span class="hljs-keyword">is</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">None</span>:
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"Instance Started Successfully. \n Public IP: <span class="hljs-subst">{instance.public_ip_address}</span>"</span>    

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">stop_workspace</span>():</span>
    instance = ec2.Instance(INSTANCE_ID)

    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> instance.state.get(<span class="hljs-string">'Name'</span>) == <span class="hljs-string">'running'</span>:
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"Instance Already Stopped."</span>

    <span class="hljs-comment"># Stop EC2 Instance</span>
    instance.stop()

    <span class="hljs-comment"># Check Until is Stopping</span>
    <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
      instance = ec2.Instance(INSTANCE_ID)
      <span class="hljs-keyword">if</span> instance.state.get(<span class="hljs-string">'Name'</span>) == <span class="hljs-string">'stopped'</span> <span class="hljs-keyword">or</span> instance.state.get(<span class="hljs-string">'Name'</span>) == <span class="hljs-string">'stopping'</span>:
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"Instance Stopped Successfully."</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">lambda_handler</span>(<span class="hljs-params">event, context</span>):</span>
    print(<span class="hljs-string">f"Received event:\n<span class="hljs-subst">{event}</span>\nWith context:\n<span class="hljs-subst">{context}</span>"</span>)
    slack_body = event.get(<span class="hljs-string">'body'</span>)
    print(<span class="hljs-string">f"Body:\n<span class="hljs-subst">{slack_body}</span>"</span>)
    slack_event = json.loads(slack_body).get(<span class="hljs-string">'event'</span>)
    user_id = slack_event.get(<span class="hljs-string">"user"</span>)
    text = slack_event.get(<span class="hljs-string">"text"</span>)

    <span class="hljs-keyword">try</span>:
      <span class="hljs-keyword">if</span> text != <span class="hljs-literal">None</span> <span class="hljs-keyword">and</span> (<span class="hljs-string">'Start Instance'</span> <span class="hljs-keyword">in</span> text):
        print(<span class="hljs-string">"Starting Instance"</span>)
        response_text = start_workspace()
        <span class="hljs-keyword">if</span> response_text:
          send_text_response(slack_event, response_text)

      <span class="hljs-keyword">if</span> text != <span class="hljs-literal">None</span> <span class="hljs-keyword">and</span> (<span class="hljs-string">'Stop Instance'</span> <span class="hljs-keyword">in</span> text):
        print(<span class="hljs-string">"Stopping Instance"</span>)
        response_text = stop_workspace()
        <span class="hljs-keyword">if</span> response_text:
          send_text_response(slack_event, response_text)

    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
      print(e)

    <span class="hljs-keyword">return</span> {
      <span class="hljs-string">'statusCode'</span>: <span class="hljs-number">200</span>
    }
</code></pre>
<p>So this is how our code works:</p>
<ol>
<li>When there is a "Start Instance" keyword in chat, it will trigger EC2 start and reply public IP when the process is a success</li>
<li>When there is a "Stop Instance" keyword in chat, it will trigger EC2 start and reply public IP when the process is a success</li>
</ol>
<p>But for our Lambda to be able to start or stop our instance, we need to permit it to do so.</p>
<ul>
<li>Go to the <code>Configuration</code> tab and click the role name</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1648280343486/9I0LrPs8k.png" alt="image.png" /></p>
<ul>
<li>Add the policies needed to start, stop, and describe instances. If you don't care about security and such, attach the AmazonEC2FullAccess policy (a least-privilege sketch follows below)</li>
</ul>
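<p>If you do care about security, a least-privilege inline policy only needs the start, stop, and describe actions. Here is a boto3 sketch that attaches one; the role and policy names are placeholders for your Lambda's execution role:</p>
<pre><code class="lang-python">import json
import boto3

iam = boto3.client('iam')

# Only what ec2-manager actually uses: describe state, start, and stop
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "ec2:DescribeInstances",
            "ec2:StartInstances",
            "ec2:StopInstances"
        ],
        "Resource": "*"
    }]
}

iam.put_role_policy(
    RoleName='ec2-manager-role',   # placeholder: your Lambda's execution role
    PolicyName='ec2-start-stop',
    PolicyDocument=json.dumps(policy),
)
</code></pre>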
<p>And for our Lambda to be able to send messages through the Slack API, we need the access token, which we provide through an environment variable.</p>
<ul>
<li>Go to the <code>Configuration</code> tab and open "Environment variables"</li>
<li>Add <code>BOT_TOKEN</code> with the value of the Slack bot token that we noted earlier</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1648297811243/lVSF_NSAf.png" alt="pPXXrB0xJ.png" /></p>
<h2 id="heading-testing">Testing</h2>
<p>If we set everything up right, we should have our bot happy and running.</p>
<p>Starting up Instance:
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1648297177321/hm0cQf7fd.png" alt="image.png" /></p>
<p>Stopping Instance:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1648297219471/mJo0bVYno.png" alt="image.png" /></p>
<h1 id="heading-closing">Closing</h1>
<p>It was a long journey. We went through AWS, Slack, AWS, and back to Slack to trigger AWS. Although we built an EC2 instance manager this time, there are many other ways to use this flow. Like they always say, the sky is the limit. Happy hacking!</p>
]]></content:encoded></item><item><title><![CDATA[AWS S3 as a database with S3 Select]]></title><description><![CDATA[When I see AWS S3, I don't think of it as a database. I think of it as data storage to put my big file data like photos or videos. Someday I read somewhere on the Internet that says S3 is a database, making me want to experiment with it. In this arti...]]></description><link>https://blog.alvinend.tech/aws-s3-as-a-database-with-s3-select</link><guid isPermaLink="true">https://blog.alvinend.tech/aws-s3-as-a-database-with-s3-select</guid><category><![CDATA[AWS]]></category><category><![CDATA[Amazon S3]]></category><category><![CDATA[S3]]></category><category><![CDATA[Databases]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Sun, 27 Feb 2022 13:14:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1645967617113/W8VSce3X-.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When I see AWS S3, I don't think of it as a database. I think of it as storage for big files like photos or videos. One day I read somewhere on the internet that S3 can be used as a database, which made me want to experiment with it. In this article, I will try to implement CRUD operations on AWS S3.</p>
<h1 id="heading-prerequisite">Prerequisite</h1>
<p>In this experiment, we will be using AWS S3 and Python. Here is what I prepared before the experiment.</p>
<ul>
<li>AWS Account</li>
<li>S3 Bucket to store CSV File</li>
<li>Python Environment</li>
<li>A Dummy CSV File filled with one thousand users' information</li>
</ul>
<h1 id="heading-implementing-crud">Implementing CRUD</h1>
<p>And now, let's get our hands dirty and code this thing. First of all, it's a good idea to know what we are going to build, so I wrote a test (well, not really) before implementing it.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> s3.user <span class="hljs-keyword">import</span> read, delete, update, create

<span class="hljs-comment"># Read Query</span>
print(<span class="hljs-string">"Read Data"</span>)
print(read(<span class="hljs-string">"WHERE firstname = 'Devina'"</span>))

<span class="hljs-comment"># Output</span>
<span class="hljs-comment"># [['1274', 'Devina', 'Terencio', 'Devina.Terencio@yopmail.com', 'Devina.Terencio@gmail.com', 'doctor']]</span>

<span class="hljs-comment"># Update Query</span>
print(<span class="hljs-string">"Update Data"</span>)
update(
  <span class="hljs-string">"WHERE firstname = 'Devina'"</span>,
  {
    <span class="hljs-string">'lastname'</span>: <span class="hljs-string">'Green'</span>
  }
)

<span class="hljs-comment"># Output</span>
print(read(<span class="hljs-string">"WHERE firstname = 'Devina'"</span>))
<span class="hljs-comment"># [['1274', 'Devina', 'Green', 'Devina.Terencio@yopmail.com', 'Devina.Terencio@gmail.com', 'doctor']]</span>

<span class="hljs-comment"># Delete Item</span>
<span class="hljs-comment"># print("Delete Item")</span>
delete(<span class="hljs-string">"WHERE firstname = 'Devina'"</span>)

<span class="hljs-comment"># Output</span>
print(read(<span class="hljs-string">"WHERE firstname = 'Devina'"</span>))
<span class="hljs-comment"># []</span>

<span class="hljs-comment"># Create Item</span>
print(<span class="hljs-string">"Create Item"</span>)
create({
  <span class="hljs-string">'firstname'</span>: <span class="hljs-string">'Devina'</span>,
  <span class="hljs-string">'lastname'</span>: <span class="hljs-string">'Terencio'</span>,
  <span class="hljs-string">'email'</span>: <span class="hljs-string">'Devina.Terencio@yopmail.com'</span>,
  <span class="hljs-string">'email2'</span>: <span class="hljs-string">'Devina.Terencio@gmail.com'</span>,
  <span class="hljs-string">'profession'</span>: <span class="hljs-string">'doctor'</span>
})

<span class="hljs-comment"># Output</span>
print(read(<span class="hljs-string">"WHERE firstname = 'Devina'"</span>))
<span class="hljs-comment"># [['2100', 'Devina', 'Terencio', 'Devina.Terencio@yopmail.com', 'Devina.Terencio@gmail.com', 'doctor']]</span>
</code></pre>
<p>If what is printed in the terminal matches the expected output, we have successfully built our application.</p>
<p>Here is the GitHub link: https://github.com/alvinend/s3-as-a-db</p>
<h2 id="heading-read">Read</h2>
<p>What do we do when we want to query some items?</p>
<ul>
<li>Use S3 Select to query the CSV. This returns a response object</li>
<li>Get the payload string from the response object</li>
<li>Convert the payload string into a list</li>
</ul>
<p>Easy!</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> boto3
<span class="hljs-keyword">import</span> pandas
<span class="hljs-keyword">import</span> settings

s3_client = boto3.client(<span class="hljs-string">'s3'</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">read</span>(<span class="hljs-params">
    query,
    select=<span class="hljs-string">'*'</span>
</span>):</span>
    <span class="hljs-comment"># Use S3 Select to Query CSV</span>
    res = s3_client.select_object_content(
        Bucket = settings.BUCKET_NAME,
        Key = <span class="hljs-string">'user.csv'</span>,
        ExpressionType = <span class="hljs-string">'SQL'</span>,
        Expression =<span class="hljs-string">f"Select <span class="hljs-subst">{select}</span> from S3Object s "</span> + query,
        InputSerialization = {
            <span class="hljs-string">'CompressionType'</span>: <span class="hljs-string">'NONE'</span>,
            <span class="hljs-string">'CSV'</span> : {
                <span class="hljs-string">'FileHeaderInfo'</span> : <span class="hljs-string">'Use'</span>,
                <span class="hljs-string">'RecordDelimiter'</span> : <span class="hljs-string">'\n'</span>,
                <span class="hljs-string">'FieldDelimiter'</span> : <span class="hljs-string">','</span>
            }
        },
        OutputSerialization = {
            <span class="hljs-string">'CSV'</span> : {
                <span class="hljs-string">'RecordDelimiter'</span> : <span class="hljs-string">'\n'</span>,
                <span class="hljs-string">'FieldDelimiter'</span> : <span class="hljs-string">','</span>
            }
        }
    )

    records = ''

    # The result is streamed; large results may arrive as several Records
    # events, so accumulate every chunk instead of keeping only the last one
    for event in res['Payload']:
        if 'Records' in event:
            records += event['Records']['Payload'].decode('utf-8')

    <span class="hljs-comment"># Change String to Array 1, "Name1, Address1\n 2, Name2, Address2\n" -&gt; ["1, Name1, Address1", "2, Name2, Address2", ""]</span>
    records = records.split(<span class="hljs-string">'\n'</span>)

    <span class="hljs-comment"># Remove Empty String Element ["1, Name1, Address1", "2, Name2, Address2", ""] -&gt; ["1, Name1, Address1", "2, Name2, Address2"]</span>
    records = filter(len, records)

    <span class="hljs-comment"># Change Elemnt to be String [["1", "Name1", "Address1"], ["2", "Name2", "Address2"]]</span>
    records = list(map(<span class="hljs-keyword">lambda</span> x: x.replace(<span class="hljs-string">'\r'</span>, <span class="hljs-string">''</span>).split(<span class="hljs-string">','</span>), records))

    <span class="hljs-keyword">return</span> records
</code></pre>
<h2 id="heading-delete">Delete</h2>
<p>Since S3 has no feature to manipulate CSV files in place, we must do it manually. By manually, I mean downloading the file from S3, manipulating it, and replacing the old file with the manipulated one.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> boto3
<span class="hljs-keyword">import</span> pandas
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> settings
<span class="hljs-keyword">from</span> io <span class="hljs-keyword">import</span> StringIO

s3_resource = boto3.resource(<span class="hljs-string">'s3'</span>)
s3_client = boto3.client(<span class="hljs-string">'s3'</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_df</span>():</span>
    res = s3_client.get_object(Bucket=settings.BUCKET_NAME, Key=<span class="hljs-string">"user.csv"</span>)

    df = pandas.read_csv(res.get(<span class="hljs-string">"Body"</span>))

    <span class="hljs-keyword">return</span> df

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">save_to_s3</span>(<span class="hljs-params">df</span>):</span>
    csv_buffer = StringIO()
    df.to_csv(csv_buffer, index=<span class="hljs-literal">False</span>)
    s3_resource.Object(settings.BUCKET_NAME, <span class="hljs-string">'user.csv'</span>).put(Body=csv_buffer.getvalue())
</code></pre>
<p>We will use these helpers for the create and update operations too. For delete, we perform these actions:</p>
<ol>
<li>Get Dataframe from S3</li>
<li>Get Ids from S3 Select query</li>
<li>Filter out selected ID</li>
<li>Save Dataframe to S3 as CSV</li>
</ol>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> s3.user.utils <span class="hljs-keyword">import</span> get_df, save_to_s3
<span class="hljs-keyword">from</span> s3.user.read <span class="hljs-keyword">import</span> read

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">delete</span>(<span class="hljs-params">query</span>):</span>
    <span class="hljs-comment"># Get Dataframe</span>
    df = get_df()

    <span class="hljs-comment"># Get Ids from query</span>
    ids = read(query, select=<span class="hljs-string">'id'</span>)
    ids = list(map(<span class="hljs-keyword">lambda</span> x: int(<span class="hljs-string">""</span>.join(x)), ids))

    <span class="hljs-comment"># Filter out Selected ID</span>
    newdf = df[~df.id.isin(ids)]

    <span class="hljs-comment"># Save Dataframe to S3 as CSV</span>
    save_to_s3(newdf)
</code></pre>
<h2 id="heading-update">Update</h2>
<p>The update operation is almost the same as delete, except that instead of filtering the rows out, we update them.</p>
<ol>
<li>Get Dataframe from S3</li>
<li>Get Ids from S3 Select query</li>
<li>Update selected ID with new values.</li>
<li>Save Dataframe to S3 as CSV</li>
</ol>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">update</span>(<span class="hljs-params">
    query,
    data_dict
</span>):</span>
    update_column_key = list(data_dict.keys())
    update_column_value = list(data_dict.values())

    <span class="hljs-comment"># Get Dataframe</span>
    df = get_df()

    <span class="hljs-comment"># Get Ids from query</span>
    ids = read(query, select=<span class="hljs-string">'id'</span>)
    ids = list(map(<span class="hljs-keyword">lambda</span> x: int(<span class="hljs-string">""</span>.join(x)), ids))

    <span class="hljs-comment"># Update DF</span>
    df.loc[df.id.isin(ids), update_column_key] = update_column_value

    <span class="hljs-comment"># Save Dataframe to S3 as CSV</span>
    save_to_s3(df)
</code></pre>
<h2 id="heading-create">Create</h2>
<p>Last but not least is the create operation. Although I would like to just insert the data into pandas' DataFrame, it has no auto-increment feature. So we have to manually look up the latest stored ID, increment it, and save the new data.</p>
<ol>
<li>Get Dataframe from S3</li>
<li>Get latest id from S3 Select query</li>
<li>Create new data with the incremented latest ID</li>
<li>Save Dataframe to S3 as CSV</li>
</ol>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> s3.user.read <span class="hljs-keyword">import</span> read
<span class="hljs-keyword">from</span> s3.user.utils <span class="hljs-keyword">import</span> get_df, save_to_s3

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create</span>(<span class="hljs-params">
    data_dict
</span>):</span>
    <span class="hljs-comment"># Get Dataframe</span>
    df = get_df()

    <span class="hljs-comment"># Get Ids from query</span>
    max_id = read(<span class="hljs-string">''</span>, select=<span class="hljs-string">'MAX(cast(id as int))'</span>)
    max_id = int(max_id[<span class="hljs-number">0</span>][<span class="hljs-number">0</span>])

    <span class="hljs-comment"># Update DF</span>
    data_dict[<span class="hljs-string">'id'</span>] = max_id + <span class="hljs-number">1</span>

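    # Note: DataFrame.append was removed in pandas 2.0; on newer versions use
    # df = pandas.concat([df, pandas.DataFrame([data_dict])], ignore_index=True)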
    df = df.append(data_dict, ignore_index=<span class="hljs-literal">True</span>)

    <span class="hljs-comment"># Save Dataframe to S3 as CSV</span>
    save_to_s3(df)
</code></pre>
<h1 id="heading-closing">Closing</h1>
<p>There you have it, S3 as a database! I know it is slower than a conventional relational database like MySQL and has no ACID guarantees. But it is cheap when you are experimenting or prototyping something. It is nice to have an option that does not require you to manage a database and pay monthly server costs.</p>
<p>As always, thank you for reading this post!</p>
]]></content:encoded></item><item><title><![CDATA[Building Serverless Task Manager: Introduction & Design (Part 1)]]></title><description><![CDATA[Around half a year ago, my team and I were doing a project that needed task management. We looked at major task management services, but it was too expensive, and we just needed simple features, like setting deadlines, assigning them to someone, and ...]]></description><link>https://blog.alvinend.tech/building-serverless-task-manager-introduction-and-design-part-1</link><guid isPermaLink="true">https://blog.alvinend.tech/building-serverless-task-manager-introduction-and-design-part-1</guid><category><![CDATA[Web Development]]></category><category><![CDATA[projects]]></category><category><![CDATA[project management]]></category><category><![CDATA[Design]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Sat, 26 Feb 2022 14:49:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1645884317615/ludj8OYSO.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Around half a year ago, my team and I were doing a project that needed task management. We looked at the major task management services, but they were too expensive, and we only needed simple features: setting deadlines, assigning tasks to someone, and tracking their completion. So we decided to build one ourselves.</p>
<h1 id="heading-what-do-we-need">What do we need</h1>
<p>Before jumping to code, we need to have a clear image of what we will build. To do that, I listed what this application should be able to do.</p>
<ul>
<li>Create, read, update and delete tasks</li>
<li>Have user authentication</li>
<li>Can assign tasks to a user</li>
<li>Can list ongoing tasks that the user has</li>
<li>Can set a deadline on tasks</li>
<li>Notify user when a task is assigned</li>
<li>Remind user when task is over deadline</li>
</ul>
<h1 id="heading-what-should-it-look-like">What should it look like</h1>
<p>Again, to have a clear image of our goal, we should have a rough idea of what our application will look like, and making a UI design is an excellent way to start. We want to keep it simple, easy to use, and dark, because everybody loves a dark user interface these days. I jumped to <a target="_blank" href="https://www.uihut.com/">UI Hut</a> for inspiration and started making my very own UI design with Adobe XD. </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1645886866358/ewW9M5Lba.png" alt="Web 1920 – 1.png" /></p>
<h1 id="heading-what-do-we-use-to-make-it">What do we use to make it</h1>
<p>In my opinion, there is no right or wrong in choosing a tech stack when building something. What I think is important is why we use it. Here is our stack:</p>
<ul>
<li><p><strong>AWS API Gateway &amp; Lambda &amp; DynamoDB</strong> for the backend. We are a small team and do not plan on accessing it frequently, so it would be a waste to deploy a whole server just for this. Serverless may be expensive in some use cases, but for low, infrequent traffic it is perfect.</p>
</li>
<li><p><strong>React</strong> for the frontend. We have our REST API on serverless, and I think it makes sense to use an SPA for our application. The reason we use React is that I have used React before, and I love it. After building it, we will deploy it to S3 together with CloudFront.</p>
</li>
<li><p><strong>React Native</strong> for mobile. When tasks are assigned or a task is past its deadline, we want to be notified. While I think it is overkill to build a whole mobile application for this, it is an excellent chance to try a new technology.</p>
</li>
</ul>
<h1 id="heading-ending">Ending</h1>
<p>I will be building this application over the next two or three months. I am excited to show how I create the application, along with my thinking about what should be done and how.</p>
<p>Thank you for reading this post!</p>
]]></content:encoded></item><item><title><![CDATA[Knowing How Javascript Execute Its Codes]]></title><description><![CDATA[I woke up Saturday morning, and out of nowhere, I was thinking, how does javascript execute hundreds of hundreds of code in such a fast time. So just like an average person, I go to my PC, start googling and watching some explanation for hours, and t...]]></description><link>https://blog.alvinend.tech/knowing-how-javascript-execute-its-codes</link><guid isPermaLink="true">https://blog.alvinend.tech/knowing-how-javascript-execute-its-codes</guid><category><![CDATA[JavaScript]]></category><category><![CDATA[Browsers]]></category><category><![CDATA[basics]]></category><dc:creator><![CDATA[Alvin Endratno]]></dc:creator><pubDate>Sun, 13 Feb 2022 11:36:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1644752098332/UEXsb75m1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I woke up Saturday morning and, out of nowhere, started wondering how JavaScript executes hundreds upon hundreds of lines of code so quickly. So, like any average person, I went to my PC and spent hours googling and watching explanations, and now I'm here to share what I learned.</p>
<h1 id="heading-javascript-is-a-single-threaded-language">Javascript is a single-threaded language</h1>
<p>Do you know that our brains cannot really multitask? Except for a tiny percentage of people, when we think we're multitasking, we usually aren't doing two things at once; instead, we're doing individual actions in rapid succession, or task-switching. A single-threaded language like JavaScript does the same thing. It stacks up a list of tasks and executes them one by one. That stack has a name: the "Call Stack".</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1644795512263/DhESQu2hg.png" alt="callstack_async.drawio.png" /></p>
<p>From the top, code is inserted task by task, and if a task is a function that contains another task, that task is pushed onto the stack. Once a set of processes is pushed in, the <code>Call Stack</code> consumes it in a LIFO manner: the last task pushed is the first one to be consumed.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1644795999865/wY51_sJ-Uc.png" alt="callstack_async_2.drawio.png" /></p>
<p>When one set of processes (in this case, a function) is finished, the JavaScript runtime continues executing the rest of the code.</p>
<h1 id="heading-how-about-promises">How About Promises?</h1>
<p>If you know JavaScript and have built some kind of application, nine times out of ten you already know about <code>Promise</code>. </p>
<h2 id="heading-what-is-promise">What is Promise</h2>
<p>Maybe you have already written one without knowing your code uses a <code>Promise</code>, so let us start with an example.</p>
<pre><code class="lang-js"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getUser</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> axios.get(<span class="hljs-string">'/user'</span>)

  <span class="hljs-keyword">return</span> res.body
}
</code></pre>
<p>An HTTP request is a common thing to make when building web apps. In this example, we get the user data from the backend and return its content.</p>
<p>Let us test that function.</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> user = getUser()
<span class="hljs-built_in">console</span>.log(user)
</code></pre>
<p>We expect it to log the user data. But the output says otherwise.</p>
<pre><code class="lang-js"><span class="hljs-literal">undefined</span>
</code></pre>
<p>It happens because we don't wait until our data is fetched; <code>getUser()</code> returns a pending <code>Promise</code>. To fix it, we insert <code>await</code> at the call.</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> user = <span class="hljs-keyword">await</span> getUser()
<span class="hljs-built_in">console</span>.log(user)
</code></pre>
<p>And the output is as we are expected.</p>
<pre><code class="lang-js">{
  <span class="hljs-attr">name</span>: <span class="hljs-string">'Alvin'</span>
}
</code></pre>
<p>A <code>Promise</code> in JavaScript is an object that may produce a value in the future. Since JavaScript can run work asynchronously, the following statement will execute without waiting for the previous one to complete if that one takes a long time. But if we wait for the <code>Promise</code> to settle with <code>await</code>, we get the value we expect.</p>
<p>The question here is, "How does JavaScript do something asynchronously while it is single-threaded?"</p>
<h1 id="heading-javascript-concurrent-like-process">Javascript concurrent-like process</h1>
<p>I know it's a little bit misleading, but while the JavaScript runtime can only do one thing at a time, the browser consists of more than just the JavaScript runtime.</p>
<h2 id="heading-webapis">WebApis</h2>
<p>To make it simple, WebApis are where asynchronous code is executed. When code contains an asynchronous function, the code is first pushed onto the <code>Call Stack</code>. The <code>Call Stack</code> knows it is an asynchronous task and throws it to the WebApis, which process it and throw the callback into the <code>Callback Queue</code>.</p>
<p>WebApis also contain the DOM and handle user interactions. For example, when the user clicks a button, the WebApis throw a click callback into the Callback Queue.</p>
<h2 id="heading-callback-queue">Callback Queue</h2>
<p>The Callback Queue is a queue of tasks thrown in by the WebApis. Those tasks are later given back to the <code>Call Stack</code> to be processed.</p>
<h2 id="heading-event-loop">Event Loop</h2>
<p>The process of moving tasks from the main code to the call stack, and from the callback queue back to the call stack, is the event loop. The event loop pauses while the call stack is processing something and loops again when there are no tasks left. In that way, JavaScript executes all main code before touching the <code>Callback Queue</code>.</p>
<h2 id="heading-executing">Executing</h2>
<p>Let's get a better image of it. We start with simple code like before, but rather than only logging strings, we make an async request, fetch a user, and log the user's name.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1644750418813/bnJnSF8BG.png" alt="callstack_async_1.drawio.png" /></p>
<p>The event loop starts at the call stack and processes the main code as before. We have an async function, so instead of being processed on the call stack, it is thrown to the WebApis to be executed there. </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1644750599677/2GpFqPdLq.png" alt="callstack_async_1.drawio.png" /></p>
<p>This continues until the code and stack are processed. When the WebApis are done executing, they throw a callback into the <code>Callback Queue</code>; in this case, it logs the user name.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1644751188092/89uMISno0.png" alt="callstack_async_1.drawio.png" /></p>
<p>After that, the event loop starts looping between the <code>Callback Queue</code> and the <code>Call Stack</code>. If there are tasks in the <code>Callback Queue</code>, it moves them to the <code>Call Stack</code>, where they are executed when the event loop next stops there.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1644751269964/68KDsZC8-.png" alt="callstack_async_1.drawio.png" /></p>
<h1 id="heading-ending">Ending</h1>
<p>I hope you got the big picture of how JavaScript runs in our browser. If you are interested in learning more, I suggest watching this video; it's the easiest-to-understand one, in my opinion.</p>
<p><a target="_blank" href="https://youtu.be/8aGhZQkoFbQ">Youtube Link</a></p>
]]></content:encoded></item></channel></rss>