Building a Multi-Label Classification Model with BERT
Steps to create a multi-label classification model using BERT and the Hugging Face Transformers library:

1. Load the Data: Read the CSV file containing text snippets and their corresponding labels.
2. Parse Labels: Convert the comma-separated label strings into Python lists.
3. Prepare Labels: Use MultiLabelBinarizer to transform the label lists into multi-hot vectors.
4. Train/Test Split: Split the data into training and validation sets.
5. Tokenize Text: Use a BERT tokenizer to preprocess the text snippets.
6. Create Dataset: Define a custom Dataset class for the tokenized data and labels.
7. Load Model: Initialize a BERT model for sequence classification with multi-label support.
8. Training Arguments: Set up training parameters such as batch size, learning rate, and number of epochs.
9. Train the Model: Use the Hugging Face Trainer to train the model.
10. Save the Model: Save the trained model and tokenizer for future use.

Example Code

```python
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer

# Load and preprocess data
df = pd.read_csv("data/calls_with_context_ALL_2025-03-07T01-09-11-931Z_labeled.csv")
df["labels"] = df["labels"].fillna("").apply(
    lambda x: [lbl.strip() for lbl in x.split(",") if lbl.strip() != ""]
)

all_labels = [
    "cancel appointment", "collect patient info", "collect medicaid info",
    "collect insurance info", "confirm appointment", "general question",
    "intro/outro", "question about patient's chart", "reschedule appointment",
    "running late", "schedule appointment", "taking a message",
]

# Transform label lists into multi-hot vectors
mlb = MultiLabelBinarizer(classes=all_labels)
label_matrix = mlb.fit_transform(df["labels"])

# Train/test split
train_df, val_df, train_labels, val_labels = train_test_split(
    df, label_matrix, test_size=0.1, random_state=42
)

# Tokenize text
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
train_encodings = tokenizer(list(train_df["snippetText"]), truncation=True, padding=True, max_length=128)
val_encodings = tokenizer(list(val_df["snippetText"]), truncation=True, padding=True, max_length=128)

# Create dataset
class IntentDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        # Multi-label targets must be floats for BCEWithLogitsLoss
        item["labels"] = torch.tensor(self.labels[idx]).float()
        return item

    def __len__(self):
        return len(self.labels)

train_dataset = IntentDataset(train_encodings, train_labels)
val_dataset = IntentDataset(val_encodings, val_labels)

# Load model and set training arguments;
# problem_type="multi_label_classification" makes the model use BCEWithLogitsLoss
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(all_labels),
    problem_type="multi_label_classification",
)
training_args = TrainingArguments(
    output_dir="./multi_intent_model",
    evaluation_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    learning_rate=2e-5,
    weight_decay=0.01,
    logging_steps=50,
    load_best_model_at_end=True,
    save_strategy="epoch",
)

# Train the model
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)
trainer.train()

# Save the model
trainer.save_model("./multi_intent_model")
tokenizer.save_pretrained("./multi_intent_model")
print("Training complete!")
```

Important

Ensure you have the necessary libraries installed: transformers, torch, pandas, scikit-learn.
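With load_best_model_at_end=True and no metric specified, the Trainer selects the best checkpoint by evaluation loss. If you would rather track a task metric, you can pass a compute_metrics function when constructing the Trainer. The sketch below is a minimal example, not part of the original script: it reports micro-averaged F1, and the 0.5 sigmoid threshold is an assumed starting point you would tune on validation data.

```python
import numpy as np
from sklearn.metrics import f1_score

def compute_metrics(eval_pred):
    # eval_pred carries raw logits and the multi-hot ground-truth labels
    probs = 1 / (1 + np.exp(-eval_pred.predictions))  # sigmoid over logits
    preds = (probs >= 0.5).astype(int)                # assumed 0.5 threshold per label
    return {"micro_f1": f1_score(eval_pred.label_ids, preds, average="micro", zero_division=0)}

# Same Trainer as above, with the metric plugged in
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
)
```

Micro-averaging counts every (snippet, label) decision equally, which is a reasonable default when label frequencies are imbalanced, as they often are in call-intent data.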
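Once the model is saved, you can run inference on new snippets. The following is a minimal sketch under two assumptions: it reuses the all_labels list from the training script to map class indices back to names, and the 0.5 probability threshold (and the example sentence) are placeholders, not values from the original.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_dir = "./multi_intent_model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

text = "Hi, I need to move my appointment to next Tuesday."  # hypothetical snippet
inputs = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label: independent sigmoid per class, then threshold;
# a snippet can therefore receive zero, one, or several intents
probs = torch.sigmoid(logits)[0]
predicted = [label for label, p in zip(all_labels, probs) if p >= 0.5]
print(predicted)
```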