Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor code and simplify file processing #48

Closed
wants to merge 20 commits into from
Closed
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 2 additions & 3 deletions src/bin/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -18,12 +18,11 @@ const client = new MarkdownDB({
},
});

// Ignore top-level await errors
//@ts-ignore
await client.init();

//@ts-ignore
await client.indexFolder({
folderPath: contentPath,
ignorePatterns: ignorePatterns,
});

process.exit();
54 changes: 35 additions & 19 deletions src/lib/indexFolderToObjects.ts
Original file line number Diff line number Diff line change
@@ -1,48 +1,55 @@
import { WikiLink, getUniqueValues, recursiveWalkDir } from "../utils/index.js";
import { markdownToObject } from "./markdownToObject.js";
import { File, FileTag, Link, Tag } from "./types/schemaTypes.js";
import { getUniqueValues, recursiveWalkDir } from "../utils/index.js";
import type { WikiLink } from "../utils/index.js";
import { extractFileSchemeFromObject } from "../utils/extractFileSchemeFromObject.js";
import { readLocalMarkdownFileToObject } from "./readLocalMarkdownFileToObject.js";
import type { File, FileTag, Link, Tag } from "./types/schemaTypes.js";

Check warning on line 5 in src/lib/indexFolderToObjects.ts

View workflow job for this annotation

GitHub Actions / Lint & format check

'Tag' is defined but never used
mohamedsalem401 marked this conversation as resolved.
Show resolved Hide resolved

export function indexFolderToObjects(
folderPath: string,
ignorePatterns?: RegExp[],
pathToUrlResolver?: (filePath: string) => string
pathToUrlResolver: (filePath: string) => string,
ignorePatterns?: RegExp[]
) {
const filePathsToIndex = recursiveWalkDir(folderPath);
const files: File[] = [];
const tags: Tag[] = [];
const tags: string[] = [];
const fileTags: FileTag[] = [];
const links: Link[] = [];
const filteredFilePathsToIndex = filePathsToIndex.filter((filePath) => {
return !(
ignorePatterns && ignorePatterns.some((pattern) => pattern.test(filePath))
);
});
const filteredFilePathsToIndex = filePathsToIndex.filter((filePath) =>
shouldIncludeFile(filePath, ignorePatterns)
);

for (const filePath of filteredFilePathsToIndex) {
const fileObject = markdownToObject(folderPath, filePath, filePathsToIndex);
const fileObject = readLocalMarkdownFileToObject(
folderPath,
filePath,
filePathsToIndex,
pathToUrlResolver
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit awkward imo... Both the naming, the arguments it takes and unnecesarily adds a layer of abstraction. I wonder if we could get rid of the readLocalMarkdownFileToObject whatsoever. This could just be:

Suggested change
const fileObject = readLocalMarkdownFileToObject(
folderPath,
filePath,
filePathsToIndex,
pathToUrlResolver
);
const id = generateFileIdFromPath(filePath);
const extension = getFileExtensionFromPath(filePath);
if (MddbFile.supportedExtensions.includes(extension)) {
return { id, extension, filePath };
}
const pathRelativeToFolder = path.relative(folderPath, filePath);
const urlPath = pathToUrlResolver(pathRelativeToFolder);
const data = parseMarkdownFile(filePath)
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree.
This could benefit from a slight refactor...


const file = extractFileSchemeFromObject(fileObject);
Copy link
Member

@olayway olayway Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest "marshalling" (or molding the object into the ready-to-insert-to-db structure) is done by relevant Mddb* classes (in this case MddbFile). So I'd just add the fileObject to the list and let MddbFile do the rest. And if you'd really want to do this here, creating a function for it is not necessary, as you could just destructure what's needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ready to proceed, but I have a question about the structure of the data in the JSON files. Do you think it should mirror the format of the database? If they should align, we might want to keep things as they are. However, if variations are acceptable, I am more than willing to implement the modifications based on your recommended alternative structure.

files.push(file);

files.push(fileObject.file);
tags.push(...fileObject.tags);

const fileTagsToInsert = fileObject.tags.map((tag) => ({
tag: tag.name,
file: fileObject.file._id,
tag: tag,
file: fileObject._id,
}));
const uniqueFileTags = getUniqueValues(fileTagsToInsert);
fileTags.push(...uniqueFileTags);
fileTags.push(...fileTagsToInsert);

const linksToInsert: Link[] = processWikiLinks(
fileObject.links,
fileObject.file._id,
fileObject._id,
filePathsToIndex
);
links.push(...linksToInsert);
}

const uniqueTags = getUniqueValues(tags);
const TagsToInsert = uniqueTags.map((tag) => ({ name: tag }));
return {
files: files,
tags: uniqueTags,
tags: TagsToInsert,
fileTags: fileTags,
links: links,
};
Expand All @@ -56,8 +63,17 @@
return links
.map((link) => ({
from: fileId,
to: filePathsToIndex.find((file) => file === link.linkSrc)!,

Check warning on line 66 in src/lib/indexFolderToObjects.ts

View workflow job for this annotation

GitHub Actions / Lint & format check

Forbidden non-null assertion
link_type: link.linkType,
}))
.filter((link) => link.to !== undefined);
}

function shouldIncludeFile(
filePath: string,
ignorePatterns?: RegExp[]
): boolean {
return !(
ignorePatterns && ignorePatterns.some((pattern) => pattern.test(filePath))
);
}
82 changes: 0 additions & 82 deletions src/lib/markdownToObject.ts

This file was deleted.

24 changes: 8 additions & 16 deletions src/lib/markdowndb.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2,17 +2,9 @@ import knex, { Knex } from "knex";

import { MddbFile, MddbTag, MddbFileTag, MddbLink } from "./schema.js";
import { DbQueryManager } from "./DbQueryManager.js";
import { FilesQuery, LinkQuery } from "./types/DbQueryTypes.js";
import type { FilesQuery, LinkQuery } from "./types/DbQueryTypes.js";
import { indexFolderToObjects } from "./indexFolderToObjects.js";

export const defaultFilePathToUrl = (filePath: string) => {
let url = filePath
.replace(/\.(mdx|md)/, "")
.replace(/\\/g, "/") // replace windows backslash with forward slash
.replace(/(\/)?index$/, ""); // remove index from the end of the permalink
url = url.length > 0 ? url : "/"; // for home page
return encodeURI(url);
};
import { defaultFilePathToUrl } from "../utils/defaultFilePathToUrl.js";

export class MarkdownDB {
config: Knex.Config;
Expand Down Expand Up @@ -42,14 +34,14 @@ export class MarkdownDB {
await resetDatabaseTables(this.db);
const { files, tags, fileTags, links } = indexFolderToObjects(
folderPath,
ignorePatterns,
pathToUrlResolver
pathToUrlResolver,
ignorePatterns
);

MddbFile.batchInsert(this.db, files);
MddbTag.batchInsert(this.db, tags);
MddbFileTag.batchInsert(this.db, fileTags);
MddbLink.batchInsert(this.db, links);
await MddbFile.batchInsert(this.db, files);
await MddbTag.batchInsert(this.db, tags);
await MddbFileTag.batchInsert(this.db, fileTags);
await MddbLink.batchInsert(this.db, links);
}

async getFileById(id: string): Promise<MddbFile | null> {
Expand Down
65 changes: 65 additions & 0 deletions src/lib/readLocalMarkdownFileToObject.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
import fs from "fs";
import path from "path";
import { getUniqueValues, parseMarkdownContent } from "../utils/index.js";
import type { FileObject } from "./types/FileObject.js";
import { MddbFile } from "./schema.js";
import { generateFileIdFromPath } from "../utils/index.js";
import { getFileExtensionFromPath } from "../utils/index.js";

export function readLocalMarkdownFileToObject(
folderPath: string,
filePath: string,
filePathsToIndex: string[],
pathToUrlResolver: (filePath: string) => string
): FileObject {
const id = generateFileIdFromPath(filePath);
const extension = getFileExtensionFromPath(filePath);
const fileObject: FileObject = {
_id: id,
file_path: filePath,
extension,
url_path: null,
filetype: null,
metadata: null,
tags: [],
links: [],
};

// if not md or mdx return this
const isExtensionNotSupported =
!MddbFile.supportedExtensions.includes(extension);
if (isExtensionNotSupported) {
return fileObject;
}
mohamedsalem401 marked this conversation as resolved.
Show resolved Hide resolved

// url_path
const pathRelativeToFolder = path.relative(folderPath, filePath);
const urlPath = pathToUrlResolver(pathRelativeToFolder);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const urlPath = pathToUrlResolver(pathRelativeToFolder);
fileObject.url_path = pathToUrlResolver(pathRelativeToFolder);

// metadata, tags, links
let source: string, metadata, links;
try {
source = fs.readFileSync(filePath, {
encoding: "utf8",
flag: "r",
});

({ metadata, links } = parseMarkdownContent(source, {
permalinks: filePathsToIndex,
}));

fileObject.url_path = urlPath;
fileObject.metadata = metadata;
fileObject.filetype = metadata?.type || null;
fileObject.links = links;
const tags = metadata.tags;
if (tags) {
fileObject.tags = getUniqueValues(tags);
}

return fileObject;
} catch (e) {
console.error(`Error processing file ${filePath}: ${e}`);
return fileObject;
}
}
7 changes: 3 additions & 4 deletions src/lib/types/FileObject.ts
Original file line number Diff line number Diff line change
@@ -1,8 +1,7 @@
import { WikiLink } from "../../utils";
import { File, Tag } from "./schemaTypes";
import { File } from "./schemaTypes";

export interface FileObject {
file: File;
tags: Tag[];
export interface FileObject extends File {
tags: string[];
links: WikiLink[];
}
2 changes: 1 addition & 1 deletion src/lib/types/schemaTypes.ts
Original file line number Diff line number Diff line change
Expand Up @@ -31,4 +31,4 @@ export enum Table {
Tags = "tags",
FileTags = "file_tags",
Links = "links",
}
}
8 changes: 8 additions & 0 deletions src/utils/defaultFilePathToUrl.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
export const defaultFilePathToUrl = (filePath: string) => {
let url = filePath
.replace(/\.(mdx|md)/, "")
.replace(/\\/g, "/") // replace windows backslash with forward slash
.replace(/(\/)?index$/, ""); // remove index from the end of the permalink
url = url.length > 0 ? url : "/"; // for home page
return encodeURI(url);
};
12 changes: 12 additions & 0 deletions src/utils/extractFileSchemeFromObject.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
import { FileObject } from "../lib/types/FileObject.js";

export function extractFileSchemeFromObject(fileObject: FileObject) {
return {
_id: fileObject._id,
file_path: fileObject.file_path,
extension: fileObject.extension,
url_path: fileObject.url_path,
filetype: fileObject.filetype,
metadata: fileObject.metadata,
};
}
Comment on lines +1 to +12
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned before, I'd move "marshalling" responsibility to relevant Mddb* classes and do it only right before inserting into db.

4 changes: 3 additions & 1 deletion src/utils/index.ts
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
export { recursiveWalkDir } from "./recursiveWalkDir.js";
export { extractWikiLinks, WikiLink } from "./extractWikiLinks.js";
export { parseFile } from "./parseFile.js";
export { parseMarkdownContent } from "./parseMarkdownContent.js";
export { getUniqueValues } from "./getUniqueValues.js";
export { generateFileIdFromPath } from "./generateFileIdFromPath.js";
export { getFileExtensionFromPath } from "./getFileExtensionFromPath.js";
export { defaultFilePathToUrl } from "./defaultFilePathToUrl.js";
export { extractFileSchemeFromObject } from "./extractFileSchemeFromObject.js";
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
import { parseFile } from "./parseFile";
import { parseMarkdownContent } from "./parseMarkdownContent";

const source = `---
title: Hello World
Expand All @@ -25,7 +25,7 @@ describe("parseFile", () => {
{ linkType: "normal", linkSrc: "blog/Some Other Link" },
{ linkType: "embed", linkSrc: "Some Image.png" },
];
const { metadata, links } = parseFile(source);
const { metadata, links } = parseMarkdownContent(source);
expect(metadata).toEqual(expectedMetadata);
expect(links).toEqual(expectedLinks);
});
Expand All @@ -48,7 +48,7 @@ describe("parseFile", () => {
"/some/folder/blog/Some Other Link",
"/some/folder/Some Image.png",
];
const { metadata, links } = parseFile(source, { permalinks });
const { metadata, links } = parseMarkdownContent(source, { permalinks });
expect(metadata).toEqual(expectedMetadata);
expect(links).toEqual(expectedLinks);
});
Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
import matter from "gray-matter";
import { extractWikiLinks } from "./extractWikiLinks.js";

export function parseFile(source: string, options?: { permalinks?: string[] }) {
export function parseMarkdownContent(source: string, options?: { permalinks?: string[] }) {
// Metadata
const { data: metadata } = matter(source);

Expand Down
2 changes: 1 addition & 1 deletion tsconfig.lib.json
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
"strict": true,
"strictPropertyInitialization": false,
"target": "es2020",
"module": "es2020",
"module": "esnext",
"moduleResolution": "node",
"esModuleInterop": true,
"types": ["node"]
Expand Down
Loading