How to tokenize markdown using Node.js?

ⅰ亾dé卋堺 submitted on 2021-02-18 10:43:12

Question


I'm building an iOS app with a view whose content comes from markdown.

My idea is to parse markdown stored in MongoDB into a JSON object that looks something like this:

{
    "h1": "This is the heading",
    "p": "Here's the first paragraph",
    "link": {
        "text": "Text for link",
        "url": "http://exampledomain.com"
    }
}

On the server I am running Node.js, and I was looking at the module marked, which seems to be the most popular one out there. It gives me access to the Lexer, which tokenizes the markdown into a custom object. But when I look at that object, the link is not tokenized. If I go ahead and parse the markdown to HTML, the link is detected and the HTML looks correct.

After trying a few more modules without success, I thought that maybe I could do this on the client instead, and found MMMarkdown, which seemed promising. But again, it worked fine when parsing directly to HTML, whereas when I stopped halfway and only parsed the markdown into the so-called MMDocument, the result did not contain any MMElement of type Link.

So, is there anything fundamental about markdown parsing that I am missing? Is the lexing of inline links supposed to be done in a second pass, or something like that? I can't get my head around it.

If nothing else works, I might just use a UIWebView filled with the HTML from the parsed markdown. But then we would have to design the whole thing again with CSS, and we are running out of time, so we can't really afford the double work.


Answer 1:


Have you looked at https://github.com/evilstreak/markdown-js?

It seems to give you access to the syntax tree.

For example:

var md = require( "markdown" ).markdown,
    text = "Header\n---------------\n\n" +
           "This is a paragraph\n\n" +
           "This is [an example](http://example.com/ \"Title\") inline link.";

// parse the markdown into a syntax tree (nested arrays)
var tree = md.parse( text );

console.log(JSON.stringify(tree));

This produces (pretty-printed here for readability):

[
    "markdown",
    [
        "header",
        {
            "level": 2
        },
        "Header"
    ],
    [
        "para",
        "This is a paragraph"
    ],
    [
        "para",
        "This is ",
        [
            "link",
            {
                "href": "http://example.com/",
                "title": "Title"
            },
            "an example"
        ],
        " inline link."
    ]
]
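
To get from that nested array to the flat link objects the question asks for, the tree can be walked recursively. The following is only a minimal sketch: `collectLinks` is a hypothetical helper, not part of the markdown module, and it assumes the tree has exactly the shape printed above (element name first, then an optional attribute object, then children).

```javascript
// Sketch: collect every link from a markdown-js style nested-array tree.
// `collectLinks` is a hypothetical helper, not part of the markdown module.
function collectLinks(node, found = []) {
  if (!Array.isArray(node)) return found; // plain strings carry no links

  const [type, ...rest] = node;
  // The slot after the name is an attribute object only if it is a non-array object.
  const hasAttrs =
    rest.length > 0 &&
    rest[0] !== null &&
    typeof rest[0] === "object" &&
    !Array.isArray(rest[0]);
  const attrs = hasAttrs ? rest[0] : {};
  const children = hasAttrs ? rest.slice(1) : rest;

  if (type === "link") {
    found.push({
      text: children.filter((c) => typeof c === "string").join(""),
      url: attrs.href,
    });
  }
  children.forEach((child) => collectLinks(child, found));
  return found;
}

// The tree printed above, written out by hand so the sketch runs on its own.
const tree = [
  "markdown",
  ["header", { level: 2 }, "Header"],
  ["para", "This is a paragraph"],
  ["para", "This is ",
    ["link", { href: "http://example.com/", title: "Title" }, "an example"],
    " inline link."],
];

console.log(collectLinks(tree));
```

In a real script the hand-written `tree` would be replaced by the result of `md.parse( text )`.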



Answer 2:


Although this question is already quite a few years old, I wanted to give a little update.

I found the combination of unified and remark-parse a good fit for my situation. After installing those packages (with npm, yarn, pnpm or your favourite JS package manager) I wrote a little test script as follows:

// CommonJS form; it works with the older CommonJS releases of these packages.
// Newer releases are ESM-only and need `import` instead of `require`.
const unified = require('unified');
const markdown = require('remark-parse');

const tokens = unified()
  .use(markdown)
  .parse('# Hello world');

console.log(tokens);

This generates a syntax tree (in the mdast format) rather than the final JSON, so it needs further processing.

Maybe this is useful for someone else who stumbles upon this question.
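
As an illustration of what that further processing might look like, here is a minimal sketch that flattens such a tree into the kind of entries the question asks for. `textOf` and `toEntries` are hypothetical helpers, not part of unified or remark-parse, and the hand-written `sampleTree` only assumes the usual mdast node shape (`type`, `children`, `value`, `depth`, `url`) with position information left out.

```javascript
// Sketch: flatten an mdast-style tree into { h1 }, { p } and { link } entries.
// `textOf` and `toEntries` are hypothetical helpers, not part of unified or remark-parse.
function textOf(node) {
  // Leaf nodes such as `text` carry their content in `value`.
  if (typeof node.value === "string") return node.value;
  return (node.children || []).map(textOf).join("");
}

function toEntries(node, out = []) {
  if (node.type === "heading") {
    out.push({ ["h" + node.depth]: textOf(node) });
  } else if (node.type === "paragraph") {
    out.push({ p: textOf(node) });
  } else if (node.type === "link") {
    out.push({ link: { text: textOf(node), url: node.url } });
  }
  // Recurse so that links nested inside paragraphs are found as well.
  (node.children || []).forEach((child) => toEntries(child, out));
  return out;
}

// Hand-written stand-in for a parsed tree, so the sketch runs without the packages.
const sampleTree = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Hello world" }] },
    {
      type: "paragraph",
      children: [
        { type: "text", value: "See " },
        { type: "link", url: "http://example.com/", children: [{ type: "text", value: "the docs" }] },
      ],
    },
  ],
};

console.log(toEntries(sampleTree));
```

Note that a link is reported twice in a sense: its text is part of the enclosing paragraph entry, and it also gets its own link entry. Whether that is desirable depends on how the client renders the result.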




Answer 3:


Here's the code that I ended up using instead.

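// Split on both Windows (\r\n) and Unix (\n) line endings
var nodes = markdownText.split(/\r?\n/);
var content = [];

nodes.forEach(function(node) {

    // Heading 2 (checked before heading 1, since '##' also starts with '#')
    if (node.indexOf('##') === 0) {
        content.push({
            h2: node.replace('##', '').trim()
        });
    }

    // Heading 1
    else if (node.indexOf('#') === 0) {
        content.push({
            h1: node.replace('#', '').trim()
        });
    }

    // Link (Text + URL); only lines that start with '[' are considered
    else if (node.indexOf('[') === 0) {
        var matches = node.match(/\[(.*)\]\((.*)\)/);
        if (matches) {
            content.push({
                link: {
                    text: matches[1],
                    url: matches[2]
                }
            });
        } else {
            // Starts with '[' but is not a complete link: keep it as a paragraph
            content.push({
                p: node
            });
        }
    }

    // Paragraph
    else if (node.length > 0) {
        content.push({
            p: node
        });
    }

});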

I know this matching is very unforgiving, but in our case it works fine.
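
The code above only recognizes a link when it starts the line. If links inside a paragraph ever need to be picked up as well, one possible extension is to split each line into text and link segments. This is just a sketch: `splitInline` is a hypothetical helper, and its pattern deliberately ignores link titles, nested brackets and escaped characters.

```javascript
// Sketch: split one line into plain-text and link segments.
// `splitInline` is a hypothetical helper; it ignores titles, nested brackets and escapes.
function splitInline(line) {
  const linkPattern = /\[([^\]]*)\]\(([^)\s]*)\)/g;
  const segments = [];
  let last = 0; // index just past the previous match
  let match;

  while ((match = linkPattern.exec(line)) !== null) {
    // Text between the previous match and this link
    if (match.index > last) {
      segments.push({ text: line.slice(last, match.index) });
    }
    segments.push({ link: { text: match[1], url: match[2] } });
    last = match.index + match[0].length;
  }

  // Trailing text after the last link
  if (last < line.length) {
    segments.push({ text: line.slice(last) });
  }
  return segments;
}

console.log(splitInline("This is [an example](http://example.com/) inline link."));
```

A line without any link comes back as a single text segment, so the helper could be applied to every paragraph line.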



Source: https://stackoverflow.com/questions/22041795/how-to-tokenize-markdown-using-node-js
