Index tokens ('^', '_') are parsed as string #114

Open
slava-arapov opened this issue Sep 5, 2024 · 2 comments

Comments

@slava-arapov

Hi Jason.

Thanks for your parsing utilities. We are building a LaTeX formula editor, and your work has helped us a lot during development.

Unfortunately, I can't find a solution to one problem.

The problem

The superscript and subscript tokens (^, _) are common in mathematical expressions. They are recognized correctly at the top level of math mode, but when ^ or _ appears inside a group or at a deeper nesting level (an index of an index of an index), the parser treats it as plain text.

It seems that math mode stops being inherited inside a group.

I generated some ASTs in the Playground; here are some examples:

  1. $a_{b}$
    No groups. Works correctly:
{
  "type": "root",
  "content": [
    {
      "type": "inlinemath",
      "content": [
        {
          "type": "string",
          "content": "a"
        },
        {
          "type": "macro",
          "content": "_",
          "escapeToken": "",
          "args": [
            {
              "type": "argument",
              "content": [
                {
                  "type": "string",
                  "content": "b"
                }
              ],
              "openMark": "{",
              "closeMark": "}"
            }
          ]
        }
      ]
    }
  ]
}
  2. ${a_b}$
    Wrapped in a group. a_b is parsed as a string:
{
  "type": "root",
  "content": [
    {
      "type": "inlinemath",
      "content": [
        {
          "type": "group",
          "content": [
            {
              "type": "string",
              "content": "a_b"
            }
          ]
        }
      ]
    }
  ]
}
  3. $a_{b_{c}}$
    The first level is fine, but b_ inside the subscript argument is parsed as a string, which seems to be expected with the default _ macro settings:
{
  "type": "root",
  "content": [
    {
      "type": "inlinemath",
      "content": [
        {
          "type": "string",
          "content": "a"
        },
        {
          "type": "macro",
          "content": "_",
          "escapeToken": "",
          "args": [
            {
              "type": "argument",
              "content": [
                {
                  "type": "string",
                  "content": "b_"
                },
                {
                  "type": "group",
                  "content": [
                    {
                      "type": "string",
                      "content": "c"
                    }
                  ]
                }
              ],
              "openMark": "{",
              "closeMark": "}"
            }
          ]
        }
      ]
    }
  ]
}
  4. ${a_{b_{c}}}$
    The whole expression is wrapped in a group. a_ and b_ are parsed as strings:
{
  "type": "root",
  "content": [
    {
      "type": "inlinemath",
      "content": [
        {
          "type": "group",
          "content": [
            {
              "type": "string",
              "content": "a_"
            },
            {
              "type": "group",
              "content": [
                {
                  "type": "string",
                  "content": "b_"
                },
                {
                  "type": "group",
                  "content": [
                    {
                      "type": "string",
                      "content": "c"
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
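
The symptom in the dumps above can be checked mechanically. Below is a minimal sketch of such a check. The `AstNode` type and the `findUnparsedIndexTokens` helper are hypothetical names introduced only for illustration; they are modelled on the JSON shown in this issue and are not part of the library. The helper walks an AST and reports every string node inside inline math that still contains a raw ^ or _.

```typescript
// Minimal node shape modelled on the JSON dumps in this issue
// (an assumption for illustration, not the library's own type definitions).
type AstNode = {
  type: string;
  content?: string | AstNode[];
  args?: AstNode[];
};

// Hypothetical helper: collect string nodes inside inline math that still
// contain a raw "^" or "_".
function findUnparsedIndexTokens(node: AstNode, inMath: boolean = false): string[] {
  const math = inMath || node.type === "inlinemath";
  if (node.type === "string") {
    const text = typeof node.content === "string" ? node.content : "";
    return math && /[\^_]/.test(text) ? [text] : [];
  }
  // Visit both the node's content and any macro arguments.
  const children: AstNode[] = [];
  if (Array.isArray(node.content)) children.push(...node.content);
  if (node.args) children.push(...node.args);
  const found: string[] = [];
  for (const child of children) {
    found.push(...findUnparsedIndexTokens(child, math));
  }
  return found;
}

// AST for ${a_b}$, copied from the group-wrapped example above.
const groupedAst: AstNode = {
  type: "root",
  content: [
    {
      type: "inlinemath",
      content: [{ type: "group", content: [{ type: "string", content: "a_b" }] }],
    },
  ],
};

console.log(findUnparsedIndexTokens(groupedAst)); // the array contains "a_b"
```

For an AST that has no inlinemath wrapper but is known to be math throughout, `true` can be passed as the second argument.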

What I tried

In my project I tried to redefine macros this way:

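// Import added for completeness; the original snippet elided its surroundings.
import { processLatexViaUnified } from '@unified-latex/unified-latex';
... 
// Parse the whole input in math mode and redefine '^' and '_' as
// math-mode macros that take one mandatory argument (signature 'm').
const processor = processLatexViaUnified({
  mode: 'math',
  macros: {
    '^': {
      renderInfo: {
        inMathMode: true,
      },
      signature: 'm',
      escapeToken: '',
    },
    '_': {
      renderInfo: {
        inMathMode: true,
      },
      signature: 'm',
      escapeToken: '',
    },
  },
});

// latexString is defined in the elided code.
const latexAst = processor.parse(latexString);
...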

This handles the three-level case $a_{b_{c}}$, but not cases with four or more levels such as $a_{b_{c_d}}$:

{
 "type": "root",
 "content": [
  {
   "type": "group",
   "content": [
    {
     "type": "string",
     "content": "a",
     "position": {
      "start": {
       "offset": 1,
       "line": 1,
       "column": 2
      },
      "end": {
       "offset": 2,
       "line": 1,
       "column": 3
      }
     }
    }
   ],
   "position": {
    "start": {
     "offset": 0,
     "line": 1,
     "column": 1
    },
    "end": {
     "offset": 3,
     "line": 1,
     "column": 4
    }
   }
  },
  {
   "type": "macro",
   "content": "_",
   "escapeToken": "",
   "position": {
    "start": {
     "offset": 3,
     "line": 1,
     "column": 4
    },
    "end": {
     "offset": 4,
     "line": 1,
     "column": 5
    }
   },
   "_renderInfo": {
    "inMathMode": true
   },
   "args": [
    {
     "type": "argument",
     "content": [
      {
       "type": "group",
       "content": [
        {
         "type": "string",
         "content": "b",
         "position": {
          "start": {
           "offset": 1,
           "line": 1,
           "column": 2
          },
          "end": {
           "offset": 2,
           "line": 1,
           "column": 3
          }
         }
        }
       ],
       "position": {
        "start": {
         "offset": 0,
         "line": 1,
         "column": 1
        },
        "end": {
         "offset": 3,
         "line": 1,
         "column": 4
        }
       }
      },
      {
       "type": "macro",
       "content": "_",
       "escapeToken": "",
       "position": {
        "start": {
         "offset": 3,
         "line": 1,
         "column": 4
        },
        "end": {
         "offset": 4,
         "line": 1,
         "column": 5
        }
       },
       "_renderInfo": {
        "inMathMode": true
       },
       "args": [
        {
         "type": "argument",
         "content": [
          {
           "type": "group",
           "content": [
            {
             "type": "string",
             "content": "c",
             "position": {
              "start": {
               "offset": 6,
               "line": 1,
               "column": 7
              },
              "end": {
               "offset": 7,
               "line": 1,
               "column": 8
              }
             }
            }
           ],
           "position": {
            "start": {
             "offset": 5,
             "line": 1,
             "column": 6
            },
            "end": {
             "offset": 8,
             "line": 1,
             "column": 9
            }
           }
          },
          {
           "type": "string",
           "content": "_d",
           "position": {
            "start": {
             "offset": 8,
             "line": 1,
             "column": 9
            },
            "end": {
             "offset": 10,
             "line": 1,
             "column": 11
            }
           }
          }
         ],
         "openMark": "{",
         "closeMark": "}"
        }
       ]
      }
     ],
     "openMark": "{",
     "closeMark": "}"
    }
   ]
  }
 ],
 "_renderInfo": {
  "inMathMode": true
 }
}

It also does not fix the group-wrapped case ${a_b}$:

{
 "type": "root",
 "content": [
  {
   "type": "group",
   "content": [
    {
     "type": "string",
     "content": "a_b",
     "position": {
      "start": {
       "offset": 1,
       "line": 1,
       "column": 2
      },
      "end": {
       "offset": 4,
       "line": 1,
       "column": 5
      }
     }
    }
   ],
   "position": {
    "start": {
     "offset": 0,
     "line": 1,
     "column": 1
    },
    "end": {
     "offset": 5,
     "line": 1,
     "column": 6
    }
   }
  }
 ],
 "_renderInfo": {
  "inMathMode": true
 }
}

Questions

Is this behavior expected?

Are there any options or workarounds for parsing

  • group contents in math mode,
  • deeply nested indices (index of index of index of index)?

Thank you in advance.
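
One possible stopgap for the two questions above is to repair the AST after parsing. The following is a rough sketch under stated assumptions, not a feature of the library. The `LatexNode` type and the `repairIndexTokens` and `repairNode` helpers are hypothetical names modelled on the JSON dumps in this issue. The pass splits string nodes around ^ and _, then turns each token into a macro node that takes the following node (or a single character) as its argument. It should only be applied to content known to be in math mode, and the empty openMark/closeMark for unbraced arguments is an assumption.

```typescript
// Node shape modelled on the JSON dumps in this issue (illustrative only).
type LatexNode = {
  type: string;
  content?: string | LatexNode[];
  args?: LatexNode[];
  openMark?: string;
  closeMark?: string;
  escapeToken?: string;
};

const isIndexToken = (n: LatexNode): boolean =>
  n.type === "string" && (n.content === "^" || n.content === "_");

// Copy a node, repairing its children and macro arguments recursively.
function repairNode(node: LatexNode): LatexNode {
  const copy: LatexNode = { ...node };
  if (Array.isArray(node.content)) copy.content = repairIndexTokens(node.content);
  if (node.args) copy.args = node.args.map(repairNode);
  return copy;
}

// Hypothetical post-processing pass over the content of a math node.
function repairIndexTokens(nodes: LatexNode[]): LatexNode[] {
  // Pass 1: split strings so that every "^" / "_" is its own string node.
  const flat: LatexNode[] = [];
  for (const node of nodes) {
    if (node.type === "string" && typeof node.content === "string") {
      for (const piece of node.content.split(/([\^_])/)) {
        if (piece !== "") flat.push({ type: "string", content: piece });
      }
    } else {
      flat.push(repairNode(node));
    }
  }
  // Pass 2: turn each token into a macro and attach what follows as its argument.
  const out: LatexNode[] = [];
  for (let i = 0; i < flat.length; i++) {
    const node = flat[i];
    if (!isIndexToken(node)) {
      out.push(node);
      continue;
    }
    const macro: LatexNode = { type: "macro", content: node.content, escapeToken: "", args: [] };
    const next: LatexNode | undefined = flat[i + 1];
    if (next !== undefined && !isIndexToken(next)) {
      if (next.type === "group" && Array.isArray(next.content)) {
        // Braced argument: reuse the (already repaired) group content.
        macro.args = [{ type: "argument", content: next.content, openMark: "{", closeMark: "}" }];
        i++;
      } else if (next.type === "string" && typeof next.content === "string") {
        // Unbraced argument: TeX takes a single token, so use one character.
        const first: LatexNode = { type: "string", content: next.content[0] };
        const rest = next.content.slice(1);
        macro.args = [{ type: "argument", content: [first], openMark: "", closeMark: "" }];
        if (rest === "") i++;
        else flat[i + 1] = { type: "string", content: rest };
      } else {
        // Any other node (for example another macro) becomes the argument as-is.
        macro.args = [{ type: "argument", content: [next], openMark: "", closeMark: "" }];
        i++;
      }
    }
    out.push(macro);
  }
  return out;
}

// Content of the inlinemath node for ${a_{b_{c}}}$, copied from the dump above.
const wrappedMathContent: LatexNode[] = [
  { type: "group", content: [
    { type: "string", content: "a_" },
    { type: "group", content: [
      { type: "string", content: "b_" },
      { type: "group", content: [{ type: "string", content: "c" }] },
    ] },
  ] },
];

console.log(JSON.stringify(repairIndexTokens(wrappedMathContent), null, 1));
```

Because the pass cannot tell math from text on its own, running it over text-mode content (for example the argument of a text macro) would wrongly convert literal underscores, so the caller has to track the mode.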

@siefkenj
Owner

siefkenj commented Sep 7, 2024

The expected behavior is that, inside math, _{} is parsed as a macro and not as a string, so this sounds like a bug. It will be a little while before I have time to investigate further.

@siefkenj
Owner

siefkenj commented Sep 9, 2024

This issue goes deeper than I thought. The information about whether to parse in a math environment or a regular environment isn't propagated through to groups. Getting this to work correctly will require a substantial rework of the parsing algorithm.
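
To illustrate what propagating the mode into groups means, here is a toy recursive-descent parser. It is a sketch for illustration only and is not the library's parsing algorithm. The names `ToyNode` and `parseToy` are hypothetical, and the grammar covers only plain characters, braces, ^ and _. The relevant detail is that the `math` flag is passed explicitly into every nested group and argument.

```typescript
// Toy AST, loosely modelled on the node types shown in this issue.
type ToyNode =
  | { type: "string"; content: string }
  | { type: "group"; content: ToyNode[] }
  | { type: "macro"; content: string; args: ToyNode[][] };

function parseToy(src: string, mathMode: boolean): ToyNode[] {
  let pos = 0;

  // Parse nodes until "}" or end of input; `math` is handed down explicitly.
  function parseSeq(math: boolean): ToyNode[] {
    const nodes: ToyNode[] = [];
    while (pos < src.length && src[pos] !== "}") {
      const ch = src[pos++];
      if (ch === "{") {
        // The group inherits the mode of its parent.
        nodes.push({ type: "group", content: parseSeq(math) });
        pos++; // skip the closing "}"
      } else if (math && (ch === "_" || ch === "^")) {
        nodes.push({ type: "macro", content: ch, args: [parseArg(math)] });
      } else {
        // Merge consecutive plain characters into one string node.
        const last = nodes[nodes.length - 1];
        if (last !== undefined && last.type === "string") last.content += ch;
        else nodes.push({ type: "string", content: ch });
      }
    }
    return nodes;
  }

  // An argument is either a braced sequence or a single character.
  function parseArg(math: boolean): ToyNode[] {
    if (src[pos] === "{") {
      pos++;
      const inner = parseSeq(math);
      pos++; // skip the closing "}"
      return inner;
    }
    if (pos < src.length && src[pos] !== "}") {
      return [{ type: "string", content: src[pos++] }];
    }
    return [];
  }

  return parseSeq(mathMode);
}

// In math mode, every "_" becomes a macro, however deeply it is nested.
console.log(JSON.stringify(parseToy("{a_{b_{c_d}}}", true)));
// In text mode, the same characters stay part of a plain string.
console.log(JSON.stringify(parseToy("{a_b}", false)));
```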
