
Obtain local versions of Mediawiki files #18
Open · darksidemilk opened this issue Mar 11, 2023 · 5 comments

@darksidemilk (Member)

To make an initial conversion of the wiki's contents possible, we can use a local install of pandoc to convert the pages.

We first need to either find a built-in MediaWiki method for exporting the source files or build a web scraper to pull that content.
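For what it's worth, MediaWiki exposes page source through its action API (and Special:Export), so a full scraper may not be needed. A minimal sketch of pulling one page's wikitext; the api.php endpoint and page title below are placeholders:

    # Minimal sketch: fetch raw wikitext for a single page via the
    # MediaWiki action API. Endpoint and title are placeholders.
    import requests

    API = "https://wiki.example.org/api.php"
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "titles": "Main Page",
    }
    data = requests.get(API, params=params).json()
    for page in data["query"]["pages"].values():
        # With rvslots=main, the wikitext sits under slots -> main -> "*"
        wikitext = page["revisions"][0]["slots"]["main"]["*"]
        print(wikitext[:200])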

The output would likely go in a to-convert folder alongside the converted md files, which would gradually be vetted, updated, and moved into their new homes.
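Once the wikitext is local, the pandoc step is mechanical; a rough sketch, assuming a to-convert folder of .wiki files (folder name and extension are placeholders):

    # Convert each downloaded wikitext file to GitHub-flavored Markdown
    # with a local pandoc install.
    import subprocess
    from pathlib import Path

    for src in Path("to-convert").glob("*.wiki"):
        subprocess.run(
            ["pandoc", "-f", "mediawiki", "-t", "gfm", str(src),
             "-o", str(src.with_suffix(".md"))],
            check=True,
        )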

@Sebastian-Roth (Member)

> We first need to either find a built-in MediaWiki method for exporting the source files or build a web scraper to pull that content.

I think I did something similar in another project when moving from one wiki to another, to make absolutely sure all the contents made it across: I pulled everything down from both wikis and compared the contents.

I am sure I still have the (Python?) scripts somewhere. I'll dig them out and share them here.

@Sebastian-Roth (Member)

@darksidemilk Give this a try: https://gist.github.com/Sebastian-Roth/4e660a35b5c5be751c7f459b9f161cb1 (should work out of the box)

@darksidemilk (Member, Author)

@Sebastian-Roth It started out great but hit a snag. When the script reaches this page, I get this Python error:

title: Add & Extend a 2nd Virtual HDD, id: 4778, revs: 7
Traceback (most recent call last):
  File "..\fog-docs\wikiArchive\get-wikifiles.py", line 38, in <module>
    f.write(content)
  File "..AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I tried a couple of Python 3 releases; do you know if it matters which version of Python is used? Or maybe there's just something off in one of the revisions of that page?
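For what it's worth, the cp1252.py frame in the traceback points at the default file encoding rather than the Python version: on Windows, open() without an explicit encoding uses the locale code page. A quick way to check what that default is:

    # Print the encoding Python will use for open() when none is given;
    # on many Windows setups this is 'cp1252'.
    import locale
    print(locale.getpreferredencoding(False))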

@darksidemilk (Member, Author)

I added a try/except around the write:

with open(fn, 'w') as f:
    try:
        f.write(content)
    except UnicodeEncodeError:
        print(f"write failed for {fn}")

That got it to continue through.
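The cleaner fix is probably to open the file with an explicit encoding so no page gets skipped instead of silently failing; a minimal sketch using the same variable names as the script:

    # Write the output as UTF-8 so any character in the wiki text can be
    # encoded, regardless of the Windows locale code page.
    with open(fn, 'w', encoding='utf-8') as f:
        f.write(content)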

darksidemilk added a commit that referenced this issue Mar 19, 2023
…ere able to be downloaded with @Sebastian-Roth's python script, only need to convert the latest revisions. Just being sure to commit this so this isn't lost
@darksidemilk (Member, Author) commented Mar 19, 2023

I think I might have lost one file in my filtering down to the first rev only, but I'm not 100% sure. My PowerShell filtering showed 318 unique names once I removed all the _rev## strings, but there are 317 files in the wikiArchive folder after filtering. I'll try to figure out what may have been lost.

Never mind, it was the Python script itself that got deleted. We have all the wiki files local to the repo now. We don't have all the images, but this will still help a lot.
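For anyone repeating that sanity check, a rough Python equivalent of the filename comparison (folder name and _rev suffix pattern taken from the comment above):

    # Compare the number of files on disk against the number of unique
    # page names once the "_rev##" suffix is stripped.
    import re
    from pathlib import Path

    files = [p for p in Path("wikiArchive").iterdir() if p.is_file()]
    names = {re.sub(r"_rev\d+$", "", p.stem) for p in files}
    print(f"{len(files)} files, {len(names)} unique page names")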
