Cannot execute "git fat init" on Windows #42

Open
drauch opened this issue Jun 24, 2014 · 17 comments

@drauch

drauch commented Jun 24, 2014

After following the installation guide, I tried to execute "git fat init" on my repository, but I only get the following error:

user@pc ~/Desktop/git-fat/BareTestRepoClone1 (master)
$ git fat init
  File "c:\Users\user\Desktop\git-fat\git-fat-master\git-fat", line 526
    for path, sizes in sorted(pathsizes.items(), cmp=lambda (p1,s1),(p2,s2): cmp(max(s1),max(s2)), reverse=True):
                                                            ^
SyntaxError: invalid syntax

What's wrong? Do I have to use Python 2.x instead of 3.x?

@drauch
Author

drauch commented Jun 24, 2014

Yup, works with Python 2.7.x. Unfortunately I'm no Python programmer, but probably somebody should look into this issue :-)

@jedbrown
Owner

There are encoding issues on Windows due to conflicting choices made by Git and Python-3, so this is not currently supported.

@jjardon

jjardon commented Aug 26, 2014

I have this problem on Linux too (Arch).
On Arch, "python" is python3, not python2.

Easy fix: #!/usr/bin/env python -> #!/usr/bin/env python2

@bilderbuchi

I'm pretty sure this is a lambda syntax change; see here: "Using parentheses to unpack the arguments in a lambda is not allowed in Python3". See also this PEP.
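The removal can be seen directly: PEP 3113 dropped tuple parameter unpacking (the "lambda (p1,s1),(p2,s2): ..." form on line 526) from Python 3, so the parser rejects it before any code runs. A minimal sketch, assuming it is run under Python 3; the snippet string is made up to mirror the failing line:

```python
import sys

# Python 2 accepted tuple parameters in a signature; PEP 3113 removed them in Python 3.
OLD_FORM = "lambda (p1, s1), (p2, s2): 0"


def old_form_compiles():
    """Return True if this interpreter still accepts tuple parameters."""
    try:
        compile(OLD_FORM, "<git-fat-snippet>", "eval")
        return True
    except SyntaxError:
        return False


print("Python", sys.version_info[0], "accepts tuple parameters:", old_form_compiles())
```

Because the error is raised at compile time, the whole script fails to load on Python 3, not just the sorting code.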

@jedbrown
Owner

The main stumbling block for Python-3 is unicode. Git stores paths unencoded and we can pass it directly to the file system on Linux, but Windows requires encoding to create a huge string that is later parsed. I'm not wild about an intrusive unicode change that won't work on Windows, so I've been delaying. I don't know what is recommended here because the Python community has chosen a convention distinctly different from Git. Being a Git tool that is only incidentally written in Python, I would rather use the Git conventions.
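A minimal sketch of the mismatch, assuming Python 3: Git reports path names as raw bytes, while Python 3 prefers str. The "surrogateescape" error handler maps bytes that are not valid UTF-8 to lone surrogate code points, so the exact original bytes survive a round trip. This is one possible bridge, not what git-fat does; the Latin-1 path below is made up. (os.fsdecode and os.fsencode use this handler on POSIX systems; on Windows they use a different encoding and error handler.)

```python
# Hypothetical path as Git might report it: Latin-1 bytes, not valid UTF-8.
raw_path = b"caf\xe9/data.bin"


def bytes_to_text(raw):
    """Decode Git's unencoded path bytes without losing any byte."""
    return raw.decode("utf-8", "surrogateescape")


def text_to_bytes(text):
    """Recover the exact original bytes from a surrogateescape-decoded str."""
    return text.encode("utf-8", "surrogateescape")


text_path = bytes_to_text(raw_path)
assert text_to_bytes(text_path) == raw_path  # lossless round trip

# A strict decode rejects the same input.
try:
    raw_path.decode("utf-8")
except UnicodeDecodeError:
    print("strict UTF-8 decoding rejects this path")
```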

@bilderbuchi

I have to admit I don't see where unicode comes into play in the present issue, but wouldn't a comparison function similar to cmp=lambda item1, item2: cmp(max(item1[1]),max(item2[1])) be an easy and py2/py3 compatible fix for this problem?
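One caveat on the compatibility question: Python 3 removes both the cmp() builtin and the cmp= argument of sorted(), so a comparison-function fix would still fail there. If a comparison function is really wanted, functools.cmp_to_key (available since Python 2.7) can wrap it as a key. A minimal sketch; the pathsizes dict is hypothetical data standing in for git-fat's:

```python
from functools import cmp_to_key

# Hypothetical data: path -> list of blob sizes recorded for that path.
pathsizes = {"a.bin": [10, 20], "b.bin": [500], "c.bin": [30, 5]}


def compare_by_max_size(item1, item2):
    """Old-style comparison returning -1, 0 or 1, like Python 2's cmp()."""
    a, b = max(item1[1]), max(item2[1])
    return (a > b) - (a < b)  # stand-in for the cmp() builtin removed in Python 3


ordered = sorted(pathsizes.items(), key=cmp_to_key(compare_by_max_size), reverse=True)
print([path for path, sizes in ordered])  # ['b.bin', 'c.bin', 'a.bin']
```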

@jedbrown
Owner

The unicode issue has to be resolved to support python3. The sorted call should just use key=lambda p,s: max(s), but it's just the tip of the iceberg.
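A key function is called with one argument per item, so for pathsizes.items() it receives a single (path, sizes) tuple; a two-parameter "lambda p,s:" would raise TypeError when sorted() calls it. A minimal sketch of the key-based sort (valid on Python 2.4+ and 3.x), using hypothetical data:

```python
# Hypothetical data: path -> list of blob sizes recorded for that path.
pathsizes = {"a.bin": [10, 20], "b.bin": [500], "c.bin": [30, 5]}


def max_size(item):
    """Sort key: sorted() passes each (path, sizes) pair as a single argument."""
    path, sizes = item
    return max(sizes)


# Equivalent one-liner: key=lambda item: max(item[1])
ordered = sorted(pathsizes.items(), key=max_size, reverse=True)

for path, sizes in ordered:
    print(path, max(sizes))
```

A key is computed once per item, whereas a comparison function runs once per comparison, so the key form is usually also cheaper.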

@bilderbuchi

The sorted call should just use key=lambda p,s: max(s), but it's just the tip of the iceberg.

Yeah, I just realized that; I was hunting for the minimum supported Python version of git-fat to find out whether key would be a viable alternative.

@jedbrown
Owner

sorted appeared in python-2.4, including the key argument.

@bilderbuchi

Yes, I know. The question was: what is the minimum Python version that git-fat expects/needs?
By the way, might it be good to have a Python version check in the code that bails out and prints an error message (or just a warning) if Python 3 is used, as long as it isn't supported?

jedbrown added a commit that referenced this issue Aug 26, 2014
Python-3 requires careful handling of unicode to avoid breaking Git
semantics (which manages strings unencoded) or Python (which insists on
processing encoded strings on Windows).  This is not done yet, so error
cleanly for now.  See discussion in issue #42.

Suggested-by: Christoph Buchner
@jedbrown
Owner

Christoph Buchner writes:

> yes, I know. the question was: what is the minimum python version that
> git-fat expects/needs?

Most of my testing is with 2.7. 2.6 has worked, but beware the Python
issue12786 problem in issue #46 that may or may not bite you. (I intend
to fix this, but a portable fix is not trivial due to significantly
different semantics depending on the version and platform.) I haven't
tested earlier versions.

> btw, might be good to have a python version checker in the code that
> bails and prints an error message (or just a warning) if python3 is
> used, as long as it's not supported?

Done.
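A minimal sketch of such a guard, assuming the goal is simply to refuse Python 3 with a clear message; the function name and wording are hypothetical, not the check that was committed:

```python
import sys

# Hypothetical wording; not the message in git-fat's actual commit.
MESSAGE = "git-fat does not support Python 3 yet; please run it with Python 2.7."


def check_python_version(version_info=None):
    """Exit with a readable error instead of failing somewhere deep in the script."""
    if version_info is None:
        version_info = sys.version_info
    if version_info[0] >= 3:
        sys.exit(MESSAGE)  # prints MESSAGE to stderr and exits with status 1
```

It would be called as check_python_version() near the top of the script. Because Python compiles the whole file before executing any of it, a guard like this only fires if the rest of the file is itself valid Python 3 syntax (for example, once the line-526 lambda is rewritten); otherwise users still see the SyntaxError.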

@TimMensch

As of Python 3.4.1, the win32/Python 3 branch is entirely and profoundly broken. I spent most of an hour playing whack-a-mole with issues before I gave up.

In addition to the basic issues with 3.4.1, it also uses os.path.join to create Windows paths, but I'm using MSYS/MinGW, so the backslashes actually break things. It would be better to use posixpath.join directly, since Windows has long supported forward slashes in pretty much all paths.
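Both path flavours can be imported on any platform, which makes the difference easy to see: os.path is ntpath on native Windows Python and posixpath elsewhere. A minimal sketch; the object path is made up:

```python
import ntpath
import posixpath

parts = (".git", "fat", "objects", "0123abcd")  # hypothetical object path

windows_style = ntpath.join(*parts)   # what os.path.join gives on native Windows Python
posix_style = posixpath.join(*parts)  # forward slashes, as MSYS/MinGW tools expect

print(windows_style)  # .git\fat\objects\0123abcd
print(posix_style)    # .git/fat/objects/0123abcd
```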

Giving up on git-fat for now. I like the idea of a simpler Python-based git-media, but I don't really want to install multiple versions of Python on my system.

+1 to update to latest Python and actually support Windows. Until then, it doesn't work for me.

@jedbrown
Owner

jedbrown commented Jun 3, 2015 via email

@TimMensch

Hmm.... I don't know of such a guide. The basics, as I'm familiar with them, include:

  • Don't use backslash, ever. It's not worth it.
  • Don't JUST look for a leading slash to determine whether a path is absolute. Looking for a colon in the second position is safe for a Windows path, but a leading slash CAN be absolute if you're using MinGW/MSYS, where paths look like /c/foo/bar instead of c:\foo\bar.
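The two rules above can be sketched as a small heuristic. Both helpers are hypothetical, not git-fat code, and they deliberately ignore UNC paths (\\server\share); msys_to_windows also assumes any single-letter first component is a drive, which can misfire on a real POSIX directory named like "/u/...":

```python
def looks_absolute(path):
    """Heuristic absolute-path test covering Windows and MSYS/MinGW spellings."""
    # Native Windows: drive letter, colon, separator ("c:\\foo", "C:/foo").
    if len(path) >= 3 and path[0].isalpha() and path[1] == ":" and path[2] in "\\/":
        return True
    # POSIX and MSYS/MinGW: leading slash ("/usr/bin", "/c/foo/bar").
    return path.startswith("/")


def msys_to_windows(path):
    """Convert an MSYS drive path like /c/foo/bar to c:/foo/bar; leave others alone."""
    if len(path) >= 2 and path[0] == "/" and path[1].isalpha() and (len(path) == 2 or path[2] == "/"):
        return path[1] + ":" + (path[2:] or "/")
    return path


print(looks_absolute("/c/foo/bar"), msys_to_windows("/c/foo/bar"))
```

Note that "c:foo" is drive-relative on Windows, so the sketch requires a separator after the colon before calling a path absolute.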

Aside from that, things mostly Just Work when you're using Python, at least in my experience (I just wrote a huge Git tool for my last employer that worked on OS X with only a couple of minor tweaks). A few tools (Perforce, maybe?) do have trouble with forward slashes, and if you're trying to complete paths at the CMD prompt, only backslash works, but nothing that you're doing in git-fat should have a problem with forward slashes.

MOST of the problems I saw were buffer-vs-string (bytes vs. str) issues in Python 3.4. In some places you were doing everything in buffers and then doing a string op on them, and it would error out. In other places you were passing a buffer AND a string to, for example, os.rename(), and it would complain that both parameters need to be the same type.
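These errors come from Python 3 refusing to mix bytes and str in path functions. A minimal sketch of normalizing every component to one type at the boundary; choosing str via os.fsdecode is an assumption here (os.fsencode would normalize to bytes instead), and join_paths is a hypothetical helper:

```python
import os


def join_paths(*parts):
    """Join path components after converting any bytes components to str."""
    return os.path.join(*(os.fsdecode(p) for p in parts))


# Mixing bytes and str in one call fails in Python 3.
try:
    os.path.join(b".git", "fat")
except TypeError as exc:
    print("TypeError:", exc)

print(join_paths(b".git", "fat", b"objects"))
```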

So updating to the latest 3.4.x version of Python and making it run there should get you 95% of the way past the problems I saw.

One thing that did worry me is that sys.getfilesystemencoding() was returning "mbcs", i.e., multi-byte characters, which I think is the wrong thing.

According to this link, Python 3 is supposed to support UTF-8 file names, even on Windows, so I think converting all path names to and from UTF-8 is a better practice than relying on sys.getfilesystemencoding(). Internally, Windows uses UTF-16 for path names; assuming Python is doing the "right thing", it should take any UTF-8 string and convert it straight to UTF-16 using the "Unicode" APIs (if you don't know, these are two encodings of the same set of code points, so converting between them is trivial). In fact, this link implies that Python 3.2+ is doing the right thing, so you should always just hand it UTF-8 (or otherwise Unicode) file names. (On Windows, the APIs that end in "W" are the wide-char, i.e., Unicode, APIs. The ones that end in "A" are the ANSI, i.e., code-page or mbcs, APIs.)
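The claim that UTF-8 and UTF-16 are two encodings of the same code points can be checked directly: decoding one and re-encoding as the other is lossless for valid text. A minimal sketch with a made-up file name; "utf-16-le" is the little-endian byte order Windows uses for its W APIs. (For reference, Python 3.6 and later switched sys.getfilesystemencoding() on Windows from "mbcs" to "utf-8", per PEP 529.)

```python
import sys

name = "r\u00e9sum\u00e9 \u6587\u66f8.bin"  # hypothetical non-ASCII file name

utf8 = name.encode("utf-8")
utf16 = utf8.decode("utf-8").encode("utf-16-le")  # the bytes a W API would receive

assert utf16.decode("utf-16-le").encode("utf-8") == utf8  # lossless both ways
print(len(utf8), len(utf16))
print(sys.getfilesystemencoding())
```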

So in short: Use UTF-8 encoding for file names and forward slash, and get it running on 3.4.x, and it should all Just Work.

Looking at what I did accomplish, I'd forgotten that I actually got things to the point where I could try to PUSH files, but then the rsync command was failing for other reasons. I have no idea whether the "pull" side works at all, for obvious reasons. Here's what I had: https://gist.github.com/TimMensch/86064e34d8c901dbb5c3

I guess what it still needs is a back end other than the command-line rsync. That's probably out of scope for your project, but if you ever did get S3 working, that would be awesome.

@jedbrown
Owner

jedbrown commented Jun 3, 2015 via email

@TimMensch

UTF-8 agrees with ASCII for the full first 128 byte values (0-127), so
it's safe to use for '.' and '/'. It's only from 128 upward that it
diverges from any of the Windows code pages or mbcs encodings.

Tim

On 6/3/2015 3:55 PM, Jed Brown wrote:

> Thanks for the run-down. With regard to file names, my first priority
> is to be exactly compatible with Git, which stores file names with
> unspecified encoding (the encoding must agree with ASCII for '.' and
> '/'). You can debate whether that was the best choice, but it's how
> Git works. This creates some impedance mismatch with Python
> (especially version 3, which wants to be strict about filename encodings).
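The ASCII-compatibility point in this exchange can be checked byte by byte: in UTF-8, every byte of a multi-byte sequence is 0x80 or higher, so splitting raw path bytes on b"/" never cuts a non-ASCII character in half. A minimal sketch with a made-up path; split_path_bytes is a hypothetical helper, not a git-fat function:

```python
def split_path_bytes(raw):
    """Split unencoded path bytes on '/' the way a Git-style tool might."""
    return raw.split(b"/")


path = "donn\u00e9es/\u00fcber/\u6587\u66f8.bin"  # hypothetical non-ASCII path
raw = path.encode("utf-8")

# Each non-ASCII character encodes to bytes that are all >= 0x80 in UTF-8.
for ch in path:
    if ord(ch) >= 0x80:
        assert all(b >= 0x80 for b in ch.encode("utf-8"))

# So splitting the bytes yields the same components as splitting the text.
components = [c.decode("utf-8") for c in split_path_bytes(raw)]
assert components == path.split("/")
print(components)
```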


Reply to this email directly or view it on GitHub
#42 (comment).

@jedbrown
Owner

jedbrown commented Jun 4, 2015 via email
