
Feature/commit more changes #37

Open

wants to merge 14 commits into base: main

Conversation


@Arthur-MONNET Arthur-MONNET commented Oct 4, 2024

🚀 Proposal: Enhanced Commit Message Generation for Large Projects 🌟

Description

As a contributor to this project, I'm excited to propose an update to the commit message generation system. This enhancement will significantly improve how we document changes in large-scale projects.

New Features:

  • Large Diff Handling: When the overall diff exceeds OpenAI's token limit, we split the diff by file.
  • Per-File Summaries: Each file is analyzed individually, producing a concise summary of its diff.
  • Improved Commit Messages: These summaries are compiled into a comprehensive, clear commit message covering all changes.

Methodology:

  1. Diff Size Check: If the total diff is too large, we divide it by file.
  2. API Calls per File: An API call to OpenAI is made for each file to obtain a concise summary of its changes.
  3. Summary Compilation: The summaries are compiled into a structured commit message.
  4. Fallback Mechanism: If the diff is below the threshold, the existing commit generation process is used unchanged.
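The steps above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `generateCommitMessage`, `countTokens`, and the injected helpers are hypothetical stand-ins, and the ~4-characters-per-token estimate is a placeholder for a real tokenizer such as `gpt-3-encoder`.

```javascript
// Hypothetical sketch of the methodology above; names and the token
// heuristic are assumptions, not the PR's real implementation.
const MAX_TOKENS = 4000;

// Very rough token estimate (~4 characters per token), standing in for a
// real tokenizer.
function countTokens(text) {
  return Math.ceil(text.length / 4);
}

async function generateCommitMessage(diff, { splitByFile, summarizeFileDiff, generateCommit }) {
  // Steps 1 & 4: small diffs use the existing single-call flow.
  if (countTokens(diff) <= MAX_TOKENS) {
    return generateCommit(diff);
  }
  // Step 2: summarize each per-file diff with its own API call.
  const fileDiffs = splitByFile(diff);
  const summaries = [];
  for (const fileDiff of fileDiffs) {
    summaries.push(await summarizeFileDiff(fileDiff));
  }
  // Step 3: compile the per-file summaries into one commit message.
  return generateCommit(summaries.join('\n'));
}
```

Injecting the helpers keeps the routing logic testable without touching the OpenAI API.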

Advantages:

  • Efficient Documentation: Reduces the risk of exceeding token limits while improving commit accuracy.
  • Scalability: Handles projects with many modified files effectively.
  • Clarity: Ensures each file's changes are reflected in the final commit message.

Technical Changes

  • New `parseDiffByFile` Function: Splits the global diff into per-file diffs.
  • Updates to `openai.js` and `index.js`: Integrate the per-file summarization logic and adjust the API calls.
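A `parseDiffByFile` helper could plausibly split a unified diff on its per-file `diff --git` headers. This is a guess at the shape of the PR's function, not its actual implementation:

```javascript
// Plausible sketch of parseDiffByFile: splits `git diff` output into one
// chunk per file using the "diff --git" header that starts each file
// section. Not the PR's actual code.
function parseDiffByFile(diff) {
  const files = [];
  let current = null;
  for (const line of diff.split('\n')) {
    if (line.startsWith('diff --git ')) {
      if (current) files.push(current.join('\n')); // close previous file
      current = [line];                            // start a new file chunk
    } else if (current) {
      current.push(line);
    }
  }
  if (current) files.push(current.join('\n'));     // flush the last file
  return files;
}
```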

Impact

This update will significantly improve how we document changes in our project, especially for larger pull requests. It ensures that every modification is correctly captured and integrated into the Git history.


Thank you for considering this proposal! I believe this enhancement will greatly benefit our development workflow and contribute to better code documentation.


Feel free to ask any questions or suggest improvements. I'm eager to hear your thoughts! 🤔

Arthur Monnet added 2 commits October 2, 2024 18:00
- Add `gpt-3-encoder` import and define `MAX_TOKENS` in `index.js`
- Implement `parseDiffByFile` function to handle large diffs
- Modify `generateAICommit` to manage diffs exceeding `MAX_TOKENS`, with user confirmations
- Refactor `sendMessage` in `openai.js` for improved readability
- Update prompts in commit generation functions for clarity
- Add `getPromptForDiffSummary` method to summarize git diffs per file
- Decrease `MAX_TOKENS` in `openai.js` from 128k to 4k, with a note that the value is adjustable
@Arthur-MONNET (Author) commented:

ready for review

@insulineru (Owner) left a comment:

Hey, thanks for your work and the improvements.

The comments need to be changed; they are not in English. Also, I'm not sure large commits need to be supported at all: committing everything in one commit is bad practice and should be avoided.

It also seems that analyzing each file separately, and then all of them together, would significantly slow the solution down. I use OpenAI gpt-4o and don't run into problems with the model failing to detect files; here is an example:

[screenshot]

It would be great to see measurable changes in prompt speed (benchmarks), plus concrete examples showing that per-file analysis improves the final result (the same files in two different versions).

Without that, it doesn't seem worth merging.

Review threads on index.js (4, outdated, resolved) and package.json (resolved).
- Replace French comments with their English equivalents in `index.js`.
- Clarify the functionality of code blocks with updated comments in both `index.js` and `openai.js`.
- Introduce a new function description for generating prompts from diffs in `openai.js`.
@Arthur-MONNET (Author) commented:

Hello,

I understand your concerns about the speed of generating commit messages and the potential impact on performance. However, my proposal should not be compared directly with the current version. Here's why:

Context of the Split Method

  • Conditional Use: The file-splitting method is used only when the current method, which relies on a single ChatGPT call over the whole diff, would exceed the token limit. It therefore kicks in only in the specific cases where a single call cannot be made.

Advantages of the Split Method

  • Handling Large Files: By analyzing each file separately, we can skip automatically generated files with enormous diffs whose changes are not worth including in the commit message. This lets us focus on reasonably sized files while excluding the largest ones.

  • Autonomy and Simplicity: This method avoids having to configure commit-ai with any file-exclusion system. The program handles it on its own by excluding overly large files, which keeps the process simple and autonomous.
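The automatic exclusion described above might look something like this. It is an illustrative sketch only: `PER_FILE_TOKEN_LIMIT`, `countTokens`, and `selectSummarizableDiffs` are assumed names and values, not taken from the PR.

```javascript
// Illustrative sketch of auto-excluding oversized per-file diffs (e.g. lock
// files or generated bundles). The limit and helper names are assumptions.
const PER_FILE_TOKEN_LIMIT = 4000;

// Rough ~4 characters-per-token estimate, standing in for a real tokenizer.
function countTokens(text) {
  return Math.ceil(text.length / 4);
}

function selectSummarizableDiffs(fileDiffs) {
  const kept = [];
  const skipped = [];
  for (const fileDiff of fileDiffs) {
    if (countTokens(fileDiff) > PER_FILE_TOKEN_LIMIT) {
      // Likely a generated or bundled file; omit it from the summary.
      skipped.push(fileDiff);
    } else {
      kept.push(fileDiff);
    }
  }
  return { kept, skipped };
}
```

Because the filter is size-based, no per-project exclusion list is needed; users get the behavior without any configuration.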

Why is this Method Useful?

  • For Users Exceeding Token Limits: This method is particularly useful for those who exceed the token limit when calling GPT. In all other cases, the code works as originally intended, with a single call over the complete diff.

  • Backup Solution: If the original behavior works well, commit-ai continues to use it. This PR is just a fallback for users who run into token limit issues.

Reflection on Commit Practices

It's true that making commits with too many changes is bad practice. However, this method lets us cover a broader range of developers and handle a variety of commit situations. When it takes a bit longer, it is usually because the user's commit habits are less than optimal in the first place.

In conclusion, this approach aims to offer additional flexibility without impacting those who don't need it. I am convinced that this solution can enrich our tool and meet the needs of many users.

Thank you for your consideration. 😊
