
Feature/commit more changes #37

Open

wants to merge 14 commits into base: main

Conversation


@Arthur-MONNET Arthur-MONNET commented Oct 4, 2024

🚀 Proposal: Enhanced Commit Message Generation for Large Projects 🌟

Description

As a contributor to this project, I'm excited to propose an update to the commit message generation system. This enhancement will significantly improve how we document changes in large-scale projects.

New Features:

  • Large Diff Handling: When the overall diff exceeds OpenAI's token limit, we split the diff by file.
  • Per-File Summaries: Each file is analyzed individually, producing a concise summary of its diff.
  • Improved Commit Messages: These summaries are compiled into a comprehensive, clear commit message covering all changes.

Methodology:

  1. Diff Size Check: If the total diff is too large, we divide it by file.
  2. API Calls per File: An API call to OpenAI is made for each file to obtain a concise summary of its changes.
  3. Summary Compilation: The summaries are compiled into a structured commit message.
  4. Fallback Mechanism: If the diff is below the threshold, the existing commit generation process is used unchanged.
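The steps above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `generateCommitMessage`, `countTokens`, and the injected helpers are hypothetical stand-ins, and the ~4-characters-per-token estimate is a placeholder for a real tokenizer such as `gpt-3-encoder`.

```javascript
// Hypothetical sketch of the methodology above; names and the token
// heuristic are assumptions, not the PR's real implementation.
const MAX_TOKENS = 4000;

// Very rough token estimate (~4 characters per token), standing in for a
// real tokenizer.
function countTokens(text) {
  return Math.ceil(text.length / 4);
}

async function generateCommitMessage(diff, { splitByFile, summarizeFileDiff, generateCommit }) {
  // Steps 1 & 4: small diffs use the existing single-call flow.
  if (countTokens(diff) <= MAX_TOKENS) {
    return generateCommit(diff);
  }
  // Step 2: summarize each per-file diff with its own API call.
  const fileDiffs = splitByFile(diff);
  const summaries = [];
  for (const fileDiff of fileDiffs) {
    summaries.push(await summarizeFileDiff(fileDiff));
  }
  // Step 3: compile the per-file summaries into one commit message.
  return generateCommit(summaries.join('\n'));
}
```

Injecting the helpers keeps the routing logic testable without touching the OpenAI API.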

Advantages:

  • Efficient Documentation: Reduces the risk of exceeding token limits while improving commit accuracy.
  • Scalability: Handles projects with many modified files effectively.
  • Clarity: Ensures each file's changes are reflected in the final commit message.

Technical Changes

  • New `parseDiffByFile` Function: Splits the global diff into per-file diffs.
  • Updates to `openai.js` and `index.js`: Integrate the per-file summarization logic and adjust the API calls.
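A `parseDiffByFile` helper could plausibly split a unified diff on its per-file `diff --git` headers. This is a guess at the shape of the PR's function, not its actual implementation:

```javascript
// Plausible sketch of parseDiffByFile: splits `git diff` output into one
// chunk per file using the "diff --git" header that starts each file
// section. Not the PR's actual code.
function parseDiffByFile(diff) {
  const files = [];
  let current = null;
  for (const line of diff.split('\n')) {
    if (line.startsWith('diff --git ')) {
      if (current) files.push(current.join('\n')); // close previous file
      current = [line];                            // start a new file chunk
    } else if (current) {
      current.push(line);
    }
  }
  if (current) files.push(current.join('\n'));     // flush the last file
  return files;
}
```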

Impact

This update will significantly improve how we document changes in our project, especially for larger pull requests. It ensures that every modification is correctly captured and integrated into the Git history.


Thank you for considering this proposal! I believe this enhancement will greatly benefit our development workflow and contribute to better code documentation.


Feel free to ask any questions or suggest improvements. I'm eager to hear your thoughts! 🤔

Arthur Monnet added 2 commits October 2, 2024 18:00
- Add `gpt-3-encoder` import and define `MAX_TOKENS` in `index.js`
- Implement `parseDiffByFile` function to handle large diffs
- Modify `generateAICommit` to manage diffs exceeding `MAX_TOKENS`, with user confirmations
- Refactor `sendMessage` in `openai.js` for improved readability
- Update prompts in commit generation functions for clarity
- Add `getPromptForDiffSummary` method to summarize git diffs per file
- Decrease `MAX_TOKENS` in `openai.js` from 128k to 4k, with a note that the value is adjustable
@Arthur-MONNET (Author) commented:

ready for review

@insulineru (Owner) left a comment:

Hey, thanks for your work and the improvements.

The comments need to be changed; they are not in English. Also, I'm not sure large commits need to be supported at all: committing everything in one commit is bad practice and should be avoided.

It also seems that analyzing each file separately, and then all of them together, would significantly slow the solution down. I use OpenAI gpt-4o and don't run into problems with the model failing to detect files; here is an example:

[screenshot]

It would be great to see measurable changes in prompt speed (benchmarks), plus concrete examples showing that per-file analysis improves the final result (the same files in two different versions).

Without that, it doesn't seem worth merging.

Review threads on index.js (4, outdated, resolved) and package.json (resolved).
- Replace French comments with their English equivalents in `index.js`.
- Clarify the functionality of code blocks with updated comments in both `index.js` and `openai.js`.
- Introduce a new function description for generating prompts from diffs in `openai.js`.
@Arthur-MONNET (Author) commented:

Hello,

I understand your concerns about the speed of generating commit messages and the potential impact on performance. However, my proposal should not be compared directly with the current version. Here's why:

Context of the Split Method

  • Conditional Use: The file-splitting method is used only when the current method, which relies on a single ChatGPT call over the whole diff, would exceed the token limit. It therefore kicks in only in the specific cases where a single call cannot be made.

Advantages of the Split Method

  • Handling Large Files: By analyzing each file separately, we can skip automatically generated files with enormous diffs whose changes are not worth including in the commit message. This lets us focus on reasonably sized files while excluding the largest ones.

  • Autonomy and Simplicity: This method avoids having to configure commit-ai with any file-exclusion system. The program handles it on its own by excluding overly large files, which keeps the process simple and autonomous.
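The automatic exclusion described above might look something like this. It is an illustrative sketch only: `PER_FILE_TOKEN_LIMIT`, `countTokens`, and `selectSummarizableDiffs` are assumed names and values, not taken from the PR.

```javascript
// Illustrative sketch of auto-excluding oversized per-file diffs (e.g. lock
// files or generated bundles). The limit and helper names are assumptions.
const PER_FILE_TOKEN_LIMIT = 4000;

// Rough ~4 characters-per-token estimate, standing in for a real tokenizer.
function countTokens(text) {
  return Math.ceil(text.length / 4);
}

function selectSummarizableDiffs(fileDiffs) {
  const kept = [];
  const skipped = [];
  for (const fileDiff of fileDiffs) {
    if (countTokens(fileDiff) > PER_FILE_TOKEN_LIMIT) {
      // Likely a generated or bundled file; omit it from the summary.
      skipped.push(fileDiff);
    } else {
      kept.push(fileDiff);
    }
  }
  return { kept, skipped };
}
```

Because the filter is size-based, no per-project exclusion list is needed; users get the behavior without any configuration.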

Why is this Method Useful?

  • For Users Exceeding Token Limits: This method is particularly useful for those who exceed the token limit when calling GPT. In all other cases, the code works as originally intended, with a single call over the complete diff.

  • Backup Solution: If the original behavior works well, commit-ai continues to use it. This PR is just a fallback for users who run into token limit issues.

Reflection on Commit Practices

It's true that making commits with too many changes is bad practice. However, this method lets us cover a broader range of developers and handle a variety of commit situations. When it takes a bit longer, it is usually because the user's commit habits are less than optimal in the first place.

In conclusion, this approach aims to offer additional flexibility without impacting those who don't need it. I am convinced that this solution can enrich our tool and meet the needs of many users.

Thank you for your consideration. 😊
