Awstats appears to be inconsistent and slow #251

Open
jmafc opened this issue Jul 1, 2024 · 4 comments

Comments


jmafc commented Jul 1, 2024

Describe the bug
When I ran awstats for May at the beginning of June, the numbers appeared much higher than for the previous month, by an order of magnitude. However, although traffic to my website has increased, it has not increased by that much, as evidenced by (a) the size of the log files (the number of lines increased by about 30%) and (b) traffic reported by Google Analytics. In addition, awstats has been taking much longer to complete its analysis than it did, say, two months ago (even during weekly updates). It used to process weekly updates in 30 minutes or so, but it now takes two to three hours.

To Reproduce
In order to determine the cause for these problems, I completely removed awstats from my system and reinstalled it. Then I started re-running it with the log files from January 2024 to now. The most surprising thing is that the monthly reports show numbers that differ substantially from the previously produced reports.

Expected behavior
Perhaps I misunderstand something, but I thought that each line in the log file represents a "hit", so that in a monthly report, in the Summary section, the number of "Viewed traffic" plus "Not viewed traffic" hits ought to correspond approximately to the number of lines in the log file and, more precisely, equal the number of "new qualified records". However, for April, for example, the "Viewed traffic" hits are about 65% of the qualified records, while the "Not viewed traffic" hits are reported as 4.3 times the number of qualified records, as if awstats had extrapolated data that is not present in the log file (or counted some lines multiple times).

Screenshots
I can provide some if necessary.

Desktop (please complete the following information):

  • OS: Debian Linux 12
  • Browser: Firefox/Chromium (not really relevant)

Smartphone (please complete the following information):
N/A

Additional context
The system on which awstats has been running has not changed hardware-wise in the past six months, and has been on Debian stable all this time, which doesn't get much in the way of software updates. FWIW, it's running Perl 5.36.0.


jmafc commented Jul 2, 2024

The results of processing the June log file are even more incomprehensible:

Parsed lines in file: 280774
 Found 100 dropped records,
 Found 0 comments,
 Found 0 blank records,
 Found 179545 corrupted records,
 Found 0 old records,
 Found 101129 new qualified records.

Note: for comparison, for May it only found 273 corrupted records out of 210647 parsed lines.

Yet, the Summary section shows the following numbers:

                          Pages         Hits
Viewed traffic *        1,485,447    1,616,533
Not viewed traffic *    2,423,308    2,894,034

How can there be over 4 million hits in a file that has 100k records?
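
As a sanity check on the figures above, here is a minimal Python sketch that uses only the numbers quoted in this comment (the variable names are mine). It shows that the record categories in the parse summary add up exactly to the parsed line count, while the hits in the Summary section are far larger than the number of qualified records.

```python
# Figures copied from the June parse summary and Summary section above.
parsed_lines = 280_774
categories = {
    "dropped": 100,
    "comments": 0,
    "blank": 0,
    "corrupted": 179_545,
    "old": 0,
    "new_qualified": 101_129,
}
viewed_hits = 1_616_533
not_viewed_hits = 2_894_034

# The categories should partition the parsed lines.
category_total = sum(categories.values())
print(category_total == parsed_lines)

# Reported hits versus qualified records.
reported_hits = viewed_hits + not_viewed_hits
ratio = reported_hits / categories["new_qualified"]
print(reported_hits, round(ratio, 1))
```

So the parse summary is internally consistent, but the report shows roughly 44 to 45 hits per qualified record.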


jmafc commented Jul 8, 2024

Some further observations, based on just the first six days of July. I allowed the awstats 'update' to run independently, rather than start it manually and endure its apparent slowness.

  • Actual number of lines in log file: 57,495
  • Number of Pages/Hits per awstats report: 530,136 / 598,632
  • Actual number of lines dated 01/Jul/2024: 8396
  • Number of Pages/Hits reported for 01 Jul 2024: 63,624 / 77,976
  • Actual number of unique IPs in log file: 22,516
  • Unique visitors / Number of visits, per report: 15,945 / 118,665
  • Number of lines in log file from "top host" reported IP: 340 (Last visit time agrees)
  • Number of Pages/Hits for "top host" per report: 8,136 / 8,160
  • Number of lines in log file from 2nd "top host" reported IP: 81 (Last visit time agrees)
  • Number of Pages/Hits for 2nd "top host" per report: 1,944 / 1,944

Can someone please confirm that the number of Hits on the report should agree with the number of lines in the log file for any given period or identifiable subset, such as a given IP or date? Or, if that is not the case, can you explain how the two numbers are related?
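
For anyone who wants to make the same comparison, here is a rough Python sketch of how the "actual" line counts per IP and per day can be obtained. It assumes the log is in Common/Combined Log Format (an assumption; the format is not stated in this issue), and the sample lines and addresses are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout: 'IP ident user [dd/Mon/yyyy:HH:MM:SS zone] "request" ...'
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):')

def count_lines(lines):
    """Return (per_ip, per_day) Counters of log lines; unparsable lines are skipped."""
    per_ip, per_day = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        per_ip[m.group(1)] += 1
        per_day[m.group(2)] += 1
    return per_ip, per_day

# Made-up sample lines using documentation-reserved addresses.
sample = [
    '203.0.113.5 - - [01/Jul/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.5 - - [01/Jul/2024:00:00:02 +0000] "GET /a HTTP/1.1" 200 128',
    '198.51.100.7 - - [02/Jul/2024:10:15:00 +0000] "GET / HTTP/1.1" 200 512',
]
per_ip, per_day = count_lines(sample)
print(per_ip.most_common(1))
print(per_day["01/Jul/2024"])
```

To run it on a real file, pass an open file object to `count_lines`. For the two top hosts listed above, the reported hits divided by the log line count come out to exactly 24 in both cases (8,160 / 340 and 1,944 / 81).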


jmafc commented Aug 4, 2024

I am truly surprised that nobody has at least commented on this yet.

Having downloaded the log files for July, I notice that there are 15 processes running

/usr/bin/perl /usr/lib/cgi-bin/awstats.pl -config=awstats -update

It occurs to me that if all those processes are reading from the single log file for July and trying to update whatever records awstats maintains to produce the reports, it would be very easy for the processes to generate multiple entries for one line in the log file if the processes don't lock the records being written or otherwise coordinate between themselves to avoid duplication. Even if they do take write locks on the awstats "database" files, which would account for the slowness, they would still need to coordinate the reads from the log file.

Can someone please point out where the update process is explained in sufficient technical detail, or how one can control how many of these background processes run and when?
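
To illustrate the kind of coordination I have in mind, here is a minimal Python sketch of an advisory lock file that prevents overlapping runs. This is not how awstats itself works; the lock path and the `run_update` callback are hypothetical, and `fcntl.flock` is available on POSIX systems only.

```python
import fcntl
import os
import tempfile

def run_exclusively(lock_path, run_update):
    """Run run_update() only if nobody else holds the lock; return True if it ran."""
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails at once if another run holds it.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        try:
            run_update()
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)

# Demo: a second attempt made while the lock is held is refused.
lock_path = os.path.join(tempfile.mkdtemp(), "update.lock")  # hypothetical path
results = []

def outer_update():
    # Simulates a second updater starting while the first is still running.
    results.append(run_exclusively(lock_path, lambda: None))

ran = run_exclusively(lock_path, outer_update)
print(ran, results)
```

With a guard like this, a run that starts while another is still in progress simply exits instead of processing the same log lines again.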


jmafc commented Oct 13, 2024

FWIW, the problem was apparently that, on Linux, an awstats daemon is started every ten minutes or so and it doesn't take a lock while processing the files. The solution is to set EnableLockForUpdate=1 in awstats.conf.
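
In case it helps someone else, here is a small Python sketch for checking whether `EnableLockForUpdate` is active in a config file. The parsing is a simplification of the `Name=Value` format with `#` comments, the sample text is made up, and the config path mentioned below is an assumption based on the Debian package.

```python
def directive_value(config_text, name):
    """Return the last uncommented value of `name` in Name=Value text, or None."""
    value = None
    for raw in config_text.splitlines():
        # Drop comments (simplification: ignores '#' inside quoted values).
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            value = val.strip().strip('"')
    return value

# Made-up sample; a real awstats.conf contains many more directives.
sample = """
# EnableLockForUpdate=0
LogFile="/var/log/apache2/access.log"
EnableLockForUpdate=1
"""
print(directive_value(sample, "EnableLockForUpdate"))
```

On Debian I would expect the file to be `/etc/awstats/awstats.conf`, but check your own installation; read its contents and pass them as `config_text`.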
