Skip to content

dos: fix djgpp build#497

Merged
gwsw merged 1 commit into
gwsw:masterfrom
avih:dos-build
Apr 12, 2024
Merged

dos: fix djgpp build#497
gwsw merged 1 commit into
gwsw:masterfrom
avih:dos-build

Conversation

@avih

@avih avih commented Apr 9, 2024

Copy link
Copy Markdown
Contributor

Only DJGPP build was tested, but these changes should help other dos build envs too.

There were two issues:

  • undefined vt_enabled at win_flush() at output.c .
  • Write to read-only memory ("text") at WIN32textout() at screen.c .

And, the DJGPP Makefile.dsg explicitly invokes command.com /c copy ... so try also cmd.exe and (*nix) cp, so that it can also cross-compile.

Note that command.com is also used explicitly at the make targets clean, distclean, realclean, and it remains unmodified, so those will not work when cross compiling on Windows or *nix, but one can always clean the build manually.

Also, Makefile.dsg doesn't generate funcs.h and help.c, so they should exist (e.g. a release tarball) or using e.g. Makefile.{aut|wng}.

The build was very smooth, without even a single warning.

Tested in dosbox after (cross) compiling with DJGPP, and with the DJGPP dpmi exe at the same dir (CWS{DPMI,DPR0,DSTR0,DSTUB,PARAM}.EXE).

Colors/SGR (at the help page, and with -R) seem to work fine too.

UTF-8 displays broken by default, which I guess is expected.

Used DJGPP tools (gcc 12.2): https://github.com/andrewwutw/build-djgpp

Only DJGPP build was tested, but these changes should help other dos
build envs too.

There were two issues:
- undefined vt_enabled at win_flush() at output.c .
- Write to read-only memory ("text") at WIN32textout() at screen.c .

And, the DJGPP Makefile.dsg explicitly invokes command.com /c copy ...
so try also cmd.exe and (*nix) cp, so that it can also cross-compile.

Note that command.com is also used explicitly at the make targets
clean, distclean, realclean, and it remains unmodified, so those will
not work when cross compiling on Windows or *nix, but one can always
clean the build manually.

Also, Makefile.dsg doesn't generate funcs.h and help.c, so they should
exist (e.g. a release tarball) or using e.g. Makefile.{aut|wng}.

The build was surprisingly smooth, without even a single warning.
Interestingly, after stripping, it's nearly twice as big compared to
a windows 32 bit build (400k in DJGPP, about 200k windows binary).

Tested in dosbox after (cross) compiling with DJGPP, and with the DJGPP
DPMI loader CWSDPMI.EXE at the same dir, and, alternatively, with the
DPMI loader embedded to create a standalone less.exe:
  EXE2COFF less.exe                   # creates "less" coff image.
  COPY /B CWSDSTUB.EXE+less less.exe  # cat CWSDSTUB.EXE less >less.exe

Colors/SGR (at the help page, and with -R) seem to work fine too.

UTF-8 displays broken by default, which I guess is expected.

Used DJGPP tools (gcc 12.2): https://github.com/andrewwutw/build-djgpp
@avih

avih commented Apr 9, 2024

Copy link
Copy Markdown
Contributor Author

Force-pushed with updated commit message with additional info about the DPMI loader and the binary size (about twice than windows binary after strip).

@avih

avih commented Apr 9, 2024

Copy link
Copy Markdown
Contributor Author

Hmm.. noticed at least one bug (when running in dosbox), that typing / to init a search does work, but then typing any text displays each keypress twice, e.g. typing /test displays as /tteesstt , however, this is only a display issue, as it does search (and find) "test".

@gwsw

gwsw commented Apr 9, 2024

Copy link
Copy Markdown
Owner

I wonder if disabling echo is not working? If you type digits on the command line do they also repeat? If you type j does the j appear on screen before the page scrolls? That might be hard to tell because it will probably be immediately overwritten.

@avih

avih commented Apr 9, 2024

Copy link
Copy Markdown
Contributor Author

If you type digits on the command line do they also repeat?

Yes, at the normal part of the KB. At the numeric pad I couldn't get digits.

If you type j does the j appear on screen before the page scrolls?

As far as I can tell, it doesn't appear before scroll.

I'm attaching my stand-alone (embedded dpmi) less.exe of current master with the patch in this PR. It runs for me in dosbox in windows and debian.

Just extract it to the current dir, then start dosbox, then run mount c . to mount the current dir as drive c, then c: and then less --help, and you can try it out yourself.

less-djgpp-embedded-dpmi.zip

@avih

avih commented Apr 10, 2024

Copy link
Copy Markdown
Contributor Author

Also noticed that the arrow keys (up/down/left/right) navigate the content currectly, but at the (less) prompt (/ to search), arrow keys produce gibberish and don't actually move the cursor.

@avih

avih commented Apr 10, 2024

Copy link
Copy Markdown
Contributor Author

For what it's worth, djgpp-less (v530 from 2018) - which is part of the djgpp distribution files, also exhibits these two issues (extra echo when typing at the prompt, arrow keys at the prompt produce gibberish) - https://ftp.fu-berlin.de/pc/languages/djgpp/current/v2gnu/lss530b.zip (runs in dosbox, needs CWSDPMI.EXE at the path or same dir - https://ftp.fu-berlin.de/pc/languages/djgpp/current/v2misc/csdpmi7b.zip )

So either these are relatively long standing issues, or they only happen in dosbox (on both windows and debian). I tend to think it's the former.

I did try to test it in freedos in a VM (there's freedos live dvd), but i couldn't get the network working, so I don't know how to transfer less.exe into the VM...

but at the (less) prompt (/ to search), arrow keys produce gibberish and don't actually move the cursor.

Well, they do move the cursor, but because they also output gibberish, it's hard to use them for edit.

@avih

avih commented Apr 11, 2024

Copy link
Copy Markdown
Contributor Author

So either these are relatively long standing issues, or they only happen in dosbox (on both windows and debian). I tend to think it's the former.

Well, I think I was mistaken.

Finally managed to test it in freedos (live DVD in virtualbox) and windows 95 (online pcjs emulator, I think in DOS 6.22) by creating a floppy image of less.exe and cwsdpmi.exe (using flopgen) and mounting it at the VM, and in both cases typing at the "less" prompt doesn't have extra echo, and the arrow keys (both axis) work as expected.

So I think this is only a dosbox issue, and as far as I can tell, still with only minimal testing, the djgpp less binary is fine.

@gwsw

gwsw commented Apr 12, 2024

Copy link
Copy Markdown
Owner

Ok, thanks for investigating. It sounds like it's a dosbox issue. It would be nice to understand exactly why dosbox is behaving differently than other environments, but it seems like there's nothing to be done about it in less itself.

@gwsw gwsw merged commit a98154c into gwsw:master Apr 12, 2024
@avih

avih commented Apr 12, 2024

Copy link
Copy Markdown
Contributor Author

It would be nice to understand exactly why dosbox is behaving differently than other environments

It would, but also, apparently the last release of dosbox is 5 years old, and more up to date forks exists.

One of the popular ones - dosbox-x, seem to behave better, with the extra-echo issue completely gone, and the arrows-gibberish issue less severe, where the original dosbox produces two chars of gibberish, dosbox-x has only one, and less gibberish-y (i.e. only ASCII, e.g. unexpected M when pressing right-arrow, 'K' when pressing left).

Also, dosbox-x already has CWSDPMI.EXE at its path out of the box (and other common dos extenders, like DOS4GW.EXE, etc), so that's one less step to handle when testing less, as a compiled DJGPP binary runs in fresh dosbox-x env without any additional requirements.

So my guess is that it's a dosbox[-x] emulation accuracy issue, which probably no one reported. I'm guessing that most attention (and user issue reports) is about accuracy with games, and less with console apps, but that's a guess.

Still, of the methods I tried, dosbox[-x] is by far the quickest way to iterate build/testing. The others can be very tedious to bring a new binary into the emulated env.

@avih

avih commented Apr 12, 2024

Copy link
Copy Markdown
Contributor Author

Also, dosbox-x already has CWSDPMI.EXE at its path out of the box (and other common dos extenders, like DOS4GW.EXE, etc)

Terminology for future reference:

DPMI is DOS Protected Mode Interface - a standard (Since Windows 3.x) to run 32 bit dos application in protected mode while accessing memory above 1M. There are various DPMI servers for dos, and CWSDPMI is one of them.

DJGPP applications require a DPMI server, typically (?) CWSDPMI - which can also be embedded into the binary to create a stand-alone binary.

Windows 3x, 9x, and XP (and possibly later 32 bit windows) have built-in DPMI server, and can run DPMI DOS application out of the box - like less compiled with DJGPP.

As a data point, DJGPP-less (with or without CWSDPMI present) in windows XP also doesn't have these issues - no extra echo, and arrows work as expected at the prompt.

"DOS Extender", like DOS4GW typically adds some functionality, and sometimes include a DPMI server, but in the case of less, DOS4GW (or its drop-in replacement WIN32A) can't run less which was compiled with DJGPP.

@avih

avih commented Apr 14, 2024

Copy link
Copy Markdown
Contributor Author

I'm noticing an issue with chopped lines (-S) where if a view with a chopped line is reached via page-down or search, and the following line is not chopped, then the tail of that following line has white BG.

less -S decode.c  # in current less master, then 9 page-downs (80x25)

less-dos-chopped-line

This happens with the DJGPP build (but not win32 build) after every chopped line at the view which is followed by not-chopped line. Same behavior in dosbox-x, windows xp (in virtualbox), freedos (in virtualbox).

It doesn't happen if:

  • The same view is reached via consecutive down-arrow scrolls.
  • The same view is reached with page-up (e.g. page-down, page-up from that view).
  • Empty rscroll char (-S --rscroll=-).
  • Using the -c option (repaint with clear rather than scroll).

I'm not sure exactly where the inverse attribute is set for rscroll_char, and I tried to reset it after the char using this diff, but that didn't help:

diff --git a/line.c b/line.c
index 35daa60..3007ffc 100644
--- a/line.c
+++ b/line.c
@@ -1251,6 +1251,7 @@ public void pdone(int endline, int chopped, int forw)
 		put_wchar(&up, rscroll_char);
 		*up = '\0';
 		addstr_linebuf(rscroll_utf8, rscroll_attr, 0);
+		add_attr_normal();
 		inc_end_column(1); /* assume rscroll_char is single-width */
 	} else
 	{

I don't think it's a critical issue, but if you have some ideas how to fix it, I can try them out.

It could be an issue that win_flush somehow preserves the attribute/sgr state from the rscrll char printout, and doesn't reset correctly, and only applies it again after the end of the next line, but I didn't try to follow it out closely, and, it doesn't happen with the win32 build, which presumably uses the same codepath at win_flush.

@gwsw

gwsw commented Apr 14, 2024

Copy link
Copy Markdown
Owner

Did you have -R set when you tested after adding the add_attr_normal() call?

How about if instead of adding add_attr_normal() after the addstr_linebuf(), you instead add

addstr_linebuf("\033[m", AT_ANSI, 0);

Even if this works, I'm not sure it's a correct fix, but it will help me understand what's happening.

@avih

avih commented Apr 14, 2024

Copy link
Copy Markdown
Contributor Author

Did you have -R set when you tested after adding the add_attr_normal() call?

No, but I just tried and it behaves the same with -R.

How about if instead of adding add_attr_normal() after the addstr_linebuf(), you instead add

addstr_linebuf("\033[m", AT_ANSI, 0);

I just tried.

Without -R, it just prints the sequence to screen, and it renders literally (with ESC as left arrow), breaking the formatting (wrapping to the next line).

With -R the issue still exists, and I can't tell a difference with or without this line.

But I now don't think the issue is rscroll.

I just created a test file which has (long) lines trimmed to 80 cols (taking tabs into account), and the last char (at col 80) is printed as printf("\033[7m%c\033[0m\n", buf[col80]);, and tested with -R but without -S (because no line is longer than 80 anyway), and it has identical issue.

expand < decode.c | awk 'length > 80 {$0 = substr($0,1,79) "\033[7m<\033[0m"} {print}' > decode80.c

If I change the \e[7m to e.g. \e[43m (yellow background), then it's again the same issue - just with yellow instead of white bg.

If use a test file which is trimmed to 79 cols instead (and the last col is inverted between SGR sequences), then no issue.

So it can be a console behavior issue somehow, but I don't think this is it, because it was tested with 3 different console implementations (freedos, xp, dosbox-x).

So I think it's maybe that less doesn't apply correctly the final attribute (SGR 0) if it happens after 80 cols.

But still, the next line starts printing correctly. It's only after the EOL that we get the "old" attribute (inverse or yellow bg).

Maybe when doing page down without -c, less clears from EOL till col 80, and this write "remembers" the attribute at the end of the previous line?

This is weird...

@avih

avih commented Apr 14, 2024

Copy link
Copy Markdown
Contributor Author

Also, notice at the screenshot how there's no issue at the prompt line - it followed a chopped line, and itself is not chopped, yet there's no white BG at the bottom line of the screen where the : prompt is.

@avih

avih commented Apr 15, 2024

Copy link
Copy Markdown
Contributor Author

I still couldn't figure out what the issue is. It might be worth trying to reproduce it with a minimal program.

Regardless, this hack seems to fix the issue for me. Not sure how I feel about it, because I couldn't get to the bottom of it. Nevertheless, for posterity (and possibly merge), this appears to be working for me:

diff --git a/line.c b/line.c
index 35daa60..75b9425 100644
--- a/line.c
+++ b/line.c
@@ -1280,7 +1280,21 @@ public void pdone(int endline, int chopped, int forw)
 	 */
 	if (end_column < sc_width + cshift || !auto_wrap || (endline && ignaw) || ctldisp == OPT_ON)
 	{
+#if MSDOS_COMPILER==DJGPPC
+		/*
+		 * Workaround. Unclear why, but if the char at the rightmost
+		 * screen column of the previous line has non-default
+		 * background (e.g. inversed rscroll char, or SGR 4x with -R),
+		 * then the tail (after \n) of the current line gets the same
+		 * background till the rightmost screen column when scrolling.
+		 * as a workaround, fill it with spaces (and \n is not needed
+		 * because the cursor moves to the next line after last col)
+		 */
+		while (end_column < sc_width + cshift)
+			add_linebuf(' ', AT_NORMAL, 1);
+#else
 		add_linebuf('\n', AT_NORMAL, 0);
+#endif
 	} 
 	else if (ignaw && end_column >= sc_width + cshift && forw)
 	{

This is very similar to the block just above which fills the status line with spaces, except that we fill the whole line while the status thing stops before the last column:

	/*
	 * If we're coloring a status line, fill out the line with spaces.
	 */
	if (status_line && line_mark_attr != 0) {
		while (end_column +1 < sc_width + cshift)
			add_linebuf(' ', line_mark_attr, 1);
	}

@gwsw

gwsw commented Apr 15, 2024

Copy link
Copy Markdown
Owner

I'm trying to understand the details here. It sounds like perhaps printing a highlighted char in the last position of a line immediately highlights the following line? That seems weird. I've looked through the DJGPP docs a bit but haven't found anything that addresses this.

In Linux, printing the rscroll char ">" at the end of a line prints ESC[7m>ESC[27m; that is, it enables standout, prints the char, and then disables standout. But it sounds like the equivalent "disable standout" in DJGPP isn't acting as expected. I guess that's done by calling textcolor(). Maybe a clreol() is needed somewhere after restoring the default color.

@avih

avih commented Apr 15, 2024

Copy link
Copy Markdown
Contributor Author

It sounds like perhaps printing a highlighted char in the last position of a line immediately highlights the following line?

It's possible, but I don't know if immediately, and I don't know if the whole next line is highlighted, because I didn't try and I don't know how to stop it at this exact point.

We do know that when the page completes (which I believe involves printing each line at the bottom of the screen and then scrolling up after each line is added), then the issue is apparent.

So it's also possible that there's no "following line" at all when printing the line which ends in a highlight (if it prints to the bottom).

Do also keep in mind that there's no issue with -c, i.e. if the page is printed top-down without scroll then it's fine.

In Linux, printing the rscroll char ">" at the end of a line prints ESC[7m>ESC[27m; that is, it enables standout, prints the char, and then disables standout. But it sounds like the equivalent "disable standout" in DJGPP isn't acting as expected. I guess that's done by calling textcolor().

I also think it involves textcolor, and the internal SGR processor (update_sgr) does understand and knows how to apply SGR 27.

But note this comment which shouldn't involve SGR 27, but still exhibits the issue (with -R when the content is 79 chars + SGR 43 + "X" + SGR 0.).

If I change the \e[7m to e.g. \e[43m (yellow background), then it's again the same issue - just with yellow instead of white bg.

I think the/some SGR reset (or 27) does get applied, because the next line starts fine.

My current suspicion is that it's triggered by the scroll. Possibly indeed that right after the scroll the whole line gets "highlighted", and then the next line overwrites this highlight till the newline - and the rest remains highlighted.

It's possible that somehow the reset/27 is not applied at the end of the "rscroll line", but rather at the begining of the next line (which happens after the scroll) - maybe because the scroll is triggered as soon as the col80 char is printed (and before we apply the reset)?

It's also possible that the reset/25 does get applied before scroll happens, but doesn't behave as expected due to some idiosyncracy with the implementation of the scroll (djgpp/dos/conio) when the rightmost (or bottom-rightmost) has non-default attribute and then the content is scrolled.

I didn't get to that resolution, though it should be possible with short crafted content (which shouldn't be hard to produce), or with a a minimal test program to experiment. Inside "less" it's non trivial (to me) to debug.

Maybe a clreol() is needed somewhere after restoring the default color.

Yes. This is possible, and that's what I wanted to try before adding the "clear with spaces" code, but I couldn't actually find at the code where linebuf is printed to the screen. I don't think it's at pdone, but I looked at the callers of pdone and couldn't find it at the code.

Can you point me to it please?

But also, I don't think we know where the cursor is after printing the 80th column of the bottom-most line. Does it remain at the bottom-right? or does it scroll before we reset (or both)?

@gwsw

gwsw commented Apr 15, 2024

Copy link
Copy Markdown
Owner

The line is built in memory by the code in input.c calling the line.c functions (pappend, pdone, etc). It is then written to the screen by put_line() which repeatedly calls gline() to get each char from the in-memory line and putchr() to write that char to the screen. The final at_exit() in put_line is supposed to restore normal color. Maybe a clreol() after at_exit() will help?

Does it remain at the bottom-right? or does it scroll before we reset (or both)?

The auto_wrap variable is supposed to tell us whether the cursor wraps after a char is written to the last char in a line.

@avih

avih commented Apr 16, 2024

Copy link
Copy Markdown
Contributor Author

It's possible that somehow the reset/27 is not applied at the end of the "rscroll line", but rather at the begining of the next line (which happens after the scroll) - maybe because the scroll is triggered as soon as the col80 char is printed (and before we apply the reset)?

So this is exactly what happens.

With non-default BG, when writing rscroll_char (or any other char) to the bottom-right corner of the screen (row 25, col 80), the screen scrolls up immediately, and the whole next line gets the non-default BG before we get the chance to reset the color (which we do).

The cursor is (0-based using ScreenGetCursor) at 79,24 before printing that last char, and 0,24 after printing it (and the screen scrolled).

The behavior of clearing the bottom line with the current attribute when printing to the bottom-right and then scrolling also happens by default on linux (xterm and tmux), e.g. if the screen width is 80, and the shell prompt is at the bottom of the screen (in xterm):

for x in $(seq 79); do printf 1; done; printf "\033[43mX"; sleep 1; echo

The only difference is that on *nix the cursor typically stays at the bottom right after printing the bottom-right char (and doesn't scroll) so we can reset the attribute before scrolling, while with our djgpp build it does scroll immediately, and colors the whole next line before we get the chance to reset the color.

So I'm guessing this may happen also on *nix if the terminal doesn't keep the cursor at the last column when printing a char to it.

We still don't know if this is a dos thing or djgpp or conio implementation, but it behaves identically with our djgpp build in dosbox-x, freedos in virtualbox, windows xp in virtualbox, and windows 95 in an online x86 emulator (https://copy.sh/v86/ - pretty good).

(I checked the djgpp binary/source after writing the text above)

Interestingly, the "official" binary from the djgpp distribution (v530) doesn't have this issue. When I built it with "their" djgpp less sources then there's no problem.

Also interestingly, this is a diff of screen.c between v530 in this repo vs screen.c at the djgpp distro less source:

diff --git a/screen.v530.c b/../lss530s--gnu-less-SOURCE/gnu/less-530/screen.c
index 32b2720..19086ea 100644
--- a/screen.v530.c
+++ b/../lss530s--gnu-less-SOURCE/gnu/less-530/screen.c
@@ -96,10 +103,29 @@ static unsigned short *whitescreen;
 static int flash_created = 0;
 #endif
 #if MSDOS_COMPILER==BORLANDC || MSDOS_COMPILER==DJGPPC
+extern int top_scroll;
+	static void
+clreol_maybe()
+{
+	/* Clear to EOL, but only if in the last display line.  This
+	   is a kludgey way to work around display problems when color
+	   is switched at the rightmost column of the last display
+	   line.  The problem is that when the display is scrolled,
+	   the empty line added from below inherits the colors of the
+	   last character on the previous line.  */
+	if (top_scroll != OPT_ON) {
+		int x, y;
+		extern int sc_height;
+		ScreenGetCursor(&x, &y);
+		if (x == sc_height - 1)
+			clreol();
+	}
+}
 #define _settextposition(y,x)   gotoxy(x,y)
 #define _clearscreen(m)         clrscr()
 #define _outtext(s)             cputs(s)
-#define	SETCOLORS(fg,bg)	{ textcolor(fg); textbackground(bg); }
+#define	SETCOLORS(fg,bg)	{ textcolor(fg); textbackground(bg); \
+				  if (bg == nm_bg_color) clreol_maybe(); }
 extern int sc_height;
 #endif

Which describes exactly the issue, and implements a solution similar to what I had in mind: it basically clreol whenever setting the default bg at the bottom line (I thought of doing clreol right before a line is printed to screen, rather than every time the bg becomes normal). Note that x holds the y-position, because it's ScreenGetCursor(&row, &col), but he code works.

The DJGPP less sources also have other patches. I think some of them are not DJGPP-specific. That's the summary at the source package <root>/gnu/less-530/djgpp/diffs (LFN is Long File Names):

2018-02-18  Juan Manuel Guerrero  <...@gmx.de>

	* cmdbuf.c (histfile_name) [DJGPPC]: If LFN support is available load
	.lesshst else _lesshst.  If LFN is not available load _lesshst.
	(histfile_name): Use CANONICALIZE_PATH to canonicalize HOME and
	LESSHISTFILE.

	* decode.c (init_cmds) [MSDOS_COMPILER, DJGPPC]: If BINDIR exists and
	LFN support is available load .sysless else _sysless.
	If SYSDIR exists load sysless as system wide lesskey file.  If SYSDIR
	does not exist load "c:\\_sysless".
	If LFN support ia available load .less else _less.  If LFN is not
	available load _less.

	* forwback.c (forw): Ignore any line shift that may still exist from
	the last line of the previous screenful.

	* less.h [MSDOS_COMPILER, DJGPPC]: For DJGPP define HAVE_LFN_SUPPORT,
	DIR_EXISTS, FILE_EXISTS, CANONICALIZE_PATH and STRIP_FULL_PATH_AND_EXTENSION.
	For all others the macros are no-ops.

	* lesskey.c (main): If LFN support available used ".less" else "_less".
	(parse_args): If LFN support available used ".lesskey" else "_lesskey".

	* lesskey.nro: Add DJGPP specific informations.

	* less.nro: Add DJGPP specific informations.

	* line.c [MSDOS_COMPILER, DJGPPC]: For DJGPP make (current shift)
	cshift public.
	(pdone): If the current line is finished, reset the current shift.

	* main.c (main): If edit is the used editor change the default prompt.
	(main): Use STRIP_FULL_PATH_AND_EXTENSION for argv[0].  Use CANONICALIZE_PATH
	to canonicalize all filenames from command line.

	* screen.c [DJGPPC]: New function clreol_maybe.  Clear to EOL, but only
	if in the last display line.

(with actual diffs following at the same file).

The overall diff is not huge, but not tiny either. I don't know whether the fixed issues still exist in current master. We know the "screen.c" issue is not fixed.

Also, I noticed djgpp does have a compliant snprintf, but HAVE_SNPRINTF is 0 unconditionally at defines.ds.

So what do you want to do about it?

@avih

avih commented Apr 16, 2024

Copy link
Copy Markdown
Contributor Author

Also, I noticed djgpp does have a compliant snprintf, but HAVE_SNPRINTF is 0 unconditionally at defines.ds.

Well, I think they do have a patch for that as well, at <root>/gnu/less-530/djgpp/config_h.sed

I guess the changes are more than just the diffs...

@gwsw

gwsw commented Apr 18, 2024

Copy link
Copy Markdown
Owner

Since there is a less release in beta right now, I'm inclined to not bring in any major changes immediately. I will look at the changes in detail but I think it's safer to hold off on these until the beta is complete. I might want to get some input from the djgpp maintainers as well (I'm surprised they never sent these changes to me to be merged into the base). However we should open a new PR to track this since it's beyond the scope of "fixing the djgpp build". Let me know if you'd like me to do that.

@avih

avih commented Apr 18, 2024

Copy link
Copy Markdown
Contributor Author

I'm also surprised they never sent any patches.

I don't think it's urgent, and I agree that it belongs in a new PR, though in general I don't care too much about DJGPP.

I just wanted to check if my color changes work in DOS builds because as far as I know no one tried to build DJGPP for a very long time (and indeed it was failing for at least few years without the fixes in this PR). But once it built relatively easily, I played with it a bit and noticed this issue, which then led to this investigation.

I didn't look at the changes closely, and I think the bigger part is to be able to "extract" what is actually changed in their build compared to master via Makefile.dsg (they use autotools and don't use this makefile, as far as I can tell).

We already know about the diffs and the config modifications via sed, but there could be more, e.g. with build options. But as far as I can tell all of it is at the djgpp/ dir, and as far as I can tell the product of using disgg/* is the generated files at the _build/ dir, so basically one only needs to copy the sources to _build and then run make, with config.h, defines.h, etc already pre-generated at that dir.

I don't feel very inclined to work on the DJGPP build. Preferably a DJGPP maintainer or some such could send patches, but it should be possible to handle even otherwise, at least partially with selective changes.

But let's see. We can discuss it further once you look at it more closely.

@avih avih deleted the dos-build branch July 29, 2024 15:10
@avih

avih commented Nov 30, 2024

Copy link
Copy Markdown
Contributor Author

In addition to djgpp, this issue (white background till EOL at the line after a cropped line with -S) is also affecting win xp/7/8, and I'm guessing also unix terminals with autowrap (untested).

Test case: less -S FILE where some lines are longer than the terminal, then page down to reach such long line, and notice how the tail of the line after the long (cropped) line has white BG.

This doesn't happen with arrow-down (1 line at a time) because apparently the prompt does clear to EOL, so it always overwrites the white BG after the scroll, before the next line is printed.

It also doesn't happen with -c (which clears the screen and prints from the top, instead of adding a line and scrolling).

The auto_wrap variable is supposed to tell us whether the cursor wraps after a char is written to the last char in a line.

When was the last time you tested with a terminal which has auto-wrap with the same semantics as this, with a test case similar to the above?

I suspect a bug where clear_eol() or clear_bot() is not called in such cases, but it probably should (or something with a similar effect)

As noted above, the behavior is also apparent on linux/unix if the terminal scrolls with non-default BG as the current attribute. It's just that typically unix terminals don't have autowrap, so the cursor stays at the (bottom) rightmost column and gives us a chance to reset the color before the scroll.

This patch fixes it with dos (djgpp) and win xp/7/8 (no vt), but should also fix it with any unix terminal with autowrap.

diff --git a/line.c b/line.c
index 6506c05..31b1d15 100644
--- a/line.c
+++ b/line.c
@@ -1207,6 +1207,8 @@ static void add_attr_normal(void)
 		addstr_linebuf("\033]8;;\033\\", AT_ANSI, 0);
 }
 
+extern int top_scroll;
+
 /*
  * Terminate the line in the line buffer.
  */
@@ -1258,6 +1260,42 @@ public void pdone(int endline, int chopped, int forw)
 		add_attr_normal();
 	}
 
+	/* with auto_wrap (cursor wraps when printing to the rightmost column),
+	 * when printing to the bottom right corner, and the content scrolls,
+	 * if the background of the printed char is non-default, then the new
+	 * (empty) line gets this background. This can happen with PGDN, with
+	 * a long cropped line (because -S) and the rightmost char is inverted,
+	 * but also line of exacltly term width which ends in non-default BG.
+	 * we do reset the colors, but the scroll already happened by that time.
+	 * if the next line does not reach the rightmost column, then the rest
+	 * of that line still has this BG (e.g. \n is added before rightmost).
+	 * clr_next_to_eol identifies that the current line is printing to the
+	 * rightmost column, and in such case, if we're in scroll mode, then
+	 * the next line would clear to EOL, (\n won't be added - auto-wrap).
+	 *
+	 * we could clear to EOL with every line, but it's wasteful.
+	 * this is not required when scrolling one line at a time (arrow down)
+	 * because the prompt already clears to EOL, but it's stil fairly cheap
+	 * to not test that, because we're only filling to EOL when the prev
+	 * line reached the rightmost column in scroll mode with autowrap.
+	 *
+	 * this fixes dos (djgpp), and windows without vt_enabled (7/xp), but
+	 * should similarly fix it in *nix terminals with autowrap (untested).
+	 */
+	if (auto_wrap && top_scroll != OPT_ON)
+	{
+		static int clr_next_to_eol = 0;
+
+		int do_clear = clr_next_to_eol;
+		clr_next_to_eol = end_column == sc_width + cshift;
+
+		if (do_clear)
+		{
+			while (end_column < sc_width + cshift)
+				add_linebuf(' ', AT_NORMAL, 1);
+		}
+	}
+
 	/*
 	 * If we're coloring a status line, fill out the line with spaces.
 	 */

As far as I can tell this patch is optimal or near enough, and better than both my own "hack" above (which clears every line to EOL), and also better than the djgpp patch here (which also clears every line when in scroll mode, because except the first page, all new lines are always printed at the bottom line with reset colors).

Two additional unrelated issues which I've noticed while fixing this:

  1. for some reason, when the post-659 branch was merged into master, or some time after that, few commits dissappeared: 8092f24, 91aff33, ce1dfec, and 5854746. All of them touch only output.c, and I'm guessing it's related to their conflct with the "new SGR parser" which you took into post659 and then reverted, and probably deleted those too. I think you should simply restore output.c to its state at v668, because as far as I can tell there are no changes since then in this file other than the merge and revert. But do double check me.
  2. In v668 djgpp build is broken, which is a regression of 8092f24, and this fixes it (probably after you restore output.c):
From ea4e89221038dd289da26cac8e9b895684407cc1 Mon Sep 17 00:00:00 2001
From: "Avi Halachmi (:avih)" <avihpit@yahoo.com>
Date: Sat, 30 Nov 2024 01:31:01 +0200
Subject: [PATCH] dos: fix djgpp build (again)

This is a regression of commit 8092f24 ("windows 10+: make 256/true
colors "work" without -Da", 2024-07-02), which used vt_enabled in
a new place, but it's not defined for djgpp.

This commit is a variant of commit a98154c ("dos: fix djgpp build",
2024-04-09), which ensures vt_enabled is defined and 0 (disabled)
throughout win_flush.
---
output.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/output.c b/output.c
index 9681bf2..7651ee3 100644
--- a/output.c
+++ b/output.c
@@ -253,11 +253,10 @@ static int is_ansi_end_0(char c)

static void win_flush(void)
{
-	if (ctldisp != OPT_ONPLUS
-#if MSDOS_COMPILER==WIN32C
-	    || (vt_enabled && sgr_mode)
+#if MSDOS_COMPILER != WIN32C
+	static const int vt_enabled = 0;
#endif
-	   )
+	if (ctldisp != OPT_ONPLUS || (vt_enabled && sgr_mode))
  	WIN32textout(obuf, ptr_diff(ob, obuf));
  else
  {
-- 
2.46.2.windows.1

@avih

avih commented Nov 30, 2024

Copy link
Copy Markdown
Contributor Author

At the autowrap patch above, this:

if (auto_wrap && top_scroll != OPT_ON)

should be changed to this:

if (auto_wrap)

because even with top_scroll (clear and write from top) there are still cases where scrolling happens, and even if 1-line scroll is not an issue (prompt does clear to EOL), there are cases of more than one-line-at-a-time, for instance using --wheel-lines=5 where the prompt is not printed till after 5 line scrolls, I'd imagine half-a-page, and possibly more.

So I think it's safest to always clear-to-eol if the prev line reached the rightmost column and the terminal has autowrap, regardless of top_scroll.

Still pretty cheap, and only very slightly more expensive (but also more correct) with legacy systems.

@gwsw

gwsw commented Nov 30, 2024

Copy link
Copy Markdown
Owner

FWIW my xterm does autowrap (and has the "xm" capability), but I don't see any problem with lines getting unexpected colors even before applying this patch. I'm not sure why, but it makes it harder for me to test the effects of this change.

Why is it necessary to fill the line with spaces rather than just call clear_eol()?

@avih

avih commented Nov 30, 2024

Copy link
Copy Markdown
Contributor Author

FWIW my xterm does autowrap (and has the "xm" capability), but I don't see any problem with lines getting unexpected colors even before applying this patch.

Huh, I don't know what "xm" cap is, but how come it has autowrap? I though that's only with obscure terminals (and windows terminals without VT). Here's how you described it previously:

The auto_wrap variable is supposed to tell us whether the cursor wraps after a char is written to the last char in a line.

So after printing full-line of chars (and no newline), the cursor moves to the begining of the next line for you?

Interesting that you can't reproduce the issue.

Can you reproduce that right after scroll due to autowrap, the new (empty) line gets the background of the char which triggered the autowrap?

E.g. this, assuming width is 80:

seq 30; printf "$(for x in $(seq 79); do printf @; done)\033[42mX\033[0m"; sleep 100

the X with green BG is printed at the (bottom) rightmost column, and if it has autowrap then it should be triggered (if I understand the "less" auto_wrap semantics correctly), and also trigger scroll, and the next new empty line would also fully have green BG before we had a chance to reset the color after X was printed.

In my xterm the cursor stays on the X (at the bottom rightmost column) and doesn't tigger autowrap, but if I change 79 to 80 so that the X is printed at the begining of the next line, because the BG was changed while the cursor was still at the previous line (after printing 80 "@"), then the triggered scroll still colors the whole new empty line with green BG, with "X" printed as its first char.

@avih

avih commented Nov 30, 2024

Copy link
Copy Markdown
Contributor Author

Why is it necessary to fill the line with spaces rather than just call clear_eol()?

I didn't try it. This may work, but we don't actually know that this line is printed at the bottom (no biggie to fill with spaces anyway, but the issue only happens when scroll is triggered because we printed at the bottom line).

Also, in pdone the approach is to fill with spaces also in other cases (e.g. just below the new code, inside if (status_line && line_mark_attr != 0) {), and I couldn't find a place outside of pdone to do the clear.

But it is possible that clear_eol could work either in place with the new code or someplace else.

However, elsewhere, it might be less easy to identify that the previous line was exact terminal width, so that autowrap got triggered, while where the current code is it might be easier (after rscroll_char is printed, etc).

@gwsw

gwsw commented Dec 1, 2024

Copy link
Copy Markdown
Owner

Ok, I see. My terminal does have the "am" cap (I mistyped as "xm" above), but it also has the "xenl" cap. "am" plus "xenl" means that if you print a full line, the cursor stays at the right margin, but if you output one more char, it will automatically advance/scroll to the next line before printing that char. However if that char is a newline, it will advance only one line, not two. There are some special cases at the end of pdone do deal with terminals with "am" (autowrap) and/or "xenl" (ignaw) terminals.

Perhaps the

if (auto_wrap)

in your patch should be

if (auto_wrap && !ignaw)

Is that consistent with the cases you've looked at? It appears that ignaw is 0 on Windows builds except on VT terminals.

The existing places in pdone that fill the line with spaces are cases where we actually need to put the cursor at the right column. In this new case, a clear_eol would probably suffice, but I agree that it's not trivial to do that in put_line.

@avih

avih commented Dec 1, 2024

Copy link
Copy Markdown
Contributor Author

if you print a full line, the cursor stays at the right margin, but if you output one more char, it will automatically advance/scroll to the next line before printing that char. However if that char is a newline, it will advance only one line, not two.

As far as I know that's the common unix terminal behavior (including the newline behavior at EOL), and what I consider to not have autowrap in the semantics which the "less" auto_wrap represents, if I understand it correctly, based on:

The auto_wrap variable is supposed to tell us whether the cursor wraps after a char is written to the last char in a line.

I'm very unfamiliar with the capabilities names, but I was talking exclussively about the auto_wrap semantics of less, which as far as I can tell happens on windows terminals (console) without VT by default, but not in any *nix terminal which I've seen and not in DEC terminals that I know of by default (VT52-VT520), though it is possible that some old obscure terminals did have auto-wrap with the "less" semantics, which is maybe why they didn't last long....

So what does auto_wrap represent exactly in less? and how common you think it is to be "1" in setups from the last few decades, excluding windows?

Is it 1 with your xterm?

if (auto_wrap && !ignaw)

I don't know what ignaw is. Do you expect it to be 1 0 on windows xp/7/8?

How common you think it would be "1" with current userbase of "less"? (very rough estimate, obviously we can't know for a fact).

I think we know our use case from a behavioral point of view: when printing to the bottom-right cell it triggers a scroll, and colors the new empty line with the same background as the printed char which triggered the scroll.

I thought this is exactly the cases where auto_wrap is not "0" in less, and only those, but if that's not the case and the condition should be diffrent, then you would know about it much better than me.

@avih

avih commented Dec 1, 2024

Copy link
Copy Markdown
Contributor Author

It appears that ignaw is 0 on Windows builds except on VT terminals.

Right. I missed that it's 0 on windows except VT terminals, and presumably also djgpp (which I don't care about really, but I don't mind testing it because I have the setup for it).

So I think that would fix the issue on xp/7/8 and presumably djgpp.

I don't know what ignaw's value would be in modern terminals, and whether this condition would cover our use case in unix terminals.

The existing places in pdone that fill the line with spaces are cases where we actually need to put the cursor at the right column. In this new case, a clear_eol would probably suffice, but I agree that it's not trivial to do that in put_line.

Right. I don't feel strongly at all about where it should be fixed, but that's the best I could do. If you have a better way to fix it, by all means please do.

@avih

avih commented Dec 1, 2024

Copy link
Copy Markdown
Contributor Author
public int auto_wrap;           /* Terminal does \r\n when write past margin */
public int ignaw;               /* Terminal ignores \n immediately after wrap */

"\r\n when write past margin" doesn't sound to me the same as "tell us whether the cursor wraps after a char is written to the last char in a line."

And ignaw sounds like part of common unix terminal behavior, but at least apparently doesn't include "cursor stays at the last column after a char is printed to the last column"

So off the top of my head I don't see how a combination of those, at least by these docs, identify our use case, though it's possible that it happens to overlap with it, maybe even completely.

Also, I tested if (auto_wrap && !ignaw) and it seems to work in xp/7 and djgpp (i don't have win8).

@avih

avih commented Dec 1, 2024

Copy link
Copy Markdown
Contributor Author

On second thought, while this is indeed the behavior I know of in the unix/DEC world (and windows VT with the correct options set):

if you print a full line, the cursor stays at the right margin, but if you output one more char, it will automatically advance/scroll to the next line before printing that char. However if that char is a newline, it will advance only one line, not two

And on non-VT windows it's even simpler because there's only one state: "a printable char is always printed where the cursor is, and the cursor always moves to the next position and never stays on the printed char, and enter always does \r\n" (and the "next position" after the rightmost column is the begining of the next line).

But I'm not sure I understand how auto_wrap or ignaw individually refer to something which happens only on unix or only on windows, specifically:

public int auto_wrap;           /* Terminal does \r\n when write past margin */

I thought "margin" refers to the rightmost column, and "past margin" is the (non existing) next column.

At which case this sort of happens in both unix and windows. Because in both of them, if you print 81 chars (one "past margin" in 80 cols term), then the 81 st char is printed at the begining of the next line.

Of course, on windows the \r\n would trigger when priting the 80th char, and on unix when printing the 81st char, but in both of them "past margin" (one more than the terminal width) would result in \r\n before the 81st char.

So interpratation is not useful.

So I can only assume that "margin" refers to the column-before-rightmost (79 in 80 col term), where on unix there's no auto_wrap (because writing 80 chars - one "past margin" - doesn't trigger \r\n), and on windows there is auto_wrap (writing 80 chars does trigger \r\n).

I.e. sort of exactly what you said previously:

The auto_wrap variable is supposed to tell us whether the cursor wraps after a char is written to the last char in a line.

(but which I think is missing "when not in the special state where the cursor stayed at the last column", e.g. after writing term-width-chars, because in that special state writing a char would wrap and not stay forever at the last column):

And if it's indeed this second interpretation, then isn't that alone enough to cover exactly the diffrence in behaviors which we care about in this context?

Because the only thing we need to know to decide whether clear_eol() is needed, is whether the cursor moved to the next line after writing 80 chars. And with this second interpretation it's exactly when auto_wrap is not 0.

Additionally, the ignaw docs confuse me:

public int ignaw;               /* Terminal ignores \n immediately after wrap */

Because even if I don't try to understand what "wrap" means exactly, there's no terminal which I know of which ignores \n in any state. In any terminal, when you print \n, the cursor will always move to the next line. There are no exceptions which I'm aware of (regardless if \r is added implicitly).

So I can only assume it refers to this behavior (which I'm aware of):

... However if that char is a newline, it will advance only one line, not two

I.e. that when the terminal is in "the special state where the cursor stays at the rightmost column after printing 80 chars", then printing either a printable char or \n would result in exactly one \r\n (and if it's a printable char, then it would get printed after the \r\n).

And in that case, I don't know what ignaw==0 means. Maybe it means "in that special state after writing 80 chars and the cursor stays at col 80, printing \n would advance two lines"? But I don't think there exists a terminal which does that?

With this interpretation, regardless of what ignaw==0 means, it seems to me that ignaw is only (questionably) applicable when auto_wrap is 0, because this special state where ignaw describes what happens only exist when auto_wrap == 0.

And if our trigger to clear_eol() is indeed exactly when auto_wrap == 1, then we should intentionally ignore ignaw, because it's not applicable in this case.

I.e. the condition should be exactly if (auto_wrap).

Did I miss anything?

@avih

avih commented Dec 1, 2024

Copy link
Copy Markdown
Contributor Author

OK, so "less" does this:

	auto_wrap = ltgetflag("am");
	ignaw = ltgetflag("xn");

And I've read the termcap docs, both here (1992) and gnu docs here and both use identical wording for am and xn.

am:

Flag whose presence means that writing a character in the last column causes the cursor to wrap to the beginning of the next line. If am is not present, writing in the last column leaves the cursor at the place where the character was written.

Apparently some terminals never move to the next line on their own, and we don't care about those in our context because it won't trigger a scroll.

But it is true for both VT100 (and xterm) and windows with or without VT, although in slightly different ways.

So the condition must include auto_wrap != 0.

It continues about am:

Writing in the last column of the last line should be avoided on terminals with am, as it may or may not cause scrolling to occur (see section Scrolling). Scrolling is surely not what you would intend. If your program needs to check the am flag, then it also needs to check the xn flag which indicates that wrapping happens in a strange way.

Let's ignore "Scrolling is surely not what you would intend" because we do actually intend to scroll, at least generally at the end of each line.

Also, the scrolling section doesn't add any useful info for us, because it says that termcap apps only scroll with commands (and I'm extrapolating: and not "naturally" by reaching the last column or printing \n at the bottom line).

So ignaw means "wraps in a strange way when writing to the last column".

In this context, Windows non-VT behaves normally, and therefore ignaw == 0, while VT100 (and xterm, and windows-VT) wrap in a strange way, and ignaw is not 0.

The xn docs say that there are at least two known strange wrap behaviors, and that it's impossible to distinguish between then using termcap, but it describes them anyway.

One of those is how VT100 and xterm behave (first printable char to last column keeps the cursor at the last column, second printable char triggers \r\n followed by that char).

The other known strange wrap behavior is what "Concept-100" terminals do, and how the ignaw name and docs in "less" describes it:

/* Terminal ignores \n immediately after wrap */

I.e. printing to the last column does automatically add \r\n and the cursor moves to the next line after the char, but an additional \n in that case is ignored.

Interestingly in our case, we do actually want to clear_eol() if we have the second form of xn, because this means that scroll happened and the new empty line has the BG of the last char.

But because there's no way to distinguish the two xn behaviors, and DEC VT and xterm won from a historical perspective, we can probably ignore that and assume that xn represents only the VT 100 strange wrap behavior.

So in our context, the condition should indeed be (auto_wrap && !ignaw).

Which means: cursor wraps when printing to the last column, but not in a strage way like VT100 does.

And we're missing out the case of am and xn in the Concept-100 behavior, where we would need to clear_eol(), but we can't detect it anyway, and it's probably extremely uncommon xn behavior in practice.

If it's important to you, then you can add an option which tells "less" which form of xn represents the terminal better, and then we can cover the Convept-100 case as well. Maybe with something like:

if (auto_wrap && (!xn || xn_is_concept100))

Additional notes:

Out of curiosity, I did check the behavior with a vt102 hardware emulator, and unsurprisingly it behaves as expected (like xterm etc).

Irrelevant but interesting: In all 3 cases (am without xn, and am with the 1st or 2nd strange wrap) printing 81 chars would cause the 81st char to print at the begining of the next line with the cursor following it.

Also, in both xn cases, printing 80 chars, then \n, and then another char would result in the same final state as printing 81 chars. But without xn it would result in an empty line before the final char.

It would probably also be wise to update the ignaw docs in less to describe the VT wrap behavior rather than the "Concept-100" behavior, because I'm guessing this is the overwhelming majority of what xn describes in practice.

It might probably also be more correct to name it vt100_wrap, which is what it represents for "less", or just name it xn which is the most accurate thing to do, as either vt100_wrap or ignaw represent only one form of what xn can mean.

And it should only be checked when auto_wrap is enabled, because xn is not applicable unless the terminal has am.

gwsw added a commit that referenced this pull request Dec 2, 2024
Related to #497.
@gwsw

gwsw commented Dec 2, 2024

Copy link
Copy Markdown
Owner

I restored the output.c changes in a2573bb, and did the vt_enabled fix in 9b1bc21.

I'm still evaluating the bg color fix. It bothers me that the choice to do space padding depends only on the last line processed by pdone, regardless of whether we're scrolling forward or backwards, or indeed whether we have jumped to a different part of the file. I guess the extra spaces are harmless if output when not necessary, but I want to give some more thought to this and see if there might be a better approach.

@avih

avih commented Dec 2, 2024

Copy link
Copy Markdown
Contributor Author

I'm still evaluating the bg color fix. It bothers me that the choice to do space padding depends only on the last line processed by pdone, regardless of whether we're scrolling forward or backwards, or indeed whether we have jumped to a different part of the file. I guess the extra spaces are harmless if output when not necessary, but I want to give some more thought to this and see if there might be a better approach.

Yes, I also considered additional conditions (beyond the term caps) to limit further the cases where it happens.

I think ultimately, clear_eol() is required if and only if all the following are true:

  • auto_wrap && (!xn || xn_is_concept100)
  • the previous line was exactly the term width (or maybe also bigger if last codepoint was a wide?).
  • the previous line was printed when the cursor was at the bottom left. This, together with the previous two conditions indicate that scroll must have happened.
  • the previous line had non-default background right before the scroll happened.
  • the current line is shorter than term width (else we're overwriting all the bad-BG cells anyway).
  • the next line is not the prompt (because the prompt will do clear_eol anyway).

But eventually I decided to go minimal, because:

  • The term cap test already effectively limits this behavior to legacy systems (basically only dos (djgpp), win xp/7/8).
  • Less conditions moves it towards the "better safe than sorry" side (e.g. the initial additional top_scroll condition which ended up missing out on half-page scroll or N lines mouse wheel scroll).
  • Lines which are exactly the term width are generally rare, and the test itself is super cheap.
  • Adding spaces in these rare cases is cheap, and even very cheap in the grand scheme of things.
  • Testing where the cursor is requires additional per-platform code - few lines per platform (I did have this test initially, for win32 and djgpp).
  • Testing that the previous line had non-default background before the scroll, and/or that the next line is the prompt adds complexity which was not worth it IMO.

And, we might need to ensure that on win10+, when VT is enabled, ensure that ignaw is 1, because it's basically vt100/xterm, and does have the "strange wrap behavior like VT100".

@avih

avih commented Dec 2, 2024

Copy link
Copy Markdown
Contributor Author

And, we might need to ensure that on win10+, when VT is enabled, ensure that ignaw is 1, because it's basically vt100/xterm, and does have the "strange wrap behavior like VT100".

Hmm, win32_init_vt_term already does that[*], but it incorrectly sets auto_wrap = 0;. So that would need to get fixed. It's basically xterm.

[*] but neither auto_wrap nor ignaw are set when (vt_enabled == 1 && con_out == con_out_ours), which I didn't try to understand what it means, but on the face of it, it does look wrong to me, i.e. when this condition is true, then it looks to me like normal usage, where these vars should be updated to reflect the term caps, but they aren't.

@gwsw

gwsw commented Dec 4, 2024

Copy link
Copy Markdown
Owner

@avih see what you think of ecf29c5. This uses clear_eol instead of filling the line with spaces, and doesn't rely on state saved between different lines.

@avih

avih commented Dec 5, 2024

Copy link
Copy Markdown
Contributor Author

Well, I can't tell whether it's correct because the change is spread over many places which I don't know what they do or when, so I have to believe you that it's correct.

I also can't tell which additional (or fewer) conditions are applied by this code compared to my suggested code of prev-line-length plus autowrap and ignaw.

I can see that it tests autowrap and ignaw and current line length, and I can see that it still doesn't test that the cursor is at the bottom left (I think? does it test that?) or that the char which triggered the (maybe) scroll has non-default background.

Beyond those, I can't tell what else limits the application of it, and I can't tell whether it excludes cases which we don't want to exclude - which is really the most important thing here, because not excluding some cases is still cheap, but accidentally excluding cases is a bug.

What it looks to me is that in order to exlude one more condition where we scroll up, in addition to already excluding every normal system under the sun using only autowrap and ignaw, and filtering by line length which excludes another huge chunk, and while still leaving other conditions out which we know matter - like the last char BG, you made changes in several different places which are hard (for me) to follow.

Complexity (or rather simplicity) matters to me. To keep the codebase managable and readable I'm willing to make reasonable sacrifices elsewhere, as long as correct behavior is maintained. A condition which is easy to read and analize its exact scope and correctness (of not accidentally excluding, when under-excluding is reasonably cheap) is a sacrifice I would have made if a slightly tighter condition meant increasing the complexity elsewhere (of course, tighter condition without more complexity is preferred).

But it's only a subjective view, and it's your code. The only fact is that it's not easy for me to understand whether it's correct.

I think it now clears right after the (maybe) scroll-triggered line, right? I.e. once a long line is printed, it's immediately followed by clear_eol when the conditions don't exclude it, and effectively always clears a full line?

If yes, then the clear might do more work than in my suggestion. Because the suggestion only clears the tail of the next line, which can be very small, but now it will always clear a full line. Of course, that depends on the implementation of clear_eol. If it's a similar loop then the clear is more expensive than before on average. If it's a single platform API call or terminal escape sequence, then it's probably cheaper than the suggestion.

Unrelated to that commit, I think these still need to be addressed:

  • Ensure that on windows autowrap is always 1 (when ENABLE_WRAP_AT_EOL_OUTPUT is enabled - which is the default), and ignaw is 1 if and only if VT is enabled with vt100wrap. In win32_init_vt_term I can see that autowrap is set (presumably) incorrectly to 0, and I don't know what happens with the early return. It also doesn't set DISABLE_NEWLINE_AUTO_RETURN - which is the thing that enables vt100wrap, and yet it sets ignaw to 1 anyway (and I don't think it will ever "ignore newline after wrap"). I don't know whether there are win32 conditions or behavior elsewhere which "compensate" for this incorrectness - and which will likely need to be fixed as well.
  • Rename ignaw to vt100wrap (and change the comment) or just xn, because xn can mean two things - one of them extremely common (vt100 wrap), the other I'd imagine extremely rare (ignore newline after wrap). So I think it's highly misleading to keep its name and comment - just like it sent us on a goose chase here.

@avih

avih commented Dec 5, 2024

Copy link
Copy Markdown
Contributor Author

Another thought, I didn't try it out, but I think that unsetting ENABLE_WRAP_AT_EOL_OUTPUT (--> auto_wrap = 0) can work also on xp/7/8, and maybe even also setting DISABLE_NEWLINE_AUTO_RETURN (if autowrap --> do vt100wrap) may work on those systems, though the former is likely enough, because the code already supports auto_wrap == 0.

If that's the case, then by setting the mode correctly it will remove the need for clear_eol even on those legacy and already uncommon systems.

That is to say, if the only system which ends up needing the clear it is DOS (djgpp), then IMO it's definitely not worh any additional complexity (hopefully we can still clear when required, but that alone seems easy when we don't mind few extra clears).

Or maybe something like they already patch themselves - #497 (comment) it doesn't necessarily have to be generalized if there is only one effected system...

I hope to try these modes and let you know.

@avih

avih commented Dec 5, 2024

Copy link
Copy Markdown
Contributor Author

Another thought, I didn't try it out, but I think that unsetting ENABLE_WRAP_AT_EOL_OUTPUT (--> auto_wrap = 0) can work also on xp/7/8

Right. So it does work.

Here's the the info:

Console output mode has 4 bits:

ENABLE_PROCESSED_OUTPUT
ENABLE_WRAP_AT_EOL_OUTPUT
ENABLE_VIRTUAL_TERMINAL_PROCESSING
DISABLE_NEWLINE_AUTO_RETURN

We need ENABLE_PROCESSED_OUTPUT, and it's enabled by default and highly likely to already be enabled.

When VT is disabled, ENABLE_WRAP_AT_EOL_OUTPUT is basically the am capability and what auto_wrap is supposed to represent, and works the same in win xp/7/10 (8 too probably, but I can't test).

If ENABLE_WRAP_AT_EOL_OUTPUT is unset then the cursor never wraps - it simply stays at the last column forever (printing \n will do CRLF normally).

If set (the default) then it always wraps (CRLF) when writing to the last column, and if at the bottom also scrolls, and if the last char had non-default BG then it colors the whole next line in that BG color. I.e. am without xn.

This is the case which has the bug.

When I commented out the code which tries to enable vt on win10, then the bug also shows on win10 - same as in xp/7.

(hard to test, because the windows terminal by default enables VT even if less doesn't try to enable it - which prevents the bug because we reset the color before wrapping. So I had to explicitly unset the VT bit in the console mode. And in Windows console the VT bit is not enabled by default, but there's a big scrollback buffer, and the bug only shows when reaching the bottom of the scrollback - and it starts from the top, and i have 9999 lines scrollback. Before reaching the bottom of the scrollback it only moves the view one line down - and there's no bug because it doesn't have to clear a new line with the current BG color).

Anyway, unsetting ENABLE_WRAP_AT_EOL_OUTPUT explicitly at the console mode (and setting auto_wrap=0 if we succeed) fixes the issue on xp, 7, and 10 when VT is disabled, because the code to handle auto_wrap==0 exists and apparently works.

That's the patch (only screen.c needs touching, and it includes disabling VT on win10 for testing):

@@ -1594,6 +1664,7 @@ static void win32_init_vt_term(void)

        if (vt_enabled == 0 || (vt_enabled == 1 && con_out == con_out_ours))
                return;
+       vt_enabled = 0; return;  // don't try to enable VT.

        console_output_mode = init_console_output_mode | ENABLE_VIRTUAL_TERMINAL_PROCESSING;
        vt_enabled = SetConsoleMode(con_out, console_output_mode);
@@ -1635,6 +1706,12 @@ static void win32_init_term(void)
                        (LPSECURITY_ATTRIBUTES) NULL,
                        CONSOLE_TEXTMODE_BUFFER,
                        (LPVOID) NULL);
+
+               // we don't care about the initial state. we need processed
+               // output without anything else (no wrap at EOL, no VT,
+               // no disabled auto-return).
+               if (SetConsoleMode(con_out_ours, ENABLE_PROCESSED_OUTPUT))
+                       auto_wrap = 0;
        }

        size.X = scr.srWindow.Right - scr.srWindow.Left + 1;

DISABLE_NEWLINE_AUTO_RETURN does what it's name suggests. This bit is not set by default. If it is set, then printing \n only moves the cursor one line down at the same column, while printing \r\n wraps normally.

It sounds simple, but it's not what the docs say - https://learn.microsoft.com/en-us/windows/console/setconsolemode

The docs say that it enables vt100 wrap, including all the tiny bits, which 100% irrelevant to what it actually does.

The docs were seemingly fixed, but apparently it never got to the actual docs site - MicrosoftDocs/Console-Docs#132 (comment) (and the following comment requests to remove completely the part about VT).

Anyway, we can ignore DISABLE_NEWLINE_AUTO_RETURN. It's not set by default, and we shouldn't set it ourselves.

ENABLE_VIRTUAL_TERMINAL_PROCESSING does what you think - it enables xterm escape sequences, and ALSO vt100-wrap, and also ignores the ENABLE_WRAP_AT_EOL_OUTPUT and DISABLE_NEWLINE_AUTO_RETURN bits.

I.e. you get hardcoded xterm behavior (am + xn) and you can't change anything with other mode bits.

So when we enable VT in less, we should also set auto_wrap=1 and ignaw=1, because that's the xterm and windows behavior.

But it doesn't. Instead, less sets auto_wrap=0 and ignaw=1, which is completely wrong. autowrap because it does wrap when it has to, and ignaw because it's irrelevant when autowrap is 0, eventhough elsewhere in less it sometimes checks ignaw regardless of autowrap.

And yet, it does work.

I think because at pdone, if autowrap== 0 then it prints \n - which moves it to the next line if it stayed at the last column after printing to the last column (and also already printed the color reset before the newline).

Apparently less never tries to write beyond the last column, so it never wraps on its own - which is exactly what auto_wrap==0 represents.

So there you have it.

So it's a trivial fix for winxp/7/8, and maybe also 10 before VT was supported (if such systems exist).

Additionally, changing auto_wrap=1 when vt is enabled doesn't seem to break things that I could tell.

Here's the final patch for both issues:

diff --git a/screen.c b/screen.c
index 3976c99..b93ee62 100644
--- a/screen.c
+++ b/screen.c
@@ -1590,16 +1590,15 @@ static void initcolor(void)
  */
 static void win32_init_vt_term(void)
 {
-	DWORD console_output_mode;
-
 	if (vt_enabled == 0 || (vt_enabled == 1 && con_out == con_out_ours))
-		return;
+		return;  // already initialized
 
-	console_output_mode = init_console_output_mode | ENABLE_VIRTUAL_TERMINAL_PROCESSING;
-	vt_enabled = SetConsoleMode(con_out, console_output_mode);
+	/* don't care about the initial mode, and win VT hard-enables am+xn */
+	vt_enabled = SetConsoleMode(con_out, ENABLE_PROCESSED_OUTPUT |
+	                                     ENABLE_VIRTUAL_TERMINAL_PROCESSING);
 	if (vt_enabled)
 	{
-	    auto_wrap = 0;
+	    auto_wrap = 1;
 	    ignaw = 1;
 	}
 }
@@ -1635,6 +1634,12 @@ static void win32_init_term(void)
 			(LPSECURITY_ATTRIBUTES) NULL,
 			CONSOLE_TEXTMODE_BUFFER,
 			(LPVOID) NULL);
+
+		// we don't care about the initial state. we need processed
+		// output without anything else (no wrap at EOL, no VT,
+		// no disabled auto-return).
+		if (SetConsoleMode(con_out_ours, ENABLE_PROCESSED_OUTPUT))
+			auto_wrap = 0;
 	}
 
 	size.X = scr.srWindow.Right - scr.srWindow.Left + 1;

So now I think you should consider reverting ecf29c5 and maybe do something a bit smaller to fix djgpp too.

@avih

avih commented Dec 5, 2024

Copy link
Copy Markdown
Contributor Author

That's the final patch above, as a commit, if you want it - avih@0ccf937
(still on top of v668, but I'm guessing it should apply cleanly).

And for DJGPP i'd use my own original suggestion with the condition of the previous line + autowrap + !ignaw (which is no longer entered on xp/7/8 with this commit, because autowrap==0, and also not on win10 or xterm or everything else, becuse ignaw==1).

@gwsw

gwsw commented Dec 6, 2024

Copy link
Copy Markdown
Owner

Your screen.c patch looks reasonable. I will probably merge it after studying it a bit more.

Regarding the djgpp fix, I think my change is conceptually simpler and cleaner, even though it is a little longer, because it doesn't rely on state saved between lines, does the extra output in fewer unnecessary cases (like when scrolling up, when repainting lines to add highlighting, when printing header lines, etc), and in almost all cases outputs fewer characters (at least on terminfo-based systems).

dscho added a commit to dscho/MSYS2-packages that referenced this pull request May 20, 2025
Release notes (https://www.greenwoodsoftware.com/less/news.678.html):

Version 678 was released for beta testing on 2 May 2025, and was
released for general use on 17 May 2025.

These are the differences between [version
668](https://www.greenwoodsoftware.com/less/news.668.html) and version
678:

-   Treat -r in LESS environment variable as -R.
-   Add ESC-j and ESC-k commands ([github
    msys2#560](gwsw/less#560)).
-   Add --no-paste option ([github
    msys2#523](gwsw/less#523)).
-   Add --no-edit-warn option ([github
    msys2#513](gwsw/less#513)).
-   Add --form-feed option ([github
    msys2#496](gwsw/less#496)).
-   Add ESC-b command ([github
    msys2#615](gwsw/less#615)).
-   Make TAB complete option name in -- command ([github
    msys2#531](gwsw/less#531)).
-   Update the file size on an attempt to go past end of file.
-   Make -R able to pass through any OSC escape sequences, not just OSC
    8 ([github msys2#504](gwsw/less#504)).
-   Setting LESS_IS_MORE=0 now disables "more" compatibility even if
    invoked via a file link named "more" ([github
    msys2#500](gwsw/less#500)).
-   Pass through escape sequences in prompts even if -R is not set.
-   Add LESS_SHELL_LINES to support shell prompts which use more than
    one line ([github msys2#514](gwsw/less#514)).
-   Add LESSANSIOSCALLOW to define OSC types which may be passed
    through.
-   Add LESSANSIOSCCHARS to define non-standard OSC intro chars.
-   Add LESS_SIGUSR1 to define user signal handler ([github
    msys2#582](gwsw/less#582)).
-   Add mouse and mouse6 commands to lesskey ([github
    msys2#569](gwsw/less#569)).
-   Improve behavior of ^O^N and ^O^P commands.
-   Leave stty tabs setting unchanged ([github
    msys2#620](gwsw/less#620)).
-   Fix unexpected behavior when entering a partial command followed by
    a valid command ([github
    msys2#543](gwsw/less#543)).
-   Fix bug when coloring prompt string with SGR sequences ([github
    msys2#516](gwsw/less#516)).
-   Fix bug when searching for text near an invalid UTF-8 sequence
    ([github msys2#542](gwsw/less#542)).
-   Fix display bug when file contains ESC followed by NUL ([github
    msys2#550](gwsw/less#550)).
-   Fix bug when using +:n +:p +:x or +:d on the command line ([github
    msys2#552](gwsw/less#552)).
-   Fix bug with --no-number-headers when header is not at start of file
    ([github msys2#566](gwsw/less#566)).
-   Fix bug where lesstest fails if window is resized ([github
    msys2#570](gwsw/less#570)).
-   Fix bug using "configure --with-secure=no" ([github
    msys2#584](gwsw/less#584)).
-   Fix bug using multibyte command chars ([github
    msys2#595](gwsw/less#595)).
-   Fix auto_wrap setting on Windows ([github
    msys2#497](gwsw/less#497)).
-   Fix two bugs using ^S search modifier ([github
    msys2#605](gwsw/less#605)).
-   Fix bug searching for UTF-8 strings with the PCRE2 library ([github
    msys2#610](gwsw/less#610)).
-   Fix bug highlighting OSC 8 links when opening a new file.
-   Fix bug when & filtering is active ([github
    msys2#618](gwsw/less#618)).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
lazka pushed a commit to msys2/MSYS2-packages that referenced this pull request May 20, 2025
Release notes (https://www.greenwoodsoftware.com/less/news.678.html):

Version 678 was released for beta testing on 2 May 2025, and was
released for general use on 17 May 2025.

These are the differences between [version
668](https://www.greenwoodsoftware.com/less/news.668.html) and version
678:

-   Treat -r in LESS environment variable as -R.
-   Add ESC-j and ESC-k commands ([github
    #560](gwsw/less#560)).
-   Add --no-paste option ([github
    #523](gwsw/less#523)).
-   Add --no-edit-warn option ([github
    #513](gwsw/less#513)).
-   Add --form-feed option ([github
    #496](gwsw/less#496)).
-   Add ESC-b command ([github
    #615](gwsw/less#615)).
-   Make TAB complete option name in -- command ([github
    #531](gwsw/less#531)).
-   Update the file size on an attempt to go past end of file.
-   Make -R able to pass through any OSC escape sequences, not just OSC
    8 ([github #504](gwsw/less#504)).
-   Setting LESS_IS_MORE=0 now disables "more" compatibility even if
    invoked via a file link named "more" ([github
    #500](gwsw/less#500)).
-   Pass through escape sequences in prompts even if -R is not set.
-   Add LESS_SHELL_LINES to support shell prompts which use more than
    one line ([github #514](gwsw/less#514)).
-   Add LESSANSIOSCALLOW to define OSC types which may be passed
    through.
-   Add LESSANSIOSCCHARS to define non-standard OSC intro chars.
-   Add LESS_SIGUSR1 to define user signal handler ([github
    #582](gwsw/less#582)).
-   Add mouse and mouse6 commands to lesskey ([github
    #569](gwsw/less#569)).
-   Improve behavior of ^O^N and ^O^P commands.
-   Leave stty tabs setting unchanged ([github
    #620](gwsw/less#620)).
-   Fix unexpected behavior when entering a partial command followed by
    a valid command ([github
    #543](gwsw/less#543)).
-   Fix bug when coloring prompt string with SGR sequences ([github
    #516](gwsw/less#516)).
-   Fix bug when searching for text near an invalid UTF-8 sequence
    ([github #542](gwsw/less#542)).
-   Fix display bug when file contains ESC followed by NUL ([github
    #550](gwsw/less#550)).
-   Fix bug when using +:n +:p +:x or +:d on the command line ([github
    #552](gwsw/less#552)).
-   Fix bug with --no-number-headers when header is not at start of file
    ([github #566](gwsw/less#566)).
-   Fix bug where lesstest fails if window is resized ([github
    #570](gwsw/less#570)).
-   Fix bug using "configure --with-secure=no" ([github
    #584](gwsw/less#584)).
-   Fix bug using multibyte command chars ([github
    #595](gwsw/less#595)).
-   Fix auto_wrap setting on Windows ([github
    #497](gwsw/less#497)).
-   Fix two bugs using ^S search modifier ([github
    #605](gwsw/less#605)).
-   Fix bug searching for UTF-8 strings with the PCRE2 library ([github
    #610](gwsw/less#610)).
-   Fix bug highlighting OSC 8 links when opening a new file.
-   Fix bug when & filtering is active ([github
    #618](gwsw/less#618)).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants