Showing posts with label GPU. Show all posts

Monday, December 15, 2008

I Think I Broke It

I decided to make some fundamental changes to the underlying way in which the Wildcat kernel interacts with OpenGL.  On the plus side it will make things much cleaner and clearer moving forward.  On the minus side: a) there is still much work to be done to get this working on any platform, and b) while it is in a running state on the OS X side, the Win32/WX side is quite horribly broken.

While I have not tried compiling the WX project, I can save you some time.  I am oh so certain that I broke many things all over the place.  I hope to wrap up the OS X work tomorrow and start working on the WX code just after that.  The downside is that I have a trip planned from Wednesday through Monday, so if I don't resurrect things by the time I leave it will be broken through at least the weekend.  Sorry!

I will try my best to post an update tomorrow.

G

Wednesday, August 27, 2008

Trimmed Surfaces are Back and Lookin' Nice

I finished up the majority of the optimizations related to trimmed surface generation.  For those that are interested I was able to remove completely any need to tessellate the projected trim profiles.  I also eliminated a lot of memory reads and writes to/from the GPU.  This was done primarily through the use of a new method I added to both NurbsCurve and NurbsSurface.

This interface, called GenerateTextureBuffers, is a companion to GenerateClientBuffers and GenerateServerBuffers.  Now you can choose to generate curves and surfaces directly into the most efficient format for how you are going to use them.  Since there are at least four different generation paths for both curves and surfaces, I am sure that there are some holes in places, but the primary paths seem to work well.

The impact on performance is nice.  It is a bit tough to gauge exactly (I need to add better timing ability), but I estimate that the overall trimmed surface generation time was reduced by 25-50%.  Wahoo!

Pictures are coming.  Oh yeah, so is STL support.  Hmm...lots to do.

Cheers,
   Graham

Monday, August 25, 2008

No More Polygon Triangulation

First, thanks for all the great feedback last week.  I really appreciate everyone's support.  I was asked what the Windows version of Wildcat is lacking.  The core modeling functionality is identical so Win32 isn't lacking anything.  Where it is lacking is in the platform specifics and GUI.  I just don't know the Windows platform APIs well enough to do it justice.  I know that I have talked about using a purely WebKit-based UI, but now I am not so sure about this approach.  What I really need is someone interested in getting the Windows platform better supported via either Win32/MFC or wxWidgets.  Either one would be fine with me.  Wx would be great because then the Linux port would be that much easier.  If ya have some free time and want to chip in, just let me know!

Ok, about the title for today's post.  If you refer back to a post I did a couple of months ago about my approach for trimmed NURBS surfaces, one of the key steps was triangulating the projected trim curves.  This step is the only step that is on the CPU and is not the easiest to implement (I was going to move to using Shewchuk's triangle.c code).  Anyways...

So I was pulling together biblio sources for a paper I am beginning to write, and one of my sources listed another source I wasn't familiar with.  I followed the rabbit a bit and it turns out that there is a method for rendering concave polygons from their ordered boundary points in linear time.  All using the stencil buffer.  No more triangulation needed!  And now the entire trimmed surface generation algorithm will be GPU-based!!!!!  Whoot!
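
For the curious, here is a rough CPU-side sketch of how a stencil fill of this kind usually works.  I am assuming the common "invert the stencil bit under a triangle fan" technique; the paper's exact method may differ, and every name below is made up for illustration.  Fan triangles are drawn from the first boundary point, each one flips a bit for the pixels it covers, and pixels flipped an odd number of times are inside the polygon.

```python
def _cross(o, a, b):
    """2D cross product of vectors (a - o) and (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _strictly_inside(p, a, b, c):
    """True if p is strictly inside triangle abc (either winding).

    Samples exactly on an edge are ignored here; a real rasterizer
    resolves those with its fill rule.
    """
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def stencil_fill(points, width, height):
    """Emulate a stencil-invert fill of a simple polygon (hypothetical helper).

    points: ordered boundary points, in pixel units.
    Returns a height x width grid of 0/1 values indexed as grid[y][x].
    """
    stencil = [[0] * width for _ in range(height)]
    p0 = points[0]
    # One fan triangle per boundary edge that does not touch p0: linear in n.
    for i in range(1, len(points) - 1):
        a, b = points[i], points[i + 1]
        for y in range(height):
            for x in range(width):
                center = (x + 0.5, y + 0.5)
                if _strictly_inside(center, p0, a, b):
                    stencil[y][x] ^= 1  # same effect as a GL_INVERT stencil op
    return stencil


# A concave "notched" profile: the notch vertex is (4, 2).
profile = [(0, 0), (8, 0), (8, 7), (4, 2), (0, 8)]
mask = stencil_fill(profile, 8, 8)
```

The inner pixel loops are what the GPU does for free when it rasterizes each fan triangle, so the only per-polygon work left is issuing one triangle per boundary point.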

I am in the middle of implementing this approach.  Nothing checked in yet, but it should be up and running by the end of the week.  I will keep you posted and try to upload some screenshots for eye-candy.  It won't look crazy different, but it will be running much faster and be on a nice and clean theoretical basis.

Cheers,
   Graham

Friday, July 25, 2008

Got Distracted

To all of those waiting for STL output, sorry, but I got distracted yesterday and today.  When I was going through the steps necessary for generating tessellated trim surface output I came up with a nice optimization that could apply to both NURBS curves and surfaces.  I won't bore you with the gritty details of the approach.  If you are really interested just drop a comment and I will reply.

In addition to being faster, this optimization uses less texture memory and requires a good deal less setup for generation than the current approach.  It did require modifying every generation shader, but the changes were pretty minor.

I have been reading "Performing Efficient NURBS Modeling Operations on the GPU" repeatedly in order to get prepared for surface-surface intersection.  One of the neat parts of their approach is that their GPU-based generation routines don't have to generate the entire curve or surface but can generate just a sub-region.  They use this ability to iterate through repeatedly (almost Newtonian like) to get within a certain accuracy.  Well, the optimization I put in place lays the groundwork for Wildcat being able to do the same sub-region generation too.  Just need to make a few more tweaks and API changes and we should be all done.

In addition I moved to using PBOs to copy curve and surface generation data for server-side generation.  This should be a nice improvement.  I had avoided doing this before because I was running into some driver bugs that would hang the system when I tried to call glReadPixels.

Wow...lots done in the last two days, but little to do with STL.  Getting good STL output really is a priority, I promise.  Hopefully next week.

Cheers,
   Graham

Tuesday, July 1, 2008

More GPU Intersection Routines

I spent today working on two things.  I'll talk about the first one second.  This afternoon I started getting code together for the remaining intersection routines - focusing on GPU deployment.  I won't go into a lot of detail about it all, but now curve-line intersection is on the GPU.  It still has a few details to get figured out (so might not be returning any hits at all at the moment), but should be ship-shape by tomorrow.  Tomorrow I also plan on trying a first pass on surface-point, surface-line, and surface-curve.  Should be interesting.

So now what I spent more of the day on and got nowhere.  Right now the GPU code for NURBS curve and surface generation works well and has been tested on nVidia GPUs, mostly on the G80 found in the current MacBook Pro.  I have an ATI HD 2400 XT based Windows XP machine that I have been using for my Win32 development and I had not gotten GPU-based generation running on it.  Today I thought would be the day I got it all up and working.

After spending about 4 hours trying to figure out ways around ATI's crappy support for integers in GLSL I gave up.  I don't want to convert the routines to entirely float-based since I know the int versions work well on good GPUs.  I might take another shot at ATI support in a few days, but for now I need to get some space between me and ATI.  Blah!

Cheers,
   Graham

Monday, June 30, 2008

Trim Texture Gen Working On GPU

I spent some time today getting point inversion working on the GPU.  Once I had the algorithm together on the CPU porting it to the GPU was straightforward.  It runs pretty nicely, but is not as completely smooth as I would like.  If you are rapidly zooming in and out you can see a little stall in the zoom while the trim textures regenerate.  I think that this will begin to improve once the curve/surface generation cache scheme is in place, but that is a ways away.  I will continue to tweak until then.

So, now the only portion of trimmed surface generation that is not on the GPU is the triangulation of the inverted trimming curve points.  I am using a poorly implemented O(n^2) algorithm (ear clipping), so at some point I want to put in the effort to clean it up.  But again, that is a ways away.

With trim surfaces back working (and better than ever) Pad is completely put back together again.  We are still experiencing some problems when triangulating complex profiles.  I have a sneaking suspicion that this is a problem with arc generation, though I am known to be wrong.  I will probably spend some time over the rest of this week seeing what I can dig up.  Shafts (rotated profiles) are still very broken and will need a good amount of attention to fix up.

What's next?  There are two streams of work I want to attack next.  First is implementing the topology model for Pad.  This will allow me to begin thinking about what will be needed for the boolean operations on solids.  Second is finishing up some more of the intersection routines.  These will also be needed for boolean operations.  In case you can not tell, boolean operations are the next major piece of functionality that I want to tackle.  There is just a lot of ground work that has to be done first.  Once BO's are working well then all sorts of interesting functionality can be explored.

I realized that I have not included any pictures recently.  So here are a couple to view.  The first is just a simple Pad.  The second is the results of zooming way way in on a non axis-aligned trim surface.  You can just begin to see the visual issues that come with using the trim texture approach.  Thank goodness for auto-LOD scaling to make things a bit better.




Friday, June 27, 2008

Quick Update

After yesterday's monster post I thought I would take today off.  But as luck has it I was able to finish up a few good chunks of code that got Trimmed NURBS Surfaces back into mostly working order.  If you check out the code you should be able to create nice 3D Pads.  There are a couple of gotchas though.

First, much of the trimming process described yesterday as being on the GPU is still on the CPU.  I want to make sure I have the general algorithms correct before I put it onto the GPU, mostly because it is much harder to debug on the graphics card.

Second, LOD scaling is still a bit wonky.  Zoom way in and the trim texture gets denser, as it should.  But zoom way out and you typically only get down to 50x50 or so.  This is because I am using a method similar to the LOD for regular surface vertices.  I will be able to adjust this over time to optimize the amount of memory that the trim textures soak up.  Right now they eat a lot.

Next week I hope to move more onto the GPU and to optimize the LOD stuff, but for now trimmed surfaces are back.

Have a great weekend.
Cheers,
   Graham

Thursday, June 26, 2008

Trimmed NURBS Surfaces

I recently posted about some of the NURBS intersection methods I am working on.  Of course these all depend on having robust NURBS curves and surfaces.  The third leg of this stool is trimmed NURBS surfaces.  For those that don't know what these are, here is a quick backgrounder...

Take a NURBS surface and project an arbitrarily shaped closed curve profile onto the surface.  Since the profile was closed it should divide the surface into inner and outer sections.  This primary profile defines the outer edge of the trimmed surface.  Everything outside of it is "removed" from the NURBS surface.  Everything inside of it remains.

Now add additional closed profiles inside of the primary profile.  You can use this to "punch holes" into the surface.  None of these interior profiles should overlap or touch either themselves or the exterior profile.  By combining the exterior profile and some number of interior profiles you can trim the surface into just about any shape.

So how do we store, render, and evaluate such objects?  Storing them is simple.  Just capture the underlying NURBS surface (control points, knot points, degrees, etc.) and capture the exterior and interior profiles.  If you look at trimmed_nurbs_surface.h you can see this approach in action.

Here are the minimal steps I feel are required to accurately generate a trimmed NURBS surface - I will go into detail on each step:
  1. Generate underlying NURBS surface points and store in VBO
  2. Evaluate profile curves and store each profile in separate VBO
  3. Using point-inversion, project each point from each profile onto the NURBS surface
  4. Tessellate each profile separately
  5. Render all profiles into a single "trim" texture
Now how about generating the underlying NURBS surface?  Here are a few really good papers on using the GPU to work with NURBS surfaces:
  1. GPU-based Trimming and Tessellation of NURBS and T-Spline Surfaces
  2. GPU-based Appearance Preserving Trimmed NURBS Rendering
  3. Direct Evaluation of NURBS Curves and Surfaces on the GPU
  4. Performing Efficient NURBS Modeling Operations on the GPU
  5. Fragment-based Evaluation of Non-Uniform B-Spline Surfaces on GPUs
(Links go to PDFs where I could find them, otherwise you can get author and paper information from the links)

All of these papers go about rendering NURBS curves and surfaces in pretty much the same way, using the GPU.  Control point arrays and knot point arrays are converted into float textures and passed into a fragment program that calculates the exact surface position and normal.  These are passed out into two separate textures that are then converted into VBOs.  See ns_default_plM.fsh in the Wildcat SVN code to see how the fragment program works.  My version of this works in a single pass and is quite flexible.  The end result is four VBOs:
  • Vertex data - X, Y, Z position for each vertex
  • Normal data - Normal vector for each vertex
  • Texture coordinate data - parametric [u,v] values for each vertex
  • Index data - vertex ordering for each triangle in the surface
Next up is evaluating each curve in a profile and building an array for each profile.  Curves are evaluated using the same method as surfaces (see above).  Instead of generating four VBOs, curves only need one - vertex position (curves don't need normals, tex-coords, or indices).  All of the curve point data for the entire profile is stored in one VBO in a clockwise ordering.  This ordering is important to remember!
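
To make the evaluation step a bit more concrete, here is a small CPU-side sketch of what a per-sample NURBS curve evaluation computes.  It uses the textbook Cox-de Boor recursion, not the actual Wildcat shader code, and the function names are my own for illustration.  On the GPU each sample would be one fragment reading control points and knots out of float textures.

```python
import math


def basis(i, p, u, knots):
    """Cox-de Boor B-spline basis function N_{i,p}(u)."""
    if p == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # Treat the very end of the domain as part of the last non-empty span.
        if u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    value = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0:
        value += (u - knots[i]) / d1 * basis(i, p - 1, u, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0:
        value += (knots[i + p + 1] - u) / d2 * basis(i + 1, p - 1, u, knots)
    return value


def nurbs_curve_point(u, degree, control_points, weights, knots):
    """Evaluate one point on a NURBS curve (rational weighted sum)."""
    dim = len(control_points[0])
    numerator = [0.0] * dim
    denominator = 0.0
    for i, (point, w) in enumerate(zip(control_points, weights)):
        b = basis(i, degree, u, knots) * w
        denominator += b
        for k in range(dim):
            numerator[k] += b * point[k]
    return tuple(c / denominator for c in numerator)


# Quarter circle as a rational quadratic: every sample has x^2 + y^2 = 1.
ctrl = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
wts = [1.0, math.sqrt(2.0) / 2.0, 1.0]
kv = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
samples = [nurbs_curve_point(i / 8.0, 2, ctrl, wts, kv) for i in range(9)]
```

Filling the `samples` list in order is the CPU analogue of writing one profile's points into its vertex VBO.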

Third step is projecting each point onto the surface.  You have to do this because a profile curve may not lie directly on the surface.  Plus we want to get each point from 3D "real-world" space to 2D "parametric" space.  Meaning, each point must be located in the [u,v] parametric space of the NURBS surface.  This is important because when we render the trim profiles we render them into a texture that goes from [0,1] in both the u and v directions.  Make sense?  Ok, so to do this we again use a fragment shader with access to textures containing all of the NURBS surface control points and knot points.  The shader takes a single point input (the profile curve point) and outputs the point-inverse into another texture.  This texture is then converted into a VBO.  Now we have a VBO for all points in a profile that are all in [u,v] space.  Paper 4, section 4.2 goes into a little more detail about this step.
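
Here is a rough sketch of what point inversion boils down to, written on the CPU for readability.  I am assuming a plain Gauss-Newton iteration with finite-difference derivatives and a simple stand-in surface; the real shader works on NURBS data in textures and may iterate differently.  The names are hypothetical.

```python
def invert_point(surface, target, u=0.5, v=0.5, iterations=20, h=1e-6):
    """Find (u, v) in [0,1]^2 whose surface point is closest to target.

    surface: callable (u, v) -> (x, y, z).  Assumed smooth, and safe to
    evaluate a tiny step h outside [0,1] for the finite differences.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    for _ in range(iterations):
        s = surface(u, v)
        r = [a - b for a, b in zip(s, target)]  # residual S(u,v) - P
        # Central-difference partial derivatives dS/du and dS/dv.
        su = [(a - b) / (2 * h) for a, b in zip(surface(u + h, v), surface(u - h, v))]
        sv = [(a - b) / (2 * h) for a, b in zip(surface(u, v + h), surface(u, v - h))]
        # Normal equations of the 3x2 least-squares step.
        a11, a12, a22 = dot(su, su), dot(su, sv), dot(sv, sv)
        b1, b2 = -dot(su, r), -dot(sv, r)
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break  # degenerate tangent plane; give up with the current guess
        du = (a22 * b1 - a12 * b2) / det
        dv = (a11 * b2 - a12 * b1) / det
        # Keep the result inside the parametric domain of the trim texture.
        u = min(1.0, max(0.0, u + du))
        v = min(1.0, max(0.0, v + dv))
        if abs(du) + abs(dv) < 1e-12:
            break
    return u, v


def saddle(u, v):
    """Stand-in surface for the example; not a NURBS surface."""
    return (u, v, u * v)


uv = invert_point(saddle, saddle(0.3, 0.7))
```

Each profile point is independent of the others, which is why this maps so naturally onto one fragment per point.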

Fourth step is tessellating the profile (also called polygon triangulation).  Why do we have to tessellate?  Fundamentally each profile is a closed simple polygon, but it may be either concave or convex.  If every profile were convex no tessellation would be necessary, but in order to handle concave profiles we must tessellate.  I have not found a good parallelized (or GPU-based) tessellation routine.  For now I am using a CPU-bound version of ear-clipping, but I may move to using Triangle (by Jonathan Shewchuk).  The input to this is the VBO of [u,v] points.  The output is an index for triangle ordering.  There really should be a good way to do this on the GPU, just haven't gotten there yet.

The last step is rendering all of these profiles into a trimming texture.  This process is very simple.  I set up an FBO that covers the [0,1] space for both the U and V axes.  The FBO is cleared to be all zeros.  The outer profile is rendered into the FBO (using the tessellation index) filling its internal area with ones.  Each inner profile is then rendered, filling its interior with zeros again.  In the end we have a texture that has ones where there is a surface, and zeros where there is not a surface.
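
In CPU terms the render pass amounts to something like the following sketch.  A per-texel point-in-polygon test stands in for the GPU rasterizing the tessellated triangles, and the function names are made up for the example.

```python
def point_in_polygon(p, polygon):
    """Even-odd ray-cast test for a point against a closed polygon."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_hit = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_hit:
                inside = not inside
    return inside


def build_trim_texture(outer, holes, size):
    """Return a size x size grid over [0,1]^2: 1 = keep surface, 0 = trimmed."""
    texture = [[0] * size for _ in range(size)]  # "clear the FBO to zeros"
    passes = [(outer, 1)] + [(hole, 0) for hole in holes]
    for profile, value in passes:
        for row in range(size):
            for col in range(size):
                uv = ((col + 0.5) / size, (row + 0.5) / size)  # texel center
                if point_in_polygon(uv, profile):
                    texture[row][col] = value
    return texture


outer_profile = [(0.1, 0.1), (0.9, 0.1), (0.9, 0.9), (0.1, 0.9)]
hole_profile = [(0.4, 0.4), (0.6, 0.4), (0.6, 0.6), (0.4, 0.6)]
trim = build_trim_texture(outer_profile, [hole_profile], 10)
```

The order of the passes matters: the outer profile has to be drawn first so that the holes can overwrite it with zeros.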

When it comes time to render the trimmed NURBS surface we start just like a regular NURBS surface.  One extra step is added in the fragment shader.  A quick texture lookup is performed into the trim texture to get the value of the texture for the [u,v] of the fragment.  If the value of the texture is one, the fragment is rendered.  If the texture value is zero, the fragment is discarded.  Simple and easy, right?
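
The lookup itself is tiny.  Here is a sketch of the logic in plain Python, assuming nearest-texel sampling (the real shader may filter differently) and hypothetical function names; returning None stands in for a fragment shader discard.

```python
def sample_trim(trim_texture, u, v):
    """Nearest-texel lookup into a trim texture indexed as [row][col]."""
    rows = len(trim_texture)
    cols = len(trim_texture[0])
    col = min(int(u * cols), cols - 1)  # clamp so u == 1.0 stays in range
    row = min(int(v * rows), rows - 1)
    return trim_texture[row][col]


def shade_fragment(trim_texture, u, v, color):
    """Return the fragment color, or None if the fragment is trimmed away."""
    if sample_trim(trim_texture, u, v) == 0:
        return None  # equivalent of discarding the fragment
    return color


tiny_trim = [[0, 1],
             [1, 1]]
kept = shade_fragment(tiny_trim, 0.75, 0.25, (1.0, 0.5, 0.0))
dropped = shade_fragment(tiny_trim, 0.25, 0.25, (1.0, 0.5, 0.0))
```

Because the lookup uses the fragment's own [u,v], the trim follows the surface no matter how the surface is tessellated for display.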

This method is very high-performance.  With the exception of the tessellation step the entire process runs on the GPU in just four passes.  I have run tests where >6 million vertices are evaluated and trimmed in a second.  Not too bad.  If adaptive LOD scaling is added, this should be the final approach needed to make Wildcat very very fast.

So where are we at?  Steps 1, 2, 4, and 5 are pretty much all done.  I will be spending the next couple of days working on finishing the GPU version of step 3 (point-inversion) and cleaning up the code to support LOD.  Hopefully by early next week trimmed surfaces will be back.

If you have some insight into how to either avoid the tessellation step or how to parallelize it on the GPU please let me know.  This would make a big difference.

Cheers,
   Graham

Monday, June 23, 2008

GPU Curve-Curve Intersection

I was traveling this weekend so I printed out a couple of recent conference papers to read while in the airports.  The SolidModeling annual conference was just held in early June.  Usually it has some very interesting results for those of us that dabble in solid modeling.  I came across a paper that was an extension of something I saw last year.

Sara McMains' research group out of UCB has been doing some great work on GPU-based NURBS generation and manipulation.  My research last summer that culminated in the genesis of Wildcat parallels much of what her group published in "Direct Evaluation of NURBS curves and surfaces on the GPU."  The approach you see in Wildcat for using the GPU to generate NURBS curves and surfaces is very similar to hers.

This year her group has followed that paper with "Performing Efficient NURBS Modeling Operations on the GPU."  I will let you read it because it is pretty good.  They tend to use too many passes on the GPU while I consolidate down to one or maybe two passes, but I like their approach.  I have not reviewed it in detail, but their stream-reduction algorithm seems very promising.

So, today I took some time and reworked my curve-curve intersection (CCI) algorithm to use the GPU akin to what you see in the McMains paper.  Overall it seems to work really well.  We are probably getting a 20-50x improvement in performance.  Not too bad.  There are still a couple of details to clean up, but this should be the way of the future for Wildcat.

Also, I got some good messages over the weekend about a broken Windows build.  I cleaned up the VS project (and moved all of the code to a VC9 project).  So you Windows folks, please try again.  Tomorrow I am going to take a shot at surface-surface intersection.  Should be pretty easy since I can pattern off of CCI.

Cheers,
   Graham